
23 March 2025

Why GPT Can’t Think Like Us

Artificial Intelligence (AI), especially large language models like GPT-4, has made remarkable strides in reasoning tasks. However, a new study highlights the limits of AI's ability to truly understand abstract concepts, suggesting that its reasoning relies more on mimicking patterns than on genuine comprehension.

In the study, AI and language experts Martha Lewis of the University of Amsterdam and Melanie Mitchell of the Santa Fe Institute tested GPT-4's analogical reasoning against human performance. Analogical reasoning, the ability to draw comparisons between different things based on shared similarities, is a crucial method humans use to make sense of the world. For example: "Coffee is to cup as soup is to ???" (Answer: bowl).

While GPT-4 excels at standard analogy tests, the study found that it struggled when the problems were slightly altered. Unlike humans, whose performance held up under these variations, GPT-4's results dropped.

GPT's Shortcomings in Reasoning

The study tested AI and human performance on three types of analogy problems:

  1. Letter Sequences

  2. Digit Matrices

  3. Story Analogies

AI models like GPT-4 performed well on standard tests, but when faced with modified versions—such as changes in the position of a missing number or slight rewording of a story—GPT-4's performance faltered. Humans, however, remained consistent across the modifications. This suggests that GPT models lack the flexibility of human reasoning and often rely on pattern recognition rather than true understanding.
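To make this kind of modification concrete, the sketch below (Python) builds a toy digit-matrix item in its standard form, with the blank in the usual bottom-right cell, and a variant in which the blank is moved to another position. The items and the format_matrix_prompt helper are illustrative assumptions for this article, not the study's actual materials or prompts.

    # Illustrative sketch only: these toy items are not taken from the study's
    # materials; they simply show the kind of superficial change the article
    # describes, where the blank in a digit matrix moves away from its usual
    # bottom-right position.

    def format_matrix_prompt(matrix):
        """Render a 3x3 digit matrix as a plain-text prompt, using '?' for the blank."""
        rows = [" ".join("?" if cell is None else str(cell) for cell in row)
                for row in matrix]
        return "Complete the matrix by replacing '?':\n" + "\n".join(rows)

    # Standard form: each row repeats one digit; the blank sits in the
    # conventional bottom-right cell.
    standard = [
        [1, 1, 1],
        [2, 2, 2],
        [3, 3, None],
    ]

    # Modified form: the same rule applies, but the blank is moved to the
    # middle of the second row, the sort of change under which GPT-4's
    # accuracy reportedly dropped while human accuracy did not.
    modified = [
        [1, 1, 1],
        [2, None, 2],
        [3, 3, 3],
    ]

    print(format_matrix_prompt(standard))  # expected answer: 3
    print(format_matrix_prompt(modified))  # expected answer: 2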

The Challenge for AI in Decision-Making

This research reveals that AI models like GPT-4 do not truly "understand" the analogies they generate. Their reasoning often mimics patterns seen in training data rather than reflecting the abstract comprehension that is a key feature of human cognition. The study concludes that GPT models reason less robustly than humans, especially on complex tasks, pointing to the limitations of AI in fields that require critical decision-making, such as healthcare, law, and education.

This is a critical reminder that while AI can be a powerful tool, it is not yet capable of replacing human thinking in complex, nuanced scenarios.

Article Details:
Martha Lewis and Melanie Mitchell (2025), 'Evaluating the Robustness of Analogical Reasoning in Large Language Models', Transactions on Machine Learning Research.

 
Published by the UvA