In a study by AI and language researchers Martha Lewis of the University of Amsterdam and Melanie Mitchell of the Santa Fe Institute, GPT-4's ability to handle analogical reasoning was tested against human performance. Analogical reasoning, the ability to draw comparisons between different things based on shared similarities, is a crucial method humans use to understand the world. For example: "Coffee is to cup as soup is to ???" (Answer: bowl).
While GPT-4 performs well on standard analogy tests, the study found that it struggled when the problems were slightly altered. Humans maintained their performance across these variations; GPT-4's accuracy dropped.
GPT's Shortcomings in Reasoning
The study tested AI and human performance on three types of analogy problems:
Letter Sequences
Digit Matrices
Story Analogies
AI models like GPT-4 performed well on standard tests, but when faced with modified versions—such as changes in the position of a missing number or slight rewording of a story—GPT-4's performance faltered. Humans, however, remained consistent across the modifications. This suggests that GPT models lack the flexibility of human reasoning and often rely on pattern recognition rather than true understanding.
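The kind of variation involved is easiest to see with a concrete letter-sequence problem. The sketch below is illustrative only: the prompt wording and the swapped-letter "counterfactual" alphabet are assumptions for the sake of example, not the study's actual materials. The point is that the abstract rule ("replace the last letter with its successor") stays the same, but the correct answer changes when the alphabet is permuted, so a solver cannot simply echo memorized letter patterns.

```python
# Illustrative sketch of an original vs. modified letter-sequence analogy.
# The prompt wording and the permuted alphabet are assumptions, not the
# materials used by Lewis & Mitchell.

STANDARD = list("abcdefghijklmnopqrstuvwxyz")

# Counterfactual alphabet: same letters, but 'm' and 'r' swap positions,
# so the successor of 'l' is now 'r' instead of 'm'.
COUNTERFACTUAL = list("abcdefghijklrnopqmstuvwxyz")

def make_prompt(alphabet):
    """Pose a letter-sequence analogy over the given (possibly permuted) alphabet."""
    return (
        "Use this fictional alphabet: " + " ".join(alphabet) + "\n"
        "If the string a b c d changes to a b c e, "
        "what should the string i j k l change to?"
    )

def successor_rule_answer(alphabet):
    """Ground-truth answer under the rule 'replace the last letter with its successor'."""
    return "i j k " + alphabet[alphabet.index("l") + 1]

for name, alphabet in [("original", STANDARD), ("modified", COUNTERFACTUAL)]:
    print(f"{name} prompt:")
    print(make_prompt(alphabet))
    print("expected answer:", successor_rule_answer(alphabet))
    print()
```

Running this prints the same analogy twice: under the standard alphabet the expected completion is "i j k m", while under the permuted alphabet it is "i j k r". According to the study, humans adapt to this kind of change far more reliably than GPT-4 does.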
The Challenge for AI in Decision-Making
This research indicates that AI models like GPT-4 do not truly "understand" the analogies they solve. Their answers often mimic patterns seen in training data rather than reflecting the abstract comprehension that characterizes human cognition. The study concludes that GPT models reason less robustly than humans, especially on complex reasoning tasks, pointing to the limitations of AI in fields that require critical decision-making, such as healthcare, law, and education.
This is a critical reminder that while AI can be a powerful tool, it is not yet capable of replacing human thinking in complex, nuanced scenarios.
Article Details:
Martha Lewis and Melanie Mitchell (2025), ‘Evaluating the Robustness of Analogical Reasoning in Large Language Models’, Transactions on Machine Learning Research.