Q&A with the Experts: Why ChatGPT struggles with math
Why AI models still find it challenging to generalize beyond the data they have been trained on
By Media Relations
Yuntian Deng, an assistant professor at the David R. Cheriton School of Computer Science, highlights the limitations of large language models (LLMs) such as OpenAI’s ChatGPT when performing mathematical tasks. Deng points out that while ChatGPT’s latest model (o1) improves on its predecessors, it still struggles with multi-digit multiplication, particularly for numbers beyond nine digits. This limitation exposes a significant gap in the reasoning abilities of LLMs: unlike humans, who can generalize principles learned in one scenario to new, unfamiliar situations, LLMs often fail to extend their training to novel tasks, such as multiplying larger numbers. This raises concerns about the reliability of LLMs in tasks that require deep reasoning.

Deng stresses the importance of studying how these models "think," especially since companies like OpenAI have not fully disclosed their training methods. Understanding the strengths and weaknesses of AI tools helps guide their use toward appropriate tasks while highlighting where human expertise remains essential.
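As a rough illustration of the kind of evaluation Deng describes, the minimal sketch below generates random n-digit multiplication problems, asks a model for the product, and checks the reply against Python's exact big-integer arithmetic. The OpenAI SDK usage, the model name "o1", the prompt wording, and the answer-parsing regex are all assumptions made for illustration, not details of Deng's actual study.

```python
# A minimal sketch (not from the article) of probing an LLM's multi-digit
# multiplication accuracy: sample random n-digit operands, ask the model
# for the product, and verify against Python's exact integer arithmetic.
import random
import re

from openai import OpenAI  # assumes the OpenAI Python SDK is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def random_n_digit(n: int) -> int:
    """Return a uniformly random integer with exactly n digits."""
    return random.randint(10 ** (n - 1), 10 ** n - 1)


def model_product(a: int, b: int, model: str = "o1") -> int | None:
    """Ask the model for a * b and parse the last integer in its reply.

    The model name and prompt format are assumptions for illustration.
    """
    resp = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": f"Compute {a} * {b}. Reply with the final integer only.",
        }],
    )
    text = resp.choices[0].message.content or ""
    matches = re.findall(r"-?\d[\d,]*", text)
    return int(matches[-1].replace(",", "")) if matches else None


def accuracy(n_digits: int, trials: int = 20) -> float:
    """Fraction of n-digit-by-n-digit products the model gets exactly right."""
    correct = 0
    for _ in range(trials):
        a, b = random_n_digit(n_digits), random_n_digit(n_digits)
        correct += model_product(a, b) == a * b  # exact check via Python ints
    return correct / trials


if __name__ == "__main__":
    # One would expect accuracy to degrade as operand length grows past the
    # problem sizes that are well represented in training data.
    for n in (2, 4, 9, 12):
        print(f"{n}-digit x {n}-digit: {accuracy(n):.0%} correct")
```

Because Python integers are arbitrary precision, the ground-truth check `a * b` is exact at any operand length, which makes this a clean way to separate a model's pattern-matching from genuine arithmetic generalization.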
To read the full article, click here!