Q&A with the Experts: Why ChatGPT struggles with math
Why AI still finds it challenging to generalize beyond the data it has been trained on
By Media Relations

Have you ever tried to use an AI tool like ChatGPT to do some math and found it doesn’t always add up? It turns out there’s a reason for that.
As large language models (LLMs) like OpenAI's ChatGPT become increasingly widespread, people rely on them more and more for work and research assistance. Yuntian Deng, assistant professor at the David R. Cheriton School of Computer Science, discusses some of the challenges in LLMs' reasoning capabilities, particularly in math, and explores the implications of using these models to aid problem-solving.
What flaw did you discover in ChatGPT’s ability to do math?
As I explained in a recent post on X, o1, the latest reasoning variant of ChatGPT, struggles with large-digit multiplication, especially when multiplying numbers beyond nine digits. That performance is a notable improvement over the previous ChatGPT-4o model, which struggled even with four-digit multiplication, but it’s still a major flaw.
What implications does this have regarding the tool’s ability to reason?
Large-digit multiplication is a useful test of reasoning because it requires a model to apply principles learned during training to new test cases. Humans can do this naturally. For instance, if you teach a high school student how to multiply nine-digit numbers, they can easily extend that understanding to handle ten-digit multiplication, demonstrating a grasp of the underlying principles rather than mere memorization.
In contrast, LLMs often struggle to generalize beyond the data they have been trained on. For example, if an LLM is trained on data involving multiplication of up to nine-digit numbers, it typically cannot generalize to ten-digit multiplication.
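For illustration, here is a minimal sketch of how such a generalization test might be run: generate random n-digit multiplication problems, ask a model for the product, and score its replies against exact arithmetic. The `ask` callable and the stand-in `perfect_calculator` below are hypothetical placeholders for a real LLM API call, not something described in the interview.

```python
import random
from typing import Callable

def make_problem(digits: int) -> tuple[int, int]:
    """Draw two uniformly random integers with exactly `digits` digits."""
    lo, hi = 10 ** (digits - 1), 10 ** digits - 1
    return random.randint(lo, hi), random.randint(lo, hi)

def accuracy_at_digits(ask: Callable[[str], str], digits: int, trials: int = 20) -> float:
    """Fraction of n-digit multiplications the model answers exactly right."""
    correct = 0
    for _ in range(trials):
        a, b = make_problem(digits)
        reply = ask(f"What is {a} * {b}? Reply with only the number.")
        try:
            # Python's arbitrary-precision integers give the exact ground truth.
            correct += int(reply.strip().replace(",", "")) == a * b
        except ValueError:
            pass  # an unparseable reply counts as wrong
    return correct / trials

if __name__ == "__main__":
    # Stand-in "model" so the harness runs end to end; swap in a call to a
    # real LLM API to reproduce the kind of test described above.
    def perfect_calculator(prompt: str) -> str:
        a, b = prompt.split("What is ")[1].split("?")[0].split(" * ")
        return str(int(a) * int(b))

    for n in range(2, 12):
        print(f"{n}-digit accuracy: {accuracy_at_digits(perfect_calculator, n):.2f}")
```

With a real model plugged in, the pattern the researchers describe would show up as accuracy collapsing once the digit count exceeds the sizes seen in training, even though the underlying procedure is identical at every size.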
As LLMs become more powerful, their impressive performance on challenging benchmarks can create the perception that they can "think" at advanced levels. It's tempting to rely on them to solve novel problems or even make decisions. However, the fact that even o1 struggles to reliably solve large-digit multiplication problems indicates that LLMs still face challenges when asked to generalize to new tasks or unfamiliar domains.
Why is it important to study how these LLMs think?
Companies like OpenAI haven't fully disclosed the details of how their models are trained or the data they use. Understanding how these AI models operate allows researchers to identify their strengths and limitations, which is essential for improving them. Moreover, knowing these limitations helps us understand which tasks are best suited for LLMs and where human expertise is still crucial.
This series is produced for the media, and its purpose is to share the expertise of UWaterloo researchers. To reach this researcher, please contact Media Relations.