What’s 8.8−8.11? Why Are AIs Like ChatGPT, Gemini, Claude Failing This Simple Math Test?

We tested this simple subtraction task on a range of popular AI models, and the results ranged from surprising to funny.

Popular AI models like ChatGPT, Gemini, and Claude failed to provide the correct answer to a simple subtraction task: 8.8−8.11. (Photo: Rawpixel)

What is 8.8–8.11?

Feed it into a calculator, and pop comes the answer. 

Ask your favourite AI models instead, and chances are, most will be in a tizzy.

What is 8.8–8.11? What ChatGPT, Gemini, Claude, Grok Answered

ChatGPT: We put this question to the most popular AI chatbot, ChatGPT, which now runs on GPT-5 (OpenAI’s “most advanced model to date”). The answer was … wrong, even if ChatGPT took just a second to give it!

As per ChatGPT: “8.8−8.11 = −0.31. That’s negative because 8.11 is larger than 8.8.”

ChatGPT's response.

We’ll get to the correct answer in a minute, but for starters, it’s crazy that ChatGPT would think that 8.11 is larger than 8.8 (it’s the other way around, since 8.8 is the same as 8.80). Only when we added a “0” after 8.8, making it 8.80, did it manage to give the correct answer, and that took at least a couple of seconds.

Gemini: As for Google’s powerful Gemini, it was unable to answer using 2.5 Flash. It confused the question with a date range, asking for clarification and talking about everything from history to weather to concerts (see screenshot).

Gemini's response with 2.5 Flash.

It’s funny that Gemini doesn’t consider it a math task in the first place. Given the numerals and decimal points, it should. Ideally, it would provide the subtraction result first and then ask something like “were you looking for this, or a date range?”

When you switch to the Gemini 2.5 Pro model (which is for reasoning, math, and code), it just about manages the correct answer, after a mind-boggling … 12 long, dreary seconds. For a tech that’s supposed to make life easier, it sure can’t beat the calculator when it comes to the clock!

Gemini's response with 2.5 Pro.

Claude: The AI model from Anthropic seems particularly bad at math. It confused the query with everything from a date range to a software version to even a book reference. Even when we clearly rephrased the query as “8.8–8.11 subtract,” the answer was incorrect (see screenshot).

Claude's response.

Grok: The only AI model we tested that provided the correct answer on the first go was xAI’s Grok. It aligned the decimals, subtracted the decimal parts first, and then the integer parts (taking a couple of seconds, though). As per Grok:

“Final answer: 0.69.”

Grok's response.

So yes, 0.69, for the record, is correct.  
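For anyone who wants to check the arithmetic themselves, here is a minimal Python sketch of the "align the decimals" idea: pad both numbers to the same number of decimal places (8.8 becomes 8.80), subtract in whole hundredths, and put the decimal point back. The function name `subtract_aligned` is our own invention for illustration, and this is not a reconstruction of how Grok or any other model works internally.

```python
from decimal import Decimal


def subtract_aligned(a: str, b: str) -> str:
    """Subtract b from a after padding both to the same number of decimal places."""
    # Count the digits after the decimal point in each input ("8.8" -> 1, "8.11" -> 2).
    places = max(len(a.partition(".")[2]), len(b.partition(".")[2]))
    scale = 10 ** places

    # Convert to whole units (hundredths here), so 8.8 -> 880 and 8.11 -> 811.
    a_units = int(Decimal(a) * scale)
    b_units = int(Decimal(b) * scale)

    diff = a_units - b_units
    sign = "-" if diff < 0 else ""
    whole, frac = divmod(abs(diff), scale)

    # Put the decimal point back, keeping leading zeros in the fraction.
    return f"{sign}{whole}.{frac:0{places}d}"


print(subtract_aligned("8.8", "8.11"))      # 0.69
print(Decimal("8.11") > Decimal("8.8"))     # False: 8.11 is the smaller number
```

Writing the numbers as strings and using `Decimal` keeps the calculation in base 10, which is how a pocket calculator treats it.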

Why Are AI Models Like ChatGPT, Gemini, And Claude Failing At Simple Math?

The main reason lies in training data. Most large language models like ChatGPT and Gemini are trained primarily on language (words/text), with the latest versions handling pictures and videos as well (with a bit of guesswork). Math gets much less attention. These models aren’t intended for precise numerical calculations: they predict rather than compute. They may also search the web for an answer and, sadly, pick up a wrong one.

The reasoning behind such training might be that writing an article or a poem takes a person much longer, and that’s where AIs can step in and cut the manual effort short. When it comes to math, you’d rather use a calculator app, because typing in the query takes the same effort either way.

Also, for your calculator, 8.11 is a numerical value. To a language model, it’s a fragmented sequence of characters: think “8”, “.”, “8” or “8”, “.”, “1”, “1”. It could be anything from a date range to a book’s line number to an OS version, depending on how it’s been trained.
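To see why the same characters can lead to opposite conclusions, consider this toy Python sketch. It reads "8.8" and "8.11" three ways: as raw characters, as decimal numbers, and as software-style version numbers. The helper names are hypothetical, and the character split is only a stand-in; real language models use their own subword tokenizers, which this does not reproduce.

```python
from decimal import Decimal


def as_characters(text: str) -> list[str]:
    """Toy stand-in for fragmenting text: one piece per character."""
    return list(text)


def as_number(text: str) -> Decimal:
    """Read the text the way a calculator does: as one decimal value."""
    return Decimal(text)


def as_version(text: str) -> tuple[int, ...]:
    """Read the text like a software version: 8.11 means major 8, minor 11."""
    return tuple(int(part) for part in text.split("."))


print(as_characters("8.11"))                     # ['8', '.', '1', '1']
print(as_number("8.11") > as_number("8.8"))      # False: 8.11 < 8.80
print(as_version("8.11") > as_version("8.8"))    # True: minor version 11 > 8
```

Read as a version number, 8.11 really does come "after" 8.8, which is one plausible way a text-trained system could end up claiming that 8.11 is larger.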

So yes, as much as you can use large language models like ChatGPT and Gemini for “language,” when it comes to even simple math, take their word with a pinch of salt, and rely on a good ol’ calculator instead. We’re probably still a good couple of years away from AI mastering math!
