Claude Opus 4.5: Anthropic's Latest AI Model Beats Google Gemini 3, OpenAI's GPT 5.1 In This Key Metric

Claude Opus 4.5 has achieved an unprecedented score of 80.9% on the SWE-bench Verified test, a benchmark that evaluates real-world software engineering skills.

NDTV Profit Tech

25 Nov 2025, 06:27 PM IST

Anthropic announced the release of Claude Opus 4.5, its latest AI model designed for enhanced coding, computer use and handling of complex tasks for enterprise users. The startup emphasises the model’s strength in managing sophisticated workplace challenges.

"It's intelligent, efficient and the best model in the world for coding, agents and computer use," Anthropic said in a blog post. "It's also meaningfully better at everyday tasks like deep research and working with slides and spreadsheets."

"Opus 4.5 is a step forward in what AI systems can do, and a preview of larger changes to how work gets done," Anthropic said.

The company had rolled out Claude Sonnet 4.5 in late September and followed up with Claude Haiku 4.5 in October.

Claude Opus 4.5 has achieved an unprecedented score of 80.9% on the SWE-bench Verified test, a benchmark that evaluates real-world software engineering skills. This milestone makes it the first model to surpass the 80% threshold. In comparison, Google's Gemini 3 Pro scored 76.2%, while OpenAI’s GPT-5.1 Codex Max achieved 77.9%.

According to Anthropic, the new model scored higher than any human candidate on the company's two-hour take-home engineering assessment, which evaluates practical coding and problem-solving abilities.

"The take-home test is designed to assess technical ability and judgment under time pressure. It doesn’t test for other crucial skills candidates may possess, like collaboration, communication, or the instincts that develop over the years. But this result — where an AI model outperforms strong candidates on important technical skills — raises questions about how AI will change engineering as a profession," the company said.

Anthropic claims that its latest AI model outperforms competitors on the Tau2-bench, a benchmark designed to evaluate agents handling real-world, multi-turn tasks. In one test scenario, the model acts as an airline service representative, and the benchmark expects it to refuse a change to a basic economy booking because airline policy prohibits modifications to that fare class.

“Instead, Opus 4.5 found an insightful (and legitimate) way to solve the problem: upgrade the cabin first, then modify the flights,” Anthropic said.

Designed for reliable long-form content creation, Opus 4.5 can generate narrative chapters spanning 10 to 15 pages while maintaining consistency. It excels in sophisticated 3D reasoning exercises, offering richer and more precise spatial scene descriptions than before, the company said.
