Date: 2025-11-26

The black bar tracks the Mensa Norway test, and the orange bar captures an offline IQ-style evaluation.

Just below Grok sits Google’s Gemini 3 Pro Preview, edging ahead of many rivals with 123 offline and a remarkable 142 on Mensa, making it the closest challenger in the race.

OpenAI’s new GPT-5.1 family follows closely behind, with the Pro version crossing 118 offline and 144 Mensa, while the Thinking and Vision models show impressive consistency in the 95–120 band.

Claude, often praised for reasoning, holds strong with its 4.5 Opus and Sonnet models, scoring 114–124 offline and 121–124 on Mensa, but this time it’s Grok that clearly steals the spotlight.

Further down the leaderboard, the numbers tighten. Models like Perplexity, DeepSeek R1, Mistral Medium 3.1, and Llama 4 Maverick cluster around the 88–99 offline and 95–107 Mensa range: strong performers, but a tier below the leaders.

And at the base of the table sit earlier-generation models like GPT-4o, Grok-4.1 Beta, and Llama 4 Vision, mostly in the 60–70 offline and 67–96 Mensa window, reminders of just how fast the frontier is moving.