Grok Outsmarts The Field: Mensa Scores Put Elon Musk's AI Ahead Of GPT And Gemini
In the rapidly shifting hierarchy of AI intelligence benchmarks, Grok has staged one of the most intriguing climbs of the year. In the latest IQ test by AItracking.org, Grok-4 Expert Mode dominated the leaderboard.
At the very top of the rankings, Grok-4 Expert Mode plants its flag with a 126 score offline and a 136 on Mensa Norway. The Vision variant also delivers a punch with 74 offline and 96 on Mensa Norway, showcasing how multimodal capability, while difficult, is steadily improving.
The black bar tracks the Mensa Norway test, and the orange captures an offline IQ-style evaluation.
Just below Grok sits Google’s Gemini 3 Pro Preview, edging ahead of many rivals with 123 offline and a remarkable 142 Mensa score, making it the closest challenger in the race.
OpenAI’s new GPT-5.1 family follows closely behind, with the Pro version crossing 118 offline and 144 Mensa, while the Thinking and Vision models show impressive consistency in the 95–120 band.
Claude, often praised for reasoning, holds strong with its 4.5 Opus and Sonnet models, which score 114–124 offline and 121–124 on Mensa, but this time it’s Grok that clearly steals the spotlight.
Further down the leaderboard, the numbers tighten. Models like Perplexity, DeepSeek R1, Mistral Medium 3.1, and Llama 4 Maverick cluster around the 88–99 offline and 95–107 Mensa range: strong performers, but a tier below the leaders.
And at the base of the table sit earlier-generation models like GPT-4o, Grok-4.1 Beta, and Llama 4 Vision, mostly in the 60–70 offline and 67–96 Mensa window, reminders of just how fast the frontier is moving.