OpenAI has acknowledged an unusual issue with its AI tools, which had begun referencing goblins in their responses seemingly at random. The company's acknowledgement came within 24 hours of reports that it had explicitly instructed its Codex AI tool not to mention such creatures.
According to Wired, the instructions, which govern Codex's coding output, include multiple reminders not to insert random references to a variety of creatures, spanning both folklore and the real world.
“Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query,” read instructions in Codex CLI, OpenAI's coding agent.
In a blog post released on Thursday, the company said it had identified a pattern in which ChatGPT and other GPT-5-powered tools were increasingly drawing on imagery involving goblins and gremlins when crafting metaphors.
“Starting with GPT‑5.1, our models began developing a strange habit: they increasingly mentioned goblins, gremlins, and other creatures in their metaphors... A single ‘little goblin’ in an answer could be harmless, even charming. Across model generations, though, the habit became hard to miss: the goblins kept multiplying, and we needed to figure out where they came from,” OpenAI wrote.
OpenAI traced the unusual fixation on goblins to a short-lived “Nerdy personality” setting once available in ChatGPT.
In building that mode, the company said it had encouraged the model to lean into imaginative, mythology-inspired metaphors. However, even after the feature was withdrawn, the system appeared to retain a lingering preference for references to goblins, gremlins and other fictional beings.
“The short answer is that model behaviour is shaped by many small incentives. In this case, one of those incentives came from training the model for the personality customisation feature, in particular, the Nerdy personality. We unknowingly gave particularly high rewards for metaphors with creatures. From there, the goblins spread,” OpenAI said.
The company noted that the pattern first stood out clearly after GPT-5.1 went live in November, although its origins could date back further. User feedback highlighting an oddly casual, overfriendly tone prompted an internal probe into the model's linguistic habits.
A safety researcher, having encountered several mentions of “goblins” and “gremlins”, recommended including them in the analysis. The results revealed a 175% jump in the use of “goblin” and a 52% increase in “gremlin” following the launch.
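For readers curious how such a percentage is arrived at, the arithmetic can be sketched roughly as follows. This is only an illustration, not OpenAI's method: the function names, the per-1,000-words rate and the sample figures are all assumptions made for the example, and the company has not described its analysis at this level of detail.

```python
import re


def rate_per_thousand(responses, word):
    """Occurrences of `word` (or its simple plural) per 1,000 words.

    Hypothetical helper: `responses` is a list of response strings.
    """
    pattern = re.compile(rf"\b{re.escape(word)}s?\b", re.IGNORECASE)
    total_words = sum(len(r.split()) for r in responses)
    hits = sum(len(pattern.findall(r)) for r in responses)
    return 1000 * hits / total_words if total_words else 0.0


def percent_change(before, after):
    """Relative change from `before` to `after`, expressed in percent."""
    return (after - before) / before * 100


# Made-up rates: 4 mentions per 1,000 words before launch, 11 after.
print(percent_change(4, 11))  # 175.0
```

On these made-up figures, a 175% jump means the word appears 2.75 times as often as before, not 1.75 times.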
“We retired the ‘Nerdy’ personality in March after launching GPT‑5.4. In training, we removed the goblin-affine reward signal and filtered training data containing creature-words, making goblins less likely to over-appear or show up in inappropriate contexts,” OpenAI wrote.
