Great news for xAI: Grok is now pretty good at answering questions about Baldur’s Gate
Different AI labs have different priorities. OpenAI has traditionally focused on consumer users, for instance, while its rival Anthropic tends to target enterprises. Elon Musk’s xAI, we discovered recently, has been placing particular emphasis on video-game walkthroughs.
On Friday, Business Insider’s Grace Kay published a detailed and far-reaching report about xAI, the AI startup recently acquired by SpaceX, focusing on how Musk is making life difficult for employees. But one anecdote stood out:
In one instance last year, a model release was delayed for several days because Musk was dissatisfied with how the chatbot answered detailed questions about the video game “Baldur’s Gate,” according to people familiar with the matter. High-level engineers were pulled from other projects to improve the responses before launch, they said.
Of course, you can imagine the frustration of any respected and experienced engineer who shows up to work thinking he’ll be tackling fundamental problems of knowledge and machine intelligence, only to be sidetracked into helping a 54-year-old man beat his video game. But the anecdote raises an even more pressing question: Did Musk end up getting the gaming skills he wanted?
To answer that question, our resident RPG enthusiast Ram Iyer put together a set of five general questions about Baldur’s Gate, which we ran against Grok and the three other major models in a kind of quasi-benchmark that I’ve decided to call “BaldurBench.”
In the interest of journalistic transparency, I’ve made all the chat transcripts public, so you can see them here: Grok, ChatGPT, Claude, and Gemini.
First, the good news: Grok actually gives pretty good information. Its responses were a bit dense with gamer jargon — “save-scumming” instead of saving and “DPS” instead of damage — but the answers were both useful and well-informed, provided you knew what it was talking about. Grok also really loves tables and theorycraft, which is about what you would expect.
There are lots of Baldur’s Gate guides out there and the models were generally drawing from the same ones, so the biggest differences were stylistic. ChatGPT prefers bulleted lists and sentence fragments, while Gemini loves to bold important words.
The biggest surprise was Claude, which was particularly concerned about giving me information that would spoil my experience of the game. When I asked about good party compositions, it closed the guidance by saying, “Don’t stress too much and just play what sounds fun to you.” Thanks, Claude!
It’s important to bear in mind that, thanks to Business Insider’s reporting, we know this is a subject area where xAI specifically worked to reach parity. So we shouldn’t read too much into the fact that, after the reported sprint, Grok’s advice turned out about the same as the other models’. Still, it’s nice to know xAI can make it work if it tries.
