Google’s latest AI agent is learning how to navigate a familiar space: gaming.
The tech giant released new research on its Scalable Instructable Multiworld Agent, or SIMA, on Wednesday. This agent can follow instructions to carry out tasks in video games — and play games it has never seen before.
But like Genie, the model that DeepMind, Google’s AI research arm, described in a research paper published Feb. 23, SIMA remains a research project.
“We could in the future have agents like SIMA playing alongside you,” said Tim Harley, a research engineer at DeepMind who co-led the project. “Agents that are cooperative that you can talk to and instruct to do various things in the game with you on the fly.”
DeepMind says its interest in video games is in part because they are a good training ground for AI systems. The AI company hopes research like this enables it to “understand how AI systems may become more helpful.”
Since OpenAI released ChatGPT in November 2022, the marketplace has been flooded with generative AI tools from Microsoft, Google, Adobe, Meta and Anthropic. More recently, generative AI has expanded beyond writing to include imagery, video, music and, of course, gaming as tech companies seek to distinguish their offerings in the burgeoning space.
Research goals
According to Harley, SIMA is trained to do as it is told, which doesn’t necessarily mean winning.
The researchers’ main questions at the outset were whether an AI agent could transfer skills between games and how it would behave in a game it had never played before.
“Those goals come in open-ended freeform natural language from some human user and then [SIMA] acts in these video game environments just using the natural interface to the game,” Harley said. “And the only way the agent can observe these games is just from the screen in real time.”
Training
Researchers recorded on-screen images along with the keyboard and mouse inputs of human players, then used imitation learning techniques to teach SIMA to play games such as No Man’s Sky, Eco, Teardown and Goat Simulator the way humans would.
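To make the idea of imitation learning concrete, here is a toy sketch in Python. It is not DeepMind's method: SIMA learns from raw screen pixels with neural networks, while this example simply counts which action human players chose in each situation and replays the most common one. The names `Demo`, `train_policy` and `act`, the text labels standing in for screen frames, and the action strings are all hypothetical and exist only for illustration.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Demo:
    """One recorded step of human play (hypothetical format)."""
    instruction: str  # natural-language goal given to the player
    screen: str       # text label standing in for a screen frame
    action: str       # keyboard or mouse input the human produced


def train_policy(demos):
    """Imitation by counting: tally the actions humans took in each situation."""
    counts = defaultdict(Counter)
    for d in demos:
        counts[(d.instruction, d.screen)][d.action] += 1
    return counts


def act(policy, instruction, screen, default="noop"):
    """Return the action humans chose most often here, or a default if unseen."""
    seen = policy.get((instruction, screen))
    if not seen:
        return default
    return seen.most_common(1)[0][0]


# Made-up demonstrations, for illustration only.
demos = [
    Demo("climb the ladder", "ladder_ahead", "key:W"),
    Demo("climb the ladder", "ladder_ahead", "key:W"),
    Demo("climb the ladder", "ladder_ahead", "key:SPACE"),
    Demo("climb the ladder", "ladder_left", "mouse:left"),
    Demo("open the map", "any", "key:M"),
]

policy = train_policy(demos)
print(act(policy, "climb the ladder", "ladder_ahead"))  # key:W
print(act(policy, "open the map", "any"))               # key:M
print(act(policy, "chop down a tree", "forest"))        # noop
```

The last call shows the limit of a lookup table: it has no answer for a situation it never saw. A learned model is expected to generalize across similar situations instead, which is the kind of transfer between games the researchers set out to test.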
They evaluated the agent on 600 skills, including navigation (like “turn left”), object interaction (“climb the ladder”) and menu use (“open the map”), and found that SIMA performs better than specialist agents trained on a single game.
“[It’s] able to take advantage of the shared concepts between games, to learn better skills and to learn to be better at carrying out those instructions,” said Frederic Besse, a research engineer at DeepMind. “Seeing positive transfer between games is a key milestone for research.”
But SIMA isn’t perfect.
“All the errors we see are around more of the fine-grained understanding,” Harley said. “So if we ask an agent to chop down a tree in the game of Valheim, it’ll go and chop down a tree, but we can’t precisely specify which one.”
He’s reluctant to call SIMA’s imperfections “hallucinations.”
“Often what we see when the agent fails … I wouldn’t call them hallucinations, its behavior does look intentional a lot of the time, but it fails to execute the necessary behavior,” he added.
‘A great training ground’
From here, DeepMind hopes to improve SIMA’s performance, including by enabling its agents to follow more detailed instructions, and ultimately to develop AI systems “that can act in as many environments as possible and achieve a variety of goals as well as converse with the user,” Besse said.
But it’s not just about human-agent communication in gaming.
“We believe that games and simulation in general provide a great training ground for AI systems,” Besse said.
That’s in part because games approximate the real world. They vary widely in setting, mechanics and graphical style. But they also share common themes, like navigating complicated spaces and interacting with objects, characters and players.