Google DeepMind's AI Taught Itself to Play Minecraft

An artificial intelligence (AI) system has for the first time figured out how to collect diamonds in the hugely popular video game Minecraft, a difficult task requiring multiple steps, without being shown how to play. Its creators say the system, called Dreamer, is a step towards machines that can generalize knowledge learned in one domain to new situations, a major goal of AI.

“Dreamer marks a significant step towards general AI systems,” says Danijar Hafner, a computer scientist at Google DeepMind in San Francisco, California. “It allows AI to understand its physical environment and also to self-improve over time, without a human having to tell it exactly what to do.” Hafner and his colleagues describe Dreamer in a study in Nature published on 2 April.

In Minecraft, players explore a virtual 3D world containing a variety of terrains, including forests, mountains, deserts and swamps. Players use the world’s resources to create objects, such as chests, fences and swords—and collect items, among the most prized of which are diamonds.

Importantly, says Hafner, no two experiences are the same. “Every time you play Minecraft, it’s a new, randomly generated world,” he says. This makes it useful for challenging an AI system that researchers want to be able to generalize from one situation to the next. “You have to really understand what’s in front of you; you can’t just memorize a specific strategy,” he says.

Collecting a diamond is “a very hard task,” says computer scientist Jeff Clune at the University of British Columbia in Vancouver, Canada, who was part of a separate team that trained a program to find diamonds using videos of human play. “There is no question this represents a major step forward for the field.”

Diamonds are forever

AI researchers have focused on finding diamonds, says Hafner, because it requires a series of complicated steps, including finding trees and breaking them down to gather wood, which players can use to build a crafting table.

This, together with more wood, can be used to make a wooden pickaxe—and so on, until players have assembled the correct tools to collect a diamond, which is buried deep underground. “There’s a long chain of these milestones, and so, it requires very deep exploration,” he says.

Previous attempts to get AI systems to collect diamonds relied on using videos of human play or researchers leading systems through the steps.

By contrast, Dreamer explores everything about the game on its own, using a trial-and-error technique called reinforcement learning—it identifies actions that are likely to beget rewards, repeats them and discards others. Reinforcement learning underpins some major advances in AI. But previous programs were specialists—they could not apply knowledge in new domains from scratch.
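The trial-and-error loop described above can be illustrated with a deliberately tiny example. The sketch below is not Dreamer's algorithm; it is a minimal epsilon-greedy learner on a hypothetical three-option task, where the function name `run_bandit`, the reward probabilities and the exploration rate are all assumptions chosen for illustration. It shows only the basic idea: try actions, track which ones pay off, and repeat the better ones more often.

```python
import random


def run_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Toy trial-and-error learner (illustrative only, not Dreamer).

    reward_probs[i] is the hidden chance that action i yields a reward.
    Returns the learned value estimates and how often each action was tried.
    """
    rng = random.Random(seed)
    n = len(reward_probs)
    counts = [0] * n      # times each action was tried
    values = [0.0] * n    # running average reward per action

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: occasionally try a random action.
            action = rng.randrange(n)
        else:
            # Exploit: repeat the action that has paid off best so far.
            action = max(range(n), key=lambda a: values[a])

        reward = 1.0 if rng.random() < reward_probs[action] else 0.0

        # Incremental update of the average reward for this action.
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]

    return values, counts


if __name__ == "__main__":
    values, counts = run_bandit([0.1, 0.8, 0.3])
    print("estimated values:", [round(v, 2) for v in values])
    print("times tried:", counts)
```

In this toy setting the learner ends up choosing the most rewarding action far more often than the others, which is the "repeat what works, discard the rest" behaviour in miniature.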

Build me a world model

Key to Dreamer’s success, says Hafner, is that it builds a model of its surroundings and uses this ‘world model’ to ‘imagine’ future scenarios and guide decision-making. Rather like our own abstract thoughts, the world model is not an exact replica of its surroundings. But it allows the Dreamer agent to try things out and predict the potential rewards of different actions using less computation than would be needed to complete those actions in Minecraft. “The world model really equips the AI system with the ability to imagine the future,” says Hafner.
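To make the idea of "imagining" with a world model concrete, here is a minimal sketch under strong simplifying assumptions. It does not reproduce Dreamer's architecture: the model is a plain lookup table, and the environment is a hypothetical six-cell corridor with a reward at the far end. All names (`real_step`, `learn_model`, `imagine_return`, `plan`) are invented for illustration. The point is that, once a model has been learned from experience, candidate actions can be scored through imagined rollouts without touching the real environment.

```python
import random

GOAL = 5  # hypothetical corridor with cells 0..5; reward when the agent lands on cell 5


def real_step(state, action):
    """The 'real' environment. action is -1 (left) or +1 (right)."""
    nxt = min(max(state + action, 0), GOAL)
    return nxt, (1.0 if nxt == GOAL else 0.0)


def learn_model(num_samples=500, seed=0):
    """Build a lookup-table world model from random real experience."""
    rng = random.Random(seed)
    model = {}
    for _ in range(num_samples):
        state = rng.randrange(GOAL + 1)
        action = rng.choice((-1, 1))
        model[(state, action)] = real_step(state, action)
    return model


def imagine_return(model, state, first_action, horizon, rng):
    """Roll out one imagined trajectory using only the model."""
    total, action = 0.0, first_action
    for _ in range(horizon):
        if (state, action) not in model:
            break  # the model has never seen this situation
        state, reward = model[(state, action)]
        total += reward
        action = rng.choice((-1, 1))  # random follow-up actions
    return total


def plan(model, state, horizon=8, rollouts=200, seed=1):
    """Pick the first action whose imagined rollouts score best on average."""
    rng = random.Random(seed)
    scores = {}
    for action in (-1, 1):
        returns = [imagine_return(model, state, action, horizon, rng)
                   for _ in range(rollouts)]
        scores[action] = sum(returns) / rollouts
    return max(scores, key=scores.get)


if __name__ == "__main__":
    model = learn_model()
    print("best imagined action from cell 4:", plan(model, 4))
```

The real environment is queried only while collecting experience in `learn_model`; all the decision-making in `plan` happens inside the model, which is the property that makes imagination cheaper than acting.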

This ability could also help to create robots that can learn to interact in the real world—where the costs of trial and error are much higher than in a video game, says Hafner.

Testing Dreamer on the diamond challenge was an afterthought. “We built this whole algorithm without that in mind,” says Hafner. But it occurred to the team that it was the ideal way to test whether its algorithm could work, out of the box, on an unfamiliar task.

In Minecraft, the team used a protocol that gave Dreamer a ‘plus one’ reward every time it completed one of 12 progressive steps involved in diamond collection—including creating planks and a furnace, mining iron and forging an iron pickaxe.

These intermediate rewards prompted Dreamer to select actions that were more likely to lead to a diamond. The team reset the game every 30 minutes so that Dreamer did not become accustomed to one particular configuration—but rather learnt general rules for gaining rewards.
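The reward protocol described above can be sketched as a small bookkeeping class. This is an illustration under stated assumptions, not the team's code: the class name `MilestoneReward`, the inventory-dictionary interface and the choice to pay each milestone only once per episode are all assumptions. The article names only some of the 12 steps (planks, crafting table, wooden pickaxe, furnace, iron, iron pickaxe, diamond); the remaining entries in the list are placeholders.

```python
# Illustrative milestone list. Only some entries are named in the article;
# the others are placeholders assumed for this sketch.
MILESTONES = [
    "log", "planks", "stick", "crafting_table", "wooden_pickaxe",
    "cobblestone", "stone_pickaxe", "iron_ore", "furnace",
    "iron_ingot", "iron_pickaxe", "diamond",
]


class MilestoneReward:
    """Gives +1 the first time each milestone item appears in an episode."""

    def __init__(self, milestones=MILESTONES):
        self.milestones = list(milestones)
        self.reached = set()

    def reset(self):
        """Call when the game is reset, so milestones can be earned again."""
        self.reached.clear()

    def step(self, inventory):
        """inventory maps item name -> count. Returns the reward for this step."""
        reward = 0.0
        for item in self.milestones:
            if item not in self.reached and inventory.get(item, 0) > 0:
                self.reached.add(item)
                reward += 1.0
        return reward


if __name__ == "__main__":
    tracker = MilestoneReward()
    print(tracker.step({"log": 1}))               # first log collected
    print(tracker.step({"log": 3}))               # more logs, nothing new
    print(tracker.step({"log": 3, "planks": 4}))  # planks are a new milestone
```

Calling `reset()` at each game reset mirrors the periodic resets described in the article, so the agent is rewarded for rediscovering the chain in each fresh world rather than for holding on to one inventory.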

Under this set-up, it takes around nine days of continuous play for Dreamer to find at least one diamond, says Hafner. Expert human players will take 20–30 minutes to find a diamond, whereas novices take longer.

“This paper is about training a single algorithm to perform well across diverse reinforcement-learning tasks,” says computer scientist Keyon Vafa at Harvard University in Boston, Massachusetts. “This is a notoriously hard problem and the results are fantastic.”

An even bigger target for AI, says Clune, is the ultimate challenge for Minecraft players: killing the Ender Dragon, the virtual world’s most fearsome creature.

This article is reproduced with permission and was first published on April 2, 2025.
