An AI has learned to deceive human opponents in the war-themed board game Stratego, which involves imperfect information and a huge number of possible game scenarios.
An AI can defeat expert human players in the board game Stratego, which has more possible game scenarios than chess, Go or poker.
The AI developed by the UK-based company DeepMind became one of the top-ranked online players of the Napoleonic-themed board game Stratego by learning to bluff with weaker pieces and sacrifice important pieces for the sake of victory.
“To us the most surprising behaviour was [the AI’s] ability to sacrifice valuable pieces to gain information about the opponent’s set-up and strategy,” says Julien Perolat at DeepMind.
In Stratego, two players each try to capture the opponent’s flag, which is hidden among an array of 40 game pieces. Most pieces are soldiers ranked from one to 10, with higher-ranked soldiers defeating lower-ranked ones when they meet on the board. But players cannot see the identities of their opponent’s pieces unless two pieces from opposing armies encounter one another – unlike games such as chess or Go, where both players can see everything.
Complicating this challenge is the fact that Stratego is an enormously complex game with 10^535 possible game situations. By comparison, the game of Go has 10^360 possible game states. Chess and poker have even fewer.
Perolat and his colleagues at DeepMind developed their “DeepNash” AI to conquer Stratego by having it play against itself over the course of 5.5 billion games, with a simulated training time roughly equivalent to hundreds of years. Unlike DeepMind’s StarCraft-playing AI, however, DeepNash didn’t rely on any knowledge of human strategies specific to the game. Nor did it train to play against specific opponents.
Instead of trying to play by searching all the possible game scenarios, which would be computationally impossible, the DeepNash AI has an algorithm that continually steers its behaviour toward an optimal strategy informed by economic game theory, says Karl Tuyls at DeepMind. The optimal strategy is one that would guarantee at least a 50 per cent win rate against a perfect opponent, even if the opponent knew exactly what the AI planned to do.
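To make that guarantee concrete, here is a minimal sketch using rock-paper-scissors rather than Stratego. It is not DeepMind's algorithm: it swaps in regret matching, a textbook self-play method, and the names PAYOFF, worst_case_value and regret_matching_self_play are hypothetical ones chosen for illustration. Payoffs are scored +1 for a win, 0 for a draw and -1 for a loss, so a worst-case value of 0 plays the role of the 50 per cent win rate described above.

```python
# Illustrative sketch only: rock-paper-scissors, not Stratego, and
# regret matching rather than the method used in DeepNash.

# Payoff to "us" for (our action, opponent action): +1 win, 0 draw, -1 loss.
# Action order: 0 = rock, 1 = paper, 2 = scissors.
PAYOFF = [
    [0, -1, 1],   # rock     vs rock, paper, scissors
    [1, 0, -1],   # paper    vs rock, paper, scissors
    [-1, 1, 0],   # scissors vs rock, paper, scissors
]


def worst_case_value(strategy):
    """Expected payoff if the opponent knows our mixed strategy and
    picks the single reply that hurts us most."""
    return min(
        sum(p * PAYOFF[a][b] for a, p in enumerate(strategy))
        for b in range(3)
    )


def regret_matching_self_play(iterations=20000):
    """Play against a copy of ourselves and return the average strategy."""
    regrets = [0.0, 0.0, 0.0]
    strategy_sum = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        positive = [max(r, 0.0) for r in regrets]
        total = sum(positive)
        if total > 0:
            # Play each action in proportion to its positive regret.
            strategy = [p / total for p in positive]
        else:
            # Deliberately poor opening guess: always play rock.
            strategy = [1.0, 0.0, 0.0]
        # Expected payoff of each pure action against a copy of this strategy.
        action_values = [
            sum(q * PAYOFF[a][b] for b, q in enumerate(strategy))
            for a in range(3)
        ]
        expected = sum(p * v for p, v in zip(strategy, action_values))
        for a in range(3):
            # Regret: how much better action a would have done than our mix.
            regrets[a] += action_values[a] - expected
            strategy_sum[a] += strategy[a]
    return [s / iterations for s in strategy_sum]


if __name__ == "__main__":
    print(worst_case_value([1 / 3, 1 / 3, 1 / 3]))
    print(worst_case_value([0.5, 0.25, 0.25]))
    print(regret_matching_self_play())
```

An even mix of rock, paper and scissors has a worst-case value of 0, so it cannot be beaten on average even by an opponent who knows the plan. A player who favours rock half the time has a worst-case value of -0.25, because an informed opponent simply plays paper. Starting from the poor "always rock" guess, the self-play loop's average strategy drifts toward the even mix, which is the flavour of "steering toward an optimal strategy" that the researchers describe, on a vastly smaller game.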
The result is an AI capable of making winning decisions despite hidden information about its opponents, a huge number of possible game states and many different possible actions that can be taken during each turn. “This is a new thing that we couldn’t really do before,” says Julian Togelius at New York University.
DeepNash has already dominated both human and AI adversaries. It achieved an 84 per cent win rate during 50 ranked matches against expert human players through an online games platform and became one of the top three players – without human opponents realising they were playing an AI.
The DeepMind AI also notched a 97 per cent win rate against top Stratego-playing bots, including several that had previously won the Computer Stratego World Championship.
“Good players tend to memorise the opponent’s pieces and predict their deployment patterns,” says Georgios Yannakakis at the University of Malta. “DeepNash does both well – likely with a competitive advantage with regards to memory – and plays in interesting and unpredictable manners, showcasing elements of bluffing.”
The DeepNash game theory approach could prove useful in non-game situations where AIs must deal with other intelligent actors, such as in business and defence, says Tuomas Sandholm at Carnegie Mellon University in Pennsylvania.
Journal reference: Science, DOI: 10.1126/science.add4679