AI is learning to play Diplomacy, and it’s pretty good at it
AI is learning to play Diplomacy: Alphabet's AI lab DeepMind is training AI systems to play the strategy board game Diplomacy, which requires a mixture of competition and collaboration to win, according to IEEE Spectrum.
Turns out Diplomacy is much more complex than chess: Whereas AI has previously been successfully trained to play purely competitive games such as chess and the Chinese board game Go, developing software that can manage a seven-player game where everyone moves simultaneously is "a qualitatively different problem," according to DeepMind computer scientist Andrea Tacchetti. While in the full version of Diplomacy players can verbally negotiate, DeepMind is focused on training AI to play the "no press" variant of the game, which disables explicit communication.
How does it work? DeepMind refined DipNet, a neural network that had been fed data from 150k human games, using a tweaked reinforcement learning algorithm. Instead of using pure reinforcement learning (which would require the AI to work through a ridiculous number of possible moves each turn), the researchers calculated the move that works best on average against sampled likely moves of the opponents, and trained the AI to prefer those moves, letting it act on what it learned without any sampling after training.
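The "best on average against sampled opponent moves" idea can be sketched in a few lines. This is a toy illustration of the concept, not DeepMind's actual algorithm or code; the function names, the integer "moves," and the value function are all invented for the example.

```python
import random

def best_on_average(candidate_moves, sample_opponents, value, n_samples=100):
    """Return the candidate move with the highest mean value against
    n_samples sampled opponent move profiles (a stand-in for a learned
    model of likely opponent moves)."""
    samples = [sample_opponents() for _ in range(n_samples)]
    return max(candidate_moves,
               key=lambda move: sum(value(move, s) for s in samples) / n_samples)

# Toy setup: moves are integers, six opponents each pick 2 or 3 at random,
# and a move scores better the closer it is to the opponents' average.
random.seed(0)
best = best_on_average(
    candidate_moves=[0, 1, 2, 3],
    sample_opponents=lambda: [random.choice([2, 3]) for _ in range(6)],
    value=lambda move, opp: -abs(move - sum(opp) / len(opp)),
)
print(best)
```

In training, the selected move becomes the target the network is nudged toward, so that after training the network can play directly from its learned policy without repeating this sampling step.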
DeepMind’s AlphaGo became the first computer program to defeat a Go world champion in 2016, and the match became the subject of a documentary film of the same name (watch, runtime 1:30:27). Since then, DeepMind has developed the more capable AlphaZero, which can also play chess and shogi, followed by MuZero, which learns games without being taught the rules and can also play visually complex Atari games. Humans have been losing games to chess AIs since the 1980s, most notably with the 1997 defeat of world champion Garry Kasparov at the (hands? wires?) of IBM supercomputer Deep Blue.
Facebook is set to join the game, and will present its own work on no-press Diplomacy in April, using a human-imitating network similar to DipNet. Instead of using reinforcement learning, Facebook enhanced the AI with a search function that allows the bot to more closely imitate human reasoning. SearchBot takes its time to reason through the best strategy by calculating each opponent’s equilibrium strategy, considering a series of probabilities across 50 likely moves. This slows SearchBot down in real gameplay, but allows it to win more games.
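The "equilibrium strategy" idea can be illustrated with regret matching, a standard algorithm for approximating game equilibria. The sketch below is a generic, minimal illustration on rock-paper-scissors, not Facebook's SearchBot: each player repeatedly mixes its moves in proportion to accumulated positive regret, and the averaged strategies converge toward an equilibrium.

```python
import random

def normalize(xs):
    """Scale a list of non-negative weights into probabilities
    (uniform if all weights are zero)."""
    total = sum(xs)
    return [x / total for x in xs] if total > 0 else [1.0 / len(xs)] * len(xs)

def sample(probs):
    """Draw an index according to a probability distribution."""
    r, acc = random.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

def regret_matching(payoff, iterations=20_000):
    """Approximate equilibrium strategies for a two-player zero-sum game.
    payoff[i][j] is player 1's payoff when p1 plays i and p2 plays j."""
    n, m = len(payoff), len(payoff[0])
    reg1, reg2 = [0.0] * n, [0.0] * m      # accumulated regrets
    avg1, avg2 = [0.0] * n, [0.0] * m      # accumulated strategies
    for _ in range(iterations):
        s1 = normalize([max(r, 0.0) for r in reg1])
        s2 = normalize([max(r, 0.0) for r in reg2])
        for i in range(n):
            avg1[i] += s1[i]
        for j in range(m):
            avg2[j] += s2[j]
        a1, a2 = sample(s1), sample(s2)
        # Regret of an alternative = its payoff minus the payoff actually got.
        for i in range(n):
            reg1[i] += payoff[i][a2] - payoff[a1][a2]
        for j in range(m):
            reg2[j] += payoff[a1][a2] - payoff[a1][j]
    return normalize(avg1), normalize(avg2)

# Rock-paper-scissors: the equilibrium mixes all three moves near-uniformly.
random.seed(0)
p1, p2 = regret_matching([[0, -1, 1], [1, 0, -1], [-1, 1, 0]])
print([round(p, 2) for p in p1])
```

SearchBot's real search runs this kind of equilibrium computation over roughly 50 candidate moves per player each turn, which is why it is slow but strong.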
Who is winning? While DeepMind’s bot appears to be a better player than DipNet (one DeepMind AI playing against six DipNet players wins 30% of the time, compared to DipNet winning only 3% of the time when the roles are reversed), SearchBot seems to be stronger still, beating DipNet by a greater margin. SearchBot was also able to rise to the top 2% of players when competing against humans on a Diplomacy gaming website, and won 94% of the time when playing a set of 35 games against two top human Diplomacy players. We’re still waiting for a face-off between DeepMind’s bot and SearchBot.
AI learning to play collaborative games is cool, but it also has real-world implications: Games that require collaboration and not just pure competition are much more similar to real-life situations, where a number of factors need to be weighed before a decision can be made, DeepMind scientists say. The creation of AI bots that can comprehend multiple variables, map out possible scenarios, and come up with a strategy in response has implications for policy, healthcare, governance, and beyond.