Over the last couple of weeks, I’ve been fascinated and inspired by an important milestone at the cutting edge of Artificial Intelligence: the first 11 games of StarCraft played between professional StarCraft players and AlphaStar, a team of AI StarCraft agents built by DeepMind, the lab behind the previous expert-defeating game players AlphaGo and AlphaZero.
It started with this 5-minute teaser video put out by the DeepMind team a couple of weeks ago:
If you’re as hooked by that teaser as I was, you might enjoy the full demonstration video with famous StarCraft casters Artosis and RotterdaM:
Finally, if you’d rather read than watch, I’d recommend DeepMind’s write-up. Spoilers ahead…
Here’s my summary for people who don’t have the time to watch the videos or read the write-up:
DeepMind spent the last two years building their “agents” to play StarCraft II, ramping up development in the last few months.
The “agents” they built trained in something they call the AlphaStar League, for 1-2 weeks in real time, which corresponded to hundreds of years of human playing time.
The League was seeded by agents built by imitation learning, mimicking top human players.
Later agents were adjusted and trained by reinforcement learning, with some additional side objectives encouraging them to pursue unusual strategies.
All of the games were played Protoss versus Protoss, on the same map, using a version of the game that was only a couple of months old.
One set of five agents, trained for one week, beat professional StarCraft player TLO 5-0. TLO is not a professional Protoss player, but still very good — he estimates that he’s in about the top 1% of players at Protoss.
A second set of five agents, trained for an additional week, beat TLO’s teammate and top 10 professional Protoss player MaNa 5-0.
AlphaStar does not rely on superhuman APM (actions per minute), and its reaction time is roughly human-level as well.
Based on some of the games, the biggest concern was the camera: the agents could see the entire map at once. DeepMind argued that the agents’ attention was in practice fairly limited, but some of the widely distributed micro-management in the MaNa games suggested a change was needed. So they created a newer version limited to an explicit camera view and trained it for a week.
This new agent lost narrowly to MaNa in the exhibition match, despite an internal rating that DeepMind expected to be almost as strong as that of the previous agents after a week of training.
These are exciting results, and they constitute major progress in AI. StarCraft has several features of real life that previous milestone games like Go and chess lacked: it’s played in real time, involves thousands of actions per game, and gives players incomplete information.
Of course, there are also important questions about how to benchmark the AI’s capabilities against human capabilities on nearly every front. The DeepMind team were smart to start with imitation learning in order to get the behavior into roughly the right range, but there are still questions about how the AI’s options compare to those of a human with a mouse and keyboard.
One clear advantage was that the AI agents were consistently better than the pros at micro-management of units, an important skill in the game, but one that perhaps reflects the precision afforded to the agents’ actions more than any higher cognitive process. That precision might be impossible for humans to achieve even though it doesn’t show up as fast play speed or short reaction times.
But while these sorts of details are definitely interesting, I find it even more inspiring to see how the DeepMind team presented their work. It starts, of course, with their professionalism and respect for MaNa, TLO, and the StarCraft II community throughout the project. The exhibition itself was well run, and on his own channel, MaNa was effusive about how well they treated him throughout the process, including the professionalism of the livestream, despite the glitches afterwards.
Their technical explanations were superb as well. It’s always insanely tempting, when talking about AI, to resort to magical thinking and language. Just throw the words “deep learning” or “neural network” in there and voilà, people think you can do anything!
To counter that, the DeepMind team chose to discuss their work in very human terms. I loved their choice to call the reinforcement learning process a “league,” since it evokes the way human players have improved their play through competition over time. Even in their complicated neural-network visualization, they included elements like the outcome prediction (whether the agent thought it would win or lose), which the casters were excited to see.
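Under the hood, that outcome prediction is a value estimate: a function from some summary of the game state to a win probability. Here’s a minimal sketch of the idea as a logistic model; the features and weights are entirely invented for illustration, and AlphaStar’s actual value estimate comes from a deep network over the full game state, not a hand-weighted sum like this:

```python
import math

def win_probability(features, weights, bias=0.0):
    """Logistic value estimate: P(win) = sigmoid(w . x + b)."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical state summary: (army-supply lead, worker-count lead,
# upgrade lead), each scaled to roughly [-1, 1].
state = (0.8, 0.3, -0.1)
w = (1.5, 0.7, 0.4)
p = win_probability(state, w)  # a slight lead maps to P(win) > 0.5
```

The appeal of surfacing a number like this in a broadcast is exactly what the casters reacted to: it translates an opaque network’s internal state into one human-readable judgment.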
I’m thinking about these developments both recreationally, as a fan of DeepMind’s work, and professionally, as a machine learning scientist at Kebotix, which was just named one of the top 100 AI startups this week by CB Insights.
Our team is quite interdisciplinary, both in the sense that we come from different backgrounds and in the sense that each of us has multiple backgrounds to bring to the table. I never expected to use the chemistry double major I picked up for fun at Caltech, but it’s been incredibly useful to be able to speak the language of chemistry as we seek to bring machine learning methods to materials and chemical discovery.
As I’ve sought to explain and present what our various ML algorithms are doing, I’ve had to think carefully about how I talk about them. Popular perception, startup pitches, and even much of the prior academic work tends to hype up the possibilities or make machine learning seem like a magic wand that makes everything better.
When explaining to my colleagues what we can and need to do, I always come back to the analogy of human intelligence. Our algorithms, depending on the task they’re asked to solve, are like new grad students, undergrads, or high school students learning chemistry. We need to teach them, which often means giving them data and/or detailed instructions. They’re also inexperienced, so you need to hold their hand, at least initially.
Obviously, the analogy isn’t perfect. Algorithms are much better than humans at aggregating large quantities of data, and executing commands precisely. But conversely, they can’t do much with small amounts of data or poorly communicated instructions.
One of the big advantages of games in AI is that they theoretically provide a data source that’s only limited by your computational power. Agents in the AlphaStar League could keep getting feedback on their strategies simply by playing more games. Set them up correctly, which of course is the hard part, and they can learn from hundreds of years of practice playing the same version of StarCraft on the same map.
But even that learning process looked fairly recognizably human. From the detailed write-up:
Early on in the AlphaStar league, “cheesy” strategies such as very quick rushes with Photon Cannons or Dark Templars were favoured. These risky strategies were discarded as training progressed, leading to other strategies: for example, gaining economic strength by over-extending a base with more workers, or sacrificing two Oracles to disrupt an opponent’s workers and economy. This process is similar to the way in which players have discovered new strategies, and were able to defeat previously favoured approaches, over the years since StarCraft was released.
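The dynamic in that quote, where early risky strategies get countered away as the population trains against itself, is the signature of population-based self-play. Here’s a toy sketch of the idea. Nothing here reflects DeepMind’s actual implementation: the three “strategies” and their counter relationships are invented, “imitation” is reduced to seeding each agent with a human-like prior, and “reinforcement learning” is reduced to nudging a probability distribution after each match:

```python
import random

STRATEGIES = ["rush", "economy", "harass"]
# Invented counter relationships: each strategy beats exactly one other.
BEATS = {"rush": "harass", "harass": "economy", "economy": "rush"}

def seeded_agent(prior=(0.5, 0.3, 0.2)):
    """Imitation-learning stand-in: start from a human-like prior."""
    return list(prior)

def reinforce(p, idx, won, lr=0.05):
    """Shift probability toward strategies that win, away from losers."""
    p[idx] += lr if won else -lr
    p[idx] = max(p[idx], 0.01)  # keep exploring every strategy a little
    total = sum(p)
    return [x / total for x in p]

def play_round(pop, rng, lr=0.05):
    """Sample two league members, play one match, update both."""
    a, b = rng.sample(range(len(pop)), 2)
    ia = rng.choices(range(3), weights=pop[a])[0]
    ib = rng.choices(range(3), weights=pop[b])[0]
    if ia == ib:
        a_wins = rng.random() < 0.5  # mirror match: coin flip
    else:
        a_wins = BEATS[STRATEGIES[ia]] == STRATEGIES[ib]
    pop[a] = reinforce(pop[a], ia, a_wins, lr)
    pop[b] = reinforce(pop[b], ib, not a_wins, lr)

rng = random.Random(0)
league = [seeded_agent() for _ in range(8)]
for _ in range(5000):
    play_round(league, rng)
```

In AlphaStar’s case the “agents” are deep networks and the updates are full reinforcement learning with side objectives, but the population-of-opponents structure, seeded by imitation and then left to counter itself, is the same basic shape.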
Even if the process by which AI learns to play StarCraft or do chemistry largely resembles the way humans do it, there are huge advantages to getting to that point. At the very least, we can learn from each other, just as MaNa adopted AlphaStar’s probe-oversaturation build in the exhibition game.
The promise, of course, extends much further than that. Unlike grad students, the software components of AI are fairly trivial to reproduce and apply to a wide variety of problems. And if it’s better than grad students, well, of all the grad students I’ve met, it’s the chemists who most wish they could be replaced by robots.
Of course, you can’t get into any sort of discussion comparing AI and human behavior without dealing with the elephant in the room of AI safety. Sure, we don’t have to worry about AlphaStar itself ravaging the streets of London with its killer micro, but we have to keep in mind that DeepMind’s ultimate objective is to build Artificial General Intelligence, or AGI, an agent that can do everything that human minds can do.
Hazarding guesses as to the future of AI research is always, well, hazardous, but it’s a time-honored tradition when it comes to DeepMind. In this vein, it seems to me that should superior AI eventually develop, it will actually be fairly recognizable in the way it learns.
And like humans, it will also probably develop its intelligence and skills in particular contexts first. Not to quite so narrow an extent as only being able to play one map in StarCraft, but to the extent that its generality will be a major question mark in whether it counts as AGI.
Finally, like these StarCraft agents, we will probably see a significant period when AGI is recognizably near top-human level without immediately becoming impossibly stronger. This isn’t to say that we won’t see an intelligence explosion, or as it’s known in the field, a Singularity, after that point, but the lead-up to human-level intelligence could very well involve fairly gradual improvements.
That said, taking the longer view, “AI will develop gradually” might not be the most appropriate conclusion here. After all, another major milestone in AI development has now been passed, and it’s going to be an exciting ride from here. What do you think will happen next?