People talking about AI today are focused on new things: bigger models, longer contexts, more human-like responses.
However, in a podcast released on March 11, 2026, Google DeepMind revisited the match from ten years earlier in which AlphaGo defeated Lee Sedol, calling that moment the "key pivot point" of AI.
Why that match?
In that game, there was one move that changed people's understanding of AI. It made us realize for the first time: AI isn't just about learning from humans; it might forge paths that humans have never taken. And this ability, once it leaves the chessboard, could change more than just Go.
To understand why that move was so important, we first need to go back to a question from that time: Why was Go long considered one of the most difficult challenges for AI?
In the podcast, Thore Graepel, a core architect of AlphaGo, recalled that Go was seen by AI researchers as almost the perfect challenge.
The reason isn't complicated: the rules of Go are simple, but once a game begins, the situation rapidly becomes extremely complex. Every seemingly ordinary stone placed on the board can have cascading effects dozens of moves later, and those effects are often extremely difficult to foresee.
When people compare board games, chess usually comes to mind. Back in 1997, IBM's Deep Blue defeated world champion Garry Kasparov, and many assumed machines would soon achieve a similar breakthrough in Go.
But the outcome for Go was completely different.
From a computational perspective, Go's complexity far exceeds that of chess. A chess game typically runs about sixty or seventy moves, whereas a Go game often lasts two to three hundred, with far more legal placements available at each step. The number of possible variations in a game therefore grows exponentially, far beyond what traditional exhaustive-search methods can handle.
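A rough back-of-the-envelope comparison makes the gap concrete. Using the often-quoted rough figures (a branching factor of about 35 over roughly 80 positions for chess, versus about 250 over roughly 150 positions for Go), the game-tree sizes differ by hundreds of orders of magnitude:

```python
import math

# Rough game-tree size estimate: b**d, where b is the average branching
# factor and d the typical game length. We work in log10 to keep the
# numbers readable; the inputs are the commonly cited rough figures.
def tree_size_log10(branching: float, depth: int) -> float:
    """Return log10 of b**d, i.e. roughly how many digits the tree has."""
    return depth * math.log10(branching)

chess = tree_size_log10(35, 80)    # chess: ~35 moves/position, ~80 positions
go = tree_size_log10(250, 150)     # Go: ~250 moves/position, ~150 positions

print(f"chess ~ 10^{chess:.0f} variations")
print(f"go    ~ 10^{go:.0f} variations")
```

These are back-of-the-envelope estimates, not exact counts, but they show why brute-force enumeration, which was enough for Deep Blue's domain, could not scale to Go.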
Pushmeet Kohli, Head of AI for Science at DeepMind, explained this difference in the podcast. The difficulty of Go isn't just the multitude of possible moves; more critically, the game is long, requiring continuous reasoning through layers of variation.
For a machine, this means needing to find a reasonable path within an unimaginably vast space.
When human players face this complexity, they have their own way of coping. They don't calculate all possible variations; instead, they rely on experience and intuition to first filter out directions that "seem promising," and then conduct further analysis.
The problem was that early AI didn't possess this ability.
Traditional AI methods relied on massive computation, trying different moves repeatedly to find better outcomes, but they quickly hit a bottleneck with a complex problem like Go. So for a long time, Go was considered a major hurdle in the AI field, because it tested not only computational power but also the kind of intuition humans possess.
When DeepMind began researching Go, they combined these two ways of thinking. On one hand, they used deep neural networks to learn which moves "seemed promising" in a given position; on the other, they used Monte Carlo tree search to simulate possible subsequent variations.
In other words, the machine needed to be able to quickly identify the general direction, as well as delve deeper into analysis during critical situations. This approach gave researchers the first glimpse of a potential breakthrough.
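The shape of this two-part idea can be sketched in a few lines. This is a minimal illustration, not DeepMind's implementation: `policy` and `simulate` below are hypothetical stand-ins for the real policy network and rollout machinery.

```python
# Sketch of "filter first, then calculate": a learned policy prior keeps
# only the promising candidate moves, and simulation evaluates the
# survivors. `policy` and `simulate` are hypothetical stand-ins.
def select_move(state, legal_moves, policy, simulate, top_k=5, rollouts=50):
    # 1. Intuition: keep only the moves the policy prior rates highest.
    candidates = sorted(legal_moves, key=lambda m: policy(state, m),
                        reverse=True)[:top_k]

    # 2. Calculation: estimate each survivor's win rate by simulating
    #    many continuations from the position it produces.
    def win_rate(move):
        wins = sum(simulate(state, move) for _ in range(rollouts))
        return wins / rollouts

    return max(candidates, key=win_rate)
```

In the real system the evaluation used full Monte Carlo tree search rather than flat rollouts, but the division of labor is the same: the prior narrows the search space, and look-ahead does the deep analysis only where it matters.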
If we only look at the match result, AlphaGo's 4-1 victory over Lee Sedol might be interpreted as a technological advancement.
But what people truly remember is move 37 in the second game. Professional player Michael Redmond, commenting at the time, initially thought there was a recording error. He picked up a stone and put it down again, because according to traditional Go theory, this move was not in a position a human player would seriously consider.
Later, when the DeepMind team recalled this moment, they mentioned a detail: according to AlphaGo's own model, trained on records of historical human games, the probability that a human player would make that move was about one in ten thousand.
As the game continued, placements that had initially seemed unreasonable slowly revealed their effect. Dozens of moves later, people gradually realized this wasn't a random attempt but a strategy outside conventional thinking. It shifted the balance of power on the board and changed both players' understanding of the relationship between territory and influence.
Graepel recalled in the podcast that a professional player sitting next to him initially completely failed to understand the significance of this move, even saying he would normally tell his students not to play that way. But after the game ended, that player came back specifically to tell him it was the most unforgettable game he had ever seen, because the machine had used a completely new type of move.
This is the significance of move 37. This move wasn't directly learned from human game records; it was a new method formed during the exploration process. It proved one thing: a machine could surpass existing experience and find novel solutions.
The DeepMind team also began to think: what else might this capability be used for?
The answer came quickly.
Shortly after AlphaGo defeated Lee Sedol, the DeepMind team conducted an experiment that seemed simple but was quite bold at the time: they stopped using any human game records.
The machine no longer learned from professional games. It only knew two things: the rules of Go, and the criteria for winning or losing. Then the team let it continuously play against itself, gradually finding better strategies through repeated trial and error.
This was how AlphaZero worked. The machine initially knew almost nothing; it just kept playing games and constantly adjusting its strategy. But as the number of games increased, it gradually formed its own understanding: which moves had more potential, and which positions were more advantageous.
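The loop described above, knowing only the rules and the win condition, then improving purely through self-play, can be illustrated on a miniature game. The sketch below uses Nim (players alternately take 1 or 2 stones; whoever takes the last stone wins) with a simple win-rate table; it is a toy stand-in for the real system's neural-network training, not DeepMind's code.

```python
import random
from collections import defaultdict

# Toy version of the self-play recipe: the agent is given only the rules
# of Nim and the win condition, and improves by playing against itself,
# tracking the observed win rate of each (position, move) pair.
def learn_nim_by_self_play(stones=7, episodes=20000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    wins = defaultdict(int)    # wins[(stones_left, move)]
    plays = defaultdict(int)   # plays[(stones_left, move)]

    def win_rate(n, m):
        return wins[(n, m)] / plays[(n, m)] if plays[(n, m)] else 0.5

    def choose(n):
        moves = [m for m in (1, 2) if m <= n]
        if rng.random() < epsilon:           # explore occasionally
            return rng.choice(moves)
        return max(moves, key=lambda m: win_rate(n, m))

    for _ in range(episodes):
        n, player, history = stones, 0, []
        while n > 0:
            m = choose(n)
            history.append((n, m, player))
            n -= m
            player ^= 1
        winner = history[-1][2]              # whoever took the last stone
        for pos, move, who in history:       # credit every move played
            plays[(pos, move)] += 1
            wins[(pos, move)] += (who == winner)

    # The greedy policy learned purely from self-play.
    def best_move(n):
        moves = [m for m in (1, 2) if m <= n]
        return max(moves, key=lambda m: win_rate(n, m))
    return best_move
```

After enough games the table converges on the optimal strategy (always leave the opponent a multiple of three stones), without ever seeing a human game. The real systems replace the table with a deep network and the random playouts with tree search, but the learning signal is the same: win or lose.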
The DeepMind team observed that in the early stages of learning, the machine would slowly "rediscover" many classic Go strategies that had long existed, essentially retracing much of the knowledge humans had accumulated over centuries. But as it continued exploring, it began to abandon some of those strategies, because it had found more effective ways to play.
Graepel said in the podcast that this is precisely what excited researchers most about AlphaZero: it could not just rediscover human knowledge, but also build upon it to find things humans hadn't thought of.
And this ability was foreseen by some during the Seoul match.
The film crew documenting AlphaGo was packing up their equipment, but the microphones were still on.
They inadvertently recorded a conversation between Demis Hassabis, CEO of Google DeepMind, and David Silver, Principal Research Scientist. Hassabis remarked that it was amazing to see this problem, once thought impossible, solved so quickly. Then he paused and said he was now sure they could tackle "protein folding."
And they did.
In biology, how proteins fold into three-dimensional structures has always been an extremely difficult problem. Scientists knew the amino acid sequence of a protein, but inferring its final spatial form often required years of experiments.
By learning from vast amounts of data and physical principles, AlphaFold produced predictions approaching experimental accuracy in the 2020 CASP14 competition. Many researchers later commented that this work significantly accelerated the pace of research in structural biology.
Similar things happened in mathematics and computing. Matrix multiplication is one of the most fundamental operations in computer science, yet for decades only a handful of algorithms more efficient than the standard method had been found.
DeepMind let its model experiment among a vast number of possible sequences of computational steps, and it discovered new algorithmic paths, some of which were more computationally efficient than the methods humans had used before. This is what AlphaTensor accomplished.
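The classic human-discovered example of such an algorithmic path is Strassen's 1969 construction, which multiplies two 2×2 matrices with 7 scalar multiplications instead of the obvious 8; applied recursively to blocks, it lowers the cost of n×n multiplication from O(n³) to about O(n^2.81). AlphaTensor searched this same space of decompositions. A sketch of the 2×2 case (a standard textbook construction, not DeepMind's code):

```python
def strassen_2x2(A, B):
    """Multiply two 2x2 matrices with 7 multiplications (Strassen, 1969)
    instead of the 8 used by the standard row-times-column method."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    # Reassemble the product from the seven intermediate products.
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4,           m1 - m2 + m3 + m6]]
```

Finding such a decomposition by hand took human mathematicians years; AlphaTensor's contribution was to search for decompositions of this kind automatically, in some settings beating the best previously known counts.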
Another example is AlphaEvolve.
The research team applied this strategy exploration approach to engineering problems, such as optimizing resource allocation in data centers and improving logistics routes. In these scenarios, the machine would search for better arrangements among a vast number of possibilities, and some results surpassed the original designs of the engineers.
Behind all these breakthroughs is the same method: allowing machines to autonomously explore within environments governed by clear rules, finding paths that humans had overlooked. This method was first validated in the game of Go.
That's why DeepMind researchers often return to that match. When people ask why AI has made such sudden progress in recent years, they frequently mention move 37 on the Go board.
(Source: AI Deep Researcher, WeChat Public Platform)