
Deep Analysis of "The Divine Move"

On March 10, 2016, in Game 2 of its match against Lee Sedol, AlphaGo played Move 37: a "shoulder hit" on the fifth line in the upper right area.

This move came to be known as "The Divine Move." It not only helped AlphaGo win the game but also changed humanity's understanding of Go.

This article analyzes the move from several perspectives: the game context, traditional Go theory, expert reactions, the AI perspective, and its long-term impact on Go theory.


Game Position Review

The Opening of Game 2

After losing Game 1, Lee Sedol made adjustments in Game 2. He played White (moving second), which gave him a chance to observe AlphaGo's opening tendencies before settling on a strategy.

Opening phase:

  • Black 1: Star point in the upper right corner
  • White 2: Star point in the lower left corner
  • Black 3-White 4: Each side occupies a corner

Up to Move 36, the game developed normally. AlphaGo played Black and engaged in a local battle in the upper right corner. White (Lee Sedol) had built influence on the right side, while Black had some territorial potential on the top.

Position After Move 36

Let's look at the board state after Move 36:

[Board diagram: position after Move 36, with White's influence marked]

Simplified diagram; the actual position was more complex

Key observations:

  • White has outside influence on the right
  • Black has territorial potential on the top
  • The battle in the upper right corner has paused

It was Black's (AlphaGo's) turn to play.


Traditional Move Analysis

Professional Players' Expectations

Before Move 37, the professional players in the commentary room were in animated discussion. They generally expected Black to choose one of the following moves:

Option A: Approach in the lower right corner

This was the most "normal" choice. Black could:

  • Claim the last big point (lower right corner)
  • Maintain balance in the game
  • Follow the traditional value of "corner-side-center"

Option B: Enclose the top

Black could also extend two or three spaces on the top to solidify his sphere of influence. This would:

  • Convert the potential on top into territory
  • Limit White's development space

Option C: Center invasion

Some players thought Black might play in the center to constrain White's right-side influence. While not the most common choice, it was strategically justifiable.


The Unexpected Choice

However, AlphaGo chose a position almost no one anticipated:

K15 (fifth-line shoulder hit)

The move landed on the right half of the board, near the center, as a "shoulder hit" against White's right-side influence.


Move 37: The Fifth-Line Shoulder Hit

Where Is This Move?

[Board diagram: Move 37 marked on row 15]

Move 37 was played at K15 (or J5, depending on the coordinate system).

What Is a "Shoulder Hit"?

A "shoulder hit" is a move played diagonally above an opponent's stone, one line closer to the center. Its characteristics are:

  • No direct contact: The stone sits on the diagonal rather than touching the opponent's stone
  • Disrupts structure: Throws off the opponent's expected development
  • Difficult to respond: Any response from the opponent comes with some cost

Traditionally, shoulder hits are played on the third or fourth line. A fifth-line shoulder hit is extremely rare because:

  1. Position too high: The fifth line is close to the center, traditionally considered inefficient
  2. Easy to attack: Isolated stones can become targets
  3. Unclear value: Unlike corners and sides, it lacks clear territorial value
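
The geometry described above can be sketched in a few lines. This is a minimal illustration, not part of any Go library: the helper names `parse`, `line_of`, and `is_shoulder_hit` are hypothetical, and the coordinates in the usage note are illustrative rather than taken from the game record. It assumes the common convention in which column letters skip "I".

```python
# Columns follow the common Go convention that skips the letter "I".
COLUMNS = "ABCDEFGHJKLMNOPQRST"
SIZE = 19

def parse(coord):
    """Convert a coordinate such as 'Q10' into zero-based (column, row) indices."""
    return COLUMNS.index(coord[0]), int(coord[1:]) - 1

def line_of(coord):
    """Return the line number: 1 on the edge, 10 at the center of a 19x19 board."""
    col, row = parse(coord)
    return min(col, row, SIZE - 1 - col, SIZE - 1 - row) + 1

def is_shoulder_hit(move, stone):
    """True if `move` is diagonally adjacent to `stone` and exactly one line higher."""
    (mc, mr), (sc, sr) = parse(move), parse(stone)
    diagonal = abs(mc - sc) == 1 and abs(mr - sr) == 1
    return diagonal and line_of(move) == line_of(stone) + 1
```

For a hypothetical stone at Q10 (fourth line), `is_shoulder_hit("P11", "Q10")` holds and `line_of("P11")` is 5, which is the unusually high placement the list above describes. A contact move such as P10 fails the diagonal check.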

Expert Reactions in Real-Time

Shock in the Commentary Room

The moment Move 37 was played, the commentary room briefly fell silent.

Korean commentary (Kim Seong-ryong 9p):

"This... what is this? A move on the fifth line? I don't understand. This must be a mistake, right?"

Chinese commentary (Gu Li 9p):

"I can't understand this move. If one of my students played this, I would scold them severely."

American commentary (Michael Redmond 9p):

"Very unusual move. I don't think any human would play this."

Real-Time Comments from Professional Players

On various live streaming platforms, professional players were commenting:

Ke Jie (World No. 1 at the time):

"I cannot understand the intention of this move. If AlphaGo wins, I will study it carefully."

Park Junghwan (Top Korean player):

"This move is too strange. Is there a bug in the program?"

Mi Yuting (Chinese World Champion):

"Fifth-line shoulder hit? I've never seen this kind of move."

"One in Ten Thousand Probability"

After the match, the DeepMind team revealed a stunning statistic:

"According to our analysis, if a professional player faced the same position, the probability of choosing Move 37's position would be about one in ten thousand."

In other words, in the human Go knowledge system, this move was virtually a "non-existent" option.


The AI Perspective

Policy Network Probability Distribution

Let's see how AlphaGo's Policy Network evaluated this position:

[Interactive chart: Policy Network probability distribution for this position]

The chart above shows AlphaGo's probability assessment for each position.

Key observations:

  • Move 37's position: About 8% probability, not the highest
  • Traditional choices (like lower right corner): About 12% probability
  • Other candidate positions: Scattered across different areas

Interestingly, Move 37 was not the highest probability choice in the Policy Network's evaluation. So why did AlphaGo choose it?

MCTS Deep Evaluation

The answer lies in Monte Carlo Tree Search (MCTS).

The Policy Network only provides "intuition"; the real decision comes from MCTS's deep simulation. AlphaGo simulates thousands of possible futures before making a decision.

For Move 37, the MCTS evaluation process was:

Position K15 (Move 37):
├── Simulation 1: Black wins (+0.3)
├── Simulation 2: Black wins (+0.5)
├── Simulation 3: Black wins (+0.2)
├── ...
└── Average win rate: 58%

Position R3 (Lower right approach):
├── Simulation 1: Black wins (+0.1)
├── Simulation 2: White wins (-0.2)
├── Simulation 3: Black wins (+0.2)
├── ...
└── Average win rate: 52%

Although the lower right corner had higher "intuitive probability," after deep simulation, Move 37's expected win rate was higher.
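
The averaging step in the diagram above can be sketched as follows. This is a simplified illustration under stated assumptions: each simulation is reduced to a win (1) or loss (0) for Black, the outcome lists are made up to match the article's illustrative win rates, and the names `win_rate` and `best_move` are hypothetical. The published AlphaGo system also mixed value-network estimates into each evaluation and selected the most-visited move at the root, which this sketch does not model.

```python
def win_rate(outcomes):
    """Fraction of simulations won by Black; each outcome is 1 (win) or 0 (loss)."""
    return sum(outcomes) / len(outcomes)

def best_move(results):
    """Pick the candidate whose simulations have the highest average win rate."""
    return max(results, key=lambda move: win_rate(results[move]))

# Made-up outcomes chosen to reproduce the article's illustrative 58% and 52%.
results = {
    "K15": [1] * 58 + [0] * 42,
    "R3": [1] * 52 + [0] * 48,
}
```

With these inputs, `best_move(results)` selects K15 even though R3 starts with the higher prior, because the comparison uses simulated outcomes rather than the Policy Network's initial probabilities.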

Value Network Global Assessment

The Value Network assessed Move 37's value from a global perspective:

Win rate before Move 37: About 52% (Black slightly ahead)

Win rate after Move 37: About 58% (Black with clear advantage)

This means Move 37 increased AlphaGo's expected win rate by 6 percentage points.

This improvement is quite significant in Go. Usually, a good move improves the win rate by only 2-3 percentage points.


Go Theory Analysis: Why the Fifth-Line Shoulder Hit?

From a Local Perspective

On the surface, Move 37 seems inefficient:

  • Position too high: Fifth line is closer to the center than fourth or third
  • No territory: Unlike corners and sides, it doesn't directly claim territory
  • Vulnerable to attack: Isolated stones could be targeted by White

But if we analyze carefully, this move has several subtle benefits:

  1. Disrupts White's influence: White had planned to develop on the right; Move 37 disrupted this plan
  2. Establishes presence: Though not enclosing territory, it establishes presence in the center
  3. Increases complexity: Creates a complex position, favoring the side with stronger calculation

From a Global Perspective

The true value of this move needs to be understood globally:

The Thickness vs. Territory Trade-off

Traditional Go theory holds that "corners are gold, sides are silver, center is grass" — corners are most valuable, center least valuable. But Move 37 challenged this notion.

AlphaGo's evaluation showed: in this specific position, central influence was more valuable than corner territory.

This is because:

  • Black already had sufficient territorial foundation
  • White's right-side influence would be powerful if allowed to develop
  • Constraining White was more important than expanding oneself

The Value of "Sente"

Move 37 had another underestimated benefit: it maintained "sente" (initiative).

In Go, "sente" means controlling the initiative. After Move 37, White had to respond, allowing Black to continue directing the game's flow.

If Black had chosen a "normal" approach in the lower right corner, both sides might have engaged in joseki, and the position would have balanced. But Move 37 broke this balance, filling the game with uncertainty — exactly what AlphaGo excelled at.

Lee Sedol's Dilemma

After Move 37, Lee Sedol thought for a long time. His dilemma was:

If he responds directly (for example, with a jump or a knight's move):

  • It acknowledges Move 37's value
  • Black achieves the goal of disrupting White's influence

If he ignores it:

  • Black might further develop the center
  • White's right-side influence would struggle to become territory

In the end, Lee Sedol chose to respond. But regardless of his choice, Move 37 had already achieved its purpose.


Subsequent Development: From Move 37 to Victory

Middle Game Evolution

After Move 37, the game entered a complex middle game battle.

Key developments:

  • Moves 40-50: Both sides engaged in fierce contact fighting on the right
  • Moves 50-70: AlphaGo leveraged the influence established by Move 37 to gain advantage in the center
  • Moves 70-100: Black gradually converted the advantage into territory

By around Move 100, AlphaGo's lead was quite clear. Although Lee Sedol tried to fight back, he couldn't turn the situation around.

Final Result

AlphaGo wins by resignation

This game's victory was largely due to Move 37. Post-game analysis showed that without Move 37, the position would have been much closer, and White might even have had the advantage.


Impact on Go Theory

Birth of New Joseki

Move 37 triggered a reconsideration of the "shoulder hit" technique in the Go world.

Traditional view:

  • Shoulder hits should be on the third or fourth line
  • Fifth-line shoulder hits are too inefficient
  • Isolated stones are vulnerable to attack

After AlphaGo:

  • Fifth-line shoulder hits are the best choice in certain positions
  • Position "height" matters less than "effect"
  • Each move's value needs to be evaluated from a global perspective

Human Players Learning

After Move 37, many professional players began trying similar moves:

Ke Jie used fifth-line shoulder hits successfully in several games in 2017:

"AlphaGo taught me that many moves we thought were 'bad' are simply moves we didn't understand."

Park Junghwan also incorporated this way of thinking into his games:

"The important thing isn't remembering the specific position of Move 37, but learning to see the board with new eyes."

Implications for Go AI Training

Move 37 also had far-reaching implications for Go AI research:

Reflection on Policy Network:

Why did the Policy Network give Move 37 a lower probability? Because it learned from human game records, and humans rarely play such moves.

This shows: supervised learning alone (learning from humans) is not enough. AI needs to explore on its own to discover good moves unknown to humans.

This was one reason why AlphaGo Zero later adopted pure self-play training.

Affirmation of MCTS:

Move 37 proved the value of deep MCTS search. Even when intuition (Policy Network) doesn't favor a move, deep analysis can discover its potential value.

This insight was later applied to many other fields.


Technical Details: Recreating Move 37's Decision Process

Policy Network Input Features

After Move 36, the Policy Network's input included:

Feature Plane    Description
1-8              Black stone positions (past 8 moves)
9-16             White stone positions (past 8 moves)
17               Whose turn it is
18-48            Other features (liberties, atari, etc.)

Total of 48 feature planes of 19x19, forming the input tensor.
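
To make the shape of this input concrete, here is a minimal sketch that builds a 48 x 19 x 19 structure from stone positions. It follows the plane numbering in the table above and fills only three planes (current Black stones, current White stones, and the turn indicator). The history and liberty planes are left as zeros, and the function name `encode` and its arguments are assumptions made for illustration.

```python
SIZE = 19
NUM_PLANES = 48

def empty_plane():
    """Return a SIZE x SIZE plane filled with zeros."""
    return [[0] * SIZE for _ in range(SIZE)]

def encode(black_stones, white_stones, black_to_play):
    """Build NUM_PLANES planes; stones are zero-based (col, row) pairs.

    Only planes 1, 9, and 17 of the table are filled (indices 0, 8, 16).
    All other planes stay zero in this sketch.
    """
    planes = [empty_plane() for _ in range(NUM_PLANES)]
    for col, row in black_stones:
        planes[0][row][col] = 1      # plane 1: current Black stones
    for col, row in white_stones:
        planes[8][row][col] = 1      # plane 9: current White stones
    if black_to_play:
        planes[16] = [[1] * SIZE for _ in range(SIZE)]  # plane 17: turn
    return planes
```

A call such as `encode([(3, 3)], [(15, 15)], True)` returns 48 planes of 19 rows each, which is the tensor shape the paragraph above describes.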

Policy Network Output

The Policy Network outputs a 19x19 = 361 dimensional probability distribution.

For Move 37's position:

    # Top 5 candidate positions (simplified)
    policy_top5 = {
        "R3": 0.12,   # Lower right approach
        "Q17": 0.10,  # Upper right corner
        "C10": 0.09,  # Left side big point
        "K15": 0.08,  # Move 37's position
        "D16": 0.07,  # Upper left corner
        # ... 356 other positions
    }
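
A probability distribution like the one above is typically produced by applying a softmax to the network's raw scores. The sketch below shows that step and a top-k lookup in plain Python. The logits are made-up values, the flat indices 100 and 200 are arbitrary, and `softmax` and `top_k` are illustrative helpers rather than AlphaGo's actual code.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)                        # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(probs, k):
    """Return the k most probable (index, probability) pairs, highest first."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]

# Made-up logits for the 361 board points; two points are given higher scores.
logits = [0.0] * 361
logits[100] = 3.0
logits[200] = 2.5
probs = softmax(logits)
```

Here `top_k(probs, 2)` returns the points at indices 100 and 200, in that order. Mapping a flat index back to a board coordinate is left out to keep the sketch short.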

MCTS Exploration Process

AlphaGo uses the PUCT formula to balance exploration and exploitation:

U(s,a) = Q(s,a) + c_puct × P(s,a) × sqrt(sum_b N(s,b)) / (1 + N(s,a))

Where:

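  • Q(s,a): Mean value of the simulations that played move a from position s
  • P(s,a): Prior probability of move a given by the Policy Network
  • N(s,a): Number of times move a has been explored from position s
  • sum_b N(s,b): Total number of visits over all moves b from position s
  • c_puct: Exploration constant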

For Move 37, although the initial probability P was low, after multiple simulations, the Q value kept increasing, eventually surpassing other candidates.
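
The formula can be evaluated directly to see this crossover. The sketch below is a literal transcription of the expression above. The priors and Q values reuse the article's illustrative numbers, while the visit counts and the default `c_puct=5.0` are assumptions chosen only for demonstration.

```python
import math

def puct_score(q, prior, parent_visits, visits, c_puct=5.0):
    """U(s,a) = Q(s,a) + c_puct * P(s,a) * sqrt(sum_b N(s,b)) / (1 + N(s,a))."""
    return q + c_puct * prior * math.sqrt(parent_visits) / (1 + visits)

# Early in the search: no value estimates yet, so the prior decides.
early_r3 = puct_score(q=0.0, prior=0.12, parent_visits=10, visits=0)
early_k15 = puct_score(q=0.0, prior=0.08, parent_visits=10, visits=0)

# Later in the search (hypothetical visit counts): the exploration bonus
# has shrunk, so the higher Q value of K15 dominates.
late_r3 = puct_score(q=0.52, prior=0.12, parent_visits=10000, visits=6000)
late_k15 = puct_score(q=0.58, prior=0.08, parent_visits=10000, visits=4000)
```

In the early comparison R3 scores higher because its prior is larger. In the late comparison the bonus term is roughly 0.01 for both moves, so the 0.06 gap in Q puts K15 ahead.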

Impact of Simulation Count

The DeepMind team's later analysis indicated that "discovering" Move 37 required a sufficient number of simulations:

Simulation Count    Best Choice
100                 R3 (lower right)
1,000               Q17 (upper right)
10,000              K15 (Move 37)
100,000             K15 (more certain)

This shows: deep search can discover good moves that shallow search cannot find.
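
A toy version of this effect can be reproduced with a two-move bandit that uses the PUCT rule for selection. This is a deliberately simplified sketch, not AlphaGo's search. There is no tree, each "simulation" returns a fixed value taken from the article's illustrative win rates, and the names `PRIORS`, `VALUES`, and `most_visited` are hypothetical. The thresholds at which the preferred move changes therefore do not match the table above.

```python
import math

PRIORS = {"R3": 0.12, "K15": 0.08}   # illustrative Policy Network priors
VALUES = {"R3": 0.52, "K15": 0.58}   # fixed value returned by each simulation

def most_visited(num_simulations, c_puct=5.0):
    """Run PUCT selection for num_simulations steps and return the most-visited move."""
    visits = {move: 0 for move in PRIORS}
    total_value = {move: 0.0 for move in PRIORS}
    for _ in range(num_simulations):
        parent_visits = sum(visits.values())

        def score(move):
            # Q is the running mean of simulation values; 0 for unvisited moves.
            q = total_value[move] / visits[move] if visits[move] else 0.0
            u = c_puct * PRIORS[move] * math.sqrt(parent_visits) / (1 + visits[move])
            return q + u

        chosen = max(PRIORS, key=score)
        visits[chosen] += 1
        total_value[chosen] += VALUES[chosen]
    return max(visits, key=visits.get)
```

With these numbers, `most_visited(4)` returns R3, since the higher prior dominates the first few selections, while `most_visited(10000)` returns K15 once the value estimates outweigh the shrinking exploration bonus.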


Philosophical Reflections: Cognitive Differences Between Humans and AI

Why Couldn't Humans Think of Move 37?

This is a profound question. Possible reasons include:

1. Limitations of Experience

Human players' knowledge comes from studying predecessors' game records. If predecessors never played a certain move, we won't consider it.

2. Bias of Intuition

Human intuition is useful but limited. Our intuition makes us "blind" to certain options.

3. Difference in Computational Ability

Move 37's value required deep calculation to discover. Human computational ability is limited; we can't simulate thousands of possibilities like AI.

What Is Machine "Intuition"?

Does AlphaGo have "intuition"?

In a sense, the Policy Network is AlphaGo's "intuition" — it can evaluate each position's potential in milliseconds.

But this "intuition" differs from human intuition:

  • Human intuition: Comes from experience and pattern recognition
  • AI intuition: Comes from statistical learning on massive data

Interestingly, Move 37 showed that the AI's "intuition" can be corrected by MCTS. This means AI can "reflect" on its own intuition and find better choices.

What Can Humans Learn from AI?

The biggest insight from Move 37 for human players may be:

Don't let experience become shackles

Many "bad" moves may simply be moves we don't understand. Opening our minds and being willing to try unconventional moves may reveal new possibilities.

This insight applies not just to Go, but to many areas of life.


Animation Reference

Core concepts in this article and their animation numbers:

Number    Concept                                                 Physics/Math Correspondence
C3        Traditional Go value judgment                           Heuristic function
C5        Geometric properties of shoulder hit                    Spatial relations
C7        Gap between expert intuition and AI evaluation          Prediction error
C9        Policy Network output distribution                      Softmax probability
C11       How MCTS corrects Policy Network                        Bayesian update
C13       Value Network incremental evaluation                    Value function
C15       Global value function calculation                       Integral approximation
C17       Forced choice in game theory                            Dominant strategy
C19       How one move changes the entire game                    Bifurcation point
C21       How AI expands human cognitive boundaries               Search space expansion
C23       Importance of feature engineering in Go AI              Representation learning
C25       How PUCT formula discovers non-intuitive good moves     Exploration-exploitation tradeoff
C27       Cognitive bias and AI transcendence                     Unbiased estimation

