
Input Feature Design

Neural networks can only process numbers. To make them understand Go, we need a way to "translate" the board into numbers.

This translation process is input feature design.

AlphaGo used 48 feature planes, AlphaGo Zero simplified to 17, and KataGo optimized to 22. This article will explain the considerations behind these design choices in detail.


What are Feature Planes?

Basic Concept

A feature plane is a 19×19 matrix where each element represents a certain property of the corresponding board position.

For example, the "Black stone positions" feature plane:

Board state:          Feature plane (Black stones):
  A B C D E             A B C D E
1 . . . . .           1 0 0 0 0 0
2 . ● . . .           2 0 1 0 0 0
3 . . ○ . .     →     3 0 0 0 0 0
4 . . . ● .           4 0 0 0 1 0
5 . . . . .           5 0 0 0 0 0
  • Position with Black stone = 1
  • Position without Black stone = 0
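As a minimal sketch of this idea, the 5×5 example can be turned into a Black-stone plane with a single NumPy comparison. The integer encoding (0 = empty, 1 = black, 2 = white) is an assumption chosen for illustration.

```python
import numpy as np

EMPTY, BLACK, WHITE = 0, 1, 2  # assumed encoding for this sketch

# The 5x5 example position: Black at B2 and D4, White at C3.
board = np.zeros((5, 5), dtype=np.int8)
board[1, 1] = BLACK  # B2
board[3, 3] = BLACK  # D4
board[2, 2] = WHITE  # C3

# A feature plane is an element-wise test turned into 0/1 values.
black_plane = (board == BLACK).astype(np.float32)
print(black_plane.astype(int))
```

The White and Empty planes follow the same pattern with `board == WHITE` and `board == EMPTY`.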

Multiple Feature Planes

A single plane captures only one property of the position, so we stack several planes into one input tensor.

This is similar to how a color image has R, G, and B channels. A Go "image" simply has N channels.
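The stacking step might look like the following sketch. The function name `stack_planes` and the choice of four planes are hypothetical, picked only to show the resulting (channels, height, width) layout.

```python
import numpy as np

def stack_planes(board, current_player):
    """Stack black, white, empty and side-to-move planes into one tensor.

    board: square array with 0=empty, 1=black, 2=white (assumed encoding).
    """
    planes = [
        board == 1,                                  # Black stones
        board == 2,                                  # White stones
        board == 0,                                  # Empty points
        np.full(board.shape, current_player == 1),   # constant plane: Black to play?
    ]
    return np.stack(planes).astype(np.float32)

board = np.zeros((19, 19), dtype=np.int8)
board[3, 3] = 1
board[15, 15] = 2
x = stack_planes(board, current_player=1)
print(x.shape)  # (4, 19, 19)
```

A convolutional network treats the first axis exactly like the channel axis of an image.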


AlphaGo's 48 Feature Planes

Complete List

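AlphaGo's policy network takes 48 feature planes as input. The grouping below is a simplified illustration; the exact list in Silver et al. (2016) differs in detail. For example, the paper encodes how many turns ago each stone was played rather than full board history, and it also includes self-atari and liberties-after-move features.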

1. Stone Positions (3 planes)

| Plane | Name | Description |
| --- | --- | --- |
| 1 | Black | Black stone = 1, otherwise = 0 |
| 2 | White | White stone = 1, otherwise = 0 |
| 3 | Empty | Empty point = 1, otherwise = 0 |

2. History (16 planes)

| Plane | Name | Description |
| --- | --- | --- |
| 4-11 | Black history | Black positions 1-8 moves ago |
| 12-19 | White history | White positions 1-8 moves ago |

Why is history needed?

  • Ko detection: Need to know if immediate recapture is allowed
  • Playing intention: Recent moves reveal both sides' plans
  • Temporal information: CNN itself doesn't handle time, history planes fill this gap

3. Liberty Features (8 planes)

| Plane | Name | Description |
| --- | --- | --- |
| 20-23 | 1-4 liberties (own) | Own string has 1/2/3/4 liberties = 1 |
| 24-27 | 1-4 liberties (opponent) | Opponent string has 1/2/3/4 liberties = 1 |

Liberty count is one of the most important tactical concepts in Go:

  • 1 liberty: In atari, about to be captured
  • 2 liberties: Dangerous state
  • 3 liberties: Needs attention
  • 4+ liberties: Temporarily safe
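Liberty counts are not directly visible in the raw board, so they have to be computed per string. The following is a minimal sketch using a flood fill. The helper name `compute_liberties` matches the one the extraction code later in this article calls, but this implementation and the `liberty_planes` wrapper are assumptions, not AlphaGo's actual code. The last plane here groups "4 or more" liberties together.

```python
import numpy as np

def compute_liberties(board):
    """For every stone, return the liberty count of the string it belongs to.

    board: square array with 0=empty, 1=black, 2=white (assumed encoding).
    Empty points get 0.
    """
    size = board.shape[0]
    liberties = np.zeros(board.shape, dtype=np.int32)
    visited = np.zeros(board.shape, dtype=bool)

    def neighbors(r, c):
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < size and 0 <= nc < size:
                yield nr, nc

    for r in range(size):
        for c in range(size):
            if board[r, c] == 0 or visited[r, c]:
                continue
            color = board[r, c]
            string, libs, stack = [], set(), [(r, c)]
            visited[r, c] = True
            while stack:  # flood fill over connected stones of one color
                cr, cc = stack.pop()
                string.append((cr, cc))
                for nr, nc in neighbors(cr, cc):
                    if board[nr, nc] == 0:
                        libs.add((nr, nc))  # a set, so shared liberties count once
                    elif board[nr, nc] == color and not visited[nr, nc]:
                        visited[nr, nc] = True
                        stack.append((nr, nc))
            for sr, sc in string:
                liberties[sr, sc] = len(libs)
    return liberties

def liberty_planes(board, color, max_liberties=4):
    """One-hot planes for strings of `color` with 1, 2, 3, and 4+ liberties."""
    libs = compute_liberties(board)
    own = board == color
    planes = [(libs == n) & own for n in range(1, max_liberties)]
    planes.append((libs >= max_liberties) & own)
    return np.stack(planes).astype(np.float32)
```

Calling `liberty_planes(board, 1)` and `liberty_planes(board, 2)` gives the eight planes described in the table, one set per color.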

4. Capture Features (8 planes)

| Plane | Name | Description |
| --- | --- | --- |
| 28-31 | Capture positions (own) | Playing here captures 1/2/3/4 opponent stones |
| 32-35 | Capture positions (opponent) | Opponent playing here captures 1/2/3/4 of own stones |

Atari is the most common tactic in Go:

  • Capturing more stones = bigger threat
  • Different capture sizes require different responses
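One way to compute capture sizes is to find every opposing string in atari and credit its stone count to its single remaining liberty. The sketch below takes that approach. The function name `capture_sizes` is hypothetical, and move legality (for example ko) is deliberately ignored.

```python
import numpy as np

def capture_sizes(board, color):
    """For each empty point, count the opponent stones `color` captures by playing there.

    board: square array with 0=empty, 1=black, 2=white (assumed encoding).
    """
    size = board.shape[0]
    opponent = 3 - color
    sizes = np.zeros(board.shape, dtype=np.int32)
    visited = np.zeros(board.shape, dtype=bool)

    def neighbors(r, c):
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < size and 0 <= nc < size:
                yield nr, nc

    for r in range(size):
        for c in range(size):
            if board[r, c] != opponent or visited[r, c]:
                continue
            string, libs, stack = [], set(), [(r, c)]
            visited[r, c] = True
            while stack:  # flood fill over one opponent string
                cr, cc = stack.pop()
                string.append((cr, cc))
                for nr, nc in neighbors(cr, cc):
                    if board[nr, nc] == 0:
                        libs.add((nr, nc))
                    elif board[nr, nc] == opponent and not visited[nr, nc]:
                        visited[nr, nc] = True
                        stack.append((nr, nc))
            if len(libs) == 1:  # string is in atari
                (lr, lc), = libs
                sizes[lr, lc] += len(string)  # several strings may share the point
    return sizes

def capture_planes(board, color, max_size=4):
    """One-hot planes for capture sizes 1, 2, 3, and 4+."""
    sizes = capture_sizes(board, color)
    planes = [sizes == n for n in range(1, max_size)]
    planes.append(sizes >= max_size)
    return np.stack(planes).astype(np.float32)
```

Because sizes are accumulated per liberty point, a move that takes the last liberty of two separate strings is credited with both captures.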

5. Ladder Features (8 planes)

| Plane | Name | Description |
| --- | --- | --- |
| 36-39 | Ladder-related (own) | Positions related to own ladders |
| 40-43 | Ladder-related (opponent) | Positions related to opponent ladders |

The ladder is a famous Go tactic:

  • Chasing opponent's stones along diagonal
  • Need to determine "ladder works" or "ladder fails"
  • Requires global vision, a challenge for traditional Go programs

6. Legality Feature (1 plane)

| Plane | Name | Description |
| --- | --- | --- |
| 44 | Legal positions | Can legally play here = 1 |

This helps the network avoid proposing illegal moves:

  • Cannot play where a stone already exists
  • Cannot play at suicide points (self-capture without capturing)
  • Cannot immediately recapture in ko

7. Border Features (4 planes)

| Plane | Name | Description |
| --- | --- | --- |
| 45 | Distance to edge 1 | On 1st line = 1 |
| 46 | Distance to edge 2 | On 2nd line = 1 |
| 47 | Distance to edge 3 | On 3rd line = 1 |
| 48 | Distance to edge 4+ | On 4th line or inner = 1 |

Edges and corners have special meaning in Go:

  • 1st line: Death line, stones easily captured
  • 2nd line: Survival line, but inefficient
  • 3rd line: Territory line, solid
  • 4th line: Influence line, pursuing influence
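These planes depend only on board geometry, so they can be computed once and reused. A vectorized sketch, with the hypothetical name `edge_distance_planes`, might look like this:

```python
import numpy as np

def edge_distance_planes(size=19, max_line=4):
    """One-hot planes for points on the 1st, 2nd, 3rd, and 4th-or-inner line."""
    idx = np.arange(size)
    to_edge = np.minimum(idx, size - 1 - idx)  # 0 on the edge itself
    # Line number of each point: 1 = first line, 2 = second line, ...
    line = np.minimum(to_edge[:, None], to_edge[None, :]) + 1
    planes = [line == n for n in range(1, max_line)]
    planes.append(line >= max_line)
    return np.stack(planes).astype(np.float32)

planes = edge_distance_planes()
print(planes.shape)  # (4, 19, 19)
```

Every point belongs to exactly one of the four planes, so summing over the first axis gives an all-ones board.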

Why So Many Features?

DeepMind's design philosophy was to provide as much information as possible, letting the network decide what's useful:

Raw board → 48 feature planes → Neural network → Decision

Feature engineer's job: Encode Go knowledge as features
Neural network's job: Learn to combine these features

This is a strategy of "passing the ball to the neural network" - humans handle feature design, the network handles learning combinations.


AlphaGo Zero's Simplification: 17 Feature Planes

Revolutionary Change

AlphaGo Zero dramatically simplified input features:

| Version | Feature Planes | Human Knowledge Used |
| --- | --- | --- |
| AlphaGo | 48 | Extensive (liberties, ladders, etc.) |
| AlphaGo Zero | 17 | Almost none |

17 Planes Composition

1. Stone Position History (16 planes)

| Plane | Name | Description |
| --- | --- | --- |
| 1-8 | Current player, T-0 to T-7 | Current player's stones now and at each of the previous 7 time steps |
| 9-16 | Opponent, T-0 to T-7 | Opponent's stones at the same 8 time steps |

2. Color (1 plane)

| Plane | Name | Description |
| --- | --- | --- |
| 17 | Whose turn | Black's turn = all 1s, White's turn = all 0s |

Why Can It Be Simplified?

AlphaGo Zero's core insight:

Given enough computation and training time, neural networks can learn these features themselves

Concepts like "liberties," "atari," "ladders" - humans took thousands of years to develop. But AlphaGo Zero proved that neural networks can learn them in days - and perhaps learn even better representations than humans.

Performance Comparison

Surprisingly, AlphaGo Zero with fewer features is actually stronger:

| Version | Features | Training Time | Final Strength |
| --- | --- | --- | --- |
| AlphaGo Master | 48 | Months | ~4858 Elo |
| AlphaGo Zero | 17 | 40 days | ~5185 Elo |
| AlphaGo Zero (3 days) | 17 | 3 days | Superhuman (beat AlphaGo Lee 100-0) |

In this comparison, the version with less built-in human knowledge ended up stronger.

Why Is Human Knowledge a Burden?

1. Human Knowledge May Be Wrong

Human-summarized Go principles are empirical and may not be optimal. For example:

  • "Golden corner, silver edge, grass belly" - but in some positions the center is more important
  • "Don't play ladder if it fails" - but sometimes you can deliberately sacrifice

2. Feature Encoding Limits Representation

When we encode "liberty count" as four planes for 1-4 liberties, we implicitly assume "liberty count" is an important classification method. But perhaps there are better classifications, and this encoding prevents the network from discovering them.

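3. Extra Computational Cost

Computing and processing 48 planes costs more than 17: features such as liberties and ladders must be recomputed for every position evaluated during search. If some of these features are redundant, that work is wasted.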
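To give a rough sense of scale on the network side, the sketch below counts weights and multiply-adds in the first convolutional layer only. The layer shape (3×3 kernel, 256 filters, "same" padding) is an assumption applied to both inputs so that only the plane count varies.

```python
def first_layer_cost(in_planes, filters=256, kernel=3, board_size=19):
    """Weights and multiply-adds of one conv layer applied to the whole board."""
    weights = kernel * kernel * in_planes * filters
    # With 'same' padding there is one output per board point per filter.
    macs = weights * board_size * board_size
    return weights, macs

for name, planes in (("48 planes", 48), ("17 planes", 17)):
    weights, macs = first_layer_cost(planes)
    print(f"{name}: {weights:,} weights, {macs:,} multiply-adds per position")
```

Only the first layer sees the input planes, so in a deep network this difference is a small share of the total. The cost of computing the hand-crafted features for every position is likely the larger factor.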


KataGo's Optimization: 22 Feature Planes

Pragmatic Balance

KataGo built on AlphaGo Zero's foundation, adding a small amount of carefully selected human knowledge:

| Item | AlphaGo Zero | KataGo |
| --- | --- | --- |
| History planes | 16 | 5 |
| Stone positions | Yes | Yes |
| Whose turn | Yes | Yes |
| Ko state | No | Yes |
| Rule variants | No | Yes (komi, suicide rules, etc.) |
| Total | 17 | 22 |

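KataGo's Feature List

The list below is a simplified illustration of the kinds of inputs KataGo uses. In Wu (2019), the spatial planes also include liberty and ladder features, and rule settings such as komi are supplied as a separate vector of global (non-spatial) inputs.
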
Basic Features (5)

| Plane | Name | Description |
| --- | --- | --- |
| 1 | Black | Current black stone positions |
| 2 | White | Current white stone positions |
| 3 | Empty | Current empty positions |
| 4 | Whose turn (1) | Constant plane always 1 |
| 5 | Whose turn (2) | Black's turn = 1, White's turn = 0 |

History Features (5)

| Plane | Name | Description |
| --- | --- | --- |
| 6 | Last move | Opponent's last move position |
| 7 | 2nd last move | Own last move position |
| 8 | 3rd last move | Opponent's 2nd last move |
| 9 | 4th last move | Own 2nd last move |
| 10 | 5th last move | Opponent's 3rd last move |
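Unlike AlphaGo Zero's full board snapshots, each of these history planes marks a single point. A minimal sketch, assuming moves are stored as `(row, col)` tuples with `None` for a pass (the function name and move format are hypothetical):

```python
import numpy as np

def last_move_planes(moves, size=19, n_history=5):
    """One plane per recent move, most recent first.

    moves: list of (row, col) tuples, or None for a pass, oldest first.
    """
    planes = np.zeros((n_history, size, size), dtype=np.float32)
    recent = moves[-n_history:][::-1]  # reorder so index 0 is the last move
    for k, move in enumerate(recent):
        if move is not None:  # a pass leaves its plane empty
            r, c = move
            planes[k, r, c] = 1.0
    return planes

moves = [(3, 3), (15, 15), None, (2, 5)]
planes = last_move_planes(moves)
print(planes.shape)  # (5, 19, 19)
```

Planes for moves that have not happened yet (early in the game) simply stay all zero.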

Ko Features (3)

| Plane | Name | Description |
| --- | --- | --- |
| 11 | Ko forbidden | Current ko forbidden point |
| 12 | Potential ko (own) | Playing here creates ko |
| 13 | Potential ko (opponent) | Opponent playing here creates ko |

Rule Features (9)

| Plane | Name | Description |
| --- | --- | --- |
| 14-22 | Rule encoding | Komi, suicide rules, superko, etc. |

Why Add These Features?

KataGo's author lightvector explains:

1. Ko Is Too Important

Ko is one of the most complex concepts in Go. Learning ko rules purely from raw board states requires massive samples. Explicitly marking ko forbidden points accelerates learning.

2. Rule Diversity

Go has multiple rule sets:

  • Komi: Chinese rules 7.5 points, Japanese rules 6.5 points
  • Suicide rules: Some rules allow suicide
  • Superko: Different ways to handle long cycles

Explicitly encoding rules in input allows one network to handle all variants.
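A rule encoding can be as simple as a few scalars, optionally broadcast to constant planes so they can be stacked with the board features. The sketch below is illustrative only: the function name, the choice of three rule values, and the komi scaling factor of 15 are assumptions, not KataGo's exact layout.

```python
import numpy as np

def encode_rules(komi, allow_suicide, superko, size=19):
    """Return rule settings as a vector and as constant planes."""
    values = np.array(
        [
            komi / 15.0,                     # scaled so typical komi is below 1
            1.0 if allow_suicide else 0.0,   # multi-stone suicide allowed?
            1.0 if superko else 0.0,         # superko rule in force?
        ],
        dtype=np.float32,
    )
    # Broadcast each scalar over the whole board to get one plane per rule.
    planes = np.broadcast_to(
        values[:, None, None], (len(values), size, size)
    ).copy()
    return values, planes

values, planes = encode_rules(komi=7.5, allow_suicide=False, superko=True)
print(values, planes.shape)
```

Whether the network receives the vector or the planes is a design choice. The vector form is more compact and can be injected after the convolutional trunk has pooled spatial information.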

3. Training Efficiency

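Adding a small amount of human knowledge can substantially accelerate training. Wu (2019) reports that KataGo surpassed the final ELF OpenGo model after 19 days on fewer than 30 GPUs, roughly a 50x reduction in compute compared with similar AlphaZero-style training runs. Input features were one of several improvements that contributed to this speedup.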


Philosophy of Feature Design

Three Approaches

| Approach | Representative | Feature Count | Human Knowledge | Compute Required |
| --- | --- | --- | --- | --- |
| Heavy human knowledge | AlphaGo | 48 | Extensive | Medium |
| Minimal human knowledge | AlphaGo Zero | 17 | Almost none | Very high |
| Moderate human knowledge | KataGo | 22 | Small selection | Lower |

Trade-off Considerations

Limited Resources

If computational resources are limited (most researchers' situation), adding some human knowledge is wise:

  • Accelerates training convergence
  • Reduces required training data
  • Avoids reinventing the wheel

Pursuing the Limit

If computational resources are abundant, reducing human knowledge may achieve higher strength:

  • Avoids human biases
  • Discovers strategies unknown to humans
  • True "starting from scratch"

Insights

The AlphaGo series evolution tells us:

  1. Feature engineering still matters - but the form has changed
  2. End-to-end learning is the trend - let networks learn features themselves
  3. No single correct answer - depends on resources and goals

Implementation Examples

Feature Extraction (AlphaGo Style)

import numpy as np

# NOTE: compute_liberties, compute_captures, compute_ladder and
# compute_legal_moves are Go-rule helpers that are assumed to exist.
# They are not defined in this article.

def extract_features_alphago(board, history, current_player):
    """
    Extract AlphaGo-style 48 feature planes

    board: 19×19 board, 0=empty, 1=black, 2=white
    history: List of the last 8 board states (history[0] is one move ago)
    current_player: 1=black, 2=white
    """
    features = np.zeros((48, 19, 19), dtype=np.float32)
    my_color = current_player
    opp_color = 3 - current_player

    # 1-3: Stone positions
    features[0] = (board == 1)  # Black
    features[1] = (board == 2)  # White
    features[2] = (board == 0)  # Empty

    # 4-19: History positions
    for i, hist_board in enumerate(history[:8]):
        features[3 + i] = (hist_board == 1)   # Black history
        features[11 + i] = (hist_board == 2)  # White history

    # 20-27: Liberty features
    liberties = compute_liberties(board)
    for i, lib_count in enumerate([1, 2, 3, 4]):
        features[19 + i] = (liberties == lib_count) & (board == my_color)
        features[23 + i] = (liberties == lib_count) & (board == opp_color)

    # 28-35: Capture features
    capture_counts = compute_captures(board)
    for i, cap_count in enumerate([1, 2, 3, 4]):
        features[27 + i] = (capture_counts[my_color] == cap_count)
        features[31 + i] = (capture_counts[opp_color] == cap_count)

    # 36-43: Ladder features (simplified)
    ladder_status = compute_ladder(board)
    # ... implementation omitted; planes 35-42 are left as zeros here ...

    # 44: Legal positions
    features[43] = compute_legal_moves(board, current_player)

    # 45-48: Border distance (dist 0 = 1st line, 1 = 2nd line, ...)
    for i in range(19):
        for j in range(19):
            dist = min(i, j, 18 - i, 18 - j)
            if dist == 0:
                features[44, i, j] = 1
            elif dist == 1:
                features[45, i, j] = 1
            elif dist == 2:
                features[46, i, j] = 1
            else:
                features[47, i, j] = 1

    return features

Feature Extraction (AlphaGo Zero Style)

import numpy as np

def extract_features_zero(board_history, current_player):
    """
    Extract AlphaGo Zero-style 17 feature planes

    board_history: List of last 8 board states, most recent first
                   (0=empty, 1=black, 2=white)
    current_player: 1=black, 2=white
    """
    features = np.zeros((17, 19, 19), dtype=np.float32)
    opponent = 3 - current_player

    # 1-8: Current player's stones at T-0 to T-7
    # 9-16: Opponent's stones at T-0 to T-7
    # (early in the game, missing history planes stay all zero)
    for i, board in enumerate(board_history[:8]):
        features[i] = (board == current_player)
        features[8 + i] = (board == opponent)

    # 17: Whose turn (all 1s if Black is to play, all 0s otherwise)
    if current_player == 1:
        features[16] = 1.0

    return features

Performance Comparison

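import time
import numpy as np

# Simulate 1000 feature extractions.
# Random boards are not legal Go positions, but they are enough for timing.
board = np.random.randint(0, 3, (19, 19))
history = [np.random.randint(0, 3, (19, 19)) for _ in range(8)]

# AlphaGo style (requires the compute_* helpers that
# extract_features_alphago calls)
start = time.perf_counter()
for _ in range(1000):
    features = extract_features_alphago(board, history, 1)
alphago_time = time.perf_counter() - start

# AlphaGo Zero style (simple)
start = time.perf_counter()
for _ in range(1000):
    features = extract_features_zero(history, 1)
zero_time = time.perf_counter() - start

print(f"AlphaGo style: {alphago_time:.2f}s")
print(f"AlphaGo Zero style: {zero_time:.2f}s")
# The AlphaGo-style extractor is expected to be slower, because liberties,
# captures, ladders and legality must be computed for every position.
# The exact ratio depends on how those helpers are implemented.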

Visualizing Feature Planes

Real Position Example

Actual board:
   A B C D E F G H J K L M N O P Q R S T
19 . . . . . . . . . . . . . . . . . . .
18 . . . . . . . . . . . . . . . . . . .
17 . . . ● . . . . . . . . . . . ○ . . .
16 . . . . . . . . . . . . . . . . . . .
15 . . . . . . . . . . . . . . . . . . .
...

Feature plane 1 (Black):
   A B C D E F G H J K L M N O P Q R S T
19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
17 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
...

Feature plane 2 (White):
   A B C D E F G H J K L M N O P Q R S T
19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
17 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
...
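Text dumps like the ones above are easy to generate while debugging. The following is a small sketch; the function name `render_plane` and the convention that array row 0 is the top line of the board are assumptions.

```python
import numpy as np

COLUMNS = "ABCDEFGHJKLMNOPQRST"  # Go coordinates skip the letter I

def render_plane(plane):
    """Render a square 0/1 feature plane as text with Go coordinates."""
    size = plane.shape[0]
    lines = ["   " + " ".join(COLUMNS[:size])]
    for row in range(size):
        label = size - row  # array row 0 is printed as the top line
        cells = " ".join(str(int(v)) for v in plane[row])
        lines.append(f"{label:>2} {cells}")
    return "\n".join(lines)

plane = np.zeros((19, 19))
plane[2, 3] = 1  # D17: row index 19 - 17 = 2, column D = index 3
print(render_plane(plane))
```

Printing several planes for the same position side by side is a quick way to check that an extractor puts each feature in the intended channel.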

Insights from Feature Planes

Observing different feature planes helps understand what the model "sees":

| Feature | Intuitive Meaning | What Model Might Learn |
| --- | --- | --- |
| Black/White positions | Who is where | Shapes, connectivity |
| History | What happened recently | Playing intentions, fighting directions |
| Liberties | Who is in danger | Attack/defense targets |
| Captures | Tactical opportunities | Local tactics |
| Border distance | Position importance | Opening moves, corner joseki |

Animation Reference

Core concepts covered in this article with animation numbers:

| Number | Concept | Physics/Math Correspondence |
| --- | --- | --- |
| Animation A8 | Feature encoding | Tensor representation |
| Animation A10 | Input normalization | Feature engineering |
| Animation D1 | Convolution input | Multi-channel images |
| Animation E3 | Zero's simplification | Minimal representation |


Key Takeaways

  1. Feature planes are digital board representations: Each plane is a 19×19 matrix
  2. AlphaGo uses 48 planes: Contains extensive human Go knowledge
  3. AlphaGo Zero simplifies to 17: Proves networks can learn features themselves
  4. KataGo optimizes to 22: Balances efficiency and performance
  5. Feature design is a trade-off: Human knowledge vs. computational resources

Input feature design is the bridge connecting "human-understood Go" with "machine-processable numbers."


References

  1. Silver, D., et al. (2016). "Mastering the game of Go with deep neural networks and tree search." Nature, 529, 484-489.
  2. Silver, D., et al. (2017). "Mastering the game of Go without human knowledge." Nature, 551, 354-359.
  3. Wu, D. (2019). "Accelerating Self-Play Learning in Go." arXiv:1902.10565.
  4. KataGo Documentation: https://github.com/lightvector/KataGo