Playing Zero-Sum Games on the GPU

Avi Bleiweiss, NVIDIA Corporation, San Jose | 2010


Page 1: Playing Zero-Sum Games on the GPU

Avi Bleiweiss NVIDIA Corporation San Jose | 2010

Playing Zero-Sum Games on the GPU

Page 2: Playing Zero-Sum Games on the GPU

Zero-Sum

One’s gain, other’s loss

Perfect info

Multi player game

— Simple and involved

Matching Pennies (payoffs: row player, column player)

        Head     Tail
Head    1,-1    -1,1
Tail   -1,1     1,-1
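The zero-sum property of the payoff table can be checked mechanically. The following is a minimal sketch; the names `Payoff`, `kMatchingPennies`, `payoff` and `isZeroSum` are illustrative and do not come from the deck.

```cpp
#include <cassert>

// One cell of the payoff table: (row player's payoff, column player's payoff).
struct Payoff { int row; int col; };

// Matching Pennies as on the slide; index 0 = Head, 1 = Tail.
const Payoff kMatchingPennies[2][2] = {
    {{1, -1}, {-1, 1}},
    {{-1, 1}, {1, -1}},
};

// Bounds-checked lookup of one cell.
Payoff payoff(int row, int col) {
    assert(row >= 0 && row < 2 && col >= 0 && col < 2);
    return kMatchingPennies[row][col];
}

// A game is zero-sum when the two payoffs cancel in every cell.
bool isZeroSum(const Payoff (&m)[2][2]) {
    for (int r = 0; r < 2; ++r)
        for (int c = 0; c < 2; ++c)
            if (m[r][c].row + m[r][c].col != 0) return false;
    return true;
}
```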

Page 3: Playing Zero-Sum Games on the GPU

Deep Blue

32 CPUs

512 Chess ASICs

1.15B transistors

[Chart: Million Moves/Second by year, 1985 to 1997; vertical axis 0 to 200]

6’5” tower pair

1.4 tons

$100M

Page 4: Playing Zero-Sum Games on the GPU

Motivation

Play as you go

— Thousands of players

Game cloud

— GPU computing

Mobile computer

— 3D graphics

NVIDIA Tesla

NVIDIA Tegra

Page 5: Playing Zero-Sum Games on the GPU

Problem

Game

• Two player

• Maximize look ahead

• Complexity

  • 10^25 for 4x4x4 Tic-Tac-Toe

  • 10^120 for Chess

Search

• Efficient parallel

  • Single game

  • Simultaneous matches

• Graceful stack

• Linear scale

Page 6: Playing Zero-Sum Games on the GPU

Game Tree

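[Figure: game tree. Root has children A and B; A has children C, D, E; B has children F, G, H. The labels d and g also appear in the figure. Leaf values, left to right: 9 2 7 3 2 4 2 1 5 8 7 7 2 9 1 1 6 2]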

Page 7: Playing Zero-Sum Games on the GPU

Mini-Max

Build and search

Unbound DFS

— All nodes visited

Non tail recursive

Depth limited
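A depth-limited Mini-Max can be sketched in a few lines. This is an illustration, not the deck's kernel: `Node`, `leaf` and `sampleTree` are hypothetical, and the sample tree stops at the level whose values appear on the Principal Variation slide (7 4 5 8 1 6). Reading those six numbers as the values of C through H is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <vector>

// Hypothetical tree node: `score` is the static evaluation, used at leaves
// and wherever the depth limit cuts the search short.
struct Node {
    int score = 0;
    std::vector<Node> children;
};

// Recursive, depth-limited Mini-Max; visits every node above the limit.
int minimax(const Node& n, int depth, bool maximizing) {
    assert(depth >= 0);
    if (depth == 0 || n.children.empty()) return n.score;
    int best = maximizing ? INT_MIN : INT_MAX;
    for (const Node& c : n.children) {
        int v = minimax(c, depth - 1, !maximizing);
        best = maximizing ? std::max(best, v) : std::min(best, v);
    }
    return best;
}

Node leaf(int score) { Node n; n.score = score; return n; }

// Root (max) over A and B (min), with the six assumed values as leaves.
Node sampleTree() {
    Node a, b, root;
    for (int s : {7, 4, 5}) a.children.push_back(leaf(s));
    for (int s : {8, 1, 6}) b.children.push_back(leaf(s));
    root.children = {a, b};
    return root;
}
```

With the root maximizing, A backs up min(7, 4, 5) = 4, B backs up min(8, 1, 6) = 1, and the root takes 4, which agrees with the 4, 1, 4 annotations on the Principal Variation slide.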

Page 8: Playing Zero-Sum Games on the GPU

Principal Variation

[Figure: the same game tree annotated with backed-up values. Second-level values: 7 4 5 8 1 6. Upper-level values: 1, 4, 4.]
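The principal variation is the move sequence that produces the root value. A minimal sketch that returns it together with the value follows; `Node`, `Result`, `searchPV` and `sampleTree` are hypothetical names, and the tree is cut at the level carrying the values 7 4 5 8 1 6 from this slide (their assignment to C through H is an assumption).

```cpp
#include <cassert>
#include <climits>
#include <vector>

struct Node {
    int score = 0;
    std::vector<Node> children;
};

// Backed-up value plus the child indices leading to it.
struct Result {
    int value;
    std::vector<int> path;
};

// Mini-Max that also records the best line.
// Assumes scores lie strictly between INT_MIN and INT_MAX.
Result searchPV(const Node& n, bool maximizing) {
    if (n.children.empty()) return {n.score, {}};
    Result best{maximizing ? INT_MIN : INT_MAX, {}};
    for (int i = 0; i < static_cast<int>(n.children.size()); ++i) {
        Result r = searchPV(n.children[i], !maximizing);
        bool better = maximizing ? r.value > best.value : r.value < best.value;
        if (better) {
            best.value = r.value;
            best.path.assign(1, i);  // this child, then its own best line
            best.path.insert(best.path.end(), r.path.begin(), r.path.end());
        }
    }
    assert(!best.path.empty());
    return best;
}

Node leaf(int score) { Node n; n.score = score; return n; }

Node sampleTree() {
    Node a, b, root;
    for (int s : {7, 4, 5}) a.children.push_back(leaf(s));
    for (int s : {8, 1, 6}) b.children.push_back(leaf(s));
    root.children = {a, b};
    return root;
}
```

On this tree the path is child 0 then child 1, that is Root, A, D under the assumed labelling, with value 4.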

Page 9: Playing Zero-Sum Games on the GPU

Alpha-Beta

Enhances Mini-Max

Elegant, efficient

— Prunes nodes

Perfect game

— Possible in cases
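A minimal Alpha-Beta sketch shows where the pruning comes from. It is not the deck's kernel: `Node`, `sampleTree` and the `visited` counter are hypothetical, and the tree uses the six second-level values from the Principal Variation slide as leaves (an assumption about how they map to C through H).

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <vector>

struct Node {
    int score = 0;
    std::vector<Node> children;
};

// Alpha-Beta over a (alpha, beta) window; `visited` counts evaluated leaves.
int alphaBeta(const Node& n, int alpha, int beta, bool maximizing, int& visited) {
    assert(alpha < beta);
    if (n.children.empty()) { ++visited; return n.score; }
    int best = maximizing ? INT_MIN : INT_MAX;
    for (const Node& c : n.children) {
        int v = alphaBeta(c, alpha, beta, !maximizing, visited);
        if (maximizing) { best = std::max(best, v); alpha = std::max(alpha, v); }
        else            { best = std::min(best, v); beta = std::min(beta, v); }
        if (alpha >= beta) break;  // remaining siblings cannot change the result
    }
    return best;
}

Node leaf(int score) { Node n; n.score = score; return n; }

Node sampleTree() {
    Node a, b, root;
    for (int s : {7, 4, 5}) a.children.push_back(leaf(s));
    for (int s : {8, 1, 6}) b.children.push_back(leaf(s));
    root.children = {a, b};
    return root;
}
```

On this tree the search returns 4 after evaluating five of the six leaves: once B sees the value 1 with alpha already at 4, its last child is skipped.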

Page 10: Playing Zero-Sum Games on the GPU

Pruning

[Figure: the same game tree annotated with search-window values during the Alpha-Beta walk. Annotations as extracted: ∞ ∞; ∞ ∞; ∞ 7; ∞ ∞; 7 ∞ 4 ∞; 7 ∞ 7 4; 4 ∞ 4 5; 1 4; ∞ 4]

Page 11: Playing Zero-Sum Games on the GPU

Optimization

Aspiration Search

• Tree value known

• Narrow window

• Re-search if fails
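The three Aspiration Search bullets can be sketched as a thin wrapper around a fail-soft Alpha-Beta. The names (`Node`, `aspiration`, `guess`, `delta`, `searches`) and the choice to re-search with a full window are assumptions made for illustration; real engines often widen the window gradually instead.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <vector>

struct Node {
    int score = 0;
    std::vector<Node> children;
};

// Fail-soft Alpha-Beta: the result may lie outside the window on a failure.
int alphaBeta(const Node& n, int alpha, int beta, bool maximizing) {
    if (n.children.empty()) return n.score;
    int best = maximizing ? INT_MIN : INT_MAX;
    for (const Node& c : n.children) {
        int v = alphaBeta(c, alpha, beta, !maximizing);
        if (maximizing) { best = std::max(best, v); alpha = std::max(alpha, v); }
        else            { best = std::min(best, v); beta = std::min(beta, v); }
        if (alpha >= beta) break;
    }
    return best;
}

// Search a narrow window around a guessed tree value; fall back to the
// full window when the result lands on or outside the window bounds.
int aspiration(const Node& root, int guess, int delta, int& searches) {
    assert(delta > 0);
    int alpha = guess - delta, beta = guess + delta;
    ++searches;
    int v = alphaBeta(root, alpha, beta, true);
    if (v <= alpha || v >= beta) {  // fail low or fail high
        ++searches;
        v = alphaBeta(root, INT_MIN, INT_MAX, true);
    }
    return v;
}

Node leaf(int score) { Node n; n.score = score; return n; }

Node sampleTree() {
    Node a, b, root;
    for (int s : {7, 4, 5}) a.children.push_back(leaf(s));
    for (int s : {8, 1, 6}) b.children.push_back(leaf(s));
    root.children = {a, b};
    return root;
}
```

A good guess (4 on this sample tree) finishes in one narrow search; a poor guess (10) fails low and costs a second, full-window search.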

Principal Variation Search

• Non PV nodes

• beta = alpha + 1

• Full window PV nodes

• Perfect ordered tree
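A negamax-style sketch of Principal Variation Search follows: the first child gets the full window, later children get the null window (beta = alpha + 1 from the side to move), and a full re-search happens only when the null-window probe lands inside the window. `Node`, `kInf`, `pvs` and `sampleTree` are illustrative names, not the deck's code.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Node {
    int score = 0;  // leaf score from the maximizer's point of view
    std::vector<Node> children;
};

const int kInf = 1000000;  // stands in for infinity; avoids overflow on negation

// color is +1 when the maximizer is to move, -1 otherwise.
int pvs(const Node& n, int alpha, int beta, int color) {
    assert(alpha < beta);
    if (n.children.empty()) return color * n.score;
    bool first = true;
    for (const Node& c : n.children) {
        int v;
        if (first) {
            v = -pvs(c, -beta, -alpha, -color);      // PV node: full window
            first = false;
        } else {
            v = -pvs(c, -alpha - 1, -alpha, -color); // non-PV node: null window
            if (v > alpha && v < beta)
                v = -pvs(c, -beta, -alpha, -color);  // probe failed high: re-search
        }
        alpha = std::max(alpha, v);
        if (alpha >= beta) break;
    }
    return alpha;
}

Node leaf(int score) { Node n; n.score = score; return n; }

Node sampleTree() {
    Node a, b, root;
    for (int s : {7, 4, 5}) a.children.push_back(leaf(s));
    for (int s : {8, 1, 6}) b.children.push_back(leaf(s));
    root.children = {a, b};
    return root;
}
```

When the first move at every node is the best one (a perfectly ordered tree), no re-search is ever triggered, which is the case the slide's last bullet refers to.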

Iterative Deepening

• Fixed depth searches

• Transposition table

• Hash branch positions [Marsland, T. A. 1986]

Page 12: Playing Zero-Sum Games on the GPU

Parallelism

Principal Variation Split

• Strongly ordered tree

• Synchronization bound

• Load imbalance

Young Brothers Wait Concept

• Parallel at any node

• Processor owns node

• Scales to # processors

Dynamic Tree Splitting

• Processors share node

• Global job list

• Reasonable speedup

[Feldmann, R. 1993]

Page 13: Playing Zero-Sum Games on the GPU

Challenges

Deep recursion, limited stack

Divergent, irregular threads

Dynamic parallelism

Low arithmetic intensity

Page 14: Playing Zero-Sum Games on the GPU

Implementation

Kernel for each

— Mini-Max, Alpha-Beta

Board C++ class

— Rules specific

Games

3D Tic-Tac-Toe Connect-4 Reversi

Page 15: Playing Zero-Sum Games on the GPU

Board

Cells

Player

Successors

Move

Query

Winner

Full

Manipulate

Update

Undo
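The interface listed on this slide might look as follows. This is a sketch only: the deck does not show the class, so every signature here is an assumption, and a 3x3 Tic-Tac-Toe board stands in for the deck's 3D boards to keep the example short.

```cpp
#include <array>
#include <cassert>
#include <vector>

class Board {
public:
    static constexpr int kCells = 9;

    // Side to move: +1 or -1.
    int player() const { return player_; }

    // Legal moves: indices of the empty cells.
    std::vector<int> successors() const {
        std::vector<int> moves;
        for (int i = 0; i < kCells; ++i)
            if (cells_[i] == 0) moves.push_back(i);
        return moves;
    }

    // Query: +1 or -1 if that player owns a full line, 0 otherwise.
    int winner() const {
        static const int kLines[8][3] = {{0, 1, 2}, {3, 4, 5}, {6, 7, 8}, {0, 3, 6},
                                         {1, 4, 7}, {2, 5, 8}, {0, 4, 8}, {2, 4, 6}};
        for (const auto& l : kLines) {
            int s = cells_[l[0]] + cells_[l[1]] + cells_[l[2]];
            if (s == 3) return 1;
            if (s == -3) return -1;
        }
        return 0;
    }

    // Query: true when no empty cell remains.
    bool full() const { return moves_ == kCells; }

    // Manipulate: play a move for the side to move.
    void update(int cell) {
        assert(cell >= 0 && cell < kCells && cells_[cell] == 0);
        cells_[cell] = player_;
        player_ = -player_;
        ++moves_;
    }

    // Manipulate: take back a move, restoring the side to move.
    void undo(int cell) {
        assert(cell >= 0 && cell < kCells && cells_[cell] != 0);
        cells_[cell] = 0;
        player_ = -player_;
        --moves_;
    }

private:
    std::array<int, kCells> cells_{};  // 0 = empty, +1 / -1 = owner
    int player_ = 1;
    int moves_ = 0;
};
```

Pairing `update` with `undo` lets a search walk the tree on a single board instance instead of copying the board at every node.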

Page 16: Playing Zero-Sum Games on the GPU

Stack

Recursion depth >1000

Greedy allocation

Hybrid design

Local Memory (Runtime/Compiler)

• Local variables

• Function parameters

Global Memory (User)

• Successors
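One way to keep deep recursion off the hardware stack is to drive the search from a user-managed stack of frames. The sketch below does this for Mini-Max on the host; the `std::vector` stands in for a buffer that a kernel would place in global memory, and `Node`, `Frame` and `minimaxIterative` are hypothetical names rather than the deck's design.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

struct Node {
    int score = 0;
    std::vector<Node> children;
};

// One user-managed stack frame: which node, which successor comes next,
// and the best value backed up so far.
struct Frame {
    const Node* node;
    std::size_t next;
    int best;
    bool maximizing;
};

int minimaxIterative(const Node& root) {
    std::vector<Frame> stack;
    stack.push_back({&root, 0, INT_MIN, true});
    while (true) {
        assert(!stack.empty());
        Frame& f = stack.back();
        if (f.next < f.node->children.size()) {
            // Descend: read what we need from f before push_back invalidates it.
            const Node* child = &f.node->children[f.next++];
            bool childMax = !f.maximizing;
            stack.push_back({child, 0, childMax ? INT_MIN : INT_MAX, childMax});
            continue;
        }
        // Node finished: pop it and fold its value into the parent frame.
        int value = f.node->children.empty() ? f.node->score : f.best;
        stack.pop_back();
        if (stack.empty()) return value;
        Frame& parent = stack.back();
        parent.best = parent.maximizing ? std::max(parent.best, value)
                                        : std::min(parent.best, value);
    }
}

Node leaf(int score) { Node n; n.score = score; return n; }

Node sampleTree() {
    Node a, b, root;
    for (int s : {7, 4, 5}) a.children.push_back(leaf(s));
    for (int s : {8, 1, 6}) b.children.push_back(leaf(s));
    root.children = {a, b};
    return root;
}
```

The stack never holds more frames than the tree is deep, so its size can be fixed up front when the maximum search depth is known.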

Page 17: Playing Zero-Sum Games on the GPU

Split

Thousands of working threads

Find highest cut nodes

Parallel node search

Resolve up to root

[Diagram: CPU (producer) and GPU (consumer) around the game tree; loop from game init to game over, foreach move]
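The split can be sketched on the host as three phases: collect the nodes at a split depth, search each one independently, then resolve the values up to the root. This is a simplification under stated assumptions: a fixed `splitDepth` replaces the deck's "highest cut nodes", a sequential loop stands in for the GPU threads, and all names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

struct Node {
    int score = 0;
    std::vector<Node> children;
};

int minimax(const Node& n, bool maximizing) {
    if (n.children.empty()) return n.score;
    int best = maximizing ? INT_MIN : INT_MAX;
    for (const Node& c : n.children) {
        int v = minimax(c, !maximizing);
        best = maximizing ? std::max(best, v) : std::min(best, v);
    }
    return best;
}

// Phase 1: gather the nodes at the split depth, left to right.
void collect(const Node& n, int depth, std::vector<const Node*>& out) {
    if (depth == 0 || n.children.empty()) { out.push_back(&n); return; }
    for (const Node& c : n.children) collect(c, depth - 1, out);
}

// Phase 3: back the per-job values up to the root, in the same order.
int resolve(const Node& n, int depth, bool maximizing,
            const std::vector<int>& values, std::size_t& next) {
    if (depth == 0 || n.children.empty()) return values[next++];
    int best = maximizing ? INT_MIN : INT_MAX;
    for (const Node& c : n.children) {
        int v = resolve(c, depth - 1, !maximizing, values, next);
        best = maximizing ? std::max(best, v) : std::min(best, v);
    }
    return best;
}

int splitSearch(const Node& root, int splitDepth) {
    std::vector<const Node*> jobs;
    collect(root, splitDepth, jobs);
    std::vector<int> values(jobs.size());
    bool jobMax = (splitDepth % 2 == 0);  // root maximizes, sides alternate
    // Phase 2: the jobs are independent; on a GPU each would be one thread.
    for (std::size_t i = 0; i < jobs.size(); ++i)
        values[i] = minimax(*jobs[i], jobMax);
    std::size_t next = 0;
    int result = resolve(root, splitDepth, true, values, next);
    assert(next == values.size());
    return result;
}

Node leaf(int score) { Node n; n.score = score; return n; }

Node sampleTree() {
    Node a, b, root;
    for (int s : {7, 4, 5}) a.children.push_back(leaf(s));
    for (int s : {8, 1, 6}) b.children.push_back(leaf(s));
    root.children = {a, b};
    return root;
}
```

Because the jobs share nothing in this form, no pruning information crosses between them; that is the gap a shared bound is meant to close.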

Page 18: Playing Zero-Sum Games on the GPU

Shared αβ

Kernel global scope

Check private αβ

Global atomic update
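The check-then-publish pattern can be sketched on the host with `std::atomic`. Here a compare-and-swap loop plays the role that CUDA's `atomicMax` would play in a kernel, and only the root's alpha is shared; the function names and the restriction to root children are assumptions made for illustration.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <climits>
#include <vector>

struct Node {
    int score = 0;
    std::vector<Node> children;
};

// Fail-soft Alpha-Beta on a private (alpha, beta) window.
int alphaBeta(const Node& n, int alpha, int beta, bool maximizing) {
    if (n.children.empty()) return n.score;
    int best = maximizing ? INT_MIN : INT_MAX;
    for (const Node& c : n.children) {
        int v = alphaBeta(c, alpha, beta, !maximizing);
        if (maximizing) { best = std::max(best, v); alpha = std::max(alpha, v); }
        else            { best = std::min(best, v); beta = std::min(beta, v); }
        if (alpha >= beta) break;
    }
    return best;
}

// Atomic maximum built from compare-and-swap.
void atomicMax(std::atomic<int>& target, int value) {
    int cur = target.load();
    while (cur < value && !target.compare_exchange_weak(cur, value)) {}
}

// Work of one thread: check the shared bound, search privately, publish.
void searchRootChild(const Node& child, std::atomic<int>& sharedAlpha) {
    int alpha = sharedAlpha.load();
    int v = alphaBeta(child, alpha, INT_MAX, false);
    atomicMax(sharedAlpha, v);
}

int searchRoot(const Node& root) {
    assert(!root.children.empty());
    std::atomic<int> sharedAlpha(INT_MIN);
    // Sequential here; a kernel would run these calls concurrently.
    for (const Node& c : root.children) searchRootChild(c, sharedAlpha);
    return sharedAlpha.load();
}

Node leaf(int score) { Node n; n.score = score; return n; }

Node sampleTree() {
    Node a, b, root;
    for (int s : {7, 4, 5}) a.children.push_back(leaf(s));
    for (int s : {8, 1, 6}) b.children.push_back(leaf(s));
    root.children = {a, b};
    return root;
}
```

A stale read of the shared bound only weakens pruning; it does not change the result, because a child that fails low returns a value no greater than the bound it was given.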

Page 19: Playing Zero-Sum Games on the GPU

Limitations

Stack allocation

— Bounds parallelism

Split constraints

Tic-Tac-Toe Threads

Depth   3x3x3    4x4x4     5x5x5
1       650      3906      15252
2       15600    238266    1860744
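The thread counts in the table match a simple product: with N cells on the board, depth d gives (N-1)(N-2)...(N-d-1). For example, 63 x 62 = 3906 for 4x4x4 at depth 1. The sketch below reproduces the table; the interpretation (one thread per node after expanding d+1 plies from a board with one cell taken) is an assumption, and `splitThreads` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Number of threads for a board with `cells` cells at split depth `depth`,
// computed as the product (cells-1)(cells-2)...(cells-depth-1).
std::uint64_t splitThreads(int cells, int depth) {
    assert(depth >= 0 && cells > depth + 1);
    std::uint64_t threads = 1;
    for (int k = 1; k <= depth + 1; ++k)
        threads *= static_cast<std::uint64_t>(cells - k);
    return threads;
}
```

The count grows by roughly a factor of N per extra level of split depth, which is why stack allocation per thread ends up bounding the usable parallelism.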

Page 20: Playing Zero-Sum Games on the GPU

Methodology

CUDA Toolkit 3.1, Windows

Processors

GPU      SMs   Warps/SM   Clocks (MHz)     L1/Shared (KB)   L2 (KB)
GTX480   15    2          723/1446/1796    48/16            640

CPU      Cores   Clocks (MHz)     L1/L2 (KB)
i7-940   8       2942/(3*1066)    32/8192

Page 21: Playing Zero-Sum Games on the GPU

Single Game

[Chart: 4x4x4 Tic-Tac-Toe; Seconds/Move (0 to 2.5, lower is better) vs. Threads/Move (3906, 3660, 3422, 3192, 2970, 2756, 2550); series: Naïve Split, Shared αβ]

Page 22: Playing Zero-Sum Games on the GPU

Simultaneous Matches

[Chart: 4x4x4 Tic-Tac-Toe; Average Seconds/Move (lower is better) and Speedup (higher is better) vs. Matches (1, 16, 128, 1024, 4096, 16384); series: GTX480, i7 8 Threads]

Page 23: Playing Zero-Sum Games on the GPU

Game Analysis

[Chart: 4x4x4 Tic-Tac-Toe, 4K Matches; Seconds per Move (0 to 8, lower is better) vs. Move # (1 to 23); series: GTX480, i7 8 Threads]

Page 24: Playing Zero-Sum Games on the GPU

Future Work

Data packing

Backtracking search

Sudoku, toy game

— Generator, solver

Multi recursion

Page 25: Playing Zero-Sum Games on the GPU

GPU Performance

Metric                         Game          Dimension   Speedup
Shared αβ vs. Naïve Split      Tic-Tac-Toe   4x4x4       13.37X mean
Simultaneous Matches vs. CPU   Tic-Tac-Toe   4x4x4       5.22X @ 16K
                               Connect-4     7x6         6.26X @ 32K
                               Reversi       8x8         5.96X @ 16K

Page 26: Playing Zero-Sum Games on the GPU

Summary

Efficient hybrid stack

Dynamic parallelism

Tegra, Tesla solution

3D Chess on GPU!

Courtesy freegames4all

                Deep Blue   Tesla
$/(Moves/Sec)   0.5         0.02

Page 27: Playing Zero-Sum Games on the GPU

Thank You!

Questions?

Page 29: Playing Zero-Sum Games on the GPU

Backup

Page 30: Playing Zero-Sum Games on the GPU

Simultaneous Matches (1)

[Chart: 7x6 Connect-4; Average Seconds/Move (lower is better) and Speedup (higher is better) vs. Matches (1, 16, 128, 1024, 4096, 16384, 32768); series: GTX480, i7 8 Threads]

Page 31: Playing Zero-Sum Games on the GPU

Simultaneous Matches (2)

[Chart: 8x8 Reversi; Average Seconds/Move (lower is better) and Speedup (higher is better) vs. Matches (1, 16, 128, 1024, 4096, 16384); series: GTX480, i7 8 Threads]

Page 32: Playing Zero-Sum Games on the GPU

Simultaneous Puzzles

[Chart: 9x9 Sudoku; Seconds (lower is better) and Speedup (higher is better) vs. Puzzles (1, 16, 128, 1024, 4096, 16384); series: GTX480, i7 8 Threads]