SuperJong - Botzone

SuperJong

Yata

2021.01

Outline

1. Overview

2. Feature & Model Design

3. Training Methods

4. Result

Overview

Reinforcement Learning

Deep CNN

Global value

Self-play

Zero start

Asynchronous training

Feature & Model Design

Visible information

Self hands

Self chow

Self pong, kong

Self Drops

Others chow

Others pong, kong

Drops

Reflect table

T1 T2 T3 T4 T5 T6 T7 T8 T9
W1 W2 W3 W4 W5 W6 W7 W8 W9
B1 B2 B3 B4 B5 B6 B7 B8 B9
F1 F2 F3 F4 J1 J2 J3 -  -

Hand tiles are mapped through the reflect table to an image-like feature (a 4 x 9 binary plane), for example:

1 1 1 0 0 0 0 0 0
1 0 1 0 1 1 0 0 0
0 0 0 0 0 1 1 0 0
0 0 0 1 0 0 0 0 0
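The mapping from tiles to an image-like feature can be sketched as follows. This is a minimal illustration, not the author's implementation: the function name `encode_tiles`, the use of several stacked planes to represent tile counts, and the example hand are all assumptions. Only the 4 x 9 reflect table layout and the example plane come from the slide.

```python
from collections import Counter

# Reflect table from the slide: each tile name maps to a (row, column) cell.
ROWS = [[f"{suit}{n}" for n in range(1, 10)] for suit in ("T", "W", "B")]
ROWS.append(["F1", "F2", "F3", "F4", "J1", "J2", "J3", None, None])
REFLECT = {
    name: (r, c)
    for r, row in enumerate(ROWS)
    for c, name in enumerate(row)
    if name is not None
}

def encode_tiles(tiles, num_planes=4):
    """Return num_planes binary 4x9 planes; plane k marks tiles held more than k times.

    Stacking one plane per copy is an assumption; the slide only shows one plane.
    """
    planes = [[[0] * 9 for _ in range(4)] for _ in range(num_planes)]
    for name, count in Counter(tiles).items():
        r, c = REFLECT[name]
        for k in range(min(count, num_planes)):
            planes[k][r][c] = 1
    return planes

# Hypothetical tile set chosen so that plane 0 matches the example plane above.
example_tiles = ["T1", "T2", "T3", "W1", "W3", "W5", "W6", "B6", "B7", "F4"]
example_planes = encode_tiles(example_tiles)
for row in example_planes[0]:
    print(" ".join(str(v) for v in row))
```

The same encoder could be applied to each item of visible information (chows, pongs and kongs, drops) to build a stack of planes for a CNN.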

Invisible information

Other hands

Remaining deck

Image-like features

Feature & Model Design

Shanhu features

• Shanhu: the shortest distance from the current hand to a winning hand
• Need tiles: the tiles needed to turn the current hand into the winning hand
• No-need tiles: the tiles in the current hand that the winning hand does not need

[Figure: current hand, winning hand, need tiles, no-need tiles]
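The three Shanhu features can be illustrated with a multiset difference. This is a simplified sketch under stated assumptions: the candidate winning hands are supplied by the caller (enumerating them is the hard part and is not shown), the distance is taken to be the number of need tiles, and the names `diff_to_target` and `shanhu_features` are hypothetical.

```python
from collections import Counter

def diff_to_target(current, target):
    """Compare the current hand with one candidate winning hand."""
    cur, tgt = Counter(current), Counter(target)
    need = sorted((tgt - cur).elements())      # tiles the target has and we lack
    no_need = sorted((cur - tgt).elements())   # tiles we hold that the target does not use
    return len(need), need, no_need

def shanhu_features(current, candidate_targets):
    """Pick the candidate with the smallest distance (assumed: number of need tiles)."""
    return min(
        (diff_to_target(current, t) for t in candidate_targets),
        key=lambda result: result[0],
    )

# Toy example with 4-tile "hands" to keep the output readable.
current = ["T1", "T2", "T4", "W5"]
candidates = [
    ["T1", "T2", "T3", "W5"],
    ["W5", "W6", "W7", "T1"],
]
distance, need, no_need = shanhu_features(current, candidates)
print(distance, need, no_need)
```

In a real agent the candidate set would come from a search over legal winning patterns; the sketch only shows how the need and no-need tiles follow once a target is fixed.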

Feature & Model Design

Model Design

• Architecture: a single end-to-end neural network
• Input: all relevant information
• Output: probabilities of all actions, and the state value
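The two outputs of such a network, action probabilities and a state value, can be sketched as two heads on a shared hidden vector. This is a toy illustration in plain Python, not the SuperJong architecture: the linear heads, the `tanh` squashing of the value, and all weights and names are assumptions.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def policy_value_heads(hidden, policy_weights, value_weights):
    """Map a shared hidden vector to (action probabilities, state value).

    policy_weights holds one weight row per action; value_weights is a single row.
    """
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in policy_weights]
    value = math.tanh(sum(w * h for w, h in zip(value_weights, hidden)))
    return softmax(logits), value

# Hypothetical 3-dimensional hidden vector and 4 actions.
hidden = [0.5, -1.0, 2.0]
policy_weights = [
    [0.1, 0.0, 0.3],
    [0.0, 0.2, -0.1],
    [-0.3, 0.1, 0.0],
    [0.2, 0.2, 0.2],
]
value_weights = [0.4, -0.2, 0.1]
probs, value = policy_value_heads(hidden, policy_weights, value_weights)
print(probs, value)
```

Sharing one body between both heads means a single forward pass yields both the policy used to act and the value used for advantage estimation.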

Training Methods

PPO

Self play from zero

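$$
J_{\mathrm{PPO}}^{\theta^k}(\theta) \approx \sum_{(s_t, a_t)} \min\left( \frac{p_\theta(a_t \mid s_t)}{p_{\theta^k}(a_t \mid s_t)} A^{\theta^k}(s_t, a_t),\ \operatorname{clip}\left( \frac{p_\theta(a_t \mid s_t)}{p_{\theta^k}(a_t \mid s_t)},\ 1 - \varepsilon,\ 1 + \varepsilon \right) A^{\theta^k}(s_t, a_t) \right)
$$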
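The clipped PPO objective can be evaluated sample by sample as in the sketch below. The function name `ppo_clip_objective` and the default `eps=0.2` are assumptions; the slides do not state the value of ε used in training.

```python
def ppo_clip_objective(new_probs, old_probs, advantages, eps=0.2):
    """Sum of min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A) over samples.

    new_probs / old_probs: probability of the taken action under the current
    and the data-collecting policy; advantages: advantage estimates.
    """
    total = 0.0
    for p_new, p_old, adv in zip(new_probs, old_probs, advantages):
        ratio = p_new / p_old
        clipped = max(1.0 - eps, min(ratio, 1.0 + eps))
        total += min(ratio * adv, clipped * adv)
    return total

# Positive advantage: the gain from raising the probability is capped at 1 + eps.
capped_gain = ppo_clip_objective([0.6], [0.3], [1.0])
# Negative advantage: the clipped term is the smaller one, so it is selected.
capped_loss = ppo_clip_objective([0.1], [0.5], [-1.0])
print(capped_gain, capped_loss)
```

Taking the minimum of the two terms removes any incentive to move the probability ratio outside the interval [1 - ε, 1 + ε] in the direction that would increase the objective.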

Training Methods

Reward Norm

• Makes the training process stable
• Prevents the AI from being too greedy

Input: R

Output: R_norm

function getNormReward(R):
    if dianpao:
        R_norm = -2
    elif lose:
        R_norm = -1
    else:
        R_norm = [expression garbled in source]
    return R_norm
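The branching structure of the reward normalization can be written as runnable Python. Here dianpao means discarding the tile that another player wins on. The expression used in the final branch is not recoverable from the slide, so the sketch takes it as a caller-supplied function `win_norm`; the square root used in the usage lines is only a placeholder for a compressive mapping and is not the author's formula.

```python
import math

def get_norm_reward(raw_reward, dianpao, lose, win_norm):
    """Normalize a raw game reward following the branches on the slide.

    win_norm is a hypothetical stand-in for the expression lost from the source.
    """
    if dianpao:
        return -2.0
    if lose:
        return -1.0
    return win_norm(raw_reward)

# Usage with a placeholder mapping (assumption, not taken from the slides).
r_dianpao = get_norm_reward(-24, dianpao=True, lose=True, win_norm=math.sqrt)
r_lose = get_norm_reward(-8, dianpao=False, lose=True, win_norm=math.sqrt)
r_win = get_norm_reward(16, dianpao=False, lose=False, win_norm=math.sqrt)
print(r_dianpao, r_lose, r_win)
```

Fixing the penalties at -2 and -1 keeps losses on a bounded scale regardless of the raw score, which is consistent with the stated goal of stabilizing training.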

Result

● Rank 1 in the final round

Thanks

Yata

2021.01