2004/11/13 GPW2004 1
What Shogi Programs Still Cannot Do - A New Test Set for Shogi -
Reijer Grimbergen and Taro Muraoka
Department of Informatics
Yamagata University
2004/11/13 GPW2004 2
Outline
The importance of testing
Test sets for chess
Test sets for shogi
A new test set for shogi
Problem area analysis
Some new results
Differences between humans and computers
Conclusions and future work
2004/11/13 GPW2004 3
The importance of testing
Game programming
• A program should play strongly
• More common is the reverse approach: minimize the number of bad moves
• Testing can help determine problem areas
Incremental testing
• Save positions that the program did not handle well
• Drawbacks
  • Test set is program-specific
  • Positions selected subjectively
2004/11/13 GPW2004 4
The importance of testing
The requirements of a test set
• Testing a wide variety of potential problem areas
• Not specific to one program
Test design in games
• Mainly done for chess
• Current test sets for shogi have shortcomings
• Shogi research is at a point where focusing the effort could be a great help
• Proposing a new test set for shogi
2004/11/13 GPW2004 5
Test sets for chess
The Bratko-Kopec test set
• 12 tactical positions and 12 strategic positions
• Designed to compare human and computer performance in chess
• Thus far, no program can solve all positions
Reinfeld’s Win at Chess
• 300 tactical positions
• Used as a first test for new programs
LCT II
• 35 positions
• Good balance between strategic, tactical and endgame positions
• An Elo rating can be calculated from the solved positions
The Lindner test set
• A set of positions that are considered hard for computers to solve
2004/11/13 GPW2004 6
Test sets for shogi
The Matsubara-Iida test set
• 48 positions taken from professional games
• Selected by an expert player
• Aims at judging the strength of shogi programs
• First given to human players to establish a connection with playing strength
Problems with the Matsubara-Iida test set
• Program strength can be judged more accurately by playing on the Internet
• No Elo calculation as in LCT II
• Subjective selection leaves doubts about test balance
• What is difficult for computers is not necessarily difficult for humans and vice versa, so the connection with playing strength is unreliable
2004/11/13 GPW2004 7
Test sets for shogi
Other test sets for shogi
• Yamashita’s test set (10 positions)
• Tanase’s test set (19 positions)
Problems with these test sets
• Too small
• Program-specific
• Unclear if there is only one solution
2004/11/13 GPW2004 8
A new test set for shogi
What do we want from a test set?
1. As general as possible
2. Points to as many problem areas as possible
Find positions that cannot be solved by the best programs
Finding weaknesses instead of measuring strength
2004/11/13 GPW2004 9
A new test set for shogi
Positions selected from Shukan Shogi
• Every week six next-move problems
• Middle game positions and endgame positions
• Different tactical themes: winning material, attack, defense and mating
• Our goal: create a test set of 100 positions
The programs we used
• AI Shogi 2003
• Todai Shogi 5
• Gekisashi 2
Conditions
• 30 seconds on a 2 GHz Pentium 4
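The selection step can be sketched as a simple filter. This is a minimal illustration, not the authors' tooling: it assumes a position qualifies for the test set when none of the programs plays the first move of the published solution, and it models each program as a hypothetical function from a problem label to the move chosen within the time limit. The replies for problem 750-3 are the ones reported later in the talk; the "demo-1" problem and its moves are invented for contrast.

```python
from typing import Callable, Dict, List, NamedTuple


class Problem(NamedTuple):
    problem_id: str     # label as used on the slides, e.g. "750-3"
    solution_move: str  # first move of the published solution


# Hypothetical engine interface: maps a problem id to the move the
# program chooses within the time limit (30 seconds in the talk).
Engine = Callable[[str], str]


def unsolved_by_all(problems: List[Problem], engines: Dict[str, Engine]) -> List[Problem]:
    """Return the problems for which no engine plays the solution move."""
    return [
        p for p in problems
        if all(play(p.problem_id) != p.solution_move for play in engines.values())
    ]


# Canned replies standing in for real engine runs. The "750-3" replies are
# taken from the talk; "demo-1" and its moves are invented for illustration.
replies = {
    "Todai Shogi 5": {"750-3": "1五歩", "demo-1": "7六歩"},
    "Gekisashi 2": {"750-3": "3二角成", "demo-1": "2六歩"},
    "AI Shogi 2003": {"750-3": "3五金", "demo-1": "2六歩"},
}
engines = {name: table.__getitem__ for name, table in replies.items()}

problems = [Problem("750-3", "2四銀"), Problem("demo-1", "7六歩")]
kept = unsolved_by_all(problems, engines)
print([p.problem_id for p in kept])  # ['750-3']
```

Only 750-3 survives the filter, because one of the canned engines finds the solution move of the invented problem.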
2004/11/13 GPW2004 10
A new test set for shogi
This was not easy!
• More than 1500 positions needed to be checked to find our test set
Additional feature
• The percentage of respondents who solved the problem is given
• Differences between what is difficult for humans and difficult for computers
2004/11/13 GPW2004 11
Problem area analysis
Why are the positions difficult?
• Using the analysis tools in Todai Shogi, Gekisashi and AI Shogi to find problem areas
Our first analysis indicates seven problem areas
• Horizon effect due to consecutive checks
• Not calling the tsume shogi solver deep in the search tree
• Inaccurate evaluation function
• Incorrect forward pruning
• Mate with unpromoted pieces
• Insufficient hardware speed
• Problems with time allocation
2004/11/13 GPW2004 12
Problem area analysis
Horizon effect and tsume shogi
Problem 750-3
• Solved: 16%
Solution
• 2四銀、1四玉(同歩、2三金、同玉、3二角成)、3五金
Program replies
• Todai: 1五歩 (losing position)
• Gekisashi: 3二角成 (White has the advantage)
• AI Shogi: 3五金
2004/11/13 GPW2004 13
Problem area analysis
Horizon effect and tsume shogi
The problem
• Horizon checks after 2四銀、1四玉、3五金
• The same position without horizon checks can be solved by all programs
2004/11/13 GPW2004 14
Problem area analysis
Horizon effect and tsume shogi
Another problem: tsume shogi deep in the search tree
Gekisashi with more time
• 2四銀、1四玉、3五金、7九銀、同玉、2五桂、1五歩、同馬、同銀 (-1192)
• White has a mate in 9 after 同玉 and Black has a mate in 3 after 2五桂!
2004/11/13 GPW2004 15
Problem area analysis
Evaluation and forward pruning
Problem 755-3
• Solved: 51%
Solution
• 2二金、同金、2三角成、3三金、同馬
Program replies
• Todai: 2一角成、4一玉、6一金 (winning position)
• Gekisashi: 6八銀、5六成銀、3七桂、6六銀、2五桂、5四歩、2一角成、4一玉 (winning position for Black)
• AI Shogi: 6八銀、5八成銀、2一角成、4一玉
2004/11/13 GPW2004 16
Problem area analysis
Evaluation and forward pruning
The problem: an incorrect evaluation
• After 2一角成、4一玉 the white king can escape, but the programs cannot assess this
• Evaluating the chances of escaping an attack is difficult?
Another problem: forward pruning
• Consecutive sacrifices 2二金 and 2三角成
• Multiple sacrifices not searched deep enough?
2004/11/13 GPW2004 17
Problem area analysis
Unpromoted pieces
Problem 935-2
• Solved: 95%
Solution
• 1三歩不成、2六銀直、(1四歩 is illegal) 1四玉
Program replies
• Todai: 5二と (losing position)
• Gekisashi: 8四桂 (winning position for White)
• AI Shogi: resigns (!)
2004/11/13 GPW2004 18
Problem area analysis
Unpromoted pieces
The problem here seems to be a special case of forward pruning
• Promoting a major piece or a pawn is almost always better than not promoting
• Non-promotions of these pieces are pruned to improve search efficiency
Not a high priority problem, but it could have consequences for thinking in opponent time
• When promoting and not promoting a piece make no difference, an opponent who plays the non-promotion makes the program's thinking in opponent time useless, because the program pruned that move
• My advice: play the non-promotion to win some time!
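The pruning rule described on this slide can be sketched as a filter on generated moves. This is a hedged illustration and not the code of any of the tested programs: the `Move` type, the `ALWAYS_PROMOTE` set and `prune_non_promotions` are hypothetical names, and the silver moves in the example are invented. The pawn move from 1四 to 1三 corresponds to the 1三歩不成 solution of problem 935-2.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Move:
    piece: str        # "pawn", "bishop", "rook", "silver", ...
    from_square: str  # origin square, e.g. "1四"
    to_square: str    # destination square, e.g. "1三"
    promotes: bool


# Pieces whose promotion is "almost always better" according to the slide:
# the major pieces (rook and bishop) and the pawn.
ALWAYS_PROMOTE = {"pawn", "bishop", "rook"}


def prune_non_promotions(moves: List[Move]) -> List[Move]:
    """Drop a non-promoting move when the promoting twin of it exists."""
    promoting = {(m.piece, m.from_square, m.to_square) for m in moves if m.promotes}
    return [
        m for m in moves
        if m.promotes
        or m.piece not in ALWAYS_PROMOTE
        or (m.piece, m.from_square, m.to_square) not in promoting
    ]


candidates = [
    Move("pawn", "1四", "1三", promotes=True),
    Move("pawn", "1四", "1三", promotes=False),    # 1三歩不成, the solution move
    Move("silver", "2四", "2三", promotes=True),   # invented example moves
    Move("silver", "2四", "2三", promotes=False),
]
searched = prune_non_promotions(candidates)
print(len(searched))
```

With such a filter the unpromoted pawn move never reaches the search, which is one way a program could miss a solution that most human respondents find.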
2004/11/13 GPW2004 19
Problem area analysis
Other problem areas
Insufficient hardware speed
• Some positions could be solved by giving the program more time
• Improved hardware speed will automatically solve these positions
Time allocation
• In some positions, the programs would play very quickly
• These positions were deleted from our test set
• However, it might be a different problem area: when to cut off the search?
2004/11/13 GPW2004 20
Problem area analysis
Overview
Problem Area Positions
Insufficient hardware speed 31
Inaccurate evaluation function 20
Incorrect forward pruning 19
Horizon effect 18
Tsume shogi 11
Mate using unpromoted pieces 6
Reason unclear 7
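The counts in the overview can be tallied directly. The sketch below uses only the numbers from the table. Reading a total above 100 as a sign that some positions were assigned to more than one problem area is an assumption, not something the slide states.

```python
# Counts copied from the overview table.
problem_area_counts = {
    "Insufficient hardware speed": 31,
    "Inaccurate evaluation function": 20,
    "Incorrect forward pruning": 19,
    "Horizon effect": 18,
    "Tsume shogi": 11,
    "Mate using unpromoted pieces": 6,
    "Reason unclear": 7,
}

TEST_SET_SIZE = 100  # size of the proposed test set

total_assignments = sum(problem_area_counts.values())
# Assignments beyond one per position (assumed to come from positions
# that fall into more than one problem area).
extra_assignments = total_assignments - TEST_SET_SIZE

print(total_assignments)  # 112
print(extra_assignments)  # 12
```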
2004/11/13 GPW2004 21
Some new results
New program versions have been released
• Todai Shogi 6 and 7, Gekisashi 3 and AI Shogi 2004
Results of Todai 6 on the test set
• Solved 6 positions
• The problem areas of these positions were different
  • Inaccurate evaluation function (2 positions)
  • Insufficient hardware speed (2 positions)
  • Horizon effect (1 position)
  • Reason unclear (1 position)
2004/11/13 GPW2004 22
Differences between humans and computers
How difficult are the positions for human players?
• Almost half of the positions (46) can be solved by more than 50% of the human respondents
• There are 14 positions that computers cannot solve but more than 80% of the human respondents can
Human percentage  Positions
0 – 10% 0
11 – 20% 12
21 – 30% 18
31 – 40% 10
41 – 50% 13
51 – 60% 16
61 – 70% 7
71 – 80% 9
81 – 90% 9
91 – 100% 5
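The two figures quoted on this slide can be checked against the table. The sketch below copies the bins as they appear above; the helper name `positions_solved_by_more_than` is hypothetical.

```python
# (lower bound %, upper bound %, number of positions), copied from the table.
bins = [
    (0, 10, 0),
    (11, 20, 12),
    (21, 30, 18),
    (31, 40, 10),
    (41, 50, 13),
    (51, 60, 16),
    (61, 70, 7),
    (71, 80, 9),
    (81, 90, 9),
    (91, 100, 5),
]


def positions_solved_by_more_than(pct: int) -> int:
    """Count positions in bins that lie entirely above the given percentage."""
    return sum(count for low, _high, count in bins if low > pct)


print(positions_solved_by_more_than(50))  # 46
print(positions_solved_by_more_than(80))  # 14
```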
2004/11/13 GPW2004 23
Conclusions and future work
• We have proposed a set of 100 positions that is general and points to specific problem areas in computer shogi
• As more positions get solved, we intend to replace them with new positions
• Further investigation of the unsolved positions for which the problem could not be determined
• Making further comparisons between what is difficult for humans and difficult for computers