falcon on a cloudy day
DESCRIPTION
Falcon on a Cloudy Day. A Ro Sham Bo Algorithm by Andrew Post. Let's Review. If you missed my previous presentation: Ro Sham Bo = Rock Paper Scissors. Can be more complicated, though. Ro Sham Bo has important applications. Algorithms compete at Ro Sham Bo in tournaments.
TRANSCRIPT
Falcon on a Cloudy Day
A Ro Sham Bo Algorithm
by Andrew Post
Let's Review
If you missed my previous presentation:
  Ro Sham Bo = Rock Paper Scissors
    Can be more complicated, though
  Ro Sham Bo has important applications
  Algorithms compete at Ro Sham Bo in tournaments
  Iocaine Powder is the world champ of Ro Sham Bo
    Because it uses ‘Sicilian Reasoning’
  I will beat Iocaine Powder
    Eventually…
What is Ro Sham Bo?
Also known as Rock Paper Scissors
What is Ro Sham Bo?
Generalized case of Rock Paper Scissors, actually
  Not always three choices
  Ties can be resolved differently
  The game is not necessarily zero-sum
Why does it matter?
Many competitive scenarios involve a Ro Sham Bo
Example: CBS and NBC choosing Primetime TV Shows
  They can choose to show a Drama, Comedy, or Sports show
  Viewers prefer Comedy to Drama, Sports to Comedy, and Drama to Sports, given the choice.
  Neither station knows ahead of time what the other will choose
Billions of dollars every day rely on decisions like these.
How it works
Simplest Non-Cooperative Game
  Players cannot play to ensure they both win
Governed by the Nash Equilibrium
  There are strategies which cannot be dominated
  http://www.youtube.com/watch?v=pdrBDfRvpBA
  1:31 -- 2:20
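As a rough illustration of the equilibrium idea, the sketch below computes the expected score of a uniform random strategy against an arbitrary opponent mix. It assumes the standard three-move game with a +1 / 0 / -1 payoff for win / tie / loss, which the generalized game described earlier need not follow; the function names and the `rock_heavy` opponent are made up for the example.

```python
from fractions import Fraction

MOVES = ("rock", "paper", "scissors")  # index i is beaten by index (i + 1) % 3

def payoff(mine: int, theirs: int) -> int:
    """+1 if `mine` beats `theirs`, -1 if it loses, 0 on a tie."""
    diff = (mine - theirs) % 3
    return 1 if diff == 1 else (-1 if diff == 2 else 0)

def expected_score(my_mix, their_mix):
    """Expected payoff per round when both players draw from fixed mixes."""
    return sum(p * q * payoff(i, j)
               for i, p in enumerate(my_mix)
               for j, q in enumerate(their_mix))

uniform = [Fraction(1, 3)] * 3
# Hypothetical opponent that plays rock 60% of the time.
rock_heavy = [Fraction(3, 5), Fraction(1, 5), Fraction(1, 5)]
always_paper = [Fraction(0), Fraction(1), Fraction(0)]

print(expected_score(uniform, rock_heavy))       # uniform play breaks even
print(expected_score(always_paper, rock_heavy))  # exploiting the bias scores more
```

Under these assumptions the uniform mix breaks even against any opponent, which is why it cannot be exploited but also cannot win a tournament on its own.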
How to Win
As you just heard, playing randomly can ensure you don’t lose, but how do you win?
How to predict your opponent
  Sub-Optimal Frequency Distributions
  Pattern Matching
  History Analysis
Iocaine Powder
International Ro Sham Bo Programming Tournament Champion
Named for this famous scene:
  http://youtube.com/watch?v=TUee1WvtQZU
  0:57 -- 2:20
The Tournament
Tournament programs play thousands of rounds
Win by beating the most opponents by a large margin
Most programs play sub-optimally, so exploiting your opponent is more important than playing randomly to avoid losing.
Iocaine Powder
IP is the algorithm which does this best.
IP uses the same heuristics to predict what an opponent is most likely to do.
Using the same tools, how can you be better?
Sicilian Reasoning!
Sicilian Reasoning
Levels of second guessing:
1. Opponent will play rock, so play paper
2. Opponent knows you will counter rock with paper, and play scissors – so play rock
3. Opponent knows all this, and will now play paper to beat your rock – so play scissors
4. Opponent will play rock again – same as 1
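The levels above can be sketched as repeated countering. This is only an illustration, assuming the standard three-move game; `beats` and `sicilian_level` are hypothetical helper names, not taken from Iocaine Powder itself.

```python
MOVES = ("rock", "paper", "scissors")

def beats(move: str) -> str:
    """Return the move that beats `move`."""
    return MOVES[(MOVES.index(move) + 1) % 3]

def sicilian_level(predicted_opponent_move: str, level: int) -> str:
    """Move suggested by level 1, 2, 3, ... of second guessing."""
    play = beats(predicted_opponent_move)  # level 1: counter the prediction
    for _ in range(level - 1):
        # The opponent counters our play, so we counter their counter.
        play = beats(beats(play))
    return play

for level in (1, 2, 3, 4):
    print(level, sicilian_level("rock", level))
```

With a prediction of rock, levels 1 to 3 give paper, rock, and scissors, and level 4 wraps back around to paper, matching the list.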
Sicilian Reasoning
Use your predictive strategies to evaluate what is going to happen next.
Run SR on yourself and your opponent, and keep a table of what each of the six levels of reasoning say you should do.
Pick the level of reasoning which would have won against what your opponent actually did the most often.
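One way to picture the table is as a running count of would-be wins per level. The sketch below is a guess at the bookkeeping, not the tournament code: `best_level` and the sample history are invented for illustration, and only three levels are shown to keep the example short.

```python
from collections import Counter

# Each key is beaten by its value.
BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

def best_level(suggestions_per_round, opponent_moves):
    """Return the level whose past suggestions would have won most often.

    suggestions_per_round: list of dicts {level: suggested_move}, one per round.
    opponent_moves: what the opponent actually played in those rounds.
    """
    wins = Counter()
    for suggestions, actual in zip(suggestions_per_round, opponent_moves):
        for level, move in suggestions.items():
            if move == BEATS[actual]:
                wins[level] += 1
    # Ties go to the lowest level number (an arbitrary choice for this sketch).
    return max(sorted(suggestions_per_round[-1]), key=lambda lvl: wins[lvl])

# Hypothetical three-round history.
table = [
    {1: "paper", 2: "rock", 3: "scissors"},
    {1: "rock", 2: "scissors", 3: "paper"},
    {1: "paper", 2: "rock", 3: "scissors"},
]
actual = ["scissors", "paper", "scissors"]
print(best_level(table, actual))
```

In this made-up history level 2 would have won every round, so it is the level to follow next.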
Wait, six? Don’t you mean three?
You can use the same predictive tools that your opponent uses to ‘predict’ what you are going to do.
Now you have three more levels of SR:
4. I will play rock. So he plays paper. So play scissors.
5. He knows I will counter with scissors, and play rock. So play paper.
6. He expects me to counter-counter with paper, and will play scissors. So play rock.
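Putting both halves together, all six candidate moves follow from two predictions: what the opponent will play and what they expect you to play. The sketch below assumes the standard three-move game; `rotate` and `six_candidates` are illustrative names only.

```python
MOVES = ("rock", "paper", "scissors")

def rotate(move: str, steps: int) -> str:
    """Step `move` forward through rock -> paper -> scissors; +1 beats it."""
    return MOVES[(MOVES.index(move) + steps) % 3]

def six_candidates(predicted_opponent: str, predicted_me: str) -> dict:
    """Map each SR level (1-6) to the move it recommends."""
    return {
        1: rotate(predicted_opponent, 1),  # beat their predicted move
        2: rotate(predicted_opponent, 0),  # they counter level 1; beat that
        3: rotate(predicted_opponent, 2),  # they counter level 2; beat that
        4: rotate(predicted_me, 2),        # they beat my predicted move; beat that
        5: rotate(predicted_me, 1),        # they counter level 4; beat that
        6: rotate(predicted_me, 0),        # they counter level 5; beat that
    }

print(six_candidates("rock", "rock"))
```

With both predictions set to rock this gives paper, rock, scissors for levels 1 to 3 and scissors, paper, rock for levels 4 to 6, as in the two lists.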
More Sicilian Reasoning
Just because one level of SR is winning now, doesn’t mean it always will be.
Opponents will change how they play if they are losing, so you must change too!
How do you switch your level of SR?
Switching Reasoning
SR-2 has just won the first 100 rounds
Opponent changes strategy
  You lose 50 rounds before SR-4 has more than 100 theoretical wins.
  You just wasted 50 rounds!
Switching Reasoning
Use several different methodologies for switches
  Most wins in last 10, 25, 50, 100, 1000 rounds
  Has won the most in similar situations
  Causes the opponent to switch to a worse strategy
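The first methodology, most wins in the last N rounds, might look like the sketch below. The class and function names are hypothetical, and only the window sizes come from the slide. The scenario replays the earlier example in miniature: SR-2 wins 100 rounds, then the opponent changes strategy and SR-4 starts winning.

```python
from collections import deque

WINDOWS = (10, 25, 50, 100, 1000)  # window sizes listed on the slide

class WindowedScore:
    """Tracks how often one reasoning level would have won in recent rounds."""

    def __init__(self, windows=WINDOWS):
        # Keep only as much history as the largest window needs.
        self.history = deque(maxlen=max(windows))
        self.windows = windows

    def record(self, would_have_won: bool) -> None:
        self.history.append(1 if would_have_won else 0)

    def wins_in_last(self, n: int) -> int:
        return sum(list(self.history)[-n:])

def pick_level(scores: dict, window: int):
    """Level with the most would-be wins over the last `window` rounds."""
    return max(sorted(scores), key=lambda lvl: scores[lvl].wins_in_last(window))

scores = {2: WindowedScore(), 4: WindowedScore()}
for _ in range(100):            # SR-2 wins the first 100 rounds
    scores[2].record(True)
    scores[4].record(False)
for _ in range(20):             # opponent switches; SR-4 now wins
    scores[2].record(False)
    scores[4].record(True)

print(pick_level(scores, 1000))  # long window still favours SR-2
print(pick_level(scores, 10))    # short window has already moved to SR-4
```

A short window reacts quickly but is noisy, while a long one is stable but slow, which is one reason to keep several of them side by side.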
Switching Reasoning
Here is the real genius – now use the switching methodology which has helped you win the most rounds!
Falcon on a Cloudy Day
So you ask, how do you beat Iocaine Powder?
  Improve the basic predictive heuristics
  Extend Sicilian Reasoning
Improving Prediction
What I have implemented:
  Improved Variable History Analysis
    Look at just your history, your opponent’s, or both
  Improved Frequency Analysis
    EV[x] = Pr[x+2] - Pr[x+1]
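The frequency formula can be read as follows, under assumptions the slide does not spell out: moves are numbered 0 = rock, 1 = paper, 2 = scissors, so that move x+1 beats move x; indices wrap modulo 3; and Pr is the observed frequency of the opponent's moves. Playing x then wins against x+2 and loses to x+1. The helper names below are illustrative.

```python
from collections import Counter

MOVES = ("rock", "paper", "scissors")  # x + 1 (mod 3) beats x

def expected_values(opponent_history):
    """EV[x] = Pr[x+2] - Pr[x+1], with Pr estimated from observed frequencies."""
    counts = Counter(opponent_history)
    total = len(opponent_history)
    if total == 0:
        return [0.0, 0.0, 0.0]  # no data yet: every move looks equally good
    pr = [counts[m] / total for m in MOVES]
    return [pr[(x + 2) % 3] - pr[(x + 1) % 3] for x in range(3)]

def best_move(opponent_history):
    """Move with the highest expected value against the observed frequencies."""
    ev = expected_values(opponent_history)
    return MOVES[max(range(3), key=lambda x: ev[x])]

# Hypothetical opponent history: 60% rock, 30% paper, 10% scissors.
history = ["rock"] * 6 + ["paper"] * 3 + ["scissors"] * 1
print(expected_values(history))
print(best_move(history))
```

Against this rock-heavy sample, paper has the highest expected value, as one would expect.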
Demonstration
Here is how my project does with what is implemented so far.
Improving Prediction
What I have not implemented yet:
  Improved Pattern Matching
    Markov Models with MegaHAL
  Extended Sicilian Reasoning
More on MegaHAL
MegaHAL is a very simple "infinite-order" Markov model.
Stores frequency information about the moves the opponent has made in the past for all possible contexts
Using the ‘context’ of the last few moves, the “appropriate” response is then selected.
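A context-based predictor in this spirit could be sketched as below. This is not the MegaHAL program: it is a simplified stand-in that caps the context length at a hypothetical `max_order` rather than being "infinite-order", and all names are invented for the example.

```python
from collections import Counter, defaultdict

BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

class ContextModel:
    """Counts which opponent move followed each recent context of moves."""

    def __init__(self, max_order: int = 3):
        self.max_order = max_order
        self.table = defaultdict(Counter)  # context tuple -> next-move counts

    def train(self, opponent_moves):
        for i, move in enumerate(opponent_moves):
            for order in range(1, self.max_order + 1):
                if i >= order:
                    context = tuple(opponent_moves[i - order:i])
                    self.table[context][move] += 1

    def predict(self, recent_moves):
        """Use the longest context seen before; return None if none match."""
        for order in range(min(self.max_order, len(recent_moves)), 0, -1):
            counts = self.table.get(tuple(recent_moves[-order:]))
            if counts:
                return counts.most_common(1)[0][0]
        return None

model = ContextModel()
model.train(["rock", "paper", "scissors"] * 4)   # hypothetical cycling opponent
prediction = model.predict(["rock", "paper"])
print(prediction, "->", BEATS[prediction])
```

After the cycling opponent has been observed, the context (rock, paper) predicts scissors, so the response is rock.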
Extended Sicilian Reasoning
Q: Isn’t Sicilian Reasoning complete at 6?
A: Yes, but there is information we are ignoring.
By compressing your strategy decisions into the idea of which of six strategies is best right now, you have no way to keep track of how changing your strategies has paid off best in the past.
Now for some Math
Hilbert Space
Game Trajectory and Game State
Projection Operators
Annotated History Analysis
Project Enigma