Theory and Practice of Artificial Intelligence
Further Games
Daniel Polani
School of Computer Science, University of Hertfordshire
March 9, 2017
All rights reserved. Permission is granted to copy and distribute these slides in full or in part for purposes of research, education as well as private use, provided that author, affiliation and this notice is retained. Some external illustrations may be copyrighted and are included here under “fair use” for educational illustration only. Use as part of home- and coursework is only allowed with express permission by the responsible tutor and, in this case, is to be appropriately referenced.
Theory and Practice of Artificial Intelligence 53 / 150
Obligatory XKCD
https://xkcd.com/1002/ (CC BY-NC 2.5)
UCT Monte Carlo Tree Search I
one of the great breakthroughs in game AI
based on the exploration/exploitation tradeoff and regret minimization (Auer 2003)
generalized to trees (Kocsis and Szepesvari 2006)
Note: we do not have time for the full theory
we just sketch the method
UCT Monte Carlo Tree Search II (Browne 2012; Browne et al. 2012; Bradberry 2015)
Outset: consider an already expanded partial tree
assume every node contains
the sum of rewards $\sum_i V_i$ collected so far from the nodes beneath it
the number of runs $n$ that went through that node
for now, this is just a search; we will generalize to games later
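Below is a minimal sketch of how the two statistics each node carries might be stored. The names `Node`, `total_reward`, `visits` and `mean_reward` are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """One node of a partially expanded search tree (hypothetical layout)."""
    state: tuple                                   # problem state this node represents
    parent: Optional["Node"] = None                # None for the root
    children: dict = field(default_factory=dict)   # move -> child Node
    total_reward: float = 0.0                      # sum of rewards V_i collected beneath this node
    visits: int = 0                                # number of runs n that went through this node

    def mean_reward(self) -> float:
        """Average reward total_reward / visits (0 for a node never visited)."""
        return self.total_reward / self.visits if self.visits else 0.0


root = Node(state=())
root.total_reward, root.visits = 3.0, 4
print(root.mean_reward())  # 0.75
```

The sketch stores the sum rather than the average. Backpropagation then only has to add, and the average used by the selection rule is computed when needed.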
UCT Monte Carlo Tree Search III-XVII (Browne 2012; Browne et al. 2012; Bradberry 2015)
[Figure, built up over several slides: select]
the root carries the statistics $\sum_i V_i$ and $n$
each child $j$ of the current node is given an urgency, and the most urgent child is selected; with $V_j$ the child's average reward and $n_j$ its visit count, the urgency is
$V_j + C\sqrt{\frac{2 \ln n}{n_j}}$
selection then repeats from the chosen child $(V_j, n_j)$ to its child $(V_k, n_k)$, and so on, until a leaf of the partial tree is reached, here one of $(V_l, n_l)$ and $(V_m, n_m)$
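The selection step can be sketched as follows. This assumes a hypothetical `Node` type that stores a reward sum, a visit count and a list of children. The parameter `c` plays the role of $C$. Treating unvisited children as infinitely urgent is a common convention that the slides do not state.

```python
import math


class Node:
    """Hypothetical tree node: reward sum, visit count and child list."""

    def __init__(self, total_reward=0.0, visits=0, children=None):
        self.total_reward = total_reward
        self.visits = visits
        self.children = children if children is not None else []


def urgency(child, parent_visits, c=1.0):
    """UCB urgency V_j + c * sqrt(2 ln n / n_j), with V_j the child's mean reward."""
    if child.visits == 0:
        return math.inf  # unvisited children are tried first
    mean = child.total_reward / child.visits
    return mean + c * math.sqrt(2 * math.log(parent_visits) / child.visits)


def select(root, c=1.0):
    """Descend from the root to a leaf, always taking the most urgent child."""
    path = [root]
    node = root
    while node.children:
        parent = node
        node = max(parent.children, key=lambda ch: urgency(ch, parent.visits, c))
        path.append(node)
    return path


a = Node(total_reward=3.0, visits=4)   # mean 0.75, well explored
b = Node(total_reward=0.5, visits=1)   # mean 0.50, barely explored
root = Node(total_reward=3.5, visits=5, children=[a, b])
print(select(root)[-1] is b)          # True: the exploration bonus wins
print(select(root, c=0.0)[-1] is a)   # True: pure exploitation
```

The example shows the role of $C$. With the bonus switched off, the child with the better average always wins. With it on, the rarely visited child gets another try.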
UCT Monte Carlo Tree Search XVIII (Browne 2012; Browne et al. 2012; Bradberry 2015)
[Figure: if the selected leaf $l$ is terminal, its value $V_{term}$ is obtained directly and its statistics become $(V_{term}, n_l + 1)$]
UCT Monte Carlo Tree Search XIX-XXII (Browne 2012; Browne et al. 2012; Bradberry 2015)
[Figure: expand. If the selected leaf $l$ is not terminal, one of its unexplored children is added to the tree]
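Below is a sketch of the terminal check and the expansion step. It uses a hypothetical toy puzzle, not from the slides: a state is a tuple of bits, and the puzzle ends after three moves. The helper names `legal_moves` and `expand` are also assumptions.

```python
import random


class Node:
    """Hypothetical tree node for a toy puzzle whose states are tuples of moves."""

    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}  # move -> Node
        self.total_reward = 0.0
        self.visits = 0


def legal_moves(state):
    """Toy puzzle (an assumption): pick three bits, one per move."""
    return [] if len(state) == 3 else [0, 1]


def expand(node, rng=random):
    """Attach one unexplored child to a non-terminal leaf and return it."""
    if not legal_moves(node.state):
        return node  # terminal: nothing to expand, evaluate it directly
    untried = [m for m in legal_moves(node.state) if m not in node.children]
    if not untried:
        return node  # already fully expanded
    move = rng.choice(untried)
    child = Node(node.state + (move,), parent=node)
    node.children[move] = child
    return child


rng = random.Random(0)
root = Node(())
first = expand(root, rng)
second = expand(root, rng)
print(sorted(root.children))  # [0, 1]
```

The sketch expands only one child per iteration, as on the slides, so the tree grows gradually towards the promising regions.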
UCT Monte Carlo Tree Search XXIII-XXV (Browne 2012; Browne et al. 2012; Bradberry 2015)
[Figure: simulate. From the new child, moves are played out until a terminal state with value $V_{term}$ is reached; the new child then holds the statistics $(V_{term}, 1)$]
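The simulation step is often a purely random playout. The slides do not fix the playout policy, so choosing moves at random here is an assumption. The toy puzzle and its `reward` function are hypothetical as well.

```python
import random


def legal_moves(state):
    """Toy puzzle (an assumption): pick three bits, one per move."""
    return [] if len(state) == 3 else [0, 1]


def reward(state):
    """Hypothetical terminal reward: only the bit string (1, 0, 1) pays off."""
    return 1.0 if state == (1, 0, 1) else 0.0


def simulate(state, rng=random):
    """Random playout from `state` to a terminal state; returns V_term."""
    while legal_moves(state):
        state = state + (rng.choice(legal_moves(state)),)
    return reward(state)


rng = random.Random(1)
runs = [simulate((1, 0), rng) for _ in range(1000)]
print(sum(runs) / len(runs))  # close to 0.5: one of the two completions pays off
```

A single playout is a noisy estimate of a node's value. The averaging over many runs stored in $\sum_i V_i$ and $n$ is what makes these estimates useful.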
UCT Monte Carlo Tree Search XXVI-XXXI (Browne 2012; Browne et al. 2012; Bradberry 2015)
[Figure: backpropagate. $V_{term}$ is passed back up the selection path: each node on the way to the root adds it to its reward sum and increments its visit count by 1]
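Backpropagation amounts to a walk along the parent links. Below is a minimal sketch, assuming each hypothetical `Node` keeps a reference to its parent.

```python
class Node:
    """Hypothetical tree node: parent link, reward sum and visit count."""

    def __init__(self, parent=None, total_reward=0.0, visits=0):
        self.parent = parent
        self.total_reward = total_reward
        self.visits = visits


def backpropagate(node, v_term):
    """Add V_term and one visit to `node` and every ancestor up to the root."""
    while node is not None:
        node.total_reward += v_term
        node.visits += 1
        node = node.parent


root = Node(total_reward=2.0, visits=5)
j = Node(parent=root, total_reward=1.0, visits=3)
new_leaf = Node(parent=j)               # freshly expanded child, not yet visited
backpropagate(new_leaf, 1.0)
print(new_leaf.visits, j.visits, root.visits)  # 1 4 6
```

After this call the new leaf holds $(V_{term}, 1)$, and every ancestor has one more visit. This keeps the $n$ used in the urgency formula consistent with its children's $n_j$.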
Summary
1. select
2. expand
3. simulate
4. backpropagate
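The four steps can be combined into a single-player UCT loop. Below is a minimal sketch under several assumptions:

- the hypothetical toy puzzle: choose three bits, with reward 1 only for (1, 0, 1);
- the constant `c = 1` and the iteration count;
- picking the final move by visit count, which is a common convention rather than something the slides prescribe.

```python
import math
import random


# Hypothetical toy puzzle: choose three bits; only (1, 0, 1) pays reward 1.
def legal_moves(state):
    return [] if len(state) == 3 else [0, 1]


def reward(state):
    return 1.0 if state == (1, 0, 1) else 0.0


class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = {}                      # move -> Node
        self.total_reward, self.visits = 0.0, 0


def ucb(child, parent_visits, c):
    """Urgency V_j + c * sqrt(2 ln n / n_j) of an already visited child."""
    return (child.total_reward / child.visits
            + c * math.sqrt(2 * math.log(parent_visits) / child.visits))


def uct_search(root_state, iterations=2000, c=1.0, rng=None):
    rng = rng or random.Random(0)
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # 1. select: descend while the node is non-terminal and fully expanded
        while legal_moves(node.state) and len(node.children) == len(legal_moves(node.state)):
            parent = node
            node = max(parent.children.values(), key=lambda ch: ucb(ch, parent.visits, c))
        # 2. expand: add one unexplored child unless the node is terminal
        untried = [m for m in legal_moves(node.state) if m not in node.children]
        if untried:
            move = rng.choice(untried)
            node.children[move] = Node(node.state + (move,), node)
            node = node.children[move]
        # 3. simulate: random playout to a terminal state
        state = node.state
        while legal_moves(state):
            state = state + (rng.choice(legal_moves(state)),)
        v_term = reward(state)
        # 4. backpropagate: update reward sums and visit counts up to the root
        while node is not None:
            node.total_reward += v_term
            node.visits += 1
            node = node.parent
    return root


root = uct_search(())
best = max(root.children, key=lambda m: root.children[m].visits)
print(best)  # 1
```

Every child created in step 2 is simulated and backpropagated in the same iteration. As a result, each child reached in step 1 already has $n_j \geq 1$, and the division in `ucb` is safe.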
Additional Comments
Note: we treated it as a puzzle problem, where rewards are just positive
But: a game is an antagonistic situation, so either
use the NEG-MAX picture: turn the reward around at each step (multiply it by −1 for each level) (Browne 2012)
or: store at each node the utility for the player of that particular node, incremented if they won the game
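Below is a sketch of the NEG-MAX variant of backpropagation. It assumes that `v_term` is scored from the viewpoint of the player who made the move into the starting node. The function name and this sign convention are illustrative assumptions, not taken from the slides.

```python
class Node:
    """Hypothetical tree node: parent link, reward sum and visit count."""

    def __init__(self, parent=None):
        self.parent = parent
        self.total_reward = 0.0
        self.visits = 0


def backpropagate_negamax(node, v_term):
    """Two-player backup: v_term is scored for the player who moved into `node`;
    the sign flips at every level on the way up to the root."""
    value = v_term
    while node is not None:
        node.total_reward += value
        node.visits += 1
        value = -value  # the move into the parent was made by the opponent
        node = node.parent


root = Node()
a = Node(parent=root)   # reached by a move of player 1
b = Node(parent=a)      # reached by a move of player 2
backpropagate_negamax(b, 1.0)   # player 2 won the playout
print(b.total_reward, a.total_reward, root.total_reward)  # 1.0 -1.0 1.0
```

With this convention, every parent can still pick the child with the highest urgency. The sign flip has already converted the reward into "good for the player who is choosing here".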
Mystery Factor: Urgency
Confidence Bound
consider a sequence of random rewards (value payoffs) with empirical mean $V_j$; this mean is not perfectly accurate
from Hoeffding’s inequality (google it if you dare!), one gets that the true mean lies “with good probability” in the interval
$\left[V_j - \sqrt{\frac{2 \ln n}{n_j}},\; V_j + \sqrt{\frac{2 \ln n}{n_j}}\right]$
if option $j$ has been visited $n_j$ times and $n$ total runs have been made
it can be shown that selecting the branch with the highest upper confidence bound (UCB)
$V_j + \sqrt{\frac{2 \ln n}{n_j}}$
minimizes regret asymptotically
(Auer 2003; Kocsis and Szepesvari 2006)
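The selection rule can be seen at work on a single multi-armed bandit, which is the setting it comes from. Below is a minimal sketch of UCB1 on Bernoulli arms. The arm probabilities, the number of rounds and the seed are arbitrary assumptions.

```python
import math
import random


def ucb1(arm_probs, rounds=5000, seed=0):
    """UCB1 on Bernoulli arms: always pull the arm with the highest upper bound."""
    rng = random.Random(seed)
    k = len(arm_probs)
    pulls = [0] * k      # n_j: how often arm j was pulled
    sums = [0.0] * k     # total reward collected from arm j
    for t in range(rounds):  # t = total pulls made so far (n)
        if t < k:
            j = t  # pull every arm once so that n_j > 0
        else:
            j = max(range(k), key=lambda i: sums[i] / pulls[i]
                    + math.sqrt(2 * math.log(t) / pulls[i]))
        pulls[j] += 1
        sums[j] += 1.0 if rng.random() < arm_probs[j] else 0.0
    return pulls


pulls = ucb1([0.2, 0.5, 0.8])
print(pulls)  # the 0.8 arm should receive the large majority of pulls
```

The worse arms keep being sampled, but only rarely. Their bonus term grows only logarithmically in $n$, which is the behaviour behind the regret guarantee.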
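Criterion (Browne 2012; Browne et al. 2012)
[Figure: for each child, the reward $V_j$ and the upper confidence bound $V_j + \sqrt{\frac{2 \ln n}{n_j}}$]
select the child with the highest UCB
not the one with the highest reward
not the one with the widest spread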
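The three criteria can pick three different children. Below is a small worked example with made-up statistics: a node visited $n = 100$ times with children A, B and C. All of the numbers are assumptions chosen for illustration.

```python
import math

# Hypothetical statistics at a node visited n = 100 times: child -> (mean V_j, visits n_j)
n = 100
children = {"A": (0.8, 70), "B": (0.7, 20), "C": (0.0, 9)}


def half_width(n_j):
    """Half-width of the confidence interval, sqrt(2 ln n / n_j)."""
    return math.sqrt(2 * math.log(n) / n_j)


ucb = {name: v + half_width(nj) for name, (v, nj) in children.items()}
best_reward = max(children, key=lambda name: children[name][0])
widest = max(children, key=lambda name: half_width(children[name][1]))
best_ucb = max(ucb, key=ucb.get)
print(best_reward, widest, best_ucb)  # A C B
```

The UCB values come out at roughly A ≈ 1.16, B ≈ 1.38 and C ≈ 1.01:

- A has the best average reward, but it has already been explored a lot;
- C has the widest interval, but its average is too poor;
- B balances both terms and is the one UCT selects.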
UCT Pseudocode (Browne 2012)
[Figure: UCT pseudocode; not recoverable from this transcript]
The pseudocode was taken directly from Cameron Browne's slides.