a review of reinforcement learning for …shayand/slides/optimizinghumanlearning...a review of...
TRANSCRIPT
![Page 1: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/1.jpg)
Where's The Reward?Where's The Reward?A Review of Reinforcement Learning for
Instructional Sequencing
Shayan Doroudi
1
![Page 2: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/2.jpg)
2
![Page 3: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/3.jpg)
2
![Page 4: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/4.jpg)
Over the past 50 years, howOver the past 50 years, howsuccessful has RL been insuccessful has RL been in
discovering useful adaptivediscovering useful adaptiveinstructional policies?instructional policies?
Research QuestionResearch Question
3
![Page 5: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/5.jpg)
Under what conditions is RLUnder what conditions is RLmost likely to be successful inmost likely to be successful in
advancing instructionaladvancing instructionalsequencing?sequencing?
Research QuestionResearch Question
4
![Page 6: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/6.jpg)
OverviewOverview
Reinforcement Learning: Towards a "Theory of Instruction"Part 1: Historical PerspectivePart 2: Systematic Review
Discussion: Where's the Reward?
Part 3: Case StudyPlanning for the Future
5
![Page 7: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/7.jpg)
OverviewOverview
Reinforcement Learning: Towards a “Theory of Instruction”Part 1: Historical PerspectivePart 2: Systematic Review
Discussion: Where's the Reward?
Part 3: Case StudyPlanning for the Future
6
![Page 8: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/8.jpg)
Theory of InstructionTheory of InstructionAtkinson (1972):
1. The possible states of nature
2. The actions that the decision maker can take totransform the state
3. The transformation of the state of nature thatresults from each action
4. The cost of each action
5. The return resulting from each state of nature
“The derivation of an optimal strategy requires that theinstructional problem be stated in a form amenable to a
decision-theoretic analysis...”
7
![Page 9: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/9.jpg)
Markov Decision ProcessMarkov Decision ProcessA Markov Decision Process is defined as a 5-tuple (S,A,T ,R,H):
8
![Page 10: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/10.jpg)
Markov Decision ProcessMarkov Decision ProcessA Markov Decision Process is defined as a 5-tuple (S,A,T ,R,H):
1. The possible states of nature = S
8
![Page 11: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/11.jpg)
Markov Decision ProcessMarkov Decision ProcessA Markov Decision Process is defined as a 5-tuple (S,A,T ,R,H):
1. The possible states of nature = S
2. The actions that the decision maker can take totransform the state = A
8
![Page 12: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/12.jpg)
Markov Decision ProcessMarkov Decision ProcessA Markov Decision Process is defined as a 5-tuple (S,A,T ,R,H):
1. The possible states of nature = S
2. The actions that the decision maker can take totransform the state = A
3. The transformation of the state of nature that resultsfrom each action = T (s ∣s, a)′
8
![Page 13: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/13.jpg)
Markov Decision ProcessMarkov Decision ProcessA Markov Decision Process is defined as a 5-tuple (S,A,T ,R,H):
1. The possible states of nature = S
2. The actions that the decision maker can take totransform the state = A
3. The transformation of the state of nature that resultsfrom each action = T (s ∣s, a)
4. The cost of each action = R(a)
′
8
![Page 14: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/14.jpg)
Markov Decision ProcessMarkov Decision ProcessA Markov Decision Process is defined as a 5-tuple (S,A,T ,R,H):
1. The possible states of nature = S
2. The actions that the decision maker can take totransform the state = A
3. The transformation of the state of nature that resultsfrom each action = T (s ∣s, a)
4. The cost of each action = R(a)
5. The return resulting from each state of nature = R(s)
′
8
![Page 15: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/15.jpg)
Markov Decision ProcessMarkov Decision ProcessA Markov Decision Process is defined as a 5-tuple (S,A,T ,R,H):
1. The possible states of nature = S
2. The actions that the decision maker can take totransform the state = A
3. The transformation of the state of nature that resultsfrom each action = T (s ∣s, a)
4. The cost of each action = R(a)
5. The return resulting from each state of nature = R(s)
6. The horizon, or number of time steps for which theagent takes actions = H
′
8
![Page 16: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/16.jpg)
Theory of InstructionTheory of InstructionAtkinson's (1972) “Ingredients for a Theory of Instruction”:
taken in conjunction with methods for deriving optimal strategies
A model of the learning process.
Specification of admissible instructionalactions.
Specification of instructional objectives
A measurement scale that permits coststo be assigned to each of theinstructional actions and and payoffs tothe achievement of instructionalobjectives.
9
![Page 17: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/17.jpg)
Reinforcement Learning (RL)Reinforcement Learning (RL)Markov Decision Process
Set of States S
Set of Actions A Transition Matrix T
Reward function R
Horizon H
10
![Page 18: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/18.jpg)
Reinforcement Learning (RL)Reinforcement Learning (RL)Markov Decision Process
MDP Planning: methods for deriving optimal strategies (e.g., value iteration, policy iteration)
Set of States S
Set of Actions A Transition Matrix T
Reward function R
Horizon H
10
![Page 19: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/19.jpg)
Reinforcement Learning (RL)Reinforcement Learning (RL)Markov Decision Process
MDP Planning: methods for deriving optimal strategies (e.g., value iteration, policy iteration)
Set of States S
Set of Actions A Transition Matrix T
Reward function R
Horizon H
Reinforcement Learning: methods for deriving optimal strategieswhen T and R are unknown.
10
![Page 20: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/20.jpg)
Different RL SettingsDifferent RL Settings
11
![Page 21: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/21.jpg)
Different RL SettingsDifferent RL SettingsOnline RL: Learn an instructional policy as you interact with
students. (Need to balance exploration and exploitation.)
vs.
Offline RL: Learn an instructional policy using prior data.
11
![Page 22: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/22.jpg)
Different RL SettingsDifferent RL SettingsOnline RL: Learn an instructional policy as you interact with
students. (Need to balance exploration and exploitation.)
vs.
Offline RL: Learn an instructional policy using prior data.
MDP: The agent knows the state of the world
vs.
Partially observable MDP (POMDP): The agent can only observesignals of the state
(e.g., can see if the student responded correctly but does notknow the student's cognitive state)
11
![Page 23: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/23.jpg)
OverviewOverview
Reinforcement Learning: Towards a “Theory of Instruction”Part 1: Historical PerspectivePart 2: Systematic Review
Discussion: Where's the Reward?
Part 3: Case StudyPlanning for the Future
12
![Page 24: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/24.jpg)
Why History?Why History?
13
![Page 25: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/25.jpg)
Why History?Why History?
Who has been interested in using RL for instructionalsequencing and why?
13
![Page 26: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/26.jpg)
Why History?Why History?
Who has been interested in using RL for instructionalsequencing and why?History repeats itself!
13
![Page 27: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/27.jpg)
Why History?Why History?
Who has been interested in using RL for instructionalsequencing and why?History repeats itself!Surprising ways in which RL for instructional sequencinghas impacted both the field of reinforcement learning andthe field of education.
13
![Page 28: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/28.jpg)
Why History?Why History?
Who has been interested in using RL for instructionalsequencing and why?History repeats itself!Surprising ways in which RL for instructional sequencinghas impacted both the field of reinforcement learning andthe field of education.A lot of the literature does not acknowledge the history ofthis area.
13
![Page 29: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/29.jpg)
First Wave: 1960s-70sFirst Wave: 1960s-70s
14
![Page 30: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/30.jpg)
First Wave: 1960s-70sFirst Wave: 1960s-70s
Why 1960s?
14
![Page 31: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/31.jpg)
First Wave: 1960s-70sFirst Wave: 1960s-70s
Teaching machines were popular in late 50s-early 60s.
Why 1960s?
14
![Page 32: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/32.jpg)
First Wave: 1960s-70sFirst Wave: 1960s-70s
Teaching machines were popular in late 50s-early 60s.Computers! -> Computer-Assisted Instruction
Why 1960s?
14
![Page 33: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/33.jpg)
First Wave: 1960s-70sFirst Wave: 1960s-70s
Teaching machines were popular in late 50s-early 60s.Computers! -> Computer-Assisted InstructionDynamic Programming and Markov Decision Processes
Why 1960s?
14
![Page 34: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/34.jpg)
First Wave: 1960s-70sFirst Wave: 1960s-70s
Teaching machines were popular in late 50s-early 60s.Computers! -> Computer-Assisted InstructionDynamic Programming and Markov Decision ProcessesMathematical Psych: studying mathematical models of learning
Why 1960s?
14
![Page 35: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/35.jpg)
Ronald Howard
15
![Page 36: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/36.jpg)
Ronald Howard
Richard Smallwood
A Decision Structure for Teaching Machines
15
![Page 37: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/37.jpg)
Ronald Howard
Richard Smallwood
Edward Sondik
A Decision Structure for Teaching Machines
The Optimal Control of Partially Observable Markov Processes
15
![Page 38: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/38.jpg)
Ronald Howard
Richard Smallwood
Edward Sondik
“The results obtained by Smallwood [on the special case of determiningoptimum teaching strategies] prompted this research into the general
problem.”
A Decision Structure for Teaching Machines
The Optimal Control of Partially Observable Markov Processes
15
![Page 39: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/39.jpg)
Ronald Howard
Richard Smallwood
Edward Sondik16
![Page 40: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/40.jpg)
Ronald Howard
Richard Smallwood
Edward Sondik
Operations Research / Engineering
16
![Page 41: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/41.jpg)
Ronald Howard
Richard Smallwood
Edward Sondik
Richard Atkinson Patrick Suppes
Operations Research / Engineering Mathematical Psychology / CAI
16
![Page 42: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/42.jpg)
Ronald Howard
Richard Smallwood
Edward Sondik
Richard Atkinson Patrick Suppes
James Matheson
William Linvill
Operations Research / Engineering Mathematical Psychology / CAI
16
![Page 43: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/43.jpg)
Ronald Howard
Richard Smallwood
Edward Sondik
Richard Atkinson Patrick Suppes
James Matheson
William Linvill
Operations Research / Engineering Mathematical Psychology / CAI
Optimum Teaching Procedures Derived fromMathematical Learning Models
16
![Page 44: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/44.jpg)
The Dark AgesThe Dark Ages
c. 1972 - 2000sc. 1972 - 2000s
17
![Page 45: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/45.jpg)
The Dark AgesThe Dark Ages
c. 1972 - 2000sc. 1972 - 2000sBy 1970s - Howard, Smallwood, Matheson et al. goback to operations research (sans education)
17
![Page 46: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/46.jpg)
The Dark AgesThe Dark Ages
c. 1972 - 2000sc. 1972 - 2000sBy 1970s - Howard, Smallwood, Matheson et al. goback to operations research (sans education)1975 - Atkinson leaves research (for administrativepositions)
17
![Page 47: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/47.jpg)
“The mathematical techniques of optimizationused in theories of instruction draw upon a
wealth of results from other areas of science,especially from tools developed in
mathematical economics and operationsresearch over the past two decades, and itwould be my prediction that we will seeincreasingly sophisticated theories of
instruction in the near future.”
Suppes (1974)
The Place of Theory in Educational Research
AERA Presidential Address
18
![Page 48: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/48.jpg)
“The mathematical techniques of optimizationused in theories of instruction draw upon a
wealth of results from other areas of science,especially from tools developed in
mathematical economics and operationsresearch over the past two decades, and itwould be my prediction that we will seeincreasingly sophisticated theories of
instruction in the near future.”
Suppes (1974)
The Place of Theory in Educational Research
AERA Presidential Address
Atkinson (2014)“work [on MOOCs] is promising, but the key to success isindividualizing instruction, and necessarily that requires a
psychological theory of the learning process”
18
![Page 49: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/49.jpg)
Second Wave: 2000sSecond Wave: 2000s
Why 2000s?
19
![Page 50: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/50.jpg)
Second Wave: 2000sSecond Wave: 2000s
Intelligent Tutoring Systems
Why 2000s?
19
![Page 51: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/51.jpg)
Second Wave: 2000sSecond Wave: 2000s
Intelligent Tutoring SystemsReinforcement Learning formed as a field
Why 2000s?
19
![Page 52: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/52.jpg)
Second Wave: 2000sSecond Wave: 2000s
Intelligent Tutoring SystemsReinforcement Learning formed as a fieldAIED/EDM: studying statistical models of learning
Why 2000s?
19
![Page 53: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/53.jpg)
Second Wave: 2000sSecond Wave: 2000s
Intelligent Tutoring SystemsReinforcement Learning formed as a fieldAIED/EDM: studying statistical models of learning
Why 2000s?
Parallels 1960s
19
![Page 54: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/54.jpg)
Second Wave: 2000sSecond Wave: 2000s
Intelligent Tutoring SystemsReinforcement Learning formed as a fieldAIED/EDM: studying statistical models of learning
Teaching machines and Computer-Assisted InstructionDynamic Programming and Markov Decision ProcessesMathematical Psych: studying mathematical models of learning
Why 2000s?
Parallels 1960s
19
![Page 55: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/55.jpg)
Reinforcement Learning AI in Education / ITS
20
![Page 56: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/56.jpg)
Reinforcement Learning AI in Education / ITS
Andrew Barto Beverly Woolf
Joe Beck
20
![Page 57: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/57.jpg)
Reinforcement Learning AI in Education / ITS
Andrew Barto
Balaraman Ravindran
Beverly Woolf
Joe Beck
20
![Page 58: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/58.jpg)
Emma Brunskill Vincent Aleven
Reinforcement Learning AI in Education / ITS
Shayan Doroudi
21
![Page 59: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/59.jpg)
The Third Wave:The Third Wave: What Lies in the HorizonWhat Lies in the Horizon
Why 2010s?
22
![Page 60: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/60.jpg)
The Third Wave:The Third Wave: What Lies in the HorizonWhat Lies in the Horizon
Massive Open Online Courses (MOOCs)
Why 2010s?
22
![Page 61: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/61.jpg)
The Third Wave:The Third Wave: What Lies in the HorizonWhat Lies in the Horizon
Massive Open Online Courses (MOOCs)Deep Reinforcement Learning formed as a field
Why 2010s?
22
![Page 62: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/62.jpg)
The Third Wave:The Third Wave: What Lies in the HorizonWhat Lies in the Horizon
Massive Open Online Courses (MOOCs)Deep Reinforcement Learning formed as a fieldDeep Learning: building deep models of learning
Why 2010s?
22
![Page 63: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/63.jpg)
The Third Wave:The Third Wave: What Lies in the HorizonWhat Lies in the Horizon
Massive Open Online Courses (MOOCs)Deep Reinforcement Learning formed as a fieldDeep Learning: building deep models of learning
35% increase in papers/books mentioning“reinforcement learning” from 2016 to 2017
(Google Scholar)
Why 2010s?
22
![Page 64: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/64.jpg)
Three Waves: SummaryThree Waves: Summary
First Wave (1960s-70s)
Second Wave (2000s-2010s)
Third Wave (2010s)
Medium ofInstruction
TeachingMachines / CAI
IntelligentTutoring Systems
Massive OpenOnline Courses
OptimizationModels
DecisionProcesses
ReinforcementLearning
Deep RL
Models ofLearning
MathematicalPsychology
Machine Learning AIED/EDM
Deep Learning
23
![Page 65: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/65.jpg)
Three Waves: SummaryThree Waves: Summary
First Wave (1960s-70s)
Second Wave (2000s-2010s)
Third Wave (2010s)
Medium ofInstruction
TeachingMachines / CAI
IntelligentTutoring Systems
Massive OpenOnline Courses
OptimizationModels
DecisionProcesses
ReinforcementLearning
Deep RL
Models ofLearning
MathematicalPsychology
Machine Learning AIED/EDM
Deep Learning
More data-driven
23
![Page 66: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/66.jpg)
Three Waves: SummaryThree Waves: Summary
First Wave (1960s-70s)
Second Wave (2000s-2010s)
Third Wave (2010s)
Medium ofInstruction
TeachingMachines / CAI
IntelligentTutoring Systems
Massive OpenOnline Courses
OptimizationModels
DecisionProcesses
ReinforcementLearning
Deep RL
Models ofLearning
MathematicalPsychology
Machine Learning AIED/EDM
Deep Learning
More data-driven
More data-generating
23
![Page 67: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/67.jpg)
OverviewOverview
Reinforcement Learning: Towards a “Theory of Instruction”Part 1: Historical PerspectivePart 2: Systematic Review
Discussion: Where's the Reward?
Part 3: Case StudyPlanning for the Future
24
![Page 68: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/68.jpg)
Inclusion CriteriaInclusion CriteriaWe consider any papers where:
25
![Page 69: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/69.jpg)
Inclusion CriteriaInclusion CriteriaWe consider any papers where:
There is (implicitly) a model of the learning process, wheredifferent instructional actions probabilistically change the stateof a student.
25
![Page 70: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/70.jpg)
Inclusion CriteriaInclusion CriteriaWe consider any papers where:
There is (implicitly) a model of the learning process, wheredifferent instructional actions probabilistically change the stateof a student.There is an instructional policy that maps past observations froma student (e.g., responses to questions) to instructional actions.
25
![Page 71: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/71.jpg)
Inclusion CriteriaInclusion CriteriaWe consider any papers where:
There is (implicitly) a model of the learning process, wheredifferent instructional actions probabilistically change the stateof a student.There is an instructional policy that maps past observations froma student (e.g., responses to questions) to instructional actions.Data collected from students are used to learn either:
the modelan adaptive policy
25
![Page 72: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/72.jpg)
Inclusion CriteriaInclusion CriteriaWe consider any papers where:
There is (implicitly) a model of the learning process, wheredifferent instructional actions probabilistically change the stateof a student.There is an instructional policy that maps past observations froma student (e.g., responses to questions) to instructional actions.Data collected from students are used to learn either:
the modelan adaptive policy
If the model is learned, the instructional policy is designed to(approximately) optimize that model according to some rewardfunction
25
![Page 73: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/73.jpg)
What's Not Included?What's Not Included?
26
![Page 74: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/74.jpg)
What's Not Included?What's Not Included?
Adaptive policies that use hand-made or heuristic decision rules(rather than data-driven/optimized decision rules)
26
![Page 75: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/75.jpg)
What's Not Included?What's Not Included?
Adaptive policies that use hand-made or heuristic decision rules(rather than data-driven/optimized decision rules)Experiments that do not control for everything other thansequence of instruction
26
![Page 76: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/76.jpg)
What's Not Included?What's Not Included?
Adaptive policies that use hand-made or heuristic decision rules(rather than data-driven/optimized decision rules)Experiments that do not control for everything other thansequence of instructionMachine teaching experiments
26
![Page 77: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/77.jpg)
What's Not Included?What's Not Included?
Adaptive policies that use hand-made or heuristic decision rules(rather than data-driven/optimized decision rules)Experiments that do not control for everything other thansequence of instructionMachine teaching experimentsExperiments that use RL for other educational purposes, such as:
generating data-driven hints (Stamper et al., 2013) orgiving feedback (Rafferty et al., 2015)
26
![Page 78: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/78.jpg)
Review OverviewReview Overview
27 studies empirically compare adaptive policy to baseline
27
![Page 79: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/79.jpg)
Review OverviewReview Overview
27 studies empirically compare adaptive policy to baseline
≥ 10 papers compare policies learned with student data insimulation
27
![Page 80: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/80.jpg)
Review OverviewReview Overview
27 studies empirically compare adaptive policy to baseline
≥ 10 papers compare policies learned with student data insimulation
≥ 16 papers build policies only on simulated data
27
![Page 81: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/81.jpg)
Review OverviewReview Overview
27 studies empirically compare adaptive policy to baseline
≥ 10 papers compare policies learned with student data insimulation
≥ 16 papers build policies only on simulated data
≥ 7 papers that propose using RL for instructional sequencing
27
![Page 82: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/82.jpg)
Review OverviewReview Overview
27 studies empirically compare adaptive policy to baseline
≥ 10 papers compare policies learned with student data insimulation
≥ 16 papers build policies only on simulated data
≥ 7 papers that propose using RL for instructional sequencing
≥ 3 other papers with policies used on real students
27
![Page 83: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/83.jpg)
Review OverviewReview Overview
Among papers with empirical comparisons:
14 found sig difference between adaptive policy and baseline
28
![Page 84: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/84.jpg)
Review OverviewReview Overview
Among papers with empirical comparisons:
14 found sig difference between adaptive policy and baseline
2 found sig aptitude-treatment interaction
Policy is sig better for below median learners
28
![Page 85: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/85.jpg)
Review OverviewReview Overview
Among papers with empirical comparisons:
14 found sig difference between adaptive policy and baseline
2 found sig aptitude-treatment interaction
Policy is sig better for below median learners
2 found sig difference between adaptive policy and some but not all baselines
28
![Page 86: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/86.jpg)
Review OverviewReview Overview
Among papers with empirical comparisons:
14 found sig difference between adaptive policy and baseline
2 found sig aptitude-treatment interaction
Policy is sig better for below median learners
2 found sig difference between adaptive policy and some but not all baselines
9 found no sig difference between policies
28
![Page 87: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/87.jpg)
Studies by YearStudies by Year
29
![Page 88: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/88.jpg)
Review SummaryReview Summary
30
![Page 89: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/89.jpg)
OverviewOverview
Reinforcement Learning: Towards a “Theory of Instruction”Part 1: Historical PerspectivePart 2: Systematic Review
Discussion: Where's the Reward?
Part 3: Case StudyPlanning for the Future
31
![Page 90: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/90.jpg)
Where's the Reward?Where's the Reward?The Pessimistic Story
Studies with sig difference were often constrained:
32
![Page 91: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/91.jpg)
Where's the Reward?Where's the Reward?The Pessimistic Story
Studies with sig difference were often constrained:
7 of them only compare to random policy or other RL-induced policy
32
![Page 92: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/92.jpg)
Where's the Reward?Where's the Reward?The Pessimistic Story
Studies with sig difference were often constrained:
7 of them only compare to random policy or other RL-induced policy
9 of them were on paired-association tasks or conceptlearning tasks
Decent psychological understanding of how humans learn
32
![Page 93: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/93.jpg)
Where's the Reward?Where's the Reward?The Pessimistic Story
Studies with sig difference were often constrained:
7 of them only compare to random policy or other RL-induced policy
9 of them were on paired-association tasks or conceptlearning tasks
Decent psychological understanding of how humans learn
2 of the studies (+ 2 ATI studies) sequenced activity typesrather than content
32
![Page 94: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/94.jpg)
Where's the Reward?Where's the Reward?The Pessimistic Story
Studies with sig difference were often constrained:
7 of them only compare to random policy or other RL-induced policy
9 of them were on paired-association tasks or conceptlearning tasks
Decent psychological understanding of how humans learn
2 of the studies (+ 2 ATI studies) sequenced activity typesrather than content
2 of the studies did not optimize for learning
32
![Page 95: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/95.jpg)
Where's the Reward?Where's the Reward?The Pessimistic Story
Studies with sig difference were often constrained:
7 of them only compare to random policy or other RL-induced policy
9 of them were on paired-association tasks or conceptlearning tasks
Decent psychological understanding of how humans learn
2 of the studies (+ 2 ATI studies) sequenced activity typesrather than content
2 of the studies did not optimize for learning
1 study seems to have been “lucky”
32
![Page 96: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/96.jpg)
Among papers without sig difference:
Where's the Reward?Where's the Reward?The Pessimistic Story
33
![Page 97: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/97.jpg)
Among papers without sig difference:
Only 3 of them only compare to random policy or other RL-induced policy
Where's the Reward?Where's the Reward?The Pessimistic Story
33
![Page 98: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/98.jpg)
Among papers without sig difference:
Only 3 of them only compare to random policy or other RL-induced policy
Only 3 of them were on paired-association or concept learningtasks
Where's the Reward?Where's the Reward?The Pessimistic Story
33
![Page 99: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/99.jpg)
Among papers without sig difference:
Only 3 of them only compare to random policy or other RL-induced policy
Only 3 of them were on paired-association or concept learningtasks
Only 2 of them sequenced activity types rather than content.
Where's the Reward?Where's the Reward?The Pessimistic Story
33
![Page 100: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/100.jpg)
Among papers without sig difference:
Only 3 of them only compare to random policy or other RL-induced policy
Only 3 of them were on paired-association or concept learningtasks
Only 2 of them sequenced activity types rather than content.
Papers that showed no sig. difference were generallymore complex and ambitious in a number of dimensions
Where's the Reward?Where's the Reward?The Pessimistic Story
33
![Page 101: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/101.jpg)
Among papers with sig difference:
9 of them use models inspired by cognitive psychology.
The policies that were successful for paired-associationtasks tended to use more psychologically plausible modelsthan those that were not successful.
Where's the Reward?Where's the Reward?The Optimistic Story
34
![Page 102: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/102.jpg)
Among papers with sig difference:
9 of them use models inspired by cognitive psychology.
The policies that were successful for paired-associationtasks tended to use more psychologically plausible modelsthan those that were not successful.
Several use some sort of clever offline policy selection (e.g.,importance sampling or robust evaluation)
Where's the Reward?Where's the Reward?The Optimistic Story
34
![Page 103: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/103.jpg)
OverviewOverview
Reinforcement Learning: Towards a “Theory of Instruction”Part 1: Historical PerspectivePart 2: Systematic Review
Discussion: Where's the Reward?
Part 3: Case StudyPlanning for the Future
35
![Page 104: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/104.jpg)
Case StudyCase Study
Fractions Tutor
36
![Page 105: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/105.jpg)
Case StudyCase Study
Fractions TutorTwo experiments testing RL-induced policies(both no sig difference)
36
![Page 106: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/106.jpg)
Case StudyCase Study
Fractions TutorTwo experiments testing RL-induced policies(both no sig difference)Off-policy policy evaluation
36
![Page 107: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/107.jpg)
Fractions TutorFractions Tutor
37
![Page 108: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/108.jpg)
Experiment 1Experiment 1
Used prior data to fit G-SCOPE Model (Hallak et al., 2015).
38
![Page 109: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/109.jpg)
Experiment 1Experiment 1
Used prior data to fit G-SCOPE Model (Hallak et al., 2015).
Used G-SCOPE Model to derive two new Adaptive Policies.
38
![Page 110: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/110.jpg)
Experiment 1Experiment 1
Used prior data to fit G-SCOPE Model (Hallak et al., 2015).
Used G-SCOPE Model to derive two new Adaptive Policies.
Wanted to compare Adaptive Policies to a Baseline Policy(fixed, spiraling curriculum).
38
![Page 111: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/111.jpg)
Experiment 1Experiment 1
Used prior data to fit G-SCOPE Model (Hallak et al., 2015).
Used G-SCOPE Model to derive two new Adaptive Policies.
Wanted to compare Adaptive Policies to a Baseline Policy(fixed, spiraling curriculum).
Simulated both policies on G-SCOPE Model to predictposttest scores (out of 16 points).
38
![Page 112: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/112.jpg)
Experiment 1:Experiment 1: Policy EvaluationPolicy Evaluation
Baseline Adaptive Policy
Simulated Posttest 5.9 ± 0.9 9.1 ± 0.8
Doroudi, Aleven, and Brunskill, L@S 201739 . 1
![Page 113: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/113.jpg)
Baseline Adaptive Policy
Simulated Posttest 5.9 ± 0.9 9.1 ± 0.8
Actual Posttest 5.5 ± 2.6 4.9 ± 2.6
Doroudi, Aleven, and Brunskill, L@S 2017
Experiment 1:Experiment 1: Policy EvaluationPolicy Evaluation
39 . 2
![Page 114: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/114.jpg)
Single Model SimulationSingle Model Simulation
Used by Chi, VanLehn, Littman, and Jordan (2011) andRowe, Mott, and Lester (2014) in educational settings.
40
![Page 115: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/115.jpg)
Single Model SimulationSingle Model Simulation
Used by Chi, VanLehn, Littman, and Jordan (2011) andRowe, Mott, and Lester (2014) in educational settings.Rowe, Mott, and Lester (2014): New adaptive policyestimated to be much better than random policy.
40
![Page 116: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/116.jpg)
Single Model SimulationSingle Model Simulation
Used by Chi, VanLehn, Littman, and Jordan (2011) andRowe, Mott, and Lester (2014) in educational settings.Rowe, Mott, and Lester (2014): New adaptive policyestimated to be much better than random policy.But in experiment, no significant difference found(Rowe and Lester, 2015).
40
![Page 117: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/117.jpg)
Importance SamplingImportance Sampling
Estimator that gives unbiased and consistent estimatesfor a policy!
41
![Page 118: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/118.jpg)
Importance SamplingImportance Sampling
Estimator that gives unbiased and consistent estimatesfor a policy!Can have very high variance when policy is differentfrom prior data.
41
![Page 119: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/119.jpg)
Importance SamplingImportance Sampling
Estimator that gives unbiased and consistent estimatesfor a policy!Can have very high variance when policy is differentfrom prior data.Example: Worked example or problem-solving?
41
![Page 120: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/120.jpg)
Importance SamplingImportance Sampling
Estimator that gives unbiased and consistent estimatesfor a policy!Can have very high variance when policy is differentfrom prior data.Example: Worked example or problem-solving?
20 sequential decisions ⇒ need over 2 students20
41
![Page 121: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/121.jpg)
Importance SamplingImportance Sampling
Estimator that gives unbiased and consistent estimatesfor a policy!Can have very high variance when policy is differentfrom prior data.Example: Worked example or problem-solving?
20 sequential decisions ⇒ need over 2 students50 sequential decisions ⇒ need over 2 students!
20
50
41
![Page 122: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/122.jpg)
Importance SamplingImportance Sampling
Estimator that gives unbiased and consistent estimatesfor a policy!Can have very high variance when policy is differentfrom prior data.Example: Worked example or problem-solving?
20 sequential decisions ⇒ need over 2 students50 sequential decisions ⇒ need over 2 students!
Importance sampling can prefer the worse of twopolicies more often than not (Doroudi et al., 2017b).
20
50
Doroudi, Thomas, and Brunskill, UAI 2017, Best Paper41
![Page 123: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/123.jpg)
Robust Evaluation MatrixRobust Evaluation Matrix
Policy 1 Policy 2 Policy 3
Student Model 1
Student Model 2
Student Model 3
VSM ,P1 1
VSM ,P2 1
VSM ,P3 1
VSM ,P1 2
VSM ,P2 2
VSM ,P3 2
VSM ,P1 3
VSM ,P2 3
VSM ,P3 3
42
![Page 124: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/124.jpg)
Robust Evaluation MatrixRobust Evaluation Matrix
Baseline AdaptivePolicy
G-SCOPE Model 5.9 ± 0.9 9.1 ± 0.8
Doroudi, Aleven, and Brunskill, L@S 201743 . 1
![Page 125: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/125.jpg)
Robust Evaluation MatrixRobust Evaluation Matrix
Baseline AdaptivePolicy
G-SCOPE Model 5.9 ± 0.9 9.1 ± 0.8
Bayesian Knowledge Tracing 6.5 ± 0.8 7.0 ± 1.0
Doroudi, Aleven, and Brunskill, L@S 201743 . 2
![Page 126: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/126.jpg)
Robust Evaluation MatrixRobust Evaluation Matrix
Baseline AdaptivePolicy
G-SCOPE Model 5.9 ± 0.9 9.1 ± 0.8
Bayesian Knowledge Tracing 6.5 ± 0.8 7.0 ± 1.0
Deep Knowledge Tracing 9.9 ± 1.5 8.6 ± 2.1
Doroudi, Aleven, and Brunskill, L@S 201743 . 3
![Page 127: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/127.jpg)
Robust Evaluation MatrixRobust Evaluation Matrix
Baseline AdaptivePolicy
AwesomePolicy
G-SCOPE Model 5.9 ± 0.9 9.1 ± 0.8 16
Bayesian Knowledge Tracing 6.5 ± 0.8 7.0 ± 1.0 16
Deep Knowledge Tracing 9.9 ± 1.5 8.6 ± 2.1 16
Doroudi, Aleven, and Brunskill, L@S 201743 . 4
![Page 128: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/128.jpg)
Experiment 2Experiment 2
Used Robust Evaluation Matrix to test new policies
44
![Page 129: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/129.jpg)
Experiment 2Experiment 2
Used Robust Evaluation Matrix to test new policiesFound that a New Adaptive Policy that was very simplebut robustly expected to do well:
44
![Page 130: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/130.jpg)
Experiment 2Experiment 2
Used Robust Evaluation Matrix to test new policiesFound that a New Adaptive Policy that was very simplebut robustly expected to do well:
sequence problems in increasing order of avg. time
44
![Page 131: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/131.jpg)
Experiment 2Experiment 2
Used Robust Evaluation Matrix to test new policiesFound that a New Adaptive Policy that was very simplebut robustly expected to do well:
sequence problems in increasing order of avg. timeskip any problems where students havedemonstrated mastery of all skills (according to BKT)
44
![Page 132: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/132.jpg)
Experiment 2Experiment 2
Used Robust Evaluation Matrix to test new policiesFound that a New Adaptive Policy that was very simplebut robustly expected to do well:
sequence problems in increasing order of avg. timeskip any problems where students havedemonstrated mastery of all skills (according to BKT)
Ran an experiment testing New Adaptive Policy
44
![Page 133: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/133.jpg)
Experiment 2Experiment 2
Baseline New AdaptivePolicy
Actual Posttest 8.12 ± 2.9 7.97 ± 2.7
45
![Page 134: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/134.jpg)
Experiment 2:Experiment 2: InsightsInsights
Even though we did robust evaluation, twothings were not considered adequately:
46
![Page 135: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/135.jpg)
Experiment 2:Experiment 2: InsightsInsights
Even though we did robust evaluation, twothings were not considered adequately:
How long each problem takes per student
46
![Page 136: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/136.jpg)
Experiment 2:Experiment 2: InsightsInsights
Even though we did robust evaluation, twothings were not considered adequately:
How long each problem takes per student
Student population mismatch
46
![Page 137: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/137.jpg)
Experiment 2:Experiment 2: InsightsInsights
Even though we did robust evaluation, twothings were not considered adequately:
How long each problem takes per student
Student population mismatch
Robust evaluation can help usidentify where our models are lacking
and lead to building better modelsover time.
46
![Page 138: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/138.jpg)
OverviewOverview
Reinforcement Learning: Towards a "Theory of Instruction"Part 1: Historical PerspectivePart 2: Systematic Review
Discussion: Where's the Reward?
Part 3: Case Study: Fractions Tutor and Policy SelectionPlanning for the Future
47
![Page 139: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/139.jpg)
Planning for the FuturePlanning for the FutureData-Driven + Theory-Driven Approach
48
![Page 140: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/140.jpg)
Planning for the FuturePlanning for the FutureData-Driven + Theory-Driven Approach
Reinforcement learning researchers should work withlearning scientists and psychologists.
48
![Page 141: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/141.jpg)
Planning for the FuturePlanning for the FutureData-Driven + Theory-Driven Approach
Reinforcement learning researchers should work withlearning scientists and psychologists.
Work on domains where we have or can develop decentcognitive models.
48
![Page 142: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/142.jpg)
Planning for the FuturePlanning for the FutureData-Driven + Theory-Driven Approach
Reinforcement learning researchers should work withlearning scientists and psychologists.
Work on domains where we have or can develop decentcognitive models.
Work in settings where the set of actions is restrictedbut that are still meaningful (e.g., worked examples vs. problem solving)
48
![Page 143: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/143.jpg)
Planning for the FuturePlanning for the FutureData-Driven + Theory-Driven Approach
Reinforcement learning researchers should work withlearning scientists and psychologists.
Work on domains where we have or can develop decentcognitive models.
Work in settings where the set of actions is restrictedbut that are still meaningful (e.g., worked examples vs. problem solving)
Compare to good baselines based on learning sciences(e.g., expertise reversal effect)
48
![Page 144: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/144.jpg)
Planning for the FuturePlanning for the FutureData-Driven + Theory-Driven Approach
Reinforcement learning researchers should work withlearning scientists and psychologists.
Work on domains where we have or can develop decentcognitive models.
Work in settings where the set of actions is restrictedbut that are still meaningful (e.g., worked examples vs. problem solving)
Compare to good baselines based on learning sciences(e.g., expertise reversal effect)
Do thoughtful and extensive offline evaluations.
48
![Page 145: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/145.jpg)
Planning for the FuturePlanning for the FutureData-Driven + Theory-Driven Approach
Reinforcement learning researchers should work withlearning scientists and psychologists.
Work on domains where we have or can develop decentcognitive models.
Work in settings where the set of actions is restrictedbut that are still meaningful (e.g., worked examples vs. problem solving)
Compare to good baselines based on learning sciences(e.g., expertise reversal effect)
Do thoughtful and extensive offline evaluations.
Iterate and replicate! Develop theories of instruction thatcan help us see where the reward might be.
48
![Page 146: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/146.jpg)
Is Data-Driven Sufficient?Is Data-Driven Sufficient?Might we see a revolution in data-driven instructionalsequencing?
49
![Page 147: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/147.jpg)
Is Data-Driven Sufficient?Is Data-Driven Sufficient?Might we see a revolution in data-driven instructionalsequencing?
More data
49
![Page 148: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/148.jpg)
Is Data-Driven Sufficient?Is Data-Driven Sufficient?Might we see a revolution in data-driven instructionalsequencing?
More dataMore computational power
49
![Page 149: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/149.jpg)
Is Data-Driven Sufficient?Is Data-Driven Sufficient?Might we see a revolution in data-driven instructionalsequencing?
More dataMore computational powerBetter RL algorithms
49
![Page 150: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/150.jpg)
Is Data-Driven Sufficient?Is Data-Driven Sufficient?Might we see a revolution in data-driven instructionalsequencing?
More dataMore computational powerBetter RL algorithms
Similar advances have recently revolutionized the fields ofcomputer vision, natural language processing, andcomputational game-playing.
49
![Page 151: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/151.jpg)
Is Data-Driven Sufficient?Is Data-Driven Sufficient?Might we see a revolution in data-driven instructionalsequencing?
More dataMore computational powerBetter RL algorithms
Similar advances have recently revolutionized the fields ofcomputer vision, natural language processing, andcomputational game-playing.Why not instruction?
49
![Page 152: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/152.jpg)
Is Data-Driven Sufficient?Is Data-Driven Sufficient?Might we see a revolution in data-driven instructionalsequencing?
More dataMore computational powerBetter RL algorithms
Similar advances have recently revolutionized the fields ofcomputer vision, natural language processing, andcomputational game-playing.Why not instruction?
Learning is fundamentally different from images,language, and games.
49
![Page 153: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/153.jpg)
Is Data-Driven Sufficient?Is Data-Driven Sufficient?Might we see a revolution in data-driven instructionalsequencing?
More dataMore computational powerBetter RL algorithms
Similar advances have recently revolutionized the fields ofcomputer vision, natural language processing, andcomputational game-playing.Why not instruction?
Learning is fundamentally different from images,language, and games.Baselines are much stronger for instructional sequencing.
49
![Page 154: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/154.jpg)
So, where is the reward?So, where is the reward?
In the coming years, will likely see both purely data-driven(deep learning) approaches as well as theory+data-drivenapproaches to instructional sequencing.
50
![Page 155: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/155.jpg)
So, where is the reward?So, where is the reward?
In the coming years, will likely see both purely data-driven(deep learning) approaches as well as theory+data-drivenapproaches to instructional sequencing.
Only time can tell where the reward lies, but our robustevaluation suggests combining theory and data.
50
![Page 156: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/156.jpg)
So, where is the reward?So, where is the reward?
In the coming years, will likely see both purely data-driven(deep learning) approaches as well as theory+data-drivenapproaches to instructional sequencing.
Only time can tell where the reward lies, but our robustevaluation suggests combining theory and data.By reviewing the history and prior empirical literature, wecan have a better sense of the terrain we are operating in.
50
![Page 157: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/157.jpg)
So, where is the reward?So, where is the reward?
Applying RL to instructional sequencing has beenrewarding in other ways:
51
![Page 158: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/158.jpg)
So, where is the reward?So, where is the reward?
Applying RL to instructional sequencing has beenrewarding in other ways:
Advances have been made to the field of RL.
51
![Page 159: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/159.jpg)
So, where is the reward?So, where is the reward?
Applying RL to instructional sequencing has beenrewarding in other ways:
Advances have been made to the field of RL.
The Optimal Control of Partially Observable Markov Processes
51
![Page 160: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/160.jpg)
So, where is the reward?So, where is the reward?
Applying RL to instructional sequencing has beenrewarding in other ways:
Advances have been made to the field of RL.
The Optimal Control of Partially Observable Markov Processes
Our work on importance sampling (Doroudi et al., 2017b)
51
![Page 161: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/161.jpg)
So, where is the reward?So, where is the reward?
Applying RL to instructional sequencing has beenrewarding in other ways:
Advances have been made to the field of RL.
The Optimal Control of Partially Observable Markov Processes
Our work on importance sampling (Doroudi et al., 2017b)
Advances have been made to student modeling.
51
![Page 162: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/162.jpg)
So, where is the reward?So, where is the reward?
Applying RL to instructional sequencing has beenrewarding in other ways:
Advances have been made to the field of RL.
The Optimal Control of Partially Observable Markov Processes
Our work on importance sampling (Doroudi et al., 2017b)
Advances have been made to student modeling.
By continuing to try to optimizeinstruction, we will likely continue toexpand the frontiers of the study of
human and machine learning.
51
![Page 163: A Review of Reinforcement Learning for …shayand/slides/optimizinghumanlearning...A Review of Reinforcement Learning for Instructional Sequencing Shayan Doroudi 1 2 2 Over the past](https://reader035.vdocuments.us/reader035/viewer/2022070715/5ed7a2ec21f2f81ba73da284/html5/thumbnails/163.jpg)
AcknowledgementsAcknowledgements
The research reported here was supported, in whole or in part, bythe Institute of Education Sciences, U.S. Department of Education,
through Grants R305A130215 and R305B150008 to Carnegie MellonUniversity. The opinions expressed are those of the authors and donot represent views of the Institute or the U.S. Dept. of Education.
This research was done in collaboration with Vincent Aleven,Emma Brunskill, Kenneth Holstein, and Philip Thomas.
52