evaluation of option and constraint generation using work domain analysis guliz tokadli jesse...
TRANSCRIPT
![Page 1: Evaluation of Option and Constraint Generation using Work Domain Analysis Guliz Tokadli Jesse Rosalia Dr. Karen Feigh](https://reader036.vdocuments.us/reader036/viewer/2022062409/5697bfeb1a28abf838cb7c08/html5/thumbnails/1.jpg)
Evaluation of Option and Constraint Generation using Work
Domain Analysis
Guliz Tokadli
Jesse Rosalia
Dr. Karen Feigh
![Page 2: Evaluation of Option and Constraint Generation using Work Domain Analysis Guliz Tokadli Jesse Rosalia Dr. Karen Feigh](https://reader036.vdocuments.us/reader036/viewer/2022062409/5697bfeb1a28abf838cb7c08/html5/thumbnails/2.jpg)
2
Interactive Machine LearningProject Overview – Human FactorsBackground
Reinforcement LearningWork Domain AnalysisMicro-world of Pac-Man
Pac-Man StudyFamiliarization & Interview PhaseModeling PhaseOption & Constraint Set Creation PhaseEvaluation Phase
![Page 3: Evaluation of Option and Constraint Generation using Work Domain Analysis Guliz Tokadli Jesse Rosalia Dr. Karen Feigh](https://reader036.vdocuments.us/reader036/viewer/2022062409/5697bfeb1a28abf838cb7c08/html5/thumbnails/3.jpg)
Interactive Machine Learning
Our Objective: Robots and other machines that can learn from people unfamiliar with machine learning algorithms.
3
![Page 4: Evaluation of Option and Constraint Generation using Work Domain Analysis Guliz Tokadli Jesse Rosalia Dr. Karen Feigh](https://reader036.vdocuments.us/reader036/viewer/2022062409/5697bfeb1a28abf838cb7c08/html5/thumbnails/4.jpg)
4
Project OverviewGoal: To develop a method to exploit experienced humans’
knowledge to improve algorithms for machine learning using work domain analysis techniques from cognitive engineering
Current practice: Using programmer derived or auto-derived primitives, options and constraints, or inferring these from human coaching or demonstration
How is our approach new: Provides a systematic way to mine a human’s knowledge about a domain and to translate it to a hierarchical goal structure
How will we know we’ve succeeded: We will be able to generate better machine learning algorithms than current practice in a shorter period of time with less expert intervention
![Page 5: Evaluation of Option and Constraint Generation using Work Domain Analysis Guliz Tokadli Jesse Rosalia Dr. Karen Feigh](https://reader036.vdocuments.us/reader036/viewer/2022062409/5697bfeb1a28abf838cb7c08/html5/thumbnails/5.jpg)
5
Background
![Page 6: Evaluation of Option and Constraint Generation using Work Domain Analysis Guliz Tokadli Jesse Rosalia Dr. Karen Feigh](https://reader036.vdocuments.us/reader036/viewer/2022062409/5697bfeb1a28abf838cb7c08/html5/thumbnails/6.jpg)
6
Reinforcement Learning
A method to generate policies for agent tasked with making decisions.
Learning from action policy: s a It maximizes the reward
![Page 7: Evaluation of Option and Constraint Generation using Work Domain Analysis Guliz Tokadli Jesse Rosalia Dr. Karen Feigh](https://reader036.vdocuments.us/reader036/viewer/2022062409/5697bfeb1a28abf838cb7c08/html5/thumbnails/7.jpg)
7
Option & Constraint – Hierarchical RL
(5) A. J. Irani, J. A. Rosalia, K. Subramanian, C. L. I. Jr., and A. L. Thomaz, “Using both positive and negative instructions from humans to decompose reinforcement learning problems,” Georgia Institute of Technology, Tech. Rep., 2014. (6) R.S.Sutton and A.G.Barto, Reinforcement Learning: An Introduction. The MIT Press, 2012.
Reinforcement Learning Algorithm Structure
Current RL Structure Our Approach: RL Structure
![Page 8: Evaluation of Option and Constraint Generation using Work Domain Analysis Guliz Tokadli Jesse Rosalia Dr. Karen Feigh](https://reader036.vdocuments.us/reader036/viewer/2022062409/5697bfeb1a28abf838cb7c08/html5/thumbnails/8.jpg)
Constraints
c : State x Action
Corresponds to common negative advice forms:
“don’t do X”“don’t climb on the counter”“don’t move onto an non-scared
ghost”
8
![Page 9: Evaluation of Option and Constraint Generation using Work Domain Analysis Guliz Tokadli Jesse Rosalia Dr. Karen Feigh](https://reader036.vdocuments.us/reader036/viewer/2022062409/5697bfeb1a28abf838cb7c08/html5/thumbnails/9.jpg)
Options
Primitive actions are the set of fundamental actions the agent can effect: ”Up”, ”Down”, ”Left” and ”Right”.
In addition to primitive actions, make use of temporally extended actions, Options
An option O: <I, π, β>I: set of states from which o can be initiatedπ(s,a): State x Action [0,1] (Policy)β: State [0,1]; termination condition
Sutton, Precup & Singh, 1999
9
![Page 10: Evaluation of Option and Constraint Generation using Work Domain Analysis Guliz Tokadli Jesse Rosalia Dr. Karen Feigh](https://reader036.vdocuments.us/reader036/viewer/2022062409/5697bfeb1a28abf838cb7c08/html5/thumbnails/10.jpg)
10
Work Domain Analysis Method: Abstraction Hierarchy5-level functional decomposition used for
modelling complex sociotechnical systems. System is described at different levels of
abstraction with “Why - How” relationship.
(1)N. Naikar, R. Hopcroft, and A. Moylan, “Work domain analysis : Theoretical concepts and methodology,” pp. 5–35, 2005.(2)N. Naikar, Work Domain Analysis: Concepts, Guidelines, and Cases. CRC Press, 2013.(3)J. Rasmussen, Information Processing and Human-Machine Interaction: An Approach to Cognitive Engineering. Elsevier Science, 1986.(4) N.A. Stanton, P.M. Salmon, G.H. Walker, C. Baber and D.P. Jenkins, Human Factors Methods, Ch. 4 Cognitive Task Analysis Methods, 2005
![Page 11: Evaluation of Option and Constraint Generation using Work Domain Analysis Guliz Tokadli Jesse Rosalia Dr. Karen Feigh](https://reader036.vdocuments.us/reader036/viewer/2022062409/5697bfeb1a28abf838cb7c08/html5/thumbnails/11.jpg)
11
Work Domain Analysis Method: Abstraction Hierarchy
Levels of AH Explanation of Levels for Pac-Man Examples for Pac-Man
Functional Purposes (FP)
The goals of Pac-Man. Staying Alive
Abstract Functions (AF)
What criteria is required to judge whether Pac-Man is achieving the purposes.
Avoid Ghosts
Generalized Functions (GF)
What functions are required to accomplish Pac-Man’s abstract functions.
Maneuver Pac-Man
Physical Functions (PF)
The limitation and capabilities of the system,
Distance to Nearest Ghost
Physical Objects (PO)
The objects and characters on the maze. Ghosts, Pac-Man, Maze Walls
(1)N. Naikar, R. Hopcroft, and A. Moylan, “Work domain analysis : Theoretical concepts and methodology,” pp. 5–35, 2005.(2)N. Naikar, Work Domain Analysis: Concepts, Guidelines, and Cases. CRC Press, 2013.(3)J. Rasmussen, Information Processing and Human-Machine Interaction: An Approach to Cognitive Engineering. Elsevier Science, 1986.(4) N.A. Stanton, P.M. Salmon, G.H. Walker, C. Baber and D.P. Jenkins, Human Factors Methods, Ch. 4 Cognitive Task Analysis Methods, 2005
![Page 12: Evaluation of Option and Constraint Generation using Work Domain Analysis Guliz Tokadli Jesse Rosalia Dr. Karen Feigh](https://reader036.vdocuments.us/reader036/viewer/2022062409/5697bfeb1a28abf838cb7c08/html5/thumbnails/12.jpg)
12
Micro-World of Pac-Man
Classic arcade game, invented in 1980 with the name as “Puck-man”
Why we chose Pac-Man:Helps to understand the research problem Helps to build AH
Map linking the goals Provide insight into RL policies
Used for many ML and RL studies Allow the comparison of the results Find what is needed quickly and conveniently.
![Page 13: Evaluation of Option and Constraint Generation using Work Domain Analysis Guliz Tokadli Jesse Rosalia Dr. Karen Feigh](https://reader036.vdocuments.us/reader036/viewer/2022062409/5697bfeb1a28abf838cb7c08/html5/thumbnails/13.jpg)
13
Pac-Man StudyFamiliarization Phase: 16 participants played Pac-Man for 10 minutes
Interview Phase: researchers interviewed the participants using a structured interview script designed to generate abstraction hierarchies
Modeling Phase: researchers created abstraction hierarchies for each player and composite abstraction hierarchies for high and low performing players
Algorithm Phase: researchers translated composite abstraction hierarchies into sets of options and constraints
Testing Phase: researchers evaluated the option and constraint sets on three different board sizes
![Page 14: Evaluation of Option and Constraint Generation using Work Domain Analysis Guliz Tokadli Jesse Rosalia Dr. Karen Feigh](https://reader036.vdocuments.us/reader036/viewer/2022062409/5697bfeb1a28abf838cb7c08/html5/thumbnails/14.jpg)
14
Familiarization Phase
Interview Phase
Modeling Phase
Algorithm Phase
Testing Phase
![Page 15: Evaluation of Option and Constraint Generation using Work Domain Analysis Guliz Tokadli Jesse Rosalia Dr. Karen Feigh](https://reader036.vdocuments.us/reader036/viewer/2022062409/5697bfeb1a28abf838cb7c08/html5/thumbnails/15.jpg)
15
Participant Performance
High Performers
Low Performers
![Page 16: Evaluation of Option and Constraint Generation using Work Domain Analysis Guliz Tokadli Jesse Rosalia Dr. Karen Feigh](https://reader036.vdocuments.us/reader036/viewer/2022062409/5697bfeb1a28abf838cb7c08/html5/thumbnails/16.jpg)
16
Familiarization Phase
Interview Phase
Modeling Phase
Algorithm Phase
Testing Phase
![Page 17: Evaluation of Option and Constraint Generation using Work Domain Analysis Guliz Tokadli Jesse Rosalia Dr. Karen Feigh](https://reader036.vdocuments.us/reader036/viewer/2022062409/5697bfeb1a28abf838cb7c08/html5/thumbnails/17.jpg)
17
Modeling Phase - Procedure
1. Creation of individual AH per player
2. Harmonization of statements in Ahs
3. Combination of AHs of high and low performers Performance-based AH
![Page 18: Evaluation of Option and Constraint Generation using Work Domain Analysis Guliz Tokadli Jesse Rosalia Dr. Karen Feigh](https://reader036.vdocuments.us/reader036/viewer/2022062409/5697bfeb1a28abf838cb7c08/html5/thumbnails/18.jpg)
18
WDA: Performance-Based AH
Aggregated AH: Low and High Performance AHs are represented as single AH.
Low PerformersHigh Performers
![Page 19: Evaluation of Option and Constraint Generation using Work Domain Analysis Guliz Tokadli Jesse Rosalia Dr. Karen Feigh](https://reader036.vdocuments.us/reader036/viewer/2022062409/5697bfeb1a28abf838cb7c08/html5/thumbnails/19.jpg)
19
WDA: Performance-Based AH
High Performers Low Performers
Player perspective is as ‘Competitor’ – Maximum score and minimum time -
Player perspective is as ‘Exploratory spirit’ – Learning Tricks -
Having both defensive and offensive actionsHaving dual level of protection for Pac-Man as proactive and reactive strategy
Having only defensive actions
‘Clearing current quadrant’ as action is using as tactic
‘Clearing current quadrant’ is using as strategy
Quick observation to act in minimum time interval
Observation on quadrant is being used for ‘Learning Tricks’
![Page 20: Evaluation of Option and Constraint Generation using Work Domain Analysis Guliz Tokadli Jesse Rosalia Dr. Karen Feigh](https://reader036.vdocuments.us/reader036/viewer/2022062409/5697bfeb1a28abf838cb7c08/html5/thumbnails/20.jpg)
20
Familiarization Phase
Interview Phase
Modeling Phase
Algorithm Phase
Testing Phase
![Page 21: Evaluation of Option and Constraint Generation using Work Domain Analysis Guliz Tokadli Jesse Rosalia Dr. Karen Feigh](https://reader036.vdocuments.us/reader036/viewer/2022062409/5697bfeb1a28abf838cb7c08/html5/thumbnails/21.jpg)
21
Generation of Option and Constraint Sets
(5) A. J. Irani, J. A. Rosalia, K. Subramanian, C. L. I. Jr., and A. L. Thomaz, “Using both positive and negative instructions from humans to decompose reinforcement learning problems,” Georgia Institute of Technology, Tech. Rep., 2014. (6) R.S.Sutton and A.G.Barto, Reinforcement Learning: An Introduction. The MIT Press, 2012.
High Performer-Defined OC Set
Low Performer-Defined OC Set
![Page 22: Evaluation of Option and Constraint Generation using Work Domain Analysis Guliz Tokadli Jesse Rosalia Dr. Karen Feigh](https://reader036.vdocuments.us/reader036/viewer/2022062409/5697bfeb1a28abf838cb7c08/html5/thumbnails/22.jpg)
22
High Performer-Defined OC Set Creation
![Page 23: Evaluation of Option and Constraint Generation using Work Domain Analysis Guliz Tokadli Jesse Rosalia Dr. Karen Feigh](https://reader036.vdocuments.us/reader036/viewer/2022062409/5697bfeb1a28abf838cb7c08/html5/thumbnails/23.jpg)
23
High Performer-Defined Option & Constraint Set
Option Name Policy
eatClosestFood Eat the closest food Pac-Man can reach
eatClosestPowerPellet Eat the closest power pellet Pac-Man can reach
eatScaredGhost Eat the closest scared ghosts Pac-Man can reach, if there exists a scared ghost in the game.
goClosestQuadrant Take the shortest path to the nearest position that leaves the current quadrant
Constraint Name Restriction
avoidUnscaredGhost Do not allow actions that could result in walking into an unscared ghost
eatAllQuadrantFood Do not allow actions that would leave quadrant while it still contains reachable food (food that can be reached by walking through the current quadrant)
![Page 24: Evaluation of Option and Constraint Generation using Work Domain Analysis Guliz Tokadli Jesse Rosalia Dr. Karen Feigh](https://reader036.vdocuments.us/reader036/viewer/2022062409/5697bfeb1a28abf838cb7c08/html5/thumbnails/24.jpg)
24
Low Performer-Defined OC Set Creation
![Page 25: Evaluation of Option and Constraint Generation using Work Domain Analysis Guliz Tokadli Jesse Rosalia Dr. Karen Feigh](https://reader036.vdocuments.us/reader036/viewer/2022062409/5697bfeb1a28abf838cb7c08/html5/thumbnails/25.jpg)
25
Low Performer-Defined Option & Constraint Set
Option Name Policy
eatClosestFood Eat the closest food Pac-Man can reach
goClosestQuadrant Take the shortest path to the nearest position that leaves the current quadrant
runAwayFromGhost Move to maximize distance from all ghosts
Constraint Name Restriction
avoidUnscaredGhost Do not allow actions that could result in walking into an unscared ghost
avoidGhostQuadrant Do not allow actions that would enter or remain in quadrants containing an unscared ghost
![Page 26: Evaluation of Option and Constraint Generation using Work Domain Analysis Guliz Tokadli Jesse Rosalia Dr. Karen Feigh](https://reader036.vdocuments.us/reader036/viewer/2022062409/5697bfeb1a28abf838cb7c08/html5/thumbnails/26.jpg)
26
Researcher Defined Option & Constraint Set
Option Name Policy
eatClosestFood Eat the closest food Pac-Man can reach
eatClosestPowerPellet Eat the closest power pellet Pac-Man can reach
eatClosestGhost Attempt to eat the closest unscared or scared ghost
Constraint Name Restriction
avoidUnscaredGhost Do not allow Pac-Man to occupy positions that are 2 or fewer squares from an unscared ghost
![Page 27: Evaluation of Option and Constraint Generation using Work Domain Analysis Guliz Tokadli Jesse Rosalia Dr. Karen Feigh](https://reader036.vdocuments.us/reader036/viewer/2022062409/5697bfeb1a28abf838cb7c08/html5/thumbnails/27.jpg)
27
Familiarization Phase
Interview Phase
Modeling Phase
Algorithm Phase
Testing Phase
![Page 28: Evaluation of Option and Constraint Generation using Work Domain Analysis Guliz Tokadli Jesse Rosalia Dr. Karen Feigh](https://reader036.vdocuments.us/reader036/viewer/2022062409/5697bfeb1a28abf838cb7c08/html5/thumbnails/28.jpg)
28
Simulation Setup
Implemented Policies for each Option & Constraint Set
Q-learning with constrained options from Irani et al.
Experiments200 independent trials10,000 training episodesScores were averaged over all 200 trials Discount of 0.9, epsilon of 0.4 0 over episodes
![Page 29: Evaluation of Option and Constraint Generation using Work Domain Analysis Guliz Tokadli Jesse Rosalia Dr. Karen Feigh](https://reader036.vdocuments.us/reader036/viewer/2022062409/5697bfeb1a28abf838cb7c08/html5/thumbnails/29.jpg)
30
Results of Option-Constraint Set Testing
0 2000 4000 6000 8000 10000
-400
-200
0
200
400
600
800
Cumulative Number of Learning Episodes
Re
wa
rd
prd-oclp-ochp-oc
High Performer OC
Researcher OC
Low Performer OC
Primitive
Performance (High to Low): High performers > Researcher defined agent > Low performers
![Page 30: Evaluation of Option and Constraint Generation using Work Domain Analysis Guliz Tokadli Jesse Rosalia Dr. Karen Feigh](https://reader036.vdocuments.us/reader036/viewer/2022062409/5697bfeb1a28abf838cb7c08/html5/thumbnails/30.jpg)
31
Thank you!Any Questions?