Human-machine Synergy: bringing humans and
autonomy into balance
David Garlan
11th IEEE International Conference on Self-Adaptive and Self-Organizing Systems
21 September 2017
Acknowledgements
The material presented in this talk is joint work with
Javier Cámara
Ashutosh Pandey
Bradley Schmerl
Reid Simmons
Roykrong Sukkerd
… and many other students and colleagues
Research funded by NSA, Bosch Corp., US Naval Research Labs, and DARPA.
© David Garlan 2017
Talk Synopsis
Autonomic systems arose (in part) to eliminate humans from system operation.
But completely removing humans is often neither desirable nor possible.
We can build on what we know about engineering autonomic/adaptive systems to support effective human-system synergy.
[Figure: Model-based coordination]
Talk Outline
The need for human-in-the-loop autonomy
  Motivation and challenges
Rainbow as an exemplar of a (MAPE-K) autonomic system
  Tactics, strategies, utility, automated reasoning
Bringing humans into the loop
  Roles and models
  Actions and explanations
A few additional ideas
  Ultron vs Ironman, Thinking Fast & Slow, Brain-Computer Interaction, …
Background
One historical motivation for adaptive systems research has been to automate system oversight and repair, which otherwise would be performed by human operators.
Eliminate errors caused by humans
For many systems, human error accounts for over 40% of system failures.
Reduce the cost of system ownership
Operators are expensive, often accounting for a large percentage of operating cost (e.g., 60-75% of database system lifetime costs).
This has led to a control systems perspective, which replaces humans with a control layer that automatically manages the system.
MAPE-K
[Figure: the MAPE-K loop. Monitor, Analyze, Plan, and Execute stages share Knowledge; Sensors and Effectors connect the loop to the Managed System and its Environment.]
J.O. Kephart and D.M. Chess. "The vision of autonomic computing." IEEE Computer, vol. 36, no. 1, 2003.
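To make the loop concrete, here is a minimal sketch of one MAPE-K control layer in Python. It is an illustration only: the toy `ManagedSystem`, the latency threshold, and the single `add_server` action are assumptions made for this example and do not come from the talk or from Rainbow.

```python
from dataclasses import dataclass, field


@dataclass
class Knowledge:
    """Shared model of the managed system (the 'K' in MAPE-K)."""
    max_latency_ms: float = 2000.0      # hypothetical threshold
    latency_ms: float = 0.0             # last observed latency
    log: list = field(default_factory=list)


class ManagedSystem:
    """Toy managed system: latency falls as servers are added."""

    def __init__(self, servers: int = 1, base_latency_ms: float = 6000.0):
        self.servers = servers
        self.base_latency_ms = base_latency_ms

    def sense_latency(self) -> float:   # plays the role of a sensor
        return self.base_latency_ms / self.servers

    def add_server(self) -> None:       # plays the role of an effector
        self.servers += 1


def monitor(system: ManagedSystem, k: Knowledge) -> None:
    k.latency_ms = system.sense_latency()


def analyze(k: Knowledge) -> bool:
    """Return True when the observed latency violates the threshold."""
    return k.latency_ms > k.max_latency_ms


def plan(k: Knowledge) -> list:
    """Trivial planner: always propose adding one server."""
    return ["add_server"]


def execute(actions: list, system: ManagedSystem, k: Knowledge) -> None:
    for action in actions:
        getattr(system, action)()       # invoke the named effector
        k.log.append(action)


def mape_k_step(system: ManagedSystem, k: Knowledge) -> bool:
    """Run one Monitor-Analyze-Plan-Execute cycle; True if it adapted."""
    monitor(system, k)
    if not analyze(k):
        return False
    execute(plan(k), system, k)
    return True


def run(system: ManagedSystem, k: Knowledge, max_cycles: int = 10) -> int:
    """Repeat cycles until no adaptation is needed; return cycles that adapted."""
    cycles = 0
    while cycles < max_cycles and mape_k_step(system, k):
        cycles += 1
    return cycles
```

Calling `run(ManagedSystem(), Knowledge())` adds servers until the sensed latency no longer exceeds the threshold. A human-in-the-loop variant would let `plan` return actions that a person, rather than an effector, carries out.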
Example: Google File System
Source: “The Google File System.” Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. SOSP 2003.
But …
For many systems, eliminating humans is neither possible nor desirable.
Humans can provide expertise that cannot be easily automated.
Adaptations may require physical intervention.
Control algorithm assumptions may not hold.
Humans may have information not available to the system.
Humans can detect problems that the system may not be aware of.
Humans can detect when adaptations are going badly wrong.
Some degree of oversight will always be necessary.
For social, legal, and economic reasons.
A few examples
System | Human | Collaboration
Enterprise system security controls | Security specialist | Deter and respond to attacks, fraud, …
Dev-ops pipeline | Dev-operator | Oversee continuous integration process
Semi-autonomous car | Driver | Navigate
Airplane auto-pilot | Pilot | Fly plane
Smart home | Occupant | Energy, air quality, security, entertainment
Service robot | Robot owner | Household tasks
Medical device | Patient | Deliver medicine
Challenges
Can we extend our engineering paradigms for adaptive systems to incorporate humans in a principled way?
Ideally, by augmentation rather than replacement.
How can we evaluate when and how humans should be involved? How do we divide the responsibilities?
Must take into account uncertainty, variability in human capability, timing, human autonomy.
How can we improve collaboration?
Build confidence, agree on goals, correct misunderstandings, improve the human-system combination over time.
Rainbow in a Nutshell
A framework that
Allows one to add a (MAPE-K) adaptation control layer to existing systems.
Uses dynamically updated architecture models to detect problems and reason about adaptation.
Can be tailored to specific domains.
Separates concerns through multiple extension points: probes, actuators, models, …
A language (Stitch) for programming adaptations
Tactic – primitive adaptation step
Strategy – decision tree for tactic execution
Rainbow Framework
[Figure: Rainbow framework architecture. Labels: System Layer (Target System, System API, Probes, Effectors); Translation Infrastructure; Adaptation Layer (Model Manager, Architecture Evaluator, Adaptation Manager, Strategy Executor); Gauges.]
Self-Adaptation Example: Znn.com
[Figure: Znn.com architecture. Clients 1..n reach a load balancer over the network; the load balancer dispatches requests to a pool of web servers 1..k. Annotated properties: latency, load, response time.]
Adaptation concerns: client request-response time, content quality, deployment cost
Server pool actions: enlistServers, dischargeServers, restartWebServer, lowerFidelity, raiseFidelity
Load balancer actions: restartLB
Znn.com: Rainbow Customizations
[Figure: the Rainbow framework customized for Znn.com. Labels recovered from the figure follow.]
Probes and gauges: PingRTT, Latency, Bandwidth, Load, Fidelity, Cost
Model properties: ClientT.reqRespLatency, HttpConnT.bandwidth, ServerT.load, ServerT.fidelity, ServerT.cost
Constraint: ClientT.reqRespLatency <= MAX_LATENCY
Adaptation operators: addServer, removeServer, setFidelity
Effector scripts: activateServer.pl, deactivateServer.pl, setFidelity.pl
Legend: MM = Model Manager, AE = Architecture Evaluator, AM = Adaptation Manager, SX = Strategy Executor
Stitch: A Language for Specifying Self-Adaptation Strategies
Control-system model: Selection of the next action in a strategy depends on the observed effects of the previous action.
Uncertainty: Probability of taking branch captures non-determinism in choice of action.
Asynchrony: Explicit timing delays to see impact.
Value system: Utility-based selection of best strategy allows context-sensitive adaptation.
[Figure: a strategy tree. Each branch is annotated with a Condition (C), Probability (P), Delay (D), and Impact (I); branch impacts combine into an Aggregate Impact, which is scored by Utility.]
Stitch: A Language for Architecture-Based Self-Adaptation. Cheng and Garlan. Journal of Systems & Software, 85(12), 2012.
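The way branch probabilities, delays, and impacts roll up into a single utility score can be sketched in Python as follows. This is an illustrative reading of the slide, not Stitch's actual evaluation algorithm: the `Node` class, the additive impact model, the weighted-sum utility, and all numbers are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One branch of a strategy tree (illustrative names, not Stitch syntax)."""
    tactic: str
    probability: float                  # chance this branch is taken from its parent
    delay_ms: int = 0                   # time to wait before observing the effect
    impact: dict = field(default_factory=dict)    # change per quality dimension
    children: list = field(default_factory=list)


def aggregate_impact(node: Node) -> dict:
    """Expected impact of the subtree rooted at node, per quality dimension."""
    total = dict(node.impact)
    for child in node.children:
        for dim, value in aggregate_impact(child).items():
            total[dim] = total.get(dim, 0.0) + child.probability * value
    return total


def expected_delay_ms(node: Node) -> float:
    """Expected total waiting time along the tree."""
    return node.delay_ms + sum(
        child.probability * expected_delay_ms(child) for child in node.children
    )


def utility(impact: dict, weights: dict) -> float:
    """Weighted sum over quality dimensions (assumed utility form)."""
    return sum(weights.get(dim, 0.0) * value for dim, value in impact.items())


def best_strategy(strategies: dict, weights: dict) -> str:
    """Pick the strategy name with the highest expected utility."""
    return max(
        strategies,
        key=lambda name: utility(aggregate_impact(strategies[name]), weights),
    )


# Hypothetical numbers, chosen only to exercise the functions.
outgun = Node(
    "enlistServers", 1.0, 30000, {"performance": 0.4, "cost": -0.2},
    children=[
        Node("done", 0.7),
        Node("lowerFidelity", 0.3, 2000, {"performance": 0.3, "fidelity": -0.5}),
    ],
)
reduce_only = Node("lowerFidelity", 1.0, 2000, {"performance": 0.3, "fidelity": -0.5})
weights = {"performance": 0.6, "cost": 0.2, "fidelity": 0.2}
choice = best_strategy({"Outgun": outgun, "ReduceOnly": reduce_only}, weights)
```

Changing `weights` changes which strategy wins, which is the sense in which utility-based selection is context-sensitive.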
Tactics and Strategies
Tactics define basic actions. Each affects qualities of interest in different ways.
“Add capacity” improves service quality and costs more.
“Reduce service” has the reverse tradeoff.
Strategies combine tactics into multi-step adaptation plans.
Outgun can be used to handle a high-load situation.
Tactic | Description
Add/reduce capacity | Activate/deactivate servers to distribute the workload
Reduce/increase service | Reduce content fidelity level (e.g., text vs. images)
Strategy | Description
Outgun | Combines Add capacity and Reduce service
strategy Outgun
[cHiRespTime] {
  t0: (cHiRespTime) -> enlistServers(1) @[30000 /*ms*/] {  // enlist one server, wait 30 s
    t1: (success) -> done;
    t2: (fail) -> lowerFidelity() @[2000 /*ms*/] {         // lower fidelity, wait 2 s
      t2a: (success) -> done;
      t2b: (fail) -> TNULL;
    }
  }
}
Analyzing/Selecting Stitch Strategies
Predefined strategies based on human expertise. Selection based on instantaneous utility.
Off-line profiling of strategies using a PRISM encoding as a DTMC. Selection based on maximizing aggregate utility.
Off-line synthesis of strategies using PRISM games. Models of system and environment as players in a game.
On-line synthesis of strategies, including timing and receding horizon planning (selection of first action after each planning cycle).
Formal Verification and Strategy Synthesis
[Figure: formal verification and strategy synthesis workflow. Informal requirements are formalized as a probabilistic temporal logic specification, e.g. <<a,b>> P>0.8 [F success] and <<a>> Rmax=? [F success]. The system is captured as a stochastic finite-state system model (states s0 to s6). A probabilistic model checker takes both and produces a result: a coalition strategy and quantitative results (#, %).]
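As a rough illustration of what checking a property such as P>0.8 [F success] involves, the sketch below computes the probability of eventually reaching a target state in a small discrete-time Markov chain by fixed-point iteration. It is hand-rolled Python, not PRISM, and it covers only the single-player DTMC case rather than the coalition (game) operators; the chain and its probabilities are invented for the example.

```python
def reach_probability(transitions, targets, max_iterations=10_000, tolerance=1e-12):
    """Probability of eventually reaching any target state, for every state.

    transitions maps each state to a dict {successor: probability}.
    Iterates from 0 upward, which converges to the reachability probability.
    """
    prob = {state: 1.0 if state in targets else 0.0 for state in transitions}
    for _ in range(max_iterations):
        largest_change = 0.0
        for state, successors in transitions.items():
            if state in targets:
                continue
            updated = sum(p * prob[nxt] for nxt, p in successors.items())
            largest_change = max(largest_change, abs(updated - prob[state]))
            prob[state] = updated
        if largest_change < tolerance:
            break
    return prob


# Hypothetical chain: s2 can loop back to s0; success and fail are absorbing.
DTMC = {
    "s0": {"s1": 0.7, "s2": 0.3},
    "s1": {"success": 0.9, "fail": 0.1},
    "s2": {"s0": 0.5, "fail": 0.5},
    "success": {"success": 1.0},
    "fail": {"fail": 1.0},
}

probabilities = reach_probability(DTMC, {"success"})
satisfied = probabilities["s0"] > 0.8   # does P>0.8 [F success] hold in s0?
```

For this chain the probability from s0 is 0.63 / 0.85 (about 0.741), so the property does not hold; a model checker reports exactly this kind of quantitative result.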
How to Involve Humans?
[Figure: the Rainbow framework (System Layer, Translation Infrastructure, Adaptation Layer with Model Manager, Architecture Evaluator, Adaptation Manager, Strategy Executor, Gauges, Probes, Resource Discovery, Effectors) annotated with four kinds of human involvement: Actions/Decisions, Information, Insight, and Effect changes.]
Challenges for Human “Actuation”
An adequate solution requires the ability to handle the following characteristics of the problem:
Different humans with different capabilities, permissions, and roles.
Varying human attention and readiness to be involved.
Same effect may be accomplished with an automatic mechanism.
Time-scale differences
Effectiveness differences
Requires a way to determine when/how to involve the user in a given context.
Approach
Include humans as first-class elements that are represented as part of the run-time knowledge on which control is based.
Requires a way to model humans.
Augment the repertoire of tactics to include those carried out by a human.
Requires a way to specify the impacts of human involvement, including uncertainty and timing.
Allow the system to involve humans through strategies that can perform these actions.
Requires a way to balance automated and human actions in strategy selection and/or synthesis.
Reasoning about Human Participation in Self-Adaptive Systems. Cámara, Moreno, Garlan. SEAMS 2015.
Candidate Model for Human Involvement
Opportunity-Willingness-Capability Model (OWC)*
Inspiration from human-cyber design
Opportunity: Conditions of applicability for a tactic to be carried out
E.g., is a human physically located on site? Do they have access?
Capability: How likely the human is to succeed at the task
E.g., level of training, seniority, etc.
Willingness: How likely the human is to do the task
E.g., level of attention, stress, annoyance, incentives
*Eskins, Sanders: The Multiple-Asymmetric-Utility System Model: A Framework for Modeling Cyber-Human Systems.
Integration with Stitch
Some tactics are enacted by humans.
Opportunity is captured in strategy conditions.
Willingness and Capability affect probabilities.
Timing is captured by delay: human tactics will likely have longer delays than automated execution.
[Figure: strategy-tree branch annotations: Condition (C), Probability (P), Delay (D), Impact (I).]
DoS Revisited
Returning to our DoS example
Add two tactics carried out by a human: Blackhole and Throttle
Add two dimensions of quality: user annoyance and elimination of malicious users
Tactic | Description
Add/reduce capacity | Activate/deactivate servers to distribute the workload
Reduce/improve service | Reduce/improve content fidelity level (e.g., text vs. images)
Blackhole | Blacklist clients; their requests are dropped
Throttle | Limit the rate of accepted requests
Strategy | Description
Outgun | Combines Add capacity and Reduce service
Eliminate | Combines Blackholing and Throttling
Architecture-Based Self-Protection: Composing and Reasoning about Denial-of-Service Mitigations. Schmerl, et al. 2014 Symposium & Bootcamp on the Science of Security, April 2014.
OWC Model for blackHoleAttacker -1
define boolean ONLNB = exists o:operatorT in M.participants | o.onLocation && !o.busy;
define boolean cHiRespTime = exists c:ClientT in M.components | c.experRespTime > M.MAX_RESPTIME;

tactic blackHoleAttacker() {
  condition { ONLNB && cHiRespTime; }
  action {
    op = Set.RandomSubSet({select o:operatorT in M.participants | o.onLocation && !o.busy}, 1);
    notify(op, "Blackhole potentially malicious clients");
  }
  effect { !cHiRespTime; }
}

Opportunity
Elements: $OE = \{L, B\}$, where $L$ represents the operator’s location, $L.state \in \{$on location (ONL), off location (OFFL)$\}$, and $B$ represents whether the operator is busy, $B.state \in \{$busy (OB), not busy (ONB)$\}$.
Function: $f^{o}_{bha} = (L.state = ONL) \land (B.state = ONB)$
OWC Model for blackHoleAttacker -2
Willingness
Elements: $WE = \{S\}$, where $S$ represents the operator’s stress level.
Function: $f^{w}_{bha} = pr_w(S.state)$, where $pr_w$ maps stress levels to the probability, in $[0,1]$, of the tactic being carried out.
Capability
Elements: $CE = \{T\}$, where $T$ represents the operator’s level of training.
Function: $f^{c}_{bha} = pr_c(T.state)$, where $pr_c$ maps training levels to the probability, in $[0,1]$, of successful tactic performance.
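The three OWC functions for blackHoleAttacker can be sketched in Python as below. The level names and probability tables are invented for illustration, and combining willingness and capability by multiplication assumes the two are independent; the slides say only that both affect branch probabilities.

```python
from dataclasses import dataclass


@dataclass
class Operator:
    on_location: bool
    busy: bool
    stress: str      # hypothetical levels: "low", "medium", "high"
    training: str    # hypothetical levels: "novice", "trained", "expert"


# Made-up lookup tables standing in for pr_w and pr_c.
PR_W = {"low": 0.9, "medium": 0.6, "high": 0.3}
PR_C = {"novice": 0.4, "trained": 0.7, "expert": 0.95}


def opportunity(op: Operator) -> bool:
    """Opportunity function: operator is on location and not busy."""
    return op.on_location and not op.busy


def willingness(op: Operator) -> float:
    """Probability that the operator carries out the tactic."""
    return PR_W[op.stress]


def capability(op: Operator) -> float:
    """Probability that the operator performs the tactic successfully."""
    return PR_C[op.training]


def success_probability(op: Operator) -> float:
    """Branch probability for the human tactic (assumes independence)."""
    if not opportunity(op):
        return 0.0
    return willingness(op) * capability(op)
```

For example, `success_probability(Operator(True, False, "low", "expert"))` multiplies 0.9 by 0.95, while any operator who is off site or busy yields 0.0 because the tactic is not applicable.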
Example: Strategies to Absorb/Eliminate excess traffic
Outgun (fully automated):
strategy Outgun
[cHiRespTime] {
  t0: (cHiRespTime) -> enlistServers(1) @[30000 /*ms*/] {  // enlist one server, wait 30 s
    t1: (success) -> done;
    t2: (fail) -> lowerFidelity() @[2000 /*ms*/] {         // lower fidelity, wait 2 s
      t2a: (success) -> done;
      t2b: (fail) -> TNULL;
    }
  }
}
Eliminate (relies on human effectors):
strategy Eliminate
[ONLNB && (unhandledMalicious || unhandledSuspicious)] {
  t0: (unhandledMalicious) -> blackHoleAttacker() @[300000 /*ms*/] {  // blackhole, wait 5 min
    t0a: (success) -> done;
    t0ab: (unhandledSuspicious) -> throttleSuspicious() @[30000 /*ms*/] {  // throttle, wait 30 s
      t1a: (success) -> done;
      t1b: (fail) -> TNULL;
    }
  }
}
Under what conditions will one strategy be better than the other?
Analysis results: Scenario 1 – favoring elimination of malicious clients
[Figure: Outgun vs. Eliminate accrued utility]
Analysis results: Scenario 2 – favoring optimization of user experience
[Figure: Outgun vs. Eliminate accrued utility]
Analysis results: strategy selection (Scenario 1 – eliminate malicious clients)
Eliminate predominates. Human involvement is useful even if training is limited, or with a low level of malicious clients (20%) if training is good.
Analysis results: strategy selection (Scenario 2 – optimize user experience)
Outgun predominates. Human involvement is only useful if the operator has extensive training (>0.55) and malicious clients exceed 50%.
Open Questions
This is a first step illustrating how formal models of humans can augment approaches to adaptive systems to support principled incorporation of humans as actuators.
Many issues are still unresolved.
How can this approach be extended to the other roles?
Would there be benefits from using richer human models?
Where does the information in the models come from and how is it updated dynamically?
What about other forms of security beyond DoS?
Other domains?
Can the system proactively affect a user’s willingness and capability to improve the collaboration?
A Missing Piece of the Puzzle
Models of humans can improve the collaborative nature of adaptive systems – as we have argued.
But this addresses only half the problem: what about a user’s understanding of the system and its adaptive behavior?
Such understanding is crucial
To improve “willingness” and “capability”
To allow users to detect and correct adaptation errors
To provide missing information to the system
To support trust in a system’s autonomous behavior
Unfortunately, today’s autonomic systems are largely opaque!
Improving Transparency
Key idea: use our formal models for planning as the basis of human-understandable explanation.*
Elements of planning models that can be used for explanation:
Explicit goal for system adaptation
Explicit representation of quality dimensions and utility
Ability to explore alternative tradeoffs
Traceability from utility measures to the quality dimensions and models that contribute to it.
Task Planning of Cyber-Human Systems. Sukkerd, Garlan, Simmons. The 13th International Conf. on Software Engineering & Formal Methods, Springer LNCS 9276 2015.
Example: Service Robots
[Figure: Turtlebot service robot and the models used for planning. Architecture: speed, vision, navigation, …; Map: distance, safety, charging stations, …; Instruction graph: move, set speed, set navigation, …; Power: how much energy is required to do X?]
[Figure: map with locations L1 to L6 illustrating a time-safety tradeoff under a battery energy constraint. Two segments are marked unsafe and the battery is low. Options shown: full speed with high-fidelity vision, or half speed with low-fidelity vision (slower, less safe).]
The Structure of an Explanation
“What am I trying to achieve?”
Goal predicate, optimization objectives, constraints.
“What did I decide to do?”
Narration of the chosen plan.
“What are the expected results and consequences of my decision?”
Expected qualities and properties of the chosen plan (objective measures).
“What are some reasonable alternatives?”
Select from a set of Pareto efficient alternatives.
“Why did I reject the other alternatives?”
Value judgement and tradeoffs (subjective measures).
Example explanation
“I’m planning to go through Corridor A to get to the target. It would take 2 minutes and it would have 0.05-probability of collision. I could reduce time to 1 minute, but at the expense of probability of collision (increase probability of collision to 0.2), by going through Corridor B instead. However, I decided not to do that because the decrease in time is not worth the increase in probability of collision.”
Elements annotated in the explanation: plan or policy, goal predicate, quality attributes of plan, alternative plans, tradeoffs, justification.
A Generalized Tool for Explanation
[Figure: Explanation Generator. Inputs: the plan, plan quality values, alternative plans, and alternative plans’ quality values, together with vocabulary, templates, etc. Output: a natural-language explanation such as the Corridor A / Corridor B example above.]
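A template-based generator of this kind could look roughly like the following Python sketch. The `Plan` fields, the sentence templates, and the fixed justification sentence are assumptions made for illustration; in the approach described here the justification would be derived from the planner's utility model rather than hard-coded.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    route: str
    minutes: float
    collision_probability: float


def minutes_text(minutes: float) -> str:
    """Render a duration with the correct singular or plural unit."""
    unit = "minute" if minutes == 1 else "minutes"
    return f"{minutes:g} {unit}"


def explain(chosen: Plan, alternative: Plan, goal: str = "get to the target") -> str:
    """Fill sentence templates from the plan and one rejected alternative."""
    sentences = [
        f"I'm planning to go through {chosen.route} to {goal}.",
        f"It would take {minutes_text(chosen.minutes)} and it would have "
        f"{chosen.collision_probability:g}-probability of collision.",
    ]
    faster = alternative.minutes < chosen.minutes
    riskier = alternative.collision_probability > chosen.collision_probability
    if faster and riskier:
        # Only the time-versus-safety tradeoff from the example is templated.
        sentences.append(
            f"I could reduce time to {minutes_text(alternative.minutes)} by going "
            f"through {alternative.route} instead, but that would increase the "
            f"probability of collision to {alternative.collision_probability:g}."
        )
        sentences.append(
            "I decided not to do that because the decrease in time is not worth "
            "the increase in probability of collision."
        )
    return " ".join(sentences)


text = explain(Plan("Corridor A", 2, 0.05), Plan("Corridor B", 1, 0.2))
```

With the values from the example explanation, `text` reproduces its four parts: the chosen plan, its expected qualities, the alternative, and the reason for rejecting it.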
Technical Challenges
Explaining a plan that is computed from a probabilistic system model is not easy.
How to describe a plan that maximizes expected utility in non-mathematical ways?
There are many Pareto efficient alternatives; which ones to pick?
Focus on dimensions where there may be disagreement?
Traceability to the source models from which quality attributes are derived requires additional specification.
This is typically lost in most planners.
It is not obvious when to explain.
Unusual situations? When the consequence of mistake is high? When user is unwilling?
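Selecting candidate alternatives starts from the Pareto-efficient set. A minimal sketch of that filter in Python is shown below, assuming every quality dimension is a cost where lower is better; Corridor C and its values are hypothetical additions to the Corridor A and B figures from the example explanation.

```python
def dominates(a: tuple, b: tuple) -> bool:
    """a dominates b if it is no worse in every dimension and better in one.

    Assumes all dimensions are costs (lower is better).
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(plans: dict) -> dict:
    """Keep only plans that no other plan dominates."""
    return {
        name: quality
        for name, quality in plans.items()
        if not any(
            dominates(other, quality)
            for other_name, other in plans.items()
            if other_name != name
        )
    }


# (minutes, probability of collision); Corridor C is a hypothetical extra plan.
plans = {
    "Corridor A": (2.0, 0.05),
    "Corridor B": (1.0, 0.2),
    "Corridor C": (3.0, 0.2),
}
front = pareto_front(plans)
```

Here Corridor C is dropped because Corridor A is both faster and safer, while A and B remain since each wins on one dimension. Choosing which of the remaining alternatives to mention is the open question raised above.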
A Few Ideas to Consider
Building on the use of models and explanations, there are a number of other interesting ideas to explore.
Collaboration philosophy
Ultron versus Ironman
Division of responsibility
Thinking Fast and Slow
New technologies for increasing bandwidth
Brain-Computer Interaction
Philosophy of Collaboration
“Automation Should Be Like Iron Man, Not Ultron.” CACM, Vol. 59, No. 3, pp. 58-61.
Ultron is the ultimate robot – it can do anything (until it can’t).
Ironman is a human amplifier – it turns Tony Stark into a superman.
The “left-over” principle leads to Ultron systems in which only the hard bits are left to a human.
Humans are less engaged, and less frequently needed, so that when it comes time to solve a problem they are not prepared.
Humans must stay in the loop: engaged and learning.
Division of Responsibility
Thinking, Fast and Slow. Kahneman, 2011.
System 1: Reactive, fast, learned
System 2: Deliberative, slow, analytical
Can we exploit this as a natural division of responsibility between human and machine?
Human uses pattern recognition; machine synthesizes plans
Machine reacts quickly to new data; human decides what to do about it
Brain-Computer Interaction
“Improving Human-in-the-Loop Adaptive Systems Using Brain-Computer Interaction.” Lloyd, Huang, Tognoli. SEAMS 2017.
Brain sensing technology is getting steadily cheaper and better.
Can we improve the bandwidth between humans and systems using this technology?
Preliminary results suggest the answer is yes.
Constructed a collaborative stock investment experiment.
Showed that the system could use brain sensors to determine when to trust a human’s investment decisions.
Together they performed better than either alone.
Summary
Human models can augment adaptive systems to improve collaboration.
Explainability is an important requirement for all adaptive systems.
There are many interesting challenges to be addressed in order to achieve true human-machine synergy.
References
The Multiple-Asymmetric-Utility System Model: A Framework for Modeling Cyber-Human Systems. D. Eskins and W. H. Sanders. Proc. of the 8th Intl. Conf. on Quantitative Evaluation of SysTems (QEST 2011), Sept. 2011, pp. 233-242.
Rainbow: Architecture-Based Self Adaptation with Reusable Infrastructure. D. Garlan, et al. IEEE Computer, Vol. 37(10), October 2004.
Stitch: A Language for Architecture-Based Self-Adaptation. S.W. Cheng and D. Garlan. Journal of Systems and Software, Vol. 85(12), December 2012.
Stochastic Game Analysis and Latency Awareness for Proactive Self-Adaptation. J. Cámara, G. A. Moreno, and D. Garlan. In 9th Intl. Symposium on Software Engineering for Adaptive and Self-Managing Systems (SEAMS), June 2014.
Architecture-Based Self-Protection: Composing and Reasoning about Denial-of-Service Mitigations. Schmerl, et al. In HotSoS 2014: 2014 Symposium and Bootcamp on the Science of Security, April 2014.
Reasoning about Human Participation in Self-Adaptive Systems. J. Cámara, G. A. Moreno, and D. Garlan. 10th Intl. Symposium on Software Engineering for Adaptive and Self-Managing Systems (SEAMS), Florence, Italy, May 2015.
Evaluating Trade-Offs of Human Involvement in Self-Adaptive Systems. J. Cámara, B. Schmerl, G. A. Moreno, and D. Garlan. In Managing Trade-Offs in Adaptable Software Architectures. Elsevier.
Automation Should Be Like Iron Man, Not Ultron. T. A. Limoncelli. Communications of the ACM, Vol. 59, No. 3, pp. 58-61, March 2016. First published in ACM Queue, Vol. 13, Issue 8, October 2015.
Task Planning of Cyber-Human Systems. R. Sukkerd, D. Garlan, and R. Simmons. In Proceedings of the 13th International Conference on Software Engineering and Formal Methods, Vol. 9276 of LNCS, Springer, 2015.
Improving Human-in-the-Loop Adaptive Systems Using Brain-Computer Interaction. Lloyd, Huang, and Tognoli. 12th Intl. Symposium on Software Engineering for Adaptive and Self-Managing Systems (SEAMS 2017).