
Human-Machine Synergy: Bringing Humans and Autonomy into Balance

David Garlan

11th IEEE International Conference on Self-Adaptive and Self-Organizing Systems

21 September 2017

Acknowledgements

The material presented in this talk is joint work with

Javier Cámara

Ashutosh Pandey

Bradley Schmerl

Reid Simmons

Roykrong Sukkerd

… and many other students and colleagues.

Research funded by NSA, Bosch Corp., US Naval Research Labs, and DARPA.

© David Garlan 2017

Talk Synopsis

Autonomic systems arose (in part) to eliminate humans from system operation.

But completely removing humans is often neither desirable nor possible.

We can build on what we know about engineering autonomic/adaptive systems to support effective human-system synergy.


[Figure: Model-based coordination]

Talk Outline

The need for human-in-the-loop autonomy: motivation and challenges

Rainbow as an exemplar of a (MAPE-K) autonomic system: tactics, strategies, utility, automated reasoning

Bringing humans into the loop: roles and models; actions and explanations

A few additional ideas: Ultron vs. Ironman, Thinking Fast & Slow, Brain-Computer Interaction, …


Background

One historical motivation for adaptive systems research has been to automate system oversight and repair, which otherwise would be performed by human operators.

Eliminate errors caused by humans

For many systems, human error accounts for over 40% of system failures.

Reduce the cost of system ownership

Operators are expensive, often accounting for a large percentage of operating costs (e.g., 60-75% of database system lifetime costs).

This has led to a control systems perspective, which replaces humans with a control layer that automatically manages the system.


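MAPE-K

[Figure: The MAPE-K loop. Monitor, Analyze, Plan, and Execute stages share a common Knowledge base. Sensors and Effectors connect the loop to the Managed System, which operates within its Environment.]

J.O. Kephart and D.M. Chess. "The vision of autonomic computing." IEEE Computer, vol. 36, no. 1, 2003.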
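The following minimal Python sketch runs one MAPE-K iteration at a time, to make the loop concrete. It is illustrative only and does not reflect how Rainbow or any particular framework is implemented. The `ManagedSystem` class, its toy latency sensor (load divided by server count), its `add_server` effector, and the 2.0 latency threshold held in the knowledge base are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ManagedSystem:
    """Hypothetical managed system exposing one sensor and one effector."""
    servers: int = 1
    load: float = 6.0  # arbitrary units

    def latency(self) -> float:
        # Sensor: toy model in which latency falls as servers are added.
        return self.load / self.servers

    def add_server(self) -> None:
        # Effector: enlist one more server.
        self.servers += 1


@dataclass
class Knowledge:
    """Knowledge shared by all MAPE stages (hypothetical threshold)."""
    max_latency: float = 2.0
    history: list = field(default_factory=list)


def mape_k_step(system: ManagedSystem, k: Knowledge) -> bool:
    """Run one Monitor-Analyze-Plan-Execute iteration; return True if it adapted."""
    observed = system.latency()                # Monitor (via sensors)
    k.history.append(observed)
    violated = observed > k.max_latency        # Analyze (against the knowledge)
    plan = ["add_server"] if violated else []  # Plan
    for action in plan:                        # Execute (via effectors)
        getattr(system, action)()
    return bool(plan)


system, knowledge = ManagedSystem(), Knowledge()
while mape_k_step(system, knowledge):
    pass
print(system.servers, knowledge.history)
```

The loop stops once analysis no longer finds a violation. A real manager would also plan scale-down actions and handle effectors that fail.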

Example: Google File System

Source: “The Google File System,” Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. SOSP 2003.

But …

For many systems, eliminating humans is neither possible nor desirable.

Humans can provide expertise that cannot be easily automated.

Adaptations may require physical intervention.

Control algorithm assumptions may not hold.

Humans may have information not available to the system.

Humans can detect problems that the system may not be aware of.

Humans can detect when adaptations are going badly wrong.

Some degree of oversight will always be necessary.

For social, legal, and economic reasons.

A few examples

System | Human | Collaboration
Enterprise system security controls | Security specialist | Deter and respond to attacks, fraud, …
Dev-ops pipeline | Dev-operator | Oversee continuous integration process
Semi-autonomous car | Driver | Navigate
Airplane auto-pilot | Pilot | Fly plane
Smart home | Occupant | Energy, air quality, security, entertainment
Service robot | Robot owner | Household tasks
Medical device | Patient | Deliver medicine

Challenges

Challenges

Can we extend our engineering paradigms for adaptive systems to incorporate humans in a principled way?

Ideally, by augmentation rather than replacement.

How can we evaluate when and how humans should be involved? How do we divide the responsibilities?

Must take into account uncertainty, variability in human capability, timing, and human autonomy.

How can we improve collaboration?

Build confidence, agree on goals, correct misunderstandings, improve the human-system combination over time.



Rainbow in a Nutshell

A framework that

Allows one to add a (MAPE-K) adaptation control layer to existing systems.

Uses dynamically updated architecture models to detect problems and reason about adaptation.

Can be tailored to specific domains.

Separates concerns through multiple extension points: probes, actuators, models, …

A language (Stitch) for programming adaptations:

Tactic – primitive adaptation step

Strategy – decision tree for tactic execution



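Rainbow Framework

[Figure: The Rainbow framework. In the Adaptation Layer, the Model Manager, Gauges, Architecture Evaluator, Adaptation Manager, and Strategy Executor work over an architecture model. In the System Layer, the Target System is accessed through its System API, Probes, and Effectors. A Translation Infrastructure bridges the two layers.]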


Self-Adaptation Example: Znn.com

[Figure: Znn.com architecture. Clients 1…n connect over the 'Net to a load balancer, which distributes requests across a server pool of web servers 1…k. Monitored properties include latency, load, and response time.]

Adaptation concerns: client request-response time, content quality, deployment cost.

Actions: enlistServers, dischargeServers, restartWebServer, lowerFidelity, raiseFidelity, restartLB


Znn.com: Rainbow Customizations

Gauges: PingRTT, Latency, Bandwidth, Load, Fidelity, Cost

Model properties: ClientT.reqRespLatency, HttpConnT.bandwidth, ServerT.load, ServerT.fidelity, ServerT.cost

Constraint: ClientT.reqRespLatency <= MAX_LATENCY

Model operators: addServer, removeServer, setFidelity

Effector scripts: activateServer.pl, deactivateServer.pl, setFidelity.pl

Legend: Model Manager (MM), Architecture Evaluator (AE), Adaptation Manager (AM), Strategy Executor (SX)
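The customization can be illustrated with a toy gauge and evaluator. The gauge averages raw probe samples into the `ClientT.reqRespLatency` model property, and the evaluator checks the `ClientT.reqRespLatency <= MAX_LATENCY` constraint. The averaging gauge, the `Client` class, and the value of `MAX_LATENCY` are assumptions made for illustration. Rainbow's actual gauges and evaluator are configured per system.

```python
from dataclasses import dataclass
from statistics import mean

MAX_LATENCY = 1.0  # seconds; hypothetical threshold


@dataclass
class Client:
    """Model element of type ClientT (illustrative)."""
    name: str
    reqRespLatency: float = 0.0


def latency_gauge(client: Client, probe_samples: list[float]) -> None:
    """Gauge: turn raw probe samples into a model property (here, their mean)."""
    client.reqRespLatency = mean(probe_samples)


def violating_clients(clients: list[Client]) -> list[str]:
    """Evaluator: names of clients that break reqRespLatency <= MAX_LATENCY."""
    return [c.name for c in clients if not c.reqRespLatency <= MAX_LATENCY]


c1, c2 = Client("c1"), Client("c2")
latency_gauge(c1, [0.4, 0.6, 0.5])
latency_gauge(c2, [1.2, 1.8, 1.5])
print(violating_clients([c1, c2]))
```

A non-empty result is what would trigger strategy selection in the Adaptation Manager.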

Stitch: A Language for Specifying Self-Adaptation Strategies

Control-system model: Selection of the next action in a strategy depends on the observed effects of the previous action.

Uncertainty: The probability of taking each branch captures non-determinism in the choice of action.

Asynchrony: Explicit timing delays allow time to observe a tactic's impact.

Value system: Utility-based selection of the best strategy allows context-sensitive adaptation.


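[Figure: A Stitch strategy tree. Each branch is annotated with a Condition (C), Probability (P), Delay (D), and Impact (I). Branch impacts combine into an Aggregate Impact, which is scored by Utility.]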

Stitch: A Language for Architecture-Based Self-Adaptation. Cheng and Garlan. Journal of Systems & Software, 85(12), 2012.
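The aggregate impact in the figure can be computed by a short recursive function. Each branch's impact is added to the expected impact of its subtree, weighted by the branch probability, and the result is scored with a utility function. The tree encoding, impact values, weights, and linear additive utility are assumptions made for illustration. Stitch's actual evaluation also accounts for conditions, delays, and per-dimension utility functions.

```python
from dataclasses import dataclass, field

Impact = dict[str, float]  # expected change per quality dimension


@dataclass
class Branch:
    """A strategy branch: probability of being taken, its tactic's impact, its subtree."""
    probability: float
    impact: Impact
    children: list["Branch"] = field(default_factory=list)


def aggregate_impact(branches: list[Branch]) -> Impact:
    """Expected impact over alternative branches: sum of P(branch) * (own impact + subtree)."""
    total: Impact = {}
    for b in branches:
        sub = aggregate_impact(b.children)
        for dim in set(b.impact) | set(sub):
            step = b.impact.get(dim, 0.0) + sub.get(dim, 0.0)
            total[dim] = total.get(dim, 0.0) + b.probability * step
    return total


def utility(impact: Impact, weights: dict[str, float]) -> float:
    """Hypothetical linear utility: weighted sum of per-dimension impacts."""
    return sum(w * impact.get(dim, 0.0) for dim, w in weights.items())


# Outgun-like tree: enlistServers always runs; with P=0.3 it fails and lowerFidelity follows.
outgun = [Branch(1.0, {"response_time": -0.5, "cost": 1.0}, [
    Branch(0.7, {}),                                         # success -> done
    Branch(0.3, {"response_time": -0.3, "fidelity": -1.0}),  # fail -> lowerFidelity
])]
agg = aggregate_impact(outgun)
weights = {"response_time": -1.0, "cost": -0.2, "fidelity": 0.5}  # lower time/cost is better
print(agg, utility(agg, weights))
```

Comparing this score across candidate strategies is what lets selection depend on the current value system.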

Tactics and Strategies

Tactics define basic actions. Each affects qualities of interest in different ways.

“Add capacity” improves service quality but costs more.

“Reduce service” has the reverse tradeoff.

Strategies combine tactics into multi-step adaptation plans.

Outgun can be used to handle a high-load situation.

Tactic | Description
Add/reduce capacity | Activate/deactivate servers to distribute the workload
Reduce/increase service | Reduce/increase content fidelity level (e.g., text vs. images)

Strategy | Description
Outgun | Combines Add capacity and Reduce service

strategy Outgun
[cHiRespTime] {
  t0: (cHiRespTime) -> enlistServers(1) @[30000 /*ms*/] { // enlist server, wait 30s
    t1: (success) -> done;
    t2: (fail) -> lowerFidelity() @[2000 /*ms*/] { // lower fidelity, wait 2s
      t2a: (success) -> done;
      t2b: (fail) -> TNULL;
    }
  }
}
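The Outgun strategy can be mimicked in Python to show the control-system style of Stitch. Each tactic runs, the executor waits for its delay, and the observed outcome picks the next branch. This is a hypothetical sketch, not the Stitch interpreter. The `Step` class, the simulated `state` dictionary, and the tactic effects are assumptions. So is the `time_scale` knob, which shrinks the 30 s and 2 s delays to zero for testing.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    """One strategy node: run a tactic, wait for its effect, then branch on the outcome."""
    name: str
    tactic: Callable[[], None]
    delay_ms: int
    on_success: Optional["Step"] = None  # None stands for `done`
    on_fail: Optional["Step"] = None     # None stands for TNULL (nothing left to try)


def execute(step: Optional[Step], succeeded: Callable[[], bool],
            time_scale: float = 0.0) -> tuple[list[str], bool]:
    """Walk the strategy tree; time_scale=0 skips real waiting (useful in tests)."""
    executed: list[str] = []
    while step is not None:
        step.tactic()
        executed.append(step.name)
        time.sleep(step.delay_ms / 1000 * time_scale)  # let the effect settle
        if succeeded():
            if step.on_success is None:
                return executed, True
            step = step.on_success
        else:
            if step.on_fail is None:
                return executed, False
            step = step.on_fail
    return executed, False


# Hypothetical simulated system and tactic effects.
state = {"servers": 1, "fidelity": 2, "latency": 3.0}
MAX_LATENCY = 2.0


def enlist_servers() -> None:
    state["servers"] += 1
    state["latency"] -= 0.5


def lower_fidelity() -> None:
    state["fidelity"] -= 1
    state["latency"] -= 1.0


def c_hi_resp_time() -> bool:
    return state["latency"] > MAX_LATENCY


outgun = Step("enlistServers(1)", enlist_servers, 30_000,
              on_fail=Step("lowerFidelity()", lower_fidelity, 2_000))

if c_hi_resp_time():  # strategy applicability condition [cHiRespTime]
    executed, ok = execute(outgun, succeeded=lambda: not c_hi_resp_time())
    print(executed, ok)
```

In this run, enlisting a server alone does not bring latency under the threshold, so the failure branch lowers fidelity as well.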

Analyzing/Selecting Stitch Strategies

Predefined strategies based on human expertise. Selection based on instantaneous utility.

Off-line profiling of strategies using PRISM, encoding them as a DTMC. Selection based on maximizing aggregate utility.

Off-line synthesis of strategies using PRISM-games. Models of system and environment as players in a game.

On-line synthesis of strategies, including timing and receding-horizon planning (selection of the first action after each planning cycle).
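Receding-horizon planning can be illustrated with a toy server-pool model. At every step the planner searches all action sequences over a short forecast window, executes only the first action of the best sequence, and replans at the next step. Everything here is an assumption made for illustration: the deterministic model, the known load forecast, the exhaustive search, and the utility function. On-line synthesis as described above works with probabilistic models and timing, which this sketch omits.

```python
from itertools import product

ACTIONS = ("add", "remove", "none")
MIN_SERVERS, MAX_SERVERS = 1, 5


def apply_action(servers: int, action: str) -> int:
    """Deterministic toy model of one action's effect on the server count."""
    if action == "add":
        return min(servers + 1, MAX_SERVERS)
    if action == "remove":
        return max(servers - 1, MIN_SERVERS)
    return servers


def utility(servers: int, load: float) -> float:
    """Hypothetical utility: penalize latency (load/servers) above 1.0, plus server cost."""
    latency = load / servers
    return -10.0 * max(0.0, latency - 1.0) - servers


def first_action(servers: int, forecast: list[float]) -> str:
    """Search every plan over the forecast window; return only the best plan's first action."""
    best_value, best_first = float("-inf"), "none"
    for plan in product(ACTIONS, repeat=len(forecast)):
        s, value = servers, 0.0
        for action, load in zip(plan, forecast):
            s = apply_action(s, action)
            value += utility(s, load)
        if value > best_value:
            best_value, best_first = value, plan[0]
    return best_first


def receding_horizon(servers: int, loads: list[float], horizon: int = 2):
    """Replan at every step, execute the first action, then slide the window forward."""
    executed = []
    for t in range(len(loads)):
        # Pad the forecast with the last known load near the end of the trace.
        forecast = (loads[t:] + [loads[-1]] * horizon)[:horizon]
        action = first_action(servers, forecast)
        servers = apply_action(servers, action)
        executed.append(action)
    return executed, servers


actions, final_servers = receding_horizon(1, [2.0, 4.0, 4.0, 1.0])
print(actions, final_servers)
```

Because the planner looks two steps ahead, it adds capacity before the load spike arrives and removes it once the forecast drops.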


Formal Verification and Strategy Synthesis

[Figure: Formal verification and strategy synthesis workflow. Informal requirements are formalized as a probabilistic temporal logic specification (e.g., <<a,b>> P>0.8 [F success] and <<a>> Rmax=? [F success]). The system is abstracted as a stochastic finite-state model (states s0–s6). A probabilistic model checker takes both and returns a result, quantitative results (#, %), and a coalition strategy.]
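To show the kind of quantity the model checker computes, this sketch evaluates the probability of eventually reaching `success` (the `F success` path formula) in a small DTMC by value iteration. It then checks the result against the 0.8 bound. The states and transition probabilities are hypothetical. PRISM computes the same quantity with more efficient and numerically robust methods. For stochastic games (the `<<a,b>>` coalition operator), it additionally optimizes over the coalition's strategies, which this sketch does not do.

```python
def prob_eventually(transitions: dict[str, dict[str, float]], targets: set[str],
                    tol: float = 1e-12, max_iter: int = 100_000) -> dict[str, float]:
    """Probability of eventually reaching `targets` from each state (P=? [F target]).

    Value iteration starting from 0 converges from below to the least fixed point.
    """
    p = {s: (1.0 if s in targets else 0.0) for s in transitions}
    for _ in range(max_iter):
        new = {
            s: (1.0 if s in targets
                else sum(pr * p[t] for t, pr in transitions[s].items()))
            for s in transitions
        }
        delta = max(abs(new[s] - p[s]) for s in transitions)
        p = new
        if delta < tol:
            break
    return p


# Hypothetical DTMC: a tactic that succeeds or retries, with an absorbing failure state.
dtmc = {
    "s0": {"s1": 0.9, "fail": 0.1},
    "s1": {"success": 0.8, "s0": 0.2},
    "success": {"success": 1.0},
    "fail": {"fail": 1.0},
}
p = prob_eventually(dtmc, {"success"})
print(round(p["s0"], 4), p["s0"] > 0.8)  # check against the bound in P>0.8 [F success]
```

Solving the two linear equations by hand gives the same value for s0: 0.72 / 0.82.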



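How to Involve Humans?

[Figure: The Rainbow framework (Model Manager, Gauges, Architecture Evaluator, Adaptation Manager, and Strategy Executor in the Adaptation Layer; System API, Probes, Resource Discovery, and Effectors on the Target System in the System Layer). The diagram is annotated with four ways humans can participate: providing information, contributing insight, making actions/decisions, and effecting changes.]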

Challenges for Human “Actuation”

An adequate solution requires the ability to handle the following characteristics of the problem:

Different humans with different capabilities, permissions, and roles.

Varying human attention and readiness to be involved.

The same effect may also be achievable through an automated mechanism.

Time-scale differences

Effectiveness differences

Requires a way to determine when/how to involve the user in a given context.


Approach

Include humans as first-class elements that are represented as part of the run-time knowledge on which control is based.

Requires a way to model humans.

Augment the repertoire of tactics to include those carried out by a human.

Requires a way to specify the impacts of human involvement, including uncertainty and timing.

Allow the system to involve humans through strategies that can perform these actions.

Requires a way to balance automated and human actions in strategy selection and/or synthesis.

Reasoning about Human Participation in Self-Adaptive Systems. Cámara, Moreno, Garlan. SEAMS 2015.

Candidate Model for Human Involvement

Opportunity-Willingness-Capability Model (OWC)*

Inspired by models of cyber-human systems

Opportunity: Conditions of applicability for a tactic to be carried out

E.g., is a human physically located on site? Do they have access?

Capability: How likely the human is to succeed at the task

E.g., level of training, seniority, etc.

Willingness: How likely the human is to do the task

E.g., level of attention, stress, annoyance, incentives


*Eskins, Sanders: The Multiple-Asymmetric-Utility System Model: A Framework for Modeling Cyber-Human Systems.

Integration with Stitch

Some tactics are enacted by humans.

Opportunity is captured in strategy conditions.

Willingness and Capability affect probabilities.

Timing captured by delay -- human tactics will likely have longer delays than automated execution.


[Figure: Stitch strategy tree with Condition (C), Probability (P), Delay (D), and Impact (I) annotations on each branch.]

DoS Revisited

Returning to our DoS example:

Add two tactics carried out by a human: Blackhole and Throttle.

Add two quality dimensions: user annoyance and elimination of malicious users.

Strategy | Description
Outgun | Combines Add capacity and Reduce service
Eliminate | Combines Blackholing and Throttling

Tactic | Description
Add/reduce capacity | Activate/deactivate servers to distribute the workload
Reduce/improve service | Reduce/improve content fidelity level (e.g., text vs. images)
Blackhole | Blacklist clients; their requests are dropped
Throttle | Limit the rate of accepted requests


Architecture-Based Self-Protection: Composing and Reasoning about Denial-of-Service Mitigations. Schmerl, et al. 2014 Symposium & Bootcamp on the Science of Security, April 2014.
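The two human-enacted tactics can be illustrated with a request filter in front of the server pool. Blackholing drops every request from a blacklisted client. Throttling is shown here as a per-client token bucket. That is one common rate-limiting technique and an assumption on our part, since the talk does not say how throttling is implemented. The class names, rates, and client identifiers are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    """Per-client rate limiter: refills `rate` tokens per second, up to `capacity`."""
    rate: float
    capacity: float
    tokens: float = field(init=False)
    last: float = 0.0  # timestamp of the previous request, in seconds

    def __post_init__(self) -> None:
        self.tokens = self.capacity  # start with a full bucket

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, then spend one token if available.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


@dataclass
class RequestFilter:
    """Hypothetical front-end filter on which the Blackhole and Throttle tactics act."""
    blackholed: set[str] = field(default_factory=set)
    throttled: dict[str, TokenBucket] = field(default_factory=dict)

    def blackhole(self, client: str) -> None:
        """Blackhole tactic: drop all future requests from `client`."""
        self.blackholed.add(client)

    def throttle(self, client: str, rate: float, capacity: float) -> None:
        """Throttle tactic: limit the rate of accepted requests from `client`."""
        self.throttled[client] = TokenBucket(rate, capacity)

    def accept(self, client: str, now: float) -> bool:
        if client in self.blackholed:
            return False
        bucket = self.throttled.get(client)
        return bucket is None or bucket.allow(now)


f = RequestFilter()
f.blackhole("mallory")
f.throttle("suspect", rate=1.0, capacity=2.0)
burst = [f.accept("suspect", now=0.0) for _ in range(3)]
later = f.accept("suspect", now=1.0)
print(burst, later, f.accept("mallory", 1.0), f.accept("alice", 1.0))
```

Blackholing has a high payoff against truly malicious clients but annoys any legitimate user it catches. Throttling is gentler. This is why the two tactics interact with the new user-annoyance dimension differently.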

OWC Model for blackHoleAttacker -1

define boolean ONLNB = exists o:operatorT in M.participants | o.onLocation && !o.busy;
define boolean cHiRespTime = exists c:ClientT in M.components | c.experRespTime > M.MAX_RESPTIME;

tactic blackHoleAttacker() {
  condition {
    ONLNB && cHiRespTime;
  }
  action {
    ao = Set.RandomSubSet({select o:operatorT in M.participants | o.onLocation && !o.busy}, 1);
    notify(ao, "Blackhole potentially malicious clients");
  }
  effect {
    !cHiRespTime;
  }
}

Opportunity

Elements: $OE = \{L, B\}$, where $L$ represents the operator's location, $L.state \in \{\text{on location } (ONL), \text{off location } (OFFL)\}$, and $B$ represents whether the operator is busy, $B.state \in \{\text{busy } (OB), \text{not busy } (ONB)\}$.

Function: $f_o^{bha} = (L.state = ONL) \wedge (B.state = ONB)$


OWC Model for blackHoleAttacker -2

Willingness

Elements: $WE = \{S\}$, where $S$ represents the operator's stress level.

Function: $f_w^{bha} = pr_w(S.state)$, where $pr_w$ maps stress levels to $[0,1]$, the probability of the tactic being carried out.

Capability

Elements: $CE = \{T\}$, where $T$ represents the operator's level of training.

Function: $f_c^{bha} = pr_c(T.state)$, where $pr_c$ maps training levels to $[0,1]$, the probability of successful tactic performance.
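The three OWC functions for blackHoleAttacker can be sketched directly. The concrete values in `PR_W` and `PR_C`, the level names, and the choice to multiply willingness and capability are hypothetical; multiplying them assumes the two are independent. The talk specifies only the shape of the functions. In the Stitch encoding, `f_o` corresponds to the ONLNB condition. The product is one possible way to set the branch probability.

```python
from dataclasses import dataclass

# Hypothetical mappings; the talk does not give concrete values.
PR_W = {"low": 0.9, "medium": 0.6, "high": 0.2}                # stress -> P(carried out)
PR_C = {"basic": 0.5, "intermediate": 0.8, "extensive": 0.95}  # training -> P(success)


@dataclass
class Operator:
    on_location: bool  # L.state == ONL
    busy: bool         # B.state == OB
    stress: str        # S.state
    training: str      # T.state


def f_o(op: Operator) -> bool:
    """Opportunity: the operator is on location and not busy."""
    return op.on_location and not op.busy


def f_w(op: Operator) -> float:
    """Willingness: pr_w applied to the stress level."""
    return PR_W[op.stress]


def f_c(op: Operator) -> float:
    """Capability: pr_c applied to the training level."""
    return PR_C[op.training]


def p_tactic_succeeds(op: Operator) -> float:
    """P(blackHoleAttacker is carried out and succeeds), assuming W and C are independent."""
    return f_w(op) * f_c(op) if f_o(op) else 0.0


alice = Operator(on_location=True, busy=False, stress="low", training="intermediate")
bob = Operator(on_location=True, busy=True, stress="low", training="extensive")
print(p_tactic_succeeds(alice), p_tactic_succeeds(bob))
```

Bob is better trained, but he is busy, so the tactic has no opportunity with him and its probability is zero.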


Example: Strategies to Absorb/Eliminate excess traffic

strategy Outgun
[cHiRespTime] {
  t0: (cHiRespTime) -> enlistServers(1) @[30000 /*ms*/] { // enlist server, wait 30s
    t1: (success) -> done;
    t2: (fail) -> lowerFidelity() @[2000 /*ms*/] { // lower fidelity, wait 2s
      t2a: (success) -> done;
      t2b: (fail) -> TNULL;
    }
  }
}

Fully automated

Under what conditions will one strategy be better than the other?

strategy Eliminate
[ONLNB && (unhandledMalicious || unhandledSuspicious)] {
  t0: (unhandledMalicious) -> blackHoleAttacker() @[300000 /*ms*/] { // blackhole, wait 5 min
    t0ab: (unhandledSuspicious) -> throttleSuspicious() @[30000 /*ms*/] { // throttle, wait 30s
    }
  }
}

Relies on human effectors


Analysis results: Scenario 1 favoring elimination of malicious clients

[Figure: Outgun vs. Eliminate accrued utility]


Analysis results: Scenario 2 – favoring optimization of user experience

[Figure: Outgun vs. Eliminate accrued utility]


Analysis results: strategy selection (Scenario 1 – eliminate malicious clients)

Eliminate predominates. Human involvement is useful even when training is limited, and, when training is good, even with a low proportion of malicious clients (20%).


Analysis results: strategy selection (Scenario 2 – optimize user experience)

Outgun predominates. Human involvement is useful only if the operator has extensive training (>0.55) and more than 50% of clients are malicious.


Open Questions

This is a first step illustrating how formal models of humans can augment approaches to adaptive systems to support principled incorporation of humans as actuators.

Many issues are still unresolved:

How can this approach be extended to the other roles?

Would there be benefits from using richer human models?

Where does the information in the models come from and how is it updated dynamically?

What about other forms of security beyond DoS?

Other domains?

Can the system proactively affect a user’s willingness and capability to improve the collaboration?


A Missing Piece of the Puzzle

Models of humans can improve the collaborative nature of adaptive systems – as we have argued.

But this addresses only half the problem: what about a user’s understanding of the system and its adaptive behavior?

Such understanding is crucial

To improve “willingness” and “capability”

To allow users to detect and correct adaptation errors

To provide missing information to the system

To support trust in a system’s autonomous behavior

Unfortunately today’s autonomic systems are largely opaque!


Improving Transparency

Key idea: use our formal models for planning as the basis of human-understandable explanation.*

Elements of planning models that can be used for explanation:

Explicit goal for system adaptation

Explicit representation of quality dimensions and utility

Ability to explore alternative tradeoffs

Traceability from utility measures to the quality dimensions and models that contribute to it.


Task Planning of Cyber-Human Systems. Sukkerd, Garlan, Simmons. The 13th International Conf. on Software Engineering & Formal Methods, Springer LNCS 9276 2015.

Example: Service Robots

[Figure: Planning models for a Turtlebot service robot. Architecture: speed, vision, navigation, …. Map: distance, safety, charging stations, …. Instruction graph: move, set speed, set navigation, …. Power: how much energy is required to do X? A map with locations L1–L6 includes unsafe segments, and a "Low battery!" warning imposes a battery energy constraint. The robot faces a time-safety tradeoff: full speed with high-fidelity vision versus half speed with low-fidelity vision (slower, less safe).]

The Structure of an Explanation

“What am I trying to achieve?”

Goal predicate, optimization objectives, constraints.

“What did I decide to do?”

Narration of the chosen plan.

“What are the expected results and consequences of my decision?”

Expected qualities and properties of the chosen plan (objective measures).

“What are some reasonable alternatives?”

Select from a set of Pareto-efficient alternatives.

“Why did I reject the other alternatives?”

Value judgement and tradeoffs (subjective measures).
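Selecting Pareto-efficient alternatives can be sketched as a dominance filter over plan quality vectors. The two objectives are travel time in minutes and probability of collision, both to be minimized. They match the Corridor A/B example explanation in this section. Corridor C is a hypothetical extra plan, added to show an alternative being filtered out.

```python
Objectives = tuple[float, float]  # (minutes, probability of collision); lower is better


def dominates(a: Objectives, b: Objectives) -> bool:
    """a Pareto-dominates b: no worse on every objective and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(plans: dict[str, Objectives]) -> dict[str, Objectives]:
    """Keep only the plans that no other plan dominates."""
    return {
        name: q for name, q in plans.items()
        if not any(dominates(other, q) for o, other in plans.items() if o != name)
    }


plans = {
    "Corridor A": (2.0, 0.05),
    "Corridor B": (1.0, 0.20),
    "Corridor C": (3.0, 0.10),  # hypothetical: slower and riskier than A
}
print(sorted(pareto_front(plans)))
```

Neither A nor B dominates the other, so both survive. Choosing between them is the subjective tradeoff that the "Why did I reject the other alternatives?" part of an explanation has to justify.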


Example explanation

“I’m planning to go through Corridor A to get to the target. It would take 2 minutes and it would have 0.05-probability of collision. I could reduce time to 1 minute, but at the expense of probability of collision (increase probability of collision to 0.2), by going through Corridor B instead. However, I decided not to do that because the decrease in time is not worth the increase in probability of collision.”


Annotated elements of the explanation: plan or policy, goal predicate, quality attributes of the plan, alternative plans, tradeoffs, justification.

A Generalized Tool for Explanation

[Figure: An Explanation Generator takes the plan, the plan's quality values, alternative plans, and the alternative plans' quality values, together with vocabulary, templates, etc., and produces a natural-language explanation such as the Corridor A example above.]
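A template-based generator of the kind shown in the figure can be sketched in a few lines. This is a hypothetical illustration, not the authors' tool. The `Plan` fields, the single fixed template, and the restriction to a faster-but-riskier alternative are all assumptions. Filled with the Corridor A and B values, it reproduces the wording of the example explanation.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    route: str             # e.g. "Corridor A"
    minutes: float         # expected travel time
    collision_prob: float  # probability of collision


def fmt_minutes(x: float) -> str:
    """Format a duration with correct singular/plural wording."""
    return f"{x:g} minute" + ("" if x == 1 else "s")


def explain(chosen: Plan, alt: Plan) -> str:
    """Fill a fixed template contrasting the chosen plan with a faster but riskier one."""
    if not (alt.minutes < chosen.minutes and alt.collision_prob > chosen.collision_prob):
        raise ValueError("template only covers a faster-but-riskier alternative")
    return (
        f"I'm planning to go through {chosen.route} to get to the target. "
        f"It would take {fmt_minutes(chosen.minutes)} and it would have "
        f"{chosen.collision_prob:g}-probability of collision. "
        f"I could reduce time to {fmt_minutes(alt.minutes)}, but at the expense of "
        f"probability of collision (increase probability of collision to "
        f"{alt.collision_prob:g}), by going through {alt.route} instead. "
        "However, I decided not to do that because the decrease in time is not "
        "worth the increase in probability of collision."
    )


print(explain(Plan("Corridor A", 2.0, 0.05), Plan("Corridor B", 1.0, 0.2)))
```

A general tool would choose among several templates according to which quality dimensions actually differ between plans. That choice is one of the technical challenges discussed next.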

Technical Challenges

Explaining a plan that was computed from a probabilistic system model is not easy.

How to describe a plan that maximizes expected utility in non-mathematical ways?

There are many Pareto-efficient alternatives; which ones to pick?

Focus on dimensions where there may be disagreement?

Traceability to the source models from which quality attributes are derived requires additional specification.

This information is lost by most planners.

It is not obvious when to explain.

Unusual situations? When the consequence of mistake is high? When user is unwilling?



A Few Ideas to Consider

Building on the use of models and explanations, there are a number of other interesting ideas to explore.

Collaboration philosophy

Ultron versus Ironman

Division of responsibility

Thinking Fast and Slow

New technologies for increasing bandwidth

Brain-Computer Interaction


Philosophy of Collaboration

“Automation Should Be Like Iron Man, Not Ultron.” CACM, Vol. 59, No. 3, pp. 58-61.

Ultron is the ultimate robot – it can do anything (until it can’t).

Ironman is a human amplifier – it turns Tony Stark into a superman.

The “left-over” principle leads to Ultron systems in which only the hard bits are left to a human.

Humans are less engaged, and less frequently needed, so that when it comes time to solve a problem they are not prepared.

Humans must stay in the loop: engaged and learning.


Division of Responsibility

Thinking, Fast and Slow (Kahneman, 2011)

System 1: Reactive, fast, learned

System 2: Deliberative, slow, analytical

Can we exploit this as a natural division of responsibility between human and machine?

Human uses pattern recognition; machine synthesizes plans

Machine reacts quickly to new data; human decides what to do about it


Brain-Computer Interaction

“Improving Human-in-the-Loop Adaptive Systems Using Brain-Computer Interaction.” Lloyd, Huang, Tognoli. SEAMS 2017.

Brain sensing technology is getting steadily cheaper and better.

Can we improve the bandwidth between humans and systems using this technology?

Preliminary results suggest the answer is yes.

Constructed a collaborative stock investment experiment.

Showed that the system could use brain sensors to determine when to trust a human’s investment decisions.

Together they performed better than either alone.


Summary

Human models can augment adaptive systems to improve collaboration.

Explainability is an important requirement for all adaptive systems.

There are many interesting challenges to be addressed in order to achieve true human-machine synergy.



References - 1

The Multiple-Asymmetric-Utility System Model: A Framework for Modeling Cyber-Human Systems. D. Eskins & W. H. Sanders. Proc. of the 8th Intl. Conf. on Quantitative Evaluation of SysTems (QEST 2011). Sept 2011, pp 233-242.

Rainbow: Architecture-Based Self Adaptation with Reusable Infrastructure. D. Garlan, et al. IEEE Computer, Vol. 37(10), October 2004.

Stitch: A Language for Architecture-Based Self-Adaptation. S.W. Cheng and D. Garlan. Journal of Systems and Software, Vol. 85(12), December 2012.

Stochastic Game Analysis and Latency Awareness for Proactive Self-Adaptation. J. Cámara, G. A. Moreno, and D. Garlan. In 9th Intl. Symposium on Software Engineering for Adaptive and Self-Managing Systems (SEAMS), June 2014.


References - 2

Architecture-Based Self-Protection: Composing and Reasoning about Denial-of-Service Mitigations. Schmerl, et al. In HotSoS 2014: 2014 Symposium and Bootcamp on the Science of Security, April 2014.

Reasoning about Human Participation in Self-Adaptive Systems. Javier Cámara, Gabriel A. Moreno, David Garlan. 10th Symposium on Software Engineering for Adaptive and Self-Managing Systems (SEAMS), Florence, Italy, May 2015.

Evaluating Trade-Offs of Human Involvement in Self-Adaptive Systems. Javier Cámara, Bradley Schmerl, Gabriel A. Moreno, David Garlan. In Managing Trade-Offs in Adaptable Software Architectures. Elsevier.

Automation Should Be Like Iron Man, Not Ultron. T. A. Limoncelli. ACM Queue, Vol. 13, Issue 8, October 2015; reprinted in CACM, Vol. 59, No. 3, March 2016, pp. 58-61.


References - 3

Task Planning of Cyber-Human Systems. Roykrong Sukkerd, David Garlan, and Reid Simmons. In Proceedings of the 13th International Conference on Software Engineering and Formal Methods, Vol. 9276 of LNCS, Springer, 2015.

Improving Human-in-the-Loop Adaptive Systems Using Brain-Computer Interaction. Lloyd, Huang, and Tognoli. 12th Symposium on Software Engineering for Adaptive and Self-Managing Systems (SEAMS 2017).

