![Page 1: Demystifying the Black Box: A Test Strategy for Autonomy · Demystifying the Black Box: A Test Strategy for Autonomy Dr. Daniel Porter Institute for Defense Analyses. Operational](https://reader036.vdocuments.us/reader036/viewer/2022081600/605c17e45d87771c8b6b2819/html5/thumbnails/1.jpg)
Demystifying the Black Box: A Test Strategy for Autonomy
Dr. Daniel Porter
Institute for Defense Analyses
Operational Evaluation [email protected]
12 April 2019
Talk Takeaways – Order of Importance
1. Testing should aim to develop a model of system decision-making and confirm the underlying capabilities.
2. The fundamental challenge of testing autonomy and AI is generalizing to unobserved situations.
Talk Takeaways – Order of Discussion
A. The fundamental challenge of testing autonomy and AI is generalizing to unobserved situations.
B. Testing should aim to develop a model of system decision-making and confirm the underlying capabilities.
The fundamental challenge of testing autonomy and AI is generalizing to unobserved situations.
An autonomous car shows up instead of a taxi
If the AI certification process were just the same road test that humans take, would you trust it?
We trust the human because we have a model
• Neither has encountered all the situations it will encounter
• We trust the human but not the car
• I have a model of the human’s decision-making
  - Not being dead is proof the model works
What do I mean by model?
He tried to put the poison where he thinks I would take it.
He’s thinking about where I think he’s thinking about putting it!
“Common Sense” is the collection of models we implicitly understand humans use to act in the world.
If we want humans to appropriately trust machine decisions, the humans must be able to model those decisions.
Trust of decision making has three basic inputs
Goals
Competence
Process
Goals may exist globally
• Don’t die
• Don’t get arrested
• Get paid
Competence
Process
Global goals can spawn task-specific sub-goals
• Don’t die
• Don’t get arrested
• Get paid
Task-specific sub-goal: Get there safely
Competence
Process
Process identifies current state and picks next action
Get there safely
Competence
Process
Human Competence is well-tested
Manufacturing Line
Quality Assurance
Human driver certification can be weak because I can assume life tested most of the underlying Competence
Machines don’t have common sense
• If decision engine is a black box, we don’t understand:
  - Global Goals
  - Moment-to-moment decision Process
  - Underlying Competence those processes depend on
• People fear discontinuities in decision-making
  - Machine Learning is just signal extraction
  - The world is full of correlated but incorrect signal
  - If we don’t have a model, we can’t be confident behavior will be continuous as we move across a dimension
Signal may not be universally useful
The fundamental challenge of testing autonomy and AI is generalizing to unobserved situations
• Aerodynamics model allows inference
  - Test edges, infer center
The fundamental challenge of testing autonomy and AI is generalizing to unobserved situations
• We don’t have models of system decision-making
[Figure: engagement decision space. Axes: Value of Target; # Non-target objects (Collateral Damage). Regions: High Probability Engage, Low Probability Engage, and an unknown region marked "???".]
The fundamental challenge of testing autonomy and AI is generalizing to unobserved situations.
Testing should aim to develop a model of system decision-making and confirm the underlying capabilities.
Testing is about assurance
• Does the system meet its requirements?
  - Contractual
  - Operational
  - Contractual and operational requirements are hopefully aligned
• To what extent do different factors affect performance?
  - Identify areas for improvement
  - Inform development of tactics/guidance
Assurance can be built in different ways
• Brute Force
  - Cover operational space sufficiently for acceptable level of risk
  - Black box forces this approach
• Interpolation
  - Observe limited points and predict between observations
  - Having an underlying model enables this approach
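The contrast between the two approaches can be sketched on a toy one-dimensional operational space. Everything here is an assumption for illustration: `response` is a hypothetical stand-in for a system whose behavior is known to be linear, and the grid spacing is arbitrary. The point is only that a trusted model lets two edge tests stand in for the whole grid.

```python
def response(x: float) -> float:
    """Hypothetical system response; assumed linear for this sketch."""
    return 0.5 * x + 1.0


def run_tests(points):
    """'Test' the system at each point and record the observed response."""
    return {x: response(x) for x in points}


def interpolate(observed, x):
    """Predict the response at x by linear interpolation between tested points."""
    xs = sorted(observed)
    for lo, hi in zip(xs, xs[1:]):
        if lo <= x <= hi:
            t = (x - lo) / (hi - lo)
            return observed[lo] + t * (observed[hi] - observed[lo])
    raise ValueError("x lies outside the tested range")


# Brute force: cover the operational space with a dense grid of tests.
grid = [i / 10 for i in range(0, 101)]
full = run_tests(grid)

# Interpolation: test only the edges and infer the center from the model.
edges = run_tests([0.0, 10.0])

# Largest disagreement between the interpolated and the brute-force answers.
max_error = max(abs(interpolate(edges, x) - full[x]) for x in grid)
```

The agreement holds only because linearity was assumed. With a black box there is no such license, which is why it forces the dense grid.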
Model-based assurance needs to find two things
1. The system’s Goals & Processes are reasonable.
  - What is the system trying to accomplish?
  - What information is the system using, and how does that information change its decision?
2. The Competencies these require are functioning.
  - Is it able to acquire that information and execute its actions?
Autonomy: Making decisions based on environmental input
Autonomy exists at the level of task being considered
Mission: Clear Minefield
Task: Remove mine
Task: Locate Mine
Task: Pick Search Pattern
Some systems won’t need testing to find Goals or Process.
Some systems’ operationally relevant goals are decided by a human
• Procedural Autonomy
  - System has autonomy in the moment-to-moment Process decisions to achieve a Goal
  - E.g., control loops
Systems with just Procedural Autonomy don’t need special test methods in most cases
• The Process is known in advance
  - Physics model or explicitly coded logic
• Brute force is feasible
  - Small operational space or low-risk consequences
• Correct moment-to-moment decisions just affect performance of a defined task
  - If it is performing well, it is making the right decisions
We need models for systems that make Goal decisions
• Executive Autonomy
  - System can set a goal for itself
  - Making “should” decisions about tasks
• “Should” decision correctness usually won’t be captured by typical objective performance metrics
• This is the type of autonomy that really worries people
If you need a model… A Cheat Sheet
1. Decompose the mission into operationally relevant tasks.
2. Determine for which tasks the system has autonomy.
3. Write out information needs of the tasks for which the system has autonomy.
4. Build model by experimenting with system decision-making across those information dimensions.
5. Confirm system’s Competence to get accurate information and execute appropriate actions.
6. Test decision performance in realistic scenarios.
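Step 4 of the cheat sheet can be sketched as a full factorial sweep over the information dimensions, checking which of them ever change the decision. This is a minimal sketch under stated assumptions: `decide` is a hypothetical stand-in for the system under test, and the factor names and levels (including `time_of_day`) are invented for illustration.

```python
from itertools import product


def decide(target_value: int, non_targets: int, time_of_day: str) -> bool:
    """Hypothetical system under test; the real decision logic is unknown."""
    return target_value - 2 * non_targets > 0


# Hypothetical information dimensions and the levels to test for each.
factors = {
    "target_value": [1, 5, 9],
    "non_targets": [0, 2, 4],
    "time_of_day": ["day", "night"],
}


def influential_factors(decide_fn, factors):
    """Return the factors whose value changes the decision in at least one setting."""
    names = list(factors)
    # Observe the decision at every combination of factor levels.
    results = {
        combo: decide_fn(**dict(zip(names, combo)))
        for combo in product(*factors.values())
    }
    influential = set()
    for i, name in enumerate(names):
        for combo, outcome in results.items():
            for alt in factors[name]:
                # Same setting, with only this one factor changed.
                other = combo[:i] + (alt,) + combo[i + 1:]
                if results[other] != outcome:
                    influential.add(name)
    return influential


found = influential_factors(decide, factors)
```

Here `found` contains `target_value` and `non_targets` but not `time_of_day`, which is the kind of statement about "what information the system is using" that the model is meant to capture. A full factorial grows quickly with the number of factors, so a real test design would likely need something more economical.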
Testing needs to become a continuum
• Build a model
  - Demonstrate that information affects decisions correctly
  - Traditional contractor testing
  - Good candidate for M&S
• Demonstrate Competencies
  - Show that system can accurately acquire this information in a timely manner under realistic conditions
  - Traditional developmental testing
• Test the model
  - Show that the system makes appropriate decisions under realistic conditions
  - Traditional operational testing
Testing should provide assurance about the model
Goals
Competence
Process
Talk Takeaways – Order of Importance
1. Testing should aim to develop a model of system decision-making and confirm the underlying capabilities.
2. The fundamental challenge of testing autonomy and AI is generalizing to unobserved situations.
Thank You
Backup
In layman’s terms…
• Goals
  - What does the world look like when I’m done?
  - How do I value achieving a certain situation?
• Process
  - Strategy for:
    - Identifying features of current situation
    - Identifying available options
    - Evaluating how options would change the situation
    - Choosing an option that best meets goals
    - Executing chosen action
• Competence
  - How well can you perform each Process step
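The Goals and Process pieces can be sketched as a small decision loop. This is an illustrative sketch only: the `Agent` class, the driving state (`speed`, `gap`), and the scoring rule are all hypothetical, chosen to echo the "get there safely" sub-goal rather than to describe any real system.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, float]


@dataclass
class Agent:
    goal_value: Callable[[State], float]    # Goals: how much a situation is valued
    options: Callable[[State], List[str]]   # Process: identify available options
    predict: Callable[[State, str], State]  # Process: how an option changes the situation

    def choose(self, state: State) -> str:
        """Process: pick the option whose predicted outcome best meets the goals."""
        return max(
            self.options(state),
            key=lambda action: self.goal_value(self.predict(state, action)),
        )


def predict(state: State, action: str) -> State:
    """Hypothetical one-step model of a car following another vehicle."""
    delta = {"brake": -10.0, "hold": 0.0, "accelerate": 10.0}[action]
    speed = max(0.0, state["speed"] + delta)
    return {"speed": speed, "gap": state["gap"] - speed / 10.0}


def goal_value(state: State) -> float:
    """Hypothetical goal: make progress, but never let the gap fall below 5."""
    return state["speed"] - (1000.0 if state["gap"] < 5.0 else 0.0)


driver = Agent(
    goal_value=goal_value,
    options=lambda state: ["brake", "hold", "accelerate"],
    predict=predict,
)

roomy = driver.choose({"speed": 30.0, "gap": 8.0})
tight = driver.choose({"speed": 30.0, "gap": 7.5})
```

With these numbers the agent holds speed when the gap is 8.0 and brakes when it is 7.5. Competence is the part the sketch takes for granted: it assumes the state is sensed accurately and the chosen action is executed as predicted.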
A test of AI/autonomy should let us know…
• Goals
  - What does the world look like when I’m done?
  - How do I value achieving a certain situation?
• Process
  - Strategy for:
    - Identifying features of current situation
    - Identifying available options
    - Evaluating how options would change the situation
    - Choosing an option that best meets goals
    - Executing chosen action
Goals and Process: these are reasonable
• Capabilities
  - How well can you perform each Process step
Capabilities: these are adequate
The lifecycle of test must be a continuum, not discrete categories operating as independent fiefdoms.
Testing autonomy will require more data
• Autonomous systems will have larger operational spaces
  - Still have to test physical performance
  - Also have to test decision performance
  - Adds (many) factors to test design
• People likely less forgiving of machine decisions
  - Acceptable level of risk will be smaller
  - Requires more evidence to achieve acceptable risk
• Need efficient methods to discover AI’s model
Designing a model is easier than figuring it out.
The Black (Box) Plague should be avoided
AUTONOMY
Cognitive Architecture: Life’s easier if you plan!
Task: Identify what things are where
[Diagram components: VIDEO FEED, SHAPE BUILDER, EDGE DETECTOR, OBJECT ASSEMBLER, SPATIAL LOCALIZER, OBJECT IDENTIFIER]
Design architectures from the start.
Data collection must be built into the system
• The system must record the data itself
  - Impossible to record data in many situations
  - Requires horde of observers when it is possible
• Data collection infrastructure must be a requirement
• This is not just for OT
  - Developers & DT will need the infrastructure too
  - Need to diagnose decisions to fix them
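One way to picture built-in data collection is a decision log that the system writes every time it chooses an action. The sketch below is hypothetical throughout: the `DecisionRecord` fields, the `DecisionLog` class, and the JSON Lines output are illustrative choices, not a format the talk prescribes.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class DecisionRecord:
    """One decision: what the system knew, what it could do, what it chose."""
    task: str
    inputs: Dict[str, Any]
    options: List[str]
    chosen: str
    timestamp: float = field(default_factory=time.time)


class DecisionLog:
    """Hypothetical in-system recorder, appended to at every decision point."""

    def __init__(self) -> None:
        self.records: List[DecisionRecord] = []

    def record(self, task: str, inputs: Dict[str, Any],
               options: List[str], chosen: str) -> DecisionRecord:
        rec = DecisionRecord(task, inputs, options, chosen)
        self.records.append(rec)
        return rec

    def to_jsonl(self) -> str:
        """Serialize the log as one JSON object per line for later analysis."""
        return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in self.records)


log = DecisionLog()
log.record(
    task="Pick Search Pattern",
    inputs={"area_m2": 400, "sea_state": 2},
    options=["lawnmower", "spiral"],
    chosen="lawnmower",
)
exported = json.loads(log.to_jsonl())
```

Recording the inputs and the rejected options alongside the chosen action is what lets developers, DT, and OT diagnose a decision afterwards instead of only observing its outcome.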
We must get more evidence without breaking budgets
• Challenge: Autonomy will require more evidence
• Solution: Build a “body of evidence” over time
  - Targeted testing: cover the space in intelligent ways
    - Each point must provide more evidential value
    - Focus on what test points allow us to learn about system
  - Expand data sources that inform operational evaluation
    - More evidence without more test
Targeted testing must be informed by prior results
• Sequential testing guides targeted testing
  - Pick next test points based on what we learned in past
  - Test over time instead of one massive test
  - Helps maximize value of each point
• Modeling & Simulation can inform targeted testing
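Picking the next test point from past results can be sketched as a search for a decision boundary along one dimension. The sketch rests on loud assumptions: `system_decision` is a hypothetical system with a single threshold at 6.3, and the rule "test the midpoint of the widest interval where the decision flips" is one simple choice among many sequential strategies.

```python
def system_decision(x: float) -> bool:
    """Hypothetical system under test; its threshold is unknown to the tester."""
    return x >= 6.3


def next_test_point(results):
    """Pick the midpoint of the widest interval whose endpoints disagree.

    results maps each tested x to the decision observed there.
    Returns None when no adjacent pair of tested points disagrees.
    """
    xs = sorted(results)
    candidates = [
        (hi - lo, (lo + hi) / 2)
        for lo, hi in zip(xs, xs[1:])
        if results[lo] != results[hi]
    ]
    if not candidates:
        return None
    return max(candidates)[1]


# Start from the edges of the operational space, then test sequentially.
results = {0.0: system_decision(0.0), 10.0: system_decision(10.0)}
for _ in range(6):
    x = next_test_point(results)
    if x is None:
        break
    results[x] = system_decision(x)

# Bracket around the decision boundary after eight tests in total.
lower = max(x for x, engaged in results.items() if not engaged)
upper = min(x for x, engaged in results.items() if engaged)
```

Eight tests narrow the boundary to the interval from 6.25 to 6.40625, whereas an evenly spaced grid of eight points over the same range would leave it located only to within about 1.4. The gain depends entirely on the single-threshold assumption, which is itself a model of the system.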
Targeted testing must not delay fielding
• Challenge: Sequential testing can expand timelines
  - Need to have previous test points to pick the next ones
  - Can’t do this in a live test, so have to test over longer period
• Solution: Push the start of testing left
  - Begin collecting operational-esque data earlier
  - Earlier start means data must support both DT & OT
  - DT/OT needs to become a continuum
  - This is probably desirable for autonomy in any event
    - AI needs realistic environment to see true behavior anyway
    - OT needs to continue to enhance our understanding of system
CT-DT-OT should be CDOT continuum
• Develop a concept of what the model is
• Confirm model, assess underlying capabilities
• Test performance under realistic conditions
Spiral development should probably be the goal
• Challenge: Testing systems with large operational spaces where failure risks human life
  - Too many opportunities for holes in coverage to lead to catastrophic consequences
• Solution: Limited or Incremental Capability Fielding
  - Complex tasks can be broken down into smaller ones
  - Choose a subtask with acceptable risk and test that
  - If it passes this test, approve it for fielding on that task
    - Potentially limit to human supervision
  - Collect field data through built-in infrastructure
  - Over time adjust risk of approved tasks
Risky, complex systems should not be fielded all at once
[Flowchart]
• Choose a sub-capability (SC)
• Perform OT on SC
• Approve SC for human-supervised fielding
• Collect extensive field data
• Increase risk of approved human-supervised tasks
• Collect extensive field data
• Increase risk of approved unsupervised tasks
Talk Takeaways – Order of Importance
1. Testing should aim to develop a model of system decision-making and confirm the underlying capabilities.
2. The lifecycle of test must be a continuum, not discrete categories operating as independent fiefdoms.
3. The fundamental challenge of testing autonomy and AI is generalizing to unobserved situations.
Acknowledgements
• DARPA
  - Explainable AI (XAI)
  - Machine Common Sense (MCS)
• Acquisition Reform
  - Shift Left
  - Integrated Testing
  - Incremental Capability Fielding / Spiral Development
Test should still be scoped around acceptable risk
• What are the possible consequences?
  - Size of operational space
    - How likely are we to miss a problem?
  - Severity of consequence
    - What harm could a failure cause?
• How independent is the system?
  - Time between human control
    - Is it off on its own for long periods of time?
  - Decisions between human control
    - Does it act faster than a human can reasonably intervene?
All vowels have an odd number on the back. What cards do you need to flip to test this fully?
A 3 B 8
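The card puzzle can be worked through in a few lines. The sketch assumes the usual setup for this puzzle, which the slide does not spell out: every card has a letter on one side and a number on the other. The helper name `must_flip` is hypothetical. A card needs flipping only if its hidden side could falsify the rule.

```python
VOWELS = set("AEIOU")


def must_flip(face: str) -> bool:
    """Return True if the hidden side of this card could break the rule."""
    if face.isalpha():
        # A visible vowel breaks the rule if an even number is on the back.
        return face.upper() in VOWELS
    # A visible even number breaks the rule if a vowel is on the back.
    return int(face) % 2 == 0


cards = ["A", "3", "B", "8"]
to_flip = [card for card in cards if must_flip(card)]
```

The result is `["A", "8"]`. Flipping `3` is uninformative because the rule says nothing about what must be behind an odd number, and `B` is not a vowel, so nothing on its back can violate the rule.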
There are four people in a bar. For some you know what they’re drinking. For some you know their age. Who do you check to make sure the law isn’t being broken?
Beer 22 Soda 15
The goal of testing should be developing and confirming a generalizable model of system decision-making.
• Brute force testing is not feasible
• Interpolation required for evaluation and TTPs
• Models enable interpolation
• Solution: structure tests and pick test points based on what allows you to understand the decision process