Leveraging Data Provenance to Enhance Cyber Resilience
Thomas Moyer
Karishma Chadha, Robert Cunningham, Nabil Schear, Warren Smith, Adam Bates, Kevin Butler, Frank Capobianco, Trent Jaeger, and Patrick Cable
IEEE SecDev 2016
4 Nov 2016
DISTRIBUTION STATEMENT A. Approved for public release: distribution unlimited.
This material is based upon work supported by the Assistant Secretary of
Defense for Research and Engineering under Air Force Contract No.
FA8721-05-C-0002 and/or FA8702-15-D-0001. Any opinions, findings,
conclusions or recommendations expressed in this material are those of
the author(s) and do not necessarily reflect the views of the Assistant
Secretary of Defense for Research and Engineering.
Div. 5 Provenance - 2
TMM 05/27/16
“… battles, campaigns, and even wars have been won or lost primarily because of logistics.”
- General Dwight D. Eisenhower
Data Resilience for US Transportation Command
• USTRANSCOM challenges
– 70% of transportation work is done by third-party contractors
– Advanced persistent threats (APTs) have targeted and attacked USTRANSCOM directly and via contractors
• Goal: Ensure integrity of logistics planning operations as they transition to the cloud
– Monitor and inform users of data integrity during logistics planning
USTRANSCOM needs resilient systems to ensure DoD mission success
What is Resilience?
Federal Cybersecurity Research and Development Strategic Plan
Figure 1. Continuously strengthening defensive elements improves success in thwarting malicious cyber activities.
Critical Dependencies
Advancements in the following six areas are critical to developing the S&T for the four elements:
Scientific foundation. The Federal Government should support research that establishes the theoretical, empirical, computational, and data mining foundation needed to address future threats. A strong, rigorous scientific foundation for cybersecurity identifies methods of measurement, testable models, and formal frameworks, as well as forecasting techniques that express the essential security dynamics of cyber systems and processes. Such foundational understanding is the primary basis for developing effective defensive cyber technologies and practices.
Risk management. Cybersecurity decisions in an organization should be based on a shared assessment of the organization’s assets, vulnerabilities, and potential threats, so that security investments can be risk-informed. This must be achieved despite the incomplete knowledge the organization has of its assets, vulnerabilities, exposures, and potential threats. An effective risk management approach requires an ability to assess the likelihood of malicious cyber activity and its possible consequences, and correctly quantify costs resulting from successful exploitation and risk mitigation. Timely, risk-relevant threat intelligence information sharing can improve organizations’ abilities to assess and manage risks.
Human aspects. Researchers are capable of developing innovative technical solutions for protecting cyber systems, but those solutions will fail if they do not recognize how users, defenders, adversaries, and institutions interact with technology. Beyond helping to address the challenges of human-system interactions, collaborative engagement of social scientists in cybersecurity research can increase understanding of the social, behavioral, and economic aspects of cybersecurity and how to improve collective risk governance.
Transition to practice. A well-articulated, coordinated process that transitions the fruits of research into practice is essential to ensure high-impact Federal cybersecurity R&D. The research community, which focuses on developing and demonstrating novel and innovative technologies, and the operational community, which needs to integrate solutions into existing industry products and services, are not always aligned. An effective technology transfer program must be an integral part of any R&D strategy and rely on sustained and significant public-private participation.
“Federal Cybersecurity Research and Development Strategic Plan”, National Science and Technology Council, February 2016
Data provenance provides detection and adaptation capabilities for a resilient system
Outline
• National Need: Resilient Systems
• Data Provenance
• Use Case: Data Integrity for USTRANSCOM
• Future Work and Summary
Data Provenance Enables Resilience
• Data provenance is the history of the ownership and processing of data, used to assess its authenticity
• Data provenance helps to answer:
– Where are all my data?
– Where did they come from?
– Are the data secure and trustworthy?
– How to recover after being attacked?
[Figure: World Wide Web Consortium (W3C) provenance model. Entities (data), Activities (processes), and Agents (users, groups, other systems, etc.) are linked by the relations used, wasGeneratedBy, wasDerivedFrom, wasAssociatedWith, and wasAttributedTo.]
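The W3C-style model on this slide can be sketched as a small typed graph. This is an illustrative Python sketch, not the implementation described in the talk: the `Node` and `ProvGraph` classes and the example names (`raw.csv`, `report.csv`, `summarize`, `alice`) are assumptions. Only the three node kinds and the five relation names come from the slide.

```python
from dataclasses import dataclass, field

# Relation name -> (kind of subject, kind of object), following W3C PROV.
RELATIONS = {
    "used": ("activity", "entity"),
    "wasGeneratedBy": ("entity", "activity"),
    "wasDerivedFrom": ("entity", "entity"),
    "wasAssociatedWith": ("activity", "agent"),
    "wasAttributedTo": ("entity", "agent"),
}


@dataclass(frozen=True)
class Node:
    kind: str  # "entity", "activity", or "agent"
    name: str


@dataclass
class ProvGraph:
    # Each edge is a (subject, relation, object) triple.
    edges: list = field(default_factory=list)

    def relate(self, subject, relation, obj):
        """Record one provenance edge, rejecting ill-typed relations."""
        if relation not in RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        if (subject.kind, obj.kind) != RELATIONS[relation]:
            raise ValueError(f"{relation} cannot link {subject.kind} to {obj.kind}")
        self.edges.append((subject, relation, obj))

    def objects(self, subject, relation):
        """Return every node that `subject` points to via `relation`."""
        return [o for s, r, o in self.edges if s == subject and r == relation]


# Hypothetical example data, not taken from the talk.
raw = Node("entity", "raw.csv")
report = Node("entity", "report.csv")
summarize = Node("activity", "summarize")
alice = Node("agent", "alice")

graph = ProvGraph()
graph.relate(summarize, "used", raw)
graph.relate(report, "wasGeneratedBy", summarize)
graph.relate(report, "wasDerivedFrom", raw)
graph.relate(summarize, "wasAssociatedWith", alice)
graph.relate(report, "wasAttributedTo", alice)
```

Checking the subject and object kinds at insertion time keeps malformed edges, such as an agent that "used" an entity, out of the graph, which simplifies later analysis.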
Secure Data Provenance Challenges
• Granularity
– How much detail to collect?
• Collection
– Where to collect provenance data?
• Encoding
– Do current standards allow the system to fully express the semantics of the data?
• Storage
– How is the provenance data protected against malicious modifications?
• Analysis
– What can the collected data tell system users?
• Adaptation
– What actions are possible, based on the analysis of the provenance data?
Answering these questions incorrectly leads to a provenance system that will not achieve the desired goals
Lincoln Secure Data Provenance Technology
End-to-end integrated provenance system enabling mission system resilience
[Figure: system architecture. Components shown: Mission System (Applications, Software Infrastructure, Operating Systems); Developer Annotation Library; Database Provenance Collector; Linux Provenance Modules; World Wide Web Consortium model (Entity, Activity, Agent); Secure Graph Database; analysis of Ancestors, Descendants, and Anomaly Detection; Active Response; Operator Interfaces. Other labels: Context, Coverage. Stage labels: Granularity, Collection, Encoding, Storage, Analysis, Adaptation.]
Operational Architecture OV-1: Distribution High Level View
[Figure: OV-1 distribution pipeline with the stages Plan, Order, Ship, Pay]
Protecting the integrity of the data and processing in the planning pipeline is
critical to ensuring mission success for USTRANSCOM
Provenance for the Planning Process
• Plans produced by the correct processing pipeline are required for mission success
– Requirements are generated from a request
– A plan is generated from requirements
– Ultimately, a plan is derived from a request
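[Figure: provenance graph for a valid plan (✔). The Requirements Collection activity, associated with the Analyst, used the Request and generated the Requirements. The Planning Service activity, associated with the Planner, used the Requirements and generated the Plan. The Plan wasDerivedFrom the Request.]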
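The derivation chain on this slide (request, then requirements, then plan) amounts to an ancestry query over the provenance graph. The following is a minimal Python sketch under stated assumptions: the dictionary layout and the function names `ancestors` and `derived_from` are hypothetical, and the node names are taken from the slide's figure labels.

```python
from collections import deque

# Each node maps to the nodes it depends on (its direct provenance ancestors).
# Comments name the relation from the slide that each link stands for.
PLANNING_EDGES = {
    "Plan": ["Planning Service"],                       # wasGeneratedBy
    "Planning Service": ["Requirements", "Planner"],    # used, wasAssociatedWith
    "Requirements": ["Requirements Collection"],        # wasGeneratedBy
    "Requirements Collection": ["Request", "Analyst"],  # used, wasAssociatedWith
}


def ancestors(edges, start):
    """Breadth-first walk returning every node that `start` depends on."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for parent in edges.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def derived_from(edges, entity, source):
    """True if `source` appears anywhere in the ancestry of `entity`."""
    return source in ancestors(edges, entity)
```

With this layout, `derived_from(PLANNING_EDGES, "Plan", "Request")` holds because the walk passes through the Planning Service, the Requirements, and Requirements Collection before reaching the Request.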
Anomalies in the Planning Process
• If data used to create a plan is unexpectedly modified, the mission is at risk
– Requirements are generated from a request, then modified by malicious software or an attacker
– A plan is generated from the modified requirements
– The plan is no longer derived from the expected requirements
[Figure: provenance graph for a compromised plan (✗). The nodes are those of the valid planning graph (Request, Requirements Collection, Analyst, Requirements, Planning Service, Planner, Plan) plus a Modified Requirements node, which breaks the expected derivation of the Plan.]
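One way to detect this kind of anomaly is to compare each entity's recorded generator against the expected workflow and then propagate distrust to anything built from a flagged entity. The sketch below is an assumption-laden Python illustration, not the analytic used in the talk: the record layout, the `EXPECTED_GENERATOR` table, and all identifiers such as `reqs-1-mod` are hypothetical.

```python
# Expected workflow: which activity is allowed to generate each kind of entity.
EXPECTED_GENERATOR = {
    "Requirements": "Requirements Collection",
    "Plan": "Planning Service",
}

# Hypothetical provenance records; "used" lists the input entity ids.
RECORDS = [
    {"id": "request-1", "kind": "Request", "generated_by": None, "used": []},
    {"id": "reqs-1", "kind": "Requirements",
     "generated_by": "Requirements Collection", "used": ["request-1"]},
    {"id": "reqs-1-mod", "kind": "Requirements",
     "generated_by": "unknown process", "used": ["reqs-1"]},
    {"id": "plan-good", "kind": "Plan",
     "generated_by": "Planning Service", "used": ["reqs-1"]},
    {"id": "plan-bad", "kind": "Plan",
     "generated_by": "Planning Service", "used": ["reqs-1-mod"]},
]


def untrusted_entities(records):
    """Return ids of entities with an unexpected generator or a tainted input."""
    # Step 1: flag entities whose generator differs from the expected workflow.
    bad = {
        r["id"] for r in records
        if r["kind"] in EXPECTED_GENERATOR
        and r["generated_by"] != EXPECTED_GENERATOR[r["kind"]]
    }
    # Step 2: repeat until stable, flagging anything that used a flagged entity.
    changed = True
    while changed:
        changed = False
        for r in records:
            if r["id"] not in bad and any(u in bad for u in r["used"]):
                bad.add(r["id"])
                changed = True
    return bad
```

On the sample records, the modified requirements are flagged because of their generator, and the plan built from them is flagged by propagation, while the plan built from the original requirements stays trusted.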
Example Code Annotation
• Added instrumentation: 300 lines of code in a system of 311K lines (about 0.1%)
    // Create provenance graph entries
    Attributes.add( "RequirementsId", requirementsId );
    Attributes.add( "TransactionId", transactionId );
    Requirements = newEntity( "Requirements", Attributes );
    // Store provenance
    StoreProvenance( transactionId, Requirements );
Data provenance collection requires very little instrumentation to provide high impact on system resilience
[Figure: Storage Overhead. Bar chart of data storage in MB (axis 0 to 12); legend: Planning Data, Provenance Overhead.]
[Figure: Collection Overhead. Bar chart in seconds (axis 0 to 80) for the operations Req. Collect, Req. Store, and Gen. Plan; legend: Execution Time, Prov. Collection Time.]
Chart annotations: 4% overhead, <1% overhead
Collection and storage overheads are a tiny fraction of the overall computation and storage costs for the system.
USTRANSCOM Provenance Integration
Provenance Analysis Integrated in Operator User Interface
[Figure: operator user interface showing two plans. One is labeled as a good plan, using requirements from known sources; the other is labeled as an insecure plan, using requirements from an unknown source.]
Provenance-based data resilience is integrated into mission operators’ workflow
Lincoln Secure Data Provenance Technology
End-to-end integrated provenance system enabling mission system resilience
[Figure: system architecture as on the earlier slide of the same title, with the stages labeled Granularity, Collection, Standard Encoding, Secure Storage, Detect, Adapt, and with Graph Analysis and Operator SA Displays shown.]
Built Secure and Efficient OS-level Provenance Collector
[Figure: bar chart of provenance collector overhead (%) (axis 0 to 10) by system workload: Compilation, Mail Server, DNA Search.]
Lincoln Secure Data Provenance Technology
End-to-end integrated provenance system enabling mission system resilience
[Figure: system architecture as on the earlier slide of the same title, with the stages labeled Granularity, Collection, Standard Encoding, Secure Storage, Detect, Adapt, and with Graph Analysis and Operator SA Displays shown.]
Demonstrated Efficient Queries on Large Provenance Graphs
Ancestry queries on 6.5M node provenance graph using SNAP in-memory graph database
99% of queries return in less than 2.5ms
[Figure: cumulative density (axis 0.95 to 1) of query response time in milliseconds (axis 0 to 25).]
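A latency figure such as "99% of queries return in less than 2.5ms" comes from timing many ancestry queries and reading off a percentile. The Python sketch below shows one way such a measurement could be taken. It does not use SNAP; the tiny synthetic chain graph, the nearest-rank percentile definition, and all function names are assumptions for illustration, and the timings it produces say nothing about the system in the talk.

```python
import math
import time


def ancestors(parents, node):
    """Collect every node reachable by following parent links from `node`."""
    seen, stack = set(), [node]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def time_queries_ms(query, inputs):
    """Run `query` on each input and return the per-call latency in milliseconds."""
    latencies = []
    for item in inputs:
        start = time.perf_counter()
        query(item)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


# Synthetic chain graph: node i was derived from node i - 1.
PARENTS = {i: [i - 1] for i in range(1, 1000)}
latencies = time_queries_ms(lambda n: ancestors(PARENTS, n), range(0, 1000, 10))
p99 = percentile(latencies, 99)
```

Reporting a high percentile rather than the mean matches the cumulative density plot on the slide, since it bounds the latency seen by nearly all queries.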
Lincoln Secure Data Provenance Technology
End-to-end integrated provenance system enabling mission system resilience
[Figure: system architecture as on the earlier slide of the same title, with the stages labeled Granularity, Collection, Standard Encoding, Secure Storage, Detect, Adapt, and with Graph Analysis and Operator SA Displays shown.]
Using Data Provenance to Prevent Data Exfiltration
6.1ms latency to prevent SQL command injection attacks
[Figure: diagram with the labels Database, Web, Server, Guard, and Provenance Collector.]
W3C – World Wide Web Consortium
SA – Situational Awareness
SQL – Structured Query Language
Future Work
• Challenge: Current analytics only provide awareness of problems
– Solution: Active response mechanisms to recover from anomalies
• Challenge: Legacy code requires manual provenance instrumentation
– Solution: Code analysis to automatically retrofit legacy code
• Challenge: Current analysis is tailored to workflow for a specific system
– Solution: Enhanced analytics that detect deviations in an automated way
• Challenge: Systems and analytics rely on a single provenance sensor
– Solution: Integration of multiple provenance sensors for a holistic view of data processing
Summary
• Active work to transfer this technology to other mission areas within the lab
– ISR, Space, Ballistic Missile Defense
Provenance is capable of providing resilience for data processing to ensure mission success
Getting Linux Provenance Modules
Linux Provenance Modules is available for download from http://linuxprovenance.org
Acknowledgements
Informatics and Decision Support Group
Information Integration and Decision Support Group
University Collaborations
Bryan Richard
Robert Rudd
Nabil Schear
Warren Smith
Patrick Cable
Karishma Chadha
Rob Cunningham
Jeff Diewald
Ben Kaiser
Adam Bates
Kevin Butler
Trent Jaeger
Frank Capobianco
Michael Calder
Christopher Botaish
George Heineman
Legal Notices
© 2016 Massachusetts Institute of Technology.
Delivered to the U.S. Government with Unlimited Rights, as defined in DFARS Part 252.227-7013 or 7014 (Feb 2014).
Notwithstanding any copyright notice, U.S. Government rights in this work are defined by DFARS 252.227-7013 or DFARS 252.227-7014 as detailed above. Use of this work other than as specifically authorized by the U.S. Government may violate any copyrights that exist in this work.