Leveraging Data Provenance to Enhance Cyber Resilience

Thomas Moyer

Karishma Chadha, Robert Cunningham, Nabil Schear, Warren Smith, Adam Bates, Kevin Butler, Frank Capobianco, Trent Jaeger, and Patrick Cable

IEEE SecDev 2016

4 Nov 2016

DISTRIBUTION STATEMENT A. Approved for public release: distribution unlimited.

This material is based upon work supported by the Assistant Secretary of Defense for Research and Engineering under Air Force Contract No. FA8721-05-C-0002 and/or FA8702-15-D-0001. Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Assistant Secretary of Defense for Research and Engineering.



Page 2

Div. 5 Provenance - 2

TMM 05/27/16

“… battles, campaigns, and even wars have been won or lost primarily because of logistics.” - General Dwight D. Eisenhower

Page 3: Data Resilience for US Transportation Command

• USTRANSCOM challenges

– 70% of transportation work done by 3rd party contractors

– APTs have targeted and attacked USTRANSCOM directly and via contractors

• Goal: Ensure integrity of logistics planning operations as they transition to the cloud

– Monitor and inform users of data integrity during logistics planning

USTRANSCOM needs resilient systems to ensure DoD mission success

Page 4: What is Resilience?

[Figure: excerpt from the Federal Cybersecurity Research and Development Strategic Plan (page marked 12), including its Figure 1: "Continuously strengthening defensive elements improves success in thwarting malicious cyber activities."]

Critical Dependencies

Advancements in the following six areas are critical to developing the S&T for the four elements:

Scientific foundation. The Federal Government should support research that establishes the theoretical, empirical, computational, and data mining foundation needed to address future threats. A strong, rigorous scientific foundation for cybersecurity identifies methods of measurement, testable models, and formal frameworks, as well as forecasting techniques that express the essential security dynamics of cyber systems and processes. Such foundational understanding is the primary basis for developing effective defensive cyber technologies and practices.

Risk management. Cybersecurity decisions in an organization should be based on a shared assessment of the organization’s assets, vulnerabilities, and potential threats, so that security investments can be risk-informed. This must be achieved despite the incomplete knowledge the organization has of its assets, vulnerabilities, exposures, and potential threats. An effective risk management approach requires an ability to assess the likelihood of malicious cyber activity and its possible consequences, and correctly quantify costs resulting from successful exploitation and risk mitigation. Timely, risk-relevant threat intelligence information sharing can improve organizations’ abilities to assess and manage risks.

Human aspects. Researchers are capable of developing innovative technical solutions for protecting cyber systems, but those solutions will fail if they do not recognize how users, defenders, adversaries, and institutions interact with technology. Beyond helping to address the challenges of human-system interactions, collaborative engagement of social scientists in cybersecurity research can increase understanding of the social, behavioral, and economic aspects of cybersecurity and how to improve collective risk governance.

Transition to practice. A well-articulated, coordinated process that transitions the fruits of research into practice is essential to ensure high-impact Federal cybersecurity R&D. The research community, which focuses on developing and demonstrating novel and innovative technologies, and the operational community, which needs to integrate solutions into existing industry products and services, are not always aligned. An effective technology transfer program must be an integral part of any R&D strategy and rely on sustained and significant public-private participation.

“Federal Cybersecurity Research and Development Strategic Plan”, National Science and Technology Council, February 2016

Data provenance provides detection and adaptation capabilities for a resilient system

[Figure label: Malicious Cyber Activities]

Page 5: Outline

• National Need: Resilient Systems

• Data Provenance

• Use Case: Data Integrity for USTRANSCOM

• Future Work and Summary

Page 6: Data Provenance Enables Resilience

• Data provenance is the history of a data item's ownership and processing, used to judge its authenticity

• Data provenance helps to answer:

– Where are all my data?

– Where did they come from?

– Are the data secure and trustworthy?

– How to recover after being attacked?

[Figure: World Wide Web Consortium (W3C) provenance model. Node types: Entity (data), Activity (processes), Agent (users, groups, other systems, etc.). Relations: used, wasGeneratedBy, wasAssociatedWith, wasDerivedFrom, wasAttributedTo.]
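The W3C provenance model named on this slide can be sketched in a few lines of Python. This is an illustrative sketch only, not the Lincoln implementation: the `ProvGraph` class, its method names, and the sample node IDs are assumptions. Only the three node kinds and the five relation names come from the slide; the endpoint types for each relation follow the W3C PROV data model.

```python
from dataclasses import dataclass, field

# Node kinds and relation names are taken from the slide's W3C diagram.
NODE_KINDS = {"entity", "activity", "agent"}

# Allowed (subject kind, object kind) for each relation, following W3C PROV.
RELATIONS = {
    "used": ("activity", "entity"),
    "wasGeneratedBy": ("entity", "activity"),
    "wasAssociatedWith": ("activity", "agent"),
    "wasDerivedFrom": ("entity", "entity"),
    "wasAttributedTo": ("entity", "agent"),
}


@dataclass
class ProvGraph:
    """Hypothetical in-memory provenance graph (all names are illustrative)."""
    nodes: dict = field(default_factory=dict)  # node id -> kind
    edges: list = field(default_factory=list)  # (subject, relation, object)

    def add_node(self, node_id, kind):
        if kind not in NODE_KINDS:
            raise ValueError(f"unknown node kind: {kind}")
        self.nodes[node_id] = kind

    def relate(self, subject, relation, obj):
        # Reject edges whose endpoints do not match the relation's signature.
        expected = RELATIONS[relation]
        actual = (self.nodes[subject], self.nodes[obj])
        if actual != expected:
            raise TypeError(f"{relation} expects {expected}, got {actual}")
        self.edges.append((subject, relation, obj))


# Sample data (hypothetical): one file produced by one job run by one user.
g = ProvGraph()
g.add_node("report.csv", "entity")
g.add_node("export-job", "activity")
g.add_node("alice", "agent")
g.relate("report.csv", "wasGeneratedBy", "export-job")
g.relate("export-job", "wasAssociatedWith", "alice")
g.relate("report.csv", "wasAttributedTo", "alice")
```

Checking endpoint kinds when an edge is recorded keeps malformed records out of the graph, so later queries can assume that, for example, every `wasGeneratedBy` edge runs from an entity to an activity.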

Page 7: Secure Data Provenance Challenges

• Granularity

– How much detail to collect?

• Collection

– Where to collect provenance data?

• Encoding

– Do current standards allow the system to fully express the semantics of the data?

• Storage

– How is the provenance data protected against malicious modifications?

• Analysis

– What can the collected data tell system users?

• Adaptation

– What actions are possible, based on the analysis of the provenance data?

Answering these questions incorrectly leads to a provenance system that will not achieve the desired goals

Page 8: Lincoln Secure Data Provenance Technology

End-to-end integrated provenance system enabling mission system resilience

[Figure: architecture diagram. Labels: Mission System (Applications, Software Infrastructure, Operating Systems); Context Coverage; Developer Annotation Library; Database Provenance Collector; Linux Provenance Modules; stages Granularity, Collection, Encoding, Storage, Analysis, Adaptation; World Wide Web Consortium model (Activity, Agent, Entity); Secure Graph Database; Ancestors, Descendants, Anomaly Detection; Active Response; Operator Interfaces.]

Page 9: Outline

• National Need: Resilient Systems

• Data Provenance

• Use Case: Data Integrity for USTRANSCOM

• Future Work and Summary

Page 10: Operational Architecture OV-1: Distribution High Level View

[Figure: distribution pipeline stages: Plan, Order, Ship, Pay]

Protecting the integrity of the data and processing in the planning pipeline is critical to ensuring mission success for USTRANSCOM

Page 11: Provenance for the Planning Process

• Plans produced by the correct processing pipeline are required for mission success

– Requirements are generated from a request

– A plan is generated from requirements

– Ultimately, a plan is derived from a request

[Figure: provenance graph for the planning process. Node labels: Request, Requirements, Plan, Requirements Collection, Planning Service, Analyst, Planner. Edge labels: used, wasGeneratedBy, wasAssociatedWith, wasDerivedFrom.]
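The claim that a plan is ultimately derived from a request can be checked with an ancestry query over the provenance graph. The sketch below is a minimal illustration in plain Python. The edge list is one reading of the slide's diagram (for example, that the Analyst is associated with Requirements Collection and the Planner with the Planning Service), and the function name `ancestors` is an assumption, not part of the system described in the talk.

```python
from collections import defaultdict, deque

# Edges point from a node to what it depends on (effect -> cause).
# This edge list is an assumed reading of the slide's diagram.
EDGES = [
    ("Requirements", "wasGeneratedBy", "Requirements Collection"),
    ("Requirements Collection", "used", "Request"),
    ("Requirements Collection", "wasAssociatedWith", "Analyst"),
    ("Plan", "wasGeneratedBy", "Planning Service"),
    ("Planning Service", "used", "Requirements"),
    ("Planning Service", "wasAssociatedWith", "Planner"),
]


def ancestors(edges, start):
    """Return every node reachable from `start` by following dependency edges."""
    depends_on = defaultdict(list)
    for subject, _relation, obj in edges:
        depends_on[subject].append(obj)

    seen, queue = set(), deque([start])
    while queue:  # breadth-first traversal
        node = queue.popleft()
        for parent in depends_on[node]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


plan_ancestors = ancestors(EDGES, "Plan")
# The plan is ultimately derived from the request if the request is an ancestor.
derived_from_request = "Request" in plan_ancestors
```

Here the `wasDerivedFrom` relation between Plan and Request is not stored explicitly. It is inferred by walking the `wasGeneratedBy` and `used` edges, which matches the slide's statement that the derivation is the end result of the two processing steps.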

Page 12: Anomalies in the Planning Process

• If data used to create a plan is unexpectedly modified, the mission is at risk

– Requirements are generated from a request, then modified by malicious software or an attacker

– A plan is generated from modified requirements

– Plan no longer derived from the expected requirements

[Figure: provenance graph for the planning process with an added "Modified Reqs." node. Node labels: Request, Requirements, Modified Reqs., Plan, Requirements Collection, Planning Service, Analyst, Planner. Edge labels: used, wasGeneratedBy, wasAssociatedWith, wasDerivedFrom.]
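One way to detect this kind of anomaly is to check, for each input an activity used, whether that input was generated by the expected upstream activity. The sketch below assumes a simple policy table (`EXPECTED_PRODUCER`) and a function name (`find_anomalies`) invented for illustration. The slides do not describe how the actual analytics are implemented.

```python
# Policy: which upstream activity must have produced the inputs of each activity.
# This table is an assumption made for illustration.
EXPECTED_PRODUCER = {"Planning Service": "Requirements Collection"}


def find_anomalies(edges, expected_producer):
    """Return (activity, entity) pairs where an activity used an entity
    that was not generated by the expected upstream activity."""
    generated_by = {s: o for s, r, o in edges if r == "wasGeneratedBy"}
    anomalies = []
    for subject, relation, obj in edges:
        if relation == "used" and subject in expected_producer:
            if generated_by.get(obj) != expected_producer[subject]:
                anomalies.append((subject, obj))
    return anomalies


# Expected run: the planner consumes requirements from the collection step.
good = [
    ("Requirements", "wasGeneratedBy", "Requirements Collection"),
    ("Requirements Collection", "used", "Request"),
    ("Plan", "wasGeneratedBy", "Planning Service"),
    ("Planning Service", "used", "Requirements"),
]

# Tampered run: the planner consumed requirements of unknown origin.
tampered = [
    ("Requirements", "wasGeneratedBy", "Requirements Collection"),
    ("Requirements Collection", "used", "Request"),
    ("Plan", "wasGeneratedBy", "Planning Service"),
    ("Planning Service", "used", "Modified Reqs."),
]

good_result = find_anomalies(good, EXPECTED_PRODUCER)
tampered_result = find_anomalies(tampered, EXPECTED_PRODUCER)
```

In the tampered run, "Modified Reqs." has no `wasGeneratedBy` edge to Requirements Collection, so the check flags the pair `("Planning Service", "Modified Reqs.")`, while the expected run produces no findings.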

Page 13: Lincoln Secure Data Provenance Technology

End-to-end integrated provenance system enabling mission system resilience

[Figure: architecture diagram, repeated from page 8]

Page 14: Example Code Annotation

• Added instrumentation: 300 lines of code in a system of 311K lines of code

// Create provenance graph entries
Attributes.add( "RequirementsId", requirementsId );
Attributes.add( "TransactionId", transactionId );
Requirements = newEntity( "Requirements", Attributes );

// Store provenance
StoreProvenance( transactionId, Requirements );

Data provenance collection requires very little instrumentation to provide high impact on system resilience
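To show how small such an annotation library can be, here is a hedged Python analogue of the calls on this slide. The slide does not show the library's internals, so `ProvenanceStore`, `new_entity`, and the sample ID values are assumptions. The in-memory dictionary stands in for the secure graph database.

```python
import json


class ProvenanceStore:
    """Hypothetical in-memory stand-in for the provenance database."""

    def __init__(self):
        self.records = {}  # transaction id -> list of serialized entities

    def store_provenance(self, transaction_id, entity):
        # Serialize at write time so later changes to the caller's dict
        # do not alter the stored record.
        serialized = json.dumps(entity, sort_keys=True)
        self.records.setdefault(transaction_id, []).append(serialized)


def new_entity(label, attributes):
    """Build a provenance entity record from a label and an attribute dict."""
    return {"type": "entity", "label": label, "attributes": dict(attributes)}


# Mirrors the slide: build attributes, create the entity, store it.
store = ProvenanceStore()
attributes = {"RequirementsId": "req-42", "TransactionId": "tx-7"}  # sample values
requirements = new_entity("Requirements", attributes)
store.store_provenance("tx-7", requirements)
```

The application code only adds a few lines at the point where the requirements object is created, which is consistent with the small instrumentation footprint reported on the slide.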

Page 15

[Figure: "Storage Overhead" bar chart. Y-axis: data storage in MB (0 to 12). Categories: planning data and provenance overhead.]

[Figure: "Collection Overhead" bar chart. Y-axis: seconds (0 to 80). X-axis: operation (Req. Collect, Req. Store, Gen. Plan). Series: Execution Time, Prov. Collection Time.]

Overhead callouts on the charts: 4% and <1%.

Collection and storage overheads are a tiny fraction of the overall computation and storage costs for the system.

Page 16: USTRANSCOM Provenance Integration

Provenance Analysis Integrated in Operator User Interface

[Figure: operator user interface screenshot with two callouts: "Good plan, using requirements from known sources" and "Insecure plan using requirements from an unknown source"]

Provenance-based data resilience is integrated into mission operators’ workflow

Page 17: Lincoln Secure Data Provenance Technology

End-to-end integrated provenance system enabling mission system resilience

[Figure: architecture diagram. Labels: Mission System (Applications, Software Infrastructure, Operating Systems); Context Coverage; Developer Annotation Library; Database Collector; Linux Provenance Modules; stages Granularity, Collection, Standard Encoding, Secure Storage, Detect, Adapt; Graph Database; Graph Analysis (Ancestors, Descendants, Anomaly Detection); Operator SA Displays; Active Response.]

Built Secure and Efficient OS-level Provenance Collector

[Figure: bar chart of provenance collector overhead (%), axis 0 to 10, by system workload: Compilation, Mail Server, DNA Search]

Page 18: Lincoln Secure Data Provenance Technology

End-to-end integrated provenance system enabling mission system resilience

[Figure: architecture diagram, repeated from page 17]

Demonstrated Efficient Queries on Large Provenance Graphs

[Figure: cumulative density (0.95 to 1) versus response time in milliseconds (0 to 25)]

Ancestry queries on 6.5M node provenance graph using SNAP in-memory graph database

99% of queries return in less than 2.5 ms
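Alongside ancestry queries, the graph analysis stage lists descendant queries, which answer the recovery question of what downstream data is affected when an item is found to be compromised. The sketch below is plain Python over a small edge list and does not use SNAP. The function name `descendants` and the sample edges are assumptions for illustration.

```python
from collections import defaultdict


def descendants(edges, start):
    """Return every node that directly or transitively depends on `start`."""
    dependents = defaultdict(set)
    for subject, _relation, obj in edges:
        dependents[obj].add(subject)  # reverse of the dependency direction

    seen, stack = set(), [start]
    while stack:  # depth-first traversal
        node = stack.pop()
        for child in dependents[node] - seen:
            seen.add(child)
            stack.append(child)
    return seen


# Sample edges (subject depends on object), assumed for illustration.
edges = [
    ("Requirements", "wasGeneratedBy", "Requirements Collection"),
    ("Requirements Collection", "used", "Request"),
    ("Plan", "wasGeneratedBy", "Planning Service"),
    ("Planning Service", "used", "Requirements"),
]

# Everything that must be re-examined if "Requirements" was tampered with.
affected = descendants(edges, "Requirements")
```

For the sample edges, a compromised Requirements entity taints the Planning Service run and the resulting Plan, which is the set an operator would need to regenerate.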

Page 19: Lincoln Secure Data Provenance Technology

End-to-end integrated provenance system enabling mission system resilience

[Figure: architecture diagram, repeated from page 17]

Using Data Provenance to Prevent Data Exfiltration

6.1 ms latency to prevent SQL command injection attacks

[Figure: diagram with components Database, Web Server, Guard, and Provenance Collector]

W3C – World Wide Web Consortium

SA – Situational Awareness

SQL – Structured Query Language

Page 20: Outline

• National Need: Resilient Systems

• Data Provenance

• Use Case: Data Integrity for USTRANSCOM

• Future Work and Summary

Page 21: Future Work

• Challenge: Current analytics only provide awareness of problems

– Solution: Active response mechanisms to recover from anomalies

• Challenge: Legacy code requires manual provenance instrumentation

– Solution: Code analysis to automatically retrofit legacy code

• Challenge: Current analysis is tailored to workflow for a specific system

– Solution: Enhanced analytics that detect deviations in an automated way

• Challenge: Systems and analytics rely on a single provenance sensor

– Solution: Integration of multiple provenance sensors for a holistic view of data processing

Page 22: Summary

• Active work to transfer this technology to other mission areas within the lab

– ISR, Space, Ballistic Missile Defense

Provenance is capable of providing resilience for data processing to ensure mission success

[Figure: architecture diagram, repeated from page 8]

Page 23: Getting Linux Provenance Modules

Linux Provenance Modules is available for download from http://linuxprovenance.org

Page 24: Acknowledgements

Informatics and Decision Support Group

Information Integration and Decision Support Group

University Collaborations

Bryan Richard

Robert Rudd

Nabil Schear

Warren Smith

Patrick Cable

Karishma Chadha

Rob Cunningham

Jeff Diewald

Ben Kaiser

Adam Bates

Kevin Butler

Trent Jaeger

Frank Capobianco

Michael Calder

Christopher Botaish

George Heineman

Page 25: Legal Notices

© 2016 Massachusetts Institute of Technology.

Delivered to the U.S. Government with Unlimited Rights, as defined in DFARS Part 252.227-7013 or 7014 (Feb 2014).

Notwithstanding any copyright notice, U.S. Government rights in this work are defined by DFARS 252.227-7013 or DFARS 252.227-7014 as detailed above. Use of this work other than as specifically authorized by the U.S. Government may violate any copyrights that exist in this work.