Cloud-based Technologies for Improving Cyber-Security in Petrochemical Applications 13-14 Nov, 2012

Eric Little, PhD Director, Information Management [email protected] 321-480-4818

Orbis Technologies, Inc. Proprietary 1

Steve Hamby Chief Technology Officer [email protected] 678-346-6386

Overview of Core Technologies for Scalable Cloud-based Semantics

RDF, RDFS, RDFa Ontologies Linked Data

Knowledge representation Artificial Intelligence

Rules/Inference Semantic Search

Hadoop MapReduce

Distributed Computing NoSQL

Virtualization Dynamic Provisioning

Natural Language Processing Entity Extraction

Information Extraction Information Retrieval Identity Resolution

Semantic Role Labeling Machine Learning

Information Extraction

CLOUD SEMANTICS

Semantic Annotation Ontology Population

Scalable Inference Distributed Triple Stores

Distributed Indexing

Text Mining

Information Assurance / Cyber Security

Enterprise Architecture

Systems Engineering

IP Portfolio Mgt

TECHNOLOGY SERVICES

Data Integration is Prevalent in Many Major Industries Today

• Cloud Systems

• Service Oriented Architectures (SOA)

• Semantic Integration

All provide powerful components to integrate, share and leverage one’s data.

• It is generally estimated that for each $1 spent on an application, companies spend on average $5 to $9 on its integration

© IBM, Nelson Mattos

Mitigating the Hype Scale

• SOA is becoming more accepted (well along the slope of enlightenment)

• Semantics is slightly behind it (but at least out of the trough of disillusionment)

• Cloud is at the peak of its hype (it can still cure everything, fix all problems, fit every problem, etc.)

• Taken together one can anchor their architectures to more established technologies and temper the hype around new items

20-Nov-12 Orbis Technologies, Inc. Proprietary 4

SEMANTIC CLOUD APPLICATIONS FOR DATA INTEGRATION IN PETROCHEMICAL INDUSTRIES

Select Semantic Technologies Currently Used in Oil & Gas Applications

• Leveraging RDF/OWL and ISO 15926 Part 4 Reference Data Library

• The Norwegian Daily Production Report (DPR) Project

• The Active Knowledge Systems for Integrated Operations (AKSIO) Project

• The Integrated Information Platform (IIP) Project

• InfoWeb, a plant data specialist in the Netherlands, leverages the Semantic Web for developing the ISO 15926 knowledge base

• The Geosciences Network (GEON Grid) seismic infrastructure is a federation of ArcIMS (a server-based application for delivering dynamic maps and GIS data and services via the web)

Improving Data Exchange Can Be Beneficial Across A Product’s Life Cycle

Diagram: life-cycle data sources: Geology; Geo-Chem Data; Assay; Reservoir Modeling; Drill Stem Test; Production; Refinement; Storage/Transport

• Having access to data across the enterprise removes stove-piping, improves communication, advances analytics, etc.; semantics prove to be valuable

• However, risks increase as data is shared

• Providing security as part of one’s IT fabric becomes increasingly important

Building Semantic Profiles From Raw Data

• Key data elements are identified, creating a lexicon of important terms

• Data elements are categorized into appropriate classes; ranges are captured for autoclassification

• Can be applied to crude stocks, equipment, refineries, products, processes, etc.

• Advanced logics allow for reasoning over data sets such that new patterns and information can be gained
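
As a minimal sketch of the range-based autoclassification described above, the following Python assigns a crude stock to a class from its API gravity. The class names and cut-offs follow commonly quoted industry conventions and are assumptions for illustration, not values from this presentation; `CRUDE_CLASSES` and `classify_crude` are hypothetical names.

```python
from typing import Optional

# (class name, inclusive lower bound, exclusive upper bound) in degrees API.
# Cut-offs are illustrative assumptions, not taken from the slides.
CRUDE_CLASSES = [
    ("extra heavy", float("-inf"), 10.0),
    ("heavy", 10.0, 22.3),
    ("medium", 22.3, 31.1),
    ("light", 31.1, float("inf")),
]

def classify_crude(api_gravity: float) -> Optional[str]:
    """Return the first class whose captured range contains the value."""
    for name, low, high in CRUDE_CLASSES:
        if low <= api_gravity < high:
            return name
    return None  # e.g. NaN falls in no range

print(classify_crude(39.6))  # light
```

The same table-of-ranges pattern could apply to equipment ratings or product specifications; only the ranges change, not the classification code.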

Federated Ontology Layers Allow for Advanced Data Modeling

Classification Schemas to Reflect Subject Matter Expertise

• SME knowledge can be captured in taxonomies and extended to more advanced ontology models (that contain attributes, advanced relationships, etc.)

• Multiple ontologies can be integrated to capture enterprise-wide applications for advanced business intelligence

Semantic Approach Improves Data Access

Traditional Approach

Users: Database Experts; Domain Experts & Scientists; Systems Engineers; Management & Executives

• Manual Data Correlation

• Manual Report Generation

(High Potential for Error)

Semantic Approach

Ontology Engine

Users: Domain Experts & Scientists; Systems Engineers; Management & Executives

• Integrated Classifications/Schemas

• Automated Reasoning Capabilities

(Significant Error Reduction)

Semantic Approach Simplifies Queries

Traditional Approach

Users: Database Experts

Reasoning is done on the user side for each query

Query Must Contain:

1. Data Requirements

2. All Logic Required to Relate the Data (Rules, Joins, Decode, Sub-queries, etc.)

Complexity: HIGH

Reusability: LOW-MED

Semantic Approach

Users: Database Experts, Scientists, Systems Engineers, Management & Executives

Reasoning is performed by Ontobroker within the system

Query Must Contain:

1. Data Requirements only

Complexity: LOW

Reusability: HIGH (Logic embedded in Model)
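
The idea of a query that states only its data requirement can be sketched in plain Python. This is not Ontobroker and makes no claim about its API; the class hierarchy, asset tags, and function names below are hypothetical, and the subclass walk stands in for the reasoning that a real ontology engine would perform.

```python
# Hypothetical model: subclass links hold the logic once, for all queries.
SUBCLASS_OF = {
    "CentrifugalPump": "Pump",
    "Pump": "RotatingEquipment",
    "Compressor": "RotatingEquipment",
}
# Hypothetical instance data: asset tag -> asserted class.
INSTANCES = {"P-101": "CentrifugalPump", "K-201": "Compressor", "T-301": "Tank"}

def is_a(cls, target: str) -> bool:
    """Walk the subclass chain upward until target is found or the chain ends."""
    while cls is not None:
        if cls == target:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False

def query(target: str) -> list:
    """The caller states only the data requirement; inference happens here."""
    return sorted(tag for tag, cls in INSTANCES.items() if is_a(cls, target))

print(query("RotatingEquipment"))  # ['K-201', 'P-101']
```

The caller never writes the join from CentrifugalPump to Pump to RotatingEquipment; that logic lives in the model and is reused by every query.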

Example of Enterprise-wide Data Integration Using Semantic Technologies

Ability to Integrate Different Kinds of Relevant Data

Unstructured Textual Data

Structured Data

Customizable User Interfaces

Ontology Engine

BPM for Monitoring of Product Development, Personnel & Communications

Customizable Easy-to-Use Dashboard with Apps

Easy to Use Forms for Inputting of Tasks

Reports Are Auto-Generated

Email Notifications Allow for Improved Monitoring and Notification

Utilizing Semantics for Integration of Data Across Multiple Sources

Mobile Computing and Edge Devices Bringing the Power of the Enterprise to the User … When and Where Needed

• Mobile computing and edge devices can be more easily leveraged

Mobile Dashboards: Provide dashboards to leadership on-the-go

Multi-touch devices: Excellent for audit command center; multiple tasks being worked at once with team collaboration

Figure: Multi-touch device allows multiple people to work on a single problem at once

Figure: Example Mobile Dashboard

USING SEMANTIC TECHNOLOGIES FOR IMPROVED SECURITY

Security Risk Assessment

• Security Risk Assessment is a combination of quantitative and qualitative measures that allow for proactive security measures. There are two major areas where energy companies are highly susceptible to attacks:

1. Company Operations Systems - computers that route electricity, open valves and operate motors

2. Access to Proprietary Corporate Information - internal email communications, long-term development plans, new technologies, and investment information

No single measure can protect an enterprise from cyber attacks

Instead, companies need to develop more accurate (and cost effective) means to combat cyber threats

• Performing a risk assessment of one’s security systems using advanced reasoners aided by semantic models can help reduce risks and leverage current IT investments in data integration.

Enhanced Threat Modeling Using Semantics

• Utilizes advanced analytics to monitor social media, social networks, user data, network traffic

• Can improve visibility into threats by producing useful patterns of activities, groups, behaviors, etc. – information can then be integrated with other data using formal models for threats

• Metadata associated with Twitter can provide useful information

• Entity extraction of text can go one step further and pick out specific items of interest
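
A rough illustration of pulling items of interest out of a short message is shown below. It uses only regular expressions and a hypothetical watch list (`WATCH_TERMS`); a production system would more likely rely on a trained named-entity recognizer, so treat this as a sketch of the idea and not of any specific product.

```python
import re

# Hypothetical watch list; a production system would use a trained extractor.
WATCH_TERMS = {"pipeline", "refinery", "scada"}

def extract_entities(text: str) -> dict:
    """Pull mentions, hashtags, and watch-list terms out of a short message."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {
        "mentions": re.findall(r"@(\w+)", text),
        "hashtags": re.findall(r"#(\w+)", text),
        "terms": sorted(words & WATCH_TERMS),
    }

print(extract_entities("@ops_team outage at the refinery #SCADA"))
```

The extracted fields could then be mapped onto classes in a formal threat model and combined with message metadata such as author, time, and location.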

Formal Ontology of Threat

• A viable threat exists as a tri-partite whole, a potential threat as a 2-part or 1-part whole.

• The 3 elements of a tri-partite whole form necessary dependence relations to one another.

• Tri-partite wholes possess integrated parts which are inextricably related to one another.

• Understanding the structure of threat components can allow for improved computational approaches

Diagram: the three threat elements are Intent, Capability, and Opportunity. A POTENTIAL THREAT holds one or two of them; a VIABLE THREAT holds all three.
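
The tri-partite structure lends itself to a simple computational check. The sketch below is one possible encoding, with hypothetical names; the "no threat" label for the empty case is an assumption added for completeness and does not appear on the slide.

```python
THREAT_ELEMENTS = frozenset({"intent", "capability", "opportunity"})

def classify_threat(observed: set) -> str:
    """Viable if all three elements are present, potential if one or two."""
    present = THREAT_ELEMENTS & set(observed)
    if present == THREAT_ELEMENTS:
        return "viable threat"
    if present:
        return "potential threat"
    return "no threat"  # assumption: label for the empty case

print(classify_threat({"intent", "capability"}))  # potential threat
```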

Next Gen IT Driving Innovation

• Semantic Web (i.e. Web 3.0) and Cloud based Security Enhancements:

Automated Entitlement Requests / Policy Management

Semantic Integration of Data and Applications With User Views and Security Roles

Semantic Filtering for Network Traffic

Semantic Analytics for Log Mining

Automated Entitlement Requests / Policy Management

• Compliance Concern

Addresses Segregation of Duties

• Extensible Semantic Layer Models Processes, Applications, Security, and Entitlement Request Rules

Decidable Model Automatically Detects Compliance Violations

Processes Can Be Invoked to Resolve Entitlement Request Compliance Issue

Rapid Prototype and Implementation (Months not Years)

o “Mapping” to Security Systems Very Quick (limited systems)

o Modeling Processes and Correlating Applications Quick if Existing Business Process Models Exist in Standard Formats (BPMN, etc.)
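
To make the segregation-of-duties check concrete, here is a small sketch in plain Python. It stands in for the decidable semantic model mentioned above and is not derived from it; the entitlement names and the `CONFLICTS` rule set are hypothetical.

```python
# Hypothetical conflicting entitlement pairs (segregation-of-duties rules).
CONFLICTS = {
    frozenset({"create_vendor", "approve_payment"}),
    frozenset({"submit_change", "approve_change"}),
}

def sod_violations(current: set, requested: str) -> list:
    """Return the held entitlements that conflict with the requested one."""
    return sorted(
        held for held in current
        if frozenset({held, requested}) in CONFLICTS
    )

held = {"create_vendor", "read_reports"}
print(sod_violations(held, "approve_payment"))  # ['create_vendor']
```

A non-empty result is the point at which a resolution process (for example, an approval workflow) could be invoked.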

Semantic Integration and User-based Views

• Model-Driven Integration of Data Across All Data Sources

Including Legacy Systems and Unstructured Data Sources

• Capture Business Logic in Standards-based Declarative Models

Not Trapped in Code

o Expensive and slow to modify

Not Lost With Retiring Users or Inconsistent Across Users

• Provides User Role and/or Attribute Based “Views”

Supports Provenance, Reification, Confidence, and Trusted Data

• Supports Compliance Reporting

Decidable Model Provides Trusted Intelligence

Semantic Network Filtering

• Traditional Filtering is Often Text-Based

Not How Humans Think

• Semantic Models Enhance Network Filtering

Text Extraction (Event and Named Entity Recognition) to Semantic Threat Model and Filtering Based on Threat Match

o Could also leverage Sentiment Analysis (e.g., disgruntled employee emails or blogs from work)

Easily Applied to Emails (Working Implementations)

Extensible to Other Ports and Protocol Traffic

o Can be based on User, External Entity, Protocol, Theme, etc.

Semantic Analysis for Log Mining

• Enhances Audit Log Analysis

Significantly Reduces False Positives

Discovers Complex Insider Threat Behaviors

• Complements (possibly most) existing Security Information & Event Management (SIEM) tools

Plug-in Technology to Semantic Analytics

• Semantic Models for Threats, Events, Users

Decidable Models are Standards-based and Extensible

CLOUD COMPUTING TO DRIVE SEMANTIC SYSTEMS AND IMPROVE SECURITY

Cloud Deployment “Use Cases”: Different Types of Clouds for Different Problems

Figure: Evolution of DISA Distributed Enterprise Computing Centers (DECC). Source: Defense Information Systems Agency (DISA)

Figure: Comparison of Costs for On-Premise vs. Cloud-based Office Solutions (USD costs per user per month; Infrastructure and Subscription costs for OnPremise, MSOnline, and GoogleApps). Source: Forrester Research

• Utility Cloud

Provisioning and managing large networks of virtual machines to provide on-demand computing resources, often accessible via APIs, that scale horizontally on standard hardware

• Storage Cloud

API-accessible storage for applications to access block devices or enterprise storage devices; provides backup, archiving, and data retention; can provide document storage

• Data Cloud

Analytic-driven, horizontally scalable processing of large amounts of data, rapidly changing data, complex data, and other “big data” sources

• Prototyping Cloud

Focused on rapid provisioning to support prototyping

Utility Cloud

• Provision / manage large networks of virtual machines

• Provide on-demand computing resources

• Often accessible via APIs

Cloud brokers often abstract multiple APIs

• Scale horizontally on standard hardware

• Integrates and enables other cloud use types – data, prototype, storage, etc.

• Provides elastic computing capabilities

• APIs create IaaS capabilities for cloud and cloud service providers

• Component based – add new capability

• Highly available, fault tolerant, recovery

• Amazon EC2: Web service that provides resizable compute capacity in the cloud; ~500K Linux servers

• OpenStack Compute: Developed by NASA; enables enterprises and service providers to offer on-demand computing resources

• Rackspace Cloud Servers: 190K clients; OpenStack

• VMware vCloud Suite: Integrated solution for building and managing a complete cloud infrastructure

• Apache CloudStack: An open source cloud compute platform used to deliver IaaS; used by Citrix Cloud.com

• Microsoft System Center

Source: Amazon

Source: Rackspace

Storage Cloud

• API-accessible storage for applications to access block devices or enterprise storage devices

• Provide backup, archiving, and data retention

• Provide access to music, photos, calendars, contacts, documents, and other content from multiple devices

• Provides reliable data storage with fast data access

• APIs, often RESTful web services, provide document storage

• Flexible infrastructure to add new storage types

• Handle virtual machine images, photo storage, email storage and backup archiving inexpensively, through data replication and distribution across commodity hard drives

• Secure data upload/download and encryption of data at rest

• Amazon S3: Simple Storage Service; simple web services interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web; stores an estimated 500B objects

• OpenStack Object Storage: Provides redundant, scalable object storage using clusters of standardized servers

• Rackspace CloudFiles: Commercial offering of OpenStack

• Google Cloud Storage: RESTful API to store, access, and protect data

• Google Drive: File synchronization; 5GB per user

• Dropbox, Microsoft SkyDrive, SugarSync

• Apple iCloud: Content synchronization across devices

Source: Amazon Source: Rackspace

Analytics for Data Clouds Driving Value from Data

• Some Types of Analytics

Unstructured Text

Data Source Correlation

Data Efficacy

Entity Disambiguation

Relationship Identification

Trends

• Analytics Platforms and Data Repositories

Apache Hadoop

Twitter STORM

Apache Cassandra

Apache HBase

Apache Accumulo

Apache Hadoop

• Open-source software for reliable, scalable, distributed computing

• Framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models

• Key Capabilities:

Hadoop Common: The common utilities that support the other Hadoop modules

Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data

Hadoop YARN: A framework for job scheduling and cluster resource management

Hadoop MapReduce: A YARN-based system for parallel processing of large data sets

• Yahoo: 100,000 CPUs supporting Ad Systems and Web Search

• LinkedIn: ~19,760 cores, ~45TB RAM, and ~15PB discovering “People You May Know”

• Facebook: 11K cores and 15PB for analytics and ML over log and dimension data

• eBay: 4K cores and 5.3PB for search optimization

• AOL: ETL, statistics generation, behavioral analysis, and targeting
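
The MapReduce programming model named above can be imitated in a single Python process. The sketch below does not use any Hadoop API; the log format and function names are hypothetical, and a real job would distribute the map and reduce phases across the cluster with HDFS holding the input.

```python
from collections import defaultdict

def map_phase(line: str):
    """Emit (user, 1) for each failed login; the log format is hypothetical."""
    status, user = line.split()
    if status == "FAIL":
        yield user, 1

def reduce_phase(user: str, counts: list) -> tuple:
    """Sum the values that the shuffle grouped under one key."""
    return user, sum(counts)

def run_job(lines: list) -> dict:
    groups = defaultdict(list)  # shuffle: group mapped values by key
    for line in lines:
        for key, value in map_phase(line):
            groups[key].append(value)
    return dict(reduce_phase(k, v) for k, v in groups.items())

log = ["FAIL alice", "OK bob", "FAIL alice", "FAIL carol"]
print(run_job(log))  # {'alice': 2, 'carol': 1}
```

Because each map call sees one line and each reduce call sees one key, both phases can run in parallel on separate machines, which is what lets the model scale to audit logs of the sizes listed above.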

Twitter STORM

• Free and open source distributed real-time computation system

• Reliably processes unbounded streams of data in real-time

• STORM topology consumes streams of data and processes those streams in arbitrarily complex ways

• Key Concepts:

Nimbus: The STORM ‘master’

Topology: Analogous to a MapReduce job, except that STORM topology can run forever

Stream: A collection of named lists of values (input)

Spout: Feeds stream source into the STORM topology for processing

Bolt: Performs processing on stream data

Diagram: a Topology of Spouts and Bolts connected by Streams
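
The spout, bolt, and topology concepts can be mimicked with Python generators. This is only an analogy and not the STORM API; the sensor readings, field names, and threshold are made up for illustration.

```python
import itertools

def sensor_spout():
    """Spout: feeds an unbounded stream of tuples into the topology."""
    for i in itertools.count():
        yield {"sensor": f"s{i % 2}", "pressure": 90 + 5 * i}

def threshold_bolt(stream, limit: int):
    """Bolt: processes each tuple and emits only those above the limit."""
    for tup in stream:
        if tup["pressure"] > limit:
            yield tup

# Topology: spout -> bolt. It could run forever; take three results here.
topology = threshold_bolt(sensor_spout(), limit=100)
alerts = list(itertools.islice(topology, 3))
print(alerts)
```

Unlike a MapReduce job, nothing here ever "finishes": the pipeline keeps consuming as long as the spout produces, which matches the note that a topology can run forever.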

Apache Cassandra

• Open source distributed database management system; key-value store

• Designed to handle very large amounts of data spread out across many commodity servers

• Provides a highly available service with no single point of failure

• Data Model:

Column: Name, Value, and Timestamp (set by client)

SuperColumn: Column whose value is a set of columns

ColumnFamily: A set of columns; referenced by row key

KeySpace: A namespace for a set of ColumnFamilies

• Data Partitioning and Replication:

RandomPartitioner: Row keys are partitioned evenly by their MD5 hash

ByteOrderedPartitioner: Row keys are stored in order of their raw value (think index on name)

ReplicationFactor: Number of nodes to replicate

Simple Strategy Uses Node Order for Replication

Network Topology Strategy Uses # of DataCenters
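
Hash-based partitioning with ring-order replication can be sketched as follows. This is a simplification under stated assumptions: real Cassandra assigns token ranges to nodes instead of taking a modulo, and the node names and helper functions here are hypothetical.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical ring order

def token(row_key: str) -> int:
    """MD5 hash of the row key, read as an integer token."""
    return int.from_bytes(hashlib.md5(row_key.encode()).digest(), "big")

def replicas(row_key: str, replication_factor: int) -> list:
    """Pick the owning node, then the next nodes in ring order."""
    start = token(row_key) % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(replication_factor)]

print(replicas("well-42", 3))
```

Hashing spreads row keys evenly across nodes at the cost of ordered range scans, which is the trade-off against an order-preserving partitioner.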

• Twitter: 100s of TBs; supporting social graph analysis and real-time analytics; 100K writes per second (WPS)

• Netflix: Benchmarked at 1M WPS on EC2; Logging, customer analytics, and other uses

• Several others in the 100GBs – 1+ TB range

• DataStax offers commercial support

• Facebook: Developed Cassandra, but abandoned in 2010 for performance issues

• Consistency and Conflict Resolution:

ONE: Written to one node; read from first

QUORUM: Written to and read from floor(RF/2)+1 nodes, where RF is the ReplicationFactor

ALL: Read and write from all

Most recent timestamp wins conflict – client time is key

Atomicity is at ColumnFamily level
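
The quorum size and the timestamp-based conflict rule listed above reduce to a few lines of arithmetic. The function names are hypothetical and the timestamps are arbitrary integers standing in for client clocks.

```python
def quorum(replication_factor: int) -> int:
    """Nodes that must acknowledge at QUORUM: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1

def resolve(versions: list) -> str:
    """Last write wins: return the value carrying the newest client timestamp."""
    timestamp, value = max(versions, key=lambda v: v[0])
    return value

print(quorum(3), quorum(5))                         # 2 3
print(resolve([(1001, "open"), (1007, "closed")]))  # closed
```

Since two quorums always add up to more than RF, a QUORUM read overlaps a QUORUM write on at least one node. The `resolve` rule also shows why client time is key: a client with a skewed clock can win a conflict it should have lost.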

• Snitch maps IPs to Racks and Data Centers

• Writes are to CommitLog, then Memtable (CF)

Flushing: Writing Memtable to Disk (SSTable) when full

Compaction: Merging SSTables

o Faster reads – Minimize number of SSTables to read from

o Reclaim unused space

Apache HBase

• Open-source, distributed, scalable, big data store

Billions of rows X millions of columns

Supports data versioning

Column-oriented key-value store modeled after Google Bigtable

• Built on and uses Hadoop and HDFS

• Column – ColumnFamily prefix : column name with value; limit to 2 – 3 CFs and names should be small

• Row – Key, Timestamp, Column

• Cell – Row, Column, Version

• Operations – Get, Scan, Put, Delete

• Flushing and Compaction by Region

• Schema Design is Dependent on Query

• Options for Secondary Indexes and Alternate Query Paths

• Catalog: ROOT (META.regionkey) and META (list of regions)

• Adobe: Structured data processing with analytics, BI, and ML over images, video, flash, web, etc. … real-time, structured data … processing system that could handle any data volume, with access times under 50ms, with no downtime [or] data loss

• eBay: 4K cores, 5.3PB; search optimization

• eCircle: 2000 cores, 5TB RAM, 1PB storage; processing marketing data

• Facebook: Powers Messages (+135B / month)

• Twitter: R/W backup of all MySQL tables

Source: Sematext blog

Apache Accumulo

• Open-source, sorted, distributed key/value store

• Robust, scalable, high performance data storage and retrieval system designed after Bigtable

• Secure, labeled access at cell level

• Branched from US NSA Cloudbase

• Implemented on Hadoop and HDFS

• Table and Tablet: Tables are partitioned into Tablets based on row key, so that all of a row’s columns can be retrieved at once; implements row-level transactions w/o locking

• TabletServer: Manages some subset of all the tablets (partitions of tables); only 1 TabletServer manages a tablet

• Write-ahead Logger: Accepts updates to Tablet servers and writes to local on-disk storage

• Garbage Collector: Identifies files that aren’t used by any process and deletes them

• Master: Detects and resolves TabletServer failure; manages load balancing; manages table operations; coordinates startup, shutdown, recovery

• Fault Tolerant Executor (FATE): eliminates single point of failure (SPoF)

• Growing User Community: 42six, Accumulo Data, Archive.is, Berico, Booz Allen Hamilton, CyberPoint, Data Tactics, Eclectic Consulting, Invertix, KEYW, Orbis Technologies, PDI, Peterson Technologies, Potomac Fusion, Praxis, SAIC, sqrrl, SRA, SW Complete, Tetra Concepts, TexelTek

Figure: 5-tuple key w/ cell visibility. Source: Adam Fuchs

Figure: Tablets and Table Partitioning. Source: Adam Fuchs
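
Cell-level labeled access can be illustrated with a small evaluator for visibility expressions built from `&`, `|`, and parentheses. This is a plain-Python sketch and not Accumulo's own implementation; the label names are hypothetical, and it evaluates unparenthesized mixes left to right, whereas Accumulo itself expects parentheses when `&` and `|` are combined.

```python
import re

def visible(expression: str, authorizations: set) -> bool:
    """Evaluate a label expression using & (and), | (or), and parentheses."""
    tokens = re.findall(r"[A-Za-z0-9_]+|[&|()]", expression)
    if not tokens:
        return True  # an unlabeled cell is visible to everyone
    pos = 0

    def parse_term() -> bool:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            result = parse_expr()
            pos += 1  # skip the closing ")"
            return result
        return tok in authorizations

    def parse_expr() -> bool:
        nonlocal pos
        result = parse_term()
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            op = tokens[pos]
            pos += 1
            right = parse_term()
            result = (result and right) if op == "&" else (result or right)
        return result

    return parse_expr()

label = "ops&(geology|finance)"
print(visible(label, {"ops", "finance"}))  # True
print(visible(label, {"ops"}))             # False
```

Because the label travels with each cell as part of its key, two users scanning the same table can receive different result sets without any change to the query.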

Cloud and Cyber-Security Issues

• Security and Privacy

Enterprise identity management not the same in a public / community cloud

Physical security is responsibility of cloud provider

Application security and data privacy

Information Availability

• Compliance (e.g., HIPAA, SOX, etc.)

Continuity of Operations

Audit Logs

Data Jurisdiction

Record keeping / public records

• Legal or Contractual Issues

Liability for incidents involving data loss or compromise

Disposal of data at end / change of service

Intellectual property

Cloud and Cyber-Security Benefits

• Single Security ‘Image’

Security ‘baked into’ every new instance

• Audit Log Processing is a Big Data Issue

• Homomorphic Encryption

A scheme believed to be useful to public clouds

May require cloud to fully implement

• Security Testing

A utility cloud was used to brute-force 160-bit SHA-1 hashes of six-character passwords in 49 minutes at a cost of $2.10 (Nov 2010)

• Threat Model Enhancements

Big Data issue with need to scale based on threat

• Cloud Security Alliance

Member-driven organization chartered with promoting the use of best practices for providing information assurance within Cloud Computing
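
The Security Testing item above describes brute-forcing short SHA-1 hashes with rented compute. The sketch below shows the basic search on a single machine, using a three-character, lowercase-only search space so that it finishes instantly; the alphabet, the sample password, and the function name are assumptions for illustration.

```python
import hashlib
import itertools
import string
from typing import Optional

def crack_sha1(target_hex: str, max_len: int,
               alphabet: str = string.ascii_lowercase) -> Optional[str]:
    """Try every candidate up to max_len characters until the hash matches."""
    for length in range(1, max_len + 1):
        for chars in itertools.product(alphabet, repeat=length):
            candidate = "".join(chars)
            if hashlib.sha1(candidate.encode()).hexdigest() == target_hex:
                return candidate
    return None

target = hashlib.sha1(b"oil").hexdigest()
print(crack_sha1(target, max_len=3))  # oil
```

Each candidate is checked independently, so the search space can be split across as many virtual machines as one is willing to pay for. That is what makes a utility cloud effective for this kind of testing, and why short passwords offer little protection regardless of the hash width.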

Summary

• Cyber-security in today’s world requires a combination of technologies and accompanying services

Required for improving security in large-scale environments

Orbis’s customer base includes marquee DoD customers and Fortune 50 commercial clients who are recognizing these benefits

Requires systems that provide fast results to showcase capabilities within an organization along with cost savings that leverage current investments in IT

• We argue that combining semantic capabilities within a cloud environment allows for improved Security Risk Assessment

Provides improved modeling of threats (formal semantics)

Integrates large data sets to leverage the power of formal models

Provides a scalable and expandable cloud infrastructure to leverage various kinds of data in near-real time

Cyber security can become very nimble and respond to changing threats by proactively utilizing data to recognize new patterns that may have gone unnoticed by basic protection strategies

THANK YOU

QUESTIONS?
