Cloud-based Technologies for Improving Cyber-Security in Petrochemical Applications 13-14 Nov, 2012
Eric Little, PhD Director, Information Management [email protected] 321-480-4818
Orbis Technologies, Inc. Proprietary 1
Steve Hamby Chief Technology Officer [email protected] 678-346-6386
Overview of Core Technologies for Scalable Cloud-based Semantics
RDF, RDFS, RDFa Ontologies Linked Data
Knowledge representation Artificial Intelligence
Rules/Inference Semantic Search
Hadoop MapReduce
Distributed Computing NoSQL
Virtualization Dynamic Provisioning
Natural Language Processing Entity Extraction
Information Extraction Information Retrieval Identity Resolution
Semantic Role Labeling Machine Learning
Information Extraction
CLOUD SEMANTICS
Semantic Annotation Ontology Population
Scalable Inference Distributed Triple Stores
Distributed Indexing
Text Mining
Information Assurance / Cyber Security
Enterprise Architecture
Systems Engineering
IP Portfolio Mgt
TECHNOLOGY SERVICES
Data Integration is Prevalent in Many Major Industries Today
• Cloud Systems
• Service Oriented Architectures (SOA)
• Semantic Integration
All provide powerful components to integrate, share and leverage one’s data.
• It is generally estimated that for every $1 spent on an application, companies spend on average $5 to $9 on its integration
© IBM, Nelson Mattos
Mitigating the Hype Scale
• SOA is becoming more accepted (well along the slope of enlightenment)
• Semantics is slightly behind it (but at least out of the trough of disillusionment)
• Cloud is at the peak of its hype (it can still cure everything, fix all problems, fit every problem, etc.)
• Taken together, these let architects anchor their designs to the more established technologies and temper the hype around the newer ones
20-Nov-12 Orbis Technologies, Inc. Proprietary 4
SEMANTIC CLOUD APPLICATIONS FOR DATA INTEGRATION IN PETROCHEMICAL INDUSTRIES
Select Semantic Technologies Currently Used in Oil & Gas Applications
• Leveraging RDF/OWL and ISO 15926 Part 4 Reference Data Library
• The Norwegian Daily Production Report (DPR) Project
• The Active Knowledge Systems for Integrated Operations (AKSIO) Project
• The Integrated Information Platform (IIP) Project
• InfoWeb, a plant data specialist in the Netherlands, leverages the Semantic Web for developing the ISO 15926 knowledge base
• The Geosciences Network (GEON Grid) seismic infrastructure is a federation of ArcIMS (a server-based application for delivering dynamic maps and GIS data and services via the web)
Improving Data Exchange Can Be Beneficial Across A Product’s Life Cycle
[Diagram: data sources across the life cycle: Geology, Geo-Chem Data, Refinement, Production, Drill Stem Test, Storage/Transport, Reservoir Modeling, Assay]
• Having access to data across the enterprise removes stove-piping, improves communication, advances analytics, etc.; semantics prove to be valuable
• However, risks increase as data is shared
• Providing security as part of one’s IT fabric becomes increasingly important
Building Semantic Profiles From Raw Data
• Key data elements are identified, creating a lexicon of important terms
• Data elements are categorized into appropriate classes; ranges are captured for autoclassification
• Can be applied to crude stocks, equipment, refineries, products, processes, etc.
• Advanced logics allow for reasoning over data sets such that new patterns and information can be gained
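The idea of capturing ranges for autoclassification can be sketched in a few lines of Python. This is only an illustration: the class names, attribute names, and cut-off values below are assumptions chosen for the example, not values from the presentation, and real ranges would come from subject matter experts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeClass:
    """A class label plus the attribute range that defines membership."""
    label: str
    attribute: str
    low: float   # inclusive lower bound
    high: float  # exclusive upper bound


# Illustrative ranges only (assumed for this sketch).
CRUDE_CLASSES = [
    RangeClass("heavy", "api_gravity", 0.0, 22.3),
    RangeClass("medium", "api_gravity", 22.3, 31.1),
    RangeClass("light", "api_gravity", 31.1, 100.0),
    RangeClass("sweet", "sulfur_pct", 0.0, 0.5),
    RangeClass("sour", "sulfur_pct", 0.5, 100.0),
]


def autoclassify(record, classes=CRUDE_CLASSES):
    """Return every class label whose range contains the record's value."""
    labels = []
    for c in classes:
        value = record.get(c.attribute)
        if value is not None and c.low <= value < c.high:
            labels.append(c.label)
    return labels


assay = {"name": "sample-1", "api_gravity": 39.6, "sulfur_pct": 0.24}
print(autoclassify(assay))  # ['light', 'sweet']
```

The same structure would apply to equipment, products, or processes by swapping in a different list of ranges.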
Classification Schemas to Reflect Subject Matter Expertise
• SME knowledge can be captured in taxonomies and extended to more advanced ontology models (that contain attributes, advanced relationships, etc.)
• Multiple ontologies can be integrated to capture enterprise-wide applications for advanced business intelligence
Semantic Approach Improves Data Access
Traditional Approach
Database Experts; Domain Experts & Scientists; Systems Engineers; Management & Executives
• Manual Data Correlation
• Manual Report Generation
(High Potential for Error)
Semantic Approach
Ontology Engine; Domain Experts & Scientists; Systems Engineers; Management & Executives
• Integrated Classifications/Schemas
• Automated Reasoning Capabilities
(Significant Error Reduction)
Semantic Approach Simplifies Queries
Traditional Approach (Database Experts)
Reasoning is done on the user side for each query
Query Must Contain:
1. Data Requirements
2. All Logic Required to Relate the Data (Rules, Joins, Decode, Sub-queries, etc.)
Complexity: HIGH
Reusability: LOW-MED
Semantic Approach (Database Experts, Scientists, Systems Engineers, Management & Executives)
Reasoning is performed by Ontobroker within the system
Query Must Contain:
1. Data Requirements only
Complexity: LOW
Reusability: HIGH (Logic embedded in Model)
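To make "logic embedded in the model" concrete, here is a minimal Python sketch. It is not Ontobroker and does not use its API. The class hierarchy and equipment tags are hypothetical. The point is only that the subclass reasoning lives in the model, so the query states nothing but the data requirement.

```python
# Toy model (assumed for illustration): subclass links and instance typing
# are part of the model, not part of any individual query.
SUBCLASS_OF = {
    "CentrifugalPump": "Pump",
    "Pump": "RotatingEquipment",
    "Compressor": "RotatingEquipment",
}
INSTANCE_OF = {
    "P-101": "CentrifugalPump",
    "P-205": "Pump",
    "K-300": "Compressor",
}


def is_a(cls, target):
    """True if cls equals target or is a (transitive) subclass of it."""
    while cls is not None:
        if cls == target:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def instances_of(target):
    """The 'query': the caller states only the data requirement."""
    return sorted(i for i, cls in INSTANCE_OF.items() if is_a(cls, target))


print(instances_of("Pump"))               # ['P-101', 'P-205']
print(instances_of("RotatingEquipment"))  # ['K-300', 'P-101', 'P-205']
```

In the traditional approach, each query would have to spell out the joins that `is_a` performs here once for every caller.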
Example of Enterprise-wide Data Integration Using Semantic Technologies
Ability to Integrate Different Kinds of Relevant Data
Unstructured Textual Data
Structured Data
Customizable User Interfaces
Ontology Engine
BPM for Monitoring of Product Development, Personnel & Communications
Customizable Easy-to-Use Dashboard with Apps
Easy to Use Forms for Inputting of Tasks
Reports Are Auto-Generated
Email Notifications Allow for Improved Monitoring and Notification
Utilizing Semantics for Integration of Data Across Multiple Sources
Mobile Computing and Edge Devices Bringing the Power of the Enterprise to the User … When and Where Needed
• Mobile computing and edge devices can be more easily leveraged
Mobile Dashboards: Provide dashboards to leadership on-the-go
Multi-touch devices: Excellent for audit command center; multiple tasks being worked at once with team collaboration
[Figure: Multi-touch device allows multiple people to work on a single problem at once]
[Figure: Example Mobile Dashboard]
Security Risk Assessment
• Security Risk Assessment is a combination of quantitative and qualitative measures that allow for proactive security measures. There are two major areas where energy companies are highly susceptible to attacks:
1. Company Operations Systems - computers that route electricity, open valves and operate motors
2. Access to Proprietary Corporate Information - internal email communications, long-term development plans, new technologies, and investment information
No single measure can protect an enterprise from cyber attacks
Instead, companies need to develop more accurate (and cost effective) means to combat cyber threats
• Performing a risk assessment of one’s security systems using advanced reasoners aided by semantic models can help reduce risks and leverage current IT investments in data integration.
Enhanced Threat Modeling Using Semantics
• Utilizes advanced analytics to monitor social media, social networks, user data, network traffic
• Can improve visibility into threats by producing useful patterns of activities, groups, behaviors, etc. – information can then be integrated with other data using formal models for threats
• Metadata associated with Twitter can provide useful information
• Entity extraction of text can go one step further and pick out specific items of interest
Formal Ontology of Threat
• A viable threat exists as a tri-partite whole, a potential threat as a 2-part or 1-part whole.
• The 3 elements of a tri-partite whole form necessary dependence relations to one another.
• Tri-partite wholes possess integrated parts which are inextricably related to one another.
• Understanding the structure of threat components can allow for improved computational approaches
[Diagram: Intent, Capability, and Opportunity together form a VIABLE THREAT; only one or two of them form a POTENTIAL THREAT]
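The tri-partite structure lends itself to a very small computational form. The following Python sketch classifies an observation by how many of the three parts are present. The `NONE` status for zero parts is an assumption added for completeness; the slide only distinguishes potential and viable threats.

```python
from enum import Enum


class ThreatStatus(Enum):
    NONE = "no threat"            # assumed case: no parts present
    POTENTIAL = "potential threat"  # 1-part or 2-part whole
    VIABLE = "viable threat"        # tri-partite whole


def assess(intent: bool, capability: bool, opportunity: bool) -> ThreatStatus:
    """Classify by the number of threat components that are present."""
    present = sum([intent, capability, opportunity])
    if present == 3:
        return ThreatStatus.VIABLE
    if present >= 1:
        return ThreatStatus.POTENTIAL
    return ThreatStatus.NONE


print(assess(intent=True, capability=True, opportunity=False).value)
# potential threat
print(assess(intent=True, capability=True, opportunity=True).value)
# viable threat
```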
Next Gen IT Driving Innovation
• Semantic Web (i.e. Web 3.0) and Cloud based Security Enhancements:
Automated Entitlement Requests / Policy Management
Semantic Integration of Data and Applications With User Views and Security Roles
Semantic Filtering for Network Traffic
Semantic Analytics for Log Mining
Automated Entitlement Requests / Policy Management
• Compliance Concern
Addresses Segregation of Duties
• Extensible Semantic Layer Models Processes, Applications, Security, and Entitlement Request Rules
Decidable Model Automatically Detects Compliance Violations
Processes Can Be Invoked to Resolve Entitlement Request Compliance Issue
Rapid Prototype and Implementation (Months not Years)
o “Mapping” to Security Systems Very Quick (limited systems)
o Modeling Processes and Correlating Applications Quick if Existing Business Process Models Exist in Standard Formats (BPMN, etc.)
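As a rough illustration of automatically detecting a segregation-of-duties violation when an entitlement is requested, consider the Python sketch below. The entitlement names and conflict pairs are hypothetical, and a real deployment would express these rules in the semantic model rather than in code.

```python
# Hypothetical pairs of entitlements that one person must not hold together.
CONFLICTS = {
    frozenset({"create_vendor", "approve_payment"}),
    frozenset({"submit_change", "approve_change"}),
}


def violations(current, requested):
    """Return the conflicting pairs that granting `requested` would create."""
    held = set(current) | {requested}
    return sorted(
        tuple(sorted(pair))
        for pair in CONFLICTS
        if requested in pair and pair <= held
    )


print(violations({"create_vendor"}, "approve_payment"))
# [('approve_payment', 'create_vendor')]
print(violations({"create_vendor"}, "submit_change"))
# []
```

A non-empty result is the point at which a resolution process could be invoked.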
Semantic Integration and User-based Views
• Model-Driven Integration of Data Across All Data Sources
Including Legacy Systems and Unstructured Data Sources
• Capture Business Logic in Standards-based Declarative Models
Not Trapped in Code
o Expensive and slow to modify
Not Lost With Retiring Users or Inconsistent Across Users
• Provides User Role and/or Attribute Based “Views”
Supports Provenance, Reification, Confidence, and Trusted Data
• Supports Compliance Reporting
Decidable Model Provides Trusted Intelligence
Semantic Network Filtering
• Traditional Filtering is Often Text-Based
Not How Humans Think
• Semantic Models Enhance Network Filtering
Text Extraction (Event and Named Entity Recognition) to Semantic Threat Model and Filtering Based on Threat Match
o Could also leverage Sentiment Analysis (e.g., disgruntled employee emails or blogs from work)
Easily Applied to Emails (Working Implementations)
Extensible to Other Ports and Protocol Traffic
o Can be based on User, External Entity, Protocol, Theme, etc.
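The extract-then-match flow described on this slide can be sketched as follows. This is a deliberately naive stand-in: a keyword lexicon replaces real event and named entity recognition, and the lexicon entries, concept names, and threat patterns are all assumptions invented for the example.

```python
import re

# Hypothetical mapping from surface terms to threat-model concepts.
LEXICON = {
    "valve": "ControlSystemAsset",
    "scada": "ControlSystemAsset",
    "password": "Credential",
    "credentials": "Credential",
    "send": "Exfiltration",
    "forward": "Exfiltration",
}

# A message is flagged when all concepts of any one pattern are present.
THREAT_PATTERNS = {
    "credential_leak": {"Credential", "Exfiltration"},
    "control_system_probe": {"ControlSystemAsset", "Credential"},
}


def extract_concepts(text):
    """Map the words of a message onto threat-model concepts."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {LEXICON[t] for t in tokens if t in LEXICON}


def match_threats(text):
    """Return the names of all threat patterns the message satisfies."""
    concepts = extract_concepts(text)
    return sorted(
        name for name, needed in THREAT_PATTERNS.items() if needed <= concepts
    )


print(match_threats("Please forward the SCADA password to my personal account"))
# ['control_system_probe', 'credential_leak']
print(match_threats("Lunch at noon?"))
# []
```

Matching on concepts rather than raw strings is what distinguishes this from plain text-based filtering: "password" and "credentials" trigger the same pattern.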
Semantic Analysis for Log Mining
• Enhances Audit Log Analysis
Significantly Reduces False Positives
Discovers Complex Insider Threat Behaviors
• Potentially Complements Most Existing Security Information & Event Management (SIEM) Tools
Plug-in Technology to Semantic Analytics
• Semantic Models for Threats, Events, Users
Decidable Models are Standards-based and Extensible
CLOUD COMPUTING TO DRIVE SEMANTIC SYSTEMS AND IMPROVE SECURITY
Cloud Deployment “Use Cases” Different Types of Clouds for Different Problems
[Figure: Evolution of DISA Distributed Enterprise Computing Centers (DECC). Source: Defense Information Systems Agency (DISA)]
[Chart: Comparison of Costs for On-Premise vs. Cloud-based Office Solutions, in USD costs per user per month; categories OnPremise, MSOnline, GoogleApps; series Infrastructure and Subscription. Source: Forrester Research]
• Utility Cloud
Provisioning and managing large networks of virtual machines to provide on-demand computing resources, often accessible via APIs, that scale horizontally on standard hardware
• Storage Cloud
API-accessible storage for applications to access block devices or enterprise storage devices; provides backup, archiving, and data retention; can provide document storage
• Data Cloud
Analytic-driven, horizontally scalable processing of large amounts of data, rapidly changing data, complex data, and other “big data” sources
• Prototyping Cloud
Focused on rapid provisioning to support prototyping
Utility Cloud
• Provision / manage large networks of virtual machines
• Provide on-demand computing resources
• Often accessible via APIs
Cloud brokers often abstract multiple APIs
• Scale horizontally on standard hardware
• Integrates and enables other cloud use types – data, prototype, storage, etc.
• Provides elastic computing capabilities
• APIs create IaaS capabilities for cloud and cloud service providers
• Component based – add new capability
• Highly available, fault tolerant, recovery
• Amazon EC2: Web service that provides resizable compute capacity in the cloud; ~500K Linux servers
• OpenStack Compute: Developed by NASA; enables enterprises and service providers to offer on-demand computing resources
• Rackspace Cloud Servers: 190K clients; OpenStack
• VMware vCloud Suite: Integrated solution for building and managing a complete cloud infrastructure
• Apache CloudStack: An open source cloud compute platform used to deliver IaaS; used by Citrix Cloud.com
• Microsoft System Center
Source: Amazon
Source: Rackspace
Storage Cloud
• API-accessible storage for applications to access block devices or enterprise storage devices
• Provide backup, archiving, and data retention
• Provide access to music, photos, calendars, contacts, documents, and other content from multiple devices
• Provides reliable data storage with fast data access
• APIs, often RESTful web services, provide document storage
• Flexible infrastructure to add new storage types
• Handle virtual machine images, photo storage, email storage and backup archiving inexpensively, through data replication and distribution across commodity hard drives
• Secure data upload/download and encryption of data at rest
• Amazon S3: Simple Storage Service; simple web services interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web; stores an estimated 500B objects
• OpenStack Object Storage: Provides redundant, scalable object storage using clusters of standardized servers
• Rackspace Cloud Files: Commercial offering of OpenStack
• Google Cloud Storage: RESTful API to store, access, and protect data
• Google Drive: File synchronization; 5GB per user
• Dropbox, Microsoft SkyDrive, SugarSync
• Apple iCloud: Content synchronization across devices
Source: Amazon Source: Rackspace
Analytics for Data Clouds Driving Value from Data
• Some Types of Analytics
Unstructured Text
Data Source Correlation
Data Efficacy
Entity Disambiguation
Relationship Identification
Trends
• Analytics Platforms and Data Repositories
Apache Hadoop
Twitter STORM
Apache Cassandra
Apache HBase
Apache Accumulo
Apache Hadoop
• Open-source software for reliable, scalable, distributed computing
• Framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models
• Key Capabilities:
Hadoop Common: The common utilities that support the other Hadoop modules
Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data
Hadoop YARN: A framework for job scheduling and cluster resource management
Hadoop MapReduce: A YARN-based system for parallel processing of large data sets
• Yahoo: 100,000 CPUs supporting Ad Systems and Web Search
• LinkedIn: ~19,760 cores, ~45TB RAM, and ~15PB discovering “People You May Know”
• Facebook: 11K cores and 15PB for analytics and ML over log and dimension data
• eBay: 4K cores and 5.3PB for search optimization
• AOL: ETL, statistics generation, behavioral analysis, and targeting
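The "simple programming model" behind Hadoop MapReduce can be shown without a cluster. The sketch below imitates the map, shuffle, and reduce phases in plain, single-process Python; it does not use any Hadoop API, and the log format (`user,event`) is an assumption made for the example.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit (user, 1) for each failed-login log line."""
    user, event = line.split(",")
    if event == "login_failed":
        yield user, 1


def shuffle(pairs):
    """Shuffle: group all values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def run_job(lines):
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


log = ["alice,login_failed", "bob,login_ok", "alice,login_failed", "bob,login_failed"]
print(run_job(log))  # {'alice': 2, 'bob': 1}
```

Because each `map_phase` call sees one line and each `reduce_phase` call sees one key, both phases can be spread across many machines, which is what the framework provides.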
Twitter STORM
• Free and open source distributed real-time computation system
• Reliably processes unbounded streams of data in real-time
• STORM topology consumes streams of data and processes those streams in arbitrarily complex ways
• Key Concepts:
Nimbus: The STORM ‘master’
Topology: Analogous to a MapReduce job, except that STORM topology can run forever
Stream: A collection of named lists of values (input)
Spout: Feeds stream source into the STORM topology for processing
Bolt: Performs processing on stream data
[Diagram: a Topology in which Spouts and Bolts are connected by Streams]
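The spout, bolt, and topology concepts can be imitated with Python generators. This sketch does not use the Storm API and runs in one process; the function names and the word-count task are assumptions chosen for illustration. Generators are lazy, so the same wiring would keep running for as long as the source keeps producing tuples.

```python
from collections import Counter


def spout(source):
    """Spout: feeds tuples from a source into the topology."""
    for item in source:
        yield item


def split_bolt(stream):
    """Bolt: splits each sentence into lower-cased words."""
    for sentence in stream:
        for word in sentence.lower().split():
            yield word


def count_bolt(stream):
    """Bolt: keeps a running count and emits (word, count) after each tuple."""
    counts = Counter()
    for word in stream:
        counts[word] += 1
        yield word, counts[word]


def topology(source):
    """Topology: wires the spout and the bolts together."""
    return count_bolt(split_bolt(spout(source)))


for update in topology(["valve open", "valve closed"]):
    print(update)
# ('valve', 1)
# ('open', 1)
# ('valve', 2)
# ('closed', 1)
```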
Apache Cassandra
• Open source distributed database management system; key-value store
• Designed to handle very large amounts of data spread out across many commodity servers
• Provides a highly available service with no single point of failure
• Data Model:
Column: Name, Value, and Timestamp (set by client)
SuperColumn: Column whose value is a column
ColumnFamily: A set of columns; referenced by rowkey
KeySpace: A namespace of a set of ColumnFamily
• Data Partitioning and Replication:
RandomPartitioner: Rows are spread evenly across nodes using the MD5 hash of the row key
ByteOrderedPartitioner: Row keys are stored in order of their raw value (think index on name)
ReplicationFactor: Number of nodes that hold a copy of each row
Simple Strategy Uses Node Order for Replication
Network Topology Strategy Uses # of DataCenters
• Twitter: 100s of TBs; supporting social graph analysis and real-time analytics; 100K writes per second (WPS)
• Netflix: Benchmarked at 1M WPS on EC2; Logging, customer analytics, and other uses
• Several others in the 100GBs – 1+ TB range
• DataStax offers commercial support
• Facebook: Developed Cassandra, but abandoned in 2010 for performance issues
• Consistency and Conflict Resolution:
ONE: Written to one node; read from first
QUORUM: Written to and read from floor(RF/2) + 1 nodes, where RF is the ReplicationFactor
ALL: Read and write from all
Most recent timestamp wins conflict – client time is key
Atomicity is at ColumnFamily level
• Snitch maps IPs to Racks and Data Centers
• Writes are to CommitLog, then Memtable (CF)
Flushing: Writing Memtable to Disk (SSTable) when full
Compaction: Merging SSTables
o Faster reads – Minimize number of SSTables to read from
o Reclaim unused space
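The QUORUM size and the "most recent timestamp wins" rule from this slide reduce to a little arithmetic. The Python sketch below is not Cassandra client code; the function names are assumptions. The overlap check uses the commonly stated condition that a read sees the latest write when the number of nodes written plus the number of nodes read exceeds the replication factor.

```python
def quorum(replication_factor):
    """Number of replicas that make up a quorum: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def resolve(versions):
    """Last write wins: pick the value whose timestamp is newest.

    `versions` is a list of (timestamp, value) pairs from different replicas.
    """
    return max(versions, key=lambda v: v[0])[1]


def read_sees_latest_write(rf, write_nodes, read_nodes):
    """True when the read set and the write set must share at least one node."""
    return write_nodes + read_nodes > rf


print(quorum(3))  # 2
print(resolve([(100, "open"), (250, "closed"), (180, "open")]))  # closed
print(read_sees_latest_write(3, quorum(3), quorum(3)))  # True
print(read_sees_latest_write(3, 1, 1))  # False
```

Since the timestamps are set by the client, `resolve` is only as trustworthy as the clients' clocks, which is why the slide notes that client time is key.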
Apache HBase
• Open-source, distributed, scalable, big data store
Billions of rows X millions of columns
Supports data versioning
Column-oriented key-value store modeled after Google Bigtable
• Built on and uses Hadoop and HDFS
• Column – ColumnFamily prefix:column name, with value; limit to 2–3 CFs and keep names short
• Row – Key, Timestamp, Column
• Cell – Row, Column, Version
• Operations – Get, Scan, Put, Delete
• Flushing and Compaction by Region
• Schema Design is Dependent on Query
• Secondary Indexes and Alt. Query Paths Options
• Catalog: ROOT (META.regionkey) and META (list of regions)
• Adobe: Structured data processing with analytics, BI, and ML over images, video, flash, web, etc. … real-time, structured data … processing system that could handle any data volume, with access times under 50ms, with no downtime [or] data loss
• eBay: 4K cores, 5.3PB; search optimization
• eCircle: 2000 cores, 5TB RAM, 1PB storage; processing marketing data
• Facebook: Powers Messages (+135B / month)
• Twitter: R/W backup of all MySQL tables
Source: Sematext blog
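The Row / Column / Version addressing and the Get, Scan, Put, Delete operations can be pictured with an in-memory toy. This Python class is not the HBase client API; the class name, method signatures, and sample row keys are assumptions, and it ignores regions, flushing, and the tombstones a real delete would write.

```python
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for a versioned, column-oriented key-value table."""

    def __init__(self):
        # row key -> "family:qualifier" -> {timestamp: value}
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, ts):
        """Store one cell version, addressed by row, column, and timestamp."""
        self._rows[row][column][ts] = value

    def get(self, row, column, versions=1):
        """Return up to `versions` (timestamp, value) pairs, newest first."""
        cells = self._rows.get(row, {}).get(column, {})
        return sorted(cells.items(), reverse=True)[:versions]

    def scan(self, start_row, stop_row):
        """Yield row keys in sorted order, start inclusive and stop exclusive."""
        for row in sorted(self._rows):
            if start_row <= row < stop_row:
                yield row

    def delete(self, row, column):
        """Remove every version of one column in one row."""
        self._rows.get(row, {}).pop(column, None)


table = ToyTable()
table.put("well-001", "m:pressure", 310, ts=1)
table.put("well-001", "m:pressure", 305, ts=2)
print(table.get("well-001", "m:pressure"))              # [(2, 305)]
print(table.get("well-001", "m:pressure", versions=2))  # [(2, 305), (1, 310)]
```

Because scans walk row keys in sorted order, the choice of row key decides which queries are cheap, which is the sense in which schema design depends on the query.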
Apache Accumulo
• Open-source, sorted, distributed key/value store
• Robust, scalable, high performance data storage and retrieval system designed after Bigtable
• Secure, labeled access at cell level
• Branched from US NSA Cloudbase
• Implemented on Hadoop and HDFS
• Table and Tablet: Tables are partitioned into Tablets based on row key, so that all of a row’s columns can be retrieved at once; implements row-level transactions w/o locking
• TabletServer: Manages some subset of all the tablets (partitions of tables); only 1 TabletServer manages a tablet
• Write-ahead Logger: Accepts updates to Tablet servers and writes to local on-disk storage
• Garbage Collector: Identifies files that aren’t used by any process and deletes them
• Master: Detects and resolves TabletServer failure; manages load balancing; manages table operations; coordinates startup, shutdown, recovery
• Fault Tolerant Executor (FATE): eliminates SPoF
• Growing User Community: 42six, Accumulo Data, Archive.is, Berico, Booz Allen Hamilton, CyberPoint, Data Tactics, Eclectic Consulting, Invertix, KEYW, Orbis Technologies, PDI, Peterson Technologies, Potomac Fusion, Praxis, SAIC, sqrrl, SRA, SW Complete, Tetra Concepts, TexelTek
[Figure: 5-tuple key w/ cell visibility. Source: Adam Fuchs]
[Figure: Tablets and Table Partitioning. Source: Adam Fuchs]
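Cell-level labeled access comes down to evaluating a visibility expression against the authorizations a user holds. The Python sketch below is a simplified evaluator written for illustration, not Accumulo's implementation or API. It assumes labels are combined with `&` and `|` and that mixed operators are grouped with parentheses, so it simply evaluates left to right. Treating an empty label as readable by everyone is also an assumption of this sketch.

```python
import re


def visible(expression, authorizations):
    """Evaluate a visibility expression such as 'ops&(audit|admin)'."""
    tokens = re.findall(r"[A-Za-z0-9_]+|[&|()]", expression)
    pos = 0

    def parse_term():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            result = parse_expr()
            pos += 1  # skip the closing ')'
            return result
        return token in authorizations

    def parse_expr():
        nonlocal pos
        result = parse_term()
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            op = tokens[pos]
            pos += 1
            right = parse_term()
            result = (result and right) if op == "&" else (result or right)
        return result

    if not tokens:
        return True  # assumed: an unlabeled cell is readable by everyone
    return parse_expr()


print(visible("ops&(audit|admin)", {"ops", "admin"}))  # True
print(visible("ops&(audit|admin)", {"ops"}))           # False
```

A scan would apply a check like this to every cell and silently drop the ones the caller may not see.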
Cloud and Cyber-Security Issues
• Security and Privacy
Enterprise identity management not the same in a public / community cloud
Physical security is responsibility of cloud provider
Application security and data privacy
Information Availability
• Compliance (e.g., HIPAA, SOX, etc.)
Continuity of Operations
Audit Logs
Data Jurisdiction
Record keeping / public records
• Legal or Contractual Issues
Liability for incidents involving data loss or compromise
Disposal of data at end / change of service
Intellectual property
Cloud and Cyber-Security Benefits
• Single Security ‘Image’
Security ‘baked into’ every new instance
• Audit Log Processing is a Big Data Issue
• Homomorphic Encryption
A scheme believed to be useful to public clouds
May require cloud to fully implement
• Security Testing
A Utility Cloud was used to brute-force six-character passwords hashed with the 160-bit SHA-1 algorithm in 49 minutes at a cost of $2.10 (Nov 2010)
• Threat Model Enhancements
Big Data issue with need to scale based on threat
• Cloud Security Alliance
Member-driven organization chartered with promoting the use of best practices for providing information assurance within Cloud Computing
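The security testing item is easier to appreciate with a scaled-down version of the same exercise. The Python sketch below exhaustively searches short lower-case strings for one whose SHA-1 digest matches a target. The alphabet, the three-character limit, and the sample password are assumptions chosen so that it finishes instantly on one machine; the work grows by a factor of the alphabet size with every added character, which is where elastic cloud capacity comes in.

```python
import hashlib
from itertools import product
from string import ascii_lowercase


def brute_force_sha1(target_hex, max_len, alphabet=ascii_lowercase):
    """Try every string up to max_len characters until the digest matches."""
    for length in range(1, max_len + 1):
        for chars in product(alphabet, repeat=length):
            candidate = "".join(chars)
            if hashlib.sha1(candidate.encode()).hexdigest() == target_hex:
                return candidate
    return None


def search_space(max_len, alphabet_size=26):
    """Total number of candidates for lengths 1 through max_len."""
    return sum(alphabet_size ** n for n in range(1, max_len + 1))


target = hashlib.sha1(b"oil").hexdigest()
print(brute_force_sha1(target, max_len=3))  # oil
print(search_space(3))                      # 18278
print(search_space(6))                      # 321272406
```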
Summary
• Cyber-security in today’s world requires a combination of technologies and accompanying services
Required for improving security in large-scale environments
Orbis’s customer base includes marquee DoD customers and Fortune 50 commercial clients who are recognizing these benefits
Requires systems that provide fast results to showcase capabilities within an organization along with cost savings that leverage current investments in IT
• We argue that combining semantic capabilities within a cloud environment allows for improved Security Risk Assessment
Provides improved modeling of threats (formal semantics)
Integrates large data sets to leverage the power of formal models
Provides a scalable and expandable cloud infrastructure to leverage various kinds of data in near-real time
Cyber security can become very nimble and respond to changing threats by proactively utilizing data to recognize new patterns that may have gone unnoticed by basic protection strategies