November 2013 HUG: Cyber Security with Hadoop
Cyber Security Analytics & Big Data
Padmanabh Dabke, PhD
VP, Analytics & Visualization
Narus Inc.
2
Narus Confidential, © 2013 Narus, Inc.
Agenda
• Company overview
• Narus Technology
• Key challenges & solutions
• Summary
3
Company Overview
Innovative technology protected by a broad IP portfolio
• Cybersecurity & R&D software company based in Silicon Valley (Sunnyvale, CA)
• Focused on fusing semantic and data planes, applying it to cybersecurity and risk management
• Making sense of physical, content, and social networks
Established customer and partner base
Wholly-owned subsidiary of The Boeing Company
4
Journey to Cyber 3.0: Semantic Web & Cyber Intersect
Web 1.0: Static Web
• Primarily read-only content and static HTML websites
Web 2.0: Social Web
• User/Community-generated content and the read-write web
Web 3.0: Semantic Web
• Adds “Context” to data/Internet traffic based on a superior understanding of relationships within data
Cyber 1.0: Siloed Cyber
• Voluminous, homogenous information
• Siloed, on-demand, non-interactive content
• Limited number of applications and protocols
• Resources and missions not fully aligned
• Manual contextualization of data
Cyber 2.0: Integrated Cyber
• Voluminous high velocity data
• Growth of applications and protocols
• People connecting with each other & content
• Human approaches to extract & contextualize
• Variety of interactions driving looser control (new threats)
Cyber 3.0: Intelligent Cyber
• High volume, velocity and variety of data
• Explosion in applications and protocols
• Hyper connected people & content, interactivity between people-machines-machines-people
• Automated alignment of resources & missions
• Machine learning for intelligence & context
5
Changing Landscape: Volume, Velocity & Variety
[Figure: network bandwidth in Gb, axis ticks 1, 10, 100]
• Daily traffic volume: 20.33 PB/day (2015), 10+ PB/day (2012)
• 2.5 devices/person & 19 billion connections (2016)
• Fixed-line speeds growing from 10 Gbps (2012) to 100 Gbps (2015)
• ~1.5 million applications in Android & Apple Store
• Types of data: growing media (data, voice, video), protocols, network types (local, cloud, virtualized, hybrid)
6
Visibility, Context & Control: Key to Enhance Cybersecurity & Protect Assets
Control
• More efficient spending, faster resolution, dynamic approach to solve a dynamic problem
• Lots of tools, but policies not aligned with mission to allow tighter control
Context
• Impact and root-cause analysis is manual; it requires highly skilled, highly paid analysts to digest overwhelming amounts of data
Visibility
• Need for continuous visibility into every dimension (hosts, users, etc.)
7
Narus’ Innovative Technology: An Integrated Perspective
8
Narus nSystem: Comprehensive & Adaptive Analytics To Enhance Cybersecurity and Protect Critical Assets with Machine Learning
nAnalytics
• Single UI with interactive dashboards offer multi-dimensional views of cyber activity
‒ Network, Semantic & User Analytics
‒ Targeted Session Captures
• Advanced analytics for automated data fusion with machine learning
nProcessing
• Centralized scalable data processing & storage framework
• Automated ability to deal with petabytes of data
• Support for streaming, query-based and big-data analytics
• Machine learning applied to large volumes of data
nCapture
• Architected for distribution at multiple sites & links
‒ 100% of packets examined, metadata with necessary session fidelity
• Plugins to assimilate data from heterogeneous sources
• Precision targeted full packet capture
• Support for 20G (duplex 10G) per-link, path to 100G
9
Confidential / For Internal Use Only / © 2013 Narus, Inc.
Narus Analytics Framework
[Architecture diagram. Latency tiers: Real Time (< 5 sec latency), Close Enough (5 min latency), On Demand. Components: Protocol Vector Creator, In-Memory Analytics, Real Time Visualization, ETL, Map Reduce Jobs, Hadoop, HBase, RDBMS, Ad-Hoc/Sliding Window Analytics]
Real Time Analytics
• Volumetric & Topical Trends
• Anomaly Detection
• Classification
• Clustering
• Summarization
Data-At-Rest Analytics
• Long term trends
• Opportunistic Correlations
• Model Training
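As an illustration of the sliding-window and volumetric-trend items on this slide, here is a minimal Python sketch of a windowed byte counter. The class name, the 300-second window, and the (timestamp, byte count) event format are assumptions made for the example; the slides do not describe how Narus implements this.

```python
from collections import deque


class SlidingWindowVolume:
    """Track total bytes observed within the most recent `window_seconds`."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, byte_count), oldest first
        self.total_bytes = 0

    def add(self, timestamp, byte_count):
        """Record one observation; timestamps are assumed to be non-decreasing."""
        self.events.append((timestamp, byte_count))
        self.total_bytes += byte_count
        self._expire(timestamp)

    def _expire(self, now):
        # Drop observations that have fallen out of the window.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, old_bytes = self.events.popleft()
            self.total_bytes -= old_bytes


# Hypothetical observations: (seconds since start, bytes seen)
window = SlidingWindowVolume(window_seconds=300)
for ts, size in [(0, 1000), (100, 2000), (250, 500), (320, 4000)]:
    window.add(ts, size)

print(window.total_bytes)
```

Keeping a running total and expiring old entries from the front of the deque makes each update cheap, which is the property a low-latency tier would need.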
10
Machine Learning for Cyber Security
• Automated Signature Generation
– Protocol Identification
– Parser Generation
– Mobile App Detection
– App Categorization
• Text Analytics
– Topic Detection
– Sentiment Analysis
• Anomaly Detection
– Baseline profile generation
– Alerts & workflow
– Malicious Application Detection
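The "baseline profile generation" and "alerts" items can be illustrated with a simple statistical baseline. The sketch below uses a mean and standard deviation with a z-score threshold. This is a generic technique chosen for illustration, not necessarily the method used in the Narus product; the function names, sample values, and threshold are all assumptions.

```python
from statistics import mean, stdev


def build_baseline(history):
    """Summarize historical measurements as (mean, sample standard deviation)."""
    return mean(history), stdev(history)


def is_anomalous(value, baseline, threshold=3.0):
    """Flag a value whose z-score against the baseline exceeds the threshold."""
    mu, sigma = baseline
    if sigma == 0:
        # A flat baseline: any deviation at all is unusual.
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical per-interval connection counts for one host
history = [100, 104, 98, 101, 97, 103, 99, 102]
baseline = build_baseline(history)

print(is_anomalous(101, baseline))  # close to the historical mean
print(is_anomalous(180, baseline))  # far outside the historical range
```

In practice a baseline would be kept per host, user, or application and refreshed periodically from the data-at-rest tier.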
11
Key Challenges
• Increasing network traffic
– Line speeds from 20 Gbps to 600 Gbps and above
– 210 TB to 6.3 PB per day
• Diversity of deployments
– Data rates, vertical application areas, SLA, price points: everything is a variable
• Operational issues
– Datacenter connectivity
– Burstiness of network traffic
• Data Security
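The relationship between line speed and daily volume quoted above can be checked with a short calculation. The sketch below assumes decimal units (1 TB = 10^12 bytes) and a configurable link utilization; the function name and the `utilization` parameter are introduced for this example. At full utilization it gives about 216 TB/day for 20 Gbps and about 6.48 PB/day for 600 Gbps, the same order of magnitude as the slide's figures. The slide does not state which utilization or unit convention it used.

```python
SECONDS_PER_DAY = 86_400


def daily_volume_tb(line_rate_gbps, utilization=1.0):
    """Return decimal terabytes carried per day at the given line rate."""
    bits_per_day = line_rate_gbps * 1e9 * utilization * SECONDS_PER_DAY
    return bits_per_day / 8 / 1e12


for rate in (20, 600):
    print(f"{rate} Gbps -> {daily_volume_tb(rate):.0f} TB/day")
```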
12
Lessons Learned/Solutions
• Extract and store all metadata and provide full packets as identified by the analyst
– 90% reduction in data volume
• Use domain knowledge for message compression
– Short codes for enumerated values (mobile apps, protocols, etc.)
– Session associations to eliminate referential fields
• HBase over HDFS: provides abstractions useful for modelling dynamic schema
• Offload CPU work to special-purpose co-processors to accelerate performance
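The two message-compression ideas above (short codes for enumerated values, and session associations that remove repeated referential fields) can be sketched as follows. The code tables, field names, and record layout are hypothetical and chosen only for illustration; the slides do not describe the actual Narus message format.

```python
import json

# Hypothetical code tables; a real system would maintain far larger ones.
PROTOCOL_CODES = {"HTTP": 1, "DNS": 2, "TLS": 3}
APP_CODES = {"ExampleChat": 1, "ExampleMaps": 2}
PROTOCOL_NAMES = {code: name for name, code in PROTOCOL_CODES.items()}
APP_NAMES = {code: name for name, code in APP_CODES.items()}

# Fields that are identical for every message in a session.
SESSION_FIELDS = ("session_id", "src_ip", "dst_ip")


def encode_session(records):
    """Pack verbose records from ONE session: shared fields stored once, enums as codes."""
    header = [records[0][field] for field in SESSION_FIELDS]
    messages = [
        [PROTOCOL_CODES[r["protocol"]], APP_CODES[r["app"]], r["bytes"]]
        for r in records
    ]
    return {"h": header, "m": messages}


def decode_session(packed):
    """Rebuild the verbose records from the packed form."""
    shared = dict(zip(SESSION_FIELDS, packed["h"]))
    return [
        dict(shared, protocol=PROTOCOL_NAMES[p], app=APP_NAMES[a], bytes=b)
        for p, a, b in packed["m"]
    ]


records = [
    {"session_id": "s-001", "src_ip": "10.0.0.5", "dst_ip": "192.0.2.10",
     "protocol": "TLS", "app": "ExampleChat", "bytes": 512},
    {"session_id": "s-001", "src_ip": "10.0.0.5", "dst_ip": "192.0.2.10",
     "protocol": "TLS", "app": "ExampleChat", "bytes": 2048},
    {"session_id": "s-001", "src_ip": "10.0.0.5", "dst_ip": "192.0.2.10",
     "protocol": "HTTP", "app": "ExampleMaps", "bytes": 128},
]

packed = encode_session(records)
verbose_size = len(json.dumps(records))
packed_size = len(json.dumps(packed))
print(verbose_size, packed_size)
```

The saving grows with the number of messages per session, because the session fields are written once regardless of how many messages follow.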
13
Lessons Learned/Solutions
• Relational databases are not evil
– Believe it or not, relational algebra is quite powerful
– We use it for fast, in-memory computations in combination with Java code for processing rule sets
• SQL interfaces on HDFS/HBase are catching up
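To illustrate the point about relational algebra for fast in-memory computation, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The slides mention Java and do not name the in-memory engine, so SQLite is a stand-in here; the table layout, sample rows, and threshold rule are hypothetical.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flows (src_ip TEXT, dst_port INTEGER, bytes INTEGER)")
conn.executemany(
    "INSERT INTO flows VALUES (?, ?, ?)",
    [
        ("10.0.0.5", 443, 4000),
        ("10.0.0.5", 443, 7000),
        ("10.0.0.9", 53, 300),
        ("10.0.0.7", 80, 9000),
        ("10.0.0.7", 8080, 2500),
    ],
)


def hosts_over_threshold(connection, threshold_bytes):
    """Hypothetical rule: list sources whose total bytes exceed a threshold."""
    return connection.execute(
        "SELECT src_ip, SUM(bytes) FROM flows "
        "GROUP BY src_ip HAVING SUM(bytes) > ? "
        "ORDER BY SUM(bytes) DESC",
        (threshold_bytes,),
    ).fetchall()


print(hosts_over_threshold(conn, 10_000))
```

The grouping, filtering, and ordering are expressed declaratively in one query, while the surrounding code decides which rule to run and what to do with the result.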
Analytics Data Store Performance
[Chart: average query processing time versus database size for Big SQL, Impala, and MySQL Cluster]
14
Business Considerations
• Optimizing Total Cost of Ownership (TCO)
– System acquisition
– Data center costs
– Administration and maintenance
• Analytics development and skillset required
• Global support
15
Data Warehousing Vs Hadoop
Source: “Big Data: What does it really cost?” By Winter Corp
16
Summary
• We blend network, semantic, and user-oriented views to create unique insights
– Data Loss Prevention
– Threat Detection
– Network pattern mining
• Real Time & At-Rest Analytics
– Stateless analysis and short term trends and classification
– At-Rest analysis for training models, opportunistic correlations, and mega-trends
• Hybrid Approach
– Hadoop/HBase for horizontal scaling and cost-effective storage and processing of massive data sets
– Relational databases for creating efficient business intelligence views