October 2014 Webinar: Cybersecurity Threat Detection

Uploaded by sqrrl on 10 August 2015


Securely explore your data

CYBERSECURITY THREAT DETECTION

Deriving Insights with Sqrrl and Spark GraphX

Adam Fuchs, CTO

October 2014

WHO WE ARE

© 2014 Sqrrl Data, Inc. | All Rights Reserved

WHAT WE’LL DISCUSS


•  Security Analytics using (Big) Cybersecurity Data
   •  You’ve been breached – what’s at stake?
   •  Dealing with the new security dilemma
   •  The ‘Linked Data’ Approach

•  Case study: internal network breach
   •  Overview of scenario
   •  Data modeling with Sqrrl
   •  Detecting anomalies with Sqrrl and GraphX
   •  Visual, contextual research and analysis

THE NUMBERS DON’T LIE

•  229 (Source: Mandiant)
•  87% (Source: Verizon)
•  90% (Source: Verizon)
•  $12.7M (Source: Ponemon)

TARGETED ATTACKS HAVE CHANGED THE GAME

Source: Battery Ventures

WHAT DOES THIS MEAN FOR US?

•  You’ve been breached. Deal with it.

•  Empower the investigator

•  Research and respond: better, faster, smarter

•  It’s all about speed to understanding


THE SECURITY DATA DILEMMA

Dissolution of the secure perimeter

Detecting attacks requires more (i.e., BIG) data

But your tools can’t handle the big data wave

So attackers are spilling in

BIG DATA TRANSFORMED

[Figure: security data sources (perimeter, network, endpoint, security, and application data, including VPN, firewall, proxy, NetFlow, HR, USB, and email) are transformed into linked contextual knowledge about users, websites, internal servers, client devices, and assets, which supports search, exploration, reports, anomalies, and machine learning.]

ARCHITECTURAL OVERVIEW

•  Physical: Commodity Hardware
•  Data Storage: HDFS + Accumulo
•  Data Model: Raw Events; Entity/Relationship Model
•  Processing: Query Engine; Bulk/Graph Processing
•  Interface: Visualization / API; ML + Anomaly Detection
•  Security: Audit; Cryptography; Labeling + Policy

CASE STUDY: COMPROMISED NETWORK


BREACH DETECTION SCENARIO

Attack phases, on a timeline spanning minutes to months: Recon / Delivery, Exploit / Install, C2 / Action

•  Breach: compromised laptop
•  Network scan (NetFlow)
•  Pass the hash (Windows event logs)
•  Stolen credentials (Windows event logs: unusually excessive logins)
•  DB dump (MSSQL event log: unscheduled backup)
•  Exfiltration (NetFlow)

CASE STUDY MODEL

Data sources: DNS records, Netflow, Host logs, Database logs, External Alerts

Linked meta model:

•  Entities: Users, Hosts
•  Relationships: login (from users and from hosts), flow (between hosts)
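One way to make the meta model concrete is as a small typed schema that rejects relationships the model does not allow. The Python sketch below is hypothetical: the class names, `ALLOWED_EDGES`, and `allowed` are illustrative and are not Sqrrl's API, and it assumes that logins originate from users or hosts and that flows connect hosts.

```python
from dataclasses import dataclass, field

# Hypothetical encoding of the case-study meta model (not Sqrrl's API).
ENTITY_CLASSES = {"User", "Host"}

# Assumed (relationship, source class, target class) combinations:
# logins originate from users or hosts; flows connect hosts.
ALLOWED_EDGES = {
    ("login", "User", "Host"),
    ("login", "Host", "Host"),
    ("flow", "Host", "Host"),
}


def allowed(kind: str, src_cls: str, dst_cls: str) -> bool:
    """True if the meta model permits this relationship between the two classes."""
    return (kind, src_cls, dst_cls) in ALLOWED_EDGES


@dataclass
class Entity:
    cls: str
    id: str
    props: dict = field(default_factory=dict)

    def __post_init__(self):
        if self.cls not in ENTITY_CLASSES:
            raise ValueError(f"unknown entity class {self.cls!r}")


@dataclass
class Edge:
    kind: str
    src: Entity
    dst: Entity
    props: dict = field(default_factory=dict)

    def __post_init__(self):
        if not allowed(self.kind, self.src.cls, self.dst.cls):
            raise ValueError(f"{self.kind} edge from {self.src.cls} to {self.dst.cls} not allowed")


# Illustrative instances; identifiers come from the case-study example data,
# but the specific edges are made up for this sketch.
jane = Entity("User", "Jane", {"loginAttempts": 82})
kali = Entity("Host", "192.168.10.94", {"hostname": "kali"})
msserv = Entity("Host", "192.168.10.120", {"hostname": "msserv"})
edges = [Edge("login", jane, kali), Edge("flow", kali, msserv)]
```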

CASE STUDY EXAMPLE MAPPING

Netflow Records

startTime      endTime        sourceIP       destIP         sourcePort  destPort  protocol  tcpFlags  bytesIn  bytesOut
10/22/14 8:58  10/22/14 8:58  10.0.2.15      192.168.0.123  37051       139       TCP       ...RS.    100      3355
10/22/14 8:45  10/22/14 8:45  10.0.2.15      192.168.0.6    0           3328      ICMP      ......    40       100
10/22/14 8:59  10/22/14 8:59  192.168.0.119  10.0.2.15      139         60071     TCP       .A..S.    46       351

Resulting graph (host nodes linked by flow edges):

•  10.0.2.15 → 192.168.0.123: Class=Flow, totalBytes = 3455
•  10.0.2.15 → 192.168.0.6: Class=Flow, totalBytes = 140
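The mapping above collapses each netflow record into a directed Flow edge between two host nodes. The totals shown are consistent with totalBytes = bytesIn + bytesOut (100 + 3355 = 3455 and 40 + 100 = 140). The sketch below assumes exactly that rule and aggregates by (sourceIP, destIP); `flows_to_edges` is a hypothetical helper, and timestamps, ports, protocol, and flags are simply dropped.

```python
from collections import defaultdict

# Netflow rows from the example slide, reduced to the fields this sketch uses:
# (sourceIP, destIP, bytesIn, bytesOut).
NETFLOW_ROWS = [
    ("10.0.2.15", "192.168.0.123", 100, 3355),
    ("10.0.2.15", "192.168.0.6", 40, 100),
    ("192.168.0.119", "10.0.2.15", 46, 351),
]


def flows_to_edges(rows):
    """Aggregate netflow records into directed Flow edges keyed by (src, dst).

    Assumes totalBytes = bytesIn + bytesOut, summed over all records
    between the same ordered pair of hosts.
    """
    totals = defaultdict(int)
    for src, dst, bytes_in, bytes_out in rows:
        totals[(src, dst)] += bytes_in + bytes_out
    return {pair: {"Class": "Flow", "totalBytes": total} for pair, total in totals.items()}


edges = flows_to_edges(NETFLOW_ROWS)
```

Keying on the ordered (source, destination) pair keeps flow direction; an undirected aggregation would key on a sorted tuple instead.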

CASE STUDY EXAMPLE DATA

•  Class=User: id=Jane, loginAttempts=82
•  Class=Host: id=192.168.10.94, hostname=kali, bytesTransfered={2014-09-30/01:00: 64472381}
•  Class=Host: id=74.129.94.19, bytesTransfered={2014-09-30/01:00: 64472381}
•  Class=Host: id=192.168.10.120, hostname=msserv, bytesTransfered={2014-09-30/04:00: 42318}

Relationships shown: login (Jane to 192.168.10.94), flow
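The bytesTransfered properties above are keyed by hour (for example `2014-09-30/01:00`). A minimal sketch of how such time-bucketed properties could be built from individual flows, assuming hourly buckets and a per-host byte count; `hour_key`, `bytes_by_hour`, and the sample flows are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def hour_key(ts: datetime) -> str:
    """Hour bucket label in the slide's format, e.g. '2014-09-30/01:00'."""
    return ts.strftime("%Y-%m-%d/%H:00")


def bytes_by_hour(flows):
    """Sum transferred bytes per host per hour.

    flows: iterable of (host_id, timestamp, byte_count) tuples.
    Returns {host_id: {hour_key: total_bytes}}.
    """
    buckets = defaultdict(lambda: defaultdict(int))
    for host, ts, nbytes in flows:
        buckets[host][hour_key(ts)] += nbytes
    return {host: dict(hours) for host, hours in buckets.items()}


# Hypothetical input flows (not taken from the slide).
sample = [
    ("192.168.10.120", datetime(2014, 9, 30, 4, 5), 1000),
    ("192.168.10.120", datetime(2014, 9, 30, 4, 55), 500),
    ("192.168.10.120", datetime(2014, 9, 30, 5, 10), 250),
]
result = bytes_by_hour(sample)
```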

INVESTIGATION PROCESS

1. Set the Stage
   •  Define the security-centric entity/relationship model
   •  Extract and maintain the model

2. Enable Search and Discovery
   •  Visually navigate assets and actors in the network
   •  Drill down to the raw data seeding the model

3. Automate Analysis
   •  Use behavioral analytics to build expectations of ‘normal’
   •  Flag entities as potentially ‘abnormal’ and sniff them out
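Step 3 amounts to comparing an entity’s current behavior against its own history. A minimal sketch, assuming a simple mean-plus-k-standard-deviations rule on per-period login counts; the threshold rule, `k = 3.0`, the function names, and the history values are all hypothetical stand-ins, not the behavioral analytics Sqrrl uses.

```python
from statistics import mean, pstdev


def is_abnormal(history, current, k=3.0):
    """Flag `current` if it exceeds mean + k * population std. dev. of `history`.

    history: past per-period counts for one entity (e.g. daily login attempts).
    k: hypothetical sensitivity threshold.
    """
    return current > mean(history) + k * pstdev(history)


def flag_entities(histories, current, k=3.0):
    """Names of entities whose current count is abnormal relative to their own history."""
    return sorted(
        name
        for name, count in current.items()
        if name in histories and is_abnormal(histories[name], count, k)
    )


# Hypothetical baselines and current counts.
histories = {"Jane": [3, 5, 4, 6, 2], "Bob": [10, 12, 11, 9, 13]}
current = {"Jane": 82, "Bob": 12}
flagged = flag_entities(histories, current)
```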

SPARK METHODOLOGY, ALGORITHMS, AND RESULTS


APACHE SPARK 101

We use Spark because it:

1.  Meets core processing requirements
    •  Pre-canned algorithms
    •  Native support for graph processing
    •  Simple programmability
2.  Offers good performance
    •  Low latency for many small jobs
    •  Scalability for big jobs
3.  Is a natural fit
    •  Ties with the Hadoop ecosystem simplify integration

ROUND-TRIPPING WITH SPARK

•  Ingest/Extract: input data (DNS, Netflow, Windows logs, DB logs, alert data) flows into the Sqrrl Graph Store
•  Algorithmic Enrichment: Spark reads the graph via SqrrlGraphInputFormat and writes results back via SqrrlGraph.update(uuid, values)
•  Serve/Analyze: results are served through the Sqrrl UI
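The round trip (read the graph out, compute something, write per-node values back keyed by ID) can be sketched without Spark. `InMemoryGraphStore` below is a hypothetical stand-in: its `update(uuid, values)` only loosely mirrors the `SqrrlGraph.update(uuid, values)` call named on the slide, and `read_graph` stands in for `SqrrlGraphInputFormat`; neither reproduces the real API.

```python
class InMemoryGraphStore:
    """Hypothetical stand-in for a graph store; NOT the Sqrrl API."""

    def __init__(self):
        self.nodes = {}  # node uuid -> property dict
        self.edges = []  # (src_uuid, dst_uuid) pairs

    def add_edge(self, src, dst):
        self.nodes.setdefault(src, {})
        self.nodes.setdefault(dst, {})
        self.edges.append((src, dst))

    def read_graph(self):
        # Stand-in for extracting the graph into the processing engine.
        return list(self.nodes), list(self.edges)

    def update(self, uuid, values):
        # Loosely mirrors the slide's SqrrlGraph.update(uuid, values):
        # merge computed values into the stored node's properties.
        self.nodes.setdefault(uuid, {}).update(values)


def enrich_with_degree(store):
    """One round trip: extract the graph, compute degree per node, write it back."""
    node_ids, edges = store.read_graph()
    degree = dict.fromkeys(node_ids, 0)
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1
    for uuid, d in degree.items():
        store.update(uuid, {"degree": d})


store = InMemoryGraphStore()
for src, dst in [("10.0.2.15", "192.168.0.123"), ("10.0.2.15", "192.168.0.6")]:
    store.add_edge(src, dst)
enrich_with_degree(store)
```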

STRUCTURAL FEATURES

Triangle Counting:
•  Given node A, find edges AB, AC, BC
•  For nodes B, C in A’s neighborhood, is P(BC) > E/N²? (That is, are A’s neighbors more likely to be connected to each other than the graph as a whole suggests, where E is the number of edges and N the number of nodes?)

Node Degree:
•  Given node A, how many nodes are within 1 or 2 edges?

PageRank:
•  Iteratively transfer weight to neighbors in proportion to their links
•  Converges on entity importance
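The three structural features can be illustrated on a small undirected graph. This stdlib-only Python sketch is not GraphX: it counts triangles through a node, counts nodes within 1 or 2 hops, runs power-iteration PageRank (damping 0.85 and 50 iterations are assumed defaults), and applies the slide’s P(BC) > E/N² test literally. All function names are hypothetical.

```python
from itertools import combinations


def adjacency(edges):
    """Undirected adjacency sets from an edge list (self-loops ignored)."""
    adj = {}
    for a, b in edges:
        if a == b:
            continue
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def triangle_count(adj, node):
    """Number of neighbor pairs (B, C) of `node` that are themselves connected."""
    return sum(1 for b, c in combinations(adj[node], 2) if c in adj[b])


def neighbors_within(adj, node, hops):
    """Number of distinct nodes reachable from `node` in at most `hops` edges."""
    seen, frontier = {node}, {node}
    for _ in range(hops):
        frontier = {n for f in frontier for n in adj[f]} - seen
        seen |= frontier
    return len(seen) - 1


def pagerank(adj, damping=0.85, iters=50):
    """Power-iteration PageRank; each undirected edge is followed in both directions."""
    n = len(adj)
    rank = {v: 1.0 / n for v in adj}
    for _ in range(iters):
        new = {v: (1 - damping) / n for v in adj}
        for v, nbrs in adj.items():
            share = rank[v] / len(nbrs)  # every node in adj has at least one neighbor
            for u in nbrs:
                new[u] += damping * share
        rank = new
    return rank


def local_density_exceeds_global(adj, node):
    """Slide's test: is P(BC) among node's neighbors greater than E / N^2?"""
    k = len(adj[node])
    if k < 2:
        return False
    p_bc = triangle_count(adj, node) / (k * (k - 1) / 2)
    e = sum(len(nbrs) for nbrs in adj.values()) / 2
    return p_bc > e / len(adj) ** 2


adj = adjacency([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")])
ranks = pagerank(adj)
```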

SPARK OUTLIER DETECTION

Detail on data flow and algorithms

•  Use GraphX to load the Sqrrl graph model
   •  Entities: Users, Hosts
   •  Relationships: Flows, Logins (both user and host)
   •  Loads an RDD with the Sqrrl graph in Spark
•  For every node, generate features:
   •  GraphX built-in methods: Degree, Triangle Count, PageRank
   •  Implemented in Spark by Sqrrl: edgeWeightTotal, totalNeighborDegree
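The slides name edgeWeightTotal and totalNeighborDegree but do not define them. The sketch below assumes edgeWeightTotal is the sum of weights on a node’s incident edges (using flow totalBytes as the weight) and totalNeighborDegree is the sum of its neighbors’ degrees; both definitions and `node_features` are assumptions for illustration.

```python
from collections import defaultdict


def node_features(weighted_edges):
    """Per-node features from (src, dst, weight) edges, treated as undirected.

    Assumed definitions (the slides name these features but do not define them):
      degree              number of distinct neighbors
      edgeWeightTotal     sum of weights on the node's incident edges
      totalNeighborDegree sum of the degrees of the node's neighbors
    """
    neighbors = defaultdict(set)
    weight_total = defaultdict(float)
    for src, dst, weight in weighted_edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)
        weight_total[src] += weight
        weight_total[dst] += weight
    degree = {node: len(nbrs) for node, nbrs in neighbors.items()}
    return {
        node: {
            "degree": degree[node],
            "edgeWeightTotal": weight_total[node],
            "totalNeighborDegree": sum(degree[m] for m in neighbors[node]),
        }
        for node in neighbors
    }


# Flow edges from the netflow example, weighted by totalBytes (an assumption).
# The third weight, 397, is 46 + 351 from the table's last row.
FLOW_EDGES = [
    ("10.0.2.15", "192.168.0.123", 3455),
    ("10.0.2.15", "192.168.0.6", 140),
    ("192.168.0.119", "10.0.2.15", 397),
]
features = node_features(FLOW_EDGES)
```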

SPARK OUTLIER DETECTION

•  Transform the statistics into a feature matrix and run PCA
   •  Produces a ranked list of high-variance dimensions, the ones most likely to indicate an entity’s “outlierness”
   •  PCA is run with Spark MLlib
•  Top feature pairs:
   •  totalNeighborDegree vs. edgeWeightTotal
   •  Degree vs. edgeWeightTotal
•  Create a “distance” measure from these pairs to flag anomalies
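The slides do not say which distance measure is used. As an illustrative choice, this sketch scores each entity by its squared Mahalanobis distance from the centroid of one feature pair (for example totalNeighborDegree vs. edgeWeightTotal) and computes the pair’s two principal variances in closed form, a 2-D stand-in for the PCA that the slide runs with Spark MLlib. Function names and sample values are hypothetical.

```python
import math
from statistics import fmean


def _cov2(xs, ys):
    """Means, population variances (a, c), and covariance (b) of two feature columns."""
    mx, my = fmean(xs), fmean(ys)
    n = len(xs)
    a = sum((x - mx) ** 2 for x in xs) / n
    c = sum((y - my) ** 2 for y in ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    return mx, my, a, b, c


def principal_variances(xs, ys):
    """Eigenvalues of the 2x2 covariance matrix [[a, b], [b, c]], largest first."""
    _, _, a, b, c = _cov2(xs, ys)
    half_trace = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    return half_trace + radius, half_trace - radius


def mahalanobis_scores(xs, ys):
    """Squared Mahalanobis distance of each point from the pair's centroid."""
    mx, my, a, b, c = _cov2(xs, ys)
    det = a * c - b * b
    if det <= 1e-12:
        raise ValueError("covariance is (near-)singular; features are perfectly correlated")
    return [
        (c * (x - mx) ** 2 - 2 * b * (x - mx) * (y - my) + a * (y - my) ** 2) / det
        for x, y in zip(xs, ys)
    ]


def rank_outliers(names, xs, ys):
    """Entity names sorted from most to least anomalous for one feature pair."""
    scores = mahalanobis_scores(xs, ys)
    return [name for _, name in sorted(zip(scores, names), reverse=True)]


# Hypothetical feature values for six nodes.
NAMES = ["n1", "n2", "n3", "n4", "n5", "n6"]
XS = [1, 2, 3, 4, 5, 3]         # e.g. totalNeighborDegree
YS = [11, 19, 31, 39, 52, 80]   # e.g. edgeWeightTotal
ranking = rank_outliers(NAMES, XS, YS)
```

Mahalanobis distance accounts for the correlation between the two features, so a node that departs from the usual relationship between them can score highly even when one of its values is unremarkable on its own.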

SPARK RAPID ITERATION


VISUALIZING THE THREAT


THANKS!


Adam Fuchs, CTO, Sqrrl Data, Inc.

http://www.sqrrl.com