ALLAN OGWANG Data Science in Cybersecurity

Posted on 02-Aug-2020

© Vectra | vectra.ai

ALLAN OGWANG

Data Science in Cybersecurity

How Vectra applies data science for threat detection

Vectra uses AI to detect attackers in real time and enrich threat investigations with a conclusive chain of forensic evidence

Attacker behaviors: unifying data science and security research

Security Research
• Identify, prioritize, and characterize fundamental attacker behaviors
• Validate models

Data Science
• Determine the best approach to identify each behavior
• Develop and tune models

Attacker behavior models
• High-fidelity detection of things attackers must do
• No signatures: find both known and unknown threats

Who is Vectra AI?

Vectra AI provides automated threat detection that exposes hidden and unknown cyberattackers in a network. It applies artificial intelligence to seek out the fundamental threat behaviors that attackers simply can't avoid.

Cyberthreats in an enterprise: An advanced attack

Enterprise networks

A firewall separates the inside of the network from the outside

(Diagram: organization firewall)

Enterprise networks

The firewall prevents an attacker from connecting directly to computers inside the network

Advanced attack

The attacker needs a foothold inside the network

Advanced attack

Infect an internal host with malware

Advanced attack

The malware connects out to a server hosted by the attacker

Advanced attack

Command-and-control behaviors

Advanced attack

Reconnaissance behaviors

Advanced attack

Lateral movement behaviors

Advanced attack

Exfiltration behaviors

Progression of an attack

Initial infection → Command and control → Reconnaissance → Lateral movement → Data exfiltration

(Diagram linking attack, data, and machine learning)

Different types of learning: Supervised vs. unsupervised

(Figure: machine learning algorithms arranged by supervised vs. unsupervised and shallow vs. deep. Algorithms shown: SVM, Decision Tree, Random Forest, Logistic Regression, KNN, Naïve Bayes, HMM, Perceptron, ART, ARTMAP, RBE, MDN, PCA, Isolation Forest, One-Class SVM, K-Means, GMM, DBSCAN, Neural Networks, RBM, Deep Neural Network, DBN, Deep Autoencoder, Network Embeddings.)
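A minimal pure-Python sketch of the distinction, on made-up numbers (bytes per session; all values and function names here are hypothetical). The supervised half learns from labeled examples with a nearest-centroid classifier; the unsupervised half never sees labels and simply flags values far from the mean with a z-score. Neither is presented as a model Vectra uses.

```python
from statistics import mean, pstdev


def fit_nearest_centroid(xs, labels):
    """Supervised: learn one centroid (mean value) per label from labeled data."""
    return {
        label: mean(x for x, y in zip(xs, labels) if y == label)
        for label in set(labels)
    }


def predict_nearest_centroid(centroids, x):
    """Assign x to the label whose centroid is closest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def zscore_outliers(xs, threshold=2.0):
    """Unsupervised: no labels; flag values far from the bulk of the data."""
    mu, sigma = mean(xs), pstdev(xs)
    if sigma == 0:
        return []
    return [x for x in xs if abs(x - mu) / sigma > threshold]


# Hypothetical labeled training data: bytes per session.
train_x = [100, 120, 110, 130, 900, 950]
train_y = ["benign", "benign", "benign", "benign", "malicious", "malicious"]
model = fit_nearest_centroid(train_x, train_y)
print(predict_nearest_centroid(model, 115))  # benign
print(predict_nearest_centroid(model, 880))  # malicious

# Hypothetical unlabeled traffic: the detector only knows what is unusual.
unlabeled = [100, 105, 98, 110, 102, 97, 103, 5000]
print(zscore_outliers(unlabeled))  # [5000]
```

The supervised model can name what it sees only because it was given labels; the unsupervised one can say only that 5000 is unusual, not whether it is malicious.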

The “no-free-lunch” theorem

No single algorithm performs best for all problems

(Figure: performance vs. type of problem, comparing a highly specialized algorithm with a general-purpose algorithm.)

Choosing the right algorithm: Know your data

(Figure: clustering algorithms (K-Means, Affinity, Mean Shift, Spectral, Ward, Agglomerative, DBSCAN, Birch, GMM) applied to six clustering problems: concentric rings, nested arches, loose clusters, elongated clusters, tight clusters, and no clusters (data field).)
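The concentric-rings column can be reproduced in a few lines of Python. This is a sketch under stated assumptions: the ring data, the k-means initialization, and the DBSCAN parameters (eps=1.0, min_pts=3) are chosen for illustration only. K-means assumes compact, blob-shaped clusters and ends up splitting the plane into two halves, while a density-based method such as DBSCAN recovers the two rings.

```python
import math


def make_rings(n_per_ring=24, radii=(1.0, 3.0)):
    """Evenly spaced points on concentric circles, plus the true ring of each point."""
    points, ring_ids = [], []
    for ring_id, r in enumerate(radii):
        for i in range(n_per_ring):
            angle = 2 * math.pi * i / n_per_ring
            points.append((r * math.cos(angle), r * math.sin(angle)))
            ring_ids.append(ring_id)
    return points, ring_ids


def kmeans(points, init_centroids, max_iter=100):
    """Lloyd's algorithm: assign each point to its nearest centroid, then re-average."""
    centroids = list(init_centroids)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments stopped changing
        labels = new_labels
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: clusters grow through dense neighborhoods; -1 marks noise."""

    def neighbors(i):
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins, but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:  # core point: keep expanding
                queue.extend(j_neighbors)
    return labels


points, ring_ids = make_rings()
km_labels = kmeans(points, init_centroids=[points[0], points[24]])
db_labels = dbscan(points, eps=1.0, min_pts=3)

print(db_labels == ring_ids)  # True: DBSCAN recovers the two rings
for k in sorted(set(km_labels)):
    rings = {r for r, lab in zip(ring_ids, km_labels) if lab == k}
    print(f"k-means cluster {k} contains rings {sorted(rings)}")
```

This does not make DBSCAN the better algorithm in general: its result hinges on eps, and on tight, round clusters plain k-means is a natural fit. That is the slide's point: know your data first.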

Choosing the right algorithm

No single algorithm performs best for all problems

Select the right option for your data and performance needs

(Figure: performance vs. type of problem, comparing a highly specialized algorithm with a general-purpose algorithm.)

Outline

• Metadata used for threat detection
• Approach to detection
• Detecting Remote Access Trojans (RATs)
   • Signatures
   • Anomaly detection
   • Random forest
   • Deep learning
• Conclusions

Metadata hits the sweet spot for security applications

• Vectra metadata is designed with attacker behavior in mind
• All detection models are based on Vectra metadata
   • Metadata includes bytes, protocols, domains, and IPs
   • More advanced models are built on enhanced metadata

(Figure: NetFlow, the Vectra metadata stream, and full packet capture compared; data volume and deployment complexity increase from NetFlow to full packet capture.)
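To make "metadata" concrete, here is a hypothetical sketch of collapsing packet records into one session-level record carrying the fields the slide lists (bytes, protocol, domain, IPs). The `Packet` and `SessionMetadata` types and their field names are assumptions for illustration, not Vectra's metadata schema.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    """Hypothetical parsed packet with just the fields this sketch needs."""
    ts: float
    src: str
    dst: str
    proto: str
    size: int
    domain: str = ""  # e.g. learned from DNS or TLS SNI, when available


@dataclass
class SessionMetadata:
    """Hypothetical session-level record: far smaller than the packets it summarizes."""
    client: str
    server: str
    proto: str
    domain: str
    bytes_sent: int = 0      # internal client -> external server
    bytes_received: int = 0  # external server -> internal client
    packets: int = 0
    start: float = float("inf")
    end: float = float("-inf")


def summarize(packets, client_ips):
    """Collapse packets into one metadata record per (client, server, protocol)."""
    sessions = {}
    for p in packets:
        outbound = p.src in client_ips
        client, server = (p.src, p.dst) if outbound else (p.dst, p.src)
        key = (client, server, p.proto)
        if key not in sessions:
            sessions[key] = SessionMetadata(client, server, p.proto, p.domain)
        meta = sessions[key]
        if outbound:
            meta.bytes_sent += p.size
        else:
            meta.bytes_received += p.size
        meta.packets += 1
        meta.start = min(meta.start, p.ts)
        meta.end = max(meta.end, p.ts)
        if p.domain and not meta.domain:
            meta.domain = p.domain
    return list(sessions.values())


packets = [
    Packet(0.0, "10.0.0.5", "203.0.113.9", "tcp", 120, "example.com"),
    Packet(0.2, "203.0.113.9", "10.0.0.5", "tcp", 1500),
    Packet(0.4, "10.0.0.5", "203.0.113.9", "tcp", 80),
]
[meta] = summarize(packets, client_ips={"10.0.0.5"})
print(meta.bytes_sent, meta.bytes_received, meta.packets, meta.domain)
# 200 1500 3 example.com
```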

Example of enhanced metadata: Beaconing behavior

• Beaconing is a common sign of a command-and-control channel
• Whether a host is beaconing must be inferred from the host's behavior
• Applying machine learning to this raw Vectra metadata lets us identify beaconing behavior
• The HTTP/S tunnel model was developed using this data to help identify command-and-control channels
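One simple way to infer beaconing from metadata is to measure how regular a host's connection intervals are. The sketch below uses the coefficient of variation (standard deviation divided by mean) of the gaps between connections; the function names, the minimum event count, and the 0.1 threshold are hypothetical, and this is a generic heuristic, not Vectra's HTTP/S tunnel model.

```python
from statistics import mean, pstdev


def beacon_cv(timestamps, min_events=6):
    """Coefficient of variation (stdev / mean) of the gaps between connections.

    Near-constant gaps give a CV close to 0. Returns None when there are too
    few events (hypothetical minimum) to judge.
    """
    ts = sorted(timestamps)
    if len(ts) < min_events:
        return None
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    avg = mean(gaps)
    if avg == 0:
        return None
    return pstdev(gaps) / avg


def looks_like_beacon(timestamps, max_cv=0.1):
    """Flag a host/destination pair whose connection gaps are very regular."""
    cv = beacon_cv(timestamps)
    return cv is not None and cv < max_cv


# Hypothetical connection times (seconds) from one host to one destination.
beacon = [0, 60, 121, 180, 241, 300, 360]   # about every 60 s, small jitter
browsing = [0, 5, 7, 90, 400, 402, 1300]    # bursty, human-driven
print(looks_like_beacon(beacon), looks_like_beacon(browsing))  # True False
```

Malware often adds random jitter to its check-in interval precisely to defeat a fixed rule like this, which is one reason to infer beaconing with a learned model rather than a single threshold.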

Remote Access Trojans (aka external remote access)

• The attacker wants to establish manual control over an asset inside the network
• Firewalls block most inbound connection attempts
• So the compromised internal asset calls out to a “meeting point”, and the attacker takes over

Examples: Blackshades, Poison Ivy, NOPEN (Shadow Brokers), WebEx, TeamViewer

Network Signatures

• Flag known RATs based on known patterns
• Network indicators: URLs, user agents, payloads, domains, IP addresses, etc.

Example rule from trojan.rules:

alert tcp $HOME_NET any -> $EXTERNAL_NET any (msg:"ET TROJAN DarkComet-RAT server join acknowledgement"; flow:to_server,established; dsize:12; content:"|393441354144304145463639|"; flowbits:isset,ET.DarkCometJoin; reference:url,www.darkcometrat.com; reference:url,anubis.iseclab.org/?action=result&task_id=1a7326f61fef1ecb4ed4fbf3de3f3b8cb&format=txt; classtype:trojan-activity; sid:2013284; rev:3; metadata:created_at 2011_07_18, updated_at 2011_07_18;)

• Great for known threats
• Easily bypassed with changes to the malware
• Lag behind new changes in malware
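The rule's payload checks are easy to mimic, which also shows why signatures are brittle. This simplified sketch is not a Snort or Suricata engine: established-flow tracking is not modeled, and `matches_rule` is a hypothetical helper. It decodes the rule's hex `content` and enforces `dsize:12`; changing one byte, or adding one, is enough to evade it.

```python
# The rule's content:"|393441354144304145463639|" is hex for 12 ASCII bytes.
SIGNATURE = bytes.fromhex("393441354144304145463639")


def matches_rule(payload: bytes, to_server: bool, join_flag_set: bool) -> bool:
    """Simplified version of the rule's checks (hypothetical helper, not an IDS engine).

    to_server      ~ flow:to_server (established-session tracking not modeled)
    join_flag_set  ~ flowbits:isset,ET.DarkCometJoin (set by an earlier rule)
    len == 12      ~ dsize:12
    SIGNATURE in   ~ content:"|...|"
    """
    return to_server and join_flag_set and len(payload) == 12 and SIGNATURE in payload


print(SIGNATURE)  # b'94A5AD0AEF69'
original = b"94A5AD0AEF69"
tweaked = b"94A5AD0AEF6A"   # one byte changed in a modified build
padded = b" " + original    # same bytes, one byte longer
print(matches_rule(original, True, True))  # True
print(matches_rule(tweaked, True, True))   # False
print(matches_rule(padded, True, True))    # False
```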

Anomaly detection

Unsupervised: assume a RAT will show up as
• An uncommonly used port
• An uncommon destination
• An uncommon hour

But everything is “uncommon”:
• New ports every day
• New domains every day
• Time is not a great signal

It will likely alert you to the event, but how do you find the true event in this haystack?
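A naive rarity detector makes the haystack problem visible. In this sketch the `Event` fields, the `min_seen=2` threshold, and the traffic are all hypothetical. Every benign novelty of the day (a new SaaS domain, a changed port, a late-night login) gets flagged alongside the real RAT.

```python
from typing import NamedTuple


class Event(NamedTuple):
    """Hypothetical outbound connection summary."""
    port: int
    dest: str
    hour: int


def uncommon_fields(history, event, min_seen=2):
    """Names of the event's fields whose value appeared fewer than min_seen times in history."""
    return [
        name
        for name in Event._fields
        if sum(1 for h in history if getattr(h, name) == getattr(event, name)) < min_seen
    ]


history = [
    Event(443, "mail.example.com", 9), Event(443, "mail.example.com", 10),
    Event(443, "crm.example.com", 9), Event(443, "crm.example.com", 14),
    Event(80, "news.example.org", 10), Event(80, "news.example.org", 15),
]
today = [
    Event(443, "new-saas.example.net", 9),  # benign: newly adopted service
    Event(8443, "crm.example.com", 10),     # benign: vendor changed ports
    Event(443, "mail.example.com", 22),     # benign: someone working late
    Event(4444, "rat-c2.example", 3),       # the actual RAT
]
alerts = [e for e in today if uncommon_fields(history, e)]
print(f"{len(alerts)} of {len(today)} events flagged")  # 4 of 4 events flagged
```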

Data is king – How Vectra sees RATs

A RAT is not static
• All behavior happens in time
• Commands are issued
• Information is received

(Figure: incremental flow between a RAT server and client host, showing bytes sent and received over time.)
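Treating a session as a time series starts with bucketing its traffic into fixed intervals. A minimal sketch, assuming events arrive as `(timestamp, direction, nbytes)` tuples with direction measured from the internal host; both the event format and the `to_time_series` helper are hypothetical.

```python
import math


def to_time_series(events, interval, duration):
    """Bucket (timestamp, direction, nbytes) events into per-interval byte counts.

    direction is "send" (internal host -> remote) or "recv" (remote -> internal host).
    Returns one (bytes_sent, bytes_received) pair per interval.
    """
    n_buckets = math.ceil(duration / interval)
    sent = [0] * n_buckets
    received = [0] * n_buckets
    for ts, direction, nbytes in events:
        i = int(ts // interval)
        if not 0 <= i < n_buckets:
            continue  # outside the observation window
        if direction == "send":
            sent[i] += nbytes
        elif direction == "recv":
            received[i] += nbytes
        else:
            raise ValueError(f"unknown direction: {direction!r}")
    return list(zip(sent, received))


# Hypothetical RAT session: small commands arrive, larger responses go out.
events = [
    (0.5, "recv", 40), (0.9, "send", 2000),
    (3.2, "recv", 35), (3.4, "send", 15000), (3.6, "send", 12000),
]
print(to_time_series(events, interval=1.0, duration=5.0))
# [(2000, 40), (0, 0), (0, 0), (27000, 35), (0, 0)]
```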

Machine learning first pass – Random forest

A random forest is a collection of decision trees
• A single perfect decision tree is unlikely to exist, so:
   • Randomly look at features
   • Randomly look at data
   • Build several models
• Each model votes
   • Not every model needs to be right
   • The more models that vote the same way, the more confidence in the decision
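The voting idea fits in a short pure-Python sketch. To keep it brief, each "tree" is a decision stump (a depth-1 tree), a simplification of a real random forest. Each stump is trained on a bootstrap sample (randomly look at data) using one randomly chosen feature (randomly look at features), and the forest returns the majority label plus the fraction of trees that agreed. The features and data are hypothetical.

```python
import random
from collections import Counter


def stump_predict(stump, row):
    """A depth-1 tree: compare one feature to a threshold."""
    feature, threshold, positive_above = stump
    return int((row[feature] > threshold) == positive_above)


def fit_stump(rows, labels, features):
    """Pick the (feature, threshold, direction) with the fewest training errors."""
    best, best_errors = None, None
    for f in features:
        values = sorted({r[f] for r in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])] or values
        for t in thresholds:
            for positive_above in (True, False):
                stump = (f, t, positive_above)
                errors = sum(stump_predict(stump, r) != y for r, y in zip(rows, labels))
                if best_errors is None or errors < best_errors:
                    best, best_errors = stump, errors
    return best


def fit_forest(rows, labels, n_trees=15, max_features=1, seed=0):
    """Each tree sees a bootstrap sample of rows and a random subset of features."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]       # randomly look at data
        feats = rng.sample(range(n_features), max_features)  # randomly look at features
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Majority vote, plus the share of trees that agreed (vote confidence)."""
    votes = Counter(stump_predict(s, row) for s in forest)
    label, count = votes.most_common(1)[0]
    return label, count / len(forest)


# Hypothetical session features: (client/server byte ratio, session minutes).
benign = [(0.2 + 0.05 * i, 2.0 + i) for i in range(10)]
rat = [(3.0 + 0.2 * i, 60.0 + 5 * i) for i in range(10)]
forest = fit_forest(benign + rat, [0] * 10 + [1] * 10)
print(predict(forest, (0.3, 4.0)))    # majority label 0 (benign)
print(predict(forest, (4.0, 90.0)))   # majority label 1 (RAT)
```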

Random forest for RATs

Featurize the time-series window: 20+ features
• Data and packet client/server ratios
• Consistency of the client/server data
• Frequency with which the server breaks silence
• Total session length
• Entropy of the session
• etc.

Observe multiple windows and trigger on convergence

Value the model provided
• Alerted on a large percentage of known RATs, but not all
• Did not trigger on all known RAT behaviors

Issues
• Did not properly represent the temporal nature of the data: one sequence impacts the next
• Human-driven features missed behaviors
• You can guess and test, but you can never be sure
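A few of the listed features can be sketched as follows. The slide names the features but not their definitions, so every formula here, including treating the internal host as the client and the external endpoint as the server, is an assumption; a real feature set would have 20+ entries.

```python
import math
from statistics import mean, pstdev


def shannon_entropy(counts):
    """Entropy in bits of how the total volume is spread across intervals."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def featurize(window):
    """window: list of (client_bytes, server_bytes, client_pkts, server_pkts) per interval.

    Client = internal host, server = external endpoint (an assumption). Every
    feature definition below is illustrative, not the production feature set.
    """
    client_bytes = [w[0] for w in window]
    server_bytes = [w[1] for w in window]
    client_pkts = [w[2] for w in window]
    server_pkts = [w[3] for w in window]
    active_server = [b for b in server_bytes if b > 0]
    # Server "breaks silence": it sends right after an interval with no traffic at all.
    silence_breaks = sum(
        1 for prev, cur in zip(window, window[1:])
        if prev[0] == 0 and prev[1] == 0 and cur[1] > 0
    )
    return {
        "data_ratio": sum(client_bytes) / max(sum(server_bytes), 1),
        "packet_ratio": sum(client_pkts) / max(sum(server_pkts), 1),
        "server_consistency": (pstdev(active_server) / mean(active_server)
                               if active_server else 0.0),
        "server_breaks_silence": silence_breaks / max(len(window) - 1, 1),
        "session_length": len(window),
        "entropy": shannon_entropy([c + s for c, s in zip(client_bytes, server_bytes)]),
    }


# Hypothetical six-interval window from a RAT-like session.
window = [(0, 40, 0, 1), (2000, 0, 3, 0), (0, 0, 0, 0),
          (0, 35, 0, 1), (27000, 0, 20, 0), (0, 0, 0, 0)]
for name, value in featurize(window).items():
    print(f"{name}: {value:.3f}")
```

Each feature summarizes the whole window, so the order of events inside it is largely lost. That is the "temporal nature" issue listed above.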

Deep learning

Digit labels: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9

Phonemes: dh, aw, s, ax, n, d, …

Mouse movements: (right, left, up, down)

Deep learning: Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM)

RNN

• Similar to a feedforward neural network

• Recurrent connections provide memory
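The "recurrent connections provide memory" point can be shown with a scalar Elman-style RNN and hand-picked, untrained weights (all hypothetical). Feeding the same steps in a different order changes the RNN's final state, whereas a feedforward model that scores each step independently and sums the results cannot tell the two orders apart.

```python
import math


def rnn_final_state(sequence, w_x=(0.8, -0.5), w_h=0.9, bias=0.0):
    """Scalar Elman RNN: h_t = tanh(w_x . x_t + w_h * h_{t-1} + bias).

    The recurrent weight w_h feeds the previous state back in, so the final
    state depends on the whole history, including its order.
    """
    h = 0.0
    for x in sequence:
        h = math.tanh(sum(w * xi for w, xi in zip(w_x, x)) + w_h * h + bias)
    return h


def feedforward_sum(sequence, w_x=(0.8, -0.5), bias=0.0):
    """No recurrence: each step is scored on its own, so order is lost."""
    return sum(math.tanh(sum(w * xi for w, xi in zip(w_x, x)) + bias) for x in sequence)


# Hypothetical normalized (bytes_sent, bytes_received) steps, in two orders.
seq = [(1.0, 0.0), (0.0, 1.0), (0.0, 0.0)]
rev = list(reversed(seq))
print(rnn_final_state(seq), rnn_final_state(rev))  # different final states
print(feedforward_sum(seq), feedforward_sum(rev))  # same total
```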

Deep learning: Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM)

LSTM

• Similar to RNNs

• Replace simple neurons with LSTM blocks

• Mitigates the “vanishing gradient” problem

• Capable of learning long-range temporal dependencies
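For reference, one widely used LSTM formulation (the variant with a forget gate) is shown below, where σ is the logistic sigmoid and ⊙ is element-wise multiplication. Along the cell-state path, ignoring how the gates themselves depend on the previous hidden state, the derivative of c_t with respect to c_(t-1) is diag(f_t). When the forget gate stays near 1, gradients flow back through time with little decay. This additive path is what eases the vanishing-gradient problem.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```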

Deep learning: Model training strategy

Model training
• Framework: TensorFlow
• Model: RNN (LSTM cell)
• Trained on AWS with NVIDIA V100 GPUs

(Figure: time series of bytes sent and received fed to the model, which outputs RAT / not RAT.)

Deep learning: Learning representations

• Map the input time series to an embedded representation
• Classify the embedding as RAT / not RAT
• Observe convergence in the classification
• Report the behavior

(Figure: time series of bytes sent and received over time.)
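The "observe convergence" step can be sketched as k-of-n voting over consecutive windows: report only once at least k of the last n windows were classified as RAT. The classifier that produces the per-window scores is not shown, and the threshold, k, and n values are hypothetical.

```python
from collections import deque


def converge_and_report(window_scores, threshold=0.5, k=3, n=5):
    """Return the index of the window at which k of the last n windows were
    classified as RAT (score >= threshold), or None if that never happens.

    window_scores: per-window RAT probabilities from some classifier (not shown).
    """
    recent = deque(maxlen=n)
    for i, score in enumerate(window_scores):
        recent.append(score >= threshold)
        if sum(recent) >= k:
            return i
    return None


# Hypothetical per-window scores for two sessions.
noisy = [0.9, 0.1, 0.2, 0.7, 0.1, 0.3, 0.2]   # isolated spikes, never converges
rat = [0.2, 0.6, 0.8, 0.4, 0.9, 0.95, 0.9]    # sustained RAT-like behavior
print(converge_and_report(noisy), converge_and_report(rat))  # None 4
```

Requiring agreement across windows trades a little detection latency for fewer one-off false alarms.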

Know your model

In security, problems are varied and complex; data are sometimes unavailable and sometimes imbalanced

Many approaches are available, but not all will perform equally well

No free lunch! Understand the problem and choose the right model
• Supervised or unsupervised?
• Classification or regression?
• Temporal factors are crucial

Data science is not just about math. Detecting attackers requires combining deep knowledge of machine learning with deep knowledge of security
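Imbalance is why accuracy alone misleads in security. This sketch uses hypothetical counts: 5 RAT sessions among 10,000. A model that never alerts scores 99.95% accuracy with zero recall, while a useful detector scores lower accuracy but catches 4 of the 5 RATs.

```python
def metrics(y_true, y_pred):
    """Accuracy, precision, and recall for boolean labels (True = RAT)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = len(y_true) - tp - fp - fn
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical test set: 5 RAT sessions among 10,000.
y_true = [True] * 5 + [False] * 9995

# A "detector" that never alerts.
always_benign = [False] * 10_000
print(metrics(y_true, always_benign))  # (0.9995, 0.0, 0.0)

# A detector that catches 4 of the 5 RATs at the cost of 20 false positives.
detector = [True] * 4 + [False] + [True] * 20 + [False] * 9975
acc, prec, rec = metrics(y_true, detector)
print(f"accuracy={acc:.4f} precision={prec:.3f} recall={rec:.1f}")
# accuracy=0.9979 precision=0.167 recall=0.8
```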

Data science – first as an art, then apply the science

Detection lifecycle

• Collect advanced attack samples
• Come up with advanced attacks
• Abstract the behavior and form a theory
• Collect positive and negative samples
• Extract features out of the samples
• Work the theory on offline data
• Refine into a detection model
• Deploy and test on live data
• Review results
• Design UI
• Develop UI
• Put detection into production
• Check efficacy; improve where necessary
• Improve and redeploy

(Figure: roles shown across the lifecycle: security researchers; security researchers + data scientists; product designer; developers.)

Model Development Philosophy – Research to Production

1. Report an advanced attack behavior
   • Methodology and data sources are irrelevant

2. Provide the relevant context to investigate
   • Necessary information for rapid validation

3. Be improvable over time
   • Trackable efficacy

4. Have minimal noise and high coverage
   • Meet initial recall and precision requirements