predicting organization attacks via mining crowdsourcing data · 2018-09-12 · predicting attacks...

Predicting Organization Attacks via Mining Crowdsourcing Data

Neil Gong and Ratnesh Kumar

Iowa State University

Attack Prediction: Introduction, Motivation & Goal

• Vulnerability: a specific software bug
  – Listed in public vulnerability databases, e.g., CVE

• Exploit: code that leverages a vulnerability
  – Not every vulnerability is exploitable

• Motivation:
  – Accurate and early prediction enables preventive actions before attacks

• Goal:
  – Predict whether a vulnerability is exploitable
  – Predict the exploit time (day, week, month) of an exploitable vulnerability

[Diagram: predicting attacks = predicting exploitable vulnerabilities and their expected exploit time]

Existing Works

• Heuristics-based approaches
  – FIRST's Common Vulnerability Scoring System
  – Microsoft's Exploitability Index
  – Adobe's priority ratings

• Machine-learning-based approaches
  – Leverage either public vulnerability databases OR social-media data streams, but not BOTH
  – Rely on conventional machine learning classifiers, e.g., SVM

• Limitations:
  – Inaccurate (many false positives and false negatives)
  – Vulnerable to fake social-media data

Our Proposed Work

• Leverage both public vulnerability databases AND social-media data

• Detect and filter fake social-media data

• Use a deep-learning-based classifier (rather than a Support Vector Machine) for higher accuracy

Our Framework – Learning Phase

[Diagram: learning phase. Inputs: vulnerabilities from CVE; vulnerability-related tweets from Twitter, passed through the fake-data filter; groundtruth from multiple sources. Stages: feature extractor, then deep learning engine. Outputs: a classifier for exploitability prediction and a classifier for exploit-time prediction.]

Features from CVE

• Bag-of-words features from the text of a vulnerability

An example vulnerability in CVE
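As an illustration of bag-of-words features, the following minimal sketch turns vulnerability descriptions into count vectors. The tokenizer, the vocabulary construction, and the two descriptions are assumptions made for the example; they are not real CVE entries and not necessarily the feature extractor used in this work.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(descriptions):
    """Map every distinct token in the corpus to a column index."""
    tokens = sorted({tok for d in descriptions for tok in tokenize(d)})
    return {tok: i for i, tok in enumerate(tokens)}


def bag_of_words(text, vocabulary):
    """Return a count vector; tokens outside the vocabulary are ignored."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = n
    return vector


# Made-up descriptions for illustration only (not real CVE entries).
descriptions = [
    "Buffer overflow in the parser allows remote code execution",
    "Information disclosure in the browser allows remote attackers to read files",
]
vocabulary = build_vocabulary(descriptions)
vectors = [bag_of_words(d, vocabulary) for d in descriptions]
```

Each description becomes one row with one column per vocabulary word, so vulnerabilities with similar wording end up with similar vectors.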

Features from Twitter

• Tweets about a vulnerability with ID CVE-2016-3298

• Features:
  – Bag-of-words features from tweets
  – Number of users who tweet about the vulnerability
  – Number of retweets
  – …

• Each vulnerability is represented as a vector
  – Followed by feature selection/dimension reduction
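The sketch below shows one way the listed Twitter features could be assembled into a single vector per vulnerability. The tweet dictionary keys (`user`, `text`, `retweets`), the three-word vocabulary, and the sample tweets are hypothetical. The selection step is a deliberately trivial stand-in (dropping columns that never vary), not the information-gain or PCA methods named in the literature list.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tweet_features(tweets, vocabulary):
    """Build one feature vector for a vulnerability from its tweets.

    Layout: [bag-of-words counts..., distinct users, total retweets].
    """
    counts = Counter(tok for t in tweets for tok in tokenize(t["text"]))
    bow = [counts.get(tok, 0) for tok in vocabulary]
    n_users = len({t["user"] for t in tweets})
    n_retweets = sum(t["retweets"] for t in tweets)
    return bow + [n_users, n_retweets]


def drop_constant_columns(vectors):
    """Stand-in feature selection: keep only columns that vary across rows."""
    keep = [j for j in range(len(vectors[0]))
            if len({v[j] for v in vectors}) > 1]
    return [[v[j] for j in keep] for v in vectors], keep


# Hypothetical vocabulary and tweets, invented for this example.
vocabulary = ["exploit", "patch", "poc"]
tweets_a = [
    {"user": "alice", "text": "PoC exploit published", "retweets": 12},
    {"user": "bob", "text": "Exploit seen in the wild", "retweets": 3},
    {"user": "alice", "text": "Patch now", "retweets": 0},
]
tweets_b = [{"user": "carol", "text": "Patch released", "retweets": 1}]

vec_a = tweet_features(tweets_a, vocabulary)
vec_b = tweet_features(tweets_b, vocabulary)
reduced, kept_columns = drop_constant_columns([vec_a, vec_b])
```

Counting distinct users rather than tweets keeps one very active account from inflating the popularity signal.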

Social Graph based Fake User Detection

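[Figure: social graph with a normal region and a fake region joined by sparse connections. A few users are known normal or known fake; the remaining users (marked "?") are unlabeled.]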

• Detect and filter fake users using graph analytics

– Key observation: normal users are unlikely to connect to fake users
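The key observation can be illustrated with a simple score-propagation sketch: a few labeled users anchor the scores, and every unlabeled user repeatedly takes the average score of its neighbours. This is a simplified stand-in chosen for brevity, not the SybilBelief algorithm cited in the literature list; the graph, user names, and iteration count are assumptions.

```python
def propagate_scores(edges, known_normal, known_fake, iterations=50):
    """Estimate a 'normal' score in [0, 1] for every user in the graph.

    Known normal users are clamped to 1.0 and known fake users to 0.0.
    Every other user starts at 0.5 and repeatedly takes the average
    score of its neighbours.
    """
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    scores = {u: 0.5 for u in neighbours}
    for u in known_normal:
        scores[u] = 1.0
    for u in known_fake:
        scores[u] = 0.0
    labelled = set(known_normal) | set(known_fake)

    for _ in range(iterations):
        updated = dict(scores)
        for u, nbrs in neighbours.items():
            if u not in labelled:
                updated[u] = sum(scores[n] for n in nbrs) / len(nbrs)
        scores = updated
    return scores


# Toy graph: two dense groups joined by a single (sparse) edge n3-f3.
edges = [
    ("n1", "n2"), ("n1", "n3"), ("n2", "n3"),  # normal region
    ("f1", "f2"), ("f1", "f3"), ("f2", "f3"),  # fake region
    ("n3", "f3"),                              # sparse connection
]
scores = propagate_scores(edges, known_normal={"n1"}, known_fake={"f1"})
suspected_fake = {u for u, s in scores.items() if s < 0.5}
```

Because few edges cross between the two regions, scores stay high inside the normal group and low inside the fake group, so a threshold separates them in this toy case.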

Groundtruth from Symantec

• Historical info about whether a vulnerability is exploited and when
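A minimal sketch of how such historical records could be turned into training labels for the two classifiers follows. The function name, the record layout, and the bucket thresholds (1, 7, and 30 days) are assumptions chosen to mirror the day/week/month granularity stated in the goal; the dates are invented.

```python
from datetime import date


def exploit_labels(disclosed, exploited):
    """Return (exploitable, time_bucket) for one vulnerability.

    `exploited` is None when no exploit is recorded. An exploit seen
    on or before the disclosure date falls into the "day" bucket.
    """
    if exploited is None:
        return 0, None
    delay = (exploited - disclosed).days
    if delay <= 1:
        bucket = "day"
    elif delay <= 7:
        bucket = "week"
    elif delay <= 30:
        bucket = "month"
    else:
        bucket = "later"
    return 1, bucket


# Invented dates for illustration only.
never_exploited = exploit_labels(date(2016, 10, 1), None)
within_week = exploit_labels(date(2016, 10, 1), date(2016, 10, 4))
within_month = exploit_labels(date(2016, 10, 1), date(2016, 10, 20))
```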

Deep Learning Classifier for Exploitability and Exploit time Prediction

• Deep learning has outperformed conventional classifiers on many machine learning tasks

• The input layer receives the feature vector for each vulnerability

• Network weights are adjusted so that the outputs match the corresponding groundtruth
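To make the weight-adjustment idea concrete, here is a deliberately small sketch: a network with one hidden layer, trained by stochastic gradient descent so that its output moves toward the groundtruth label. The architecture, learning rate, and toy data are assumptions; a real system would more likely use a framework such as TensorFlow and a deeper network.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyNet:
    """One hidden tanh layer followed by a sigmoid output unit."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        output = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
        return hidden, output

    def predict(self, x):
        return self.forward(x)[1]

    def train_step(self, x, y, lr=0.3):
        """One gradient step on the cross-entropy loss for one example."""
        hidden, output = self.forward(x)
        delta_out = output - y  # loss gradient at the output pre-activation
        for j, h in enumerate(hidden):
            # Back-propagate through the (not yet updated) output weight.
            delta_h = delta_out * self.w2[j] * (1.0 - h * h)
            self.w2[j] -= lr * delta_out * h
            self.b1[j] -= lr * delta_h
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * delta_h * xi
        self.b2 -= lr * delta_out


def mean_loss(net, data):
    """Average cross-entropy between predictions and groundtruth labels."""
    eps = 1e-12
    total = 0.0
    for x, y in data:
        p = net.predict(x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)


# Toy feature vectors with groundtruth labels (1 = exploited), invented here.
data = [([1.0, 0.0], 1), ([0.9, 0.2], 1), ([0.1, 0.8], 0), ([0.0, 1.0], 0)]
net = TinyNet(n_inputs=2, n_hidden=3)
loss_before = mean_loss(net, data)
for _ in range(300):
    for x, y in data:
        net.train_step(x, y)
loss_after = mean_loss(net, data)
```

The same structure covers exploit-time prediction if the single sigmoid output is replaced by one output per time bucket.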

Our Framework – Prediction Phase

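[Diagram: prediction phase. Inputs: one vulnerability from CVE and tweets about the vulnerability, passed through the Twitter fake-data filter. Stages: feature extractor, then the classifier for exploitability prediction and the classifier for exploit-time prediction. Outputs: Exploitable? When?]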
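The prediction phase can be sketched as a short pipeline. Every callable passed in is a hypothetical placeholder for a trained component, and the 0.5 threshold and the choice to query the exploit-time classifier only for vulnerabilities predicted exploitable are assumptions, not details confirmed by the slides.

```python
def predict_vulnerability(cve_text, tweets, fake_users,
                          extract_features, exploitability_model,
                          exploit_time_model, threshold=0.5):
    """Run the prediction phase for one vulnerability.

    Hypothetical placeholder callables:
      extract_features(cve_text, tweets) -> feature vector
      exploitability_model(features)     -> probability in [0, 1]
      exploit_time_model(features)       -> "day", "week", or "month"
    """
    # Fake-data filter: drop tweets posted by users flagged as fake.
    trusted = [t for t in tweets if t["user"] not in fake_users]
    features = extract_features(cve_text, trusted)
    probability = exploitability_model(features)
    if probability < threshold:
        return {"exploitable": False, "probability": probability, "when": None}
    return {"exploitable": True, "probability": probability,
            "when": exploit_time_model(features)}


# Stub components and invented inputs, for illustration only.
result = predict_vulnerability(
    "made-up vulnerability description",
    [{"user": "alice", "text": "exploit available"},
     {"user": "bot42", "text": "exploit exploit exploit"}],
    fake_users={"bot42"},
    extract_features=lambda text, tweets: [len(tweets)],
    exploitability_model=lambda f: 0.9 if f[0] >= 1 else 0.1,
    exploit_time_model=lambda f: "week",
)
```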

Summary: Major Tasks and Schedule

• Task 1 (0-3 months): Collect vulnerabilities from CVE, related tweets from Twitter, and groundtruth from Symantec

• Task 2 (3-6 months): Design and evaluate a method to detect fake users on Twitter

• Task 3 (6-9 months): Design predictive features and train deep learning classifiers

• Task 4 (9-12 months): Evaluate and refine the fake-user filter and the classifiers

Selected Literature

Feature Selection

• Guyon, Isabelle, and André Elisseeff. "An introduction to variable and feature selection." Journal of Machine Learning Research 3.Mar (2003): 1157-1182.

• Information gain for feature selection. https://en.wikipedia.org/wiki/Information_gain_in_decision_trees

• Michalski, Ryszard S., Jaime G. Carbonell, and Tom M. Mitchell, eds. Machine Learning: An Artificial Intelligence Approach. Springer Science & Business Media, 2013.

• PCA. https://en.wikipedia.org/wiki/Principal_component_analysis

Learning Deep Neural Networks

• Hinton, Geoffrey E., Simon Osindero, and Yee-Whye Teh. "A fast learning algorithm for deep belief nets." Neural Computation 18.7 (2006): 1527-1554.

• Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." Advances in Neural Information Processing Systems. 2012.

• TensorFlow. https://www.tensorflow.org/

Fake-User Detection

• Neil Zhenqiang Gong, Mario Frank, and Prateek Mittal. "SybilBelief: A Semi-supervised Learning Approach for Structure-based Sybil Detection." IEEE Transactions on Information Forensics and Security (TIFS), 9(6), 2014.
