Data-driven Cyber Security to Counterfeit Malicious Attacks

Yang Xiang

Swinburne University of Technology, Australia

yxiang@swin.edu.au

Cybersecurity Lab Core Capabilities

• FinTech and blockchain
• Risk and decision making
• Trustworthiness
• Data privacy
• Spam detection

Application security

• Security analytics
• Threat prediction
• Machine learning for cyber
• Social networks security
• Insider attacks detection

Data security

• Network, SDN, NFV security
• Cloud security
• CPS/IoT security
• Ransomware/Malware
• Autonomous security

System security

Real-world Data, Security, Modelling, Reasoning

Research Methodology

Data-driven Cyber Security

Cyber threat analysis

Model security problem

Data collection

Machine learning customization

Examples

Data-driven Cyber Security

Software vulnerability detection

ML-based malware detection

Twitter spam detection

Network traffic classification

Software vulnerability detection

500,000 Servers Affected

Millions of Servers Attacked

150 Countries Affected

$4 Billion

7~8% CPU Loss

Intel SGX

$xxx Loss

Challenge 1: Software Complexity

45 million

61 million

70 million

100+ million lines

Challenge 2: Vulnerability Numbers

14,714

20,000+

54+/day

Challenge 3: Lack of Data

Efficiency

Effectiveness

Security consideration not prioritised

Insufficient resources

Lack of labelled

Lack of datasets

Labour-intensive feature engineering

Insufficient security knowledge

Scalability

Observations

• Abstract Syntax Trees (ASTs) are an effective code representation.

• Software source code shares similar statistical properties with natural language.

• Vulnerabilities from different projects share common knowledge, which is discoverable by deep learning algorithms.
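
To make the first observation concrete, here is a minimal sketch of flattening a piece of source code into a sequence of AST node types, the kind of token stream a sequence model could consume. It uses Python's standard `ast` module on a Python snippet purely for illustration. The function name `ast_node_sequence` and the sample snippet are assumptions, not taken from the slides, and the talk's own work may serialise ASTs differently.

```python
import ast
from typing import List


def ast_node_sequence(source: str) -> List[str]:
    """Return the AST node type names of `source` in depth-first pre-order."""
    tree = ast.parse(source)
    sequence: List[str] = []

    def visit(node: ast.AST) -> None:
        # Record this node's type, then descend into its children in field order.
        sequence.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(tree)
    return sequence


# Hypothetical sample function, used only to show the shape of the output.
sample = "def copy(buf, n):\n    return buf[:n]\n"
tokens = ast_node_sequence(sample)
print(tokens)
```

Because the sequence keeps only node types, two functions with different identifiers but the same structure map to the same token stream, which is one reason such representations can transfer across projects.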

Representation learning

The input → Low-level features → Mid-level features → High-level features

Latent, abstract features describing programming patterns/characteristics

Methodology

Network Architecture

Feature Engineering

ML Algorithms

Evaluations

Taxonomy – Our Work

Source code

Binary / Assembly

Pattern-based

Text-based

Code Properties

Trees – Abstract Syntax Tree (AST)

Graphs

Function Call Graphs

Data Flow Graphs

Control Flow Graphs

Dependency Graphs

Program Slice

Code Gadgets

Imports/API calls

Rules / Templates

Bag-of-words

Word2Vec / FastText / Code2Vec…

Code metrics

Logistic Regression

Random Forest

Markov model…

Conventional

Deep belief network

Deep learning

Others

Genetic Algorithm

Accuracy

Efficiency

Detection Granularity

Precision

Recall

F-measure

Detection Performance

Top-k precision/recall
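
The detection-performance measures named above can be computed as in the following sketch. It assumes binary labels (1 = vulnerable, 0 = not vulnerable) and a per-function score where higher means more suspicious. The function names, the toy labels and scores, and the 0.5 threshold are illustrative assumptions, not values from the talk.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F-measure (F1) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def top_k_precision_recall(y_true, scores, k):
    """Precision and recall over the k highest-scored items (k must be > 0)."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    hits = sum(y_true[i] for i in ranked)
    total_positives = sum(y_true)
    precision_at_k = hits / k
    recall_at_k = hits / total_positives if total_positives else 0.0
    return precision_at_k, recall_at_k


# Toy data: six functions, three of them truly vulnerable.
y_true = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.2, 0.1, 0.4]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold

print(precision_recall_f1(y_true, y_pred))
print(top_k_precision_recall(y_true, scores, k=2))
```

Top-k measures suit vulnerability triage: when analysts can inspect only the k highest-ranked functions, the useful figure is how many of those k are truly vulnerable.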

The Datasets

457 vulnerable functions

32,531 non-vulnerable functions

6 open-source projects

1,000+ releases

NVD and CVE repositories
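
A quick arithmetic check on the function counts above shows how imbalanced the dataset is. The inverse-frequency class weights at the end are one common way to compensate for such imbalance. The slides do not say whether the original work used them, so treat that part as an assumption.

```python
# Function counts taken from the dataset slide.
vulnerable = 457
non_vulnerable = 32_531

total = vulnerable + non_vulnerable
vulnerable_share = vulnerable / total          # fraction of positive samples
imbalance_ratio = non_vulnerable / vulnerable  # negatives per positive

# Assumed remedy: "balanced" weights, total / (number_of_classes * class_count).
class_weights = {
    "vulnerable": total / (2 * vulnerable),
    "non_vulnerable": total / (2 * non_vulnerable),
}

print(f"total functions: {total}")
print(f"vulnerable share: {vulnerable_share:.2%}")
print(f"non-vulnerable per vulnerable: {imbalance_ratio:.1f}")
print(class_weights)
```

With fewer than 2% of functions labelled vulnerable, plain accuracy is a misleading measure: a detector that flags nothing would still be right more than 98% of the time.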

Results

Binary Vulnerability Detection

Future Work

• Binary-level detection: focusing on scenarios where the source code is unavailable.

• Instruction-level granularity: identifying multiple instructions (reverse-engineering) that are potentially vulnerable.

• Specific-type vulnerability detection: focusing on vulnerabilities caused by missing checks (e.g. numeric errors).

Example 2 – ML-based malware detection

Example 3 – Twitter spam detection

Example 4 – Network traffic classification

Research Methodology

Collect data for security problem

Extract raw or low-level features

Apply data analysis

Security professionals, Domain knowledge, Model analytics

Data-driven Cyber Security
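
The three steps on this slide (collect data, extract low-level features, apply data analysis) can be sketched end to end. The example below is a toy illustration using only the standard library. The flow records, the two features and the nearest-centroid classifier are hypothetical stand-ins chosen for brevity, not the methods used in the papers listed under Resources.

```python
from statistics import mean

# Step 1: collect data (hypothetical labelled records: packet sizes per flow).
flows = [
    ([60, 62, 61, 60], "benign"),
    ([64, 60, 63], "benign"),
    ([1400, 1500, 1450, 1480], "suspicious"),
    ([1500, 1420, 1390], "suspicious"),
]


# Step 2: extract raw or low-level features from each record.
def extract_features(packet_sizes):
    """Return (mean packet size, spread between largest and smallest packet)."""
    return (mean(packet_sizes), max(packet_sizes) - min(packet_sizes))


# Step 3: apply data analysis (here, a nearest-centroid classifier).
def fit_centroids(samples):
    """Average the feature vectors of each label into one centroid per label."""
    grouped = {}
    for sizes, label in samples:
        grouped.setdefault(label, []).append(extract_features(sizes))
    return {
        label: tuple(mean(column) for column in zip(*features))
        for label, features in grouped.items()
    }


def classify(packet_sizes, centroids):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    x = extract_features(packet_sizes)

    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(x, centroid))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


centroids = fit_centroids(flows)
print(classify([1300, 1350, 1490], centroids))
```

In practice the domain knowledge of security professionals enters mainly at step 2, in deciding which raw features are worth extracting.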

Resources

• G. Lin, J. Zhang, W. Luo, L. Pan, Y. Xiang, O. D. Vel, and P. Montague, “Cross-Project Transfer Representation Learning for Vulnerable Function Discovery,” IEEE Transactions on Industrial Informatics, vol. 14, no. 7, pp. 3289-3297, 2018.

• C. Chen, Y. Wang, J. Zhang, Y. Xiang, W. Zhou, and G. Min, “Statistical Features Based Real-time Detection of Drifted Twitter Spam,” IEEE Transactions on Information Forensics and Security, vol. 12, no. 4, pp. 914-925, 2017.

• J. Zhang, X. Chen, Y. Xiang, W. Zhou, and J. Wu, “Robust Network Traffic Classification,” IEEE/ACM Transactions on Networking, vol. 23, no. 4, pp. 1257-1270, 2015.

• S. Cesare, Y. Xiang, and W. Zhou, “Control Flow-based Malware Variant Detection,” IEEE Transactions on Dependable and Secure Computing, vol. 11, no. 4, pp. 307-317, 2014.

• S. Cesare, Y. Xiang, and W. Zhou, “Malwise - An Effective and Efficient Classification System for Packed and Polymorphic Malware,” IEEE Transactions on Computers, vol. 62, no. 6, pp. 1193-1206, 2013.

• J. Zhang, Y. Xiang, Y. Wang, W. Zhou, Y. Xiang, and Y. Guan, “Network Traffic Classification Using Correlation Information,” IEEE Transactions on Parallel and Distributed Systems, vol. 24, no. 1, pp. 104-117, 2013.

Sponsors & Collaborators
