structured models with neural...

20
Structured models with neural networks Tomáš Pevný February 28, 2020

Upload: others

Post on 04-Aug-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Structured models with neural networksai.ms.mff.cuni.cz/~sui/pevnak.pdfDetectionofinfectedusers—results I Trainingdatafrom5-30.102017 3:6109=4:45106 urls/users I Testingdatafrom3.112017

Structured models with neural networks

Tomáš Pevný

February 28, 2020

Page 2: Structured models with neural networksai.ms.mff.cuni.cz/~sui/pevnak.pdfDetectionofinfectedusers—results I Trainingdatafrom5-30.102017 3:6109=4:45106 urls/users I Testingdatafrom3.112017

Motivation — identification of infected computers

examples of messages:http://lop.guardpair.com/affs?addonname=[Enter%20Product%20Name]&affid=9050&subaffid=5774&subID=undefined&clientuid=undefined&origaffid=9050&origsubaffid=5774&href=http%3A%2F%2F7http://pf.updatewp.org/?v=3.16&pcrc=308403836&LSVRDT=&ty=CHECKhttp://rules.similardeals.net/v1.0/whitelist/1052/9050x5774/7online.subsea7.net?partnerName=Lyrics

Page 3: Structured models with neural networksai.ms.mff.cuni.cz/~sui/pevnak.pdfDetectionofinfectedusers—results I Trainingdatafrom5-30.102017 3:6109=4:45106 urls/users I Testingdatafrom3.112017

Motivation — representation of PE file format

Executable

CodeDataPE header

Call Graph Functions

Control Flow Graph Basic Blocks

StringsImports

Instructions

f1

f2

f3

f4

f7

f8

f5

f6

b1

b3b2

b4

b5b6

b7

b8

Page 4: Structured models with neural networksai.ms.mff.cuni.cz/~sui/pevnak.pdfDetectionofinfectedusers—results I Trainingdatafrom5-30.102017 3:6109=4:45106 urls/users I Testingdatafrom3.112017

Single-instance learning

x = (x1, . . . , xd) f(x) =

{+1

−1extractfeatures

send toclassifier

Icon made by Freepik from www.flaticon.com

Page 5: Structured models with neural networksai.ms.mff.cuni.cz/~sui/pevnak.pdfDetectionofinfectedusers—results I Trainingdatafrom5-30.102017 3:6109=4:45106 urls/users I Testingdatafrom3.112017

Multi-instance learning

x =

(x1,1, . . . , x1,d)(x1,1, . . . , x1,d)

...(xb,1, . . . , xb,d)

f(x) =

{+1

−1extractfeatures

send toclassifier

Page 6: Structured models with neural networksai.ms.mff.cuni.cz/~sui/pevnak.pdfDetectionofinfectedusers—results I Trainingdatafrom5-30.102017 3:6109=4:45106 urls/users I Testingdatafrom3.112017

Our solution of multi-instance learning

x1 ∈ Rd

x2 ∈ Rd

x3 ∈ Rd

xb ∈ Rd

...

h(x1, θh)

h(x2, θh)

h(x3, θh)

h(xb, θh)

x̃1 ∈ Ru

x̃2 ∈ Ru

x̃3 ∈ Ru

x̃b ∈ Ru

...

g({x̃i}bi=1, θg

)x̄ ∈ Ru f (x̄, θf )

g({x̃i}bi=1

)= 1

l

∑bi=1 x̃i

h(x, θh) : Rd → Ru

h(x, θh) = max{0, xTθh}f(x̄, θf ) : Ru → R2

f(x̄, θf ) = x̄Tθf

One vector per instance

One vector per sample

Page 7: Structured models with neural networksai.ms.mff.cuni.cz/~sui/pevnak.pdfDetectionofinfectedusers—results I Trainingdatafrom5-30.102017 3:6109=4:45106 urls/users I Testingdatafrom3.112017

Generalized multi-instance learning

x =

(x(1)1 , . . . , x

(1)d1

)

(x(2)1,1, . . . , x

(2)1,d2

)

(x(2)1,1, . . . , x

(2)1,d2

)...

(x(2)b,1 , . . . , x

(2)b,d2

)

(x(3)1,1, . . . , x

(3)1,d3

)

(x(3)1,1, . . . , x

(3)1,d3

)...

(x(3)c,1 , . . . , x

(3)c,d3

)

f(x) =

{+1

−1extractfeatures

send toclassifier

Page 8: Structured models with neural networksai.ms.mff.cuni.cz/~sui/pevnak.pdfDetectionofinfectedusers—results I Trainingdatafrom5-30.102017 3:6109=4:45106 urls/users I Testingdatafrom3.112017

Extension of universal approximation theorem

Let S be the class of spaces which1. contains all compact subsets of Rd, d ∈ N2. is closed under finite cartesian products3. for each X ∈ S we have P(X ) ∈ S1

Then for each X ∈ S, every continuous function on X can bearbitrarily well approximated by neural networks.

1Here we assume that P(X ) is endowed with some metric.

Page 9: Structured models with neural networksai.ms.mff.cuni.cz/~sui/pevnak.pdfDetectionofinfectedusers—results I Trainingdatafrom5-30.102017 3:6109=4:45106 urls/users I Testingdatafrom3.112017

Identification of infected computers

Page 10: Structured models with neural networksai.ms.mff.cuni.cz/~sui/pevnak.pdfDetectionofinfectedusers—results I Trainingdatafrom5-30.102017 3:6109=4:45106 urls/users I Testingdatafrom3.112017

Detection of infected users — model

d(1)1

. . . d(1)n

d(2)1

. . . d(2)n

p(1)1

. . . p(1)n

p(2)1

. . . p(2)n

k(1)1

. . . k(1)n

v(1)1

. . . v(1)n

k(2)1

. . . k(2)n

v(2)1

. . . v(2)n

�KVq(1)1

. . . q(1)n

�KVq(2)1

. . . q(2)n

�Dd1 . . . dn

�Pp1 . . . pn

�Qq1 . . . qn

�Uu(1)1

. . . u(1)3n

u(2)1

. . . u(2)

l

u(3)1

. . . u(3)

l

u(4)1

. . . u(4)

l

u(5)1

. . . u(5)

l

u(6)1

. . . u(6)

l

�SLDs(1)1

. . . s(1)

k

�SLDs(2)1

. . . s(2)

k

�SLDs(3)1

. . . s(3)

k

�userx1 . . . xp

fy

Page 11: Structured models with neural networksai.ms.mff.cuni.cz/~sui/pevnak.pdfDetectionofinfectedusers—results I Trainingdatafrom5-30.102017 3:6109=4:45106 urls/users I Testingdatafrom3.112017

Detection of infected users — results

I Training data from 5-30.10 20173.6 · 109/4.45 · 106 urls / users

I Testing data from 3.11 20172 · 108/1.5 · 106 urls / users

0 0.2 0.4 0.6 0.8 10

0.2

0.4

0.6

0.8

1

recall

precision

Convolution [1] R. Forests [2]MIL - URL MIL - 5 min

[1] eXpose: A character-level convolutional neural network with embeddings for detectingmalicious URLs, file paths and registry keys, J. Saxe and K. Berlin, 2017

[2] Learning detectors of malicious web requests for intrusion detection in network traffic, L.Machlica, K. Bartos, and M. Sofka, 2017

Page 12: Structured models with neural networksai.ms.mff.cuni.cz/~sui/pevnak.pdfDetectionofinfectedusers—results I Trainingdatafrom5-30.102017 3:6109=4:45106 urls/users I Testingdatafrom3.112017

Static analysis of malware binaries — PE file format

Executable

CodeDataPE header

Call Graph Functions

Control Flow Graph Basic Blocks

StringsImports

Instructions

f1

f2

f3

f4

f7

f8

f5

f6

b1

b3b2

b4

b5b6

b7

b8

Page 13: Structured models with neural networksai.ms.mff.cuni.cz/~sui/pevnak.pdfDetectionofinfectedusers—results I Trainingdatafrom5-30.102017 3:6109=4:45106 urls/users I Testingdatafrom3.112017

Static analysis of malware binaries — model of a function

push regmov reg regmov reg memdec regjne num

mov reg mempush regmov mem regcall KERNEL32!Disable...

xor reg reginc regpop regretn num

1

2

3

tokenization block representationdisassembly output

2554116045

669126

1

2

3

11251163

ngrams

x1

x2

x3

MIL

NN

Page 14: Structured models with neural networksai.ms.mff.cuni.cz/~sui/pevnak.pdfDetectionofinfectedusers—results I Trainingdatafrom5-30.102017 3:6109=4:45106 urls/users I Testingdatafrom3.112017

Static analysis of malware binaries — model of a binary

x1x2x3

x4x5

xn

Block NNBlock NNBlock NN

Block NNBlock NN

Block NN

f1

f2

fm

meanmax

meanmax

meanmax

y2

Function NN

Function NN

Function NN

y1

ym

meanmax Binary NN

map groupby reduce map groupby reduce

Page 15: Structured models with neural networksai.ms.mff.cuni.cz/~sui/pevnak.pdfDetectionofinfectedusers—results I Trainingdatafrom5-30.102017 3:6109=4:45106 urls/users I Testingdatafrom3.112017

Static analysis of malware binaries — results

I Training data4 · 105 binaries

I Testing data5.5 · 105 binaries

I approx. 10% were malicious

Page 16: Structured models with neural networksai.ms.mff.cuni.cz/~sui/pevnak.pdfDetectionofinfectedusers—results I Trainingdatafrom5-30.102017 3:6109=4:45106 urls/users I Testingdatafrom3.112017

Static analysis of malware binaries — feedback

I avoids specifying imports in PEheader

I loads addresses of libraryfunctions to an array

Page 17: Structured models with neural networksai.ms.mff.cuni.cz/~sui/pevnak.pdfDetectionofinfectedusers—results I Trainingdatafrom5-30.102017 3:6109=4:45106 urls/users I Testingdatafrom3.112017

Static analysis of malware binaries — feedback

I adware related behaviorI creating both visible and

invisible windows

Page 18: Structured models with neural networksai.ms.mff.cuni.cz/~sui/pevnak.pdfDetectionofinfectedusers—results I Trainingdatafrom5-30.102017 3:6109=4:45106 urls/users I Testingdatafrom3.112017

Future directions

I Learning representation of hierarchical dataI Learning distances for clusteringI Generative modelsI Anomaly / few shot learning

I Game theory in securityI Finding new attacks under constraints

I Decentralized learningI Modeling and reasoning over relational data

Page 19: Structured models with neural networksai.ms.mff.cuni.cz/~sui/pevnak.pdfDetectionofinfectedusers—results I Trainingdatafrom5-30.102017 3:6109=4:45106 urls/users I Testingdatafrom3.112017

Modeling the internet

https://www.threatcrowd.org/domain.php?domain=ibuyitttttttttttttttttttttttttttttttttttibuyit.com

Page 20: Structured models with neural networksai.ms.mff.cuni.cz/~sui/pevnak.pdfDetectionofinfectedusers—results I Trainingdatafrom5-30.102017 3:6109=4:45106 urls/users I Testingdatafrom3.112017

Materials

I https://github.com/pevnak/Mill.jlI https://github.com/pevnak/JsonGrinder.jl

I Discriminative models for multi-instance problems withtree-structure, Tomáš Pevný, Petr Somol, 2016

I Using Neural Network Formalism to Solve Multiple-InstanceProblems, Tomáš Pevný, Petr Somol, 2016

I Approximation capability of neural networks on sets ofprobability measures and tree-structured data, Tomáš Pevný,Vojtěch Kovařík, 2019

I Nested Multiple Instance Learning in Modelling of HTTPnetwork traffic, Tomas Pevny, Marek Dedic, 2020