FeatureSmith: Learning to Detect Malware by Mining the Security Literature


Page 1: FeatureSmith - USENIX

FeatureSmith: Learning to Detect Malware by Mining the Security Literature

Page 2

Security and Machine Learning

Benign

Malicious

Used for detecting spam, phishing, malware, network attacks, malicious domains, vulnerability exploits in the wild, compromised Web sites, …

What does it mean for two samples to be similar?

T. Dumitraș :: FeatureSmith

Page 3

Features in Machine Learning Models

•  How should we compare samples?
   –  Spam: keywords, features from email header, …
   –  AI bots: vocabulary, sentence structure, …

It takes one to know one? Feature engineering

Page 4

Running Example: Android Malware Detection

•  How should we compare samples?
   –  Permissions
      •  Protect sensitive data and functionality
      •  Does not work for privilege escalation
   –  API method calls
      •  Reveal malware behaviors

•  Feature engineering
   –  Use domain knowledge to identify useful features
   –  Must consider threat semantics
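
One way to make "how should we compare samples?" concrete is to treat each app as a set of permissions and API calls and measure set overlap. The sketch below is illustrative only: the vocabulary, the `permission::` / `api::` prefixes, and the three example apps are assumptions, and Jaccard similarity is a stand-in, not the comparison used in the talk.

```python
from typing import FrozenSet, List

# Hypothetical feature vocabulary. The names are taken from slides in this
# deck; the "permission::" / "api::" prefixes are an assumption.
VOCABULARY: List[str] = [
    "permission::SEND_SMS",
    "api::sendTextMessage",
    "api::getDeviceId",
    "api::getSubscriberId",
    "api::Runtime.exec",
]


def to_vector(app_features: FrozenSet[str]) -> List[int]:
    """Encode an app as a binary vector over the fixed vocabulary."""
    return [1 if name in app_features else 0 for name in VOCABULARY]


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity: shared features divided by all features seen."""
    union = a | b
    if not union:
        return 1.0  # two empty feature sets are treated as identical
    return len(a & b) / len(union)


# Three made-up apps, for illustration only.
app_a = frozenset({"permission::SEND_SMS", "api::sendTextMessage", "api::getDeviceId"})
app_b = frozenset({"permission::SEND_SMS", "api::sendTextMessage"})
app_c = frozenset({"api::Runtime.exec"})

print(to_vector(app_a))
print(jaccard(app_a, app_b), jaccard(app_a, app_c))
```

Which names go into the vocabulary is exactly the feature engineering question the slide raises: two apps can only look similar along dimensions that someone chose to include.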

Page 5

The Security Body of Knowledge

•  Growing volume of papers, industry reports, blogs, …

Difficult to assimilate all relevant knowledge

Page 6

Dilemma

Growing body of knowledge VS. Need for good features

Can we engineer features automatically, by mining security papers?

Page 7

Can we create an artificial intelligence that helps us build other intelligent systems?

Page 8

Security Threats in Natural Language

•  “The Zsone malware is designed to send SMS messages to certain premium numbers”*

SMS fraud

*Zhou et al. ‘Hey, you, get off of my market: Detecting malicious apps in official and alternative Android markets,’ NDSS 2012.

Page 9

Security Threats in Natural Language

•  “GingerMaster […] is often bundled with benign applications and tries to gain root access”*

Evasion, privilege escalation

*Arp et al. ‘Drebin: Effective and Explainable Detection of Android Malware in Your Pocket,’ NDSS 2014.

Page 10

Plato’s Allegory of the Cave

Illustration by John D’Alembert

Page 11

Domain Knowledge

Page 12

Challenge #1

Understanding the semantic meaning

–  Based on common sense, knowledge of security domain

Page 13

Challenge #2

Attacker behaviors keep evolving

–  Security arms race

–  Must discover open-ended behaviors

IEEE Security and Privacy Symposium

Page 14

Intuition for Automatic Feature Engineering

Malware behaviors*  →  Features (suspicious API calls)

•  Access sensitive data: getDeviceId, getSubscriberId
•  Communicate over network: execHttpRequest, setWifiEnabled
•  Execute external commands: Runtime.exec

*Arp et al. NDSS ’14

Page 15

Behavior Extraction

•  Behavior
   –  Description of malware activity
   –  Short phrase
      •  <subject?, verb, object?>
      •  Parse grammatical structure of sentences

“The Zsone malware is designed to send SMS messages to certain premium numbers”*

•  Zsone malware send SMS messages
•  designed Zsone malware
•  Zsone malware send to certain premium numbers

*Zhou et al. NDSS ’12
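
The <subject?, verb, object?> form above can be sketched as a small data structure plus a rule that reads triples off a dependency parse. This is a hedged illustration, not the FeatureSmith extractor: the edges in `SLIDE_EDGES` are written by hand (no parser is run), the relation labels only loosely follow Stanford-style dependency names, and `behaviors_from_edges` is a hypothetical helper.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str, str]  # (head verb, relation, dependent phrase)


@dataclass(frozen=True)
class Behavior:
    """A short phrase <subject?, verb, object?>; subject and object are optional."""
    subject: Optional[str]
    verb: str
    obj: Optional[str]

    def phrase(self) -> str:
        return " ".join(part for part in (self.subject, self.verb, self.obj) if part)


def behaviors_from_edges(edges: List[Edge]) -> List[Behavior]:
    """Group hand-written dependency edges by verb and emit one Behavior per object."""
    verbs: List[str] = []                      # verbs in first-seen order
    subjects: Dict[str, str] = {}
    objects: Dict[str, List[str]] = defaultdict(list)
    for head, relation, dependent in edges:
        if head not in verbs:
            verbs.append(head)
        if relation == "nsubj":
            subjects[head] = dependent
        elif relation == "dobj":
            objects[head].append(dependent)
        elif relation.startswith("prep_"):
            preposition = relation[len("prep_"):]
            objects[head].append(preposition + " " + dependent)
    result: List[Behavior] = []
    for verb in verbs:
        subject = subjects.get(verb)
        if objects[verb]:
            result.extend(Behavior(subject, verb, o) for o in objects[verb])
        elif subject:
            result.append(Behavior(subject, verb, None))
    return result


# Hand-written edges for the Zsone sentence quoted on the slide (an assumption,
# not the output of a real parser).
SLIDE_EDGES: List[Edge] = [
    ("send", "nsubj", "Zsone malware"),
    ("send", "dobj", "SMS messages"),
    ("send", "prep_to", "certain premium numbers"),
    ("designed", "dobj", "Zsone malware"),
]

for behavior in behaviors_from_edges(SLIDE_EDGES):
    print(behavior.phrase())
```

With these edges the sketch yields the same three phrases listed on the slide, including the low-value "designed Zsone malware", which shows why extracted behaviors still need to be weighted afterwards.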

Page 16

Behavior Understanding

•  Link behaviors to concrete features

“API calls for accessing sensitive data, such as getDeviceId()”*

accessing sensitive data  →  getDeviceId()

*Arp et al. NDSS ’14
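
As a rough illustration of linking a behavior to a concrete feature, the sketch below pairs behavior phrases with API names that appear in the same sentence. It assumes features can be spotted as `name()` tokens with a regular expression and that behaviors are matched by substring; both are simplifications made for this example, and `link_behaviors_to_features` is a hypothetical helper, not part of FeatureSmith.

```python
import re
from typing import List, Tuple

# Assumption: a concrete feature is written as an identifier followed by "()".
API_CALL = re.compile(r"\b([A-Za-z_][\w.]*)\(\)")


def link_behaviors_to_features(sentence: str, behaviors: List[str]) -> List[Tuple[str, str]]:
    """Pair every known behavior phrase found in the sentence with every API name in it."""
    features = API_CALL.findall(sentence)
    lowered = sentence.lower()
    return [
        (behavior, feature)
        for behavior in behaviors
        if behavior.lower() in lowered
        for feature in features
    ]


sentence = "API calls for accessing sensitive data, such as getDeviceId()"
links = link_behaviors_to_features(
    sentence, ["accessing sensitive data", "send SMS messages"]
)
print(links)
```

Only the behavior that actually occurs in the sentence is linked, so "send SMS messages" produces no pair here.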

Page 17

Behavior Understanding

•  Link behaviors to malware

“Zsone malware is designed to send SMS messages to premium numbers”*

Zsone  →  send SMS messages

*Zhou et al. NDSS ’12

Page 18

Semantic Network

•  Nodes: security concepts
   –  Malware families: named entities
   –  Concrete features: named entities
   –  Behaviors: open ended

•  Edges: semantically related concepts
   –  Weights based on distance and co-occurrence
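
The slide says only that edge weights depend on distance and co-occurrence; it does not give a formula. The sketch below picks one plausible rule as an assumption: each time two concepts are mentioned within `window` sentences of each other, their edge gains 1 / (1 + distance). The `Mention` format, the `window` parameter, and the example mentions are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

Mention = Tuple[str, int]  # (concept, index of the sentence where it appears)


def build_network(mentions: List[Mention], window: int = 2) -> Dict[FrozenSet[str], float]:
    """Accumulate undirected edge weights from nearby co-occurrences.

    Assumed weighting: every co-occurrence adds 1 / (1 + sentence distance),
    so closer and more frequent pairs end up with heavier edges.
    """
    weights: Dict[FrozenSet[str], float] = defaultdict(float)
    for i, (concept_a, pos_a) in enumerate(mentions):
        for concept_b, pos_b in mentions[i + 1:]:
            distance = abs(pos_a - pos_b)
            if concept_a != concept_b and distance <= window:
                weights[frozenset((concept_a, concept_b))] += 1.0 / (1 + distance)
    return dict(weights)


# Made-up mentions from one imaginary paper.
mentions = [
    ("Zsone", 0),
    ("send SMS messages", 0),
    ("sendTextMessage", 1),
    ("Zsone", 5),
]
network = build_network(mentions)
for edge, weight in network.items():
    print(sorted(edge), weight)
```

Nodes of all three kinds (malware families, behaviors, features) live in the same graph here; the second "Zsone" mention is too far from the others to add any weight.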

Page 19

Semantic Network Example

[Figure: semantic network with three columns of nodes joined by weighted edges; the weights shown are 1, 1, 0.25, and 0.75.]

•  Malware: Zsone, Zitmo
•  Behavior: Send SMS message, Identify execution path, Extract sender phone number, Open manifest file
•  Feature: SEND_SMS, sendTextMessage, Thread.start, createFromPdu, openXmlResourceParser
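
To show how such a network could suggest features for a malware family, the sketch below scores each feature by summing, over behaviors, the product of the malware-to-behavior and behavior-to-feature weights. Both the scoring rule and the edge weights are assumptions: the transcript does not preserve which edge carries which weight in the figure, so the numbers below reuse node names from the slide but are otherwise invented.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Illustrative weights only; NOT the edge assignment from the slide's figure.
malware_to_behavior: Dict[str, Dict[str, float]] = {
    "Zsone": {"send SMS message": 1.0},
    "Zitmo": {"send SMS message": 0.25, "extract sender phone number": 0.75},
}
behavior_to_feature: Dict[str, Dict[str, float]] = {
    "send SMS message": {"sendTextMessage": 1.0, "SEND_SMS": 1.0},
    "extract sender phone number": {"createFromPdu": 1.0},
}


def rank_features(malware: str) -> List[Tuple[str, float]]:
    """Rank features by summed path weight malware -> behavior -> feature."""
    scores: Dict[str, float] = defaultdict(float)
    for behavior, w_mb in malware_to_behavior.get(malware, {}).items():
        for feature, w_bf in behavior_to_feature.get(behavior, {}).items():
            scores[feature] += w_mb * w_bf
    # Highest score first; ties broken by name so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


print(rank_features("Zitmo"))
```

A feature reached through several behaviors accumulates score from each path, which is one simple way to prefer features that the literature ties to a family in more than one way.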

Page 20

How Well Does This Work?

Automatic feature engineering
•  FeatureSmith
   –  Analyzed 1,068 security papers
   –  Automatically engineered 195 features relevant to Android malware
      •  Out of 383 found in the papers

Manual feature engineering
•  Drebin*
   –  State-of-the-art Android malware detector
   –  Uses 545,334 features
      •  Including 315 suspicious API calls, manually curated

*Arp et al. NDSS ’14

Page 21

Auto vs. Manual: Experiment

Automatic
•  Features engineered by FeatureSmith

Manual
•  Features used in Drebin

•  Same classification algorithm
•  Same corpus of benign and malicious apps
•  Same feature types
•  Experiment: Compare the two feature sets

Page 22

Auto vs. Manual: Features

•  FeatureSmith discovered new features
   –  getSimOperatorName
   –  getNetworkOperatorName
   –  getCountry

Missing from manually engineered set

•  Often used by malware
   –  Help detect Gappusin family (not detected by Drebin)

Human data scientists cannot assimilate all relevant knowledge

Page 23

Auto vs. Manual: Detection Performance

[Figure: ROC plot axes. x-axis: False Positive Rate, 0.00 to 0.10; y-axis: True Positive Rate, 0.80 to 1.00.]

Page 24

Auto vs. Manual: Detection Performance

[Figure: ROC plot showing the Drebin curve. x-axis: False Positive Rate, 0.00 to 0.10; y-axis: True Positive Rate, 0.80 to 1.00.]

Page 25

Auto vs. Manual: Detection Performance

[Figure: ROC plot comparing the FeatureSmith and Drebin curves. x-axis: False Positive Rate, 0.00 to 0.10; y-axis: True Positive Rate, 0.80 to 1.00.]

Parity with manual features at 1% false positives
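
A claim such as "parity at 1% false positives" is read off a ROC curve by fixing a false positive budget and asking for the best true positive rate within it. The sketch below computes that number from classifier scores; the function name `tpr_at_fpr` and the toy scores are assumptions for illustration and do not reproduce the evaluation from the talk.

```python
from typing import Sequence


def tpr_at_fpr(scores: Sequence[float], labels: Sequence[int], max_fpr: float) -> float:
    """Best true positive rate over all thresholds whose false positive rate <= max_fpr.

    labels: 1 = malicious, 0 = benign. A sample is flagged when score >= threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one benign and one malicious sample")
    best = 0.0
    for threshold in sorted(set(scores)):
        flagged = [score >= threshold for score in scores]
        true_pos = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
        false_pos = sum(1 for f, y in zip(flagged, labels) if f and y == 0)
        if false_pos / negatives <= max_fpr:
            best = max(best, true_pos / positives)
    return best


# Toy scores: three malicious and five benign samples (made-up data).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0]
print(tpr_at_fpr(toy_scores, toy_labels, max_fpr=0.0))
print(tpr_at_fpr(toy_scores, toy_labels, max_fpr=0.25))
```

Running the same function on the scores of two classifiers that differ only in their feature set gives a single-number comparison at the chosen operating point.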

Page 26

Knowledge Evolution

[Figure: ROC curves, one per feature set. x-axis: False Positive Rate, 0.00 to 0.10; y-axis: True Positive Rate, 0.80 to 1.00. Feature sets: 2012 (24 features), 2013 (32 features), 2014 (40 features), 2015 (46 features).]

Effectiveness of features discovered in different years

Page 27

Alternatives

•  Feature selection
   –  Must enumerate all possible features in advance (e.g. all Android permissions)

•  Representation learning
   –  Discovers useful features (representations) from raw data (e.g. using a neural network)

•  Disadvantages
   –  Data-driven: may reflect biases in the ground truth
   –  No automatic discovery of threat semantics

Page 28

In A Nutshell

•  Automatic feature engineering
   –  Discover semantically meaningful features
      •  Some missing from manually curated set
   –  Performance on par with state-of-the-art malware detector
   –  Many potential applications
      •  Security: AI bots, threat intelligence, intrusion detection, …
      •  Other fields: biomedical research, IBM’s Watson Q&A system

•  Complements human-driven feature engineering
   –  Human data scientists have intuition
   –  FeatureSmith can reason over entire body of knowledge

•  Paper and data: http://featuresmith.org

Page 29

Automated systems can understand the semantics of security concepts

This is a powerful tool for creating attacks and defenses

Page 30

Thank you!

Tudor Dumitraș
@tudor_dumitras

http://featuresmith.org

Acknowledgments:
•  Work with Ziyun Zhu
•  Robot cartoons by Katy Tresedder