FeatureSmith - USENIX
TRANSCRIPT
FeatureSmith: Learning to Detect Malware by Mining the Security Literature
Benign
Malicious
Security and Machine Learning
T. Dumitraș :: FeatureSmith
Used for detecting spam, phishing, malware, network attacks, malicious domains, vulnerability exploits in the wild, compromised Web sites, …
What does it mean for two samples to be similar?
Features in Machine Learning Models
• How should we compare samples?
  – Spam: keywords, features from email header, …
  – AI bots: vocabulary, sentence structure, …
It takes one to know one? Feature engineering
Running Example: Android Malware Detection
• How should we compare samples?
  – Permissions
    • Protect sensitive data and functionality
    • Does not work for privilege escalation
  – API method calls
    • Reveal malware behaviors
• Feature engineering
  – Use domain knowledge to identify useful features
  – Must consider threat semantics
The Security Body of Knowledge
• Growing volume of papers, industry reports, blogs, …
Difficult to assimilate all relevant knowledge
Dilemma
Growing body of knowledge vs. need for good features
Can we engineer features automatically, by mining security papers?
Can we create an artificial intelligence that helps us build other intelligent systems?
Security Threats in Natural Language
• “The Zsone malware is designed to send SMS messages to certain premium numbers”*
SMS fraud
*Zhou et al. ‘Hey, you, get off of my market: Detecting malicious apps in official and alternative Android markets,’ NDSS 2012.
Security Threats in Natural Language
• “GingerMaster […] is often bundled with benign applications and tries to gain root access”*
Evasion, privilege escalation
*Arp et al. ‘Drebin: Effective and Explainable Detection of Android Malware in Your Pocket,’ NDSS 2014.
Plato’s Allegory of the Cave
Illustration by John D’Alembert
Domain Knowledge
Challenge #1
Understanding the semantic meaning
  – Based on common sense, knowledge of security domain
Challenge #2
Attacker behaviors keep evolving
  – Security arms race
  – Must discover open-ended behaviors
IEEE Security and Privacy Symposium
Intuition for Automatic Feature Engineering
Malware behaviors* and the features (suspicious API calls) that reveal them:
• Access sensitive data: getDeviceId, getSubscriberId
• Communicate over network: execHttpRequest, setWifiEnabled
• Execute external commands: Runtime.exec
*Arp et al. NDSS ’14
Behavior Extraction
• Behavior
  – Description of malware activity
  – Short phrase
    • <subject?, verb, object?>
    • Parse grammatical structure of sentences
“The Zsone malware is designed to send SMS messages to certain premium numbers”*
• Zsone malware send SMS messages
• designed Zsone malware
• Zsone malware send to certain premium numbers
*Zhou et al. NDSS ’12
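The three behaviors listed for the Zsone sentence can be reproduced from a dependency parse of that sentence. The sketch below is only an illustration under stated assumptions: the parse is written by hand (a real pipeline would get it from a parser), the relation labels follow common dependency-grammar naming, and `PARSE`, `KEEP`, and `extract_behaviors` are hypothetical names that do not come from the talk.

```python
from collections import defaultdict

# Hand-written dependency parse of the example sentence (an assumption made
# for illustration). Each token: (index, word, head index, relation);
# head index 0 marks the root of the sentence.
PARSE = [
    (1, "The", 3, "det"),
    (2, "Zsone", 3, "compound"),
    (3, "malware", 5, "nsubjpass"),
    (4, "is", 5, "auxpass"),
    (5, "designed", 0, "root"),
    (6, "to", 7, "mark"),
    (7, "send", 5, "xcomp"),
    (8, "SMS", 9, "compound"),
    (9, "messages", 7, "dobj"),
    (10, "to", 13, "case"),
    (11, "certain", 13, "amod"),
    (12, "premium", 13, "amod"),
    (13, "numbers", 7, "nmod"),
]

# Modifier relations that stay attached to a noun when building a phrase.
KEEP = {"compound", "amod", "case"}


def extract_behaviors(parse):
    """Return <subject?, verb, object?> triples; a missing part is None."""
    words = {i: w for i, w, _, _ in parse}
    children = defaultdict(list)
    for i, _, head, rel in parse:
        children[head].append((i, rel))

    def phrase(i):
        # Head word plus its kept modifiers, in sentence order.
        idx = [i] + [c for c, rel in children[i] if rel in KEEP]
        return " ".join(words[j] for j in sorted(idx))

    def child(i, rels):
        return next((c for c, rel in children[i] if rel in rels), None)

    behaviors = []

    def visit(verb, inherited_subject):
        subj = child(verb, {"nsubj"})
        passive = child(verb, {"nsubjpass"})
        subject = phrase(subj) if subj is not None else inherited_subject
        if passive is not None:
            # A passive subject is the logical object: <None, verb, object>.
            behaviors.append((None, words[verb], phrase(passive)))
        for c, rel in children[verb]:
            if rel in {"dobj", "nmod"}:
                behaviors.append((subject, words[verb], phrase(c)))
        for c, rel in children[verb]:
            if rel == "xcomp":
                # An open clausal complement shares the governing subject.
                shared = subject
                if shared is None and passive is not None:
                    shared = phrase(passive)
                visit(c, shared)

    visit(child(0, {"root"}), None)
    return behaviors


for triple in extract_behaviors(PARSE):
    print(triple)
```

The optional subject and object in the triple are what let a passive clause such as "is designed" yield a behavior with no subject.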
Behavior Understanding
• Link behaviors to concrete features
“API calls for accessing sensitive data, such as getDeviceId()”*
accessing sensitive data ↔ getDeviceId()
*Arp et al. NDSS ’14
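One simple way to picture the link between a behavior and a concrete feature is to match a vocabulary of known feature names against a sentence and pair the hits with the behavior phrases found in it. This is a minimal sketch and not the method from the talk: `KNOWN_FEATURES` and `link_behaviors_to_features` are hypothetical, and the vocabulary is a three-item stand-in for a full list of Android API names.

```python
import re

# Hypothetical vocabulary of concrete features (Android API method names).
KNOWN_FEATURES = {"getDeviceId", "getSubscriberId", "sendTextMessage"}


def link_behaviors_to_features(sentence, behaviors, known=KNOWN_FEATURES):
    """Pair every behavior phrase present in the sentence with every
    known feature name mentioned in the same sentence."""
    # Identifier-like tokens; a trailing "()" is not part of the token.
    tokens = re.findall(r"[A-Za-z_][A-Za-z0-9_.]*", sentence)
    features = [t for t in tokens if t in known]
    lowered = sentence.lower()
    present = [b for b in behaviors if b.lower() in lowered]
    return [(b, f) for b in present for f in features]


sentence = "API calls for accessing sensitive data, such as getDeviceId()"
links = link_behaviors_to_features(
    sentence, ["accessing sensitive data", "send SMS messages"]
)
print(links)
```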
Behavior Understanding
• Link behaviors to malware
Zsone malware is designed to send SMS messages to premium numbers
Zsone ↔ send SMS messages
*Zhou et al. NDSS ’12
Semantic Network
• Nodes: security concepts
  – Malware families: named entities
  – Concrete features: named entities
  – Behaviors: open ended
• Edges: semantically related concepts
  – Weights based on distance and co-occurrence
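The slide says only that edge weights depend on distance and co-occurrence, without giving a formula. The sketch below therefore uses an assumed stand-in: each time two concepts appear in the same sentence, their edge gains 1 / (token distance). The function name `build_network`, the input format, and the token positions in the example are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_network(sentences):
    """Build an undirected weighted graph from concept mentions.

    sentences: one list per sentence of (concept, token_position) pairs.
    Returns {(concept_a, concept_b): weight} with each pair sorted.
    """
    weights = defaultdict(float)
    for mentions in sentences:
        for (a, pos_a), (b, pos_b) in combinations(mentions, 2):
            if a == b:
                continue
            edge = tuple(sorted((a, b)))
            # Assumed weighting: closer mentions contribute more, and
            # repeated co-occurrences accumulate.
            weights[edge] += 1.0 / max(1, abs(pos_a - pos_b))
    return dict(weights)


# Made-up mentions: concept names are from the slides, positions are not.
mentions = [
    [("Zsone", 1), ("send SMS messages", 6)],
    [("send SMS messages", 2), ("sendTextMessage", 4)],
    [("Zsone", 0), ("send SMS messages", 10)],
]
network = build_network(mentions)
for edge, weight in sorted(network.items()):
    print(edge, round(weight, 3))
```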
Semantic Network Example
[Figure: semantic network with three columns of nodes]
Malware: Zsone, Zitmo
Behavior: Send SMS message, Identify execution path, Extract sender phone number, Open manifest file
Feature: SEND_SMS, sendTextMessage, Thread.start, createFromPdu, openXmlResourceParser
Edge weights shown: 1, 1, 0.25, 0.75
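Once such a network exists, features can be ranked for a malware family by following malware → behavior → feature paths. The sketch below is illustrative only. Node names come from the example, but the assignment of weights to edges is hypothetical, since the transcript does not say which edge carries which weight. The scoring rule (sum of weight products along two-hop paths) is likewise an assumed stand-in, not necessarily what FeatureSmith computes.

```python
from collections import defaultdict

# Hypothetical edge weights; only the node names come from the example.
MALWARE_TO_BEHAVIOR = {
    "Zsone": {"Send SMS message": 1.0},
    "Zitmo": {"Send SMS message": 0.25, "Extract sender phone number": 0.75},
}
BEHAVIOR_TO_FEATURE = {
    "Send SMS message": {"SEND_SMS": 1.0, "sendTextMessage": 1.0},
    "Extract sender phone number": {"createFromPdu": 1.0},
}


def rank_features(malware):
    """Score each feature by summing weight products over all
    malware -> behavior -> feature paths, highest score first."""
    scores = defaultdict(float)
    for behavior, w_mb in MALWARE_TO_BEHAVIOR.get(malware, {}).items():
        for feature, w_bf in BEHAVIOR_TO_FEATURE.get(behavior, {}).items():
            scores[feature] += w_mb * w_bf
    # Ties are broken alphabetically so the order is deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


print(rank_features("Zitmo"))
```

A useful property of this kind of ranking is that every suggested feature comes with the behaviors that connect it to the malware, which serves as an explanation.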
How Well Does This Work?
Automatic feature engineering
• FeatureSmith
  – Analyzed 1,068 security papers
  – Automatically engineered 195 features relevant to Android malware
    • Out of 383 found in the papers
Manual feature engineering
• Drebin*
  – State-of-the-art Android malware detector
  – Uses 545,334 features
    • Including 315 suspicious API calls, manually curated
*Arp et al. NDSS ’14
Auto vs. Manual: Experiment
Automatic
• Features engineered by FeatureSmith
Manual
• Features used in Drebin
• Same classification algorithm
• Same corpus of benign and malicious apps
• Same feature types
• Experiment: Compare the two feature sets
Auto vs. Manual: Features
• FeatureSmith discovered new features
  – getSimOperatorName
  – getNetworkOperatorName
  – getCountry
• Often used by malware
  – Help detect Gappusin family (not detected by Drebin)
Missing from manually engineered set
Human data scientists cannot assimilate all relevant knowledge
Auto vs. Manual: Detection Performance
[Figure: ROC curves for FeatureSmith and Drebin; x-axis: False Positive Rate (0.00 to 0.10), y-axis: True Positive Rate (0.80 to 1.00)]
Parity with manual features at 1% false positives
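Comparing two detectors "at 1% false positives" means reading each ROC curve at a false positive rate of 0.01 and comparing the true positive rates there. The sketch below shows one way to compute that operating point from classifier scores. The function name `tpr_at_fpr` and the toy scores are hypothetical, and none of the numbers are results from the talk.

```python
def tpr_at_fpr(scores, labels, max_fpr=0.01):
    """Highest true positive rate reachable while keeping the false
    positive rate at or below max_fpr.

    scores: higher means more suspicious.
    labels: 1 for malicious, 0 for benign.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one malicious and one benign sample")
    best = 0.0
    # Try every distinct score as the decision threshold.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        if fp / negatives <= max_fpr:
            best = max(best, tp / positives)
    return best


# Toy data, not from the experiment.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
toy_labels = [1, 1, 0, 1, 0, 0]
print(tpr_at_fpr(toy_scores, toy_labels, max_fpr=0.0))
print(tpr_at_fpr(toy_scores, toy_labels, max_fpr=0.34))
```

Running the same function over the scores from two feature sets, with the same classifier and corpus, gives the kind of like-for-like comparison the experiment describes.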
Knowledge Evolution
[Figure: ROC curves for feature sets 2012 (24 features), 2013 (32 features), 2014 (40 features), 2015 (46 features); x-axis: False Positive Rate (0.00 to 0.10), y-axis: True Positive Rate (0.80 to 1.00)]
Effectiveness of features discovered in different years
Alternatives
• Feature selection
  – Must enumerate all possible features in advance (e.g. all Android permissions)
• Representation learning
  – Discovers useful features (representations) from raw data (e.g. using a neural network)
• Disadvantages
  – Data-driven: may reflect biases in the ground truth
  – No automatic discovery of threat semantics
In A Nutshell
• Automatic feature engineering
  – Discover semantically meaningful features
    • Some missing from manually curated set
  – Performance on par with state-of-the-art malware detector
  – Many potential applications
    • Security: AI bots, threat intelligence, intrusion detection, …
    • Other fields: biomedical research, IBM’s Watson Q&A system
• Complements human-driven feature engineering
  – Human data scientists have intuition
  – FeatureSmith can reason over entire body of knowledge
• Paper and data: http://featuresmith.org
Automated systems can understand the semantics of security concepts
This is a powerful tool for creating attacks and defenses
Thank you!
Tudor Dumitraș @tudor_dumitras
http://featuresmith.org
Acknowledgments:
• Work with Ziyun Zhu
• Robot cartoons by Katy Tresedder