, sujuan - ndss symposium

Zhen Li¹, Deqing Zou¹, Shouhuai Xu², Xinyu Ou¹, Hai Jin¹, Sujuan Wang¹, Zhijun Deng¹, Yuyi Zhong¹
¹Huazhong University of Science and Technology (HUST), Wuhan, China
²University of Texas at San Antonio (UTSA), San Antonio, USA

Post on 21-Jan-2022



Automatic Software Vulnerability Detection

• Automatic detection of software vulnerabilities is an important research problem
• Static vulnerability detection tools and studies, e.g.:
  • RATS
  • VUDDY (SP’17)
  • ReDeBug … VulPecker (ACSAC’16)


Drawbacks of Existing Approaches

• First, imposing intense labor on human experts
  • Human experts must define the features
• Second, incurring high false negative rates
  • The two most recent vulnerability detection systems:
    • VUDDY (SP’17): false negative rate = 18.2% for Apache HTTPD 2.4.23
    • VulPecker (ACSAC’16): false negative rate = 38% with respect to 455 vulnerability samples


Research Problem

• Given the source code of a target program, how can we determine whether or not the target program is vulnerable and, if so, where the vulnerabilities are?
  • Without asking human experts to manually define features
  • Without incurring a high false negative rate or false positive rate


Our Main Contribution

Vulnerability Deep Pecker (VulDeePecker):

A deep learning-based system for automatically detecting vulnerabilities in programs (source code)


Outline

• Guiding Principles
• Design of VulDeePecker
• Experiments and Results
• Limitations
• Conclusion


Guiding Principles: three questions

Q1: How to represent software programs for deep learning-based vulnerability detection?

Q2: What is the appropriate granularity for deep learning-based vulnerability detection?

Q3: How to select a specific neural network for vulnerability detection?


Guiding Principles

Q1: How to represent software programs for deep learning-based vulnerability detection?

Preserve the semantic relationships between the programs’ elements (e.g., data-flow and control-flow information).


Guiding Principles

Q2: What is the appropriate granularity for deep learning-based vulnerability detection?

Programs should be represented at a finer granularity than treating a whole program or a function as a unit.


Guiding Principles

Q3: How to select a specific neural network for vulnerability detection?

Neural networks that can cope with contexts may be suitable for vulnerability detection.

Figure (candidate neural networks):
• CNN
• DBN
• DNN
• RNN
  • Traditional RNN
  • LSTM
    • Unidirectional LSTM
    • Bidirectional LSTM (this paper)
  • GRU
  • …
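The slide marks the bidirectional LSTM as this paper's choice. As a minimal sketch of why bidirectionality helps with context, assuming a toy scalar tanh RNN cell in place of a real LSTM cell (the weights `w` and `u` are arbitrary illustrative values, not trained parameters, and `rnn_pass`/`bidirectional_pass` are hypothetical names), each position pairs a forward state, which summarizes the tokens to its left, with a backward state, which summarizes the tokens to its right.

```python
import math
from typing import List, Tuple


def rnn_pass(xs: List[float], w: float, u: float) -> List[float]:
    """Toy scalar tanh RNN (stand-in for an LSTM cell): h_t = tanh(w * x_t + u * h_{t-1})."""
    h = 0.0
    states = []
    for x in xs:
        h = math.tanh(w * x + u * h)
        states.append(h)
    return states


def bidirectional_pass(xs: List[float], w: float = 0.5, u: float = 0.8) -> List[Tuple[float, float]]:
    """Pair each position's forward state (left context) with its backward state (right context)."""
    forward = rnn_pass(xs, w, u)
    # Run over the reversed sequence, then reverse the states back into input order.
    backward = list(reversed(rnn_pass(list(reversed(xs)), w, u)))
    return list(zip(forward, backward))


if __name__ == "__main__":
    for fwd, bwd in bidirectional_pass([1.0, 0.0, 0.0, 1.0]):
        print(f"forward={fwd:+.3f} backward={bwd:+.3f}")
```

Changing only the last input changes the backward state at the first position while leaving its forward state untouched; in a code gadget, this is what lets a statement be judged in light of statements that come after it. A real BLSTM replaces the tanh cell with gated LSTM cells and vector-valued states.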


Overview of VulDeePecker


The Concept of Code Gadget

• A unit for vulnerability detection
• A number of program statements that are semantically related to each other in terms of data dependency or control dependency
• Example: vulnerabilities related to library/API function calls


Step I: Generating Code Gadgets

Figure: a code gadget corresponding to strcpy()
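The code gadget figure did not survive extraction. Here is a minimal sketch of one way to assemble such a gadget, assuming straight-line code in a single function and data dependency only (no control dependency, no interprocedural slicing): starting from the `strcpy()` call, walk backwards and keep every statement that defines a variable the kept statements still depend on. The `Stmt` type, `backward_slice`, and the toy `PROGRAM` with its hand-written def/use sets are hypothetical; a real implementation would obtain dependencies from a program analysis tool.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Stmt:
    line: int
    code: str
    defs: FrozenSet[str] = frozenset()  # variables written by the statement
    uses: FrozenSet[str] = frozenset()  # variables read by the statement


def backward_slice(stmts: List[Stmt], criterion_line: int) -> List[Stmt]:
    """Backward data-dependence slice of straight-line code, returned in program order."""
    idx = next(i for i, s in enumerate(stmts) if s.line == criterion_line)
    criterion = stmts[idx]
    relevant = set(criterion.uses)  # variables whose earlier definitions still matter
    kept = [criterion]
    for s in reversed(stmts[:idx]):
        if s.defs & relevant:
            kept.append(s)
            # This definition satisfies those variables; now its own inputs matter.
            relevant = (relevant - s.defs) | s.uses
    return sorted(kept, key=lambda s: s.line)


# Hypothetical toy function body; def/use sets are written by hand.
PROGRAM = [
    Stmt(1, "char buf[10];", defs=frozenset({"buf"})),
    Stmt(2, "char *str = argv[1];", defs=frozenset({"str"}), uses=frozenset({"argv"})),
    Stmt(3, "int n = 5;", defs=frozenset({"n"})),
    Stmt(4, "strcpy(buf, str);", defs=frozenset({"buf"}), uses=frozenset({"buf", "str"})),
]

if __name__ == "__main__":
    for stmt in backward_slice(PROGRAM, 4):
        print(stmt.line, stmt.code)
```

Running it prints the statements on lines 1, 2, and 4; `int n = 5;` is dropped because nothing the `strcpy()` call uses depends on it.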


Step II: Generating Ground Truth Labels

• Each code gadget is labeled as “1” (i.e., vulnerable) or “0” (i.e., not vulnerable):
  • According to the diff files
  • According to the vulnerable statements
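The slide names two labeling sources but not the exact rule. As a sketch, assuming a gadget is labeled “1” when it contains at least one line that the patch deletes or modifies (read from a single-file unified diff) and “0” otherwise, the labeling could look like this. `deleted_lines`, `label_gadget`, and the `DIFF` sample are hypothetical; in a unified diff a modified line appears as a “-” old line followed by a “+” new line, so collecting the “-” lines covers both cases.

```python
import re
from typing import Iterable, Optional, Set

# Matches a hunk header such as "@@ -3,3 +3,3 @@" and captures the old-file start line.
HUNK = re.compile(r"^@@ -(\d+)(?:,\d+)? \+\d+(?:,\d+)? @@")


def deleted_lines(diff_text: str) -> Set[int]:
    """Return old-file line numbers that a single-file unified diff deletes or modifies."""
    deleted: Set[int] = set()
    old_line: Optional[int] = None  # current line number in the old (vulnerable) file
    for raw in diff_text.splitlines():
        match = HUNK.match(raw)
        if match:
            old_line = int(match.group(1))
        elif old_line is None or raw.startswith("\\"):
            continue  # file headers before the first hunk, or "\ No newline" markers
        elif raw.startswith("-"):
            deleted.add(old_line)
            old_line += 1
        elif not raw.startswith("+"):
            old_line += 1  # context line, present in both versions
    return deleted


def label_gadget(gadget_lines: Iterable[int], vulnerable: Set[int]) -> int:
    """Label a gadget 1 (vulnerable) if it covers any vulnerable line, else 0."""
    return 1 if any(line in vulnerable for line in gadget_lines) else 0


# Hypothetical patch that replaces an unbounded copy on line 4.
DIFF = """\
--- a/copy.c
+++ b/copy.c
@@ -3,3 +3,3 @@
 int n = 5;
-strcpy(buf, str);
+strncpy(buf, str, sizeof(buf) - 1);
 return 0;
"""

if __name__ == "__main__":
    vulnerable = deleted_lines(DIFF)
    print(vulnerable, label_gadget([1, 2, 4], vulnerable), label_gadget([1, 3], vulnerable))
```

When the vulnerable statements are annotated directly rather than derived from a patch, their line numbers can be passed to `label_gadget` in place of the output of `deleted_lines`.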


Step III: Transforming Code Gadgets into Vectors

• Transform code gadgets into their symbolic representations
• Encode the symbolic representations into vectors

Figure: an example symbolic representation split into 7 tokens
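Here is a minimal sketch of the two bullets above, under stated assumptions: user-defined variables become `VAR1`, `VAR2`, … and user-defined function calls become `FUN1`, `FUN2`, … in order of first appearance, while keywords and a small assumed list of library/API names are kept verbatim. For the encoding step, the sketch swaps in plain integer vocabulary IDs with zero padding to a fixed length; a real system would typically use learned word embeddings (for example, word2vec), and the padding/truncation side chosen here is an assumption. All names (`symbolize`, `encode`, `API_NAMES`, …) are hypothetical.

```python
import re
from typing import Dict, List

# Identifiers, integer literals, common two-character operators, then any other single character.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|==|!=|<=|>=|->|\+\+|--|&&|\|\||\S")
IDENT = re.compile(r"[A-Za-z_]\w*")
KEYWORDS = {"char", "int", "long", "unsigned", "void", "return", "if", "else", "for", "while", "sizeof"}
API_NAMES = {"strcpy", "strncpy", "memcpy", "malloc", "free", "printf"}  # assumed, not exhaustive


def symbolize(lines: List[str]) -> List[str]:
    """Tokenize gadget lines, renaming user-defined variables to VARn and functions to FUNn."""
    var_map: Dict[str, str] = {}
    fun_map: Dict[str, str] = {}
    out: List[str] = []
    for line in lines:
        toks = TOKEN.findall(line)
        for i, tok in enumerate(toks):
            if not IDENT.fullmatch(tok) or tok in KEYWORDS or tok in API_NAMES:
                out.append(tok)  # punctuation, literals, keywords, library/API names
            elif i + 1 < len(toks) and toks[i + 1] == "(":
                out.append(fun_map.setdefault(tok, f"FUN{len(fun_map) + 1}"))
            else:
                out.append(var_map.setdefault(tok, f"VAR{len(var_map) + 1}"))
    return out


def encode(tokens: List[str], vocab: Dict[str, int], length: int) -> List[int]:
    """Map tokens to integer IDs (new tokens get the next ID; 0 is padding), then pad or truncate."""
    ids = [vocab.setdefault(tok, len(vocab) + 1) for tok in tokens]
    return (ids + [0] * length)[:length]


if __name__ == "__main__":
    tokens = symbolize(["char buf[10];", "strcpy(buf, str);"])
    print(tokens)
    print(encode(tokens, {}, 16))
```

With this tokenizer, `strcpy(buf, str);` on its own becomes the 7 tokens `strcpy ( VAR1 , VAR2 ) ;`. Renaming consistently within a gadget keeps the data-flow link (the same `VAR1` appears in both lines above) while removing names that carry no vulnerability signal.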


Step IV: Training the BLSTM Neural Network

• The training process for learning the BLSTM neural network is standard

Steps V-VII: Detection Phase

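The detection-phase figure did not survive extraction. As a rough sketch, assuming the detection phase turns a target program into code gadgets and vectors as in Steps I and III and then applies the trained network, the final classification and reporting step could look like the following. `Gadget`, `detect`, the 0.5 threshold, and `toy_classifier` are hypothetical; the toy classifier merely stands in for a trained BLSTM.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Gadget:
    lines: List[int]   # line numbers of the target program covered by the gadget
    vector: List[int]  # fixed-length encoding of the gadget (as in Step III)


def detect(gadgets: Sequence[Gadget],
           classifier: Callable[[List[int]], float],
           threshold: float = 0.5) -> List[List[int]]:
    """Return the line numbers of every gadget whose score is at least `threshold`."""
    return [g.lines for g in gadgets if classifier(g.vector) >= threshold]


def toy_classifier(vector: List[int]) -> float:
    """Hypothetical stand-in for a trained BLSTM: scores vectors containing ID 1 as vulnerable."""
    return 0.9 if 1 in vector else 0.1


if __name__ == "__main__":
    target = [Gadget([1, 2, 4], [1, 2, 3, 0]), Gadget([3], [8, 9, 0, 0])]
    print(detect(target, toy_classifier))  # prints [[1, 2, 4]]
```

Reporting the covered line numbers, rather than a single verdict for the whole program, is what answers the second half of the research problem: where the vulnerabilities are.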


Research Questions

RQ1: Can VulDeePecker deal with multiple types of vulnerabilities at the same time?

RQ2: Can human intelligence (other than defining features) improve the effectiveness of VulDeePecker?

RQ3: How effective is VulDeePecker when compared with other approaches?

• Metrics for evaluation
  • False positive rate (FPR), false negative rate (FNR), recall, precision, F-measure
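The five metrics can be computed from the confusion counts of a detector's output on the target code gadgets. This sketch uses the standard definitions, FPR = FP / (FP + TN), FNR = FN / (TP + FN), recall = TP / (TP + FN) = 1 − FNR, precision = TP / (TP + FP), and F-measure = 2 · precision · recall / (precision + recall); the zero-denominator guard and the names `Confusion` and `evaluate` are assumptions added for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Confusion:
    tp: int  # vulnerable gadgets flagged as vulnerable
    fp: int  # non-vulnerable gadgets flagged as vulnerable
    tn: int  # non-vulnerable gadgets flagged as non-vulnerable
    fn: int  # vulnerable gadgets missed


def _ratio(num: float, den: float) -> float:
    """Divide, returning 0.0 when the denominator is zero (an assumed convention)."""
    return num / den if den else 0.0


def evaluate(c: Confusion) -> Dict[str, float]:
    recall = _ratio(c.tp, c.tp + c.fn)        # true positive rate
    precision = _ratio(c.tp, c.tp + c.fp)
    return {
        "FPR": _ratio(c.fp, c.fp + c.tn),
        "FNR": _ratio(c.fn, c.tp + c.fn),
        "recall": recall,
        "precision": precision,
        "F-measure": _ratio(2 * precision * recall, precision + recall),
    }


if __name__ == "__main__":
    print(evaluate(Confusion(tp=80, fp=10, tn=90, fn=20)))
```

For TP = 80, FP = 10, TN = 90, FN = 20 this gives FPR = 0.1, FNR = 0.2, recall = 0.8, precision ≈ 0.889, and F-measure = 16/19 ≈ 0.842. Because FNR and recall share a denominator, lowering the false negative rate (the drawback attributed to VUDDY and VulPecker) is the same as raising recall.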


Preparing Input to VulDeePecker

• Program collection for answering the RQs
  • Two sources of vulnerability data:
    • 19 C/C++ open source products whose vulnerabilities are described in the NVD (National Vulnerability Database), and C/C++ test cases in SARD (Software Assurance Reference Dataset)
  • We collected 520 open source software program files and 8,122 test cases for the buffer error vulnerability (i.e., CWE-119), and 320 open source software program files and 1,729 test cases for the resource management error vulnerability (i.e., CWE-399)
• Training programs vs. target programs
  • Randomly choose 80% of the collected programs as training programs and the remaining 20% as target programs
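A minimal sketch of the 80%/20% random split. Splitting whole programs (as the slide states) keeps every code gadget of a given program on one side; the fixed seed, the rounding of the cut point, and the name `split_programs` are assumptions added for reproducibility and illustration.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_programs(programs: Sequence[T],
                   train_fraction: float = 0.8,
                   seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle programs with a fixed seed and split them into training and target sets."""
    shuffled = list(programs)
    random.Random(seed).shuffle(shuffled)  # seeded generator, so the split is reproducible
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    files = [f"program_{i}.c" for i in range(520)]  # hypothetical file names
    train, target = split_programs(files)
    print(len(train), len(target))  # prints 416 104
```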


Learning BLSTM Neural Networks

• Datasets for answering the RQs
  • Code Gadget Database (CGD): 61,638 code gadgets
  • Six datasets of CGD, each combining a vulnerability type (BE, RM, or HY) with a function-call selection (ALL or SEL)

BE: Buffer error vulnerabilities
RM: Resource management error vulnerabilities
HY: Hybrid of the above two types of vulnerabilities
ALL: All library/API function calls
SEL: Manually selected library/API function calls


RQ1

RQ1: Can VulDeePecker deal with multiple types of vulnerabilities at the same time?

• Insight: VulDeePecker can detect multiple types of vulnerabilities, but its effectiveness is sensitive to the amount of data (which is common in deep learning).

BE: 124 function calls related to vulnerabilities
RM: 16 function calls related to vulnerabilities


RQ2

RQ2: Can human intelligence (other than defining features) improve the effectiveness of VulDeePecker?

• Insight: Human expertise can be used to select function calls to improve the effectiveness of VulDeePecker.


RQ3: VulDeePecker vs. Static Analysis Tools

RQ3: How effective is VulDeePecker when compared with other approaches?

• Insight: A deep learning-based vulnerability detection system can be more effective by taking advantage of data-flow information.


RQ3: VulDeePecker vs. Code Similarity-Based Approaches

RQ3: How effective is VulDeePecker when compared with other approaches?

• Insight: VulDeePecker is more effective than code similarity-based approaches.


Using VulDeePecker in Practice

• VulDeePecker detected 4 vulnerabilities that were not reported in the NVD but were “silently” patched by the vendors.
• These vulnerabilities are missed by most of the other vulnerability detection systems mentioned above.


Limitations and Open Problems

• Present design
  • Assumes that source code is available
  • Only deals with C/C++ programs
  • Only deals with vulnerabilities related to library/API function calls
  • Only accommodates data-flow information, but not control-flow information
  • Uses some heuristics
• Present implementation
  • Limited to the BLSTM neural network
• Present evaluation
  • The dataset only contains vulnerabilities related to buffer errors and resource management errors


Conclusion

• We initiate the study of using deep learning for vulnerability detection and discuss some preliminary guiding principles
• We present VulDeePecker and evaluate it from 3 perspectives
• We present the first dataset for evaluating deep learning-based vulnerability detection systems
  • https://github.com/CGCL-codes/VulDeePecker


New Results (after finishing the paper; in submission)

• Cope with all kinds of vulnerabilities (including those related to library/API function calls)
• Accommodate both data dependency and control dependency
• Detect 7 (potential) 0-day vulnerabilities and 8 silently patched vulnerabilities in 4 software products
• Some deep neural networks are more powerful than others


Takeaways

• The first deep learning-based vulnerability detection system, using a finer-granularity unit: the code gadget
• Guiding principles for deep learning-based vulnerability detection
• The first dataset for evaluating deep learning-based vulnerability detection systems


[email protected]

Data available at: https://github.com/CGCL-codes/VulDeePecker

Thanks!