CISC 879 - Machine Learning for Solving Systems Problems
Presented by: Satyajeet, Dept of Computer & Information Sciences
University of Delaware
Automatic Analysis of Malware Behavior using Machine Learning
Authors: Konrad Rieck, Philipp Trinius, Carsten Willems, and Thorsten Holz
Abstract & Introduction
• Malware poses a major threat to the security of computer systems.
• It is very diverse: viruses, internet worms, and trojan horses.
• The amount of malware is huge: millions of hosts are infected.
• Obfuscation and polymorphism impede detection at the file level.
• Dynamic analysis helps in characterizing and defending against malware.
Abstract & Introduction (contd.)
• A framework for the automatic analysis of malware behavior using machine learning.
• Clustering: the framework automatically identifies novel classes of malware with similar behavior.
• Classification: it assigns unknown malware to these discovered classes.
• An incremental approach combines both for behavior-based analysis.
Automatic Analysis of Malware Behavior
• Framework steps and procedure:
• Malware binaries are executed and monitored in a sandbox environment, which generates a report of the system calls and their arguments.
• The sequential reports are embedded in a vector space where each dimension is associated with a behavioral pattern.
• ML techniques are then applied to the embedded reports to identify and classify malware.
• Incremental analysis progresses by alternating between clustering and classification.
Report representation
• Reports can be textual or XML.
• Human-readable and suitable for computing general statistics.
• But not efficient for automatic analysis.
• Hence MIST (Malware Instruction Set).
• Inspired by the instruction sets used in processor design.
MIST
• Category: the category of the system call.
• Operation: reflects a particular system call.
• Arguments: represented as argblocks.
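To make the instruction layout concrete, here is a minimal Python sketch of encoding one monitored system call as a MIST-style line (category, operation, then argblocks). The category and operation codes, the CRC-32 hashing of arguments, and the names `SystemCall`, `argblock`, and `to_mist` are hypothetical illustrations, not the authors' actual encoding. We also assume that a "MIST level" controls how many argblocks are kept, which is one reading of the MIST 1 / MIST 2 settings in the results.

```python
import zlib
from dataclasses import dataclass

# Hypothetical codes for illustration only; not the values used by MIST.
CATEGORY_CODES = {"filesystem": 0x03, "registry": 0x09}
OPERATION_CODES = {
    ("filesystem", "create_file"): 0x01,
    ("registry", "set_value"): 0x05,
}


@dataclass
class SystemCall:
    category: str
    operation: str
    args: list[str]  # assumed ordered from least to most variable


def argblock(value: str) -> str:
    """Map an argument to a fixed-width hex token (CRC-32 is an arbitrary choice here)."""
    return f"{zlib.crc32(value.encode('utf-8')):08x}"


def to_mist(call: SystemCall, level: int) -> str:
    """Encode a call as 'CC OO | arg1 | ...', keeping only the first `level` argblocks."""
    cat = CATEGORY_CODES[call.category]
    op = OPERATION_CODES[(call.category, call.operation)]
    parts = [f"{cat:02x} {op:02x}"] + [argblock(a) for a in call.args[:level]]
    return " | ".join(parts)


call = SystemCall("filesystem", "create_file",
                  [r"C:\WINDOWS\system32\sample.exe", "GENERIC_WRITE"])
print(to_mist(call, level=0))  # 03 01
print(to_mist(call, level=1))  # category/operation plus the first argblock
```

Ordering arguments from least to most variable means that a low level keeps only the stable parts of a call, so reports of related samples are more likely to share instructions.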
Sandbox and MIST representation
Representation
• These sequential reports capture typical malware behavior, such as changing registry keys or modifying system files.
• They are still not suitable for efficient analysis techniques, hence the need to embed behavior reports in a vector space using instruction q-grams.
• This embedding expresses behavioral similarity geometrically, as the distance between vectors.
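The q-gram embedding and the distance computation can be sketched as follows. This assumes each report is a list of MIST instruction strings, that a dimension is set when its q-gram occurs in the report, and that vectors are normalized to unit length before comparison with the Euclidean distance. These weighting choices and the function names are assumptions for illustration.

```python
import math
from collections.abc import Sequence

QGram = tuple[str, ...]


def qgrams(report: Sequence[str], q: int) -> set[QGram]:
    """All contiguous runs of q instructions in a report."""
    return {tuple(report[i:i + q]) for i in range(len(report) - q + 1)}


def embed(report: Sequence[str], q: int = 2) -> dict[QGram, float]:
    """Sparse binary q-gram vector, normalized to unit Euclidean length."""
    grams = qgrams(report, q)
    if not grams:
        return {}
    weight = 1.0 / math.sqrt(len(grams))
    return {g: weight for g in grams}


def distance(x: dict[QGram, float], y: dict[QGram, float]) -> float:
    """Euclidean distance between two sparse vectors."""
    keys = x.keys() | y.keys()
    return math.sqrt(sum((x.get(k, 0.0) - y.get(k, 0.0)) ** 2 for k in keys))


a = ["03 01", "09 05", "03 02"]
b = ["03 01", "09 05", "03 07"]
print(distance(embed(a), embed(a)))  # 0.0
print(distance(embed(a), embed(b)))  # about 1.0: one shared q-gram out of two
```

Normalizing each vector keeps long reports from dominating the distance simply because they contain more q-grams.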
Clustering and Classification
• Once reports are embedded in a vector space, ML techniques can be applied.
• Clustering of behavior: identifies classes of malware with similar behavior.
• Classification of behavior: assigns malware to known classes of behavior.
• What makes this possible?
• Malware binaries form families of variants with similar behavior patterns!
Clustering and Classification (contd.)
Algorithms
• Prototype extraction: an iterative algorithm that extracts a small set of prototypes from the set of reports; the first prototype is chosen at random.
• Clustering using prototypes: initially each prototype forms its own cluster, and the algorithm repeatedly determines and merges the nearest pair of clusters.
• Classification using prototypes: learns to discriminate between classes of malware.
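A minimal sketch of prototype extraction and prototype-based clustering follows, assuming reports are already embedded as dense numeric vectors. Prototype extraction is written as a farthest-first traversal (first prototype picked at random, then repeatedly the report farthest from all current prototypes) that stops once every report lies within a hypothetical threshold `d_p` of a prototype. Clustering is written as complete-linkage agglomerative merging that stops when the nearest pair of clusters is farther apart than a hypothetical threshold `d_c`. The slides specify neither the linkage criterion nor the stopping rules, so both are assumptions.

```python
import math
import random


def extract_prototypes(points, d_p, seed=0):
    """Farthest-first prototype selection over a non-empty list of vectors.

    Returns (prototypes, assign): prototype indices, and for every point the
    index of its nearest prototype. Stops once all points are within d_p.
    """
    rng = random.Random(seed)
    n = len(points)
    dist = [math.inf] * n  # distance of each point to its nearest prototype
    assign = [-1] * n
    prototypes = []
    z = rng.randrange(n)  # the first prototype is chosen at random
    while True:
        prototypes.append(z)
        for i, p in enumerate(points):
            d = math.dist(p, points[z])
            if d < dist[i]:
                dist[i], assign[i] = d, z
        z = max(range(n), key=dist.__getitem__)  # point worst covered so far
        if dist[z] <= d_p:
            return prototypes, assign


def cluster_prototypes(points, prototypes, d_c):
    """Complete-linkage agglomerative clustering of the prototypes.

    Each prototype starts as its own cluster; the nearest pair is merged
    until even the nearest pair is farther apart than d_c.
    """
    clusters = [[z] for z in prototypes]

    def linkage(a, b):
        return max(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        d, a, b = min(
            (linkage(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if d > d_c:
            break
        merged = clusters.pop(b)  # b > a, so index a stays valid
        clusters[a].extend(merged)
    return clusters


def label_reports(assign, clusters):
    """Give every report the cluster id of its nearest prototype."""
    cluster_of = {z: cid for cid, members in enumerate(clusters) for z in members}
    return [cluster_of[z] for z in assign]


points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
protos, assign = extract_prototypes(points, d_p=0.5)
clusters = cluster_prototypes(points, protos, d_c=8.0)
print(label_reports(assign, clusters))
```

Clustering only the prototypes rather than every report is what keeps the quadratic linkage step affordable: its cost depends on the number of prototypes, not on the size of the corpus.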
Algorithms (contd.)
• For each report, the algorithm determines the nearest prototype among the clusters in the training data; if the report lies within a given radius of it, the report is assigned to that cluster.
• Otherwise the report is rejected and held back for later incremental analysis.
• Incremental analysis:
• Reports to be analyzed are received from a source.
• They are first classified using the prototypes of known clusters.
• This identifies variants of known malware for further analysis.
• Prototypes are then extracted from the remaining reports and clustered again.
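The rejection rule and one round of incremental analysis might look like this. `radius` is the classification radius, and `cluster_fn` is a hypothetical placeholder for prototype extraction plus clustering of the rejected reports; it must return new `(vector, label)` prototypes. Clustering the rejected reports in the same round and simply appending their prototypes to the known set are simplifications of this sketch, not details taken from the slides.

```python
import math


def classify(report, prototypes, radius):
    """Label of the nearest prototype if it lies within `radius`, else None (rejected).

    prototypes: list of (vector, label) pairs from known clusters.
    """
    best = min(prototypes, key=lambda pl: math.dist(report, pl[0]), default=None)
    if best is None or math.dist(report, best[0]) > radius:
        return None
    return best[1]


def incremental_step(batch, known, radius, cluster_fn):
    """One round of incremental analysis over a batch of embedded reports.

    Known variants are labeled by classification; rejected reports are passed
    to cluster_fn (hypothetical: prototype extraction + clustering), whose new
    (vector, label) prototypes are appended to `known` for later rounds.
    """
    labels, rejected = {}, []
    for i, report in enumerate(batch):
        label = classify(report, known, radius)
        if label is None:
            rejected.append(i)
        else:
            labels[i] = label
    if rejected:
        known.extend(cluster_fn([batch[i] for i in rejected]))
    return labels, rejected


def singleton_clusters(reports):
    # Toy stand-in for cluster_fn: each rejected report becomes its own class.
    return [(r, f"new-{k}") for k, r in enumerate(reports)]


known = [((0.0, 0.0), "family-A")]
print(incremental_step([(0.1, 0.0), (9.0, 9.0)], known, 1.0, singleton_clusters))
# ({0: 'family-A'}, [1])
print(incremental_step([(9.1, 9.0)], known, 1.0, singleton_clusters))
# ({0: 'new-0'}, [])
```

In the second round the report near (9, 9) is recognized as a variant of the class discovered in the first round, which is the alternation between classification and clustering described above.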
Experiments and Results
Evaluating components
• Prototype extraction
• Evaluated using precision, recall, and compression.
• Precision of 0.99 with the corpus compressed by 2.9% and 7%.
• Clustering
• Evaluated using the F-measure.
• F-measure of 0.93 (MIST 1) and 0.95 (MIST 2), better than the 0.881 of previous related work.
• Classification
• F-measure of 0.96 (MIST 1) and 0.99 (MIST 2).
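To show what the reported scores measure, here is a sketch of precision, recall, and F-measure for a clustering compared against reference labels. It assumes a definition commonly used for this task: precision counts, for each cluster, the reports carrying its most frequent label; recall counts, for each label, the reports in the single cluster holding most of them; the F-measure is their harmonic mean, F = 2PR / (P + R). Whether this matches the authors' exact definitions is an assumption, and `precision_recall_f` is a hypothetical helper.

```python
from collections import Counter


def precision_recall_f(clusters, labels):
    """clusters[i]: cluster id of report i; labels[i]: reference malware label of report i."""
    n = len(labels)
    pair_counts = Counter(zip(clusters, labels))
    best_per_cluster = Counter()  # largest same-label group inside each cluster
    best_per_label = Counter()    # largest share of each label inside one cluster
    for (c, y), k in pair_counts.items():
        best_per_cluster[c] = max(best_per_cluster[c], k)
        best_per_label[y] = max(best_per_label[y], k)
    p = sum(best_per_cluster.values()) / n
    r = sum(best_per_label.values()) / n
    return p, r, 2 * p * r / (p + r)


print(precision_recall_f([0, 0, 1, 1], ["a", "a", "b", "b"]))  # (1.0, 1.0, 1.0)
print(precision_recall_f([0, 0, 0, 0], ["a", "a", "b", "b"]))  # (0.5, 1.0, 0.6666666666666666)
```

Putting everything in one cluster keeps recall perfect but halves precision, while splitting every report into its own cluster does the opposite; the F-measure penalizes both extremes.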
Experiments and Results (contd.)
Experiments and Results (contd.)
Conclusion
• A new framework is introduced that overcomes several deficiencies of previous work.
• The framework is learning-based.
• The framework can be implemented in practice.
• Steps: collect malware, study it in a sandbox environment, embed the observed behavior in a vector space, and apply learning algorithms (clustering and classification).
• The process is efficient and learns automatically after the initial setup and run.
Thank you!