Domain Generation Algorithm Malware

Enrico Hugo, CFP, CEH

ID-CERT Malware Summit II

13 April 2017 | Graha Merah Putih PT Telkom Indonesia | Bandung, Indonesia

About Me

Enrico Hugo, CFP, CEH

Bachelor of Science in Computer Science at Binus International

Ex-IT Security Intern at CBN

enrico.hugo [at] yahoo.co.id

http://www.linkedin.com/enricohugo

I have just finished my undergraduate studies at Binus University International, Indonesia

Current Research Interests

• DNS Analysis

• Netflow Analysis

• Data Mining

• Machine Learning

Community

• Indonesia Honeynet Project - Member

Agenda

• Domain Name System and its threats

• Domain Generation Algorithm

• Environment Setup

• Detecting DGA

• DGA Case Study

• Possible Improvements

• Conclusion

Domain Name System and its threats

Domain Name System (DNS)

• Phonebook system that maps domain names to IP addresses

• Also supports reverse lookup to find the domain name that corresponds to an IP address

• Provides a caching system

• Has not been upgraded for security since its first release, unlike telnet (replaced by ssh) or ftp (replaced by sftp)

DNS Threats

• DNS cache poisoning

• DNS tunneling

• DNS amplification attack

• Domain Generation Algorithm

• DNS Fast Flux

• and many more ...

Domain Generation Algorithm

What is Domain Generation Algorithm?

Domain generation algorithms (DGAs) are algorithms seen in various families of malware that periodically generate a large number of domain names, which can be used as rendezvous points with their command and control servers.
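To make the idea concrete, here is a minimal toy sketch of a date-seeded generator. It is not the algorithm of any real malware family: the function name `toy_dga`, the SHA-256 seeding scheme, the label length and the `.example` suffix are all assumptions chosen for illustration. The point it shows is that anyone who knows the algorithm and the date can compute the same list of names.

```python
import hashlib
from datetime import date


def toy_dga(day: date, count: int = 5, tld: str = ".example") -> list[str]:
    """Derive `count` pseudo-random domain names from a calendar date.

    Illustrative only: the seeding scheme is an assumption, not taken
    from any real malware family.
    """
    domains = []
    for index in range(count):
        # The seed depends only on the date and a counter, so every party
        # that knows the algorithm derives the same names for the same day.
        seed = f"{day.isoformat()}-{index}".encode("ascii")
        digest = hashlib.sha256(seed).hexdigest()
        # Map the first 12 hex digits onto the letters a-p to form the label.
        label = "".join(chr(ord("a") + int(ch, 16)) for ch in digest[:12])
        domains.append(label + tld)
    return domains


if __name__ == "__main__":
    for name in toy_dga(date(2016, 11, 15)):
        print(name)
```

Because the output changes every day, a static blocklist of yesterday's names does not cover today's names, which is why detection tends to rely on behaviour and name features instead.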

DGA Characteristics

• NXDOMAIN responses

• Usually random in the second-level (2LD) or third-level (3LD) domain

• A lot of requests from the same IP address

• Ranges from completely unreadable words (not compliant with Zipf’s Law) to dictionary words (harder to detect).
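The first and third characteristics can be combined into a simple check: count failed lookups per client and flag the clients with many of them. The sketch below assumes DNS logs have already been parsed into `(client IP, query name, response code)` tuples; the record layout, the sample addresses and the threshold are hypothetical.

```python
from collections import Counter

# Hypothetical parsed log records: (client IP, queried name, response code).
SAMPLE_LOG = [
    ("192.0.2.10", "qwkzjvtr.example", "NXDOMAIN"),
    ("192.0.2.10", "pzxrmbqd.example", "NXDOMAIN"),
    ("192.0.2.10", "hvtqkslw.example", "NXDOMAIN"),
    ("192.0.2.20", "www.example.com", "NOERROR"),
    ("192.0.2.20", "typo-exmaple.com", "NXDOMAIN"),
]


def failed_lookups_per_client(records, rcodes=("NXDOMAIN", "SERVFAIL")):
    """Count lookups per client whose response code is in `rcodes`."""
    counts = Counter()
    for client, _name, rcode in records:
        if rcode in rcodes:
            counts[client] += 1
    return counts


def suspicious_clients(records, threshold=3):
    """Return, sorted, the clients with at least `threshold` failed lookups."""
    counts = failed_lookups_per_client(records)
    return sorted(ip for ip, n in counts.items() if n >= threshold)


if __name__ == "__main__":
    print(suspicious_clients(SAMPLE_LOG))
```

A single mistyped name also produces an NXDOMAIN, so a threshold (here an arbitrary 3) is needed to separate occasional typos from a host that fails lookups in bulk.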

Malware families using DGA

• Kraken

• Conficker

• Gameover Zeus

• Pykspa

• Mad Max

• PandaBanker

• Pushdo

• Ramnit

• Cryptolocker

• Dyre

• Darkshell

• Locky

• Srizbi

• Torpig

• Virut

• etc.

Environment Setup

Detecting DGA

Detecting DGA - Zipf’s Law

Zipf's law states that given some corpus of natural language utterances, the frequency of any word is inversely proportional to its rank in the frequency table. Thus the most frequent word will occur approximately twice as often as the second most frequent word, three times as often as the third most frequent word, and so on.
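In other words, the count at rank r is roughly the rank-1 count divided by r. A minimal sketch of that comparison follows, assuming the input is already a list of tokens. What counts as a token for DNS data (words, labels or character n-grams) is left open here, and the helper names and toy corpus are hypothetical.

```python
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, token, count) tuples, most frequent token first."""
    ranked = Counter(tokens).most_common()
    return [(rank, tok, n) for rank, (tok, n) in enumerate(ranked, start=1)]


def zipf_expected(top_count, rank):
    """Ideal Zipf count at `rank`, given the count of the rank-1 token."""
    return top_count / rank


if __name__ == "__main__":
    # Hypothetical toy corpus constructed to follow the law exactly.
    tokens = ["the"] * 12 + ["of"] * 6 + ["and"] * 4 + ["to"] * 3
    table = rank_frequency(tokens)
    top = table[0][2]
    for rank, tok, n in table:
        print(rank, tok, n, zipf_expected(top, rank))
```

Text made of real words tends to stay close to the expected column, while uniformly random strings produce a much flatter table.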

DGA Monitor

Detecting DGA - Hierarchical Clustering

Level 1

• Query Length

• Numeric Chars

Level 2

• Unreadable Bigram Ratio

• Consonant-Vowel Ratio

Level 3

• Squared Value of Numeric Chars

Level 4

• Maximum Consonant Sequence Length

• Maximum Label Length

Level 5

• 2LD Frequency Score

• 3LD Frequency Score

[Diagram labels, in slide order: 5 clusters, 2 clusters, 2 clusters, 2 clusters, 3 clusters]
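Several of the features listed above are simple counts over the query name. The sketch below shows one way they could be computed. The exact definitions used in the talk are not given, so the normalisation (lower-casing, stripping a trailing dot, counting dots in the length) and all function names are assumptions.

```python
def normalise(domain: str) -> str:
    """Lower-case the name and drop a trailing root dot (an assumption)."""
    return domain.lower().rstrip(".")


def query_length(domain: str) -> int:
    """Level 1: number of characters in the query name, dots included."""
    return len(normalise(domain))


def numeric_chars(domain: str) -> int:
    """Level 1: number of digit characters in the query name."""
    return sum(ch.isdigit() for ch in normalise(domain))


def squared_numeric_chars(domain: str) -> int:
    """Level 3: square of the digit count."""
    return numeric_chars(domain) ** 2


def max_label_length(domain: str) -> int:
    """Level 4: length of the longest dot-separated label."""
    return max((len(label) for label in normalise(domain).split(".")), default=0)


def count_features(domain: str) -> dict:
    """Collect the count-based features into one record."""
    return {
        "query_length": query_length(domain),
        "numeric_chars": numeric_chars(domain),
        "squared_numeric_chars": squared_numeric_chars(domain),
        "max_label_length": max_label_length(domain),
    }


if __name__ == "__main__":
    print(count_features("a1b2.example.com"))
```

Each record can then be fed to whichever clustering method is applied at the corresponding level.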

UBRatio and CVRatio
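A minimal sketch of how these two ratios could be computed. Both definitions are assumptions: CVRatio is taken as consonant letters divided by vowel letters (with "aeiou" as the vowels), and UBRatio as the share of character bigrams that never occur in a reference word list. The tiny `REFERENCE_WORDS` list is hypothetical; a real deployment would use a large dictionary.

```python
VOWELS = set("aeiou")

# Hypothetical reference vocabulary, far too small for real use.
REFERENCE_WORDS = ["google", "mail", "search", "network", "domain", "online"]


def bigrams(text: str) -> list[str]:
    """Return all overlapping two-character substrings of `text`."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


# Bigrams seen in the reference words are treated as "readable".
READABLE_BIGRAMS = {bg for word in REFERENCE_WORDS for bg in bigrams(word)}


def cv_ratio(name: str) -> float:
    """Consonant letters divided by vowel letters (assumed definition)."""
    letters = [ch for ch in name.lower() if ch.isalpha()]
    vowels = sum(ch in VOWELS for ch in letters)
    consonants = len(letters) - vowels
    # Avoid division by zero for names without vowels.
    return consonants / max(vowels, 1)


def ub_ratio(label: str) -> float:
    """Share of bigrams in `label` not found in the readable set."""
    grams = bigrams(label.lower())
    if not grams:
        return 0.0
    unreadable = sum(1 for g in grams if g not in READABLE_BIGRAMS)
    return unreadable / len(grams)


if __name__ == "__main__":
    for name in ("google", "xqzk"):
        print(name, cv_ratio(name), ub_ratio(name))
```

Under these definitions both ratios rise for random-looking labels, since those contain few vowels and many letter pairs that do not occur in ordinary words.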

Maximum Consonant Sequence Length (MCSLen)

• google.com -> 2 characters

• domobhdst.net -> 5 characters

Algorithmically-generated domains tend to have longer Maximum Consonant Sequence Length (MCSLen).
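The two examples above can be reproduced with a short scan over the name. This sketch assumes that the vowels are "aeiou" (so "y" counts as a consonant) and that dots, digits and hyphens end a consonant run; the function name `mcslen` is hypothetical.

```python
VOWELS = set("aeiou")


def mcslen(domain: str) -> int:
    """Length of the longest run of consecutive consonant letters."""
    longest = current = 0
    for ch in domain.lower():
        if ch.isalpha() and ch not in VOWELS:
            # Extend the current consonant run.
            current += 1
            longest = max(longest, current)
        else:
            # Vowels, digits, dots and hyphens end the run.
            current = 0
    return longest


if __name__ == "__main__":
    print(mcslen("google.com"))     # 2 ("gl")
    print(mcslen("domobhdst.net"))  # 5 ("bhdst")
```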

Detecting DGA - Hierarchical Clustering

Cluster Descriptions

Clustering Results

Case Study

Case Study – The Discovery of Pykspa Malware

[Tables: Top 20 IP addresses by number of blocked DNS requests on 2, 8 and 14 November 2016]

The "N times" column shows the number of DNS requests from an IP address that were blocked (by Palo Alto).

As can be seen, 210.210.150.30 is on all of the lists shown. Only three days of samples are shown in this slide, but the IP address is in fact on the Top 20 list every day, which is suspicious.
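The check described here, an address that stays on the daily Top 20 list, can be automated by intersecting the daily lists. The sketch below assumes the blocked-request counts are available as one `{ip: count}` mapping per day; the data layout, the sample addresses and the counts are hypothetical.

```python
from collections import Counter


def top_n(blocked_counts: dict, n: int = 20) -> set:
    """Return the `n` addresses with the most blocked requests for one day."""
    return {ip for ip, _count in Counter(blocked_counts).most_common(n)}


def persistent_sources(daily_counts: dict, n: int = 20) -> set:
    """Return the addresses that are in the top-`n` list on every day."""
    daily_sets = [top_n(counts, n) for counts in daily_counts.values()]
    if not daily_sets:
        return set()
    return set.intersection(*daily_sets)


if __name__ == "__main__":
    # Hypothetical counts; n=2 keeps the example small.
    sample = {
        "day1": {"192.0.2.1": 50, "192.0.2.2": 30, "192.0.2.3": 5},
        "day2": {"192.0.2.1": 40, "192.0.2.3": 35, "192.0.2.2": 1},
        "day3": {"192.0.2.1": 60, "192.0.2.2": 20, "192.0.2.3": 2},
    }
    print(persistent_sources(sample, n=2))
```

An address that survives the intersection over many days is a good candidate for the host-level investigation that follows.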

Case Study – Steps of Detection

• Deploy a Dionaea honeypot on the same subnet

• Direct SSH access

• List running processes using ps aux

• See resource consumption using top

• Find the suspected file location using find

• Upload the files to VirusTotal

– sujeljlanddrcsuj.exe => KillAV Trojan

– vmqaw.exe => Pykspa Worm

• Pykspa is reported to spread through Skype, so I searched for Skype and found no running Skype instance, but did find a Skype installer file.

• Or ...

Case Study – Proof of Detection

• Johannes Bader (https://johannesbader.ch) reverse-engineered the Pykspa worm and worked out its DGA, which produces many noisy (camouflage) domains and some useful (intended) domains.

• Using his Python script, we get some of the domain names that Pykspa will use on the day the script is run, as seen in the next slide.

• The script: https://johannesbader.ch/2015/03/the-dga-of-pykspa/dga.zip

• 10 sample Pykspa DGA domains for 15 November 2016

Possible Improvements

• Improve DGA Monitor by creating blacklist and whitelist

• Find a method to confirm whether a given domain name is a DGA domain
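A minimal sketch of how a blacklist and a whitelist could be placed in front of a monitor. The DGA Monitor's internals are not described here, so the function `triage`, its three verdicts and the rule that the blacklist wins over the whitelist are all assumptions.

```python
def matches(name: str, entries: set) -> bool:
    """True if `name` equals an entry or is a subdomain of one."""
    return any(name == e or name.endswith("." + e) for e in entries)


def triage(domain: str, whitelist: set, blacklist: set) -> str:
    """Return "block", "allow" or "inspect" for a queried domain name."""
    name = domain.lower().rstrip(".")
    # Assumed precedence: a blacklisted name is blocked even if a parent
    # domain happens to be whitelisted.
    if matches(name, blacklist):
        return "block"
    if matches(name, whitelist):
        return "allow"
    # Unknown names go on to feature-based DGA detection.
    return "inspect"


if __name__ == "__main__":
    allow = {"example.com"}
    deny = {"bad.example"}
    for d in ("www.example.com", "bad.example", "xqzkvt.example"):
        print(d, triage(d, allow, deny))
```

Matching on whole labels (a leading dot before the entry) keeps a name such as notexample.com from being treated as a subdomain of example.com.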

Conclusion

• Blocked does not mean solved.

• Look for NXDOMAIN and SERVFAIL queries when detecting DGA

• It is necessary to be proactive, not reactive, by consistently performing Threat Hunting

Join Us

• http://www.ihpcon.id

• Indonesia Honeynet Project

• idhoneynet

• http://www.honeynet.or.id

• http://groups.google.com/group/id-honeynet

enrico.hugo [at] yahoo.co.id and +62 857 1631 5877
