Detection of malware through code reuse and Yara
Adam Burt
3rd August 2018
Abstract
This paper explores the use of Yara to detect malicious code samples in Windows
PE files, specifically by analysing code that is reused and writing detection rules for
this code. This paper also explores current methods of malware detection using Yara,
such as hash matching, string analysis and import hashes (ImpHashes), and their
deficiencies. Four Windows PE files, written in C, are used to demonstrate how
existing methods can be subverted: many of the properties of a PE file can change
whilst the functionality remains the same and, conversely, many properties of a PE
file can remain the same whilst the functionality changes.
1.1 Introduction
Code reuse in malware is becoming increasingly common. Code reuse allows
malware authors to release new variants more quickly than they otherwise could.
The code that is reused varies, but it tends to be used to introduce new
functionality into a malware variant.
As an example, the “notpetya” malware made use of code from the “petya”
malware to achieve its goal. Whilst “notpetya” was designed to look like
ransomware, its outcome was different from that of other ransomware variants.
Similarities between “notpetya” and “petya” were discovered in their behaviour
and in their code. Specifically, “petya” and “notpetya” used the same code for
Master File Table (MFT) encryption. The differences lie in how the boot loader
achieved this and how the boot loader was overwritten from a dropper.
“Most likely someone ripped the boot loader code straight out of Petya…”.
(MalwareTech, 2017)
Detecting malicious software is usually carried out using an Anti-Virus solution.
Anti-Virus solutions employ various techniques ranging from hash matching of files,
through to behaviour detection. These static and dynamic detections are usually used
sequentially, allowing for quicker detection with the primary static methods and
relying on dynamic methods as a secondary method.
It is a goal of malware to evade detection. To evade the static and simplistic
hash detection, malicious files are altered slightly to ensure a unique hash and
therefore a unique program, each time they are deployed. This can be through
polymorphic code which “uses a polymorphic engine to mutate while keeping the
original algorithm intact” (Wikipedia, 2017). Polymorphic Viruses can also “rely on
mutation engines to alter their decryption routines every time they infect a machine”
(Trend Micro, Unknown). In either case, the functionality of the malicious software
remains the same. It is this functionality that results in particular actions being
carried out on a system, that can be detected using behavioural engines. It is also
this functionality that can be targeted using more advanced static detection methods.
‘Yara’ is a pattern matching tool that is used to “create descriptions of malware
families (or whatever you want to describe) based on textual or binary patterns”
(Alvarez, Unknown). A rule within Yara “consists of a set of strings and a Boolean
expression which determine its logic” (Alvarez, Unknown). Diagram 1 shows a
simple example of what a Yara rule may look like.
Diagram 1 (Alvarez, Welcome to YARA’s documentation! - yara 3.7.0
documentation, 2018)
Yara can also load modules that extend its scanning functionality. To target PE
files specifically, “The PE
module allows you to create more fine-grained rules for PE files by using attributes
and features of the PE file format. This module exposes most of the fields present in
a PE header and provides functions which can be used to write more expressive and
targeted rules”. (Alvarez, PE module - yara 3.7.0 documentation, 2018). Within this
module, Yara can calculate an Import Table Hash from the PE file and use it in the
Yara rule. An Import Table Hash was first discussed by FireEye / Mandiant.
Mandiant describe the Import Table Hash: “Mandiant creates a hash based on
library/API names and their specific order within the executable. We refer to this
convention as an ImpHash (for "import hash")” (Mandiant, 2018). This method was
first discussed in FireEye’s paper entitled “Supply Chain Analysis: From
Quartermaster to Sunshop” (FireEye, 2014). The hash created by Yara is an
MD5 hash of the PE’s import table and is based on the techniques discussed by
FireEye / Mandiant (Alvarez, PE module - yara 3.7.0 documentation, 2018). Another
module commonly used is the “Hash” module. “The Hash module allows you to
calculate hashes (MD5, SHA1, SHA256) from portions of your file and create
signatures based on those hashes.” (Alvarez, Hash module - yara 3.7.0 documentation,
2018).
2 PE samples used in examinations
The samples that were created for examination for this paper were written in C
and carry out minimal functionality. The sample names are:
• TestApp1.exe
• TestApp2.exe
• TestApp3.exe
• TestApp4.exe
Each sample carries out the following functions:
1. Creates a file in the directory the EXE is executed from called
“mytestfile.txt”.
2. Writes a string, “This is test data”, into the newly created file.
3. If successful, prints “Data Written Successfully” to the console screen.
4. If unsuccessful, prints “Data could not be written” to the console screen.
5. In either case, waits for the user to press a key and then exits.
The only addition to the above chain of events is in the sample named
“TestApp4.exe”. In this sample, steps 1-4 above are carried out, followed by:
1. Opens the first key in the registry location
“HKEY_LOCAL_MACHINE\SOFTWARE\” and prints the data
from the “Path” value (if it exists).
The “TestApp4.exe” application then carries out step 5 above (awaiting user
input before closing).
3 Examining the MD5 hash and strings
Each of the test applications was run through ‘FCIV’ (Microsoft, 2018), a tool
created by Microsoft. FCIV provides the calculated MD5 sum of each
of the files. The results were as follows:
• TestApp1.exe – e69b8e7cd74113d6c94565740dc1e8ff
• TestApp2.exe – b239be61cffd1607ea702bdf263168b0
• TestApp3.exe – 12fbca2eeb557f6b15882f5a373a52e5
• TestApp4.exe – e13c11c301c43a9af1790da13708b5b4
Each file was unique in that the raw content of each file was different. A
difference of even 1 bit of data results in a different MD5 sum
(excluding collisions) (Selinger, 2018). Using Yara to detect these files, and any
related files that follow, therefore requires the MD5 of each one. The code below
represents a Yara rule that will detect these samples:
import "hash"

rule TestAppHash
{
    meta:
        author = "Adam Burt"
        description = "Detect based on MD5 hash"
        date = "2018-08-03"
    condition:
        hash.md5(0, filesize) == "e69b8e7cd74113d6c94565740dc1e8ff" or
        hash.md5(0, filesize) == "b239be61cffd1607ea702bdf263168b0" or
        hash.md5(0, filesize) == "12fbca2eeb557f6b15882f5a373a52e5" or
        hash.md5(0, filesize) == "e13c11c301c43a9af1790da13708b5b4"
}

When testing this rule using the Yara64.exe binary against all of the TestApp
files we receive a positive match. Diagram 2 shows the output from running
‘Yara64.exe’ against the test EXE files using the above rule.
Diagram 2 – Results of hash matching Yara rule against test files.
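The sensitivity of an MD5 sum to a single-bit change, noted above, can be illustrated with a minimal Python sketch. The byte string used here is a hypothetical stand-in for a file's raw content, not one of the TestApp samples, and the helper names are illustrative only.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as a lowercase hex string."""
    return hashlib.md5(data).hexdigest()


def flip_bit(data: bytes, byte_index: int, bit: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    changed = bytearray(data)
    changed[byte_index] ^= 1 << bit
    return bytes(changed)


# Hypothetical stand-in for the raw content of a sample file.
original = b"MZ" + bytes(62) + b"This is test data"
modified = flip_bit(original, byte_index=10, bit=0)

print(md5_hex(original))
print(md5_hex(modified))
print("hashes differ:", md5_hex(original) != md5_hex(modified))
```

A rule that lists whole-file hashes therefore needs a new entry for every such variation, however small.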
This means that, whilst file hashes are enough to detect these samples (which are
all related), the resource required to track and create detections for individual
hashes soon becomes overwhelming. Because each file can naturally or forcibly
produce a different hash value, we will look closer inside the TestApp files for
commonalities. We know that each TestApp file creates a file called “mytestfile.txt”
and writes the string “This is test data” into that file. They also print the same
strings to the console depending on whether they succeed. We make use of the tool
“BinText” by McAfee / Foundstone (McAfee, 2018) to examine the strings within
the file “TestApp1.exe”. Diagram 3 shows the strings that are of interest, present
in “TestApp1.exe”.
Diagram 3 – Strings present within “TestApp1.exe”
By using these strings, we can create a Yara rule based on string pattern
matching. We write the following rule:
rule TestAppStrings
{
    meta:
        author = "Adam Burt"
        description = "Detect based on strings"
        date = "2018-08-03"
    strings:
        $a1 = "This is test data" ascii
        $a2 = "mytestfile.txt" ascii
        $a3 = "Data could not be written" ascii
        $a4 = "Data written successfully" ascii
    condition:
        all of them
}
Subsequently, we can test this rule using ‘Yara64.exe’ on all the TestApp
sample files. Diagram 4 shows the results.
Diagram 4 – Matching a Yara string rule against all TestApp samples.
Why do we not get a match on all the files? Quite simply, the strings are not
present in every sample file. Each sample writes the same string of data to the same
filename and then displays the same success or failure message at run time.
The difference lies in how the strings are stored in each sample. We again use the tool
“BinText” to display the strings within the “TestApp1.exe” and “TestApp2.exe”
sample files. When comparing “TestApp1.exe” and “TestApp2.exe” there are
noticeable differences in the text being used for file creation, the strings being written
to the file and also the resulting message displayed in the console. Diagram 5 shows
the differences (highlighted) between “TestApp1.exe” and “TestApp2.exe”.
Diagram 5 – String differences in “TestApp1.exe” and “TestApp2.exe”
The strings are different because they have been encrypted, albeit a simple
XOR encryption with a single byte. This introduces two major differences in the
files:
1. A new string (or in this case a single byte) used for the XOR
encryption of the string.
2. A new routine that decrypts the strings in memory before being used.
This simple change rules out another method for detecting commonality
between these two sample files – their strings. A rule could be written in Yara to
detect the strings present in each file, such as:
rule TestAppMoreStrings
{
    meta:
        author = "Adam Burt"
        description = "Detect based on clear-text and encoded strings"
        date = "2018-08-03"
    strings:
        $a1 = "This is test data" ascii
        $a2 = "mytestfile.txt" ascii
        $a3 = "Data could not be written" ascii
        $a4 = "Data written successfully" ascii
        $b1 = "Ecuc!epwmf!ppv!df\"xtjvugo" ascii
        $b2 = "Ecuc!yskuvfp!uvedgtugwmnz" ascii
        $b3 = "n{ugtvgkmg/vyv" ascii
        $b4 = "Ujju!kt\"ugtv!fbvb" ascii
    condition:
        all of ($a*) or all of ($b*)
}
When using the above rule against all TestApp samples, Diagram 6 shows the
results.
Diagram 6 – Using multiple strings to detect samples
Whilst this has detected all samples, in the real world these samples would also
have varying string content and varying XOR keys or encryption ciphers to handle
them. As with MD5 hashes, tracking the strings in each sample requires a large
amount of resource and will soon become overwhelming. If this method were to
be carried out, it would be just as viable to use MD5 sums for detection.
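The relationship between the clear-text ($a) and encoded ($b) strings in the TestAppMoreStrings rule can be checked with a short Python sketch. The text describes the encoding as a single-byte XOR; the strings listed in the rule, however, appear to be consistent with a slightly different simple scheme in which an alternating value of 1 and 2 is added to each character. This is an observation drawn only from the listed strings, not from the sample source code, and the function names below are illustrative.

```python
from itertools import cycle

# Encoded ($b*) strings mapped to the clear-text ($a*) strings they appear
# to correspond to. The \" escape used in the Yara rule is a plain " here.
PAIRS = {
    'Ecuc!epwmf!ppv!df"xtjvugo': "Data could not be written",
    "Ecuc!yskuvfp!uvedgtugwmnz": "Data written successfully",
    "n{ugtvgkmg/vyv": "mytestfile.txt",
    'Ujju!kt"ugtv!fbvb': "This is test data",
}


def decode(encoded, key=(1, 2)):
    """Subtract a repeating key from each character code (assumed scheme)."""
    return "".join(chr((ord(c) - k) % 256) for c, k in zip(encoded, cycle(key)))


def xor_candidates(encoded, expected):
    """Return every single-byte XOR key that maps encoded onto expected."""
    return [k for k in range(256)
            if "".join(chr(ord(c) ^ k) for c in encoded) == expected]


for encoded, expected in PAIRS.items():
    print(repr(decode(encoded)),
          "matches:", decode(encoded) == expected,
          "single-byte XOR keys:", xor_candidates(encoded, expected))
```

Whatever the exact cipher, the point made above stands: every change of key or scheme produces a new set of strings that a string-based rule would need to track.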
4 Examining the import tables
So far, both hashes and string content have been ruled out as a sustainable
method for detecting families of malware. Maintaining a library of hash values and
strings is resource intensive and cannot keep up with the mutation algorithms /
engines available for varying these parameters in files. Looking at the Import Table
Hashes may provide a more accurate method for detecting malware samples. Each
of the example files carries out similar tasks and therefore should have a similar
ImpHash associated with it. By using VirusTotal, we can upload each TestApp
EXE file to calculate its ImpHash. “VirusTotal inspects items with over 70
antivirus scanners and URL/domain blacklisting services, in addition to a myriad
of tools to extract signals from the studied content.” (VirusTotal, 2018). Diagram
7 shows the results of uploading the samples to VirusTotal.
Diagram 7 – File uploads into VirusTotal
The ImpHashes are calculated for each file and are as follows:
• TestApp1.exe – 9ea0752d8b73240994b03d0c502b2bd1
• TestApp2.exe – ce4bf5ebe8bd1fabe49efb61ec8de70e
• TestApp3.exe – 2004d3668e8b2cd2f627bd16a066c9f5
• TestApp4.exe – 2004d3668e8b2cd2f627bd16a066c9f5
Each ImpHash is different apart from those of “TestApp3.exe” and “TestApp4.exe”.
The reasons for this are simple. Each TestApp file has different Import Table
requirements to carry out its functionality. These were modified to show
that, whilst the functionality of a file remains the same, the content (strings and
hash), the Import Tables and consequently the ImpHashes can change. As
with MD5 hashing, this can be difficult to maintain and soon becomes resource
intensive and overwhelming. TestApp files one through to three were all forcibly
modified to show that the attacker, or malware author, can control the ImpHash.
When basing malware categorisation or grouping on ImpHash, samples can be
missed by purposefully modifying the Import Tables.
The same principle applies to “TestApp3.exe” and “TestApp4.exe”. They
have the same ImpHash; however, each file carries out different functionality. The
code in “TestApp4.exe” was modified so that its Import Table looks the same as
that of “TestApp3.exe”. The difference in functionality is achieved by resolving the
additional functions at run time, rather than listing them in the Import Table.
Whilst this paper is not focussed on this technique, a short description follows.
When a program calls a function that has a publicly known (or sometimes
unknown) API, it can make use of an existing library. An example of this is calling
the “RegGetValue” function, which is made available through “Windows.h” within
Microsoft Windows. When a program containing this function is compiled, the Import
Table includes the “ADVAPI32.DLL” module and the “RegGetValueA” or
“RegGetValueW” function within that module. A program can be altered so that
“RegGetValueA” or “RegGetValueW” is instead called directly from the
“ADVAPI32.DLL” module at execution time, by loading the library and
locating the function itself rather than relying on the Import Table. This generally
involves calling the “LoadLibrary” function to get a handle to “ADVAPI32.DLL”
and then calling “GetProcAddress” to obtain the address of the function. When
inspecting an EXE file that calls a function this way, the Import Table only indicates
that “KERNEL32.DLL” is required and that the “LoadLibrary” and “GetProcAddress”
functions are required. This means the malware author’s EXE file can always have
the same ImpHash but carry out varying functions.
This technique allows the Import Table of a legitimate EXE file, perhaps something
such as “notepad.exe”, to be replicated by a file that carries out
malicious functionality. If it was the intent of the malware author to spoof the
Import Table of another program, and Import Tables were used to detect
malware samples, there would soon be many false positives detected.
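The two weaknesses described in this section can be sketched in Python. The function below is a simplified version of the commonly used ImpHash convention (module name lowercased with its extension removed, function name lowercased, entries joined in import order, then MD5); it ignores imports by ordinal. The import lists are hypothetical and are not the real Import Tables of the TestApp files, so the resulting hashes will not match those reported by VirusTotal.

```python
import hashlib

# Extensions removed from module names under the common ImpHash convention.
_STRIPPED_EXTENSIONS = ("dll", "ocx", "sys")


def imphash(imports):
    """MD5 over 'module.function' entries, lowercased, in import-table order."""
    entries = []
    for module, function in imports:
        name = module.lower()
        stem, _, ext = name.rpartition(".")
        if stem and ext in _STRIPPED_EXTENSIONS:
            name = stem
        entries.append(f"{name}.{function.lower()}")
    return hashlib.md5(",".join(entries).encode("ascii")).hexdigest()


# Hypothetical import list for a program that writes a file.
file_writer = [
    ("KERNEL32.dll", "CreateFileA"),
    ("KERNEL32.dll", "WriteFile"),
    ("KERNEL32.dll", "CloseHandle"),
]
# Same functions, different order: same capability, different ImpHash.
reordered = [file_writer[1], file_writer[0], file_writer[2]]

# Hypothetical program that resolves everything else at run time, so its
# Import Table never reveals what it actually calls.
loader_only = [
    ("KERNEL32.dll", "LoadLibraryA"),
    ("KERNEL32.dll", "GetProcAddress"),
]

print(imphash(file_writer))
print(imphash(reordered))
print(imphash(loader_only))
```

Any two programs whose Import Tables contain only the loader functions produce the same value from this sketch, regardless of what they go on to do.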
5 Examining the reused code
Within each of the TestApp programs from two through to four, there is code
reuse being utilised. “TestApp1.exe” is being ignored on the basis that malicious
code rarely has clear text to be identified. To find this code reuse, the programs
need to be reverse engineered and decompiled to examine how they function. Hex-
Rays IDA (Hex-Rays, 2018) is used to look at the code involved with each TestApp
EXE file.
We know that each TestApp from two through to four utilises a decryption
routine to decrypt various strings. In this example, it is this decryption routine
that should be targeted as part of a common routine or code reuse detection. In
“TestApp2.exe”, “TestApp3.exe” and “TestApp4.exe”, the “CreateFileA”
function is found and, just prior to it, an encrypted string is run through a routine.
Diagram 8 shows the routine for “TestApp2.exe”, Diagram 9 shows the routine
for “TestApp3.exe” and Diagram 10 shows the routine for “TestApp4.exe”.
Diagram 8 – Potential decryption routine being called in “TestApp2.exe”
Diagram 9 – Potential decryption routine being called in “TestApp3.exe”
Diagram 10 – Potential decryption routine being called in “TestApp4.exe”
Taking a closer look at these routines, each does appear to be a decryption routine
(albeit a very simple one). Diagram 11 shows the routine present in
“TestApp2.exe”, Diagram 12 shows the routine present in “TestApp3.exe” and
Diagram 13 shows the routine present in “TestApp4.exe”.
Diagram 11 – The decryption routine in “TestApp2.exe”
Diagram 12 – The decryption routine in “TestApp3.exe”
Diagram 13 – The decryption routine in “TestApp4.exe”
Each of these routines has a very similar code base. Whilst some of the
registers used in the opcodes change, the general function is still the same.
Yara allows for this variation through hexadecimal strings containing wildcards
and jumps, which work in a similar way to regular expressions and can allow for
small, or even large, variations in code. In the routines
above we can take a look at some similar code (as byte code):
• “TestApp2.exe”
o 8a 08 40 84 c9 75 f9 2b c2 83 c0 02 6a 01 50 e8 d5 01 00 00 8b
• “TestApp3.exe”
o 8a 08 40 84 c9 75 f9 2b c2 83 c0 02 6a 01 50 e8 55 02 00 00 8b
• “TestApp4.exe”
o 8a 08 40 84 c9 75 f9 2b c2 83 c0 02 6a 01 50 e8 17 03 00 00 8b
The similarities here are quite obvious, except for the 2 bytes after the “50 e8” and
before the “00 00 8b”. These bytes belong to the 4-byte relative offset of the call
instruction (e8), so the variation can be allowed for with a 4-byte jump in the
Yara hex string, such that the pattern becomes:
$textdecryptionroutine = { 8a 08 40 84 c9 75 f9 2b c2 83 c0 02 6a 01 50 e8 [4] 8b }
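The effect of the 4-byte jump can be sketched outside Yara by translating the hex string into a byte-level regular expression in Python. This is a simplified translation covering only plain bytes, “??” wildcards and jumps; it is not how Yara itself implements matching, and the function name is illustrative.

```python
import re


def hex_pattern_to_regex(pattern):
    """Translate a subset of Yara hex-string syntax into a bytes regex.

    Supported tokens: two-digit hex bytes, '??' wildcards and jumps written
    as [n] or [a-b]. Nibble wildcards and alternatives are not handled.
    """
    parts = []
    for token in pattern.split():
        if token == "??":
            parts.append(b".")
        elif token.startswith("[") and token.endswith("]"):
            low, _, high = token[1:-1].partition("-")
            bounds = "{%d,%d}" % (int(low), int(high or low))
            parts.append(b"." + bounds.encode("ascii"))
        else:
            parts.append(re.escape(bytes([int(token, 16)])))
    # DOTALL so that '.' also matches the newline byte 0x0a.
    return re.compile(b"".join(parts), re.DOTALL)


ROUTINE = hex_pattern_to_regex(
    "8a 08 40 84 c9 75 f9 2b c2 83 c0 02 6a 01 50 e8 [4] 8b"
)

# Byte sequences listed above for each sample.
SAMPLES = {
    "TestApp2.exe": "8a 08 40 84 c9 75 f9 2b c2 83 c0 02 6a 01 50 e8 d5 01 00 00 8b",
    "TestApp3.exe": "8a 08 40 84 c9 75 f9 2b c2 83 c0 02 6a 01 50 e8 55 02 00 00 8b",
    "TestApp4.exe": "8a 08 40 84 c9 75 f9 2b c2 83 c0 02 6a 01 50 e8 17 03 00 00 8b",
}

for name, hex_bytes in SAMPLES.items():
    matched = ROUTINE.search(bytes.fromhex(hex_bytes)) is not None
    print(name, "matches:", matched)
```

Only the call offset is skipped, so a change to any of the fixed bytes would prevent a match.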
As a quick test, the complete Yara rule would look something like:
rule TestApp2
{
    meta:
        author = "Adam Burt"
        description = "Detects all variants of TestApp using code matching and string searching"
        date = "03-08-2017"
    strings:
        $textdecryptionroutine = { 8a 08 40 84 c9 75 f9 2b c2 83 c0 02 6a 01
            50 e8 [4] 8b }
    condition:
        $textdecryptionroutine
}
When running this rule against the “TestApp2.exe”, “TestApp3.exe” and
“TestApp4.exe” we have matches. Diagram 14 shows the output.
Diagram 14 – Code matching against “TestApp2.exe”, “TestApp3.exe”
and “TestApp4.exe”
This Yara rule is effective against these three EXE files; however, the code that
it targets is not specific enough. If the same Yara rule is run against a local Windows
installation (with varying installed programs) there is a match against a DLL
provided as part of the Postbox Mail software. This file is as follows:
Name: mozcrt19.dll
MD5 hash: 8f9cded297d37a8b9ad691e6b08dcab2
By extending the pattern across more of the routine shared by the three sample
files, the risk of a false positive can be reduced. The following Yara
rule contains a longer pattern for matching the code:
rule TestApp
{
    meta:
        author = "Adam Burt"
        description = "Detects all variants of TestApp using code matching and string searching"
        date = "19-12-2017"
    strings:
        $textdecryptionroutine = { 83 ec 08 53 55 8b 6c ?? ?? 56 8b c5 57 bb
            01 [3] 8d 50 01 8a 08 40 84 c9 75 ?? 2b c2 83 c0 02 6a 01 50 e8 [4] 8b cd 83 c4 08 89
            44 [2] 33 f6 8d 79 01 8a 11 41 84 d2 75 ?? 2b cf 74 ?? 8b fd 2b f8 8b d0 89 7c [2] eb
            [0-8] 8a 04 17 2a c3 33 c9 66 83 fb 01 0f 94 c1 88 02 46 41 8b d9 8b cd 42 8d 79 01 8b
            ff 8a 01 41 84 c0 75 ?? 2b cf 3b f1 72 ?? 8b 44 [2] 5f 5e 5d 5b 83 c4 08 c3 }
    condition:
        $textdecryptionroutine
}
When this rule is run against the three files, “TestApp2.exe”,
“TestApp3.exe” and “TestApp4.exe”, it matches all three. It also does not match
any other file on the test operating system. Diagram 15 shows the results.
Diagram 15 – Code matching all three sample files and reducing false
positives
5.1 Conclusion
From the sample files generated for this research, it was demonstrated that
using hashes, string content or Import Tables can be problematic and lead to false
positives. Whilst these samples were very simple examples, their real-life counterparts
are not too dissimilar.
Detecting malware or groups of malware is resource intensive when using a
simple hash. It also becomes problematic when targeting strings within a file as
these are subject to change and are quite often encrypted. Using Import Table
Hashes (ImpHashes) produces false positives and can soon become redundant if
malware authors align their Import Tables to legitimate software. Outside of
behavioural detection, these static methods form a large basis for detecting
malware. By targeting routines within programs and looking for code reuse,
detection can be carried out more effectively. The trade-off, however, is the time
spent analysing these files. It takes a great deal of skill and effort to reverse
engineer even a simple program. Locating, matching and documenting common
routines or code reuse only adds to the resource and time required.
The more prevalent code reuse is in the malicious programs, the easier it is to
detect. Common code samples can be documented and Yara rules written for them.
In this process, at least, code reuse and therefore routine reuse becomes more
apparent. This helps identify what a program is potentially capable of, based on
known code patterns.
6 Bibliography
Alvarez, V. M. (2018, 08 03). Hash module - yara 3.7.0 documentation. Retrieved
from Yara v3.7.0: http://yara.readthedocs.io/en/v3.7.0/modules/hash.html
Alvarez, V. M. (2018, 08 03). PE module - yara 3.7.0 documentation. Retrieved from
Yara v3.7.0: http://yara.readthedocs.io/en/v3.7.0/modules/pe.html
Alvarez, V. M. (2018, 08 03). Welcome to YARA’s documentation! - yara 3.7.0
documentation. Retrieved from Yara v3.7.0:
http://yara.readthedocs.io/en/v3.7.0/
Alvarez, V. M. (Unknown, Unknown Unknown). YARA - the pattern matching swiss
knife for malware researchers. Retrieved from virustotal.github.io:
https://virustotal.github.io/yara/
FireEye. (2014, 08 29). SUPPLY CHAIN ANALYSIS: From Quartermaster to
Sunshop. Retrieved from FireEye:
https://www.fireeye.com/content/dam/fireeye-www/global/en/current-threats/pdfs/rpt-malware-supply-chain.pdf
Hex-Rays. (2018, 08 03). IDA: About. Retrieved from Hex-Rays:
https://www.hex-rays.com/products/ida/
MalwareTech. (2017, 06 27). Petya Ransomware Attack - What's Known |
MalwareTech. Retrieved from MalwareTech:
https://www.malwaretech.com/2017/06/petya-ransomware-attack-whats-known.html
Mandiant. (2018, 08 03). Tracking Malware with Import Hashing; Tracking Malware
with Import Hashing | FireEye Inc. Retrieved from FireEye.com:
https://www.fireeye.com/blog/threat-research/2014/01/tracking-malware-import-hashing.html
McAfee. (2018, 08 03). BinText. Retrieved from McAfee:
http://b2b-download.mcafee.com/products/tools/foundstone/bintext303.zip
Microsoft. (2018, 08 03). Availability and description of the File Checksum Integrity
Verifier utility. Retrieved from Microsoft:
https://support.microsoft.com/en-gb/help/841290/availability-and-description-of-the-file-checksum-integrity-verifier-u
Selinger, P. (2018, 08 03). MD5 Collision Demo. Retrieved from mscs.dal.ca:
https://www.mscs.dal.ca/~selinger/md5collision/
Trend Micro. (Unknown, Unknown Unknown). Polymorphic Virus - Definition -
Trend Micro USA. Retrieved from Trend Micro:
https://www.trendmicro.com/vinfo/us/security/definition/Polymorphic-virus
VirusTotal. (2018, 08 03). Upload a sample. Retrieved from VirusTotal:
https://www.virustotal.com/#/home/upload
Wikipedia. (2017, 11 23). Polymorphic code - Wikipedia. Retrieved from Wikipedia:
https://en.wikipedia.org/wiki/Polymorphic_code