Research Article

All-in-One Framework for Detection, Unpacking, and Verification for Malware Analysis

Mi-Jung Choi, Jiwon Bang, Jongwook Kim, Hajin Kim, and Yang-Sae Moon

Department of Computer Science, Kangwon National University, 1 Kangwondaehak-gil, Chuncheon-si, Gangwon 24341, Republic of Korea

Correspondence should be addressed to Yang-Sae Moon; ysmoon@kangwon.ac.kr

Received 10 April 2019; Revised 21 August 2019; Accepted 5 September 2019; Published 13 October 2019

Academic Editor: Jesús Díaz-Verdejo

Hindawi, Security and Communication Networks, Volume 2019, Article ID 5278137, 16 pages, https://doi.org/10.1155/2019/5278137

Copyright © 2019 Mi-Jung Choi et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Packing is the most common analysis avoidance technique for hiding malware. Packing also makes it harder for the security researcher to identify the behaviour of malware and increases the analysis time. To analyze packed malware, we first need to perform unpacking to release the packing. In this paper, we focus on unpacking and its related technologies for analyzing packed malware. Through extensive analysis of previous unpacking studies, we pay attention to four important drawbacks: no phase integration, no detection combination, no real-restoration, and no unpacking verification. To resolve these four drawbacks, we present an all-in-one structure of the unpacking system that performs the packing detection, unpacking (i.e., restoration), and verification phases in an integrated framework. For this, we first greatly increase the packing detection accuracy in the detection phase by combining four existing and new packing detection techniques. We then improve the unpacking phase by using state-of-the-art static and dynamic unpacking techniques. We also present a verification algorithm that evaluates the accuracy of unpacking results. Experimental results show that the proposed all-in-one unpacking system performs all three phases well in an integrated framework. In particular, the proposed hybrid detection method is superior to the existing methods, and the system achieves up to 100% restoration accuracy for most files, except those packed by a few packers.

1. Introduction

Recently, as Internet usage has explosively increased, the risk of malware exposure is also rapidly increasing. According to the 2017 AV-Test security report [1], about six billion malware samples are used annually in DDoS (distributed denial of service), spam mails, and APT (advanced persistent threat). In addition, due to the advent of new malware exploiting analysis avoidance techniques, there have been many research efforts on personal information protection, malicious code detection, and malware analysis technology [2–10]. Among the analysis avoidance techniques, packing is the most common one used to hide malware. Packing, also known as “executable compression,” is a technique for compressing an executable file to reduce the file size while preserving its format. Packing was originally developed to reduce storage space, but malicious users exploit it to hide malware in the executable file [11, 12]. According to the WildList's 2006 report, more than 92% of malware uses compression technology [13]. Since packing mostly transforms the original code, we need to perform unpacking first before analyzing packed files, which may include malware. In this paper, we focus on such unpacking techniques used for malware detection and analysis.

To unpack packed malware, we need a phase that detects whether or not the file is packed. If we conclude that the file is packed, we restore (i.e., unpack) the file and sometimes verify the unpacked file. However, the existing work has developed these three phases of packing detection, unpacking, and verification separately, and thus the analyst has difficulty in using all three phases in an integrated manner. Moreover, there are many detection methods [14–19], but there has been no attempt to combine them. We also note that the previous unpacking research focuses on finding the OEP (original entry point), the first command address where the actual program


starts, but does not address the actual restoration of packed files. In addition, even if unpacking is successful, the unpacked files are not verified to evaluate their unpacking accuracy or reliability. Based on a thorough survey of recent studies and products, we make four important observations: (1) no integration of the necessary phases, (2) no combination of detection techniques, (3) no real restoration of packed malware, and (4) no verification of unpacked images. We refer to these observations as no phase integration, no detection combination, no real-restoration, and no unpacking verification, respectively.

The goal of this paper is to propose an all-in-one unpacking system that resolves or improves on these four observations, as shown in Table 1. We explain each observation and its solution in detail as follows. First, no phase integration is the problem that analysts have to perform each phase separately, since the packing detection, unpacking, and verification phases have been developed separately. To resolve this problem, we present an all-in-one unpacking system that integrates all three phases. As shown in Table 2, each existing method focuses on a particular phase. That is, many in-depth studies address only one of the three phases covered in this paper. In many real applications, however, we often need to apply all three phases at once or sequentially rather than just one phase. To satisfy this demand, the proposed all-in-one system supports all three phases together. Since the system supports all necessary phases in an integrated framework, we can easily analyze the malware and obtain an objective restoration rate through the actual unpacking and verification phases.

The second observation, no detection combination, is that there has been no attempt to combine various packing detection methods. Based on this observation, we propose a hybrid approach that improves the packing detection accuracy by combining four existing and new packing detection methods. The third observation, no real-restoration, is that the previous work tries to find only the OEP when it detects packing, and there is little discussion about restoring the actual executable file. We address this problem by actually restoring the image of the unpacked file. The fourth observation, no unpacking verification, is that no previous work measures the restoration accuracy of unpacked images. We resolve this problem by presenting a verification algorithm that quantitatively evaluates the restoration accuracy of unpacked file images.

In this paper, we implement the all-in-one unpacking system to reflect the solutions of Table 1 and empirically evaluate the proposed system. First, we verify each phase of the all-in-one system to confirm that the overall mechanism works well. Next, we construct a dataset composed of 2,600 PE (portable executable) files and use it to evaluate the detection, unpacking, and verification phases of the all-in-one system. In the detection phase, we need to determine the entropy range first, and through a preliminary experiment, we set it to be less than 6.00 or greater than 6.85. Experimental results comparing the proposed hybrid method with individual or combined detection stage(s) show that the proposed method achieves the highest detection accuracy, up to 98.4%, without any false positives. We also try to actually unpack all files of the dataset to verify the unpacking phase, and we confirm that all files, including those packed by Yoda's Protector [22], are 100% unpacked. Finally, through the evaluation of the verification phase, we see that most files, except those packed by some packers, show up to 100% restoration accuracy.

The contributions of the paper are as follows. First, this is the first attempt to integrate the detection, unpacking, and verification phases into a single unified framework. The proposed all-in-one concept, which integrates all three phases rather than focusing on only one, allows users to detect and analyze malware more easily. Second, based on empirical experience, we present a hybrid approach to packing detection that exploits four existing and new detection techniques. Third, we perform actual unpacking/restoration beyond detecting OEPs only. Fourth, we propose a verification algorithm that measures the accuracy of unpacking results. Fifth, through various experiments, we demonstrate the superiority of the proposed all-in-one unpacking system.

The rest of the paper is organized as follows. Section 2 describes related work on packing detection and unpacking. Section 3 presents the overall architecture of the proposed all-in-one unpacking system. Section 4 explains the detection, unpacking, and verification phases in detail to show how the all-in-one system works. Section 5 presents experimental results. We finally summarize and conclude the paper in Section 6.

2. Related Work

2.1. Packing Detection Techniques. Detecting and analyzing packed malware takes much time, and thus there have been many studies on packing detection and unpacking techniques. Choi et al. [15] propose PHAD, which detects packed files by analyzing the header information of PE files. PHAD selects eight characteristic variables to distinguish between a general file and a packed file through heuristic analysis of the PE header and, based on these variables, determines whether a file is packed or not. More specifically, it calculates the Euclidean distance of the eight variables selected for the characteristic vector (CV) and judges the file to be packed if that distance exceeds a heuristically determined minimum threshold. PHAD shows higher detection accuracy with lower false-negative rates than the commonly used software PEiD [23], but it has the drawback that many false positives occur.

Lyda and Hamrock [17] use entropy-based analysis to detect encrypted or packed malware. Entropy is a measure of the uncertainty of information, and packed files tend to have higher entropy than regular files because they compress the original executable sections or collapse those sections into a few new sections. The entropy is calculated as shown in equation (1), where p(i) is the probability of the i-th unit of information (such as a number) in event x's series of n symbols. This equation generates entropy scores as real numbers [17].

$$H(x) = -\sum_{i=1}^{n} p(i) \log_2 p(i) \tag{1}$$

The entropy-based method has the advantage of being easily applied to various packers. However, since it expects packed files to have high entropy, it causes a false positive if the entropy of a normal file is high, or a false negative if that of a packed file is low.
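As an illustration of equation (1), the following minimal sketch computes the entropy of a byte string, assuming that each byte value is one symbol and that p(i) is estimated from the relative frequency of that byte. The function name `shannon_entropy` is ours and is not taken from [17].

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte, following equation (1) with bytes as symbols."""
    if not data:
        return 0.0  # assumption: an empty input is treated as zero entropy
    total = len(data)
    counts = Counter(data)  # occurrences of each distinct byte value
    # p(i) is estimated as count / total for every byte value that occurs
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    print(shannon_entropy(b"\x00" * 64))       # constant data: minimum entropy
    print(shannon_entropy(bytes(range(256))))  # uniform bytes: maximum entropy
```

With byte symbols the score lies between 0 and 8 bits, so compressed or encrypted regions tend toward the upper end of that scale.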

Han and Lee [16] propose REMINDer, which detects packing based on the entropy value of the entry point section and the WRITE attribute. They note that a large number of false positives occur because the entropy calculation range is too wide, and thus REMINDer uses only the EP (entry point) section in computing the entropy value. In addition to the entropy-based detection of the EP section, they also use the WRITE attribute test, which checks an essential feature of packed files, to reduce false positives. This is because, unlike ordinary files, packed PE files require WRITE permission to perform unpacking before the file is executed.

Arora et al. [19] propose a packing detection technique that analyzes the PE information of a packed file using a heuristic algorithm. It defines various parameters for packing analysis and presents a heuristic algorithm for assigning weights and risk factors to those parameters. Based on the training set to which weights and risk factors are assigned, it determines whether or not a given input file is packed.

2.2. Unpacking Techniques. Unpacking techniques are roughly classified into four types. The first is a direct analysis method in which a person unpacks the file manually using an analysis tool. This method was popular in the early days, and typical analysis tools include OllyDbg PlugIn [24], Immunity Debugger [25], and IDA PlugIn [26]. This manual unpacking is relatively accurate, but it takes too much time to manually analyze all the instructions of the packed file.

The second is a static unpacking method based on the characteristics of the specific packing algorithm(s). Since this method relies on the characteristics of the packing algorithm, it is very useful when the algorithm is known in advance. Static unpacking is commonly used in antivirus programs, and it has the advantages of no infection risk and fast unpacking, since the packed file does not need to be directly executed. However, it has the disadvantage that we cannot use it if we do not know the packing algorithm used or if the packing technique is partially modified.

The third is a dynamic unpacking method that does not depend on the packing algorithm(s). There have been many studies on dynamic unpacking, since we can unpack a file even without knowing the exact packing algorithm used. Jeong et al. [21] propose an entropy-based dynamic unpacking method that finds the OEP (original entry point) based on the characteristic that the entropy increases if the file is packed. Bat-Erdene et al. [27] classify unpacking algorithms into multiple clusters by measuring the entropy changes during the unpacking process. Cesare and Yang [28] propose an algorithm for constructing a control flow graph signature using an inverse transformation technique. They perform the entropy analysis first to determine whether or not the PE file is packed, and if it is packed, they find the hidden code by investigating the end of packing through dynamic analysis. Moreover, recent dynamic methods include OmniUnpack [29], Renovo [30], and PinDemonium [31]. OmniUnpack unpacks the PE file by detecting the executing offset of the original code at the page level instead of at the instruction level. Renovo exploits shadow memory to monitor program execution and memory writes, and through the shadow memory, it extracts the hidden code from the executable. Finally, PinDemonium uses the Dynamic Binary Instrumentation (DBI) technique to perform Import Address Table (IAT) analysis, JUMP command detection, and entropy calculation, and through these processes, it unpacks the PE file.

Table 1: Observations and solutions for the proposed all-in-one unpacking system.

Observation | Explanation | Solution
No phase integration | Unpacking-related three phases are separately developed | Adopt an all-in-one approach integrating all three phases
No detection combination | There is no attempt to combine various existing methods for packing detection | Combine four packing detection methods to improve detection accuracy
No real-restoration | Main goal is to find OEP | Restore unpacked files by performing actual unpacking as well as finding OEP
No unpacking verification | There is no quantitative way to verify the restoration accuracy | Present a verification algorithm to evaluate the accuracy of unpacking results

Table 2: Comparison of analytical phases supported by existing and proposed systems.

Analytical phases | PHAD [15] | REMINDer [16] | PEframe [20] | [11, 13, 21] | All-in-one system
Detection: × ×
Unpacking (static): × ×
Unpacking (dynamic): × ×
Verification: × × ×

The fourth is an integration of both static and dynamic unpacking techniques. Representative examples of such integration methods are PolyUnpack [26], CoDisasm [32], and BinUnpack [3]. PolyUnpack and CoDisasm first extract a static model through static analysis before executing the file. Then, they unpack the PE file by comparing the extracted static model to the dynamic model obtained by running the executable. PolyUnpack unpacks the file through iterative comparisons of the two models, and CoDisasm does so by comparing memory snapshots repeatedly. Finally, BinUnpack first performs static analysis by developing a hook-evasion-resistant API monitor, and it then performs dynamic analysis by monitoring API calls with the analyzed information.

3. Overall Architecture of All-in-One Unpacking System

As we mentioned in Section 1, we note four important observations: no phase integration, no detection combination, no real-restoration, and no unpacking verification. In this section, we describe the overall architecture of the all-in-one unpacking system that addresses these four observations. Figure 1 shows a flow diagram of the proposed all-in-one system. As shown in the figure, we improve no phase integration by supporting all three phases of detection, unpacking, and verification in an integrated framework. In the diagram, for a given input PE file (①), the system first determines whether or not it is packed (②). If the file is not packed, the system proceeds to check the next PE file. If it is packed, the system checks whether or not there is an unpacking library for the packer used in packing (③). If the unpacking library exists, the system performs static unpacking using the library (④); otherwise, the system performs dynamic unpacking, which does not depend on the packing algorithm (⑤). After completing the unpacking, the system calculates the restoration accuracy to verify the correctness of unpacking (⑥).
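The control flow of steps ① to ⑥ can be sketched as a small driver function. This is only an illustrative outline under our own assumptions: the five callables (`detect`, `find_library`, `static_unpack`, `dynamic_unpack`, `verify`) and the `Result` record are hypothetical placeholders for the phases described above, not interfaces defined in the paper.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Result:
    packed: bool
    method: Optional[str] = None      # "static" or "dynamic" when packed
    accuracy: Optional[float] = None  # restoration accuracy from verification


def analyze_file(
    pe_bytes: bytes,
    detect: Callable[[bytes], bool],
    find_library: Callable[[bytes], Optional[object]],
    static_unpack: Callable[[bytes, object], bytes],
    dynamic_unpack: Callable[[bytes], bytes],
    verify: Callable[[bytes], float],
) -> Result:
    """Run detection, unpacking, and verification for one input PE file."""
    if not detect(pe_bytes):                 # step 2: packing detection
        return Result(packed=False)          # not packed: go to the next file
    library = find_library(pe_bytes)         # step 3: unpacking library?
    if library is not None:
        unpacked, method = static_unpack(pe_bytes, library), "static"  # step 4
    else:
        unpacked, method = dynamic_unpack(pe_bytes), "dynamic"         # step 5
    return Result(True, method, verify(unpacked))                      # step 6
```

Passing the phases in as callables keeps the sketch independent of any particular detection or unpacking implementation; a real system would bind them to the concrete techniques of each phase.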

We use existing and proposed techniques together to implement each phase of the all-in-one unpacking system. Table 3 summarizes the existing or proposed methods used in each phase. First, in the detection phase, we use four techniques to detect the packing of the input PE file. This hybrid approach improves no detection combination. Among the four techniques, the EP section test is a new one proposed in this paper, which increases the packing detection rate by investigating the explicit existence of the EP section; we describe it in detail in Section 4.1. Second, in the unpacking phase, we use either static unpacking or dynamic unpacking, depending on whether or not an unpacking library exists. This unpacking phase restores the actual file and improves no real-restoration. Third, in the verification phase, we verify the correctness of unpacking by computing the restoration accuracy. We calculate the restoration accuracy by comparing the byte codes of two files. This phase improves no unpacking verification, and we describe the verification algorithm in detail in Section 4.3.

4. Detection Mechanism of All-in-One Unpacking System

4.1. Detection Phase. As we explained in Section 3, while there have been various packing detection techniques, there has been no attempt to combine the advantages of these techniques. Since they have been studied separately, each technique carries out packing detection focusing on its own characteristics only. In this paper, we propose a hybrid approach that improves the packing detection accuracy by combining four different techniques.

The detection phase consists of four tests: the EP section, signature, WRITE attribute, and entropy tests. In this paper, we refer to these techniques as Detection Phase-No EP Section (DP-NEP), Detection Phase-Signature (DP-SIG), Detection Phase-WRITE (DP-WR), and Detection Phase-Entropy (DP-ENT), respectively. Table 4 compares the detection techniques used in the existing work and the proposed approach. As shown in the table, PEiD supports only DP-SIG, [17] supports only DP-ENT, and [16] supports DP-WR and DP-ENT, whereas the proposed method supports all four techniques.

Algorithm 1 presents the proposed hybrid approach to packing detection. It takes a PE file as input and determines whether or not the file is packed. Since DP-SIG, DP-WR, and DP-ENT work on the EP section of a PE file, we first check DP-NEP to confirm whether the EP section exists. If the EP section does not exist, we determine that the file is packed (Line 1). Otherwise, we perform DP-SIG to check whether or not the PE file has a packer signature. If the signature exists, i.e., if a packer is used, we determine that the file is packed (Line 2). We then perform DP-WR and DP-ENT, where DP-WR checks whether or not the WRITE attribute is set on the EP section and DP-ENT checks whether or not the entropy of the EP section is in a specific range, Packing-Range. For example, the preliminary experiment mentioned in Section 1 sets this range to below 6.00 or above 6.85. If both conditions hold, we determine that the file is packed and return Packed (Lines 3-4). Finally, if none of the DP-NEP, DP-SIG, and combined DP-WR and DP-ENT tests is satisfied, we return Not-Packed (Line 5).
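The decision order of Algorithm 1 can be written as a short function. The sketch below is a simplified illustration rather than the authors' implementation: it assumes the three EP-section facts (signature hit, WRITE flag, entropy) have already been extracted into a hypothetical `EpSection` record, and its default bounds assume that Packing-Range is an entropy below 6.00 or above 6.85, as stated for the preliminary experiment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EpSection:
    has_packer_signature: bool  # result of the signature lookup (DP-SIG)
    writable: bool              # WRITE attribute of the EP section (DP-WR)
    entropy: float              # entropy of the EP section (DP-ENT)


def in_packing_range(entropy: float, low: float = 6.00, high: float = 6.85) -> bool:
    """Packing-Range check; the bounds are assumed defaults, not fixed constants."""
    return entropy < low or entropy > high


def is_packed(ep: Optional[EpSection]) -> bool:
    """Hybrid detection in the order DP-NEP, DP-SIG, then DP-WR with DP-ENT."""
    if ep is None:                  # Line 1: no EP section found
        return True
    if ep.has_packer_signature:     # Line 2: packer signature found
        return True
    if ep.writable and in_packing_range(ep.entropy):  # Lines 3-4
        return True
    return False                    # Line 5: no test fired
```

Note that the WRITE and entropy tests only count together: a writable EP section with an entropy outside Packing-Range, or a read-only section inside it, is reported as not packed.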

We now explain the four packing detection cases of Algorithm 1 in detail. First, DP-NEP looks for the EP section in the input PE file. We conclude that the file is packed if it has no EP section. This is because every PE file has an EP section, so a missing EP section means that someone has hidden the section intentionally. We can check the existence of the EP section by investigating the EP section address in the header of the PE file. According to actual experimental results, we could not find the EP section in some PE files packed by packers such as Upack [34] and PESpin [35]. This is because such packers intentionally hide or scramble the EP section. Obviously, we need to detect these PE files as packed in the detection phase, and Algorithm 1 does so: it detects such “No EP Section” PE files as the DP-NEP case, which improves the packing detection rate, especially for the Upack and PESpin packers. In the case of DP-NEP, however, we cannot find the EP section itself, and accordingly we cannot proceed to the next phases of unpacking and verification. More precisely, unpacking such DP-NEP files might be possible if we manually modify the raw hexadecimal codes using a binary editor. However, this manual raw data modification is beyond the scope of the paper, and we do not consider it. In summary, we use the concept of “No EP Section” in the detection phase of Algorithm 1 to increase the packing detection rate, but we do not consider it in the unpacking and verification phases.
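One plausible reading of the DP-NEP check is sketched below: the entry point address from the PE header is compared against the address range of every section, and the file is treated as having no EP section when no range contains it. This interpretation, the `Section` record, and the function name are our assumptions; the paper does not spell out the exact lookup.

```python
from typing import NamedTuple, Optional, Sequence


class Section(NamedTuple):
    name: str
    virtual_address: int  # RVA at which the section starts
    virtual_size: int     # size of the section in memory


def find_ep_section(entry_point_rva: int, sections: Sequence[Section]) -> Optional[Section]:
    """Return the section containing the entry point, or None (the DP-NEP case)."""
    for section in sections:
        start = section.virtual_address
        end = start + section.virtual_size
        if start <= entry_point_rva < end:
            return section
    return None


if __name__ == "__main__":
    sections = [Section(".text", 0x1000, 0x2000), Section(".data", 0x3000, 0x1000)]
    print(find_ep_section(0x1500, sections))  # entry point lies inside .text
    print(find_ep_section(0x0010, sections))  # no section contains it
```

In practice, the entry point and section table would be read from the PE optional header and section headers of the input file.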

Second, in DP-SIG, we investigate whether or not the signature extracted from the EP section is in the signature database. We construct the signature database with a total of 7,000 signatures, consisting of 4,200 signatures provided by PEiD [23] and 2,800 signatures collected from BobSoft [36]. Figure 2 shows three sample packer signatures stored in the database. For each packer, the first line gives the type and version of the packer and the maker of the signature, the second line gives the hexadecimal values of the packer's signature, and the third line, ep_only, is true or false, indicating whether the signature can be found in the EP section.
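A signature of the form shown in Figure 2 can be matched against the bytes at the entry point as sketched below. The two helper names are hypothetical, and the handling of `??` as a wildcard byte is an assumption based on the PEiD signature format; the sample signatures in Figure 2 do not contain wildcards.

```python
from typing import List, Optional


def parse_signature(text: str) -> List[Optional[int]]:
    """Turn 'E9 C5 02 ??' into byte values, with None for a '??' wildcard."""
    return [None if token == "??" else int(token, 16) for token in text.split()]


def matches_at_entry(ep_bytes: bytes, signature: List[Optional[int]]) -> bool:
    """Check whether the bytes at the entry point start with the signature."""
    if len(ep_bytes) < len(signature):
        return False  # not enough bytes to contain the whole signature
    return all(s is None or s == b for s, b in zip(signature, ep_bytes))


if __name__ == "__main__":
    sig = parse_signature("E9 C5 02 00 00 EB 02 83 3D 58 EB 02 FF 1D 5B EB 02 0F C7 5F")
    code = bytes.fromhex("E9C5020000EB02833D58EB02FF1D5BEB020FC75F9090")
    print(matches_at_entry(code, sig))
```

Because the signatures shown set ep_only to true, the sketch compares only at the entry point instead of scanning the whole file.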

Third, in DP-WR, we check the WRITE attribute of the EP section. In the process of unpacking a packed file, the system needs to modify the memory, so the packed file necessarily requires the WRITE attribute. Thus, we suspect that the file is packed if the WRITE attribute exists. However, since a normal file can also have the WRITE attribute, we cannot determine packing only by the existence of the WRITE attribute. Therefore, as proposed in [16], we detect packing using the WRITE attribute and the entropy value together.
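In a PE file, the WRITE attribute is a bit in the Characteristics field of the section header. The following short sketch tests that bit with the IMAGE_SCN_MEM_WRITE flag from the PE format; the function name `is_writable` is ours, and how the Characteristics value is read from the file is left out.

```python
IMAGE_SCN_MEM_WRITE = 0x80000000  # section can be written to (PE section flag)


def is_writable(section_characteristics: int) -> bool:
    """Return True if the section's Characteristics include the WRITE flag."""
    return bool(section_characteristics & IMAGE_SCN_MEM_WRITE)


if __name__ == "__main__":
    print(is_writable(0x60000020))  # code | execute | read
    print(is_writable(0xE0000020))  # code | execute | read | write
```

As the text notes, this test alone is not decisive, which is why it is evaluated together with the entropy test.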

Fourth, in DP-ENT, we compute the entropy using equation (1) and check whether that entropy is in a specific range (Packing-Range in Algorithm 1). This is based on the previous observation [16, 17] that the entropy of a packed file differs from that of an ordinary file. We compute the entropy value for the EP section only, not for the entire file. To use DP-ENT, we need to set the Packing-Range used for determining whether or not a file is packed. In this paper, we set this range through the experiment in Section 5.3.

4.2. Unpacking Phase. As mentioned in Section 2.2, unpacking that is performed automatically without human intervention can be divided into static unpacking and dynamic unpacking. Static unpacking is based on the characteristics of the packing algorithm, so it is fast and has no risk of infection because the packed file is not directly executed. On the contrary, dynamic unpacking does not depend on a specific packing algorithm, although it takes a long analysis time. In this paper, we use both static and dynamic methods to maximize the unpacking accuracy. More specifically, if the packer type found in the detection phase provides an unpacking library, we use the static unpacking; otherwise, we perform the

[Figure 1 flow diagram. Detection phase: ① Input PE file, ② Packing detection (No: next file; Yes: continue). Unpacking phase: ③ Unpacking library (Yes: ④ Static unpacking; No: ⑤ Dynamic unpacking). Verification phase: ⑥ Verification.]

Figure 1: Overall working mechanism of the all-in-one unpacking system.

Table 3: Existing or proposed techniques used in each phase of the all-in-one unpacking system.

Analytical phase | Techniques used or proposed
Detection | (i) EP section test, to be proposed in Section 4.1; (ii) Signature test [23]; (iii) WRITE attribute test [16]; (iv) Entropy test [17]
Unpacking | (i) Static: library-based unpacking [33]; (ii) Dynamic: entropy change-based unpacking [14, 27]
Verification | Verification algorithm, to be proposed in Section 4.3

Table 4: Detection techniques used in existing and proposed approaches.

Method | DP-NEP | DP-SIG | DP-WR | DP-ENT
PEiD [23] | × | ✓ | × | ×
[17] | × | × | × | ✓
[16] | × | × | ✓ | ✓
Proposed approach | ✓ | ✓ | ✓ | ✓

dynamic unpacking. We can easily perform static unpacking by using the unpacking library found for each packer, so this section focuses on explaining how the dynamic unpacking works. Our dynamic unpacking method is also known as the “execute-after-write” method. More precisely, the proposed method first restores the original file into memory by finding the OEP of the packed file and then analyzes (i.e., executes) the restored original file in memory. To avoid infecting real machines, we perform the unpacking and validation phases in the debugging mode.

The OEP is the address of the first instruction to execute after the packed file is unpacked, and it represents the entry point of the original program. In other words, since the code following the OEP is the starting code of the original program, finding the OEP is the most important issue in dynamic unpacking. To find the OEP, we use the entropy change-based analysis proposed by Bat-Erdene et al. [27]. When a packed file is executed, the packed data is uncompressed and written to memory, and at this time the data might be changed or added to the file. Because of this data change, the entropy value continuously changes during the unpacking process. If unpacking is successful, the entropy value stabilizes, the original file is restored, and the IP (instruction pointer) moves to another section of the restored original file for its execution. Thus, we can find the OEP by examining whether the IP moves to another section when there is no change in the entropy value of each section. (If the heuristic of investigating the change of IP sections is known, someone may create a malicious packing method that evades the heuristic. For this evasion, however, the OEP must be reached without a change of IP sections, and packing a PE file in such a way might not be easy. In our actual analysis, we did not find such cases in the 19 packers we covered. Even though such a case hardly exists, it might occur in the worst case. We leave an advanced approach that works against such evasion as further study.) To accomplish this, we need to execute the target process instruction by instruction and monitor the entropy change; however, measuring the entropy value after every instruction is very time-consuming and inefficient. We note that the OEP generally follows a branch instruction, so we calculate the entropy value only when the instruction is JMP, conditional JMP (CJMP), or RETN. We also note that the iterations during program execution are independent of the OEP, so we store and reuse the entropy values measured after JMP, CJMP, and RETN instructions to avoid duplicated calculations caused by the iterations and to reduce the total analysis time.
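The OEP heuristic described above (stable per-section entropy plus a branch into a different section, with already-seen branches skipped) can be illustrated on a recorded trace. The sketch below is a deliberately simplified model under our own assumptions: it consumes a list of hypothetical `BranchEvent` records instead of driving a debugger, and the stability tolerance `eps` is an illustrative parameter that the paper does not specify.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Sequence, Set


@dataclass
class BranchEvent:
    ip: int                     # address of the JMP, CJMP, or RETN instruction
    dst_addr: int               # address the branch transfers control to
    src_section: int            # index of the section containing ip
    dst_section: int            # index of the section containing dst_addr
    entropies: Sequence[float]  # per-section entropy measured at this branch


def find_oep(events: Iterable[BranchEvent], initial: Sequence[float],
             eps: float = 1e-3) -> Optional[int]:
    """Return the first branch target reached with stable entropy in a new section."""
    seen_ip: Set[int] = set()
    seen_dst: Set[int] = set()
    previous = list(initial)
    for ev in events:
        if ev.ip in seen_ip or ev.dst_addr in seen_dst:
            continue  # repeated branch from a loop: skip the entropy comparison
        seen_ip.add(ev.ip)
        seen_dst.add(ev.dst_addr)
        stable = all(abs(a - b) <= eps for a, b in zip(ev.entropies, previous))
        if stable and ev.dst_section != ev.src_section:
            return ev.dst_addr  # candidate OEP
        previous = list(ev.entropies)
    return None  # no candidate found in this trace
```

In a live analysis the per-section entropies would be computed only for branches that are not skipped, which is where the saving in analysis time comes from.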

Algorithm 2 shows the proposed dynamic unpacking algorithm based on the entropy changes; it returns the OEP address for a given PE file. First, it computes the initial entropy of each section (Line 1). Then, it repeats the following steps until the process ends (Lines 2 to 18). If the current IP address is greater than the last address of the current section, it returns Not-Found (Line 4). This is because the IP moves to the next section without a JMP-family instruction, so it cannot find the OEP in this section. On the contrary, if the instruction at IP is one of JMP, CJMP, or RETN, it executes the file up to IP and stores the address of the next instruction in DstAddr (Lines 5 to 7). If IP exists in IP-History or DstAddr exists in DstAddr-History, it means an iterated instruction or address, so we move to the next IP (Line 8). Otherwise, it stores IP and DstAddr into each history for the next iteration (Line 9). At this time, if DstAddr is out of the range of the file, it returns Not-Found because it cannot find the OEP within the file (Line 10). If DstAddr is in the normal range, it calculates the entropy of each section again and compares the current entropy with the previous entropy to check whether it is stable (Lines 11 to 13). If the entropy is stable and DstAddr is in a different section, it means that unpacking the file is successful and the IP is moved to another section by a JMP-family instruction. Thus, it

[Code-Lock vx.x]
signature = 47 8B C2 05 1E 00 52 8B D0 B8 02 3D CD 21 8B D8 5A
ep_only = true

[CodeCrypt v0.14b]
signature = E9 C5 02 00 00 EB 02 83 3D 58 EB 02 FF 1D 5B EB 02 0F C7 5F
ep_only = true

[CodeCrypt v0.15b]
signature = E9 31 03 00 00 EB 02 83 3D 58 EB 02 FF 1D 5B EB 02 0F C7 5F
ep_only = true

Figure 2: Examples of packer signatures stored in the signature database.
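The signature entries of Figure 2 can be matched against the bytes at the entry point roughly as follows. This is a sketch under stated assumptions: the helper names are hypothetical, the `??` wildcard token is an assumption about the database format (the entries shown in the figure contain none), and `ep_offset` is assumed to be the file offset of the entry point, which a real tool would derive from the PE header.

```python
from typing import Dict, List, Optional


def parse_signature(text: str) -> List[Optional[int]]:
    """Parse 'E9 C5 ?? 00' into byte values, with None for a '??' wildcard."""
    return [None if tok == "??" else int(tok, 16) for tok in text.split()]


def matches_at(data: bytes, offset: int, pattern: List[Optional[int]]) -> bool:
    """True if `pattern` matches `data` starting at `offset`."""
    if offset < 0 or offset + len(pattern) > len(data):
        return False
    return all(p is None or data[offset + i] == p
               for i, p in enumerate(pattern))


def find_packer(data: bytes, ep_offset: int,
                database: Dict[str, str]) -> Optional[str]:
    """Return the first packer whose signature matches at the entry point
    (the ep_only = true case), or None if no signature matches."""
    for name, signature in database.items():
        if matches_at(data, ep_offset, parse_signature(signature)):
            return name
    return None


# Two of the entries shown in Figure 2.
DATABASE = {
    "CodeCrypt v0.14b": "E9 C5 02 00 00 EB 02 83 3D 58 EB 02 FF 1D 5B EB 02 0F C7 5F",
    "CodeCrypt v0.15b": "E9 31 03 00 00 EB 02 83 3D 58 EB 02 FF 1D 5B EB 02 0F C7 5F",
}
```

Because the two CodeCrypt entries differ only in the second and third bytes, a match must compare the whole pattern rather than a short prefix.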

Input: A PE file
Output: Packed or Not-Packed // Packing detection result
begin
(1) if No EP Section then return Packed; // DP-NEP
(2) else if Packer Signature Found then return Packed; // DP-SIG
(3) else if "WRITE" enabled and EP Section Entropy ∈ Packing-Range then
(4)     return Packed; // DP-WR, DP-ENT
(5) else return Not-Packed;
(6) end-if
end

ALGORITHM 1: Packing detection.
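The cascade of Algorithm 1 can be sketched in Python as below. The function name and the tuple return value are illustrative assumptions; the four inputs are assumed to be computed beforehand by the individual tests (EP section lookup, signature match, WRITE attribute check, and entropy range check).

```python
from typing import Optional, Tuple


def detect_packing(has_ep_section: bool,
                   signature_found: bool,
                   ep_section_writable: bool,
                   ep_entropy_in_packing_range: bool) -> Tuple[str, Optional[str]]:
    """Mirror the if / else-if cascade of Algorithm 1.

    Returns (verdict, rule), where `rule` names the test that fired.
    """
    if not has_ep_section:
        return "Packed", "DP-NEP"
    if signature_found:
        return "Packed", "DP-SIG"
    if ep_section_writable and ep_entropy_in_packing_range:
        return "Packed", "DP-WR+DP-ENT"
    return "Not-Packed", None
```

Note that the third branch requires both conditions, so a file whose signature is unknown and whose WRITE attribute has been removed is reported as Not-Packed even when its entropy lies in Packing-Range.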


stops the execution and returns DstAddr as the OEP (Lines 13 and 14).

4.3. Verification Phase. In this section, we present an algorithm for measuring the restoration accuracy by comparing the restored file with its original file. To execute the packed file, the proposed system uses dynamic unpacking and finds the OEP during its execution. If the system finds the OEP, we assume that the restoration of the original file is successful and regard that file as the restored file. In the verification phase, we quantitatively measure the restoration accuracy by comparing the restored file with its original file.

Figure 3 shows an example of comparing some bytecodes of the original file with those of the restored file. For the comparison, we first pack the pspasswd.exe file provided by Microsoft [37] and then unpack it again by dynamic unpacking. The figure shows that the bytecodes of the four sections of the original file, .text, .rdata, .data, and .rsrc, appear identically in the UPX0, UPX1, and .rsrc sections of the restored file. That is, even if the name and location of each section are different, the same bytecodes exist in the restored file. However, the restored file contains padding and garbage values, and some bytecodes may be at locations different from those in the original file. Thus, in order to compare the original and the restored files, we need to search where the original bytecodes are located in the restored file. Since manually comparing the bytecodes of the original and restored files takes a very long time, in this paper, we propose an efficient search algorithm using a hash function. We use the well-known MD5 [38] as the hash function of the search algorithm. As a message-digest algorithm, MD5 is widely used in checking the integrity of long data.

Algorithm 3 shows the restoration accuracy calculation algorithm. Its basic principle is to check whether each section of the original file exists in the restored file. We use the MD5 hash function here to speed up the scan. The inputs to the algorithm are an original file A and a restored file B, and the output is restorationAccuracy, a list storing the restoration accuracy values of all sections. First, we remove garbage and padding values from the restored file (Line 1). For each section Ai of A, we obtain the length curLen of the section and calculate a hash value H1 by applying the MD5 hash function (Lines 4 to 7). We then calculate another hash value H2 from the restored file B by considering only a span of length curLen (Line 9). If H1 equals H2 (Line 10), it means that B contains Ai, so we stop the comparison and calculate the restoration accuracy of the section (Line 14). However, if H1 and H2 are not equal, we perform the hash value comparison again after decrementing the length curLen of Ai (Lines 6 to 13). We repeat this process to calculate the restoration accuracy for the section Ai (Line 14). After completing the accuracy calculation for all sections, the algorithm returns the list restorationAccuracy (Line 16).

5. Performance Evaluation

5.1. Experimental Environment. In this section, we present the results of the experimental evaluation to show the superiority of the all-in-one unpacking system. We conduct the experiments on a virtual server as well as a physical server. The reasons for experimenting in two server environments are as follows. First, unpacking results may differ between the virtual server and the physical server if a packed file has anti-unpacking features such as Anti-VM or Anti-Debug. Second, the dynamic unpacking of malware

Input: A PE file
Output: OEP // the start address of the unpacked original file
begin
(1) Measure the entropy value of each section
(2) while the process has not reached the end do
(3)     IP ⟵ the current instruction pointer
(4)     if IP's address > current section's last address then return Not-Found
(5)     else if IP is JMP or CJMP or RETN then
(6)         Execute instructions until IP
(7)         DstAddr ⟵ the next instruction address
(8)         if IP ∉ IP-History and DstAddr ∉ DstAddr-History then
(9)             Store IP and DstAddr into each history
(10)            if DstAddr is out of the file then return Not-Found
(11)            else
(12)                Re-calculate the entropy value of each section
(13)                if Entropy change is stable and DstAddr is in a different section then
(14)                    return DstAddr
(15)                end-if
(16)            end-if
(17)        end-if
(18) end-while
(19) return Not-Found
end

ALGORITHM 2: Dynamic unpacking algorithm.
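The control flow of Algorithm 2 can be sketched in Python on a recorded execution trace instead of a live debugger. Everything below is a simplified illustration under stated assumptions: the `Step` record, the section address ranges, the entropy snapshots, and the tolerance `tol` are hypothetical, and the Line 4 check is folded into "IP lies outside every known section".

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Tuple

BRANCHES = {"JMP", "CJMP", "RETN"}


@dataclass
class Step:
    ip: int                      # address of the instruction at IP
    kind: str                    # "JMP", "CJMP", "RETN", or "OTHER"
    dst: int                     # address of the next instruction (DstAddr)
    entropies: Dict[str, float] = field(default_factory=dict)  # per-section snapshot


def section_of(addr: int, sections: Dict[str, Tuple[int, int]]) -> Optional[str]:
    """Return the name of the section containing `addr`, or None."""
    for name, (start, end) in sections.items():
        if start <= addr <= end:
            return name
    return None


def find_oep(trace: Iterable[Step], sections: Dict[str, Tuple[int, int]],
             initial: Dict[str, float], tol: float = 1e-3) -> Optional[int]:
    """Return DstAddr of the first cross-section branch taken while the
    section entropies are stable; None stands for Not-Found."""
    prev = dict(initial)                      # Line 1
    seen_ip, seen_dst = set(), set()
    for step in trace:                        # Lines 2-3
        cur = section_of(step.ip, sections)
        if cur is None:                       # simplified Line 4
            return None
        if step.kind not in BRANCHES:         # Line 5
            continue
        if step.ip in seen_ip or step.dst in seen_dst:   # Line 8 (loop iteration)
            continue
        seen_ip.add(step.ip)                  # Line 9
        seen_dst.add(step.dst)
        dst_sec = section_of(step.dst, sections)
        if dst_sec is None:                   # Line 10
            return None
        stable = all(abs(step.entropies[s] - prev[s]) <= tol for s in sections)
        prev = dict(step.entropies)           # Line 12
        if stable and dst_sec != cur:         # Lines 13-14
            return step.dst
    return None                               # Line 19
```

The two history sets play the role of IP-History and DstAddr-History: a branch that was already examined in an earlier loop iteration is skipped, so the entropy is not recomputed for it.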


files may infect the physical server. Thus, we perform unpacking first on the virtual server and then perform the experiment on the physical server only when there is no abnormality. The hardware specifications of the experimental platform are as follows. The virtual server is an Intel Core i7-6700 3.40 GHz with 8 GB RAM, and the physical server is an Intel Core i7-6700 3.40 GHz with 32 GB RAM. The target operating system is 32-bit Windows 7, and we use PEVIEW to view the structure and content of 32-bit PE files. As the debugger, we use Immunity Debugger 1.85 [25] to implement the dynamic unpacking phase of Algorithm 2. To implement the all-in-one unpacking system, including the

Figure 3: Comparison of sample bytecodes of the original and restored files. (a) Sections of the original file (.text, .rdata, .data, and .rsrc). (b) Sections of the restored file (UPX0, UPX1, UPX1, and .rsrc).

Input: A: an original file
       B: a restored file
Output: restorationAccuracy[1..n]: a list of restoration accuracy values for n sections
begin
(1) Remove garbage and padding values from B
(2) A ⟵ {A1, A2, ..., An}, where Ai is the i-th section of A
(3) for each section Ai ∈ A do
(4)     totalLen ⟵ Ai.length
(5)     curLen ⟵ totalLen
(6)     while curLen > 0 do
(7)         H1 ⟵ hash(Ai[1 : curLen]) // hash a part of section Ai
(8)         for j ⟵ 1 to (B.length − curLen + 1) do
(9)             H2 ⟵ hash(B[j : j + curLen − 1]) // hash a part of B
(10)            if H1 = H2 then break
(11)            else curLen ⟵ curLen − 1
(12)        end-for
(13)    end-while
(14)    restorationAccuracy[i] ⟵ (curLen / totalLen) × 100
(15) end-for
(16) return restorationAccuracy
end

ALGORITHM 3: Restoration rate calculation algorithm.
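The idea behind Algorithm 3 can be sketched with Python's standard hashlib module. This is one interpretation of the pseudocode, not the authors' implementation: the length is decremented only after every window of the restored file has been tried, the garbage and padding removal of Line 1 is omitted, and the brute-force scan is only practical for small inputs. The function names are illustrative.

```python
import hashlib
from typing import Dict


def md5_digest(data: bytes) -> bytes:
    """MD5 digest used only as a fast equality check between byte ranges."""
    return hashlib.md5(data).digest()


def restoration_accuracy(section: bytes, restored: bytes) -> float:
    """Percentage of `section`, counted as its longest prefix, found in `restored`."""
    total = len(section)
    if total == 0:
        raise ValueError("section must not be empty")
    cur = total
    while cur > 0:
        h1 = md5_digest(section[:cur])
        # Slide a window of length `cur` over the restored file.
        for j in range(len(restored) - cur + 1):
            if md5_digest(restored[j:j + cur]) == h1:
                return cur / total * 100
        cur -= 1          # no window matched: retry with a shorter prefix
    return 0.0


def restoration_accuracies(sections: Dict[str, bytes],
                           restored: bytes) -> Dict[str, float]:
    """Apply the per-section computation to every section of the original file."""
    return {name: restoration_accuracy(data, restored)
            for name, data in sections.items()}
```

For example, a section whose first half appears in the restored file but whose second half does not yields an accuracy of 50.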


dynamic unpacking algorithm, we run Python scripts on top of Immunity Debugger.

In the experiment, we use a total of 130 PE files, where 90 files are provided by Microsoft [37] and 40 files are randomly collected from the putty [39] and MD5 [40] sites. We pack each of the 130 executable files with each of 19 packers and construct a dataset consisting of a total of 2600 files (2470 packed files plus the 130 original files). The 19 packers used are UPX, ASPack, NSPack, MPRESS, Yoda's Protector, MEW, Packman, RLPack, BeRoEXEPacker, PECompact, Petite, JDpack, Molebox, eXpressor, Yoda's Crypter, FSG, exe32pack, WinUpack, and Neolite. The reason for configuring the dataset by packing normal PE files directly is that we can quantitatively measure the packing detection accuracy, the false-positive rate, and the false-negative rate only if we know in advance which experimental files are packed.

In this paper, we conduct three experiments. The first experiment is for the integration feature of the all-in-one system, which verifies that the system supports all three phases of packing detection, unpacking, and verification in an integrated framework. The second experiment is for the detection phase, where we first set Packing-Range, an entropy range for judging the packing, and then evaluate the packing detection accuracy using the range. The third experiment is for the unpacking and verification phases, where we verify whether the system unpacks the PE file correctly by calculating the restoration accuracy.

5.2. Experimental Results on Overall Working Mechanism. In this section, we confirm through experiments that the integration mechanism of the all-in-one unpacking system works correctly. Figure 4 shows operation screenshots of each phase of the all-in-one system. First, Figure 4(a) shows an example screenshot of the detection phase. To confirm that all four cases of packing detection work correctly, we continue executing the remaining cases even if an earlier case has already detected the packing. In Figure 4(a), we first examine DP-NEP of Algorithm 1. We detect the EP section in DP-NEP, so we proceed to DP-SIG (the first two lines). By the DP-SIG test, we detect the UPX packer, so we conclude that the file is packed (the next two lines). Even though detection is completed, we proceed to the next DP-ENT and DP-WR tests. The entropy value is 7.93, which is contained in Packing-Range, and the attribute value of the EP section is "0xe0000040", which includes the WRITE attribute. Thus, we conclude that the file is also packed by the DP-ENT and DP-WR tests (the following three lines). According to Figure 4(a), we can confirm that the DP-NEP, DP-SIG, DP-ENT, and DP-WR tests of the detection phase all work correctly.
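The attribute value quoted above can be decoded with the section characteristic flags of the PE format. The sketch below lists only the six flags relevant here; the helper names are illustrative, while the bit values follow the PE/COFF specification.

```python
from typing import List

# Subset of the IMAGE_SCN_* section characteristic flags of the PE format.
SECTION_FLAGS = {
    0x00000020: "CNT_CODE",
    0x00000040: "CNT_INITIALIZED_DATA",
    0x00000080: "CNT_UNINITIALIZED_DATA",
    0x20000000: "MEM_EXECUTE",
    0x40000000: "MEM_READ",
    0x80000000: "MEM_WRITE",
}
IMAGE_SCN_MEM_WRITE = 0x80000000


def decode_characteristics(value: int) -> List[str]:
    """Return the names of the known flags set in `value`, lowest bit first."""
    return [name for bit, name in sorted(SECTION_FLAGS.items()) if value & bit]


def has_write_attribute(value: int) -> bool:
    """DP-WR style check: is the section marked writable?"""
    return bool(value & IMAGE_SCN_MEM_WRITE)


print(decode_characteristics(0xE0000040))
# ['CNT_INITIALIZED_DATA', 'MEM_EXECUTE', 'MEM_READ', 'MEM_WRITE']
```

A section that is both executable and writable, as in this example, is what allows an unpacking stub to write the restored code into the section and then run it.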

Figures 4(b) and 4(c) show example screenshots of the unpacking phase. If an unpacking library exists for the packer identified in the detection phase, we perform static unpacking, as shown in Figure 4(b); if not, we perform dynamic unpacking, as shown in Figure 4(c). Figure 4(b) shows the result of static unpacking by the UPX commercial tool [33] for the UPX-packed file. In the figure, the file size increases from 285,760 bytes to 731,200 bytes by unpacking, which means that the compression ratio is 39.08%. Figure 4(c) shows the result of the entropy-based dynamic unpacking. In the figure, the system calculates the entropy of each section when IP encounters JMP and CJMP instructions; it returns the current DstAddr as the OEP if the entropy change is stable and DstAddr is in a different section.
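The compression ratio quoted for Figure 4(b) is the packed size divided by the unpacked size, which the following short check reproduces from the two file sizes given in the text. The function name is an illustrative assumption.

```python
def compression_ratio(packed_size: int, unpacked_size: int) -> float:
    """Packed size as a percentage of the unpacked size."""
    return packed_size / unpacked_size * 100


ratio = compression_ratio(285_760, 731_200)
print(f"{ratio:.2f}%")  # 39.08%
```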

Figure 4(d) shows an example screenshot of the verification phase. In this example, we measure the restoration accuracy of the restored file for the .text and .data sections of the original file. The length of the original .text section is 18,439, and the length of the unpacked file is 872,555. In the hash value comparison, since the portion corresponding to the original .text section exists in the unpacked file, the restoration accuracy becomes 100%. Similarly, we also calculate the restoration accuracy of the .data section as 100%.

5.3. Experimental Results on Detection Phase. In this section, we describe two experimental results of the detection phase. The first experiment is to set Packing-Range, and the second is to measure the detection accuracy of the proposed detection algorithm. In the first experiment, we experimentally determine Packing-Range used in DP-ENT of Algorithm 1. The previous study by Lyda and Hamrock [17] uses the entropy value of the whole file and sets the entropy range of the packed file to [6.677, 6.926]. On the contrary, Han and Lee [16] focus on the EP section only and compute the entropy value for the EP section instead of the whole file. Through the experiment, they set the entropy threshold to 6.85 and judge a file packed if its entropy is over 6.85.

We also determine the entropy range Packing-Range through experiments on various packers. Figure 5 shows the entropy distribution of the EP section for different packers. Figure 5(a) shows the average distribution of the original and 19 packed files. Figures 5(b)–5(m) show the more detailed entropy distribution for each packer. According to Figures 5(b)–5(m), the entropy distribution of the packed file differs depending on the packer type. In particular, as shown in Figures 5(d), 5(e), and 5(k), some packed files have smaller entropy values than the original file, and thus, we cannot simply use a single threshold, above which a file is judged to be packed, as the packing criterion, as in the previous study [16]. However, we note that while the entropy of packed files is distributed over a wide range, as shown in Figures 5(c)–5(m), the entropy of original files is distributed only in a specific range, as shown in Figure 5(b). More specifically, the entropy value of the normal file is between 6.00 and 6.85, as shown in Figure 5(b). Based on these experimental results, we thus set Packing-Range to [0, 6.00) or [6.85, ∞) of the EP section. According to [9, 29], the entropy method has the disadvantages of low accuracy and easy evasion. To overcome this point, we use three other detection techniques, DP-NEP, DP-SIG, and DP-WR, together with DP-ENT in a hybrid manner. Using this hybrid approach, we can further improve the detection accuracy.
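The two-sided Packing-Range test can be contrasted with a single upper threshold in a few lines of Python. The sketch assumes the boundary values 6.00 and 6.85 described above and an entropy measured in bits per byte (at most 8); the function and constant names are illustrative.

```python
NORMAL_LOW = 6.00    # lower edge of the normal-file entropy band
NORMAL_HIGH = 6.85   # upper edge of the normal-file entropy band


def in_packing_range(ep_entropy: float) -> bool:
    """True if the EP-section entropy lies in [0, 6.00) or [6.85, 8]."""
    if not 0.0 <= ep_entropy <= 8.0:
        raise ValueError("byte entropy must lie between 0 and 8")
    return ep_entropy < NORMAL_LOW or ep_entropy >= NORMAL_HIGH


def above_single_threshold(ep_entropy: float, threshold: float = 6.85) -> bool:
    """Single upper-threshold rule, shown only for comparison."""
    return ep_entropy > threshold


for value in (5.20, 6.50, 7.93):
    print(value, in_packing_range(value), above_single_threshold(value))
```

A packed file whose EP-section entropy is 5.20 falls outside the normal band and is flagged by the two-sided range, whereas the single-threshold rule misses it.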

In the second experiment, we evaluate the detection accuracy and the false-negative/false-positive rates. Table 5 compares the three detection techniques of the detection phase with the proposed hybrid approach. Since EP sections


Figure 4: Operation screenshots of detection, unpacking, and verification phases. (a) Detection phase. (b) Unpacking phase (static unpacking). (c) Unpacking phase (dynamic unpacking). (d) Verification phase.

Figure 5: Entropy distributions of the EP section for original and representative packed files (x-axis: entropy of EP section, 0 to 8; y-axis: number of files). (a) Average of original and 19 packers. (b) Original (not packed). (c) UPX. (d) ASPack. (e) NSPack. (f) MPRESS. (g) Yoda's Protector. (h) RLPack. (i) BeRoEXE. (j) MEW. (k) PACKMAN. (l) WinUpack. (m) exe32pack.


are mandatory in all PE files, we can detect EP sections unless they are intentionally concealed. All files of the dataset used in the experiment also have EP sections, and the detection rate by DP-NEP is trivially 0%. In other words, there is no packing detection case due to the absence of the EP section, so we exclude the experimental result of DP-NEP from Table 5.

We now explain the results of Table 5 column by column. First, we explain the column of DP-SIG. In the case of DP-SIG, Not-Packed shows a detection rate of 0% since no signature exists in normal files. That is, no packing detection occurs in normal files, which means that DP-SIG performs accurate detection for Not-Packed files. On the other hand, for the packed files, DP-SIG detects 100% of the files for all packers except NSPack. The detection rate of NSPack is merely 53.8% because of the nature of NSPack that the signatures often exist in sections other than the EP section. In summary, DP-SIG shows 95.0% detection accuracy on average.

Second, for the experiment on the detection rate of DP-WR, we intentionally remove the WRITE attribute from 39 files (30% of packed files) per packer. We perform this removal process for the 18 packers except exe32pack. The reason for changing the dataset is that normally packed files usually have WRITE attributes, so we arbitrarily remove the WRITE attribute from 30% of the files for the DP-WR experiment. Looking at the DP-WR column in Table 5, DP-WR detects only 70% of the files per packer for the 18 packers since we intentionally remove the WRITE attribute from 30% of the files. In the case of exe32pack, however, we need not perform the removal process since it has no WRITE attribute in the PE header. Accordingly, as shown in the DP-WR column of Table 5, the detection rate of exe32pack by DP-WR is trivially 0%. In the case of Not-Packed, the detection rate is 0% since normal files have no WRITE attribute. In summary, checking the presence of the WRITE attribute is very simple, and DP-WR works correctly if the packed file has the WRITE attribute, but it does not work well if the WRITE attribute does not exist or is maliciously modified.

Third, the column of DP-ENT shows the entropy-based detection rates. As shown in the table, it shows a 100% detection rate for 9 packers, including NSPack, UPX, MEW, and BeRoEXEPacker, but it incurs some false negatives for 10 packers, including ASPack, Yoda's Protector, Packman, and MPRESS, showing detection rates lower than 100%. In particular, for MPRESS, Petite, JDpack, eXpressor, and Neolite, it shows a low detection rate below 24.6% because the entropy values produced by these packers fall within the entropy distribution of normal files. DP-ENT also has a false-positive rate of 3.84% for Not-Packed. In summary, DP-ENT works very accurately for most packers, but it may incur false negatives for a few specific packers or false positives for some normal files. To overcome this limitation, we use DP-ENT together with other detection techniques. We can see the improvement through the experimental results of Tables 5 and 6. In the tables, when using only DP-ENT, the detection accuracy is low for some packers (especially MPRESS), but when using it together with the other three techniques, the accuracy is close to 100%.

Fourth, the proposed hybrid approach integrating the four techniques detects 100% of the files for all packers except NSPack and shows no false positives. The reason for this high detection rate is that the proposed method has all the advantages of DP-SIG, DP-WR, and DP-ENT as well as DP-NEP. In the case of NSPack, however, the detection rate is merely 70.0%; this is because, as mentioned above, we forcibly remove the WRITE attribute from 30% of the files. According to the results of Table 5, the detection rate differs according to each detection technique, and some techniques

Table 5: Comparison of packing detection rates of three techniques and the proposed hybrid approach.

Packer              DP-SIG (%)   DP-WR (%)   DP-ENT (%)   Hybrid approach (%)
ASPack              100          70.0        91.5         100
NSPack              53.8         70.0        100          70.0
MPRESS              100          70.0        24.6         100
UPX                 100          70.0        100          100
Yoda's Protector    100          70.0        96.9         100
MEW                 100          70.0        100          100
BeRoEXEPacker       100          70.0        100          100
Packman             100          70.0        93.8         100
RLPack              100          70.0        99.3         100
PECompact           100          70.0        100          100
Petite              100          70.0        0.00         100
JDpack              100          70.0        5.62         100
Molebox             100          70.0        100          100
eXpressor           100          70.0        0.00         100
Yoda's Crypter      100          70.0        91.5         100
FSG                 100          70.0        100          100
exe32pack           100          0.00        100          100
WinUpack            100          70.0        100          100
Neolite             100          70.0        16.1         100
Average             95.0         66.3        73.9         98.4
Not-Packed          0.00         0.00        3.84         0.00


incur false positives or false negatives; on the other hand, the proposed hybrid approach shows the highest detection rate of 98.4% and no false positives.

Table 6 shows the detection accuracy of combinations of two or three detection techniques and of the proposed hybrid approach. First, DP-SIG ENT, a combination of signature and entropy tests, detects 100% of the packed files but incorrectly detects 18 files, showing a false-positive rate of 0.73%. Second, DP-ENT WR, a combination of entropy and WRITE tests, shows a low detection rate of 92.0% because the entropy value of MPRESS is in the distribution of normal files. Third, DP-SIG ENT WR, a combination of signature, entropy, and WRITE tests, and our hybrid approach show 98.4% detection accuracy and a 1.60% false-negative rate. This is because DP-SIG ENT WR and the hybrid approach go through the same detection procedure except for DP-NEP. The false-negative rate of 1.60% is due to the files having no WRITE attributes. According to the results of Tables 5 and 6, we believe that the proposed hybrid approach achieves high detection accuracy by combining the advantages of the existing and new detection techniques.

5.4. Experimental Results on Unpacking and Verification Phases. In this section, we evaluate the unpacking and verification phases by performing the unpacking of Algorithm 2 and calculating the restoration accuracy of the restored files with Algorithm 3. However, the unpacking and verification phases take a lot of time, so we experiment with 40 randomly selected files of the dataset. The unpacking phase consists of static unpacking and dynamic unpacking. In the experiment, we exclude static unpacking since it uses the existing library as it is, and we focus only on the dynamic unpacking of Algorithm 2 since it tries to restore the original file without any information about the packers.

Table 7 shows the restoration accuracy of the proposed method for the .text and .data sections for each of the 19 packers. We cannot directly unpack Yoda's Protector and PECompact since they use Anti-Debug techniques. To overcome this problem, we unpack these packers by bypassing three Anti-Debugging API calls: GetCurrentProcessID(), BlockInput(), and IsDebuggerPresent() [41]. (That is, we run Algorithm 2 with the bypassing techniques for Yoda's Protector and PECompact and include the experimental results in Table 7.) In general, since only the .text and .data sections of the original file are packed [42], in the experiment, we consider only the restoration accuracy of these .text and .data sections of the restored file. In Table 7, among the 19 packers, 11 packers, including UPX, NSPack, MEW, RLPack, and BeRoEXEPacker, show 100% restoration accuracy for both .text and .data sections. On the other hand, 8 packers, including ASPack, Packman, and MPRESS, show less than 100% restoration accuracy because of the manner in which the packers handle the .reloc section. More specifically, if there is a .reloc section in the original file, the unpacking relocates the .text and .data sections, changing the data values in each section. Therefore, the restoration accuracy depends on whether there is a .reloc section in the original file.
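To illustrate why relocation lowers a byte-for-byte comparison, the following sketch patches the 32-bit absolute addresses stored in a section by a base-address delta, in the manner of standard PE base relocation, and then measures how much of the section still matches from its start. The byte values, offsets, and delta are made-up example data, and the helper names are hypothetical; the paper does not specify the mechanism at this level of detail.

```python
import struct
from typing import Iterable


def apply_relocations(section: bytes, offsets: Iterable[int], delta: int) -> bytes:
    """Add `delta` to each little-endian 32-bit address stored at `offsets`."""
    out = bytearray(section)
    for off in offsets:
        (value,) = struct.unpack_from("<I", out, off)
        struct.pack_into("<I", out, off, (value + delta) & 0xFFFFFFFF)
    return bytes(out)


def matching_prefix_ratio(original: bytes, rebased: bytes) -> float:
    """Percentage of `original` that matches `rebased` from the first byte on."""
    same = 0
    for a, b in zip(original, rebased):
        if a != b:
            break
        same += 1
    return same / len(original) * 100


# 16 example bytes with one absolute address (0x00403000) stored at offset 4.
original = b"\x55\x8B\xEC\xA1" + struct.pack("<I", 0x00403000) + b"\xC3" + bytes(7)
rebased = apply_relocations(original, [4], 0x10000)
print(matching_prefix_ratio(original, rebased))  # 37.5
```

A single patched address is enough to end the matching prefix, which is consistent with the observation that a successful unpacking can still yield a low computed accuracy when a .reloc section is present.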

Table 8 shows the restoration results in detail according to the presence of the .reloc section. As shown in the table, the restoration accuracy is 100% in the absence of the .reloc section, while it varies widely, from 0.00% to 99.8%, in the presence of the .reloc section. However, the low restoration accuracy is due to the nature of the packer, and it does not mean that unpacking does not work correctly. In other words, even if the unpacking is successful, the data values may change due to the packer characteristics, and the restoration accuracy is then calculated to be low. In summary, we can use the proposed verification algorithm to calculate the restoration accuracy of unpacked files, which quantitatively measures the accuracy of the actual restoration.

In addition to the .text, .data, and .reloc sections, we further consider the PE header, which includes the IAT information. In general, the IAT is located in the header of the PE file, and if we actually need to run the unpacked file, we must also restore this header information, including the IAT. In fact, the studies of [3, 43] restore the header including the IAT and produce the actual whole executable file. On the other hand, most studies, including this paper, do not restore the actual whole executable file itself but restore the .text, .data, and .reloc sections, which are major parts of the executable file. This is because the purpose of unpacking is not to execute the restored file but to identify and extract malicious code included in the restored file. Therefore, in this paper, we do not need to restore the header required for the actual execution, and accordingly, comparison of IAT information is also not necessary. Instead, we restore and compare the .text, .data, and .reloc sections, which are more likely to hide malicious code. Restoring the PE header is beyond the scope of this study; we leave it as a separate research topic.

6. Conclusions

In this paper, we analyzed recent studies of packing detection and unpacking techniques and derived four important observations: no phase integration, no detection combination, no real-restoration, and no unpacking verification. We then proposed an all-in-one unpacking system to address these four observations. First, no phase integration was a difficulty in which analysts had to perform the packing detection, unpacking, and verification phases separately, and we solved this problem by presenting an all-in-one

Table 6: Comparison of packing detection accuracy of two or three combinations and of the hybrid approach.

Metrics               DP-SIG ENT (%)   DP-ENT WR (%)   DP-SIG ENT WR (%)   Hybrid approach (%)
Detection accuracy    95.0             92.0            98.4                98.4
False-negative rate   5.00             8.00            1.60                1.60
False-positive rate   0.73             0.00            0.00                0.00


Table 7: Restoration accuracy of .text and .data sections of each packer.

Packer              .text (%)   .data (%)
UPX                 100         100
NSPack              100         100
MEW                 100         100
RLPack              100         100
BeRoEXE             100         100
Neolite             100         100
FSG                 100         100
eXpressor           100         100
Molebox             100         100
Petite              100         100
JDpack              100         100
ASPack              84.3        84.3
Yoda's protector    84.0        84.0
exe32pack           80.8        100
MPRESS              78.8        75.3
Yoda's crypter      74.5        74.5
PECompact           74.0        74.0
WinUpack            68.0        68.0
Packman             55.0        48.5
Average             89.2        90.2


unpacking system that integrates these three phases in a single framework. Second, no detection combination meant that there had been no attempt to combine various packing detection methods, and we solved this problem by presenting a hybrid approach combining existing and new packing detection methods. Third, no real-restoration meant that there had been little discussion about restoring the actual executable file, and we solved this problem by actually restoring an original image of the packed file. Fourth, no unpacking verification meant that there had been no explicit work to measure the restoration accuracy of unpacked files, and we solved this problem by proposing a verification algorithm that quantitatively evaluates the restoration accuracy.

We constructed a dataset of 2600 PE files and experimentally evaluated the proposed system using the dataset. First, we performed all the phases of the all-in-one unpacking system sequentially and confirmed that its overall working mechanism worked well. Next, through the comparison of the hybrid packing detection approach and existing detection techniques, we showed that the detection accuracy of the proposed method was the highest at 98.4% on average with no false positives. For this, we also determined an entropy range of packing, Packing-Range, through extensive experiments on 19 representative packers. Finally, in the unpacking and verification phases, we confirmed that unpacking was successful for all files except the files packed with Yoda's Protector. We also showed that the restoration accuracy of unpacked files was nearly 100% except for a few packers.

As future research, we will focus on unpacking files packed with Anti-VM, Anti-Debug, and Anti-Reverse engineering techniques. We will also study how to compute the restoration accuracy of files having .reloc sections.

Data Availability

All the data files used in the experiments are available at https://github.com/chesvectain/PackingData.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was partly supported by the Institute of Information and Communications Technology Planning and Evaluation (IITP) grant funded by the Korea government (MSIT) (no. 2017-0-00158, Development of Cyber Threat Intelligence (CTI) analysis and information sharing technology for national cyber incident response) and the Korea Electric Power Corporation (no. R18XA05).

References

[1] M. Morgenstern, “AV-TEST Security report 2017/18,” 2019, https://www.av-test.org/fileadmin/pdf/security_report/AV-TEST_Security_Report_2017-2018.pdf.

[2] T. Ban, R. Isawa, S. Guo, D. Inoue, and K. Nakao, “Efficient malware packer identification using support vector machines with spectrum kernel,” in Proceedings of the Eighth Asia Joint Conference on Information Security, pp. 69–76, Seoul, South Korea, July 2013.

[3] B. Cheng, M. Jiang, J. Fu et al., “Towards paving the way for large-scale Windows malware analysis: generic binary unpacking with orders-of-magnitude performance boost,” in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pp. 395–411, Toronto, Canada, October 2018.

[4] A. Gupta and M. S. Arya, “Hashing based encryption and anti-debugger support for packing multiple files into single executable,” International Journal of Advanced Research in Computer Science, vol. 9, no. 1, pp. 914–920, 2018.

[5] Y. Kawakoya, M. Iwamura, and M. Itoh, “Memory behavior-based automatic malware unpacking in stealth debugging environment,” in Proceedings of the 5th International Conference on Malicious and Unwanted Software, pp. 39–46, Lorraine, France, October 2010.

[6] G. Liang, J. Pang, Z. Shan, R. Yang, and Y. Chen, “Automatic benchmark generation framework for malware detection,” Security and Communication Networks, vol. 2018, Article ID 4947695, 8 pages, 2018.

[7] A. Pektas and T. Acarman, “A dynamic malware analyzer against virtual machine aware malicious software,” Security and Communication Networks, vol. 7, no. 12, pp. 2245–2257, 2014.

[8] I. Santos, J. Nieves, and P. G. Bringas, “Semi-supervised learning for unknown malware detection,” in International Symposium on Distributed Computing and Artificial Intelligence, A. Abraham, J. M. Corchado, S. R. Gonzalez, and J. F. De Paz Santana, Eds., pp. 415–422, Springer, Berlin, Germany, 2011.

[9] X. Ugarte-Pedrero, I. Santos, B. Sanz, C. Laorden, and P. G. Bringas, “Countering entropy measure attacks on packed software detection,” in Proceedings of the 9th International Conference on Consumer Communications and Networking, pp. 164–168, Las Vegas, NV, USA, January 2012.

[10] H. Won, S.-P. Kim, S. Lee, M.-J. Choi, and Y.-S. Moon, “Secure principal component analysis in multiple distributed nodes,” Security and Communication Networks, vol. 9, no. 14, pp. 2348–2358, 2016.

[11] S. Cesare, Y. Xiang, and W. Zhou, “Malwise–an effective and efficient classification system for packed and polymorphic malware,” IEEE Transactions on Computers, vol. 62, no. 6, pp. 1193–1206, 2013.

Table 8: Restoration accuracy (%) according to the presence of the .reloc section (○: present, ×: absent).

Packer | .text (○) | .text (×) | .data (○) | .data (×)
ASPack | 0.00 | 100 | 0.00 | 100
Packman | 99.8 | 100 | 88.1 | 100
Mpress | 0.00 | 100 | 0.00 | 100
Yoda’s crypter | 0.00 | 100 | 0.04 | 100
PECompact | 0.00 | 100 | 0.00 | 100
Yoda’s Protector | 0.00 | 100 | 0.05 | 100
exe32pack | 0.00 | 100 | 100 | 100
WinUpack | 0.00 | 100 | 0.00 | 100

[12] C. Malin, E. Casey, and J. Aquilina, Malware Forensics: Investigating and Analyzing Malicious Code, Syngress, Burlington, MA, USA, 2008.

[13] W. Yan, Z. Zheng, and A. Nirwan, “Revealing packed malware,” IEEE Security and Privacy, vol. 6, no. 5, pp. 65–69, 2008.

[14] M. Bat-Erdene, T. Kim, H. Park, and H. Lee, “Packer detection for multi-layer executables using entropy analysis,” Entropy, vol. 19, no. 3, p. 125, 2017.

[15] Y.-S. Choi, I.-K. Kim, J.-T. Oh, and J.-C. Ryou, “PE file header analysis-based packed PE file detection technique (PHAD),” in Proceedings of International Symposium on Computer Science and Its Applications, pp. 28–31, Hobart, Australia, October 2008.

[16] S. Han and S. Lee, “Packed PE file detection for malware forensics,” The KIPS Transactions: PartC, vol. 16C, no. 5, pp. 555–562, 2009, in Korean.

[17] R. Lyda and J. Hamrock, “Using entropy analysis to find encrypted and packed malware,” IEEE Security and Privacy, vol. 5, no. 2, pp. 40–45, 2007.

[18] R. Perdisci, A. Lanzi, and W. Lee, “Classification of packed executables for accurate computer virus detection,” Pattern Recognition Letters, vol. 29, no. 14, pp. 1941–1946, 2008.

[19] A. Rohit, A. Singh, H. Pareek, and U. Edara, “A heuristics-based static analysis approach for detecting packed PE binaries,” International Journal of Security and Its Applications, vol. 7, no. 5, pp. 257–268, 2013.

[20] G. Amato, “PEframe,” 2019, https://github.com/guelfoweb/peframe.

[21] G. Jeong, E. Choo, J. Lee, M. Bat-Erdene, and H. Lee, “Generic unpacking using entropy analysis,” Journal of Advanced Information Technology and Convergence, vol. 7, no. 1, pp. 232–238, 2009.

[22] V. DeMarines, “Obfuscation–how to do it and how to crack it,” Network Security, vol. 2008, no. 7, pp. 4–7, 2008.

[23] H. Neil, “PEiD,” 2019, https://github.com/wolfram77web/app-peid.

[24] O. Yuschuk, “OllyDbg,” 2019, http://www.ollydbg.de/download.html.

[25] N. Waisman, “Immunity Debugger,” 2019, https://www.immunityinc.com/products/debugger/index.html.

[26] P. Royal, M. Halpin, D. Dagon, R. Edmonds, and W. Lee, “PolyUnpack: automating the hidden-code extraction of unpack-executing malware,” in Proceedings of the 22nd Annual Computer Security Applications Conference, pp. 289–300, Miami Beach, FL, USA, December 2006.

[27] M. Bat-Erdene, Y. Park, H. Li, H. Lee, and M.-S. Choi, “Entropy analysis to classify unknown packing algorithms for malware detection,” International Journal of Information Security, vol. 16, no. 3, pp. 227–248, 2017.

[28] S. Cesare and X. Yang, “Classification of malware using structured control flow,” in Proceedings of the 8th Australasian Symposium on Parallel and Distributed Computing, pp. 61–70, Brisbane, Australia, January 2010.

[29] L. Martignoni, M. Christodorescu, and S. Jha, “OmniUnpack: fast, generic, and safe unpacking of malware,” in Proceedings of the 23rd Annual Computer Security Applications Conference, pp. 431–441, Miami Beach, FL, USA, December 2007.

[30] M. G. Kang, P. Poosankam, and H. Yin, “Renovo: a hidden code extractor for packed executables,” in Proceedings of the 5th ACM Workshop on Recurring Malcode, pp. 46–53, Alexandria, VA, USA, November 2007.

[31] S. Mariani, L. Fontana, F. Gritti, and S. D’Alessio, “PinDemonium: a DBI-based generic unpacker for windows executables,” in Proceedings of the Black Hat 2016, Las Vegas, NV, USA, July 2016.

[32] G. Bonfante, J. Fernandez, J.-Y. Marion, B. Rouxel, F. Sabatier, and A. Thierry, “CoDisasm: medium scale concatic disassembly of self-modifying binaries with overlapping instructions,” in Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 745–756, Denver, CO, USA, October 2016.

[33] M. Oberhumer, L. Molnar, and J. Reiser, “UPX,” 2019, https://upx.github.io.

[34] A. Swinnen and A. Mesbahi, “One packer to rule them all: empirical identification, comparison and circumvention of current antivirus detection techniques,” in Proceedings of the Black Hat 2014, Las Vegas, NV, USA, August 2014.

[35] P. Bania, “Generic unpacking of self-modifying, aggressive, packed binary programs,” 2009, https://arxiv.org/abs/0905.4581.

[36] H. Neil, “BobSoft Signature,” 2019, http://woodmann.com/BobSoft/Files/Other/UserDB.zip.

[37] M. Russinovich, “Microsoft Sysinternals Suite,” 2019, https://download.sysinternals.com/files/SysinternalsSuite.zip.

[38] R. Rivest, “The MD5 message-digest algorithm,” RFC 1321, IETF, Fremont, CA, USA, 1992.

[39] S. Tatham, “Putty,” 2019, https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html.

[40] L. Pascoe, “MD5,” 2019, http://www.md5summer.org/about.html.

[41] P. Ferrie, “Anti-unpacker tricks - part one,” 2019, https://www.virusbulletin.com/virusbulletin/2008/12/anti-unpacker-tricks-part-one.

[42] T.-Y. Wang and C.-H. Wu, “Detection of packed executables using support vector machines,” in Proceedings of the 10th International Conference on Machine Learning and Cybernetics, pp. 717–722, Guilin, China, July 2011.

[43] D. Korczynski, “RePEconstruct: reconstructing binaries with self-modifying code and Import address table destruction,” in Proceedings of the 11th International Conference on Malicious and Unwanted Software, pp. 31–38, Fajardo, Puerto Rico, October 2016.

16 Security and Communication Networks


starts but does not address actual restoration of packed files. In addition, even if unpacking is successful, there is no verification for the unpacked files to evaluate their unpacking accuracy or reliability. Based on a thorough survey of recent studies and products, we pay attention to four important observations: (1) no integration of necessary phases, (2) no combination of detection techniques, (3) no real-restoration of packed malware, and (4) no verification of unpacked images. We briefly call these observations no phase integration, no detection combination, no real-restoration, and no unpacking verification, respectively.

The goal of this paper is to propose an all-in-one unpacking system that solves or improves on these four observations, as shown in Table 1. We explain each observation and its solution in detail as follows. First, no phase integration is a problem in which analysts have to perform each phase separately, since the packing detection, unpacking, and verification phases are developed separately. To resolve this problem, we present an all-in-one unpacking system that integrates all three phases of packing detection, unpacking, and verification. As shown in Table 2, each existing method focuses on a particular phase. That is, many studies have been done in depth focusing on one of the three phases covered in this paper. In many real applications, however, we often need to apply all three phases at once or sequentially rather than just one phase. To satisfy this demand, the proposed all-in-one system supports all three phases together. Since the system supports all necessary phases in an integrated framework, we can easily analyze the malware and obtain an objective restoration rate through the actual unpacking and verification phases.

The second observation, no detection combination, is that there is no attempt to combine various packing detection methods. Based on this observation, we propose a hybrid approach that improves the packing detection accuracy by combining four existing and new packing detection methods. The third observation, no real-restoration, is that previous work tries to find only the OEP (original entry point) when it detects packing, but there is little discussion about restoring the actual executable file. We address this problem by actually restoring the image of the unpacked file. The fourth observation, no unpacking verification, is that no previous work measures the restoration accuracy of unpacked images. We resolve this problem by presenting a verification algorithm that quantitatively evaluates the restoration accuracy of unpacked file images.

In this paper, we implement the all-in-one unpacking system to reflect the solutions of Table 1 and empirically evaluate the proposed system. First, we verify each phase of the all-in-one system to confirm that the overall mechanism works well. Next, we construct a dataset composed of 2,600 PE (portable executable) files and use the set in evaluating the detection, unpacking, and verification phases of the all-in-one system. In the detection phase, we need to determine the entropy range first, and through a preliminary experiment, we set it to be less than 6.00 or greater than 6.85. Experimental results comparing the proposed hybrid method with individual or combined detection stage(s) show that the proposed method achieves the highest detection accuracy, up to 98.4%, without any false positives. We also try to actually unpack all files of the dataset to verify the unpacking phase, and we confirm that all files including Yoda’s Protector [22] are 100% unpacked. Finally, through the evaluation of the verification phase, we see that most files, except those packed by some packers, show up to 100% restoration accuracy.

The contributions of the paper are as follows. First, this is the first attempt to integrate the detection, unpacking, and verification phases into a single unified framework. The proposed all-in-one concept, which integrates all three phases rather than focusing on only one phase, allows users to detect and analyze malware more easily. Second, based on empirical experience, we present a hybrid approach to packing detection that exploits four existing and new detection techniques. Third, we perform actual unpacking and restoration beyond detecting OEPs only. Fourth, we propose a verification algorithm that measures the accuracy of unpacking results. Fifth, through various experiments, we demonstrate the superiority of the proposed all-in-one unpacking system.

The rest of the paper is organized as follows. Section 2 describes related work on packing detection and unpacking. Section 3 presents the overall architecture of the proposed all-in-one unpacking system. Section 4 explains the detection, unpacking, and verification phases in detail to show how the all-in-one system works. Section 5 presents experimental results. We finally summarize and conclude the paper in Section 6.

2. Related Work

2.1. Packing Detection Techniques. It takes much time to detect and analyze malware to which packing is applied, and thus there have been many studies on packing detection and unpacking techniques. Choi et al. [15] propose PHAD, which detects packed files by analyzing the header information of PE files. PHAD selects eight characteristic variables that distinguish a general file from a packed file through heuristic analysis of the PE header, and based on these variables, it determines whether a file is packed or not. More specifically, it calculates the Euclidean distance of the eight variables selected for the characteristic vector (CV) and judges the file to be packed if that distance exceeds a heuristically determined minimum threshold. PHAD shows higher detection accuracy with lower false-negative rates than the commonly used software PEiD [23], but it has the drawback that many false positives occur.

Lyda and Hamrock [17] use entropy-based analysis to detect encrypted or packed malware. Entropy is a measure of the uncertainty of information, and packed files tend to have higher entropy than regular files because they compress the original executable sections or collapse those sections into a few new sections. We calculate the entropy as shown in equation (1), where p(i) is the probability of the i-th unit of information (such as a number) in event x’s series of n symbols. This equation generates entropy scores as real numbers [17]:

$$H(x) = -\sum_{i=1}^{n} p(i) \log_2 p(i). \qquad (1)$$

The entropy-based method has the advantage of being easily applied to various packers. However, it causes a false positive if the entropy of a normal file is high, or a false negative if that of a packed file is low.
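As a minimal sketch of equation (1), the snippet below computes the entropy of a byte string. It assumes that the "units of information" are the 256 possible byte values, which gives scores between 0 and 8 bits per byte; the function name `shannon_entropy` is illustrative and not taken from the paper.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (assumed unit: one byte)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # p * log2(1/p) is the same term as -p * log2(p) in equation (1).
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Two extreme inputs: all byte values equally likely vs. one repeated symbol.
uniform = bytes(range(256)) * 4
constant = b"\x00" * 1024
half = b"\x00\xff" * 512  # two equally likely symbols
```

Compressed or encrypted sections tend toward the `uniform` extreme, while padding and sparse data tend toward the `constant` extreme.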

Han and Lee [16] propose REMINDer, which detects packing based on the entropy value of the entry point section and the WRITE attribute. They note that a large number of false positives occur because the entropy calculation range is too wide, and thus REMINDer uses only the EP (entry point) section in computing the entropy value. In addition to the entropy-based detection method of the EP section, they also use the WRITE attribute test, which checks an essential feature of the packed file, to reduce false positives. This is because, unlike ordinary files, packed PE files require WRITE permission to perform unpacking before the file is executed.

Arora, proposed by Rohit et al. [19], is a packing detection technique that analyzes the PE information in a packed file using a heuristic algorithm. It defines various parameters for packing analysis and presents a heuristic algorithm for assigning weights and risk factors to those parameters. Based on the training set to which weights and risk factors are assigned, it determines whether or not a given input file is packed.

2.2. Unpacking Techniques. Unpacking techniques are roughly classified into four types. The first is a direct analysis method in which a person unpacks directly using an analysis tool. This method was popular in the early days, and typical analysis tools include OllyDbg PlugIn [24], Immunity Debugger [25], and IDA PlugIn [26]. This manual unpacking is relatively accurate, but it takes too much time to manually analyze all the instructions of the packed file.

The second is a static unpacking method based on the characteristics of the specific packing algorithm(s). Since this method is based on the characteristics of the packing algorithm, it is very useful when the algorithm is known in advance. Static unpacking is commonly used in antivirus programs, and it has the advantages of no infection risk and fast unpacking, since the packed file does not need to be directly executed. However, it has the disadvantage that we cannot use it if we do not know the packing algorithm used or if the packing technique is partially modified.

The third is a dynamic unpacking method that does not depend on the packing algorithm(s). There have been many studies on dynamic unpacking, since we can unpack even without knowing the exact packing algorithm used. Jeong et al. [21] propose an entropy-based dynamic unpacking method that finds the OEP (original entry point) based on the characteristic that the entropy increases if the file is packed. Bat-Erdene et al. [27] classify packing algorithms into multiple clusters by measuring the entropy changes during the unpacking process. Cesare and Yang [28] propose an algorithm for constructing a control flow graph signature using an inverse transformation technique. They perform the entropy analysis first to determine whether or not the PE file is packed, and if it is packed, they find the hidden code by investigating the end of packing through dynamic analysis. Moreover, recent dynamic methods include OmniUnpack [29], Renovo [30], and PinDemonium [31]. OmniUnpack unpacks the PE file by detecting the executing offset of the original code at the page level instead of at the instruction level. Renovo exploits shadow memory to monitor program execution and memory writes, and through the shadow memory, it extracts the hidden code from the executable. Finally, PinDemonium uses the Dynamic Binary Instrumentation (DBI) technique to perform Import Address Table (IAT) analysis, JUMP instruction detection, and entropy calculation, and through these processes, it unpacks the PE file.

Table 1: Observations and solutions for the proposed all-in-one unpacking system.

Observation | Explanation | Solution
No phase integration | The three unpacking-related phases are developed separately | Adopt an all-in-one approach integrating all three phases
No detection combination | There is no attempt to combine various existing methods for packing detection | Combine four packing detection methods to improve detection accuracy
No real-restoration | The main goal is to find the OEP | Restore unpacked files by performing actual unpacking as well as finding the OEP
No unpacking verification | There is no quantitative way to verify the restoration accuracy | Present a verification algorithm to evaluate the accuracy of unpacking results

Table 2: Comparison of analytical phases supported by existing and proposed systems.

Analytical phases | PHAD [15] | REMINDer [16] | PEframe [20] | [11, 13, 21] | All-in-one system

Detection: × ×

Unpacking (static): × ×
Unpacking (dynamic): × ×

Verification: × × ×


The fourth is an integration method that combines static and dynamic unpacking techniques. Representative examples of such integration methods are PolyUnpack [26], CoDisasm [32], and BinUnpack [3]. PolyUnpack and CoDisasm first extract a static model through static analysis before executing the file. Then, they unpack the PE file by comparing the extracted static model to the dynamic model obtained by running the executable. PolyUnpack unpacks the file through iterative comparisons of the two models, and CoDisasm does so by repeatedly comparing memory snapshots. Finally, BinUnpack first performs static analysis by developing a hook-evasion-resistant API monitor, and it then performs dynamic analysis by monitoring API calls with the analyzed information.

3. Overall Architecture of All-in-One Unpacking System

As we mentioned in Section 1, we note four important observations: no phase integration, no detection combination, no real-restoration, and no unpacking verification. In this section, we describe the overall architecture of the all-in-one unpacking system that considers these four observations. Figure 1 shows a flow diagram of the proposed all-in-one system. As shown in the figure, we improve no phase integration by supporting all three phases of detection, unpacking, and verification in an integrated framework. In the diagram, for a given input PE file (①), the system first determines whether or not it is packed (②). If the file is not packed, it proceeds to check the next PE file. If it is packed, the system checks whether or not there is an unpacking library for the packer used in packing (③). If the unpacking library exists, the system performs static unpacking using the library (④); otherwise, the system performs dynamic unpacking, which does not depend on the packing algorithm (⑤). After completing the unpacking, the system calculates the restoration accuracy to verify the correctness of unpacking (⑥).
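The control flow of steps ① to ⑥ can be sketched as follows. This is only an illustration of the branching described above: the callables `detect`, `dynamic_unpack`, and `verify` and the `static_unpackers` mapping are hypothetical placeholders supplied by the caller, not components defined in the paper.

```python
from typing import Callable, Dict, Optional


def run_pipeline(
    pe_bytes: bytes,
    detect: Callable[[bytes], Optional[str]],
    static_unpackers: Dict[str, Callable[[bytes], bytes]],
    dynamic_unpack: Callable[[bytes], bytes],
    verify: Callable[[bytes], float],
) -> dict:
    """Run detection, unpacking, and verification for one input file (step 1)."""
    packer = detect(pe_bytes)  # step 2: None means "not packed"
    if packer is None:
        return {"packed": False}
    unpacker = static_unpackers.get(packer)  # step 3: is a library available?
    if unpacker is not None:
        restored, method = unpacker(pe_bytes), "static"  # step 4
    else:
        restored, method = dynamic_unpack(pe_bytes), "dynamic"  # step 5
    return {"packed": True, "method": method, "accuracy": verify(restored)}  # step 6


# Toy stand-ins so the sketch can be exercised without real PE files.
def toy_detect(data: bytes) -> Optional[str]:
    if data.startswith(b"UPX"):
        return "UPX"
    return "Unknown" if data.startswith(b"PK") else None


toy_static = {"UPX": lambda data: data[3:]}
toy_dynamic = lambda data: data[2:]
toy_verify = lambda restored: 100.0 if restored == b"original" else 0.0
```

Keeping each phase behind a plain callable mirrors the integrated-framework idea: a phase can be swapped without touching the overall flow.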

We use existing and proposed techniques together to implement each phase of the all-in-one unpacking system. Table 3 summarizes the existing or proposed methods used in each phase. First, in the detection phase, we use four techniques to detect the packing of the input PE file. This hybrid approach improves no detection combination. Among the four techniques, the EP section test is a new one proposed in this paper, which increases the packing detection rate by investigating the explicit existence of the EP section, and we describe it in detail in Section 4.1. Second, in the unpacking phase, we use either static unpacking or dynamic unpacking depending on whether or not an unpacking library exists. This unpacking phase restores the actual file and improves no real-restoration. Third, in the verification phase, we verify the correctness of unpacking by computing the restoration accuracy. We calculate the restoration accuracy by comparing the byte codes of two files. This phase improves no unpacking verification, and we describe this verification algorithm in detail in Section 4.3.
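To make the idea of "comparing the byte codes of two files" concrete, here is a deliberately simple sketch that reports the percentage of byte positions at which two images agree. It is an assumption made for illustration only; the verification algorithm the paper actually uses is the one described in Section 4.3, and the function name `restoration_accuracy` is hypothetical.

```python
def restoration_accuracy(original: bytes, restored: bytes) -> float:
    """Percentage of byte positions at which the two images agree.

    Positions that exist in only one of the two inputs count as mismatches.
    """
    longest = max(len(original), len(restored))
    if longest == 0:
        return 100.0
    matches = sum(a == b for a, b in zip(original, restored))
    return 100.0 * matches / longest


same = restoration_accuracy(b"\x55\x8b\xec\xc3", b"\x55\x8b\xec\xc3")
one_off = restoration_accuracy(b"\x55\x8b\xec\xc3", b"\x55\x8b\x00\xc3")
truncated = restoration_accuracy(b"\x55\x8b\xec\xc3", b"\x55\x8b")
```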

4. Detection Mechanism of All-in-One Unpacking System

4.1. Detection Phase. As we explained in Section 3, while there have been various packing detection techniques, there has been no attempt to combine the advantages of these techniques. Since they have been studied separately, each technique carries out packing detection focusing on its own characteristics only. In this paper, we propose a hybrid approach to improve the packing detection accuracy by combining four different techniques.

The detection phase consists of four tests: the EP section, signature, WRITE attribute, and entropy tests. In this paper, we refer to these techniques as Detection Phase-No EP Section (DP-NEP), Detection Phase-Signature (DP-SIG), Detection Phase-WRITE (DP-WR), and Detection Phase-Entropy (DP-ENT), respectively. Table 4 compares the detection techniques used in the existing work and the proposed approach. As shown in the table, PEiD supports only DP-SIG, [17] supports only DP-ENT, and [16] supports DP-WR and DP-ENT, but the proposed method supports all four techniques.

Algorithm 1 presents the proposed hybrid approach to packing detection. It takes a PE file as input and determines whether the file is packed or not. Since DP-SIG, DP-WR, and DP-ENT work on the EP section of a PE file, we first check DP-NEP to confirm that the EP section exists. If the EP section does not exist, we determine that the file is packed (Line 1). On the contrary, if the EP section exists, we perform DP-SIG to check whether or not the PE file has a packer signature. If the signature exists, i.e., if a packer is used, we determine that the file is packed (Line 2). We then perform DP-WR and DP-ENT, where DP-WR checks whether or not the WRITE attribute exists in the EP section, and DP-ENT investigates whether or not the entropy of the EP section is in a specific range, Packing-Range. For example, in this paper this range is an entropy below 6.00 or above 6.85. If both conditions hold, we determine that the file is packed and return Packed (Lines 3-4). Finally, if none of the DP-NEP, DP-SIG, and combined DP-WR and DP-ENT checks is satisfied, we return Not-Packed (Line 5).
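The decision order of Algorithm 1 can be sketched as below. The sketch assumes that the three EP-section facts (signature hit, WRITE flag, entropy) have already been extracted; the `EpSection` record and the function name are illustrative, and the Packing-Range bounds 6.00 and 6.85 are the values reported from the preliminary experiment, used here as an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed Packing-Range bounds: entropy below LOW or above HIGH counts as "in range".
PACKING_LOW, PACKING_HIGH = 6.00, 6.85


@dataclass
class EpSection:
    has_signature: bool  # DP-SIG: a known packer signature was matched
    writable: bool       # DP-WR: the section carries the WRITE attribute
    entropy: float       # DP-ENT: entropy of the EP section only


def is_packed(ep: Optional[EpSection]) -> bool:
    """Return True for Packed and False for Not-Packed."""
    if ep is None:          # DP-NEP: no EP section could be located
        return True
    if ep.has_signature:    # DP-SIG
        return True
    in_range = ep.entropy < PACKING_LOW or ep.entropy > PACKING_HIGH
    return ep.writable and in_range  # DP-WR and DP-ENT must both hold
```

Note that a writable EP section alone, or an in-range entropy alone, does not trigger detection; only their conjunction does.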

We now explain the four packing detection cases of Algorithm 1 in detail. First, DP-NEP looks for the EP section in the input PE file. We conclude that the file is packed if it has no EP section. This is because every PE file has an EP section, so a missing EP section means that someone has hidden the section intentionally. We can determine the existence of the EP section by investigating the EP section address in the header of the PE file. According to actual experimental results, we could not find the EP section in some PE files packed by packers such as Upack [34] and PESpin [35]. This is because such packers intentionally hide or scramble the EP section. Obviously, we need to detect these PE files as packed in the detection phase, and Algorithm 1 does so: it detects such “No EP Section” PE files as the DP-NEP case, which improves the packing detection rate, especially for the Upack and PESpin packers. In the case of DP-NEP, however, we cannot find the EP section itself, and accordingly, we cannot proceed to the next phases of unpacking and verification. More precisely, unpacking files from such DP-NEP packers might be possible if we manually modify the raw hexadecimal codes using a binary editor. However, this manual raw data modification is beyond the scope of the paper, and we do not consider it. In summary, we use the concept of “No EP Section” in the detection phase of Algorithm 1 to increase the packing detection rate, but we do not consider it in the unpacking and verification phases.
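One plausible way to implement the DP-NEP check is to look for the section whose virtual address range contains the entry point address from the PE header, and to treat "no such section" as the DP-NEP case. The sketch below assumes the section table has already been parsed into simple records; the `Section` type, its field names, and the sample layouts are illustrative.

```python
from typing import NamedTuple, Optional, Sequence


class Section(NamedTuple):
    name: str
    virtual_address: int  # RVA at which the section starts
    virtual_size: int     # size of the section in memory


def find_ep_section(entry_rva: int, sections: Sequence[Section]) -> Optional[Section]:
    """Return the section containing the entry point, or None (the DP-NEP case)."""
    for section in sections:
        start = section.virtual_address
        if start <= entry_rva < start + section.virtual_size:
            return section
    return None


# Hypothetical layouts: the second entry point falls outside every section.
layout = [Section(".text", 0x1000, 0x4000), Section(".data", 0x5000, 0x1000)]
ordinary = find_ep_section(0x1234, layout)
hidden = find_ep_section(0x0200, layout)
```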

Second, in DP-SIG, we investigate whether or not the signature extracted from the EP section is in the signature database. We construct the signature database with a total of 7,000 signatures, consisting of 4,200 signatures provided by PEiD [23] and 2,800 signatures collected from BobSoft [36]. Figure 2 shows three sample signatures of the packers stored in the database. The first line of each packer represents the type and version of the packer and the maker of the signature; the second line is the hexadecimal values of the packer’s signature; and the third line, the ep_only column, is true or false, indicating whether the signature can be found in the EP section.
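A signature lookup of this kind can be sketched as a byte-pattern match with wildcards. The sketch assumes PEiD-style signature strings in which `??` stands for "any byte", and it assumes that `ep_only = true` restricts matching to the entry point offset while `false` allows a match anywhere in the file. The sample pattern and data are made up for illustration and are not real packer signatures.

```python
from typing import List, Optional

Pattern = List[Optional[int]]


def parse_signature(text: str) -> Pattern:
    """Turn 'AA ?? BB' into [0xAA, None, 0xBB]; None marks a wildcard byte."""
    return [None if token == "??" else int(token, 16) for token in text.split()]


def matches_at(data: bytes, offset: int, pattern: Pattern) -> bool:
    """True if the pattern matches data starting at offset."""
    window = data[offset:offset + len(pattern)]
    if len(window) != len(pattern):
        return False
    return all(p is None or p == b for p, b in zip(pattern, window))


def has_signature(data: bytes, ep_offset: int, pattern: Pattern, ep_only: bool) -> bool:
    """Check one signature, honoring the assumed ep_only semantics."""
    if ep_only:
        return matches_at(data, ep_offset, pattern)
    return any(matches_at(data, i, pattern) for i in range(len(data) - len(pattern) + 1))


# Hypothetical signature and file bytes; the entry point is at file offset 2.
sig = parse_signature("60 BE ?? ?? ?? ?? 8D BE")
sample = b"\x90\x90" + bytes([0x60, 0xBE, 1, 2, 3, 4, 0x8D, 0xBE]) + b"\x00"
```

With 7,000 signatures, a real implementation would likely index patterns by their first fixed bytes rather than test each one in turn; the linear check here only shows the matching rule.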

Third, in DP-WR, we check the WRITE attribute of the EP section. In the process of unpacking a packed file, the system needs to modify the memory, so the packed file necessarily requires the WRITE attribute. Thus, we suspect that the file is packed if the WRITE attribute exists. However, since a normal file can also have the WRITE attribute, we cannot determine packing only by the existence of the WRITE attribute. Therefore, as proposed in [16], we detect packing using the WRITE attribute and the entropy value together.
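In the PE format, the WRITE attribute is a bit in the section header's Characteristics field (`IMAGE_SCN_MEM_WRITE`, 0x80000000). A minimal sketch of the DP-WR check, assuming the Characteristics value of the EP section has already been read from the section table, is shown below; the function name and the two sample values are illustrative.

```python
# Section characteristic flags defined by the PE/COFF format.
IMAGE_SCN_CNT_CODE = 0x00000020
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000


def is_writable(characteristics: int) -> bool:
    """True if the section header marks the section as writable."""
    return bool(characteristics & IMAGE_SCN_MEM_WRITE)


# A typical code section is readable and executable but not writable.
typical_text = IMAGE_SCN_CNT_CODE | IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_MEM_READ
# A section that an unpacking stub must write into also carries the WRITE bit.
self_modifying = typical_text | IMAGE_SCN_MEM_WRITE
```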

Fourth, in DP-ENT, we compute the entropy using equation (1) and check whether that entropy is in a specific range (Packing-Range in Algorithm 1). This is based on the previous observation [16, 17] that the entropy of a packed file differs from that of an ordinary file. We compute the entropy value for the EP section only, not for the entire file. To use DP-ENT, we need to set Packing-Range, which is used for determining whether or not a file is packed. In this paper, we set this range through the experiment in Section 5.3.

4.2. Unpacking Phase. As mentioned in Section 2.2, unpacking that is performed automatically without human intervention can be divided into static unpacking and dynamic unpacking. Static unpacking is based on the characteristics of the packing algorithm, so it is fast and has no risk of infection because the packed file is not directly executed. On the contrary, dynamic unpacking does not depend on a specific packing algorithm, even though it takes a long analysis time. In this paper, we use both static and dynamic methods to maximize the unpacking accuracy. More specifically, if the packer type found in the detection phase provides an unpacking library, we use static unpacking; otherwise, we perform the

Figure 1: Overall working mechanism of the all-in-one unpacking system. Detection phase: ① input PE file, ② packing detection (Yes/No). Unpacking phase: ③ unpacking library available? (Yes: ④ static unpacking; No: ⑤ dynamic unpacking). Verification phase: ⑥ verification.

Table 3: Existing or proposed techniques used in each phase of the all-in-one unpacking system.

Analytical phase | Techniques used or proposed
Detection | (i) EP section test, proposed in Section 4.1; (ii) signature test [23]; (iii) WRITE attribute test [16]; (iv) entropy test [17]
Unpacking | (i) Static: library-based unpacking [33]; (ii) Dynamic: entropy change-based unpacking [14, 27]
Verification | Verification algorithm, proposed in Section 4.3

Table 4: Detection techniques used in existing and proposed approaches (○: supported, ×: not supported).

Method | DP-NEP | DP-SIG | DP-WR | DP-ENT
PEiD [23] | × | ○ | × | ×
[17] | × | × | × | ○
[16] | × | × | ○ | ○
Proposed approach | ○ | ○ | ○ | ○

dynamic unpacking. We can easily perform static unpacking by using the unpacking library found for each packer, so in this section we focus on explaining how dynamic unpacking works in detail. Our dynamic unpacking method is also known as the “execute-after-write” method. More precisely, the proposed method first restores the original file into memory by finding the OEP from the packed file and then analyzes (i.e., executes) the restored original file in memory. In order to avoid infecting real machines, we perform the unpacking and validation phases in debugging mode.

The OEP is the address of the first instruction to execute after the packed file is unpacked, and it represents the entry point of the original program. In other words, since the code following the OEP is the starting code of the original program, finding the OEP is the most important issue in dynamic unpacking. To find the OEP, we use the entropy change-based analysis proposed by Bat-Erdene et al. [27]. When a packed file is executed, the packed data is uncompressed and written to memory, and at this time, data might be changed or added to the file. Because of this data change, the entropy value continuously changes during the unpacking process. If unpacking is successful, the entropy value stabilizes, the original file is restored, and the IP (instruction pointer) moves to another section of the restored original file for its execution. Thus, we can find the OEP by examining whether the IP moves to another section when there is no change in the entropy value of each section. (If the heuristic of investigating the change of IP sections is known, someone may create a malicious packing method that evades the heuristic. However, to evade it, the OEP must be reached without a change of IP sections, and packing a PE file in such a way might not be easy. In our actual analysis, we did not find such cases in the 19 packers we covered. Even though such a case hardly exists, it might occur in the worst case. We leave an advanced approach that works against such avoidance as a further study.) To accomplish this, we need to execute the target process in units of instructions and monitor the entropy change; however, measuring the entropy value after every instruction is very time-consuming and inefficient. We note here that the OEP generally exists after a branch instruction, so we calculate the entropy value only when the instruction is JMP, Conditional JMP (CJMP), or RETN. We also note that the iterations at program execution are independent of the OEP, so we store and reuse the entropy values measured after JMP, CJMP, and RETN instructions to avoid duplicated calculations caused by the iterations and to reduce the total analysis time.
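As a minimal illustration of the entropy measurement that this monitoring relies on, the following Python sketch computes the byte-level Shannon entropy of a buffer. The function name `shannon_entropy` and the sample buffers are assumptions made for illustration; the text does not prescribe this exact code.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the byte-level Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum of -p * log2(p) over the observed byte values.
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical section contents: a constant run versus a uniform spread of byte values.
low = shannon_entropy(b"\x00" * 1024)
high = shannon_entropy(bytes(range(256)) * 4)
```

A constant buffer sits at the bottom of the scale and a uniform spread of all 256 byte values sits at the top, which is why compressed or encrypted sections tend to show high values.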

Algorithm 2 shows the proposed dynamic unpacking algorithm based on the entropy changes, and it returns the OEP address for a given PE file. First, it computes the initial entropy of each section (Line 1). Then, it repeats the algorithm until the process ends (Lines 2 to 18). If the current IP address is greater than the last address of the current section, it returns Not-Found (Line 4). This is because IP moves to the next section without a JMP family instruction, so it cannot find the OEP in this section. On the contrary, if the instruction at IP is one of JMP, CJMP, or RETN, it executes the file up to IP and stores the address of the next instruction in DstAddr (Lines 5 to 7). But if IP exists in IP-History or DstAddr exists in DstAddr-History, it means an iterative instruction or address, so we move to the next IP (Line 8). Otherwise, it stores IP and DstAddr into each history for the next iteration (Line 9). At this time, if DstAddr is out of the range of the file, it returns Not-Found because it cannot find the OEP within the file (Line 10). If DstAddr is in the normal range, it calculates the entropy of each section again and compares the current entropy with the previous entropy to check whether it is stable (Lines 11 to 13). If the entropy change is stable and DstAddr is in a different section, it means that unpacking the file is successful and IP is moved to another section by a JMP family instruction. Thus, it

[Code-Lock vx.x]
signature = 47 8B C2 05 1E 00 52 8B D0 B8 02 3D CD 21 8B D8 5A
ep_only = true

[CodeCrypt v0.14b]
signature = E9 C5 02 00 00 EB 02 83 3D 58 EB 02 FF 1D 5B EB 02 0F C7 5F
ep_only = true

[CodeCrypt v0.15b]
signature = E9 31 03 00 00 EB 02 83 3D 58 EB 02 FF 1D 5B EB 02 0F C7 5F
ep_only = true

Figure 2: Examples of packer signatures stored in the signature database.
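A signature entry of this form can be matched against the bytes at the entry point with a few lines of Python. The sketch below is only an assumed reading of the format: the helper names `parse_signature` and `matches_at_entry_point` are hypothetical, the `??` wildcard handling follows common PEiD-style databases rather than anything stated in the figure, and the sample buffer is made up.

```python
from typing import List, Optional


def parse_signature(text: str) -> List[Optional[int]]:
    """Turn 'E9 C5 ?? 00' into [0xE9, 0xC5, None, 0x00]; None marks a wildcard byte."""
    return [None if tok == "??" else int(tok, 16) for tok in text.split()]


def matches_at_entry_point(data: bytes, ep_offset: int,
                           signature: List[Optional[int]]) -> bool:
    """Compare the signature with data starting at ep_offset (the ep_only = true case)."""
    window = data[ep_offset:ep_offset + len(signature)]
    if len(window) < len(signature):
        return False
    return all(s is None or s == b for s, b in zip(signature, window))


# Second signature of Figure 2, matched against a made-up buffer whose
# entry point is assumed to be at file offset 2.
sig = parse_signature("E9 C5 02 00 00 EB 02 83 3D 58 EB 02 FF 1D 5B EB 02 0F C7 5F")
sample = b"\x90\x90" + bytes(b for b in sig if b is not None) + b"\xCC"
found = matches_at_entry_point(sample, 2, sig)
```

In a real PE file, the entry point is given as a relative virtual address and would first have to be translated to a file offset; that step is left out here.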

Input: a PE file
Output: Packed or Not-Packed (packing detection result)
begin
(1) if No EP Section then return Packed  // DP-NEP
(2) else if Packer Signature Found then return Packed  // DP-SIG
(3) else if "WRITE" enabled and EP Section Entropy ∈ Packing-Range then
(4)   return Packed  // DP-WR, DP-ENT
(5) else return Not-Packed
(6) end-if
end

ALGORITHM 1: Packing detection.
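The cascade of Algorithm 1 can be sketched in Python as follows. This is an assumed, simplified rendering: the function name `detect_packing` and its four boolean inputs are hypothetical, and the code that would derive them from a PE file (section lookup, signature scan, attribute check, entropy test) is left out.

```python
def detect_packing(has_ep_section: bool, signature_found: bool,
                   ep_writable: bool, ep_entropy_in_packing_range: bool) -> str:
    """Return 'Packed' or 'Not-Packed', applying the tests in the order of Algorithm 1."""
    if not has_ep_section:                            # DP-NEP
        return "Packed"
    if signature_found:                               # DP-SIG
        return "Packed"
    if ep_writable and ep_entropy_in_packing_range:   # DP-WR and DP-ENT together
        return "Packed"
    return "Not-Packed"


# A file with an EP section and no known signature, but a writable,
# suspicious-entropy EP section, is still reported as packed.
verdict = detect_packing(True, False, True, True)
```

Because the last test requires both conditions, a file with no signature match and no WRITE attribute is reported as Not-Packed even when its entropy is suspicious.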


stops the execution and returns DstAddr as the OEP (Lines 13 and 14).
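The decision logic of Algorithm 2 can be tried out without a debugger by replaying a recorded list of branch events. The sketch below makes several assumptions: `BranchEvent`, `section_index`, `find_oep`, and the `tolerance` used to call the entropy "stable" are all hypothetical, each event carries the per-section entropies observed when the branch is reached, and the Line 4 check (IP running past the end of its section) is omitted.

```python
from typing import Iterable, NamedTuple, Optional, Sequence, Tuple


class BranchEvent(NamedTuple):
    ip: int                       # address of the JMP/CJMP/RETN instruction
    dst: int                      # address that the branch transfers control to
    entropies: Tuple[float, ...]  # per-section entropy observed at this point


def section_index(addr: int, sections: Sequence[Tuple[int, int]]) -> Optional[int]:
    """Return the index of the [start, end) range containing addr, or None."""
    for i, (start, end) in enumerate(sections):
        if start <= addr < end:
            return i
    return None


def find_oep(events: Iterable[BranchEvent], sections: Sequence[Tuple[int, int]],
             initial_entropies: Sequence[float], tolerance: float = 1e-3) -> Optional[int]:
    seen_ips, seen_dsts = set(), set()
    previous = tuple(initial_entropies)
    for ev in events:
        if ev.ip in seen_ips or ev.dst in seen_dsts:
            continue                      # loop iteration that was already examined
        seen_ips.add(ev.ip)
        seen_dsts.add(ev.dst)
        dst_section = section_index(ev.dst, sections)
        if dst_section is None:
            return None                   # destination lies outside the file
        stable = all(abs(a - b) <= tolerance for a, b in zip(ev.entropies, previous))
        previous = ev.entropies
        if stable and dst_section != section_index(ev.ip, sections):
            return ev.dst                 # entropy settled and control left the section
    return None


# Made-up trace: the stub runs in section 1 and finally jumps into section 0.
sections = [(0x1000, 0x2000), (0x2000, 0x3000)]
trace = [
    BranchEvent(0x2010, 0x2040, (3.0, 7.9)),
    BranchEvent(0x2050, 0x2010, (5.5, 7.9)),
    BranchEvent(0x2010, 0x2040, (6.0, 7.9)),   # repeated branch, skipped
    BranchEvent(0x2060, 0x2070, (6.2, 7.9)),
    BranchEvent(0x2080, 0x1000, (6.2, 7.9)),
]
oep = find_oep(trace, sections, initial_entropies=(0.0, 7.9))
```

In the described system, the events would come from single-stepping the process under the debugger rather than from a prepared list.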

4.3. Verification Phase. In this section, we present an algorithm for measuring the restoration accuracy by comparing the restored file with its original file. To execute the packed file, the proposed system uses dynamic unpacking and finds the OEP during its execution. If the system finds the OEP, we assume that the restoration of the original file is successful and regard the resulting file as the restored file. In the verification phase, we quantitatively measure the restoration accuracy by comparing this restored file with its original file.

Figure 3 shows an example of comparing some bytecodes of the original file with those of the restored file. For the comparison, we first pack the pspasswd.exe file provided by Microsoft [37] and then unpack it again by dynamic unpacking. The figure shows that the bytecodes of the four sections of the original file, .text, .rdata, .data, and .rsrc, appear identically in the UPX0, UPX1, and .rsrc sections of the restored file. That is, even if the name and location of each section are different, the same bytecodes exist in the restored file. However, the restored file contains padding and garbage values, and some bytecodes may be at locations different from those in the original file. Thus, in order to compare the original and the restored files, we need to search for where the original bytecodes are located in the restored file. Since manually comparing the bytecodes of the original and restored files takes a very long time, in this paper, we propose an efficient search algorithm using a hash function. We use the well-known MD5 [38] as the hash function of the search algorithm. As a message-digest algorithm, MD5 is widely used in checking the integrity of long data.

Algorithm 3 shows the restoration accuracy calculation algorithm. Its basic principle is to check whether each section of the original file exists in the restored file. We here use the MD5 hash function to speed up the scan. The inputs to the algorithm are an original file A and a restored file B, and the output is restorationAccuracy, a list storing the restoration accuracy values of all sections. First, we remove garbage and padding values from the restored file (Line 1). For each section Ai of A, we obtain the length curLen of the section and calculate a hash value H1 by applying the MD5 hash function (Lines 4 to 7). We then calculate another hash value H2 from the restored file B by considering only a part of length curLen (Line 9). If H1 equals H2 (Line 10), it means that B contains Ai, so we stop the comparison and calculate the restoration accuracy of the section (Line 14). However, if H1 and H2 are not equal, we perform the hash value comparison again after decrementing the length curLen of Ai (Lines 6 to 13). We repeat this process to calculate the restoration accuracy for the section Ai (Line 14). After completing the accuracy calculation for all sections, the algorithm returns the list restorationAccuracy (Line 16).
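One possible reading of this procedure is sketched below in Python with the standard `hashlib` module. The names `contains_by_hash` and `restoration_accuracy` and the sample byte strings are assumptions; the sketch shortens the section prefix only after a full scan of the restored file fails, skips the padding-removal step of Line 1, and makes no attempt at efficiency.

```python
import hashlib
from typing import Dict


def md5(data: bytes) -> bytes:
    return hashlib.md5(data).digest()


def contains_by_hash(restored: bytes, piece: bytes) -> bool:
    """Slide a window of len(piece) over restored and compare MD5 digests."""
    target = md5(piece)
    n = len(piece)
    return any(md5(restored[j:j + n]) == target
               for j in range(len(restored) - n + 1))


def restoration_accuracy(original_sections: Dict[str, bytes],
                         restored: bytes) -> Dict[str, float]:
    """Percentage of each original section (longest matching prefix) found in restored."""
    result = {}
    for name, section in original_sections.items():
        total = len(section)
        cur = total
        # Shorten the prefix until it is found somewhere in the restored file.
        while cur > 0 and not contains_by_hash(restored, section[:cur]):
            cur -= 1
        result[name] = (cur / total) * 100 if total else 0.0
    return result


# Made-up data: .text survives intact, only half of .data does.
original = {".text": b"ABCDEFGH", ".data": b"12345678"}
restored = b"\x00\x00ABCDEFGH\x00\x0012349999"
accuracy = restoration_accuracy(original, restored)
```

The nested scan is costly for real sections, so a practical implementation would need a faster search; the sketch is meant only to make the accuracy definition concrete.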

5. Performance Evaluation

5.1. Experimental Environment. In this section, we present the results of the experimental evaluation to show the superiority of the all-in-one unpacking system. We conduct the experiments on a virtual server as well as a physical server. The reasons for experimenting in two server environments are as follows. First, unpacking results may differ between the virtual server and the physical server if a packed file has anti-unpacking features such as Anti-VM or Anti-Debug. Second, the dynamic unpacking of malware

Input: a PE file
Output: OEP, the start address of the unpacked original file
begin
(1) Measure the entropy value of each section
(2) while the process has not reached the end do
(3)   IP ← the current instruction pointer
(4)   if IP's address > current section's last address then return Not-Found
(5)   else if IP is JMP or CJMP or RETN then
(6)     Execute instructions until IP
(7)     DstAddr ← the next instruction address
(8)     if IP ∉ IP-History and DstAddr ∉ DstAddr-History then
(9)       Store IP and DstAddr into each history
(10)      if DstAddr is out of the file then return Not-Found
(11)      else
(12)        Re-calculate the entropy value of each section
(13)        if Entropy change is stable and DstAddr is in the different section then
(14)          return DstAddr
(15)        end-if
(16)      end-if
(17)    end-if
(18) end-while
(19) return Not-Found
end

ALGORITHM 2: Dynamic unpacking algorithm.


files may infect the physical server. Thus, we perform unpacking first on the virtual server and then perform the experiment on the physical server only when there is no abnormality. The hardware specifications of the experimental platform are as follows. The virtual server is an Intel Core i7-6700 3.40 GHz with 8 GB RAM, and the physical server is an Intel Core i7-6700 3.40 GHz with 32 GB RAM. The target operating system is 32-bit Windows 7, and we use PEVIEW to view the structure and content of 32-bit PE files. As the debugger, we use Immunity Debugger 1.85 [25] to implement the dynamic unpacking phase of Algorithm 2. To implement the all-in-one unpacking system including the

Figure 3: Comparison of sample bytecodes of the original and restored files. (a) Sections of the original file (.text, .rdata, .data, and .rsrc). (b) Sections of the restored file (UPX0, UPX1, UPX1, and .rsrc).

Input: A, an original file; B, a restored file
Output: restorationAccuracy[1..n], a list of restoration accuracy values for n sections
begin
(1) Remove garbage and padding values from B
(2) A ← {A1, A2, ..., An}, where Ai is the i-th section of A
(3) for each section Ai ∈ A do
(4)   totalLen ← Ai.length
(5)   curLen ← totalLen
(6)   while curLen > 0 do
(7)     H1 ← hash(Ai[1 : curLen])  // hash a part of section Ai
(8)     for j ← 1 to (B.length − curLen + 1) do
(9)       H2 ← hash(B[j : j + curLen − 1])  // hash a part of B
(10)      if H1 = H2 then break
(11)      else curLen ← curLen − 1
(12)    end-for
(13)  end-while
(14)  restorationAccuracy[i] ← (curLen / totalLen) × 100
(15) end-for
(16) return restorationAccuracy
end

ALGORITHM 3: Restoration rate calculation algorithm.


dynamic unpacking algorithm, we run Python scripts on top of Immunity Debugger.

In the experiment, we use a total of 130 PE files, where 90 files are provided by Microsoft [37] and 40 files are randomly collected from the putty [39] and MD5 [40] sites. We pack each of the 130 executable files with each of the 19 packers and construct a dataset consisting of a total of 2600 files: 2470 packed files and the 130 original files. The 19 packers used are UPX, ASPack, NSPack, MPRESS, Yoda's Protector, MEW, Packman, RLPack, BeRoEXEPacker, PECompact, Petite, JDpack, Molebox, eXpressor, Yoda's Crypter, FSG, exe32pack, WinUpack, and Neolite. The reason for configuring the dataset by packing normal PE files directly is that we can quantitatively measure the packing detection accuracy, the false-positive rate, and the false-negative rate only if we know in advance which experimental files are packed.

In this paper, we conduct three experiments. The first experiment covers the integration feature of the all-in-one system and verifies that the system supports all three phases of packing detection, unpacking, and verification in an integrated framework. The second experiment covers the detection phase, where we first set Packing-Range, an entropy range for judging the packing, and then evaluate the packing detection accuracy using the range. The third experiment covers the unpacking and verification phases, where we verify whether the system unpacks the PE file correctly by calculating the restoration accuracy.

5.2. Experimental Results on Overall Working Mechanism. In this section, we confirm through experiments that the integration mechanism of the all-in-one unpacking system works correctly. Figure 4 shows an operation screenshot of each phase of the all-in-one system. First, Figure 4(a) shows an example screenshot of the detection phase. To confirm that all four cases of packing detection work correctly, we continue executing the remaining cases even if a previous case has already detected packing. In Figure 4(a), we first examine DP-NEP of Algorithm 1. DP-NEP finds the EP section, so we proceed to DP-SIG (the first two lines). In the DP-SIG test, we detect the UPX packer, so we conclude that the file is packed (the second two lines). Even though detection is completed, we proceed to the next DP-ENT and DP-WR tests. The entropy value is 7.93, which is contained in Packing-Range, and the attribute value of the EP section is "0xe0000040", which includes the WRITE attribute. Thus, we conclude that the file is also packed by the DP-ENT and DP-WR tests (the third three lines). According to Figure 4(a), we can confirm that the DP-NEP, DP-SIG, DP-ENT, and DP-WR tests of the detection phase all work correctly.
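The WRITE test on an attribute value such as 0xe0000040 amounts to a bit-mask check on the section characteristics field. The following Python sketch decodes that value using the section flag constants defined in the PE format; the helper names `decode_characteristics` and `is_writable` are assumptions for illustration, and only four of the defined flags are listed.

```python
# Section characteristic flags from the PE format (subset).
SECTION_FLAGS = {
    "CNT_INITIALIZED_DATA": 0x00000040,
    "MEM_EXECUTE": 0x20000000,
    "MEM_READ": 0x40000000,
    "MEM_WRITE": 0x80000000,
}


def decode_characteristics(value: int) -> list:
    """List the names of the known flags that are set in a characteristics value."""
    return [name for name, bit in SECTION_FLAGS.items() if value & bit]


def is_writable(value: int) -> bool:
    """DP-WR style check: is the WRITE flag set for this section?"""
    return bool(value & SECTION_FLAGS["MEM_WRITE"])


flags = decode_characteristics(0xE0000040)
writable = is_writable(0xE0000040)
```

An EP section that is both executable and writable is unusual for compiler-generated code, which is what makes the flag a useful hint.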

Figures 4(b) and 4(c) show example screenshots of the unpacking phase. If an unpacking library exists for the packer identified in the detection phase, we perform static unpacking, as shown in Figure 4(b); if not, we perform dynamic unpacking, as shown in Figure 4(c). Figure 4(b) shows the result of static unpacking by the UPX commercial tool [33] for the UPX-packed file. In the figure, the file size increases from 285,760 bytes to 731,200 bytes by unpacking, which means that the compression ratio is 39.08%. Figure 4(c) shows the result of the entropy-based dynamic unpacking. In the figure, the system calculates the entropy of each section whenever IP encounters a JMP or CJMP instruction, and it returns the current DstAddr as the OEP if the entropy change is stable and DstAddr is in a different section.

Figure 4(d) shows an example screenshot of the verification phase. In this example, we measure the restoration accuracy of the restored file for the .text and .data sections of the original file. The length of the original .text section is 18,439, and the length of the unpacked file is 872,555. In the hash value comparison, since the portion corresponding to the original .text section exists in the unpacked file, the restoration accuracy becomes 100%. Similarly, we also calculate the restoration accuracy of the .data section as 100%.

5.3. Experimental Results on Detection Phase. In this section, we describe two experimental results of the detection phase. The first experiment sets Packing-Range, and the second measures the detection accuracy of the proposed detection algorithm. In the first experiment, we experimentally determine the Packing-Range used in DP-ENT of Algorithm 1. The previous study by Lyda and Hamrock [17] uses the entropy value of the whole file and sets the entropy range of the packed file to [6.677, 6.926]. On the contrary, Han and Lee [16] focus on the EP section only and compute the entropy value for the EP section instead of the whole file. Through the experiment, they set the entropy threshold to 6.85 and judge a file as packed if its entropy is over 6.85.

We also determine the entropy range Packing-Range through experiments on various packers. Figure 5 shows the entropy distribution of the EP section for different packers. Figure 5(a) shows the average distribution of the original and 19 packed files. Figures 5(b)–5(m) show the more detailed entropy distribution for each packer. According to Figures 5(b)–5(m), the entropy distribution of the packed file differs depending on the packer type. In particular, as shown in Figures 5(d), 5(e), and 5(k), some packed files have smaller entropy values than the original file, and thus, we cannot simply set an upper threshold value as the packing criterion as in the previous study [16]. However, we note that while the entropy of packed files is distributed over a wide range, as shown in Figures 5(c)–5(m), the entropy of original files is distributed only in a specific range, as shown in Figure 5(b). More specifically, the entropy value of the normal file is between 6.00 and 6.85, as shown in Figure 5(b). Based on these experimental results, we thus set Packing-Range to [0, 6.00) or [6.85, ∞] of the EP section. According to [9, 29], the entropy method has the disadvantages of low accuracy and of being easy to evade. To overcome this point, we use the three other detection techniques of DP-NEP, DP-SIG, and DP-WR together with DP-ENT in a hybrid manner. Using this hybrid approach, we can further improve the detection accuracy.
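The resulting range test is small enough to state directly in code. In the Python sketch below, the function name `in_packing_range` and its parameter names are assumptions; the bounds 6.00 and 6.85 are the ones set above, with the lower bound excluded from and the upper bound included in Packing-Range.

```python
def in_packing_range(ep_entropy: float, lower: float = 6.00, upper: float = 6.85) -> bool:
    """True if the EP-section entropy falls in [0, lower) or [upper, infinity)."""
    return ep_entropy < lower or ep_entropy >= upper


# The entropy 7.93 reported for the UPX example lies in the range;
# a value inside the normal band does not.
upx_example = in_packing_range(7.93)
normal_example = in_packing_range(6.50)
```

Keeping the bounds as parameters makes it easy to repeat the experiment with a different set of packers or normal files.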

In the second experiment, we evaluate the detection accuracy and the false-negative/false-positive rates. Table 5 compares the three detection techniques of the detection phase with the proposed hybrid approach. Since EP sections


Figure 4: Operation screenshots of the detection, unpacking, and verification phases. (a) Detection phase. (b) Unpacking phase (static unpacking). (c) Unpacking phase (dynamic unpacking). (d) Verification phase.

Figure 5: Entropy distributions of the EP section for original and representative packed files (x-axis: entropy of the EP section, from 0 to 8; y-axis: number of files). (a) Average of original and 19 packers, plotted as "Not packed (b)" and "Summation of (c) to (m)". (b) Original (not packed). (c) UPX. (d) ASPack. (e) NSPack. (f) MPRESS. (g) Yoda's Protector. (h) RLPack. (i) BeroEXE. (j) MEW. (k) PACKMAN. (l) WinUpack. (m) exe32pack.


are mandatory in all PE files, we can detect EP sections unless they are intentionally concealed. All files of the dataset used in the experiment also have EP sections, and the detection rate by DP-NEP is trivially 0%. In other words, there is no packing detection case due to the absence of the EP section, so we exclude the experimental result of DP-NEP from Table 5.

We now explain the results of Table 5 column by column. First, we explain the DP-SIG column. In the case of DP-SIG, Not-Packed shows a detection rate of 0% since no signature exists in normal files. That is, no packing detection occurs in normal files, which means that DP-SIG performs accurate detection for Not-Packed files. On the other hand, for the packed files, DP-SIG detects 100% of the files for every packer except NSPack. The detection rate of NSPack is merely 53.8% because NSPack signatures often exist in sections other than the EP section. In summary, DP-SIG shows a detection accuracy of 95.0% on average.

Second, for the experiment on the detection rate of DP-WR, we intentionally remove the WRITE attribute from 39 files (30% of the packed files) per packer. We perform this removal process for the 18 packers other than exe32pack. The reason for changing the dataset is that normal packed files usually have WRITE attributes, so we arbitrarily remove the WRITE attribute from 30% of the files for the DP-WR experiment. Looking at the DP-WR column in Table 5, DP-WR detects only 70% of the files per packer for the 18 packers since we intentionally remove the WRITE attribute from 30% of the files. In the case of exe32pack, however, we need not perform the removal process since it has no WRITE attribute in the PE header. Accordingly, as shown in the DP-WR column of Table 5, the detection rate of exe32pack by DP-WR is trivially 0%. In the case of Not-Packed, the detection rate is 0% since normal files have no WRITE attribute. In summary, checking the presence of the WRITE attribute is very simple, and DP-WR works correctly if the packed file has the WRITE attribute, but it does not work well if the WRITE attribute does not exist or has been maliciously modified.

Third, the DP-ENT column shows the entropy-based detection rates. As shown in the table, DP-ENT shows a 100% detection rate for 9 packers including NSPack, UPX, MEW, and BeRoEXEPacker, but it incurs some false negatives for 10 packers including ASPack, Yoda's Protector, Packman, and MPRESS, for which the detection rate is lower than 100%. In particular, for MPRESS, Petite, JDpack, eXpressor, and Neolite, it shows a low detection rate at or below 24.6% because the entropy values produced by these packers fall within the entropy distribution of normal files. DP-ENT also has a false-positive rate of 3.84% for Not-Packed. In summary, DP-ENT works very accurately for most packers, but it may incur false negatives for a few specific packers or false positives for some normal files. To overcome this limitation, we use DP-ENT together with other detection techniques. We can see the improvement in the experimental results of Tables 5 and 6. In the tables, when using only DP-ENT, the detection accuracy is low for some packers (especially MPRESS), but when using it together with the other three techniques, the accuracy is close to 100%.

Fourth, the proposed hybrid approach of integrating four techniques detects 100% of the files for all packers except NSPack and shows no false positives. The reason for this high detection rate is that the proposed method has all the advantages of DP-SIG, DP-WR, and DP-ENT as well as DP-NEP. In the case of NSPack, however, the detection rate is merely 70.0%; this is because, as mentioned above, we forcibly remove the WRITE attribute from 30% of the files. According to the results of Table 5, the detection rate differs according to each detection technique, and some techniques

Table 5: Comparison of packing detection rates of three techniques and the proposed hybrid approach.

Packer              DP-SIG (%)   DP-WR (%)   DP-ENT (%)   Hybrid approach (%)
ASPack              100          70.0        91.5         100
NSPack              53.8         70.0        100          70.0
MPRESS              100          70.0        24.6         100
UPX                 100          70.0        100          100
Yoda's Protector    100          70.0        96.9         100
MEW                 100          70.0        100          100
BeRoEXEPacker       100          70.0        100          100
Packman             100          70.0        93.8         100
RLPack              100          70.0        99.3         100
PECompact           100          70.0        100          100
Petite              100          70.0        0.00         100
JDpack              100          70.0        5.62         100
Molebox             100          70.0        100          100
eXpressor           100          70.0        0.00         100
Yoda's Crypter      100          70.0        91.5         100
FSG                 100          70.0        100          100
exe32pack           100          0.00        100          100
WinUpack            100          70.0        100          100
Neolite             100          70.0        16.1         100
Average             95.0         66.3        73.9         98.4
Not-Packed          0.00         0.00        3.84         0.00


incur false positives or false negatives; on the other hand, the proposed hybrid approach shows the highest detection rate of 98.4% and no false positives.

Table 6 shows the detection accuracy of combinations of two or three detection techniques and of the proposed hybrid approach. First, DP-SIG + ENT, a combination of the signature and entropy tests, detects 100% of the packed files but detects 18 files incorrectly, showing a false-positive rate of 0.73%. Second, DP-ENT + WR, a combination of the entropy and WRITE tests, shows a low detection rate of 92.0% because the entropy value of MPRESS is in the distribution of normal files. Third, DP-SIG + ENT + WR, a combination of the signature, entropy, and WRITE tests, and our hybrid approach show 98.4% detection accuracy and a 1.60% false-negative rate. This is because DP-SIG + ENT + WR and the hybrid approach go through the same detection procedure except for DP-NEP. The false-negative rate of 1.60% is due to the files having no WRITE attributes. According to the results of Tables 5 and 6, we believe that the proposed hybrid approach is an excellent method for achieving high detection accuracy by combining the advantages of the existing and new detection techniques.

5.4. Experimental Results on Unpacking and Verification Phases. In this section, we verify the unpacking and verification phases by performing the unpacking of Algorithm 2 and calculating the restoration accuracy of the restored files with Algorithm 3. However, the unpacking and verification phases take a lot of time, so we experiment with 40 randomly selected files of the dataset. The unpacking phase consists of static unpacking and dynamic unpacking. In the experiment, we exclude static unpacking since it uses the existing library as it is, and we focus only on the dynamic unpacking of Algorithm 2 since it tries to restore the original file without any information about packers.

Table 7 shows the restoration accuracy of the proposed method for the .text and .data sections for each of the 19 packers. We cannot directly unpack Yoda's Protector and PECompact since they use Anti-Debug engineering techniques. To overcome this problem, we unpack these packers by bypassing three Anti-Debugging API calls: GetCurrentProcessID(), BlockInput(), and IsDebuggerPresent() [41]. (That is, we run Algorithm 2 with the bypassing techniques for Yoda's Protector and PECompact and include the experimental results in Table 7.) In general, since only the .text and .data sections of the original file are packed [42], in the experiment, we consider only the restoration accuracy of these .text and .data sections of the restored file. In Table 7, among the 19 packers, 11 packers including UPX, NSPack, MEW, RLPack, and BeRoEXEPacker show 100% restoration accuracy for both the .text and .data sections. On the other hand, 8 packers including ASPack, Packman, and MPRESS show less than 100% restoration accuracy because of the manner in which the packers handle the .reloc section. More specifically, if there is a .reloc section in the original file, the unpacking relocates the .text and .data sections, changing all the data values in each section. Therefore, the restoration accuracy depends on whether there is a .reloc section in the original file.

Table 8 shows the restoration results in detail according to the presence of the .reloc section. As shown in the table, the restoration accuracy is 100% in the absence of the .reloc section, while it is very low, in the range of 0.00% to 99.8%, in the presence of the .reloc section. However, the low restoration accuracy is due to the nature of the packer, and it does not mean that unpacking does not work correctly. In other words, even if the unpacking is successful, the data values may change due to the packer characteristic, and the restoration accuracy is then calculated to be low. In summary, we can use the proposed verification algorithm to calculate the restoration accuracy of unpacked files, which quantitatively measures the accuracy of the actual restoration.
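The effect of relocation on a byte-level comparison can be illustrated with a toy example. The Python sketch below is an assumption-laden illustration, not the packers' actual behaviour: `apply_base_relocations` and `common_prefix_length` are hypothetical helpers, the 8-byte "section" and the load-address shift `delta` are made up, and the accuracy is computed as the matching prefix length over the section length.

```python
import struct
from typing import Iterable


def apply_base_relocations(section: bytes, offsets: Iterable[int], delta: int) -> bytes:
    """Add delta to each 32-bit little-endian address stored at the given offsets."""
    buf = bytearray(section)
    for off in offsets:
        (value,) = struct.unpack_from("<I", buf, off)
        struct.pack_into("<I", buf, off, (value + delta) & 0xFFFFFFFF)
    return bytes(buf)


def common_prefix_length(a: bytes, b: bytes) -> int:
    """Number of leading bytes on which a and b agree."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Made-up section: MOV EAX, [0x00402000], two NOPs, RET.
original = b"\xA1" + struct.pack("<I", 0x00402000) + b"\x90\x90\xC3"
# Assume the image is loaded 0x10000 bytes above its preferred base.
relocated = apply_base_relocations(original, offsets=[1], delta=0x00010000)
prefix = common_prefix_length(original, relocated)
accuracy = prefix / len(original) * 100
```

A single patched address is enough to cut the matching prefix short, even though the instructions themselves are unchanged, which is consistent with the observation that a low score does not imply a failed unpacking.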

In addition to the .text, .data, and .reloc sections, we further consider the PE header, which includes the IAT information. In general, the IAT is located in the header of the PE file, and if we actually need to run the unpacked file, we must also restore this header information including the IAT. In fact, the studies of [3, 43] restore the header including the IAT and produce the actual whole executable file. On the other hand, most studies, including this paper, do not restore the actual whole executable file itself but restore the .text, .data, and .reloc sections, which are the major parts of the executable file. This is because the purpose of unpacking is not to execute the restored file but to identify and extract malicious code included in the restored file. Therefore, in this paper, we do not need to restore the header required for the actual execution, and accordingly, comparison of the IAT information is also not necessary. Instead, we restore and compare the .text, .data, and .reloc sections, which are more likely to hide malicious code. Restoring the PE header is beyond the scope of this study; we leave it as a separate research topic.

6. Conclusions

In this paper, we analyzed recent studies of packing detection and unpacking techniques and derived four important observations: no phase integration, no detection combination, no real-restoration, and no unpacking verification. We then proposed an all-in-one unpacking system to address these four observations. First, no phase integration was a difficulty in which analysts had to perform the packing detection, unpacking, and verification phases separately, and we solved this problem by presenting an all-in-one

Table 6: Comparison of packing detection accuracy of two or three combinations and of the hybrid approach.

Metrics               DP-SIG + ENT (%)   DP-ENT + WR (%)   DP-SIG + ENT + WR (%)   Hybrid approach (%)
Detection accuracy    95.0               92.0              98.4                    98.4
False-negative rate   5.00               8.00              1.60                    1.60
False-positive rate   0.73               0.00              0.00                    0.00


Table 7: Restoration accuracy of .text and .data sections of each packer.

Packer              .text (%)   .data (%)
UPX                 100         100
NSPack              100         100
MEW                 100         100
RLPack              100         100
BeRoEXE             100         100
Neolite             100         100
FSG                 100         100
eXpressor           100         100
Molebox             100         100
Petite              100         100
JDpack              100         100
ASPack              84.3        84.3
Yoda's Protector    84.0        84.0
exe32pack           80.8        100
MPRESS              78.8        75.3
Yoda's Crypter      74.5        74.5
PECompact           74.0        74.0
WinUpack            68.0        68.0
Packman             55.0        48.5
Average             89.2        90.2


unpacking system that integrates these three phases in a single framework. Second, no detection combination arose because there had been no attempt to combine various packing detection methods, and we solved this problem by presenting a hybrid approach that combines existing and new packing detection methods. Third, no real-restoration arose because there had been little discussion about restoring the actual executable file, and we solved this problem by actually restoring an original image of the packed file. Fourth, no unpacking verification arose because there had been no explicit work to measure the restoration accuracy of unpacked files, and we solved this problem by proposing a verification algorithm that quantitatively evaluates the restoration accuracy.

We constructed a dataset of 2600 PE files and experimentally evaluated the proposed system using the dataset. First, we performed all the phases of the all-in-one unpacking system sequentially and confirmed that its overall working mechanism worked well. Next, through the comparison of the hybrid packing detection approach and existing detection techniques, we showed that the detection accuracy of the proposed method was the highest at 98.4% on average with no false positives. For this, we also determined an entropy range of packing, Packing-Range, through extensive experiments on 19 representative packers. Finally, in the unpacking and verification phases, we confirmed that unpacking was successful for all files except the files packed with Yoda's Protector. We also showed that the restoration accuracy of unpacked files was nearly 100% except for a few packers.

As future research, we will focus on unpacking files packed with Anti-VM, Anti-Debug, and Anti-Reverse engineering techniques. We will also study how to compute the restoration accuracy of files having .reloc sections.

Data Availability

All the data files used in the experiments are available at https://github.com/chesvectain/PackingData.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was partly supported by the Institute of Information and Communications Technology Planning and Evaluation (IITP) grant funded by the Korea government (MSIT) (no. 2017-0-00158, Development of Cyber Threat Intelligence (CTI) analysis and information sharing technology for national cyber incident response) and the Korea Electric Power Corporation (no. R18XA05).

References

[1] M. Morgenstern, "AV-TEST Security report 2017/18," 2019, https://www.av-test.org/fileadmin/pdf/security_report/AV-TEST_Security_Report_2017-2018.pdf.

[2] T. Ban, R. Isawa, S. Guo, D. Inoue, and K. Nakao, "Efficient malware packer identification using support vector machines with spectrum kernel," in Proceedings of the Eighth Asia Joint Conference on Information Security, pp. 69–76, Seoul, South Korea, July 2013.

[3] B. Cheng, M. Jiang, J. Fu et al., "Towards paving the way for large-scale Windows malware analysis: generic binary unpacking with orders-of-magnitude performance boost," in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pp. 395–411, Toronto, Canada, October 2018.

[4] A. Gupta and M. S. Arya, "Hashing based encryption and anti-debugger support for packing multiple files into single executable," International Journal of Advanced Research in Computer Science, vol. 9, no. 1, pp. 914–920, 2018.

[5] Y. Kawakoya, M. Iwamura, and M. Itoh, "Memory behavior-based automatic malware unpacking in stealth debugging environment," in Proceedings of the 5th International Conference on Malicious and Unwanted Software, pp. 39–46, Lorraine, France, October 2010.

[6] G. Liang, J. Pang, Z. Shan, R. Yang, and Y. Chen, "Automatic benchmark generation framework for malware detection," Security and Communication Networks, vol. 2018, Article ID 4947695, 8 pages, 2018.

[7] A. Pektas and T. Acarman, "A dynamic malware analyzer against virtual machine aware malicious software," Security and Communication Networks, vol. 7, no. 12, pp. 2245–2257, 2014.

[8] I. Santos, J. Nieves, and P. G. Bringas, "Semi-supervised learning for unknown malware detection," in International Symposium on Distributed Computing and Artificial Intelligence, A. Abraham, J. M. Corchado, S. R. Gonzalez, and J. F. De Paz Santana, Eds., pp. 415–422, Springer, Berlin, Germany, 2011.

[9] X. Ugarte-Pedrero, I. Santos, B. Sanz, C. Laorden, and P. G. Bringas, "Countering entropy measure attacks on packed software detection," in Proceedings of the 9th International Conference on Consumer Communications and Networking, pp. 164–168, Las Vegas, NV, USA, January 2012.

[10] H. Won, S.-P. Kim, S. Lee, M.-J. Choi, and Y.-S. Moon, "Secure principal component analysis in multiple distributed nodes," Security and Communication Networks, vol. 9, no. 14, pp. 2348–2358, 2016.

[11] S. Cesare, Y. Xiang, and W. Zhou, "Malwise–an effective and efficient classification system for packed and polymorphic malware," IEEE Transactions on Computers, vol. 62, no. 6, pp. 1193–1206, 2013.

Table 8: Restoration accuracy (%) according to the presence of the .reloc section (first value of each pair: .reloc present; second value, marked × in the original header: .reloc absent).

Packer              .text (present)   .text (absent)   .data (present)   .data (absent)
ASPack              0.00              100              0.00              100
Packman             99.8              100              88.1              100
Mpress              0.00              100              0.00              100
Yoda's crypter      0.00              100              0.04              100
PECompact           0.00              100              0.00              100
Yoda's Protector    0.00              100              0.05              100
exe32pack           0.00              100              100               100
WinUpack            0.00              100              0.00              100

[12] C. Malin, E. Casey, and J. Aquilina, Malware Forensics: Investigating and Analyzing Malicious Code, Syngress, Burlington, MA, USA, 2008.

[13] W. Yan, Z. Zheng, and A. Nirwan, "Revealing packed malware," IEEE Security and Privacy, vol. 6, no. 5, pp. 65–69, 2008.

[14] M. Bat-Erdene, T. Kim, H. Park, and H. Lee, "Packer detection for multi-layer executables using entropy analysis," Entropy, vol. 19, no. 3, p. 125, 2017.

[15] Y.-S. Choi, I.-K. Kim, J.-T. Oh, and J.-C. Ryou, "PE file header analysis-based packed PE file detection technique (PHAD)," in Proceedings of International Symposium on Computer Science and Its Applications, pp. 28–31, Hobart, Australia, October 2008.

[16] S. Han and S. Lee, "Packed PE file detection for malware forensics," The KIPS Transactions: Part C, vol. 16C, no. 5, pp. 555–562, 2009, in Korean.

[17] R. Lyda and J. Hamrock, "Using entropy analysis to find encrypted and packed malware," IEEE Security and Privacy, vol. 5, no. 2, pp. 40–45, 2007.

[18] R. Perdisci, A. Lanzi, and W. Lee, "Classification of packed executables for accurate computer virus detection," Pattern Recognition Letters, vol. 29, no. 14, pp. 1941–1946, 2008.

[19] A. Rohit, A. Singh, H. Pareek, and U. Edara, "A heuristics-based static analysis approach for detecting packed PE binaries," International Journal of Security and Its Applications, vol. 7, no. 5, pp. 257–268, 2013.

[20] G. Amato, "PEframe," 2019, https://github.com/guelfoweb/peframe.

[21] G. Jeong, E. Choo, J. Lee, M. Bat-Erdene, and H. Lee, "Generic unpacking using entropy analysis," Journal of Advanced Information Technology and Convergence, vol. 7, no. 1, pp. 232–238, 2009.

[22] V. DeMarines, "Obfuscation–how to do it and how to crack it," Network Security, vol. 2008, no. 7, pp. 4–7, 2008.

[23] H. Neil, "PEiD," 2019, https://github.com/wolfram77web/app-peid.

[24] O. Yuschuk, "OllyDbg," 2019, http://www.ollydbg.de/download.html.

[25] N. Waisman, "Immunity Debugger," 2019, https://www.immunityinc.com/products/debugger/index.html.

[26] P. Royal, M. Halpin, D. Dagon, R. Edmonds, and W. Lee, "PolyUnpack: Automating the hidden-code extraction of unpack-executing malware," in Proceedings of the 22nd Annual Computer Security Applications Conference, pp. 289–300, Miami Beach, FL, USA, December 2006.

[27] M. Bat-Erdene, Y. Park, H. Li, H. Lee, and M.-S. Choi, "Entropy analysis to classify unknown packing algorithms for malware detection," International Journal of Information Security, vol. 16, no. 3, pp. 227–248, 2017.

[28] S. Cesare and X. Yang, "Classification of malware using structured control flow," in Proceedings of the 8th Australasian Symposium on Parallel and Distributed Computing, pp. 61–70, Brisbane, Australia, January 2010.

[29] L. Martignoni, M. Christodorescu, and S. Jha, "OmniUnpack: fast, generic, and safe unpacking of malware," in Proceedings of the 23rd Annual Computer Security Applications Conference, pp. 431–441, Miami Beach, FL, USA, December 2007.

[30] M. G. Kang, P. Poosankam, and H. Yin, "Renovo: a hidden code extractor for packed executables," in Proceedings of the 5th ACM Workshop on Recurring Malcode, pp. 46–53, Alexandria, VA, USA, November 2007.

[31] S. Mariani, L. Fontana, F. Gritti, and S. D'Alessio, "PinDemonium: a DBI-based generic unpacker for windows executables," in Proceedings of the Black Hat 2016, Las Vegas, NV, USA, July 2016.

[32] G. Bonfante, J. Fernandez, J.-Y. Marion, B. Rouxel, F. Sabatier, and A. Thierry, "CoDisasm: medium scale concatic disassembly of self-modifying binaries with overlapping instructions," in Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 745–756, Denver, CO, USA, October 2016.

[33] M. Oberhumer, L. Molnar, and J. Reiser, "UPX," 2019, https://upx.github.io.

[34] A. Swinnen and A. Mesbahi, "One packer to rule them all: empirical identification, comparison, and circumvention of current antivirus detection techniques," in Proceedings of the Black Hat 2014, Las Vegas, NV, USA, August 2014.

[35] P. Bania, "Generic unpacking of self-modifying, aggressive, packed binary programs," 2009, https://arxiv.org/abs/0905.4581.

[36] H. Neil, "BobSoft Signature," 2019, http://woodmann.com/BobSoft/Files/Other/UserDB.zip.

[37] M. Russinovich, "Microsoft Sysinternals Suite," 2019, https://download.sysinternals.com/files/SysinternalsSuite.zip.

[38] R. Rivest, "The MD5 message-digest algorithm," RFC 1321, IETF, Fremont, CA, USA, 1992.

[39] S. Tatham, "Putty," 2019, https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html.

[40] L. Pascoe, "MD5," 2019, http://www.md5summer.org/about.html.

[41] P. Ferrie, "Anti-unpacker tricks - part one," 2019, https://www.virusbulletin.com/virusbulletin/2008/12/anti-unpacker-tricks-part-one.

[42] T.-Y. Wang and C.-H. Wu, "Detection of packed executables using support vector machines," in Proceedings of the 10th International Conference on Machine Learning and Cybernetics, pp. 717–722, Guilin, China, July 2011.

[43] D. Korczynski, "RePEconstruct: reconstructing binaries with self-modifying code and Import address table destruction," in Proceedings of the 11th International Conference on Malicious and Unwanted Software, pp. 31–38, Fajardo, Puerto Rico, October 2016.


$$H(x) = -\sum_{i=1}^{n} p(i) \log_2 p(i) \qquad (1)$$

The entropy-based method has the advantage of being easily applied to various packers. However, it causes a false positive if the entropy of a normal file is low or a false negative if that of a packed file is high.
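As an illustration, equation (1) can be sketched in Python as follows. This is a minimal sketch under our own assumptions: p(i) is taken as the relative frequency of each byte value in the input, and the function name shannon_entropy is hypothetical rather than part of any of the tools discussed here.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return H(x) = -sum p(i) * log2 p(i) over the byte values in data.

    The result is in bits per byte, so it lies between 0.0 and 8.0.
    """
    if not data:
        return 0.0  # convention chosen for this sketch: empty input has zero entropy
    total = len(data)
    counts = Counter(data)  # occurrences of each byte value
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A buffer of identical bytes gives 0, while a buffer in which all 256 byte values are equally frequent gives 8, which is why compressed or encrypted regions tend toward the upper end of the scale.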

Han and Lee [16] propose REMINDer, which detects packing based on the entropy value of the entry point section and the WRITE attribute. They note that a large number of false positives occur because the entropy calculation range is too wide, and thus REMINDer uses only the EP (entry point) section in computing the entropy value. In addition to the entropy-based detection method of the EP section, they also use the WRITE attribute test, which checks an essential feature of packed files, to reduce false positives. This is because, unlike ordinary files, packed PE files require WRITE permission to perform unpacking before the file is executed.

Arora, proposed by Rohit et al. [19], is a packing detection technique that analyzes the PE information in a packed file using a heuristic algorithm. It defines various parameters for packing analysis and presents a heuristic algorithm for assigning weights and risk factors to those parameters. Based on the training set to which weights and risk factors are assigned, it determines whether or not a given input file is packed.

2.2. Unpacking Techniques. Unpacking techniques are roughly classified into four types. The first is a direct analysis method in which a person unpacks the file manually using an analysis tool. This method was popular in the early days, and typical analysis tools include OllyDbg PlugIn [24], Immunity Debugger [25], and IDA PlugIn [26]. This manual unpacking is relatively accurate, but it takes too much time to manually analyze all the instructions of the packed file.

The second is a static unpacking method based on the characteristics of the specific packing algorithm(s). Since this method is based on the characteristics of the packing algorithm, it is very useful when the algorithm is known in advance. Static unpacking is commonly used in antivirus programs, and it has the advantages of no infection risk and fast unpacking since the packed file does not need to be executed. However, it has the disadvantage that we cannot use it if we do not know the packing algorithm used or if the packing technique is partially modified.

The third is a dynamic unpacking method that does not depend on the packing algorithm(s). There have been many studies on dynamic unpacking since we can unpack a file even without knowing the exact packing algorithm used. Jeong et al. [21] propose an entropy-based dynamic unpacking method that finds the OEP (original entry point) based on the characteristic that the entropy increases if the file is packed. Bat-Erdene et al. [27] classify unpacking algorithms into multiple clusters by measuring the entropy changes during the unpacking process. Cesare and Yang [28] propose an algorithm for constructing a control flow graph signature using an inverse transformation technique. They perform the entropy analysis first to determine whether or not the PE file is packed, and if it is packed, they find the hidden code by investigating the end of packing through dynamic analysis. Moreover, recent dynamic methods include OmniUnpack [29], Renovo [30], and PinDemonium [31]. OmniUnpack unpacks the PE file by detecting the executing offset of the original code at the page level instead of at the instruction level. Renovo exploits shadow memory to monitor program execution and memory writes, and through the shadow memory, it extracts the hidden code from the executable. Finally, PinDemonium uses the Dynamic Binary Instrumentation (DBI) technique to perform Import Address Table (IAT) analysis, JUMP instruction detection, and entropy calculation, and through these processes, it unpacks the PE file.

Table 1: Observations and solutions for the proposed all-in-one unpacking system.

Observation: No phase integration
  Explanation: The three unpacking-related phases are developed separately.
  Solution: Adopt an all-in-one approach integrating all three phases.

Observation: No detection combination
  Explanation: There is no attempt to combine various existing methods for packing detection.
  Solution: Combine four packing detection methods to improve detection accuracy.

Observation: No real-restoration
  Explanation: The main goal is to find the OEP.
  Solution: Restore unpacked files by performing actual unpacking as well as finding the OEP.

Observation: No unpacking verification
  Explanation: There is no quantitative way to verify the restoration accuracy.
  Solution: Present a verification algorithm to evaluate the accuracy of unpacking results.

Table 2: Comparison of analytical phases supported by existing and proposed systems.

Columns: PHAD [15], REMINDer [16], PEframe [20], [11, 13, 21], All-in-one system

Detection: × ×
Unpacking (static): × ×
Unpacking (dynamic): × ×
Verification: × × ×


The fourth is an integration method of both static and dynamic unpacking techniques. Representative examples of such integration methods are PolyUnpack [26], CoDisasm [32], and BinUnpack [3]. PolyUnpack and CoDisasm first extract the static model through static analysis before executing the file. Then, they unpack the PE file by comparing the extracted static model to the dynamic model obtained by running the executable. PolyUnpack unpacks the file by iteratively comparing the two models, and CoDisasm does so by repeatedly comparing memory snapshots. Finally, BinUnpack first performs static analysis by developing a hook-evasion-resistant API monitor, and it then performs dynamic analysis by monitoring API calls with the analyzed information.

3. Overall Architecture of All-in-One Unpacking System

As mentioned in Section 1, we note four important observations: no phase integration, no detection combination, no real-restoration, and no unpacking verification. In this section, we describe the overall architecture of the all-in-one unpacking system that considers these four observations. Figure 1 shows a flow diagram of the proposed all-in-one system. As shown in the figure, we address no phase integration by supporting all three phases of detection, unpacking, and verification in an integrated framework. In the diagram, for a given input PE file (①), the system first determines whether or not it is packed (②). If the file is not packed, the system proceeds to check the next PE file. If it is packed, the system checks whether or not there is an unpacking library for the packer used in packing (③). If the unpacking library exists, the system performs static unpacking using the library (④); otherwise, the system performs dynamic unpacking, which does not depend on the packing algorithm (⑤). After completing the unpacking, the system calculates the restoration accuracy to verify the correctness of unpacking (⑥).
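The control flow of steps ① to ⑥ can be sketched as below. This is only an outline under stated assumptions: the five callables (detect, find_library, static_unpack, dynamic_unpack, verify) are hypothetical placeholders for the three phases, and the Result type is introduced here purely for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Result:
    packed: bool
    method: Optional[str] = None      # "static" or "dynamic" when packed
    accuracy: Optional[Any] = None    # whatever the verification step reports


def run_pipeline(
    pe_file: Any,
    detect: Callable[[Any], bool],
    find_library: Callable[[Any], Optional[Any]],
    static_unpack: Callable[[Any, Any], Any],
    dynamic_unpack: Callable[[Any], Any],
    verify: Callable[[Any], Any],
) -> Result:
    # Step 2: detection phase. Unpacked files are skipped.
    if not detect(pe_file):
        return Result(packed=False)
    # Step 3: is an unpacking library available for this packer?
    library = find_library(pe_file)
    if library is not None:
        restored, method = static_unpack(pe_file, library), "static"   # step 4
    else:
        restored, method = dynamic_unpack(pe_file), "dynamic"          # step 5
    # Step 6: verification phase on the restored file.
    return Result(packed=True, method=method, accuracy=verify(restored))
```

Passing the phases in as callables keeps the sketch independent of any particular detector or unpacker, which mirrors the point of the integrated framework: each phase can be replaced without changing the overall flow.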

We use existing and proposed techniques together to implement each phase of the all-in-one unpacking system. Table 3 summarizes the existing or proposed methods used in each phase. First, in the detection phase, we use four techniques to detect the packing of the input PE file. This hybrid approach addresses no detection combination. Among the four techniques, the EP section test is a new one proposed in this paper, which increases the packing detection rate by investigating the explicit existence of the EP section, and we describe it in detail in Section 4.1. Second, in the unpacking phase, we use either static unpacking or dynamic unpacking depending on whether or not an unpacking library exists. This unpacking phase restores the actual file and addresses no real-restoration. Third, in the verification phase, we verify the correctness of unpacking by computing the restoration accuracy. We calculate the restoration accuracy by comparing the bytecodes of two files. This phase addresses no unpacking verification, and we describe this verification algorithm in detail in Section 4.3.

4. Detection Mechanism of All-in-One Unpacking System

4.1. Detection Phase. As we explained in Section 3, while there have been various packing detection techniques, there has been no attempt to combine the advantages of these techniques. Since they have been studied separately, each technique carries out packing detection focusing only on its own characteristics. In this paper, we propose a hybrid approach that improves the packing detection accuracy by combining four different techniques.

The detection phase consists of four tests: EP section, signature, WRITE attribute, and entropy tests. In this paper, we refer to these techniques as Detection Phase-No EP Section (DP-NEP), Detection Phase-Signature (DP-SIG), Detection Phase-WRITE (DP-WR), and Detection Phase-Entropy (DP-ENT), respectively. Table 4 compares the detection techniques used in the existing work and the proposed approach. As shown in the table, PEiD supports only DP-SIG, [17] supports only DP-ENT, and [16] supports DP-WR and DP-ENT, but the proposed method supports all four techniques.

Algorithm 1 presents the proposed hybrid approach to packing detection. It takes a PE file as input and determines whether or not the file is packed. Since DP-SIG, DP-WR, and DP-ENT work on the EP section of a PE file, we first check DP-NEP to confirm that the EP section exists. If the EP section does not exist, we determine that the file is packed (Line 1). On the contrary, if the EP section exists, we perform DP-SIG to check whether or not the PE file has a packer signature. If the signature exists, i.e., if a packer is used, we determine that the file is packed (Line 2). We then perform DP-WR and DP-ENT, where DP-WR checks whether or not the WRITE attribute exists in the EP section, and DP-ENT investigates whether or not the entropy of the EP section is in a specific range, Packing-Range. For example, this can be under 60 or over 068 in this paper. If both conditions hold, we determine that the file is packed and return Packed (Lines 3-4). Finally, if none of these cases is satisfied, we return Not-Packed (Line 5).
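The decision order of Algorithm 1 can be sketched as follows. This is a sketch under our own assumptions: the EpSection type and the function name are hypothetical, the signature lookup is reduced to a boolean, and Packing-Range is passed in as a predicate because its thresholds are determined experimentally in Section 5.3.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class EpSection:
    writable: bool   # WRITE attribute of the EP section
    entropy: float   # entropy of the EP section, e.g. from equation (1)


def detect_packing(
    ep_section: Optional[EpSection],
    signature_found: bool,
    in_packing_range: Callable[[float], bool],
) -> Tuple[bool, str]:
    """Return (is_packed, reason) following the order of Algorithm 1."""
    if ep_section is None:                      # Line 1: DP-NEP
        return True, "DP-NEP"
    if signature_found:                         # Line 2: DP-SIG
        return True, "DP-SIG"
    if ep_section.writable and in_packing_range(ep_section.entropy):
        return True, "DP-WR+DP-ENT"             # Lines 3-4
    return False, "Not-Packed"                  # Line 5
```

For instance, `detect_packing(EpSection(True, 7.5), False, lambda h: h > 7.0)` reports a packed file through the combined WRITE and entropy test; the threshold 7.0 is a placeholder for illustration only, not the Packing-Range of this paper.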

We now explain the four packing detection cases of Algorithm 1 in detail. First, DP-NEP looks for the EP section in the input PE file. We conclude that the file is packed if it has no EP section. This is because every PE file has an EP section, so a missing EP section means that someone has hidden it intentionally. We can determine the existence of the EP section by investigating the EP section address in the header of the PE file. According to actual experimental results, we could not find the EP section in some PE files packed by packers such as Upack [34] and PESpin [35]. This is because such packers intentionally hide or scramble the EP section. Obviously, we need to detect these PE files as packed in the detection phase, and Algorithm 1 does so: it detects such a "No EP Section" PE file as the DP-NEP case, which improves the packing detection rate, especially for the Upack and PESpin packers. In the case of DP-NEP, however, we cannot find the EP section itself, and accordingly, we cannot proceed to the next phases of unpacking and verification. More precisely, unpacking files from such DP-NEP packers might be possible if we manually modify the raw hexadecimal codes by using a binary editor. However, this manual raw data modification is beyond the scope of the paper. In summary, we use the concept of "No EP Section" in the detection phase of Algorithm 1 to increase the packing detection rate, but we do not consider it in the unpacking and verification phases.

Second, in DP-SIG, we investigate whether or not the signature extracted from the EP section is in the signature database. We construct the signature database with a total of 7,000 signatures, consisting of 4,200 signatures provided by PEiD [23] and 2,800 signatures collected from BobSoft [36]. Figure 2 shows three sample signatures of the packers stored in the database. The first line of each entry gives the type and version of the packer and the maker of the signature, the second line gives the hexadecimal values of the packer's signature, and the third line, ep_only, is true or false, indicating whether the signature can be found in the EP section.
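A signature lookup of this kind can be sketched as follows. The sketch rests on several assumptions of ours: signatures are space-separated hexadecimal bytes as in Figure 2, a "??" token is treated as a wildcard byte, an entry with ep_only set to true is checked only at the entry point offset, and all function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Pattern = List[Optional[int]]  # None stands for a wildcard byte


def parse_signature(text: str) -> Pattern:
    """Turn '47 8B ?? 05' into [0x47, 0x8B, None, 0x05]."""
    return [None if tok == "??" else int(tok, 16) for tok in text.split()]


def matches_at(data: bytes, offset: int, pattern: Pattern) -> bool:
    """Check whether pattern matches data starting at offset."""
    if offset < 0 or offset + len(pattern) > len(data):
        return False
    return all(p is None or data[offset + i] == p for i, p in enumerate(pattern))


def find_packer(
    data: bytes, ep_offset: int, database: Dict[str, Tuple[Pattern, bool]]
) -> Optional[str]:
    """Return the name of the first matching entry, or None if nothing matches."""
    for name, (pattern, ep_only) in database.items():
        if ep_only:
            # Signature must appear exactly at the entry point.
            if matches_at(data, ep_offset, pattern):
                return name
        elif any(matches_at(data, off, pattern)
                 for off in range(len(data) - len(pattern) + 1)):
            return name
    return None
```

With a database of several thousand entries, a linear scan like this is the simplest option; indexing entries by their first bytes would be a natural refinement.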

Third, in DP-WR, we check the WRITE attribute of the EP section. In the process of unpacking a packed file, the system needs to modify the memory, so the packed file necessarily requires the WRITE attribute. Thus, we suspect that the file is packed if the WRITE attribute exists. However, since a normal file can also have the WRITE attribute, we cannot determine packing only by the existence of the WRITE attribute. Therefore, as proposed in [16], we detect packing using the WRITE attribute and the entropy value together.
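As a small illustration of the WRITE attribute test, the sketch below reads the Characteristics field of a raw 40-byte PE section header and tests the IMAGE_SCN_MEM_WRITE flag (0x80000000). How the header bytes are located in the file is left out, and the function name is hypothetical.

```python
import struct

IMAGE_SCN_MEM_WRITE = 0x80000000   # section can be written to
SECTION_HEADER_SIZE = 40           # size of one PE section header in bytes
CHARACTERISTICS_OFFSET = 36        # Characteristics is the last 4-byte field


def section_is_writable(section_header: bytes) -> bool:
    """Return True if the section header has the WRITE attribute set."""
    if len(section_header) != SECTION_HEADER_SIZE:
        raise ValueError("a PE section header is exactly 40 bytes")
    (characteristics,) = struct.unpack_from(
        "<I", section_header, CHARACTERISTICS_OFFSET
    )
    return bool(characteristics & IMAGE_SCN_MEM_WRITE)
```

As the text notes, this flag alone is not conclusive, since ordinary files may also have writable sections; it is meant to be combined with the entropy test.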

Fourth, in DP-ENT, we compute the entropy of the input PE file using equation (1) and check whether that entropy is in a specific range (Packing-Range in Algorithm 1). This is based on the previous observation [16, 17] that the entropy of a packed file differs from that of an ordinary file. We compute the entropy value for the EP section only, not the entire file. To use DP-ENT, we need to set Packing-Range, which is used for determining whether or not a file is packed. In this paper, we set this range through the experiment in Section 5.3.

Figure 1: Overall working mechanism of the all-in-one unpacking system. (Flow: ① Input PE file; ② Packing detection, in the detection phase; ③ Unpacking library available? Yes: ④ Static unpacking; No: ⑤ Dynamic unpacking, both in the unpacking phase; ⑥ Verification, in the verification phase.)

Table 3: Existing or proposed techniques used in each phase of the all-in-one unpacking system.

Analytical phase: Detection
  (i) EP section test (proposed in Section 4.1)
  (ii) Signature test [23]
  (iii) WRITE attribute test [16]
  (iv) Entropy test [17]

Analytical phase: Unpacking
  (i) Static: library-based unpacking [33]
  (ii) Dynamic: entropy change-based unpacking [14, 27]

Analytical phase: Verification
  Verification algorithm (proposed in Section 4.3)

Table 4: Detection techniques used in existing and proposed approaches (○: supported, ×: not supported).

Method               DP-NEP   DP-SIG   DP-WR   DP-ENT
PEiD [23]            ×        ○        ×       ×
[17]                 ×        ×        ×       ○
[16]                 ×        ×        ○       ○
Proposed approach    ○        ○        ○       ○

4.2. Unpacking Phase. As mentioned in Section 2.2, unpacking that is performed automatically without human intervention can be divided into static unpacking and dynamic unpacking. Static unpacking is based on the characteristics of the packing algorithm, so it is fast and has no risk of infection because the packed file is not executed directly. In contrast, dynamic unpacking does not depend on a specific packing algorithm, although it takes a long analysis time. In this paper, we use both static and dynamic methods to maximize the unpacking accuracy. More specifically, if the packer type found in the detection phase provides an unpacking library, we use static unpacking; otherwise, we perform dynamic unpacking. Static unpacking is straightforward with the unpacking library found for each packer, so we focus on explaining how dynamic unpacking works in this section. Our dynamic unpacking method is also known as the "execute-after-write" method. More precisely, the proposed method first restores the original file into memory by finding the OEP of the packed file and then analyzes (i.e., executes) the restored original file in memory. In order to avoid infecting real machines, we perform the unpacking and verification phases in the debugging mode.

The OEP is the address of the first instruction to execute after the packed file is unpacked, and it represents the entry point of the original program. In other words, since the code following the OEP is the starting code of the original program, finding the OEP is the most important issue in dynamic unpacking. To find the OEP, we use the entropy change-based analysis proposed by Bat-Erdene et al. [27]. When a packed file is executed, the packed data is uncompressed and written to memory, and at this time, data might be changed or added. Because of this data change, the entropy value continuously changes during the unpacking process. If unpacking is successful, the entropy value stabilizes, the original file is restored, and the IP (instruction pointer) moves to another section of the restored original file for its execution. Thus, we can find the OEP by examining whether the IP moves to another section when there is no change in the entropy value of each section. (If the heuristic of investigating the change of IP sections is known, someone may create a malicious packing method that evades the heuristic. However, for this evasion, the OEP must be reached without a change of IP sections, and packing a PE file in such a way might not be easy. According to the actual analysis result, we did not find such cases in the 19 packers we covered. Even though such a case hardly exists, it might occur in the worst case. We leave an advanced approach that works against such evasion as a further study.) To accomplish this, we need to execute the target process instruction by instruction and monitor the entropy change; however, measuring the entropy value after every instruction is very time-consuming and inefficient. We note that the OEP generally follows a branch instruction, so we calculate the entropy value only when the instruction is JMP, Conditional JMP (CJMP), or RETN. We also note that the iterations in program execution are independent of the OEP, so we store and reuse the entropy values measured after JMP, CJMP, and RETN instructions to avoid duplicated calculations caused by iterations and to reduce the total analysis time.

[Code-Lock vx.x]
signature = 47 8B C2 05 1E 00 52 8B D0 B8 02 3D CD 21 8B D8 5A
ep_only = true

[CodeCrypt v0.14b]
signature = E9 C5 02 00 00 EB 02 83 3D 58 EB 02 FF 1D 5B EB 02 0F C7 5F
ep_only = true

[CodeCrypt v0.15b]
signature = E9 31 03 00 00 EB 02 83 3D 58 EB 02 FF 1D 5B EB 02 0F C7 5F
ep_only = true

Figure 2: Examples of packer signatures stored in the signature database.

Input: A PE file
Output: Packed or Not-Packed // Packing detection result
begin
(1) if No EP Section then return Packed; // DP-NEP
(2) else if Packer Signature Found then return Packed; // DP-SIG
(3) else if "WRITE" enabled and EP Section Entropy ∈ Packing-Range then
(4)     return Packed; // DP-WR, DP-ENT
(5) else return Not-Packed;
(6) end-if
end

Algorithm 1: Packing detection.

Algorithm 2 shows the proposed dynamic unpacking algorithm based on entropy changes, which returns the OEP address for a given PE file. First, it computes the initial entropy of each section (Line 1). Then, it repeats the loop until the process ends (Lines 2 to 18). If the current IP address is greater than the last address of the current section, it returns Not-Found (Line 4). This is because the IP has moved to the next section without a JMP-family instruction, so the OEP cannot be found in this section. Otherwise, if the instruction at IP is one of JMP, CJMP, or RETN, it executes the file up to IP and stores the address of the next instruction in DstAddr (Lines 5 to 7). If IP exists in IP-History or DstAddr exists in DstAddr-History, it indicates an iterative instruction or address, so we move to the next IP (Line 8). Otherwise, it stores IP and DstAddr in their histories for the next iteration (Line 9). At this time, if DstAddr is out of the range of the file, it returns Not-Found because it cannot find the OEP within the file (Line 10). If DstAddr is in the normal range, it calculates the entropy of each section again and compares the current entropy with the previous entropy to check whether it is stable (Lines 11 to 13). If the entropy is stable and DstAddr is in a different section, it means that unpacking the file has succeeded and the IP is moved to another section by a JMP-family instruction. Thus, it stops the execution and returns DstAddr as the OEP (Lines 13 and 14).
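The core decision of Algorithm 2 can be illustrated without a debugger by replaying a recorded list of branch events. The sketch below makes several simplifying assumptions of ours: the BranchEvent record, the tolerance used to call the entropy "stable", and the function name are all hypothetical, every event carries entropies for the same set of sections, and the Line 4 check (IP running past the end of its section) is omitted because the trace contains only branch instructions.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class BranchEvent:
    ip: int                         # address of the JMP/CJMP/RETN instruction
    ip_section: str                 # section containing ip
    dst: int                        # destination address (DstAddr)
    dst_section: Optional[str]      # None if dst lies outside the file
    entropies: Dict[str, float]     # per-section entropy after executing up to ip


def find_oep(
    initial: Dict[str, float],
    events: Iterable[BranchEvent],
    tolerance: float = 1e-3,
) -> Optional[int]:
    """Return the OEP candidate, or None for Not-Found."""
    previous = dict(initial)                    # Line 1
    seen_ip, seen_dst = set(), set()
    for ev in events:                           # Lines 2-18
        if ev.ip in seen_ip or ev.dst in seen_dst:
            continue                            # Line 8: iteration, skip
        seen_ip.add(ev.ip)                      # Line 9
        seen_dst.add(ev.dst)
        if ev.dst_section is None:              # Line 10
            return None
        stable = all(                           # Lines 12-13
            abs(ev.entropies[name] - previous[name]) <= tolerance
            for name in previous
        )
        if stable and ev.dst_section != ev.ip_section:
            return ev.dst                       # Line 14
        previous = dict(ev.entropies)
    return None                                 # Line 19
```

In a real setting the events would be produced by single-stepping the process under a debugger; here they are plain data so that the stability and section-change test can be examined in isolation.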

4.3. Verification Phase. In this section, we present an algorithm for measuring the restoration accuracy by comparing the restored file with its original file. To execute the packed file, the proposed system uses dynamic unpacking and finds the OEP during its execution. If the system finds the OEP, we assume that the restoration of the original file is successful and regard that file as the restored file. In the verification phase, we quantitatively measure the restoration accuracy by comparing the restored file with its original file.

Figure 3 shows an example of comparing some bytecodes of the original file with those of the restored file. For the comparison, we first pack the pspasswd.exe file provided by Microsoft [37] and then unpack it again by dynamic unpacking. The figure shows that the bytecodes of the four sections of the original file, .text, .rdata, .data, and .rsrc, appear identically in the UPX0, UPX1, and .rsrc sections of the restored file. That is, even if the name and location of each section are different, the same bytecodes exist in the restored file. However, the restored file contains padding and garbage values, and some bytecodes may be at locations different from those in the original file. Thus, in order to compare the original and the restored files, we need to search for where the original bytecodes are located in the restored file. Since manually comparing the bytecodes of the original and restored files takes a very long time, in this paper, we propose an efficient search algorithm using a hash function. We use the well-known MD5 [38] as the hash function of the search algorithm. As a message-digest algorithm, MD5 is widely used in checking the integrity of long data.

Algorithm 3 shows the restoration accuracy calculation algorithm. Its basic principle is to check whether each section of the original file exists in the restored file. We use the MD5 hash function to speed up the scan. The inputs to the algorithm are an original file A and a restored file B, and the output is restorationAccuracy, a list storing the restoration accuracy values of all sections. First, we remove garbage and padding values from the restored file (Line 1). For each section Ai of A, we obtain the length curLen of the section and calculate a hash value H1 by applying the MD5 hash function (Lines 4 to 7). We then calculate another hash value H2 from the restored file B by considering only the length curLen (Line 9). If H1 equals H2 (Line 10), it means that B contains Ai, so we stop the comparison and calculate the restoration accuracy of the section (Line 14). However, if H1 and H2 are not equal, we perform the hash value comparison again after decrementing the length curLen of Ai (Lines 6 to 13). We repeat this process to calculate the restoration accuracy for the section Ai (Line 14). After completing the accuracy calculation for all sections, the algorithm returns the list restorationAccuracy (Line 16).
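The search described above can be sketched as follows. This is a naive illustration under our own assumptions, not the algorithm as implemented in the system: the garbage and padding removal of Line 1 is assumed to have been done by the caller, the accuracy of a section is taken to be the matched prefix length divided by the section length (as a percentage), and the function names are hypothetical.

```python
import hashlib
from typing import Dict


def _md5(data: bytes) -> bytes:
    return hashlib.md5(data).digest()


def section_accuracy(section: bytes, restored: bytes) -> float:
    """Percentage of the section (longest prefix) that is found in restored."""
    if not section:
        return 100.0  # convention of this sketch: an empty section is trivially restored
    # Shrink curLen until some window of the restored file has the same hash.
    for cur_len in range(len(section), 0, -1):
        h1 = _md5(section[:cur_len])
        for offset in range(len(restored) - cur_len + 1):
            if _md5(restored[offset:offset + cur_len]) == h1:
                return 100.0 * cur_len / len(section)
    return 0.0


def restoration_accuracy(
    original_sections: Dict[str, bytes], restored: bytes
) -> Dict[str, float]:
    """Map each original section name to its restoration accuracy."""
    return {
        name: section_accuracy(body, restored)
        for name, body in original_sections.items()
    }
```

The nested loops make this version quadratic in the section length, so it is only suitable for small inputs; decrementing curLen in larger steps or indexing the restored file would be needed for real executables.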

5. Performance Evaluation

5.1. Experimental Environment. In this section, we present the results of the experimental evaluation to show the superiority of the all-in-one unpacking system. We conduct the experiments on a virtual server as well as on a physical server. The reasons for experimenting in two server environments are as follows. First, unpacking results may differ between the virtual server and the physical server if a packed file has anti-unpacking features such as Anti-VM or Anti-Debug. Second, the dynamic unpacking of malware

Input: A PE file
Output: OEP, the start address of the unpacked original file
begin
(1)  Measure the entropy value of each section
(2)  while the process has not reached the end do
(3)      IP ⟵ the current instruction pointer
(4)      if IP's address > current section's last address then return Not-Found
(5)      else if IP is JMP or CJMP or RETN then
(6)          Execute instructions until IP
(7)          DstAddr ⟵ the next instruction address
(8)          if IP ∉ IP-History and DstAddr ∉ DstAddr-History then
(9)              Store IP and DstAddr into each history
(10)             if DstAddr is out of the file then return Not-Found
(11)             else
(12)                 Re-calculate the entropy value of each section
(13)                 if Entropy change is stable and DstAddr is in the different section then
(14)                     return DstAddr
(15)                 end-if
(16)             end-if
(17)         end-if
(18) end-while
(19) return Not-Found
end

ALGORITHM 2: Dynamic unpacking algorithm.


files may infect the physical server. Thus, we perform unpacking first on the virtual server and then perform the experiment on the physical server only when there is no abnormality. The hardware specifications of the experimental platform are as follows. The virtual server has an Intel Core i7-6700 3.40 GHz CPU with 8 GB RAM, and the physical server has an Intel Core i7-6700 3.40 GHz CPU with 32 GB RAM. The target operating system is 32-bit Windows 7, and we use PEVIEW to view the structure and content of 32-bit PE files. As the debugger, we use Immunity Debugger 1.85 [25] to implement the dynamic unpacking phase of Algorithm 2. To implement the all-in-one unpacking system including the

Figure 3: Comparison of sample bytecodes of the original and restored files. (a) Sections of the original file (.text, .rdata, .data, and .rsrc). (b) Sections of the restored file (UPX0, UPX1, and .rsrc).

Input: A, an original file; B, a restored file
Output: restorationAccuracy[1..n], a list of restoration accuracy values for n sections
begin
(1)  Remove garbage and padding values from B
(2)  A ⟵ {A1, A2, ..., An}, where Ai is the i-th section of A
(3)  for each section Ai ∈ A do
(4)      totalLen ⟵ Ai.length
(5)      curLen ⟵ totalLen
(6)      while curLen > 0 do
(7)          H1 ⟵ hash(Ai[1 : curLen])    // hash a part of section Ai
(8)          for j ⟵ 1 to (B.length − curLen + 1) do
(9)              H2 ⟵ hash(B[j : j + curLen − 1])    // hash a part of B
(10)             if H1 = H2 then break
(11)             else curLen ⟵ curLen − 1
(12)         end-for
(13)     end-while
(14)     restorationAccuracy[i] ⟵ (curLen / totalLen) × 100
(15) end-for
(16) return restorationAccuracy
end

ALGORITHM 3: Restoration rate calculation algorithm.


dynamic unpacking algorithm, we run Python scripts on top of Immunity Debugger.

In the experiment, we use a total of 130 PE files, where 90 files are provided by Microsoft [37] and 40 files are randomly collected from the PuTTY [39] and MD5 [40] sites. We pack each of the 130 executable files with each of 19 packers and construct a dataset consisting of a total of 2600 files: 2470 packed files (130 × 19) and the 130 original files. The 19 packers used are UPX, ASPack, NSPack, MPRESS, Yoda's Protector, MEW, Packman, RLPack, BeRoEXEPacker, PECompact, Petite, JDpack, Molebox, eXpressor, Yoda's Crypter, FSG, exe32pack, WinUpack, and Neolite. The reason for configuring the dataset by packing normal PE files directly is that we can quantitatively measure the packing detection accuracy, the false-positive rate, and the false-negative rate only if we know in advance which experimental files are packed.

In this paper, we conduct three experiments. The first experiment is for the integration feature of the all-in-one system; it verifies that the system supports all three phases of packing detection, unpacking, and verification in an integrated framework. The second experiment is for the detection phase, where we first set Packing-Range, an entropy range for judging the packing, and then evaluate the packing detection accuracy using the range. The third experiment is for the unpacking and verification phases, where we verify whether the system unpacks the PE file correctly by calculating the restoration accuracy.

5.2. Experimental Results on Overall Working Mechanism. In this section, we confirm through experiments that the integration mechanism of the all-in-one unpacking system works correctly. Figure 4 shows an operation screenshot of each phase of the all-in-one system. First, Figure 4(a) shows an example screenshot of the detection phase. To confirm that all four cases of packing detection work correctly, we continue executing the remaining cases even if an earlier case has already detected the packing. In Figure 4(a), we first examine DP-NEP of Algorithm 1. We detect the EP section in DP-NEP, so we proceed to DP-SIG (the first two lines). By the DP-SIG test, we detect the UPX packer, so we conclude that the file is packed (the next two lines). Even though detection is completed, we proceed to the next DP-ENT and DP-WR tests. The entropy value is 7.93, which is contained in Packing-Range, and the attribute value of the EP section is "0xe0000040", which includes the WRITE attribute. Thus, we conclude that the file is also packed by the DP-ENT and DP-WR tests (the last three lines). According to Figure 4(a), we can confirm that the DP-NEP, DP-SIG, DP-ENT, and DP-WR tests of the detection phase all work correctly.
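To make the DP-WR observation concrete, the following Python sketch decodes a section attribute value such as 0xe0000040 into its flags. The flag constants are the section characteristic values defined in the Microsoft PE/COFF specification; the function names are hypothetical and chosen only for illustration, and the sketch is not the detection code of the system.

```python
# Section characteristic flags from the Microsoft PE/COFF specification.
SECTION_FLAGS = [
    ("CNT_CODE", 0x00000020),
    ("CNT_INITIALIZED_DATA", 0x00000040),
    ("CNT_UNINITIALIZED_DATA", 0x00000080),
    ("MEM_EXECUTE", 0x20000000),
    ("MEM_READ", 0x40000000),
    ("MEM_WRITE", 0x80000000),
]

IMAGE_SCN_MEM_WRITE = 0x80000000


def decode_characteristics(value):
    """Return the names of the known flags that are set in value."""
    return [name for name, bit in SECTION_FLAGS if value & bit]


def has_write_attribute(value):
    """True if the section is writable (the condition a WRITE test looks for)."""
    return bool(value & IMAGE_SCN_MEM_WRITE)
```

With this sketch, decode_characteristics(0xE0000040) yields CNT_INITIALIZED_DATA, MEM_EXECUTE, MEM_READ, and MEM_WRITE, so has_write_attribute(0xE0000040) is true, which agrees with the screenshot description above.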

Figures 4(b) and 4(c) show example screenshots of the unpacking phase. If an unpacking library exists for the packer identified in the detection phase, we perform static unpacking, as shown in Figure 4(b); if not, we perform dynamic unpacking, as shown in Figure 4(c). Figure 4(b) shows the result of static unpacking by the UPX commercial tool [33] for the UPX-packed file. In the figure, the file size increases from 285,760 bytes to 731,200 bytes by unpacking, which means that the compression ratio is 39.08% (285,760/731,200). Figure 4(c) shows the result of the entropy-based dynamic unpacking. In the figure, the system calculates the entropy of each section whenever IP encounters a JMP or CJMP instruction; it returns the current DstAddr as the OEP if the entropy change is stable and DstAddr is in a different section.
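The OEP decision described above can be sketched in Python over a recorded list of branch events instead of a live debugger session. This is a simplified illustration, not the Immunity Debugger script used in the experiments. Several parts are assumptions: the Branch record, the section table given as (start, end) address pairs, and in particular the stability test, which here treats the entropy change as stable when no section's entropy moved by more than a hypothetical threshold epsilon since the previous measurement.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Branch:
    ip: int                     # address of the JMP/CJMP/RETN instruction
    dst: int                    # branch target (DstAddr)
    entropies: Sequence[float]  # per-section entropy measured at this branch


def section_of(addr: int, sections: Sequence[Tuple[int, int]]) -> Optional[int]:
    """Index of the section containing addr, or None if addr is outside the file."""
    for idx, (start, end) in enumerate(sections):
        if start <= addr < end:
            return idx
    return None


def find_oep(trace, sections, initial_entropies, epsilon=0.01):
    """Return the first branch target that looks like the OEP, or None."""
    ip_history, dst_history = set(), set()
    previous = list(initial_entropies)
    for branch in trace:
        # Only branches with an unseen IP and an unseen target are examined.
        if branch.ip in ip_history or branch.dst in dst_history:
            continue
        ip_history.add(branch.ip)
        dst_history.add(branch.dst)
        dst_section = section_of(branch.dst, sections)
        if dst_section is None:
            return None  # target is out of the file: Not-Found
        # Assumed stability test: largest per-section entropy change <= epsilon.
        change = max(abs(a - b) for a, b in zip(branch.entropies, previous))
        previous = list(branch.entropies)
        if change <= epsilon and dst_section != section_of(branch.ip, sections):
            return branch.dst
    return None
```

In this sketch, a jump from the unpacking stub into another section is accepted as the OEP only once the section entropies have stopped changing, which reflects the intuition that the stub has finished writing the original code.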

Figure 4(d) shows an example screenshot of the verification phase. In this example, we measure the restoration accuracy of the restored file for the .text and .data sections of the original file. The length of the original .text section is 18,439, and the length of the unpacked file is 872,555. In the hash value comparison, since the portion corresponding to the original .text section exists in the unpacked file, the restoration accuracy becomes 100%. Similarly, we also calculate the restoration accuracy of the .data section as 100%.

5.3. Experimental Results on Detection Phase. In this section, we describe two experimental results of the detection phase. The first experiment is to set Packing-Range, and the second is to measure the detection accuracy of the proposed detection algorithm. In the first experiment, we experimentally determine Packing-Range used in DP-ENT of Algorithm 1. The previous study by Lyda and Hamrock [17] uses the entropy value of the whole file and sets the entropy range of the packed file to [6.677, 6.926]. On the contrary, Han and Lee [16] focus on the EP section only and compute the entropy value for the EP section instead of the whole file. Through the experiment, they set the entropy threshold to 6.85 and judge a file to be packed if its entropy is over 6.85.

We also determine the entropy range Packing-Range through experiments on various packers. Figure 5 shows the entropy distribution of the EP section for different packers. Figure 5(a) shows the average distribution of the original and 19 packed files. Figures 5(b)–5(m) show the more detailed entropy distribution for each packer. According to Figures 5(b)–5(m), the entropy distribution of the packed file differs depending on the packer type. In particular, as shown in Figures 5(d), 5(e), and 5(k), some packed files have smaller entropy values than the original file, and thus we cannot simply set an upper threshold value as the packing criterion as in the previous study [16]. However, we note that while the entropy of packed files is distributed over a wide range, as shown in Figures 5(c)–5(m), the entropy of original files is distributed only in a specific range, as shown in Figure 5(b). More specifically, the entropy value of the normal file is between 6.00 and 6.85, as shown in Figure 5(b). Based on these experimental results, we thus set Packing-Range to [0, 6.00) or [6.85, ∞] of the EP section. According to [9, 29], the entropy method has the disadvantages of low accuracy and easy evasion. To overcome this point, we use three other detection techniques, DP-NEP, DP-SIG, and DP-WR, together with DP-ENT in a hybrid manner. Using this hybrid approach, we can further improve the detection accuracy.
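As an illustration of the entropy test, the following Python sketch computes the byte-level Shannon entropy of a section, which ranges from 0 to 8 bits per byte, and checks it against the Packing-Range stated above. It assumes that the entropy in question is the usual byte-frequency Shannon entropy, as in the study by Lyda and Hamrock [17]; the function and constant names are hypothetical.

```python
import math
from collections import Counter

# Entropy range of normal (not packed) EP sections reported in the text.
NORMAL_LOW = 6.00
NORMAL_HIGH = 6.85


def shannon_entropy(data: bytes) -> float:
    """Byte-level Shannon entropy in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def in_packing_range(entropy: float) -> bool:
    """True if entropy lies in [0, 6.00) or [6.85, inf), i.e., outside the normal range."""
    return entropy < NORMAL_LOW or entropy >= NORMAL_HIGH
```

For instance, the entropy value 7.93 observed for the UPX-packed file in Figure 4(a) satisfies in_packing_range, whereas a value of 6.5 does not.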

In the second experiment, we evaluate the detection accuracy and the false-negative and false-positive rates. Table 5 compares the three detection techniques of the detection phase with the proposed hybrid approach. Since EP sections


Figure 4: Operation screenshots of detection, unpacking, and verification phases. (a) Detection phase. (b) Unpacking phase (static unpacking). (c) Unpacking phase (dynamic unpacking). (d) Verification phase.

Figure 5: Entropy distributions of the EP section for original and representative packed files (x-axis: entropy of EP section, from 0 to 8; y-axis: number of files). (a) Average of original and 19 packers (series: not packed (b); summation of (c) to (m)). (b) Original (not packed). (c) UPX. (d) ASPack. (e) NSPack. (f) MPRESS. (g) Yoda's Protector. (h) RLPack. (i) BeroEXE. (j) MEW. (k) PACKMAN. (l) WinUpack. (m) exe32pack.


are mandatory in all PE files, we can detect EP sections unless they are intentionally concealed. All files of the dataset used in the experiment also have EP sections, so the detection rate by DP-NEP is trivially 0%. In other words, no file is detected as packed due to the absence of the EP section, so we exclude the experimental result of DP-NEP from Table 5.

We now explain the results of Table 5 column by column. First, we explain the DP-SIG column. In the case of DP-SIG, Not-Packed shows a detection rate of 0% since no packer signature exists in normal files. That is, DP-SIG raises no packing detection for normal files, which means that it performs an accurate detection for Not-Packed files. On the other hand, for the packed files, DP-SIG detects 100% of the files for all packers except NSPack. The detection rate of NSPack is merely 5.38% because NSPack often places its signatures in sections other than the EP section. In summary, DP-SIG shows a detection accuracy of 95.0% on average.

Second, for the experiment on the detection rate of DP-WR, we intentionally remove the WRITE attribute from 39 files (30% of the 130 packed files) per packer. We perform this removal process for the 18 packers other than exe32pack. The reason for changing the dataset is that normally packed files usually have the WRITE attribute, so we arbitrarily remove the WRITE attribute from 30% of the files for the DP-WR experiment. Looking at the DP-WR column in Table 5, DP-WR detects only 70% of the files per packer for the 18 packers since we intentionally remove the WRITE attribute from 30% of the files. In the case of exe32pack, however, we need not perform the removal process since its files have no WRITE attribute in the PE header. Accordingly, as shown in the DP-WR column of Table 5, the detection rate of exe32pack by DP-WR is trivially 0%. In the case of Not-Packed, the detection rate is 0% since normal files have no WRITE attribute. In summary, checking the presence of the WRITE attribute is very simple, and DP-WR works correctly if the packed file has the WRITE attribute, but it does not work well if the WRITE attribute does not exist or has been maliciously modified.

Third, the DP-ENT column shows the entropy-based detection rates. As shown in the table, DP-ENT achieves a 100% detection rate for 9 packers including NSPack, UPX, MEW, and BeRoEXEPacker, but it incurs some false negatives for 10 packers including ASPack, Yoda's Protector, Packman, and MPRESS, for which the detection rate is lower than 100%. In particular, for MPRESS, Petite, JDpack, eXpressor, and Neolite, it shows a low detection rate of 24.6% or less because the entropy values these packers produce fall within the entropy distribution of normal files. DP-ENT also has a false-positive rate of 3.84% for Not-Packed. In summary, DP-ENT works very accurately for most packers, but it may incur false negatives for a few specific packers or false positives for some normal files. To overcome this limitation, we use DP-ENT together with other detection techniques. We can see the improvement through the experimental results of Tables 5 and 6. In the tables, when using only DP-ENT, the detection accuracy is low for some packers (especially MPRESS), but when using it together with the other three techniques, the accuracy is close to 100%.

Fourth, the proposed hybrid approach of integrating four techniques detects 100% of the files for all packers except NSPack and shows no false positives. The reason for this high detection rate is that the proposed method has all the advantages of DP-SIG, DP-WR, and DP-ENT as well as DP-NEP. In the case of NSPack, however, the detection rate is merely 70.0%; this is because, as mentioned above, we forcibly remove the WRITE attribute from 30% of the files. According to the results of Table 5, the detection rate differs according to each detection technique, and some techniques

Table 5: Comparison of packing detection rates of three techniques and the proposed hybrid approach.

Packer              DP-SIG (%)   DP-WR (%)   DP-ENT (%)   Hybrid approach (%)
ASPack              100          70.0        91.5         100
NSPack              5.38         70.0        100          70.0
MPRESS              100          70.0        24.6         100
UPX                 100          70.0        100          100
Yoda's Protector    100          70.0        96.9         100
MEW                 100          70.0        100          100
BeRoEXEPacker       100          70.0        100          100
Packman             100          70.0        93.8         100
RLPack              100          70.0        99.3         100
PECompact           100          70.0        100          100
Petite              100          70.0        0.00         100
JDpack              100          70.0        5.62         100
Molebox             100          70.0        100          100
eXpressor           100          70.0        0.00         100
Yoda's Crypter      100          70.0        91.5         100
FSG                 100          70.0        100          100
exe32pack           100          0.00        100          100
WinUpack            100          70.0        100          100
Neolite             100          70.0        1.61         100
Average             95.0         66.3        73.9         98.4
Not-Packed          0.00         0.00        3.84         0.00


incur false positives or false negatives. On the other hand, the proposed hybrid approach shows the highest detection rate of 98.4% and no false positives.
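The averages reported in Table 5 can be checked with a short Python sketch that averages each column over the 19 packers. The per-packer values are those of Table 5; the decimal placement of the individual entries is taken to be the one that reproduces the reported averages, and the variable and function names are hypothetical.

```python
# Per-packer detection rates (%) in the column order:
# DP-SIG, DP-WR, DP-ENT, hybrid approach.
TABLE5 = {
    "ASPack": (100, 70.0, 91.5, 100),
    "NSPack": (5.38, 70.0, 100, 70.0),
    "MPRESS": (100, 70.0, 24.6, 100),
    "UPX": (100, 70.0, 100, 100),
    "Yoda's Protector": (100, 70.0, 96.9, 100),
    "MEW": (100, 70.0, 100, 100),
    "BeRoEXEPacker": (100, 70.0, 100, 100),
    "Packman": (100, 70.0, 93.8, 100),
    "RLPack": (100, 70.0, 99.3, 100),
    "PECompact": (100, 70.0, 100, 100),
    "Petite": (100, 70.0, 0.00, 100),
    "JDpack": (100, 70.0, 5.62, 100),
    "Molebox": (100, 70.0, 100, 100),
    "eXpressor": (100, 70.0, 0.00, 100),
    "Yoda's Crypter": (100, 70.0, 91.5, 100),
    "FSG": (100, 70.0, 100, 100),
    "exe32pack": (100, 0.00, 100, 100),
    "WinUpack": (100, 70.0, 100, 100),
    "Neolite": (100, 70.0, 1.61, 100),
}


def column_averages(rows):
    """Average each of the four columns over all packers, rounded to one decimal."""
    n = len(rows)
    return [round(sum(r[i] for r in rows.values()) / n, 1) for i in range(4)]


print(column_averages(TABLE5))  # [95.0, 66.3, 73.9, 98.4]
```

The result matches the Average row of Table 5: the single low DP-SIG value for NSPack accounts for the 95.0% average, and the 70.0% hybrid rate for NSPack accounts for the 98.4% average of the hybrid approach.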

Table 6 shows the detection accuracy of two or three combinations of detection techniques and of the proposed hybrid approach. First, DP-SIG ENT, a combination of signature and entropy tests, detects 100% of the packed files but detects 18 files incorrectly, showing a false-positive rate of 0.73%. Second, DP-ENT WR, a combination of entropy and WRITE tests, shows a low detection rate of 92.0% because the entropy value of MPRESS is in the distribution of normal files. Third, DP-SIG ENT WR, a combination of signature, entropy, and WRITE tests, and our hybrid approach show 98.4% detection accuracy and a 1.60% false-negative rate. This is because DP-SIG ENT WR and the hybrid approach go through the same detection procedure except for DP-NEP. The false-negative rate of 1.60% is due to the files having no WRITE attribute. According to the results of Tables 5 and 6, we believe that the proposed hybrid approach achieves high detection accuracy by combining the advantages of the existing and new detection techniques.

5.4. Experimental Results on Unpacking and Verification Phases. In this section, we evaluate the unpacking and verification phases by performing the unpacking of Algorithm 2 and calculating the restoration accuracy of the restored files with Algorithm 3. However, the unpacking and verification phases take a lot of time, so we experiment with 40 randomly selected files of the dataset. The unpacking phase consists of static unpacking and dynamic unpacking. In the experiment, we exclude static unpacking since it uses the existing library as it is, and we focus only on the dynamic unpacking of Algorithm 2 since it tries to restore the original file without any information about packers.

Table 7 shows the restoration accuracy of the proposed method for the .text and .data sections for each of the 19 packers. We cannot directly unpack files packed with Yoda's Protector and PECompact since they use Anti-Debug techniques. To overcome this problem, we unpack these packers by bypassing three Anti-Debugging API calls: GetCurrentProcessID(), BlockInput(), and IsDebuggerPresent() [41]. That is, we run Algorithm 2 with the bypassing techniques for Yoda's Protector and PECompact and include the experimental results in Table 7. In general, since only the .text and .data sections of the original file are packed [42], in the experiment we consider only the restoration accuracy of these .text and .data sections of the restored file. In Table 7, among the 19 packers, 11 packers including UPX, NSPack, MEW, RLPack, and BeRoEXEPacker show 100% restoration accuracy for both the .text and .data sections. On the other hand, 8 packers including ASPack, Packman, and MPRESS show less than 100% restoration accuracy because of the manner in which the packers handle the .reloc section. More specifically, if there is a .reloc section in the original file, the unpacking relocates the .text and .data sections, changing all the data values in each section. Therefore, the restoration accuracy depends on whether there is a .reloc section in the original file.

Table 8 shows the restoration results in detail according to the presence of the .reloc section. As shown in the table, the restoration accuracy is 100% in the absence of the .reloc section, while it is very low, in the range of 0.00% to 9.98%, in the presence of the .reloc section for all cases except the .data section of exe32pack. However, the low restoration accuracy is due to the nature of the packer, and it does not mean that unpacking does not work correctly. In other words, even if the unpacking is successful, the data values may change due to the packer characteristic, and the restoration accuracy is then calculated to be low. In summary, we can use the proposed verification algorithm to calculate the restoration accuracy of unpacked files, which quantitatively measures the accuracy of the actual restoration.

In addition to the .text, .data, and .reloc sections, we further consider the PE header, which includes the IAT information. In general, the IAT is located in the header of the PE file, and if we actually need to run the unpacked file, we must also restore this header information including the IAT. In fact, the studies in [3, 43] restore the header including the IAT and produce the actual whole executable file. On the other hand, most studies, including this paper, do not restore the actual whole executable file itself but restore the .text, .data, and .reloc sections, which are the major parts of the executable file. This is because the purpose of unpacking is not to execute the restored file but to identify and extract malicious code included in the restored file. Therefore, in this paper, we do not need to restore the header required for the actual execution, and accordingly a comparison of IAT information is also not necessary. Instead, we restore and compare the .text, .data, and .reloc sections, which are more likely to hide malicious code. Restoring the PE header is beyond the scope of this study; we leave it as a separate research topic.

6. Conclusions

In this paper, we analyzed recent studies of packing detection and unpacking techniques and derived four important observations: no phase integration, no detection combination, no real-restoration, and no unpacking verification. We then proposed an all-in-one unpacking system to address these four observations. First, no phase integration was a difficulty in which analysts had to perform the packing detection, unpacking, and verification phases separately, and we solved this problem by presenting an all-in-one

Table 6: Comparison of packing detection accuracy of two or three combinations and of the hybrid approach.

Metrics               DP-SIG ENT (%)   DP-ENT WR (%)   DP-SIG ENT WR (%)   Hybrid approach (%)
Detection accuracy    95.0             92.0            98.4                98.4
False-negative rate   5.00             8.00            1.60                1.60
False-positive rate   0.73             0.00            0.00                0.00


Table 7: Restoration accuracy of .text and .data sections of each packer.

Packer              .text (%)   .data (%)
UPX                 100         100
NSPack              100         100
MEW                 100         100
RLPack              100         100
BeRoEXE             100         100
Neolite             100         100
FSG                 100         100
eXpressor           100         100
Molebox             100         100
Petite              100         100
JDpack              100         100
ASPack              84.3        84.3
Yoda's Protector    84.0        84.0
exe32pack           80.8        100
MPRESS              78.8        75.3
Yoda's Crypter      74.5        74.5
PECompact           74.0        74.0
WinUpack            68.0        68.0
Packman             55.0        48.5
Average             89.2        90.2


unpacking system that integrates these three phases in a single framework. Second, no detection combination arose because there had been no attempt to combine various packing detection methods, and we solved this problem by presenting a hybrid approach that combines existing and new packing detection methods. Third, no real-restoration arose because there had been little discussion about restoring the actual executable file, and we solved this problem by actually restoring an original image of the packed file. Fourth, no unpacking verification arose because there had been no explicit work to measure the restoration accuracy of unpacked files, and we solved this problem by proposing a verification algorithm that quantitatively evaluates the restoration accuracy.

We constructed a dataset of 2600 PE files and experimentally evaluated the proposed system using the dataset. First, we performed all the phases of the all-in-one unpacking system sequentially and confirmed that its overall working mechanism worked well. Next, through the comparison of the hybrid packing detection approach and existing detection techniques, we showed that the detection accuracy of the proposed method was the highest, at 98.4% on average, with no false positives. For this, we also determined an entropy range of packing, Packing-Range, through extensive experiments on 19 representative packers. Finally, in the unpacking and verification phases, we confirmed that unpacking was successful for all files except the files packed with Yoda's Protector. We also showed that the restoration accuracy of unpacked files was nearly 100% except for a few packers.

As future research, we will focus on unpacking packed files that use Anti-VM, Anti-Debug, and Anti-Reverse engineering techniques. We will also study how to compute the restoration accuracy of files having .reloc sections.

Data Availability

All the data files used in the experiments are available at https://github.com/chesvectain/PackingData.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was partly supported by the Institute of Information and Communications Technology Planning and Evaluation (IITP) grant funded by the Korea government (MSIT) (no. 2017-0-00158, Development of Cyber Threat Intelligence (CTI) analysis and information sharing technology for national cyber incident response) and the Korea Electric Power Corporation (no. R18XA05).

References

[1] M. Morgenstern, “AV-TEST Security report 2017/18,” 2019, https://www.av-test.org/fileadmin/pdf/security_report/AV-TEST_Security_Report_2017-2018.pdf.

[2] T. Ban, R. Isawa, S. Guo, D. Inoue, and K. Nakao, “Efficient malware packer identification using support vector machines with spectrum kernel,” in Proceedings of the Eighth Asia Joint Conference on Information Security, pp. 69–76, Seoul, South Korea, July 2013.

[3] B. Cheng, M. Jiang, J. Fu et al., “Towards paving the way for large-scale Windows malware analysis: generic binary unpacking with orders-of-magnitude performance boost,” in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pp. 395–411, Toronto, Canada, October 2018.

[4] A. Gupta and M. S. Arya, “Hashing based encryption and anti-debugger support for packing multiple files into single executable,” International Journal of Advanced Research in Computer Science, vol. 9, no. 1, pp. 914–920, 2018.

[5] Y. Kawakoya, M. Iwamura, and M. Itoh, “Memory behavior-based automatic malware unpacking in stealth debugging environment,” in Proceedings of the 5th International Conference on Malicious and Unwanted Software, pp. 39–46, Lorraine, France, October 2010.

[6] G. Liang, J. Pang, Z. Shan, R. Yang, and Y. Chen, “Automatic benchmark generation framework for malware detection,” Security and Communication Networks, vol. 2018, Article ID 4947695, 8 pages, 2018.

[7] A. Pektas and T. Acarman, “A dynamic malware analyzer against virtual machine aware malicious software,” Security and Communication Networks, vol. 7, no. 12, pp. 2245–2257, 2014.

[8] I. Santos, J. Nieves, and P. G. Bringas, “Semi-supervised learning for unknown malware detection,” in International Symposium on Distributed Computing and Artificial Intelligence, A. Abraham, J. M. Corchado, S. R. Gonzalez, and J. F. De Paz Santana, Eds., pp. 415–422, Springer, Berlin, Germany, 2011.

[9] X. Ugarte-Pedrero, I. Santos, B. Sanz, C. Laorden, and P. G. Bringas, “Countering entropy measure attacks on packed software detection,” in Proceedings of the 9th International Conference on Consumer Communications and Networking, pp. 164–168, Las Vegas, NV, USA, January 2012.

[10] H. Won, S.-P. Kim, S. Lee, M.-J. Choi, and Y.-S. Moon, “Secure principal component analysis in multiple distributed nodes,” Security and Communication Networks, vol. 9, no. 14, pp. 2348–2358, 2016.

[11] S. Cesare, Y. Xiang, and W. Zhou, “Malwise–an effective and efficient classification system for packed and polymorphic

Table 8: Restoration accuracy (%) according to the presence of the .reloc section (×: .reloc section absent).

Packer              .text, .reloc present   .text, .reloc absent (×)   .data, .reloc present   .data, .reloc absent (×)
ASPack              0.00                    100                        0.00                    100
Packman             9.98                    100                        8.81                    100
MPRESS              0.00                    100                        0.00                    100
Yoda's Crypter      0.00                    100                        0.04                    100
PECompact           0.00                    100                        0.00                    100
Yoda's Protector    0.00                    100                        0.05                    100
exe32pack           0.00                    100                        100                     100
WinUpack            0.00                    100                        0.00                    100


malware,” IEEE Transactions on Computers, vol. 62, no. 6, pp. 1193–1206, 2013.

[12] C. Malin, E. Casey, and J. Aquilina, Malware Forensics: Investigating and Analyzing Malicious Code, Syngress, Burlington, MA, USA, 2008.

[13] W. Yan, Z. Zheng, and A. Nirwan, “Revealing packed malware,” IEEE Security and Privacy, vol. 6, no. 5, pp. 65–69, 2008.

[14] M. Bat-Erdene, T. Kim, H. Park, and H. Lee, “Packer detection for multi-layer executables using entropy analysis,” Entropy, vol. 19, no. 3, p. 125, 2017.

[15] Y.-S. Choi, I.-K. Kim, J.-T. Oh, and J.-C. Ryou, “PE file header analysis-based packed PE file detection technique (PHAD),” in Proceedings of International Symposium on Computer Science and Its Applications, pp. 28–31, Hobart, Australia, October 2008.

[16] S. Han and S. Lee, “Packed PE file detection for malware forensics,” The KIPS Transactions Part C, vol. 16C, no. 5, pp. 555–562, 2009, in Korean.

[17] R. Lyda and J. Hamrock, “Using entropy analysis to find encrypted and packed malware,” IEEE Security and Privacy, vol. 5, no. 2, pp. 40–45, 2007.

[18] R. Perdisci, A. Lanzi, and W. Lee, “Classification of packed executables for accurate computer virus detection,” Pattern Recognition Letters, vol. 29, no. 14, pp. 1941–1946, 2008.

[19] A. Rohit, A. Singh, H. Pareek, and U. Edara, “A heuristics-based static analysis approach for detecting packed PE binaries,” International Journal of Security and Its Applications, vol. 7, no. 5, pp. 257–268, 2013.

[20] G. Amato, “PEframe,” 2019, https://github.com/guelfoweb/peframe.

[21] G. Jeong, E. Choo, J. Lee, M. Bat-Erdene, and H. Lee, “Generic unpacking using entropy analysis,” Journal of Advanced Information Technology and Convergence, vol. 7, no. 1, pp. 232–238, 2009.

[22] V. DeMarines, “Obfuscation–how to do it and how to crack it,” Network Security, vol. 2008, no. 7, pp. 4–7, 2008.

[23] H. Neil, “PEiD,” 2019, https://github.com/wolfram77web/app-peid.

[24] O. Yuschuk, “OllyDbg,” 2019, http://www.ollydbg.de/download.html.

[25] N. Waisman, “Immunity Debugger,” 2019, https://www.immunityinc.com/products/debugger/index.html.

[26] P. Royal, M. Halpin, D. Dagon, R. Edmonds, and W. Lee, “PolyUnpack: automating the hidden-code extraction of unpack-executing malware,” in Proceedings of the 22nd Annual Computer Security Applications Conference, pp. 289–300, Miami Beach, FL, USA, December 2006.

[27] M. Bat-Erdene, Y. Park, H. Li, H. Lee, and M.-S. Choi, “Entropy analysis to classify unknown packing algorithms for malware detection,” International Journal of Information Security, vol. 16, no. 3, pp. 227–248, 2017.

[28] S. Cesare and X. Yang, “Classification of malware using structured control flow,” in Proceedings of the 8th Australasian Symposium on Parallel and Distributed Computing, pp. 61–70, Brisbane, Australia, January 2010.

[29] L. Martignoni, M. Christodorescu, and S. Jha, “OmniUnpack: fast, generic, and safe unpacking of malware,” in Proceedings of the 23rd Annual Computer Security Applications Conference, pp. 431–441, Miami Beach, FL, USA, December 2007.

[30] M. G. Kang, P. Poosankam, and H. Yin, “Renovo: a hidden code extractor for packed executables,” in Proceedings of the 5th ACM Workshop on Recurring Malcode, pp. 46–53, Alexandria, VA, USA, November 2007.

[31] S. Mariani, L. Fontana, F. Gritti, and S. D'Alessio, “PinDemonium: a DBI-based generic unpacker for windows executables,” in Proceedings of the Black Hat 2016, Las Vegas, NV, USA, July 2016.

[32] G. Bonfante, J. Fernandez, J.-Y. Marion, B. Rouxel, F. Sabatier, and A. Thierry, “CoDisasm: medium scale concatic disassembly of self-modifying binaries with overlapping instructions,” in Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 745–756, Denver, CO, USA, October 2016.

[33] M. Oberhumer, L. Molnar, and J. Reiser, “UPX,” 2019, https://upx.github.io.

[34] A. Swinnen and A. Mesbahi, “One packer to rule them all: empirical identification, comparison and circumvention of current antivirus detection techniques,” in Proceedings of the Black Hat 2014, Las Vegas, NV, USA, August 2014.

[35] P. Bania, “Generic unpacking of self-modifying, aggressive, packed binary programs,” 2009, https://arxiv.org/abs/0905.4581.

[36] H. Neil, “BobSoft Signature,” 2019, http://woodmann.com/BobSoft/Files/Other/UserDB.zip.

[37] M. Russinovich, “Microsoft Sysinternals Suite,” 2019, https://download.sysinternals.com/files/SysinternalsSuite.zip.

[38] R. Rivest, “The MD5 message-digest algorithm,” RFC 1321, IETF, Fremont, CA, USA, 1992.

[39] S. Tatham, “Putty,” 2019, https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html.

[40] L. Pascoe, “MD5,” 2019, http://www.md5summer.org/about.html.

[41] P. Ferrie, “Anti-unpacker tricks - part one,” 2019, https

wwwvirusbulletincomvirusbulletin200812anti-unpacker-tricks-part-one

[42] T-Y Wang and C-H Wu ldquoDetection of packed executablesusing support vector machinesrdquo in Proceedings of the 10thInternational Conference on Machine Learning and Cyber-netics pp 717ndash722 Guilin China July 2011

[43] D Korczynski ldquoRePEconstruct reconstructing binaries withself-modifying code and Import address table destructionrdquo inProceedings of the 11th International Conference on Maliciousand Unwanted Software pp 31ndash38 Fajardo Puerto RicoOctober 2016


The fourth is an integration method of both static and dynamic unpacking techniques. Representative examples of such integration methods are PolyUnpack [26], CoDisasm [32], and BinUnpack [3]. PolyUnpack and CoDisasm first extract the static model through static analysis before executing the file. Then, they unpack the PE file by comparing the extracted static model to the dynamic model obtained by running the executable. PolyUnpack unpacks the file by exploiting iterative comparisons of the two models, and CoDisasm does it by comparing memory snapshots repeatedly. Finally, BinUnpack first performs static analysis by developing a hook-evasion-resistant API monitor, and it then performs dynamic analysis by monitoring API calls with the analyzed information.

3. Overall Architecture of All-in-One Unpacking System

As we mentioned in Section 1, we note four important observations: no phase integration, no detection combination, no real-restoration, and no unpacking verification. In this section, we describe the overall architecture of the all-in-one unpacking system that considers these four observations. Figure 1 shows a flow diagram of the proposed all-in-one system. As shown in the figure, we improve no phase integration by supporting all three phases of detection, unpacking, and verification in an integrated framework. In the diagram, for a given input PE file (①), the system first determines whether or not it is packed (②). If the file is not packed, the system proceeds to check the next PE file. If it is packed, the system checks whether or not there is an unpacking library for the packer used (③). If the unpacking library exists, the system performs static unpacking using the library (④); otherwise, the system performs dynamic unpacking, which does not depend on a specific packing algorithm (⑤). After completing the unpacking, the system calculates the restoration accuracy to verify the correctness of unpacking (⑥).

We use the existing and proposed techniques together to implement each phase of the all-in-one unpacking system. Table 3 summarizes the existing or proposed methods used in each phase. First, in the detection phase, we use four techniques to detect the packing of the input PE file. This hybrid approach improves no detection combination. Among the four techniques, the EP section test is a new one proposed in this paper, which increases the packing detection rate by investigating the explicit existence of the EP section, and we describe it in detail in Section 4.1. Second, in the unpacking phase, we use either static unpacking or dynamic unpacking depending on whether or not an unpacking library exists. This unpacking phase restores the actual file and improves no real-restoration. Third, in the verification phase, we verify the correctness of unpacking by computing the restoration accuracy. We calculate the restoration accuracy by comparing the bytecodes of two files. This phase improves no unpacking verification, and we describe this verification algorithm in detail in Section 4.3.

4. Detection Mechanism of All-in-One Unpacking System

4.1. Detection Phase. As we explained in Section 3, while there have been various packing detection techniques, there has been no attempt to combine the advantages of these techniques. Since they have been studied separately, each technique carries out packing detection focusing on its own characteristics only. In this paper, we propose a hybrid approach to improve the packing detection accuracy by combining four different techniques.

The detection phase consists of four tests: EP section, signature, WRITE attribute, and entropy tests. In this paper, we describe each of these techniques as Detection Phase-No EP Section (DP-NEP), Detection Phase-Signature (DP-SIG), Detection Phase-WRITE (DP-WR), and Detection Phase-Entropy (DP-ENT), respectively. Table 4 compares the detection techniques used in the existing work and the proposed approach. As shown in the table, PEiD supports only DP-SIG, [17] supports only DP-ENT, and [16] supports DP-WR and DP-ENT, but the proposed method supports all four techniques.

Algorithm 1 presents the proposed hybrid approach of packing detection. It takes a PE file as an input and determines whether the file is packed or not. Since DP-SIG, DP-WR, and DP-ENT work based on the EP section of a PE file, we first check DP-NEP to confirm whether the EP section exists. If the EP section does not exist, we determine that the file is packed (Line 1). On the contrary, if the EP section exists, we perform DP-SIG to check whether or not the PE file has a packer signature. If the signature exists, i.e., if a packer is used, we determine that the file is packed (Line 2). We then perform DP-WR and DP-ENT, where DP-WR checks whether or not the WRITE attribute exists in the EP section, and DP-ENT investigates whether or not the entropy of the EP section is in a specific range, Packing-Range. For example, in this paper, this range is under 6.0 or over 6.85 (see Section 5.3). If both conditions hold, we determine that the file is packed and return Packed (Lines 3 and 4). Finally, if none of DP-NEP, DP-SIG, DP-WR, and DP-ENT is satisfied, we return Not-Packed (Line 5).
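As a minimal sketch of the decision order in Algorithm 1, the following Python function takes the four test results as inputs that have already been computed. The PEFeatures container, its field names, and the functions in_packing_range and is_packed are hypothetical names introduced only for illustration; the default boundaries of the range are the ones set in Section 5.3.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PEFeatures:
    """Hypothetical container for the per-file inputs of Algorithm 1."""
    has_ep_section: bool         # DP-NEP: could the EP section be located?
    signature_found: bool        # DP-SIG: packer signature matched
    ep_writable: bool            # DP-WR: WRITE attribute set on the EP section
    ep_entropy: Optional[float]  # DP-ENT: entropy of the EP section (None if absent)


def in_packing_range(entropy: float, low: float = 6.00, high: float = 6.85) -> bool:
    """Packing-Range as set in Section 5.3: [0, low) or [high, inf)."""
    return entropy < low or entropy >= high


def is_packed(f: PEFeatures) -> bool:
    """Apply the tests in the order of Algorithm 1 (Lines 1 to 5)."""
    if not f.has_ep_section:   # Line 1: DP-NEP
        return True
    if f.signature_found:      # Line 2: DP-SIG
        return True
    if f.ep_writable and f.ep_entropy is not None and in_packing_range(f.ep_entropy):
        return True            # Lines 3 and 4: DP-WR and DP-ENT must both hold
    return False               # Line 5
```

Under these assumptions, a file with a writable EP section whose entropy is 6.4 is reported as Not-Packed, because the WRITE attribute alone is not treated as sufficient evidence.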

We now explain the four packing detection cases of Algorithm 1 in detail. First, DP-NEP finds the EP section from the input PE file. We here conclude that the file is packed if it has no EP section. This is because every PE file has an EP section, so no EP section means that someone hides the section intentionally. We can know the existence of the EP section by investigating the EP section address in the header of the PE file. We explain the effect of this DP-NEP case in detail. According to actual experimental results, we could not find the EP section from some PE files packed by packers such as Upack [34] and PESpin [35]. This is because such packers intentionally hide or scramble the EP section. Obviously, we need to detect these PE files as packed ones in the detection phase, and Algorithm 1 does. Algorithm 1 detects such a "No EP Section" PE file as the DP-NEP case, which improves the packing detection rate, especially for the Upack and PESpin packers. In the case of DP-NEP, however, we cannot find the EP section itself, and accordingly, we cannot proceed to the next phases of unpacking and verification. More precisely, unpacking files packed by such DP-NEP packers might be possible if we manually modify the raw hexadecimal codes by using a binary editor. However, this manual raw data modification is beyond the scope of the paper, and we do not consider it. In summary, we use the concept of "No EP Section" in the detection phase of Algorithm 1 to increase the packing detection rate, but we do not consider it in the unpacking and verification phases.

Second, in DP-SIG, we investigate whether or not the signature extracted from the EP section is in the signature database. We construct the signature database with a total of 7,000 signatures, consisting of 4,200 signatures provided by PEiD [23] and 2,800 signatures collected from BobSoft [36]. Figure 2 shows three sample signatures of the packers stored in the database. The first line of each entry represents the type and version of the packer and the maker of the signature, the second line is the hexadecimal values of the packer's signature, and the third line, ep_only, is true or false, indicating whether the signature can be found in the EP section.
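The DP-SIG check can be illustrated with a small matcher for entries in the format of Figure 2. This is only a sketch: parse_signature and matches_at are hypothetical helper names, the "??" wildcard is the convention used by PEiD-style signature databases, and the buffer in the example is made up from the CodeCrypt v014b entry rather than taken from a real sample.

```python
from typing import List, Optional


def parse_signature(text: str) -> List[Optional[int]]:
    """Turn 'E9 C5 ?? 00' into [0xE9, 0xC5, None, 0x00]; None marks a wildcard byte."""
    return [None if tok == "??" else int(tok, 16) for tok in text.split()]


def matches_at(data: bytes, offset: int, pattern: List[Optional[int]]) -> bool:
    """Check whether `pattern` matches `data` starting at `offset`."""
    if offset < 0 or offset + len(pattern) > len(data):
        return False
    return all(p is None or data[offset + i] == p for i, p in enumerate(pattern))


# Signature text copied from the CodeCrypt v014b entry in Figure 2.
codecrypt = parse_signature(
    "E9 C5 02 00 00 EB 02 83 3D 58 EB 02 FF 1D 5B EB 02 0F C7 5F"
)

# Made-up buffer: 4 filler bytes, then the signature bytes at the assumed EP offset 4.
ep_offset = 4
buffer = bytes(4) + bytes(b for b in codecrypt if b is not None) + b"\x90\x90"
found = matches_at(buffer, ep_offset, codecrypt)
```

With ep_only = true, a matcher of this kind would be applied only at the file offset of the entry point instead of scanning the whole file.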

Third, in DP-WR, we check the WRITE attribute of the EP section. In the process of unpacking a packed file, the system needs to modify the memory, so the packed file necessarily requires the WRITE attribute. Thus, we suspect that the file is packed if the WRITE attribute exists. However, since a normal file can also have the WRITE attribute, we cannot determine packing only by the existence of the WRITE attribute. Therefore, as proposed in [16], we detect packing using the WRITE attribute and the entropy value together.
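For reference, the WRITE attribute corresponds to the IMAGE_SCN_MEM_WRITE bit (0x80000000) of the Characteristics field in the PE section header. The sketch below tests that bit; the function name has_write_attribute is a hypothetical helper, and the sample value 0xE0000040 is the EP section attribute reported for the UPX-packed file in Section 5.2.

```python
# Section characteristic flags defined by the PE/COFF section header format.
IMAGE_SCN_CNT_CODE = 0x00000020
IMAGE_SCN_CNT_INITIALIZED_DATA = 0x00000040
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000


def has_write_attribute(characteristics: int) -> bool:
    """DP-WR: True if the section is mapped writable."""
    return bool(characteristics & IMAGE_SCN_MEM_WRITE)


# EP section attribute value reported in Section 5.2 for a UPX-packed file.
upx_ep_section = 0xE0000040

# A code section that is readable and executable but not writable.
plain_code_section = IMAGE_SCN_CNT_CODE | IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_MEM_READ
```

The value 0xE0000040 decomposes into the EXECUTE, READ, and WRITE memory flags plus the initialized-data flag, which is why the DP-WR test fires for it.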

Fourth, in DP-ENT, we compute the entropy of an input PE file using equation (1) and check whether that entropy is in a specific range (Packing-Range in Algorithm 1). This is based on the previous observation [16, 17] that the entropy of a packed file differs from that of an ordinary file. We compute the entropy value for the EP section only, not the entire file, using equation (1). To use DP-ENT, we need to set Packing-Range, which is used for determining whether or not a file is packed. In this paper, we set this range through the experiment in Section 5.3.
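Equation (1) is not repeated here. Assuming it is the usual byte-level Shannon entropy, which is consistent with the 0 to 8 entropy axis used in Section 5.3, the DP-ENT measurement can be sketched as follows. The function name shannon_entropy is hypothetical, and in practice the input would be the raw bytes of the EP section.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Byte-level Shannon entropy in bits per byte (between 0.0 and 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A buffer of identical bytes gives an entropy of 0, and a buffer in which all 256 byte values occur equally often gives the maximum of 8, which is why compressed or encrypted sections tend toward the upper end of the scale.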

4.2. Unpacking Phase. As mentioned in Section 2.2, unpacking, which is performed automatically without human intervention, can be divided into static unpacking and dynamic unpacking. Static unpacking is based on the characteristics of the packing algorithm, so it is fast and has no risk of infection because the packed file is not executed directly. On the contrary, dynamic unpacking does not depend on a specific packing algorithm, even though it takes a long analysis time. In this paper, we use both static and dynamic methods to maximize the unpacking accuracy. More specifically, if the packer type found in the detection phase provides an unpacking library, we use the static unpacking; otherwise, we perform the

Figure 1: Overall working mechanism of the all-in-one unpacking system. Flow: ① input PE file; ② packing detection (No: check the next file; Yes: continue); ③ unpacking library available? (Yes: ④ static unpacking; No: ⑤ dynamic unpacking); ⑥ verification. The steps are grouped into the detection, unpacking, and verification phases.

Table 3: Existing or proposed techniques used in each phase of the all-in-one unpacking system.

Analytical phase | Techniques used or proposed
---------------- | ---------------------------
Detection        | (i) EP section test (proposed in Section 4.1); (ii) Signature test [23]; (iii) WRITE attribute test [16]; (iv) Entropy test [17]
Unpacking        | (i) Static: library-based unpacking [33]; (ii) Dynamic: entropy change-based unpacking [14, 27]
Verification     | Verification algorithm (proposed in Section 4.3)

Table 4: Detection techniques used in existing and proposed approaches.

Method            | DP-NEP | DP-SIG | DP-WR | DP-ENT
----------------- | ------ | ------ | ----- | ------
PEiD [23]         | ×      | ✓      | ×     | ×
[17]              | ×      | ×      | ×     | ✓
[16]              | ×      | ×      | ✓     | ✓
Proposed approach | ✓      | ✓      | ✓     | ✓


dynamic unpacking. We can easily perform static unpacking by using the unpacking library found for each packer, so we focus on explaining how the dynamic unpacking works in detail in this section. Our dynamic unpacking method is also known as the "execute-after-write" method. More precisely, the proposed method first restores the original file into the memory by finding the OEP from the packed file and then analyzes (i.e., executes) the restored original file in the memory. In order to avoid the infection of real machines, we perform the unpacking and validation phases in the debugging mode.

The OEP is the address of the first instruction to execute after the packed file is unpacked, and it represents the entry point of the original program. In other words, since the codes following the OEP represent the starting codes of the original program, finding such an OEP is the most important issue in dynamic unpacking. To find the OEP, we use the entropy change-based analysis proposed by Bat-Erdene et al. [27]. When a packed file is executed, the packed data is uncompressed and written to the memory, and at this time, the data might be changed or added to the file. Because of this data change, the entropy value continuously changes during the unpacking process. If unpacking is successful, the entropy value is stabilized, the original file is restored, and the IP (instruction pointer) is moved to another section of the restored original file for its execution. Thus, we can find the OEP by examining whether the IP moves to another section when there is no change in the entropy value of each section. (If the heuristic of investigating the change of IP sections is known, someone may create a malicious packing method that can evade the heuristic. However, for this avoidance, the OEP must be reached without a change of IP sections, and packing a PE file in such a way might not be easy. According to the actual analysis result, we do not find such cases in the 19 packers we covered. Even though the case hardly exists, it might occur in the worst case. We leave the advanced approach that works against such avoidance as a further study.) To accomplish this, we need to execute the target process in units of instructions and monitor the entropy change; however, measuring the entropy value after every instruction is very time-consuming and inefficient. We here note that the OEP generally exists after a branch instruction, so we calculate the entropy value only when the instruction is JMP, Conditional JMP (CJMP), or RETN. We also note that the iterations at program execution are independent of the OEP, so we store and reuse the entropy values measured after JMP, CJMP, and RETN instructions to avoid duplicated calculations by the iterations and reduce the total analysis time.

Algorithm 2 shows the proposed dynamic unpacking algorithm based on the entropy changes, and it returns the OEP address for a given PE file. First, it computes the initial entropy of each section (Line 1). Then, it repeats the algorithm until the process ends (Lines 2 to 18). If the current IP address is greater than the last address of the current section, it returns Not-Found (Line 4). This is because IP moves to the next section without a JMP family instruction, so it cannot find the OEP in this section. On the contrary, if the instruction at IP is one of JMP, CJMP, or RETN, it executes the file up to IP and stores the address of the next instruction in DstAddr (Lines 5 to 7). But if IP exists in IP-History or DstAddr exists in DstAddr-History, it means an iterative instruction or address, so we move to the next IP (Line 8). Otherwise, it stores IP and DstAddr into each history for the next iteration (Line 9). At this time, if DstAddr is out of the range of the file, it returns Not-Found because it cannot find the OEP within the file (Line 10). If DstAddr is in the normal range, it calculates the entropy of each section again and compares the current entropy with the previous entropy to check whether it is stable (Lines 11 to 13). If the entropy change is stable and DstAddr is in a different section, it means that unpacking the file is successful and IP is moved to another section by a JMP family instruction. Thus, it

[Code-Lock vxx]

signature = 47 8B C2 05 1E 00 52 8B D0 B8 02 3D CD 21 8B D8 5A

ep_only = true

[CodeCrypt v014b]

signature = E9 C5 02 00 00 EB 02 83 3D 58 EB 02 FF 1D 5B EB 02 0F C7 5F

ep_only = true

[CodeCrypt v015b]

signature = E9 31 03 00 00 EB 02 83 3D 58 EB 02 FF 1D 5B EB 02 0F C7 5F

ep_only = true

Figure 2: Examples of packer signatures stored in the signature database.

Input: A PE file
Output: Packed or Not-Packed // Packing detection result
begin
(1) if No EP Section then return Packed; // DP-NEP
(2) else if Packer Signature Found then return Packed; // DP-SIG
(3) else if "WRITE" enabled and EP Section Entropy ∈ Packing-Range then
(4)     return Packed; // DP-WR, DP-ENT
(5) else return Not-Packed;
(6) end-if
end

ALGORITHM 1: Packing detection.


stops the execution and returns DstAddr as the OEP (Lines 13 and 14).
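The control flow of Algorithm 2 can be sketched in Python over a pre-recorded instruction trace instead of a live debugger. Everything about the trace format is an assumption made for illustration: the Step record, the section_of and find_oep functions, the tolerance tol used to decide that the entropy is stable, and the rule that the current section follows in-file branch targets are not specified in the paper, and the addresses and entropy values in the example are invented.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple

Sections = Dict[str, Tuple[int, int]]  # section name -> (first address, last address)


@dataclass
class Step:
    """One instruction as reported by a hypothetical single-stepping debugger."""
    ip: int                      # address of the instruction
    is_branch: bool              # True for JMP, CJMP, or RETN
    dst: int                     # address of the instruction executed next
    entropies: Dict[str, float]  # per-section entropy measured at this step


def section_of(addr: int, sections: Sections) -> Optional[str]:
    for name, (first, last) in sections.items():
        if first <= addr <= last:
            return name
    return None


def find_oep(trace: Iterable[Step], sections: Sections, entry_point: int,
             initial: Dict[str, float], tol: float = 1e-3) -> Optional[int]:
    """Return the OEP candidate, or None for Not-Found."""
    current = section_of(entry_point, sections)
    if current is None:
        return None
    prev = dict(initial)                    # Line 1: initial entropy of each section
    seen_ip, seen_dst = set(), set()
    for step in trace:                      # Line 2
        if step.ip > sections[current][1]:  # Line 4: IP ran past the current section
            return None
        if not step.is_branch:              # Line 5: only branch instructions matter
            continue
        dst_sec = section_of(step.dst, sections)
        if step.ip not in seen_ip and step.dst not in seen_dst:  # Line 8
            seen_ip.add(step.ip)            # Line 9
            seen_dst.add(step.dst)
            if dst_sec is None:             # Line 10: target outside the file
                return None
            stable = all(abs(step.entropies[s] - prev[s]) <= tol for s in sections)
            prev = dict(step.entropies)     # Line 12
            if stable and dst_sec != current:  # Line 13
                return step.dst             # Line 14
        if dst_sec is not None:
            current = dst_sec               # assumption: follow in-file branches
    return None                             # Line 19


# Invented example: the stub runs in UPX1 and fills UPX0 before jumping into it.
sections = {"UPX0": (0x1000, 0x1FFF), "UPX1": (0x2000, 0x2FFF)}
packed = {"UPX0": 0.0, "UPX1": 7.5}
trace = [
    Step(0x2000, False, 0x2002, packed),
    Step(0x2010, True, 0x2004, {"UPX0": 3.0, "UPX1": 7.5}),  # decompression loop
    Step(0x2010, True, 0x2004, {"UPX0": 6.2, "UPX1": 7.5}),  # repeated branch, skipped
    Step(0x2020, True, 0x2030, {"UPX0": 6.2, "UPX1": 7.5}),  # entropy still changing
    Step(0x2040, True, 0x1000, {"UPX0": 6.2, "UPX1": 7.5}),  # stable and new section
]
oep = find_oep(trace, sections, entry_point=0x2000, initial=packed)
```

In this invented trace, the repeated loop branch is skipped through the two history sets, and the jump to 0x1000 is reported because the entropy no longer changes and the target lies in a different section.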

4.3. Verification Phase. In this section, we present an algorithm for measuring the restoration accuracy by comparing the restored file with its original file. To execute the packed file, the proposed system uses dynamic unpacking and finds the OEP during its execution. If the system finds the OEP, we assume the restoration of the original file is successful and regard that file as the restored file. In the verification phase, we quantitatively measure the restoration accuracy by comparing such a restored file with its original file.

Figure 3 shows an example of comparing some bytecodes of the original file with those of the restored file. For the comparison, we first pack the pspasswd.exe file provided by Microsoft [37] and then unpack it again by dynamic unpacking. The figure shows that the bytecodes of the four sections of the original file, .text, .rdata, .data, and .rsrc, appear identically in the UPX0, UPX1, and .rsrc sections of the restored file. That is, even if the name and location of each section are different, the same bytecodes exist in the restored file. However, the restored file contains padding and garbage values, and some bytecodes may be at locations different from those in the original file. Thus, in order to compare the original and the restored files, we need to search where the original bytecodes are located in the restored file. Since manually comparing the bytecodes of the original and restored files takes a very long time, in this paper, we propose an efficient search algorithm using a hash function. We use the well-known MD5 [38] as the hash function of the search algorithm. As a message-digest algorithm, MD5 is widely used in checking the integrity of long data.

Algorithm 3 shows the restoration accuracy calculation algorithm. Its basic principle is to check whether each section of the original file exists in the restored file. We here use the MD5 hash function to speed up the scan. The inputs to the algorithm are an original file A and a restored file B, and the output is restorationAccuracy, a list storing the restoration accuracy values of all sections. First, we remove garbage and padding values from the restored file (Line 1). For each section Ai of A, we obtain the length curLen of the section and calculate a hash value H1 by applying the MD5 hash function (Lines 4 to 7). We then calculate another hash value H2 from the restored file B by considering only a part of length curLen (Line 9). If H1 equals H2 (Line 10), it means that B contains Ai, so we stop the comparison and calculate the restoration accuracy of the section (Line 14). However, if H1 and H2 are not equal, we perform the hash value comparison again by decrementing the length curLen of Ai (Lines 6 to 13). We repeat this process to calculate the restoration accuracy for the section Ai (Line 14). After completing the accuracy calculation for all sections, the algorithm returns the list restorationAccuracy (Line 16).
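The idea behind Algorithm 3 can be sketched in Python as follows. This is an illustrative reading of the description above rather than the authors' implementation: the function names are hypothetical, the restored file is assumed to have had garbage and padding values removed already (Line 1), and the sketch searches for the longest leading part of each original section that occurs anywhere in the restored file. It hashes every candidate window, so it is only practical for small inputs.

```python
import hashlib
from typing import Dict


def longest_prefix_found(section: bytes, restored: bytes) -> int:
    """Length of the longest prefix of `section` that occurs somewhere in `restored`."""
    cur_len = len(section)
    while cur_len > 0:
        h1 = hashlib.md5(section[:cur_len]).digest()       # hash a part of the section
        for j in range(len(restored) - cur_len + 1):
            h2 = hashlib.md5(restored[j:j + cur_len]).digest()  # hash a part of B
            if h1 == h2:
                return cur_len
        cur_len -= 1                                        # retry with a shorter prefix
    return 0


def restoration_accuracy(original_sections: Dict[str, bytes],
                         restored: bytes) -> Dict[str, float]:
    """Per-section restoration accuracy in percent."""
    result = {}
    for name, data in original_sections.items():
        if not data:
            result[name] = 0.0
            continue
        result[name] = 100.0 * longest_prefix_found(data, restored) / len(data)
    return result


# Made-up example: the first section is fully present, the second only half.
original = {".text": b"ABCDEFGH", ".data": b"12345678"}
restored_file = b"\x00\x00ABCDEFGH\xff1234zz"
accuracy = restoration_accuracy(original, restored_file)
```

In this made-up example, the .text bytes are found in full, while only the first four of the eight .data bytes are found, so the reported accuracies are 100 and 50.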

5. Performance Evaluation

5.1. Experimental Environment. In this section, we present the results of experimental evaluation to show the superiority of the all-in-one unpacking system. We conduct the experiments in the virtual server as well as the physical server. The reasons for experimenting in two server environments are as follows. First, unpacking results may be different between the virtual server and the physical server if a packed file has anti-unpacking features such as Anti-VM or Anti-Debug. Second, the dynamic unpacking of malware

Input: A PE file
Output: OEP // the start address of the unpacked original file
begin
(1) Measure the entropy value of each section;
(2) while the process is not reached to the end do
(3)     IP ← the current instruction pointer;
(4)     if IP's address > current section's last address then return Not-Found;
(5)     else if IP is JMP or CJMP or RETN then
(6)         Execute instructions until IP;
(7)         DstAddr ← the next instruction address;
(8)         if IP ∉ IP-History and DstAddr ∉ DstAddr-History then
(9)             Store IP and DstAddr into each history;
(10)            if DstAddr is out of the file then return Not-Found;
(11)            else
(12)                Re-calculate the entropy value of each section;
(13)                if Entropy change is stable and DstAddr is in the different section then
(14)                    return DstAddr;
(15)                end-if
(16)            end-if
(17)        end-if
(18) end-while
(19) return Not-Found;
end

ALGORITHM 2: Dynamic unpacking algorithm.


files may infect the physical server. Thus, we perform unpacking first in the virtual server and then perform the experiment in the physical server only when there is no abnormality. The hardware specifications of the experimental platform are as follows. The virtual server is Intel Core i7-6700 3.40 GHz with 8 GB RAM, and the physical server is Intel Core i7-6700 3.40 GHz with 32 GB RAM. The target operating system is Windows 7 (32-bit), and we use PEVIEW to view the structure and content of 32-bit PE files. As the debugger, we use Immunity Debugger 1.85 [25] to implement the dynamic unpacking phase of Algorithm 2. To implement the all-in-one unpacking system including the

Figure 3: Comparison of sample bytecodes of the original and restored files. (a) Sections of the original file (.text, .rdata, .data, and .rsrc). (b) Sections of the restored file (UPX0, UPX1, UPX1, and .rsrc). (Hexadecimal dump omitted.)

Input: A: an original file; B: a restored file
Output: restorationAccuracy[1..n]: a list of restoration accuracy values for n sections
begin
(1) Remove garbage and padding values from B;
(2) A ← {A1, A2, ..., An}, where Ai is the i-th section of A;
(3) for each section Ai ∈ A do
(4)     totalLen ← Ai.length;
(5)     curLen ← totalLen;
(6)     while curLen > 0 do
(7)         H1 ← hash(Ai[1 : curLen]); // hash a part of section Ai
(8)         for j ← 1 to (B.length − curLen + 1) do
(9)             H2 ← hash(B[j : j + curLen − 1]); // hash a part of B
(10)            if H1 = H2 then break;
(11)            else curLen ← curLen − 1;
(12)        end-for
(13)    end-while
(14)    restorationAccuracy[i] ← (curLen/totalLen) × 100;
(15) end-for
(16) return restorationAccuracy;
end

ALGORITHM 3: Restoration rate calculation algorithm.


dynamic unpacking algorithm, we run Python scripts on top of Immunity Debugger.

In the experiment, we use a total of 130 PE files, where 90 files are provided by Microsoft [37] and 40 files are randomly collected from the putty [39] and MD5 [40] sites. We pack each of the 130 executable files using 19 packers and construct a dataset consisting of a total of 2,600 files: 130 × 19 = 2,470 packed files plus the original 130 files. The 19 packers used are UPX, ASPack, NSPack, MPRESS, Yoda's Protector, MEW, Packman, RLPack, BeRoEXEPacker, PECompact, Petite, JDpack, Molebox, eXpressor, Yoda's Crypter, FSG, exe32pack, WinUpack, and Neolite. The reason for configuring the dataset by packing the normal PE files directly is that we can quantitatively measure the packing detection accuracy, the false-positive rate, and the false-negative rate only if we know in advance which experimental files are packed.

In this paper, we conduct three experiments. The first experiment is for the integration feature of the all-in-one system, which verifies that the system supports all three phases of packing detection, unpacking, and verification in an integrated framework. The second experiment is for the detection phase, where we first set Packing-Range, an entropy range for judging the packing, and then evaluate the packing detection accuracy using the range. The third experiment is for the unpacking and verification phases, where we verify whether the system unpacks the PE file correctly by calculating the restoration accuracy.

5.2. Experimental Results on Overall Working Mechanism. In this section, we confirm through experiments that the integration mechanism of the all-in-one unpacking system works correctly. Figure 4 shows the operation screenshot of each phase of the all-in-one system. First, Figure 4(a) shows an example screenshot of the detection phase. To confirm that all four cases of packing detection work correctly, we continue executing the remaining cases even if the packing is already detected by a previous case. In Figure 4(a), we first examine DP-NEP of Algorithm 1. We detect the EP section in DP-NEP, so we proceed to DP-SIG (the first two lines). By the DP-SIG test, we detect the UPX packer, so we conclude that the file is packed (the next two lines). Even though detection is completed, we proceed to the next DP-ENT and DP-WR tests. The entropy value is 7.93, which is contained in Packing-Range, and the attribute value of the EP section is "0xe0000040", which includes the WRITE attribute. Thus, we conclude that the file is also packed by the DP-ENT and DP-WR tests (the last three lines). According to Figure 4(a), we can confirm that the DP-NEP, DP-SIG, DP-ENT, and DP-WR tests of the detection phase all work correctly.

Figures 4(b) and 4(c) show example screenshots of the unpacking phase. If an unpacking library exists for the packer detected in the detection phase, we perform static unpacking as shown in Figure 4(b); if not, we perform dynamic unpacking as shown in Figure 4(c). Figure 4(b) shows the result of static unpacking by the UPX tool [33] for the UPX-packed file. In the figure, the file size increases from 285,760 bytes to 731,200 bytes by unpacking, which means that the compression ratio is 39.08% (285,760/731,200). Figure 4(c) shows the result of the entropy-based dynamic unpacking. In the figure, the system calculates the entropy of each section whenever IP encounters a JMP or CJMP instruction, and it returns the current DstAddr as the OEP if the entropy change is stable and DstAddr is in a different section.

Figure 4(d) shows an example screenshot of the verification phase. In this example, we measure the restoration accuracy of the restored file for the .text and .data sections of the original file. The length of the original .text section is 18,439, and the length of the unpacked file is 872,555. In the hash value comparison, since the portion corresponding to the original .text section exists in the unpacked file, the restoration accuracy becomes 100%. Similarly, we also calculate the restoration accuracy of the .data section as 100%.

5.3. Experimental Results on Detection Phase. In this section, we describe two experimental results of the detection phase. The first experiment is to set Packing-Range, and the second is to measure the detection accuracy of the proposed detection algorithm. In the first experiment, we experimentally determine Packing-Range used in DP-ENT of Algorithm 1. The previous study by Lyda and Hamrock [17] uses the entropy value of the whole file and sets the entropy range of the packed file to [6.677, 6.926]. On the contrary, Han and Lee [16] focus on the EP section only and compute the entropy value for the EP section instead of the whole file. Through the experiment, they set the entropy threshold to 6.85 and judge a file packed if its entropy is over 6.85.

We also determine the entropy range Packing-Range through the experiments on various packers. Figure 5 shows the entropy distribution of the EP section for different packers. Figure 5(a) shows the average distribution of the original and 19 packed files. Figures 5(b)–5(m) show the more detailed entropy distribution for each packer. According to Figures 5(b)–5(m), the entropy distribution of the packed file differs depending on the packer type. In particular, as shown in Figures 5(d), 5(e), and 5(k), some packed files have smaller entropy values than the original file, and thus, we cannot simply set an upper threshold value as a packing criterion as in the previous study [16]. However, we note that while the entropy of packed files is distributed over a wide range as shown in Figures 5(c)–5(m), the entropy of original files is distributed only in a specific range as shown in Figure 5(b). More specifically, the entropy value of the normal file is between 6.00 and 6.85, as shown in Figure 5(b). Based on these experimental results, we thus set Packing-Range to [0, 6.00) or [6.85, ∞] of the EP section. According to [9, 29], the entropy method has the disadvantages of low accuracy and easy evasion. To overcome this point, we use three other detection techniques of DP-NEP, DP-SIG, and DP-WR together with DP-ENT in a hybrid manner. Using this hybrid approach, we can further improve the detection accuracy.

In the second experiment, we evaluate the detection accuracy and the false-negative/false-positive rates. Table 5 compares the three detection techniques of the detection phase with the proposed hybrid approach. Since EP sections

Security and Communication Networks 9

Figure 4: Operation screenshots of the detection, unpacking, and verification phases. (a) Detection phase. (b) Unpacking phase (static unpacking). (c) Unpacking phase (dynamic unpacking). (d) Verification phase.

Figure 5: Entropy distributions of the EP section for original and representative packed files. In every panel, the x-axis is the entropy of the EP section (0 to 8) and the y-axis is the number of files. (a) Average of original and 19 packers (legend: not packed, as in (b); summation of (c) to (m)). (b) Original (not packed). (c) UPX. (d) ASPack. (e) NSPack. (f) MPRESS. (g) Yoda’s Protector. (h) RLPack. (i) BeroEXE. (j) MEW. (k) PACKMAN. (l) WinUpack. (m) exe32pack.

are mandatory in all PE files, we can detect EP sections unless they are intentionally concealed. All files of the dataset used in the experiment also have EP sections, and the detection rate by DP-NEP is trivially 0%. In other words, no file is detected as packed due to the absence of the EP section, so we exclude the experimental result of DP-NEP from Table 5.

We now explain the results of Table 5 column by column. First, we explain the DP-SIG column. In the case of DP-SIG, Not-Packed shows a detection rate of 0% since no signature exists in normal files. That is, no normal file is flagged as packed, which means that DP-SIG performs an accurate detection for Not-Packed files. On the other hand, for the packed files, DP-SIG detects 100% of packers except NSPack. The detection rate of NSPack is merely 5.38% because, by the nature of NSPack, the signatures often exist in sections other than the EP section. In summary, DP-SIG shows 95.0% detection accuracy on average.

Second, for the experiment on the detection rate of DP-WR, we intentionally remove the WRITE attribute from 39 files (30% of packed files) per packer. We perform this removal process for 18 packers, excluding exe32pack. The reason for changing the dataset is that normal packed files usually have WRITE attributes, so we arbitrarily remove the WRITE attribute from 30% of files for the DP-WR experiment. Looking at the DP-WR column in Table 5, DP-WR detects only 70% of files per packer for 18 packers since we intentionally remove the WRITE attribute from 30% of files. In the case of exe32pack, however, we need not perform the removal process since it has no WRITE attribute in the PE header. Accordingly, as shown in the DP-WR column of Table 5, the detection rate of exe32pack by DP-WR is trivially 0%. In the case of Not-Packed, the detection rate is 0% since normal files have no WRITE attribute. In summary, checking the presence of the WRITE attribute is very simple, and DP-WR works correctly if the packed file has the WRITE attribute, but it does not work well if the WRITE attribute does not exist or is maliciously modified.

Third, the DP-ENT column shows the entropy-based detection rates. As shown in the table, DP-ENT shows a 100% detection rate for 9 packers, including NSPack, UPX, MEW, and BeRoEXEPacker, but it incurs some false negatives for 10 packers, including ASPack, Yoda’s Protector, Packman, and MPRESS, by showing a detection rate lower than 100%. In particular, for MPRESS, Petite, JDpack, eXpressor, and Neolite, it shows a low detection rate of 24.6% or below because the entropy values characteristic of these packers fall within the entropy distribution of normal files. DP-ENT also has a false-positive rate of 3.84% for Not-Packed. In summary, DP-ENT works very accurately for most packers, but it may incur false negatives for a few specific packers or false positives for some normal files. To overcome this limitation, we use DP-ENT together with other detection techniques. We can see the improvement through the experimental results of Tables 5 and 6. In the tables, when using only DP-ENT, the detection accuracy is low for some packers (especially MPRESS), but when using it together with the other three techniques, the accuracy is close to 100%.

Fourth, the proposed hybrid approach, which integrates the four techniques, detects 100% of all packers except NSPack and shows no false positives. The reason for this high detection rate is that the proposed method has all the advantages of DP-SIG, DP-WR, and DP-ENT as well as DP-NEP. In the case of NSPack, however, the detection rate is merely 70.0%; this is because, as mentioned above, we forcibly remove the WRITE attribute from 30% of the files. According to the results of Table 5, the detection rate differs according to each detection technique, and some techniques incur false positives or false negatives; on the other hand, the proposed hybrid approach shows the highest detection rate of 98.4% and no false positives.

Table 5: Comparison of packing detection rates of three techniques and the proposed hybrid approach.

Packer             DP-SIG (%)   DP-WR (%)   DP-ENT (%)   Hybrid approach (%)
ASPack             100          70.0        91.5         100
NSPack             5.38         70.0        100          70.0
MPRESS             100          70.0        24.6         100
UPX                100          70.0        100          100
Yoda’s Protector   100          70.0        96.9         100
MEW                100          70.0        100          100
BeRoEXEPacker      100          70.0        100          100
Packman            100          70.0        93.8         100
RLPack             100          70.0        99.3         100
PECompact          100          70.0        100          100
Petite             100          70.0        0.00         100
JDpack             100          70.0        5.62         100
Molebox            100          70.0        100          100
eXpressor          100          70.0        0.00         100
Yoda’s Crypter     100          70.0        91.5         100
FSG                100          70.0        100          100
exe32pack          100          0.00        100          100
WinUpack           100          70.0        100          100
Neolite            100          70.0        1.61         100
Average            95.0         66.3        73.9         98.4
Not-Packed         0.00         0.00        3.84         0.00

Table 6 shows the detection accuracy of combinations of two or three detection techniques and of the proposed hybrid approach. First, DP-SIG+ENT, a combination of signature and entropy tests, detects 100% of the packed files but detects 18 files incorrectly, showing a false-positive rate of 0.73%. Second, DP-ENT+WR, a combination of entropy and WRITE tests, shows a low detection rate of 92.0% because the entropy value of MPRESS lies within the distribution of normal files. Third, DP-SIG+ENT+WR, a combination of signature, entropy, and WRITE tests, and our hybrid approach show 98.4% detection accuracy and a 1.60% false-negative rate. This is because DP-SIG+ENT+WR and the hybrid approach go through the same detection procedure except for DP-NEP. The false-negative rate of 1.60% is due to the files having no WRITE attributes. According to the results of Tables 5 and 6, we believe that the proposed hybrid approach achieves high detection accuracy by combining the advantages of the existing and new detection techniques.

5.4. Experimental Results on Unpacking and Verification Phases. In this section, we evaluate the unpacking and verification phases by performing the unpacking of Algorithm 2 and calculating the restoration accuracy of the files through the actual restoration of Algorithm 3. However, the unpacking and verification phases take a lot of time, so we experiment with 40 randomly selected files of the dataset. The unpacking phase consists of static unpacking and dynamic unpacking. In the experiment, we exclude static unpacking since it uses the existing library as it is, and we focus only on the dynamic unpacking of Algorithm 2 since it tries to restore the original file without any information about packers.

Table 7 shows the restoration accuracy of the proposed method for the .text and .data sections for each of the 19 packers. We cannot directly unpack Yoda’s Protector and PECompact since they use Anti-Debug engineering techniques. To overcome this problem, we unpack these packers by bypassing three Anti-Debugging API calls: GetCurrentProcessID(), BlockInput(), and IsDebuggerPresent() [41]. (That is, we run Algorithm 2 with the bypassing techniques for Yoda’s Protector and PECompact and include its experimental result in Table 7.) In general, since only the .text and .data sections of the original file are packed [42], in the experiment we consider only the restoration accuracy of these .text and .data sections of the restored file. In Table 7, among 19 packers, 11 packers, including UPX, NSPack, MEW, RLPack, and BeRoEXEPacker, show 100% restoration accuracy for both .text and .data sections. On the other hand, 8 packers, including ASPack, Packman, and MPRESS, show less than 100% restoration accuracy because of the manner in which the packers handle the .reloc section. More specifically, if there is a .reloc section in the original file, the unpacking relocates the .text and .data sections, changing all the data values in each section. Therefore, the restoration accuracy depends on whether there is a .reloc section in the original file.

Table 8 shows the restoration results in detail according to the presence of the .reloc section. As shown in the table, the restoration accuracy is 100% in the absence of the .reloc section, while it ranges from 0.00% to 99.8%, and is very low for most packers, in the presence of the .reloc section. However, the low restoration accuracy is due to the nature of the packer, and it does not mean that unpacking does not work correctly. In other words, even if the unpacking is successful, the data values may change due to the packer characteristic, and the restoration accuracy is then calculated to be low. In summary, we can use the proposed verification algorithm to calculate the restoration accuracy of unpacked files, which quantitatively measures the accuracy of the actual restoration.
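To make the notion of restoration accuracy concrete, the following Python sketch compares an original section with its restored counterpart byte by byte. Algorithm 3 is not reproduced in this section, so the definition used here (the percentage of byte positions that match, with any length difference counted as mismatches) is an assumption for illustration only, and the function name restoration_accuracy is hypothetical.

```python
def restoration_accuracy(original: bytes, restored: bytes) -> float:
    """Percentage of byte positions at which the restored section equals the original.

    Assumption: positions beyond the shorter input count as mismatches.
    """
    longest = max(len(original), len(restored))
    if longest == 0:
        return 100.0  # two empty sections are trivially identical
    same = sum(1 for a, b in zip(original, restored) if a == b)
    return 100.0 * same / longest


if __name__ == "__main__":
    original_text = bytes([0x55, 0x8B, 0xEC, 0x68, 0x00, 0x10, 0x40, 0x00])
    # Same code, but the embedded address was rewritten by relocation.
    relocated_text = bytes([0x55, 0x8B, 0xEC, 0x68, 0x00, 0x10, 0x7A, 0x00])
    print(restoration_accuracy(original_text, original_text))
    print(restoration_accuracy(original_text, relocated_text))
```

Under this assumed definition, a section whose address values were rewritten during relocation scores below 100% even though the unpacking itself succeeded, which is the effect discussed above.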

In addition to the .text, .data, and .reloc sections, we further consider the PE header, which includes the IAT information. In general, the IAT is located in the header of the PE file, and if we actually need to run the unpacked file, we must also restore this header information, including the IAT. In fact, the studies in [3, 43] restore the header including the IAT and produce the actual whole executable file. On the other hand, most studies, including this paper, do not restore the actual whole executable file itself but restore the .text, .data, and .reloc sections, which are the major parts of the executable file. This is because the purpose of unpacking is not to execute the restored file but to identify and extract malicious code included in the restored file. Therefore, in this paper, we do not need to restore the header required for the actual execution, and accordingly a comparison of IAT information is also not necessary. Instead, we restore and compare the .text, .data, and .reloc sections, which are more likely to hide malicious code. Restoring the PE header is beyond the scope of this study; we leave it as a separate research topic.

6. Conclusions

In this paper, we analyzed recent studies of packing detection and unpacking techniques and derived four important observations: no phase integration, no detection combination, no real-restoration, and no unpacking verification. We then proposed an all-in-one unpacking system to address these four observations. First, no phase integration meant that analysts had to perform the packing detection, unpacking, and verification phases separately, and we solved this problem by presenting an all-in-one unpacking system that integrates these three phases in a single framework. Second, no detection combination meant that there had been no attempt to combine various packing detection methods, and we solved this problem by presenting a hybrid approach that combines existing and new packing detection methods. Third, no real-restoration meant that there had been little discussion about restoring the actual executable file, and we solved this problem by actually restoring an original image of the packed file. Fourth, no unpacking verification meant that there had been no explicit work to measure the restoration accuracy of unpacked files, and we solved this problem by proposing a verification algorithm that quantitatively evaluates the restoration accuracy.

Table 6: Comparison of packing detection accuracy of two or three combinations and of the hybrid approach.

Metrics               DP-SIG+ENT (%)   DP-ENT+WR (%)   DP-SIG+ENT+WR (%)   Hybrid approach (%)
Detection accuracy    95.0             92.0            98.4                98.4
False-negative rate   5.00             8.00            1.60                1.60
False-positive rate   0.73             0.00            0.00                0.00

Table 7: Restoration accuracy of .text and .data sections of each packer.

Packer             .text (%)   .data (%)
UPX                100         100
NSPack             100         100
MEW                100         100
RLPack             100         100
BeRoEXE            100         100
Neolite            100         100
FSG                100         100
eXpressor          100         100
Molebox            100         100
Petite             100         100
JDpack             100         100
ASPack             84.3        84.3
Yoda’s Protector   84.0        84.0
exe32pack          80.8        100
MPRESS             78.8        75.3
Yoda’s Crypter     74.5        74.5
PECompact          74.0        74.0
WinUpack           68.0        68.0
Packman            55.0        48.5
Average            89.2        90.2

We constructed a dataset of 2,600 PE files and experimentally evaluated the proposed system using the dataset. First, we performed all the phases of the all-in-one unpacking system sequentially and confirmed that its overall working mechanism worked well. Next, through the comparison of the hybrid packing detection approach and existing detection techniques, we showed that the detection accuracy of the proposed method was the highest at 98.4% on average with no false positives. For this, we also determined an entropy range of packing, Packing-Range, through extensive experiments on 19 representative packers. Finally, in the unpacking and verification phases, we confirmed that unpacking was successful for all files except the files packed with Yoda’s Protector. We also showed that the restoration accuracy of unpacked files was nearly 100% except for a few packers.

As future research, we will focus on unpacking packed files that use Anti-VM, Anti-Debug, and Anti-Reverse engineering techniques. We will also study how to compute the restoration accuracy of files having .reloc sections.

Data Availability

All the data files used in the experiments are available at https://github.com/chesvectain/PackingData.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was partly supported by the Institute of Information and Communications Technology Planning and Evaluation (IITP) grant funded by the Korea government (MSIT) (no. 2017-0-00158, Development of Cyber Threat Intelligence (CTI) analysis and information sharing technology for national cyber incident response) and the Korea Electric Power Corporation (no. R18XA05).

Table 8: Restoration accuracy according to the presence of the .reloc section.

Packer             .text, with .reloc (%)   .text, without .reloc (%)   .data, with .reloc (%)   .data, without .reloc (%)
ASPack             0.00                     100                         0.00                     100
Packman            99.8                     100                         88.1                     100
MPRESS             0.00                     100                         0.00                     100
Yoda’s Crypter     0.00                     100                         0.04                     100
PECompact          0.00                     100                         0.00                     100
Yoda’s Protector   0.00                     100                         0.05                     100
exe32pack          0.00                     100                         100                      100
WinUpack           0.00                     100                         0.00                     100

References

[1] M. Morgenstern, “AV-TEST Security report 2017/18,” 2019, https://www.av-test.org/fileadmin/pdf/security_report/AV-TEST_Security_Report_2017-2018.pdf.

[2] T. Ban, R. Isawa, S. Guo, D. Inoue, and K. Nakao, “Efficient malware packer identification using support vector machines with spectrum kernel,” in Proceedings of the Eighth Asia Joint Conference on Information Security, pp. 69–76, Seoul, South Korea, July 2013.

[3] B. Cheng, M. Jiang, J. Fu et al., “Towards paving the way for large-scale Windows malware analysis: generic binary unpacking with orders-of-magnitude performance boost,” in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pp. 395–411, Toronto, Canada, October 2018.

[4] A. Gupta and M. S. Arya, “Hashing based encryption and anti-debugger support for packing multiple files into single executable,” International Journal of Advanced Research in Computer Science, vol. 9, no. 1, pp. 914–920, 2018.

[5] Y. Kawakoya, M. Iwamura, and M. Itoh, “Memory behavior-based automatic malware unpacking in stealth debugging environment,” in Proceedings of the 5th International Conference on Malicious and Unwanted Software, pp. 39–46, Lorraine, France, October 2010.

[6] G. Liang, J. Pang, Z. Shan, R. Yang, and Y. Chen, “Automatic benchmark generation framework for malware detection,” Security and Communication Networks, vol. 2018, Article ID 4947695, 8 pages, 2018.

[7] A. Pektas and T. Acarman, “A dynamic malware analyzer against virtual machine aware malicious software,” Security and Communication Networks, vol. 7, no. 12, pp. 2245–2257, 2014.

[8] I. Santos, J. Nieves, and P. G. Bringas, “Semi-supervised learning for unknown malware detection,” in International Symposium on Distributed Computing and Artificial Intelligence, A. Abraham, J. M. Corchado, S. R. Gonzalez, and J. F. De Paz Santana, Eds., pp. 415–422, Springer, Berlin, Germany, 2011.

[9] X. Ugarte-Pedrero, I. Santos, B. Sanz, C. Laorden, and P. G. Bringas, “Countering entropy measure attacks on packed software detection,” in Proceedings of the 9th International Conference on Consumer Communications and Networking, pp. 164–168, Las Vegas, NV, USA, January 2012.

[10] H. Won, S.-P. Kim, S. Lee, M.-J. Choi, and Y.-S. Moon, “Secure principal component analysis in multiple distributed nodes,” Security and Communication Networks, vol. 9, no. 14, pp. 2348–2358, 2016.

[11] S. Cesare, Y. Xiang, and W. Zhou, “Malwise–an effective and efficient classification system for packed and polymorphic malware,” IEEE Transactions on Computers, vol. 62, no. 6, pp. 1193–1206, 2013.

[12] C. Malin, E. Casey, and J. Aquilina, Malware Forensics: Investigating and Analyzing Malicious Code, Syngress, Burlington, MA, USA, 2008.

[13] W. Yan, Z. Zheng, and A. Nirwan, “Revealing packed malware,” IEEE Security and Privacy, vol. 6, no. 5, pp. 65–69, 2008.

[14] M. Bat-Erdene, T. Kim, H. Park, and H. Lee, “Packer detection for multi-layer executables using entropy analysis,” Entropy, vol. 19, no. 3, p. 125, 2017.

[15] Y.-S. Choi, I.-K. Kim, J.-T. Oh, and J.-C. Ryou, “PE file header analysis-based packed PE file detection technique (PHAD),” in Proceedings of International Symposium on Computer Science and Its Applications, pp. 28–31, Hobart, Australia, October 2008.

[16] S. Han and S. Lee, “Packed PE file detection for malware forensics,” The KIPS Transactions: Part C, vol. 16C, no. 5, pp. 555–562, 2009, in Korean.

[17] R. Lyda and J. Hamrock, “Using entropy analysis to find encrypted and packed malware,” IEEE Security and Privacy, vol. 5, no. 2, pp. 40–45, 2007.

[18] R. Perdisci, A. Lanzi, and W. Lee, “Classification of packed executables for accurate computer virus detection,” Pattern Recognition Letters, vol. 29, no. 14, pp. 1941–1946, 2008.

[19] A. Rohit, A. Singh, H. Pareek, and U. Edara, “A heuristics-based static analysis approach for detecting packed PE binaries,” International Journal of Security and Its Applications, vol. 7, no. 5, pp. 257–268, 2013.

[20] G. Amato, “PEframe,” 2019, https://github.com/guelfoweb/peframe.

[21] G. Jeong, E. Choo, J. Lee, M. Bat-Erdene, and H. Lee, “Generic unpacking using entropy analysis,” Journal of Advanced Information Technology and Convergence, vol. 7, no. 1, pp. 232–238, 2009.

[22] V. DeMarines, “Obfuscation–how to do it and how to crack it,” Network Security, vol. 2008, no. 7, pp. 4–7, 2008.

[23] H. Neil, “PEiD,” 2019, https://github.com/wolfram77web/app-peid.

[24] O. Yuschuk, “OllyDbg,” 2019, http://www.ollydbg.de/download.html.

[25] N. Waisman, “Immunity Debugger,” 2019, https://www.immunityinc.com/products/debugger/index.html.

[26] P. Royal, M. Halpin, D. Dagon, R. Edmonds, and W. Lee, “PolyUnpack: Automating the hidden-code extraction of unpack-executing malware,” in Proceedings of the 22nd Annual Computer Security Applications Conference, pp. 289–300, Miami Beach, FL, USA, December 2006.

[27] M. Bat-Erdene, Y. Park, H. Li, H. Lee, and M.-S. Choi, “Entropy analysis to classify unknown packing algorithms for malware detection,” International Journal of Information Security, vol. 16, no. 3, pp. 227–248, 2017.

[28] S. Cesare and X. Yang, “Classification of malware using structured control flow,” in Proceedings of the 8th Australasian Symposium on Parallel and Distributed Computing, pp. 61–70, Brisbane, Australia, January 2010.

[29] L. Martignoni, M. Christodorescu, and S. Jha, “OmniUnpack: fast, generic, and safe unpacking of malware,” in Proceedings of the 23rd Annual Computer Security Applications Conference, pp. 431–441, Miami Beach, FL, USA, December 2007.

[30] M. G. Kang, P. Poosankam, and H. Yin, “Renovo: a hidden code extractor for packed executables,” in Proceedings of the 5th ACM Workshop on Recurring Malcode, pp. 46–53, Alexandria, VA, USA, November 2007.

[31] S. Mariani, L. Fontana, F. Gritti, and S. D’Alessio, “PinDemonium: a DBI-based generic unpacker for windows executables,” in Proceedings of the Black Hat 2016, Las Vegas, NV, USA, July 2016.

[32] G. Bonfante, J. Fernandez, J.-Y. Marion, B. Rouxel, F. Sabatier, and A. Thierry, “CoDisasm: medium scale concatic disassembly of self-modifying binaries with overlapping instructions,” in Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 745–756, Denver, CO, USA, October 2016.

[33] M. Oberhumer, L. Molnar, and J. Reiser, “UPX,” 2019, https://upx.github.io.

[34] A. Swinnen and A. Mesbahi, “One packer to rule them all: empirical identification, comparison, and circumvention of current antivirus detection techniques,” in Proceedings of the Black Hat 2014, Las Vegas, NV, USA, August 2014.

[35] P. Bania, “Generic unpacking of self-modifying, aggressive, packed binary programs,” 2009, https://arxiv.org/abs/0905.4581.

[36] H. Neil, “BobSoft Signature,” 2019, http://woodmann.com/BobSoft/Files/Other/UserDB.zip.

[37] M. Russinovich, “Microsoft Sysinternals Suite,” 2019, https://download.sysinternals.com/files/SysinternalsSuite.zip.

[38] R. Rivest, “The MD5 message-digest algorithm,” RFC 1321, IETF, Fremont, CA, USA, 1992.

[39] S. Tatham, “Putty,” 2019, https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html.

[40] L. Pascoe, “MD5,” 2019, http://www.md5summer.org/about.html.

[41] P. Ferrie, “Anti-unpacker tricks - part one,” 2019, https://www.virusbulletin.com/virusbulletin/2008/12/anti-unpacker-tricks-part-one.

[42] T.-Y. Wang and C.-H. Wu, “Detection of packed executables using support vector machines,” in Proceedings of the 10th International Conference on Machine Learning and Cybernetics, pp. 717–722, Guilin, China, July 2011.

[43] D. Korczynski, “RePEconstruct: reconstructing binaries with self-modifying code and Import address table destruction,” in Proceedings of the 11th International Conference on Malicious and Unwanted Software, pp. 31–38, Fajardo, Puerto Rico, October 2016.


especially for Unpack and PESpin packers. In the case of DP-NEP, however, we cannot find the EP section itself, and accordingly we cannot proceed to the next phases of unpacking and verification. More precisely, unpacking such DP-NEP packers might be possible if we manually modify the raw hexadecimal codes by using a binary editor. However, this manual raw data modification is beyond the scope of the paper, and we do not consider the manual modification of hexadecimal codes. In summary, we use the concept of “No EP Section” in the detection phase of Algorithm 1 to increase the packing detection rate, but we do not consider it in the unpacking and verification phases.

Second, in DP-SIG, we investigate whether or not the signature extracted from the EP section is in the signature database. We construct the signature database with a total of 7,000 signatures, consisting of 4,200 signatures provided by PEiD [23] and 2,800 signatures collected from BobSoft [36]. Figure 2 shows three sample signatures of the packers stored in the database. The first line of each entry represents the type and version of the packer and the maker of the signature, the second line gives the hexadecimal values of the packer’s signature, and the third line, ep_only, is true or false, indicating whether the signature can be found in the EP section.
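A signature test of this kind can be sketched in a few lines of Python. The sketch below is illustrative rather than the authors' code: the function names parse_signature, matches_at, and find_packer are hypothetical, the database is a plain dictionary holding one entry from Figure 2, and matching is done only at the start of the supplied entry-point bytes (the ep_only = true case). PEiD-style databases also use "??" as a wildcard byte, which the sketch accepts even though the entries in Figure 2 contain none.

```python
from typing import Dict, List, Optional


def parse_signature(text: str) -> List[Optional[int]]:
    """Turn 'E9 C5 ?? 00' into [0xE9, 0xC5, None, 0x00]; None is a wildcard."""
    return [None if token == "??" else int(token, 16) for token in text.split()]


def matches_at(data: bytes, pattern: List[Optional[int]], offset: int = 0) -> bool:
    """Check whether the pattern matches the data starting at the given offset."""
    if offset < 0 or offset + len(pattern) > len(data):
        return False
    return all(p is None or data[offset + i] == p for i, p in enumerate(pattern))


def find_packer(ep_bytes: bytes, database: Dict[str, str]) -> Optional[str]:
    """Return the name of the first signature matching at the entry point, if any."""
    for name, signature_text in database.items():
        if matches_at(ep_bytes, parse_signature(signature_text)):
            return name
    return None


if __name__ == "__main__":
    database = {
        "CodeCrypt v014b":
            "E9 C5 02 00 00 EB 02 83 3D 58 EB 02 FF 1D 5B EB 02 0F C7 5F",
    }
    ep_bytes = bytes.fromhex(
        "E9 C5 02 00 00 EB 02 83 3D 58 EB 02 FF 1D 5B EB 02 0F C7 5F 90 90")
    print(find_packer(ep_bytes, database))
```

A linear scan over 7,000 signatures is adequate for a sketch; a production matcher would more likely index signatures by their first bytes.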

Third, in DP-WR, we check the WRITE attribute of the EP section. In the process of unpacking a packed file, the system needs to modify the memory, so the packed file necessarily requires the WRITE attribute. Thus, we suspect that the file is packed if the WRITE attribute exists. However, since a normal file can also have the WRITE attribute, we cannot determine packing only by the existence of the WRITE attribute. Therefore, as proposed in [16], we detect packing using the WRITE attribute and the entropy value together.
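The WRITE attribute check amounts to testing one bit of a section header. The sketch below assumes the standard 40-byte PE/COFF section header, in which the 4-byte little-endian Characteristics field starts at offset 36 and the writable flag is IMAGE_SCN_MEM_WRITE (0x80000000); the function name section_is_writable is hypothetical, and locating the EP section inside a real file is left out.

```python
import struct

IMAGE_SCN_MEM_WRITE = 0x80000000  # "section can be written to" flag (PE/COFF)
SECTION_HEADER_SIZE = 40          # size of one section header in bytes
CHARACTERISTICS_OFFSET = 36       # offset of the Characteristics field


def section_is_writable(section_header: bytes) -> bool:
    """Return True if the section header has the WRITE attribute set."""
    if len(section_header) < SECTION_HEADER_SIZE:
        raise ValueError("a PE section header is 40 bytes long")
    (characteristics,) = struct.unpack_from(
        "<I", section_header, CHARACTERISTICS_OFFSET)
    return bool(characteristics & IMAGE_SCN_MEM_WRITE)


if __name__ == "__main__":
    # Synthetic header: 8-byte name, 28 bytes of zeroed fields, then the flags.
    header = b"UPX1\x00\x00\x00\x00" + bytes(28) + struct.pack("<I", 0xE0000040)
    print(section_is_writable(header))
```

As the text notes, this bit alone is not decisive, which is why it is combined with the entropy test.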

Fourth, in DP-ENT, we compute the entropy of an input PE file using equation (1) and check whether that entropy is in a specific range (Packing-Range in Algorithm 1). This is based on the previous observation [16, 17] that the entropy of a packed file differs from that of an ordinary file. We compute the entropy value for the EP section only, not the entire file, using equation (1). To use DP-ENT, we need to set the Packing-Range used for determining whether or not a file is packed. In this paper, we set this range through the experiment in Section 5.3.
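The four tests can be combined in the order used by Algorithm 1, as the following Python sketch suggests. It works on already extracted facts about the EP section so that it stays self-contained; the EpFeatures class, the detect_packing function, and the returned reason strings are hypothetical names, and the default bounds 6.00 and 6.85 are the Packing-Range limits reported in Section 5.3.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class EpFeatures:
    """Facts extracted beforehand from the EP section of a PE file."""
    signature_found: bool  # DP-SIG: a known packer signature matched
    writable: bool         # DP-WR: the WRITE attribute is set
    entropy: float         # DP-ENT: entropy of the EP section


def detect_packing(ep: Optional[EpFeatures],
                   normal_low: float = 6.00,
                   normal_high: float = 6.85) -> Tuple[str, Optional[str]]:
    """Return (verdict, technique that fired); ep is None when no EP section exists."""
    if ep is None:
        return "Packed", "DP-NEP"
    if ep.signature_found:
        return "Packed", "DP-SIG"
    in_packing_range = ep.entropy < normal_low or ep.entropy >= normal_high
    if ep.writable and in_packing_range:
        return "Packed", "DP-WR+DP-ENT"
    return "Not-Packed", None


if __name__ == "__main__":
    print(detect_packing(None))
    print(detect_packing(EpFeatures(False, True, 7.6)))
    print(detect_packing(EpFeatures(False, True, 6.4)))
```

Returning the technique that fired alongside the verdict is a design choice of the sketch; it makes per-technique detection rates easy to tabulate.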

4.2. Unpacking Phase. As mentioned in Section 2.2, unpacking, which is performed automatically without human intervention, can be divided into static unpacking and dynamic unpacking. Static unpacking is based on the characteristics of the packing algorithm, so it is fast and has no risk of infection because the packed file is not executed directly. In contrast, dynamic unpacking does not depend on a specific packing algorithm, although it takes a long analysis time. In this paper, we use both static and dynamic methods to maximize the unpacking accuracy. More specifically, if the packer type found in the detection phase provides an unpacking library, we use static unpacking; otherwise, we perform dynamic unpacking. We can easily handle static unpacking by using the unpacking library found for each packer, so in this section we focus on explaining in detail how dynamic unpacking works. Our dynamic unpacking method is also known as the “execute-after-write” method. More precisely, the proposed method first restores the original file into the memory by finding the OEP from the packed file and then analyzes (i.e., executes) the restored original file in the memory. In order to avoid the infection of real machines, we perform the unpacking and validation phases in the debugging mode.

Figure 1: Overall working mechanism of the all-in-one unpacking system. Detection phase: ① input PE file, ② packing detection (Yes/No). Unpacking phase: ③ unpacking library available? Yes: ④ static unpacking; No: ⑤ dynamic unpacking. Verification phase: ⑥ verification.

Table 3: Existing or proposed techniques used in each phase of the all-in-one unpacking system.

Analytical phase   Techniques used or proposed
Detection          (i) EP Section test, to be proposed in Section 4.1
                   (ii) Signature test [23]
                   (iii) WRITE attribute test [16]
                   (iv) Entropy test [17]
Unpacking          (i) Static: library-based unpacking [33]
                   (ii) Dynamic: entropy change-based unpacking [14, 27]
Verification       Verification algorithm, to be proposed in Section 4.3

Table 4: Detection techniques used in existing and proposed approaches.

Method              DP-NEP   DP-SIG   DP-WR   DP-ENT
PEiD [23]           ×        ✓        ×       ×
[17]                ×        ×        ×       ✓
[16]                ×        ×        ✓       ✓
Proposed approach   ✓        ✓        ✓       ✓

The OEP is the address of the first instruction to execute after the packed file is unpacked, and it represents the entry point of the original program. In other words, since the code following the OEP is the starting code of the original program, finding the OEP is the most important issue in dynamic unpacking. To find the OEP, we use the entropy change-based analysis proposed by Bat-Erdene et al. [27]. When a packed file is executed, the packed data is uncompressed and written to the memory, and at this time the data might be changed or added to the file. Because of this data change, the entropy value continuously changes during the unpacking process. If unpacking is successful, the entropy value is stabilized, the original file is restored, and the IP (instruction pointer) is moved to another section of the restored original file for its execution. Thus, we can find the OEP by examining whether the IP moves to another section when there is no change in the entropy value of each section. (If the heuristic of investigating the change of IP sections is known, someone may create a malicious packing method that can evade the heuristic. However, for this evasion, the OEP must be reached without a change of IP sections, and packing a PE file in such a way might not be easy. In our actual analysis, we do not find such cases in the 19 packers we covered. Even though such a case hardly exists, it might occur in the worst case. We leave an advanced approach that works against such evasion as a further study.) To accomplish this, we need to execute the target process in units of instructions and monitor the entropy change; however, measuring the entropy value after every instruction is very time-consuming and inefficient. We here note that the OEP generally exists after a branch instruction, so we calculate the entropy value only when the instruction is JMP, Conditional JMP (CJMP), or RETN. We also note that the iterations during program execution are independent of the OEP, so we store and reuse the entropy values measured after JMP, CJMP, and RETN instructions to avoid duplicated calculations caused by the iterations and to reduce the total analysis time.
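The OEP heuristic described above can be sketched on a recorded trace of branch instructions, which keeps the example runnable without a debugger. This is an illustration under stated assumptions, not the authors' Algorithm 2: each trace entry is assumed to carry the branch address, its destination, and the per-section entropies measured at that point; the tolerance eps used for "no change" is an assumption; and the names find_oep, section_of, Section, and BranchEvent are hypothetical. In a live debugger loop, the entropy would be measured only for branches that pass the history check.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Section = Tuple[str, int, int]                   # (name, start, end), end exclusive
BranchEvent = Tuple[int, int, Dict[str, float]]  # (ip, destination, entropies)


def section_of(addr: int, sections: List[Section]) -> Optional[str]:
    """Name of the section containing addr, or None if addr is outside the image."""
    for name, start, end in sections:
        if start <= addr < end:
            return name
    return None


def find_oep(events: Iterable[BranchEvent], sections: List[Section],
             initial_entropy: Dict[str, float], eps: float = 1e-6) -> Optional[int]:
    """Return the first branch target reached with stable entropy in another section."""
    seen_ip, seen_dst = set(), set()
    previous = initial_entropy
    for ip, dst, current in events:
        if ip in seen_ip or dst in seen_dst:
            continue  # repeated loop iteration: already examined
        seen_ip.add(ip)
        seen_dst.add(dst)
        dst_section = section_of(dst, sections)
        if dst_section is None:
            return None  # the branch leaves the file image
        stable = all(abs(current[name] - previous[name]) <= eps for name in current)
        if stable and dst_section != section_of(ip, sections):
            return dst
        previous = current
    return None


if __name__ == "__main__":
    sections = [("UPX0", 0x1000, 0x5000), ("UPX1", 0x5000, 0x8000)]
    trace = [
        (0x5010, 0x5020, {"UPX0": 3.1, "UPX1": 7.8}),  # still unpacking
        (0x5010, 0x5020, {"UPX0": 5.0, "UPX1": 7.8}),  # same loop, skipped
        (0x5030, 0x5040, {"UPX0": 6.2, "UPX1": 7.8}),  # entropy still changing
        (0x5050, 0x5060, {"UPX0": 6.2, "UPX1": 7.8}),  # stable, same section
        (0x5070, 0x1234, {"UPX0": 6.2, "UPX1": 7.8}),  # stable, new section
    ]
    oep = find_oep(trace, sections, {"UPX0": 0.0, "UPX1": 7.8})
    print(hex(oep) if oep is not None else None)
```

The section names and addresses in the usage example are synthetic and only serve to exercise the two conditions: stable entropy and a jump into a different section.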

Algorithm 2 shows the proposed dynamic unpacking algorithm based on entropy changes; it returns the OEP address for a given PE file. First, it computes the initial entropy of each section (Line 1). Then, it repeats the loop until the process ends (Lines 2 to 18). If the current IP address is greater than the last address of the current section, it returns Not-Found (Line 4). This is because the IP has moved to the next section without a JMP-family instruction, so the OEP cannot be found in this section. Otherwise, if the instruction at IP is JMP, CJMP, or RETN, it executes the file up to IP and stores the address of the next instruction in DstAddr (Lines 5 to 7). If IP exists in IP-History or DstAddr exists in DstAddr-History, the instruction or address belongs to an iteration, so we move on to the next IP (Line 8). Otherwise, it stores IP and DstAddr in their respective histories for the next iteration (Line 9). At this time, if DstAddr is out of the range of the file, it returns Not-Found because it cannot find the OEP within the file (Line 10). If DstAddr is in the normal range, it calculates the entropy of each section again and compares the current entropy with the previous entropy to check whether it is stable (Lines 11 to 13). If the entropy is stable and DstAddr is in a different section, unpacking has succeeded and the IP is being moved to another section by a JMP-family instruction. Thus, it

[Code-Lock vx.x]
signature = 47 8B C2 05 1E 00 52 8B D0 B8 02 3D CD 21 8B D8 5A
ep_only = true

[CodeCrypt v0.14b]
signature = E9 C5 02 00 00 EB 02 83 3D 58 EB 02 FF 1D 5B EB 02 0F C7 5F
ep_only = true

[CodeCrypt v0.15b]
signature = E9 31 03 00 00 EB 02 83 3D 58 EB 02 FF 1D 5B EB 02 0F C7 5F
ep_only = true

Figure 2: Examples of packer signatures stored in the signature database.
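The signature entries of Figure 2 can be matched against the bytes at a file's entry point roughly as follows. This is a sketch under stated assumptions: the function names are hypothetical, `ep_offset` is assumed to be the file offset of the entry point (computed elsewhere), and support for a `??` wildcard byte is included as an assumption about the signature format even though the entries shown above do not use it.

```python
def parse_signature(text: str) -> list:
    """Turn 'E9 C5 ?? 00' into [0xE9, 0xC5, None, 0x00]; None matches any byte."""
    return [None if tok == "??" else int(tok, 16) for tok in text.split()]


def matches_at(data: bytes, offset: int, pattern: list) -> bool:
    """True if pattern matches data starting at offset."""
    if offset < 0 or offset + len(pattern) > len(data):
        return False
    return all(p is None or data[offset + i] == p for i, p in enumerate(pattern))


# Two entries copied from Figure 2 (ep_only = true: compare at the entry point only).
SIGNATURES = {
    "CodeCrypt v0.14b": parse_signature(
        "E9 C5 02 00 00 EB 02 83 3D 58 EB 02 FF 1D 5B EB 02 0F C7 5F"),
    "CodeCrypt v0.15b": parse_signature(
        "E9 31 03 00 00 EB 02 83 3D 58 EB 02 FF 1D 5B EB 02 0F C7 5F"),
}


def identify_packer(data: bytes, ep_offset: int):
    """Return the first signature name that matches at the entry point, else None."""
    for name, pattern in SIGNATURES.items():
        if matches_at(data, ep_offset, pattern):
            return name
    return None
```

Because the comparison is anchored at the entry point, a signature that the packer places in a different section would be missed by this kind of check.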

Input: A PE file
Output: Packed or Not-Packed (packing detection result)
begin
(1) if No EP Section then return Packed // DP-NEP
(2) else if Packer Signature Found then return Packed // DP-SIG
(3) else if "WRITE" enabled and EP Section Entropy ∈ Packing-Range then
(4)     return Packed // DP-WR, DP-ENT
(5) else return Not-Packed
(6) end-if
end

ALGORITHM 1: Packing detection.
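The decision order of Algorithm 1 can be sketched in Python as below. The function name and its boolean parameters are hypothetical; each flag is assumed to have been computed beforehand (EP section lookup, signature scan, WRITE attribute check, and entropy range test).

```python
def detect_packing(has_ep_section: bool,
                   signature_found: bool,
                   ep_writable: bool,
                   ep_entropy_in_range: bool):
    """Return (verdict, techniques that fired), following Lines 1 to 5 of Algorithm 1."""
    if not has_ep_section:                        # Line 1: DP-NEP
        return "Packed", ["DP-NEP"]
    if signature_found:                           # Line 2: DP-SIG
        return "Packed", ["DP-SIG"]
    if ep_writable and ep_entropy_in_range:       # Lines 3 and 4: DP-WR and DP-ENT
        return "Packed", ["DP-WR", "DP-ENT"]
    return "Not-Packed", []                       # Line 5
```

Note that the WRITE and entropy tests are combined with a logical AND, so a file with no signature whose WRITE attribute has been removed is reported as Not-Packed even if its entropy lies in Packing-Range.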

6 Security and Communication Networks

stops the execution and returns DstAddr as the OEP (Lines 13 and 14).

4.3. Verification Phase. In this section, we present an algorithm for measuring the restoration accuracy by comparing the restored file with its original file. To execute the packed file, the proposed system uses dynamic unpacking and finds the OEP during its execution. If the system finds the OEP, we assume that the restoration of the original file is successful and regard that file as the restored file. In the verification phase, we quantitatively measure the restoration accuracy by comparing this restored file with its original file.

Figure 3 shows an example of comparing some bytecodes of the original file with those of the restored file. For the comparison, we first pack the pspasswd.exe file provided by Microsoft [37] and then unpack it again by dynamic unpacking. The figure shows that the bytecodes of the four sections of the original file (.text, .rdata, .data, and .rsrc) appear identically in the UPX0, UPX1, and .rsrc sections of the restored file. That is, even if the name and location of each section are different, the same bytecodes exist in the restored file. However, the restored file contains padding and garbage values, and some bytecodes may be at locations different from those in the original file. Thus, in order to compare the original and restored files, we need to search for where the original bytecodes are located in the restored file. Since manually comparing the bytecodes of the original and restored files takes a very long time, in this paper we propose an efficient search algorithm using a hash function. We use the well-known MD5 [38] as the hash function of the search algorithm. As a message-digest algorithm, MD5 is widely used in checking the integrity of long data.

Algorithm 3 shows the restoration accuracy calculation algorithm. Its basic principle is to check whether each section of the original file exists in the restored file. Here, we use the MD5 hash function to speed up the scan. The inputs to the algorithm are an original file A and a restored file B, and the output is restorationAccuracy, a list storing the restoration accuracy values of all sections. First, we remove garbage and padding values from the restored file (Line 1). For each section Ai of A, we obtain the length curLen of the section and calculate a hash value H1 by applying the MD5 hash function (Lines 4 to 7). We then calculate another hash value H2 from the restored file B by considering only a part of length curLen (Line 9). If H1 equals H2 (Line 10), B contains Ai, so we stop the comparison and calculate the restoration accuracy of the section (Line 14). However, if H1 and H2 are not equal, we perform the hash value comparison again after decrementing the length curLen of Ai (Lines 6 to 13). We repeat this process to calculate the restoration accuracy for the section Ai (Line 14). After completing the accuracy calculation for all sections, the algorithm returns the list restorationAccuracy (Line 16).

5. Performance Evaluation

5.1. Experimental Environment. In this section, we present the results of the experimental evaluation to show the superiority of the all-in-one unpacking system. We conduct the experiments on a virtual server as well as a physical server. The reasons for experimenting in two server environments are as follows. First, unpacking results may differ between the virtual server and the physical server if a packed file has anti-unpacking features such as Anti-VM or Anti-Debug. Second, the dynamic unpacking of malware

Input: A PE file
Output: OEP, the start address of the unpacked original file
begin
(1) Measure the entropy value of each section
(2) while the process has not reached the end do
(3)     IP ⟵ the current instruction pointer
(4)     if IP's address > current section's last address then return Not-Found
(5)     else if IP is JMP or CJMP or RETN then
(6)         Execute instructions until IP
(7)         DstAddr ⟵ the next instruction address
(8)         if IP ∉ IP-History and DstAddr ∉ DstAddr-History then
(9)             Store IP and DstAddr into each history
(10)            if DstAddr is out of the file then return Not-Found
(11)            else
(12)                Re-calculate the entropy value of each section
(13)                if Entropy change is stable and DstAddr is in the different section then
(14)                    return DstAddr
(15)                end-if
(16)            end-if
(17)        end-if
(18) end-while
(19) return Not-Found
end

ALGORITHM 2: Dynamic unpacking algorithm.
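The control flow of Algorithm 2 can be illustrated without a debugger by replaying a recorded execution trace. The sketch below is only a simulation under stated assumptions: `Step`, `section_of`, `find_oep`, the `layout` mapping of section names to address ranges, and the tolerance `eps` are all hypothetical names and values, and each `Step` is assumed to carry the per-section entropies that a debugger script would have measured at that point.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass
class Step:
    ip: int                      # address of the instruction about to execute
    is_branch: bool              # True for JMP, conditional JMP, or RETN
    dst: int                     # address executed next
    entropies: Dict[str, float]  # per-section entropy observed at this step


def section_of(addr: int, layout: Dict[str, Tuple[int, int]]) -> Optional[str]:
    """Name of the section whose [start, end] range contains addr, else None."""
    for name, (start, end) in layout.items():
        if start <= addr <= end:
            return name
    return None


def find_oep(trace: Iterable[Step], layout, initial_entropies, eps=1e-3):
    previous = dict(initial_entropies)                         # Line 1
    ip_history, dst_history = set(), set()
    current = None
    for step in trace:                                         # Line 2
        if current is None:
            current = section_of(step.ip, layout)
        if current is None or step.ip > layout[current][1]:    # Line 4
            return None
        if not step.is_branch:                                 # Line 5
            continue
        target = section_of(step.dst, layout)
        seen = step.ip in ip_history or step.dst in dst_history  # Line 8
        if not seen:
            ip_history.add(step.ip)                            # Line 9
            dst_history.add(step.dst)
            if target is None:                                 # Line 10
                return None
            stable = all(abs(step.entropies[n] - previous[n]) <= eps
                         for n in previous)                    # Lines 12 and 13
            if stable and target != current:
                return step.dst                                # Line 14
            previous = dict(step.entropies)
        if target is not None:
            current = target  # assumption: follow the branch into its section
    return None                                                # Line 19
```

For example, a trace in which an unpacking stub in UPX1 loops while the entropy of UPX0 rises, and then jumps into UPX0 once the entropy has stopped changing, yields the jump target as the OEP.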


files may infect the physical server. Thus, we perform unpacking first on the virtual server and then perform the experiment on the physical server only when there is no abnormality. The hardware specifications of the experimental platform are as follows. The virtual server has an Intel Core i7-6700 3.40 GHz CPU with 8 GB RAM, and the physical server has an Intel Core i7-6700 3.40 GHz CPU with 32 GB RAM. The target operating system is 32-bit Windows 7, and we use PEVIEW to view the structure and content of 32-bit PE files. As the debugger, we use Immunity Debugger 1.85 [25] to implement the dynamic unpacking phase of Algorithm 2. To implement the all-in-one unpacking system including the

Figure 3: Comparison of sample bytecodes of the original and restored files. (a) Sections of the original file (.text, .rdata, .data, and .rsrc). (b) Sections of the restored file (UPX0, UPX1, and .rsrc).

Input: A, an original file; B, a restored file
Output: restorationAccuracy[1..n], a list of restoration accuracy values for n sections
begin
(1) Remove garbage and padding values from B
(2) A ⟵ {A1, A2, ..., An}, where Ai is the i-th section of A
(3) for each section Ai ∈ A do
(4)     totalLen ⟵ Ai.length
(5)     curLen ⟵ totalLen
(6)     while curLen > 0 do
(7)         H1 ⟵ hash(Ai[1 : curLen]) // hash a part of section Ai
(8)         for j ⟵ 1 to (B.length − curLen + 1) do
(9)             H2 ⟵ hash(B[j : j + curLen − 1]) // hash a part of B
(10)            if H1 = H2 then break
(11)            else curLen ⟵ curLen − 1
(12)        end-for
(13)    end-while
(14)    restorationAccuracy[i] ⟵ (curLen / totalLen) × 100
(15) end-for
(16) return restorationAccuracy
end

ALGORITHM 3: Restoration rate calculation algorithm.
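The idea of Algorithm 3 can be sketched in Python with the standard `hashlib` module. This is an illustrative reading, not the authors' implementation: it follows the prose description (scan the restored file for the current prefix of a section, and shorten the prefix only when the whole scan fails), it omits the garbage and padding removal of Line 1, and the function names and the dictionary of sections are assumptions. Its running time grows quickly with the input size, so it is meant for small examples only.

```python
import hashlib


def md5(data: bytes) -> bytes:
    return hashlib.md5(data).digest()


def contains(restored: bytes, wanted_hash: bytes, length: int) -> bool:
    """True if some window of the given length in restored hashes to wanted_hash."""
    return any(md5(restored[j:j + length]) == wanted_hash
               for j in range(len(restored) - length + 1))


def restoration_accuracy(original_sections: dict, restored: bytes) -> dict:
    """Percentage of each original section (as a prefix) found in the restored file."""
    result = {}
    for name, section in original_sections.items():
        total_len = len(section)
        cur_len = total_len
        while cur_len > 0:
            # H1: hash of the first cur_len bytes of the section.
            if contains(restored, md5(section[:cur_len]), cur_len):
                break
            cur_len -= 1  # no match anywhere in B: retry with a shorter prefix
        result[name] = (cur_len / total_len) * 100 if total_len else 0.0
    return result
```

With an original `.text` section of `b"ABCDEFGH"` that appears intact in the restored bytes and a `.data` section of `b"12345678"` of which only `b"1234"` survives, the sketch reports 100.0 and 50.0, respectively.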


dynamic unpacking algorithm, we run Python scripts on top of Immunity Debugger.

In the experiment, we use a total of 130 PE files, where 90 files are provided by Microsoft [37] and 40 files are randomly collected from the putty [39] and MD5 [40] sites. We pack each of the 130 executable files with each of 19 packers and construct a dataset of 2600 files in total (130 × 19 = 2470 packed files plus the 130 original files). The 19 packers used are UPX, ASPack, NSPack, MPRESS, Yoda's Protector, MEW, Packman, RLPack, BeRoEXEPacker, PECompact, Petite, JDpack, Molebox, eXpressor, Yoda's Crypter, FSG, exe32pack, WinUpack, and Neolite. The reason for building the dataset by packing normal PE files ourselves is that we can quantitatively measure the packing detection accuracy, the false-positive rate, and the false-negative rate only if we know in advance which experimental files are packed.

In this paper, we conduct three experiments. The first experiment covers the integration feature of the all-in-one system; it verifies that the system supports all three phases of packing detection, unpacking, and verification in an integrated framework. The second experiment covers the detection phase, where we first set Packing-Range, an entropy range for judging packing, and then evaluate the packing detection accuracy using that range. The third experiment covers the unpacking and verification phases, where we verify whether the system unpacks the PE file correctly by calculating the restoration accuracy.

5.2. Experimental Results on Overall Working Mechanism. In this section, we confirm through experiments that the integration mechanism of the all-in-one unpacking system works correctly. Figure 4 shows an operation screenshot of each phase of the all-in-one system. First, Figure 4(a) shows an example screenshot of the detection phase. To confirm that all four cases of packing detection work correctly, we continue executing the remaining cases even if packing has already been detected by a previous case. In Figure 4(a), we first examine DP-NEP of Algorithm 1. We detect the EP section in DP-NEP, so we proceed to DP-SIG (the first two lines). By the DP-SIG test, we detect the UPX packer, so we conclude that the file is packed (the next two lines). Even though detection is completed, we proceed to the DP-ENT and DP-WR tests. The entropy value is 7.93, which is contained in Packing-Range, and the attribute value of the EP section is "0xe0000040", which includes the WRITE attribute. Thus, we conclude that the file is also packed according to the DP-ENT and DP-WR tests (the last three lines). According to Figure 4(a), we can confirm that the DP-NEP, DP-SIG, DP-ENT, and DP-WR tests of the detection phase all work correctly.
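To make the WRITE attribute test concrete, the following sketch decodes a section's Characteristics value using the standard PE section flags. The constant values are those defined by the PE/COFF format; the function names are hypothetical.

```python
# Section characteristic flags from the PE/COFF format.
IMAGE_SCN_CNT_CODE = 0x00000020
IMAGE_SCN_CNT_INITIALIZED_DATA = 0x00000040
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000

FLAG_NAMES = [
    (IMAGE_SCN_CNT_CODE, "CODE"),
    (IMAGE_SCN_CNT_INITIALIZED_DATA, "INITIALIZED_DATA"),
    (IMAGE_SCN_MEM_EXECUTE, "EXECUTE"),
    (IMAGE_SCN_MEM_READ, "READ"),
    (IMAGE_SCN_MEM_WRITE, "WRITE"),
]


def has_write_attribute(characteristics: int) -> bool:
    """DP-WR style check: is the section writable?"""
    return bool(characteristics & IMAGE_SCN_MEM_WRITE)


def describe(characteristics: int) -> list:
    """Names of the flags (from the subset above) that are set."""
    return [name for bit, name in FLAG_NAMES if characteristics & bit]
```

For the value 0xe0000040 reported above, `describe` returns INITIALIZED_DATA, EXECUTE, READ, and WRITE, whereas a typical code section value such as 0x60000020 (CODE, EXECUTE, READ) has no WRITE flag.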

Figures 4(b) and 4(c) show example screenshots of the unpacking phase. If an unpacking library exists for the packer identified in the detection phase, we perform static unpacking as shown in Figure 4(b); if not, we perform dynamic unpacking as shown in Figure 4(c). Figure 4(b) shows the result of static unpacking by the UPX tool [33] for the UPX-packed file. In the figure, the file size increases from 285,760 bytes to 731,200 bytes by unpacking, which means that the compression ratio is 39.08% (285,760/731,200). Figure 4(c) shows the result of the entropy-based dynamic unpacking. In the figure, the system calculates the entropy of each section whenever the IP encounters a JMP or CJMP instruction; it returns the current DstAddr as the OEP if the entropy change is stable and DstAddr is in a different section.

Figure 4(d) shows an example screenshot of the verification phase. In this example, we measure the restoration accuracy of the restored file for the .text and .data sections of the original file. The length of the original .text section is 18,439, and the length of the unpacked file is 872,555. In the hash value comparison, since the portion corresponding to the original .text section exists in the unpacked file, the restoration accuracy becomes 100%. Similarly, we also calculate the restoration accuracy of the .data section as 100%.

5.3. Experimental Results on Detection Phase. In this section, we describe two experimental results of the detection phase. The first experiment is to set Packing-Range, and the second is to measure the detection accuracy of the proposed detection algorithm. In the first experiment, we experimentally determine the Packing-Range used in DP-ENT of Algorithm 1. The previous study by Lyda and Hamrock [17] uses the entropy value of the whole file and sets the entropy range of the packed file to [6.677, 6.926]. On the contrary, Han and Lee [16] focus on the EP section only and compute the entropy value for the EP section instead of the whole file. Through the experiment, they set the entropy threshold to 6.85 and judge a file to be packed if its entropy is over 6.85.

We also determine the entropy range Packing-Range through experiments on various packers. Figure 5 shows the entropy distribution of the EP section for different packers. Figure 5(a) shows the average distribution of the original and 19 packed files. Figures 5(b)–5(m) show the more detailed entropy distribution for each packer. According to Figures 5(b)–5(m), the entropy distribution of the packed file differs depending on the packer type. In particular, as shown in Figures 5(d), 5(e), and 5(k), some packed files have smaller entropy values than the original file, and thus we cannot simply set an upper threshold value as the packing criterion as in the previous study [16]. However, we note that while the entropy of packed files is distributed over a wide range, as shown in Figures 5(c)–5(m), the entropy of original files is distributed only in a specific range, as shown in Figure 5(b). More specifically, the entropy value of the normal file is between 6.00 and 6.85, as shown in Figure 5(b). Based on these experimental results, we thus set Packing-Range to [0, 6.00) or [6.85, ∞] for the EP section. According to [9, 29], the entropy method has the disadvantages of low accuracy and easy evasion. To overcome this point, we use three other detection techniques, DP-NEP, DP-SIG, and DP-WR, together with DP-ENT in a hybrid manner. Using this hybrid approach, we can further improve the detection accuracy.
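The Packing-Range test itself reduces to a two-sided comparison, sketched below. The function name is hypothetical, and the bounds are the ones stated above (entropy of normal EP sections between 6.00 and 6.85).

```python
def in_packing_range(ep_entropy: float, low: float = 6.00, high: float = 6.85) -> bool:
    """True if the EP-section entropy falls in [0, low) or [high, infinity)."""
    return ep_entropy < low or ep_entropy >= high
```

Unlike a single upper threshold, this test also flags packed files whose EP-section entropy is lower than that of normal files, for example `in_packing_range(5.2)` is true while `in_packing_range(6.5)` is false.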

In the second experiment, we evaluate the detection accuracy and the false-negative and false-positive rates. Table 5 compares the three detection techniques of the detection phase with the proposed hybrid approach. Since EP sections


Figure 4: Operation screenshots of the detection, unpacking, and verification phases. (a) Detection phase. (b) Unpacking phase (static unpacking). (c) Unpacking phase (dynamic unpacking). (d) Verification phase.

Figure 5: Entropy distributions of the EP section for original and representative packed files (x-axis: entropy of the EP section, from 0 to 8; y-axis: number of files). (a) Average of original and 19 packers (series: not packed (b); summation of (c) to (m)). (b) Original (not packed). (c) UPX. (d) ASPack. (e) NSPack. (f) MPRESS. (g) Yoda's Protector. (h) RLPack. (i) BeroEXE. (j) MEW. (k) PACKMAN. (l) WinUpack. (m) exe32pack.


are mandatory in all PE files, we can detect EP sections unless they are intentionally concealed. All files of the dataset used in the experiment have EP sections, so the detection rate by DP-NEP is trivially 0%. In other words, no packing is detected due to the absence of the EP section, so we exclude the experimental result of DP-NEP from Table 5.

We now explain the results of Table 5 column by column. First, we explain the DP-SIG column. In the case of DP-SIG, Not-Packed shows a detection rate of 0% since no signature exists in normal files. That is, no packing is detected in normal files, which means that DP-SIG performs accurate detection for Not-Packed files. On the other hand, for the packed files, DP-SIG detects 100% of the files for every packer except NSPack. The detection rate of NSPack is merely 5.38% because, by the nature of NSPack, the signatures often exist in sections other than the EP section. In summary, DP-SIG shows a detection accuracy of 95.0% on average.

Second, for the experiment on the detection rate of DP-WR, we intentionally remove the WRITE attribute from 39 files (30% of the packed files) per packer. We perform this removal process for 18 packers, that is, all packers except exe32pack. The reason for changing the dataset is that normally packed files usually have the WRITE attribute, so we arbitrarily remove the WRITE attribute from 30% of the files for the DP-WR experiment. Looking at the DP-WR column in Table 5, DP-WR detects only 70% of the files per packer for the 18 packers since we intentionally remove the WRITE attribute from 30% of the files. In the case of exe32pack, however, we need not perform the removal process since it has no WRITE attribute in the PE header. Accordingly, as shown in the DP-WR column of Table 5, the detection rate of exe32pack by DP-WR is trivially 0%. In the case of Not-Packed, the detection rate is 0% since normal files have no WRITE attribute. In summary, checking the presence of the WRITE attribute is very simple, and DP-WR works correctly if the packed file has the WRITE attribute, but it does not work well if the WRITE attribute does not exist or has been maliciously modified.

Third, the DP-ENT column shows the entropy-based detection rates. As shown in the table, DP-ENT achieves a 100% detection rate for 9 packers, including NSPack, UPX, MEW, and BeRoEXEPacker, but it incurs some false negatives for 10 packers, including ASPack, Yoda's Protector, Packman, and MPRESS, for which the detection rate is lower than 100%. In particular, for MPRESS, Petite, JDpack, eXpressor, and Neolite, it shows a low detection rate of at most 24.6% because the entropy values characteristic of these packers fall within the entropy distribution of normal files. DP-ENT also has a false-positive rate of 3.84% for Not-Packed. In summary, DP-ENT works very accurately for most packers, but it may incur false negatives for a few specific packers or false positives for some normal files. To overcome this limitation, we use DP-ENT together with other detection techniques. We can see the improvement in the experimental results of Tables 5 and 6. In the tables, when using only DP-ENT, the detection accuracy is low for some packers (especially MPRESS), but when using it together with the other three techniques, the accuracy is close to 100%.

Fourth, the proposed hybrid approach integrating four techniques detects 100% of the files for all packers except NSPack and shows no false positives. The reason for this high detection rate is that the proposed method has all the advantages of DP-SIG, DP-WR, and DP-ENT as well as DP-NEP. In the case of NSPack, however, the detection rate is merely 70.0%; this is because, as mentioned above, we forcibly remove the WRITE attribute from 30% of the files. According to the results of Table 5, the detection rate differs according to each detection technique, and some techniques

Table 5: Comparison of packing detection rates of three techniques and the proposed hybrid approach.

Packer              DP-SIG (%)   DP-WR (%)   DP-ENT (%)   Hybrid approach (%)
ASPack              100          70.0        91.5         100
NSPack              5.38         70.0        100          70.0
MPRESS              100          70.0        24.6         100
UPX                 100          70.0        100          100
Yoda's Protector    100          70.0        96.9         100
MEW                 100          70.0        100          100
BeRoEXEPacker       100          70.0        100          100
Packman             100          70.0        93.8         100
RLPack              100          70.0        99.3         100
PECompact           100          70.0        100          100
Petite              100          70.0        0.00         100
JDpack              100          70.0        5.62         100
Molebox             100          70.0        100          100
eXpressor           100          70.0        0.00         100
Yoda's Crypter      100          70.0        91.5         100
FSG                 100          70.0        100          100
exe32pack           100          0.00        100          100
WinUpack            100          70.0        100          100
Neolite             100          70.0        1.61         100
Average             95.0         66.3        73.9         98.4
Not-Packed          0.00         0.00        3.84         0.00
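As a quick cross-check, the Average row of Table 5 is the arithmetic mean of the 19 per-packer rates in each column. The sketch below recomputes it; the dictionary simply restates the per-packer values of Table 5 (DP-SIG, DP-WR, DP-ENT, hybrid), and the function name is hypothetical.

```python
# Per-packer detection rates (%) from Table 5: (DP-SIG, DP-WR, DP-ENT, hybrid).
TABLE5 = {
    "ASPack": (100, 70.0, 91.5, 100),
    "NSPack": (5.38, 70.0, 100, 70.0),
    "MPRESS": (100, 70.0, 24.6, 100),
    "UPX": (100, 70.0, 100, 100),
    "Yoda's Protector": (100, 70.0, 96.9, 100),
    "MEW": (100, 70.0, 100, 100),
    "BeRoEXEPacker": (100, 70.0, 100, 100),
    "Packman": (100, 70.0, 93.8, 100),
    "RLPack": (100, 70.0, 99.3, 100),
    "PECompact": (100, 70.0, 100, 100),
    "Petite": (100, 70.0, 0.00, 100),
    "JDpack": (100, 70.0, 5.62, 100),
    "Molebox": (100, 70.0, 100, 100),
    "eXpressor": (100, 70.0, 0.00, 100),
    "Yoda's Crypter": (100, 70.0, 91.5, 100),
    "FSG": (100, 70.0, 100, 100),
    "exe32pack": (100, 0.00, 100, 100),
    "WinUpack": (100, 70.0, 100, 100),
    "Neolite": (100, 70.0, 1.61, 100),
}


def column_averages(table: dict) -> tuple:
    """Mean of each of the four columns, rounded to one decimal place."""
    n = len(table)
    return tuple(round(sum(row[k] for row in table.values()) / n, 1)
                 for k in range(4))
```

Calling `column_averages(TABLE5)` gives 95.0, 66.3, 73.9, and 98.4, which agrees with the Average row.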


incur false positives or false negatives. On the other hand, the proposed hybrid approach shows the highest detection rate of 98.4% and no false positives.

Table 6 shows the detection accuracy of combinations of two or three detection techniques and of the proposed hybrid approach. First, DP-SIG ENT, a combination of the signature and entropy tests, detects 100% of the packed files but detects 18 files incorrectly, showing a false-positive rate of 0.73%. Second, DP-ENT WR, a combination of the entropy and WRITE tests, shows a low detection rate of 92.0% because the entropy value of MPRESS lies in the distribution of normal files. Third, DP-SIG ENT WR, a combination of the signature, entropy, and WRITE tests, and our hybrid approach both show 98.4% detection accuracy and a 1.60% false-negative rate. This is because DP-SIG ENT WR and the hybrid approach go through the same detection procedure except for DP-NEP. The false-negative rate of 1.60% is due to the files having no WRITE attribute. According to the results of Tables 5 and 6, we believe that the proposed hybrid approach achieves high detection accuracy by combining the advantages of the existing and new detection techniques.

5.4. Experimental Results on Unpacking and Verification Phases. In this section, we verify the unpacking and verification phases by performing the unpacking of Algorithm 2 and calculating the restoration accuracy of the restored files with Algorithm 3. However, the unpacking and verification phases take a lot of time, so we experiment with 40 randomly selected files of the dataset. The unpacking phase consists of static unpacking and dynamic unpacking. In the experiment, we exclude static unpacking since it uses the existing library as it is, and we focus only on the dynamic unpacking of Algorithm 2 since it tries to restore the original file without any information about packers.

Table 7 shows the restoration accuracy of the proposed method for the .text and .data sections for each of the 19 packers. We cannot directly unpack Yoda's Protector and PECompact since they use Anti-Debug engineering techniques. To overcome this problem, we unpack these packers by bypassing three Anti-Debugging API calls: GetCurrentProcessID(), BlockInput(), and IsDebuggerPresent() [41]. (That is, we run Algorithm 2 with the bypassing techniques for Yoda's Protector and PECompact and include the experimental results in Table 7.) In general, since only the .text and .data sections of the original file are packed [42], in the experiment we consider only the restoration accuracy of these .text and .data sections of the restored file. In Table 7, among the 19 packers, 11 packers including UPX, NSPack, MEW, RLPack, and BeRoEXEPacker show 100% restoration accuracy for both the .text and .data sections. On the other hand, 8 packers including ASPack, Packman, and MPRESS show less than 100% restoration accuracy because of the manner in which the packers handle the .reloc section. More specifically, if there is a .reloc section in the original file, the unpacking relocates the .text and .data sections, changing all the data values in each section. Therefore, the restoration accuracy depends on whether there is a .reloc section in the original file.

Table 8 shows the restoration results in detail according to the presence of the .reloc section. As shown in the table, the restoration accuracy is 100% in the absence of the .reloc section, while it is much lower, in the range of 0.00% to 99.8%, in the presence of the .reloc section. However, the low restoration accuracy is due to the nature of the packer, and it does not mean that unpacking does not work correctly. In other words, even if the unpacking is successful, the data values may change due to the packer characteristics, and the restoration accuracy is then calculated to be low. In summary, we can use the proposed verification algorithm to calculate the restoration accuracy of unpacked files, which quantitatively measures the accuracy of the actual restoration.

In addition to the .text, .data, and .reloc sections, we further consider the PE header, which includes the IAT information. In general, the IAT is located in the header of the PE file, and if we actually need to run the unpacked file, we must also restore this header information including the IAT. In fact, the studies of [3, 43] restore the header including the IAT and produce the actual whole executable file. On the other hand, most studies, including this paper, do not restore the actual whole executable file itself but restore the .text, .data, and .reloc sections, which are the major parts of the executable file. This is because the purpose of unpacking is not to execute the restored file but to identify and extract malicious code included in the restored file. Therefore, in this paper, we do not need to restore the header required for the actual execution, and accordingly a comparison of IAT information is also not necessary. Instead, we restore and compare the .text, .data, and .reloc sections, which are more likely to hide malicious code. Restoring the PE header is beyond the scope of this study; we leave it as a separate research topic.

6. Conclusions

In this paper, we analyzed recent studies of packing detection and unpacking techniques and derived four important observations: no phase integration, no detection combination, no real-restoration, and no unpacking verification. We then proposed an all-in-one unpacking system to address these four observations. First, no phase integration meant that analysts had to perform the packing detection, unpacking, and verification phases separately, and we solved this problem by presenting an all-in-one

Table 6: Comparison of packing detection accuracy of two or three combinations and of the hybrid approach.

Metrics               DP-SIG ENT (%)   DP-ENT WR (%)   DP-SIG ENT WR (%)   Hybrid approach (%)
Detection accuracy    95.0             92.0            98.4                98.4
False-negative rate   5.00             8.00            1.60                1.60
False-positive rate   0.73             0.00            0.00                0.00


Table 7: Restoration accuracy of .text and .data sections of each packer.

Packer              .text (%)   .data (%)
UPX                 100         100
NSPack              100         100
MEW                 100         100
RLPack              100         100
BeRoEXE             100         100
Neolite             100         100
FSG                 100         100
eXpressor           100         100
Molebox             100         100
Petite              100         100
JDpack              100         100
ASPack              84.3        84.3
Yoda's Protector    84.0        84.0
exe32pack           80.8        100
MPRESS              78.8        75.3
Yoda's Crypter      74.5        74.5
PECompact           74.0        74.0
WinUpack            68.0        68.0
Packman             55.0        48.5
Average             89.2        90.2


unpacking system that integrates these three phases in a single framework. Second, no detection combination meant that there had been no attempt to combine various packing detection methods, and we solved this problem by presenting a hybrid approach that combines existing and new packing detection methods. Third, no real-restoration meant that there had been little discussion about restoring the actual executable file, and we solved this problem by actually restoring an original image of the packed file. Fourth, no unpacking verification meant that there had been no explicit work to measure the restoration accuracy of unpacked files, and we solved this problem by proposing a verification algorithm that quantitatively evaluates the restoration accuracy.

We constructed a dataset of 2600 PE files and experimentally evaluated the proposed system using the dataset. First, we performed all the phases of the all-in-one unpacking system sequentially and confirmed that its overall working mechanism worked well. Next, through the comparison of the hybrid packing detection approach and existing detection techniques, we showed that the detection accuracy of the proposed method was the highest at 98.4% on average, with no false positives. For this, we also determined an entropy range of packing, Packing-Range, through extensive experiments on 19 representative packers. Finally, in the unpacking and verification phases, we confirmed that unpacking was successful for all files except the files packed with Yoda's Protector. We also showed that the restoration accuracy of unpacked files was nearly 100% except for a few packers.

As future research, we will focus on unpacking packed files that use Anti-VM, Anti-Debug, and Anti-Reverse engineering techniques. We will also study how to compute the restoration accuracy of files having .reloc sections.

Data Availability

All the data files used in the experiments are available at https://github.com/chesvectain/PackingData.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was partly supported by the Institute of Information and Communications Technology Planning and Evaluation (IITP) grant funded by the Korea government (MSIT) (no. 2017-0-00158, Development of Cyber Threat Intelligence (CTI) analysis and information sharing technology for national cyber incident response) and the Korea Electric Power Corporation (no. R18XA05).

References

[1] M. Morgenstern, “AV-TEST Security report 2017/18,” 2019, https://www.av-test.org/fileadmin/pdf/security_report/AV-TEST_Security_Report_2017-2018.pdf.

[2] T. Ban, R. Isawa, S. Guo, D. Inoue, and K. Nakao, “Efficient malware packer identification using support vector machines with spectrum kernel,” in Proceedings of the Eighth Asia Joint Conference on Information Security, pp. 69–76, Seoul, South Korea, July 2013.

[3] B. Cheng, M. Jiang, J. Fu et al., “Towards paving the way for large-scale Windows malware analysis: generic binary unpacking with orders-of-magnitude performance boost,” in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pp. 395–411, Toronto, Canada, October 2018.

[4] A. Gupta and M. S. Arya, “Hashing based encryption and anti-debugger support for packing multiple files into single executable,” International Journal of Advanced Research in Computer Science, vol. 9, no. 1, pp. 914–920, 2018.

[5] Y. Kawakoya, M. Iwamura, and M. Itoh, “Memory behavior-based automatic malware unpacking in stealth debugging environment,” in Proceedings of the 5th International Conference on Malicious and Unwanted Software, pp. 39–46, Lorraine, France, October 2010.

[6] G. Liang, J. Pang, Z. Shan, R. Yang, and Y. Chen, “Automatic benchmark generation framework for malware detection,” Security and Communication Networks, vol. 2018, Article ID 4947695, 8 pages, 2018.

[7] A. Pektas and T. Acarman, “A dynamic malware analyzer against virtual machine aware malicious software,” Security and Communication Networks, vol. 7, no. 12, pp. 2245–2257, 2014.

[8] I. Santos, J. Nieves, and P. G. Bringas, “Semi-supervised learning for unknown malware detection,” in International Symposium on Distributed Computing and Artificial Intelligence, A. Abraham, J. M. Corchado, S. R. Gonzalez, and J. F. De Paz Santana, Eds., pp. 415–422, Springer, Berlin, Germany, 2011.

[9] X. Ugarte-Pedrero, I. Santos, B. Sanz, C. Laorden, and P. G. Bringas, “Countering entropy measure attacks on packed software detection,” in Proceedings of the 9th International Conference on Consumer Communications and Networking, pp. 164–168, Las Vegas, NV, USA, January 2012.

[10] H. Won, S.-P. Kim, S. Lee, M.-J. Choi, and Y.-S. Moon, “Secure principal component analysis in multiple distributed nodes,” Security and Communication Networks, vol. 9, no. 14, pp. 2348–2358, 2016.

[11] S. Cesare, Y. Xiang, and W. Zhou, “Malwise–an effective and efficient classification system for packed and polymorphic

Table 8: Restoration accuracy according to the presence of the .reloc section.

Section   ASPack (%)   Packman (%)   MPRESS (%)   Yoda's Crypter (%)   PECompact (%)   Yoda's Protector (%)   exe32pack (%)   WinUpack (%)

× × × × × × × ×

text 000 100 998 100 000 100 000 100 000 100 000 100 000 100 000 100

data 000 100 881 100 000 100 004 100 000 100 005 100 100 100 000 100


malware,” IEEE Transactions on Computers, vol. 62, no. 6, pp. 1193–1206, 2013.

[12] C. Malin, E. Casey, and J. Aquilina, Malware Forensics: Investigating and Analyzing Malicious Code, Syngress, Burlington, MA, USA, 2008.

[13] W. Yan, Z. Zheng, and A. Nirwan, “Revealing packed malware,” IEEE Security and Privacy, vol. 6, no. 5, pp. 65–69, 2008.

[14] M. Bat-Erdene, T. Kim, H. Park, and H. Lee, “Packer detection for multi-layer executables using entropy analysis,” Entropy, vol. 19, no. 3, p. 125, 2017.

[15] Y.-S. Choi, I.-K. Kim, J.-T. Oh, and J.-C. Ryou, “PE file header analysis-based packed PE file detection technique (PHAD),” in Proceedings of International Symposium on Computer Science and Its Applications, pp. 28–31, Hobart, Australia, October 2008.

[16] S. Han and S. Lee, “Packed PE file detection for malware forensics,” The KIPS Transactions: Part C, vol. 16C, no. 5, pp. 555–562, 2009, in Korean.

[17] R. Lyda and J. Hamrock, “Using entropy analysis to find encrypted and packed malware,” IEEE Security and Privacy, vol. 5, no. 2, pp. 40–45, 2007.

[18] R. Perdisci, A. Lanzi, and W. Lee, “Classification of packed executables for accurate computer virus detection,” Pattern Recognition Letters, vol. 29, no. 14, pp. 1941–1946, 2008.

[19] A. Rohit, A. Singh, H. Pareek, and U. Edara, “A heuristics-based static analysis approach for detecting packed PE binaries,” International Journal of Security and Its Applications, vol. 7, no. 5, pp. 257–268, 2013.

[20] G. Amato, “PEframe,” 2019, https://github.com/guelfoweb/peframe.

[21] G. Jeong, E. Choo, J. Lee, M. Bat-Erdene, and H. Lee, “Generic unpacking using entropy analysis,” Journal of Advanced Information Technology and Convergence, vol. 7, no. 1, pp. 232–238, 2009.

[22] V. DeMarines, “Obfuscation–how to do it and how to crack it,” Network Security, vol. 2008, no. 7, pp. 4–7, 2008.

[23] H. Neil, “PEiD,” 2019, https://github.com/wolfram77web/app-peid.

[24] O. Yuschuk, “OllyDbg,” 2019, http://www.ollydbg.de/download.html.

[25] N. Waisman, “Immunity Debugger,” 2019, https://www.immunityinc.com/products/debugger/index.html.

[26] P. Royal, M. Halpin, D. Dagon, R. Edmonds, and W. Lee, “PolyUnpack: Automating the hidden-code extraction of unpack-executing malware,” in Proceedings of the 22nd Annual Computer Security Applications Conference, pp. 289–300, Miami Beach, FL, USA, December 2006.

[27] M. Bat-Erdene, Y. Park, H. Li, H. Lee, and M.-S. Choi, “Entropy analysis to classify unknown packing algorithms for malware detection,” International Journal of Information Security, vol. 16, no. 3, pp. 227–248, 2017.

[28] S. Cesare and X. Yang, “Classification of malware using structured control flow,” in Proceedings of the 8th Australasian Symposium on Parallel and Distributed Computing, pp. 61–70, Brisbane, Australia, January 2010.

[29] L. Martignoni, M. Christodorescu, and S. Jha, “OmniUnpack: fast, generic, and safe unpacking of malware,” in Proceedings of the 23rd Annual Computer Security Applications Conference, pp. 431–441, Miami Beach, FL, USA, December 2007.

[30] M. G. Kang, P. Poosankam, and H. Yin, “Renovo: a hidden code extractor for packed executables,” in Proceedings of the 5th ACM Workshop on Recurring Malcode, pp. 46–53, Alexandria, VA, USA, November 2007.

[31] S. Mariani, L. Fontana, F. Gritti, and S. D'Alessio, “PinDemonium: a DBI-based generic unpacker for windows executables,” in Proceedings of the Black Hat 2016, Las Vegas, NV, USA, July 2016.

[32] G. Bonfante, J. Fernandez, J.-Y. Marion, B. Rouxel, F. Sabatier, and A. Thierry, “CoDisasm: medium scale concatic disassembly of self-modifying binaries with overlapping instructions,” in Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 745–756, Denver, CO, USA, October 2016.

[33] M. Oberhumer, L. Molnar, and J. Reiser, “UPX,” 2019, https://upx.github.io.

[34] A. Swinnen and A. Mesbahi, “One packer to rule them all: empirical identification, comparison, and circumvention of current antivirus detection techniques,” in Proceedings of the Black Hat 2014, Las Vegas, NV, USA, August 2014.

[35] P. Bania, “Generic unpacking of self-modifying, aggressive, packed binary programs,” 2009, https://arxiv.org/abs/0905.4581.

[36] H. Neil, “BobSoft Signature,” 2019, http://woodmann.com/BobSoft/Files/Other/UserDB.zip.

[37] M. Russinovich, “Microsoft Sysinternals Suite,” 2019, https://download.sysinternals.com/files/SysinternalsSuite.zip.

[38] R. Rivest, “The MD5 message-digest algorithm,” RFC 1321, IETF, Fremont, CA, USA, 1992.

[39] S. Tatham, “Putty,” 2019, https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html.

[40] L. Pascoe, “MD5,” 2019, http://www.md5summer.org/about.html.

[41] P. Ferrie, “Anti-unpacker tricks - part one,” 2019, https://www.virusbulletin.com/virusbulletin/2008/12/anti-unpacker-tricks-part-one.

[42] T.-Y. Wang and C.-H. Wu, “Detection of packed executables using support vector machines,” in Proceedings of the 10th International Conference on Machine Learning and Cybernetics, pp. 717–722, Guilin, China, July 2011.

[43] D. Korczynski, “RePEconstruct: reconstructing binaries with self-modifying code and Import address table destruction,” in Proceedings of the 11th International Conference on Malicious and Unwanted Software, pp. 31–38, Fajardo, Puerto Rico, October 2016.



dynamic unpacking. We can easily perform static unpacking by using the unpacking library found for each packer, so in this section we focus on explaining in detail how dynamic unpacking works. Our dynamic unpacking method is also known as the “execute-after-write” method. More precisely, the proposed method first restores the original file in memory by finding the OEP of the packed file and then analyzes (i.e., executes) the restored original file in memory. To avoid infecting real machines, we perform the unpacking and validation phases in debugging mode.

The OEP is the address of the first instruction executed after the packed file is unpacked, and it represents the entry point of the original program. In other words, since the code following the OEP is the starting code of the original program, finding the OEP is the most important issue in dynamic unpacking. To find the OEP, we use the entropy change-based analysis proposed by Bat-Erdene et al. [27]. When a packed file is executed, the packed data is uncompressed and written to the memory, and at this time, data might be changed or added to the file. Because of this data change, the entropy value continuously changes during the unpacking process. If unpacking is successful, the entropy value is stabilized, the original file is restored, and the IP (instruction pointer) is moved to another section of the restored original file for its execution. Thus, we can find the OEP by examining whether the IP moves to another section when there is no change in the entropy value of each section. (If the heuristic of investigating the change of IP sections is known, someone may create a malicious packing method that evades it. For this evasion, however, the OEP must be reached without a change of IP sections, and packing a PE file in such a way might not be easy. In our actual analysis, we did not find such cases in the 19 packers we covered. Even though such a case hardly exists, it might occur in the worst case. We leave an advanced approach that works against such evasion as further study.) To accomplish this, we need to execute the target process instruction by instruction and monitor the entropy change; however, measuring the entropy value after every instruction is very time-consuming and inefficient. We note here that the OEP generally exists after a branch instruction, so we calculate the entropy value only when the instruction is JMP, Conditional JMP (CJMP), or RETN. We also note that the iterations during program execution are independent of the OEP, so we store and reuse the entropy values measured after JMP, CJMP, and RETN instructions to avoid duplicated calculations caused by the iterations and reduce the total analysis time.
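As an illustration of the entropy measurement discussed above, the following is a minimal Python sketch of the standard Shannon entropy of a byte sequence (in bits per byte, so values range from 0 to 8). The `BranchEntropyCache` helper and its method names are hypothetical; they only illustrate the idea of reusing values already measured at a branch address and are not taken from the authors' implementation.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


class BranchEntropyCache:
    """Hypothetical helper: measure section entropies once per branch address."""

    def __init__(self):
        self._seen = {}

    def measure(self, branch_addr, sections):
        # sections: dict mapping a section name to its current bytes in memory.
        # A branch address seen before (a loop iteration) reuses the stored values.
        if branch_addr not in self._seen:
            self._seen[branch_addr] = {
                name: shannon_entropy(blob) for name, blob in sections.items()
            }
        return self._seen[branch_addr]
```

A section filled with one repeated byte has entropy 0, while a section in which all 256 byte values are equally frequent has entropy 8, which is why compressed or encrypted data tends toward the upper end of the scale.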

Algorithm 2 shows the proposed dynamic unpacking algorithm based on entropy changes; it returns the OEP address for a given PE file. First, it computes the initial entropy of each section (Line 1). Then, it repeats the following steps until the process ends (Lines 2 to 18). If the current IP address is greater than the last address of the current section, it returns Not-Found (Line 4). This is because the IP has moved to the next section without a JMP family instruction, so the OEP cannot be found in this section. Otherwise, if the instruction at the IP is one of JMP, CJMP, or RETN, it executes the file up to the IP and stores the address of the next instruction in DstAddr (Lines 5 to 7). If the IP exists in IP-History or DstAddr exists in DstAddr-History, the instruction or address belongs to an iteration, so we move to the next IP (Line 8). Otherwise, it stores the IP and DstAddr in each history for the next iteration (Line 9). At this time, if DstAddr is out of the range of the file, it returns Not-Found because it cannot find the OEP within the file (Line 10). If DstAddr is in the normal range, it calculates the entropy of each section again and compares the current entropy with the previous entropy to check whether it is stable (Lines 11 to 13). If the entropy is stable and DstAddr is in a different section, it means that unpacking the file is successful and the IP is moved to another section by a JMP family instruction. Thus, it

[Code-Lock vxx]

signature = 47 8B C2 05 1E 00 52 8B D0 B8 02 3D CD 21 8B D8 5A

ep_only = true

[CodeCrypt v014b]

signature = E9 C5 02 00 00 EB 02 83 3D 58 EB 02 FF 1D 5B EB 02 0F C7 5F

ep_only = true

[CodeCrypt v015b]

signature = E9 31 03 00 00 EB 02 83 3D 58 EB 02 FF 1D 5B EB 02 0F C7 5F

ep_only = true

Figure 2: Examples of packer signatures stored in the signature database.

Input: A PE file
Output: Packed or Not-Packed // Packing detection result
begin
(1) if No EP Section then return Packed; // DP-NEP
(2) else if Packer Signature Found then return Packed; // DP-SIG
(3) else if “WRITE” enabled and EP Section Entropy ∈ Packing-Range then
(4)     return Packed; // DP-WR, DP-ENT
(5) else return Not-Packed;
(6) end-if
end

ALGORITHM 1: Packing detection.


stops the execution and returns DstAddr as the OEP (Lines 13 and 14).
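The control flow of Algorithm 2 can be sketched offline over a recorded trace of branch events, without a debugger. The sketch below is only an illustration under stated assumptions: `BranchEvent`, `section_of`, `find_oep`, the section layout dictionary, and the stability threshold `eps` are hypothetical names and values, not the authors' Immunity Debugger scripts. The check on Line 4 of Algorithm 2 (the IP running past the end of its section) is omitted because the trace contains branch instructions only.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass
class BranchEvent:
    ip: int                      # address of the JMP/CJMP/RETN instruction
    dst: int                     # address of the instruction executed next
    entropies: Dict[str, float]  # section name -> entropy measured at this branch


def section_of(addr: int, layout: Dict[str, Tuple[int, int]]) -> Optional[str]:
    """Return the name of the section containing addr, or None if outside the file."""
    for name, (start, end) in layout.items():
        if start <= addr < end:
            return name
    return None


def find_oep(events: Iterable[BranchEvent],
             layout: Dict[str, Tuple[int, int]],
             initial: Dict[str, float],
             eps: float = 1e-3) -> Optional[int]:
    """Return the first branch target reached in another section after the
    section entropies stopped changing; None corresponds to Not-Found."""
    seen_ip, seen_dst = set(), set()
    previous = initial
    for ev in events:
        if ev.ip in seen_ip or ev.dst in seen_dst:
            continue  # loop iteration: already examined, skip it
        seen_ip.add(ev.ip)
        seen_dst.add(ev.dst)
        dst_section = section_of(ev.dst, layout)
        if dst_section is None:
            return None  # the target is outside the file
        stable = all(abs(ev.entropies[s] - previous[s]) <= eps for s in layout)
        previous = ev.entropies
        if stable and dst_section != section_of(ev.ip, layout):
            return ev.dst
    return None
```

In this sketch, "stable" means that no section entropy moved by more than `eps` since the previous examined branch; the paper does not state a numeric threshold, so this value is an assumption.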

4.3. Verification Phase. In this section, we present an algorithm for measuring the restoration accuracy by comparing the restored file with its original file. To execute the packed file, the proposed system uses dynamic unpacking and finds the OEP during its execution. If the system finds the OEP, we assume that the restoration of the original file is successful and regard that file as the restored file. In the verification phase, we quantitatively measure the restoration accuracy by comparing the restored file with its original file.

Figure 3 shows an example of comparing some bytecodes of the original file with those of the restored file. For the comparison, we first pack the pspasswd.exe file provided by Microsoft [37] and then unpack it again by dynamic unpacking. The figure shows that the bytecodes of the four sections of the original file, .text, .rdata, .data, and .rsrc, appear identically in the UPX0, UPX1, and .rsrc sections of the restored file. That is, even if the name and location of each section are different, the same bytecodes exist in the restored file. However, the restored file contains padding and garbage values, and some bytecodes may be at locations different from those in the original file. Thus, in order to compare the original and the restored files, we need to search for where the original bytecodes are located in the restored file. Since manually comparing the bytecodes of the original and restored files takes a very long time, in this paper we propose an efficient search algorithm using a hash function. We use the well-known MD5 [38] as the hash function of the search algorithm. As a message-digest algorithm, MD5 is widely used for checking the integrity of long data.

Algorithm 3 shows the restoration accuracy calculation algorithm. Its basic principle is to check whether each section of the original file exists in the restored file. Here, we use the MD5 hash function to speed up the scan. The inputs to the algorithm are an original file A and a restored file B, and the output is restorationAccuracy, a list storing the restoration accuracy values of all sections. First, we remove garbage and padding values from the restored file (Line 1). For each section Ai of A, we obtain the length curLen of the section and calculate a hash value H1 by applying the MD5 hash function (Lines 4 to 7). We then calculate another hash value H2 from the restored file B by considering only a part of length curLen (Line 9). If H1 equals H2 (Line 10), it means that B contains Ai, so we stop the comparison and calculate the restoration accuracy of the section (Line 14). However, if H1 and H2 are not equal, we perform the hash value comparison again after decrementing the length curLen of Ai (Lines 6 to 13). We repeat this process to calculate the restoration accuracy for the section Ai (Line 14). After completing the accuracy calculation for all sections, the algorithm returns the list restorationAccuracy (Line 16).
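The procedure described above can be sketched in Python with the standard `hashlib` module. This is a minimal illustration, not the authors' code: the function names are hypothetical, the garbage and padding removal of Line 1 is assumed to have been done already, and the prefix of a section is shortened only after a full scan of the restored file fails, which is how the prose describes the loop. An empty section is assigned 0.0 here purely to avoid a division by zero.

```python
import hashlib
from typing import Dict


def md5_digest(data: bytes) -> bytes:
    """Return the MD5 digest of data."""
    return hashlib.md5(data).digest()


def section_accuracy(section: bytes, restored: bytes) -> float:
    """Percentage of the section, measured as its longest prefix, found in restored."""
    total_len = len(section)
    if total_len == 0:
        return 0.0  # assumption: nothing to compare for an empty section
    cur_len = total_len
    while cur_len > 0:
        h1 = md5_digest(section[:cur_len])
        # Slide a window of cur_len bytes over the restored file.
        for j in range(len(restored) - cur_len + 1):
            if md5_digest(restored[j:j + cur_len]) == h1:
                return cur_len / total_len * 100.0
        cur_len -= 1  # no window matched: retry with a shorter prefix
    return 0.0


def restoration_accuracy(sections: Dict[str, bytes], restored: bytes) -> Dict[str, float]:
    """Map each section name of the original file to its restoration accuracy."""
    return {name: section_accuracy(blob, restored) for name, blob in sections.items()}
```

This naive scan rehashes every window, so it is only practical for small inputs; it is meant to show the comparison logic rather than an efficient search.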

5. Performance Evaluation

5.1. Experimental Environment. In this section, we present the results of the experimental evaluation to show the superiority of the all-in-one unpacking system. We conduct the experiments on a virtual server as well as a physical server. The reasons for experimenting in two server environments are as follows. First, unpacking results may differ between the virtual server and the physical server if a packed file has anti-unpacking features such as Anti-VM or Anti-Debug. Second, the dynamic unpacking of malware

Input: A PE file
Output: OEP // the start address of the unpacked original file
begin
(1) Measure the entropy value of each section;
(2) while the process has not reached the end do
(3)     IP ⟵ the current instruction pointer;
(4)     if IP's address > current section's last address then return Not-Found;
(5)     else if IP is JMP or CJMP or RETN then
(6)         Execute instructions until IP;
(7)         DstAddr ⟵ the next instruction address;
(8)         if IP ∉ IP-History and DstAddr ∉ DstAddr-History then
(9)             Store IP and DstAddr into each history;
(10)            if DstAddr is out of the file then return Not-Found;
(11)            else
(12)                Re-calculate the entropy value of each section;
(13)                if Entropy change is stable and DstAddr is in a different section then
(14)                    return DstAddr;
(15)            end-if
(16)        end-if
(17)    end-if
(18) end-while
(19) return Not-Found;
end

ALGORITHM 2: Dynamic unpacking algorithm.


files may infect the physical server. Thus, we perform unpacking first on the virtual server and then perform the experiment on the physical server only when there is no abnormality. The hardware specifications of the experimental platform are as follows. The virtual server is an Intel Core i7-6700 3.40 GHz with 8 GB RAM, and the physical server is an Intel Core i7-6700 3.40 GHz with 32 GB RAM. The target operating system is 32-bit Windows 7, and we use PEVIEW to view the structure and content of 32-bit PE files. As the debugger, we use Immunity Debugger 1.85 [25] to implement the dynamic unpacking phase of Algorithm 2. To implement the all-in-one unpacking system including the

Figure 3: Comparison of sample bytecodes of the original and restored files. (a) Sections of the original file (.text, .rdata, .data, and .rsrc). (b) Sections of the restored file (UPX0, UPX1, and .rsrc).

Input: A: an original file
       B: a restored file
Output: restorationAccuracy[1..n]: a list of restoration accuracy values for n sections
begin
(1) Remove garbage and padding values from B;
(2) A ⟵ {A1, A2, ..., An}, where Ai is the i-th section of A;
(3) for each section Ai ∈ A do
(4)     totalLen ⟵ Ai.length;
(5)     curLen ⟵ totalLen;
(6)     while curLen > 0 do
(7)         H1 ⟵ hash(Ai[1 : curLen]); // hash a part of section Ai
(8)         for j ⟵ 1 to (B.length − curLen + 1) do
(9)             H2 ⟵ hash(B[j : j + curLen − 1]); // hash a part of B
(10)            if H1 = H2 then break;
(11)            else curLen ⟵ curLen − 1;
(12)        end-for
(13)    end-while
(14)    restorationAccuracy[i] ⟵ (curLen/totalLen) × 100;
(15) end-for
(16) return restorationAccuracy;
end

ALGORITHM 3: Restoration rate calculation algorithm.


dynamic unpacking algorithm, we run Python scripts on top of Immunity Debugger.

In the experiment, we use a total of 130 PE files, where 90 files are provided by Microsoft [37] and 40 files are randomly collected from the putty [39] and MD5 [40] sites. We pack each of the 130 executable files with each of 19 packers and construct a dataset consisting of a total of 2600 files: 2470 packed files plus the 130 original files. The 19 packers used are UPX, ASPack, NSPack, MPRESS, Yoda's Protector, MEW, Packman, RLPack, BeRoEXEPacker, PECompact, Petite, JDpack, Molebox, eXpressor, Yoda's Crypter, FSG, exe32pack, WinUpack, and Neolite. The reason for constructing the dataset by packing normal PE files ourselves is that we can quantitatively measure the packing detection accuracy, the false-positive rate, and the false-negative rate only if we know in advance which experimental files are packed.

In this paper, we conduct three experiments. The first experiment concerns the integration feature of the all-in-one system; it verifies that the system supports all three phases of packing detection, unpacking, and verification in an integrated framework. The second experiment concerns the detection phase, where we first set Packing-Range, an entropy range for judging the packing, and then evaluate the packing detection accuracy using the range. The third experiment concerns the unpacking and verification phases, where we verify whether the system unpacks the PE file correctly by calculating the restoration accuracy.

5.2. Experimental Results on Overall Working Mechanism. In this section, we confirm through experiments that the integration mechanism of the all-in-one unpacking system works correctly. Figure 4 shows the operation screenshot of each phase of the all-in-one system. First, Figure 4(a) shows an example screenshot of the detection phase. To confirm that all four cases of packing detection work correctly, we continue executing the remaining cases even if an earlier case has already detected the packing. In Figure 4(a), we first examine DP-NEP of Algorithm 1. We detect the EP section in DP-NEP, so we proceed to DP-SIG (the first two lines). By the DP-SIG test, we detect the UPX packer, so we conclude that the file is packed (the second two lines). Even though detection is completed, we proceed to the next DP-ENT and DP-WR tests. The entropy value is 7.93, which is contained in Packing-Range, and the attribute value of the EP section is “0xe0000040”, which includes the WRITE attribute. Thus, we conclude that the file is also packed by the DP-ENT and DP-WR tests (the third three lines). According to Figure 4(a), we can confirm that the DP-NEP, DP-SIG, DP-ENT, and DP-WR tests of the detection phase all work correctly.

Figures 4(b) and 4(c) show example screenshots of the unpacking phase. If an unpacking library exists for the packer detected in the detection phase, we perform static unpacking as shown in Figure 4(b); if not, we perform dynamic unpacking as shown in Figure 4(c). Figure 4(b) shows the result of static unpacking by the UPX tool [33] for the UPX packed file. In the figure, the file size increases from 285,760 bytes to 731,200 bytes by unpacking, which means that the compression ratio is 39.08%. Figure 4(c) shows the result of the entropy-based dynamic unpacking. In the figure, the system calculates the entropy of each section whenever the IP encounters JMP and CJMP instructions; it returns the current DstAddr as the OEP if the entropy change is stable and DstAddr is in a different section.
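The compression ratio quoted for Figure 4(b) is the packed size expressed as a percentage of the unpacked size. The short Python sketch below reproduces that arithmetic; the function name is hypothetical and is used only for illustration.

```python
def compression_ratio(packed_size: int, unpacked_size: int) -> float:
    """Return the packed size as a percentage of the unpacked size."""
    return packed_size / unpacked_size * 100.0


# File sizes reported for the UPX example in Figure 4(b).
ratio = compression_ratio(285_760, 731_200)
print(f"{ratio:.2f}%")  # prints 39.08%
```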

Figure 4(d) shows an example screenshot of the verification phase. In this example, we measure the restoration accuracy of the restored file for the .text and .data sections of the original file. The length of the original .text section is 18,439, and the length of the unpacked file is 872,555. In the hash value comparison, since the portion corresponding to the original .text section exists in the unpacked file, the restoration accuracy becomes 100%. Similarly, we also calculate the restoration accuracy of the .data section as 100%.

5.3. Experimental Results on Detection Phase. In this section, we describe two experimental results of the detection phase. The first experiment is to set Packing-Range, and the second is to measure the detection accuracy of the proposed detection algorithm. In the first experiment, we experimentally determine Packing-Range used in DP-ENT of Algorithm 1. The previous study by Lyda and Hamrock [17] uses the entropy value of the whole file and sets the entropy range of the packed file to [6.677, 6.926]. In contrast, Han and Lee [16] focus on the EP section only and compute the entropy value for the EP section instead of the whole file. Through their experiment, they set the entropy threshold to 6.85 and judge a file to be packed if its entropy is over 6.85.

We also determine the entropy range, Packing-Range, through experiments on various packers. Figure 5 shows the entropy distribution of the EP section for different packers. Figure 5(a) shows the average distribution of the original and 19 packed files. Figures 5(b)–5(m) show the more detailed entropy distribution for each packer. According to Figures 5(b)–5(m), the entropy distribution of the packed file differs depending on the packer type. In particular, as shown in Figures 5(d), 5(e), and 5(k), some packed files have smaller entropy values than the original file, and thus we cannot simply set an upper threshold value as the packing criterion as in the previous study [16]. However, we note that while the entropy of packed files is distributed over a wide range, as shown in Figures 5(c)–5(m), the entropy of original files is distributed only in a specific range, as shown in Figure 5(b). More specifically, the entropy value of the normal file is between 6.00 and 6.85, as shown in Figure 5(b). Based on these experimental results, we thus set Packing-Range to [0, 6.00) or [6.85, ∞) for the EP section. According to [9, 29], the entropy method has the disadvantages of low accuracy and easy evasion. To overcome this point, we use three other detection techniques, DP-NEP, DP-SIG, and DP-WR, together with DP-ENT in a hybrid manner. Using this hybrid approach, we can further improve the detection accuracy.
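The Packing-Range test and the order of checks in Algorithm 1 can be sketched as follows. This is a hedged illustration only: the boundary values are read from the text as 6.00 and 6.85, and `EpSectionInfo`, `in_packing_range`, and `detect_packing` are hypothetical names. A real implementation would fill these fields by parsing the PE header and matching the signature database, which is outside the scope of this sketch.

```python
from dataclasses import dataclass
from typing import Optional

# Entropy interval observed for normal (not packed) EP sections.
NORMAL_LOW = 6.00
NORMAL_HIGH = 6.85


def in_packing_range(entropy: float) -> bool:
    """True if the EP section entropy lies in [0, 6.00) or at 6.85 and above."""
    return entropy < NORMAL_LOW or entropy >= NORMAL_HIGH


@dataclass
class EpSectionInfo:
    entropy: float                   # entropy of the EP section
    writable: bool                   # WRITE attribute set on the EP section
    signature: Optional[str] = None  # matched packer signature name, if any


def detect_packing(ep: Optional[EpSectionInfo]) -> str:
    """Apply the four detection cases in the order used by Algorithm 1."""
    if ep is None:
        return "Packed (DP-NEP)"         # no EP section found
    if ep.signature is not None:
        return "Packed (DP-SIG)"         # packer signature found
    if ep.writable and in_packing_range(ep.entropy):
        return "Packed (DP-WR, DP-ENT)"  # writable and entropy in Packing-Range
    return "Not-Packed"
```

For example, an EP section with entropy 7.93 and the WRITE attribute set is reported as packed by the combined DP-WR and DP-ENT case when no signature matches, whereas the same entropy without the WRITE attribute is reported as Not-Packed.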

In the second experiment, we evaluate the detection accuracy and the false-negative/false-positive rates. Table 5 compares the three detection techniques of the detection phase with the proposed hybrid approach. Since EP sections


Figure 4: Operation screenshots of detection, unpacking, and verification phases. (a) Detection phase. (b) Unpacking phase (static unpacking). (c) Unpacking phase (dynamic unpacking). (d) Verification phase.

Figure 5: Entropy distributions of the EP section for original and representative packed files (x-axis: entropy of the EP section, from 0 to 8; y-axis: number of files). (a) Average of original and 19 packers (series: not packed (b); summation of (c) to (m)). (b) Original (not packed). (c) UPX. (d) ASPack. (e) NSPack. (f) MPRESS. (g) Yoda's Protector. (h) RLPack. (i) BeRoEXE. (j) MEW. (k) PACKMAN. (l) WinUpack. (m) exe32pack.

are mandatory in all PE files, we can detect EP sections unless they are intentionally concealed. All files of the dataset used in the experiment also have EP sections, so the detection rate by DP-NEP is trivially 0%. In other words, no packing is detected due to the absence of the EP section, so we exclude the experimental result of DP-NEP from Table 5.

We now explain the results of Table 5 column by column. First, we explain the column of DP-SIG. In the case of DP-SIG, Not-Packed shows a detection rate of 0% since no signature exists in normal files. That is, no packing is detected in normal files, which means that DP-SIG performs accurate detection for Not-Packed files. On the other hand, for the packed files, DP-SIG detects 100% of the files of every packer except NSPack. The detection rate of NSPack is merely 5.38% because NSPack signatures often exist in sections other than the EP section. In summary, DP-SIG shows a detection accuracy of 95.0% on average.

Second, for the experiment on the detection rate of DP-WR, we intentionally remove the WRITE attribute from 39 files (30% of the packed files) per packer. We perform this removal process for 18 packers, excluding exe32pack. The reason for changing the dataset is that normally packed files usually have WRITE attributes, so we arbitrarily remove the WRITE attribute from 30% of the files for the DP-WR experiment. Looking at the DP-WR column in Table 5, DP-WR detects only 70% of the files per packer for the 18 packers since we intentionally remove the WRITE attribute from 30% of the files. In the case of exe32pack, however, we need not perform the removal process since it has no WRITE attribute in the PE header. Accordingly, as shown in the DP-WR column of Table 5, the detection rate of exe32pack by DP-WR is trivially 0%. In the case of Not-Packed, the detection rate is 0% since normal files have no WRITE attribute. In summary, checking the presence of the WRITE attribute is very simple, and DP-WR works correctly if the packed file has the WRITE attribute, but it does not work well if the WRITE attribute does not exist or is maliciously modified.

Third, the DP-ENT column shows the entropy-based detection rates. DP-ENT achieves a 100% detection rate for 9 packers, including NSPack, UPX, MEW, and BeRoEXEPacker, but it incurs some false negatives for 10 packers, including ASPack, Yoda's Protector, Packman, and MPRESS, for which the detection rate is below 100%. In particular, for MPRESS, Petite, JDpack, eXpressor, and Neolite, it shows low detection rates (for example, 24.6% for MPRESS and 0.00% for Petite and eXpressor) because the entropy values of files packed by these packers fall within the entropy distribution of normal files. DP-ENT also has a false-positive rate of 3.84% for Not-Packed files. In summary, DP-ENT works very accurately for most packers, but it may incur false negatives for a few specific packers or false positives for some normal files. To overcome this limitation, we use DP-ENT together with the other detection techniques. The improvement is visible in the experimental results of Tables 5 and 6: when using only DP-ENT, the detection accuracy is low for some packers (especially MPRESS), but when using it together with the other three techniques, the accuracy is close to 100%.

Fourth, the proposed hybrid approach, which integrates the four techniques, detects 100% of the files for all packers except NSPack and shows no false positives. The reason for this high detection rate is that the proposed method has all the advantages of DP-SIG, DP-WR, and DP-ENT as well as DP-NEP. In the case of NSPack, however, the detection rate is merely 70.0%; this is because, as mentioned above, we forcibly remove the WRITE attribute from 30% of the files. According to the results of Table 5, the detection rate differs for each detection technique, and some techniques incur false positives or false negatives; in contrast, the proposed hybrid approach shows the highest detection rate of 98.4% and no false positives.

Table 5: Comparison of packing detection rates of three techniques and the proposed hybrid approach.

Packer             DP-SIG (%)   DP-WR (%)   DP-ENT (%)   Hybrid approach (%)
ASPack             100          70.0        91.5         100
NSPack             53.8         70.0        100          70.0
MPRESS             100          70.0        24.6         100
UPX                100          70.0        100          100
Yoda's Protector   100          70.0        96.9         100
MEW                100          70.0        100          100
BeRoEXEPacker      100          70.0        100          100
Packman            100          70.0        93.8         100
RLPack             100          70.0        99.3         100
PECompact          100          70.0        100          100
Petite             100          70.0        0.00         100
JDpack             100          70.0        56.2         100
Molebox            100          70.0        100          100
eXpressor          100          70.0        0.00         100
Yoda's Crypter     100          70.0        91.5         100
FSG                100          70.0        100          100
exe32pack          100          0.00        100          100
WinUpack           100          70.0        100          100
Neolite            100          70.0        16.1         100
Average            95.0         66.3        73.9         98.4
Not-Packed         0.00         0.00        3.84         0.00
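The arithmetic behind several Table 5 entries can be checked with a short sketch. It assumes 130 files per packer (as stated in the dataset description) and that the Average row is an unweighted mean over the 19 packers; the helper name `rate` is hypothetical, and only the DP-WR and hybrid columns are reproduced here.

```python
FILES_PER_PACKER = 130   # from the dataset description
PACKERS = 19


def rate(detected: int, total: int = FILES_PER_PACKER) -> float:
    """Detection rate in percent, rounded to one decimal place."""
    return round(detected / total * 100, 1)


# DP-WR: the WRITE attribute is removed from 39 of 130 files, so 91 remain detectable.
wr_rate = rate(FILES_PER_PACKER - 39)

# DP-SIG on NSPack: 53.8% corresponds to 70 of 130 files.
nspack_sig = rate(70)

# Unweighted column averages over the 19 packers (assumed averaging rule).
wr_avg = round((18 * 70.0 + 0.0) / PACKERS, 1)        # exe32pack contributes 0%
hybrid_avg = round((18 * 100.0 + 70.0) / PACKERS, 1)  # NSPack contributes 70%

print(wr_rate, nspack_sig, wr_avg, hybrid_avg)
```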

Table 6 shows the detection accuracy of combinations of two or three detection techniques and of the proposed hybrid approach. First, DP-SIG ENT, a combination of signature and entropy tests, detects 100% of the packed files but detects 18 files incorrectly, showing a false-positive rate of 0.73%. Second, DP-ENT WR, a combination of entropy and WRITE tests, shows a low detection rate of 92.0% because the entropy values of MPRESS-packed files fall within the distribution of normal files. Third, DP-SIG ENT WR, a combination of signature, entropy, and WRITE tests, and our hybrid approach both show 98.4% detection accuracy and a 1.60% false-negative rate. This is because DP-SIG ENT WR and the hybrid approach go through the same detection procedure except for DP-NEP. The false-negative rate of 1.60% is due to the files having no WRITE attribute. According to the results of Tables 5 and 6, we believe that the proposed hybrid approach achieves high detection accuracy by combining the advantages of the existing and new detection techniques.

5.4. Experimental Results on Unpacking and Verification Phases. In this section, we evaluate the unpacking and verification phases by performing the unpacking of Algorithm 2 and calculating the restoration accuracy of the files through Algorithm 3. However, the unpacking and verification phases take a lot of time, so we experiment with 40 files randomly selected from the dataset. The unpacking phase consists of static unpacking and dynamic unpacking. In the experiment, we exclude static unpacking since it uses an existing library as is, and we focus only on the dynamic unpacking of Algorithm 2 since it tries to restore the original file without any information about the packer.

Table 7 shows the restoration accuracy of the proposed method for the .text and .data sections for each of the 19 packers. We cannot directly unpack Yoda's Protector and PECompact since they use anti-debugging techniques. To overcome this problem, we unpack files from these packers by bypassing three anti-debugging API calls: GetCurrentProcessID(), BlockInput(), and IsDebuggerPresent() [41]. (That is, we run Algorithm 2 with the bypassing techniques for Yoda's Protector and PECompact and include the results in Table 7.) In general, only the .text and .data sections of the original file are packed [42], so in the experiment we consider only the restoration accuracy of these two sections of the restored file. In Table 7, among the 19 packers, 11 packers, including UPX, NSPack, MEW, RLPack, and BeRoEXEPacker, show 100% restoration accuracy for both the .text and .data sections. On the other hand, 8 packers, including ASPack, Packman, and MPRESS, show less than 100% restoration accuracy because of the manner in which the packers handle the .reloc section. More specifically, if there is a .reloc section in the original file, the unpacking relocates the .text and .data sections, changing the data values in each section. Therefore, the restoration accuracy depends on whether there is a .reloc section in the original file.

Table 8 shows the restoration results in detail according to the presence of the .reloc section. As shown in the table, the restoration accuracy is 100% in the absence of the .reloc section, while it is generally much lower, ranging from 0.00% to 99.8%, in the presence of the .reloc section. However, the low restoration accuracy is due to the nature of the packer, and it does not mean that unpacking fails. In other words, even if the unpacking is successful, the data values may change because of the way the packer handles relocation, and the restoration accuracy is then calculated to be low. In summary, the proposed verification algorithm lets us calculate the restoration accuracy of unpacked files and thus quantitatively measure the accuracy of the actual restoration.
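The effect of relocation on a byte-level comparison can be illustrated with a small sketch. The six-byte "section", the relocation offset, and the load-address delta below are all hypothetical; the helper `apply_relocations` is not part of the proposed system and only mimics how a loader patches 32-bit absolute addresses.

```python
import hashlib
import struct


def apply_relocations(section: bytes, offsets, delta: int) -> bytes:
    """Add `delta` to each 32-bit little-endian address stored at `offsets`."""
    patched = bytearray(section)
    for off in offsets:
        (addr,) = struct.unpack_from("<I", patched, off)
        struct.pack_into("<I", patched, off, (addr + delta) & 0xFFFFFFFF)
    return bytes(patched)


# Hypothetical code bytes: mov eax, [0x00403000]; ret
original = bytes.fromhex("a100304000" "c3")

# Loading the image 0x00100000 higher rewrites the embedded absolute address.
relocated = apply_relocations(original, offsets=[1], delta=0x00100000)

print(original.hex(), hashlib.md5(original).hexdigest())
print(relocated.hex(), hashlib.md5(relocated).hexdigest())
```

Even though the instructions are semantically the same, the patched address changes the bytes, so a hash of the whole section no longer matches the original.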

In addition to the .text, .data, and .reloc sections, we further consider the PE header, which includes the IAT information. In general, the IAT is located in the header of the PE file, and if we actually need to run the unpacked file, we must also restore this header information including the IAT. In fact, the studies of [3, 43] restore the header including the IAT and produce the actual whole executable file. On the other hand, most studies, including this paper, do not restore the whole executable file itself but restore the .text, .data, and .reloc sections, which are the major parts of the executable file. This is because the purpose of unpacking is not to execute the restored file but to identify and extract malicious code included in the restored file. Therefore, in this paper, we do not need to restore the header required for actual execution, and accordingly, comparison of the IAT information is also not necessary. Instead, we restore and compare the .text, .data, and .reloc sections, which are more likely to hide malicious code. Restoring the PE header is beyond the scope of this study; we leave it as a separate research topic.

Table 6: Comparison of packing detection accuracy of two or three combinations and of the hybrid approach.

Metrics               DP-SIG ENT (%)   DP-ENT WR (%)   DP-SIG ENT WR (%)   Hybrid approach (%)
Detection accuracy    95.0             92.0            98.4                98.4
False-negative rate   5.00             8.00            1.60                1.60
False-positive rate   0.73             0.00            0.00                0.00

Table 7: Restoration accuracy of .text and .data sections of each packer.

Packer             .text (%)   .data (%)
UPX                100         100
NSPack             100         100
MEW                100         100
RLPack             100         100
BeRoEXE            100         100
Neolite            100         100
FSG                100         100
eXpressor          100         100
Molebox            100         100
Petite             100         100
JDpack             100         100
ASPack             84.3        84.3
Yoda's protector   84.0        84.0
exe32pack          80.8        100
MPRESS             78.8        75.3
Yoda's crypter     74.5        74.5
PECompact          74.0        74.0
WinUpack           68.0        68.0
Packman            55.0        48.5
Average            89.2        90.2

6. Conclusions

In this paper, we analyzed recent studies of packing detection and unpacking techniques and derived four important observations: no phase integration, no detection combination, no real-restoration, and no unpacking verification. We then proposed an all-in-one unpacking system to address these four observations. First, no phase integration meant that analysts had to perform the packing detection, unpacking, and verification phases separately; we solved this problem by presenting an all-in-one unpacking system that integrates these three phases in a single framework. Second, no detection combination meant that there had been no attempt to combine various packing detection methods; we solved this problem by presenting a hybrid approach that combines existing and new packing detection methods. Third, no real-restoration meant that there had been little discussion about restoring the actual executable file; we solved this problem by actually restoring an original image of the packed file. Fourth, no unpacking verification meant that there had been no explicit work to measure the restoration accuracy of unpacked files; we solved this problem by proposing a verification algorithm that quantitatively evaluates the restoration accuracy.

We constructed a dataset of 2,600 PE files and experimentally evaluated the proposed system using the dataset. First, we performed all the phases of the all-in-one unpacking system sequentially and confirmed that its overall mechanism worked well. Next, through the comparison of the hybrid packing detection approach and existing detection techniques, we showed that the detection accuracy of the proposed method was the highest at 98.4% on average with no false positives. For this, we also determined an entropy range of packing, Packing-Range, through extensive experiments on 19 representative packers. Finally, in the unpacking and verification phases, we confirmed that unpacking was successful for all files except the files packed with Yoda's Protector. We also showed that the restoration accuracy of unpacked files was nearly 100% except for a few packers.

As future research, we will focus on unpacking packed files that use Anti-VM, Anti-Debug, and Anti-Reverse engineering techniques. We will also study how to compute the restoration accuracy of files having .reloc sections.

Data Availability

All the data files used in the experiments are available at https://github.com/chesvectain/PackingData.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was partly supported by the Institute of Information and Communications Technology Planning and Evaluation (IITP) grant funded by the Korea government (MSIT) (no. 2017-0-00158, Development of Cyber Threat Intelligence (CTI) analysis and information sharing technology for national cyber incident response) and the Korea Electric Power Corporation (no. R18XA05).

References

[1] M. Morgenstern, "AV-TEST Security report 2017/18," 2019, https://www.av-test.org/fileadmin/pdf/security_report/AV-TEST_Security_Report_2017-2018.pdf.

[2] T. Ban, R. Isawa, S. Guo, D. Inoue, and K. Nakao, "Efficient malware packer identification using support vector machines with spectrum kernel," in Proceedings of the Eighth Asia Joint Conference on Information Security, pp. 69–76, Seoul, South Korea, July 2013.

[3] B. Cheng, M. Jiang, J. Fu et al., "Towards paving the way for large-scale Windows malware analysis: generic binary unpacking with orders-of-magnitude performance boost," in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pp. 395–411, Toronto, Canada, October 2018.

[4] A. Gupta and M. S. Arya, "Hashing based encryption and anti-debugger support for packing multiple files into single executable," International Journal of Advanced Research in Computer Science, vol. 9, no. 1, pp. 914–920, 2018.

[5] Y. Kawakoya, M. Iwamura, and M. Itoh, "Memory behavior-based automatic malware unpacking in stealth debugging environment," in Proceedings of the 5th International Conference on Malicious and Unwanted Software, pp. 39–46, Lorraine, France, October 2010.

[6] G. Liang, J. Pang, Z. Shan, R. Yang, and Y. Chen, "Automatic benchmark generation framework for malware detection," Security and Communication Networks, vol. 2018, Article ID 4947695, 8 pages, 2018.

[7] A. Pektas and T. Acarman, "A dynamic malware analyzer against virtual machine aware malicious software," Security and Communication Networks, vol. 7, no. 12, pp. 2245–2257, 2014.

[8] I. Santos, J. Nieves, and P. G. Bringas, "Semi-supervised learning for unknown malware detection," in International Symposium on Distributed Computing and Artificial Intelligence, A. Abraham, J. M. Corchado, S. R. Gonzalez, and J. F. De Paz Santana, Eds., pp. 415–422, Springer, Berlin, Germany, 2011.

[9] X. Ugarte-Pedrero, I. Santos, B. Sanz, C. Laorden, and P. G. Bringas, "Countering entropy measure attacks on packed software detection," in Proceedings of the 9th International Conference on Consumer Communications and Networking, pp. 164–168, Las Vegas, NV, USA, January 2012.

[10] H. Won, S.-P. Kim, S. Lee, M.-J. Choi, and Y.-S. Moon, "Secure principal component analysis in multiple distributed nodes," Security and Communication Networks, vol. 9, no. 14, pp. 2348–2358, 2016.

[11] S. Cesare, Y. Xiang, and W. Zhou, "Malwise–an effective and efficient classification system for packed and polymorphic malware," IEEE Transactions on Computers, vol. 62, no. 6, pp. 1193–1206, 2013.

Table 8: Restoration accuracy according to the presence of the .reloc section.

Packer             .text, .reloc present (%)   .text, .reloc absent (%)   .data, .reloc present (%)   .data, .reloc absent (%)
ASPack             0.00                        100                        0.00                        100
Packman            99.8                        100                        88.1                        100
MPRESS             0.00                        100                        0.00                        100
Yoda's crypter     0.00                        100                        0.04                        100
PECompact          0.00                        100                        0.00                        100
Yoda's Protector   0.00                        100                        0.05                        100
exe32pack          0.00                        100                        100                         100
WinUpack           0.00                        100                        0.00                        100

[12] C. Malin, E. Casey, and J. Aquilina, Malware Forensics: Investigating and Analyzing Malicious Code, Syngress, Burlington, MA, USA, 2008.

[13] W. Yan, Z. Zheng, and A. Nirwan, "Revealing packed malware," IEEE Security and Privacy, vol. 6, no. 5, pp. 65–69, 2008.

[14] M. Bat-Erdene, T. Kim, H. Park, and H. Lee, "Packer detection for multi-layer executables using entropy analysis," Entropy, vol. 19, no. 3, p. 125, 2017.

[15] Y.-S. Choi, I.-K. Kim, J.-T. Oh, and J.-C. Ryou, "PE file header analysis-based packed PE file detection technique (PHAD)," in Proceedings of International Symposium on Computer Science and Its Applications, pp. 28–31, Hobart, Australia, October 2008.

[16] S. Han and S. Lee, "Packed PE file detection for malware forensics," The KIPS Transactions: Part C, vol. 16C, no. 5, pp. 555–562, 2009, in Korean.

[17] R. Lyda and J. Hamrock, "Using entropy analysis to find encrypted and packed malware," IEEE Security and Privacy, vol. 5, no. 2, pp. 40–45, 2007.

[18] R. Perdisci, A. Lanzi, and W. Lee, "Classification of packed executables for accurate computer virus detection," Pattern Recognition Letters, vol. 29, no. 14, pp. 1941–1946, 2008.

[19] A. Rohit, A. Singh, H. Pareek, and U. Edara, "A heuristics-based static analysis approach for detecting packed PE binaries," International Journal of Security and Its Applications, vol. 7, no. 5, pp. 257–268, 2013.

[20] G. Amato, "PEframe," 2019, https://github.com/guelfoweb/peframe.

[21] G. Jeong, E. Choo, J. Lee, M. Bat-Erdene, and H. Lee, "Generic unpacking using entropy analysis," Journal of Advanced Information Technology and Convergence, vol. 7, no. 1, pp. 232–238, 2009.

[22] V. DeMarines, "Obfuscation–how to do it and how to crack it," Network Security, vol. 2008, no. 7, pp. 4–7, 2008.

[23] H. Neil, "PEiD," 2019, https://github.com/wolfram77web/app-peid.

[24] O. Yuschuk, "OllyDbg," 2019, http://www.ollydbg.de/download.html.

[25] N. Waisman, "Immunity Debugger," 2019, https://www.immunityinc.com/products/debugger/index.html.

[26] P. Royal, M. Halpin, D. Dagon, R. Edmonds, and W. Lee, "PolyUnpack: Automating the hidden-code extraction of unpack-executing malware," in Proceedings of the 22nd Annual Computer Security Applications Conference, pp. 289–300, Miami Beach, FL, USA, December 2006.

[27] M. Bat-Erdene, Y. Park, H. Li, H. Lee, and M.-S. Choi, "Entropy analysis to classify unknown packing algorithms for malware detection," International Journal of Information Security, vol. 16, no. 3, pp. 227–248, 2017.

[28] S. Cesare and X. Yang, "Classification of malware using structured control flow," in Proceedings of the 8th Australasian Symposium on Parallel and Distributed Computing, pp. 61–70, Brisbane, Australia, January 2010.

[29] L. Martignoni, M. Christodorescu, and S. Jha, "OmniUnpack: fast, generic, and safe unpacking of malware," in Proceedings of the 23rd Annual Computer Security Applications Conference, pp. 431–441, Miami Beach, FL, USA, December 2007.

[30] M. G. Kang, P. Poosankam, and H. Yin, "Renovo: a hidden code extractor for packed executables," in Proceedings of the 5th ACM Workshop on Recurring Malcode, pp. 46–53, Alexandria, VA, USA, November 2007.

[31] S. Mariani, L. Fontana, F. Gritti, and S. D'Alessio, "PinDemonium: a DBI-based generic unpacker for windows executables," in Proceedings of the Black Hat 2016, Las Vegas, NV, USA, July 2016.

[32] G. Bonfante, J. Fernandez, J.-Y. Marion, B. Rouxel, F. Sabatier, and A. Thierry, "CoDisasm: medium scale concatic disassembly of self-modifying binaries with overlapping instructions," in Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 745–756, Denver, CO, USA, October 2016.

[33] M. Oberhumer, L. Molnar, and J. Reiser, "UPX," 2019, https://upx.github.io.

[34] A. Swinnen and A. Mesbahi, "One packer to rule them all: empirical identification, comparison, and circumvention of current antivirus detection techniques," in Proceedings of the Black Hat 2014, Las Vegas, NV, USA, August 2014.

[35] P. Bania, "Generic unpacking of self-modifying, aggressive, packed binary programs," 2009, https://arxiv.org/abs/0905.4581.

[36] H. Neil, "BobSoft Signature," 2019, http://woodmann.com/BobSoft/Files/Other/UserDB.zip.

[37] M. Russinovich, "Microsoft Sysinternals Suite," 2019, https://download.sysinternals.com/files/SysinternalsSuite.zip.

[38] R. Rivest, "The MD5 message-digest algorithm," RFC 1321, IETF, Fremont, CA, USA, 1992.

[39] S. Tatham, "Putty," 2019, https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html.

[40] L. Pascoe, "MD5," 2019, http://www.md5summer.org/about.html.

[41] P. Ferrie, "Anti-unpacker tricks - part one," 2019, https://www.virusbulletin.com/virusbulletin/2008/12/anti-unpacker-tricks-part-one.

[42] T.-Y. Wang and C.-H. Wu, "Detection of packed executables using support vector machines," in Proceedings of the 10th International Conference on Machine Learning and Cybernetics, pp. 717–722, Guilin, China, July 2011.

[43] D. Korczynski, "RePEconstruct: reconstructing binaries with self-modifying code and Import address table destruction," in Proceedings of the 11th International Conference on Malicious and Unwanted Software, pp. 31–38, Fajardo, Puerto Rico, October 2016.


stops the execution and returns DstAddr as the OEP (Lines 13 and 14).
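The control flow of Algorithm 2 can be sketched offline as a function over a recorded branch trace. This is only an illustration under stated assumptions: the `Branch` record, the `find_oep` name, and the stability threshold `eps` are hypothetical (the paper does not define "stable" numerically), and the check of Line 4 is omitted. The actual system drives Immunity Debugger rather than consuming a prepared trace.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class Branch:
    """One JMP/CJMP/RETN hit during execution (hypothetical trace record)."""
    ip: int                      # address of the branch instruction
    dst: int                     # DstAddr: address executed next
    src_section: str             # section containing ip
    dst_section: Optional[str]   # section containing dst; None if outside the file
    entropies: Dict[str, float]  # per-section entropy re-measured at this branch


def find_oep(branches: Iterable[Branch],
             initial_entropies: Dict[str, float],
             eps: float = 0.01) -> Optional[int]:
    """Return the first DstAddr that looks like the OEP, or None (Not-Found)."""
    ip_history, dst_history = set(), set()
    previous = initial_entropies
    for b in branches:
        # Only examine branches whose IP and DstAddr are both new (Line 8).
        if b.ip in ip_history or b.dst in dst_history:
            continue
        ip_history.add(b.ip)
        dst_history.add(b.dst)
        if b.dst_section is None:  # Line 10: the jump leaves the file
            return None
        # Assumed stability test: no section's entropy moved by more than eps.
        stable = all(abs(value - previous.get(name, 0.0)) <= eps
                     for name, value in b.entropies.items())
        previous = b.entropies
        if stable and b.dst_section != b.src_section:  # Lines 13 and 14
            return b.dst
    return None


start = {"UPX0": 0.0, "UPX1": 7.9}
trace = [
    Branch(0x41F000, 0x41F010, "UPX1", "UPX1", {"UPX0": 3.1, "UPX1": 7.9}),
    Branch(0x41F020, 0x41F030, "UPX1", "UPX1", {"UPX0": 6.2, "UPX1": 7.9}),
    Branch(0x41F040, 0x401000, "UPX1", "UPX0", {"UPX0": 6.2, "UPX1": 7.9}),
]
oep = find_oep(trace, start)
print(hex(oep) if oep is not None else "Not-Found")
```

In the made-up trace, the first two branches stay inside the unpacking stub while the entropy of UPX0 is still rising; the third jumps into a different section after the entropy has settled, so its destination is reported.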

4.3. Verification Phase. In this section, we present an algorithm for measuring the restoration accuracy by comparing the restored file with its original file. To execute the packed file, the proposed system uses dynamic unpacking and finds the OEP during its execution. If the system finds the OEP, we assume that the restoration of the original file is successful and regard the resulting file as the restored file. In the verification phase, we quantitatively measure the restoration accuracy by comparing this restored file with its original file.

Figure 3 shows an example of comparing some bytecodes of the original file with those of the restored file. For the comparison, we first pack the pspasswd.exe file provided by Microsoft [37] and then unpack it again by dynamic unpacking. The figure shows that the bytecodes of the four sections of the original file, .text, .rdata, .data, and .rsrc, appear identically in the UPX0, UPX1, and .rsrc sections of the restored file. That is, even if the name and location of each section are different, the same bytecodes exist in the restored file. However, the restored file contains padding and garbage values, and some bytecodes may be at locations different from those in the original file. Thus, in order to compare the original and the restored files, we need to search for where the original bytecodes are located in the restored file. Since manually comparing the bytecodes of the original and restored files takes a very long time, in this paper, we propose an efficient search algorithm using a hash function. We use the well-known MD5 [38] as the hash function of the search algorithm. As a message-digest algorithm, MD5 is widely used in checking the integrity of long data.

Algorithm 3 shows the restoration accuracy calculation algorithm. Its basic principle is to check whether each section of the original file exists in the restored file. We use the MD5 hash function to speed up the scan. The inputs to the algorithm are an original file A and a restored file B, and the output is restorationAccuracy, a list storing the restoration accuracy values of all sections. First, we remove garbage and padding values from the restored file (Line 1). For each section Ai of A, we obtain the length curLen of the section and calculate a hash value H1 by applying the MD5 hash function (Lines 4 to 7). We then calculate another hash value H2 from a part of the restored file B of length curLen (Line 9). If H1 equals H2 (Line 10), B contains Ai, so we stop the comparison and calculate the restoration accuracy of the section (Line 14). However, if H1 and H2 are not equal, we perform the hash value comparison again after decrementing the length curLen of Ai (Lines 6 to 13). We repeat this process to calculate the restoration accuracy for the section Ai (Line 14). After completing the accuracy calculation for all sections, the algorithm returns the list restorationAccuracy (Line 16).
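The idea of Algorithm 3 can be sketched in a few lines of Python. This is a minimal, unoptimized illustration rather than the authors' implementation: the function names are hypothetical, the garbage and padding removal of Line 1 is omitted, an empty section is assumed to count as fully restored, and curLen is decremented only after the whole restored file has been scanned, which follows the prose description of the algorithm.

```python
import hashlib
from typing import Dict


def section_accuracy(section: bytes, restored: bytes) -> float:
    """Percentage of the longest prefix of `section` found anywhere in `restored`."""
    total_len = len(section)
    if total_len == 0:
        return 100.0  # assumption: an empty section is trivially restored
    cur_len = total_len
    while cur_len > 0:
        h1 = hashlib.md5(section[:cur_len]).digest()
        # Slide a window of cur_len bytes over the restored file.
        for j in range(len(restored) - cur_len + 1):
            h2 = hashlib.md5(restored[j:j + cur_len]).digest()
            if h1 == h2:
                return cur_len / total_len * 100.0
        cur_len -= 1  # no match at this length: retry with a shorter prefix
    return 0.0


def restoration_accuracy(sections: Dict[str, bytes], restored: bytes) -> Dict[str, float]:
    """Map each original section name to its restoration accuracy in percent."""
    return {name: section_accuracy(data, restored) for name, data in sections.items()}


# Made-up data: .text survives intact, only the first half of .data survives.
original_sections = {".text": bytes.fromhex("558bec8b4d"), ".data": b"ABCDEFGH"}
restored_file = b"\x00\x00" + bytes.fromhex("558bec8b4d") + b"\xcc" + b"ABCDxxxx"
print(restoration_accuracy(original_sections, restored_file))
```

The nested scan is quadratic in the worst case, so this sketch is suitable only for small inputs; it is meant to show what is being measured, not how to measure it quickly.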

Input: a PE file
Output: OEP, the start address of the unpacked original file
begin
(1) Measure the entropy value of each section
(2) while the process has not reached the end do
(3)   IP ⟵ the current instruction pointer
(4)   if IP's address > current section's last address then return Not-Found
(5)   else if IP is JMP or CJMP or RETN then
(6)     Execute instructions until IP
(7)     DstAddr ⟵ the next instruction address
(8)     if IP ∉ IP-History and DstAddr ∉ DstAddr-History then
(9)       Store IP and DstAddr into each history
(10)      if DstAddr is out of the file then return Not-Found
(11)      else
(12)        Re-calculate the entropy value of each section
(13)        if Entropy change is stable and DstAddr is in a different section then
(14)          return DstAddr
(15)      end-if
(16)    end-if
(17)  end-if
(18) end-while
(19) return Not-Found
end

ALGORITHM 2: Dynamic unpacking algorithm.

Figure 3: Comparison of sample bytecodes of the original and restored files. (a) Sections of the original file (.text, .rdata, .data, and .rsrc). (b) Sections of the restored file (UPX0, UPX1, and .rsrc).

Input: A, an original file; B, a restored file
Output: restorationAccuracy[1..n], a list of restoration accuracy values for n sections
begin
(1) Remove garbage and padding values from B
(2) A ⟵ {A1, A2, ..., An}, where Ai is the i-th section of A
(3) for each section Ai ∈ A do
(4)   totalLen ⟵ Ai.length
(5)   curLen ⟵ totalLen
(6)   while curLen > 0 do
(7)     H1 ⟵ hash(Ai[1 : curLen])   // hash a part of section Ai
(8)     for j ⟵ 1 to (B.length − curLen + 1) do
(9)       H2 ⟵ hash(B[j : j + curLen − 1])   // hash a part of B
(10)      if H1 = H2 then break
(11)      else curLen ⟵ curLen − 1
(12)    end-for
(13)  end-while
(14)  restorationAccuracy[i] ⟵ (curLen/totalLen) × 100
(15) end-for
(16) return restorationAccuracy
end

ALGORITHM 3: Restoration rate calculation algorithm.

5. Performance Evaluation

5.1. Experimental Environment. In this section, we present the results of the experimental evaluation to show the superiority of the all-in-one unpacking system. We conduct the experiments in a virtual server as well as a physical server. The reasons for experimenting in two server environments are as follows. First, unpacking results may differ between the virtual server and the physical server if a packed file has anti-unpacking features such as Anti-VM or Anti-Debug. Second, the dynamic unpacking of malware files may infect the physical server. Thus, we perform unpacking first in the virtual server and then perform the experiment in the physical server only when there is no abnormality. The hardware specifications of the experimental platform are as follows. The virtual server is an Intel Core i7-6700 3.40 GHz with 8 GB RAM, and the physical server is an Intel Core i7-6700 3.40 GHz with 32 GB RAM. The target operating system is 32-bit Windows 7, and we use PEVIEW to view the structure and content of 32-bit PE files. As the debugger, we use Immunity Debugger 1.85 [25] to implement the dynamic unpacking phase of Algorithm 2. To implement the all-in-one unpacking system, including the dynamic unpacking algorithm, we run Python scripts on top of Immunity Debugger.

In the experiment, we use a total of 130 PE files, where 90 files are provided by Microsoft [37] and 40 files are randomly collected from the putty [39] and MD5 [40] sites. We pack each of the 130 executable files using 19 packers and construct a dataset of 2,600 files in total (130 × 19 packed files plus the original 130 files). The 19 packers used are UPX, ASPack, NSPack, MPRESS, Yoda's Protector, MEW, Packman, RLPack, BeRoEXEPacker, PECompact, Petite, JDpack, Molebox, eXpressor, Yoda's Crypter, FSG, exe32pack, WinUpack, and Neolite. The reason for constructing the dataset by packing normal PE files ourselves is that we can quantitatively measure the packing detection accuracy, the false-positive rate, and the false-negative rate only if we know in advance which experimental files are packed.

In this paper, we conduct three experiments. The first experiment concerns the integration feature of the all-in-one system and verifies that the system supports all three phases of packing detection, unpacking, and verification in an integrated framework. The second experiment is for the detection phase, where we first set Packing-Range, an entropy range for judging packing, and then evaluate the packing detection accuracy using the range. The third experiment is for the unpacking and verification phases, where we verify whether the system unpacks the PE file correctly by calculating the restoration accuracy.

5.2. Experimental Results on Overall Working Mechanism. In this section, we confirm through experiments that the integration mechanism of the all-in-one unpacking system works correctly. Figure 4 shows screenshots of each phase of the all-in-one system in operation. First, Figure 4(a) shows an example screenshot of the detection phase. To confirm that all four cases of packing detection work correctly, we continue executing the remaining cases even if packing is already detected by a previous case. In Figure 4(a), we first examine DP-NEP of Algorithm 1. We find the EP section in DP-NEP, so we proceed to DP-SIG (the first two lines). In the DP-SIG test, we detect the UPX packer, so we conclude that the file is packed (the second two lines). Even though detection is completed, we proceed to the DP-ENT and DP-WR tests. The entropy value is 7.93, which is contained in Packing-Range, and the attribute value of the EP section is "0xe0000040," which includes the WRITE attribute. Thus, we conclude that the file is also packed by the DP-ENT and DP-WR tests (the third three lines). According to Figure 4(a), we can confirm that the DP-NEP, DP-SIG, DP-ENT, and DP-WR tests of the detection phase all work correctly.
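The four individual tests reported in Figure 4(a) can be sketched as follows. The `EpSection` record and the `run_detection_tests` name are hypothetical, the signature lookup is reduced to an already-resolved packer name, and the way Algorithm 1 combines the four outcomes into a final verdict is not reproduced here. The only external fact used is that the PE section flag IMAGE_SCN_MEM_WRITE has the value 0x80000000.

```python
from dataclasses import dataclass
from typing import Dict, Optional

IMAGE_SCN_MEM_WRITE = 0x80000000  # PE section characteristic: section is writable


@dataclass
class EpSection:
    """Hypothetical summary of the section that contains the entry point."""
    characteristics: int      # raw Characteristics field of the section header
    entropy: float            # entropy measured over the section bytes
    signature: Optional[str]  # packer name matched by a signature database, if any


def run_detection_tests(ep: Optional[EpSection]) -> Dict[str, bool]:
    """Run all four tests, as in Figure 4(a), and report each outcome separately."""
    return {
        # DP-NEP fires when no EP section can be found at all.
        "DP-NEP": ep is None,
        # DP-SIG fires when a known packer signature matched.
        "DP-SIG": ep is not None and ep.signature is not None,
        # DP-ENT fires when the entropy falls outside the normal band [6.00, 6.85).
        "DP-ENT": ep is not None and not (6.00 <= ep.entropy < 6.85),
        # DP-WR fires when the EP section is writable.
        "DP-WR": ep is not None and bool(ep.characteristics & IMAGE_SCN_MEM_WRITE),
    }


# Values taken from the Figure 4(a) walkthrough: UPX, entropy 7.93, 0xe0000040.
print(run_detection_tests(EpSection(0xE0000040, 7.93, "UPX")))
```

With the values from the walkthrough, DP-SIG, DP-ENT, and DP-WR all report a packed file, while DP-NEP does not fire because the EP section exists.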

Figures 4(b) and 4(c) show example screenshots of the unpacking phase. If an unpacking library exists for the packer identified in the detection phase, we perform static unpacking as shown in Figure 4(b); if not, we perform dynamic unpacking as shown in Figure 4(c). Figure 4(b) shows the result of static unpacking by the UPX commercial tool [33] for the UPX-packed file. In the figure, the file size increases from 285,760 bytes to 731,200 bytes by unpacking, which means that the compression ratio is 39.08%. Figure 4(c) shows the result of the entropy-based dynamic unpacking. In the figure, the system calculates the entropy of each section whenever IP encounters a JMP or CJMP instruction; it returns the current DstAddr as the OEP if the entropy change is stable and DstAddr is in a different section.
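As a quick check of the reported figure, the compression ratio appears to be the packed size divided by the unpacked size. The variable names below are illustrative only.

```python
packed_size = 285_760    # bytes, UPX-packed file in Figure 4(b)
unpacked_size = 731_200  # bytes, after static unpacking

compression_ratio = packed_size / unpacked_size * 100
print(f"{compression_ratio:.2f}%")
```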

Figure 4(d) shows an example screenshot of the verification phase. In this example, we measure the restoration accuracy of the restored file for the text and data sections of the original file. The length of the original text section is 18,439, and the length of the unpacked file is 872,555. In the hash value comparison, since the portion corresponding to the original text section exists in the unpacked file, the restoration accuracy becomes 100%. Similarly, we also calculate the restoration accuracy of the data section as 100%.

5.3. Experimental Results on Detection Phase. In this section, we describe two experimental results of the detection phase. The first experiment sets Packing-Range, and the second measures the detection accuracy of the proposed detection algorithm. In the first experiment, we experimentally determine the Packing-Range used in DP-ENT of Algorithm 1. The previous study by Lyda and Hamrock [17] uses the entropy value of the whole file and sets the entropy range of the packed file to [6.677, 6.926]. In contrast, Han and Lee [16] focus on the EP section only and compute the entropy value for the EP section instead of the whole file. Through experiments, they set the entropy threshold to 6.85 and judge a file to be packed if its entropy is over 6.85.

We also determine the entropy range, Packing-Range, through experiments on various packers. Figure 5 shows the entropy distribution of the EP section for different packers. Figure 5(a) shows the average distribution of the original and 19 packed files. Figures 5(b)–5(m) show the more detailed entropy distribution for the original files and for each packer. According to Figures 5(b)–5(m), the entropy distribution of the packed file differs depending on the packer type. In particular, as shown in Figures 5(d), 5(e), and 5(k), some packed files have smaller entropy values than the original file, and thus we cannot simply set an upper threshold value as a packing criterion as in the previous study [16]. However, we note that while the entropy of packed files is distributed over a wide range, as shown in Figures 5(c)–5(m), the entropy of original files is distributed only in a specific range, as shown in Figure 5(b). More specifically, the entropy value of the normal file is between 6.00 and 6.85, as shown in Figure 5(b). Based on these experimental results, we thus set Packing-Range to [0, 6.00) or [6.85, ∞] of the EP section. According to [9, 29], the entropy method has the disadvantages of low accuracy and easy evasion. To overcome this limitation, we use three other detection techniques, DP-NEP, DP-SIG, and DP-WR, together with DP-ENT in a hybrid manner. Using this hybrid approach, we can further improve the detection accuracy.
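As an illustration of the Packing-Range test, the sketch below computes a byte-level Shannon entropy and checks it against the two intervals given above. It assumes that the EP-section entropy is the usual Shannon entropy over byte frequencies (0 to 8 bits per byte); the function names `shannon_entropy` and `in_packing_range` are hypothetical.

```python
import math
from collections import Counter

# Entropy band of normal (not packed) files reported for Figure 5(b).
NORMAL_LOW, NORMAL_HIGH = 6.00, 6.85


def shannon_entropy(data: bytes) -> float:
    """Byte-level Shannon entropy in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


def in_packing_range(entropy: float) -> bool:
    """True when the entropy lies in [0, 6.00) or in [6.85, infinity)."""
    return entropy < NORMAL_LOW or entropy >= NORMAL_HIGH


if __name__ == "__main__":
    print(in_packing_range(7.93))  # entropy value from the Figure 4(a) example
    print(in_packing_range(6.50))  # inside the normal band
```

The two-sided range is what distinguishes this test from a single upper threshold: a low-entropy EP section is flagged as well as a high-entropy one.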

Figure 4: Operation screenshots of detection, unpacking, and verification phases. (a) Detection phase. (b) Unpacking phase (static unpacking). (c) Unpacking phase (dynamic unpacking). (d) Verification phase.

Figure 5: Entropy distributions of the EP section for original and representative packed files (x-axis: entropy of the EP section, 0 to 8; y-axis: number of files). (a) Average of original and 19 packers, plotting the not-packed files of (b) against the summation of (c) to (m). (b) Original (not packed). (c) UPX. (d) ASPack. (e) NSPack. (f) MPRESS. (g) Yoda's Protector. (h) RLPack. (i) BeroEXE. (j) MEW. (k) PACKMAN. (l) WinUpack. (m) exe32pack.

In the second experiment, we evaluate the detection accuracy and the false-negative/false-positive rates. Table 5 compares the three detection techniques of the detection phase with the proposed hybrid approach. Since EP sections are mandatory in all PE files, we can detect EP sections unless they are intentionally concealed. All files of the dataset used in the experiment also have EP sections, so the detection rate by DP-NEP is trivially 0%. In other words, no file is detected as packed because of an absent EP section, so we exclude the experimental result of DP-NEP from Table 5.

We now explain the results of Table 5 column by column. First, we explain the DP-SIG column. For DP-SIG, Not-Packed shows a detection rate of 0% since no packer signature exists in normal files. That is, DP-SIG never reports packing for normal files, which means that it classifies Not-Packed files accurately. On the other hand, for the packed files, DP-SIG detects 100% of the files of every packer except NSPack. The detection rate of NSPack is merely 53.8% because NSPack signatures often exist in sections other than the EP section. In summary, DP-SIG shows 95.0% detection accuracy on average.

Second, for the experiment on the detection rate of DP-WR, we intentionally remove the WRITE attribute from 39 files (30% of packed files) per packer. We perform this removal process for 18 packers, excluding exe32pack. The reason for changing the dataset is that normal packed files usually have WRITE attributes, so we arbitrarily remove the WRITE attribute from 30% of files for the DP-WR experiment. Looking at the DP-WR column in Table 5, DP-WR detects only 70% of files per packer for the 18 packers since we intentionally remove the WRITE attribute from 30% of files. In the case of exe32pack, however, we need not perform the removal process since it has no WRITE attribute in the PE header. Accordingly, as shown in the DP-WR column of Table 5, the detection rate of exe32pack by DP-WR is trivially 0%. In the case of Not-Packed, the detection rate is 0% since normal files have no WRITE attribute. In summary, checking the presence of the WRITE attribute is very simple, and DP-WR works correctly if the packed file has the WRITE attribute, but it does not work well if the WRITE attribute is absent or has been maliciously modified.

Third, the DP-ENT column shows the entropy-based detection rates. As shown in the table, DP-ENT achieves a 100% detection rate for 9 packers, including NSPack, UPX, MEW, and BeRoEXEPacker, but it incurs some false negatives for 10 packers, including ASPack, Yoda's Protector, Packman, and MPRESS, for which the detection rate is lower than 100%. In particular, for MPRESS, Petite, JDpack, eXpressor, and Neolite, it shows a low detection rate of 24.6% or below because, owing to the characteristics of these packers, the entropy values of their packed files fall within the entropy distribution of normal files. DP-ENT also has a false-positive rate of 3.84% for Not-Packed. In summary, DP-ENT works very accurately for most packers, but it may incur false negatives for a few specific packers or false positives for some normal files. To overcome this limitation, we use DP-ENT together with the other detection techniques. We can see the improvement in the experimental results of Tables 5 and 6. In the tables, when using only DP-ENT, the detection accuracy is low for some packers (especially MPRESS), but when using it together with the other three techniques, the accuracy is close to 100%.

Fourth, the proposed hybrid approach, which integrates the four techniques, detects 100% of the files of all packers except NSPack and shows no false positives. The reason for this high detection rate is that the proposed method has all the advantages of DP-SIG, DP-WR, and DP-ENT as well as DP-NEP. In the case of NSPack, however, the detection rate is merely 70.0%; this is because, as mentioned above, we forcibly remove the WRITE attribute from 30% of the files. According to the results of Table 5, the detection rate differs by detection technique, and some techniques incur false positives or false negatives; on the other hand, the proposed hybrid approach shows the highest detection rate of 98.4% and no false positives.

Table 5: Comparison of packing detection rates of three techniques and the proposed hybrid approach.

Packer              DP-SIG (%)   DP-WR (%)   DP-ENT (%)   Hybrid approach (%)
ASPack              100          70.0        91.5         100
NSPack              53.8         70.0        100          70.0
MPRESS              100          70.0        24.6         100
UPX                 100          70.0        100          100
Yoda's Protector    100          70.0        96.9         100
MEW                 100          70.0        100          100
BeRoEXEPacker       100          70.0        100          100
Packman             100          70.0        93.8         100
RLPack              100          70.0        99.3         100
PECompact           100          70.0        100          100
Petite              100          70.0        0.00         100
JDpack              100          70.0        5.62         100
Molebox             100          70.0        100          100
eXpressor           100          70.0        0.00         100
Yoda's Crypter      100          70.0        91.5         100
FSG                 100          70.0        100          100
exe32pack           100          0.00        100          100
WinUpack            100          70.0        100          100
Neolite             100          70.0        1.61         100
Average             95.0         66.3        73.9         98.4
Not-Packed          0.00         0.00        3.84         0.00
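The averages quoted for Table 5 can be checked with a few lines of arithmetic. The sketch below recomputes the DP-WR, DP-ENT, and hybrid averages as unweighted means over the 19 packers, which is reasonable here because every packer contributes the same number of files. Two points are assumptions: the function name `column_average` is hypothetical, and the decimal placement of the DP-ENT entries is the reading under which the column reproduces the printed average.

```python
# Per-packer detection rates (%) for the 19 packers, in the row order of Table 5.
dp_wr = [70.0] * 18 + [0.00]    # exe32pack is the single 0.00 entry
hybrid = [100.0] * 18 + [70.0]  # NSPack is the single 70.0 entry
dp_ent = [91.5, 100, 24.6, 100, 96.9, 100, 100, 93.8, 99.3, 100,
          0.00, 5.62, 100, 0.00, 91.5, 100, 100, 100, 1.61]


def column_average(rates):
    """Unweighted mean over packers, rounded to one decimal place."""
    return round(sum(rates) / len(rates), 1)


if __name__ == "__main__":
    for name, column in (("DP-WR", dp_wr), ("DP-ENT", dp_ent), ("Hybrid", hybrid)):
        print(name, column_average(column))
```

The hybrid average of 98.4% is therefore entirely explained by the single NSPack entry of 70.0%.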

Table 6 shows the detection accuracy of combinations of two or three detection techniques and of the proposed hybrid approach. First, DP-SIG ENT, a combination of signature and entropy tests, detects 100% of the packed files but detects 18 files incorrectly, showing a false-positive rate of 0.73%. Second, DP-ENT WR, a combination of entropy and WRITE tests, shows a low detection rate of 92.0% because the entropy value of MPRESS lies within the distribution of normal files. Third, DP-SIG ENT WR, a combination of signature, entropy, and WRITE tests, and our hybrid approach both show 98.4% detection accuracy and a 1.60% false-negative rate. This is because DP-SIG ENT WR and the hybrid approach go through the same detection procedure except for DP-NEP. The false-negative rate of 1.60% is due to the files having no WRITE attributes. According to the results of Tables 5 and 6, we believe that the proposed hybrid approach achieves high detection accuracy by combining the advantages of the existing and new detection techniques.

5.4. Experimental Results on Unpacking and Verification Phases. In this section, we verify the unpacking and verification phases by performing the unpacking of Algorithm 2 and calculating the restoration accuracy of the files through the actual restoration of Algorithm 3. However, the unpacking and verification phases take a lot of time, so we experiment with 40 randomly selected files of the dataset. The unpacking phase consists of static unpacking and dynamic unpacking. In the experiment, we exclude static unpacking since it uses the existing library as is, and we focus only on the dynamic unpacking of Algorithm 2 since it tries to restore the original file without any information about the packer.

Table 7 shows the restoration accuracy of the proposed method for the text and data sections for each of the 19 packers. We cannot directly unpack Yoda's Protector and PECompact since they use Anti-Debug engineering techniques. To overcome this problem, we unpack these packers by bypassing three anti-debugging API calls: GetCurrentProcessID(), BlockInput(), and IsDebuggerPresent() [41]. (That is, we run Algorithm 2 with the bypassing techniques for Yoda's Protector and PECompact and include the experimental result in Table 7.) In general, since only the text and data sections of the original file are packed [42], in the experiment we consider only the restoration accuracy of these text and data sections of the restored file. In Table 7, among the 19 packers, 11 packers, including UPX, NSPack, MEW, RLPack, and BeRoEXEPacker, show 100% restoration accuracy for both the text and data sections. On the other hand, 8 packers, including ASPack, Packman, and MPRESS, show less than 100% restoration accuracy because of the manner in which the packers handle the reloc section. More specifically, if there is a reloc section in the original file, the unpacking relocates the text and data sections, changing all the data values in each section. Therefore, the restoration accuracy depends on whether there is a reloc section in the original file.

Table 8 shows the restoration results in detail according to the presence of the reloc section. As shown in the table, the restoration accuracy is 100% in the absence of the reloc section, while in the presence of the reloc section it ranges from 0.00% to 99.8% and is very low for most packers. However, the low restoration accuracy is due to the nature of the packer, and it does not mean that unpacking does not work correctly. In other words, even if the unpacking is successful, the data values may change due to the packer's characteristics, and the restoration accuracy is therefore calculated to be low. In summary, we can use the proposed verification algorithm to calculate the restoration accuracy of unpacked files, which quantitatively measures the accuracy of the actual restoration.
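The effect of relocation on a byte-wise comparison can be illustrated with a toy example. The sketch below assumes the general PE mechanism in which a base relocation adds the load-address delta to a 32-bit absolute address stored in a section; it is not taken from the authors' code. Both `apply_relocations` and `matching_prefix_ratio` are hypothetical helpers, and the latter is a simplified stand-in for the verification algorithm that compares the two buffers from the same starting offset only.

```python
import struct


def apply_relocations(section: bytes, offsets, delta: int) -> bytes:
    """Add `delta` to each little-endian 32-bit value stored at the given offsets."""
    out = bytearray(section)
    for off in offsets:
        (value,) = struct.unpack_from("<I", out, off)
        struct.pack_into("<I", out, off, (value + delta) & 0xFFFFFFFF)
    return bytes(out)


def matching_prefix_ratio(original: bytes, restored: bytes) -> float:
    """Percentage of `original` that matches `restored` before the first differing byte."""
    matched = 0
    for a, b in zip(original, restored):
        if a != b:
            break
        matched += 1
    return 100.0 * matched / len(original)


if __name__ == "__main__":
    # Toy 16-byte "section": four code bytes, one absolute address, eight filler bytes.
    original = bytes([0x55, 0x8B, 0xEC, 0xA1]) + struct.pack("<I", 0x00403000) + b"\xC3" * 8
    relocated = apply_relocations(original, offsets=[4], delta=0x10000)
    print(matching_prefix_ratio(original, original))
    print(matching_prefix_ratio(original, relocated))
```

A single patched address early in the section is enough to cut the measured accuracy sharply, even though every instruction was restored correctly, which is consistent with the explanation given for Table 8.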

In addition to the text, data, and reloc sections, we further consider the PE header, which includes the IAT information. In general, the IAT is located in the header of the PE file, and if we actually need to run the unpacked file, we must also restore this header information, including the IAT. In fact, the studies in [3, 43] restore the header including the IAT and produce the actual whole executable file. On the other hand, most studies, including this paper, do not restore the actual whole executable file itself but restore the text, data, and reloc sections, which are major parts of the executable file. This is because the purpose of unpacking is not to execute the restored file but to identify and extract malicious code included in the restored file. Therefore, in this paper, we do not need to restore the header required for the actual execution, and accordingly a comparison of IAT information is also not necessary. Instead, we restore and compare the text, data, and reloc sections, which are more likely to hide malicious code. Restoring the PE header is beyond the scope of this study; we leave it as a separate research topic.

Table 6: Comparison of packing detection accuracy of two or three combinations and of the hybrid approach.

Metrics               DP-SIG ENT (%)   DP-ENT WR (%)   DP-SIG ENT WR (%)   Hybrid approach (%)
Detection accuracy    95.0             92.0            98.4                98.4
False-negative rate   5.00             8.00            1.60                1.60
False-positive rate   0.73             0.00            0.00                0.00

Table 7: Restoration accuracy of text and data sections of each packer.

Packer              text (%)   data (%)
UPX                 100        100
NSPack              100        100
MEW                 100        100
RLPack              100        100
BeRoEXE             100        100
Neolite             100        100
FSG                 100        100
eXpressor           100        100
Molebox             100        100
Petite              100        100
JDpack              100        100
ASPack              84.3       84.3
Yoda's Protector    84.0       84.0
exe32pack           80.8       100
MPRESS              78.8       75.3
Yoda's Crypter      74.5       74.5
PECompact           74.0       74.0
WinUpack            68.0       68.0
Packman             55.0       48.5
Average             89.2       90.2

6. Conclusions

In this paper, we analyzed recent studies of packing detection and unpacking techniques and derived four important observations: no phase integration, no detection combination, no real-restoration, and no unpacking verification. We then proposed an all-in-one unpacking system to address these four observations. First, no phase integration meant that analysts had to perform the packing detection, unpacking, and verification phases separately, and we solved this problem by presenting an all-in-one unpacking system that integrates these three phases in a single framework. Second, no detection combination meant that there had been no attempt to combine various packing detection methods, and we solved this problem by presenting a hybrid approach that combines existing and new packing detection methods. Third, no real-restoration meant that there had been little discussion about restoring the actual executable file, and we solved this problem by actually restoring an original image of the packed file. Fourth, no unpacking verification meant that there had been no explicit work to measure the restoration accuracy of unpacked files, and we solved this problem by proposing a verification algorithm that quantitatively evaluates the restoration accuracy.

We constructed a dataset of 2,600 PE files and experimentally evaluated the proposed system using the dataset. First, we performed all the phases of the all-in-one unpacking system sequentially and confirmed that its overall working mechanism worked well. Next, through the comparison of the hybrid packing detection approach and existing detection techniques, we showed that the detection accuracy of the proposed method was the highest at 98.4% on average with no false positives. For this, we also determined an entropy range of packing, Packing-Range, through extensive experiments on 19 representative packers. Finally, in the unpacking and verification phases, we confirmed that unpacking was successful for all files except the files packed with Yoda's Protector. We also showed that the restoration accuracy of unpacked files was nearly 100% except for a few packers.

As future research, we will focus on unpacking files packed with Anti-VM, Anti-Debug, and Anti-Reverse engineering techniques. We will also study how to compute the restoration accuracy of files having reloc sections.

Data Availability

All the data files used in the experiments are available at https://github.com/chesvectain/PackingData.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was partly supported by the Institute of Information and Communications Technology Planning and Evaluation (IITP) grant funded by the Korea government (MSIT) (no. 2017-0-00158, Development of Cyber Threat Intelligence (CTI) analysis and information sharing technology for national cyber incident response) and the Korea Electric Power Corporation (no. R18XA05).

Table 8: Restoration accuracy according to the presence of the reloc section.

Packer              text, with reloc (%)   text, without reloc (%)   data, with reloc (%)   data, without reloc (%)
ASPack              0.00                   100                       0.00                   100
Packman             99.8                   100                       88.1                   100
MPRESS              0.00                   100                       0.00                   100
Yoda's Crypter      0.00                   100                       0.04                   100
PECompact           0.00                   100                       0.00                   100
Yoda's Protector    0.00                   100                       0.05                   100
exe32pack           0.00                   100                       100                    100
WinUpack            0.00                   100                       0.00                   100

References

[1] M. Morgenstern, "AV-TEST Security report 2017/18," 2019, https://www.av-test.org/fileadmin/pdf/security_report/AV-TEST_Security_Report_2017-2018.pdf.

[2] T. Ban, R. Isawa, S. Guo, D. Inoue, and K. Nakao, "Efficient malware packer identification using support vector machines with spectrum kernel," in Proceedings of the Eighth Asia Joint Conference on Information Security, pp. 69–76, Seoul, South Korea, July 2013.

[3] B. Cheng, M. Jiang, J. Fu et al., "Towards paving the way for large-scale Windows malware analysis: generic binary unpacking with orders-of-magnitude performance boost," in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pp. 395–411, Toronto, Canada, October 2018.

[4] A. Gupta and M. S. Arya, "Hashing based encryption and anti-debugger support for packing multiple files into single executable," International Journal of Advanced Research in Computer Science, vol. 9, no. 1, pp. 914–920, 2018.

[5] Y. Kawakoya, M. Iwamura, and M. Itoh, "Memory behavior-based automatic malware unpacking in stealth debugging environment," in Proceedings of the 5th International Conference on Malicious and Unwanted Software, pp. 39–46, Lorraine, France, October 2010.

[6] G. Liang, J. Pang, Z. Shan, R. Yang, and Y. Chen, "Automatic benchmark generation framework for malware detection," Security and Communication Networks, vol. 2018, Article ID 4947695, 8 pages, 2018.

[7] A. Pektas and T. Acarman, "A dynamic malware analyzer against virtual machine aware malicious software," Security and Communication Networks, vol. 7, no. 12, pp. 2245–2257, 2014.

[8] I. Santos, J. Nieves, and P. G. Bringas, "Semi-supervised learning for unknown malware detection," in International Symposium on Distributed Computing and Artificial Intelligence, A. Abraham, J. M. Corchado, S. R. Gonzalez, and J. F. De Paz Santana, Eds., pp. 415–422, Springer, Berlin, Germany, 2011.

[9] X. Ugarte-Pedrero, I. Santos, B. Sanz, C. Laorden, and P. G. Bringas, "Countering entropy measure attacks on packed software detection," in Proceedings of the 9th International Conference on Consumer Communications and Networking, pp. 164–168, Las Vegas, NV, USA, January 2012.

[10] H. Won, S.-P. Kim, S. Lee, M.-J. Choi, and Y.-S. Moon, "Secure principal component analysis in multiple distributed nodes," Security and Communication Networks, vol. 9, no. 14, pp. 2348–2358, 2016.

[11] S. Cesare, Y. Xiang, and W. Zhou, "Malwise–an effective and efficient classification system for packed and polymorphic malware," IEEE Transactions on Computers, vol. 62, no. 6, pp. 1193–1206, 2013.

[12] C. Malin, E. Casey, and J. Aquilina, Malware Forensics: Investigating and Analyzing Malicious Code, Syngress, Burlington, MA, USA, 2008.

[13] W. Yan, Z. Zheng, and A. Nirwan, "Revealing packed malware," IEEE Security and Privacy, vol. 6, no. 5, pp. 65–69, 2008.

[14] M. Bat-Erdene, T. Kim, H. Park, and H. Lee, "Packer detection for multi-layer executables using entropy analysis," Entropy, vol. 19, no. 3, p. 125, 2017.

[15] Y.-S. Choi, I.-K. Kim, J.-T. Oh, and J.-C. Ryou, "PE file header analysis-based packed PE file detection technique (PHAD)," in Proceedings of International Symposium on Computer Science and Its Applications, pp. 28–31, Hobart, Australia, October 2008.

[16] S. Han and S. Lee, "Packed PE file detection for malware forensics," The KIPS Transactions: Part C, vol. 16C, no. 5, pp. 555–562, 2009, in Korean.

[17] R. Lyda and J. Hamrock, "Using entropy analysis to find encrypted and packed malware," IEEE Security and Privacy, vol. 5, no. 2, pp. 40–45, 2007.

[18] R. Perdisci, A. Lanzi, and W. Lee, "Classification of packed executables for accurate computer virus detection," Pattern Recognition Letters, vol. 29, no. 14, pp. 1941–1946, 2008.

[19] A. Rohit, A. Singh, H. Pareek, and U. Edara, "A heuristics-based static analysis approach for detecting packed PE binaries," International Journal of Security and Its Applications, vol. 7, no. 5, pp. 257–268, 2013.

[20] G. Amato, "PEframe," 2019, https://github.com/guelfoweb/peframe.

[21] G. Jeong, E. Choo, J. Lee, M. Bat-Erdene, and H. Lee, "Generic unpacking using entropy analysis," Journal of Advanced Information Technology and Convergence, vol. 7, no. 1, pp. 232–238, 2009.

[22] V. DeMarines, "Obfuscation–how to do it and how to crack it," Network Security, vol. 2008, no. 7, pp. 4–7, 2008.

[23] H. Neil, "PEiD," 2019, https://github.com/wolfram77web/app-peid.

[24] O. Yuschuk, "OllyDbg," 2019, http://www.ollydbg.de/download.html.

[25] N. Waisman, "Immunity Debugger," 2019, https://www.immunityinc.com/products/debugger/index.html.

[26] P. Royal, M. Halpin, D. Dagon, R. Edmonds, and W. Lee, "PolyUnpack: automating the hidden-code extraction of unpack-executing malware," in Proceedings of the 22nd Annual Computer Security Applications Conference, pp. 289–300, Miami Beach, FL, USA, December 2006.

[27] M. Bat-Erdene, Y. Park, H. Li, H. Lee, and M.-S. Choi, "Entropy analysis to classify unknown packing algorithms for malware detection," International Journal of Information Security, vol. 16, no. 3, pp. 227–248, 2017.

[28] S. Cesare and X. Yang, "Classification of malware using structured control flow," in Proceedings of the 8th Australasian Symposium on Parallel and Distributed Computing, pp. 61–70, Brisbane, Australia, January 2010.

[29] L. Martignoni, M. Christodorescu, and S. Jha, "OmniUnpack: fast, generic, and safe unpacking of malware," in Proceedings of the 23rd Annual Computer Security Applications Conference, pp. 431–441, Miami Beach, FL, USA, December 2007.

[30] M. G. Kang, P. Poosankam, and H. Yin, "Renovo: a hidden code extractor for packed executables," in Proceedings of the 5th ACM Workshop on Recurring Malcode, pp. 46–53, Alexandria, VA, USA, November 2007.

[31] S. Mariani, L. Fontana, F. Gritti, and S. D'Alessio, "PinDemonium: a DBI-based generic unpacker for windows executables," in Proceedings of the Black Hat 2016, Las Vegas, NV, USA, July 2016.

[32] G. Bonfante, J. Fernandez, J.-Y. Marion, B. Rouxel, F. Sabatier, and A. Thierry, "CoDisasm: medium scale concatic disassembly of self-modifying binaries with overlapping instructions," in Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 745–756, Denver, CO, USA, October 2016.

[33] M. Oberhumer, L. Molnar, and J. Reiser, "UPX," 2019, https://upx.github.io.

[34] A. Swinnen and A. Mesbahi, "One packer to rule them all: empirical identification, comparison, and circumvention of current antivirus detection techniques," in Proceedings of the Black Hat 2014, Las Vegas, NV, USA, August 2014.

[35] P. Bania, "Generic unpacking of self-modifying, aggressive, packed binary programs," 2009, https://arxiv.org/abs/0905.4581.

[36] H. Neil, "BobSoft Signature," 2019, http://woodmann.com/BobSoft/Files/Other/UserDB.zip.

[37] M. Russinovich, "Microsoft Sysinternals Suite," 2019, https://download.sysinternals.com/files/SysinternalsSuite.zip.

[38] R. Rivest, "The MD5 message-digest algorithm," RFC 1321, IETF, Fremont, CA, USA, 1992.

[39] S. Tatham, "Putty," 2019, https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html.

[40] L. Pascoe, "MD5," 2019, http://www.md5summer.org/about.html.

[41] P. Ferrie, "Anti-unpacker tricks - part one," 2019, https://www.virusbulletin.com/virusbulletin/2008/12/anti-unpacker-tricks-part-one.

[42] T.-Y. Wang and C.-H. Wu, "Detection of packed executables using support vector machines," in Proceedings of the 10th International Conference on Machine Learning and Cybernetics, pp. 717–722, Guilin, China, July 2011.

[43] D. Korczynski, "RePEconstruct: reconstructing binaries with self-modifying code and import address table destruction," in Proceedings of the 11th International Conference on Malicious and Unwanted Software, pp. 31–38, Fajardo, Puerto Rico, October 2016.



files may infect the physical server. Thus, we perform unpacking first in the virtual server and then perform the experiment in the physical server only when there is no abnormality. The hardware specifications of the experimental platform are as follows. The virtual server is an Intel Core i7-6700 3.40 GHz with 8 GB RAM, and the physical server is an Intel Core i7-6700 3.40 GHz with 32 GB RAM. The target operating system is Windows 7 (32-bit), and we use PEVIEW to view the structure and content of 32-bit PE files. As the debugger, we use Immunity Debugger 1.85 [25] to implement the dynamic unpacking phase of Algorithm 2. To implement the all-in-one unpacking system, including the dynamic unpacking algorithm, we run Python scripts on top of Immunity Debugger.

Figure 3: Comparison of sample bytecodes of the original and restored files. (a) Sections of the original file (text, rdata, data, and rsrc sections). (b) Sections of the restored file (UPX0, UPX1, UPX1, and rsrc sections).

Input: A, an original file; B, a restored file
Output: restorationAccuracy[1..n], a list of restoration accuracy values for n sections
begin
(1) Remove garbage and padding values from B;
(2) A ⟵ {A1, A2, ..., An}, where Ai is the i-th section of A;
(3) for each section Ai ∈ A do
(4)     totalLen ⟵ Ai.length;
(5)     curLen ⟵ totalLen;
(6)     while curLen > 0 do
(7)         H1 ⟵ hash(Ai[1 : curLen]); // hash a part of section Ai
(8)         for j ⟵ 1 to (B.length − curLen + 1) do
(9)             H2 ⟵ hash(B[j : j + curLen − 1]); // hash a part of B
(10)            if H1 = H2 then break;
(11)            else curLen ⟵ curLen − 1;
(12)        end-for
(13)    end-while
(14)    restorationAccuracy[i] ⟵ (curLen / totalLen) × 100;
(15) end-for
(16) return restorationAccuracy;
end

Algorithm 3: Restoration rate calculation algorithm.
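One reading of Algorithm 3 is that, for each section of the original file, it looks for the longest leading part of that section that appears somewhere in the restored file and reports its length as a percentage. The sketch below follows that reading under clearly stated assumptions: it replaces the hash comparison with a direct substring search, it uses binary search instead of shrinking the length one byte at a time, and it omits step (1), the removal of garbage and padding. The names `longest_prefix_found` and `restoration_accuracy` are hypothetical.

```python
def longest_prefix_found(section: bytes, restored: bytes) -> int:
    """Length of the longest prefix of `section` that occurs contiguously in `restored`."""
    lo, hi = 0, len(section)
    # If a prefix of length k occurs in `restored`, every shorter prefix occurs too,
    # so the search predicate is monotonic and binary search is valid.
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if section[:mid] in restored:
            lo = mid
        else:
            hi = mid - 1
    return lo


def restoration_accuracy(sections: dict[str, bytes], restored: bytes) -> dict[str, float]:
    """Per-section restoration accuracy in percent, keyed by section name."""
    result = {}
    for name, data in sections.items():
        if not data:
            result[name] = 0.0  # an empty section has nothing to match
            continue
        result[name] = 100.0 * longest_prefix_found(data, restored) / len(data)
    return result


if __name__ == "__main__":
    original = {"text": b"ABCDEFGH", "data": b"12345678"}
    restored = b"\x00\x00ABCDEFGH\x00\x001234XXXX"
    print(restoration_accuracy(original, restored))
```

In the usage example, the text section is found in full, while only the first half of the data section appears in the restored bytes.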

In the experiment, we use a total of 130 PE files, of which 90 files are provided by Microsoft [37] and 40 files are randomly collected from the putty [39] and MD5 [40] sites. We pack each of the 130 executable files with each of the 19 packers and construct a dataset consisting of a total of 2,600 files: 2,470 packed files and the 130 original files. The 19 packers used are UPX, ASPack, NSPack, MPRESS, Yoda's Protector, MEW, Packman, RLPack, BeRoEXEPacker, PECompact, Petite, JDpack, Molebox, eXpressor, Yoda's Crypter, FSG, exe32pack, WinUpack, and Neolite. The reason for constructing the dataset by packing normal PE files ourselves is that we can quantitatively measure the packing detection accuracy, the false-positive rate, and the false-negative rate only if we know in advance which experimental files are packed.

In this paper, we conduct three experiments. The first experiment concerns the integration feature of the all-in-one system and verifies that the system supports all three phases of packing detection, unpacking, and verification in an integrated framework. The second experiment concerns the detection phase: we first set Packing-Range, an entropy range for judging whether a file is packed, and then evaluate the packing detection accuracy using the range. The third experiment concerns the unpacking and verification phases: we verify whether the system unpacks the PE file correctly by calculating the restoration accuracy.

5.2. Experimental Results on Overall Working Mechanism. In this section, we confirm through experiments that the integration mechanism of the all-in-one unpacking system works correctly. Figure 4 shows operation screenshots of each phase of the all-in-one system. First, Figure 4(a) shows an example screenshot of the detection phase. To confirm that all four cases of packing detection work correctly, we continue executing the remaining cases even if a previous case has already detected the packing. In Figure 4(a), we first examine DP-NEP of Algorithm 1. DP-NEP finds the EP section, so we proceed to DP-SIG (the first two lines). The DP-SIG test detects the UPX packer, so we conclude that the file is packed (the second two lines). Even though detection is complete, we proceed to the DP-ENT and DP-WR tests. The entropy value is 7.93, which is contained in Packing-Range, and the attribute value of the EP section is “0xe0000040,” which includes the WRITE attribute. Thus, we conclude that the file is also packed by the DP-ENT and DP-WR tests (the last three lines). According to Figure 4(a), we can confirm that the DP-NEP, DP-SIG, DP-ENT, and DP-WR tests of the detection phase all work correctly.
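
To make the DP-WR observation concrete, the following sketch decodes the attribute value 0xe0000040 reported above. The flag constants are the standard section characteristics of the PE/COFF format; the function names are hypothetical and the code illustrates only the bit test, not the detection code used in the system.

```python
# Section characteristic flags defined by the PE/COFF format.
IMAGE_SCN_CNT_INITIALIZED_DATA = 0x00000040
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000

FLAG_NAMES = {
    IMAGE_SCN_CNT_INITIALIZED_DATA: "INITIALIZED_DATA",
    IMAGE_SCN_MEM_EXECUTE: "MEM_EXECUTE",
    IMAGE_SCN_MEM_READ: "MEM_READ",
    IMAGE_SCN_MEM_WRITE: "MEM_WRITE",
}


def has_write_attribute(characteristics: int) -> bool:
    """Return True if the section is marked writable."""
    return bool(characteristics & IMAGE_SCN_MEM_WRITE)


def decode_flags(characteristics: int) -> list[str]:
    """List the names of the known flags set in the value, lowest bit first."""
    return [name for bit, name in sorted(FLAG_NAMES.items())
            if characteristics & bit]


ep_section_attributes = 0xE0000040  # value shown in Figure 4(a)
print(decode_flags(ep_section_attributes))
print(has_write_attribute(ep_section_attributes))  # True
```

The value combines the execute, read, and write permissions with the initialized-data flag, so an EP section carrying it is both executable and writable, which is the condition DP-WR looks for.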

Figures 4(b) and 4(c) show example screenshots of the unpacking phase. If an unpacking library exists for the packer identified in the detection phase, we perform static unpacking as shown in Figure 4(b); if not, we perform dynamic unpacking as shown in Figure 4(c). Figure 4(b) shows the result of static unpacking of a UPX-packed file by the existing UPX tool [33]. In the figure, the file size increases from 285 760 bytes to 731 200 bytes by unpacking, which means that the compression ratio is 39.08% (285 760 / 731 200 ≈ 0.3908). Figure 4(c) shows the result of the entropy-based dynamic unpacking. In the figure, the system calculates the entropy of each section; when the instruction pointer (IP) encounters a JMP or CJMP instruction, the system returns the current DstAddr as the OEP if the entropy change is stable and DstAddr lies in a different section.

Figure 4(d) shows an example screenshot of the verification phase. In this example, we measure the restoration accuracy of the restored file with respect to the .text and .data sections of the original file. The length of the original .text section is 18 439, and the length of the unpacked file is 872 555. In the hash value comparison, since the portion corresponding to the original .text section exists in the unpacked file, the restoration accuracy becomes 100%. Similarly, we also calculate the restoration accuracy of the .data section as 100%.

5.3. Experimental Results on Detection Phase. In this section, we describe two experimental results of the detection phase. The first experiment sets Packing-Range, and the second measures the detection accuracy of the proposed detection algorithm. In the first experiment, we experimentally determine the Packing-Range used in DP-ENT of Algorithm 1. The previous study by Lyda and Hamrock [17] uses the entropy value of the whole file and sets the entropy range of the packed file to [6.677, 6.926]. In contrast, Han and Lee [16] focus on the EP section only and compute the entropy value of the EP section instead of the whole file. Through their experiments, they set the entropy threshold to 6.85 and judge a file to be packed if its entropy is over 6.85.

We also determine the entropy range, Packing-Range, through experiments on various packers. Figure 5 shows the entropy distribution of the EP section for different packers. Figure 5(a) shows the average distribution of the original and 19 packed files. Figures 5(b)–5(m) show the more detailed entropy distribution for each packer. According to Figures 5(b)–5(m), the entropy distribution of the packed file differs depending on the packer type. In particular, as shown in Figures 5(d), 5(e), and 5(k), some packed files have smaller entropy values than the original file, and thus we cannot simply set a single threshold above which a file is judged as packed, as in the previous study [16]. However, we note that while the entropy of packed files is distributed over a wide range, as shown in Figures 5(c)–5(m), the entropy of original files is distributed only in a specific range, as shown in Figure 5(b). More specifically, the entropy value of the normal file is between 6.00 and 6.85, as shown in Figure 5(b). Based on these experimental results, we thus set Packing-Range to [0, 6.00) or [6.85, ∞) for the entropy of the EP section. According to [9, 29], the entropy method has the disadvantages of low accuracy and of being easy to evade. To overcome this point, we use the three other detection techniques, DP-NEP, DP-SIG, and DP-WR, together with DP-ENT in a hybrid manner. Using this hybrid approach, we can further improve the detection accuracy.
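
The DP-ENT test with this Packing-Range can be sketched as follows. This is a minimal sketch, assuming the entropy is the byte-level Shannon entropy (in bits per byte, so between 0 and 8) of the raw EP-section bytes; the function and constant names are hypothetical.

```python
import math
from collections import Counter

NORMAL_LOW = 6.00   # lower bound of the entropy range of normal files
NORMAL_HIGH = 6.85  # upper bound of the entropy range of normal files


def shannon_entropy(data: bytes) -> float:
    """Byte-level Shannon entropy in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def in_packing_range(entropy: float) -> bool:
    """True if the entropy lies in [0, 6.00) or [6.85, inf)."""
    return entropy < NORMAL_LOW or entropy >= NORMAL_HIGH


print(in_packing_range(7.93))  # True: the value reported for Figure 4(a)
print(in_packing_range(6.50))  # False: inside the range of normal files
```

A section whose 256 byte values all occur equally often reaches the maximum entropy of 8, and a section filled with a single byte value has entropy 0; both fall in Packing-Range.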

In the second experiment, we evaluate the detection accuracy and the false-negative and false-positive rates. Table 5 compares the three detection techniques of the detection phase with the proposed hybrid approach. Since EP sections are mandatory in all PE files, we can detect EP sections unless they are intentionally concealed. All files of the dataset used in the experiment also have EP sections, so the detection rate by DP-NEP is trivially 0%. In other words, no file is detected as packed because of a missing EP section, so we exclude the experimental result of DP-NEP from Table 5.

Figure 4: Operation screenshots of detection, unpacking, and verification phases. (a) Detection phase. (b) Unpacking phase (static unpacking). (c) Unpacking phase (dynamic unpacking). (d) Verification phase.

Figure 5: Entropy distributions of the EP section for original and representative packed files (x-axis: entropy of EP section, from 0 to 8; y-axis: number of files). (a) Average of original and 19 packers, plotting the not-packed files of (b) against the summation of (c) to (m). (b) Original (not packed). (c) UPX. (d) ASPack. (e) NSPack. (f) MPRESS. (g) Yoda’s Protector. (h) RLPack. (i) BeRoEXE. (j) MEW. (k) PACKMAN. (l) WinUpack. (m) exe32pack.

We now explain the results of Table 5 column by column. First, we explain the DP-SIG column. Not-Packed shows a detection rate of 0% since no packer signature exists in normal files. That is, no normal file is detected as packed, which means that DP-SIG judges Not-Packed files accurately. On the other hand, for the packed files, DP-SIG detects 100% of the files for all packers except NSPack. The detection rate of NSPack is merely 5.38% because the signatures of NSPack often exist in sections other than the EP section. In summary, DP-SIG shows a detection accuracy of 95.0% on average.

Second, for the experiment on the detection rate of DP-WR, we intentionally remove the WRITE attribute from 39 files (30% of the packed files) per packer. We perform this removal for the 18 packers other than exe32pack. We change the dataset in this way because normally packed files usually have the WRITE attribute, so DP-WR would otherwise detect nearly all of them. Looking at the DP-WR column in Table 5, DP-WR detects only 70% of the files per packer for the 18 packers since we intentionally remove the WRITE attribute from 30% of the files. In the case of exe32pack, however, we need not perform the removal since its packed files have no WRITE attribute in the PE header. Accordingly, as shown in the DP-WR column of Table 5, the detection rate of exe32pack by DP-WR is trivially 0%. In the case of Not-Packed, the detection rate is 0% since normal files have no WRITE attribute. In summary, checking the presence of the WRITE attribute is very simple, and DP-WR works correctly if the packed file has the WRITE attribute, but it does not work well if the WRITE attribute does not exist or has been maliciously modified.

Third, the DP-ENT column shows the entropy-based detection rates. As shown in the table, DP-ENT shows a 100% detection rate for 9 packers, including NSPack, UPX, MEW, and BeRoEXEPacker, but it incurs some false negatives for 10 packers, including ASPack, Yoda’s Protector, Packman, and MPRESS, for which the detection rate is lower than 100%. In particular, for MPRESS, Petite, JDpack, eXpressor, and Neolite, it shows a low detection rate of 24.6% or less because, as a characteristic of these packers, the entropy values of their packed files fall within the entropy distribution of normal files. DP-ENT also has a false-positive rate of 3.84% for Not-Packed. In summary, DP-ENT works very accurately for most packers, but it may incur false negatives for a few specific packers or false positives for some normal files. To overcome this limitation, we use DP-ENT together with the other detection techniques. We can see the improvement in the experimental results of Tables 5 and 6. In the tables, when using only DP-ENT, the detection accuracy is low for some packers (especially MPRESS), but when using it together with the other three techniques, the accuracy is close to 100%.

Fourth, the proposed hybrid approach, which integrates the four techniques, detects 100% of the files for all packers except NSPack and shows no false positives. The reason for this high detection rate is that the proposed method has all the advantages of DP-SIG, DP-WR, and DP-ENT as well as DP-NEP. In the case of NSPack, however, the detection rate is merely 70.0%; this is because, as mentioned above, we forcibly remove the WRITE attribute from 30% of the files. According to the results of Table 5, the detection rate differs for each detection technique, and some techniques incur false positives or false negatives; on the other hand, the proposed hybrid approach shows the highest detection rate of 98.4% and no false positives.

Table 5: Comparison of packing detection rates of three techniques and the proposed hybrid approach.

Packer             DP-SIG (%)   DP-WR (%)   DP-ENT (%)   Hybrid approach (%)
ASPack             100          70.0        91.5         100
NSPack             5.38         70.0        100          70.0
MPRESS             100          70.0        24.6         100
UPX                100          70.0        100          100
Yoda’s Protector   100          70.0        96.9         100
MEW                100          70.0        100          100
BeRoEXEPacker      100          70.0        100          100
Packman            100          70.0        93.8         100
RLPack             100          70.0        99.3         100
PECompact          100          70.0        100          100
Petite             100          70.0        0.00         100
JDpack             100          70.0        5.62         100
Molebox            100          70.0        100          100
eXpressor          100          70.0        0.00         100
Yoda’s Crypter     100          70.0        91.5         100
FSG                100          70.0        100          100
exe32pack          100          0.00        100          100
WinUpack           100          70.0        100          100
Neolite            100          70.0        1.61         100
Average            95.0         66.3        73.9         98.4
Not-Packed         0.00         0.00        3.84         0.00
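
The averages reported for DP-WR and the hybrid approach follow directly from the per-packer rates, as the short check below illustrates. It assumes, as the Average row of Table 5 suggests, that the average is the unweighted mean over the 19 packers; the helper name is hypothetical.

```python
def mean_rate(rates: list[float]) -> float:
    """Unweighted mean of per-packer detection rates (in percent)."""
    return sum(rates) / len(rates)


# Hybrid approach: 18 packers at 100% and NSPack at 70.0%.
hybrid = [100.0] * 18 + [70.0]
# DP-WR: 18 packers at 70.0% and exe32pack at 0.00%.
dp_wr = [70.0] * 18 + [0.0]

print(round(mean_rate(hybrid), 1))  # 98.4
print(round(mean_rate(dp_wr), 1))   # 66.3
```

Because every packer contributes the same number of files, this per-packer mean equals the detection rate over all packed files.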

Table 6 shows the detection accuracy of two or three combinations of detection techniques and of the proposed hybrid approach. First, DP-SIG ENT, a combination of the signature and entropy tests, detects 100% of the packed files but detects 18 files incorrectly, showing a false-positive rate of 0.73%. Second, DP-ENT WR, a combination of the entropy and WRITE tests, shows a low detection rate of 92.0% because the entropy value of MPRESS is in the distribution of normal files. Third, DP-SIG ENT WR, a combination of the signature, entropy, and WRITE tests, and our hybrid approach both show 98.4% detection accuracy and a 1.60% false-negative rate. This is because DP-SIG ENT WR and the hybrid approach go through the same detection procedure except for DP-NEP. The false-negative rate of 1.60% is due to the files having no WRITE attribute. According to the results of Tables 5 and 6, we believe that the proposed hybrid approach achieves high detection accuracy by combining the advantages of the existing and new detection techniques.

5.4. Experimental Results on Unpacking and Verification Phases. In this section, we verify the unpacking and verification phases by performing the unpacking of Algorithm 2 and calculating the restoration accuracy of the files with Algorithm 3. However, the unpacking and verification phases take a lot of time, so we experiment with 40 randomly selected files of the dataset. The unpacking phase consists of static unpacking and dynamic unpacking. In the experiment, we exclude static unpacking since it uses the existing library as it is and focus only on the dynamic unpacking of Algorithm 2 since it tries to restore the original file without any information about the packer.

Table 7 shows the restoration accuracy of the proposed method for the .text and .data sections for each of the 19 packers. We cannot directly unpack Yoda’s Protector and PECompact since they use anti-debugging techniques. To overcome this problem, we unpack files packed by these packers by bypassing three anti-debugging API calls: GetCurrentProcessId(), BlockInput(), and IsDebuggerPresent() [41]. (That is, we run Algorithm 2 with the bypassing techniques for Yoda’s Protector and PECompact and include the experimental results in Table 7.) In general, only the .text and .data sections of the original file are packed [42], so in the experiment we consider only the restoration accuracy of these .text and .data sections of the restored file. In Table 7, among the 19 packers, 11 packers, including UPX, NSPack, MEW, RLPack, and BeRoEXEPacker, show 100% restoration accuracy for both the .text and .data sections. On the other hand, 8 packers, including ASPack, Packman, and MPRESS, show less than 100% restoration accuracy because of the manner in which the packers handle the .reloc section. More specifically, if there is a .reloc section in the original file, the unpacking relocates the .text and .data sections, changing the data values in each section. Therefore, the restoration accuracy depends on whether there is a .reloc section in the original file.

Table 8 shows the restoration results in detail according to the presence of the .reloc section. As shown in the table, the restoration accuracy is 100% in the absence of the .reloc section, while it is very low, in the range of 0.00% to 9.98%, in the presence of the .reloc section. However, the low restoration accuracy is due to the nature of the packer, and it does not mean that unpacking does not work correctly. In other words, even if the unpacking is successful, the data values may change because of the packer characteristics, and the restoration accuracy is then calculated to be low. In summary, we can use the proposed verification algorithm to calculate the restoration accuracy of unpacked files, which quantitatively measures the accuracy of the actual restoration.
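
Why relocation lowers the measured accuracy even when unpacking succeeds can be illustrated with a toy example. The sketch below is not the relocation logic of any packer: it assumes a hypothetical 16-byte section that stores one absolute 32-bit address, patches that address by a hypothetical load-address delta, and then measures how many leading bytes still match, which is a simplified, same-offset stand-in for the prefix comparison of Algorithm 3.

```python
import struct


def apply_relocations(section: bytes, offsets: list[int], delta: int) -> bytes:
    """Add `delta` to each little-endian 32-bit address stored at `offsets`."""
    patched = bytearray(section)
    for off in offsets:
        (addr,) = struct.unpack_from("<I", patched, off)
        struct.pack_into("<I", patched, off, (addr + delta) & 0xFFFFFFFF)
    return bytes(patched)


def matching_prefix_percent(original: bytes, restored: bytes) -> float:
    """Percentage of leading bytes of `original` reproduced by `restored`."""
    matched = 0
    for a, b in zip(original, restored):
        if a != b:
            break
        matched += 1
    return 100.0 * matched / len(original)


# Hypothetical section: 4 filler bytes, one absolute address, 8 filler bytes.
original = b"\x90" * 4 + struct.pack("<I", 0x00401000) + b"\x90" * 8
relocated = apply_relocations(original, offsets=[4], delta=0x00010000)

print(matching_prefix_percent(original, original))   # 100.0
print(matching_prefix_percent(original, relocated))  # 37.5
```

Only one byte of the section changes, yet the matching prefix stops at that byte, so the measured accuracy drops to 37.5%. A section with many relocated addresses near its start would score close to 0% in the same way.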

In addition to the .text, .data, and .reloc sections, we further consider the PE header, which includes the IAT information. In general, the IAT is located in the header of the PE file, and if we actually need to run the unpacked file, we must also restore this header information including the IAT. In fact, the studies in [3, 43] restore the header including the IAT and produce the actual whole executable file. On the other hand, most studies, including this paper, do not restore the actual whole executable file itself but restore the .text, .data, and .reloc sections, which are the major parts of the executable file. This is because the purpose of unpacking is not to execute the restored file but to identify and extract the malicious code included in the restored file. Therefore, in this paper, we do not need to restore the header required for actual execution, and accordingly a comparison of the IAT information is also not necessary. Instead, we restore and compare the .text, .data, and .reloc sections, which are more likely to hide malicious code. Restoring the PE header is beyond the scope of this study; we leave it as a separate research topic.

Table 6: Comparison of packing detection accuracy of two or three combinations and of the hybrid approach.

Metrics               DP-SIG ENT (%)   DP-ENT WR (%)   DP-SIG ENT WR (%)   Hybrid approach (%)
Detection accuracy    95.0             92.0            98.4                98.4
False-negative rate   5.00             8.00            1.60                1.60
False-positive rate   0.73             0.00            0.00                0.00

Table 7: Restoration accuracy of .text and .data sections of each packer.

Packer             .text (%)   .data (%)
UPX                100         100
NSPack             100         100
MEW                100         100
RLPack             100         100
BeRoEXE            100         100
Neolite            100         100
FSG                100         100
eXpressor          100         100
Molebox            100         100
Petite             100         100
JDpack             100         100
ASPack             84.3        84.3
Yoda’s Protector   84.0        84.0
exe32pack          80.8        100
MPRESS             78.8        75.3
Yoda’s Crypter     74.5        74.5
PECompact          74.0        74.0
WinUpack           68.0        68.0
Packman            55.0        48.5
Average            89.2        90.2

6. Conclusions

In this paper, we analyzed recent studies of packing detection and unpacking techniques and derived four important observations: no phase integration, no detection combination, no real-restoration, and no unpacking verification. We then proposed an all-in-one unpacking system to address these four problems. First, no phase integration meant that analysts had to perform the packing detection, unpacking, and verification phases separately, and we solved this problem by presenting an all-in-one unpacking system that integrates these three phases in a single framework. Second, no detection combination meant that there had been no attempt to combine various packing detection methods, and we solved this problem by presenting a hybrid approach that combines existing and new packing detection methods. Third, no real-restoration meant that there had been little discussion about restoring the actual executable file, and we solved this problem by actually restoring an original image of the packed file. Fourth, no unpacking verification meant that there had been no explicit work to measure the restoration accuracy of unpacked files, and we solved this problem by proposing a verification algorithm that quantitatively evaluates the restoration accuracy.

We constructed a dataset of 2600 PE files and experimentally evaluated the proposed system using the dataset. First, we performed all the phases of the all-in-one unpacking system sequentially and confirmed that its overall working mechanism worked well. Next, through the comparison of the hybrid packing detection approach and existing detection techniques, we showed that the detection accuracy of the proposed method was the highest at 98.4% on average with no false positives. For this, we also determined an entropy range of packing, Packing-Range, through extensive experiments on 19 representative packers. Finally, in the unpacking and verification phases, we confirmed that unpacking was successful for all files except the files packed with Yoda’s Protector. We also showed that the restoration accuracy of unpacked files was nearly 100% except for a few packers.

As future research, we will focus on unpacking files packed with Anti-VM, Anti-Debug, and Anti-Reverse engineering techniques. We will also study how to compute the restoration accuracy of files having .reloc sections.

Data Availability

All the data files used in the experiments are available at https://github.com/chesvectain/PackingData.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was partly supported by the Institute of Information and Communications Technology Planning and Evaluation (IITP) grant funded by the Korea government (MSIT) (no. 2017-0-00158, Development of Cyber Threat Intelligence (CTI) analysis and information sharing technology for national cyber incident response) and the Korea Electric Power Corporation (no. R18XA05).

Table 8: Restoration accuracy according to the presence of the .reloc section.

Packer             .text (%)                      .data (%)
                   with .reloc   without .reloc   with .reloc   without .reloc
ASPack             0.00          100              0.00          100
Packman            9.98          100              8.81          100
MPRESS             0.00          100              0.00          100
Yoda’s Crypter     0.00          100              0.04          100
PECompact          0.00          100              0.00          100
Yoda’s Protector   0.00          100              0.05          100
exe32pack          0.00          100              100           100
WinUpack           0.00          100              0.00          100

References

[1] M. Morgenstern, “AV-TEST Security report 2017/18,” 2019, https://www.av-test.org/fileadmin/pdf/security_report/AV-TEST_Security_Report_2017-2018.pdf.

[2] T. Ban, R. Isawa, S. Guo, D. Inoue, and K. Nakao, “Efficient malware packer identification using support vector machines with spectrum kernel,” in Proceedings of the Eighth Asia Joint Conference on Information Security, pp. 69–76, Seoul, South Korea, July 2013.

[3] B. Cheng, M. Jiang, J. Fu et al., “Towards paving the way for large-scale Windows malware analysis: generic binary unpacking with orders-of-magnitude performance boost,” in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pp. 395–411, Toronto, Canada, October 2018.

[4] A. Gupta and M. S. Arya, “Hashing based encryption and anti-debugger support for packing multiple files into single executable,” International Journal of Advanced Research in Computer Science, vol. 9, no. 1, pp. 914–920, 2018.

[5] Y. Kawakoya, M. Iwamura, and M. Itoh, “Memory behavior-based automatic malware unpacking in stealth debugging environment,” in Proceedings of the 5th International Conference on Malicious and Unwanted Software, pp. 39–46, Lorraine, France, October 2010.

[6] G. Liang, J. Pang, Z. Shan, R. Yang, and Y. Chen, “Automatic benchmark generation framework for malware detection,” Security and Communication Networks, vol. 2018, Article ID 4947695, 8 pages, 2018.

[7] A. Pektas and T. Acarman, “A dynamic malware analyzer against virtual machine aware malicious software,” Security and Communication Networks, vol. 7, no. 12, pp. 2245–2257, 2014.

[8] I. Santos, J. Nieves, and P. G. Bringas, “Semi-supervised learning for unknown malware detection,” in International Symposium on Distributed Computing and Artificial Intelligence, A. Abraham, J. M. Corchado, S. R. Gonzalez, and J. F. De Paz Santana, Eds., pp. 415–422, Springer, Berlin, Germany, 2011.

[9] X. Ugarte-Pedrero, I. Santos, B. Sanz, C. Laorden, and P. G. Bringas, “Countering entropy measure attacks on packed software detection,” in Proceedings of the 9th International Conference on Consumer Communications and Networking, pp. 164–168, Las Vegas, NV, USA, January 2012.

[10] H. Won, S.-P. Kim, S. Lee, M.-J. Choi, and Y.-S. Moon, “Secure principal component analysis in multiple distributed nodes,” Security and Communication Networks, vol. 9, no. 14, pp. 2348–2358, 2016.

[11] S. Cesare, Y. Xiang, and W. Zhou, “Malwise–an effective and efficient classification system for packed and polymorphic malware,” IEEE Transactions on Computers, vol. 62, no. 6, pp. 1193–1206, 2013.

[12] C. Malin, E. Casey, and J. Aquilina, Malware Forensics: Investigating and Analyzing Malicious Code, Syngress, Burlington, MA, USA, 2008.

[13] W. Yan, Z. Zheng, and A. Nirwan, “Revealing packed malware,” IEEE Security and Privacy, vol. 6, no. 5, pp. 65–69, 2008.

[14] M. Bat-Erdene, T. Kim, H. Park, and H. Lee, “Packer detection for multi-layer executables using entropy analysis,” Entropy, vol. 19, no. 3, p. 125, 2017.

[15] Y.-S. Choi, I.-K. Kim, J.-T. Oh, and J.-C. Ryou, “PE file header analysis-based packed PE file detection technique (PHAD),” in Proceedings of International Symposium on Computer Science and Its Applications, pp. 28–31, Hobart, Australia, October 2008.

[16] S. Han and S. Lee, “Packed PE file detection for malware forensics,” The KIPS Transactions: Part C, vol. 16C, no. 5, pp. 555–562, 2009, in Korean.

[17] R. Lyda and J. Hamrock, “Using entropy analysis to find encrypted and packed malware,” IEEE Security and Privacy, vol. 5, no. 2, pp. 40–45, 2007.

[18] R. Perdisci, A. Lanzi, and W. Lee, “Classification of packed executables for accurate computer virus detection,” Pattern Recognition Letters, vol. 29, no. 14, pp. 1941–1946, 2008.

[19] A. Rohit, A. Singh, H. Pareek, and U. Edara, “A heuristics-based static analysis approach for detecting packed PE binaries,” International Journal of Security and Its Applications, vol. 7, no. 5, pp. 257–268, 2013.

[20] G. Amato, “PEframe,” 2019, https://github.com/guelfoweb/peframe.

[21] G. Jeong, E. Choo, J. Lee, M. Bat-Erdene, and H. Lee, “Generic unpacking using entropy analysis,” Journal of Advanced Information Technology and Convergence, vol. 7, no. 1, pp. 232–238, 2009.

[22] V. DeMarines, “Obfuscation–how to do it and how to crack it,” Network Security, vol. 2008, no. 7, pp. 4–7, 2008.

[23] H. Neil, “PEiD,” 2019, https://github.com/wolfram77web/app-peid.

[24] O. Yuschuk, “OllyDbg,” 2019, http://www.ollydbg.de/download.html.

[25] N. Waisman, “Immunity Debugger,” 2019, https://www.immunityinc.com/products/debugger/index.html.

[26] P. Royal, M. Halpin, D. Dagon, R. Edmonds, and W. Lee, “PolyUnpack: automating the hidden-code extraction of unpack-executing malware,” in Proceedings of the 22nd Annual Computer Security Applications Conference, pp. 289–300, Miami Beach, FL, USA, December 2006.

[27] M. Bat-Erdene, Y. Park, H. Li, H. Lee, and M.-S. Choi, “Entropy analysis to classify unknown packing algorithms for malware detection,” International Journal of Information Security, vol. 16, no. 3, pp. 227–248, 2017.

[28] S. Cesare and X. Yang, “Classification of malware using structured control flow,” in Proceedings of the 8th Australasian Symposium on Parallel and Distributed Computing, pp. 61–70, Brisbane, Australia, January 2010.

[29] L. Martignoni, M. Christodorescu, and S. Jha, “OmniUnpack: fast, generic, and safe unpacking of malware,” in Proceedings of the 23rd Annual Computer Security Applications Conference, pp. 431–441, Miami Beach, FL, USA, December 2007.

[30] M. G. Kang, P. Poosankam, and H. Yin, “Renovo: a hidden code extractor for packed executables,” in Proceedings of the 5th ACM Workshop on Recurring Malcode, pp. 46–53, Alexandria, VA, USA, November 2007.

[31] S. Mariani, L. Fontana, F. Gritti, and S. D’Alessio, “PinDemonium: a DBI-based generic unpacker for windows executables,” in Proceedings of the Black Hat 2016, Las Vegas, NV, USA, July 2016.

[32] G. Bonfante, J. Fernandez, J.-Y. Marion, B. Rouxel, F. Sabatier, and A. Thierry, “CoDisasm: medium scale concatic disassembly of self-modifying binaries with overlapping instructions,” in Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 745–756, Denver, CO, USA, October 2016.

[33] M. Oberhumer, L. Molnar, and J. Reiser, “UPX,” 2019, https://upx.github.io.

[34] A. Swinnen and A. Mesbahi, “One packer to rule them all: empirical identification, comparison and circumvention of current antivirus detection techniques,” in Proceedings of the Black Hat 2014, Las Vegas, NV, USA, August 2014.

[35] P. Bania, “Generic unpacking of self-modifying, aggressive, packed binary programs,” 2009, https://arxiv.org/abs/0905.4581.

[36] H. Neil, “BobSoft Signature,” 2019, http://woodmann.com/BobSoft/Files/Other/UserDB.zip.

[37] M. Russinovich, “Microsoft Sysinternals Suite,” 2019, https://download.sysinternals.com/files/SysinternalsSuite.zip.

[38] R. Rivest, “The MD5 message-digest algorithm,” RFC 1321, IETF, Fremont, CA, USA, 1992.

[39] S. Tatham, “Putty,” 2019, https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html.

[40] L. Pascoe, “MD5,” 2019, http://www.md5summer.org/about.html.

[41] P. Ferrie, “Anti-unpacker tricks - part one,” 2019, https://www.virusbulletin.com/virusbulletin/2008/12/anti-unpacker-tricks-part-one.

[42] T.-Y. Wang and C.-H. Wu, “Detection of packed executables using support vector machines,” in Proceedings of the 10th International Conference on Machine Learning and Cybernetics, pp. 717–722, Guilin, China, July 2011.

[43] D. Korczynski, “RePEconstruct: reconstructing binaries with self-modifying code and Import address table destruction,” in Proceedings of the 11th International Conference on Malicious and Unwanted Software, pp. 31–38, Fajardo, Puerto Rico, October 2016.

16 Security and Communication Networks

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 9: All-in-One Framework for Detection, Unpacking, and ...downloads.hindawi.com/journals/scn/2019/5278137.pdf · All-in-One Framework for Detection, Unpacking, and Verification for Malware

dynamic unpacking algorithm we run Python scripts on topof Immunity Debugger

In the experiment we use a total 130 PE files where 90files are provided by Microsoft [37] and 40 files are ran-domly collected from putty [39] and MD5 [40] sites Weperform packing of 130 executable files using 19 packersrespectively and construct a dataset consisting of a total of2600 packed files including original 130 filese 19 packersused are UPX ASPack NSPack MPRESS Yodarsquos ProtectorMEW Packman RLPack BeRoEXEPacker PECompactPetite JDpack Molebox eXpressor Yodarsquos Crypter FSGexe32pack WinUpack and Neolite e reason for con-figuring the dataset by packing the normal PE files directly isthat we can quantitatively measure the packing detectionaccuracy the false-positive rate and the false-negative rateonly if the experimental files are packed in advance

In this paper we conduct three experiments e firstexperiment is for an integration feature of the all-in-onesystem which verifies that the system supports all threephases of packing detection unpacking and verification inan integrated framework e second experiment is for thedetection phase where we first set Packing-Range an en-tropy range for judging the packing and then evaluate thepacking detection accuracy using the range e third ex-periment is for the unpacking and verification phases wherewe verify whether the system unpacks the PE file correctly bycalculating the restoration accuracy

5.2. Experimental Results on Overall Working Mechanism. In this section, we confirm through experiments that the integration mechanism of the all-in-one unpacking system works correctly. Figure 4 shows an operation screenshot of each phase of the all-in-one system. First, Figure 4(a) shows an example screenshot of the detection phase. To confirm that all four cases of packing detection work correctly, we continue executing the remaining cases even if an earlier case has already detected packing. In Figure 4(a), we first examine DP-NEP of Algorithm 2. We detect the EP section in DP-NEP, so we proceed to DP-SIG (the first two lines). By the DP-SIG test, we detect the UPX packer, so we conclude that the file is packed (the second two lines). Even though detection is completed, we proceed to the next DP-ENT and DP-WR tests. The entropy value is 7.93, which is contained in Packing-Range, and the attribute value of the EP section is “0xe0000040”, which includes the WRITE attribute. Thus, we conclude that the file is also packed by the DP-ENT and DP-WR tests (the third three lines). According to Figure 4(a), we can confirm that the DP-NEP, DP-SIG, DP-ENT, and DP-WR tests of the detection phase all work correctly.
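The attribute value in this example can be decoded with a few lines of Python. The sketch below is only an illustration: the bit values are the standard section-characteristics flags of the PE format, while the function names `decode_characteristics` and `has_write_attribute` are hypothetical and not part of the proposed system.

```python
# Standard PE section-characteristics bits (subset relevant to this example).
SECTION_FLAGS = {
    0x00000020: "CNT_CODE",
    0x00000040: "CNT_INITIALIZED_DATA",
    0x00000080: "CNT_UNINITIALIZED_DATA",
    0x20000000: "MEM_EXECUTE",
    0x40000000: "MEM_READ",
    0x80000000: "MEM_WRITE",
}


def decode_characteristics(value: int) -> list:
    """Return the names of the known flag bits set in `value`, lowest bit first."""
    return [name for bit, name in sorted(SECTION_FLAGS.items()) if value & bit]


def has_write_attribute(value: int) -> bool:
    """Hypothetical DP-WR style check: is the WRITE bit set on the section?"""
    return bool(value & 0x80000000)


if __name__ == "__main__":
    ep_attributes = 0xE0000040  # value reported for the EP section in Figure 4(a)
    print(decode_characteristics(ep_attributes))
    print(has_write_attribute(ep_attributes))
```

The value 0xe0000040 therefore marks the EP section as readable, writable, and executable, which is why the DP-WR test fires for this file.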

Figures 4(b) and 4(c) show example screenshots of the unpacking phase. If an unpacking library exists for the packer identified in the detection phase, we perform static unpacking as shown in Figure 4(b); if not, we perform dynamic unpacking as shown in Figure 4(c). Figure 4(b) shows the result of static unpacking by the UPX commercial tool [33] for the UPX-packed file. In the figure, the file size increases from 285,760 bytes to 731,200 bytes by unpacking, which means that the compression ratio is 39.08% (285,760/731,200). Figure 4(c) shows the result of the entropy-based dynamic unpacking. In the figure, the system calculates the entropy of each section; when the IP encounters a JMP or CJMP instruction, the system returns the current DstAddr as the OEP if the entropy change is stable and DstAddr is in a different section.
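The OEP decision rule described above can be sketched on a simulated branch trace. This is not Algorithm 2 itself: the trace format, the `BranchEvent` type, the section layout, and in particular the stability criterion (successive entropy readings differing by less than a threshold `eps`) are assumptions made for illustration, since the text does not define "stable" numerically.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

# (name, start address, end address exclusive); a hypothetical section layout.
Section = Tuple[str, int, int]


@dataclass
class BranchEvent:
    src: int        # address of the JMP/CJMP instruction
    dst: int        # branch destination (DstAddr)
    entropy: float  # entropy measured when the branch is reached


def section_of(addr: int, sections: List[Section]) -> Optional[str]:
    """Return the name of the section containing `addr`, or None."""
    for name, start, end in sections:
        if start <= addr < end:
            return name
    return None


def find_oep(events: Iterable[BranchEvent], sections: List[Section],
             eps: float = 0.01) -> Optional[int]:
    """Return the first branch destination that lies in a different section
    while the entropy has stopped changing (assumed criterion: |delta| < eps)."""
    previous = None
    for ev in events:
        stable = previous is not None and abs(ev.entropy - previous) < eps
        src_sec = section_of(ev.src, sections)
        dst_sec = section_of(ev.dst, sections)
        crosses = dst_sec is not None and dst_sec != src_sec
        if stable and crosses:
            return ev.dst
        previous = ev.entropy
    return None


if __name__ == "__main__":
    layout = [(".text", 0x401000, 0x405000), (".packer", 0x410000, 0x412000)]
    trace = [
        BranchEvent(0x410010, 0x410020, 5.100),  # still decompressing
        BranchEvent(0x410030, 0x410040, 6.400),  # entropy still changing
        BranchEvent(0x410050, 0x410060, 6.401),  # stable, but same section
        BranchEvent(0x410070, 0x401234, 6.402),  # stable and crosses sections
    ]
    oep = find_oep(trace, layout)
    print(hex(oep) if oep is not None else None)
```

In a real debugger session the events would come from single-stepping the packed process; here they are hard-coded so that the rule can be read in isolation.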

Figure 4(d) shows an example screenshot of the verification phase. In this example, we measure the restoration accuracy of the restored file against the text and data sections of the original file. The length of the original text section is 18,439, and the length of the unpacked file is 872,555. In the hash value comparison, since the portion corresponding to the original text section exists in the unpacked file, the restoration accuracy becomes 100%. Similarly, we also calculate the restoration accuracy of the data section as 100%.
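As a minimal sketch of such a hash comparison (not Algorithm 3 itself), the following Python code splits an original section into fixed-size blocks and reports the percentage of blocks whose hash also occurs somewhere in the unpacked image. The block size, the use of SHA-256, and the function names are assumptions made for illustration.

```python
import hashlib


def _digest(chunk: bytes) -> bytes:
    return hashlib.sha256(chunk).digest()


def window_hashes(data: bytes, size: int) -> set:
    """Hashes of every `size`-byte window of `data` (empty set if data is shorter)."""
    return {_digest(data[i:i + size]) for i in range(len(data) - size + 1)}


def restoration_accuracy(original: bytes, unpacked: bytes,
                         block_size: int = 16) -> float:
    """Percentage of blocks of `original` whose hash is found in `unpacked`."""
    if not original:
        raise ValueError("original section must not be empty")
    blocks = [original[i:i + block_size]
              for i in range(0, len(original), block_size)]
    cache = {}  # window length -> set of window hashes of the unpacked image
    matched = 0
    for block in blocks:
        size = len(block)  # the last block may be shorter than block_size
        if size not in cache:
            cache[size] = window_hashes(unpacked, size)
        if _digest(block) in cache[size]:
            matched += 1
    return 100.0 * matched / len(blocks)


if __name__ == "__main__":
    section = bytes(range(64))                    # stand-in for a text section
    image = b"\x90" * 10 + section + b"\x00" * 5  # section embedded in a dump
    print(restoration_accuracy(section, image))
```

The sliding-window search is quadratic and only suitable for small inputs; it is written this way to keep the comparison explicit.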

5.3. Experimental Results on Detection Phase. In this section, we describe two experimental results of the detection phase. The first experiment sets Packing-Range, and the second measures the detection accuracy of the proposed detection algorithm. In the first experiment, we experimentally determine the Packing-Range used in DP-ENT of Algorithm 1. The previous study by Lyda and Hamrock [17] uses the entropy value of the whole file and sets the entropy range of the packed file to [6.677, 6.926]. On the contrary, Han and Lee [16] focus on the EP section only and compute the entropy value for the EP section instead of the whole file. Through the experiment, they set the entropy threshold to 6.85 and judge a file to be packed if its entropy is over 6.85.

We also determine the entropy range Packing-Range through experiments on various packers. Figure 5 shows the entropy distribution of the EP section for different packers. Figure 5(a) shows the average distribution of the original and 19 packed files. Figures 5(b)–5(m) show the more detailed entropy distribution for each packer. According to Figures 5(b)–5(m), the entropy distribution of the packed file differs depending on the packer type. In particular, as shown in Figures 5(d), 5(e), and 5(k), some packed files have smaller entropy values than the original file, and thus we cannot simply set an upper threshold value as the packing criterion as in the previous study [16]. However, we note that while the entropy of packed files is distributed over a wide range, as shown in Figures 5(c)–5(m), the entropy of original files is distributed only in a specific range, as shown in Figure 5(b). More specifically, the entropy value of the normal file is between 6.00 and 6.85, as shown in Figure 5(b). Based on these experimental results, we thus set Packing-Range to [0, 6.00) or [6.85, ∞] of the EP section. According to [9, 29], the entropy method has the disadvantages of low accuracy and of being easy to evade. To overcome this point, we use three other detection techniques, DP-NEP, DP-SIG, and DP-WR, together with DP-ENT in a hybrid manner. Using this hybrid approach, we can further improve the detection accuracy.
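The entropy test can be illustrated with a short Python sketch. It assumes that the entropy in question is the Shannon entropy of the byte distribution of the EP section, measured in bits per byte (so that values lie between 0 and 8, matching the axis of Figure 5); the function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 for empty input)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(data).values())


def in_packing_range(entropy: float) -> bool:
    """Packing-Range as set above: [0, 6.00) or [6.85, infinity)."""
    return entropy < 6.00 or entropy >= 6.85


if __name__ == "__main__":
    uniform = bytes(range(256))  # every byte value exactly once
    print(shannon_entropy(uniform), in_packing_range(shannon_entropy(uniform)))
    print(in_packing_range(7.93))  # entropy of the UPX example in Figure 4(a)
    print(in_packing_range(6.50))  # inside the range observed for normal files
```

A two-sided range rather than a single threshold is what allows low-entropy packed sections, such as those in Figures 5(d), 5(e), and 5(k), to be flagged as well.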

In the second experiment, we evaluate the detection accuracy and the false-negative and false-positive rates. Table 5 compares the three detection techniques of the detection phase with the proposed hybrid approach. Since EP sections are mandatory in all PE files, we can detect EP sections unless they are intentionally concealed. All files of the dataset used in the experiment also have EP sections, so the detection rate by DP-NEP is trivially 0%. In other words, no file is detected as packed due to the absence of the EP section, so we exclude the experimental result of DP-NEP from Table 5.

Figure 4: Operation screenshots of detection, unpacking, and verification phases. (a) Detection phase. (b) Unpacking phase (static unpacking). (c) Unpacking phase (dynamic unpacking). (d) Verification phase.

Figure 5: Entropy distributions of the EP section for original and representative packed files (x-axis: entropy of EP section, from 0 to 8; y-axis: number of files). (a) Average of original and 19 packers (series: not packed, as in (b), and summation of (c) to (m)). (b) Original (not packed). (c) UPX. (d) ASPack. (e) NSPack. (f) MPRESS. (g) Yoda's Protector. (h) RLPack. (i) BeroEXE. (j) MEW. (k) PACKMAN. (l) WinUpack. (m) exe32pack.

We now explain the results of Table 5 column by column. First, we explain the DP-SIG column. In the case of DP-SIG, Not-Packed shows a detection rate of 0% since no signature exists in normal files. That is, no packing detection occurs in normal files, which means that DP-SIG performs an accurate detection for Not-Packed files. On the other hand, for the packed files, DP-SIG detects 100% of the files of every packer except NSPack. The detection rate of NSPack is merely 53.8% because NSPack signatures often exist in sections other than the EP section. In summary, DP-SIG shows a detection accuracy of 95.0% on average.
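The difference between matching a signature at the entry point only and matching it anywhere can be sketched as follows. The sketch assumes a PEiD-style textual signature in which `??` stands for a wildcard byte; the pattern and sample bytes are hypothetical and are not taken from any real signature database, and the function names are illustrative only.

```python
from typing import List, Optional


def parse_signature(text: str) -> List[Optional[int]]:
    """Parse a hex signature; '??' becomes None and matches any byte."""
    return [None if tok == "??" else int(tok, 16) for tok in text.split()]


def matches_at(data: bytes, offset: int, pattern: List[Optional[int]]) -> bool:
    """True if `pattern` matches `data` starting exactly at `offset`."""
    if offset < 0 or offset + len(pattern) > len(data):
        return False
    return all(p is None or data[offset + i] == p
               for i, p in enumerate(pattern))


def find_signature(data: bytes, pattern: List[Optional[int]]) -> int:
    """Offset of the first match anywhere in `data`, or -1 if there is none."""
    for offset in range(len(data) - len(pattern) + 1):
        if matches_at(data, offset, pattern):
            return offset
    return -1


if __name__ == "__main__":
    pattern = parse_signature("60 BE ?? ?? 8D")        # hypothetical signature
    data = bytes([0x90, 0x60, 0xBE, 0x12, 0x34, 0x8D])  # hypothetical bytes
    print(matches_at(data, 0, pattern))   # match attempted only at offset 0
    print(find_signature(data, pattern))  # match searched over the whole buffer
```

A test that looks only at one fixed location misses a signature stored elsewhere, which is consistent with the lower DP-SIG rate reported for NSPack.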

Second, for the experiment on the detection rate of DP-WR, we intentionally remove the WRITE attribute from 39 files (30% of the 130 packed files) per packer. We perform this removal process for 18 packers, excluding exe32pack. The reason for changing the dataset is that packed files usually have the WRITE attribute, so we arbitrarily remove the WRITE attribute from 30% of the files for the DP-WR experiment. Looking at the DP-WR column in Table 5, DP-WR detects only 70% of the files per packer for 18 packers since we intentionally remove the WRITE attribute from 30% of the files. In the case of exe32pack, however, we need not perform the removal process since it has no WRITE attribute in the PE header. Accordingly, as shown in the DP-WR column of Table 5, the detection rate of exe32pack by DP-WR is trivially 0%. In the case of Not-Packed, the detection rate is 0% since normal files have no WRITE attribute. In summary, checking the presence of the WRITE attribute is very simple, and DP-WR works correctly if the packed file has the WRITE attribute, but it does not work well if the WRITE attribute does not exist or is maliciously modified.

Third, the DP-ENT column shows the entropy-based detection rates. As shown in the table, DP-ENT achieves a 100% detection rate for 9 packers, including NSPack, UPX, MEW, and BeRoEXEPacker, but it incurs some false negatives for 10 packers, including ASPack, Yoda's Protector, Packman, and MPRESS, for which the detection rate is lower than 100%. In particular, for MPRESS, Petite, JDpack, eXpressor, and Neolite, it shows a low detection rate below 24.6% because the entropy values characteristic of these packers fall within the entropy distribution of normal files. DP-ENT also has a false-positive rate of 3.84% for Not-Packed. In summary, DP-ENT works very accurately for most packers, but it may incur false negatives for a few specific packers or false positives for some normal files. To overcome this limitation, we use DP-ENT together with the other detection techniques. We can see the improvement in the experimental results of Tables 5 and 6. In the tables, when using only DP-ENT, the detection accuracy is low for some packers (especially MPRESS), but when using it together with the other three techniques, the accuracy is close to 100%.

Fourth, the proposed hybrid approach, which integrates the four techniques, detects 100% of the files of all packers except NSPack and shows no false positives. The reason for this high detection rate is that the proposed method has all the advantages of DP-SIG, DP-WR, and DP-ENT as well as DP-NEP. In the case of NSPack, however, the detection rate is merely 70.0%; this is because, as mentioned above, we forcibly remove the WRITE attribute from 30% of the files. According to the results of Table 5, the detection rate differs by detection technique, and some techniques incur false positives or false negatives; on the other hand, the proposed hybrid approach shows the highest detection rate of 98.4% and no false positives.

Table 5: Comparison of packing detection rates of three techniques and the proposed hybrid approach.

| Packer | DP-SIG (%) | DP-WR (%) | DP-ENT (%) | Hybrid approach (%) |
|---|---|---|---|---|
| ASPack | 100 | 70.0 | 91.5 | 100 |
| NSPack | 53.8 | 70.0 | 100 | 70.0 |
| MPRESS | 100 | 70.0 | 24.6 | 100 |
| UPX | 100 | 70.0 | 100 | 100 |
| Yoda's Protector | 100 | 70.0 | 96.9 | 100 |
| MEW | 100 | 70.0 | 100 | 100 |
| BeRoEXEPacker | 100 | 70.0 | 100 | 100 |
| Packman | 100 | 70.0 | 93.8 | 100 |
| RLPack | 100 | 70.0 | 99.3 | 100 |
| PECompact | 100 | 70.0 | 100 | 100 |
| Petite | 100 | 70.0 | 0.00 | 100 |
| JDpack | 100 | 70.0 | 56.2 | 100 |
| Molebox | 100 | 70.0 | 100 | 100 |
| eXpressor | 100 | 70.0 | 0.00 | 100 |
| Yoda's Crypter | 100 | 70.0 | 91.5 | 100 |
| FSG | 100 | 70.0 | 100 | 100 |
| exe32pack | 100 | 0.00 | 100 | 100 |
| WinUpack | 100 | 70.0 | 100 | 100 |
| Neolite | 100 | 70.0 | 16.1 | 100 |
| Average | 95.0 | 66.3 | 73.9 | 98.4 |
| Not-Packed | 0.00 | 0.00 | 3.84 | 0.00 |

Table 6 shows the detection accuracy of combinations of two or three detection techniques and of the proposed hybrid approach. First, DP-SIG+ENT, a combination of the signature and entropy tests, detects 100% of the packed files but detects 18 files incorrectly, showing a false-positive rate of 0.73%. Second, DP-ENT+WR, a combination of the entropy and WRITE tests, shows a low detection rate of 92.0% because the entropy value of MPRESS is in the distribution of normal files. Third, DP-SIG+ENT+WR, a combination of the signature, entropy, and WRITE tests, and our hybrid approach both show 98.4% detection accuracy and a 1.60% false-negative rate. This is because DP-SIG+ENT+WR and the hybrid approach go through the same detection procedure except for DP-NEP. The false-negative rate of 1.60% is due to the files having no WRITE attribute. According to the results of Tables 5 and 6, we believe that the proposed hybrid approach achieves high detection accuracy by combining the advantages of the existing and new detection techniques.

5.4. Experimental Results on Unpacking and Verification Phases. In this section, we verify the unpacking and verification phases by performing the unpacking of Algorithm 2 and calculating the restoration accuracy of the files through the actual restoration of Algorithm 3. However, the unpacking and verification phases take a lot of time, so we experiment with 40 randomly selected files of the dataset. The unpacking phase consists of static unpacking and dynamic unpacking. In the experiment, we exclude static unpacking since it uses the existing library as it is, and we focus only on the dynamic unpacking of Algorithm 2 since it tries to restore the original file without any information about the packers.

Table 7 shows the restoration accuracy of the proposed method for the text and data sections for each of the 19 packers. We cannot directly unpack Yoda's Protector and PECompact since they use anti-debugging techniques. To overcome this problem, we unpack these packers by bypassing three anti-debugging API calls: GetCurrentProcessID(), BlockInput(), and IsDebuggerPresent() [41]. That is, we run Algorithm 2 with the bypassing techniques for Yoda's Protector and PECompact and include the experimental results in Table 7. In general, since only the text and data sections of the original file are packed [42], in the experiment we consider only the restoration accuracy of these text and data sections of the restored file. In Table 7, among the 19 packers, 11 packers, including UPX, NSPack, MEW, RLPack, and BeRoEXEPacker, show 100% restoration accuracy for both the text and data sections. On the other hand, 8 packers, including ASPack, Packman, and MPRESS, show less than 100% restoration accuracy because of the manner in which the packers handle the reloc section. More specifically, if there is a reloc section in the original file, the unpacking relocates the text and data sections, changing all the data values in each section. Therefore, the restoration accuracy depends on whether there is a reloc section in the original file.

Table 8 shows the restoration results in detail according to the presence of the reloc section. As shown in the table, the restoration accuracy is 100% in the absence of the reloc section, while it is mostly very low, ranging from 0.00% to 99.8%, in the presence of the reloc section. However, the low restoration accuracy is due to the nature of the packer, and it does not mean that unpacking does not work correctly. In other words, even if the unpacking is successful, the data values may change due to the packer characteristics, and the restoration accuracy is then calculated to be low. In summary, we can use the proposed verification algorithm to calculate the restoration accuracy of unpacked files, which quantitatively measures the accuracy of the actual restoration.
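Why relocation lowers a byte-wise restoration accuracy even when unpacking succeeds can be illustrated with a small sketch. It assumes 32-bit absolute addresses stored in little-endian order at known offsets, as with base relocations on 32-bit PE files; the helper name, the sample bytes, the offsets, and the load-address delta are hypothetical.

```python
import struct
from typing import Iterable


def apply_relocations(section: bytes, offsets: Iterable[int], delta: int) -> bytes:
    """Add `delta` to each 32-bit little-endian value stored at `offsets`."""
    out = bytearray(section)
    for off in offsets:
        (value,) = struct.unpack_from("<I", out, off)
        struct.pack_into("<I", out, off, (value + delta) & 0xFFFFFFFF)
    return bytes(out)


if __name__ == "__main__":
    # Hypothetical code bytes: one opcode, a 32-bit absolute address, one opcode.
    original = b"\xb8" + struct.pack("<I", 0x00401000) + b"\xc3"
    relocated = apply_relocations(original, [1], 0x00010000)

    print(original.hex())
    print(relocated.hex())
    print(relocated == original)  # the bytes differ although the code is the same

    # Undoing the same delta recovers the original bytes.
    print(apply_relocations(relocated, [1], -0x00010000) == original)
```

Under these assumptions, a comparison that first normalizes relocated addresses would treat the two byte strings as equal, which is one possible direction for handling files with reloc sections.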

In addition to the text, data, and reloc sections, we further consider the PE header, which includes the IAT information. In general, the IAT is located in the header of the PE file, and if we actually need to run the unpacked file, we must also restore this header information including the IAT. In fact, the studies of [3, 43] restore the header including the IAT and produce the actual whole executable file. On the other hand, most studies, including this paper, do not restore the actual whole executable file itself but restore the text, data, and reloc sections, which are the major parts of the executable file. This is because the purpose of unpacking is not to execute the restored file but to identify and extract malicious code included in the restored file. Therefore, in this paper, we do not need to restore the header required for the actual execution, and accordingly, a comparison of IAT information is also not necessary. Instead, we restore and compare the text, data, and reloc sections, which are more likely to hide malicious code. Restoring the PE header is beyond the scope of this study; we leave it as a separate research topic.

Table 6: Comparison of packing detection accuracy of two or three combinations and of the hybrid approach.

| Metrics | DP-SIG+ENT (%) | DP-ENT+WR (%) | DP-SIG+ENT+WR (%) | Hybrid approach (%) |
|---|---|---|---|---|
| Detection accuracy | 95.0 | 92.0 | 98.4 | 98.4 |
| False-negative rate | 5.00 | 8.00 | 1.60 | 1.60 |
| False-positive rate | 0.73 | 0.00 | 0.00 | 0.00 |

Table 7: Restoration accuracy of text and data sections of each packer.

| Packer | text (%) | data (%) |
|---|---|---|
| UPX | 100 | 100 |
| NSPack | 100 | 100 |
| MEW | 100 | 100 |
| RLPack | 100 | 100 |
| BeRoEXE | 100 | 100 |
| Neolite | 100 | 100 |
| FSG | 100 | 100 |
| eXpressor | 100 | 100 |
| Molebox | 100 | 100 |
| Petite | 100 | 100 |
| JDpack | 100 | 100 |
| ASPack | 84.3 | 84.3 |
| Yoda's Protector | 84.0 | 84.0 |
| exe32pack | 80.8 | 100 |
| MPRESS | 78.8 | 75.3 |
| Yoda's Crypter | 74.5 | 74.5 |
| PECompact | 74.0 | 74.0 |
| WinUpack | 68.0 | 68.0 |
| Packman | 55.0 | 48.5 |
| Average | 89.2 | 90.2 |

6. Conclusions

In this paper, we analyzed recent studies of packing detection and unpacking techniques and derived four important observations: no phase integration, no detection combination, no real-restoration, and no unpacking verification. We then proposed an all-in-one unpacking system to address these four observations. First, no phase integration meant that analysts had to perform the packing detection, unpacking, and verification phases separately, and we solved this problem by presenting an all-in-one unpacking system that integrates these three phases in a single framework. Second, no detection combination meant that there was no attempt to combine various packing detection methods, and we solved this problem by presenting a hybrid approach that combines existing and new packing detection methods. Third, no real-restoration meant that there was little discussion about restoring the actual executable file, and we solved this problem by actually restoring an original image of the packed file. Fourth, no unpacking verification meant that there was no explicit work to measure the restoration accuracy of unpacked files, and we solved this problem by proposing a verification algorithm that quantitatively evaluates the restoration accuracy.

We constructed a dataset of 2,600 PE files and experimentally evaluated the proposed system using the dataset. First, we performed all the phases of the all-in-one unpacking system sequentially and confirmed that its overall working mechanism worked well. Next, through the comparison of the hybrid packing detection approach and existing detection techniques, we showed that the detection accuracy of the proposed method was the highest at 98.4% on average with no false positives. For this, we also determined an entropy range of packing, Packing-Range, through extensive experiments on 19 representative packers. Finally, in the unpacking and verification phases, we confirmed that unpacking was successful for all files except the files packed with Yoda's Protector. We also showed that the restoration accuracy of unpacked files was nearly 100% except for a few packers.

In future research, we will focus on unpacking files packed with Anti-VM, Anti-Debug, and Anti-Reverse engineering techniques. We will also study how to compute the restoration accuracy of files having reloc sections.

Data Availability

All the data files used in the experiments are available at https://github.com/chesvectain/PackingData.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was partly supported by the Institute of Information and Communications Technology Planning and Evaluation (IITP) grant funded by the Korea government (MSIT) (no. 2017-0-00158, Development of Cyber Threat Intelligence (CTI) analysis and information sharing technology for national cyber incident response) and the Korea Electric Power Corporation (no. R18XA05).

Table 8: Restoration accuracy according to the presence of the reloc section.

| Packer | text, reloc present (%) | text, reloc absent (%) | data, reloc present (%) | data, reloc absent (%) |
|---|---|---|---|---|
| ASPack | 0.00 | 100 | 0.00 | 100 |
| Packman | 99.8 | 100 | 88.1 | 100 |
| MPRESS | 0.00 | 100 | 0.00 | 100 |
| Yoda's Crypter | 0.00 | 100 | 0.04 | 100 |
| PECompact | 0.00 | 100 | 0.00 | 100 |
| Yoda's Protector | 0.00 | 100 | 0.05 | 100 |
| exe32pack | 0.00 | 100 | 100 | 100 |
| WinUpack | 0.00 | 100 | 0.00 | 100 |

References

[1] M. Morgenstern, “AV-TEST Security report 2017/18,” 2019, https://www.av-test.org/fileadmin/pdf/security_report/AV-TEST_Security_Report_2017-2018.pdf.

[2] T. Ban, R. Isawa, S. Guo, D. Inoue, and K. Nakao, “Efficient malware packer identification using support vector machines with spectrum kernel,” in Proceedings of the Eighth Asia Joint Conference on Information Security, pp. 69–76, Seoul, South Korea, July 2013.

[3] B. Cheng, M. Jiang, J. Fu et al., “Towards paving the way for large-scale Windows malware analysis: generic binary unpacking with orders-of-magnitude performance boost,” in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pp. 395–411, Toronto, Canada, October 2018.

[4] A. Gupta and M. S. Arya, “Hashing based encryption and anti-debugger support for packing multiple files into single executable,” International Journal of Advanced Research in Computer Science, vol. 9, no. 1, pp. 914–920, 2018.

[5] Y. Kawakoya, M. Iwamura, and M. Itoh, “Memory behavior-based automatic malware unpacking in stealth debugging environment,” in Proceedings of the 5th International Conference on Malicious and Unwanted Software, pp. 39–46, Lorraine, France, October 2010.

[6] G. Liang, J. Pang, Z. Shan, R. Yang, and Y. Chen, “Automatic benchmark generation framework for malware detection,” Security and Communication Networks, vol. 2018, Article ID 4947695, 8 pages, 2018.

[7] A. Pektas and T. Acarman, “A dynamic malware analyzer against virtual machine aware malicious software,” Security and Communication Networks, vol. 7, no. 12, pp. 2245–2257, 2014.

[8] I. Santos, J. Nieves, and P. G. Bringas, “Semi-supervised learning for unknown malware detection,” in International Symposium on Distributed Computing and Artificial Intelligence, A. Abraham, J. M. Corchado, S. R. Gonzalez, and J. F. De Paz Santana, Eds., pp. 415–422, Springer, Berlin, Germany, 2011.

[9] X. Ugarte-Pedrero, I. Santos, B. Sanz, C. Laorden, and P. G. Bringas, “Countering entropy measure attacks on packed software detection,” in Proceedings of the 9th International Conference on Consumer Communications and Networking, pp. 164–168, Las Vegas, NV, USA, January 2012.

[10] H. Won, S.-P. Kim, S. Lee, M.-J. Choi, and Y.-S. Moon, “Secure principal component analysis in multiple distributed nodes,” Security and Communication Networks, vol. 9, no. 14, pp. 2348–2358, 2016.

[11] S. Cesare, Y. Xiang, and W. Zhou, “Malwise–an effective and efficient classification system for packed and polymorphic malware,” IEEE Transactions on Computers, vol. 62, no. 6, pp. 1193–1206, 2013.

[12] C. Malin, E. Casey, and J. Aquilina, Malware Forensics: Investigating and Analyzing Malicious Code, Syngress, Burlington, MA, USA, 2008.

[13] W. Yan, Z. Zheng, and A. Nirwan, “Revealing packed malware,” IEEE Security and Privacy, vol. 6, no. 5, pp. 65–69, 2008.

[14] M. Bat-Erdene, T. Kim, H. Park, and H. Lee, “Packer detection for multi-layer executables using entropy analysis,” Entropy, vol. 19, no. 3, p. 125, 2017.

[15] Y.-S. Choi, I.-K. Kim, J.-T. Oh, and J.-C. Ryou, “PE file header analysis-based packed PE file detection technique (PHAD),” in Proceedings of International Symposium on Computer Science and Its Applications, pp. 28–31, Hobart, Australia, October 2008.

[16] S. Han and S. Lee, “Packed PE file detection for malware forensics,” The KIPS Transactions: Part C, vol. 16C, no. 5, pp. 555–562, 2009, in Korean.

[17] R. Lyda and J. Hamrock, “Using entropy analysis to find encrypted and packed malware,” IEEE Security and Privacy, vol. 5, no. 2, pp. 40–45, 2007.

[18] R. Perdisci, A. Lanzi, and W. Lee, “Classification of packed executables for accurate computer virus detection,” Pattern Recognition Letters, vol. 29, no. 14, pp. 1941–1946, 2008.

[19] A. Rohit, A. Singh, H. Pareek, and U. Edara, “A heuristics-based static analysis approach for detecting packed PE binaries,” International Journal of Security and Its Applications, vol. 7, no. 5, pp. 257–268, 2013.

[20] G. Amato, “PEframe,” 2019, https://github.com/guelfoweb/peframe.

[21] G. Jeong, E. Choo, J. Lee, M. Bat-Erdene, and H. Lee, “Generic unpacking using entropy analysis,” Journal of Advanced Information Technology and Convergence, vol. 7, no. 1, pp. 232–238, 2009.

[22] V. DeMarines, “Obfuscation–how to do it and how to crack it,” Network Security, vol. 2008, no. 7, pp. 4–7, 2008.

[23] H. Neil, “PEiD,” 2019, https://github.com/wolfram77web/app-peid.

[24] O. Yuschuk, “OllyDbg,” 2019, http://www.ollydbg.de/download.html.

[25] N. Waisman, “Immunity Debugger,” 2019, https://www.immunityinc.com/products/debugger/index.html.

[26] P. Royal, M. Halpin, D. Dagon, R. Edmonds, and W. Lee, “PolyUnpack: automating the hidden-code extraction of unpack-executing malware,” in Proceedings of the 22nd Annual Computer Security Applications Conference, pp. 289–300, Miami Beach, FL, USA, December 2006.

[27] M. Bat-Erdene, Y. Park, H. Li, H. Lee, and M.-S. Choi, “Entropy analysis to classify unknown packing algorithms for malware detection,” International Journal of Information Security, vol. 16, no. 3, pp. 227–248, 2017.

[28] S. Cesare and X. Yang, “Classification of malware using structured control flow,” in Proceedings of the 8th Australasian Symposium on Parallel and Distributed Computing, pp. 61–70, Brisbane, Australia, January 2010.

[29] L. Martignoni, M. Christodorescu, and S. Jha, “OmniUnpack: fast, generic, and safe unpacking of malware,” in Proceedings of the 23rd Annual Computer Security Applications Conference, pp. 431–441, Miami Beach, FL, USA, December 2007.

[30] M. G. Kang, P. Poosankam, and H. Yin, “Renovo: a hidden code extractor for packed executables,” in Proceedings of the 5th ACM Workshop on Recurring Malcode, pp. 46–53, Alexandria, VA, USA, November 2007.

[31] S. Mariani, L. Fontana, F. Gritti, and S. D'Alessio, “PinDemonium: a DBI-based generic unpacker for windows executables,” in Proceedings of the Black Hat 2016, Las Vegas, NV, USA, July 2016.

[32] G. Bonfante, J. Fernandez, J.-Y. Marion, B. Rouxel, F. Sabatier, and A. Thierry, “CoDisasm: medium scale concatic disassembly of self-modifying binaries with overlapping instructions,” in Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 745–756, Denver, CO, USA, October 2016.

[33] M. Oberhumer, L. Molnar, and J. Reiser, “UPX,” 2019, https://upx.github.io.

[34] A. Swinnen and A. Mesbahi, “One packer to rule them all: empirical identification, comparison, and circumvention of current antivirus detection techniques,” in Proceedings of the Black Hat 2014, Las Vegas, NV, USA, August 2014.

[35] P. Bania, “Generic unpacking of self-modifying, aggressive, packed binary programs,” 2009, https://arxiv.org/abs/0905.4581.

[36] H. Neil, “BobSoft Signature,” 2019, http://woodmann.com/BobSoft/Files/Other/UserDB.zip.

[37] M. Russinovich, “Microsoft Sysinternals Suite,” 2019, https://download.sysinternals.com/files/SysinternalsSuite.zip.

[38] R. Rivest, “The MD5 message-digest algorithm,” RFC 1321, IETF, Fremont, CA, USA, 1992.

[39] S. Tatham, “Putty,” 2019, https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html.

[40] L. Pascoe, “MD5,” 2019, http://www.md5summer.org/about.html.

[41] P. Ferrie, “Anti-unpacker tricks - part one,” 2019, https://www.virusbulletin.com/virusbulletin/2008/12/anti-unpacker-tricks-part-one.

[42] T.-Y. Wang and C.-H. Wu, “Detection of packed executables using support vector machines,” in Proceedings of the 10th International Conference on Machine Learning and Cybernetics, pp. 717–722, Guilin, China, July 2011.

[43] D. Korczynski, “RePEconstruct: reconstructing binaries with self-modifying code and Import address table destruction,” in Proceedings of the 11th International Conference on Malicious and Unwanted Software, pp. 31–38, Fajardo, Puerto Rico, October 2016.


(a) (b)

(c) (d)

Figure 4 Operation screenshots of detection unpacking and verification phases (a) Detection phase (b) Unpacking phase (staticunpacking) (c) Unpacking phase (dynamic unpacking) (d) Verification phase

0

50

100

150

200

250

300

Figure 5: Entropy distributions of the EP section for original and representative packed files (x-axis: entropy of the EP section, from 0 to 8; y-axis: number of files). (a) Average of original and 19 packers. (b) Original (not packed). (c) UPX. (d) ASPack. (e) NSPack. (f) MPRESS. (g) Yoda's Protector. (h) RLPack. (i) BeRoEXE. (j) MEW. (k) PACKMAN. (l) WinUpack. (m) exe32pack.

are mandatory in all PE files, we can detect EP sections unless they are intentionally concealed. All files of the dataset used in the experiment also have EP sections, and the detection rate by DP-NEP is trivially 0%. In other words, there is no packing detection case due to the absence of the EP section, so we exclude the experimental result of DP-NEP from Table 5.

We now explain the results of Table 5 column by column. First, we explain the DP-SIG column. For Not-Packed files, DP-SIG shows a detection rate of 0% since no packer signature exists in normal files; that is, DP-SIG raises no false positives for Not-Packed files. For the packed files, on the other hand, DP-SIG detects 100% of the files of every packer except NSPack. The detection rate of NSPack is merely 53.8% because NSPack signatures often lie in sections other than the EP section. In summary, DP-SIG shows a detection accuracy of 95.0% on average.
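The signature test discussed here can be sketched as follows. This is a minimal illustration only, assuming PEiD-style signatures written as hexadecimal bytes with `??` wildcards and matched at a given offset such as the entry point; the signature string and the function names are hypothetical and are not taken from the signature database used in the experiments.

```python
def parse_signature(text):
    """Turn 'AA ?? BB' into a list of ints, with None for wildcards."""
    return [None if tok == "??" else int(tok, 16) for tok in text.split()]


def matches_at(data, offset, pattern):
    """Return True if the pattern matches data starting at offset."""
    if offset < 0 or offset + len(pattern) > len(data):
        return False
    return all(p is None or data[offset + i] == p
               for i, p in enumerate(pattern))


# Hypothetical signature: pushad; mov esi, imm32 (the imm32 is wildcarded).
SIGNATURE = parse_signature("60 BE ?? ?? ?? ??")

packed_like_ep = bytes([0x60, 0xBE, 0x00, 0x10, 0x40, 0x00, 0x8D])
ordinary_ep = bytes([0x55, 0x8B, 0xEC, 0x83, 0xEC, 0x10])

print(matches_at(packed_like_ep, 0, SIGNATURE))
print(matches_at(ordinary_ep, 0, SIGNATURE))
```

A matcher that looks only at the entry point cannot see signatures stored elsewhere, which is consistent with the lower DP-SIG rate reported for NSPack.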

Second, for the experiment on the detection rate of DP-WR, we intentionally remove the WRITE attribute from 39 files (30% of the packed files) per packer. We perform this removal for 18 packers, that is, for all packers except exe32pack. We change the dataset in this way because packed files usually have the WRITE attribute, so we arbitrarily remove it from 30% of the files for the DP-WR experiment. As the DP-WR column of Table 5 shows, DP-WR detects only 70% of the files per packer for these 18 packers, since we intentionally removed the WRITE attribute from the other 30%. In the case of exe32pack, however, we need not perform the removal since its files have no WRITE attribute in the PE header; accordingly, the detection rate of exe32pack by DP-WR is trivially 0%. In the case of Not-Packed, the detection rate is 0% since normal files have no WRITE attribute. In summary, checking the presence of the WRITE attribute is very simple, and DP-WR works correctly if the packed file has the WRITE attribute, but it does not work well if the WRITE attribute does not exist or has been maliciously modified.
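As a rough illustration of how simple the WRITE check is, the sketch below tests the `IMAGE_SCN_MEM_WRITE` flag (0x80000000) in the Characteristics field of a 40-byte PE section header. It assumes that DP-WR reduces to this flag test on the header of the section being examined; the helper names and the synthetic headers are hypothetical and serve only the demonstration.

```python
import struct

IMAGE_SCN_MEM_WRITE = 0x80000000  # PE/COFF flag: section is writable
# Name, VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData,
# PointerToRelocations, PointerToLinenumbers, NumberOfRelocations,
# NumberOfLinenumbers, Characteristics (40 bytes in total).
SECTION_HEADER = struct.Struct("<8sIIIIIIHHI")


def section_is_writable(header_bytes):
    """Return True if a raw 40-byte section header has the WRITE flag set."""
    characteristics = SECTION_HEADER.unpack(header_bytes)[-1]
    return bool(characteristics & IMAGE_SCN_MEM_WRITE)


def make_header(name, characteristics):
    """Build a synthetic section header (for this demonstration only)."""
    return SECTION_HEADER.pack(name, 0x1000, 0x1000, 0x200, 0x400,
                               0, 0, 0, 0, characteristics)


# 0x60000020 = CODE | EXECUTE | READ; 0xE0000020 additionally sets WRITE.
normal = make_header(b".text", 0x60000020)
packed_like = make_header(b"UPX1", 0xE0000020)
print(section_is_writable(normal), section_is_writable(packed_like))
```

Because the flag is a single bit in the header, clearing it is trivial for an attacker, which is the weakness noted in the text.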

Third, the DP-ENT column shows the entropy-based detection rates. DP-ENT achieves a 100% detection rate for 9 packers, including NSPack, UPX, MEW, and BeRoEXEPacker, but it incurs some false negatives for 10 packers, including ASPack, Yoda's Protector, Packman, and MPRESS, for which the detection rate is lower than 100%. In particular, for MPRESS, Petite, JDpack, eXpressor, and Neolite, it shows a low detection rate below 24.6% because the entropy values characteristic of these packers fall within the entropy distribution of normal files. DP-ENT also has a false-positive rate of 3.84% for Not-Packed files. In summary, DP-ENT works very accurately for most packers, but it may incur false negatives for a few specific packers or false positives for some normal files. To overcome this limitation, we use DP-ENT together with the other detection techniques. The improvement is visible in the experimental results of Tables 5 and 6: when using only DP-ENT, the detection accuracy is low for some packers (especially MPRESS), but when using it together with the other three techniques, the accuracy is close to 100%.
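For readers unfamiliar with the entropy test, the following sketch computes the Shannon entropy of a byte string in bits per byte (0 to 8, the same scale as the x-axis of Figure 5) and checks it against an interval. The interval bounds used here are purely hypothetical placeholders; the actual Packing-Range is the one determined experimentally in the paper and is not reproduced in this sketch.

```python
import math
from collections import Counter


def shannon_entropy(data):
    """Entropy in bits per byte, between 0.0 and 8.0."""
    if not data:
        return 0.0
    total = len(data)
    return sum((n / total) * math.log2(total / n)
               for n in Counter(data).values())


def in_range(entropy, low, high):
    """Check an entropy value against a closed interval [low, high]."""
    return low <= entropy <= high


repetitive = b"\x00" * 4096            # a single byte value repeated
uniform = bytes(range(256)) * 16       # every byte value equally frequent

for sample in (repetitive, uniform):
    h = shannon_entropy(sample)
    # The bounds 6.0 and 8.0 are illustrative assumptions only.
    print(h, in_range(h, low=6.0, high=8.0))
```

A packer whose output has entropy similar to ordinary code falls outside any such interval, which is why an entropy test alone misses packers such as MPRESS.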

Fourth, the proposed hybrid approach, which integrates the four techniques, detects 100% of the files of all packers except NSPack and shows no false positives. The reason for this high detection rate is that the proposed method has all the advantages of DP-SIG, DP-WR, and DP-ENT as well as DP-NEP. In the case of NSPack, however, the detection rate is merely 70.0%; this is because, as mentioned above, we forcibly removed the WRITE attribute from 30% of the files. According to the results of Table 5, the detection rate differs by detection technique, and some techniques incur false positives or false negatives; the proposed hybrid approach, on the other hand, shows the highest detection rate of 98.4% and no false positives.

Table 5: Comparison of packing detection rates of three techniques and the proposed hybrid approach.

Packer             DP-SIG (%)   DP-WR (%)   DP-ENT (%)   Hybrid approach (%)
ASPack             100          70.0        91.5         100
NSPack             53.8         70.0        100          70.0
MPRESS             100          70.0        24.6         100
UPX                100          70.0        100          100
Yoda's Protector   100          70.0        96.9         100
MEW                100          70.0        100          100
BeRoEXEPacker      100          70.0        100          100
Packman            100          70.0        93.8         100
RLPack             100          70.0        99.3         100
PECompact          100          70.0        100          100
Petite             100          70.0        0.00         100
JDpack             100          70.0        56.2         100
Molebox            100          70.0        100          100
eXpressor          100          70.0        0.00         100
Yoda's Crypter     100          70.0        91.5         100
FSG                100          70.0        100          100
exe32pack          100          0.00        100          100
WinUpack           100          70.0        100          100
Neolite            100          70.0        16.1         100
Average            95.0         66.3        73.9         98.4
Not-Packed         0.00         0.00        3.84         0.00
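As a worked check of how the per-packer rates in Table 5 roll up into an average, the short sketch below averages the DP-WR and hybrid columns over the 19 packers. It assumes the Average row is a plain unweighted mean over packers, which is an interpretation on our part; the function name is hypothetical.

```python
def column_average(rates):
    """Unweighted mean detection rate (%) over the packers in one column."""
    return sum(rates) / len(rates)


# Rates (%) in the row order of Table 5.
dp_wr = [70.0] * 16 + [0.0] + [70.0] * 2    # exe32pack is the 0.0 entry
hybrid = [100.0, 70.0] + [100.0] * 17       # NSPack is the 70.0 entry

print(round(column_average(dp_wr), 1))
print(round(column_average(hybrid), 1))
```

Under this assumption, a single packer at 70.0% among 19 is enough to bring the hybrid average to the reported 98.4%.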

Table 6 shows the detection accuracy of combinations of two or three detection techniques and of the proposed hybrid approach. First, DP-SIG+ENT, a combination of the signature and entropy tests, detects 100% of the packed files but flags 18 files incorrectly, showing a false-positive rate of 0.73%. Second, DP-ENT+WR, a combination of the entropy and WRITE tests, shows a low detection rate of 92.0% because the entropy values of MPRESS fall within the distribution of normal files. Third, DP-SIG+ENT+WR, a combination of the signature, entropy, and WRITE tests, and our hybrid approach both show 98.4% detection accuracy and a 1.60% false-negative rate. This is because DP-SIG+ENT+WR and the hybrid approach go through the same detection procedure except for DP-NEP. The false-negative rate of 1.60% is due to the files having no WRITE attribute. According to the results of Tables 5 and 6, we believe that the proposed hybrid approach achieves high detection accuracy by combining the advantages of the existing and new detection techniques.
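The three metrics reported in Table 6 can be illustrated with a small sketch. The definitions below (rates over packed files for detection and false negatives, over normal files for false positives) are assumptions, since the exact denominators are not restated in this section. The counts are likewise inferred rather than quoted: 130 files per packer follows from 39 files being 30%, and 130 normal files from the 2600-file dataset.

```python
def detection_metrics(tp, fn, fp, tn):
    """Return (detection, false-negative, false-positive) rates in percent."""
    packed = tp + fn      # files that really are packed
    normal = fp + tn      # files that really are not packed
    detection = 100.0 * tp / packed
    false_negative = 100.0 * fn / packed
    false_positive = 100.0 * fp / normal
    return detection, false_negative, false_positive


# Assumed counts: 19 packers x 130 files, 39 missed files, 130 normal files.
tp, fn, fp, tn = 19 * 130 - 39, 39, 0, 130
rates = detection_metrics(tp, fn, fp, tn)
print([round(r, 1) for r in rates])
```

With these assumed counts, the 39 missed files alone account for a detection accuracy of about 98.4% and a false-negative rate of about 1.6%.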

5.4. Experimental Results on Unpacking and Verification Phases. In this section, we evaluate the unpacking and verification phases by performing the unpacking of Algorithm 2 and calculating the restoration accuracy of the files through the actual restoration of Algorithm 3. Because the unpacking and verification phases take a lot of time, we experiment with 40 randomly selected files of the dataset. The unpacking phase consists of static unpacking and dynamic unpacking. In the experiment, we exclude static unpacking since it uses an existing library as it is, and we focus only on the dynamic unpacking of Algorithm 2 since it tries to restore the original file without any information about the packers.

Table 7 shows the restoration accuracy of the proposed method for the text and data sections for each of the 19 packers. We cannot directly unpack Yoda's Protector and PECompact since they use anti-debugging techniques. To overcome this problem, we unpack these packers by bypassing three anti-debugging API calls: GetCurrentProcessID(), BlockInput(), and IsDebuggerPresent() [41]. (That is, we run Algorithm 2 with the bypassing techniques for Yoda's Protector and PECompact and include the experimental results in Table 7.) In general, since only the text and data sections of the original file are packed [42], in the experiment we consider only the restoration accuracy of the text and data sections of the restored file. In Table 7, among the 19 packers, 11 packers, including UPX, NSPack, MEW, RLPack, and BeRoEXEPacker, show 100% restoration accuracy for both the text and data sections. On the other hand, 8 packers, including ASPack, Packman, and MPRESS, show less than 100% restoration accuracy because of the manner in which the packers handle the reloc section. More specifically, if there is a reloc section in the original file, the unpacking relocates the text and data sections, changing all the data values in each section. Therefore, the restoration accuracy depends on whether there is a reloc section in the original file.

Table 8 shows the restoration results in detail according to the presence of the reloc section. As shown in the table, the restoration accuracy is 100% in the absence of the reloc section, while it is very low, in the range of 0.00% to 99.8%, in the presence of the reloc section. However, the low restoration accuracy is due to the nature of the packer and does not mean that unpacking works incorrectly. In other words, even if the unpacking is successful, the data values may change due to the packer's characteristics, and the restoration accuracy is then calculated to be low. In summary, the proposed verification algorithm calculates the restoration accuracy of unpacked files and thus quantitatively measures the accuracy of the actual restoration.
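To make the idea of a quantitative restoration accuracy concrete, the sketch below scores a restored section by the percentage of byte positions that agree with the original. This byte-wise definition is only one plausible choice and is an assumption on our part; Algorithm 3 of the paper may define the accuracy differently, and the function name is hypothetical.

```python
def restoration_accuracy(original, restored):
    """Percentage of byte positions at which the two sections agree."""
    longest = max(len(original), len(restored))
    if longest == 0:
        return 100.0
    same = sum(a == b for a, b in zip(original, restored))
    return 100.0 * same / longest


original_text = bytes(range(16))
perfect = bytes(range(16))
damaged = bytes(range(12)) + b"\xff" * 4   # the last four bytes differ

print(restoration_accuracy(original_text, perfect))
print(restoration_accuracy(original_text, damaged))
```

Dividing by the longer of the two lengths penalises a restored section that is truncated or padded, rather than silently ignoring the extra bytes.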

In addition to the text, data, and reloc sections, we further consider the PE header, which includes the IAT information. In general, the IAT is located in the header of the PE file, and if we actually need to run the unpacked file, we must also restore this header information, including the IAT. In fact, the studies of [3, 43] restore the header including the IAT and produce the actual whole executable file. On the other hand, most studies, including this paper, do not restore the whole executable file itself but restore the text, data, and reloc sections, which are the major parts of the executable file. This is because the purpose of unpacking is not to execute the restored file but to identify and extract the malicious code included in it. Therefore, in this paper, we do not need to restore the header required for actual execution, and accordingly a comparison of IAT information is also unnecessary. Instead, we restore and compare the text, data, and reloc sections, which are more likely to hide malicious code. Restoring the PE header is beyond the scope of this study; we leave it as a separate research topic.

Table 6: Comparison of packing detection accuracy of two or three combinations and of the hybrid approach.

Metrics               DP-SIG+ENT (%)   DP-ENT+WR (%)   DP-SIG+ENT+WR (%)   Hybrid approach (%)
Detection accuracy    95.0             92.0            98.4                98.4
False-negative rate   5.00             8.00            1.60                1.60
False-positive rate   0.73             0.00            0.00                0.00

Table 7: Restoration accuracy of text and data sections of each packer.

Packer             text (%)   data (%)
UPX                100        100
NSPack             100        100
MEW                100        100
RLPack             100        100
BeRoEXE            100        100
Neolite            100        100
FSG                100        100
eXpressor          100        100
Molebox            100        100
Petite             100        100
JDpack             100        100
ASPack             84.3       84.3
Yoda's Protector   84.0       84.0
exe32pack          80.8       100
MPRESS             78.8       75.3
Yoda's Crypter     74.5       74.5
PECompact          74.0       74.0
WinUpack           68.0       68.0
Packman            55.0       48.5
Average            89.2       90.2

6. Conclusions

In this paper, we analyzed recent studies of packing detection and unpacking techniques and derived four important observations: no phase integration, no detection combination, no real-restoration, and no unpacking verification. We then proposed an all-in-one unpacking system to address these four observations. First, no phase integration meant that analysts had to perform the packing detection, unpacking, and verification phases separately; we solved this problem by presenting an all-in-one unpacking system that integrates these three phases in a single framework. Second, no detection combination meant that there had been no attempt to combine various packing detection methods; we solved this problem by presenting a hybrid approach that combines existing and new packing detection methods. Third, no real-restoration meant that there had been little discussion about restoring the actual executable file; we solved this problem by actually restoring an original image of the packed file. Fourth, no unpacking verification meant that there had been no explicit work to measure the restoration accuracy of unpacked files; we solved this problem by proposing a verification algorithm that quantitatively evaluates the restoration accuracy.

We constructed a dataset of 2600 PE files and experimentally evaluated the proposed system using the dataset. First, we performed all the phases of the all-in-one unpacking system sequentially and confirmed that its overall mechanism worked well. Next, by comparing the hybrid packing detection approach with existing detection techniques, we showed that the detection accuracy of the proposed method was the highest, at 98.4% on average, with no false positives. For this, we also determined an entropy range of packing, Packing-Range, through extensive experiments on 19 representative packers. Finally, in the unpacking and verification phases, we confirmed that unpacking was successful for all files except the files packed with Yoda's Protector. We also showed that the restoration accuracy of unpacked files was nearly 100% except for a few packers.

As future research, we will focus on unpacking files packed with Anti-VM, Anti-Debug, and Anti-Reverse engineering techniques. We will also study how to compute the restoration accuracy of files having reloc sections.

Data Availability

All the data files used in the experiments are available at https://github.com/chesvectain/PackingData.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was partly supported by the Institute of Information and Communications Technology Planning and Evaluation (IITP) grant funded by the Korea government (MSIT) (no. 2017-0-00158, Development of Cyber Threat Intelligence (CTI) analysis and information sharing technology for national cyber incident response) and the Korea Electric Power Corporation (no. R18XA05).

Table 8: Restoration accuracy according to the presence of the reloc section.

                   text (%)                        data (%)
Packer             reloc present   reloc absent    reloc present   reloc absent
ASPack             0.00            100             0.00            100
Packman            99.8            100             88.1            100
MPRESS             0.00            100             0.00            100
Yoda's Crypter     0.00            100             0.04            100
PECompact          0.00            100             0.00            100
Yoda's Protector   0.00            100             0.05            100
exe32pack          0.00            100             100             100
WinUpack           0.00            100             0.00            100

References

[1] M. Morgenstern, "AV-TEST Security report 2017/18," 2019, https://www.av-test.org/fileadmin/pdf/security_report/AV-TEST_Security_Report_2017-2018.pdf.

[2] T. Ban, R. Isawa, S. Guo, D. Inoue, and K. Nakao, "Efficient malware packer identification using support vector machines with spectrum kernel," in Proceedings of the Eighth Asia Joint Conference on Information Security, pp. 69–76, Seoul, South Korea, July 2013.

[3] B. Cheng, M. Jiang, J. Fu et al., "Towards paving the way for large-scale Windows malware analysis: generic binary unpacking with orders-of-magnitude performance boost," in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pp. 395–411, Toronto, Canada, October 2018.

[4] A. Gupta and M. S. Arya, "Hashing based encryption and anti-debugger support for packing multiple files into single executable," International Journal of Advanced Research in Computer Science, vol. 9, no. 1, pp. 914–920, 2018.

[5] Y. Kawakoya, M. Iwamura, and M. Itoh, "Memory behavior-based automatic malware unpacking in stealth debugging environment," in Proceedings of the 5th International Conference on Malicious and Unwanted Software, pp. 39–46, Lorraine, France, October 2010.

[6] G. Liang, J. Pang, Z. Shan, R. Yang, and Y. Chen, "Automatic benchmark generation framework for malware detection," Security and Communication Networks, vol. 2018, Article ID 4947695, 8 pages, 2018.

[7] A. Pektas and T. Acarman, "A dynamic malware analyzer against virtual machine aware malicious software," Security and Communication Networks, vol. 7, no. 12, pp. 2245–2257, 2014.

[8] I. Santos, J. Nieves, and P. G. Bringas, "Semi-supervised learning for unknown malware detection," in International Symposium on Distributed Computing and Artificial Intelligence, A. Abraham, J. M. Corchado, S. R. Gonzalez, and J. F. De Paz Santana, Eds., pp. 415–422, Springer, Berlin, Germany, 2011.

[9] X. Ugarte-Pedrero, I. Santos, B. Sanz, C. Laorden, and P. G. Bringas, "Countering entropy measure attacks on packed software detection," in Proceedings of the 9th International Conference on Consumer Communications and Networking, pp. 164–168, Las Vegas, NV, USA, January 2012.

[10] H. Won, S.-P. Kim, S. Lee, M.-J. Choi, and Y.-S. Moon, "Secure principal component analysis in multiple distributed nodes," Security and Communication Networks, vol. 9, no. 14, pp. 2348–2358, 2016.

[11] S. Cesare, Y. Xiang, and W. Zhou, "Malwise–an effective and efficient classification system for packed and polymorphic malware," IEEE Transactions on Computers, vol. 62, no. 6, pp. 1193–1206, 2013.

[12] C. Malin, E. Casey, and J. Aquilina, Malware Forensics: Investigating and Analyzing Malicious Code, Syngress, Burlington, MA, USA, 2008.

[13] W. Yan, Z. Zheng, and A. Nirwan, "Revealing packed malware," IEEE Security and Privacy, vol. 6, no. 5, pp. 65–69, 2008.

[14] M. Bat-Erdene, T. Kim, H. Park, and H. Lee, "Packer detection for multi-layer executables using entropy analysis," Entropy, vol. 19, no. 3, p. 125, 2017.

[15] Y.-S. Choi, I.-K. Kim, J.-T. Oh, and J.-C. Ryou, "PE file header analysis-based packed PE file detection technique (PHAD)," in Proceedings of International Symposium on Computer Science and Its Applications, pp. 28–31, Hobart, Australia, October 2008.

[16] S. Han and S. Lee, "Packed PE file detection for malware forensics," The KIPS Transactions: Part C, vol. 16C, no. 5, pp. 555–562, 2009, in Korean.

[17] R. Lyda and J. Hamrock, "Using entropy analysis to find encrypted and packed malware," IEEE Security and Privacy, vol. 5, no. 2, pp. 40–45, 2007.

[18] R. Perdisci, A. Lanzi, and W. Lee, "Classification of packed executables for accurate computer virus detection," Pattern Recognition Letters, vol. 29, no. 14, pp. 1941–1946, 2008.

[19] A. Rohit, A. Singh, H. Pareek, and U. Edara, "A heuristics-based static analysis approach for detecting packed PE binaries," International Journal of Security and Its Applications, vol. 7, no. 5, pp. 257–268, 2013.

[20] G. Amato, "PEframe," 2019, https://github.com/guelfoweb/peframe.

[21] G. Jeong, E. Choo, J. Lee, M. Bat-Erdene, and H. Lee, "Generic unpacking using entropy analysis," Journal of Advanced Information Technology and Convergence, vol. 7, no. 1, pp. 232–238, 2009.

[22] V. DeMarines, "Obfuscation–how to do it and how to crack it," Network Security, vol. 2008, no. 7, pp. 4–7, 2008.

[23] H. Neil, "PEiD," 2019, https://github.com/wolfram77web/app-peid.

[24] O. Yuschuk, "OllyDbg," 2019, http://www.ollydbg.de/download.html.

[25] N. Waisman, "Immunity Debugger," 2019, https://www.immunityinc.com/products/debugger/index.html.

[26] P. Royal, M. Halpin, D. Dagon, R. Edmonds, and W. Lee, "PolyUnpack: Automating the hidden-code extraction of unpack-executing malware," in Proceedings of the 22nd Annual Computer Security Applications Conference, pp. 289–300, Miami Beach, FL, USA, December 2006.

[27] M. Bat-Erdene, Y. Park, H. Li, H. Lee, and M.-S. Choi, "Entropy analysis to classify unknown packing algorithms for malware detection," International Journal of Information Security, vol. 16, no. 3, pp. 227–248, 2017.

[28] S. Cesare and X. Yang, "Classification of malware using structured control flow," in Proceedings of the 8th Australasian Symposium on Parallel and Distributed Computing, pp. 61–70, Brisbane, Australia, January 2010.

[29] L. Martignoni, M. Christodorescu, and S. Jha, "OmniUnpack: fast, generic, and safe unpacking of malware," in Proceedings of the 23rd Annual Computer Security Applications Conference, pp. 431–441, Miami Beach, FL, USA, December 2007.

[30] M. G. Kang, P. Poosankam, and H. Yin, "Renovo: a hidden code extractor for packed executables," in Proceedings of the 5th ACM Workshop on Recurring Malcode, pp. 46–53, Alexandria, VA, USA, November 2007.

[31] S. Mariani, L. Fontana, F. Gritti, and S. D'Alessio, "PinDemonium: a DBI-based generic unpacker for windows executables," in Proceedings of the Black Hat 2016, Las Vegas, NV, USA, July 2016.

[32] G. Bonfante, J. Fernandez, J.-Y. Marion, B. Rouxel, F. Sabatier, and A. Thierry, "CoDisasm: medium scale concatic disassembly of self-modifying binaries with overlapping instructions," in Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 745–756, Denver, CO, USA, October 2016.

[33] M. Oberhumer, L. Molnar, and J. Reiser, "UPX," 2019, https://upx.github.io.

[34] A. Swinnen and A. Mesbahi, "One packer to rule them all: empirical identification, comparison and circumvention of current antivirus detection techniques," in Proceedings of the Black Hat 2014, Las Vegas, NV, USA, August 2014.

[35] P. Bania, "Generic unpacking of self-modifying, aggressive, packed binary programs," 2009, https://arxiv.org/abs/0905.4581.

[36] H. Neil, "BobSoft Signature," 2019, http://woodmann.com/BobSoft/Files/Other/UserDB.zip.

[37] M. Russinovich, "Microsoft Sysinternals Suite," 2019, https://download.sysinternals.com/files/SysinternalsSuite.zip.

[38] R. Rivest, "The MD5 message-digest algorithm," RFC 1321, IETF, Fremont, CA, USA, 1992.

[39] S. Tatham, "Putty," 2019, https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html.

[40] L. Pascoe, "MD5," 2019, http://www.md5summer.org/about.html.

[41] P. Ferrie, "Anti-unpacker tricks - part one," 2019, https://www.virusbulletin.com/virusbulletin/2008/12/anti-unpacker-tricks-part-one.

[42] T.-Y. Wang and C.-H. Wu, "Detection of packed executables using support vector machines," in Proceedings of the 10th International Conference on Machine Learning and Cybernetics, pp. 717–722, Guilin, China, July 2011.

[43] D. Korczynski, "RePEconstruct: reconstructing binaries with self-modifying code and Import address table destruction," in Proceedings of the 11th International Conference on Malicious and Unwanted Software, pp. 31–38, Fajardo, Puerto Rico, October 2016.


Page 11: All-in-One Framework for Detection, Unpacking, and ...downloads.hindawi.com/journals/scn/2019/5278137.pdf · All-in-One Framework for Detection, Unpacking, and Verification for Malware

Num

ber o

f file

s0

20

40

60

80

100

120

140

0 05 1 15 2 25 3 35 4 45 5 55 6 65 7 75 8Entropy of EP section

(e)

0

20

40

60

80

100

120

140

0 05 1 15 2 25 3 35 4 45 5 55 6 65 7 75 8Entropy of EP section

Num

ber o

f file

s

(f )

0

20

40

60

80

100

120

140

0 05 1 15 2 25 3 35 4 45 5 55 6 65 7 75 8Entropy of EP section

Num

ber o

f file

s

(g)

Num

ber o

f file

s

0

20

40

60

80

100

120

140

0 05 1 15 2 25 3 35 4 45 5 55 6 65 7 75 8Entropy of EP section

(h)

0

20

40

60

80

100

120

140

0 05 1 15 2 25 3 35 4 45 5 55 6 65 7 75 8Entropy of EP section

Num

ber o

f file

s

(i)

0

20

40

60

80

100

120

140

0 05 1 15 2 25 3 35 4 45 5 55 6 65 7 75 8Entropy of EP section

Num

ber o

f file

s

(j)

Num

ber o

f file

s

0

20

40

60

80

100

120

140

0 05 1 15 2 25 3 35 4 45 5 55 6 65 7 75 8Entropy of EP section

(k)

0

20

40

60

80

100

120

140

0 05 1 15 2 25 3 35 4 45 5 55 6 65 7 75 8Entropy of EP section

Num

ber o

f file

s

(l)

0

20

40

60

80

100

120

140

0 05 1 15 2 25 3 35 4 45 5 55 6 65 7 75 8Entropy of EP section

Num

ber o

f file

s

(m)

Figure 5 Entropy distributions of the EP section for original and representative packed files (a) Average of original and 19 packers(b) Original (not packed) (c) UPX (d) ASPack (e) NSPack (f ) MPRESS (g) Yodarsquos Protector (h) RLPack (i) BeroEXE (j) MEW(k) PACKMAN (l) WinUpack (m) exe32pack

Security and Communication Networks 11

are mandatory in all PE files we can detect EP sectionsunless intentionally concealed All files of the dataset used inthe experiment also have EP sections and the detection rateby DP-NEP is trivially 0 In other words there is nopacking detection case due to absence of the EP section sowe exclude the experimental result of DP-NEP from Table 5

We now explain the results of Table 5 by each columnFirst we explain the column of DP-SIG In the case of DP-SIG Not-Packed shows a detection rate of 0 since nosignature exists in normal files is is because packingdetection does not occur in normal files which means thatDP-SIG performs an accurate detection for Not-Packed filesOn the other hand for the packed files DP-SIG detects 100of packers except NSPack e detection rate of NSPack ismerely 538 because of the nature of NSPack that thesignatures often exist in different sections rather than the EPsection In summary DP-SIG shows 950 of detectionaccuracy on average

Second for the experiment on the detection rate of DP-WR we intentionally remove the WRITE attribute from 39files (30 of packed files) per packer We perform thisremoval process in 18 packers except exe32pack e reasonfor changing the dataset is that normal packed files usuallyhaveWRITE attributes so we arbitrarily remove theWRITEattribute from 30 of files for the DP-WR experimentLooking at the DP-WR column in Table 5 DP-WR detectsonly 70 files per packer for 18 packers since we in-tentionally remove the WRITE attribute from 30 files Incase of exe32pack however we need not perform the re-moval process since it has no WRITE attribute in the PEheader And accordingly as shown in the column DP-WR ofTable 5 the detection rate of exe32pack by DP-WR istrivially 0 In the case of Not-Packed the detection rate is0 since normal files have noWRITE attribute In summary

checking the presence of theWRITE attribute is very simpleand DP-WR works correctly if the packed file has theWRITE attribute but it does not work well if the WRITEattribute does not exist or maliciously modified

ird the column of DP-ENT shows the result of en-tropy-based detection rates As shown in the table it showsthe 100 detection rate for 9 packers including NSPackUPX MEW and BeRoEXEPacker but it incurs some falsenegatives for 10 packers including ASPack Yodarsquos ProtectorPackman and MPRESS by showing the detection rate lowerthan 100 In particular for MPRESS Petite JDpackeXpressor and Neolite it shows a low detection rate below246 because their entropy value in the packer charac-teristic is in entropy distribution of the normal file DP-ENTalso has a false-positive rate of 384 for Not-Packed Insummary DP-ENT works very accurately for most packersbut it may incur false negatives for a few specific packers orfalse positives for some normal files To overcome thislimitation we use DP-ENT together with other detectiontechniques We can see the improvement through the ex-perimental results of Tables 5 and 6 In the tables when usingonly DP-ENT the detection accuracy is low in some packers(especially MPRESS) but when using it together with theother three techniques the accuracy is close to 100

Fourth the proposed hybrid approach of integratingfour techniques detects 100 of all packers except NSPackand shows no false positives e reason for this high de-tection rate is that the proposed method has all the ad-vantages of DP-SIG DP-WR and DP-ENT as well as DP-NEP In case of NSPack however the detection rate ismerely 700 this is because as mentioned above weforcedly remove the WRITE attribute from the 30 fileAccording to the results of Table 5 detection rate differsaccording to each detection technique and some techniques

Table 5 Comparison of packing detection rates of three techniques and the proposed hybrid approach

PackerTechnique

DP-SIG () DP-WR () DP-ENT () Hybrid approach ()ASPack 100 700 915 100NSPack 538 700 100 700MPRESS 100 700 246 100UPX 100 700 100 100Yodarsquos Protector 100 700 969 100MEW 100 700 100 100BeRoEXEPacker 100 700 100 100Packman 100 700 938 100RLPack 100 700 993 100PECompact 100 700 100 100Petite 100 700 000 100JDpack 100 700 562 100Molebox 100 700 100 100eXpressor 100 700 000 100Yodarsquos Crypter 100 700 915 100FSG 100 700 100 100exe32pack 100 000 100 100WinUpack 100 700 100 100Neolite 100 700 161 100Average 950 663 739 984Not-Packed 000 000 384 000

12 Security and Communication Networks

incur false positives or false negatives on the other hand theproposed hybrid shows the highest detection rate of 984and no false positives

Table 6 shows the detection accuracy of two or threecombinations of detection techniques and of the proposedhybrid approach First DP-SIG ENT a combination ofsignature and entropy tests detects 100 of the packed filebut detects 18 files incorrectly showing a false-positive rateof 073 Second DP-ENTWR a combination of entropyand WRITE tests shows a low detection rate of 920because the entropy value of MPRESS is in the distributionof normal file ird DP-SIG ENT WR a combination ofsignature entropy and WRITE tests and our hybrid ap-proach show 984 detection accuracy and 160 false-negative rate is is because DP-SIG ENT WR and thehybrid approach go through the same detection procedureexcept DP-NEP e false-negative rate of 160 is due tothe files having no WRITE attributes According to theresults of Tables 5 and 6 we believe that the proposed hybridapproach is an excellent method for high detection accuracyby combining the advantages of the existing and new de-tection techniques

54 Experimental Results on Unpacking and VerificationPhases In this section we verify the unpacking and veri-fication phases by performing the unpacking of Algorithm 2and calculating the restoration accuracy of the files throughthe actual restoration of Algorithm 3 However unpackingand verification phases take a lot of time so we experimentwith randomly selected 40 files of the dataset eunpacking phase consists of static unpacking and dynamicunpacking In the experiment we exclude static unpackingsince it uses the existing library as it is and focus on onlydynamic unpacking of Algorithm 2 since it tries to restorethe original file without any information of packers

Table 7 shows the restoration accuracy of the proposedmethod for text and data sections for each of the 19 packersWe cannot directly unpack Yodarsquos Protector and PECom-pact since they use Anti-Debug engineering techniques Toovercome this problem we unpack these packers bybypassing three Anti-Debugging API calls GetCurrent-ProcessID() BlockInput() and IsDebuggerPresent() [41]at is we run Algorithm 2 with the bypassing techniquesfor Yodarsquos Protector and PECompact and include its ex-perimental result in Table 7) In general since only text anddata sections of the original file are packed [42] in theexperiment we consider only the restoration accuracy ofthese text and data sections of the restored file In Table 7among 19 packers 11 packers including UPX NSPackMEW RLPack and BeRoEXEPacker show at 100 resto-ration accuracy for both text and data sections On the other

hand 8 packers including ASPack Packman and MPRESSshow less than 100 restoration accuracy because of themanner in which the packers handle the reloc section Morespecifically if there is a reloc section in the original file theunpacking relocates the text and data sections changing allthe data values in each section erefore the restorationaccuracy depends on whether there is a reloc section in theoriginal file

Table 8 shows the restoration results in detail accordingto the presence of the reloc section As shown in the tablethe restoration accuracy is 100 in the absence of the relocsection while it is very low in the range of 000 to 998 inthe presence of the reloc section However the low resto-ration accuracy is due to the nature of the packer and it doesnot mean that unpacking does not work correctly In otherwords even if the unpacking is successful the data valuemay change due to the packer characteristic and the res-toration accuracy is calculated to be low In summary wecan use the proposed verification algorithm to calculate therestoration accuracy of unpacked files which can quanti-tatively measure the accuracy of the actual restoration

In addition to the .text, .data, and .reloc sections, we further consider the PE header, which includes the IAT information. In general, the location of the IAT is recorded in the header of the PE file, and if we actually need to run the unpacked file, we must also restore this header information, including the IAT. In fact, the studies in [3, 43] restore the header, including the IAT, and produce the actual whole executable file. On the other hand, most studies, including this paper, do not restore the whole executable file itself but restore the .text, .data, and .reloc sections, which are the major parts of the executable file. This is because the purpose of unpacking is not to execute the restored file but to identify and extract the malicious code included in it. Therefore, in this paper, we do not need to restore the header required for actual execution, and accordingly a comparison of IAT information is also not necessary. Instead, we restore and compare the .text, .data, and .reloc sections, which are more likely to hide malicious code. Restoring the PE header is beyond the scope of this study; we leave it as a separate research topic.

Table 6: Comparison of packing detection accuracy of two or three combinations and of the hybrid approach.

| Metrics | DP-SIG ENT (%) | DP-ENT WR (%) | DP-SIG ENT WR (%) | Hybrid approach (%) |
|---|---|---|---|---|
| Detection accuracy | 95.0 | 92.0 | 98.4 | 98.4 |
| False-negative rate | 5.00 | 8.00 | 1.60 | 1.60 |
| False-positive rate | 0.73 | 0.00 | 0.00 | 0.00 |

Table 7: Restoration accuracy of .text and .data sections of each packer.

| Packer | .text (%) | .data (%) |
|---|---|---|
| UPX | 100 | 100 |
| NSPack | 100 | 100 |
| MEW | 100 | 100 |
| RLPack | 100 | 100 |
| BeRoEXEPacker | 100 | 100 |
| Neolite | 100 | 100 |
| FSG | 100 | 100 |
| eXpressor | 100 | 100 |
| Molebox | 100 | 100 |
| Petite | 100 | 100 |
| JDpack | 100 | 100 |
| ASPack | 84.3 | 84.3 |
| Yoda's Protector | 84.0 | 84.0 |
| exe32pack | 80.8 | 100 |
| MPRESS | 78.8 | 75.3 |
| Yoda's Crypter | 74.5 | 74.5 |
| PECompact | 74.0 | 74.0 |
| WinUpack | 68.0 | 68.0 |
| Packman | 55.0 | 48.5 |
| Average | 89.2 | 90.2 |

6. Conclusions

In this paper, we analyzed recent studies of packing detection and unpacking techniques and derived four important observations: no phase integration, no detection combination, no real-restoration, and no unpacking verification. We then proposed an all-in-one unpacking system to address these four observations. First, no phase integration meant that analysts had to perform the packing detection, unpacking, and verification phases separately, and we solved this problem by presenting an all-in-one unpacking system that integrates these three phases in a single framework. Second, no detection combination meant that there had been no attempt to combine various packing detection methods, and we solved this problem by presenting a hybrid approach that combines existing and new packing detection methods. Third, no real-restoration meant that there had been little discussion about restoring the actual executable file, and we solved this problem by actually restoring an original image of the packed file. Fourth, no unpacking verification meant that there had been no explicit work to measure the restoration accuracy of unpacked files, and we solved this problem by proposing a verification algorithm that quantitatively evaluates the restoration accuracy.

We constructed a dataset of 2,600 PE files and experimentally evaluated the proposed system using the dataset. First, we performed all the phases of the all-in-one unpacking system sequentially and confirmed that the overall mechanism worked well. Next, through the comparison of the hybrid packing detection approach and existing detection techniques, we showed that the detection accuracy of the proposed method was the highest, at 98.4% on average, with no false positives. For this, we also determined an entropy range of packing, Packing-Range, through extensive experiments on 19 representative packers. Finally, in the unpacking and verification phases, we confirmed that unpacking was successful for all files except the files packed with Yoda's Protector. We also showed that the restoration accuracy of unpacked files was nearly 100%, except for a few packers.

As future research, we will focus on unpacking packed files that use Anti-VM, Anti-Debug, and Anti-Reverse engineering techniques. We will also study how to compute the restoration accuracy of files having .reloc sections.

Data Availability

All the data files used in the experiments are available at https://github.com/chesvectain/PackingData.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was partly supported by the Institute of Information and Communications Technology Planning and Evaluation (IITP) grant funded by the Korea government (MSIT) (no. 2017-0-00158, Development of Cyber Threat Intelligence (CTI) analysis and information sharing technology for national cyber incident response) and the Korea Electric Power Corporation (no. R18XA05).

Table 8: Restoration accuracy according to the presence of the .reloc section.

| Packer | .text, .reloc present (%) | .text, .reloc absent (%) | .data, .reloc present (%) | .data, .reloc absent (%) |
|---|---|---|---|---|
| ASPack | 0.00 | 100 | 0.00 | 100 |
| Packman | 99.8 | 100 | 88.1 | 100 |
| MPRESS | 0.00 | 100 | 0.00 | 100 |
| Yoda's Crypter | 0.00 | 100 | 0.04 | 100 |
| PECompact | 0.00 | 100 | 0.00 | 100 |
| Yoda's Protector | 0.00 | 100 | 0.05 | 100 |
| exe32pack | 0.00 | 100 | 100 | 100 |
| WinUpack | 0.00 | 100 | 0.00 | 100 |

References

[1] M. Morgenstern, “AV-TEST Security report 2017/18,” 2019, https://www.av-test.org/fileadmin/pdf/security_report/AV-TEST_Security_Report_2017-2018.pdf.

[2] T. Ban, R. Isawa, S. Guo, D. Inoue, and K. Nakao, “Efficient malware packer identification using support vector machines with spectrum kernel,” in Proceedings of the Eighth Asia Joint Conference on Information Security, pp. 69–76, Seoul, South Korea, July 2013.

[3] B. Cheng, M. Jiang, J. Fu et al., “Towards paving the way for large-scale Windows malware analysis: generic binary unpacking with orders-of-magnitude performance boost,” in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pp. 395–411, Toronto, Canada, October 2018.

[4] A. Gupta and M. S. Arya, “Hashing based encryption and anti-debugger support for packing multiple files into single executable,” International Journal of Advanced Research in Computer Science, vol. 9, no. 1, pp. 914–920, 2018.

[5] Y. Kawakoya, M. Iwamura, and M. Itoh, “Memory behavior-based automatic malware unpacking in stealth debugging environment,” in Proceedings of the 5th International Conference on Malicious and Unwanted Software, pp. 39–46, Lorraine, France, October 2010.

[6] G. Liang, J. Pang, Z. Shan, R. Yang, and Y. Chen, “Automatic benchmark generation framework for malware detection,” Security and Communication Networks, vol. 2018, Article ID 4947695, 8 pages, 2018.

[7] A. Pektas and T. Acarman, “A dynamic malware analyzer against virtual machine aware malicious software,” Security and Communication Networks, vol. 7, no. 12, pp. 2245–2257, 2014.

[8] I. Santos, J. Nieves, and P. G. Bringas, “Semi-supervised learning for unknown malware detection,” in International Symposium on Distributed Computing and Artificial Intelligence, A. Abraham, J. M. Corchado, S. R. Gonzalez, and J. F. De Paz Santana, Eds., pp. 415–422, Springer, Berlin, Germany, 2011.

[9] X. Ugarte-Pedrero, I. Santos, B. Sanz, C. Laorden, and P. G. Bringas, “Countering entropy measure attacks on packed software detection,” in Proceedings of the 9th International Conference on Consumer Communications and Networking, pp. 164–168, Las Vegas, NV, USA, January 2012.

[10] H. Won, S.-P. Kim, S. Lee, M.-J. Choi, and Y.-S. Moon, “Secure principal component analysis in multiple distributed nodes,” Security and Communication Networks, vol. 9, no. 14, pp. 2348–2358, 2016.

[11] S. Cesare, Y. Xiang, and W. Zhou, “Malwise–an effective and efficient classification system for packed and polymorphic malware,” IEEE Transactions on Computers, vol. 62, no. 6, pp. 1193–1206, 2013.

[12] C. Malin, E. Casey, and J. Aquilina, Malware Forensics: Investigating and Analyzing Malicious Code, Syngress, Burlington, MA, USA, 2008.

[13] W. Yan, Z. Zheng, and A. Nirwan, “Revealing packed malware,” IEEE Security and Privacy, vol. 6, no. 5, pp. 65–69, 2008.

[14] M. Bat-Erdene, T. Kim, H. Park, and H. Lee, “Packer detection for multi-layer executables using entropy analysis,” Entropy, vol. 19, no. 3, p. 125, 2017.

[15] Y.-S. Choi, I.-K. Kim, J.-T. Oh, and J.-C. Ryou, “PE file header analysis-based packed PE file detection technique (PHAD),” in Proceedings of International Symposium on Computer Science and Its Applications, pp. 28–31, Hobart, Australia, October 2008.

[16] S. Han and S. Lee, “Packed PE file detection for malware forensics,” The KIPS Transactions: Part C, vol. 16C, no. 5, pp. 555–562, 2009, in Korean.

[17] R. Lyda and J. Hamrock, “Using entropy analysis to find encrypted and packed malware,” IEEE Security and Privacy, vol. 5, no. 2, pp. 40–45, 2007.

[18] R. Perdisci, A. Lanzi, and W. Lee, “Classification of packed executables for accurate computer virus detection,” Pattern Recognition Letters, vol. 29, no. 14, pp. 1941–1946, 2008.

[19] A. Rohit, A. Singh, H. Pareek, and U. Edara, “A heuristics-based static analysis approach for detecting packed PE binaries,” International Journal of Security and Its Applications, vol. 7, no. 5, pp. 257–268, 2013.

[20] G. Amato, “PEframe,” 2019, https://github.com/guelfoweb/peframe.

[21] G. Jeong, E. Choo, J. Lee, M. Bat-Erdene, and H. Lee, “Generic unpacking using entropy analysis,” Journal of Advanced Information Technology and Convergence, vol. 7, no. 1, pp. 232–238, 2009.

[22] V. DeMarines, “Obfuscation–how to do it and how to crack it,” Network Security, vol. 2008, no. 7, pp. 4–7, 2008.

[23] H. Neil, “PEiD,” 2019, https://github.com/wolfram77web/app-peid.

[24] O. Yuschuk, “OllyDbg,” 2019, http://www.ollydbg.de/download.html.

[25] N. Waisman, “Immunity Debugger,” 2019, https://www.immunityinc.com/products/debugger/index.html.

[26] P. Royal, M. Halpin, D. Dagon, R. Edmonds, and W. Lee, “PolyUnpack: automating the hidden-code extraction of unpack-executing malware,” in Proceedings of the 22nd Annual Computer Security Applications Conference, pp. 289–300, Miami Beach, FL, USA, December 2006.

[27] M. Bat-Erdene, Y. Park, H. Li, H. Lee, and M.-S. Choi, “Entropy analysis to classify unknown packing algorithms for malware detection,” International Journal of Information Security, vol. 16, no. 3, pp. 227–248, 2017.

[28] S. Cesare and X. Yang, “Classification of malware using structured control flow,” in Proceedings of the 8th Australasian Symposium on Parallel and Distributed Computing, pp. 61–70, Brisbane, Australia, January 2010.

[29] L. Martignoni, M. Christodorescu, and S. Jha, “OmniUnpack: fast, generic, and safe unpacking of malware,” in Proceedings of the 23rd Annual Computer Security Applications Conference, pp. 431–441, Miami Beach, FL, USA, December 2007.

[30] M. G. Kang, P. Poosankam, and H. Yin, “Renovo: a hidden code extractor for packed executables,” in Proceedings of the 5th ACM Workshop on Recurring Malcode, pp. 46–53, Alexandria, VA, USA, November 2007.

[31] S. Mariani, L. Fontana, F. Gritti, and S. D'Alessio, “PinDemonium: a DBI-based generic unpacker for windows executables,” in Proceedings of the Black Hat 2016, Las Vegas, NV, USA, July 2016.

[32] G. Bonfante, J. Fernandez, J.-Y. Marion, B. Rouxel, F. Sabatier, and A. Thierry, “CoDisasm: medium scale concatic disassembly of self-modifying binaries with overlapping instructions,” in Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 745–756, Denver, CO, USA, October 2016.

[33] M. Oberhumer, L. Molnar, and J. Reiser, “UPX,” 2019, https://upx.github.io.

[34] A. Swinnen and A. Mesbahi, “One packer to rule them all: empirical identification, comparison, and circumvention of current antivirus detection techniques,” in Proceedings of the Black Hat 2014, Las Vegas, NV, USA, August 2014.

[35] P. Bania, “Generic unpacking of self-modifying, aggressive, packed binary programs,” 2009, https://arxiv.org/abs/0905.4581.

[36] H. Neil, “BobSoft Signature,” 2019, http://woodmann.com/BobSoft/Files/Other/UserDB.zip.

[37] M. Russinovich, “Microsoft Sysinternals Suite,” 2019, https://download.sysinternals.com/files/SysinternalsSuite.zip.

[38] R. Rivest, “The MD5 message-digest algorithm,” RFC 1321, IETF, Fremont, CA, USA, 1992.

[39] S. Tatham, “Putty,” 2019, https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html.

[40] L. Pascoe, “MD5,” 2019, http://www.md5summer.org/about.html.

[41] P. Ferrie, “Anti-unpacker tricks - part one,” 2019, https://www.virusbulletin.com/virusbulletin/2008/12/anti-unpacker-tricks-part-one.

[42] T.-Y. Wang and C.-H. Wu, “Detection of packed executables using support vector machines,” in Proceedings of the 10th International Conference on Machine Learning and Cybernetics, pp. 717–722, Guilin, China, July 2011.

[43] D. Korczynski, “RePEconstruct: reconstructing binaries with self-modifying code and import address table destruction,” in Proceedings of the 11th International Conference on Malicious and Unwanted Software, pp. 31–38, Fajardo, Puerto Rico, October 2016.



are mandatory in all PE files, we can detect EP sections unless they are intentionally concealed. All files of the dataset used in the experiment also have EP sections, so the detection rate of DP-NEP is trivially 0%. In other words, no file is detected as packed owing to the absence of the EP section, so we exclude the experimental result of DP-NEP from Table 5.
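The check behind DP-NEP can be sketched in a few lines of Python, under the assumption that it reports a file as packed when no section contains the entry point. The section tuples and the names `find_ep_section` and `dp_nep` are hypothetical and serve only as an illustration.

```python
from typing import Iterable, Optional, Tuple

# (section name, virtual address, virtual size); a simplified stand-in
# for the entries of a PE section table.
Section = Tuple[str, int, int]

def find_ep_section(entry_point_rva: int, sections: Iterable[Section]) -> Optional[str]:
    """Return the name of the section that contains the entry point, if any."""
    for name, virtual_address, virtual_size in sections:
        if virtual_address <= entry_point_rva < virtual_address + virtual_size:
            return name
    return None

def dp_nep(entry_point_rva: int, sections: Iterable[Section]) -> bool:
    """Assumed DP-NEP rule: flag the file when no EP section can be found."""
    return find_ep_section(entry_point_rva, sections) is None
```

With sections `[(".text", 0x1000, 0x2000), (".data", 0x3000, 0x1000)]`, an entry point at 0x1500 lies in `.text`, so `dp_nep` returns `False`; this corresponds to the situation of the dataset described above, where every file has an EP section.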

We now explain the results of Table 5 column by column. First, we explain the DP-SIG column. For DP-SIG, Not-Packed shows a detection rate of 0% since no packer signature exists in normal files. That is, no packing is detected in normal files, which means that DP-SIG classifies Not-Packed files accurately. On the other hand, for the packed files, DP-SIG detects 100% of the files for every packer except NSPack. The detection rate of NSPack is merely 53.8% because NSPack signatures often exist in sections other than the EP section. In summary, DP-SIG shows a detection accuracy of 95.0% on average.
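The NSPack behaviour can be illustrated with a small Python sketch of signature matching. It assumes signatures are hexadecimal byte patterns in which `??` is a wildcard; the pattern, the byte string, and the function names are hypothetical and do not come from a real signature database.

```python
def parse_signature(text):
    """Turn 'AA ?? BB' into [0xAA, None, 0xBB]; None marks a wildcard byte."""
    return [None if tok == "??" else int(tok, 16) for tok in text.split()]

def matches_at(data, offset, pattern):
    """True if `pattern` matches `data` starting exactly at `offset`."""
    if offset < 0 or offset + len(pattern) > len(data):
        return False
    return all(p is None or data[offset + i] == p for i, p in enumerate(pattern))

# Hypothetical signature and file image; the entry point is at offset 0.
signature = parse_signature("60 BE ?? ?? 8D")
image = bytes([0x55, 0x8B, 0xEC, 0x60, 0xBE, 0x12, 0x34, 0x8D])

# A check anchored at the entry point misses the pattern ...
hit_at_ep = matches_at(image, 0, signature)
# ... although the same pattern is present elsewhere in the image.
hit_anywhere = any(matches_at(image, i, signature) for i in range(len(image)))
```

Here `hit_at_ep` is `False` while `hit_anywhere` is `True`, which mirrors the case in which a signature exists outside the EP section and an EP-anchored test therefore reports a false negative.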

Second, for the experiment on the detection rate of DP-WR, we intentionally remove the WRITE attribute from 39 files (30% of the packed files) per packer. We perform this removal for 18 packers, that is, all packers except exe32pack. The reason for changing the dataset is that normally packed files usually have the WRITE attribute, so we arbitrarily remove it from 30% of the files for the DP-WR experiment. Looking at the DP-WR column in Table 5, DP-WR detects only 70% of the files per packer for the 18 packers since we intentionally remove the WRITE attribute from 30% of the files. In the case of exe32pack, however, we need not perform the removal since it has no WRITE attribute in the PE header. Accordingly, as shown in the DP-WR column of Table 5, the detection rate of exe32pack by DP-WR is trivially 0%. In the case of Not-Packed, the detection rate is 0% since normal files have no WRITE attribute. In summary, checking the presence of the WRITE attribute is very simple, and DP-WR works correctly if the packed file has the WRITE attribute, but it does not work well if the WRITE attribute does not exist or has been maliciously modified.
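A WRITE-attribute test of this kind reduces to a single bit check, as the following Python sketch shows. The flag values are the section-characteristics constants of the PE format; treating DP-WR as a test on the characteristics of the EP section is an assumption of this example, and the sample values are illustrative.

```python
# Section-characteristics flags defined by the PE format.
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000

def dp_wr(ep_section_characteristics: int) -> bool:
    """Assumed DP-WR rule: flag the file if the EP section is writable."""
    return bool(ep_section_characteristics & IMAGE_SCN_MEM_WRITE)

# Illustrative values: an executable, readable, and writable EP section,
# and the same section after the WRITE flag has been cleared.
packed_like = IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_WRITE
write_removed = packed_like & ~IMAGE_SCN_MEM_WRITE
```

`dp_wr(packed_like)` is `True` and `dp_wr(write_removed)` is `False`, so clearing one bit is enough to evade this test on its own, which is the weakness noted above.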

Third, the DP-ENT column shows the entropy-based detection rates. As shown in the table, DP-ENT shows a 100% detection rate for 9 packers, including NSPack, UPX, MEW, and BeRoEXEPacker, but it incurs some false negatives for 10 packers, including ASPack, Yoda's Protector, Packman, and MPRESS, for which the detection rate is lower than 100%. In particular, for MPRESS, Petite, JDpack, eXpressor, and Neolite, it shows low detection rates, between 0.00% and 56.2%, because, owing to the characteristics of these packers, their entropy values fall within the entropy distribution of normal files. DP-ENT also has a false-positive rate of 3.84% for Not-Packed. In summary, DP-ENT works very accurately for most packers, but it may incur false negatives for a few specific packers or false positives for some normal files. To overcome this limitation, we use DP-ENT together with other detection techniques. We can see the improvement through the experimental results of Tables 5 and 6. In the tables, when using only DP-ENT, the detection accuracy is low for some packers (especially MPRESS), but when using it together with the other three techniques, the accuracy is close to 100%.
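An entropy-based test of this kind can be sketched in Python as follows. The sketch uses the standard byte-level Shannon entropy; the function names and the bounds passed to `dp_ent` are hypothetical placeholders and are not the Packing-Range determined in this paper.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())

def dp_ent(data: bytes, low: float, high: float) -> bool:
    """Flag the data as packed if its entropy lies in [low, high]."""
    return low <= shannon_entropy(data) <= high

# Illustrative inputs: constant bytes (minimum entropy) and every byte
# value exactly once (maximum entropy of 8 bits per byte).
uniform = bytes(range(256))
constant = b"\x00" * 64
```

With placeholder bounds such as `dp_ent(uniform, 6.5, 8.0)`, high-entropy data is flagged while `constant` is not. A packer whose output entropy overlaps the range of normal files would fall outside any such interval, which is the false-negative case described above.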

Fourth, the proposed hybrid approach, which integrates the four techniques, detects 100% of the files for all packers except NSPack and shows no false positives. The reason for this high detection rate is that the proposed method has all the advantages of DP-SIG, DP-WR, and DP-ENT as well as DP-NEP. In the case of NSPack, however, the detection rate is merely 70.0%; this is because, as mentioned above, we forcibly remove the WRITE attribute from 30% of the files. According to the results of Table 5, the detection rate differs for each detection technique, and some techniques incur false positives or false negatives; on the other hand, the proposed hybrid approach shows the highest detection rate of 98.4% and no false positives.

Table 5: Comparison of packing detection rates of three techniques and the proposed hybrid approach.

| Packer | DP-SIG (%) | DP-WR (%) | DP-ENT (%) | Hybrid approach (%) |
|---|---|---|---|---|
| ASPack | 100 | 70.0 | 91.5 | 100 |
| NSPack | 53.8 | 70.0 | 100 | 70.0 |
| MPRESS | 100 | 70.0 | 24.6 | 100 |
| UPX | 100 | 70.0 | 100 | 100 |
| Yoda's Protector | 100 | 70.0 | 96.9 | 100 |
| MEW | 100 | 70.0 | 100 | 100 |
| BeRoEXEPacker | 100 | 70.0 | 100 | 100 |
| Packman | 100 | 70.0 | 93.8 | 100 |
| RLPack | 100 | 70.0 | 99.3 | 100 |
| PECompact | 100 | 70.0 | 100 | 100 |
| Petite | 100 | 70.0 | 0.00 | 100 |
| JDpack | 100 | 70.0 | 56.2 | 100 |
| Molebox | 100 | 70.0 | 100 | 100 |
| eXpressor | 100 | 70.0 | 0.00 | 100 |
| Yoda's Crypter | 100 | 70.0 | 91.5 | 100 |
| FSG | 100 | 70.0 | 100 | 100 |
| exe32pack | 100 | 0.00 | 100 | 100 |
| WinUpack | 100 | 70.0 | 100 | 100 |
| Neolite | 100 | 70.0 | 16.1 | 100 |
| Average | 95.0 | 66.3 | 73.9 | 98.4 |
| Not-Packed | 0.00 | 0.00 | 3.84 | 0.00 |
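One way to picture how four tests can be combined is the following Python sketch. The ordering and the rule that the WRITE and entropy tests must both fire are assumptions chosen so that a file with neither a signature hit nor a WRITE attribute is missed, as reported for NSPack; the sketch is not the detection algorithm of this paper, and `hybrid_is_packed` is a hypothetical name.

```python
def hybrid_is_packed(has_ep_section: bool, signature_hit: bool,
                     ep_writable: bool, entropy_in_range: bool) -> bool:
    """Assumed cascade over the four individual test outcomes."""
    if not has_ep_section:   # DP-NEP: no EP section found
        return True
    if signature_hit:        # DP-SIG: known packer signature found
        return True
    # DP-WR and DP-ENT: assumed to be required together, which keeps a
    # high-entropy but non-writable normal file from being flagged.
    return ep_writable and entropy_in_range
```

Under these assumptions, a file with a signature hit is detected regardless of the other tests, whereas a file with no signature hit and a removed WRITE attribute is missed even when its entropy is high.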

Table 6 shows the detection accuracy of combinations of two or three detection techniques and of the proposed hybrid approach. First, DP-SIG ENT, a combination of signature and entropy tests, detects 100% of the packed files but incorrectly detects 18 files, showing a false-positive rate of 0.73%. Second, DP-ENT WR, a combination of entropy and WRITE tests, shows a low detection rate of 92.0% because the entropy value of MPRESS lies within the distribution of normal files. Third, DP-SIG ENT WR, a combination of signature, entropy, and WRITE tests, and our hybrid approach both show 98.4% detection accuracy and a 1.60% false-negative rate. This is because DP-SIG ENT WR and the hybrid approach go through the same detection procedure except for DP-NEP. The false-negative rate of 1.60% is due to the files having no WRITE attribute. According to the results of Tables 5 and 6, we believe that the proposed hybrid approach achieves high detection accuracy by combining the advantages of the existing and new detection techniques.
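The three metrics can be computed from confusion counts as in the following Python sketch. The false-negative and false-positive rates follow their standard definitions; treating detection accuracy as the share of packed files that are detected is an assumption of this example, and the counts are made up for illustration.

```python
def detection_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Return the detection rate, false-negative rate, and false-positive rate in percent.

    tp/fn count packed files that were detected/missed;
    fp/tn count not-packed files that were flagged/passed.
    """
    packed = tp + fn
    not_packed = fp + tn
    return {
        "detection": 100.0 * tp / packed,
        "false_negative": 100.0 * fn / packed,
        "false_positive": 100.0 * fp / not_packed,
    }

# Made-up counts: 100 packed files (95 detected) and 100 normal files
# (1 wrongly flagged).
example = detection_metrics(tp=95, fn=5, fp=1, tn=99)
```

For these counts the result is a detection rate of 95.0%, a false-negative rate of 5.0%, and a false-positive rate of 1.0%; by construction, the first two values always sum to 100%.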

5.4. Experimental Results on Unpacking and Verification Phases. In this section, we evaluate the unpacking and verification phases by performing the unpacking of Algorithm 2 and calculating the restoration accuracy of the files through the actual restoration of Algorithm 3. However, the unpacking and verification phases take a lot of time, so we experiment with randomly selected 40 files of the dataset. The unpacking phase consists of static unpacking and dynamic unpacking. In the experiment, we exclude static unpacking since it uses the existing library as it is, and we focus only on the dynamic unpacking of Algorithm 2 since it tries to restore the original file without any information about the packers.


[33] M Oberhumer L Molnar and J Reiser ldquoUPXrdquo 2019 httpsupxgithubio

[34] A Swinnen and A Mesbahi ldquoOne packer to rule them allempirical identification comparison and circumvention ofcurrent antivirus detection techniquesrdquo in Proceedings of theBlack Hat 2014 Las Vegas NV USA August 2014

[35] P Bania ldquoGeneric unpacking of self-modifying aggressivepacked binary programsrdquo 2009 httpsarxivorgabs09054581

[36] H Neil ldquoBobSoft Signaturerdquo 2019 httpwoodmanncomBobSoftFilesOtherUserDBzip

[37] M Russinovich ldquoMicrosoft Sysinternals Suiterdquo 2019 httpsdownloadsysinternalscomfilesSysinternalsSuitezip

[38] R Rivest ldquoe MD5 message-digest algorithmrdquo RFC 1321IETF Fremont CA USA 1992

[39] S Tatham ldquoPuttyrdquo 2019 httpswwwchiarkgreenendorguk~sgtathamputtylatesthtml

[40] L Pascoe ldquoMD5rdquo 2019 httpwwwmd5summerorgabouthtml[41] P Ferrie ldquoAnti-unpacker tricks - part onerdquo 2019 https

wwwvirusbulletincomvirusbulletin200812anti-unpacker-tricks-part-one

[42] T-Y Wang and C-H Wu ldquoDetection of packed executablesusing support vector machinesrdquo in Proceedings of the 10thInternational Conference on Machine Learning and Cyber-netics pp 717ndash722 Guilin China July 2011

[43] D Korczynski ldquoRePEconstruct reconstructing binaries withself-modifying code and Import address table destructionrdquo inProceedings of the 11th International Conference on Maliciousand Unwanted Software pp 31ndash38 Fajardo Puerto RicoOctober 2016

16 Security and Communication Networks

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 13: All-in-One Framework for Detection, Unpacking, and ...downloads.hindawi.com/journals/scn/2019/5278137.pdf · All-in-One Framework for Detection, Unpacking, and Verification for Malware

incur false positives or false negatives. On the other hand, the proposed hybrid approach shows the highest detection rate of 98.4% and no false positives.

Table 6 shows the detection accuracy of combinations of two or three detection techniques and of the proposed hybrid approach. First, DP-SIG+ENT, a combination of the signature and entropy tests, detects 100% of the packed files but classifies 18 files incorrectly, giving a false-positive rate of 0.73%. Second, DP-ENT+WR, a combination of the entropy and WRITE tests, shows a low detection rate of 92.0% because the entropy value of MPRESS falls within the distribution of normal files. Third, DP-SIG+ENT+WR, a combination of the signature, entropy, and WRITE tests, and our hybrid approach both show 98.4% detection accuracy and a 1.60% false-negative rate. This is because DP-SIG+ENT+WR and the hybrid approach go through the same detection procedure except for DP-NEP. The false-negative rate of 1.60% is due to files that have no WRITE attribute. Based on the results of Tables 5 and 6, we believe that the proposed hybrid approach achieves high detection accuracy by combining the advantages of the existing and new detection techniques.
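The three metrics reported in Table 6 can be sketched as follows. This is a minimal illustration under an assumption that is not stated in this section: detection accuracy is taken as the share of packed files that are flagged as packed, so that detection accuracy and false-negative rate sum to 100%, as the columns of Table 6 do. The function name and the counts in the usage note are hypothetical and are not taken from the dataset.

```python
def detection_metrics(tp: int, fn: int, fp: int, tn: int):
    """Return (detection accuracy, false-negative rate, false-positive rate) in percent.

    tp: packed files flagged as packed      fn: packed files missed
    fp: non-packed files flagged as packed  tn: non-packed files passed
    Assumption: accuracy is measured over packed files only.
    """
    packed = tp + fn
    not_packed = fp + tn
    if packed == 0 or not_packed == 0:
        raise ValueError("both classes need at least one file")
    detection_accuracy = 100.0 * tp / packed
    false_negative_rate = 100.0 * fn / packed
    false_positive_rate = 100.0 * fp / not_packed
    return detection_accuracy, false_negative_rate, false_positive_rate
```

For example, with made-up counts of 246 detected and 4 missed packed files and no misclassified normal files, `detection_metrics(246, 4, 0, 100)` yields a detection accuracy of 98.4%, a false-negative rate of 1.6%, and a false-positive rate of 0%.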

5.4. Experimental Results on Unpacking and Verification Phases. In this section, we evaluate the unpacking and verification phases by performing the unpacking of Algorithm 2 and calculating the restoration accuracy of the files through the actual restoration of Algorithm 3. However, because the unpacking and verification phases take a long time, we experiment with 40 files randomly selected from the dataset. The unpacking phase consists of static unpacking and dynamic unpacking. In the experiment, we exclude static unpacking since it uses an existing library as is, and we focus only on the dynamic unpacking of Algorithm 2 since it tries to restore the original file without any information about the packer.

Table 7 shows the restoration accuracy of the proposed method for the .text and .data sections for each of the 19 packers. We cannot directly unpack Yoda's Protector and PECompact since they use anti-debugging techniques. To overcome this problem, we unpack these packers by bypassing three anti-debugging API calls: GetCurrentProcessId(), BlockInput(), and IsDebuggerPresent() [41]. (That is, we run Algorithm 2 with the bypassing techniques for Yoda's Protector and PECompact and include the experimental results in Table 7.) In general, since only the .text and .data sections of the original file are packed [42], in the experiment we consider only the restoration accuracy of the .text and .data sections of the restored file. In Table 7, among the 19 packers, 11 packers, including UPX, NSPack, MEW, RLPack, and BeRoEXEPacker, show 100% restoration accuracy for both the .text and .data sections. On the other hand, 8 packers, including ASPack, Packman, and MPRESS, show less than 100% restoration accuracy because of the manner in which the packers handle the .reloc section. More specifically, if there is a .reloc section in the original file, the unpacking relocates the .text and .data sections, changing all the data values in each section. Therefore, the restoration accuracy depends on whether there is a .reloc section in the original file.
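To make the notion of restoration accuracy concrete, the following sketch compares one section of the original file with the same section of the restored file. Algorithm 3 is not reproduced in this section, so the definition used here is an assumption: the percentage of byte positions at which the two sections agree. The function name and the sample bytes are hypothetical.

```python
def restoration_accuracy(original: bytes, restored: bytes) -> float:
    """Percentage of byte positions where the restored section equals the original.

    Assumption: accuracy is a position-wise byte match; bytes that exist in
    only one of the two sections count as mismatches.
    """
    if not original:
        raise ValueError("original section is empty")
    matches = sum(a == b for a, b in zip(original, restored))
    return 100.0 * matches / max(len(original), len(restored))


if __name__ == "__main__":
    original_text = bytes([0x55, 0x8B, 0xEC, 0xC3])
    restored_text = bytes([0x55, 0x8B, 0xEC, 0x90])  # last byte differs
    print(restoration_accuracy(original_text, original_text))
    print(restoration_accuracy(original_text, restored_text))
```

Under this definition an identical section scores 100%, and every changed byte lowers the score in proportion to the section size.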

Table 8 shows the restoration results in detail according to the presence of the .reloc section. As shown in the table, the restoration accuracy is 100% in the absence of the .reloc section, while it is very low, in the range of 0.00% to 9.98%, in the presence of the .reloc section. However, the low restoration accuracy is due to the nature of the packer, and it does not mean that the unpacking does not work correctly. In other words, even if the unpacking is successful, the data values may change due to the packer's characteristics, and the restoration accuracy is then calculated to be low. In summary, we can use the proposed verification algorithm to calculate the restoration accuracy of unpacked files, which quantitatively measures the accuracy of the actual restoration.
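The effect of relocation on a byte-wise comparison can be illustrated with a toy example. The sketch below is not the paper's procedure and does not reproduce the numbers in Table 8; it only shows that adding a load-address delta to the 32-bit addresses stored in a section changes bytes even though the code is otherwise identical. The helper names, the fix-up offsets, and the delta are hypothetical.

```python
import struct


def apply_relocations(section: bytes, fixup_offsets, delta: int) -> bytes:
    """Add delta to each 32-bit little-endian address stored at the given offsets."""
    out = bytearray(section)
    for off in fixup_offsets:
        (value,) = struct.unpack_from("<I", out, off)
        struct.pack_into("<I", out, off, (value + delta) & 0xFFFFFFFF)
    return bytes(out)


def matching_percent(a: bytes, b: bytes) -> float:
    """Percentage of equal byte positions between two equally sized sections."""
    if len(a) != len(b) or not a:
        raise ValueError("sections must be non-empty and of equal size")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    # Two stored absolute addresses followed by eight filler bytes.
    section = struct.pack("<II", 0x00401000, 0x00402000) + b"\x90" * 8
    rebased = apply_relocations(section, [0, 4], 0x00010000)
    print(matching_percent(section, rebased))
    # Undoing the delta restores the original bytes in this toy case.
    print(apply_relocations(rebased, [0, 4], -0x00010000) == section)
```

In this toy case two of the sixteen bytes change. Undoing the delta before comparing is one conceivable normalization, shown here only as an assumption about how relocated sections might be compared.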

In addition to the .text, .data, and .reloc sections, we further consider the PE header, which includes the IAT information. In general, the IAT is located in the header of the PE file, and if we actually need to run the unpacked file, we must also restore this header information, including the IAT. In fact, the studies in [3, 43] restore the header including the IAT and produce the actual whole executable file. On the other hand, most studies, including this paper, do not restore the whole executable file itself but restore the .text, .data, and .reloc sections, which are the major parts of the executable file. This is because the purpose of unpacking is not to execute the restored file but to identify and extract malicious code included in the restored file. Therefore, in this paper, we do not need to restore the header required for actual execution, and accordingly, a comparison of IAT information is also not necessary. Instead, we restore and compare the .text, .data, and .reloc sections, which are more likely to hide malicious code. Restoring the PE header is beyond the scope of this study; we leave it as a separate research topic.

Table 6: Comparison of packing detection accuracy of two or three combinations and of the hybrid approach.

| Metrics | DP-SIG+ENT (%) | DP-ENT+WR (%) | DP-SIG+ENT+WR (%) | Hybrid approach (%) |
|---|---|---|---|---|
| Detection accuracy | 95.0 | 92.0 | 98.4 | 98.4 |
| False-negative rate | 5.00 | 8.00 | 1.60 | 1.60 |
| False-positive rate | 0.73 | 0.00 | 0.00 | 0.00 |

Table 7: Restoration accuracy of .text and .data sections of each packer.

| Packer | .text (%) | .data (%) |
|---|---|---|
| UPX | 100 | 100 |
| NSPack | 100 | 100 |
| MEW | 100 | 100 |
| RLPack | 100 | 100 |
| BeRoEXE | 100 | 100 |
| Neolite | 100 | 100 |
| FSG | 100 | 100 |
| eXpressor | 100 | 100 |
| Molebox | 100 | 100 |
| Petite | 100 | 100 |
| JDpack | 100 | 100 |
| ASPack | 84.3 | 84.3 |
| Yoda's protector | 84.0 | 84.0 |
| exe32pack | 80.8 | 100 |
| MPRESS | 78.8 | 75.3 |
| Yoda's crypter | 74.5 | 74.5 |
| PECompact | 74.0 | 74.0 |
| WinUpack | 68.0 | 68.0 |
| Packman | 55.0 | 48.5 |
| Average | 89.2 | 90.2 |

6. Conclusions

In this paper, we analyzed recent studies of packing detection and unpacking techniques and derived four important observations: no phase integration, no detection combination, no real-restoration, and no unpacking verification. We then proposed an all-in-one unpacking system to address these four drawbacks. First, no phase integration meant that analysts had to perform the packing detection, unpacking, and verification phases separately; we solved this problem by presenting an all-in-one unpacking system that integrates these three phases in a single framework. Second, no detection combination meant that there had been no attempt to combine various packing detection methods; we solved this problem by presenting a hybrid approach that combines existing and new packing detection methods. Third, no real-restoration meant that there had been little discussion about restoring the actual executable file; we solved this problem by actually restoring an original image of the packed file. Fourth, no unpacking verification meant that there had been no explicit work to measure the restoration accuracy of unpacked files; we solved this problem by proposing a verification algorithm that quantitatively evaluates the restoration accuracy.

We constructed a dataset of 2,600 PE files and experimentally evaluated the proposed system using the dataset. First, we performed all the phases of the all-in-one unpacking system sequentially and confirmed that its overall mechanism worked well. Next, through a comparison of the hybrid packing detection approach with existing detection techniques, we showed that the detection accuracy of the proposed method was the highest, at 98.4% on average, with no false positives. For this, we also determined an entropy range of packing, Packing-Range, through extensive experiments on 19 representative packers. Finally, in the unpacking and verification phases, we confirmed that unpacking was successful for all files except the files packed with Yoda's Protector. We also showed that the restoration accuracy of unpacked files was nearly 100% except for a few packers.
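As a reminder of what an entropy-range test involves, the sketch below computes the Shannon entropy of a byte string and checks it against a caller-supplied interval. The bounds of Packing-Range are not given in this section, so the interval is left as a parameter; the function names are hypothetical, and byte-level Shannon entropy is assumed as the measure.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def in_entropy_range(data: bytes, low: float, high: float) -> bool:
    """True if the entropy of data lies in [low, high]; the bounds are caller-supplied."""
    return low <= byte_entropy(data) <= high
```

A run of identical bytes has entropy 0, while a block in which every byte value occurs equally often reaches the maximum of 8 bits per byte; compressed or encrypted content typically lies near the upper end.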

As future research, we will focus on unpacking packed files that use Anti-VM, Anti-Debug, and Anti-Reverse engineering techniques. We will also study how to compute the restoration accuracy of files having .reloc sections.

Data Availability

All the data files used in the experiments are available at https://github.com/chesvectain/PackingData.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was partly supported by the Institute of Information and Communications Technology Planning and Evaluation (IITP) grant funded by the Korea government (MSIT) (no. 2017-0-00158, Development of Cyber Threat Intelligence (CTI) analysis and information sharing technology for national cyber incident response) and the Korea Electric Power Corporation (no. R18XA05).

References

[1] M. Morgenstern, “AV-TEST Security report 2017/18,” 2019, https://www.av-test.org/fileadmin/pdf/security_report/AV-TEST_Security_Report_2017-2018.pdf.

[2] T. Ban, R. Isawa, S. Guo, D. Inoue, and K. Nakao, “Efficient malware packer identification using support vector machines with spectrum kernel,” in Proceedings of the Eighth Asia Joint Conference on Information Security, pp. 69–76, Seoul, South Korea, July 2013.

[3] B. Cheng, M. Jiang, J. Fu et al., “Towards paving the way for large-scale Windows malware analysis: generic binary unpacking with orders-of-magnitude performance boost,” in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pp. 395–411, Toronto, Canada, October 2018.

[4] A. Gupta and M. S. Arya, “Hashing based encryption and anti-debugger support for packing multiple files into single executable,” International Journal of Advanced Research in Computer Science, vol. 9, no. 1, pp. 914–920, 2018.

[5] Y. Kawakoya, M. Iwamura, and M. Itoh, “Memory behavior-based automatic malware unpacking in stealth debugging environment,” in Proceedings of the 5th International Conference on Malicious and Unwanted Software, pp. 39–46, Lorraine, France, October 2010.

[6] G. Liang, J. Pang, Z. Shan, R. Yang, and Y. Chen, “Automatic benchmark generation framework for malware detection,” Security and Communication Networks, vol. 2018, Article ID 4947695, 8 pages, 2018.

[7] A. Pektas and T. Acarman, “A dynamic malware analyzer against virtual machine aware malicious software,” Security and Communication Networks, vol. 7, no. 12, pp. 2245–2257, 2014.

[8] I. Santos, J. Nieves, and P. G. Bringas, “Semi-supervised learning for unknown malware detection,” in International Symposium on Distributed Computing and Artificial Intelligence, A. Abraham, J. M. Corchado, S. R. Gonzalez, and J. F. De Paz Santana, Eds., pp. 415–422, Springer, Berlin, Germany, 2011.

[9] X. Ugarte-Pedrero, I. Santos, B. Sanz, C. Laorden, and P. G. Bringas, “Countering entropy measure attacks on packed software detection,” in Proceedings of the 9th International Conference on Consumer Communications and Networking, pp. 164–168, Las Vegas, NV, USA, January 2012.

[10] H. Won, S.-P. Kim, S. Lee, M.-J. Choi, and Y.-S. Moon, “Secure principal component analysis in multiple distributed nodes,” Security and Communication Networks, vol. 9, no. 14, pp. 2348–2358, 2016.

Table 8: Restoration accuracy according to the presence of the .reloc section.

| Packer | .text, with .reloc (%) | .text, without .reloc (%) | .data, with .reloc (%) | .data, without .reloc (%) |
|---|---|---|---|---|
| ASPack | 0.00 | 100 | 0.00 | 100 |
| Packman | 9.98 | 100 | 8.81 | 100 |
| MPRESS | 0.00 | 100 | 0.00 | 100 |
| Yoda's crypter | 0.00 | 100 | 0.04 | 100 |
| PECompact | 0.00 | 100 | 0.00 | 100 |
| Yoda's Protector | 0.00 | 100 | 0.05 | 100 |
| exe32pack | 0.00 | 100 | 100 | 100 |
| WinUpack | 0.00 | 100 | 0.00 | 100 |

[11] S. Cesare, Y. Xiang, and W. Zhou, “Malwise–an effective and efficient classification system for packed and polymorphic malware,” IEEE Transactions on Computers, vol. 62, no. 6, pp. 1193–1206, 2013.

[12] C. Malin, E. Casey, and J. Aquilina, Malware Forensics: Investigating and Analyzing Malicious Code, Syngress, Burlington, MA, USA, 2008.

[13] W. Yan, Z. Zheng, and A. Nirwan, “Revealing packed malware,” IEEE Security and Privacy, vol. 6, no. 5, pp. 65–69, 2008.

[14] M. Bat-Erdene, T. Kim, H. Park, and H. Lee, “Packer detection for multi-layer executables using entropy analysis,” Entropy, vol. 19, no. 3, p. 125, 2017.

[15] Y.-S. Choi, I.-K. Kim, J.-T. Oh, and J.-C. Ryou, “PE file header analysis-based packed PE file detection technique (PHAD),” in Proceedings of International Symposium on Computer Science and Its Applications, pp. 28–31, Hobart, Australia, October 2008.

[16] S. Han and S. Lee, “Packed PE file detection for malware forensics,” The KIPS Transactions: Part C, vol. 16C, no. 5, pp. 555–562, 2009, in Korean.

[17] R. Lyda and J. Hamrock, “Using entropy analysis to find encrypted and packed malware,” IEEE Security and Privacy, vol. 5, no. 2, pp. 40–45, 2007.

[18] R. Perdisci, A. Lanzi, and W. Lee, “Classification of packed executables for accurate computer virus detection,” Pattern Recognition Letters, vol. 29, no. 14, pp. 1941–1946, 2008.

[19] A. Rohit, A. Singh, H. Pareek, and U. Edara, “A heuristics-based static analysis approach for detecting packed PE binaries,” International Journal of Security and Its Applications, vol. 7, no. 5, pp. 257–268, 2013.

[20] G. Amato, “PEframe,” 2019, https://github.com/guelfoweb/peframe.

[21] G. Jeong, E. Choo, J. Lee, M. Bat-Erdene, and H. Lee, “Generic unpacking using entropy analysis,” Journal of Advanced Information Technology and Convergence, vol. 7, no. 1, pp. 232–238, 2009.

[22] V. DeMarines, “Obfuscation–how to do it and how to crack it,” Network Security, vol. 2008, no. 7, pp. 4–7, 2008.

[23] H. Neil, “PEiD,” 2019, https://github.com/wolfram77web/app-peid.

[24] O. Yuschuk, “OllyDbg,” 2019, http://www.ollydbg.de/download.html.

[25] N. Waisman, “Immunity Debugger,” 2019, https://www.immunityinc.com/products/debugger/index.html.

[26] P. Royal, M. Halpin, D. Dagon, R. Edmonds, and W. Lee, “PolyUnpack: Automating the hidden-code extraction of unpack-executing malware,” in Proceedings of the 22nd Annual Computer Security Applications Conference, pp. 289–300, Miami Beach, FL, USA, December 2006.

[27] M. Bat-Erdene, Y. Park, H. Li, H. Lee, and M.-S. Choi, “Entropy analysis to classify unknown packing algorithms for malware detection,” International Journal of Information Security, vol. 16, no. 3, pp. 227–248, 2017.

[28] S. Cesare and X. Yang, “Classification of malware using structured control flow,” in Proceedings of the 8th Australasian Symposium on Parallel and Distributed Computing, pp. 61–70, Brisbane, Australia, January 2010.

[29] L. Martignoni, M. Christodorescu, and S. Jha, “OmniUnpack: fast, generic, and safe unpacking of malware,” in Proceedings of the 23rd Annual Computer Security Applications Conference, pp. 431–441, Miami Beach, FL, USA, December 2007.

[30] M. G. Kang, P. Poosankam, and H. Yin, “Renovo: a hidden code extractor for packed executables,” in Proceedings of the 5th ACM Workshop on Recurring Malcode, pp. 46–53, Alexandria, VA, USA, November 2007.

[31] S. Mariani, L. Fontana, F. Gritti, and S. D'Alessio, “PinDemonium: a DBI-based generic unpacker for windows executables,” in Proceedings of the Black Hat 2016, Las Vegas, NV, USA, July 2016.

[32] G. Bonfante, J. Fernandez, J.-Y. Marion, B. Rouxel, F. Sabatier, and A. Thierry, “CoDisasm: medium scale concatic disassembly of self-modifying binaries with overlapping instructions,” in Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 745–756, Denver, CO, USA, October 2016.

[33] M. Oberhumer, L. Molnar, and J. Reiser, “UPX,” 2019, https://upx.github.io.

[34] A. Swinnen and A. Mesbahi, “One packer to rule them all: empirical identification, comparison, and circumvention of current antivirus detection techniques,” in Proceedings of the Black Hat 2014, Las Vegas, NV, USA, August 2014.

[35] P. Bania, “Generic unpacking of self-modifying, aggressive, packed binary programs,” 2009, https://arxiv.org/abs/0905.4581.

[36] H. Neil, “BobSoft Signature,” 2019, http://woodmann.com/BobSoft/Files/Other/UserDB.zip.

[37] M. Russinovich, “Microsoft Sysinternals Suite,” 2019, https://download.sysinternals.com/files/SysinternalsSuite.zip.

[38] R. Rivest, “The MD5 message-digest algorithm,” RFC 1321, IETF, Fremont, CA, USA, 1992.

[39] S. Tatham, “Putty,” 2019, https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html.

[40] L. Pascoe, “MD5,” 2019, http://www.md5summer.org/about.html.

[41] P. Ferrie, “Anti-unpacker tricks - part one,” 2019, https://www.virusbulletin.com/virusbulletin/2008/12/anti-unpacker-tricks-part-one.

[42] T.-Y. Wang and C.-H. Wu, “Detection of packed executables using support vector machines,” in Proceedings of the 10th International Conference on Machine Learning and Cybernetics, pp. 717–722, Guilin, China, July 2011.

[43] D. Korczynski, “RePEconstruct: reconstructing binaries with self-modifying code and Import address table destruction,” in Proceedings of the 11th International Conference on Malicious and Unwanted Software, pp. 31–38, Fajardo, Puerto Rico, October 2016.



[38] R Rivest ldquoe MD5 message-digest algorithmrdquo RFC 1321IETF Fremont CA USA 1992

[39] S Tatham ldquoPuttyrdquo 2019 httpswwwchiarkgreenendorguk~sgtathamputtylatesthtml

[40] L Pascoe ldquoMD5rdquo 2019 httpwwwmd5summerorgabouthtml[41] P Ferrie ldquoAnti-unpacker tricks - part onerdquo 2019 https

wwwvirusbulletincomvirusbulletin200812anti-unpacker-tricks-part-one

[42] T-Y Wang and C-H Wu ldquoDetection of packed executablesusing support vector machinesrdquo in Proceedings of the 10thInternational Conference on Machine Learning and Cyber-netics pp 717ndash722 Guilin China July 2011

[43] D Korczynski ldquoRePEconstruct reconstructing binaries withself-modifying code and Import address table destructionrdquo inProceedings of the 11th International Conference on Maliciousand Unwanted Software pp 31ndash38 Fajardo Puerto RicoOctober 2016


Page 15: All-in-One Framework for Detection, Unpacking, and ...downloads.hindawi.com/journals/scn/2019/5278137.pdf · All-in-One Framework for Detection, Unpacking, and Verification for Malware

unpacking system that integrates these three phases in a single framework. Second, the drawback of no detection combination arose because no previous work had attempted to combine various packing detection methods; we solved this problem by presenting a hybrid approach that combines existing and new packing detection methods. Third, the drawback of no real-restoration arose because there had been little discussion of restoring the actual executable file; we solved this problem by actually restoring an original image of the packed file. Fourth, the drawback of no unpacking verification arose because there had been no explicit work on measuring the restoration accuracy of unpacked files; we solved this problem by proposing a verification algorithm that quantitatively evaluates the restoration accuracy.

We constructed a dataset of 2600 PE files and used it to evaluate the proposed system experimentally. First, we performed all the phases of the all-in-one unpacking system sequentially and confirmed that the overall mechanism worked as intended. Next, by comparing the hybrid packing detection approach with existing detection techniques, we showed that the proposed method achieved the highest detection accuracy, 98.4% on average, with no false positives. For this, we also determined an entropy range of packing, Packing-Range, through extensive experiments on 19 representative packers. Finally, in the unpacking and verification phases, we confirmed that unpacking was successful for all files except those packed with Yoda's Protector. We also showed that the restoration accuracy of unpacked files was nearly 100% except for a few packers.
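The Packing-Range check mentioned above can be pictured with a short Python sketch. It is only an illustration under stated assumptions: the function names byte_entropy and in_packing_range are hypothetical, the entropy is the standard Shannon entropy over byte values, and the bounds are left as parameters because the range determined in the experiments is not restated in this section.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # frequency of each byte value
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def in_packing_range(data: bytes, low: float, high: float) -> bool:
    """True if the entropy of `data` lies in the closed range [low, high].

    `low` and `high` are caller-supplied; any concrete values used with this
    function are assumptions, not the Packing-Range reported in the paper.
    """
    return low <= byte_entropy(data) <= high
```

For example, in_packing_range(section_bytes, 6.5, 8.0) would flag a section whose entropy lies between 6.5 and 8.0 bits per byte; both numbers are placeholders chosen for illustration.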

As future research, we will focus on unpacking files packed with Anti-VM, Anti-Debug, and Anti-Reverse engineering techniques. We will also study how to compute the restoration accuracy of files that have .reloc sections.
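As a rough illustration of what a restoration accuracy figure measures, the sketch below compares an original section with its restored counterpart byte by byte. This is an assumption made for illustration and not the verification algorithm proposed in this paper; the names restoration_accuracy and per_section_accuracy are hypothetical.

```python
from typing import Dict


def restoration_accuracy(original: bytes, restored: bytes) -> float:
    """Percentage of byte positions at which `restored` equals `original`.

    Positions present in only one of the two inputs count as mismatches,
    so a truncated or padded result is penalised.
    """
    if not original and not restored:
        return 100.0
    length = max(len(original), len(restored))
    matches = sum(a == b for a, b in zip(original, restored))
    return 100.0 * matches / length


def per_section_accuracy(
    original_sections: Dict[str, bytes],
    restored_sections: Dict[str, bytes],
) -> Dict[str, float]:
    """Accuracy for each original section; a missing restored section scores 0."""
    return {
        name: restoration_accuracy(data, restored_sections.get(name, b""))
        for name, data in original_sections.items()
    }
```

A byte-wise measure of this kind is sensitive to address fix-ups, which is one way to see why files with .reloc sections need separate treatment.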

Data Availability

All the data files used in the experiments are available at https://github.com/chesvectain/PackingData.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was partly supported by the Institute of Information and Communications Technology Planning and Evaluation (IITP) grant funded by the Korea government (MSIT) (no. 2017-0-00158, Development of Cyber Threat Intelligence (CTI) analysis and information sharing technology for national cyber incident response) and the Korea Electric Power Corporation (no. R18XA05).

Table 8: Restoration accuracy (%) according to the presence of the .reloc section.

Packer              .text ○   .text ×   .data ○   .data ×
ASPack              0.00      100       0.00      100
Packman             99.8      100       88.1      100
Mpress              0.00      100       0.00      100
Yoda's crypter      0.00      100       0.04      100
PECompact           0.00      100       0.00      100
Yoda's Protector    0.00      100       0.05      100
exe32pack           0.00      100       100       100
WinUpack            0.00      100       0.00      100

References

[1] M. Morgenstern, "AV-TEST Security report 2017/18," 2019, https://www.av-test.org/fileadmin/pdf/security_report/AV-TEST_Security_Report_2017-2018.pdf.

[2] T. Ban, R. Isawa, S. Guo, D. Inoue, and K. Nakao, "Efficient malware packer identification using support vector machines with spectrum kernel," in Proceedings of the Eighth Asia Joint Conference on Information Security, pp. 69–76, Seoul, South Korea, July 2013.

[3] B. Cheng, M. Jiang, J. Fu et al., "Towards paving the way for large-scale Windows malware analysis: generic binary unpacking with orders-of-magnitude performance boost," in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pp. 395–411, Toronto, Canada, October 2018.

[4] A. Gupta and M. S. Arya, "Hashing based encryption and anti-debugger support for packing multiple files into single executable," International Journal of Advanced Research in Computer Science, vol. 9, no. 1, pp. 914–920, 2018.

[5] Y. Kawakoya, M. Iwamura, and M. Itoh, "Memory behavior-based automatic malware unpacking in stealth debugging environment," in Proceedings of the 5th International Conference on Malicious and Unwanted Software, pp. 39–46, Lorraine, France, October 2010.

[6] G. Liang, J. Pang, Z. Shan, R. Yang, and Y. Chen, "Automatic benchmark generation framework for malware detection," Security and Communication Networks, vol. 2018, Article ID 4947695, 8 pages, 2018.

[7] A. Pektas and T. Acarman, "A dynamic malware analyzer against virtual machine aware malicious software," Security and Communication Networks, vol. 7, no. 12, pp. 2245–2257, 2014.

[8] I. Santos, J. Nieves, and P. G. Bringas, "Semi-supervised learning for unknown malware detection," in International Symposium on Distributed Computing and Artificial Intelligence, A. Abraham, J. M. Corchado, S. R. Gonzalez, and J. F. De Paz Santana, Eds., pp. 415–422, Springer, Berlin, Germany, 2011.

[9] X. Ugarte-Pedrero, I. Santos, B. Sanz, C. Laorden, and P. G. Bringas, "Countering entropy measure attacks on packed software detection," in Proceedings of the 9th International Conference on Consumer Communications and Networking, pp. 164–168, Las Vegas, NV, USA, January 2012.

[10] H. Won, S.-P. Kim, S. Lee, M.-J. Choi, and Y.-S. Moon, "Secure principal component analysis in multiple distributed nodes," Security and Communication Networks, vol. 9, no. 14, pp. 2348–2358, 2016.

[11] S. Cesare, Y. Xiang, and W. Zhou, "Malwise–an effective and efficient classification system for packed and polymorphic malware," IEEE Transactions on Computers, vol. 62, no. 6, pp. 1193–1206, 2013.

[12] C. Malin, E. Casey, and J. Aquilina, Malware Forensics: Investigating and Analyzing Malicious Code, Syngress, Burlington, MA, USA, 2008.

[13] W. Yan, Z. Zheng, and A. Nirwan, "Revealing packed malware," IEEE Security and Privacy, vol. 6, no. 5, pp. 65–69, 2008.

[14] M. Bat-Erdene, T. Kim, H. Park, and H. Lee, "Packer detection for multi-layer executables using entropy analysis," Entropy, vol. 19, no. 3, p. 125, 2017.

[15] Y.-S. Choi, I.-K. Kim, J.-T. Oh, and J.-C. Ryou, "PE file header analysis-based packed PE file detection technique (PHAD)," in Proceedings of International Symposium on Computer Science and Its Applications, pp. 28–31, Hobart, Australia, October 2008.

[16] S. Han and S. Lee, "Packed PE file detection for malware forensics," The KIPS Transactions: PartC, vol. 16C, no. 5, pp. 555–562, 2009, in Korean.

[17] R. Lyda and J. Hamrock, "Using entropy analysis to find encrypted and packed malware," IEEE Security and Privacy, vol. 5, no. 2, pp. 40–45, 2007.

[18] R. Perdisci, A. Lanzi, and W. Lee, "Classification of packed executables for accurate computer virus detection," Pattern Recognition Letters, vol. 29, no. 14, pp. 1941–1946, 2008.

[19] A. Rohit, A. Singh, H. Pareek, and U. Edara, "A heuristics-based static analysis approach for detecting packed PE binaries," International Journal of Security and Its Applications, vol. 7, no. 5, pp. 257–268, 2013.

[20] G. Amato, "PEframe," 2019, https://github.com/guelfoweb/peframe.

[21] G. Jeong, E. Choo, J. Lee, M. Bat-Erdene, and H. Lee, "Generic unpacking using entropy analysis," Journal of Advanced Information Technology and Convergence, vol. 7, no. 1, pp. 232–238, 2009.

[22] V. DeMarines, "Obfuscation–how to do it and how to crack it," Network Security, vol. 2008, no. 7, pp. 4–7, 2008.

[23] H. Neil, "PEiD," 2019, https://github.com/wolfram77web/app-peid.

[24] O. Yuschuk, "OllyDbg," 2019, http://www.ollydbg.de/download.html.

[25] N. Waisman, "Immunity Debugger," 2019, https://www.immunityinc.com/products/debugger/index.html.

[26] P. Royal, M. Halpin, D. Dagon, R. Edmonds, and W. Lee, "PolyUnpack: Automating the hidden-code extraction of unpack-executing malware," in Proceedings of the 22nd Annual Computer Security Applications Conference, pp. 289–300, Miami Beach, FL, USA, December 2006.

[27] M. Bat-Erdene, Y. Park, H. Li, H. Lee, and M.-S. Choi, "Entropy analysis to classify unknown packing algorithms for malware detection," International Journal of Information Security, vol. 16, no. 3, pp. 227–248, 2017.

[28] S. Cesare and X. Yang, "Classification of malware using structured control flow," in Proceedings of the 8th Australasian Symposium on Parallel and Distributed Computing, pp. 61–70, Brisbane, Australia, January 2010.

[29] L. Martignoni, M. Christodorescu, and S. Jha, "OmniUnpack: fast, generic, and safe unpacking of malware," in Proceedings of the 23rd Annual Computer Security Applications Conference, pp. 431–441, Miami Beach, FL, USA, December 2007.

[30] M. G. Kang, P. Poosankam, and H. Yin, "Renovo: a hidden code extractor for packed executables," in Proceedings of the 5th ACM Workshop on Recurring Malcode, pp. 46–53, Alexandria, VA, USA, November 2007.

[31] S. Mariani, L. Fontana, F. Gritti, and S. D'Alessio, "PinDemonium: a DBI-based generic unpacker for windows executables," in Proceedings of the Black Hat 2016, Las Vegas, NV, USA, July 2016.

[32] G. Bonfante, J. Fernandez, J.-Y. Marion, B. Rouxel, F. Sabatier, and A. Thierry, "CoDisasm: medium scale concatic disassembly of self-modifying binaries with overlapping instructions," in Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 745–756, Denver, CO, USA, October 2016.

[33] M. Oberhumer, L. Molnar, and J. Reiser, "UPX," 2019, https://upx.github.io.

[34] A. Swinnen and A. Mesbahi, "One packer to rule them all: empirical identification, comparison, and circumvention of current antivirus detection techniques," in Proceedings of the Black Hat 2014, Las Vegas, NV, USA, August 2014.

[35] P. Bania, "Generic unpacking of self-modifying, aggressive, packed binary programs," 2009, https://arxiv.org/abs/0905.4581.

[36] H. Neil, "BobSoft Signature," 2019, http://woodmann.com/BobSoft/Files/Other/UserDB.zip.

[37] M. Russinovich, "Microsoft Sysinternals Suite," 2019, https://download.sysinternals.com/files/SysinternalsSuite.zip.

[38] R. Rivest, "The MD5 message-digest algorithm," RFC 1321, IETF, Fremont, CA, USA, 1992.

[39] S. Tatham, "Putty," 2019, https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html.

[40] L. Pascoe, "MD5," 2019, http://www.md5summer.org/about.html.

[41] P. Ferrie, "Anti-unpacker tricks - part one," 2019, https://www.virusbulletin.com/virusbulletin/2008/12/anti-unpacker-tricks-part-one.

[42] T.-Y. Wang and C.-H. Wu, "Detection of packed executables using support vector machines," in Proceedings of the 10th International Conference on Machine Learning and Cybernetics, pp. 717–722, Guilin, China, July 2011.

[43] D. Korczynski, "RePEconstruct: reconstructing binaries with self-modifying code and Import address table destruction," in Proceedings of the 11th International Conference on Malicious and Unwanted Software, pp. 31–38, Fajardo, Puerto Rico, October 2016.


Page 16: All-in-One Framework for Detection, Unpacking, and ...downloads.hindawi.com/journals/scn/2019/5278137.pdf · All-in-One Framework for Detection, Unpacking, and Verification for Malware

malwarerdquo IEEE Transactions on Computers vol 62 no 6pp 1193ndash1206 2013

[12] C Malin E Casey and J Aquilina Malware Forensics In-vestigating and Analyzing Malicious Code Syngress Bur-lington MA USA 2008

[13] W Yan Z Zheng and A Nirwan ldquoRevealing packed mal-warerdquo IEEE Security and Privacy vol 6 no 5 pp 65ndash692008

[14] M Bat-Erdene T Kim H Park andH Lee ldquoPacker detectionfor multi-layer executables using entropy analysisrdquo Entropyvol 19 no 3 p 125 2017

[15] Y-S Choi I-K Kim J-T Oh and J-C Ryou ldquoPE file headeranalysis-based packed PE file detection technique (PHAD)rdquoin Proceedings of International Symposium on ComputerScience and Its Applications pp 28ndash31 Hobart AustraliaOctober 2008

[16] S Han and S Lee ldquoPacked PE file detection for malwareforensicsrdquo =e KIPS Transactions PartC vol 16C no 5pp 555ndash562 2009 in Korean

[17] R Lyda and J Hamrock ldquoUsing entropy analysis to findencrypted and packed malwarerdquo IEEE Security and Privacyvol 5 no 2 pp 40ndash45 2007

[18] R Perdisci A Lanzi and W Lee ldquoClassification of packedexecutables for accurate computer virus detectionrdquo PatternRecognition Letters vol 29 no 14 pp 1941ndash1946 2008

[19] A Rohit A Singh H Pareek and U Edara ldquoA heuristics-based static analysis approach for detecting packed PE bi-nariesrdquo International Journal of Security and Its Applicationsvol 7 no 5 pp 257ndash268 2013

[20] G Amato ldquoPEframerdquo 2019 httpsgithubcomguelfowebpeframe

[21] G Jeong E Choo J Lee M Bat-Erdene and H Lee ldquoGenericunpacking using entropy analysisrdquo Journal of Advanced In-formation Technology and Convergence vol 7 no 1pp 232ndash238 2009

[22] V DeMarines ldquoObfuscationndashhow to do it and how to crackitrdquo Network Security vol 2008 no 7 pp 4ndash7 2008

[23] H Neil ldquoPEiDrdquo 2019 httpsgithubcomwolfram77webapp-peid

[24] O Yuschuk ldquoOllyDbgrdquo 2019 httpwwwollydbgdedownloadhtml

[25] N Waisman ldquoImmunity Debuggerrdquo 2019 httpswwwimmunityinccomproductsdebuggerindexhtml

[26] P Royal M Halpin D Dagon R Edmonds and W LeeldquoPolyUnpack Automating the hidden-code exetraction ofunpack-executing malwarerdquo in Proceedings of the 22nd An-nual Computer Security Applications Conference pp 289ndash300Miami Beach FL USA December 2006

[27] M Bat-Erdene Y Park H Li H Lee and M-S ChoildquoEntropy analysis to classify unknown packing algorithms formalware detectionrdquo International Journal of InformationSecurity vol 16 no 3 pp 227ndash248 2017

[28] S Cesare and X Yang ldquoClassification of malware usingstructured control flowrdquo in Proceedings of the 8th Austral-asian Symposium on Parallel and Distributed Computingpp 61ndash70 Brisbane Australia January 2010

[29] L Martignoni M Christodorescu and S Jha ldquoOmniUnpackfast generic and safe unpacking of malwarerdquo in Proceedingsof the 23rd Annual Computer Security Applications Confer-ence pp 431ndash441 Miami Beach FL USA December 2007

[30] M G Kang P Poosankam and H Yin ldquoRenovo a hiddencode extractor for packed executablesrdquo in Proceedings of the5th ACM Workshop on Recurring Malcode pp 46ndash53Alexandria VA USA November 2007

[31] S Mariani L Fontana F Gritti and S DrsquoAlessio ldquoPinDe-monium a DBI-based generic unpacker for windows exe-cutablesrdquo in Proceedings of the Black Hat 2016 Las Vegas NVUSA July 2016

[32] G Bonfante J Fernandez J-Y Marion B Rouxel F Sabatierand A ierry ldquoCoDisasm medium scale concatic disas-sembly of self-modifying binaries with overlapping in-structionsrdquo in Proceedings of the 22nd ACM SIGSACConference on Computer and Communications Securitypp 745ndash756 Denver CO USA October 2016

[33] M Oberhumer L Molnar and J Reiser ldquoUPXrdquo 2019 httpsupxgithubio

[34] A Swinnen and A Mesbahi ldquoOne packer to rule them allempirical identification comparison and circumvention ofcurrent antivirus detection techniquesrdquo in Proceedings of theBlack Hat 2014 Las Vegas NV USA August 2014

[35] P Bania ldquoGeneric unpacking of self-modifying aggressivepacked binary programsrdquo 2009 httpsarxivorgabs09054581

[36] H Neil ldquoBobSoft Signaturerdquo 2019 httpwoodmanncomBobSoftFilesOtherUserDBzip

[37] M Russinovich ldquoMicrosoft Sysinternals Suiterdquo 2019 httpsdownloadsysinternalscomfilesSysinternalsSuitezip

[38] R Rivest ldquoe MD5 message-digest algorithmrdquo RFC 1321IETF Fremont CA USA 1992

[39] S Tatham ldquoPuttyrdquo 2019 httpswwwchiarkgreenendorguk~sgtathamputtylatesthtml

[40] L Pascoe ldquoMD5rdquo 2019 httpwwwmd5summerorgabouthtml[41] P Ferrie ldquoAnti-unpacker tricks - part onerdquo 2019 https

wwwvirusbulletincomvirusbulletin200812anti-unpacker-tricks-part-one

[42] T-Y Wang and C-H Wu ldquoDetection of packed executablesusing support vector machinesrdquo in Proceedings of the 10thInternational Conference on Machine Learning and Cyber-netics pp 717ndash722 Guilin China July 2011

[43] D Korczynski ldquoRePEconstruct reconstructing binaries withself-modifying code and Import address table destructionrdquo inProceedings of the 11th International Conference on Maliciousand Unwanted Software pp 31ndash38 Fajardo Puerto RicoOctober 2016
