Taint Analysis - Nanjing University, seclab.nju.edu.cn/lecture/taintanalysis.pdf
Taint Analysis
Contents
• Pin Tool
  ➢ Introduction
  ➢ Intel Pin Capability
  ➢ How to Instrument
  ➢ How to Pass Parameters
  ➢ Instrumentation Granularity
• Dynamic Taint Analysis
  ➢ Classification of Taint Analysis
  ➢ Basic Concept
  ➢ Introduction
  ➢ Byte or Bit
  ➢ Shadow Memory
  ➢ Dynamic Taint Analysis
Pin tools
Instrumentation
• A technique that inserts code into a program to collect run-time information or change its behavior
Different Instrumentations
• Source-Code Instrumentation
  ➢ Compiler Plugin
    ✓ Inserts code while the source is compiled to binary
    ✓ Highly efficient
• Static Binary Instrumentation
  ➢ Binary rewriter
    ✓ Disassembles and recompiles the binary
    ✓ Difficult to ensure correctness
• Dynamic Binary Instrumentation
  ➢ Dynamic Binary Instrumentation Tool
    ✓ Instruments code just before it runs
    ✓ No need to recompile or re-link
    ✓ Analyzes and modifies code at runtime
Dynamic Binary Instrumentation
• Intel PIN
• Valgrind
• QEMU
Intel Pin Capability
• Binary Analysis:
  ➢ Trace control flow or data flow
  ➢ Hook functions, signals and system calls
  ➢ Multi-thread support
• Change program behavior:
  ➢ Add/delete instructions/basic blocks/functions
  ➢ Change register values
  ➢ Change control flow
  ➢ Change memory values
Pin Work Flow Demonstration: gzip.exe input.txt
Command: pin.exe -t inscount.dll -- gzip.exe input.txt
inscount.dll is a PinTool that counts the application instructions executed and prints the count at the end (here: Count 258743109).
Launcher Process (PIN.EXE, the launcher):
• CreateProcess(gzip.exe, input.txt, suspended)
• Inject the Pin BootRoutine and Data (firstAppIp, "inscount.dll") into the application: GetContext(&firstAppIp), SetContext(BootRoutineIp), WriteProcessMemory(BootRoutine, BootData)
• Resume at BootRoutine
Application Process:
• Load PINVM.DLL
• Start PINVM.DLL running (firstAppIp, "inscount.dll")
• Load inscount.dll and run its main()
• Starting at the first application IP: read a trace from the application code, JIT it while adding instrumentation code from inscount.dll, encode the jitted trace into the Code Cache, and execute the jitted code
• When execution of a trace ends: call into PINVM.DLL to JIT the next trace, passing in the app IP of the trace's target
• The source trace's exit branch is modified to branch directly to the destination trace
Components shown in the diagram: Application Code and Data, System Call Dispatcher, Event Dispatcher, Thread Dispatcher, PINVM.DLL, Decoder, Encoder, inscount.dll, PIN.LIB, Code Cache, NTDLL.DLL, Windows kernel
How to Instrument
Insert callback functions for instructions, basic blocks, functions and images, e.g., instruction instrumentation.
How to Pass Parameters
INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)ifun, IARG_TYPE, IARG, ……, IARG_END);
IARG_TYPE:
  ➢ IARG_ADDRINT
  ➢ IARG_PTR
  ➢ IARG_BOOL
  ➢ IARG_UINT32
  ➢ IARG_UINT64
  ➢ IARG_INST_PTR
  ➢ IARG_REG_VALUE
  ➢ IARG_REG_REFERENCE
  ➢ IARG_REG_CONST_REFERENCE
  ➢ ……
IARG:
  ➢ INS_Address(ins)
  ➢ INS_OperandReg(ins, 0)
  ➢ INS_MemoryOperandCount(ins)
  ➢ INS_Valid(ins)
  ➢ ……
Instrumentation Granularity:
• Instruction instrumentation
• Basic block instrumentation
  ➢ A sequence of instructions terminated at a control-flow changing instruction
  ➢ Single entry, single exit
• Trace instrumentation
  ➢ A sequence of basic blocks terminated at an unconditional control-flow changing instruction
  ➢ Single entry, multiple exits
• Routine instrumentation
• Image instrumentation
Comparing Traces and Basic Blocks
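The difference between the two granularities can be sketched with a small, self-contained partitioning routine. It is only an illustration under simplifying assumptions: the `Kind` enum and the `split`, `basic_blocks` and `traces` helpers are hypothetical names, not Pin types, and the sketch ignores that Pin discovers code dynamically at run time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical instruction kinds for this sketch (not Pin types).
enum class Kind { Plain, CondBranch, UncondBranch };

// Split the instruction list into runs that end at an instruction
// satisfying `ends`. Returns the length of each run.
template <typename Pred>
std::vector<std::size_t> split(const std::vector<Kind>& code, Pred ends) {
    std::vector<std::size_t> lengths;
    std::size_t current = 0;
    for (Kind k : code) {
        ++current;
        if (ends(k)) { lengths.push_back(current); current = 0; }
    }
    if (current > 0) lengths.push_back(current);  // trailing run without a branch
    return lengths;
}

// A basic block ends at any control-flow changing instruction.
std::vector<std::size_t> basic_blocks(const std::vector<Kind>& code) {
    return split(code, [](Kind k) { return k != Kind::Plain; });
}

// A trace ends only at an unconditional control-flow changing instruction,
// so it may contain several basic blocks (single entry, multiple exits).
std::vector<std::size_t> traces(const std::vector<Kind>& code) {
    return split(code, [](Kind k) { return k == Kind::UncondBranch; });
}
```

For a sequence such as plain, conditional branch, plain, plain, unconditional branch, plain, the sketch yields three basic blocks but only two traces, because the conditional branch ends a basic block without ending the trace.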
Taint Analysis
• Classification of taint analysis
• Basic Concept
• Introduction
• Byte or bit
• Shadow Memory
• Dynamic Taint Analysis
Classification of Taint Analysis
• Static Taint Analysis
  ➢ The advantage of static analysis is that it provides better code coverage than dynamic analysis.
  ➢ On the other hand, its principal disadvantage is that it is not as accurate as dynamic analysis: it cannot access runtime information, so, for example, we can't retrieve register or memory values.
• Dynamic Taint Analysis
  ➢ With dynamic analysis we can't cover all the code, but the results are more reliable.
Basic concept
Taint propagation:
  ➢ If an operation uses the value of some tainted object, say X, to derive a value for another object, say Y, then object Y becomes tainted. Object X taints object Y.
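The propagation rule can be written down as a tiny sketch. The four-register machine and the `TaintState` type are assumptions made for illustration only; they are not part of Pin or of the original slides.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical machine with 4 registers; taint[i] says whether register i
// currently holds a value derived from tainted data.
struct TaintState {
    std::array<bool, 4> taint{};  // all registers start untainted

    // dst = src1 op src2: dst is derived from both sources, so it becomes
    // tainted if either source is tainted, and clean otherwise.
    void binary_op(std::size_t dst, std::size_t src1, std::size_t src2) {
        taint[dst] = taint[src1] || taint[src2];
    }

    // dst = constant: the old value is overwritten, so its taint is removed.
    void load_constant(std::size_t dst) { taint[dst] = false; }
};
```

If register 0 plays the role of X and register 1 the role of Y, then `binary_op(1, 0, 2)` leaves Y tainted whenever X is tainted.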
Basic concept
• Taint Sources: program points, or memory locations, where data of interest enters the system and subsequently gets tagged. For convenience of description, we use user input as the taint source in this course.
• Taint Tracking: the process of propagating data tags according to program semantics.
• Taint Sinks: program points, or memory locations, where checks for tagged data can be made.
Introduction
Taint analysis is used to know, at a given program point, which parts of memory and which registers are controllable by some data we are interested in, for example user input.
The taint spreads during execution according to the semantics of each instruction.
Introduction
In example 1, at the beginning, the variables 'a' and 'b' are not tainted. When the atoi function is called, 'a' becomes tainted. Then 'b' becomes tainted when it is assigned the value of 'a'. Now we know that the argument of the foo2 function can be controlled by the user.
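The original code for example 1 did not survive in this transcript, so the following is a hypothetical reconstruction of the described flow rather than the slide's code. `TaintedInt`, `tainted_atoi` and `foo2_arg_is_tainted` are illustrative names; the taint flag is carried alongside the value instead of being tracked by an instrumentation tool.

```cpp
#include <cassert>
#include <cstdlib>

// A value paired with a taint flag (hypothetical helper, not a Pin type).
struct TaintedInt {
    int value = 0;
    bool tainted = false;
};

// Taint source: the result of atoi on user input is user-controlled.
TaintedInt tainted_atoi(const char* user_input) {
    TaintedInt r;
    r.value = std::atoi(user_input);
    r.tainted = true;
    return r;
}

// Taint sink: reports whether the argument reaching foo2 is user-controlled.
bool foo2_arg_is_tainted(const TaintedInt& arg) { return arg.tainted; }

bool example1(const char* user_input) {
    TaintedInt a, b;                 // neither 'a' nor 'b' is tainted yet
    a = tainted_atoi(user_input);    // 'a' becomes tainted
    b = a;                           // the assignment copies the taint to 'b'
    return foo2_arg_is_tainted(b);   // foo2(b) receives a tainted argument
}
```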
Introduction
In example 2, when the buffer is allocated via malloc, its content is not tainted. Then, when the allocated area is initialized with user input, we need to taint the bytes 'buffer+2', 'buffer+12' and 'buffer+30'. Later, when one of those bytes is read, we know it can be controlled by the user.
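As with example 1, the slide's code is not present in the transcript. The sketch below is a hypothetical reconstruction of the described behavior: the 32-byte buffer size and the helper names `tainted_bytes`, `taint_byte`, `is_tainted` and `example2` are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <set>

// Set of tainted byte addresses (a minimal stand-in for shadow memory).
static std::set<std::uintptr_t> tainted_bytes;

void taint_byte(const char* p) {
    tainted_bytes.insert(reinterpret_cast<std::uintptr_t>(p));
}
bool is_tainted(const char* p) {
    return tainted_bytes.count(reinterpret_cast<std::uintptr_t>(p)) != 0;
}

// Allocates a buffer and writes user-controlled bytes at offsets 2, 12
// and 30, tainting exactly those bytes. The caller frees the buffer.
char* example2(char c1, char c2, char c3) {
    char* buffer = static_cast<char*>(std::malloc(32));  // fresh memory: untainted
    if (!buffer) return nullptr;
    buffer[2] = c1;  taint_byte(buffer + 2);
    buffer[12] = c2; taint_byte(buffer + 12);
    buffer[30] = c3; taint_byte(buffer + 30);
    return buffer;
}
```

A later read of `buffer[12]` can then be checked with `is_tainted(buffer + 12)`, while neighbouring bytes such as `buffer + 3` remain clean.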
Byte or bit ?
One of the problems is to determine which method taints with the greatest precision. For example, what are we supposed to do when a controlled byte is multiplied and stored somewhere in memory? Should we taint the destination variable? See the following code.
  call atoi@plt
  mov eax,edx
  cmp eax,$0
  jse next
  cmp eax,$4
  jne next
  shl eax,0x3
  sub eax,edx
  mov eax,DWORD PTR[rbp-0x4]
next:
  mov DWORD PTR[rbp-0x4],eax
  leave
  ret
Byte or bit ?
In the previous code, we can control only 5 bits of the variable 'num', not the whole integer. So we can't say that we control the whole variable when it is returned and used somewhere else.
Byte or bit ?
Byte taint analysis asserts that b is tainted. Bit taint analysis asserts that b is not tainted.
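The disagreement between the two granularities can be reproduced with a small sketch. It is not the slide's example: the `Tracked` type and its helpers are hypothetical, and the `coarse_tainted` flag stands in for byte-granularity tracking by marking the whole value as soon as it derives from tainted data.

```cpp
#include <cassert>
#include <cstdint>

// A 32-bit value tracked two ways at once (hypothetical helper type):
//  - coarse_tainted: set if the value derives from tainted data at all
//  - bit_mask:       one taint bit per value bit (1 = user-controllable)
struct Tracked {
    std::uint32_t value;
    bool coarse_tainted;
    std::uint32_t bit_mask;
};

// Taint source: every bit of the input is user-controlled.
Tracked from_user(std::uint32_t v) { return Tracked{v, true, 0xFFFFFFFFu}; }

// value & constant: bits cleared by the constant can no longer be controlled.
Tracked and_const(const Tracked& a, std::uint32_t c) {
    return Tracked{a.value & c, a.coarse_tainted, a.bit_mask & c};
}

// value << k (k must be below 32): taint bits move with the value bits.
Tracked shl_const(const Tracked& a, unsigned k) {
    return Tracked{a.value << k, a.coarse_tainted, a.bit_mask << k};
}

bool bit_tainted(const Tracked& t) { return t.bit_mask != 0; }
```

With `a = from_user(x)`, the value `and_const(a, 0x1F)` has a bit mask of `0x1F`, meaning only 5 bits are controllable. The value `b = and_const(shl_const(a, 8), 0xFF)` is always zero: the coarse flag still reports it as tainted, while the bit mask is empty.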
Byte or bit ?
So, what should we do? Taint bytes, which is easier and lighter, or taint the individual bits controlled by the user? If you taint bytes, it will be easier but less reliable. If we taint bits, it will be harder, and the taint tree will be more difficult to manage, but the result will be far more reliable.
Tainting bytes is enough for most situations.
Dynamic Taint Analysis
How do we perform dynamic taint analysis?
Dynamic Taint Analysis
In order to do this, we need a dynamic binary instrumentation (DBI) framework. The purpose of the DBI is to add a pre/post handler on each instruction. When a handler is called, you are able to retrieve all the information you want about the instruction or the environment (memory).
We choose to use Pin: a C++ dynamic binary instrumentation framework (without IR) written by Intel.
Shadow Memory
We use shadow memory to mark every address that can be tainted by the data we are interested in.
• Shadow Memory: shadow memory is a technique in which potentially every byte used by a program during its execution has a shadow byte or bytes.
• These shadow bytes are typically invisible to the original program and are used to record information about the original piece of data.
Shadow Memory
• Shadow Memory
  ➢ We need a mapping
    ✓ Addr → Abstract State
    ✓ Register → Abstract State
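The two mappings can be sketched as a small class. This is one possible layout under stated assumptions, not the structure used in the lecture: the abstract state is reduced to a single taint byte, shadow pages of 4096 bytes are allocated lazily, and registers are identified by hypothetical small integers.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_map>

// Abstract state kept for each byte or register; here just a taint flag.
using AbstractState = std::uint8_t;   // 0 = clean, 1 = tainted

class ShadowMemory {
public:
    static constexpr std::size_t kPageSize = 4096;

    // Addr -> AbstractState (bytes never written to the shadow are clean).
    AbstractState get(std::uintptr_t addr) const {
        auto it = pages_.find(addr / kPageSize);
        return it == pages_.end() ? 0 : (*it->second)[addr % kPageSize];
    }
    void set(std::uintptr_t addr, AbstractState s) {
        auto& page = pages_[addr / kPageSize];
        if (!page) page.reset(new Page());   // created lazily, zero-filled
        (*page)[addr % kPageSize] = s;
    }

    // Register -> AbstractState (register ids are hypothetical, 0..15).
    AbstractState get_reg(std::size_t reg) const { return regs_.at(reg); }
    void set_reg(std::size_t reg, AbstractState s) { regs_.at(reg) = s; }

private:
    using Page = std::array<AbstractState, kPageSize>;
    std::unordered_map<std::uintptr_t, std::unique_ptr<Page>> pages_;
    std::array<AbstractState, 16> regs_{};
};
```

Allocating shadow pages on demand keeps the cost proportional to the memory the program actually touches, which is the usual reason for a two-level layout.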
Dynamic Taint Analysis
First, we need to determine all user inputs, such as the environment and syscalls. We begin by tainting these inputs, and we spread or remove the taint when we encounter instructions like GET/PUT and LOAD/STORE.
Dynamic Taint Analysis
• For this first example, we are going to taint the 'read' memory area, and we will get a brief overview of the Pin API. For this first test we will:
  ➢ Catch the sys_read syscall.
  ➢ Get the second and third arguments, which describe the area to taint.
  ➢ Call a handler when an instruction such as LOAD or STORE accesses this area.
  ➢ Spread the taint.
Catch the syscalls
When a syscall occurs, we check whether it is read. If so, we save the second and third arguments, which describe our memory area. The second argument is the start address of the memory that the syscall writes to. The third argument is the length of the data to write to that memory.
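The check described above can be sketched without Pin as a plain callback. The function name `on_syscall_entry`, the `tainted_by_read` set and the way arguments are passed in are assumptions for illustration; in a real Pintool the syscall number and arguments would come from Pin's syscall API.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Syscall number of read on x86-64 Linux (it is 3 on 32-bit x86).
const long kSysRead = 0;

// Tainted byte addresses (a minimal stand-in for shadow memory).
static std::set<std::uintptr_t> tainted_by_read;

// Called at syscall entry with the syscall number and its first three
// arguments. For read(fd, buf, count), buf is the second argument and
// count the third, so [buf, buf + count) becomes tainted.
void on_syscall_entry(long number, std::uint64_t arg0,
                      std::uint64_t arg1, std::uint64_t arg2) {
    (void)arg0;                       // the file descriptor is not needed here
    if (number != kSysRead) return;   // only read is a taint source
    for (std::uint64_t i = 0; i < arg2; ++i)
        tainted_by_read.insert(static_cast<std::uintptr_t>(arg1 + i));
}
```

Note that the third argument is only an upper bound: read may return fewer bytes than requested, so a more precise tool would use the syscall's return value at syscall exit.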
Catch the LOAD and STORE instructions
Now we need to catch all instructions that read (LOAD) or write (STORE) in the tainted area. To do that, we add a function that is called each time this area is accessed.
Hook Load Instruction
Hook Store Instruction
Spread the taint
Imagine you LOAD a value into a register from tainted memory, then STORE this register to another memory location. In this case, we need to taint both the register and the new memory location. In the same way, if a constant is STORED in a tainted memory area, we need to remove the taint, because the user can no longer control this memory location.
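These rules can be sketched as three handlers over a simple taint state. The `Taint` struct, the integer register ids and the handler names are hypothetical; in a Pintool they would be analysis routines attached to memory-read and memory-write instructions.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical taint state: tainted memory addresses and register ids.
struct Taint {
    std::set<std::uintptr_t> mem;
    std::set<int> regs;

    // LOAD reg <- [addr]: the register takes over the taint of the memory cell.
    void on_load(int reg, std::uintptr_t addr) {
        if (mem.count(addr)) regs.insert(reg); else regs.erase(reg);
    }
    // STORE [addr] <- reg: the memory cell takes over the taint of the register.
    void on_store_reg(std::uintptr_t addr, int reg) {
        if (regs.count(reg)) mem.insert(addr); else mem.erase(addr);
    }
    // STORE [addr] <- constant: the user no longer controls this cell.
    void on_store_const(std::uintptr_t addr) { mem.erase(addr); }
};
```

Loading from a tainted cell and storing the register elsewhere taints both the register and the destination, while a later constant store clears the cell again.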
Taint analysis for security
• Detect overflow of the return address
Detect overflow of the return address
• How do we check whether the return address has been overflowed?
• How do we get the esp value that points to the return address?
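Check the return address
• ret pops the return address from the stack slot that %esp points to
• Before every return: if the memory at %esp is tainted, the return address has been overflowed
Get esp value
• Get the CPU context
• Get the esp value
• Instrument: check the esp value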
Example
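The slide's own example is not present in this transcript. As a stand-in, the check can be sketched as a self-contained function that would run before every return. The function name, the set-based taint store and the default 4-byte slot (32-bit x86) are assumptions; in a Pintool the esp value would be handed to such a routine as an argument, for instance via `IARG_REG_VALUE`.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Returns true if any byte of the return address slot is tainted.
// `esp` is the stack pointer value just before `ret` executes, so the
// return address occupies [esp, esp + slot_size): 4 bytes on 32-bit x86.
bool return_address_overflowed(const std::set<std::uintptr_t>& tainted_mem,
                               std::uintptr_t esp, unsigned slot_size = 4) {
    for (unsigned i = 0; i < slot_size; ++i)
        if (tainted_mem.count(esp + i)) return true;
    return false;
}
```

If user input has been copied past the end of a stack buffer and into the slot at esp, the function reports an overflow; a slot that no tainted byte reaches is reported as intact.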