hawkes intel microcode
Post on 29-Oct-2015
Notes on Intel Microcode Updates
Ben Hawkes
December, 2012 - March, 2013
Introduction
All modern CPU vendors have a history of design and implementation defects, ranging from relatively benign stability issues to potential security vulnerabilities. The latest CPU errata release for second generation Intel Core processors describes a total of 120 errata, or hardware bugs. Although most of these errata are listed as "No Fix", Intel has supported the ability to apply stability and security updates to the CPU in the form of microcode updates for well over a decade*.
Unfortunately, the microcode update format is undocumented. Researchers are currently prevented from gaining any sort of detailed understanding of the microcode format, which means that it is impossible to study the updates to clearly establish whether any security issues are being fixed by microcode patches. The following document is a summary of notes I gathered while investigating the Intel microcode update mechanism.
* The earliest Intel microcode release appears to be from January 29, 2000. Since that date, a further 29 distinct microcode DAT files have been released.
Acknowledgements
The initial idea to study Intel's microcode update mechanism was inspired directly by Tavis Ormandy's exploratory work on this subject in 2011. Furthermore, I'd like to thank Emilia Käsper, Tavis Ormandy, Gynvael Coldwind and Thomas Dullien for their outstanding technical assistance and encouragement.
How does the microcode update mechanism work?
Microcode updates are applied to a CPU by writing the virtual address of the Intel-supplied undocumented binary blob to a model-specific register (MSR) called IA32_UCODE_WRITE. This is a privileged operation that is normally performed by the system BIOS at boot time, but modern operating system kernels also include support for applying microcode updates.
The BIOS (or operating system) should verify that the supplied update correctly matches the running hardware before attempting the WRMSR operation. In order to do so, each microcode update comes packaged with a short header containing various update metadata. The header is documented by Intel in Volume 3 of the Developer's Manual. It contains three pieces of information required for validation: the microcode revision, processor signature, and processor flags.
The microcode revision is an incremental version number: you can only successfully apply an update if the current microcode revision is less than the revision supplied. The BIOS will typically extract the current microcode revision by issuing a RDMSR on the MSR called IA32_UCODE_REV and then compare this value against the revision contained in the new microcode update's header.
The processor signature is a unique identifier for the hardware model that the microcode will apply to. The signature of the running hardware can be retrieved using the CPUID instruction, and then compared against the value supplied in the microcode header. According to Intel, "each microcode update is designed specifically for a given extended family, extended model, type, family, model, and stepping of the processor." The processor flags field is similar. Intel says: "the BIOS uses the processor flags field in conjunction with the platform Id bits in MSR (17H) to determine whether or not an update is appropriate to load on a processor."
Once a microcode update has been applied using IA32_UCODE_WRITE, the BIOS will typically issue a CPUID instruction and then read the IA32_UCODE_REV MSR again. If the revision number has increased, the update was applied successfully.
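The three validation checks described above can be sketched in a few lines of Python. This is only an illustration of the comparison logic, not how any particular BIOS or kernel implements it. The field layout follows the documented 48-byte header in the Intel manual; the names parse_header and is_applicable are made up for this sketch, and the extended signature table and checksum verification are deliberately ignored.

```python
import struct
from typing import NamedTuple

# Documented header: nine little-endian 32-bit fields, then 12 reserved bytes.
HEADER = struct.Struct("<9I12x")


class UpdateHeader(NamedTuple):
    header_version: int
    revision: int          # microcode revision
    date: int              # BCD encoded, mmddyyyy
    signature: int         # processor signature (as returned by CPUID)
    checksum: int
    loader_revision: int
    processor_flags: int   # one bit per supported platform Id
    data_size: int
    total_size: int


def parse_header(blob: bytes) -> UpdateHeader:
    """Unpack the documented header from the start of an update."""
    return UpdateHeader(*HEADER.unpack_from(blob, 0))


def is_applicable(hdr: UpdateHeader, cpu_signature: int,
                  platform_id: int, current_revision: int) -> bool:
    """Apply the three checks: signature, processor flags, revision."""
    return (hdr.signature == cpu_signature
            and bool(hdr.processor_flags & (1 << platform_id))
            and hdr.revision > current_revision)
```

In a real loader, cpu_signature would come from CPUID, platform_id from the platform Id bits of MSR 17H, and current_revision from IA32_UCODE_REV; here they are plain integers supplied by the caller.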
Observation #1: What does a microcode update look like?
Since 2008 Intel has regularly released DAT files containing the most up-to-date microcode revisions for each processor. Prior to this, microcode update data was shipped as part of the open-source tool microcode_ctl. An archive of all microcode DAT releases can be found here.
So what does the undocumented blob portion of the microcode update look like? It appears that there are at least two different formats for the undocumented blob: the old format, used up until the Pentium 4 and certain early models of the Intel Core 2, and the new format, used from that point onwards. This article covers the new-style format only.
The following graphic shows a microcode update for an Intel Core i5 M460 (i.e. with the documented microcode header stripped):
It is immediately clear that there is a plaintext structure (96 bytes in length) at the start of the undocumented blob. Some easily identifiable fields are colorized:
- Microcode revision number.
- Release date (note that this date is sometimes one day prior to the microcode header date).
- Real length of microcode update (counted in 4-byte words).
- Processor signature.
And some less easily identifiable fields that appear to be in common usage are marked in grey:
- Possible flags field? May not be in use in recent hardware types.
- Possible loader version?
- Possible length field (when non-zero)? Not consistently used.
Observation #2: Is there any structure in the microcode update after the 96-byte header?
Most of the data located after the 96-byte header appears to be random and without structure. However, performing a longest common substring analysis on an archive of every unique microcode update (available in binary format here) showed that different revisions for the same (or similar) processor signatures will share some common byte strings:
In this figure, two distinct strings have been identified:
- In green, a 2048-bit string that is constant between microcode revisions.
- In red, a 32-bit string that is constant for all microcode updates using the new-style format.
In total, 12 unique 2048-bit strings were found to be shared across 24 processor signatures. The extracted data is available here (in the format ).
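As an illustration of the longest common substring step, the following is a minimal sketch using Python's standard difflib module. It is not the tooling used for the analysis above, and the function name longest_common_bytes is made up for this example. It compares two blobs at a time; an analysis over a full archive would repeat this over pairs of updates.

```python
from difflib import SequenceMatcher


def longest_common_bytes(a: bytes, b: bytes) -> bytes:
    """Return the longest run of bytes that appears in both inputs."""
    # autojunk=False stops difflib from discarding frequently seen byte values.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[match.a:match.a + match.size]
```

For blobs of a few kilobytes this is fast enough, although the worst-case running time is quadratic in the input length.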
Note that 2048 bits is a commonly used length for an RSA modulus, and that 0x00000011 (decimal 17) is a commonly used value for an RSA exponent. This suggests that these common strings may be an RSA public key. Further evidence to support this claim is that:
- Each of the values is strictly 2048 bits in length, i.e. the most significant bit is always set.
- None of the values is trivially factorable by 2, i.e. the values are all odd.
- None of the values is factorable by any value between 2 and 2^32.
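The three checks listed above can be sketched as follows. This is a hedged illustration only: the function name plausible_rsa_modulus is made up, the candidate is assumed to have already been converted to an integer (the byte order inside the update is not specified here), and the default trial division bound is far smaller than the 2^32 bound used in the analysis, purely to keep the example quick.

```python
def plausible_rsa_modulus(n: int, bits: int = 2048,
                          trial_limit: int = 1 << 16) -> bool:
    """Cheap sanity checks that a value could be an RSA modulus."""
    # The most significant bit must be set, so the length is exactly `bits`.
    if n.bit_length() != bits:
        return False
    # An RSA modulus is a product of two odd primes, so it must be odd.
    if n % 2 == 0:
        return False
    # No small odd factors below the trial division bound.
    for d in range(3, trial_limit, 2):
        if n % d == 0:
            return False
    return True
```

Passing these checks does not prove that a value is an RSA modulus; it only fails to rule it out.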
Observation #3: Can the length of the microcode update be verified?
The length field of the 96-byte microcode header (shaded in green in fig 1) can be verified using a fault injection analysis. The idea is to sequentially mutate each byte of a valid microcode update, attempt to apply the update, and record whether the update was applied successfully or not.
The underlying assumption here is that the CPU should validate the integrity of the microcode update, but may not validate the integrity of padding (since the size of a microcode update must be a multiple of 1024 bytes, it is assumed that padding is normally required).
Testing on an Intel Core i5 M460 (sig 0x20655, pf 0x800), the expected length of the microcode update (in revision 3) is 1668 bytes (0x1a1 * 4). Sequentially flipping a bit in each byte from offset 0 to 2000 and waiting for the first successfully applied update gives the following results:
This result was observed on Intel Core 2 Duo P9500, Intel Core i5 M460 and Intel Core i5 2520M chips. For all other experiments below, results were reproduced on Intel Core i5 M460, Intel Core i5 2520M, and Intel Xeon W3690 chips.
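The shape of this fault injection loop can be sketched in Python. The real experiment needs privileged access to the CPU, so the sketch takes a caller-supplied try_apply callback, which is a hypothetical stand-in for "attempt the WRMSR and report whether the revision increased". The helper names padded_size and first_accepted_offset are also made up for this illustration.

```python
from typing import Callable, Optional


def padded_size(length_words: int, granule: int = 1024) -> int:
    """Round a length given in 4-byte words up to a multiple of `granule` bytes."""
    real_bytes = length_words * 4
    return -(-real_bytes // granule) * granule


def first_accepted_offset(update: bytes,
                          try_apply: Callable[[bytes], bool],
                          limit: int) -> Optional[int]:
    """Flip one bit in each byte in turn; return the first offset still accepted."""
    for offset in range(min(limit, len(update))):
        mutated = bytearray(update)
        mutated[offset] ^= 0x01        # flip the lowest bit of this byte
        if try_apply(bytes(mutated)):  # hypothetical: True if the update applied
            return offset
    return None
```

For the length quoted above, padded_size(0x1a1) gives 2048, so a 1668-byte update would carry 380 bytes of padding under this assumption. Any offset returned by first_accepted_offset depends entirely on the try_apply callback that is supplied.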
Observation #4: How many cycles does an update take to be applied successfully?
To collect the average number of cycles the CPU took to successfully apply a microcode update, a specialized system was set up that would:
1. Boot the system with an initial microcode revision.
2. Install a Linux kernel module that:
   a. Invalidates caches (wbinvd)
   b. Stops instruction prefetch (sync_core)
   c. Disables interrupts for the running core (local_irq_disable)
   d. Records the timestamp counter (rdtsc)
   e. Applies the next microcode update revision (wrmsr MSR_IA32_UCODE_WRITE)
   f. Records the timestamp counter
3. Record the rdtsc delta in syslog.
4. Reboot.
The cache invalidation and interrupt disable were intended to reduce variance in the timing delta. Rebooting is required to reset the system to the original microcode revision, as successfully applied revisions must be strictly incremental.
The exact cycle value will vary significantly between different types of hardware (older hardware was observed to take significantly more cycles). However, a baseline value can be used in further timing analysis on the same hardware. For example, the baseline average time delta across 2000 applications of microcode revision 3 for an Intel Core i5 M460 is:
Average: 488953 cycles
Sample standard deviation: 12270 cycles
The high variation in the sample deltas collected is presumed to be caused by the multi-core nature of the system. If the microcode update mechanism has to achieve a consistent state across all available instruction pipelines (including consistency across hyperthreads, prefetched instructions, instruction caches on all cores), this could result in a high level of variance, as the collection mechanism used here only cleans internal state for the running core.
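The two baseline figures are the ordinary mean and sample standard deviation of the recorded rdtsc deltas. A minimal sketch of that summary step, assuming the deltas have already been collected from syslog into a list (the function name summarize is made up here):

```python
from statistics import mean, stdev


def summarize(deltas):
    """Return (average, sample standard deviation) of cycle deltas."""
    # stdev() uses the n - 1 denominator, i.e. the sample standard deviation.
    return mean(deltas), stdev(deltas)
```

A later measurement can then be compared against the baseline by expressing its distance from the average in units of the standard deviation.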
Observation #5: Does the number of cycles change depending on the location of a fault?
Using the baseline timing delta above, it is possible to find deviations by flipping every possible bit position in the microcode update and attempting to apply the malformed update. All of these update attempts will fail, but the idea is that certain fields may be treated differently by the microcode update mechanism, and that this may show up in the cycle delta.
Running this test on an Intel Core i5 M460 gives the following results:
This chart shows the results of the first 1000 bit positions being flipped. Three distinct areas of interest can be seen. All other bit positions above 1000 return a cycle count matching the failure case seen above.
The first area of interest, between bit offsets 32 and 63, corresponds to an unknown word in the 96-byte header that always has the value 0x000000a1. This may serve as a magic value, checked when the microcode is first loaded to ensure that an expected format has been received.
The second area of interest is a single bit at offset 64, which appears to correspond to a flags field. In the original analysis, this bit was set. However, clearing the bit and repeating the analysis shows identical results to figure 4, except with a significantly lower average count of cycles for the normal failure case. The decrease in cycle count appears to be proportional to the number of physical cores on the system, which may suggest this bit is used to decide whether the update will be iteratively applied to all cores, or only applied on a single core.
The third area of interest, between bit offsets 233 and 253, corresponds to the microcode size field.
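To relate bit offsets to header words, it helps to make the arithmetic explicit. The sketch below shows a bit-flip helper and the mapping from a bit offset to a 32-bit word index. The bit numbering within a byte (least significant bit first) is an assumption of this sketch, and the names flip_bit and word_index are made up for illustration.

```python
def flip_bit(data: bytes, bit_offset: int) -> bytes:
    """Return a copy of `data` with a single bit inverted."""
    out = bytearray(data)
    # Assumption: bit 0 is the least significant bit of byte 0.
    out[bit_offset // 8] ^= 1 << (bit_offset % 8)
    return bytes(out)


def word_index(bit_offset: int) -> int:
    """Index of the 32-bit word that contains the given bit offset."""
    return bit_offset // 32


print(word_index(32), word_index(63))    # 1 1
print(word_index(64))                    # 2
print(word_index(233), word_index(253))  # 7 7
```

Under this mapping the magic value sits in word 1 of the 96-byte header, the flag bit in word 2, and the size field in word 7.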
Observation #6: What happens if the microcode size field is modified?
Modifying each bit position of the size field results in an incrementally higher cycle count. To investigate this further, a second analysis was run that records the cycle count for each size value between 0 and 10000. The following shows the results of this analysis on an Intel Core i5 2520M:
In this chart we can see a clear correlation between an increasing size value and an increasing cycle count. This chart also appears to show artifacts from running this analysis on a multi-core system (note that the i5 2520M exposes four logical processors, and that four main trend lines can be seen).
Running the size modification analysis with an incorrect magic value (i.e. replacing 0x000000a1 with a different value) results in a flat chart with no correlation between value and cycle count. This suggests that the magic value is checked prior to the size value being used.
Due to the high level of noise while running this analysis on a multi-core system, the analysis was rerun with symmetric multiprocessing (SMP) and Hyper-Threading disabled. A clear linear correlation between length value and cycle count is seen. The following data is taken from an Intel Core i5 2520M:
With this cleaner data, it is possible to observe new timing behavior. By displaying a smaller sample, clear timing shelves are seen as the size value increases:
By observing the individual points of the timing shelves, it is clear that each timing shelf has 16 points. Since each single increase in size value corresponds to a 4-byte increase in microcode data, 16 points represent 64 bytes, or 512 bits, of data.
512 bits is the standard message block size for popular cryptographic hash functions such as MD5, SHA1 and SHA2. The timing shelves observed match what we would expect from a Merkle-Damgård hash function, as each new shelf represents the increased number of cycles required to process a new message block.
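The relationship between size values and shelves can be modeled with a little arithmetic. The sketch below assumes that the hashed data starts on a block boundary and that only whole 64-byte blocks contribute a timing step; hash padding is ignored. The function names are made up for this illustration.

```python
from itertools import groupby

WORD_BYTES = 4    # the size field counts 4-byte words
BLOCK_BYTES = 64  # 512-bit message block


def blocks_covered(size_words: int) -> int:
    """Number of complete 512-bit blocks spanned by a given size value."""
    return (size_words * WORD_BYTES) // BLOCK_BYTES


def shelf_lengths(start: int, stop: int) -> list:
    """Count how many consecutive size values share the same block count."""
    return [len(list(group))
            for _, group in groupby(range(start, stop), key=blocks_covered)]


print(shelf_lengths(176, 240))  # [16, 16, 16, 16]
```

Each run of 16 consecutive size values maps to the same block count, which is the pattern a block-based hash would produce in the cycle counts.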
In public key signature schemes, it is normal to sign a hash of the data message instead of signing the entire message contents. This means that a hash operation being observed in the early stages of the microcode loader process is an expected result.
The lack of timing artifacts corresponding to symmetric key algorithm block sizes (i.e. 128 bits) may also indicate that authentication of the microcode contents is occurring prior to decryption of the microcode contents (i.e. the ciphertext is authenticated). Given the space constraints of a modern CPU architecture, this design is not entirely unexpected, as it allows the processor to load the decrypted content directly, without having to store the plaintext for authentication purposes.
Observation #7: What other data is in the first 706 bytes of a microcode update?
Note that the first shelf is observed after supplying a size value of 176 (or 704 bytes of microcode data), and that supplying a size value of 706 bytes or less results in a constant timing shelf. This would suggest that there exists a minimum length of non-variable-length data that will be hashed regardless of the supplied microcode size field. This data includes the undocumented microcode header and the RSA public key that has been discussed above.
If we assume that the presence of an RSA public key suggests the usage of RSA as a digital signature algorithm, then it stands to reason that an RSA signature will be found in the microcode update. If this signature corresponds to the public key embedded in the microcode update, then we would expect to find a 2048-bit value that is strictly less than the modulus value (since the signature is calculated using this modulus).
Examining the 2048 bits that immediately follow the public key exponent value (0x00000011), we find a valid candidate for an RSA signature. In every case, the 2048-bit value after the exponent is strictly less than the 2048 bits prior to the exponent (the presumed RSA modulus).
We can attempt to recover the originally signed data by raising the signature value to the power of 0x00000011 and then reducing the result modulo the modulus value. The results of this operation can be found here. The format of this file is .
The result appears to use PKCS#1 v1.5 padding, with a private key operation set for the block type. It is also clear that earlier processor models used a 160-bit digest for the signature hash, which is consistent with SHA1. Later processor models use a 256-bit digest, which is consistent with SHA2.
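The recovery step and the padding check can be sketched as follows. This is an illustration under stated assumptions rather than the exact procedure used above: the signature and modulus are assumed to have already been converted to Python integers (the byte order inside the update is left to the caller), the function names are made up, and whatever follows the padding is returned as-is without interpreting it.

```python
def recover_signed_block(signature: int, modulus: int,
                         exponent: int = 0x11) -> bytes:
    """Compute signature^exponent mod modulus as a modulus-sized byte string."""
    if not 0 <= signature < modulus:
        raise ValueError("signature must be smaller than the modulus")
    size = (modulus.bit_length() + 7) // 8
    return pow(signature, exponent, modulus).to_bytes(size, "big")


def strip_pkcs1_v15_type1(block: bytes) -> bytes:
    """Remove 00 01 FF..FF 00 padding (block type 01) and return the payload."""
    if len(block) < 11 or block[0] != 0x00 or block[1] != 0x01:
        raise ValueError("not a block type 01 encoding")
    separator = block.find(b"\x00", 2)
    # At least eight 0xFF padding bytes are required before the separator.
    if separator < 10 or any(b != 0xFF for b in block[2:separator]):
        raise ValueError("malformed padding")
    return block[separator + 1:]
```

The length of the returned payload is what distinguishes a 160-bit digest from a 256-bit digest in the results described above.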
All attempts at recreating these hash values using standard SHA implementations have failed. Several non-standard variations of Merkle-Damgård strengthening were also attempted. This may indicate that a non-standard initial vector or some other non-standard structural variation is used when calculating the signed hash value.
Attempts to insert a new public key and signature for the same PKCS#1 signed data into the microcode also failed, which suggests that the public key is part of the authenticated data, or that a hash of the expected/official public key is stored in factory-embedded memory and verified after authentication.
Interestingly, it was observed that setting the most significant word of the public key modulus to zero results in a hardware reset (in the case of a single core system, this manifests as a hardware halt/freeze, not a system restart). This may suggest a division by zero error exists in the microcode authentication routine.
Conclusion
Studying the Intel microcode update mechanism through data analysis and timing analysis has revealed properties about the cryptographic design of this system:
- Several previously undocumented header fields have been identified and described.
- The results suggest that microcode updates are authenticated using a 2048-bit RSA signature.
- The RSA signature operation appears to be constant time (i.e. unaffected by changes to the supplied exponent, modulus or signature value).
- Timing analysis reveals 512-bit steps correlating to supplied microcode length. This is a common message block size for cryptographic hash functions such as SHA1 and SHA2.
- The RSA signature was located, and the signed data is a PKCS#1 v1.5 encoded hash value. Older processor models use a 160-bit digest (SHA1), and newer processor models use a 256-bit digest (SHA2).