Developing Software in a Multicore & Multiprocessor World
Tool-based approach for finding complex concurrency issues and endian incompatibilities
To keep pace with customer demands for more functionality and speed, software teams are moving away from single processor architectures at a rapid rate. In particular, embedded devices that used to have one chip to perform a constrained set of tasks are now working in heterogeneous processor environments where processors are used for network connectivity, multi-media, and a whole variety of requirements. According to new data from VDC Research, this trend is only expected to accelerate: engineers expect that in two years' time, the number of single processor projects will drop by half.
The business impact of this growing complexity is stark: multicore and multiprocessor software projects are 4.5X more expensive, have 25% longer schedules, and require almost 3X as many software engineers. [1]

One area in particular where this growing complexity can have a dramatic impact on cost and schedule overruns is in the area of software testing and code inspection. A multicore/processor environment can add exponential complexity to effectively identifying errors in software. There are two areas in particular that have the ability to drag the productivity of a software team through the floor: concurrency errors and endian incompatibilities.

This white paper will discuss these types of issues in detail, explain how Klocwork’s source code analysis engine, Klocwork Truepath™, can be used to address them, and walk through two examples of these problems in prominent open source projects.
GWYN FISHER, CTO
WHITE PAPER | SEPTEMBER 2010
WWW.KLOCWORK.COM
[1] VDC Research, “Next Generation Embedded Hardware Architectures: Driving Onset of Project Delays, Costs Overruns, and Software Development Challenges”, September 2010.
Figure 1 | Processing Architecture Used in the Current Project and Expected in Next Two Years (Percent of Respondents)

Architecture                     Current Project    Expected in 2 Years
Single processor                 61.8%              30.1%
Multi-processor                  20.8%              20.6%
Multicore                        9.3%               21.4%
Multicore and multiprocessor     5.2%               19.4%
Don’t know                       2.9%               8.5%
Developing Software in a Multicore & Multprocessor World | Klocwork White Paper | 2
Tackling Concurrency Issues and Endian Incompatibilities with Klocwork Truepath™
Concurrency Issues

Source code analysis is a process by which the expected, or predicted, behavior of a program at runtime is exercised, along every conceivable control flow path, in order that aberrant situations be found, diagnosed, and described to the author in such a way as to make them simple to fix. In the typical course of events, no timing information or order, other than that inherent in the control flow graph, is interpreted or required for this analysis to take place.

Concurrency issues pose a complex set of challenges for analysis, as they do require timing or ordering information to be promoted into the control flow graph. Some are obviously less difficult to find than others, such as threads that reserve locks and perform time-consuming activities before releasing. This type of behavior, whilst not leading to a critical failure such as a deadlock, can lead to frustration on the part of the end user of the software, for example in the face of an unresponsive device.
The more complex types of concurrency issues, such as deadlocks, require an additional type of analysis over and above that performed when finding non-order-related bugs such as memory leaks or buffer overruns. In this case, we must perform two different types of analysis: one that gathers and propagates lock lifecycle behavior, and another that can analyze the whole program space and find conflicts in this behavior.
Klocwork Truepath makes this possible via the addition of a new concurrency analysis engine to its existing tool chain:

Compile
  • Emulate native build
  • Build control flow graph
Symbolic logic
  • Analyze control flow graph
  • Perform dataflow analysis
Concurrency
  • Analyze lock dependencies

Figure 2 | Klocwork Truepath tool chain provides concurrency analysis engine after control flow graph analysis and build emulation.

In this figure you can see that data relating to lock lifecycles is gathered by the normal analysis engine, and once this has been produced for all modules in the system, the whole program space is then analyzed by the new concurrency analysis engine so that loops in the lifecycle graph can be found, which equate to deadlocks.

Consider a function that operates as follows:

    lock_t Lock1, Lock2;

    void foo(int x)
    {
        if( x & 1 ) {
            lock(Lock1);
            lock(Lock2);
        }
        else
            lock(Lock1);
    }
You can easily see by inspection that when passed an odd number as its parameter, this function defines a dependency of Lock2 upon Lock1. Failing an odd parameter, Lock1 is still reserved, but this time there is no dependency of Lock2 upon Lock1 at the local scope, although there may still remain that dependency (or another) at an inter-procedural scope.
Therefore, we have two discrete types of questions to ask when performing the analysis:

1. Symbolic logic questions:
   a. Is there a valid control flow that gets us to call function foo() with an odd parameter?
   b. Is there a valid control flow that results in foo() being called with an even parameter followed by a call to another function that results in another lock (e.g. Lock2) being reserved before Lock1 is released?
2. Lock dependency questions:
   a. If either of these are so, is there any other situation in the program’s natural control flow whereby a counter-dependency of Lock1 upon Lock2 can be reached, potentially resulting in a deadlock?
The first type of question is answered by Klocwork Truepath’s symbolic logic engine during the normal course of program analysis, just as any other type of defect is analyzed for inter-procedural data flows that can or cannot occur.

The second type of question is then answered by the concurrency analysis engine, fed by the collection of all possible dependencies within the program space. The result is what tends to be a small set of incredibly difficult to find (manually), and insanely difficult to understand (without a tool), deadlock scenarios that developers can triage and fix very quickly within the natural course of their implementation tasks.
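The lock dependency question can be pictured as a cycle search over a directed graph. The following is a minimal sketch under that assumption, not a description of how Klocwork Truepath is implemented: each edge records that one lock was reserved while another was already held, and a cycle in the graph marks a potential deadlock. The names lock_graph, lock_graph_add and lock_graph_has_cycle are hypothetical, as is the fixed limit of 16 locks.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_LOCKS 16

/* edge[a][b] is true when some path reserves lock b while holding lock a. */
typedef struct {
    bool edge[MAX_LOCKS][MAX_LOCKS];
} lock_graph;

void lock_graph_init(lock_graph *g)
{
    memset(g, 0, sizeof *g);
}

/* Record that `inner` was reserved while `outer` was already held. */
void lock_graph_add(lock_graph *g, int outer, int inner)
{
    g->edge[outer][inner] = true;
}

/* Depth-first search. state: 0 = unvisited, 1 = on current path, 2 = done. */
static bool visit(const lock_graph *g, int node, int state[MAX_LOCKS])
{
    state[node] = 1;
    for (int next = 0; next < MAX_LOCKS; next++) {
        if (!g->edge[node][next])
            continue;
        if (state[next] == 1)
            return true;                 /* back edge closes a cycle */
        if (state[next] == 0 && visit(g, next, state))
            return true;
    }
    state[node] = 2;
    return false;
}

/* True when the dependency graph contains a cycle (a potential deadlock). */
bool lock_graph_has_cycle(const lock_graph *g)
{
    int state[MAX_LOCKS] = {0};
    for (int n = 0; n < MAX_LOCKS; n++)
        if (state[n] == 0 && visit(g, n, state))
            return true;
    return false;
}
```

With Lock1 as index 0 and Lock2 as index 1, the odd-parameter path through foo() adds the edge 0 to 1. The graph only becomes cyclic once some other path adds the counter-dependency 1 to 0, which is exactly the situation the second question asks about.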
Endian Incompatibilities

Whilst it may be true that there are 10 kinds of people in the world, a switch from a little endian platform to a big endian platform will muddy that impression considerably. An advisor of ours recently informed me with glee that he’d finally “set his MSB” (having passed his 64th birthday), but store that in nibble representation on an unexpected endian architecture and he’d be regressing to the nursery once more.
In short, endian representations affect how the host processor stores integral types in memory. Considering 32-bit integers, each of which consists of four bytes of memory, the processor can choose to read and write those four bytes in a variety of orders, although traditionally only two are used:
» Little endian, in which the bytes are written in the order 0, 1, 2, 3
» Big endian, in which the bytes are written in the order 3, 2, 1, 0
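To make the two orderings concrete, here is a small sketch that copies a 32-bit value out of memory byte by byte and reverses its byte order. It assumes that byte 0 denotes the least significant byte, which is how the list above reads. The helper names bytes_in_memory, host_is_little_endian and swap32 are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Copy the in-memory bytes of a 32-bit value, lowest address first. */
void bytes_in_memory(uint32_t value, unsigned char out[4])
{
    memcpy(out, &value, sizeof value);
}

/* True when the least significant byte sits at the lowest address. */
bool host_is_little_endian(void)
{
    unsigned char b[4];
    bytes_in_memory(1u, b);
    return b[0] == 1;
}

/* Reverse the four bytes of a 32-bit value. */
uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}
```

Passing 0x03020100 to bytes_in_memory() makes each byte's value equal to its own number, so a little endian host fills the array with 0, 1, 2, 3 and a big endian host with 3, 2, 1, 0.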
This picture becomes slightly muddied if the processor actually writes words at a time (this is mostly a fairly historical representation now, but we mention it for completeness), and applies its endian assumptions to each word:

» Little endian still writes bytes in the order 0, 1, 2, 3
» Big endian, however, may now write bytes in the order 1, 0, 3, 2
However the processor stores and reads such types is entirely at its own discretion and the business of nobody else. Until, that is, the developer directs the processor to write such data into a medium for transmission, as opposed to storage in memory.

Transmission media, which could be sockets, files, pipes, or any other inter-processor vector (e.g. interrupts that cause data to be written to the PCI-Express interface, or to the serial bus, or …), are addressed by the processor in exactly the same way as memory unless specifically told to do otherwise.
Thus, a big endian processor will write a 32-bit integer onto a socket in byte order 3, 2, 1, 0. If the CPU on the other end of the socket uses a little endian architecture, then obviously a value written onto the socket will be interpreted completely differently when read. For example, a value of 29 (0x0000001D), written by a big endian processor and read by a little endian processor, will be interpreted as 486,539,264 (0x1D000000), not a small correction by any means.
Preparing a program for use with heterogeneous processor architectures therefore involves finding every integral type that ever hits a transmission vector that could legitimately target another processor and ensuring that the read/write operation involved transforms the data into/from a neutral representation that both sides agree on. In a program of any size at all, obviously this is a non-trivial task.

Klocwork Truepath can help developers in this task as it now includes the ability to validate type representation usage symmetrically as those types cross transmission vector boundaries. That is, the data flow engine within Klocwork Truepath automatically validates that types that are written directly to a transmission vector are subject to host-to-neutral format transformation before the write operation takes place. Likewise, integral types read from a transmission vector are tracked to ensure that they are appropriately transformed prior to the first attempted usage on the host.
For example, consider the following function:

    void foo(int sock)
    {
        int x;

        for( x = 0; x < 256; x++ )
            if( send(sock, &x, sizeof(int), 0) < (ssize_t)sizeof(int) )
                return;
    }

This simple function makes the basic assumption that the reader on the other end of its socket has the same processor architecture as the sender. This might be true, or more accurately it might be true today, but what designer can ever look far enough into the future to know that it will always be true, regardless of market shifts, great ideas that marketing interns have, etc.?

Klocwork Truepath, upon analysis of this function, will point out:

    Value ‘x’ is used in host byte order, but should be used in environment/network byte order.

A developer versed in inter-architectural development will naturally modify this function to transform the value of the variable ‘x’ prior to transmission:

    void foo(int sock)
    {
        int x, xt;

        for( x = 0; x < 256; x++ ) {
            xt = htonl(x);  // … or some other suitable form
            if( send(sock, &xt, sizeof(int), 0) < (ssize_t)sizeof(int) )
                return;
        }
    }

Likewise when it comes to reading information across a transmission vector, Klocwork Truepath traces the data flow of any received integral types to ensure, in exactly the opposite way to sending, that any such values are transformed to host format prior to their first usage.
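The receiving side can be sketched in the same spirit. The function below is an illustration only, assuming a POSIX stream socket and a sender that used htonl(); the name recv_u32 is hypothetical. It collects exactly four bytes and converts them with ntohl() before the value is handed back to the caller.

```c
#include <arpa/inet.h>   /* ntohl */
#include <stdint.h>
#include <sys/socket.h>  /* recv */
#include <sys/types.h>   /* ssize_t */

/* Read one 32-bit value that was sent in network byte order.
   Returns 0 on success, -1 on error or if the peer closed the socket.
   Simplification: an interrupted call (EINTR) is treated as an error. */
int recv_u32(int sock, uint32_t *out)
{
    uint32_t wire;
    unsigned char *p = (unsigned char *)&wire;
    size_t got = 0;

    while (got < sizeof wire) {      /* recv may deliver fewer bytes than asked */
        ssize_t n = recv(sock, p + got, sizeof wire - got, 0);
        if (n <= 0)
            return -1;
        got += (size_t)n;
    }
    *out = ntohl(wire);              /* network to host order before first use */
    return 0;
}
```

Keeping the conversion inside the read helper means no caller ever sees the value in wire order, which is the property the data flow check described above looks for.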
Open Source Case Studies
Lock Contention: SQLite ca. 2006

Long since addressed by the developers of this great open source project, a deadlock was reported in the execution of the database engine and was traced to code that was specifically intended to guard against such an occurrence (as is usually the case). The bug is complicated to understand, and the eventual fix resulted in an almost total rewrite of the offending module. Without a tool such as Klocwork Insight, tracking it down would require days or perhaps weeks of intense manual debugging and thought-modeling, yet this very nasty bug was found and correctly described by Klocwork Truepath during an analysis that took mere minutes.
Consider the requirement to implement a simplistic singleton recursive lock capability within an environment that doesn’t support such constructs. Using reference counting, we can quite simply guard the underlying non-recursive lock and manage its lifecycle appropriately. Of course, this being a parallel world, we need to use another lock to guard the reference count that we’re using to guard the real lock, making the implementation just a bit more complicated.
The design of this might look something like the following example:

    lock_t lock1, lock2;
    int refCount = 0;

    void enter()
    {
        reserve_lock(lock1);
        if( refCount == 0 )
            reserve_lock(lock2);
        release_lock(lock1);
        refCount++;
    }

    void leave()
    {
        reserve_lock(lock1);
        refCount--;
        if( refCount == 0 )
            release_lock(lock2);
        release_lock(lock1);
    }

Now I can call enter() multiple times, simulating some of the capabilities of a true recursive lock, and as long as I remember to call leave() an equal number of times the lifecycle of the underlying non-recursive lock is managed correctly:

    void foo()
    {
        // real lock is reserved
        enter();
        if( i_really_want_to ) {
            // only the reference count is affected
            enter();
            leave();
        }
        // now the real lock is released
        leave();
    }
Now consider the requirement to implement an abstraction over thread-specific data storage. To ensure safety when allocating such a structure, the database engine uses the singleton recursive lock described above to protect its activities with an implementation that simplifies as follows:

    int tlsCreated = 0;

    data_t* create_data()
    {
        static data_t* tls;

        enter();
        if( tlsCreated == 0 )
            tls = create_thread_data();
        tlsCreated = 1;
        leave();
        init_data(tls);
        return tls;
    }

To simple inspection, this appears quite correct as it calls leave() the same number of times as enter() and thus should be considered well behaved. Unfortunately life in the parallel world is rarely simple to analyze, and this case is certainly more complicated than it first appears.

Consider a two core CPU executing two threads, both calling create_data() at very slight offsets in time.

The first thread (let’s call our threads Thread 1 and Thread 2) begins executing create_data() and successfully calls the enter() function. This results in the underlying lock, lock2, being reserved to Thread 1:

    Thread 1
    create_data()
    enter()
    refCount = 0
    reserve(lock1)
    reserve(lock2)
    release(lock1)
    refCount = 1

Now let’s assume that Thread 2 begins its execution of create_data() during the time that Thread 1 is active, and before it releases lock1:

    Thread 1                Thread 2
    create_data()
    enter()
    refCount = 0
    reserve(lock1)
    reserve(lock2)
                            create_data()
                            enter()
    release(lock1)
                            reserve(lock1)

One further assumption makes the scenario whole: Thread 1 at this moment is interrupted by the operating system, losing its time on chip. Crucially, this happens before the reference count is updated. (Check the implementation of enter() and you’ll see that the author unfortunately left the reference count update outside of what is supposed to guard access to it.) As the reference count will therefore still read zero for Thread 2, it will attempt to reserve lock2, resulting in Thread 2 blocking (as lock2 is already owned by Thread 1):
    Thread 1                Thread 2
    create_data()
    enter()
    refCount = 0
    reserve(lock1)
    reserve(lock2)
                            create_data()
                            enter()
    release(lock1)
                            reserve(lock1)
    interrupted
                            refCount = 0
                            reserve(lock2)
                            blocked

Upon return from interrupt, Thread 1 is released and resumes execution where it left off, incrementing the reference count and returning from the enter() function. Its execution of create_data() continues, leading to a call to the leave() function, which unfortunately attempts to reserve lock1 before doing anything else:

    Thread 1                Thread 2
    create_data()
    enter()
    refCount = 0
    reserve(lock1)
    reserve(lock2)
                            create_data()
                            enter()
    release(lock1)
                            reserve(lock1)
    interrupted
                            refCount = 0
                            reserve(lock2)
                            blocked
    refCount = 1
    return
    leave()
    reserve(lock1)
    blocked

Due to the fact that Thread 2 is currently blocked, waiting on lock2, and currently owns lock1, Thread 1 will now block on its own attempt to reserve lock1.

In short, this is a classic lock-order inversion contention caused by a poorly guarded data item, which when subject to a race condition (being read by one thread whilst in the process of being updated by another) causes one thread to reserve locks in order while the other thread attempts to reserve them out of order, resulting in a deadlock.

With the race condition fixed, this singleton will operate correctly, although as previously described the author actually chose to completely rewrite this module, providing a more useful re-entrant mutual exclusion capability for multiple threads, i.e. removing the singleton semantic.
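For comparison, a re-entrant lock that also excludes other threads could be sketched along the following lines. This is not the code from the SQLite rewrite. It assumes POSIX threads, and the names rlock_t, rlock_init, rlock_enter and rlock_leave are hypothetical. The point it illustrates is that the owner and the nesting depth are only ever read or written while the guard mutex is held, so the race in the scenario above cannot arise.

```c
#include <pthread.h>
#include <stddef.h>

typedef struct {
    pthread_mutex_t guard;     /* protects owner and depth */
    pthread_cond_t  released;  /* signalled when depth drops to zero */
    pthread_t       owner;     /* meaningful only while depth > 0 */
    int             depth;     /* nesting count of the owning thread */
} rlock_t;

void rlock_init(rlock_t *l)
{
    pthread_mutex_init(&l->guard, NULL);
    pthread_cond_init(&l->released, NULL);
    l->depth = 0;
}

void rlock_enter(rlock_t *l)
{
    pthread_t self = pthread_self();

    pthread_mutex_lock(&l->guard);
    if (l->depth > 0 && pthread_equal(l->owner, self)) {
        l->depth++;                    /* re-entry by the current owner */
    } else {
        while (l->depth > 0)           /* held by another thread: wait */
            pthread_cond_wait(&l->released, &l->guard);
        l->owner = self;
        l->depth = 1;
    }
    pthread_mutex_unlock(&l->guard);
}

void rlock_leave(rlock_t *l)
{
    pthread_mutex_lock(&l->guard);
    if (--l->depth == 0)
        pthread_cond_signal(&l->released);
    pthread_mutex_unlock(&l->guard);
}
```

Because a waiting thread sleeps on the condition variable, it gives up the guard while it waits, so no thread ever holds one lock while blocking on another and the lock-order inversion has nothing to invert.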
Figure 3 | Source listing from SQLite
Figure 4 | Control flow description from Klocwork Truepath
Endian Design Assumptions: PostgreSQL

In contrast to the situation described in relation to SQLite, the findings in this case study don’t point to bugs in software as much as they do to limited design decisions and the impact they have on how software is then constructed.

Specifically, when designing a multi-process application, the architect is faced with the fundamental decision of whether all of those processes are going to be supported on one chip, or whether for the sake of scale or pure flexibility, the software will support being deployed and executed on multiple chips/hosts/devices at once.
In the case of PostgreSQL, one of the processes detached from the main kernel is the statistics collector, something that acts more or less as a performance monitor, allowing the DBA to understand what’s going on within the kernel, without necessarily impacting the performance of the kernel whilst running reports or monitors against those statistics. This provides a nice analog for a typical set of application-layer processes that need to interact with each other, but which by design could be implemented to operate on either the same CPU/host or a completely different one.
To implement this “low touch” collection and reporting mechanism, the PostgreSQL designers chose to fork() a process, presumably on the same CPU or multi-CPU package, and then use an asynchronous socket to transmit data from the kernel process to the collector. Using the pgstat application, the DBA can then interact with whatever the child process has collected at any point in time.

All of this is encoded within the module src/backend/postmaster/pgstat.c.

Because of the way that this fundamental decision was taken in this particular case, the designer chose to encode data transmission between the kernel and the collector using host-native representation. For example:
Figure 5 | Data representation analysis in action
In this example, it’s simple to see the assumption in all its glory, as that data member msg.msg_hdr.m_size is read and used directly off the wire, in what could be, but isn’t in this case, network order.

Now let’s assume that a new generation of designers revisit this decision and instead place emphasis on scale and flexibility over ease of implementation. Now they decide to place the statistics collector process on an arbitrary node in the hardware design, rather than on the same node as the kernel process.

With this decision in place, the assumption that network byte order and host byte order are the same can no longer be made in general. Porting to this new assumption set could take significant time, both for developers and for the test crew, faced with putting together a matrix of CPUs/hosts that embody the plethora of representations we can expect to support in the field.

Using a tool-driven approach, however, this entire effort can be collapsed to a single analysis pass, taking minutes in total, to see a report of what’s involved. In this case, the designers would be faced with the following endian vulnerabilities that would need to be addressed (along with the obvious logistical issues around how to place the process on the right host/CPU, of course):
    pgstat.c: line 1988: function pgstat_recvbuffer()
        Value ‘msg.msg_hdr.m_size’ is used in network order.

    pgstat.c: line 1443: function pgstat_send()
        Value ‘*msg’ is used in host byte order.
These two simple issues might be thought of as the whole problem domain. However, looking further into what this module is capable of, certain information can be persisted across sessions using a statistics file. If we further our decision to allow the process to be spawned on heterogeneous hardware, we might well continue that spread by allowing different instantiations of said process to occur on heterogeneous hardware, thus requiring persistent data to be endian safe:
    pgstat.c: line 2556: function pgstat_read_statsfile()
        Value ‘format_id’ is used in environment byte order.
        Similar errors can be found on line(s): 2610, 2684, 2717, 2740.

    pgstat.c: line 2312: function pgstat_write_statsfile()
        Value ‘format_id’ is used in host byte order.
        Similar errors can be found on line(s): 2351, 2384, 2411, 2412.
Armed with this information, the designer can make all required updates to remove endian vulnerability from their code in one pass.
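For persistent data, one way to make a field such as a format identifier endian safe is to fix its byte order in the file itself. The sketch below is an illustration under that assumption and is not taken from PostgreSQL: the helper names write_u32_be and read_u32_be are hypothetical, and the choice of most-significant-byte-first is arbitrary so long as reader and writer agree.

```c
#include <stdint.h>
#include <stdio.h>

/* Write a 32-bit value most significant byte first, whatever the host order.
   Returns 0 on success, -1 on a short write. */
int write_u32_be(FILE *f, uint32_t v)
{
    unsigned char b[4] = {
        (unsigned char)(v >> 24), (unsigned char)(v >> 16),
        (unsigned char)(v >> 8),  (unsigned char)v
    };
    return fwrite(b, 1, sizeof b, f) == sizeof b ? 0 : -1;
}

/* Read a value stored by write_u32_be().
   Returns 0 on success, -1 on a short read. */
int read_u32_be(FILE *f, uint32_t *out)
{
    unsigned char b[4];

    if (fread(b, 1, sizeof b, f) != sizeof b)
        return -1;
    *out = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
    return 0;
}
```

Because the value is assembled with shifts rather than copied from memory, the same file reads back identically on little endian and big endian hosts, and no byte-order test of the host is needed.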
Conclusion
The complexity of this problem domain is vast, so there’s no one solution, tool, or approach that will address all your problems. Development teams need to equip themselves with good tools, smart design assumptions, and even smarter developers to reconcile the feature race being demanded by the market and the underlying platform complexity that implies. When it comes to selecting a tool, source code analysis should be on your short list as it offers a compelling mix of scalability, flexibility and the ability to address a broad set of issues that will help you to ensure the overall quality and security of your code.
IN THE UNITED STATES:
15 New England Executive Park
Burlington, MA 01803

IN CANADA:
30 Edgewater Street, Suite 114
Ottawa, ON K2L 1V8

t: 1.866.556.2967
f: 613.836.9088
www.klocwork.com
About the Author

Gwyn Fisher is the CTO of Klocwork and is responsible for guiding the company’s technical direction and strategy. With nearly 20 years of global technology experience, Gwyn brings a valuable combination of vision, experience, and direct insight into the developer perspective. With a background in formal grammars and computational linguistics, Gwyn has spent much of his career working in the search and natural language domains, holding senior executive positions with companies like Hummingbird, Fulcrum Technologies, PC DOCS and LumaPath. At Klocwork, Gwyn has returned to his original passion, compiler theory, and is leveraging his experience and knowledge of the developer mindset to move the practical domain of static analysis to the next level.
About Klocwork

Klocwork® offers a portfolio of software development productivity tools designed to ensure the security, quality and maintainability of complex code bases. Using proven static analysis technology, Klocwork’s tools identify critical security vulnerabilities and quality defects, optimize peer code review, and help developers create more maintainable code. Klocwork’s tools are an integral part of the development process for over 700 customers in the consumer electronics, mobile devices, medical technologies, telecom, military and aerospace sectors.
© Copyright Klocwork Inc. 2010 · All Rights Reserved