"Cloning Considered Harmful" Considered Harmful
A look back
Cory Kapser and Mike Godfrey, University of Waterloo
WCRE 2006, Benevento, Italy
Cory Kapser
• Living in Calgary since graduating in 2009
• Working at a startup
His office, 1986
Talk at IBM Toronto, 1999
My office, 2002
IWPSE 2004, Kyoto
Many conferences, incl. MSR 2006
IWPC keynote, 2003
WCRE 1998, and many others since
Dolly, RIP
The ages of software cloning research
A brief look back and forward
[Stolen from IWSC 2015 keynote]
Math, science, and engineering
"Where's the science in what you do?"
• Science concerns building reliable explanatory models of how the world works
  – Scientific models must be testable, and (reasonably) consistent with observed reality
  – Scientific models may be statistical, structural (Newtonian mechanics), …
  – Where do the models come from?
    • Experience with the domain, grounded theory, …
Math, science, and engineering
"Where's the science in what you do?"
• Math isn't really science, per se!
  – Its only hard requirement is self-consistency; good math need not have practical applications.
  – Math is a kind of poetry, with rigorous rules of construction.
  – Math is a tool used by scientists to help build and analyze models of how the world works
Math, science, and engineering
"Where's the science in what you do?"
• Engineering is (roughly) the practical application of science to solve real-world problems
  – Must understand "how the world works" to get stuff done
  – Engineers must also know about processes, materials, costs, risks, tools, people, law, ethics, etc.
1. The Age of Math: Clone detection is possible!
  – Algorithms exist, can scale to big systems!
[1990s: Baker, Johnson, Baxter, Ducasse, Merlo, …]
[2000s: CCFinder, iClones, NiCad, ConQAT, …]
Clone detection is possible!
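To make "algorithms exist" concrete, here is a minimal sketch of one token-based approach: normalize identifiers and numbers to placeholders, then index fixed-size token windows and report windows shared by more than one fragment. It is not the algorithm of any tool named above; the keyword set, the `normalize` and `find_clones` names, and the default window size are all assumptions chosen for illustration.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny keyword set; a real detector would use a full lexer.
KEYWORDS = {"if", "else", "for", "while", "return", "int", "char", "void"}
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")
IDENT_RE = re.compile(r"[A-Za-z_]\w*")


def normalize(source):
    """Tokenize source and replace identifiers and numbers with placeholders."""
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if tok in KEYWORDS:
            tokens.append(tok)
        elif IDENT_RE.fullmatch(tok):
            tokens.append("ID")
        elif tok.isdigit():
            tokens.append("NUM")
        else:
            tokens.append(tok)
    return tokens


def find_clones(fragments, window=8):
    """Return lists of (fragment name, token offset) sharing an identical window.

    Only windows that occur in at least two different fragments are reported.
    """
    index = defaultdict(list)
    for name, source in fragments.items():
        tokens = normalize(source)
        for i in range(len(tokens) - window + 1):
            index[tuple(tokens[i:i + window])].append((name, i))
    return [locs for locs in index.values() if len({n for n, _ in locs}) > 1]
```

Two functions that differ only in variable names normalize to the same token sequence, so every window they share is reported. Real detectors add maximal-match merging, gap tolerance for near-miss clones, and far more scalable indexing.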
The three overlapping ages of software cloning research
2. The Age of Science: Clone analysis is possible!
  – Let's assume detection "just works", what can you tell me about the system and its clones?
    • Some clones evolve, some don't
    • Type 3 clones are more stable / less buggy / …
  – Not just source code! Other artifacts matter! We do MSR! e.g., Stack Overflow, Bugzilla, git meta-data
[2000s-2010s: Krinke, Kim, Kapser, Jürgens, … IWSC-16]
Clone analysis is possible!
The three overlapping ages of software cloning research
3. The Age of Engineering: Clone management is possible!
  – Clone triage, clone refactoring, linked editing, clone recommendation, program transformation, SPLs, …
[2000s-2010s: Robillard, LaToza, Basit, …]
Clone management is possible!
The road ahead: A look back?
• Good news: We've accomplished a lot!
  – We know what we can detect, how well, and at what scale
  – We've done many empirical studies on type 1/2/3 clones
• …but we still aren't sure which clones are important / risky and why
  – So maybe we need more comprehensive models of cloning as practiced by developers and experienced by managers
Controversial statement
If cloning research is to have impact on practice, then our immediate scientific goals must be more developer oriented
• It is not enough simply to find clones and then refactor (some of) them; rather, we must ask questions such as:
  – Why do these clones exist in the first place?
  – What design decisions led to their creation?
  – How do developers and managers perceive them?
  – What possible risks do they represent to the ongoing development of the software system?
  – How can we recognize clones that need management?
  – What strategies should we use to manage them over the long term?
"Physics is the only real science. The rest are just stamp collecting."
Ernest Rutherford (1871 – 1937)
Father of atomic physics
Professor at McGill Univ. & Univ. of Edinburgh
Nobel prize for … chemistry
Zoology c. 1850
• Most time is spent doing data collection, cleansing, curation, etc.
• Then analysis, organization, categorization, ...
  – Based on low-level empirical observation
• Weak predictive power
Along comes Darwin…
A taxonomy of cloning intent
1. Forking
  – Hardware variation, e.g., Linux SCSI drivers [SCAM 2011]
  – Platform variation
  – Experimental variation
2. Templating
  – Boiler-plating
  – API / library protocols
  – Generalized programming idioms
  – Parameterized code
3. Post-hoc customizing
  – Bug workarounds
  – Replicate + specialize
"'Cloning considered harmful' considered harmful", Cory J. Kapser and Michael W. Godfrey, WCRE 2006
Forking: Platform variation
• Motivation
  – Different platforms ⇒ very different low-level details
  – Interleaving platform-specific code in one place is too complex
• Well known examples
  – Linux kernel “arch” subsystem
  – Apache Portable Runtime (APR)
    • Portable impl of functionality that is typically platform dependent, such as file and network access
    • fileio -> {netware, os2, unix, win32}
    • Typical diffs: insertion of extra error checking or API calls
    • Cloning is obvious and well documented
Forking: Platform variation
• Advantages of cloning
  – Each (cloned) variant is simpler to maintain
  – No risk to stability of other variants
  – Platforms are likely to evolve independently, so maintenance is likely to be “mostly independent”
• Disadvantages of cloning
  – Evolution in two dimensions: user reqs + platform support
  – Change to the interface level means changes to many files
Forking: Platform variation
• Management and long-term issues
  – Factor out platform independent functionality as much as possible
  – Document variation points + platform peculiarities
  – As # of platforms grows, interface to the system hardens
• Structural manifestations
  – Cloning usually happens at the file level
    • Clones are often stored as files (or directories) in the same source directory
    • Directories may be named after OSs or similar
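The structural manifestation described above suggests a cheap first-pass heuristic: look for sibling directories named after platforms that hold files with the same name. The sketch below does only that, over a plain list of paths. The platform names come from the APR example in the slides; the function name and the sample paths are hypothetical. Matching file names only flag candidates, and a real study would still need to compare file contents.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Platform directory names taken from the APR example; extend as needed.
PLATFORM_NAMES = {"netware", "os2", "unix", "win32"}


def platform_fork_candidates(paths):
    """Map (parent directory, file name) to the platform directories holding it.

    Only files that appear under at least two platform-named sibling
    directories are returned.
    """
    seen = defaultdict(set)
    for raw in paths:
        p = PurePosixPath(raw)
        if p.parent.name in PLATFORM_NAMES:
            seen[(str(p.parent.parent), p.name)].add(p.parent.name)
    return {key: sorted(dirs) for key, dirs in seen.items() if len(dirs) > 1}


# Hypothetical file listing, loosely modelled on the fileio example.
sample = [
    "fileio/unix/open.c",
    "fileio/win32/open.c",
    "fileio/os2/open.c",
    "fileio/unix/pipe.c",
    "network/unix/sock.c",
    "README",
]
candidates = platform_fork_candidates(sample)
```

Here `candidates` contains a single entry for `open.c` under `fileio`, since it is the only file present in more than one platform directory.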
Cloning harmfulness: Two open source case studies

Apache httpd 2.2.4 - 60 tokens; Gnumeric 1.6.3 - 60 tokens

Group        Pattern                   Apache good  Apache harmful  Gnumeric good  Gnumeric harmful
Forking      Hardware variation                  0               0              0                 0
Forking      Platform variation                 10               0              0                 0
Forking      Experimental variation              4               0              0                 0
Templating   Boiler-plating                      5               0              6                 7
Templating   API                                 0               0              0                 9
Templating   Idioms                              0              12              1                 1
Templating   Parameterized code                  5              12             10                34
Customizing  Replicate + specialize             12               4             15                16
Customizing  Bug workarounds                     0               0              0                 0
Total                                           36              28             32                67
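The table is easier to compare across systems as proportions. The sketch below transcribes the counts from the table and computes, per system and optionally per group, the share of judged clones rated harmful. The `COUNTS`, `totals`, and `harmful_share` names are illustrative, not from the original study.

```python
# (good, harmful) clone counts per pattern, transcribed from the table above.
COUNTS = {
    "Apache": {
        ("Forking", "Hardware variation"): (0, 0),
        ("Forking", "Platform variation"): (10, 0),
        ("Forking", "Experimental variation"): (4, 0),
        ("Templating", "Boiler-plating"): (5, 0),
        ("Templating", "API"): (0, 0),
        ("Templating", "Idioms"): (0, 12),
        ("Templating", "Parameterized code"): (5, 12),
        ("Customizing", "Replicate + specialize"): (12, 4),
        ("Customizing", "Bug workarounds"): (0, 0),
    },
    "Gnumeric": {
        ("Forking", "Hardware variation"): (0, 0),
        ("Forking", "Platform variation"): (0, 0),
        ("Forking", "Experimental variation"): (0, 0),
        ("Templating", "Boiler-plating"): (6, 7),
        ("Templating", "API"): (0, 9),
        ("Templating", "Idioms"): (1, 1),
        ("Templating", "Parameterized code"): (10, 34),
        ("Customizing", "Replicate + specialize"): (15, 16),
        ("Customizing", "Bug workarounds"): (0, 0),
    },
}


def totals(system, group=None):
    """Sum (good, harmful) counts for a system, optionally for one group only."""
    rows = [v for (g, _), v in COUNTS[system].items() if group in (None, g)]
    return sum(r[0] for r in rows), sum(r[1] for r in rows)


def harmful_share(system, group=None):
    """Fraction of judged clones rated harmful, or None if none were judged."""
    good, harmful = totals(system, group)
    judged = good + harmful
    return harmful / judged if judged else None
```

With these counts, 28 of 64 judged clones in Apache (about 44%) and 67 of 99 in Gnumeric (about 68%) were rated harmful. All 14 forking clones found in Apache were rated good, and no forking clones were recorded for Gnumeric.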
The challenge for future cloning research
• Grand theories and "actionable" big ideas are a noble goal, of course!
  – It helps to avoid "yeah, OK, but who cares?" papers
• …but learning to "swim with the data" leads to higher quality research in the long run
  – It abets opportunistic exploration of the problem space
  – …which leads to deeper insights about the problem space
  – …and makes fundamental naïve mistakes less likely
Thank you