Subverting subversion? (or “An LHCb ‘commitment’ poll”?)


Page 1: Subverting subversion? (or “An  LHCb  ‘commitment’ poll”?)

Rob Lambert, NIKHEF LHCb Core Soft, 5th March 2014 1

Subverting subversion? (or “An LHCb ‘commitment’ poll”?)

Rob Lambert

Remember… it’s just for fun!

Page 2: Subverting subversion? (or “An  LHCb  ‘commitment’ poll”?)

Aims

Impersonal:
  Mine the data available from svn
  Gauge the number of individuals contributing to core software
  Gauge something about contributions to core software

Personal:
  I’ve been in this collaboration for 8.5 years (and probably not much longer)
  How much have I actually contributed to the codebase?


Page 3: Subverting subversion? (or “An  LHCb  ‘commitment’ poll”?)

Things you notice on SVN

1. Release managers commit much more often
  Tagging: svn cp, from trunk to tags
  Svn often “blames” the copier
  Necessary work, but doesn’t change the code

2. There are some projects which have skewed statistics
  Projects: DBASE/PARAM, Stripping, Erasmus/Urania
  Data files: no diffs or millions of diffs
  “Analysis-specific” software
  Single users with vastly different usage habits
  Legacy code, legacy data
  Not all core software

Page 4: Subverting subversion? (or “An  LHCb  ‘commitment’ poll”?)

List of metrics

I need metrics which can avoid these issues

1. svn blame
2. svn blame without comments!
3. Total number of commits
4. Total number of touched files (sum over all commits)
5. Total number of changed lines (sum over all commits)

Collect statistics by:
  a) Project
  b) User

Remember, this is all public information anyway (and JFF)
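
Metrics 3 and 4 can be gathered from the XML form of the log. The sketch below assumes the output of `svn log --xml -v` has been saved to a string. The sample log, the user names and the simple `/tags/` path test are illustrative assumptions, not the SvnPollTools implementation.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample in the layout printed by `svn log --xml -v`.
SAMPLE_LOG = """<?xml version="1.0"?>
<log>
  <logentry revision="3">
    <author>alice</author>
    <paths>
      <path action="A" kind="dir">/Proj/tags/v1r0</path>
    </paths>
    <msg>tag v1r0</msg>
  </logentry>
  <logentry revision="2">
    <author>bob</author>
    <paths>
      <path action="M" kind="file">/Proj/trunk/src/a.cpp</path>
      <path action="M" kind="file">/Proj/trunk/src/a.h</path>
    </paths>
    <msg>fix</msg>
  </logentry>
  <logentry revision="1">
    <author>alice</author>
    <paths>
      <path action="A" kind="file">/Proj/trunk/src/a.cpp</path>
    </paths>
    <msg>first import</msg>
  </logentry>
</log>
"""


def count_commits_and_files(log_xml):
    """Return (commits per author, touched files per author), ignoring tags."""
    commits, files = Counter(), Counter()
    for entry in ET.fromstring(log_xml).iter("logentry"):
        author = entry.findtext("author", default="(unknown)")
        # Assumption: anything under a /tags/ directory is tagging activity.
        paths = [p.text for p in entry.iter("path")
                 if "/tags/" not in (p.text or "")]
        if not paths:
            continue  # pure tagging commit, not counted
        commits[author] += 1
        files[author] += len(paths)
    return commits, files


commits, files = count_commits_and_files(SAMPLE_LOG)
print(dict(commits), dict(files))
```

Counting per project would only need the first path component as an extra key, which is also where a commit touching several projects can be mis-attributed.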


Page 5: Subverting subversion? (or “An  LHCb  ‘commitment’ poll”?)

Basic Totals

My survey draws from the following data

Stat                  Core Soft   Entire Repo
Committers            361         495
Total lines blamed    2.8 M       7.9 M
Commits               88 k        125 k
Total diffs           57 M        112 M
Total files changed   0.5 M       0.7 M

So, on average we’d have about 8k blamed lines per person in core soft (2.8 M / 361)

But there are major differences between the metrics

Before I can rank contributions, I need to examine the metrics

Page 6: Subverting subversion? (or “An  LHCb  ‘commitment’ poll”?)

Comparing metrics

Methods have different strengths and weaknesses (further details in the backup slides)

To overcome all problems, I must combine the metrics (don’t need the most basic blame, combine all others)

[Table: one row per method (blame, blame - comment, commits, files, diffs), one column per criterion: Historical, Current, Ignore tags, Ignore copies, Include comments, Include documentation, Ignore whitespace, Project name reliable?, Include tests (qmt), Include .ref?, Suppress long .ref diffs?, Scales with contribution?, Scales as # projects?, Scales with complexity?, Includes clean-ups? Cell values not recoverable.]

Page 7: Subverting subversion? (or “An  LHCb  ‘commitment’ poll”?)

Metrics

[Scatter plot: Blames against Blames - comment. f(x) = 1.3825 x − 443.82, R² = 0.976]
97% correlated, don’t use both!

[Scatter plot: Commits against Blames - comment. f(x) = 0.02055 x + 125.20, R² = 0.191]
20% correlated, need to use both!

[Scatter plot: Files against diffs/2. f(x) = 0.01435 x + 127.01, R² = 0.891]
89% correlated, but measure different things! Weight down slightly
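
The f(x) and R² values on these plots look like ordinary least-squares trendlines. The sketch below shows how such a fit can be computed, assuming a plain straight-line fit is what the plotting tool did. The five sample points are the (blame - comment, blames) pairs of the top five users in the ranking with exclusions, so the numbers it prints will not match the full-sample fit.

```python
def linear_fit(xs, ys):
    """Least-squares line y = slope * x + intercept, plus R squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# blame - comment (x) and blames (y) for jonrob, cattanem, ibelyaev,
# marcocle and frankm, taken from the ranking with exclusions.
blame_no_comment = [132495, 35337, 250903, 56901, 237179]
blames = [192877, 46559, 408993, 71563, 305320]

slope, intercept, r_squared = linear_fit(blame_no_comment, blames)
print(f"f(x) = {slope:.4f} x + {intercept:.1f}, R^2 = {r_squared:.3f}")
```

A high R² between two metrics means they carry nearly the same information, which is the argument for dropping the plain blame and keeping blame - comment.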

Page 8: Subverting subversion? (or “An  LHCb  ‘commitment’ poll”?)

Combining the metrics

The metrics are very disparate in magnitude and spread

1. Take a safe logarithm to alleviate the scale problem
2. Scale that down to the max of the metric
3. Average over the metrics, weighting the historical ones down by 2/3
4. Re-exponentiate to return a reasonable spread, express as %

name       files   commits   blames   blame-comment   lines/2
cattanem   80877   15439     46559    35337           4425280.0
jonrob     52054   6567      192877   132495          4652467.5
mvesteri   1258    153       113553   99524           197790.0

\[
s_{user}^{metric} = \log\left(\max\left(1,\; x_{user}^{metric}\right)\right),
\qquad
m^{metric} = \max\left(s_{first}^{metric}, \ldots, s_{last}^{metric}\right)
\]

\[
GPA_{user} = 100 \, \exp\left[\, 6 \, \frac{\sum_{metrics} w^{metric} \, s_{user}^{metric} / m^{metric}}{\sum_{metrics} w^{metric}} \,\right] \Big/ \exp\left[6\right]
\]
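
The four steps can be sketched in a few lines. This is a sketch and not the SvnPollTools code: it assumes that the combined metrics are files, commits, diffs/2 and blame - comment, that the three historical ones carry weight 2/3 and blame - comment carries weight 1, and that the spread factor is 6. The input rows are the top three users of the ranking with exclusions. Under these assumptions the result should agree with the GPA column of that ranking to within rounding.

```python
import math

# Rows taken from the ranking with exclusions.
USERS = {
    "jonrob":   {"files": 52054, "commits": 6567,  "diffs": 4652468.0, "blame_nc": 132495},
    "cattanem": {"files": 80877, "commits": 15439, "diffs": 4425280.0, "blame_nc": 35337},
    "ibelyaev": {"files": 21721, "commits": 3683,  "diffs": 804127.0,  "blame_nc": 250903},
}

# Assumed weights: historical metrics down-weighted by 2/3.
WEIGHTS = {"files": 2 / 3, "commits": 2 / 3, "diffs": 2 / 3, "blame_nc": 1.0}


def gpa(users, weights, spread=6.0):
    """Combine several metrics into one percentage per user."""
    # Step 1: safe logarithm (values below 1 are clamped to 1).
    safe_log = {name: {m: math.log(max(1.0, vals[m])) for m in weights}
                for name, vals in users.items()}
    # Step 2: the largest log per metric, used to scale into [0, 1].
    # Assumes at least one user has a value above 1 for every metric.
    max_log = {m: max(s[m] for s in safe_log.values()) for m in weights}
    total_weight = sum(weights.values())
    scores = {}
    for name, s in safe_log.items():
        # Step 3: weighted average of the scaled logs.
        mean = sum(weights[m] * s[m] / max_log[m] for m in weights) / total_weight
        # Step 4: re-exponentiate and express as a percentage.
        scores[name] = 100.0 * math.exp(spread * mean) / math.exp(spread)
    return scores


for name, score in gpa(USERS, WEIGHTS).items():
    print(f"{name:10s} {score:5.1f}")
```

A user who holds the maximum in every metric scores exactly 100, and the factor 6 controls how quickly the score falls off below that.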

Page 9: Subverting subversion? (or “An  LHCb  ‘commitment’ poll”?)

Pause for thought?

LHCb has around 10 application experts, plus one per subdetector

LHCb core-soft developer base consists of around 100 developers.

Was I lying to myself, and on my CV? Let’s find out!

Page 10: Subverting subversion? (or “An  LHCb  ‘commitment’ poll”?)

Other Stats

Top 20 experts created 66% of the code

[Histogram: frequency (# people) against GPA. 20 people, 66%; 341 people, 33%]

Page 11: Subverting subversion? (or “An  LHCb  ‘commitment’ poll”?)

Other Stats

Top 105 developers created the rest of the code

[Plot: cumulative contribution against # of people. 125 people, 95%; 236 people, 5%]
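
A cumulative-contribution curve of this kind answers the question "how many people account for a given share of the total?". The slide does not say which per-person score is being summed, so the sketch below takes any list of non-negative scores. The scores in the example are made up for illustration.

```python
def people_for_share(scores, share):
    """Smallest number of top contributors whose scores reach `share` of the total."""
    ordered = sorted(scores, reverse=True)
    total = sum(ordered)
    running = 0.0
    for count, score in enumerate(ordered, start=1):
        running += score
        if running >= share * total:
            return count
    return len(ordered)


# Hypothetical per-person scores, not taken from the survey.
example_scores = [50, 30, 10, 5, 3, 2]
print(people_for_share(example_scores, 0.66))
print(people_for_share(example_scores, 0.97))
```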

Page 12: Subverting subversion? (or “An  LHCb  ‘commitment’ poll”?)

The big reveal


Page 13: Subverting subversion? (or “An  LHCb  ‘commitment’ poll”?)

Ranking

The top 20, with exclusions (i.e. no Erasmus/Urania/obsolete/PARAM/DBASE/Stripping)

 #   name       GPA    files   commits   blames   diffs/2     blame - comment
------------------------------------------------------------------------------
 1   jonrob     76.1   52054   6567      192877   4652468.0   132495
 2   cattanem   72.6   80877   15439     46559    4425280.0   35337
 3   ibelyaev   60.3   21721   3683      408993   804127.0    250903
 4   marcocle   57.5   69126   1904      71563    4262422.0   56901
 5   frankm     56.0   16768   3421      305320   609614.5    237179
 6   mcoombes   35.5   13586   363       79560    1692002.0   64921
 7   jpalac     34.5   10193   6181      32880    117944.0    24316
 8   robbep     33.7   5759    1247      82471    460789.0    60787
 9   pkoppenb   33.5   14345   4086      19768    306867.0    13475
10   rlambert   32.8   10268   2710      19527    614237.5    14723
11   graven     32.5   9702    3937      35308    112972.5    26306
12   gcorti     30.3   7578    3198      22404    240781.5    16415
13   odescham   28.0   5977    1261      49687    125332.5    37930
14   wouter     27.8   3324    1718      64833    81623.5     53539
15   hmdegaud   26.9   5348    1454      20019    259001.5    18926
16   truf       26.5   4753    1218      46148    79648.5     41436
17   jost       24.0   2000    799       86178    78254.5     61586
18   barrand    22.4   4791    1644      4271     978583.5    2893
19   ocallot    22.3   2914    746       45950    63892.5     34804
20   mneedham   22.0   3945    940       28740    62806.5     21351

Page 14: Subverting subversion? (or “An  LHCb  ‘commitment’ poll”?)

The medallists

[Podium image: places 1, 2 and 3]

Page 15: Subverting subversion? (or “An  LHCb  ‘commitment’ poll”?)

Ranking (2)

The top 20, no exclusions

Ranking is pretty stable due to combined metric

 #   name       GPA    files   commits   blames   diffs/2     blame - comment
------------------------------------------------------------------------------
 1   cattanem   69.1   92881   17699     99870    5770542.0   88375
 2   jonrob     65.3   57520   7525      211669   9503448.0   147087
 3   ibelyaev   54.4   28234   4990      536087   1254729.0   340985
 4   marcocle   52.9   74768   2305      146540   4437568.0   131435
 5   frankm     44.8   18365   3745      305320   675987.0    237179
 6   pkoppenb   42.7   29061   8060      64013    815972.0    52924
 7   pseyfert   41     7882    1019      838184   715032.5    827245
 8   graven     38.2   62388   4332      35361    693566.0    26330
 9   diegoms    38     11082   1669      118837   2845016.0   108698
10   robbep     36.1   15393   2663      109255   709928.5    83123
11   rlambert   34.6   11407   3082      84333    663746.0    71803
12   jpalac     32.3   13881   7465      39626    234795.5    30467
13   gcorti     31.6   12007   4303      34411    766920.5    25353
14   mcoombes   29.1   13886   418       86968    1750572.0   69999
15   liblhcb    25.4   19902   54        166164   1271841.0   166163
16   phunt      23.9   4400    351       181278   258571.0    156539
17   ocallot    23.6   6605    1420      46338    185653.5    35192
18   mkarbach   23.4   3385    516       155646   179205.0    143672
19   odescham   23.3   6361    1357      54241    133133.5    41528
20   wouter     22.8   3447    1805      70385    86388.5     57510

Page 16: Subverting subversion? (or “An  LHCb  ‘commitment’ poll”?)

The medallists

[Podium image]

Page 17: Subverting subversion? (or “An  LHCb  ‘commitment’ poll”?)

Interesting outliers

Metric                 Paul Seyfert    Vanya Belyaev   Thomas Bird
GPA (exclusions)       #93: 6.5 %      #3: 60.3 %      #217: 2.0 %
GPA (inclusive)        #9: 36.7 %      #3: 54.4 %      #37: 17.8 %
Blamed (inclusive)     #1: 0.8 M       #2: 0.5 M       #4: 0.30 M
Comments (inclusive)   #22: 11 k       #1: 0.2 M       #2: 0.14 M

#1 or 2 (it’s cool that the computing project leader is the leader!)

(15% of all code)

Page 18: Subverting subversion? (or “An  LHCb  ‘commitment’ poll”?)

Comments

Looks like we have around 1/3 comments

[Histogram: frequency (# contributors) against proportion of blamed lines which were comments, with the positions of Vanya and Paul marked]
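
The proportion plotted here follows directly from the two blame metrics: the comment lines are the blamed lines minus the blamed lines without comments. A small sketch, using three rows of the no-exclusions ranking as input (the function name is only illustrative):

```python
def comment_fraction(blames, blames_without_comments):
    """Fraction of a user's blamed lines that are comments."""
    if blames == 0:
        return 0.0
    return (blames - blames_without_comments) / blames


# (blames, blame - comment) from the ranking without exclusions.
ROWS = {
    "ibelyaev": (536087, 340985),
    "pseyfert": (838184, 827245),
    "cattanem": (99870, 88375),
}

for name, (blames, no_comment) in ROWS.items():
    print(f"{name:10s} {comment_fraction(blames, no_comment):.3f}")
```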

Page 19: Subverting subversion? (or “An  LHCb  ‘commitment’ poll”?)

Make your own metric?

You could use a standard tool: svnplot, svnstat, statsvn, plotsvn, mypy-svn-stat …
  Most standard tools are based on pysvn and sqlite, and as Marco/Ben know, I gave up on trying to have Lbscripts, sqlite3 and pysvn working at the same time

Of course, I’ve committed my code into SVN …
  SvnPollTools: https://svnweb.cern.ch/trac/lhcb/browser/packages/trunk/SvnPollTools
  Complete lists as csv files here

Other possible improvements:
  Ignore all ref files and test directories also in the line diffs (probably won’t significantly change the outcome)
  Add LHCbDirac’s svn (will add Joel and Philippe et al.)
  Add Gaudi’s svn (will make Marco Cle #1 probably…)
  Multiply by the call graph?

Page 20: Subverting subversion? (or “An  LHCb  ‘commitment’ poll”?)

Conclusions

LHCb “core applications” appear to have
  ~20 experts, who contributed 2/3 of the total codebase
  ~100 developers who contributed 1/3 of the codebase
  ~250 contributors whose core soft contribution is very tiny

Interesting observations I can also make over dinner:
  Years of service vs. GPA
  GPA vs. institute affiliation
  Permanence of current position…

And in the end, well, at least I know I helped

Page 21: Subverting subversion? (or “An  LHCb  ‘commitment’ poll”?)

End

Backups are often required

Page 22: Subverting subversion? (or “An  LHCb  ‘commitment’ poll”?)

(1) blamed

Inclusions: .h .cpp .icpp .xml .xsd .dec .py
Exclusions:
  Exclude blank lines
  Exclude tags and branches

Pros:
  Measures a real contribution to the currently used code
  Measures current activity
  Eliminates “tagging” activity

Cons:
  Can mis-attribute entire lines thanks to a single small modification
  No attribution for past activity
  Not all lines are equal in contribution
  Includes comment lines
  Incorrect attribution of svn cp command

Page 23: Subverting subversion? (or “An  LHCb  ‘commitment’ poll”?)

(2) blamed - comments

Inclusions: .h .cpp .icpp .xml .xsd .dec .py
Exclusions:
  Exclude blank lines
  Exclude tags and branches
  Exclude comments!

Pros:
  Measures a real contribution to the currently used code
  Measures current activity
  Ignores comment lines
  Eliminates “tagging” activity

Cons:
  Can mis-attribute entire lines thanks to a single small modification
  No attribution for past activity
  Not all lines are equal in contribution
  Comments -> important documentation!
  Incorrect attribution of svn cp command
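
Both blame metrics can be taken from the plain-text output of `svn blame`, where each line starts with a revision and an author followed by the source text. The sketch below counts non-blank lines per author with and without comments. The sample text, the regular expression and the C++-style comment prefixes are assumptions for illustration. A real poll would need a prefix set per file type (for example `#` for .py) and would not catch trailing or multi-line comments this way.

```python
import re
from collections import Counter

# revision, author, then the source line (which may be empty)
BLAME_LINE = re.compile(r"^\s*(\S+)\s+(\S+) ?(.*)$")

# Hypothetical `svn blame` output for a small C++ file.
SAMPLE_BLAME = """\
   101      alice // Compute the reduced chi2 of a track
   101      alice double chi2(const Track& t) {
   205        bob   return t.chi2() / t.nDoF();
   101      alice }
   205        bob
"""


def blame_counts(blame_text, comment_prefixes=("//", "/*", "*")):
    """Return (lines per author, non-comment lines per author), skipping blanks."""
    with_comments, without_comments = Counter(), Counter()
    for line in blame_text.splitlines():
        match = BLAME_LINE.match(line)
        if not match:
            continue
        _revision, author, text = match.groups()
        text = text.strip()
        if not text:
            continue  # blank lines are excluded from both metrics
        with_comments[author] += 1
        if not text.startswith(comment_prefixes):
            without_comments[author] += 1
    return with_comments, without_comments


blamed, blamed_no_comment = blame_counts(SAMPLE_BLAME)
print(dict(blamed), dict(blamed_no_comment))
```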

Page 24: Subverting subversion? (or “An  LHCb  ‘commitment’ poll”?)

(3) #commits

Inclusions: Everything
Exclusions: Exclude tags

Pros:
  Measures historical contribution to the codebase
  Eliminates “tagging” activity
  Eliminates pure svn-cp commits

Cons:
  Not all commits are equal in contribution
  Also sensitive to whitespace changes
  No subdivision for massive commits
  No accounting for comments
  Can mis-attribute to wrong project

Page 25: Subverting subversion? (or “An  LHCb  ‘commitment’ poll”?)

(4) #files

Inclusions: Everything
Exclusions: Exclude tags

Pros:
  Measures historical contribution to the codebase
  Eliminates “tagging” activity
  Eliminates pure svn-cp commits
  Ranks different commits
  Includes documentation changes

Cons:
  No value judgement on the files themselves
  Also sensitive to whitespace changes
  No subdivision for massive commits
  No accounting for comments
  Can mis-attribute to wrong project

Page 26: Subverting subversion? (or “An  LHCb  ‘commitment’ poll”?)

(5) #diffs

Inclusions: Everything
Exclusions:
  Exclude tags
  Exclude whitespace diffs

Pros:
  Measures historical contribution to the codebase
  Eliminates “tagging” activity
  Eliminates pure svn-cp commits
  Ranks different commits
  Includes documentation changes
  Eliminates whitespace changes

Cons:
  No value judgement on changes that were made
  No accounting for comments
  Can mis-attribute to wrong project
  Takes a long time to poll svn

Page 27: Subverting subversion? (or “An  LHCb  ‘commitment’ poll”?)

Correlations

[Scatter plot: GPA against Files. f(x) = 0.001120 x + 4.861, R² = 0.608]

[Scatter plot: GPA against Commits. f(x) = 0.007120 x + 4.305, R² = 0.618]

[Scatter plot: GPA against Blames - comment. f(x) = 0.0003131 x + 4.242, R² = 0.540]

[Scatter plot: GPA against Diffs / 2. f(x) = 1.593E-05 x + 4.760, R² = 0.521]