Mining Programming Language Usage with Boa

Robert Dyer

These research activities supported in part by the US National Science Foundation (NSF) grants CNS-15-13263,

Upload: aubrey-west

Post on 18-Jan-2018

Mining Programming Language Usage with Boa
Robert Dyer
These research activities supported in part by the US National Science Foundation (NSF) grants CNS , CNS , CCF , CCF , CCF , CCF , CCF , TWC , CCF , CCF , and CCF
Tien N. Nguyen, Hridesh Rajan, Hoan Anh Nguyen

2 Today's tutorial is about Mining Software Repositories at an Ultra-large-scale

3 What do I mean by software repository?

4

5 What features do they have?

6 What do I mean by mining software repositories (MSR)?

7 What are some examples of software repository mining?

8

9 What is the most used programming language?

10 How many words are in commit messages?
    Words[] = update,
    Words[] = cleanup,
    Words[] = updated,
    Words[] = refactoring,
    Words[] = fix,
    Words[] = test, 9428
    Words[] = typo, 9288
    Words[] = updates, 7746
    Words[] = javadoc, 6893
    Words[] = bugfix, 6295

11 How has unit testing been adopted over time?
JUnit 4 release

12 What makes this ultra-large-scale mining?

13 Previous examples queried...
    Projects             699,331
    Code Repositories    494,158
    Revisions            15,063,073
    Unique Files         69,863,970
    File Snapshots       147,074,540
    AST Nodes            18,651,043,23
Over 250GB of pre-processed data from SourceForge

14 Most recent dataset (Sep 2015)
    Projects             7,830,023
    Code Repositories    380,125
    Revisions            23,229,406
    Unique Files         146,398,339
    File Snapshots       484,947,086
    AST Nodes            71,810,106,868
Over 270GB of pre-processed data from GitHub (focusing on Java projects)
What am I interested in?

15

16 Language Studies
    What languages do programmers choose? [Meyerovich&Rabkin SPLASH'13]
    Reflection [Livshits et al. APLAS'05] [Callaú et al. MSR'11]
    JavaScript / eval [Yue&Wang WWW'09] [Richards et al. PLDI'10] [Ratanaworabhan et al. WEBAPPS'10] [Richards et al. ECOOP'11]
    Generics [Basit et al. SEKE'05] [Parnin et al. MSR'11] [Hoppe&Hanenberg SPLASH'13]
    Object-oriented Features [Tempero et al. ECOOP'08] [Muschevici et al. OOPSLA'08] [Tempero ASWEC'09] [Grechanik et al. ESEM'10] [Gorschek et al. ICSE'10]

Finding use of assert
    Requires use of a parser (e.g. JDT)
    Requires knowledge of several APIs
        SF.net / GitHub API
        SVNkit/JGit/etc
    Must be manually parallelized

17

18
    ASSERTS: output sum of int;
    visit(input, visitor {
      before node: CodeRepository -> {
        snapshot := getsnapshot(node, "SOURCE_JAVA_JLS");
        foreach (i: int; def(snapshot[i]))
          visit(snapshot[i]);
        stop;
      }
      before node: Statement ->
        if (node.kind == StatementKind.ASSERT)
          ASSERTS << 1;
    });

    id := visitor {
      before T -> statement;
      after T -> statement;
    };
    visit(node, id);

27 Easing Source Code Mining with Visitors
    id := visitor {
      before id : T1 -> statement;
      before T2, T3 -> statement;
      before _ -> statement;
    };

28 Easing Source Code Mining with Visitors
[Figure: AST node types: ASTRoot, Namespace, Declaration, Method, Variable, Type, Statement, Expression]

29 Easing Source Code Mining with Visitors
[Figure: AST node types as above]
    before n: Declaration -> { }

    before n: Declaration -> {
      foreach (i: int; n.fields[i])
        visit(n.fields[i]);
    }

    before n: Declaration -> {
      foreach (i: int; n.fields[i])
        visit(n.fields[i]);
      stop;
    }

Let's revisit the assert use example.
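The traversal behaviour the visitor slides rely on (a `before` clause per node type, an optional `_` wildcard, default depth-first traversal, and `stop` to cut that traversal short) can be pictured with a small Python stand-in. This is only a sketch under stated assumptions, not Boa's implementation: the `Node` class, the `STOP` sentinel, the string type names, and `count_asserts` are all hypothetical, and `after` clauses are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    """Hypothetical stand-in for an AST node: a type name, a kind, and children."""
    type: str
    kind: str = ""
    children: List["Node"] = field(default_factory=list)


# Sentinel a handler returns to mimic Boa's `stop` (assumption for this sketch).
STOP = object()


def visit(node: Node, before: Dict[str, Callable[[Node], object]]) -> None:
    """Run the matching `before` handler, then traverse children unless stopped."""
    # "_" plays the role of the wildcard clause `before _ -> ...`.
    handler = before.get(node.type, before.get("_"))
    if handler is not None and handler(node) is STOP:
        return  # the handler asked for no default traversal below this node
    for child in node.children:
        visit(child, before)


def count_asserts(root: Node, skip_methods: bool = False) -> int:
    """Count ASSERT statements, optionally refusing to descend into methods."""
    total = 0

    def on_statement(node: Node) -> None:
        nonlocal total
        if node.kind == "ASSERT":
            total += 1

    handlers: Dict[str, Callable[[Node], object]] = {"Statement": on_statement}
    if skip_methods:
        handlers["Method"] = lambda node: STOP
    visit(root, handlers)
    return total


# A tiny tree: one declaration with a method (two asserts inside) and a field.
tree = Node("Declaration", children=[
    Node("Method", children=[
        Node("Statement", "BLOCK", [
            Node("Statement", "ASSERT"),
            Node("Statement", "IF", [Node("Statement", "ASSERT")]),
        ]),
    ]),
    Node("Variable"),
])

print(count_asserts(tree))
print(count_asserts(tree, skip_methods=True))
```

With the default traversal both asserts are reached; once the `Method` handler returns `STOP`, nothing below the method is visited, which is the same effect the `stop;` statement has in the `Declaration` example.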
30

31 Finding use of assert
    ASSERTS: output sum of int;

32 Finding use of assert
    ASSERTS: output sum of int;
    visit(input, visitor {
    });

33 Finding use of assert
    ASSERTS: output sum of int;
    visit(input, visitor {
      before node: Statement ->
    });

34 Finding use of assert
    ASSERTS: output sum of int;
    visit(input, visitor {
      before node: Statement ->
        if (node.kind == StatementKind.ASSERT)
    });

35 Finding use of assert
    ASSERTS: output sum of int;
    visit(input, visitor {
      before node: Statement ->
        if (node.kind == StatementKind.ASSERT)
          ASSERTS << 1;
    });

    ASSERTS: output sum of int;
    visit(input, visitor {
      before node: CodeRepository -> {
        snapshot := getsnapshot(node, "SOURCE_JAVA_JLS");
        foreach (i: int; def(snapshot[i]))
          visit(snapshot[i]);
        stop;
      }
      before node: Statement ->
        if (node.kind == StatementKind.ASSERT)
          ASSERTS << 1;
    });

First Uses of Annotation 48
    AnnotFirstUse: output bottom(1)[string] of string weight int;
    cur_date: int;
    cur_file: string;
    visit(input, visitor {
      before node: Revision -> cur_date = int(node.commit_date);
      before node: ChangedFile -> {
        if (!iskind("SOURCE_JAVA_JLS", node.kind)) stop;
        cur_file = node.name;
      }
    });

First Uses of Annotation 49
    AnnotFirstUse: output bottom(1)[string] of string weight int;
    cur_date: int;
    cur_file: string;
    visit(input, visitor {
      before node: Revision -> cur_date = int(node.commit_date);
      before node: ChangedFile -> {
        if (!iskind("SOURCE_JAVA_JLS", node.kind)) stop;
        cur_file = node.name;
      }
      before node: Modifier ->
        if (node.kind == ModifierKind.ANNOTATION)
    });

First Uses of Annotation 50
    AnnotFirstUse: output bottom(1)[string] of string weight int;
    times: map[string] of bool;
    cur_date: int;
    cur_file: string;
    visit(input, visitor {
      before node: CodeRepository -> clear(times);
      before node: Revision -> cur_date = int(node.commit_date);
      before node: ChangedFile -> {
        if (!iskind("SOURCE_JAVA_JLS", node.kind)) stop;
        cur_file = node.name;
      }
      before node: Modifier ->
        if (node.kind == ModifierKind.ANNOTATION)
          if (!haskey(times, cur_file)) {
            times[cur_file] = true;
          }
    });

First Uses of Annotation 51
    AnnotFirstUse: output bottom(1)[string] of string weight int;
    times: map[string] of bool;
    cur_date: int;
    cur_file: string;
    visit(input, visitor {
      before node: CodeRepository -> clear(times);
      before node: Revision -> cur_date = int(node.commit_date);
      before node: ChangedFile -> {
        if (!iskind("SOURCE_JAVA_JLS", node.kind)) stop;
        cur_file = node.name;
      }
      before node: Modifier ->
        if (node.kind == ModifierKind.ANNOTATION)
          if (!haskey(times, cur_file)) {
            times[cur_file] = true;
            AnnotFirstUse[input.id]

        cur_file = node.name;
      before node: Modifier ->
        if (node.kind == ModifierKind.ANNOTATION)
          files[cur_file] = lookup(files, cur_file, 0) + 1;
    });
    if (len(files) > 0) {
      keyset := keys(files);
      foreach (k: int; keyset[k]) {
        AnnotUse

        0 && len(m.statements[0].statements) > 0)

Opportunity: Assert 61
    # visit the method
    before m: Method ->
      if (len(m.statements) > 0 && len(m.statements[0].statements) > 0)
        # visit the first statement inside the method's block
        visit(m.statements[0].statements[0], visitor {
          before node: Statement ->
        });

Opportunity: Assert 62
    # visit the method
    before m: Method ->
      if (len(m.statements) > 0 && len(m.statements[0].statements) > 0)
        # visit the first statement inside the method's block
        visit(m.statements[0].statements[0], visitor {
          before node: Statement ->
            # look for: if (..) throw new IllegalArgEx()
            if (node.kind == StatementKind.IF
        });

Opportunity: Assert 63
    # visit the method
    before m: Method ->
      if (len(m.statements) > 0 && len(m.statements[0].statements) > 0)
        # visit the first statement inside the method's block
        visit(m.statements[0].statements[0], visitor {
          before node: Statement ->
            # look for: if (..) throw new IllegalArgEx()
            if (node.kind == StatementKind.IF
                && node.statements[0].kind == StatementKind.THROW
        });

Opportunity: Assert 64
    # visit the method
    before m: Method ->
      if (len(m.statements) > 0 && len(m.statements[0].statements) > 0)
        # visit the first statement inside the method's block
        visit(m.statements[0].statements[0], visitor {
          before node: Statement ->
            # look for: if (..) throw new IllegalArgEx()
            if (node.kind == StatementKind.IF
                && node.statements[0].kind == StatementKind.THROW
                && match(`^(java\.lang\.)?IllegalArgumentException$`,
                         node.statements[0].expression.new_type.name))
        });

Opportunity: Assert 65
    # visit the method
    before m: Method ->
      if (len(m.statements) > 0 && len(m.statements[0].statements) > 0)
        # visit the first statement inside the method's block
        visit(m.statements[0].statements[0], visitor {
          before node: Statement ->
            # look for: if (..) throw new IllegalArgEx()
            if (node.kind == StatementKind.IF
                && node.statements[0].kind == StatementKind.THROW
                && match(`^(java\.lang\.)?IllegalArgumentException$`,
                         node.statements[0].expression.new_type.name))
              # found one!
              o

      if (e.kind == ExpressionKind.NEW
          # make sure it is a generic type
          && def(e.new_type) && strfind("<", e.new_type.name) > -1
          # make sure it isn't using a diamond already
          && strfind("<>", e.new_type.name) == -1)

Opportunity: Diamond 79
    before e: Expression ->
      if (e.kind == ExpressionKind.NEW
          # make sure it is a generic type
          && def(e.new_type) && strfind("<", e.new_type.name) > -1
          # make sure it isn't using a diamond already
          && strfind("<>", e.new_type.name) == -1)
        # found one!
        o

      # try statement
      if (s.kind == StatementKind.TRY

Opportunity: Try w/ Resources 82
    before s: Statement ->
      # try statement with at least a finally block
      if (s.kind == StatementKind.TRY
          && len(s.statements) > 1
          && s.statements[len(s.statements)-1].kind == StatementKind.BLOCK)

Opportunity: Try w/ Resources 83
    before s: Statement ->
      # try statement with at least a finally
      if (s.kind == StatementKind.TRY
          && len(s.statements) > 1
          && s.statements[len(s.statements)-1].kind == StatementKind.BLOCK)
        visit(s.statements[len(s.statements) - 1], visitor {
          before e: Expression ->
        });

Opportunity: Try w/ Resources 84
    before s: Statement ->
      # try statement with at least a finally
      if (s.kind == StatementKind.TRY
          && len(s.statements) > 1
          && s.statements[len(s.statements)-1].kind == StatementKind.BLOCK)
        visit(s.statements[len(s.statements) - 1], visitor {
          before e: Expression ->
            # find a call to close()
            if (e.kind == ExpressionKind.METHODCALL
                && def(e.method) && e.method == "close"
                && len(e.method_args) == 0)
        });

Opportunity: Try w/ Resources 85
    before s: Statement ->
      # try statement with at least a finally
      if (s.kind == StatementKind.TRY
          && len(s.statements) > 1
          && s.statements[len(s.statements)-1].kind == StatementKind.BLOCK)
        visit(s.statements[len(s.statements) - 1], visitor {
          before e: Expression ->
            # find a call to close()
            if (e.kind == ExpressionKind.METHODCALL
                && def(e.method) && e.method == "close"
                && len(e.method_args) == 0)
              # found one!
              o

[Figure: potential N+1; File.java (Revision N), File.java (Revision N+1)]

90 Detected lots of conversions!
Manual, systematic sampling confirms: 2602 conversions, 13 not conversions
[Table: conversions per feature. Columns: Assert, Varargs, Diamond, MultiCatch, Try w/ Resources, Underscore Literals. Rows: Count, Files, Projects. Surviving values: Count "K8.5K", Files "K3.8K".]

91 Similar usage patterns
[Table: same conversions table as on the previous slide]
Old code converted to use new features
Only few features see high use

              Assert   Varargs   Binary Literals   Diamond   MultiCatch   Try w/ Resources   Underscore Literals
    Old       89K      612K      56K               3.3M      341K         489K               5.3M
    New       291K     1.6M      5K                414K      24K          33K                507K
    All       380K     2.2M      61K               3.7M      365K         522K               5.8M
    Files     1.39%    12.74%    0.11%             12.25%    2.28%        1.85%              5.86%
    Projects  18.18%   88.78%    5.9%              59.08%    49.75%       37.27%             51.15%

Despite (missed) potential for use
Feature adoption by individuals
To summarize...

92 Summary
Ultra-large-scale language feature studies pose several challenges
Boa provides abstractions to address these challenges
    Domain-specific language, types, and functions to make mining software repositories easier
    Automatically parallelizes queries
    Ultra-large-scale dataset with millions of projects

93 Boa's Global Impact
370+ users from over 20 countries!
Participate in the MSR 2016 Mining Challenge
94 deadline: Feb 19
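The summary's point that queries are parallelized automatically, combined with output declarations such as `ASSERTS: output sum of int;`, suggests a map-then-aggregate shape: the query body runs independently per project and the emitted values are merged by the aggregator. The Python sketch below illustrates that shape only. It is not a description of Boa's actual runtime, and the project dictionaries, `query`, `run`, and the use of a thread pool are all hypothetical choices made for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def query(project: Dict) -> Counter:
    """Per-project query body: emit 1 to the ASSERTS output per assert statement."""
    out: Counter = Counter()
    for kind in project["statement_kinds"]:
        if kind == "ASSERT":
            out["ASSERTS"] += 1
    return out


def run(projects: List[Dict], workers: int = 4) -> Counter:
    """Run the query on every project independently, then merge with a sum aggregator."""
    total: Counter = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(query, projects):
            total.update(partial)  # Counter.update adds counts, i.e. a sum
    return total


# Hypothetical input: each project reduced to the kinds of its statements.
projects = [
    {"id": "p1", "statement_kinds": ["ASSERT", "IF", "ASSERT"]},
    {"id": "p2", "statement_kinds": ["RETURN"]},
    {"id": "p3", "statement_kinds": ["ASSERT"]},
]

result = run(projects)
print(result["ASSERTS"])
```

Because the per-project function shares no state and the merge is a plain sum, the number of workers does not change the result, which is what lets the query author ignore parallelism entirely.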