Assieme: Finding and Leveraging Implicit References in a Web Search Interface for Programmers
Raphael Hoffmann, James Fogarty, Daniel S. Weld
University of Washington, Seattle
UIST 2007
Programmers Use Search
• To identify an API
• To seek information about an API
• To find examples of how to use an API
Example Task: “Programmatically output an Acrobat PDF file in Java.”
Example: General Web Search Interface
Example: Code-Specific Web Search Interface
…
Problems
• Information is dispersed: tutorials, API itself, documentation, pages with samples
• Difficult and time-consuming to …
  – locate required pieces,
  – get an overview of alternatives,
  – judge relevance and quality of results,
  – understand dependencies.
• Many page visits required
With Assieme we …
• Designed a new Web search interface
• Developed needed inference
Outline
• Motivation
• What Programmers Search For
• The Assieme Search Engine
  – Inferring Implicit References
  – Using Implicit References for Scoring
• Evaluation of Inference & User Study
• Discussion & Conclusion
Six Learning Barriers faced by Programmers (Ko et al. 04)
• Design barriers – What to do?
• Selection barriers – What to use?
• Coordination barriers – How to combine?
• Use barriers – How to use?
• Understanding barriers – What is wrong?
• Information barriers – How to check?
Examining Programmer Web Queries
Objective
• See what programmers search for
Dataset
• 15 million queries and click-through data
• Random sample of MSN queries in 05/06
Procedure
• Extract query sessions containing ‘java’ – 2,529
• Manually inspect queries and define regex filters
• Informal taxonomy of query sessions
Examining Programmer Web Queries
• Descriptive: 64.1 %
  – Example: “java JSP current date”
  – Selection barrier
• Contain package, type or member name: 35.9 %
  – Example: “java SimpleDateFormat”
  – Use barrier
• Contain terms like “example”, “using”, “sample code”: 17.9 %
  – Example: “using currentdate in jsp”
  – Coordination barrier
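The regex filters mentioned in the procedure are not given in the slides. The following is a minimal sketch of how such filters could look, assuming a CamelCase-or-dotted-name heuristic for API names; the patterns, the `classify` function, and the label strings are all hypothetical and only illustrate the three categories above.

```python
import re

# Terms the slide lists as markers of example-seeking queries.
EXAMPLE_TERMS = re.compile(r"\b(?:examples?|using|sample code)\b", re.IGNORECASE)

# Assumed heuristic (not from the slides) for "package, type or member name":
# a dotted name or a CamelCase identifier.
API_NAME = re.compile(
    r"\b(?:[a-z]\w*\.)+[A-Za-z]\w*\b"          # dotted, e.g. java.util.HashMap
    r"|\b[A-Z][a-z0-9]+(?:[A-Z][a-z0-9]+)+\b"  # CamelCase, e.g. SimpleDateFormat
)

def classify(query):
    """Return a set of labels for one query (hypothetical label names)."""
    labels = {"api-name" if API_NAME.search(query) else "descriptive"}
    # Example terms are counted separately, so they can co-occur with either label.
    if EXAMPLE_TERMS.search(query):
        labels.add("example-terms")
    return labels

for q in ("java JSP current date", "java SimpleDateFormat", "using currentdate in jsp"):
    print(q, "->", sorted(classify(q)))
```

The example-term label is kept independent of the other two because the slide's 64.1 % and 35.9 % already sum to 100 %, which suggests the 17.9 % overlaps with them.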
Assieme
Interface callouts (screenshot not reproduced):
• example code
• documentation
• required libraries
• relevance indicated by # uses
• summaries show referenced types
• links to related info
Challenges
How to put the right information on the interface?
• Get all programming-related data
• Interpret data and infer relationships
Assieme’s Data
… is crawled using existing search engines
• Pages with code examples: ~2,360,000
  – Queried Google on “java ±import ±class …”
• JAR files: ~79,000
  – Downloaded library files for all projects on Sun.com, Apache.org, Java.net, SourceForge.net
• JavaDoc pages: ~480,000
  – Queried Google on “overview-tree.html …”
The Assieme Search Engine
… infers 2 kinds of implicit references
Data sources: JAR files, JavaDoc pages, pages with code examples
• Uses of packages, types and members
• Matches of packages, types and members
Extracting Code Samples
Challenges:
• unclear segmentation
• code in a different language (C++)
• distracting terms ‘…’ in code
• line numbers
Extracting Code Samples
• remove HTML commands, but preserve line breaks
• remove some distracters by heuristics
• launch (error-tolerant) Java parser at every line break (separately parse for types, methods, and sequences of statements)
Page source (HTML):
<html><head><title></title></head><body>A simple example:<br><br>
1: import java.util.*; <br>
2: class c {<br>
3: HashMap m = new HashMap();<br>
4: void f() { m.clear(); }<br>
5: }<br><br>
<a href="index.html">back</a></body></html>

After removing HTML commands (line breaks preserved):
A simple example:

1: import java.util.*;
2: class c {
3: HashMap m = new HashMap();
4: void f() { m.clear(); }
5: }

back

After removing line numbers:
A simple example:

import java.util.*;
class c {
HashMap m = new HashMap();
void f() { m.clear(); }
}

back
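The first two extraction steps (dropping HTML commands while keeping line breaks, then removing line-number distracters) can be sketched roughly as below. This is an illustration under stated assumptions, not Assieme's implementation: the function names, the set of tags treated as line breaks, and the "consecutive numbering" heuristic are all hypothetical, and the error-tolerant Java parsing step is not shown.

```python
import html
import re

def html_to_text(page):
    """Drop HTML tags, but turn <br> and a few block-closing tags into newlines."""
    text = re.sub(r"<br\s*/?>|</(?:p|div|pre|li|tr)>", "\n", page, flags=re.IGNORECASE)
    text = re.sub(r"<[^>]+>", "", text)  # remove every remaining tag
    return html.unescape(text)

# A leading number, optional ':' '.' or ')', then whitespace, then the code.
LINE_NO = re.compile(r"^\s*(\d+)[:.)]?\s+(.*)$")

def strip_line_numbers(text):
    """Remove a numeric prefix only when it continues a 1, 2, 3, ... sequence."""
    out, prev = [], 0
    for line in text.split("\n"):
        m = LINE_NO.match(line)
        if m and int(m.group(1)) == prev + 1:
            prev += 1
            out.append(m.group(2).rstrip())
        else:
            out.append(line)
    return "\n".join(out)

page = ('<html><head><title></title></head><body>A simple example:<br><br> '
        '1: import java.util.*; <br>2: class c {<br>3: HashMap m = new HashMap();<br>'
        '4: void f() { m.clear(); }<br>5: }<br><br>'
        '<a href="index.html">back</a></body></html>')
print(strip_line_numbers(html_to_text(page)))
```

Requiring the numbers to run consecutively is one simple way to avoid stripping a numeric literal that happens to start a line of code; the slides do not say which heuristics were actually used.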
Resolving External Code References
Naïve approach of finding term matches does not work:
1 import java.util.*;
2 class c {
3   HashMap m = new HashMap();
4   void f() { m.clear(); }
5 }
Reference java.util.HashMap.clear() on line 4 only detectable by considering several lines
Use compiler to identify unresolved names
Resolving External Code References
• Index packages/types/members in JAR files
• Compile & lookup
  – compile: yields unresolved names
  – index lookup: unresolved names (java.util.HashMap.clear(), java.util.HashMap, …) are looked up in the JAR files
  – greedily pick best JARs; utility function: # covered references (and JAR popularity)
  – put on classpath
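The greedy pick of JARs can be sketched as a set-cover style loop. This is a minimal sketch, assuming the utility of a JAR is the number of still-unresolved names it covers with popularity as a tie-breaker; the slides do not specify how the two are combined, and `pick_jars`, `jar_index`, and `popularity` are hypothetical names. It shows a single lookup round, not the surrounding compile step.

```python
def pick_jars(unresolved, jar_index, popularity):
    """Greedily choose JARs until no JAR covers any remaining unresolved name.

    unresolved: iterable of fully qualified names reported by the compiler
    jar_index:  dict mapping JAR name -> set of names that JAR defines
    popularity: dict mapping JAR name -> number (assumed tie-breaker)
    Returns (chosen JARs in pick order, names that stayed unresolved).
    """
    remaining = set(unresolved)
    chosen = []
    while remaining:
        best = max(
            jar_index,
            key=lambda jar: (len(jar_index[jar] & remaining), popularity.get(jar, 0)),
            default=None,
        )
        if best is None:
            break  # empty index
        covered = jar_index[best] & remaining
        if not covered:
            break  # no JAR helps with what is left
        chosen.append(best)
        remaining -= covered
    return chosen, remaining

index = {
    "a.jar": {"x.A", "x.A.f()"},
    "b.jar": {"x.A", "x.A.f()", "y.B"},
    "c.jar": {"y.B"},
}
print(pick_jars({"x.A", "x.A.f()", "y.B"}, index, {}))
```

In the full loop, the chosen JARs would be put on the classpath and the sample compiled again to see which names are still unresolved.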
Scoring
• Existing techniques …
  – Docs modeled as weighted term frequencies
  – Hypertext link analysis (PageRank)
• … do not work well for code, because:
  – JAR files (binary code) provide no context
  – Source code contains few relevant keywords
  – Structure in code important for relevance
Using Implicit References to Improve Scoring
• Assieme exploits structure on Web pages (HTML hyperlinks) and structure in code (code references)
Scoring
• APIs (packages/types/members)
• Web pages
Scoring
APIs
• Use text on doc pages and on pages with code samples that reference API (~ anchor text)
• Weight APIs by # incoming refs (~ PageRank)
Web Pages
• Use fully qualified references (java.util.HashMap) and adjust term weights
• Filter pages by references
• Favor pages with accompanying text
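Two of the ideas above, counting incoming references per API and filtering pages by the APIs they reference, can be illustrated with a small sketch. It assumes that reference resolution has already produced a set of fully qualified names per page; `incoming_refs`, `pages_referencing`, and the page identifiers are hypothetical, and the actual scoring formula is not given in the slides.

```python
from collections import Counter

def incoming_refs(page_refs):
    """Count, for each API name, how many pages reference it."""
    counts = Counter()
    for refs in page_refs.values():
        counts.update(refs)  # each page contributes at most once per name
    return counts

def pages_referencing(page_refs, api):
    """Keep only the pages whose resolved references include the given API."""
    return sorted(url for url, refs in page_refs.items() if api in refs)

page_refs = {
    "p1": {"java.util.HashMap", "java.util.HashMap.clear()"},
    "p2": {"java.util.HashMap"},
    "p3": {"java.io.File"},
}
print(incoming_refs(page_refs).most_common(1))
print(pages_referencing(page_refs, "java.util.HashMap"))
```

Because references are fully qualified, a page that merely mentions the word "HashMap" in prose would not be counted here; only resolved code references contribute.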
Evaluating Code Extraction and Reference Resolution
… on 350 hand-labeled pages from Assieme’s data
Reference Resolution
• Recall 89.6%, Precision 86.5%
• False positives: FishEye and diff pages
• False negatives: incomplete code samples
Code Extraction
• Recall 96.9%, Precision 50.1% (76.7%)
• False positives: C, C#, JavaScript, PHP, FishEye/diff
• (After filtering pages without refs: precision 76.7%)
User Study
Assieme vs. Google vs. Google Code Search
Design
• 40 search tasks based on queries in logs: query “socket java” → task “Write a basic server that communicates using Sockets”
• Find code samples (and required libraries)
• 4 blocks of 10 tasks: 1 for training + 1 per interface
Participants
• 9 (under-)graduate students in Computer Science
User Study – Task Time
[Bar chart: task time in seconds (SEM) for Assieme, Google, and GCS. Reported tests: F(1,258)=5.74, p ≈ .017; F(1,258)=1.91, p ≈ .17. * = significant]
User Study – Solution Quality
Quality scale:
• 0 – seriously flawed
• .5 – generally good but fell short in critical regard
• 1 – fairly complete
[Bar chart: solution quality (SEM) for Assieme, Google, and GCS. Reported tests: F(1,258)=55.5, p < .0001 (*); F(1,258)=6.29, p ≈ .013 (*)]
User Study – # Queries Issued
[Bar chart: # queries issued (SEM) for Assieme, Google, and GCS. Reported tests: F(1,259)=9.77, p ≈ .002 (*); F(1,259)=6.85, p ≈ .001 (*)]
Discussion & Conclusion
• Assieme – a novel web search interface
• Programmers obtain better solutions, using fewer queries, in the same amount of time
• Using Google, subjects visited 3.3 pages/task; using Assieme, only 0.27 pages, but 4.3 previews
• Ability to quickly view code samples changed participants’ strategies
Thank You
Raphael Hoffmann
Computer Science & Engineering
University of Washington
raphaelh@cs.washington.edu
James Fogarty
Computer Science & Engineering
University of Washington
jfogarty@cs.washington.edu
Daniel S. Weld
Computer Science & Engineering
University of Washington
weld@cs.washington.edu
This material is based upon work supported by the National Science Foundation under grant IIS-0307906, by the Office of Naval Research under grant N00014-06-1-0147, SRI International under CALO grant 03-000225 and the Washington Research Foundation / TJ Cable Professorship.