1
CS 178H: Introduction to Computer Science Research
What is CS Research?
What is CS Research?
• Discovery of new knowledge of computing through mathematical analysis and experimental evaluation of algorithms and computer software.
2
Epistemology (definitions from Wikipedia)
• Epistemology (from Greek επιστήμη - episteme, "knowledge" + λόγος, "logos") or theory of knowledge is the branch of philosophy concerned with the nature and scope (limitations) of knowledge. It addresses the questions:
– "What is knowledge?"
– "How is knowledge acquired?"
– "What do people know?"
– "How do we know what we know?"
3
Rationalism
• Rationalism is "any view appealing to reason as a source of knowledge or justification" (Lacey 286). In more technical terms it is a method or a theory "in which the criterion of the truth is not sensory but intellectual and deductive" (Bourke 263).
• Originated with Socrates (469 BC–399 BC) and Plato (428/427 BC – 348/347 BC).
4
Empiricism
• Empiricism is a theory of knowledge which asserts that knowledge arises from experience. Empiricism emphasizes the role of experience and evidence, especially sensory perception, in the formation of ideas.
• Originated with Aristotle (384 BC – 322 BC)
5
Rationalism in CS (Theoretical CS)
• Programs are formal mathematical objects.
• Therefore, important properties of algorithms/software can be proven mathematically.
– Termination
– Correctness (satisfies a formal specification)
– Computational Complexity (time and space requirements)
6
Theoretical CS Research
• Algorithm Design and Analysis
– Design a new (more efficient) algorithm for some well-defined problem (e.g. sorting, longest-common-subsequence)
– Mathematically prove the correctness and improved complexity of the new algorithm.
• Theoretical Analysis
– Form a mathematical conjecture about a computational problem (e.g. graph isomorphism is NP-complete)
– Mathematically prove the conjecture as a theorem.
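As an illustration of a well-defined problem whose algorithm can be analyzed on paper, here is a minimal sketch of the textbook dynamic-programming solution to longest-common-subsequence. The function name `lcs_length` is chosen for this example only. The two nested loops are what make the O(m·n) running-time claim provable.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b.

    Runs in O(len(a) * len(b)) time and O(len(b)) extra space.
    """
    # prev[j] holds the LCS length of the processed prefix of a and b[:j].
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]  # LCS with an empty prefix of b is always 0
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Matching characters extend the LCS of both shorter prefixes.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise drop one character from either string.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[len(b)]


example_length = lcs_length("ABCBDAB", "BDCABA")
```

Termination follows from the finite loops, and correctness can be argued by induction on prefix lengths, which is the kind of proof the slide refers to.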
7
Limits of Rationalism in CS
• Sometimes software is too complex to analyze theoretically.
• Sometimes correctness cannot be characterized formally and depends on natural or human behavior.
– Protein folding
– Handwriting/speech recognition
• Sometimes software behavior on real data depends on unknown natural properties of this data.
– Locality affecting paging performance
8
Empiricism in CS (Experimental CS)
• Behavior of software can be studied experimentally.
• Anecdotal evidence (running a few sample cases) is insufficient.
• Collect data (e.g. accuracy, run-time) by running programs many times on large, real-world benchmark collections.
• Verify hypotheses about behavior using controlled experiments.
• Statistically analyze results for significance.
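A minimal sketch of collecting run-time data over many runs rather than a few anecdotal ones. The workload (sorting a reversed list), the helper name `time_runs`, and the repeat count of 30 are all assumptions made for illustration; a real study would use a real-world benchmark collection.

```python
import statistics
import time


def time_runs(func, arg, repeats=30):
    """Run func(arg) `repeats` times and return the wall-clock time of each run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(arg)
        timings.append(time.perf_counter() - start)
    return timings


# Hypothetical workload: sort 50,000 integers given in reverse order.
data = list(range(50_000, 0, -1))
timings = time_runs(sorted, data)

# Report the spread as well as the average, since a single run can be misleading.
mean_seconds = statistics.mean(timings)
stdev_seconds = statistics.stdev(timings)
print(f"mean {mean_seconds:.6f}s, stdev {stdev_seconds:.6f}s over {len(timings)} runs")
```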
9
Scientific Method (steps from Wikipedia)
• 1) Define the question
• 2) Gather information and resources (observe)
• 3) Form hypothesis
• 4) Perform experiment and collect data
• 5) Analyze data
• 6) Interpret data and draw conclusions that serve as a starting point for new hypothesis
• 7) Publish results
• 8) Retest (frequently done by other scientists)
10
1) Define the question
• Example from My Research: Search Query Disambiguation from Short Sessions
– Can a web search engine disambiguate queries?
11
[Figure: search box containing the ambiguous query "scrubs"]
2) Gather information and resources
• Obtained web search session data from Microsoft
• Find instances of ambiguous queries
• Find contextual clues that might help disambiguate queries
12
Context can Aid Disambiguation
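[Figure: two example sessions that end in the ambiguous query "scrubs"]
Session 1: 98.7 fm → www.star987.com; kroq → www.kroq.com; scrubs → ???
Session 2: huntsville hospital → www.huntsvillehospital.com; ebay.com → www.ebay.com; scrubs → ???
Candidate results for "scrubs": scrubs-tv.com, scrubs.com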
3) Form Hypothesis
• Previous queries and clicks in a session can help disambiguate queries by relating them to previous sessions involving the same query (where we know what result was clicked).
14
4) Perform Experiment and Collect Data
• Build system that uses prior context and previous session data to predict clicked results for new user.
• Reorder results from existing search engine based on predicted probability of clicking on a result.
– Should reduce number of results user needs to examine before finding a relevant one.
• Test on unseen data and compare predictions to actual results clicked.
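The reordering and evaluation steps above can be sketched as follows. This is not the system from the talk: the click probabilities are made-up numbers standing in for a model's predictions, and `rerank` and `results_examined` are hypothetical helper names.

```python
from typing import Dict, List


def rerank(results: List[str], click_prob: Dict[str, float]) -> List[str]:
    """Order results by predicted click probability, highest first.

    sorted() is stable, so results with equal (or missing) probabilities
    keep the search engine's original relative order.
    """
    return sorted(results, key=lambda url: -click_prob.get(url, 0.0))


def results_examined(ranking: List[str], clicked: str) -> int:
    """Number of results a user scanning from the top sees up to the clicked one."""
    return ranking.index(clicked) + 1


# Hypothetical engine ordering and predicted probabilities for the query "scrubs".
engine_order = ["scrubs-tv.com", "scrubs.com", "ebay.com"]
predicted = {"scrubs.com": 0.7, "scrubs-tv.com": 0.2}

reranked = rerank(engine_order, predicted)
before = results_examined(engine_order, "scrubs.com")
after = results_examined(reranked, "scrubs.com")
```

If the user in the held-out session actually clicked scrubs.com, the reranked list puts that result one position higher than the engine's ordering did.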
15
Using Relational Information with a Markov Logic Network (MLN)
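[Figure: the current session (queries "huntsville hospital", "ebay", "scrubs"; clicks huntsvillehospital.org, ebay.com, ???) linked to previous sessions that contain the query "scrubs" (other items shown: huntsville school, hospitallink.com, ebay.com; clicked results for "scrubs": scrubs-tv.com and scrubs.com)]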
Controlled Experiment
• Performance of experimental system must be compared to some baseline or control.
• Controls are necessary to demonstrate the system is improving over some naïve method (strawman) or current best system for a problem.
– For example, in the old joke, someone claims that they are snapping their fingers "to keep the tigers away"; and justifies this behavior by saying "see - it's working!" While this "experiment" does not falsify the hypothesis "snapping fingers keeps the tigers away", it does not really support the hypothesis - not snapping your fingers keeps the tigers away just as well (Wikipedia: Experiment)
17
Control for Query Disambiguation
• Simple control is to order results from search engine randomly.
• Another baseline is to just use ordering from existing (non-personalized) search engine.
18
Performance Metrics
• Need quantitative measure of system’s performance (runtime or accuracy).
• Compare quantitative performance of experimental system to baseline control system.
• To measure accuracy of ordering of web search results we measure AUC-ROC
– Percentage of irrelevant results not seen by user before finding a relevant result (if the user scans results from the top)
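A minimal sketch of AUC-ROC for a single ranked result list, computed as the fraction of (relevant, irrelevant) pairs in which the relevant result is ranked higher. With exactly one relevant result this reduces to the description above: the share of irrelevant results ranked below it. The function name and the toy result labels are assumptions for illustration.

```python
from typing import Sequence, Set


def auc_roc(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Fraction of (relevant, irrelevant) pairs ordered correctly in `ranking`."""
    n_rel = sum(1 for r in ranking if r in relevant)
    n_irr = len(ranking) - n_rel
    if n_rel == 0 or n_irr == 0:
        raise ValueError("need at least one relevant and one irrelevant result")

    correct_pairs = 0
    irrelevant_above = 0
    for r in ranking:
        if r in relevant:
            # Every irrelevant result not yet seen is ranked below this one.
            correct_pairs += n_irr - irrelevant_above
        else:
            irrelevant_above += 1
    return correct_pairs / (n_rel * n_irr)


# Toy ranking of four results, top first.
toy_ranking = ["a", "b", "c", "d"]
auc_best = auc_roc(toy_ranking, {"a"})    # relevant result at the top
auc_second = auc_roc(toy_ranking, {"b"})  # one irrelevant result seen first
auc_worst = auc_roc(toy_ranking, {"d"})   # relevant result at the bottom
```

A random ordering scores 0.5 on average under this measure, which is why 0.5 is the natural reference point for the random control.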
19
5) Analyze Data
• Do results support the hypothesis?
• Are differences statistically significant?
– Use a statistical test to determine if observed differences are unlikely to be due only to random variation, i.e. p-value < .05 (the probability of seeing a difference at least this large if the null hypothesis were true).
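The slides do not say which statistical test was used. As one possible choice, here is a sketch of an exact paired permutation (sign-flip) test in plain Python. The per-fold scores are invented for illustration, and exhaustive enumeration is only practical for a small number of pairs, since it visits 2^n sign assignments.

```python
from itertools import product
from typing import Sequence


def paired_permutation_p_value(system: Sequence[float], baseline: Sequence[float]) -> float:
    """Two-sided exact p-value for the mean paired difference.

    Under the null hypothesis the sign of each paired difference is arbitrary,
    so we enumerate every sign assignment and count how often the total
    difference is at least as extreme as the one observed.
    """
    diffs = [s - b for s, b in zip(system, baseline)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(sum(sign * d for sign, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
    return extreme / total


# Hypothetical AUC-ROC scores on six test folds (not results from the talk).
system_scores = [0.56, 0.58, 0.55, 0.57, 0.59, 0.56]
baseline_scores = [0.50, 0.51, 0.49, 0.52, 0.50, 0.51]

p_value = paired_permutation_p_value(system_scores, baseline_scores)
significant = p_value < 0.05
```

Because the system wins on all six folds here, only the all-positive and all-negative sign assignments are as extreme as the observed one, giving p = 2/64.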
20
Results (AUC-ROC)
[Bar chart: AUC-ROC (y-axis from 0.46 to 0.58) for Random, Click-Sim, Click-KW-Sim, MLN1, MLN2, MLN3]
* Indicates statistically significant improvement over previous result
6) Interpret data and draw conclusions that serve as a starting point for new hypothesis
• Is random ordering the best baseline to compare to?
• What if we just order results based on popularity (i.e. how many people clicked on a particular result after submitting a given ambiguous query)?
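A popularity baseline of this kind can be sketched in a few lines. The click log below is invented, and `popularity_ranking` is a hypothetical name; the talk's actual baseline may differ in its details.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def popularity_ranking(
    click_log: Iterable[Tuple[str, str]], query: str, results: List[str]
) -> List[str]:
    """Order `results` by how often each was clicked for `query` in past sessions.

    `click_log` is a sequence of (query, clicked_url) pairs. Results never
    clicked for this query count as zero and keep their original relative order.
    """
    clicks = Counter(url for q, url in click_log if q == query)
    return sorted(results, key=lambda url: -clicks[url])


# Hypothetical log of past (query, clicked result) pairs.
past_clicks = [
    ("scrubs", "scrubs-tv.com"),
    ("scrubs", "scrubs.com"),
    ("scrubs", "scrubs-tv.com"),
    ("ebay", "ebay.com"),
]

baseline_order = popularity_ranking(
    past_clicks, "scrubs", ["scrubs.com", "scrubs-tv.com", "ebay.com"]
)
```

Unlike the random control, this baseline already uses click data, so beating it is a stronger test of whether session context adds anything.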
22
New Baseline Results
23
Refine System
• Develop MLN that incorporates popularity information.
• Rerun experiment to obtain results for revised version and verify the hypothesis that it performs better than the popularity baseline.
24
Results for Revised System
25
7) Publish Results
• Paper submitted to the international data mining conference.
– KDD-09: Paris, June 28 – July 1, 2009
26