(Briefly) Active Learning + Course Recap
Active Learning
• Remember Problem Set 1 Question #1?
  – Part (c) required generating a set of examples that would identify the target concept in the worst case.
  – …we were able to find the correct hypothesis (out of hundreds in H) with only 8 queries! (see the sketch below)
• Logarithmic in |X|
• In general, guaranteeing perfect performance with randomly drawn examples requires a number of queries linear in |X|.
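A minimal sketch (Python; the threshold concept class here is a stand-in, not the problem set's actual concept) of why actively chosen membership queries can succeed with logarithmically many queries: binary search over X = {0, …, 255} identifies one of 256 threshold hypotheses with at most 8 queries.

```python
# Actively choosing queries to identify a threshold concept
# h_t(x) = (x >= t) over X = {0, ..., 255}. Binary search pins down t
# with log2(|X|) membership queries; randomly drawn examples can need
# on the order of |X| queries in the worst case.

def identify_threshold(oracle, lo, hi):
    """Find the smallest x with oracle(x) == True via binary-search queries."""
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1
        if oracle(mid):          # membership query: is mid a positive example?
            hi = mid
        else:
            lo = mid + 1
    return lo, queries

target = 137
threshold, n = identify_threshold(lambda x: x >= target, 0, 255)
print(threshold, n)   # 137, found with 8 queries: logarithmic in |X| = 256
```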
Active Learning (2)
• Interesting challenge: choosing which examples are most informative
• Increasingly important: problems are huge and on-demand labelers are available
  – “Volunteer armies”: ESP game, Wikipedia
  – Mechanical Turk
• Key question: How to identify the most informative queries?
  – Both a technical question & a human-interfaces question
Recap
A Few Quotes
• “A breakthrough in machine learning would be worth ten Microsofts” (Bill Gates, Chairman, Microsoft)
• “Machine learning is the next Internet” (Tony Tether, Director, DARPA)
• “Machine learning is the hot new thing” (John Hennessy, President, Stanford)
• “Web rankings today are mostly a matter of machine learning” (Prabhakar Raghavan, Dir. Research, Yahoo)
• “Machine learning is going to result in a real revolution” (Greg Papadopoulos, CTO, Sun)
• “Machine learning is today’s discontinuity” (Jerry Yang, CEO, Yahoo)
Magic?
No, more like gardening
• Seeds = Algorithms
• Nutrients = Data
• Gardener = You
• Plants = Programs
Types of Learning
• Supervised (inductive) learning
  – Training data includes desired outputs
• Unsupervised learning
  – Training data does not include desired outputs
• Reinforcement learning
  – Rewards from a sequence of actions
• Semi-supervised learning
  – Training data includes a few desired outputs
Supervised Learning
GIVEN:
• Instances X
  – E.g., days described by attributes: Sky, Temp, Humidity, Wind, Water, Forecast
• Hypothesis space H
  – E.g., MC2, conjunction of literals: <Sunny ? ? Strong ? Same>
• Training examples D
  – Positive and negative examples of the target function c: <x1, c(x1)>, …, <xn, c(xn)>
FIND:
• A hypothesis h in H such that h(x) = c(x) for all x in D (a consistency-check sketch follows)
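A minimal sketch of the FIND condition (the tuple encoding and '?' = “any value” convention are assumptions for illustration): check that a conjunctive hypothesis agrees with the label on every training example.

```python
# Does conjunctive hypothesis h ('?' matches any value) label x positive?
def matches(h, x):
    return all(hv in ('?', xv) for hv, xv in zip(h, x))

# Is h consistent with every labeled example in D?
def consistent(h, D):
    """D: list of (instance, label) pairs with label True/False."""
    return all(matches(h, x) == label for x, label in D)

h = ('Sunny', '?', '?', 'Strong', '?', 'Same')
D = [(('Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Same'), True),
     (('Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change'), False)]
print(consistent(h, D))   # True
```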
Supervised Learning Algorithms
• Candidate-Elimination
x1 = <Sunny, Warm, High, Strong, Cool, Same>
x2 = <Sunny, Warm, High, Light, Warm, Same>
h1 = <Sunny, ?, ?, Strong, ?, ?>
h2 = <Sunny, ?, ?, ?, ?, ?>
h3 = <Sunny, ?, ?, ?, Cool, ?>
[Figure: instances x1, x2 on the left; hypotheses h1, h2, h3 on the right, arranged in the partial ordering from specific to general (h2 is more general than both h1 and h3)]
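A minimal sketch of the specific-boundary half of Candidate-Elimination (the G-boundary update is omitted for brevity): starting from the first positive example, S is minimally generalized to cover each new positive.

```python
# Minimally generalize hypothesis h so that it covers positive example x:
# keep attributes that agree, replace disagreements with '?'.
def generalize(h, x):
    return tuple(hv if hv == xv else '?' for hv, xv in zip(h, x))

x1 = ('Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Same')
x2 = ('Sunny', 'Warm', 'High', 'Light', 'Warm', 'Same')

s = x1                   # S starts at the first positive example
s = generalize(s, x2)    # generalize just enough to cover the second
print(s)                 # ('Sunny', 'Warm', 'High', '?', '?', 'Same')
```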
Decision Trees
• Learn a disjunction of conjunctions by greedily splitting on the “best” attribute values (an information-gain sketch follows)
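A minimal sketch (toy data; the attribute encoding is made up) of the greedy choice: compute the information gain of each attribute and split on the one that reduces label entropy the most.

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction from splitting on column `attr`."""
    n = len(labels)
    gain = entropy(labels)
    for value in set(row[attr] for row in rows):
        subset = [l for row, l in zip(rows, labels) if row[attr] == value]
        gain -= len(subset) / n * entropy(subset)
    return gain

rows = [('Sunny', 'Strong'), ('Sunny', 'Light'), ('Rainy', 'Strong')]
labels = ['Yes', 'Yes', 'No']
best = max(range(2), key=lambda a: information_gain(rows, labels, a))
print(best)   # 0: splitting on the Sky column separates the labels perfectly
```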
Rule Learning
• Greedily learn rules to cover the examples (a sequential-covering sketch follows below)
• Can also be applied to learn first-order rules
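A minimal sequential-covering sketch (toy data; rules are restricted to single attribute = value tests for brevity): grow the rule that covers the most remaining positives and no negatives, remove what it covers, and repeat.

```python
def covers(rule, row):
    attr, value = rule
    return row[attr] == value

def learn_rules(rows, labels):
    rules = []
    remaining = list(zip(rows, labels))
    while any(l == '+' for _, l in remaining):
        # candidate tests drawn from the remaining positive examples
        candidates = [(a, r[a]) for r, l in remaining if l == '+'
                      for a in range(len(r))]
        # keep only tests that cover no remaining negatives
        pure = [c for c in candidates
                if not any(covers(c, r) for r, l in remaining if l == '-')]
        if not pure:
            break
        # greedily pick the test covering the most remaining positives
        rule = max(pure, key=lambda c: sum(
            covers(c, r) for r, l in remaining if l == '+'))
        rules.append(rule)
        remaining = [(r, l) for r, l in remaining if not covers(rule, r)]
    return rules

rows = [('Sunny', 'Warm'), ('Sunny', 'Cold'), ('Rainy', 'Warm')]
labels = ['+', '+', '-']
print(learn_rules(rows, labels))   # [(0, 'Sunny')] -> "IF Sky=Sunny THEN +"
```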
Neural Networks
• Non-linear regression/classification technique
• Especially useful when inputs/outputs are numeric
• Long training times, quick testing times
[Figure: a small feed-forward network; inputs Age = 34, Gender = 2, Stage = 4 feed through weighted edges (.6, .5, .8, …) to hidden units and on to the output “Probability of being Alive” = 0.6]
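A minimal sketch of the forward pass such a network computes. Only the figure's inputs come from the slide; the weights below are made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    """One pass: weighted sums through sigmoid hidden units, then output."""
    hidden = [sigmoid(sum(wi * xi for wi, xi in zip(w, x))) for w in w_hidden]
    return sigmoid(sum(wi * hi for wi, hi in zip(w_out, hidden)))

x = [34, 2, 4]                                   # Age, Gender, Stage
w_hidden = [[0.06, 0.05, 0.08], [0.02, 0.01, 0.03]]   # hypothetical weights
w_out = [0.7, 0.2]
print(forward(x, w_hidden, w_out))   # a value in (0, 1), ~0.69 here:
                                     # the "probability of being alive"
```

Training adjusts the weights (e.g., by backpropagation), which is where the long training times come from; a trained pass like this one is cheap.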
Instance Based Methods
• E.g., k-nearest neighbor
• Quick training times, long test times
• The “curse of dimensionality”
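A minimal k-nearest-neighbor sketch: training is just storing the examples, and all the work happens at query time, which is exactly the quick-train/slow-test trade-off above.

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """train: list of (vector, label). Majority vote among the k nearest."""
    nearest = sorted(train, key=lambda vl: math.dist(vl[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train = [((0, 0), 'A'), ((0, 1), 'A'), ((5, 5), 'B'), ((6, 5), 'B')]
print(knn_predict(train, (1, 1), k=3))   # 'A'
```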
Support Vector Machines (1)
• Derived Feature Spaces (the Kernel Trick):
[Figure: points that are not linearly separable in the input space become linearly separable in the derived feature space under the mapping Φ]
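A minimal sketch of why the trick works: for 2-D inputs, the quadratic kernel K(x, z) = (x · z)² equals an ordinary dot product in the derived feature space φ(x) = (x1², √2·x1·x2, x2²), so φ never has to be computed explicitly.

```python
import math

def phi(x):
    """Explicit derived feature map, shown only to verify the identity."""
    return (x[0]**2, math.sqrt(2) * x[0] * x[1], x[1]**2)

def kernel(x, z):
    """Quadratic kernel: works directly on input-space vectors."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

x, z = (1.0, 2.0), (3.0, 0.5)
explicit = sum(a * b for a, b in zip(phi(x), phi(z)))
print(kernel(x, z), explicit)   # both 16.0
```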
Support Vector Machines (2)
• Maximizing Margin:
[Figure: Class 1 and Class 2 point clouds separated by the maximum-margin hyperplane]
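A minimal usage sketch, assuming scikit-learn is installed (a large C approximates a hard margin on separable data):

```python
from sklearn.svm import SVC

X = [[0, 0], [1, 1], [4, 4], [5, 5]]
y = [1, 1, 2, 2]                        # Class 1 vs. Class 2
clf = SVC(kernel='linear', C=1e6)       # large C ~ hard-margin behavior
clf.fit(X, y)
print(clf.support_vectors_)             # the margin-defining points
print(clf.predict([[2, 2], [4.5, 4.5]]))   # [1 2]
```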
Bayes Nets (1)
Qualitative part: a directed acyclic graph (DAG)
• Nodes – random variables
• Edges – direct influence
Quantitative part: a set of conditional probability distributions
[Figure: the burglary network. Burglary and Earthquake are the parents Pa of Alarm; Alarm is the parent of JohnCalls and MaryCalls]

Conditional probability table for Alarm, given parents B (Burglary) and E (Earthquake):

  B    E     P(A | B,E)   P(¬A | B,E)
  b    e       0.95          0.05
  b    ¬e      0.94          0.06
  ¬b   e       0.29          0.71
  ¬b   ¬e      0.001         0.999
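A minimal sketch of using the quantitative part: encode the Alarm CPT above and marginalize out B and E to get P(A). The priors on Burglary and Earthquake are assumptions for illustration only; the slide shows just the conditional table.

```python
p_alarm = {  # P(A = true | B, E), from the table above
    (True,  True):  0.95,
    (True,  False): 0.94,
    (False, True):  0.29,
    (False, False): 0.001,
}
p_b, p_e = 0.001, 0.002   # hypothetical priors P(B), P(E)

# P(A) = sum over b, e of P(A | b, e) * P(b) * P(e)
p_a = sum(p_alarm[b, e]
          * (p_b if b else 1 - p_b)
          * (p_e if e else 1 - p_e)
          for b in (True, False) for e in (True, False))
print(p_a)   # ~0.0025
```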
Bayes Nets (2)
• Flexible modeling approach
  – Used for supervised, semi-supervised, and unsupervised learning
• Natural for explicitly encoding prior knowledge
Hidden Markov Models
• Special case of Bayes Nets for sequential data
• Admit efficient learning and decoding algorithms
[Figure: hidden states t_i → t_i+1 → t_i+2 → t_i+3 (unobserved) each emit an observed word w_i … w_i+3, e.g. “cities such as Seattle”]
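A minimal Viterbi-decoding sketch (all probabilities below are made up) recovering the most likely hidden state sequence for the observed words in the figure:

```python
def viterbi(words, states, start_p, trans_p, emit_p):
    """Return the most probable hidden state path for the word sequence."""
    V = [{s: (start_p[s] * emit_p[s].get(words[0], 1e-9), [s])
          for s in states}]
    for w in words[1:]:
        V.append({})
        for s in states:
            # best (probability, path) extending any previous state to s
            prob, path = max(
                (V[-2][prev][0] * trans_p[prev][s] * emit_p[s].get(w, 1e-9),
                 V[-2][prev][1] + [s])
                for prev in states)
            V[-1][s] = (prob, path)
    return max(V[-1].values())[1]

states = ('OTHER', 'CITY')
start_p = {'OTHER': 0.8, 'CITY': 0.2}
trans_p = {'OTHER': {'OTHER': 0.7, 'CITY': 0.3},
           'CITY':  {'OTHER': 0.6, 'CITY': 0.4}}
emit_p = {'OTHER': {'cities': 0.3, 'such': 0.3, 'as': 0.3, 'Seattle': 0.01},
          'CITY':  {'Seattle': 0.9}}
print(viterbi('cities such as Seattle'.split(),
              states, start_p, trans_p, emit_p))
# ['OTHER', 'OTHER', 'OTHER', 'CITY']
```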
Computational Learning Theory
• Based on the data we’ve observed, what can we guarantee?
• “Probably Approximately Correct” learning
• Extension to continuous inputs: VC dimension
• Sample complexity bound for a consistent learner:

  m ≥ (1/ε) (ln |H| + ln (1/δ))
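A minimal sketch plugging numbers into the bound; |H| = 973 here is just an example of a hypothesis space with hundreds of conjunctive hypotheses.

```python
from math import ceil, log

def pac_sample_size(h_size, epsilon, delta):
    """Examples sufficient for a consistent learner to be probably
    (prob. 1 - delta) approximately (error <= epsilon) correct."""
    return ceil((log(h_size) + log(1 / delta)) / epsilon)

print(pac_sample_size(973, 0.05, 0.05))   # 198
```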
Optimization Techniques
• Local search
  – Hill climbing (sketched below), simulated annealing
• Genetic algorithms
  – Key innovation: crossover
  – Also applied to programs (genetic programming)
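A minimal hill-climbing sketch on a toy one-dimensional objective: move to the best neighbor until no neighbor improves.

```python
def hill_climb(start, objective, neighbors, max_steps=1000):
    x = start
    for _ in range(max_steps):
        best = max(neighbors(x), key=objective, default=x)
        if objective(best) <= objective(x):
            return x              # no neighbor improves: local optimum
        x = best
    return x

objective = lambda x: -(x - 7) ** 2       # single peak at x = 7
neighbors = lambda x: [x - 1, x + 1]
print(hill_climb(0, objective, neighbors))   # 7
```

Simulated annealing differs only in sometimes accepting downhill moves, which helps escape local optima.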
Unsupervised Learning
• K-means
• Hidden Markov Models
• Both use the same general algorithm… Expectation-Maximization (a k-means sketch follows)
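A minimal k-means sketch showing the EM pattern on 1-D data: the E-step assigns each point to its nearest center, the M-step moves each center to the mean of its cluster.

```python
def kmeans(points, centers, iters=10):
    for _ in range(iters):
        # E-step: assign each point to its nearest center
        clusters = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[i].append(p)
        # M-step: move each center to the mean of its cluster
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

points = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
print(kmeans(points, centers=[0.0, 5.0]))   # ~[1.0, 9.0]
```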
Key Lessons (1)
• You can’t learn without inductive bias

From the Wired article assigned the 1st week:

  “Today companies like Google, which have grown up in an era of massively abundant data, don’t have to settle for wrong models. Indeed, they don’t have to settle for models at all.”

What do you think?
Key Lessons (2)
• Overfitting
  – Can’t just choose the “most powerful” model
• Choose the “right” model
  – One that encodes your understanding of the domain and meets your other requirements
  – E.g.:
    • HMMs vs. decision trees for sequential data
    • Decision trees vs. NNs for mushrooms
    • NNs vs. decision trees for face recognition
Course Advertisement
• EECS 395/495, Spring Quarter 2009: “Web Information Retrieval and Extraction”
  – Basics of Web search and extraction
  – New research & future directions
  – Discussion- and project-based
Thanks!