Post on 15-Aug-2015
Thomas G. Dietterich
Department of Computer Science
Oregon State University
Corvallis, Oregon 97331
http://www.cs.orst.edu/~tgd
Machine Learning: Making Computer Science Scientific
Acknowledgements
VLSI Wafer Testing: Tony Fountain
Robot Navigation: Didac Busquets, Carles Sierra, Ramon Lopez de Mantaras
NSF grants IIS-0083292 and ITR-085836
Outline
Three scenarios where standard software engineering methods fail
Machine learning methods applied to these scenarios
Fundamental questions in machine learning
Statistical thinking in computer science
Scenario 1: Reading Checks
Find and read “courtesy amount” on checks:
Possible Methods:
Method 1: Interview humans to find out what steps they follow in reading checks
Method 2: Collect examples of checks and the correct amounts. Train a machine learning system to recognize the amounts
Scenario 2: VLSI Wafer Testing
Wafer test: Functional test of each die (chip) while on the wafer
Which Chips (and how many) should be tested?
Tradeoff:
Test all chips on the wafer?
Avoid cost of packaging bad chips
Incur cost of testing all chips
Test none of the chips on the wafer?
May package some bad chips
No cost of testing on wafer
Possible Methods
Method 1: Guess the right tradeoff point
Method 2: Learn a probabilistic model that captures the probability that each chip will be bad; plug this model into a Bayesian decision-making procedure to optimize expected profit
Scenario 3: Allocating mobile robot camera
Binocular
No GPS
Camera tradeoff
Mobile robot uses camera both for obstacle avoidance and landmark-based navigation
Tradeoff:
If the camera is used only for navigation, the robot collides with objects
If the camera is used only for obstacle avoidance, the robot gets lost
Possible Methods
Method 1: Manually write a program to allocate the camera
Method 2: Experimentally learn a policy for switching between obstacle avoidance and landmark tracking
Challenges for SE Methodology
Standard SE methods fail when…
1) System requirements are hard to collect
2) The system must resolve difficult tradeoffs
(1) System requirements are hard to collect
There are no human experts: cellular telephone fraud
Human experts are inarticulate: handwriting recognition
The requirements are changing rapidly: computer intrusion detection
Each user has different requirements: e-mail filtering
(2) The system must resolve difficult tradeoffs
VLSI wafer testing: the tradeoff point depends on the probability of bad chips and the relative costs of testing versus packaging
Camera allocation for mobile robot: the tradeoff depends on the probability of obstacles and the number and quality of landmarks
Machine Learning: Replacing guesswork with data
In all of these cases, the standard SE methodology requires engineers to make guesses:
Guessing how to do character recognition
Guessing the tradeoff point for wafer test
Guessing the tradeoff for camera allocation
Machine Learning provides a way of making these decisions based on data
Outline
Three scenarios where software engineering methods fail
Machine learning methods applied to these scenarios
Fundamental questions in machine learning
Statistical thinking in computer science
Basic Machine Learning Methods
Supervised Learning
Density Estimation
Reinforcement Learning
Supervised Learning
[Figure: handwritten digit images (8, 3, 6, 0, 1) serve as Training Examples → Learning Algorithm → Classifier; a new example image is classified as "8"]
AT&T/NCR Check Reading System
Recognition transformer is a neural network trained on 500,000 examples of characters
The entire system is trained given entire checks as input and dollar amounts as output
LeCun, Bottou, Bengio & Haffner (1998) Gradient-Based Learning Applied to Document Recognition
Check Reader Performance
82% of machine-printed checks correctly recognized
1% of checks incorrectly recognized
17% "rejected" – the check is presented to a person for manual reading
Fielded by NCR in June 1996; reads millions of checks per month
Supervised Learning Summary
Desired classifier is a function y = f(x)
Training examples are desired input-output pairs (xi, yi)
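This input-output formulation can be sketched in a few lines. A 1-nearest-neighbor rule stands in here for the learned f; it is illustrative only, not the neural network the check reader actually uses, and the data points are made up:

```python
# Minimal sketch of supervised learning: fit a classifier y = f(x) from
# labeled (x_i, y_i) training pairs. A 1-nearest-neighbor rule is used
# as a stand-in for a trained model.
def train(examples):
    """examples: list of (x, y) pairs; returns a classifier f(x) -> y."""
    def classify(x):
        # Predict the label of the closest training input.
        nearest = min(examples, key=lambda ex: abs(ex[0] - x))
        return nearest[1]
    return classify

f = train([(1.0, "low"), (2.0, "low"), (8.0, "high"), (9.0, "high")])
print(f(1.5))  # a point near the "low" cluster -> "low"
```

The key point of the slide survives even in this toy: the engineer supplies examples, and the algorithm produces the function.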
Density Estimation
[Figure: Training Examples → Learning Algorithm → Density Estimator; applied to a partially-tested wafer, the estimator outputs, e.g., P(chipi is bad) = 0.42]
On-Wafer Testing System
Trained the density estimator on 600 wafers from a mature product (HP; Corvallis, OR)
Probability model is a "naïve Bayes" mixture model with four components (trained with EM)
[Figure: naïve Bayes network with wafer node W and chip nodes C1, C2, C3, …, C203]
One-Step Value of Information
Choose the larger of:
Expected profit if we predict remaining chips, package, and re-test
Expected profit if we test chip Ci, then predict remaining chips, package, and re-test [for all Ci not yet tested]
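The core test-versus-package comparison can be sketched as follows. All prices and probabilities are illustrative assumptions, and the sketch collapses the "predict remaining chips" step to a single chip; the real system would re-run the density model after each test:

```python
# Sketch of the one-step value-of-information rule: compare expected profit
# of packaging a chip untested versus paying to test it first.
# GOOD_VALUE, PACKAGE_COST, and TEST_COST are made-up numbers.
GOOD_VALUE = 10.0    # revenue from packaging a good chip
PACKAGE_COST = 1.0   # cost of packaging any chip (wasted if it is bad)
TEST_COST = 0.5      # cost of testing a chip on the wafer

def profit_if_package(p_bad):
    # Expected profit from packaging the chip without testing it.
    return (1 - p_bad) * GOOD_VALUE - PACKAGE_COST

def profit_if_test(p_bad):
    # Pay TEST_COST now; package only if the chip turns out good.
    return -TEST_COST + (1 - p_bad) * (GOOD_VALUE - PACKAGE_COST)

def decide(p_bad):
    return "test" if profit_if_test(p_bad) > profit_if_package(p_bad) else "package"

print(decide(0.02))  # nearly certain good: just package -> "package"
print(decide(0.6))   # risky chip: testing first pays off -> "test"
```

The decision flips as the estimated probability of a bad chip rises, which is exactly why the tradeoff point cannot be guessed once and for all.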
On-Wafer Chip Test Results
[Chart: Profit ($K), roughly $1,160–$1,230 — "Test all" versus "VOI testing"; VOI testing gives a 3.8% increase in profit]
Density Estimation Summary
Desired output is a joint probability distribution P(C1, C2, …, C203)
Training examples are points X= (C1, C2, …, C203) sampled from this distribution
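A minimal sketch of such a mixture density estimator, with two hand-set components rather than the four fitted by EM in the real system; all parameters are made up for illustration:

```python
# Hedged sketch of a naive-Bayes mixture density estimator: each wafer
# belongs to a hidden component k (say, a "good lot" or a "bad lot"),
# and chip outcomes are conditionally independent given k.
prior = {"good_lot": 0.7, "bad_lot": 0.3}    # P(k), illustrative
p_bad = {"good_lot": 0.05, "bad_lot": 0.40}  # P(chip bad | k), illustrative

def posterior_bad(tested):
    """tested: list of booleans (True = chip was bad) observed so far.
    Returns P(next chip is bad | tested outcomes)."""
    # P(k | evidence) is proportional to P(k) * prod_i P(c_i | k)
    weights = {}
    for k in prior:
        w = prior[k]
        for bad in tested:
            w *= p_bad[k] if bad else (1 - p_bad[k])
        weights[k] = w
    z = sum(weights.values())
    return sum(weights[k] / z * p_bad[k] for k in prior)

print(round(posterior_bad([]), 3))            # prior mixture: 0.155
print(round(posterior_bad([True, True]), 3))  # two bad chips raise it: 0.388
```

Each on-wafer test updates the posterior over the hidden component, so evidence from a few chips sharpens the predictions for all the untested ones.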
Reinforcement Learning
[Figure: agent–environment loop — the agent observes state s and reward r from the environment and sends back action a]
Agent’s goal: Choose actions to maximize total reward
Action selection rule is called a "policy": a = π(s)
Reinforcement Learning for Robot Navigation
Learning from rewards and punishments in the environment:
Give reward for reaching the goal
Give punishment for getting lost
Give punishment for collisions
Experimental results: % of trials in which the robot reaches the goal
Busquets, Lopez de Mantaras, Sierra, Dietterich (2002)
Reinforcement Learning Summary
Desired output is an action selection policy
Training examples are <s,a,r,s’> tuples collected by the agent interacting with the environment
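The talk does not name the learning algorithm; tabular Q-learning is one standard way to turn such <s,a,r,s'> tuples into a policy. The states, actions, and rewards below are illustrative, not the robot's actual representation:

```python
# Sketch of tabular Q-learning from <s, a, r, s'> experience tuples.
ALPHA, GAMMA = 0.5, 0.9
Q = {}  # Q[(state, action)] -> estimated long-run reward

def update(s, a, r, s_next, actions):
    """One Q-learning update from a single experience tuple."""
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + ALPHA * (r + GAMMA * best_next - old)

def policy(s, actions):
    """Greedy policy a = pi(s): pick the action with the highest Q-value."""
    return max(actions, key=lambda a: Q.get((s, a), 0.0))

actions = ["track_landmark", "avoid_obstacle"]
# Reward for staying clear of an obstacle; punishment for a collision.
update("near_obstacle", "avoid_obstacle", +1.0, "clear", actions)
update("near_obstacle", "track_landmark", -1.0, "collision", actions)
print(policy("near_obstacle", actions))  # -> avoid_obstacle
```

After enough experience the greedy policy encodes the camera-allocation tradeoff directly, with no hand-written switching rule.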
Outline
Three scenarios where software engineering methods fail
Machine learning methods applied to these scenarios
Fundamental questions in machine learning
Statistical thinking in computer science
Fundamental Issues in Machine Learning
Incorporating prior knowledge
Incorporating learned structures into larger systems
Making reinforcement learning practical
Triple tradeoff: accuracy, sample size, hypothesis complexity
Incorporating Prior Knowledge
How can we incorporate our prior knowledge into the learning algorithm?
Difficult for decision trees, neural networks, support-vector machines, etc.: there is a mismatch between the form of our knowledge and the way these algorithms work
Easier for Bayesian networks: express knowledge as constraints on the network
Incorporating Learned Structures into Larger Systems
Success story: Digit recognizer incorporated into check reader
Challenges:
The larger system may make several coordinated decisions, but the learning system treated each decision as independent
The larger system may have a complex cost function: errors in the thousands place matter far more than errors in the cents place of $7,236.07
Making Reinforcement Learning Practical
Current reinforcement learning methods do not scale well to large problems
Need robust reinforcement learning methodologies
The Triple Tradeoff
Fundamental relationship between:
amount of training data
size and complexity of the hypothesis space
accuracy of the learned hypothesis
Explains many phenomena observed in machine learning systems
Learning Algorithms
Set of data points
Class H of hypotheses
Optimization problem: find the hypothesis h in H that best fits the data
[Figure: Training Data → learning algorithm selects a hypothesis h from the Hypothesis Space]
Triple Tradeoff
Amount of Data – Hypothesis Complexity – Accuracy
[Chart: Accuracy versus Hypothesis Space Complexity, with curves for N = 10, N = 100, and N = 1000]
Triple Tradeoff (2)
[Chart: Accuracy versus number of training examples N, with curves for hypothesis spaces H1, H2, H3 of increasing complexity]
Intuition
With only a small amount of data, we can only discriminate between a small number of different hypotheses
As we get more data, we have more evidence, so we can consider more alternative hypotheses
Complex hypotheses give a better fit to the data, but with too little data we cannot tell a genuinely better complex hypothesis from one that merely fits the noise
Fixed versus Variable-Sized Hypothesis Spaces
Fixed size Ordinary linear regression Bayes net with fixed structure Neural networks
Variable size Decision trees Bayes nets with variable structure Support vector machines
Corollary 1: Fixed H will underfit
[Chart: Accuracy versus number of training examples N, with curves for fixed hypothesis spaces H1 and H2; H2 plateaus and underfits]
Corollary 2: Variable-sized H will overfit
[Chart: Accuracy versus Hypothesis Space Complexity for N = 100; accuracy falls at high complexity as the hypothesis overfits]
Ideal Learning Algorithm: Adapt complexity to data
[Chart: Accuracy versus Hypothesis Space Complexity for N = 10, N = 100, and N = 1000; the ideal algorithm picks the complexity at each curve's peak]
Adapting Hypothesis Complexity to Data Complexity
Find hypothesis h to minimize error(h) + λ · complexity(h)
Many methods for adjusting λ: cross-validation, MDL
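A toy sketch of the penalized-fit idea, choosing between a constant and a linear hypothesis class; the complexity measure (parameter count) and the penalty weight are illustrative assumptions:

```python
# Sketch of adapting hypothesis complexity to the data: score each
# hypothesis class by error(h) + lam * complexity(h) and keep the best.
def fit_constant(xs, ys):
    c = sum(ys) / len(ys)
    return lambda x: c

def fit_line(xs, ys):
    # Closed-form least-squares line through the data.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return lambda x: my + slope * (x - mx)

def mse(h, xs, ys):
    return sum((h(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def select(xs, ys, lam=0.01):
    # (name, hypothesis, complexity = number of parameters)
    candidates = [("constant", fit_constant(xs, ys), 1),
                  ("line", fit_line(xs, ys), 2)]
    return min(candidates, key=lambda c: mse(c[1], xs, ys) + lam * c[2])[0]

xs = [0, 1, 2, 3, 4]
print(select(xs, [2, 2, 2, 2, 2]))  # flat data: "constant" wins
print(select(xs, [0, 1, 2, 3, 4]))  # linear trend: "line" wins
```

In practice the penalty weight itself is tuned by the methods the slide lists, such as cross-validation.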
Outline
Three scenarios where software engineering methods fail
Machine learning methods applied to these scenarios
Fundamental questions in machine learning
Statistical thinking in computer science
The Data Explosion
NASA data: 284 terabytes (as of August 1999)
Earth Observing System: 194 GB/day
Landsat 7: 150 GB/day
Hubble Space Telescope: 0.6 GB/day
http://spsosun.gsfc.nasa.gov/eosinfo/EOSDIS_Site/index.html
The Data Explosion (2)
Google indexes 2,073,418,204 web pages
US Year 2000 Census: 62 Terabytes of scanned images
Walmart Data Warehouse: 7 (500?) Terabytes
Missouri Botanical Garden TROPICOS plant image database: 700 Gbytes
Old Computer Science Conception of Data
[Diagram: Store → Retrieve]
New Computer Science Conception of Data
[Diagram: Store → Build Models → Solve Problems; Problems in, Solutions out]
Machine Learning:Making Data Active
Methods for building models from data
Methods for collecting and/or sampling data
Methods for evaluating and validating learned models
Methods for reasoning and decision-making with learned models
Theoretical analyses
Machine Learning andComputer Science
Natural language processing
Databases and data mining
Computer architecture
Compilers
Computer graphics
Hardware Branch Prediction
Source: Jiménez & Lin (2000) Perceptron Learning for Predicting the Behavior of Conditional Branches
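The perceptron predictor of Jiménez & Lin can be sketched roughly as follows; the predictor table is collapsed to a single perceptron here, and the history length and training threshold are illustrative, not the paper's values:

```python
# Sketch of a perceptron branch predictor: predict taken/not-taken from
# a global history of recent branch outcomes (+1 taken, -1 not taken).
HISTORY_LEN = 8
weights = [0] * (HISTORY_LEN + 1)  # weights[0] is the bias
THRESHOLD = 12                      # train only while confidence is low

def predict(history):
    """Dot product of weights and history; >= 0 means predict taken."""
    return weights[0] + sum(w * h for w, h in zip(weights[1:], history))

def train(history, taken):
    t = 1 if taken else -1
    y = predict(history)
    # Update on a misprediction or when the output is not yet confident.
    if (y >= 0) != taken or abs(y) <= THRESHOLD:
        weights[0] += t
        for i, h in enumerate(history):
            weights[i + 1] += t * h

# A branch correlated with recent history becomes predictable
# after a few training updates.
hist = [1] * HISTORY_LEN
for _ in range(20):
    train(hist, True)
print(predict(hist) >= 0)  # True: predicted taken
```

The appeal for hardware is that the dot product and update are cheap enough to implement in the pipeline, while capturing longer history correlations than two-bit counters.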
Instruction Scheduler for New CPU
The performance of modern microprocessors depends on the order in which instructions are executed
Modern compilers rearrange instruction order to optimize performance (“instruction scheduling”)
Each new CPU design requires modifying the instruction scheduler
Instruction Scheduling
Moss et al. (1997): a machine learning scheduler can beat the performance of commercial compilers and match the performance of a research compiler
Training examples: small basic blocks
Experimentally determine the optimal instruction order
Learn a preference function
Computer Graphics: Video Textures
Generate new video by splicing together short stretches of old video
[Figure: original frame sequence A B C D E F is respliced into B D E D E F A]
Apply reinforcement learning to identify good transition points
Arno Schödl, Richard Szeliski, David H. Salesin, Irfan Essa (SIGGRAPH 2000)
Video Textures
Arno Schödl, Richard Szeliski, David H. Salesin, Irfan Essa (SIGGRAPH 2000)
[Video: Virtual Fish Tank Movie]
Graphics: Image Analogies
[Figure: A : A′ :: B : ? — given the transformation from A to A′, apply the analogous transformation to B]
Hertzmann, Jacobs, Oliver, Curless, Salesin (2000) SIGGRAPH
Learning to Predict Textures
For each pixel q, find p to minimize the Euclidean distance between the neighborhoods (A(p), A′(p)) and (B(q), B′(q)); then assign B′(q) := A′(p)
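The matching step can be sketched on 1-D "images"; real image analogies match multi-scale neighborhood feature vectors rather than the single pixels used here:

```python
# Sketch of the image-analogies transfer step: for each location q in B,
# find the location p in A whose value best matches B(q) (squared
# Euclidean distance), then copy A'(p) into B'(q).
def analogy(A, A_prime, B):
    B_prime = []
    for b in B:
        # Find p minimizing the (here, single-pixel) distance to B(q).
        p = min(range(len(A)), key=lambda i: (A[i] - b) ** 2)
        B_prime.append(A_prime[p])  # B'(q) := A'(p)
    return B_prime

# A maps dark pixels to 0 and bright pixels to 9 in A'; the same
# transformation is transferred to B.
A       = [10, 200]
A_prime = [0, 9]
B       = [205, 12, 190]
print(analogy(A, A_prime, B))  # [9, 0, 9]
```

The learned "filter" is never written down explicitly; it is carried entirely by the example pair (A, A′).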
Image Analogies
[Figure: A : A′ :: B : B′ — the learned analogy applied to a new image]
[Video: Image Analogies Movie]
Summary
Standard Software Engineering methods fail in many application problems
Machine Learning methods can replace guesswork with data to make good design decisions
Machine Learning and Computer Science
Machine Learning is already at the heart of speech recognition and handwriting recognition
Statistical methods are transforming natural language processing (understanding, translation, retrieval)
Statistical methods are creating opportunities in databases, computer graphics, robotics, computer vision, networking, and computer security
Computer Power and Data Power
Data is a new source of power for computer science
Every computer science student should learn the fundamentals of machine learning and statistical thinking
By combining engineered frameworks with models learned from data, we can develop the high-performance systems of the future