COMP 652: Machine Learning - McGill University
TRANSCRIPT
COMP 652 - Lecture 1 1 / 27
COMP 652: Machine Learning
Instructor: Prof. Ted Perkins
E-mail: [email protected]
Web: http://www.mcb.mcgill.ca/~perkins/COMP652 Fall2008/index.html
Today
1. Machine learning: examples and motivation
2. Administrative: syllabus
3. Types of machine learning: supervised, unsupervised, reinforcement
4. Supervised learning intro (with examples)
Categorizing faint objects in a Sky Survey (Usama Fayyad)
- B&W digital images of virtually the entire sky taken at high resolution
- Astronomers could not examine each image in detail and catalogue the objects observed
- Machine learning was used to automate the categorization and cataloguing of ≥ 10⁹ faint objects
Face detection and recognition
How would you write a computer program to:
- Detect faces in a scene?
- Recognize the face of a particular person?
Oncology (Alizadeh et al.)
- Activity levels of all (≈ 25,000) genes were measured in lymphoma patients
- Cluster analysis determined three different subtypes (where only two were known before), having different clinical outcomes
Backgammon (Tesauro)
- Starting with expert knowledge, the TD-Gammon program learned to play backgammon by playing millions of games against itself . . .
- And became (arguably) the best player in the world!
And many more. . .
- Bioinformatics: sequence alignment, analyzing microarray data, information integration, ...
- Computer vision: object recognition, tracking, segmentation, active vision, ...
- Robotics: state estimation, map building, decision making
- Graphics: building realistic simulations
- Speech: recognition, speaker identification
- Financial analysis: option pricing, portfolio allocation
- E-commerce: automated trading agents, data mining, spam, ...
- Medicine: diagnosis, treatment, drug design, ...
- Computer games: building adaptive opponents
- Multimedia: retrieval across diverse databases
What makes a good machine learning problem?
- Problems involving very large datasets
- Problems involving complex relationships between variables
- Problems involving numerical reasoning
- Problems for which expert opinions are not readily available / cost effective / rapid enough
- . . .

Basically, anything that could be done by computer (in principle), but which is hard to program directly.
Types of machine learning problems
We’ll discuss three major types of problems:
1. Supervised learning
- Given data comprising input-output pairs
- Create an output-predictor for new inputs
2. Unsupervised learning
- Given data objects, look for "patterns": clusters, variable relationships, . . .
- Or, "compress" the data in some sense
3. Reinforcement learning
- An AI agent interacts with its environment, receiving rewards and punishments
- It must learn to behave optimally
Machine learning algorithms / representations / topics
- Linear, polynomial, and logistic regression and/or classification
- Artificial neural networks
- Decision and regression trees
- "Nonparametric" or instance-based methods
- Computational learning theory
- Ensemble methods
- Value function approximation
- Flat and hierarchical clustering
- Dimensionality reduction (PCA, ICA, . . . )
Syllabus!
Supervised learning
Supervised learning
- An example: Wisconsin Breast Cancer
- Formalization
- Supervised learning flowchart
- Univariate linear regression
Wisconsin Breast Cancer Data from UC Irvine ML Repository
- Cell samples were taken from tumors in breast cancer patients before surgery, and imaged
- Tumors were excised
- Patients were followed to determine whether or not the cancer recurred, and how long until recurrence or how long they remained disease-free
Cell Nuclei of Fine Needle Aspirate
Steps to solving a supervised learning problem
1. Collect data
2. Decide on inputs and output(s), including encoding
3. . . .
Input features
- Researchers computed 30 different features of the cells' nuclei in the image.
  - Features relate to radius, "texture", area, smoothness, concavity, etc. of the nuclei
  - For each image, the mean, standard error, and max of these properties across nuclei
- The result is a data table:
| tumor size | texture | perimeter | . . . | outcome | time |
|------------|---------|-----------|-------|---------|------|
| 18.02      | 27.6    | 117.5     |       | N       | 31   |
| 17.99      | 10.38   | 122.8     |       | N       | 61   |
| 20.29      | 14.34   | 135.1     |       | R       | 27   |
| . . .      |         |           |       |         |      |
Terminology
| tumor size | texture | perimeter | . . . | outcome | time |
|------------|---------|-----------|-------|---------|------|
| 18.02      | 27.6    | 117.5     |       | N       | 31   |
| 17.99      | 10.38   | 122.8     |       | N       | 61   |
| 20.29      | 14.34   | 135.1     |       | R       | 27   |
| . . .      |         |           |       |         |      |
- The columns are called inputs or input variables or features or attributes
- "outcome" and "time" are called outputs or output variables or targets
- A row in the table is called a training example or sample or instance
- The whole table is called the training/data set
- Usually, features are chosen based on some combination of expert knowledge, guesswork, and experimentation
- Sometimes, their choice/definition is also part of the learning problem, called a feature selection or construction problem
More generally
- Typically, a training example i has the form (x_{i,1}, . . . , x_{i,n}, y_i), where n is the number of attributes (32 in our case).
- We will use the notation x_i to denote the column vector with elements x_{i,1}, . . . , x_{i,n}. (These are all the input feature values for one training example.)
- The training set D consists of m training examples
- Let X denote the space of input values (e.g., ℝ^32)
- Let Y denote the space of output values (e.g., {N, R}, or ℝ)
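As a concrete sketch of this notation in Python (a hypothetical mini training set, using three rows and just two of the features from the table above, with "time" as the target):

```python
import numpy as np

# Hypothetical mini training set D with m = 3 examples and n = 2 features
# (tumor size and texture from the table above; "time" as the target y).
D = [(np.array([18.02, 27.6]),  31.0),
     (np.array([17.99, 10.38]), 61.0),
     (np.array([20.29, 14.34]), 27.0)]

x_1, y_1 = D[0]   # first training example: input vector x_1 and target y_1
m = len(D)        # number of training examples
n = len(x_1)      # number of attributes per example
print(m, n, y_1)  # 3 2 31.0
```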
Supervised learning (almost) defined
Given a data set D ⊂ X × Y, find a function

$$h : X \to Y$$

such that h(x) is a "good predictor" for the value of y. h is called a hypothesis.

- If Y = ℝ, this problem is called regression
- If Y is a finite discrete set, the problem is called classification
- If Y has 2 elements, the problem is called binary classification or concept learning
- The hypothesis h comes from a hypothesis class (or space) H of possible solutions.

(Note: Sometimes for classification problems we output the probability of each of the possible outputs.)
Steps to solving a supervised learning problem
1. Collect data
2. Decide on inputs and output(s), including encoding. This determines X and Y.
3. Choose a hypothesis class. This determines H.
4. . . .
An abstract example
[Scatter plot of the (x, y) data]

| x     | y     |
|-------|-------|
| 0.86  | 2.49  |
| 0.09  | 0.83  |
| -0.85 | -0.25 |
| 0.87  | 3.10  |
| -0.44 | 0.87  |
| -0.43 | 0.02  |
| -1.10 | -0.12 |
| 0.40  | 1.81  |
| -0.96 | -0.83 |
| 0.17  | 0.43  |
What hypothesis class do we choose to model how y depends on x?
Linear hypotheses
- Suppose y were a linear function of x:

  $$h_w(x) = w_0 + w_1 x_1 + w_2 x_2 + \ldots + w_n x_n$$

- The w_i are called parameters or weights
- To simplify notation, we always add an attribute x_0 = 1 to the other n attributes (also called the bias term or intercept term):

  $$h_w(x) = \sum_{i=0}^{n} w_i x_i = w^T x$$

  where w and x are vectors of length n + 1.

How should we pick w? No w exactly fits the data. . .
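A minimal sketch of this hypothesis in Python (the function name and the toy numbers are illustrative, not from the slides):

```python
import numpy as np

def h(w, x):
    """Linear hypothesis h_w(x) = w^T x, where x already includes x_0 = 1."""
    return w @ x

# Toy example with n = 2 input features:
x = np.array([1.0, 0.5, -2.0])   # x_0 = 1 prepended as the bias attribute
w = np.array([0.1, 2.0, 0.3])    # weights w_0, w_1, w_2
print(h(w, x))                   # 0.1*1 + 2.0*0.5 + 0.3*(-2.0) = 0.5
```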
Error minimization!
- Intuitively, w should make the predictions of h_w close to the true values y on the data we have
- Hence, we will define an error function or cost function to measure how much our prediction differs from the "true" answer
- We will pick w such that the error function is minimized
What error function should we choose?
Least mean squares (LMS)
- Main idea: try to make h_w(x) close to y on the examples in the training set
- We define a sum-of-squares error function:

  $$J(w) = \frac{1}{2} \sum_{i=1}^{m} \left( h_w(x_i) - y_i \right)^2$$

- We will choose w so as to minimize J(w)
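A sketch of this error function in Python, vectorized over the whole training set (the toy data is made up for illustration):

```python
import numpy as np

def J(w, X, y):
    """Sum-of-squares error J(w) = 1/2 * sum_i (h_w(x_i) - y_i)^2.
    X is the m x (n+1) input matrix (first column all ones); y has length m."""
    residuals = X @ w - y        # h_w(x_i) - y_i for every example at once
    return 0.5 * np.sum(residuals ** 2)

# Made-up toy data: bias column plus one feature.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([0.0, 1.0, 2.0])
print(J(np.array([0.0, 1.0]), X, y))  # w = (0, 1) fits exactly, so J = 0.0
```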
Steps to solving a supervised learning problem
1. Collect data
2. Decide on inputs and output(s), including encoding. This determines X and Y.
3. Choose a hypothesis class. This determines H.
4. Choose an error function (cost function) to define the best hypothesis
5. Choose an algorithm for searching efficiently through the space of hypotheses.
Minimizing LMS for a linear hypothesis class
$$J(w) = \frac{1}{2} \sum_{i=1}^{m} \left( h_w(x_i) - y_i \right)^2 = \frac{1}{2} \sum_{i=1}^{m} \left( \sum_{j=0}^{n} w_j x_{i,j} - y_i \right)^2$$
How do we do it?
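The slides stop before answering, but one standard approach (not worked out here, and labeled as an assumption about where the course goes next) is the closed-form least-squares solution: setting the gradient of J(w) to zero yields the normal equations X^T X w = X^T y. A sketch using the (x, y) data from the abstract example earlier:

```python
import numpy as np

# The (x, y) data from the "abstract example" slide.
x = np.array([0.86, 0.09, -0.85, 0.87, -0.44, -0.43, -1.10, 0.40, -0.96, 0.17])
y = np.array([2.49, 0.83, -0.25, 3.10, 0.87, 0.02, -0.12, 1.81, -0.83, 0.43])
X = np.column_stack([np.ones_like(x), x])  # prepend the x_0 = 1 bias column

# np.linalg.lstsq solves the least-squares problem min_w ||Xw - y||^2,
# which is equivalent to solving the normal equations X^T X w = X^T y.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w)  # fitted intercept w_0 and slope w_1
```

Gradient descent is the other common choice when a closed form is impractical; either way, the goal is the w minimizing J(w).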
We had some discussion on the board, but this is pretty much where we ended. . .