A.I. Algorithms (Cogs 188)
Grades
• 1 Midterm: 20%
• 1 Final Exam: 30%
• Assignment 0: 5%
• Assignment 1: 15%
• Assignment 2: 15%
• Assignment 3: 15%
Assignments are to be done individually (not in groups). Late assignments incur a 33% penalty for each day (or part of a day) that they are late. So if you submit an assignment 1 minute late, you lose 33% of the points; if you submit it 24 hours and 1 minute late, you lose 66% of the points.
Tentative Schedule
Date | Day | Topics Covered | Assignments
October 1st Thursday Machine Learning overview. Assignment 0 Assigned
October 6th Tuesday K-NN
October 8th Thursday Linear Regression - Objective Function
October 13th Tuesday Gradient Descent Assignment 1 Assigned
October 15th Thursday Perceptron
October 20th Tuesday Perceptron Revision
October 22nd Thursday Statistics & Probability - Distributions
October 27th Tuesday K-Means Assignment 1 Due
October 29th Thursday Midterm
November 3rd Tuesday Review and Hierarchical Clustering Assignment 2 Assigned
November 5th Thursday EM-Algorithm
November 10th Tuesday EM-Algorithm Cont.
November 12th Thursday EM-Algorithm Revision
November 17th Tuesday Genetic Algorithms Assignment 2 Due
November 19th Thursday Genetic Algorithms - Cont. Assignment 3 Assigned
November 24th Tuesday Genetic Algorithms - Examples
November 26th Thursday No class, happy thanksgiving!
December 1st Tuesday Bayes Theorem
December 3rd Thursday Naïve Bayes Classification
December 8th Tuesday A.I. In Healthcare
December 10th Thursday Review Assignment 3 Due
December 16th Wednesday Final Exam
Teaching Staff
• Instructor: Dr. Anjum Gupta, [email protected]
• TA: Qiyuan, [email protected]
• If you are sending an email to us, please send all emails to both addresses; however, posting your questions on Canvas is recommended whenever possible.
Syllabus
1. Probability and Statistics
2. Python / Jupyter Notebook (TA sections)
3. Nearest Neighbor
4. Linear Regression, Logistic Regression
5. Perceptron
6. Bayes Theorem
7. K-means, Hierarchical Clustering
8. Genetic Algorithm
9. EM Algorithm
Learning.
You are learning if you improve your performance with experience.
Big Picture
Input Data
Statistics
Algorithms
Graph Theory
Information Theory
Probability Theory
Game Theory
Linear Algebra
Analytical Geometry
Output
Machine Learning Tools
Computer Science
Domain Expertise
Things you can do with Machine Learning
• Given voice stream, identify the speaker/language.
• Recognize handwritten numbers.
• Evaluating the “lifetime value” of a customer (or sales lead)
• Face or object recognition in a video stream
• Given symptoms, diagnose a disease.
• Adjusting stock portfolio based on sentiment and clustering
• Distinguish between a weed and a plant sapling.
• Hand gesture analysis, a glove that sends text messages.
• Too many to count individually. That's what makes machine learning so useful.
Machine Learning in Agriculture
Blue River Technologies: Differentiating weed vs plant saplings
Root AI: Identifying ripe tomatoes to pick.
Let’s start with our canonical two broad categories!
• Supervised – Discriminant Models
• Unsupervised – Generative Models
For tasks that we humans can technically do ourselves, but where it would be nice to get some help and automate them!
Classifying Data
For tasks that we, humans, cannot do. E.g. Potentially generating new insights and extracting hidden information that data contains.
Understanding Data
Generally Speaking…
• Discriminative Models – Classifying Data
– Spam filter (Spam, Not Spam)
– Identify language from a voice stream
– Facial expression recognition
– Classify species according to some physical features
• Generative Models – Understanding Data
– Detect anomalies
– Finding probability of a scenario
– Predicting future outcomes
– Completing the missing data
Optimization Algorithms
• We will also learn two specific optimization algorithms.
– Gradient Descent
– Genetic Algorithms
Handwritten Digits example
Database of 20,000 images of handwritten digits, each labeled by a human (Supervised Learning)
Use these to learn a classifier which will label digit-images automatically…
Classification
Image → What is the number?
Handwritten Digits example
Database of 20,000 images of handwritten digits, each labeled by a human (Supervised Learning)
Use these 20,000 images to understand something about the digits and handwriting.
Understanding Data: being able to generate the numbers!
These results are from one of the projects I worked on with a fellow graduate student, Eric Wiewiora.
Regenerated using model of digit 2 Regenerated using model of digit 5
Naïve Bayes Example: Fishing data
Day | Outlook | Water Temperature | Pollutants in Water | Wind | Fish Present
Day1 Sunny Hot High Weak No
Day2 Sunny Hot High Strong No
Day3 Overcast Hot High Weak Yes
Day4 Rain Mild High Weak Yes
Day5 Rain Cool Normal Weak Yes
Day6 Rain Cool Normal Strong No
Day7 Overcast Cool Normal Strong Yes
Day8 Sunny Mild High Weak No
Day9 Sunny Cool Normal Weak Yes
Day10 Rain Mild Normal Weak Yes
Day11 Sunny Mild Normal Strong Yes
Day12 Overcast Mild High Strong Yes
Day13 Overcast Hot Normal Weak Yes
Day14 Rain Mild High Strong No
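To make the idea concrete, here is a minimal count-based Naïve Bayes sketch over the fishing table above (plain counting, no smoothing; the function name `naive_bayes_score` is mine, not from the slides):

```python
# The 14 days from the table above: (outlook, water temp, pollutants, wind, fish?)
data = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]

def naive_bayes_score(query, label):
    """P(label) * product over features of P(feature_i | label), by counting."""
    rows = [r for r in data if r[-1] == label]
    score = len(rows) / len(data)               # prior, e.g. P(Yes) = 9/14
    for i, value in enumerate(query):
        score *= sum(1 for r in rows if r[i] == value) / len(rows)
    return score

query = ("Sunny", "Cool", "Normal", "Weak")     # a new day, not in the table
print("Fish!" if naive_bayes_score(query, "Yes") > naive_bayes_score(query, "No")
      else "No fish.")                          # -> Fish!
```

For this query the "Yes" score works out to (9/14)(2/9)(3/9)(6/9)(6/9) ≈ 0.021 versus ≈ 0.003 for "No".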
Bayesian Networks are for Bayesians, although frequentists are also welcome.
You are given various variables. For example: imagine you are going fishing.
Depth
Temperature
Light
Corals
Food Source
Fish Present
You can map out some relationship among them through “expert knowledge,” then refine it and learn the exact parameters.
Now you can ask: what is the probability of fish in shallow, cold water with corals?
A complete Bayes net consists of the graph together with its probability tables.
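As a toy illustration of "graph plus probability tables," here is a two-node slice of the network above (Depth → Fish Present) with hypothetical probabilities chosen purely for illustration, not taken from any real table:

```python
# HYPOTHETICAL probability tables for a two-node net: Depth -> Fish Present.
p_depth = {"shallow": 0.4, "deep": 0.6}              # P(Depth)
p_fish_given_depth = {"shallow": 0.7, "deep": 0.2}   # P(Fish=yes | Depth)

# Marginal P(Fish=yes) by the law of total probability:
p_fish = sum(p_depth[d] * p_fish_given_depth[d] for d in p_depth)
print(p_fish)                 # 0.4*0.7 + 0.6*0.2 = 0.4

# Bayes' theorem inverts the edge: P(Depth=shallow | Fish=yes)
p_shallow_given_fish = p_depth["shallow"] * p_fish_given_depth["shallow"] / p_fish
print(p_shallow_given_fish)   # 0.28 / 0.4 = 0.7
```

A full network like the one sketched above would carry one such table per node, conditioned on that node's parents.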
Liver Disorder Diagnostic Bayesian Network
Bayesian Vs Frequentist
Sherlock Holmes was apparently a frequentist:
"I have no data yet. It is a capital mistake to theorize before one has data." (A Scandal in Bohemia)
This sounds like a Bayesian.
Intuition becomes increasingly valuable in the new information society precisely because there is so much data.
John Naisbitt (Author)
Complex Bayesian Networks have given way to “deep learning”
• New algorithms came along and largely replaced Bayesian networks.
• They still influence many unsupervised learning algorithms.
• Other algorithms, such as the EM algorithm, also help us model the "true" nature of the data.
• We will visit some of the generative algorithms later in the course.
• Let's start with the classification algorithms first.
Classification
• Can a computer learn to recognize objects?
• Shown 10,000 flowers, can a computer “understand” flowers? Can it say if the new photograph shown is a flower?
Iris Setosa Iris Versicolor Iris Virginica
Let’s try our brain’s algorithm!
Iris Setosa  Iris Versicolor  Iris Virginica
???
What is Similarity?
"The quality or state of being similar; likeness; resemblance; as, a similarity of features." (Webster's Dictionary)
For example, someone writing software for the healthcare industry may have to deal with the question of "how similar are two patients?"
It depends on what you are comparing the two objects for.
There is a whole lot of research, and entire Ph.D. theses, just on the concept of similarity.
1. Patient Similarity Networks for Precision Medicine
2. Patient Similarity: Emerging Concepts in Systems and Precision Medicine
3. Machine learning of patient similarity: A case study on predicting survival in cancer patient after locoregional chemotherapy
Fish Sorting: For Packaging
salmon
sea bass
sorting chamber
classifier
Pattern Classification, Chapter 1
An Example
• “Sorting incoming Fish on a conveyor according to species using optical sensing”
Sea bass
Species
Salmon
• Problem Analysis
– Set up a camera and take some sample images to extract features
• Length
• Lightness
• Width
• Number and shape of fins
• Position of the mouth, etc…
• This is the set of all suggested features to explore for use in our classifier!
• Classification
– Select the length of the fish as a possible feature for discrimination
The length is a poor feature alone!
Select the lightness as a possible feature.
• Adopt the lightness and add the width of the fish:
  x^T = [x1, x2], where x1 = lightness and x2 = width
• Plot salmon and sea bass based on the two-dimensional feature vector.
Feature extraction
Task: to extract features which are good for classification.
Good features: • Objects from the same class have similar feature values.
• Objects from different classes have different values.
“Good” features “Bad” features
Basic concepts

Feature vector x = [x1, x2, …, xn]^T
- A vector of observations (measurements).
- x ∈ X is a point in the feature space X.

Hidden state y ∈ Y
- Cannot be directly measured.
- Patterns with equal hidden state belong to the same class.

Task
- To design a classifier (decision rule) q: X → Y which decides about a hidden state based on an observation.
Text Classification
• Represent text as a vector.
• Stem words, so that "computer", "computes", etc. all get counted under "compute."
• Each count in the vector is divided by the number of documents that the word appears in: "Inverse Document Frequency."
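A minimal sketch of that weighting, using a toy three-document corpus (the helper name `doc_vector` is mine; libraries typically multiply by log(N/df) rather than dividing directly, but both down-weight words that appear in many documents):

```python
from collections import Counter

# Toy corpus; a real system would also stem words ("computes" -> "compute").
docs = [
    ["compute", "science", "compute"],
    ["compute", "biology"],
    ["science", "history"],
]

def doc_vector(doc, corpus):
    """Term counts scaled down by document frequency, as described above."""
    counts = Counter(doc)
    return {term: tf / sum(1 for d in corpus if term in d)
            for term, tf in counts.items()}

v = doc_vector(docs[0], docs)
# "compute" occurs twice here but appears in 2 of 3 docs -> 2/2 = 1.0
# "science" occurs once and appears in 2 docs          -> 1/2 = 0.5
print(v)
```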
Let's go back to agriculture!
Grasshoppers vs. Katydids
Given a collection of annotated data (in this case, five instances of Katydids and five of Grasshoppers), decide what type of insect the unlabeled example is.
Katydid or Grasshopper?
Thorax Length
Abdomen Length
Antennae Length
Mandible Size
Spiracle Diameter
Leg Length
For any domain of interest, we can measure features
Color {Green, Brown, Gray, Other} Has Wings?
Insect ID | Abdomen Length | Antennae Length | Insect Class
1 2.7 5.5 Grasshopper
2 8.0 9.1 Katydid
3 0.9 4.7 Grasshopper
4 1.1 3.1 Grasshopper
5 5.4 8.5 Katydid
6 2.9 1.9 Grasshopper
7 6.1 6.6 Katydid
8 0.5 1.0 Grasshopper
9 8.3 6.6 Katydid
10 8.1 4.7 Katydid
11 5.1 7.0 ???????
We can store features in a database.
My_Collection
The classification problem can now be expressed as:
• Given a training database (My_Collection), predict the class label of a previously unseen instance
previously unseen instance =
[Scatter plot: Antennae Length vs. Abdomen Length; Grasshoppers and Katydids]
Katydid or Grasshopper?
[Scatter plot: Antennae Length vs. Abdomen Length]
[Scatter plot: Antennae Length vs. Abdomen Length; Grasshoppers and Katydids]
We will also use this larger dataset as a motivating example…
Each of these data objects is called…
• an exemplar
• a (training) example
• an instance
• a tuple
[The same scatter plot repeated, with the unlabeled instance marked ????]
Nearest Neighbor Classifier
If the nearest instance to the previously unseen instance is a Katydid
    then class is Katydid
else
    class is Grasshopper
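The rule above can be sketched in Python using the insect table from earlier (each row is abdomen length, antennae length, class; `math.dist` computes Euclidean distance; this is a minimal sketch, not the slides' code):

```python
import math

# Training data from the insect table: (abdomen length, antennae length, class)
training = [
    (2.7, 5.5, "Grasshopper"), (8.0, 9.1, "Katydid"),
    (0.9, 4.7, "Grasshopper"), (1.1, 3.1, "Grasshopper"),
    (5.4, 8.5, "Katydid"),     (2.9, 1.9, "Grasshopper"),
    (6.1, 6.6, "Katydid"),     (0.5, 1.0, "Grasshopper"),
    (8.3, 6.6, "Katydid"),     (8.1, 4.7, "Katydid"),
]

def nearest_neighbor(abdomen, antennae):
    """Label a new point with the class of its closest training point."""
    return min(training,
               key=lambda t: math.dist((abdomen, antennae), (t[0], t[1])))[2]

print(nearest_neighbor(5.1, 7.0))   # the unlabeled instance 11 -> Katydid
```

Instance 11's closest training point is (6.1, 6.6), a Katydid, so that is the predicted label.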
[Scatter plot: Antennae Length vs. Abdomen Length; Grasshoppers and Katydids, with the nearest neighbor of the unlabeled point highlighted]
Handwritten Digits example
Database of 20,000 images of handwritten digits, each labeled by a human (Supervised Learning)
[28 x 28 greyscale; pixel values 0-255; labels 0-9]
Use these to learn a classifier which will label digit-images automatically…
Nearest neighbor
[Examples: an image to label, shown alongside its nearest neighbor in the training set]
Overall:
error rate = 6%
(on test set)
[Scatter plot: Antennae Length vs. Abdomen Length; Grasshoppers and Katydids]
Classifying Insects
Each of these data objects is called…
• an exemplar
• a (training) example
• an instance
• a tuple
[Scatter plot repeated: Antennae Length vs. Abdomen Length; Grasshoppers and Katydids, with the unlabeled instance marked ????]
What else do we want?
• K-NN (K-Nearest Neighbors) is great!
• What is one obvious way we can improve our grasp on the classification problem?
Let's try to study the classification problem with some examples.
I am going to show you some classification problems which were shown to pigeons!
Let us see if you are as smart as a pigeon!
Pigeon Problem 1
Examples of class A: (3, 4), (1.5, 5), (6, 8), (2.5, 5)
Examples of class B: (5, 2.5), (5, 2), (8, 3), (4.5, 3), (8, 1.5)
What class is this object: (4.5, 7)?
What about this one, A or B?
Pigeon Problem 1
Examples of class A: (3, 4), (1.5, 5), (6, 8), (2.5, 5)
Examples of class B: (5, 2.5), (5, 2), (8, 3), (4.5, 3), (8, 1.5)
This is a B!
Here is the rule. If the left bar is smaller than the right bar, it is an A; otherwise it is a B.
Pigeon Problem 2
Examples of class A: (4, 4), (5, 5), (6, 6), (3, 3)
Examples of class B: (5, 2.5), (2, 5), (5, 3), (2.5, 3)
What about (7, 7)? So this one is an A.
The rule is as follows: if the two bars are equal in size, it is an A. Otherwise it is a B.
Pigeon Problem 3
Examples of class A: (4, 4), (1, 5), (6, 3), (3, 7)
Examples of class B: (5, 6), (7, 5), (4, 8), (7, 7), (6, 6)
This one is really hard! What is this, A or B?
Pigeon Problem 3
Examples of class A: (4, 4), (1, 5), (6, 3), (3, 7)
Examples of class B: (5, 6), (7, 5), (4, 8), (7, 7), (6, 6)
It is a B!
The rule is as follows: if the square of the sum of the two bars is less than or equal to 100, it is an A. Otherwise it is a B.
Why did we spend so much time with this game?
Because we wanted to show that almost all classification problems have a geometric interpretation, check out the next 4 slides…
Pigeon Problem 1
Examples of class A: (3, 4), (1.5, 5), (6, 8), (2.5, 5)
Examples of class B: (5, 2.5), (5, 2), (8, 3), (4.5, 3)
Here is the rule again. If the left bar is smaller than the right bar, it is an A; otherwise it is a B.
[Plot: Left Bar vs. Right Bar, axes 0-10]
Pigeon Problem 2
Examples of class A: (4, 4), (5, 5), (6, 6), (3, 3)
Examples of class B: (5, 2.5), (2, 5), (5, 3), (2.5, 3)
[Plot: Left Bar vs. Right Bar, axes 0-10]
Let me look it up… here it is.. the rule is, if the two bars are equal sizes, it is an A. Otherwise it is a B.
Pigeon Problem 3
Examples of class A: (4, 4), (1, 5), (6, 3), (3, 7)
Examples of class B: (5, 6), (7, 5), (4, 8), (7, 7)
[Plot: Left Bar vs. Right Bar, axes 0-100]
The rule again:if the square of the sum of the two bars is less than or equal to 100, it is an A. Otherwise it is a B.
Pigeon Problem 4
Examples of class A: (2, 2), (1, 7), (7, 3), (3, 8)
Examples of class B:
The rule again: if both squares are bigger than 6, it is a B. Otherwise it is an A.
[Plot: Left Bar vs. Right Bar, axes 0-100]
Examples of class B (Pigeon Problem 4): (8, 6), (7, 6), (7, 5)
Which of the "Pigeon Problems" can be solved by the Simple Linear Classifier?
1) Perfect
2) Useless
3) Perfect
4) Not so good
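To see why Problem 1 is "perfect" for a linear classifier, here is a sketch that separates its classes with the hand-picked line left - right = 0 (the weights are chosen by hand to match the stated rule, not learned; the function name is mine):

```python
def linear_classify(left, right, w=(1.0, -1.0), b=0.0):
    """Classify by which side of the line w[0]*left + w[1]*right + b = 0 we fall on."""
    return "B" if w[0] * left + w[1] * right + b > 0 else "A"

# Pigeon Problem 1: class A has left < right, class B has left > right,
# so the line left - right = 0 separates them perfectly.
class_a = [(3, 4), (1.5, 5), (6, 8), (2.5, 5)]
class_b = [(5, 2.5), (5, 2), (8, 3), (4.5, 3), (8, 1.5)]
assert all(linear_classify(l, r) == "A" for l, r in class_a)
assert all(linear_classify(l, r) == "B" for l, r in class_b)

# Problem 2 (A iff left == right) has no such line: every A point lies ON
# any candidate boundary, which is why the slide calls the linear
# classifier "useless" there.
```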
Nearest neighbor: pros and cons
Pros
• Simple.
• No assumptions about the distribution or shape of different classes.
• Excellent performance on a wide range of tasks.
• Effective with a large training set.
Cons
• Time consuming: with n training points in R^d, time to label a new point is O(nd).
• No insight into the domain.
• Would prefer a compact classifier.
• No good way to determine the parameter "k."
• Highly dependent on the distance measure used.
Some Variants
• K-nearest Neighbors: pick the K nearest neighbors and take the majority vote.
• Parzen Window: pick an area around a point and look at the majority of points in that window.
• Many other variants. Nearest-neighbor search is elementary but deserves proper attention. The best accuracy on the digits data is with a variant of nearest neighbor.
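The K-nearest-neighbors variant above (majority vote among the K closest points) can be sketched as follows, reusing a few rows of the insect data; the helper name `knn_classify` is mine:

```python
import math
from collections import Counter

def knn_classify(query, training, k=3):
    """Take the majority class among the k closest training points.

    training: list of ((x, y), label) pairs; query: (x, y)."""
    neighbors = sorted(training, key=lambda t: math.dist(query, t[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

training = [((2.7, 5.5), "Grasshopper"), ((8.0, 9.1), "Katydid"),
            ((0.9, 4.7), "Grasshopper"), ((5.4, 8.5), "Katydid"),
            ((6.1, 6.6), "Katydid")]
print(knn_classify((5.1, 7.0), training, k=3))   # -> Katydid
```

With k=3 the vote here is two Katydids against one Grasshopper, so a single unusual neighbor no longer decides the label by itself.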
Distance Measures
How many clusters does this have? Which two points are the neighbors?
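A sketch of two common distance measures, showing that they can disagree about which point is nearest (the example points are toy values of mine, not from the slides):

```python
import math

def euclidean(p, q):
    """Straight-line distance: square root of summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """City-block distance: sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

print(euclidean((0, 0), (3, 4)))   # 5.0
print(manhattan((0, 0), (3, 4)))   # 7

# The two measures can disagree about which point is "nearest":
a, b = (5, 0), (3, 3)
print(euclidean((0, 0), a) > euclidean((0, 0), b))   # True: b is closer (5 vs ~4.24)
print(manhattan((0, 0), a) < manhattan((0, 0), b))   # True: a is closer (5 vs 6)
```

So the answer to "which two points are neighbors" genuinely depends on the measure you pick.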
A Famous Problem
R. A. Fisher's Iris Dataset
• 3 classes
• 50 of each class
The task is to classify Iris plants into one of 3 varieties using the Petal Length and Petal Width.
Iris Setosa Iris Versicolor Iris Virginica
Setosa
Versicolor
Virginica