Center for Big Data Analytics and Discovery Informatics
Artificial Intelligence Research Laboratory
Fall 2019 Vasant G Honavar

Machine Learning
Vasant Honavar
Artificial Intelligence Research Laboratory
Informatics Graduate Program
Computer Science and Engineering Graduate Program
Bioinformatics and Genomics Graduate Program
Neuroscience Graduate Program
Data Sciences Undergraduate Program
Center for Big Data Analytics and Discovery Informatics
Huck Institutes of the Life Sciences
Institute for Cyberscience
Clinical and Translational Sciences Institute
Northeast Big Data Hub
Pennsylvania State University
[email protected]
http://faculty.ist.psu.edu/vhonavar
http://ailab.ist.psu.edu
Now, on to some real content …
Classification
• How would you write a program to distinguish a picture of you from a picture of someone else?
– Provide example pictures of you and pictures of other people, and let a classifier learn to distinguish the two.
• How would you write a program to determine whether a sentence is grammatical or not?
– Provide examples of grammatical and ungrammatical sentences, and let a classifier learn to distinguish the two.
• How would you write a program to distinguish cancerous cells from normal cells?
– Provide examples of cancerous and normal cells, and let a classifier learn to distinguish the two.
Example: To play or not to play tennis
Example dataset
Three key elements:
• Class label ("label", denoted by y)
• Features ("attributes")
• Feature values ("attribute values", denoted by x)
Feature values can be binary, nominal, or continuous.
A labeled dataset is a collection of (x, y) pairs.
Example: To play or not to play tennis?
Example dataset
Task: Predict the class of this "test" sample. This requires us to generalize from the training data.
What is a good representation for images?
Example (face recognition)
Pixel values? Edges?
Example (chair detection)
Ingredients for classification
Idea: Incorporate your knowledge of the problem into a learning system.
Sources of knowledge:
1. Feature representation
• Important for the success of machine learning
• Can be "problem specific"
• Designing a good representation will take you halfway
2. Training data: labeled examples
• High-quality labeled data can be hard to find
• Often have to make do with the available data
3. Model
• No single learning algorithm outperforms all others on every task ("no free lunch")
• Different learning algorithms come with different inductive biases
Nearest Neighbor Classifiers
• Basic idea: If it walks like a duck and quacks like a duck, then it's probably a duck.
(Figure: compute the distance from the test sample to the training samples, then choose k of the "nearest" labeled samples.)
Nearest-Neighbor Classifiers
• Requires three things:
– The set of stored training samples and their labels
– A distance metric to compute the distance between samples
– The value of K, the number of nearest neighbors to retrieve
• To classify a query sample:
– Compute its distance to the training samples
– Identify the K nearest neighbors
– Use the class labels of the K nearest neighbors to determine the class label of the query sample (e.g., by taking a majority vote)
Definition of Nearest Neighbor
(Figure: panels (a) 1-nearest neighbor, (b) 2-nearest neighbor, (c) 3-nearest neighbor)
The k-nearest neighbors of a sample x are the data points that have the k smallest distances to x.
K nearest neighbor classifier
Data samples are assumed to lie in an N-dimensional space, e.g., the Euclidean space.
An instance $X_p$ is described by a feature vector
$$X_p = [x_{1p} \; \cdots \; x_{Np}]$$
where $x_{ip}$ denotes the value of the $i$th feature of $X_p$.
The Euclidean distance between two points $X_p$ and $X_r$ in the Euclidean space is defined by
$$d(X_p, X_r) = \sqrt{\sum_{i=1}^{N} (x_{ip} - x_{ir})^2}$$
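The Euclidean distance above can be sketched in a few lines of Python (an illustrative sketch, not part of the original slides; the function name is my own):

```python
import math

def euclidean_distance(xp, xr):
    """d(Xp, Xr) = sqrt(sum_i (x_ip - x_ir)^2) for equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xp, xr)))

# A 3-4-5 right triangle: the distance between (0, 0) and (3, 4) is 5.
print(euclidean_distance([0, 0], [3, 4]))  # → 5.0
```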
Standardization
Standardization can be important when the variables are not all measured on the same scale.
• 0-1 scaling: x → (x − min)/(max − min)
e.g., for the values 4, 3, 1, 2: 3 → (3 − min)/(max − min) = (3 − 1)/(4 − 1) = 2/3
• Z-score scaling: subtract out the mean, divide by the std. deviation
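Both scalings can be sketched directly (illustrative code, not from the slides; helper names are my own, and the z-score version uses the population standard deviation):

```python
def min_max_scale(values):
    """0-1 scaling: x -> (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_score_scale(values):
    """Z-score scaling: subtract the mean, divide by the std. deviation."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values]

# The slide's example: among the values 4, 3, 1, 2 the value 3 scales to 2/3.
print(min_max_scale([4, 3, 1, 2]))
```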
K nearest neighbor Classifier
Learning phase:
For each training example $(X_i, f(X_i))$, store the example in memory.
Classification phase:
Given a query instance $X_q$, identify the k nearest neighbors $X_1 \ldots X_k$ of $X_q$.
Assign $X_q$ the label of the majority class:
$$g(X_q) = \arg\max_{\omega} \sum_{i=1}^{k} \delta(\omega, f(X_i))$$
where $\delta(a, b) = 1$ if $a = b$, and $\delta(a, b) = 0$ otherwise.
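The majority-vote rule above can be sketched as follows (an illustrative sketch, not the slides' code; the dataset and helper names are my own):

```python
import math
from collections import Counter

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn_classify(training, query, k):
    """training: list of (feature_vector, label) pairs.
    Returns the majority label among the k nearest neighbors of query."""
    nearest = sorted(training, key=lambda ex: euclidean(ex[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [([1, 1], "yes"), ([1, 2], "yes"), ([8, 8], "no"), ([9, 8], "no")]
print(knn_classify(train, [2, 1], 3))  # → yes
```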
Distance weighted K nearest neighbor Classifier
Learning phase:
For each training example $(X_i, f(X_i))$, store the example in memory.
Classification phase:
Given a query instance $X_q$, identify the k nearest neighbors of $X_q$: $KNN(X_q) = \{X_1 \ldots X_k\}$,
and obtain a weighted vote, with each nearest neighbor getting a vote in favor of its class label that is weighted by its distance to the query:
$$w_i = \frac{1}{d(X_i, X_q)^2}$$
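A sketch of the distance-weighted vote (illustrative code, not from the slides; the dataset is my own, and I treat an exact match as winning outright to avoid division by zero, a choice the slides do not specify):

```python
import math
from collections import defaultdict

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def weighted_knn_classify(training, query, k):
    """Each of the k nearest neighbors votes for its label with weight
    w_i = 1 / d(X_i, X_q)^2; an exact match wins outright (assumption)."""
    nearest = sorted(training, key=lambda ex: euclidean(ex[0], query))[:k]
    votes = defaultdict(float)
    for x, label in nearest:
        d = euclidean(x, query)
        if d == 0:
            return label  # query coincides with a training sample
        votes[label] += 1.0 / d ** 2
    return max(votes, key=votes.get)

train = [([1, 1], "yes"), ([1, 2], "yes"), ([8, 8], "no"), ([9, 8], "no")]
print(weighted_knn_classify(train, [2, 1], 3))  # → yes
```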
Distance Measures
• Distance depends on:
– the data representation
– the distance measure chosen

An Employee DB:
ID  Gender  Age  Salary
1   F       27   19,000
2   M       51   64,000
3   M       52   100,000
4   F       33   55,000
5   M       45   45,000

Word Frequencies for Documents:
      w1  w2  w3  w4  w5  w6
Doc1  0   4   0   0   0   2
Doc2  3   1   4   3   1   2
Doc3  3   0   0   0   3   0
Doc4  0   1   0   3   0   0
Doc5  2   2   2   3   1   4

The representation has to be chosen with some care, and the distance measure should be chosen to work with the representation.
Distance measures
• d(p, q) between two points p and q is a proper distance measure if it satisfies:
1. Positive definiteness: d(p, q) ≥ 0 for all p and q, and d(p, q) = 0 only if p = q.
2. Symmetry: d(p, q) = d(q, p) for all p and q.
3. Triangle inequality: d(p, r) ≤ d(p, q) + d(q, r) for all points p, q, and r.
Cosine Distance
• If d1 and d2 are two document vectors, then
1 − cos(d1, d2) = 1 − (d1 • d2) / (||d1|| ||d2||),
where • indicates the vector dot product and ||d|| is the length of vector d.
• Example:
d1 = 3 2 0 5 0 0 0 2 0 0
d2 = 1 0 0 0 0 0 0 1 0 2
d1 • d2 = 3×1 + 2×0 + 0×0 + 5×0 + 0×0 + 0×0 + 0×0 + 2×1 + 0×0 + 0×2 = 5
||d1|| = (3×3 + 2×2 + 0×0 + 5×5 + 0×0 + 0×0 + 0×0 + 2×2 + 0×0 + 0×0)^0.5 = (42)^0.5 ≈ 6.481
||d2|| = (1×1 + 1×1 + 2×2)^0.5 = (6)^0.5 ≈ 2.449
cos(d1, d2) = 5 / (6.481 × 2.449) ≈ 0.3150
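The worked example can be reproduced with a short sketch (illustrative code, not from the slides; function names are my own):

```python
import math

def cosine_similarity(a, b):
    """cos(a, b) = (a . b) / (||a|| ||b||)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def cosine_distance(a, b):
    return 1 - cosine_similarity(a, b)

d1 = [3, 2, 0, 5, 0, 0, 0, 2, 0, 0]
d2 = [1, 0, 0, 0, 0, 0, 0, 1, 0, 2]
print(round(cosine_similarity(d1, d2), 4))  # → 0.315
```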
Distance Measures
Distances in vector spaces:
• Euclidean distance: $\sqrt{\sum_{i=1}^{d}(p_i - q_i)^2}$
• Minkowski distance, a generalization of the Euclidean distance: $\left(\sum_{i=1}^{d}|p_i - q_i|^n\right)^{1/n}$
Special cases of the Minkowski distance:
• n = 1: Manhattan distance
• n = 2: Euclidean distance
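The Minkowski family above can be sketched in one function (illustrative code, not from the slides; the function name is my own):

```python
def minkowski_distance(p, q, n):
    """(sum_i |p_i - q_i|^n)^(1/n); n=1 is Manhattan, n=2 is Euclidean."""
    return sum(abs(a - b) ** n for a, b in zip(p, q)) ** (1.0 / n)

print(minkowski_distance([0, 0], [3, 4], 1))  # → 7.0 (Manhattan)
print(minkowski_distance([0, 0], [3, 4], 2))  # → 5.0 (Euclidean)
```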
Distance measures for nominal attributes
• Nominal attributes can take 2 or more values, e.g., red, yellow, blue, green (a generalization of a binary attribute)
• Simple matching: the distance between two objects is simply the number of mismatched attributes divided by the total number of attributes
• One-hot encoding: encode each M-valued nominal attribute as an M-bit vector
Red: 1 0 0 0; Yellow: 0 1 0 0; Blue: 0 0 1 0 …
• Then use distance measures designed for vectors …
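Both ideas can be sketched briefly (illustrative code, not from the slides; helper names and the example attributes are my own):

```python
def simple_matching_distance(a, b):
    """Fraction of mismatched nominal attributes between two objects."""
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / len(a)

def one_hot(value, vocabulary):
    """Encode an M-valued nominal attribute as an M-bit vector."""
    return [1 if v == value else 0 for v in vocabulary]

colors = ["red", "yellow", "blue", "green"]
print(one_hot("red", colors))  # → [1, 0, 0, 0]
print(simple_matching_distance(["red", "small"], ["red", "large"]))  # → 0.5
```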
Decision Boundary induced by the 1 nearest neighbor classifier
Nearest Neighbor Classification…
• Choosing the value of k:
– If k is too small, the classifier is sensitive to noise points
– If k is too large, the neighborhood may include points from other classes
Nearest neighbor classifiers
• Nearest neighbor classifiers are conceptually simple
• Learn by simply memorizing training examples
• The computational effort of learning is low
• The storage requirements of learning are high
– need to memorize the examples in the training set
• The cost of classifying new instances can be high
– Use efficient data structures and algorithms for nearest neighbor search, e.g., k-d trees, locality-sensitive hashing
• A distance measure needs to be defined over the input space
• Performance degrades when there are many irrelevant attributes:
– Perform feature selection or dimensionality reduction
Regression or Function approximation
• Regression is like classification except the labels are real-valued
• Example applications: predicting
– Stock value
– Income
– Power consumption
K nearest neighbor Function Approximator
Learning phase:
For each training example $(X_i, f(X_i))$, store the example in memory.
Approximation phase:
Given a query instance $X_q$, identify the k nearest neighbors $X_1 \ldots X_k$ of $X_q$, and output their mean target value:
$$g(X_q) \leftarrow \frac{\sum_{i=1}^{K} f(X_i)}{K}$$
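The averaging rule above can be sketched directly (illustrative code, not from the slides; the dataset and helper names are my own):

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn_regress(training, query, k):
    """training: list of (feature_vector, target_value) pairs.
    Returns the mean target of the k nearest neighbors: g(Xq) = (1/K) sum f(Xi)."""
    nearest = sorted(training, key=lambda ex: euclidean(ex[0], query))[:k]
    return sum(y for _, y in nearest) / k

train = [([1], 1.0), ([2], 2.0), ([3], 3.0), ([10], 10.0)]
print(knn_regress(train, [2], 3))  # → 2.0
```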
Regression
• For classification the output is nominal
• In regression the output is continuous
– Function approximation
• Many models could be used
– The simplest is linear regression
– Fit the data with the best hyper-plane which "goes through" the points
(Figure: y, the dependent variable (output), plotted against x, the independent variable (input))
Simple Linear Regression
• For now, assume just one independent (input) variable x and one dependent (output) variable y
– Multiple linear regression assumes an input vector x
– Multivariate linear regression assumes an output vector y
• We will "fit" the points with a linear hyper-plane (a line in the simplest case)
• Which line should we use?
– Choose an objective function
– For simple linear regression we choose the sum squared error (SSE): $\sum_i (d_i - y_i)^2 = \sum_i e_i^2$
– Thus, find the line which minimizes the sum of the squared residuals (i.e., least squares)
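The least-squares line has a closed-form solution, which can be sketched as follows (illustrative code, not from the slides; function and variable names are my own):

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares fit of y = w0 + w1*x, minimizing SSE = sum (y_i - (w0 + w1*x_i))^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by variance of x.
    w1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
         sum((x - mean_x) ** 2 for x in xs)
    w0 = mean_y - w1 * mean_x
    return w0, w1

# Points lying exactly on y = 2x + 1 are recovered exactly.
w0, w1 = fit_simple_linear_regression([0, 1, 2, 3], [1, 3, 5, 7])
print(w0, w1)  # → 1.0 2.0
```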
Digression – Minimizing / Maximizing Functions
Consider $f(x)$, a function of a scalar variable $x$ with domain $D$.
• $f$ is convex over some sub-domain $D' \subseteq D$ if, for all $x_1, x_2 \in D'$, the chord joining the points $(x_1, f(x_1))$ and $(x_2, f(x_2))$ lies above the graph of $f$.
• $f$ has a local minimum at $x = X$ if there is a neighborhood $U$ around $X$ such that $\forall x \in U$, $f(x) \geq f(X)$.
• We say $\lim_{x \to a} f(x) = A$ if, for any $\epsilon > 0$, there exists $\delta > 0$ such that $|x - a| < \delta \Rightarrow |f(x) - A| < \epsilon$.
Minimizing/Maximizing Functions
• We say that $f$ is continuous at $x = a$ if $\lim_{x \to a^+} f(x) = \lim_{x \to a^-} f(x) = f(a)$.
• The derivative of the function $f$ is defined as
$$\frac{df}{dx} = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x}$$
• If $X$ is a local maximum or a local minimum of $f$, then $\left.\frac{df}{dx}\right|_{x = X} = 0$.
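The limit definition of the derivative suggests a simple numerical check, sketched below (illustrative code, not from the slides; I use a central difference rather than the one-sided quotient, a common refinement):

```python
def numerical_derivative(f, x, dx=1e-6):
    """Central-difference approximation of df/dx at x: (f(x+dx) - f(x-dx)) / (2*dx)."""
    return (f(x + dx) - f(x - dx)) / (2 * dx)

f = lambda x: x ** 2 + 3 * x   # analytically, df/dx = 2x + 3
print(round(numerical_derivative(f, 2.0), 6))  # ≈ 7.0
```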
Minimizing/Maximizing Functions
$$\frac{d(u+v)}{dx} = \frac{du}{dx} + \frac{dv}{dx}$$
$$\frac{d(uv)}{dx} = u\frac{dv}{dx} + v\frac{du}{dx}$$
$$\frac{d}{dx}\left(\frac{u}{v}\right) = \frac{v\frac{du}{dx} - u\frac{dv}{dx}}{v^2}$$
Examples
$f(x) = x^2 + 3x$
$$\frac{d(u+v)}{dx} = \frac{du}{dx} + \frac{dv}{dx}$$
$$\frac{df}{dx} = 2x + 3$$
Examples
$f(x) = x(x + 3)$
$$\frac{d(uv)}{dx} = u\frac{dv}{dx} + v\frac{du}{dx}$$
$$\frac{df}{dx} = x \cdot 1 + (x + 3) \cdot 1 = 2x + 3$$
Examples
$f(x) = \dfrac{x(x+3)}{x^2}$
$$\frac{d}{dx}\left(\frac{u}{v}\right) = \frac{v\frac{du}{dx} - u\frac{dv}{dx}}{v^2}$$
$$\frac{df}{dx} = \frac{x^2(2x+3) - x(x+3) \cdot 2x}{x^4} = -\frac{3}{x^2}$$
Partial derivatives and chain rule
Let $f = f(x_1, x_2, \ldots, x_n)$.
$\dfrac{\partial f}{\partial x_i}$ is obtained by treating all $x_j$, $j \neq i$, as constant.
Chain rule: Let $z = \varphi(u_1, u_2, \ldots, u_m)$ and $u_i = f_i(x_1, \ldots, x_k)$ for $i = 1 \ldots m$. Then
$$\frac{\partial z}{\partial x_k} = \sum_{i=1}^{m} \left(\frac{\partial z}{\partial u_i}\right)\left(\frac{\partial u_i}{\partial x_k}\right)$$
Example
• $z = f(u, v) = u^2 + 2v$
• $u = f_1(x, y) = 2x + y$
• $v = f_2(x, y) = x^2 + y$
• $\dfrac{\partial z}{\partial x} = \dfrac{\partial z}{\partial u}\dfrac{\partial u}{\partial x} + \dfrac{\partial z}{\partial v}\dfrac{\partial v}{\partial x} = 2u \cdot 2 + 2 \cdot 2x = 4(2x + y) + 4x = 12x + 4y$
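The chain-rule result can be checked numerically by composing z directly in terms of x and y (illustrative code, not from the slides; function names are my own):

```python
def numerical_partial_x(g, x, y, dx=1e-6):
    """Central-difference approximation of dz/dx at (x, y), holding y constant."""
    return (g(x + dx, y) - g(x - dx, y)) / (2 * dx)

# z = u^2 + 2v with u = 2x + y and v = x^2 + y, composed directly:
z = lambda x, y: (2 * x + y) ** 2 + 2 * (x ** 2 + y)

# The chain rule gives dz/dx = 12x + 4y; at (1, 1) that is 16.
print(round(numerical_partial_x(z, 1.0, 1.0), 4))  # ≈ 16.0
```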
Taylor Series Approximation of Functions
If $f(x)$ is differentiable, i.e., its derivatives
$$\frac{df}{dx}, \quad \frac{d^2 f}{dx^2} = \frac{d}{dx}\left(\frac{df}{dx}\right), \; \ldots, \; \frac{d^n f}{dx^n}$$
exist at $X_0$ and are continuous in the neighborhood of $X_0$, then
$$f(x) = f(X_0) + \left.\frac{df}{dx}\right|_{X_0}(x - X_0) + \ldots + \frac{1}{n!}\left.\frac{d^n f}{dx^n}\right|_{X_0}(x - X_0)^n + \ldots$$
First-order approximation:
$$f(x) \approx f(X_0) + \left.\frac{df}{dx}\right|_{X_0}(x - X_0)$$
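The first-order approximation can be sketched as a function factory (illustrative code, not from the slides; the sine example and names are my own):

```python
import math

def first_order_taylor(f, df, x0):
    """Return the linearization g(x) = f(x0) + f'(x0) * (x - x0)."""
    fx0, dfx0 = f(x0), df(x0)
    return lambda x: fx0 + dfx0 * (x - x0)

# Linearize sin(x) around x0 = 0: since cos(0) = 1, sin(x) ≈ x near 0.
approx = first_order_taylor(math.sin, math.cos, 0.0)
print(approx(0.1), math.sin(0.1))
```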
Taylor Series Approximation of Multivariate Functions
Let $f = f(x_1, \ldots, x_n)$ be differentiable and continuous at $\mathbf{X}_0 = (x_{1,0}, x_{2,0}, \ldots, x_{n,0})$. Then
$$f(\mathbf{X}) \approx f(\mathbf{X}_0) + \sum_{i=1}^{n} \left.\frac{\partial f}{\partial x_i}\right|_{\mathbf{X}_0} (x_i - x_{i,0})$$
Minimizing / Maximizing Multivariate Functions
To find $\mathbf{X}^*$ that minimizes $f(\mathbf{X})$, we change our current guess $\mathbf{X}_C$ in the direction of the negative of the gradient of $f$ evaluated at $\mathbf{X}_C$:
$$\mathbf{X}_C \leftarrow \mathbf{X}_C - \eta \, \nabla f(\mathbf{X}_C), \quad \text{where } \nabla f = \left(\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n}\right)$$
for small $\eta$ (ideally infinitesimally small) (why?).
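The update rule above can be sketched as a short loop (illustrative code, not from the slides; the quadratic objective, step size, and iteration count are my own choices):

```python
def gradient_descent(grad, x0, eta=0.1, steps=100):
    """Iterate X_C <- X_C - eta * grad f(X_C) for a fixed number of steps."""
    x = list(x0)
    for _ in range(steps):
        g = grad(x)
        x = [xi - eta * gi for xi, gi in zip(x, g)]
    return x

# Minimize f(x, y) = x^2 + y^2; the gradient is (2x, 2y), minimum at (0, 0).
grad_f = lambda x: [2 * x[0], 2 * x[1]]
print(gradient_descent(grad_f, [3.0, -4.0]))
```

With eta = 0.1 each coordinate shrinks by a factor of 0.8 per step, so the iterate converges geometrically toward the minimum; too large an eta would make the updates overshoot and diverge, which is why the slide asks for small eta.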