TRANSCRIPT
UVA CS 6316/4501 – Fall 2016
Machine Learning
Lecture 3: Linear Regression
Dr. Yanjun Qi
University of Virginia
Department of Computer Science
9/1/16
HW1 OUT / DUE NEXT SAT
Where are we? → Five major sections of this course
• Regression (supervised)
• Classification (supervised)
• Unsupervised models
• Learning theory
• Graphical models
Today → Regression (supervised)
• Four ways to train/perform optimization for linear regression models
  – Normal Equation
  – Gradient Descent (GD)
  – Stochastic GD
  – Newton's method
• Supervised regression models
  – Linear regression (LR)
  – LR with non-linear basis functions
  – Locally weighted LR
  – LR with Regularizations
Today
• Linear regression (aka least squares)
• Learn to derive the least squares estimate by normal equation
• Evaluation with Cross-validation
A Dataset for regression
• Data / points / instances / examples / samples / records: [rows]
• Features / attributes / dimensions / independent variables / covariates / predictors / regressors: [columns, except the last]
• Target / outcome / response / label / dependent variable: the special column to be predicted [last column]
(the target here is a continuous valued variable)
For Example: Machine learning for apartment hunting
• Now you've moved to Charlottesville!! And you want to find the most reasonably priced apartment satisfying your needs: square ft., # of bedrooms, distance to campus, …
Living area (ft²)   # bedroom   Rent ($)
230                 1           600
506                 2           1000
433                 2           1100
109                 1           500
…
150                 1           ?
270                 1.5         ?
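As a quick sketch (NumPy assumed; this code is not part of the original slides), the known rows of the table above can be stored as a feature matrix and a target vector, with the "?" rows kept aside as query points:

```python
import numpy as np

# Known apartments: each row is [living area (ft^2), # bedrooms]
X_known = np.array([[230, 1], [506, 2], [433, 2], [109, 1]], dtype=float)
y_known = np.array([600.0, 1000.0, 1100.0, 500.0])   # rents in $

# Apartments whose rent we want to predict (the "?" rows)
X_query = np.array([[150, 1], [270, 1.5]])

print(X_known.shape, y_known.shape, X_query.shape)
```

The array names here are our own; the slides simply call these the features x and target y.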
Linear SUPERVISED Regression
e.g. Linear Regression Models

$$y = f(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2$$

⇒ Features x: Living area, distance to campus, # bedroom, …
⇒ Target y: Rent → Continuous
Remember this: "Linear"? (1D case)
• y = mx + b?
(figure: a line with slope m and intercept b, on x and y axes)
A slope of 2 (i.e. m = 2) means that every 1-unit change in X yields a 2-unit change in Y.
(figures: rent vs. living area; rent vs. living area and location)
Review: Special Uses for Matrix Multiplication
• Dot (or Inner) Product of two Vectors ⟨x, y⟩, which is the sum of the products of elements in matching positions of the two vectors
• ⟨x, y⟩ = ⟨y, x⟩, i.e. aᵀb = bᵀa
A new representation (for a single sample)
• Assume that each sample x is a column vector.
  – Here we assume a pseudo "feature" x₀ = 1 (this is the intercept term), and RE-define the feature vector to be: xᵀ = [x₀, x₁, x₂, …, x_{p−1}]
  – The parameter vector θ is also a column vector:

$$\theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_{p-1} \end{bmatrix}$$

$$y = f(x) = x^T\theta = \theta^T x$$
$$y = f(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_{p-1} x_{p-1}$$
Training/learning problem
• Now represent the whole training set (with n samples) in matrix form:

$$X = \begin{bmatrix} -\ x_1^T\ - \\ -\ x_2^T\ - \\ \vdots \\ -\ x_n^T\ - \end{bmatrix} = \begin{bmatrix} x_1^0 & x_1^1 & \cdots & x_1^{p-1} \\ x_2^0 & x_2^1 & \cdots & x_2^{p-1} \\ \vdots & \vdots & \ddots & \vdots \\ x_n^0 & x_n^1 & \cdots & x_n^{p-1} \end{bmatrix}, \qquad Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$
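The construction above can be sketched in code (NumPy assumed; the apartment data is reused purely for illustration): each raw sample row gets the pseudo-feature x₀ = 1 prepended, producing the n × p design matrix X.

```python
import numpy as np

# Raw features: living area and # bedrooms for n = 4 samples
features = np.array([[230, 1], [506, 2], [433, 2], [109, 1]], dtype=float)
Y = np.array([600.0, 1000.0, 1100.0, 500.0])

# Prepend the intercept column x0 = 1, so each row is x_i^T = [1, x_1, x_2]
n = features.shape[0]
X = np.hstack([np.ones((n, 1)), features])   # shape (n, p) with p = 3

print(X)
```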
Review: Special Uses for Matrix Multiplication
• Matrix-Vector Products (I)
Training/learning problem
• Represent in matrix form:
  – Predicted output:

$$\hat{Y} = \begin{bmatrix} f(x_1^T) \\ f(x_2^T) \\ \vdots \\ f(x_n^T) \end{bmatrix} = \begin{bmatrix} x_1^T\theta \\ x_2^T\theta \\ \vdots \\ x_n^T\theta \end{bmatrix} = X\theta$$

  – Labels (given output values):

$$Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$
Training/learning goal
• Using the matrix form, we get the following general representation of the linear regression function: Ŷ = Xθ
• Our goal is to pick the optimal θ that minimizes the following cost function:

$$J(\theta) = \frac{1}{2}\sum_{i=1}^{n}\left(f(x_i) - y_i\right)^2$$

SSE: Sum of squared error
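A minimal sketch of evaluating this SSE cost (NumPy assumed; the θ below is an arbitrary made-up parameter vector, not the optimum, and `sse_cost` is our own helper name):

```python
import numpy as np

def sse_cost(theta, X, Y):
    """J(theta) = 1/2 * sum_i (x_i^T theta - y_i)^2, vectorized."""
    residual = X @ theta - Y
    return 0.5 * residual @ residual

# Apartment data with the intercept column x0 = 1 already included
X = np.array([[1.0, 230, 1], [1.0, 506, 2], [1.0, 433, 2], [1.0, 109, 1]])
Y = np.array([600.0, 1000.0, 1100.0, 500.0])

theta = np.array([100.0, 2.0, 0.0])   # arbitrary example parameters
print(sse_cost(theta, X, Y))
```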
Today
• Linear regression (aka least squares)
• Learn to derive the least squares estimate by Normal Equation
• Evaluation with Cross-validation
Method I: normal equations
• Write the cost function in matrix form:

$$J(\theta) = \frac{1}{2}\sum_{i=1}^{n}\left(x_i^T\theta - y_i\right)^2 = \frac{1}{2}\left(X\theta - \vec{y}\right)^T\left(X\theta - \vec{y}\right) = \frac{1}{2}\left(\theta^T X^T X\theta - \theta^T X^T\vec{y} - \vec{y}^T X\theta + \vec{y}^T\vec{y}\right)$$

• To minimize J(θ), take the derivative and set it to zero:

$$X^T X\,\theta = X^T\vec{y} \qquad \text{(the normal equations)}$$

$$\Downarrow$$

$$\theta^{*} = \left(X^T X\right)^{-1} X^T\vec{y}$$
where, as before,

$$X = \begin{bmatrix} -\ x_1^T\ - \\ -\ x_2^T\ - \\ \vdots \\ -\ x_n^T\ - \end{bmatrix}, \qquad Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$
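A sketch of solving the normal equations on the apartment data (NumPy assumed; solving the linear system XᵀXθ = Xᵀy with `np.linalg.solve` is numerically preferable to explicitly inverting XᵀX):

```python
import numpy as np

X = np.array([[1.0, 230, 1], [1.0, 506, 2], [1.0, 433, 2], [1.0, 109, 1]])
y = np.array([600.0, 1000.0, 1100.0, 500.0])

# theta* = (X^T X)^{-1} X^T y, computed by solving X^T X theta = X^T y
theta_star = np.linalg.solve(X.T @ X, X.T @ y)

# Predict rents for the unseen apartments (intercept column included)
X_query = np.array([[1.0, 150, 1], [1.0, 270, 1.5]])
rent_hat = X_query @ theta_star
print(theta_star, rent_hat)
```

At the optimum the residual Xθ* − y is orthogonal to the columns of X, which is exactly what the normal equations state.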
Review: Special Uses for Matrix Multiplication
• Sum the Squared Elements of a Vector → L2 norm
• Premultiply a column vector a by its transpose: if

$$a = \begin{bmatrix} 5 \\ 2 \\ 8 \end{bmatrix}$$

then premultiplication by the row vector aᵀ = [5 2 8] yields the sum of the squared values of the elements of a, i.e.

$$a^T a = \begin{bmatrix} 5 & 2 & 8 \end{bmatrix}\begin{bmatrix} 5 \\ 2 \\ 8 \end{bmatrix} = 5^2 + 2^2 + 8^2 = 93 = \|a\|_2^2$$
$$J(\theta) = \frac{1}{2}\sum_{i=1}^{n}\left(x_i^T\theta - y_i\right)^2$$
Review: Derivative of a Quadratic Function
For y = x² − 3:

$$y' = \lim_{h\to 0}\frac{\left((x+h)^2 - 3\right) - \left(x^2 - 3\right)}{h} = \lim_{h\to 0}\frac{x^2 + 2xh + h^2 - x^2}{h} = \lim_{h\to 0}\left(2x + h\right) = 2x$$

(figures: the parabola y = x² − 3 and its derivative line y′ = 2x)

This convex function is minimized at the unique point whose derivative (slope) is zero. → By finding zeros of the derivative of this function, we can also find minima (or maxima) of that function. Here y″ = 2 > 0.
Review: Convex function
• Intuitively, a convex function (1D case) has a single point at which the derivative goes to zero, and this point is a minimum.
• Intuitively, a function f (1D case) is convex on the range [a, b] if the function's second derivative is positive everywhere in that range.
• Intuitively, if a function's Hessian is psd (positive semi-definite!), this (multivariate) function is convex.
  – Intuitively, we can think of "positive definite" matrices as the analogue of positive numbers in the matrix case.
Review: positive semi-definite!
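One way to see the positive semi-definite condition concretely (a sketch, NumPy assumed, not from the original slides): for any real matrix X, the Gram matrix XᵀX is symmetric PSD, since vᵀ(XᵀX)v = ‖Xv‖² ≥ 0, so all of its eigenvalues are non-negative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))       # any real matrix

H = X.T @ X                        # Gram matrix: symmetric and PSD
eigvals = np.linalg.eigvalsh(H)    # eigenvalues of a symmetric matrix

# PSD <=> every eigenvalue >= 0 (up to floating-point tolerance)
print(eigvals)
```

This is exactly why the SSE cost is convex: its Hessian is XᵀX.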
Extra: Hessian
Review: Matrix Calculus: Types of Matrix Derivatives
By Thomas Minka, "Old and New Matrix Algebra Useful for Statistics"
Extra: Eigenvalues of Hessian → Curvature
(figures: Positive Definite Hessian vs. Negative Definite Hessian)
Review: Some important rules for taking derivatives
Review: Some important rules for taking gradient and Hessian
Comments on the normal equation
• In most situations of practical interest, the number of data points n is larger than the dimensionality p of the input space and the matrix X is of full column rank. If this condition holds, then it is easy to verify that XᵀX is necessarily invertible.
• The assumption that XᵀX is invertible implies that it is positive definite, and thus the critical point we have found is a minimum.
• What if X has less than full column rank? → regularization (later).
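To see the rank-deficient case concretely (a sketch, NumPy assumed, not from the slides): if one feature column duplicates another, XᵀX is singular and cannot be inverted, but a least-squares solver such as `np.linalg.lstsq` still returns a minimum-norm solution.

```python
import numpy as np

# Third column duplicates the second, so X has rank 2 < 3 columns
X = np.array([[1.0, 2.0, 2.0],
              [1.0, 3.0, 3.0],
              [1.0, 5.0, 5.0],
              [1.0, 7.0, 7.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])

# X^T X is singular here, so the normal-equation inverse does not exist...
rank = np.linalg.matrix_rank(X.T @ X)

# ...but lstsq still minimizes ||X theta - y||^2 (minimum-norm theta)
theta, residuals, rank_x, sv = np.linalg.lstsq(X, y, rcond=None)
print(rank, theta)
```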
Scalability to big data?
• Traditional CS view: a polynomial-time algorithm, Wow!
• Large-scale learning: Sometimes even O(n) is bad!
Today
• Linear regression (aka least squares)
• Learn to derive the least squares estimate by optimization
• Evaluation with Train/Test OR k-folds Cross-validation
TYPICAL MACHINE LEARNING SYSTEM
(pipeline: Low-level sensing → Pre-processing → Feature Extract → Feature Select → Inference, Prediction, Recognition; Label Collection feeds the Evaluation step)
Evaluation Choice-I: Train and Test
• Training dataset consists of input-output pairs
• Evaluation: measure the loss on the pair → ( f(x?), y? )
Evaluation Choice-I: e.g. for supervised classification
✓ Training (Learning): Learn a model using the training data
✓ Testing: Test the model using unseen test data to assess the model accuracy

Accuracy = (Number of correct classifications) / (Total number of test cases)
Evaluation Choice-I: e.g. for linear regression models

$$X_{train} = \begin{bmatrix} -\ x_1^T\ - \\ -\ x_2^T\ - \\ \vdots \\ -\ x_n^T\ - \end{bmatrix}, \quad \vec{y}_{train} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad X_{test} = \begin{bmatrix} -\ x_{n+1}^T\ - \\ -\ x_{n+2}^T\ - \\ \vdots \\ -\ x_{n+m}^T\ - \end{bmatrix}, \quad \vec{y}_{test} = \begin{bmatrix} y_{n+1} \\ y_{n+2} \\ \vdots \\ y_{n+m} \end{bmatrix}$$
Evaluation Choice-I: e.g. for linear regression models
• Training SSE (sum of squared error):

$$J_{train}(\theta) = \frac{1}{2}\sum_{i=1}^{n}\left(x_i^T\theta - y_i\right)^2$$

• Minimize J_train(θ) → the Normal Equation gives

$$\theta^{*} = \arg\min_{\theta} J_{train}(\theta) = \left(X_{train}^T X_{train}\right)^{-1} X_{train}^T\,\vec{y}_{train}$$
Evaluation Choice-I: e.g. for regression models
• Testing MSE error to report:

$$J_{test} = \frac{1}{m}\sum_{i=n+1}^{n+m}\left(x_i^T\theta^{*} - y_i\right)^2$$
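The whole Choice-I pipeline can be sketched end to end (NumPy assumed, with synthetic data of our own invention): fit θ* on the training split via the normal equation, then report the MSE on the held-out test split.

```python
import numpy as np

rng = np.random.default_rng(42)
n, m, p = 80, 20, 3                         # train size, test size, #params
X = np.hstack([np.ones((n + m, 1)), rng.normal(size=(n + m, p - 1))])
true_theta = np.array([1.0, 2.0, -3.0])
y = X @ true_theta + 0.1 * rng.normal(size=n + m)

X_train, y_train = X[:n], y[:n]
X_test, y_test = X[n:], y[n:]

# Fit on the training split only: theta* = (Xtr^T Xtr)^{-1} Xtr^T ytr
theta_star = np.linalg.solve(X_train.T @ X_train, X_train.T @ y_train)

# Report the mean squared error on the held-out test split
test_mse = np.mean((X_test @ theta_star - y_test) ** 2)
print(theta_star, test_mse)
```

With small observation noise, the recovered θ* lands close to the generating parameters and the test MSE is close to the noise variance.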
Evaluation Choice-II: Cross Validation
• Problem: don't have enough data to set aside a test set
• Solution: each data point is used both as train and test
• Common types:
  – K-fold cross-validation (e.g. K = 5, K = 10)
  – 2-fold cross-validation
  – Leave-one-out cross-validation (LOOCV, i.e., K = n)
K-fold Cross Validation
• Basic idea:
  – Split the whole data into K pieces;
  – use K − 1 pieces to fit the model, and 1 for test;
  – cycle through all K cases;
  – K = 10 "folds" is a common rule of thumb.
• The advantage:
  – all pieces are used for both training and validation;
  – each observation is used for validation exactly once.
e.g. 10-fold Cross Validation

model  P1    P2    P3    P4    P5    P6    P7    P8    P9    P10
1      train train train train train train train train train test
2      train train train train train train train train test  train
3      train train train train train train train test  train train
4      train train train train train train test  train train train
5      train train train train train test  train train train train
6      train train train train test  train train train train train
7      train train train test  train train train train train train
8      train train test  train train train train train train train
9      train test  train train train train train train train train
10     test  train train train train train train train train train

• Divide the data into 10 equal pieces
• 9 pieces as the training set, the remaining 1 as the test set
• Collect the scores from the diagonal
• We normally use the mean of the scores
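The procedure above can be sketched directly (NumPy assumed; the helper name `kfold_mse` and the synthetic data are our own). Each point is used for validation exactly once, and setting K = n gives LOOCV.

```python
import numpy as np

def kfold_mse(X, y, K=10):
    """Average validation MSE of the normal-equation fit over K folds."""
    n = len(y)
    folds = np.array_split(np.arange(n), K)   # K near-equal pieces
    scores = []
    for k in range(K):
        val = folds[k]                        # 1 piece for validation
        train = np.concatenate([folds[j] for j in range(K) if j != k])
        theta = np.linalg.solve(X[train].T @ X[train], X[train].T @ y[train])
        scores.append(np.mean((X[val] @ theta - y[val]) ** 2))
    return float(np.mean(scores))             # mean of the K fold scores

rng = np.random.default_rng(0)
X = np.hstack([np.ones((50, 1)), rng.normal(size=(50, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + 0.1 * rng.normal(size=50)
print(kfold_mse(X, y, K=10))
```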
e.g. Leave-one-out / LOOCV (n-fold cross validation)
Today Recap
• Linear regression (aka least squares)
• Learn to derive the least squares estimate by normal equation
• Evaluation with Train/Test OR k-folds Cross-validation
References
• Big thanks to Prof. Eric Xing @ CMU for allowing me to reuse some of his slides
• http://www.cs.cmu.edu/~zkolter/course/15-884/linalg-review.pdf (please read)
• Prof. Alexander Gray's slides