UVA CS 4501: Machine Learning
Lecture 5: Non-Linear Regression Models

Dr. Yanjun Qi
University of Virginia, Department of Computer Science
10/6/18
Where are we? → Five major sections of this course

- Regression (supervised)
- Classification (supervised)
- Unsupervised models
- Learning theory
- Graphical models

Regression (supervised):
- Four ways to train/perform optimization for linear regression models: Normal Equation, Gradient Descent (GD), Stochastic GD, Newton's method
- Supervised regression models: Linear regression (LR), LR with non-linear basis functions, Locally weighted LR, LR with regularizations
Today

- Regression models beyond linear:
  - LR with non-linear basis functions
  - Instance-based regression: K-Nearest Neighbors (later)
  - Locally weighted linear regression (extra)
  - Regression trees and multilinear interpolation (later)
LR with non-linear basis functions

- LR does not mean we can only deal with linear relationships. Instead of $y = \theta^T x$, we can fit

$y = \theta_0 + \sum_{j=1}^{m}\theta_j\,\phi_j(x) = \theta^T\phi(x)$
LR with non-linear basis functions

- We are free to design the basis functions (e.g., non-linear features). Here the $\phi_j(x)$ are fixed basis functions (also define $\phi_0(x) = 1$).
- E.g., polynomial regression: $\phi(x) := [1,\; x,\; x^2]^T$
e.g. (1) polynomial regression

$y = \theta^T\phi(x)$ (vs. plain LR: $y = \theta^T x$), with $\phi(x) := [1,\; x,\; x^2]^T$

$\theta^* = (\Phi^T\Phi)^{-1}\Phi^T\vec{y}$ (vs. plain LR: $\theta^* = (X^T X)^{-1} X^T\vec{y}$)

KEY: if the bases are given, the problem of learning the parameters $\theta$ is still linear.
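The normal-equation fit with a polynomial basis can be sketched in a few lines of NumPy. This is an illustrative example, not code from the lecture; the function names and toy data are made up.

```python
# A minimal sketch of polynomial regression via the normal equation,
# theta* = (Phi^T Phi)^{-1} Phi^T y, with basis phi(x) = [1, x, x^2].
import numpy as np

def poly_design_matrix(x, degree=2):
    """Map a 1D input vector to the polynomial basis [1, x, ..., x^degree]."""
    return np.vander(x, N=degree + 1, increasing=True)

def fit_normal_equation(Phi, y):
    """Solve the least-squares problem; lstsq is used instead of an
    explicit matrix inverse for numerical stability."""
    theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return theta

# Toy data generated from y = 1 + 2x + 3x^2 (no noise), so the fit is exact.
x = np.linspace(-1, 1, 20)
y = 1 + 2 * x + 3 * x**2
theta = fit_normal_equation(poly_design_matrix(x, degree=2), y)
print(theta)  # close to [1, 2, 3]
```

Because the basis is fixed, the optimization is the same linear least-squares problem as plain LR, just with $\Phi$ in place of $X$.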
Many Possible Basis Functions

- There are many basis functions, e.g.:
  - Polynomial: $\phi_j(x) = x^{j-1}$
  - Radial basis functions: $\phi_j(x) = \exp\left(-\frac{(x - \mu_j)^2}{2s^2}\right)$
  - Sigmoidal: $\phi_j(x) = \sigma\left(\frac{x - \mu_j}{s}\right)$
  - Splines, Fourier, wavelets, etc.
e.g. (2) LR with radial-basis functions

- E.g., LR with RBF regression:

$y = \theta_0 + \sum_{j=1}^{m}\theta_j\,\phi_j(x) = \phi(x)^T\theta$

$\phi(x) := [1,\; K_{\lambda_1}(x, r_1),\; K_{\lambda_2}(x, r_2),\; K_{\lambda_3}(x, r_3),\; K_{\lambda_4}(x, r_4)]^T$

$\theta^* = (\Phi^T\Phi)^{-1}\Phi^T\vec{y}$

$K_\lambda(x, r) = \exp\left(-\frac{(x - r)^2}{2\lambda^2}\right)$

RBF = radial-basis function: a function which depends only on the radial distance from a centre point.
Gaussian RBF → as the distance from the centre $r$ increases, the output of the RBF decreases (1D case, 2D case).
$K_\lambda(x, r) = \exp\left(-\frac{(x - r)^2}{2\lambda^2}\right)$

Evaluating the kernel as $x$ moves away from the centre, at $x = r,\; r+\lambda,\; r+2\lambda,\; r+3\lambda$, gives: 1, 0.6065307, 0.1353353, 0.0001234098.

e.g. another linear regression with 1D RBF basis functions (assuming 3 predefined centres and width):

$\phi(x) := [1,\; K_{\lambda_1}(x, r_1),\; K_{\lambda_2}(x, r_2),\; K_{\lambda_3}(x, r_3)]^T$

$\theta^* = (\Phi^T\Phi)^{-1}\Phi^T\vec{y}$
e.g. a LR with 1D RBFs (3 predefined centres and width)

- 1D RBF bases
- After fit:
(figures omitted)
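The 3-centre 1D RBF regression above can be sketched as follows. This is an illustrative NumPy example; the centres, width, and toy coefficients are made-up values, not the ones on the slides.

```python
# A sketch of LR with 1D Gaussian RBF bases: phi(x) = [1, K(x,r1), K(x,r2), K(x,r3)]
# with 3 predefined centres and a shared width lambda, fit by least squares.
import numpy as np

def rbf(x, r, lam):
    """Gaussian RBF K_lambda(x, r) = exp(-(x - r)^2 / (2 lambda^2))."""
    return np.exp(-((x - r) ** 2) / (2 * lam**2))

def rbf_design_matrix(x, centres, lam):
    """Stack the bias column and one RBF column per centre."""
    cols = [np.ones_like(x)] + [rbf(x, r, lam) for r in centres]
    return np.column_stack(cols)

centres, lam = [-1.0, 0.0, 1.0], 0.5
x = np.linspace(-2, 2, 50)
# Toy target that is itself a combination of the bases, so recovery is exact.
true_theta = np.array([0.5, 1.0, -2.0, 1.5])
y = rbf_design_matrix(x, centres, lam) @ true_theta

Phi = rbf_design_matrix(x, centres, lam)
theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(theta)  # close to [0.5, 1.0, -2.0, 1.5]
```

Note the fit is linear in $\theta$ even though each RBF column is a non-linear function of $x$.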
e.g. Even more possible basis functions?
(2) Multivariate Linear Regression with Basis Expansion

Task:                 Regression
Representation:       Y = weighted linear sum of (X basis expansion)
Score Function:       SSE
Search/Optimization:  Linear algebra
Models, Parameters:   Regression coefficients

$y = \theta_0 + \sum_{j=1}^{m}\theta_j\,\phi_j(x) = \phi(x)^T\theta$
Two main issues:

- To learn the parameter $\theta^*$: almost the same as LR, just change $X$ to $\phi(x)$; the model is a linear combination of basis functions (that can themselves be non-linear).
- How to choose the model order: e.g., what polynomial degree for polynomial regression? E.g., where to put the centers for the RBF kernels, and how wide?
e.g. 2D Good and Bad RBF Basis

- A good 2D RBF
- Two bad 2D RBFs
(figures omitted)
Issue: Overfitting and Underfitting

$y = f(x) + \text{noise}$. Can we learn a regression $f$ from the data? Let's consider three methods:
- Linear regression
- Quadratic regression
- Join-the-dots (also known as piecewise linear nonparametric regression, if that makes you feel better)

Which is best? Why not choose the method with the best fit to the data? What do we really want? "How well are you going to predict future data drawn from the same distribution?"

Underfit / Good? / Overfit
Issue: Overfitting and underfitting

$y = \theta_0 + \theta_1 x$ (underfit)  vs.  $y = \theta_0 + \theta_1 x + \theta_2 x^2$ (looks good)  vs.  $y = \sum_{j=0}^{5}\theta_j x^j$ (overfit)

K-fold Cross Validation / Train-Test:
Generalisation: learn a function/hypothesis from past data in order to "explain", "predict", "model" or "control" new data examples.
The test set method

1. Randomly choose some percentage, like 30%, of the labeled data to be in a test set.
2. The remainder is a training set.
3. Perform your regression on the training set.
4. Estimate your future performance with the test set.

(Linear regression example) Mean Squared Error = 2.4

Credit: Prof. Andrew Moore
Evaluation: e.g. train/test split as follows

$X_{\text{train}} = \begin{bmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_n^T \end{bmatrix},\quad \vec{y}_{\text{train}} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix},\quad X_{\text{test}} = \begin{bmatrix} x_{n+1}^T \\ x_{n+2}^T \\ \vdots \\ x_{n+m}^T \end{bmatrix},\quad \vec{y}_{\text{test}} = \begin{bmatrix} y_{n+1} \\ y_{n+2} \\ \vdots \\ y_{n+m} \end{bmatrix}$
Testing MSE error to report:

$J_{test} = \frac{1}{m}\sum_{i=n+1}^{n+m}(x_i^T\theta^* - y_i)^2 = \frac{1}{m}\sum_{i=n+1}^{n+m}\varepsilon_i^2$

Train MSE error to observe:

$J_{train\text{-}MSE} = \frac{1}{n}\sum_{i=1}^{n}(x_i^T\theta^* - y_i)^2$

In homework, when we ask for plots of training error, we ask for the per-sample train MSE errors, because they are comparable to the test MSE error. In many situations, visualizing the train MSE can be helpful to understand the behavior of your method, e.g., the influence of the hyperparameter you chose.
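The test-set method and the two per-sample MSE formulas can be sketched together. This is an illustrative NumPy example with synthetic data, not the dataset from the slides.

```python
# A sketch of the test-set method: random 70/30 split, fit [1, x] linear
# regression on the training set, report per-sample train and test MSE.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100)
y = 2 * x + 1 + rng.normal(0, 0.1, size=100)   # noisy linear toy data

# 1-2. Randomly choose 30% as the test set; the remainder is the training set.
idx = rng.permutation(100)
train_idx, test_idx = idx[:70], idx[70:]

# 3. Perform the regression on the training set (design matrix [1, x]).
X_train = np.column_stack([np.ones(70), x[train_idx]])
theta, *_ = np.linalg.lstsq(X_train, y[train_idx], rcond=None)

# 4. Estimate future performance with the per-sample MSE on the test set.
def mse(idxs):
    pred = theta[0] + theta[1] * x[idxs]
    return np.mean((pred - y[idxs]) ** 2)

j_train, j_test = mse(train_idx), mse(test_idx)
print(j_train, j_test)  # both near the noise variance, ~0.01
```

Because both errors are per-sample averages, $J_{train}$ and $J_{test}$ are directly comparable even though the two splits have different sizes.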
The test set method (continued)

- (Quadratic regression example) Mean Squared Error = 0.9
- (Join-the-dots example) Mean Squared Error = 2.2

Good news:
- Very, very simple
- Can then simply choose the method with the best test-set score

Bad news:
- Wastes data: we get an estimate of the best method to apply to 30% less data
- If we don't have much data, our test set might just be lucky or unlucky

We say the "test-set estimator of performance has high variance".

Credit: Prof. Andrew Moore
Regression: Complexity versus Goodness of Fit

Training data fits range from too simple, to about right, to too complex (figures omitted):
- Too simple: underfitting → low variance / high bias
- Too complex: overfitting → low bias / high variance

What ultimately matters: GENERALIZATION.
LOOCV (Leave-one-out Cross Validation)

For k = 1 to n:
1. Let $(x_k, y_k)$ be the k-th record.
2. Temporarily remove $(x_k, y_k)$ from the dataset.
3. Train on the remaining n−1 datapoints.
4. Note your error on $(x_k, y_k)$.

When you've done all points, report the mean error.

(Linear regression example) MSE_LOOCV = 2.12

Credit: Prof. Andrew Moore
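The LOOCV loop above can be sketched directly. This is an illustrative NumPy example on synthetic data, not the slide's dataset, so the resulting MSE differs from the 2.12 shown there.

```python
# A sketch of LOOCV for linear regression: for each k, hold out (x_k, y_k),
# fit on the other n-1 points, record the squared error, report the mean.
import numpy as np

def loocv_mse(x, y):
    n = len(x)
    errors = []
    for k in range(n):
        mask = np.arange(n) != k               # temporarily remove record k
        X = np.column_stack([np.ones(n - 1), x[mask]])
        theta, *_ = np.linalg.lstsq(X, y[mask], rcond=None)
        pred = theta[0] + theta[1] * x[k]      # note the error on (x_k, y_k)
        errors.append((pred - y[k]) ** 2)
    return np.mean(errors)                     # mean error over all points

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 30)
y = 3 * x - 1 + rng.normal(0, 0.1, size=30)
result = loocv_mse(x, y)
print(result)  # near the noise variance, ~0.01
```

Each of the n fits uses nearly all the data, which is why LOOCV "doesn't waste data" at the cost of n separate trainings.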
LOOCV for Quadratic Regression: same procedure; MSE_LOOCV = 0.962.

Credit: Prof. Andrew Moore
LOOCV for Join-the-dots: same procedure; MSE_LOOCV = 3.33.

Credit: Prof. Andrew Moore
Which kind of Cross Validation?

Method         Downside                                              Upside
Test-set       Variance: unreliable estimate of future performance   Cheap
Leave-one-out  Expensive; has some weird behavior                    Doesn't waste data

...can we get the best of both worlds?

Credit: Prof. Andrew Moore
e.g. By k = 10-fold Cross Validation

fold  P1    P2    P3    P4    P5    P6    P7    P8    P9    P10
1     train train train train train train train train train test
2     train train train train train train train train test  train
3     train train train train train train train test  train train
4     train train train train train train test  train train train
5     train train train train train test  train train train train
6     train train train train test  train train train train train
7     train train train test  train train train train train train
8     train train test  train train train train train train train
9     train test  train train train train train train train train
10    test  train train train train train train train train train

- Divide data into 10 equal pieces
- 9 pieces as training set, the remaining 1 as test set
- Collect the scores from the "test" diagonal
- We normally use the mean of the scores

Make sure that the train/test/validation folds are indeed independent samples.
k-fold Cross Validation

Randomly break the dataset into k partitions (in our example we'll have k = 3 partitions, colored purple, green, and blue).

For each partition in turn (blue, green, purple): train on all the points not in that partition, then find the test-set sum of errors on the held-out points. Then report the mean error.

- Linear regression: MSE_3FOLD = 2.05
- Quadratic regression: MSE_3FOLD = 1.11
- Join-the-dots: MSE_3FOLD = 2.93

Credit: Prof. Andrew Moore
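The k-fold comparison of models can be sketched as follows. This is an illustrative NumPy example (synthetic data, k = 3), so the MSE values differ from the slide's 2.05 / 1.11 / 2.93.

```python
# A sketch of k-fold CV (k = 3) used to compare polynomial degrees.
import numpy as np

def kfold_mse(x, y, degree, k=3, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    folds = np.array_split(idx, k)             # randomly break into k partitions
    errs = []
    for f in folds:
        train = np.setdiff1d(idx, f)
        coefs = np.polyfit(x[train], y[train], degree)  # train off the fold
        pred = np.polyval(coefs, x[f])                  # test on the held-out fold
        errs.append(np.mean((pred - y[f]) ** 2))
    return np.mean(errs)                       # report the mean error

rng = np.random.default_rng(2)
x = rng.uniform(-2, 2, 60)
y = x**2 + rng.normal(0, 0.2, size=60)         # quadratic ground truth

mse_lin = kfold_mse(x, y, degree=1)
mse_quad = kfold_mse(x, y, degree=2)
print(mse_lin, mse_quad)  # the quadratic model should score much better
```

With a quadratic ground truth, 3-fold CV correctly prefers the quadratic model over the linear one, mirroring the slide's comparison.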
Which kind of Cross Validation?

Method         Downside                                                        Upside
Test-set       Variance: unreliable estimate of future performance             Cheap
Leave-one-out  Expensive; has some weird behavior                              Doesn't waste data
10-fold        Wastes 10% of the data; 10 times more expensive than test set   Only wastes 10%; only 10 times more expensive instead of n times
3-fold         Wastes more data than 10-fold; more expensive than test set     Better than test-set
n-fold         Identical to leave-one-out
CV-based Model Selection

- We're trying to decide which algorithm to use.
- We train each machine and make a table:

i   f_i   TRAINERR   k-FOLD-CV-ERR   Choice
1   f1
2   f2
3   f3                               ✓
4   f4
5   f5
6   f6

Credit: Prof. Andrew Moore; Credit: Stanford Machine Learning course
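Filling in such a table can be sketched as follows: train error alone keeps dropping as the model gets more complex, while the CV error column identifies a sensible model. This is an illustrative NumPy example; the candidate "machines" are polynomial fits of degree 1 to 5 on made-up data.

```python
# A sketch of CV-based model selection: for polynomial machines f_1..f_5
# (degree 1..5), record train error and k-fold CV error, then choose the
# degree with the lowest CV error.
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(-2, 2, 60)
y = 1 + 2 * x - x**2 + rng.normal(0, 0.3, size=60)   # true degree is 2

def cv_err(degree, k=5):
    # Contiguous index folds; x is drawn i.i.d., so order is already random.
    folds = np.array_split(np.arange(60), k)
    errs = []
    for f in folds:
        tr = np.setdiff1d(np.arange(60), f)
        c = np.polyfit(x[tr], y[tr], degree)
        errs.append(np.mean((np.polyval(c, x[f]) - y[f]) ** 2))
    return np.mean(errs)

table = {}
for d in range(1, 6):
    c = np.polyfit(x, y, d)
    train_err = np.mean((np.polyval(c, x) - y) ** 2)
    table[d] = (train_err, cv_err(d))

best = min(table, key=lambda d: table[d][1])   # the "choice" column
print(table, best)
```

Train error is non-increasing in degree (the bases are nested), so it cannot be used for the choice; CV error rules out the underfit linear model.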
References

- Big thanks to Prof. Eric Xing @ CMU for allowing me to reuse some of his slides
- Prof. Nando de Freitas's tutorial slides
- Prof. Andrew Moore's slides @ CMU
Extra: Performance vs. training size, for an example

- The results from batch GD and online (stochastic) GD updates are almost identical, so the plots coincide.
- The test MSE from the normal equation is higher than that of batch GD and SGD for small training sizes. This is probably due to overfitting.
- In the batch and online runs, only 2000 (for example) iterations are allowed at most; this roughly acts as a mechanism that avoids overfitting.
(Dr. Eric Xing's tutorial slide)

EXTRA: MORE REGRESSION MODELS
Extra Today

- Regression models beyond linear:
  - LR with non-linear basis functions
  - Instance-based regression: K-Nearest Neighbors (later)
  - Locally weighted linear regression (extra)
  - Regression trees and multilinear interpolation (later)
K-Nearest Neighbor

- Features:
  - All instances correspond to points in a p-dimensional Euclidean space
  - Regression is delayed till a new instance arrives
  - Regression is done by comparing feature vectors of the different points
  - Target function may be discrete or real-valued
- When the target is continuous, the prediction is the mean value of the k nearest training examples.

K = 5-Nearest Neighbor (1D input); K = 1-Nearest Neighbor (1D input): (figures omitted)
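The "mean of the k nearest training examples" rule can be sketched in a few lines. This is an illustrative NumPy example with made-up 1D data, not the figure's dataset.

```python
# A sketch of k-NN regression (1D input): the prediction at a query point
# is the mean y of the k nearest training examples.
import numpy as np

def knn_predict(x_train, y_train, x_query, k=5):
    dists = np.abs(x_train - x_query)          # Euclidean distance in 1D
    nearest = np.argsort(dists)[:k]            # indices of the k nearest points
    return np.mean(y_train[nearest])           # mean of their targets

x_train = np.arange(10, dtype=float)           # 0, 1, ..., 9
y_train = 2 * x_train                          # perfectly linear toy targets
pred = knn_predict(x_train, y_train, x_query=4.0, k=5)
print(pred)  # → 8.0  (mean of y at x = 2, 3, 4, 5, 6)
```

No training step happens: all the work is deferred until the query arrives, which is exactly the "lazy" behavior described below.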
K-Nearest Neighbor

Task:                 Regression / classification
Representation:       Local smoothness
Score Function:       NA
Search/Optimization:  NA
Models, Parameters:   Training samples
Variants: Distance-Weighted k-Nearest Neighbor Algorithm

- Assign weights to the neighbors based on their "distance" from the query point; the weight "may" be the inverse square of the distance.
- All training points may influence a particular instance, e.g., Shepard's method / Modified Shepard from geospatial analysis.
Instance-based Regression vs. Linear Regression

- Linear regression learning: explicit description of the target function on the whole training set.
- Instance-based learning: learning = storing all training instances; referred to as "lazy" learning.
Locally weighted regression

- a.k.a. local linear regression, LOESS, ...
- A combination of kNN and linear regression.
- Use an RBF function $K_\lambda(x_i, x_0)$ to pick out / emphasize the neighbor region of the query point $x_0$.
Locally weighted regression

- A linear function $x \mapsto y$, $f(x_0) = \hat\theta_0(x_0) + \hat\theta_1(x_0)\, x_0$, only needs to represent the neighbor region of $x_0$ (weighted by $K_\lambda(x_i, x_0)$).
Locally weighted linear regression

Instead of minimizing

$J(\theta) = \frac{1}{2}\sum_{i=1}^{n}(x_i^T\theta - y_i)^2$

we now fit to minimize

$J(\theta) = \frac{1}{2}\sum_{i=1}^{n} w_i\,(x_i^T\theta - y_i)^2$

where

$w_i = K_\lambda(x_i, x_0) = \exp\left(-\frac{(x_i - x_0)^2}{2\lambda^2}\right)$

and $x_0$ is the query point for which we'd like to know its corresponding $y$.
Locally weighted linear regression

We fit $\theta$ to minimize the weighted objective $J(\theta)$, where each weight $w_i$ comes from the query point $x_0$ for which we'd like to know the corresponding $y$. Essentially we put higher weights on training examples that are close to the query point $x_0$ than on those that are further away from it.

- The width $\lambda$ of the RBF matters!
LEARNING of locally weighted linear regression

- Separate weighted least squares training and inference at each target point $x_0$:

$f(x_0) = x_0^T \theta^*(x_0)$

$\theta^*(x_0) = \arg\min_\theta \frac{1}{2}\sum_{i=1}^{n} w_i\,(x_i^T\theta(x_0) - y_i)^2 = \arg\min_\theta \frac{1}{2}\sum_{i=1}^{n} K_\lambda(x_i, x_0)\,(x_i^T\theta(x_0) - y_i)^2$

Equivalently, in 1D notation: $\hat f(x_0) = \hat\alpha(x_0) + \hat\beta(x_0)\, x_0$

Extra: solution of locally weighted linear / non-linear-basis regression. Let $W(x_0) = \mathrm{diag}\big(K_\lambda(x_0, x_i)\big),\ i = 1, \ldots, N$ (an $N \times N$ diagonal matrix). Then

LWR: $\theta^*(x_0) = (B^T W(x_0) B)^{-1} B^T W(x_0)\, y$
versus LR: $\theta^* = (X^T X)^{-1} X^T \vec{y}$
More → Local Weighted Polynomial Regression

- Local polynomial fits of any degree d:

$\min_{\alpha(x_0),\,\beta_j(x_0),\, j=1,\ldots,d}\ \sum_{i=1}^{N} K_\lambda(x_0, x_i)\left[y_i - \alpha(x_0) - \sum_{j=1}^{d}\beta_j(x_0)\, x_i^j\right]^2$

$\hat f(x_0) = \hat\alpha(x_0) + \sum_{j=1}^{d}\hat\beta_j(x_0)\, x_0^j$

(Figure: blue = true, green = estimated)
Extra: Parametric vs. non-parametric

- Locally weighted linear regression is a non-parametric algorithm.
- The (unweighted) linear regression algorithm that we saw earlier is known as a parametric learning algorithm:
  - because it has a fixed, finite number of parameters (the $\theta$), which are fit to the data;
  - once we've fit the $\theta$ and stored them away, we no longer need to keep the training data around to make future predictions;
  - in contrast, to make predictions using locally weighted linear regression, we need to keep the entire training set around.
- The term "non-parametric" (roughly) refers to the fact that the amount of knowledge we need to keep in order to represent the hypothesis grows linearly with the size of the training set.
(3) Locally Weighted / Kernel Linear Regression

Task:                 Regression
Representation:       Y = weighted linear sum of X's
Score Function:       Weighted SSE
Search/Optimization:  Linear algebra
Models, Parameters:   Local regression coefficients (conditioned on each test point)

$\min_{\alpha(x_0),\,\beta(x_0)}\ \sum_{i=1}^{N} K_\lambda(x_0, x_i)\,\big[y_i - \alpha(x_0) - \beta(x_0)\, x_i\big]^2,\qquad \hat f(x_0) = \hat\alpha(x_0) + \hat\beta(x_0)\, x_0$

$\theta^*(x_0) = (B^T W(x_0) B)^{-1} B^T W(x_0)\, y$