TRANSCRIPT
CSCI 5654 (Linear Programming, Fall 2013), Lectures 10-12
Lectures 10,11 Slide# 1
Today’s Lecture
1. Introduction to norms: L1, L2, L∞.
2. Casting absolute value and max operators.
3. Norm minimization problems.
4. Applications: fitting, classification, denoising, etc.
Lectures 10,11 Slide# 2
Norm
Basic idea: a notion of length of a vector / distance between vectors.
Definition (Norm): Let ℓ : X → R≥0 be a mapping from a vector space to the non-negative reals. The function ℓ is a norm iff
1. ℓ(~x) = 0 iff ~x = ~0,
2. ℓ(a~x) = |a| ℓ(~x),
3. ℓ(~x + ~y) ≤ ℓ(~x) + ℓ(~y) (Triangle Inequality).
“Length” of ~x : ℓ(~x).
“Distance” between ~x , ~y is ℓ(~y − ~x)(= ℓ(~x − ~y)).
Lectures 10,11 Slide# 3
L2 (Euclidean) norm
Distance between (x1, y1) and (x2, y2) is √((x1 − x2)^2 + (y1 − y2)^2).
In general, ||~x||_2 = √(~x^T ~x). Distance between ~x, ~y is given by ||~x − ~y||_2.
Lectures 10,11 Slide# 4
Lp norm
For p ≥ 1, we define Lp norm as
||~x||_p = (|x1|^p + |x2|^p + · · · + |xn|^p)^(1/p).
The Lp norm distance between two vectors is ||~x − ~y||_p.
L1 norm: ||~x||_1 = |x1| + |x2| + · · · + |xn|.
Class exercise: verify that the L1 norm distance is indeed a norm.
Q: Why is p ≥ 1 necessary?
Lectures 10,11 Slide# 5
L∞ norm
Obtained as the limit lim_{p→∞} ||~x||_p.
Definition: For a vector ~x, ||~x||_∞ is defined as
||~x||_∞ = max(|x1|, |x2|, . . . , |xn|).
Ex-1: Can we verify that L∞ is really a norm?
Ex-2 (challenge): Prove that the limit definition coincides with the explicit form of the L∞ distance.
Lectures 10,11 Slide# 6
Norm: Exercise
Take vectors ~x = (0, 1, −1), ~y = (2, 1, 1).
Write down the L1, L2, L∞ norm distances between ~x, ~y.
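As a quick check of this exercise, here is a minimal NumPy sketch (expected answers: 4, √8 ≈ 2.83, and 2):

```python
import numpy as np

x = np.array([0, 1, -1])
y = np.array([2, 1, 1])
d = x - y  # d = (-2, 0, -2)

print(np.linalg.norm(d, 1))       # L1 distance: 2 + 0 + 2 = 4
print(np.linalg.norm(d, 2))       # L2 distance: sqrt(4 + 0 + 4) ~ 2.83
print(np.linalg.norm(d, np.inf))  # L-infinity distance: max(2, 0, 2) = 2
```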
Lectures 10,11 Slide# 7
Norm Spheres
ε-Sphere: {~x | d(~x, ~0) ≤ ε} for norm d. The 1-spheres corresponding to the norms L1, L∞, L2:
[Figure: the unit spheres of the L2, L1, and L∞ norms.]
Note: Norm spheres are convex sets.
Lectures 10,11 Slide# 8
Norm Minimization problems
General form:
minimize ||~x||_p
A~x ≤ ~b
Note-1: We always minimize the norm functions (in this course).
Note-2: Maximization of a norm is generally a hard (non-convex) problem.
Lectures 10,11 Slide# 9
Unconstrained Norm Minimization
Let's first study the following problem:
min. ||Ax − b||p .
1. For p = 2, this problem is called (unconstrained) least squares. We will solve this using calculus.
2. For p = 1, ∞, this problem is called L1 (L∞) least squares. We will reduce this problem to LP.
3. Applications: solving (noisy) linear systems, regularization, denoising, max. likelihood estimation, regression, and so on.
Lectures 10,11 Slide# 10
Unconstrained Least Squares
Decision Variables: ~x = (x1, . . . , xn) ∈ Rn.
Objective function: min. ||A~x − ~b||_2.
Constraints: No explicit constraint.
Lectures 10,11 Slide# 11
Least Squares: Example
Example: min. ||(2x + 3y − 1, y)||_2.
Solution: We equivalently minimize (2x + 3y − 1)^2 + y^2. Using the first-order optimality conditions (set the partial derivatives to zero), we obtain:
∂/∂x [(2x + 3y − 1)^2 + y^2] = 0
∂/∂y [(2x + 3y − 1)^2 + y^2] = 0
In other words, we obtain:
4(2x + 3y − 1) = 0
6(2x + 3y − 1) + 2y = 0
The optimum lies at y = 0, x = 1/2.
Verify the minimum by checking the second-order derivative (Hessian matrix).
Lectures 10,11 Slide# 12
Unconstrained Least Squares
Problem: minimize ||A~x − ~b||_2.
Solution: We use calculus, finding the minimum via partial derivatives. Recall:
∇f(x1, . . . , xn) = (∂x1 f, ∂x2 f, . . . , ∂xn f).
The criterion for an optimal point of f(~x) is that ∇f = (0, . . . , 0).
∇(||A~x − ~b||_2^2) = ∇((A~x − ~b)^T (A~x − ~b))
                   = ∇(~x^T A^T A ~x − 2 ~x^T A^T ~b + ~b^T ~b)
                   = 2 A^T A ~x − 2 A^T ~b
                   = 0
The minimum occurs at ~x* = (A^T A)^{-1} A^T ~b (assuming A has full column rank).
A similar calculus solution works for the Lp norm when p is an even number.
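As a sanity check, here is a minimal NumPy sketch for the worked example above (A and ~b encode the residual vector (2x + 3y − 1, y)); np.linalg.lstsq is numerically preferable to forming (A^T A)^{-1} explicitly:

```python
import numpy as np

A = np.array([[2.0, 3.0],
              [0.0, 1.0]])  # rows encode (2x + 3y - 1, y)
b = np.array([1.0, 0.0])

# Solve min ||Ax - b||_2; lstsq uses a stable factorization rather
# than the explicit normal equations x* = (A^T A)^{-1} A^T b.
x_star, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x_star)  # expect approximately [0.5, 0.0]
```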
Lectures 10,11 Slide# 13
L1 norm minimization
Decision Variables: ~x = (x1, . . . , xn) ∈ Rn.
Objective function: min. ||A~x − ~b||_1.
Constraints: No explicit constraint.
We will reduce this to a linear program.
Lectures 10,11 Slide# 14
Example
Example: min. ||(2x + 3y − 1, y)||_1 = |2x + 3y − 1| + |y|.
Trick: Let t1 ≥ 0, t2 ≥ 0 be s.t.
|2x + 3y − 1| ≤ t1
|y| ≤ t2
Consider the following problem:
min. t1 + t2
|2x + 3y − 1| ≤ t1
|y| ≤ t2
t1, t2 ≥ 0
Goal: convince the class that this problem is equivalent to the original.
Note: |y| ≤ t2 can be rewritten as y ≤ t2 ∧ −y ≤ t2.
Lectures 10,11 Slide# 15
Example: LP formulation
min. t1 + t2
2x + 3y − 1 ≤ t1
−2x − 3y + 1 ≤ t1
y ≤ t2
−y ≤ t2
t1, t2 ≥ 0
Solution: x = 1/2, y = 0.
Q: Can we repeat the same trick if |y| ≥ t2?
A: No. A disjunction of inequalities would be needed.
Lectures 10,11 Slide# 16
L1 norm minimization
Problem: min. ||A~x − ~b||_1.
Solution: Let A be an m × n matrix. Add variables t1, . . . , tm corresponding to the rows of A.
min. Σ_{i=1}^m ti
|Ai ~x − bi| ≤ ti
t1, . . . , tm ≥ 0
⇒
min. Σ_{i=1}^m ti
A~x − ~b ≤ ~t
−A~x + ~b ≤ ~t
t1, . . . , tm ≥ 0
We write it in the general form:
max. −~1^T ~t
A~x − ~t ≤ ~b
−A~x − ~t ≤ −~b
t1, . . . , tm ≥ 0
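A minimal sketch of this reduction using scipy.optimize.linprog (the helper name l1_min is ours, not from the lecture); the decision vector stacks ~x and ~t:

```python
import numpy as np
from scipy.optimize import linprog

def l1_min(A, b):
    """Solve min ||Ax - b||_1 via the LP reduction above."""
    m, n = A.shape
    # Variables z = (x_1..x_n, t_1..t_m); minimize the sum of the t_i.
    c = np.concatenate([np.zeros(n), np.ones(m)])
    I = np.eye(m)
    # A x - t <= b and -A x - t <= -b together encode |A_i x - b_i| <= t_i.
    A_ub = np.block([[A, -I], [-A, -I]])
    b_ub = np.concatenate([b, -b])
    bounds = [(None, None)] * n + [(0, None)] * m  # x free, t >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.x[:n]

A = np.array([[2.0, 3.0], [0.0, 1.0]])
b = np.array([1.0, 0.0])
print(l1_min(A, b))  # expect approximately [0.5, 0.0]
```

Note that the bounds must be given explicitly: linprog defaults to non-negative variables, while ~x here is free.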
Lectures 10,11 Slide# 17
L∞ norm minimization
Decision Variables: ~x = (x1, . . . , xn) ∈ Rn.
Objective function: min. ||A~x − ~b||_∞.
Constraints: No explicit constraint.
We will reduce this to a linear program.
Lectures 10,11 Slide# 18
Example
Example: min. ||(2x + 3y − 1, y)||_∞ = min. max(|2x + 3y − 1|, |y|).
Trick: Let t ≥ 0 be s.t.
|2x + 3y − 1| ≤ t
|y | ≤ t
Consider the following problem:
min. t
|2x + 3y − 1| ≤ t
|y | ≤ t
t ≥ 0
Goal: convince the class that this problem is equivalent to the original.
Note: |y| ≤ t can be rewritten as −t ≤ y ≤ t.
Lectures 10,11 Slide# 19
Example: LP formulation
min. t
2x + 3y − 1 ≤ t
−2x − 3y + 1 ≤ t
y ≤ t
−y ≤ t
t ≥ 0
Solution: x = 1/2, y = 0.
Lectures 10,11 Slide# 20
L∞ norm minimization
Problem: min. ||A~x − ~b||_∞.
Solution: Let A be an m × n matrix. Add a single variable t bounding all rows of A.
min. t
|Ai ~x − bi| ≤ t (for each row i)
t ≥ 0
⇒
min. t
A~x − ~b ≤ t · ~1
−A~x + ~b ≤ t · ~1
t ≥ 0
We write it in the general form:
max. −t
A~x − t · ~1 ≤ ~b
−A~x − t · ~1 ≤ −~b
t ≥ 0
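The same scipy.optimize.linprog setup works here, with a single bound variable t (the helper name linf_min is ours):

```python
import numpy as np
from scipy.optimize import linprog

def linf_min(A, b):
    """Solve min ||Ax - b||_inf via the LP reduction above."""
    m, n = A.shape
    # Variables z = (x_1..x_n, t); minimize the single bound t.
    c = np.concatenate([np.zeros(n), [1.0]])
    ones = np.ones((m, 1))
    # A x - t*1 <= b and -A x - t*1 <= -b encode |A_i x - b_i| <= t.
    A_ub = np.block([[A, -ones], [-A, -ones]])
    b_ub = np.concatenate([b, -b])
    bounds = [(None, None)] * n + [(0, None)]  # x free, t >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.x[:n]
```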
Lectures 10,11 Slide# 21
Application-1: Estimation of Solutions
Lectures 10,11 Slide# 22
Application-1: Solution Estimation
Problem: We are trying to solve the linear equation
A~x = ~b
1. The entries in A,~b could have random errors (noisy measurements).
2. There are many more equations than variables.
[Diagram: ~x passes through A; noise is added to A~x, yielding the observed ~b.]
Lectures 10,11 Slide# 23
Application: Estimation
Example Problem: We are trying to measure a quantity x using noisy measurements:
2x = 5.1
3x = 7.6
4x = 9.91
5x = 12.45
6x = 15.1
Q-1: Is there a value of x that explains the measurements?
Q-2: What do we do in this case?
Lectures 10,11 Slide# 24
Application: Estimation
Example Problem: We are trying to measure a quantity x using noisy measurements:
2x = 5.1
3x = 7.6
4x = 9.91
5x = 12.45
6x = 15.1
Find x that nearly fits all the measurements, i.e.,
min.||(2x − 5.1, 3x − 7.6, 4x − 9.91, 5x − 12.45, 6x − 15.1)||p .
We can choose p = 1, 2,∞ and see what answer we get.
Lectures 10,11 Slide# 25
Application: Estimation
L2 norm: Apply least squares with
A = (2, 3, 4, 5, 6)^T, ~b = (5.1, 7.6, 9.91, 12.45, 15.1)^T.
Solving x = argmin ||A x − ~b||_2 yields x ≈ 2.505.
Residual error: (0.09, 0.08, −0.11, −0.08, −0.06).
Lectures 10,11 Slide# 26
Application: Estimation
L1 norm: Reduce to a linear program with
A = (2, 3, 4, 5, 6)^T, ~b = (5.1, 7.6, 9.91, 12.45, 15.1)^T.
Solution: x = 2.49.
Residual error: (−0.12, −0.13, −0.05, 0.0, 0.16).
L∞ norm: Reduce to a linear program.
Solution: x = 2.501.
Residual error: (−0.1, −0.1, 0.1, 0.05, −0.1).
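These estimates can be reproduced with the sketches above (l1_min and linf_min are the helper functions defined earlier):

```python
import numpy as np

A = np.array([[2.0], [3.0], [4.0], [5.0], [6.0]])
b = np.array([5.1, 7.6, 9.91, 12.45, 15.1])

x2, *_ = np.linalg.lstsq(A, b, rcond=None)  # L2: least squares
x1 = l1_min(A, b)                           # L1: linear program
xinf = linf_min(A, b)                       # L-infinity: linear program
print(x2, x1, xinf)  # compare against the estimates on these slides
```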
Lectures 10,11 Slide# 27
Experiment with Larger Matrices (Matlab)
◮ L1 norm: large number of entries with small residuals, spread is larger.
◮ L2 norm: residuals look like a Gaussian distribution.
◮ L∞ norm: residuals look more uniform, smaller spread.
Matlab code is available, run your own experiments!
Lectures 10,11 Slide# 28
Regularized Approximation
Ordinary least squares can have poor performance (ill-conditioning problem).
Tikhonov Regularized Approximation:
||A~x − ~b||_2^2 + λ ||~x||_2^2.
λ ≥ 0 is a tuning parameter.
Note-1: This is the same as the following norm minimization:
|| [A; λ^(1/2) I] ~x − [~b; ~0] ||_2^2,
where [M; N] stacks M on top of N.
Note-2: In general, we can also have regularized approximation using L1, L∞, and other norms.
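A minimal sketch of the stacked form, with the regularization weight λ as a parameter (the helper name tikhonov is ours):

```python
import numpy as np

def tikhonov(A, b, lam):
    """Solve min ||Ax - b||_2^2 + lam * ||x||_2^2 via stacking."""
    m, n = A.shape
    A_aug = np.vstack([A, np.sqrt(lam) * np.eye(n)])  # [A; sqrt(lam) I]
    b_aug = np.concatenate([b, np.zeros(n)])          # [b; 0]
    x, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)
    return x
```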
Lectures 10,11 Slide# 29
Penalty Function Approximation
Problem: Solve
minimize φ(A~x − ~b),
where φ is a penalty function.
If φ = L1, L2, L∞, this is exactly the same as norm minimization.
Note-1: In general, φ need not be a norm.
Note-2: φ is sometimes called a loss function.
Lectures 10,11 Slide# 30
Penalty Functions: Example
Deadzone Linear Penalty: Let ~x = (x1, . . . , xn)^T. The deadzone linear penalty is the sum of penalties on the individual components x1, . . . , xn:
φdz(~x) : φdz(x1) + φdz(x2) + · · · + φdz(xn),
wherein
φdz(u) = 0 if |u| ≤ a, and b(|u| − a) if |u| > a.
[Figure: the deadzone penalty φdz compared with the L1 and L2 penalties; φdz is zero on [−a, a].]
Lectures 10,11 Slide# 31
Deadzone Linear Penalty (cont)
Deadzone vs. L2:
◮ The L2 norm imposes a large penalty when the residual is high. Therefore, outliers in the data can affect performance drastically.
◮ Deadzone penalty function is generally less sensitive to outliers.
Q: How do we solve the deadzone penalty approximation problem?
A: Apply the tricks for L1, L∞ (upcoming assignment).
Lectures 10,11 Slide# 32
Other penalty functions
Huber Penalty: The Huber penalty is the sum of functions on the individual components x1, . . . , xn:
φH(x1) + φH(x2) + · · · + φH(xn),
wherein
φH(u) = u^2 if |u| ≤ a, and a(2|u| − a) if |u| > a.
[Figure: the Huber penalty φH compared with the L2 penalty; they coincide on [−a, a].]
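Both penalties are easy to evaluate; a minimal NumPy sketch, vectorized over the residual vector (parameter names a, b follow the definitions above):

```python
import numpy as np

def deadzone(u, a, b):
    # Zero inside [-a, a], linear with slope b outside.
    return np.where(np.abs(u) <= a, 0.0, b * (np.abs(u) - a))

def huber(u, a):
    # Quadratic inside [-a, a], linear outside.
    return np.where(np.abs(u) <= a, u**2, a * (2 * np.abs(u) - a))

# Total penalty of a residual vector r = A x - b: deadzone(r, a, b).sum()
```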
Lectures 10,11 Slide# 33
Penalty Function Approximation: Summary
General observations:
◮ L2: Natural, easy to solve (using the pseudo-inverse). However, sensitive to outliers.
◮ L1: Sparse: ensures that most residuals are very small. Insensitive to outliers. However, the spread of the error is larger.
◮ L∞ : Non-sparse but minimizes spread. Very sensitive to outliers.
◮ Other loss functions: provide various tradeoffs.
Details: Cf. Boyd’s book.
Lectures 10,11 Slide# 34
Application-2: Signal Denoising
Lectures 10,11 Slide# 35
Application-2: Signal Denoising
Input: Given samples of a noisy signal:
~y : (y1, y2, . . . , yN)^T.
The original signal is unknown (very little data available).
Problem: Find a new signal ~y* by minimizing
||~y* − ~y||_p^p + φs(~y*).
φs is a smoothing function.
[Figure: the original signal and the signal with noise.]
Lectures 10,11 Slide# 36
Smoothing Criteria
◮ Quadratic Smoothing: L2 norm on successive differences.
φq(~y) = Σ_{i=2}^{N} (yi − yi−1)^2.
◮ Total Variation Smoothing: L1 norm on successive differences.
φtv(~y) = Σ_{i=2}^{N} |yi − yi−1|.
◮ Maximum Variation Smoothing: L∞ norm on successive differences.
φmax(~y) = max_{i=2}^{N} |yi − yi−1|.
Lectures 10,11 Slide# 37
Quadratic Smoothing
Problem: minimize ||~y* − ~yn||_2^2 + φq(~y*).
This can be viewed as a least squares problem:
minimize || [I; ∆] ~y* − [~yn; ~0] ||_2^2,
where ∆ is the (N−1) × N successive-difference matrix:
∆ =
[ −1  1  0 · · ·  0  0 ]
[  0 −1  1 · · ·  0  0 ]
[            . . .     ]
[  0  0  0 · · · −1  1 ]
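A minimal sketch of quadratic smoothing via stacking (the weight mu on the smoothing term is our addition; the slide's formulation corresponds to mu = 1):

```python
import numpy as np

def quad_smooth(yn, mu=1.0):
    """Minimize ||y - yn||_2^2 + mu * ||Delta y||_2^2."""
    N = len(yn)
    # Successive-difference matrix Delta of shape (N-1) x N.
    Delta = np.zeros((N - 1, N))
    Delta[np.arange(N - 1), np.arange(N - 1)] = -1.0
    Delta[np.arange(N - 1), np.arange(1, N)] = 1.0
    A = np.vstack([np.eye(N), np.sqrt(mu) * Delta])
    b = np.concatenate([yn, np.zeros(N - 1)])
    y_star, *_ = np.linalg.lstsq(A, b, rcond=None)
    return y_star
```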
Lectures 10,11 Slide# 38
Total/Maximum Variation Smoothing
Total variation smoothing: an L1 least squares problem.
minimize || [I; ∆] ~y* − [~yn; ~0] ||_1.
Max. variation smoothing: an L∞ least squares problem.
minimize || [I; ∆] ~y* − [~yn; ~0] ||_∞,
where ∆ is the same successive-difference matrix:
∆ =
[ −1  1  0 · · ·  0  0 ]
[  0 −1  1 · · ·  0  0 ]
[            . . .     ]
[  0  0  0 · · · −1  1 ]
Lectures 10,11 Slide# 39
Experiment
[Figure: the original signal, the signal with noise, and the L1, L2, L∞ reconstructed signals with their residual histograms.]
Lectures 10,11 Slide# 40
Other Smoothing Functions
Consider three successive data points instead of two:
φT : Σ_{i=2}^{N−1} |2yi − (yi−1 + yi+1)|.
Modify matrix ∆ as follows:
∆ =
[ −1  2 −1  0 · · ·  0  0  0 ]
[  0 −1  2 −1 · · ·  0  0  0 ]
[               . . .        ]
[  0  0  0  0 · · · −1  2 −1 ]
∆ is called a Toeplitz matrix.
Lectures 10,11 Slide# 41
Application-3: Regression
Lectures 10,11 Slide# 42
Linear Regression
Inputs: Data points ~xi : (xi1, . . . , xin) and outcomes yi.
Goal: Find a function
f (~x) = c1x1 + c2x2 + . . .+ cnxn + c0
that best fits the data.
[Figure: input vs. output plot for the given data.]
Lectures 10,11 Slide# 43
Linear Regression
Let X be the data matrix and ~y the corresponding outcome vector.
Note: The i-th row Xi represents data point i; yi is the corresponding outcome for Xi.
If ~c^T ~x + c0 is the best line fit, the error for the i-th data point is:
εi : (Xi · ~c) + c0 − yi.
We can write the overall error as || [X ~1] ~c − ~y ||_p, where ~c is extended with the intercept c0 as its last component.
Lectures 10,11 Slide# 45
Regression
Inputs: Data X, outcomes ~y.
Decision Vars: c0, c1, . . . , cn, the coefficients of the straight line.
Solution:
minimize || [X ~1] ~c − ~y ||_p.
Note: Instead of the norm, we could use any penalty function.
Ordinary Regression: min || [X ~1] ~c − ~y ||_p^p.
Tikhonov Regression: min || [X ~1] ~c − ~y ||_2^2 + λ · ||~c||_2^2.
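A minimal sketch of ordinary (L2) regression via the augmented matrix [X ~1] (the helper name fit_line is ours; l1_min and linf_min from earlier handle p = 1, ∞):

```python
import numpy as np

def fit_line(X, y):
    """Fit c = (c_1, ..., c_n, c_0) minimizing ||[X 1] c - y||_2."""
    m = X.shape[0]
    X1 = np.hstack([X, np.ones((m, 1))])  # append the constant column
    c, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return c

# For p = 1 or infinity, pass X1 and y to l1_min / linf_min instead.
```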
Lectures 10,11 Slide# 46
Experiment
Experiment: Create noisy data set with outliers.
80 regular data points with low noise.
5 outlier data points with large noise.
Goal: Compare effect of L1, L2, L∞ norm regressions.
Lectures 10,11 Slide# 47
Regression with Outliers
[Figure: regression with outliers, showing the L1, L2, and L∞ fits.]
Lectures 10,11 Slide# 48
Regression with Outliers (Error Histogram)
[Figure: histograms of the L1, L2, and L∞ norm residuals.]
Lectures 10,11 Slide# 49
Linear Regression With Non-Linear Models
Input: Data X, outcomes ~y, functions g1, . . . , gk.
Goal: Fit a model of the form
yi = c0 + c1 g1(~xi) + c2 g2(~xi) + . . . + ck gk(~xi).
Solution: Transform the data matrix as follows:
X′ =
[ g1(X1) · · · gk(X1) ]
[ g1(X2) · · · gk(X2) ]
[         · · ·       ]
[ g1(XN) · · · gk(XN) ].
Apply linear regression on (X ′, ~y).
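A minimal sketch of this transformation for one-dimensional inputs (the helper name lift is ours; the basis list is a hypothetical choice matching the experiment that follows):

```python
import numpy as np

def lift(x, funcs):
    """Build X' with columns g_1(x), ..., g_k(x) for a 1-D input array x."""
    return np.column_stack([g(x) for g in funcs])

# A hypothetical basis matching the experiment below:
funcs = [np.sin, np.cos, lambda x: np.cos(1.5 * x), lambda x: x]
# X_lifted = lift(x, funcs); then apply linear regression on (X_lifted, y).
```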
Lectures 10,11 Slide# 50
Experiment
Experiment: Create a noisy non-linear data set with outliers.
y = 0.2 sin(x) + 1.5 cos(x) + 2 cos(1.5x) + 0.2x − 3 + Noise.
80 regular data points with low noise.
5 outlier data points with large noise.
[Figure: regression data with a non-linear model; regular data points and outliers shown.]
Goal: Compare the effect of L1, L2, L∞ norm regressions.
Lectures 10,11 Slide# 51
Experiment: linear regression with a non-linear model
[Figure: the L1, L2, and L∞ regression fits on the non-linear data with outliers.]
Lectures 10,11 Slide# 52
Experiment: residuals
[Figure: histograms of the L1, L2, and L∞ norm residuals.]
Lectures 10,11 Slide# 53
Classification
Problem: Given two classes of data points: ~x+1, . . . , ~x+N and ~x−1, . . . , ~x−M.
Goal: Find a separating hyperplane between the classes.
[Figure: two classes of points, x+ and x−, separated by a classifier line.]
Lectures 10,11 Slide# 54
Application-4: Classification
Lectures 10,11 Slide# 55
Linear Classification
Problem: Given two classes of data points: ~x+1, . . . , ~x+N and ~x−1, . . . , ~x−M.
Goal: Find a separating hyperplane between the classes.
Q-1: Will such a hyperplane always exist?
A: Not always! Only if data is linearly separable.
1. Assume that such a hyperplane exists.
2. Deal with cases where a hyperplane does not exist.
Lectures 10,11 Slide# 56
Linear Classification (cont)
Solution: Use linear programming.
Decision Variables: w0, w1, w2, . . . , wn, representing the classifier:
fw (~x) : w0 + w1x1 + · · ·+ wnxn .
Linear Programming:
max. ???
X+ ~w ≥ ~0
X− ~w ≤ ~0
X+ :
[ ~x+1 1 ]
[ ~x+2 1 ]
[ ~x+3 1 ]
[  ...   ]
[ ~x+N 1 ]
X− :
[ ~x−1 1 ]
[ ~x−2 1 ]
[  ...   ]
[ ~x−M 1 ]
Lectures 10,11 Slide# 57
Classification Using Linear Programs
Objective: Let's try Σ_i wi = ~1^T ~w.
Note: We write ⟨~a, ~b⟩ for the dot product ~a^T ~b.
Simplified data representation: Represent the two classes with labels +1, −1.
Data: (~x1, y1), . . . , (~xN, yN) ⇒ (X, ~y),
where yi = +1 for class I and yi = −1 for class II data.
Linear Program:
min. ⟨~1, ~w⟩ + w0
yi (⟨~xi, ~w⟩ + w0) ≤ 0
Verify that this formulation works.
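As a sketch, separability can also be tested with an LP feasibility problem; this uses the common normalization yi(⟨~xi, ~w⟩ + w0) ≥ 1, a variant of the formulation above that rules out the trivial solution ~w = ~0 (the helper name separate is ours):

```python
import numpy as np
from scipy.optimize import linprog

def separate(X, y):
    """Return (w, w0) separating the data, or None if not separable."""
    N, n = X.shape
    X1 = np.hstack([X, np.ones((N, 1))])  # append column for w_0
    # Encode y_i (<x_i, w> + w_0) >= 1 as -y_i * X1_i . (w, w_0) <= -1.
    A_ub = -(y[:, None] * X1)
    b_ub = -np.ones(N)
    c = np.zeros(n + 1)  # pure feasibility: any feasible point will do
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * (n + 1))
    return (res.x[:n], res.x[n]) if res.success else None
```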
Lectures 10,11 Slide# 58
Experiment: Classification
Experiment: Find a classifier for the following points.
[Figure: data for classification.]
Lectures 10,11 Slide# 59
Experiment: Classification
Experiment: Outcome.
[Figure: several possible classifier lines for the data.]
Note-1: Many different classifiers are possible.
Note-2: What is the deal with points that lie on the classifier line?
Lectures 10,11 Slide# 60
Dual, Complementary Slackness
Linear Program:
min. ⟨~1, ~w⟩ + w0
yi (⟨~xi, ~w⟩ + w0) ≤ 0
Complementary Slackness: Let α1, . . . , αN be the dual variables.
αiyi (〈~xi , ~w〉+ w0) = 0, αi ≥ 0 .
Claim: There are at least n + 1 points ~xj1 , . . . , ~xjn+1 s.t.
〈~xji , ~w〉+ w0 = 0 .
Note: n + 1 points fully determine the hyperplane (~w, w0).
Lectures 10,11 Slide# 61
Classifying Points
Problem: Given a new point ~x, let us find which class it belongs to.
Solution-1: Just find the sign of ⟨~w, ~x⟩ + w0; we know (~w, w0) from our LP solution.
Solution-2: More complicated. Express ~x as a linear combination of the support vectors:
~x = λ1 ~xj1 + λ2 ~xj2 + · · · + λn+1 ~xjn+1
We can write ⟨~w, ~x⟩ + w0 as
⟨~w, ~x⟩ + w0 = Σ_i λi (w0 + ⟨~w, ~xji⟩) + (1 − Σ_i λi) w0 = (1 − Σ_i λi) w0,
since each w0 + ⟨~w, ~xji⟩ = 0.
This is impractical for linear programming formulations.
Lectures 10,11 Slide# 62
Non-Separable Data
Q: What if the data is not linearly separable?
A: The linear program will be primal infeasible.
In practice, linear separation may hold without noise but not with noise.
Solution: Relax the conditions on the half-space ~w.
[Figure: misclassified points at slack distances ξ1, ξ2 from the classifier.]
Lectures 10,11 Slide# 63
Non-Separable Data: Linear Programming
Classifier: (old) w0 + ⟨~w, ~x⟩. (new) ⟨~w, ~x⟩ + 1.
Note: The classifier Σ_i wi xi + w0 is equivalent to Σ_i (wi/w0) xi + 1 if w0 ≠ 0.
LP Formulation:
minimize ||~ξ||_1 + · · ·
⟨~x+i, ~w⟩ + 1 ≤ ξ+i
⟨~x−j, ~w⟩ + 1 ≥ −ξ−j
ξ+i, ξ−j ≥ 0
The variables ~ξ encode tolerance; we minimize them as part of the objective.
Exercise: Verify that the support vector interpretation of dual works.
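A minimal LP sketch of this relaxation in the labeled form, using the standard convention yi(⟨~xi, ~w⟩ + w0) ≥ 1 − ξi, a variant of the slide's formulation (the helper name soft_separate is ours):

```python
import numpy as np
from scipy.optimize import linprog

def soft_separate(X, y):
    """Minimize ||xi||_1 subject to y_i(<x_i, w> + w_0) >= 1 - xi_i."""
    N, n = X.shape
    X1 = np.hstack([X, np.ones((N, 1))])  # variables: (w, w_0, xi_1..xi_N)
    c = np.concatenate([np.zeros(n + 1), np.ones(N)])  # minimize sum xi
    # Encode the constraint as -y_i * X1_i . (w, w_0) - xi_i <= -1.
    A_ub = np.hstack([-(y[:, None] * X1), -np.eye(N)])
    b_ub = -np.ones(N)
    bounds = [(None, None)] * (n + 1) + [(0, None)] * N
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.x[:n], res.x[n]
```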
Lectures 10,11 Slide# 64
Non-Linear Model Classification
Linear Classifier: ⟨~w, ~x⟩ + 1.
Non-linear functions φ1, . . . , φm : Rn → R give the classifier
⟨~w, (φ1(~x), φ2(~x), . . . , φm(~x))⟩.
This is exactly the same idea as linear regression with non-linear functions.
Lectures 10,11 Slide# 65