
This is Lecture 3 in the machine learning series of lectures at IIIT Hyderabad.


Statistical Methods in Artificial Intelligence, CSE471 - Monsoon 2015: Lecture 03

Avinash Sharma, CVIT, IIIT Hyderabad

Lecture 03: Plan

• Linear Algebra Recap
• Linear Discriminant Functions (LDFs)
• The Perceptron
• Generalized LDFs
• The Two-Category Linearly Separable Case
• Learning LDF: Basic Gradient Descent
• Perceptron Criterion Function

Basic Linear Algebra Operations

• Vector
• Vector Operations
  – Scaling
  – Transpose
  – Addition
  – Subtraction
  – Dot Product

• Equation of a Plane

Vector Operations

A vector in 3D, with components x1, y1, z1 along the x, y, z axes:

𝒗 = [x1  y1  z1]^T

Scaling: only the magnitude changes, not the direction.

a𝒗 = [a·x1  a·y1  a·z1]^T

‖𝒗‖ = √(x1² + y1² + z1²),   ‖a𝒗‖ = |a|·√(x1² + y1² + z1²) = |a|·‖𝒗‖

Transpose: a column vector becomes a row vector (and vice versa).

𝒗^T = [x1  y1  z1]
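As a quick sanity check of scaling and magnitude, here is a minimal NumPy sketch (the vector and scale factor are illustrative values, not from the slides):

```python
import numpy as np

v = np.array([1.0, 2.0, 2.0])    # example vector
a = 3.0                          # scale factor

print(np.linalg.norm(v))         # 3.0 = sqrt(1^2 + 2^2 + 2^2)
print(a * v)                     # [3. 6. 6.] -> direction unchanged, components scaled
print(np.linalg.norm(a * v))     # 9.0 = |a| * ||v||
```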

Vector Operations

Addition and subtraction (figure): vectors 𝒖 and 𝒗 in the x-y plane, their sum 𝒖 + 𝒗 (parallelogram rule), and the difference 𝒖 − 𝒗 = 𝒖 + (−𝒗).

Vector Operations

• The dot product (inner product) of two vectors is a scalar. For 𝒖 = [x1  x2]^T and 𝒗 = [y1  y2]^T:

𝒖 · 𝒗 = x1·y1 + x2·y2 = 𝒖^T 𝒗 = ‖𝒖‖ ‖𝒗‖ cos α

where α is the angle between 𝒖 and 𝒗, so cos α = 𝒖^T 𝒗 / (‖𝒖‖ ‖𝒗‖).

• The dot product of two perpendicular vectors is 0.
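A short NumPy sketch of these dot-product identities (the vectors are illustrative values, not from the slides):

```python
import numpy as np

u = np.array([3.0, 4.0])
v = np.array([4.0, -3.0])

dot = u @ v                                            # u^T v = x1*y1 + x2*y2
cos_alpha = dot / (np.linalg.norm(u) * np.linalg.norm(v))

print(dot)        # 0.0 -> u and v are perpendicular
print(cos_alpha)  # 0.0 -> alpha = 90 degrees
```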

Equation of a Plane

A plane through the point P0 (position vector 𝒓0) with normal vector 𝒏: a point P (position vector 𝒓) lies on the plane exactly when 𝒓 − 𝒓0 is perpendicular (at 90°) to 𝒏, i.e.

(𝒓 − 𝒓0) · 𝒏 = 0

(Here 𝒓 and 𝒓0 are measured from the origin O.)
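A small sketch of the plane equation used as a membership test, assuming an illustrative normal and reference point:

```python
import numpy as np

n  = np.array([0.0, 0.0, 1.0])   # normal vector of the plane (illustrative)
r0 = np.array([1.0, 2.0, 5.0])   # a known point on the plane (illustrative)

def on_plane(r, tol=1e-9):
    """A point r lies on the plane iff (r - r0) . n == 0."""
    return abs((r - r0) @ n) < tol

print(on_plane(np.array([7.0, -3.0, 5.0])))  # True: same z as r0, so (r - r0) lies in the plane
print(on_plane(np.array([7.0, -3.0, 6.0])))  # False
```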

Linear Discriminant Functions

• Assume a 2-class classification setup.

• The decision boundary is represented explicitly in terms of the components of 𝑿.

• The aim is to find the parameters of a linear discriminant function that minimize the training error.

• Why linear?
  – Simplest possible form.
  – Can be generalized (see Generalized LDFs below).

Linear Discriminant Functions

g(𝑿) = 𝑾^T 𝑿   (boundary through the origin)
g(𝑿) = 𝑾^T 𝑿 + w0   (boundary offset by the bias term w0)

g(𝑿) > 0 (+ve): class A
g(𝑿) < 0 (−ve): class B
g(𝑿) = 0: decision boundary

(Figure: the hyperplane g(𝑿) = 0 separates class A from class B, with 𝑾 normal to the boundary.)

The Perceptron

g(𝑿) = 𝑾^T 𝑿 + w0

(Diagram: inputs x0 = 1, x1, x2, ..., xd are weighted by w0, w1, w2, ..., wd and fed into a summation unit whose output is g(𝑿).)

𝑿 = [x1, ..., xd]^T,   𝑾 = [w1, ..., wd]^T
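A minimal sketch of the perceptron's forward computation under the sign-based decision rule above; the weights, bias, and input are illustrative values:

```python
import numpy as np

def g(X, W, w0):
    """Linear discriminant g(X) = W^T X + w0."""
    return W @ X + w0

W  = np.array([2.0, -1.0])        # weight vector (illustrative)
w0 = 0.5                          # bias term (illustrative)

X = np.array([1.0, 3.0])
score = g(X, W, w0)
label = "class A" if score > 0 else ("class B" if score < 0 else "decision boundary")
print(score, label)               # -0.5 class B
```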

Perceptron Decision Boundary

• The decision boundary surface (a hyperplane) divides the feature space into two regions.

• The orientation of the boundary surface is decided by the normal vector 𝑾.

• The location of the boundary surface is determined by the bias term w0.

• g(𝑿) is proportional to the distance of 𝑿 from the boundary surface.

• g(𝑿) > 0 on the positive side of the boundary and g(𝑿) < 0 on the negative side.
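A small sketch of this geometry, using the standard result that the signed distance of 𝑿 from the hyperplane 𝑾^T 𝑿 + w0 = 0 is g(𝑿)/‖𝑾‖; the particular 𝑾 and w0 are illustrative:

```python
import numpy as np

W  = np.array([3.0, 4.0])        # normal vector: fixes the boundary's orientation
w0 = -5.0                        # bias term: shifts the boundary away from the origin

def signed_distance(X):
    """Signed distance of X from the hyperplane W^T X + w0 = 0."""
    return (W @ X + w0) / np.linalg.norm(W)

print(signed_distance(np.array([3.0, 4.0])))   #  4.0 -> positive side
print(signed_distance(np.array([0.0, 0.0])))   # -1.0 -> negative side
```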

Perceptron Summary

Generalized LDFs

• Linear:

g(𝑿) = w0 + 𝑾^T 𝑿 = w0 + Σ_{i=1}^{d} wi·xi,   with 𝑿 = [x1, ..., xd]^T and 𝑾 = [w1, ..., wd]^T

• Non-linear (quadratic):

g(𝑿) = w0 + Σ_{i=1}^{d} wi·xi + Σ_{i=1}^{d} Σ_{j=1}^{d} wij·xi·xj
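A minimal sketch of evaluating the quadratic discriminant, with the wij coefficients stored as a d x d matrix; all parameter values are illustrative:

```python
import numpy as np

def g_quadratic(X, w0, w, W2):
    """g(X) = w0 + sum_i w_i x_i + sum_i sum_j w_ij x_i x_j."""
    return w0 + w @ X + X @ W2 @ X

d  = 3
X  = np.array([1.0, 2.0, -1.0])
w0 = 0.1                          # bias (illustrative)
w  = np.ones(d)                   # linear coefficients w_i (illustrative)
W2 = 0.5 * np.eye(d)              # quadratic coefficients w_ij (illustrative)

print(g_quadratic(X, w0, w, W2))  # 0.1 + 2.0 + 3.0 = 5.1
```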

Generalized LDFs

• Linear, in augmented notation: set x0 = 1 and define

𝒀 = [x0, x1, ..., xd]^T = [1; 𝑿],   𝒂 = [w0, w1, ..., wd]^T = [w0; 𝑾]

so that

g(𝑿) = w0 + 𝑾^T 𝑿 = Σ_{i=0}^{d} wi·xi = 𝒂^T 𝒀

• Non-linear: map the data through an arbitrary function φ,

𝒀 = φ(𝑿),   g(𝑿) = 𝒂^T 𝒀 = Σ_{i=1}^{d̂} ai·yi,   𝒂 = [a1, ..., a_d̂]^T

where d̂ is the dimension of the mapped space.
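A short sketch of the augmented notation and of one possible choice of φ (the quadratic feature map below is an illustrative example, not the lecture's specific φ):

```python
import numpy as np

def augment(X):
    """Y = [1, x1, ..., xd]: prepend x0 = 1 so the bias folds into the weight vector a."""
    return np.concatenate(([1.0], X))

def phi(X):
    """One possible mapping phi: quadratic features of a 2-D point (illustrative choice)."""
    x1, x2 = X
    return np.array([1.0, x1, x2, x1**2, x2**2, x1 * x2])

X = np.array([2.0, -1.0])

a_lin = np.array([0.5, 1.0, -2.0])                   # a = [w0, w1, w2] (illustrative)
print(a_lin @ augment(X))                            # g(X) = w0 + W^T X

a_quad = np.array([0.5, 1.0, -2.0, 0.1, 0.1, 0.3])   # weights in the mapped space (illustrative)
print(a_quad @ phi(X))                               # g(X) = a^T phi(X)
```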

Generalized LDFs

Generalized LDFs Summary

• φ can be an arbitrary mapping function that projects the original data points 𝑿 to points 𝒀 = φ(𝑿), typically in a higher-dimensional space.

• The hyperplane decision surface g(𝑿) = 𝒂^T 𝒀 = 0 passes through the origin of the mapped space.

• Advantage: in the mapped higher-dimensional space the data might be linearly separable.

• Disadvantage: the mapping can be computationally intensive, and learning the classification parameters can be non-trivial.

Two-Category Linearly Separable Case

g(𝑿) = 𝒂^T 𝒀 = Σ_{i=1}^{d̂} ai·yi

  > 0 (+ve): class A
  < 0 (−ve): class B
  = 0: decision boundary

(Figure: samples of both classes in the y1-y2 plane, with the weight vector 𝒂 normal to the separating hyperplane.)

Two-Category Linearly Separable Case

Normalized case: replace every class-B sample by its negative; a separating weight vector 𝒂 must then satisfy

g(𝑿) = 𝒂^T 𝒀 = Σ_{i=1}^{d̂} ai·yi > 0   for every (normalized) sample.

(Figure: after normalization, all samples lie on the positive side of the hyperplane defined by 𝒂 in the y1-y2 plane.)
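A small sketch of the normalization trick: class-B samples are negated, so checking separability reduces to checking 𝒂^T 𝒚 > 0 for every row; the data and candidate 𝒂 are illustrative:

```python
import numpy as np

# Augmented samples (x0 = 1 prepended), illustrative 2-D data
Y_A = np.array([[1.0,  2.0,  2.0], [1.0,  3.0,  1.0]])   # class A
Y_B = np.array([[1.0, -1.0, -2.0], [1.0, -2.0, -1.0]])   # class B

# Normalization: negate class-B samples so a separating a must give a^T y > 0 for all rows
Y = np.vstack([Y_A, -Y_B])

a = np.array([0.0, 1.0, 1.0])     # candidate weight vector (illustrative)
print(Y @ a)                      # [4. 4. 3. 3.] -> all positive
print(np.all(Y @ a > 0))          # True: a separates the training data
```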

Two-Category Linearly Separable Case

(Figure: data vectors in the normalized feature space.)

Two-Category Linearly Separable Case

Learning LDF: Basic Gradient Descent

• Define a scalar criterion function J(𝒂) which captures the classification error for a specific boundary plane described by the parameter vector 𝒂.

• Minimize J(𝒂) using gradient descent:
  – Start with an arbitrary value 𝒂(1) for 𝒂.
  – Iteratively refine the estimate of 𝒂:  𝒂(k+1) = 𝒂(k) − η(k) ∇J(𝒂(k)).

• η(k) is a positive scale factor, also known as the learning rate.
  – Too small an η makes the convergence very slow.
  – Too large an η can diverge due to overshooting of the corrections.
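As a concrete illustration, here is a minimal NumPy sketch of batch gradient descent on the perceptron criterion J_p(𝒂) = Σ_{misclassified 𝒚} (−𝒂^T 𝒚), the criterion named in the lecture plan; the data, learning rate, and stopping rule are illustrative assumptions, not from the slides:

```python
import numpy as np

def perceptron_criterion_descent(Y, eta=0.1, max_iter=1000):
    """Batch gradient descent on J_p(a) = sum over misclassified y of (-a^T y).

    Y holds normalized, augmented samples (class-B rows already negated), one per row.
    """
    a = np.zeros(Y.shape[1])                 # arbitrary starting value for a
    for _ in range(max_iter):
        mis = Y[Y @ a <= 0]                  # samples on the wrong side of the boundary
        if len(mis) == 0:                    # every sample satisfies a^T y > 0: converged
            break
        a = a + eta * mis.sum(axis=0)        # a <- a - eta * grad J_p = a + eta * sum of misclassified y
    return a

# Usage with the normalized samples from the previous sketch
Y = np.array([[1.0, 2.0, 2.0], [1.0, 3.0, 1.0], [-1.0, 1.0, 2.0], [-1.0, 2.0, 1.0]])
a = perceptron_criterion_descent(Y)
print(a, np.all(Y @ a > 0))                  # learned weight vector; True if all samples separated
```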
