
This is Lecture 3 in the machine learning series of lectures at IIIT Hyderabad.


Statistical Methods in Artificial Intelligence, CSE471 - Monsoon 2015: Lecture 03

Avinash Sharma, CVIT, IIIT Hyderabad

Lecture 03: Plan

• Linear Algebra Recap
• Linear Discriminant Functions (LDFs)
• The Perceptron
• Generalized LDFs
• The Two-Category Linearly Separable Case
• Learning LDF: Basic Gradient Descent
• Perceptron Criterion Function

Basic Linear Algebra Operations

• Vector
• Vector Operations
  – Scaling
  – Transpose
  – Addition
  – Subtraction
  – Dot Product

• Equation of a Plane

Vector Operations

A vector in 3D, with components x1, y1, z1 along the x, y, z axes:

𝒗 = [x1  y1  z1]^T

Scaling: only the magnitude changes, not the direction.

a𝒗 = [a·x1  a·y1  a·z1]^T

‖𝒗‖ = √(x1² + y1² + z1²),   ‖a𝒗‖ = |a|·√(x1² + y1² + z1²) = |a|·‖𝒗‖

Transpose: a column vector becomes a row vector (and vice versa).

𝒗^T = [x1  y1  z1]
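As a quick sanity check of scaling and magnitude, here is a minimal NumPy sketch (the vector and scale factor are illustrative values, not from the slides):

```python
import numpy as np

v = np.array([1.0, 2.0, 2.0])    # example vector
a = 3.0                          # scale factor

print(np.linalg.norm(v))         # 3.0 = sqrt(1^2 + 2^2 + 2^2)
print(a * v)                     # [3. 6. 6.] -> direction unchanged, components scaled
print(np.linalg.norm(a * v))     # 9.0 = |a| * ||v||
```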

Vector Operations

Addition and subtraction (figure): vectors 𝒖 and 𝒗 in the x-y plane, their sum 𝒖 + 𝒗 (parallelogram rule), and the difference 𝒖 − 𝒗 = 𝒖 + (−𝒗).

Vector Operations

• The dot product (inner product) of two vectors is a scalar. For 𝒖 = [x1  x2]^T and 𝒗 = [y1  y2]^T:

𝒖 · 𝒗 = x1·y1 + x2·y2 = 𝒖^T 𝒗 = ‖𝒖‖ ‖𝒗‖ cos α

where α is the angle between 𝒖 and 𝒗, so cos α = 𝒖^T 𝒗 / (‖𝒖‖ ‖𝒗‖).

• The dot product of two perpendicular vectors is 0.
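A short NumPy sketch of these dot-product identities (the vectors are illustrative values, not from the slides):

```python
import numpy as np

u = np.array([3.0, 4.0])
v = np.array([4.0, -3.0])

dot = u @ v                                            # u^T v = x1*y1 + x2*y2
cos_alpha = dot / (np.linalg.norm(u) * np.linalg.norm(v))

print(dot)        # 0.0 -> u and v are perpendicular
print(cos_alpha)  # 0.0 -> alpha = 90 degrees
```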

Equation of a Plane

A plane through the point P0 (position vector 𝒓0) with normal vector 𝒏: a point P (position vector 𝒓) lies on the plane exactly when 𝒓 − 𝒓0 is perpendicular (at 90°) to 𝒏, i.e.

(𝒓 − 𝒓0) · 𝒏 = 0

(Here 𝒓 and 𝒓0 are measured from the origin O.)
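A small sketch of the plane equation used as a membership test, assuming an illustrative normal and reference point:

```python
import numpy as np

n  = np.array([0.0, 0.0, 1.0])   # normal vector of the plane (illustrative)
r0 = np.array([1.0, 2.0, 5.0])   # a known point on the plane (illustrative)

def on_plane(r, tol=1e-9):
    """A point r lies on the plane iff (r - r0) . n == 0."""
    return abs((r - r0) @ n) < tol

print(on_plane(np.array([7.0, -3.0, 5.0])))  # True: same z as r0, so (r - r0) lies in the plane
print(on_plane(np.array([7.0, -3.0, 6.0])))  # False
```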

Linear Discriminant Functions

• Assume a 2-class classification setup.

• The decision boundary is represented explicitly in terms of the components of 𝑿.

• The aim is to find the parameters of a linear discriminant function that minimize the training error.

• Why linear?
  – Simplest possible form.
  – Can be generalized (see Generalized LDFs below).

Linear Discriminant Functions

g(𝑿) = 𝑾^T 𝑿   (boundary through the origin)
g(𝑿) = 𝑾^T 𝑿 + w0   (boundary offset by the bias term w0)

g(𝑿) > 0 (+ve): class A
g(𝑿) < 0 (−ve): class B
g(𝑿) = 0: decision boundary

(Figure: the hyperplane g(𝑿) = 0 separates class A from class B, with 𝑾 normal to the boundary.)

The Perceptron

g(𝑿) = 𝑾^T 𝑿 + w0

(Diagram: inputs x0 = 1, x1, x2, ..., xd are weighted by w0, w1, w2, ..., wd and fed into a summation unit whose output is g(𝑿).)

𝑿 = [x1, ..., xd]^T,   𝑾 = [w1, ..., wd]^T
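A minimal sketch of the perceptron's forward computation under the sign-based decision rule above; the weights, bias, and input are illustrative values:

```python
import numpy as np

def g(X, W, w0):
    """Linear discriminant g(X) = W^T X + w0."""
    return W @ X + w0

W  = np.array([2.0, -1.0])        # weight vector (illustrative)
w0 = 0.5                          # bias term (illustrative)

X = np.array([1.0, 3.0])
score = g(X, W, w0)
label = "class A" if score > 0 else ("class B" if score < 0 else "decision boundary")
print(score, label)               # -0.5 class B
```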

Perceptron Decision Boundary

• The decision boundary surface (a hyperplane) divides the feature space into two regions.

• The orientation of the boundary surface is decided by the normal vector 𝑾.

• The location of the boundary surface is determined by the bias term w0.

• g(𝑿) is proportional to the distance of 𝑿 from the boundary surface.

• g(𝑿) > 0 on the positive side of the boundary and g(𝑿) < 0 on the negative side.
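A small sketch of this geometry, using the standard result that the signed distance of 𝑿 from the hyperplane 𝑾^T 𝑿 + w0 = 0 is g(𝑿)/‖𝑾‖; the particular 𝑾 and w0 are illustrative:

```python
import numpy as np

W  = np.array([3.0, 4.0])        # normal vector: fixes the boundary's orientation
w0 = -5.0                        # bias term: shifts the boundary away from the origin

def signed_distance(X):
    """Signed distance of X from the hyperplane W^T X + w0 = 0."""
    return (W @ X + w0) / np.linalg.norm(W)

print(signed_distance(np.array([3.0, 4.0])))   #  4.0 -> positive side
print(signed_distance(np.array([0.0, 0.0])))   # -1.0 -> negative side
```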

Perceptron Summary

Generalized LDFs

• Linear:

g(𝑿) = w0 + 𝑾^T 𝑿 = w0 + Σ_{i=1}^{d} wi·xi,   with 𝑿 = [x1, ..., xd]^T and 𝑾 = [w1, ..., wd]^T

• Non-linear (quadratic):

g(𝑿) = w0 + Σ_{i=1}^{d} wi·xi + Σ_{i=1}^{d} Σ_{j=1}^{d} wij·xi·xj
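A minimal sketch of evaluating the quadratic discriminant, with the wij coefficients stored as a d x d matrix; all parameter values are illustrative:

```python
import numpy as np

def g_quadratic(X, w0, w, W2):
    """g(X) = w0 + sum_i w_i x_i + sum_i sum_j w_ij x_i x_j."""
    return w0 + w @ X + X @ W2 @ X

d  = 3
X  = np.array([1.0, 2.0, -1.0])
w0 = 0.1                          # bias (illustrative)
w  = np.ones(d)                   # linear coefficients w_i (illustrative)
W2 = 0.5 * np.eye(d)              # quadratic coefficients w_ij (illustrative)

print(g_quadratic(X, w0, w, W2))  # 0.1 + 2.0 + 3.0 = 5.1
```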

Generalized LDFs

• Linear, in augmented notation: set x0 = 1 and define

𝒀 = [x0, x1, ..., xd]^T = [1; 𝑿],   𝒂 = [w0, w1, ..., wd]^T = [w0; 𝑾]

so that

g(𝑿) = w0 + 𝑾^T 𝑿 = Σ_{i=0}^{d} wi·xi = 𝒂^T 𝒀

• Non-linear: map the data through an arbitrary function φ,

𝒀 = φ(𝑿),   g(𝑿) = 𝒂^T 𝒀 = Σ_{i=1}^{d̂} ai·yi,   𝒂 = [a1, ..., a_d̂]^T

where d̂ is the dimension of the mapped space.
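A short sketch of the augmented notation and of one possible choice of φ (the quadratic feature map below is an illustrative example, not the lecture's specific φ):

```python
import numpy as np

def augment(X):
    """Y = [1, x1, ..., xd]: prepend x0 = 1 so the bias folds into the weight vector a."""
    return np.concatenate(([1.0], X))

def phi(X):
    """One possible mapping phi: quadratic features of a 2-D point (illustrative choice)."""
    x1, x2 = X
    return np.array([1.0, x1, x2, x1**2, x2**2, x1 * x2])

X = np.array([2.0, -1.0])

a_lin = np.array([0.5, 1.0, -2.0])                   # a = [w0, w1, w2] (illustrative)
print(a_lin @ augment(X))                            # g(X) = w0 + W^T X

a_quad = np.array([0.5, 1.0, -2.0, 0.1, 0.1, 0.3])   # weights in the mapped space (illustrative)
print(a_quad @ phi(X))                               # g(X) = a^T phi(X)
```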

Generalized LDFs

Generalized LDFs Summary

• φ can be an arbitrary mapping function that projects the original data points 𝑿 to points 𝒀 = φ(𝑿), typically in a higher-dimensional space.

• The hyperplane decision surface g(𝑿) = 𝒂^T 𝒀 = 0 passes through the origin of the mapped space.

• Advantage: in the mapped higher-dimensional space the data might be linearly separable.

• Disadvantage: the mapping can be computationally intensive, and learning the classification parameters can be non-trivial.

Two-Category Linearly Separable Case

g(𝑿) = 𝒂^T 𝒀 = Σ_{i=1}^{d̂} ai·yi

  > 0 (+ve): class A
  < 0 (−ve): class B
  = 0: decision boundary

(Figure: samples of both classes in the y1-y2 plane, with the weight vector 𝒂 normal to the separating hyperplane.)

Two-Category Linearly Separable Case

Normalized case: replace every class-B sample by its negative; a separating weight vector 𝒂 must then satisfy

g(𝑿) = 𝒂^T 𝒀 = Σ_{i=1}^{d̂} ai·yi > 0   for every (normalized) sample.

(Figure: after normalization, all samples lie on the positive side of the hyperplane defined by 𝒂 in the y1-y2 plane.)
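A small sketch of the normalization trick: class-B samples are negated, so checking separability reduces to checking 𝒂^T 𝒚 > 0 for every row; the data and candidate 𝒂 are illustrative:

```python
import numpy as np

# Augmented samples (x0 = 1 prepended), illustrative 2-D data
Y_A = np.array([[1.0,  2.0,  2.0], [1.0,  3.0,  1.0]])   # class A
Y_B = np.array([[1.0, -1.0, -2.0], [1.0, -2.0, -1.0]])   # class B

# Normalization: negate class-B samples so a separating a must give a^T y > 0 for all rows
Y = np.vstack([Y_A, -Y_B])

a = np.array([0.0, 1.0, 1.0])     # candidate weight vector (illustrative)
print(Y @ a)                      # [4. 4. 3. 3.] -> all positive
print(np.all(Y @ a > 0))          # True: a separates the training data
```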

Two-Category Linearly Separable Case

(Figure: data vectors in the normalized feature space.)

Two-Category Linearly Separable Case

Learning LDF: Basic Gradient Descent

• Define a scalar criterion function J(𝒂) which captures the classification error for a specific boundary plane described by the parameter vector 𝒂.

• Minimize J(𝒂) using gradient descent:
  – Start with an arbitrary value 𝒂(1) for 𝒂.
  – Iteratively refine the estimate of 𝒂:  𝒂(k+1) = 𝒂(k) − η(k) ∇J(𝒂(k)).

• η(k) is a positive scale factor, also known as the learning rate.
  – Too small an η makes the convergence very slow.
  – Too large an η can diverge due to overshooting of the corrections.
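As a concrete illustration, here is a minimal NumPy sketch of batch gradient descent on the perceptron criterion J_p(𝒂) = Σ_{misclassified 𝒚} (−𝒂^T 𝒚), the criterion named in the lecture plan; the data, learning rate, and stopping rule are illustrative assumptions, not from the slides:

```python
import numpy as np

def perceptron_criterion_descent(Y, eta=0.1, max_iter=1000):
    """Batch gradient descent on J_p(a) = sum over misclassified y of (-a^T y).

    Y holds normalized, augmented samples (class-B rows already negated), one per row.
    """
    a = np.zeros(Y.shape[1])                 # arbitrary starting value for a
    for _ in range(max_iter):
        mis = Y[Y @ a <= 0]                  # samples on the wrong side of the boundary
        if len(mis) == 0:                    # every sample satisfies a^T y > 0: converged
            break
        a = a + eta * mis.sum(axis=0)        # a <- a - eta * grad J_p = a + eta * sum of misclassified y
    return a

# Usage with the normalized samples from the previous sketch
Y = np.array([[1.0, 2.0, 2.0], [1.0, 3.0, 1.0], [-1.0, 1.0, 2.0], [-1.0, 2.0, 1.0]])
a = perceptron_criterion_descent(Y)
print(a, np.all(Y @ a > 0))                  # learned weight vector; True if all samples separated
```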
