
Lecture 16: 3 June, 2021

Madhavan Mukund

https://www.cmi.ac.in/~madhavan

Data Mining and Machine Learning, April–July 2021


The curse of dimensionality

ML data is often high dimensional — especially images

A 1000 × 1000 pixel image has 10⁶ features

Data behaves very differently in high dimensions

2D unit square: a random point has only about a 0.4% probability of being near the border (within 0.001 of it)

10⁴-D unit hypercube: over 99.999999% probability of being near the border

Distances between items

2D unit square: mean distance between 2 random points is 0.52

3D unit cube: mean distance between 2 random points is 0.66

10⁶-D unit hypercube: mean distance between 2 random points is approximately 408.25

There’s a lot of “space” in higher dimensions!

Higher danger of overfitting
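
These figures are easy to check numerically. Below is a minimal Monte Carlo sketch (not from the lecture; numpy assumed) that estimates the border probability and the mean pairwise distance in a unit hypercube:

```python
# Monte Carlo estimates of the "curse of dimensionality" figures above.
import numpy as np

rng = np.random.default_rng(0)

def border_fraction(dim, n, eps=0.001):
    # Fraction of uniform random points with some coordinate within eps of a face.
    pts = rng.random((n, dim))
    return ((pts < eps) | (pts > 1 - eps)).any(axis=1).mean()

def mean_distance(dim, n):
    # Mean Euclidean distance between two uniform random points.
    p, q = rng.random((n, dim)), rng.random((n, dim))
    return np.linalg.norm(p - q, axis=1).mean()

print(border_fraction(2, n=100_000))     # ~0.004, i.e. about 0.4%
print(border_fraction(10_000, n=1_000))  # ~1.0: essentially every point is near the border
print(mean_distance(2, n=100_000))       # ~0.52
print(mean_distance(3, n=100_000))       # ~0.66
# In d dimensions the mean distance concentrates near sqrt(d/6);
# for d = 10^6 that is about 408.25, matching the figure above.
```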


Dimensionality reduction

Remove unimportant features by projecting to a smaller dimension

Example: project blue points in 3D to black points in a 2D plane

Principal Component Analysis (PCA) — transform d-dimensional input to k-dimensional input, preserving essential features
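
As a concrete sketch, this is what such a projection looks like with scikit-learn's PCA (an illustration, not the lecture's code; the data here is synthetic):

```python
# Project 3D points onto the best-fitting 2D plane with PCA.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic 3D data that is "thin" in one direction, so 2D captures most of it.
X = rng.normal(size=(200, 3)) @ np.diag([5.0, 2.0, 0.1])

pca = PCA(n_components=2)      # transform d = 3 dimensional input to k = 2
X2d = pca.fit_transform(X)     # the projected (black) points in the 2D plane
print(X2d.shape)               # (200, 2)
print(pca.explained_variance_ratio_)  # fraction of variance kept by each component
```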


Singular Value Decomposition (SVD)

Input matrix M, dimensions n × d

Rows are items, columns are features

Decompose M as UDV⊤

D is a k × k diagonal matrix with positive real entries

U is n × k, V is d × k

Columns of U, V are orthonormal — unit vectors, mutually orthogonal

Interpretation

Columns of V correspond to new abstract features

Rows of U describe the decomposition of the items (rows of M) across these features

M = Σᵢ Dᵢᵢ (uᵢ vᵢ⊤)

For columns uᵢ of U and vᵢ of V, uᵢ vᵢ⊤ is an n × d matrix, like M

uᵢ vᵢ⊤ describes the components of the rows of M along direction vᵢ
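
A minimal numpy sketch of this decomposition and the rank-1 sum (illustrative; not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(6, 4))     # n = 6 items, d = 4 features

# Compact SVD: s holds the diagonal of D.
U, s, Vt = np.linalg.svd(M, full_matrices=False)
assert np.allclose(M, U @ np.diag(s) @ Vt)

# M as a sum of rank-1 matrices D_ii * (u_i v_i^T), each n x d like M.
M_sum = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(len(s)))
assert np.allclose(M, M_sum)

# Columns of U and V are orthonormal.
assert np.allclose(U.T @ U, np.eye(4))
assert np.allclose(Vt @ Vt.T, np.eye(4))
```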


Singular vectors

Unit vectors passing through the origin

Want to find the “best” k singular vectors to represent the feature space

Suppose we project a row aᵢ = (aᵢ₁, aᵢ₂, . . . , aᵢd) onto a vector v through the origin

Minimizing the distance of aᵢ from the line along v is equivalent to maximizing the projection of aᵢ onto v — by Pythagoras, distance² + projection² = |aᵢ|², which is fixed

Length of the projection is aᵢ · v
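
A quick numeric check of the Pythagoras argument (illustrative only):

```python
import numpy as np

a = np.array([3.0, 4.0])
v = np.array([1.0, 0.0])                  # unit vector through the origin

proj = a @ v                              # length of the projection of a onto v
dist = np.linalg.norm(a - proj * v)       # distance from a to the line along v
assert np.isclose(proj**2 + dist**2, a @ a)  # |a|^2 is fixed, so the two trade off
```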


Singular vectors . . .

The sum of squares of the lengths of the projections of all rows of M onto v is |Mv|²

First singular vector — the unit vector through the origin that maximizes the sum of squared projections of the rows of M:

v₁ = arg max_{|v|=1} |Mv|

Second singular vector — the unit vector through the origin, perpendicular to v₁, that maximizes the sum of squared projections:

v₂ = arg max_{v ⊥ v₁, |v|=1} |Mv|

Third singular vector — the unit vector through the origin, perpendicular to v₁ and v₂, that maximizes the sum of squared projections:

v₃ = arg max_{v ⊥ v₁,v₂, |v|=1} |Mv|
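
One way to compute v₁ is power iteration on M⊤M; the sketch below (illustrative, not the lecture's algorithm) finds it and checks against numpy's SVD:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(50, 5))

# Power iteration: repeatedly apply M^T M and renormalize; converges to v1.
v = rng.normal(size=5)
for _ in range(200):
    v = M.T @ (M @ v)
    v /= np.linalg.norm(v)

_, s, Vt = np.linalg.svd(M)
print(abs(v @ Vt[0]))           # ~1.0: same direction as v1, up to sign
print(np.linalg.norm(M @ v))    # ~s[0]: |Mv1| is the largest singular value
```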


Singular vectors . . .

With each singular vector vⱼ, the associated singular value is σⱼ = |Mvⱼ|

Repeat r times, till max_{v ⊥ v₁,v₂,...,vᵣ, |v|=1} |Mv| = 0

r turns out to be the rank of M

Vectors {v₁, v₂, . . . , vᵣ} are orthonormal right singular vectors

Our greedy strategy provably produces the “best-fit” dimension-r subspace for M

Dimension-r subspace that maximizes the content of M projected onto it

Corresponding left singular vectors are given by uᵢ = (1/σᵢ) Mvᵢ

Can show that {u₁, u₂, . . . , uᵣ} are also orthonormal
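
A small numpy check of these facts on a matrix of known rank (illustrative; numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(8, 3)) @ rng.normal(size=(3, 5))  # an 8 x 5 matrix of rank r = 3

U, s, Vt = np.linalg.svd(M)
r = int(np.sum(s > 1e-10))      # number of nonzero singular values
print(r)                        # 3, the rank of M

# Left singular vectors recovered as u_i = (1/sigma_i) M v_i.
U_rec = np.column_stack([(M @ Vt[i]) / s[i] for i in range(r)])
assert np.allclose(U_rec, U[:, :r])
assert np.allclose(U_rec.T @ U_rec, np.eye(r))  # also orthonormal
```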


Singular Value Decomposition

M, of dimension n × d and rank r, uniquely decomposes as M = UDV⊤

V = [v₁ v₂ · · · vᵣ] are the right singular vectors

D is a diagonal matrix with D[i, i] = σᵢ, the singular values

U = [u₁ u₂ · · · uᵣ] are the left singular vectors

M (n × d) = U (n × r) · D (r × r) · V⊤ (r × d)


Rank-k approximation

M has rank r; the SVD gives a rank-r decomposition

Singular values are non-increasing — σ₁ ≥ σ₂ ≥ · · · ≥ σᵣ

Suppose we retain only the k largest ones

We have

Matrix of the first k right singular vectors Vₖ = [v₁ v₂ · · · vₖ], with corresponding singular values σ₁, σ₂, . . . , σₖ

Matrix of the k left singular vectors Uₖ = [u₁ u₂ · · · uₖ]

Let Dₖ be the k × k diagonal matrix with entries σ₁, σ₂, . . . , σₖ

Then UₖDₖVₖ⊤ is the best-fit rank-k approximation of M

In other words, by truncating the SVD, we can focus on the k most significant features implicit in M
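
A short numpy sketch of the truncation (illustrative), including the standard Eckart–Young fact that the squared error is the sum of the discarded squared singular values:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(100, 40))

U, s, Vt = np.linalg.svd(M, full_matrices=False)

k = 10
M_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]   # U_k D_k V_k^T, the rank-k approximation

err2 = np.linalg.norm(M - M_k, "fro") ** 2
assert np.isclose(err2, np.sum(s[k:] ** 2))  # error = discarded singular values
```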


PCA and variance

Interpret PCA in terms of preserving variance

Different projections have different variance

SVD orders projections in decreasing order of variance

Criterion for choosing when to stop

Choose k so that a desired fraction of the variance is “explained”
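
For instance, with scikit-learn (a sketch, not the lecture's code), k can be chosen to explain a desired fraction of the variance:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50)) @ rng.normal(size=(50, 50))  # correlated features

pca = PCA().fit(X)
cumulative = np.cumsum(pca.explained_variance_ratio_)
k = int(np.argmax(cumulative >= 0.95)) + 1    # smallest k explaining 95% of the variance
print(k, cumulative[k - 1])

# Equivalently, pass the fraction directly and let PCA pick k.
X_reduced = PCA(n_components=0.95).fit_transform(X)
print(X_reduced.shape)
```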


Manifold learning

Projection may not always help

Swiss roll dataset

Projection onto 2 dimensions is not useful

Better to unroll the Swiss roll

Discover the manifold along which the data lies
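
A sketch with scikit-learn's Swiss roll and locally linear embedding, one standard manifold-learning method (illustrative; not the lecture's code):

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

# 3D points lying along a rolled-up 2D sheet.
X, t = make_swiss_roll(n_samples=1000, noise=0.05, random_state=0)

# A linear projection to 2D would squash the layers of the roll together;
# LLE instead recovers the 2D manifold the data lies along.
lle = LocallyLinearEmbedding(n_components=2, n_neighbors=10, random_state=0)
X_unrolled = lle.fit_transform(X)
print(X_unrolled.shape)   # (1000, 2)
```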
