A PowerPoint Presentation
PRESENTED BY Firstname Lastname⎪ August 25, 2013
Online Principal Component Analysis
Boutsidis, Garber, Karnin, Liberty
PRESENTED BY Zohar Karnin⎪ November 23, 2014
Data Matrix
2 Yahoo labs
• Often, data is represented as a huge matrix
• Sometimes, we can’t store the entire matrix
Principal Component Analysis
• Often, we require a low-rank approximation of a matrix A
  › Recommender systems, images, LSA, …
• The approximation is used to save space and, often, to clean up noise
[Figure: A written as a sum of rank-one components, A = … + … + …]
Column by Column Stream
• Data arrives column by column
• Each column is an item, and we see the items one at a time
The Formal Stream Setup
• Observe $x_1 \in \mathbb{R}^d$, output $y_1 \in \mathbb{R}^k$
• …
• Observe $x_t \in \mathbb{R}^d$, output $y_t \in \mathbb{R}^k$
The Formal Stream Setup
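Cost $= \min_{\Phi} \sum_t \|x_t - \Phi y_t\|^2$
s.t. $\Phi$ is an embedding from $\mathbb{R}^k$ to $\mathbb{R}^d$, i.e., $\|y_i - y_j\| = \|\Phi y_i - \Phi y_j\|$ for all $i, j$
[Figure: input matrix $X$ and output matrix $Y$]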
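A short derivation makes the embedding constraint concrete (standard linear algebra, our addition rather than the talk's): a linear map $\Phi$ preserves all distances exactly when its columns are orthonormal, $\Phi^\top \Phi = I_k$, and for such a fixed $\Phi$ the best output for a column is $y_t = \Phi^\top x_t$.

```latex
\|\Phi y_i - \Phi y_j\|^2 = (y_i - y_j)^\top \Phi^\top \Phi \,(y_i - y_j) = \|y_i - y_j\|^2
\quad \text{if } \Phi^\top \Phi = I_k ,
\\[4pt]
\|x_t - \Phi y\|^2 = \|x_t\|^2 - 2\, y^\top \Phi^\top x_t + \|y\|^2
\;\Rightarrow\; \arg\min_y \|x_t - \Phi y\|^2 = \Phi^\top x_t ,
\qquad \min_y \|x_t - \Phi y\|^2 = \|x_t\|^2 - \|\Phi^\top x_t\|^2 .
```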
The Cost Function
[Figure: output matrix $Y$ and input matrix $X$]
The Cost Function
[Figure: $X - \Phi Y$]
$\Phi Y$: the embedding of $Y$ into the same space as $X$
The Cost Function
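[Figure: $X - \Phi Y = R$]
• Error matrix: $R = X - \Phi Y$
• Frobenius error $= \|R\|_F^2 = \sum_{ij} (X_{ij} - (\Phi Y)_{ij})^2$ (the squared error, i.e., MSE up to normalization)
• Spectral error $= \|R\|_2 = \max_{\|v\|=1} \|v^\top X - v^\top (\Phi Y)\|$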
Secondary Costs: Computational Resources
• Run time: number of operations required per observed column
• Memory
Previous Works
• Regret minimization setting [WK 07], [NKW 13]
  › At time $t$, before observing $x_t$, predict $U_t$, a projection matrix onto a $k$-dimensional subspace. The loss is $\|x_t - U_t x_t\|^2$
  › Each $U_t$ can be completely different
• Stochastic setting [ACS 13], [MCJ 13], [BDF 13]
  › The $x_t$ are drawn i.i.d. from some distribution. Objective: find, as quickly as possible, a $U$ minimizing $\mathbb{E}[\|x_t - U x_t\|^2]$
• Reconstruction matrix (not an embedding) [CW 09]
  › $\min_{\Phi} \sum_t \|x_t - \Phi y_t\|^2$ s.t. $\Phi$ is an arbitrary linear transformation from $\mathbb{R}^k$ to $\mathbb{R}^d$
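The projection loss in the first two settings and the embedding cost coincide once $U$ is built from an orthonormal basis: projecting gives coordinates $y = \Phi^\top x$, and by Pythagoras the loss is $\|x\|^2 - \|\Phi^\top x\|^2$. A minimal sketch, with a hypothetical helper `projection_loss` and the basis assumed orthonormal:

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def projection_loss(x, basis):
    """||x - U x||^2, where U projects onto span(basis); basis vectors must be orthonormal."""
    coeffs = [dot(u, x) for u in basis]  # these are the coordinates Phi^T x
    proj = [sum(c * u[i] for c, u in zip(coeffs, basis)) for i in range(len(x))]
    return sum((xi - pi) ** 2 for xi, pi in zip(x, proj))


x = [3.0, 4.0, 0.0]
basis = [[1.0, 0.0, 0.0]]                 # a 1-dimensional subspace of R^3
print(projection_loss(x, basis))          # 16.0
# Pythagoras: the loss equals ||x||^2 - ||Phi^T x||^2 = 25 - 9.
print(dot(x, x) - sum(dot(u, x) ** 2 for u in basis))  # 16.0
```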
Results
• $X$ = $d \times n$ matrix whose columns are observed
• $k \ll d$
• $X_k$ = best rank-$k$ approximation of $X$ (top $k$ directions)
• $\mathrm{OPT} = \|X - X_k\|_F^2$
• Theorem 1: Given $\|X\|_F$, $k$, $\varepsilon$: Error $\le \mathrm{OPT} + \varepsilon \|X\|_F^2$
  › Memory, target dimension, processing time per column $= O(k/\varepsilon^2)$
• Theorem 2: Given $k$, $\varepsilon$: Error $\le \mathrm{OPT} + \varepsilon \|X\|_F^2$
  › Memory, target dimension, processing time per column $= O(k/\varepsilon^3)$
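Plugging illustrative values into the two bounds (our choice of $k = 10$ and $\varepsilon = 0.1$, ignoring the constants hidden by $O(\cdot)$) shows what not knowing $\|X\|_F$ in advance costs:

```latex
\frac{k}{\varepsilon^2} = \frac{10}{0.01} = 1{,}000
\qquad\text{vs.}\qquad
\frac{k}{\varepsilon^3} = \frac{10}{0.001} = 10{,}000 .
```

That is an extra factor of $1/\varepsilon = 10$ in memory, target dimension, and per-column time.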
The “Operator Norm” Cost Function
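• $Y$ = output matrix $[y_1, \dots, y_n]$
• Cost $= \|X - \Phi Y\|_F^2$
  › Interpretation: mean squared error
• Alternative cost: $\|X - \Phi Y\|_2$
  › Interpretation: bounds $\|v^\top X - v^\top \Phi Y\|$ for every unit vector $v$ (it equals the maximum over them)
• Noise ($X - X_k$) vs. signal ($X_k$): sometimes $\|X - X_k\|_F^2 \ll \|X_k\|_F^2$; sometimes $\|X - X_k\|_F^2 \gg \|X_k\|_F^2$ … but $\|X - X_k\|_2 \ll \|X_k\|_2$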
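An illustrative matrix (our example, not from the talk) where both comparisons can be checked by hand: take an $n \times n$ diagonal $X$ with one entry $s + c$ and $n - 1$ entries $c$ (with $s, c > 0$), and $k = 1$, so $X_1 = \mathrm{diag}(s + c, 0, \dots, 0)$.

```latex
\|X - X_1\|_F^2 = (n-1)\,c^2, \qquad \|X_1\|_F^2 = (s+c)^2, \qquad
\|X - X_1\|_2 = c, \qquad \|X_1\|_2 = s + c .
\\[4pt]
n = 10^6,\ c = 1,\ s = 100:\quad
\|X - X_1\|_F^2 = 999{,}999 \gg 10{,}201 = \|X_1\|_F^2,
\quad\text{but}\quad
\|X - X_1\|_2 = 1 \ll 101 = \|X_1\|_2 .
```

Here the Frobenius cost is dominated by the many small noise directions, while the operator norm still separates the signal direction from the noise.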
Results
• Theorem 3 [under construction]: Given $\|X\|$, $\|X - X_k\|$, $k$, $\varepsilon$: Operator norm error $\le \mathrm{OPT}_{\text{operator}} + \varepsilon \|X\|^2$
  › Target dimension $= O(k/\varepsilon)$
Algorithm
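• Maintain $U: \mathbb{R}^d \to \mathbb{R}^\ell$
• Directions are only added, never removed (for now)
• $r$ = tolerable error radius $= \|X\|_F / \ell^{1/2}$
• “Error ellipsoid” of the residuals
• When the error ellipsoid extends beyond radius $r$ along some direction $u_1$, add vector $u_1$ to $U$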
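The loop on this slide can be sketched in pure Python as follows. This is a minimal sketch under our own assumptions, not the authors' code: the error ellipsoid is represented by $C = \sum_t r_t r_t^\top$ (its squared semi-axes are the eigenvalues of $C$), a direction is added when the largest eigenvalue exceeds $r^2 = \|X\|_F^2/\ell$ (constant factor 1, taken from the slide's radius), that direction is projected out of $C$, and the output is $y_t = U x_t$ with the current rows of $U$. The helper names and the Jacobi eigen-solver are hypothetical choices made to keep the block self-contained.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def residual(x, basis):
    """x minus its projection onto the span of the orthonormal rows of U (modified Gram-Schmidt)."""
    r = list(x)
    for b in basis:
        c = dot(b, r)
        r = [ri - c * bi for ri, bi in zip(r, b)]
    return r


def top_eigpair(c):
    """Largest eigenvalue and unit eigenvector of a small symmetric matrix (cyclic Jacobi)."""
    n = len(c)
    a = [row[:] for row in c]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(100):
        if sum(a[i][j] * a[i][j] for i in range(n) for j in range(n) if i != j) < 1e-20:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-300:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                cs = 1.0 / math.sqrt(t * t + 1.0)
                sn = t * cs
                for m in (a, v):  # right-multiply a and v by the rotation (columns p, q)
                    for row in m:
                        rp, rq = row[p], row[q]
                        row[p], row[q] = cs * rp - sn * rq, sn * rp + cs * rq
                for k in range(n):  # left-multiply a by the transposed rotation (rows p, q)
                    ap, aq = a[p][k], a[q][k]
                    a[p][k], a[q][k] = cs * ap - sn * aq, sn * ap + cs * aq
    i = max(range(n), key=lambda j: a[j][j])
    return a[i][i], [v[k][i] for k in range(n)]


def project_out(c, u):
    """(I - u u^T) C (I - u u^T) for symmetric C and unit vector u."""
    cu = [dot(row, u) for row in c]
    ucu = dot(u, cu)
    d = len(u)
    return [[c[i][j] - u[i] * cu[j] - cu[i] * u[j] + u[i] * u[j] * ucu
             for j in range(d)] for i in range(d)]


def online_pca(xs, frob_sq, ell):
    """Stream columns x_t and output y_t = U x_t; add a direction whenever the residual
    error ellipsoid has a squared semi-axis above r^2 = frob_sq / ell."""
    d = len(xs[0])
    thresh = frob_sq / ell
    basis = []                            # rows of U (orthonormal directions in R^d)
    cov = [[0.0] * d for _ in range(d)]   # sum of r_t r_t^T, kept orthogonal to span(U)
    ys = []
    for x in xs:
        r = residual(x, basis)
        for i in range(d):
            for j in range(d):
                cov[i][j] += r[i] * r[j]
        lam, u = top_eigpair(cov)
        while lam > thresh and len(basis) < d:
            u = residual(u, basis)        # re-orthogonalize against U for numerical safety
            norm = math.sqrt(dot(u, u))
            basis.append([ui / norm for ui in u])
            cov = project_out(cov, basis[-1])
            lam, u = top_eigpair(cov)
        ys.append([dot(b, x) for b in basis])
    return basis, ys


basis, ys = online_pca([[1.0, 2.0, 0.0], [2.0, 4.0, 0.0], [3.0, 6.0, 0.0]], 70.0, 4)
print(len(basis), [len(y) for y in ys])  # 1 [0, 1, 1]
```

In the usage line the stream has rank one, so a single direction is added at the second column and every later residual is zero. Each added direction removes more than $\|X\|_F^2/\ell$ from the trace of $C$, and that trace never exceeds $\|X\|_F^2$, so this sketch adds fewer than $\ell$ directions.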
Analysis: Target Dimension
• $r$ = tolerable error radius $= \|X\|_F / \ell^{1/2}$
• Target dimension = number of vectors added to $U$
• Obs: adding a vector to $U$ requires $\|X\|_F^2 / \ell$ weight from $\|X\|_F^2$
  › Hence, the number of vectors added to $U$ is at most $\ell$
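The counting step written as a chain of inequalities (a restatement of the observation above; "captured weight" is the part of $\|X\|_F^2$ charged to each added vector, and these parts do not overlap):

```latex
\text{captured weight}(u) \ge r^2 = \frac{\|X\|_F^2}{\ell}\ \ \text{for each added } u,
\qquad
\sum_{\text{added } u} \text{captured weight}(u) \le \|X\|_F^2
\\[4pt]
\Longrightarrow\;
(\#\text{vectors added}) \cdot \frac{\|X\|_F^2}{\ell} \le \|X\|_F^2
\;\Longrightarrow\;
\#\text{vectors added} \le \ell .
```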
Analysis: Cost
• “Error ellipsoid”
• $Y$ = output matrix
• $R$ = error matrix $= X - U_n^\top Y$
• Operator norm cost $= \|R\|_2^2 = \max\{r_1^2, r_2^2\}$
• Frobenius cost $= \|R\|_F^2 = r_1^2 + r_2^2$
[Figure: error ellipsoid with semi-axes $r_1$ and $r_2$]
Analysis: Cost
• $r$ = tolerable error radius $= \|X\|_F / \ell^{1/2}$
• “Error ellipsoid”
• $Y$ = output matrix
• $R$ = error matrix $= X - U_n^\top Y$
Statements:
• $\|R\|_2^2 \le r^2 = \|X\|_F^2 / \ell$
• $\|R\|_F^2 \le$ (loss from $X_k$) + (loss from $X - X_k$) $\le \|X\|_F^2 (k/\ell)^{1/2} + \|X - X_k\|_F^2$
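Choosing $\ell$ to match Theorem 1: substituting $\ell = k/\varepsilon^2$ into the last statement turns the extra term into exactly $\varepsilon\|X\|_F^2$, while the target dimension analysis keeps at most $\ell = k/\varepsilon^2$ added directions.

```latex
\ell = \frac{k}{\varepsilon^2}
\;\Rightarrow\;
\left(\frac{k}{\ell}\right)^{1/2} = \left(\frac{k\,\varepsilon^2}{k}\right)^{1/2} = \varepsilon
\;\Rightarrow\;
\|R\|_F^2 \le \|X - X_k\|_F^2 + \varepsilon \|X\|_F^2 = \mathrm{OPT} + \varepsilon \|X\|_F^2 .
```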
Implementation: Memory and Run-time Complexity
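$r_t = x_t - U_t^\top U_t x_t$ (the part of $x_t$ not captured by the directions in $U_t$)
$R = [r_1, r_2, \dots, r_t]$
• Straightforward version requires maintaining $RR^\top$
  › Update time, memory requirements $= d^2$
• Instead: maintain $Z$, a $d \times \ell$ matrix such that $ZZ^\top \approx RR^\top$
• $\|ZZ^\top - RR^\top\| < \|R\|_F^2 / \ell$
• [Lib 12] Update time, memory requirements $= d\ell$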
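A minimal pure-Python sketch of maintaining $Z$ in the spirit of [Lib 12], under stated assumptions: we use a simple Frequent-Directions-style variant (append the new column; when there are $\ell + 1$ columns, subtract the smallest squared singular value $\delta$ from all of them and drop the zeroed column), computing the decomposition from the small Gram matrix $Z^\top Z$ with a Jacobi eigen-solver. All function names are hypothetical. This variant keeps $0 \preceq RR^\top - ZZ^\top \preceq \frac{\|R\|_F^2}{\ell + 1} I$, but it recomputes a decomposition on every shrink, so it does not reach the $d\ell$ update time quoted on the slide; that rate relies on amortizing the shrink step (e.g., by buffering extra columns).

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def jacobi_eigh(g):
    """Eigenvalues and eigenvectors (columns of v) of a small symmetric matrix (cyclic Jacobi)."""
    n = len(g)
    a = [row[:] for row in g]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(100):
        if sum(a[i][j] * a[i][j] for i in range(n) for j in range(n) if i != j) < 1e-20:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-300:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                cs = 1.0 / math.sqrt(t * t + 1.0)
                sn = t * cs
                for m in (a, v):  # right-multiply by the rotation (columns p, q)
                    for row in m:
                        rp, rq = row[p], row[q]
                        row[p], row[q] = cs * rp - sn * rq, sn * rp + cs * rq
                for k in range(n):  # left-multiply a by the transposed rotation (rows p, q)
                    ap, aq = a[p][k], a[q][k]
                    a[p][k], a[q][k] = cs * ap - sn * aq, sn * ap + cs * aq
    return [a[i][i] for i in range(n)], v


def fd_update(z, r, ell):
    """Append column r to the sketch z (a list of columns in R^d). If z then has ell + 1
    columns, shrink every squared singular value by the smallest one (delta) and drop the
    zeroed column; this removes at most delta from Z Z^T in every direction."""
    z = z + [list(r)]
    if len(z) <= ell:
        return z
    lam, v = jacobi_eigh([[dot(zi, zj) for zj in z] for zi in z])  # Gram matrix Z^T Z
    i_min = min(range(len(lam)), key=lambda i: lam[i])
    delta = max(lam[i_min], 0.0)
    d = len(r)
    new_z = []
    for i, li in enumerate(lam):
        if i == i_min or li <= 0.0:
            continue
        w = [sum(v[j][i] * z[j][k] for j in range(len(z))) for k in range(d)]  # Z v_i, norm^2 = li
        scale = math.sqrt(max(li - delta, 0.0) / li)
        new_z.append([scale * wk for wk in w])
    return new_z


def sketch_stream(rs, ell):
    z = []
    for r in rs:
        z = fd_update(z, r, ell)
    return z


def covariance_gap(rs, z):
    """Eigenvalues of R R^T - Z Z^T."""
    d = len(rs[0])
    gap = [[sum(r[i] * r[j] for r in rs) - sum(c[i] * c[j] for c in z)
            for j in range(d)] for i in range(d)]
    return jacobi_eigh(gap)[0]
```

Usage: for residual columns `rs`, `covariance_gap(rs, sketch_stream(rs, ell))` should return eigenvalues in $[0, \|R\|_F^2/\ell)$ up to rounding, which is the slide's guarantee $\|ZZ^\top - RR^\top\| < \|R\|_F^2/\ell$.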
Implementation: Unknown Horizon
Error radius parameter $= \|X\|_F / \ell^{1/2}$
• Def: $X_t = [x_1, \dots, x_t]$
• Idea: use a growing radius parameter $\|X_t\|_F / \ell^{1/2}$
• Thm: works as before, but target dimension $= \ell \cdot \log(n)$
  › Divide time into epochs; in each epoch, $N \le \|X_t\|_F \le 2N$
  › At most $\ell$ directions are added in each epoch
• Idea 2: if a direction $u$ becomes weak ($\|u^\top X_t\| \ll \|X_t\|_F / \ell^{1/2}$), remove it
• Thm: works as before, target dimension $= \ell / \varepsilon$
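A small sketch of the epoch bookkeeping behind the $\ell \cdot \log(n)$ bound. The concrete choice of $N$ (powers of two times the norm at the first nonzero column) and the helper names are our assumptions; the slide only says that $\|X_t\|_F$ stays within a factor of 2 in each epoch and that at most $\ell$ directions are added per epoch.

```python
def epoch_indices(col_norms_sq):
    """Epoch j covers the times t with 2^j N0 <= ||X_t||_F < 2^(j+1) N0, where N0 is
    ||X_t||_F at the first nonzero column (an assumed choice of N)."""
    total, first, out = 0.0, None, []
    for s in col_norms_sq:
        total += s                          # ||X_t||_F^2
        if total == 0.0:
            out.append(0)
            continue
        if first is None:
            first = total
        ratio_sq = total / first            # (||X_t||_F / N0)^2
        j = 0
        while 4.0 ** (j + 1) <= ratio_sq:   # compare squares: (2^(j+1))^2 = 4^(j+1)
            j += 1
        out.append(j)
    return out


def target_dim_bound(col_norms_sq, ell):
    """At most ell directions per epoch, so at most ell * (number of epochs) overall."""
    return ell * len(set(epoch_indices(col_norms_sq)))


norms_sq = [1.0] * 1024                    # 1024 unit-norm columns: ||X_t||_F = sqrt(t)
print(epoch_indices(norms_sq)[-1])          # 5
print(target_dim_bound(norms_sq, 3))        # 18
```

With $n$ unit-norm columns, $\|X_t\|_F = \sqrt{t}$, so the number of epochs is about $\tfrac12 \log_2 n + 1$ (6 for $n = 1024$), which is where the logarithmic factor comes from.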
Conclusions and Open Questions
• We obtain error $\le \mathrm{OPT} + \varepsilon \|X\|_F^2$ with target dimension $O(k/\varepsilon^3)$. Can we reduce the dependence on $\varepsilon$?
• Improve to $\mathrm{OPT}(1+\varepsilon)$?
• Lower bound? (currently the same as for an arbitrary reconstruction matrix)
• Obtain an approximation of $\mathrm{OPT} + \varepsilon \|X - X_k\|_2$
Thank you!