Download - Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15
![Page 1: Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15](https://reader033.vdocuments.us/reader033/viewer/2022050614/58ecb0f21a28aba9608b4583/html5/thumbnails/1.jpg)
Beating the Perils of Non-Convexity:Machine Learning using Tensor Methods
Anima Anandkumar
U.C. Irvine
![Page 2: Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15](https://reader033.vdocuments.us/reader033/viewer/2022050614/58ecb0f21a28aba9608b4583/html5/thumbnails/2.jpg)
Learning with Big Data
Learning is finding needle in a haystack
![Page 3: Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15](https://reader033.vdocuments.us/reader033/viewer/2022050614/58ecb0f21a28aba9608b4583/html5/thumbnails/3.jpg)
Learning with Big Data
Learning is finding needle in a haystack
High dimensional regime: as data grows, more variables!
Useful information: low-dimensional structures.
Learning with big data: ill-posed problem.
![Page 4: Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15](https://reader033.vdocuments.us/reader033/viewer/2022050614/58ecb0f21a28aba9608b4583/html5/thumbnails/4.jpg)
Learning with Big Data
Learning is finding needle in a haystack
High dimensional regime: as data grows, more variables!
Useful information: low-dimensional structures.
Learning with big data: ill-posed problem.
Learning with big data: statistically and computationally challenging!
![Page 5: Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15](https://reader033.vdocuments.us/reader033/viewer/2022050614/58ecb0f21a28aba9608b4583/html5/thumbnails/5.jpg)
Optimization for Learning
Most learning problems can be cast as optimization.
Unsupervised Learning
Clusteringk-means, hierarchical . . .
Maximum Likelihood EstimatorProbabilistic latent variable models
Supervised Learning
Optimizing a neural network withrespect to a loss function
Input
Neuron
Output
![Page 6: Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15](https://reader033.vdocuments.us/reader033/viewer/2022050614/58ecb0f21a28aba9608b4583/html5/thumbnails/6.jpg)
Convex vs. Non-convex Optimization
Progress is only tip of the iceberg..
Images taken from https://www.facebook.com/nonconvex
![Page 7: Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15](https://reader033.vdocuments.us/reader033/viewer/2022050614/58ecb0f21a28aba9608b4583/html5/thumbnails/7.jpg)
Convex vs. Non-convex Optimization
Progress is only tip of the iceberg.. Real world is mostly non-convex!
Images taken from https://www.facebook.com/nonconvex
![Page 8: Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15](https://reader033.vdocuments.us/reader033/viewer/2022050614/58ecb0f21a28aba9608b4583/html5/thumbnails/8.jpg)
Convex vs. Nonconvex Optimization
Unique optimum: global/local. Multiple local optima
![Page 9: Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15](https://reader033.vdocuments.us/reader033/viewer/2022050614/58ecb0f21a28aba9608b4583/html5/thumbnails/9.jpg)
Convex vs. Nonconvex Optimization
Unique optimum: global/local. Multiple local optima
In high dimensions possiblyexponential local optima
![Page 10: Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15](https://reader033.vdocuments.us/reader033/viewer/2022050614/58ecb0f21a28aba9608b4583/html5/thumbnails/10.jpg)
Convex vs. Nonconvex Optimization
Unique optimum: global/local. Multiple local optima
In high dimensions possiblyexponential local optima
How to deal with non-convexity?
![Page 11: Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15](https://reader033.vdocuments.us/reader033/viewer/2022050614/58ecb0f21a28aba9608b4583/html5/thumbnails/11.jpg)
Outline
1 Introduction
2 Spectral Methods: Matrices and Tensors
3 Applications of Tensor Methods
4 Conclusion
![Page 12: Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15](https://reader033.vdocuments.us/reader033/viewer/2022050614/58ecb0f21a28aba9608b4583/html5/thumbnails/12.jpg)
Matrices and Tensors as Data Structures
Modern data: Multi-modal,multi-relational data.
Matrices: pairwise relations. Tensors: higher order relations.
Multi-modal data figure from Lise Getoor slides.
![Page 13: Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15](https://reader033.vdocuments.us/reader033/viewer/2022050614/58ecb0f21a28aba9608b4583/html5/thumbnails/13.jpg)
Tensor Representation of Higher Order Moments
Matrix: Pairwise Moments
E[x⊗ x] ∈ Rd×d is a second order tensor.
E[x⊗ x]i1,i2 = E[xi1xi2 ].
For matrices: E[x⊗ x] = E[xx⊤].
Tensor: Higher order Moments
E[x⊗ x⊗ x] ∈ Rd×d×d is a third order tensor.
E[x⊗ x⊗ x]i1,i2,i3 = E[xi1xi2xi3 ].
![Page 14: Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15](https://reader033.vdocuments.us/reader033/viewer/2022050614/58ecb0f21a28aba9608b4583/html5/thumbnails/14.jpg)
Spectral Decomposition of Tensors
M2 =∑
i
λiui ⊗ vi
= + ....
Matrix M2 λ1u1 ⊗ v1 λ2u2 ⊗ v2
![Page 15: Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15](https://reader033.vdocuments.us/reader033/viewer/2022050614/58ecb0f21a28aba9608b4583/html5/thumbnails/15.jpg)
Spectral Decomposition of Tensors
M2 =∑
i
λiui ⊗ vi
= + ....
Matrix M2 λ1u1 ⊗ v1 λ2u2 ⊗ v2
M3 =∑
i
λiui ⊗ vi ⊗ wi
= + ....
Tensor M3 λ1u1 ⊗ v1 ⊗ w1 λ2u2 ⊗ v2 ⊗ w2
We have developed efficient methods to solve tensor decomposition.
![Page 16: Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15](https://reader033.vdocuments.us/reader033/viewer/2022050614/58ecb0f21a28aba9608b4583/html5/thumbnails/16.jpg)
Tensor Decomposition Methods
SVD on Pairwise Moments toobtain W
Fast SVD via RandomProjection
+ ++ +
Dimensionality Reduction ofthird order moments using W
+ ++ +
Power method on reduced tensor T : v 7→T (I, v, v)
‖T (I, v, v)‖.
Guaranteed to converge to correct solution!
![Page 17: Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15](https://reader033.vdocuments.us/reader033/viewer/2022050614/58ecb0f21a28aba9608b4583/html5/thumbnails/17.jpg)
Summary on Tensor Methods
= + ....
Tensor decomposition: iterative algorithm with tensor contractions.
Tensor contraction: T (W1,W2,W3), where Wi are matrices/vectors.
Strengths of tensor method
Guaranteed convergence to globally optimal solution!
Fast and accurate, orders of magnitude faster than previous methods.
Embarrassingly parallel and suited for distributed systems, e.g.Spark.
Exploit optimized BLAS operations on CPUs and GPUs.
![Page 18: Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15](https://reader033.vdocuments.us/reader033/viewer/2022050614/58ecb0f21a28aba9608b4583/html5/thumbnails/18.jpg)
Preliminary Results on Spark
Alternating Least Squares for Tensor Decomposition.
minw,A,B,C
∥
∥
∥
∥
∥
T −
k∑
i=1
λiA(:, i) ⊗B(:, i)⊗ C(:, i)
∥
∥
∥
∥
∥
2
F
Update Rows Independently
Tensor slices
B C
worker 1
worker 2
worker i
worker k
Results on NYtimes corpus
3 ∗ 105 documents, 108 words
Spark Map-Reduce
26mins 4 hrs
https://github.com/FurongHuang/SpectralLDA-TensorSpark
![Page 19: Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15](https://reader033.vdocuments.us/reader033/viewer/2022050614/58ecb0f21a28aba9608b4583/html5/thumbnails/19.jpg)
Outline
1 Introduction
2 Spectral Methods: Matrices and Tensors
3 Applications of Tensor Methods
4 Conclusion
![Page 20: Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15](https://reader033.vdocuments.us/reader033/viewer/2022050614/58ecb0f21a28aba9608b4583/html5/thumbnails/20.jpg)
Tensor Methods for Learning LVMs
GMM HMM
h1 h2 h3
x1 x2 x3
ICA
h1 h2 hk
x1 x2 xd
Multiview and Topic Models
![Page 21: Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15](https://reader033.vdocuments.us/reader033/viewer/2022050614/58ecb0f21a28aba9608b4583/html5/thumbnails/21.jpg)
Overview of Our Approach
Tensor-based Model Estimation
= + +
Unlabeled Data
![Page 22: Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15](https://reader033.vdocuments.us/reader033/viewer/2022050614/58ecb0f21a28aba9608b4583/html5/thumbnails/22.jpg)
Overview of Our Approach
Tensor-based Model Estimation
= + +
Unlabeled Data → Probabilistic admixture models
![Page 23: Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15](https://reader033.vdocuments.us/reader033/viewer/2022050614/58ecb0f21a28aba9608b4583/html5/thumbnails/23.jpg)
Overview of Our Approach
Tensor-based Model Estimation
= + +
Unlabeled Data → Probabilistic admixture models → Tensor Decomposition
![Page 24: Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15](https://reader033.vdocuments.us/reader033/viewer/2022050614/58ecb0f21a28aba9608b4583/html5/thumbnails/24.jpg)
Overview of Our Approach
Tensor-based Model Estimation
= + +
Unlabeled Data → Probabilistic admixture models → Tensor Decomposition → Inference
![Page 25: Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15](https://reader033.vdocuments.us/reader033/viewer/2022050614/58ecb0f21a28aba9608b4583/html5/thumbnails/25.jpg)
Overview of Our Approach
Tensor-based Model Estimation
= + +
Unlabeled Data → Probabilistic admixture models → Tensor Decomposition → Inference
tensor decomposition → correct model
![Page 26: Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15](https://reader033.vdocuments.us/reader033/viewer/2022050614/58ecb0f21a28aba9608b4583/html5/thumbnails/26.jpg)
Overview of Our Approach
Tensor-based Model Estimation
= + +
Unlabeled Data → Probabilistic admixture models → Tensor Decomposition → Inference
tensor decomposition → correct model
Highly parallelized, CPU/GPU/Cloud implementation.
Guaranteed global convergence for Tensor Decomposition.
![Page 27: Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15](https://reader033.vdocuments.us/reader033/viewer/2022050614/58ecb0f21a28aba9608b4583/html5/thumbnails/27.jpg)
Community Detection on Yelp and DBLP datasets
n ∼ 20, 000
Yelp
n ∼ 40, 000
DBLP
n ∼ 1 million
Tensor vs. Variational
FB YP DBLP−sub DBLP10
0
101
102
103
104
105
Datasets
Run
ning
tim
e (s
econ
ds)
VariationalTensor
![Page 28: Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15](https://reader033.vdocuments.us/reader033/viewer/2022050614/58ecb0f21a28aba9608b4583/html5/thumbnails/28.jpg)
Tensor Methods for Training Neural Networks
Tremendous practical impactwith deep learning.
Algorithm: backpropagation.
Highly non-convex optimization
![Page 29: Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15](https://reader033.vdocuments.us/reader033/viewer/2022050614/58ecb0f21a28aba9608b4583/html5/thumbnails/29.jpg)
Toy Example: Failure of Backpropagation
x1
x2
y=1y=−1
Labeled input samplesGoal: binary classification
σ(·) σ(·)
y
x1 x2x
w1 w2
Our method: guaranteed risk bounds for training neural networks
![Page 30: Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15](https://reader033.vdocuments.us/reader033/viewer/2022050614/58ecb0f21a28aba9608b4583/html5/thumbnails/30.jpg)
Toy Example: Failure of Backpropagation
x1
x2
y=1y=−1
Labeled input samplesGoal: binary classification
σ(·) σ(·)
y
x1 x2x
w1 w2
Our method: guaranteed risk bounds for training neural networks
![Page 31: Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15](https://reader033.vdocuments.us/reader033/viewer/2022050614/58ecb0f21a28aba9608b4583/html5/thumbnails/31.jpg)
Toy Example: Failure of Backpropagation
x1
x2
y=1y=−1
Labeled input samplesGoal: binary classification
σ(·) σ(·)
y
x1 x2x
w1 w2
Our method: guaranteed risk bounds for training neural networks
![Page 32: Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15](https://reader033.vdocuments.us/reader033/viewer/2022050614/58ecb0f21a28aba9608b4583/html5/thumbnails/32.jpg)
Backpropagation vs. Our Method
Weights w2 randomly drawn and fixed
Backprop (quadratic) loss surface
−4
−3
−2
−1
0
1
2
3
4 −4−3
−2−1
01
23
4
200
250
300
350
400
450
500
550
600
650
w1(1) w1(2)
![Page 33: Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15](https://reader033.vdocuments.us/reader033/viewer/2022050614/58ecb0f21a28aba9608b4583/html5/thumbnails/33.jpg)
Backpropagation vs. Our Method
Weights w2 randomly drawn and fixed
Backprop (quadratic) loss surface
−4
−3
−2
−1
0
1
2
3
4 −4−3
−2−1
01
23
4
200
250
300
350
400
450
500
550
600
650
w1(1) w1(2)
Loss surface for our method
−4
−3
−2
−1
0
1
2
3
4 −4−3
−2−1
01
23
4
0
20
40
60
80
100
120
140
160
180
200
w1(1) w1(2)
![Page 34: Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15](https://reader033.vdocuments.us/reader033/viewer/2022050614/58ecb0f21a28aba9608b4583/html5/thumbnails/34.jpg)
Feature Transformation for Training Neural Networks
Feature learning: Learn φ(·) from input data.
How to use φ(·) to train neural networks?x
φ(x)
y
![Page 35: Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15](https://reader033.vdocuments.us/reader033/viewer/2022050614/58ecb0f21a28aba9608b4583/html5/thumbnails/35.jpg)
Feature Transformation for Training Neural Networks
Feature learning: Learn φ(·) from input data.
How to use φ(·) to train neural networks?x
φ(x)
y
Invert E[φ(x)⊗ y] to Learn Neural Network Parameters
In general, E[φ(x)⊗ y] is a tensor.
What class of φ(·) useful for training neural networks?
What algorithms can invert moment tensors to obtain parameters?
![Page 36: Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15](https://reader033.vdocuments.us/reader033/viewer/2022050614/58ecb0f21a28aba9608b4583/html5/thumbnails/36.jpg)
Score Function Transformations
Score function for x ∈ Rd
with pdf p(·):
S1(x) := −∇x log p(x)
Input:x ∈ R
d
S1(x) ∈ Rd
![Page 37: Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15](https://reader033.vdocuments.us/reader033/viewer/2022050614/58ecb0f21a28aba9608b4583/html5/thumbnails/37.jpg)
Score Function Transformations
Score function for x ∈ Rd
with pdf p(·):
S1(x) := −∇x log p(x)
Input:x ∈ R
d
S1(x) ∈ Rd
![Page 38: Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15](https://reader033.vdocuments.us/reader033/viewer/2022050614/58ecb0f21a28aba9608b4583/html5/thumbnails/38.jpg)
Score Function Transformations
Score function for x ∈ Rd
with pdf p(·):
S1(x) := −∇x log p(x)
Input:x ∈ R
d
S1(x) ∈ Rd
![Page 39: Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15](https://reader033.vdocuments.us/reader033/viewer/2022050614/58ecb0f21a28aba9608b4583/html5/thumbnails/39.jpg)
Score Function Transformations
Score function for x ∈ Rd
with pdf p(·):
S1(x) := −∇x log p(x)
mth-order score function:Input:x ∈ R
d
S1(x) ∈ Rd
![Page 40: Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15](https://reader033.vdocuments.us/reader033/viewer/2022050614/58ecb0f21a28aba9608b4583/html5/thumbnails/40.jpg)
Score Function Transformations
Score function for x ∈ Rd
with pdf p(·):
S1(x) := −∇x log p(x)
mth-order score function:
Sm(x) := (−1)m∇(m)p(x)
p(x)
Input:x ∈ R
d
S1(x) ∈ Rd
![Page 41: Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15](https://reader033.vdocuments.us/reader033/viewer/2022050614/58ecb0f21a28aba9608b4583/html5/thumbnails/41.jpg)
Score Function Transformations
Score function for x ∈ Rd
with pdf p(·):
S1(x) := −∇x log p(x)
mth-order score function:
Sm(x) := (−1)m∇(m)p(x)
p(x)
Input:x ∈ R
d
S2(x) ∈ Rd×d
![Page 42: Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15](https://reader033.vdocuments.us/reader033/viewer/2022050614/58ecb0f21a28aba9608b4583/html5/thumbnails/42.jpg)
Score Function Transformations
Score function for x ∈ Rd
with pdf p(·):
S1(x) := −∇x log p(x)
mth-order score function:
Sm(x) := (−1)m∇(m)p(x)
p(x)
Input:x ∈ R
d
S3(x) ∈ Rd×d×d
![Page 43: Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15](https://reader033.vdocuments.us/reader033/viewer/2022050614/58ecb0f21a28aba9608b4583/html5/thumbnails/43.jpg)
NN-LiFT: Neural Network LearnIng
using Feature Tensors
Input:x ∈ R
d
Score functionS3(x) ∈ R
d×d×d
![Page 44: Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15](https://reader033.vdocuments.us/reader033/viewer/2022050614/58ecb0f21a28aba9608b4583/html5/thumbnails/44.jpg)
NN-LiFT: Neural Network LearnIng
using Feature Tensors
Input:x ∈ R
d
Score functionS3(x) ∈ R
d×d×d
1
n
n∑
i=1
yi ⊗ S3(xi) =1
n
n∑
i=1
yi⊗
Estimating M3 using
labeled data {(xi, yi)}
S3(xi)
Cross-
moment
![Page 45: Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15](https://reader033.vdocuments.us/reader033/viewer/2022050614/58ecb0f21a28aba9608b4583/html5/thumbnails/45.jpg)
NN-LiFT: Neural Network LearnIng
using Feature Tensors
Input:x ∈ R
d
Score functionS3(x) ∈ R
d×d×d
1
n
n∑
i=1
yi ⊗ S3(xi) =1
n
n∑
i=1
yi⊗
Estimating M3 using
labeled data {(xi, yi)}
S3(xi)
Cross-
moment
Rank-1 components are the estimates of first layer weights
+ + · · ·
CP tensor
decomposition
![Page 46: Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15](https://reader033.vdocuments.us/reader033/viewer/2022050614/58ecb0f21a28aba9608b4583/html5/thumbnails/46.jpg)
Reinforcement Learning (RL) of POMDPsPartially observable Markov decision processes.
Proposed Method
Consider memoryless policies. Episodic learning: indirect exploration.
Tensor methods: careful conditioning required for learning.
First RL method for POMDPs with logarithmic regret bounds.
xi xi+1 xi+2
yi yi+1
ri ri+1
ai ai+1
0 1000 2000 3000 4000 5000 6000 70002
2.1
2.2
2.3
2.4
2.5
2.6
2.7
Number of Trials
Ave
rage
Rew
ard
SM−UCRL−POMDPUCRL−MDPQ−learningRandom Policy
Logarithmic Regret Bounds for POMDPs using Spectral Methods by K. Azzizade, A. Lazaric, A.
, under preparation.
![Page 47: Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15](https://reader033.vdocuments.us/reader033/viewer/2022050614/58ecb0f21a28aba9608b4583/html5/thumbnails/47.jpg)
Convolutional Tensor Decomposition
== ∗
xx
∑
f∗i w∗
i F∗ w∗
(a)Convolutional dictionary model (b)Reformulated model
.... ....= ++
Cumulant λ1(F∗1 )
⊗3 +λ2(F∗2 )
⊗3 . . .
Efficient methods for tensor decomposition with circulant constraints.
Convolutional Dictionary Learning through Tensor Factorization by F. Huang, A. , June 2015.
![Page 48: Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15](https://reader033.vdocuments.us/reader033/viewer/2022050614/58ecb0f21a28aba9608b4583/html5/thumbnails/48.jpg)
Hierarchical Tensor Decomposition
= + .... = + .... = + ....
= + ....
Relevant for learning latent tree graphical models.
Propose first algorithm for integrated learning of hierarchy andcomponents of tensor decomposition.
Highly parallelized operations but with global consistency guarantees.
Scalable Latent Tree Model and its Application to Health Analytics By F. Huang, U. N.
Niranjan, J. Perros, R. Chen, J. Sun, A. , Preprint, June 2014.
![Page 49: Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15](https://reader033.vdocuments.us/reader033/viewer/2022050614/58ecb0f21a28aba9608b4583/html5/thumbnails/49.jpg)
Outline
1 Introduction
2 Spectral Methods: Matrices and Tensors
3 Applications of Tensor Methods
4 Conclusion
![Page 50: Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15](https://reader033.vdocuments.us/reader033/viewer/2022050614/58ecb0f21a28aba9608b4583/html5/thumbnails/50.jpg)
Summary and Outlook
Summary
Tensor methods: a powerful paradigm for guaranteed large-scalemachine learning.
First methods to provide provable bounds for training neuralnetworks, many latent variable models (e.g HMM, LDA), POMDPs!
![Page 51: Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15](https://reader033.vdocuments.us/reader033/viewer/2022050614/58ecb0f21a28aba9608b4583/html5/thumbnails/51.jpg)
Summary and Outlook
Summary
Tensor methods: a powerful paradigm for guaranteed large-scalemachine learning.
First methods to provide provable bounds for training neuralnetworks, many latent variable models (e.g HMM, LDA), POMDPs!
Outlook
Training multi-layer neural networks, models with invariances,reinforcement learning using neural networks . . .
Unified framework for tractable non-convex methods with guaranteedconvergence to global optima?
![Page 52: Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15](https://reader033.vdocuments.us/reader033/viewer/2022050614/58ecb0f21a28aba9608b4583/html5/thumbnails/52.jpg)
My Research Group and Resources
Podcast/lectures/papers/software available athttp://newport.eecs.uci.edu/anandkumar/