TRANSCRIPT
The Canonical Tensor Decomposition and Its Applications to Social Network Analysis
Evrim Acar, Tamara G. Kolda and Daniel M. Dunlavy
Sandia National Labs
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.
What is Canonical Tensor Decomposition?
CANDECOMP/PARAFAC (CP) model [Hitchcock'27, Harshman'70, Carroll & Chang'70]: an I × J × K tensor is approximated as the sum of R rank-one components,
X ≈ a1 ∘ b1 ∘ c1 + … + aR ∘ bR ∘ cR.
CP Application: Neuroscience
Epileptic Seizure Localization: a time samples × scales × channels EEG tensor is decomposed as
X ≈ a1 ∘ b1 ∘ c1 + a2 ∘ b2 ∘ c2.
Acar et al., 2007; De Vos et al., 2007.
CP has Numerous Applications!
• Chemometrics
– Fluorescence Spectroscopy
– Chromatographic Data Analysis
• Neuroscience
– Epileptic Seizure Localization
– Analysis of EEG and ERP
• Signal Processing
• Computer Vision
– Image compression, classification
– Texture analysis
• Social Network Analysis
– Web link analysis
– Conversation detection in emails
– Text analysis
• Approximation of PDEs
Sidiropoulos, Giannakis and Bro, IEEE Trans. Signal Processing, 2000.
Mørup, Hansen and Arnfred, Journal of Neuroscience Methods, 2007.
Hazan, Polak and Shashua, ICCV 2005.
Bader, Berry, Browne, Survey of Text Mining: Clustering, Classification, and Retrieval, 2nd Ed., 2007.
Doostan and Iaccarino, Journal of Computational Physics, 2009.
Andersen and Bro, Journal of Chemometrics, 2003.
Algorithms: How Can We Compute CP?
Mathematical Details for CP
Unfolding (Matricization):
– Mode-1 unfolding X(1): columns are mode-1 fibers.
– Mode-2 unfolding X(2): columns are mode-2 fibers (rows).
– Mode-3 unfolding X(3): columns are mode-3 fibers (tubes).
In unfolded form the CP model becomes a matrix equation involving the Khatri-Rao product:
X(1) ≈ A (C ⊙ B)ᵀ.
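To make the unfolded CP model concrete, here is a minimal pure-Python sketch (the helper names `khatri_rao` and `unfold1` are illustrative, not from any toolbox) checking that a rank-one tensor satisfies X(1) = a (c ⊗ b)ᵀ:

```python
def khatri_rao(C, B):
    """Khatri-Rao (column-wise Kronecker) product of C (K x R) and B (J x R),
    returning a (K*J) x R matrix whose r-th column is c_r kron b_r."""
    R = len(C[0])
    return [[C[k][r] * B[j][r] for r in range(R)]
            for k in range(len(C)) for j in range(len(B))]

def unfold1(X):
    """Mode-1 unfolding of an I x J x K tensor (nested lists) into I x (J*K)."""
    I, J, K = len(X), len(X[0]), len(X[0][0])
    return [[X[i][j][k] for k in range(K) for j in range(J)] for i in range(I)]

# Rank-one example: X = a o b o c, so X(1) should equal a (c kron b)^T.
a, b, c = [1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0]
X = [[[ai * bj * ck for ck in c] for bj in b] for ai in a]
cb = khatri_rao([[ck] for ck in c], [[bj] for bj in b])  # (c kron b) as a 6 x 1 matrix
X1 = unfold1(X)
```

The column ordering (second mode varying fastest) matches the Kronecker ordering, which is what makes the identity hold entrywise.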
CP is a Nonlinear Optimization Problem
Given a tensor X and R (# of components), find matrices A, B, C that solve the following problem:
min_{A,B,C} ½ ‖X − Σ_{r=1}^{R} a_r ∘ b_r ∘ c_r‖²,
where the vector x comprises the entries of A, B, and C stacked column-wise: (I + J + K)R variables in total.
Traditional Approach: CPALS
CPALS, dating back to Harshman'70 and Carroll & Chang'70, solves for one factor matrix at a time (an alternating algorithm). Repeat the following steps until "convergence": fix B and C and solve for A, then fix A and C and solve for B, then fix A and B and solve for C.
Each step can be converted to a matrix least squares problem, e.g.,
A ← X(1) (C ⊙ B) (CᵀC ∗ BᵀB)†,
where X(1) is I × JK, (C ⊙ B) is JK × R, and (CᵀC ∗ BᵀB) is R × R.
Very fast, but not always accurate. Not guaranteed to converge to a stationary point. Other issues, e.g., cannot exploit symmetry.
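A hedged single-component (R = 1) sketch of one such least squares step, where the R × R matrix reduces to the scalar (bᵀb)(cᵀc) (pure Python; `als_update_a` is an illustrative name, not the toolbox routine):

```python
def als_update_a(X, b, c):
    """One CPALS step for R = 1: with b and c fixed, the least squares
    solution is a = X(1) (c kron b) / ((b^T b)(c^T c))."""
    I, J, K = len(X), len(b), len(c)
    denom = sum(v * v for v in b) * sum(v * v for v in c)  # the 1 x 1 "R x R matrix"
    return [sum(X[i][j][k] * b[j] * c[k] for j in range(J) for k in range(K)) / denom
            for i in range(I)]

# On a noiseless rank-one tensor, one update with the true b and c
# recovers a exactly.
a, b, c = [1.0, -2.0], [0.5, 1.5, 2.5], [3.0, 4.0]
X = [[[ai * bj * ck for ck in c] for bj in b] for ai in a]
a_est = als_update_a(X, b, c)
```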
Our Approach: CPOPT
Unlike CPALS, CPOPT solves for all factor matrices simultaneously using gradient-based optimization. Define the objective function:
f(A, B, C) = ½ ‖X − Σ_{r=1}^{R} a_r ∘ b_r ∘ c_r‖².
Rewriting the Objective Function
Expanding the norm yields three summands:
f = ½‖X‖² − ⟨X, Σ_r a_r ∘ b_r ∘ c_r⟩ + ½‖Σ_r a_r ∘ b_r ∘ c_r‖².
Derivative of 2nd Summand
The partial w.r.t. a_r is a tensor-vector multiplication: ∂/∂a_r ⟨X, Σ_l a_l ∘ b_l ∘ c_l⟩ = X(1) (c_r ⊗ b_r). Analogous formulas exist for partials w.r.t. columns of B and C.
Derivative of 3rd Summand
∂/∂a_r ½‖Σ_l a_l ∘ b_l ∘ c_l‖² = Σ_l a_l (b_lᵀ b_r)(c_lᵀ c_r). Analogous formulas exist for partials w.r.t. columns of B and C.
Objective and Gradient
Objective function: f(A, B, C) = ½ ‖X − Σ_r a_r ∘ b_r ∘ c_r‖².
Gradient (for r = 1,…,R): ∂f/∂a_r = −X(1)(c_r ⊗ b_r) + Σ_l a_l (b_lᵀ b_r)(c_lᵀ c_r), and analogously for b_r and c_r.
Gradient in Matrix Form
∂f/∂A = −X(1)(C ⊙ B) + A(BᵀB ∗ CᵀC), and similarly for B and C.
Note that this formulation can be used to derive the ALS approach!
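The objective and gradient can be checked numerically; this pure-Python sketch for R = 1 (illustrative, not the Tensor Toolbox implementation) compares the analytic gradient w.r.t. a against finite differences:

```python
def objective(X, a, b, c):
    """f = 0.5 * ||X - a o b o c||^2 for a single component."""
    return 0.5 * sum((X[i][j][k] - a[i] * b[j] * c[k]) ** 2
                     for i in range(len(a)) for j in range(len(b)) for k in range(len(c)))

def grad_a(X, a, b, c):
    """df/da_i = -[X(1)(c kron b)]_i + a_i (b^T b)(c^T c)."""
    bb = sum(v * v for v in b)
    cc = sum(v * v for v in c)
    return [-sum(X[i][j][k] * b[j] * c[k] for j in range(len(b)) for k in range(len(c)))
            + a[i] * bb * cc
            for i in range(len(a))]
```

A finite-difference check on a small 2 × 2 × 2 tensor confirms the two derivative formulas above combine to the stated gradient.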
Indeterminacies of CP
• CP is often unique.
• However, CP has two fundamental indeterminacies:
– Permutation: the components can be reordered (e.g., swap a1, b1, c1 with a3, b3, c3). Not a big deal; it leads to multiple, but separated, minima.
– Scaling: the vectors comprising a single rank-one factor can be scaled (e.g., replace a1 and b1 with 2a1 and ½b1). This leads to a continuous space of equivalent solutions.
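A quick numerical check of the scaling indeterminacy (a throwaway sketch): replacing a1 and b1 with 2a1 and ½b1 leaves every entry of the rank-one term, and hence the fit, unchanged.

```python
# Every entry a_i * b_j * c_k is unchanged under a -> 2a, b -> b/2.
a1, b1, c1 = [1.0, 2.0], [3.0, 4.0], [5.0, 6.0]
term = [ai * bj * ck for ai in a1 for bj in b1 for ck in c1]
scaled = [(2 * ai) * (0.5 * bj) * ck for ai in a1 for bj in b1 for ck in c1]
```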
Adding Regularization
Objective function: f_λ(A, B, C) = f(A, B, C) + (λ/2)(‖A‖² + ‖B‖² + ‖C‖²).
Gradient: ∂f_λ/∂A = ∂f/∂A + λA, and similarly for B and C.
Our Methods: CPOPT & CPOPTR
CPOPT: Apply derivative-based optimization method to the following objective function:
CPOPTR: Apply derivative-based optimization method to the following regularized objective function:
Another Competing Method: CPNLS
CPNLS: Apply nonlinear least squares solver to the following equations:
Proposed by Paatero’97 and also Tomasi and Bro’05.
The Jacobian is of size IJK × (I + J + K)R.
Experimental Set-Up [Tomasi & Bro'06]
Step 1: Generate random factor matrices A, B, C (20 triplets) with Rtrue = 3 or 5 columns each and collinearity set to 0.5.
Step 2: Construct a tensor from the factor matrices and add noise. All combinations of:
• Homoscedastic: 1%, 5%, 10%
• Heteroscedastic: 0%, 1%, 5%
This yields 180 tensors.
Step 3: Use each algorithm to extract factors, using both Rtrue and Rtrue + 1 factors, and compare against the factors in Step 1. Total: 360 tests.
Implementation Details
• All experiments were performed in MATLAB on a Linux workstation (Quad-Core Intel Xeon 2.50 GHz, 9 GB RAM).
• Methods
– CPALS: Alternating least squares. Used parafac_als in the Tensor Toolbox (Bader & Kolda).
– CPNLS: Nonlinear least squares. Used PARAFAC3W, which implements Levenberg-Marquardt (necessary due to the scaling ambiguity), by Tomasi and Bro.
– CPOPT: Optimization. Used routines in the Tensor Toolbox to calculate function values and gradients. Optimization via the Nonlinear Conjugate Gradient (NCG) method with the Hestenes-Stiefel update, using Poblano (in-house code to be released soon).
– CPOPTR: Optimization with regularization. Same as above. (Regularization parameter = 0.02.)
CPOPT is Fast and Accurate
Generated 360 dense test problems (with ranks 3 and 5) and factorized with R equal to the correct number of components and one more than that: a total of 720 tests for each entry below. For a K × K × K tensor with R components, the per-iteration cost is O(RK³) for CPALS, CPOPT, and CPOPTR, versus O(R³K³) for CPNLS.
Overfactoring has a significant impact.
CPOPT is robust to overfactoring.
[Figure: recovered emission-mode factors 1, 2, and 3 plotted against emission wavelength (250-450 nm) for the Amino acids fluorescence data (http://www.models.life.ku.dk/).]
Application: Link Prediction

Link Prediction on Bibliometric Data
An authors × conferences × years (1991-2004) tensor, where entry (i, j, k) is the # of papers by the ith author at the jth conference in year k.
Question 1: Can we use tensor decompositions to model the data and extract meaningful factors?
Question 2: Can we predict who is going to publish at which conferences in the future (2005-2007)?
Components Make Sense! (DBLP data)
The authors × conferences × years tensor is decomposed as X ≈ a1 ∘ b1 ∘ c1 + a2 ∘ b2 ∘ c2 + … + aR ∘ bR ∘ cR.
[Figure: one component. Time mode: coefficients over years 1992-2004. Conference mode: dominated by BILDMED, CARS, DAGM. Author mode: dominated by Hans Peter Meinzer, Heinrich Niemann, Thomas Martin Lehmann.]
Components Make Sense!
[Figure: another component. Time mode: coefficients over years 1992-2004. Conference mode: dominated by IJCAI. Author mode: dominated by Craig Boutilier and Daphne Koller.]
Link Prediction Problem
TRAIN: the authors × conferences × years (1991-2004) tensor, decomposed as X ≈ a1 ∘ b1 ∘ c1 + … + aR ∘ bR ∘ cR.
TEST: an authors × conferences matrix for 2005-2007, where <author_i, conf_j> = 1 if the ith author publishes at the jth conference, and 0 otherwise.
~60K links out of 19 million possible <author, conf> pairs (~0.3% dense); ~32K links previously unseen in the training set.
Score for <author_i, conf_j>
• Sign ambiguity: a_r ∘ b_r ∘ c_r = (−a_r) ∘ (−b_r) ∘ c_r, and so on.
• Fix signs using the signs of the maximum-magnitude entries, and then compute a score for each author-conference pair using the information from the time mode: combine the author and conference factors, weighting each component by a summary of its time-mode coefficients c_r.
[Figure: time-mode coefficients c1 and c2 over the 14 training years.]
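The scoring step can be sketched in pure Python. Note that `time_weights` below, which averages each component's last few time coefficients, is an assumed summary of the time mode for illustration, not necessarily the talk's exact choice:

```python
def time_weights(C_cols, last=3):
    """Summarize each component's time-mode coefficients by the mean of
    its last `last` values (an assumed choice for extrapolation)."""
    return [sum(col[-last:]) / last for col in C_cols]

def scores(A, B, w):
    """Score matrix: score(i, j) = sum_r A[i][r] * B[j][r] * w[r]."""
    R = len(w)
    return [[sum(A[i][r] * B[j][r] * w[r] for r in range(R))
             for j in range(len(B))] for i in range(len(A))]

# Two authors, one conference, R = 2 components.
A = [[1.0, 0.0], [0.0, 1.0]]                          # author factors
B = [[1.0, 1.0]]                                      # conference factors
w = time_weights([[0.1, 0.2, 0.3], [0.3, 0.3, 0.3]])  # per-component time weights
S = scores(A, B, w)
```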
Performance Measure: AUC
The vector s contains the scores for all possible pairs (e.g., ~19 million), where <author_i, conf_j> = 1 if the ith author publishes at the jth conference, and 0 otherwise. Sort the scores in decreasing order and record the corresponding 0/1 labels.
N: number of 1's; M: number of 0's.
Walking down the sorted score list, each 1 raises the TP rate by 1/N and each 0 raises the FP rate by 1/M, tracing out the Receiver Operating Characteristic (ROC) curve. The performance measure is the Area Under the Curve (AUC).
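The AUC computation just described can be sketched in a few lines of pure Python (illustrative; ties in the scores are not handled specially here):

```python
def auc(scores, labels):
    """Area under the ROC curve: sort by score (decreasing), then walk the
    labels, moving up by 1/N on a 1 and right by 1/M on a 0."""
    N = sum(labels)            # number of 1's
    M = len(labels) - N        # number of 0's
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp = area = 0.0
    for i in order:
        if labels[i] == 1:
            tp += 1.0 / N           # move up
        else:
            area += tp * (1.0 / M)  # move right, accumulate height
    return area
```

A perfect ranking (all 1's scored above all 0's) gives AUC = 1; a reversed ranking gives AUC = 0.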
Performance Evaluation
Predicting links for 2005-2007 (~60K): CP achieves AUC = 0.92.
Predicting previously unseen links for 2005-2007 (~32K): CP achieves AUC = 0.87.
[Figure: ROC curves for CP vs. RANDOM.]
CP-WOPT: Handling Missing Data

Missing Data Examples
Missing data arise in different disciplines due to loss of information, machine failures, different sampling frequencies, or experimental set-ups:
• Chemometrics (e.g., excitation-emission fluorescence data; Tomasi & Bro'05)
• Biomedical signal processing (e.g., an EEG tensor of channels × time-frequency × subjects)
• Network traffic analysis (e.g., packet drops)
• Computer vision (e.g., occlusions)
• …
Modify the Objective for CP
With no missing data, the objective is f(A, B, C) = ½‖X − Σ_r a_r ∘ b_r ∘ c_r‖². For handling missing data, mask the residual with an indicator tensor W, where w_ijk = 1 if x_ijk is known and 0 if it is missing.

Our Approach: CP-WOPT
Objective function: f_W(A, B, C) = ½‖W ∗ (X − Σ_r a_r ∘ b_r ∘ c_r)‖².

Objective and Gradient
Gradient (for r = 1,…,R; i = 1,…,I; j = 1,…,J; k = 1,…,K): the same form as before, with every residual entry masked by w_ijk.

Gradient in Matrix Form
∂f_W/∂A = (W ∗ (Z − X))(1) (C ⊙ B), where Z = Σ_r a_r ∘ b_r ∘ c_r, and similarly for B and C.
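The masked objective and gradient can be sketched for R = 1 in pure Python (illustrative names; the real CP-WOPT works with all R components and matricized quantities):

```python
def wopt_objective(X, W, a, b, c):
    """f_W = 0.5 * ||W * (X - a o b o c)||^2, summing only known entries."""
    return 0.5 * sum(W[i][j][k] * (X[i][j][k] - a[i] * b[j] * c[k]) ** 2
                     for i in range(len(a)) for j in range(len(b)) for k in range(len(c)))

def wopt_grad_a(X, W, a, b, c):
    """df_W/da_i: the usual gradient with the residual masked by W."""
    return [sum(W[i][j][k] * (a[i] * b[j] * c[k] - X[i][j][k]) * b[j] * c[k]
                for j in range(len(b)) for k in range(len(c)))
            for i in range(len(a))]
```

Because the weights are 0/1, missing entries contribute nothing to either the objective or the gradient, which is exactly what lets a gradient-based solver ignore them.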
Experimental Set-Up [Tomasi & Bro'05]
Step 1: Generate random factor matrices A, B, C (20 triplets) with R = 5 or 10 columns each and collinearity set to 0.5.
Step 2: Construct a tensor from the factor matrices and add noise (2% homoscedastic noise).
Step 3: Set some entries to missing (either single entries or whole fibers). Percentage of missing data: 10%, 40%, 70%.
Step 4: Use each algorithm to extract R factors and compare against the factors in Step 1.
CP-WOPT is Accurate!
Generated 40 test problems (with ranks 5 and 10) and factorized with an R-component CP model. Each entry corresponds to the percentage of correctly recovered solutions (key ratio: # known data entries / # variables).
CPNLS: nonlinear least squares. Used INDAFAC, which implements Levenberg-Marquardt [Tomasi and Bro'05]. Another alternative is ALS-based imputation (for comparisons, see Tomasi and Bro'05).

CP-WOPT is Fast!
Generated 60 test problems (with M = 10%, 40%, and 70% missing) and factorized with an R-component CP model. Each entry corresponds to the average/std over the CP models that successfully recovered the underlying factors.
CP-WOPT is Useful for Real Data!
EEG data: a channels × time-frequency × subjects tensor (subject 1, …, subject N).
GOAL: to differentiate between left- and right-hand stimulation.
[Figure: CP components extracted from the complete data vs. the incomplete data (with missing channels).]
Thanks to Morten Mørup!
Summary & Future Work
• New CPOPT method: accurate & scalable.
• Extended CPOPT to CP-WOPT to handle missing data: accurate & scalable.
• More open questions…
– Starting point?
– Tuning the optimization
– Regularization
– Exploiting sparsity
– Nonnegativity
• Application to link prediction: ongoing work comparing to other methods.
Thank you!
• More on tensors and tensor models:
– Survey: E. Acar and B. Yener, Unsupervised Multiway Data Analysis: A Literature Survey, IEEE Transactions on Knowledge and Data Engineering, 21(1): 6-20, 2009.
– CPOPT: E. Acar, T. G. Kolda and D. M. Dunlavy, An Optimization Approach for Fitting Canonical Tensor Decompositions, submitted for publication.
– CP-WOPT: E. Acar, T. G. Kolda, D. M. Dunlavy and M. Mørup, Tensor Factorizations with Missing Data, submitted for publication.
– Link Prediction: E. Acar, T. G. Kolda and D. M. Dunlavy, Link Prediction on Evolving Data, in preparation.
• Contact:
– Evrim Acar, [email protected]
– Tamara G. Kolda, [email protected]
– Daniel M. Dunlavy, [email protected]
Minisymposia on Tensors and Tensor-based Computations