completion of missing entries in matrices and...

34
Completion of missing entries in matrices and tensors Shmuel Friedland Univ. Illinois at Chicago Department of Statistics, The University of Chicago, September 12, 2011 Shmuel Friedland Univ. Illinois at Chicago () Completion of missing entries in matrices and tensors Department of Statistics, The University of Chi / 34

Upload: others

Post on 15-Oct-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Completion of missing entries in matrices and tensorshomepages.math.uic.edu/~friedlan/complmattenSep11.pdf · 2011. 9. 1. · Completion of missing entries in matrices and tensors

Completion of missing entries in matrices andtensors

Shmuel FriedlandUniv. Illinois at Chicago

Department of Statistics, The University of Chicago,September 12, 2011

Shmuel Friedland Univ. Illinois at Chicago () Completion of missing entries in matrices and tensorsDepartment of Statistics, The University of Chicago, September 12, 2011 1

/ 34

Page 2: Completion of missing entries in matrices and tensorshomepages.math.uic.edu/~friedlan/complmattenSep11.pdf · 2011. 9. 1. · Completion of missing entries in matrices and tensors

Overview

IntroductionDNA MicorarraysRecovery methods of missing entries in DNA Micorarrays

Shmuel Friedland Univ. Illinois at Chicago () Completion of missing entries in matrices and tensorsDepartment of Statistics, The University of Chicago, September 12, 2011 2

/ 34

Page 3: Completion of missing entries in matrices and tensorshomepages.math.uic.edu/~friedlan/complmattenSep11.pdf · 2011. 9. 1. · Completion of missing entries in matrices and tensors

Introduction

In many instances in measuring multidimensional data, as matricesand tensors, one confronts the following problems: noisy data, missingentries and data reduction.There are many statistical and mathematical methods to deal withthese problems. In this talk we survey some of the known methodsand expand on the methods that the speaker was working on.

Shmuel Friedland Univ. Illinois at Chicago () Completion of missing entries in matrices and tensorsDepartment of Statistics, The University of Chicago, September 12, 2011 3

/ 34

Page 4: Completion of missing entries in matrices and tensorshomepages.math.uic.edu/~friedlan/complmattenSep11.pdf · 2011. 9. 1. · Completion of missing entries in matrices and tensors

DNA Microarrays: I

A DNA microarray (also commonly known as gene chip, DNA chip, orbiochip) is a collection of microscopic DNA spots attached to a solidsurface. Scientists use DNA microarrays to measure the expressionlevels of large numbers of genes simultaneously or to genotypemultiple regions of a genome. Each DNA spot contains picomoles(10-12 moles) of a specific DNA sequence, known as probes (orreporters). These can be a short section of a gene or other DNAelement that are used to hybridize a cDNA or cRNA sample (calledtarget) under high-stringency conditions. Probe-target hybridization isusually detected and quantified by detection of fluorophore-, silver-, orchemiluminescence-labeled targets to determine relative abundance ofnucleic acid sequences in the target. Since an array can contain tensof thousands of probes, a microarray experiment can accomplish manygenetic tests in parallel. Therefore arrays have dramaticallyaccelerated many types of investigation.

Shmuel Friedland Univ. Illinois at Chicago () Completion of missing entries in matrices and tensorsDepartment of Statistics, The University of Chicago, September 12, 2011 4

/ 34

Page 5: Completion of missing entries in matrices and tensorshomepages.math.uic.edu/~friedlan/complmattenSep11.pdf · 2011. 9. 1. · Completion of missing entries in matrices and tensors

DNA Microarrays: II

In standard microarrays, the probes are synthesized and then attachedvia surface engineering to a solid surface by a covalent bond to achemical matrix (via epoxy-silane, amino-silane, lysine, polyacrylamideor others). The solid surface can be glass or a silicon chip, in whichcase they are colloquially known as an Affy chip when an Affymetrixchip is used. Other microarray platforms, such as Illumina, usemicroscopic beads, instead of the large solid support. Alternatively,microarrays can be constructed by the direct synthesis ofoligonucleotide probes on solid surfaces. DNA arrays are different fromother types of microarray only in that they either measure DNA or useDNA as part of its detection system.

Shmuel Friedland Univ. Illinois at Chicago () Completion of missing entries in matrices and tensorsDepartment of Statistics, The University of Chicago, September 12, 2011 5

/ 34

Page 6: Completion of missing entries in matrices and tensorshomepages.math.uic.edu/~friedlan/complmattenSep11.pdf · 2011. 9. 1. · Completion of missing entries in matrices and tensors

DNA Microarrays: III

DNA microarrays can be used to measure changes in expressionlevels, to detect single nucleotide polymorphisms (SNPs), or togenotype or resequence mutant genomes (see uses and typessection). Microarrays also differ in fabrication, workings, accuracy,efficiency, and cost (see fabrication section). Additional factors formicroarray experiments are the experimental design and the methodsof analyzing the data (see Bioinformatics section).

Shmuel Friedland Univ. Illinois at Chicago () Completion of missing entries in matrices and tensorsDepartment of Statistics, The University of Chicago, September 12, 2011 6

/ 34

Page 7: Completion of missing entries in matrices and tensorshomepages.math.uic.edu/~friedlan/complmattenSep11.pdf · 2011. 9. 1. · Completion of missing entries in matrices and tensors

DNA Microarrays: IV

Figure: Microarays raw data

Shmuel Friedland Univ. Illinois at Chicago () Completion of missing entries in matrices and tensorsDepartment of Statistics, The University of Chicago, September 12, 2011 7

/ 34

Page 8: Completion of missing entries in matrices and tensorshomepages.math.uic.edu/~friedlan/complmattenSep11.pdf · 2011. 9. 1. · Completion of missing entries in matrices and tensors

DNA Microarrays: V

Figure: Microarays processed data

Shmuel Friedland Univ. Illinois at Chicago () Completion of missing entries in matrices and tensorsDepartment of Statistics, The University of Chicago, September 12, 2011 8

/ 34

Page 9: Completion of missing entries in matrices and tensorshomepages.math.uic.edu/~friedlan/complmattenSep11.pdf · 2011. 9. 1. · Completion of missing entries in matrices and tensors

Missing entries in DNA Microarrays

During the laboratory process, some spots on the array may bemissing due to various factors (for example, machine error.) Because itis often very costly or time consuming to repeat the experiment,molecular biologists, statisticians, and computer scientists have madeattempts to recover the missing gene expressions by some ad-hoc andsystematic methods.

Shmuel Friedland Univ. Illinois at Chicago () Completion of missing entries in matrices and tensorsDepartment of Statistics, The University of Chicago, September 12, 2011 9

/ 34

Page 10: Completion of missing entries in matrices and tensorshomepages.math.uic.edu/~friedlan/complmattenSep11.pdf · 2011. 9. 1. · Completion of missing entries in matrices and tensors

Gene expression matrix

E =

g11 g12 . . . g1mg21 g22 . . . g2m

......

......

gj1 gj2 . . . gjm...

......

...gn1 gn2 . . . gnm

=

g>1g>2...

g>j...

g>n

= [c1 c2 . . . cm] ∈ Rn×m

g>j := (gj1,gj2, ...,gjm), j = 1, ...,n,ci =

g1ig2i...

gji...

gni

, i = 1, ...,m.

g>j relative expression levels of j th gene in m experiments.ci relative expression levels of n genes ini th experimentn m

Shmuel Friedland Univ. Illinois at Chicago () Completion of missing entries in matrices and tensorsDepartment of Statistics, The University of Chicago, September 12, 2011 10

/ 34

Page 11: Completion of missing entries in matrices and tensorshomepages.math.uic.edu/~friedlan/complmattenSep11.pdf · 2011. 9. 1. · Completion of missing entries in matrices and tensors

Missing entries problem in DNA Microarrays

N ⊂ [n] := 1, . . . ,n the set of rows of E that contain at least onemissing entry.

For each j ∈ N c := [n]\N , the gene g>j has all of its entries.

n′ denote the size of N c , i.e. the size of N is n − n′.

Problem: complete the missing entries of each g>j , j ∈ N ,

under some assumptions.

Shmuel Friedland Univ. Illinois at Chicago () Completion of missing entries in matrices and tensorsDepartment of Statistics, The University of Chicago, September 12, 2011 11

/ 34

Page 12: Completion of missing entries in matrices and tensorshomepages.math.uic.edu/~friedlan/complmattenSep11.pdf · 2011. 9. 1. · Completion of missing entries in matrices and tensors

Common methods of recovery

Zero replacement method;Row sum mean;Baesian Principal Component Analysis; [1]Clustering analysis methods such as K-nearest neighborclustering, hierarchical clustering [2], KNNimpute, - [2].FRAA and IFRAA; [4, 3]Least square imputation methods; [1];Local least squares imputation method (LLS) [2];Projection onto convex sets methods (POCS) [1]SVDimpute - Singular Value Decomposition (which is closelyrelated to Principal Component Analysis) [2]

Shmuel Friedland Univ. Illinois at Chicago () Completion of missing entries in matrices and tensorsDepartment of Statistics, The University of Chicago, September 12, 2011 12

/ 34

Page 13: Completion of missing entries in matrices and tensorshomepages.math.uic.edu/~friedlan/complmattenSep11.pdf · 2011. 9. 1. · Completion of missing entries in matrices and tensors

Short descriptions of KNNimpute and LLS

KNNimpute and LLS are local methods, which use similarity structureof the data to impute the missing values

KNNimpute uses the weighted averages of the K -nearest uncorruptedneighbors.

LLS has two versions to find similar genes whose expressions are notcorrupted: the L2-norm and the Pearson’s correlation coefficients.After a group of similar genes C are identified, the missing values ofthe gene are obtained using least squares applied to the group C.

In these two methods, the recovery of missing data is doneindependently, i.e. the estimation of each missing entry does notinfluence the estimation of the other missing entries.

Shmuel Friedland Univ. Illinois at Chicago () Completion of missing entries in matrices and tensorsDepartment of Statistics, The University of Chicago, September 12, 2011 13

/ 34

Page 14: Completion of missing entries in matrices and tensorshomepages.math.uic.edu/~friedlan/complmattenSep11.pdf · 2011. 9. 1. · Completion of missing entries in matrices and tensors

Short description of BPCA

BPCA is a global method consisting of three components. First,principal component regression, which is basically a low rankapproximation of the data set is performed. Second, Bayesianestimation, which assumes that the residual error and the projection ofeach gene on principal components behave as normal independentrandom variables with unknown parameters, is carried out. Third,Bayesian estimation follows by iterations based on theexpectation-maximization (EM) of the unknown Bayesian parameters.

Shmuel Friedland Univ. Illinois at Chicago () Completion of missing entries in matrices and tensorsDepartment of Statistics, The University of Chicago, September 12, 2011 14

/ 34

Page 15: Completion of missing entries in matrices and tensorshomepages.math.uic.edu/~friedlan/complmattenSep11.pdf · 2011. 9. 1. · Completion of missing entries in matrices and tensors

Short descriptions of SVDimpute, FRAA and IFRAA

These methods are intimately related to Singular Value Decomposition

SVD

Shmuel Friedland Univ. Illinois at Chicago () Completion of missing entries in matrices and tensorsDepartment of Statistics, The University of Chicago, September 12, 2011 15

/ 34

Page 16: Completion of missing entries in matrices and tensorshomepages.math.uic.edu/~friedlan/complmattenSep11.pdf · 2011. 9. 1. · Completion of missing entries in matrices and tensors

Singular Value Decomposition - SVD

A = UΣV> ∈ Rn×m

Σ = diag(σ1, . . . , σmin(m,n)) :=

σ1 0 ... 00 σ2 ... 0...

......

...0 0 ... σn0 0 ... 0...

......

...

∈ Rn×m

σ1 ≥ . . . ≥ σr > 0 = σi , i > r = rank AU = [u1 . . .un] ∈ O(n), V = [v1 . . . vm] ∈ O(m)

a† = a−1 if a 6= 0, a† = 0 if a = 0

A† := V diag(σ†1, . . . , σ†min(m,n))U>

Shmuel Friedland Univ. Illinois at Chicago () Completion of missing entries in matrices and tensorsDepartment of Statistics, The University of Chicago, September 12, 2011 16

/ 34

Page 17: Completion of missing entries in matrices and tensorshomepages.math.uic.edu/~friedlan/complmattenSep11.pdf · 2011. 9. 1. · Completion of missing entries in matrices and tensors

Best rank k -approximation

For k ≤ r = rank A: Σk = diag(σ1, . . . , σk ) ∈ Rk×k ,Uk = [u1 . . .uk ] ∈ Rm×k ,Vk = [v1 . . . vk ] ∈ Rn×k

Ak := Uk ΣkV>k is the best rank k approximation in Frobenius andoperator norm of A

minB∈R(m,n,k)

||A− B||F = ||A− Ak ||F (‖A‖2F = tr(AA>)).

Reduced SVD A = Ur Σr V>r where (r ≥) ν numerical rank of A if∑i≥ν+1 σ

2i∑

i≥1 σ2i≈ 0, (0.01).

Aν is a noise reduction of A. Noise reduction has many applications inimage processing, DNA-Microarrays analysis, data compression.Full SVD: O(mn min(m,n)), k - SVD: O(kmn).

Shmuel Friedland Univ. Illinois at Chicago () Completion of missing entries in matrices and tensorsDepartment of Statistics, The University of Chicago, September 12, 2011 17

/ 34

Page 18: Completion of missing entries in matrices and tensorshomepages.math.uic.edu/~friedlan/complmattenSep11.pdf · 2011. 9. 1. · Completion of missing entries in matrices and tensors

Minimal characterization of sum of squares of singularvalues

σ21(A) ≥ σ2(A)2 ≥ . . . are the eigenvalues of AA> and A>A.

Ky-Fan characterization

m∑i=ν+1

σi(A)2 = min[xν+1...xm]∈O(m,m−ν)

m∑i=ν+1

(Axi)>(Axi)

O(m, k) ⊂ Rm×k all matrices with k orthonormal columns

Shmuel Friedland Univ. Illinois at Chicago () Completion of missing entries in matrices and tensorsDepartment of Statistics, The University of Chicago, September 12, 2011 18

/ 34

Page 19: Completion of missing entries in matrices and tensorshomepages.math.uic.edu/~friedlan/complmattenSep11.pdf · 2011. 9. 1. · Completion of missing entries in matrices and tensors

SVDimpute [2, 2]

First, replace the missing values with 0 or with values computed fromanother method. Call the estimated matrix Ep, where p = 0.

Find the lp significant singular values of Ep, and let Ep,lp be the filteredpart of Ep. Replace the missing values in E by the correspondingvalues in Ep,lp to obtain the matrix Ep+1.

Continue this process until Ep converges to a fixed matrix (within agiven precision). This algorithm takes into account implicitly theinfluence of the estimation of one entry on the other ones. But it is notclear if the algorithm converges, nor what are the features of any fixedpoint(s) of this algorithm.

Shmuel Friedland Univ. Illinois at Chicago () Completion of missing entries in matrices and tensorsDepartment of Statistics, The University of Chicago, September 12, 2011 19

/ 34

Page 20: Completion of missing entries in matrices and tensorshomepages.math.uic.edu/~friedlan/complmattenSep11.pdf · 2011. 9. 1. · Completion of missing entries in matrices and tensors

Fixed Rank Approximation Algorithm (FRAA): I

Ω ⊂ 1, . . . ,n × 1, . . . ,m missing entries set.

Set gij = 0 if (i , j) ∈ Ω to obtain E ∈ Rn×m.

X are all X = [xij ] ∈ Rn×m where xij = 0 if (i , j) 6∈ Ω.

Assume that the completed matrix of the experiment should have thenumerical rank ν. Then we complete the entries by solving theproblem:

(1) minX∈X

m∑i=ν+1

σ2i (E + X ) = min

X∈X

m∑i=ν+1

λi((E + X )>(E + X ))

Shmuel Friedland Univ. Illinois at Chicago () Completion of missing entries in matrices and tensorsDepartment of Statistics, The University of Chicago, September 12, 2011 20

/ 34

Page 21: Completion of missing entries in matrices and tensorshomepages.math.uic.edu/~friedlan/complmattenSep11.pdf · 2011. 9. 1. · Completion of missing entries in matrices and tensors

FRAA: II

Fixed Rank Approximation Algorithm: [4]

Gp ∈ X is pth approximation to a solution of optimization problem (1).

Let Bp := (E + Gp)>(E + Gp)

Find an orthonormal set of eigenvectors for Bp, vp,1, ...,vp,m.

Then Gp+1 is a solution to the following minimumof a convex nonnegative quadratic function

minX∈X

m∑q=l+1

((E + X )vp,q)>((E + X )vp,q)

Shmuel Friedland Univ. Illinois at Chicago () Completion of missing entries in matrices and tensorsDepartment of Statistics, The University of Chicago, September 12, 2011 21

/ 34

Page 22: Completion of missing entries in matrices and tensorshomepages.math.uic.edu/~friedlan/complmattenSep11.pdf · 2011. 9. 1. · Completion of missing entries in matrices and tensors

FRAA: III

Flow chart of the algorithm:Fixed Rank Approximation Algorithm(FRAA)Input: integers m,n,L, iter , the locations ofnon-missing entries S, initial approximationG0 of n ×m matrix G.Output: an approximation Giter of G.for p = 0 to iter − 1- Compute Bp := (E + Gp)>(E + Gp) andfind an orthonormal set of eigenvectors forBp, vp,1, ...,vp,m.- Gp+1 is a solution to the minimum problem(1) withν = L− 1 = l .

Shmuel Friedland Univ. Illinois at Chicago () Completion of missing entries in matrices and tensorsDepartment of Statistics, The University of Chicago, September 12, 2011 22

/ 34

Page 23: Completion of missing entries in matrices and tensorshomepages.math.uic.edu/~friedlan/complmattenSep11.pdf · 2011. 9. 1. · Completion of missing entries in matrices and tensors

FRAA: IV

Let fl(X ) :=∑n

i=ν+1 σ2i (A + X ).

Then fl(Gp) ≥ fl(Gp+1). Gp,p = 1, . . . converges to a critical point G.

FRAA gives a good approximation of G. In many simulations G = G∗.

FRAA is an adaptation of an algo for IEP:

Inverse Eigenvalue Problem:Find the values of the missing entries of G such that the nonnegativedefinite matrix G>G will have m − l smallest eigenvalues equal to zero.IEP appear often in engineering. See [5]

FRAA is a robust algorithm which performs good, but not as well asKNNimpute, BPCA and LSSimpute.All other algo reconstruct the missing values of each gene from similargenes.

Shmuel Friedland Univ. Illinois at Chicago () Completion of missing entries in matrices and tensorsDepartment of Statistics, The University of Chicago, September 12, 2011 23

/ 34

Page 24: Completion of missing entries in matrices and tensorshomepages.math.uic.edu/~friedlan/complmattenSep11.pdf · 2011. 9. 1. · Completion of missing entries in matrices and tensors

Fixed Rank Approximation Algorithm (IFRAA)

First use FRAA to find a completion G.

Then use a cluster algorithm

(We used K-means repeating & refining cluster size),to find a reasonable number of clusters of similar genes,

each cluster is a relatively smaller matrix having an effective low rank.

For each cluster of genes apply FRAA separately to recover themissing entries in this cluster [3].

These results suggest that IFRAA has a potential for being an effectivealgorithm to recover blurred spots in digital images.

Shmuel Friedland Univ. Illinois at Chicago () Completion of missing entries in matrices and tensorsDepartment of Statistics, The University of Chicago, September 12, 2011 24

/ 34

Page 25: Completion of missing entries in matrices and tensorshomepages.math.uic.edu/~friedlan/complmattenSep11.pdf · 2011. 9. 1. · Completion of missing entries in matrices and tensors

SIMULATIONS 1

Figure: Comparison of NRMSE against percent of missing entries for threemethods: IFRAA, BPCA and LLS. Cdc15 data set in [?] with 24 samples.

Shmuel Friedland Univ. Illinois at Chicago () Completion of missing entries in matrices and tensorsDepartment of Statistics, The University of Chicago, September 12, 2011 25

/ 34

Page 26: Completion of missing entries in matrices and tensorshomepages.math.uic.edu/~friedlan/complmattenSep11.pdf · 2011. 9. 1. · Completion of missing entries in matrices and tensors

SIMULATIONS 2

Figure: Comparison of NRMSE against percent of missing entries for threemethods: IFRAA, BPCA and LLS. Data set was a 2000× 20 randomlygenerated matrix of rank 2.

Shmuel Friedland Univ. Illinois at Chicago () Completion of missing entries in matrices and tensorsDepartment of Statistics, The University of Chicago, September 12, 2011 26

/ 34

Page 27: Completion of missing entries in matrices and tensorshomepages.math.uic.edu/~friedlan/complmattenSep11.pdf · 2011. 9. 1. · Completion of missing entries in matrices and tensors

The performance of the BCPA, IFRAA and LLS

algorithms depends on the unknown distribution of missing position ofthe entries.

Table: Comparison of NRMSE for three methods: IFRAA, LLS and BPCA foractual missing values distribution for three gene expression data sets withdifferent percentage of missing values.

Data sets IFRAA LLS BPCACdc15 data set %0.81 missing 0.0175 0.0200 0.0216

Evolution data set %9.16 0.0703 0.0969 0.1247Calcineurin data set %3.68 0.0421 0.0445 0.0453

Shmuel Friedland Univ. Illinois at Chicago () Completion of missing entries in matrices and tensorsDepartment of Statistics, The University of Chicago, September 12, 2011 27

/ 34

Page 28: Completion of missing entries in matrices and tensorshomepages.math.uic.edu/~friedlan/complmattenSep11.pdf · 2011. 9. 1. · Completion of missing entries in matrices and tensors

Missing entries for 3-tensors

T = [ti,j,k ]n,m,li=j=k=1 ∈ Rn×m×l .

Ω ⊂ 1, . . . ,n × 1, . . . ,m × 1, . . . , l missing entries set

Simple solution: Assume 1, . . . ,n are genes

Unfold T in direction 1 to get the matrix E = [gi(j,k)] ∈ Rn×(ml)

where gi(j,k) = ti,j,k .

Apply your favorite completion algorithm for matrices

Shmuel Friedland Univ. Illinois at Chicago () Completion of missing entries in matrices and tensorsDepartment of Statistics, The University of Chicago, September 12, 2011 28

/ 34

Page 29: Completion of missing entries in matrices and tensorshomepages.math.uic.edu/~friedlan/complmattenSep11.pdf · 2011. 9. 1. · Completion of missing entries in matrices and tensors

(p, q, r)-approximation of 3-tensors

U ⊂ Rn,V ⊂ Rm,W ⊂ Rl of dimensions p,q, r respectively

with orthonormal bases [u1, . . . ,up], [v1, . . . ,vq], [w1, . . . ,wr ]

PU⊗V⊗W(T ) =∑p,q,r

i=j=k 〈T ,ui ⊗ vj ⊗wk 〉ui ⊗ vj ⊗wk

(〈T ,x⊗ y⊗ z〉 =∑n,m,l

i=j=k=1 ti,j,kxiyjzk )

‖T ‖2HS := ‖PU⊗V⊗W(T )|2HS + ‖P(U⊗V⊗W)⊥(T )|2HS

(‖PU⊗V⊗W(T )|2HS :=∑p,q,r

i=j=k=1〈T ,ui ⊗ vj ⊗wk 〉2)

(Best) (p,q, r)-approximation PU?⊗V?⊗W?(T ):

arg max ‖PU⊗V⊗W(T )‖HS = arg min ‖P(U⊗V⊗W)⊥(T )‖HS

Shmuel Friedland Univ. Illinois at Chicago () Completion of missing entries in matrices and tensorsDepartment of Statistics, The University of Chicago, September 12, 2011 29

/ 34

Page 30: Completion of missing entries in matrices and tensorshomepages.math.uic.edu/~friedlan/complmattenSep11.pdf · 2011. 9. 1. · Completion of missing entries in matrices and tensors

Methods to find (p, q, r)-approximation

Higher Order Singular Value Decomposition HOSVD

Unfold in direction 1 and find p-truncated SVD approximation.U-the subspace spanned by first p-left singular vectors.Do similarly for V,W.

Alternating Least Squares Method:

Fix V0,W0, e.g. use HOSVD.Find U1 := arg maxU, ‖PU⊗V⊗W(T )‖HS.(Equivalent to finding of the first p-eigenvectors of corresponding

nonnegative definite matrix.)

Fix U1,W0 and find V1, then fix U1,V1 and find W1Continue the algorithmIn each step of the algorithm the value of ‖PU⊗V⊗W(T )‖HS increasesConvergence to a critical point, which is a semi-local maximum

Shmuel Friedland Univ. Illinois at Chicago () Completion of missing entries in matrices and tensorsDepartment of Statistics, The University of Chicago, September 12, 2011 30

/ 34

Page 31: Completion of missing entries in matrices and tensorshomepages.math.uic.edu/~friedlan/complmattenSep11.pdf · 2011. 9. 1. · Completion of missing entries in matrices and tensors

Fixed Rank Approximation Algorithm for Tensors

ΦΩ ⊂ Rn×m×l all tensors X = [xi,j,k ] ∈ Rn×m×l

with xi,j,k = 0 if (i , j , k) 6∈ Ω.

T = [ti,j,k ] ∈ Rn×m×l , ti,j,k = 0 if (i , j , k) ∈ Ω.

X0 an approximation of completed errors

Assume Xs given.

Find (p,q, r)-approximation of T + Xs with corresponding subspacesUs,Vs,Ws.

Then Xs+1 := arg min‖P(Us⊗Vs⊗Ws)⊥(T + X )‖HS,X ,∈ Φ.

Xs converges to a critical semi-local maximum

Shmuel Friedland Univ. Illinois at Chicago () Completion of missing entries in matrices and tensorsDepartment of Statistics, The University of Chicago, September 12, 2011 31

/ 34

Page 32: Completion of missing entries in matrices and tensorshomepages.math.uic.edu/~friedlan/complmattenSep11.pdf · 2011. 9. 1. · Completion of missing entries in matrices and tensors

References 1

T.H. Bø, B. Dysvik and I. Jonassen, LSimpute: accurate estimationof missing values in microarray data with least squares methods,Nucleic Acids Research, 32 (2004), e34

H. Chipman, T.J. Hastie and R. Tibshirani, Clustering micrarraydata in: T. Speed, (Ed.), Statistical Analysis of Gene ExpressionMicroarray Data, , Chapman & Hall/CRC, 2003 pp. 159-200.

S. Friedland, M. Kaveh, A. Niknejad, H. Zare, An algorithm formissing value estimation for DNA microarray data, Proc. ICASSP,2006.

S. Friedland, A. Niknejad and L. Chihara, A SimultaneousReconstruction of Missing Data in DNA Microarrays, LinearAlgebra Appl., 416 (2006), 8-28.

S. Friedland, J. Nocedal and M. Overton, The formulation andanalysis of numerical methods for inverse eigenvalue problems,SIAM J. Numer. Anal. 24 (1987), 634-667.

Shmuel Friedland Univ. Illinois at Chicago () Completion of missing entries in matrices and tensorsDepartment of Statistics, The University of Chicago, September 12, 2011 32

/ 34

Page 33: Completion of missing entries in matrices and tensorshomepages.math.uic.edu/~friedlan/complmattenSep11.pdf · 2011. 9. 1. · Completion of missing entries in matrices and tensors

References 2

X. Gan, A.W.-C. Liew and H. Yan, Missing Microaaray DataEstimation Based on Projection onto Convex Sets Method, Proc.17th International Conference on Pattern Recognition, 2004

H. Kim, G.H. Golub and H. Park, Missing value estimation for DNAmicroarray gene expression data: local least squares imputation,Bioinformatics 21 (2005), 187-198.

Amir Niknejad, Application of Singular Value Decompositions toDNA Microarrays, Ph.D. thesis, UIC, 2005,http://www2.math.uic.edu/∼friedlan/thesis9.19.05.pdf

A. Niknejad and S. Friedland, APPLICATIONS OF LINEARALGEBRA TO DNA MICROARRAYS, VDM Verlag Dr MüllerAktiengesellschaft&Co.KG, Germany, 2009, ISBN:978-3-639-17994-1

Shmuel Friedland Univ. Illinois at Chicago () Completion of missing entries in matrices and tensorsDepartment of Statistics, The University of Chicago, September 12, 2011 33

/ 34

Page 34: Completion of missing entries in matrices and tensorshomepages.math.uic.edu/~friedlan/complmattenSep11.pdf · 2011. 9. 1. · Completion of missing entries in matrices and tensors

References 3

S. Oba, M. Sato, I. Takemasa, M. Monden, K. Matsubara and S.Ishii, A Baesian missing value estimation method for geneexpression profile data, Bioinformatics 19 (2003), 2088-2096

O. Troyanskaya, M. Cantor, G. Sherlock, P. Brown, T. Hastie,R. Tibshirani, D. Botstein and R. Altman, Missing value estimationfor DNA microarrays, Bioinformatics 17 (2001), 520-525.

Shmuel Friedland Univ. Illinois at Chicago () Completion of missing entries in matrices and tensorsDepartment of Statistics, The University of Chicago, September 12, 2011 34

/ 34