SVD Applications

Singular Value Decomposition
SVD – Example: Users-to-Movies
• A = U Σ Vᵀ - example: Users to Movies
(Slides adapted from J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org)
A (m users × n movies):

            Matrix  Alien  Serenity  Casablanca  Amelie
  SciFi       1      1        1          0         0
  SciFi       3      3        3          0         0
  SciFi       4      4        4          0         0
  SciFi       5      5        5          0         0
  Romance     0      2        0          4         4
  Romance     0      0        0          5         5
  Romance     0      1        0          2         2

“Concepts”, a.k.a. latent dimensions, a.k.a. latent factors
SVD – Example: Users-to-Movies
• A = U Σ Vᵀ - example: Users to Movies

  A (users × movies)        U                     Σ                  Vᵀ
  1 1 1 0 0          0.13  0.02 -0.01      12.4  0    0      0.56  0.59  0.56  0.09  0.09
  3 3 3 0 0          0.41  0.07 -0.03       0    9.5  0      0.12 -0.02  0.12 -0.69 -0.69
  4 4 4 0 0    =     0.55  0.09 -0.04  x    0    0    1.3 x  0.40 -0.80  0.40  0.09  0.09
  5 5 5 0 0          0.68  0.11 -0.05
  0 2 0 4 4          0.15 -0.59  0.65
  0 0 0 5 5          0.07 -0.73 -0.67
  0 1 0 2 2          0.07 -0.29  0.32
SVD – Example: Users-to-Movies
• A = U Σ Vᵀ (decomposition as above)
• The first column of U, first diagonal entry of Σ, and first row of Vᵀ form the SciFi-concept; the second of each forms the Romance-concept.
SVD – Example: Users-to-Movies
• A = U Σ Vᵀ
• U is the “user-to-concept” similarity matrix: row i scores user i against the SciFi-concept and the Romance-concept.
SVD – Example: Users-to-Movies
• A = U Σ Vᵀ
• The diagonal of Σ gives the “strength” of each concept: σ₁ = 12.4 is the strength of the SciFi-concept.
SVD – Example: Users-to-Movies
• A = U Σ Vᵀ
• V is the “movie-to-concept” similarity matrix: the SciFi-concept row of Vᵀ loads on Matrix, Alien and Serenity.
SVD - Interpretation #1
‘movies’, ‘users’ and ‘concepts’:
• U: user-to-concept similarity matrix
• V: movie-to-concept similarity matrix
• Σ: its diagonal elements give the ‘strength’ of each concept
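The decomposition above can be checked with a few lines of NumPy. This sketch is not part of the original slides, and NumPy may flip the signs of some singular vectors, which does not change the decomposition.

```python
import numpy as np

# Ratings matrix: rows = users, columns = (Matrix, Alien, Serenity, Casablanca, Amelie)
A = np.array([[1, 1, 1, 0, 0],
              [3, 3, 3, 0, 0],
              [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 2, 0, 4, 4],
              [0, 0, 0, 5, 5],
              [0, 1, 0, 2, 2]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

print(np.round(s[:3], 1))      # concept strengths: ~[12.4  9.5  1.3]
print(np.round(U[:, :3], 2))   # user-to-concept similarity matrix U
print(np.round(Vt[:3], 2))     # rows of V^T: concept-to-movie similarities
```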
Case study: How to query?
• Q: Find users that like ‘Matrix’
• A: Map the query into the ‘concept space’. How?
Case study: How to query?

  q = [5 0 0 0 0]   (ratings for Matrix, Alien, Serenity, Casablanca, Amelie)

(Figure: q plotted against the concept axes v₁ and v₂, together with its projection q·v₁ onto v₁.)

Project into concept space: take the inner product of q with each ‘concept’ vector vᵢ.
Case study: How to query?
Compactly, we have: q_concept = q V
E.g., with q = [5 0 0 0 0] (Matrix, Alien, Serenity, Casablanca, Amelie):

                          0.56  0.12
                          0.59 -0.02
  [5 0 0 0 0]     x       0.56  0.12     =     [2.8  0.6]
                          0.09 -0.69
                          0.09 -0.69

  movie-to-concept similarities (V)            SciFi-concept coordinate = 2.8
Case study: How to query?
• How would the user d that rated (‘Alien’, ‘Serenity’) be handled?
  d_concept = d V
E.g., with d = [0 4 5 0 0]:

                          0.56  0.12
                          0.59 -0.02
  [0 4 5 0 0]     x       0.56  0.12     =     [5.2  0.4]
                          0.09 -0.69
                          0.09 -0.69

  movie-to-concept similarities (V)            d also scores high on the SciFi-concept
Case study: How to query?
• Observation: user d, who rated (‘Alien’, ‘Serenity’), will be similar to user q, who rated (‘Matrix’), although d and q have zero ratings in common!

  q = [5 0 0 0 0]  →  q_concept = [2.8  0.6]
  d = [0 4 5 0 0]  →  d_concept = [5.2  0.4]

Zero ratings in common, yet similarity ≠ 0 in concept space: both point along the SciFi-concept.
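A small sketch of this projection, reusing A, U, s, Vt from the NumPy example above; the cosine similarity in concept space comes out close to 1 even though q and d share no rated movie.

```python
V2 = Vt[:2].T                                  # the two strongest concept vectors
q = np.array([5, 0, 0, 0, 0], dtype=float)     # rated only 'Matrix'
d = np.array([0, 4, 5, 0, 0], dtype=float)     # rated 'Alien' and 'Serenity'

q_c, d_c = q @ V2, d @ V2                      # map into concept space
cos = (q_c @ d_c) / (np.linalg.norm(q_c) * np.linalg.norm(d_c))
print(np.round(q_c, 1), np.round(d_c, 1), round(cos, 2))
# In the original rating space, q @ d == 0: zero ratings in common.
```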
Centering the Data
• In some applications it is useful to center the data before applying the SVD
  – Centering consists of subtracting from each row of A the mean of the rows (the centroid of the points represented by the rows of A)

Centering the Data
• Applying the SVD without centering yields the subspace (a hyperplane through the origin) that minimizes the sum of squared perpendicular distances to the data points of A
• Applying the SVD after centering yields the affine space that minimizes the sum of squared perpendicular distances to the data points of A
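A minimal sketch of the two variants, assuming the same ratings matrix A as before:

```python
# Uncentered SVD: best-fit subspace through the origin.
U0, s0, Vt0 = np.linalg.svd(A, full_matrices=False)

# Centered SVD: best-fit affine subspace through the centroid.
centroid = A.mean(axis=0)                  # mean of the rows of A
Uc, sc, Vtc = np.linalg.svd(A - centroid, full_matrices=False)
# A row x is then approximated by centroid + ((x - centroid) @ V_k) @ V_k^T.
```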
Data Compression

Scenario
• 130 images of handwritten ‘3’s
  – each represented as a 16x16 matrix
  – each representation needs 16x16 floats => images live in R^256
Data Compression
• A model with two principal directions
• In total there are 16x16 = 256 possible directions
• The first 12 principal directions account for 63% of the variance, and the first 50 for 90% of it
Data Compression
(Figure: the collection of ‘3’ digits projected onto the space of the first two principal directions; the digits shown correspond to the circled points in the plot on the left.)
• Any interpretation for the components?
  – First component: horizontal movement of the lower part of the ‘3’
  – Second component: stroke thickness
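A sketch of the compression pipeline; the 130 handwritten ‘3’s are not included with these notes, so a random matrix stands in for the 130 × 256 data matrix X.

```python
rng = np.random.default_rng(0)
X = rng.random((130, 256))                   # placeholder for the digit images
mu = X.mean(axis=0)
Ux, sx, Vtx = np.linalg.svd(X - mu, full_matrices=False)

k = 12                                       # keep 12 principal directions
X_k = mu + (Ux[:, :k] * sx[:k]) @ Vtx[:k]    # rank-k reconstruction

explained = (sx[:k] ** 2).sum() / (sx ** 2).sum()
print(f"{explained:.0%} of the variance")    # ~63% on the real digit data
# Storage drops from 130*256 numbers to 130*k + k*256 (+ the mean image).
```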
Approximate Similarity
Input. A matrix A representing a collection of n documents over a vocabulary of d terms.
Output. Given a set of queries (new documents) q(1),...,q(p), efficiently determine the similarity (inner product) between each document and each query.

Approximate Similarity
Approach 1
• The vector of similarities between A and a query q is given by Aq
• It can be computed in O(nd) time
Approximate Similarity
Approach 2
• Use A_k instead of A
  – We have A_k = σ₁u₁v₁ᵀ + … + σ_k u_k v_kᵀ, where uᵢ is a vector in Rⁿ and vᵢ is a vector in R^d
  – Hence each term σᵢuᵢvᵢᵀq can be computed in O(n+d):
    • compute vᵢᵀq in O(d) and then σᵢuᵢ(vᵢᵀq) in O(n)
  – Total complexity: O(k(n+d))
  – Advantageous if k << min{n, d}
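A sketch of Approach 2, reusing U, s, Vt from the first example; the function touches only k(n+d) numbers per query.

```python
def approx_similarities(U, s, Vt, q, k):
    """Approximate A @ q via A_k = sum_{i<=k} sigma_i u_i v_i^T."""
    coeffs = s[:k] * (Vt[:k] @ q)   # k inner products v_i^T q, O(kd)
    return U[:, :k] @ coeffs        # combine k user-side vectors u_i, O(kn)

q = np.array([5, 0, 0, 0, 0], dtype=float)
print(np.round(approx_similarities(U, s, Vt, q, k=2), 1))  # ~ A @ q
```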
Approximate Similarity
Approach 2
• What is the error of using A_k?
  – The maximum over q of |Aq − A_k q| = |(A − A_k)q|
  – The error can be unbounded if q is not restricted in any way
Approximate Similarity
Def. The spectral norm ||A||₂ of A is equal to
  max{ |Ax| : |x| ≤ 1 }
• ||A − A_k||₂ measures the largest possible error for vectors of norm at most 1
• It is possible to show (Theorem 3.9 of Foundations of Data Science) that
  ||A − A_k||₂ ≤ ||A − B||₂
  for every matrix B of rank at most k
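For the truncated SVD itself, the spectral error equals the first discarded singular value, ||A − A_k||₂ = σ_{k+1}; a quick check with the ratings matrix:

```python
k = 2
A_k = (U[:, :k] * s[:k]) @ Vt[:k]            # rank-k truncation of A
print(np.linalg.norm(A - A_k, 2), s[k])      # equal: both ~1.3 here
```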
Ranking Documents
• A collection of n documents over a vocabulary of d terms
• We can represent the collection by a matrix A with n rows and d columns
How do we rank the documents of the collection according to their intrinsic importance?

Ranking Documents
Ranking
1. Compute the principal direction v₁ of the collection (the best one-dimensional representation of the collection under the Frobenius norm)
2. Project each document onto the direction v₁ and sort from largest to smallest (the coordinates of u₁)
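A sketch of this two-step ranking on a hypothetical term-count matrix (the data here is random, just to show the shapes):

```python
docs = np.random.default_rng(1).random((6, 8))   # hypothetical 6 docs, 8 terms
_, _, Vt_d = np.linalg.svd(docs, full_matrices=False)
scores = docs @ Vt_d[0]            # project each document onto v1
if scores.sum() < 0:               # SVD fixes signs arbitrarily; flip if needed
    scores = -scores
print(np.argsort(-scores))         # document indices, most important first
```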
HITS: Hubs and Authorities
Hubs and Authorities
• HITS (Hypertext-Induced Topic Selection)
  – A measure of importance of pages or documents, similar to PageRank
  – Proposed at around the same time as PageRank (1998)
• Goal: Say we want to find good newspapers
  – Don’t just find newspapers. Find “experts”: people who link in a coordinated way to good newspapers
• Idea: Links as votes
  – A page is more important if it has more links
  – In-coming links? Out-going links?
Finding newspapers
• Hubs and Authorities: each page has 2 scores:
  – Quality as an expert (hub): total sum of the votes of the authorities it points to
  – Quality as content (authority): total sum of the votes coming from experts
• Principle of repeated improvement

(Figure: example authority scores, e.g. NYT: 10, WSJ: 9, CNN: 8, Ebay: 3, Yahoo: 3.)
Hubs and Authorities
Interesting pages fall into two classes:
1. Authorities are pages containing useful information
   – Newspaper home pages
   – Course home pages
   – Home pages of auto manufacturers
2. Hubs are pages that link to authorities
   – Lists of newspapers
   – Course bulletins
   – List of US auto manufacturers
Counting in-links: Authority
Each page starts with hub score 1. Authorities collect their votes.
(Note: this is an idealized example. In reality the graph is not bipartite and each page has both a hub score and an authority score.)
Counting in-links: Authority
The authority score of a page (e.g. NYT) is the sum of the hub scores of the nodes pointing to it.
Expert Quality: Hub
Hubs collect authority scores: each node’s hub score is the sum of the authority scores of the nodes it points to.
Reweighting
Authorities again collect the hub scores, and the process keeps repeating.
Mutually Recursive Definition
• A good hub links to many good authorities
• A good authority is linked from many good hubs
• Model using two scores for each node:
  – Hub score and Authority score
  – Represented as vectors h and a
Hubs and Authorities [Kleinberg ’98]
• Each page i has 2 scores:
  – Authority score: a_i
  – Hub score: h_i

HITS algorithm:
• Initialize: a_j(0) = 1/√N, h_j(0) = 1/√N
• Then keep iterating until convergence:
  – ∀i: Authority: a_i(t+1) = Σ_{j→i} h_j(t)
  – ∀i: Hub: h_i(t+1) = Σ_{i→j} a_j(t)
  – ∀i: Normalize: Σ_i (a_i(t+1))² = 1, Σ_j (h_j(t+1))² = 1
Hubs and Authorities [Kleinberg ’98]
• HITS converges to a single stable point
• Notation:
  – Vectors a = (a_1, …, a_n), h = (h_1, …, h_n)
  – Adjacency matrix A (N×N): A_ij = 1 if i→j, 0 otherwise
• Then h_i = Σ_{i→j} a_j can be rewritten as h_i = Σ_j A_ij · a_j, so h = A · a
• Similarly, a_i = Σ_{j→i} h_j can be rewritten as a_i = Σ_j A_ji · h_j, so a = Aᵀ · h
Hubs and Authorities
• HITS algorithm in vector notation:
  – Set: a_i = h_i = 1/√n
  – Repeat until convergence:
    • h = A · a   (new h)
    • a = Aᵀ · h  (new a)
    • Normalize a and h
  – Convergence criterion: Σ_i (h_i(t) − h_i(t−1))² < ε and Σ_i (a_i(t) − a_i(t−1))² < ε
• Then: a = Aᵀ · (A · a)
  – a is updated (in 2 steps): a = Aᵀ(A a) = (AᵀA) a
  – h is updated (in 2 steps): h = A(Aᵀ h) = (A Aᵀ) h
  – This is repeated matrix powering
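A direct transcription of this iteration into NumPy (a sketch, not an optimized implementation); `Adj` plays the role of the adjacency matrix A:

```python
def hits(Adj, eps=1e-12, max_iter=1000):
    """HITS sketch: h = Adj a, a = Adj^T h, with 2-norm normalization."""
    n = Adj.shape[0]
    a = np.ones(n) / np.sqrt(n)
    h = np.ones(n) / np.sqrt(n)
    for _ in range(max_iter):
        h_new = Adj @ a
        h_new /= np.linalg.norm(h_new)
        a_new = Adj.T @ h_new
        a_new /= np.linalg.norm(a_new)
        done = ((h_new - h) ** 2).sum() < eps and ((a_new - a) ** 2).sum() < eps
        h, a = h_new, a_new
        if done:
            break
    return h, a
```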
Properties
• Under reasonable assumptions about A, HITS converges to vectors h* and a*:
  – a* is the principal eigenvector of the matrix AᵀA
  – h* is the principal eigenvector of the matrix A Aᵀ
Example of HITS
(Graph on three pages: Yahoo, Amazon, M’soft.)

  A  =  1 1 1        Aᵀ =  1 1 0
        1 0 1              1 0 1
        0 1 0              1 1 0

Hub scores (successive iterates):
  h(yahoo)  = .58   .80   .79   …   .788
  h(amazon) = .58   .53   .57   …   .577
  h(m’soft) = .58   .27   .23   …   .211

Authority scores (successive iterates):
  a(yahoo)  = .58   .62   …   .628
  a(amazon) = .58   .49   …   .459
  a(m’soft) = .58   .62   …   .628

Note: Amazon ends up with the lowest authority score. The hubs that point to Amazon are not good!
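Running the `hits` sketch above on this graph reproduces the limiting scores:

```python
Adj = np.array([[1, 1, 1],     # Yahoo  -> Yahoo, Amazon, M'soft
                [1, 0, 1],     # Amazon -> Yahoo, M'soft
                [0, 1, 0]],    # M'soft -> Amazon
               dtype=float)

h, a = hits(Adj)
print(np.round(h, 3))   # ~ [0.788 0.577 0.211]
print(np.round(a, 3))   # ~ [0.628 0.459 0.628]
```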
SVD and Eigenvectors
Definition. An eigenvector of a square matrix B is a vector v that satisfies
  Bv = λv
for some scalar λ, called an eigenvalue.
• Since a matrix B can be viewed as a linear transformation, an eigenvector spans a one-dimensional subspace that is invariant under B
• Eigenvectors show up in a number of important applications (e.g. PageRank, HITS)

SVD and Eigenvectors
Observation. If B = AᵀA then
(i) the right singular vectors of A, v₁, …, v_k, are the eigenvectors of B, with eigenvalues σ₁², …, σ_k²
(ii) the left singular vectors of A, u₁, …, u_k, are the eigenvectors of A Aᵀ, with eigenvalues σ₁², …, σ_k²
(iii) the matrix B is positive semi-definite, i.e., yᵀBy ≥ 0 for every y
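A quick numerical check of (i) and (ii), reusing the ratings matrix A and its SVD U, s, Vt from the first sketch:

```python
B = A.T @ A
v1, u1 = Vt[0], U[:, 0]
print(np.allclose(B @ v1, s[0] ** 2 * v1))          # (i):  B v1 = sigma_1^2 v1
print(np.allclose(A @ A.T @ u1, s[0] ** 2 * u1))    # (ii): A A^T u1 = sigma_1^2 u1
```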
SVD: Limitations
+ Optimal low-rank approximation in terms of the Frobenius norm
– Interpretability:
  • a specific direction is a linear combination of all the rows of the matrix
– Lack of sparsity:
  • singular vectors can be dense even when the matrix is sparse!
CUR Decomposition

CUR Decomposition
• Goal: Express A as a product of matrices C, U, R so that ǁA − C·U·RǁF is small
• “Constraints” on C and R: C consists of actual columns of A, and R of actual rows of A

  Frobenius norm: ǁXǁF = √(Σᵢⱼ Xᵢⱼ²)
CUR: How it Works
• Sampling columns (and similarly for rows): columns are sampled at random, with probability proportional to their squared norm
• Note this is a randomized algorithm: the same column can be sampled more than once
Computing U (DKM 2006)
Notation
• r(i): i-th sampled row of A
• c(j): j-th sampled column of A
• Q[j]: probability of choosing the j-th row of A
• P[j]: probability of choosing the j-th column of A
• r: number of sampled rows, c: number of sampled columns

Define W as the r × c matrix where

  W_{i,j} = A_{r(i),c(j)} / √( r·c · Q[r(i)] · P[c(j)] )
Computing U (DKM 2006)
1. Let CᵀC = σ₁²(v₁v₁ᵀ) + … + σ_k²(v_k v_kᵀ) be the spectral (SVD) decomposition of CᵀC
2. Define W′ = Σᵢ₌₁ᵏ (1/σᵢ²) vᵢvᵢᵀ
3. Define U = W′ Wᵀ
CUR: Provably good approximation to SVD
• For example:
  – Select c = O(k log k / ε²) columns of A using the ColumnSelect algorithm
  – Select r = O(k log k / ε²) rows of A using the ColumnSelect algorithm
  – Set U = W⁺ (the pseudo-inverse of the intersection matrix W)
• Then, with probability 98%:

  ǁA − C·U·RǁF  ≤  (2 + ε) ǁA − A_kǁF
   (CUR error)          (SVD error)

• In practice: pick 4k columns/rows for a “rank-k” approximation
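A heavily simplified sketch of the pipeline: it samples columns and rows with probability proportional to their squared norms and sets U = W⁺ as above, omitting the ColumnSelect details and the DKM rescaling; the matrix M is random stand-in data.

```python
def cur(M, c, r, rng=np.random.default_rng(0)):
    p = (M ** 2).sum(axis=0) / (M ** 2).sum()    # column probabilities P
    q = (M ** 2).sum(axis=1) / (M ** 2).sum()    # row probabilities Q
    cols = rng.choice(M.shape[1], size=c, p=p)   # same column may repeat
    rows = rng.choice(M.shape[0], size=r, p=q)
    C, R = M[:, cols], M[rows, :]
    W = M[np.ix_(rows, cols)]                    # intersection of the samples
    return C, np.linalg.pinv(W), R               # U = W^+

M = np.random.default_rng(1).random((200, 60))
C, U_w, R = cur(M, c=16, r=16)
rel = np.linalg.norm(M - C @ U_w @ R, 'fro') / np.linalg.norm(M, 'fro')
print(f"relative CUR error: {rel:.2f}")
```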
CUR: Pros & Cons
+ Easy interpretation
  • Since the basis vectors are actual columns and rows
+ Sparse basis
  • Since the basis vectors are actual columns and rows (an actual column stays sparse, while a singular vector is typically dense)
- Duplicate columns and rows
  • Columns of large norm will be sampled many times
Solution
• If we want to get rid of the duplicates:
  – Throw them away
  – Scale (multiply) the remaining columns/rows by the square root of the number of duplicates
• From A we get Cd, Rd with duplicates; after de-duplication and scaling we get the smaller Cs, Rs, and then construct a small U
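A small sketch of the de-duplication step, reusing M from the CUR sketch above; the sampled indices are hypothetical.

```python
cols = np.array([3, 7, 3, 3, 9])                 # hypothetical sampled columns
uniq, counts = np.unique(cols, return_counts=True)
Cs = M[:, uniq] * np.sqrt(counts)                # scale by sqrt(#duplicates)
print(Cs.shape)                                  # duplicate-free columns of C
```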
SVD vs. CUR

SVD: A = U Σ Vᵀ
  – A: huge but sparse
  – U, Vᵀ: big and dense
  – Σ: sparse and small

CUR: A = C U R
  – A: huge but sparse
  – C, R: big but sparse
  – U: dense but small
SVD vs. CUR: Simple Experiment
• DBLP bibliographic data
  – Author-to-conference big sparse matrix
  – Aij: number of papers published by author i at conference j
  – 428K authors (rows), 3659 conferences (columns)
  – Very sparse
• Want to reduce dimensionality
  – How much time does it take?
  – What is the reconstruction error?
  – How much space do we need?
Results: DBLP, big sparse matrix
• Accuracy: 1 − relative sum of squared errors
• Space ratio: #output matrix entries / #input matrix entries
• CPU time
(Figure: accuracy vs. space ratio and CPU time for SVD, CUR, and CUR with no duplicates.)

Sun, Faloutsos: Less is More: Compact Matrix Decomposition for Large Sparse Graphs, SDM ’07.
Bibliography
• Ch. 3 of Foundations of Data Science, Avrim Blum, John Hopcroft and Ravindran Kannan, http://www.cs.cornell.edu/jeh/book.pdf
• Ch. 10 of Mining of Massive Datasets, J. Leskovec, A. Rajaraman, J. Ullman, http://www.mmds.org
• Fast Monte Carlo Algorithms for Matrices III: Computing a Compressed Approximate Matrix Decomposition (DKM, SICOMP 2006)
• Ch. 14 of The Elements of Statistical Learning
• Sun, Faloutsos: Less is More: Compact Matrix Decomposition for Large Sparse Graphs, SDM ’07, http://www.cs.cmu.edu/~jimeng/papers/SunSDM07.pdf