TRANSCRIPT
DETERMINANTAL POINT PROCESSES FOR NATURAL LANGUAGE PROCESSING
Jennifer Gillenwater, joint work with Alex Kulesza and Ben Taskar
OUTLINE
Motivation & background on DPPs
Large-scale settings
Structured summarization
Other potential NLP applications
MOTIVATION & BACKGROUND
SUMMARIZATION
Quality: relevance to the topic
Diversity: coverage of core ideas
SUBSET SELECTION
AREA AS SET-GOODNESS
(feature space, with item vectors B_i, B_j and their sum B_i + B_j)
quality = sqrt(B_i^T B_i)
similarity = B_i^T B_j
area = sqrt( ||B_i||_2^2 ||B_j||_2^2 - (B_i^T B_j)^2 )
VOLUME AS SET-GOODNESS
area = sqrt( ||B_i||_2^2 ||B_j||_2^2 - (B_i^T B_j)^2 )
length = ||B_i||_2
volume = base x height: vol(B) = ||B_1||_2 * vol(proj_{perp B_1}(B_{2:N}))
AREA AS A DET
area = sqrt( ||B_i||_2^2 ||B_j||_2^2 - (B_i^T B_j)^2 )
     = det( [ ||B_i||_2^2  B_i^T B_j ; B_i^T B_j  ||B_j||_2^2 ] )^{1/2}
     = det( [B_i B_j]^T [B_i B_j] )^{1/2}

VOLUME AS A DET
vol(B_{i,j}) = det( [B_i B_j]^T [B_i B_j] )^{1/2}
vol(B)^2 = det( [B_1 ... B_N]^T [B_1 ... B_N] ) = det(B^T B) = det(L)
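The area/determinant identity on this slide is easy to check numerically. A minimal sketch (NumPy, with made-up feature vectors): the parallelogram area from the quality/similarity formula matches det(B^T B)^{1/2}.

```python
import numpy as np

# Two item feature vectors (columns of B); values are illustrative.
Bi = np.array([3.0, 0.0, 1.0])
Bj = np.array([1.0, 2.0, 0.0])
B = np.column_stack([Bi, Bj])  # D x N matrix, here 3 x 2

# Area of the parallelogram spanned by Bi and Bj, two ways.
area_direct = np.sqrt(np.dot(Bi, Bi) * np.dot(Bj, Bj) - np.dot(Bi, Bj) ** 2)
area_det = np.sqrt(np.linalg.det(B.T @ B))  # det(L)^{1/2} with L = B^T B

print(area_direct, area_det)  # the two agree
```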
COMPLEX STATISTICS
N items => 2^N sets
EFFICIENT COMPUTATION
Computing a determinant takes O(N^3) time.
POINT PROCESSES
Y = {1, . . . , N}
A point process assigns a probability to each subset, e.g. P(some subset) = 0.2.
DETERMINANTAL
P({2, 3, 5}) is proportional to the determinant of the submatrix of L indexed by {2, 3, 5}:

P({2, 3, 5}) = det( [ L22 L23 L25 ; L32 L33 L35 ; L52 L53 L55 ] ) / det(L + I)
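The normalizer det(L + I) can be verified by brute force on a small example. A sketch (NumPy, random PSD kernel): summing det(L_Y) over all 2^N subsets recovers det(L + I).

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
N = 4
B = rng.normal(size=(3, N))
L = B.T @ B  # a PSD DPP kernel

# Brute-force: sum det(L_Y) over all 2^N subsets (empty set contributes 1).
total = 0.0
for r in range(N + 1):
    for Y in itertools.combinations(range(N), r):
        if Y:
            idx = np.array(Y, dtype=int)
            total += np.linalg.det(L[np.ix_(idx, idx)])
        else:
            total += 1.0

print(total, np.linalg.det(L + np.eye(N)))  # the two agree
```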
EFFICIENT INFERENCE
Normalizing:   P_L(Y = Y)
Marginalizing: P(Y ⊆ Y)
Conditioning:  P_L(Y = B | A ⊆ Y),  P_L(Y = B | A ∩ Y = ∅)
Sampling:      Y ~ P_L
All in O(N^3) time.
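Marginals, in particular, come from the marginal kernel K = L(L + I)^{-1}: P(i ∈ Y) = K_ii. A sketch checking this against a brute-force sum over subsets (small N, random kernel; the marginal-kernel formula is standard DPP machinery, not specific to these slides):

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
N = 4
B = rng.normal(size=(3, N))
L = B.T @ B
Z = np.linalg.det(L + np.eye(N))

# Marginal kernel: K = L (L + I)^{-1}, so P(i in Y) = K_ii.
K = L @ np.linalg.inv(L + np.eye(N))

# Brute-force marginal of item 0: sum P(Y) over subsets containing 0.
p0 = 0.0
for r in range(1, N + 1):
    for Y in itertools.combinations(range(N), r):
        if 0 in Y:
            idx = np.array(Y, dtype=int)
            p0 += np.linalg.det(L[np.ix_(idx, idx)]) / Z

print(p0, K[0, 0])  # the two agree
```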
LARGE-SCALE SETTINGS
DUAL KERNEL (KULESZA AND TASKAR, NIPS 2010)
L = B^T B is N x N; the dual kernel C = B B^T is only D x D.
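The reason the dual kernel helps: L = B^T B and C = B B^T share their nonzero eigenvalues, and det(L + I_N) = det(C + I_D) (Sylvester's determinant identity), so determinant-based quantities can be computed from the small D x D matrix. A quick numerical check (random B, illustrative sizes):

```python
import numpy as np

rng = np.random.default_rng(2)
D, N = 3, 100  # few features, many items
B = rng.normal(size=(D, N))

L = B.T @ B  # N x N primal kernel
C = B @ B.T  # D x D dual kernel

# Nonzero eigenvalues coincide, and the DPP normalizer matches.
top_eigs_L = np.sort(np.linalg.eigvalsh(L))[-D:]
eigs_C = np.sort(np.linalg.eigvalsh(C))
print(np.allclose(top_eigs_L, eigs_C))
print(np.isclose(np.linalg.det(L + np.eye(N)),
                 np.linalg.det(C + np.eye(D))))
```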
DUAL INFERENCE
L = V Λ V^T,  C = V̂ Λ V̂^T,  V = B^T V̂ Λ^{-1/2}
Normalizing:   Σ_Y det(L_Y) in O(D^3)
Marginalizing & Conditioning: O(D^3 + D^2 k^2)
Sampling:      Y ~ P_L in O(N D^2 k)
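The lifting step V = B^T V̂ Λ^{-1/2} can be checked directly: eigenvectors of the small dual kernel C, mapped through B^T and rescaled, give orthonormal eigenvectors of L with the same eigenvalues. A sketch (random B, illustrative sizes):

```python
import numpy as np

rng = np.random.default_rng(3)
D, N = 3, 50
B = rng.normal(size=(D, N))
L = B.T @ B
C = B @ B.T

# Eigendecompose the small dual kernel C = Vhat diag(lam) Vhat^T ...
lam, Vhat = np.linalg.eigh(C)

# ... and lift to eigenvectors of L via V = B^T Vhat diag(lam)^{-1/2}.
V = B.T @ Vhat @ np.diag(lam ** -0.5)

# Check: L V = V diag(lam), and V has orthonormal columns.
print(np.allclose(L @ V, V @ np.diag(lam)))
print(np.allclose(V.T @ V, np.eye(D)))
```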
EXPONENTIAL N
N = O({sentence length}^{sentence length})
We want to select a diverse set of parses.
N = O({node degree}^{path length})
STRUCTURE FACTORIZATION (KULESZA AND TASKAR, NIPS 2010)
B_i = q(i) φ(i)   (quality x similarity)
A structure factors into parts: i = {i_α}_{α ∈ F}, with factors of size c.
B_i = ( Π_{α ∈ F} q(i_α) ) ( Σ_{α ∈ F} φ(i_α) )
Sampling: Y ~ P_L in O(D^2 k^3 + D k^2 M^c R), where M is the number of values a part can take and R is the number of factors.
Example: M^c R = 4^2 * 12 = 192 ≪ N = 4^12 = 16,777,216
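The gain is easy to see with the slide's numbers (a sketch; M, c, R as read off the example above, where the number of structures happens to be M^R):

```python
# Structured sampling cost scales with M^c * R (part values to the
# factor size, times the number of factors), not with N, the number
# of whole structures, which here is M^R.
M, c, R = 4, 2, 12
structured_terms = M ** c * R  # 192
num_structures = M ** R        # 16,777,216
print(structured_terms, num_structures)
```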
LARGE FEATURE SETS?
                         N = # of items
D = # of features        Large      Exponential
  Small                  dual       dual + structure
  Large                  ?          ?
RANDOM PROJECTIONS (GILLENWATER, KULESZA, AND TASKAR, EMNLP 2012)
Project the D x N feature matrix down to d x N, with d ≪ D; the structure factorization (M^c R) is preserved.
VOLUME PRESERVATION
JOHNSON AND LINDENSTRAUSS (1984): d = O(log N) preserves pairwise distances.
MAGEN AND ZOUZIAS (2008): d = O(log N) preserves volumes as well.
DPP PRESERVATION (GILLENWATER, KULESZA, AND TASKAR, EMNLP 2012)
vol^2 = det, approximately preserved for k = 1, 2, 3, ...
DPP PRESERVATION (GILLENWATER, KULESZA, AND TASKAR, EMNLP 2012)
d = O( max{ k/ε, (log(1/δ) + log N)/ε² + k } )
(k = subset size, N = total # of items)
With probability 1 − δ:  ||P_k − P̃_k||_1 ≤ e^{6kε} − 1
[Plot: L1 variational distance and memory use (bytes) vs. projection dimension]
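The projection itself is just multiplication by a scaled random Gaussian matrix. A sketch (made-up features and illustrative sizes, not the paper's data) showing that pairwise distances survive the dimensionality drop, in the Johnson-Lindenstrauss sense:

```python
import numpy as np

rng = np.random.default_rng(4)
D, N, d = 5000, 100, 50  # many features, few kept after projection

# Made-up feature vectors standing in for phi(i).
Phi = rng.normal(size=(D, N))

# Random Gaussian projection, scaled so squared norms are preserved
# in expectation.
G = rng.normal(size=(d, D)) / np.sqrt(d)
Proj = G @ Phi  # d x N

# Pairwise distances are approximately preserved.
for i, j in [(0, 1), (1, 2), (2, 3)]:
    orig = np.linalg.norm(Phi[:, i] - Phi[:, j])
    proj = np.linalg.norm(Proj[:, i] - Proj[:, j])
    print(round(proj / orig, 2))  # ratios typically near 1
```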
STRUCTURED SUMMARIZATION
NEWS THREADING
March 28: Health officials confirm Ebola outbreak in Guinea's capital
August 8: World Health Organization declares Ebola epidemic an international health emergency
September 2: GlaxoSmithKline begins Ebola vaccine drug trial
~10^360 possible threads; M ≈ 35,000
PROJECTING NEWS FEATURES
φ(i) with D = 36,356 features is projected to Gφ(i) with d = 50.
DPP THREADS
Jan 08 Jan 28 Feb 17 Mar 09 Mar 29 Apr 18 May 08 May 28 Jun 17
pope vatican church parkinson
israel palestinian iraqi israeli gaza abbas baghdad
owen nominees senate democrats judicial filibusters
social tax security democrats rove accounts
iraq iraqi killed baghdad arab marines deaths forces
Feb 24: Parkinson's Disease Increases Risks to Pope
Feb 26: Pope's Health Raises Questions About His Ability to Lead
Mar 13: Pope Returns Home After 18 Days at Hospital
Apr 01: Pope's Condition Worsens as World Prepares for End of Papacy
Apr 02: Pope, Though Gravely Ill, Utters Thanks for Prayers
Apr 18: Europeans Fast Falling Away from Church
Apr 20: In Developing World, Choice [of Pope] Met with Skepticism
May 18: Pope Sends Message with Choice of Name
QUANTITATIVE RESULTS
System       k-means   DTM      DPP
ROUGE-1F     16.5      14.7     17.2
R-SU4F       3.76      3.44     3.98
Coherence    2.73      3.2      3.3
Runtime (s)  626       19,434   252
OTHER POTENTIAL NLP APPLICATIONS
POTENTIAL APP: RE-RANKING
• Parser: simple model with local features defines basic scores for all possible parse trees
• Re-ranker: more complex model with non-local features provides more refined scores
• Typical pipeline: find the k highest-scoring parses under the simple model, then score these k with the more complex model and output the best
• Issue: the k parses may be largely redundant, so the re-ranker never gets to consider significantly different parses
IDEA: USE DPPS FOR SELECTING RE-RANKER INPUT
N = O({sentence length}^{sentence length})
We want to select a diverse set of parses.
Quality: standard parser scores
Diversity: edge lengths, POS pairs, etc.
POTENTIAL APP: WORD SENSE INDUCTION
• Goal: identify all possible senses of ambiguous words (e.g. river bank vs bank deposit)
• Typical approach: unsupervised clustering; cluster centers represent word senses
• Why DPPs fit: finding cluster centers can be re-expressed as finding a high-quality, diverse set
Quality: centrality (density of points nearby)
Diversity: same as standard WSI features
QUESTIONS?