TRANSCRIPT
Topics in statistical language modeling
Tom Griffiths
Mark Steyvers (UC Irvine)
Josh Tenenbaum (MIT)
Dave Blei (CMU)
Mike Jordan (UC Berkeley)
Latent Dirichlet Allocation (LDA)
• Each document a mixture of topics
• Each word chosen from a single topic
• Introduced by Blei, Ng, and Jordan (2001); a reinterpretation of PLSI (Hofmann, 1999)
• The idea of probabilistic topics is widely used (e.g. Bigi et al., 1997; Iyer & Ostendorf, 1996; Ueda & Saito, 2003)
• Each document a mixture of topics: mixing proportions θ(d) drawn from parameters α
• Each word chosen from a single topic: word distributions φ(j) drawn from parameters β
Latent Dirichlet Allocation (LDA)
word          P(w | z = 1) = φ(1)    P(w | z = 2) = φ(2)
HEART         0.2                    0.0
LOVE          0.2                    0.0
SOUL          0.2                    0.0
TEARS         0.2                    0.0
JOY           0.2                    0.0
SCIENTIFIC    0.0                    0.2
KNOWLEDGE     0.0                    0.2
WORK          0.0                    0.2
RESEARCH      0.0                    0.2
MATHEMATICS   0.0                    0.2
              topic 1                topic 2
Choose mixture weights for each document, generate a “bag of words”:

θ = {P(z = 1), P(z = 2)}

θ = {0, 1}:       MATHEMATICS KNOWLEDGE RESEARCH WORK MATHEMATICS RESEARCH WORK SCIENTIFIC MATHEMATICS WORK
θ = {0.25, 0.75}: SCIENTIFIC KNOWLEDGE MATHEMATICS SCIENTIFIC HEART LOVE TEARS KNOWLEDGE HEART
θ = {0.5, 0.5}:   MATHEMATICS HEART RESEARCH LOVE MATHEMATICS WORK TEARS SOUL KNOWLEDGE HEART
θ = {0.75, 0.25}: WORK JOY SOUL TEARS MATHEMATICS TEARS LOVE LOVE LOVE SOUL
θ = {1, 0}:       TEARS LOVE JOY SOUL LOVE TEARS SOUL SOUL TEARS JOY
Generating a document
1. Choose θ(d) ~ Dirichlet(α)
2. For each word in the document:
   – choose z ~ Multinomial(θ(d))
   – choose w ~ Multinomial(φ(z))
(this process is sketched in code below)
[Graphical model: for each word in each document, a topic z generates the word w]
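A minimal sketch of this generative process (illustrative Python, not from the talk; the two-topic vocabulary and φ come from the example above, and α is an assumed symmetric hyperparameter):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative two-topic example from the slides
vocab = ["HEART", "LOVE", "SOUL", "TEARS", "JOY",
         "SCIENTIFIC", "KNOWLEDGE", "WORK", "RESEARCH", "MATHEMATICS"]
phi = np.array([[0.2] * 5 + [0.0] * 5,      # P(w | z = 1): the "love" topic
                [0.0] * 5 + [0.2] * 5])     # P(w | z = 2): the "science" topic
alpha = np.ones(2)                          # assumed symmetric Dirichlet prior

def generate_document(n_words):
    theta = rng.dirichlet(alpha)                 # 1. mixing proportions for this document
    words = []
    for _ in range(n_words):                     # 2. for each word:
        z = rng.choice(2, p=theta)               #    choose a topic
        w = rng.choice(len(vocab), p=phi[z])     #    choose a word from that topic
        words.append(vocab[w])
    return theta, words

theta, doc = generate_document(10)
print(theta, doc)
```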
Inverting the generative model
• The generative model gives a procedure for obtaining a corpus from topics and mixing proportions
• Inverting the model extracts the topics and mixing proportions from a corpus
• Goal: describe the content of documents, and be able to identify the content of new documents
• All inference is completely unsupervised, given a fixed number of topics T, vocabulary size W, and number of documents D
Inverting the generative model
• Maximum likelihood estimation (EM)
  – e.g. Hofmann (1999)
  – slow, local maxima
• Approximate E-steps
  – VB: Blei, Ng & Jordan (2001)
  – EP: Minka & Lafferty (2002)
• Bayesian inference (via Gibbs sampling)
Gibbs sampling in LDA
• The numerator rewards sparsity in the assignment of words to topics and topics to documents
• The sum in the denominator is over T^n terms
• The full posterior is tractable only up to a constant, so use Markov chain Monte Carlo (MCMC); the relevant distributions are written out below
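For reference, these quantities in the standard collapsed form for this model (following Griffiths & Steyvers; n_j^(w) is the number of times word w is assigned to topic j, n_j^(d) the number of times topic j appears in document d):

```latex
P(\mathbf{w} \mid \mathbf{z}) = \left( \frac{\Gamma(W\beta)}{\Gamma(\beta)^{W}} \right)^{\!T}
  \prod_{j=1}^{T} \frac{\prod_{w} \Gamma\!\left(n_{j}^{(w)} + \beta\right)}{\Gamma\!\left(n_{j}^{(\cdot)} + W\beta\right)},
\qquad
P(\mathbf{z}) = \left( \frac{\Gamma(T\alpha)}{\Gamma(\alpha)^{T}} \right)^{\!D}
  \prod_{d=1}^{D} \frac{\prod_{j} \Gamma\!\left(n_{j}^{(d)} + \alpha\right)}{\Gamma\!\left(n_{\cdot}^{(d)} + T\alpha\right)}

P(\mathbf{z} \mid \mathbf{w}) = \frac{P(\mathbf{w} \mid \mathbf{z})\, P(\mathbf{z})}
  {\sum_{\mathbf{z}'} P(\mathbf{w} \mid \mathbf{z}')\, P(\mathbf{z}')}
```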
Markov chain Monte Carlo
• Sample from a Markov chain constructed to converge to the target distribution
• Allows sampling from unnormalized posteriors and other complex distributions
• Can compute approximate statistics from intractable distributions
• Gibbs sampling is one such method: construct the Markov chain from the full conditional distributions
Gibbs sampling in LDA
• Need full conditional distributions for variables
• Since we only sample z we need

  P(z_i = j \mid \mathbf{z}_{-i}, \mathbf{w}) \propto \frac{n_{-i,j}^{(w_i)} + \beta}{n_{-i,j}^{(\cdot)} + W\beta} \cdot \frac{n_{-i,j}^{(d_i)} + \alpha}{n_{-i,\cdot}^{(d_i)} + T\alpha}

  where n_{-i,j}^{(w)} is the number of times word w is assigned to topic j, and n_{-i,j}^{(d)} is the number of times topic j is used in document d (both counts excluding the current assignment z_i; a sketch of one Gibbs sweep follows below)
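A sketch of one sweep of collapsed Gibbs sampling implied by this conditional (hypothetical Python, not the talk's implementation; `docs` holds lists of word ids, count arrays are as defined above, and the document-side denominator is dropped because it is constant in j):

```python
import numpy as np

def gibbs_sweep(docs, z, n_wt, n_dt, n_t, alpha, beta, rng):
    """One sweep of collapsed Gibbs sampling for LDA.
    n_wt[w, j]: times word w is assigned to topic j
    n_dt[d, j]: times topic j is used in document d
    n_t[j]:     total assignments to topic j
    """
    W, T = n_wt.shape
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            j = z[d][i]
            # remove the current assignment from the counts
            n_wt[w, j] -= 1; n_dt[d, j] -= 1; n_t[j] -= 1
            # full conditional P(z_i = j | z_-i, w), up to a constant
            p = (n_wt[w] + beta) / (n_t + W * beta) * (n_dt[d] + alpha)
            j = rng.choice(T, p=p / p.sum())
            # add the new assignment back
            n_wt[w, j] += 1; n_dt[d, j] += 1; n_t[j] += 1
            z[d][i] = j
```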
Gibbs sampling in LDA

Each word token i has a word wi, a document di, and a topic assignment zi. Each iteration sweeps through the tokens, resampling every zi in turn from its full conditional given all other current assignments (illustrated with 50 tokens over 5 documents):

 i   wi            di   zi (iter. 1)   zi (iter. 2)   …   zi (iter. 1000)
 1   MATHEMATICS    1        2              2                    2
 2   KNOWLEDGE      1        2              1                    2
 3   RESEARCH       1        1              1                    2
 4   WORK           1        2              2                    1
 5   MATHEMATICS    1        1              2                    2
 6   RESEARCH       1        2              2                    2
 7   WORK           1        2              2                    2
 8   SCIENTIFIC     1        1              1                    1
 9   MATHEMATICS    1        2              2                    2
10   WORK           1        1              2                    2
11   SCIENTIFIC     2        1              1                    2
12   KNOWLEDGE      2        1              2                    2
 …   …              …        …              …                    …
50   JOY            5        2              1                    1
Estimating topic distributions
Parameter estimates from posterior predictive distributions
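Concretely, these are the standard point estimates computed from a single Gibbs sample (notation as in the conditional above):

```latex
\hat{\phi}_{j}^{(w)} = \frac{n_{j}^{(w)} + \beta}{n_{j}^{(\cdot)} + W\beta},
\qquad
\hat{\theta}_{j}^{(d)} = \frac{n_{j}^{(d)} + \alpha}{n_{\cdot}^{(d)} + T\alpha}
```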
A visual example: Bars
• pixel = word, image = document
• sample each pixel from a mixture of topics
Strategy
• Markov chain Monte Carlo (MCMC) is normally slow, so why consider using it?
• In discrete models, use conjugate priors to reduce inference to discrete variables
• Several benefits:
  – save memory: need only track sparse counts
  – save time: cheap updates, even with complex dependencies between variables
(not estimating the Dirichlet hyperparameters α, β)
Perplexity vs. time
Strategy
• These properties let us explore larger, more complex models
Application to corpus data
• TASA corpus: text from first grade through college
• 26,414 word types; over 37,000 documents; approximately 6 million word tokens
• Ran Gibbs sampling for models with T = 300, 500, …, 1700 topics
A selection from 500 topics [P(w | z = j)]:

• THEORY SCIENTISTS EXPERIMENT OBSERVATIONS SCIENTIFIC EXPERIMENTS HYPOTHESIS EXPLAIN SCIENTIST OBSERVED EXPLANATION BASED OBSERVATION IDEA EVIDENCE THEORIES BELIEVED DISCOVERED OBSERVE FACTS
• SPACE EARTH MOON PLANET ROCKET MARS ORBIT ASTRONAUTS FIRST SPACECRAFT JUPITER SATELLITE SATELLITES ATMOSPHERE SPACESHIP SURFACE SCIENTISTS ASTRONAUT SATURN MILES
• ART PAINT ARTIST PAINTING PAINTED ARTISTS MUSEUM WORK PAINTINGS STYLE PICTURES WORKS OWN SCULPTURE PAINTER ARTS BEAUTIFUL DESIGNS PORTRAIT PAINTERS
• STUDENTS TEACHER STUDENT TEACHERS TEACHING CLASS CLASSROOM SCHOOL LEARNING PUPILS CONTENT INSTRUCTION TAUGHT GROUP GRADE SHOULD GRADES CLASSES PUPIL GIVEN
• BRAIN NERVE SENSE SENSES ARE NERVOUS NERVES BODY SMELL TASTE TOUCH MESSAGES IMPULSES CORD ORGANS SPINAL FIBERS SENSORY PAIN IS
• CURRENT ELECTRICITY ELECTRIC CIRCUIT IS ELECTRICAL VOLTAGE FLOW BATTERY WIRE WIRES SWITCH CONNECTED ELECTRONS RESISTANCE POWER CONDUCTORS CIRCUITS TUBE NEGATIVE
A selection from 500 topics [P(w | z = j)]:

• STORY STORIES TELL CHARACTER CHARACTERS AUTHOR READ TOLD SETTING TALES PLOT TELLING SHORT FICTION ACTION TRUE EVENTS TELLS TALE NOVEL
• MIND WORLD DREAM DREAMS THOUGHT IMAGINATION MOMENT THOUGHTS OWN REAL LIFE IMAGINE SENSE CONSCIOUSNESS STRANGE FEELING WHOLE BEING MIGHT HOPE
• FIELD MAGNETIC MAGNET WIRE NEEDLE CURRENT COIL POLES IRON COMPASS LINES CORE ELECTRIC DIRECTION FORCE MAGNETS BE MAGNETISM POLE INDUCED
• SCIENCE STUDY SCIENTISTS SCIENTIFIC KNOWLEDGE WORK RESEARCH CHEMISTRY TECHNOLOGY MANY MATHEMATICS BIOLOGY FIELD PHYSICS LABORATORY STUDIES WORLD SCIENTIST STUDYING SCIENCES
• BALL GAME TEAM FOOTBALL BASEBALL PLAYERS PLAY FIELD PLAYER BASKETBALL COACH PLAYED PLAYING HIT TENNIS TEAMS GAMES SPORTS BAT TERRY
• JOB WORK JOBS CAREER EXPERIENCE EMPLOYMENT OPPORTUNITIES WORKING TRAINING SKILLS CAREERS POSITIONS FIND POSITION FIELD OCCUPATIONS REQUIRE OPPORTUNITY EARN ABLE
Evaluation: Word association

Cue: PLANET
Associates: EARTH, PLUTO, JUPITER, NEPTUNE, VENUS, URANUS, SATURN, COMET, MARS, ASTEROID
(Nelson, McEvoy & Schreiber, 1998)

[The norms form a matrix of cues × associates]
Evaluation: Word association
• Comparison with Latent Semantic Analysis (LSA; Landauer & Dumais, 1997)
• Both algorithms applied to the TASA corpus (D > 30,000, W > 20,000, n > 6,000,000)
• Compare LSA's cosine and inner product with the “on-topic” conditional probability P(w2 | w1) (sketched below)
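A sketch of the “on-topic” conditional probability: infer which topic the cue came from, then marginalize, P(w2 | w1) = Σ_j P(w2 | z = j) P(z = j | w1). Illustrative Python, with the topic matrix `phi` and topic prior `pz` assumed given:

```python
import numpy as np

def on_topic_prob(w1, w2, phi, pz):
    """P(w2 | w1) = sum_j P(w2 | z=j) P(z=j | w1).
    phi[j, w]: P(w | z = j); pz[j]: P(z = j)."""
    post = phi[:, w1] * pz              # P(z = j | w1), up to normalization
    post /= post.sum()
    return float(phi[:, w2] @ post)

# To rank all words as associates of a cue:
# scores = phi.T @ post; order = np.argsort(-scores)
```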
Latent Semantic Analysis (Landauer & Dumais, 1997)

[Figure: a word × document co-occurrence matrix X (rows for words such as “words”, “in”, “semantic”, “spaces”; columns Doc1, Doc2, Doc3, …) is factorized by SVD as X = U D V^T, embedding words and documents in a lower-dimensional space]
Latent Semantic Analysis (Landauer & Dumais, 1997)
Dimensionality reduction makes storage efficient and extracts correlations between words (a sketch follows below)
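A minimal sketch of that reduction (illustrative numpy, not the original implementation; `X` is the word × document co-occurrence matrix and `k` the number of retained dimensions):

```python
import numpy as np

def lsa(X, k):
    """Truncated SVD: X ~ U_k D_k V_k^T, giving k-dimensional vectors."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    word_vecs = U[:, :k] * s[:k]        # words in the reduced space
    doc_vecs = Vt[:k].T                 # documents in the reduced space
    return word_vecs, doc_vecs

def cosine(u, v):
    """LSA's standard similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
```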
[Figure: probability that the top-ranked set contains the cue's first associate, as a function of set size (10^0 to 10^3, log scale), comparing LSA and the topic model]
Problems
• Finding the right number of topics
• No dependencies between topics
• The “bag of words” assumption
• Need for a stop list
Problems
• Finding the right number of topics   }
• No dependencies between topics       }  → CRP models (Blei, Jordan, Tenenbaum)
• The “bag of words” assumption   }
• Need for a stop list            }  → HMM syntax (Steyvers, Blei, Tenenbaum)
Standard LDA: T corpus topics (1 … T); all T topics appear in each document (doc1, doc2, doc3, …).

Alternative: T corpus topics, but only L topics appear in each document, with the topic identities indexed by c.
Richer dependencies
• The nature of the topic dependencies comes from the prior on assignments of topics to documents, p(c)
• Inference with Gibbs is straightforward
• Boring prior: pick L from T uniformly
• Some interesting priors on assignments:– Chinese restaurant process (CRP)– nested CRP (for hierarchies)
Chinese restaurant process
• The mth customer at an infinitely large Chinese restaurant chooses a table according to

  P(occupied table k) = m_k / (m − 1 + γ),   P(next unoccupied table) = γ / (m − 1 + γ)

  where m_k is the number of previous customers at table k and γ is a concentration parameter (see the sketch below)
• Also: Dirichlet process, infinite models (Beal, Ghahramani, Neal, Rasmussen)
• Prior on assignments: one topic on each table, L visits per document, T is unbounded
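A sketch of that seating rule (illustrative Python; `counts[k]` holds the number of previous customers at table k, `gamma` is the concentration parameter):

```python
import numpy as np

def crp_choose_table(counts, gamma, rng):
    """Seat the next customer: existing table k with probability
    m_k / (m + gamma), a new table with probability gamma / (m + gamma),
    where m = sum(counts) is the number of customers already seated."""
    m = sum(counts)
    probs = np.array(counts + [gamma]) / (m + gamma)
    k = rng.choice(len(probs), p=probs)
    if k == len(counts):
        counts.append(1)                # open a new table
    else:
        counts[k] += 1
    return k
```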
Generating a document
1. Choose c by sampling L tables from the Chinese restaurant, without replacement
2. Choose θ(d) ~ Dirichlet(α) (over L slots)
3. For each word in the document:
   – choose z ~ Multinomial(θ(d))
   – choose w ~ Multinomial(φ(c(z)))
Inverting the generative model
• Draw z as before, but conditioned on c
• Draw each c one at a time from its conditional, p(c_i | c_−i, w, z) ∝ p(w | c, z) p(c_i | c_−i)
• Need only track occupied tables
• Recover topics, number of occupied tables
Model selection with the CRP
[Figure: Chinese restaurant process prior and Bayes factor]
Nested CRP
• Infinitely many infinite-table restaurants
• Every table has a card for another restaurant, forming an infinitely-branching tree
• An L-day vacation: visit the root restaurant the first night, go to the restaurant on the card the next night, and so on (see the sketch below)
• Once inside a restaurant, choose the table (and hence the next restaurant) via the standard CRP
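A sketch of drawing one document's L-level path through the infinite tree (illustrative Python; the `Node` class and names are assumptions, and each node runs its own CRP over child restaurants):

```python
import numpy as np

class Node:
    def __init__(self):
        self.children = []      # child restaurants (tables with cards)
        self.counts = []        # customers seated at each child's table

def ncrp_path(root, L, gamma, rng):
    """Walk L levels from the root; at each restaurant pick a table via the CRP."""
    path, node = [root], root
    for _ in range(L - 1):
        m = sum(node.counts)
        probs = np.array(node.counts + [gamma]) / (m + gamma)
        k = rng.choice(len(probs), p=probs)
        if k == len(node.children):
            node.children.append(Node())   # new table -> new restaurant below
            node.counts.append(0)
        node.counts[k] += 1
        node = node.children[k]
        path.append(node)
    return path
```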
The nested CRP as a prior
• One topic per restaurant; each document has one topic at each of the L levels of a tree
• Each c is a path through the tree
• Collecting these paths from all documents gives a finite subtree of used topics
• Allows unsupervised learning of hierarchies
• Extends Hofmann’s (1999) topic hierarchies
Generating a document
1. Choose c by sampling a path from the nested Chinese restaurant process
2. Choose θ(d) ~ Dirichlet(α) (over L slots)
3. For each word in the document:
   – choose z ~ Multinomial(θ(d))
   – choose w ~ Multinomial(φ(c(z)))
Inverting the generative model
• Draw z as before, but conditioned on c
• Draw each document's path c as a block from p(c | w, z) ∝ p(w | c, z) p(c)
• Need only track previously taken paths
• Recover topics, set of paths (finite subtree)
Twelve years of NIPS
Summary
• Letting document topics be a subset of corpus topics allows richer dependencies
• Using Gibbs sampling makes it possible to have an unbounded number of corpus topics
• The flat model and hierarchies are only two options of many: factorial structures, arbitrary graphs, etc.
Problems
• Finding the right number of topics   }
• No dependencies between topics       }  → CRP models (Blei, Jordan, Tenenbaum)
• The “bag of words” assumption   }
• Need for a stop list            }  → HMM syntax (Steyvers, Tenenbaum)
Syntax and semantics from statistics
[Graphical model: a hidden Markov chain of classes x generates each word; one designated class emits from the document-specific topics z]

Factorization of language based on statistical dependency patterns:
• semantics: probabilistic topics capture long-range, document-specific dependencies
• syntax: a probabilistic regular grammar captures short-range dependencies that are constant across all documents
(a generative sketch follows below)
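A sketch of the composite generative process under these assumptions (illustrative Python; class 0 plays the role of the “semantic” state, `A` is a class transition matrix, and all parameters are assumed given):

```python
import numpy as np

def generate(n_words, A, pi, class_dists, phi, theta, rng):
    """HMM over classes x; the semantic class emits from the document's topics.
    A[x, x']: class transition probabilities; pi: initial class distribution
    class_dists[x]: P(w | x) for syntactic classes x >= 1 (entry 0 unused)
    phi[z]: P(w | z); theta: this document's topic mixing weights"""
    words, x = [], rng.choice(len(pi), p=pi)
    for _ in range(n_words):
        if x == 0:                              # semantic class: use the topics
            z = rng.choice(len(theta), p=theta)
            w = rng.choice(phi.shape[1], p=phi[z])
        else:                                   # syntactic class: its own distribution
            w = rng.choice(len(class_dists[x]), p=class_dists[x])
        words.append(w)
        x = rng.choice(A.shape[1], p=A[x])      # transition to the next class
    return words
```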
Example: generating from the composite model

Semantic class (x = 1) mixes the document's topics:
  z = 1 (weight 0.4): HEART 0.2, LOVE 0.2, SOUL 0.2, TEARS 0.2, JOY 0.2
  z = 2 (weight 0.6): SCIENTIFIC 0.2, KNOWLEDGE 0.2, WORK 0.2, RESEARCH 0.2, MATHEMATICS 0.2
Syntactic classes:
  x = 2: OF 0.6, FOR 0.3, BETWEEN 0.1
  x = 3: THE 0.6, A 0.3, MANY 0.1
[Figure: class transition probabilities among x = 1, 2, 3 (0.9/0.1, 0.2/0.8, 0.7/0.3)]

Generated string, one word per step: THE → THE LOVE → THE LOVE OF → THE LOVE OF RESEARCH → …
Inverting the generative model
• Sample z conditioned on x and the other z
  – drawn from the prior if x > 1 (i.e. the word came from a syntactic class)
• Sample x conditioned on z and the other x
• Inference allows estimation of:
  – “semantic” topics
  – “syntactic” classes
Semantic topics (TASA):

• FOOD FOODS BODY NUTRIENTS DIET FAT SUGAR ENERGY MILK EATING FRUITS VEGETABLES WEIGHT FATS NEEDS CARBOHYDRATES VITAMINS CALORIES PROTEIN MINERALS
• MAP NORTH EARTH SOUTH POLE MAPS EQUATOR WEST LINES EAST AUSTRALIA GLOBE POLES HEMISPHERE LATITUDE PLACES LAND WORLD COMPASS CONTINENTS
• DOCTOR PATIENT HEALTH HOSPITAL MEDICAL CARE PATIENTS NURSE DOCTORS MEDICINE NURSING TREATMENT NURSES PHYSICIAN HOSPITALS DR SICK ASSISTANT EMERGENCY PRACTICE
• BOOK BOOKS READING INFORMATION LIBRARY REPORT PAGE TITLE SUBJECT PAGES GUIDE WORDS MATERIAL ARTICLE ARTICLES WORD FACTS AUTHOR REFERENCE NOTE
• GOLD IRON SILVER COPPER METAL METALS STEEL CLAY LEAD ADAM ORE ALUMINUM MINERAL MINE STONE MINERALS POT MINING MINERS TIN
• BEHAVIOR SELF INDIVIDUAL PERSONALITY RESPONSE SOCIAL EMOTIONAL LEARNING FEELINGS PSYCHOLOGISTS INDIVIDUALS PSYCHOLOGICAL EXPERIENCES ENVIRONMENT HUMAN RESPONSES BEHAVIORS ATTITUDES PSYCHOLOGY PERSON
• CELLS CELL ORGANISMS ALGAE BACTERIA MICROSCOPE MEMBRANE ORGANISM FOOD LIVING FUNGI MOLD MATERIALS NUCLEUS CELLED STRUCTURES MATERIAL STRUCTURE GREEN MOLDS
• PLANTS PLANT LEAVES SEEDS SOIL ROOTS FLOWERS WATER FOOD GREEN SEED STEMS FLOWER STEM LEAF ANIMALS ROOT POLLEN GROWING GROW
Syntactic classes (TASA):

• GOOD SMALL NEW IMPORTANT GREAT LITTLE LARGE * BIG LONG HIGH DIFFERENT SPECIAL OLD STRONG YOUNG COMMON WHITE SINGLE CERTAIN
• THE HIS THEIR YOUR HER ITS MY OUR THIS THESE A AN THAT NEW THOSE EACH MR ANY MRS ALL
• MORE SUCH LESS MUCH KNOWN JUST BETTER RATHER GREATER HIGHER LARGER LONGER FASTER EXACTLY SMALLER SOMETHING BIGGER FEWER LOWER ALMOST
• ON AT INTO FROM WITH THROUGH OVER AROUND AGAINST ACROSS UPON TOWARD UNDER ALONG NEAR BEHIND OFF ABOVE DOWN BEFORE
• SAID ASKED THOUGHT TOLD SAYS MEANS CALLED CRIED SHOWS ANSWERED TELLS REPLIED SHOUTED EXPLAINED LAUGHED MEANT WROTE SHOWED BELIEVED WHISPERED
• ONE SOME MANY TWO EACH ALL MOST ANY THREE THIS EVERY SEVERAL FOUR FIVE BOTH TEN SIX MUCH TWENTY EIGHT
• HE YOU THEY I SHE WE IT PEOPLE EVERYONE OTHERS SCIENTISTS SOMEONE WHO NOBODY ONE SOMETHING ANYONE EVERYBODY SOME THEN
• BE MAKE GET HAVE GO TAKE DO FIND USE SEE HELP KEEP GIVE LOOK COME WORK MOVE LIVE EAT BECOME
Bayes factors for different models
Part-of-speech tagging
NIPS Semantics (semantic topics):

• EXPERTS EXPERT GATING HME ARCHITECTURE MIXTURE LEARNING MIXTURES FUNCTION GATE
• DATA GAUSSIAN MIXTURE LIKELIHOOD POSTERIOR PRIOR DISTRIBUTION EM BAYESIAN PARAMETERS
• STATE POLICY VALUE FUNCTION ACTION REINFORCEMENT LEARNING CLASSES OPTIMAL *
• MEMBRANE SYNAPTIC CELL * CURRENT DENDRITIC POTENTIAL NEURON CONDUCTANCE CHANNELS
• IMAGE IMAGES OBJECT OBJECTS FEATURE RECOGNITION VIEWS # PIXEL VISUAL
• KERNEL SUPPORT VECTOR SVM KERNELS # SPACE FUNCTION MACHINES SET
• NETWORK NEURAL NETWORKS OUTPUT INPUT TRAINING INPUTS WEIGHTS # OUTPUTS

NIPS Syntax (syntactic classes):

• MODEL ALGORITHM SYSTEM CASE PROBLEM NETWORK METHOD APPROACH PAPER PROCESS
• IS WAS HAS BECOMES DENOTES BEING REMAINS REPRESENTS EXISTS SEEMS
• SEE SHOW NOTE CONSIDER ASSUME PRESENT NEED PROPOSE DESCRIBE SUGGEST
• USED TRAINED OBTAINED DESCRIBED GIVEN FOUND PRESENTED DEFINED GENERATED SHOWN
• IN WITH FOR ON FROM AT USING INTO OVER WITHIN
• HOWEVER ALSO THEN THUS THEREFORE FIRST HERE NOW HENCE FINALLY
• # * I X T N - C F P
Function and content words
Highlighting and templating
Open questions
• Are MCMC methods useful elsewhere?
  – “smoothing with negative weights”
  – Markov chains on grammars
• Other nonparametric language models?
  – infinite HMM, infinite PCFG, clustering
• Better ways of combining topics and syntax?
  – richer syntactic models
  – better combination schemes