TRANSCRIPT
![Page 1: Modeling Scientific Impact with Topical Influence Regression James Foulds Padhraic Smyth Department of Computer Science University of California, Irvine](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649de65503460f94ade73b/html5/thumbnails/1.jpg)
Modeling Scientific Impact with Topical Influence Regression
James Foulds Padhraic Smyth
Department of Computer Science, University of California, Irvine
Exploring a New Scientific Area
Which are the most important articles?
What are the influence relationships between articles?
Outline
• Background: Modeling scientific impact, topic models
• Metric: Topical Influence
• Model: Topical Influence Regression
• Inference Algorithm
• Experimental Results
Can’t We Just Use Citation Counts?
• Many citations are made out of “politeness, policy or piety” [Ziman, 1968].
Example: a citing article mentioned Article (A) in passing but built upon the ideas of Article (B). Which article is more influential?
Enter: Natural Language Processing
• Use NLP techniques to exploit textual information in conjunction with citation information
• Using this extra information, we should be able to gain a deeper understanding of scientific impact than simple citation counts provide
Previous Approaches
• Traditional Bibliometrics
– Citation counts, journal impact factors, H-Index
• Graph-based
– PageRank on the citation graph
– PageRank on an article similarity graph (Lin, 2008)
• Supervised Machine Learning
– Classifying citation function (Teufel et al., 2006)
• NLP / Topic Models
– Dietz et al. (2007), Gerrish & Blei (2010), Nallapati et al. (2011) …
Our Approach
• A metric arising from a generative probabilistic model for scientific corpora
• Fully unsupervised
• Exploits both textual content and the citation graph
• Recovers both node-level and edge-level influence scores
• A flexible, extensible regression framework
Latent Dirichlet Allocation Topic Models
• Topic models are a bag-of-words approach to modeling text corpora
• Topics are distributions over words
• Every document has a distribution over topics, with a Dirichlet prior
• Every word is assigned a latent topic, which it is assumed to be drawn from.
Latent Dirichlet Allocation and Polya Urns
• For each document
– Place colored balls in that document’s urn, where each color is associated with a topic, and α is the Dirichlet prior on the distribution over topics.
– For each word
  • Draw a ball from the urn, observe its color k
  • Draw the word token from topic k
  • Place the ball back, along with a new ball of the same color
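The urn procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `generate_document`, the toy `topics` dictionaries, and the choice to start each color with mass `alpha[k]` are all assumptions made for the example.

```python
import random

def generate_document(alpha, topics, num_words, rng):
    """Generate one document with the Polya urn scheme (illustrative sketch).

    alpha[k]  : assumed initial (possibly fractional) ball mass of color k
    topics[k] : dict mapping word -> probability under topic k (toy data)
    """
    urn = list(alpha)                      # current ball mass per color/topic
    words, assignments = [], []
    for _ in range(num_words):
        # Draw a ball with probability proportional to its color's mass.
        k = rng.choices(range(len(urn)), weights=urn)[0]
        # Draw the word token from topic k.
        vocab = list(topics[k])
        w = rng.choices(vocab, weights=[topics[k][v] for v in vocab])[0]
        # Place the ball back, along with one new ball of the same color.
        urn[k] += 1.0
        words.append(w)
        assignments.append(k)
    return words, assignments, urn

rng = random.Random(0)
topics = [{"parse": 0.7, "tree": 0.3}, {"neuron": 0.6, "spike": 0.4}]
words, z, urn = generate_document([0.5, 0.5], topics, 20, rng)
```

Because every draw adds a ball of the drawn color, topics that appear early in a document become more likely later on, which is the "rich get richer" behavior of the Polya urn.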
A New Metric: Topical Influence
• Intuition: the topical influence l(a) of article a is the extent to which it coerces the documents which cite it to have similar topics to it.
[Figure: Citations, Influence]
Topical Influence Regression
Terms in the model:
• Parameter vector for the Dirichlet prior on the distribution over topics of article a
• Set of articles that a cites
• Normalized histogram of topic counts
• The non-negative scalar topical influence weight for article a
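One way these four quantities might combine is sketched below: the prior for article a is built by adding, for each cited article, its normalized topic histogram scaled by its influence weight. The function names, the toy counts, and the baseline vector `base_alpha` are assumptions for illustration; the transcript does not give the exact equation.

```python
def normalized_topic_histogram(topic_counts):
    """Turn raw topic counts for one article into proportions."""
    total = sum(topic_counts)
    return [c / total for c in topic_counts]

def tir_prior(cited, influence, topic_counts, base_alpha):
    """Assumed form: alpha(a) = base_alpha + sum over cited a' of l(a') * nbar(a')."""
    alpha = list(base_alpha)
    for a_prime in cited:
        nbar = normalized_topic_histogram(topic_counts[a_prime])
        for k, p in enumerate(nbar):
            alpha[k] += influence[a_prime] * p
    return alpha

# Toy data (hypothetical): article c cites articles a and b.
topic_counts = {"a": [8, 2, 0], "b": [0, 5, 5]}
influence = {"a": 10.0, "b": 5.0}
alpha_c = tir_prior(["a", "b"], influence, topic_counts, [0.1, 0.1, 0.1])
```

With these toy numbers, article a contributes 10 units of prior mass spread as 80% and 20% over the first two topics, and article b contributes 5 units split evenly over the last two.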
Topical Influence
• Each article a has a collection of colored balls distributed according to its topic assignments
• It places copies of these balls into the urn for the prior of each article that cites it
• The topical influence weight l(a) specifies how many balls article a puts into each citing document’s urn (possibly fractional)
Example: l(a) = 5, l(b) = 5
Example with a larger weight for article a: l(a) = 10, l(b) = 5
Total Topical Influence
• Total topical influence T(a) is defined to be the total number of balls article a adds to the other articles’ urns
Example: l(a) = 10, l(b) = 5; T(a) = 20, T(b) = 10
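As a small worked example, if each article places l(a) balls into every citing article's urn, the total is l(a) times the number of citing articles. The citation structure below (each of a and b cited by two articles) is hypothetical, chosen so that the totals match the figures on the slide.

```python
def total_topical_influence(influence, cited_by):
    """Total balls each article adds across all citing articles' urns."""
    return {a: influence[a] * len(cited_by.get(a, ())) for a in influence}

influence = {"a": 10.0, "b": 5.0}
# Hypothetical citation structure: a and b are each cited by two articles.
cited_by = {"a": ["c", "d"], "b": ["d", "e"]}
totals = total_topical_influence(influence, cited_by)
# totals == {"a": 20.0, "b": 10.0}
```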
Topical Influence Regression for Edge-level Influence Weights
We can extend the model to handle differing influence weights on citation edges:
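A sketch of how such an edge-level prior might be written, under assumed notation that does not appear in the transcript: C(a) is the set of articles that a cites, \bar{n}^{(a')} is the normalized topic histogram of cited article a', l^{(a,a')} is a non-negative weight on the citation edge from a to a', and \alpha is a baseline Dirichlet parameter vector.

```latex
\alpha^{(a)} \;=\; \alpha \;+\; \sum_{a' \in C(a)} l^{(a,a')} \, \bar{n}^{(a')}
```

Under this form, the node-level model is the special case in which l^{(a,a')} takes the same value for every article a that cites a'.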
Inference
• Collapsed Gibbs sampler
• Interleave gradient updates for the influence variables (stochastic EM)
Inference – Collapsed Gibbs Sampler
Usual LDA update, but with topical influence prior
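A minimal sketch of what such an update could look like is given below. It uses the standard collapsed Gibbs conditional for LDA with a symmetric word smoothing parameter, and swaps the usual fixed document prior for an article-specific vector `alpha_doc`. The function name, argument layout, and toy counts are assumptions for illustration.

```python
def topic_conditional(doc_topic_counts, alpha_doc, topic_word_counts,
                      topic_totals, word, beta, vocab_size):
    """Normalized P(z = k | everything else) for one token.

    All counts are assumed to exclude the token being resampled.
    alpha_doc is the article-specific prior (here, from topical influence).
    """
    weights = []
    for k in range(len(alpha_doc)):
        # Document side: how much this article already favors topic k.
        doc_part = doc_topic_counts[k] + alpha_doc[k]
        # Word side: smoothed probability of the word under topic k.
        word_part = (topic_word_counts[k].get(word, 0) + beta) / (
            topic_totals[k] + vocab_size * beta)
        weights.append(doc_part * word_part)
    total = sum(weights)
    return [w / total for w in weights]

# Toy counts (hypothetical): two topics, vocabulary of two words.
probs = topic_conditional([2, 0], [1.0, 1.0], [{"parse": 3}, {}],
                          [3, 3], "parse", 1.0, 2)
```

A new topic for the token would then be drawn from `probs`, and the counts updated before moving to the next token.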
Likelihood for a Polya urn distribution.
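For reference, the log-likelihood of a set of topic counts under a Polya urn (Dirichlet-multinomial) with parameter vector alpha can be computed with log-gamma functions. The sketch below gives the probability of one particular sequence of draws, so it omits the multinomial coefficient; the function name is an assumption, and every entry of `alpha` is assumed to be strictly positive.

```python
import math

def polya_log_likelihood(counts, alpha):
    """log P(sequence of draws with these per-color counts | alpha)."""
    a_sum = sum(alpha)
    n = sum(counts)
    ll = math.lgamma(a_sum) - math.lgamma(a_sum + n)
    for n_k, a_k in zip(counts, alpha):
        ll += math.lgamma(n_k + a_k) - math.lgamma(a_k)
    return ll

# Two draws of the same color from an urn starting with one ball of each color:
# probability (1/2) * (2/3) = 1/3.
ll = polya_log_likelihood([2, 0], [1.0, 1.0])
```

Since the prior vector depends on the influence weights, a quantity of this kind is differentiable in those weights, which is what makes gradient updates on the influence variables possible.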
Experiments
• Two corpora of scientific articles were used
– ACL (1987-2011), 3286 articles
– NIPS (1987-1999), 1740 articles
– Only citations within the corpora were considered
• Model validation using metadata
• Held-out log-likelihood
• Qualitative analysis
Model Validation Using Metadata: Number of times the citation occurs in the text
Self citations
[Figure panels: ACL Corpus, NIPS Corpus]
Log-Likelihood on Held-Out Documents vs LDA
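| Model | ACL Wins | ACL Losses | ACL Average Improvement | NIPS Wins | NIPS Losses | NIPS Average Improvement |
|---|---|---|---|---|---|---|
| TIR | 297 | 33 | 65.7 | 150 | 20 | 38.2 |
| TIRE | 276 | 54 | 63.0 | 148 | 22 | 38.7 |
| DMR | 302 | 28 | 79.1 | 157 | 13 | 48.4 |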
Results: Most Influential ACL Articles
ACL Best Paper Award, 2005
Down to 5th place, from 1st by citation count
Results: Most Influential NIPS Articles
Down to 13th place, from 1st by citation count
Seminal papers
Results: Edge Influences, ACL
Articles in the figure:
• An Optimal-time Binarization Algorithm for Linear Context-Free Rewriting Systems with Fan-out Two. C. Gomez-Rodriguez, G. Satta.
• A Hierarchical Phrase-Based Model for Statistical Machine Translation. D. Chiang.
• Discriminative Training and Maximum Entropy Models for Statistical Machine Translation. F. Och and H. Ney.
• BLEU: a Method for Automatic Evaluation of Machine Translation. K. Papineni, S. Roukos, T. Ward, W. Zhu.
• Toward Smaller, Faster, and Better Hierarchical Phrase-based SMT. M. Yang, J. Zheng.
Edge influence weights in the figure: 1.48, 0.00, 2.54, 0.60
Figure annotations: Related SMT paper; BLEU evaluation technique; Builds upon the method; Not related
Results: Edge Influences, NIPS
Articles in the figure:
• Multi-time Models for Temporally Abstract Planning. D. Precup, R. Sutton.
• Feudal Reinforcement Learning. P. Dayan, G. Hinton.
• Memory-based Reinforcement Learning: Efficient Computation with Prioritized Sweeping. A. Moore, C. Atkeson.
• A Delay-Line Based Motion Detection Chip. T. Horiuchi, J. Lazzaro, A. Moore, C. Koch.
• The Parti-Game Algorithm for Variable Resolution Reinforcement Learning in Multidimensional State-Spaces. A. Moore.
Edge influence weights in the figure: 5.47, 0.00, 3.36, 1.71
Figure annotations: Irrelevant; Less relevant
Conclusions / Future Work
• Topical Influence is a quantitative measure of scientific impact which exploits the content of the articles as well as the citation graph
• Topical Influence Regression can be used to infer topical influence, per article and per citation edge
• Future work
– Authors, journals
– Citation context
– Temporal dynamics
– Application to social media
– Other dimensions of scientific importance
Thanks!
• Questions?