![Page 1: TopicRank - Graph-Based Topic Ranking for Keyphrase Extractionadrien-bougouin.github.io/publications/2013/topicrank_ijcnlp... · TopicRank Graph-Based Topic Ranking for Keyphrase](https://reader031.vdocuments.us/reader031/viewer/2022022520/5b1cc85b7f8b9a952f8b5606/html5/thumbnails/1.jpg)
TopicRank: Graph-Based Topic Ranking for Keyphrase Extraction
Adrien Bougouin Florian Boudin Béatrice Daille
Université de Nantes, LINA, France
16 October 2013
Introduction: Problem statement
Keyphrases
• Single words or multi-word expressions
• Overview of a document’s content
Applications
• Document indexing
• Document clustering
• Text summarization
• Query expansion
• Targeted advertising
• etc.
Lack of annotated documents
Many documents have no associated keyphrases.
Introduction: Automatic keyphrase extraction
document → Linguistic Preprocessing → Candidate Extraction → Candidate Classification (supervised) or Ranking (unsupervised) → Keyphrase Selection → keyphrases
Introduction: Example
Project Euclid and the role of research libraries in scholarly publishing
Project Euclid, a joint electronic journal publishing initiative of Cornell University Library and Duke University Press is discussed in the broader contexts of the changing patterns of scholarly communication and the publishing scene of mathematics. Specific aspects of the project such as partnerships and the creation of an economic model are presented as well as what it takes to be a publisher. Libraries have gained important and relevant experience through the creation and management of digital libraries, but they need to develop further skills if they want to adopt a new role in the life cycle of scholarly communication.
Related Work: Unsupervised methods
Mostly ranking techniques using:
• language models (Tomokiyo and Hurst, 2003)
• clusters (Liu et al., 2009)
• or graphs of word co-occurrences (Mihalcea and Tarau, 2004, TextRank)
  ◮ weighted with co-occurrence number or semantic measure (Wan and Xiao, 2008; Tsatsaronis et al., 2010)
  ◮ refined with similar documents (Wan and Xiao, 2008)
  ◮ biased with topic probabilities (Liu et al., 2010)
Related Work: Graph-based approach: TextRank
[Figure: TextRank word co-occurrence graph built from the example abstract. Node scores: university 2.378, publishing 2.121, libraries 1.459, project 1.459, electronic 1.163, scholarly 1.140, journal 1.095, further 1.000, skills 1.000, new 1.000, role 1.000, relevant 1.000, experience 1.000, model 1.000, economic 1.000, specific 1.000, aspects 1.000, life 1.000, cycle 1.000, digital 0.770, research 0.770, euclid 0.770, such 0.770, duke 0.655, library 0.655, press 0.655, cornell 0.655, joint 0.644, communication 0.634, scene 0.601, initiative 0.601]
Generated keyphrases
• electronic journal publishing
• scholarly publishing
• libraries
• university
• project
• economic
• relevant
• role
PageRank’s “voting” concept
High-scoring words contribute more to the score of their connected words.
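The "voting" idea can be illustrated with a minimal sketch, assuming an unweighted, undirected co-occurrence graph and a damping factor of 0.85 (both are assumptions; the slides state neither the window size nor the damping value). The function names `cooccurrence_graph` and `textrank` are hypothetical.

```python
from collections import defaultdict


def cooccurrence_graph(words, window=2):
    """Link every word to the words that follow it within `window` tokens."""
    graph = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1:i + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def textrank(graph, damping=0.85, iterations=50):
    """Repeat the PageRank update: each word shares its score among its neighbours."""
    scores = {node: 1.0 for node in graph}
    for _ in range(iterations):
        scores = {
            node: (1 - damping) + damping * sum(
                scores[neighbour] / len(graph[neighbour])
                for neighbour in graph[node]
            )
            for node in graph
        }
    return scores


# A five-word fragment of the example abstract (illustrative input only).
words = "joint electronic journal publishing initiative".split()
scores = textrank(cooccurrence_graph(words))
for word, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{word}\t{score:.3f}")
```

With a window of 2, only adjacent words are linked, so the words in the middle of the fragment receive votes from two neighbours and end up above the two words at its ends.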
Related Work: Graph-based approach: TextRank
Limitations
• Word nodes
• Co-occurrence window
• Several nodes for one topic
This Work
Limitations of previous work
• Word nodes
• Co-occurrence window
• Several nodes for one topic
Proposal
1 Topic nodes
2 Complete graph construction
Plan
1 TopicRank
2 Evaluation
3 Conclusion and Future Work
TopicRank
1 Candidate extraction
  ⇒ (NOUN|ADJ)+
  no linguistic knowledge
2 Candidate clustering
3 Graph construction
4 Topic ranking
5 Keyphrase selection
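The (NOUN|ADJ)+ pattern of step 1 can be sketched as a single pass over part-of-speech tagged tokens. This is a minimal illustration, assuming the tagger output is a list of (word, tag) pairs with coarse tags `NOUN` and `ADJ`; the tags below are assigned by hand, proper nouns are folded into `NOUN`, and `extract_candidates` is a hypothetical name.

```python
def extract_candidates(tagged_tokens):
    """Return the maximal runs of tokens tagged NOUN or ADJ, i.e. (NOUN|ADJ)+."""
    candidates, current = [], []
    for word, tag in tagged_tokens:
        if tag in {"NOUN", "ADJ"}:
            current.append(word.lower())
        elif current:
            # Any other tag closes the current run.
            candidates.append(" ".join(current))
            current = []
    if current:
        candidates.append(" ".join(current))
    return candidates


# Hand-tagged fragment of the example abstract (tags are an assumption).
tagged = [
    ("a", "DET"), ("joint", "ADJ"), ("electronic", "ADJ"),
    ("journal", "NOUN"), ("publishing", "NOUN"), ("initiative", "NOUN"),
    ("of", "ADP"), ("Cornell", "NOUN"), ("University", "NOUN"),
    ("Library", "NOUN"),
]
candidates = extract_candidates(tagged)
print(candidates)
```

On this fragment the sketch yields the two candidates "joint electronic journal publishing initiative" and "cornell university library", both of which appear in the topic list of the running example.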
TopicRank
2 Candidate clustering
  ⇒ Hierarchical clustering
  Naive topic similarity: candidates are grouped when their stem overlap ≥ 1/4 (e.g. C04: role; new role)

| ID | Topic |
| --- | --- |
| C01 | cornell university library; digital libraries; research libraries; libraries |
| C02 | project euclid; project such |
| C03 | publishing scene; scholarly publishing; publisher |
| C04 | role; new role |
| C05 | important |
| C06 | scholarly communication |
| C07 | further skills |
| C08 | partnerships |
| C09 | mathematics |
| C10 | joint electronic journal publishing initiative |
| C11 | contexts |
| C12 | specific aspects |
| C13 | economic model |
| C14 | duke university press |
| C15 | relevant experience |
| C16 | creation |
| C17 | life cycle |
| C18 | patterns |
| C19 | management |
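The clustering step can be sketched as agglomerative clustering over a stem-overlap similarity. This is a minimal sketch under several assumptions that the slides do not state: overlap is measured as the Jaccard ratio of the two stem sets, clusters are merged with average linkage, and `crude_stem` is a toy stand-in for a real stemmer such as Porter's. All function names are hypothetical.

```python
from itertools import combinations


def crude_stem(word):
    """Toy stemmer (assumption): only handles plural endings."""
    for suffix, replacement in (("ies", "y"), ("s", "")):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)] + replacement
    return word


def stem_overlap(a, b):
    """Jaccard ratio between the stem sets of two candidates (assumption)."""
    stems_a = {crude_stem(w) for w in a.split()}
    stems_b = {crude_stem(w) for w in b.split()}
    return len(stems_a & stems_b) / len(stems_a | stems_b)


def cluster_candidates(candidates, threshold=0.25):
    """Merge the two most similar clusters until none reaches the threshold."""
    clusters = [[c] for c in candidates]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for i, j in combinations(range(len(clusters)), 2):
            sims = [stem_overlap(a, b) for a in clusters[i] for b in clusters[j]]
            average = sum(sims) / len(sims)  # average linkage
            if average > best:
                best, pair = average, (i, j)
        if best < threshold:
            break
        i, j = pair  # j > i, so popping j leaves index i valid
        clusters[i] += clusters.pop(j)
    return clusters


candidates = [
    "cornell university library", "digital libraries", "research libraries",
    "libraries", "role", "new role", "mathematics",
]
clusters = cluster_candidates(candidates)
for topic in clusters:
    print("; ".join(topic))
```

On these seven candidates the sketch groups the four library-related candidates together, pairs "role" with "new role", and leaves "mathematics" alone, which matches topics C01, C04 and C09 in the table.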
TopicRank
3 Graph construction
  ⇒ Complete graph
[Figure: complete graph over the 19 topics C01 to C19, with edges weighted by offset position]
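A complete topic graph with offset-based weights can be sketched as follows. The slide only says "offset position weighting"; the reciprocal-distance formula used here (sum of 1/|p_a − p_b| over every pair of occurrence offsets) is an assumption taken from the description in the TopicRank paper, and the offsets are made up for illustration. `edge_weight` and `build_complete_graph` are hypothetical names.

```python
from itertools import combinations


def edge_weight(positions_a, positions_b):
    """Sum of reciprocal distances between every pair of occurrence offsets."""
    return sum(
        1.0 / abs(pa - pb)
        for pa in positions_a
        for pb in positions_b
    )


def build_complete_graph(topic_positions):
    """Connect every pair of topics; closer occurrences give heavier edges."""
    weights = {}
    for a, b in combinations(topic_positions, 2):
        weight = edge_weight(topic_positions[a], topic_positions[b])
        weights[(a, b)] = weights[(b, a)] = weight
    return weights


# Hypothetical word offsets of each topic's candidates in the document.
topic_positions = {"C01": [8, 40], "C02": [0, 20], "C04": [4, 60]}
weights = build_complete_graph(topic_positions)
for (a, b), weight in sorted(weights.items()):
    print(a, b, round(weight, 4))
```

Because every pair of topics gets an edge, no co-occurrence window has to be chosen; the distance between occurrences is absorbed into the edge weight instead.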
TopicRank
4 Topic ranking
  ⇒ PageRank’s scoring

$$\mathrm{score}(C_i) = (1 - \lambda) + \lambda \times \sum_{C_j \neq C_i} \frac{\mathrm{weight}(C_j, C_i) \times \mathrm{score}(C_j)}{\sum_{C_k \neq C_j} \mathrm{weight}(C_j, C_k)}$$

[Figure: complete topic graph, each node labeled with its score]

| Topic | Score |
| --- | --- |
| C01 | 2.673 |
| C02 | 2.237 |
| C03 | 2.285 |
| C04 | 1.451 |
| C05 | 0.612 |
| C06 | 1.017 |
| C07 | 0.405 |
| C08 | 0.717 |
| C09 | 0.600 |
| C10 | 0.749 |
| C11 | 0.600 |
| C12 | 0.750 |
| C13 | 0.575 |
| C14 | 0.669 |
| C15 | 0.615 |
| C16 | 1.112 |
| C17 | 0.455 |
| C18 | 0.697 |
| C19 | 0.600 |
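The scoring formula can be iterated directly. The sketch below assumes λ = 0.85, a fixed number of iterations, and a weight table keyed by ordered topic pairs; none of these details are given on the slide, and the three-topic graph is a made-up example. `rank_topics` is a hypothetical name.

```python
def rank_topics(weights, topics, damping=0.85, iterations=100):
    """Iterate score(Ci) = (1 - d) + d * sum_j weight(Cj, Ci) * score(Cj) / out(Cj)."""
    # out(Cj): total weight of the edges leaving topic Cj.
    out_weight = {
        j: sum(weights[(j, k)] for k in topics if k != j) for j in topics
    }
    scores = {t: 1.0 for t in topics}
    for _ in range(iterations):
        scores = {
            i: (1 - damping) + damping * sum(
                weights[(j, i)] * scores[j] / out_weight[j]
                for j in topics if j != i
            )
            for i in topics
        }
    return scores


# Hypothetical symmetric weights on a complete graph of three topics.
pairs = {("C01", "C02"): 2.0, ("C01", "C03"): 2.0, ("C02", "C03"): 1.0}
weights = {}
for (a, b), weight in pairs.items():
    weights[(a, b)] = weights[(b, a)] = weight

topics = ["C01", "C02", "C03"]
scores = rank_topics(weights, topics)
for topic in sorted(topics, key=lambda t: -scores[t]):
    print(topic, round(scores[topic], 3))
```

In this toy graph C01 carries the heaviest edges, so it collects the largest share of the votes and ranks first, while C02 and C03 tie.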
TopicRank
5 Keyphrase selection
  ⇒ First appearing one

| Rank | ID | Topic |
| --- | --- | --- |
| 01 | C01 | cornell university library; digital libraries; research libraries; libraries |
| 02 | C03 | publishing scene; scholarly publishing; publisher |
| 03 | C02 | project euclid; project such |
| 04 | C04 | role; new role |
| 05 | C16 | creation |
| 06 | C06 | scholarly communication |
| 07 | C09 | mathematics |
| 08 | C12 | specific aspects |
| 09 | C10 | joint electronic journal publishing initiative |
| 10 | C08 | partnerships |
| … | … | … |
10
![Page 36: TopicRank - Graph-Based Topic Ranking for Keyphrase Extractionadrien-bougouin.github.io/publications/2013/topicrank_ijcnlp... · TopicRank Graph-Based Topic Ranking for Keyphrase](https://reader031.vdocuments.us/reader031/viewer/2022022520/5b1cc85b7f8b9a952f8b5606/html5/thumbnails/36.jpg)
TopicRank
1 Candidate extraction
2 Candidate clustering
3 Graph construction
4 Topic ranking
5 Keyphrase selection
⇒ First appearing one
Project Euclid and the role ofresearch libraries in scholarlypublishing
Project Euclid, a joint elec-tronic journal publishing initia-tive of Cornell University Libraryand Duke University Press is dis-cussed in the broader contexts ofthe changing patterns of scholarlycommunication and the publishingscene of mathematics. [. . . ] Li-braries have gained important andrelevant experience through thecreation and management of digi-tal libraries, but they need to de-velop further skills if they want toadopt a new role in the life cycleof scholarly communication.
Rank ID Topic
01 C01 cornell university library; digital libraries;research libraries; libraries
02 C03 publishing scene; scholarly publishing;publisher
03 C02 project euclid; project such04 C04 role; new role05 C16 creation06 C06 scholarly communication07 C09 mathematics08 C12 specific aspects09 C10 joint electronic journal publishing initiative10 C08 partnerships. . . . . .
10
![Page 37: TopicRank - Graph-Based Topic Ranking for Keyphrase Extractionadrien-bougouin.github.io/publications/2013/topicrank_ijcnlp... · TopicRank Graph-Based Topic Ranking for Keyphrase](https://reader031.vdocuments.us/reader031/viewer/2022022520/5b1cc85b7f8b9a952f8b5606/html5/thumbnails/37.jpg)
TopicRank
1 Candidate extraction
2 Candidate clustering
3 Graph construction
4 Topic ranking
5 Keyphrase selection
⇒ First appearing one
Rank ID Topic
01 C01 cornell university library; digital libraries;research libraries; libraries
02 C03 publishing scene; scholarly publishing;publisher
03 C02 project euclid; project such04 C04 role; new role05 C16 creation06 C06 scholarly communication07 C09 mathematics08 C12 specific aspects09 C10 joint electronic journal publishing initiative10 C08 partnerships. . . . . .
10
![Page 38: TopicRank - Graph-Based Topic Ranking for Keyphrase Extractionadrien-bougouin.github.io/publications/2013/topicrank_ijcnlp... · TopicRank Graph-Based Topic Ranking for Keyphrase](https://reader031.vdocuments.us/reader031/viewer/2022022520/5b1cc85b7f8b9a952f8b5606/html5/thumbnails/38.jpg)
TopicRank
1 Candidate extraction
2 Candidate clustering
3 Graph construction
4 Topic ranking
5 Keyphrase selection
⇒ First appearing one
Project Euclid and the role ofresearch libraries in scholarlypublishing
Project Euclid, a joint elec-tronic journal publishing initia-tive of Cornell University Libraryand Duke University Press is dis-cussed in the broader contexts ofthe changing patterns of scholarlycommunication and the publish-ing scene of mathematics. Spe-cific aspects of the project suchas partnerships and the creation ofan economic model are presentedas well as what it takes to be apublisher. [. . . ]
Rank ID Topic
01 C01 cornell university library; digital libraries;research libraries; libraries
02 C03 publishing scene; scholarly publishing;publisher
03 C02 project euclid; project such04 C04 role; new role05 C16 creation06 C06 scholarly communication07 C09 mathematics08 C12 specific aspects09 C10 joint electronic journal publishing initiative10 C08 partnerships. . . . . .
10
![Page 39: TopicRank - Graph-Based Topic Ranking for Keyphrase Extractionadrien-bougouin.github.io/publications/2013/topicrank_ijcnlp... · TopicRank Graph-Based Topic Ranking for Keyphrase](https://reader031.vdocuments.us/reader031/viewer/2022022520/5b1cc85b7f8b9a952f8b5606/html5/thumbnails/39.jpg)
TopicRank
1 Candidate extraction
2 Candidate clustering
3 Graph construction
4 Topic ranking
5 Keyphrase selection
⇒ First appearing one
Rank ID Topic
01 C01 cornell university library; digital libraries;research libraries; libraries
02 C03 publishing scene; scholarly publishing;publisher
03 C02 project euclid; project such04 C04 role; new role05 C16 creation06 C06 scholarly communication07 C09 mathematics08 C12 specific aspects09 C10 joint electronic journal publishing initiative10 C08 partnerships. . . . . .
10
![Page 40: TopicRank - Graph-Based Topic Ranking for Keyphrase Extractionadrien-bougouin.github.io/publications/2013/topicrank_ijcnlp... · TopicRank Graph-Based Topic Ranking for Keyphrase](https://reader031.vdocuments.us/reader031/viewer/2022022520/5b1cc85b7f8b9a952f8b5606/html5/thumbnails/40.jpg)
TopicRank
1 Candidate extraction
2 Candidate clustering
3 Graph construction
4 Topic ranking
5 Keyphrase selection
⇒ First appearing one
Project Euclid and the role ofresearch libraries in scholarlypublishing
[. . . ] Specific aspects of theproject such as partnerships andthe creation of an economic modelare presented as well as what ittakes to be a publisher. [. . . ]
[...] Libraries have gained important and relevant experience through the creation and management of digital libraries, but they need to develop further skills if they want to adopt a new role in the life cycle of scholarly communication.
Keyphrase
- research libraries
- scholarly publishing
- project euclid
- role
- creation
- scholarly communication
- mathematics
- specific aspects
- joint electronic journal publishing initiative
- partnerships
- ...
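
The "first appearing one" selection step can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function name `select_keyphrases`, the `first_offsets` mapping, and every offset value below are made up for the example; only the three topics come from the slide.

```python
def select_keyphrases(ranked_topics, first_offsets, top_n=10):
    """For each of the top_n ranked topics, keep the candidate that
    appears first in the document (smallest first offset)."""
    keyphrases = []
    for topic in ranked_topics[:top_n]:
        keyphrases.append(min(topic, key=lambda cand: first_offsets[cand]))
    return keyphrases


# Topics in ranked order, as listed on the slide (first three only).
ranked_topics = [
    ["cornell university library", "digital libraries",
     "research libraries", "libraries"],
    ["publishing scene", "scholarly publishing", "publisher"],
    ["project euclid", "project such"],
]

# Hypothetical word offsets of each candidate's first occurrence.
first_offsets = {
    "project euclid": 0, "research libraries": 6, "scholarly publishing": 9,
    "publishing scene": 30, "cornell university library": 40,
    "project such": 50, "libraries": 60, "digital libraries": 75,
    "publisher": 90,
}

selected = select_keyphrases(ranked_topics, first_offsets)
print(selected)
```

With these assumed offsets, the candidates that occur in the title are selected, which mirrors the keyphrase list on the slide.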
![Page 47: TopicRank - Graph-Based Topic Ranking for Keyphrase Extractionadrien-bougouin.github.io/publications/2013/topicrank_ijcnlp... · TopicRank Graph-Based Topic Ranking for Keyphrase](https://reader031.vdocuments.us/reader031/viewer/2022022520/5b1cc85b7f8b9a952f8b5606/html5/thumbnails/47.jpg)
Plan
1 TopicRank
2 Evaluation
3 Conclusion and Future Work
![Page 48: TopicRank - Graph-Based Topic Ranking for Keyphrase Extractionadrien-bougouin.github.io/publications/2013/topicrank_ijcnlp... · TopicRank Graph-Based Topic Ranking for Keyphrase](https://reader031.vdocuments.us/reader031/viewer/2022022520/5b1cc85b7f8b9a952f8b5606/html5/thumbnails/48.jpg)
Evaluation: Datasets
Two English datasets:
- Inspec contains 500 abstracts of journal papers
  - 136.3 tokens/document
- SemEval (2010) contains 100 scientific papers
  - 5179.6 tokens/document
Two French datasets:
- WikiNews contains 100 news articles
  - 309.6 tokens/document
- DEFT (2012) contains 93 scientific papers
  - 6844.0 tokens/document
![Page 53: TopicRank - Graph-Based Topic Ranking for Keyphrase Extractionadrien-bougouin.github.io/publications/2013/topicrank_ijcnlp... · TopicRank Graph-Based Topic Ranking for Keyphrase](https://reader031.vdocuments.us/reader031/viewer/2022022520/5b1cc85b7f8b9a952f8b5606/html5/thumbnails/53.jpg)
Evaluation: Baselines
- TF-IDF weighting
- TextRank
  - Word co-occurrence graph with a window of 2
  - Keyphrases generated from the top-ranked keywords (10 best)
- SingleRank
  - Word co-occurrence graph with a window of 10
  - Candidate keyphrases scored by the sum of their words' scores
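
To make the two graph-based baselines more concrete, here is a rough Python sketch of a windowed word co-occurrence graph and of scoring a candidate by summing its words' scores. The function names are hypothetical, the word scores are invented, and part-of-speech filtering and the actual ranking of words are left out.

```python
from collections import defaultdict


def cooccurrence_graph(words, window=2):
    """Undirected co-occurrence counts: two words are linked each time
    they occur within `window` consecutive words of each other."""
    graph = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            other = words[j]
            if other == word:
                continue  # no self-loops
            graph[word][other] += 1
            graph[other][word] += 1
    return {w: dict(neighbours) for w, neighbours in graph.items()}


def score_candidate(candidate, word_scores):
    """SingleRank-style candidate score: sum of the scores of its words."""
    return sum(word_scores.get(w, 0.0) for w in candidate.split())


tokens = "a b c a b".split()
narrow = cooccurrence_graph(tokens, window=2)   # adjacent words only
wide = cooccurrence_graph(tokens, window=3)     # words up to 2 positions apart

# Invented word scores, as they might come out of a graph ranking step.
word_scores = {"digital": 0.4, "libraries": 0.9}
candidate_score = score_candidate("digital libraries", word_scores)
```

A window of 2 only links adjacent words, while a window of 10 (as in SingleRank) also links words that are several positions apart, which gives a denser graph.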
![Page 56: TopicRank - Graph-Based Topic Ranking for Keyphrase Extractionadrien-bougouin.github.io/publications/2013/topicrank_ijcnlp... · TopicRank Graph-Based Topic Ranking for Keyphrase](https://reader031.vdocuments.us/reader031/viewer/2022022520/5b1cc85b7f8b9a952f8b5606/html5/thumbnails/56.jpg)
Evaluation: Measures
- Cut-off at 10 keyphrases
- F-score ⇒ compromise between precision and recall

$$\text{f-score} = \frac{(1 + \beta^2) \times \text{precision} \times \text{recall}}{(\beta^2 \times \text{precision}) + \text{recall}}, \qquad \beta = 1$$

- Problem of matching against the gold standard
  ⇒ Comparison of stemmed forms
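
The measure can be sketched as follows in Python. This is a minimal illustration under stated assumptions: the function name `evaluate` and the example keyphrases are invented, and `normalize` defaults to simple lowercasing as a stand-in for the stemming step mentioned on the slide.

```python
def evaluate(extracted, gold, cutoff=10, beta=1.0, normalize=str.lower):
    """Precision, recall and F-score of the top `cutoff` extracted
    keyphrases against the gold standard, after normalization.
    `normalize` is a placeholder for a real stemmer."""
    top = {normalize(k) for k in extracted[:cutoff]}
    reference = {normalize(k) for k in gold}
    matches = len(top & reference)
    precision = matches / len(top) if top else 0.0
    recall = matches / len(reference) if reference else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    f_score = ((1 + beta ** 2) * precision * recall
               / (beta ** 2 * precision + recall))
    return precision, recall, f_score


# Invented example: 2 of the 4 extracted keyphrases are in the gold standard.
extracted = ["Research Libraries", "project euclid", "role", "creation"]
gold = ["research libraries", "project euclid", "scholarly publishing"]
precision, recall, f_score = evaluate(extracted, gold)
```

With β = 1 the formula reduces to the harmonic mean of precision and recall; here precision is 2/4, recall is 2/3, and the F-score is 4/7.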
![Page 57: TopicRank - Graph-Based Topic Ranking for Keyphrase Extractionadrien-bougouin.github.io/publications/2013/topicrank_ijcnlp... · TopicRank Graph-Based Topic Ranking for Keyphrase](https://reader031.vdocuments.us/reader031/viewer/2022022520/5b1cc85b7f8b9a952f8b5606/html5/thumbnails/57.jpg)
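Evaluation: Main results
F-score at 10 keyphrases:

| Method | Inspec | SemEval | WikiNews | DEFT |
|---|---|---|---|---|
| TF-IDF | 33.4 | 10.5 | 34.3 | 13.2 |
| TextRank | 12.7 | 5.6 | 8.6 | 5.7 |
| SingleRank | 35.2 | 3.7 | 19.7 | 5.9 |
| TopicRank | 27.9 | 12.1 | 35.6 | 15.1 |

- Improvement over TF-IDF on SemEval, WikiNews and DEFT
- Significant improvement over the graph-based methods (TextRank, SingleRank)
- Performance loss on Inspec relative to TF-IDF and SingleRank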
![Page 58: TopicRank - Graph-Based Topic Ranking for Keyphrase Extractionadrien-bougouin.github.io/publications/2013/topicrank_ijcnlp... · TopicRank Graph-Based Topic Ranking for Keyphrase](https://reader031.vdocuments.us/reader031/viewer/2022022520/5b1cc85b7f8b9a952f8b5606/html5/thumbnails/58.jpg)
Evaluation: Individual contributions

| Method | Inspec | SemEval | WikiNews | DEFT |
|---|---|---|---|---|
| SingleRank | 35.2 | 3.7 | 19.7 | 5.9 |
| +phrases | 22.1 | 8.0 | 28.9 | 13.5 |
| +topics | 26.8 | 11.9 | 31.4 | 14.8 |
| +complete | 35.5 | 4.4 | 20.3 | 5.8 |
| TopicRank | 27.9 | 12.1 | 35.6 | 15.1 |

- Nodes: topics > candidates > words
- Complete graph ≥ co-occurrence graph
- The contributions improve performance
- The above statements do not hold on Inspec
![Page 59: TopicRank - Graph-Based Topic Ranking for Keyphrase Extractionadrien-bougouin.github.io/publications/2013/topicrank_ijcnlp... · TopicRank Graph-Based Topic Ranking for Keyphrase](https://reader031.vdocuments.us/reader031/viewer/2022022520/5b1cc85b7f8b9a952f8b5606/html5/thumbnails/59.jpg)
Evaluation: Keyphrase selection

| Keyphrase selection | Inspec | SemEval | WikiNews | DEFT |
|---|---|---|---|---|
| First position | 27.9 | 12.1 | 35.6 | 15.1 |
| Frequency | 26.8 | 1.4 | 26.2 | 2.5 |
| Centroid | 24.7 | 1.5 | 28.5 | 3.4 |
| Upper bound | 35.6 | 30.3 | 42.9 | 19.3 |

- Still room for improvement
![Page 61: TopicRank - Graph-Based Topic Ranking for Keyphrase Extractionadrien-bougouin.github.io/publications/2013/topicrank_ijcnlp... · TopicRank Graph-Based Topic Ranking for Keyphrase](https://reader031.vdocuments.us/reader031/viewer/2022022520/5b1cc85b7f8b9a952f8b5606/html5/thumbnails/61.jpg)
Conclusion and Future Work
What we have done:
- Proposed TopicRank
- Topic ranking instead of word ranking
- Complete graph
- Experiments conducted on four standard datasets
- Good results
- Promising upper bound results
Still to do:
- Experiment with various topic identification methods
- Provide a keyphrase selection strategy that gets closer to the upper bound
![Page 63: TopicRank - Graph-Based Topic Ranking for Keyphrase Extractionadrien-bougouin.github.io/publications/2013/topicrank_ijcnlp... · TopicRank Graph-Based Topic Ranking for Keyphrase](https://reader031.vdocuments.us/reader031/viewer/2022022520/5b1cc85b7f8b9a952f8b5606/html5/thumbnails/63.jpg)
Thank you
![Page 64: TopicRank - Graph-Based Topic Ranking for Keyphrase Extractionadrien-bougouin.github.io/publications/2013/topicrank_ijcnlp... · TopicRank Graph-Based Topic Ranking for Keyphrase](https://reader031.vdocuments.us/reader031/viewer/2022022520/5b1cc85b7f8b9a952f8b5606/html5/thumbnails/64.jpg)
Backups: Candidate Extraction
- Focusing on nouns and adjectives is “enough” for English
- Prepositions and determiners should also be considered for French

| Statistic | SemEval | DEFT |
|---|---|---|
| Containing nouns | 95.9% | 79.3% |
| Containing proper nouns | 5.8% | 16.8% |
| Containing adjectives | 40.5% | 28.8% |
| Containing verbs | 3.4% | 0.5% |
| Containing adverbs | 0.6% | 0.5% |
| Containing prepositions | 1.2% | 12.7% |
| Containing determiners | 0.0% | 8.1% |
| Containing others | 2.1% | 5.8% |

![Page 65: TopicRank - Graph-Based Topic Ranking for Keyphrase Extractionadrien-bougouin.github.io/publications/2013/topicrank_ijcnlp... · TopicRank Graph-Based Topic Ranking for Keyphrase](https://reader031.vdocuments.us/reader031/viewer/2022022520/5b1cc85b7f8b9a952f8b5606/html5/thumbnails/65.jpg)
Backups: Candidate Clustering
Hierarchical clustering is an iterative algorithm:
- Initial state: each candidate keyphrase is its own cluster
- The clusters with the highest similarity are merged together
- The similarity between two clusters is the average similarity between their candidates c_i:

$$\text{similarity}(c_1, c_2) = \frac{\|\text{stem}(c_1) \cap \text{stem}(c_2)\|}{\|\text{stem}(c_1) \cup \text{stem}(c_2)\|}$$

- The similarity threshold is set to 0.25
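
The clustering steps above can be sketched in Python as follows. This is a minimal sketch under several assumptions, not the authors' code: `crude_stem` is a toy stand-in for a real stemmer, the function names are invented, and merging is assumed to stop once the best average similarity falls below the threshold.

```python
from itertools import combinations


def crude_stem(word):
    """Toy stemmer (assumption): lowercase and strip simple plural endings."""
    w = word.lower()
    if w.endswith("ies"):
        return w[:-3] + "y"
    if w.endswith("s") and not w.endswith("ss"):
        return w[:-1]
    return w


def similarity(c1, c2):
    """Jaccard overlap between the stem sets of two candidates."""
    s1 = {crude_stem(w) for w in c1.split()}
    s2 = {crude_stem(w) for w in c2.split()}
    return len(s1 & s2) / len(s1 | s2)


def cluster_similarity(a, b):
    """Average similarity over all candidate pairs of two clusters."""
    return sum(similarity(x, y) for x in a for y in b) / (len(a) * len(b))


def cluster_candidates(candidates, threshold=0.25):
    """Repeatedly merge the two most similar clusters until no pair
    reaches the threshold."""
    clusters = [[c] for c in candidates]
    while len(clusters) > 1:
        (i, j), best = max(
            (((i, j), cluster_similarity(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        clusters[i].extend(clusters[j])  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


topics = cluster_candidates(
    ["digital libraries", "research libraries",
     "scholarly publishing", "mathematics"]
)
```

Here "digital libraries" and "research libraries" share one stem out of three, a similarity of 1/3, so they end up in the same topic, while the other two candidates stay on their own.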
![Page 66: TopicRank - Graph-Based Topic Ranking for Keyphrase Extractionadrien-bougouin.github.io/publications/2013/topicrank_ijcnlp... · TopicRank Graph-Based Topic Ranking for Keyphrase](https://reader031.vdocuments.us/reader031/viewer/2022022520/5b1cc85b7f8b9a952f8b5606/html5/thumbnails/66.jpg)
Backups: Graph Construction
- Nodes are topics
- Every node is connected to every other node
- Connections between topics are weighted by the semantic strength between them
- Topics appearing close to each other have a high semantic strength:

$$\text{weight}(t_i, t_j) = \sum_{c_i \in t_i} \sum_{c_j \in t_j} \text{dist}(c_i, c_j)$$

$$\text{dist}(c_i, c_j) = \sum_{p_i \in \text{pos}(c_i)} \sum_{p_j \in \text{pos}(c_j)} \frac{1}{|p_i - p_j|}$$
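
The two formulas translate almost directly into Python. In this sketch the `positions` mapping, the candidate names and the offsets are invented for illustration, and skipping pairs with identical offsets is an added assumption to avoid a division by zero (the slide does not say how that case is handled).

```python
def dist(positions_i, positions_j):
    """Sum of reciprocal distances between all offset pairs of two candidates."""
    return sum(
        1.0 / abs(pi - pj)
        for pi in positions_i
        for pj in positions_j
        if pi != pj  # assumption: identical offsets are skipped
    )


def weight(topic_i, topic_j, positions):
    """Edge weight between two topics: dist summed over all candidate pairs."""
    return sum(
        dist(positions[ci], positions[cj])
        for ci in topic_i
        for cj in topic_j
    )


# Hypothetical candidate offsets in a document.
positions = {"a": [1, 10], "b": [3], "c": [5]}
edge = weight(["a"], ["b", "c"], positions)
```

For these offsets the weight is 1/2 + 1/7 + 1/4 + 1/5: the closer and the more frequent the co-occurrences, the stronger the edge.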
![Page 67: TopicRank - Graph-Based Topic Ranking for Keyphrase Extractionadrien-bougouin.github.io/publications/2013/topicrank_ijcnlp... · TopicRank Graph-Based Topic Ranking for Keyphrase](https://reader031.vdocuments.us/reader031/viewer/2022022520/5b1cc85b7f8b9a952f8b5606/html5/thumbnails/67.jpg)
Backups: Graph Construction

| | Inspec | SemEval | WikiNews | DEFT |
|---|---|---|---|---|
| clusters/document | 20.9 | 272.4 | 52.4 | 546.5 |

![Page 68: TopicRank - Graph-Based Topic Ranking for Keyphrase Extractionadrien-bougouin.github.io/publications/2013/topicrank_ijcnlp... · TopicRank Graph-Based Topic Ranking for Keyphrase](https://reader031.vdocuments.us/reader031/viewer/2022022520/5b1cc85b7f8b9a952f8b5606/html5/thumbnails/68.jpg)
Backups: Topic Ranking
PageRank’s “voting” concept
High-scoring topics contribute more to the score of their connected topics.

$$\text{score}(C_i) = (1 - \lambda) + \lambda \times \sum_{C_j \neq C_i} \frac{\text{weight}(C_i, C_j) \times \text{score}(C_j)}{\sum_{C_k \neq C_j} \text{weight}(C_j, C_k)}, \qquad \lambda = 0.85$$
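
One way to evaluate this recursive score is a fixed-point iteration, sketched below in Python. The function name, the iteration limit, the tolerance and the example weights are all assumptions made for illustration; only the update rule and λ = 0.85 come from the slide.

```python
def rank_topics(weights, damping=0.85, iterations=100, tol=1e-6):
    """Iterate the score update until it stabilises.
    `weights[a][b]` is the symmetric edge weight between topics a and b."""
    topics = list(weights)
    scores = {t: 1.0 for t in topics}
    # Denominator of the formula: total edge weight around each topic.
    totals = {t: sum(weights[t].values()) for t in topics}
    for _ in range(iterations):
        updated = {}
        for ti in topics:
            votes = sum(
                weights[ti][tj] * scores[tj] / totals[tj]
                for tj in weights[ti]
                if totals[tj] > 0
            )
            updated[ti] = (1 - damping) + damping * votes
        change = max(abs(updated[t] - scores[t]) for t in topics)
        scores = updated
        if change < tol:
            break
    return scores


# Invented weights for three topics; A and B are strongly connected.
weights = {
    "A": {"B": 2.0, "C": 1.0},
    "B": {"A": 2.0, "C": 1.0},
    "C": {"A": 1.0, "B": 1.0},
}
scores = rank_topics(weights)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy graph A and B receive the same score and both outrank C, since each of them gets a large share of the other's vote.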
![Page 69: TopicRank - Graph-Based Topic Ranking for Keyphrase Extractionadrien-bougouin.github.io/publications/2013/topicrank_ijcnlp... · TopicRank Graph-Based Topic Ranking for Keyphrase](https://reader031.vdocuments.us/reader031/viewer/2022022520/5b1cc85b7f8b9a952f8b5606/html5/thumbnails/69.jpg)
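Backups: Main Results

| Method | Inspec P | Inspec R | Inspec F | SemEval P | SemEval R | SemEval F | WikiNews P | WikiNews R | WikiNews F | DEFT P | DEFT R | DEFT F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TF-IDF | 32.7 | 38.6 | 33.4 | 13.2 | 8.9 | 10.5 | 33.9 | 35.9 | 34.3 | 10.3 | 19.1 | 13.2 |
| TextRank | 14.2 | 12.5 | 12.7 | 7.9 | 4.5 | 5.6 | 9.3 | 8.3 | 8.6 | 4.9 | 7.1 | 5.7 |
| SingleRank | 34.8 | 40.4 | 35.2 | 4.6 | 3.2 | 3.7 | 19.4 | 20.7 | 19.7 | 4.5 | 9.0 | 5.9 |
| TopicRank | 27.6 | 31.5 | 27.9 | 14.9 | 10.3 | 12.1 | 35.0 | 37.5 | 35.6 | 11.7 | 21.7 | 15.1 |
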
![Page 70: TopicRank - Graph-Based Topic Ranking for Keyphrase Extractionadrien-bougouin.github.io/publications/2013/topicrank_ijcnlp... · TopicRank Graph-Based Topic Ranking for Keyphrase](https://reader031.vdocuments.us/reader031/viewer/2022022520/5b1cc85b7f8b9a952f8b5606/html5/thumbnails/70.jpg)
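Backups: Contributions Evaluation

| Method | Inspec P | Inspec R | Inspec F | SemEval P | SemEval R | SemEval F | WikiNews P | WikiNews R | WikiNews F | DEFT P | DEFT R | DEFT F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SingleRank | 34.8 | 40.4 | 35.2 | 4.6 | 3.2 | 3.7 | 19.4 | 20.7 | 19.7 | 4.5 | 9.0 | 5.9 |
| +phrases | 21.5 | 25.9 | 22.1 | 9.6 | 7.0 | 8.0 | 28.6 | 30.1 | 28.9 | 10.5 | 19.7 | 13.5 |
| +topics | 26.6 | 30.2 | 26.8 | 14.7 | 10.2 | 11.9 | 31.0 | 32.8 | 31.4 | 11.5 | 21.4 | 14.8 |
| +complete | 34.9 | 41.0 | 35.5 | 5.5 | 3.8 | 4.4 | 20.0 | 21.4 | 20.3 | 4.4 | 9.0 | 5.8 |
| TopicRank | 27.6 | 31.5 | 27.9 | 14.9 | 10.3 | 12.1 | 35.0 | 37.5 | 35.6 | 11.7 | 21.7 | 15.1 |
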
![Page 71: TopicRank - Graph-Based Topic Ranking for Keyphrase Extractionadrien-bougouin.github.io/publications/2013/topicrank_ijcnlp... · TopicRank Graph-Based Topic Ranking for Keyphrase](https://reader031.vdocuments.us/reader031/viewer/2022022520/5b1cc85b7f8b9a952f8b5606/html5/thumbnails/71.jpg)
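Backups: Keyphrase Selection Evaluation

| Keyphrase selection | Inspec P | Inspec R | Inspec F | SemEval P | SemEval R | SemEval F | WikiNews P | WikiNews R | WikiNews F | DEFT P | DEFT R | DEFT F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| First position | 27.6 | 31.5 | 27.9 | 14.9 | 10.3 | 12.1 | 35.0 | 37.5 | 35.6 | 11.7 | 21.7 | 15.1 |
| Frequency | 26.7 | 30.2 | 26.8 | 1.7 | 1.2 | 1.4 | 25.7 | 27.6 | 26.2 | 1.9 | 3.8 | 2.5 |
| Centroid | 24.5 | 28.0 | 24.7 | 1.9 | 1.2 | 1.5 | 28.1 | 29.9 | 28.5 | 2.6 | 5.0 | 3.4 |
| Upper bound | 36.4 | 39.0 | 35.6 | 37.6 | 25.8 | 30.3 | 42.5 | 44.8 | 42.9 | 14.9 | 28.0 | 19.3 |
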
![Page 72: TopicRank - Graph-Based Topic Ranking for Keyphrase Extractionadrien-bougouin.github.io/publications/2013/topicrank_ijcnlp... · TopicRank Graph-Based Topic Ranking for Keyphrase](https://reader031.vdocuments.us/reader031/viewer/2022022520/5b1cc85b7f8b9a952f8b5606/html5/thumbnails/72.jpg)
References

Zhiyuan Liu, Peng Li, Yabin Zheng, and Maosong Sun. Clustering to Find Exemplar Terms for Keyphrase Extraction. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1, pages 257–266, Stroudsburg, PA, USA, 2009. Association for Computational Linguistics. ISBN 978-1-932432-59-6. URL http://dl.acm.org/citation.cfm?id=1699510.169954

Zhiyuan Liu, Wenyi Huang, Yabin Zheng, and Maosong Sun. Automatic Keyphrase Extraction Via Topic Decomposition. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 366–376, Stroudsburg, PA, USA, 2010. Association for Computational Linguistics. URL http://dl.acm.org/citation.cfm?id=1870658.187069

Rada Mihalcea and Paul Tarau. TextRank: Bringing Order Into Texts. In Dekang Lin and Dekai Wu, editors, Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pages 404–411, Barcelona, Spain, July 2004. Association for Computational Linguistics.

Takashi Tomokiyo and Matthew Hurst. A Language Model Approach to Keyphrase Extraction. In Proceedings of the ACL 2003 Workshop on Multiword Expressions: Analysis, Acquisition and Treatment - Volume 18, pages 33–40, Stroudsburg, PA, USA, 2003. Association for Computational Linguistics. URL http://dx.doi.org/10.3115/1119282.1119287.

George Tsatsaronis, Iraklis Varlamis, and Kjetil Nørvåg. SemanticRank: Ranking Keywords and Sentences Using Semantic Graphs. In Proceedings of the 23rd International Conference on Computational Linguistics, pages 1074–1082, Stroudsburg, PA, USA, 2010. Association for Computational Linguistics. URL http://dl.acm.org/citation.cfm?id=1873781.187390

Xiaojun Wan and Jianguo Xiao. Single Document Keyphrase Extraction Using Neighborhood Knowledge. In Proceedings of the 23rd National Conference on Artificial Intelligence - Volume 2, pages 855–860. AAAI Press, 2008. ISBN 978-1-57735-368-3. URL http://dl.acm.org/citation.cfm?id=1620163.162020