Tiziano Piccardi, Michele Catasta, Leila Zia, Robert West
Poor structure
Limited content
No references
Not clear
Not integrated
Help to structure the content
We use the content of the article to generate recommendations
We use the category network to generate recommendations
Latent Dirichlet Allocation (LDA)
200 topics
Article topics: Topic 1 (0.1), Topic 2 (0.05), Topic 3 (0.7), ...
Final recommendation: sum of the per-topic recommendations, weighted by the article's topic proportions
Topic 1 (0.1) + Topic 2 (0.05) + Topic 3 (0.7) + ...
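A minimal sketch of the weighted combination on this slide, assuming each topic carries its own section scores. Only the weights 0.1, 0.05, and 0.7 come from the slide; the topic names, the per-topic section scores, and the function name `recommend_from_topics` are hypothetical.

```python
from collections import defaultdict

def recommend_from_topics(article_topics, topic_sections, k=3):
    """Rank sections by the topic-weighted sum of per-topic section scores."""
    scores = defaultdict(float)
    for topic, weight in article_topics.items():
        for section, score in topic_sections.get(topic, {}).items():
            scores[section] += weight * score
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]

# Topic proportions of one article (weights taken from the slide).
article_topics = {"topic_1": 0.1, "topic_2": 0.05, "topic_3": 0.7}

# Hypothetical section scores per topic (illustrative values only).
topic_sections = {
    "topic_1": {"History": 0.6, "Geography": 0.4},
    "topic_2": {"Discography": 0.9},
    "topic_3": {"Plot": 0.5, "Cast": 0.4, "History": 0.1},
}

ranking = recommend_from_topics(article_topics, topic_sections)
```

The dominant topic (0.7) drives the ranking, while sections shared by several topics accumulate score from each of them.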
Collaborative Filtering
Based on matrix factorization with Alternating Least Squares
One row per article and one column per section
1 if the section S appears in the article A
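A minimal sketch of the article-by-section matrix described above, which is the input to the factorization (the Alternating Least Squares step itself is not shown). The article names, their section lists, and the function name `build_article_section_matrix` are hypothetical.

```python
def build_article_section_matrix(article_sections):
    """Return (articles, sections, matrix) with matrix[i][j] = 1
    if section j appears in article i, else 0."""
    articles = sorted(article_sections)
    sections = sorted({s for secs in article_sections.values() for s in secs})
    column = {s: j for j, s in enumerate(sections)}
    matrix = [[0] * len(sections) for _ in articles]
    for i, article in enumerate(articles):
        for section in article_sections[article]:
            matrix[i][column[section]] = 1
    return articles, sections, matrix

# Hypothetical toy input: section titles per article.
article_sections = {
    "Article A": ["Plot", "Cast", "Production"],
    "Article B": ["Plot", "Cast"],
    "Article C": ["Career", "Filmography"],
}

articles, sections, matrix = build_article_section_matrix(article_sections)
```

An article with no sections yet yields an all-zero row, so the factorization has nothing to learn from for that article.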
Limitation:
The article-based approach cannot generate recommendations for new articles!
Intuition:
Articles in the same category share similar sections
We can use the categories to generate templates for the editors
Category:American epic films
{Plot, Cast, Production}
Taxonomic assumption
Categories are organised in a hierarchical structure
Frequent sections in the child categories may be relevant for the parent
Wait, it’s not so easy…
The category network contains cycles, e.g. Government ➜ Public administration ➜ Public economics ➜ Economic policy ➜ Government
Cycle removal follows the feedback arc set heuristic of Peter Eades, Xuemin Lin, W.F. Smyth
Removed 4k edges out of ~4M
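The slide cites Eades, Lin, and Smyth, whose greedy heuristic orders the vertices so that few edges point backwards. Below is a simplified, unoptimised sketch of that idea, not the authors' implementation. The function names are hypothetical, and the edge direction follows the arrows of the example cycle.

```python
def greedy_vertex_order(nodes, edges):
    """Greedy ordering: peel off sinks (placed last) and sources (placed first);
    otherwise pick the node with the largest out-degree minus in-degree."""
    remaining = set(nodes)
    succ = {n: set() for n in nodes}
    pred = {n: set() for n in nodes}
    for u, v in edges:
        if u != v:  # self-loops are always dropped later
            succ[u].add(v)
            pred[v].add(u)

    def remove(u):
        remaining.discard(u)
        for v in succ[u]:
            pred[v].discard(u)
        for p in pred[u]:
            succ[p].discard(u)

    head, tail = [], []
    while remaining:
        candidates = sorted(remaining)  # sorted for deterministic output
        sink = next((n for n in candidates if not succ[n]), None)
        if sink is not None:
            tail.insert(0, sink)
            remove(sink)
            continue
        source = next((n for n in candidates if not pred[n]), None)
        if source is not None:
            head.append(source)
            remove(source)
            continue
        best = max(candidates, key=lambda n: len(succ[n]) - len(pred[n]))
        head.append(best)
        remove(best)
    return head + tail

def remove_cycles(nodes, edges):
    """Keep only edges that point forward in the greedy order (result is acyclic)."""
    position = {n: i for i, n in enumerate(greedy_vertex_order(nodes, edges))}
    kept = [(u, v) for u, v in edges if position[u] < position[v]]
    removed = [(u, v) for u, v in edges if position[u] >= position[v]]
    return kept, removed

# The cycle shown on the slide.
nodes = ["Government", "Public administration", "Public economics", "Economic policy"]
edges = [
    ("Government", "Public administration"),
    ("Public administration", "Public economics"),
    ("Public economics", "Economic policy"),
    ("Economic policy", "Government"),
]
kept, removed = remove_cycles(nodes, edges)
```

On this four-node cycle a single edge is dropped. Because every kept edge points forward in one linear order, the remaining graph cannot contain a cycle.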
Categories with heterogeneous articles must be removed
Distribution of the article types in a category
We assigned 55 top level types to the articles
Category C
to select the categories to keep in the network
✓ 0.774
✕ 0.568
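The extracted slide text does not name the score used to select categories. The sketch below assumes a Gini coefficient over the per-type article counts of a category, where a high value means the articles concentrate on few types. The threshold of 0.7, the five-type toy counts (the talk uses 55 types), and the function names are hypothetical.

```python
def gini(counts):
    """Gini coefficient of non-negative counts:
    0 when evenly spread, approaching 1 when concentrated on one entry."""
    n = len(counts)
    total = sum(counts)
    if n == 0 or total == 0:
        return 0.0
    # Sum of absolute differences over all ordered pairs.
    diff_sum = sum(abs(a - b) for a in counts for b in counts)
    return diff_sum / (2 * n * total)

def keep_category(type_counts, threshold=0.7):
    """Keep a category only if its articles concentrate on few types.
    The threshold value is an assumption, not taken from the slides."""
    return gini(type_counts) >= threshold

# Hypothetical article counts per type for two categories.
homogeneous = [95, 3, 2, 0, 0]    # almost all articles share one type
mixed = [22, 20, 20, 19, 19]      # articles spread across types

homogeneous_score = gini(homogeneous)
mixed_score = gini(mixed)
```

Under these assumptions the homogeneous category is kept and the mixed one is pruned.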
Category–Section counts
Probability P(S | C) of observing section S in category C
P(S1 | CAT1) = 2/7
Example: American_male_film_actors
Filmography: 0.59
Career: 0.47
Personal life: 0.38
…
Filmography appears in 59% of the articles in the category “American_male_film_actors”
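A minimal sketch of the counting step, assuming P(S | C) is the fraction of articles in category C that contain section S. The toy data reproduces the 2/7 example; the article identifiers and the function name `section_probabilities` are hypothetical.

```python
from collections import Counter

def section_probabilities(category_articles, article_sections):
    """Estimate P(S | C) as the share of articles in C that contain section S."""
    probabilities = {}
    for category, articles in category_articles.items():
        counts = Counter()
        for article in articles:
            # Use a set so a repeated title counts once per article.
            counts.update(set(article_sections.get(article, ())))
        probabilities[category] = (
            {s: c / len(articles) for s, c in counts.items()} if articles else {}
        )
    return probabilities

# Hypothetical category with 7 articles; section S1 appears in 2 of them.
category_articles = {"CAT1": ["a1", "a2", "a3", "a4", "a5", "a6", "a7"]}
article_sections = {
    "a1": ["S1", "S2"],
    "a2": ["S1"],
    "a3": ["S2"],
    "a4": ["S2", "S3"],
    "a5": [],
    "a6": ["S3"],
    "a7": ["S2"],
}

probs = section_probabilities(category_articles, article_sections)
```

Articles without any section still count in the denominator, so sparse categories produce low probabilities rather than inflated ones.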
Collaborative Filtering
Based on matrix factorization with Alternating Least Squares
One row per category and one column per section
Ratings defined as P(S1 | CAT1)
Categories_of(Leonardo DiCaprio) = {American_male_film_actors, American_film_producers, Living_people}
American_male_film_actors: Filmography: 0.59, Career: 0.47, Personal life: 0.38, ...
American_film_producers: Filmography: 0.39, Career: 0.34, Personal life: 0.26, ...
Living_people: Career: 0.25, Personal life: 0.17, Biography: 0.13, ...
Merging phase over all Cj ∈ Categories_of(Leonardo DiCaprio)
Learning2Rank: weighted sum, then sort
Leonardo DiCaprio
Career: 9.98
Filmography: 9.51
Personal life: 9.48
Early life: 8.40
Awards: 2.45
...
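A minimal sketch of the merging phase, using the per-category probabilities shown on the slide. The learned weights are not given in the slides, so uniform weights of 1.0 are assumed here; the resulting scores therefore differ from the slide's values (9.98, 9.51, ...). The function name `merge_recommendations` is hypothetical.

```python
def merge_recommendations(categories, section_probs, weights, k=5):
    """Score each section by the weighted sum of P(S | C) over the
    article's categories, then sort by descending score."""
    scores = {}
    for category in categories:
        weight = weights.get(category, 1.0)
        for section, p in section_probs.get(category, {}).items():
            scores[section] = scores.get(section, 0.0) + weight * p
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]

# Probabilities copied from the slide.
section_probs = {
    "American_male_film_actors": {"Filmography": 0.59, "Career": 0.47, "Personal life": 0.38},
    "American_film_producers": {"Filmography": 0.39, "Career": 0.34, "Personal life": 0.26},
    "Living_people": {"Career": 0.25, "Personal life": 0.17, "Biography": 0.13},
}
categories = list(section_probs)

# Assumed uniform weights; the talk learns them with Learning2Rank.
weights = {c: 1.0 for c in categories}

merged = merge_recommendations(categories, section_probs, weights)
```

Even with uniform weights, the top three sections come out in the same order as on the slide: Career, Filmography, Personal life.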
Evaluation
English Wikipedia - September 2017
5.5M articles
300K sections
Collaborative filtering (article-based)
Precision < 0.2%
Recall < 1.5%
Cold start problem: on average 3.4 sections per article
Topic modeling
Precision@10 = 6% (upper bound 28%)
Recall@10 = 26% (upper bound 98%)
Collaborative filtering (category-based)
Precision@10 = 13% (upper bound 28%)
Recall@10 = 49% (upper bound 98%)
Category–Section counts
Precision@10 = 20% (upper bound 28%)
Recall@10 = 72% (upper bound 98%)
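A minimal sketch of how Precision@10 and Recall@10 can be computed for one article, assuming the article's existing sections serve as ground truth. The section lists and the function name `precision_recall_at_k` are hypothetical.

```python
def precision_recall_at_k(recommended, relevant, k=10):
    """Precision@k = hits / k; Recall@k = hits / number of relevant sections."""
    top_k = recommended[:k]
    hits = sum(1 for section in top_k if section in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical ranked recommendations for one held-out article.
recommended = [
    "Career", "Filmography", "Early life", "Awards", "References",
    "External links", "Biography", "Legacy", "Discography", "Gallery",
]
# Hypothetical ground truth: the sections the article actually has.
relevant = {"Career", "Filmography", "Personal life"}

precision, recall = precision_recall_at_k(recommended, relevant)
```

With only three ground-truth sections, Precision@10 can never exceed 3/10 for this article, which illustrates why the slides report an upper bound well below 100%.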
Automatic evaluation has limitations!
The test set contains articles that have the very problems we want to solve:
few sections | inconsistent | different syntax
Human evaluation
Wikipedia editors | Crowd-workers
Category–Section counts
Wikipedia editors: Precision@10 = 72%
Crowd-workers: Precision@10 = 81%
● Introduced the section recommendation problem
● Explored several methods using
  ○ features derived from the raw input article
  ○ Wikipedia’s category network
● Learned that the category network is key in offering useful recommendations
● Developed a methodology to prune the category network
https://github.com/epfl-dlab/structuring-wikipedia-articles
https://meta.wikimedia.org/wiki/Recommendation_API
@tizianopiccardi