Predicting News Popularity by Mining Online Discussions

Predicting News Popularity by Mining Online Discussions Georgios Rizos, Symeon Papadopoulos and Yiannis Kompatsiaris Centre for Research and Technology Hellas (CERTH) – Information Technologies Institute (ITI) SNOW/WWW 2016, April 12, 2016, Montréal, Québec, Canada.

Upload: symeon-papadopoulos

Post on 13-Jan-2017


Page 1: Predicting News Popularity by Mining Online Discussions

Predicting News Popularity by Mining Online Discussions

Georgios Rizos, Symeon Papadopoulos and Yiannis Kompatsiaris

Centre for Research and Technology Hellas (CERTH) – Information Technologies Institute (ITI)

SNOW/WWW 2016, April 12, 2016, Montréal, Québec, Canada.

Page 2: Predicting News Popularity by Mining Online Discussions

Popularity

Operational definition ($$$):
• Number of views
• Number of comments
• Number of users commenting
• Number of shares
• Number of thumbs up, thumbs down, …
• Bonus: Controversiality

Page 3: Predicting News Popularity by Mining Online Discussions


Page 4: Predicting News Popularity by Mining Online Discussions

Overview

Online discussions → Comment trees/user graphs → Tree/Graph Features → Predictive model

Page 5: Predicting News Popularity by Mining Online Discussions

Comment Trees and User Graphs

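Example discussion (User A is the story poster):
• User X: I blah blah….
• User Y: But bla bla….
• User X: No!
• User W: Are you kiddin?
• User Z: Wat?

[Figure: the example discussion drawn as a Comment Tree and as a User Graph; node labels: A, X, Y, W, Z]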
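The mapping from a discussion to the two structures can be sketched as follows. This is a minimal illustration, not the authors' code: the reply structure of the example (who answers whom), the tuple layout and the function names are assumptions, and the user graph here simply links each commenter to the author of the comment or story being answered.

```python
from collections import defaultdict

# Hypothetical discussion: (comment_id, parent_id, user).
# parent_id None marks the story itself, posted by user A.
# The reply structure is assumed for illustration only.
DISCUSSION = [
    (0, None, "A"),  # the story post
    (1, 0, "X"),     # "I blah blah…."
    (2, 1, "Y"),     # "But bla bla…."
    (3, 2, "X"),     # "No!"
    (4, 1, "W"),     # "Are you kiddin?"
    (5, 0, "Z"),     # "Wat?"
]


def build_comment_tree(discussion):
    """Return {parent_id: [child_id, ...]}; one node per comment."""
    children = defaultdict(list)
    for comment_id, parent_id, _user in discussion:
        if parent_id is not None:
            children[parent_id].append(comment_id)
    return dict(children)


def build_user_graph(discussion):
    """Return a set of directed (replier, replied_to) user pairs."""
    author = {comment_id: user for comment_id, _parent, user in discussion}
    edges = set()
    for _comment_id, parent_id, user in discussion:
        # Replies to one's own comment are skipped (an assumption).
        if parent_id is not None and author[parent_id] != user:
            edges.add((user, author[parent_id]))
    return edges


comment_tree = build_comment_tree(DISCUSSION)
user_graph = build_user_graph(DISCUSSION)
```

The comment tree keeps one node per comment (six nodes here, counting the story), while the user graph keeps one node per user (five), because X comments twice.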

Page 6: Predicting News Popularity by Mining Online Discussions


Sony Hack #1 VS Sony Hack #2

Page 7: Predicting News Popularity by Mining Online Discussions


Page 8: Predicting News Popularity by Mining Online Discussions

[Figure: discussion snapshots at 10 mins; labels: 2, 2]

Page 9: Predicting News Popularity by Mining Online Discussions

[Figure: discussion snapshots at 30 mins; labels: 4, 4]

Page 10: Predicting News Popularity by Mining Online Discussions

[Figure: discussion snapshots at 1 hour; labels: 7, 13]

Page 11: Predicting News Popularity by Mining Online Discussions

[Figure: discussion snapshots at 1 ½ hours; labels: 57, 10]

Page 12: Predicting News Popularity by Mining Online Discussions

[Figure: discussion snapshots at 2 ¼ hours; label: 14358]

Page 13: Predicting News Popularity by Mining Online Discussions

Elements of an Engaging Discussion

[Figure labels: Degree, Depth]

Page 14: Predicting News Popularity by Mining Online Discussions

Related Work: Post Popularity

News story discussion size prediction:
• Modelling the comment timestamp time-series (Tatar et al., 2014)
• Feature set relevant to time of posting (hour, day), entities mentioned, etc. (Tsagkias et al., 2009)
These do not leverage the graph structure!

Online forum thread analysis:
• Prediction of thread overall quality. Also includes rudimentary graph features (Lee et al., 2014)
Only very simple comment tree features.

Page 15: Predicting News Popularity by Mining Online Discussions

Related Work: Post Diffusion

Twitter hashtag popularity prediction:
• Prediction based on adoption graph properties and communities (Weng et al., 2014)
• Prediction based on geolocation and adoption graph conductance (Bora et al., 2015)

Facebook post share count prediction:
• Prediction based on share graph, author and temporal property features (Cheng et al., 2014)
Setting different than online discussion mining.

Page 16: Predicting News Popularity by Mining Online Discussions

Related Work: Discussion Mining

Uncovering other graph-based qualities:
• Comment tree h-index hypothesized to be a proxy for discussion controversiality (Gomez et al., 2008)
• A user-comment h-index variation hypothesized to be a proxy for political discussion deliberation (Gonzalez-Bailon et al., 2010)
• Share graph Wiener index shown to be a proxy of quality/interestingness of the initial post via an SIS diffusion simulation (Goel et al., 2015)

Indices not applied for popularity prediction!
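The two indices named on this slide can be computed on a comment tree roughly as sketched below. The h-index here follows the definition usually attributed to Gómez et al. (2008), i.e. the deepest level h that holds at least h comments, and the Wiener index is taken as the sum of shortest-path distances over all node pairs. The `{parent: [children]}` layout, the function names and the example tree are assumptions for illustration; whether the cited works normalise these values is not stated in the slides.

```python
from collections import defaultdict, deque

# Hypothetical comment tree: node 0 is the story, the rest are comments.
EXAMPLE_TREE = {0: [1, 5], 1: [2, 4], 2: [3]}


def depths_from_root(children, root=0):
    """Breadth-first search returning {node: depth}, root at depth 0."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            depth[child] = depth[node] + 1
            queue.append(child)
    return depth


def tree_h_index(children, root=0):
    """Largest h such that depth level h holds at least h comments."""
    per_level = defaultdict(int)
    for depth in depths_from_root(children, root).values():
        if depth > 0:
            per_level[depth] += 1
    h = 0
    for level in sorted(per_level):
        if per_level[level] >= level:
            h = level
    return h


def wiener_index(children, root=0):
    """Sum of shortest-path distances over all unordered node pairs."""
    neighbours = defaultdict(set)
    for parent, kids in children.items():
        for kid in kids:
            neighbours[parent].add(kid)
            neighbours[kid].add(parent)
    nodes = set(neighbours) | {root}
    total = 0
    for source in nodes:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in neighbours[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total += sum(dist.values())
    return total // 2  # every pair was counted from both ends


h = tree_h_index(EXAMPLE_TREE)   # levels hold 2, 2 and 1 comments
w = wiener_index(EXAMPLE_TREE)
```

For the example tree, level 2 is the deepest level with at least as many comments as its depth, so the h-index is 2.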

Page 17: Predicting News Popularity by Mining Online Discussions

Comment Tree Features


• Quantification of depth, width, bushy-ness, the existence of multiple long threads and branching complexity of comment tree structure.
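A minimal sketch of how such quantities might be measured is given below. The feature names and the specific measures (level widths, replies per replied-to comment) are illustrative stand-ins chosen for this example; the slides do not list the exact feature definitions.

```python
def comment_tree_features(children, root=0):
    """Structural descriptors of a comment tree given as {parent: [children]}."""
    level = [root]
    widths = []  # number of comments on each depth level below the story
    while level:
        nxt = [child for node in level for child in children.get(node, [])]
        if nxt:
            widths.append(len(nxt))
        level = nxt
    comment_count = sum(widths)
    # Reply counts of comments (story excluded) that got at least one reply.
    replied = [len(kids) for node, kids in children.items()
               if node != root and kids]
    return {
        "comment_count": comment_count,
        "max_depth": len(widths),
        "max_width": max(widths, default=0),
        "avg_width": comment_count / len(widths) if widths else 0.0,
        "avg_replies_per_replied_comment":
            sum(replied) / len(replied) if replied else 0.0,
    }


# Hypothetical tree: node 0 is the story, nodes 1-5 are comments.
tree_features = comment_tree_features({0: [1, 5], 1: [2, 4], 2: [3]})
```

A deep, narrow tree (high `max_depth`, low `max_width`) suggests a long back-and-forth thread, whereas a shallow, wide one suggests many independent reactions to the story.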

Page 18: Predicting News Popularity by Mining Online Discussions

User Graph Features


• Quantification of user recurrence and branching complexity of user graph structure.
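User recurrence could be measured along the lines of the sketch below. The feature names, the input layout and the choice of measures are assumptions made for illustration, not the feature set used in the talk.

```python
from collections import Counter


def user_graph_features(comment_authors, reply_edges):
    """Recurrence and degree descriptors of a discussion's user graph.

    comment_authors: user name of every comment (hypothetical layout).
    reply_edges: iterable of directed (replier, replied_to) user pairs.
    """
    per_user = Counter(comment_authors)
    total = len(comment_authors)
    # Comments written by users who commented more than once.
    returning = sum(n for n in per_user.values() if n > 1)
    in_degree = Counter(target for _source, target in set(reply_edges))
    return {
        "user_count": len(per_user),
        "comments_per_user": total / len(per_user) if per_user else 0.0,
        "returning_comment_share": returning / total if total else 0.0,
        "max_in_degree": max(in_degree.values(), default=0),
    }


# Hypothetical discussion: X comments twice; A is the story poster.
graph_features = user_graph_features(
    ["X", "Y", "X", "W", "Z"],
    [("X", "A"), ("Y", "X"), ("X", "Y"), ("W", "X"), ("Z", "A")],
)
```

A high share of comments from returning users indicates a conversation among a few participants rather than one-off reactions.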

Page 19: Predicting News Popularity by Mining Online Discussions

Temporal Features

• Quantification of the growth rate of the discussion using simple measures borrowed from Cheng et al. (2014).
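One simple growth-rate measure in this spirit is the mean time gap between consecutive comments, taken over the whole observed window and separately over its first and second half. The sketch below is an assumption about what such measures could look like; the function and feature names are hypothetical and the exact definitions used in the talk are not given in the slides.

```python
def _mean(values):
    return sum(values) / len(values)


def growth_rate_features(timestamps):
    """timestamps: comment times in seconds since the story was posted."""
    times = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    if not gaps:
        return {"mean_gap": 0.0,
                "mean_gap_first_half": 0.0,
                "mean_gap_second_half": 0.0}
    half = len(gaps) // 2
    first = gaps[:half] or gaps  # a single gap counts for both halves
    second = gaps[half:]
    return {
        "mean_gap": _mean(gaps),
        "mean_gap_first_half": _mean(first),
        "mean_gap_second_half": _mean(second),
    }


# Hypothetical comment times: the discussion speeds up after five minutes.
growth = growth_rate_features([60, 120, 300, 320, 330])
```

A second-half gap that is much smaller than the first-half gap indicates an accelerating discussion.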

Page 20: Predicting News Popularity by Mining Online Discussions

Datasets


• We used three datasets of news story posts and online discussions.

• RedditNews dataset: Sample of posts in news-based subreddits made in 2014 (thanks derp.institute!)

Page 21: Predicting News Popularity by Mining Online Discussions

Evaluation: Model building

• Random Forest regression to handle inhomogeneous features

• Prediction targets:
– Comment count: all datasets
– User count: eponymous users only; all datasets
– Score: #upvotes - #downvotes; RedditNews only
– Controversiality: #disagreements; RedditNews only

• Score and Controversiality are penalized for small number of votes

• Different models built for different timepoints in the discussion evolution, corresponding to 1%-14% of the story's lifetime (1% ~ 10 minutes)
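The timepoint setup can be illustrated with a short sketch. Since 1% corresponds to roughly 10 minutes, the implied lifetime is about 1000 minutes (around 16.7 hours), so the 14% model sees roughly the first 140 minutes. The constant, the tuple layout and the function names below are assumptions; the vote-count penalty and the controversiality target are left out because the slides do not give their formulas, and all users are treated as named.

```python
LIFETIME_MINUTES = 1000  # assumption: implied by "1% ~ 10 minutes"


def snapshot(comments, percent, lifetime_minutes=LIFETIME_MINUTES):
    """Keep the comments posted within the first `percent` % of the lifetime.

    comments: list of (minutes_since_post, user) tuples (hypothetical layout).
    """
    cutoff = lifetime_minutes * percent / 100
    return [comment for comment in comments if comment[0] <= cutoff]


def targets(comments, upvotes=0, downvotes=0):
    """Prediction targets computed from the complete discussion."""
    return {
        "comment_count": len(comments),
        "user_count": len({user for _minutes, user in comments}),
        "score": upvotes - downvotes,
    }


# Hypothetical discussion and vote counts.
comments = [(3, "X"), (8, "Y"), (25, "X"), (70, "W"), (400, "Z")]
early = snapshot(comments, 1)    # what the 1% model gets to see
later = snapshot(comments, 14)   # what the 14% model gets to see
final = targets(comments, upvotes=120, downvotes=45)
```

Features are extracted from each snapshot, while the targets always come from the complete discussion, so one model per timepoint learns to predict the final values from a partial view.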

Page 22: Predicting News Popularity by Mining Online Discussions

Evaluation: VS Simple Graph Features

• The proposed graph features lead to better prediction compared to rudimentary graph features.

• Large improvement when target is controversiality.

Page 23: Predicting News Popularity by Mining Online Discussions

Evaluation: VS Simple Graph Features


Page 24: Predicting News Popularity by Mining Online Discussions

Evaluation: Feature Type Comparison

• Comment tree features are good for comment count prediction, and user graph features for user count prediction.

• Integration of all feature types yields best results, except for controversiality where all_graph is best.

Page 25: Predicting News Popularity by Mining Online Discussions

Evaluation: Feature Type Comparison


Page 26: Predicting News Popularity by Mining Online Discussions

Evaluation: Top-100 Controversial

• Which stories will be the most controversial?
• We report the Jaccard Coefficient (x100) between the true top-100 and the top-100 predicted by the prediction framework using the three feature sets.
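The metric can be sketched as below: the Jaccard coefficient of two sets is the size of their intersection divided by the size of their union. The function names, the dictionary layout and the toy scores are assumptions for illustration.

```python
def top_k(scores, k):
    """Return the ids of the k highest-scoring stories as a set."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def jaccard_x100(true_scores, predicted_scores, k=100):
    """Jaccard coefficient (x100) between the true and predicted top-k."""
    truth = top_k(true_scores, k)
    guess = top_k(predicted_scores, k)
    union = truth | guess
    return 100 * len(truth & guess) / len(union) if union else 0.0


# Hypothetical controversiality values for four stories, with k = 2.
true_scores = {"a": 9, "b": 7, "c": 5, "d": 1}
predicted_scores = {"a": 8, "b": 2, "c": 6, "d": 3}
overlap = jaccard_x100(true_scores, predicted_scores, k=2)
```

Identical top lists give 100. Two top-100 lists that share 50 stories give 100 * 50 / 150, about 33.3, so the measure penalizes misses more sharply than a plain overlap percentage would.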

Page 27: Predicting News Popularity by Mining Online Discussions

Conclusion

• Key contributions
– Improved popularity prediction using lightweight graph-based features
– Post controversiality prediction showed significant improvement

• Future Work
– Leveraging other information modalities, such as text
– Investigate dependence on the topic category of a story or the type of the post (e.g., text post or multimedia).

Page 28: Predicting News Popularity by Mining Online Discussions

Thank you!

• Resources:
– Code: https://github.com/MKLab-ITI/news-popularity-prediction
– Online demo: http://reveal-mklab.iti.gr/reveal/popularity

• Get in touch:
– @sympap / [email protected]
– @georgios_rizos / [email protected]

Page 29: Predicting News Popularity by Mining Online Discussions

References (1/2)

• A. Tatar, P. Antoniadis, M. D. De Amorim, and S. Fdida. From popularity prediction to ranking online news. Social Network Analysis and Mining, 4(1):1–12, 2014.

• M. Tsagkias, W. Weerkamp, and M. De Rijke. Predicting the volume of comments on online news stories. In Proceedings of the 18th ACM Conference on Information and Knowledge Management, pages 1765–1768. ACM, 2009.

• J. Lee, M. Yang, and H. Rim. Discovering high-quality threaded discussions in online forums. Journal of Computer Science and Technology, 29(3):519–531, 2014.

• L. Weng, F. Menczer, and Y.-Y. Ahn. Predicting successful memes using network and community structure. arXiv preprint arXiv:1403.6199, 2014.

• S. Bora, H. Singh, A. Sen, A. Bagchi, and P. Singla. On the role of conductance, geography and topology in predicting hashtag virality. arXiv preprint arXiv:1504.05351, 2015.

Page 30: Predicting News Popularity by Mining Online Discussions

References (2/2)

• J. Cheng, L. Adamic, P. A. Dow, J. M. Kleinberg, and J. Leskovec. Can cascades be predicted? In Proceedings of the 23rd international conference on World Wide Web, pages 925–936, 2014.

• V. Gómez, A. Kaltenbrunner, and V. López. Statistical analysis of the social network and discussion threads in Slashdot. In Proceedings of the 17th intern. conference on World Wide Web, pages 645–654. ACM, 2008.

• S. Gonzalez-Bailon, A. Kaltenbrunner, and R. E. Banchs. The structure of political discussion networks: a model for the analysis of online deliberation. Journal of Information Technology, 25(2):230–243, 2010.

• S. Goel, A. Anderson, J. Hofman, and D. J. Watts. The structural virality of online diffusion. Management Science, 62(1):180–196, 2015.