profiling vs. time vs. content: what does matter for top-k publication recommendation based on...

www.moving-project.eu

TraininG towards a society of data-saVvy inforMation prOfessionalsto enable open leadership INnovation

Chifumi Nishioka and Ansgar Scherp

Profiling vs. Time vs. Content: What does Matter for Top-k Publication

Recommendation based on Twitter Profiles?

Kiel University and Leibniz Information Centre for Economics (ZBW Kiel), Germany


2 of 21

Motivation• Information overload: too many papers in DLs• Recommender systems: facilitate researchers by

suggesting papers that may interest them• Collaborative filtering: cold start problem• Content-based: based on their papers

• Social media in academia• Researchers’ ideas on Twitter

[Letierce et al. 14]• Ongoing research interests

Chifumi Nishioka ([email protected])

Recommender System for scientific papers based on Twitter


3 of 21

Three Factors

(I) Profiling method• Extract features from social media items and

documents• How to model user profiles and document profiles

(II) Temporal decay function• Model the assumption that older

items are less important• Examine which temporal decay function performs well

(III) Document content• Investigate whether it is possible to make reasonable

recommendations using only titles of documents



4 of 21

Recommendation Procedure


User profiling

Compute similarity scoresbetween user profile and each of document profiles

Document profile

Recommend documents that have high similarity scores


5 of 21

Related Factors


User profiling

Compute similarity scoresbetween user profile and each of document profiles

Document profile

document corpus

Recommend documents that have high similarity scores

(I) Profilin

g method

(II) Temporal decay function

(I) Profilin

g method

(II) Temporal decay function

(III) Document content


6 of 21

Three Factors and Choices


• ConfigurationsFactor Design Choices

Profiling method CF-IDF HCF-IDF LDATemporal decay function Sliding window Exponential decay

Document content All (title + full-text) Title

strategies are experimented


7 of 21

Profiles

• Profiles: represented as a vector, where each element is a weight of a concept (i.e., feature)

• User profile for user • Based on a user’s social media items (tweets)

• Document profile for document

• User profile and document profile are made by the same method


𝑝𝑢={ h𝑤𝑒𝑖𝑔 𝑡 (𝑐 , 𝐼𝑢)∨∀𝑐∈𝐶 }

h𝑤𝑒𝑖𝑔 𝑡 (𝑐 , 𝐼𝑢)=∑𝑖∈ 𝐼𝑢

h𝑤𝑒𝑖𝑔 𝑡 (𝑐 ,𝑖)

𝑝𝑑={ h𝑤𝑒𝑖𝑔 𝑡 (𝑐 ,𝑑 )∨∀𝑐∈𝐶 }

a concept (i.e., feature) is a subject term or topic


8 of 21

Factor I: Profiling Methods

• CF-IDF [Goossen et al. 11]

• Extension of TF-IDF replacing words with concepts• Concept: a subject term coming from a taxonomy

• e.g., financial crisis, interest rate (economics)• Use a domain specific taxonomy to extract only concepts

that are relevant to the target domain


How to extract features and model users and documents

h𝑤𝑒𝑖𝑔 𝑡 ′𝑐𝑓𝑖𝑑𝑓 (𝑐 ,𝑑 )=𝑐𝑓 (𝑐 ,𝑑) ∙ 𝑙𝑜𝑔 ¿𝐷∨ ¿¿𝑑∈𝐷:𝑐∈𝑑∨¿¿

¿

1. CF-IDFKB (taxonomy) based methods2. HCF-IDF

3. LDA Topic modelingFreely available in many domains!


9 of 21



SocialRecommendation

SocialTagging

Web Searching Web Mining

SiteWrapping

Web LogAnalysis

World Wide Web

• HCF-IDF (Hierarchical CF-IDF) [Nishioka et al. 15]• Extension of CF-IDF using spreading activation• Combine the strength of the semantics with the

statistical strength

• : best spreadingactivation function

h𝑤𝑒𝑖𝑔 𝑡 ′h𝑐𝑓𝑖𝑑𝑓 (𝑐 ,𝑑 )=𝐵𝑒𝑙𝑙𝐿𝑜𝑔(𝑐 ,𝑑) ∙𝑙𝑜𝑔¿𝐷∨ ¿¿𝑑∈𝐷 :𝑐∈𝑑∨¿¿

¿

Extract concepts which are not mentioned directly by spreading activation!


10 of 21


• Latent Dirichlet Allocation (LDA) [Blei et al. 03]• Unsupervised topic modeling method• Topic model

• Document: A probability distribution over topics• Topic: A probability distribution over words

• Treat a topic as a concept

• Procedure• Construct a topic model over document corpus• Infer a topic distribution over the trained topic model in

social media items


h𝑤𝑒𝑖𝑔 𝑡 ′𝑙𝑑𝑎 (𝑐 ,𝑑 )=𝑝 (𝑐∨𝑑)


11 of 21

Factor II: Temporal Decay Function

• Final weight is given after applying decay

• Sliding window

• Give weights only concepts that appear after • Parameter setting

• for social media items• for scientific papers

• Exponential decay• Parameter setting

• for social media items• for scientific publications


𝑓 𝑠𝑤 (𝑡 )={1 for 𝑡≥ h h𝑡 𝑟𝑒𝑠0 for 𝑡< h h𝑡 𝑟𝑒𝑠

𝑓 𝑒𝑥𝑝 (𝑡 )=𝑒−(𝑡𝑐𝑢𝑟𝑟𝑒𝑛𝑡− 𝑡 )/𝜏

h𝑤𝑒𝑖𝑔 𝑡 (𝑐 , 𝑖 )= 𝑓 (𝑡 ) ∙ h𝑤𝑒𝑖𝑔 𝑡 ′ (𝑐 ,𝑖 ) h𝑤𝑒𝑖𝑔 𝑡 (𝑐 , 𝑖 )= 𝑓 (𝑡 ) ∙ h𝑤𝑒𝑖𝑔 𝑡 ′ (𝑐 ,𝑖 )


12 of 21

Factor III: Document Content

• Title• Always freely available

• All (title + full-text)• Full-text: usually not available due to legal issues• Extract full-text from PDF files


How the recommendation performance using only titles is close to using both title and full-text


13 of 21

Computing Recommendations

• Temporal Cosine Similarity for CF-IDF & HCF-IDF

• to give higher weights to newer documents• Dot Product for LDA

• Better performance than cosine similarity and Kullback-Leibler divergence [Hazen 10]


𝑠𝑖𝑚𝑡𝑐𝑜𝑠 (𝑝𝑢 ,𝑝𝑑)= 𝑓 (𝑡𝑑) ∙𝑝𝑢 ∙𝑝𝑑

¿∨𝑝𝑢∨¿ ∙∨|⃗𝑝𝑑|∨¿¿

𝑠𝑖𝑚𝑑𝑝 (𝑝𝑢 ,𝑝𝑑)=𝑝𝑢 ∙𝑝𝑑

Recommend documents that have higher similarity scores with user profile


14 of 21

Experiment Setup (1/3)

• Procedure• Input his/her public Twitter handles• The number of recommendations per strategy [Chen

et al. 10]• Ask participants to assess whether a recommended

paper is interesting or not• On average 517.54 seconds to complete the

experiment


Scenario: Recommend scientific papers in the field of economics based on users’ tweets


15 of 21


• Web application

• Metadata (i.e., author, title, year) is shown• Participants can open PDF files by clicking metadata• Order of strategies is randomized• Order of recommended papers is randomized



16 of 21


• Dataset• Scientific Papers

• 279,381 open access papers from EconBiz• EconBiz: a portal for scientific papers in economics managed

by ZBW, the German National Library of Economics• Hierarchical taxonomy

• STW, a thesaurus specialized for economics• 3,335 semantic concepts and 37,733 labels

• 123 participants from the field of economics• 21 bachelor /58 master /32 PhD /12 professor

• Metric: rankscore [Breese et al. 98]• Assumption: higher ranked items are more likely to be

viewedChifumi Nishioka ([email protected])

http://www.econbiz.de/


17 of 21

Result: Recommendation Strategy

• The strategy CF-IDF × Sliding window × All performs best with the rankscore of

• But, the statistical test shows no significant difference between the best strategy and the strategies using HCF-IDF



18 of 21

Result: Influence of Three Factors

• Three-way repeated-measure ANOVA to analyze the performance with respect to each factor

• Profiling method has the largest impact on the recommendation performance

• Best profiling method: HCF-IDFChifumi Nishioka ([email protected])


19 of 21

Insights from the Results

• Profiling method has the largest impact• Best profiling method: HCF-IDF

• Spreading activation mitigates sparseness• Works for users who have less tweets

• Usually, full-texts are unavailable for TDM• Easy to employ HCF-IDF in many different fields

• MeSH for medicine, ACM CCS for computer science


Advantage: HCF-IDF can make good recommendations based on only titles


20 of 21

Conclusion

• User experiment with three factors• Profiling method: CF-IDF / HCF-IDF / LDA• Decay function: sliding window / exponential decay• Document content: Title / All (title + full-text)

• Result• Profiling method has the largest impact• Best profiling method: HCF-IDF• Advantage of HCF-IDF: It performs well even if only

titles are available


Recommender system for scientific papers based on social media items


21 of 21

Special thanks to SIGIR Student Travel Grant

Project consortium and funding agency


MOVING is funded by the EU Horizon 2020 Programme under the project number INSO-4-2015: 693092

Our demo is online!http://amygdala.informatik.uni-kiel.de/Demo/TwitterAccount


22 of 21

Appendix



23 of 21

Reference• [Blei and Lafferty 06] D. M. Blei and J. D. Lafferty. Dynamic topic models. ICML, 2006.• [Blei et al. 03] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent dirichlet allocation. JMLR,

2003.• [Breese et al. 98] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of

predictive algorithms for collaborative filtering. UAI, 1998.• [Goossen et al. 11] F. Goossen, W. IJntema, F. Frasincar, F. Hogenboom, and U. Kaymak.

News personalization using the CF-IDF semantic recommender. WIMS, 2011.• [Griffiths and Steyvers 04] T. L. Griffiths and M. Steyvers. Finding scientific topics. NAS,

2004. • [Hazen 10] T. J. Hazen. Direct and latent modeling techniques for computing spoken

document similarity. Spoken Language Technology, 2010.• [Kapanipathi et al. 14] P. Kapanipathi, P. Jain, C. Venkataramani, and A. Sheth. User

interests identification on Twitter using a hierarchical knowledge base. ESWC, 2014.• [Letierce et al. 10] J. Letierce, A. Passant, J. Breslin, and S. Decker. Understanding how

Twitter is used to spread scientific messages. WebSci, 2010.• [Nascimento et al. 11] C. Nascimento, A. H. F. Laender, A. S. da Silva, and M. A.

Gonçalves. A source independent framework for research paper recommendation. JCDL, 2011.



24 of 21

Reference• [Nishioka et al. 15] C. Nishioka, G. Große-Bölting, A. Scherp. Influence of time on user

profiling and recommending researchers in social media. i-KNOW, 2015. • [Shen et al. 13] W. Shen, J. Wang, P. Luo, and M. Wang. Linking named entities in tweets

with knowledge base via user interest modeling. KDD, 2013• [Sugiyama and Kan 10] K. Sugiyama and M.-Y. Kan. Scholarly paper recommendation via

user’s recent research interests. JCDL, 2010.



25 of 21

Evaluation Metrics

• rankscore [Breese et al. 98]• Posit that each successive item in a list is less likely to

be viewed with an exponential decay

• Set a parameter , along with [Breese et al. 98]• Other metrics

• Precision• Mean Average Precision (MAP)• Mean Reciprocal Rank (MRR)• normalized Discounter Cumulative Gain (nDCG)


𝑟𝑎𝑛𝑘𝑠𝑐𝑜𝑟𝑒= ∑𝑑∈h𝑖𝑡𝑠

1

2𝑟𝑎𝑛𝑘𝑑−1𝜃−1

h𝑤𝑒𝑖𝑔 𝑡 (𝑐 , 𝑖 )


26 of 21

Participants

• Collecting participants• Mailing lists, tweets, and word-of-month• 134 started the experiment and 123 completed

• Demographics of participants• 96 male / 27 female• 32.83 years old on average (SD: 7.34)• 21 bachelor / 58 master / 32 a PhD / 12 professor• 83 working in academia / 40 working in industry

• Incentive for the participation• Get to know his / her most similar economist among

26 famous economists• Chance to get one of two Amazon vouchers (50 EUR)



27 of 21

Tweets of Participants

• Extract user’s tweets via Twitter API• Enable to extract at most 3,200 tweets per user

• Collect only tweets in English• The number of English tweets per participant

• Average: 1096.82 English tweets (SD: 1048.46)• Max: 3192• Min: 2

• Criteria for the participation• Participants who had no tweet in the last 250 days

could not participate in the experiment• Five were rejected for this reason



28 of 21

Dataset: Scientific Publications

• Result of collaboration with EconBiz• 279,381 papers in English• EconBiz: a portal for scientific publications om

economics• Procedure

• Seed list: 1 million URLs of open access papers• Successfully download 413,098 papers in PDF• Convert PDFs into texts using Apache PDFBox• Detect languages of 413,098 papers• Finally, get 279,381 papers in English



29 of 21

Dataset: Taxonomy

• STW (ver. 8.12) enriched by DBpedia redirects• Maintained by ZBW• Specialized for economics• 6,335 semantic concepts and 11,679 labels in English

• Enrichment process• Goal: get more synonymous labels• Use the official mapping that connects STW concepts

with DBpedia concepts• Get redirects from Dbpedia concepts

• e.g., “Telecommunications Operator” and “Telephone companies”

• Finally, 6,335 semantic concepts and 37,733 labels



30 of 21

Latent Dirichlet Allocation (LDA)


source: D. M. Blei. Probabilistic topic models, CACM, 2012.


31 of 21

Latent Dirichlet Allocation (LDA)

• Implementation: JGibbLDA• Preprocessing

• Lemmatization and stop words removal• Remove words that appear in fewer than 25 scientific

publications along with [Blei and Lafferty 06]• Parameters

• Hyper-parameters: and • suggested by [Griffiths and Steyvers 04]

• The number of topics: • Optimized by the log likelihood• Experimented

• The number of iterations:



32 of 21

Statistical Test for Strategies (1/2)

• Mauchly’s sphericity test• Verify if the variances of the rankscores of the twelve

strategies are equal• Reveal a violation of sphericity in the strategies (, ),

which leads to positively biased F-statistics and increases false positives

• One-way repeated-measure ANOVA with a Greenhouse-Geisser correction of • Reveals a significant difference (, )


Investigate the difference of the recommendation performance among the twelve strategies


33 of 21

Statistical Test for Strategies (2/2)

• Shaffer’s modified sequentially rejective Bonferroni procedure (Shaffer’s MSRB procedure)



34 of 21

Statistical Test for Three Factors

• Mendoza’s sphericity test• Adopt to multi-way repeated-measure ANOVA• Again, reveal a violation of sphericity

• Three-way repeated-measure ANOVA with a Greenhouse-Geisser corrections


Investigate the difference of the recommendation performance among the twelve strategies

Global ,

Profiling Method ,

Profiling Method × Decay Function ,

Profiling Method × Document Content (,


35 of 21

Result: Precision

• Values are similar to ones in rankscore• The order of the strategies are identical



36 of 21

Result: nDCG

• Values are similar to ones in rankscore• The order of the strategies are identical



37 of 21

Result: Mean Average Precision

• Values are higher than ones of rankscore• The order of the strategies are almost same



38 of 21

Result: Mean Reciprocal Rank

• Values are higher than ones of rankscore• The order of the strategies are almost same



39 of 21

Result: Post-hoc Analysis



40 of 21

Result: Influence of Three Factors

• Profiling Method• This factor has the biggest impact• HCF-IDF is the best profiling method, followed by CF-

IDF and LDA• Document Content

• All (both title and full-text) significantly performs better than Title except when using HCF-IDF

• Profiling Method × Document Content• All (both title and full-text) is better choice for CF-IDF• Document Content makes no difference for HCF-IDF



41 of 21

Result: Demographic Factors (1/3)

• Gender• Female participants are more likely to evaluate

recommendations as interesting• However, no difference about how each strategy

performs compared to the others, due to no significant difference in the factor gender × strategy

• Age• No influence on recommendation performance


Investigate whether each demographic factor has an influence on recommendation performance


42 of 21


• Highest academic degree• Participants with bachelor are more likely to evaluate

recommendations as interesting than those with lecturer/professor

• But, no difference about how each strategy performs compared to the others

• Major• Manually classify participants into the two groups

• Participants majoring in economics () and other• No influence on recommendation performance



43 of 21


• Years of profession• On average, working in years (SD: )• Divide participants into the three groups

• Participants working for years (), working for years (), and working for years

• No influence on recommendation performance• Employment type

• Participants working in academia () and industry• No influence on recommendation performance



44 of 21

Result: Click Rates (1/2)



45 of 21

Result: Click Rates (2/2)



46 of 21

List of Taxonomies

• Maintained by W3• https://www.w3.org/2001/sw/wiki/SKOS/Datasets


source: https://www.w3.org/2001/sw/wiki/SKOS/Datasets

Taxonomy DomainThesaurus for the Social Sciences Social scienceNASA Taxonomy Technology areasLinked Life data BiomedicineMedical Subject Headings (MeSH) Biomedicine

Australian education vocablaries EducationUNESCO Thesaurus Education, culture, natural sciences, and

social and human sciences

ACM Computing Classification System Computer science


47 of 21

Insights from the Results

• Profiling method has the largest impact• Best profiling method: HCF-IDF• Only HCF-IDF enables to make reasonable

recommendations based on only titles• CF-IDF requires full-texts• LDA perform poorly even with full-texts

• Possible reason: impossible to infer topic distribution from social media items, which are short and sparse

• Usually, full-texts are not available for legal issues• Easy to employ HCF-IDF in many different fields

• e.g., MeSH for medicine, ACM CCS for compute science