profiling vs. time vs. content: what does matter for top-k publication recommendation based on...

47
www.moving- project.eu TraininG towards a society of data-saVvy inforMation prOfessionals to enable open leadership INnovation Chifumi Nishioka and Ansgar Scherp Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recommendation based on Twitter Profiles? Kiel University and Leibniz Information Centre for Economics (ZBW Kiel), Germany

Upload: moving-project

Post on 20-Jan-2017

172 views

Category:

Technology


1 download

TRANSCRIPT

Page 1: Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recommendation based on Twitter Profiles?

www.moving-project.eu

TraininG towards a society of data-saVvy inforMation prOfessionalsto enable open leadership INnovation

Chifumi Nishioka and Ansgar Scherp

Profiling vs. Time vs. Content: What does Matter for Top-k Publication

Recommendation based on Twitter Profiles?

Kiel University and Leibniz Information Centre for Economics (ZBW Kiel), Germany

Page 2: Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recommendation based on Twitter Profiles?

www.moving-project.eu

2 of 21

Motivation• Information overload: too many papers in DLs• Recommender systems: facilitate researchers by

suggesting papers that may interest them• Collaborative filtering: cold start problem• Content-based: based on their papers

• Social media in academia• Researchers’ ideas on Twitter

[Letierce et al. 14]• Ongoing research interests

Chifumi Nishioka ([email protected])

Recommender System for scientific papers based on Twitter

Page 3: Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recommendation based on Twitter Profiles?

www.moving-project.eu

3 of 21

Three Factors

(I) Profiling method• Extract features from social media items and

documents• How to model user profiles and document profiles

(II) Temporal decay function• Model the assumption that older

items are less important• Examine which temporal decay function performs well

(III) Document content• Investigate whether it is possible to make reasonable

recommendations using only titles of documents

Chifumi Nishioka ([email protected])

Page 4: Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recommendation based on Twitter Profiles?

www.moving-project.eu

4 of 21

Recommendation Procedure

Chifumi Nishioka ([email protected])

User profiling

Compute similarity scoresbetween user profile and each of document profiles

Document profile

Recommend documents that have high similarity scores

Page 5: Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recommendation based on Twitter Profiles?

www.moving-project.eu

5 of 21

Related Factors

Chifumi Nishioka ([email protected])

User profiling

Compute similarity scoresbetween user profile and each of document profiles

Document profile

document corpus

Recommend documents that have high similarity scores

(I) Profilin

g method

(II) Temporal decay function

(I) Profilin

g method

(II) Temporal decay function

(III) Document content

Page 6: Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recommendation based on Twitter Profiles?

www.moving-project.eu

6 of 21

Three Factors and Choices

Chifumi Nishioka ([email protected])

• ConfigurationsFactor Design Choices

Profiling method CF-IDF HCF-IDF LDATemporal decay function Sliding window Exponential decay

Document content All (title + full-text) Title

strategies are experimented

Page 7: Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recommendation based on Twitter Profiles?

www.moving-project.eu

7 of 21

Profiles

• Profiles: represented as a vector, where each element is a weight of a concept (i.e., feature)

• User profile for user • Based on a user’s social media items (tweets)

• Document profile for document

• User profile and document profile are made by the same method

Chifumi Nishioka ([email protected])

𝑝𝑢={ h𝑤𝑒𝑖𝑔 𝑡 (𝑐 , 𝐼𝑢)∨∀𝑐∈𝐶 }

h𝑤𝑒𝑖𝑔 𝑡 (𝑐 , 𝐼𝑢)=∑𝑖∈ 𝐼𝑢

h𝑤𝑒𝑖𝑔 𝑡 (𝑐 ,𝑖)

𝑝𝑑={ h𝑤𝑒𝑖𝑔 𝑡 (𝑐 ,𝑑 )∨∀𝑐∈𝐶 }

a concept (i.e., feature) is a subject term or topic

Page 8: Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recommendation based on Twitter Profiles?

www.moving-project.eu

8 of 21

Factor I: Profiling Methods

• CF-IDF [Goossen et al. 11]

• Extension of TF-IDF replacing words with concepts• Concept: a subject term coming from a taxonomy

• e.g., financial crisis, interest rate (economics)• Use a domain specific taxonomy to extract only concepts

that are relevant to the target domain

Chifumi Nishioka ([email protected])

How to extract features and model users and documents

h𝑤𝑒𝑖𝑔 𝑡 ′𝑐𝑓𝑖𝑑𝑓 (𝑐 ,𝑑 )=𝑐𝑓 (𝑐 ,𝑑) ∙ 𝑙𝑜𝑔 ¿𝐷∨ ¿¿𝑑∈𝐷:𝑐∈𝑑∨¿¿

¿

1. CF-IDFKB (taxonomy) based methods2. HCF-IDF

3. LDA Topic modelingFreely available in many domains!

Page 9: Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recommendation based on Twitter Profiles?

www.moving-project.eu

9 of 21

Factor I: Profiling Methods

Chifumi Nishioka ([email protected])

SocialRecommendation

SocialTagging

Web Searching Web Mining

SiteWrapping

Web LogAnalysis

World Wide Web

 

 

 

• HCF-IDF (Hierarchical CF-IDF) [Nishioka et al. 15]• Extension of CF-IDF using spreading activation• Combine the strength of the semantics with the

statistical strength

• : best spreadingactivation function

h𝑤𝑒𝑖𝑔 𝑡 ′h𝑐𝑓𝑖𝑑𝑓 (𝑐 ,𝑑 )=𝐵𝑒𝑙𝑙𝐿𝑜𝑔(𝑐 ,𝑑) ∙𝑙𝑜𝑔¿𝐷∨ ¿¿𝑑∈𝐷 :𝑐∈𝑑∨¿¿

¿

Extract concepts which are not mentioned directly by spreading activation!

Page 10: Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recommendation based on Twitter Profiles?

www.moving-project.eu

10 of 21

Factor I: Profiling Methods

• Latent Dirichlet Allocation (LDA) [Blei et al. 03]• Unsupervised topic modeling method• Topic model

• Document: A probability distribution over topics• Topic: A probability distribution over words

• Treat a topic as a concept

• Procedure• Construct a topic model over document corpus• Infer a topic distribution over the trained topic model in

social media items

Chifumi Nishioka ([email protected])

h𝑤𝑒𝑖𝑔 𝑡 ′𝑙𝑑𝑎 (𝑐 ,𝑑 )=𝑝 (𝑐∨𝑑)

Page 11: Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recommendation based on Twitter Profiles?

www.moving-project.eu

11 of 21

Factor II: Temporal Decay Function

• Final weight is given after applying decay

• Sliding window

• Give weights only concepts that appear after • Parameter setting

• for social media items• for scientific papers

• Exponential decay• Parameter setting

• for social media items• for scientific publications

Chifumi Nishioka ([email protected])

𝑓 𝑠𝑤 (𝑡 )={1 for 𝑡≥ h h𝑡 𝑟𝑒𝑠0 for 𝑡< h h𝑡 𝑟𝑒𝑠

𝑓 𝑒𝑥𝑝 (𝑡 )=𝑒−(𝑡𝑐𝑢𝑟𝑟𝑒𝑛𝑡− 𝑡 )/𝜏

h𝑤𝑒𝑖𝑔 𝑡 (𝑐 , 𝑖 )= 𝑓 (𝑡 ) ∙ h𝑤𝑒𝑖𝑔 𝑡 ′ (𝑐 ,𝑖 ) h𝑤𝑒𝑖𝑔 𝑡 (𝑐 , 𝑖 )= 𝑓 (𝑡 ) ∙ h𝑤𝑒𝑖𝑔 𝑡 ′ (𝑐 ,𝑖 )

Page 12: Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recommendation based on Twitter Profiles?

www.moving-project.eu

12 of 21

Factor III: Document Content

• Title• Always freely available

• All (title + full-text)• Full-text: usually not available due to legal issues• Extract full-text from PDF files

Chifumi Nishioka ([email protected])

How the recommendation performance using only titles is close to using both title and full-text

Page 13: Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recommendation based on Twitter Profiles?

www.moving-project.eu

13 of 21

Computing Recommendations

• Temporal Cosine Similarity for CF-IDF & HCF-IDF

• to give higher weights to newer documents• Dot Product for LDA

• Better performance than cosine similarity and Kullback-Leibler divergence [Hazen 10]

Chifumi Nishioka ([email protected])

𝑠𝑖𝑚𝑡𝑐𝑜𝑠 (𝑝𝑢 ,𝑝𝑑)= 𝑓 (𝑡𝑑) ∙𝑝𝑢 ∙𝑝𝑑

¿∨𝑝𝑢∨¿ ∙∨|⃗𝑝𝑑|∨¿¿

𝑠𝑖𝑚𝑑𝑝 (𝑝𝑢 ,𝑝𝑑)=𝑝𝑢 ∙𝑝𝑑

Recommend documents that have higher similarity scores with user profile

Page 14: Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recommendation based on Twitter Profiles?

www.moving-project.eu

14 of 21

Experiment Setup (1/3)

• Procedure• Input his/her public Twitter handles• The number of recommendations per strategy [Chen

et al. 10]• Ask participants to assess whether a recommended

paper is interesting or not• On average 517.54 seconds to complete the

experiment

Chifumi Nishioka ([email protected])

Scenario: Recommend scientific papers in the field of economics based on users’ tweets

Page 15: Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recommendation based on Twitter Profiles?

www.moving-project.eu

15 of 21

Experiment Setup (2/3)

• Web application

• Metadata (i.e., author, title, year) is shown• Participants can open PDF files by clicking metadata• Order of strategies is randomized• Order of recommended papers is randomized

Chifumi Nishioka ([email protected])

Page 16: Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recommendation based on Twitter Profiles?

www.moving-project.eu

16 of 21

Experiment Setup (3/3)

• Dataset• Scientific Papers

• 279,381 open access papers from EconBiz• EconBiz: a portal for scientific papers in economics managed

by ZBW, the German National Library of Economics• Hierarchical taxonomy

• STW, a thesaurus specialized for economics• 3,335 semantic concepts and 37,733 labels

• 123 participants from the field of economics• 21 bachelor /58 master /32 PhD /12 professor

• Metric: rankscore [Breese et al. 98]• Assumption: higher ranked items are more likely to be

viewedChifumi Nishioka ([email protected])

http://www.econbiz.de/

Page 17: Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recommendation based on Twitter Profiles?

www.moving-project.eu

17 of 21

Result: Recommendation Strategy

• The strategy CF-IDF × Sliding window × All performs best with the rankscore of

• But, the statistical test shows no significant difference between the best strategy and the strategies using HCF-IDF

Chifumi Nishioka ([email protected])

Page 18: Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recommendation based on Twitter Profiles?

www.moving-project.eu

18 of 21

Result: Influence of Three Factors

• Three-way repeated-measure ANOVA to analyze the performance with respect to each factor

• Profiling method has the largest impact on the recommendation performance

• Best profiling method: HCF-IDFChifumi Nishioka ([email protected])

Page 19: Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recommendation based on Twitter Profiles?

www.moving-project.eu

19 of 21

Insights from the Results

• Profiling method has the largest impact• Best profiling method: HCF-IDF

• Spreading activation mitigates sparseness• Works for users who have less tweets

• Usually, full-texts are unavailable for TDM• Easy to employ HCF-IDF in many different fields

• MeSH for medicine, ACM CCS for computer science

Chifumi Nishioka ([email protected])

Advantage: HCF-IDF can make good recommendations based on only titles

Page 20: Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recommendation based on Twitter Profiles?

www.moving-project.eu

20 of 21

Conclusion

• User experiment with three factors• Profiling method: CF-IDF / HCF-IDF / LDA• Decay function: sliding window / exponential decay• Document content: Title / All (title + full-text)

• Result• Profiling method has the largest impact• Best profiling method: HCF-IDF• Advantage of HCF-IDF: It performs well even if only

titles are available

Chifumi Nishioka ([email protected])

Recommender system for scientific papers based on social media items

Page 21: Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recommendation based on Twitter Profiles?

www.moving-project.eu

21 of 21

Special thanks to SIGIR Student Travel Grant

Project consortium and funding agency

Chifumi Nishioka ([email protected])

MOVING is funded by the EU Horizon 2020 Programme under the project number INSO-4-2015: 693092

Our demo is online!http://amygdala.informatik.uni-kiel.de/Demo/TwitterAccount

Page 22: Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recommendation based on Twitter Profiles?

www.moving-project.eu

22 of 21

Appendix

Chifumi Nishioka ([email protected])

Page 23: Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recommendation based on Twitter Profiles?

www.moving-project.eu

23 of 21

Reference• [Blei and Lafferty 06] D. M. Blei and J. D. Lafferty. Dynamic topic models. ICML, 2006.• [Blei et al. 03] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent dirichlet allocation. JMLR,

2003.• [Breese et al. 98] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of

predictive algorithms for collaborative filtering. UAI, 1998.• [Goossen et al. 11] F. Goossen, W. IJntema, F. Frasincar, F. Hogenboom, and U. Kaymak.

News personalization using the CF-IDF semantic recommender. WIMS, 2011.• [Griffiths and Steyvers 04] T. L. Griffiths and M. Steyvers. Finding scientific topics. NAS,

2004. • [Hazen 10] T. J. Hazen. Direct and latent modeling techniques for computing spoken

document similarity. Spoken Language Technology, 2010.• [Kapanipathi et al. 14] P. Kapanipathi, P. Jain, C. Venkataramani, and A. Sheth. User

interests identification on Twitter using a hierarchical knowledge base. ESWC, 2014.• [Letierce et al. 10] J. Letierce, A. Passant, J. Breslin, and S. Decker. Understanding how

Twitter is used to spread scientific messages. WebSci, 2010.• [Nascimento et al. 11] C. Nascimento, A. H. F. Laender, A. S. da Silva, and M. A.

Gonçalves. A source independent framework for research paper recommendation. JCDL, 2011.

Chifumi Nishioka ([email protected])

Page 24: Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recommendation based on Twitter Profiles?

www.moving-project.eu

24 of 21

Reference• [Nishioka et al. 15] C. Nishioka, G. Große-Bölting, A. Scherp. Influence of time on user

profiling and recommending researchers in social media. i-KNOW, 2015. • [Shen et al. 13] W. Shen, J. Wang, P. Luo, and M. Wang. Linking named entities in tweets

with knowledge base via user interest modeling. KDD, 2013• [Sugiyama and Kan 10] K. Sugiyama and M.-Y. Kan. Scholarly paper recommendation via

user’s recent research interests. JCDL, 2010.

Chifumi Nishioka ([email protected])

Page 25: Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recommendation based on Twitter Profiles?

www.moving-project.eu

25 of 21

Evaluation Metrics

• rankscore [Breese et al. 98]• Posit that each successive item in a list is less likely to

be viewed with an exponential decay

• Set a parameter , along with [Breese et al. 98]• Other metrics

• Precision• Mean Average Precision (MAP)• Mean Reciprocal Rank (MRR)• normalized Discounter Cumulative Gain (nDCG)

Chifumi Nishioka ([email protected])

𝑟𝑎𝑛𝑘𝑠𝑐𝑜𝑟𝑒= ∑𝑑∈h𝑖𝑡𝑠

1

2𝑟𝑎𝑛𝑘𝑑−1𝜃−1

h𝑤𝑒𝑖𝑔 𝑡 (𝑐 , 𝑖 )

Page 26: Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recommendation based on Twitter Profiles?

www.moving-project.eu

26 of 21

Participants

• Collecting participants• Mailing lists, tweets, and word-of-month• 134 started the experiment and 123 completed

• Demographics of participants• 96 male / 27 female• 32.83 years old on average (SD: 7.34)• 21 bachelor / 58 master / 32 a PhD / 12 professor• 83 working in academia / 40 working in industry

• Incentive for the participation• Get to know his / her most similar economist among

26 famous economists• Chance to get one of two Amazon vouchers (50 EUR)

Chifumi Nishioka ([email protected])

Page 27: Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recommendation based on Twitter Profiles?

www.moving-project.eu

27 of 21

Tweets of Participants

• Extract user’s tweets via Twitter API• Enable to extract at most 3,200 tweets per user

• Collect only tweets in English• The number of English tweets per participant

• Average: 1096.82 English tweets (SD: 1048.46)• Max: 3192• Min: 2

• Criteria for the participation• Participants who had no tweet in the last 250 days

could not participate in the experiment• Five were rejected for this reason

Chifumi Nishioka ([email protected])

Page 28: Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recommendation based on Twitter Profiles?

www.moving-project.eu

28 of 21

Dataset: Scientific Publications

• Result of collaboration with EconBiz• 279,381 papers in English• EconBiz: a portal for scientific publications om

economics• Procedure

• Seed list: 1 million URLs of open access papers• Successfully download 413,098 papers in PDF• Convert PDFs into texts using Apache PDFBox• Detect languages of 413,098 papers• Finally, get 279,381 papers in English

Chifumi Nishioka ([email protected])

Page 29: Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recommendation based on Twitter Profiles?

www.moving-project.eu

29 of 21

Dataset: Taxonomy

• STW (ver. 8.12) enriched by DBpedia redirects• Maintained by ZBW• Specialized for economics• 6,335 semantic concepts and 11,679 labels in English

• Enrichment process• Goal: get more synonymous labels• Use the official mapping that connects STW concepts

with DBpedia concepts• Get redirects from Dbpedia concepts

• e.g., “Telecommunications Operator” and “Telephone companies”

• Finally, 6,335 semantic concepts and 37,733 labels

Chifumi Nishioka ([email protected])

Page 30: Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recommendation based on Twitter Profiles?

www.moving-project.eu

30 of 21

Latent Dirichlet Allocation (LDA)

Chifumi Nishioka ([email protected])

source: D. M. Blei. Probabilistic topic models, CACM, 2012.

Page 31: Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recommendation based on Twitter Profiles?

www.moving-project.eu

31 of 21

Latent Dirichlet Allocation (LDA)

• Implementation: JGibbLDA• Preprocessing

• Lemmatization and stop words removal• Remove words that appear in fewer than 25 scientific

publications along with [Blei and Lafferty 06]• Parameters

• Hyper-parameters: and • suggested by [Griffiths and Steyvers 04]

• The number of topics: • Optimized by the log likelihood• Experimented

• The number of iterations:

Chifumi Nishioka ([email protected])

Page 32: Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recommendation based on Twitter Profiles?

www.moving-project.eu

32 of 21

Statistical Test for Strategies (1/2)

• Mauchly’s sphericity test• Verify if the variances of the rankscores of the twelve

strategies are equal• Reveal a violation of sphericity in the strategies (, ),

which leads to positively biased F-statistics and increases false positives

• One-way repeated-measure ANOVA with a Greenhouse-Geisser correction of • Reveals a significant difference (, )

Chifumi Nishioka ([email protected])

Investigate the difference of the recommendation performance among the twelve strategies

Page 33: Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recommendation based on Twitter Profiles?

www.moving-project.eu

33 of 21

Statistical Test for Strategies (2/2)

• Shaffer’s modified sequentially rejective Bonferroni procedure (Shaffer’s MSRB procedure)

Chifumi Nishioka ([email protected])

Page 34: Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recommendation based on Twitter Profiles?

www.moving-project.eu

34 of 21

Statistical Test for Three Factors

• Mendoza’s sphericity test• Adopt to multi-way repeated-measure ANOVA• Again, reveal a violation of sphericity

• Three-way repeated-measure ANOVA with a Greenhouse-Geisser corrections

Chifumi Nishioka ([email protected])

Investigate the difference of the recommendation performance among the twelve strategies

Global ,

Profiling Method ,

Profiling Method × Decay Function ,

Profiling Method × Document Content (,

Page 35: Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recommendation based on Twitter Profiles?

www.moving-project.eu

35 of 21

Result: Precision

• Values are similar to ones in rankscore• The order of the strategies are identical

Chifumi Nishioka ([email protected])

Page 36: Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recommendation based on Twitter Profiles?

www.moving-project.eu

36 of 21

Result: nDCG

• Values are similar to ones in rankscore• The order of the strategies are identical

Chifumi Nishioka ([email protected])

Page 37: Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recommendation based on Twitter Profiles?

www.moving-project.eu

37 of 21

Result: Mean Average Precision

• Values are higher than ones of rankscore• The order of the strategies are almost same

Chifumi Nishioka ([email protected])

Page 38: Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recommendation based on Twitter Profiles?

www.moving-project.eu

38 of 21

Result: Mean Reciprocal Rank

• Values are higher than ones of rankscore• The order of the strategies are almost same

Chifumi Nishioka ([email protected])

Page 39: Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recommendation based on Twitter Profiles?

www.moving-project.eu

39 of 21

Result: Post-hoc Analysis

Chifumi Nishioka ([email protected])

Page 40: Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recommendation based on Twitter Profiles?

www.moving-project.eu

40 of 21

Result: Influence of Three Factors

• Profiling Method• This factor has the biggest impact• HCF-IDF is the best profiling method, followed by CF-

IDF and LDA• Document Content

• All (both title and full-text) significantly performs better than Title except when using HCF-IDF

• Profiling Method × Document Content• All (both title and full-text) is better choice for CF-IDF• Document Content makes no difference for HCF-IDF

Chifumi Nishioka ([email protected])

Page 41: Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recommendation based on Twitter Profiles?

www.moving-project.eu

41 of 21

Result: Demographic Factors (1/3)

• Gender• Female participants are more likely to evaluate

recommendations as interesting• However, no difference about how each strategy

performs compared to the others, due to no significant difference in the factor gender × strategy

• Age• No influence on recommendation performance

Chifumi Nishioka ([email protected])

Investigate whether each demographic factor has an influence on recommendation performance

Page 42: Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recommendation based on Twitter Profiles?

www.moving-project.eu

42 of 21

Result: Demographic Factors (2/3)

• Highest academic degree• Participants with bachelor are more likely to evaluate

recommendations as interesting than those with lecturer/professor

• But, no difference about how each strategy performs compared to the others

• Major• Manually classify participants into the two groups

• Participants majoring in economics () and other• No influence on recommendation performance

Chifumi Nishioka ([email protected])

Page 43: Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recommendation based on Twitter Profiles?

www.moving-project.eu

43 of 21

Result: Demographic Factors (3/3)

• Years of profession• On average, working in years (SD: )• Divide participants into the three groups

• Participants working for years (), working for years (), and working for years

• No influence on recommendation performance• Employment type

• Participants working in academia () and industry• No influence on recommendation performance

Chifumi Nishioka ([email protected])

Page 44: Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recommendation based on Twitter Profiles?

www.moving-project.eu

44 of 21

Result: Click Rates (1/2)

Chifumi Nishioka ([email protected])

Page 45: Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recommendation based on Twitter Profiles?

www.moving-project.eu

45 of 21

Result: Click Rates (2/2)

Chifumi Nishioka ([email protected])

Page 46: Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recommendation based on Twitter Profiles?

www.moving-project.eu

46 of 21

List of Taxonomies

• Maintained by W3• https://www.w3.org/2001/sw/wiki/SKOS/Datasets

Chifumi Nishioka ([email protected])

source: https://www.w3.org/2001/sw/wiki/SKOS/Datasets

Taxonomy DomainThesaurus for the Social Sciences Social scienceNASA Taxonomy Technology areasLinked Life data BiomedicineMedical Subject Headings (MeSH) Biomedicine

Australian education vocablaries EducationUNESCO Thesaurus Education, culture, natural sciences, and

social and human sciences

ACM Computing Classification System Computer science

Page 47: Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recommendation based on Twitter Profiles?

www.moving-project.eu

47 of 21

Insights from the Results

• Profiling method has the largest impact• Best profiling method: HCF-IDF• Only HCF-IDF enables to make reasonable

recommendations based on only titles• CF-IDF requires full-texts• LDA perform poorly even with full-texts

• Possible reason: impossible to infer topic distribution from social media items, which are short and sparse

• Usually, full-texts are not available for legal issues• Easy to employ HCF-IDF in many different fields

• e.g., MeSH for medicine, ACM CCS for compute science

Chifumi Nishioka ([email protected])