mendeley: crowdsourcing and recommending research on a large scale

Mendeley: crowdsourcing and recommending research on a large scale Kris Jack, PhD Data Mining Team Lead

Upload: kris-jack

Post on 28-May-2015

DESCRIPTION

I was invited to be the keynote speaker at a special track on Recommendation, Data Sharing and Research Practices in Science 2.0 at the I-KNOW 2011 conference (http://i-know.tugraz.at/) on 2011/09/07. The talk presents the challenges involved in crowdsourcing the world's largest research catalogue and then building a recommendation service on top of it that scales to serve millions of users.

TRANSCRIPT

Page 1: Mendeley: crowdsourcing and recommending research on a large scale

Mendeley: crowdsourcing and recommending research on a large scale

Kris Jack, PhD

Data Mining Team Lead

Page 2: Mendeley: crowdsourcing and recommending research on a large scale

Summary

➔ what is mendeley?

➔ crowdsourcing on a large scale

➔ recommendations on a large scale

➔ data for you

Page 3: Mendeley: crowdsourcing and recommending research on a large scale

Mendeley is...

...a startup company

...going to change the way that we do research...

Page 4: Mendeley: crowdsourcing and recommending research on a large scale

Mendeley provides tools to help users...

...organise their research

...collaborate with one another

...discover new research

Page 10: Mendeley: crowdsourcing and recommending research on a large scale

Summary

➔ what is mendeley?

➔ crowdsourcing on a large scale

➔ recommendations on a large scale

➔ data for you

Page 11: Mendeley: crowdsourcing and recommending research on a large scale

Last.fm / Mendeley

Last.fm works like this:

1) Install “Audioscrobbler”

2) Listen to music

3) Last.fm builds your music profile and recommends music you might also like

...and it’s the world’s largest open music database!

Page 12: Mendeley: crowdsourcing and recommending research on a large scale

Last.fm            Mendeley

music libraries    research libraries
artists            researchers
songs              papers
genres             disciplines

Mendeley is the world’s largest crowdsourced research catalogue!

Screenshot taken from www.mendeley.com on 04/09/11

Page 13: Mendeley: crowdsourcing and recommending research on a large scale

Catalogue Crowdsourcing: System Requirements

assimilate research artefacts into catalogue in real time (PDFs + citation metadata)

recognise duplicate and non-duplicate artefacts in noisy input

Page 14: Mendeley: crowdsourcing and recommending research on a large scale

articles

catalogue

catalogue generator

Main types of input:

→ article PDFs
→ article metadata (e.g. reference)

Main sources of input:

→ Mendeley Desktop
→ Mendeley Web Importer
→ External catalogue imports (e.g. ArXiv)
→ External catalogue lookups (e.g. CrossRef)

Page 15: Mendeley: crowdsourcing and recommending research on a large scale

articles

catalogue

catalogue generator

Aims:

→ Cluster documents together
→ Generate catalogue entries

Page 16: Mendeley: crowdsourcing and recommending research on a large scale

articles

catalogue

catalogue generator

Process:

→ Filehash check (SHA-1)
→ Identifier check (e.g. PubMed id)
→ Document fingerprint (full text)
→ Metadata similarity check
→ Update individual article page
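The matching cascade on this slide can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Mendeley's implementation: the `Article` and `Catalogue` classes, their field names, the use of a hash over normalised text as a stand-in for the document fingerprint, and the 0.9 title-similarity threshold are all hypothetical.

```python
import hashlib
import re
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class Article:
    """Hypothetical incoming record: optional PDF bytes plus citation metadata."""
    title: str
    pdf_bytes: bytes = b""
    full_text: str = ""
    identifiers: dict = field(default_factory=dict)  # e.g. {"pmid": "12345"}


def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def fingerprint(text: str) -> str:
    # Stand-in fingerprint (assumption): hash of the lower-cased text with
    # punctuation and whitespace runs collapsed to single spaces.
    normalised = re.sub(r"\W+", " ", text.lower()).strip()
    return sha1_hex(normalised.encode("utf-8"))


class Catalogue:
    def __init__(self, title_threshold: float = 0.9):
        self.title_threshold = title_threshold  # assumed cut-off
        self.entries = []   # each entry is a list of clustered Articles
        self.by_file = {}   # file hash -> entry index
        self.by_id = {}     # (scheme, value) -> entry index
        self.by_text = {}   # text fingerprint -> entry index

    def _keys(self, art):
        """Yield (index dict, key) pairs for every exact-match key the article has."""
        if art.pdf_bytes:
            yield self.by_file, sha1_hex(art.pdf_bytes)
        for scheme, value in art.identifiers.items():
            yield self.by_id, (scheme, value)
        if art.full_text:
            yield self.by_text, fingerprint(art.full_text)

    def _find(self, art) -> Optional[int]:
        # Steps 1-3: exact checks, in the order listed on the slide.
        for index, key in self._keys(art):
            if key in index:
                return index[key]
        # Step 4: metadata similarity, reduced here to a fuzzy title match.
        for i, cluster in enumerate(self.entries):
            ratio = SequenceMatcher(None, art.title.lower(),
                                    cluster[0].title.lower()).ratio()
            if ratio >= self.title_threshold:
                return i
        return None

    def add(self, art) -> int:
        """Cluster the article into an entry and return that entry's index."""
        i = self._find(art)
        if i is None:
            self.entries.append([])
            i = len(self.entries) - 1
        self.entries[i].append(art)  # step 5: update the catalogue entry
        for index, key in self._keys(art):
            index.setdefault(key, i)
        return i
```

Cheap exact checks run first and the fuzzy comparison runs last, so most duplicates are resolved without scanning the catalogue. A real system would need an indexed similarity search rather than the linear scan used here.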

Page 17: Mendeley: crowdsourcing and recommending research on a large scale

articles

catalogue

catalogue generator

Catalogue with:

→ article metadata
→ aggregated statistics
→ support recs, etc.

Page 18: Mendeley: crowdsourcing and recommending research on a large scale

Summary

➔ what is mendeley?

➔ crowdsourcing on a large scale

➔ recommendations on a large scale

➔ what does this mean for you?

Page 19: Mendeley: crowdsourcing and recommending research on a large scale

Article Recommendation: System Requirements

generate personal article recommendations for users (i.e. “here are some articles that may interest you”)

update recommendations every 24 hours

Page 20: Mendeley: crowdsourcing and recommending research on a large scale

Output: Recommend 10 articles to each user

Input: User libraries

Page 21: Mendeley: crowdsourcing and recommending research on a large scale

Recommendation through collaborative filtering

Article is in library or not (i.e. binary input)

Various similarity metrics (e.g. cooccurrence, loglikelihood, tanimoto)

16 months ago

Test: 10-fold cross validation, 50,000 user libraries

Results: <0.025 precision at 10
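The three similarity metrics named on this slide can be illustrated on binary input, where each article is represented by the set of users who hold it in their library. This is a sketch under assumptions: the function names are hypothetical, and "loglikelihood" is read here as Dunning's log-likelihood ratio over a 2x2 contingency table, which the slides do not confirm.

```python
import math


def cooccurrence(users_a: set, users_b: set) -> int:
    """Number of libraries that contain both articles."""
    return len(users_a & users_b)


def tanimoto(users_a: set, users_b: set) -> float:
    """Shared libraries divided by libraries containing either article."""
    union = len(users_a | users_b)
    return len(users_a & users_b) / union if union else 0.0


def _x_log_x(x: int) -> float:
    return x * math.log(x) if x > 0 else 0.0


def _entropy(*counts: int) -> float:
    # Unnormalised Shannon entropy of a set of counts.
    return _x_log_x(sum(counts)) - sum(_x_log_x(c) for c in counts)


def log_likelihood(users_a: set, users_b: set, n_users: int) -> float:
    """Log-likelihood ratio for the 2x2 table of who holds article A and/or B."""
    k11 = len(users_a & users_b)       # both articles
    k12 = len(users_a) - k11           # A only
    k21 = len(users_b) - k11           # B only
    k22 = n_users - k11 - k12 - k21    # neither
    rows = _entropy(k11 + k12, k21 + k22)
    cols = _entropy(k11 + k21, k12 + k22)
    cells = _entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (rows + cols - cells))
```

Cooccurrence favours popular articles, Tanimoto normalises by how widely either article is held, and the log-likelihood ratio scores how surprising the overlap is given the total number of users.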

Page 22: Mendeley: crowdsourcing and recommending research on a large scale

Recommendation through collaborative filtering

Article is in library or not (i.e. binary input)

Various similarity metrics (e.g. cooccurrence, loglikelihood, tanimoto)

Test: 10-fold cross validation, 50,000 user libraries

10 months ago (i.e. + 6 months)

Results: ~0.1 precision at 10
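For reference, precision at 10 is the fraction of the 10 recommended articles that turn out to be relevant. The sketch below assumes that "relevant" means articles held out from the user's library during cross validation; the slides do not spell out the exact protocol, and the function name is hypothetical.

```python
def precision_at_k(recommended, relevant, k=10):
    """Fraction of the top-k recommended articles found in the relevant set.

    The denominator is always k, so a recommender that returns fewer than
    k articles is penalised for the empty slots.
    """
    top_k = list(recommended)[:k]
    hits = sum(1 for article in top_k if article in relevant)
    return hits / k


# One hit among ten recommendations gives a precision at 10 of 0.1.
example = precision_at_k(["a%d" % i for i in range(10)], {"a3"})
```

Under this reading, a score of ~0.1 means that on average about one of the ten recommended articles was among the user's held-out articles.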

Page 23: Mendeley: crowdsourcing and recommending research on a large scale

Recommendation through collaborative filtering

Article is in library or not (i.e. binary input)

Various similarity metrics (e.g. cooccurrence, loglikelihood, tanimoto)

Test: Release to a subset of users

10 months ago (i.e. + 6 months)

Results: ~0.4 precision at 10

Page 24: Mendeley: crowdsourcing and recommending research on a large scale

Article Recommendation Acceptance Rates

[Plot: acceptance rate (i.e. accept/reject clicks) against number of months live]

Page 25: Mendeley: crowdsourcing and recommending research on a large scale

Article Recommendation: System Requirements

generate personal article recommendations for users (i.e. “here are some articles that may interest you”)

update recommendations every 24 hours

1 million users!

days!

How to scale up?

Page 26: Mendeley: crowdsourcing and recommending research on a large scale
Page 27: Mendeley: crowdsourcing and recommending research on a large scale

Test: 10-fold cross validation, 50,000 user libraries

So, results comparable to non-distributed recommender

Completely distributed, so can easily run on EC2 within 24 hours...
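The slides do not say which framework or job layout was used, so the following is only a local simulation of why cooccurrence counting distributes well: each user library can be mapped independently, and the reduce step just sums counts per article pair. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def map_library(library):
    """Map step: emit one ((article_a, article_b), 1) record per pair in a library."""
    for a, b in combinations(sorted(set(library)), 2):
        yield (a, b), 1


def reduce_counts(records):
    """Reduce step: sum the emitted counts for each article pair."""
    counts = Counter()
    for pair, n in records:
        counts[pair] += n
    return counts


def cooccurrence_counts(libraries):
    # Locally the map and reduce steps run in sequence; on a cluster each
    # library could be mapped on a different machine.
    records = (rec for lib in libraries for rec in map_library(lib))
    return reduce_counts(records)


counts = cooccurrence_counts([["a", "b", "c"], ["a", "b"], ["b", "c"]])
```

Because no map call needs to see another user's library, adding machines shortens the run roughly in proportion, which is what makes a 24-hour refresh plausible at this scale.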

Page 28: Mendeley: crowdsourcing and recommending research on a large scale

Article Recommendation Precision Across User Library Sizes

[Plot: precision at 10 articles against number of articles in user library (using cooccurrence)]

How will real users react?

Page 29: Mendeley: crowdsourcing and recommending research on a large scale

Summary

➔ what is mendeley?

➔ crowdsourcing on a large scale

➔ recommendations on a large scale

➔ data for you

Page 30: Mendeley: crowdsourcing and recommending research on a large scale

Public Data

user libraries

50,000 libraries, 4,848,724 articles, 3,652,285 unique articles

library readership

library stars

Obtain from: http://dev.mendeley.com/datachallenge
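A quick back-of-the-envelope reading of the user-library figures on this slide, assuming that "4,848,724 articles" counts library entries in total (so the same article may be counted once per library that holds it):

```python
libraries = 50_000
library_entries = 4_848_724   # assumed to be total entries across all libraries
unique_articles = 3_652_285

# Average number of articles per user library (about 97).
average_library_size = library_entries / libraries

# Share of entries that are distinct articles (about 0.75).
unique_share = unique_articles / library_entries

# Average number of libraries holding each unique article (about 1.3).
average_readers = library_entries / unique_articles
```

On that reading the data is very sparse: a typical article appears in only one or two of the 50,000 libraries, which helps explain why collaborative filtering precision is hard to raise on this sample.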

Page 31: Mendeley: crowdsourcing and recommending research on a large scale

Mendeley's API

Page 32: Mendeley: crowdsourcing and recommending research on a large scale
Page 33: Mendeley: crowdsourcing and recommending research on a large scale
Page 34: Mendeley: crowdsourcing and recommending research on a large scale

www.mendeley.com