mendeley: crowdsourcing and recommending research on a large scale
Post on 28-May-2015
Mendeley: crowdsourcing and recommending research on a large scale
Kris Jack, PhD, Data Mining Team Lead
➔ what is mendeley?
➔ crowdsourcing on a large scale
➔ recommendations on a large scale
➔ data for you
Summary
Mendeley is...
...a startup company
...going to change the way that we do research...
Mendeley provides tools to help users...
...organise their research
...collaborate with one another
...discover new research
Last.fm works like this:
1) Install “Audioscrobbler”
2) Listen to music
3) Last.fm builds your music profile and recommends music that you might also like
...and it’s the world’s largest open music database!
Last.fm → Mendeley
music libraries → research libraries
artists → researchers
songs → papers
genres → disciplines
Screenshot taken from www.mendeley.com on 04/09/11
Mendeley is the world’s largest crowdsourced research catalogue!
Catalogue Crowdsourcing: System Requirements
assimilate research artefacts into catalogue in real time (PDFs + citation metadata)
recognise duplicate and non-duplicate artefacts in noisy input
Diagram: articles → catalogue generator → catalogue
Main types of input:
→ article PDFs
→ article metadata (e.g. reference)
Main sources of input:
→ Mendeley Desktop
→ Mendeley Web Importer
→ External catalogue imports (e.g. ArXiv)
→ External catalogue lookups (e.g. CrossRef)
Aims:
→ Cluster documents together
→ Generate catalogue entries
Process:
→ Filehash check (SHA-1)
→ Identifier check (e.g. PubMed id)
→ Document fingerprint (full text)
→ Metadata similarity check
→ Update individual article page
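The process above reads as a cascade: cheap exact checks first, fuzzier checks later. The following is a minimal sketch of that idea, not Mendeley's implementation. The `Catalogue` class, the `title_similarity` helper, the 0.9 threshold and the entry layout are all hypothetical, the metadata check is reduced to a title comparison, and the full-text fingerprint step is left out.

```python
import hashlib
from difflib import SequenceMatcher


def title_similarity(a: dict, b: dict) -> float:
    """Hypothetical stand-in for a metadata similarity check: compare titles."""
    title_a = a.get("title", "").lower()
    title_b = b.get("title", "").lower()
    if not title_a or not title_b:
        return 0.0  # no title on one side: treat as no evidence of a match
    return SequenceMatcher(None, title_a, title_b).ratio()


class Catalogue:
    """Toy catalogue that clusters incoming articles into entries."""

    def __init__(self, threshold: float = 0.9):  # threshold is an assumption
        self.entries = []
        self.threshold = threshold

    def _match(self, file_hash, identifier, metadata):
        # Step 1: exact file hash match (SHA-1 of the PDF bytes).
        for i, entry in enumerate(self.entries):
            if file_hash is not None and file_hash in entry["hashes"]:
                return i, "filehash"
        # Step 2: exact identifier match (e.g. a PubMed id).
        for i, entry in enumerate(self.entries):
            if identifier is not None and identifier in entry["ids"]:
                return i, "identifier"
        # Step 3 (full-text document fingerprint) is omitted in this sketch.
        # Step 4: fuzzy metadata similarity.
        for i, entry in enumerate(self.entries):
            if title_similarity(metadata, entry["metadata"]) >= self.threshold:
                return i, "metadata"
        return None, "new"

    def add(self, pdf_bytes, identifier, metadata):
        """Assign one incoming article to an entry; return (entry index, reason)."""
        file_hash = None
        if pdf_bytes is not None:
            file_hash = hashlib.sha1(pdf_bytes).hexdigest()
        index, reason = self._match(file_hash, identifier, metadata)
        if index is None:
            self.entries.append(
                {"hashes": set(), "ids": set(), "metadata": dict(metadata), "readers": 0}
            )
            index = len(self.entries) - 1
        # Step 5: update the entry (here: known hashes, ids and a reader count).
        entry = self.entries[index]
        if file_hash is not None:
            entry["hashes"].add(file_hash)
        if identifier is not None:
            entry["ids"].add(identifier)
        entry["readers"] += 1
        return index, reason
```

Ordering the checks from exact to fuzzy means most duplicates are resolved before the expensive comparison runs; the linear scans here would be replaced by indexed lookups in anything beyond a toy.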
Catalogue with:
→ article metadata
→ aggregated statistics
→ support for recommendations, etc.
Article Recommendation: System Requirements
generate personal article recommendations for users (i.e. “here are some articles that may interest you”)
update recommendations every 24 hours
Output: Recommend 10 articles to each user
Input: User libraries
Recommendation through collaborative filtering
Article in library or not (i.e. binary input)
Various similarity metrics (e.g. cooccurrence, log-likelihood, Tanimoto)
16 months ago
Test: 10-fold cross validation, 50,000 user libraries
Results: <0.025 precision at 10
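The slides name the similarity metrics but do not define them. As a hedged illustration, the sketch below uses the standard textbook definitions on binary input, where each article is represented by the set of users who have it in their library. All function names and the toy `libraries` data are made up for the example, and the log-likelihood ratio follows Dunning's formulation, which may differ from the variant used in the talk.

```python
import math


def cooccurrence(readers_a: set, readers_b: set) -> int:
    """Number of libraries that contain both articles."""
    return len(readers_a & readers_b)


def tanimoto(readers_a: set, readers_b: set) -> float:
    """Tanimoto (Jaccard) coefficient: shared readers over all readers."""
    union = len(readers_a | readers_b)
    return len(readers_a & readers_b) / union if union else 0.0


def _xlogx(x: float) -> float:
    return 0.0 if x == 0 else x * math.log(x)


def _entropy(*counts: float) -> float:
    # Unnormalised entropy term used in the log-likelihood ratio.
    return _xlogx(sum(counts)) - sum(_xlogx(c) for c in counts)


def log_likelihood_ratio(k11: int, k12: int, k21: int, k22: int) -> float:
    """Log-likelihood ratio for a 2x2 contingency table.

    k11: libraries with both articles, k12: only the first,
    k21: only the second, k22: neither.
    """
    row = _entropy(k11 + k12, k21 + k22)
    col = _entropy(k11 + k21, k12 + k22)
    mat = _entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))


def readers_of(libraries: dict, article: str) -> set:
    """Users whose library contains the given article."""
    return {user for user, lib in libraries.items() if article in lib}


def llr_for_articles(libraries: dict, a: str, b: str) -> float:
    """Build the contingency table for two articles and score it."""
    ra, rb = readers_of(libraries, a), readers_of(libraries, b)
    k11 = len(ra & rb)
    k12 = len(ra) - k11
    k21 = len(rb) - k11
    k22 = len(libraries) - k11 - k12 - k21
    return log_likelihood_ratio(k11, k12, k21, k22)


# Toy data: user id -> set of article ids (hypothetical).
libraries = {
    "u1": {"p1", "p2"},
    "u2": {"p1", "p2"},
    "u3": {"p3"},
    "u4": {"p1"},
}
```

Cooccurrence favours popular articles, Tanimoto normalises by readership, and the log-likelihood ratio scores how surprising the overlap is given how often each article appears overall.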
10 months ago (i.e. + 6 months)
Test: 10-fold cross validation, 50,000 user libraries
Results: ~0.1 precision at 10
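To make the metric concrete: a precision of ~0.1 at 10 means that, on average, about one of the ten recommended articles counted as a hit. The sketch below assumes that a hit is a recommended article found among articles held out from the user's library. The slides do not spell out the evaluation protocol, so that assumption and the function names are illustrative only.

```python
def precision_at_k(recommended: list, relevant: set, k: int = 10) -> float:
    """Fraction of the top-k recommendations that are relevant.

    Divides by k even if fewer than k items were recommended, so a short
    recommendation list is penalised.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def mean_precision_at_k(per_user: list, k: int = 10) -> float:
    """Average precision at k over (recommended, relevant) pairs, one per user."""
    if not per_user:
        return 0.0
    return sum(precision_at_k(rec, rel, k) for rec, rel in per_user) / len(per_user)


# Hypothetical example: ten recommendations, one of which was held out.
recommended = [f"article-{i}" for i in range(10)]
held_out = {"article-3", "article-42"}
score = precision_at_k(recommended, held_out)
```

In a 10-fold cross validation the same computation would be repeated once per fold and the fold scores averaged.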
Test: Release to a subset of users
10 months ago (i.e. + 6 months)
Results: ~0.4 precision at 10
[Figure: Article Recommendation Acceptance Rates. y-axis: acceptance rate (i.e. accept/reject clicks); x-axis: number of months live]
Article Recommendation: System Requirements
generate personal article recommendations for users (i.e. “here are some articles that may interest you”)
update recommendations every 24 hours
1 million users!
days!
How to scale up?
Test: 10-fold cross validation, 50,000 user libraries
So, results comparable to non-distributed recommender
Completely distributed, so can easily run on EC2 within 24 hours...
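The slides do not say which framework or algorithm the distributed recommender uses. As one possible illustration of why cooccurrence-based recommendation distributes well, the sketch below splits the counting into a map step that looks at one library at a time and a reduce step that sums per-pair counts. It is plain single-process Python, and every function name in it is hypothetical.

```python
from collections import Counter
from itertools import combinations


def map_library(library: set):
    """Map step: emit ((a, b), 1) for each unordered article pair in one library."""
    for a, b in combinations(sorted(library), 2):
        yield (a, b), 1


def reduce_counts(pairs) -> Counter:
    """Reduce step: sum the emitted values per article pair."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts


def cooccurrence_counts(libraries: list) -> Counter:
    """Run map over every library, then reduce. Each library is independent,
    which is what would let the map step be spread across machines."""
    emitted = (pair for library in libraries for pair in map_library(library))
    return reduce_counts(emitted)


def recommend(library: set, counts: Counter, k: int = 10) -> list:
    """Score unseen articles by summed cooccurrence with the user's articles."""
    scores = Counter()
    for (a, b), count in counts.items():
        if a in library and b not in library:
            scores[b] += count
        elif b in library and a not in library:
            scores[a] += count
    # Sort by descending score, then by article id for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [article for article, _ in ranked[:k]]
```

For example, with libraries `[{"p1", "p2"}, {"p1", "p2", "p3"}, {"p2", "p3"}]`, a user who only has `p1` would be recommended `p2` ahead of `p3`, because `p1` and `p2` share two libraries while `p1` and `p3` share one.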
[Figure: Article Recommendation Precision Across User Library Sizes (using cooccurrence). y-axis: precision at 10 articles; x-axis: number of articles in user library]
How will real users react?
Public Data
→ library readership
→ library stars
Obtain from: http://dev.mendeley.com/datachallenge
User libraries:
→ 50,000 libraries
→ 4,848,724 articles
→ 3,652,285 unique articles
Mendeley's API
www.mendeley.com