June 11, 2015 Matthew Bernhardt Open Repositories 2015 Visualizing Open Access building a scalable infrastructure to showcase the reach of MIT research


Visualizing Open Access: building a scalable infrastructure to showcase the reach of MIT research


Background

March 18, 2009 - Open Access Policy adopted
“...The policy is to take effect immediately; it will be reviewed after five years by the Faculty Policy Committee, with a report presented to the Faculty.”

2009 – 2013 MIT Libraries assemble a collection within DSpace@MIT for Open Access Articles.

~10,000 articles, ~1.5 million downloads


~10,000 articles, ~1.5 million downloads, but…

Author-level information?
Department-level information?


Project

August 2013 - Project begins
“Implement author-level, article-level, and aggregated article download usage statistics for articles in the Open Access Articles Collection in DSpace@MIT to incentivize deposits and provide useful assessment information for the MIT Faculty Open Access Policy.”


Prior Work

MyDASH provided solid model…
• Map
• Timeline
• Summary table

… but couldn’t be directly implemented.
• Repository versus One Collection
• Multiple department affiliations


Project Goals

• Make available download statistics at three levels: author, article, and aggregate

• Incentivize deposits to collection

• Provide useful information for policy evaluation

• Evaluate new technologies within the Libraries (i.e. MongoDB)


Not Project Goals

• Integration with altmetrics systems
• COUNTER


Three-part project

Data processing pipeline: https://github.com/MITLibraries/oastats-backend

Visualization interface: https://github.com/MITLibraries/oastats-ui

Email notification system: https://github.com/MITLibraries/poast


Pipeline

https://github.com/MITLibraries/oastats-backend

• Apache logs
• Python
• DSpace
• GeoIP
• SOLR


Start from Apache server logs

● Filter the qualifying downloads
● Look up the downloaded paper
● Augment with additional information
● Store in MongoDB
● Use SOLR to build summary collection

UI queries summary collection
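The first pipeline step, filtering qualifying downloads out of the Apache logs, might look roughly like the sketch below. It is not taken from oastats-backend. The combined log format, the rule that a qualifying download is a GET with status 200 under /openaccess-disseminate/, and the names LOG_PATTERN and parse_download are all assumptions made for illustration. The output field names follow the augmented download record shown later in the deck.

```python
import re
from datetime import datetime
from typing import Optional

# Assumption: logs are in Apache "combined" format.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<request>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

# Assumption: article downloads are served under this path prefix.
DOWNLOAD_PREFIX = "/openaccess-disseminate/"


def parse_download(line: str) -> Optional[dict]:
    """Return a download record for a qualifying log line, else None."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Hypothetical qualification rule: successful GET of a download URL.
    if fields["method"] != "GET" or fields["status"] != "200":
        return None
    if not fields["request"].startswith(DOWNLOAD_PREFIX):
        return None
    handle = fields["request"][len(DOWNLOAD_PREFIX):]
    return {
        "handle": "http://hdl.handle.net/" + handle,
        "time": datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z"),
        "request": fields["request"],
        "referer": fields["referer"],
        "user_agent": fields["user_agent"],
        "ip_address": fields["ip"],
        "status": fields["status"],
        "filesize": fields["size"],
    }


sample = (
    '128.218.64.242 - - [10/Aug/2010:17:14:03 +0000] '
    '"GET /openaccess-disseminate/1721.1/52491 HTTP/1.1" 200 1661848 '
    '"http://www.google.com/" "Mozilla/5.0"'
)
not_found = sample.replace('" 200 ', '" 404 ')
record = parse_download(sample)
```

Lines that fail to parse, return a non-200 status, or point elsewhere on the site yield None and are dropped. The lookup, GeoIP, and MongoDB steps would then operate on the returned dictionaries.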


Pipeline challenges - authors

Author identities
● Field-specific naming conventions
  “Abelson, Hal”
  “Abelson, H”
  “Hal Abelson”
● Common names, similar people
  “J Smith”
  “Alex Slocum”
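To illustrate why name strings alone are a weak identifier, here is a small sketch of a naive normalization. The function name_key and the names "John Smith" and "Jane Smith" are hypothetical and are not part of the project. The sketch collapses the spelling variants listed above into one key, but it also merges distinct people who share a surname and initial.

```python
def name_key(raw: str) -> str:
    """Reduce a name to 'surname, first-initial' in lower case (naive)."""
    raw = raw.strip()
    if "," in raw:
        # "Surname, Given" convention
        surname, _, given = raw.partition(",")
    else:
        # "Given Surname" convention: treat the last token as the surname
        parts = raw.split()
        surname, given = parts[-1], " ".join(parts[:-1])
    given = given.strip()
    initial = given[0] if given else ""
    return f"{surname.strip().lower()}, {initial.lower()}"


variants = ["Abelson, Hal", "Abelson, H", "Hal Abelson"]
keys = {name_key(v) for v in variants}

# Hypothetical distinct authors who collide under the same key.
collision = name_key("John Smith") == name_key("Jane Smith")
```

The three variants reduce to a single key, which is the desired outcome, while the two hypothetical Smiths become indistinguishable. A stable per-person identifier avoids that second problem.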


[
  { "mitid": "3.1415926537", "name": "Cohen-Tanugi, David" },
  { "mitid": "2.7182818", "name": "Dave, Shreya H." },
  { "mitid": "6.02x10^23", "name": "Grossman, Jeffrey C." },
  { "mitid": "1123581322", "name": "Lienhard, John H." },
  { "mitid": "1234567890", "name": "McGovern, Ronan Killian" }
]


Pipeline challenges - departments

Department names
● Inconsistent program / department affiliations
  o “Media Laboratory”
  o “Center for Bits and Atoms” (subgroup within Media Lab)
● Spelling variations
  o “MIT Department of Physics”
  o “Massachusetts Institute of Technology, Department of Physics”
  o “Dept. of Physics”
  o “Physics”


Standardized department names

Whitelist of recognized names

Separate variations for display and linking back to DSpace@MIT
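A whitelist of this kind could be sketched as a lookup table from recognized spellings to a display/canonical pair. The table contents, the function resolve_department, and the canonical Physics string (modeled on the Brain and Cognitive Sciences entry in the download record) are assumptions for illustration, not the project's actual data.

```python
from typing import Optional

# Hypothetical entry: "display" is the short label for the UI,
# "canonical" is the form assumed to link back to DSpace@MIT.
PHYSICS = {
    "display": "Physics",
    "canonical": "Massachusetts Institute of Technology. Department of Physics",
}

# Every recognized spelling maps to the same standardized entry.
WHITELIST = {
    "mit department of physics": PHYSICS,
    "massachusetts institute of technology, department of physics": PHYSICS,
    "dept. of physics": PHYSICS,
    "physics": PHYSICS,
}


def resolve_department(raw: str) -> Optional[dict]:
    """Look up a raw affiliation string; unrecognized names return None."""
    key = " ".join(raw.lower().split())  # lower-case, collapse whitespace
    return WHITELIST.get(key)


known = resolve_department("Dept. of  Physics")
unknown = resolve_department("Center for Something Unlisted")
```

Returning None for unrecognized strings keeps unknown affiliations out of the statistics until someone adds them to the whitelist explicitly.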


Augmented download record

{
  "_id": ObjectId("5449127895b0c25083f29352"),
  "handle": "http://hdl.handle.net/1721.1/52491",
  "title": "A basal ganglia-forebrain circuit in the songbird biases motor output to avoid vocal errors",
  "country": "USA",
  "authors": [
    { "mitid": "3.1415926537", "name": "Fee, Michale S." },
    { "mitid": "6.02x10^23", "name": "Andalman, Aaron S." }
  ],
  "dlcs": [
    {
      "display": "McGovern Institute for Brain Research at MIT",
      "canonical": "McGovern Institute for Brain Research at MIT"
    },
    {
      "display": "Brain and Cognitive Sciences",
      "canonical": "Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences"
    }
  ],
  "time": ISODate("2010-08-10T17:14:03Z"),
  "request": "/openaccess-disseminate/1721.1/52491",
  "referer": "http://www.google.com/search?q=head+mounted+microphone+zebra+finch&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a",
  "user_agent": "Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8",
  "ip_address": "128.218.64.242",
  "status": "200",
  "filesize": "1661848"
}


Summary record

{
  "_id" : "Overall",
  "countries" : [
    {
      "country" : "862",
      "downloads" : 35
    } …
  ],
  "dates" : [
    {
      "date" : "2014-01-07",
      "downloads" : 3
    } …
  ],
  "downloads" : 10000,
  "size" : 101,
  "type" : "overall"
}
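The shape of a summary record can be reproduced from individual download records with a simple aggregation. The deck says SOLR builds the summary collection. The sketch below swaps in plain Python counters instead, purely to show the per-country and per-date roll-up. The function summarize and the sample records are hypothetical, and the "size" field is left out because the slides do not say what it measures.

```python
from collections import Counter
from datetime import datetime


def summarize(records, summary_id="Overall", summary_type="overall"):
    """Roll download records up into per-country and per-date counts."""
    countries = Counter(r["country"] for r in records)
    dates = Counter(r["time"].strftime("%Y-%m-%d") for r in records)
    return {
        "_id": summary_id,
        "countries": [
            {"country": c, "downloads": n} for c, n in sorted(countries.items())
        ],
        "dates": [
            {"date": d, "downloads": n} for d, n in sorted(dates.items())
        ],
        "downloads": len(records),
        "type": summary_type,
    }


# Hypothetical download records carrying only the fields used here.
records = [
    {"country": "USA", "time": datetime(2014, 1, 7, 9, 30)},
    {"country": "USA", "time": datetime(2014, 1, 7, 17, 14)},
    {"country": "CAN", "time": datetime(2014, 1, 8, 8, 0)},
]
summary = summarize(records)
```

Precomputing one such document per author, department, and article would let the UI answer each page view with a single lookup rather than scanning raw download records.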


Web interface

https://github.com/MITLibraries/oastats-ui

● Mongo-backed
● PHP
● DataTables
● D3.js
● DataMaps


Email to authors

Dear {name},

Thank you for sharing your scholarly articles through the open repository DSpace@MIT <https://dspace.mit.edu/handle/1721.1/49433/>, in association with the MIT Faculty Open Access Policy <https://libraries.mit.edu/oapolicy>.

Our newly implemented OA Stats Service provides data about the use and reach of our open access collection. Since August 2010, 15,184 articles have been downloaded from 227 different countries.

This service also provides information at the author and article level: Your {count_articles} articles have been downloaded {count_downloads} times since they were deposited, from {count_countries} different countries. You can access more detailed download information about your articles, including per-article and per-country downloads at <https://oastats.mit.edu>.

Initially, we plan to provide this information to all authors via email in the Fall and Spring semesters. As we seek to improve the service, we'll consider expanding options to interact with it and the underlying data.

We are anxious to hear your feedback on how this service can be most useful to you, so please send your suggestions to [email protected].

--
From the MIT Libraries
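The placeholders in the message ({name}, {count_articles}, {count_downloads}, {count_countries}) map naturally onto Python's str.format. The sketch below fills an excerpt of the template for one author. It is not code from the poast repository. The function render_email, the input dictionary keys, and the sample numbers are hypothetical.

```python
# Excerpt of the message, using the same placeholder names as the slide.
TEMPLATE = (
    "Dear {name},\n\n"
    "Your {count_articles} articles have been downloaded {count_downloads} "
    "times since they were deposited, from {count_countries} different "
    "countries.\n"
)


def render_email(author: dict) -> str:
    """Fill the template from one author's summary figures."""
    return TEMPLATE.format(
        name=author["name"],
        count_articles=author["articles"],
        # Thousands separator, matching the "15,184" style in the message.
        count_downloads=f"{author['downloads']:,}",
        count_countries=author["countries"],
    )


# Hypothetical author summary.
message = render_email(
    {"name": "Example Author", "articles": 12, "downloads": 3456, "countries": 78}
)
```

The figures passed in would come from the author-level summary records, so each semester's mailing only needs one lookup per author.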


Faculty reception

Excitement
● “Thank you for the update, this is a fantastic tool!!”
● “Thanks so much for doing this - it's really cool and awesome!”

Why not more?
● “Hi, I like your feedback. But I am puzzled that only one of my articles is in your database.”
● Department heads using this as leverage to encourage further contributions


Project goals revisited

• Make available download statistics at three levels: author, article, and aggregate

• Incentivize deposits to collection

• Provide useful information for policy evaluation

• Evaluate new technologies within the Libraries (i.e. MongoDB)


Future work

● Automate the pipeline
● Run pipeline more frequently
● Ditch Mongo for something relational
● Talk to faculty about making more detailed information public
● Add functionality to UI (more export formats, SPA)
● Improve cataloging in DSpace@MIT with lookup services


Thanks!

Matt [email protected]
@morphosis7

https://github.com/MITLibraries/oastats-backend

https://github.com/MITLibraries/oastats-ui

http://oastats.mit.edu