Building a Web Application to Monitor PubMed Retraction Notices

Post on 10-May-2015

DESCRIPTION

Monitoring PubMed retraction notices using Ruby, MongoDB, Sinatra and Heroku. Talk given to the internal CSIRO Bioinformatics User Group, December 1, 2011.

TRANSCRIPT

Building a Web Application to Monitor PubMed Retraction Notices

Neil Saunders

CSIRO Mathematics, Informatics and Statistics

Building E6B, Macquarie University Campus

North Ryde

December 1, 2011

Retraction Watch

Project Aims

Monitor PubMed for retractions

Retrieve retraction data and store locally for analysis

Develop web application to display retraction data

PubMed - advanced search, RSS and send-to-file

Updates in Google Reader

PubMed - MeSH

PubMed - EUtils

http://www.ncbi.nlm.nih.gov/books/NBK25501/
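
As a rough illustration of what an EUtils request looks like, the sketch below builds an ESearch URL that asks PubMed only for the number of matching records. The helper name `esearch_count_url` and the constant `EUTILS_BASE` are made up for this example, and the use of `https` is an assumption; the slides themselves go through the BioRuby wrapper rather than building URLs by hand.

```ruby
require "uri"

# Base URL of the NCBI E-utilities (https is assumed here).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Hypothetical helper, not from the talk: builds an ESearch URL that
# requests only the count of records matching the search term.
def esearch_count_url(term, db = "pubmed")
  params = { "db" => db, "term" => term, "rettype" => "count" }
  "#{EUTILS_BASE}/esearch.fcgi?#{URI.encode_www_form(params)}"
end

# Retraction notices with a 2010 publication date.
puts esearch_count_url("Retraction of Publication[ptyp] 2010[dp]")
```

Encoding the parameters with `URI.encode_www_form` keeps the spaces and square brackets in the search term safe to send.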

EInfo example script

#!/usr/bin/ruby
require 'rubygems'
require 'bio'
require 'hpricot'
require 'open-uri'

Bio::NCBI.default_email = "me@me.com"
ncbi = Bio::NCBI::REST.new
url = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db="

ncbi.einfo.each do |db|
  puts "Processing #{db}..."
  File.open("#{db}.txt", "w") do |f|
    doc = Hpricot(open("#{url + db}"))
    (doc/'//fieldlist/field').each do |field|
      name = (field/'/name').inner_html
      fullname = (field/'/fullname').inner_html
      description = (field/'description').inner_html
      f.write("#{name},#{fullname},#{description}\n")
    end
  end
end

EInfo script - output

ALL,All Fields,All terms from all searchable fields
UID,UID,Unique number assigned to publication
FILT,Filter,Limits the records
TITL,Title,Words in title of publication
WORD,Text Word,Free text associated with publication
MESH,MeSH Terms,Medical Subject Headings assigned to publication
MAJR,MeSH Major Topic,MeSH terms of major importance to publication
AUTH,Author,Author(s) of publication
JOUR,Journal,Journal abbreviation of publication
AFFL,Affiliation,Author's institutional affiliation and address
...
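
Because the script writes plain comma-separated lines without quoting, reading them back needs a little care. The sketch below is one possible way to do it; `parse_einfo_line` is a hypothetical helper, and it assumes that only the description (the last field) can itself contain commas.

```ruby
# Hypothetical helper, not part of the talk's script: splits one output
# line into its three parts. The limit of 3 keeps any commas inside the
# description intact.
def parse_einfo_line(line)
  name, fullname, description = line.chomp.split(",", 3)
  { "name" => name, "fullname" => fullname, "description" => description }
end

# Two lines taken from the EInfo output for PubMed.
lines = [
  "ALL,All Fields,All terms from all searchable fields",
  "MESH,MeSH Terms,Medical Subject Headings assigned to publication"
]

fields = lines.map { |line| parse_einfo_line(line) }
fields.each { |f| puts "#{f['name']}: #{f['fullname']}" }
```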

MongoDB Overview

MongoDB is a so-called “NoSQL” database

Key features:

Document-oriented

Schema-free

Documents stored in collections

http://www.mongodb.org/

Saving to a database collection: ecount

#!/usr/bin/ruby

require "rubygems"
require "bio"
require "mongo"

db = Mongo::Connection.new.db('pubmed')
col = db.collection('ecount')
Bio::NCBI.default_email = "me@me.com"
ncbi = Bio::NCBI::REST.new

1977.upto(Time.now.year) do |year|
  all = ncbi.esearch_count("#{year}[dp]", {"db" => "pubmed"})
  term = ncbi.esearch_count("Retraction of Publication[ptyp] #{year}[dp]",
                            {"db" => "pubmed"})
  record = {"_id" => year, "year" => year, "total" => all,
            "retracted" => term, "updated_at" => Time.now}
  col.save(record)
  puts "#{year}..."
end

puts "Saved #{col.count} records."

ecount collection

> db.ecount.findOne()
{
    "_id" : 1977,
    "retracted" : 3,
    "updated_at" : ISODate("2011-11-15T03:58:10.729Z"),
    "total" : 260517,
    "year" : 1977
}
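
Storing the yearly total alongside the retraction count makes it possible to normalise retractions by publication volume. The following is a minimal sketch of that calculation, not code from the application; `retractions_per_100k` is a hypothetical helper, and the zero-total record is invented purely to show the guard.

```ruby
# Hypothetical helper: retractions per 100,000 publications for one
# ecount-style record, rounded to two decimal places.
def retractions_per_100k(record)
  total = record["total"]
  return 0.0 if total.zero? # avoid dividing by zero for empty years
  (record["retracted"].to_f / total * 100_000).round(2)
end

# The 1977 values are the ones shown in the ecount document.
record_1977 = { "year" => 1977, "total" => 260517, "retracted" => 3 }
# Invented record, used only to exercise the zero-total guard.
empty_year = { "year" => 0, "total" => 0, "retracted" => 0 }

puts retractions_per_100k(record_1977) # prints 1.15
puts retractions_per_100k(empty_year)  # prints 0.0
```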

Saving to a database collection: entries

#!/usr/bin/ruby

require "rubygems"
require "mongo"
require "crack"

db = Mongo::Connection.new.db("pubmed")
col = db.collection('entries')
col.drop

xmlfile = "#{ENV['HOME']}/Dropbox/projects/pubmed/retractions/data/retract.xml"
xml = Crack::XML.parse(File.read(xmlfile))

xml['PubmedArticleSet']['PubmedArticle'].each do |article|
  article['_id'] = article['MedlineCitation']['PMID']
  col.save(article)
end

puts "Saved #{col.count} articles."

entries collection

{
    "_id" : "22106469",
    "PubmedData" : {
        "PublicationStatus" : "ppublish",
        "ArticleIdList" : {
            "ArticleId" : "22106469"
        },
        "History" : {
            "PubMedPubDate" : [
                {
                    "Minute" : "0",
                    "Month" : "11",
                    "PubStatus" : "entrez",
                    "Day" : "23",
                    "Hour" : "6",
                    "Year" : "2011"
                },
                {
                    "Minute" : "0",
                    "Month" : "11",
                    "PubStatus" : "pubmed",
                    "Day" : "23",
                    "Hour" : "6",
                    "Year" : "2011"
                },
                ...
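
Since each article is stored as a deeply nested document, pulling a value out means walking several levels of hash keys. The sketch below shows one way to do that in plain Ruby with `Hash#dig` on a trimmed copy of the document above. `pub_date` is a hypothetical helper, and the single-hash case is an assumption about how an XML-to-hash parser may represent an element that occurs only once.

```ruby
require "date"

# Trimmed stand-in for one entries document, using the values shown above.
article = {
  "_id" => "22106469",
  "PubmedData" => {
    "PublicationStatus" => "ppublish",
    "History" => {
      "PubMedPubDate" => [
        { "PubStatus" => "entrez", "Year" => "2011", "Month" => "11", "Day" => "23" },
        { "PubStatus" => "pubmed", "Year" => "2011", "Month" => "11", "Day" => "23" }
      ]
    }
  }
}

# Hypothetical helper: returns the history date with the given PubStatus,
# or nil if the article has no such entry.
def pub_date(article, status)
  history = article.dig("PubmedData", "History", "PubMedPubDate")
  return nil if history.nil?
  # Assumption: a lone child element may arrive as a Hash, not an Array.
  history = [history] if history.is_a?(Hash)
  entry = history.find { |e| e["PubStatus"] == status }
  entry && Date.new(entry["Year"].to_i, entry["Month"].to_i, entry["Day"].to_i)
end

puts pub_date(article, "pubmed") # prints 2011-11-23
```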

Saving to a database collection: timeline

#!/usr/bin/ruby

require "rubygems"
require "mongo"
require "date"

db = Mongo::Connection.new.db('pubmed')
entries = db.collection('entries')
timeline = db.collection('timeline')

# collect the creation date of every retraction notice
dates = entries.find.map { |entry| entry['MedlineCitation']['DateCreated'] }
dates.map! { |d| Date.parse("#{d['Year']}-#{d['Month']}-#{d['Day']}") }
dates.sort!
# start every day in the range at zero, then count notices per day
data = (dates.first..dates.last).inject(Hash.new(0)) { |h, date| h[date] = 0; h }
dates.each { |date| data[date] += 1}
data = data.sort
# format dates as JavaScript Date.UTC() calls (JavaScript months are zero-based)
data.map! {|e| ["Date.UTC(#{e[0].year},#{e[0].month - 1},#{e[0].day})", e[1]] }

# the _id is the date string with the dot replaced by an underscore
data.each do |date|
  timeline.save({"_id" => date[0].gsub(".", "_"), "date" => date[0], "count" => date[1]})
end

puts "Saved #{timeline.count} dates in timeline."

timeline collection

> db.timeline.findOne()
{
    "_id" : "Date_UTC(1977,7,12)",
    "date" : "Date.UTC(1977,7,12)",
    "count" : 1
}
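
Note that the stored date uses JavaScript's zero-based months, so `Date.UTC(1977,7,12)` is 12 August 1977. The counting step can be tried without a database; the sketch below mirrors it on a small in-memory list. The helper names `js_utc` and `daily_counts` are hypothetical, and the sample dates are illustrative rather than real PubMed data.

```ruby
require "date"

# Hypothetical helper: formats a Ruby Date as a JavaScript Date.UTC(...)
# call. JavaScript months are zero-based, hence month - 1.
def js_utc(date)
  "Date.UTC(#{date.year},#{date.month - 1},#{date.day})"
end

# Hypothetical helper: counts dates per day, including zero-count days
# between the first and last date, and returns [js_date, count] pairs.
def daily_counts(dates)
  return [] if dates.empty?
  sorted = dates.sort
  counts = Hash.new(0)
  (sorted.first..sorted.last).each { |d| counts[d] = 0 }
  sorted.each { |d| counts[d] += 1 }
  counts.sort.map { |d, n| [js_utc(d), n] }
end

# Illustrative dates only.
dates = [Date.new(1977, 8, 12), Date.new(1977, 8, 14), Date.new(1977, 8, 14)]
daily_counts(dates).each { |row| p row }
# ["Date.UTC(1977,7,12)", 1]
# ["Date.UTC(1977,7,13)", 0]
# ["Date.UTC(1977,7,14)", 2]
```

Filling in the zero-count days means the chart drops to zero between notices instead of joining distant points.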

Sinatra: minimal example

require "rubygems"
require "sinatra"

get "/" do
  "Hello World"
end

# ruby myapp.rb
# http://localhost:4567

Highcharts: minimal example code

var chart = new Highcharts.Chart({
  chart: {
    renderTo: 'container',
    defaultSeriesType: 'line'
  },
  xAxis: {
    categories: ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
  },
  series: [{
    data: [29.9, 71.5, 106.4, 129.2, 144.0, 176.0,
           135.6, 148.5, 216.4, 194.1, 95.6, 54.4]
  }]
});

// <div id="container" style="height: 400px"></div>

Highcharts: minimal example result

Web Application Overview

|---config.ru
|---Gemfile
|---main.rb
|---public
|   |---javascripts
|   |   |---dark-blue.js
|   |   |---dark-green.js
|   |   |---exporting.js
|   |   |---gray.js
|   |   |---grid.js
|   |   |---highcharts.js
|   |   |---jquery-1.4.2.min.js
|   |---stylesheets
|       |---main.css
|---Rakefile
|---statistics.rb
|---views
    |---about.haml
    |---byyear.haml
    |---date.haml
    |---error.haml
    |---index.haml
    |---journal.haml
    |---journals.haml
    |---layout.haml
    |---test.haml
    |---total.haml

Sinatra Application Code - main.rb

# main.rb
configure do
  # a bunch of config stuff goes here
  # DB = connection to MongoDB database
  # timeline
  timeline = DB.collection('timeline')
  set :data, timeline.find.to_a.map { |e| [e['date'], e['count']] }
end

# views
get "/" do
  haml :index
end

Sinatra Views - index.haml

%h3 PubMed Retraction Notices - Timeline
%p Last update: #{options.updated_at}

%div#container(style="margin-left: auto; margin-right: auto; width: 800px;")

:javascript
  $(function () {
    new Highcharts.Chart({
      chart: {
        renderTo: 'container',
        defaultSeriesType: 'area',
        width: 800,
        height: 600,
        zoomType: 'x',
        marginTop: 80
      },
      legend: { enabled: false },
      title: { text: 'Retractions by date' },
      xAxis: { type: 'datetime'},
      yAxis: { title: { text: 'Retractions' } },
      series: [{
        data: #{options.data.inspect.gsub(/"/,"")}
      }],
      // more stuff goes here...
    });
  });
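
The `inspect.gsub` call in the `series` entry is what turns the Ruby array into a JavaScript literal. `inspect` prints the nested array with each date as a quoted string, and stripping the quotes leaves bare `Date.UTC(...)` calls for the browser to evaluate. A small stand-alone sketch of that step (the method name `to_js_array` is made up for illustration):

```ruby
# Hypothetical helper mirroring the expression used in the view:
# render [date_string, count] pairs as a JavaScript array literal.
def to_js_array(rows)
  # inspect yields [["Date.UTC(1977,7,12)", 1], ...]; removing the double
  # quotes turns each date string into a bare JavaScript expression.
  rows.inspect.gsub(/"/, "")
end

rows = [["Date.UTC(1977,7,12)", 1], ["Date.UTC(1977,7,13)", 0]]
puts to_js_array(rows)
# prints [[Date.UTC(1977,7,12), 1], [Date.UTC(1977,7,13), 0]]
```

This works because the data contains only numbers and date strings; strings holding arbitrary text would need proper JSON encoding instead.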

Deployment: Heroku + MongoHQ

Heroku.com - free application hosting (for small apps)

Almost as simple as:

$ git remote add heroku git@heroku.com:appname.git

$ git push heroku master

MongoHQ.com - free MongoDB database hosting (up to 16 MB)

“Final” product

Application - http://pmretract.heroku.com

Code - http://github.com/neilfws/PubMed
