One Scientist’s Wish List for STM Publishers
Philip E. Bourne University of California San Diego
[email protected] http://www.sdsc.edu/pb
(see presentations and publications)
My Perspective …
• Background in both IT and science (chemistry, computational biology)
• My lab distributes, for free, data equivalent to ¼ of the Library of Congress every month
• I am a supporter of open access (provided there is a business model) and editor in chief of PLoS Computational Biology
• I am co‐founder of SciVee Inc.
• I am becoming increasingly interested in scholarly communication
I Readily Acknowledge Each Discipline is Different
Your Reaction to My Viewpoint
Motivators: What is Wrong with Me?
Well My Lab Anyway
We Cannot Possibly Read a Fraction of the Papers We Should
Drivers of Change Renear & Palmer 2009 Science 325:828‐832
Hence We Are Scanning More, Reading Less
Renear & Palmer 2009 Science 325:828‐832
The Truth About the Scientific eLaboratory
• I have ?? mail folders!
• The intellectual memory of my laboratory is in those folders
• This is an unhealthy hub and spoke mentality
The Truth About the Scientific eLaboratory
• I generate way more negative than positive data, but where is it?
• Content management is a mess
  – Slides, posters …
  – Data, lab notebooks …
  – Collaborations, journal clubs …
• Software is open, but where is it?
• Farewell is for the data too
Computational Biology Resources Lack Persistence and Usability. PLoS Comp. Biol. 4(7): e1000136
The Not so Hidden Truth About Science
• Scientists place more emphasis on writing and less on reading
• We are H factor obsessed, but interested in other metrics
• We are driven by (in order):
  – Grants
  – Papers
  – Teaching
  – Community service
Enough About Me – What About You?
Data and the Publication Are Disjoint
• PubMed contains 18,792,257 entries
• ~100,000 papers indexed per month
• In Feb 2009:
  – 67,406,898 interactive searches were done
  – 92,216,786 entries were viewed
• 1078 databases reported in NAR 2008
• MetaBase http://biodatabase.org reports 2,651 entries edited 12,587 times
Biosciences Data as of April 14, 2009
Publishing Limitations
• A paper is an artifact of a previous era
• It is not the logical end product of eScience, hence:
  – Work is omitted
  – Article vs supplement is a mess
  – Visualization may be limited
  – Interaction and enquiry are non‐existent
  – Rich media can help, but are rarely used
The Traditional PDF is an Inferior Way to Convey the Science
The Traditional PDF is not the Natural End Product of the Research Enterprise
A paper when complete is thrown over a high wall to a publisher and essentially forgotten – Perhaps it is time to climb the wall?
uzar.wordpress.com
The Game is Afoot
It is being driven from the top down and the bottom up
Database & Literature Integration
www.rcsb.org/pdb/explore/literature.do?structureId=1TIM
Context
BMC Bioinformatics Accepted
Semantic Tagging of Database Content
http://www.pdb.org PLoS Comp. Biol. 6(2) e1000673
Interactive PDFs, etc.
Article of the Future
Post‐publication of Video and Paper www.scivee.tv
Pubcast – Video Integrated with the Full Text of the Paper
Pubcasts - A Unique Technology
Don’t understand what you are reading? Click and have the author pop up and explain it!
See the scientists and the experiments behind the research papers and textbooks
Pubcasts - A Blend of Video, text, tables, figures, PowerPoints, comments, ratings…
ALL SYNCHRONIZED FOR RAPID LEARNING
Mashups – www.scivee.tv
Postercasts
More on Semantic Tagging
Semantic Tagging http://biolit.ucsd.edu
ICTP Trieste, December 10, 2007
This is Literature Post‐processing
Better to Get the Authors Involved
• Authors are the absolute experts on the content
• More effective distribution of labor
• Add metadata before the article enters the publishing process
BMC Bioinformatics 2010 11:103
Word 2007 Add‐in for Authors
• Allows authors to add metadata as they write, before they submit the manuscript
• Authors are assisted by automated term recognition
  – OBO ontologies
  – Database IDs
• Metadata are embedded directly into the manuscript document via XML tags, OOXML format
  – Open
  – Machine‐readable
• Open source, Microsoft Public License
http://www.codeplex.com/ucsdbiolit
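The idea of recognizing terms as the author writes and embedding them as XML tags can be illustrated with a minimal Python sketch. This is not the add‐in's code and does not use the OOXML schema: the `TERMS` dictionary, the `<p>` and `<term>` element names, the `source` and `id` attributes, and the functions `tag_terms` and `tagged_xml` are all hypothetical, chosen only to show dictionary‐based term recognition followed by inline tagging. The two identifiers are illustrative examples.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical term dictionary: lowercase surface form -> (source, identifier).
# A real tool would load these from OBO ontology files and database indexes.
TERMS = {
    "triosephosphate isomerase": ("PDB", "1TIM"),
    "glycolysis": ("GO", "GO:0006096"),
}


def tag_terms(sentence: str) -> ET.Element:
    """Return a <p> element with recognized terms wrapped in <term> tags."""
    # Try longer terms first so a longer match wins over a shorter one.
    pattern = re.compile(
        "|".join(re.escape(t) for t in sorted(TERMS, key=len, reverse=True)),
        re.IGNORECASE,
    )
    para = ET.Element("p")
    last = None  # most recently added <term> element, if any
    pos = 0
    for match in pattern.finditer(sentence):
        plain = sentence[pos:match.start()]
        if last is None:
            para.text = plain   # text before the first tagged term
        else:
            last.tail = plain   # text between two tagged terms
        source, ident = TERMS[match.group(0).lower()]
        last = ET.SubElement(para, "term", {"source": source, "id": ident})
        last.text = match.group(0)
        pos = match.end()
    rest = sentence[pos:]
    if last is None:
        para.text = rest        # no terms recognized: keep the sentence as is
    else:
        last.tail = rest
    return para


def tagged_xml(sentence: str) -> str:
    """Serialize the tagged sentence as an XML string."""
    return ET.tostring(tag_terms(sentence), encoding="unicode")


if __name__ == "__main__":
    print(tagged_xml("Triosephosphate isomerase acts in glycolysis."))
```

Because the tags sit inline in the text, the original sentence can always be recovered by dropping the markup, while a downstream tool can read the identifiers without re‐parsing the prose.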
Automatic Knowledge Discovery for Those with No Time to Read
Immunology Literature
Cardiac Disease Literature
Shared Function
0. Full text of PLoS papers stored in a database
1. A link brings up figures from the paper
2. Clicking the paper figure retrieves data from the PDB, which is analyzed
3. A composite view of journal and database content results
4. The composite view has links to pertinent blocks of literature text and back to the PDB
Here is What I Want
1. User clicks on thumbnail
2. Metadata and a web services call provide a renderable image that can be annotated
3. Selecting a feature provides a database/literature mashup
4. That leads to new papers
The Knowledge and Data Cycle
PLoS Comp. Biol. 1(3) e34
Let Us Summarize Where We Are
• Scientists (aka authors, consumers) have problems at home (aka lab.)
• Publishers have problems at home (changing business models, demands etc.)
• Change is afoot, both top down and bottom up
Let’s Catch Our Breath
So What Do I Think We Should Do To Solve My Problems and Your Problems?
What Should We Do?
Consider Today’s Academic Workflow
Research [Grants]
Journal Article
Conference Paper
Poster Session
Feds
Societies
Publishers
Reviews
Blogs
Community Service/Data Curation
Consider Tomorrow’s Academic Workflow
Research [Grants]
Journal Article
Conference Paper
Poster Session
Feds
Societies
Publishers
Reviews
Blogs
Community Service/Data Curation
Ideas, Data, Hypotheses
What Should We Do?
“We have an interaction with the publisher that does not begin when the scientific process ends, but begins at the beginning of the scientific process itself.”
PLoS Comp Biol End of May
Maybe The Line is Somewhere Else?
Scientist
Idea
Experiment
Data
Conclusions
Publish
Laboratory
Publisher
Maybe The Line is Somewhere Else?
Scientist
Idea
Experiment
Data
Conclusions
Publish
Laboratory
Publisher
Institution
Lab Notebook
Problems with Publishing Workflows
• Workflows are not linear
• Workflow : paper is not 1:1
• Confidentiality
• Peer review
• Infrastructure
• Community acceptance
• Reward system
Solutions to Publishing Workflows?
• New organizations (university as publisher?)
• Appropriate reward system
• Shared governance – author, institution, publisher
• Crowd sourcing the electronic printing press
Crowd Sourcing the Electronic Printing Press (aka Workshop: Beyond the PDF)
• Proposal to the US National Science Foundation
• Aims:
  – Define user requirements
  – Establish a specification document
  – Open source the development effort
  – Have a commitment from a publisher to publish a research object using the system
  – Act as an exemplar for what can be done
Logistics
• UC San Diego
• Sometime in the fall/winter of 2010
• Under the auspices of W3C
• FoRC will have a follow‐on meeting
Those Interested Thus Far
Virginia Barbour, Colin Batchelor, Richard K. Belew, Tanya Beradini, Geoffrey Bilder, Peter Binfield, Theodora Bloom, Katy Borner, Philip Bourne, Jean-Claude Bradley, Patrick Brown, Todd Carpenter, Richard Cave, Tim Clark, Matthew Cockerill, Matt Day, Lee Dirks, Jonathan Eisen, Michael Eisen, Lynn Fink, Marc Friedman, Pascale Gaudet, [email protected]
Paul Ginsparg, Carole Goble, Alexander Griekspoor, Timo Hannay, Eduard Hovy, Peter Jerram, Heather Joseph, Julia Lane, Barend Mons, Peter Murray-Rust, Catherine Nancarrow, Cameron Neylon, David Patterson, Mark Patterson, Tracy Pelon, Dan Pollock, [email protected], Brian Schottlaender, Borya Shakhnovich, David Shotton, Elliot Siegel, [email protected], Herbert Van de Sompel
Question: What if Everyone Had An Electronic Printing Press?
• Peer review might change?
• Bibliometrics might change?
• Business models will likely change?
• What happens to the database/literature divide?
• Societies might do more self publishing?
• We might have improved the dissemination of science, but will we have improved the comprehension?
General References
• What Do I Want from the Publisher of the Future PLoS Comp Biol (in Press) http://www.sdsc.edu/pb
• Fourth Paradigm: Data Intensive Scientific Discovery http://research.microsoft.com/enus/collaboration/fourthparadigm/
References to Exemplars
• Semantic Biochemical Journal ‐ 2010: Using Utopia
• Article of the Future, Cell, 2009:
• Prospect, Royal Society of Chemistry, 2009:
• Adventures in Semantic Publishing, Oxford U, 2009:
• The Structured Digital Abstract, Seringhaus/Gerstein, 2008
• CWA Nanopublications – 2010
Acknowledgements
• BioLit Team
  – Lynn Fink
  – Parker Williams
  – Marco Martinez
  – Rahul Chandran
  – Greg Quinn
• Microsoft Scholarly Communications
  – Pablo Fernicola
  – Lee Dirks
  – Savas Parastitidas
  – Alex Wade
  – Tony Hey
• wwPDB team
• SciVee Team
  – Apryl Bailey
  – Tim Beck
  – Leo Chalupa
  – Lynn Fink
  – Marc Friedman (CEO)
  – Ken Liu
  – Alex Ramos
  – Willy Suwanto
http://www.scivee.tv
http://biolit.ucsd.edu
http://www.pdb.org
http://www.codeplex.com/ucsdbiolit