Pathway Interaction Database (PID) Market Research BioPortals Tiger Team Meeting Mervi Heiskanen January 31, 2013


Pathway Interaction Database (PID) Market Research

BioPortals Tiger Team Meeting

Mervi Heiskanen

January 31, 2013


Background

• Highly structured, curated collection of information about known biomolecular interactions and key cellular processes assembled into signaling pathways.

• Collaboration with Nature Publishing Group, which supported PID by providing manual curation of pathways. The curation contract ended in 2012; the engineering contract ended in 2011.

• PID Site Stats Averages for 2012:
– Average users per month: 7,000
– Average number of visits per month: 10,000
– Average visit duration: 8 minutes

• Next step: Market research to determine community needs, and to guide future direction of PID.


• “PID retirement revisited” e-mail sent to PID power users (35) to schedule phone interviews. PID questionnaire attached for those preferring to respond by e-mail.

• Some asked to keep responses confidential.

• Received 11 responses (4 phone interviews, 7 e-mail responses); all support continuing development of PID.

• 31% response rate!

Questionnaire


Pathway Interaction Database (PID) questionnaire

• Please describe:

– How are you accessing PID data?
  • Data Portal.
  • Download: XML, BioPAX, SVG, JPG, API (via caBio): Is programmatic access to data useful/critical?

– How critical are data updates? Frequency of updates?

– Data curation: What are your thoughts on the method of PID data curation? Would you propose to retain curation by a dedicated PID analyst and review by 1-2 domain experts (as has been done in the past) or move to a community-based curation model (e.g., similar to WikiPathways)?

– Would it be useful to integrate PID with other pathway resources, e.g. Reactome, WikiPathways? How important is it to keep PID as a separate entity to retain cancer focus?

– Value of the PID data portal: what are the features you find useful?

– Any other comments, ideas?


How are you accessing PID Data? 10 responses

• PID portal: 9 users
• BioPAX: 9 users
• XML: 3 users
• API: 3 responses (might be useful in the future)
• SVG: 1 user

• Several data portal users do not use it very often; they mostly download BioPAX.

• “I mainly access PID by downloading the UniprotKB mapping file and the pathway ontology annotations.”


How critical are data updates? Frequency of updates? 10 responses: maintenance and updates are important

• Very important to at least maintain PID: Uniprot and other accession numbers need to be kept up to date. It would be nice to have pathway updates also, e.g. every 2-6 months. Currently about 1/3 of the genes in the genome are represented in PID pathways; it would be great to increase coverage.

• Given my previous experience I consider that two or three updates per year are sufficient to provide a high-quality service to the community. Technical error fixes, e.g. to the BioPAX exporter, are critical and should not be too difficult to release often. Resource updates can be once in several months.


What are your thoughts on the method of PID data curation? 11 responses

• You need someone to coordinate curation. One option is to fund existing pathway resources to curate data.

• Annotation by in-house analysts helped by outside domain experts is essential. Community annotation isn’t reliable to keep existing annotations complete and current, and fails as a general model for generating new content.

• Wiki Pathways effort is great but the quality and consistency of curation cannot be compared to PID. Hesitant to recommend crowd sourcing for PID, editorial review preferred.

• I suggest you adopt a hybrid approach. Crowd-sourcing of these pathways is useful once a pretty good model is already in place to seed the community-based editing. So the challenge for you should be to get initial pathways, reviewed by, say, 1 expert, up on the web. Then turn it over for community editing in a WikiPathways framework.


Importance of integration with other pathway resources and cancer focus? 10 responses.

• Keeping the cancer focus is very useful, but this can be accomplished in an integrated PID. If an independent PID curation effort continues focused on cancer pathways, other pathway resources like Wiki Pathways and Reactome could take over the task of managing the database.

• PID's strongest point is its focus on cancer and I’d keep it separated. However, coordinating curation efforts with other pathway databases such as Reactome would be beneficial for both resources, since it will enable higher curation coverage.

• Unless you're really planning to quickly merge into Reactome or another pathway DB, you must have at least 1-2 curators and a technical support person.


Value of the PID data portal, 10 responses

• Portal is nice, but if cuts are needed, focus on data quality and BioPAX.

• Pathway diagrams, view distance between molecules, pathway enrichment.

• I find the “research highlights” and the “bioinformatics primers” interesting and useful, especially for training purposes.

• Being able to easily upload/paste a file of identified proteins/genes and finding the pathways they belong to immediately is what really sets PID apart.


Other comments, ideas?

• We have used PID data for about six publications to date.

• I recommend you work with WikiPathways. Crowd-sourcing would perhaps cut down the number of domain experts you need from 2 to 1 (or a “half” of a person).

• We use PID heavily as part of our GDAC efforts for the TCGA project. We've used the pathways in many major publications including our work on the ovarian, colorectal, breast, kidney, and most recently endometrial datasets all published in Nature.


Conclusions

• All responders agreed PID is a valuable and unique resource and should be continued.

• They would prefer to keep PID the way it is: high quality data curation and review process by trusted neutral party.

• Cancer focus is valuable.

• Data portal is useful and user friendly; BioPAX download important for other pathway data resources using PID data.

Recommendation

• Restart PID data curation process.

• Continue to support PID data portal at CBIIT.

• Explore crowdsourcing to increase visibility and community involvement.