MongoDB and the Connectivity Map: making connections between genetics and disease

Upload: mongodb

Post on 06-Dec-2014

Category: Technology


DESCRIPTION

The Broad Institute has developed a novel high-throughput gene-expression profiling technology and has used it to build an open-source catalog of over a million profiles that captures the functional states of cells treated with drugs and other types of perturbations. Referred to as the Connectivity Map (or CMap), these data, when paired with pattern-matching algorithms, facilitate the discovery of connections between drugs, genes, and diseases. We wished to expose this resource to scientists around the world via an API that is easily accessible to programmers and biologists alike. We required a database solution that could handle a variety of data types and frequent changes to the schema. We realized that a relational database did not fit our needs and gravitated towards MongoDB for its ease of use, its support for dynamic schemas and complex data structures, and its expressive query syntax. In this talk, we’ll walk through how we built the CMap library. We’ll discuss why we chose MongoDB, the schema design iterations and tradeoffs we’ve made, how people are using the API, and what we’re planning for the next generation of biomedical data.

TRANSCRIPT

1. MongoDB and the Connectivity Map: making connections between genetics and disease

6. [email protected] @CoreyJFlynn

7. Gene Expression: a common language for biology

13. 2006: ~7,000 experiments. Over 19,000 registered users. Cited by over 1,200 scientific reports.

14. 2006

15. 2014

17. Connectivity Map Dataset: 1.4 million gene expression profiles
    12,488 Compounds: FDA approved drugs; bioactive tool compounds; screening hits
    3,800 Genes (shRNA & cDNA): targets/pathways of approved drugs; candidate disease genes; community nominations
    15 Cell types: banked primary cell types; cancer cell lines; primary hTERT-immortalized; patient-derived iPS cells; community nominated

18. Connectivity Map Data: easy to describe, tough to model
    Diverse users and use-cases
    Annotations are complex and often incomplete
    Frequent updates

19-20. Data Model: an agile philosophy keeps the model tractable
    Store just what's needed
    Test and use daily
    Refactor frequently

21. Data Model: an inventory of signatures
    signature_info

22. Data Model: shared fields as separate collections
    signature_info, cell_info

23. Data Model: shared fields as separate collections
    signature_info, treatment_info

24. Data Model: add computed fields and external meta-data
    signature_info, cell_info

25. Data Model: denormalize to optimize lookups
    signature_info, treatment_info

26. APIs: are awesome, life science needs more of them
    /siginfo/cell/A

27. APIs: are awesome, life science needs more of them
    /siginfo?q={cell:A}

28. API: MongoDB inspired a rich query syntax
    Function          Example
    Query             /siginfo?q={cell:A,name:B}
    Field selection   /siginfo?q={}&f={name:1}
    Document count    /siginfo?q={}&c=true
    Document limit    /siginfo?q={}&l=10
    Skip documents    /siginfo?q={}&l=10&sk=10
    Sort order        /siginfo?q={}&s={name:-1,cell:1}
    Distinct values   /siginfo?q={}&d=name
    Aggregation       /siginfo?q={}&g=name

29. API: Node and Mongoose enable easy API creation

30. Language Bindings: JSON as a universal format
    Javascript, Python, R

31. Analytic Tools: a compute API liberates command line scripts

32. Compute API: message queuing via a capped collection

33. A research platform for functional genomics

34. Predicting Drug Function: diverse structures, common activities

35. Predicting Drug Function: diverse structures, common activities
    VEGFR inhibitor; PPARG agonist; PI3K/MTOR inhibitor; ROCK inhibitor; Estrogen agonist

36. Finding Novel Drug Targets: repurposing failed drugs
    Original target

37. Finding Novel Drug Targets: repurposing failed drugs
    Original target
    Failed in Phase 2 clinical trial due to lack of efficacy

38. Finding Novel Drug Targets: repurposing failed drugs
    Original target; Novel Target A; Novel Target B; Novel Target C; Novel Target D
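
The query syntax listed on slide 28 maps naturally onto a small client-side helper. The sketch below is illustrative only: the parameter letters (q, f, c, l, sk, s, d, g) come from the slide, but the base URL, the function name `build_query_url`, and its keyword arguments are hypothetical. The slides write filters with unquoted keys (`{cell:A}`); this sketch sends strict JSON instead, on the assumption that the service accepts it.

```python
import json
from urllib.parse import urlencode

# Hypothetical base URL; the slides show only relative paths such as /siginfo.
BASE_URL = "https://api.example.org"


def _compact(obj):
    """Serialize a dict as compact JSON (no spaces) for use in a query string."""
    return json.dumps(obj, separators=(",", ":"))


def build_query_url(resource, query=None, fields=None, count=False,
                    limit=None, skip=None, sort=None, distinct=None,
                    group=None):
    """Build a URL using the single-letter parameters listed on slide 28."""
    params = {"q": _compact(query or {})}   # q: MongoDB-style filter
    if fields is not None:
        params["f"] = _compact(fields)      # f: field selection
    if count:
        params["c"] = "true"                # c: return a document count
    if limit is not None:
        params["l"] = str(limit)            # l: document limit
    if skip is not None:
        params["sk"] = str(skip)            # sk: documents to skip
    if sort is not None:
        params["s"] = _compact(sort)        # s: sort order, e.g. {"name": -1}
    if distinct is not None:
        params["d"] = distinct              # d: distinct values of one field
    if group is not None:
        params["g"] = group                 # g: aggregate by one field
    # urlencode percent-escapes the braces, quotes, and colons in the JSON.
    return f"{BASE_URL}/{resource}?{urlencode(params)}"


# Second page of ten signatures for cell "A", sorted by name descending.
url = build_query_url("siginfo", query={"cell": "A"},
                      limit=10, skip=10, sort={"name": -1})
print(url)
```

Keeping every option as a query-string parameter on one endpoint, rather than a separate route per field as in /siginfo/cell/A, is what lets the same URL shape express filtering, paging, sorting, and aggregation.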