araport.org@araport
Tripal within the Arabidopsis Information Portal
Vivek Krishnakumar
J. Craig Venter Institute
January 11, 2015
Tripal Database Network and Initiatives
PAG XXIII, San Diego, CA
Overview
• About Araport
• Current architecture
• Planned implementation
  – Leverage Chado schema
  – Accommodate inherited data
  – Serve as point of integration
  – Facilitate data sharing via web services
About Araport
• Objectives
  – Develop community web interface
    • sustainable, fundable and community-extensible
    • hosts analysis modules, visualization tools, user data spaces
  – Practice data federation
    • integrate diverse data sets from distributed sources
    • consume and expose data via RESTful web services
  – Maintain “gold standard” Col-0 annotation
    • assemble tissue-specific transcripts from publicly available RNA-seq datasets
    • incorporate novel coding and non-coding genes
Araport (https://www.araport.org)
• Explore data
  – ThaleMine
  – JBrowse
  – Science Apps
• Search data
  – Quick Search
  – BLAST
  – Raw data downloads
• Community
  – News & Events
  – Ask a question
  – Job Postings
  – Useful Links
Araport Architecture
[Figure: architecture diagram. Labels recoverable from the slide: External programs; Portal (www.araport.org); API (api.araport.org); Agave Core (metadata, user profile, Apps, Jobs, Systems); ADAMA (service manage, service enroll; services a to f); Authentication, metering, logging, versioning, HTTPS, CORS; Computing, Storage, Databases; ThaleMine, JBrowse, Science Apps; service types CGI, InterMine, Tripal, SOAP, REST, Others.]
Current implementation
Araport warehouse (publishes to the data mart): combination of flat-files and databases
• TAIR datasets
• Ontologies (GO, PSI)
• Interactions (BAR)
• Orthologs (Panther)
Web services (publish to the data mart): InterMine loader live calls to…
• UniProt web services
• PubMed web services
Araport data mart
• InterMine schema, PostgreSQL DB
• Indexed and flattened for speed
• Rebuilt periodically
Outputs
• ThaleMine WebApp
• ThaleMine web services
Planned implementation
Inputs
• Genome annotation pipeline
• Community curation data
Araport warehouse (publishes to the data mart)
• Chado schema, PostgreSQL DB
• General purpose but slow
• Permanent host for core genomic datasets (assembly, annotation, metadata, etc.)
Araport data mart
• InterMine schema, PostgreSQL DB
• Indexed and flattened for speed
• Rebuilt periodically
Outputs
• ThaleMine WebApp
• ThaleMine web services
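The split between a general-purpose warehouse and a flattened, periodically rebuilt mart can be illustrated with a small sketch. It uses an in-memory SQLite database and two heavily simplified tables; the table layout, the `prop_type` column, the `mart_gene` table, the `build_mart` function and all sample values are assumptions made for illustration, not the actual Chado or InterMine schemas or the Araport build process.

```python
import sqlite3


def build_mart(conn: sqlite3.Connection) -> None:
    """Rebuild a flat, indexed gene table from normalized warehouse tables."""
    cur = conn.cursor()
    # The mart is disposable: drop it and rebuild from the warehouse each time.
    cur.execute("DROP TABLE IF EXISTS mart_gene")
    cur.execute(
        """
        CREATE TABLE mart_gene AS
        SELECT f.uniquename AS locus,
               MAX(CASE WHEN p.prop_type = 'symbol' THEN p.value END) AS symbol,
               MAX(CASE WHEN p.prop_type = 'description' THEN p.value END) AS description
        FROM feature f
        LEFT JOIN featureprop p ON p.feature_id = f.feature_id
        GROUP BY f.feature_id, f.uniquename
        """
    )
    # Index the flattened table so lookups by locus avoid joins and scans.
    cur.execute("CREATE INDEX mart_gene_locus_idx ON mart_gene (locus)")
    conn.commit()


# Simplified stand-in for the warehouse: one row per property (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, uniquename TEXT);
    CREATE TABLE featureprop (feature_id INTEGER, prop_type TEXT, value TEXT);
    INSERT INTO feature VALUES (1, 'LOCUS_1'), (2, 'LOCUS_2');
    INSERT INTO featureprop VALUES
        (1, 'symbol', 'SYM1'),
        (1, 'description', 'example description'),
        (2, 'symbol', 'SYM2');
    """
)
build_mart(conn)
rows = conn.execute(
    "SELECT locus, symbol, description FROM mart_gene ORDER BY locus"
).fetchall()
for row in rows:
    print(row)
```

The warehouse keeps one row per fact, which is flexible but needs joins at query time; the mart trades that flexibility for one wide row per gene, which is why it has to be rebuilt whenever the warehouse changes.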
Chado
• Functions as our low-level (core) Araport data warehouse
  – Preserve legacy datasets with appropriate attributions
  – Track any new datasets generated (annotation updates, community contributions)
  – Serve as point of integration and de-duplication of certain data types
  – Integrate with planned community curation interface
• Supports our pursuit of being open-source (and future-proof)
http://gmod.org/wiki/Chado
Tripal
• Drupal CMS based modularized framework, exposing a user-friendly interface to Chado
  – provides standardized loaders for genomic datasets (FASTA, GFF3, GenBank, BLAST, GO, InterProScan, KEGG)
  – supports building custom templates and materialized views
  – exposes a well-documented API
http://tripal.info
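Any GFF3 loader targeting Chado has to translate GFF3's 1-based, inclusive coordinates into Chado's interbase `featureloc` coordinates (`fmin`, `fmax`) and its numeric strand. The sketch below shows that conversion for a single line. It is not Tripal's loader code; `FeatureLoc` and `parse_gff3_line` are hypothetical names, and the sample line is made up.

```python
from typing import Dict, NamedTuple
from urllib.parse import unquote


class FeatureLoc(NamedTuple):
    srcfeature: str              # landmark sequence (GFF3 column 1)
    feature_type: str            # feature type (GFF3 column 3)
    fmin: int                    # interbase start = GFF3 start - 1
    fmax: int                    # interbase end = GFF3 end
    strand: int                  # 1, -1 or 0
    attributes: Dict[str, str]   # decoded column 9 key/value pairs


_STRAND = {"+": 1, "-": -1, ".": 0, "?": 0}


def parse_gff3_line(line: str) -> FeatureLoc:
    """Convert one GFF3 feature line into Chado-style location fields."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    seqid, _source, ftype, start, end, _score, strand, _phase, attrs = cols
    attributes: Dict[str, str] = {}
    for pair in attrs.split(";"):
        if pair:
            key, _, value = pair.partition("=")
            # GFF3 escapes reserved characters with URL-style percent encoding.
            attributes[unquote(key)] = unquote(value)
    return FeatureLoc(seqid, ftype, int(start) - 1, int(end), _STRAND[strand], attributes)


# Made-up example line, not taken from any real annotation file.
loc = parse_gff3_line("Chr1\texample\tgene\t1000\t2000\t.\t+\t.\tID=gene0001;Name=demo%20gene")
print(loc)
```

With interbase coordinates the feature length is simply `fmax - fmin`, which is one reason the conversion is done once at load time rather than in every query.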
Integrate data inherited from TAIR
• Currently a combination of flat-files and TAIR’s Oracle database
  – Genome Assembly (TAIR9)
  – Genome Annotation (TAIR10): genes, pseudogenes, transposons, ncRNAs
  – Annotation properties: gene symbols, confidence ranking, functional descriptions, curator summary
  – GO Annotations (TAIR curated data at geneontology.org)
  – Publications (curated gene → publication relationships)
  – Variation data: Genetic markers, Polymorphisms (SNPs, TILLING) and T-DNA Insertions
  – Stock data (lines, clones, germplasm)
• Chado-backed Tripal will serve as the core repository for this data
Integrate with planned Community Curation Interface
Integrate publication data
• Existing sources for publication data
  – TAIR locus to PubMed ID mapping
  – NCBI gene2pubmed mapping
  – UniProt curated Protein to PubMed ID mapping
  – Publications missing PMIDs and/or DOIs
• Chado will act as point of integration
  – Combine and de-duplicate publication data from 3 sources (more in the future)
  – Collect and store metadata for publications with and without PMID and/or DOIs
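One way to combine and de-duplicate publication records from several sources is to key each record on its identifiers, falling back to a normalized title only when both PMID and DOI are missing. The sketch below illustrates that idea. It is an assumption about how such merging could work, not the logic of the Araport loaders; the record fields, the `merge_publications` function and the sample records are all hypothetical.

```python
import re
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, str]


def _keys(rec: Dict) -> List[Key]:
    """Return the identifier keys under which a record can be matched."""
    keys: List[Key] = []
    if rec.get("pmid"):
        keys.append(("pmid", str(rec["pmid"]).strip()))
    if rec.get("doi"):
        keys.append(("doi", rec["doi"].strip().lower()))  # DOIs are case-insensitive
    if not keys and rec.get("title"):
        # Fallback for records with neither PMID nor DOI.
        keys.append(("title", re.sub(r"[^a-z0-9]+", " ", rec["title"].lower()).strip()))
    return keys


def merge_publications(records: Iterable[Dict]) -> List[Dict]:
    """Single-pass merge of records that share a PMID, DOI, or fallback title."""
    index: Dict[Key, Dict] = {}
    merged: List[Dict] = []
    for rec in records:
        target = next((index[k] for k in _keys(rec) if k in index), None)
        if target is None:
            target = {"pmid": None, "doi": None, "title": None, "sources": set()}
            merged.append(target)
        for field in ("pmid", "doi", "title"):
            if target[field] is None and rec.get(field):
                target[field] = rec[field]
        target["sources"].add(rec["source"])
        # Re-index so identifiers gained from this record match later ones.
        for k in _keys(target):
            index[k] = target
    return merged


# Hypothetical records; identifiers and titles are made up.
records = [
    {"source": "TAIR", "pmid": "1001", "doi": None, "title": "Example paper A"},
    {"source": "NCBI gene2pubmed", "pmid": 1001, "doi": "10.1000/EXAMPLE.A", "title": None},
    {"source": "UniProt", "pmid": None, "doi": "10.1000/example.a", "title": "Example paper A"},
    {"source": "TAIR", "pmid": None, "doi": None, "title": "A thesis without identifiers"},
]
publications = merge_publications(records)
for pub in publications:
    print(pub["pmid"], pub["doi"], sorted(pub["sources"]))
```

Because this is a single pass, it does not join two already separate entries when a later record links them (for example a PMID-only entry and a DOI-only entry followed by a record carrying both); a production loader would need an extra reconciliation step for that case.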
Integrate Stock data
• TAIR stock-related tables mapped to their corresponding Chado counterparts
• Custom loaders developed to perform bulk updates of Stock information, Phenotypes, Polymorphism data and mappings to AGI loci
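A table-to-table mapping of this kind can be sketched as a column map plus a batching helper for bulk updates. The target column names (`name`, `uniquename`, `description`, `type_id`, `organism_id`, `is_obsolete`) follow Chado's `stock` table; everything else is hypothetical, including the TAIR-side field names, the `type_ids` lookup, and the functions `map_stock` and `batches`. This is not the code of the Araport loaders.

```python
from typing import Dict, Iterable, Iterator, List

# Hypothetical TAIR column -> Chado `stock` column.
TAIR_TO_CHADO = {
    "stock_name": "name",
    "stock_accession": "uniquename",
    "description": "description",
}


def map_stock(row: Dict, type_ids: Dict[str, int], organism_id: int) -> Dict:
    """Translate one source row into a dict shaped like a Chado stock row."""
    stock = {chado: row.get(tair) for tair, chado in TAIR_TO_CHADO.items()}
    if not stock["uniquename"]:
        raise ValueError("stock row has no accession to use as uniquename")
    try:
        # In Chado, type_id points at a controlled-vocabulary term (cvterm).
        stock["type_id"] = type_ids[row["stock_type"]]
    except KeyError:
        raise ValueError(f"unmapped stock type: {row.get('stock_type')!r}") from None
    stock["organism_id"] = organism_id
    stock["is_obsolete"] = bool(row.get("is_obsolete", False))
    return stock


def batches(items: Iterable, size: int) -> Iterator[List]:
    """Yield lists of at most `size` items, for bulk inserts or updates."""
    batch: List = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


# Made-up input row and lookup values.
example = map_stock(
    {"stock_name": "demo line", "stock_accession": "STOCK-0001", "stock_type": "germplasm"},
    type_ids={"germplasm": 42},
    organism_id=1,
)
print(example)
```

Each batch would then be written in a single transaction, so a failed bulk update can be rolled back without leaving the stock tables half-loaded.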
Role of Tripal within Araport
• Tripal is under active development, with plans in place to begin developing rational web services (WS) and to support interoperability
• Araport plans to be involved in this working group to satisfy the following needs of our project:
  – Expose live data from future annotation update pipelines to the community directly via WS
  – Expose stock data via WS in a standardized manner to Arabidopsis stock centers (both ABRC and NASC) to aid data synchronization
  – Embrace and support other open-source initiatives
Araport on GitHub
• GitHub organization: https://www.github.com/Arabidopsis-Information-Portal
• Relevant repositories:
  – tair-chado-batchflow
  – chado_pub_loader
  – pasa-chado-hook
  – GMOD/Apollo (fork)
Acknowledgements
• JCVI Developers
  – Maria Kim
  – Irina Belyaeva
  – Svetlana Karamycheva
• Tripal co-PI Stephen Ficklin and development community
• TAIR/Phoenix Bio: assistance with data migration
• Funding Agencies
Araport Team
• Chris Town (PI)
• Lisa McDonald (Education and Outreach Coordinator)
• Chris Nelson (Project Manager)
• Jason Miller (Co-PI, JCVI Technical Lead)
• Erik Ferlanti (Software Engineer)
• Vivek Krishnakumar (Bioinf. Engineer)
• Svetlana Karamycheva (Bioinf. Engineer)
• Irina Belyaeva (Software Engineer)
• Maria Kim (Bioinf. Engineer)
• Ben Rosen (Bioinf. Analyst)
• Chia-Yi Cheng (Bioinf. Analyst)
• Seth Schobel (Bioinf. Engineer)
• Eleanor Pence (Intern)
• Eva Huala (Project lead, TAIR)
• Bob Muller (Technical lead, TAIR)
• Gos Micklem (co-PI)
• Sergio Contrino (Software Engineer)
• Matt Vaughn (co-PI)
• Steve Mock (Advanced Computing Interfaces)
• Rion Dooley (Web and Cloud Services)
• Matt Hanlon (Web and Mobile Applications)
• Joe Stubbs (API Developer Platform)
• Walter Moreira (API Developer Federation)
• Chris Jordan (Database Manager)
THANK YOU!
Araport @ PAG XXIII
(Session; Details; Topic(s); Presenter(s))
• Tripal Database Network and Initiatives
  – Details: Sunday, January 11, 2015, 5:30 PM-5:45 PM, California
  – W876: Tripal within the Arabidopsis Information Portal (Vivek Krishnakumar)
• Arabidopsis Information Portal & IAIC Workshop
  – Details: Monday, January 12, 2015, 12:50 PM-3:00 PM, Pacific Salon 6-7 (2nd Floor)
  – W059: Walkthrough the Araport Web Site (Chia-Yi Cheng)
  – W061: Exposing Web Services for Araport (Jason Miller)
  – W062: Developing applications for Araport (Matt Vaughn)
• Computer Demo 2
  – Details: Tuesday, January 13, 2015, 12:30 PM, California
  – C23: Using the Arabidopsis Information Portal (Jason Miller)
• GMOD
  – Details: Wednesday, January 14, 2015, 11:30 AM, Golden West
  – W410: JBrowse within the Arabidopsis Information Portal (Vivek Krishnakumar)
• Poster Session – Even
  – Details: Monday, January 12, 2015, 10:00 AM-11:30 AM, Grand Exhibit Hall
  – P0790: Data Integration for the Plant Research Community: Araport (Chia-Yi Cheng)
  – P0792: Developing Content for the Arabidopsis Information Portal (Matt Vaughn)