
araport.org@araport

Tripal within the Arabidopsis Information Portal

Vivek Krishnakumar, J. Craig Venter Institute

01/11/2015

Tripal Database Network and Initiatives PAG XXIII, San Diego, CA


Overview

•  About Araport
•  Current architecture
•  Planned implementation
   – Leverage Chado schema
   – Accommodate inherited data
   – Serve as point of integration
   – Facilitate data sharing via web services


About Araport

•  Objectives
   – Develop community web interface
      •  sustainable, fundable and community-extensible
      •  hosts analysis modules, visualization tools, user data spaces
   – Practice data federation
      •  integrate diverse data sets from distributed sources
      •  consume and expose data via RESTful web services
   – Maintain “gold standard” Col-0 annotation
      •  assemble tissue-specific transcripts from publicly available RNA-seq datasets
      •  incorporate novel coding and non-coding genes


Araport (https://www.araport.org)

•  Explore data
   – ThaleMine
   – JBrowse
   – Science Apps
•  Search data
   – Quick Search
   – BLAST
   – Raw data downloads
•  Community
   – News & Events
   – Ask a question
   – Job Postings
   – Useful Links


Araport Architecture

[Architecture diagram; labels recoverable from the slide:]
•  External programs
•  Portal (www.araport.org)
•  API (api.araport.org)
•  Agave Core: meta data, user profile
•  ADAMA: service manage, service enroll
•  Components labeled a to f
•  Systems: Computing, Storage, Databases
•  Authentication, metering, logging, versioning, HTTPS, CORS
•  ThaleMine, JBrowse, Science Apps
•  InterMine, Tripal, Others


Current implementation

Araport warehouse: combination of flat-files and databases
•  TAIR datasets
•  Ontologies (GO, PSI)
•  Interactions (BAR)
•  Orthologs (Panther)

Web services: InterMine loader live calls to…
•  UniProt web services
•  PubMed web services

Araport data mart (published from the warehouse and the web services)
•  InterMine schema, PostgreSQL DB
•  Indexed and flattened for speed
•  Rebuilt periodically

Outputs
•  ThaleMine WebApp
•  ThaleMine web services


Planned implementation

Araport warehouse
•  Chado schema, PostgreSQL DB
•  General purpose but slow
•  Permanent host for core genomic datasets (assembly, annotation, metadata, etc.)

Inputs
•  Genome annotation pipeline
•  Community curation data

Araport data mart (published from the warehouse)
•  InterMine schema, PostgreSQL DB
•  Indexed and flattened for speed
•  Rebuilt periodically

Outputs
•  ThaleMine WebApp
•  ThaleMine web services
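The warehouse-to-data-mart "publish" step can be pictured with a minimal sketch. It is not the Araport build process: an in-memory SQLite database stands in for PostgreSQL, the `feature`, `featureprop` and `cvterm` tables are heavily reduced versions of their Chado namesakes, and the `gene_flat` table, the property names and the sample rows are assumptions made for illustration.

```python
import sqlite3


def build_data_mart(conn):
    """Flatten normalized feature/property rows into one wide, indexed table."""
    conn.executescript("""
        DROP TABLE IF EXISTS gene_flat;
        -- One row per feature; each property becomes its own column.
        CREATE TABLE gene_flat AS
        SELECT f.uniquename AS locus,
               MAX(CASE WHEN t.name = 'symbol' THEN p.value END) AS symbol,
               MAX(CASE WHEN t.name = 'description' THEN p.value END) AS description
        FROM feature f
        LEFT JOIN featureprop p ON p.feature_id = f.feature_id
        LEFT JOIN cvterm t ON t.cvterm_id = p.type_id
        GROUP BY f.feature_id, f.uniquename;
        -- Index the lookup key so mart queries avoid the joins above.
        CREATE INDEX gene_flat_locus ON gene_flat (locus);
    """)


def demo():
    """Load a tiny normalized 'warehouse', rebuild the mart, return its rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, uniquename TEXT);
        CREATE TABLE featureprop (feature_id INTEGER, type_id INTEGER, value TEXT);
        INSERT INTO cvterm VALUES (1, 'symbol'), (2, 'description');
        INSERT INTO feature VALUES (10, 'AT1G01010'), (11, 'AT1G01020');
        INSERT INTO featureprop VALUES
            (10, 1, 'NAC001'),
            (10, 2, 'NAC domain containing protein 1'),
            (11, 1, 'ARV1');
    """)
    build_data_mart(conn)
    return conn.execute(
        "SELECT locus, symbol, description FROM gene_flat ORDER BY locus"
    ).fetchall()
```

Because the mart is dropped and recreated on every call, it matches the "rebuilt periodically" model: the normalized tables stay the system of record and the flat table is disposable.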


Chado (http://gmod.org/wiki/Chado)

•  Functions as our low-level (core) Araport data warehouse
   – Preserve legacy datasets with appropriate attributions
   – Track any new datasets generated (annotation updates, community contributions)
   – Serve as point of integration and de-duplication of certain data types
   – Integrate with planned community curation interface
•  Supports our pursuit of being open-source (and future-proof)


Tripal (http://tripal.info)

•  Drupal CMS based modularized framework, exposing a user-friendly interface to Chado
   – provides standardized loaders for genomic datasets (FASTA, GFF3, GenBank, BLAST, GO, InterProScan, KEGG)
   – supports building custom templates and materialized views
   – exposes a well-documented API
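To give a feel for what a GFF3 loader has to do before anything reaches Chado, here is a minimal sketch of parsing one GFF3 feature line into a dictionary. It is not Tripal's loader (Tripal is a Drupal/PHP framework); the function name `parse_gff3_line` is hypothetical, and only the nine-column layout and the `key=value;...` attribute syntax come from the GFF3 format itself.

```python
from urllib.parse import unquote

GFF3_COLUMNS = (
    "seqid", "source", "type", "start", "end",
    "score", "strand", "phase", "attributes",
)


def parse_gff3_line(line):
    """Parse one GFF3 feature line; return None for comments and blank lines."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    record = dict(zip(GFF3_COLUMNS, fields))
    # Coordinates are 1-based integers in GFF3.
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    # Column 9 holds ';'-separated key=value pairs; values may be comma lists
    # and may contain URL-style percent escapes.
    attributes = {}
    for pair in record["attributes"].split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attributes[unquote(key)] = [unquote(v) for v in value.split(",")]
    record["attributes"] = attributes
    return record
```

A loader would typically turn `attributes["ID"]` into the feature's unique name and `attributes["Parent"]` into feature relationships; that mapping is left out of this sketch.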


Integrate data inherited from TAIR

•  Currently a combination of flat-files and TAIR’s Oracle database
   – Genome Assembly (TAIR9)
   – Genome Annotation (TAIR10): genes, pseudogenes, transposons, ncRNAs
   – Annotation properties: gene symbols, confidence ranking, functional descriptions, curator summary
   – GO Annotations (TAIR curated data at geneontology.org)
   – Publications (curated gene → publication relationships)
   – Variation data: Genetic markers, Polymorphisms (SNPs, TILLING) and T-DNA Insertions
   – Stock data (lines, clones, germplasm)
•  Chado-backed Tripal will serve as the core repository for this data


Integrate with planned Community Curation Interface


Integrate publication data

•  Existing sources for publication data
   – TAIR locus to PubMed ID mapping
   – NCBI gene2pubmed mapping
   – UniProt curated Protein to PubMed ID mapping
   – Publications missing PMIDs and/or DOIs
•  Chado will act as point of integration
   – Combine and de-duplicate publication data from 3 sources (more in the future)
   – Collect and store metadata for publications with and without PMID and/or DOIs
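The combine-and-de-duplicate idea can be sketched as follows. This is an illustration under stated assumptions, not the `chado_pub_loader` code: the record layout (`pmid`, `doi`, `title`, `loci`) and the function name `merge_publications` are hypothetical, and two records are treated as the same paper only when they share a PMID or a DOI.

```python
def merge_publications(*sources):
    """Merge publication records from several sources.

    Records sharing a PMID or a DOI are folded into one entry; records with
    neither identifier are kept as separate entries, since nothing reliable
    links them to anything else.
    """
    merged = []  # de-duplicated records, in first-seen order
    index = {}   # ("pmid", value) or ("doi", value) -> record in `merged`
    for source in sources:
        for rec in source:
            # Normalize identifiers so 111 == "111" and DOI case is ignored.
            pmid = str(rec["pmid"]) if rec.get("pmid") else None
            doi = rec["doi"].strip().lower() if rec.get("doi") else None
            keys = [k for k in (("pmid", pmid), ("doi", doi)) if k[1]]
            target = next((index[k] for k in keys if k in index), None)
            if target is None:
                target = {"pmid": None, "doi": None, "title": None, "loci": set()}
                merged.append(target)
            # Fill in whatever the earlier sources were missing.
            target["pmid"] = target["pmid"] or pmid
            target["doi"] = target["doi"] or doi
            target["title"] = target["title"] or rec.get("title")
            target["loci"].update(rec.get("loci", ()))
            for k in keys:
                index[k] = target
    return merged
```

One simplification to note: if a later record carries the PMID of one existing entry and the DOI of another, the sketch attaches it to the first match and does not merge the two older entries.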


Integrate Stock data

•  TAIR stock-related tables mapped to their corresponding Chado counterparts

•  Custom loaders developed to perform bulk updates of Stock information, Phenotypes, Polymorphism data and mappings to AGI loci
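A table-to-table mapping of this kind might look like the sketch below. The target column names (`uniquename`, `name`, `description`, `is_obsolete`, `type_id`, `organism_id`) follow Chado's `stock` table; the TAIR-side column names, the sample values and the function `to_chado_stock` are invented for illustration and do not reflect TAIR's actual Oracle schema or the Araport loaders.

```python
# Hypothetical TAIR column -> Chado `stock` column mapping.
# The TAIR names on the left are assumptions, not the real schema.
TAIR_TO_CHADO_STOCK = {
    "stock_number": "uniquename",
    "stock_name": "name",
    "stock_description": "description",
    "is_obsolete": "is_obsolete",
}


def to_chado_stock(tair_row, type_id, organism_id):
    """Translate one TAIR-style stock row into a Chado-style `stock` row."""
    row = {chado: tair_row.get(tair) for tair, chado in TAIR_TO_CHADO_STOCK.items()}
    if not row["uniquename"]:
        raise ValueError("a stock row needs a unique identifier")
    # Fall back to the unique name when no display name is supplied.
    row["name"] = row["name"] or row["uniquename"]
    row["is_obsolete"] = bool(row["is_obsolete"])
    # Foreign keys into the cvterm (stock type) and organism tables.
    row["type_id"] = type_id
    row["organism_id"] = organism_id
    return row
```

Keeping the mapping in a plain dictionary makes it easy to review against both schemas and to reuse for bulk updates.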


Role of Tripal within Araport

•  Tripal is under active development, with plans in place to begin developing rational web services (WS) as well as support interoperability
•  Araport plans to be involved in this working group to satisfy the following needs of our project:
   – Expose live data from future annotation update pipelines to the community directly via WS
   – Expose stock data via WS in a standardized manner to Arabidopsis stock centers (both ABRC and NASC) to aid data synchronization
   – Embrace and support other open-source initiatives


Araport on GitHub

•  GitHub organization: https://www.github.com/Arabidopsis-Information-Portal

•  Relevant repositories:
   – tair-chado-batchflow
   – chado_pub_loader
   – pasa-chado-hook
   – GMOD/Apollo (fork)


Acknowledgements

•  JCVI Developers
   – Maria Kim
   – Irina Belyaeva
   – Svetlana Karamycheva

•  Tripal co-PI Stephen Ficklin and development community

•  TAIR/Phoenix Bio: assistance with data migration

•  Funding Agencies


Araport Team

•  Chris Town, PI
•  Lisa McDonald, Education and Outreach Coordinator
•  Chris Nelson, Project Manager
•  Jason Miller, Co-PI, JCVI Technical Lead
•  Erik Ferlanti, Software Engineer
•  Vivek Krishnakumar, Bioinf. Engineer
•  Svetlana Karamycheva, Bioinf. Engineer
•  Maria Kim, Bioinf. Engineer
•  Ben Rosen, Bioinf. Analyst
•  Chia-Yi Cheng, Bioinf. Analyst
•  Seth Schobel, Bioinf. Engineer
•  Irina Belyaeva, Software Engineer
•  Eleanor Pence, Intern
•  Eva Huala, Project lead, TAIR
•  Bob Muller, Technical lead, TAIR
•  Gos Micklem
•  Sergio Contrino, Software Engineer
•  Matt Vaughn, co-PI
•  Steve Mock, Advanced Computing Interfaces
•  Rion Dooley, Web and Cloud Services
•  Matt Hanlon, Web and Mobile Applications
•  Joe Stubbs, API Developer, Platform
•  Walter Moreira, API Developer, Federation
•  Chris Jordan, Database Manager


THANK YOU!


Araport @ PAG XXIII

Each entry lists the session, its details (date, time, room), and the topic(s) with presenter(s).

•  Tripal Database Network and Initiatives
   – Sunday, January 11, 2015, 5:30 PM-5:45 PM, California
   – W876: Tripal within the Arabidopsis Information Portal (Vivek Krishnakumar)
•  Arabidopsis Information Portal & IAIC Workshop
   – Monday, January 12, 2015, 12:50 PM-3:00 PM, Pacific Salon 6-7 (2nd Floor)
   – W059: Walkthrough the Araport Web Site (Chia-Yi Cheng)
   – W061: Exposing Web Services for Araport (Jason Miller)
   – W062: Developing applications for Araport (Matt Vaughn)
•  Computer Demo 2
   – Tuesday, January 13, 2015, 12:30 PM, California
   – C23: Using the Arabidopsis Information Portal (Jason Miller)
•  GMOD
   – Wednesday, January 14, 2015, 11:30 AM, Golden West
   – W410: JBrowse within the Arabidopsis Information Portal (Vivek Krishnakumar)
•  Poster Session – Even
   – Monday, January 12, 2015, 10:00 AM-11:30 AM, Grand Exhibit Hall
   – P0790: Data Integration for the Plant Research Community: Araport (Chia-Yi Cheng)
   – P0792: Developing Content for the Arabidopsis Information Portal (Matt Vaughn)
