BioMart and CHADO, Arek Kasprzyk, GMOD meeting, 16 May 2005
TRANSCRIPT
![Page 1: BioMart and CHADO Arek Kasprzyk GMOD meeting 16 May 2005](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649ec55503460f94bcf552/html5/thumbnails/1.jpg)
BioMart and CHADO
Arek Kasprzyk
GMOD meeting
16 May 2005
![Page 2: BioMart and CHADO Arek Kasprzyk GMOD meeting 16 May 2005](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649ec55503460f94bcf552/html5/thumbnails/2.jpg)
BioMart
• User interfaces ‘advanced search’
– Web wizard
– GUI
– Text
• Query optimization
• Federation
• Structured database views (dataset)
![Page 3: BioMart and CHADO Arek Kasprzyk GMOD meeting 16 May 2005](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649ec55503460f94bcf552/html5/thumbnails/3.jpg)
BioMart schema
[Diagram labels: datasets, databases]
![Page 4: BioMart and CHADO Arek Kasprzyk GMOD meeting 16 May 2005](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649ec55503460f94bcf552/html5/thumbnails/4.jpg)
Dataset
• Organised into 1 - n tables with 0,1 level referencing (database view)
• Filters, Attributes
• Exportables, Importables, Links
• Properties captured by dataset configuration file
• Can be derived from source schema by fixed schema transformation
![Page 5: BioMart and CHADO Arek Kasprzyk GMOD meeting 16 May 2005](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649ec55503460f94bcf552/html5/thumbnails/5.jpg)
Datasets and schema
• Relational DB analogies
– Each dataset -> table
– Relational attributes translated to unique filters and attributes
– exportable/importable -> PK/FK
– A collection of datasets with unique names creates a virtual schema
![Page 6: BioMart and CHADO Arek Kasprzyk GMOD meeting 16 May 2005](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649ec55503460f94bcf552/html5/thumbnails/6.jpg)
Structured and ‘ad hoc’ database views
![Page 7: BioMart and CHADO Arek Kasprzyk GMOD meeting 16 May 2005](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649ec55503460f94bcf552/html5/thumbnails/7.jpg)
Dataset
[Diagram: tables connected by primary key (PK) and foreign key (FK) relations]
![Page 8: BioMart and CHADO Arek Kasprzyk GMOD meeting 16 May 2005](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649ec55503460f94bcf552/html5/thumbnails/8.jpg)
Dataset
[Diagram: tables connected by primary key (PK) and foreign key (FK) relations]
![Page 9: BioMart and CHADO Arek Kasprzyk GMOD meeting 16 May 2005](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649ec55503460f94bcf552/html5/thumbnails/9.jpg)
Dataset
[Diagram: tables connected by primary key (PK) and foreign key (FK) relations]
![Page 10: BioMart and CHADO Arek Kasprzyk GMOD meeting 16 May 2005](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649ec55503460f94bcf552/html5/thumbnails/10.jpg)
Dataset - ‘reversed star’
[Diagram: main tables (main 1, main 2) carrying keys PK1 and PK2, with dimension tables (dm) referencing them through FK1 and FK2]
![Page 11: BioMart and CHADO Arek Kasprzyk GMOD meeting 16 May 2005](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649ec55503460f94bcf552/html5/thumbnails/11.jpg)
Dataset
Fixed schema transformation
[Diagram labels: A, B, C, TA, TB]
![Page 12: BioMart and CHADO Arek Kasprzyk GMOD meeting 16 May 2005](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649ec55503460f94bcf552/html5/thumbnails/12.jpg)
Transformation principles
• Main
– 1:1, n:1
• Dimension
– 1:n
– 1:1, n:1
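One possible reading of these principles, sketched in Python: source tables related to the main table as 1:1 or n:1 are folded into the main table, while a 1:n relation starts a dimension table. The function name, the cardinality strings, and the example table names are assumptions made for illustration; they are not taken from MartBuilder.

```python
from typing import Dict, List, Tuple

MERGE = {"1:1", "n:1"}  # at most one related row per main row
SPLIT = {"1:n"}         # many related rows per main row


def plan_transformation(cardinalities: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Sort source tables into 'merged into main' and 'dimension'.

    Keys are hypothetical source table names; values give the
    cardinality of the relation main -> table.
    """
    merged: List[str] = []
    dimensions: List[str] = []
    for table, cardinality in cardinalities.items():
        if cardinality in MERGE:
            merged.append(table)
        elif cardinality in SPLIT:
            dimensions.append(table)
        else:
            raise ValueError(f"unsupported cardinality {cardinality!r} for {table}")
    return merged, dimensions


# Hypothetical source tables around a 'gene' main table.
merged, dimensions = plan_transformation(
    {"seq_region": "n:1", "gene_description": "1:1", "transcript": "1:n"}
)
```

The same rule would apply again inside each dimension: tables related 1:1 or n:1 to the dimension's focus table can be folded into it.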
![Page 13: BioMart and CHADO Arek Kasprzyk GMOD meeting 16 May 2005](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649ec55503460f94bcf552/html5/thumbnails/13.jpg)
Application
• Read database meta data
• User input:
– main, dms, cardinalities
• Write a configuration file
• Translate configuration into DDLs
• MartBuilder
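The step "translate configuration into DDLs" could look roughly like the sketch below. It assumes a hypothetical dictionary-shaped configuration (the real MartBuilder configuration format is not shown in these slides), uses SQLite only so the example runs anywhere, and follows the `dataset__content__type` and `_key` naming pattern described later in the talk. All table names and rows are made up.

```python
import sqlite3
from typing import Dict, List


def build_ddl(config: Dict) -> List[str]:
    """Turn a hypothetical transformation config into CREATE TABLE statements."""
    dataset, main, key = config["dataset"], config["main"], config["key"]
    main_table = f"{dataset}__{main}__main"
    key_col = f"{key}_key"
    # Main table: copy the focus table and expose its key as <key>_key.
    statements = [
        f"CREATE TABLE {main_table} AS "
        f"SELECT {main}.{key} AS {key_col}, {main}.* FROM {main}"
    ]
    # One dimension table per 1:n relation, joined back to the main key.
    for dm in config["dimensions"]:
        statements.append(
            f"CREATE TABLE {dataset}__{dm}__dm AS "
            f"SELECT m.{key_col}, d.* FROM {main_table} AS m "
            f"JOIN {dm} AS d ON d.{key} = m.{key_col}"
        )
    return statements


# Made-up source schema and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE xref (gene_id INTEGER, accession TEXT);
    INSERT INTO gene VALUES (1, 'geneA'), (2, 'geneB');
    INSERT INTO xref VALUES (1, 'ACC1'), (2, 'ACC2'), (2, 'ACC3');
    """
)
config = {"dataset": "hsapiens", "main": "gene", "key": "gene_id", "dimensions": ["xref"]}
for statement in build_ddl(config):
    conn.execute(statement)
```

Generating plain SQL strings keeps the transformation inspectable before it is run, which fits the "write a configuration file, then translate" split on the slide.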
![Page 14: BioMart and CHADO Arek Kasprzyk GMOD meeting 16 May 2005](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649ec55503460f94bcf552/html5/thumbnails/14.jpg)
Transformation configuration file
• Focus tables
– Main, dm
• Central, reference tables
• Type: exported, imported
• Keys
• Optional
– Columns subset
– User table names
– Projections
– Central filters
![Page 15: BioMart and CHADO Arek Kasprzyk GMOD meeting 16 May 2005](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649ec55503460f94bcf552/html5/thumbnails/15.jpg)
Datasets, Attributes and Filters
GENE
gene_id (PK)
gene_stable_id
gene_start
gene_chrom_end
chromosome
gene_display_id
description
Mart
Dataset
Attribute
Filter
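As a rough illustration of how attributes and filters relate to a table such as GENE, the sketch below treats attributes as the columns to return and filters as equality restrictions on columns. The `build_query` helper and the sample rows are hypothetical; the actual BioMart query compiler is not described in these slides.

```python
import sqlite3
from typing import Dict, List, Sequence, Tuple


def build_query(
    table: str, attributes: Sequence[str], filters: Dict[str, object]
) -> Tuple[str, List[object]]:
    """Attributes become the SELECT list; filters become WHERE conditions."""
    sql = f"SELECT {', '.join(attributes)} FROM {table}"
    params = list(filters.values())
    if filters:
        sql += " WHERE " + " AND ".join(f"{column} = ?" for column in filters)
    return sql, params


# Made-up rows in a cut-down version of the GENE table from the slide.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, gene_stable_id TEXT,
                       chromosome TEXT, description TEXT);
    INSERT INTO gene VALUES (1, 'G1', '1', 'first'), (2, 'G2', '2', 'second');
    """
)
sql, params = build_query("gene", ["gene_stable_id", "description"], {"chromosome": "1"})
rows = conn.execute(sql, params).fetchall()
```

Filter values are passed as bound parameters rather than pasted into the SQL text, so user input never changes the shape of the query.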
![Page 16: BioMart and CHADO Arek Kasprzyk GMOD meeting 16 May 2005](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649ec55503460f94bcf552/html5/thumbnails/16.jpg)
Exportables, Importables and Links
Dataset 1
Dataset 2
Links
![Page 17: BioMart and CHADO Arek Kasprzyk GMOD meeting 16 May 2005](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649ec55503460f94bcf552/html5/thumbnails/17.jpg)
Exportables, Importables and Links
UniProt (Exportable)
name = uniprot_id
attributes = uniprot_ac
SELECT uniprot_ac FROM ...

Human Ensembl Genes (Importable)
name = uniprot_id
filters = uniprot_ac_list
SELECT … FROM … WHERE uniprot_ac IN (….)

Links
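The two SELECT statements on this slide can be read as a two-step query: the exportable's attribute produces a list of values, and the importable's filter consumes that list. The sketch below imitates this with two separate in-memory SQLite databases standing in for the two datasets. The table names `protein` and `gene`, the column `gene_stable_id`, and all rows are assumptions made for the example.

```python
import sqlite3
from typing import List, Tuple


def linked_query(source: sqlite3.Connection, target: sqlite3.Connection) -> List[Tuple]:
    """Run the exportable query on source, then feed its values to target."""
    # Step 1: the exportable's attribute supplies the values.
    accessions = [row[0] for row in source.execute("SELECT uniprot_ac FROM protein")]
    if not accessions:
        return []
    # Step 2: the importable's filter receives them as an IN list.
    placeholders = ", ".join("?" for _ in accessions)
    sql = (
        "SELECT gene_stable_id, uniprot_ac FROM gene "
        f"WHERE uniprot_ac IN ({placeholders}) ORDER BY gene_stable_id"
    )
    return target.execute(sql, accessions).fetchall()


# Two independent databases with made-up rows.
uniprot = sqlite3.connect(":memory:")
uniprot.executescript(
    "CREATE TABLE protein (uniprot_ac TEXT);"
    "INSERT INTO protein VALUES ('ACC_A'), ('ACC_B');"
)
ensembl = sqlite3.connect(":memory:")
ensembl.executescript(
    "CREATE TABLE gene (gene_stable_id TEXT, uniprot_ac TEXT);"
    "INSERT INTO gene VALUES ('G1', 'ACC_A'), ('G2', 'ACC_X'), ('G3', 'ACC_B');"
)
result = linked_query(uniprot, ensembl)
```

Because the join happens in the client rather than inside one database, the two datasets can live on different servers, which is the point of the link mechanism.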
![Page 18: BioMart and CHADO Arek Kasprzyk GMOD meeting 16 May 2005](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649ec55503460f94bcf552/html5/thumbnails/18.jpg)
Exportables, Importables and Links
Encode (Exportable)
name = genomic_region
attributes = chr_name, chr_start, chr_end
SELECT chr_name, chr_start, chr_end FROM ...

Human Ensembl Genes (Importable)
name = genomic_region
filters = chr_name (=), chr_start (>=), chr_end (<=)
SELECT … FROM … WHERE (chr_name = 1 AND chr_start >= 100 AND chr_end <= 10000) OR (chr_name = 2 AND chr_start >= 50 AND chr_end <= 56780) ...

Links
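For a multi-column link such as genomic_region, each exported row becomes one parenthesised group of conditions, and the groups are joined with OR. A minimal sketch of building that WHERE clause with bound parameters follows; the helper name `region_where` is hypothetical, and the operators mirror the ones listed on the slide.

```python
from typing import List, Sequence, Tuple

Region = Tuple[str, int, int]  # (chr_name, chr_start, chr_end)


def region_where(regions: Sequence[Region]) -> Tuple[str, List[object]]:
    """Build an OR of per-region conditions plus the matching parameter list."""
    clauses: List[str] = []
    params: List[object] = []
    for chr_name, chr_start, chr_end in regions:
        # One group per exported row: name must match, coordinates must fall inside.
        clauses.append("(chr_name = ? AND chr_start >= ? AND chr_end <= ?)")
        params.extend([chr_name, chr_start, chr_end])
    return " OR ".join(clauses), params


# The two regions used in the slide's example query.
where, params = region_where([("1", 100, 10000), ("2", 50, 56780)])
```

With no regions the function returns an empty string, so a caller would need to skip the WHERE clause in that case.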
![Page 19: BioMart and CHADO Arek Kasprzyk GMOD meeting 16 May 2005](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649ec55503460f94bcf552/html5/thumbnails/19.jpg)
Dataset configuration
• Hierarchical representation of filters and attributes
– Trees
– Groups
– Collections
• Exportables and Importables
• Basic relational mapping
• Meta data - defines user interface
![Page 20: BioMart and CHADO Arek Kasprzyk GMOD meeting 16 May 2005](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649ec55503460f94bcf552/html5/thumbnails/20.jpg)
Dataset Configuration
[Diagram: dataset configurations stored as XML]
![Page 21: BioMart and CHADO Arek Kasprzyk GMOD meeting 16 May 2005](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649ec55503460f94bcf552/html5/thumbnails/21.jpg)
MartEditor
![Page 22: BioMart and CHADO Arek Kasprzyk GMOD meeting 16 May 2005](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649ec55503460f94bcf552/html5/thumbnails/22.jpg)
Table naming convention
Naïve configuration
• Tables
– Meta tables: meta_content
– Data tables: dataset__content__type
• Data tables
– Main: __main
– Dimension: __dm
• Columns
– Key: _key
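Since the convention encodes a table's role in its name, a tool can recover the dataset, the content, and the table type by splitting on the double underscore. The following sketch shows one way to do that; the class and function names are hypothetical, and the example table name is made up.

```python
from typing import NamedTuple


class MartTable(NamedTuple):
    dataset: str
    content: str
    kind: str  # "main" or "dm"


def parse_table_name(name: str) -> MartTable:
    """Split a data table name of the form dataset__content__type."""
    parts = name.split("__")
    if len(parts) != 3 or parts[2] not in ("main", "dm"):
        raise ValueError(f"not a mart data table name: {name!r}")
    return MartTable(*parts)


def is_key_column(column: str) -> bool:
    """Key columns are recognised by the _key suffix."""
    return column.endswith("_key")


table = parse_table_name("hsapiens__gene__main")
```

Names that do not follow the pattern, such as meta tables, are rejected with a ValueError rather than guessed at.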
![Page 23: BioMart and CHADO Arek Kasprzyk GMOD meeting 16 May 2005](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649ec55503460f94bcf552/html5/thumbnails/23.jpg)
Retrieval
BioMart architecture
[Diagram: data flow from source databases to retrieval tools]
• Databases: myDatabase, public data (local or remote)
• Schema transformation: MartBuilder
• SNP, Vega, Ensembl, UniProt, MSD, myMart
• Configuration: MartEditor, XML
• BioMart API: Java, Perl
• Retrieval: MartExplorer, MartShell, MartView
![Page 24: BioMart and CHADO Arek Kasprzyk GMOD meeting 16 May 2005](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649ec55503460f94bcf552/html5/thumbnails/24.jpg)
BioMart Registry
[Diagram labels: R, WWW, GUI]
![Page 25: BioMart and CHADO Arek Kasprzyk GMOD meeting 16 May 2005](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649ec55503460f94bcf552/html5/thumbnails/25.jpg)
Class diagram - configuration
![Page 26: BioMart and CHADO Arek Kasprzyk GMOD meeting 16 May 2005](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649ec55503460f94bcf552/html5/thumbnails/26.jpg)
Class diagram - querying
![Page 27: BioMart and CHADO Arek Kasprzyk GMOD meeting 16 May 2005](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649ec55503460f94bcf552/html5/thumbnails/27.jpg)
MartView
![Page 28: BioMart and CHADO Arek Kasprzyk GMOD meeting 16 May 2005](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649ec55503460f94bcf552/html5/thumbnails/28.jpg)
MartShell
![Page 29: BioMart and CHADO Arek Kasprzyk GMOD meeting 16 May 2005](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649ec55503460f94bcf552/html5/thumbnails/29.jpg)
MartExplorer
![Page 30: BioMart and CHADO Arek Kasprzyk GMOD meeting 16 May 2005](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649ec55503460f94bcf552/html5/thumbnails/30.jpg)
Third party software
• Bioconductor (biomaRt) – BioMart schema
• Taverna – BioMart Java library
• DAS ProServer – BioMart Perl library
![Page 31: BioMart and CHADO Arek Kasprzyk GMOD meeting 16 May 2005](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649ec55503460f94bcf552/html5/thumbnails/31.jpg)
biomaRt
![Page 32: BioMart and CHADO Arek Kasprzyk GMOD meeting 16 May 2005](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649ec55503460f94bcf552/html5/thumbnails/32.jpg)
Taverna
![Page 33: BioMart and CHADO Arek Kasprzyk GMOD meeting 16 May 2005](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649ec55503460f94bcf552/html5/thumbnails/33.jpg)
ProServer
• No programming
• DAS requests and responses defined by Exportables and Importables and configured by MartEditor
• DAS1
![Page 34: BioMart and CHADO Arek Kasprzyk GMOD meeting 16 May 2005](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649ec55503460f94bcf552/html5/thumbnails/34.jpg)
Where are we?
• 0.2 released in February
• 0.3 to be released in June
– Platforms
  • MySQL
  • Oracle
  • Postgres
– Robust error handling
![Page 35: BioMart and CHADO Arek Kasprzyk GMOD meeting 16 May 2005](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649ec55503460f94bcf552/html5/thumbnails/35.jpg)
Where are we?
• BioMart v 0.2
– Large scale data federation (Hinxton)
  • UniProt Proteomes, MSD, Ensembl, Vega
– Optimizing access to a large database
  • Ensembl, WormBase, ArrayExpress
– Federating small datasets with public data
  • Pasteur, INRA, Bayer, Unilever, Serono, Sanofi-Aventis, DevGen, etc …
![Page 36: BioMart and CHADO Arek Kasprzyk GMOD meeting 16 May 2005](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649ec55503460f94bcf552/html5/thumbnails/36.jpg)
Immediate Future
• MartBuilder
– GUI
– XML configuration
• MartView
– Scalable
– Configurable
![Page 37: BioMart and CHADO Arek Kasprzyk GMOD meeting 16 May 2005](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649ec55503460f94bcf552/html5/thumbnails/37.jpg)
Acknowledgments
• BioMart
– Damian Smedley (EBI)
– Darin London (EBI)
– Will Spooner (CSHL)
• Contributors
– Arne Stabenau (Ensembl)
– Andreas Kahari (Ensembl)
– Craig Melsopp (Ensembl)
– Katerina Tzouvara (Uniprot)
– Paul Donlon (Unilever)