EnsMart
Ewan Birney, European Bioinformatics Institute (EBI)
Data Mining...
• ...Is more than a buzz-word
• Most molecular biology is moving away from one-gene-at-a-time approaches
• Needs to make and work with gene lists
Complex Disease Association
Genome scan
Linkage peaks
Animal models
Candidate Genes
SNP choosing and validation
Patient vs Control association studies
MicroArray
Experiment done on Platform A: interesting spots from A
Experiment done on Platform B: interesting spots from B
Integrate spots to form gene set
Dump orthologs and promoters
Computational Screen
Dump Gene regions from set of protein kinase genes
Formulate Clever method in house
Display results in genome context on biologist-friendly web display
data mining problems...
• You need all the data in one place to provide data
• The natural language for database queries (SQL) is not... so natural!
• Often SQL queries are very slow on normalised databases
• Often there is additional analysis which needs to occur
EnsMart
• SQL queries are slow:
– transform the data into a query-optimised, read-only database
• Additional analysis is needed
– Precompute additional analysis for all items (disk space is cheap!)
• You need all the data in one place
– Federate databases (BioMart)
Normalised databases
Gene, Transcripts, Exons, Sequence, External Reference
(one-to-many, ">1", links between tables)
Five (six) table join for “genes with this set of Affymetrix IDs on this chromosome”
Mart Transformation
Normalised → Query optimised (reverse star schema)
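The contrast between the normalised and the query-optimised layout can be sketched with Python's built-in sqlite3 module. Everything here is an assumption made for illustration: the table and column names, the gene and probe identifiers, and the use of a single wide table (gene_mart) as a stand-in for the main table of a reverse star schema. It is not the real Ensembl or EnsMart schema, and the normalised side is cut down to three tables rather than five or six.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical, heavily simplified normalised schema (not the real Ensembl tables).
cur.executescript("""
CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, stable_id TEXT, chromosome TEXT);
CREATE TABLE transcript (transcript_id INTEGER PRIMARY KEY, gene_id INTEGER);
CREATE TABLE external_reference (
    xref_id INTEGER PRIMARY KEY, transcript_id INTEGER, source TEXT, accession TEXT);
""")
cur.executemany("INSERT INTO gene VALUES (?, ?, ?)",
                [(1, "GENE_A", "1"), (2, "GENE_B", "1"), (3, "GENE_C", "2")])
cur.executemany("INSERT INTO transcript VALUES (?, ?)",
                [(10, 1), (11, 1), (20, 2), (30, 3)])
cur.executemany("INSERT INTO external_reference VALUES (?, ?, ?, ?)",
                [(100, 10, "affymetrix", "probe_1"),
                 (101, 11, "affymetrix", "probe_1"),
                 (102, 20, "uniprot", "P_FAKE"),
                 (103, 30, "affymetrix", "probe_2")])

probe_ids = ["probe_1", "probe_2"]
placeholders = ",".join("?" for _ in probe_ids)

# Normalised query: every question has to walk the chain of joins again.
normalised_sql = f"""
SELECT DISTINCT g.stable_id
FROM gene g
JOIN transcript t ON t.gene_id = g.gene_id
JOIN external_reference x ON x.transcript_id = t.transcript_id
WHERE x.source = 'affymetrix'
  AND x.accession IN ({placeholders})
  AND g.chromosome = ?
ORDER BY g.stable_id
"""
normalised_result = cur.execute(normalised_sql, probe_ids + ["1"]).fetchall()

# Mart transformation: pay for the joins once and store the result
# as a wide, indexed table that is treated as read-only afterwards.
cur.execute("""
CREATE TABLE gene_mart AS
SELECT g.stable_id, g.chromosome, x.source, x.accession
FROM gene g
JOIN transcript t ON t.gene_id = g.gene_id
JOIN external_reference x ON x.transcript_id = t.transcript_id
""")
cur.execute("CREATE INDEX idx_gene_mart ON gene_mart (source, accession, chromosome)")

# Same question against the mart table: no joins at query time.
mart_sql = f"""
SELECT DISTINCT stable_id
FROM gene_mart
WHERE source = 'affymetrix'
  AND accession IN ({placeholders})
  AND chromosome = ?
ORDER BY stable_id
"""
mart_result = cur.execute(mart_sql, probe_ids + ["1"]).fetchall()

print(normalised_result, mart_result)
```

Both queries return the same gene list; the difference is that the mart query touches one precomputed table, trading disk space for query-time work.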
Web User interface
• Web based
• Wizard like
• “dataset” (focus)
• “filter” - restrictions
• output
– columns to show
– sequence
Set based work
• EnsMart can export sets of Ids (Ensembl, Affymetrix, Uniprot)...
• EnsMart can also filter on a given set of Ids
– (give me all the chromosome locations of genes defined by my Affymetrix information)
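The set-based workflow above can be sketched in a few lines of plain Python. The rows, identifiers, and function names (MART_ROWS, export_ids, locations_for_ids) are hypothetical and exist only for this illustration; they are not part of EnsMart.

```python
# Hypothetical mart rows: one row per (gene, Affymetrix probe) pair.
MART_ROWS = [
    {"ensembl_id": "GENE_A", "affy_id": "probe_1", "chromosome": "1", "start": 1000, "end": 5000},
    {"ensembl_id": "GENE_A", "affy_id": "probe_2", "chromosome": "1", "start": 1000, "end": 5000},
    {"ensembl_id": "GENE_B", "affy_id": "probe_3", "chromosome": "2", "start": 200, "end": 900},
    {"ensembl_id": "GENE_C", "affy_id": "probe_4", "chromosome": "X", "start": 50, "end": 75},
]


def export_ids(rows, id_column):
    """Export the distinct IDs found in one column, as a sorted list."""
    return sorted({row[id_column] for row in rows})


def locations_for_ids(rows, id_column, wanted_ids):
    """Filter on a given set of IDs and return distinct gene locations."""
    wanted = set(wanted_ids)
    locations = {
        (row["ensembl_id"], row["chromosome"], row["start"], row["end"])
        for row in rows
        if row[id_column] in wanted
    }
    return sorted(locations)


# "Give me all the chromosome locations of genes defined by my Affymetrix information"
my_probes = ["probe_1", "probe_2", "probe_4"]
locations = locations_for_ids(MART_ROWS, "affy_id", my_probes)
exported = export_ids(MART_ROWS, "ensembl_id")

print(locations)
print(exported)
```

Two probes that hit the same gene collapse to a single location, which is the behaviour one would want when moving from a spot list to a gene list.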
BioMart
EnsMart: Ensembl specific, only runs from www.ensembl.org
BioMart: made generic, multiple installations, query federation
(Arek Kasprzyk)
BioMart
Any schema → data-mart schema → user interface
Mart Builder, Mart config (XML specification)
BioMart
• www.ebi.ac.uk/biomart
• Google for BioMart
• Ensembl
• Uniprot
• MSD structures
BioMart future...
• ArrayExpress (gene expression dataset)
• WormBase
• ...others
Cross-Internet Mart…
WWW
Mutant Stock Sample Mart
Ensembl Genome Mart
Firewall
ArrayExpress Expression Atlas Mart
Mart Query Building Software
“Give me all the genes mapped within phenotype X in my samples that are also at least 4 fold upregulated in kidney. Give me all the ht SNPs in all the genes…”