Bioinformatics of mammalian gene expression (BoMGE), 07 June 2005, Gene Regulation Informatics



Deliver what?

System...

History/timelines

Competitive position

Deliver what?

‘Comprehensive catalog’ of mammalian regulatory elements

‘Validated’, known accuracy

Clustered into similar groups - ‘TF models’

Annotated as known/novel

Modules identified, ‘specific to...’

Predictions extrapolated to remote regions

Predictive system

Mostly Java, some Perl/bash

270 CPUs/OSCAR

TRANSFAC 9.1, manual TFBS

EnsEMBL-based

Generalize...

OPTICS

Accuracy metrics

Coexpression resource

How best to use it? Motif discovery? Motif co-occurrence?

Multi-source orthologue resource

Compara, HomoloGene, Inparanoid, KEGG, …
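One way to query a multi-source orthologue resource is to count how many sources agree on each gene pair. The sketch below is a hypothetical illustration, not the procedure used here: it assumes each source (Compara, HomoloGene, Inparanoid, KEGG) has already been reduced to (human gene, other-species gene) pairs in a consistent orientation. The function name, the `min_support` parameter, and the gene IDs in the usage note are invented for illustration.

```python
from collections import Counter


def consensus_orthologues(sources, min_support=2):
    """Keep orthologue pairs reported by at least `min_support` sources.

    `sources` maps a source name to an iterable of (gene_a, gene_b) pairs.
    Assumption: every source lists pairs in the same (human, other) order.
    """
    support = Counter()
    for pairs in sources.values():
        # set() so a source listing the same pair twice still counts once
        for pair in set(pairs):
            support[pair] += 1
    return {pair: n for pair, n in support.items() if n >= min_support}
```

For example, with hypothetical IDs, `{"Compara": [("H1", "M1")], "HomoloGene": [("H1", "M1")], "Inparanoid": [("H2", "M3")]}` yields `{("H1", "M1"): 2}`: only the pair with two supporting sources survives the default cutoff.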

Visual comparative genomics: assessing ortholog annotations

LAGAN alignment detects misannotated chicken gene

Orthologues of a human gene

Assess sequence conservation for a coding exon (MLAGAN).
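The conservation check can be approximated by scoring each orthologue's row against the human row of an exon alignment. This is a minimal sketch, assuming the alignment (for example, parsed from MLAGAN output) is already available as equal-length gapped strings keyed by species; the function names and the 60% cutoff are hypothetical, not values from the slides.

```python
def percent_identity(reference, other):
    """Percent of reference (non-gap) columns where `other` has the same base."""
    if len(reference) != len(other):
        raise ValueError("aligned rows must have equal length")
    ref_cols = [(r, o) for r, o in zip(reference, other) if r != "-"]
    if not ref_cols:
        return 0.0
    matches = sum(1 for r, o in ref_cols if r.upper() == o.upper())
    return 100.0 * matches / len(ref_cols)


def flag_low_conservation(alignment, reference_name, min_identity=60.0):
    """Score every row against the reference; flag rows below `min_identity`.

    A flagged orthologue is only a candidate misannotation to inspect
    visually, not a confirmed error.
    """
    ref = alignment[reference_name]
    scores = {name: percent_identity(ref, row)
              for name, row in alignment.items() if name != reference_name}
    flagged = sorted(name for name, s in scores.items() if s < min_identity)
    return scores, flagged
```

Counting identity only over reference columns keeps the score comparable across species even when an orthologue row carries insertions.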

Motif discovery with multiple methods/params

Methods: (W)CONSENSUS, MEME, MotifSampler, Gibbs Sampler, BioProspector, MDmodule, …, Weeder, CisModule, NestedMICA, Sombrero, ...

‘Multiple’ means: methods, motif occurrence models, other parameters

Motif scores, p-values

[Figure: cumulative motif score distributions, target vs. random sequences (1500b region); panels: p-val = 0.02, no p-val threshold]

1 Discover with target and random sequences.

2 Apply method-independent score.

3 Use random distribution to assign p-value to a score.
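Step 3 amounts to an empirical p-value: the fraction of motif scores from random sequences that reach a given target score. A minimal sketch, assuming the method-independent scores from step 2 are already computed. The add-one correction (which keeps p above zero) and the use of 0.02 as a significance cutoff are assumptions; the slide shows p-val = 0.02 only as a plot label.

```python
import bisect


def empirical_p_values(target_scores, random_scores):
    """p(s) = (1 + #random scores >= s) / (1 + #random scores)."""
    ranked = sorted(random_scores)
    n = len(ranked)
    # bisect_left counts random scores strictly below s, so n minus it counts >= s
    return [(1 + n - bisect.bisect_left(ranked, s)) / (1 + n) for s in target_scores]


def significant(target_scores, random_scores, alpha=0.02):
    """Target scores whose empirical p-value is at most `alpha` (hypothetical cutoff)."""
    p_values = empirical_p_values(target_scores, random_scores)
    return [s for s, p in zip(target_scores, p_values) if p <= alpha]
```

Note that the smallest attainable p-value is 1/(n + 1), so resolving p = 0.02 needs at least 49 random scores.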

Motif clustering, co-occurrence

TRANSFAC 9.1, manual TFBS

OPTICS

Accuracy metrics

Clustering with OPTICS

[Figure: reachability plot with labeled cluster contents; JASPAR scan test: 50 PWMs, 100 target sequence sets]

1 Pairwise motif similarity measure.

2 Scalable hierarchical clustering method with automatic stopping. [32 CPUs, 96 GB RAM, 64-bit OS]
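The two components listed on this slide can be sketched as follows, under stated assumptions. The similarity measure (mean per-column Pearson correlation over the best ungapped offset, on both strands) is one common way to compare PWMs and is not necessarily the measure used in cisRED. `optics` computes the standard OPTICS ordering and reachability distances over a precomputed distance matrix with eps set to infinity. `cut_clusters` is a simplified threshold cut of the reachability plot, standing in for the automatic stopping method, which is not reproduced here. All function names are hypothetical, and the O(n²) loops are a toy, not the scalable version.

```python
import heapq
import math

# A PWM is a list of columns; each column holds base frequencies in A, C, G, T order.


def column_pcc(a, b):
    """Pearson correlation between two PWM columns."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # a flat column carries no information
    return cov / math.sqrt(va * vb)


def reverse_complement(pwm):
    # Reverse column order; within a column, swapping A<->T and C<->G reverses ACGT.
    return [col[::-1] for col in reversed(pwm)]


def motif_similarity(m1, m2, min_overlap=4):
    """Best mean column PCC over ungapped offsets, on both strands."""
    k = min(min_overlap, len(m1), len(m2))
    best = -1.0
    for other in (m2, reverse_complement(m2)):
        for offset in range(-(len(other) - k), len(m1) - k + 1):
            pairs = [(m1[i], other[i - offset])
                     for i in range(len(m1)) if 0 <= i - offset < len(other)]
            if len(pairs) >= k:
                best = max(best, sum(column_pcc(a, b) for a, b in pairs) / len(pairs))
    return best


def optics(dist, min_pts):
    """OPTICS ordering and reachability distances (eps = infinity)."""
    n = len(dist)
    processed = [False] * n
    reach = [math.inf] * n
    order = []

    def core_distance(p):
        # Distance to the min_pts-th closest point, counting p itself.
        return sorted(dist[p])[min_pts - 1] if n >= min_pts else math.inf

    for start in range(n):
        if processed[start]:
            continue
        seeds = [(math.inf, start)]
        while seeds:
            _, p = heapq.heappop(seeds)
            if processed[p]:
                continue  # stale heap entry superseded by a smaller reachability
            processed[p] = True
            order.append(p)
            cd = core_distance(p)
            if cd == math.inf:
                continue
            for q in range(n):
                if not processed[q]:
                    r = max(cd, dist[p][q])
                    if r < reach[q]:
                        reach[q] = r
                        heapq.heappush(seeds, (r, q))
    return order, reach


def cut_clusters(order, reach, threshold):
    """Flat cut: start a new cluster at each reachability peak above threshold."""
    clusters = []
    for p in order:
        if not clusters or reach[p] > threshold:
            clusters.append([p])
        else:
            clusters[-1].append(p)
    return clusters
```

To cluster motifs, a distance such as 1 minus `motif_similarity` can fill the matrix passed to `optics`; valleys in the resulting reachability plot correspond to groups of similar motifs.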

www.cisred.org v1.1: human, mouse

H: 6K genes, 120K motifs

Web database design and construction

Main competitors

Zhang - Cold Spring Harbor Lab

Lander/Kellis - MIT

Bolouri - Institute for Systems Biology

Hardison/Haussler - Penn State/UCSC

...

High throughput ... low throughput

Large scale’s here. Now what?

Production / R&D

Hi/lo throughput. Collaborators

Accuracy / complexity / data integration: ChIP-xxxx, expression specificity, chromatin state, 3’UTRs, LREs... ENCODE

Regulatory networks and cascades

Competitive opportunities

Monica - C. elegans, briggsae, unannotated

Erin - Drosophila, ..., unannotated

Han Hao / Jim Kronstad (UBC) - fungi

Generalize

SNPs - Stephen Montgomery

Repetitive regions - Dixie Mager

Competitive opportunities

Many target genes, many orthologues

Low-coverage/unannotated genomes

Accuracy - resources, methods, protocols, ...

Coexpression and orthology

Discovery input vs. co-occurrence/modules

Motif similarity, clustering - a superset?

cisRED annotations in EnsEMBL

‘Contextual’ motif/module resource...

‘Context’ in cisRED

Discovered motifs

Motif similarity measures

Clustering methods

‘Known’ motif resources

Annotate motifs as known/novel

Motif groups (specific to...)

Other result types

‘Accuracy’

Motif classification system

Competitive opportunities

Validated predictions: Myers/Stanford, collaborators

Be ‘on the short list’: collaborators, publications

GC3 - ChIP-SAGE, networks...

Acknowledgements

Misha Bilenky, Chris Fjell, Obi Griffith, Han Hao, Ann He, Bernard Li, Keven Lin, Stephen Montgomery, Mehrdad Oveisi, Erin Pleasance, Neil Robertson, Wenjia Pan, Monica Sleumer, Kevin Teague, Richard Varhol, Maggie Zhang, Asim Siddiqui, Steven Jones

Jianjun Zhou, Jörg Sander, Dept. Computing Science, University of Alberta

Tamara Astakhova, Maik Hassel, James Kennedy, Eddy Tsang, Tony Fu, ...

Funding: Genome Canada, BC Cancer Foundation, Michael Smith Foundation for Health Research

TF classification / known motifs