motif enrichment analysis in co-expressed gene sets and high-throughput sequence sets
www.cmmt.ubc.ca
MOTIF ENRICHMENT ANALYSIS IN CO-EXPRESSED GENE SETS AND HIGH-THROUGHPUT SEQUENCE SETS
Wyeth Wasserman
Jan. 18, 2012
opossum.cisreg.ca/oPOSSUM3
Welcome
• If you encounter any technical difficulties during the webinar
– Type a report using the chat option
• Slide presentation ~20 min
• Questions will be compiled as they are submitted and answered during the final Q&A/discussion period
• During the discussion session, audience members will be able to speak
Webinar Format
• Introduction
• Walk-Through
• Summary
• Q&A
INTRODUCTION
Overview
• Given co-expressed gene sets, what are the key mediators of co-expression?
– Focus on TFs
• Web-based software system for motif enrichment analysis
– Co-expressed genes or sequences
– Multiple sets of analysis methods
– Available for human, mouse, fly, worm, yeast
Motif Enrichment Analysis
[Figure: bar chart comparing the proportion of genes containing each TFBS (y-axis, 0 to 1) in the Background and Target sets for TFBS1, TFBS2, and TFBS3, annotated p=0.04, p=0.55, and p=0.66]
Finds over-represented TFBS in co-expressed gene sets
What do we need?
• Region selection
– Where to look for enriched binding sites
– Use conservation filter to restrict search space
• TFBS profiles to search for
– Need a pool of validated profiles
• Scoring metrics for enrichment
– How to measure motif over-representation
Conserved Region Selection
[Figure: phastCons score plotted against genomic position along a gene, with a threshold line and conserved regions CR1, CR2, CR3, and CR4 marked]
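The region-selection idea can be pictured as thresholding a per-base conservation track. The sketch below is only an illustration under stated assumptions: the function name `conserved_regions`, the `min_length` parameter, and the toy scores are hypothetical and are not taken from oPOSSUM itself.

```python
from typing import List, Tuple


def conserved_regions(scores: List[float], threshold: float,
                      min_length: int = 1) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index ranges whose scores are all >= threshold.

    Assumption: one conservation score per base; runs shorter than
    min_length are discarded.
    """
    regions = []
    start = None
    for pos, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = pos  # open a new candidate region
        elif start is not None:
            if pos - start >= min_length:
                regions.append((start, pos))
            start = None
    # Close a region that runs to the end of the track.
    if start is not None and len(scores) - start >= min_length:
        regions.append((start, len(scores)))
    return regions


# Toy conservation track (hypothetical values, not real phastCons data).
toy_scores = [0.1, 0.9, 0.95, 0.2, 0.8, 0.85, 0.9, 0.1]
print(conserved_regions(toy_scores, threshold=0.7, min_length=2))
```

Raising the threshold or the minimum length shrinks the search space, which is the trade-off a conservation filter exposes.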
TFBS Profiles
• JASPAR 2010: Portales-Casamar et al. Nucleic Acids Research 2009.
• Expanded collection of TFBS profiles
– 130 vertebrate profiles
– 105 insect profiles
– 5 nematode profiles
– 177 yeast profiles
– PBM (104), PBM_HOMEO (176), PBM_BHLH (19)
• Standardized 2-level TF classification (class, family)
Scoring Metrics
• Z scores
– Based on the number of occurrences of the TFBS relative to background
– Normalized for sequence length
– Simple binomial distribution model
• Fisher scores
– Fisher exact probability test
– Fisher score = -log(Fisher p-value)
– Based on the number of genes containing the TFBS relative to background
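A minimal sketch of the two metrics, assuming specific forms that the slides do not spell out: the Z score treats every target position as a binomial trial at the background hit rate (no continuity correction), and the Fisher score uses a one-tailed test with the natural logarithm. The function names, the log base, and the toy counts are assumptions for illustration, so the output need not match oPOSSUM's reported values.

```python
import math


def z_score(target_hits, target_length, background_hits, background_length):
    """Binomial z-score of target hits against a length-scaled background rate."""
    rate = background_hits / background_length   # background hits per position
    expected = rate * target_length              # normalized for sequence length
    sigma = math.sqrt(target_length * rate * (1 - rate))
    return (target_hits - expected) / sigma


def fisher_score(target_hits, target_non_hits, background_hits, background_non_hits):
    """-ln of the one-tailed Fisher exact p-value for over-representation."""
    total = target_hits + target_non_hits + background_hits + background_non_hits
    hit_total = target_hits + background_hits
    target_total = target_hits + target_non_hits
    upper = min(hit_total, target_total)
    # Hypergeometric tail: probability of seeing at least target_hits hit genes.
    numerator = sum(
        math.comb(hit_total, k) * math.comb(total - hit_total, target_total - k)
        for k in range(target_hits, upper + 1)
    )
    denominator = math.comb(total, target_total)
    # Work in log space so very small p-values do not underflow.
    return math.log(denominator) - math.log(numerator)


# Toy counts (hypothetical): 30 hits in 10,000 target positions versus
# 1,000 hits in 1,000,000 background positions; 12 of 40 target genes hit
# versus 100 of 1,000 background genes.
print(round(z_score(30, 10_000, 1_000, 1_000_000), 3))
print(round(fisher_score(12, 28, 100, 900), 3))
```

The Z score counts site occurrences while the Fisher score counts genes, so a motif with many sites in a few genes can rank high on one and low on the other.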
Additional Metric for Sequence-Based Analysis
• KS scores
– Kolmogorov-Smirnov test
– Compares the empirical distribution of the distances of the binding sites from the maximum point of confidence (MPC) to the background
– Expect real binding sites to be centered around the MPC
– KS score = -log(KS test p-value)
[Figure: foreground and background distributions of binding-site positions relative to the MPC]
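The KS score can be sketched as a two-sample comparison of site-to-MPC distances. This is an illustration under assumptions, not oPOSSUM's code: the p-value uses a standard asymptotic approximation to the Kolmogorov distribution, the logarithm is taken as natural, and the function names and toy distances are hypothetical.

```python
import math


def ks_statistic(foreground, background):
    """Largest gap between the two empirical CDFs (two-sample KS statistic D)."""
    xs, ys = sorted(foreground), sorted(background)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        value = min(xs[i], ys[j])
        # Advance both samples past the current value (handles ties).
        while i < n and xs[i] <= value:
            i += 1
        while j < m and ys[j] <= value:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_score(foreground, background):
    """-ln of an asymptotic two-sample KS p-value (approximation)."""
    n, m = len(foreground), len(background)
    d = ks_statistic(foreground, background)
    effective = math.sqrt(n * m / (n + m))
    lam = (effective + 0.12 + 0.11 / effective) * d
    if lam < 0.2:
        return 0.0  # p-value is effectively 1 for such a small statistic
    # Alternating series for the Kolmogorov tail probability.
    p_value = 0.0
    for k in range(1, 101):
        p_value += 2 * (-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
    p_value = min(1.0, max(p_value, 0.0))
    return -math.log(p_value) if p_value > 0 else math.inf


# Toy distances from the MPC in bp (hypothetical): foreground sites cluster
# near the MPC, background sites are spread out.
foreground = [2, 5, 8, 10, 12, 15, 20, 25]
background = [5, 30, 60, 90, 120, 150, 180, 200]
print(ks_statistic(foreground, background), ks_score(foreground, background))
```

A larger score means the foreground distances differ more strongly from the background, which is what is expected when real sites sit close to the MPC.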
Analysis Methods
WALK-THROUGH
http://opossum.cisreg.ca/oPOSSUM3
Human SSA - Input
Human SSA - Results
TF: HNF1A
JASPAR ID: MA0046.1
Class: Helix-Turn-Helix
Family: Homeo
Tax Group: Vertebrates
IC: 15.548
GC Content: 0.259
Target Gene Hits: 19
Target Gene Non-Hits: 36
Background Gene Hits: 1113
Background Gene Non-Hits: 3887
Target TFBS Hits: 41
Target TFBS Nucleotide Rate: 0.0269
Background TFBS Hits: 2127
Background TFBS Nucleotide Rate: 0.009
Z-score: 15.134
Fisher score: 3.646
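As a quick reading aid, the HNF1A counts above can be turned into hit fractions and fold differences. This is plain arithmetic on the reported numbers; the variable names are illustrative, and the fold values are not quantities oPOSSUM reports.

```python
# Counts and rates copied from the HNF1A result above.
target_gene_hits, target_gene_non_hits = 19, 36
background_gene_hits, background_gene_non_hits = 1113, 3887
target_rate, background_rate = 0.0269, 0.009

# Fraction of genes with at least one predicted HNF1A site.
target_fraction = target_gene_hits / (target_gene_hits + target_gene_non_hits)
background_fraction = background_gene_hits / (
    background_gene_hits + background_gene_non_hits
)

# Fold differences, target relative to background.
gene_fold = target_fraction / background_fraction
rate_fold = target_rate / background_rate

print(f"target genes with a site:     {target_fraction:.3f}")
print(f"background genes with a site: {background_fraction:.3f}")
print(f"gene-level fold difference:   {gene_fold:.2f}")
print(f"nucleotide-rate fold:         {rate_fold:.2f}")
```

Roughly 35% of target genes carry a site versus about 22% of background genes, while the nucleotide rate is about three times the background rate. That is consistent with a Z score that stands out more than the Fisher score for this profile.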
oPOSSUM methods
Human aCSA - Input
Human aCSA - Results
TFBS Cluster Analysis
TFBS Cluster Analysis (TCA)
[Figure: TFBSs predicted by the profiles of a TFBS profile cluster in conserved regions CR1 to CR4 of a gene are merged into TFBS cluster hits; over-representation analysis is based on the merged TFBS cluster hits]
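The merge step can be sketched as interval merging. The example assumes that hits from profiles in the same cluster are combined whenever they overlap; the slides do not state the exact merge rule, and the function name, coordinate convention, and toy hits are hypothetical.

```python
def merge_cluster_hits(hits):
    """Merge overlapping (start, end) hits into cluster hits.

    Assumption: coordinates are inclusive, and hits that share at least
    one position are merged into a single cluster hit.
    """
    merged = []
    for start, end in sorted(hits):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous cluster hit: extend it if needed.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Toy hits from several profiles of one cluster (hypothetical positions).
hits = [(40, 50), (10, 20), (15, 25), (50, 55), (70, 80)]
print(merge_cluster_hits(hits))
```

Merging first means several near-identical profiles hitting the same site count once, so the enrichment statistics are not inflated by redundant motifs.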
Human TCA – TFBS cluster selection
Human TCA - Results
TFCluster Info Page
Seq SSA - Input
Seq SSA - Results
KS score
Seq TCA - Input
SUMMARY
oPOSSUM-3
• Web-based system for motif enrichment analysis in co-expressed gene sets and sequences from high-throughput experiments
• Important functionalities
– Gene-based vs. Sequence-based
– Single site vs. Anchored combination site
– Individual vs. clusters of TFBS profiles
– Human, mouse, fly, worm and yeast
Development Team
• Version 1: Ho Sui, SJ; Mortimer, JR; Arenillas, DJ; Brumm, J; Walsh, CJ; Kennedy, BP; Wasserman, WW
• CSA: Huang, S; Fulton, DL; Arenillas, DJ; Perco, P; Ho Sui, SJ; Mortimer, JR; Wasserman, WW
• Version 2: Ho Sui, SJ; Fulton, DL; Arenillas, DJ; Kwon, AT; Wasserman, WW
• Version 3: Kwon, AT; Arenillas, DJ; Worsley Hunt, R; Wasserman, WW
QUESTIONS & ANSWERS
Please take a moment to type questions/comments into the chat box. The questions will be answered shortly.