jianhongou, jun yu, haiboliu and lihuajulie · pdf fileapril 2008 jianhongou, jun yu, haiboliu...
TRANSCRIPT
April 2008
Jianhong Ou, Jun Yu, Haibo Liu and Lihua Julie Zhu
Integrated Analysis Of ChIP-seq/chip using ChIPpeakAnno, GeneNetworkBuilder and TrackViewer
Bioconductor Annual MeetingBostonJuly 28th 2017
• Introduction of ChIP-seq and ChIP-chip analysis workflow
• ChIPpeakAnno
• GeneNetworkBuilder
• TrackViewer
• Demo
Outline
HIGH-THROUGHPUT IDENTIFICATION OF DNA BINDINGSITES
• ChIP-seq– ChIP followed by high-throughput sequencing
• ChIP-chip– ChIP followed by genome tiling array analysis
Cross-link with formaldehyde
Fragment DNA
Add specific antibody to immunoprecipitate
Reverse cross-link, purify and amplify DNA
High- throughput sequencing
Hybridize to DNA microarray post DNA labeling
Fastq sequence file@HWI-ST570:42:D0PJJACXX:4:1101:1436:2323 1:N:0:CATGGATCGGAAGAGGGAANTCATCTTTGGCCCGGTGTTTCGTCCTTTCC+CCCFFFFFHHHHHHHIJGG#2AEGHIGJIJJJJJJ?FFHJGIGHIIIJIJ
Adapted from: Zhu LJ. Integrative analysis of ChIP-chip and ChIP-seq dataset. Methods Mol Biol. 2013; 1067:105-24.
ANALYSIS WORKFLOW
Adapted from: Zhu LJ. Integrative analysis of ChIP-chip and ChIP-seq dataset. Methods Mol Biol. 2013; 1067:105-24.
CHIPPEAKANNO
• Batch annotate enriched peaks– ChIP-seq– ChIP-chip– PAS-seq (Poly(A) Site Sequencing)– Cap Analysis of Gene Expression (CAGE)– Any experiments resulting in a large number of
enriched genomic regions
FUNCTIONALITY• Find the nearest genes for each set of peaks and graph the distribution around features.• Find all genes within a certain distance from the peaks • Identify enriched Gene Ontology (GO) terms and pathways associated with adjacent genes of the peaks.• Label peaks with any annotation of interest
• a dataset from the literature• CpG island• conserved element• histone modification marks
• Determine the significance of overlap and drawing Venn diagrams to visualize the extent of the overlap • binding sites among replicates• binding sites among transcription factors within a complex• binding sites among different experiments such as yours and the ones in literature
• Retrieve genomic sequences flanking putative binding sites for motif discovery, cloning or PCR amplification• Find the peaks with bi-directional promoters with summary statistics• Summarize motif occurrence in peaks• Irreproducibility Discovery Rate (IDR)
DAF-12 EXAMPLE DATASET
• ChIP-chip peaks were downloaded from GEO at http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE28350 (Hochbaum, Zhang et al. 2011, PLoS Genet 7(7): e1002179)
• Expression Microarray results were downloaded from (Fisher and Lithgow 2006, Aging Cell 5(2): 127-138).
OVERLAP ANALYSIS AND DISTRIBUTION OF PEAKS AROUND TSS
Replicate1 Replicate2
Replicate3 156
932
12
323
4
148
1
1424
Distance To Nearest TSS
Freq
uenc
y
−10000 −5000 0 5000 10000
05
1015
2025
DISTRIBUTION OF DAF-12-BINDING SITES
downstream
includeFeature
insideoverlapEnd
overlapStart
upstream
Exon% Intron% 5UTR% 3UTR% Proximal Promoter% Immediate Downstream% Enhancer%
Chromosome Region
% b
indi
ng s
ites
010
2030
4050
6070
TOP MOTIFS
m1forward RC
EWSR1−FLI1−
2.2641e−03
IRF1−
2.33e−02
SPIB−
2.3803e−02
m2forward RC
NR2F1+
1.6075e−02
SOX10+
2.4823e−02
HNF4A−
2.7161e−02
m3forward RC
RREB1−
5.2626e−03
Egr1+
1.1045e−02
Foxd3+
1.9366e−02
m4forward RC
SPIB−
2.0655e−02
Tal1::Gata1−
6.646e−02
SOX10+
6.9122e−02
m5forward RC
Pax4−
4.2833e−03
Sox5−
1.6952e−02
NKX3−1−
2.2411e−02
Logo
DAF-12 REGULATORY NETWORK
C02F5.5
mi r -46C33D9.9
C01B10.4
mi r -52
mi r -53
C17A2.4
mi r -228
F01D5.1
srd-16
mb f - 1
mi r -74
mi r -229Y37D8A.5
lsy-2
mi r -75
C50F4.9
mi r -73
hpd -1
F32A5.4
C08E8.3
K10C2.3
Y69E1A.5
mi r -76
htas-1
Y59E9AL.3
dod-24
R10H10.3
sss-1
mi r -240
K09H11.7
pos-1
mi r -43
F55G11.7T01D3.6
cdh-5
T05E12.6
mi r -788
F57G8.7
C33F10.1
F01D5.3
K03H1.12
scl-27
m i r - 2
mi r -242
bre-1
mi r -63nhr -86
hpo-28
Y106G6A.4
F35H8.4
msp-51
clec-66
mi r -72
F44A2.5
ZK1248.17
msp-76
mir-239.1
mi r -87
B0507.10
T11B7.5
pho-1
clec-52
f l i - 1
T05C3.6
nspd-10
Y54G11A.14F21C10.11
Y49G5A.1
F36D1.4
K10D3.6
C27F2.7
nspd-1
ceh-45
meg-2
col-172
z ip -2
tbc -7
mi r -230
wago-9col -19
m i r - 1clec-4
g rd -5
gst -11
dyc-1
nhr -20
T22B7.7
F35E12.5
mi r -243
mi r -44
m ig -5
mi r -80
nhr -42
h lh-30
mi r -40
C24B9.3
i lys-2
mi r -58Y44A6D.1
Y75B8A.23
gst -27
cyp-14A5
col-20
C53A5.2
mi r -34far -3
msp-38
Y67A6A.1
W10C8.5
nhr -10
a t f -6
W01B6.4
ces-2mi r -237
l i n -4
nh r -1
daf -12
grd-10
C12D5.4
F09F7.4
mi r -84
sre-13lnp -1
msp-56
mi r -261
pho-12 T01B11.2
C36C9.1
mi r -42
msp-142le t -7
mi r -795
mi r -37
ug t -6
F11G11.4
ssq-2msp-63
C27D6.3
mi r -51
clec-76
msp-50
fu t - 1
mi r -39
Y40C5A.1
mi r -355
pd i -2
mi r -38
daf -3
mi r -41
l in -13mi r -241
mi r -35
mi r -36
scd-1
C32H11.4
K12H4.7
REFERENCES
• Zhu LJ, Gazin C, Lawson ND, Pagès H, Lin SM, Lapointe DS, Green MR. ChIPpeakAnno: a Bioconductor package to annotate ChIP-seq and ChIP-chip data. BMC Bioinformatics. 2010 May 11; 11:237. PMID: 20459804.
• Zhu LJ. Integrative analysis of ChIP-chip and ChIP-seqdataset. Methods Mol Biol. 2013; 1067:105-24. PMID: 23975789.