EuroBioc 2019 – Brussels
Dario Righelli – PhD
Istituto per le Applicazioni del Calcolo «M. Picone» – CNR - Napoli
[email protected] || [email protected]
Epigenomic enrichment analysis using Bioconductor
drighelli
What’s the aim?
Compare methods and provide guidelines on epigenomic data analysis
ATAC-seq dataset
Yijing Su et al. 2017 - Nature Neuroscience - Neuronal activity modifies the chromatin accessibility landscape in the adult brain
o Before Fear Induction Condition (E0): 4 biological replicates
o After Fear Induction Condition (E1): 4 biological replicates
o Catching differences in open chromatin regions
ChIP-seq dataset (NULL dataset)
o Home Cage Controls - Histone 3, Lysine 9 Acetylation (H3K9ac): 9 biological replicates
o How many random differences are we able to catch inside a control dataset?
BWA and Bowtie2 perform the same
o Most used aligners for epigenomics data
o Correlation computed on ChIP-seq data coverages
o Used the deepTools plotCorrelation tool
o Correlations between the coverages of the same samples in the BWA and Bowtie2 BAM files are equal to 1
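The comparison above can be sketched as a Pearson correlation between two per-bin coverage vectors of the same sample. This is only an illustration, not deepTools code: the `pearson` helper and the coverage values are hypothetical, and the slides do not say which correlation method was used.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length coverage vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of the same length (>= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant coverage")
    return cov / sqrt(var_x * var_y)

# Hypothetical per-bin read counts for one sample under each aligner.
coverage_bwa = [12, 40, 7, 0, 55, 23]
coverage_bowtie2 = [12, 40, 7, 0, 55, 23]
r = pearson(coverage_bwa, coverage_bowtie2)
```

Identical coverage profiles give a correlation of 1, which is the situation the slide reports for the two aligners.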
o MACS2 (not in Bioconductor)
  o Most used peak caller
  o Broad and Narrow peaks option
o DEScan2
  o Has a peak detector in R
  o Peak resolution -> bin size
  o Can work with external peaks
o DiffBind
  o No peak detection
  o Fast on matrix construction
  o Uses external peaks
o CSAW
  o Starts from BAM files
  o Computes matrix of bins x samples
o edgeR
  o Widely used method
  o Very flexible in usage
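To make the "bins x samples" matrix concrete, here is a minimal sketch that counts read start positions into bins for each sample. It does not reproduce the CSAW API: fixed, non-overlapping bins are a simplifying assumption, and the `bin_counts` helper, sample names, and positions are hypothetical.

```python
def bin_counts(read_starts, region_length, bin_size):
    """Count reads per fixed-width bin for one sample."""
    n_bins = -(-region_length // bin_size)  # ceiling division
    counts = [0] * n_bins
    for pos in read_starts:
        if 0 <= pos < region_length:
            counts[pos // bin_size] += 1
    return counts

# Hypothetical read start positions on a 300 bp region.
samples = {
    "E0_1": [5, 12, 130, 131, 250],
    "E1_1": [3, 120, 125, 128, 290],
}
per_sample = {name: bin_counts(starts, 300, 100) for name, starts in samples.items()}

# Rows are bins, columns are samples (in the order of `samples`).
matrix = list(zip(*(per_sample[name] for name in samples)))
```

A matrix of this shape is what a count-based method such as edgeR takes as input.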
A Bioconductor Approach
o Peak callers: DEScan2, MACS2 (Narrow, Broad), CSAW
o Peak consensus & matrices: DEScan2, DiffBind, CSAW
o Differential enrichment: edgeR
Counts Normalization Affects Differentially Accessible Regions (DARs)
o Pay attention to the normalization process
o One tries to apply a classic RNA-seq normalization
o The process does not always give the same results
o A more specific normalization may be required for this kind of data
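A toy example of why normalization matters for DARs: with plain library-size scaling (counts per million), an apparent two-fold difference between conditions disappears. CPM is used here only as a simple stand-in for a "classic" normalization; the slides do not name the methods that were compared, and the counts are hypothetical.

```python
def counts_per_million(counts):
    """Scale the raw counts of one sample to counts per million reads."""
    total = sum(counts)
    if total == 0:
        raise ValueError("library has no reads")
    return [c * 1_000_000 / total for c in counts]

# Hypothetical counts for three regions; E1 was sequenced twice as deeply.
e0 = [100, 300, 600]
e1 = [200, 600, 1200]

raw_ratio = [b / a for a, b in zip(e0, e1)]
cpm_ratio = [b / a for a, b in zip(counts_per_million(e0), counts_per_million(e1))]
```

Here every region looks two-fold enriched before scaling and unchanged after it, so the list of regions called differential depends on the normalization choice.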
Figure: UpSet plot of ATAC-seq DARs (ATAC-seq dataset). Sets: DiffBindBroad, CSAW, DiffBindNarrow, DEScan_Z10_K4_DARs. Axes: Intersection Size, Set Size.
Comparing DARs across methods
o The biggest overlap is on the peaks detected by all the methods
o CSAW and DiffBind show a large number of non-overlapping regions
o DEScan2 shows the lowest number of non-overlapping regions
o The large number of non-overlapping regions from CSAW and DiffBind suggests a possibly high number of false-positive regions
o Ad hoc UpSet plot designed for GRanges objects
o Based on the findOverlaps method
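The counting behind an overlap-based UpSet plot can be sketched as follows: for each reference region, record which methods have an overlapping region, then count each combination of methods. This is a plain-Python illustration, not the GRanges/findOverlaps implementation from the talk. The closed-interval convention, the reference regions, and all coordinates are assumptions.

```python
from collections import Counter

def overlaps(a, b):
    """True if two (chrom, start, end) regions share at least one base (closed intervals)."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]

def upset_counts(reference, method_regions):
    """Count, for each combination of methods, the reference regions they all overlap."""
    combos = Counter()
    for region in reference:
        hit = frozenset(
            name
            for name, regions in method_regions.items()
            if any(overlaps(region, other) for other in regions)
        )
        if hit:
            combos[hit] += 1
    return combos

# Hypothetical regions reported by each method.
methods = {
    "DEScan2": [("chr1", 100, 200), ("chr1", 500, 600)],
    "DiffBind": [("chr1", 150, 250), ("chr2", 10, 50)],
    "CSAW": [("chr1", 180, 220), ("chr1", 900, 950)],
}
# Hypothetical reference regions (for example, merged regions from all methods).
reference = [("chr1", 100, 250), ("chr1", 500, 600), ("chr1", 900, 950), ("chr2", 10, 50)]

combos = upset_counts(reference, methods)
```

Each key of `combos` corresponds to one bar of the plot, and each value to its intersection size.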
Peak contrasts on the NULL dataset show no results
H3K9ac ChIP-seq dataset
o Compared performance on a null dataset of ChIP-seq H3K9ac samples
o Performed 126 permutations of samples
  o Samples are randomly divided into two groups
  o All the possible permutations on 9 samples (126)
o All the methods find mostly 0 differentially enriched peaks in the random conditions
  o Sometimes some differences have been found
  o With and without normalization
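The figure of 126 matches the number of ways to split 9 samples into one group of 4 and one group of 5, since C(9,4) = 126. The group sizes are an assumption (the slides only give the total), and the sample names below are placeholders.

```python
from itertools import combinations
from math import comb

# Placeholder names for the 9 control samples.
samples = [f"S{i}" for i in range(1, 10)]

# Each split assigns 4 samples to one group and the remaining 5 to the other.
splits = [
    (group_a, tuple(s for s in samples if s not in group_a))
    for group_a in combinations(samples, 4)
]

n_splits = len(splits)
```

Each element of `splits` defines one random "contrast" on which the methods can be run.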
Figure: number of differentially enriched peaks (nElem, axis from 0 to 8) per method, by normalization (NO/YES). Methods: DEScan2_NoNorm, DEScan2_Norm, DES2_M2_Broa_NoNorm, DES2_M2_Broa_Norm, DES2_M2_Narr_NoNorm, DES2_M2_Narr_Norm, DiffBind_Narr_Norm, DiffBind_Broad_Norm.
What’s Next?
Ongoing and future work
Some comparisons are still needed
o Compare CSAW on ChIP-seq
o Compare normalization methods with all epigenomics methods
o Explore in silico the biological functions of the results
o Test a single-cell ATAC-seq dataset
Acknowledgements
• Dr. Claudia Angelini – Istituto per le Applicazioni del Calcolo-CNR
• Dr. Davide Risso – University of Padua
• Dr. Lucia Peixoto – Elson S. Floyd College of Medicine, Washington State University
• Dr. Timothy Triche Jr. – Van Andel Research Institute
• Dr. Ben Johnson – Van Andel Research Institute
Thank you for your attention!
http://lists.moo.gs/mailman/listinfo/[email protected]
https://www.facebook.com/pg/NapoliRBiocMeetup
Napoli R/Bioconductor Meetup
o Since Nov 2018
o R Consortium Array Group
o At least 25 people at every event, with a good turnover of attendees
o Eight meetups until now
  o R Package Creation
  o scRNA-seq Analysis
  o Differentially Methylated Regions Analysis
  o Microscope Image Processing
  o Chromosomal Copy Number Changes Detection
  o Bulk RNA-seq Differential Expression
  o Hi-C data analysis using HiCeekR
  o Metagenomics analysis workflow
Napoli R/Bioconductor Meetup
• Part of a wider idea
• Third city in the world
  • Boston (USA)
  • New York (USA)
  • Napoli (IT)
• Useful to
  • share ideas and workflows
  • create new collaborations
  • extend the bioinformatics community
Is there a best aligner?
Bowtie2 vs BWA
Comparing DARs across methods (2)
Figure: UpSet plot "ATAC-seq Regions Nar/Broa & DEScan2" (ATAC-seq dataset). Sets: DEScanMACSNarrow, DEScanMACSBroad, DiffBindMACSBroad, DEScanZ10K4, DiffBindMACSNarrow. Axes: Intersection Size, Set Size.
o Ad hoc UpSet plot designed for GRanges objects
o Based on the findOverlaps method
o Results description
Duplicates Removal doesn’t impact peak detection
o Diagonal correlations on counts matrices show that there are no big differences between samples with and without duplicates
o Duplicates removed with samtools rmdup
o DEScan2 counts matrices
o DiffBind counts matrices
Figure: correlation heatmaps of the DEScan2 and DiffBind counts matrices (color scale from -1 to 1). Samples: noDup_E0_1 to noDup_E0_4, noDup_E1_1 to noDup_E1_4, withDup_E0_1 to withDup_E0_4, withDup_E1_1 to withDup_E1_4.
Figure: Final Peaks with/without Duplicates (axis from 0 to 40000). Bars: Dup_DEScan2, noDup_DEScan2, Dup_DiffBind, noDup_DiffBind.
DEScan2 – Differential Enriched Scan 2
• Filters out the peaks with a score lower than a user-defined threshold
• Aligns the peaks across a user-defined number of samples
• Different thresholds produce different trends in the number of final peaks detected
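The two steps above can be sketched as a score filter followed by a check on the number of supporting samples. This is a simplified illustration and not the DEScan2 API. Peaks are matched by a hypothetical bin index rather than by overlapping genomic ranges, and the helper names, scores, and sample names are made up.

```python
def filter_peaks(peaks, min_score):
    """Drop peaks whose score is below the user-defined threshold."""
    return [p for p in peaks if p["score"] >= min_score]

def shared_regions(peaks_by_sample, min_score, min_samples):
    """Keep bins whose filtered peaks appear in at least `min_samples` samples."""
    support = {}
    for sample, peaks in peaks_by_sample.items():
        for peak in filter_peaks(peaks, min_score):
            support.setdefault(peak["bin"], set()).add(sample)
    return sorted(b for b, s in support.items() if len(s) >= min_samples)

# Hypothetical peaks: each has a bin index and a score.
peaks_by_sample = {
    "E0_1": [{"bin": 1, "score": 12.0}, {"bin": 2, "score": 4.0}],
    "E0_2": [{"bin": 1, "score": 15.0}, {"bin": 3, "score": 11.0}],
    "E1_1": [{"bin": 1, "score": 9.0}, {"bin": 3, "score": 20.0}],
}

strict_score = shared_regions(peaks_by_sample, min_score=10, min_samples=2)
strict_samples = shared_regions(peaks_by_sample, min_score=5, min_samples=3)
```

The two calls return different sets of final regions, which mirrors the point that the thresholds drive the number of peaks detected.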