epigenomicenrichment...

18
EuroBioc 2019 – Brussels Dario Righelli – PhD Istituto per le Applicazioni del Calcolo «M. Picone» – CNR - Napoli [email protected] || [email protected] Epigenomic enrichment analysis using Bioconductor drighelli

Upload: others

Post on 16-Jul-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Epigenomicenrichment analysisusingBioconductoreurobioc2019.bioconductor.org/slides/Flashlight4/Righelli_Bioc2019.… · 654 599 523 514 344 282 0 5000 10000 15000 Intersection Size

EuroBioc 2019 – Brussels

Dario Righelli – PhD

Istituto per le Applicazioni del Calcolo «M. Picone» – CNR - Napoli

[email protected] || [email protected]

Epigenomic enrichmentanalysis using Bioconductor

drighelli

Page 2: Epigenomicenrichment analysisusingBioconductoreurobioc2019.bioconductor.org/slides/Flashlight4/Righelli_Bioc2019.… · 654 599 523 514 344 282 0 5000 10000 15000 Intersection Size

What’s the aim?Compare methods and provide guidelines on epigenomic data analysis

Page 3: Epigenomicenrichment analysisusingBioconductoreurobioc2019.bioconductor.org/slides/Flashlight4/Righelli_Bioc2019.… · 654 599 523 514 344 282 0 5000 10000 15000 Intersection Size

ATAC-seq dataset

Yijing Su et al. 2017 - Nature Neuroscience - Neuronal activity modifies the chromatin accessibility landscape in the adult brain

Before Fear Induction Condition (E0)

After Fear Induction Condition (E1)

}

}

4 biological replicates

4 biological replicates

}Catching differencesin open chromatineregions

Page 4: Epigenomicenrichment analysisusingBioconductoreurobioc2019.bioconductor.org/slides/Flashlight4/Righelli_Bioc2019.… · 654 599 523 514 344 282 0 5000 10000 15000 Intersection Size

ChIP-seq dataset (NULL dataset)

Home Cage Controls - Histon 3, Lysine 9 Acetilation (H3K9ac)

} 9 biological replicates} How many random differences are weable to catch inside a control dataset?

Page 5: Epigenomicenrichment analysisusingBioconductoreurobioc2019.bioconductor.org/slides/Flashlight4/Righelli_Bioc2019.… · 654 599 523 514 344 282 0 5000 10000 15000 Intersection Size

BWA and Bowtie2 perform the same

o Most used aligners for

epigenomics data

o Correlation computed on

ChIP-seq data coverages

o used DeepTools

plotCorrelation tool

o Computed correlations on the

coverages of the same

samples on BWA and Bowtie2

bams have value of 1.

Page 6: Epigenomicenrichment analysisusingBioconductoreurobioc2019.bioconductor.org/slides/Flashlight4/Righelli_Bioc2019.… · 654 599 523 514 344 282 0 5000 10000 15000 Intersection Size

o MACS2 (No Bioconductor)

o Most used peak caller

o Broad and Narrow peaks option

o DEScan2

o Has a peak detector in R

o Peak resolution -> bin size

o Can work with external peaks

o DiffBind

o No peak detection

o Fast on matrix construction

o Uses external peaks

o CSAW

o Starts from BAM files

o Computes matrix of bins x samples

o edgeR

o Widely used method

o Very flexible in usage

A Bioconductor Approach

PeakCallers DESCan2 MACS2 CSAW

NarrowBroad

PeakConsensus &Matrices

DESCan2 DiffBind CSAW

DifferentialEnrichment edgeR

Page 7: Epigenomicenrichment analysisusingBioconductoreurobioc2019.bioconductor.org/slides/Flashlight4/Righelli_Bioc2019.… · 654 599 523 514 344 282 0 5000 10000 15000 Intersection Size

Counts Normalization AffectsDifferentially Accessible Regions (DARs)

o Pay attention to the normalizationprocess

o One tryes to apply a classic RNA-Seqnormalization

o The process does not always give the same results

o Maybe some more specificnormalization is required for this kind of data

ATAC-seq dataset

Page 8: Epigenomicenrichment analysisusingBioconductoreurobioc2019.bioconductor.org/slides/Flashlight4/Righelli_Bioc2019.… · 654 599 523 514 344 282 0 5000 10000 15000 Intersection Size

16652

11982

79567505

6597 6530

4491

976 976654 599 523 514 344 282

0

5000

10000

15000

Inte

rsec

tion

Size

DiffBindBroad

CSAW

DiffBindNarrow

DEScan_Z10_K4_DARs

02000040000Set Size

ATAC−seq DARs

Comparing DARs across methods

o All the methods have the biggest

overlap on the detected peaks

o CSAW and DiffBind show a big amount

of not-overlapping regions

o DEScan2 shows the lowest number of

not-overlapping regions

o The big amount of not-overlapping

regions by CSAW and DiffBind suggests

a possible high-level of false positive

regions detected.

o Ad-hoc designed UpsetPlot on GRanges

o Based on findOverlaps method

Page 9: Epigenomicenrichment analysisusingBioconductoreurobioc2019.bioconductor.org/slides/Flashlight4/Righelli_Bioc2019.… · 654 599 523 514 344 282 0 5000 10000 15000 Intersection Size

Peaks contrasts on NULL dataset show no results

H3K9ac ChIP-seq dataset o Compared performances on a nulldataset of ChIP-seq H3K9ac samples

o Performed 126 permutations of samples

o Samples are randomly divided in two groups

o All the possible permutations on 9 samples (126)

o All the methods find mostly 0 Differential Enriched Peaks on the random conditions.

o Sometimes some differences have beenfound

o With and without normalization

0

2

4

6

8

DEScan2_NoNorm

DEScan2_Norm

DES2_M2_Broa_NoNorm

DES2_M2_Broa_Norm

DES2_M2_Narr_NoNorm

DES2_M2_Narr_Norm

DiffBind_Narr_Norm

DiffBind_Broad_Norm

method

nElem normalized

NOYES

Page 10: Epigenomicenrichment analysisusingBioconductoreurobioc2019.bioconductor.org/slides/Flashlight4/Righelli_Bioc2019.… · 654 599 523 514 344 282 0 5000 10000 15000 Intersection Size

What’s Next?On-going and future works

Page 11: Epigenomicenrichment analysisusingBioconductoreurobioc2019.bioconductor.org/slides/Flashlight4/Righelli_Bioc2019.… · 654 599 523 514 344 282 0 5000 10000 15000 Intersection Size

Some comparisons are still needed

o Compare CSAW on ChIP-seq

o Compare normalization methods with all epigenomics methods

o Explore in-silico biological functions of results

o Testing ATAC—seq Single Cell dataset

Page 12: Epigenomicenrichment analysisusingBioconductoreurobioc2019.bioconductor.org/slides/Flashlight4/Righelli_Bioc2019.… · 654 599 523 514 344 282 0 5000 10000 15000 Intersection Size

• Dr. Claudia Angelini – Istituto per le Applicazioni del Calcolo-CNR

• Dr. Davide Risso – Univeristy of Padua

• Dr. Lucia Peixoto – Elon S. Floyd College of Medicine, Washington State University

• Dr. Timothy Triche Jr. - Van Andel Research Institute

• Dr. Ben Johnson - Van Andel Research Institute

• Thank you for your Attention!

Acknowledgements

Page 13: Epigenomicenrichment analysisusingBioconductoreurobioc2019.bioconductor.org/slides/Flashlight4/Righelli_Bioc2019.… · 654 599 523 514 344 282 0 5000 10000 15000 Intersection Size

http://lists.moo.gs/mailman/listinfo/[email protected]

https://www.facebook.com/pg/NapoliRBiocMeetup

Napoli R/Bioconductor Meetupo Since Nov 2018

o R Consortium Array Group

o At least 25 people any event with a goodturn-over of attendees

o Eight meetups until now

o R Package Creation

o scRNA-seq Analysis

o Differentially Methylated Regions Analysis

o Microscope Image Processing

o Chromosomal Copy Number ChangesDetection

o Bulk RNA-seq Differential Expression

o Hi-C data analysis using HiCeekR

o Metagenomics analysis workflow

Page 14: Epigenomicenrichment analysisusingBioconductoreurobioc2019.bioconductor.org/slides/Flashlight4/Righelli_Bioc2019.… · 654 599 523 514 344 282 0 5000 10000 15000 Intersection Size

Napoli R/Bioconductor Meetup

• Part of a wider idea

• Third city in the World

• Boston (USA)

• New York (USA)

• Napoli (IT)

• Useful to

• share ideas and workflows

• create new collaborations

• extend bioinfo community

Page 15: Epigenomicenrichment analysisusingBioconductoreurobioc2019.bioconductor.org/slides/Flashlight4/Righelli_Bioc2019.… · 654 599 523 514 344 282 0 5000 10000 15000 Intersection Size

Is there a best Aligner?Bowtie2 vs BWA

Page 16: Epigenomicenrichment analysisusingBioconductoreurobioc2019.bioconductor.org/slides/Flashlight4/Righelli_Bioc2019.… · 654 599 523 514 344 282 0 5000 10000 15000 Intersection Size

Comparing DARs across methods (2)

ATAC-seq dataset

73600

60794

24236

1612013994

58544376

2879242421221913185714571046722 703 466 410 351 350 327 241 137 100 93 89 9 5 5 3 20

20000

40000

60000

80000

Inte

rsec

tion

Size

DEScanMACSNarrow

DEScanMACSBroad

DiffBindMACSBroad

DEScanZ10K4

DiffBindMACSNarrow

0250005000075000100000125000Set Size

ATAC−seq Regions Nar/Broa & DEScan2 o Ad-hoc designed UpsetPlot on Granges

o Based on findOverlaps method

o Results description

Page 17: Epigenomicenrichment analysisusingBioconductoreurobioc2019.bioconductor.org/slides/Flashlight4/Righelli_Bioc2019.… · 654 599 523 514 344 282 0 5000 10000 15000 Intersection Size

Duplicates Removal doesn’t impact peak detection

o Diagonal Correlations on

counts matrices show that

there is no big differences

between duplicates and

no-duplicates samples

o rmDup with samtools

o DEScan2 counts matrices

o DiffBind counts matrices

−1

−0.8

−0.6

−0.4

−0.2

0

0.2

0.4

0.6

0.8

1

noDup_E0_1

noDup_E0_2

noDup_E0_3

noDup_E0_4

noDup_E1_1

noDup_E1_2

noDup_E1_3

noDup_E1_4

withDup_E0_1

withDup_E0_2

withDup_E0_3

withDup_E0_4

withDup_E1_1

withDup_E1_2

withDup_E1_3

withDup_E1_4

DEScan2DEScan2

−1

−0.8

−0.6

−0.4

−0.2

0

0.2

0.4

0.6

0.8

1

noDup_E0_1

noDup_E0_2

noDup_E0_3

noDup_E0_4

noDup_E1_1

noDup_E1_2

noDup_E1_3

noDup_E1_4

withDup_E0_1

withDup_E0_2

withDup_E0_3

withDup_E0_4

withDup_E1_1

withDup_E1_2

withDup_E1_3

withDup_E1_4

DiffBind

DiffBind

Dup_DEScan2 noDup_DEScan2 Dup_DiffBind noDup_DiffBind

Final Peaks with/without Duplicates

010000

20000

30000

40000

Page 18: Epigenomicenrichment analysisusingBioconductoreurobioc2019.bioconductor.org/slides/Flashlight4/Righelli_Bioc2019.… · 654 599 523 514 344 282 0 5000 10000 15000 Intersection Size

DEScan2 – Differential Enriched Scan 2

• Filter out the peaks with a score lower than a

user-defined threshold

• Aligns the peaks over user-defined number of

samples

• Different thresholds produce different trends

in number of final peaks detected