
Post on 18-Dec-2015


Genome-wide DNA methylation analysis

Bi-Qing Li

Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences

Outline

Background

Method to distinguish 5mC

Array-based genome-wide DNA methylation analysis

NGS-based genome-wide DNA methylation analysis

Third-generation sequencing-based genome-wide DNA methylation analysis

Illumina BS-seq data manipulation


Background

DNA methylation is the main covalent chemical modification of DNA. It is involved in a variety of biological processes, including embryogenesis and development, silencing of transposable elements, regulation of gene transcription, and tumorigenesis and tumor progression.

The methylation pattern of DNA varies widely among cell types and developmental stages and is influenced by disease processes and genetic factors, which poses considerable theoretical and technological challenges for its comprehensive analysis.

Recently, various high-throughput approaches have been developed and applied to the genome-wide analysis of DNA methylation, providing quantitative DNA methylation data at single-base-pair resolution with genome-wide coverage.

Genes 2010, 1(1), 85-101; doi:10.3390/genes1010085


Method to distinguish 5mC

Biotechniques. 2010 Oct;49(4):iii-xi

Restriction endonuclease-based analysis

Pu: A or G; mC: 5-methylcytosine, 5-hydroxymethylcytosine or N4-methylcytosine. The two half-sites can be separated by up to 3 kb, but the optimal separation is 55-103 base pairs.

Cut unmethylated DNA; cut regardless of methylation

Cut unmethylated DNA; partially affected by CpG methylation

Cut methylated DNA

isoschizomer

neoschizomer

Biotechniques. 2010 Oct;49(4):iii-xi

Restriction endonuclease-based analysis

Methylation-sensitive restriction digestion followed by PCR across the restriction site is a very sensitive technique that is still used in some applications today.

It remains applicable to locus-specific studies that require linkage of DNA methylation information across multiple kilobases, either between CpGs or between a CpG and a genetic polymorphism.

It provides methylation data only at the restriction enzyme recognition sites or in adjacent regions.

It is extremely prone to false-positive results caused by incomplete digestion for reasons other than DNA methylation.

Nat Rev Genet. 2010 Feb 2;11(3):191-203

Bisulfite conversion of DNA

Proc Natl Acad Sci U S A. 1992 Mar 1;89(5):1827-31.

Bisulfite conversion

PCR

Bisulfite conversion of DNA

Single-base-pair resolution, no bias

DNA degradation caused by high temperature and low pH

Incomplete conversion of unmethylated cytosine:

High GC density regions

Protected by histones

Stable secondary structure elements

Reduced genome complexity, greater sequence redundancy, decreased hybridization specificity

Difficult to map (repetitive regions)

Genes 2010, 1(1), 85-101; doi:10.3390/genes1010085
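As a rough illustration of what bisulfite conversion followed by PCR does to a sequence, the sketch below converts every unmethylated cytosine to thymine and leaves methylated cytosines unchanged. The function name, the 0-based position set, and the toy sequence are assumptions made for this example; it models ideal, complete conversion and ignores the degradation and incomplete-conversion problems listed above.

```python
def bisulfite_convert(seq, methylated_positions=frozenset()):
    """Return the sequence as it would read after ideal bisulfite treatment and PCR.

    Unmethylated C is read as T; C at a methylated position (0-based) stays C.
    """
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            converted.append("T")   # unmethylated cytosine is converted
        else:
            converted.append(base)  # methylated C and all other bases are kept
    return "".join(converted)


# Toy example: only the cytosine at index 1 is methylated.
converted_read = bisulfite_convert("ACGTCCGA", methylated_positions={1})
print(converted_read)
```

Comparing the converted read with the original sequence shows why the converted genome has reduced complexity: most cytosines disappear, so the sequence is largely a three-letter alphabet.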

Immunoprecipitation-based methods

Methylated DNA immunoprecipitation (MeDIP-seq)

An antibody that recognizes 5mC pulls down the methylated fraction of the genome

More sensitive to highly methylated regions of intermediate CpG density

Methyl-binding domain protein (MBD-seq)

Uses the affinity of the methyl-binding proteins MeCP2 or MBD2 for methylated CpGs

More sensitive to highly methylated regions of high CpG density

Methods. 2010 Nov;52(3):203-12

Immunoprecipitation-based methods

Straightforward, and the data are relatively easy to analyze

Bias associated with CpG density, which needs adjustment

Regions of high (MBD) or intermediate (MeDIP) CpG density will be interpreted as "more methylated" than equally methylated regions of low CpG density

Low resolution; no information on individual CpG dinucleotides

Methods. 2010 Nov;52(3):203-12


Array-based genome wide DNA methylation analysis & restriction endonuclease

Digestion of one pool of genomic DNA with a methylation-sensitive restriction enzyme and mock digestion of another pool or using two different enzymes

The two DNA pools are amplified and labelled with different fluorescent dyes for two-color array hybridization

Nat Rev Genet. 2010 Feb 2;11(3):191-203

Array-based genome wide DNA methylation analysis & restriction endonuclease

Comprehensive high-throughput arrays for relative methylation (CHARM)

McrBC is used to fractionate unmethylated DNA

Label methyl-depleted DNA with Cy5 and total DNA with Cy3

Hybridized on high density arrays

Genome Res. 2008 May;18(5):780-90

McrBC: cuts methylated DNA

Array-based genome wide DNA methylation analysis & restriction endonuclease

HpaII tiny fragment enrichment by ligation-mediated PCR (HELP)

Digestion of genomic DNA with HpaII and MspI

Ligation-mediated PCR to amplify the HpaII or MspI genomic restriction fragments

Label the HpaII amplification product with Cy5 and the MspI product with Cy3

Array hybridization

HpaII: cuts unmethylated DNA

MspI: cuts regardless of methylation

Genome Res. 2006 Aug;16(8):1046-55

Array-based genome wide DNA methylation analysis & methylation immunoprecipitation

Enrichment of methylated fragments using 5mC antibody or the affinity of methyl-binding proteins

Input DNA and enriched DNA are labeled with different fluorescent dyes

Array hybridization

Nat Rev Genet. 2010 Feb 2;11(3):191-203

Array-based genome wide DNA methylation analysis & methylation immunoprecipitation

Methylated DNA immunoprecipitation (figure from Wikipedia, the free encyclopedia)

Array-based genome wide DNA methylation analysis & bisulfite conversion

ILLUMINA® EPIGENETIC ANALYSIS

Array-based genome wide DNA methylation analysis & bisulfite conversion

27,578 CpG sites

14,495 protein-coding gene promoters

110 microRNA gene promoters

Nat Rev Genet. 2010 Feb 2;11(3):191-203

Array-based genome wide DNA methylation analysis & bisulfite conversion

Genome Res. 2006 Mar;16(3):383-93

Array-based genome wide DNA methylation analysis & bisulfite conversion

GoldenGate BeadArray: 1,536 specific CpG sites in 371 genes

GoldenGate Methylation Cancer Panel I: 1,505 CpG sites selected from 807 genes

Nat Rev Genet. 2010 Feb 2;11(3):191-203

Illumina® Epigenetics Analysis

Array-based genome wide DNA methylation analysis

Easy to perform such experiments

Easy to interpret data with many well-characterized software programs

Low resolution

Not easy to distinguish one repetitive element from another in a hybridization-based method

Not truly genome-wide


NGS based genome-wide DNA methylation analysis

Biotechniques. 2010 Oct;49(4):iii-xi

NGS based genome-wide DNA methylation analysis-ROCHE 454

Roche/454 pyrosequencing-based massively parallel bisulfite pyrosequencing

Reads include more CpG sites, facilitating research on complex methylation patterns

Easier and more accurate alignment to the reference, especially in repetitive regions

Greater chance of covering genotype information (SNPs) adjacent to cytosines

Relatively high sequencing cost

Higher error rates in calling runs of identical bases (homopolymers)

Genes 2010, 1(1), 85-101; doi:10.3390/genes1010085

NGS based genome-wide DNA methylation analysis-Illumina/SOLEXA

Methyl-seq

~100-350bp

Illumina Genome Analyzer II

Genome Res. 2009 Jun;19(6):1044-56

HpaII: cuts unmethylated DNA

MspI: cuts regardless of methylation

NGS based genome-wide DNA methylation analysis-Illumina/SOLEXA

Methyl-sensitive cut counting (MSCC)

Nat Biotechnol. 2009 Apr;27(4):361-8

The method is similar to Methyl-seq; however, sequencing of MspI libraries was reported to have little effect on the measurement of methylation and was dropped to reduce costs.

Genome Med. 2009 Nov 16;1(11):106

NGS based genome-wide DNA methylation analysis-Illumina/SOLEXA

Methyl-DNA immunoprecipitation sequencing (MeDIP-seq)

Methods. 2009 Mar;47(3):142-50

NGS based genome-wide DNA methylation analysis-Illumina/SOLEXA

Reduced representation bisulfite sequencing (RRBS)

Nucleic Acids Research, 2005, Vol. 33, No. 18

Nature. 2008 Aug 7;454(7205):766-70

Nat Methods. 2010 Feb;7(2):133-6

Illumina Genome Analyzer

NGS based genome-wide DNA methylation analysis-Illumina/SOLEXA

Bisulfite padlock probes (BSPPs)

Nat Biotechnol. 2009 Apr;27(4):353-60

NGS based genome-wide DNA methylation analysis-Illumina/SOLEXA

Bisulfite sequencing (BS-seq)

Nature. 2008 Mar 13;452(7184):215-9

NGS based genome-wide DNA methylation analysis-Illumina/SOLEXA

Cytosine methylome sequencing (MethylC-seq)

Cell. 2008 May 2;133(3):523-36

Nature. 2009 Nov 19;462(7271):315-22

Nature. 2011 Mar 3;471(7336):68-73


Third generation sequencing based genome-wide DNA methylation analysis-PacBio

single-molecule, real-time sequencing (SMRT)

ZMW: zero-mode waveguide

Nat Biotechnol. 2010 May;28(5):426-8

Third generation sequencing based genome-wide DNA methylation analysis-PacBio

single-molecule, real-time sequencing (SMRT)

Nat Methods. 2010 Jun;7(6):461-5

Nat Methods. 2010 Jun;7(6):435-7

Third generation sequencing based genome-wide DNA methylation analysis-Oxford Nanopore

Oxford Nanopore Technologies

Nat Biotechnol. 2010 May;28(5):426-8


Illumina BS-seq data manipulation

FASTQ file format and PHRED score

Adaptor trimming with FASTX

Quality control with FastQC

Reads filter and trimming with FASTX

Reads mapping with Bismark

Basic analysis

Advanced analysis and application


Illumina BS-seq data manipulation: FASTQ file format

FASTQ has emerged as a common file format for sharing sequencing read data, combining both the sequence and an associated per-base quality score.

Nucleic Acids Research, 2010, Vol. 38, No. 6 1767–1771
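To make the structure of the format concrete, here is a minimal sketch of a reader for the common four-line FASTQ layout (header starting with `@`, sequence, separator starting with `+`, quality string). It assumes records are not line-wrapped; the function name and the toy record are made up for illustration.

```python
def parse_fastq(lines):
    """Yield (read_id, sequence, quality_string) from four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        # A missing line is read as "" and then rejected by the checks below.
        seq = next(it, "").rstrip("\n")
        plus = next(it, "").rstrip("\n")
        qual = next(it, "").rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], seq, qual


# Toy record: one read of 8 bases with one quality character per base.
example = ["@read1\n", "TTGATTGA\n", "+\n", "IIIIHHH#\n"]
records = list(parse_fastq(example))
print(records)
```

For real files, the same generator can be fed an open file handle, for example `parse_fastq(open(path))`, where `path` is whatever FASTQ file is at hand.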

Illumina BS-seq data manipulation: PHRED score

Nucleic Acids Research, 2010, Vol. 38, No. 6 1767–1771

Nature. 2009 Nov 19;462(7271):315-22


http://en.wikipedia.org/wiki/FASTQ_format#cite_note-Illumina_User_Guide_1.5-2
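A PHRED score Q encodes the estimated probability P that a base call is wrong as Q = -10 * log10(P), and FASTQ stores each score as one ASCII character, computed as the score plus an offset. The sketch below decodes a quality string; an offset of 33 is the Sanger convention, and 64 was used by some older Illumina pipelines. The function names are assumptions for this example.

```python
def decode_phred(quality_string, offset=33):
    """Convert a FASTQ quality string into a list of integer PHRED scores."""
    return [ord(ch) - offset for ch in quality_string]


def error_probability(q):
    """Estimated probability that a base call with PHRED score q is wrong."""
    return 10 ** (-q / 10)


scores = decode_phred("I5#")  # characters with ASCII codes 73, 53, 35
print(scores)
print([error_probability(q) for q in scores])
```

A score of 20 therefore corresponds to a 1 in 100 chance of error, and a score of 40 to 1 in 10,000.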


Illumina BS-seq data manipulation: Adaptor trimming with FASTX

Nature. 2009 Nov 19;462(7271):315-22


http://hannonlab.cshl.edu/fastx_toolkit/index.html


http://hannonlab.cshl.edu/fastx_toolkit/commandline.html#fastx_clipper_usage
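The idea behind adaptor trimming can be sketched in a few lines: find the adaptor in the read, or a prefix of it at the 3' end, and cut the read there. This is not the algorithm used by fastx_clipper, only an exact-match illustration. The function name, the `min_overlap` parameter, and the adaptor string are placeholders chosen for the example.

```python
def clip_adapter(read, adapter, min_overlap=5):
    """Remove an adaptor (exact match only) from the 3' end of a read.

    If the full adaptor occurs in the read, the read is cut at its start.
    Otherwise, if the read ends with a prefix of the adaptor that is at
    least min_overlap bases long, that prefix is removed.
    """
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    longest = min(len(adapter), len(read)) - 1
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read  # no adaptor detected


ADAPTER = "AGATCGGAAGAGC"  # placeholder adaptor sequence for illustration
print(clip_adapter("ACGTAGATCGGAAGAGCTT", ADAPTER))  # full adaptor inside the read
print(clip_adapter("ACGTACGTAGATCGGAAG", ADAPTER))   # partial adaptor at the 3' end
print(clip_adapter("ACGTACGT", ADAPTER))             # no adaptor
```

Real clippers also tolerate mismatches and sequencing errors in the adaptor, which this sketch deliberately leaves out.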


Illumina BS-seq data manipulation: Quality control with FastQC

http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/



Illumina BS-seq data manipulation: Reads filter and trimming with FASTX

http://hannonlab.cshl.edu/fastx_toolkit/commandline.html#fastq_quality_filter_usage

e.g.1 fastq_quality_filter -Q 33 -q 20 -p 100 -v -i input -o output

e.g.2 fastq_quality_filter -q 10 -p 100 -i /usr/local/data/GBS/OWB-RAD1.fastq -Q 33 | fastq_quality_filter -Q 33 -q 20 -p 80 -o OWB1-filt.fastq
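In the commands above, `-q` is the minimum quality score, `-p` is the minimum percentage of bases in a read that must reach that score, and `-Q 33` selects the ASCII offset. The following sketch mirrors that rule for a single read. It is an illustration of the filtering criterion, not the tool's source code, and treating the `-q` threshold as inclusive is an assumption.

```python
def passes_quality_filter(quality_string, min_quality=20, min_percent=100, offset=33):
    """Return True if at least min_percent of bases have PHRED score >= min_quality."""
    scores = [ord(ch) - offset for ch in quality_string]
    if not scores:
        return False  # an empty read cannot satisfy the filter
    good = sum(1 for s in scores if s >= min_quality)
    return 100.0 * good / len(scores) >= min_percent


# "I" decodes to 40, "5" to 20 and "#" to 2 with offset 33.
print(passes_quality_filter("IIII5"))                  # every base reaches 20
print(passes_quality_filter("III#I"))                  # one base is below 20
print(passes_quality_filter("III#I", min_percent=80))  # 4 of 5 bases reach 20
```

This also explains the two-stage pipeline in e.g.2: the first pass requires every base to reach quality 10, and the second requires 80% of bases to reach quality 20.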

Illumina BS-seq data manipulation: Reads filter and trimming with FASTX

FASTQ quality trimmer

e.g.1 fastq_quality_trimmer -t 20 -l 35 -v -i input -o output
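Here `-t` is the quality threshold below which bases are trimmed from the end of the read, and `-l` is the minimum length a read must keep after trimming. A minimal sketch of that behaviour, assuming an ASCII offset of 33, could look like this (the function name and toy read are made up for the example):

```python
def quality_trim(seq, quality_string, threshold=20, min_length=35, offset=33):
    """Trim low-quality bases from the 3' end of a read.

    Returns (sequence, quality_string) after trimming, or None if the
    trimmed read is shorter than min_length.
    """
    end = len(seq)
    # Walk backwards while the last remaining base is below the threshold.
    while end > 0 and ord(quality_string[end - 1]) - offset < threshold:
        end -= 1
    if end < min_length:
        return None  # read is discarded
    return seq[:end], quality_string[:end]


# Toy read: the last three bases have PHRED score 2 ("#").
print(quality_trim("ACGTACGT", "IIIII###", min_length=4))
print(quality_trim("ACGTACGT", "IIIII###", min_length=6))
```

Unlike the filter, which keeps or drops whole reads, trimming shortens reads and only drops those that end up too short.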


Illumina BS-seq data manipulation: Reads mapping with Bismark

Bioinformatics. 2011 Jun 1;27(11):1571-2.

Two computationally converted reference genomes (C-to-T and G-to-A)

Bioinformatics. 2011 Jun 1;27(11):1571-2.
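The three-letter alignment idea can be sketched as follows: convert the reference in silico, convert the read the same way, locate the read in the converted reference, then compare the original read with the original reference to call methylation. Bismark itself uses a short-read aligner and handles all strand combinations; the exact-match `str.find` below is only a stand-in, and the function names and toy sequences are assumptions for this example (forward strand only).

```python
def convert_reference(genome):
    """Return the C-to-T and G-to-A converted versions of a reference sequence."""
    genome = genome.upper()
    return genome.replace("C", "T"), genome.replace("G", "A")


def call_methylation(reference, read, start):
    """Compare a forward-strand bisulfite read with the unconverted reference.

    Returns {0-based reference position: True if methylated, False if not}
    for every reference C covered by the read.
    """
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference[start:], read)):
        if ref_base == "C":
            if read_base == "C":
                calls[start + i] = True    # protected from conversion
            elif read_base == "T":
                calls[start + i] = False   # converted, so unmethylated
    return calls


reference = "ACGTCCGA"
read = "ACGTTTGA"  # toy bisulfite read: only the first C survived conversion
c_to_t, g_to_a = convert_reference(reference)
start = c_to_t.find(read.replace("C", "T"))  # stand-in for a real aligner
print(start, call_methylation(reference, read, start))
```

Converting both read and reference removes the C/T differences that would otherwise look like mismatches, and the methylation state is recovered afterwards from the unconverted sequences.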


H=A, C or T


chromosome  position  strand  context  mC  All C
1           468       +       CG       4   4
1           469       -       CG       5   6
1           470       +       CG       5   5
1           471       -       CG       7   7
1           7384      -       CHG      6   9
1           225896    -       CHH      4   16
1           771455    +       CHH      5   22
1           702235    +       CHG      2   12
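The context column in the table follows from the two bases downstream of each cytosine, with H standing for A, C or T. A minimal sketch of that classification for forward-strand cytosines is given below; the function name and the toy sequence are assumptions, and cytosines on the reverse strand would need the reverse complement first.

```python
def cytosine_context(seq, pos):
    """Classify the cytosine at 0-based position pos as CG, CHG or CHH.

    Returns None if the base is not C or if the downstream bases needed
    for the decision are missing or ambiguous.
    """
    seq = seq.upper()
    if seq[pos] != "C":
        return None
    downstream = seq[pos + 1:pos + 3]
    if len(downstream) >= 1 and downstream[0] == "G":
        return "CG"
    if len(downstream) < 2 or downstream[0] not in "ACT":
        return None  # too close to the end, or an ambiguous base such as N
    if downstream[1] == "G":
        return "CHG"
    if downstream[1] in "ACT":
        return "CHH"
    return None


toy = "ACGCAGCTTC"
for position in (1, 3, 6, 9):
    print(position, cytosine_context(toy, position))
```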


Illumina BS-seq data manipulation: Basic analysis - Reads coverage

Illumina BS-seq data manipulation: Basic analysis - Reads depth

Illumina BS-seq data manipulation: Basic analysis - Reads depth percentage

Illumina BS-seq data manipulation: Basic analysis - Methylation level

$$\text{methylation level} = \frac{\text{number of methylated reads}}{\text{number of methylated reads} + \text{number of unmethylated reads}}$$

chromosome  position  strand  context  mC  All C  Methylation level
1           468       +       CG       4   4      100%
1           469       -       CG       5   6      83.3%
1           470       +       CG       5   5      100%
1           471       -       CG       7   7      100%
1           7384      -       CHG      6   9      66.7%
1           225896    -       CHH      4   16     25%
1           771455    +       CHH      5   22     22.7%
1           702235    +       CHG      2   12     16.7%

H=A, C or T
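The methylation level column can be reproduced directly from the mC and All C counts, where All C is the number of methylated plus unmethylated reads covering the site. The short sketch below recomputes it for the rows of the table; the function and variable names are chosen for this example only.

```python
def methylation_level(mc, all_c):
    """Percentage of reads supporting methylation at a site; None if uncovered."""
    if all_c == 0:
        return None
    return 100.0 * mc / all_c


# (chromosome, position, strand, context, mC, All C) copied from the table.
rows = [
    ("1", 468, "+", "CG", 4, 4),
    ("1", 469, "-", "CG", 5, 6),
    ("1", 470, "+", "CG", 5, 5),
    ("1", 471, "-", "CG", 7, 7),
    ("1", 7384, "-", "CHG", 6, 9),
    ("1", 225896, "-", "CHH", 4, 16),
    ("1", 771455, "+", "CHH", 5, 22),
    ("1", 702235, "+", "CHG", 2, 12),
]

levels = [round(methylation_level(mc, all_c), 1) for *_, mc, all_c in rows]
print(levels)
```

Sites with few reads give coarse estimates (a site covered by 4 reads can only take 5 distinct values), so a minimum depth is often applied before interpreting the level.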

Illumina BS-seq data manipulation: Basic analysis - Methylation density

$$\text{Absolute mC} = \frac{\text{number of calls of a given methylation type (mCG, mCHG, mCHH)}}{\text{bin size}}$$

$$\text{Relative methylation} = \frac{\text{mC: number of calls of a given methylation type (mCG, mCHG, mCHH)}}{\text{C: total number of sites of the same type (CG, CHG, CHH)}}$$

H=A, C or T
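One way to compute the two densities is to bin sites along a chromosome and count, per bin and per context, how many sites were called methylated and how many sites exist in total. The sketch below assumes each site has already been called as methylated or not; the input layout, function name, and bin size are hypothetical choices for illustration.

```python
from collections import Counter


def methylation_density(sites, bin_size):
    """Compute absolute and relative methylation density per bin and context.

    sites: iterable of (position, context, is_methylated) tuples.
    Returns {(bin_index, context): (absolute, relative)}, where
    absolute = methylated calls / bin_size and
    relative = methylated calls / total sites of that context in the bin.
    """
    methylated = Counter()
    total = Counter()
    for position, context, is_methylated in sites:
        key = (position // bin_size, context)
        total[key] += 1
        if is_methylated:
            methylated[key] += 1
    return {
        key: (methylated[key] / bin_size, methylated[key] / total[key])
        for key in total
    }


toy_sites = [
    (10, "CG", True),
    (20, "CG", False),
    (30, "CG", True),
    (150, "CHH", False),
]
density = methylation_density(toy_sites, bin_size=100)
print(density)
```

The absolute value depends on how many sites of a context a bin contains, whereas the relative value normalizes for that, which is why both are reported.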


Illumina BS-seq data manipulation: Advanced analysis and application

DNA methylation and gene expression

DNA methylation is linked to gene silencing and is considered an important mechanism in the regulation of gene expression.

Gene expression

Gene expression microarray

RNA-seq

Illumina BS-seq data manipulation: Advanced analysis and application

DNA methylation and gene expression

Proximal TSS (-150 bp to +150 bp around the TSS)

Promoter (1.5 kb upstream of the TSS)

Nature. 2009 Nov 19;462(7271):315-22

Genome Res. 2010 Mar;20(3):320-31.
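The two regions defined above can be turned into genomic intervals once the TSS position and gene strand are known. The sketch below assumes a single coordinate system with inclusive bounds and treats "upstream" as lower coordinates on the + strand and higher coordinates on the - strand; the function name and default values simply restate the definitions on the slide.

```python
def tss_windows(tss, strand, flank=150, promoter_length=1500):
    """Return ((proximal_start, proximal_end), (promoter_start, promoter_end)).

    The proximal TSS window spans flank bp on either side of the TSS.
    The promoter window covers promoter_length bp upstream of the TSS,
    where upstream depends on the strand of the gene.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    proximal = (tss - flank, tss + flank)
    if strand == "+":
        promoter = (tss - promoter_length, tss)
    else:
        promoter = (tss, tss + promoter_length)
    return proximal, promoter


print(tss_windows(10000, "+"))
print(tss_windows(10000, "-"))
```

Averaging the methylation level of all cytosines that fall inside each window then gives one value per gene that can be compared with its expression level from microarray or RNA-seq data.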


Illumina BS-seq data manipulation: Advanced analysis and application

Differentially methylated regions (DMRs) and gene expression

DNA methylation at DNA–protein interaction sites

DNA methylation, miRNA, and histone modification...

Nature. 2009 Nov 19;462(7271):315-22

Genome Res. 2010 Mar;20(3):320-31.

Thank you!
