
Page 1: Analysis of ChIP-seq experiments - …barc.wi.mit.edu/education/hot_topics/ChIPseq/ChIPSeq_Hot...2011/01/13 · Hot Topics: Analysis of ChIP-seq experiments 1 Hot Topics on analysis

Analysis of ChIP-seq experiments

Jan. 13, 2011

Hot Topics: Analysis of ChIP-seq experiments


Hot Topics on the analysis of high-throughput sequencing experiments

• Mapping next generation sequence reads (December 2010)

http://iona.wi.mit.edu/bio/education/hot_topics/shortRead_mapping/Mapping_HTseq.pdf

• ChIP-seq (January 2011)

• RNA-seq (February 2011)

• High throughput sequencing pipeline in Galaxy (March 2011)


Outline

• ChIP-seq overview

• Critical steps in data analysis

• Tools available for analysis and software performance

– BaRC bake-off

– Published evaluations

• Software available on Tak

• Suggested pipelines


ChIP-Seq overview I


Park, P. J., ChIP-seq: advantages and challenges of a maturing technology, Nat Rev Genet. Oct;10(10):669-80 (2009)


ChIP-Seq overview II


Park, P. J., ChIP-seq: advantages and challenges of a maturing technology, Nat Rev Genet. Oct;10(10):669-80 (2009)


Critical steps in data analysis

1. Effective mapping

2. Read extension and signal profile generation

3. Peak assignment

Most original software looked for fold enrichment of the sample over input or expected background, and used a Poisson distribution to assess the significance of the fold enrichment.

Newer versions:

a. Use of strand-dependent bimodality

b. Use of the background distribution from input DNA, or of modeled background data, to adjust for local variation

Pepke, S. et al. Computation for ChIP-seq and RNA-seq studies, Nat Methods. Nov (2009)
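The Poisson test mentioned in step 3 can be illustrated with a small calculation. This is only a minimal sketch under stated assumptions, not the implementation of any particular peak caller: the helper name `poisson_upper_tail` and the example counts are hypothetical. It returns the probability of seeing at least k reads in a window when the background (for example, the scaled input count) predicts lambda reads on average.

```shell
# Upper-tail Poisson probability P(X >= k) for a background rate lambda.
# Hypothetical helper for illustration only; not taken from any peak caller.
poisson_upper_tail() {
    # $1 = observed read count k in the window, $2 = expected count lambda
    awk -v k="$1" -v lambda="$2" 'BEGIN {
        term = exp(-lambda)             # P(X = 0)
        cdf = 0
        for (i = 0; i < k; i++) {
            cdf += term                 # add P(X = i)
            term *= lambda / (i + 1)    # derive P(X = i + 1) from P(X = i)
        }
        printf "%.4f\n", 1 - cdf
    }'
}

# Made-up example: 5 reads observed where the background predicts 2.
poisson_upper_tail 5 2
```

A small probability means the window holds more reads than the background alone would explain. Computing 1 minus the cumulative sum loses precision for very small p-values, so a real tool would likely work on a log scale.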


Critical steps in data analysis: Mapping your reads

• See Hot Topic: Mapping Next Generation Sequence Reads (December 2010)

http://iona.wi.mit.edu/bio/education/hot_topics/shortRead_mapping/Mapping_HTseq.pdf

• If a read maps to several places on the genome, keep one position at random to avoid overcounting the reads

• Example of mapping command

bsub "bowtie -k 1 -n 2 -l 36 --best --solexa1.3-quals /nfs/genomes/mouse_gp_jul_07_no_random/bowtie/mm9 inputSeq.txt output_mm9.k1.n2.l36.best.map"

-k 1: report 1 alignment per read
-n 2: max number of mismatches in the seed
-l 36: seed length
--best: hits guaranteed best stratum; ties broken by quality
--solexa1.3-quals: input quals are from GA Pipeline ver. >= 1.3
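One way to confirm that each read was reported only once is to summarize the map file. The sketch below assumes bowtie's default tab-separated output, in which column 1 is the read name and column 2 is the strand. The file name `example.map`, its three reads, and the helper `summarize_map` are made up for illustration.

```shell
# Made-up three-read excerpt standing in for a bowtie .map file
# (default columns: name, strand, reference, 0-based offset, sequence, qualities, ...).
{
    printf 'read1\t+\tchr1\t3053000\tACGT\tIIII\t0\t\n'
    printf 'read2\t-\tchr1\t3053040\tTTGA\tIIII\t0\t\n'
    printf 'read3\t+\tchr2\t1200500\tGGCA\tIIII\t0\t\n'
} > example.map

# Count distinct reads, reads reported more than once, and reads per strand.
summarize_map() {
    awk -F'\t' '
        { n[$1]++; if ($2 == "+") plus++; else if ($2 == "-") minus++ }
        END {
            for (r in n) { reads++; if (n[r] > 1) multi++ }
            printf "reads=%d reported_more_than_once=%d plus=%d minus=%d\n",
                   reads, multi, plus, minus
        }' "$1"
}

summarize_map example.map
```

With one alignment reported per read, the second number should be 0. A strong imbalance between the strand counts may be worth a look in a genome browser.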


Critical steps in data analysis

Peak calling: Regions that may occur in ChIP-seq data for TFs

Rye M. B. et al. A manually curated ChIP-seq benchmark demonstrates room for improvement in current peak-finder programs N.A.R. Nov (2010)


Critical steps in data analysis

Peak calling: The value of having input control

[Figure: genome browser tracks comparing the sample wig file with the control wig file]

Critical steps in data analysis

Peak calling: Using strand-dependent bimodality in peak calling

Directional methods (e.g. SISSRs) look for the point where reads shift from mapping to the sense strand to mapping to the antisense strand.

These methods can be very precise for sharp binding, but they are less useful for identifying broad enrichment signals, where the shift point is no longer present.

[Figure panels: Sharp binding; Broad binding]

Wilbanks, E.G. et al. Evaluation of Algorithm Performance in ChIP-Seq Peak Detection. PLoS ONE July (2010)
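The strand-shift idea can be sketched with a simplified net-count scan. This is not SISSRs' actual algorithm, only an illustration under assumptions: the reads are in BED6 format on a single chromosome, the window width of 50 bp is arbitrary, and the file `reads.bed`, its six reads, and the helper `find_strand_shift` are made up. Each window gets a net count (sense 5' ends minus antisense 5' ends), and a site is reported where the net count switches from positive to negative.

```shell
# Made-up reads around one hypothetical binding site (BED6: chr start end name score strand).
{
    printf 'chr1\t1000\t1036\tr1\t0\t+\n'
    printf 'chr1\t1010\t1046\tr2\t0\t+\n'
    printf 'chr1\t1030\t1066\tr3\t0\t+\n'
    printf 'chr1\t1134\t1170\tr4\t0\t-\n'
    printf 'chr1\t1150\t1186\tr5\t0\t-\n'
    printf 'chr1\t1160\t1196\tr6\t0\t-\n'
} > reads.bed

# Report positions where the per-window net strand count flips from + to -.
find_strand_shift() {
    awk -v w=50 '
        {
            pos = ($6 == "+") ? $2 : $3          # 5-prime end of the read
            b = int(pos / w)                     # window index
            net[b] += ($6 == "+") ? 1 : -1       # sense minus antisense
            if (NR == 1 || b < lo) lo = b
            if (NR == 1 || b > hi) hi = b
            chrom = $1
        }
        END {
            prev = 0
            for (b = lo; b <= hi; b++) {
                if (net[b] == 0) continue        # skip empty or balanced windows
                if (prev > 0 && net[b] < 0)      # midpoint between the two windows
                    printf "%s\t%d\n", chrom, ((prevbin + 1) * w + b * w) / 2
                prev = net[b]; prevbin = b
            }
        }' "$1"
}

find_strand_shift reads.bed
```

For the made-up reads this prints chr1 and position 1100, midway between the sense and antisense clusters. For broad enrichment the two clusters interleave, so no clean sign change appears, which is the limitation noted on this slide.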


Software packages we have tried

Pepke, S., Wold, B., Mortazavi, A. Computation for ChIP-seq and RNA-seq studies. Nat Methods. Nov (2009).

BaRC ChIP-seq bakeoff

• FindPeaks: Doesn’t take control data

• PeakSeq: Input has to be Eland output

• ERANGE: Running time is too long

• CisGenome: Easy-to-use GUI available

• SISSRs: Recommended for TFs

• MACS: Recommended for TFs and histone modifications

Data used: Marson et al., Cell. 2008 Aug 8;134(3):521-33, (Young lab)


BaRC ChIP-seq bakeoff: Comparison between programs

Method          Cisgenome v1.1   SISSRs v1.4   MACS 1.3.7.1 top ~17K   Marson et al.
                (9403)           (10933)       (out of ~30K)           (16688)
Cisgenome       NA               -             -                       -
SISSRs          83.78            NA            -                       -
MACS top 17K    97.31            96.91         NA                      -
Marson et al.   96.78            96.22         84.01                   NA

Pair-wise comparisons of the peaks for Nanog. Numbers represent the percentage of total peaks from one method (column) that are shared with another method (row).

Data used: Marson et al., Cell. 2008 Aug 8;134(3):521-33, (Young lab)
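A percentage of this kind can be computed from two peak lists in BED format. The following is a minimal sketch, not the script used for the bake-off: the helper `percent_shared`, the files `a.bed` and `b.bed`, and their peaks are made up, and "shared" is assumed to mean overlapping by at least one base. The nested loop is quadratic, so for real peak lists a dedicated tool such as intersectBed from BEDTools would be the practical choice.

```shell
# Made-up peak lists from two hypothetical methods (BED: chr start end).
printf 'chr1\t100\t200\nchr1\t500\t600\nchr2\t100\t200\nchr1\t900\t1000\n' > a.bed
printf 'chr1\t150\t250\nchr1\t550\t560\nchr2\t300\t400\n' > b.bed

# Percentage of peaks in file $1 that overlap at least one peak in file $2.
# Assumes file $2 is not empty.
percent_shared() {
    awk '
        NR == FNR {                       # first file read: peaks of the other method
            chr[NR] = $1; s[NR] = $2 + 0; e[NR] = $3 + 0; nb = NR; next
        }
        {
            total++
            for (i = 1; i <= nb; i++)
                if (chr[i] == $1 && s[i] < $3 + 0 && e[i] > $2 + 0) { shared++; break }
        }
        END { if (total) printf "%.2f\n", 100 * shared / total }
    ' "$2" "$1"
}

percent_shared a.bed b.bed
percent_shared b.bed a.bed
```

The two calls give different values because the denominator changes with the method being asked about, which is why the table reports a separate number for each column and row pairing.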


Other evaluations of ChIP-seq peak calling programs

“Evaluation of Algorithm Performance in ChIP-Seq Peak Detection” (PLoS ONE, July 2010)

Benchmarking ChIP-seq peak calling algorithms

• Agreement between different programs

• Co-occurrence of binding motifs

• Experimental verification (still small amount of data)


Agreement between different programs

Programs that call a larger number of peaks tend to include the peaks called by programs that call fewer peaks

“Evaluation of Algorithm Performance in ChIP-Seq Peak Detection” (PLoS ONE, July 2010)

Pair-wise comparison of shared peaks for human neuron-restrictive silencer factor (NRSF) and growth-associated binding protein (GABP). Numbers represent the percentage of total peaks from one method (column) that are shared with another method (row). Programs are ordered by increasing number of peaks called.

Evaluation based on number of peaks containing the expected motif

PeakSeq and Hpeak are outliers

“Evaluation of Algorithm Performance in ChIP-Seq Peak Detection” (PLoS ONE, July 2010)

Recommendations

• Include a DNA-input control

• Look at your raw data as well as the peak calls in a genome browser

• If your signal is not very strong, try using several peak-calling programs.

• We have had good results using MACS. SISSRs is a good second choice if you are expecting sharp peaks.

Software available on Tak

• MACS

macs

• SISSRs

sissrs


A pipeline for ChIP-seq analysis with MACS

• Mapping reads with bowtie

bsub "bowtie -k 1 -n 2 -l 36 --best --solexa1.3-quals /nfs/genomes/mouse_gp_jul_07_no_random/bowtie/mm9 inputSeq.txt output_mm9.k1.n2.l36.best.map"

• Calling peaks with MACS

bsub "macs -t sample_mm9.k1.n2.l36.best.map -c inputControl_mm9.k1.n2.l36.best.map --name=test1 --format=BOWTIE --tsize=36 --wig --space=25 --mfold=10,30"

PARAMETERS

• -t TFILE: Treatment file

• -c CFILE: Control file

• --name=NAME: Experiment name, which will be used to generate output file names. DEFAULT: “NA”

• --format=FORMAT: Format of tag file, “BED” or “ELAND” or “ELANDMULTI” or “ELANDMULTIPET” or “SAM” or “BAM” or “BOWTIE”. DEFAULT: “BED”

• --tsize=TSIZE: Tag size. DEFAULT: 25

• --wig: Whether or not to save shifted raw tag count at every bp into a wiggle file

• --mfold=MFOLD: Select the regions within MFOLD range of high-confidence enrichment ratio against background to build model. The regions must be lower than upper limit, and higher than the lower limit. DEFAULT: 10,30

MACS output files

1. Folder with wig files for control and sample.

2. Excel file containing the following columns: chr, start, end, length, summit, tags, “-10*LOG10(pvalue)”, fold_enrichment, FDR(%)

To visualize the peaks, make a bedgraph file with columns: chr start end fold_enrichment
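The bedgraph conversion can be sketched with awk. This is a minimal example under assumptions rather than a fixed recipe: the file name `test1_peaks.xls` and its two peaks are made up, the file is assumed to be tab-separated text with the nine columns listed on this slide (so fold_enrichment is column 8), and starts are assumed to be 1-based, so 1 is subtracted to get the 0-based starts that bedgraph expects. Drop the subtraction if your file is already 0-based.

```shell
# Made-up two-peak excerpt standing in for a MACS peaks .xls file (tab-separated text).
{
    printf '# This file is generated by MACS\n'
    printf 'chr\tstart\tend\tlength\tsummit\ttags\t-10*log10(pvalue)\tfold_enrichment\tFDR(%%)\n'
    printf 'chr1\t3053001\t3053400\t400\t180\t25\t120.50\t18.20\t0.50\n'
    printf 'chr1\t3333701\t3334000\t300\t150\t12\t65.10\t9.75\t1.20\n'
} > test1_peaks.xls

# Hypothetical helper: keep chr, start, end and fold_enrichment (column 8).
macs_xls_to_bedgraph() {
    awk -F'\t' 'BEGIN { OFS = "\t" }
        /^#/ || NF < 8 || $2 == "start" { next }   # skip comments, blank lines, header row
        { print $1, $2 - 1, $3, $8 }               # assumes 1-based start in the xls file
    ' "$1"
}

macs_xls_to_bedgraph test1_peaks.xls > test1_peaks.bedgraph
cat test1_peaks.bedgraph
```

The resulting file can be loaded into a genome browser next to the wig files to compare peak calls with the raw signal.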


A pipeline for ChIP-seq analysis with SISSRs

• Mapping reads with bowtie, get SAM output

bsub "bowtie -k 1 -n 2 -l 36 --sam --best --solexa1.3-quals /nfs/genomes/mouse_gp_jul_07_no_random/bowtie/mm9 inputSeq.txt bowtieoutput_mm9.k1.n2.l36.best.sam"

• Convert formats

Convert SAM to BAM: samtools view -S -b -o OUTFileName.bam INFile.sam

bsub "samtools view -S -b -o bowtieoutput_mm9.k1.n2.l36.best.bam bowtieoutput_mm9.k1.n2.l36.best.sam"

Convert to BED

bsub "bamToBed -i bowtieoutput_mm9.k1.n2.l36.best.bam > bowtieoutput_mm9.k1.n2.l36.best.bed"

• Run SISSRs

sissrs -i bowtieoutput_mm9.k1.n2.l36.best.bed -o outputName -s 2716965481 -b BG_mm9.k1.n2.l36.best.bed -L 200

-s: genome size
-L: upper bound on the DNA fragment length

SISSRs output

outputName.bed

chr1 3053011 3053071 outputName 55.54 .

chr1 3333731 3333791 outputName 12.62 .

Convert it to bedgraph

outputName.bedgraph

chr1 3053011 3053071 55.54

chr1 3333731 3333791 12.62
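The conversion shown above amounts to dropping the name and strand columns. A minimal sketch, assuming the SISSRs output is a plain six-column BED file without header lines, exactly as in the two example lines:

```shell
# The two example lines from outputName.bed above, written tab-separated.
printf 'chr1\t3053011\t3053071\toutputName\t55.54\t.\n'  > outputName.bed
printf 'chr1\t3333731\t3333791\toutputName\t12.62\t.\n' >> outputName.bed

# Keep chr, start, end and the score (column 5); drop the name and strand columns.
awk 'BEGIN { OFS = "\t" } { print $1, $2, $3, $5 }' outputName.bed > outputName.bedgraph
cat outputName.bedgraph
```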


Visualization in IGV

[Figure: IGV tracks showing the MACS peaks, the SISSRs peaks, the sample wig file, and the control wig file]

BaRC’s bake-off peak calls from MACS and SISSRs. Data from Marson et al., Cell. 2008

http://www.broadinstitute.org/software/igv/

References

• Reviews and benchmark papers:

– ChIP-seq: advantages and challenges of a maturing technology (Oct 09) (http://www.nature.com/nrg/journal/v10/n10/full/nrg2641.html)

– Computation for ChIP-seq and RNA-seq studies (Nov 09) (http://www.nature.com/nmeth/journal/v6/n11s/full/nmeth.1371.html)

– Evaluation of Algorithm Performance in ChIP-Seq Peak Detection (July 2010) (http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0011471)

– Rapid innovation in ChIP-seq peak-calling algorithms is outdistancing benchmarking efforts (Oct 2010) (http://bib.oxfordjournals.org/content/early/2010/11/08/bib.bbq068.full)

– A manually curated ChIP-seq benchmark demonstrates room for improvement in current peak-finder programs (NAR, Nov 2010) (PMID: 21113027)

• MACS: Zhang et al. Model-based Analysis of ChIP-Seq (MACS). Genome Biol (2008) vol. 9 (9) pp. R137. http://liulab.dfci.harvard.edu/MACS/index.html

• SISSRs (Site Identification from Short Sequence Reads): Jothi et al. Genome-wide identification of in vivo protein–DNA binding sites from ChIP-Seq data. NAR (2008) 36 (16): 5221-5231. http://wiki.bioinformatics.ucdavis.edu/index.php/Bioinformatics_Course_Sissrs

• GPS (Genome Positioning System): Guo et al. Discovering homotypic binding events at high spatial resolution. Bioinformatics (2010) 26 (24). Dave Gifford’s group.

Hot Topics slides
