Next Gen Sequencing Workshop

Genome Quebec and McGill Innovation Center

Maxime Caron and colleagues

max.caron@mail.mcgill.ca
bioinformatics.genome@genome.mcgill.ca

December 2012

Goals

Cover multiple types of datasets to get a broad overview of next-generation sequencing:

RNA-seq, ChIP-seq, Methyl-seq

Get familiar with Compute Canada clusters

Get familiar with working under Linux

Cluster

[Diagram: login node(s); execution nodes 1, 2, 3, 4, 5, ...; data]

echo "command" | qsub -q qtest@mp2 -V -l walltime=1:00:00 -l nodes=1:ppn=1 -j oe -o jobs_output

qstat -u user
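The submission line above pipes a command string into qsub. As a rough sketch of how the pieces fit together, the Python helper below assembles the same pipeline from its parts. The function name and its defaults are hypothetical; the flags are the ones on the slide (-q queue, -V export the current environment, -l resource requests, -j oe merge stderr into stdout, -o output location).

```python
import shlex


def build_qsub_pipeline(command, queue="qtest@mp2", walltime="1:00:00",
                        nodes=1, ppn=1, output_dir="jobs_output"):
    """Return the shell pipeline that submits `command` through qsub.

    Hypothetical helper: it only builds the string, it does not submit anything.
    """
    qsub_args = [
        "qsub",
        "-q", queue,                       # target queue
        "-V",                              # export the current environment to the job
        "-l", f"walltime={walltime}",      # maximum run time
        "-l", f"nodes={nodes}:ppn={ppn}",  # nodes and processors per node
        "-j", "oe",                        # merge stderr into stdout
        "-o", output_dir,                  # where the job output is written
    ]
    qsub_part = " ".join(shlex.quote(arg) for arg in qsub_args)
    # shlex.quote protects spaces and special characters in the user command.
    return f"echo {shlex.quote(command)} | {qsub_part}"


if __name__ == "__main__":
    print(build_qsub_pipeline("ls -l"))
```

Quoting the command with shlex.quote is a design choice of this sketch: it keeps a command containing spaces or pipes as a single argument to echo.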

Initial setup

ssh to [email protected]

export SCRIPT=/home/uwy-034-aa_group/gq_workshop/Rnaseq.sh

RNA-seq

RNA content of a cell: ~85% rRNA, ~10% tRNA, ~5% mRNA, miRNA, snRNA, ...

Generate cDNA fragments

Map to genome considering splice sites

2 conditions, 2 replicates per condition

Fastqs (filtered)

$SCRIPT setup

head -8 $MAIN/rnaseq/datasets/sample1.r1.fastq

R1

@DD63XKN1:218:D106TACXX:8:1107:6686:72194/1
CGGGCGCATGCGGTCGGGGTTGTTCACTGGCTGTCCGGGGCTCCGCGCGCGTCGCCGGCCCAGCTCTGTCGCTGACGGGAGGATCTGAAGCCGGCAGA
+
CCCFFFFFHHHHHHIJJJJFHIIJJJJJJJJJJJJHHHFDDBBDBDDBBBB59B@BBDBBDDDB@C>C>@8ABB<ABBD9@B0>?CC@CA@BDDDD<?
@DD63XKN1:218:D106TACXX:8:2305:7542:153839/1
CCAGAATGTACTTCATTTTACTCTTTGACCTGCGGCCGGCTTCAGATCCTCCCGTCAGCGACAGAGCTGGGCCGGCGACGCGCGCGGAGCCCCGGACAGC
+
C@CFFFFFHFHHHJJJJIJJJJJGIIJGGJJIJIIJJJIIGIEIGGIJJJJIJGCHHFFFDDD@CCCDDDDDDDDDDDDDDD>BBDBBDDDDDDDBB>BA

head -8 $MAIN/rnaseq/datasets/sample1.r2.fastq

R2

@DD63XKN1:218:D106TACXX:8:1107:6686:72194/2
GCCGGCTTCAGATCCTCCCGTCAGCGACAGAGCTGGGCCGGCGACGCGCGCGGAGCCCCGGACAGCCAGTGAACAACCCCGACCGCATGCGCCCGAGA
+
@C@FFFFFHGDHHJJIIIJIGIIGJIIJGGIHIFIIJJJGIIFC>?B@8B9BBDDDDDD@BDDDBBDDDCDDDCCDBBD@>BBDDDD9@CDDB@B<BB
@DD63XKN1:218:D106TACXX:8:2305:7542:153839/2
GAAAGACTGCCGGGCGCATGCGGTCGGGGTTGTTCACTGGCTGTCCGGGGCTCCGCGCGCGTCGCCGGCCCAGCTCTGTCGCTGACGGGGGGATCTGAAG
+
C@CFFFFFHHHHGJEEGHJJJJJ@FHIIJ@EHBBEFFFFEDEDADC?B@B-5<<695555&)008?5>5@D?B8:C3>>><<9>0?9@D&5>&&2>C:C@
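Each FASTQ record shown above spans four lines: a header starting with @, the sequence, a + separator, and one quality character per base. The sketch below reads such records in Python and converts the quality characters to Phred scores. It assumes the Phred+33 encoding, which seems likely here because the quality strings contain characters such as & and ) that fall below the Phred+64 range. The function names and the toy record are hypothetical.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, phred_scores) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # Assumption: Phred+33 encoding (ASCII code minus 33).
        yield header[1:], sequence, [ord(char) - 33 for char in quality]


def mean_quality(phred_scores):
    """Average Phred score of one read."""
    return sum(phred_scores) / len(phred_scores)


if __name__ == "__main__":
    toy_record = "@read1/1\nACGT\n+\nCCFJ\n"  # made-up record, for illustration only
    for name, sequence, scores in read_fastq(io.StringIO(toy_record)):
        print(name, sequence, scores, mean_quality(scores))
```

The /1 and /2 suffixes in the read names mark the two mates of a pair, so the same loop can be run over the r1 and r2 files in parallel to check that mates appear in the same order.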

Alignment

$SCRIPT align

$SOFTWARE/samtools-0.1.18/samtools view $OUTDIR/alignment/sample1/accepted_hits.bam | head -4

DD63XKN1:218:D106TACXX:8:2305:20250:155750 99 chr1 41445351 255 99M = 41445453 3649 CTCCAACCTCTGCGTGCGCACAGCCTAGAGCCCGCCTCCGTGAAAGACTGCCGGGCGCATGCGGTCGGGGTTGTTCACTGGCTGTCCGGGGCTCCGCGC CCCFFFFFHHHHHJJJJJJJJJHIJJJIJJJJJJIJIJJJJJJJJJJHHGHHFFDDDDDDDDDD>BDDDD>;BDBDEDDDDDDDDDDBDDD9<BBB<B@ NM:i:0 NH:i:1
DD63XKN1:218:D106TACXX:8:2302:7499:179658 99 chr1 41445386 255 98M = 41445437 3598 CTCCGTGAAAGACTGCCGGGCGCATGCGGTCGGGGTTGTTCACTGGCTGTCCGGGGCTCCGCGCGCGTCGCCGGCCCAGCTCTGTCGCTGACGGGAGG CCCFFFFFHHHHHJJJJJJJJJIJJIIJJFHIJJH=BEACDEEDDDDDDBDDBDDDDBBDDDD@>@>7@DBDBBB@BDBBCDCCCDDB@BBDDDD5@B NM:i:0 NH:i:1
DD63XKN1:218:D106TACXX:8:2204:16084:2152 97 chr1 41445386 255 53M = 41448985 3692 CTCCGTGAAAGACTGCCGGGCGCATGCGGTCGGGGTTGTTCACTGGCTGTCCG =?=ADB??DFH?FHIIDAFB1CGHAAFEFFG<FGE/==3=A>AC;?=3;@@>? NM:i:0 NH:i:1
DD63XKN1:218:D106TACXX:8:2305:7542:153839 163 chr1 41445392 255 100M = 41445431 3587 GAAAGACTGCCGGGCGCATGCGGTCGGGGTTGTTCACTGGCTGTCCGGGGCTCCGCGCGCGTCGCCGGCCCAGCTCTGTCGCTGACGGGGGGATCTGAAG C@CFFFFFHHHHGJEEGHJJJJJ@FHIIJ@EHBBEFFFFEDEDADC?B@B-5<<695555&)008?5>5@D?B8:C3>>><<9>0?9@D&5>&&2>C:C@ NM:i:1 NH:i:1

http://picard.sourceforge.net/explain-flags.html
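The second column of each alignment line (99, 97, 163 in the output above) is the SAM bitwise flag that the linked page explains. A minimal sketch of the same decoding in Python, using the bit values from the SAM specification; the label strings and the function name are this sketch's own.

```python
# Bit values come from the SAM specification; the label strings are this sketch's own.
SAM_FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse_strand"),
    (0x20, "mate_reverse_strand"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary_alignment"),
    (0x200, "fails_quality_checks"),
    (0x400, "duplicate"),
    (0x800, "supplementary_alignment"),
]


def decode_flag(flag):
    """Return the labels of every bit set in a SAM flag."""
    return [label for bit, label in SAM_FLAG_BITS if flag & bit]


if __name__ == "__main__":
    for flag in (99, 97, 163):
        print(flag, decode_flag(flag))
```

For example, 99 = 1 + 2 + 32 + 64: a properly paired first mate on the forward strand whose mate maps to the reverse strand.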

Alignment (2)

chr1:41444963-41478280

Alignment (3)

Cufflinks

$SCRIPT fpkmknown

Quantifies transcript abundance

FPKM: fragments per kilobase of transcript per million mapped fragments

tail $OUTDIR/cufflinks_known/sample1/isoforms.fpkm_tracking

tracking_id class_code nearest_ref_id gene_id gene_short_name tss_id locus length coverage FPKM FPKM_conf_lo FPKM_conf_hi FPKM_status
ENST00000331595 0 0 ENSG00000182492 BGN 0 chrX:152760396-152775012 2402 0 0 0 0 OK
ENST00000472615 0 0 ENSG00000182492 BGN 0 chrX:152760438-152775004 2225 0 0 0 0 OK
ENST00000480756 0 0 ENSG00000182492 BGN 0 chrX:152760440-152775004 2278 0 0 0 0 OK
ENST00000431891 0 0 ENSG00000182492 BGN 0 chrX:152760450-152772084 846 0 0 0 0 OK
ENST00000370204 0 0 ENSG00000182492 BGN 0 chrX:152767901-152775004 2253 0 0 0 0 OK
ENST00000492658 0 0 ENSG00000182492 BGN 0 chrX:152771326-152774263 420 0 0 0 0 OK
ENST00000451311 0 0 ENSG00000205542 TMSB4X 0 chrX:12993226-12995346 628 24435.2 1.10543e+06 1.09481e+06 1.11604e+06 OK
ENST00000380636 0 0 ENSG00000205542 TMSB4X 0 chrX:12993228-12995345 1702 338.059 15293.5 14619.9 15967.1 OK
ENST00000380635 0 0 ENSG00000205542 TMSB4X 0 chrX:12993362-12995346 767 16.3318 738.836 455.941 1021.73 OK
ENST00000380633 0 0 ENSG00000205542 TMSB4X 0 chrX:12993776-12995200 495 19445.9 879714 867473 891954 OK
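To make the FPKM unit concrete, here is a simplified sketch of the arithmetic in Python. It is not how Cufflinks itself arrives at its estimates (Cufflinks assigns fragments to isoforms statistically and uses an effective transcript length); the function name and the example numbers are hypothetical.

```python
def fpkm(fragment_count, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments.

    Simplified textbook formula, not the Cufflinks estimator.
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    kilobases = transcript_length_bp / 1_000
    millions_mapped = total_mapped_fragments / 1_000_000
    return fragment_count / (kilobases * millions_mapped)


if __name__ == "__main__":
    # Made-up numbers: 1,000 fragments on a 2 kb transcript, 10 million mapped fragments.
    print(fpkm(1_000, 2_000, 10_000_000))
```

Dividing by transcript length and by library size is what makes values comparable between long and short transcripts and between samples of different depth.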

Cuffdiff

$SCRIPT cuffdiffknown

$SCRIPT parse

Differential transcript expression analysis

For differential gene expression, use DESeq or edgeR on raw read counts

head $OUTDIR/cuffdiff_known/isoforms.fpkm.filtered.csv | tail -5

test_id gene_id gene locus sample_1 sample_2 status value_1 value_2 log2(fold_change) test_stat p_value q_value significant
ENST00000380636 ENSG00000205542 TMSB4X chrX:12993226-12995346 q1 q2 OK 17154 51787.5 1.59406 -33.6146 0 0 yes
ENST00000434651 ENSG00000179344 HLA-DQB1 chr6:32627243-32636160 q1 q2 OK 40653.3 309.605 -7.0368 14.5901 0 0 yes
ENST00000451311 ENSG00000205542 TMSB4X chrX:12993226-12995346 q1 q2 OK 607642 1.14211e+06 0.910412 -61.6636 0 0 yes
ENST00000460185 ENSG00000179344 HLA-DQB1 chr6:32627243-32636160 q1 q2 OK 48842.6 15231.1 -1.68112 14.6929 0 0 yes
ENST00000464283 ENSG00000171793 CTPS1 chr1:41445006-41478235 q1 q2 OK 3728.67 13545.7 1.8611 -11.1109 0 0 yes
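The log2(fold_change) column can be reproduced from value_1 and value_2, the FPKM values in conditions q1 and q2. A small Python sketch, with a hypothetical function name, that checks the first two rows of the table:

```python
import math


def log2_fold_change(value_1, value_2):
    """log2 of value_2 over value_1, so positive means higher in the second sample."""
    if value_1 <= 0 or value_2 <= 0:
        # This sketch refuses zero values instead of returning an infinite fold change.
        raise ValueError("both values must be positive")
    return math.log2(value_2 / value_1)


if __name__ == "__main__":
    # value_1 and value_2 of the TMSB4X and HLA-DQB1 rows.
    print(round(log2_fold_change(17154, 51787.5), 5))
    print(round(log2_fold_change(40653.3, 309.605), 4))
```

A value of 1.59 therefore means roughly a threefold increase in q2, and -7.04 roughly a 130-fold decrease.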

Cuffdiff (2)

chr6:32627243-32636160

Cuffdiff (3)

chrX:12993226-12995346

Cufflinks de novo

$SCRIPT fpkmdenovo

Does not use an annotation reference to build transcripts

head -6 $OUTDIR/cufflinks_denovo/sample1/isoforms.fpkm_tracking

tracking_id gene_id locus length coverage FPKM FPKM_conf_lo FPKM_conf_hi FPKM_status
CUFF.1.1 CUFF.1 chr1:41445350-41478307 2927 37.6434 1692.15 1525.79 1858.51 OK
CUFF.1.2 CUFF.1 chr1:41445350-41478307 2830 7.69887 346.081 255.794 436.367 OK
CUFF.1.3 CUFF.1 chr1:41446049-41478307 2960 7.88763 354.566 264.404 444.728 OK
CUFF.2.1 CUFF.2 chr6:32627320-32634500 1594 1541.9 70446.8 69114.3 71779.3 OK
CUFF.5.1 CUFF.5 chr1:186283127-186310627 3425 102.555 4582.83 4304.65 4861.01 OK
CUFF.5.2 CUFF.5 chr1:186283127-186344435 7342 212.951 9516.05 9287.23 9744.86 OK

Cuffmerge

$SCRIPT merge

Merges every sample's transcript GTF, guided by the known reference

head -3 $OUTDIR/merge_denovo/merged.gtf

chr1 Cufflinks exon 11869 12227 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "1"; gene_name "DDX11L1"; oId "ENST00000456328"; nearest_ref "ENST00000456328"; class_code "="; tss_id "TSS1";
chr1 Cufflinks exon 12613 12721 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "2"; gene_name "DDX11L1"; oId "ENST00000456328"; nearest_ref "ENST00000456328"; class_code "="; tss_id "TSS1";
chr1 Cufflinks exon 13221 14409 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "3"; gene_name "DDX11L1"; oId "ENST00000456328"; nearest_ref "ENST00000456328"; class_code "="; tss_id "TSS1";

Priority Code Description
1 = Complete match of intron chain
2 c Contained
3 j Potentially novel isoform (fragment): at least one splice junction is shared with a reference transcript
4 e Single exon transfrag overlapping a reference exon and at least 10 bp of a reference intron, indicating a possible pre-mRNA fragment.
5 i A transfrag falling entirely within a reference intron
6 o Generic exonic overlap with a reference transcript
7 p Possible polymerase run-on fragment (within 2Kbases of a reference transcript)
8 r Repeat. Currently determined by looking at the soft-masked reference sequence and applied to transcripts where at least 50%
9 u Unknown, intergenic transcript
10 x Exonic overlap with reference on the opposite strand
11 s An intron of the transfrag overlaps a reference intron on the opposite strand (likely due to read mapping errors)
12 . (.tracking file only, indicates multiple classifications)
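The ninth column of a merged GTF line holds key "value"; pairs, including the class_code described in the table above. A minimal Python sketch for pulling those attributes out of one line; the function name and the returned field names are hypothetical, and the sketch splits on any whitespace so that it works on both tab-separated files and the space-separated text shown here.

```python
import re

# Matches one `key "value";` pair in the GTF attribute column.
ATTRIBUTE_PATTERN = re.compile(r'(\S+) "([^"]*)";')


def parse_gtf_line(line):
    """Split one GTF line into its 8 fixed fields plus a dict of attributes."""
    fields = line.strip().split(None, 8)  # keep the attribute column in one piece
    if len(fields) != 9:
        raise ValueError("expected 9 GTF columns")
    seqname, source, feature, start, end, score, strand, frame, attributes = fields
    return {
        "seqname": seqname,
        "source": source,
        "feature": feature,
        "start": int(start),
        "end": int(end),
        "score": score,
        "strand": strand,
        "frame": frame,
        "attributes": dict(ATTRIBUTE_PATTERN.findall(attributes)),
    }


if __name__ == "__main__":
    example = ('chr1 Cufflinks exon 11869 12227 . + . gene_id "XLOC_000001"; '
               'transcript_id "TCONS_00000001"; exon_number "1"; gene_name "DDX11L1"; '
               'oId "ENST00000456328"; nearest_ref "ENST00000456328"; '
               'class_code "="; tss_id "TSS1";')
    record = parse_gtf_line(example)
    print(record["attributes"]["transcript_id"], record["attributes"]["class_code"])
```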

De novo transcripts

De novo transcripts (2)

De novo transcript differential expression

$SCRIPT cuffdiffdenovo

Runs on the previously merged GTF annotation file (containing known and de novo transcripts)

awk '$2=="j"' $OUTDIR/cuffdiff_denovo/isoforms.fpkm_tracking

tracking_id class_code nearest_ref_id gene_short_name locus length q1_FPKM q1_conf_lo q1_conf_hi q1_status q2_FPKM q2_conf_lo q2_conf_hi q2_status
TCONS_00002226 j ENST00000372621 CTPS1 chr1:41445006-41478307 3174 0 0 0 OK 0 0 0 OK
TCONS_00002227 j ENST00000372621 CTPS1 chr1:41445006-41478307 3526 454.297 334.779 573.816 OK 1556.33 1240.52 1872.14 OK
TCONS_00002230 j ENST00000372621 CTPS1 chr1:41445006-41478307 3230 0 0 0 OK 0 0 0 OK
TCONS_00002239 j ENST00000372621 CTPS1 chr1:41445006-41478307 2162 817.073 630.149 1004 OK 3584.54 3038.53 4130.56 OK
TCONS_00006887 j ENST00000271588 HMCN1 chr1:185703682-186160085 16079 4.62812 0 9.66312 OK 0 0 0 OK
TCONS_00006891 j ENST00000271588 HMCN1 chr1:185703682-186160085 14027 5.981 0.207797 11.7542 OK 4.73078 0 12.3982 OK
TCONS_00006907 j ENST00000287859 C1orf27 chr1:186265404-186390510 7615 281.486 234.108 328.863 OK 1301.49 1146.84 1456.13 OK
TCONS_00015517 j ENST00000367478 TPR chr1:186265404-186390510 8242 0 0 0 OK 0 0 0 OK
TCONS_00015518 j ENST00000367478 TPR chr1:186265404-186390510 9101 53.3195 26.6557 79.9832 OK 168.514 97.1523 239.875 OK
TCONS_00015521 j ENST00000367478 TPR chr1:186265404-186390510 8679 445.843 369.391 522.296 OK 2939.97 2656.69 3223.25 OK
TCONS_00015522 j ENST00000367478 TPR chr1:186265404-186390510 8955 399.175 327.324 471.027 OK 709.088 561.817 856.359 OK
TCONS_00022239 j ENST00000399084 HLA-DQB1 chr6:32627144-32636160 1857 2321.15 1961.29 2681.01 OK 0 0 0 OK
TCONS_00025265 j ENST00000380635 TMSB4X chrX:12993140-12995439 1222 3000.67 2587.39 3413.95 OK 9518.48 8375.59 10661.4 OK
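The awk filter keeps the lines whose second column (class_code) equals j, that is, potentially novel isoforms. A rough Python equivalent is sketched below. It additionally drops isoforms with zero FPKM in both conditions, which is an extra step of this sketch and not part of the workshop command; the function name is hypothetical and columns are looked up by the header names shown above.

```python
def select_expressed_isoforms(lines, class_code="j"):
    """Return rows with the given class_code and a non-zero FPKM in q1 or q2.

    `lines` is an iterable of whitespace-separated text lines, header first.
    """
    rows = [line.split() for line in lines if line.strip()]
    header, records = rows[0], rows[1:]
    selected = []
    for fields in records:
        row = dict(zip(header, fields))
        if row["class_code"] != class_code:
            continue
        # Extra filter (assumption of this sketch): skip isoforms silent in both conditions.
        if float(row["q1_FPKM"]) == 0 and float(row["q2_FPKM"]) == 0:
            continue
        selected.append(row)
    return selected


if __name__ == "__main__":
    table = [
        "tracking_id class_code nearest_ref_id gene_short_name locus length "
        "q1_FPKM q1_conf_lo q1_conf_hi q1_status q2_FPKM q2_conf_lo q2_conf_hi q2_status",
        "TCONS_00002226 j ENST00000372621 CTPS1 chr1:41445006-41478307 3174 0 0 0 OK 0 0 0 OK",
        "TCONS_00002227 j ENST00000372621 CTPS1 chr1:41445006-41478307 3526 "
        "454.297 334.779 573.816 OK 1556.33 1240.52 1872.14 OK",
    ]
    for row in select_expressed_isoforms(table):
        print(row["tracking_id"], row["gene_short_name"], row["q1_FPKM"], row["q2_FPKM"])
```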

ChIP-seq

Cross-link proteins to DNA

Shear DNA

Select protein bound DNA with bead-attached antibody

Sequence DNA fragments

Map to genome

Narrow or wide peaks; with or without an “input” (control) sample

Narrow vs Wide peaks

Peter J. Park, ChIP–seq: advantages and challenges of a maturing technology, Nature Reviews Genetics, Volume 10, October 2009

Setup and fastqs

export SCRIPT=/home/uwy-034-aa_group/gq_workshop/Chipseq.sh

$SCRIPT setup

head -8 $MAIN/chipseq/datasets/treatment_narrow.r1.fastq

@HWI-ST909:110:D0PNLACXX:8:1206:5197:66384
CAGGGAGTGGCACTATTAGGAGGTGTGGCCTTTTTGGAGGATGTGTGTCA
+
?<@DFFFDFBDFBBHBEBHIIGB<CFCHD@HEDBE<DDBGBBB?BBC8BB
@HWI-ST909:110:D0PNLACXX:8:1306:14281:115130
TGACACACATCCTCCAACAAGGCCACACCTCCTAATAGTGCCACTCCCTG
+
CCCFFFFFHGHHGJJJGIGIJJIIIJJJIJJJJJGGCGDGIJIJJJJJJJ

Alignment

$SCRIPT align

$SCRIPT alignsamse

Alignment (2)

Peak calling

$SCRIPT sort

$SCRIPT peakcall

Multiple peak callers

Narrow or broad peaks

head $OUTDIR/peak_calling/with_control_peaks.bed

chr1 120355164 120356440 MACS_peak_1 552.74
chr1 120377906 120381002 MACS_peak_2 305.09

tail -3 $OUTDIR/peak_calling/with_control_peaks.xls

chr start end length summit tags -10*log10(pvalue) fold_enrichment FDR(%)
chr1 120355165 120356440 1276 682 289 552.74 6.63 0
chr1 120377907 120381002 3096 1297 459 305.09 2.58 0
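Two details of the peak files are easy to miss: the .bed starts are one less than the .xls starts (BED coordinates are 0-based and half-open, while the .xls table counts from 1), and the score column is -10*log10(pvalue). The Python sketch below works through both conversions on the first peak; the function names are hypothetical.

```python
def bed_start_to_one_based(bed_start):
    """Convert a 0-based BED start to the 1-based start used in the .xls table."""
    return bed_start + 1


def peak_length(bed_start, bed_end):
    """Length of a 0-based, half-open BED interval."""
    return bed_end - bed_start


def score_to_pvalue(score):
    """Invert score = -10 * log10(pvalue)."""
    return 10 ** (-score / 10)


if __name__ == "__main__":
    # First peak of the .bed output: chr1 120355164 120356440 MACS_peak_1 552.74
    print(bed_start_to_one_based(120355164))
    print(peak_length(120355164, 120356440))
    print(score_to_pvalue(552.74))
```

The computed start and length agree with the start and length columns of the matching .xls row.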

Peak calling (2)

Peak calling (3)

Broad peaks

Histone modifications (lysine methylation, acetylation, etc.)

Describe different chromatin states (open/closed chromatin, actively transcribed genes, etc.)

Common marks:

H3K27ac
H3K27me3
H3K36me3
H3K4me1
H3K4me2
H3K4me3
H3K9ac
H3K9me3

Broad peaks (2)

Broad peaks (3)

Annotation

Annotate peak calls

Input is the .bed file of peak coordinates

Using the HOMER ChIP-seq software

PeakID | Chr | Start | End | Peak Score | Annotation | Distance to TSS | Nearest PromoterID | Nearest Ensembl | Gene Name
MACS_peak_1312 | chr16 | 23107011 | 23110399 | 3156.2 | promoter-TSS (NR_030705) | -321 | NR_030705 | | Snord2
MACS_peak_196 | chr1 | 195096869 | 195099724 | 3153.23 | TTS (NM_008059) | 1086 | NM_008059 | ENSMUSG00000009633 | G0s2
MACS_peak_1320 | chr16 | 30063694 | 30068848 | 3147.66 | exon (NM_008235, exon 3 of 4) | 828 | NM_008235 | ENSMUSG00000022528 | Hes1
MACS_peak_2867 | chr6 | 129130319 | 129135291 | 3133.91 | intron (NM_053109, intron 1 of 4) | 2172 | NM_053109 | ENSMUSG00000030157 | Clec2d
MACS_peak_2777 | chr6 | 72387042 | 72390706 | 3116.9 | intron (NM_145569, intron 1 of 8) | 678 | NM_145569 | ENSMUSG00000053907 | Mat2a
MACS_peak_885 | chr13 | 23830123 | 23832753 | 3100 | 3' UTR (NM_015786, exon 1 of 1) | 762 | NM_015786 | ENSMUSG00000036181 | Hist1h1c

Peak statistics

Motif discovery

Other

Intersection of peaks / Venn diagrams

Differential peak analysis
• Count based (edgeR, DESeq)
• Specific software (DiffBind, ...)

RRBS

Reduced representation bisulfite sequencing

~1/20 of CpG sites (~1 million vs ~20 million)

CpG rich regions

MspI enzyme digestion (CCGG)

Bisulfite conversion

Sequence
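As a toy illustration of two of the steps above, the Python sketch below locates MspI recognition sites (CCGG) in a sequence and simulates bisulfite conversion, where unmethylated cytosines end up read as T while methylated cytosines stay C. The function names and the example sequence are made up, and the sketch only models the read strand.

```python
def mspi_sites(sequence):
    """Return the 0-based start of every CCGG recognition site in the sequence."""
    sites = []
    start = sequence.find("CCGG")
    while start != -1:
        sites.append(start)
        start = sequence.find("CCGG", start + 1)
    return sites


def bisulfite_convert(sequence, methylated_positions=()):
    """Read-strand view after conversion: unmethylated C becomes T, methylated C stays C."""
    protected = set(methylated_positions)
    return "".join(
        "T" if base == "C" and index not in protected else base
        for index, base in enumerate(sequence)
    )


if __name__ == "__main__":
    toy = "ACCGGTTCCGG"  # made-up sequence with two MspI sites
    print(mspi_sites(toy))
    # Only the cytosine at index 2 (inside the first CpG) is treated as methylated.
    print(bisulfite_convert(toy, methylated_positions={2}))
```

This conversion is consistent with the T-rich first mates and A-rich second mates in the RRBS FASTQ files of this workshop.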

Protocol

Setup and fastqs

export SCRIPT=/home/uwy-034-aa_group/gq_workshop/Rrbseq.sh

$SCRIPT setup

head -8 $MAIN/rrbseq/datasets/sample1.r1.fastq

@DD63XKN1:237:D1GRTACXX:2:2105:8608:109575/1
TGGGGATTTGTTTTGTTAAATGTTATGGGTGTGGGGATAAGATATTGGGTTTAGAGAAAGTTTTGGGAAGAGTTTTGAGGTGGAAATAGTTTTGGTGTG
+
CCCFFFFFHHHHHJIIIIJJIIHJJJJJJHHIJIJJJJJFHIGGIJJIJHHIJIEGHHGHIIIHHEFBFDCECEE@B7?A;A??A>>CD>@ACB@ABBB
@DD63XKN1:237:D1GRTACXX:2:2106:5775:197622/1
TGGGGATTTGTTTTGTTAAATGTTATGGGTGTGGGGATAAGATATTGGGTTTAGAGAAAGTTTTGGGAAGAGTTTTGAGGTGGAAATAGTTTTGGTGT
+
CCCFFFFFHHHHHJJJHHIJIJHGHIJIIHIHIIJJGGIIIIIIIJIIJFGGIIBH=EHHHIIHGHFBDB@CACE@B?BD>BBDDC>CC;@AC@D<BA

head -8 $MAIN/rrbseq/datasets/sample1.r2.fastq

@DD63XKN1:237:D1GRTACXX:2:2105:8608:109575/2
CAACAAAAAAACTAAAAAAACCTAAAAAATAAAAAACACCAAAAAACTATCAAAAAACCCCCACACACCAAAACTATTTCCACCTCAAAACTCTTCCCAA
+
BCCFFFFFHHHHGJJJIJIGJJIJJIJJIJJJJJJJJJHHHHFFFDDDDDCDDDDDDDDDDDBDDDDD?BDBABDCCDDD:CCBD<CCDDBDDDDDDCCA
@DD63XKN1:237:D1GRTACXX:2:2106:5775:197622/2
CAACAAAAAAACTAAAAAAACCTAAAAAATAAAAAACACCAAAAAACTATCAAAAAACCCCCACACACCAAAACTATTTCCACCTCAAAACTCTTCCCAA
+
CCCFFFFFHHGHHIJIJJJJJJJJJJJJJIIHIJJJIJHHHHFFFDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDEE@@DDBB<CD4A?CDCDCDDCC

Alignment

$SCRIPT align

Alignment (2)

Methylation calls

$SCRIPT methylation

head $OUTDIR/methylation/sample1/sample1.map

chr pos context ratio total_C methy_C CI_lower CI_upper
chr8 11540442 ACCGG 0 3 0 0 0.562
chr8 11540448 GACCT 0 3 0 0 0.562
chr8 11540449 ACCTG 0 3 0 0 0.562
chr8 11540452 TGCTT 0 3 0 0 0.562
chr8 11540464 TGCTA 0 3 0 0 0.562
chr8 11540479 GACAA 0 3 0 0 0.562
chr8 11540486 TACTG 0 3 0 0 0.562
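The ratio column is methy_C divided by total_C. The slides do not say how CI_lower and CI_upper are computed; the value 0.562 for 0 methylated reads out of 3 happens to match a 95% Wilson score interval, so the Python sketch below assumes that method. The actual tool may use something else, and the function name is hypothetical.

```python
import math


def methylation_ratio_with_ci(methylated, total, z=1.96):
    """Return (ratio, ci_lower, ci_upper) using a Wilson score interval.

    Assumption: the Wilson interval is a guess that reproduces the 0.562
    shown for 0 of 3 reads; it is not confirmed by the workshop material.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    ratio = methylated / total
    denominator = 1 + z * z / total
    center = (ratio + z * z / (2 * total)) / denominator
    half_width = z * math.sqrt(
        ratio * (1 - ratio) / total + z * z / (4 * total * total)
    ) / denominator
    # Clamp to [0, 1] to absorb floating-point noise at the boundaries.
    return ratio, max(0.0, center - half_width), min(1.0, center + half_width)


if __name__ == "__main__":
    ratio, lower, upper = methylation_ratio_with_ci(0, 3)
    print(ratio, round(lower, 3), round(upper, 3))
```

With only 3 reads the interval is very wide, which is why low-coverage sites are usually filtered out before differential analysis.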

Differential methylation region

$SCRIPT bedgraph

$SCRIPT format

Differential methylation region (2)

$SCRIPT dmr

Differential analysis per site or per tile

Hyper- or hypomethylated

head $OUTDIR/methylation/dmr_results.csv

id chr start end strand pvalue qvalue meth.diff
chr8.11550374 chr8 11550374 11550374 0 0.0011218041 8.39004886057458e-18 23.8095238095
chr8.11565909 chr8 11565909 11565909 0 0.000153455 2.06586098042011e-18 23.6363636364
chr8.11565918 chr8 11565918 11565918 0 0.000153455 2.06586098042011e-18 23.6363636364
chr8.11565926 chr8 11565926 11565926 0 0.0005719758 4.81257640201146e-18 20.9090909091
chr8.11565936 chr8 11565936 11565936 0 0.0005719758 4.81257640201146e-18 20.9090909091
chr8.11566011 chr8 11566011 11566011 0 0.0005719758 4.81257640201146e-18 20.9090909091
chr8.11566210 chr8 11566210 11566210 0 1.14057593442183e-05 2.55913332556091e-19 31.5789473684
chr8.11566226 chr8 11566226 11566226 0 1.1075566696939e-08 3.72757099876611e-22 47.3684210526
chr8.11566229 chr8 11566229 11566229 0 1.1075566696939e-08 3.72757099876611e-22 47.3684210526
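One way to label sites as hyper- or hypomethylated is by the sign of meth.diff together with a q-value cutoff. The Python sketch below does this under stated assumptions: the cutoffs (q-value 0.01, absolute difference of 25 percentage points) are illustrative defaults rather than values from the workshop, and a positive meth.diff is assumed to mean higher methylation in the second condition.

```python
def classify_site(meth_diff, qvalue, q_cutoff=0.01, diff_cutoff=25.0):
    """Label one site as 'hyper', 'hypo' or 'unchanged'.

    Assumptions: cutoffs are illustrative; positive meth_diff means more
    methylation in the second condition.
    """
    if qvalue > q_cutoff or abs(meth_diff) < diff_cutoff:
        return "unchanged"
    return "hyper" if meth_diff > 0 else "hypo"


if __name__ == "__main__":
    # (id, qvalue, meth.diff) taken from two rows of dmr_results.csv
    sites = [
        ("chr8.11550374", 8.39004886057458e-18, 23.8095238095),
        ("chr8.11566226", 3.72757099876611e-22, 47.3684210526),
    ]
    for site_id, qvalue, meth_diff in sites:
        print(site_id, classify_site(meth_diff, qvalue))
```

The same function can be applied per tile instead of per site by feeding it tile-level differences.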