Statistical Methods for Next Generation Sequencing
http://www.biostat.jhsph.edu/~khansen/enar2012.html
Zhijin Wu, Brown University
Kasper Hansen, Rafael A Irizarry, Johns Hopkins University
Outline
• Introduction to NGS
• SNP calling and genotyping
• RNA-sequencing
• Hands-on exercise
Introduction to Next Generation Sequencing
Rafael A. Irizarry
http://rafalab.org
Many slides courtesy of: Héctor Corrada Bravo and Ben Langmead
Remember this?
D. melanogaster, Science, 2000; H. sapiens, Nature, 2000 and Science, 2000; M. musculus, Nature, 2002
Back then: millions of clones (thousand bps) in 9 months for billions of dollars
Today: billions of short reads (35-100 bps) in a week for thousands of dollars
Claim: Assemble a genome in weeks for less than $100,000
Start with DNA (millions of copies)
Break it
Put in sequencer
Sequence first 35-400 bps: call them “reads”
GTTGAGGCTTGCGTTTTTGGTACGCTGGACTTTGTGTACTCGTCGCTGCGTTGAGGCTTGCGTTTTTGGTATGGTACGCTGGACTTTGTAGGATACCCTCGCTTTTTGCGTTTATGGTACGCTGGACTTTGTAGGATACCCTTGCGTTTATGGTACGCTGGACTTTGTAGGATACTTGCGTTTATGGTACGCTGGACTTTGTAGGATACCGCGTTTATGGTACGCTGGACTTTGTAGGATACCCTGAGGCTTGCGTTTATGGTACGCTGGACTTTGTAGGGCGTTGAGGCTTGCGTTTATGGTACGCTGGATTTTCGTTTATGGTACGCTGGACTTTGTAGGATACCCTCATGGTACGCTGGACTTTGTAGGATACCCTCGCTTT GTTTATGGTACGCTGGACTTTGTAGGATACCCTCGTCTCGTGCTCGTCGCTGCGTTGAGGCTTGCGTTTA TGCTCGTCGCTGCGTTGAGGCTTGCGTTTATGGTAGCTCGTCGCTGCGTTGAGGCTTGCGTTTATGGTAC TATGGTACGCTGGACTTTGTAGGATACCCTCGCTTTCGTGCTCGTCGCTGCGTTGAGGCTTGCGTTTTTGCGTCGCTGCGTTGAGGCTTGCGTTTATGGTACGCTGTTGAGGCTTGCGTTTATGGTACGCTGGGCTTTTT TTGCGTTTATGGTACGCTGGACTTTGTAGGATACC
Platforms
Illumina/Solexa
• Eight lanes
• ~160M short reads (~50-70 bp) per lane
From the Illumina technology spotlight "Illumina Sequencing Technology": The Genome Analyzer generates several billion bases of high-quality sequence per run at less than 1% of the cost of capillary-based methods. An expansive scale of research unimaginable with other technology platforms is now possible.
Figure 1: Illumina Genome Analyzer flow cell. Several samples can be loaded onto the eight-lane flow cell for simultaneous analysis on the Illumina Genome Analyzer.
Source: Metzker ML. Sequencing technologies - the next generation. Nat Rev Genet. 2010
Source: Whiteford et al. Swift: primary data analysis for the Illumina Solexa sequencing platform. Bioinformatics. 2009
name, sequence, quality scores
x 100s of millions
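Each read comes with a name, a sequence, and per-base quality scores. The sketch below shows one way to read such a record, assuming the common four-line FASTQ layout with Phred+33 quality encoding. The slides do not state the file format or encoding, and the record and the names `Read`, `parse_fastq_record`, and `error_probability` are invented for illustration.

```python
from typing import NamedTuple


class Read(NamedTuple):
    name: str
    sequence: str
    qualities: list  # Phred scores, one per base


def parse_fastq_record(lines, offset=33):
    """Parse one four-line FASTQ record (Phred+33 assumed by default)."""
    header, sequence, separator, quality = (line.strip() for line in lines)
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    scores = [ord(ch) - offset for ch in quality]
    return Read(header[1:], sequence, scores)


def error_probability(phred):
    """A Phred score Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-phred / 10)


# Hypothetical record, invented for this example.
record = ["@read_1", "GTTGAGGC", "+", "IIIIII5#"]
read = parse_fastq_record(record)
print(read.name, read.sequence, read.qualities)
print([round(error_probability(q), 4) for q in read.qualities])
```

With hundreds of millions of records, a real pipeline would stream the file instead of holding records in lists.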
Not just Assembly
• Resequencing
• SNP discovery and genotyping
• Variant discovery and quantification
• TF binding sites: ChIP-Seq
• Gene expression: RNA-Seq
• Measuring methylation
1000 Genomes Project
Genotyping
Human Epigenome Project
Methylation
What to do with all these sequences?
GTTGAGGCTTGCGTTTTTGGTACGCTGGACTTTGTGTACTCGTCGCTGCGTTGAGGCTTGCGTTTTTGGTATGGTACGCTGGACTTTGTAGGATACCCTCGCTTTTTGCGTTTATGGTACGCTGGACTTTGTAGGATACCCTTGCGTTTATGGTACGCTGGACTTTGTAGGATACTTGCGTTTATGGTACGCTGGACTTTGTAGGATACCGCGTTTATGGTACGCTGGACTTTGTAGGATACCCTGAGGCTTGCGTTTATGGTACGCTGGACTTTGTAGGGCGTTGAGGCTTGCGTTTATGGTACGCTGGATTTTCGTTTATGGTACGCTGGACTTTGTAGGATACCCTCATGGTACGCTGGACTTTGTAGGATACCCTCGCTTT GTTTATGGTACGCTGGACTTTGTAGGATACCCTCGTCTCGTGCTCGTCGCTGCGTTGAGGCTTGCGTTTA TGCTCGTCGCTGCGTTGAGGCTTGCGTTTATGGTAGCTCGTCGCTGCGTTGAGGCTTGCGTTTATGGTAC TATGGTACGCTGGACTTTGTAGGATACCCTCGCTTTCGTGCTCGTCGCTGCGTTGAGGCTTGCGTTTTTGCGTCGCTGCGTTGAGGCTTGCGTTTATGGTACGCTGTTGAGGCTTGCGTTTATGGTACGCTGGGCTTTTT TTGCGTTTATGGTACGCTGGACTTTGTAGGATACC
Most apps: Start by matching to reference
GTTGAGGCTTGCGTTTTTGGTACGCTGGACTTTGT GTACTCGTCGCTGCGTTGAGGCTTGCGTTTTTGGT ATGGTACGCTGGACTTTGTAGGATACCCTCGCTTT TTGCGTTTATGGTACGCTGGACTTTGTAGGATACC CTTGCGTTTATGGTACGCTGGACTTTGTAGGATAC TTGCGTTTATGGTACGCTGGACTTTGTAGGATACC GCGTTTATGGTACGCTGGACTTTGTAGGATACCCT GAGGCTTGCGTTTATGGTACGCTGGACTTTGTAGG GCGTTGAGGCTTGCGTTTATGGTACGCTGGATTTT CGTTTATGGTACGCTGGACTTTGTAGGATACCCTC ATGGTACGCTGGACTTTGTAGGATACCCTCGCTTT GTTTATGGTACGCTGGACTTTGTAGGATACCCTCG TCTCGTGCTCGTCGCTGCGTTGAGGCTTGCGTTTA TGCTCGTCGCTGCGTTGAGGCTTGCGTTTATGGTA GCTCGTCGCTGCGTTGAGGCTTGCGTTTATGGTAC TATGGTACGCTGGACTTTGTAGGATACCCTCGCTT TCGTGCTCGTCGCTGCGTTGAGGCTTGCGTTTTTG CGTCGCTGCGTTGAGGCTTGCGTTTATGGTACGCT GTTGAGGCTTGCGTTTATGGTACGCTGGGCTTTTT TTGCGTTTATGGTACGCTGGACTTTGTAGGATACCCTCTCGTGCTCGTCGCTGCGTTGAGGCTTGCGTTTATGGTACGCTGGACTTTGTAGGATACCCTCGCTTTC
Variant detection
Reference:
GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGGTATTTTCGTCTGGGGGGTATGCACGCGATAGCATTGCGAGACGCTGGAGCCGGAGCACCCTATGTCGCAGTATCTGTCTTTGATTCCTGCCTCATCCTATTATTTATCGCACCTACGTTCAATATT

Align:
GTCGCAGTANCTGTCT
||||||||| ||||||
GTCGCAGTATCTGTCT

GGATCTGCGATATACC
|||||| |||||||||
GGATCT-CGATATACC

AATCTGATCTTATTTT
||||||||||||||||
AATCTGATCTTATTTT

ATATATATATATATAT
||||||||||||||||
ATATATATATATATAT

TCTCTCCCANNAGAGC
|||||||||  |||||
TCTCTCCCAGGAGAGC

Aggregate:
GTCGCAGTATCTGTCT GTCGCAGTATCTGTNN TGTCGCAGTATCTGTC TATGTCGCAGTATCTG TATATCGCAGTATCTT TATATCGCAGTATCTG NATATCGCAGTATNTG CCCTATATCGCAGTAT ACACCCTATGTCGCA ACACCCTATCTCGCA ACACCCTATGTCGCA GA-CACCCTATGTCGC CCGGA-CACCCTATAT CCGGA-CACCCTATAT GCCGGA-CACCCTATG

Statistics:
Call: HET A, G
p-value: 0.0023

Terms: “Coverage”; “Pileup” or “Coverage plot”; “Depth of coverage” = 14
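The aggregate step can be sketched as a pileup: for one reference position, collect the base each overlapping read shows there. The number of overlapping reads is the depth of coverage. This is a minimal illustration with invented alignments and hypothetical function names, not the coordinates or code behind the slide.

```python
from collections import Counter


def pileup(alignments, position):
    """Return base counts at a 0-based reference position.

    `alignments` is a list of (start, aligned_sequence) pairs, where start is
    the 0-based reference coordinate of the first aligned base.
    """
    counts = Counter()
    for start, seq in alignments:
        offset = position - start
        if 0 <= offset < len(seq):
            counts[seq[offset]] += 1
    return counts


def depth(alignments, position):
    """Depth of coverage: number of reads overlapping the position."""
    return sum(pileup(alignments, position).values())


# Toy alignments, invented for this example (not the slide's coordinates).
alignments = [
    (0, "ACGTAC"),
    (2, "GTACGG"),
    (3, "AACGGT"),   # shows an A at position 3 where the others show T
    (4, "ACGGTT"),   # starts after position 3, so it does not cover it
]
site = 3
print(pileup(alignments, site))
print("depth:", depth(alignments, site))
```

A genotype call such as "HET A, G" would then be a statistical statement about these per-base counts, taking sequencing error into account.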
RNA-seq differential expression
Reference:
GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGGTATTTTCGTCTGGGGGGTATGCACGCGATAGCATTGCGAGACGCTGGAGCCGGAGCACCCTATGTCGCAGTATCTGTCTTTGATTCCTGCCTCATCCTATTATTTATCGCACCTACGTTCAATATT

Sample A and Sample B are each aligned and aggregated over Gene 1.

Align (the same example alignments are shown for both samples):
GTCGCAGTANCTGTCT
||||||||| ||||||
GTCGCAGTATCTGTCT

GGATCTGCGATATACC
|||||| |||||||||
GGATCT-CGATATACC

AATCTGATCTTATTTT
||||||||||||||||
AATCTGATCTTATTTT

ATATATATATATATAT
||||||||||||||||
ATATATATATATATAT

TCTCTCCCANNAGAGC
|||||||||  |||||
TCTCTCCCAGGAGAGC

Aggregate, first pileup:
GTCGCAGTATCTGTCT GTCGCAGTATCTGTCT GTCGCAGTATCTGTCT GTCGCAGTATCTGTCT GTCGCAGTATCTGTCT TGTCGCAGTATCTGTC TATGTCGCAGTATCTG TATATCGCAGTATCTG TATATCGCAGTATCTG TATATCGCAGTATCTG CCCTATATCGCAGTAT AGCACCCTATGTCGCA AGCACCCTATATCGCA AGCACCCTATGTCGCA GAGCACCCTATGTCGC CCGGAGCACCCTATAT CCGGAGCACCCTATAT GCCGGAGCACCCTATG

Aggregate, second pileup:
TGTCGCAGTATCTGTC AGCACCCTATGTCGCA GCCGGAGCACCCTATG

Statistics:
Gene 1 differentially expressed?: YES
p-value: 0.0012
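The slide reports a p-value for Gene 1 without naming the test. One simple possibility, offered only as an assumption and not as the lecture's method, is an exact binomial test: given the total number of reads on the gene across both samples, ask whether the split between samples is consistent with their library sizes. The counts, library sizes, and function names below are hypothetical.

```python
from math import comb


def binomial_two_sided_p(k, n, p):
    """Exact two-sided binomial p-value: sum of outcomes no more likely than k."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so that ties are not lost to rounding.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))


def gene_test(count_a, count_b, library_a, library_b):
    """Test whether a gene's share of reads differs between two samples.

    Under the null, given count_a + count_b reads on the gene, the number
    from sample A is Binomial(n, library_a / (library_a + library_b)).
    """
    n = count_a + count_b
    p_null = library_a / (library_a + library_b)
    return binomial_two_sided_p(count_a, n, p_null)


# Hypothetical counts: 18 vs 3 reads on the gene, equal library sizes.
print(gene_test(18, 3, 1_000_000, 1_000_000))
```

This ignores biological variability between replicates, which is the reason dedicated RNA-seq methods use richer count models.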
ChIP-seq
Reference:
GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGGTATTTTCGTCTGGGGGGTATGCACGCGATAGCATTGCGAGACGCTGGAGCCGGAGCACCCTATGTCGCAGTATCTGTCTTTGATTCCTGCCTCATCCTATTATTTATCGCACCTACGTTCAATATT

Align:
GTCGCAGTANCTGTCT
||||||||| ||||||
GTCGCAGTATCTGTCT

GGATCTGCGATATACC
|||||| |||||||||
GGATCT-CGATATACC

AATCTGATCTTATTTT
||||||||||||||||
AATCTGATCTTATTTT

ATATATATATATATAT
||||||||||||||||
ATATATATATATATAT

TCTCTCCCANNAGAGC
|||||||||  |||||
TCTCTCCCAGGAGAGC

Aggregate:
GTCGCAGTATCTGTCT GTCGCAGTATCTGTCT TGTCGCAGTATCTGTC TATGTCGCAGTATCTG TATATCGCAGTATCTG TATATCGCAGTATCTG TATATCGCAGTATCTG CCCTATATCGCAGTAT CCCTATATCGCAGTAT CCCTATATCGCAGTAT CCCTATATCGCAGTAT CCCTATATCGCAGTAT CCCTATATCGCAGTAT CCCTATATCGCAGTAT AGCACCCTATGTCGCA AGCACCCTATATCGCA AGCACCCTATGTCGCA GAGCACCCTATGTCGC CCGGAGCACCCTATAT CCGGAGCACCCTATAT GCCGGAGCACCCTATG
Reads aligned elsewhere on the reference: TATGCACGCGATAGCA GATAGCATTGCGAGAC GATTCCTGCCTCATCC

Statistics:
Binding occurs here
p-value: 0.0023
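The slide attaches a p-value to the pile of reads without saying how it is computed. As a rough sketch under stated assumptions, one could count read starts in fixed windows and ask how surprising each count is under a Poisson background with the genome-wide mean rate. The window size, the background model, the read positions, and the function names are all assumptions made for this example.

```python
from math import exp


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    term = exp(-lam)      # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)
    return max(0.0, 1.0 - cdf)


def window_counts(read_starts, genome_length, window):
    """Count read start positions in consecutive non-overlapping windows."""
    counts = [0] * ((genome_length + window - 1) // window)
    for start in read_starts:
        counts[start // window] += 1
    return counts


# Hypothetical read start positions on a 100 bp reference.
starts = [3, 41, 42, 44, 45, 45, 47, 48, 49, 90]
counts = window_counts(starts, genome_length=100, window=10)
background = len(starts) / len(counts)   # mean reads per window
for index, count in enumerate(counts):
    print(index * 10, count, poisson_upper_tail(count, background))
```

The background rate here includes the enriched window itself, which makes the test conservative; a control sample would normally give a better baseline.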
Matching Revisited
GTTGAGGCTTGCGTTTTTGGTACGCTGGACTTTGT GTACTCGTCGCTGCGTTGAGGCTTGCGTTTTTGGT ATGGTACGCTGGACTTTGTAGGATACCCTCGCTTT TTGCGTTTATGGTACGCTGGACTTTGTAGGATACC CTTGCGTTTATGGTACGCTGGACTTTGTAGGATAC TTGCGTTTATGGTACGCTGGACTTTGTAGGATACC GCGTTTATGGTACGCTGGACTTTGTAGGATACCCT GAGGCTTGCGTTTATGGTACGCTGGACTTTGTAGG GCGTTGAGGCTTGCGTTTATGGTACGCTGGATTTT CGTTTATGGTACGCTGGACTTTGTAGGATACCCTC ATGGTACGCTGGACTTTGTAGGATACCCTCGCTTT GTTTATGGTACGCTGGACTTTGTAGGATACCCTCG TCTCGTGCTCGTCGCTGCGTTGAGGCTTGCGTTTA TGCTCGTCGCTGCGTTGAGGCTTGCGTTTATGGTA GCTCGTCGCTGCGTTGAGGCTTGCGTTTATGGTAC TATGGTACGCTGGACTTTGTAGGATACCCTCGCTT TCGTGCTCGTCGCTGCGTTGAGGCTTGCGTTTTTG CGTCGCTGCGTTGAGGCTTGCGTTTATGGTACGCT GTTGAGGCTTGCGTTTATGGTACGCTGGGCTTTTT TTGCGTTTATGGTACGCTGGACTTTGTAGGATACCCTCTCGTGCTCGTCGCTGCGTTGAGGCTTGCGTTTATGGTACGCTGGACTTTGTAGGATACCCTCGCTTTC
Matching 10,000,000 reads of 32 bp
• BLAST takes more than 6 months
• BLAT takes 2 months
• MAQ takes a day and a half
• Bowtie takes 17 minutes
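Much of the gap between these tools comes from indexing: instead of scanning the reference for every read, a short-read aligner looks reads up in a precomputed index. Bowtie uses a Burrows-Wheeler (FM) index of the reference. The sketch below swaps in a much simpler technique, a hash table of k-mers with exact matching only, just to show the lookup idea. The function names and the choice of k are illustrative assumptions.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer in the reference to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def exact_match_positions(read, reference, index, k):
    """Find exact occurrences of `read` using its first k-mer as a seed."""
    hits = []
    for pos in index.get(read[:k], []):
        if reference[pos:pos + len(read)] == read:
            hits.append(pos)
    return hits


# First 60 bases of the reference shown on the slides.
reference = "GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGGTATTTT"
index = build_kmer_index(reference, k=8)
print(exact_match_positions("CCACTCACGGGAGC", reference, index, k=8))
```

The index is built once and reused for every read, so the per-read cost no longer depends on the length of the reference.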
Matching
GTTGAGGCTTGCGTTTTTGGTACGCTGGACTTTGT GTACTCGTCGCTGCGTTGAGGCTTGCGTTTTTGGT ATGGTACGCTGGACTTTGTAGGATACCCTCGCTTT TTGCGTTTATGGTACGCTGGACTTTGTAGGATACC CTTGCGTTTATGGTACGCTGGACTTTGTAGGATAC TTGCGTTTATGGTACGCTGGACTTTGTAGGATACC GCGTTTATGGTACGCTGGACTTTGTAGGATACCCT GAGGCTTGCGTTTATGGTACGCTGGACTTTGTAGG GCGTTGAGGCTTGCGTTTATGGTACGCTGGATTTT CGTTTATGGTACGCTGGACTTTGTAGGATACCCTC ATGGTACGCTGGACTTTGTAGGATACCCTCGCTTT GTTTATGGTACGCTGGACTTTGTAGGATACCCTCG TCTCGTGCTCGTCGCTGCGTTGAGGCTTGCGTTTA TGCTCGTCGCTGCGTTGAGGCTTGCGTTTATGGTA GCTCGTCGCTGCGTTGAGGCTTGCGTTTATGGTAC TATGGTACGCTGGACTTTGTAGGATACCCTCGCTT TCGTGCTCGTCGCTGCGTTGAGGCTTGCGTTTTTG CGTCGCTGCGTTGAGGCTTGCGTTTATGGTACGCT GTTGAGGCTTGCGTTTATGGTACGCTGGGCTTTTT TTGCGTTTATGGTACGCTGGACTTTGTAGGATACCCTCTCGTGCTCGTCGCTGCGTTGAGGCTTGCGTTTATGGTACGCTGGACTTTGTAGGATACCCTCGCTTTC
Mapping
Take a read:
CTCAAACTCCTGACCTTTGGTGATCCACCCGCCTNGGCCTTC
And a reference sequence:>MT dna:chromosome chromosome:GRCh37:MT:1:16569:1GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGGTATTTTCGTCTGGGGGGTATGCACGCGATAGCATTGCGAGACGCTGGAGCCGGAGCACCCTATGTCGCAGTATCTGTCTTTGATTCCTGCCTCATCCTATTATTTATCGCACCTACGTTCAATATTACAGGCGAACATACTTACTAAAGTGTGTTAATTAATTAATGCTTGTAGGACATAATAATAACAATTGAATGTCTGCACAGCCACTTTCCACACAGACATCATAACAAAAAATTTCCACCAAACCCCCCCTCCCCCGCTTCTGGCCACAGCACTTAAACACATCTCTGCCAAACCCCAAAAACAAAGAACCCTAACACCAGCCTAACCAGATTTCAAATTTTATCTTTTGGCGGTATGCACTTTTAACAGTCACCCCCCAACTAACACATTATTTTCCCCTCCCACTCCCATACTACTAATCTCATCAATACAACCCCCGCCCATCCTACCCAGCACACACACACCGCTGCTAACCCCATACCCCGAACCAACCAAACCCCAAAGACACCCCCCACAGTTTATGTAGCTTACCTCCTCAAAGCAATACACTGACCCGCTCAAACTCCTGGATTTTGGATCCACCCAGCGCCTTGGCCTAAACTAGCCTTTCTATTAGCTCTTAGTAAGATTACACATGCAAGCATCCCCGTTCCAGTGAGTTCACCCTCTAAATCACCACGATCAAAAGGAACAAGCATCAAGCACGCAGCAATGCAGCTCAAAACGCTTAGCCTAGCCACACCCCCACGGGAAACAGCAGTGATTAACCTTTAGCAATAAACGAAAGTTTAACTAAGCTATACTAACCCCAGGGTTGGTCAATTTCGTGCCAGCCACCGCGGTCACACGATTAACCCAAGTCAATAGAAGCCGGCGTAAAGAGTGTTTTAGATCACCCCCTCCCCAATAAAGCTAAAACTCACCTGAGTTGTAAAAAACTCCAGTTGACACAAAATAGACTACGAAAGTGGCTTTAACATATCTGAACACACAATAGCTAAGACCCAAACTGGGATTAGATACCCCACTATGCTTAGCCCTAAACCTCAACAGTTAAATCAACAAAACTGCTCGCCAGAACACTACGAGCCACAGCTTAAAACTCAAAGGACCTGGCGGTGCTTCATATCCCTCTAGAGGAGCCTGTTCTGTAATCGATAAACCCCGATCAACCTCACCACCTCTTGCTCAGCCTATATACCGCCATCTTCAGCAAACCCTGATGAAGGCTACAAAGTAAGCGCAAGTACCCACGTAAAGACGTTAGGTCAAGGTGTAGCCCATGAGGTGGCAAGAAATGGGCTACATTTTCTACCCCAGAAAACTACGATAGCCCTTATGAAACTTAAGGGTCGAAGGTGGATTTAGCAGTAAACTAAGAGTAGAGTGCTTAGTTGAACAGGGCCCTGAAGCGCGTACACACCGCCCGTCACCCTCCTCAAGTATACTTCAAAGGACATTTAACTAAAACCCCTACGCATTTATATAGAGGAGACAAGTCGTAACCTCAAACTCCTGCCTTTGGTGATCCACCCGCCTTGGCCTACCTGCATAATGAAGAAGCACCCAACTTACACTTAGGAGATTTCAACTTAACTTGACCGCTCTGAGCTAAACCTAGCCCCAAACCCACTCCACCTTACTACCAGACAACCTTAGCCAAACCATTTACCCAAATAAAGTATAGGCGATAGAAATTGAAACCTGGCGCAATAGATATAGTACCGCAAGGGAAAGATGAAAAATTATAACCAAGCATAATATAGCAAGGACTAACCCCTATACCTTCTGCATAATGAATTAACTAGAAATAACTTTGCAAGGAGAGCCAAAGCTAAGACCCCCGAAACCAGACGAGCTACCTAA
GAACAGCTAAAAGAGCACACCCGTCTATGTAGCAAAATAGTGGGAAGATTTATAGGTAGAGGCGACAAACCTACCGAGCCTGGTGATAGCTGGTTGTCCAAGATAGAATCTTAGTTCAACTTTAAATTTGCCCACAGAACCCTCTAAATCCCCTTGTAAATTTAACTGTTAGTCCAAAGAGGAACAGCTCTTTGGACACTAGGAAAAAACCTTGTAGAGAGAGTAAAAAATTTAACACCCATAGTAGGCCTAAAAGCAGCCACCAATTAAGAAAGCGTTCAAGCTCAACACCCACTACCTAAAAAATCCCAAACATATAACTGAACTCCTCACACCCAATTGGACCAATCTATCACCCTATAGAAGAACTAATGTTAGTATAAGTAACATGAAAACATTCTCCTCCGCATAAGCCTGCGTCAGATTAAAACACTGAACTGACAATTAACAGCCCAATATCTACAATCAACCAACAAGTCATTATTACCCTCACTGTCAACCCAACACAGGCATGCTCATAAGGAAAGGTTAAAAAAAGTAAAAGGAACTCGGCAAATCTTACCCCGCCTGTTTACCAAAAACATCACCTCTAGCATCACCAGTATTAGAGGCACCGCCTGCCCAGTGACACATGTTTAACGGCCGCGGTACCCTAACCGTGCAAAGGTAGCATAATCACTTGTTCCTTAAATAGGGACCTGTATGAATGGCTCCACGAGGGTTCAGCTGTCTCTTACTTTTAACCAGTGAAATTGACCTGCCCGTGAAGAGGCGGGCATAACACAGCAAGACGAGAAGACCCTATGGAGCTTTAATTTATTAATGCAAACAGTACCTAACAAACCCACAGGTCCTAAACTACCAAACCTGCATTAAAAATTTCGGTTGGGGCGACCTCGGAGCAGAACCCAACCTCCGAGCAGTACATGCTAAGACTTCACCAGTCAAAGCGAACTACTATACTCAATTGATCCAATAACTTGACCAACGGAACAAGTTACCCTAGGGATAACAGCGCAATCCTATTCTAGAGTCCATATCAACAATAGGGTTTACGACCTCGATGTTGGATCAGGACATCCCGATGGTGCAGCCGCTATTAAAGGTTCGTTTGTTCAACGATTAAAGTCCTACGTGATCTGAGTTCAGACCGGAGTAATCCAGGTCGGTTTCTATCTACNTTCAAATTCCTCCCTGTACGAAAGGACAAGAGAAATAAGGCCTACTTCACAAAGCGCCTTCCCCCGTAAATGATATCATCTCAACTTAGTATTATACCCACACCCACCCAAGAACAGGGTTTGTTAAGATGGC
How do we determine the read’s point of origin with respect to the reference?
Answer: sequence similarity

Hypothesis 1:
Read      CTCAAAGACCTGACCTTTGGTGATCCACCC-----GCCTNGGCCTTC
          ||||||  ||||   ||||  |||||||||     |||| |||||
Reference CTCAAACTCCTGGATTTTG--GATCCACCCAGCTGGCCTTGGCCTAA

Hypothesis 2:
Read      CTCAAACTCCTGACCTTTGGTGATCCACCCGCCTNGGCCTTC
          |||||||||||| ||||||||||||||||||||| ||||| |
Reference CTCAAACTCCTG-CCTTTGGTGATCCACCCGCCTTGGCCTAC

Which hypothesis is better?
Say hypothesis 2 is correct. Why are there still mismatches and gaps?
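One way to compare the two hypotheses is to count matches, mismatches, and gap columns in each alignment and combine them into a score. The sketch below does this for the two alignments on the slide. The scoring weights and the function names are illustrative assumptions, not values from the lecture.

```python
def alignment_summary(read, reference):
    """Count matches, mismatches, and gap columns in a gapped alignment."""
    if len(read) != len(reference):
        raise ValueError("aligned strings must have equal length")
    matches = mismatches = gaps = 0
    for r, g in zip(read, reference):
        if r == "-" or g == "-":
            gaps += 1
        elif r == g:
            matches += 1
        else:
            mismatches += 1   # an N in the read counts as a mismatch here
    return matches, mismatches, gaps


def score(summary, match=1, mismatch=-1, gap=-2):
    """Linear score; the weights are illustrative assumptions."""
    matches, mismatches, gaps = summary
    return match * matches + mismatch * mismatches + gap * gaps


hypothesis_1 = (
    "CTCAAAGACCTGACCTTTGGTGATCCACCC-----GCCTNGGCCTTC",
    "CTCAAACTCCTGGATTTTG--GATCCACCCAGCTGGCCTTGGCCTAA",
)
hypothesis_2 = (
    "CTCAAACTCCTGACCTTTGGTGATCCACCCGCCTNGGCCTTC",
    "CTCAAACTCCTG-CCTTTGGTGATCCACCCGCCTTGGCCTAC",
)
for name, (read, ref) in [("H1", hypothesis_1), ("H2", hypothesis_2)]:
    summary = alignment_summary(read, ref)
    print(name, summary, score(summary))
```

Hypothesis 2 needs far fewer mismatches and gaps, so it scores higher under any reasonable choice of weights.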
More on variants and base-calling
SNPs
GTTGAGGCTTGCGTTTTTGGTACGCTGGACTTTGT GTACTCGTCGCTGCGTTGAGGCTTGCGTTTTTGGT ATGGTACGCTGGACTTTGTAGGATACCCTCGCTTT TTGCGTTTATGGTACGCTGGACTTTGTAGGATACC CTTGCGTTTATGGTACGCTGGACTTTGTAGGATAC TTGCGTTTATGGTACGCTGGACTTTGTAGGATACC GCGTTTATGGTACGCTGGACTTTGTAGGATACCCT GAGGCTTGCGTTTATGGTACGCTGGACTTTGTAGG GCGTTGAGGCTTGCGTTTATGGTACGCTGGATTTT CGTTTATGGTACGCTGGACTTTGTAGGATACCCTC ATGGTACGCTGGACTTTGTAGGATACCCTCGCTTT GTTTATGGTACGCTGGACTTTGTAGGATACCCTCG TCTCGTGCTCGTCGCTGCGTTGAGGCTTGCGTTTA TGCTCGTCGCTGCGTTGAGGCTTGCGTTTATGGTA GCTCGTCGCTGCGTTGAGGCTTGCGTTTATGGTAC TATGGTACGCTGGACTTTGTAGGATACCCTCGCTT TCGTGCTCGTCGCTGCGTTGAGGCTTGCGTTTTTG CGTCGCTGCGTTGAGGCTTGCGTTTATGGTACGCT GTTGAGGCTTGCGTTTATGGTACGCTGGGCTTTTT TTGCGTTTATGGTACGCTGGACTTTGTAGGATACCCTCTCGTGCTCGTCGCTGCGTTGAGGCTTGCGTTTATGGTACGCTGGACTTTGTAGGATACCCTCGCTTTC
SNPs
TCTCGTGCTCGTCGCTGCGTTGAGGCTTGCGTTTA TCGTGCTCGTCGCTGCGTTGAGGCTTGCGTTTTTG GTACTCGTCGCTGCGTTGAGGCTTGCGTTTTTGGT TGCTCGTCGCTGCGTTGAGGCTTGCGTTTATGGTA GCTCGTCGCTGCGTTGAGGCTTGCGTTTATGGTAC CGTCGCTGCGTTGAGGCTTGCGTTTATGGTACGCT GCGTTGAGGCTTGCGTTTATGGTACGCTGGATTTT GTTGAGGCTTGCGTTTTTGGTACGCTGGACTTTGT GTTGAGGCTTGCGTTTATGGTACGCTGGGCTTTTT GAGGCTTGCGTTTATGGTACGCTGGACTTTGTAGG CTTGCGTTTATGGTACGCTGGACTTTGTAGGATAC TTGCGTTTATGGTACGCTGGACTTTGTAGGATACC TTGCGTTTATGGTACGCTGGACTTTGTAGGATACC TTGCGTTTATGGTACGCTGGACTTTGTAGGATACC GCGTTTATGGTACGCTGGACTTTGTAGGATACCCT CGTTTATGGTACGCTGGACTTTGTAGGATACCCTC GTTTATGGTACGCTGGACTTTGTAGGATACCCTCG TATGGTACGCTGGACTTTGTAGGATACCCTCGCTT ATGGTACGCTGGACTTTGTAGGATACCCTCGCTTT ATGGTACGCTGGACTTTGTAGGATACCCTCGCTTT CTCTCGTGCTCGTCGCTGCGTTGAGGCTTGCGTTTATGGTACGCTGGACTTTGTAGGATACCCTCGCTTTC
All Reads
[Figure: nucleotide composition (0.0 to 1.0) by sequencing cycle (0 to 30); legend: A, T]
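A plot of nucleotide composition by sequencing cycle can be computed by tabulating, for each read position, the fraction of reads showing each base. This is a minimal sketch using four of the 35 bp reads shown on the earlier slides. The function name is hypothetical.

```python
from collections import Counter


def composition_by_cycle(reads):
    """Fraction of each nucleotide at every sequencing cycle (read position)."""
    n_cycles = max(len(read) for read in reads)
    table = []
    for cycle in range(n_cycles):
        counts = Counter(read[cycle] for read in reads if cycle < len(read))
        total = sum(counts.values())
        table.append({base: counts[base] / total for base in "ACGT"})
    return table


# Four of the 35 bp reads shown on the earlier slides.
reads = [
    "GTTGAGGCTTGCGTTTTTGGTACGCTGGACTTTGT",
    "GTACTCGTCGCTGCGTTGAGGCTTGCGTTTTTGGT",
    "ATGGTACGCTGGACTTTGTAGGATACCCTCGCTTT",
    "TTGCGTTTATGGTACGCTGGACTTTGTAGGATACC",
]
for cycle, fractions in enumerate(composition_by_cycle(reads)[:3], start=1):
    print(cycle, fractions)
```

If reads were sampled uniformly from the genome, these fractions would be roughly flat across cycles, so a trend with cycle points to a technical effect.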
1000 Genomes Data
[Figure: twelve histograms of number of calls by sequencing cycle (1 to 35) for sample NA19238. Row labels: All data; Filtered: snpq>=20, nreads<=360. Column labels: SNPs in dbSNP; Novel SNPs. Panel titles, in extraction order: UCSC Loci, n=42691; Non-UCSC Loci, n=4986; Hapmap Loci, n=19067; Non-Hapmap Loci, n=28610; 1kgenomes Loci, n=36333; Non-1kgenomes Loci, n=11344; UCSC Loci, n=48864; Non-UCSC Loci, n=15005; Hapmap Loci, n=20069; Non-Hapmap Loci, n=43800; 1kgenomes Loci, n=38306; Non-1kgenomes Loci, n=25563.]
Here we aggregate reads and record the cycle at which the variant appears.
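Recording the cycle at which a variant appears amounts to converting the variant's reference position into a position within each read that carries it. The sketch below assumes alignments are given as (start, sequence, strand) triples in reference orientation. That representation, the function name, and the toy data are assumptions made for this example.

```python
from collections import Counter


def variant_cycles(alignments, position, variant_base):
    """Histogram of the read cycle (1-based) at which a variant base is seen.

    `alignments` holds (start, sequence, is_reverse) triples with 0-based
    reference starts and sequences written in reference orientation; for a
    reverse-strand read the cycle is counted from the other end.
    """
    histogram = Counter()
    for start, seq, is_reverse in alignments:
        offset = position - start
        if 0 <= offset < len(seq) and seq[offset] == variant_base:
            cycle = len(seq) - offset if is_reverse else offset + 1
            histogram[cycle] += 1
    return histogram


# Toy data, invented for this example: variant base A at reference position 10.
alignments = [
    (10, "ACGTTGCA", False),  # variant at cycle 1
    (5, "TTGCAACG", False),   # variant at cycle 6
    (3, "GGTTGCAA", True),    # reverse strand: offset 7 of 8, so cycle 1
    (8, "CCGGTTAA", False),   # shows G at position 10, not counted
]
print(sorted(variant_cycles(alignments, 10, "A").items()))
```

For true variants the histogram should be roughly uniform over cycles, so an excess at particular cycles suggests sequencing artifacts.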
What is causing this?
Source: Metzker ML. Sequencing technologies - the next generation. Nat Rev Genet. 2010
Source: Whiteford et al. Swift: primary data analysis for the Illumina Solexa sequencing platform. Bioinformatics. 2009
name, sequence, quality scores
x 100s of millions
(slide courtesy of Ben Langmead)
Before Reads There were Intensities
We Want to See This
Color coded by call made: A, C, G, T
But See This
Color coded by call made: A, C, G, T
[Figure: four-channel fluorescence intensity, cycle 1; channels A, C, G, T]
Gets Worse for higher cycles
Color coded by call made: A, C, G, T
Error Rate and Reported Quality
[Figure, first panel: estimated error probabilities (0.0 to 0.4) by sequencing cycle (0 to 60), one curve per substitution type: A>C, T>C, A>G, T>A, C>T, C>A, G>A, T>G, G>C, G>T, A>T, C>G]
[Figure, second panel: mismatch proportions (0.000 to 0.015) by sequencing cycle (0 to 60), same substitution types]
Remember This?
[Figure: nucleotide composition (0.0 to 1.0) by sequencing cycle (0 to 30); legend: A, T]
Bias Explained
[Figure: scatter plot with axes 0.5(A+T) (ticks 0 to 8000) and A−T (ticks 0 to 10000). Points are grouped by cycle < 20 and cycle ≥ 20.]
![Page 40: Statistical Methods for Next Generation Sequencingkhansen/LecIntro1.pdf · Source: Metzker ML. Sequencing technologies - the next generation. Nat Rev Genet. 2010 Source: Whiteford](https://reader034.vdocuments.us/reader034/viewer/2022050412/5f88a58f47678a62555bc458/html5/thumbnails/40.jpg)
Base Calling
1) Rougemont et al. Probabilistic base calling of Solexa sequencing data. BMC Bioinformatics (2008)
2) Erlich et al. Alta-Cyclic: a self-optimizing base caller for next-generation sequencing. Nat Methods (2008)
3) Kao et al. BayesCall: A model-based base-calling algorithm for high-throughput short-read sequencing. Genome Res (2009)
4) Corrada Bravo and Irizarry. Model-Based Quality Assessment and Base-Calling for Second-Generation Sequencing Data. Biometrics (2009)
5) Cokus et al. Shotgun bisulphite sequencing of the Arabidopsis genome reveals DNA methylation patterning. Nature (2009)
![Page 41: Statistical Methods for Next Generation Sequencingkhansen/LecIntro1.pdf · Source: Metzker ML. Sequencing technologies - the next generation. Nat Rev Genet. 2010 Source: Whiteford](https://reader034.vdocuments.us/reader034/viewer/2022050412/5f88a58f47678a62555bc458/html5/thumbnails/41.jpg)
Intensity Model
[Figure: intensity versus cycle; intensity axis ticks at 12.5, 13.0, 13.5, 14.0.]
![Page 42: Statistical Methods for Next Generation Sequencingkhansen/LecIntro1.pdf · Source: Metzker ML. Sequencing technologies - the next generation. Nat Rev Genet. 2010 Source: Whiteford](https://reader034.vdocuments.us/reader034/viewer/2022050412/5f88a58f47678a62555bc458/html5/thumbnails/42.jpg)
Intensity Model
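$u_{ijc}$: log intensity for read $i$, cycle $j$, channel $c$
$\Delta_{ijc}$: indicators of nucleotide identity for read $i$, position $j$

$$
\Delta_{ijc} =
\begin{cases}
1 & \text{if } c \text{ is the nucleotide in read } i \text{, position } j \\
0 & \text{otherwise}
\end{cases}
$$

$$
u_{ijc} = \Delta_{ijc}\left(\mu_{cj\alpha} + x_j^T \alpha_i + \epsilon^{\alpha}_{ijc}\right) + (1-\Delta_{ijc})\left(\mu_{cj\beta} + x_j^T \beta_i + \epsilon^{\beta}_{ijc}\right)
$$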
![Page 43: Statistical Methods for Next Generation Sequencingkhansen/LecIntro1.pdf · Source: Metzker ML. Sequencing technologies - the next generation. Nat Rev Genet. 2010 Source: Whiteford](https://reader034.vdocuments.us/reader034/viewer/2022050412/5f88a58f47678a62555bc458/html5/thumbnails/43.jpg)
Intensity Model
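$u_{ijc}$: log intensity for read $i$, cycle $j$, channel $c$
Read-specific linear models: the $x_j^T \alpha_i$ and $x_j^T \beta_i$ terms

$$
u_{ijc} = \Delta_{ijc}\left(\mu_{cj\alpha} + x_j^T \alpha_i + \epsilon^{\alpha}_{ijc}\right) + (1-\Delta_{ijc})\left(\mu_{cj\beta} + x_j^T \beta_i + \epsilon^{\beta}_{ijc}\right)
$$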
![Page 44: Statistical Methods for Next Generation Sequencingkhansen/LecIntro1.pdf · Source: Metzker ML. Sequencing technologies - the next generation. Nat Rev Genet. 2010 Source: Whiteford](https://reader034.vdocuments.us/reader034/viewer/2022050412/5f88a58f47678a62555bc458/html5/thumbnails/44.jpg)
Intensity Model
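$u_{ijc}$: log intensity for read $i$, cycle $j$, channel $c$
Measurement error: the $\epsilon^{\alpha}_{ijc}$ and $\epsilon^{\beta}_{ijc}$ terms

$$
u_{ijc} = \Delta_{ijc}\left(\mu_{cj\alpha} + x_j^T \alpha_i + \epsilon^{\alpha}_{ijc}\right) + (1-\Delta_{ijc})\left(\mu_{cj\beta} + x_j^T \beta_i + \epsilon^{\beta}_{ijc}\right)
$$

$$
\epsilon^{\alpha}_{ijc} \sim N(0, \sigma^2_{\alpha i}), \qquad \epsilon^{\beta}_{ijc} \sim N(0, \sigma^2_{\beta i})
$$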
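To make the intensity model concrete, the sketch below simulates log intensities for one read and then calls each base as the channel with the highest intensity. It is a toy illustration under stated assumptions, not the Srfim implementation. The slides do not define the design vector $x_j$, so it is assumed here to be (1, j), an intercept plus a linear cycle trend. The functions `mu_alpha` and `mu_beta` and every parameter value in the usage note are hypothetical.

```python
import random

CHANNELS = "ACGT"


def simulate_log_intensities(seq, mu_alpha, mu_beta, alpha_i, beta_i,
                             sigma_alpha, sigma_beta, rng):
    """Simulate u[j][c] for one read i under the two-component model.

    mu_alpha(c, j), mu_beta(c, j): channel- and cycle-specific means.
    alpha_i, beta_i: read-specific coefficients (intercept, slope).
    sigma_alpha, sigma_beta: read-specific error standard deviations.
    """
    u = []
    for j, base in enumerate(seq):
        x_j = (1.0, float(j))  # assumed design vector, not given in the slides
        row = {}
        for c in CHANNELS:
            if c == base:  # Delta_ijc = 1: channel of the true nucleotide
                mean = mu_alpha(c, j) + x_j[0] * alpha_i[0] + x_j[1] * alpha_i[1]
                row[c] = rng.gauss(mean, sigma_alpha)
            else:  # Delta_ijc = 0: background channel
                mean = mu_beta(c, j) + x_j[0] * beta_i[0] + x_j[1] * beta_i[1]
                row[c] = rng.gauss(mean, sigma_beta)
        u.append(row)
    return u


def call_bases_by_max(u):
    """Naive base caller: pick the brightest channel at every cycle."""
    return "".join(max(row, key=row.get) for row in u)
```

For example, `simulate_log_intensities("ACGTAC", lambda c, j: 13.5, lambda c, j: 11.5, (0.0, -0.02), (0.0, 0.01), 0.1, 0.1, random.Random(0))` produces signal channels that decay with cycle while background channels rise. Under this toy setup, that narrowing gap is what makes calls at later cycles less reliable.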
![Page 45: Statistical Methods for Next Generation Sequencingkhansen/LecIntro1.pdf · Source: Metzker ML. Sequencing technologies - the next generation. Nat Rev Genet. 2010 Source: Whiteford](https://reader034.vdocuments.us/reader034/viewer/2022050412/5f88a58f47678a62555bc458/html5/thumbnails/45.jpg)
Read & Cycle Effects
[Figure: multi-panel plot of h(intensity) (axis ticks 11 to 14) versus cycle (0 to 30).]
![Page 46: Statistical Methods for Next Generation Sequencingkhansen/LecIntro1.pdf · Source: Metzker ML. Sequencing technologies - the next generation. Nat Rev Genet. 2010 Source: Whiteford](https://reader034.vdocuments.us/reader034/viewer/2022050412/5f88a58f47678a62555bc458/html5/thumbnails/46.jpg)
Base Identity Probability Profiles
[Figure: probability (0 to 1) versus position (1 to 35).]
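One way to obtain a probability profile of this kind is to turn the two-component normal model into posterior probabilities for each base at each position. The sketch below assumes a uniform prior over the four bases and independent channels, and it takes the signal and background means and standard deviations as already estimated. These are simplifying assumptions for illustration, and `base_probabilities` is a hypothetical name rather than the Srfim API.

```python
import math

CHANNELS = "ACGT"


def normal_logpdf(x, mean, sd):
    """Log density of a normal distribution with the given mean and sd."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - (x - mean) ** 2 / (2.0 * sd * sd)


def base_probabilities(u, mean_signal, mean_background, sd_signal, sd_background):
    """Posterior probability of each base at one position.

    u maps channel -> observed log intensity. Under the hypothesis that the
    true base is b, channel b follows the signal component and the other
    three channels follow the background component.
    """
    log_lik = {}
    for b in CHANNELS:
        total = 0.0
        for c in CHANNELS:
            if c == b:
                total += normal_logpdf(u[c], mean_signal, sd_signal)
            else:
                total += normal_logpdf(u[c], mean_background, sd_background)
        log_lik[b] = total
    top = max(log_lik.values())  # subtract the maximum for numerical stability
    weights = {b: math.exp(v - top) for b, v in log_lik.items()}
    norm = sum(weights.values())
    return {b: w / norm for b, w in weights.items()}
```

Applying `base_probabilities` at every position of a read gives a profile over positions. A position where all four channels look alike yields probabilities near 0.25 each, which flags the call as uninformative instead of forcing a hard base call.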
![Page 47: Statistical Methods for Next Generation Sequencingkhansen/LecIntro1.pdf · Source: Metzker ML. Sequencing technologies - the next generation. Nat Rev Genet. 2010 Source: Whiteford](https://reader034.vdocuments.us/reader034/viewer/2022050412/5f88a58f47678a62555bc458/html5/thumbnails/47.jpg)
Before And After
[Figure: two panels of nucleotide composition (0.0 to 1.0) versus sequencing cycle (0 to 30), with curves for A and T. Panel titles: Solexa (Default); Srfim (Statistical Approach).]
![Page 48: Statistical Methods for Next Generation Sequencingkhansen/LecIntro1.pdf · Source: Metzker ML. Sequencing technologies - the next generation. Nat Rev Genet. 2010 Source: Whiteford](https://reader034.vdocuments.us/reader034/viewer/2022050412/5f88a58f47678a62555bc458/html5/thumbnails/48.jpg)
The End