Tools for HTS Analysis
DESCRIPTION
Tools For HTS Analysis. Michael Brudno and Marc Fiume, Department of Computer Science, University of Toronto. Outline: lab focus; our tools (SHRiMP: read mapper; VARiD: SNP and indel finder; Savant: genome browser); discussion.
![Page 1: Tools For HTS Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56814767550346895db4a5b8/html5/thumbnails/1.jpg)
TOOLS FOR HTS ANALYSIS
Michael Brudno and Marc Fiume
Department of Computer Science
University of Toronto
![Page 2: Tools For HTS Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56814767550346895db4a5b8/html5/thumbnails/2.jpg)
Outline
• Lab focus
• Our tools
o SHRiMP: read mapper
o VARiD: SNP and indel finder
o Savant: genome browser
• Discussion
![Page 3: Tools For HTS Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56814767550346895db4a5b8/html5/thumbnails/3.jpg)
Our Tools
• READ MAPPING (SHRiMP)
• SNP DETECTION (VARiD)
• INDEL DETECTION (MODiL)
• CNV DETECTION (CNVer)
• ASSEMBLY (UNNAMED)
• VISUALIZATION (SAVANT)
![Page 4: Tools For HTS Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56814767550346895db4a5b8/html5/thumbnails/4.jpg)
SHRIMP – SHORT READ MAPPING PACKAGE
Savant Genome Browser - http://compbio.cs.toronto.edu/savant/
![Page 5: Tools For HTS Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56814767550346895db4a5b8/html5/thumbnails/5.jpg)
Key SHRiMP Features
• High sensitivity
• Support for common formats (SAM, FASTQ, etc.)
• Flexible seeding framework
• Multi-threading
• Full support for SOLiD and Illumina (and 454) reads
![Page 6: Tools For HTS Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56814767550346895db4a5b8/html5/thumbnails/6.jpg)
Sensitivity/Specificity Comparison
![Page 7: Tools For HTS Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56814767550346895db4a5b8/html5/thumbnails/7.jpg)
Runtime comparison
Mapping 6 million reads to C. savignyi (180 Mb)
[Figure panels: unpaired 50 bp reads; paired 75 bp reads]
![Page 8: Tools For HTS Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56814767550346895db4a5b8/html5/thumbnails/8.jpg)
VARID – SNP AND INDEL DETECTION
![Page 9: Tools For HTS Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56814767550346895db4a5b8/html5/thumbnails/9.jpg)
motivation | methods | results | summary
Variation detection from NGS reads
Reference: TCAGCATCGGCATCGACTGCACAGGACCAGTCGATCGAC
Donor:     ???????????????????????????????????????
Aligned reads: GCATCGACTGCA, CGGGATCGACTG, ATCCATTGCA, GATCCACTGCAC
• Determine differences (variation) between reference and donor using NGS reads of the donor
![Page 10: Tools For HTS Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56814767550346895db4a5b8/html5/thumbnails/10.jpg)
Motivation
o Color-space and letter-space platforms
o bring them together
Methods
Results
Summary
![Page 11: Tools For HTS Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56814767550346895db4a5b8/html5/thumbnails/11.jpg)
Sequencing Platforms
• letter-space: Sanger, 454, Illumina, etc.
> NC_005109.2 | BRCA1 SX3
TCAGCATCGGCATCGACTGCACAGG
• color-space: AB SOLiD; fewer software tools available
> NC_005109.2 | BRCA1 AF3
T212313230313232121311120
• many differences -> useful to combine this information
o sequencing biases
o inherent errors
o advantages
![Page 12: Tools For HTS Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56814767550346895db4a5b8/html5/thumbnails/12.jpg)
Color Space
Translation Matrix (row: first letter, column: second letter)
    A  C  G  T
A   0  1  2  3
C   1  0  3  2
G   2  3  0  1
T   3  2  1  0
Translation Automata
[Figure: automaton with nodes A, C, G, T; each edge is labeled with the color produced by the two letters it connects]
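The translation matrix on this slide can be reproduced with a few lines of code. The sketch below is an illustration, not code from SHRiMP or VARiD: it assumes the letters are numbered A=0, C=1, G=2, T=3, in which case each color equals the XOR of the two letter codes. The function names `encode` and `decode` are hypothetical.

```python
# Letter codes; with this numbering the color between two adjacent letters
# is the XOR of their codes, which reproduces the translation matrix.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
LETTER = "ACGT"


def encode(letters):
    """Letter-space -> color-space: keep the first letter, then one color per adjacent pair."""
    colors = [str(CODE[a] ^ CODE[b]) for a, b in zip(letters, letters[1:])]
    return letters[0] + "".join(colors)


def decode(read):
    """Color-space -> letter-space: walk the colors starting from the known first letter."""
    out = [read[0]]
    for color in read[1:]:
        out.append(LETTER[CODE[out[-1]] ^ int(color)])
    return "".join(out)


# The BRCA1 example from the Sequencing Platforms slide.
letters = "TCAGCATCGGCATCGACTGCACAGG"
colors = encode(letters)
print(colors)          # color-space form of the read
print(decode(colors))  # round-trips back to the letters
```

Because decoding walks from the first letter, every decoded base depends on all the colors before it.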
![Page 13: Tools For HTS Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56814767550346895db4a5b8/html5/thumbnails/13.jpg)
Color Space
Translating
> T212313230313232121311120
> TCAGCATCGGCATCGACTGCACAGG
Sequencing Error vs SNP
Sequencing Error
> T212313230313232121311120
> T212313230310232121311120
> TCAGCATCGGCAAGCTGACGTGTCC
SNP
> TCAGCATCGGCATCGACTGCACAGG
> TCAGCATCGGCAGCGACTGCACAGG
> T212313230312332121311120
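The two cases on this slide can be checked mechanically. The sketch below is an illustration only, reusing the XOR formulation of the color code (an assumption about numbering: A=0, C=1, G=2, T=3). It counts how many positions differ between the reference and each variant, in both color-space and letter-space. All helper names are hypothetical.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
LETTER = "ACGT"


def encode(letters):
    """Letters -> first letter followed by one color per adjacent pair."""
    return letters[0] + "".join(str(CODE[a] ^ CODE[b]) for a, b in zip(letters, letters[1:]))


def decode(read):
    """First letter plus colors -> letters."""
    out = [read[0]]
    for color in read[1:]:
        out.append(LETTER[CODE[out[-1]] ^ int(color)])
    return "".join(out)


def diff_positions(a, b):
    """0-based positions at which two equal-length strings differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


ref_letters = "TCAGCATCGGCATCGACTGCACAGG"
ref_colors = encode(ref_letters)

# Sequencing error: a single color is misread (3 -> 0).
err_colors = "T212313230310232121311120"
color_diffs_error = diff_positions(ref_colors, err_colors)
letter_diffs_error = diff_positions(ref_letters, decode(err_colors))

# SNP: a single donor letter differs (T -> G).
snp_letters = "TCAGCATCGGCAGCGACTGCACAGG"
color_diffs_snp = diff_positions(ref_colors, encode(snp_letters))

print("error: colors changed at", color_diffs_error)
print("error: letters changed at", letter_diffs_error)
print("SNP:   colors changed at", color_diffs_snp)
```

A single miscalled color corrupts every translated letter after it, while a true SNP shows up as exactly two adjacent color changes.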
![Page 14: Tools For HTS Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56814767550346895db4a5b8/html5/thumbnails/14.jpg)
Color Space
• clear distinction between a sequencing error and a SNP
• can this help us in SNP detection? sounds like it!
A single color change indicates a sequencing error; two adjacent colors changed indicate a (likely) SNP.
[Figure: reads aligned by position against the reference in color-space; annotated regions: easy SNP call, well-covered bases, difficult case]
![Page 15: Tools For HTS Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56814767550346895db4a5b8/html5/thumbnails/15.jpg)
Motivation
Motivation
• variation caller to handle both letter-space & color-space reads
VARiD
• system to make inferences on the donor bases
• variation detection
Detection
• Heterozygous SNPs
• Homozygous SNPs
• Tri-allelic SNPs
• small indels
• account for various errors, quality values & misalignments
![Page 16: Tools For HTS Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56814767550346895db4a5b8/html5/thumbnails/16.jpg)
Motivation
Methods
o Simple HMM model: states, emissions, transitions, FB (Forward-Backward)
o Extended HMM model: gaps, diploids, exceptions
Results
Summary
![Page 17: Tools For HTS Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56814767550346895db4a5b8/html5/thumbnails/17.jpg)
Hidden Markov Model (HMM)
Statistical model for a system - states
Assume that the system is a Markov process whose state is unobserved.
Markov process: the next state depends only on the current state.
We can observe the state’s emission (output); each state has a probability distribution over outputs.
[Figure: hidden states S1, S2, S3, each emitting outputs e1 and e2]
![Page 18: Tools For HTS Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56814767550346895db4a5b8/html5/thumbnails/18.jpg)
Hidden Markov Model (HMM)
Apply HMM to variation detection:
• we don’t know the state (donor), but
• we can observe some output determined by the state (reads)
![Page 19: Tools For HTS Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56814767550346895db4a5b8/html5/thumbnails/19.jpg)
Hidden Markov Model (HMM)
[Figure: the unknown donor (. . . B6 B7 B8 B9 . . .) modeled as a chain of states over adjacent position pairs (B6 B7, B7 B8, B8 B9); each state is a letter pair such as AA or AC and emits a color such as color 0 or color 1]
![Page 20: Tools For HTS Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56814767550346895db4a5b8/html5/thumbnails/20.jpg)
States
Why pairs of letters? To handle colors.
• AA and TT give the same color, so we can’t just model colors
The donor at positions B6 B7 could be:
• letters: AA, color 0
• letters: AC, color 1
:
• letters: TT, color 0
16 combinations
[Figure: translation matrix and translation automaton, repeated from the Color Space slide]
![Page 21: Tools For HTS Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56814767550346895db4a5b8/html5/thumbnails/21.jpg)
States and Transitions
States
• 16 possible states
• only look at second letter
Transitions
• only certain transitions allowed
• when allowed, p(Xt|Xt-1) = freq(Xt)
• each state depends only on the previous state (Markov process)
[Figure: columns of possible states (AA, CA, AT, TT, ..., GA, CT, ...) for position pairs B6 B7 and B7 B8 along the donor (. . . B6 B7 B8 . . .), with arrows for the allowed transitions]
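The state space and transition rule described on this slide can be sketched as follows. This is an interpretation rather than VARiD's actual code. It assumes that a transition is allowed only when consecutive pairs overlap by one letter, and that freq(Xt) is the background frequency of the newly added second letter. The uniform `LETTER_FREQ` values and all names are hypothetical.

```python
from itertools import product

LETTERS = "ACGT"

# The 16 possible states: every ordered pair of donor letters.
STATES = ["".join(pair) for pair in product(LETTERS, repeat=2)]

# Hypothetical background letter frequencies (uniform here for illustration).
LETTER_FREQ = {letter: 0.25 for letter in LETTERS}


def transition_prob(prev, nxt):
    """p(X_t | X_{t-1}) under the assumptions stated in the lead-in."""
    # The pair at (i, i+1) can only be followed by a pair at (i+1, i+2)
    # whose first letter equals the previous pair's second letter.
    if prev[1] != nxt[0]:
        return 0.0
    # Only the new (second) letter carries probability mass.
    return LETTER_FREQ[nxt[1]]


# Sparse 16 x 16 transition matrix: four allowed successors per state.
TRANSITIONS = {s: {t: transition_prob(s, t) for t in STATES} for s in STATES}

print(len(STATES), "states")
print("successors of AC:", [t for t in STATES if TRANSITIONS["AC"][t] > 0])
```

Under this reading each row of the matrix has four non-zero entries that sum to one, which is what makes the transitions sparse.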
![Page 22: Tools For HTS Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56814767550346895db4a5b8/html5/thumbnails/22.jpg)
Emissions
Unknown genome: ..... B6 B7 B8 B9 .....
Color reads: T01020100311223, T1030101311223, T20100311223
Letter reads: ATTGCGCAATGCG, TTGGGCAATGCGA, GCGCACTGCGAC
[Figure: the state at positions B7 B8 (a letter pair such as AA or AC) emits the colors (color 0, color 1) and letters observed in the reads that cover it]
![Page 23: Tools For HTS Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56814767550346895db4a5b8/html5/thumbnails/23.jpg)
Emission Probabilities
Emission probability p(em|AA):
• color 0: 1 – 3ε
• color 1: ε
• color 2: ε
• color 3: ε
• letter A: 1 – 3ξ
• letter C: ξ
• letter G: ξ
• letter T: ξ
State TT: same color emission distribution as AA, but a different letter emission distribution.
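The emission distribution on this slide can be written as two small functions. The sketch below is illustrative only. The values of `EPS` (ε) and `XI` (ξ) are hypothetical, and it assumes that a letter-space read is compared against the second letter of the state's pair, which the slides do not state explicitly.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

EPS = 0.01  # hypothetical color-space error rate (epsilon on the slide)
XI = 0.02   # hypothetical letter-space error rate (xi on the slide)


def color_emission(state, color):
    """p(color | state): the expected color gets 1 - 3*eps, each other color eps."""
    expected = CODE[state[0]] ^ CODE[state[1]]
    return 1 - 3 * EPS if color == expected else EPS


def letter_emission(state, letter):
    """p(letter | state): assumes the read reports the pair's second letter."""
    return 1 - 3 * XI if letter == state[1] else XI


for state in ("AA", "TT"):
    colors = [round(color_emission(state, c), 4) for c in range(4)]
    letters = [round(letter_emission(state, x), 4) for x in "ACGT"]
    print(state, "colors:", colors, "letters:", letters)
```

States AA and TT print identical color distributions but different letter distributions, which is why the model tracks letter pairs rather than colors alone.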
![Page 24: Tools For HTS Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56814767550346895db4a5b8/html5/thumbnails/24.jpg)
Emission Probabilities
Combining emission probabilities
• probability that this state emitted these reads
E.g. for state CC: $p(E \mid CC) = [(1-3\varepsilon)^{2}\,\varepsilon^{1}]\,[(1-3\xi)^{1}\,\xi^{2}]$
Unknown genome: ..... B6 B7 B8 B9 .....
Color reads: T01020100311223, T1030101311223, T20100311223
Letter reads: ATTGCGCAATGCG, TTGGGCAATGCGA, GCGCACTGCGAC
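A worked version of the state-CC example may make the combined probability concrete. The sketch below multiplies the per-read emission probabilities, treating reads as independent given the state. The observations (colors 0, 1, 0 and letters A, A, C) are chosen for illustration so that two colors and one letter agree with state CC, which matches the exponents 2, 1, 1, 2 in the slide's example. The ε and ξ values are hypothetical, and comparing letters against the pair's second letter is an assumption.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def emission_likelihood(state, colors, letters, eps, xi):
    """Probability that `state` emitted all observed colors and letters at one position."""
    expected_color = CODE[state[0]] ^ CODE[state[1]]
    p = 1.0
    for color in colors:
        p *= (1 - 3 * eps) if color == expected_color else eps
    for letter in letters:
        # Assumption: a letter read is compared with the second letter of the pair.
        p *= (1 - 3 * xi) if letter == state[1] else xi
    return p


eps, xi = 0.01, 0.02           # hypothetical error rates
colors_here = [0, 1, 0]        # illustrative color observations at this position
letters_here = ["A", "A", "C"]  # illustrative letter observations at this position

p_cc = emission_likelihood("CC", colors_here, letters_here, eps, xi)
closed_form = ((1 - 3 * eps) ** 2 * eps) * ((1 - 3 * xi) * xi ** 2)
print(p_cc, closed_form)
```

In practice such products are usually accumulated as sums of logarithms to avoid underflow at high coverage; the plain product is kept here to mirror the slide.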
![Page 25: Tools For HTS Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56814767550346895db4a5b8/html5/thumbnails/25.jpg)
Simple HMM
Summary
• unknown state
o donor pair at location
• transitions
o transition probabilities
• emissions
o reads at location
o emission probabilities
[Figure: state for positions B6 B7 (e.g. AA or AC) emitting color 0 or color 1]
![Page 26: Tools For HTS Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56814767550346895db4a5b8/html5/thumbnails/26.jpg)
Forward-Backward Algorithm
• We have set up a form of an HMM
• run the Forward-Backward algorithm
• get a probability distribution over states at a given position
[Figure: column of possible states (AA, CA, AT, TT, ..., GA, CT, ...) with the likely state highlighted]
• Variation detection: compare the most likely state with the reference:
ref: GCTATCCA
don: ...AT...
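For readers unfamiliar with the Forward-Backward algorithm, the sketch below shows the generic computation on a toy two-state HMM. It is a textbook implementation with per-step rescaling, not VARiD's code. The toy probabilities are hypothetical, and the real model would use the 16 letter-pair states with the emission likelihood of the reads at each position.

```python
def forward_backward(init, trans, emit_lik):
    """Posterior state distribution at every position.

    init[s]        : prior probability of state s at the first position
    trans[r][s]    : probability of moving from state r to state s
    emit_lik[t][s] : probability of the observations at position t given state s
    """
    n, k = len(emit_lik), len(init)

    # Forward pass (rescaled at each step to avoid underflow).
    fwd = []
    for t in range(n):
        if t == 0:
            cur = [init[s] * emit_lik[0][s] for s in range(k)]
        else:
            cur = [emit_lik[t][s] * sum(fwd[t - 1][r] * trans[r][s] for r in range(k))
                   for s in range(k)]
        total = sum(cur)
        fwd.append([c / total for c in cur])

    # Backward pass (also rescaled).
    bwd = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        cur = [sum(trans[s][r] * emit_lik[t + 1][r] * bwd[t + 1][r] for r in range(k))
               for s in range(k)]
        total = sum(cur)
        bwd[t] = [c / total for c in cur]

    # Combine and normalize per position.
    post = []
    for t in range(n):
        cur = [fwd[t][s] * bwd[t][s] for s in range(k)]
        total = sum(cur)
        post.append([c / total for c in cur])
    return post


# Toy example with hypothetical numbers: two states, three positions.
init = [0.5, 0.5]
trans = [[0.7, 0.3], [0.3, 0.7]]
emit_lik = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9]]

post = forward_backward(init, trans, emit_lik)
most_likely = [max(range(len(p)), key=p.__getitem__) for p in post]
print(post)
print(most_likely)
```

Taking the most probable state at each position and comparing it with the reference letters is the variation-detection step described on the slide.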
![Page 27: Tools For HTS Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56814767550346895db4a5b8/html5/thumbnails/27.jpg)
Motivation
Methods
o Simple HMM model: states, emissions, transitions, FB (Forward-Backward)
o Extended HMM model: gaps, diploids, exceptions
Results
Summary
![Page 28: Tools For HTS Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56814767550346895db4a5b8/html5/thumbnails/28.jpg)
Extended HMM
Simple HMM:
• only detects homozygous SNPs
Extended HMM:
• short indels
• heterozygous SNPs
• complex error profiles & quality values
![Page 29: Tools For HTS Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56814767550346895db4a5b8/html5/thumbnails/29.jpg)
Expansion: Gaps and het. SNPs
Expand states
• Have states that include gaps
o emit: gap or color
• Have larger states, for diploids
• Transitions built in similar fashion as before
• Same algorithm, but in all we have 1600 states with very sparse transitions
[Figure: example expanded states: A---, -G, AGTG, T-T-]
![Page 30: Tools For HTS Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56814767550346895db4a5b8/html5/thumbnails/30.jpg)
Expansion
• Emission probabilities
o Support quality values
o Use variable error rates for emissions
• Translate through the first letter
o first color is incorrect
o letter-space signal
Donor: ACAGCATCGGCATCGACTGC
        1123132303123321213
read: > T2123132303123321213
      > C123132303123321213
• Post-process putative SNPs
o uncorrelated adjacent errors may support het SNPs
o check putative SNPs
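The "translate through the first letter" step can be sketched as below. This is an illustration using the XOR form of the color code (assumed numbering: A=0, C=1, G=2, T=3). The function name `translate_first_color` is hypothetical. It consumes the leading primer base and the first color, and replaces them with the first real base of the read.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
LETTER = "ACGT"


def translate_first_color(read):
    """Replace the primer base and first color with the first sequenced base."""
    primer, first_color, rest = read[0], int(read[1]), read[2:]
    first_base = LETTER[CODE[primer] ^ first_color]
    return first_base + rest


read = "T2123132303123321213"
print(translate_first_color(read))  # first base in letter-space, remaining colors unchanged
```

After this step the read starts with a letter-space base from the donor, which gives the model a letter-space signal at that position in addition to the remaining colors.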
![Page 31: Tools For HTS Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56814767550346895db4a5b8/html5/thumbnails/31.jpg)
Summary
[Figure: summary diagram; blue: VARiD steps]
![Page 32: Tools For HTS Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56814767550346895db4a5b8/html5/thumbnails/32.jpg)
Motivation
Methods
Results
Summary
![Page 33: Tools For HTS Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56814767550346895db4a5b8/html5/thumbnails/33.jpg)
Results
• Human dataset from Harismendy et al., 2009 (NA17156, 17275, 17460, 17773)
Color-space dataset:
• Compare random subsets:
o Corona (with AB mapper)
o VARiD (with SHRiMP)
o VARiD (with AB mapper)
Conclusions:
• Using F-measure, the three pipelines perform very similarly.
• High-coverage results are as good as can be achieved
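The comparison above is reported as an F-measure. The slides do not say which variant was used; the sketch below assumes the common balanced F1 score, the harmonic mean of precision and recall over called SNPs. The counts in the example are hypothetical and are not results from the talk.

```python
def f_measure(true_pos, false_pos, false_neg):
    """Balanced F-measure (F1) from call counts; returns 0.0 when undefined."""
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for one pipeline: 90 correct calls, 10 false calls, 30 missed SNPs.
score = f_measure(true_pos=90, false_pos=10, false_neg=30)
print(round(score, 4))
```

A single score like this makes it possible to compare pipelines that trade precision against recall differently.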
![Page 34: Tools For HTS Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56814767550346895db4a5b8/html5/thumbnails/34.jpg)
Results
• Human dataset from Harismendy et al., 2009 (NA17156, 17275, 17460, 17773)
Letter-space dataset:
• Compare random subsets:
o GigaBayes (with Mosaik)
o VARiD (with SHRiMP)
o VARiD (with Mosaik)
Conclusions:
• Using F-measure, the three pipelines perform very similarly.
• High-coverage results are as good as can be achieved
![Page 35: Tools For HTS Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56814767550346895db4a5b8/html5/thumbnails/35.jpg)
Results
VARiD: Combining letter-space and color-space data to achieve increased accuracy in an at-cost comparison
![Page 36: Tools For HTS Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56814767550346895db4a5b8/html5/thumbnails/36.jpg)
Motivation
Methods
Results
Summary
![Page 37: Tools For HTS Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56814767550346895db4a5b8/html5/thumbnails/37.jpg)
Summary
Summary of VARiD
• HMM modeling the underlying donor
• Treats color-space and letter-space together in the same framework
• no translation; takes advantage of each technology’s properties
• accurately calls SNPs and short indels in both color- and letter-space
• improved results with hybrid data
• Website: http://compbio.cs.utoronto.ca/varid (VARiD freely available)
• Contact: [email protected]
![Page 38: Tools For HTS Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56814767550346895db4a5b8/html5/thumbnails/38.jpg)
SAVANT GENOME BROWSER
![Page 39: Tools For HTS Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56814767550346895db4a5b8/html5/thumbnails/39.jpg)
Challenge in Genomic Data Analysis
• genomic data is generated in high volumes
• interpretation and analysis challenge
• typical pipeline employs many separate tools for computation and visualization
![Page 40: Tools For HTS Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56814767550346895db4a5b8/html5/thumbnails/40.jpg)
Tools for HTS data analysis
Tool | Cost | Computation | Visualization
Read Alignment (e.g. Bowtie, BWA) | Free | Y | N
File Format Conversion (e.g. Galaxy, SAMTools) | Free | Y | N
Other Command-line Tools (e.g. Genetic Variation Discovery, Comparative Genomics, etc.) | Free | Y | N
UCSC Genome Browser | Free | N | Y
Integrative Genomics Viewer | Free | N | Y
GBrowse | Free | N | Y
CLC Genomics Workbench | $$$ | Y | Y
• substantial disconnect between the processes of computational analysis and visualization
![Page 41: Tools For HTS Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56814767550346895db4a5b8/html5/thumbnails/41.jpg)
Tools for HTS data analysis
Tool | Cost | Computation | Visualization
Read Alignment (e.g. Bowtie, BWA) | Free | Y | N
File Format Conversion (e.g. Galaxy, SAMTools) | Free | Y | N
Other Command-line Tools (e.g. Genetic Variation Discovery, Comparative Genomics, etc.) | Free | Y | N
UCSC Genome Browser | Free | N | Y
Integrative Genomics Viewer | Free | N | Y
GBrowse | Free | N | Y
CLC Genomics Workbench | $$$ | Y | Y
Savant Genome Browser | Free | Y | Y
• substantial disconnect between the processes of computational analysis and visualization
![Page 42: Tools For HTS Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56814767550346895db4a5b8/html5/thumbnails/42.jpg)
Savant Genome Browser
• platform for integrated visual analysis of genomic data
• feature-rich genome browser
• computationally extensible via plugin framework
![Page 43: Tools For HTS Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56814767550346895db4a5b8/html5/thumbnails/43.jpg)
(Very) Short List of Features
![Page 44: Tools For HTS Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56814767550346895db4a5b8/html5/thumbnails/44.jpg)
FEATURE DEMONSTRATION
• INTERFACE
• HTS READ ALIGNMENTS
• EXAMPLE PLUGINS: SNP FINDER
![Page 45: Tools For HTS Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56814767550346895db4a5b8/html5/thumbnails/45.jpg)
Power of visual analytics
• task: find the correct parameter for a command-line tool
![Page 46: Tools For HTS Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56814767550346895db4a5b8/html5/thumbnails/46.jpg)
Plugin Framework
• unlocks the potential for performing visual analytics
• mutually beneficial for both users and tool developers
o for users: perform complex data analyses on-the-fly within a visual environment
o for programmers: platform for simple development and deployment of various programs
![Page 47: Tools For HTS Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56814767550346895db4a5b8/html5/thumbnails/47.jpg)
CONCLUSIONS
![Page 48: Tools For HTS Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56814767550346895db4a5b8/html5/thumbnails/48.jpg)
Conclusions
• Savant is a platform for integrated visualization and analysis of genomic data
o stand-alone genome browser
o novel features: e.g. table view, visualization modes, data selection, etc.
o computationally extensible through plugin framework
• makes interpretation and analysis of genomic data easier and more efficient
![Page 49: Tools For HTS Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56814767550346895db4a5b8/html5/thumbnails/49.jpg)
Acknowledgements
Recep, Andrew, Vlad, Mike Brudno, Yue, Marc, Vanessa, Orion, Joe, Nilgun, Paul, Vera, Misko, Yoni
![Page 50: Tools For HTS Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56814767550346895db4a5b8/html5/thumbnails/50.jpg)
Questions?
SHRiMP: http://compbio.cs.toronto.edu/shrimp
VARiD: http://compbio.cs.toronto.edu/varid
Savant Genome Browser: http://compbio.cs.toronto.edu/savant