UNCLASSIFIED
IdPrism: Rapid Analysis of Forensic DNA Samples Using MPS SNP Profiles
Darrell O. Ricke, PhD, James Watkins, Philip Fremont-Smith, & Adam Michaleas
2019 IEEE HPEC
Sept. 25, 2019
DISTRIBUTION STATEMENT A. Approved for public release. Distribution is unlimited.
IdPrism - 2D. O. Ricke 09/25/19
Disclaimer
DISTRIBUTION STATEMENT A. Approved for public release. Distribution is unlimited.
This material is based upon work supported under Air Force Contract No. FA8702-15-D-0001 by the Defense Threat Reduction Agency (DTRA). Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the U.S. Air Force or Defense Threat Reduction Agency.
© 2019 Massachusetts Institute of Technology.
Delivered to the U.S. Government with Unlimited Rights, as defined in DFARS Part 252.227-7013 or 7014 (Feb 2014). Notwithstanding any copyright notice, U.S. Government rights in this work are defined by DFARS 252.227-7013 or DFARS 252.227-7014 as detailed above. Use of this work other than as specifically authorized by the U.S. Government may violate any copyrights that exist in this work.
The authors have no competing financial interest to disclose.
Overview
Introduction to DNA Forensics
Computational Bottlenecks
• SNP allele calling
• Identification Searching & Mixture Analysis
• Statistics – Probability of Random Man Not Excluded
Core DNA Forensics Concepts
Cells contain 23 pairs of chromosomes
Locus: Specific position on a chromosome
Nucleotide (or base): One of 4 residues that make up the DNA polymer (A, C, G or T)
[Figure: chromosomes with a locus marked; sex chromosomes XX (female) and XY (male)]
Core DNA Forensics Concepts
Allele: One of many DNA sequences that may occupy a locus
Short Tandem Repeat (STR): Variable number of repeat units (AATG repeating sequence)
Allele 1: … C C T A A A T G A A T G A A T G A A T G A A T G A A T G C G A G …
Allele 2: … C C T A A A T G A A T G A A T G A A T G C G A G …
Single Nucleotide Polymorphism (SNP): Difference in one nucleotide (A/G single nucleotide difference)
Allele 3: …CGAGCCTAACGAGCCTA…
Allele 4: …CGAGCCTAGCGAGCCTA…
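As a small illustration of the two marker types, the sketch below counts tandem AATG repeat units in the two STR alleles and locates the single differing base between the two SNP alleles. The function names are hypothetical, the sequences are the ones shown on the slide with spaces removed, and the pairing of allele labels with sequences assumes the slide lists them in order.

```python
def count_tandem_repeats(sequence: str, unit: str) -> int:
    """Return the longest run of back-to-back copies of `unit` in `sequence`."""
    best = 0
    for start in range(len(sequence)):
        run = 0
        pos = start
        # Walk forward one repeat unit at a time while the unit keeps matching.
        while sequence.startswith(unit, pos):
            run += 1
            pos += len(unit)
        best = max(best, run)
    return best


def differing_positions(a: str, b: str) -> list:
    """Return the 0-based positions where two equal-length sequences differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Sequences from the slide (assumed label order: Allele 1, 2, 3, 4).
str_allele_1 = "CCTAAATGAATGAATGAATGAATGAATGCGAG"
str_allele_2 = "CCTAAATGAATGAATGAATGCGAG"
snp_allele_3 = "CGAGCCTAACGAGCCTA"
snp_allele_4 = "CGAGCCTAGCGAGCCTA"

print(count_tandem_repeats(str_allele_1, "AATG"))       # 6 repeat units
print(count_tandem_repeats(str_allele_2, "AATG"))       # 4 repeat units
print(differing_positions(snp_allele_3, snp_allele_4))  # [8], the A/G position
```

STR alleles differ in length (repeat count), whereas SNP alleles have the same length and differ at one base, which is why SNPs lend themselves to the one-bit encoding used later in the talk.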
SNP Sequencing To Meet Requirements
[Table comparing STR Sizing and SNP Sequencing against the requirements below; cell values not preserved]
Requirements:
• Human ID
• Multi-contributor samples
• Extract unknown profiles
• Extended kinship
• Touch samples
• Biogeographic ancestry
• Appearance
Converting Biological Signatures to Digital ‘Barcodes’
Determine Presence of Low Frequency SNPs Across Genome
[Figure: population vs. 1 person at an A/G SNP; allele frequencies 95% and 5%]
Presence of minor allele indicated by line (1 bit) at SNP Position
Visualize as Individual SNP ‘barcode’
Selection of rare SNPs creates unique minor allele signatures/barcodes for individuals & enables effective differentiation of multiple barcodes in a mixture
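The one-bit-per-SNP barcode idea can be sketched as follows. This is a minimal illustration, assuming a profile is given as a list of per-SNP flags (1 when a minor allele is present); the function names and the text rendering are hypothetical and are not taken from IdPrism.

```python
def make_barcode(minor_allele_flags) -> int:
    """Pack per-SNP minor-allele presence flags into one integer (bit i = SNP i)."""
    barcode = 0
    for i, present in enumerate(minor_allele_flags):
        if present:
            barcode |= 1 << i
    return barcode


def render_barcode(barcode: int, n_snps: int) -> str:
    """Draw a line ('|') at each SNP position that carries a minor allele."""
    return "".join("|" if (barcode >> i) & 1 else "." for i in range(n_snps))


# Hypothetical 8-SNP profile with minor alleles at SNP positions 1 and 4.
person = make_barcode([0, 1, 0, 0, 1, 0, 0, 0])
print(render_barcode(person, 8))  # .|..|...
```

Because the selected SNPs are rare, most bits are zero for any one person, so different individuals tend to set different bits.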
SNP Allele Calling and Sequencing Dynamic Range
Massively Parallel DNA Sequencing
• Thousands of SNP loci sequenced in parallel
• Thousands of sequence reads per locus
• 100 million sequence reads per experiment
• Output: for each SNP, the total counts of major and minor allele reads in the sample
[Figure: read pileup (Read # 1 to 1000) at example SNP locus rs1000337; major allele ‘C’, minor allele ‘T’]
Minor allele fraction: $\mathrm{mAF} = \dfrac{\#\ \text{minor allele reads}}{\#\ \text{major allele reads}}$
Example: $\mathrm{mAF} = \dfrac{4}{1000} = 0.004$
Thousands of sequence reads at each SNP locus enable sensitive detection of minor contributors
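The mAF calculation can be sketched in a few lines. This follows the ratio as written on the slide (minor allele reads over major allele reads); the function name and the pileup representation are assumptions made for illustration.

```python
from collections import Counter


def minor_allele_fraction(minor_reads: int, major_reads: int) -> float:
    """mAF as written on the slide: minor allele reads / major allele reads."""
    if major_reads <= 0:
        raise ValueError("major_reads must be positive")
    return minor_reads / major_reads


# Hypothetical pileup at one SNP: the base each read shows at the SNP position.
pileup = ["C"] * 1000 + ["T"] * 4
counts = Counter(pileup)

maf = minor_allele_fraction(counts["T"], counts["C"])
print(maf)  # 0.004
```

With thousands of reads per locus, even a handful of minor allele reads yields a measurable, non-zero fraction, which is what makes low-level contributors detectable.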
Problem #1 - SNP Calling Overview
Problem Complexity: O(N x L x M)
• 100 million sequences – O(N)
• 200 base pair read lengths – O(L)
• 2.5k to 15k target SNP loci – O(M)
Current Standard pipelines:
• Align HTS sequences to human reference genome
• SNP call aligned reads
• One to many hours on larger Linux servers
New Approach: O(N x L)
• Identify target loci by sequence tags
• Allow shared tags to map to 2+ loci
• Lock in target SNPs with flanking sequences
[Figure: example rs142 HTS reads (forward strand), annotated with barcode, tag, 5’ lock, SNP, and 3’ lock]
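The tag-and-lock idea can be sketched as a hash lookup per read window, which avoids comparing every read against every locus. This is a simplified illustration under stated assumptions, not the authors' implementation: the locus record fields (`name`, `tag`, `lock5`, `lock3`), the fixed tag length, and the example sequences are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical locus definition: a tag that identifies the locus, plus the
# 5' and 3' lock sequences that must directly flank the SNP base.
LOCI = [
    {"name": "locus_1", "tag": "ACGTAC", "lock5": "GGAT", "lock3": "TTCA"},
]


def build_tag_index(loci):
    """Map each tag to every locus that uses it (a shared tag maps to 2+ loci)."""
    index = defaultdict(list)
    for locus in loci:
        index[locus["tag"]].append(locus)
    return index


def call_read(read, index, tag_len):
    """Return {locus name: SNP base} for loci whose tag and both locks match."""
    calls = {}
    for i in range(len(read) - tag_len + 1):
        # One dictionary lookup per window, independent of the number of loci.
        for locus in index.get(read[i:i + tag_len], ()):
            lock_start = read.find(locus["lock5"])
            if lock_start < 0:
                continue
            snp_pos = lock_start + len(locus["lock5"])
            # The 3' lock must start immediately after the SNP base.
            if read.startswith(locus["lock3"], snp_pos + 1):
                calls[locus["name"]] = read[snp_pos]
    return calls


def count_alleles(reads, loci, tag_len):
    """Tally the observed SNP base per locus across all reads."""
    index = build_tag_index(loci)
    counts = defaultdict(Counter)
    for read in reads:
        for name, base in call_read(read, index, tag_len).items():
            counts[name][base] += 1
    return counts


reads = [
    "ACGTACGGATCTTCA",  # tag + 5' lock + C + 3' lock
    "ACGTACGGATTTTCA",  # same locus, T allele
    "ACGTACGGATCAAAA",  # 3' lock missing, so no call
    "TTTTTTTTTTTTTTT",  # no tag, so the read is skipped
]
result = count_alleles(reads, LOCI, tag_len=6)
print(dict(result["locus_1"]))
```

The per-locus counts produced this way are the major and minor allele read totals that feed the allele fraction calculation.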
SNP Allele Calling Runtime Comparisons
Statistical Power of Large SNP Panel Sequencing
P(False Match) is a function of the total # of mixture major allele SNPs (N)
Let p be the minor allele frequency
Let q be the major allele frequency (q = 1 – p)
Let L be the number of allele mismatches (enables tolerance for incomplete profiles)
$$P_{RMNE}(0 \text{ to } L) = \sum_{\ell=0}^{L} q^{2N} \cdot \frac{N!}{\ell!\,(N-\ell)!} \cdot \left(\frac{1-q^{2}}{q^{2}}\right)^{\ell}$$
where $\frac{N!}{\ell!\,(N-\ell)!}$ is Combination(N, ℓ)
Slide annotations: Probability of a minor allele at SNP i; Probability of n contributors NOT having a minor allele at SNP i; $P_{RMNE}(L + 1)$
[Figure: −log10 P(RMNE) vs. SNP Frequency (p_i) for an 8 Person Mixture, with curves for 500 SNPs and 1,000 SNPs]
Increased statistical power with more SNPs
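A direct evaluation of the P(RMNE) sum can be sketched as below. Each term is written as Combination(N, ℓ) · (1 − q²)^ℓ · (q²)^(N − ℓ), which is algebraically the same as q^(2N) · Combination(N, ℓ) · ((1 − q²)/q²)^ℓ. The sketch assumes a single major allele frequency q shared by all N SNPs; the function name and the example values are hypothetical.

```python
from math import comb


def p_rmne(q: float, n_snps: int, max_mismatches: int) -> float:
    """Sum the binomial terms for 0..max_mismatches mismatching SNPs.

    q              major allele frequency (assumed equal for every SNP)
    n_snps         N, the number of mixture major allele SNPs
    max_mismatches L, the number of tolerated allele mismatches
    """
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in (0, 1]")
    q2 = q * q  # probability that a random person has no minor allele at a SNP
    return sum(
        comb(n_snps, l) * (1.0 - q2) ** l * q2 ** (n_snps - l)
        for l in range(max_mismatches + 1)
    )


# Hypothetical values: q = 0.95, no mismatches tolerated.
for n in (500, 1000):
    print(n, p_rmne(0.95, n, 0))
```

With L = 0 the sum reduces to q^(2N), so doubling the number of SNPs squares the probability, which is the sense in which more SNPs give more statistical power. For very large N, a log-space evaluation would be more robust than this direct form.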
Mixture Analysis Runtime Comparisons
FastID algorithm: mismatches = popcount((reference XOR mixture) AND reference)
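The FastID expression can be tried out directly on integer bitmasks. In this minimal sketch each profile is a Python integer with one bit per SNP (bit set when a minor allele is present); the function names and the example bit patterns are hypothetical.

```python
def popcount(x: int) -> int:
    """Count the set bits in a non-negative integer."""
    return bin(x).count("1")


def fastid_mismatches(reference: int, mixture: int) -> int:
    """mismatches = popcount((reference XOR mixture) AND reference).

    XOR marks the SNPs where the two profiles differ; AND with the reference
    keeps only the reference minor alleles that are absent from the mixture.
    """
    return popcount((reference ^ mixture) & reference)


mixture = 0b01011010          # hypothetical mixture profile
contributor = 0b00010010      # every minor allele is present in the mixture
non_contributor = 0b00100010  # one minor allele is missing from the mixture

print(fastid_mismatches(contributor, mixture))      # 0
print(fastid_mismatches(non_contributor, mixture))  # 1
```

Extra minor alleles in the mixture (from other contributors) do not count against a reference, so a true contributor scores zero or near-zero mismatches regardless of how many other people are in the mixture.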
IdPrism: Advanced DNA Forensics Platform
The IdPrism Platform architecture for DNA analysis addresses current capability gaps within an extensible and scalable framework
[Figure: IdPrism System pipeline from Evidence through Lincoln Bioinformatics; timing labels: 5 minutes, 5 minutes]
Evidence
Sample Preparation
• Extract DNA
• Screen & normalize samples
• Extract/copy markers
Sequencing Platform
• Proton
• Illumina
• Other…
Lincoln Bioinformatics
Profile Generation
• SNP Allele Caller
Analysis Modules
• Saturation
• Kinship
• Mixture Analysis
• 1:1 ID
• Trace
User Interface
• Browse & Search
• Annotate
• Mixtures Investigation
• Allele Plots
• Network
Best in class for
• Performance
• Speed
• Scalability
Phase I Results: Finding Known References in DNA Mixture
Mixture Analysis Approach: Isaacson et al. FSI Genetics (2015)
[Figure: Reference vs. Evidence (Mixture) SNP barcodes, SNP positions 1 to 400; Lab Equimolar 6-Person Mixture with references A to F; P(RMNE)<2.1e-54]
Individual   P(RMNE)
A            2.8e-54
B            2.1e-54
C            2.1e-54
D            2.1e-54
E            2.1e-54
F            2.7e-54
Demonstrates MIT LL SNP approach can identify 6+ contributors in complex mixtures
Finding Known References in DNA Mixture
Mixture Analysis Approach: Isaacson et al. FSI Genetics (2015)
[Figure: Reference vs. Evidence (Mixture) SNP barcodes, SNP positions 1 to 400, for three lab mixtures]
• Lab 6-Person Mixture (references A to F): 6/6 detected, P(RMNE)<2.1e-54
• Lab 10-Person Mixture (references A to J): 10/10 detected
• Lab 15-Person Mixture (references A to L shown): 12/15 detected
P(RMNE) ranges shown for the larger mixtures: 5e-10 to 5e-21; 2.2e-8 to 4.6e-24
Saturated Mixtures
Demonstrates MIT LL SNP approach can identify 10+ contributors in complex mixtures
References
• Voskoboinik & Darvasi “Forensic Identification of an Individual in Complex DNA Mixtures” Forensic Sci. Int. Genet. 5:428-435 (2011)
• Isaacson et al. “Robust detection of individual forensics profiles in DNA mixtures” Forensic Sci. Int. Genet. 14:31-7 (2015)
• Ricke et al. “GrigoraSNPs: Optimized HTS DNA forensic SNP Analysis” Journal of Forensic Sciences 63:1841-1845 (2018)
• Ricke “FastID: Extremely Fast Forensic DNA Comparisons” IEEE HPEC (2017)
• Ricke & Schwartz “Fast P(RMNE): Fast Forensic DNA Probability of Random Man Not Excluded Calculation” F1000Research (2017)
• Ricke et al. “Estimating Individual Contributions to Complex DNA SNP Mixtures” Journal of Forensic Sciences (2019)
• Patent application: DNA Mixtures from One or More Sources and Methods of Building Individual Profiles Therefrom (US 62/534,590; PCT/US2018/041081; WO 2019/010410)