UNCLASSIFIED

IdPrism: Rapid Analysis of Forensic DNA Samples Using MPS SNP Profiles

Darrell O. Ricke, PhD, James Watkins, Philip Fremont-Smith, & Adam Michaleas

2019 IEEE HPEC

Sept. 25, 2019

DISTRIBUTION STATEMENT A. Approved for public release. Distribution is unlimited.


Page 2: Disclaimer

DISTRIBUTION STATEMENT A. Approved for public release. Distribution is unlimited.

This material is based upon work supported under Air Force Contract No. FA8702-15-D-0001 by the Defense Threat Reduction Agency (DTRA). Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the U.S. Air Force or Defense Threat Reduction Agency.

© 2019 Massachusetts Institute of Technology.

Delivered to the U.S. Government with Unlimited Rights, as defined in DFARS Part 252.227-7013 or 7014 (Feb 2014). Notwithstanding any copyright notice, U.S. Government rights in this work are defined by DFARS 252.227-7013 or DFARS 252.227-7014 as detailed above. Use of this work other than as specifically authorized by the U.S. Government may violate any copyrights that exist in this work.

The authors have no competing financial interest to disclose.

Page 3: Overview

Introduction to DNA Forensics

Computational Bottlenecks
• SNP allele calling
• Identification Searching & Mixture Analysis
• Statistics – Probability of Random Man Not Excluded

Page 4: Core DNA Forensics Concepts

Cells contain 23 pairs of chromosomes

Locus: Specific position on a chromosome

Nucleotide (or base): One of 4 residues that make up the DNA polymer (A, C, G or T)

[Figure: chromosome pairs with a locus marked; sex chromosomes XX (female) and XY (male)]

Page 5: Core DNA Forensics Concepts

Allele: One of many DNA sequences that may occupy a locus

Short Tandem Repeat (STR): Variable number of repeat units

Allele 1: … C C T A A A T G A A T G A A T G A A T G A A T G A A T G C G A G …

Allele 2: … C C T A A A T G A A T G A A T G A A T G C G A G …

AATG repeating sequence

Single Nucleotide Polymorphism (SNP): Difference in one nucleotide

Allele 3: …CGAGCCTAACGAGCCTA…

Allele 4: …CGAGCCTAGCGAGCCTA…

A/G single nucleotide difference
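The two allele types on this slide can be told apart with a few lines of code. The sketch below is illustrative only: the function names are hypothetical, and the sequences are the slide's example fragments (the STR alleles written as a flank, the AATG unit repeated, and a flank). It counts the longest run of a repeat unit for the STR alleles and lists the differing positions for the SNP alleles.

```python
def count_tandem_repeats(sequence, unit):
    """Return the longest run of back-to-back copies of `unit` in `sequence`."""
    best = 0
    for start in range(len(sequence)):
        run = 0
        pos = start
        # Extend the run while another copy of the unit follows immediately.
        while sequence.startswith(unit, pos):
            run += 1
            pos += len(unit)
        best = max(best, run)
    return best


def snp_differences(allele_a, allele_b):
    """Return (0-based position, base_a, base_b) for every differing base."""
    if len(allele_a) != len(allele_b):
        raise ValueError("alleles must have equal length for a base-by-base comparison")
    return [(i, a, b) for i, (a, b) in enumerate(zip(allele_a, allele_b)) if a != b]


# STR example fragments: same flanks, different number of AATG units.
STR_ALLELE_1 = "CCTA" + "AATG" * 6 + "CGAG"
STR_ALLELE_2 = "CCTA" + "AATG" * 4 + "CGAG"

# SNP example fragments: identical except for one base.
SNP_ALLELE_3 = "CGAGCCTAACGAGCCTA"
SNP_ALLELE_4 = "CGAGCCTAGCGAGCCTA"

str_1_repeats = count_tandem_repeats(STR_ALLELE_1, "AATG")
str_2_repeats = count_tandem_repeats(STR_ALLELE_2, "AATG")
snp_sites = snp_differences(SNP_ALLELE_3, SNP_ALLELE_4)
```

With these fragments the STR alleles differ in length (6 versus 4 repeat units), while the SNP alleles have the same length and differ at a single A/G position.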

Page 6: SNP Sequencing To Meet Requirements

[Table: columns Requirements, STR Sizing, SNP Sequencing; per-cell marks not recoverable]

Requirements
• Human ID
• Multi-contributor samples
• Extract unknown profiles
• Extended kinship
• Touch samples
• Biogeographic ancestry
• Appearance

Page 7: Converting Biological Signatures to Digital ‘Barcodes’

Determine Presence of Low Frequency SNPs Across Genome

[Figure: population allele split of 95% / 5% at an example A/G SNP; 1 person's SNPs visualized as an individual SNP ‘barcode’]

Presence of minor allele indicated by line (1 bit) at SNP Position

Selection of rare SNPs creates unique minor allele signatures/barcodes for individuals & enables effective differentiation of multiple barcodes in a mixture
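The 1-bit-per-SNP barcode idea can be sketched as a bit vector. This is a minimal illustration, not the IdPrism encoding: the panel IDs, function names, and bit order are assumptions made for the example.

```python
def encode_barcode(panel, minor_allele_snps):
    """Set bit i when the profile carries the minor allele at panel[i]."""
    barcode = 0
    for i, snp_id in enumerate(panel):
        if snp_id in minor_allele_snps:
            barcode |= 1 << i
    return barcode


def render_barcode(barcode, n_snps):
    """Draw the barcode as text: '|' marks a minor allele, '.' marks none."""
    return "".join("|" if (barcode >> i) & 1 else "." for i in range(n_snps))


# Hypothetical 8-SNP panel; a real panel would hold thousands of loci.
PANEL = ["snp01", "snp02", "snp03", "snp04", "snp05", "snp06", "snp07", "snp08"]

person_a = encode_barcode(PANEL, {"snp02", "snp07"})
person_b = encode_barcode(PANEL, {"snp04"})

# A mixture shows a minor allele wherever any contributor has one.
mixture = person_a | person_b

picture_a = render_barcode(person_a, len(PANEL))
picture_mix = render_barcode(mixture, len(PANEL))
```

Because the selected SNPs are rare, each person sets only a few bits, so several barcodes can be OR-ed into a mixture before the bit vector fills up.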

Page 8: SNP Allele Calling and Sequencing Dynamic Range

Massively Parallel DNA Sequencing

• Thousands of SNP loci sequenced in parallel
• Thousands of sequence reads per locus
• 100 million sequence reads per experiment
• Output: for each SNP, the total counts of major and minor allele reads in the sample

[Figure: reads 1 to 1000 at example SNP locus rs1000337, showing major allele ‘C’ and minor allele ‘T’ base calls]

$$\text{Minor allele fraction (mAF)} = \frac{\#\ \text{minor allele reads}}{\#\ \text{major allele reads}}$$

Ex. SNP locus rs1000337:

$$\text{mAF} = \frac{4}{1000} = 0.004$$

Thousands of sequence reads at each SNP locus enable sensitive detection of minor contributors
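The minor allele fraction calculation can be sketched as follows. The function name and the input layout (one base call per read at a single SNP) are assumptions for illustration. The ratio follows the slide as written, minor allele reads over major allele reads, and the example counts are chosen to reproduce the slide's 4/1000.

```python
from collections import Counter


def minor_allele_fraction(base_calls, major, minor):
    """Return (# minor allele reads) / (# major allele reads) at one SNP."""
    counts = Counter(base_calls)
    if counts[major] == 0:
        raise ValueError("no major allele reads; the ratio is undefined")
    return counts[minor] / counts[major]


# Hypothetical base calls at one SNP: 1000 'C' reads and 4 'T' reads.
calls = ["C"] * 1000 + ["T"] * 4
maf = minor_allele_fraction(calls, major="C", minor="T")
```

With thousands of reads per locus, a fraction this small is still backed by several independent reads, which is what makes minor contributors detectable.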

Page 9: Overview

Introduction to DNA Forensics

Computational Bottlenecks
• SNP allele calling
• Identification Searching & Mixture Analysis
• Statistics – Probability of Random Man Not Excluded

Page 10: Problem #1 - SNP Calling Overview

Problem Complexity: O(N x L x M)
• 100 million sequences – O(N)
• 200 base pair read lengths – O(L)
• 2.5k to 15k target SNP loci – O(M)

Current Standard pipelines:
• Align HTS sequences to human reference genome
• SNP call aligned reads
• One to many hours on larger Linux servers

New Approach: O(N x L)
• Identify target loci by sequence tags
• Allow shared tags to map to 2+ loci
• Lock in target SNPs with flanking sequences

[Figure: example rs142 HTS reads (forward strand), annotated with barcode, tag, 5’ lock, SNP, and 3’ lock regions]
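The tag-and-lock idea can be sketched with a hash lookup in place of genome alignment. This is a toy illustration under stated assumptions, not the authors' implementation: the panel, tag length, lock sequences, and function names are all hypothetical. Each read is scanned once for a known tag, a tag may map to more than one locus, and a call is made only when both flanking locks surround the SNP base.

```python
from collections import Counter, defaultdict

TAG_LEN = 6  # hypothetical tag length, kept short for the example

# Hypothetical panel: locus -> (tag, 5' lock, 3' lock).
PANEL = {
    "locusA": ("ACGTAC", "GGA", "TTC"),
    "locusB": ("ACGTAC", "CCA", "GAT"),  # shares its tag with locusA
    "locusC": ("TTGACA", "AGG", "CTA"),
}


def build_tag_index(panel):
    """Map each tag to every locus that uses it (shared tags allowed)."""
    index = defaultdict(list)
    for locus, (tag, lock5, lock3) in panel.items():
        index[tag].append((locus, lock5, lock3))
    return index


def call_read(read, index, tag_len=TAG_LEN):
    """Return (locus, base) for the first locked SNP found in the read, else None."""
    for i in range(len(read) - tag_len + 1):
        candidates = index.get(read[i:i + tag_len])  # O(1) average lookup
        if not candidates:
            continue
        rest = read[i + tag_len:]
        for locus, lock5, lock3 in candidates:
            j = rest.find(lock5)
            if j < 0:
                continue
            snp_pos = j + len(lock5)
            # The 3' lock must start right after the SNP base.
            if rest[snp_pos + 1:snp_pos + 1 + len(lock3)] == lock3:
                return locus, rest[snp_pos]
    return None


def count_alleles(reads, panel=PANEL):
    """Tally the called base per locus across all reads."""
    index = build_tag_index(panel)
    counts = defaultdict(Counter)
    for read in reads:
        hit = call_read(read, index)
        if hit:
            locus, base = hit
            counts[locus][base] += 1
    return counts


reads = [
    "NNACGTACTTGGACTTCNN",  # locusA, SNP base C
    "NNACGTACTTGGACTTCNN",  # locusA again
    "ACGTACAACCATGATGG",    # same tag, but the locks select locusB, SNP base T
    "TTGACAAGGACTA",        # locusC, SNP base A
    "GGGGGGGGGG",           # no tag, ignored
]
allele_counts = count_alleles(reads)
```

Because the tag lookup does not loop over loci, the work per read depends on read length rather than panel size, which is the intuition behind dropping the M factor.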

Page 11: SNP Allele Calling Runtime Comparisons

Page 12: Statistical Power of Large SNP Panel Sequencing

P(False Match) is a function of the total # of mixture major allele SNPs (N)

Let p be the minor allele frequency
Let q be the major allele frequency (q = 1 – p)
Let L be the number of allele mismatches (enables tolerance for incomplete profiles)

Probability of a minor allele at SNP i

Probability of n contributors NOT having a minor allele at SNP i

$$P_{RMNE}(0\ \text{to}\ L) = \sum_{k=0}^{L} q^{2N} \cdot \frac{N!}{k!\,(N-k)!} \cdot \left(\frac{1-q^{2}}{q^{2}}\right)^{k}$$

Here k runs over the allowed mismatch counts 0 to L, and the factor N!/(k!(N−k)!) is Combination(N, k).

$P_{RMNE}(L+1)$

[Figure: −log10 P(RMNE) versus SNP Frequency (pi) for an 8 Person Mixture, with curves for 500 SNPs and 1,000 SNPs]

Increased statistical power with more SNPs
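The P(RMNE) sum can be evaluated directly or term by term. The sketch below assumes a single average major allele frequency q for all N mixture major allele SNPs, as the slide's formula does. The function names are hypothetical, and the iterative version is one plausible way to reuse each term for the next, not necessarily the authors' method.

```python
from math import comb, isclose


def p_rmne_direct(n, max_mismatches, q):
    """Sum q^(2N) * C(N, k) * ((1 - q^2) / q^2)^k for k = 0..max_mismatches."""
    q2 = q * q
    ratio = (1 - q2) / q2
    return sum(q2 ** n * comb(n, k) * ratio ** k for k in range(max_mismatches + 1))


def p_rmne_iterative(n, max_mismatches, q):
    """Same sum, deriving each term from the previous one (no factorials)."""
    q2 = q * q
    ratio = (1 - q2) / q2
    term = q2 ** n  # k = 0 term
    total = term
    for k in range(max_mismatches):
        # term(k + 1) = term(k) * (N - k) / (k + 1) * ratio
        term *= (n - k) / (k + 1) * ratio
        total += term
    return total


# Illustrative inputs: q = 0.95 (p = 0.05), up to 3 mismatches tolerated.
p_500 = p_rmne_iterative(500, 3, 0.95)
p_1000 = p_rmne_iterative(1000, 3, 0.95)
```

With these inputs `p_1000` is far smaller than `p_500`, in line with the slide's point that more SNPs give more statistical power. For very large N the direct version can overflow when the binomial coefficient is converted to a float, which the iterative version avoids.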

Page 13: Mixture Analysis Runtime Comparisons

FastID algorithm: mismatches = popcount((reference XOR mixture) AND reference)
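The FastID expression can be tried on small bit vectors. In this sketch, profiles are Python integers with one bit per SNP (1 = minor allele present). The helper names, sample data, and threshold are assumptions for illustration; only the XOR/AND/popcount expression comes from the slide.

```python
def popcount(x):
    """Count the set bits of a non-negative integer."""
    return bin(x).count("1")


def fastid_mismatches(reference, mixture):
    """Count minor alleles present in the reference but absent from the mixture."""
    return popcount((reference ^ mixture) & reference)


def search(references, mixture, max_mismatches=0):
    """Return the names of references within the mismatch tolerance."""
    return [
        name
        for name, bits in references.items()
        if fastid_mismatches(bits, mixture) <= max_mismatches
    ]


# Hypothetical 8-SNP profiles.
references = {
    "A": 0b01000010,
    "B": 0b00001000,
    "C": 0b10010001,
}
mixture = 0b01001010  # contains every minor allele of A and B

hits = search(references, mixture)
one_off = fastid_mismatches(0b1011, 0b1110)
```

XOR marks the positions where the two profiles differ, and the AND with the reference keeps only the differences where the reference has a minor allele. Extra minor alleles contributed by other people in the mixture therefore do not count against a reference.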

Page 14: IdPrism: Advanced DNA Forensics Platform

The IdPrism Platform architecture for DNA analysis addresses current capability gaps within an extensible and scalable framework

[Figure: pipeline from Evidence through Sample Preparation and Sequencing Platform into the IdPrism System (Lincoln Bioinformatics); two stages are labeled 5 minutes each]

Sample Preparation
• Extract DNA
• Screen & normalize samples
• Extract/copy markers

Sequencing Platform
• Proton
• Illumina
• Other…

IdPrism System

Profile Generation
• SNP Allele Caller

Analysis Modules
• 1:1 ID
• Mixture Analysis
• Kinship
• Saturation
• Trace

User Interface
• Browse & Search
• Annotate
• Mixtures Investigation
• Allele Plots
• Network

Best in class for
• Performance
• Speed
• Scalability

Page 15: Phase I Results: Finding Known References in DNA Mixture

Mixture Analysis Approach: Isaacson et al. FSI Genetics (2015)

[Figure: Reference versus Evidence (Mixture) comparison, axis ticks 1, 100, 200, 300, 400; Lab Equimolar 6-Person Mixture with contributors A to F; P(RMNE)<2.1e-54]

Individual   P(RMNE)
A            2.8e-54
B            2.1e-54
C            2.1e-54
D            2.1e-54
E            2.1e-54
F            2.7e-54

Demonstrates MIT LL SNP approach can identify 6+ contributors in complex mixtures

Page 16: Finding Known References in DNA Mixture

Mixture Analysis Approach: Isaacson et al. FSI Genetics (2015)

[Figure: Reference versus Evidence (Mixture) comparisons, axis ticks 1, 100, 200, 300, 400, with contributors labeled by letter in each panel]

Panels
• Lab 6-Person Mixture (6/6 detected)
• Lab 10-Person Mixture (10/10 detected)
• Lab 15-Person Mixture (12/15 detected)

P(RMNE) values shown: P(RMNE)<2.1e-54; P(RMNE) in 5e-10 to 5e-21; 2.2e-8 to 4.6e-24

Saturated Mixtures

Demonstrates MIT LL SNP approach can identify 10+ contributors in complex mixtures

Page 17: References

• Voskoboinik & Darvasi “Forensic Identification of an Individual in Complex DNA Mixtures” Forensic Sci. Int. Genet. 5:428-435 (2011)

• Isaacson et al. “Robust detection of individual forensic profiles in DNA mixtures” Forensic Sci. Int. Genet. 14:31-37 (2015)

• Ricke et al. “GrigoraSNPs: Optimized HTS DNA forensic SNP Analysis” Journal of Forensic Sciences 63:1841-1845 (2018)

• Ricke “FastID: Extremely Fast Forensic DNA Comparisons” IEEE HPEC (2017)

• Ricke & Schwartz “Fast P(RMNE): Fast Forensic DNA Probability of Random Man Not Excluded Calculation” F1000Research (2017)

• Ricke et al. “Estimating Individual Contributions to Complex DNA SNP Mixtures” Journal of Forensic Sciences (2019)

• Patent application: DNA Mixtures from One or More Sources and Methods of Building Individual Profiles Therefrom
  US 62/534,590, PCT/US2018/041081, WO 2019/010410