What is necessary (and unnecessary) for analyses of offender databases
Forensic Bioinformatics (www.bioforensics.com)
Jason R. Gilder
August 16, 2008
Offender databases
• Originally designed for convicted offenders
  – CODIS: Convicted Offender DNA Index System
• Expanded
  – Unsolved crime samples
  – Arrestees
  – Elimination profiles
CODIS
• COmbined DNA Index System
  – National: NDIS
  – State: SDIS - fewer restrictions
  – Local: LDIS - fewest restrictions
• Convicted Offender Profiles in NDIS: 6,031,000
• Forensic Profiles in NDIS: 225,400
• More than 71,800 cold hits
Why analyze a database?
• Questions remain regarding the weight of a DNA database match
  – Random Match Probability (RMP)
  – Database Match Probability (DMP)
  – Balding & Donnelly LR
  – Other
• Composition of database may affect chance of a coincidental match
  – Presence of relatives
Structure of a DNA database
• Collection of records
• Structured Query Language (SQL) format
ID#    Fname  Lname  Pop  SSN          Date     D3      vWA     FGA     …  D7
AC937  John   Doe    CAU  283-24-4300  5/2/02   13, 15  16, 16  21, 23     11, 14
BQ384  Jane   Doe    HIS  365-78-3472  7/23/03  12, 17  15, 19  25, 25     10, 10
BZ927  Frank  Smith  AA   312-55-1476  2/9/06   13, 15  14, 15  24, 26     12, 16
Examples of possible issues with the use of DNA databases
• Michigan v. Gary Leiterman
  – Evidence: blood found on victim’s hand
  – Cold hit to a 4-year-old boy
• R v. Sean Hoey
  – Evidence: explosive device
  – Cold hit to a 14-year-old boy
• Jaidyn Leskie inquest (Australia)
  – Evidence: clothing from deceased
  – Cold hit to a rape victim
Lab error and false cold hits
How a database can be analyzed
• Perform all pairwise profile comparisons
  – the “Arizona Search”
• P1 with P2, P1 with P3, P1 with P4, …, P1 with Pn
• P2 with P3, P2 with P4, P2 with P5, …, P2 with Pn
• Analyze profile similarity
  – Count number of matching loci and alleles
  – Perform kinship analyses
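The all-against-all comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four profiles and the `compare` and `pairwise_search` helpers are hypothetical, and a locus is counted as "matching" only when both alleles are shared, which is one possible definition rather than the one any particular lab uses.

```python
from collections import Counter
from itertools import combinations

# Hypothetical anonymized profiles: one genotype (a pair of alleles) per locus.
# Locus names follow the example record layout; the allele values are made up.
LOCI = ["D3", "vWA", "FGA", "D7"]
profiles = {
    "P1": {"D3": (13, 15), "vWA": (16, 16), "FGA": (21, 23), "D7": (11, 14)},
    "P2": {"D3": (12, 17), "vWA": (15, 19), "FGA": (25, 25), "D7": (10, 10)},
    "P3": {"D3": (13, 15), "vWA": (14, 15), "FGA": (24, 26), "D7": (12, 16)},
    "P4": {"D3": (15, 13), "vWA": (16, 16), "FGA": (21, 23), "D7": (11, 12)},
}

def shared_alleles(g1, g2):
    """Count alleles shared between two genotypes at one locus (0, 1, or 2)."""
    return sum((Counter(g1) & Counter(g2)).values())

def compare(p1, p2):
    """Return (matching loci, shared alleles) for two profiles.

    Assumption: a locus "matches" only if both alleles are shared.
    """
    per_locus = [shared_alleles(p1[locus], p2[locus]) for locus in LOCI]
    matching_loci = sum(1 for n in per_locus if n == 2)
    return matching_loci, sum(per_locus)

def pairwise_search(profiles):
    """Compare P1 with P2..Pn, then P2 with P3..Pn, and so on."""
    return {
        (a, b): compare(profiles[a], profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    }

results = pairwise_search(profiles)
# Histogram: number of pairs observed at each count of matching loci.
loci_histogram = Counter(loci for loci, _ in results.values())
```

The histogram is the kind of summary reported for real databases: how many pairs matched at 0, 1, 2, ... loci.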
Arizona Match Data
• 65,493 Profiles
  – 122 pairs matched at 9 of 13 loci
  – 20 pairs matched at 10 of 13
  – 1 pair matched at 11 of 13
  – 1 pair matched at 12 of 13

Loci  Ave     Std Dev  p-value
9     103.47  10.64    0.08
10    3.06    1.68     9.6E-23
11    0.05    0.23     4.4E-05
12    0       0
9+    106.59  10.83    5.8E-04
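For scale, the number of profile pairs that an all-against-all search of 65,493 profiles involves can be worked out directly. The variable names below are illustrative only.

```python
import math

n_profiles = 65_493  # Arizona database size quoted above
n_pairs = math.comb(n_profiles, 2)  # n * (n - 1) / 2 unordered pairs
print(f"{n_pairs:,} pairwise comparisons")
```

That is roughly 2.1 billion comparisons, which is why a handful of partial matches is not surprising on its face and why the expected counts matter.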
Review of Victoria State Database
Krane/Paoletti analysis: >11,000 profiles each compared to all others across 9 loci:
Shared alleles  Observed occurrences
14              401
15              27
16              1
17              16
18              0
Aussie Bump
[Figure: # observed pairs by # matching alleles: 14 → 401, 15 → 27, 16 → 1, 17 → 16]
Issues with the release or analysis of a DNA database
• Privacy concerns
  – Names, social security numbers, DNA profiles, addresses, etc.
• Issues with analysis
  – Duplicate profiles, multiple databases, presence of relatives, processing time, CODIS requirements
• Legal issues
  – California Proposition 69
Issue 1: Privacy concerns
• Database contains private information that should not be released
• Answer: provide anonymous profiles only
• Accomplished through one command
• SELECT D3, vWA, FGA, …, D7 FROM CODIS_DB
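A runnable sketch of that one-command export, using Python's built-in sqlite3 module and an in-memory table. The table layout is an assumption modeled on the example records shown earlier, with only four locus columns standing in for the full set; a real CODIS installation will differ.

```python
import sqlite3

# Assumption: a toy table shaped like the example records (not the real schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CODIS_DB (ID TEXT, Fname TEXT, Lname TEXT, Pop TEXT, "
    "SSN TEXT, Date TEXT, D3 TEXT, vWA TEXT, FGA TEXT, D7 TEXT)"
)
conn.executemany(
    "INSERT INTO CODIS_DB VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("AC937", "John", "Doe", "CAU", "283-24-4300", "5/2/02",
         "13, 15", "16, 16", "21, 23", "11, 14"),
        ("BQ384", "Jane", "Doe", "HIS", "365-78-3472", "7/23/03",
         "12, 17", "15, 19", "25, 25", "10, 10"),
        ("BZ927", "Frank", "Smith", "AA", "312-55-1476", "2/9/06",
         "13, 15", "14, 15", "24, 26", "12, 16"),
    ],
)

# Select only the locus columns; names, SSNs, and dates never leave the table.
LOCUS_COLUMNS = ["D3", "vWA", "FGA", "D7"]
cursor = conn.execute(f"SELECT {', '.join(LOCUS_COLUMNS)} FROM CODIS_DB")
anonymous_columns = [column[0] for column in cursor.description]
anonymous_profiles = cursor.fetchall()
```

Listing the locus columns explicitly, rather than excluding the private ones, means a newly added identifying column is not exported by accident.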
Issue 2: Duplicate profiles
• Many databases contain at least 10-15% duplicate profiles
• Answer: ignore duplicates in analysis
• A fairly thorough database analysis can take place with duplicates removed
  – Also identify potential mistyping rate
• The lab may be able to cull out duplicates from the same individual with additional information (e.g. SSN)
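One way to ignore duplicates without any identifying information is to collapse identical profiles before the pairwise search. The sketch below is hypothetical throughout: the records are made up, and treating profiles that differ at a single locus as possible mistyped duplicates is an assumption for illustration, not an established threshold.

```python
from itertools import combinations

# Hypothetical anonymized records: one (allele, allele) genotype per locus.
records = [
    ((13, 15), (16, 16), (21, 23), (11, 14)),
    ((15, 13), (16, 16), (21, 23), (11, 14)),  # same profile, alleles reversed
    ((12, 17), (15, 19), (25, 25), (10, 10)),
    ((13, 15), (16, 16), (21, 24), (11, 14)),  # differs from the first at one locus
]

def normalize(profile):
    """Sort alleles within each locus so (15, 13) and (13, 15) compare equal."""
    return tuple(tuple(sorted(genotype)) for genotype in profile)

def drop_exact_duplicates(profiles):
    """Keep the first occurrence of each distinct profile."""
    seen = set()
    unique = []
    for profile in profiles:
        key = normalize(profile)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique

def mismatched_loci(p1, p2):
    """Number of loci at which two normalized profiles differ."""
    return sum(1 for g1, g2 in zip(p1, p2) if g1 != g2)

def near_duplicates(unique_profiles, max_mismatched_loci=1):
    """Pairs differing at very few loci: candidates for mistyped duplicates."""
    return [
        (a, b)
        for a, b in combinations(unique_profiles, 2)
        if mismatched_loci(a, b) <= max_mismatched_loci
    ]

unique = drop_exact_duplicates(records)
duplicate_rate = 1 - len(unique) / len(records)
candidates = near_duplicates(unique)
```

Near-duplicate pairs cannot be told apart from close relatives on the profile alone, so a list like `candidates` is a starting point for the lab to check against information such as SSN, not a mistyping rate by itself.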
Issue 2b: Multiple databases
• California DOJ contains information in two databases that can be cross referenced to remove duplicates
  – Login DB – contains unique “CII” ID and accession numbers of all samples for that individual
  – SDIS – contains accession number and profile
• Answer: JOIN the data with one command
  – Only select the first accession number profile
• SELECT D3, vWA, FGA, …, D7 FROM SDIS JOIN LOGIN_DB ON (LOGIN_DB.ACCESSION1 = SDIS.ACCESSION)
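The join can be tried out on toy data with Python's built-in sqlite3 module. Everything about the schema here is an assumption: only the table names and the ACCESSION1 and ACCESSION columns come from the query above, while the CII and ACCESSION2 columns, the accession numbers, and the four-locus layout are hypothetical. The query uses the standard JOIN ... ON spelling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Assumed layout: one LOGIN_DB row per individual, one SDIS row per sample.
    CREATE TABLE LOGIN_DB (CII TEXT PRIMARY KEY, ACCESSION1 TEXT, ACCESSION2 TEXT);
    CREATE TABLE SDIS (ACCESSION TEXT PRIMARY KEY,
                       D3 TEXT, vWA TEXT, FGA TEXT, D7 TEXT);

    -- Individual C001 was sampled twice (A100 and A205); C002 once (A101).
    INSERT INTO LOGIN_DB VALUES ('C001', 'A100', 'A205'), ('C002', 'A101', NULL);
    INSERT INTO SDIS VALUES
        ('A100', '13, 15', '16, 16', '21, 23', '11, 14'),
        ('A205', '13, 15', '16, 16', '21, 23', '11, 14'),
        ('A101', '12, 17', '15, 19', '25, 25', '10, 10');
    """
)

# Joining on the first accession number keeps one profile per individual
# and returns no identifying columns.
rows = conn.execute(
    "SELECT D3, vWA, FGA, D7 FROM SDIS "
    "JOIN LOGIN_DB ON LOGIN_DB.ACCESSION1 = SDIS.ACCESSION "
    "ORDER BY SDIS.ACCESSION"
).fetchall()
```

Three samples go in and two profiles come out: the repeat sample A205 is dropped because it is not any individual's first accession number.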
Issue 3: Presence of relatives
• It is difficult to identify relatives by hand simply by looking at the CODIS records
• “There are a significant, but unknown number, of such related individuals in California’s offender database.” – Kenneth Konzack
• Answer: Exactly!
Issue 4: Processing time
• Performing an internal search of the database will take too long (a week or more) and will not allow for CODIS searches during that time
• Answer: perform an analysis on a separate computer or computers
• Pairwise database search is “embarrassingly parallel”
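"Embarrassingly parallel" here means the comparisons can be split into chunks that never need to talk to each other, with the per-chunk counts simply added at the end. The sketch below illustrates that split. All names and the random profiles are hypothetical, and a thread pool is used only to keep the example self-contained; for CPU-bound work in CPython each chunk would more realistically go to a separate process or machine.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def matching_loci(p1, p2):
    """Number of loci at which two profiles carry the same genotype."""
    return sum(1 for g1, g2 in zip(p1, p2) if sorted(g1) == sorted(g2))

def count_rows(profiles, row_indices):
    """Compare each profile i in row_indices with every later profile j > i."""
    histogram = Counter()
    for i in row_indices:
        for j in range(i + 1, len(profiles)):
            histogram[matching_loci(profiles[i], profiles[j])] += 1
    return histogram

def split_rows(n_profiles, n_workers):
    """Deal row indices round-robin so each worker gets a similar load."""
    return [range(w, n_profiles, n_workers) for w in range(n_workers)]

def parallel_histogram(profiles, n_workers=4):
    """Run each chunk independently, then add the partial histograms."""
    chunks = split_rows(len(profiles), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = pool.map(lambda rows: count_rows(profiles, rows), chunks)
        return sum(partials, Counter())

# Made-up data: 40 profiles, 3 loci, alleles drawn from a tiny range.
rng = random.Random(0)
profiles = [
    tuple((rng.randint(10, 12), rng.randint(10, 12)) for _ in range(3))
    for _ in range(40)
]
serial = count_rows(profiles, range(len(profiles)))
parallel = parallel_histogram(profiles, n_workers=3)
```

Because the chunks only read an anonymized copy of the profiles, the work can run on machines that are entirely separate from the one serving CODIS searches.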
Issue 5: Legal issues
• Legal statutes (e.g., California Proposition 69) prohibit release of database to citizens
• Answer: 38 state statutes (including CA) allow for an outside review of their database for statistical analysis
  – Many require the removal of identifying information
Questions?