John Buckleton
1st North Eastern STRmix
User Group Meeting
18th September 2018
Connecticut
https://johnbuckleton.wordpress.com/
2
• V2.6 Fly past
• Generic parameters - Hillary
• Recent court experiences
• Nathan Adams Code documentation
• Alan Jamieson
• Lund & Iyer
• NIST Foundational review
• PCAST
• Spanish and Portuguese Interlab
4
Generalised stutter
Heterozygote at SE33
• 21,26
Back stutter (-1,0 repeat unit [v2.3] )
• 20,21,25,26
Forward stutter (+1,0 rpt unit [v2.4] )
• 20,21,22,25,26,27
SE33 for v2.5
• 20,21,22,25,26,27
v2.6 Double back stutter (-2,0 rpt units)
• 19,20,21,22,24,25,26,27
[Figure: peak height (rfu, 0 to 5000) against allele (17 to 30)]
v2.6 2bp back stutter (-1,2)
• 19,20,20.2,21,22,24,25,25.2,26,27
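The allele lists above can be reproduced with a short sketch. It assumes a 4 bp repeat unit at SE33 and reads the slide notation (-1,0), (+1,0), (-2,0), (-1,2) as (repeat offset, bp offset) from the parent allele. The names below are illustrative only and are not taken from STRmix.

```python
REPEAT_BP = 4  # assumed repeat unit length in base pairs

# (repeat offset, bp offset), as read from the slide notation
STUTTER_OFFSETS = {
    "back": (-1, 0),
    "forward": (1, 0),
    "double back": (-2, 0),
    "2bp back": (-1, 2),
}


def to_bp(allele):
    """Convert an allele label such as '20.2' to a length in bp."""
    repeats, _, extra = allele.partition(".")
    return int(repeats) * REPEAT_BP + int(extra or 0)


def to_label(bp):
    """Convert a length in bp back to an allele label."""
    repeats, extra = divmod(bp, REPEAT_BP)
    return f"{repeats}.{extra}" if extra else str(repeats)


def expected_positions(parents, stutter_types):
    """Parent alleles plus every stutter position the model allows."""
    lengths = {to_bp(a) for a in parents}
    for parent in parents:
        for name in stutter_types:
            d_rep, d_bp = STUTTER_OFFSETS[name]
            lengths.add(to_bp(parent) + d_rep * REPEAT_BP + d_bp)
    return [to_label(bp) for bp in sorted(lengths)]


parents = ["21", "26"]
v23 = expected_positions(parents, ["back"])
v26 = expected_positions(parents, ["back", "forward", "double back", "2bp back"])
print(v23)
print(v26)
```

Each added stutter type widens the set of positions at which a peak can be explained as stutter rather than as an allele.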
Missing data problem
Double back
12
[Figure: double back stutter height (rfu), 0 to 60, against parent peak height (rfu), 0 to 8000]
Read low (10 rfu)
Good even highish template
We need to model the “missing data”
Variable Number of Contributors
13
VarNOC
• Deconvolutions can now be set up with a
range of NOC (Developmentally validated
for N and N+1 contributors only.)
• Independent deconvolutions are carried
out under each scenario. (NOC is kept
constant under Hp and Hd for each
deconvolution.)
14
2 or 3 contributors
Prosecution: “I think it’s 3 people”
Defence: “I think it’s 2 people”
Initial thoughts may be that this is biased in some way.
It is actually perfectly allowable
As long as 2p and 3p scenarios can be properly weighed
against each other
That is in essence what STRmix VarNOC does – weighs the
likelihood of N = 2 against N = 3
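One generic way to weigh N = 2 against N = 3 is to average the probability of the evidence over the NOC range under each proposition, as sketched below. This is an illustration of the idea only, not STRmix's implementation. The function name, the prior on NOC and all the probabilities are hypothetical.

```python
def stratified_lr(pr_e_hp, pr_e_hd, prior_noc):
    """LR with the number of contributors (NOC) summed out.

    pr_e_hp[n], pr_e_hd[n]: Pr(E | Hp, NOC = n) and Pr(E | Hd, NOC = n),
    each taken from an independent deconvolution at NOC = n.
    prior_noc[n]: assumed prior probability that NOC = n.
    """
    numerator = sum(pr_e_hp[n] * prior_noc[n] for n in prior_noc)
    denominator = sum(pr_e_hd[n] * prior_noc[n] for n in prior_noc)
    return numerator / denominator


# Hypothetical values for a profile that could be 2 or 3 people
pr_e_hp = {2: 1e-20, 3: 4e-20}
pr_e_hd = {2: 1e-26, 3: 1e-25}
prior_noc = {2: 0.5, 3: 0.5}

lr_by_noc = {n: pr_e_hp[n] / pr_e_hd[n] for n in prior_noc}
lr_combined = stratified_lr(pr_e_hp, pr_e_hd, prior_noc)
print(lr_by_noc)
print(lr_combined)
```

In this sketch the combined LR falls between the two single-NOC LRs, pulled towards the NOC that explains the evidence better under Hd.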
16
Generic Parameters - Hillary
17
Generic Parameters - Hillary
log(LR)s for true contributors for cognate vs non-cognate (PROVEDIt) data.
18
Some court challenges
19
Nathan Adams
• Inspected V1.08, V2.3.07, V2.4.05
• V2.4.06 pending
• Critical of code quality and documentation around
coding
• Does not accept you can validate by testing
• We are seeking accreditation
• Also considering “back documentation”
Internal validation compilation
21
LR for Hp support         Verbal qualifier       Fraction of false donor LRs
and 1/LR for Hd support                          in this range (N = 28,250,000)
[1-2)                     Uninformative          0.003197        1 in 312
[2-99)                    Limited Support        0.003143        1 in 318
[99-9999)                 Moderate Support       5.53 x 10^-5    1 in 18,000
[9999-999,999)            Strong Support         7.08 x 10^-7    1 in 1,400,000
≥999,999                  Very Strong Support    0
2,825 mixtures; 28,250,000 false donors
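The table can be read as a lookup from LR to verbal qualifier, and the fractions can be turned back into approximate counts of false donors. The sketch below uses the bin edges and fractions from the slide. The function and variable names are illustrative, and the counts are only as precise as the rounded fractions.

```python
# Bin edges from the table, applied to LR (Hp support) or 1/LR (Hd support)
BINS = [
    (1, 2, "Uninformative"),
    (2, 99, "Limited Support"),
    (99, 9999, "Moderate Support"),
    (9999, 999_999, "Strong Support"),
]


def verbal_qualifier(lr):
    """Return (qualifier, proposition supported) for a likelihood ratio."""
    if lr <= 0:
        raise ValueError("LR must be positive")
    direction = "Hp" if lr >= 1 else "Hd"
    x = lr if lr >= 1 else 1 / lr
    for low, high, label in BINS:
        if low <= x < high:
            return label, direction
    return "Very Strong Support", direction


N_FALSE_DONORS = 28_250_000
FRACTIONS = {
    "Uninformative": 0.003197,
    "Limited Support": 0.003143,
    "Moderate Support": 5.53e-5,
    "Strong Support": 7.08e-7,
    "Very Strong Support": 0.0,
}

# Approximate number of false donors behind each fraction
counts = {label: round(f * N_FALSE_DONORS) for label, f in FRACTIONS.items()}

print(verbal_qualifier(505_000))
print(counts)
```

On these figures the Strong Support row corresponds to roughly 20 false donors out of 28,250,000.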
Likelihood ratio
27
This illustrates that if the LRs of all the millions of
potential genotypes from a mixture were calculated
and then ranked by size, the suspect's genotype is
unlikely to have the highest LR.
Weights and ranks
• Q. My client is not the top genotype in the
list (i.e. the one with the highest weight)
28
Weights and ranks
Contributor    TPOX     Weight
C1             7,11     100.00%
C2             7,11     37.80%
               11,11    25.73%
               7,7      23.42%
               7,Q      5.85%
               11,Q     5.67%
               Q,Q      1.53%
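The rank of a genotype is simply its position once the weights are sorted. A small sketch using the C2 weights at TPOX from the table, where the genotype chosen for the "client" is hypothetical:

```python
# C2 genotype weights at TPOX, in percent, from the table
c2_weights = {
    "7,11": 37.80,
    "11,11": 25.73,
    "7,7": 23.42,
    "7,Q": 5.85,
    "11,Q": 5.67,
    "Q,Q": 1.53,
}


def rank_of(genotype, weights):
    """1-based rank of a genotype when weights are sorted high to low."""
    ordered = sorted(weights, key=weights.get, reverse=True)
    return ordered.index(genotype) + 1


client_genotype = "11,11"  # hypothetical person of interest
client_rank = rank_of(client_genotype, c2_weights)
print(client_rank, c2_weights[client_genotype])
```

A true donor with this genotype would sit at rank 2 at this locus while still carrying about a quarter of the weight.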
29
[Figure: 200pg 1:1:1:1 - comparison to C1 (totals: Rank 59,719,680, GTs 1.25E+24, pop-GTs 8.55E+38)]
[Per-locus chart on a log scale (0.1 to 1000). Series: Rank C1; #C1 genotypes (STRmix); #genotypes (population)]
[Loci: D3S1358, vWA, D16S539, CSF1PO, TPOX, Yindel, D8S1179, D21S11, D18S51, DYS391, D2S441, D19S433, TH01, FGA, D22S1045, D5S818, D13S317, D7S820, SE33, D10S1248, D1S1656, D12S391, D2S1338]
30
[Figure: 200pg 1:1:1:1 - comparison to C1 (totals: Rank 59,719,680, GTs 1.25E+24, pop-GTs 8.55E+38); same per-locus chart with the series annotated]
Rank C1: rank of C1
#C1 genotypes (STRmix): number of genotypes STRmix considered
#genotypes (population): number of possible genotypes at this locus given #alleles
31
[Figure: 200pg 1:1:1:1 - comparison to C1; same per-locus chart of Rank C1, #C1 genotypes (STRmix) and #genotypes (population)]
Note that the true donor is not always rank 1
He would only be rank 1 everywhere in a very clear
profile
32
Most genotypes do not exist
Prof Bruce Weir
• In our example there are 8.55 x 10^38
genotypes
• 7.5 x 10^9 worldwide population
• Only about 1 in 10^29 genotypes exist
• There are about 6 x 10^7 genotypes above
our rank
• Hence potentially no actual people above
our rank
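The arithmetic behind these bullets can be checked in a few lines. The sketch makes two crude assumptions that are not on the slide: every living person has a distinct genotype, and existing genotypes are spread evenly over all possible genotypes.

```python
n_genotypes = 8.55e38          # possible genotypes in the example
world_population = 7.5e9       # people, hence at most this many genotypes exist
genotypes_above_rank = 59_719_680  # total rank from the figure, about 6 x 10^7

# Fraction of possible genotypes that belong to a living person
fraction_existing = world_population / n_genotypes
one_in = n_genotypes / world_population

# Expected number of real people among the genotypes ranked above ours,
# under the even-spread assumption stated above
expected_people_above = genotypes_above_rank * fraction_existing

print(f"about 1 in {one_in:.2e} genotypes exist")
print(f"expected people above our rank: {expected_people_above:.1e}")
```

On these assumptions the expected number of actual people above the rank is far below one, which is the point of the last bullet.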
33
Lund and Iyer
34
• US v Gissantaner
• California v Littleton
35
36
37
38
Interlab Spanish and
Portuguese
39
US v Gissantaner
John Butler sends paper to Defense (not prosecution)
Introduced via testimony of Steven Lund (Lund & Iyer)
Emphasis on variability
40
LRmix
[Figure: LR on a log scale (1.E+00 to 1.E+18). Labels: GHEP-ISFG; Euroforgen-NoE Case 1; Euroforgen-NoE Case 2 proposition set 1]
41
42
STRmix™ collaborative exercise on DNA mixture interpretation
Jo-Anne Bright[1], Kevin Cheng[1], Zane Kerr[2], Catherine McGovern[1], Hannah
Kelly[1], Tamyra R. Moretti[3], Michael A. Smith[3], Frederick R. Bieber[4], Bruce
Budowle[5], Michael D. Coble[5], Rashed Alghafri[6], Paul Stafford Allen[7], Amy
Barber[8], Vickie Beamer[9], Christina Buettner[10], Melanie Russell[11], Christian
Gehrig[12], Tacha Hicks[13], Jessica Charak[14], Kate Cheong-Wing[15], Anne
Ciecko[16], Christie T. Davis[18], Michael Donley[19], Natalie Pedersen[20], Bill
Gartside[21], Dominic Granger[22], MaryMargaret Greer-Ritzheimer[23], Erick
Reisinger[24], Jarrah Kennedy[25], Erin Grammer[26], Marla Kaplan[27], David
Hansen[28], Hans J. Larsen[29], Alanna Laureano[30], Christina Li[31], Eugene
Lien[32], Emilia Lindberg[33], Ciara Kelly[34], Ben Mallinder[35], Simon
Malsom[36], Alyse Yacovone-Margetts[37], Andrew McWhorter[38], Sapana M.
Prajapati[39], Tamar Powell[40], Gary Shutler[41], Kate Stevenson[1], April R.
Stonehouse[42], Lindsey Smith[43], Julie Murakami[44], Eric Halsing[45], Darren
Wright[46], Leigh Clark[47], Duncan A. Taylor[48,49], John Buckleton[1,50]
43
Sample 1 Experimental NoC = 4
44
Sample 2 experimental NoC = 3
Dropped a locus
Laboratory
artifact policy
45
Conclusion
We continue to make incremental scientific
developments
Usability continues to improve
Continuing (escalating) defense challenge
Budowle, Coble
Strong community
Fred Bieber Moot court
47
Q: Would you not agree that there is a possibility
of making a mistake in estimating the true number
of contributors?
A1: Yes
A2: The NoC is unknown and should be treated as
a distribution. STRmix™ V2.6 can test two NoC.
Uncertainty in NoC does not translate into
equivalent uncertainty in the LR
48
49
Q: How do you "guestimate" the number of
contributors?
A: In casework and even in mock samples the
number of contributors to a sample is never known
for certain.
A reasonable number is assigned, usually by expert
judgement based on the number of alleles and
peak heights.
In empirical trials using mock samples an error of
overestimation or underestimation tends to result
in a lowering of large LRs either by a small amount
or in some cases a few orders of magnitude.
50
Q: How do you "guestimate" the number of
contributors?
The net effect of uncertainty in the number of
contributors for larger LRs is that either a correct
NoC was used or the result is likely to be even
more conservative
51
Q: What is the effect of choosing too few contributors?
A: One too few contributors tends to cause a
“false” exclusion of the smallest donor. This is the
one that the expert did not think was even there.
This is why the word “false” is in air quotes.
52
Q: What is the effect of choosing too many contributors?
A: One too many contributors has little or no effect
on the LRs for the large donors (the major(s)). With
proper use of the informed priors function it can often
also have little effect on a minor. If there is an effect
on a positive LR for a minor it is downwards.
One too many contributors can produce low grade
adventitious matches with LRs near 1 from amongst
a large set of false donors.
53
Q: Identical twins...appear as a single source?
What about other close relatives?
A: Close relatives increase allelic overlap. This can
cause under assignment of NoC and can cause false
exclusions.
If one or more of the relatives can be “assumed” the
situation is much improved.
In artificial mother, father, child mixtures the
assumption of one parent and use of informed priors
gets the correct solution if the fraction of the child is
large enough.
54
Q: How to choose Hd? Is there only one possible
set of hypotheses?
A1: Too scientific: In theory it should be Hp vs
everything else. Practically all reasonable
propositions with a good posterior (LR x prior) need
to be considered.
A2: The defense are entitled to all reasonable
propositions. It is usually possible to safeguard the
interests of the defense by sensible choice of one or
a few propositions.
55
Q: What is the confidence interval about LR estimates?
A: A philosophy available in STRmix™ and applied
by most laboratories worldwide is to assess all major
sources of uncertainty and make a rational
concession in the interests of the defense.
Most labs apply 3 or 4 layers of conservatism.
The population genetic model is conservative at
a ratio of about 99:1
Most labs use a conservative value for theta
Most labs report the 99% lower bound on allele
frequency (technically probability) and MCMC
uncertainty.
Some labs also make a generous allowance for
relatives.
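To illustrate why a larger theta is the conservative choice, the sketch below uses the widely used Balding and Nichols subpopulation formulas for a single-source genotype match probability. This is a textbook illustration only, not STRmix's mixture model, and the allele probabilities are hypothetical.

```python
def match_prob_homozygote(p, theta):
    """Balding-Nichols match probability for genotype AA."""
    num = (2 * theta + (1 - theta) * p) * (3 * theta + (1 - theta) * p)
    return num / ((1 + theta) * (1 + 2 * theta))


def match_prob_heterozygote(p, q, theta):
    """Balding-Nichols match probability for genotype AB."""
    num = 2 * (theta + (1 - theta) * p) * (theta + (1 - theta) * q)
    return num / ((1 + theta) * (1 + 2 * theta))


p, q = 0.10, 0.20  # hypothetical allele probabilities
for theta in (0.0, 0.01, 0.03):
    hom = match_prob_homozygote(p, theta)
    het = match_prob_heterozygote(p, q, theta)
    print(f"theta={theta:.2f}  AA={hom:.5f}  AB={het:.5f}")
```

With theta = 0 the formulas reduce to p^2 and 2pq. For these allele probabilities, raising theta raises the match probability, which lowers the LR and so favours the defense.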
56
Q: What is the highest LR you found in your validations
for a true non-contributor? (~500,000)?
A: We had LR = 505,000 out of 28+ million false
donors in the Internal lab compilation. 703,000 in
Bright et al.
I cannot guarantee that these are the biggest ever
found.
In theory you could eventually get something really
big, say 10^28, by chance. This is the correct result.
If you examined enough false donors, by random
simulation since this is more than there are people,
you would eventually get the right set of alleles.
Searching mixed DNA profiles directly against profile databases
Jo-Anne Bright, Duncan Taylor, James Curran, John Buckleton
57
Q: Verbal predicates with LRs?
Q: If CI bridges 2 verbal predicates, which verbal
predicate do you use?
A: Lower
58
Q: Verbal predicates with LRs?
Q: If CI bridges 2 verbal predicates, which verbal
predicate do you use?
Arbitrary
Subjective
Strong for one person may be moderate for another
If you just give the number someone will ask “What does this mean?” and possibly mention “we are lay people.”
If you have to say something why not get organised in advance
Manage prosecutor’s fallacy
59
Q: What constitutes an "intimate sample" for
purposes of assuming the presence of an individual's
DNA in a forensic mixture?
A: ASCLD have removed this statement
But you should assume someone when it increases
the Pr(E|Hd)
60
Q: Does STRmix impose binary structure on a continuous
biology?
Q: Is it correct to set a limit of 50 rfu... gives exaggerated
weight to a "homozygote" genotype, when one allele drops
below 50 rfu...
A: We think some of the stability of STRmix’s
performance comes from not messing with the junk. Some
software appears to go astray with artifacts and small peaks.
We are considering automated spike and pull-up removal.
61
Assuming an incorrect number of contributors may
result in an inflated LR in favour of the prosecution
(provided the total number of contributors is the
same under both hypotheses).
62
Conservative
Non-conservative
Black edge means
realistic incorrect
63
Conservative
Non-conservative
Even the worst ones are
probably still within the
conservatism buffer