from purple prose to machine-checkable proofs by luther tychonievich
TRANSCRIPT
![Page 1: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich](https://reader031.vdocuments.us/reader031/viewer/2022032505/55c653c7bb61eb5d628b460e/html5/thumbnails/1.jpg)
From Purple Prose toMachine-Checkable Proofs:Levels of rigor in family history tools
Dr. Luther A. Tychonievich, Ph.D.Dept. Computer Science, University of Virginia
TSC Coordinator, Family History Information Standards Organisation
These slides: http://www.cs.virginia.edu/luther/RT2015.pdfMe: [email protected] or [email protected]
Luther Tychonievich 1 of 35
![Page 2: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich](https://reader031.vdocuments.us/reader031/viewer/2022032505/55c653c7bb61eb5d628b460e/html5/thumbnails/2.jpg)
Outline• Two definitions of “proof ”• Can genealogy be rigorous?• Can genealogy tools be rigorous?• An upper bound on possible rigor• Achieving rigor, and beyond
Luther Tychonievich 2 of 35
![Page 3: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich](https://reader031.vdocuments.us/reader031/viewer/2022032505/55c653c7bb61eb5d628b460e/html5/thumbnails/3.jpg)
Two Definitions of Proof• Persuasive writing intended toconvince someone that an idea isbeyond rational doubt
• A set of derivations that reduce anon-obvious claim to a set of otherclaims and derivation rules
Luther Tychonievich 3 of 35
![Page 4: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich](https://reader031.vdocuments.us/reader031/viewer/2022032505/55c653c7bb61eb5d628b460e/html5/thumbnails/4.jpg)
Not Proof• Purple prose is language thatconceals a lack of substance behinda superfluity of description
• Structure can be that detail…• Try a genealogy search for Odin
• Potential for misuse ≠ bad
Luther Tychonievich 4 of 35
![Page 5: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich](https://reader031.vdocuments.us/reader031/viewer/2022032505/55c653c7bb61eb5d628b460e/html5/thumbnails/5.jpg)
Outline• Two definitions of “proof ”• Can genealogy be rigorous?• Can genealogy tools be rigorous?• An upper bound on possible rigor• Achieving rigor, and beyond
Luther Tychonievich 5 of 35
![Page 6: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich](https://reader031.vdocuments.us/reader031/viewer/2022032505/55c653c7bb61eb5d628b460e/html5/thumbnails/6.jpg)
Reasons not to trustgenealogy
• Amateurs and fee-for-service• …the two least-trusted sources
• Individuals don’t yield to statistics• Linked generations compound error• Genes and probability offer littlehope
Luther Tychonievich 6 of 35
![Page 7: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich](https://reader031.vdocuments.us/reader031/viewer/2022032505/55c653c7bb61eb5d628b460e/html5/thumbnails/7.jpg)
Compound Error• Suppose each parent-child linkcorrect with 95% confidence• Great-grandparents = 14 links• 0.9514 < 49% chance all correct
• Doubling per-link confidence• Doubles links we can trust• Gives one extra generation
Luther Tychonievich 7 of 35
![Page 8: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich](https://reader031.vdocuments.us/reader031/viewer/2022032505/55c653c7bb61eb5d628b460e/html5/thumbnails/8.jpg)
Will genetics provide rigor?• Makes statistical claims aboutoverlap of ancestry of living people
• Illegitimacy and inbreeding• Biological vs Social parent• (see philogenetic tree research)
Luther Tychonievich 8 of 35
![Page 9: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich](https://reader031.vdocuments.us/reader031/viewer/2022032505/55c653c7bb61eb5d628b460e/html5/thumbnails/9.jpg)
Can probability provide rigor?• Probabilistic reasoning requiressome subset of• Priors• Experiments or Ground truth• Well-defined populations• Independent distributions
• Not a silver bullet…Luther Tychonievich 9 of 35
![Page 10: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich](https://reader031.vdocuments.us/reader031/viewer/2022032505/55c653c7bb61eb5d628b460e/html5/thumbnails/10.jpg)
Outline• Two definitions of “proof ”• Can genealogy be rigorous?• Can genealogy tools be rigorous?• An upper bound on possible rigor• Achieving rigor, and beyond
Luther Tychonievich 10 of 35
![Page 11: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich](https://reader031.vdocuments.us/reader031/viewer/2022032505/55c653c7bb61eb5d628b460e/html5/thumbnails/11.jpg)
Pop Quiz• Which of the following is hardest?(1) Finding the right source(2) Citing the source right(3)Drawing the right conclusions(4) Communicating your reasoning
• Using tool , which is hardest?
Luther Tychonievich 11 of 35
![Page 12: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich](https://reader031.vdocuments.us/reader031/viewer/2022032505/55c653c7bb61eb5d628b460e/html5/thumbnails/12.jpg)
Analogy: Proof-Carrying Code• To prove code works is provably hard
• …or impossible (Rice’s Theorem)• To communicate a proof is easy
• …if you use the right tool
• Can we make the right tool forfamily history?
Luther Tychonievich 12 of 35
![Page 13: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich](https://reader031.vdocuments.us/reader031/viewer/2022032505/55c653c7bb61eb5d628b460e/html5/thumbnails/13.jpg)
Outline• Two definitions of “proof ”• Can genealogy be rigorous?• Can genealogy tools be rigorous?• An upper bound on possible rigor• Achieving rigor, and beyond
Luther Tychonievich 13 of 35
![Page 14: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich](https://reader031.vdocuments.us/reader031/viewer/2022032505/55c653c7bb61eb5d628b460e/html5/thumbnails/14.jpg)
On the next slide we’ll see…• A recorder’s perspective of thelogical flow of genealogy
• Not a suggested research work-flow
Luther Tychonievich 14 of 35
![Page 15: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich](https://reader031.vdocuments.us/reader031/viewer/2022032505/55c653c7bb61eb5d628b460e/html5/thumbnails/15.jpg)
Luther Tychonievich 15 of 35
![Page 16: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich](https://reader031.vdocuments.us/reader031/viewer/2022032505/55c653c7bb61eb5d628b460e/html5/thumbnails/16.jpg)
• History happened, creating artifacts(documents, monuments, memories…)
• Not part of research; should notappear in tools
Luther Tychonievich 16 of 35
![Page 17: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich](https://reader031.vdocuments.us/reader031/viewer/2022032505/55c653c7bb61eb5d628b460e/html5/thumbnails/17.jpg)
• Step 1: discover artifacts• Digitization helps us all• Citations are getting better…• Logs (negative evidence) need work
Luther Tychonievich 17 of 35
![Page 18: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich](https://reader031.vdocuments.us/reader031/viewer/2022032505/55c653c7bb61eb5d628b460e/html5/thumbnails/18.jpg)
• Many tiers: shapes → glyphs →words → meaning → claims
• Single version = data pollution• Transcription community muchbetter at this than us
Luther Tychonievich 18 of 35
![Page 19: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich](https://reader031.vdocuments.us/reader031/viewer/2022032505/55c653c7bb61eb5d628b460e/html5/thumbnails/19.jpg)
• We don’t consciously consider everyclaim we see
• How do you detect, let alone store,subconscious selection?
• Computer could notice for us…
Luther Tychonievich 19 of 35
![Page 20: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich](https://reader031.vdocuments.us/reader031/viewer/2022032505/55c653c7bb61eb5d628b460e/html5/thumbnails/20.jpg)
• “Is that my John?”• Turns claims into people• The crux of genealogy; most errorsI’ve seen happen here
• Deserves a second slide…Luther Tychonievich 20 of 35
![Page 21: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich](https://reader031.vdocuments.us/reader031/viewer/2022032505/55c653c7bb61eb5d628b460e/html5/thumbnails/21.jpg)
• We need to1. record matches (and non-matches)2. allow changes3. record reason for each match4. embrace ambiguity
• More tools gaining 1, a few have 2; 3is unusual, 4 yet to appear
• Deserves a third slide…Luther Tychonievich 21 of 35
![Page 22: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich](https://reader031.vdocuments.us/reader031/viewer/2022032505/55c653c7bb61eb5d628b460e/html5/thumbnails/22.jpg)
• Match algebra:• A = B• A ≠ B• {A, B, C, D, …} ≥ 2 individuals
• Matches of any noun (person, place,event, etc.)
• Often evidence for A = B and A ≠ B
Luther Tychonievich 22 of 35
![Page 23: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich](https://reader031.vdocuments.us/reader031/viewer/2022032505/55c653c7bb61eb5d628b460e/html5/thumbnails/23.jpg)
• The “hidden” step today: from“name: Jane” we infer “gender:female” but don’t record that wemade an inference.
• Data simple, missing in our tools
Luther Tychonievich 23 of 35
![Page 24: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich](https://reader031.vdocuments.us/reader031/viewer/2022032505/55c653c7bb61eb5d628b460e/html5/thumbnails/24.jpg)
• The final belief is logicallyuninteresting, but socially vital
• Just one is a problem• Many tools store only beliefs andartefacts found
Luther Tychonievich 24 of 35
![Page 25: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich](https://reader031.vdocuments.us/reader031/viewer/2022032505/55c653c7bb61eb5d628b460e/html5/thumbnails/25.jpg)
Luther Tychonievich 25 of 35
![Page 26: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich](https://reader031.vdocuments.us/reader031/viewer/2022032505/55c653c7bb61eb5d628b460e/html5/thumbnails/26.jpg)
Outline• Two definitions of “proof ”• Can genealogy be rigorous?• Can genealogy tools be rigorous?• An upper bound on possible rigor• Achieving rigor, and beyond
Luther Tychonievich 26 of 35
![Page 27: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich](https://reader031.vdocuments.us/reader031/viewer/2022032505/55c653c7bb61eb5d628b460e/html5/thumbnails/27.jpg)
Do we want rigorous tools?• Depends on the audience• Option of rigor seems good• Can we make a “proof-carryingGEDCOM”?
Luther Tychonievich 27 of 35
![Page 28: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich](https://reader031.vdocuments.us/reader031/viewer/2022032505/55c653c7bb61eb5d628b460e/html5/thumbnails/28.jpg)
To achieve rigor:• Embrace ambiguity
• 2+ conflicting versions of each step• Adopt best-practice transcription• Match algebra• Support reasons for matches• Add inferences explicitly
Luther Tychonievich 28 of 35
![Page 29: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich](https://reader031.vdocuments.us/reader031/viewer/2022032505/55c653c7bb61eb5d628b460e/html5/thumbnails/29.jpg)
…and beyond• Track the “select” step• Inference rules could allow
• translation-free rationales• statistically-sound probability
• Mid-research belief: A = B = C ≠ A• Time-varying rigor could berevealing data
Luther Tychonievich 29 of 35
![Page 30: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich](https://reader031.vdocuments.us/reader031/viewer/2022032505/55c653c7bb61eb5d628b460e/html5/thumbnails/30.jpg)
Outline• Two definitions of “proof ”• Can genealogy be rigorous?• Can genealogy tools be rigorous?• An upper bound on possible rigor• Achieving rigor, and beyond• Extra Slides
Luther Tychonievich 30 of 35
![Page 31: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich](https://reader031.vdocuments.us/reader031/viewer/2022032505/55c653c7bb61eb5d628b460e/html5/thumbnails/31.jpg)
More on Inferences• Inference step:rule + antecedents = consequent
• Antecedents, consequent: claims• source of consequent: the inference• support of conseq.: the antecedents• rationale of consequent: the rule• Support may be enough…
Luther Tychonievich 31 of 35
![Page 32: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich](https://reader031.vdocuments.us/reader031/viewer/2022032505/55c653c7bb61eb5d628b460e/html5/thumbnails/32.jpg)
Inference “rule”s• A predicate over antecedents• A claim built from antecedents• Could be probabilistic (“trend” not“rule”)
• Consequent could be claim, match,non-match, etc.
• Encodes “why” as pure dataLuther Tychonievich 32 of 35
![Page 33: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich](https://reader031.vdocuments.us/reader031/viewer/2022032505/55c653c7bb61eb5d628b460e/html5/thumbnails/33.jpg)
More on Probability• Probability a claim is true: 0 or 1• Probability a rule’s consequent istrue: validated
validated + invalidated(approximate)
• Is “83% chance Jane is John’smother” useful?
• What about “99.7% chance eitherJane, Sue, or Anna is John’s mother”?
Luther Tychonievich 33 of 35
![Page 34: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich](https://reader031.vdocuments.us/reader031/viewer/2022032505/55c653c7bb61eb5d628b460e/html5/thumbnails/34.jpg)
Outline• Two definitions of “proof ”• Can genealogy be rigorous?• Can genealogy tools be rigorous?• An upper bound on possible rigor• Achieving rigor, and beyond• Extra Slides
Luther Tychonievich 34 of 35
![Page 35: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich](https://reader031.vdocuments.us/reader031/viewer/2022032505/55c653c7bb61eb5d628b460e/html5/thumbnails/35.jpg)
Questions?
These slides: http://www.cs.virginia.edu/luther/RT2015.pdfMe: [email protected] or [email protected]
Luther Tychonievich 35 of 35