Metrology for Identity and Other Nominal Properties
David Lee Duewer
Chemical Sciences Division
Materials Measurement Laboratory
National Institute of Standards and Technology
Standards for Pathogen Identification via Next-Generation Sequencing Workshop NIST, 20-Oct-2014
When I Say “We”…
And we take ourselves very seriously…
Marc Salit
PhD 1985, Analytical chemist
5 y Perkin-Elmer – Instrument Design/Development
24 y NIST, “Innovator”
Leader, Genome Scale Measurements Group
Co-Director, NIST/Stanford U. Joint Initiative on Measurements in Biology

Dave Duewer
PhD 1976, Analytical chemist
11 y Monsanto – process & biodiscovery
23 y NIST, “Data Jock”
Metrology (Measurement Science)
• Metrology is the stuff needed so data can support informed decision making.
  • in a good world, decisions are informed with data
  • which are the results of measurements!
• Calculus of Confidence
  • we posit that metrology is the ‘formal’ system that tells us how well we trust those data
Calculus of Confidence
• The tools of metrology:
  • Traceability
  • Uncertainty
  • Validation
• enable this calculus of confidence by which decisions are informed by measurement results with established confidence.
Craft
• Metrology is more a craft than a technology
  • this doesn’t mean that 7-year apprenticeships are required!
  • it does mean that two different skilled metrologists might take very different approaches to the same problem
    • but they should both come to largely equivalent solutions!
    • matter of style
    • must be defensible
The “How Much” Worldview
as seen by chemists/biochemists
Tools of the Trade
Workshop on DNA Methods for Quality Control of Botanical Products USP, 23-Oct-2014
“VIM”: www.bipm.org/en/publications/guides/#vim
NIST SP 811: www.nist.gov/pml/pubs/sp811/
“GUM”: www.bipm.org/en/publications/guides/#gum
Metrological Traceability
enables comparisons to be made over time and place
[Figure: traceability chain from the SI unit (amount of substance) to the Result]
Methods: purity analysis, primary methods, reference methods, routine methods
Materials: high purity primary RM, primary calibration CRM, secondary calibration RM, routine sample
Validation
ensures measurement processes are well-understood
• “checks the measurement model”
  • tests completeness
  • tests assumptions
  • helps establish an uncertainty budget
  • identifies relevant parameters to keep under control
  • tests scope
Metrological Uncertainty
enables meaningful comparison of results
• “how much” results are only useful when compared
  • different results in different places or measured at different times…
  • “comparability over space-and-time”
• Are these results the same?
  • is there significant bias?
• Is measurement precision fit-for-purpose?
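The question “Are these results the same?” can be sketched numerically. The example below is only an illustration, not a method taken from the slides: it assumes each result comes with a standard uncertainty, that the two results are independent, and that a coverage factor of k = 2 is an acceptable threshold. The lab values are made up.

```python
import math

def normalized_difference(x1, u1, x2, u2):
    """Difference between two results divided by its standard uncertainty.

    Assumes the results are independent, so the uncertainty of the
    difference is the root-sum-of-squares of u1 and u2.
    """
    if u1 < 0 or u2 < 0 or (u1 == 0 and u2 == 0):
        raise ValueError("uncertainties must be non-negative and not both zero")
    return (x1 - x2) / math.sqrt(u1 ** 2 + u2 ** 2)

def results_agree(x1, u1, x2, u2, k=2.0):
    """True when the two results differ by no more than k combined uncertainties."""
    return abs(normalized_difference(x1, u1, x2, u2)) <= k

# Hypothetical (value, standard uncertainty) pairs from two places or times.
lab_a = (10.12, 0.05)
lab_b = (10.30, 0.06)

score = normalized_difference(*lab_a, *lab_b)
print(f"normalized difference = {score:.2f}")
print("agree at k = 2:", results_agree(*lab_a, *lab_b))
```

Without the uncertainties, the 0.18 difference between the two values could not be judged either way; that is the sense in which uncertainty makes comparison meaningful.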
Perhaps NIST’s Best Uncertainty Statement
Dr. C.H. Meyers, on his measurements of the heat capacity of ammonia (circa 1920):
“We think our reported value is good to 1 part in 10,000: we are willing to bet our own money at even odds that it is correct to 2 parts in 10,000. Furthermore, if by any chance our value is shown to be in error by more than 1 part in 1000, we are prepared to eat the apparatus and drink the ammonia.”
Quote from: Doiron T and Stoup J, Uncertainty and Dimensional Calibrations, J Res Natl Inst Stand Technol 1997;102:647-676
http://dx.doi.org/10.6028/jres.102.044
The “What” Worldview
Several Different “What”s
• Identification
  • “Pure substance” Certified Reference Material (CRM)
  • Use/develop convincingly specific methods
    • Inclusion
    • Exclusion
  • Define and certify unambiguous “barcode”
  • CRMs are expensive
• Verification
  • Secondary reference materials (RMs) and controls
  • Check “barcode” against CRM
  • Can be commercial or home-brew
• Recognition
  • Component of a mixture
  • Check “barcode” against library
Barcode of Life
http://www.barcodeoflife.org/content/about/what-dna-barcoding
[Figure labels: Identification, Validation, Recognition]
Metrological Traceability
enables comparisons to be made over time and place
[Figure: traceability chain from the Authority to the Result]
Authority: chemical structure {CAS, IUPAC}, biological nomenclature {ICZN, ICN}
Methods: identification methods, verification methods, recognition methods
Materials: “pure” primary RM, QC and secondary RMs, routine samples
Taxonomic Hierarchy
Ginkgo biloba L.
Kingdom Plantae – plantes, Planta, Vegetal, plants
Subkingdom Viridaeplantae – green plants
Infrakingdom Streptophyta – land plants
Division Tracheophyta – vascular plants, tracheophytes
Subdivision Spermatophytina – spermatophytes, seed plants, phanérogames
Infradivision Gymnospermae – gymnosperms, gymnospermes, gimnosperma
Class Ginkgoopsida – ginkgo
Order Ginkgoales
Family Ginkgoaceae
Genus Ginkgo L. – ginkgo
Species Ginkgo biloba L. – maidenhair tree, common ginkgo
en.wikipedia.org/wiki/Ginkgo_biloba
http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=183269
Validation
ensures measurement processes are well-understood
• “checks the measurement model”
  • tests if identification criteria are fit-for-purpose
    • includes everything wanted
    • excludes everything else
    • (Ideally, this can be done in silico)
  • tests if measurements are consistent with identification criteria
Specificity Validation Design
• Chloroplast DNA sequences from authenticated Ginkgo biloba samples are used to establish inclusivity
• Chloroplast DNA sequences from close relatives are used to establish exclusivity
LaBudde, R.; Harnly, J.M.; Probability of Identification (POI): A Statistical Model for the Validation of Qualitative Botanical Identification Methods
Official Methods of Analysis of AOAC International, Vol. 95, pp. 273–285, (2012).
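The inclusivity/exclusivity design can be summarized as two proportions. The sketch below is in the spirit of a probability-of-identification tally but does not reproduce the cited statistical model; the outcome lists are hypothetical, and the function name is an assumption made for illustration.

```python
def proportion_identified(calls):
    """Fraction of samples that the method reported as the target species."""
    calls = list(calls)
    if not calls:
        raise ValueError("need at least one sample")
    return sum(1 for called in calls if called) / len(calls)

# Hypothetical outcomes: True means the method reported "Ginkgo biloba".
target_calls = [True, True, True, True, False]   # authenticated target samples
relative_calls = [False, False, True, False]     # close relatives (non-targets)

# Inclusivity: targets that were identified (ideally close to 1).
inclusivity = proportion_identified(target_calls)
# Exclusivity: non-targets that were correctly not identified (ideally close to 1).
exclusivity = 1.0 - proportion_identified(relative_calls)

print(f"inclusivity = {inclusivity:.2f}, exclusivity = {exclusivity:.2f}")
```

With so few samples these proportions carry wide uncertainty; a real validation would report an interval alongside each value.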
Specificity Validation Results
[Figure panels: psbA-trnH Intergenic Spacer Phylogeny; trnL Intron Phylogeny]
https://www-s.nist.gov/srmors/view_cert.cfm?srm=3246
Metrological Confidence
enables meaningful interpretation of results
• “what” results are only useful when
  • The same “things” can be compared
  • “measurand” is the metrology-speak term
• Are these barcodes the same?
  • how confident are you in the result?
  • essential part of being able to compare!
Defining “Confidence”
“Where uncertainty is assessed qualitatively, it is characterised by providing a relative sense of the amount and quality of evidence (that is, information from theory, observations or models indicating whether a belief or proposition is true or valid) and the degree of agreement… This approach is used by WG III through a series of self-explanatory terms such as: high agreement, much evidence; high agreement, medium evidence; medium agreement, medium evidence; etc.”
Climate Change 2007: Synthesis Report
www.ipcc.ch/publications_and_data/ar4/syr/en/contents.html
“Confidence”: NIST’s Initial Definitions
DNA Sequence via Sanger sequencing
On Further Thought…
• Highest confidence
  • sufficient evidence
  • no ambiguities or contradictions
• Very confident
  • sufficient evidence
  • all ambiguities unambiguously resolved
• Confident
  • sufficient evidence
  • all ambiguities “understood”
  • but insufficient evidence to prove it
• Insufficient evidence to Certify
[Flowchart: Acquire Evidence → Sufficient? → Unambiguous? (Yes: Highest) → Resolved? (Yes: Very) → Understood? (Yes: Confident; No: No Confidence). Branch labels on the slide: Yes, No, Maybe]
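The confidence levels listed under “On Further Thought…” can be written as a short decision function. This is a sketch based on those bullets, not NIST’s procedure: the function and parameter names are hypothetical, the four yes/no inputs stand in for expert judgment, and the slide’s “Maybe” branch is not modeled.

```python
from enum import Enum

class Confidence(Enum):
    HIGHEST = "Highest confidence"
    VERY = "Very confident"
    CONFIDENT = "Confident"
    NONE = "Insufficient evidence to certify"

def assign_confidence(sufficient, unambiguous, resolved=False, understood=False):
    """Map four yes/no judgments onto a confidence level.

    sufficient  -- is there sufficient evidence?
    unambiguous -- is the evidence free of ambiguities and contradictions?
    resolved    -- have all ambiguities been unambiguously resolved?
    understood  -- are the remaining ambiguities "understood" but unproven?
    """
    if not sufficient:
        return Confidence.NONE
    if unambiguous:
        return Confidence.HIGHEST
    if resolved:
        return Confidence.VERY
    if understood:
        return Confidence.CONFIDENT
    # Assumption: ambiguities that are not even understood give no confidence.
    return Confidence.NONE

print(assign_confidence(True, False, resolved=True).value)
```

The questions are asked in order, so a later answer only matters when every earlier one failed to settle the level.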
Who Defines “Sufficient”?
You!
And the rest of the experts within your community
Criteria for Identification of Seized Drugs
SWGDRUG Recommendations:
• If one technique from A, then one other (A, B, or C).
• If no techniques from A, then three others (two from B).

Category A                              | Category B                 | Category C
Infrared Spectroscopy                   | Capillary Electrophoresis  | Color Tests
Mass Spectrometry                       | Gas Chromatography         | Fluorescence Spectroscopy
Nuclear Magnetic Resonance Spectroscopy | Ion Mobility Spectrometry  | Immunoassay
Raman Spectroscopy                      | Liquid Chromatography      | Melting Point
X-ray Diffractometry                    | Microcrystalline Tests     | Ultraviolet Spectroscopy
                                        | Pharmaceutical Identifiers |
                                        | Thin Layer Chromatography  |
http://www.swgdrug.org/approved.htm
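The two-line rule on this slide is simple enough to check mechanically. The sketch below encodes only the rule as worded here, with category assignments read from the slide’s table; the function name is hypothetical, and the full recommendations at the URL above may carry further conditions that this sketch ignores.

```python
# Category assignments as listed on the slide.
CATEGORY = {
    "Infrared Spectroscopy": "A",
    "Mass Spectrometry": "A",
    "Nuclear Magnetic Resonance Spectroscopy": "A",
    "Raman Spectroscopy": "A",
    "X-ray Diffractometry": "A",
    "Capillary Electrophoresis": "B",
    "Gas Chromatography": "B",
    "Ion Mobility Spectrometry": "B",
    "Liquid Chromatography": "B",
    "Microcrystalline Tests": "B",
    "Pharmaceutical Identifiers": "B",
    "Thin Layer Chromatography": "B",
    "Color Tests": "C",
    "Fluorescence Spectroscopy": "C",
    "Immunoassay": "C",
    "Melting Point": "C",
    "Ultraviolet Spectroscopy": "C",
}

def meets_minimum(techniques):
    """Check a set of techniques against the rule as worded on the slide."""
    used = set(techniques)  # repeated use of one technique counts once
    unknown = used - set(CATEGORY)
    if unknown:
        raise ValueError(f"unlisted technique(s): {sorted(unknown)}")
    in_a = sum(1 for t in used if CATEGORY[t] == "A")
    in_b = sum(1 for t in used if CATEGORY[t] == "B")
    if in_a >= 1:
        # One technique from A, plus at least one other from any category.
        return len(used) >= 2
    # No technique from A: three techniques, at least two of them from B.
    return len(used) >= 3 and in_b >= 2

print(meets_minimum(["Mass Spectrometry", "Gas Chromatography"]))
print(meets_minimum(["Gas Chromatography", "Thin Layer Chromatography", "Color Tests"]))
print(meets_minimum(["Color Tests", "Immunoassay", "Melting Point"]))
```

The first two combinations satisfy the rule; the third does not, because it uses no Category A technique and none from Category B.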
Barcode of Life: Standards and Guidelines
www.barcodeoflife.org/content/resources/standards-and-guidelines
2.D.ii In November 2009, CBOL approved rbcL and matK as the barcode regions for vascular plants. They are defined relative to the Arabidopsis thaliana chloroplast NC_000932 sequence annotation as follows: the rbcL barcode region is at the 5' end of the rbcL gene between bp 1-599 (27-579 excluding primer sequences); the matK barcode region is between bp 205-1046 (227-1019 excluding primer sequences).
4.C In deciding whether a record will be repeatable and reliable for species identification, submitters should select as potential BARCODE records only those for which the contig was based on bi-directional coverage with non-N base calls at no less than 40% of the reported sequence. As described below (5D), CBOL can direct GenBank (or another INSDC member) to remove the BARCODE designation from records which have all required elements (1A-I) but have been shown to be unreliable for species identification due to low sequence quality and coverage.
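Criterion 4.C can be read as a proportion check on the assembled contig. The sketch below is one possible reading, not CBOL’s tooling: it assumes per-position flags for forward and reverse read coverage are available, counts a position only when it is covered in both directions and carries a non-N call, and uses the 40% threshold as quoted above. All names and the example data are hypothetical.

```python
def bidirectional_fraction(consensus, forward_covered, reverse_covered):
    """Fraction of reported positions with a non-N call covered in both directions."""
    if not (len(consensus) == len(forward_covered) == len(reverse_covered)):
        raise ValueError("sequence and coverage flags must have equal length")
    if not consensus:
        raise ValueError("empty sequence")
    good = sum(
        1
        for base, fwd, rev in zip(consensus, forward_covered, reverse_covered)
        if base.upper() != "N" and fwd and rev
    )
    return good / len(consensus)

def eligible_for_barcode(consensus, forward_covered, reverse_covered, minimum=0.40):
    """Apply the threshold quoted in 4.C to the fraction computed above."""
    return bidirectional_fraction(consensus, forward_covered, reverse_covered) >= minimum

# Hypothetical 10-base contig: forward reads cover the first 8 positions,
# reverse reads cover the last 8, and one base in the overlap is an N.
consensus = "ACGTNACGTA"
forward_covered = [True] * 8 + [False] * 2
reverse_covered = [False] * 2 + [True] * 8

print(bidirectional_fraction(consensus, forward_covered, reverse_covered))
print(eligible_for_barcode(consensus, forward_covered, reverse_covered))
```

Keeping the threshold as a parameter makes it easy to apply whatever value the community’s current guideline specifies.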
Recent Work in “What” Metrology
Chemical Identification and its Quality Assurance
Boris L. Milman
D.I. Mendeleyev Institute for Metrology, St. Petersburg, Russia
January 12, 2011 Springer, 281 pages, English
“Unlike analytical techniques for qualitative and quantitative determinations, well-presented in books and reviews, theoretical principles of identification and general experimental approaches to its implementation have not received comprehensive treatment in the literature.”
Thank you for your attention!