Structure Validation Challenges in Chemical Crystallography
Ton Spek
Utrecht University, The Netherlands.
Madrid, Aug. 26, 2011
Validation History
• Structure validation of data supplied in computer-readable CIF format was pioneered by Acta Cryst. C (Syd Hall et al., 1990s).
• Initially the numerical checking of papers submitted to Acta C in CIF format was done by the Chester staff.
• Subsequently automated checking of the CIF for data consistency, data completeness and validity was introduced (checkCIF)
• PLATON facilities to check for Missed Symmetry and VOIDS were added later on.
• Soon followed by the inclusion of numerous other PLATON-based tests (PLATxxx) of the reported structure (currently more than 400).
checkCIF/PLATON
FCF Validation
• Fo/Fc reflection file deposition and archival in CIF format (FCF) was made mandatory early on for Acta Cryst. papers.
• Useful for subsequent analysis of possibly unique data.
• CIF + FCF checking was added in 2010 into the IUCr CheckCIF/PLATON suite.
• Major chemical journals now require CIF deposition and validation reports but do not (yet) require the deposition of reflection data.
• The CCDC now accepts FCFs for deposition.
Why Automated Structure Validation
• The large volume of new and routine structure reports submitted for publication.
• The limited number of experienced and available crystallographic referees for validation.
• Detection of errors due to the black box use of crystallography by non-crystallographers.
• Setting standards of quality and reliability.
• Automated detection of unusual though not necessarily erroneous issues that need special attention (ALERTS A,B,C,G).
• Sadly: the need to detect fraudulent structure reports.
Systematic Fraud
• A massive fraud was detected in late 2009, involving structures mainly published around 2007 in Acta Cryst. E (soon 200 retractions!).
• Before 2010, nobody was prepared for serious and systematic fraud in this non-competitive field of routine structures.
• Many deviations from the expected results can often be explained as errors, inexperience or poor data.
• Several retractions before 2010 might in hindsight concern fraudulent structures and not errors.
• Ongoing testing of our validation software on the archived data for structures published in Acta E often indicated suspect structures needing a more detailed investigation.
• It was only by following up on one such strange structure report with an analysis of all structures published by the authors of that paper that a fraud pattern emerged.
• It was discovered that the same data set was used to publish a series of invented isomorphous structures.
• Full story: Acta Cryst. E (2010) editorial and a PowerPoint presentation by the Section E editor Jim Simpson (IUCr website).
Bogus Variations (with Hirshfeld ALERTS) on the Published Structure 2-hydroxy-3,5-nitrobenzoic acid (ZAJGUM)
OH => F
H2O => NH3
OH => NH2
NO2 => COOH
Fraud Detection Tools
• Generalized Hirshfeld Rigid Bond Test.
• CIF versus FCF data checking.
• Scatter plots of the reflection data of the same or related structure(s).
• Look in difference maps for unusual features.
• SHELXL re-refinement using the supplied CIF & FCF data.
• Check in the CSD for related structures.
• Two case studies that illustrate the use of the above validation and analysis tools follow.
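As an illustration of the first tool in the list, the classic Hirshfeld rigid bond test compares the mean-square displacement amplitudes of two bonded atoms along their bond. For a correctly assigned atom pair the difference is expected to be small, so a wrongly assigned element type tends to show up as a large difference. The following is a minimal sketch of that classic test only, not PLATON's generalized implementation. It assumes Cartesian coordinates in Å and Cartesian U tensors in Å², and the function names are hypothetical.

```python
import math

def msda_along(u_cart, unit):
    """Mean-square displacement amplitude u^T U u along a unit vector.

    u_cart is a 3x3 Cartesian displacement tensor (assumed, in Å^2).
    """
    return sum(unit[i] * u_cart[i][j] * unit[j]
               for i in range(3) for j in range(3))

def hirshfeld_delta(pos_a, u_a, pos_b, u_b):
    """Difference in displacement amplitude of atoms A and B along the A-B bond."""
    bond = [b - a for a, b in zip(pos_a, pos_b)]
    length = math.sqrt(sum(c * c for c in bond))
    unit = [c / length for c in bond]
    return msda_along(u_a, unit) - msda_along(u_b, unit)

# Toy values for illustration only (not taken from any real structure).
u_a = [[0.02, 0.0, 0.0], [0.0, 0.03, 0.0], [0.0, 0.0, 0.03]]
u_b = [[0.05, 0.0, 0.0], [0.0, 0.03, 0.0], [0.0, 0.0, 0.03]]
delta = hirshfeld_delta((0.0, 0.0, 0.0), u_a, (1.5, 0.0, 0.0), u_b)
print(f"delta = {delta:.3f} A^2")
```

What counts as a suspiciously large difference depends on data quality and the elements involved, so any threshold applied to `delta` would be a further assumption.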
Example 1: Error or Fraud ?
Submitted to Acta Cryst. (2011)
Structure I
PLATON Report Part 1
PLATON Report Part 2
RELATED STRUCTURE FROM THE CSD
Structure II
Structure Report for II
Scatter Plots I(obs) versus I(calc)
(I)
(II)
Analysis
• Structure (II) has no validation issues.
• C-CH3 distance in (II) is 1.50 Å, as expected.
• 'C-F' distance in (I) is 1.50 Å and not the expected 1.35 Å.
• Conclusion: Structure (I) is the CH3 variety and not F.
• Data sets of (I) & (II) are not identical (see next).
• Data set (I) is likely based on the CH3 compound.
• Fraud or Error ? DIFABS file Error ?
• The authors of (I) confirmed an error, having believed the external chemists' proposal. The paper was retracted.
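The distance argument in this analysis can be sketched as a simple lookup: compare the observed bond length with the expected values and see which assignment it fits. This is an illustrative sketch only. The two expected distances are the ones quoted in the analysis, while the 0.05 Å tolerance and the function names are assumptions made for the example.

```python
# Expected distances in Å, as quoted in the analysis.
EXPECTED = {"C-F": 1.35, "C-CH3": 1.50}

def closest_bond_type(observed, expected=EXPECTED):
    """Return the bond label whose expected length is nearest to the observed one."""
    return min(expected, key=lambda label: abs(expected[label] - observed))

def deviates(observed, label, tolerance=0.05, expected=EXPECTED):
    """True if the observed length differs from the expected one by more
    than the tolerance (0.05 Å here is an assumed, illustrative value)."""
    return abs(observed - expected[label]) > tolerance

observed = 1.50  # the reported 'C-F' distance in structure (I)
print(closest_bond_type(observed))
print(deviates(observed, "C-F"))
```

A check like this only flags the discrepancy. Deciding whether it reflects an error or something worse still needs the reflection data and chemical judgement.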
Scatter Plots of 2 Data Sets
Two Unrelated Data Sets
Two Identical Data sets
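The comparison behind these scatter plots can be sketched numerically: match the reflections of two data sets by their hkl indices and check how closely the observed intensities agree. Independently measured data sets are expected to scatter, whereas a reused data set falls exactly on the diagonal. The sketch below assumes each data set is already available as a dictionary from (h, k, l) to I(obs) on a common scale. The function names and the toy values are hypothetical.

```python
import math

def common_intensities(set_a, set_b):
    """Return paired intensity lists for reflections present in both sets."""
    keys = sorted(set(set_a) & set(set_b))
    return [set_a[k] for k in keys], [set_b[k] for k in keys]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def max_relative_difference(xs, ys):
    """Largest relative difference between paired intensities."""
    return max(abs(x - y) / max(abs(x), abs(y), 1e-12) for x, y in zip(xs, ys))

# Toy data for illustration only.
data_1 = {(1, 0, 0): 100.0, (0, 1, 0): 50.0, (0, 0, 1): 10.0, (1, 1, 0): 75.0}
data_2 = {(1, 0, 0): 92.0, (0, 1, 0): 61.0, (0, 0, 1): 14.0, (1, 1, 0): 70.0}

xs, ys = common_intensities(data_1, data_2)
print(pearson(xs, ys), max_relative_difference(xs, ys))
```

A high correlation alone is not conclusive, since genuinely isomorphous structures also give correlated intensities. It is the near-zero difference for every reflection that points to one and the same measurement.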
CIF versus FCF data Check
• The R & S values in the three lines # R= should be identical within rounding error.
• The reported and calculated residual density ranges should also be nearly identical.
• This is the case in the first example but not in the second where the CIF & FCF data do not match.
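The idea of this check can be sketched as follows: recompute the conventional R factor from the deposited reflection data and compare it with the value reported in the CIF. This is a minimal sketch, not the checkCIF/PLATON procedure. It assumes that |Fo| and |Fc| amplitudes have already been extracted from the FCF, uses the standard definition R1 = Σ||Fo| − |Fc|| / Σ|Fo| over all supplied reflections, and treats "within rounding error" as half a unit in the last reported decimal. The function names are hypothetical.

```python
def r1(f_obs, f_calc):
    """Conventional R1 = sum(||Fo| - |Fc||) / sum(|Fo|)."""
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator

def matches_within_rounding(reported, recalculated, decimals=4):
    """True if the recalculated value agrees with a value reported to
    `decimals` places (assumed tolerance: half a unit in the last place)."""
    return abs(reported - recalculated) <= 0.5 * 10 ** (-decimals) + 1e-12

# Toy amplitudes for illustration only.
f_obs = [10.0, 20.0, 30.0]
f_calc = [9.0, 21.0, 30.0]

recalculated = r1(f_obs, f_calc)
print(f"R1 = {recalculated:.4f}")
print(matches_within_rounding(0.0333, recalculated))
```

Reported R values are often quoted for a subset of reflections, for example those above an intensity threshold, so a real comparison would need to apply the same selection before concluding that CIF and FCF do not match.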
Example 2: Iron(III) Complex
Fe(III) Validation Part 1
Fe(III) Validation Part 2
Example 2: Difference Density Map
Fe Structure Re-refined
Conclusion ?
• Structure now O.K. after an erratum ?
• Search for similar (isomorphous) structures in the CSD.
• Yes, there is an isomorphous Mn complex published by a different set of authors from a different university.
• Let us compare both structures.
Isomorphous Mn(III) Complex
Mn Structure Validation Part 1
Mn Validation Part 2
Scatter Plot Fe versus Mn I(obs)
Fe and Mn Data Sets Identical !
Analysis on Fe/Mn Structures
• The Displacement parameters in the CIF for the H2O molecule in the Fe complex are different from those used in the final refinement.
• Reflection sets are identical for papers from two different sets of authors at different locations.
• CSD: Unusual coordination distances.
• Fraud or Error ?
• Withdraw/Retract one or both ?
Validation Challenges
• Avoid False Positive and Negative ALERTS
• Disordered structures (true or artifact)
• Handling of Twinning (data names missing)
• Powder structure validation (experts needed)
• Incommensurate structure validation (experts)
• Fabricated reflection data: can we detect them ?
• Education: what is the meaning of an ALERT ?
• Should validation criteria be different for structures published in chemical journals ?
Concluding Remarks
• PLATON includes a standalone Validation Tool. It is part of the WEB-based IUCr CheckCIF/PLATON Tool that is capably managed by Mike Hoyland (IUCr)
• Validation is still a learning process.
• Chemical insight might be very helpful and often decisive as a validation tool.
• Deposition of structure factors should be a requirement for all journals (The CCDC now accepts those along with the CIF).
Thanks To
• Martin Lutz and many others for taking the time to bring various unresolved issues to my attention with actual data.
• Send to [email protected]