Data analysis: I19 upgrade workshop, 11 Feb 2014
Overview
• Short history of automated processing for Diamond MX beamlines
• Effects of adding Pilatus detectors
• Current capabilities
• Downstream analysis
• Future developments
• Changes for chemical crystallography
• Benefits resulting from automated processing
Automated Processing at Diamond
• Automatic processing with xia2 since 2007/8 ish
• Took 10 – 20 minutes for “standard” ADSC Q315 data set
• At first users wanted to switch off automatic processing
• Complaints “xia2 is too slow!”
• Also: warned of impending Pilatus 6M
Automated Processing at Diamond
• Wrote fast_dp – drawing on experience from coding xia2 and the fact that XDS will work in parallel on a cluster
• Typically took <= 2 minutes for an ADSC data set
• When Pilatus arrived it took about the same time with 5-10 x as many images
• Which gets us to here…
Before the Pilatus Upgrade
• Phase 1 MX beamlines had ADSC Q315 detectors, typical readout time around 1s
• With typical exposure time would get 20 – 30 images (10 – 15 degrees) per minute
• 180 degree data set gave you time for (a cup of) tea – long data sets gave time for a meal
• Manual data processing could keep up with collection
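The figures on this slide can be turned into a quick estimate of how long a full sweep took. The sketch below is only arithmetic on the quoted rates; the function names are made up for illustration, and pairing the low ends (10 degrees, 20 images) and the high ends (15 degrees, 30 images) of the quoted ranges is an assumption.

```python
def sweep_minutes(total_degrees, degrees_per_minute):
    """Minutes needed to collect a sweep at a steady rotation rate."""
    return total_degrees / degrees_per_minute


def degrees_per_image(degrees_per_minute, images_per_minute):
    """Rotation width of one image implied by the two quoted rates."""
    return degrees_per_minute / images_per_minute


# 180 degree data set at the slow and fast ends of the quoted range
slowest = sweep_minutes(180, 10)
fastest = sweep_minutes(180, 15)

# Image width implied by 10 degrees and 20 images per minute
width = degrees_per_image(10, 20)

print(f"180 degrees took roughly {fastest:.0f} to {slowest:.0f} minutes")
print(f"Implied image width: {width} degrees")
```

Under these assumptions a 180 degree data set took roughly 12 to 18 minutes, which is about the 10 to 20 minutes xia2 needed to process it.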
After the Pilatus Upgrade
• Steady data collection speed will give 180 degrees of data in 3 minutes – much faster is possible (less than 1 minute)
• Fast sample changer much more important – can now change samples in ~ 40s
• Possible to record 12 – 15 data sets / hour
• Throughput potentially more than doubled
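As a rough check on these throughput numbers, the cycle time per sample can be added up directly. This is a minimal sketch: the function name and the `overhead_s` parameter (time for anything other than collection and sample change, such as centring) are assumptions for illustration, not figures from the slides.

```python
def datasets_per_hour(collect_s, change_s, overhead_s=0.0):
    """Data sets per hour for a given per-sample cycle time in seconds."""
    cycle_s = collect_s + change_s + overhead_s
    return 3600.0 / cycle_s


# 3 minutes of collection plus a ~40 s sample change, no other overhead
upper_bound = datasets_per_hour(180, 40)

# Hypothetical extra overhead per sample (an assumption, not from the slides)
with_20s = datasets_per_hour(180, 40, overhead_s=20)
with_80s = datasets_per_hour(180, 40, overhead_s=80)

print(f"No overhead:   {upper_bound:.1f} data sets / hour")
print(f"20 s overhead: {with_20s:.1f} data sets / hour")
print(f"80 s overhead: {with_80s:.1f} data sets / hour")
```

With no extra overhead the ceiling is a little over 16 data sets per hour; the quoted 12 to 15 per hour corresponds to roughly 20 to 80 s of additional time per sample under this simple model.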
What does this mean?
• Keeping up with data collection manually close to impossible
• Puts pressure on downstream analysis even with automated data processing
• fast_dp is now critical to get timely results
• Databases more important for tracking results
• No longer will you have time for a cup of tea
What else does this mean?
• Short shifts / remote access become useful (Dave will talk about this later)
• Speculative data collection now more sensible – if you are not sure whether to collect or not
• You need to bring a bigger hard drive
Current Capabilities (on MX)
• Per-image analysis
• fast_dp – get feedback on your data collection within 2 minutes of the experiment
• xia2 – more comprehensive data processing
• fast_ep – based on fast_dp output, try experimental phasing
• dimple – based on fast_dp output, try searching for ligands
-------------------------------------------------------
      Low resolution   28.89   28.89    1.36
     High resolution    1.33    5.95    1.33
              Rmerge   0.061   0.024   0.438
             I/sigma   13.40   46.30    1.40
        Completeness    99.4    98.9    94.5
        Multiplicity     5.2     5.0     2.6
  Anom. Completeness    95.2    98.9    67.8
  Anom. Multiplicity     2.6     3.1     1.1
   Anom. Correlation    99.9    99.9    73.1
               Nrefl  308449    3996   10729
             Nunique   59097     800    4081
           Mid-slope   1.032
                dF/F   0.077
          dI/sig(dI)   0.860
-------------------------------------------------------
Merging point group: P 4 2 2
Unit cell: 57.78 57.78 150.01 90.00 90.00 90.00
Processing took 00h 01m 26s (86 s) [308449 reflections]
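A summary like this is easy to pick apart for tracking in a database or a quick sanity check. The sketch below is plain text parsing of the rows shown on the slide, not part of fast_dp itself; the function name is hypothetical. Reading the three columns as overall, low-resolution shell and high-resolution shell is inferred from the resolution limits in the first two rows.

```python
SUMMARY = """\
Low resolution 28.89 28.89 1.36
High resolution 1.33 5.95 1.33
Rmerge 0.061 0.024 0.438
I/sigma 13.40 46.30 1.40
Completeness 99.4 98.9 94.5
Multiplicity 5.2 5.0 2.6
Anom. Completeness 95.2 98.9 67.8
Anom. Multiplicity 2.6 3.1 1.1
Anom. Correlation 99.9 99.9 73.1
Nrefl 308449 3996 10729
Nunique 59097 800 4081
Mid-slope 1.032
dF/F 0.077
dI/sig(dI) 0.860
"""


def parse_merging_stats(text):
    """Map each row label to its list of trailing numeric values."""
    stats = {}
    for line in text.splitlines():
        tokens = line.split()
        values = []
        # Peel numeric tokens off the end; what remains is the label.
        while tokens:
            try:
                values.append(float(tokens[-1]))
            except ValueError:
                break
            tokens.pop()
        if tokens and values:
            stats[" ".join(tokens)] = values[::-1]
    return stats


stats = parse_merging_stats(SUMMARY)

# Cross-check: reflections / unique reflections should match Multiplicity
derived = [round(n / u, 1) for n, u in zip(stats["Nrefl"], stats["Nunique"])]
print(derived, stats["Multiplicity"])
```

The cross-check at the end shows that the table is internally consistent: dividing Nrefl by Nunique in each column reproduces the Multiplicity row.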
But not just data processing
• Collection of screening images will result in strategy calculations
• Fluorescence scans are analyzed automatically
In summary
• Merging statistics from quick (but reasonable) processing in ~ 2 minutes
• Maps possible within ~ 3 – 4 minutes
• Automatic strategies can guide data collection
• Everything tracked in the database
Future developments
• Better handling of weak data / tiny crystals
• Room temperature / in situ data collection
• Pushing algorithm development
• …
Differences for Chemical Crystallography
• All sets consist of multiple sweeps
• Indexing harder as data sparse – for strategy / screening and processing
• Scaling more important due to absorption effects (normally)
• Many more spacegroups to consider
• Strategies more complex
• Downstream analysis perhaps more tractable
Benefits from automated analysis
• Close to real-time feedback on data collection
• Allows you to focus on experimental results, not driving GUIs (for processing)
Trouble for the users
• Data collection less cosy – no time for tea! no time to inspect every image! not enough time to process data by hand…
• Need to bring more samples (this surprised MX users for a while)
• You need to be more organized
• You need a bigger hard drive