
Data Analysis

I19 Upgrade Workshop, 11 Feb 2014

Overview

• Short history of automated processing for Diamond MX beamlines

• Effects of adding Pilatus detectors

• Current capabilities

• Downstream analysis

• Future developments

• Changes for chemical crystallography

• Benefits resulting from automated processing

Automated Processing at Diamond

• Automatic processing with xia2 since around 2007/8

• Took 10 – 20 minutes for a “standard” ADSC Q315 data set

• At first users wanted to switch off automatic processing

• Complaints: “xia2 is too slow!”

• Also: we were warned of the impending Pilatus 6M

Automated Processing at Diamond

• Wrote fast_dp, drawing on experience from coding xia2 and the fact that XDS will run in parallel on a cluster

• Typically took <= 2 minutes for an ADSC data set

• When the Pilatus arrived it took about the same time with 5 – 10 x as many images

• Which gets us to here…

Before the Pilatus Upgrade

• Phase 1 MX beamlines had ADSC Q315 detectors, with a typical readout time of around 1 s

• With a typical exposure time you would get 20 – 30 images (10 – 15 degrees) per minute

• 180 degree data set gave you time for (a cup of) tea – long data sets gave time for a meal

• Manual data processing could keep up with collection
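The figures on this slide can be checked with a little arithmetic. The sketch below is only an illustration: the function name `sweep_minutes` is hypothetical, and the only inputs are the rates quoted above (20 – 30 images, i.e. 10 – 15 degrees, per minute).

```python
def sweep_minutes(total_degrees: float, degrees_per_minute: float) -> float:
    """Minutes needed to rotate through total_degrees at a steady rate."""
    if degrees_per_minute <= 0:
        raise ValueError("degrees_per_minute must be positive")
    return total_degrees / degrees_per_minute


# Rates quoted on the slide: 20-30 images, i.e. 10-15 degrees, per minute.
degrees_per_image = 10 / 20        # 0.5 degrees per image (also 15 / 30)
slowest = sweep_minutes(180, 10)   # 18.0 minutes
fastest = sweep_minutes(180, 15)   # 12.0 minutes

print(f"{degrees_per_image} deg/image, "
      f"180 deg in {fastest:.0f}-{slowest:.0f} min")
```

So a 180 degree data set took roughly 12 to 18 minutes on the ADSC detectors, which is the "time for tea" the slide refers to.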

After the Pilatus Upgrade

• Steady data collection speed will give 180 degrees of data in 3 minutes – much faster is possible (less than 1 minute)

• Fast sample changer much more important – can now change samples in ~ 40s

• Possible to record 12 – 15 data sets / hour

• Throughput potentially more than doubled
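As a rough consistency check on these numbers, the sketch below combines the 3 minute sweep and the ~40 s sample change into a cycle time. The function names and the notion of a per-sample "overhead" (anything besides collection and sample change) are assumptions made for illustration, not something stated in the talk.

```python
def datasets_per_hour(collection_s: float, sample_change_s: float,
                      overhead_s: float = 0.0) -> float:
    """Data sets per hour for a fixed per-sample cycle time in seconds."""
    cycle_s = collection_s + sample_change_s + overhead_s
    if cycle_s <= 0:
        raise ValueError("cycle time must be positive")
    return 3600.0 / cycle_s


def implied_overhead_s(target_per_hour: float, collection_s: float,
                       sample_change_s: float) -> float:
    """Extra seconds per sample implied by a quoted data-sets-per-hour rate."""
    return 3600.0 / target_per_hour - (collection_s + sample_change_s)


# Slide figures: 180 degrees in 3 minutes (180 s), sample change ~40 s.
upper_bound = datasets_per_hour(180, 40)          # about 16.4 per hour
overhead_at_15 = implied_overhead_s(15, 180, 40)  # 20.0 s per sample
overhead_at_12 = implied_overhead_s(12, 180, 40)  # 80.0 s per sample

print(f"no overhead: {upper_bound:.1f} data sets / hour")
print(f"implied overhead: {overhead_at_15:.0f}-{overhead_at_12:.0f} s per sample")
```

With no other overhead the ceiling is about 16 data sets per hour, so the quoted 12 – 15 leaves somewhere between 20 and 80 s per sample for everything else.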

What does this mean?

• Keeping up with data collection manually is close to impossible

• Puts pressure on downstream analysis even with automated data processing

• fast_dp is now critical to get timely results

• Databases more important for tracking results

• No longer will you have time for a cup of tea

A sign of the times…

What else does this mean?

• Short shifts / remote access become useful (Dave will talk about this later)

• Speculative data collection now more sensible – if you are not sure whether to collect or not

• You need to bring a bigger hard drive

Current Capabilities (on MX)

• Per-image analysis

• fast_dp – get feedback on your data collection within 2 minutes of the experiment

• xia2 – more comprehensive data processing

• fast_ep – based on fast_dp output, try experimental phasing

• dimple – based on fast_dp output, try searching for ligands

Per image analysis

-------------------------------------------------------
Low resolution        28.89   28.89    1.36
High resolution        1.33    5.95    1.33
Rmerge                0.061   0.024   0.438
I/sigma               13.40   46.30    1.40
Completeness           99.4    98.9    94.5
Multiplicity            5.2     5.0     2.6
Anom. Completeness     95.2    98.9    67.8
Anom. Multiplicity      2.6     3.1     1.1
Anom. Correlation      99.9    99.9    73.1
Nrefl                308449    3996   10729
Nunique               59097     800    4081
Mid-slope             1.032
dF/F                  0.077
dI/sig(dI)            0.860
-------------------------------------------------------
Merging point group: P 4 2 2
Unit cell: 57.78 57.78 150.01 90.00 90.00 90.00
Processing took 00h 01m 26s (86 s) [308449 reflections]
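A summary like this is easy to pull into a script for tracking. The sketch below is not a fast_dp API: it only parses text laid out like the rows shown above. The function name `parse_summary` and the column names `overall`, `inner` and `outer` are assumptions, inferred from the resolution limits in the first two rows.

```python
# Column names are an assumption inferred from the resolution rows;
# the slide itself does not label the columns.
COLUMNS = ("overall", "inner", "outer")

SUMMARY = """\
Low resolution 28.89 28.89 1.36
High resolution 1.33 5.95 1.33
Rmerge 0.061 0.024 0.438
I/sigma 13.40 46.30 1.40
Completeness 99.4 98.9 94.5
Multiplicity 5.2 5.0 2.6
Anom. Completeness 95.2 98.9 67.8
Anom. Multiplicity 2.6 3.1 1.1
"""


def parse_summary(text: str) -> dict:
    """Map each row label to a dict of its three per-column values."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 4:
            continue  # too short to hold a label plus three values
        try:
            values = [float(p) for p in parts[-3:]]
        except ValueError:
            continue  # last three tokens are not all numbers
        label = " ".join(parts[:-3])
        stats[label] = dict(zip(COLUMNS, values))
    return stats


stats = parse_summary(SUMMARY)
print(stats["Rmerge"]["outer"])        # 0.438
print(stats["Completeness"]["overall"])  # 99.4
```

Rows with a single value (Mid-slope, dF/F, dI/sig(dI)) and the trailing point group, unit cell and timing lines would need their own handling; this sketch skips any row that does not end in exactly three numbers.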

But not just data processing

• Collection of screening images will result in strategy calculations

• Fluorescence scans are analyzed automatically

In summary

• Merging statistics from quick (but reasonable) processing in ~ 2 minutes

• Maps possible within ~ 3 – 4 minutes

• Automatic strategies can guide data collection

• Everything tracked in the database

Future developments

• Better handling of weak data / tiny crystals

• Room temperature / in situ data collection

• Pushing algorithm development

• …

Differences for Chemical Crystallography

• All sets consist of multiple sweeps

• Indexing harder as data sparse – for strategy / screening and processing

• Scaling more important due to absorption effects (normally)

• Many more spacegroups to consider

• Strategies more complex

• Downstream analysis perhaps more tractable

Benefits from automated analysis

• Close to real-time feedback on data collection

• Allows you to focus on experimental results, not driving GUIs (for processing)

Trouble for the users

• Data collection less cosy – no time for tea! no time to inspect every image! not enough time to process data by hand…

• Need to bring more samples (this surprised MX users for a while)

• You need to be more organized

• You need a bigger hard drive

Upshot… you will get