Ten Years and Change
![Page 1: Ten Years and Change](https://reader037.vdocuments.us/reader037/viewer/2022103007/568146ac550346895db3c7af/html5/thumbnails/1.jpg)
Ten Years and Change
the MX data archive at ALS 8.3.1
![Page 2: Ten Years and Change](https://reader037.vdocuments.us/reader037/viewer/2022103007/568146ac550346895db3c7af/html5/thumbnails/2.jpg)
Acknowledgements
ALS 8.3.1 creator: Tom Alber
8.3.1 PRT head: Jamie Cate
Center for Structure of Membrane Proteins
Membrane Protein Expression Center II
Center for HIV Accessory and Regulatory Complexes
W. M. Keck Foundation
Plexxikon, Inc.
M D Anderson CRC
University of California Berkeley
University of California San Francisco
National Science Foundation
University of California Campus-Laboratory Collaboration Grant
Henry Wheeler
The Advanced Light Source is supported by the Director, Office of Science, Office of Basic Energy Sciences, Materials Sciences Division, of the US Department of Energy under contract No. DE-AC02-05CH11231 at Lawrence Berkeley National Laboratory.
![Page 3: Ten Years and Change](https://reader037.vdocuments.us/reader037/viewer/2022103007/568146ac550346895db3c7af/html5/thumbnails/3.jpg)
ALS 8.3.1 data collection history
[Plot: terabytes (uncompressed), 0 to 70, vs. year, 2001 to 2013. Curves: actual; doubling = 2.8 years.]
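The "doubling = 2.8 years" curve on this plot is an exponential growth model. A minimal sketch of that model follows; the starting year `t0` and starting volume `v0` are hypothetical parameters, since the slide does not give the fitted values.

```python
def archive_size(year, t0, v0, doubling_years=2.8):
    """Exponential growth: the size doubles every `doubling_years` years.

    t0 and v0 (start year and start volume) are illustrative inputs,
    not values taken from the slide.
    """
    return v0 * 2 ** ((year - t0) / doubling_years)


def annual_growth_rate(doubling_years=2.8):
    """Fractional growth per year implied by a given doubling time."""
    return 2 ** (1 / doubling_years) - 1
```

A doubling time of 2.8 years corresponds to growth of roughly 28% per year.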
![Page 4: Ten Years and Change](https://reader037.vdocuments.us/reader037/viewer/2022103007/568146ac550346895db3c7af/html5/thumbnails/4.jpg)
ALS 8.3.1 data collection history
[Plot: terabytes (uncompressed), 0 to 70, vs. year, 2001 to 2013, broken down by detector: Proteum 300, Q210, Q315 (907), Q315r (926).]
![Page 5: Ten Years and Change](https://reader037.vdocuments.us/reader037/viewer/2022103007/568146ac550346895db3c7af/html5/thumbnails/5.jpg)
ALS 8.3.1 data collection history
[Plot: images x 10^6, 0 to 5, vs. year, 2001 to 2013, broken down by detector: Proteum 300, Q210, Q315 (907), Q315r (926).]
![Page 6: Ten Years and Change](https://reader037.vdocuments.us/reader037/viewer/2022103007/568146ac550346895db3c7af/html5/thumbnails/6.jpg)
DVD data archive: 68 TB
![Page 7: Ten Years and Change](https://reader037.vdocuments.us/reader037/viewer/2022103007/568146ac550346895db3c7af/html5/thumbnails/7.jpg)
DVD data archive
![Page 8: Ten Years and Change](https://reader037.vdocuments.us/reader037/viewer/2022103007/568146ac550346895db3c7af/html5/thumbnails/8.jpg)
![Page 9: Ten Years and Change](https://reader037.vdocuments.us/reader037/viewer/2022103007/568146ac550346895db3c7af/html5/thumbnails/9.jpg)
![Page 10: Ten Years and Change](https://reader037.vdocuments.us/reader037/viewer/2022103007/568146ac550346895db3c7af/html5/thumbnails/10.jpg)
50 TB
![Page 11: Ten Years and Change](https://reader037.vdocuments.us/reader037/viewer/2022103007/568146ac550346895db3c7af/html5/thumbnails/11.jpg)
Primary failure mode of DVDs
![Page 12: Ten Years and Change](https://reader037.vdocuments.us/reader037/viewer/2022103007/568146ac550346895db3c7af/html5/thumbnails/12.jpg)
Primary failure mode of DVDs
3000 files remain unrecoverable (~0.1%)
![Page 13: Ten Years and Change](https://reader037.vdocuments.us/reader037/viewer/2022103007/568146ac550346895db3c7af/html5/thumbnails/13.jpg)
Which data go with which PDB?
• 260,000 images are called “test”
• cell: 48 62 84 90 101 104
  – is within 5 Å and 5° of 16,000 PDBs
focusing on 2001-2006:
• 490 PDBs credit ALS 8.3.1 with data
• 44 of these didn’t actually collect data
• 64 collected data, but no credit
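The "within 5 Å and 5°" criterion above is a simple tolerance test on the six cell parameters. The sketch below compares two cells exactly as given; it assumes no cell reduction or axis permutation, which the slide does not specify, and the candidate cells are made up for illustration.

```python
def cells_match(cell_a, cell_b, length_tol=5.0, angle_tol=5.0):
    """True if every edge is within length_tol (Å) and every angle
    within angle_tol (degrees) of the corresponding value in the other cell."""
    lengths_ok = all(abs(x - y) <= length_tol
                     for x, y in zip(cell_a[:3], cell_b[:3]))
    angles_ok = all(abs(x - y) <= angle_tol
                    for x, y in zip(cell_a[3:6], cell_b[3:6]))
    return lengths_ok and angles_ok


query = (48, 62, 84, 90, 101, 104)  # cell quoted on the slide
# Hypothetical candidate cells, not real PDB entries:
close_cell = (50.5, 60.0, 88.0, 92.0, 99.0, 108.0)
far_cell = (48, 62, 95, 90, 101, 104)

print(cells_match(query, close_cell))
print(cells_match(query, far_cell))
```

With tolerances this loose, a single cell can match thousands of deposits, which is the point the slide makes.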
![Page 14: Ten Years and Change](https://reader037.vdocuments.us/reader037/viewer/2022103007/568146ac550346895db3c7af/html5/thumbnails/14.jpg)
Which data go with which PDB?
1. images from 2001-2006: 1,604,031
2. collected “near” edges: 682,712
3. find “runs” of >10 images: 3602
4. unify multi-wedge sets: 3331
5. run labelit & XDS: 2524
6. >70% complete?: 1479
7. I/σ > 10: 1054
8. reduced cell vs PDB: 1 to 200+
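The "find runs of >10 images" step can be sketched as grouping files by name prefix and looking for consecutive frame numbers. The filename pattern `<prefix>_<frame>.img` and the function name `find_runs` are assumptions for illustration; the slides do not describe the actual implementation.

```python
import re
from collections import defaultdict

# Assumed naming scheme: anything, an underscore, a frame number, ".img"
FRAME_RE = re.compile(r"^(?P<prefix>.+)_(?P<frame>\d+)\.img$")


def find_runs(filenames, min_length=11):
    """Return (prefix, first_frame, last_frame) for every run of
    consecutive frame numbers with at least min_length images
    (the default of 11 means "more than 10")."""
    frames_by_prefix = defaultdict(set)
    for name in filenames:
        m = FRAME_RE.match(name)
        if m:
            frames_by_prefix[m.group("prefix")].add(int(m.group("frame")))

    runs = []
    for prefix, frames in sorted(frames_by_prefix.items()):
        ordered = sorted(frames)
        start = prev = ordered[0]
        # The trailing None closes out the final run.
        for frame in ordered[1:] + [None]:
            if frame is not None and frame == prev + 1:
                prev = frame
                continue
            if prev - start + 1 >= min_length:
                runs.append((prefix, start, prev))
            if frame is not None:
                start = prev = frame
    return runs
```

Short bursts of images, such as a handful of test shots, fall below the threshold and are dropped.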
![Page 15: Ten Years and Change](https://reader037.vdocuments.us/reader037/viewer/2022103007/568146ac550346895db3c7af/html5/thumbnails/15.jpg)
Unit Cell: 90.9 90.9 46.8 90 90 120
[Plot: best Rcryst after rigid-body refinement, 0.3 to 0.6, vs. RMS unit cell length deviation (Å), 0 to 2. Labels: 1hh7 M. TB CSOR; 1rb5; myoglobin.]
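The x-axis quantity, RMS unit cell length deviation, can be sketched as below. The exact definition used for the plot is not given, so this assumes a plain root-mean-square over the three edge lengths; the candidate cell is made up.

```python
import math


def rms_cell_length_deviation(cell_a, cell_b):
    """RMS difference (Å) between the a, b, c edges of two unit cells.

    Assumed definition: square root of the mean squared edge difference.
    """
    diffs = [x - y for x, y in zip(cell_a[:3], cell_b[:3])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


observed = (90.9, 90.9, 46.8, 90, 90, 120)   # cell quoted on the slide
candidate = (91.9, 91.9, 46.8, 90, 90, 120)  # hypothetical PDB cell
deviation = rms_cell_length_deviation(observed, candidate)
```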
![Page 16: Ten Years and Change](https://reader037.vdocuments.us/reader037/viewer/2022103007/568146ac550346895db3c7af/html5/thumbnails/16.jpg)
MAD/SAD datasets
[Plot: Riso vs PDB deposit, 0 to 1, vs. best Rcryst after rigid-body refinement, 0.20 to 0.60. Regions: Published; non-isomorphous; Unsolved?]
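Riso compares structure-factor amplitudes from two data sets over their common reflections. Several conventions exist, and the slide does not say which was used; the sketch below assumes one common form, with the mean amplitude in the denominator and both data sets already on the same scale.

```python
def r_iso(f_a, f_b):
    """Isomorphous R factor between two amplitude sets.

    f_a, f_b: dicts mapping Miller indices (h, k, l) to amplitudes.
    Assumed form: sum|Fa - Fb| / sum((Fa + Fb) / 2) over common reflections.
    """
    common = f_a.keys() & f_b.keys()
    if not common:
        raise ValueError("no reflections in common")
    numerator = sum(abs(f_a[hkl] - f_b[hkl]) for hkl in common)
    denominator = sum(0.5 * (f_a[hkl] + f_b[hkl]) for hkl in common)
    return numerator / denominator
```

Identical data give 0; unrelated data give large values, which is what makes the score useful for separating published from non-isomorphous or unsolved cases.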
![Page 17: Ten Years and Change](https://reader037.vdocuments.us/reader037/viewer/2022103007/568146ac550346895db3c7af/html5/thumbnails/17.jpg)
Responses to inquiries
“I have to find my old note book as I have no idea what that is.”
“I have changed jobs a few times since and am really far away from crystallography now.”
“Will see what I can find.”
“We solved it but never published it. Sorry!”
![Page 18: Ten Years and Change](https://reader037.vdocuments.us/reader037/viewer/2022103007/568146ac550346895db3c7af/html5/thumbnails/18.jpg)
EGDA
Dec 01 19:45:12 2001 egda46_*1_E#_###.img (1112 images, Se MAD)
Dec 02 15:10:06 2001 egda27_*1_###.img (180, 1A, native?)
Dec 02 19:21:55 2001 egdau1_*1_###.img (427, 8000eV (U?) SAD)
Dec 02 20:58:26 2001 egdau1_*2_###.img (360, 8000eV (U?) SAD)
Jun 01 14:07:43 2002 egda60_*1_###.img (360, Lutetium SAD)
“I think that these EGDA data sets are very likely some of xxx’s data sets, he was working on E.coli guanine deaminase, something he brought from yyy. No structure was ever published James, xxx was unable to solve the structure from these data.”
![Page 19: Ten Years and Change](https://reader037.vdocuments.us/reader037/viewer/2022103007/568146ac550346895db3c7af/html5/thumbnails/19.jpg)
~2.9 Å
P21212
R = 0.32
Rfree = 0.39
PDB ID: ????
E. coli guanine deaminase
![Page 20: Ten Years and Change](https://reader037.vdocuments.us/reader037/viewer/2022103007/568146ac550346895db3c7af/html5/thumbnails/20.jpg)
Metadata: can we rely on it?
Duquerroy, et al. (1994). "Lobster enolase crystallized by serendipity", Proteins: Struct., Funct., Bioinf. 18, 390-393.
authors were after lobster arginine kinase
got enolase instead
arginine kinase structure still unknown
![Page 21: Ten Years and Change](https://reader037.vdocuments.us/reader037/viewer/2022103007/568146ac550346895db3c7af/html5/thumbnails/21.jpg)
compresses 4.2x
raw image
![Page 22: Ten Years and Change](https://reader037.vdocuments.us/reader037/viewer/2022103007/568146ac550346895db3c7af/html5/thumbnails/22.jpg)
compresses 337x
just spots
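The gap between these two ratios comes from noise: background noise is hard to compress, while an image containing only spots is mostly empty. The sketch below illustrates the effect on synthetic data with `zlib`; the compressor, image size and pixel values are all assumptions, and the numbers it produces are not expected to match the slide.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Uncompressed size divided by zlib-compressed size."""
    return len(data) / len(zlib.compress(data, level=9))


rng = random.Random(0)
n_pixels = 256 * 256

# Synthetic "raw" image: noisy background on every pixel.
noisy = bytes(rng.randrange(20, 40) for _ in range(n_pixels))

# Synthetic "just spots" image: zero everywhere except a few bright pixels.
spots_only = bytearray(n_pixels)
for _ in range(50):
    spots_only[rng.randrange(n_pixels)] = 255

raw_ratio = compression_ratio(noisy)
spots_ratio = compression_ratio(bytes(spots_only))
```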
![Page 23: Ten Years and Change](https://reader037.vdocuments.us/reader037/viewer/2022103007/568146ac550346895db3c7af/html5/thumbnails/23.jpg)
compresses 5x, but only one per dataset!
pixel-wise median across dataset
![Page 24: Ten Years and Change](https://reader037.vdocuments.us/reader037/viewer/2022103007/568146ac550346895db3c7af/html5/thumbnails/24.jpg)
compresses 3.5x
deviation from median in “non-spot” areas
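The pixel-wise median background and the deviation from it in non-spot areas can be sketched with plain nested lists. The spot mask is taken as given here; how spots were identified is not stated in the slides, and the function names are hypothetical.

```python
from statistics import median


def pixelwise_median(images):
    """Median of each pixel across a dataset.

    images: list of equally sized 2-D lists of pixel values.
    """
    rows, cols = len(images[0]), len(images[0][0])
    return [[median(img[r][c] for img in images) for c in range(cols)]
            for r in range(rows)]


def non_spot_deviation(image, background, spot_mask):
    """Image minus background, with spot pixels set to zero.

    spot_mask[r][c] is True where a pixel belongs to a spot (assumed input).
    """
    return [[0 if spot_mask[r][c] else image[r][c] - background[r][c]
             for c in range(len(image[0]))]
            for r in range(len(image))]
```

Because the median ignores the occasional bright spot at a given pixel, one background image can serve a whole dataset, and only the small residuals need to be stored per image.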
![Page 25: Ten Years and Change](https://reader037.vdocuments.us/reader037/viewer/2022103007/568146ac550346895db3c7af/html5/thumbnails/25.jpg)
compressed ~50x after h264 of non-spot areas
![Page 26: Ten Years and Change](https://reader037.vdocuments.us/reader037/viewer/2022103007/568146ac550346895db3c7af/html5/thumbnails/26.jpg)
compresses 5.2x
difference between raw and compressed
![Page 27: Ten Years and Change](https://reader037.vdocuments.us/reader037/viewer/2022103007/568146ac550346895db3c7af/html5/thumbnails/27.jpg)
Lossy compression vs R/Rfree
[Plot: R factor, 0.18 to 0.38, vs. compression ratio, 1 to 100. Curves: R_cryst, R_free.]
![Page 28: Ten Years and Change](https://reader037.vdocuments.us/reader037/viewer/2022103007/568146ac550346895db3c7af/html5/thumbnails/28.jpg)
backblaze.com “pod” server
backblaze.com offers “unlimited storage” data backup for $5/month.
![Page 29: Ten Years and Change](https://reader037.vdocuments.us/reader037/viewer/2022103007/568146ac550346895db3c7af/html5/thumbnails/29.jpg)
backblaze offers “unlimited storage” data backup for $5/month.
![Page 30: Ten Years and Change](https://reader037.vdocuments.us/reader037/viewer/2022103007/568146ac550346895db3c7af/html5/thumbnails/30.jpg)
backblaze does not sell these “pods”, but “protocase.com” does.
![Page 31: Ten Years and Change](https://reader037.vdocuments.us/reader037/viewer/2022103007/568146ac550346895db3c7af/html5/thumbnails/31.jpg)
![Page 32: Ten Years and Change](https://reader037.vdocuments.us/reader037/viewer/2022103007/568146ac550346895db3c7af/html5/thumbnails/32.jpg)
Summary
• saving data could double productivity
• unit cell is not a good score
• lossy compression: rallying cry?
• backup vs archive
• metadata: what do we really know?
![Page 33: Ten Years and Change](https://reader037.vdocuments.us/reader037/viewer/2022103007/568146ac550346895db3c7af/html5/thumbnails/33.jpg)
Brief Summary
• this is a lot of work.
• who is going to pay for it?