2nd Microscopy Congress: Public archiving of bio-imaging data - perspectives, challenges and outlook
TRANSCRIPT
Public archiving of bio-imaging data – perspectives, challenges and outlook
Ardan Patwardhan
Outline
• Introduction
• EMDB and EMPIAR status
• Resources for EMDB and EMPIAR
• On-going projects, initiatives and plans
Introduction
Molecular and Cellular Structure
• Maintain and manage archives
  • PDB for atomic coordinate models
  • EMDB for 3DEM reconstructions
  • EMPIAR for 3DEM raw data
• Develop and maintain web-services – searching, visualisation and validation
• Facilitate community-wide initiatives
• Key themes – integration (with other bioinformatics resources and across imaging scales) and validation
Structural data archives

| Archive | Type of data | Founded | Organization | Funding | # people | # entries | Size |
|---|---|---|---|---|---|---|---|
| PDB | Atomic coordinate model structures | 1971 | wwPDB (EBI, RCSB, PDBj, BMRB) | Core + grants | 60-80 | 124286 | 1 TB (8 MB) |
| EMDB | 3DEM volume structures | 2002 | EBI (+ RCSB, PDBj) | Core + grants | <10 | 4276 | 340 GB (80 MB) |
| EMPIAR | Raw image data for EMDB structures | 2014 | EBI | grant | <5 | 61 | 40 TB (660 GB) |

Stats until 9th Nov 2016
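
The value in parentheses in the Size column appears to be the mean size of one entry, that is, total archive size divided by the number of entries. The slide does not say this explicitly, so treat it as an inference. The short sketch below redoes the arithmetic from the figures on this slide, assuming decimal units (1 TB = 1000 GB = 1,000,000 MB); the names `ARCHIVES` and `mean_entry_size_mb` are made up for illustration.

```python
# Totals and entry counts are copied from the slide (stats until 9 Nov 2016).
# Assumption: decimal units, 1 TB = 1000 GB = 1,000,000 MB.
ARCHIVES = {
    # name: (total size in MB, number of entries)
    "PDB": (1 * 1_000_000, 124286),
    "EMDB": (340 * 1_000, 4276),
    "EMPIAR": (40 * 1_000_000, 61),
}


def mean_entry_size_mb(total_mb: float, entries: int) -> float:
    """Return the mean size of a single entry in MB."""
    if entries <= 0:
        raise ValueError("entries must be positive")
    return total_mb / entries


if __name__ == "__main__":
    for name, (total_mb, entries) in ARCHIVES.items():
        size = mean_entry_size_mb(total_mb, entries)
        print(f"{name}: about {size:,.0f} MB per entry")
```

The results land close to the parenthesised values (roughly 8 MB, 80 MB and 650 to 660 GB), which supports reading them as per-entry means.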
What goes where...
• Final single-particle and sub-tomogram average maps must go to EMDB (tomograms strongly recommended)
• Fitted models must go to PDB
• Deposition of raw image data to EMPIAR is encouraged
EMDB: Final map
EMPIAR: Raw image data
PDB: Fitted model
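
The deposition rules on this slide can be written down as a small lookup table. This is only an illustrative sketch: the data-type labels and the names `Archive`, `DEPOSITION_RULES` and `where_to_deposit` are hypothetical and do not belong to any deposition system, while the requirement wording follows the slide.

```python
from enum import Enum


class Archive(Enum):
    EMDB = "EMDB"
    PDB = "PDB"
    EMPIAR = "EMPIAR"


# Hypothetical data-type labels mapped to (archive, requirement level).
# The requirement levels mirror the wording of the slide.
DEPOSITION_RULES = {
    "single_particle_map": (Archive.EMDB, "must"),
    "subtomogram_average_map": (Archive.EMDB, "must"),
    "tomogram": (Archive.EMDB, "strongly recommended"),
    "fitted_model": (Archive.PDB, "must"),
    "raw_image_data": (Archive.EMPIAR, "encouraged"),
}


def where_to_deposit(data_type: str) -> tuple[Archive, str]:
    """Return the target archive and requirement level for a data type."""
    try:
        return DEPOSITION_RULES[data_type]
    except KeyError:
        raise ValueError(f"unknown data type: {data_type!r}") from None


if __name__ == "__main__":
    for data_type in DEPOSITION_RULES:
        archive, level = where_to_deposit(data_type)
        print(f"{data_type}: {archive.value} ({level})")
```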
Benefits of public archiving
• Reuse of data
  • starting models
  • compare structures of different functional states
  • different emphasis may lead to new discoveries
• Validation, methods development, testing, training
• Safe storage of data
• Integration of data with other public archives
• A resource for data mining
• Enables a birds-eye perspective of the field
What does archiving involve?
• Working with the community, partners and journals to achieve a consensus on practices, policies and procedures
• Adapting to changing needs of data and meta-data collection
  • new sample preparation methods
  • new validation methods
• Providing means to deposit data, e.g., web-based deposition systems
• Curating data – automated + manual, remediation
  • maximize structured annotation, minimize free-text
• Developing added value resources for searching, validating and visualizing data
Viability
• Community support
• Value – uploads versus downloads
• Data transfer technologies – Aspera, Globus
• Data storage – file systems, object stores
• Data fidelity – quality measures and validation
• Annotation – structured versus unstructured
• Centralised versus distributed
EMPIAR
• Electron microscopy pilot (or public?) image archive
• Started in 2014
• Raw 2D image datasets related to EMDB
• Usage: validation, development, testing, teaching and…
• Safe storage of your data!
• Was source for data in EM Map Validation Challenge
• Multi-frame micrographs, averaged micrographs, particle-stacks, tilt series
• Uses Aspera, Globus, ftp, http for data transfers
Websites
• emdb-empiar.org – EMDB website
• empiar.org – EMPIAR website
• pdbe.org – PDBe website
• wwpdb.org – Coordinating organization for PDB archive
• emdatabank.org – EMDataBank NIH project website
• https://www.facebook.com/proteindatabank
• https://twitter.com/pdbeurope
EMDB and EMPIAR status
EMDB trends – released entries
Stats until 2 Nov 2016
EMPIAR metrics
• Number of entries: 61 (40 TB; average size ~ 650 GB)
• 7 TB+ sets; one 10 TB+ dataset
• Transfer speed: uploads 1-2 TB/24h (Europe, US, Australia)
• “empiar” cited 20+ times in full-text open-access papers
• Nature Methods publication (Iudin et al., 2016)
[Charts, 2014 to 2016: Aspera uploads/month (users); Aspera uploads/month (TB); Total downloads (users); Total downloads (data)]
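
To put the quoted upload speed of 1-2 TB per 24 hours in more familiar terms, the sketch below converts it to a sustained rate in MB/s and estimates how long a dataset of a given size would take to upload. It assumes decimal units (1 TB = 1,000,000 MB) and a constant transfer rate. The function names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def rate_mb_per_s(tb_per_day: float) -> float:
    """Convert a transfer rate in TB per 24 h to MB/s (decimal units)."""
    return tb_per_day * 1_000_000 / SECONDS_PER_DAY


def upload_days(dataset_tb: float, tb_per_day: float) -> float:
    """Estimate upload time in days, assuming a constant transfer rate."""
    if tb_per_day <= 0:
        raise ValueError("tb_per_day must be positive")
    return dataset_tb / tb_per_day


if __name__ == "__main__":
    for speed in (1.0, 2.0):  # the range quoted on the slide
        print(f"{speed:.0f} TB/24h is about {rate_mb_per_s(speed):.1f} MB/s")
        print(f"  10 TB dataset: about {upload_days(10, speed):.0f} days")
        print(f"  0.65 TB dataset: about {upload_days(0.65, speed) * 24:.0f} hours")
```

At these rates an average-sized entry (about 650 GB) uploads in well under a day, while the largest datasets take on the order of a week.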
Resources for EMDB and EMPIAR
Searching EMDB - quick links + latest entries
emdb-empiar.org
EMStats – journal stats
Volume slicer
• Available for all EMDB entries
• Published in J Struct Biol (Salavert Torres et al., 2016)
emdb-empiar.org/emd-2363/3dslice
EMPIAR website
empiar.org
EMPIAR entry pages
empiar.org/empiar-10030
EMPIAR API
empiar.org/api/entry/empiar-10004
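
A minimal sketch of calling the API address shown on this slide, using only the Python standard library. Several things here are assumptions rather than documented behaviour: the `http://` scheme, that the response body is JSON, and that the address is still valid (it is copied from a 2016 slide and may have moved since). The names `API_BASE`, `entry_url` and `fetch_entry` are made up for this example.

```python
import json
import re
from urllib.request import urlopen

# Address as shown on the slide; the scheme is an assumption and the
# service may have moved since 2016.
API_BASE = "http://empiar.org/api/entry"

# Accession pattern inferred from the examples on the slides (empiar-10004).
_ACCESSION = re.compile(r"^empiar-\d+$")


def entry_url(accession: str) -> str:
    """Build the API URL for one EMPIAR entry, e.g. 'empiar-10004'."""
    accession = accession.strip().lower()
    if not _ACCESSION.match(accession):
        raise ValueError(f"not an EMPIAR accession: {accession!r}")
    return f"{API_BASE}/{accession}"


def fetch_entry(accession: str, timeout: float = 30.0):
    """Download and parse entry metadata (assumes a JSON response)."""
    with urlopen(entry_url(accession), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    print(entry_url("EMPIAR-10004"))
    # fetch_entry("empiar-10004") would perform the actual network request.
```

Validating the accession before building the URL keeps malformed input from ever reaching the network call.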
On-going projects, initiatives and plans
Volume browser
• Integrated visualisation of structural data
• Spanning scales from cells to molecules
Expert workshop on “3D segmentations and transformations - building bridges between cellular and molecular structural biology”
Madingley Hall, 6-7 Dec 2015
Co-funded by
File format and translators
• EMDB Segmentation File Format (EMDB-SFF)
  • adds structured biological annotation
  • handles transforms between tomograms and subtomograms
• Python scripts to read Segger, IMOD and Amira and convert to EMDB-SFF
• Working on displaying segmentations in Omero
• Public open source distribution through CCP-EM
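
To illustrate the two ideas named for the segmentation format (structured annotation, and transforms between subtomogram and tomogram coordinates), here is a small sketch. It is not the EMDB-SFF schema or any real reader: the `Segment` class, its fields and `apply_transform` are hypothetical, and a 3x4 affine matrix is just one common way to store such a transform.

```python
from dataclasses import dataclass, field

# A 3x4 affine transform stored row by row: the first three columns hold
# rotation/scale, the fourth holds the translation.
Transform = tuple[tuple[float, float, float, float], ...]

IDENTITY: Transform = (
    (1.0, 0.0, 0.0, 0.0),
    (0.0, 1.0, 0.0, 0.0),
    (0.0, 0.0, 1.0, 0.0),
)


@dataclass
class Segment:
    """Hypothetical segment record, not the actual EMDB-SFF layout."""
    segment_id: int
    name: str  # structured annotation instead of free text
    ontology_ids: list[str] = field(default_factory=list)
    transform: Transform = IDENTITY  # subtomogram -> tomogram coordinates


def apply_transform(transform: Transform, point: tuple[float, float, float]):
    """Map a point from subtomogram to tomogram coordinates."""
    x, y, z = point
    return tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in transform)


if __name__ == "__main__":
    # Pure translation: place the subtomogram origin at (10, 20, 30).
    shift: Transform = (
        (1.0, 0.0, 0.0, 10.0),
        (0.0, 1.0, 0.0, 20.0),
        (0.0, 0.0, 1.0, 30.0),
    )
    seg = Segment(1, "mitochondrion", ["GO:0005739"], shift)
    print(apply_transform(seg.transform, (1.0, 2.0, 3.0)))
```

Keeping the transform alongside each segment means a subtomogram average can be placed back into its parent tomogram without resampling the volume data.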
Future directions
• Archiving for related imaging modalities including
  • 3D scanning electron microscopy
  • correlative light and electron microscopy
  • soft X-ray tomography
• Data harvesting pipelines
• Validation
  • Deposition support for new kinds of validation data
  • Validation servers, e.g., for visual analysis, map versus model FSC
  • Data-mining EMDB to develop new validation metrics
• Fast archive-wide sub-structure volumetric (or shape-based) searches
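
For reference, the usual definition of the Fourier shell correlation (FSC) mentioned under validation is given below. It is the standard textbook form and is not taken from the slides. Here $F_1$ and $F_2$ are the Fourier transforms of the two volumes being compared, and the sums run over all Fourier voxels $\mathbf{k}$ in the shell at spatial frequency $s$. For a map versus model comparison, one plausible reading is that $F_1$ comes from the experimental map and $F_2$ from a map simulated from the fitted model.

```latex
\mathrm{FSC}(s) =
\frac{\sum_{\mathbf{k} \in \mathrm{shell}(s)} F_1(\mathbf{k})\, F_2^{*}(\mathbf{k})}
     {\sqrt{\sum_{\mathbf{k} \in \mathrm{shell}(s)} \lvert F_1(\mathbf{k}) \rvert^{2}
            \;\sum_{\mathbf{k} \in \mathrm{shell}(s)} \lvert F_2(\mathbf{k}) \rvert^{2}}}
```

Values near 1 indicate that the two volumes agree at that spatial frequency, and values near 0 indicate no correlation.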
Acknowledgements
• Gerard Kleywegt
• EM group
  • Sanja Abbott
  • Andrii Iudin
  • Paul Korir
  • Carlos Lugo
  • Eduardo Sanz Garcia
  • Jose Salavert Torres (UPV)
  • Ingvar Lagerstedt (EL)
  • Maya Holmdahl (UU)
  • Vladislav Lysenkov (MAMK)
• Birkbeck
  • Maya Topf
  • Agnel Praveen Joseph
  • Helen Saibil
• Baylor – Wah Chiu
• RCSB – Cathy Lawson
• Francis Crick
  • Lucy Collinson
  • Raffaella Carzaniga
• STFC
  • Martyn Winn
  • Tom Burnley
• Dundee
  • Jason Swedlow
  • Josh Moore
• CNB Madrid
  • Jose Maria Carazo
  • Pablo Conesa
  • Jose Miguel de la Rosa Trevin
  • Joan Segura Mora
• And many more!