![Page 1: analysis New developments and scientific case for offline ... · Offline Data Analysis: why mongo More versatile No schema constraints Good for prototyping GridFS storing large files](https://reader033.vdocuments.us/reader033/viewer/2022052720/5f096a357e708231d426ba33/html5/thumbnails/1.jpg)
ISPyB Collaboration Meeting
New developments and scientific case for offline data analysis
Alex de Maria Antolinos and Gianluca Santoni, ESRF
![Page 2: analysis New developments and scientific case for offline ... · Offline Data Analysis: why mongo More versatile No schema constraints Good for prototyping GridFS storing large files](https://reader033.vdocuments.us/reader033/viewer/2022052720/5f096a357e708231d426ba33/html5/thumbnails/2.jpg)
![Page 3: analysis New developments and scientific case for offline ... · Offline Data Analysis: why mongo More versatile No schema constraints Good for prototyping GridFS storing large files](https://reader033.vdocuments.us/reader033/viewer/2022052720/5f096a357e708231d426ba33/html5/thumbnails/3.jpg)
Overview
● Data Model Meeting at Soleil (September)
● User Interface
  ○ Roadmap
  ○ EXI
  ○ EXI2
● Offline data analysis
● Upgrade of JBoss Server
● Single-Sign On
![Page 4: analysis New developments and scientific case for offline ... · Offline Data Analysis: why mongo More versatile No schema constraints Good for prototyping GridFS storing large files](https://reader033.vdocuments.us/reader033/viewer/2022052720/5f096a357e708231d426ba33/html5/thumbnails/4.jpg)
Data Model Screening Tables
![Page 5: analysis New developments and scientific case for offline ... · Offline Data Analysis: why mongo More versatile No schema constraints Good for prototyping GridFS storing large files](https://reader033.vdocuments.us/reader033/viewer/2022052720/5f096a357e708231d426ba33/html5/thumbnails/5.jpg)
Screening Tables
● GitHub issue #46
  ○ https://github.com/ispyb/ispyb-database-modeling/issues/46
● Clean up old screening tables
  ○ Unused columns
  ○ Unused tables
  ○ Refactor some tables
● Side effects:
  ○ Changes might break compatibility with the old user interface!
  ○ Changes to be made in MxCube
● No deadline defined for this (waiting for input from collaborators)
● Please have a look and comment
![Page 6: analysis New developments and scientific case for offline ... · Offline Data Analysis: why mongo More versatile No schema constraints Good for prototyping GridFS storing large files](https://reader033.vdocuments.us/reader033/viewer/2022052720/5f096a357e708231d426ba33/html5/thumbnails/6.jpg)
Screening tables
[Figure: current data model (left) and proposed data model (right)]
![Page 7: analysis New developments and scientific case for offline ... · Offline Data Analysis: why mongo More versatile No schema constraints Good for prototyping GridFS storing large files](https://reader033.vdocuments.us/reader033/viewer/2022052720/5f096a357e708231d426ba33/html5/thumbnails/7.jpg)
User Interface
![Page 8: analysis New developments and scientific case for offline ... · Offline Data Analysis: why mongo More versatile No schema constraints Good for prototyping GridFS storing large files](https://reader033.vdocuments.us/reader033/viewer/2022052720/5f096a357e708231d426ba33/html5/thumbnails/8.jpg)
Roadmap
● EXI will become the official ISPyB UI at the ESRF from the 2020 startup
● The old user interface will be deprecated, with no further support or maintenance
● EXI2 has been in development since April 2019
ISPyB (2004 - 2020): https://ispyb.esrf.fr
EXI (2014 - ?): https://exi.esrf.fr
EXI2 (2019 - ?): https://exi2.esrf.fr
![Page 9: analysis New developments and scientific case for offline ... · Offline Data Analysis: why mongo More versatile No schema constraints Good for prototyping GridFS storing large files](https://reader033.vdocuments.us/reader033/viewer/2022052720/5f096a357e708231d426ba33/html5/thumbnails/9.jpg)
Why do we need EXI2?
● It makes it easier for developers to join the project
  ○ It uses the same technologies as MxCube
  ○ Components can be reused between MxCube and EXI2
● Responsive (works on mobile devices)
● It is modular
Features:
● No major changes to how data is displayed (a copy of EXI with some improvements)
● It includes offline data analysis
![Page 10: analysis New developments and scientific case for offline ... · Offline Data Analysis: why mongo More versatile No schema constraints Good for prototyping GridFS storing large files](https://reader033.vdocuments.us/reader033/viewer/2022052720/5f096a357e708231d426ba33/html5/thumbnails/10.jpg)
Do you want to collaborate?
● Suggestions and ideas
● Specific requirements
● Follow-up of the project
● Active developments
● Testing
https://exi2.esrf.fr currently points to the ISPyB instance of the ESRF, but it could also point to your public ISPyB instance so that you can test it
![Page 11: analysis New developments and scientific case for offline ... · Offline Data Analysis: why mongo More versatile No schema constraints Good for prototyping GridFS storing large files](https://reader033.vdocuments.us/reader033/viewer/2022052720/5f096a357e708231d426ba33/html5/thumbnails/11.jpg)
Migration Status from EXI to EXI2
EM
BioSAXS
MX
![Page 12: analysis New developments and scientific case for offline ... · Offline Data Analysis: why mongo More versatile No schema constraints Good for prototyping GridFS storing large files](https://reader033.vdocuments.us/reader033/viewer/2022052720/5f096a357e708231d426ba33/html5/thumbnails/12.jpg)
Status for EM (90%)
![Page 13: analysis New developments and scientific case for offline ... · Offline Data Analysis: why mongo More versatile No schema constraints Good for prototyping GridFS storing large files](https://reader033.vdocuments.us/reader033/viewer/2022052720/5f096a357e708231d426ba33/html5/thumbnails/13.jpg)
Status for EM (90%)
![Page 14: analysis New developments and scientific case for offline ... · Offline Data Analysis: why mongo More versatile No schema constraints Good for prototyping GridFS storing large files](https://reader033.vdocuments.us/reader033/viewer/2022052720/5f096a357e708231d426ba33/html5/thumbnails/14.jpg)
Status for MX (50%)
![Page 15: analysis New developments and scientific case for offline ... · Offline Data Analysis: why mongo More versatile No schema constraints Good for prototyping GridFS storing large files](https://reader033.vdocuments.us/reader033/viewer/2022052720/5f096a357e708231d426ba33/html5/thumbnails/15.jpg)
Status for BioSAXS (70% done)
![Page 16: analysis New developments and scientific case for offline ... · Offline Data Analysis: why mongo More versatile No schema constraints Good for prototyping GridFS storing large files](https://reader033.vdocuments.us/reader033/viewer/2022052720/5f096a357e708231d426ba33/html5/thumbnails/16.jpg)
Status for BioSAXS (70% done)
![Page 17: analysis New developments and scientific case for offline ... · Offline Data Analysis: why mongo More versatile No schema constraints Good for prototyping GridFS storing large files](https://reader033.vdocuments.us/reader033/viewer/2022052720/5f096a357e708231d426ba33/html5/thumbnails/17.jpg)
Conclusion
● EXI is being migrated to React smoothly
● Full migration expected by the end of 2020
  ○ No official deadline
● EXI and EXI2 are compatible with your installed version of ISPyB
● It is a good time to:
  ○ Provide feedback
  ○ Help with the developments
    ■ Coding
    ■ Testing
![Page 18: analysis New developments and scientific case for offline ... · Offline Data Analysis: why mongo More versatile No schema constraints Good for prototyping GridFS storing large files](https://reader033.vdocuments.us/reader033/viewer/2022052720/5f096a357e708231d426ba33/html5/thumbnails/18.jpg)
Offline Data Analysis
![Page 19: analysis New developments and scientific case for offline ... · Offline Data Analysis: why mongo More versatile No schema constraints Good for prototyping GridFS storing large files](https://reader033.vdocuments.us/reader033/viewer/2022052720/5f096a357e708231d426ba33/html5/thumbnails/19.jpg)
Offline Data Analysis: Goal
Goal: Users can launch predefined jobs from UI
● Decoupled architecture
  ○ Components execute independently while still interfacing with each other
● Flexible
  ○ Easy to add new types of jobs
  ○ Easy to maintain
● Allows interactive jobs
● Not specific to ISPyB
![Page 20: analysis New developments and scientific case for offline ... · Offline Data Analysis: why mongo More versatile No schema constraints Good for prototyping GridFS storing large files](https://reader033.vdocuments.us/reader033/viewer/2022052720/5f096a357e708231d426ba33/html5/thumbnails/20.jpg)
Offline Data Analysis: architecture
[Diagram: architecture in three layers]
● Layers: Presentation Layer, Services Layer, Processing Layer
● Components: EXI2, ISPyB, Analysis I. System, BES
● Interactions: Get Data, Submit Job, Process Job, Store result (shown twice)
![Page 21: analysis New developments and scientific case for offline ... · Offline Data Analysis: why mongo More versatile No schema constraints Good for prototyping GridFS storing large files](https://reader033.vdocuments.us/reader033/viewer/2022052720/5f096a357e708231d426ba33/html5/thumbnails/21.jpg)
Offline Data Analysis: Implementation
● Information System for Analysis (IS4A)
  ○ NodeJS application (https://gitlab.esrf.fr/icat/is4a)
  ○ Queueing system
● It does:
  ○ Expose an API to
    ■ Store a job with its input, output and status
    ■ Publish a catalogue of tools for a given entity, e.g. a data collection
  ○ Use MongoDB to store both data and metadata
● It does not:
  ○ Run the jobs
![Page 22: analysis New developments and scientific case for offline ... · Offline Data Analysis: why mongo More versatile No schema constraints Good for prototyping GridFS storing large files](https://reader033.vdocuments.us/reader033/viewer/2022052720/5f096a357e708231d426ba33/html5/thumbnails/22.jpg)
Offline Data Analysis: why mongo
● More versatile
  ○ No schema constraints
● Good for prototyping
● GridFS
  ○ Storing large files may be more efficient in a MongoDB database than on a system-level filesystem
● Apply a different data policy
  ○ Data acquisition and online data analysis are stored forever
  ○ Offline data analysis might be removed after some time
● Using MongoDB does not prevent storing data in ISPyB if needed
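The two-tier data policy can be sketched as a retention rule. Everything specific here is an assumption: the slides give no retention period, so the 90 days below is a placeholder, and the `kind` and `created` field names are hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: the slides do not state a retention period; 90 days is a placeholder.
OFFLINE_RETENTION = timedelta(days=90)


def is_expired(record: dict, now: datetime) -> bool:
    """Only offline-analysis records can expire; everything else is kept forever."""
    if record["kind"] != "offline_analysis":
        return False
    return now - record["created"] > OFFLINE_RETENTION


def purge(records: list[dict], now: datetime) -> list[dict]:
    """Return the records that survive the policy at time `now`."""
    return [r for r in records if not is_expired(r, now)]
```

Keeping offline results in a separate store makes such a rule easy to apply without touching acquisition data. For plain MongoDB collections, a TTL index is one built-in way to get similar behaviour.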
![Page 23: analysis New developments and scientific case for offline ... · Offline Data Analysis: why mongo More versatile No schema constraints Good for prototyping GridFS storing large files](https://reader033.vdocuments.us/reader033/viewer/2022052720/5f096a357e708231d426ba33/html5/thumbnails/23.jpg)
Job catalogue
IS4A
React-JSON-Schema
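One plausible reading of this slide is that each catalogue entry carries a JSON Schema describing the tool's inputs, and the UI renders a form from that schema. The sketch below assumes exactly that; the tool name, the parameter names and both helper functions are hypothetical and only mimic, in a very reduced way, what a schema-driven form does with `required` and `default`.

```python
# Hypothetical catalogue entry: the tool and its parameters are invented.
TOOL = {
    "name": "example-tool",
    "entity": "datacollection",
    "schema": {
        "title": "Example tool",
        "type": "object",
        "required": ["dataCollectionId"],
        "properties": {
            "dataCollectionId": {"type": "integer", "title": "Data collection"},
            "resolutionCutoff": {
                "type": "number",
                "title": "Resolution cutoff",
                "default": 2.0,
            },
        },
    },
}


def missing_required(schema: dict, form_data: dict) -> list[str]:
    """List required properties that the submitted form data does not provide."""
    return [name for name in schema.get("required", []) if name not in form_data]


def with_defaults(schema: dict, form_data: dict) -> dict:
    """Fill in schema defaults for properties the user left untouched."""
    filled = {
        name: spec["default"]
        for name, spec in schema.get("properties", {}).items()
        if "default" in spec
    }
    filled.update(form_data)
    return filled
```

With this arrangement, adding a new tool means adding a catalogue entry, not writing a new form by hand.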
![Page 24: analysis New developments and scientific case for offline ... · Offline Data Analysis: why mongo More versatile No schema constraints Good for prototyping GridFS storing large files](https://reader033.vdocuments.us/reader033/viewer/2022052720/5f096a357e708231d426ba33/html5/thumbnails/24.jpg)
Conclusion
● A mockup for offline data analysis has been developed
● It allows storing data and metadata from processing jobs
● It is designed to be versatile and easy to maintain
● We are ready to test it with real use cases
  ○ Feedback is appreciated
● If you are interested, please contact us!
![Page 25: analysis New developments and scientific case for offline ... · Offline Data Analysis: why mongo More versatile No schema constraints Good for prototyping GridFS storing large files](https://reader033.vdocuments.us/reader033/viewer/2022052720/5f096a357e708231d426ba33/html5/thumbnails/25.jpg)
Wildfly
● Upgrade to Wildfly version 18
  ○ No big deal, but some dependencies might change in the pom.xml
● Aiming to be backward compatible
● Aiming to upgrade to a recent version of Wildfly more frequently
FROM 8.2 TO 18.0
![Page 26: analysis New developments and scientific case for offline ... · Offline Data Analysis: why mongo More versatile No schema constraints Good for prototyping GridFS storing large files](https://reader033.vdocuments.us/reader033/viewer/2022052720/5f096a357e708231d426ba33/html5/thumbnails/26.jpg)
Single-Sign on
![Page 27: analysis New developments and scientific case for offline ... · Offline Data Analysis: why mongo More versatile No schema constraints Good for prototyping GridFS storing large files](https://reader033.vdocuments.us/reader033/viewer/2022052720/5f096a357e708231d426ba33/html5/thumbnails/27.jpg)
Single-Sign on
● Support
  ○ standalone.xml to be modified
● Advantages
  ○ No specific code in ISPyB
  ○ MxCube-ISPyB single sign-on
  ○ Allows more IDs
    ■ ORCID
    ■ UmbrellaId
![Page 28: analysis New developments and scientific case for offline ... · Offline Data Analysis: why mongo More versatile No schema constraints Good for prototyping GridFS storing large files](https://reader033.vdocuments.us/reader033/viewer/2022052720/5f096a357e708231d426ba33/html5/thumbnails/28.jpg)
MxCube/ISPyB Meeting
Thanks!
![Page 29: analysis New developments and scientific case for offline ... · Offline Data Analysis: why mongo More versatile No schema constraints Good for prototyping GridFS storing large files](https://reader033.vdocuments.us/reader033/viewer/2022052720/5f096a357e708231d426ba33/html5/thumbnails/29.jpg)
ISPyB Collaboration Meeting
Backup Slides
![Page 30: analysis New developments and scientific case for offline ... · Offline Data Analysis: why mongo More versatile No schema constraints Good for prototyping GridFS storing large files](https://reader033.vdocuments.us/reader033/viewer/2022052720/5f096a357e708231d426ba33/html5/thumbnails/30.jpg)
Use case non-interactive
[Diagram: sequence between IS4A and Processing Software]
● Job status: PENDING → RUNNING → FINISHED
● Steps: SUBMIT JOB, POLLING, UPDATE STATUS, PROCESSING, UPDATE STATUS, DISPLAY RESULTS
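The non-interactive flow can be sketched as a worker that polls the service for pending jobs and reports each status change back. `FakeIS4A` is an in-memory stand-in with a made-up interface, not the real service, and `poll_once` is a hypothetical name for one iteration of the processing software's loop.

```python
from collections import deque


class FakeIS4A:
    """In-memory stand-in for the job service (hypothetical interface)."""

    def __init__(self):
        self.jobs = {}
        self.pending = deque()
        self.history = {}

    def submit(self, job_id, payload):
        """SUBMIT JOB: store the job and queue it as PENDING."""
        self.jobs[job_id] = {"payload": payload, "status": "PENDING", "result": None}
        self.history[job_id] = ["PENDING"]
        self.pending.append(job_id)

    def next_pending(self):
        """Hand out the oldest pending job id, or None if the queue is empty."""
        return self.pending.popleft() if self.pending else None

    def update_status(self, job_id, status, result=None):
        """UPDATE STATUS: record the new status and, if given, the result."""
        job = self.jobs[job_id]
        job["status"] = status
        if result is not None:
            job["result"] = result
        self.history[job_id].append(status)


def poll_once(service, process):
    """One polling cycle of the processing software.

    Returns True if a job was processed, False if nothing was pending.
    """
    job_id = service.next_pending()
    if job_id is None:
        return False
    service.update_status(job_id, "RUNNING")
    result = process(service.jobs[job_id]["payload"])
    service.update_status(job_id, "FINISHED", result)
    return True
```

The service never calls the worker; the worker pulls work and pushes status, which is what keeps the two sides decoupled. A user interface would then read the stored result to display it.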
![Page 31: analysis New developments and scientific case for offline ... · Offline Data Analysis: why mongo More versatile No schema constraints Good for prototyping GridFS storing large files](https://reader033.vdocuments.us/reader033/viewer/2022052720/5f096a357e708231d426ba33/html5/thumbnails/31.jpg)
Use case interactive
[Diagram: sequence between IS4A and Processing Software]
● Job statuses shown: PENDING, RUNNING, ON HOLD, FINISHED
● Steps shown: SUBMIT JOB, POLLING, UPDATE STATUS, PROCESSING, INPUT REQUEST, DISPLAY INPUT NEEDED, INPUT PROVIDED, DISPLAY RESULTS
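The interactive case adds an ON HOLD status while the job waits for user input. The transition table below is an assumed reading of the diagram, not a specification: in particular, the edges into and out of ON HOLD are inferred, and the class and method names are hypothetical.

```python
# Allowed status transitions. The ON_HOLD edges are an assumed reading of the diagram.
TRANSITIONS = {
    "PENDING": {"RUNNING"},
    "RUNNING": {"ON_HOLD", "FINISHED"},
    "ON_HOLD": {"RUNNING"},
    "FINISHED": set(),
}


class InteractiveJob:
    """Tracks one job's status and rejects transitions the table does not allow."""

    def __init__(self):
        self.status = "PENDING"
        self.requested_input = None
        self.provided_input = None

    def _move(self, new_status):
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(f"illegal transition {self.status} -> {new_status}")
        self.status = new_status

    def start(self):
        self._move("RUNNING")

    def request_input(self, question):
        """The processing software needs something from the user: go ON HOLD."""
        self._move("ON_HOLD")
        self.requested_input = question

    def provide_input(self, answer):
        """The user answered through the UI: resume processing."""
        self._move("RUNNING")
        self.provided_input = answer

    def finish(self):
        self._move("FINISHED")
```

Encoding the transitions as data makes it cheap to add further statuses later without touching the methods that use them.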
![Page 32: analysis New developments and scientific case for offline ... · Offline Data Analysis: why mongo More versatile No schema constraints Good for prototyping GridFS storing large files](https://reader033.vdocuments.us/reader033/viewer/2022052720/5f096a357e708231d426ba33/html5/thumbnails/32.jpg)
IS4A Service
● NodeJS server
● MongoDB
● Simple RESTful API
![Page 33: analysis New developments and scientific case for offline ... · Offline Data Analysis: why mongo More versatile No schema constraints Good for prototyping GridFS storing large files](https://reader033.vdocuments.us/reader033/viewer/2022052720/5f096a357e708231d426ba33/html5/thumbnails/33.jpg)
Datacollection placeholder
![Page 34: analysis New developments and scientific case for offline ... · Offline Data Analysis: why mongo More versatile No schema constraints Good for prototyping GridFS storing large files](https://reader033.vdocuments.us/reader033/viewer/2022052720/5f096a357e708231d426ba33/html5/thumbnails/34.jpg)
Launch a job
1) Select items to process
2) Launch the job
Available set of tools for the selected items
![Page 35: analysis New developments and scientific case for offline ... · Offline Data Analysis: why mongo More versatile No schema constraints Good for prototyping GridFS storing large files](https://reader033.vdocuments.us/reader033/viewer/2022052720/5f096a357e708231d426ba33/html5/thumbnails/35.jpg)
Run a job
![Page 36: analysis New developments and scientific case for offline ... · Offline Data Analysis: why mongo More versatile No schema constraints Good for prototyping GridFS storing large files](https://reader033.vdocuments.us/reader033/viewer/2022052720/5f096a357e708231d426ba33/html5/thumbnails/36.jpg)
![Page 37: analysis New developments and scientific case for offline ... · Offline Data Analysis: why mongo More versatile No schema constraints Good for prototyping GridFS storing large files](https://reader033.vdocuments.us/reader033/viewer/2022052720/5f096a357e708231d426ba33/html5/thumbnails/37.jpg)
Run a job