infso-ri-508833 enabling grids for e-science na4/biomed demonstration medical data management and...
TRANSCRIPT
INFSO-RI-508833
Enabling Grids for E-sciencE
www.eu-egee.org
NA4/Biomed DemonstrationMedical Data Management and processing
EGEE 3rd review rehearsal, May 4th, 2006
Johan Montagnat
Tristan Glatard
NA4/biomed demonstration, 3rd EGEE review rehearsal, May 4th 2006 2
Enabling Grids for E-sciencE
INFSO-RI-508833
Demonstration content
• Medical Data Manager– Interface to clinical data storage (DICOM)
– Integrated to gLite 1.5 middleware
– Tackling data security and privacy needs
– Result of MDM TCG Working Group
• Application to medical images registration assessment– Data intensive workflow-based application
– Scientific results in the medical image processing area with consequences for clinical use
• Immediate scheduling of jobs submitted for the demonstration– Torque+MAUI configuration for efficient handling of short jobs
– Result of the SDJ TCG Working Group
NA4/biomed demonstration, 3rd EGEE review rehearsal, May 4th 2006 3
Enabling Grids for E-sciencE
INFSO-RI-508833
Medical Data Manager• Objectives
– Expose an standard grid interface (SRM) for medical image servers (DICOM)
– Fulfill application security requirements without interfering with clinical practice
DICOM server
gLiteIOserver
SR
M-D
ICO
Min
terf
ace
AMGA Metadata
User Interface
Worker Node
HydraKey store
The MDM componentsDICOM clients
FiremanFile
Catalog
NA4/biomed demonstration, 3rd EGEE review rehearsal, May 4th 2006 4
Enabling Grids for E-sciencE
INFSO-RI-508833
Interfacing sensitive medical data
• Computing interface to medical DICOM storage– Data are acquired from the hospital imagers in native DICOM
format
– Standard SRM interface exposed to the grid
– DICOM slices are assembled in 3D images
• Privacy– Fireman provides file level ACLs
– gLiteIO provides transparent access control
– AMGA provides metadata secured communication and ACLs
– SRM-DICOM provides on-the-fly data anonimization It is based on the dCache implementation (SRM v1.1)
• Data protection– Hydra provides encryption/decryption transparently
gLite 1.5 servicegLite 1.5 service
gLite 1.5 service
ARDA service
NA4/biomed demonstration, 3rd EGEE review rehearsal, May 4th 2006 5
Enabling Grids for E-sciencE
INFSO-RI-508833
Medical Data Registration
DICOM server
AMGA Metadata
HydraKey store
gLiteIOserver
1. Image is acquired
2. Image is stored in DICOM server
3. glite-eds-put
3a. Image is registered (a GUID is associated)
3b. Image keyis produced andregistered
4. image m
etadataare registered
FiremanFile
Catalog
NA4/biomed demonstration, 3rd EGEE review rehearsal, May 4th 2006 6
Enabling Grids for E-sciencE
INFSO-RI-508833
Medical Data Retrieval
DICOM serverS
RM
-DIC
OM
inte
rfac
e
AMGA Metadata
UserInterface
Worker Node
HydraKey store
gLiteIOclient 2. glite-eds-get
3. get SURL from GUID
4. request file
5. get file key
6. on-the-fly encryption and anonimyzation
return encrypted file
7. get file key and decrypt file locally
Metadata ACL control
Key ACL control
Anonimization & encryption
In-memorydecryption
1. get GUID from metadata
gLiteIOserver
FiremanFile
CatalogFile ACL control
NA4/biomed demonstration, 3rd EGEE review rehearsal, May 4th 2006 7
Enabling Grids for E-sciencE
INFSO-RI-508833
Data replication and retrieval usecase
gLiteIOserver
DICOM server
SRM-DICOMinterface
User Interface Worker Node
HydraKey store
gLiteIOclient
gLiteIOclient
Any SE
SRM interface
Anyfile
1. Requestreplication 6. Request file
2. Get SURL
3. Get file 8. Get file
4. Get file key
5. (encrypted) File replication
7. Get SURL
9. Return (encrypted) file
gLiteIOserver
FiremanFile
CatalogFile ACL control
NA4/biomed demonstration, 3rd EGEE review rehearsal, May 4th 2006 8
Enabling Grids for E-sciencE
INFSO-RI-508833
Bronze Standard application
• Medical image registration algorithms assessment– Registration needed in many clinical procedures
– Real clinical impact
• Interfaced to the medical data manager– To retrieve suitable input images
• Compute intensive– Medical image registration algorithms: minutes to hours of
computations on PCs
• Data intensive– Hundreds to thousands of image pairs
• Workflow-based– Using the MOTEUR service-based workflow manager
– Developed in the French ACI “Masse de données” AGIR project
No execution infrastructuregLite 1.5 phased out
May 4, 2:30pm updateB-plan should work:
- pre-install glite 1.5 DMS on prod.- Use production infrastructure for the
demo
NA4/biomed demonstration, 3rd EGEE review rehearsal, May 4th 2006 9
Enabling Grids for E-sciencE
INFSO-RI-508833
After registrationBefore registration
Image Registration
NA4/biomed demonstration, 3rd EGEE review rehearsal, May 4th 2006 10
Enabling Grids for E-sciencE
INFSO-RI-508833
Variability of a registration algorithm
Registration algorithm
Final transformation
External parameters
• Data (image) 1
• Data (image) 2
• Acquisition noise
• Patient effects
Varying internal parameters
• Initial transformation
• (…)
• Robustness: ability to find the right transformation (success/failure)
• Repeatability: w.r.t. some parameters (e.g. initialization)
• Accuracy: Variability w.r.t. the ground truth for typical data
Fixed internal parameters
• Multiscale resolution
• (Typical variance…)
NA4/biomed demonstration, 3rd EGEE review rehearsal, May 4th 2006 11
Enabling Grids for E-sciencE
INFSO-RI-508833
Performance Evaluation without Gold Std
• Bronze standard: The exact result is an unknown variable
• Unbiased estimation: use redundant information
– use many different registration algorithms
(average biases, so that precision ~ accuracy)
– Use many different data (redundant information to ensure precision)
– Average transformations (maximal consistency)
NA4/biomed demonstration, 3rd EGEE review rehearsal, May 4th 2006 12
Enabling Grids for E-sciencE
INFSO-RI-508833
Bronze Standard workflow
transrotjiT ,,,
NA4/biomed demonstration, 3rd EGEE review rehearsal, May 4th 2006 13
Enabling Grids for E-sciencE
INFSO-RI-508833
Data-intensive service-based applications
• Service-based approach versus task-based approach
Service0
Service1 Service2
Service3
input0
4 instances Job0
Job1 Job2
Job3
Graph of services DAG of tasks
Job0
Job1 Job2
Job3
Job0
Job1 Job2
Job3
Job0
Job1 Job2
Job3
Job0
Job1 Job2
Job3
Job0
Job1 Job2
Job3
Job0
Job1 Job2
Job3
Job0
Job1 Job2
Job3
Job0
Job1 Job2
Job3
Job0
Job1 Job2
Job3
Job0
Job1 Job2
Job3
Job0
Job1 Job2
Job3
Job0
Job1 Job2
Job3
Job0
Job1 Job2
Job3
Job0
Job1 Job2
Job3
Job0
Job1 Job2
Job3
NA4/biomed demonstration, 3rd EGEE review rehearsal, May 4th 2006 14
Enabling Grids for E-sciencE
INFSO-RI-508833
Data composition strategies
• Data composition patterns : data intensive applications– One-to-one All-to-all
– In our case: register all images of
the same patient
the same modality
A different exam date
Set 0 Set 1
I0
J0
I1
J1
I2
J2
Set 0 Set 1
I0
J0
I1
J1
I2
J2
NA4/biomed demonstration, 3rd EGEE review rehearsal, May 4th 2006 15
Enabling Grids for E-sciencE
INFSO-RI-508833
workflowmanager
MOTEUR services orchestration
EGEEUser Interface
EGEEResources
Input 0
Service B
Output 0
Input 0 Input 1
Service A
Output 0
Data 0
Img Ref 0
Img Ref 0Img Ref 0 Img Ref 0
Img Ref 0Data 1
Img Ref 1
Img Ref 1
Img Ref 1Img Ref 1
Img Ref 1
Img Ref 1Img Ref 1
Img Ref 1Data 2
Img Ref 2
Img Ref 2Img Ref 1Img Ref 2Img Ref 2
Img Ref 2
NA4/biomed demonstration, 3rd EGEE review rehearsal, May 4th 2006 16
Enabling Grids for E-sciencE
INFSO-RI-508833
workflowmanager
Legacy code encapsulation
Input 0 Input 1
Generic service
Output 0
Input 0 Input 1
Generic service
Output 0
Algorithm parametersdescription
Algorithm parametersdescription
Legacy code 2
Legacy code 1
<description> <executable name="CrestLines.pl"> <access type="URL"> <path value="http://colors.unice.fr:80/"/> </access> <value value="CrestLines.pl"/> <input name="image" option="-im1"> <access type="LFN" /> </input> <input name="scale" option="-s"/> <output name="crest_lines" option="-c2"> <access type="LFN" /> </output> <sandbox name="convert8bits"> <access type="URL"> <path value="http://colors.unice.fr:80/"/> </access> <value value="Convert8bits.pl"/> </sandbox> </executable></description>
NA4/biomed demonstration, 3rd EGEE review rehearsal, May 4th 2006 17
Enabling Grids for E-sciencE
INFSO-RI-508833
Short Deadline Jobs
• Torque + MAUI specific configuration– Virtual processors allocation
– Does not interfere with normal batch scheduling (shared processor time)
– Enables efficient processing of short tasks on the production infrastructure
– Ersatz for lack of jobs prioritization
• Special submission queues– Three SDJ queues deployed on biomed-compliant sites
– Time-limited queues
• Submit-or-reject paradigm– Jobs are immediately executed or rejected if a too high number of
short jobs are already executing.
NA4/biomed demonstration, 3rd EGEE review rehearsal, May 4th 2006 18
Enabling Grids for E-sciencE
INFSO-RI-508833
Workflow execution
NA4/biomed demonstration, 3rd EGEE review rehearsal, May 4th 2006 19
Enabling Grids for E-sciencE
INFSO-RI-508833
Post-mortem trace
NA4/biomed demonstration, 3rd EGEE review rehearsal, May 4th 2006 20
Enabling Grids for E-sciencE
INFSO-RI-508833
Scientific production
• 4 rigid-registration algorithms precision estimated on brain image database
• To be published in [HealthGrid’06]
Algorithm reg (deg) trans (mm)
CrestMatch
0.150 0.424
PFRegister
0.180 0.416
Baladin 0.139 0.395
Yasmina 0.137 0.445
NA4/biomed demonstration, 3rd EGEE review rehearsal, May 4th 2006 21
Enabling Grids for E-sciencE
INFSO-RI-508833
Why grids?
• From days to hours– 10s to 100s of algorithms
To adapt to many clinical cases
– Virtually illimited parameterization
– Virtually illimited number of image databases Different modalities, different body regions
• Complex computation procedure– Difficult experimental set up
– Future plan: application portal
• Data federation– Obtain data sources needed for validation
• Algorithms sharing– Use registration services developed in different research groups
– Reproducible results
NA4/biomed demonstration, 3rd EGEE review rehearsal, May 4th 2006 22
Enabling Grids for E-sciencE
INFSO-RI-508833
Conclusions
• Medical data management– Advanced Data management functionalities
– Application area-level layer on top of foundation middleware
– Dependent on the deployment of gLite 1.5 services
• Bronze Standard application– Complex, workflow-based application
– Data intensive Non-trivial parallel computations Data federation using grid data management services
– Production of scientific results
• Short deadline jobs– Immediate scheduling of short tasks
– Submit-or-reject paradigm