
EGEE-III INFSO-RI-222667

Enabling Grids for E-sciencE

www.eu-egee.org

The Medical Data Manager: the components

Johan Montagnat, Romain Texier, Tristan Glatard
CNRS, I3S laboratory

Medical Data Manager, R. Texier, July 16, 2008 2


EGEE Medical Data Manager

• Objectives
– Expose a standard grid interface (SRM) for medical image servers (DICOM)
– Use native DICOM storage format
– Fulfill medical applications security requirements
– Do not interfere with clinical practice

[Figure: User Interfaces, Worker Nodes, DICOM clients, DICOM interface, SRM, DICOM server]


Medical data protection

• Content
– Medical images (data, confidential)
– Patient folder (attached metadata, very sensitive)
• Requirements
– Patient privacy
  - Needs fine access control (ACLs on all data and metadata)
  - Needs metadata containment (metadata databases administrated by accredited staff)
– Data protection
  - Needs data encryption (even grid site administrators are not accredited to access the data)
• How important is it?
– The medical community will simply not use a system that it does not trust (both a technical and a human problem)


MDM main components

• Usability
– LFC API provides transparent access
• Privacy
– LFC and DPM provide file-level ACLs
– AMGA provides secured metadata communication and ACLs
• Data protection
– SRM-DICOM provides on-the-fly data anonymization, DPM-based (SRM v2 interface)
– Hydra key store provides transparent encryption / decryption
– Data is anonymized prior to transmission

[Figure: LFC, AMGA Metadata, SRM-DICOM Interface]


Exploiting DPM extensibility

• DPM can access different storage back-ends through plugins
– The DPM-DICOM plugin prepares the file
• DPM exposes a standard Storage Element interface (SRM)
• DPM provides standard file exchange protocols and file access control

[Figure: SFN request, DPM head, DPM-DICOM plugin, DPM-DICOM library, DICOM GET, temporary copy, DPM disks pool, file retrieval through the standard interface]


Medical Data Registration

1. Image is acquired
2. Image is stored in DICOM server
3. gLite client
3a. Image is registered (a GUID is associated)
3b. Image key is produced and registered
4. Image metadata are registered

[Figure: AMGA Metadata, gLite API, LFC File Catalog, DICOM server, DPM, Hydra keystore]


Medical Data Registration

– All these steps can be performed with a single CLI command
– A DICOM transaction can initiate the registration

[Figure: registration steps 1 to 4 as on the previous slide, with a trigger added: a DICOM PUSH to the DICOM server triggers the MDM registration]


Registration in Hydra

• Each DICOM image is uniquely identified by its Study/Series/SOP identifiers

• The Hydra servers generate a key for the selected cipher

• The cipher and the key are associated with the unique DICOM identifiers

[Figure: DICOM image → analyze (Study ID, Series ID, SOP ID) → select a cipher and generate a key → Hydra servers]
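The key registration described on this slide can be illustrated with a minimal sketch. It is not the Hydra API: `ToyKeyStore`, the default cipher name and the 256-bit key size are assumptions made for illustration only.

```python
import secrets


class ToyKeyStore:
    """Hypothetical in-memory stand-in for a Hydra key store (not the real API)."""

    def __init__(self):
        # (study_id, series_id, sop_id) -> (cipher name, key bytes)
        self._entries = {}

    def register(self, study_id, series_id, sop_id, cipher="AES-256"):
        """Generate a random key for the selected cipher and bind both
        to the unique DICOM identifiers of the image."""
        image_id = (study_id, series_id, sop_id)
        if image_id in self._entries:
            raise ValueError("image already registered")
        key = secrets.token_bytes(32)  # 256-bit key: an assumed size
        self._entries[image_id] = (cipher, key)
        return image_id

    def lookup(self, study_id, series_id, sop_id):
        """Return the (cipher, key) pair bound to the identifiers."""
        return self._entries[(study_id, series_id, sop_id)]


store = ToyKeyStore()
store.register("1.2.3", "1.2.3.4", "1.2.3.4.5")
cipher, key = store.lookup("1.2.3", "1.2.3.4", "1.2.3.4.5")
```

The identifier triple is used as the lookup key, so an image cannot be registered twice with two different keys.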


File identifiers registration

• A reference to a file is recorded in the DPM, but no copy of the file in the DPM disk pool is needed

• Directories with the Study, Series and SOP identifiers are created in the LFC

• The anonymized data fields are registered in the AMGA server

DPM:
- SURL and PFN
- the size of the file
- host of the disk pool
- ...

LFC:
- LFN and SURL
- size of the file

AMGA:
- DICOM image metadata
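As an illustration of the LFC part, the sketch below builds the directories to create from the three DICOM identifiers. The nesting order (Study, then Series, then SOP), the `DicomIds` type and the root path are assumptions; the slides only state that directories with these identifiers are created.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DicomIds:
    """Hypothetical container for the three DICOM identifiers."""
    study: str
    series: str
    sop: str


def lfc_directories(root, ids):
    """Return the nested directories to create, outermost first.

    The Study/Series/SOP nesting is an assumed layout.
    """
    study_dir = f"{root}/{ids.study}"
    series_dir = f"{study_dir}/{ids.series}"
    sop_dir = f"{series_dir}/{ids.sop}"
    return [study_dir, series_dir, sop_dir]


ids = DicomIds("1.2.3", "1.2.3.4", "1.2.3.4.5")
# "/grid/biomed/mdm" is a made-up root used only for this example
dirs = lfc_directories("/grid/biomed/mdm", ids)
```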


Access control rights management

• To allow a user to access a medical file and its metadata, the owner of the file must set the access rights in all the components:

• Example:

[Figure: example for LFC, DPM, Hydra and AMGA]
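The point that rights must be set in every component can be modelled with a small sketch. `ToyAcls` and its methods are hypothetical; they do not reproduce the actual LFC, DPM, Hydra or AMGA commands, which are not preserved in this transcript.

```python
COMPONENTS = ("LFC", "DPM", "Hydra", "AMGA")


class ToyAcls:
    """Hypothetical per-component read ACLs for a single medical file."""

    def __init__(self):
        self._readers = {component: set() for component in COMPONENTS}

    def grant(self, component, user):
        """Grant read access to one user in one component."""
        self._readers[component].add(user)

    def grant_everywhere(self, user):
        """Grant read access in all four components."""
        for component in COMPONENTS:
            self.grant(component, user)

    def missing(self, user):
        """List the components where the user still lacks access."""
        return [c for c in COMPONENTS if user not in self._readers[c]]


acls = ToyAcls()
acls.grant("LFC", "alice")
acls.grant("DPM", "alice")
partial = acls.missing("alice")   # Hydra and AMGA were forgotten
acls.grant_everywhere("bob")
complete = acls.missing("bob")    # nothing is missing
```

A check such as `missing` makes a partially applied grant visible instead of failing later, at retrieval time.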


Medical Data Retrieval

1. Get GUID from metadata
2. lcg client
3. Get SFN from GUID
4. Request file
5. Get file key
6. On-the-fly encryption and anonymization; return encrypted file
7. Get file key and decrypt file locally

[Figure: User Interface and Worker Node, gLite API, AMGA Metadata (metadata ACL control), LFC File Catalog (file ACL control), Hydra keystore (key ACL control), SRM-DICOM interface (anonymization & encryption)]
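The retrieval sequence can be sketched end to end with in-memory stand-ins. Everything here is hypothetical: the dictionaries replace AMGA, LFC, Hydra and the DICOM server, the SFN string is made up, and `toy_xor` is a deliberately simple, insecure placeholder for whatever cipher Hydra selects.

```python
import hashlib


def toy_xor(data: bytes, key: bytes) -> bytes:
    """Toy keystream cipher (SHA-256 in counter mode). NOT secure.
    Applying it twice with the same key restores the input."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


# Hypothetical stand-ins for the four services
amga = {"modality=MR": "guid-001"}                      # metadata -> GUID
lfc = {"guid-001": "dpm.example.org:/dicom/img001"}     # GUID -> SFN
hydra = {"guid-001": b"k" * 32}                         # GUID -> key
dicom_server = {
    "dpm.example.org:/dicom/img001": {
        "PatientName": "DOE^JOHN",
        "pixels": b"\x01\x02\x03",
    }
}


def srm_dicom_get(sfn, guid):
    """Steps 4 to 6: fetch the image, drop nominative fields, encrypt."""
    image = dicom_server[sfn]
    anonymized = image["pixels"]      # only non-nominative content is kept
    return toy_xor(anonymized, hydra[guid])


def retrieve(query):
    guid = amga[query]                     # step 1: GUID from metadata
    sfn = lfc[guid]                        # step 3: SFN from GUID
    encrypted = srm_dicom_get(sfn, guid)   # steps 4 to 6
    key = hydra[guid]                      # step 7: fetch key ...
    return toy_xor(encrypted, key)         # ... and decrypt locally


pixels = retrieve("modality=MR")
```

In this model the file only ever leaves `srm_dicom_get` in encrypted form, which mirrors the idea that the storage side never exposes clear content.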


DICOM retrieval: get the DICOM file

• The PFN associated with a DICOM file is resolved by the DPM-DICOM plugin

• The plugin performs a DICOM transaction with the DICOM server to retrieve the medical image

• By default, MDM is packaged with the Conquest DICOM server, but it is intended to interface with production servers

[Figure: a SURL request reaches the DPM; the database associates each SURL with a PFN; the DPM-DICOM plugin and the DPM-DICOM library issue a DICOM GET]
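The SURL to PFN resolution amounts to a keyed lookup. The sketch below uses an in-memory SQLite table; the table name, column names and both example strings are assumptions and do not reflect the actual DPM database schema.

```python
import sqlite3

# Hypothetical schema: one row per registered file
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE replica (surl TEXT PRIMARY KEY, pfn TEXT NOT NULL)")
conn.execute(
    "INSERT INTO replica VALUES (?, ?)",
    (
        "srm://dpm.example.org/dpm/example.org/home/biomed/img001",
        "dicom://pacs.example.org/1.2.3/1.2.3.4/1.2.3.4.5",
    ),
)


def resolve_pfn(connection, surl):
    """Return the PFN recorded for a SURL, or raise KeyError if unknown."""
    row = connection.execute(
        "SELECT pfn FROM replica WHERE surl = ?", (surl,)
    ).fetchone()
    if row is None:
        raise KeyError(surl)
    return row[0]


pfn = resolve_pfn(conn, "srm://dpm.example.org/dpm/example.org/home/biomed/img001")
```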


DICOM retrieval: anonymization and encryption

• Step 1A: DPM-DICOM uses the DCMTK library to anonymize the DICOM file

• Or Step 1B: the DICOM file is converted to a 3D format (inrimage) without nominative information

• Step 2: DPM-DICOM calls Hydra to encrypt the final file
• DPM-DICOM uses the RFIO library to copy the file to a spool disk. The spool disk is only a buffer for the file.

[Figure: SURL request, DPM-DICOM plugin, DPM-DICOM library, DICOM server, DICOM file, image anonymization (1A, 1B), encryption (2), DPM disks pool, file transfer through the standard interface]
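Step 1A can be pictured as filtering nominative attributes out of the DICOM header. The sketch below works on a plain dictionary rather than on DCMTK, and the set of attributes treated as nominative is an assumption chosen for illustration, not the list MDM actually removes.

```python
# Standard DICOM attribute keywords; which ones MDM strips is an assumption
NOMINATIVE_ATTRIBUTES = {
    "PatientName",
    "PatientID",
    "PatientBirthDate",
    "PatientAddress",
}


def anonymize(attributes):
    """Return a copy of the header without the nominative attributes."""
    return {
        name: value
        for name, value in attributes.items()
        if name not in NOMINATIVE_ATTRIBUTES
    }


header = {
    "PatientName": "DOE^JOHN",
    "PatientID": "12345",
    "Modality": "MR",
    "StudyInstanceUID": "1.2.3",
}
anonymous = anonymize(header)
```

The original header is left untouched, so the DICOM server keeps the native file while only the filtered copy is passed on for encryption.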


Service Distribution

• Hospital sites have to remain autonomous
– With strong (in-site) control over the sensitive metadata

• The EGEE Data Management System federates distributed data files

• AMGA supports database replication but not distribution
– Asynchronous, master-slave model, with partial replication of the directory hierarchy
– The MDM includes a library and a query client that provide multi-site metadata server federation. The client is based on the AMGA client and is syntactically compatible (transparency).
  - Users can send the commands to only one or all the servers
  - Users can dynamically add or remove servers

[Figure: several AMGA servers]
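The behaviour of the multi-server client can be sketched as a thin dispatcher. `FederatedClient` and its method names are hypothetical, and each server is represented by a plain callable instead of a real AMGA connection.

```python
class FederatedClient:
    """Hypothetical multi-server metadata client."""

    def __init__(self):
        self._servers = {}  # name -> callable taking a command string

    def add_server(self, name, handler):
        """Register a server at run time."""
        self._servers[name] = handler

    def remove_server(self, name):
        """Forget a server at run time."""
        del self._servers[name]

    def query_one(self, name, command):
        """Send the command to a single server."""
        return self._servers[name](command)

    def query_all(self, command):
        """Send the same command to every server and collect the answers."""
        return {name: handler(command) for name, handler in self._servers.items()}


client = FederatedClient()
client.add_server("hospital-a", lambda command: f"a:{command}")
client.add_server("hospital-b", lambda command: f"b:{command}")
everything = client.query_all("ls /")
client.remove_server("hospital-b")
remaining = client.query_all("ls /")
```

Because each site keeps its own server, metadata never has to be copied out of the hospital to be queried.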


Use cases

• File administration
– A system administrator has access to the file for replication / backup procedures
– No access to the file content, nor to metadata

• Image processing
– A neuroscientist has access to the file content for image analysis
– No access to the nominative metadata

• Medical analysis
– A physician involved in the patient healthcare has access to all data and metadata
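One way to read these three use cases is as combinations of file, key and metadata rights. The mapping below is an interpretation, not something stated on the slides: it assumes the administrator holds only the file ACL, the neuroscientist also holds the decryption key, and the physician additionally holds rights on the nominative metadata.

```python
# Assumed mapping of roles to granted rights
ROLE_GRANTS = {
    "administrator": {"file"},
    "neuroscientist": {"file", "key"},
    "physician": {"file", "key", "nominative_metadata"},
}


def can_copy_file(role):
    """Replication and backup only need the (encrypted) file."""
    return "file" in ROLE_GRANTS[role]


def can_read_content(role):
    """Reading the image requires both the file and its key."""
    return {"file", "key"} <= ROLE_GRANTS[role]


def can_read_nominative_metadata(role):
    return "nominative_metadata" in ROLE_GRANTS[role]


summary = {
    role: (
        can_copy_file(role),
        can_read_content(role),
        can_read_nominative_metadata(role),
    )
    for role in ROLE_GRANTS
}
```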


The User Interface

[Figure: grid architecture overview. Blocks: User Interface, Data Management, Workload Management, Information Service, Logging and real time monitoring, Author. & Authen., Sites Resources (Computing Resources and Storage Resources at Site X). Arrow labels: user requests, queries, resources allocation, publication, resources info, indexing, DataSets info, dynamic evolution.]


The User Interface

Installation
• Very few components
• Easy to install
• The Hydra client will be part of the standard UI

Configuration
• Standard user configuration for the LFC and BDII
• No configuration for the DPM
• Only one file for:
– Hydra (services.xml)
– AMGA (.mdclient)

[Figure: LFC, Hydra servers, AMGA servers, multi-server AMGA client, Hydra client]


The MDM components

[Figure: grid architecture overview. Blocks: Data Management, Workload Management, Information Service, Logging and real time monitoring, Author. & Authen., Sites Resources (Computing Resources and Storage Resources at Site X). Arrow labels: user requests, queries, resources allocation, publication, resources info, indexing, DataSets info, dynamic evolution.]


MDM is built on top of

• SL4 (and CentOS) for the DPM version of the MDM

• SL3 for the gLite-IO version of the MDM

• Libraries (gLite, DCMTK, etc.)


The MDM components

– AMGA Metadata: AMGA database front-end, access control
– Hydra key store: access control
– Instrumented DPM Storage Element: SRM v2 interface, access control, DPM-DICOM plugin
– LFC File Catalog: access control
– DICOM Server
– BDII Server


One server

• All the components could be on the same server

[Figure: one server hosting the DPM head node, the DPM disk servers, the DPM-DICOM plugin and the BDII server; other labels: /dpm, /domain, /home, /vo, file]


Behind the components

• The LFC of the BIOMED VO is used

• The BDII must be registered with a top-level BDII

• By default, AMGA uses a PostgreSQL database to store the metadata
– Can use other databases (MySQL, SQLite, Oracle)

• The DPM is only a buffer:
– The storage area should be small.
– The files are already encrypted.
– The files in the DPM can be replicated by other servers

[Figure: LFC, AMGA Metadata, SRM-DICOM Interface]


Behind the components

• The Hydra server uses MySQL to store the keys
– Each Hydra server uses well-separated tables/databases

• The Hydra server runs on top of Tomcat and an Apache server

• All the DICOM pictures are stored in the DICOM server
– If there is no DICOM server, the MDM provides the CONQUEST server


Installation procedure

• Add some yum repositories
• Install with
– yum install MDM


Configuration procedure

• The server must be registered in EGEE
– The server receives a certificate

• Today, there is no automatic configuration procedure
• The configuration procedure is described
• Some parts of the configuration (firewall, DPM buffer, etc.) are already automatic


What is new?

• All the commands are glite-*
• The automatic registration of DICOM pictures in AMGA is flexible
• Global RPM for the MDM and yum repositories