
Post on 19-Dec-2015


The Virtual Microscope

Umit V. Catalyurek
Department of Biomedical Informatics
Division of Data Intensive and Grid Computing

The Virtual Microscope
Joel Saltz, Renato Ferreira, Michael Beynon, Chialin Chang, Alan Sussman, Tahsin Kurc, Robert Miller, Angelo Demarzo, Mark Silberman, Asmara Afework, Anthony Wiegering

Virtual Microscope (VM)
Interactive software emulation of a high-power light microscope for processing image datasets
  visualize and explore microscopy images
  screen for cancer
  categorize images for associative retrieval
  electronic capture of the slide examination process used in resident training
  collaborative diagnosis

Virtual Microscope (Hopkins/UMD), Distributed Telemicroscopy System (Rutgers), [Gu] Virtual Telemicroscope, Virtual Microscopy (UPMC), Baccus Virtual Microscope

The Virtual Microscope
Data requirements
  Full cases consisting of multiple digitized glass slides, with data acquired at 400X
  Single spot: 1000x1000 pixels, 3-byte RGB = 3 MB
  A slide of 2.5 cm x 3.5 cm requires a 50x70 grid = 10 GB uncompressed
  Each slide can have multiple focal planes
  Johns Hopkins alone generates 500,000 slides per year
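The storage figures on this slide follow from simple multiplication, and the short sketch below reproduces them. The constants come from the slide; the function name and the `focal_planes` parameter are illustrative choices, not part of the Virtual Microscope software.

```python
# Back-of-the-envelope storage estimate using the figures from the slide.
SPOT_PIXELS = 1000 * 1000       # one spot at 400X: 1000 x 1000 pixels
BYTES_PER_PIXEL = 3             # 3-byte RGB
GRID_COLS, GRID_ROWS = 50, 70   # spots needed to cover a 2.5 cm x 3.5 cm slide


def slide_bytes(focal_planes: int = 1) -> int:
    """Uncompressed size of one digitized slide, in bytes (hypothetical helper)."""
    spot_bytes = SPOT_PIXELS * BYTES_PER_PIXEL
    return spot_bytes * GRID_COLS * GRID_ROWS * focal_planes


if __name__ == "__main__":
    print(f"one spot:  {SPOT_PIXELS * BYTES_PER_PIXEL / 1e6:.1f} MB")
    print(f"one slide: {slide_bytes() / 1e9:.1f} GB per focal plane")
```

The exact product is 10.5 GB per focal plane, which the slide rounds to 10 GB; every additional focal plane multiplies that figure again.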

The Virtual Microscope
Client-server architecture
Java 1.2 client
  Portability
Data storage & image compression
  More efficient storage, reduced transmission time
2 server implementations:
  Customized instance of Active Data Repository
    Improved scalability, portability, user-defined processing
  Component-based implementation using DataCutter
    Heterogeneous systems, portability, user-defined processing
Caching in the VM client
  Improved response time
Experimental results

VM Client


Image Declustering

[Figure: image blocks, each labeled with a number from 0 to 7]
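The slide shows only the resulting block labels, not the rule that produced them. As a minimal sketch, assuming the labels are disk numbers and assuming a Hilbert-curve ordering (a common declustering choice, not confirmed by the slides), one way to spread neighboring blocks across disks is to walk the blocks along the curve and assign disks round-robin. `hilbert_index` and `disk_for_block` are hypothetical names.

```python
def hilbert_index(n: int, x: int, y: int) -> int:
    """Position of cell (x, y) along a Hilbert curve over an n x n grid.

    n must be a power of two. This is the standard coordinate-to-index
    conversion for the Hilbert curve.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the base orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def disk_for_block(x: int, y: int, n: int, disks: int) -> int:
    """Assumed declustering rule: round-robin over disks in curve order."""
    return hilbert_index(n, x, y) % disks


if __name__ == "__main__":
    # Print the disk assignment for a 4 x 4 grid of blocks on 8 disks.
    for y in range(4):
        print(" ".join(str(disk_for_block(x, y, 4, 8)) for x in range(4)))
```

Because consecutive positions on the curve are always adjacent blocks, a rectangular query region tends to touch many different disks, which is the property a declustering scheme is after.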

Image Compression
JPEG compression
  storage and network data reduction by a factor of 10
  may still take a long time to transmit images
For example, a 640x480 image:
  920 KB uncompressed
  ~ 90 KB JPEG compressed
  ~ 13 seconds to transfer using a 56 Kbps modem
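The example numbers can be checked with a few lines of arithmetic. The sketch below assumes 3 bytes per pixel (as on the data requirements slide), a flat factor-of-10 JPEG reduction, and a modem that sustains its nominal 56,000 bits per second; the names are illustrative.

```python
def transfer_seconds(size_bytes: float, link_bits_per_second: float) -> float:
    """Ideal transfer time, ignoring protocol overhead and latency."""
    return size_bytes * 8 / link_bits_per_second


RAW_BYTES = 640 * 480 * 3     # 921,600 bytes, roughly 920 KB
JPEG_BYTES = RAW_BYTES / 10   # assumed factor-of-10 reduction
MODEM_BPS = 56_000            # nominal 56 Kbps modem

if __name__ == "__main__":
    print(f"uncompressed: {transfer_seconds(RAW_BYTES, MODEM_BPS):.0f} s")
    print(f"JPEG:         {transfer_seconds(JPEG_BYTES, MODEM_BPS):.0f} s")
```

Even after compression a single screen-sized view costs on the order of ten seconds over such a link, which is the motivation for caching in the client.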

Active Data Repository (ADR)
A C++ class library and runtime system for building parallel databases of multi-dimensional datasets
  enables integration of storage, retrieval and processing of multiple datasets on parallel machines and clusters
  provides support for common operations such as data retrieval, memory management, and scheduling of processing across a parallel machine
  can be customized for various applications
Front-end: the interface between clients and back-end
Back-end: data storage, retrieval, and processing
  Distributed memory parallel machine or cluster, with multiple disks attached to each node
  Customizable services for application-specific processing

Virtual Microscope with ADR

[Figure: ADR architecture for the Virtual Microscope]
  Clients send queries to the front-end and receive image blocks
  Query: slide number, focal plane, magnification, region of interest
  Front-end (Virtual Microscope front-end): Query Interface Service, Query Submission Service
  Back-end: Dataset Service, Attribute Space Service, Data Aggregation Service, Indexing Service, Query Execution Service, Query Planning Service
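A client query carries four fields: slide number, focal plane, magnification, and region of interest. The sketch below shows one plausible way to represent such a query and work out which stored image blocks it touches. The class, the 1000-pixel block size, the half-open region convention, and the 400X base magnification used for the subsampling step are all assumptions for illustration; none of this is ADR's actual interface.

```python
from dataclasses import dataclass
from typing import List, Tuple

BLOCK = 1000  # assumed edge length of a stored image block, in pixels


@dataclass(frozen=True)
class VMQuery:
    slide: int
    focal_plane: int
    magnification: int                 # requested magnification, e.g. 100
    region: Tuple[int, int, int, int]  # (x0, y0, x1, y1), full resolution, x1/y1 exclusive


def blocks_for(query: VMQuery, block: int = BLOCK) -> List[Tuple[int, int]]:
    """(row, col) indices of every block that intersects the query region."""
    x0, y0, x1, y1 = query.region
    cols = range(x0 // block, (x1 - 1) // block + 1)
    rows = range(y0 // block, (y1 - 1) // block + 1)
    return [(r, c) for r in rows for c in cols]


def subsample_step(query: VMQuery, base_magnification: int = 400) -> int:
    """Keep every Nth pixel to emulate a lower magnification (assumed scheme)."""
    return base_magnification // query.magnification


if __name__ == "__main__":
    q = VMQuery(slide=1, focal_plane=0, magnification=100, region=(500, 500, 2500, 1500))
    print(blocks_for(q), subsample_step(q))
```

A region that straddles block boundaries pulls in every block it overlaps, so the server has to clip the retrieved blocks before returning the image.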

DataCutter
A suite of middleware for subsetting and filtering multi-dimensional datasets stored in a distributed environment
Indexing Service
  Multilevel hierarchical indexes based on spatial indexing methods, e.g., R-trees
Filtering Service
  Distributed C++ component framework
  Specialized components for processing data
    filters: logical unit of computation, high-level tasks, init, process, finalize interface
    streams: how filters communicate; unidirectional buffer pipes; use fixed-size buffers (min, good)
    manually specify filter connectivity and filter-level characteristics
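The filter and stream concepts can be illustrated with a toy model. DataCutter itself is a C++ framework, so the Python classes below are a hypothetical sketch of the idea (a filter with init, process, and finalize steps, connected by unidirectional streams of fixed-size buffers) and do not reflect its real API.

```python
from collections import deque


class Stream:
    """Unidirectional pipe that carries fixed-size buffers between filters."""

    def __init__(self, buffer_size: int):
        self.buffer_size = buffer_size
        self._buffers = deque()

    def write(self, data: bytes) -> None:
        # Split outgoing data into buffers of at most buffer_size bytes.
        for start in range(0, len(data), self.buffer_size):
            self._buffers.append(data[start:start + self.buffer_size])

    def read(self):
        """Return the next buffer, or None when the stream is empty."""
        return self._buffers.popleft() if self._buffers else None


class Filter:
    """Logical unit of computation with an init/process/finalize life cycle."""

    def init(self) -> None:
        pass

    def process(self, inp: Stream, out: Stream) -> None:
        raise NotImplementedError

    def finalize(self) -> None:
        pass


class Uppercase(Filter):
    """Toy filter: upper-cases each buffer and counts how many it handled."""

    def init(self) -> None:
        self.buffers_seen = 0

    def process(self, inp: Stream, out: Stream) -> None:
        buf = inp.read()
        while buf is not None:
            self.buffers_seen += 1
            out.write(buf.upper())
            buf = inp.read()


def run(filt: Filter, inp: Stream, out: Stream) -> None:
    filt.init()
    filt.process(inp, out)
    filt.finalize()
```

Because a filter only sees its input and output streams, the same filter code can be wired into different pipelines, which is what makes the connectivity a separate, manually specified concern.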

Virtual Microscope with DataCutter

[Figure: three groupings of the processing steps into filters]
  DC-5F: read_data -> decompress -> clip -> zoom -> view
  DC-3F: read_data -> decompress -> clip-zoom-view
  DC-2F: read_data -> decompress-clip-zoom-view
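The three configurations run the same five steps and differ only in how the steps are grouped into filters. The toy sketch below makes that concrete with stand-in stage functions (the run-length "compression", the request tuple layout, and the `fuse` helper are all invented for illustration) and checks that every grouping yields the same image.

```python
def read_data(request):
    # Stand-in for reading a stored chunk: rows of (value, run_length) pairs.
    rows = [[(r, 4), (r + 10, 4)] for r in range(4)]
    return request, rows


def decompress(item):
    request, rows = item
    return request, [[v for v, n in row for _ in range(n)] for row in rows]


def clip(item):
    (x0, y0, x1, y1, step), image = item
    return step, [row[x0:x1] for row in image[y0:y1]]


def zoom(item):
    step, image = item
    return [row[::step] for row in image[::step]]  # subsample by `step`


def view(image):
    return tuple(tuple(row) for row in image)


def fuse(*stages):
    """Combine several stages into a single filter."""
    def fused(item):
        for stage in stages:
            item = stage(item)
        return item
    return fused


def run_pipeline(filters, request):
    item = request
    for f in filters:
        item = f(item)
    return item


DC_5F = [read_data, decompress, clip, zoom, view]
DC_3F = [read_data, decompress, fuse(clip, zoom, view)]
DC_2F = [read_data, fuse(decompress, clip, zoom, view)]
```

Grouping changes how many separate components exist to be placed and connected, not the result of the computation.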

Caching in the Client
Reduce data re-transmission
  Cache part of the retrieved data in the client
  Cache multiple resolutions/magnifications
  Cache only what the user views
Two-level cache
  client memory is the first-level cache
  local disk on the client machine is the second level
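A two-level cache of this kind can be sketched in a few lines. The version below is an assumption-laden illustration, not the Java client's implementation: it keys blocks by (slide, magnification, row, col) so that several magnifications can coexist, keeps a small least-recently-used set in memory, and spills evicted blocks to files in a local directory. The class name, key layout, file naming, and LRU policy are all hypothetical.

```python
import os
import tempfile
from collections import OrderedDict


class TwoLevelCache:
    def __init__(self, memory_slots: int, disk_dir: str):
        self.memory_slots = memory_slots
        self.disk_dir = disk_dir
        self.memory = OrderedDict()  # key -> bytes, least recently used first

    def _path(self, key) -> str:
        slide, magnification, row, col = key
        return os.path.join(self.disk_dir, f"{slide}_{magnification}_{row}_{col}.blk")

    def put(self, key, data: bytes) -> None:
        self.memory[key] = data
        self.memory.move_to_end(key)
        # Spill the least recently used blocks to the second level (disk).
        while len(self.memory) > self.memory_slots:
            old_key, old_data = self.memory.popitem(last=False)
            with open(self._path(old_key), "wb") as f:
                f.write(old_data)

    def get(self, key):
        """Return (data, level), where level is 'memory', 'disk', or 'miss'."""
        if key in self.memory:
            self.memory.move_to_end(key)
            return self.memory[key], "memory"
        path = self._path(key)
        if os.path.exists(path):
            with open(path, "rb") as f:
                data = f.read()
            self.put(key, data)  # promote back into memory
            return data, "disk"
        return None, "miss"      # caller must fetch from the server


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        cache = TwoLevelCache(memory_slots=2, disk_dir=d)
        cache.put((1, 400, 0, 0), b"block")
        print(cache.get((1, 400, 0, 0))[1])
```

Only a "miss" requires a round trip to the server; blocks the user has already viewed are served from memory or local disk.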

Caching Multiresolution Images

VM Server Performance

ADR VM Server Performance

VM ADR Server under workload

[Figure: Average Response Time for 1024x1024 Output. X axis: Number of Processors (1, 2, 4, 8). Y axis: Response Time (seconds). Series: ADR, ADR-1bg, ADR-4bg, ADR-16bg.]

[Figure: Average Response Time for 512x512 Output. X axis: Number of Processors (1, 2, 4, 8). Y axis: Response Time (seconds). Series: ADR, ADR-1bg, ADR-4bg, ADR-16bg.]

VM Servers: ADR vs DC

[Figure: Average Response Time for 1024x1024 Output. X axis: Number of Processors (1, 2, 4, 8). Y axis: Response Time (seconds). Series: ADR, DC-5F, DC-2F.]

[Figure: Average Response Time for 512x512 Output. X axis: Number of Processors (1, 2, 4, 8). Y axis: Response Time (seconds). Series: ADR, DC-5F, DC-2F.]

VM Servers: ADR vs DC

[Figure: Average Response Time for 512x512 Output. X axis: Number of Processors (1, 2, 4, 8). Y axis: Response Time (seconds). Series: ADR, DC-5F, DC-2F.]

[Figure: Average Response Time for 512x512 Output. X axis: Number of Processors (2, 4, 8). Y axis: Response Time (seconds). Series: ADR-1bg, DC-5F-1bg, DC-2F-1bg.]

[Figure: Average Response Time for 512x512 Output. X axis: Number of Processors (2, 4, 8). Y axis: Response Time (seconds). Series: ADR-4bg, DC-5F-4bg, DC-2F-4bg.]

[Figure: Average Response Time for 512x512 Output. X axis: Number of Processors (2, 4, 8). Y axis: Response Time (seconds). Series: ADR-16bg, DC-5F-16bg, DC-2F-16bg.]

VM: ADR vs DC on SMP

[Figure: Average Response Time for 512x512 Output. X axis: Number of Clients (1, 2, 4, 8). Y axis: Response Time (seconds). Series: ADR, 8x(R-DCZV), 4x(2xR-2xDCZV), 2x(4xR-4xDCZV), 4x(2xR-4xD-2xCZV).]

[Figure: Average Response Time for 1024x1024 Output. X axis: Number of Clients (1, 2, 4, 8). Y axis: Response Time (seconds). Series: ADR, 8x(R-DCZV), 4x(2xR-2xDCZV), 2x(4xR-4xDCZV), 4x(2xR-4xD-2xCZV).]

Caching Client Performance


Summary
2 VM servers:
  Homogeneous systems: tightly coupled parallel machines with attached local disks
  Heterogeneous systems, grid
Java 1.2 client
  Multiresolution image caching

Try http://vmscope.jhmi.edu

End of Talk

0 3 4 5 2 3 4 7 0 1 6 7 0 3 4 5
1 2 7 6 1 0 5 6 3 2 5 4 1 2 7 6
6 5 0 1 6 7 2 1 4 7 0 3 6 5 0 1
7 4 3 2 5 4 3 0 5 6 1 2 7 4 3 2
0 1 6 7 0 1 6 7 2 1 6 5 0 3 4 5
3 2 5 4 3 2 5 4 3 0 7 4 1 2 7 6
4 7 0 3 4 7 0 3 4 5 2 3 6 5 0 1
5 6 1 2 5 6 1 2 7 6 1 0 7 4 3 2
2 1 6 5 2 1 6 5 0 1 6 7 0 3 4 5
3 0 7 4 3 0 7 4 3 2 5 4 1 2 7 6
4 5 2 3 4 5 2 3 4 7 0 3 6 5 0 1
7 6 1 0 7 6 1 0 5 6 1 2 7 4 3 2
0 3 4 5 2 3 4 7 2 1 6 5 0 3 4 5
1 2 7 6 1 0 5 6 3 0 7 4 1 2 7 6
6 5 0 1 6 7 2 1 4 5 2 3 6 5 0 1
7 4 3 2 5 4 3 0 7 6 1 0 7 4 3 2
