exascale computing and experimental sensor data

Exascale Computing and Experimental Sensor Data Overview given at Brookhaven National Laboratory April 18 2014 Joel Saltz Stony Brook University [email protected]

DESCRIPTION

Methods, tools and middleware for analysis of extremely large sensor datasets on high end architectures

TRANSCRIPT

Page 1: Exascale Computing and Experimental Sensor Data

Exascale Computing and Experimental Sensor Data

Overview given at Brookhaven National Laboratory, April 18, 2014

Joel Saltz Stony Brook University

[email protected]

Page 2: Exascale Computing and Experimental Sensor Data

Integrate Information from Sensors, Images, Cameras

• Multi-dimensional spatial-temporal datasets
  – Radiology and Microscopy Image Analyses
  – Oil Reservoir Simulation/Carbon Sequestration/Groundwater Pollution Remediation
  – Biomass monitoring and disaster surveillance using multiple types of satellite imagery
  – Weather prediction using satellite and ground sensor data
  – Analysis of Results from Large Scale Simulations
  – Square Kilometer Array
  – Google Self Driving Car

• Correlative and cooperative analysis of data from multiple sensor modalities and sources

• Equivalent from standpoint of data access patterns – need to develop new generation of data skeletons/mini-apps/data dwarfs

Page 3: Exascale Computing and Experimental Sensor Data

Spatio-temporal Sensor Integration, Analysis, Classification

• Multi-scale material/tissue structural, molecular, functional characterization. Design of materials with specific structural, energy storage properties, brain, regenerative medicine, cancer

• Integrative multi-scale analyses of the earth, oceans, atmosphere, cities, vegetation etc – cameras and sensors on satellites, aircraft, drones, land vehicles, stationary cameras

• Digital astronomy

• Hydrocarbon exploration, exploitation, pollution remediation

• Aerospace – wind tunnels, acquisition of data during flight

• Solid printing integrative data analyses

• Autonomous vehicles, e.g. self driving cars

• Data generated by numerical simulation codes – PDEs, particle methods

• Fit model with data

Page 4: Exascale Computing and Experimental Sensor Data

Typical Computational/Analysis Tasks: Spatio-temporal Sensor Integration, Analysis, Classification

• Data Cleaning and Low Level Transformations

• Data Subsetting, Filtering, Subsampling

• Spatio-temporal Mapping and Registration

• Object Segmentation

• Feature Extraction

• Object/Region/Feature Classification

• Spatio-temporal Aggregation

• Diffeomorphism type mapping methods (e.g. optimal mass transport)

• Particle filtering/prediction

• Change Detection, Comparison, and Quantification

Page 5: Exascale Computing and Experimental Sensor Data

Detect and track changes in data during production

Invert data for reservoir properties

Detect and track reservoir changes

Assimilate data & reservoir properties into the evolving reservoir model

Use simulation and optimization to guide future production

Coupled data acquisition, data analysis, modeling, prediction and correction – data assimilation, particle filtering etc.
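The acquire, analyze, model, predict and correct loop on this slide can be illustrated with a scalar Kalman filter, one of the simplest data assimilation schemes. This is a minimal sketch under stated assumptions, not the reservoir workflow from the talk: the persistence forecast model, the noise variances, and the helper names `forecast` and `assimilate` are all hypothetical.

```python
def forecast(x, p, q):
    """Prediction step. Assumption: a persistence model, so the state estimate
    is unchanged and its variance grows by the process noise q."""
    return x, p + q


def assimilate(x_prior, p_prior, z, r):
    """Correction step: blend the forecast with a measurement z of variance r."""
    gain = p_prior / (p_prior + r)           # Kalman gain in [0, 1]
    x_post = x_prior + gain * (z - x_prior)  # pull the estimate toward the data
    p_post = (1.0 - gain) * p_prior          # uncertainty shrinks after the update
    return x_post, p_post


# Illustrative measurements of a quantity whose true value is near 1.0.
x, p = 0.0, 1.0  # deliberately poor initial guess with large variance
for z in [1.2, 0.9, 1.1, 1.0]:
    x, p = forecast(x, p, q=0.01)
    x, p = assimilate(x, p, z, r=0.25)

print(f"estimate={x:.3f} variance={p:.3f}")
```

Each pass through the loop corresponds to one cycle of the coupled workflow: the model predicts, new data arrives, and the estimate is corrected in proportion to how much the data is trusted relative to the model.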

Page 6: Exascale Computing and Experimental Sensor Data
Page 7: Exascale Computing and Experimental Sensor Data

Future State

• 100K – 1M pathology slides/hospital/year

• 2GB compressed per slide

• 1-10 slides used for Pathologist computer aided diagnosis

• 100-10K slides used in hospital Quality control

• Groups of 100K+ slides used for clinical research studies -- Combined with molecular, outcome data
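The figures on this slide imply a yearly storage volume per hospital that is easy to work out. The short calculation below is a sketch that assumes decimal units (1 GB = 10^9 bytes) and that every slide is kept at the quoted 2 GB compressed size; the function name `yearly_bytes` is illustrative.

```python
GB = 10**9
TB = 10**12
PB = 10**15

SLIDE_BYTES = 2 * GB  # compressed size per slide, from the slide


def yearly_bytes(slides_per_year):
    """Compressed storage needed for one hospital-year of slides."""
    return slides_per_year * SLIDE_BYTES


low = yearly_bytes(100_000)     # lower bound: 100K slides/year
high = yearly_bytes(1_000_000)  # upper bound: 1M slides/year

print(f"low:  {low / TB:.0f} TB per hospital per year")   # 200 TB
print(f"high: {high / PB:.0f} PB per hospital per year")  # 2 PB
```

Under the same assumptions, a single research study over 100K slides touches about 200 TB of compressed image data before any derived features are stored.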

Page 8: Exascale Computing and Experimental Sensor Data

Center for Comprehensive Informatics

Brain Tumor Pipeline Scaling on GT/ORNL NSF Keeneland (100 Nodes)

Page 9: Exascale Computing and Experimental Sensor Data

Runtime Support Objectives

• Coordinated mapping of data and computation to complex memory hierarchies

• Hierarchical work assignment with flexibility capable of dealing with data dependent computational patterns, fluctuations in computational speed associated with power management, faults

• Linked to comprehensible programming model – model targeted at abstract application class but not to application domain (In the sensor, image, camera case -- Region Templates)

• Software stack including coordinated compiler/runtime support/autotuning frameworks

Page 10: Exascale Computing and Experimental Sensor Data

HPC Segmentation and Feature Extraction Pipeline

Tony Pan, George Teodoro, Tahsin Kurc and Scott Klasky

Page 11: Exascale Computing and Experimental Sensor Data

Region Templates

• Provides a generic container template for common data structures, such as points, arrays, regions, and object sets, within a spatial and temporal bounding box

• Data region object is a storage materialization of data types and stores the data elements in the region contained by a region template instance; region template instance may have multiple data regions.

• Allows for different data I/O, storage, and management strategies and implementations, while providing a homogeneous, unified interface to the application developer.

• Application operations interact with data regions and region templates to store and retrieve data elements, rather than explicitly handling the management, staging, and distribution of the data elements.

• Current implementations on nodes with multi-core CPUs and GPUs, distributed memory storage, and high bandwidth disk I/O.
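The container idea described above can be sketched in a few lines. This is only an illustration of the concept, not the API of the authors' framework: the names `Storage`, `InMemoryStorage`, `BoundingBox`, `DataRegion` and `RegionTemplate` are hypothetical, and a real implementation would add staging, distribution and GPU-aware placement.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Protocol


class Storage(Protocol):
    """Hypothetical backend interface (memory, disk, remote store, ...)."""

    def write(self, key: str, data: Any) -> None: ...
    def read(self, key: str) -> Any: ...


class InMemoryStorage:
    """Simplest possible backend: a dictionary held in process memory."""

    def __init__(self) -> None:
        self._items: Dict[str, Any] = {}

    def write(self, key: str, data: Any) -> None:
        self._items[key] = data

    def read(self, key: str) -> Any:
        return self._items[key]


@dataclass(frozen=True)
class BoundingBox:
    """Spatial extent plus a time interval."""
    x0: int
    y0: int
    x1: int
    y1: int
    t0: int = 0
    t1: int = 0


@dataclass
class DataRegion:
    """Storage materialization of one named piece of data in the region."""
    name: str
    storage: Storage

    def put(self, data: Any) -> None:
        self.storage.write(self.name, data)

    def get(self) -> Any:
        return self.storage.read(self.name)


@dataclass
class RegionTemplate:
    """Container bound to a bounding box; may hold several data regions."""
    bbox: BoundingBox
    regions: Dict[str, DataRegion] = field(default_factory=dict)

    def add_region(self, name: str, storage: Storage) -> DataRegion:
        region = DataRegion(name, storage)
        self.regions[name] = region
        return region

    def __getitem__(self, name: str) -> DataRegion:
        return self.regions[name]


# One image tile with two data regions; application code never sees the backend.
tile = RegionTemplate(BoundingBox(0, 0, 4096, 4096))
tile.add_region("rgb", InMemoryStorage()).put([[0, 0, 0]])
tile.add_region("nuclei_mask", InMemoryStorage()).put([[1]])
mask = tile["nuclei_mask"].get()
```

Because operations only call `put` and `get`, swapping `InMemoryStorage` for a disk or distributed backend would not change the application code, which is the property the slide emphasizes.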

Page 12: Exascale Computing and Experimental Sensor Data

Region Template: Preliminary Experimental Evaluation

• Experimentally evaluated using pathology image analysis on the Keeneland system

• This application consists of a pipeline with Segmentation and Feature Computation stages. Each stage is internally divided into finer-grained tasks for better scheduling on heterogeneous CPU-GPU equipped machines.

Page 13: Exascale Computing and Experimental Sensor Data

Large Scale Data Management

Represented by a complex data model capturing multi-faceted information including markups, annotations, algorithm provenance, specimen, etc.

Support for complex relationships and spatial query: multi-level granularities, relationships between markups and annotations, spatial and nested relationships

Highly optimized spatial query and analyses, implemented in a variety of ways including optimized CPU/GPU, Hadoop/HDFS and IBM DB2

Supported by two NLM R01 grants – Saltz/Foran

Page 14: Exascale Computing and Experimental Sensor Data

Spatial Centric – Sensor Data Feature “GIS”

Point query: human marked point inside a nucleus

Window query: return markups contained in a rectangle

Spatial join query: algorithm validation/comparison

Containment query: nuclear feature aggregation in tumor regions

Fusheng Wang
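The four query types on this slide can be sketched with axis-aligned bounding boxes standing in for markups. This is a brute-force illustration under simplifying assumptions: real markups are polygons, and the systems described in the talk use optimized spatial query engines rather than nested loops. The `Box` class and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle used as a stand-in for a markup or region."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def contains_point(self, x: float, y: float) -> bool:
        return self.xmin <= x <= self.xmax and self.ymin <= y <= self.ymax

    def contains(self, other: "Box") -> bool:
        return (self.xmin <= other.xmin and self.ymin <= other.ymin
                and other.xmax <= self.xmax and other.ymax <= self.ymax)

    def intersects(self, other: "Box") -> bool:
        return (self.xmin <= other.xmax and other.xmin <= self.xmax
                and self.ymin <= other.ymax and other.ymin <= self.ymax)


def point_query(markups: List[Box], x: float, y: float) -> List[Box]:
    """Markups that contain a human-marked point."""
    return [m for m in markups if m.contains_point(x, y)]


def window_query(markups: List[Box], window: Box) -> List[Box]:
    """Markups fully contained in a rectangle."""
    return [m for m in markups if window.contains(m)]


def spatial_join(a: List[Box], b: List[Box]) -> List[Tuple[Box, Box]]:
    """Pairs of markups from two result sets that overlap."""
    return [(p, q) for p in a for q in b if p.intersects(q)]


def containment_counts(regions: List[Box], markups: List[Box]) -> Dict[Box, int]:
    """Number of markups inside each region (a simple aggregate)."""
    return {r: sum(r.contains(m) for m in markups) for r in regions}


nuclei_a = [Box(1, 1, 3, 3), Box(5, 5, 7, 7), Box(20, 20, 22, 22)]
nuclei_b = [Box(2, 2, 4, 4), Box(30, 30, 31, 31)]
tumor = Box(0, 0, 10, 10)

hits = point_query(nuclei_a, 2, 2)
inside = window_query(nuclei_a, tumor)
pairs = spatial_join(nuclei_a, nuclei_b)
counts = containment_counts([tumor], nuclei_a)
```

The spatial join here costs one comparison per pair of markups; at the scale of millions of nuclei per slide, an index or a partitioned parallel join is what makes the query practical.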

Page 15: Exascale Computing and Experimental Sensor Data

Algorithm Validation: Intersection between Two Result Sets (Spatial Join)

PAIS: Example Queries
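One common way to turn the intersection between two result sets into a single validation score is the Jaccard index (intersection over union). The slide does not say which measure is used, so the metric here is an assumption, and segmentation results are represented as plain sets of pixel coordinates for illustration; `rect_pixels` and `jaccard` are hypothetical helper names.

```python
def rect_pixels(x0, y0, x1, y1):
    """Pixel coordinates covered by a half-open rectangle [x0, x1) x [y0, y1)."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}


def jaccard(a, b):
    """Intersection over union of two pixel sets; 1.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# The same object as segmented by two algorithms, offset by two pixels in x.
result_a = rect_pixels(0, 0, 4, 4)  # 16 pixels
result_b = rect_pixels(2, 0, 6, 4)  # 16 pixels

score = jaccard(result_a, result_b)  # 8 shared pixels / 24 in the union
print(f"Jaccard overlap: {score:.3f}")
```

A score of 1.0 means the two algorithms agree exactly on the object; values near 0 flag objects where the results diverge and need review.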

Page 16: Exascale Computing and Experimental Sensor Data

PAIS (Pathology Analytical Imaging Standards)

• PAIS Logical Model
  – 62 UML classes
  – markups, annotations, imageReferences, provenance

• PAIS Data Representation
  – XML (compressed) or HDF5

• PAIS Databases
  – loading, managing, querying and sharing data
  – Native XML DBMS or RDBMS + SDBMS

[Figure: UML class diagram "class Domain Mo..." of the PAIS model. Classes shown: PAIS, Annotation, AnnotationReference, GeometricShape, Calculation, Observation, Inference, Markup, Region, Surface, Field, Collection, Specimen, AnatomicEntity, Subject, Patient, Project, Group, User, Equipment, Provenance, ImageReference, MicroscopyImageReference, DICOMImageReference, TMAImageReference, WholeSlideImageReference. Associations carry multiplicities of 1, 0..1 and 0..*.]

Page 17: Exascale Computing and Experimental Sensor Data

VLDB 2012, 2013

Spatial Query, Change Detection, Comparison, and Quantification

Page 18: Exascale Computing and Experimental Sensor Data

Soft real time and streaming Sensor Data Analysis, Event Detection, Decision Support

• Integrated analyses of patient data – physiological streams, labs, medications, notes, Radiology, Pathology images, mobile health data feeds

• High frequency trading, arbitrage

• Real time monitoring of earthquakes, control of oilfields

• Control of industrial plants, aircraft engines

• Fusion – data capture, control, prediction of disruptions

• Internet of things

• Twitter feeds

• Intensive care alarms

Page 19: Exascale Computing and Experimental Sensor Data

Typical Computational Analysis Tasks: Streaming Sensor Data Analysis, Event Detection, Decision Support

• Prediction algorithms – Kalman, particle filtering

• Machine learning algorithms on aggregated data to develop model, use of model on streaming data for decision support

• Searching for rare events

• Statistical algorithms to distinguish signal from noise

• On the fly integration of multiple complementary data streams
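As an illustration of the particle filtering mentioned in the first bullet, the sketch below runs a bootstrap particle filter over a short stream of noisy scalar readings. It is a minimal example under stated assumptions, not a method from the talk: the random-walk motion model, the Gaussian noise levels, the particle count and the function name `particle_filter` are all illustrative choices.

```python
import math
import random


def particle_filter(observations, n=500, process_std=0.1, obs_std=0.5, seed=0):
    """Bootstrap particle filter for a scalar state; returns one estimate per reading."""
    rng = random.Random(seed)
    # Initial belief: particles drawn from a broad prior around 0 (assumption).
    particles = [rng.gauss(0.0, 1.0) for _ in range(n)]
    estimates = []
    for z in observations:
        # Predict: propagate each particle through a random-walk model.
        particles = [p + rng.gauss(0.0, process_std) for p in particles]
        # Weight: Gaussian likelihood of the new reading given each particle.
        weights = [math.exp(-0.5 * ((z - p) / obs_std) ** 2) for p in particles]
        if sum(weights) == 0.0:
            weights = [1.0] * n  # degenerate case: fall back to uniform weights
        # Resample: draw a new particle set in proportion to the weights.
        particles = rng.choices(particles, weights=weights, k=n)
        estimates.append(sum(particles) / n)
    return estimates


# Illustrative stream of readings scattered around a true value of 1.0.
stream = [1.0, 1.1, 0.9, 1.05, 0.95, 1.0]
estimates = particle_filter(stream)
print([round(e, 2) for e in estimates])
```

The filter processes one reading at a time and keeps only the current particle set, which is what makes this family of methods suitable for streaming data.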