apr 21, 2015 - hasso-plattner-institut · 2015-04-22 · 2015 our approach analyze genomes:...
Post on 04-Jun-2020
0 Views
Preview:
TRANSCRIPT
In-Memory Databases: Applications in Healthcare
Dr. Matthieu-P. Schapranow
Apr 21, 2015
Frankfurter Allgemeine Zeitung Verlagsspezial
Medizin zwischen Möglichkeiten und Erfolg
17. April 2015
■ Online: Visit we.analyzegenomes.com for latest research results, tools, and news
■ Offline: Read more about it, e.g. High-Performance In-Memory Genome Data Analysis: How In-Memory Database Technology Accelerates Personalized Medicine, In-Memory Data Management Research, Springer, ISBN: 978-3-319-03034-0, 2014
■ In Person: Join us for “German Biotechnology Days” Apr 22-23, 2015 in Cologne, Germany
Important things first: Where do you find additional information?
Dr. Schapranow, Apr 21, 2015
In-Memory Databases: Applications in Healthcare
3
IT Challenges Distributed Heterogeneous Data Sources
4
Human genome/biological data 600GB per full genome 15PB+ in databases of leading institutes
Prescription data 1.5B records from 10,000 doctors and 10M Patients (100 GB)
Clinical trials Currently more than 30k recruiting on ClinicalTrials.gov
Human proteome 160M data points (2.4GB) per sample >3TB raw proteome data in ProteomicsDB
PubMed database >23M articles
Hospital information systems Often more than 50GB
Medical sensor data Scan of a single organ in 1s creates 10GB of raw data Cancer patient records
>160k records at NCT In-Memory Databases: Applications in Healthcare
Dr. Schapranow, Apr 21, 2015
Dr. Schapranow, Apr 21, 2015
Our Approach Analyze Genomes: Real-time Analysis of Big Medical Data
5
Drug Response Analysis
Pathway Topology Analysis
Medical Knowledge
Cockpit
Oncolyzer Clinical Trial Assessment
Cohort Analysis
In-Memory Database
Extensions for Life Sciences
Data Exchange, App Store
Access Control, Data Protection
Fair Use
Statistical Tools
Combined and Linked Data
Genome Data
Cellular Pathways
Genome Metadata
Research Publications
Pipeline and Analysis Models
Real-time Analysis
App-spanning User Profiles
Drugs and Interactions
...
In-Memory Databases: Applications in Healthcare
■ Patients
□ Individual anamnesis, family history, and background
□ Require fast access to individualized therapy
■ Clinicians
□ Identify root and extent of disease using laboratory tests
□ Evaluate therapy alternatives, adapt existing therapy
■ Researchers
□ Conduct laboratory work, e.g. analyze patient samples
□ Create new research findings and come-up with treatment alternatives
The Setting Actors in Oncology
Dr. Schapranow, Apr 21, 2015
6
In-Memory Databases: Applications in Healthcare
■ Motivation: Can we enable clinicians to take their therapy decisions:
□ Incorporating all available specifics about each individual patient,
□ Referencing latest lab results and worldwide medical knowledge, and
□ Interactively during their ward round?
Our Motivation Make Precision Medicine Come Routine in Real Life
In-Memory Databases: Applications in Healthcare
7
Dr. Schapranow, Apr 21, 2015
Cloud-based Services for Processing of DNA Data
■ Control center for processing of raw DNA data, such as FASTQ, SAM, and VCF
■ Personal user profile guarantees privacy of uploaded and processed data
■ Supports reproducible research process by storing all relevant process parameters
■ Implements prioritized data processing and fair use, e.g. per department or per institute
■ Supports additional service, such as data annotations, billing, and sharing for all Analyze Genomes services
■ Honored by the 2014 European Life Science Award
In-Memory Databases: Applications in Healthcare
Standardized Modeling and runtime environment for
analysis pipelines
8
Dr. Schapranow, Apr 21, 2015
Real-time Processing of Event Data from Medical Sensors
■ Processing of sensor data, e.g. from Intensive Care Units (ICUs) or wearable sensor devices (quantify self)
■ Multi-modal real-time analysis to detect indicators for severe events, such as heart attacks or strokes
■ Incorporates machine-learning algorithms to detect severe events and to inform clinical personnel in time
■ Successfully tested with 100 Hz event rate, i.e. sufficient for ICU use
In-Memory Databases: Applications in Healthcare
Comparison of waveform data with history of similar patients
9
Dr. Schapranow, Apr 21, 2015
t
Drug Safety Statistical Analysis of Drug Side Effects Data
■ Combines confirmed side effect data from different data sources
■ Interactive statistical analysis, e.g. apriori rules, to discover still unknown interactions
■ Integrates personal prescription data and directly report side effects
■ Work together with your doctor to prevent interaction with already prescribed drugs
In-Memory Databases: Applications in Healthcare
10
Dr. Schapranow, Apr 21, 2015
Unified access to international side effect data
On-the-fly extension of database schema to add
side effect databases
+++
Real-time Assessment of Clinical Trial Candidates
■ Switch from trial-centric to patient-centric clinical trials
■ Real-time matching and clustering of patients and clinical trial inclusion/exclusion criteria
■ No manual pre-screening of patients for months: In-memory technology enables interactive pre-screening process
■ Reassessment of already screened or already participating patient reduces recruitment costs
In-Memory Databases: Applications in Healthcare
Assessment of patients preconditions for clinical trials
11
Dr. Schapranow, Apr 21, 2015
Enable Interavtive Data Exploration
And Analysis
Drug Response Analysis Data Sources and Matching
In-Memory Databases: Applications in Healthcare
Patient-specific Data
Tumor-specific Data
Experiment Data
Metadata e.g. smoking status,
tumor classification and age
Genome Data e.g. raw DNA data
and genetic variants
Experiment Results e.g. medication effectivity
obtained from wet laboratory
12
Dr. Schapranow, Apr 21, 2015
Drug Response Analysis Interactive Data Exploration
■ Drug response depends on individual genetic variants of tumors
■ Challenge: Identification of relevant genetic variants and their impact on drug response is a ongoing research activity, e.g. Xenograft models
■ Exploration of experiment results is time-consuming and Excel-driven
■ In-memory technology enables interactive exploration of experiment data to leverage new scientific insights
In-Memory Databases: Applications in Healthcare
Interactive analysis of correlations between drugs
and genetic variants
13
Dr. Schapranow, Apr 21, 2015
Medical Knowledge Cockpit for Patients and Clinicians
■ Search for affected genes in distributed and heterogeneous data sources
■ Immediate exploration of relevant information, such as
□ Gene descriptions,
□ Molecular impact and related pathways,
□ Scientific publications, and
□ Suitable clinical trials.
■ No manual searching for hours or days: In-memory technology translates searching into interactive finding!
In-Memory Databases: Applications in Healthcare
Automatic clinical trial matching build on text
analysis features
Unified access to structured and un-structured data
sources
14
Dr. Schapranow, Apr 21, 2015
■ In-place preview of relevant data, such as publications and publication meta data
■ Incorporating individual filter settings, e.g. additional search terms
Medical Knowledge Cockpit for Patients and Clinicians Publications
In-Memory Databases: Applications in Healthcare
15
Dr. Schapranow, Apr 21, 2015
Dr. Schapranow, Apr 21, 2015 ■ Personalized clinical trials, e.g. by incorporating patient specifics
■ Classification of internal/external trials based on treating institute
Medical Knowledge Cockpit for Patients and Clinicians Latest Clinical Trials
In-Memory Databases: Applications in Healthcare
16
Dr. Schapranow, Apr 21, 2015
Medical Knowledge Cockpit for Patients and Clinicians Pathway Topology Analysis
■ Search in pathways is limited to “is a certain element contained” today
■ Integrated >1,5k pathways from international sources, e.g. KEGG, HumanCyc, and WikiPathways, into HANA
■ Implemented graph-based topology exploration and ranking based on patient specifics
■ Enables interactive identification of possible dysfunctions affecting the course of a therapy before its start
In-Memory Databases: Applications in Healthcare
Unified access to multiple formerly disjoint data sources
Pathway analysis of genetic variants with graph engine
17
Medical Knowledge Cockpit for Patients and Clinicians Search in Structured and Unstructured Medical Data
■ Extended text analysis feature by medical terminology
□ Genes (122,975 + 186,771 synonyms)
□ Medical terms and categories (98,886 diseases, 47 categories)
□ Pharmaceutical ingredients (7,099)
■ Indexed clinicaltrials.gov database (145k trials/30,138 recruiting)
■ Extracted, e.g., 320k genes, 161k ingredients, 30k periods
■ Select studies based on multiple filters in less than 500ms
In-Memory Databases: Applications in Healthcare
Clinical trial matching using text analysis features
Unified access to structured and unstructured data sources
T18
Dr. Schapranow, Apr 21, 2015
■ Google-like user interface for searching data
■ Seamless integration of individual EMR data
■ Search various sources for biomarkers, literature, and diseases
Medical Knowledge Cockpit for Patients and Clinicians Seamless Integration of Patient Specifics
In-Memory Databases: Applications in Healthcare
19
Dr. Schapranow, Apr 21, 2015
Our Methodology Design Thinking Methodology
Dr. Schapranow, Apr 21, 2015
In-Memory Databases: Applications in Healthcare
20
Our Methodology Design Thinking Methodology
Dr. Schapranow, Apr 21, 2015
In-Memory Databases: Applications in Healthcare
21
Desirability ■ Leveraging directed customer services
■ Portfolio of integrated services for clinicians, researchers, and patients
■ Include latest research results, e.g. most effective therapies Viability
■ Enable personalized medicine also in far-off regions and developing countries
■ Share data via the Internet to get feedback from word-wide experts (cost-saving)
■ Combine research data (publications, annotations, genome data) from international databases in a single knowledge base
Feasibility ■ HiSeq 2500 enables high-coverage
whole genome sequencing in 20h
■ IMDB enables allele frequency determination of 12B records within <1s
■ Detection of 1 relevant annotation out of 80M <1s
■ Cloud-based data processing services reduce TCO
Combined column and row store
Map/Reduce Single and multi-tenancy
Lightweight compression
Insert only for time travel
Real-time replication
Working on integers
SQL interface on columns and rows
Active/passive data store
Minimal projections
Group key Reduction of software layers
Dynamic multi-threading
Bulk load of data
Object-relational mapping
Text retrieval and extraction engine
No aggregate tables
Data partitioning Any attribute as index
No disk
On-the-fly extensibility
Analytics on historical data
Multi-core/ parallelization
Our Technology In-Memory Database Technology
+
+++
+
P
v
+++t
SQL
xx
T
disk
22
Dr. Schapranow, Apr 21, 2015
In-Memory Databases: Applications in Healthcare
In-Memory Data Management Overview
Dr. Schapranow, Apr 21, 2015
In-Memory Databases: Applications in Healthcare
23
Advances in Hardware
64 bit address space – 4 TB in current server boards
4 MB/ms/core data throughput
Cost-performance ratio rapidly declining
Multi-core architecture (6 x 12 core CPU per blade)
Parallel scaling across blades
1 blade ≈50k USD = 1 enterprise class server
Advances in Software
Row and Column Store Compression Partitioning Insert Only
A
Parallelization
+++
++
P
Active & Passive Data Stores
In-Memory Database Technology Use Case: Analysis of Genomic Data
Dr. Schapranow, Apr 21, 2015
In-Memory Databases: Applications in Healthcare
24
Analysis of Genomic
Data
Alignment and Variant Calling
Analysis of Annotations in World-wide DBs
Bound To CPU Performance Memory Capacity
Duration Hours – Days Weeks
HPI Minutes Real-time
In-Memory Technology
Multi-Core
Partitioning & Compression
□ 1,000 core cluster at Hasso Plattner Institute with 25 TB main memory
□ Consists of 25 nodes, each:
– 40 cores
– 1 TB main memory
– Intel® Xeon® E7- 4870
– 2.40GHz
– 30 MB Cache
In-Memory Database Technology Hardware Characteristics at HPI FSOC Lab
Dr. Schapranow, Apr 21, 2015
In-Memory Databases: Applications in Healthcare
25
■ Row stores are designed for operative workload, e.g.
□ Create and maintain meta data for tests
□ Access a complete record of a trial or test series
■ Column stores are designed for analytical work, e.g.
□ Evaluate the number of positive test results
□ Identification of correlations or test candidates
■ In-Memory approach: Combination of both stores
□ Increased performance for analytical work
□ Operative performance remains interactively
Combined Column and Row Store
Dr. Schapranow, Apr 21, 2015
In-Memory Databases: Applications in Healthcare
26
+
■ Traditional databases allow four data operations:
□ INSERT, SELECT and
□ DELETE, UPDATE
■ Insert-only requires only INSERT, SELECT to maintain a complete history (bookkeeping systems)
■ Insert-only enables time travelling, e.g. to
□ Trace changes and reconstruct decisions
□ Document complete history of changes, therapies, etc.
□ Enable statistical observations
Insert-Only / Append-Only
Dr. Schapranow, Apr 21, 2015
In-Memory Databases: Applications in Healthcare
27
+++
+
■ Main memory access is the new bottleneck
■ Lightweight compression can reduce this bottleneck, i.e.
□ Lossless
□ Improved usage of data bus capacity
□ Work directly on compressed data
Lightweight Compression
Dr. Schapranow, Apr 21, 2015
In-Memory Databases: Applications in Healthcare
28
Attribute Vector
RecId ValueId 1 C18.0 2 C32.0 3 C00.9 4 C18.0 5 C20.0 6 C20.0 7 C50.9 8 C18.0
Data Dictionary
Inverted Index
… … … C18.0 Colon 646470 C50.9 Mama 167898 C20.0 Rectum 647912 C20.0 Rectum 215678 C18.0 Colon 998711 C00.9 Lip 123489 C32.0 Larynx 357982 C18.0 Colon 091487
RecId 1 RecId 2 RecId 3 RecId 4 RecId 5 RecId 6 RecId 7 RecId 8 …
ValueId RecIdList 1 2 2 3 3 5,6 4 1,4,8 5 7
ValueId Value 1 Larynx 2 Lip 3 Rectum 4 Colon 5 Mama Table
• Typical compression factor of 10:1 for enterprise software
• In financial applications up to 50:1
■ Horizontal Partitioning
□ Cut long tables into shorter segments
□ E.g. to group samples with same relevance
■ Vertical Partitioning
□ Split off columns to individual resources
□ E.g. to separate personalized data from experiment data
■ Partitioning is the basis for
□ Parallel execution of database queries
□ Implementation of data aging and data retention management
Partitioning
Dr. Schapranow, Apr 21, 2015
In-Memory Databases: Applications in Healthcare
29
■ Modern server systems consist of x CPUs, e.g.
■ Each CPU consists of y CPU cores, e.g. 12
■ Consider each of the x*y CPU core as individual workers, e.g. 6x12
■ Each worker can perform one task at the same time in parallel
■ Full table scan of database table w/ 1M entries results in 1/x*1/y search time when traversing in parallel
□ Reduced response time
□ No need for pre-aggregated totals and redundant data
□ Improved usage of hardware
□ Instant analysis of data
Multi-core and Parallelization
Dr. Schapranow, Apr 21, 2015
In-Memory Databases: Applications in Healthcare
30
■ Consider two categories of data stores: active and passive
■ Active data are accessed frequently & updates are expected, e.g.
□ Most recent experiment results, e.g. last two weeks
□ Samples that have not been processed, yet
■ Passive data are used for analytical & statistical purposes, e.g.
□ Samples that were processed 5 years ago
□ Meta data about seeds that are not longer produced
■ Moving passive data on slower storages
□ Reduces main memory demands
□ Improves performance for active data
Active and Passive Data Store
Dr. Schapranow, Apr 21, 2015
In-Memory Databases: Applications in Healthcare
31
P
■ Layers are introduced to abstract from complexity
■ Each layer offers complete functionality, e.g. meta data of samples
■ Less layer result in
□ Less code to maintain
□ More specific code
□ Reduced resource demands
□ Improves performance of applications due to eliminating obsolete processing steps
Reduction of Application Layers
Dr. Schapranow, Apr 21, 2015
In-Memory Databases: Applications in Healthcare
32 xx
■ For patients
□ Identify relevant clinical trials and medical experts
□ Start most appropriate therapy as early as possible
■ For clinicians
□ Preventive diagnostics to identify risk patients early
□ Indicate pharmacokinetic correlations
□ Scan for similar patient cases, e.g. to evaluate therapy
■ For researchers
□ Enable real-time analysis of medical data and its assessment, e.g. assess pathways to identify impact of detected variants
□ Combined free-text search in publications, diagnosis, and EMR data, i.e. structured and unstructured data
What to take home? Test-drive it yourself: http://we.AnalyzeGenomes.com
Dr. Schapranow, Apr 21, 2015
33
In-Memory Databases: Applications in Healthcare
Keep in contact with us!
Hasso Plattner Institute Enterprise Platform & Integration Concepts (EPIC)
Program Manager E-Health Dr. Matthieu-P. Schapranow
August-Bebel-Str. 88 14482 Potsdam, Germany
Dr. Matthieu-P. Schapranow schapranow@hpi.de http://we.analyzegenomes.com/
Dr. Schapranow, Apr 21, 2015
In-Memory Databases: Applications in Healthcare
34
top related