Confidential
Saama Technologies, Inc
Next Generation Data Management
Synthesizing & Standardizing Clinical Data in Flight
Agenda
• Next-generation systems & mechanisms for centralizing data while maintaining integrity
• Leveraging data mining for data quality
• Expanding the skills matrix and processes to evolve data management
Next Generation Systems
The Problem Areas
• Stages: Source, Integration, Standardization, Visualization
• Concerns: Quality, Governance, Technology, Process
• Elements: SDV, Meta Data, Connectors, Rules, Transform, Exceptions, Model, Reporting, Tools, Profile, Auditing, Lineage
• Across all stages: Variability, Inconsistency, Single Points of Failure
Typical Approach
• Stages: Source, Integration, Standardization, Visualization
• Concerns: Quality, Governance, Technology, Process
• Elements: SDV, Meta Data, Connectors, Rules, Transform, Exceptions, Model, Reporting, Tools, Profile, Auditing, Lineage
• Across all stages: Variability, Inconsistency, Single Points of Failure
• BPAAS, OCM
• Approach: Extract Transform Load; Data Warehouse; Reporting; Manual; Manual
A Novel Approach
• Stages: Source, Integration, Standardization, Visualization
• Concerns: Quality, Governance, Technology, Process
• Elements: SDV, Meta Data, Connectors, Rules, Transform, Exceptions, Model, Reporting, Tools, Profile, Auditing, Lineage
• Across all stages: Variability, Inconsistency, Single Points of Failure
• BPAAS, OCM
• Approach: Extract & Load to Lake; Ponds; Discovery, Reporting, & Analytics
• Data Governance Framework
• Quality Detection & Resolution
• Processing Pipelines
• Center of Analytic Excellence, Business Process Transformation
What does Next Generation Look Like?
• Genomics; Internal, M&A, External, Syndicated; Wearable Devices
• High Variety, Volume, & Velocity Data
• Ingestion, Organization, Analysis
• Automated Data Wrangling & Deep Data Science
• Business Aware Data Analytics
• Services Oriented Architecture: Harmonization, Integration, Analysis, Organized Storage, Provisioning, Aggregation
• Modern Technologies
• Configurable Analytic Applications
• Business Outcomes
The Modern Technology Stack
• Delivery of Insights: Reports, Visualization, Ad-hoc report building, Search Interface, Export Utility; Business Analytics Served Via Applications
• Data Aggregation Layer: HBase, HCatalog, Elastic Search; Reports, Specific Entities, Search Indexes; Centralized & Personalized Reports Expo, Ad Hoc Query, Specific Entities
• Data Processing & Analysis: Analytics via Pig, Python, Spark, R; Data Standards & Quality; Predictive Models; Analysis & RBM; Data Mining
• Data Organization & Storage: Study Data, Patient Data, Industry Data; HDFS; Atlas, Falcon, Ranger | Knox
• Data Landing Zone
• Data Integration: Kafka, Pig, Sqoop, APIs, SDKs; Data Profiling, Filters, Format Conversion, Aggregation
• Data Sources: EDC, SAS, Safety, CTMS/EDC, PV Data, IvRS, CTMS
  • Internal Systems
  • CRO's, M&A Systems: CRO 1 … CRO n
  • Other/External, etc.
• Modern Big Data Environment: Private Cloud, On Premise OR Hybrid Deployment
Data Management – Data Pipelines
Hortonworks. Web. 18 April 2016.
<http://hortonworks.com/hadoop-tutorial/defining-processing-data-end-end-data-pipeline-apache-falcon/>.
Data Management – Data Pipelines
• Sources: CTMS, EDC, CRO
• Ingest & Apply Rules
• Raw Data
• MCC KPI, Transcelerate KRI
• Analytic Ready Data
• Consumers: Leadership, Safety, Clinical Operations
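The "ingest, apply rules, deliver analytic-ready data" flow can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record fields (`site_id`, `consent_date`, `visit_date`) and both rules are hypothetical and are not taken from the MCC or TransCelerate metric definitions named on the slide.

```python
from datetime import date

# Hypothetical rules: each is a (name, check) pair; check returns True when the record passes.
RULES = [
    ("site_id_present", lambda rec: bool(rec.get("site_id"))),
    ("visit_on_or_after_consent", lambda rec: rec["visit_date"] >= rec["consent_date"]),
]


def apply_rules(record, rules=RULES):
    """Return a copy of the record annotated with the names of the rules it failed."""
    failures = [name for name, check in rules if not check(record)]
    return {**record, "rule_failures": failures}


def run_pipeline(raw_records, rules=RULES):
    """Take raw records and emit one annotated, analytic-ready row per input record."""
    return [apply_rules(rec, rules) for rec in raw_records]


# Illustrative raw data from two hypothetical sources.
raw = [
    {"source": "EDC", "site_id": "S01",
     "consent_date": date(2016, 1, 4), "visit_date": date(2016, 1, 11)},
    {"source": "CRO", "site_id": "",
     "consent_date": date(2016, 2, 1), "visit_date": date(2016, 1, 25)},
]
ready = run_pipeline(raw)
for row in ready:
    print(row["source"], row["rule_failures"])
```

Keeping the rules as data rather than hard-coded branches means new checks can be added without touching the pipeline function, which is one plausible way to make rule application configurable.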
Data Management – Meta Data
Papatheodorou, Irene, et al. "A metadata approach for clinical data management in translational genomics studies in breast cancer." BMC Medical Genomics 2.1 (2009): 1.
Data Mining & Data Quality
Data Mining for Data Quality
● Cluster analysis: Group data to form classes; maximize intra-cluster similarity and minimize similarity between clusters.
● Association rules discovery: Find frequent rules in the data; popular in market basket analysis.
● Classification (e.g. decision trees): Build a (binary) tree where each node corresponds to a split of attribute values, e.g. "if the weather is sunny, play golf; else don't play golf."
● Predictive modeling: Build mathematical models (functions) of the data in order to predict unknown or missing values, or future outcomes.
● Outlier detection: Find unusual, rare events (often regarded as noise, these can be the most interesting objects or events in the data); used for fraud detection, network intrusion detection, etc.
● Sequence / time series mining: Find patterns over time (e.g. episodes, clusters).
● Spatial mining: Geographical data analysis.
● Stream mining: Where the data can be read only once (e.g. network data, telecommunications data), special algorithms are necessary.
● Multimedia mining: Images, audio, video.
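As one concrete instance of the outlier detection item above, the sketch below flags values outside Tukey's fences (1.5 times the interquartile range beyond the quartiles). The choice of this rule, the `k=1.5` multiplier, and the sample blood pressure readings are assumptions made for illustration; the slide does not prescribe a method.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR], in input order."""
    q1, _, q3 = quantiles(values, n=4)  # three cut points: Q1, median, Q3
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Illustrative systolic blood pressure readings with one implausible entry.
readings = [118, 122, 120, 125, 119, 121, 123, 240, 117, 124]
print(iqr_outliers(readings))
```

A flagged value is a candidate for review, not proof of an error: as the list notes, outliers can be the most interesting events in the data.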
Getting Started
Columns: Business Rule | Data Quality | Data Fraud
Check categories: Continuous Data, Calculation, Validation, Comparison, Trend Analysis
• Check leading digit preference (actual vs. expected)
• Too few or too many outliers identified
• Inliers identified
• Too little or too much variability
• Check the skewness of the data
• Correlation between variables
• Degree of interpolation for repeated measurements
• Degree of duplicated results for repeated measurements
• Summary statistics for variables, compared between them
• Ordering of dates to ensure plausibility
• Data recorded at weekends or public holidays
• Accrual that seems implausible
• Potential invented patterns in the data
• Patterns of missing data
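The leading digit preference check (actual vs. expected) could look roughly like the sketch below. It assumes the expected distribution is Benford's law, which the slide does not state; Benford's law is only a reasonable baseline for values spanning several orders of magnitude, so another expected distribution may fit a given variable better. All function names here are hypothetical.

```python
import math
from collections import Counter


def leading_digit(value):
    """Return the first significant digit of a non-zero number."""
    # Scientific notation puts the first significant digit at index 0.
    return int(f"{abs(value):.15e}"[0])


def benford_expected(digit):
    """Expected proportion of the given leading digit under Benford's law."""
    return math.log10(1 + 1 / digit)


def digit_report(values):
    """Map each digit 1..9 to (actual proportion, expected proportion)."""
    digits = [leading_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    total = len(digits)
    return {d: (counts.get(d, 0) / total, benford_expected(d)) for d in range(1, 10)}


def chi_square_vs_benford(values):
    """Pearson chi-square statistic of observed leading digits against Benford's law."""
    digits = [leading_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    total = len(digits)
    stat = 0.0
    for d in range(1, 10):
        expected = total * benford_expected(d)
        stat += (counts.get(d, 0) - expected) ** 2 / expected
    return stat


# Illustrative values only; a real check needs far more observations.
sample = [1, 10, 100, 2, 3]
for digit, (actual, expected) in digit_report(sample).items():
    print(digit, round(actual, 3), round(expected, 3))
```

A large chi-square statistic relative to the reference distribution with 8 degrees of freedom would mark the variable, site, or investigator for closer review rather than establish fraud on its own.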
Outcomes of Statistical Analyses for Data Quality
• Fabricated Data
  Missing or outlying values replaced by plausible values via implantation; data trend invention
  Distribution observations:
  • Implantation
  • Abnormally small variability for repeated measurements
• Falsified Data
  Enhancing patient eligibility or treatment efficacy
  Distribution observations:
  • Comparison between patients, measurements, treatment interactions, center, etc.
• Unintentional Data Errors
  Improperly calibrated or imprecise equipment
  Distribution observations:
  • Shift
  • Large variability
• Data Errors Resulting from Human Error
  Data entry:
  • Source to CTMS
  Missing data observations:
  • Frequency comparisons: outlying centers (too much or too little missing data) flagged.
Axis labels: Quality, Fraud
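The missing-data frequency comparison across centers can be sketched as follows. The comparison against the median rate and the `tolerance` of 0.15 are assumptions chosen for illustration, not thresholds given in the deck; `None` stands in for a missing value.

```python
from statistics import median


def missing_rate(values):
    """Proportion of entries that are missing (represented here as None)."""
    return sum(v is None for v in values) / len(values)


def flag_outlying_centers(data_by_center, tolerance=0.15):
    """Return {center: missing rate} for centers far from the median missing rate.

    Centers are flagged in both directions: too much and too little missing data.
    """
    rates = {center: missing_rate(vals) for center, vals in data_by_center.items()}
    typical = median(rates.values())
    return {c: r for c, r in rates.items() if abs(r - typical) > tolerance}


# Illustrative measurements per center.
centers = {
    "A": [1, 2, None, 4],
    "B": [1, None, 3, 4],
    "C": [None, None, None, 4],
    "D": [1, 2, 3, 4],
}
print(flag_outlying_centers(centers))
```

Flagging centers with unusually little missing data is deliberate: a suspiciously complete dataset can point to implanted values just as heavy missingness can point to data entry problems.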
Skills Matrix
People & Skills
• Data Management
  • Data Manager
  • Database Programmer/Designer
  • Medical Coder
  • Clinical Data Coordinator
  • Quality Control Associate
  • Data Entry Associate
• IT (Big Data)
  • Architect
  • Integration Developers
• Biostatistics
  • Applied Statisticians
  • Data Scientists
• Governance
  • CoE Stewards
  • Custodians
Skill areas: Analytic Skills, Technology Skills, Health/Biomedical Informatics Experts, Data Scientists, Big Data Engineers (*EMC2)
Saama
Saama’s Life Sciences Practice
Life Sciences Solutions
• Subject Matter Experts
  • Clinical Trials
  • R&D | Preclinical
  • GMA
  • HEOR
  • Commercial
• Advanced Analytics
  • Biostatistics
  • Bioinformatics
  • Data Science
  • Applied Statistics
  • Machine Learning
• Technology
  • Big Data & Connectors
  • App/Web Development
  • UI/UX
  • MDM
  • Open Source
• Data Standards
  • CDISC
  • OMOP
  • Custom
  • Client Plug-In
• Design Process Frameworks
  • Business Analysis
  • Predictive Model Build
  • Delivery
  • QA
Rapid Configuration | Modular Design | Advanced Data Science | Agile Methodology
Analytics Advantage
• Your Infrastructure
  Updates: Modern Data Warehouse, Cloud, V4, Data Lake, Real-time
• Saama's Analytic Assets
  Pre-Built Clinical Data Solutions
• Big Data Strategy & Acceleration
• Analytical Solutions
  • HCM analytics suite for workforce management and operations
  • Capturing new customer segments with insights from real world treatment pathways
  • Disease incidence and co-morbidities insights from population cohort analysis
• Managed Services
  • Managed Services (Data & Analytics) / Clinical Operations Solution
• Refactoring Existing Data Systems
Copyright © 2016, Saama Technologies | Confidential
Analytics Advantage
What We Stand For
• New Class of Solution Partner: Fluid Analytics for the digital enterprise
  Simultaneous business, technology and services acceleration with Data Science
• Heritage of Innovation
  Patented technologies, Fluid Analytics Engine Accelerator.
• Game-Changing Outcomes for the Global 2000
  Multi-million $ business outcomes
  5000+ engagements, 99% retention rate