ensuring compliance of patient data with big data and bi · 2017-08-10 · ensuring compliance of...
TRANSCRIPT
Global Sponsor:
Ensuring Compliance of Patient Data with Big Data and BI
Denny Lee, Principal Program Manager at Microsoft Ayad Shammout, Principal Business Intelligence Consultant at BIDMC
Big Data Innovations and Integration Track
Agenda
A Quick Big Data Primer • Big Data on the Microsoft Platform by Andrew Brust • What is Big Data by Mark Whitehorn
Healthcare and Big Data Compliance and Auditing
• SQL Compliance Project
Compliance and Auditing with Big Data and BI • Big Data: Unstructured Volumes of Data • BI: PowerPivot, Power View
What is Big Data?
Volume Exceeds physical limits of vertical scalability
Velocity Decision window small compared to data change rate
Variety Many different formats makes integration expensive
Variability Many options or variable interpretations confound analysis
4
10x increase every five years
85% from new data types
Volume Velocity Variety
Hadoop
Cloud
By 2015, organizations that build a modern information management system will outperform their peers financially by 20 percent.
– Gartner, Mark Beyer “Information Management in the
21st Century”
Healthcare and Big Data
• Often a laggard in technology • Yet, application of technology will be revolutionary to
understanding the human system • Genomic sequencing brings the promise of understanding
human biological systems • Proteomic sequencing brings the promise of building the
protein sequences to build customized drugs • Healthcare Incidence Prediction: Heart Attacks and Asthma
Healthcare Big Data Example Scenarios
• Clinical trials: not just examining existing drugs and efficacy, but also potential deviations • E.g. Originally Viagra was developed to lower blood pressure
and treat Angina; now it also helps with newborn pulmonary hypertension and altitude sickness
• Predicting healthcare incidences issues • Social media campaigns (e.g. advertising drugs) • Pharmaceutical campaign advertising analytics
• Modeling the consumer, trying to understand their user behavior (why are they purchasing this medication, how do they feel about their ailment, related behaviors, etc.)
• Patient Satisfaction Survey 12
Compliance
SQL Server Compliance: http://www.microsoft.com/sqlserver/2008/en/us/compliance.aspx Reaching Compliance Whitepaper: http://www.microsoft.com/en-us/download/details.aspx?id=6808
13
IT Control SOX
PCI
HIPA
A
GLB
A
ID Management
Separation of Duties
Encryption
Key Management
Auditing
Control Testing
Policy Management
Auditing: BIDMC Scenario
Auditing is critical component HIPAA compliance and ensuring patient privacy 1 Billion rows+ of audit data 146 mission critical clinical applications Comprehensive audits yield 300-500k transactions/day HIPAA requires audit system with 20 years of data
Auditing Project Available to community as part of Compliance SDK Collaboration of Caregroup, MCS, SQLCAT
Quote: Creating an enterprise tool for consolidated storage, reporting and alerting
of all application audit data - that's cool! John Halamka’s Cool Technology of the Week (Wellsphere Top Health
Blogger, Health Impact Award)
14
Compliance: Auditing
Audit specific users Typically want to do sysadmin But, many scenarios require auditing of more users because those users
have insert, update access Based on your policies
Audit specific tables Audit all tables that can only be modified or deemed as sensitive
Audit Objects Key and encryption access auditing (Audit action types:
DATABASE_OBJECT_ACCESS_GROUP and DATABASE_OBJECT_CHANGE_GROUP)
Audit everything approach Can grow quite quickly (i.e. lots of data) so may want to limit data Or have your audit reporting system filter out data you do not need
15
BIDMC Compliance Project
16
SSIS SSIS
SSIS
HDInsight Windows
HDInsight Azure
SQL Server 2008/2012 Audit Logs ETL Logs to HDFS
SSAS (tabular)
Use Excel 2013 PowerPivot and Power View.
Centralizing Audit Logs and Reporting
Centralizing Logs Allows you to have one system process all audit logs from your
servers Easier manageability Set files to 250MB in size (less files, but not too large to process) Optimized for Hadoop General Rule of Thumb: 250MB-1GB file sizes
Can also centralize processing … and centralize reporting
Compliance SDK contains the full project Organized by Server, Database, DDL, and DML actions
17
Auditing: Interesting Observations
Backup a user database: Need CREATE permissions on the master database to look at the
backup media The CREATE permission is a misnomer since you are not creating Nevertheless required to do a backup hence the RESTORE
LABELONLY statements in your audit
Server Principal Name is the user name A lot of VIEW SERVER STATE calls but is part of important server audit specification (may want to filter this out) Audit Logs can generate A LOT of data
• 2 medium servers generated 250GB of files in 6 hours!
18
Auditing Sensitive Information
19
Querying Audit InformationUse PowerPivot / Power View / Analysis Services to Query the data.
Security InformationPolicy Information
Process Audit InformationUse SSIS to process SQL2008 All-Actions Audit Information and other CG application audit log data; potentially can use Management Performance DW framework.
Caregroup Environment
File Server
SQL Audit
Connect/Logic
SSIS
CG Application Data
Intersystems Cache
SQL2005
Oracle
SQL2008 All-Actions Audit Data
SQL 2008 / 2012 R2
SSRS 2008 /Power View
Policy Analysis
Policy Reports
Policy Best Practices
Security Analysis
Security Reports
Compliance Reports
Feedback Action LoopUpdate systems to keep them
compliant and secure
Storage Infrastructure
21
Hadoop on Azure Compute Nodes (Medium VMs)
Azure Storage Vault (ASV) Azure Blob Storage
Azure Flat Network Storage
Hadoop / Auditing: File sizes
Currently testing gz vs. raw, • E.g. 12MB raw text file vs. 633Kb gz file (~20x compression)
20x smaller size, ~same query time
• Approx same map / reduce task utilization
File Size is 250MB-1GB • SSIS package takes care of the size
Future testing: avro, protobuf
23
Query Duration (s)
select count(*) from sql_audit_asv_raw 56.066
select count(*) from sql_audit_asv_gz 58.994
Hadoop / Auditing: Formats
For ease of processing, replace carriage returns within embedded SQL statements, e.g. select col1, col2
from tableA
to select col1, col2 from tableA
This allows you to create a Hive table using CR as row delimiter (i.e. does not have things like SQL quoted identifiers)
24