data day health it - data architecture

Posted on 14-Feb-2017


TRANSCRIPT

Healthcare Considerations for Modern Data Architectures: Pitfalls, Challenges and Best Practices
Data Day Health 2017

Presented by: Toby Owen, VP Product Development

OnRamp
- Industry-leading high-security and hybrid hosting provider
- Operates multiple enterprise-class data centers located in Austin, Texas and Raleigh, North Carolina
- SSAE 16 SOC II and SOC 3 audited; PCI and HIPAA compliant company
- Specializes in helping organizations meet their rigorous compliance requirements and keep their data safe

Toby Owen
- Vice President, Product Development, OnRamp
- 20-year IT veteran with operations and engineering background
- Security, IT ops at scale, hybrid cloud, compliant workload hosting

AGENDA
GOAL: Designing an app for healthcare… that’s compliant!
- Data Stores
- App Design
- Where to Run It
- Dev Lifecycle
- Takeaways
- Q & A

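Refresher on (or intro to) databases: CAP theorem
- C = Consistency
- A = Availability
- P = Partition Tolerance
A distributed data store cannot guarantee all three at once: when a network partition occurs, it must favor either consistency or availability.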

Database Reference Guide – at a glance

*Adapted from http://blog.nahurst.com/visual-guide-to-nosql-systems

Why do we care?
• Scaling vertically versus horizontally
  - Costs of scaling up (vertically) can grow exponentially
  - Scaling out (horizontally) is linear
  - Vertical scaling hits hard limits; horizontal scaling has an effectively “indefinite” limit
• Data sources are increasingly distributed
• Horizontal scaling provides better geo-resiliency at the same time
• Not all data needs strict ACID compliance
More arguments favor distributed data stores

RDBMS and ACID
• Definition: Atomicity, Consistency, Isolation, Durability
• Favors Consistency over Availability
• Examples
  - MSSQL
  - MySQL
  - Postgres
  - Greenplum
  - VoltDB
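To make the "A" in ACID concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in relational database (none of the products listed above are involved). The `transfer` helper and the `accounts` table are hypothetical; the point is that a failure partway through a transaction leaves no partial update behind.

```python
import sqlite3

def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Move `amount` between two accounts as one atomic transaction."""
    # Using the connection as a context manager commits on success and
    # rolls back automatically if an exception escapes the block.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        (balance,) = conn.execute("SELECT balance FROM accounts WHERE name = ?", (src,)).fetchone()
        if balance < 0:
            # Abort mid-transaction: the debit above must not survive.
            raise ValueError("insufficient funds")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
with conn:
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])

transfer(conn, "alice", "bob", 30)        # succeeds and commits
try:
    transfer(conn, "alice", "bob", 500)   # fails; the partial debit is rolled back
except ValueError:
    pass

balances = dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
print(balances)  # {'alice': 70, 'bob': 30}
```

The failed transfer debits alice inside the transaction, but the rollback discards that change, so only the successful transfer is visible.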

Are scalability and ACID a false tradeoff?
• Scalability and ACID are difficult to satisfy at the same time
• Not all data requires strict ACID compliance
• Relational can be a bottleneck
  - Simpler models might simplify operations: easier and more efficient
• New relational DBs can be very fast AND scalable
• Many NoSQL DBs are adding features to look more like RDBMSs
• Take-away: understand your data (shape and use case) and pick the right solution

NoSQL and BASE
• NoSQL definition: SOME of the following
  - Non-relational, distributed, open-source, horizontally scalable, schema-free, easy replication support, simple API
• BASE definition: Basically Available, Soft state, Eventual consistency
  - All reads of the same data will eventually return the same result, once replicas converge
• Favors Availability over Consistency

Let’s focus some time here exploring NoSQL databases/datastores
- Considerations based on scalability, encryption and key management
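Before turning to individual datastores, the "eventual consistency" part of BASE can be illustrated with a toy model. This is a minimal sketch assuming two in-memory replicas that resolve conflicts by last-write-wins on a timestamp; the `Replica` class and its methods are hypothetical and do not describe how any particular product replicates data.

```python
from dataclasses import dataclass, field

@dataclass
class Replica:
    """A toy key-value replica using last-write-wins (LWW) conflict resolution."""
    name: str
    # key -> (timestamp, value); the highest timestamp wins on merge.
    # Assumes unique timestamps; real systems also need a tie-breaker.
    data: dict = field(default_factory=dict)

    def write(self, key, value, ts):
        current = self.data.get(key)
        if current is None or ts > current[0]:
            self.data[key] = (ts, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

    def sync_from(self, other):
        # Anti-entropy: pull every entry from a peer, keeping the newest version.
        for key, (ts, value) in other.data.items():
            self.write(key, value, ts)

a, b = Replica("a"), Replica("b")
a.write("status", "admitted", ts=1)    # accepted locally, not yet replicated
b.write("status", "discharged", ts=2)  # concurrent write accepted on another replica

before = (a.read("status"), b.read("status"))  # replicas disagree (soft state)
a.sync_from(b)
b.sync_from(a)
after = (a.read("status"), b.read("status"))   # replicas have converged
print(before, after)
```

Both writes are accepted immediately (availability), reads briefly disagree, and after the replicas exchange state they agree on the newest value.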

MongoDB
• Document-oriented database (JSON), considered “semi-structured” data
• Scalability: built in via automatic sharding (range, hash, zone)
  - EA FIFA game (250+ servers); Yandex (tens of billions of objects, TBs of data, growing at 10MM file uploads/day)
• Security – encryption in-transit
  - SSL/TLS client support (data in-transit)
  - MongoDB Enterprise Advanced supports FIPS 140-2
  - Atlas (MongoDB-as-a-Service on Amazon) does NOT support FIPS mode
• Security – encryption at-rest
  - App level, external filesystem, disk level, or natively (encrypted storage engine); native encryption supports FIPS 140-2
• Security – key management
  - Each database has a separate key
  - Can be integrated with an external KMS
  - Supports key rotation without downtime (via rolling restarts of the replica set)
  - Native encryption is only available in the Enterprise Advanced version!
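Range and hash sharding behave very differently for monotonically increasing keys such as sequential record IDs. The sketch below is a toy router, not MongoDB's implementation: `range_shard`, `hash_shard`, the split points and the use of MD5 are all assumptions chosen for illustration.

```python
import bisect
import hashlib
from typing import Sequence

def range_shard(key: int, split_points: Sequence[int]) -> int:
    """Range sharding: shard i owns keys in [split_points[i-1], split_points[i])."""
    return bisect.bisect_right(split_points, key)

def hash_shard(key: int, num_shards: int) -> int:
    """Hash sharding: a stable hash of the key picks the shard."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# Monotonically increasing keys, e.g. newly inserted sequential IDs.
new_ids = range(1000, 1100)
split_points = [250, 500, 750]  # 4 range shards: <250, 250-499, 500-749, >=750

range_targets = {range_shard(k, split_points) for k in new_ids}
hash_targets = {hash_shard(k, 4) for k in new_ids}
print(range_targets)      # {3}: every new insert lands on the last shard
print(len(hash_targets))  # new inserts are spread over several shards
```

Range sharding keeps related keys together (good for range queries) but concentrates sequential inserts on one shard; hashing spreads the write load at the cost of scattering range scans.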

Cassandra
• Row-oriented
• Scalability: peer-to-peer distributed system, data spread across all nodes
  - Each node keeps a commit log and exchanges state information (gossip) with other nodes every second
  - All writes are automatically partitioned and replicated throughout the cluster
  - Apple (75,000 nodes, 10PB); Netflix (2,500 nodes, 420TB, 1 trillion requests/day)
• Security – encryption in-transit
  - Supports TLS/SSL, with separate configs for client-server and server-server traffic
  - FIPS compliance supported
• Security – encryption at-rest
  - Open-source Cassandra relies on filesystem encryption
  - DataStax (commercial version) supports at-rest encryption
• Security – key management
  - Open-source Cassandra relies on the filesystem encryption’s key management tools (can be complex)
  - DataStax (commercial version) has native KMIP support
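The "partitioned and replicated throughout the cluster" point can be sketched as a hash ring: each node owns a slice of the token space, and a partition's replicas are the owning node plus the next nodes clockwise. This is a simplified model assuming one token per node and MD5 as a stand-in hash (Cassandra's default partitioner uses Murmur3, and real clusters typically use many virtual nodes); `TokenRing` is hypothetical.

```python
import bisect
import hashlib

def token(value: str) -> int:
    # Stand-in hash for illustration only; not Cassandra's partitioner.
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

class TokenRing:
    """Peer-to-peer placement: every node owns a slice of the hash ring."""

    def __init__(self, nodes, replication_factor=3):
        if replication_factor > len(nodes):
            raise ValueError("replication factor exceeds node count")
        self.rf = replication_factor
        # One token per node, sorted around the ring.
        self.ring = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, partition_key: str):
        # First replica: first node whose token is >= the key's token (wrapping around);
        # remaining replicas: the next nodes clockwise.
        start = bisect.bisect_left(self.tokens, token(partition_key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.rf)]

nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
ring = TokenRing(nodes, replication_factor=3)
owners = ring.replicas("patient:12345")
print(owners)  # three distinct nodes, each holding a copy of this partition
```

Because any node can compute the same placement from the key alone, there is no master to consult, which is what lets this style of cluster grow by simply adding peers.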

Hadoop
• Not really a database: a distributed filesystem (HDFS) plus an application interface (MapReduce)
• Scalability: designed for large file distribution across 100s to 1000s of servers, streaming access and large data sets (moving compute is cheaper than moving data)
  - Facebook (21PB, 2,000 machines); Spotify (1,300 nodes, 42PB storage, 20TB ingested and 200TB generated by Hadoop per day)
• Security – encryption in-transit
  - HDFS supports transparent (end-to-end) encryption
• Security – encryption at-rest
  - Supported by HDFS, application, database, or disk level
  - Lots of options for commercial support and tools to simplify management
• Security – key management
  - Natively supports its own KMS
  - Again, more commercial options exist to simplify
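The MapReduce model itself is simple enough to simulate in a single process. This is a minimal word-count sketch of the map, shuffle and reduce phases; it is not Hadoop's Java API, and the function names and input strings are hypothetical. On a real cluster, each mapper would run on the node that stores its HDFS block, which is the "move compute, not data" idea above.

```python
from collections import defaultdict
from itertools import chain

def map_phase(record: str):
    # Mapper: emit (word, 1) for every word in one input record.
    for word in record.lower().split():
        yield word, 1

def shuffle(pairs):
    # Shuffle/sort: group all values emitted for the same key.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Reducer: combine the grouped values for one key.
    return key, sum(values)

# Each string stands in for an input split stored on a different data node.
splits = ["flu shot flu", "shot clinic", "flu clinic clinic"]
mapped = chain.from_iterable(map_phase(s) for s in splits)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)  # {'flu': 3, 'shot': 2, 'clinic': 3}
```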

LOTS of others
• Key-value
  - Redis
  - DynamoDB
• Document-oriented
  - CouchDB
  - DocumentDB
• Time series
• Graph
• + 225 more! (see nosql-database.org for basic info and comparisons)

So you’ve chosen your datastore(s). Now what?

Application architecture!

Application design: SOME considerations for HIPAA and HITECH
• HITECH: each app zone requires firewall isolation
  - Web, app, database
• Key management
  - Key Management System (KMS)
  - Hardware Security Module (HSM)
  - Keys database
  - Key splitting, for transferring clear-text cipher keys
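Zone isolation usually comes down to a default-deny rule set in which each tier may only talk to its neighbor. The sketch below models that policy as data so it can be reviewed and tested; the zone names, ports and the `is_allowed` helper are hypothetical, and real enforcement belongs in firewalls or security groups rather than application code.

```python
# Hypothetical zone names and ports, for illustration only.
ALLOWED_FLOWS = {
    ("internet", "web", 443),   # clients reach only the web tier over HTTPS
    ("web", "app", 8443),       # web tier calls the application tier
    ("app", "database", 5432),  # only the app tier talks to the database
}

def is_allowed(src_zone: str, dst_zone: str, port: int) -> bool:
    """Default deny: a flow is permitted only if it is explicitly listed."""
    return (src_zone, dst_zone, port) in ALLOWED_FLOWS

checks = {
    "web -> app": is_allowed("web", "app", 8443),
    "web -> database": is_allowed("web", "database", 5432),            # skipping a tier is denied
    "internet -> database": is_allowed("internet", "database", 5432),  # denied
}
print(checks)
```

Keeping the policy as an explicit allow-list makes it easy to assert in tests that, for example, nothing outside the app zone can ever reach the database.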

Reference Architecture

And more
• Many other security considerations around compliant application architecture
  - Shared storage resources and shared IaaS

Supporting encryption at-rest may not be enough to achieve HIPAA or HITRUST compliance:
- Verifiable (compliant) destruction of data in a shared environment
- Encryption keys need to be managed in accordance with shared-secret or ‘key splitting’ schemes (e.g. Shamir’s secret sharing)
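Shamir's secret sharing splits a key into n shares so that any k of them reconstruct it, while fewer than k reveal nothing about it. This is a minimal sketch assuming a key encoded as an integer and arithmetic over the prime field mod 2^521 - 1; `split_secret` and `recover_secret` are hypothetical helpers, and a production system should use a vetted library (this sketch has no share authentication or integrity checks).

```python
import secrets

# Mersenne prime far larger than a 256-bit key; all arithmetic is mod PRIME.
PRIME = 2**521 - 1

def split_secret(secret: int, threshold: int, num_shares: int):
    """Split `secret` into `num_shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be smaller than the field prime")
    if not 1 <= threshold <= num_shares:
        raise ValueError("need 1 <= threshold <= num_shares")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def f(x):
        # Horner's rule evaluation mod PRIME.
        acc = 0
        for c in reversed(coeffs):
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, f(x)) for x in range(1, num_shares + 1)]

def recover_secret(shares):
    """Lagrange interpolation at x = 0 recovers the constant term (the secret)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

key = secrets.randbits(256)  # e.g. a 256-bit data-encryption key
shares = split_secret(key, threshold=3, num_shares=5)
assert recover_secret(shares[:3]) == key   # any 3 of the 5 shares suffice
assert recover_secret(shares[1:4]) == key
```

Each custodian holds one share, so no single administrator can reconstruct a clear-text key alone, which is the property key splitting is meant to provide.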

Next?
We’ve chosen the right datastores…
We’ve designed our application to support HITRUST or HIPAA…

Where will the app run?

Hybrid is the likely reality
• Consuming 3rd-party data sources
• Capabilities of each data or app component provider
• Business Associate Agreement (BAA) with each provider
• Peril of failing to plan

How to keep all this compliant?
• Lots to consider to get it right
• Start at the beginning: your development lifecycle
• Automate everything
• Dev/Test/Staging/Production should all account for secure design
• Use containers?
• Maybe get some help
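One way to "automate everything" is a compliance gate in the build pipeline that fails when a required control is missing. This is a minimal sketch assuming a hypothetical deployment manifest; the field names (`encrypt_at_rest`, `tls`) and the `compliance_violations` helper are illustrative, not any real tool's schema.

```python
# Hypothetical deployment manifest; field names are illustrative, not a real schema.
MANIFEST = {
    "datastores": [
        {"name": "patients-db", "encrypt_at_rest": True, "tls": True},
        {"name": "analytics-cache", "encrypt_at_rest": False, "tls": True},
    ]
}

REQUIRED_CONTROLS = ("encrypt_at_rest", "tls")

def compliance_violations(manifest: dict) -> list:
    """Return one message per missing control; an empty list means the gate passes."""
    problems = []
    for store in manifest.get("datastores", []):
        for control in REQUIRED_CONTROLS:
            if not store.get(control, False):  # a missing field counts as disabled
                problems.append(f"{store['name']}: {control} is not enabled")
    return problems

violations = compliance_violations(MANIFEST)
for v in violations:
    print("FAIL", v)
# In CI, a non-empty list would fail the build, e.g. sys.exit(1 if violations else 0).
```

Running the same check against Dev, Test, Staging and Production manifests keeps every environment held to the same secure design.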

Key Takeaways
• Distributed data is becoming the new norm
• Data is different: data usage should dictate data technology
  - No one-size-fits-all
• Application architecture is key to achieving compliance
• Must consider all locations where the app is running
• Consider compliance in all phases of app development (starting with design)
• Automation in the development pipeline is key to building in and maintaining compliance throughout the app lifecycle
• Final consideration: are you now a service provider?

Toby Owen
VP, Product Development, OnRamp
towen@onr.com
@tobydowen
linkedin.com/in/tobyowen

Resources
• Databases and scaling:
  - http://stackoverflow.com/questions/12215002/why-are-relational-databases-having-scalability-issues
  - http://blog.nahurst.com/visual-guide-to-nosql-systems
  - http://nosql-database.org/
• MongoDB
  - https://www.mongodb.com/mongodb-architecture
  - https://webassets.mongodb.com/_com_assets/collateral/MongoDB_Security_Architecture_WP.pdf
• Cassandra
  - http://cassandra.apache.org/doc/latest/operating/security.html?highlight=encryption
  - http://stackoverflow.com/questions/32584253/how-to-use-cassandra-with-tde-transparent-data-encryption
  - http://dba.stackexchange.com/questions/6909/cassandra-encryption-at-rest
  - http://www.datastax.com/products/datastax-enterprise
• Hadoop
  - https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/TransparentEncryption.html
  - Hadoop at Scale: Spotify: http://cdn.oreillystatic.com/en/assets/1/event/118/The%20Evolution%20of%20Hadoop%20at%20Spotify-%20Through%20Failures%20and%20Pain%20Presentation.pdf
• Key management
  - https://en.wikipedia.org/wiki/Shamir%27s_Secret_Sharing
