MongoDB in Denver: How Global Healthcare Exchange Is Using MongoDB



GHX Use of MongoDB

Bill Perkins: Director of Data and Architecture

Jeff Sherard: Manager of Database Services



So there, what is a GHX?


What is a GHX?

• 500K+ documents a day
  – Scaling to many millions

• Saved $5.5 billion over the last 5 years

[Diagram: 1000's of Healthcare Providers connected through GHX to 100's of Healthcare Suppliers]

• 85% of US market
• Document types: Purchase Orders, Purchase Order Acknowledgement, Advanced Shipment Notice, Invoice, Contract, Item Master, Vendor Master, …


Yeah, so tell me, what did the Mongo selection process feel like?


This is me being greeted by the committee for homogeneous databases

In this picture I am standing to the right of the photographer (see arrow)

This is me after the selection of Mongo was approved

Weighted Selection Criteria

Criteria                     Weight
Scalability                  3
Development throughput       1
Flexible Data Model          3
Developer Impact             2
Database Services Impact     2

Applicability of Different NoSQL Data Models
(Characteristics scored 1-4, lower is better: Performance / Scalability / Flexibility / Complexity / Index Support)

• Key-Value Stores: 1 / 3 / 1 / 1 / 4
  – Examples: Oracle NoSQL, Redis, Riak, Membrain, LevelDB, Voldemort, Castle, Aerospike
  – Applicability: LOW. Stores a key associated with a binary. Good for cache. We will need multiple keys on objects.

• Column Stores: 1 / 1 / 2 / 3 / 3
  – Examples: Cassandra, HBase, Hypertable, Acuru
  – Applicability: MEDIUM. Very scalable, good tooling support, no built-in secondary index, steep learning curve.

• Document Stores: 2 / 2 / 1 / 1 / 1
  – Examples: MongoDB, Couchbase, RavenDB, RethinkDB, CouchDB
  – Applicability: HIGH. Fits everything, flexible model, supports multiple indexes, low learning curve. Matches our model.

• Graph Databases: 2 / 3 / 4 / 4 / 2
  – Examples: Neo4j, InfiniteGraph, OrientDB
  – Applicability: LOW. Very good at quick storage, but queries are harder and horizontal scale is questionable. Our data is not graph oriented.


DBs that made the first cut
(applicability, community / market traction)

• Oracle NoSQL: LOW. Low traction; potential community is high due to Oracle. This is BerkeleyDB with an Oracle wrapper. May be good for cache.

• Cassandra: MEDIUM. High traction, good community. Companies are using this to solve very large problems. Scales well. Does not support secondary indexes natively. Good tool support. Good integration with ETL / BI tools.

• HBase: MEDIUM. High traction, good community. No one is using this successfully for OLTP problems. Very complex.

• MongoDB: HIGH. High traction, large community. Lots of features, including many types of indexes. Leading in deployments and growing fastest. Developer friendly. Tooling for operations is command line.



So there, what was it like when Mongo battled Cassandra?


The one who looks like “David” in this picture is actually Bryan

Performance Test Results

• Mongo config: primary and 2 replicas
• Cassandra config: 3-node ring
• Mongo outperformed Cassandra in each test
• Different architectures: Mongo performs best in read-heavy scenarios, while Cassandra is the opposite
• Mongo’s latency was better on each test


[Charts: OPS by Read/Write % and Latency by Read/Write % (99/1, 90/10, 70/30, 50/50, 30/70), MongoDB vs. Datastax]

Mongo: Index test results

• Added 3 secondary indexes to the mongo collection
• YCSB test reflects the cost of index management in writes, but does not use secondary indexes for reads

Throughput (ops/sec) by read/write ratio:

Read/write ratio     99/1    90/10   70/30   50/50   30/70
Primary key only     36274   29244   21042   17381   15270
Add 3 indexes        31151   23297   15158   11742   9142
Performance drop     14%     20%     28%     32%     40%
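As a quick sanity check, the "Performance drop" row follows directly from the two throughput rows; a minimal sketch using the numbers copied from the table above:

```python
# Recompute the "Performance drop" row from the two throughput rows above.
primary_only = [36274, 29244, 21042, 17381, 15270]   # primary key only
with_indexes = [31151, 23297, 15158, 11742, 9142]    # after adding 3 secondary indexes
drops = [round(100 * (p - w) / p) for p, w in zip(primary_only, with_indexes)]
print(drops)  # -> [14, 20, 28, 32, 40], matching the 14%-40% drops on the slide
```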

[Charts: throughput (operations per second) and latency by read/write ratio, with and without the 3 secondary indexes]

Mongo: Sharding test results

• 90/10 read/write ratio
• This was not the same configuration as previous results; we are interested in the gain, not the raw value

[Diagram: two application servers running YCSB, each connecting through a mongos router to two mongod shards]

Shards   Ops/second   Gain %
1        12K          ----
2        22.6K        88.3%

Rolling up Results to Match our Requirements

Criteria                     Weight   Mongo   Cassandra
Scalability                  3
Development throughput       1
Flexible Data Model          3
Developer Impact             2
Database Services Impact     2


Yeah, so what does your data look like?


Event Driven Work Broker

[Diagram: a Work Broker dispatching events to a pool of workers (Worker A, Worker B, Worker C)]

1. Broker reads events from Mongo
2. Broker assigns each event (e.g. Event A) to exactly one worker
3. Work Log and Event B generated by Worker A
4. Repeat…
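The assignment approaches on the following slides all revolve around claiming unassigned event documents like these. A minimal sketch of what such an event document might look like; the field names (type, assigned, batchId, token) are illustrative assumptions, not GHX's actual schema:

```python
# Hypothetical shape of an event document in the work-broker collection.
# Field names are assumptions for the sketches that follow, not the real GHX schema.
example_event = {
    "type": "PurchaseOrder",   # event type a worker knows how to handle
    "assigned": False,         # has a broker handed this event to a worker yet?
    "batchId": None,           # stamped when a broker claims a batch (approaches 2 and 3)
    "token": 417,              # random 1-1000 token used by approach 4
    "payload": {},             # the business document / activity data
}
```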

Distributed Event Broker

• Many readers reading from mongo, trying to hand work to workers as fast as they can take it

Assignment Approach 1: findAndModify

• Steps
  – findAndModify:
    • Find an event of a certain type that is not assigned
    • Mark it as assigned
    • Do this BATCH_SIZE times
• Result
  – Really high WriteLock% (90+)
  – Upset stakeholder
  – Note: Mongo did not recommend this approach
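A minimal pymongo sketch of Approach 1, reusing the hypothetical event fields above (database and collection names are assumptions). Each iteration issues a findAndModify (find_one_and_update), claiming one document at a time under the write lock:

```python
from pymongo import MongoClient, ReturnDocument

BATCH_SIZE = 100
events = MongoClient()["broker"]["events"]  # assumed database/collection names

def claim_batch_find_and_modify(event_type):
    """Approach 1: one findAndModify per event, up to BATCH_SIZE events."""
    claimed = []
    for _ in range(BATCH_SIZE):
        doc = events.find_one_and_update(
            {"type": event_type, "assigned": False},   # find an unassigned event
            {"$set": {"assigned": True}},              # mark it as assigned
            return_document=ReturnDocument.AFTER,
        )
        if doc is None:   # nothing left to claim
            break
        claimed.append(doc)
    return claimed
```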

Assignment Approach 2: N Updates and a Find

• Steps
  – Update an event of a certain type that is not assigned
    • Mark it as assigned with a batch ID
    • Do this BATCH_SIZE times
  – Find all events with that batch ID
• Result
  – Lots of loss: # modified > # found
  – Upset stakeholder
  – Only tested this, never used in prod
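A sketch of Approach 2 under the same assumptions (hypothetical field, database, and collection names). The loss the slide reports shows up as the count of successful updates exceeding the count of documents the final find returns:

```python
import uuid
from pymongo import MongoClient

BATCH_SIZE = 100
events = MongoClient()["broker"]["events"]  # assumed database/collection names

def claim_batch_update_then_find(event_type):
    """Approach 2: N single-document updates stamping a batch id, then one find."""
    batch_id = str(uuid.uuid4())
    modified = 0
    for _ in range(BATCH_SIZE):
        result = events.update_one(
            {"type": event_type, "assigned": False},
            {"$set": {"assigned": True, "batchId": batch_id}},
        )
        if result.modified_count == 0:   # nothing left to tag
            break
        modified += result.modified_count
    claimed = list(events.find({"batchId": batch_id}))
    # Slide: len(claimed) came back smaller than `modified` under load.
    return claimed
```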

Assignment Approach 3: Find-Modify-Find

• Steps
  – Find event _IDs where the event is unassigned and of a certain type, limit BATCH_SIZE
  – Modify the set of events found by writing a batch ID to assign them
  – Find the modified events by batch ID
• Result
  – WriteLock% went down
  – Lots of loss: # found > # modified > # found again
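A sketch of Approach 3 with the same assumed names. The single multi-document update lowers WriteLock%, but other brokers can claim some of the candidate _ids between the first find and the update, which is presumably where the loss comes from:

```python
import uuid
from pymongo import MongoClient

BATCH_SIZE = 100
events = MongoClient()["broker"]["events"]  # assumed database/collection names

def claim_batch_find_modify_find(event_type):
    """Approach 3: find candidate _ids, claim them with one multi-update, re-find."""
    batch_id = str(uuid.uuid4())
    # Find: candidate _ids of unassigned events of this type.
    ids = [d["_id"] for d in events.find(
        {"type": event_type, "assigned": False}, {"_id": 1}).limit(BATCH_SIZE)]
    # Modify: stamp the batch id on whatever is still unassigned.
    result = events.update_many(
        {"_id": {"$in": ids}, "assigned": False},
        {"$set": {"assigned": True, "batchId": batch_id}},
    )
    # Find: fetch the events this broker actually won.
    claimed = list(events.find({"batchId": batch_id}))
    # Slide: len(ids) >= result.modified_count >= len(claimed) under contention.
    return claimed
```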

Assignment Approach 4: Don’t do it

• Steps
  – Assign each worker a range of events on startup
  – Mark every event with a random number token (1-1000)
  – A worker updates an event whose token is in its range
• Results (moving to this now)
  – No conflict or loss expected
  – WorkerManager rebalances ranges and monitors the health of workers
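A sketch of Approach 4 under the same assumptions; the helper names (insert_event, claim_next) and the WorkerManager-assigned range bounds are hypothetical. Because token ranges do not overlap, workers never compete for the same document:

```python
import random
from pymongo import MongoClient, ReturnDocument

events = MongoClient()["broker"]["events"]  # assumed database/collection names

def insert_event(event_type, payload):
    """Writers stamp every new event with a random token in 1-1000."""
    events.insert_one({
        "type": event_type,
        "assigned": False,
        "token": random.randint(1, 1000),
        "payload": payload,
    })

def claim_next(event_type, token_low, token_high):
    """A worker only claims events whose token falls inside the range the
    WorkerManager assigned to it, so workers don't contend with each other."""
    return events.find_one_and_update(
        {"type": event_type, "assigned": False,
         "token": {"$gte": token_low, "$lte": token_high}},
        {"$set": {"assigned": True}},
        return_document=ReturnDocument.AFTER,
    )
```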

Throughput Challenges

• 25MM customer documents * 20 activities per doc * 4K per activity = 2TB per day throughput

• 90,000 operations per second at peak load

Scaling

• 7 to 10 shards (21-30 mongod)
  – 9,000 to 12,700 db operations per shard per second
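The sizing on the last two slides is straightforward arithmetic; a quick check (treating 4K as 4,000 bytes, an assumption about the slide's rounding):

```python
# Daily throughput: 25MM customer documents * 20 activities per doc * ~4K per activity.
daily_bytes = 25_000_000 * 20 * 4_000
print(daily_bytes / 1e12)        # 2.0 -> roughly the 2TB per day quoted

# Peak 90,000 operations per second spread across 7 to 10 shards:
print(90_000 / 10, 90_000 / 7)   # 9000.0 and ~12857, in line with 9,000-12,700 per shard
```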

• Execution: < 1 day
• Audit: 30 days
• Archive: quite a while
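The deck does not say how these retention windows are enforced. Purely as an illustration of one common MongoDB option (an assumption, not GHX's stated approach), TTL indexes can expire documents a fixed time after a timestamp field; collection and field names here are hypothetical:

```python
from pymongo import MongoClient

db = MongoClient()["broker"]  # assumed database name

# TTL indexes delete documents once `createdAt` is older than expireAfterSeconds.
db["execution_events"].create_index("createdAt", expireAfterSeconds=24 * 3600)    # < 1 day
db["audit_events"].create_index("createdAt", expireAfterSeconds=30 * 24 * 3600)   # 30 days
# Archive tier: no TTL; data is kept "quite a while" (e.g. moved to cheaper storage).
```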


Yeah, good. How do you deploy this thing?



So there, how does ops feel about Mongo after being in prod for 9 months?



Thank You
