![Page 1: MongoDB San Francisco 2013:Geo Searches for Healthcare Pricing Data presented by Robert Stewart, Castlight Health](https://reader035.vdocuments.us/reader035/viewer/2022062615/548c95c0b479593d1f8b4998/html5/thumbnails/1.jpg)
CONFIDENTIAL
Geo Searches for Health Care Pricing Data
Robert Stewart
Senior Architect, Castlight Health
@wombatnation
![Page 2: MongoDB San Francisco 2013:Geo Searches for Healthcare Pricing Data presented by Robert Stewart, Castlight Health](https://reader035.vdocuments.us/reader035/viewer/2022062615/548c95c0b479593d1f8b4998/html5/thumbnails/2.jpg)
Castlight Health
The Business and Technical Problems
Initial Solution
MongoDB, Geo Haystack Index and SSDs
Replica Set Flipping
![Page 3: MongoDB San Francisco 2013:Geo Searches for Healthcare Pricing Data presented by Robert Stewart, Castlight Health](https://reader035.vdocuments.us/reader035/viewer/2022062615/548c95c0b479593d1f8b4998/html5/thumbnails/3.jpg)
Castlight Health
Hosted web and mobile applications providing unbiased information on health care cost and quality
Customers are employers and health plans
Founded in 2008, raised $181 million in VC funding
#1 on Wall Street Journal’s list of “Top 50 Venture-Backed Companies” for 2011
Hiring!
![Page 4: MongoDB San Francisco 2013:Geo Searches for Healthcare Pricing Data presented by Robert Stewart, Castlight Health](https://reader035.vdocuments.us/reader035/viewer/2022062615/548c95c0b479593d1f8b4998/html5/thumbnails/4.jpg)
Home Page
![Page 5: MongoDB San Francisco 2013:Geo Searches for Healthcare Pricing Data presented by Robert Stewart, Castlight Health](https://reader035.vdocuments.us/reader035/viewer/2022062615/548c95c0b479593d1f8b4998/html5/thumbnails/5.jpg)
Search Results
![Page 6: MongoDB San Francisco 2013:Geo Searches for Healthcare Pricing Data presented by Robert Stewart, Castlight Health](https://reader035.vdocuments.us/reader035/viewer/2022062615/548c95c0b479593d1f8b4998/html5/thumbnails/6.jpg)
Business Problem
Support searches for
Prices for a procedure performed by any in-network provider in a geographical area
Prices for all procedures performed by a single provider
Sub-second response, even if returning data on thousands of prices
![Page 7: MongoDB San Francisco 2013:Geo Searches for Healthcare Pricing Data presented by Robert Stewart, Castlight Health](https://reader035.vdocuments.us/reader035/viewer/2022062615/548c95c0b479593d1f8b4998/html5/thumbnails/7.jpg)
Technical Problems
Need a very fast geo index
Rate count doubled in last 3 months to 600 million
Major rate updates monthly
Difficult to index data to ensure sequential reads
Sometimes lots of random reads
![Page 8: MongoDB San Francisco 2013:Geo Searches for Healthcare Pricing Data presented by Robert Stewart, Castlight Health](https://reader035.vdocuments.us/reader035/viewer/2022062615/548c95c0b479593d1f8b4998/html5/thumbnails/8.jpg)
Pricing Retrieval Architecture
![Page 9: MongoDB San Francisco 2013:Geo Searches for Healthcare Pricing Data presented by Robert Stewart, Castlight Health](https://reader035.vdocuments.us/reader035/viewer/2022062615/548c95c0b479593d1f8b4998/html5/thumbnails/9.jpg)
Initial Solution
Store pricing data in MySQL
When Pricing Service starts, create two in-memory indexes and cache most of the rates
55 GB JVM Heap with lots of GC tuning
20-minute service startup time to build indexes
3 hours for background caching of most rates
Trouble Brewing:
Total rates growing quickly
Rolling restart becoming unacceptably slow
If rates not in Java or MySQL cache, retrieval was very slow
![Page 10: MongoDB San Francisco 2013:Geo Searches for Healthcare Pricing Data presented by Robert Stewart, Castlight Health](https://reader035.vdocuments.us/reader035/viewer/2022062615/548c95c0b479593d1f8b4998/html5/thumbnails/10.jpg)
Enter the Mongo
![Page 11: MongoDB San Francisco 2013:Geo Searches for Healthcare Pricing Data presented by Robert Stewart, Castlight Health](https://reader035.vdocuments.us/reader035/viewer/2022062615/548c95c0b479593d1f8b4998/html5/thumbnails/11.jpg)
Geo Indexes
Tried standard geo 2D indexes in MongoDB
Too slow for my use case
Geo Haystack index
Conceptually similar
From docs.mongodb.org: “A haystack index is a special index that is optimized to return results over small areas. Haystack indexes improve performance on queries that use flat geometry.”
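The bucketing idea behind a haystack index can be illustrated with a small in-memory sketch. This is not MongoDB's implementation: the class `HaystackSketch`, its methods, and the sample points are hypothetical, and distances are plain flat (Euclidean) distances in degrees.

```javascript
// Illustrative only: a toy grid index, NOT MongoDB's geoHaystack implementation.
// Points are grouped into square buckets of `bucketSize` degrees, and each
// bucket is further keyed by one extra field value (here called `pm`).
class HaystackSketch {
  constructor(bucketSize) {
    this.bucketSize = bucketSize;
    this.buckets = new Map(); // "bx:by:pm" -> array of documents
  }

  key(bx, by, pm) {
    return `${bx}:${by}:${pm}`;
  }

  cell(coord) {
    return Math.floor(coord / this.bucketSize);
  }

  insert(doc) {
    const [x, y] = doc.loc;
    const k = this.key(this.cell(x), this.cell(y), doc.pm);
    if (!this.buckets.has(k)) this.buckets.set(k, []);
    this.buckets.get(k).push(doc);
  }

  // Visit only the buckets that can overlap the search circle, then filter
  // by flat (Euclidean) distance in degrees.
  search(near, maxDistance, pm) {
    const [x, y] = near;
    const results = [];
    for (let bx = this.cell(x - maxDistance); bx <= this.cell(x + maxDistance); bx++) {
      for (let by = this.cell(y - maxDistance); by <= this.cell(y + maxDistance); by++) {
        for (const doc of this.buckets.get(this.key(bx, by, pm)) || []) {
          const d = Math.hypot(doc.loc[0] - x, doc.loc[1] - y);
          if (d <= maxDistance) results.push(doc);
        }
      }
    }
    return results;
  }
}

// Hypothetical sample data
const index = new HaystackSketch(0.5);
index.insert({ id: "a", loc: [-122.41, 37.78], pm: 6757 }); // near the query point
index.insert({ id: "b", loc: [-122.41, 37.78], pm: 1234 }); // same place, different pm
index.insert({ id: "c", loc: [-121.0, 37.79], pm: 6757 }); // outside the search radius
const found = index.search([-122.4, 37.79], 0.5, 6757).map((d) => d.id);
console.log(found);
```

Only a handful of buckets are visited for a small radius, and the extra field is part of the bucket key, which is why this layout suits small-area searches with a secondary filter.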
![Page 12: MongoDB San Francisco 2013:Geo Searches for Healthcare Pricing Data presented by Robert Stewart, Castlight Health](https://reader035.vdocuments.us/reader035/viewer/2022062615/548c95c0b479593d1f8b4998/html5/thumbnails/12.jpg)
Mercator Projection with 10 degree grid
![Page 13: MongoDB San Francisco 2013:Geo Searches for Healthcare Pricing Data presented by Robert Stewart, Castlight Health](https://reader035.vdocuments.us/reader035/viewer/2022062615/548c95c0b479593d1f8b4998/html5/thumbnails/13.jpg)
Geo Haystack
We chose degrees long-lat for x-y coordinate system
25 miles is our default search radius
Roughly 0.5 degrees in middle of the US

    // Haystack index: location plus one extra field (pm)
    db.priceables_1.ensureIndex(
      { loc: "geoHaystack", pm: 1 },
      { bucketSize: 0.5 })

    // Find up to 50,000 documents with pm = 6757 within 0.5 degrees of [-122.4, 37.79]
    db.runCommand(
      { geoSearch: "priceables_1",
        near: [-122.4, 37.79],
        maxDistance: 0.5,
        search: { pm: 6757 },
        limit: 50000 })

maxDistance calculated using great circle algorithm
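One plausible reading of that last point is that the degree value for `maxDistance` was derived from a great-circle distance. The sketch below shows the arithmetic under stated assumptions: the function names are hypothetical, the mean Earth radius of 3958.8 miles is an assumed constant, and 39°N stands in for "middle of the US".

```javascript
const EARTH_RADIUS_MILES = 3958.8; // mean Earth radius (assumed constant)
const toRadians = (deg) => (deg * Math.PI) / 180;

// Great-circle (haversine) distance in miles between two [longitude, latitude] points.
function haversineMiles([lon1, lat1], [lon2, lat2]) {
  const dLat = toRadians(lat2 - lat1);
  const dLon = toRadians(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(lat1)) * Math.cos(toRadians(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_MILES * Math.asin(Math.sqrt(a));
}

// Degrees of latitude spanned by `miles` along a meridian.
function milesToLatitudeDegrees(miles) {
  return miles / (toRadians(1) * EARTH_RADIUS_MILES);
}

// Degrees of longitude spanned by `miles` along the parallel at `latitude`.
function milesToLongitudeDegrees(miles, latitude) {
  return milesToLatitudeDegrees(miles) / Math.cos(toRadians(latitude));
}

const latDegrees = milesToLatitudeDegrees(25);
const lonDegrees = milesToLongitudeDegrees(25, 39); // 39°N is an assumption
// Sanity check: two points that far apart in longitude are about 25 miles apart.
const checkMiles = haversineMiles([-98, 39], [-98 + lonDegrees, 39]);
console.log({ latDegrees, lonDegrees, checkMiles });
```

Under these assumptions, 25 miles works out to about 0.36 degrees of latitude and about 0.47 degrees of longitude, which is consistent with the slide's "roughly 0.5 degrees".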
![Page 14: MongoDB San Francisco 2013:Geo Searches for Healthcare Pricing Data presented by Robert Stewart, Castlight Health](https://reader035.vdocuments.us/reader035/viewer/2022062615/548c95c0b479593d1f8b4998/html5/thumbnails/14.jpg)
Geo Haystack Pros
Very fast when retrieving many documents in a relatively small search radius
Great when you also need to apply a secondary filter
Compound 2dsphere index in Mongo 2.4 has even better support
![Page 15: MongoDB San Francisco 2013:Geo Searches for Healthcare Pricing Data presented by Robert Stewart, Castlight Health](https://reader035.vdocuments.us/reader035/viewer/2022062615/548c95c0b479593d1f8b4998/html5/thumbnails/15.jpg)
Geo Haystack Cons
Supports only one extra filter field in the index (SERVER-2979)
Bug: a query on only the second part of the key, with no other index to use, fails with an assertion (SERVER-8645)
> db.priceables_1.find({pm: 6757})
error: { "$err" : "assertion src/mongo/db/geo/haystack.cpp:178" }
Second part of index can’t have an array value
Location part of key can’t be null
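The last two limitations can be guarded against before documents are written. The helper below is a hypothetical sketch, not part of MongoDB or of the talk; only the field names `loc` and `pm` come from the index definition shown earlier.

```javascript
// Hypothetical pre-insert check for the haystack limitations listed above.
// Field names `loc` and `pm` follow the index definition in the slides.
function haystackProblems(doc) {
  const problems = [];
  const loc = doc.loc;
  const validLoc =
    Array.isArray(loc) &&
    loc.length === 2 &&
    loc.every((n) => typeof n === "number" && Number.isFinite(n));
  if (!validLoc) {
    problems.push("loc must be a [longitude, latitude] pair and cannot be null");
  }
  if (Array.isArray(doc.pm)) {
    problems.push("pm cannot be an array");
  }
  return problems;
}

console.log(haystackProblems({ loc: [-122.4, 37.79], pm: 6757 }));
console.log(haystackProblems({ loc: null, pm: [6757, 6758] }));
```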
![Page 16: MongoDB San Francisco 2013:Geo Searches for Healthcare Pricing Data presented by Robert Stewart, Castlight Health](https://reader035.vdocuments.us/reader035/viewer/2022062615/548c95c0b479593d1f8b4998/html5/thumbnails/16.jpg)
SSDs
For uncached data on HDD, Geo Haystack was twice as fast as custom Java geo index and MySQL
Still close to 1 minute for big queries with full data set
Death by random read
Tested with a $200 Samsung SSD
Typical query dropped to 20 ms
Big query only about 150 ms
![Page 17: MongoDB San Francisco 2013:Geo Searches for Healthcare Pricing Data presented by Robert Stewart, Castlight Health](https://reader035.vdocuments.us/reader035/viewer/2022062615/548c95c0b479593d1f8b4998/html5/thumbnails/17.jpg)
Mongoperf on SSDs

Random 4k block reads, 5 GB file, 16 threads

| Env | SSD | Read Ops/s | Read MB/s |
|---|---|---|---|
| Prod | Samsung 200GB SLC | 74k | 288 |
| QA VM | Samsung 200GB SLC | 30k | 117 |
| Dev | Samsung 830 256GB SATA MLC | 47k | 183 |

Sequential write of the 5 GB file

| Env | SSD | Write Ops/s | Write MB/s |
|---|---|---|---|
| Prod | Samsung 200GB SLC | 1074 | 289 |
| QA VM | Samsung 200GB SLC | 405 | 196 |
| Dev | Samsung 830 256GB SATA MLC | 438 | 210 |

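The read figures are internally consistent, which a quick calculation can confirm. This sketch assumes that "4k" means 4096-byte blocks, that "k" in the ops column means thousands, and that MB/s is reported in units of 2^20 bytes; none of these is stated on the slide.

```javascript
// Convert random-read ops/s at a 4 KiB block size into MiB/s.
const BLOCK_BYTES = 4 * 1024;
const opsToMiBPerSec = (opsPerSec) => (opsPerSec * BLOCK_BYTES) / (1024 * 1024);

// Read ops/s from the table above, with "k" read as thousands (assumption).
const readOps = { Prod: 74000, "QA VM": 30000, Dev: 47000 };

const readMiBPerSec = Object.fromEntries(
  Object.entries(readOps).map(([env, ops]) => [env, opsToMiBPerSec(ops)])
);
console.log(readMiBPerSec);
```

The results land within about 1 MB/s of the reported 288, 117, and 183 MB/s, with the small gaps explained by the ops figures being rounded to the nearest thousand.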
![Page 18: MongoDB San Francisco 2013:Geo Searches for Healthcare Pricing Data presented by Robert Stewart, Castlight Health](https://reader035.vdocuments.us/reader035/viewer/2022062615/548c95c0b479593d1f8b4998/html5/thumbnails/18.jpg)
Low Impact Pricing Updates
Requirements:
Major price updates monthly
Minor updates more frequently
Huge bulk loads with no impact on active replica set
I/O bound, not CPU bound
![Page 19: MongoDB San Francisco 2013:Geo Searches for Healthcare Pricing Data presented by Robert Stewart, Castlight Health](https://reader035.vdocuments.us/reader035/viewer/2022062615/548c95c0b479593d1f8b4998/html5/thumbnails/19.jpg)
Replica Set Flipping Solution
Two replica sets
Lowered cost with two SSDs on each pricing server
scp compressed files from QA to passive replica set
Protip: use pigz (parallel gzip) to compress and uncompress

    tar cvf - pricing | pigz > ~/pricing.tgz
    pigz -dc pricing.tgz | tar xvf -

Page in index and data

    db.runCommand({ touch: "priceables_1", index: true, data: true })

Pricing Service operation to atomically flip
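The flip step on this slide can be sketched as a single reference swap inside the service. The talk does not show the Pricing Service code, so the class `ReplicaSetFlipper` and its members are hypothetical; the replica set names come from the architecture slide.

```javascript
// Hypothetical sketch of the flip: the service keeps one pointer to the
// active replica set and swaps it in a single assignment.
class ReplicaSetFlipper {
  constructor(setA, setB) {
    this.sets = [setA, setB];
    this.activeIndex = 0;
  }

  get active() {
    return this.sets[this.activeIndex];
  }

  get passive() {
    return this.sets[1 - this.activeIndex];
  }

  // Readers look up `active` on every request, so the flip applies to the
  // next request without restarting the service.
  flip() {
    this.activeIndex = 1 - this.activeIndex;
    return this.active;
  }
}

const flipper = new ReplicaSetFlipper({ name: "prodpricing1" }, { name: "prodpricing2" });
const loadTarget = flipper.passive.name; // bulk load and touch happen here first
const nowActive = flipper.flip().name; // then traffic moves over
console.log({ loadTarget, nowActive });
```

In a multi-threaded service the same idea would need an atomic reference rather than a plain field; the single assignment here is only safe because JavaScript request handlers run on one thread.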
![Page 20: MongoDB San Francisco 2013:Geo Searches for Healthcare Pricing Data presented by Robert Stewart, Castlight Health](https://reader035.vdocuments.us/reader035/viewer/2022062615/548c95c0b479593d1f8b4998/html5/thumbnails/20.jpg)
Replica Set Architecture

Replica sets: prodpricing1, prodpricing2

| Physical server | mongod on port 28001 | mongod on port 28002 |
|---|---|---|
| pricing1 | primary | secondary |
| pricing2 | secondary | primary |
| db1 | arbiter | |
| db2 | | arbiter |

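A layout like this could be expressed as two replica set configuration documents of the shape that `rs.initiate()` accepts in the mongo shell. The sketch makes one assumption that the slide does not confirm: that prodpricing1 is the set on port 28001 and prodpricing2 the set on port 28002. The helper name `replicaSetConfig` is hypothetical.

```javascript
// Builds a replica set configuration document: the data-bearing members
// first, then one arbiter. The shape matches what rs.initiate() accepts.
function replicaSetConfig(name, port, dataHosts, arbiterHost) {
  const members = dataHosts.map((host, i) => ({ _id: i, host: `${host}:${port}` }));
  members.push({ _id: members.length, host: `${arbiterHost}:${port}`, arbiterOnly: true });
  return { _id: name, members };
}

// Assumption: prodpricing1 uses port 28001 and prodpricing2 uses port 28002.
const configs = [
  replicaSetConfig("prodpricing1", 28001, ["pricing1", "pricing2"], "db1"),
  replicaSetConfig("prodpricing2", 28002, ["pricing2", "pricing1"], "db2"),
];
console.log(JSON.stringify(configs, null, 2));
```

The configuration alone does not decide which member is primary; that is settled by election, so the primary and secondary roles in the diagram describe a running state rather than something this document sets.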
![Page 21: MongoDB San Francisco 2013:Geo Searches for Healthcare Pricing Data presented by Robert Stewart, Castlight Health](https://reader035.vdocuments.us/reader035/viewer/2022062615/548c95c0b479593d1f8b4998/html5/thumbnails/21.jpg)
Replica Set Flipping Drawbacks
Obviously, increased cost, but only for SSDs
Recently added caching of remote pricing lookups
TTL collections
Cache is lost during a flip
But, usually flip late at night
Cache eviction time is only a few hours
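The TTL-based cache mentioned on this slide relies on an index with `expireAfterSeconds` on a date field. The sketch below is an assumption-laden illustration: the collection name `pricing_cache`, the field name `createdAt`, and the three-hour expiry (standing in for "a few hours") are all hypothetical.

```javascript
// Hypothetical TTL index definition. In the mongo shell it would be passed as
// db.pricing_cache.ensureIndex(ttlIndex.keys, ttlIndex.options)
const ttlIndex = {
  keys: { createdAt: 1 },
  options: { expireAfterSeconds: 3 * 60 * 60 }, // "a few hours" assumed to be 3
};

// Mirrors the TTL rule: a document becomes eligible for removal once
// createdAt + expireAfterSeconds is in the past.
function isExpired(doc, now = new Date()) {
  const expiresAtMs = doc.createdAt.getTime() + ttlIndex.options.expireAfterSeconds * 1000;
  return expiresAtMs <= now.getTime();
}

const now = new Date();
const oneHourOld = { createdAt: new Date(now.getTime() - 1 * 60 * 60 * 1000) };
const fourHoursOld = { createdAt: new Date(now.getTime() - 4 * 60 * 60 * 1000) };
console.log(isExpired(oneHourOld, now), isExpired(fourHoursOld, now));
```

With an expiry this short, a cache emptied by a flip refills within the same window it would have turned over anyway, which fits the slide's point that losing it is a minor cost.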
![Page 22: MongoDB San Francisco 2013:Geo Searches for Healthcare Pricing Data presented by Robert Stewart, Castlight Health](https://reader035.vdocuments.us/reader035/viewer/2022062615/548c95c0b479593d1f8b4998/html5/thumbnails/22.jpg)
Overall Results
Geo search speed with cold cache acceptable
Geo search speed with warm cache awesome
Pricing Service startup down to a few seconds
No production impact for major rate updates
Lowered risk for minor rate updates
![Page 23: MongoDB San Francisco 2013:Geo Searches for Healthcare Pricing Data presented by Robert Stewart, Castlight Health](https://reader035.vdocuments.us/reader035/viewer/2022062615/548c95c0b479593d1f8b4998/html5/thumbnails/23.jpg)
Summary
Geo Haystack Index great for …
Retrieving lots of documents in a constrained search area
Geo searches with a secondary filter
SSDs great for …
Random reads
Reducing need for lots of complex indexes
Replica set flipping great for …
Instant swap of large amounts of data
Primarily, if not solely, read only
Trading cost for operational flexibility
![Page 24: MongoDB San Francisco 2013:Geo Searches for Healthcare Pricing Data presented by Robert Stewart, Castlight Health](https://reader035.vdocuments.us/reader035/viewer/2022062615/548c95c0b479593d1f8b4998/html5/thumbnails/24.jpg)
Q & A