![Page 1: Geo Searches for Health Care Pricing Data with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051323/548c95c1b4795927358b4c76/html5/thumbnails/1.jpg)
CONFIDENTIALCONFIDENTIALCONFIDENTIALCONFIDENTIAL
Geo Searches for Health Care Pricing Data with MongoDB
NoSQL Now 2013
Robert Stewart
Senior Architect, Castlight Health
@wombatnation
![Page 2: Geo Searches for Health Care Pricing Data with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051323/548c95c1b4795927358b4c76/html5/thumbnails/2.jpg)
Castlight Health
The Business and Technical Problems
Initial Solution
MongoDB, Geospatial Indexes and SSDs
Replica Set Flipping
![Page 3: Geo Searches for Health Care Pricing Data with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051323/548c95c1b4795927358b4c76/html5/thumbnails/3.jpg)
Castlight Health
Hosted web and mobile applications providing unbiased information on health care cost and quality
Customers are employers and health plans
Founded in San Francisco in 2008
$181 million in VC funding
#1 on Wall Street Journal’s list of “Top 50 Venture-Backed Companies” for 2011
Hiring!
![Page 4: Geo Searches for Health Care Pricing Data with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051323/548c95c1b4795927358b4c76/html5/thumbnails/4.jpg)
Home Page
![Page 5: Geo Searches for Health Care Pricing Data with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051323/548c95c1b4795927358b4c76/html5/thumbnails/5.jpg)
Search Results
![Page 6: Geo Searches for Health Care Pricing Data with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051323/548c95c1b4795927358b4c76/html5/thumbnails/6.jpg)
Business Problem
Support searches for
Prices for a procedure performed by any in-network provider in a geographical area
Prices for all procedures performed by a single provider
Sub-second response, even if returning data on thousands of prices
![Page 7: Geo Searches for Health Care Pricing Data with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051323/548c95c1b4795927358b4c76/html5/thumbnails/7.jpg)
Technical Problems
Need a very fast geospatial index
Rate count at 1 billion and rising
Major rate updates monthly
Difficult to index data to ensure sequential reads
Sometimes lots of random reads
[Chart with a time axis from Apr 2011 to Aug 2013]
![Page 8: Geo Searches for Health Care Pricing Data with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051323/548c95c1b4795927358b4c76/html5/thumbnails/8.jpg)
Pricing Retrieval Architecture
![Page 9: Geo Searches for Health Care Pricing Data with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051323/548c95c1b4795927358b4c76/html5/thumbnails/9.jpg)
Initial Solution
Store pricing data in MySQL
When Pricing Service starts, create two in-memory indexes and cache most of the rates
55 GB JVM Heap with lots of GC tuning
20-minute service startup time to build indexes
3 hours for background caching of most rates
Trouble Brewing:
Total rates growing quickly
Rolling restart becoming unacceptably slow
If rates not in Java or MySQL cache, retrieval was very slow
![Page 10: Geo Searches for Health Care Pricing Data with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051323/548c95c1b4795927358b4c76/html5/thumbnails/10.jpg)
Enter the Mongo
![Page 11: Geo Searches for Health Care Pricing Data with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051323/548c95c1b4795927358b4c76/html5/thumbnails/11.jpg)
Geospatial Indexes We Evaluated
Standard 2D index in MongoDB 2.2 too slow for my use case
Geo Haystack index
From docs.mongodb.org:
“A haystack index is a special index that is optimized to return results over small areas. Haystack indexes improve performance on queries that use flat geometry.”
2DSphere index in MongoDB 2.4
![Page 12: Geo Searches for Health Care Pricing Data with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051323/548c95c1b4795927358b4c76/html5/thumbnails/12.jpg)
Mercator Projection with 10 degree grid
![Page 13: Geo Searches for Health Care Pricing Data with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051323/548c95c1b4795927358b4c76/html5/thumbnails/13.jpg)
Geo Haystack
We chose degrees long-lat for x-y coordinate system
25 miles is our default search radius
Roughly 0.5 degrees in middle of the US
db.priceables_1.ensureIndex(
  { loc: "geoHaystack", pm: 1 },
  { bucketSize: 0.5 })
db.runCommand(
  { geoSearch: "priceables_1",
    near: [-122.4, 37.79],
    maxDistance: 0.5,
    search: { pm: 6757 },
    limit: 50000 })
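The "25 miles is roughly 0.5 degrees" rule of thumb can be checked with a little arithmetic. The following is a minimal sketch, not the speaker's code: the Earth radius constant, the 39°N sample latitude, and the helper names `milesToDegreesLat`, `milesToDegreesLon`, and `buildGeoSearch` are all assumptions made for illustration. It treats the Earth as a sphere, which is the same flat-grid approximation a haystack index relies on over small areas.

```javascript
// Mean Earth radius in miles (an assumed constant, not taken from the slides).
const EARTH_RADIUS_MILES = 3959;

// Miles covered by one degree of latitude on a sphere of that radius.
const MILES_PER_DEGREE_LAT = (2 * Math.PI * EARTH_RADIUS_MILES) / 360;

// Convert a distance in miles to degrees of latitude.
function milesToDegreesLat(miles) {
  return miles / MILES_PER_DEGREE_LAT;
}

// Convert a distance in miles to degrees of longitude at a given latitude.
// Meridians converge toward the poles, hence the cosine factor.
function milesToDegreesLon(miles, latitudeDeg) {
  const latRad = (latitudeDeg * Math.PI) / 180;
  return miles / (MILES_PER_DEGREE_LAT * Math.cos(latRad));
}

// Build a geoSearch command document shaped like the one on the slide.
function buildGeoSearch(collection, lon, lat, maxDegrees, pm, limit) {
  return {
    geoSearch: collection,
    near: [lon, lat],
    maxDistance: maxDegrees,
    search: { pm: pm },
    limit: limit,
  };
}

// 39 degrees north is a hypothetical "middle of the US" latitude.
const latDeg = milesToDegreesLat(25);
const lonDeg = milesToDegreesLon(25, 39);
console.log(latDeg.toFixed(2), lonDeg.toFixed(2));
console.log(JSON.stringify(buildGeoSearch("priceables_1", -122.4, 37.79, 0.5, 6757, 50000)));
```

Under these assumptions, 25 miles comes out to a bit over a third of a degree of latitude and a bit under half a degree of longitude, so a single 0.5 value for both `bucketSize` and `maxDistance` errs on the generous side.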
![Page 14: Geo Searches for Health Care Pricing Data with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051323/548c95c1b4795927358b4c76/html5/thumbnails/14.jpg)
Geo Haystack Cons
Only one secondary filter
Second part of index can’t have an array value
Error on unindexed query on only the second part of the key
![Page 15: Geo Searches for Health Care Pricing Data with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051323/548c95c1b4795927358b4c76/html5/thumbnails/15.jpg)
2DSphere Index
Supports earth-like spherical geometries
Points can be GeoJSON or x,y pairs
GeoJSON LineString and Polygon
Queries for inclusion, intersection and proximity
![Page 16: Geo Searches for Health Care Pricing Data with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051323/548c95c1b4795927358b4c76/html5/thumbnails/16.jpg)
2DSphere Index Creation and Sample Query
db.priceables_1.ensureIndex(
  { loc: "2dsphere", pm: 1, pn: 1 })
db.priceables_1.find(
  { "loc":
      { "$geoWithin":
          { "$centerSphere":
              [ [ -94.2128, 36.3840 ], 0.006314 ] } },
    "pm": 6441,
    "pn": { "$in": [ 5236, 5237 ] } })
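The radius passed to `$centerSphere` is in radians, that is, the distance divided by the radius of the sphere. A minimal sketch of that conversion follows, assuming an Earth radius of 3959 miles and that the 0.006314 in the sample query stands for the 25-mile default search radius mentioned earlier (the slides do not say so explicitly). The helper names `milesToRadians` and `buildCenterSphereQuery` are hypothetical.

```javascript
// Assumed sphere radius in miles; the slides do not state which value was used.
const SPHERE_RADIUS_MILES = 3959;

// $centerSphere expects the radius in radians: distance / sphere radius.
function milesToRadians(miles) {
  return miles / SPHERE_RADIUS_MILES;
}

// Build a filter document shaped like the sample query on the slide.
function buildCenterSphereQuery(lon, lat, radiusMiles, pm, pnValues) {
  return {
    loc: {
      $geoWithin: {
        $centerSphere: [[lon, lat], milesToRadians(radiusMiles)],
      },
    },
    pm: pm,
    pn: { $in: pnValues },
  };
}

const query = buildCenterSphereQuery(-94.2128, 36.384, 25, 6441, [5236, 5237]);
console.log(JSON.stringify(query));
```

With these assumptions, 25 miles works out to a radius within rounding of the 0.006314 used in the sample query. The resulting object could be passed to `db.priceables_1.find(...)` in the mongo shell.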
![Page 17: Geo Searches for Health Care Pricing Data with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051323/548c95c1b4795927358b4c76/html5/thumbnails/17.jpg)
2DSphere Results
Geospatially Accurate
Even Faster than Haystack
![Page 18: Geo Searches for Health Care Pricing Data with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051323/548c95c1b4795927358b4c76/html5/thumbnails/18.jpg)
SSDs
For uncached data on HDD, MongoDB geo index was twice as fast as custom Java geo index with MySQL
Still close to 1 minute for big queries with full data set
Death by random read
Tested with a $200 Samsung SSD
Typical query dropped to 20 millis
Big query only about 150 millis
![Page 19: Geo Searches for Health Care Pricing Data with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051323/548c95c1b4795927358b4c76/html5/thumbnails/19.jpg)
Mongoperf on SSDs
Random 4k block reads, 5 GB file, 16 threads

| Env | SSD | Read Ops/s | Read MB/s |
|---|---|---|---|
| Prod | Samsung 200GB SLC | 74k | 288 |
| QA VM | Samsung 200GB SLC | 30k | 117 |
| Dev | Samsung 830 256GB SATA MLC | 47k | 183 |

Sequential write of the 5 GB file

| Env | SSD | Write Ops/s | Write MB/s |
|---|---|---|---|
| Prod | Samsung 200GB SLC | 1074 | 289 |
| QA VM | Samsung 200GB SLC | 405 | 196 |
| Dev | Samsung 830 256GB SATA MLC | 438 | 210 |
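The read throughput figures follow from the operation counts and the 4k block size. The sketch below recomputes them, assuming "4k" means 4 KiB and that the MB/s column is ops/s × 4 KiB ÷ 1024; the function name `readThroughputMiBps` is hypothetical. The `mongoperfConfig` object is only a guess at the configuration that matches the slide's description (16 threads, 5 GB file, reads), using option names from the mongoperf documentation, not the speaker's actual settings.

```javascript
// A guess at a matching mongoperf configuration (assumed, not from the slides).
// mongoperf reads a document like this from standard input.
const mongoperfConfig = { nThreads: 16, fileSizeMB: 5000, r: true };

// Convert operations per second into MiB/s for a fixed block size in KiB.
function readThroughputMiBps(opsPerSecond, blockSizeKiB) {
  return (opsPerSecond * blockSizeKiB) / 1024;
}

// Read figures from the slide; ops/s are rounded to the nearest thousand there.
const readResults = [
  { env: "Prod", opsPerSecond: 74000, reportedMBps: 288 },
  { env: "QA VM", opsPerSecond: 30000, reportedMBps: 117 },
  { env: "Dev", opsPerSecond: 47000, reportedMBps: 183 },
];

for (const row of readResults) {
  const estimate = readThroughputMiBps(row.opsPerSecond, 4);
  console.log(row.env, "estimated", estimate.toFixed(1), "reported", row.reportedMBps);
}
```

Each estimate lands within about 1 MB/s of the reported value, which is consistent with the ops/s column having been rounded.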
![Page 20: Geo Searches for Health Care Pricing Data with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051323/548c95c1b4795927358b4c76/html5/thumbnails/20.jpg)
Low Impact Pricing Updates
Requirements
Major price updates monthly
Minor updates more frequently
Huge bulk loads with no impact on active replica set
I/O bound, not CPU bound
Solution
Two MongoDB replica sets
Multiple SSDs per server
![Page 21: Geo Searches for Health Care Pricing Data with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051323/548c95c1b4795927358b4c76/html5/thumbnails/21.jpg)
Replica Set Architecture
Replica sets: prodpricing1, prodpricing2
Physical servers:
Server pricing1: mongod 28001 (primary), mongod 28002 (secondary)
Server pricing2: mongod 28001 (secondary), mongod 28002 (primary)
Server db1: mongod 28001 (arbiter)
Server db2: mongod 28002 (arbiter)
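One way to express this layout is as a pair of replica set configuration documents of the kind `rs.initiate()` accepts. This is a sketch under assumptions: it supposes prodpricing1 runs on port 28001 and prodpricing2 on port 28002, that the server names double as host names, and that a higher `priority` is what keeps each primary on its intended server. The slides confirm none of these details, and the helper `buildReplicaSetConfig` is hypothetical.

```javascript
// Build a three-member configuration: two data-bearing members and an arbiter.
function buildReplicaSetConfig(name, port, preferredPrimary, otherDataHost, arbiterHost) {
  return {
    _id: name,
    members: [
      // Higher priority makes this member the preferred primary (an assumption).
      { _id: 0, host: `${preferredPrimary}:${port}`, priority: 2 },
      { _id: 1, host: `${otherDataHost}:${port}`, priority: 1 },
      // The arbiter votes in elections but holds no data.
      { _id: 2, host: `${arbiterHost}:${port}`, arbiterOnly: true },
    ],
  };
}

// Assumed mapping of replica set names to ports and hosts.
const prodpricing1 = buildReplicaSetConfig("prodpricing1", 28001, "pricing1", "pricing2", "db1");
const prodpricing2 = buildReplicaSetConfig("prodpricing2", 28002, "pricing2", "pricing1", "db2");

console.log(JSON.stringify(prodpricing1, null, 2));
console.log(JSON.stringify(prodpricing2, null, 2));
```

Placing the two primaries on different physical servers means each server carries one primary and one secondary, so a bulk load into the passive set does not land on the machine serving the active set's primary.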
![Page 22: Geo Searches for Health Care Pricing Data with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051323/548c95c1b4795927358b4c76/html5/thumbnails/22.jpg)
Replica Set Flipping Solution
Transfer compressed data files to passive replica set
Protip: to compress and uncompress
tar cvf - pricing | pigz > ~/pricing.tgz
pigz -dc pricing.tgz | tar xvf -
Page in index and data
db.runCommand({ touch: "priceables_1", index: true, data: true })
Pricing Service operation to atomically flip
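The slides do not describe how the Pricing Service performs the atomic flip. As a purely illustrative sketch (the real service runs on the JVM, and the class name `ReplicaSetFlipper` is hypothetical), the idea can be modeled as a holder that swaps the active and passive replica set names in a single assignment, so a reader never sees a half-updated pair.

```javascript
// Tracks which of two replica sets serves reads and swaps them in one step.
class ReplicaSetFlipper {
  constructor(activeName, passiveName) {
    this.sets = { active: activeName, passive: passiveName };
  }

  // Replica set that new queries should be routed to.
  get active() {
    return this.sets.active;
  }

  // Replica set that is free to receive the next bulk load.
  get passive() {
    return this.sets.passive;
  }

  // Swap roles by replacing the whole state object in one assignment.
  flip() {
    const { active, passive } = this.sets;
    this.sets = { active: passive, passive: active };
    return this.sets.active;
  }
}

const flipper = new ReplicaSetFlipper("prodpricing1", "prodpricing2");
console.log("serving from", flipper.active);
flipper.flip(); // would be called only after the passive set is loaded and paged in
console.log("serving from", flipper.active);
```

In a multi-threaded service the same pattern would need an atomic reference or equivalent. The point of the sketch is only that the switch is a single pointer swap, made after the data files are transferred and touched.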
![Page 23: Geo Searches for Health Care Pricing Data with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051323/548c95c1b4795927358b4c76/html5/thumbnails/23.jpg)
Replica Set Flipping Drawbacks
Obviously, increased cost, but only for extra SSDs
Recently added caching of remote pricing lookups
TTL collections
Cache is lost during a flip
But, usually flip late at night
Cache eviction time is only a few hours
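A TTL collection is an ordinary collection with an index that carries `expireAfterSeconds`. MongoDB deletes a document once the date in the indexed field is older than that many seconds. The sketch below shows the pieces involved; the three-hour TTL, the `createdAt` field, the document shape, and the helper names are assumptions, since the slides say only that eviction takes "a few hours".

```javascript
// Hypothetical TTL for cached lookups: three hours, in seconds.
const CACHE_TTL_SECONDS = 3 * 60 * 60;

// Key and options for a TTL index, as they would be passed to ensureIndex
// on a hypothetical cache collection.
const ttlIndexKeys = { createdAt: 1 };
const ttlIndexOptions = { expireAfterSeconds: CACHE_TTL_SECONDS };

// Build a cache document; createdAt must be a Date for TTL expiry to apply.
function buildCacheEntry(lookupKey, prices, now = new Date()) {
  return { _id: lookupKey, prices: prices, createdAt: now };
}

// Client-side version of the expiry rule, useful for reasoning and tests.
function isExpired(entry, now = new Date()) {
  const ageSeconds = (now.getTime() - entry.createdAt.getTime()) / 1000;
  return ageSeconds >= CACHE_TTL_SECONDS;
}

const entry = buildCacheEntry("pm:6441", [125.0, 140.5], new Date(0));
console.log(isExpired(entry, new Date(2 * 60 * 60 * 1000))); // two hours later
console.log(isExpired(entry, new Date(4 * 60 * 60 * 1000))); // four hours later
```

Because the server removes expired documents in a periodic background pass, actual deletion can lag the nominal expiry time slightly.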
![Page 24: Geo Searches for Health Care Pricing Data with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051323/548c95c1b4795927358b4c76/html5/thumbnails/24.jpg)
Overall Results
Geo search speed with cold cache acceptable
Geo search speed with warm cache awesome
Pricing Service startup down to a few seconds
No production impact for major rate updates
Lowered risk for minor rate updates
![Page 25: Geo Searches for Health Care Pricing Data with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051323/548c95c1b4795927358b4c76/html5/thumbnails/25.jpg)
Summary
Geo Haystack Index great for …
Retrieving lots of documents in a constrained search area
Very simple geospatial searches with a single secondary filter
2DSphere Index great for …
Complex geospatial searches or complex indexing
SSDs great for …
Random reads
Reducing need for lots of complex indexes
Replica set flipping great for …
Instant swap of large amounts of data
Primarily, if not solely, read only
Trading cost for operational flexibility
![Page 26: Geo Searches for Health Care Pricing Data with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051323/548c95c1b4795927358b4c76/html5/thumbnails/26.jpg)
Q & A