mongodb days uk: jumpstart: schema design
TRANSCRIPT
![Page 2: MongoDB Days UK: Jumpstart: Schema Design](https://reader036.vdocuments.us/reader036/viewer/2022070515/5876ab491a28abd6508b7a5f/html5/thumbnails/2.jpg)
2
Agenda
1. Example Problem
2. MongoDB Schema Design vs. Relational
3. Modeling Relationships
4. Schema Design and Performance
![Page 3: MongoDB Days UK: Jumpstart: Schema Design](https://reader036.vdocuments.us/reader036/viewer/2022070515/5876ab491a28abd6508b7a5f/html5/thumbnails/3.jpg)
Example Problem
![Page 4: MongoDB Days UK: Jumpstart: Schema Design](https://reader036.vdocuments.us/reader036/viewer/2022070515/5876ab491a28abd6508b7a5f/html5/thumbnails/4.jpg)
4
Medical Records• Collects all patient information in a central repository• Provide central point of access for
– Patients– Care providers: physicians, nurses, etc.– Billing– Insurance reconciliation
• Hospitals, physicians, patients, procedures, records
PatientRecords
Medications
Lab Results
Procedures
Hospital Records
Physicians
Patients
Nurses
Billing
![Page 5: MongoDB Days UK: Jumpstart: Schema Design](https://reader036.vdocuments.us/reader036/viewer/2022070515/5876ab491a28abd6508b7a5f/html5/thumbnails/5.jpg)
5
Medical Record Data
• Hospitals – have physicians
• Physicians– Have patients– Perform procedures– Belong to hospitals
• Patients– Have physicians– Are the subject of procedures
• Procedures– Associated with a patient– Associated with a physician– Have a record– Variable meta data
• Records– Associated with a procedure– Binary data– Variable fields
![Page 6: MongoDB Days UK: Jumpstart: Schema Design](https://reader036.vdocuments.us/reader036/viewer/2022070515/5876ab491a28abd6508b7a5f/html5/thumbnails/6.jpg)
6
Lot of Variability
![Page 7: MongoDB Days UK: Jumpstart: Schema Design](https://reader036.vdocuments.us/reader036/viewer/2022070515/5876ab491a28abd6508b7a5f/html5/thumbnails/7.jpg)
Relational View
![Page 8: MongoDB Days UK: Jumpstart: Schema Design](https://reader036.vdocuments.us/reader036/viewer/2022070515/5876ab491a28abd6508b7a5f/html5/thumbnails/8.jpg)
Schema Design:
MongoDB vs. Relational
![Page 9: MongoDB Days UK: Jumpstart: Schema Design](https://reader036.vdocuments.us/reader036/viewer/2022070515/5876ab491a28abd6508b7a5f/html5/thumbnails/9.jpg)
MongoDB Relational
Collections Tables
Documents Rows
Data Use Data Storage
What questions do I have? What answers do I have?
MongoDB versus Relational
![Page 10: MongoDB Days UK: Jumpstart: Schema Design](https://reader036.vdocuments.us/reader036/viewer/2022070515/5876ab491a28abd6508b7a5f/html5/thumbnails/10.jpg)
Complex Normalized Schemas
![Page 11: MongoDB Days UK: Jumpstart: Schema Design](https://reader036.vdocuments.us/reader036/viewer/2022070515/5876ab491a28abd6508b7a5f/html5/thumbnails/11.jpg)
Complex Normalized Schemas
![Page 12: MongoDB Days UK: Jumpstart: Schema Design](https://reader036.vdocuments.us/reader036/viewer/2022070515/5876ab491a28abd6508b7a5f/html5/thumbnails/12.jpg)
13
Documents are Rich Data Structures{ first_name: ‘Paul’, surname: ‘Miller’, cell: ‘+447557505611’ city: ‘London’, location: [45.123,47.232], Profession: [banking, finance, trader], cars: [ { model: ‘Bentley’, year: 1973, value: 100000, … }, { model: ‘Rolls Royce’, year: 1965, value: 330000, … } ]}
Fields can contain an array of sub-documents
Fields
Typed field values
Fields can contain arrays
String
Number
Geo-Coordinates
![Page 13: MongoDB Days UK: Jumpstart: Schema Design](https://reader036.vdocuments.us/reader036/viewer/2022070515/5876ab491a28abd6508b7a5f/html5/thumbnails/13.jpg)
Relationships
![Page 14: MongoDB Days UK: Jumpstart: Schema Design](https://reader036.vdocuments.us/reader036/viewer/2022070515/5876ab491a28abd6508b7a5f/html5/thumbnails/14.jpg)
Modeling One-to-One Relationships
![Page 15: MongoDB Days UK: Jumpstart: Schema Design](https://reader036.vdocuments.us/reader036/viewer/2022070515/5876ab491a28abd6508b7a5f/html5/thumbnails/15.jpg)
16
Referencing
Procedure• patient• date• type• physician• type
Results• dataType• size• content: {…}
Use two collections with a reference
Similar to relational
![Page 16: MongoDB Days UK: Jumpstart: Schema Design](https://reader036.vdocuments.us/reader036/viewer/2022070515/5876ab491a28abd6508b7a5f/html5/thumbnails/16.jpg)
17
Procedure• patient• date• type• results
• equipmentId• data1• data2
• physician
• Results• type• size• content: {…}
Embedding
Document Schema
![Page 17: MongoDB Days UK: Jumpstart: Schema Design](https://reader036.vdocuments.us/reader036/viewer/2022070515/5876ab491a28abd6508b7a5f/html5/thumbnails/17.jpg)
18
Referencing (1:1)
Procedure
{ "_id" : 333, "date" : "2003-02-09T05:00:00"), "hospital" : “County Hills”, "patient" : “John Doe”, "physician" : “Stephen Smith”, "type" : ”Chest X-ray", ”result" : 134}
Results
{ “_id” : 134 "type" : "txt", "size" : NumberInt(12), "content" : { value1: 343, value2: “abc”, … } }
![Page 18: MongoDB Days UK: Jumpstart: Schema Design](https://reader036.vdocuments.us/reader036/viewer/2022070515/5876ab491a28abd6508b7a5f/html5/thumbnails/18.jpg)
19
Embedding (1:1)Procedure{ "_id" : 333, "date" : "2003-02-09T05:00:00"), "hospital" : “County Hills”, "patient" : “John Doe”, "physician" : “Stephen Smith”, "type" : ”Chest X-ray", ”result" : { "type" : "txt", "size" : NumberInt(12), "content" : { value1: 343, value2: “abc”, … } }}
![Page 19: MongoDB Days UK: Jumpstart: Schema Design](https://reader036.vdocuments.us/reader036/viewer/2022070515/5876ab491a28abd6508b7a5f/html5/thumbnails/19.jpg)
20
Embedding (1:1)
• Advantages– Retrieve all relevant information in a single query/document– Avoid implementing joins in application code– Update related information as a single atomic operation
• MongoDB doesn’t offer multi-document transactions
• Limitations– Large documents mean more overhead if most fields are not relevant– 16 MB document size limit
![Page 20: MongoDB Days UK: Jumpstart: Schema Design](https://reader036.vdocuments.us/reader036/viewer/2022070515/5876ab491a28abd6508b7a5f/html5/thumbnails/20.jpg)
23
Referencing (1:1)
• Advantages– Smaller documents– Less likely to reach 16 MB document limit– Infrequently accessed information not accessed on every query
• Limitations– Two queries required to retrieve information– Cannot update related information atomically
![Page 21: MongoDB Days UK: Jumpstart: Schema Design](https://reader036.vdocuments.us/reader036/viewer/2022070515/5876ab491a28abd6508b7a5f/html5/thumbnails/21.jpg)
24
One to One: General Recommendations
• Embed– No additional data duplication– Can query or index on
embedded field• e.g., “result.type”
• Exceptional cases…• Embedding results in large
documents• Set of infrequently access
fields
{ "_id" : 333, "date" : "2003-02-09T05:00:00"), "hospital" : “County Hills”, "patient" : “John Doe”, "physician" : “Stephen Smith”, "type" : ”Chest X-ray", ”result" : { "type" : "txt", "size" : NumberInt(12), "content" : { value1: 343, value2: “abc”, … } }}
![Page 22: MongoDB Days UK: Jumpstart: Schema Design](https://reader036.vdocuments.us/reader036/viewer/2022070515/5876ab491a28abd6508b7a5f/html5/thumbnails/22.jpg)
Modeling One-to-Many Relationships
![Page 23: MongoDB Days UK: Jumpstart: Schema Design](https://reader036.vdocuments.us/reader036/viewer/2022070515/5876ab491a28abd6508b7a5f/html5/thumbnails/23.jpg)
26
{ _id: 2, first: “Joe”, last: “Patient”, addr: { …}, procedures: [ { id: 12345, date: 2015-02-15, type: “Cat scan”,
…}, { id: 12346, date: 2015-02-15, type: “blood test”,
…}]}
Pat
ient
s
Embed
One-to-Many RelationshipsModeled in 2 possible ways
{ _id: 2, first: “Joe”, last: “Patient”, addr: { …}, procedures: [12345, 12346]}
{ _id: 12345, date: 2015-02-15, type: “Cat scan”, …} { _id: 12346, date: 2015-02-15, type: “blood test”, …}
Pat
ient
s
Reference
Pro
cedu
res
![Page 24: MongoDB Days UK: Jumpstart: Schema Design](https://reader036.vdocuments.us/reader036/viewer/2022070515/5876ab491a28abd6508b7a5f/html5/thumbnails/24.jpg)
27
One to Many: General Recommendations
• Embed, when possible– Access all information in a single query– Take advantage of update atomicity– No additional data duplication– Can query or index on any field
• e.g., { “phones.type”: “mobile” }
• Exceptional cases:– 16 MB document size– Large number of infrequently accessed fields
{ _id: 2, first: “Joe”, last: “Patient”, addr: { …}, procedures: [ { id: 12345, date: 2015-02-15, type: “Cat scan”,
…}, { id: 12346, date: 2015-02-15, type: “blood test”,
…}]}
![Page 25: MongoDB Days UK: Jumpstart: Schema Design](https://reader036.vdocuments.us/reader036/viewer/2022070515/5876ab491a28abd6508b7a5f/html5/thumbnails/25.jpg)
Modeling Many-to-Many Relationships
![Page 26: MongoDB Days UK: Jumpstart: Schema Design](https://reader036.vdocuments.us/reader036/viewer/2022070515/5876ab491a28abd6508b7a5f/html5/thumbnails/26.jpg)
29
Many to ManyTraditional Relational Association
Join table
Physiciansnamespecialtyphone
Hospitalsname
HosPhysicanRelhospitalIdphysicianIdX
Use arrays instead
![Page 27: MongoDB Days UK: Jumpstart: Schema Design](https://reader036.vdocuments.us/reader036/viewer/2022070515/5876ab491a28abd6508b7a5f/html5/thumbnails/27.jpg)
30
{ _id: 1, name: “Oak Valley Hospital”, city: “New York”, beds: 131, physicians: [ { id: 12345, name: “Joe Doctor”, address: {…},
…}, { id: 12346, name: “Mary Well”, address: {…},
…}]}
Many-to-Many RelationshipsEmbedding physicians in hospitals collection
{ _id: 2, name: “Plainmont Hospital”, city: “Omaha”, beds: 85, physicians: [ { id: 63633, name: “Harold Green”, address: {…},
…}, { id: 12345, name: “Joe Doctor”, address: {…},
…}]}
Data Duplication
![Page 28: MongoDB Days UK: Jumpstart: Schema Design](https://reader036.vdocuments.us/reader036/viewer/2022070515/5876ab491a28abd6508b7a5f/html5/thumbnails/28.jpg)
31
{ _id: 1, name: “Oak Valley Hospital”, city: “New York”, beds: 131, physicians: [12345, 12346]}
Many-to-Many RelationshipsReferencing
{ id: 63633, name: “Harold Green”, address: {…}, …}
Hospitals
{ _id: 2, name: “Plainmont Hospital”, city: “Omaha”, beds: 85, physicians: [63633, 12345]}
Physicians
{ id: 12345, name: “Joe Doctor”, address: {…}, …}
{ id: 12346, name: “Mary Well”, address: {…}, …}
![Page 29: MongoDB Days UK: Jumpstart: Schema Design](https://reader036.vdocuments.us/reader036/viewer/2022070515/5876ab491a28abd6508b7a5f/html5/thumbnails/29.jpg)
32
Many to ManyGeneral Recommendation
• Use case determines whether to reference or embed:1. Data Duplication
• Embedding may result in data duplication
• Duplication may be okay if reads dominate updates
2. Referencing may be required if many related items
3. Hybrid approach• Potentially do both
{ _id: 2, name: “Oak Valley Hospital”, city: “New York”, beds: 131, physicians: [12345, 12346]}
{ _id: 12345, name: “Joe Doctor”, address: {…}, …} { _id: 12346, name: “Mary Well”, address: {…}, …}
Hos
pita
ls
Reference
Phy
sici
ans
![Page 30: MongoDB Days UK: Jumpstart: Schema Design](https://reader036.vdocuments.us/reader036/viewer/2022070515/5876ab491a28abd6508b7a5f/html5/thumbnails/30.jpg)
What If I Want to Store Large Files in MongoDB?
![Page 31: MongoDB Days UK: Jumpstart: Schema Design](https://reader036.vdocuments.us/reader036/viewer/2022070515/5876ab491a28abd6508b7a5f/html5/thumbnails/31.jpg)
34
GridFS
Driv
erGridFS APIdoc.jpg(meta data)
doc.jpg(1)doc.jpg
(1)doc.jpg(1)
fs.files fs.chunksdoc.jpg
mongofiles utility provides command line GridFS interface
![Page 32: MongoDB Days UK: Jumpstart: Schema Design](https://reader036.vdocuments.us/reader036/viewer/2022070515/5876ab491a28abd6508b7a5f/html5/thumbnails/32.jpg)
Schema Design and Performance
Two Examples
![Page 33: MongoDB Days UK: Jumpstart: Schema Design](https://reader036.vdocuments.us/reader036/viewer/2022070515/5876ab491a28abd6508b7a5f/html5/thumbnails/33.jpg)
Example 1: Hybrid Approach
Embed and Reference
![Page 34: MongoDB Days UK: Jumpstart: Schema Design](https://reader036.vdocuments.us/reader036/viewer/2022070515/5876ab491a28abd6508b7a5f/html5/thumbnails/34.jpg)
37
Healthcare Example
patients
procedures
![Page 35: MongoDB Days UK: Jumpstart: Schema Design](https://reader036.vdocuments.us/reader036/viewer/2022070515/5876ab491a28abd6508b7a5f/html5/thumbnails/35.jpg)
Tailor Schema to Queries (cont.)
{ "_id" : 593340651, "first" : "Gregorio", "last" : "Lang", "addr" : { "street" : "623 Flowers Rd", "city" : "Groton", "state" : "NH", "zip" : 3266 }, "physicians" : [10387 33456], "procedures” : ["551ac”, “343fs”]}
{ "_id" : "551ac”, "date" :"2000-04-26”, "hospital" : 161, "patient" : 593340651, "physician" : 10387, "type" : "Chest X-ray", "records" : [ “67bc6”]}
Patient Procedure
Find all patients from NH that have had chest x-rays
![Page 36: MongoDB Days UK: Jumpstart: Schema Design](https://reader036.vdocuments.us/reader036/viewer/2022070515/5876ab491a28abd6508b7a5f/html5/thumbnails/36.jpg)
Tailor Schema to Queries (cont.)
{ "_id" : 593340651, "first" : "Gregorio", "last" : "Lang", "addr" : { "street" : "623 Flowers Rd", "city" : "Groton", "state" : "NH", "zip" : 3266 }, "physicians" : [10387 33456], "procedures” : [ {id : "551ac”, type : “Chest X-ray”}, {id : “343fs”, type : “Blood Test”}]}
{ "_id" : "551ac”, "date" :"2000-04-26”, "hospital" : 161, "patient" : 593340651, "physician" : 10387, "type" : "Chest X-ray", "records" : [ “67bc6”]}
Patient Procedure
Find all patients from NH that have had chest x-rays
![Page 37: MongoDB Days UK: Jumpstart: Schema Design](https://reader036.vdocuments.us/reader036/viewer/2022070515/5876ab491a28abd6508b7a5f/html5/thumbnails/37.jpg)
Example 2: Time Series Data
Medical Devices
![Page 38: MongoDB Days UK: Jumpstart: Schema Design](https://reader036.vdocuments.us/reader036/viewer/2022070515/5876ab491a28abd6508b7a5f/html5/thumbnails/38.jpg)
41
Vital Sign Monitoring Device
Vital Signs Measured:• Blood Pressure• Pulse• Blood Oxygen Levels
Produces data at regular intervals• Once per minute
![Page 39: MongoDB Days UK: Jumpstart: Schema Design](https://reader036.vdocuments.us/reader036/viewer/2022070515/5876ab491a28abd6508b7a5f/html5/thumbnails/39.jpg)
42
We have a hospital(s) of devices
![Page 40: MongoDB Days UK: Jumpstart: Schema Design](https://reader036.vdocuments.us/reader036/viewer/2022070515/5876ab491a28abd6508b7a5f/html5/thumbnails/40.jpg)
43
Data From Vital Signs Monitoring Device
{ deviceId: 123456, spO2: 88, pulse: 74, bp: [128, 80], ts: ISODate("2013-10-16T22:07:00.000-0500")}
• One document per minute per device
• Relational approach
![Page 41: MongoDB Days UK: Jumpstart: Schema Design](https://reader036.vdocuments.us/reader036/viewer/2022070515/5876ab491a28abd6508b7a5f/html5/thumbnails/41.jpg)
44
Document Per Hour (By minute)
{ deviceId: 123456, spO2: { 0: 88, 1: 90, …, 59: 92}, pulse: { 0: 74, 1: 76, …, 59: 72}, bp: { 0: [122, 80], 1: [126, 84], …, 59: [124, 78]}, ts: ISODate("2013-10-16T22:00:00.000-0500")}
• Store per-minute data at the hourly level
• Update-driven workload
• 1 document per device per hour
![Page 42: MongoDB Days UK: Jumpstart: Schema Design](https://reader036.vdocuments.us/reader036/viewer/2022070515/5876ab491a28abd6508b7a5f/html5/thumbnails/42.jpg)
45
Characterizing Write Differences
• Example: data generated every minute• Recording the data for 1 patient for 1 hour:
Document Per Event60 inserts
Document Per Hour1 insert, 59 updates
![Page 43: MongoDB Days UK: Jumpstart: Schema Design](https://reader036.vdocuments.us/reader036/viewer/2022070515/5876ab491a28abd6508b7a5f/html5/thumbnails/43.jpg)
46
Characterizing Read Differences
• Want to graph 24 hour of vital signs for a patient:
• Read performance is greatly improved
Document Per Event 1440 reads
Document Per Hour24 reads
![Page 44: MongoDB Days UK: Jumpstart: Schema Design](https://reader036.vdocuments.us/reader036/viewer/2022070515/5876ab491a28abd6508b7a5f/html5/thumbnails/44.jpg)
47
Characterizing Memory and Storage Differences
Document Per Minute Document Per HourNumber Documents 52.6 B 876 M
Total Index Size 6364 GB 106 GB
_id index 1468 GB 24.5 GB
{ts: 1, deviceId: 1} 4895 GB 81.6 GB
Document Size 92 Bytes 758 Bytes
Database Size 4503 GB 618 GB
• 100K Devices • 1 years worth of data
100000 * 365 * 24 * 60
100000 * 365 * 24
100000 * 365 * 24 * 60 * 130
100000 * 365 * 24 * 130
100000 * 365 * 24 * 60 * 92
100000 * 365 * 24 * 758
![Page 45: MongoDB Days UK: Jumpstart: Schema Design](https://reader036.vdocuments.us/reader036/viewer/2022070515/5876ab491a28abd6508b7a5f/html5/thumbnails/45.jpg)
48
Summary• Relationships can be modeled by embedding or references
• Decision should be made in context of application data and query workload– Tailor schema to application workload
• It is okay recommended to violate RDBMS schema design principles– No duplication of data– Normalization
• Different schemas may result in dramatically different– Query performance– Hardware requirements