Look Ma! No more blobs
DESCRIPTION
GridFS is a storage mechanism for persisting large binary data in MongoDB.
TRANSCRIPT
![Page 1: Look Ma! No more blobs](https://reader033.vdocuments.us/reader033/viewer/2022051610/54813c5db4af9fea158b5ea9/html5/thumbnails/1.jpg)
Look Ma! No more blobs
Aparna Chaudhary
NoSQL matters, @Cologne Germany 2013
![Page 2: Look Ma! No more blobs](https://reader033.vdocuments.us/reader033/viewer/2022051610/54813c5db4af9fea158b5ea9/html5/thumbnails/2.jpg)
EMBRACE POLYGLOT PERSISTENCE!
STOP RDBMS ABUSE!
KNOW YOUR USE CASE
![Page 3: Look Ma! No more blobs](https://reader033.vdocuments.us/reader033/viewer/2022051610/54813c5db4af9fea158b5ea9/html5/thumbnails/3.jpg)
Parse
Extract
Store
Read XML
We don't do rocket science...
Use Case
Runtime support for document types
Metadata definition provided at runtime
Document type names - max 50 char
Look up content based on metadata
RA
![Page 4: Look Ma! No more blobs](https://reader033.vdocuments.us/reader033/viewer/2022051610/54813c5db4af9fea158b5ea9/html5/thumbnails/4.jpg)
Challenges
Storage of up to one million documents of 10KB to 2GB per document type per year
Write 1MB < x msec
Retrieve 1MB < y msec
...and details
RA
But…the Numbers make it interesting...
![Page 5: Look Ma! No more blobs](https://reader033.vdocuments.us/reader033/viewer/2022051610/54813c5db4af9fea158b5ea9/html5/thumbnails/5.jpg)
How?
File System
MongoDB
RDBMS
JCR
Document Management
![Page 6: Look Ma! No more blobs](https://reader033.vdocuments.us/reader033/viewer/2022051610/54813c5db4af9fea158b5ea9/html5/thumbnails/6.jpg)
If you want to store files, it's logical to use the file system.
Ain't it?
File System
✓ Ease of Use
✓ No special skill-set
✓ Backup and Recovery
✓ It’s free!
![Page 7: Look Ma! No more blobs](https://reader033.vdocuments.us/reader033/viewer/2022051610/54813c5db4af9fea158b5ea9/html5/thumbnails/7.jpg)
How do I name them?
Support for metadata storage?
Performance with too many small files?
Query - Administration?
High Availability?
Limitation on total number of files?
![Page 8: Look Ma! No more blobs](https://reader033.vdocuments.us/reader033/viewer/2022051610/54813c5db4af9fea158b5ea9/html5/thumbnails/8.jpg)
Relational database
Integrity
Consistency
Durability
Atomicity
Joins
Backups
High Availability
You name it, We have it!
RDBMS
Aggregations
![Page 9: Look Ma! No more blobs](https://reader033.vdocuments.us/reader033/viewer/2022051610/54813c5db4af9fea158b5ea9/html5/thumbnails/9.jpg)
RDBMS Developer’s Perspective
![Page 10: Look Ma! No more blobs](https://reader033.vdocuments.us/reader033/viewer/2022051610/54813c5db4af9fea158b5ea9/html5/thumbnails/10.jpg)
Challenge #1
RA
We need runtime support for document type.
![Page 11: Look Ma! No more blobs](https://reader033.vdocuments.us/reader033/viewer/2022051610/54813c5db4af9fea158b5ea9/html5/thumbnails/11.jpg)
Challenge #1
DOC_1 DOC_2 DOC_3
DOC_4 DOC_5 DOC_6
Dynamic DDL Generation
![Page 12: Look Ma! No more blobs](https://reader033.vdocuments.us/reader033/viewer/2022051610/54813c5db4af9fea158b5ea9/html5/thumbnails/12.jpg)
Challenge #1
String concatenations are ugly…
DEV
![Page 13: Look Ma! No more blobs](https://reader033.vdocuments.us/reader033/viewer/2022051610/54813c5db4af9fea158b5ea9/html5/thumbnails/13.jpg)
Challenge #1
Let's build a utility.
DEV
![Page 14: Look Ma! No more blobs](https://reader033.vdocuments.us/reader033/viewer/2022051610/54813c5db4af9fea158b5ea9/html5/thumbnails/14.jpg)
Challenge #1
More Work
![Page 15: Look Ma! No more blobs](https://reader033.vdocuments.us/reader033/viewer/2022051610/54813c5db4af9fea158b5ea9/html5/thumbnails/15.jpg)
Challenge #2
RA
Document type is 50 char long
![Page 16: Look Ma! No more blobs](https://reader033.vdocuments.us/reader033/viewer/2022051610/54813c5db4af9fea158b5ea9/html5/thumbnails/16.jpg)
Challenge #2
TABLE NAME LIMITS
Wait… SQL-92 says 128 char.
We rule. Let's support only 30 char.
![Page 17: Look Ma! No more blobs](https://reader033.vdocuments.us/reader033/viewer/2022051610/54813c5db4af9fea158b5ea9/html5/thumbnails/17.jpg)
Challenge #2
DOC_TYPE_MAPPING
Let's create a mapping table.
DEV
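To make the RDBMS workaround on these slides concrete, here is a minimal sketch of a mapping table plus dynamic DDL. It uses Python's built-in `sqlite3` as a stand-in for the RDBMS (the demo itself used PostgreSQL), and the `doc_type_mapping` columns, the `DOC_n` alias scheme, and the sample document type and field names are assumptions made for illustration.

```python
import sqlite3

MAX_TABLE_NAME = 30  # the table-name limit quoted on the slide


def register_doc_type(conn, doc_type, fields):
    """Map a long document type name to a short alias and create its table."""
    for name in fields:
        # DDL cannot take bind parameters, so field names must be validated by hand.
        if not name.isidentifier():
            raise ValueError(f"unsafe column name: {name!r}")

    cur = conn.execute(
        "INSERT INTO doc_type_mapping (doc_type) VALUES (?)", (doc_type,)
    )
    alias = f"DOC_{cur.lastrowid}"  # short, but unreadable
    if len(alias) > MAX_TABLE_NAME:
        raise ValueError("alias exceeds the table-name limit")
    conn.execute(
        "UPDATE doc_type_mapping SET table_name = ? WHERE id = ?",
        (alias, cur.lastrowid),
    )

    # The CREATE TABLE statement is built by string concatenation.
    column_defs = ["id INTEGER PRIMARY KEY"]
    column_defs += [f'"{name}" TEXT' for name in fields]
    column_defs += ["content BLOB"]
    conn.execute(f'CREATE TABLE "{alias}" ({", ".join(column_defs)})')
    return alias


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE doc_type_mapping ("
    "id INTEGER PRIMARY KEY, doc_type TEXT UNIQUE, table_name TEXT)"
)

# Hypothetical document type name, longer than 30 characters.
long_name = "patient_health_record_discharge_summary_v2"
alias = register_doc_type(conn, long_name, ["fname", "lname", "country"])
print(long_name, "->", alias)
```

Every read and write now needs a lookup in the mapping table before it can touch the real table, which is the extra work the slides complain about.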
![Page 18: Look Ma! No more blobs](https://reader033.vdocuments.us/reader033/viewer/2022051610/54813c5db4af9fea158b5ea9/html5/thumbnails/18.jpg)
Challenge #2
Ugly unreadable table names!
![Page 19: Look Ma! No more blobs](https://reader033.vdocuments.us/reader033/viewer/2022051610/54813c5db4af9fea158b5ea9/html5/thumbnails/19.jpg)
So...finally...
Read XML
Dynamic DDL generation
Document Type Alias
Document Type Defined?
Yes
No
Extract Metadata
Store Metadata
Store Content
Simple use case becomes complex...
![Page 20: Look Ma! No more blobs](https://reader033.vdocuments.us/reader033/viewer/2022051610/54813c5db4af9fea158b5ea9/html5/thumbnails/20.jpg)
Remember...Our Challenge
QA
Let's see if we are in spec for response time.
Aah..what about performance now?
DEV
![Page 21: Look Ma! No more blobs](https://reader033.vdocuments.us/reader033/viewer/2022051610/54813c5db4af9fea158b5ea9/html5/thumbnails/21.jpg)
MongoDB
Document Based
GridFS
B-Tree
Dynamic Schema
JSON
BSON Query
Scalable
http://www.10gen.com/presentations/storage-engine-internals
Joins
Complex Transaction
![Page 22: Look Ma! No more blobs](https://reader033.vdocuments.us/reader033/viewer/2022051610/54813c5db4af9fea158b5ea9/html5/thumbnails/22.jpg)
Concepts
[Figure: rows ID1 to ID5 with fields F1 to F9 and Fx; not every row carries the same set of fields]
[Figure: each Database contains several Collections]
Table = Collection
Column = Field
Row = Document
Database = Database
![Page 23: Look Ma! No more blobs](https://reader033.vdocuments.us/reader033/viewer/2022051610/54813c5db4af9fea158b5ea9/html5/thumbnails/23.jpg)
GridFS
MongoDB divides the
large content into
chunks
Stores Metadata and Chunks separately
http://docs.mongodb.org/manual/core/gridfs/
![Page 24: Look Ma! No more blobs](https://reader033.vdocuments.us/reader033/viewer/2022051610/54813c5db4af9fea158b5ea9/html5/thumbnails/24.jpg)
> mybucket.files
{ "_id" : ObjectId("514d5cb8c2e6ea4329646a5c"),
"chunkSize" : NumberLong(262144),
"length" : NumberLong(103015),
"md5" : "34d29a163276accc7304bd69c5520e55",
"filename" : "health_record_2.xml",
"contentType" : "application/xml",
"uploadDate" : ISODate("2013-03-23T07:41:44.907Z"),
"aliases" : null,
"metadata" : { "fname" : "Aparna", "lname" : "Chaudhary", "country" : "Netherlands" }
}
ObjectId - 12 Byte BSON:
4 Byte - Seconds since Epoch
3 Byte - Machine Id
2 Byte - Process Id
3 Byte - Counter
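The byte layout above can be checked against the `_id` from this slide. The sketch below splits the 12 bytes by hand using only the standard library; `decode_object_id` is a hypothetical helper, not a driver API, and the machine and process id fields are left as raw hex because their byte order is not stated on the slide.

```python
from datetime import datetime, timezone


def decode_object_id(hex_id):
    """Split a 24-character ObjectId hex string into its four fields."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes")
    seconds = int.from_bytes(raw[0:4], "big")  # 4 bytes: seconds since epoch
    return {
        "timestamp": datetime.fromtimestamp(seconds, tz=timezone.utc),
        "machine": raw[4:7].hex(),             # 3 bytes: machine id
        "process": raw[7:9].hex(),             # 2 bytes: process id
        "counter": int.from_bytes(raw[9:12], "big"),  # 3 bytes: counter
    }


parts = decode_object_id("514d5cb8c2e6ea4329646a5c")
print(parts["timestamp"].isoformat())
```

The decoded timestamp agrees, to the second, with the `uploadDate` stored in the same file document.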
![Page 25: Look Ma! No more blobs](https://reader033.vdocuments.us/reader033/viewer/2022051610/54813c5db4af9fea158b5ea9/html5/thumbnails/25.jpg)
> mybucket.chunks
{ "_id" : ObjectId("514d5cb8c2e6ea4329646a5d"),
"files_id" : ObjectId("514d5cb8c2e6ea4329646a5c"),
"n" : 0,
"data" : BinData(0,...)
}
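The relationship between the two collections can be modelled in a few lines. This is a toy in-memory sketch of the layout shown on these slides, not the MongoDB driver: `to_gridfs_docs` and `reassemble` are hypothetical helpers, and plain dicts stand in for BSON documents.

```python
import hashlib

CHUNK_SIZE = 256 * 1024  # matches the chunkSize value in the file document


def to_gridfs_docs(file_id, filename, data, metadata=None, chunk_size=CHUNK_SIZE):
    """Build one 'files' document and the list of 'chunks' documents."""
    files_doc = {
        "_id": file_id,
        "chunkSize": chunk_size,
        "length": len(data),
        "md5": hashlib.md5(data).hexdigest(),
        "filename": filename,
        "metadata": metadata or {},
    }
    chunks = [
        {"files_id": file_id, "n": n, "data": data[start:start + chunk_size]}
        for n, start in enumerate(range(0, len(data), chunk_size))
    ]
    return files_doc, chunks


def reassemble(files_doc, chunks):
    """Join the chunks of one file in 'n' order and verify length and md5."""
    own = sorted(
        (c for c in chunks if c["files_id"] == files_doc["_id"]),
        key=lambda c: c["n"],
    )
    data = b"".join(c["data"] for c in own)
    if len(data) != files_doc["length"]:
        raise ValueError("length mismatch")
    if hashlib.md5(data).hexdigest() != files_doc["md5"]:
        raise ValueError("md5 mismatch")
    return data


payload = b"<record/>" * 70_000  # 630,000 bytes of sample content
files_doc, chunks = to_gridfs_docs("file-1", "health_record_2.xml", payload)
print(len(chunks), "chunks for", files_doc["length"], "bytes")
```

Metadata queries only touch the small `files` documents; the binary chunks are read when the content itself is requested.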
![Page 26: Look Ma! No more blobs](https://reader033.vdocuments.us/reader033/viewer/2022051610/54813c5db4af9fea158b5ea9/html5/thumbnails/26.jpg)
I'm storing a 10KB file, but would it use 256KB on disk?
Chunk is as big as it needs to be...
Last Chunk = FileSize % 256 + Metadata overhead
1128KB file: 256 | 256 | 256 | 256 | 104 + x
10KB file: 10 + x
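The chunk arithmetic on this slide can be written out as a short sketch. `chunk_sizes` is a hypothetical helper, sizes are in KB unless a different `chunk_size` is passed, and the per-chunk metadata overhead (the `x` on the slide) is left out.

```python
def chunk_sizes(file_size, chunk_size=256):
    """Return the payload size of each chunk for a file of the given size."""
    full_chunks, last = divmod(file_size, chunk_size)
    sizes = [chunk_size] * full_chunks
    if last:
        # The last chunk only holds the remainder, it is not padded to 256.
        sizes.append(last)
    return sizes


print(chunk_sizes(1128))            # the 1128KB example from the slide
print(chunk_sizes(10))              # the 10KB example from the slide
print(chunk_sizes(103015, 262144))  # health_record_2.xml, sizes in bytes
```

A file whose size is an exact multiple of the chunk size produces only full chunks, with no empty trailing chunk in this model.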
![Page 27: Look Ma! No more blobs](https://reader033.vdocuments.us/reader033/viewer/2022051610/54813c5db4af9fea158b5ea9/html5/thumbnails/27.jpg)
Challenge #1
DEV
MongoDB supports Dynamic Schema.
You can use a collection per docType, and collections are created dynamically.
RA
We need runtime support for document type.
![Page 28: Look Ma! No more blobs](https://reader033.vdocuments.us/reader033/viewer/2022051610/54813c5db4af9fea158b5ea9/html5/thumbnails/28.jpg)
Challenge #2
RA
Document type is 50 char long
DEV
MongoDB namespace can be up to 123 char.
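A quick way to see that a 50-character document type fits is to build the namespaces and measure them. The sketch below assumes one GridFS bucket per document type (so the collections are `<docType>.files` and `<docType>.chunks`), takes the 123 limit from the slide, and counts UTF-8 bytes rather than characters; the database name `docs` and the helper `gridfs_namespaces` are hypothetical.

```python
MAX_NAMESPACE = 123  # limit quoted on the slide


def gridfs_namespaces(database, doc_type):
    """Return the two namespaces a bucket named after doc_type would use."""
    namespaces = [
        f"{database}.{doc_type}.files",
        f"{database}.{doc_type}.chunks",
    ]
    for ns in namespaces:
        size = len(ns.encode("utf-8"))
        if size > MAX_NAMESPACE:
            raise ValueError(f"namespace too long ({size} > {MAX_NAMESPACE}): {ns}")
    return namespaces


doc_type = "d" * 50  # worst case from the requirements: 50 characters
for ns in gridfs_namespaces("docs", doc_type):
    print(len(ns), ns)
```

With a short database name, the longest namespace for a 50-character type stays well under the limit, so no alias or mapping table is needed.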
![Page 29: Look Ma! No more blobs](https://reader033.vdocuments.us/reader033/viewer/2022051610/54813c5db4af9fea158b5ea9/html5/thumbnails/29.jpg)
So...finally...
Simple use case remains simple...well, becomes simpler...
Read XML
Extract Metadata
Store Metadata & Content
![Page 30: Look Ma! No more blobs](https://reader033.vdocuments.us/reader033/viewer/2022051610/54813c5db4af9fea158b5ea9/html5/thumbnails/30.jpg)
Remember...Our Challenge
QA
Let's see if we are in spec for response time.
DEV
Performance test is part of our definition of 'DONE'
![Page 31: Look Ma! No more blobs](https://reader033.vdocuments.us/reader033/viewer/2022051610/54813c5db4af9fea158b5ea9/html5/thumbnails/31.jpg)
Because seeing is believing!
Demo
‣ GridFS 2.4.0
‣ PostgreSQL 9.2
‣ Spring Data
‣ JMeter 2.7
‣ Mac OS X 10.8.3 2.3GHz Quad-Core Intel Core i7, 16GB RAM
https://github.com/aparnachaudhary/nosql-matters-demo
![Page 32: Look Ma! No more blobs](https://reader033.vdocuments.us/reader033/viewer/2022051610/54813c5db4af9fea158b5ea9/html5/thumbnails/32.jpg)
EMBRACE POLYGLOT PERSISTENCE!
STOP RDBMS ABUSE!
KNOW YOUR USE CASE
@aparnachaudhary
![Page 33: Look Ma! No more blobs](https://reader033.vdocuments.us/reader033/viewer/2022051610/54813c5db4af9fea158b5ea9/html5/thumbnails/33.jpg)
Java Developer, Data Lover
Eindhoven, Netherlands
http://blog.aparnachaudhary.com/
@aparnachaudhary
Thank You!