mongodb world 2015 - a technical introduction to wiredtiger
TRANSCRIPT
![Page 1: MongoDB World 2015 - A Technical Introduction to WiredTiger](https://reader035.vdocuments.us/reader035/viewer/2022070521/58f9a9a7760da3da068b70e4/html5/thumbnails/1.jpg)
![Page 2: MongoDB World 2015 - A Technical Introduction to WiredTiger](https://reader035.vdocuments.us/reader035/viewer/2022070521/58f9a9a7760da3da068b70e4/html5/thumbnails/2.jpg)
A Technical Introduction to WiredTiger
Michael CahillDirector of Engineering (Storage), MongoDB
![Page 3: MongoDB World 2015 - A Technical Introduction to WiredTiger](https://reader035.vdocuments.us/reader035/viewer/2022070521/58f9a9a7760da3da068b70e4/html5/thumbnails/3.jpg)
You may have seen this:
![Page 4: MongoDB World 2015 - A Technical Introduction to WiredTiger](https://reader035.vdocuments.us/reader035/viewer/2022070521/58f9a9a7760da3da068b70e4/html5/thumbnails/4.jpg)
or this…
![Page 5: MongoDB World 2015 - A Technical Introduction to WiredTiger](https://reader035.vdocuments.us/reader035/viewer/2022070521/58f9a9a7760da3da068b70e4/html5/thumbnails/5.jpg)
How does WiredTiger do it?
![Page 6: MongoDB World 2015 - A Technical Introduction to WiredTiger](https://reader035.vdocuments.us/reader035/viewer/2022070521/58f9a9a7760da3da068b70e4/html5/thumbnails/6.jpg)
6
What’s different about WiredTiger?
• Document-level concurrency• In-memory performance• Multi-core scalability• Checksums• Compression• Durability with and without journaling
![Page 7: MongoDB World 2015 - A Technical Introduction to WiredTiger](https://reader035.vdocuments.us/reader035/viewer/2022070521/58f9a9a7760da3da068b70e4/html5/thumbnails/7.jpg)
7
This presentation is not…
• How to write stand-alone WiredTiger apps• How to configure MongoDB with WiredTiger for your workload
![Page 8: MongoDB World 2015 - A Technical Introduction to WiredTiger](https://reader035.vdocuments.us/reader035/viewer/2022070521/58f9a9a7760da3da068b70e4/html5/thumbnails/8.jpg)
WiredTiger Background
![Page 9: MongoDB World 2015 - A Technical Introduction to WiredTiger](https://reader035.vdocuments.us/reader035/viewer/2022070521/58f9a9a7760da3da068b70e4/html5/thumbnails/9.jpg)
9
Why create a new storage engine?
• Minimize contention between threads– lock-free algorithms, hazard pointers– eliminate blocking due to concurrency control
• Hotter cache and more work per I/O– compact file formats– compression– big-block I/O
![Page 10: MongoDB World 2015 - A Technical Introduction to WiredTiger](https://reader035.vdocuments.us/reader035/viewer/2022070521/58f9a9a7760da3da068b70e4/html5/thumbnails/10.jpg)
10
MongoDB’s Storage Engine API
• Allows different storage engines to "plug-in"– Different workloads have different performance characteristics – mmap is not ideal for all workloads– More flexibility
• mix storage engines on same replica set/sharded cluster• Opportunity to integrate further (HDFS, native encrypted,
hardware optimized …) • Great way for us to demonstrate WiredTiger’s performance
![Page 11: MongoDB World 2015 - A Technical Introduction to WiredTiger](https://reader035.vdocuments.us/reader035/viewer/2022070521/58f9a9a7760da3da068b70e4/html5/thumbnails/11.jpg)
11
Storage Engine Layer
Content Repo
IoT Sensor Backend Ad Service Customer
Analytics Archive
MongoDB Query Language (MQL) + Native Drivers
MongoDB Document Data Model
MMAP V1 WT In-Memory ? ?
Supported in MongoDB 3.0 Future Possible Storage Engines
Man
agem
ent
Sec
urity
Example Future State
Experimental
![Page 12: MongoDB World 2015 - A Technical Introduction to WiredTiger](https://reader035.vdocuments.us/reader035/viewer/2022070521/58f9a9a7760da3da068b70e4/html5/thumbnails/12.jpg)
WiredTiger Architecture
![Page 13: MongoDB World 2015 - A Technical Introduction to WiredTiger](https://reader035.vdocuments.us/reader035/viewer/2022070521/58f9a9a7760da3da068b70e4/html5/thumbnails/13.jpg)
13
WiredTiger Architecture
In-memory consistency
Per-file checkpoints
![Page 14: MongoDB World 2015 - A Technical Introduction to WiredTiger](https://reader035.vdocuments.us/reader035/viewer/2022070521/58f9a9a7760da3da068b70e4/html5/thumbnails/14.jpg)
14
Trees in cache
1 memory flush
read required
disk image
![Page 15: MongoDB World 2015 - A Technical Introduction to WiredTiger](https://reader035.vdocuments.us/reader035/viewer/2022070521/58f9a9a7760da3da068b70e4/html5/thumbnails/15.jpg)
15
Pages in cacheconstructed during read
skiplist
reconciled during write
![Page 16: MongoDB World 2015 - A Technical Introduction to WiredTiger](https://reader035.vdocuments.us/reader035/viewer/2022070521/58f9a9a7760da3da068b70e4/html5/thumbnails/16.jpg)
Concurrencyand
multi-core scaling
![Page 17: MongoDB World 2015 - A Technical Introduction to WiredTiger](https://reader035.vdocuments.us/reader035/viewer/2022070521/58f9a9a7760da3da068b70e4/html5/thumbnails/17.jpg)
17
Multiversion Concurrency Control (MVCC)
• Multiple versions of records kept in cache• Readers see the version that was committed before the operation
started– MongoDB “yields” turn large operations into small transactions
• Writers can create new versions concurrent with readers• Concurrent updates to a single record cause write conflicts
– MongoDB retries with back-off
![Page 18: MongoDB World 2015 - A Technical Introduction to WiredTiger](https://reader035.vdocuments.us/reader035/viewer/2022070521/58f9a9a7760da3da068b70e4/html5/thumbnails/18.jpg)
Transforming data during I/O
![Page 19: MongoDB World 2015 - A Technical Introduction to WiredTiger](https://reader035.vdocuments.us/reader035/viewer/2022070521/58f9a9a7760da3da068b70e4/html5/thumbnails/19.jpg)
19
Checksums
• A checksum is stored with every page• WiredTiger stores the checksum with the page address (typically
in a parent page)– Extra safety against reading an old, valid page image
• Checksums are validated during page read– Detects filesystem corruption, random bitflips
![Page 20: MongoDB World 2015 - A Technical Introduction to WiredTiger](https://reader035.vdocuments.us/reader035/viewer/2022070521/58f9a9a7760da3da068b70e4/html5/thumbnails/20.jpg)
20
Compression
• WiredTiger uses snappy compression by default in MongoDB• supported compression algorithms
– snappy [default]: good compression, low overhead– zlib: better compression, more CPU– none
• Indexes are compressed using prefix compression– allows compression in memory
![Page 21: MongoDB World 2015 - A Technical Introduction to WiredTiger](https://reader035.vdocuments.us/reader035/viewer/2022070521/58f9a9a7760da3da068b70e4/html5/thumbnails/21.jpg)
Durabilitywith and without
Journal
![Page 22: MongoDB World 2015 - A Technical Introduction to WiredTiger](https://reader035.vdocuments.us/reader035/viewer/2022070521/58f9a9a7760da3da068b70e4/html5/thumbnails/22.jpg)
22
Writing a checkpoint
1. Write the leaves2. Write the internal pages, including the root
– the old checkpoint is still valid3. Sync the file4. Write the new root’s address to the metadata
– free pages from old checkpoints once the metadata is durable
![Page 23: MongoDB World 2015 - A Technical Introduction to WiredTiger](https://reader035.vdocuments.us/reader035/viewer/2022070521/58f9a9a7760da3da068b70e4/html5/thumbnails/23.jpg)
23
Durability without Journaling
• MMAPv1 uses a write-ahead log (journal) to guarantee consistency – Running with “nojournal” is unsafe
• WiredTiger doesn't have this need: no in-place updates– Write-ahead log can be truncated at checkpoints
• Every 2GB or 60sec by default – configurable– Updates are written to optional journal as they commit
• Not flushed on every commit by default• Recovery rolls forward from last checkpoint
• Replication can guarantee durability
![Page 24: MongoDB World 2015 - A Technical Introduction to WiredTiger](https://reader035.vdocuments.us/reader035/viewer/2022070521/58f9a9a7760da3da068b70e4/html5/thumbnails/24.jpg)
24
Journal and recovery
• Optional write-ahead logging• Only written at transaction commit• snappy compression by default• Group commit• Automatic log archive / removal• On startup, we rely on finding a consistent checkpoint in the
metadata• Check LSNs in the metadata to figure out where to roll forward
from
![Page 25: MongoDB World 2015 - A Technical Introduction to WiredTiger](https://reader035.vdocuments.us/reader035/viewer/2022070521/58f9a9a7760da3da068b70e4/html5/thumbnails/25.jpg)
25
So, what’s different about WiredTiger?
• Multiversion Concurrent Control– No lock manager
• Non-locking data structures– Multi-core scalability
• Different representation of in-memory data vs on-disk– Enabled checksums, compression
• Copy-on-write storage– Durability without a journal
![Page 26: MongoDB World 2015 - A Technical Introduction to WiredTiger](https://reader035.vdocuments.us/reader035/viewer/2022070521/58f9a9a7760da3da068b70e4/html5/thumbnails/26.jpg)
What’s next?
![Page 27: MongoDB World 2015 - A Technical Introduction to WiredTiger](https://reader035.vdocuments.us/reader035/viewer/2022070521/58f9a9a7760da3da068b70e4/html5/thumbnails/27.jpg)
27
What’s next for WiredTiger?
• WiredTiger btree access method is optional in 3.0• plan to make it the default in 3.2• tuning for (many) more workloads• make Log Structured Merge (LSM) work well with MongoDB
– out-of-cache, write-heavy workloads• adding encryption• more advanced transactional semantics in the storage engine API