mongodb world 2015 - a technical introduction to wiredtiger

28

Upload: wiredtiger

Post on 21-Apr-2017

10.464 views

Category:

Data & Analytics


0 download

TRANSCRIPT

Page 1: MongoDB World 2015 - A Technical Introduction to WiredTiger
Page 2: MongoDB World 2015 - A Technical Introduction to WiredTiger

A Technical Introduction to WiredTiger

Michael CahillDirector of Engineering (Storage), MongoDB

Page 3: MongoDB World 2015 - A Technical Introduction to WiredTiger

You may have seen this:

Page 4: MongoDB World 2015 - A Technical Introduction to WiredTiger

or this…

Page 5: MongoDB World 2015 - A Technical Introduction to WiredTiger

How does WiredTiger do it?

Page 6: MongoDB World 2015 - A Technical Introduction to WiredTiger

6

What’s different about WiredTiger?

• Document-level concurrency• In-memory performance• Multi-core scalability• Checksums• Compression• Durability with and without journaling

Page 7: MongoDB World 2015 - A Technical Introduction to WiredTiger

7

This presentation is not…

• How to write stand-alone WiredTiger apps• How to configure MongoDB with WiredTiger for your workload

Page 8: MongoDB World 2015 - A Technical Introduction to WiredTiger

WiredTiger Background

Page 9: MongoDB World 2015 - A Technical Introduction to WiredTiger

9

Why create a new storage engine?

• Minimize contention between threads– lock-free algorithms, hazard pointers– eliminate blocking due to concurrency control

• Hotter cache and more work per I/O– compact file formats– compression– big-block I/O

Page 10: MongoDB World 2015 - A Technical Introduction to WiredTiger

10

MongoDB’s Storage Engine API

• Allows different storage engines to "plug-in"– Different workloads have different performance characteristics – mmap is not ideal for all workloads– More flexibility

• mix storage engines on same replica set/sharded cluster• Opportunity to integrate further (HDFS, native encrypted,

hardware optimized …) • Great way for us to demonstrate WiredTiger’s performance

Page 11: MongoDB World 2015 - A Technical Introduction to WiredTiger

11

Storage Engine Layer

Content Repo

IoT Sensor Backend Ad Service Customer

Analytics Archive

MongoDB Query Language (MQL) + Native Drivers

MongoDB Document Data Model

MMAP V1 WT In-Memory ? ?

Supported in MongoDB 3.0 Future Possible Storage Engines

Man

agem

ent

Sec

urity

Example Future State

Experimental

Page 12: MongoDB World 2015 - A Technical Introduction to WiredTiger

WiredTiger Architecture

Page 13: MongoDB World 2015 - A Technical Introduction to WiredTiger

13

WiredTiger Architecture

In-memory consistency

Per-file checkpoints

Page 14: MongoDB World 2015 - A Technical Introduction to WiredTiger

14

Trees in cache

1 memory flush

read required

disk image

Page 15: MongoDB World 2015 - A Technical Introduction to WiredTiger

15

Pages in cacheconstructed during read

skiplist

reconciled during write

Page 16: MongoDB World 2015 - A Technical Introduction to WiredTiger

Concurrencyand

multi-core scaling

Page 17: MongoDB World 2015 - A Technical Introduction to WiredTiger

17

Multiversion Concurrency Control (MVCC)

• Multiple versions of records kept in cache• Readers see the version that was committed before the operation

started– MongoDB “yields” turn large operations into small transactions

• Writers can create new versions concurrent with readers• Concurrent updates to a single record cause write conflicts

– MongoDB retries with back-off

Page 18: MongoDB World 2015 - A Technical Introduction to WiredTiger

Transforming data during I/O

Page 19: MongoDB World 2015 - A Technical Introduction to WiredTiger

19

Checksums

• A checksum is stored with every page• WiredTiger stores the checksum with the page address (typically

in a parent page)– Extra safety against reading an old, valid page image

• Checksums are validated during page read– Detects filesystem corruption, random bitflips

Page 20: MongoDB World 2015 - A Technical Introduction to WiredTiger

20

Compression

• WiredTiger uses snappy compression by default in MongoDB• supported compression algorithms

– snappy [default]: good compression, low overhead– zlib: better compression, more CPU– none

• Indexes are compressed using prefix compression– allows compression in memory

Page 21: MongoDB World 2015 - A Technical Introduction to WiredTiger

Durabilitywith and without

Journal

Page 22: MongoDB World 2015 - A Technical Introduction to WiredTiger

22

Writing a checkpoint

1. Write the leaves2. Write the internal pages, including the root

– the old checkpoint is still valid3. Sync the file4. Write the new root’s address to the metadata

– free pages from old checkpoints once the metadata is durable

Page 23: MongoDB World 2015 - A Technical Introduction to WiredTiger

23

Durability without Journaling

• MMAPv1 uses a write-ahead log (journal) to guarantee consistency – Running with “nojournal” is unsafe

• WiredTiger doesn't have this need: no in-place updates– Write-ahead log can be truncated at checkpoints

• Every 2GB or 60sec by default – configurable– Updates are written to optional journal as they commit

• Not flushed on every commit by default• Recovery rolls forward from last checkpoint

• Replication can guarantee durability

Page 24: MongoDB World 2015 - A Technical Introduction to WiredTiger

24

Journal and recovery

• Optional write-ahead logging• Only written at transaction commit• snappy compression by default• Group commit• Automatic log archive / removal• On startup, we rely on finding a consistent checkpoint in the

metadata• Check LSNs in the metadata to figure out where to roll forward

from

Page 25: MongoDB World 2015 - A Technical Introduction to WiredTiger

25

So, what’s different about WiredTiger?

• Multiversion Concurrent Control– No lock manager

• Non-locking data structures– Multi-core scalability

• Different representation of in-memory data vs on-disk– Enabled checksums, compression

• Copy-on-write storage– Durability without a journal

Page 26: MongoDB World 2015 - A Technical Introduction to WiredTiger

What’s next?

Page 27: MongoDB World 2015 - A Technical Introduction to WiredTiger

27

What’s next for WiredTiger?

• WiredTiger btree access method is optional in 3.0• plan to make it the default in 3.2• tuning for (many) more workloads• make Log Structured Merge (LSM) work well with MongoDB

– out-of-cache, write-heavy workloads• adding encryption• more advanced transactional semantics in the storage engine API

Page 28: MongoDB World 2015 - A Technical Introduction to WiredTiger

Thanks!

Questions?

Michael [email protected]