MongoDB World 2015 - A Technical Introduction to WiredTiger

Post on 28-Jul-2015

A Technical Introduction to WiredTiger

Michael Cahill

Director of Engineering (Storage), MongoDB

You may have seen this:

or this…

How does WiredTiger do it?

What’s different about WiredTiger?

• Document-level concurrency
• In-memory performance
• Multi-core scalability
• Checksums
• Compression
• Durability with and without journaling

This presentation is not…

• How to write stand-alone WiredTiger apps
• How to configure MongoDB with WiredTiger for your workload

WiredTiger Background

Why create a new storage engine?

• Minimize contention between threads
  – lock-free algorithms, hazard pointers
  – eliminate blocking due to concurrency control
• Hotter cache and more work per I/O
  – compact file formats
  – compression
  – big-block I/O

MongoDB’s Storage Engine API

• Allows different storage engines to "plug-in"
  – Different workloads have different performance characteristics
  – mmap is not ideal for all workloads
  – More flexibility
    • mix storage engines on same replica set/sharded cluster
• Opportunity to integrate further (HDFS, native encrypted, hardware optimized …)
• Great way for us to demonstrate WiredTiger’s performance

Storage Engine Layer

[Diagram: Example Future State. Applications (Content Repo, IoT Sensor Backend, Ad Service, Customer Analytics, Archive) sit on top of the MongoDB Query Language (MQL) + Native Drivers and the MongoDB Document Data Model, with Management and Security alongside. Storage engines shown: MMAP V1, WT, In-Memory, ?, ?. Diagram labels: Supported in MongoDB 3.0; Experimental; Future Possible Storage Engines.]

WiredTiger Architecture

WiredTiger Architecture

In-memory consistency

Per-file checkpoints

Trees in cache

[Diagram labels: 1 memory flush; read required; disk image]

Pages in cache

[Diagram labels: constructed during read; skiplist; reconciled during write]

Concurrency and multi-core scaling

Multiversion Concurrency Control (MVCC)

• Multiple versions of records kept in cache
• Readers see the version that was committed before the operation started
  – MongoDB “yields” turn large operations into small transactions
• Writers can create new versions concurrent with readers
• Concurrent updates to a single record cause write conflicts
  – MongoDB retries with back-off
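
The points above can be illustrated with a minimal, single-threaded sketch of snapshot reads and write conflicts. This is not WiredTiger's implementation: the names `Store`, `Txn`, `WriteConflict` and `update_with_retry` are hypothetical, and timestamps stand in for whatever visibility mechanism the engine really uses.

```python
import itertools
import time


class WriteConflict(Exception):
    """Raised when a record was changed after the transaction's snapshot."""


class Store:
    def __init__(self):
        self._clock = itertools.count(1)  # source of commit timestamps
        self._versions = {}               # key -> [(commit_ts, value)], oldest first
        self._last_commit = 0

    def begin(self):
        # A transaction sees everything committed before it started.
        return Txn(self, self._last_commit)


class Txn:
    def __init__(self, store, snapshot_ts):
        self.store = store
        self.snapshot_ts = snapshot_ts
        self.writes = {}

    def _conflicts(self, key):
        versions = self.store._versions.get(key, [])
        return bool(versions) and versions[-1][0] > self.snapshot_ts

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        # Newest version that was committed at or before the snapshot.
        for ts, value in reversed(self.store._versions.get(key, [])):
            if ts <= self.snapshot_ts:
                return value
        return None

    def write(self, key, value):
        if self._conflicts(key):
            raise WriteConflict(key)
        self.writes[key] = value

    def commit(self):
        if any(self._conflicts(key) for key in self.writes):
            raise WriteConflict(list(self.writes))
        ts = next(self.store._clock)
        for key, value in self.writes.items():
            # Old versions stay in place for readers with older snapshots.
            self.store._versions.setdefault(key, []).append((ts, value))
        self.store._last_commit = ts


def update_with_retry(store, key, fn, attempts=5):
    """Retry a small read-modify-write transaction with exponential back-off."""
    for attempt in range(attempts):
        txn = store.begin()
        try:
            txn.write(key, fn(txn.read(key)))
            txn.commit()
            return
        except WriteConflict:
            time.sleep(0.001 * 2 ** attempt)
    raise WriteConflict(key)
```

A reader that began before a writer committed keeps seeing the older version, while a stale transaction that tries to update the same record gets a `WriteConflict` and is expected to retry on a fresh snapshot.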

Transforming data during I/O

Checksums

• A checksum is stored with every page
• WiredTiger stores the checksum with the page address (typically in a parent page)
  – Extra safety against reading an old, valid page image
• Checksums are validated during page read
  – Detects filesystem corruption, random bitflips

Compression

• WiredTiger uses snappy compression by default in MongoDB
• Supported compression algorithms
  – snappy [default]: good compression, low overhead
  – zlib: better compression, more CPU
  – none
• Indexes are compressed using prefix compression
  – allows compression in memory
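
Prefix compression can be sketched as follows: each key in a sorted run is stored as the number of leading characters it shares with the previous key, plus the remaining suffix. The function names and the `(shared, suffix)` tuple layout are assumptions for illustration and do not reflect WiredTiger's actual page format.

```python
import os


def prefix_compress(sorted_keys):
    """Encode each key as (shared_prefix_length, suffix) relative to the previous key."""
    entries = []
    prev = ""
    for key in sorted_keys:
        shared = len(os.path.commonprefix([prev, key]))
        entries.append((shared, key[shared:]))
        prev = key
    return entries


def prefix_decompress(entries):
    """Rebuild the full keys by reusing the prefix of the previously decoded key."""
    keys = []
    prev = ""
    for shared, suffix in entries:
        key = prev[:shared] + suffix
        keys.append(key)
        prev = key
    return keys
```

Unlike block compression such as snappy or zlib, this encoding needs no separate decompression pass over the whole page, which is one way to read the slide's remark that it allows compression in memory.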

Durability with and without Journal

Writing a checkpoint

1. Write the leaves
2. Write the internal pages, including the root
   – the old checkpoint is still valid
3. Sync the file
4. Write the new root’s address to the metadata
   – free pages from old checkpoints once the metadata is durable
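
The four steps can be walked through with a toy copy-on-write file. Everything here is hypothetical (`ToyFile`, `checkpoint`, `read_checkpoint`, a two-level tree with one root): the sketch only shows why a crash before step 4 leaves the previous checkpoint readable.

```python
class ToyFile:
    """Append-only block store plus one metadata slot naming the checkpoint root."""

    def __init__(self):
        self.blocks = []           # blocks are never overwritten in place
        self.synced = 0            # number of blocks known to be durable
        self.metadata_root = None  # address of the last published root

    def write_block(self, content):
        self.blocks.append(content)
        return len(self.blocks) - 1

    def sync(self):
        self.synced = len(self.blocks)


def checkpoint(f, leaves):
    # 1. Write the leaves to fresh blocks.
    leaf_addrs = [f.write_block(("leaf", dict(leaf))) for leaf in leaves]
    # 2. Write the internal pages, including the root (a single root here).
    #    The metadata still names the old root, so the old checkpoint is valid.
    root_addr = f.write_block(("root", leaf_addrs))
    # 3. Sync the file so the new blocks are durable.
    f.sync()
    # 4. Publish the new root's address in the metadata.
    old_root = f.metadata_root
    f.metadata_root = root_addr
    # The caller may free the old checkpoint's pages once the metadata is durable.
    return old_root


def read_checkpoint(f):
    """Return the key/value contents reachable from the published root."""
    if f.metadata_root is None:
        return {}
    _, leaf_addrs = f.blocks[f.metadata_root]
    result = {}
    for addr in leaf_addrs:
        result.update(f.blocks[addr][1])
    return result
```

Blocks written by an interrupted checkpoint are simply unreachable from the metadata, so a reader after a crash sees the last checkpoint that completed step 4.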

Durability without Journaling

• MMAPv1 uses a write-ahead log (journal) to guarantee consistency
  – Running with “nojournal” is unsafe
• WiredTiger doesn't have this need: no in-place updates
  – Write-ahead log can be truncated at checkpoints
    • Every 2GB or 60sec by default – configurable
  – Updates are written to optional journal as they commit
    • Not flushed on every commit by default
    • Recovery rolls forward from last checkpoint
• Replication can guarantee durability

Journal and recovery

• Optional write-ahead logging
• Only written at transaction commit
• snappy compression by default
• Group commit
• Automatic log archive / removal
• On startup, we rely on finding a consistent checkpoint in the metadata
• Check LSNs in the metadata to figure out where to roll forward from
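
Roll-forward recovery can be sketched as replaying journal records newer than the checkpoint's LSN on top of the checkpoint contents. The record layout `(lsn, key, value)` and the `recover` function are assumptions for illustration; the real journal format and recovery code are more involved.

```python
def recover(checkpoint_state, checkpoint_lsn, journal):
    """Rebuild state from a consistent checkpoint plus later journal records.

    checkpoint_state: dict of key -> value as of the checkpoint
    checkpoint_lsn:   LSN recorded in the metadata for that checkpoint
    journal:          iterable of (lsn, key, value) records in LSN order
    """
    state = dict(checkpoint_state)
    for lsn, key, value in journal:
        # Records at or below the checkpoint LSN are already in the checkpoint.
        if lsn > checkpoint_lsn:
            state[key] = value
    return state
```

Records up to the checkpoint LSN are skipped rather than replayed, which is also why the log before that point can be archived or removed.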

So, what’s different about WiredTiger?

• Multiversion Concurrency Control
  – No lock manager
• Non-locking data structures
  – Multi-core scalability
• Different representation of in-memory data vs on-disk
  – Enables checksums, compression
• Copy-on-write storage
  – Durability without a journal

What’s next?

What’s next for WiredTiger?

• WiredTiger btree access method is optional in 3.0
• plan to make it the default in 3.2
• tuning for (many) more workloads
• make Log Structured Merge (LSM) work well with MongoDB
  – out-of-cache, write-heavy workloads
• adding encryption
• more advanced transactional semantics in the storage engine API

Thanks!

Questions?

Michael Cahill

[email protected]