the hive think tank: rocking the database world with rocksdb

Post on 08-Jan-2017

516 Views

Category:

Technology

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

MySQL + RocksDBBetter Storage Efficiency Than InnoDB

Feb 3, 2016Siying Dong, Software EngineerDatabase Engineering Team @ Facebook

Why Another MySQL Storage Engine ?

Facebook Website Architecture

MySQLDatabases

Web serversData center

Caches

Facebook Website Architecture

MySQLDatabases

Web serversData center

Caches

SSDs

Facebook Website Architecture

▪ What do the system metrics look like?MySQL

Databases

Web serversData center

Caches

SSDs

Measurements of MySQL Hosts’ Actual Resource Usages ▪ Read IOPS: < 5% ▪ Write IOPS: < 5%▪ Peak Write Bandwidth: < 15%▪ CPU: < 20%▪ Write Endurance: last more than 3 years.▪ Space is the bottleneck

RocksDB Storage Engine in MySQL(MyRocks)

▪ https://github.com/MySQLOnRocksDB/mysql-5.6

RocksDB vs. InnoDB

DB Size Comparison

InnoDB RocksDB00.20.40.60.8

11.2

DB Size (Relative)

Write Amplification Comparison

InnoDB RocksDB00.20.40.60.8

11.2 Bytes Written (Rel-

ative)

Why is RocksDB better?

Lower Space Amplification

RowRow

RowWasted

RowRow

RowWasted

Using 8KB space on storage

Waste

4KB 4KB

InnoDB RocksDBCompressed to 5KB

Uncompressed 16KB pageRowRow

RowWasted

Level 1

Level 2

Level 3

Level 4

Target 1GB

Target 10 GB

Target 100 GB

Target 1000 GB

stale

Level 0

stalestale

stale

stale

stale

Lower Write Amplification (Worst Case)

Row

Read

Modify

Write

Row

Row

Row

Row

Row

Row

Row

Row

Row

Write Amp = Page size / row size

InnoDB RocksDB

Level 1

Level 2

Level 3

Level 4

Target 1GB

Target 10 GB

Target 100 GB

Target 1000 GB

Level 0

Merge

Merge

Merge

Merge

flush

Write amp 1

Write Amp 10

Write Amp 10

Write Amp 10

Write Amp 10

How about query performance?

Queries Per Second (LinkBench)

Reads Per Query (LinkBench)

New RocksDB Features to Support MySQL

Transactions▪ Optimistic and Pessimistic▪ MyRocks Uses Pessimistic

“Single Delete”▪ Secondary index is insert/delete only, never upate▪ Updating a column in secondary key generates one delete + one insert

▪ We can drop “single delete” tombstone as soon as it meets a value

Delete Files in Range▪ Drop Table/Index Needs to claim space fast

Delete Files in Range▪ Drop Table/Index Needs to claim space fast

L0

L1

L2

L3

Table A Table B Table C

Delete Files in Range

L0

L1

L2

L3

Table A Table B Table C

▪ Drop Table/Index Needs to claim space fast

Siying Dong

Delete Files in Range

L0

L1

L2

L3

Table A Table B Table C

▪ Drop Table/Index Needs to claim space fast

Siying Dong

Delete Files in Range

L0

L1

L2

L3

Table A Table C

▪ Drop Table/Index Needs to claim space fast

Siying Dong

What we worked around?

Reverse Column order▪ RocksDB’s Prev() is much slower than Next()▪ Reverse column order to serve common query better

Optimizer Stats▪ Query plans need index statistics▪ MyRocks stores index statistics in data dictionary (similar to InnoDB)

▪ When creating data files (SST files), statistics are also added to the SST files, and data dictionary is also updated

Limitation?

MyRocks LimitationLimitation Plan to Address

MyRocks LimitationLimitation Plan to AddressNot yet support Online DDL, Foreign Key, Spatial Index, and Fulltext Index

Support them

MyRocks LimitationLimitation Plan to AddressNot yet support Online DDL, Foreign Key, Spatial Index, and Fulltext Index

Support them

No next key locking support

MyRocks LimitationLimitation Plan to AddressNot yet support Online DDL, Foreign Key, Spatial Index, and Fulltext Index

Support them

No next key locking supportOnly support replication using row-based binary logging. (durability with XA not supported)

RocksDB to support Two-Phase-Commit. MyRocks to use it to support replication using statement-based binary logging.

MyRocks LimitationLimitation Plan to AddressNot yet support Online DDL, Foreign Key, Spatial Index, and Fulltext Index

Support them

No next key locking supportOnly support replication using row-based binary logging.(durability with XA not supported)

RocksDB to support Two-Phase-Commit. MyRocks to use it to support replication using statement-based binary logging.

Either ORDER BY DESC or ASC is slower

Improve RocksDB Prev() performance, to narrow performance gap between ORDER BY DESC and ASC.

Conclusion

Conclusion▪ MyRocks vs. InnoDB

▪ Better Space and Write Amplification▪ Comparable performance

▪ Features in RocksDB to Support MyRocks:▪Transactions▪“Single Delete”▪Delete Files in Range

▪ MyRocks has some limitations and they are being addressed.

(c) 2009 Facebook, Inc. or its licensors.  "Facebook" is a registered trademark of Facebook, Inc.. All rights reserved. 1.0

top related