20131112 pluk fractal trees theory to practice
TRANSCRIPT
-
8/12/2019 20131112 Pluk Fractal Trees Theory to Practice
Fractal Tree Indexes
Theory to Practice
Percona Live London 2013
Tim Callaghan, [email protected]
@tmcallaghan
Tuesday, November 12, 13
-
Ever seen this?
IO Utilization Graph, performance is IO limited
-
Who is Tokutek?
Tokutek builds high-performance database software!
TokuDB - storage engine for MySQL and MariaDB
TokuMX - storage engine for MongoDB
HDD & SSD storage
Storage Engine
Developer Interface
-
Who am I?
17 year database consumer
schema design, development, deployment
database administration + infrastructure
mostly Oracle
5 year database producer
2 years @ VoltDB
2+ years @ Tokutek
-
Housekeeping
Feedback is important to me
Ideas for Webinars or Presentations?
Who's using MongoDB?
Anyone using TokuDB or TokuMX?
Please ask questions
-
Agenda
Why Fractal Tree indexes are cool
What they enable in MySQL (TokuDB)
What they enable in MongoDB (TokuMX)
Q+A
-
Indexing:
B-trees and Fractal Tree Indexes
-
B-trees
-
B-tree Overview - vocabulary
Internal Nodes - Path to data
Leaf Nodes - Actual Data - Sorted
Pointers
Pivots
-
B-tree Overview - example
root: 22
internal: 10 | 99
leaves: [2,3,4] [10,20] | [22,25] [99]
* Pivot Rule is >=
-
B-tree Overview - search
root: 22
internal: 10 | 99
leaves: [2,3,4] [10,20] | [22,25] [99]
Find 25
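The "Find 25" walk can be sketched in a few lines of Python. This is only an illustration of the pivot rule (>=) on the slide's sample tree; the tuple/list node layout and the `search` helper are assumptions made for the example, not how any real storage engine lays out nodes.

```python
from bisect import bisect_right

# Hypothetical in-memory model of the slide's three-level tree.
# An internal node is (pivots, children); a leaf is a sorted list of keys.
leaves = [[2, 3, 4], [10, 20], [22, 25], [99]]
tree = ([22], [([10], leaves[0:2]), ([99], leaves[2:4])])

def search(node, key):
    """Return (found, path); path lists the child index chosen at each level."""
    path = []
    while isinstance(node, tuple):
        pivots, children = node
        # Pivot rule is >=: a key equal to a pivot goes to the right child.
        idx = bisect_right(pivots, key)
        path.append(idx)
        node = children[idx]
    return key in node, path

print(search(tree, 25))  # (True, [1, 0]): right at the root, left at pivot 99
```

Each level visited is one node read; when that node is not in RAM it becomes one IO.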
-
B-tree Overview - insert
root: 22
internal: 10 | 99
leaves: [2,3,4] [10,15,20] | [22,25] [99]
Insert 15
-
B-tree Overview - performance
root (RAM): 22
internal (RAM): 10 | 99
leaves (DISK): [2,3,4] [10,20] | [22,25] [99]
Performance is IO limited when data > RAM: one IO is needed for each insert/update
(actually it's one IO for every index on the table)
-
Fractal Tree Indexes
-
Fractal Tree Indexes
similar to B-trees
store data in leaf nodes
use index key for ordering
different than B-trees
message buffers
big nodes (4MB vs. ~16KB)
All internal nodes have message buffers
As buffers overflow, they cascade down the tree
Messages are eventually applied to leaf nodes
-
Fractal Tree Indexes - sample data
root: 25
internal: 10 | 99
leaves: [2,3,4] [10,20] | [22,25] [99]
Looks a lot like a b-tree!
-
insert 15;
Fractal Tree Indexes - insert
root: 25
internal: 10 | 99
leaves: [2,3,4] [10,20] | [22,25] [99]
insert (15)
search operations must consider messages along the way
messages cascade down the tree as buffers fill up
they are eventually applied to the leaf nodes, hundreds or thousands of operations for a single IO
CPU and cache are conserved as important data is not ejected
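A minimal sketch of the buffering idea, assuming a toy two-level tree with a single root buffer (a real Fractal Tree index has buffers in every internal node and far larger nodes). The class name, the buffer capacity of 4, and the `leaf_writes` counter are all hypothetical and exist only to show several inserts being paid for with one leaf write.

```python
from bisect import bisect_right, insort

class BufferedTree:
    """Toy two-level tree: one root with a message buffer above sorted leaves."""

    def __init__(self, pivots, leaves, buffer_capacity=4):
        self.pivots = pivots              # pivot rule is >=, as on the B-tree slides
        self.leaves = leaves              # sorted lists standing in for on-disk leaf nodes
        self.buffer = []                  # pending insert messages held in the root
        self.buffer_capacity = buffer_capacity
        self.leaf_writes = 0              # stand-in for leaf IOs

    def insert(self, key):
        self.buffer.append(key)           # no leaf is touched yet
        if len(self.buffer) >= self.buffer_capacity:
            self.flush()

    def flush(self):
        touched = set()
        for key in self.buffer:
            idx = bisect_right(self.pivots, key)
            insort(self.leaves[idx], key)
            touched.add(idx)
        self.leaf_writes += len(touched)  # each touched leaf is written once per flush
        self.buffer.clear()

    def contains(self, key):
        # Searches must consider buffered messages along the way.
        if key in self.buffer:
            return True
        return key in self.leaves[bisect_right(self.pivots, key)]

tree = BufferedTree(pivots=[10, 22, 99], leaves=[[2, 3, 4], [10, 20], [22, 25], [99]])
tree.insert(15)
print(tree.contains(15), tree.leaf_writes)  # True 0: visible to readers, no leaf write yet
```

Inserting three more keys that belong to the same leaf fills the buffer and triggers one flush, so four inserts cost a single leaf write in this model.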
-
Fractal Tree Indexes - other operations
root: 25
internal: 10 | 99
leaves: [2,3,4] [10,20] | [22,25] [99]
add_column(c4 bigint), delete(99), increment(22,+5)
...
insert(100), delete(8), delete(2)
insert(8)
Lots of operations can be messages!
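One way to picture "operations as messages" is a reader that replays pending messages over what the leaf currently holds. The sketch below is an assumption-laden illustration: the `(op, key, arg)` tuple shape, the `read` helper, and the sample values are invented for the example, and schema messages such as add_column are left out.

```python
def read(leaf, buffer, key):
    """Return the value for key after replaying pending messages in arrival order."""
    value = leaf.get(key)                 # None means the key is absent from the leaf
    for op, k, arg in buffer:
        if k != key:
            continue
        if op == "insert":
            value = arg
        elif op == "delete":
            value = None
        elif op == "increment" and value is not None:
            value += arg
    return value

leaf = {22: 100, 99: 7}                   # hypothetical key -> value rows in a leaf
buffer = [("delete", 99, None), ("increment", 22, 5),
          ("insert", 8, 1), ("delete", 8, None)]

print(read(leaf, buffer, 22))  # 105: the increment is applied at read time
print(read(leaf, buffer, 99))  # None: the delete hides the row
```

The leaf itself is unchanged until the messages are flushed down to it.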
-
TokuDB
Fractal Tree Indexing +
MySQL/MariaDB
-
What is TokuDB?
Transactional MySQL Storage Engine - think InnoDB
Available for MySQL 5.5 and MariaDB 5.5
ACID and MVCC
Free/OSS Community Edition
http://github.com/Tokutek/ft-engine
Enterprise Edition
Commercial support + hot backup
Performance + Compression + Agility
-
TokuDB Performance
Warning - Benchmarks Ahead!
-
Indexed Insertion Performance
High-performance insert/update/delete for large databases (> RAM) while maintaining indexes
* old numbers, now > 25K/sec
-
Sysbench Performance
Sysbench read/write workload, > RAM
The fastest IO is the one you never have to do (compression)
-
Performance Advantages
Efficient index maintenance, especially secondary indexes
Clustered secondary indexes
Additional copy of the row is stored in the index
No additional IO to get row data from primary key
Think of it as a better covering index (it covers all non-indexed columns)
Compression eliminates size concerns
Big blocks = sequential IO for range scans
Basement nodes are always co-located
Multi-threaded bulk loader
-
TokuDB Compression
-
Compression: TokuDB vs. InnoDB
InnoDB compression misses force node splits, which greatly reduces performance
MySQL 5.6 dynamic padding (from FB), less cache
Larger block size and flexible on-disk size wins!
Multiple compression algorithms (lzma, quicklz, zlib)
Larger, less frequent writes (much less IO)
Why it matters on spinning disks:
Compressed reads and amortized compressed writes overcome IO limitations
Why it matters on flash/SSD:
Buy less (250GB at 10x compression holds the same as 2.5TB)
Large/less frequent writes are flash friendly
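The "buy less" arithmetic, written out as a small sketch. The helper names are hypothetical, and the 10x ratio is the slide's own example figure rather than a guaranteed result; real ratios depend on the data.

```python
def effective_capacity_gb(physical_gb, compression_ratio):
    """Logical data that fits on a device at a given compression ratio."""
    return physical_gb * compression_ratio

def devices_needed(data_gb, physical_gb, compression_ratio):
    """Whole devices required to hold data_gb of uncompressed data."""
    per_device = effective_capacity_gb(physical_gb, compression_ratio)
    return -(-data_gb // per_device)      # ceiling division

print(effective_capacity_gb(250, 10))     # 2500 GB, i.e. 2.5TB of logical data
print(devices_needed(2500, 250, 10))      # 1 device with 10x compression
print(devices_needed(2500, 250, 1))       # 10 devices without compression
```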
-
Compression + IO Reduction
Server was at 90% IO utilization with InnoDB, 10% IO utilization with TokuDB
-
Compression Performance
iiBench benchmark
-
Compression Achieved
log data (extremely compressible)
-
TokuDB Agility
-
The Challenge of MySQL Schema Changes
Common schema changes can take hours in MySQL
Adding, dropping, or expanding a column
Adding an index
And the table is unavailable for writes during the process
As a workaround, people generally
Use a replication slave, then swap with master
Use helper tools: Percona OSC, MySQL 5.6
o These have IO, CPU, RAM consequences
-
Schema Changes Without Downtime
In TokuDB, column add/drop/expand is instantaneous
It's just a message
Indexes can be created in the background while the table is fully available
TokuDB just builds the index; it does not rebuild the table (MySQL is getting better)
-
TokuMX
Fractal Tree Indexing +
MongoDB
-
What is TokuMX?
TokuMX = MongoDB with improved storage (Fractal Tree indexes)
Drop-in replacement for MongoDB v2.2 applications
Including replication and sharding
Same data model
Same query language
Drivers just work
Open Source
http://github.com/Tokutek/mongo
Performance + Compression + Transactions
-
MongoDB Storage
PK index (_id + pointer)
root: 18
internal: 4 | 5555
leaves: [(1,ptr5)] [(4,ptr1),(12,ptr8)] | [(19,ptr7)] [(10000,ptr2)]
Secondary index (foo + pointer)
root: 85
internal: 40 | 120
leaves: [(2,ptr5),(22,ptr6)] [(50,ptr4)] | [(100,ptr7)] [(222,ptr3)]
db.test.insert({foo:55})
db.test.ensureIndex({foo:1})
memory mapped heap
The pointer tells MongoDB where to look in the heap for the requested document (another IO)
-
TokuMX Storage
PK index (_id + document)
root: 18
internal: 4 | 5555
leaves: [(1,doc)] [(4,doc),(12,doc)] | [(19,doc)] [(10000,doc)]
Secondary index (foo + _id)
root: 85
internal: 40 | 120
leaves: [(2,4),(22,12)] [(50,19)] | [(100,10000)] [(222,1)]
db.test.insert({foo:55})
db.test.ensureIndex({foo:1})
memory mapped heap
One less IO per _id lookup, document is clustered in the index
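The difference between the two layouts can be sketched with plain dictionaries. This is a rough model under stated assumptions: only the final leaf or heap reads are counted (internal nodes are assumed cached), the pointer names are hypothetical, and the function names are invented for the example.

```python
# MongoDB-style layout: indexes hold pointers into a separate heap.
heap = {"ptrA": {"_id": 4, "foo": 2}, "ptrB": {"_id": 12, "foo": 22}}
mongo_pk = {4: "ptrA", 12: "ptrB"}            # _id -> pointer

# TokuMX-style layout: the PK index holds the document; secondaries hold _id.
toku_pk = {4: {"_id": 4, "foo": 2}, 12: {"_id": 12, "foo": 22}}
toku_foo = {2: 4, 22: 12}                     # foo -> _id, as on the slide

def mongo_find_by_id(_id):
    ptr = mongo_pk[_id]          # read 1: PK index leaf
    return heap[ptr], 2          # read 2: heap fetch through the pointer

def toku_find_by_id(_id):
    return toku_pk[_id], 1       # read 1: the PK index leaf already holds the document

def toku_find_by_foo(foo):
    _id = toku_foo[foo]          # read 1: secondary index leaf
    return toku_pk[_id], 2       # read 2: PK index leaf

print(mongo_find_by_id(12)[1], toku_find_by_id(12)[1])  # 2 1
```

A lookup through a non-clustered secondary index still takes a second read in this model, which is the cost a clustered secondary index is meant to remove.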
-
TokuMX Performance
-
Performance - Indexed Insertion
100 million inserts into a collection with 3 secondary indexes
-
Performance - Inserts on Indexed Arrays
Indexed Insertion: Multikey (100 inserts per doc)
-
Performance - Replication
TokuMX replication allows secondary servers to process replication without IO
Simply injecting messages into the Fractal Tree Indexes on the secondary server
The hard work was done on the primary
o Uniqueness checking
o Transactional locking
o Update effort (read-before-write)
Elimination of replication lag
Your secondaries are fully available for read scaling! Wasn't that the point?
-
Performance - Lock Refinement
TokuMX performs locking at the document level
Extreme concurrency!
Lock granularity, coarsest to finest:
instance (MongoDB v2.0)
database (MongoDB v2.2)
collection
document (TokuMX)
-
Performance - Lock Refinement
-
Performance - Lock Refinement + Reduced IO
Sysbench benchmark (> RAM)
-
Performance - Reduced IO
Indexed insertion benchmark
-
Performance - Clustered Indexes
Clustered secondary indexes
Additional copy of the document is stored in the index
No additional IO to get row data from primary key
Think of it as a better covered index (it covers all non-indexed fields)
Good for point queries, great for range scans
Compression eliminates size concerns
-
Performance - Memory Management
Two approaches to memory management
MongoDB = memory-mapped files
o Operating system determines what data is important
TokuMX = managed cache
o User defined size
o TokuMX determines what data is important
Run multiple TokuMX instances on a single server
Each has its own fixed cache size
-
TokuMX Compression
-
Compression
MongoDB does not offer compression
Compressed file systems?
Shortened field names?
o Remember: each field name is stored in every single document
TokuMX easily achieves 5x-10x compression
Buy less disk or flash
Compressed reads and writes reduce overall IO
TokuMX supports 3 compression types
zlib, quicklz, lzma (size vs. speed)
all data is compressed
Use descriptive field names!
They are easy to compress
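A quick way to see why repeated field names compress well, using Python's standard zlib on JSON text. This is only a stand-in: MongoDB and TokuMX store BSON and TokuMX compresses whole blocks, so the absolute numbers will differ. The field names and document values are invented for the example.

```python
import json
import zlib

def sizes(docs):
    """Return (raw_bytes, zlib_bytes) for a batch of JSON-encoded documents."""
    raw = "\n".join(json.dumps(d) for d in docs).encode("utf-8")
    return len(raw), len(zlib.compress(raw))

n = 1000
descriptive = [{"customer_identifier": i, "transaction_amount": i * 3,
                "created_timestamp": 1384214400 + i} for i in range(n)]
terse = [{"c": i, "a": i * 3, "t": 1384214400 + i} for i in range(n)]

long_raw, long_z = sizes(descriptive)
short_raw, short_z = sizes(terse)
print("descriptive:", long_raw, "->", long_z)
print("terse:      ", short_raw, "->", short_z)
```

Uncompressed, the descriptive names cost dozens of extra bytes per document; after compression most of that overhead should disappear, because each repeated name becomes a short back-reference.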
-
Compression
31 million documents, bit torrent peer data
http://cs.brown.edu/~pavlo/torrent/
-
TokuMX Transactions
-
ACID + MVCC
ACID
In MongoDB, multi-insertion operations allow for partial success
o Asked to store 5 documents, 3 succeeded
We offer all-or-nothing behavior
Document-level locking
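The difference between partial success and all-or-nothing can be sketched with a dictionary standing in for a collection. Both functions are hypothetical illustrations; the staged-copy approach is just one simple way to get atomic behavior in a toy model and says nothing about how TokuMX implements it.

```python
def insert_many_partial(collection, docs):
    """Stop at the first failure, keeping earlier inserts (partial success)."""
    for doc in docs:
        if doc["_id"] in collection:
            return False
        collection[doc["_id"]] = doc
    return True

def insert_many_atomic(collection, docs):
    """Apply to a staged copy and publish only if every insert succeeds."""
    staged = dict(collection)
    for doc in docs:
        if doc["_id"] in staged:
            return False                  # nothing was published
        staged[doc["_id"]] = doc
    collection.clear()
    collection.update(staged)
    return True

batch = [{"_id": i} for i in (1, 2, 3, 3, 4)]   # the fourth document has a duplicate _id
a, b = {}, {}
print(insert_many_partial(a, batch), len(a))    # False 3: asked for 5, 3 were stored
print(insert_many_atomic(b, batch), len(b))     # False 0: all or nothing
```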
MVCC
In MongoDB, queries can be interrupted by writers
o The effects of these writers are visible to the reader
TokuMX offers MVCC
o Reads are consistent as of the operation start
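"Consistent as of the operation start" can be illustrated with a toy versioned store. The class and method names are assumptions made for this sketch, and it ignores transactions, garbage collection, and concurrency control.

```python
class VersionedStore:
    """Toy MVCC store: each key keeps (commit_version, value) pairs."""

    def __init__(self):
        self.version = 0
        self.history = {}                 # key -> list of (version, value), oldest first

    def write(self, key, value):
        self.version += 1
        self.history.setdefault(key, []).append((self.version, value))

    def snapshot(self):
        return self.version               # a reader remembers the version at its start

    def read(self, key, as_of):
        value = None
        for version, v in self.history.get(key, []):
            if version <= as_of:
                value = v                 # keep the newest value visible to the snapshot
        return value

store = VersionedStore()
store.write("a", 1)
snap = store.snapshot()                   # the query starts here
store.write("a", 2)                       # a writer commits while the query runs

print(store.read("a", snap))              # 1: the in-flight query does not see the write
print(store.read("a", store.snapshot()))  # 2: a new query does
```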
-
Multi-statement Transactions
TokuMX brings the following to MongoDB
db.runCommand({beginTransaction: 1, isolation: "mvcc"})
...perform 1 or more operations
db.runCommand("rollbackTransaction") or db.runCommand("commitTransaction")
Not allowed in sharded environments
mongos will reject
-
Tim Callaghan
VP/Engineering, [email protected]
@tmcallaghan
Questions?