Tunable Memory in Couchbase Server 3.0: Couchbase Connect 2014

Tunable Memory in Couchbase Server 3.0 Chiyoung Seo Software Architect, Couchbase Inc.

Upload: couchbase

Post on 20-Aug-2015



TRANSCRIPT

Page 1: Tunable Memory in Couchbase Server 3.0: Couchbase Connect 2014

Tunable Memory in Couchbase Server 3.0

Chiyoung Seo, Software Architect, Couchbase Inc.

Page 2

©2014 Couchbase, Inc. 2

Contents

Data Manager in Couchbase Server
- Database Bucket Architecture
- NRU-Based Cache Management
- Value-Only Ejection

Cache Management in Couchbase Server 3.0
- Full Metadata Ejection
- Performance Impact
- Future Work for Performance Enhancements

Summary

Page 3

Database Bucket Architecture

Page 4

Couchbase Cluster

[Diagram: a cluster of five Couchbase Server nodes, each running a Cluster Manager and a Data Manager]

Page 5

Couchbase Server Architecture

[Diagram: each Couchbase Server node runs a Cluster Manager and a Data Manager]

Cluster Manager (Erlang/OTP)
- On each node: heartbeat, process monitor, global singleton supervisor, configuration manager
- One per cluster: rebalance orchestrator, node health monitor
- vBucket state and replication manager
- HTTP REST management API / Web UI
- Ports: HTTP 8091, Erlang port mapper 4369, distributed Erlang 21100-21199

Data Manager
- Memcached
- Couchbase EP Engine (port 11210, Memcapable 2.0)
- Moxi (port 11211, Memcapable 1.0)
- Storage interface and persistence layer
- Query engine (port 8092, Query API)

Page 6

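Data Manager Architecture

[Diagram: Memcached (port 11210) hosts the bucket engine, which manages multiple database buckets (Database Bucket, Database Bucket, …); each database bucket reaches its storage engine through the storage interface, and all buckets run on a shared thread pool]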

Page 7

Database Bucket Architecture

- Engine APIs (get, set, del, add, append, DCP, …)
- Partition hash tables (active, replica, active, …), each with its own checkpoints
- Data replicator (user-configured replica count = 1)
- I/O completion notifier
- Append-only B-tree storage engine
- Shared thread pool with reader threads, writer threads, non-IO threads, and aux-IO threads, running tasks such as batch readers, flushers, backfill, the item pager, the expiry pager, and the checkpoint manager

Page 8

Partition Hash Table

- Hash buckets
  - Each hash bucket holds a linked list of items
  - Engine parameter “ht_size” configures the initial number of hash buckets
- Multiple locks synchronize access to the hash buckets
  - Engine parameter “ht_locks” configures the number of locks
- Hash buckets are dynamically resized by the daemon task “hash table resizer”
  - A non-IO thread runs the hash table resizer task periodically
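The layout above (chained buckets, fewer locks than buckets, periodic resizing) can be sketched as follows. This is a minimal illustration, not the EP Engine code: only the parameter names ht_size and ht_locks come from the slide; the class, its methods, and the "take every lock, then rehash" resize strategy are assumptions chosen for clarity.

```python
import threading


class PartitionHashTable:
    """Hypothetical sketch: chained hash buckets guarded by striped locks."""

    def __init__(self, ht_size=47, ht_locks=5):
        # Each bucket is a chain of (key, value) pairs; a list stands in for a linked list.
        self.buckets = [[] for _ in range(ht_size)]
        # Fewer locks than buckets: bucket i is guarded by lock i % ht_locks.
        self.locks = [threading.Lock() for _ in range(ht_locks)]

    def _locked_bucket(self, key):
        # Lock the bucket for `key`, retrying if a resize changed the table
        # size between computing the index and acquiring the lock.
        while True:
            size = len(self.buckets)
            index = hash(key) % size
            lock = self.locks[index % len(self.locks)]
            lock.acquire()
            if len(self.buckets) == size:
                return index, lock
            lock.release()

    def set(self, key, value):
        index, lock = self._locked_bucket(key)
        try:
            chain = self.buckets[index]
            for pos, (k, _) in enumerate(chain):
                if k == key:
                    chain[pos] = (key, value)  # overwrite existing item
                    return
            chain.append((key, value))
        finally:
            lock.release()

    def get(self, key):
        index, lock = self._locked_bucket(key)
        try:
            for k, v in self.buckets[index]:
                if k == key:
                    return v
            return None
        finally:
            lock.release()

    def num_items(self):
        # Unsynchronized count, good enough for a single-threaded check.
        return sum(len(chain) for chain in self.buckets)

    def resize(self, new_size):
        # What a periodic resizer task might do: take every lock in a fixed
        # order, then rehash all items into new_size buckets.
        for lock in self.locks:
            lock.acquire()
        try:
            new_buckets = [[] for _ in range(new_size)]
            for chain in self.buckets:
                for k, v in chain:
                    new_buckets[hash(k) % new_size].append((k, v))
            self.buckets = new_buckets
        finally:
            for lock in self.locks:
                lock.release()
```

Normal operations hold exactly one lock while resize takes all of them in order, so the two cannot deadlock. The size re-check in `_locked_bucket` keeps a caller from using a bucket index computed before a resize.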

Page 9

Partition Hash Table

[Diagram: hash buckets 1 … n, each holding a linked list of items such as {Key: “K1”, Metadata: exp, cas, NRU, …, Value: “V1”}; the hash buckets are guarded by locks 1 … m]

Page 10

NRU-Based Cache Management

Page 11

Not Recently Used (NRU) Score

- Maintain a 2-bit NRU score per item in the hash table
- The NRU score is set to 2 for each new item and decremented by 1 on each READ access
- A score of 3 marks an item as a candidate for ejection by the item pager

NRU score transitions:

Before   After   Trigger
(new)    2       Initial value for a new item
3        2       Accessed by READ
2        1       Accessed by READ
1        0       Accessed by READ
0        0       Accessed by READ
0        1       Incremented by item pager
1        2       Incremented by item pager
2        3       Incremented by item pager
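The transition table can be read as a small state machine. The sketch below encodes it directly; the function names are hypothetical, and capping the item pager's increment at 3 is an assumption that follows from the score being only 2 bits wide.

```python
NRU_MIN = 0
NRU_MAX = 3       # a 2-bit score can only hold 0..3
NRU_INITIAL = 2   # score given to every new item


def on_read(score):
    """A READ access lowers the score by 1 (the item looks more recently used)."""
    return max(score - 1, NRU_MIN)


def on_item_pager_visit(score):
    """Each item pager pass raises the score by 1; 3 marks an ejection candidate."""
    return min(score + 1, NRU_MAX)


def replay(events, score=NRU_INITIAL):
    """Apply a sequence of "read" / "pager" events to one item's score."""
    for event in events:
        if event == "read":
            score = on_read(score)
        elif event == "pager":
            score = on_item_pager_visit(score)
        else:
            raise ValueError(f"unknown event: {event}")
    return score
```

For example, `replay(["pager"])` moves a new item from 2 to 3, which makes it ejectable, while a single read in between (`replay(["read", "pager"])`) keeps it at 2.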

Page 12

Item Pager

- Daemon task responsible for ejecting non-dirty items (items already persisted to disk) from the hash table
- Runs when the database bucket's memory usage goes above the high watermark, and ejects items until memory usage drops below the low watermark

Bucket memory configuration:
- Memory quota
- Memory high watermark (85% of the quota)
- Memory low watermark (75% of the quota)
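As a worked example of the watermark rule, the sketch below computes both watermarks from a bucket quota and how many bytes the item pager would aim to free. The 85% and 75% defaults are from the slide; the function names are hypothetical.

```python
HIGH_WATERMARK_PCT = 85   # default from the slide
LOW_WATERMARK_PCT = 75    # default from the slide


def watermarks(quota_bytes):
    """Return (high, low) watermarks in bytes for a bucket memory quota."""
    return (quota_bytes * HIGH_WATERMARK_PCT // 100,
            quota_bytes * LOW_WATERMARK_PCT // 100)


def item_pager_target(mem_used, quota_bytes):
    """Bytes to free: 0 unless usage is above the high watermark, else down to the low one."""
    high, low = watermarks(quota_bytes)
    if mem_used <= high:
        return 0
    return mem_used - low
```

With a 1000 MB quota, the pager stays idle at 800 MB used. At 900 MB it starts and keeps ejecting until 150 MB is freed, bringing usage down to the 750 MB low watermark.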

Page 13

Item Pager

Phase I
1. Scan the next partition hash table and collect items with NRU score 3.
2. Eject the items with NRU score 3.
3. Go to step 1 if memory usage is still above the low watermark.

Phase II
1. Scan the next partition hash table and increment each item's NRU score by 1.
2. If an item's NRU score becomes 3, eject the item if N < P, where:
   - N is a randomly generated number in the range [0, 1]
   - P is a probability based on the current memory usage, the low watermark, and the partition state (active vs. replica)
3. Go to step 1 if memory usage is still above the low watermark.
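The two phases can be simulated on toy data. This is a sketch under stated assumptions. The slide does not give the formula for P, so `eject_probability` is a hypothetical stand-in: the fraction of memory above the low watermark, doubled for replica partitions. Each phase also makes only a single pass over the partitions.

```python
import random
from dataclasses import dataclass


@dataclass
class Item:
    key: str
    value_size: int
    nru: int = 2           # new items start at NRU 2
    dirty: bool = False    # not yet persisted: never ejected
    resident: bool = True  # False once the value has been ejected


def eject_probability(mem_used, low_wm, active):
    # Hypothetical P: share of memory above the low watermark, doubled for replicas.
    if mem_used <= 0:
        return 0.0
    excess = max(mem_used - low_wm, 0) / mem_used
    return min(1.0, excess if active else 2 * excess)


def eject(item):
    item.resident = False
    return item.value_size


def run_item_pager(partitions, mem_used, low_wm, rng=random.random):
    """partitions: list of (is_active, {key: Item}). Returns memory used afterwards."""
    # Phase I: eject resident, clean items whose NRU score is already 3.
    for _active, table in partitions:
        if mem_used <= low_wm:
            return mem_used
        candidates = [it for it in table.values()
                      if it.nru == 3 and it.resident and not it.dirty]
        for item in candidates:
            mem_used -= eject(item)
    # Phase II: age every item; one whose score *becomes* 3 is ejected if N < P.
    for active, table in partitions:
        if mem_used <= low_wm:
            return mem_used
        for item in table.values():
            was_below = item.nru < 3
            item.nru = min(item.nru + 1, 3)
            if was_below and item.nru == 3 and item.resident and not item.dirty:
                if rng() < eject_probability(mem_used, low_wm, active):  # N < P
                    mem_used -= eject(item)
    return mem_used
```

Passing a deterministic `rng` (for example `lambda: 0.0`) makes the probabilistic step easy to test. A real pager would also keep cycling through partitions instead of stopping after one pass per phase.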

Page 14

Access Log Generator

- Periodic daemon task scheduled once per day (10 AM UTC by default) that:
  1. Scans each partition hash table to gather the list of currently resident items
  2. Writes the {key, metadata} of those resident items into the access log file
- The access log is used to restore the working set that was resident in memory before a node restart or crash

[Diagram: each shard (Shard 1 … Shard n) holds a set of partition hash tables (active and replica) and has its own access log generator and warm-up task]
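The generator's two steps (scan for resident items, write their {key, metadata}) are sketched below. The slide does not describe the access log's on-disk format, so the JSON-lines layout and both function names are hypothetical.

```python
import io
import json


def generate_access_log(partitions, out):
    """partitions: {vbucket_id: {key: (metadata, is_resident)}}; out: text file-like.

    Writes one JSON line per resident item and returns how many were written.
    """
    written = 0
    for vb_id, table in sorted(partitions.items()):
        for key, (metadata, is_resident) in table.items():
            if is_resident:  # only the in-memory working set is logged
                out.write(json.dumps({"vb": vb_id, "key": key, "meta": metadata}) + "\n")
                written += 1
    return written


def read_access_log(inp):
    """Read the entries back, e.g. during warm-up."""
    return [json.loads(line) for line in inp if line.strip()]


# Usage: log two partitions into an in-memory buffer, then read them back.
buf = io.StringIO()
generate_access_log({0: {"a": ({"cas": 1}, True), "b": ({"cas": 2}, False)}}, buf)
buf.seek(0)
entries = read_access_log(buf)
```

Only resident items are written, so the log stays proportional to the working set rather than the whole data set.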

Page 15

Cache Management: Value-Only Ejection

Page 16

Hash Table Item

- Each hash table item consists of {key, metadata, value}
- Metadata memory overhead is at least 40 bytes per item

[Diagram: a hash table item holds the key, the metadata, a blob pointer to the separately stored blob value, and a pointer to the next item]

Metadata:
- Expiration time
- CAS identifier
- Sequence number (DCP)
- Revision number (XDCR)
- Lock expiry (GetLocked API)
- Flags, NRU, …
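To make the item layout concrete, here is a hypothetical model of one hash table item, with its chain link and blob pointer. The field names only mirror the metadata listed on the slide; they are not the EP Engine's real member names, and a Python object says nothing about the actual byte layout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Blob:
    data: bytes                  # the value, stored separately from the item


@dataclass
class HashTableItem:
    key: str
    exptime: int = 0             # expiration time
    cas: int = 0                 # CAS identifier
    seqno: int = 0               # sequence number (DCP)
    rev_seqno: int = 0           # revision number (XDCR)
    lock_expiry: int = 0         # lock expiry (GetLocked API)
    flags: int = 0
    nru: int = 2                 # new items start at NRU 2
    blob: Optional[Blob] = None  # blob pointer; None once the value is ejected
    next: Optional["HashTableItem"] = None  # next item in the same hash bucket


def find(head, key):
    """Walk one hash bucket's chain looking for `key`."""
    node = head
    while node is not None:
        if node.key == key:
            return node
        node = node.next
    return None
```

Setting `blob` to `None` while keeping the item in the chain is exactly the value-only ejection state described next.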

Page 17

Value-Only Ejection

- The application's entire key space is maintained in the hash table (a highly cache-oriented architecture)
- The item pager ejects only an item's value from the hash table

[Diagram: Get(“foo”) finds the hash table item for “foo” (key, metadata, blob pointer, pointer to next item), but its blob value has been ejected, so a batch reader fetches the value from the storage engine with read_value(“foo”)]
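The read path in the diagram can be sketched with plain dictionaries standing in for the hash table and the storage engine. The class is hypothetical; `read_value` is named after the diagram's label and simply counts disk reads here.

```python
class ValueOnlyBucket:
    """Hypothetical sketch: every key and its metadata stay in memory; only values are ejected."""

    def __init__(self, disk):
        self.disk = disk  # storage-engine stand-in: {key: (metadata, value)}
        # Keys and metadata are loaded for the whole key space; values start non-resident.
        self.table = {key: {"meta": meta, "value": None} for key, (meta, _) in disk.items()}
        self.disk_reads = 0

    def eject(self, key):
        self.table[key]["value"] = None  # key and metadata remain in the hash table

    def read_value(self, key):
        # Stands in for the batch reader's background fetch of the value only.
        self.disk_reads += 1
        return self.disk[key][1]

    def get(self, key):
        item = self.table.get(key)
        if item is None:
            return None  # key absent: answered from memory, no disk I/O
        if item["value"] is None:
            item["value"] = self.read_value(key)
        return item["value"]
```

Because the whole key space is in memory, a lookup for a missing key never touches disk. That is the "key existence check" advantage listed on the next slide.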

Page 18

Value-Only Ejection

Pros
- Maximizes memory utilization
- High performance (latency, throughput) for create, read, update, and delete operations and for key existence checks

Cons
- High memory overhead, because the key and metadata of non-resident items stay in cache
- Slow system warm-up, because at least all keys and their metadata must be loaded
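A rough back-of-the-envelope calculation shows why the first con matters. Only the 40-byte metadata floor comes from the slides. The item count, the average key size, and the quota are hypothetical, and the estimate ignores hash-bucket arrays and allocator overhead, so the real figure would be higher.

```python
METADATA_BYTES = 40  # per-item lower bound from the slides


def key_and_metadata_bytes(num_items, avg_key_bytes, metadata_bytes=METADATA_BYTES):
    """Memory pinned by keys + metadata when every key stays in cache."""
    return num_items * (avg_key_bytes + metadata_bytes)


def memory_left_for_values(quota_bytes, num_items, avg_key_bytes):
    return quota_bytes - key_and_metadata_bytes(num_items, avg_key_bytes)


# Hypothetical example: 1 billion items, 20-byte keys, 64 GB bucket quota.
overhead = key_and_metadata_bytes(1_000_000_000, 20)
left = memory_left_for_values(64_000_000_000, 1_000_000_000, 20)
print(f"keys + metadata: {overhead / 1e9:.0f} GB, left for values: {left / 1e9:.0f} GB")
```

In this example, keys and metadata alone take 60 GB of the 64 GB quota. That leaves little room for values, which is the heavy-DGM situation full metadata ejection targets.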

Page 19

Couchbase Server 3.0: Full Metadata Ejection

Page 20

Full Metadata Ejection

- The application's entire key space doesn't need to be loaded in cache, which significantly reduces memory overhead in heavy DGM (Disk Greater than Memory) cases
- The item pager ejects an item's key and metadata along with its value

[Diagram: Get(“foo”) finds no hash table item for “foo”, so a batch reader fetches both metadata and value from the storage engine with read_meta_value(“foo”)]
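The same read path under full metadata ejection is sketched below for contrast with value-only mode. The class is hypothetical, and `read_meta_value` is named after the diagram's label. The key difference is that a miss in the hash table no longer proves the key is absent, so every miss costs a disk lookup.

```python
class FullEjectionBucket:
    """Hypothetical sketch: only resident items live in memory; ejection drops key, metadata, and value."""

    def __init__(self, disk):
        self.disk = disk    # storage-engine stand-in: {key: (metadata, value)}
        self.table = {}     # nothing needs to be loaded up front
        self.disk_reads = 0

    def eject(self, key):
        self.table.pop(key, None)  # key, metadata, and value all leave memory

    def read_meta_value(self, key):
        # Stands in for the batch reader fetching metadata and value together.
        self.disk_reads += 1
        return self.disk.get(key)  # None if the key does not exist on disk

    def get(self, key):
        item = self.table.get(key)
        if item is None:
            found = self.read_meta_value(key)  # even a non-existent key costs a disk read
            if found is None:
                return None
            meta, value = found
            item = self.table[key] = {"meta": meta, "value": value}
        return item["value"]
```

The trade-off is visible in the counters: memory holds only what was actually touched, but `get("missing")` now costs a disk read.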

Page 21

Implementation Impacts on APIs

Many read/write APIs require an item's metadata to be resident in cache:
- CAS (Compare and Set)
- Add
- Delete
- Touch
- GetMetaData
- SetWithMeta
- DeleteWithMeta

Page 22

Implementation Impacts on APIs

CAS (Compare and Set) API
- A CAS operation compares the item's CAS identifier sent by the client with the one on the server side
- It succeeds only if the two CAS identifiers are still the same

[Diagram: CAS(“foo”, 100, “value2”) arrives for a non-resident item; a batch reader calls read_metadata(“foo”) on the storage engine, which holds {“foo”, 100, “value1”}; the hash table item for “foo” then shows CAS id 100 in its metadata and blob value “value2”]
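The CAS flow for a non-resident item can be sketched as follows. It is a minimal illustration under assumptions: the function and exception names are hypothetical, and the new CAS value is passed in by the caller instead of being generated by the server. The point is that the comparison can only happen after the metadata has been read back from disk.

```python
class CasMismatch(Exception):
    """Raised when the client's CAS no longer matches the server's."""


def cas(table, disk, key, client_cas, new_value, new_cas):
    """Compare-and-set under full metadata ejection (hypothetical sketch).

    table:   resident items {key: {"cas": int, "value": ...}}
    disk:    storage-engine stand-in {key: (cas, value)}
    new_cas: the CAS the server would assign on success (generated elsewhere).
    """
    item = table.get(key)
    if item is None:
        # Metadata not resident: read_metadata(key) must go to disk first.
        on_disk = disk.get(key)
        if on_disk is None:
            raise KeyError(key)
        item = {"cas": on_disk[0], "value": None}  # metadata only; value stays on disk
        table[key] = item
    if item["cas"] != client_cas:
        raise CasMismatch(f"{key}: client CAS {client_cas} != server CAS {item['cas']}")
    item["cas"], item["value"] = new_cas, new_value
    return new_cas
```

A second CAS with the now-stale identifier 100 fails without touching the stored value, which is the guarantee CAS exists to provide.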

Page 23

Implementation Impacts on APIs

Add API
- Succeeds only if the item doesn't exist or has already expired
- If the item is not resident in full ejection mode, a disk lookup is required to determine whether it exists

Delete API
- Succeeds only if the item exists and has not been deleted yet
- A disk lookup is required for a non-resident item in full ejection mode
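Both rules come down to an existence check that falls back to disk when the item is not resident. The sketch below counts those lookups. The class and record fields are hypothetical, writes are kept in memory only, and treating an expired item as "not existing" for Delete is an assumption.

```python
class FullEjectionStore:
    """Hypothetical sketch of Add/Delete existence checks under full ejection."""

    def __init__(self, disk):
        self.table = {}      # resident records
        self.disk = disk     # {key: {"value", "exptime", "deleted"}}; exptime 0 = never
        self.disk_lookups = 0

    def _lookup(self, key):
        rec = self.table.get(key)
        if rec is None:      # not resident: existence must be checked on disk
            self.disk_lookups += 1
            rec = self.disk.get(key)
        return rec

    @staticmethod
    def _alive(rec, now):
        return (rec is not None and not rec["deleted"]
                and (rec["exptime"] == 0 or rec["exptime"] > now))

    def add(self, key, value, now):
        if self._alive(self._lookup(key), now):
            return False     # item already exists
        self.table[key] = {"value": value, "exptime": 0, "deleted": False}
        return True

    def delete(self, key, now):
        if not self._alive(self._lookup(key), now):
            return False     # nothing to delete
        self.table[key] = {"value": None, "exptime": 0, "deleted": True}  # tombstone
        return True
```

Operations on resident items (including their tombstones) are answered from memory. Only the first touch of a non-resident key pays for a disk lookup.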

Page 24

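Deletion of Expired Items

Value-Only Ejection Mode
[Diagram: the expiry pager scans every partition hash table and pushes deletions of expired items into the checkpoint queue]

Full Metadata Ejection Mode
[Diagram: the expiry pager scans the partition hash tables, and the DB compactor scans the storage engine; both push deletions of expired items into the checkpoint queue]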
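The reason for the extra compactor path can be shown in a few lines. In value-only mode every key is in a hash table, so the expiry pager sees every expired item. Under full ejection, some expired items exist only on disk. The function below is a hypothetical sketch: the compactor pass is modeled as a scan of a `{key: exptime}` dictionary, and the checkpoint queue is modeled as a list of keys.

```python
def expire_items(hash_tables, storage, now, full_ejection):
    """hash_tables: list of {key: exptime}; storage: {key: exptime}; 0 = never expires.

    Returns the keys whose deletions would be pushed into the checkpoint queue.
    """
    checkpoint_queue = []
    queued = set()

    def expired(exptime):
        return exptime != 0 and exptime <= now

    # Expiry pager: scan resident items in every partition hash table.
    for table in hash_tables:
        for key in [k for k, exp in table.items() if expired(exp)]:
            del table[key]
            checkpoint_queue.append(key)
            queued.add(key)

    # Full ejection only: a compactor-like pass catches expired items that are
    # on disk but not resident (and were not already handled above).
    if full_ejection:
        for key, exp in storage.items():
            resident = any(key in t for t in hash_tables)
            if expired(exp) and key not in queued and not resident:
                checkpoint_queue.append(key)
                queued.add(key)
    return checkpoint_queue
```

Here key `c` has expired on disk but was never loaded into memory. Only the compactor-like pass finds it.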

Page 25

Full Metadata Ejection: Performance Impact

Page 26

Performance Impacts on APIs

- More disk I/O overhead for non-resident items in:
  - CAS (Compare and Set)
  - Add
  - Delete
  - Touch
  - GetMetaData
  - SetWithMeta
  - DeleteWithMeta
- If an application's active working set can't fit into the bucket memory quota, the application will experience higher latency

Page 27

Performance Impacts on Warm-up

Value-Only Ejection Mode
1. Load all the keys and their metadata into memory.
2. Load the access log? If yes, read the access log file and load the values of the keys read from it, then warm-up is completed. If no, warm-up is completed.

Full Metadata Ejection Mode
1. Read the access log file.
2. Load the values of the keys read from the access log file. Warm-up is then completed.

Full ejection mode provides much faster system warm-up.
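The difference between the two warm-up flows comes down to how many items must be read before the node is ready. The sketch below counts items read in each mode. The function is hypothetical, and the "disk" is a dictionary; real warm-up cost also depends on I/O patterns that this model ignores.

```python
def warm_up(disk, access_log, full_ejection):
    """disk: {key: (metadata, value)}; access_log: list of keys or None.

    Returns (hash table after warm-up, number of items read from disk).
    """
    table = {}
    reads = 0
    if not full_ejection:
        # Value-only mode: every key and its metadata must be loaded first.
        for key, (meta, _value) in disk.items():
            table[key] = {"meta": meta, "value": None}
            reads += 1
    # Both modes: load the values of the keys recorded in the access log.
    for key in access_log or []:
        if key in disk:
            meta, value = disk[key]
            table[key] = {"meta": meta, "value": value}
            reads += 1
    return table, reads
```

With 1,000 items on disk and a 10-key access log, value-only warm-up reads 1,010 items, while full ejection reads only the 10 logged ones. This gap grows with the size of the data set relative to the working set.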

Page 28

Value-Only vs. Full Metadata Ejection

Value-Only Ejection
- The active working set fits into the bucket memory quota
- The active working set changes quickly over time
- Light to medium DGM (Disk Greater than Memory) cases (resident ratio >= 20%)
- High performance is crucial

Full Metadata Ejection
- The active working set does not fit into the bucket memory quota
- The active working set changes slowly over time
- Heavy DGM cases (resident ratio <= 10%) with a huge data set
- The application doesn't require performance comparable to value-only ejection mode
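One way to read the comparison is as a checklist, where each criterion votes for one mode. The sketch below does exactly that. It is a rule of thumb distilled from the slide, not an official sizing tool. The function name and the equal weighting of criteria are assumptions, and a resident ratio between 10% and 20% votes for neither mode because the slide does not cover that band.

```python
def suggest_ejection_mode(resident_ratio, working_set_fits,
                          working_set_changes_fast, needs_high_performance):
    """Tally the slide's criteria; return "value-only", "full", or "either" on a tie."""
    value_only_votes = sum([
        working_set_fits,
        working_set_changes_fast,
        resident_ratio >= 0.20,
        needs_high_performance,
    ])
    full_votes = sum([
        not working_set_fits,
        not working_set_changes_fast,
        resident_ratio <= 0.10,
        not needs_high_performance,
    ])
    if value_only_votes > full_votes:
        return "value-only"
    if full_votes > value_only_votes:
        return "full"
    return "either"
```

A tie means the workload sits between the two profiles and is best settled by measuring both modes.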

Page 29

Future Work for Performance Enhancements

Page 30

New APIs

Some APIs can easily be extended to support better working set management, or to offer an asynchronous option that unblocks the client:
- Async Add
- Async Delete
- Async Get
- Get_Cached
- SetWithoutCaching

Page 31

Bloom Filter

- A probabilistic data structure that can tell whether an item is a member of a set
- False positives are possible, but false negatives are not
- Increasing the filter size reduces the false positive ratio at the expense of additional memory overhead
- Various hash algorithms can be used:
  - MurmurHash
  - CityHash
  - Jenkins Hash
- Reduces the disk I/O lookup overhead for non-existent items
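A compact Bloom filter illustrates the properties listed above. To stay within the Python standard library, this sketch swaps in `hashlib.blake2b` with double hashing instead of MurmurHash, CityHash, or Jenkins hash. The sizing helper uses the standard formulas m = -n ln p / (ln 2)^2 and k = (m / n) ln 2. The class and function names are hypothetical.

```python
import hashlib
import math


class BloomFilter:
    def __init__(self, num_bits, num_hashes):
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Double hashing: derive k bit positions from one 128-bit digest.
        digest = hashlib.blake2b(key.encode(), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1  # never a zero step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


def optimal_params(n, fp_rate):
    """Bits and hash count for n items at a target false positive rate."""
    m = math.ceil(-n * math.log(fp_rate) / (math.log(2) ** 2))
    k = max(1, round(m / n * math.log(2)))
    return m, k
```

For 1,000 keys at a 1% target, `optimal_params` gives 9,586 bits (about 1.2 KB) and 7 hash functions. Halving the target rate costs more bits, which is the size/accuracy trade-off noted on the slide.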

Page 32

Integrating Bloom Filter with Couchbase Server

[Diagram: a Bloom filter per partition tracks the non-resident items of that partition's hash table; the item pager and the storage engine sit alongside the filters, and the compactor resizes each Bloom filter during compaction]
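Below is a sketch of how a per-partition filter could spare an Add from the disk lookup it otherwise needs under full ejection. This describes planned work on the slide, so everything here is hypothetical: the class names, adding keys to the filter on ejection, and rebuilding the filter at a fixed 10 bits per key in `compact()`. A tiny SHA-256-based filter is inlined so the block stands alone.

```python
import hashlib


class TinyBloom:
    def __init__(self, num_bits, k=3):
        self.m, self.k, self.bits = num_bits, k, 0

    def _pos(self, key):
        for i in range(self.k):
            d = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(d[:8], "big") % self.m

    def add(self, key):
        for p in self._pos(key):
            self.bits |= 1 << p

    def might_contain(self, key):
        return all((self.bits >> p) & 1 for p in self._pos(key))


class Partition:
    """Full-ejection partition whose Bloom filter covers non-resident keys."""

    def __init__(self, disk, bits_per_key=10):
        self.table = {}          # resident items
        self.disk = dict(disk)   # storage-engine stand-in {key: value}
        self.bits_per_key = bits_per_key
        self.disk_lookups = 0
        self.compact()           # build the initial filter from what is on disk

    def compact(self):
        # Like the compactor: rebuild (resize) the filter for the current non-resident keys.
        non_resident = [k for k in self.disk if k not in self.table]
        self.bloom = TinyBloom(max(64, len(non_resident) * self.bits_per_key))
        for k in non_resident:
            self.bloom.add(k)

    def eject(self, key):
        # Like the item pager under full ejection: drop the item and record it in the filter.
        if key in self.table:
            self.disk[key] = self.table.pop(key)  # assume it is already persisted
            self.bloom.add(key)

    def add(self, key, value):
        if key in self.table:
            return False
        if self.bloom.might_contain(key):  # maybe on disk: must check
            self.disk_lookups += 1
            if key in self.disk:
                return False
        self.table[key] = value
        self.disk[key] = value             # sketch: persist immediately
        return True
```

Adds of genuinely new keys skip the disk except on a rare false positive. Keys that were ejected or exist only on disk still get the lookup they need, because Bloom filters never give false negatives.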

Page 33

Summary

Page 34

Summary

- More flexible cache management is necessary
  - Big data applications increasingly demand support for heavy DGM
- Full metadata ejection
  - Supports large data sets without significant memory overhead
  - Performance impact: not comparable to value-only ejection
- Plan to improve the performance
  - API extensions
  - Bloom filter integration
  - New storage engine

Page 35

Questions?

[email protected]