cosc 6339 big data analytics nosql database systems(iii) key … · 2018. 11. 1. · 3 usage...

17
1 COSC 6339 Big Data Analytics NoSQL Database Systems(III) – Key-value stores: Redis and Memcached Edgar Gabriel Fall 2018 Redis In-memory key/value Support for various types: Lists sets and sorted sets hash tables append-able buffers Open source, sponsored by VMWare Used in the real world: github, craigslist, engineyard, ... Used heavily as a front-end database Slide based on tutorial by Dvir Volk https://www.slideshare.net/dvirsky/introduction-to-redis

Upload: others

Post on 31-Jan-2021

2 views

Category:

Documents


0 download

TRANSCRIPT

  • 1

    COSC 6339

    Big Data Analytics

    NoSQL Database Systems(III) –

    Key-value stores: Redis and Memcached

    Edgar Gabriel

    Fall 2018

    Redis

    • In-memory key/value

    • Support for various types:

    – Lists

    – sets and sorted sets

    – hash tables

    – append-able buffers

    • Open source, sponsored by VMWare

    • Used in the real world: github, craigslist, engineyard, ...

    • Used heavily as a front-end database

    Slide based on tutorial by Dvir Volk https://www.slideshare.net/dvirsky/introduction-to-redis

  • 2

    Redis features

    • All data is in memory

    • All data is eventually persistent (But can be

    immediately)

    • Handles huge workloads easily

    • Mostly O(1) behavior

    • Ideal for write-heavy workloads

    • Supports for transactions

    • Has publish / subscribe functionality

    • Client libraries available for all major languages

    Slide based on tutorial by Dvir Volk https://www.slideshare.net/dvirsky/introduction-to-redis

    • Master-slave replication out of the box

    • Slaves can be made masters on the fly

    • Supports cluster mode (sharding)

    • If your database is too big - redis can handle swapping

    on its own.

    – Keys remain in memory and least used values are

    swapped to disk.

    – Swapping I/O happens in separate threads

    Redis features (II)

    Slide based on tutorial by Dvir Volk https://www.slideshare.net/dvirsky/introduction-to-redis

  • 3

    Usage examples

    • Get/Sets - nothing fancy. Keys are strings, anything goes - just

    quote spaces.

    redis> SET foo "bar"

    OK

    redis> GET foo

    "bar"

    • You can atomically increment numbers

    redis> SET bar 337

    OK

    redis> INCRBY bar 1000

    (integer) 1337

    • Getting multiple values at once

    redis> MGET foo bar

    1. "bar"

    2. "1337"Slide based on tutorial by Dvir Volk https://www.slideshare.net/dvirsky/introduction-to-redis

    Usage examples

    • Keys are lazily expired

    redis> EXPIRE foo 1

    (integer) 1

    redis> GET foo

    (nil)

    • Note: re-setting a value without re-expiring it will remove the

    expiration

    • GETSET puts a different value inside a key, retrieving the old one

    redis> SET foo bar

    OK

    redis> GETSET foo baz

    "bar"

    Slide based on tutorial by Dvir Volk https://www.slideshare.net/dvirsky/introduction-to-redis

  • 4

    Usage examples

    • SETNX sets a value only if it does not exist

    redis> SETNX foo bar

    *OK*

    redis> SETNX foo baz

    *FAILS*

    • SETNX + Timestamp => Named Locks!

    redis> SETNX myLock

    OK

    redis> SETNX myLock

    *FAILS*

    Slide based on tutorial by Dvir Volk https://www.slideshare.net/dvirsky/introduction-to-redis

    List operations

    • Lists are ordinary linked lists: supports push, pop, extract range,

    resize, etc.

    • Random access and ranges at O(N)!

    redis> LPUSH foo bar

    (integer) 1

    redis> LPUSH foo baz

    (integer) 2

    redis> LRANGE foo 0 2

    1. "baz"

    2. "bar"

    redis> LPOP foo

    "baz"

    Slide based on tutorial by Dvir Volk https://www.slideshare.net/dvirsky/introduction-to-redis

  • 5

    Sets example

    • Sets are unique values

    – can be intersected/diffed /union'ed server side.

    – useful as keys when building complex schema.

    redis> SADD foo bar

    (integer) 1

    redis> SADD foo baz

    (integer) 1

    redis> SMEMBERS foo

    ["baz", "bar"]

    redis> SADD foo2 baz // SADD foo2 raz

    (integer) 1

    Slide based on tutorial by Dvir Volk https://www.slideshare.net/dvirsky/introduction-to-redis

    Publish/Subscribe interface

    • Clients can subscribe to channels or patterns and receive

    • notifications when messages are sent to channels.

    • Subscribing is O(1), posting messages is O(n)

    • Think chats, Comet applications: real-time analytics, twitter

    redis> subscribe feed:joe feed:moe feed:boe

    //now we wait

    ....

    1. "message"

    2. "feed:joe"

    3. "all your base are belong to me"

    redis> publish feed:joe "all your base are belong to me"

    Slide based on tutorial by Dvir Volk https://www.slideshare.net/dvirsky/introduction-to-redis

  • 6

    Transactions

    • Sequence of MULTI, ...., EXEC instructions: All commands are

    executed after EXEC, block and return values for the commands as

    a list.

    • Example:

    redis> MULTI

    OK

    redis> SET "foo" "bar"

    QUEUED

    redis> INCRBY "num" 1

    QUEUED

    redis> EXEC

    1) OK

    2) (integer) 1

    • Transactions can be discarded with DISCARD.

    Slide based on tutorial by Dvir Volk https://www.slideshare.net/dvirsky/introduction-to-redis

    Redis cluster mode

    • All nodes are directly connected with a service channel

    • Node to Node protocol is binary, optimized for

    bandwidth and speed.

    • Clients talk to nodes using asci protocol, with minor

    additions.

    • Nodes don't proxy queries.

    • Keyspace is divided into 4096 hash slots

    • Different nodes will hold a subset of hash slots.

    Slide based on https://redis.io/presentation/Redis_Cluster.pdf

  • 7

    Redis cluster mode

    • Nodes are all connected

    and functionally

    equivalent

    • There are two kind of

    nodes: slave and master

    nodes

    • the cluster will continue

    to work as long as there

    is at least one node for

    every hash slot.

    Slide based on https://redis.io/presentation/Redis_Cluster.pdf

    Redis cluster mode: client requests

    • Dummy, single-connection clients, will work with

    minimal modifications to existing client code base. Just

    try a random node among a list, then reissue the query

    if needed.

    • Smart clients will take persistent connections to many

    nodes, will cache hashslot -> node info, and will update

    the table when they receive a -MOVED error.

    • Schema scales horizontally

    • Low latency if the clients are smart

    Slide based on https://redis.io/presentation/Redis_Cluster.pdf

  • 8

    Redis cluster mode fault-tolerance

    • All nodes continuously ping other nodes...

    • A node marks another node as possibly failing when

    there is a timeout longer than N seconds.

    • Every PING and PONG packet contain a gossip section:

    information about other nodes idle times, from the

    point of view of the sending node.

    Slide based on https://redis.io/presentation/Redis_Cluster.pdf

    Redis cluster mode fault-tolerance

    • A guesses B is failing, as the latest PING request timed

    out. A will not take any action without any other hint.

    • C sends a PONG to A, with the gossip section containing

    information about B: C also thinks B is failing

    • At this point A marks B as failed, and notifies all other

    nodes in the cluster

    • All other nodes will also mark B as failing

    • If B returns, the first time he'll ping any node of the

    cluster, it will be notified to shut down

    Slide based on https://redis.io/presentation/Redis_Cluster.pdf

  • 9

    Redis-trib – the redis cluster manager

    • It is used to setup a new cluster, once you start N blank

    nodes.

    • it is used to check if the cluster is consistent. And to fix

    it if the cluster can't continue, as there are hash slots

    without a single node.

    • It is used to add new nodes to the cluster, either as

    slaves of an already existing master node, or as blank

    nodes where we can re-shard a few hash slots to lower

    other nodes load.

    Slide based on https://redis.io/presentation/Redis_Cluster.pdf

    memcached

    • High-performance, distributed memory object caching

    system

    • Key-based cache daemon that stores data and objects

    in main memory for very quick access

    • Based on a distributed hash table

    • Doesn’t provide redundancy, failover or authentication

    • In Use by Facebook, twitter, …

    • Open Source

  • 10

    Use cases

    • Anything what is more expensive to fetch from

    elsewhere, and has sufficient hit rate, can be

    placed in memcached

    – How often will object or data be used?

    – How expensive is it to generate the data?

    – What is the expected hitrate?

    – Will the application invalidate the data itself, or will TTL

    be used?

    – How much development work has to be done to embed it?

    Limitations

    • Memcache is held in RAM => finite resource

    • If the system can respond within the requirements

    without it - leave it alone

    • Keys can be no more then 250 characters

    • Stored data can not exceed 1M (largest typical slab

    size)

    • There are generally no limits to the number of nodes

    running memcached

    • There are generally no limits to the amount of RAM

    used by memcached over all nodes

  • 11

    Starting memcached

    • Memcached can be run as a non-root user if it will not be on a restricted port ( memcached

    • Default configuration - Memory: 64MB, all network interfaces, max simultaneous connections: 1024

    • Changing default configuration options:

    -u : run as user if started as root

    -m : maximum MB memory to use for items

    -d : Run as a daemon

    -c : max simultaneous connections

    -v : verbose

    -t : number of threads to use to process incoming

    requests.

    Memcached commands

    • Storage - ask server to store data identified by a key

    – set, add, replace, append, prepend and cas

    • Retrieval - ask server to retrieve data corresponding to

    a set of keys

    – get, gets

    • Others operations that don’t involve unstructured data

    – Deletion: delete

    – Increment/Decrement: incr, decr

    – Statistics: stats,

    – flush_all: always succeeds, invalidate all existing items

    immediately (by default) or after the expiration

    specified.

  • 12

    Example

    Since memcached is a hash table we need to store

    things and retrieve things using keys and values.

    As the sole user of your own data stored on a

    memcached daemon you might store things with keys:

    >>> import memcache

    >>> mc = memcache.Client(['137.140.8.101:11211'])

    >>> mc.set('user:19','{name: "Lancelot",quest: "Grail"}') # set(key, picklable value)

    True

    >>> mc.get('user:19')

    '{name: "Lancelot", quest: "Grail"}'

    >>> mc.get('user:20')

    >>>

    {'userID:1','userID:2','userID:3', ... }

    memcached keys

    Key can be a string or a tuple of (hash_value, string) .

    More extensive spec of set():

    key: key

    value: value

    time: Optional expiration time, either relative number

    of seconds from current time (up to 1 month), or an

    absolute Unix epoch time.

    min_ compress_len: ignored in Python

    namespace: key “namespace”.

    set(key, value, time=0, min_compress_len=0,

    namespace=None)

  • 13

    Basic Idea Typical usecase for memcached is to check if you have

    previously determined and cached the output of a

    potentially expensive operation.

    If so, recover the previously determined output

    If not, do the calculation and save the output in

    memcached.#!/usr/bin/env python

    import memcache, random, time, timeit

    mc = memcache.Client(['127.0.0.1:11211'])

    def compute_square(n):

    value = mc.get('sq:%d' % n)

    if value is None:

    value = n * n

    mc.set('sq:%d' % n, value)

    return value

    What Data can you Cache?

    Low-level results like answers to queries to a database,

    keyed on the query.

    Higher-level constructs like entire web pages including

    dynamic data.

    Data might need to be purged and re-cached.

    keys

    – have to be unique

    – typically include user identifier + item distinguisher

    – can't exceed 250 bytes

    – can use a hash function on your desired key to produce a

    shorter one.

    values can be up to 1MB

  • 14

    Memcached is a cache…

    So how to replace an old value

    – Set an expiration time in the set() function.

    – Invalidate items already cached

    – Replace old values with new values for the same key.

    – Values need to be serialized so pickling them does quite

    nicely. Values that are already strings can probably be

    cached as-is.

    Memcached sharding

    • Memcached does not have a built in notion of sharding

    • One can execute multiple memcached servers on a

    cluster, sharding implemented on the client side

    • Memcached servers

    – has no internal hash table

    – are dumb and don’t know about each other

    – Use a slab allocator

    • Servers are not coordinated => no protocol overhead or

    consistency problems

    • Least recently accessed items are cycled out

    • One LRU per ‘slab class’

  • 15

    Memcached clients

    • Client hashes key to server list

    – Patterns in the characters that make up key values can

    lead to uneven distribution of memcached entries across

    various servers.

    – Clients have own sharding algorithm so you don't really

    need to worry about this. All clients implement the same

    stable algorithm to turn a key into an integer, n, that

    selects one of the memcached servers.

    • Serializes objects

    • Provides authentication

    Memcached v/s Redis

    • Testing for responsiveness

    • Varying number of clients

    • Values to be retrieved are of size 1 KB each

    0

    25

    50

    75

    100

    1 2 4 8 16 32 64

    Tim

    e t

    aken t

    o s

    tore

    data

    (s

    ec)

    No. of clients

    Min. time taken to store 100,000 key-value pairs with values of size 1 KB

    Memcached Redis

    0

    10

    20

    30

    1 2 4 8 16 32 64

    Tim

    e t

    aken t

    o r

    etr

    ieve d

    ata

    (s

    ec)

    No. of clients

    Min. time taken to retrieve 100,000 key-value pairs with values of size 1 KB

    Memcached Redis

  • 16

    Memcached v/s Redis

    • Testing for responsiveness

    • Varying number of clients retrieving the data

    • Values to be retrieved are of size 32 KB each

    0

    100

    200

    300

    1 2 4 8 16 32 64

    Tim

    e t

    aken t

    o s

    tore

    data

    (s

    ec)

    No. of clients

    Min. time taken to store 100,000 key-value pairs with values of size 32 KB

    Memcached Redis

    0

    50

    100

    1 2 4 8 16 32 64

    Tim

    e t

    aken t

    o r

    etr

    ieve d

    ata

    (s

    ec)

    No. of clients

    Min. time taken to retrieve 100,000 key-value pairs with values of size 32 KB

    Memcached Redis

    Memcached v/s Redis

    • Testing for server side scalability

    • Varying number of servers

    0

    1

    2

    3

    4

    1 2 4 8 12 16

    Tim

    e t

    aken t

    o s

    tore

    data

    (s

    ec)

    No. of servers

    Min. time taken to store 100,000 key-value pairs

    Memcached Redis

    0

    1

    2

    3

    1 2 4 8 12 16

    Tim

    e t

    aken t

    o r

    etr

    ieve d

    ata

    (s

    ec)

    No. of servers

    Min. time taken to retrieve 100,000 key-value pairs

    Memcached Redis

  • 17

    Memcached v/s Redis

    • Testing for functionality irrespective of data load

    • Varying the size of individual values

    0255075

    100

    1 2 4 8

    16

    32

    64

    126

    256

    512

    1024

    2048

    4096

    8192

    16384

    32768

    65536

    Tim

    e t

    aken t

    o s

    tore

    data

    (s

    ec)

    Value size (bytes)

    Min. time taken to store 100,000 key-value pairs

    Memcached Redis

    0255075

    100

    1 2 4 8

    16

    32

    64

    126

    256

    512

    1024

    2048

    4096

    8192

    16384

    32768

    65536

    Tim

    e t

    aken t

    o r

    etr

    ieve d

    ata

    (s

    ec)

    Value size (bytes)

    Min. time taken to retrieve 100,000 key-value pairs

    Memcached Redis