Aerospike Architecture

Upload: peter-milne

Post on 15-Jul-2015

TRANSCRIPT

Page 1: Aerospike Architecture

© 2014 Aerospike. All rights reserved. Confidential 1

Aerospike: aer·o·spike [air-oh-spahyk]

noun, 1. tip of a rocket that enhances speed and stability

DEVELOPING WITH AEROSPIKE

ARCHITECTURE OVERVIEW

IN-MEMORY + NOSQL + ACID

Page 2: Aerospike Architecture

Objectives

This module provides an overview of the Aerospike architecture. At the end of this module you will have a high-level understanding of:

■ Client
■ Cluster
■ Storage
    ■ Primary & Secondary indexes
    ■ RAM
    ■ Flash
■ Cross Datacenter Replication (XDR)

Page 3: Aerospike Architecture

The Big Picture

Page 4: Aerospike Architecture

Aerospike Goals

Aerospike technology goals are to meet these challenges:

■ Handle extremely high rates of read/write transactions over persistent data

■ Avoid hot spots to maintain tight latency SLAs

■ Provide immediate consistency with replication

■ Ensure long running tasks do not slow down transactions

■ Scale linearly as data sizes and workloads increase

■ Add capacity with no service interruption

Page 5: Aerospike Architecture

Architecture – The Big Picture

1) No Hotspots: DHT simplifies data partitioning
2) Smart Client: 1 hop to data, no load balancers
3) Shared Nothing Architecture: every node identical
4) Single row ACID: synchronous replication within the cluster
5) Smart Cluster, Zero Touch: auto-failover, rebalancing, rack aware, rolling upgrades
6) Transactions and long-running tasks prioritized in real time
7) XDR: asynchronous replication across data centers ensures zero downtime

Page 6: Aerospike Architecture

What Aerospike offers

■ Row oriented
    ■ Key-value store
■ Fast
    ■ Like Redis and Memcached
    ■ Whole key space
■ Complex types
    ■ Like Redis and MongoDB
    ■ List/Map/JSON
    ■ Large Data Types
■ Queries and Aggregations
    ■ Secondary index
    ■ Sub-second MapReduce
■ High performance
    ■ Linux daemon
    ■ Multi-core aware
    ■ Multi-socket aware
    ■ No garbage collection issues
■ Run anywhere
    ■ Cloud friendly
    ■ TCP networking
■ Flash optimized
    ■ Near-DRAM Flash performance

Page 7: Aerospike Architecture

Client

Page 8: Aerospike Architecture

Smart Client™

■ The Aerospike Client is implemented as a library, JAR or DLL, and consists of 2 parts:
    ■ Operation APIs: the operations that you can execute on the cluster (CRUD+ etc.)
    ■ First-class observer of the Cluster: monitors the state of each node and is aware of new nodes and node failures

Page 9: Aerospike Architecture

From Key to Node

■ Distributed Hash Table with No Hotspots
    ■ Every key is hashed with RIPEMD-160 into a fixed-length 20-byte digest
    ■ The hash plus additional data forms a fixed 64-byte index entry in RAM
    ■ Some bits from the hash value are used to calculate the Partition ID (4096 partitions)
    ■ The Partition ID maps to a Node ID in the cluster
■ 1 Hop to data
    ■ Smart Client simply calculates the Partition ID to determine the Node ID
    ■ No Load Balancers required
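The key-to-node path described on this slide can be sketched as follows. This is a minimal illustration, not the client's actual code: SHA-1 stands in for RIPEMD-160 (both produce 20 bytes, but RIPEMD-160 is not available in every Python build), the choice of which 12 bits form the Partition ID is an assumption, and `build_partition_map` is a hypothetical round-robin stand-in for the map the client learns from the cluster.

```python
import hashlib

N_PARTITIONS = 4096  # from the slide; 4096 == 2**12, so 12 bits select a partition


def digest_key(set_name, user_key):
    """Return a fixed-length 20-byte digest for a key.

    ASSUMPTION: SHA-1 is only a stand-in that also yields 20 bytes;
    the slide names RIPEMD-160 as the real hash.
    """
    return hashlib.sha1(f"{set_name}:{user_key}".encode("utf-8")).digest()


def partition_id(digest):
    """Use 12 bits of the digest as the Partition ID (which bits is an assumption)."""
    return int.from_bytes(digest[:2], "little") & (N_PARTITIONS - 1)


def build_partition_map(nodes):
    """Hypothetical map of Partition ID -> node name, assigned round-robin."""
    return [nodes[pid % len(nodes)] for pid in range(N_PARTITIONS)]


def node_for_key(partition_map, set_name, user_key):
    """One hop: compute the partition locally, then look up its node."""
    return partition_map[partition_id(digest_key(set_name, user_key))]
```

Because the lookup is a local calculation plus a table read, the client can address the owning node directly, which is why no load balancer sits in the path.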

Page 10: Aerospike Architecture

Cluster

Page 11: Aerospike Architecture

The Cluster (servers)

■ Local servers
    ■ XDR to remote cluster
■ Automatic load balancing
■ Quick failover
■ Detects new nodes (multicast)
■ Rebalances data (measured rate)
■ Add nodes under load
■ Rack awareness
■ “proxy” to correct node
■ Locally attached storage

Page 12: Aerospike Architecture

Data Distribution

Data is distributed evenly across nodes in a cluster using the Aerospike Smart Partitions™ algorithm.

■ RIPEMD-160 (no collisions yet found)
■ 4096 Data Partitions
■ Even distribution of
    ■ Partitions across nodes
    ■ Records across Partitions
    ■ Data across Flash devices
■ Primary and Replica Partitions
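One way to picture an even spread of primary and replica partitions is the sketch below. The slides do not describe the internals of the Smart Partitions™ algorithm, so the ordering rule here (sort the nodes by a hash of node name and Partition ID) is purely an illustrative assumption, as are the function names.

```python
import hashlib
from collections import Counter

N_PARTITIONS = 4096  # from the slide


def succession(nodes, pid):
    """Order the nodes for one partition by a per-(node, partition) hash.

    ASSUMPTION: illustrative rule only, not the real algorithm.
    """
    return sorted(nodes, key=lambda n: hashlib.sha256(f"{n}/{pid}".encode()).digest())


def assign(nodes, replication_factor=2):
    """Map each Partition ID to its owners: [primary, replica, ...]."""
    return {
        pid: succession(nodes, pid)[:replication_factor]
        for pid in range(N_PARTITIONS)
    }


def primaries_per_node(assignment):
    """Count how many primary partitions each node holds."""
    return Counter(owners[0] for owners in assignment.values())
```

With four nodes, each one ends up holding roughly a quarter of the 4096 primaries, and the primary and replica of a partition never land on the same node.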

Page 13: Aerospike Architecture

Single Row ACID

Writing with Immediate Consistency

1. Write sent to record master
2. Latch against simultaneous writes
3. Apply write synchronously to master memory and replica memory
4. Queue operations to storage
5. Signal completed transaction (optional storage commit wait)
6. Master applies conflict resolution policy (rollback / roll forward)
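Steps 1 to 5 of the write path can be sketched in a few lines. This is a toy model under stated assumptions: `Node` and `write` are hypothetical names, the per-record latch is a plain `threading.Lock`, and the optional storage commit wait and the conflict resolution of step 6 are not modeled.

```python
import threading
from collections import deque


class Node:
    """One node: in-memory records plus a queue of pending storage writes."""

    def __init__(self, name):
        self.name = name
        self.memory = {}              # key -> (generation, bins)
        self.storage_queue = deque()  # step 4: writes waiting to reach storage
        self.latches = {}             # key -> lock guarding that record
        self._latches_guard = threading.Lock()

    def latch_for(self, key):
        """Return the lock for one record, creating it on first use."""
        with self._latches_guard:
            return self.latches.setdefault(key, threading.Lock())

    def apply(self, key, bins, generation):
        """Update memory and queue the write for storage."""
        self.memory[key] = (generation, dict(bins))
        self.storage_queue.append((key, generation))


def write(master, replicas, key, bins):
    """Latch on the master, update master and replica memory, then acknowledge."""
    with master.latch_for(key):                    # step 2
        generation = master.memory.get(key, (0, None))[0] + 1
        master.apply(key, bins, generation)        # steps 3 and 4 on the master
        for replica in replicas:
            replica.apply(key, bins, generation)   # step 3 on each replica, synchronous
    return generation                              # step 5: signal completion
```

The caller only sees the acknowledgement after both the master and the replica hold the new value in memory, which is what the slide means by immediate consistency.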

Page 14: Aerospike Architecture

Automatic rebalancing

When a node is added or removed, the Cluster automatically rebalances:

1. Cluster discovers new node via gossip protocol
2. Paxos vote determines new data organization
3. Partition migrations scheduled
4. When a partition migration starts, write journal starts on destination
5. Partition moves atomically
6. Journal is applied and source data deleted

After migration is complete, the Cluster is evenly balanced.
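Steps 4 to 6 of a partition migration can be sketched as below. The slide does not say how the journal is implemented, so this is only one plausible reading: `PartitionCopy` and `migrate` are hypothetical names, and the writes that arrive during the copy are passed in as a list.

```python
class PartitionCopy:
    """Hypothetical holder for one partition's records on one node."""

    def __init__(self):
        self.records = {}   # key -> value
        self.journal = []   # writes received while a migration is in flight


def migrate(source, destination, writes_during_migration):
    """Move a partition from source to destination, replaying concurrent writes.

    `writes_during_migration` is a list of (key, value) pairs standing in for
    client writes that arrive while the bulk copy is running.
    """
    # Step 4: the destination starts journaling when the migration starts.
    destination.journal = []
    snapshot = dict(source.records)
    for key, value in writes_during_migration:
        destination.journal.append((key, value))
    # Step 5: the partition contents are installed in one step.
    destination.records = snapshot
    # Step 6: the journal is replayed in order, then the source copy is dropped.
    for key, value in destination.journal:
        destination.records[key] = value
    destination.journal.clear()
    source.records.clear()
```

Replaying the journal after the bulk copy means a write that raced with the migration is not lost, even though the copy started from an older state.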

Page 15: Aerospike Architecture

Cluster formation

Individual nodes go in… and the Cluster forms

■ Gossip protocol
■ Heartbeat
■ Paxos
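The heartbeat part of cluster formation can be illustrated with a small failure detector. This is a sketch under assumptions: the class name, the interval, and the number of missed heartbeats tolerated are all illustrative values, not settings taken from the slides.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat seen from each peer node."""

    def __init__(self, interval=0.15, timeout_intervals=10):
        # ASSUMED values: seconds between heartbeats and how many may be missed.
        self.interval = interval
        self.timeout = interval * timeout_intervals
        self.last_seen = {}

    def heartbeat(self, node, now):
        """Record that a heartbeat from `node` arrived at time `now`."""
        self.last_seen[node] = now

    def alive(self, now):
        """Return the peers whose last heartbeat is recent enough."""
        return {n for n, t in self.last_seen.items() if now - t <= self.timeout}
```

A change in the set returned by `alive` is the kind of event that would prompt the remaining nodes to agree on a new cluster view.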

Page 16: Aerospike Architecture

Aerospike Management and Monitoring

Aerospike provides a sophisticated management console: the Aerospike Management Console (AMC).

Plugins are also available for:

■ Graphite
■ Nagios

APIs for monitoring

■ Roll your own tool
■ Hundreds of monitoring parameters
■ Fine-grained latency monitoring

Page 17: Aerospike Architecture

Primary Index

■ Primary index
    ■ Hash of hash of red-black (rb) trees
■ Index
    ■ 64 bytes
    ■ Write generation
    ■ Time To Live
    ■ Storage address
    ■ Uses shared memory for Fast Restart
■ Single bin
    ■ Optimization for minimal data volume
■ If using integers, store data in index rather than in storage
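A fixed 64-byte index entry makes RAM sizing simple arithmetic. The sketch below packs the fields named on this slide into 64 bytes; the field order, the field widths, and the padding are assumptions, since the slide only gives the field names and the total size.

```python
import struct

# ASSUMED layout (little-endian, no alignment padding):
#   20s  key digest (20 bytes)
#   I    write generation
#   I    expiry time derived from the Time To Live, in seconds
#   Q    storage address (offset on the device)
#   28x  padding standing in for fields not listed on the slide
INDEX_ENTRY = struct.Struct("<20sIIQ28x")


def pack_entry(digest, generation, expires_at, storage_address):
    """Serialize one index entry into exactly 64 bytes."""
    return INDEX_ENTRY.pack(digest, generation, expires_at, storage_address)


def unpack_entry(raw):
    """Return (digest, generation, expires_at, storage_address)."""
    return INDEX_ENTRY.unpack(raw)


def index_ram_bytes(record_count, entry_size=INDEX_ENTRY.size):
    """RAM needed for the primary index of `record_count` records on one copy."""
    return record_count * entry_size
```

For example, one billion records need 64,000,000,000 bytes (64 GB) of index RAM per copy of the data, before any replication is counted.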

Page 18: Aerospike Architecture

Primary Index – Single Bin

Storing a key-value entry where the value is a single integer can be very useful. A namespace can be configured, and optimized, for a single Bin record.

■ No Bin management overhead
■ Integer: stored in Index for free (data-in-index)

Page 19: Aerospike Architecture

Secondary Indexes

■ Bin (Column) indexes
■ Low selectivity = hundreds of rows
■ DDL: which Bins to index
    ■ String, or Integer (range)
■ In RAM: fast
■ Multi-node
    ■ Co-located with primary index
■ Reference local data only
■ Index creation
    ■ Tools: AQL, ascli
    ■ Client API: developer only
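The idea of an in-RAM secondary index that references only local data, queried across all nodes, can be sketched as follows. The class and function names are hypothetical, and the data structures (a dict of sets for strings, a sorted list for integers) are illustrative choices rather than the server's actual structures.

```python
import bisect
from collections import defaultdict


class StringIndex:
    """Equality index on a string Bin: value -> set of record digests."""

    def __init__(self):
        self.entries = defaultdict(set)

    def add(self, value, digest):
        self.entries[value].add(digest)

    def equal(self, value):
        return set(self.entries.get(value, ()))


class IntegerIndex:
    """Range index on an integer Bin, kept as a sorted list of (value, digest)."""

    def __init__(self):
        self.entries = []

    def add(self, value, digest):
        bisect.insort(self.entries, (value, digest))

    def between(self, low, high):
        """Return digests of local records with low <= value <= high."""
        start = bisect.bisect_left(self.entries, (low,))
        end = bisect.bisect_right(self.entries, (high, b"\xff" * 21))
        return {digest for _, digest in self.entries[start:end]}


def query_cluster(per_node_indexes, low, high):
    """Scatter-gather: each node answers from its local index, results are merged."""
    result = set()
    for index in per_node_indexes:
        result |= index.between(low, high)
    return result
```

Because each node indexes only the records it stores, a query has to visit every node, and the client (or a coordinating node) merges the partial results.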

Page 20: Aerospike Architecture

User Defined Functions

User Defined Functions (UDFs) are an extensibility mechanism: a UDF adds a function that can be evaluated in the Cluster, local to the data.

■ UDFs are common in many databases:
    ■ MySQL, SQL Server, Oracle, DB2, Redis, PostgreSQL, and others
■ UDFs move the compute close to the data
■ Lua (and C) today
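The "compute close to the data" idea can be shown with a toy model. Real Aerospike UDFs are written in Lua (and C), as the slide says; Python is used here only for illustration, and `ServerNode`, `register`, `apply`, and `add_to_bin` are hypothetical names.

```python
class ServerNode:
    """Toy node that stores records and runs registered functions next to them."""

    def __init__(self):
        self.records = {}   # key -> dict of Bins
        self.udfs = {}      # name -> function(record, *args)

    def register(self, name, fn):
        """Make a function available for execution on this node."""
        self.udfs[name] = fn

    def apply(self, key, udf_name, *args):
        """Run the function on the stored record; only the result leaves the node."""
        record = self.records[key]
        return self.udfs[udf_name](record, *args)


def add_to_bin(record, bin_name, amount):
    """Example function: increment an integer Bin in place and return the new value."""
    record[bin_name] = record.get(bin_name, 0) + amount
    return record[bin_name]
```

The client sends a function name and arguments and gets back a single value, instead of reading the whole record, changing it, and writing it back.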

Page 21: Aerospike Architecture

Storage

Page 22: Aerospike Architecture

Data Storage Layer

Page 23: Aerospike Architecture

Data on Flash / SSD

■ Indexes in RAM (64 bytes per record)
    ■ Low wear
■ Data in Flash (SSD)
    ■ Record data stored contiguously
■ 1 read per record (multithreaded)
    ■ Automatic continuous defragmentation, eviction
    ■ Log structured file system, “copy on write”
    ■ O_DIRECT, O_SYNC
    ■ Data written in flash-optimal large blocks
    ■ Automatic distribution (no RAID)
    ■ Writes cached

[Figure labels: AEROSPIKE HYBRID MEMORY SYSTEM™; BLOCK INTERFACE; SSD, SSD, SSD]

Page 24: Aerospike Architecture

Data in RAM

Data in RAM is very fast – at a price

■ Indexes and Data

■ $$$ (great for < 100 GB, Cloud)

■ More servers

■ Super fast

■ Optional HDD as backing store

Every cluster should have a RAM namespace

■ High frequency and low latency keys (hot keys)

Page 25: Aerospike Architecture

Reading a record (select)

The entire record is read from storage into server RAM; only the requested Bins are returned to the client.

Page 26: Aerospike Architecture

Writing a new record (insert)

Entries are added to the primary index and any secondary indexes. The record is placed in a write buffer to be written to the next available block.

Page 27: Aerospike Architecture

Updating a record (update)

The entire record is read into server RAM, updated as needed, then written to a new block (copy on write).

The index(es) then point to the new location.

Page 28: Aerospike Architecture

Deleting a record (delete)

Deleting a record removes the entries from the indexes only, so it is very fast.

Background defragmentation physically deletes the record at a later time.
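The read, insert, update, and delete behavior described on the last four slides can be tied together in one toy log-structured store. Everything here is a simplified assumption: `FlashStore` is a hypothetical name, blocks hold a handful of records instead of large flash-optimal blocks, and only a primary index is modeled.

```python
class FlashStore:
    """Toy log-structured store: a RAM index pointing into append-only blocks."""

    def __init__(self, block_size=4):
        self.block_size = block_size   # records per block (tiny, for illustration)
        self.blocks = []               # flushed blocks; a freed block becomes None
        self.write_buffer = []         # records waiting for the next block
        self.index = {}                # key -> (block_number or None for buffer, slot)

    def _append(self, key, record):
        self.write_buffer.append((key, dict(record)))
        self.index[key] = (None, len(self.write_buffer) - 1)
        if len(self.write_buffer) == self.block_size:
            self.flush()

    def flush(self):
        """Write the buffer out as the next block and repoint the index."""
        if not self.write_buffer:
            return
        block_number = len(self.blocks)
        self.blocks.append(self.write_buffer)
        for slot, (key, _) in enumerate(self.write_buffer):
            if self.index.get(key) == (None, slot):
                self.index[key] = (block_number, slot)
        self.write_buffer = []

    def _load(self, key):
        block_number, slot = self.index[key]
        block = self.write_buffer if block_number is None else self.blocks[block_number]
        return block[slot][1]

    def read(self, key, bins=None):
        record = self._load(key)                  # the whole record is read
        if bins is None:
            return dict(record)
        return {b: record[b] for b in bins if b in record}  # only requested Bins returned

    def insert(self, key, record):
        self._append(key, record)

    def update(self, key, changes):
        record = dict(self._load(key))            # read the whole record
        record.update(changes)
        self._append(key, record)                 # copy on write: new location

    def delete(self, key):
        del self.index[key]                       # index only; data stays until defrag

    def defragment(self):
        """Rewrite live records from flushed blocks, then free those blocks."""
        for block_number in range(len(self.blocks)):
            block = self.blocks[block_number]
            if block is None:
                continue
            for slot, (key, record) in enumerate(block):
                if self.index.get(key) == (block_number, slot):
                    self._append(key, record)
            self.blocks[block_number] = None
        self.flush()
```

After an update the old copy is still sitting in its original block, and after a delete the record's bytes are untouched; in both cases the space is only reclaimed when `defragment` rewrites the live records and frees the block.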

Page 29: Aerospike Architecture

Cross Datacenter Replication – XDR

Page 30: Aerospike Architecture

XDR Architecture

[Figure labels: Each node in the cluster; Distributed clusters]

Page 31: Aerospike Architecture

XDR Topologies

■ Simple Active-Passive
■ Simple Active-Active
■ Star Replication
■ More Complex Topology

Page 32: Aerospike Architecture

Failure Handling

■ Node failure in a Cluster: nodes with replica data will continue
■ Link failure between Clusters: XDR keeps track of link failures and of the data to be shipped over that link. It recovers when the link comes back up.
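The link-failure behavior can be sketched as a shipper that keeps a backlog while the link is down. `Link` and `XdrShipper` are hypothetical names, and the in-memory queue is an assumption; the slides do not say how XDR records what remains to be shipped.

```python
from collections import deque


class Link:
    """Hypothetical link to a remote cluster that can be up or down."""

    def __init__(self):
        self.up = True
        self.delivered = []

    def send(self, record):
        if not self.up:
            raise ConnectionError("link to remote cluster is down")
        self.delivered.append(record)


class XdrShipper:
    """Remembers which local writes still have to cross the link."""

    def __init__(self, link):
        self.link = link
        self.pending = deque()   # written locally, not yet shipped

    def record_write(self, record):
        self.pending.append(record)

    def ship(self):
        """Ship pending records in order; stop at a link failure and keep the rest."""
        shipped = 0
        while self.pending:
            try:
                self.link.send(self.pending[0])
            except ConnectionError:
                break            # link is down: keep the backlog for the next attempt
            self.pending.popleft()
            shipped += 1
        return shipped
```

Local writes keep succeeding while the link is down; the backlog is drained in order the next time `ship` runs with the link up.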

Page 33: Aerospike Architecture

Shipping

XDR gives fine control over what is shipped:

■ Namespaces
■ Sets

Compression
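Filtering by namespace and set, plus compression of what is shipped, can be sketched as below. The function names, the JSON encoding, and the use of zlib are all assumptions for illustration; the slide only states that namespaces, sets, and compression can be controlled.

```python
import json
import zlib


def should_ship(namespace, set_name, ship_namespaces, ship_sets=None):
    """Ship only configured namespaces and, if a set filter is given, only those sets."""
    if namespace not in ship_namespaces:
        return False
    return ship_sets is None or set_name in ship_sets


def encode_for_shipping(record, compress=True):
    """Serialize a record (ASSUMED JSON wire format) and optionally compress it."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return zlib.compress(payload) if compress else payload


def decode_shipped(payload, compressed=True):
    """Reverse of encode_for_shipping, as the remote cluster would apply it."""
    raw = zlib.decompress(payload) if compressed else payload
    return json.loads(raw.decode("utf-8"))
```

Compression trades CPU on both clusters for less traffic on the link between data centers, which matters most for large or repetitive records.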

Page 34: Aerospike Architecture

Summary

You have learned about the Aerospike architecture:

■ Client
■ Cluster
■ Storage
    ■ Primary & Secondary indexes
    ■ RAM
    ■ Flash
■ Cross Datacenter Replication (XDR)
