
Cloud Messaging Service Technical Overview

Presented by Matteo Merli | September 21, 2015

Sections

1. Introduction
2. Architecture
3. Bookkeeper
4. Future
5. Q & A

What is CMS

• Hosted pub/sub
• Multi-tenant (auth / quotas / load balancer)
• Horizontally scalable
• Highly available, durable and consistent storage
• Geo-replication
• In production since 2013

[Diagram: CMS cluster. Producer and consumer apps connect to brokers; brokers store data on bookies; ZK and global ZK provide coordination; replication links remote clusters]

CMS key features

• Multi-tenancy / hosted
  • Operating a system at scale is hard and requires deep understanding of its internals
  • Authentication / self-service provisioning / quotas
• SLAs (write latency: 2 ms avg, 5 ms 99th percentile)
  • Maintain the same latencies and throughput under backlog-draining scenarios
• Simple high-level API with clear ordering, durability and consistency semantics
• Geo-replication
  • Single API call to configure the regions to replicate to
• Load balancer: dynamically optimize topic assignment to brokers
• Support a large number of topics
• Store the subscription position
  • Apps don't need to store it
  • Data can be deleted as soon as it is consumed
• Support round-robin distribution across multiple consumers

Workload examples

Challenge            | # Topics | # Producers / topic | # Subscriptions / topic | Produced msg rate /s /topic
---------------------|----------|---------------------|-------------------------|----------------------------
Fan-out              | 1        | 1                   | 1 K                     | 1 K
Throughput & latency | 1        | 1                   | 1                       | 100 K
# Topics & latency   | 1 M      | 1                   | 10                      | 10
Fan-in               | 1        | 1 K                 | 1                       | > 100 K

• Designed to support a wide range of use cases
• Needs to be cost effective in every case

2. Architecture

Messaging model


• Producers can attach to a topic and send messages to it
• A subscription is a durable resource that receives all messages sent to the topic after the subscription's creation
• Subscriptions have a type:
  • "Exclusive": only one consumer is allowed to attach to the subscription. The first consumer decides the type.
  • "Shared": multiple consumers are allowed; messages are distributed round-robin. No ordering guarantees.
  • "Failover": multiple consumers are allowed, but only one receives messages at any given time while the others stand by.
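To make the three types concrete, here is a minimal sketch using the Java client API shown later in this deck; SubscriptionType.Shared and SubscriptionType.Failover are assumed by analogy with the Exclusive value in the consumer example slide.

CmsClient client = CmsClient.create("http://<broker vip>:4080");

// Exclusive: only this consumer may attach to "sub-a"
Consumer exclusive = client.subscribe("my-topic", "sub-a", SubscriptionType.Exclusive);

// Shared: many consumers on "sub-b"; messages distributed round-robin
// (enum value assumed, not shown in the deck)
Consumer shared = client.subscribe("my-topic", "sub-b", SubscriptionType.Shared);

// Failover: many consumers on "sub-c"; only one receives at a time
// (enum value assumed, not shown in the deck)
Consumer failover = client.subscribe("my-topic", "sub-c", SubscriptionType.Failover);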

[Diagram: one topic fed by Producer-X and Producer-Y, with Subscription-A (Exclusive) serving Consumer-1, Subscription-B (Shared) serving Consumers 2-4, and Subscription-C (Failover) serving Consumer-5]

Client API

▪ Expose messaging model concepts (producer/consumer)
▪ C++ and Java
▪ Connection pooling
▪ Handle recoverable failures transparently (reconnect / resend messages) without compromising ordering guarantees
▪ Sync / async version of every operation

Java producer example

CmsClient client = CmsClient.create("http://<broker vip>:4080");

Producer producer = client.createProducer("my-topic");

// Handles retries in case of failure
producer.send("my-message".getBytes());

// Async version:
producer.sendAsync("my-message".getBytes()).thenRun(() -> {
    // Message was persisted
});


Java consumer example

CmsClient client = CmsClient.create("http://<broker vip>:4080");

Consumer consumer = client.subscribe(
    "my-topic",
    "my-subscription-name",
    SubscriptionType.Exclusive);

// Blocks until a message is available
Message msg = consumer.receive();
// Do something...
consumer.acknowledge(msg);


System overview

Broker
• Stateless
• Maintains an in-memory cache of messages
• Reads from Bookkeeper on cache miss

Bookkeeper
• Distributed write-ahead log
• Creates many ledgers
  • Append entries
  • Read entries
  • Delete ledger
• Consistent reads
• Single writer (the broker)
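As a rough illustration of the ledger operations listed above, a sketch against the standard Apache BookKeeper client API (the ZK address, ensemble/quorum sizes and payloads here are illustrative):

import java.util.Enumeration;
import org.apache.bookkeeper.client.BookKeeper;
import org.apache.bookkeeper.client.LedgerEntry;
import org.apache.bookkeeper.client.LedgerHandle;

// Connect to the bookies through ZooKeeper (address illustrative)
BookKeeper bk = new BookKeeper("zk-host:2181");

// Create a ledger replicated on an ensemble of 3 bookies, write quorum 2
LedgerHandle lh = bk.createLedger(3, 2, BookKeeper.DigestType.MAC, "passwd".getBytes());

long lastId = lh.addEntry("payload".getBytes());              // append an entry
Enumeration<LedgerEntry> entries = lh.readEntries(0, lastId); // consistent read
lh.close();                                                   // seal the ledger
bk.deleteLedger(lh.getId());                                  // delete once consumed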

[Diagram: inside the CMS cluster, the broker hosts a native dispatcher, managed ledger, BK client, global replicators, cache and load balancer; producer and consumer apps connect through the CMS client; bookies, ZK and global ZK sit underneath, with replication to other regions]

System overview

Native dispatcher
• Async Netty server

Global replicators
• If a topic is global, republish its messages in the other regions (see the sketch below)

Global Zookeeper
• ZK instance with participants in multiple US regions
• Consistent data store for customer configuration
• Accepts writes even with one region down
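A very rough sketch of what a per-topic replicator loop amounts to, phrased in terms of the client API from the earlier slides; the replicationCursor, readNext(), getData() and ack() names are hypothetical, and the real replicator runs inside the broker:

// Hypothetical sketch of a global replicator loop; only CmsClient and
// Producer come from the earlier slides, the cursor API is invented here.
CmsClient remote = CmsClient.create("http://<remote region vip>:4080");
Producer replicator = remote.createProducer("my-global-topic");

while (true) {
    Message msg = replicationCursor.readNext();           // hypothetical cursor read
    replicator.sendAsync(msg.getData())                   // republish in remote region
              .thenRun(() -> replicationCursor.ack(msg)); // advance once persisted
}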


Partitioned topics

▪ The client lib has a wrapper producer/consumer implementation
▪ No API changes
▪ Producers can decide how to assign messages to partitions:
  ▪ Single partition
  ▪ Round robin
  ▪ Provide a key on the message: the hash of the key determines the partition (see the sketch after this list)
  ▪ Custom routing
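For the key-based option, a minimal sketch of the routing arithmetic; the method name is illustrative:

// Sketch: messages with the same key always map to the same partition,
// which preserves per-key ordering. Masking the sign bit keeps the result
// non-negative even when hashCode() is negative.
int choosePartition(String key, int numPartitions) {
    return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
}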


[Diagram: topic T1 split into partitions P0-P4; the app's producer publishes to T1-P0 through T1-P4, spread across Brokers 1-3]

Partitioned topics

▪ Consumers can use all subscription types with the same semantics
▪ In the "Failover" subscription type, the election is done per partition
▪ Partition assignments are spread evenly across all available consumers (a sketch follows below)
▪ No need for ZK coordination
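One way the coordination-free spread can work, as a minimal sketch (a scheme assumed for illustration, not taken from the deck): every consumer sorts the same membership list and derives the same owner for each partition.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch: deterministic per-partition assignment. All consumers compute
// the same result from the same inputs, so no ZK coordination is needed.
String ownerOf(int partition, List<String> consumerIds) {
    List<String> sorted = new ArrayList<>(consumerIds);
    Collections.sort(sorted);                     // same order on every consumer
    return sorted.get(partition % sorted.size()); // even spread across consumers
}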


[Diagram: topic T1 partitions T1-P0 through T1-P4 spread across Brokers 1-3, with Consumer-1 and Consumer-2 in separate apps sharing the partitions C0-C4 between them]

3. Bookkeeper

CMS Bookkeeper usage

▪ CMS uses Bookkeeper through a higher-level interface, the ManagedLedger:
  › A single managed ledger represents the storage of a single topic
  › Maintains the list of currently active BK ledgers
  › Maintains the subscription positions, using an additional ledger to checkpoint the last acknowledged message in the stream
  › Caches data
  › Deletes ledgers when all cursors are done with them
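An illustrative shape for that interface, with method names assumed from the bullet points rather than taken from the CMS codebase:

// Illustrative interface sketch only; names are derived from the bullets
// above, not from the actual CMS API.
interface Position {}

interface ManagedLedger {
    Position addEntry(byte[] data);                // append to the active BK ledger
    ManagedCursor openCursor(String subscription); // durable subscription position
}

interface ManagedCursor {
    byte[] readNext();                   // served from cache or read from BK
    void markAcknowledged(Position pos); // checkpointed in a dedicated ledger;
                                         // ledgers consumed by every cursor
                                         // become eligible for deletion
}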


Bookie internal structure

• Writes go both to the journal and to ledger storage (on different devices)
• Ledger storage writes are fsynced periodically
• Reads come only from ledger storage
• Entries are interleaved in entry log files
• Ledger indexes are used to find entry offsets
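Conceptually, the index resolves a (ledgerId, entryId) pair to a position inside an interleaved entry log before a read can seek; a toy sketch (structure assumed for illustration):

import java.util.HashMap;
import java.util.Map;

// Toy sketch: because entries from many ledgers are interleaved in entry
// log files, a read must first map (ledgerId, entryId) to a log offset.
class LedgerIndex {
    static final class EntryLocation {
        final long entryLogId; // which entry log file
        final long offset;     // byte offset within that file
        EntryLocation(long entryLogId, long offset) {
            this.entryLogId = entryLogId;
            this.offset = offset;
        }
    }

    private final Map<String, EntryLocation> index = new HashMap<>();

    void put(long ledgerId, long entryId, EntryLocation loc) {
        index.put(ledgerId + ":" + entryId, loc);
    }

    EntryLocation locate(long ledgerId, long entryId) {
        return index.get(ledgerId + ":" + entryId); // then seek in the entry log
    }
}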

Bookkeeper issues

▪ Performance degrades when writing to many ledgers at the same time
▪ Under heavy reads, the ledger storage device gets slow, which impacts writes
▪ Ledger storage flushes need to fsync many ledger index files each time


Bookie storage improvements

• Writes go both to the journal and to an in-memory write cache
• Entries are flushed periodically
• Entries are sorted by ledger so they are sequential on disk (per flush period)
• Since entries are sequential, we added a read-ahead cache
• The location index is kept mostly in memory and only updated during flush
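A minimal sketch of the sort-at-flush idea (structure assumed for illustration): writes land in a cache keyed by (ledgerId, entryId), and the flush walks it in order so each ledger's entries come out contiguous on disk.

import java.util.Map;
import java.util.TreeMap;

// Sketch: sorted write cache. Within a flush period each ledger's entries
// are written contiguously, which makes read-ahead effective.
class SortedWriteCache {
    private final TreeMap<Long, TreeMap<Long, byte[]>> cache = new TreeMap<>();

    void addEntry(long ledgerId, long entryId, byte[] data) {
        cache.computeIfAbsent(ledgerId, id -> new TreeMap<>()).put(entryId, data);
    }

    void flush() {
        for (Map.Entry<Long, TreeMap<Long, byte[]>> ledger : cache.entrySet()) {
            for (Map.Entry<Long, byte[]> e : ledger.getValue().entrySet()) {
                appendToEntryLog(ledger.getKey(), e.getKey(), e.getValue());
            }
        }
        cache.clear(); // next period starts a fresh, again-sorted batch
    }

    private void appendToEntryLog(long ledgerId, long entryId, byte[] data) {
        // stands in for the sequential write to the current entry log file
    }
}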

Bookkeeper write latency

▪ After hardware, the next limit to achieving low latency is JVM GC
▪ GC pauses are unavoidable; try to keep them around ~50 ms and as infrequent as possible
  › Switched BK client and servers to Netty pooled, ref-counted buffers and direct memory, to hide payloads from the GC and eliminate payload copies
  › Extensively profiled allocations and substantially reduced per-entry object allocations
    • Recycler pattern to pool objects (very efficient for same-thread allocate/release; sketch below)
    • Primitive collections
    • Array queues instead of linked queues in executors
    • Open hash maps instead of linked hash maps
    • B-tree instead of ConcurrentSkipList
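The Recycler mentioned above is Netty's thread-local object pool; a minimal usage sketch against Netty 4.1's io.netty.util.Recycler (the PendingEntry class is illustrative):

import io.netty.util.Recycler;

// Sketch: pooling per-entry objects so the hot path does not allocate
// (and later GC) one object per message.
final class PendingEntry {
    private static final Recycler<PendingEntry> RECYCLER = new Recycler<PendingEntry>() {
        @Override
        protected PendingEntry newObject(Handle<PendingEntry> handle) {
            return new PendingEntry(handle);
        }
    };

    private final Recycler.Handle<PendingEntry> handle;
    long ledgerId;
    long entryId;

    private PendingEntry(Recycler.Handle<PendingEntry> handle) {
        this.handle = handle;
    }

    static PendingEntry get(long ledgerId, long entryId) {
        PendingEntry e = RECYCLER.get(); // reuse a pooled instance when available
        e.ledgerId = ledgerId;
        e.entryId = entryId;
        return e;
    }

    void recycle() {
        handle.recycle(this); // return to the pool; cheapest when allocate and
    }                         // release happen on the same thread
}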


Bookie ledgers scalability

[Chart: BK write latency in ms (average and 99th percentile) as a function of ledgers per bookie (1 to 50,000), on a single bookie at 15 K writes/s]

4. Future

Auto batching

▪ Send messages in batches throughout the system
▪ Transparent to the application
▪ Configurable group timing and size, e.g. 1 ms / 128 KB (see the sketch after this list)
▪ For the same bytes/s of throughput, lowers the txn/s through the system
  › Less CPU usage in brokers/bookies
  › Lower GC pressure
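A minimal sketch of the flush trigger (1 ms / 128 KB are the example values from the slide; class and method names assumed); a real implementation would also arm a timer so a quiet batch still flushes after the deadline:

import java.util.ArrayList;
import java.util.List;

// Sketch: group messages and flush when either the size or the time
// threshold is reached, whichever comes first.
class Batcher {
    private static final int MAX_BATCH_BYTES = 128 * 1024;  // 128 KB
    private static final long MAX_DELAY_NANOS = 1_000_000L; // 1 ms

    private final List<byte[]> batch = new ArrayList<>();
    private int batchBytes = 0;
    private long firstMsgNanos = 0;

    synchronized void add(byte[] msg) {
        if (batch.isEmpty()) {
            firstMsgNanos = System.nanoTime();
        }
        batch.add(msg);
        batchBytes += msg.length;
        if (batchBytes >= MAX_BATCH_BYTES
                || System.nanoTime() - firstMsgNanos >= MAX_DELAY_NANOS) {
            flush();
        }
    }

    private synchronized void flush() {
        // send `batch` as one framed message: fewer txn/s for the same bytes/s
        batch.clear();
        batchBytes = 0;
    }
}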


Low durability

▪ The current throughput bottleneck for bookie writes is journal syncs
▪ Could add more bookies, but at a bigger cost
▪ Some use cases are OK with losing data on rare occasions
▪ Solution:
  › Store data in bookies
    • No memory limitation; can build a big backlog
  › Don't write to the bookie journal
    • Data is stored in the write cache of 2 bookies, plus the broker cache
  › Can lose < 1 min of data in case 1 broker and 2 bookies crash
▪ Higher throughput with fewer bookies
▪ Lower publish latency


5. Q & A
