cloud messaging service: technical overview
Posted on 15-Feb-2017
TRANSCRIPT
Cloud Messaging Service Technical Overview
Presented by Matteo Merli | September 21, 2015
Sections

1. Introduction
2. Architecture
3. Bookkeeper
4. Future
5. Q & A

CMS - Technical Overview
What is CMS

• Hosted Pub / Sub
• Multi-tenant (Auth / Quotas / Load Balancer)
• Horizontally scalable
• Highly available, durable and consistent storage
• Geo-replication
• In production since 2013
[Diagram: CMS cluster — producers and consumers connect to brokers, which store data on bookies; a local ZK and a global ZK coordinate the cluster; replication links to other clusters]
CMS key features

• Multi-tenancy / hosted
  • Operating a system at scale is hard and requires deep understanding of internals
  • Authentication / self-service provisioning / quotas
• SLAs (write latency 2 ms avg, 5 ms 99pct)
  • Maintain the same latencies and throughput under backlog-draining scenarios
• Simple high-level API with clear ordering, durability and consistency semantics
• Geo-replication
  • Single API call to configure the regions to replicate to
• Load balancer: dynamically optimize topic assignment to brokers
• Support a large number of topics
• Store the subscription position
  • Apps don't need to store it
  • Data can be deleted as soon as it's consumed
• Support round-robin distribution across multiple consumers
Workload examples

Challenge             | # Topics | # Producers / topic | # Subscriptions / topic | Produced msg rate / s / topic
Fan-out               | 1        | 1                   | 1 K                     | 1 K
Throughput & latency  | 1        | 1                   | 1                       | 100 K
# Topics & latency    | 1 M      | 1                   | 10                      | 10
Fan-in                | 1        | 1 K                 | 1                       | > 100 K

• Designed to support a wide range of use cases
• Needs to be cost-effective in every case
2. Architecture
Messaging model

• Producers can attach to a topic and send messages to it
• A subscription is a durable resource that receives all messages sent to the topic after its creation
• Subscriptions have a type:
  • "Exclusive": only one consumer is allowed to attach to the subscription. The first consumer decides the type.
  • "Shared": multiple consumers are allowed. Messages are distributed round-robin. No ordering guarantees.
  • "Failover": multiple consumers are allowed, but only one receives messages at a given point in time, while the others are in standby mode.
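The round-robin behavior of a "Shared" subscription can be sketched with a small model. This is purely illustrative (class and method names are made up, not the actual broker code):

```java
import java.util.List;

// Toy model of Shared-subscription dispatch: the broker hands each
// message to the next attached consumer in round-robin order, so no
// per-consumer ordering is guaranteed.
class SharedDispatcher {
    private final List<String> consumers;
    private int next = 0;

    SharedDispatcher(List<String> consumers) {
        this.consumers = consumers;
    }

    // Returns the consumer that receives this message.
    String dispatch(String message) {
        String consumer = consumers.get(next);
        next = (next + 1) % consumers.size();
        return consumer;
    }
}
```

With three consumers attached, successive messages go to c1, c2, c3, then wrap back to c1.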
[Diagram: a topic with Producer-X and Producer-Y publishing to it; Subscription-A (Exclusive) with Consumer-1, Subscription-B (Shared) with Consumers 2-4, Subscription-C (Failover) with Consumer-5]
Client API

▪ Expose messaging model concepts (producer/consumer)
▪ C++ and Java
▪ Connection pooling
▪ Handle recoverable failures transparently (reconnect / resend messages) without compromising ordering guarantees
▪ Sync / async version of every operation
Java producer example

    CmsClient client = CmsClient.create("http://<broker vip>:4080");

    Producer producer = client.createProducer("my-topic");

    // Handles retries in case of failure
    producer.send("my-message".getBytes());

    // Async version:
    producer.sendAsync("my-message".getBytes()).thenRun(() -> {
        // Message was persisted
    });
Java consumer example

    CmsClient client = CmsClient.create("http://<broker vip>:4080");

    Consumer consumer = client.subscribe("my-topic",
            "my-subscription-name", SubscriptionType.Exclusive);

    // Blocks until a message is available
    Message msg = consumer.receive();

    // Do something...

    consumer.acknowledge(msg);
System overview

Broker
• State-less
• Maintains an in-memory cache of messages
• Reads from Bookkeeper on cache miss

Bookkeeper
• Distributed write-ahead log
• Creates many ledgers
  • Append entries
  • Read entries
  • Delete ledger
• Consistent reads
• Single writer (the broker)
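The broker read path (serve from the in-memory cache, fall back to a Bookkeeper read on a miss) can be sketched as follows. This is an illustrative model, not the real broker code; `bookkeeperRead` stands in for an actual BK client call:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongFunction;

// Toy model of the broker's message cache: recent entries are served
// from memory; on a cache miss the entry is fetched from BookKeeper
// and cached for subsequent readers.
class BrokerCache {
    private final Map<Long, byte[]> cache = new HashMap<>();
    private final LongFunction<byte[]> bookkeeperRead; // stand-in for a BK read
    int misses = 0;

    BrokerCache(LongFunction<byte[]> bookkeeperRead) {
        this.bookkeeperRead = bookkeeperRead;
    }

    void put(long entryId, byte[] data) {
        cache.put(entryId, data);
    }

    byte[] read(long entryId) {
        byte[] data = cache.get(entryId);
        if (data == null) {              // cache miss: go to BookKeeper
            misses++;
            data = bookkeeperRead.apply(entryId);
            cache.put(entryId, data);
        }
        return data;
    }
}
```

Because consumers usually read close to the tail, most reads hit the cache and Bookkeeper is only touched when draining a backlog.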
[Diagram: inside the CMS cluster, producer and consumer apps use the CMS client to talk to a broker; the broker hosts the native dispatcher, managed ledger, BK client, global replicators, cache and load balancer, and talks to bookies, the local ZK and the global ZK; replication links to other clusters]
System overview

Native dispatcher
• Async Netty server

Global replicators
• If a topic is global, republish its messages in the other regions

Global Zookeeper
• ZK instance with participants in multiple US regions
• Consistent data store for customer configuration
• Accepts writes with one region down
Partitioned topics

▪ The client lib has a wrapper producer/consumer implementation
▪ No API changes
▪ Producers can decide how to assign messages to partitions:
  ▪ Single partition
  ▪ Round robin
  ▪ Provide a key on the message: the hash of the key determines the partition
  ▪ Custom routing
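Key-based routing can be sketched as below: hashing the message key picks the partition, so all messages with the same key land on the same partition and keep their relative order. Class and method names are illustrative, not the real client API:

```java
// Toy key-based partition router: hash of the key, modulo the
// partition count, selects the partition deterministically.
public class KeyRouter {
    private final int numPartitions;

    public KeyRouter(int numPartitions) {
        this.numPartitions = numPartitions;
    }

    public int choosePartition(String key) {
        // Mask to non-negative before the modulo: hashCode() can be
        // negative (and Math.abs(Integer.MIN_VALUE) stays negative).
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}
```

"Custom routing" would simply replace this function with one supplied by the application.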
[Diagram: an app's producer publishes to topic T1 with 5 partitions (P0-P4); the partitions T1-P0..T1-P4 are spread across Broker 1, Broker 2 and Broker 3]
Partitioned topics

▪ Consumers can use all subscription types with the same semantics
▪ In the "Failover" subscription type, the election is done per partition
▪ Partition assignment is spread evenly across all available consumers
▪ No need for ZK coordination
[Diagram: two consumer apps (Consumer-1 and Consumer-2) subscribe to topic T1 with 5 partitions (C0-C4); the partitions T1-P0..T1-P4, spread across Brokers 1-3, are divided between the two consumers]
3. Bookkeeper
CMS Bookkeeper usage

▪ CMS uses Bookkeeper through a higher-level interface, the ManagedLedger:
  › A single managed ledger represents the storage of a single topic
  › Maintains the list of currently active BK ledgers
  › Maintains the subscription positions, using an additional ledger to checkpoint the last acknowledged message in the stream
  › Caches data
  › Deletes ledgers when all cursors are done with them
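The ManagedLedger idea above can be modeled in a few lines: entries are appended to a log, each cursor (subscription) tracks its last-acknowledged position, and data is trimmed once every cursor has moved past it. This is a toy single-list model, not the real ledger-based implementation:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy managed ledger: one append-only log per topic, one acked
// position per cursor, trimming when all cursors are done.
class ToyManagedLedger {
    private final List<String> entries = new ArrayList<>();
    private long firstEntryId = 0;                       // id of entries.get(0)
    private final Map<String, Long> cursors = new HashMap<>(); // last acked id

    long addEntry(String data) {
        entries.add(data);
        return firstEntryId + entries.size() - 1;
    }

    void openCursor(String name) {
        cursors.put(name, firstEntryId - 1); // nothing acked yet
    }

    void markDelete(String cursor, long entryId) {
        cursors.put(cursor, entryId);
        trim();
    }

    // Drop every entry already acknowledged by all cursors.
    private void trim() {
        long minAcked = cursors.values().stream()
                .min(Long::compare).orElse(firstEntryId - 1);
        while (firstEntryId <= minAcked && !entries.isEmpty()) {
            entries.remove(0);
            firstEntryId++;
        }
    }

    int backlogSize() {
        return entries.size();
    }
}
```

This is what lets CMS delete data as soon as it is consumed: the slowest cursor alone decides how much backlog must be retained.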
Bookie internal structure

• Writes go both to the journal and to ledger storage (on different devices)
• Ledger storage writes are fsynced periodically
• Reads come only from ledger storage
• Entries are interleaved in entry log files
• Ledger indexes are used to find entry offsets
Bookkeeper issues

▪ Performance degrades when writing to many ledgers at the same time
▪ Under heavy reads, the ledger storage device gets slow and impacts writes
▪ Ledger storage flushes need to fsync many ledger index files each time
Bookie storage improvements

• Writes go both to the journal and to an in-memory write cache
• Entries are periodically flushed
• Entries are sorted by ledger, so they are sequential on disk (per flush period)
• Since entries are sequential, we added a read-ahead cache
• The location index is mostly kept in memory and only updated during flush
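The sorted flush can be sketched as follows: entries arrive interleaved across ledgers, but each flush period reorders them by (ledgerId, entryId) so every ledger's data lands sequentially on disk. A minimal sketch, not the actual bookie code:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// An entry is identified by its ledger and its position in that ledger.
record Entry(long ledgerId, long entryId) {}

// Toy write cache: accumulate entries in arrival order, emit them in
// sorted on-disk order at flush time.
class WriteCache {
    private final List<Entry> cache = new ArrayList<>();

    void add(Entry e) {
        cache.add(e);
    }

    // Returns the entries in the order they are written to disk
    // for this flush period, then clears the cache.
    List<Entry> flushOrder() {
        List<Entry> sorted = new ArrayList<>(cache);
        sorted.sort(Comparator.comparingLong(Entry::ledgerId)
                              .thenComparingLong(Entry::entryId));
        cache.clear();
        return sorted;
    }
}
```

Sequential layout is what makes the read-ahead cache effective: once one entry of a ledger is read, the following entries of the same ledger are adjacent on disk.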
Bookkeeper write latency

▪ After hardware, the next limit to achieving low latency is JVM GC
▪ GC pauses are unavoidable; try to keep them around ~50 ms and as infrequent as possible
  › Switched BK client and servers to Netty pooled ref-counted buffers and direct memory, to hide the payloads from GC and eliminate copies
  › Extensively profiled allocations and substantially reduced per-entry object allocations
    • Recycler pattern to pool objects (very efficient for same-thread allocate/release)
    • Primitive collections
    • Array queues instead of linked queues in executors
    • Open hash maps instead of linked hash maps
    • BTree instead of ConcurrentSkipList
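The Recycler pattern mentioned above can be sketched with a thread-local free list: per-entry objects are returned to the current thread's pool on release and reused on the next allocation, so the same-thread allocate/release cycle produces no garbage. Netty's io.netty.util.Recycler works along these lines; this is a simplified stand-in, not its actual implementation:

```java
import java.util.ArrayDeque;

// Toy recycled object: a per-thread free list avoids allocating a new
// instance for every entry that passes through the pipeline.
final class PooledEntry {
    private static final ThreadLocal<ArrayDeque<PooledEntry>> POOL =
            ThreadLocal.withInitial(ArrayDeque::new);

    long ledgerId;
    long entryId;

    private PooledEntry() {}

    static PooledEntry get(long ledgerId, long entryId) {
        PooledEntry e = POOL.get().pollFirst();
        if (e == null) {
            e = new PooledEntry(); // pool empty: allocate once
        }
        e.ledgerId = ledgerId;
        e.entryId = entryId;
        return e;
    }

    // Return the object to the current thread's pool for reuse.
    void recycle() {
        POOL.get().addFirst(this);
    }
}
```

Because the pool is thread-local, get/recycle need no synchronization on the hot path, which is why same-thread allocate/release is the cheap case.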
Bookie ledgers scalability

[Chart: BK write latency in ms (avg and 99pct, y-axis 0-4 ms) vs. number of ledgers per bookie (1, 1000, 5000, 10000, 20000, 50000), on a single bookie at 15 K writes/s]
4. Future
Auto batching

▪ Send messages in batches throughout the system
▪ Transparent to the application
▪ Configurable group timing and size, e.g. 1 ms / 128 KB
▪ For the same bytes/s throughput, lowers the txn/s through the system
  › Less CPU usage in brokers/bookies
  › Lower GC pressure
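The size-triggered half of auto batching can be sketched as below (the companion group timer, e.g. the 1 ms limit, is omitted for brevity). Names are illustrative, not the real client API; the sink is a plain java.util.function.Consumer:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Toy batcher: buffer messages until the batch reaches a byte limit,
// then emit all of them in one downstream call. A real implementation
// would also flush when a group timer (e.g. 1 ms) fires.
class Batcher {
    private final int maxBatchBytes;
    private final Consumer<List<byte[]>> flusher;
    private final List<byte[]> pending = new ArrayList<>();
    private int pendingBytes = 0;

    Batcher(int maxBatchBytes, Consumer<List<byte[]>> flusher) {
        this.maxBatchBytes = maxBatchBytes;
        this.flusher = flusher;
    }

    void send(byte[] msg) {
        pending.add(msg);
        pendingBytes += msg.length;
        if (pendingBytes >= maxBatchBytes) {
            flush(); // size limit reached: one call carries many messages
        }
    }

    void flush() {
        if (!pending.isEmpty()) {
            flusher.accept(new ArrayList<>(pending));
            pending.clear();
            pendingBytes = 0;
        }
    }
}
```

This is where the txn/s reduction comes from: for the same bytes/s, the broker and bookies see one request per batch instead of one per message.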
Low durability

▪ The current throughput bottleneck for bookie writes is journal syncs
▪ Could add more bookies, but at a bigger cost
▪ Some use cases are OK with losing data on rare occasions
▪ Solution
  › Store data in bookies
    • No memory limitation, can build a big backlog
  › Don't write to the bookie journal
    • Data is stored in the write cache of 2 bookies + the broker cache
  › Can lose < 1 min of data if 1 broker and 2 bookies crash
▪ Higher throughput with fewer bookies
▪ Lower publish latency
5. Q & A