Building Distributed Applications with Apache Zookeeper
Alex Ehrnschwender | Game Server Engineer at DeNA
What is Zookeeper?
“ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.”
Zookeeper Wiki
ZooKeeper: A Coordination Service for Distributed Applications
Coordination & synchronization for distributed processes
Logical namespacing implemented by a hierarchy (tree) of znodes
Replicated in-memory over multiple hosts for reliability, availability, and performance
Simple API of CRUD & basic tree operations for client integration
Zookeeper: Reliability & Consistency
Distributed ensemble with automatic leader election through quorum
Replicated in-memory on every instance with snapshot writes to disk
Client TCP connection maintained to any node with failover support
Guaranteed atomicity & sequential consistency
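The quorum requirement above comes down to simple majority arithmetic. A minimal sketch, assuming a plain majority quorum; the class and method names are illustrative and not part of the ZooKeeper API:

```java
// Sketch only: majority-quorum arithmetic, not part of the ZooKeeper client API.
final class QuorumMath {
    /** Smallest number of servers that forms a strict majority of the ensemble. */
    static int quorumSize(int ensembleSize) {
        if (ensembleSize < 1) {
            throw new IllegalArgumentException("ensemble must have at least one server");
        }
        return ensembleSize / 2 + 1;
    }

    /** Number of servers that can fail while a majority is still reachable. */
    static int tolerableFailures(int ensembleSize) {
        return ensembleSize - quorumSize(ensembleSize);
    }

    public static void main(String[] args) {
        for (int n : new int[] {3, 4, 5, 7}) {
            System.out.printf("ensemble=%d quorum=%d tolerates=%d%n",
                    n, quorumSize(n), tolerableFailures(n));
        }
    }
}
```

A 4-node ensemble tolerates the same single failure as a 3-node one, which is why odd ensemble sizes are the usual choice.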
Zookeeper: Watches & Ephemeral nodes
Each znode carries a stat structure with version numbers (version, cversion, aversion) & timestamps (ctime, mtime)
Watches
● Client-initiated subscriptions to znodes
● A change to a watched znode triggers a one-time notification to subscribed clients; a client re-registers the watch to keep receiving updates
Ephemeral Nodes
● Backed by a client session and deleted when client session ends
● Cannot have children
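The semantics above can be illustrated with a toy in-memory model. This is a sketch only, not the ZooKeeper client: ToyZnodeStore and all of its methods are hypothetical, and only the event names mirror ZooKeeper's NodeDataChanged and NodeDeleted event types.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Toy in-memory model of watch and ephemeral-node semantics.
// It is NOT the ZooKeeper client; every name here is hypothetical.
final class ToyZnodeStore {
    private final Map<String, String> data = new HashMap<>();
    private final Map<String, Long> ephemeralOwner = new HashMap<>(); // path -> session id
    private final Map<String, List<Consumer<String>>> watches = new HashMap<>();

    /** Creates a node; a non-null ownerSession makes it ephemeral. */
    void create(String path, String value, Long ownerSession) {
        data.put(path, value);
        if (ownerSession != null) {
            ephemeralOwner.put(path, ownerSession);
        }
    }

    /** Reads a value and optionally registers a one-time watch on the path. */
    String getData(String path, Consumer<String> watcher) {
        if (watcher != null) {
            watches.computeIfAbsent(path, p -> new ArrayList<>()).add(watcher);
        }
        return data.get(path);
    }

    void setData(String path, String value) {
        data.put(path, value);
        fire(path, "NodeDataChanged");
    }

    /** Ending a session deletes every ephemeral node that session owns. */
    void closeSession(long session) {
        List<String> owned = new ArrayList<>();
        for (Map.Entry<String, Long> e : ephemeralOwner.entrySet()) {
            if (e.getValue() == session) {
                owned.add(e.getKey());
            }
        }
        for (String path : owned) {
            ephemeralOwner.remove(path);
            data.remove(path);
            fire(path, "NodeDeleted");
        }
    }

    /** Watches are removed before delivery, so each one fires at most once. */
    private void fire(String path, String event) {
        List<Consumer<String>> registered = watches.remove(path);
        if (registered == null) {
            return;
        }
        for (Consumer<String> w : registered) {
            w.accept(event + " " + path);
        }
    }

    public static void main(String[] args) {
        ToyZnodeStore store = new ToyZnodeStore();
        List<String> events = new ArrayList<>();
        store.create("/workers/w1", "idle", 42L);
        store.getData("/workers/w1", events::add);
        store.setData("/workers/w1", "busy"); // fires the watch
        store.setData("/workers/w1", "idle"); // no watch registered, nothing fires
        store.getData("/workers/w1", events::add);
        store.closeSession(42L);              // ephemeral node removed, watch fires
        System.out.println(events);
    }
}
```

The second setData call produces no event, which is the point of the sketch: a client that wants continuous updates has to re-register after each notification.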
Zookeeper: But… why?
“Because of the difficulty of implementing these kinds of services, applications initially usually skimp on them, which make them brittle in the presence of change and difficult to manage. Even when done correctly, different implementations of these services lead to management complexity when the applications are deployed.”
Zookeeper Wiki
Zookeeper: Advantages for Backing a Server Cluster
Server workers can become cluster-aware
So much out-of-the-box that would be duplicated with a custom solution
Extremely fast reads; performs best on read-dominant workloads (around 10:1 reads to writes)
Small footprint - An ensemble of only 5-7 zk instances can serve the coordination needs of several large production applications
Centralized event broadcasting & failure detection (heartbeat)
Zookeeper: Common Use Cases
● Configuration Management
● Service Discovery
● Distributed Cloud-Based File Systems
● Internal DNS Management
● Master (Leader) Election and Voting
● Messaging Queue
● Event Broadcasting & Notification
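For the master (leader) election use case, a common recipe has each candidate create an ephemeral sequential znode; the lowest sequence number leads, and every other candidate watches the node just ahead of it. The sketch below covers only the decision logic on a list of child names. The "n_" prefix and all class and method names are illustrative, and no ZooKeeper calls are made.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Sketch of the decision logic in the sequential-ephemeral election recipe.
// Child names and the "n_" prefix are illustrative; no ZooKeeper calls are made.
final class ElectionLogic {
    /** Parses the numeric suffix that a sequential create appends to the node name. */
    static int sequenceOf(String childName) {
        return Integer.parseInt(childName.substring(childName.lastIndexOf('_') + 1));
    }

    static List<String> sortedBySequence(List<String> children) {
        List<String> sorted = new ArrayList<>(children);
        sorted.sort(Comparator.comparingInt(ElectionLogic::sequenceOf));
        return sorted;
    }

    /** The candidate with the lowest sequence number is the leader. */
    static boolean isLeader(List<String> children, String me) {
        return sortedBySequence(children).get(0).equals(me);
    }

    /** The node to watch: the one immediately ahead of `me`; empty if `me` leads. */
    static Optional<String> predecessorOf(List<String> children, String me) {
        List<String> sorted = sortedBySequence(children);
        int i = sorted.indexOf(me);
        if (i < 0) {
            throw new IllegalArgumentException(me + " is not among the children");
        }
        return i == 0 ? Optional.empty() : Optional.of(sorted.get(i - 1));
    }

    public static void main(String[] args) {
        List<String> children = Arrays.asList("n_0000000007", "n_0000000003", "n_0000000005");
        for (String c : children) {
            System.out.println(c + " leader=" + isLeader(children, c)
                    + " watches=" + predecessorOf(children, c).orElse("(none)"));
        }
    }
}
```

Watching only the immediate predecessor, rather than the leader node, avoids waking every candidate when the leader's session ends.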
Use Case Example #1 - Managing Redis Shards
ZK Use Case Example #1 - Pinterest
Pinterest stores their entire follower model inside sharded Redis instances (~9000 Redis shards, multiple instances per core)
Shard configuration is stored and managed by Zookeeper
Clients look up and watch shard locations, then retrieve data from the matching shard
Master-slave failover triggers updates to znode representation (slave address replaces master)
Vertical splitting of data is broadcast to watching clients
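On the client side, the lookup-and-watch pattern might look roughly like the sketch below. Everything here is hypothetical: the class name, the "shardId=host:port" line format, and the host names are assumptions for illustration, not Pinterest's code or data format. In a real client, onConfigChanged would be called from a watch callback after re-reading the config znode.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical client-side shard directory; not Pinterest's actual code.
final class ShardDirectory {
    private final AtomicReference<Map<Integer, String>> shardToHost =
            new AtomicReference<>(Collections.<Integer, String>emptyMap());

    /** Parses lines such as "12=redis-a:6379" (an assumed format) and swaps the map atomically. */
    void onConfigChanged(String config) {
        Map<Integer, String> next = new HashMap<>();
        for (String raw : config.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            String[] kv = line.split("=", 2);
            if (kv.length != 2) {
                throw new IllegalArgumentException("malformed shard line: " + line);
            }
            next.put(Integer.parseInt(kv[0].trim()), kv[1].trim());
        }
        shardToHost.set(Collections.unmodifiableMap(next));
    }

    /** Readers always see a complete map, either the old one or the new one. */
    String hostFor(int shardId) {
        String host = shardToHost.get().get(shardId);
        if (host == null) {
            throw new IllegalStateException("no host known for shard " + shardId);
        }
        return host;
    }

    public static void main(String[] args) {
        ShardDirectory directory = new ShardDirectory();
        directory.onConfigChanged("0=redis-master-0:6379\n1=redis-master-1:6379\n");
        System.out.println("before failover: " + directory.hostFor(1));
        // Failover: the slave address replaces the master in the stored config.
        directory.onConfigChanged("0=redis-master-0:6379\n1=redis-slave-1:6379\n");
        System.out.println("after failover:  " + directory.hostFor(1));
    }
}
```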
Use Case Example #2 - HBase Cluster Configuration
Code Examples
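import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;

// Both methods assume `zk` is a connected ZooKeeper handle held by the enclosing class.

// Registers a member as an ephemeral znode, so it disappears when the client session ends.
public void join(String groupName, String memberName)
        throws KeeperException, InterruptedException {
    String path = "/" + groupName + "/" + memberName;
    String createdPath = zk.create(path,
            null /* data */,
            ZooDefs.Ids.OPEN_ACL_UNSAFE,
            CreateMode.EPHEMERAL);
    System.out.println("Created " + createdPath);
}

// Creates the group as a persistent znode; it must exist before members can join.
public void create(String groupName)
        throws KeeperException, InterruptedException {
    String path = "/" + groupName;
    String createdPath = zk.create(path,
            null /* data */,
            ZooDefs.Ids.OPEN_ACL_UNSAFE,
            CreateMode.PERSISTENT);
    System.out.println("Created " + createdPath);
}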
Code Examples (cont.)
import java.util.List;

// Deletes a group and its members. A version of -1 matches any znode version.
public void delete(String groupName)
        throws KeeperException, InterruptedException {
    String path = "/" + groupName;
    try {
        // Children go first: a znode that still has children cannot be deleted.
        List<String> children = zk.getChildren(path, false);
        for (String child : children) {
            zk.delete(path + "/" + child, -1); /* child */
        }
        zk.delete(path, -1); /* parent */
    } catch (KeeperException.NoNodeException e) {
        System.out.printf("Group %s does not exist\n", groupName);
    }
}

// Prints the members of a group; `false` means no watch is set on the child list.
public void list(String groupName)
        throws KeeperException, InterruptedException {
    String path = "/" + groupName;
    try {
        List<String> children = zk.getChildren(path, false);
        for (String child : children) {
            System.out.println(child);
        }
    } catch (KeeperException.NoNodeException e) {
        System.out.printf("Group %s does not exist\n", groupName);
    }
}
Performance
[Charts: Standalone (ops/sec) and 3-Node Ensemble (ops/sec)]
Reference:
https://wiki.apache.org/hadoop/ZooKeeper/ServiceLatencyOverview
Sample Configuration (zoo.cfg)
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888
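In this file, initLimit and syncLimit are counted in ticks, so the effective timeouts are multiples of tickTime (milliseconds). In each server.N line the first port carries quorum traffic between peers and the second is used for leader election. A small sketch that does the arithmetic for the sample above; the class and method names are illustrative:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Sketch: derives the effective timeouts from the sample zoo.cfg shown above.
final class ZooCfgTimings {
    static final String SAMPLE =
            "tickTime=2000\n"
            + "dataDir=/var/lib/zookeeper\n"
            + "clientPort=2181\n"
            + "initLimit=5\n"
            + "syncLimit=2\n"
            + "server.1=zoo1:2888:3888\n"
            + "server.2=zoo2:2888:3888\n"
            + "server.3=zoo3:2888:3888\n";

    static Properties parse(String cfg) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(cfg));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    /** Time a follower has to connect and sync to the leader at startup: initLimit ticks. */
    static long initTimeoutMs(Properties p) {
        return Long.parseLong(p.getProperty("tickTime")) * Long.parseLong(p.getProperty("initLimit"));
    }

    /** How far a follower may lag behind the leader before it is dropped: syncLimit ticks. */
    static long syncTimeoutMs(Properties p) {
        return Long.parseLong(p.getProperty("tickTime")) * Long.parseLong(p.getProperty("syncLimit"));
    }

    /** Counts the server.N entries, i.e. the ensemble size. */
    static int ensembleSize(Properties p) {
        int count = 0;
        for (String key : p.stringPropertyNames()) {
            if (key.startsWith("server.")) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Properties p = parse(SAMPLE);
        System.out.println("init timeout ms: " + initTimeoutMs(p));
        System.out.println("sync timeout ms: " + syncTimeoutMs(p));
        System.out.println("ensemble size:   " + ensembleSize(p));
    }
}
```

With the sample values, followers get 10 seconds to connect and sync at startup and may fall at most 4 seconds behind the leader.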
Exhibitor: A ZK Monitoring & Administration Tool from Netflix
Centralization & externalization of zk ensemble configuration* (S3/remote FS)
Web UI & REST API for ease of management
Instance monitoring with automatic configuration updates
Rolling ensemble changes while maintaining quorum
Miscellaneous administration tasks (backup/restore, log & snapshot cleanup)
* Configuration management for a configuration manager.... so meta!
Questions?
Appendix
Zookeeper Atomic Broadcast (ZAB) Algorithm
● Protocol for managing atomic updates to replicas
● Responsible for:
o Agreeing on an ensemble leader
o Synchronizing replicas
o Managing transactions and broadcasts
o Recovery of state
● ZXIDs & transactional ordering
● Guarantees:
o Local & global primary order
o Primary integrity
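The transactional ordering behind ZXIDs comes from their layout: a zxid is a 64-bit number whose high 32 bits hold the leader epoch and whose low 32 bits hold a counter within that epoch. A minimal sketch of that packing; the class and method names are illustrative, not ZooKeeper's internal API:

```java
// Sketch of the zxid layout: high 32 bits = epoch, low 32 bits = counter.
// Class and method names are illustrative, not ZooKeeper's internal API.
final class Zxid {
    static long make(int epoch, int counter) {
        return ((long) epoch << 32) | (counter & 0xFFFFFFFFL);
    }

    static int epochOf(long zxid) {
        return (int) (zxid >>> 32);
    }

    static int counterOf(long zxid) {
        return (int) zxid; // keeps the low 32 bits
    }

    public static void main(String[] args) {
        long lastOfEpoch1 = make(1, 1000);
        long firstOfEpoch2 = make(2, 0);
        // Any transaction from a newer epoch orders after every one from an older epoch.
        System.out.println("newer epoch orders later: " + (firstOfEpoch2 > lastOfEpoch1));
        System.out.printf("zxid=0x%x epoch=%d counter=%d%n",
                lastOfEpoch1, epochOf(lastOfEpoch1), counterOf(lastOfEpoch1));
    }
}
```

Because the epoch sits in the high bits, comparing two zxids as plain numbers gives the global order across leader changes.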
References
● http://engineering.pinterest.com/post/55272557617/building-a-follower-model-from-scratch
● http://zookeeper.apache.org/doc/trunk/zookeeperOver.html
● http://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html
● https://github.com/Netflix/exhibitor/wiki
● http://www.tcs.hut.fi/Studies/T-79.5001/reports/2012-deSouzaMedeiros.pdf
● http://web.stanford.edu/class/cs347/reading/zab.pdf
● http://highscalability.com/blog/2008/7/15/zookeeper-a-reliable-scalable-distributed-coordination-syste.html
● https://wiki.apache.org/solr/SolrCloud
● http://www.slideshare.net/scottleber/apache-zookeeper