introduc)on*to* zookeeper - o'reilly mediaassets.en.oreilly.com/1/event/61/manage...
TRANSCRIPT
Introduc)on to Zookeeper
Copyright 2011 Cloudera Inc. All rights reserved
Today’s speaker – Tom Hanlon
• [email protected]• Senior Instructor at Cloudera
2
Copyright 2011 Cloudera Inc. All rights reserved
How I became aware of Zookeeper, and why that ma<ers to you
• HBASE• Uses Zookeeper to manage HA
Copyright 2011 Cloudera Inc. All rights reserved
What’s HBASE
• Hbase is a massively scalable distributed data store, easily scales across many many nodes
• Uses Zookeeper to manage failures and to provide informa)on about the current state of the system
Copyright 2011 Cloudera Inc. All rights reserved
The Problem
• Managing Distributed Services is a challenge• A recurring challenge with similar repea)ng issues• What issues ? • Who is in charge?• Am I in charge?• Are we both in charge? • What do I do when I start back aMer failing ?• what is the state of process X ?• Elec)on of Primary out of group of peers
Copyright 2011 Cloudera Inc. All rights reserved
The Solu)on...
• Modify your apps to deal with these issues• Where to start ?• Not trivial• 2 systems, Ac)ve and Standby might suffice• What about split brain ?
Copyright 2011 Cloudera Inc. All rights reserved
The Solu)on con)nued
• You will quickly find that you need a similar set of services• Distributed lock management• Master Elec)on. (Slave Discovery ?)• Informa)on sharing, )me based• Some sort of hierarchy management• Service B is dead, do not offer service C• Includes startup hierarchy management
Copyright 2011 Cloudera Inc. All rights reserved
Ad-‐Hoc Solu)ons ... not good
• A collec)on of ad-‐hoc solu)ons is problema)c in the long term
• Like services should be managed by the same tool• A single place of reference is good
Copyright 2011 Cloudera Inc. All rights reserved
Zookeeper has what you need
Well a significant por)on of what you need.
Zookeeper can provide simple services that allow client services to coordinate or synchronize ac)vi)es and provide a reference point to agree on basic informa)on about the state of the current environment
Copyright 2011 Cloudera Inc. All rights reserved
Zookeeper to the Rescue
• Zookeeper h]p://zookeeper.apache.org/• Built by Yahoo as a distributed system to manage distributed services in distributed systems.
• Clean focussed Open Source tool• Does one thing and does it well
• A highly available distributed coordina)on service
Copyright 2011 Cloudera Inc. All rights reserved
Distributed Systems Need the same thing over and over again• In order to be highly available we must deal with• Leader elec)on• Quorum• temporal dependencies• consensus• shared locks• etc
Copyright 2011 Cloudera Inc. All rights reserved
Zookeeper has goals not unlike your own
ReliabilityAvailabilitySimplicityEase of managementClarity
Copyright 2011 Cloudera Inc. All rights reserved
Distributed Lock Server.. and more
• highly available, distributed metadata filesystem• Runs on 3 or 5 (or more) machines• Transparently elects Master• Other machines are read-only replicas
Copyright 2011 Cloudera Inc. All rights reserved
Zookeeper as a Metadata Filesystem
• Zookeepers model resembles a standard tree based filesystem
• Nodes can store up to 1MB of data• Nodes can have child nodes• Nodes can not be renamed• No hard or soft links• Guarantees• Ordered updates, conditional updates, and watches
Copyright 2011 Cloudera Inc. All rights reserved
A Distributed metadata filesystem
• Over the network creation of nodes• Over the network updating of metadata associated with
a node• Conditional updates • Watches *
The distributed nature of Zookeeper is what makes these simple func)ons extremely useful
* Watch no)fica)on similar to linux ino)fy
Copyright 2011 Cloudera Inc. All rights reserved
But wait.. there is more !!!
• Zookeeper can generate NodeNames in order to avoid collisions
• Can build “presence” awareness systems using Ephemeral Nodes• Nodes that exist as long as you are connected
Copyright 2011 Cloudera Inc. All rights reserved
Zookeeper can provide
• Naming • Synchroniza)on• Leader elec)on• Group Management
Copyright 2011 Cloudera Inc. All rights reserved
Zookeeper details
• Presents a filesystem like abstrac)on with a tree of znodes• each node can hold data and other nodes• How much data? limited to 1MB
• Also stored at each node, transac)on ID, crea)on and modified )mestamps, and version number.
Copyright 2011 Cloudera Inc. All rights reserved
Interac)on with znodes
• create• delete• exists• get data• set data• get children• sync
Copyright 2011 Cloudera Inc. All rights reserved
More on Znodes
• All opera)ons are:• Atomic • persistant • ordered• Access Control List per node define who can perform opera)ons
• ACL determines Read Write Delete Permissions
Copyright 2011 Cloudera Inc. All rights reserved
More API• Sequen)al numbering
– i.e. Create node /n/child_X, segng X to lowest unused value
• Condi)onal updates via the version number– A bit like transac)onal shared memory– Do a read, make all your changes, try and commit– If you fail, start again.
Copyright 2011 Cloudera Inc. All rights reserved
Watching a Node
• A watch can be set on a node allowing ac)on to occur when a node is updated
• In fact, when any state is changed on the server, the client can be no)fied via a watch – including session state changes.
• Sessions are persistent across server failures – if the server you are talking to fails, you can transparently nego)ate session con)nua)on with another server.
Copyright 2011 Cloudera Inc. All rights reserved
API features
• Enough power to implement common concurrent objects– Can do compare-‐and-‐swap on version number (and therefore CAS on data in two round trips)
– CAS has consensus number infinity in wait-‐free hierarchy, so theore)cally powerful, for what that’s worth
• Can build queues, semaphores and so on.– Examples later
Copyright 2011 Cloudera Inc. All rights reserved
Yes, but does it scale?• Typical deployment is 5 to 7 machines
– Always choose an odd number for maximum rela)ve fault tolerance
• All updates go through same machine– So single machine is bo]leneck– Par))oning would probably help
• Throughputs are fairly decent– Tens of thousands of updates per second– Works best in heavy read scenarios
Is it Paxos?
• No, not quite• In standard opera)on, protocols look very similar
• Paxos has to cope with arbitrary message re-‐orderings and compe)ng leaders
DEMO....
Ques)ons?
(Thank you for your Ime)
Copyright 2011 Cloudera Inc. All rights reserved
Appendix A: Resources
• Cloudera• h]p://www.cloudera.com/blog• h]p://wiki.cloudera.com/• h]p://www.cloudera.com/blog/2011/03/simple-‐moving-‐average-‐secondary-‐sort-‐and-‐mapreduce-‐part-‐1/
• Zookeeper• h]p://zookeeper.apache.org/
Copyright 2011 Cloudera Inc. All rights reserved
Appendix B: Demo
ZOOKEEPER DEMO
In this demo we will show some of the functionality of Zookeeper and hopefully assist you in getting started with Zookeeper.
Introduction: Zookeeper is written in java, and we can connect to Zookeeper using Java. There is also a command line client and we will use that for our demo. Since Zookeeper presents a distriuted system coordination platform as a filesystem or directory structure there is also a tool that "mounts" that filesystem in userspace. I have no information regarding stability or production use of zkfuse, but worth checking out. There are python (http://www.cloudera.com/blog/2009/05/building-a-distributed-concurrent-queue-with-apache-zookeeper/) and other bindings so that you can start using zookeeper in your application.
Copyright 2011 Cloudera Inc. All rights reserved
Getting started:
For this demo I downloaded zookeeper and gunzipped and untarred in my home directory on my mac.
$ tar xvf zookeeper-3.3.3.tar.gz
This will create a drectory zookeeper-3.3.3$ cd zookeeper-3.3.3
First we will configure a basic Zookeeper config file
$ mv conf/zoo_sample.cfg conf/zoo.cfg
Copyright 2011 Cloudera Inc. All rights reserved
Edit that file so that it resembles this..
# The number of milliseconds of each ticktickTime=2000# The number of ticks that the initial # synchronization phase can takeinitLimit=10# The number of ticks that can pass between # sending a request and getting an acknowledgementsyncLimit=5# the directory where the snapshot is stored.dataDir=/tmp/zoo# the port at which the clients will connectclientPort=2181
Copyright 2011 Cloudera Inc. All rights reserved
Start up Zookeeper in server mode. *note that in real environments you would want 3 Zookeepers working together for High Availability.
$ bin/zkServer.sh start
You will get a warning 2011-07-27 17:56:40,231 - WARN [main:QuorumPeerMain@105] - Either no config or no quorum defined in config, running in standalone mode
This warning reflects that Zookeeper is runing in Standalone mode, a single instance.
Zookeeper is a java process so we can verify easilly that it is running by running jps as the user who launched zookeeper, or as root.
$ jps14761 Jps14743 QuorumPeerMain
This shows that process ID 14743 is the Zookeeper server that we launched, technically reffered to as a Quorum, but in our case a Quorum of 1.
Copyright 2011 Cloudera Inc. All rights reserved
With Zookeeper running we can explore some of the features it provides.
Administrative commands: Zookeeper provides a set of 4-letter words, or administrative information commands. Fore example.
$ echo ruok | nc localhost 2181imok ## Zookeeper response
Note that there is no newline after "imok" so it is easy to lose this response visually in your terminal.Also note that the terminal that you started Zookeeper in will still be getting messages from Zookeeper. For puposes of demonstration I like to keep that terminal output available, so launch another terminal window to run the rest of the demos, such as the 'ruok' and other 4-letter words.
If Zookeeper was not running or was not "OK" you would get no response, silence.
Copyright 2011 Cloudera Inc. All rights reserved
MORE 4 LETTER WORDS
'stat' returns statisticsIn order to use the 4-letter words we will echo the word, and then pipe it through netcat to our zookeeper note that this could be a remote instance in our case we will connect locally.
STAT show statistics
$ echo stat | nc localhost 2181Zookeeper version: 3.3.3-1073969, built on 02/23/2011 22:27 GMTClients: /0:0:0:0:0:0:0:1%0:51486[0](queued=0,recved=1,sent=0)
Latency min/avg/max: 0/0/0Received: 6Sent: 5Outstanding: 0Zxid: 0x0Mode: standaloneNode count: 4
Copyright 2011 Cloudera Inc. All rights reserved
DUMP shows session and ephemeral nodes
$ echo dump | nc localhost 2181SessionTracker dump:Session Sets (0):ephemeral nodes dump:Sessions with Ephemerals (0):
Copyright 2011 Cloudera Inc. All rights reserved
ENVI shows the environment
$ echo envi | nc localhost 2181Environment:zookeeper.version=3.3.3-1073969, built on 02/23/2011 22:27 GMThost.name=172.16.1.220java.version=1.6.0_17java.vendor=Apple Inc.java.home=/System/Library/Frameworks/JavaVM.framework/Versions/1.6.0/Homejava.class.path=bin/../build/classes:bin/../build/lib/*.jar:bin/../zookeeper-3.3.3.jar:bin/../lib/log4j-1.2.15.jar:bin/../lib/jline-0.9.94.jar:bin/../src/java/lib/*.jar:bin/../conf:java.library.path=.:/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/javajava.io.tmpdir=/var/folders/w4/w4mbu0k8FlG9o1hKAIOarU+++TI/-Tmp-/java.compiler=<NA>os.name=Mac OS Xos.arch=x86_64os.version=10.6.4user.name=thanlonuser.home=/Users/thanlonuser.dir=/Users/thanlon/zookeeper/zookeeper-3.3.3
Copyright 2011 Cloudera Inc. All rights reserved
See the documentation for further details regarding 4 letter words.
For the rest of the demonstration you should connect to zookeeper with the Command line tool. When you run this command you will get a bunch of INFO level information, check to verify that you do not have WARN or more severe messages.
$ bin/zkCli.sh
Run "ls /" to get information of current 'nodes'
[zk: localhost:2181(CONNECTED) 1] ls /[zookeeper]
You see that Zookeeper uses Zookeeper to manage Zookeeper. Nepotism to be sure, but it makes sense here.
Copyright 2011 Cloudera Inc. All rights reserved
Launch another terminal and connect to Zookeeper with the CLI tool, keep both windows open.
Create a node that indicates that a service is starting and child services can begin starting. [zk:… ] create /webapp 'starting ' null
* note the space after starting I think there might be issues with the CLI,
We are creating a znode with the data of "starting" and an Access Control List of null. So anyone can view this node, anyone can create child nodes and anyone can extract the data piece from this node.
Clean up the CLI artifacts..set /webapp starting cZxid = 0x52ctime = Wed Jul 27 20:32:18 EDT 2011mZxid = 0x54mtime = Wed Jul 27 20:37:29 EDT 2011pZxid = 0x52cversion = 0dataVersion = 2aclVersion = 0....
Copyright 2011 Cloudera Inc. All rights reserved
Do a get to confirm..
get /webapp
Perhaps our webapp is not running unless the backend DB is up. And perhaps there is a DBmaster and a DBslave. So from the other window create those.
zk: localhost:2181(CONNECTED) 10] create /webapp/db_master "master " null
Suppose the DB servers are peers, and the first to start is instructed to be the master, and the second is instructed to be the slave.
So perhaps the second one tries to grab the master role..create /webapp/db_master "master " null Node already exists: /webapp/db_master
The node creation would fail and the server would then register as the slave.
create /webapp/dbslave "master " null Created /webapp/dbslave
Copyright 2011 Cloudera Inc. All rights reserved
Once our webapp has the services it needs to run, it can register as running, or perhaps put more of a payload in the data piece and note the IP address.
set /webapp "192.168.1.23"
get /webapp"192.168.1.23"
#### End of DEMO ######