puppet at github / chatops
DESCRIPTION
"Puppet at GitHub / ChatOps" from PuppetConf 2012, by Jesse Newland
Video of "Puppet at GitHub": http://bit.ly/WVS3vQ
Learn more about Puppet: http://bit.ly/QQoAP1
Abstract: Ops at GitHub has a unique challenge - keeping up with the rabid pace of features and products that the GitHub team develops. In this talk, we'll focus on tools and techniques we use to rapidly and confidently ship infrastructure changes/features with Puppet using Puppet-Rspec, CI, Puppet-Lint, branch puppet deploys, and Hubot.
Speaker Bio: Jesse Newland does Ops at GitHub. His favorite hobby is SPOF whack-a-mole, followed closely by guitar and piano. Prior to GitHub, Jesse was the CTO at Rails Machine where he ran a large private cloud and managed several hundred production Ruby on Rails applications using Puppet. To the delight and/or chagrin of the Puppet community, Jesse is to blame for Moonshine, the Ruby DSL for Puppet before Puppet had a Ruby DSL.
TRANSCRIPT
Jesse Newland
jnewland
Hey errbody, my name is Jesse Newland. I do ops at GitHub.
Puppet at GitHub
And today I’m going to be talking about Puppet at GitHub.
Really, I’m telling a story in two parts.
All of the amazing Puppet OSS projects @rodjek has written but doesn’t want to talk about
First, I’ll be talking about all of the amazing Puppet open source projects Tim Sharpe has written but doesn’t want to talk about, and how we use them at GitHub.
And then I want to introduce you to the star of the GitHub Ops team, Hubot, and tell you a little bit about something we’ve been calling ChatOps.
The Setup
But, before I get into all of that, I'm actually going to talk about an upcoming talk, one by a coworker of mine at GitHub. Will Farrington is going to be speaking tomorrow at 2:45pm about The Setup, our Puppet-powered GitHubber laptop management solution. It's amazing. It's one of the coolest uses of Puppet I've ever seen, and it's going to completely change the way you think about your development environment.
But I’m not going to be talking about any of that today.
So, yeah, go to Will's talk tomorrow. You won't be disappointed.
Puppet at GitHub
So I guess you could say that I’m talking about
THE REST of Puppet at GitHub
the rest of Puppet at GitHub. For the scope of this talk, I’m going to be talking about the Puppet infrastructure that runs github.com.
4 years, >100k LOC
We’ve been managing GitHub’s infrastructure with Puppet for 4 years, since the move to Rackspace. There’s a ton of code, and we’re developing at a rapid pace.
Simple
But we are obsessed with keeping our Puppet deployment simple.
Single Master
We use a single puppetmaster running lots of unicorns. Nothing fancy. It works for now.
However, we will need to scale this tier up or out in about 6 months if the trends look right. We’ll probably switch to two load balanced puppetmasters around that time.
# cat /etc/cron.d/puppet
13 * * * * root /usr/bin/
cron FTW
We don’t run the agent, but rather run puppet on cron every hour in combination with runs triggered via Hubot (more on that later).
No ENC
We don’t use an external node classifier
$ cat manifests/nodes/janky.rscloud.pp
node /^janky\d+\.rscloud\.github\.com$/ {
  github::role::janky { 'janky':
    public_address => dns_lookup($fqdn),
    nginx_hostname => $fqdn,
  }
}
([a-z0-9\-_]+)(\d+)([a-z]?)\.(.*)\.github.com
Instead, we give nodes DNS names that adhere to a naming convention that maps them to a pre-defined role
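To make the convention concrete, here is a minimal Ruby sketch of how a hostname could be split into role, index, suffix and site. It is not GitHub's code: the `HOSTNAME_PATTERN` constant and the `parse_hostname` helper are hypothetical, and the pattern is an anchored variant of the one on the slide, with a lazy first group so that multi-digit indexes such as `janky12` stay in one piece.

```ruby
# Hypothetical helper: split a conventionally named host into its parts.
# The pattern is an anchored, lazy variant of the slide's regex (an assumption).
HOSTNAME_PATTERN = /\A([a-z0-9\-_]+?)(\d+)([a-z]?)\.(.+)\.github\.com\z/

def parse_hostname(fqdn)
  match = HOSTNAME_PATTERN.match(fqdn)
  return nil unless match

  {
    role:   match[1],       # e.g. "fs" or "janky"
    index:  match[2].to_i,  # numeric part of the name
    suffix: match[3],       # optional trailing letter, "" when absent
    site:   match[4]        # everything between the host part and github.com
  }
end

p parse_hostname('fs1a.rs.github.com')
p parse_hostname('janky12.rscloud.github.com')
p parse_hostname('example.org') # does not follow the convention, so nil
```

In the talk the mapping itself lives in Puppet node definitions like the one for janky above; the sketch only illustrates what information the name carries.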
$ head modules/github/manifests/role/janky.pp
define github::role::janky($public_address, $nginx_hostname='', $god=true ) {
github::core { 'janky': }
include github::app::janky
github::nginx { 'janky': }
}
Where the magic happens
Role definitions are where the magic happens. We try to DRY common functionality into our core module and into other simple classes or defines so that role definitions read like a nice summary of what makes this role different from others
augeas { 'my.cnf/avoid_cardinality_skew':
  context => '/files/etc/mysql/my.cnf/mysqld/',
  changes => [
    'set innodb_stats_auto_update 0',
    'set innodb_stats_on_metadata 0',
    'set innodb_stats_on_metadata 64'
  ],
  require => Percona::Server[$::fqdn],
}
Heavy use of augeas
We generally try to avoid templates for configuration files in favor of using Augeas.
Lets us manage the small pieces of configuration we care about and use the OS defaults for the things we don't.
BORING
But I don’t want to just show all of you Puppet code for thirty minutes. That's boring.
What’s interesting about Puppet at GitHub?
I’d rather talk about what's interesting about how we use Puppet at GitHub. And what I think is the most interesting is that we focus heavily on ensuring the Puppet development workflow is easily accessible to everyone at GitHub.
Making Puppet Less Scary
We’re doing our best to make puppet less scary for people that aren’t familiar with it, so they can help the Ops team grow and evolve our infrastructure. We’re doing some things right here, but there’s still a lot of work to do.
I’ve been thinking about this a lot recently as we’ve just had two large infrastructure projects shipped by people that were completely or relatively new to puppet. First, Derek Greentree shipped a Cassandra cluster...
And Adam Roben shipped puppet manifests for our windows build and CI servers.
this is good
This is an awesome trend, and I want it to continue. So I thought I’d talk a bit today about what we’re doing to try to enable even more of this.
Flow just like a (GitHub) Ruby project
For us, an important part of making Puppet development accessible for other developers at GitHub is making the development flow on our puppet codebase as similar as possible to that of any other GitHub Ruby project. That means sticking with some common conventions.
$ ./script/bootstrap
Setup
Like making it as easy to set up as any other project at GitHub
$ cat Gemfile
source :rubygems

gem 'puppet', '2.7.18'
gem 'facter', '1.6.10'
gem 'rspec-puppet', '0.1.2'
gem 'rake', '0.8.7'
gem 'puppet-lint', '0.2.1'
gem 'ruby-augeas', '0.3.0'
gem 'json', '1.5.1'
gem 'fog', '1.3.1'
gem 'librarian-puppet', '0.9.4'
gem 'parallel_tests'
So ruby deps are managed by Bundler
$ cat Puppetfile
forge "http://forge.puppetlabs.com"
mod 'puppetlabs/apt'...
And puppet deps are managed by librarian-puppet, a bundler-like library that manages the puppet modules your infrastructure depends on and installs them directly from GitHub repositories.
I’m of the opinion that the unit of open source currency is no longer a tarball downloaded from something named *forge. It’s a GitHub repo. All of the developers at GitHub feel the same way, so Tim wrote librarian-puppet.
rodjek / librarian-puppet
For those of you keeping score at home, that’s the first of Tim Sharpe’s open source projects that I’ve mentioned. Hi Tim!
Making puppet flow like other projects at GitHub means ensuring we have good editor support for the language
rodjek / vim-puppet
vim-puppet, that’s two.
$ ./script/cibuild
Tests
It means running tests is a simple one-step process
TESTS!
Tests are super important. A solid and easy to use test harness helps build developer confidence in a new language.
Safety net
And tests are a crucial safety net for helping people cut their teeth on Puppet if they haven’t ever touched it before.
should contain_github__firewall_rule('internal_network')
should contain_ssmtp__relay_to('smtp').with_relay_host('smtp')
should contain_file('/etc/logstash/logstash.conf')
should include_class('github::ksplice')
should contain_networking__bond('bond0').with(
  :gateway => '172.22.0.2',
  :arp_ip_target => '172.22.0.2',
  :up_commands => nil
)
rspec-puppet
We use rspec-puppet heavily. If you haven’t used rspec-puppet yet, go check it out right now.
It’s amazing.
There are no less than three talks about it at Puppetconf, so I’m not going to talk about HOW to use it today, just touch a little bit on how WE use it.
rodjek / rspec-puppet
rspec-puppet, that’s three
describe 'github::role::fe' do
  let(:title) { 'fe' }
  let(:node) { 'fe1.rs.github.com' }
  let(:params) { {
    :public_address => '207.97.227.242/27',
    :private_address => '172.22.1.59/22',
    :git_weight => '16'
  } }
  let(:facts) { {
    :ipaddress => '172.22.1.59',
    :operatingsystem => 'Debian',
    :datacenter => 'rackspace-iad2',
  } }

  it do
    should contain_github__core('fe')
    ...
  end
end
role specs are king
We try our best to adequately test our individual puppet modules, but our central and most frequently touched specs exercise our role system. There’s one spec for each role which describes its intended functionality.
These specs focus on critical functionality of each role, and help a great deal to build confidence that we’re not introducing regressions when adding or refactoring functionality or working in other roles.
$ git commit -am "lolbadchange"
modules/github/manifests/role/fe.pp:
err: Could not parse for environment production: Syntax error at 'allow_outbound_syslog'; expected '}' at /Users/jnewland/github/puppet/modules/github/manifests/role/fe.pp:31
modules/github/manifests/role/fe.pp - WARNING: => is not properly aligned on line 626
.git/hooks/pre-commit
For an even faster feedback loop than running specs, all Puppet dev environments automatically get set up with a pre-commit hook that checks for syntax errors and ensures your changes conform to the Puppet style guide.
This has proved amazingly useful for Puppet novices and experts alike: novices find it helps them understand language conventions quickly and guides them towards solutions, and experts use it to catch typos and to help them not look like novices.
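As a rough illustration of what such a hook does, here is a Ruby sketch. It is not the hook from the talk: the helper names and the wiring are assumptions. The only real tools it refers to are `puppet parser validate` (syntax check) and `puppet-lint` (style check), and the demo uses a stub runner so nothing is actually executed.

```ruby
# Sketch of a pre-commit check. The helper names and structure are
# hypothetical; only the two command names come from real tools.

# Keep only Puppet manifests from a list of staged paths.
def staged_manifests(paths)
  paths.select { |path| path.end_with?('.pp') }
end

# Commands to run against one manifest: a syntax check, then a style check.
def checks_for(manifest)
  [
    ['puppet', 'parser', 'validate', manifest],
    ['puppet-lint', manifest]
  ]
end

# Run every check on every staged manifest and return the manifests that
# failed at least one. `runner` takes a command array and returns true on
# success; a real hook could pass ->(cmd) { system(*cmd) }.
def failing_manifests(paths, runner)
  staged_manifests(paths).reject do |manifest|
    checks_for(manifest).map { |cmd| runner.call(cmd) }.all?
  end
end

# Demo with a stub runner that pretends fe.pp has a syntax error.
stub = ->(cmd) { !(cmd.first == 'puppet' && cmd.last.end_with?('fe.pp')) }
staged = ['modules/github/manifests/role/fe.pp', 'README.md',
          'modules/github/manifests/role/janky.pp']
puts failing_manifests(staged, stub)
```

A real hook would presumably feed it the output of `git diff --cached --name-only` and exit non-zero when the returned list is not empty, which is what makes git abort the commit.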
rodjek / puppet-lint
puppet-lint, that’s four, btw.
specs run on each push
auto deploy on CI pass
rspec-puppet and puppet-lint are automatically run by CI on every commit on every branch pushed to our Puppet repo.
Once master passes CI, puppet is automatically deployed
As you can see, Hubot automates a lot of the process of rolling out Puppet
That example covered pushing changes to master, but what about a Pull-Request based workflow?
Say we have a pull request for a branch we want to merge, and that we’ve reviewed the code and it all looks good.
branches == environments
On each deploy, we turn all git branches into puppet environments.
This, combined with heaven, our capistrano-powered deployment API we interact with via Hubot, enables us to experiment with unmerged Puppet branches in a powerful way.
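The deploy output later in the talk mentions "Generating puppet environments..." and "Cleaning up deleted branches...". A minimal Ruby sketch of that bookkeeping might look like the following. The function names are hypothetical, and the sanitising rule assumes environment names may only contain letters, digits and underscores; the talk does not say how branch names are actually mapped.

```ruby
# Hypothetical sketch: derive Puppet environment names from git branches and
# work out which environments to create or remove on a deploy.

# Assumption: environment names are limited to letters, digits and
# underscores, so every other character in a branch name becomes "_".
def environment_name(branch)
  branch.gsub(/[^A-Za-z0-9_]/, '_')
end

# Compare the branches in the repo with the environments already present.
def plan_environments(branches, existing_environments)
  wanted = branches.map { |branch| environment_name(branch) }.uniq
  {
    create: wanted - existing_environments,  # new branches
    remove: existing_environments - wanted   # branches that were deleted
  }
end

p plan_environments(%w[master git-gh13], %w[master old_experiment])
```

Because every branch gets its own environment, an unmerged branch can be applied or nooped against a single node without touching what the rest of the fleet sees.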
So, to safely merge this pull request...
hubot ci status puppet/git-gh13
deploy:apply puppet/git-gh13 staging/fs1
deploy:noop puppet/git-gh13 prod/fs1
# merge pull request
hubot deploy:apply puppet to prod/fs
graph me -1h @collectd.load(fs*)
log me hooks github/github
You might ask Hubot to confirm its build status
Build #108816 (5fe75932f26ea62cb5fc5e3d0cb302cc2461d11e) of puppet/git-gh13 was successful(421s) github/
Yup, looks good.
Then roll the branch out to a staging box to make sure everything applies cleanly there.
** [out :: REDACTED ] Bootstrapping...
** [out :: REDACTED ] Gem environment up-to-date.
** [out :: REDACTED ] Running librarian-puppet...
** [out :: REDACTED ] Generating puppet environments...
** [out :: REDACTED ] Cleaning up deleted branches...
** [out :: REDACTED ] Done!
** [out :: REDACTED ] Sending 'restart' command
** [out :: REDACTED ] The following watches were affected:
** [out :: REDACTED ] puppetmaster_unicorn
** [out :: fs1a.stg.github.com] info: Applying configuration version '8fb1a2716d5f950b836e511471a2bdac3ed27090'
** [out :: fs1a.stg.github.com] notice: /Stage[main]/Github::Common_packages/Package[git]/ensure: ensure changed '1:1.7.10-1+github12' to '1:1.7.10-1+github13'
...
Yup, looks good.
Then, if you wanted an extra layer of confidence, you could noop the branch against a production node
** [out :: REDACTED ] Bootstrapping...
** [out :: REDACTED ] Gem environment up-to-date.
** [out :: REDACTED ] Running librarian-puppet...
** [out :: REDACTED ] Generating puppet environments...
** [out :: REDACTED ] Cleaning up deleted branches...
** [out :: REDACTED ] Done!
** [out :: REDACTED ] Sending 'restart' command
** [out :: REDACTED ] The following watches were affected:
** [out :: REDACTED ] puppetmaster_unicorn
** [out :: fs1a.rs.github.com] info: Applying configuration version '8fb1a2716d5f950b836e511471a2bdac3ed27090'
** [out :: fs1a.rs.github.com] notice: /Stage[main]/Github::Common_packages/Package[git]/ensure: would have changed from '1:1.7.10-1+github12' to '1:1.7.10-1+github13'
...
Yup, looks good
Next, you’d merge the pull request. If you stopped here, the code would gradually roll out to all affected nodes over the next hour.
If you wanted the rollout to happen faster than that, you could force a puppet run on the affected class of nodes
** [out :: REDACTED ] Bootstrapping...
** [out :: REDACTED ] Gem environment up-to-date.
** [out :: REDACTED ] Running librarian-puppet...
** [out :: REDACTED ] Generating puppet environments...
** [out :: REDACTED ] Cleaning up deleted branches...
** [out :: REDACTED ] Done!
** [out :: REDACTED ] Sending 'restart' command
** [out :: REDACTED ] The following watches were affected:
** [out :: REDACTED ] puppetmaster_unicorn
** [out :: fs1a.rs.github.com] info: Applying configuration version '8fb1a2716d5f950b836e511471a2bdac3ed27090'
** [out :: fs7b.rs.github.com] info: Applying configuration version '8fb1a2716d5f950b836e511471a2bdac3ed27090'
** [out :: fs1a.rs.github.com] notice: /Stage[main]/Github::Common_packages/Package[git]/ensure: ensure changed '1:1.7.10-1+github12' to '1:1.7.10-1+github13'
** [out :: fs7b.rs.github.com] notice: /Stage[main]/Github::Common_packages/Package[git]/ensure: ensure changed '1:1.7.10-1+github12' to '1:1.7.10-1+github13'
...
Yup, that looks good.
Then you’d probably want to check out load to make sure nothing went crazy
Yup, looks good
...and maybe check some logs or other related metrics to confirm your change didn’t break something
Yup, looks good
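The deploy commands in this walkthrough follow a small grammar: an optional task (`apply`, `noop`), an app with an optional branch, and an environment/host target. The Ruby sketch below parses that grammar. It is purely illustrative: Hubot's real scripts are not shown in the talk, `DEPLOY_PATTERN` and `parse_deploy` are made-up names, and the default of `master` when no branch is given is inferred from the commands shown rather than stated outright.

```ruby
# Hypothetical parser for the deploy commands shown in the talk.
DEPLOY_PATTERN = %r{
  \A(?:hubot\s+)?             # optional bot name
  deploy(?::(?<task>\w+))?    # deploy, deploy:apply or deploy:noop
  \s+(?<app>[\w-]+)           # application, for example puppet
  (?:/(?<branch>[\w.-]+))?    # optional branch, for example git-gh13
  \s+(?:to\s+)?               # optional "to"
  (?<env>\w+)/(?<hosts>[\w*-]+)\z
}x

def parse_deploy(command)
  match = DEPLOY_PATTERN.match(command)
  return nil unless match

  {
    task:   match[:task] || 'deploy',
    app:    match[:app],
    branch: match[:branch] || 'master', # assumption: no branch means master
    env:    match[:env],
    hosts:  match[:hosts]
  }
end

p parse_deploy('deploy:apply puppet/git-gh13 staging/fs1')
p parse_deploy('deploy:noop puppet/git-gh13 prod/fs1')
p parse_deploy('hubot deploy:apply puppet to prod/fs')
```

The same small grammar covers branch deploys to one staging box, noop runs against production, and a forced run across a whole class of nodes.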
ChatOps
How we interact with Puppet via Hubot is a great example of a core principle of how we do ops at GitHub. We’ve been calling it ChatOps recently.
Essentially, ChatOps is the result of Hubot becoming sentient, and decreeing, among other things, that we now address him as “Supreme Leader” and communicate with our infrastructure through his secure channels alone.
We occasionally observe him speaking in tongues that sound eerily like YouTube comments.
Hubot
Actually, that’s not it at all. Hubot is the star of our Ops team.
Hubot: heaven, janky, shell, graphme
We use Hubot day in, day out to interact, over JSON APIs, with other simple tools we’ve written.
Hubot: heaven, janky, shell, graphme, ALL OF THE APIS
Hubot interacts nicely with tons of external APIs too. If you have a JSON API, making your service work with Hubot is a piece of cake.
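To give a feel for how little glue this needs, here is a Ruby sketch that turns a JSON response from a CI service into a chat reply like the build status message shown earlier. Hubot scripts are not written in Ruby, and the JSON field names (`number`, `sha`, `branch`, `green`, `duration`) are invented for illustration; only the shape of the final message follows the talk.

```ruby
require 'json'

# Hypothetical: turn a CI service's JSON response into a chat reply.
# The field names are assumptions, not a real API.
def build_status_message(json_body)
  build = JSON.parse(json_body)
  outcome = build['green'] ? 'was successful' : 'failed'
  format('Build #%d (%s) of %s %s (%ds)',
         build['number'], build['sha'], build['branch'],
         outcome, build['duration'])
end

response = '{"number":108816,' \
           '"sha":"5fe75932f26ea62cb5fc5e3d0cb302cc2461d11e",' \
           '"branch":"puppet/git-gh13","green":true,"duration":421}'
puts build_status_message(response)
```

The chat side stays a thin formatting layer; the service behind the API does the real work.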
Why is this stupid chat bot so important to Ops?
But why do we obsess about Hubot so much? It’s just a chat bot, right?
There are some distinct upsides to this approach we’ve noticed as our use of Hubot in Ops has grown.
hubot ci status puppet/git-gh13
deploy:apply puppet/git-gh13 staging/fs1
deploy:noop puppet/git-gh13 prod/fs1
# merge pull request
hubot deploy:apply puppet to prod/fs
graph me -1h @collectd.load(fs*)
log me hooks github/github
Remember the flow I just showed you for rolling out puppet changes to our infrastructure?
Everyone sees all of that happen on their first day
Everyone sees all of this happen from the minute they join GitHub. It’s right there, in the Ops room, right in the middle of the conversation in Campfire.
You don’t just see how to roll out puppet, you see how to...
hubot ci status github/smoke-perf
check the status of a branch’s last build
hubot deploy github/smoke-perf to prod/fe1
deploy any branch of any github app to any server
hubot graph me -10min @app-perf
get graphs of the app’s recent performance
hubot procs unicorn
check the status of unicorns across all frontends
hubot resque critical
check the status of the resque critical queue
hubot graph me -10min @collectd.load(fe*)
check load on the frontends
hubot conns fe1
check current connections to a frontend that you suspect has a problem
hubot log me smoke fe1
grab smoke logs for that frontend and realize that you did, in fact, break it
hubot lbctl disable fe1
take it out of the load balancer
hubot status yellow Bad deploy. Reverting now.
update the status blog
hubot who’s on call
determine who is currently on call so you can apologize to them
hubot pingdom checks
check pingdom to make sure you haven’t broken everything
hubot upset me
chill yourself out really quick
hubot deploy github to prod/fe1
revert back to master on the busted frontend
hubot log me smoke fe1
verify things have returned to normal
hubot air drum me
get pumped up because you fixed it
hubot lbctl enable fe1
bring the fixed frontend back into the rotation
hubot status green All systems go.
clear alerts on the status page
hubot whois 4.9.23.22
Once the outage has been resolved, you might see how to grab whois information for an IP that exhibited suspicious activity in the logs you saw
hubot khanify spammers
and how to hit meme generator to make a joke when you realize that IP is a spammer
hubot play in the air tonight
then someone would queue up the song that popped into their head when they thought about drums and gorillas at the same time
hubot tweet@github PuppetConf Drinkup Friday night at 8:30 at Zeke’s (3rd & Brannan)
and then finish it all off with a tweet about the Drinkup we’re throwing friday night
ChatOps
ChatOps means building tools that make it easier to operate your infrastructure via Hubot than via Terminal or Chrome
By placing tools directly in the middle of the conversation
Because...
Everyone is pairing all of the time
This is the core concept behind ChatOps.
Teaching by doing
Teaching by doing is awesome.
This was always my main motivation with hubot - teaching by doing by making things visible. It's an extremely powerful teaching technique - @rtomayko
Ryan Tomayko had this in mind from the very first commits to hubot, which just presented a simple wrapper around a repository of shell scripts we use for managing and monitoring our infrastructure.
This is how I respond to “how do I do X” questions in Campfire now.
If there’s not yet Hubot functionality to do a thing, we try to write it.
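The "simple wrapper around a repository of shell scripts" idea can be sketched in a few lines of Ruby. This is an assumption-heavy illustration, not how Hubot is implemented: the `script` directory, the script names and the `script_command` helper are all hypothetical.

```ruby
# Hypothetical sketch of a chat wrapper around a directory of shell scripts.
SCRIPT_DIR = 'script' # assumed location of the scripts

# Map "hubot <name> <args...>" to the command line of a script called <name>.
# Returns nil when the message is not addressed to the bot or no such
# script is known.
def script_command(message, available_scripts)
  words = message.split
  return nil unless words.shift == 'hubot'

  name = words.shift
  return nil unless available_scripts.include?(name)

  [File.join(SCRIPT_DIR, name), *words]
end

scripts = %w[procs conns resque]
p script_command('hubot procs unicorn', scripts)
p script_command('hubot conns fe1', scripts)
p script_command('hubot air drum me', scripts) # no such script here, so nil
```

Adding a new capability then means adding one script, and everyone in the room learns it exists the first time someone uses it.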
Communicate by doing
Placing tools in the middle of the conversation also means you get communication of your work for free.
If you’re doing something in a shell or on a website, you have to do it, then tell people about it. If you do it with hubot, that comes free.
THINGS I HAVEN’T ASKED RECENTLY
For example, here are a few things I haven’t asked recently because Hubot has told me the answer:
how’s that deploy going?
are you deploying that or should i?
is anyone responding to that nagios alert?
is that branch green?
how does load look?
did that deploy finish?
did anyone update the status page?
Free communication is especially crucial in a distributed environment.
Our Ops team is entirely remote, so Campfire is our default means of communication.
http://www.flickr.com/photos/7997249@N06/6061305639/
This is extremely helpful during outages or other situations that require tactical response.
You don’t have to SAY that you’re spraying water on the fire, people SEE you doing it.
Hide the ugly
Another awesome benefit of ChatOps-ing all of the things is that you can hide ugly interfaces and design exactly the interaction you want with some simple porcelain commands.
My favorite example of this is the ugliest of the ugly, Nagios.
[nines] hubot opened issue #4263: Nagios (229906) - fs3b/syslog - Tue Sept 25 23:40:18 PDT 2012. github/nines#4263
Hubot politely delivers nagios alerts directly into chat
hubot nagios ack fs3b/syslog
# fix stuff
nagios check fs3b/syslog
nagios status fs3b/syslog
hubot nagios downtime fs3b/syslog 90
nagios mute fs3b/syslog
nagios unmute fs3b/syslog
Which we can interact with without any unnecessary eye bleeding. Making this easy means developers and other ops engineers actually mute or schedule downtime when they’re testing things.
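As an illustration of "porcelain" over an ugly interface, the Ruby sketch below translates two of these chat commands into Nagios external command lines. `ACKNOWLEDGE_SVC_PROBLEM` and `SCHEDULE_SVC_DOWNTIME` are standard Nagios external commands, but everything else is assumed: the talk does not say how Hubot reaches Nagios, the helper name is made up, and treating the trailing `90` as minutes is a guess.

```ruby
# Hypothetical sketch: translate chat commands into Nagios external command
# lines (the format Nagios reads from its command file).
def nagios_external_command(chat_command, author, now = Time.now)
  pattern = %r{\A(?:hubot )?nagios (ack|downtime) ([\w.-]+)/([\w.-]+)(?: (\d+))?\z}
  match = pattern.match(chat_command)
  return nil unless match

  action, host, service, minutes = match.captures
  ts = now.to_i
  case action
  when 'ack'
    # The three flags after the service are sticky, notify and persistent.
    "[#{ts}] ACKNOWLEDGE_SVC_PROBLEM;#{host};#{service};1;1;1;#{author};acked via chat"
  when 'downtime'
    return nil unless minutes

    seconds = minutes.to_i * 60 # assumption: the trailing number is minutes
    # Fields: start, end, fixed=1, trigger_id=0, duration, author, comment.
    "[#{ts}] SCHEDULE_SVC_DOWNTIME;#{host};#{service};#{ts};#{ts + seconds};" \
      "1;0;#{seconds};#{author};downtime via chat"
  end
end

puts nagios_external_command('hubot nagios ack fs3b/syslog', 'jnewland')
puts nagios_external_command('hubot nagios downtime fs3b/syslog 90', 'jnewland')
```

The person typing in chat never sees the semicolon-delimited format, which is the whole point of hiding the ugly.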
Mobile FTW
Yet another awesome benefit of ChatOps is that you get mobile support for free
Well, that is, if you have a team of awesome iOS developers that have built an actually functioning Campfire client for the iPhone
This lets you do anything hubot can do from your phone.
Which means from your couch. Or your bed. Or a beach in Hawaii.
Which means you can fix a lot of things without pulling your laptop out of your bag.
ChatOps
That’s ChatOps at its finest.
And now for something completely different
While I’m showing off mobile stuff, I thought I’d slip in a demo of something else we’ve done to make Ops more mobile friendly.
We’ve hacked together support for PagerDuty alerts via Apple Push Notifications. When you swipe on the alert, you go directly to the PagerDuty mobile UI for an incident
Which lets you ack an alert
while you’re still in bed
or on the couch.
Boom
I can’t even begin to tell you how happy this makes me, and how much less shitty it makes being on-call.
So, who better to summarize all of this than Hubot himself. I asked him what he thought about ChatOps. Here’s what he said:
ChatOps all the things.
Listen to what Hubot said. You’ll love it. Your ops team will love it.
And you’ll help other developers learn how to interact with ops tools without any additional work.
That’s awesome.
Work at [email protected]
If you can’t ChatOps all the things at your gig now, you could always just come work with me at GitHub.
Shoot me an email if you’re interested.
Thanks!
That’s all I have. Thanks for listening! Any questions?
Tomorrow @ 8:30 PM
Zeke’s
3rd & Brannan
While I still have everyone’s attention, I wanted to mention the GitHub Drinkup we’re throwing for PuppetConf again. It’s tomorrow night at 8:30pm at Zeke’s, which is on the corner of 3rd and Brannan. Everyone’s invited. I’ll see you there.
Thanks again!