Puppet at Opera Software - Puppet Camp Oslo 2013
DESCRIPTION
A bit of history, frustration-driven development, and why and how we started looking into Puppet at Opera Software. What we're doing, successes, pain points and what we're going to do with Puppet and Config Management next.
TRANSCRIPT
Puppet at Opera
Puppet Camp Oslo 2013
devs sysadmin
DevSys?
FDD
Frustration Driven Development
# LVS main config file
#
# Last modified:
# 2012-12-10 Commented out all wlb servers, as they haven't been in use …
# 2012-XX-XX Tons of shifting around servers, upgrading and problems (Everyone)
# 2011-04-01 Removed all old b#-servers (N.....)
# 2010-03-24 Bye bye bigma. (M..../Cosimo)
# 2010-03-03 Restore pre Feb 26th config that seems to ensure stability (Cosimo)
#            When adding bigboy/bigcat, bad site lockups happen
# 2010-03-03 Reducing weight on b12 as it is less powerfull (M....)
# 2010-02-26 re-adding bigdog, and lowering bigunc, also vamping up b12 to 100%
# 2010-02-26 Bigdog is crashing, removing from lvs (M......)
# 2010-02-03 Enabled f8 and b7, first b7, then some hours later f8 … (N......)
# 2010-01-19 Bigant ready to rock and roll! (Cosimo)
# 2010-01-13 Removed bigpa, fatgirl from database pool (Cosimo)
# 2010-01-07 Added b8 to backend pool (Cosimo)
# 2010-01-05 Added bigant to the My Opera databases (Cosimo)
# 2009-11-22 Added bigdog to the My Opera databases (Cosimo)
# 2009-11-18 Added b7 and f8 as back-end servers (M.....)
# 2009-11-18 Removed p23-02 backend, moved to auth (Cosimo)
# 2009-11-12 Removing b7 and f8 from Mysql Load balancers (Cosimo)
# 2009-11-11 Added Lenny backend p23-02 (Cosimo)
# 2009-10-11 phased-in InnoDB-powered bigma in production (Cosimo)
# 2009-09-23 phased-in InnoDB-powered bigma in production (Cosimo)
# 2009-06-27 switched master from bigma to bigsis (w-mlb) \o/ (N.....)
# 2009-06-23 shifting load away from bigbro. it's dying? (Cosimo)
# 2009-03-18 pushing bigbro as much as we can, to test it out (Cosimo)

global_defs {
    lvs_id MY_LVS
    …
}
innodb_buffer_pool_size = 128M # was 64M # was 16M # was 32M
The Pilot – Goals
● New deployment procedure
● Sane configuration files
● Configuration management
CM Tools Evaluation (2009)
CFEngine 2
BCfg2
Puppet 0.25.4
LCFG
CM Tools Evaluation
CFEngine 2
BCfg2
Puppet 0.25.4 → 2.6.2 → 2.7.14
LCFG
The very beginning...
commit 9c54321f51bf969940b63b48d055743ac504035e
Author: Cosimo Streppone <[email protected]>
Date: Thu Jan 14 13:21:40 2010 +0000
Generic puppet recipes. To be continued.
Our approach
A “conservative” approach, surely
• Keep it simple. No concat/append/modify
• As few dependencies as possible
• Stability and reliability is critical
• No pulls from github or external URLs
• We don't use puppet for deployment
• Even realize() gets me into panic mode
Three Years In
• Modules repository, with 60+ mods
• Some custom facter plugins
• Shared projects conventions & structure
• Shared deployment procedures and libs
• Good server baseline configuration
• Our team, ~200 nodes
• Opera Mini Ops team, thousands of nodes
Datacenters
It's Modules all the way down...
Apache
base_packages
Cassandra
Django
Bash
RRDCached
Munin
Solr 4.0
RabbitMQ
Postfix
Varnish
Statsd
PowerDNS
Tomcat
Ssh
security_upgrades
Projects structure
Master config file /config/production.json
Role-specific files /config/role/<role>/
Puppet manifests /config/puppet/
Deployment scripts /deploy/
Master configuration file
{
  "master_rev"  : "20130129",
  "application" : "geodns",
  "environment" : "production",
  "domain"      : "localdomain",
  "contact"     : "[email protected]",

  "puppet_vars" : {    # Available in manifests
    "some-password" : "hola/amigos"
  },

  "systems" : {        # List of all hostnames and their roles
    "node01" : { "puppet_class" : [ "geodns::backend" ] },
    "node02" : {
      "puppet_class" : [ "geodns::frontend" ],
      "puppet_vars"  : { … },
    },
    …
  }
/etc/puppet →
puppet.conf (master configuration file)
fileserver.conf
files → {auth, geodns, opcdn} (local project files)
modules → (shared generic modules)
{ntp, apache, varnish, nginx, ...}
manifests → (generic and project specific manifests)
classes/
{basenode, backend, frontend}.pp
classes/ <project> /
<anything goes, project-specific>
Puppet master layout
/etc/puppet/manifests/site.pp
$server = "puppetmaster.opera.com"

import "os/*.pp"
import "classes/*.pp"      # generic classes
import "classes/*/*.pp"    # project classes

node default {
    include basenode
}

filebucket { "main": server => $server }

File {
    ignore => ['.svn', '.git', 'CVS' ],
    backup => "main",
}
Puppet master - site.pp
/etc/puppet/puppet.conf
external_nodes = /etc/puppet/bin/puppet-node-classifier
node_terminus = exec
/etc/puppet/manifests/nodes/geodns-production.json
{ "application" : "geodns",
"environment" : "production",
"domain" : "localdomain",
"systems" : {
"node01" : {
"puppet_class" : [ "geodns::backend" ],
}, …
}
}
Puppet master – no nodes.pp
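The slides do not show puppet-node-classifier itself. As a rough sketch only, an exec node terminus could turn a per-project JSON file like the one above into the YAML that Puppet expects from an external node classifier (`classes` and `parameters`). Everything below is an assumption made for illustration: the `classify` helper, the embedded sample data, the `listen_port` variable, the rule that per-node `puppet_vars` override project-wide ones, and the lookup by short hostname.

```ruby
require 'json'
require 'yaml'

# Hypothetical stand-in for a nodes/<project>-<environment>.json file.
# Written as strict JSON: Ruby's parser rejects "#" comments and
# trailing commas, so a real file may need cleaning up first.
NODES_JSON = <<~JSON
  {
    "application": "geodns",
    "environment": "production",
    "puppet_vars": { "some-password": "hola/amigos" },
    "systems": {
      "node01": { "puppet_class": ["geodns::backend"] },
      "node02": {
        "puppet_class": ["geodns::frontend"],
        "puppet_vars": { "listen_port": 8100 }
      }
    }
  }
JSON

# Build the ENC answer for one node.
# Assumption: systems are keyed by short hostname, while Puppet passes
# the full certname (for example node01.int.opera.com).
def classify(config, certname)
  host = certname.split('.').first
  node = config.fetch('systems', {})[host]
  # Assumption: unknown hosts get an empty answer rather than an error.
  return { 'classes' => [], 'parameters' => {} } if node.nil?

  # Assumption: per-node puppet_vars win over project-wide ones.
  vars = config.fetch('puppet_vars', {}).merge(node.fetch('puppet_vars', {}))
  { 'classes' => node.fetch('puppet_class', []), 'parameters' => vars }
end

if __FILE__ == $PROGRAM_NAME
  # Puppet calls the classifier with the node name as the only argument.
  certname = ARGV.fetch(0, 'node01.int.opera.com')
  puts classify(JSON.parse(NODES_JSON), certname).to_yaml
end
```

Keeping `classify` free of file and Puppet access means it can be exercised without a puppetmaster.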
$ facter --puppet
architecture => amd64
datacenter => nerv
domain => opera.com
facterversion => 1.5.7
fqdn => node01.int.opera.com
hardwareisa => unknown
hardwaremodel => x86_64
hostname => node01
id => root
interfaces => eth0,eth1
ipaddress => 1.2.3.4
ipaddress_eth0 => 1.2.3.4
…
Facter
facter/datacenter.rb
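Facter.add("datacenter") do
  setcode do
    datacenter = "unknown"
    # Get current ip address from Facter's own db
    ipaddr = Facter.value(:ipaddress)
    # Regexp literal, so that the dots are matched literally
    if ipaddr.match(/^1\.2\.3\./)
      datacenter = "dc1"
    elsif ipaddr.match(...)
      …
    end
    # Return the value, so that "unknown" survives when nothing matches
    datacenter
  end
end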
Facter – custom plugins
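The address-to-datacenter lookup in the plugin can also be written as a plain function over a prefix table, which makes it checkable without Facter installed. This is a sketch under assumptions, not the plugin Opera runs: the `DATACENTER_PREFIXES` table, the `10.20.` prefix and the `datacenter_for` helper are hypothetical, and real address ranges are not given in the slides.

```ruby
# Hypothetical prefix table. Only the 1.2.3. example comes from the slide.
DATACENTER_PREFIXES = {
  '1.2.3.' => 'dc1',
  '10.20.' => 'dc2'
}.freeze

# Map an IP address string to a datacenter name, or "unknown".
# Plain string prefixes avoid regexp escaping problems with the dots.
def datacenter_for(ipaddr)
  return 'unknown' if ipaddr.nil?

  match = DATACENTER_PREFIXES.find { |prefix, _name| ipaddr.start_with?(prefix) }
  match ? match.last : 'unknown'
end

# Register the fact only when this file is loaded by Facter.
if defined?(Facter)
  Facter.add('datacenter') do
    setcode { datacenter_for(Facter.value(:ipaddress)) }
  end
end
```

Including the trailing dot in each prefix keeps `1.2.30.4` from being mistaken for an address in `1.2.3.x`.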
case $datacenter {
    "dc1" : { include opera::datacenters::dc1 }
    "dc2" : { include opera::datacenters::dc2 }
    "dc3" : { include opera::datacenters::dc3 }
    …
    default: { include opera::datacenters::base }
}
Facter – custom plugins
class basenode {
include opera
    # Opera-specific data-center based settings
    case $datacenter {
        "dc1" : { include opera::datacenters::dc1 }
        …
        default: { include opera::datacenters::base }
    }

    include apt-opera
    include base_packages
    include locales
    include logcheck
    include munin
    include nagios
    include cron
    include perl
    include python
    include puppet
    include ntp
    include timezone
    …
}
Basenode class
autosign
+ some preinstalled packages
+ internal apt repository
+ a bit of shell scripting
Bootstrap script
Real world examples – 1 Project
class geodns::backend {

    include opera::admins::devops
    include security-upgrades
    include powerdns
    include geoip::city
    include memcache

    package { [ 'libjson-xs-perl', … ]:
        ensure => 'present'
    }

    bash::prompt { '/root/.bashrc':
        description => 'geodns',
        color       => 'red',
    }

    munin::plugin::custom { 'geodns_': }
    munin::plugin { [ 'geodns_country', 'geodns_errors', … ]:
        plugin_name => 'geodns_',
    }
}
Real world examples – 2 Varnish
varnish::config { "project-varnish-config":
    vcl_conf       => "tvstore.vcl",
    storage_type   => "malloc",
    storage_size   => "512M",
    listen_port    => 8100,
    sess_workspace => 131072,
    ttl            => 60,
    thread_pools   => 2,
    thread_min     => 400,
    thread_max     => 3000,

    # Needed for GeoIP support in varnish:
    # http://stackoverflow.com/questions/5906603/
    cc_command => "exec cc -fpic -shared -Wl,-x \
        -L/usr/include/GeoIP.h -lGeoIP -o %o %s"
}
Real world examples – 3 Munin
include munin::server
file { '/etc/munin/munin-conf.d/project-settings.conf': … }
Real world examples – 4 Solr
include solr4
solr4::core { 'core1':
    config     => '.../core1/solrconfig.xml',
    properties => '.../core1/solrcore.properties',
    schema     => '.../core1/schema.xml',
}

solr4::config { 'solr-search-config':
    cores => ['core1', … ],
}
Pain points AKA wish-list
Speed!
~60 s runtime → ~600 resources
TOO SLOW!
notice: /Stage[main]/Django/Package[Django]/ensure: ensure changed '1.4.3' to '1.4.2'
notice: /Stage[main]/Package[cython]/ensure: created
notice: /Stage[main]/Java::Sun_java6/Exec[debconf-set-selections-sun-java6-bin]/returns: executed successfully
notice: /Stage[main]/Java::Sun_java6/Exec[debconf-set-selections-sun-java6-jre]/returns: executed successfully
Resources that don't go away
Shared resources
cron::logcleanup { … }
• Used by both Apache and Nginx modules
• Getting conflicts if you pull both
Shared environment
Many projects run under the same master.
A syntax error anywhere blocks everyone.
Testing
Would be awesome to be able to test our modules and manifests.
Locally.
Without a puppetmaster.
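One modest step toward that wish, sketched under assumptions: run a syntax check over every manifest before it reaches the shared master. The default checker below shells out to `puppet parser validate`, which is available from Puppet 2.7 on. The `failing_manifests` helper and the idea of passing the checker in as a lambda are illustrative choices, not something the deck describes. This catches syntax errors only, not wrong behaviour.

```ruby
# Default checker: true when `puppet parser validate` accepts the file.
# Assumes a local puppet executable (2.7 or later) on the PATH.
PUPPET_VALIDATE = lambda do |path|
  system('puppet', 'parser', 'validate', path,
         out: File::NULL, err: File::NULL)
end

# Return the sorted list of *.pp files under root that fail the check.
# The checker is injectable so the walk can be exercised without puppet.
def failing_manifests(root, checker = PUPPET_VALIDATE)
  manifests = Dir.glob(File.join(root, '**', '*.pp')).sort
  manifests.reject { |path| checker.call(path) }
end

if __FILE__ == $PROGRAM_NAME && ARGV[0]
  broken = failing_manifests(ARGV[0])
  broken.each { |path| warn "syntax check failed: #{path}" }
  exit(broken.empty? ? 0 : 1)
end
```

Hooked into a pre-commit or pre-push step, a non-zero exit would stop a broken manifest before it blocks the other projects on the same master.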
Future directions
Things we'd like to look into...
• PuppetDB
• Better systems inventory
• Better Nagios integration
• Testing manifests and modules
Q & A
https://github.com/cosimo/
http://www.streppone.it/cosimo/blog/