Change as a function of time
2014: 118,000 hosts
13,000 environments, fewer puppetmasters
baremetal, VMs, containers
Classification
node_terminus = exec
external_nodes = /enc_script.rb
320ms per request: loading gems, files, and certs; only 100ms is the actual API call to the ENC
Optimize: get ENC run time as close to 100ms as possible
a little dash of bash
node_terminus = exec
external_nodes = /enc_handler.sh

$ cat enc_handler.sh
...
echo "$1" | nc -U /unix.sock
...
a little go go
William Kennedy’s workpool (github.com/goinggo/workpool)
Go server listening on /unix.sock
workpool routes requests to an idle worker
exec/exit to listen/process
$ cat /enc_script.rb
…
# stay resident: read certnames in a loop instead of exec/exit per request
while certname = $stdin.gets do
  enc(certname.chomp)
end
…
PPM calls node_terminus
node_terminus writes request to socket
Go handles the request, workpool routes it to an idle worker
end result
gets close to the 100ms goal: 110ms
CPU usage: no constant bootstrapping frees up resources
for a puppetmaster process at scale, 200ms per run adds up quickly (30 for every 60 seconds of CPU time)
catalogs
Catalog compilation: low-hanging fruit, yet difficult
source: http://www.isrubyfastyet.com
agents
everything is SSL: that is good
everything is SSL: that is expensive
use yum.puppetlabs.com or apt.puppetlabs.com to make sure you run 3.7+
runtime savings: 40%
post run woes
after the agent runs, the real fun begins
puppetmaster and agent both wait for report processors to finish
slow report collection will cause your infrastructure to fall over; some just avoid it
Reports/Facts
foreman
foreman report/fact processing: need to spread read I/O
fact processing is read-heavy, reports are write-heavy
ruby ActiveRecord: makara
PostgreSQL: local read slaves, pg_shard
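On the ActiveRecord side, makara is wired in through the Rails database.yml. A sketch only: the hostnames are placeholders, and the exact keys should be checked against the makara README for your version:

```yaml
production:
  adapter: postgresql_makara
  database: foreman
  makara:
    sticky: true          # a process that just wrote reads its own write from the master
    connections:
      - role: master
        host: db-master.example.com
      - role: slave
        host: localhost   # local read slave keeps fact reads off the master
      - role: slave
        host: db-ro-1.example.com
```

Reads fan out across the slave entries while all writes still hit the master, which is the read/write split the bullet points describe.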
reports
4k run reports per minute
using pg_shard:

psql> SELECT master_create_distributed_table(table_name := 'reports', partition_column := 'report_id');
psql> SELECT master_create_worker_shards(table_name := 'reports', shard_count := 365);
facts
most of the workload is read I/O, kept local
facts updated immediately after puppet runs
master DB loadavg: 2
simple is hard
“Simple can be harder than complex: You have to work hard to get your thinking clean to make it simple. But it’s worth it in the end because once you get there, you can move mountains.”
- Steve Jobs
Host events
most systems have audit frameworks: files (inotify), processes (auditd), network
puppet needs to react to these events
osquery
services, files, and any resource that can be tracked as a host event
event information can also be recorded (doorman, zentral, etc.)
event info is stored in tables (sqlite)
file monitoring
{
  "file_paths": {
    "homes": [
      "/root/.ssh/%%",
      "/home/%/.ssh/%%"
    ],
    "binaries": [
      "/usr/bin/%%",
      "/sbin/%%"
    ],
    "etc": [
      "/etc/%%"
    ],
    "tmp": [
      "/tmp/%%"
    ]
  }
}
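To actually surface events from those paths, osquery needs a scheduled query polling its file_events table; the query name and interval below are arbitrary choices:

```json
{
  "schedule": {
    "file_changes": {
      "query": "SELECT target_path, action, time FROM file_events;",
      "interval": 300
    }
  }
}
```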
Infrastructure events
code releases, package upgrades, access changes
puppet needs to be told to run when these events occur
pvc and foreman
foreman’s puppetrun API sets the flag
pvc queries foreman to trigger a run
logical separation with host groups
runinterval is an afterthought
puppet runs instantly when it needs to
runinterval can be 3 minutes or 3 hours
frees up puppetmasters, allows more resources for other things
your infrastructure is still kept honest
“I pummel people with questions, because I need to know what they're thinking, what they're trying to achieve, what they believe the final outcome is going to be.”
- Tim Gunn