7 Puppet Horror Stories in 7 Years - PuppetConf 2014
Kris Buytaert, Inuits
7 Years of Puppet Horror Stories
Kris Buytaert
@krisbuytaert
Kris Buytaert
● I used to be a Dev,
● Then Became an Op
● Chief Trolling Officer and Open Source Consultant @inuits.eu
● Everything is an effing DNS Problem
● Building Clouds since before the bookstore
● Some books, some papers, some blogs
● Evangelizing devops
Setting the Stage
● [root@xen ~]# rpm -qa | grep puppet
puppet-0.23.2-1.el5
● -rwxr-xr-x 1 root root 4809 Aug 22 2007 /usr/bin/puppet
● Consulting @ different customers.
● Telling my war stories, so you don't have to
Debugging Infrastructures
Everything is a Funky DNS Problem
No really, Everything is a Funky DNS Problem
If it's not a funky DNS Problem ..
It's an arp problem
If it's not an arp problem...
It's a Full Filesystem Problem
If your filesystem isn't full
It's a Spanning Tree problem
If it's not a Spanning Tree problem...
It's a USB problem
If it's not a USB Problem
It might be an ntp problem
If it's not an ntp problem
It's a sharing IRQ Problem
If it's not a sharing IRQ Problem
But most often .. it's a Freaking DNS Problem !
Or someone playing tricks on you
Jan 2006
Chapter 1: Deploying a Puppetmaster
Chicken and Eggs
● Platform = Rack full
● 2x pxe, yum, dhcpd, puppetmaster
● Reinstalling the platform
• Scratch the central box
• Scratch the other boxen
Works in Dev, Fails in Prod
Prod = shipped platform on other continent
● Platform works, let's redeploy.
● Take down PuppetMaster.
● Rebootstrap
● Bootstrap first couple of nodes, success
● More nodes.. failures start happening
It's not your code
● Strip out code,
● Runs work sometimes on almost empty catalogs
● Reenable code
● Run server in debug, problem gets worse
About to rename the puppetmaster to schrodinger
24 hours and a VPN Failure later
The Culprit :
● Partially failing Managed Power Supply
● 6 boxen were not powered off
● Usually we powered off all the other boxen
● Breaking webrick,
● With old ssl requests
Rogue SSL Queries
● Apparently I blogged about the first one in April 2011.... but it feels like 7 years ago
http://www.krisbuytaert.be/blog/24-hours-puppet-drama
Chapter 2: Honour your parents or your disks will flood
#MonitoringSucks
● Puppetruns break our Icinga boxen
● Badly
● Frequently
Stored Configs
Exporting and Collecting
It ain't borken
● Successful puppet run
● Successful Icinga reconfigure
● Disk usage grows
● Fast
A Puppet Bug
Chapter 3: Release Management
It works, it doesn't work
● Imagine a working puppet setup
● At different customers
● Upstream Vendor releases new software
● Half of the customers call crying that their platform is broken
A broken mcollective
● Customers with self managed (package) repos are happy
● Customers using upstream repos are in pain
Repository Management
● Pulp
• Pro : MirroringLove
• Con : Mongo, Stability, .deb, forge
● PRM
● Yum Repo Server by IS24
Version vs Latest
● Version your repos ?
ensure => latest
● Latest your environments ?
● Strict versioning in config ?
ensure => '0.98.4'
Use Hiera
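A minimal sketch of what "Use Hiera" could look like for version pinning, assuming Puppet 3 or later with automatic class parameter lookup. The class name `profiles::somepackage`, the package name `somepackage` and the parameter `package_version` are hypothetical and not from the talk; only the version string is taken from the slide.

```puppet
# Hypothetical profile class: the version is a class parameter, so Hiera's
# automatic parameter lookup can override it per environment or customer.
class profiles::somepackage (
  $package_version = 'installed',  # default: any installed version is fine
) {
  package { 'somepackage':
    # A pinned version string or 'installed', never 'latest'.
    ensure => $package_version,
  }
}

# Matching Hiera data (YAML), for example in a per-environment data file:
#   profiles::somepackage::package_version: '0.98.4'
```

Keeping the version in data rather than in the manifest turns an upgrade into a one-line data change that can be promoted from one environment to the next.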
Chapter 4: We are all devs now
No more puppet runs :(
● Puppet is cronned
● Dashboard shows no successful runs for hours
● puppet agent -t starts, then exits with no error
We didn't change a thing !
● Quick git log showed no relevant changes on the code base.
Let's debug
● Put agent in verbose
● Put master in verbose
● Try different environments (it works in 1)
● Upgrade puppetmaster
● Upgrade puppet agent
● Modify /etc/hosts
● Clean SSL Certs
Called in the troops
● I've never seen something like this before
● They couldn't find a thing either ..
● We've never seen something like this before.
We didn't change a thing (2)!
● Quick git log showed no relevant changes on the code base.
● On the manifests included in the platform where the code was failing
● 95% Linux nodes failing, no code change
Exit Zero
● 5% Windows nodes, code changes ignored
=> that module isn't used on Linux right ?
Chapter 5: Software defined Network
Business as usual
● Fully puppetized platform
● Deployed 5+ instances of platform
● IPSec Tunnels in all directions.
One Morning
● Puppetruns start failing
● Perfect access to new customer platform
● Monitoring says everything is fine.
● Boxen can ping puppetmaster
● Puppetmaster can ping boxen
Everything is a Funky DNS Problem
● Reverse DNS was broken
● Fixed Reverse DNS for new platform
● Hey .. this looks better
But it's not fixed yet
● It's ssl ?
● Let's clear some Certs.
● Nope not SSL
I can connect !
● I can ping the puppetmaster
● I can telnet to port 8140 on the puppetmaster
● The puppetmaster can ping me
● It all works
But
● I can't wget from the puppetmaster
● It must be networking
● Is it a nat rule on the vpn ?
Have you tried turning it off and on again ?
● Broken IPSec setup
● Restarting tunnels solved it.
Chapter 7: There's 3 hard things in IT
- Cache Invalidation
- Shared Memory Management
- Off by 1 errors
- Naming things
Yet another broken puppetrun
err: Could not retrieve catalog from remote server: Error 400 on SERVER: Another local or imported resource exists with the type and title Apache::Vhost[somehost.somecusterom.eu_80_proxy] on node proxy.somewhere.eu
warning: Not using cache on failed catalog
But that shouldn't be there
● Vhost does not get exported to there
● Box should not collect that
● Digging into exports
● Digging into puppetdb, how's your psql ?
Collecting
Apache::Vhost <<| tag == 'proxy' |>> {}
Apache::Vhost::Ssl <<| tag == 'proxy' |>> {}
Apache::Vhost::Mod::Reverse_proxy <<| tag == 'proxy' |>> {}
Beware what you collect!
● profiles::vhost::proxy
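One likely reading of this slide: Puppet automatically tags every resource with each namespace segment of the class it is declared in, so a vhost exported from inside `profiles::vhost::proxy` also carries the tag `proxy` and matches the collectors above. A hedged sketch of a narrower collection, assuming a dedicated tag is set explicitly on export. The tag name `export_to_frontend_proxy` is made up for illustration, and the vhost parameters are left out because they depend on the module in use.

```puppet
# Exporting side: give the exported vhost an explicit, unambiguous tag.
@@apache::vhost { "${::fqdn}_80_proxy":
  # ... module-specific vhost parameters go here ...
  tag => 'export_to_frontend_proxy',
}

# Collecting side: match only the explicit tag, not the generic 'proxy'
# that class names such as profiles::vhost::proxy add automatically.
Apache::Vhost <<| tag == 'export_to_frontend_proxy' |>>
```

A tag that cannot collide with a class or module name keeps the collector from picking up resources that were never meant for that node.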
Lessons Learned
● Repository Management is Important
● Development environments are relevant, also for puppet code !
● We all hate SSL
● You never monitor enough
● Upgrading is mostly not the solution
● Unless it's a bug in Puppet
● Release Management is not a solved problem
Contact
Kris Buytaert
[email protected]
Further Reading
@krisbuytaert
http://www.krisbuytaert.be/blog/
http://www.inuits.be/
Inuits
Duboistraat 50
2060 Antwerpen
Belgium
891.514.231
+32 475 961221