7 Puppet Horror Stories in 7 Years - PuppetConf 2014


Upload: puppet-labs

Post on 25-Dec-2014


DESCRIPTION

7 Puppet Horror Stories in 7 Years - Kris Buytaert, Inuits

TRANSCRIPT

Page 1: 7 puppet horror stories in 7 years - PuppetConf 2014

7 Years of Puppet Horror Stories

Kris Buytaert

@krisbuytaert

Page 2: 7 puppet horror stories in 7 years - PuppetConf 2014

Kris Buytaert

● I used to be a Dev,
● Then became an Op
● Chief Trolling Officer and Open Source Consultant @inuits.eu
● Everything is an effing DNS Problem
● Building Clouds since before the bookstore
● Some books, some papers, some blogs
● Evangelizing devops

Page 3: 7 puppet horror stories in 7 years - PuppetConf 2014

Setting the Stage

● [root@xen ~]# rpm -qa | grep puppet
  puppet-0.23.2-1.el5
● -rwxr-xr-x 1 root root 4809 Aug 22 2007 /usr/bin/puppet
● Consulting @ different customers.
● Telling my war stories, so you don't have to

Page 4: 7 puppet horror stories in 7 years - PuppetConf 2014

Debugging Infrastructures

Everything is a Funky DNS Problem
No really, Everything is a Funky DNS Problem
If it's not a funky DNS Problem ..
It's an arp problem
If it's not an arp problem...
It's a Full Filesystem Problem
If your filesystem isn't full
It's a Spanning Tree problem
If it's not a spanning Tree problem...
It's a USB problem
If it's not a USB Problem
It might be an ntp problem
If it's not an ntp problem
It's a sharing IRQ Problem
If it's not a sharing IRQ Problem
But most often .. it's a Freaking DNS Problem !
Or someone playing tricks on you

Jan 2006

Page 5: 7 puppet horror stories in 7 years - PuppetConf 2014

Chapter 1: Deploying a Puppetmaster

Page 6: 7 puppet horror stories in 7 years - PuppetConf 2014

Chicken and Eggs

● Platform = Rack full
● 2x pxe, yum, dhcpd, puppetmaster
● Reinstalling the platform
  • Scratch the central box
  • Scratch the other boxen

Page 7: 7 puppet horror stories in 7 years - PuppetConf 2014

Works in Dev, Fails in Prod

Prod = shipped platform on other continent

● Platform works, let's redeploy.
● Take down PuppetMaster.
● Rebootstrap
● Bootstrap first couple of nodes, success
● More nodes.. failures start happening

Page 8: 7 puppet horror stories in 7 years - PuppetConf 2014

It's not your code

● Strip out code,
● Runs work sometimes on almost empty catalogs
● Reenable code
● Run server in debug, problem gets worse

About to rename the puppetmaster to schrodinger

Page 9: 7 puppet horror stories in 7 years - PuppetConf 2014

24 hours and a VPN Failure later

Page 10: 7 puppet horror stories in 7 years - PuppetConf 2014

The Culprit:

● Partially failing managed power supply
● 6 boxen were not powered off
● Usually we powered off all the other boxen
● Breaking webrick,
● With old ssl requests

Page 11: 7 puppet horror stories in 7 years - PuppetConf 2014

Rogue SSL Queries

● Apparently I blogged about the first one in April 2011.... but it feels like 7 years ago

http://www.krisbuytaert.be/blog/24-hours-puppet-drama

Page 12: 7 puppet horror stories in 7 years - PuppetConf 2014

Chapter 2: Honour your parents or your disks will flood

Page 13: 7 puppet horror stories in 7 years - PuppetConf 2014

#MonitoringSucks

● Puppetruns break our Icinga boxen
● Badly
● Frequently

Page 14: 7 puppet horror stories in 7 years - PuppetConf 2014

Stored Configs

Page 15: 7 puppet horror stories in 7 years - PuppetConf 2014

Exporting and Collecting
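
The export-and-collect pattern behind stored configs can be sketched roughly as follows. This is a minimal illustration and not the manifests from the talk: the tag name, the host template, the target path, and the `icinga` service name are all assumptions, and it presumes stored configs (or PuppetDB) are enabled on the master.

```puppet
# On every monitored node: export a host definition.
# The @@ prefix marks the resource as exported: it is stored on the master
# side and is not applied on the exporting node itself.
@@nagios_host { $::fqdn:
  ensure  => present,
  address => $::ipaddress,
  use     => 'generic-host',                            # assumed host template
  target  => "/etc/icinga/objects/host_${::fqdn}.cfg",  # assumed path
  tag     => 'icinga',                                  # assumed tag
}

# On the monitoring server: the service to reload when definitions change.
service { 'icinga':                                     # assumed service name
  ensure => running,
}

# On the monitoring server: collect every host exported with that tag.
Nagios_host <<| tag == 'icinga' |>> {
  notify => Service['icinga'],
}
```

Every collected resource ends up as a file or file entry on the collecting box, which is why a run can succeed while the disk keeps filling.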

Page 16: 7 puppet horror stories in 7 years - PuppetConf 2014

It ain't borken

● Successful puppet run
● Successful Icinga reconfigure
● Disk usage grows
● Fast

Page 17: 7 puppet horror stories in 7 years - PuppetConf 2014
Page 18: 7 puppet horror stories in 7 years - PuppetConf 2014

A Puppet Bug

Page 19: 7 puppet horror stories in 7 years - PuppetConf 2014

Chapter 3: Release Management

Page 20: 7 puppet horror stories in 7 years - PuppetConf 2014

It works, it doesn't work

● Imagine a working puppet setup
● At different customers
● Upstream Vendor releases new software
● Half of the customers call crying that their platform is broken

Page 21: 7 puppet horror stories in 7 years - PuppetConf 2014

A broken mcollective

● Customers with self managed (package) repos are happy
● Customers using upstream repos are in pain

Page 22: 7 puppet horror stories in 7 years - PuppetConf 2014

Repository Management

● Pulp
  • Pro : MirroringLove
  • Con : Mongo, Stability, .deb, forge
● PRM
● Yum Repo Server by IS24

Page 23: 7 puppet horror stories in 7 years - PuppetConf 2014

Repository Management

Page 24: 7 puppet horror stories in 7 years - PuppetConf 2014

Version vs Latest

● Version your repos ?
  ensure => latest
● Latest your environments ?
● Strict versioning in config ?
  ensure => '0.98.4'

Use Hiera
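
One way to read "Use Hiera" here is to keep the version out of the manifest and look it up per environment. The sketch below assumes Puppet 3 or later (automatic class parameter lookup); the class name `profiles::somepackage`, the parameter name, the package name, and the data file path are hypothetical and only illustrate the idea.

```puppet
# Hypothetical profile: the package version is a class parameter, so Hiera
# can supply it through automatic parameter lookup.
#
# Example data (hypothetical file hieradata/production.yaml):
#   profiles::somepackage::package_version: '0.98.4'
#
# Without a Hiera value the default applies, which only makes sure the
# package is installed and never upgrades it behind your back.
class profiles::somepackage (
  $package_version = 'installed',
) {
  package { 'somepackage':
    ensure => $package_version,
  }
}

include profiles::somepackage
```

Pinning in data rather than in code means a development environment can track `latest` while production stays on a known version, without touching the manifests.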

Page 25: 7 puppet horror stories in 7 years - PuppetConf 2014

Chapter 4: We are all devs now

Page 26: 7 puppet horror stories in 7 years - PuppetConf 2014

No more puppet runs :(

● Puppet is cronned
● Dashboard shows no successful runs for hours
● puppet agent -t starts, then exits with no error

Page 27: 7 puppet horror stories in 7 years - PuppetConf 2014

We didn't change a thing !

● Quick git log showed no relevant changes on the code base.

Page 28: 7 puppet horror stories in 7 years - PuppetConf 2014

Let's debug

● Put agent in verbose
● Put master in verbose

Page 29: 7 puppet horror stories in 7 years - PuppetConf 2014

Let's debug

● Put agent in verbose
● Put master in verbose
● Try different environments

Page 30: 7 puppet horror stories in 7 years - PuppetConf 2014

Let's debug

● Put agent in verbose
● Put master in verbose
● Try different environments (it works in 1)
● Upgrade puppetmaster
● Upgrade puppet agent

Page 31: 7 puppet horror stories in 7 years - PuppetConf 2014

Let's debug

● Put agent in verbose
● Put master in verbose
● Try different environments
● Upgrade puppetmaster
● Upgrade puppet agent
● Modify /etc/hosts

Page 32: 7 puppet horror stories in 7 years - PuppetConf 2014

Let's debug

● Put agent in verbose
● Put master in verbose
● Try different environments
● Upgrade puppetmaster
● Upgrade puppet agent
● Modify /etc/hosts
● Clean SSL Certs

Page 33: 7 puppet horror stories in 7 years - PuppetConf 2014

Called in the troops

● I've never seen something like this before
● They couldn't find a thing either ..
● We've never seen something like this before.

Page 34: 7 puppet horror stories in 7 years - PuppetConf 2014

We didn't change a thing (2)!

● Quick git log showed no relevant changes on the code base.
● On the manifests included in the platform where the code was failing
● 95% Linux nodes failing, no code change

Page 35: 7 puppet horror stories in 7 years - PuppetConf 2014

Exit Zero

● 5% Windows nodes, code changes ignored

=> that module isn't used on Linux right ?

Page 36: 7 puppet horror stories in 7 years - PuppetConf 2014

Chapter 5: Software defined Network

Page 37: 7 puppet horror stories in 7 years - PuppetConf 2014

Business as usual

● Fully puppetized platform
● Deployed 5+ instances of platform
● IPSec Tunnels in all directions.

Page 38: 7 puppet horror stories in 7 years - PuppetConf 2014

One Morning

● Puppetruns start failing
● Perfect access to new customer platform
● Monitoring says everything is fine.
● Boxen can ping puppetmaster
● Puppetmaster can ping boxen

Page 39: 7 puppet horror stories in 7 years - PuppetConf 2014

Everything is a Funky DNS Problem

● Reverse DNS was broken
● Fixed Reverse DNS for new platform
● Hey .. this looks better

Page 40: 7 puppet horror stories in 7 years - PuppetConf 2014

But it's not fixed yet

● It's ssl ?
● Let's clear some Certs.
● Nope not SSL

Page 41: 7 puppet horror stories in 7 years - PuppetConf 2014

I can connect !

● I can ping the puppetmaster
● I can telnet to port 8140 on the puppetmaster
● The puppetmaster can ping me
● It all works

Page 42: 7 puppet horror stories in 7 years - PuppetConf 2014

But

● I can't wget from the puppetmaster
● It must be networking
● Is it a nat rule on the vpn ?

Page 43: 7 puppet horror stories in 7 years - PuppetConf 2014

Have you tried turning it off and on again ?

● Broken IPSec setup
● Restarting tunnels solved it.

Page 44: 7 puppet horror stories in 7 years - PuppetConf 2014

Chapter 7: There's 3 hard things in IT

- Cache Invalidation
- Shared Memory Management
- Off by 1 errors
- Naming things

Page 45: 7 puppet horror stories in 7 years - PuppetConf 2014

Yet another broken puppetrun

err: Could not retrieve catalog from remote server: Error 400 on SERVER: Another local or imported resource exists with the type and title Apache::Vhost[somehost.somecusterom.eu_80_proxy] on node proxy.somewhere.eu

warning: Not using cache on failed catalog

Page 46: 7 puppet horror stories in 7 years - PuppetConf 2014

But that shouldn't be there

● Vhost does not get exported to there
● Box should not collect that

Page 47: 7 puppet horror stories in 7 years - PuppetConf 2014

But that shouldn't be there

● Vhost does not get exported to there
● Box should not collect that
● Digging into exports
● Digging into puppetdb, how's your psql ?

Page 48: 7 puppet horror stories in 7 years - PuppetConf 2014

Collecting

Apache::Vhost <<| tag == 'proxy' |>> {}

Apache::Vhost::Ssl <<| tag == 'proxy' |>> {}

Apache::Vhost::Mod::Reverse_proxy <<| tag == 'proxy' |>> {}

Page 49: 7 puppet horror stories in 7 years - PuppetConf 2014

Beware what you collect!

● profiles::vhost::proxy
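
A plausible reading of this slide: Puppet automatically tags every resource with the name of its enclosing class and with each namespace segment of that name, so anything exported from inside `profiles::vhost::proxy` also carries the tag `proxy` and matches a collector on `tag == 'proxy'`. The sketch below shows one way to avoid that, assuming stored configs are enabled. The defined type `example::vhost` is a stand-in for the real `apache::vhost` (whose parameters the slides do not show), and the tag name is hypothetical.

```puppet
# Stand-in for the real vhost defined type, so the sketch is self-contained.
define example::vhost ($port = 80) {
  notify { "vhost ${title} on port ${port}": }
}

class profiles::vhost::proxy {
  # Declared inside this class, the resource is auto-tagged with
  # 'profiles::vhost::proxy', 'profiles', 'vhost' and 'proxy',
  # in addition to the explicit tag below.
  @@example::vhost { "${::fqdn}_80_proxy":
    tag => 'export_to_frontend_proxy',   # hypothetical, unambiguous tag
  }
}

# On the collecting node: match the explicit tag instead of the generic
# 'proxy', which any class with a 'proxy' namespace segment would satisfy.
Example::Vhost <<| tag == 'export_to_frontend_proxy' |>> {}
```

The design choice is to treat collector tags like identifiers: pick a value that no class or module name segment will ever produce by accident.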

Page 50: 7 puppet horror stories in 7 years - PuppetConf 2014

Lessons Learned

● Repository Management is Important
● Development environments are relevant, also for puppet code !
● We all hate SSL
● You never monitor enough
● Upgrading is mostly not the solution
● Unless it's a bug in Puppet
● Release Management is not a solved problem

Page 51: 7 puppet horror stories in 7 years - PuppetConf 2014
Page 52: 7 puppet horror stories in 7 years - PuppetConf 2014

Contact

Kris Buytaert
[email protected]

Further Reading

@krisbuytaert
http://www.krisbuytaert.be/blog/
http://www.inuits.be/

Inuits

Duboistraat 50
2060 Antwerpen
Belgium
891.514.231

+32 475 961221