Troubleshooting the Puppet Enterprise Stack

Celia Cottle, Support Engineer | Puppet Labs | celia@puppetlabs.com | @celiaPDX

The Stack

Console: The console is Puppet Enterprise’s web GUI.

MCollective/Live Management: Live Management (LM) is an interface to PE’s orchestration engine (MCollective).

PuppetDB: PuppetDB collects data generated by Puppet.

Master/Agent: The master is the central Puppet server. The agent retrieves the client configuration from the puppet master and applies it to the local host.

The Console

Console Logs
/var/log/pe-httpd/puppetdashboard.error.log
/var/log/pe-httpd/puppetdashboard.access.log
/var/log/pe-httpd/puppetmaster.error.log

Configuration
/etc/puppetlabs/puppet/puppet.conf

Console Common Problems

No nodes are reporting

•  Stop the pe-puppet-dashboard-workers service.
•  Check /opt/puppet/share/puppet-dashboard/tmp/pids for files ending in .pid.
•  Restart the pe-puppet-dashboard-workers service.
•  Run ps aux | grep delayed_job and see whether entries like dashboard/delayed_job.1 and delayed_job.1_monitor appear. If they do, the dashboard workers have started up properly again.
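The .pid check in the second step can be scripted. This is a minimal sketch, not an official Puppet tool: the function name list_stale_pids is hypothetical, and it assumes each .pid file holds a single numeric process ID. It prints the .pid files whose recorded process is no longer running. Run it as root, because the existence test also fails for processes owned by another user.

```shell
#!/bin/sh
# Hypothetical helper: print .pid files whose recorded process is not running.
# Assumption: each .pid file contains one numeric process ID.
list_stale_pids() {
    pid_dir=${1:-/opt/puppet/share/puppet-dashboard/tmp/pids}
    for pid_file in "$pid_dir"/*.pid; do
        # The glob stays literal when nothing matches, so skip non-files.
        [ -f "$pid_file" ] || continue
        pid=$(tr -cd '0-9' < "$pid_file")
        # kill -0 sends no signal; it only tests whether the process exists.
        if [ -z "$pid" ] || ! kill -0 "$pid" 2>/dev/null; then
            printf '%s\n' "$pid_file"
        fi
    done
}
```

Calling list_stale_pids with no argument inspects the dashboard pid directory named above; passing a directory as the first argument inspects that directory instead.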

Console Common Problems

There Are No Facts Listed For Nodes / Node Manager Won’t Display

/var/log/pe-httpd/puppetmaster.error.log:
[Fri Aug 16 22:49:20 2013] [error] [client 172.16.0.2] Certificate Verification: Error (23): certificate revoked

Console Authentication

Logs
/var/log/pe-httpd/access.log
/var/log/pe-httpd/error.log
/var/log/pe-console-auth/cas.log

Configuration Files
/etc/puppetlabs/console-auth/cas_client_config.yml
/etc/puppetlabs/rubycas-server/config.yml

Console Auth Common Problems

Can’t Log In

/var/log/pe-console-auth/cas.log:
Invalid credentials given for user 'console@puppetlabs.test'

Possible Cause: Bad Credentials/Lost Credentials

$ cd /opt/puppet/share/console-auth
$ sudo /opt/puppet/bin/rake db:create_user USERNAME="adminuser@example.com" PASSWORD="<password>" ROLE="Admin"

Alternatively, if using third-party authentication, check /var/log/pe-httpd/access.log.

PuppetDB

Log Files:
/var/log/messages
/var/log/pe-puppetdb/puppetdb.log

Config Files:
/etc/puppetlabs/puppet/puppetdb.conf

PuppetDB Common Problems

* SSL Errors

/var/log/messages:
Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Failed to submit 'replace facts' command for agent1.vm to PuppetDB at master0.vm:8081: Server hostname 'master0.vm' did not match server certificate; expected one of master1.vm
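The error above says that the hostname the master used to reach PuppetDB is not one of the names in PuppetDB's certificate. A first step is to see which hostname the master is configured to use. The sketch below reads the server setting from puppetdb.conf; the function name puppetdb_conf_server is hypothetical, and it assumes the usual "server = hostname" line in that file.

```shell
#!/bin/sh
# Hypothetical helper: print the PuppetDB hostname the master is configured to use.
# Assumption: puppetdb.conf contains a line of the form "server = <hostname>".
puppetdb_conf_server() {
    conf=${1:-/etc/puppetlabs/puppet/puppetdb.conf}
    # Split each line on "=", match the "server" key, and strip whitespace
    # from the value before printing it.
    awk -F= '$1 ~ /^[ \t]*server[ \t]*$/ { gsub(/[ \t]/, "", $2); print $2; exit }' "$conf"
}
```

If the printed name differs from the names listed after "expected one of" in the error, the two need to be brought into agreement, either by changing the setting or by regenerating the PuppetDB certificate.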

PuppetDB Common Problems

PuppetDB Won’t Start, Fails Silently

Look in /var/log/pe-puppetdb/puppetdb.log for the error below. A heap dump at /var/log/pe-puppetdb/puppetdb-oom.hprof is another sign of the same problem.

java.lang.OutOfMemoryError: Java heap space

Fix: Edit the defaults in /etc/default/pe-puppetdb or /etc/sysconfig/pe-puppetdb, and change the 256m to 1024m:

JAVA_ARGS="-Xmx256m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/pe-puppetdb/puppetdb-oom.hprof -Xms256m"
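The heap change can be made with sed instead of a manual edit. This is a sketch under stated assumptions: the function name raise_puppetdb_heap is hypothetical, and it raises only the maximum heap (-Xmx), leaving the initial heap (-Xms) untouched. The slide does not say which of the two 256m values to change, so adjust the sketch if both should be raised.

```shell
#!/bin/sh
# Hypothetical helper: raise the -Xmx value on the JAVA_ARGS line of a
# PuppetDB defaults file. Assumption: only the maximum heap is changed.
raise_puppetdb_heap() {
    defaults_file=$1
    new_size=${2:-1024m}
    # Keep a backup, then rewrite the original in place from the backup so
    # the file keeps its ownership and permissions.
    cp "$defaults_file" "$defaults_file.bak" &&
    sed "/^JAVA_ARGS=/ s/-Xmx[0-9][0-9]*[kKmMgG]/-Xmx$new_size/" \
        "$defaults_file.bak" > "$defaults_file"
}
```

For example, raise_puppetdb_heap /etc/default/pe-puppetdb rewrites the Debian-style defaults file and leaves the previous version next to it with a .bak suffix. PuppetDB must be restarted before the new value takes effect.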

Live Management

Live Management/MCollective

Logs:
/var/log/pe-activemq/activemq.log
/var/log/pe-mcollective/mcollective.log
/var/log/pe-httpd/error.log

Configuration:
/etc/puppetlabs/mcollective/server.cfg

MCollective Common Problems

* None of the Nodes Show Up In Live Management

/var/log/pe-httpd/error.log:
No MCollective servers responded. Either MCollective is not yet configured and operational or all MCollective servers are off-line. Check that you can reach your servers with `mco ping`. It may also help to increase the LM_DISCOVERY_TIMEOUT or LM_INVENTORY_RETRIES variables in your Apache configuration.

Live Management

Common Problems And What They Look Like

* None of the Nodes Show Up In Live Management

/var/log/pe-activemq/activemq.log:
| WARN | Transport Connection to: tcp://000.00.000.00:0000 failed: java.lang.SecurityException: User name [mcollective] or password is invalid.

MCollective Common Problems

* The number of nodes reporting from MCollective commands, or Live Management, varies

/var/log/pe-activemq/activemq.log:
javax.net.ssl.SSLHandshakeException: Remote host closed connection during handshake

Solution: On the master, open /opt/puppet/share/puppet/modules/pe_mcollective/server.cfg.erb and edit the line registerinterval =

Live Management Common Problems And What They Look Like

* Nothing displays but a 500 error

Master/Agent

Logs:
* /var/log/messages
* /var/log/pe-httpd/error.log

Configuration:
/etc/puppetlabs/puppet/puppet.conf

Master/Agent Common Problems And What They Look Like

* Nodes are failing runs

/var/log/messages:
err: /File[/var/opt/lib/pe-puppet/lib]: Failed to generate additional resources using 'eval_generate: Connection timed out - connect(2)
err: Could not retrieve plugin: execution expired

Solution: Splay: http://docs.puppetlabs.com/references/latest/configuration.html#splay

* Nodes are failing runs

/var/log/messages:
Error: Could not request certificate: The certificate retrieved from the master does not match the agent's private key.

To fix this, remove the certificate from both the master and the agent, then start a Puppet run, which automatically regenerates the certificate.

On the master:

puppet cert clean agentname

Restart pe-httpd

On the agent:

rm -f /etc/puppetlabs/puppet/ssl/certs/agentname

puppet agent -t

* Nodes can’t reach the master

Error: Could not request certificate: getaddrinfo: Name or service not known

Troubleshooting
1. telnet master 8140
2. Check /etc/hosts or DNS
3. ping master
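Step 2 above can be scripted for the /etc/hosts half of the check. The sketch below is an illustration only: the function name hosts_entries_for is hypothetical, and it looks at the hosts file alone, so a name that resolves only through DNS prints nothing.

```shell
#!/bin/sh
# Hypothetical helper: print the IP address of every hosts-file entry that
# lists the given hostname, ignoring comments.
hosts_entries_for() {
    name=$1
    hosts_file=${2:-/etc/hosts}
    awk -v name="$name" '
        { sub(/#.*/, "") }    # drop comments; awk re-splits the fields afterwards
        {
            # Field 1 is the address; every later field is a hostname or alias.
            for (i = 2; i <= NF; i++)
                if ($i == name) { print $1; break }
        }
    ' "$hosts_file"
}
```

Running hosts_entries_for master prints the address or addresses that /etc/hosts maps to the name master. No output means the agent depends on DNS for that name, and more than one address suggests a conflicting entry.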

Red Herrings

/var/log/pe-httpd/error.log:
config.ru:9: warning: already initialized constant argv

/var/log/pe-httpd/puppetdashboard.error.log:
[warn] RSA server certificate CommonName (CN) `pe-internal-dashboard' does NOT match server name!?

/var/log/pe-console-auth/auth.log:
INFO 2013-08-20 01:07 UTC: User (anonymous) accessed read-write url /reports/upload
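When reading these logs, it can help to hide the harmless messages so that real errors stand out. The sketch below filters them with grep; the function name filter_red_herrings is hypothetical, and the patterns are taken only from the three messages listed above.

```shell
#!/bin/sh
# Hypothetical helper: read log lines on stdin and drop the known-harmless
# messages listed above, printing everything else unchanged.
filter_red_herrings() {
    grep -v \
        -e 'warning: already initialized constant' \
        -e 'does NOT match server name' \
        -e 'accessed read-write url /reports/upload'
}
```

A typical use would be filter_red_herrings < /var/log/pe-httpd/error.log, which prints the log with those lines removed.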

SSL Errors

Where your certs (mostly) live:
/etc/puppetlabs/puppet/ssl
/opt/puppet/share/puppet-dashboard/certs
/etc/puppetlabs/puppetdb/ssl

Regenerating The CA And The Master

1. Delete the contents of the /etc/puppetlabs/puppet/ssl directory on the master.
2. Run `puppet cert list` to regenerate the CA.
3. Stop pe-httpd.
4. Run `puppet master --no-daemonize --verbose` to regenerate the master cert and create a cert request.
5. Check that `puppet cert list -a` returns the master cert.
6. Restart pe-httpd.

Regenerating The PuppetDB Certs

1. Stop the PuppetDB service.
2. If PuppetDB is on a separate server, remove the agent certs from /etc/puppetlabs/puppet/ssl/. In all cases, remove the PuppetDB certs from /etc/puppetlabs/puppetdb/ssl/.
3. Run `puppet cert clean puppetdbhost.yourdomain` on the master (if PuppetDB is on a separate host and the cert has not been cleaned already).
4. Regenerate the Puppet agent certs by performing a Puppet run on the PuppetDB host, signing them on the master if necessary.
5. Run /opt/puppet/sbin/puppetdb-ssl-setup -f on the PuppetDB host.
6. Restart the PuppetDB service on its host, and the pe-httpd service on your master.

Regenerating The Console’s Certificate

1. cd /opt/puppet/share/puppet-dashboard/certs, and remove any existing contents.
2. sudo /opt/puppet/bin/rake RAILS_ENV=production cert:create_key_pair
3. sudo /opt/puppet/bin/rake RAILS_ENV=production cert:request
4. sudo puppet cert sign pe-internal-dashboard
5. sudo /opt/puppet/bin/rake RAILS_ENV=production cert:retrieve
6. sudo chown -R puppet-dashboard:puppet-dashboard certs/
7. /etc/init.d/pe-httpd restart

Regenerating The Agent’s Certificate

On the master:
1. puppet cert clean agenthostname
2. Restart pe-httpd

On the agent:
1. rm -rf /etc/puppetlabs/puppet/ssl
2. puppet agent -t

On the master:
1. puppet cert sign agenthostname

Regenerating Your Master’s Certificate

1. Edit your puppet.conf to reflect any changes to the hostname or alt names.
2. Run `puppet cert clean mastername`.
3. Stop pe-httpd (/etc/init.d/pe-httpd stop).
4. Run `puppet master --no-daemonize --verbose`.

Certs that Puppet can Regenerate

pe-internal-broker
pe-internal-mcollective-servers
pe-internal-peadmin-mcollective-client
pe-internal-puppet-console-mcollective-client

Regenerating All The Certificates

http://showterm.io/f41a4b7bb5b0b006d8a80

Resources

Ask.Puppetlabs.com

Irc.freenode.net #puppet

PE-Users Mailing List: https://groups.google.com/a/puppetlabs.com/group/pe-users/topics
