Troubleshooting the Puppet Enterprise Stack
DESCRIPTION
A guide to where to look for errors when they occur in the various parts of Puppet Enterprise (the console, Live Management, the puppet master, ActiveMQ, MCollective, and the agent), what some of those errors mean, and which warnings and errors are red herrings or occur normally.
Celia Cottle, Support Engineer, Puppet Labs
Celia Cottle is a Support Engineer at Puppet Labs, where she troubleshoots and resolves issues for Puppet Enterprise customers. She comes from Portland State University, where she worked for the College of Engineering and Computer Science doing technical support while getting her degree in Communication. She’s been working in IT for over five years and enjoys problem solving, working with a wide range of OSes and software, and the variety of challenges that supporting Puppet Enterprise brings. She currently resides in Portland, Oregon.
TRANSCRIPT
Troubleshooting Puppet Enterprise
Celia Cottle Support Engineer | Puppet Labs [email protected] @celiaPDX
The Stack
Console: The console is Puppet Enterprise’s web GUI.
MCollective/Live Management: Live Management (LM) is an interface to PE’s orchestration engine (MCollective).
PuppetDB: PuppetDB collects data generated by Puppet.
Master/Agent: The master is the central puppet server. The agent retrieves the client configuration from the puppet master and applies it to the local host.
The Console
Console Logs:
/var/log/pe-httpd/puppetdashboard.error.log
/var/log/pe-httpd/puppetdashboard.access.log
/var/log/pe-httpd/puppetmaster.error.log
Configuration:
/etc/puppetlabs/puppet/puppet.conf
Console Common Problems
No nodes are reporting
• Stop the pe-puppet-dashboard-workers service.
• Check /opt/puppet/share/puppet-dashboard/tmp/pids for files ending in .pid.
• Restart the pe-puppet-dashboard-workers service.
• Run ps aux | grep delayed_job and see if entries like dashboard/delayed_job.1 and delayed_job.1_monitor appear. If they do, the dashboard workers have started up properly again.
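The pid-file check in the steps above can be sketched as a small shell helper. This is a minimal sketch, not a Puppet Enterprise tool: the function name stale_pids is hypothetical, and it assumes that a .pid file is "stale" when the process ID it records is no longer running.

```shell
#!/bin/sh
# stale_pids [DIR]: print every .pid file in DIR whose recorded process
# is not running. DIR defaults to the dashboard pid directory named above.
# Hypothetical helper; run it as root so that kill -0 can see all processes.
stale_pids() {
    pid_dir="${1:-/opt/puppet/share/puppet-dashboard/tmp/pids}"
    for pid_file in "$pid_dir"/*.pid; do
        # When the glob matches nothing, the literal pattern remains; skip it.
        [ -f "$pid_file" ] || continue
        pid=$(tr -d '[:space:]' < "$pid_file")
        # kill -0 sends no signal; it only tests whether the process exists.
        if [ -z "$pid" ] || ! kill -0 "$pid" 2>/dev/null; then
            echo "$pid_file"
        fi
    done
}
```

Run stale_pids with no arguments after stopping the workers. Any file it prints points at a process that is gone, which is worth reviewing before restarting the workers.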
Console Common Problems
No facts are listed for nodes / Node Manager won’t display
/var/log/pe-httpd/puppetmaster.error.log:
[Fri Aug 16 22:49:20 2013] [error] [client 172.16.0.2] Certificate Verification: Error (23): certificate revoked
Console Authentication
Logs:
/var/log/pe-httpd/access.log
/var/log/pe-httpd/error.log
/var/log/pe-console-auth/cas.log
Configuration Files:
/etc/puppetlabs/console-auth/cas_client_config.yml
/etc/puppetlabs/rubycas-server/config.yml
Console Auth Common Problems
Can’t Log In
/var/log/pe-console-auth/cas.log:
Invalid credentials given for user '[email protected]'
Possible Cause: Bad Credentials/Lost Credentials
Fix: create a new admin user.
$ cd /opt/puppet/share/console-auth
$ sudo /opt/puppet/bin/rake db:create_user USERNAME="[email protected]" PASSWORD="<password>" ROLE="Admin"
Alternatively, if using third-party authentication, check /var/log/pe-httpd/access.log
PuppetDB
PuppetDB
Log Files:
/var/log/messages
/var/log/pe-puppetdb/puppetdb.log
Config Files:
/etc/puppetlabs/puppet/puppetdb.conf
PuppetDB Common Problems
* SSL Errors
/var/log/messages:
Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Failed to submit 'replace facts' command for agent1.vm to PuppetDB at master0.vm:8081: Server hostname 'master0.vm' did not match server certificate; expected one of master1.vm
PuppetDB Common Problems
* PuppetDB Won’t Start, Fails Silently
/var/log/pe-puppetdb/puppetdb.log
/var/log/pe-puppetdb/puppetdb-oom.hprof (the heap dump written on an OutOfMemoryError)
java.lang.OutOfMemoryError: Java heap space
Fix: Edit the defaults in /etc/default/pe-puppetdb or /etc/sysconfig/pe-puppetdb, and change the 256m to 1024m in the following line:
JAVA_ARGS="-Xmx256m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/pe-puppetdb/puppetdb-oom.hprof -Xms256m"
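As an illustration of the heap change, the edit could be scripted roughly as follows. This is a sketch under assumptions: set_max_heap is a hypothetical helper name, and it changes only the -Xmx (maximum heap) value, leaving -Xms at its existing setting. If you also want to raise the initial heap, edit -Xms by hand.

```shell
#!/bin/sh
# set_max_heap FILE SIZE: rewrite the -Xmx value in FILE, keeping a backup
# copy in FILE.bak. Hypothetical helper, not shipped with Puppet Enterprise.
set_max_heap() {
    defaults_file="$1"
    new_size="$2"
    # Keep the original so the change can be reverted.
    cp -p "$defaults_file" "$defaults_file.bak"
    # Replace the first -Xmx<number><unit> on each line with the new size.
    sed "s/-Xmx[0-9][0-9]*[kKmMgG]/-Xmx${new_size}/" \
        "$defaults_file.bak" > "$defaults_file"
}
```

For example, set_max_heap /etc/default/pe-puppetdb 1024m (or the /etc/sysconfig path on systems that use it), followed by a restart of the PuppetDB service so the new setting takes effect.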
Live Management
Live Management/MCollective
Logs:
/var/log/pe-activemq/activemq.log
/var/log/pe-mcollective/mcollective.log
/var/log/pe-httpd/error.log
Configuration:
/etc/puppetlabs/mcollective/server.cfg
MCollective Common Problems
* None of the Nodes Show Up In Live Management
/var/log/pe-httpd/error.log:
No MCollective servers responded. Either MCollective is not yet configured and operational or all MCollective servers are off-line. Check that you can reach your servers with `mco ping`. It may also help to increase the LM_DISCOVERY_TIMEOUT or LM_INVENTORY_RETRIES variables in your Apache configuration.
Live Management
Common Problems And What They Look Like
* None of the Nodes Show Up In Live Management
/var/log/pe-activemq/activemq.log:
| WARN | Transport Connection to: tcp://000.00.000.00:0000 failed: java.lang.SecurityException: User name [mcollective] or password is invalid.
MCollective Common Problems
* The number of nodes reporting from MCollective commands, or Live Management, varies
/var/log/pe-activemq/activemq.log:
javax.net.ssl.SSLHandshakeException: Remote host closed connection during handshake
Solution: On the master, open /opt/puppet/share/puppet/modules/pe_mcollective/server.cfg.erb and edit the line
registerinterval =
Live Management Common Problems And What They Look Like
* Nothing displays but a 500 error
Master/Agent
Logs:
* /var/log/messages
* /var/log/pe-httpd/error.log
Configuration:
/etc/puppetlabs/puppet/puppet.conf
Master/Agent Common Problems And What They Look Like
* Nodes are failing runs
/var/log/messages:
err: /File[/var/opt/lib/pe-puppet/lib]: Failed to generate additional resources using 'eval_generate: Connection timed out - connect(2)
err: Could not retrieve plugin: execution expired
Solution: Splay: http://docs.puppetlabs.com/references/latest/configuration.html#splay
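Before changing anything, it can help to confirm whether splay is already turned on for an agent. The sketch below assumes the setting appears in puppet.conf as a plain "splay = true" line. The helper name splay_enabled is hypothetical, and the check deliberately ignores which section of the file the line is in.

```shell
#!/bin/sh
# splay_enabled [FILE]: succeed (exit 0) if FILE contains a line of the
# form "splay = true". FILE defaults to the puppet.conf path listed above.
# Hypothetical helper; it does not distinguish [main] from [agent].
splay_enabled() {
    conf_file="${1:-/etc/puppetlabs/puppet/puppet.conf}"
    grep -Eq '^[[:space:]]*splay[[:space:]]*=[[:space:]]*true[[:space:]]*$' \
        "$conf_file"
}
```

A typical use is: splay_enabled || echo "splay is not enabled on this agent". The anchored pattern means a splaylimit line alone does not count as splay being enabled.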
Master/Agent Common Problems And What They Look Like
* Nodes are failing runs
/var/log/messages:
Error: Could not request certificate: The certificate retrieved from the master does not match the agent's private key.
To fix this, remove the certificate from both the master and the agent and then start a puppet run, which will automatically regenerate a certificate.
On the master:
puppet cert clean agentname
Restart pe-httpd
On the agent:
rm -f /etc/puppetlabs/puppet/ssl/certs/agentname
puppet agent -t
Master/Agent Common Problems And What They Look Like
* Nodes can’t reach the master
Error: Could not request certificate: getaddrinfo: Name or service not known
Troubleshooting
1. telnet master 8140
2. Check /etc/hosts or DNS
3. ping master
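For the /etc/hosts part of step 2, a quick lookup like the following shows which address, if any, the hosts file assigns to the master's name. This is a minimal sketch: hosts_entry_for is a hypothetical helper, it reads only a hosts-format file, and it says nothing about what DNS returns.

```shell
#!/bin/sh
# hosts_entry_for NAME [FILE]: print the first address that FILE maps to
# NAME, or nothing if NAME is absent. FILE defaults to /etc/hosts.
# Hypothetical helper; it does not query DNS.
hosts_entry_for() {
    name="$1"
    hosts_file="${2:-/etc/hosts}"
    awk -v name="$name" '
        { sub(/#.*/, "") }    # strip comments before looking at fields
        {
            # Field 1 is the address; every later field is a hostname.
            for (i = 2; i <= NF; i++)
                if ($i == name) { print $1; exit }
        }
    ' "$hosts_file"
}
```

If hosts_entry_for master prints nothing, the name must come from DNS. If it prints an address, compare it with the master's real address before moving on to the telnet and ping checks.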
Red Herrings
/var/log/pe-httpd/error.log:
config.ru:9: warning: already initialized constant argv
/var/log/pe-httpd/puppetdashboard.error.log:
[warn] RSA server certificate CommonName (CN) `pe-internal-dashboard' does NOT match server name!?
/var/log/pe-console-auth/auth.log:
INFO 2013-08-20 01:07 UTC: User (anonymous) accessed read-write url /reports/upload
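When reading these logs, it can be convenient to hide the known-harmless messages so the real errors stand out. The filter below is a sketch. The function name filter_red_herrings is hypothetical, and the three fixed strings are taken from the messages listed above, so they may need adjusting if your versions word them differently.

```shell
#!/bin/sh
# filter_red_herrings: read log lines on stdin and print only those that
# do NOT contain one of the known-harmless messages listed above.
# Hypothetical helper; -F makes grep treat each pattern as a fixed string.
filter_red_herrings() {
    grep -v -F \
        -e 'warning: already initialized constant' \
        -e 'does NOT match server name' \
        -e 'accessed read-write url /reports/upload' \
        || [ $? -eq 1 ]   # exit status 1 only means every line was filtered
}
```

For example: filter_red_herrings < /var/log/pe-httpd/error.log | tail -n 50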
SSL Errors
Where your certs (mostly) live:
/etc/puppetlabs/puppet/ssl
/opt/puppet/share/puppet-dashboard/certs
/etc/puppetlabs/puppetdb/ssl
Regenerating The CA And The Master
1. Delete the contents of the /etc/puppetlabs/puppet/ssl directory on the master.
2. Run `puppet cert list` to regenerate the CA.
3. Stop pe-httpd.
4. Run `puppet master --no-daemonize --verbose` to regenerate the master cert and create a cert request.
5. Check that `puppet cert list -a` returns the master cert.
6. Restart pe-httpd.
Regenerating the PuppetDB Certs
1. Stop the PuppetDB service.
2. Remove the agent certs from /etc/puppetlabs/puppet/ssl/ (if PuppetDB is on a separate server) and the PuppetDB certs from /etc/puppetlabs/puppetdb/ssl/.
3. Run `puppet cert clean puppetdbhost.yourdomain` on the master (if not cleaned already and PuppetDB is on a separate host).
4. Regenerate the Puppet agent certs by performing a Puppet run on the PuppetDB host, signing them on the master if necessary.
5. Run /opt/puppet/sbin/puppetdb-ssl-setup -f on the PuppetDB host.
6. Restart the PuppetDB service on its host, and the pe-httpd service on your master.
Regenerating The Console’s Certificate
1. cd /opt/puppet/share/puppet-dashboard/certs, and remove any existing contents.
2. sudo /opt/puppet/bin/rake RAILS_ENV=production cert:create_key_pair
3. sudo /opt/puppet/bin/rake RAILS_ENV=production cert:request
4. sudo puppet cert sign pe-internal-dashboard
5. sudo /opt/puppet/bin/rake RAILS_ENV=production cert:retrieve
6. sudo chown -R puppet-dashboard:puppet-dashboard certs/
7. /etc/init.d/pe-httpd restart
Regenerating The Agent’s Certificate
On the master:
1. puppet cert clean agenthostname
2. Restart pe-httpd
On the agent:
1. rm -rf /etc/puppetlabs/puppet/ssl
2. puppet agent -t
On the master:
1. puppet cert sign agenthostname
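Because the steps alternate between two machines, it can be handy to print them with the real agent name filled in and paste each group where it belongs. The sketch below only prints commands and runs none of them. The function name agent_cert_plan is hypothetical, and the restart line assumes the /etc/init.d/pe-httpd script used elsewhere in this deck.

```shell
#!/bin/sh
# agent_cert_plan AGENT: print the commands for regenerating AGENT's
# certificate, grouped by the host they run on. Prints only; runs nothing.
# Hypothetical helper built from the steps listed above.
agent_cert_plan() {
    agent="$1"
    cat <<EOF
# on the master
puppet cert clean $agent
/etc/init.d/pe-httpd restart
# on the agent
rm -rf /etc/puppetlabs/puppet/ssl
puppet agent -t
# on the master
puppet cert sign $agent
EOF
}
```

For example, agent_cert_plan agent1.vm prints the sequence for the agent named agent1.vm.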
Regenerating Your Master’s Certificate
1. Edit your puppet.conf to update any changes to the hostname or alt names.
2. Run `puppet cert clean mastername`.
3. Stop pe-httpd (/etc/init.d/pe-httpd stop).
4. Run `puppet master --no-daemonize --verbose`.
Certs that Puppet can Regenerate
pe-internal-broker
pe-internal-mcollective-servers
pe-internal-peadmin-mcollective-client
pe-internal-puppet-console-mcollective-client
Regenerating All The Certificates
http://showterm.io/f41a4b7bb5b0b006d8a80
Q&A
Resources
Ask.Puppetlabs.com
Irc.freenode.net #puppet
PE-Users Mailing List: https://groups.google.com/a/puppetlabs.com/group/pe-users/topics