Transcript
Page 1: Troubleshooting the Puppet Enterprise Stack

Troubleshooting Puppet Enterprise

Celia Cottle, Support Engineer | Puppet Labs | @celiaPDX

Page 2: Troubleshooting the Puppet Enterprise Stack

The Stack

Console: The console is Puppet Enterprise's web GUI.

MCollective/Live Management: Live Management (LM) is an interface to PE's orchestration engine (MCollective).

PuppetDB: PuppetDB collects data generated by Puppet.

Master/Agent: The master is the central Puppet server. The agent retrieves the client configuration from the puppet master and applies it to the local host.

Page 3: Troubleshooting the Puppet Enterprise Stack

The Console

Page 4: Troubleshooting the Puppet Enterprise Stack

Console Logs:
/var/log/pe-httpd/puppetdashboard.error.log
/var/log/pe-httpd/puppetdashboard.access.log
/var/log/pe-httpd/puppetmaster.error.log

Configuration:
/etc/puppetlabs/puppet/puppet.conf

Page 5: Troubleshooting the Puppet Enterprise Stack

Console Common Problems

No nodes are reporting

•  Stop the pe-puppet-dashboard-workers service.

•  Check /opt/puppet/share/puppet-dashboard/tmp/pids for files ending in .pid.

•  Restart the pe-puppet-dashboard-workers service.

•  Run ps aux | grep delayed_job and check whether entries like dashboard/delayed_job.1 and delayed_job.1_monitor appear. If they do, the dashboard workers have started up properly again.
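The .pid check in the steps above can be scripted. This is a minimal sketch, not something shipped with PE: the helper name stale_pids is hypothetical, and it assumes each .pid file holds a single process ID. It prints the PID files whose process is no longer running. Run it as root, because kill -0 also fails for processes the current user is not allowed to signal.

```shell
# Hypothetical helper (not part of PE): print every *.pid file in a
# directory whose recorded process is no longer running.
# Assumption: each file contains exactly one process ID.
stale_pids() {
  dir="$1"
  for f in "$dir"/*.pid; do
    [ -e "$f" ] || continue            # the glob matched nothing
    pid=$(cat "$f")
    # kill -0 sends no signal; it only tests whether the PID can be signalled.
    if ! kill -0 "$pid" 2>/dev/null; then
      echo "$f"
    fi
  done
}
```

Usage, with the workers stopped: stale_pids /opt/puppet/share/puppet-dashboard/tmp/pids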

Page 6: Troubleshooting the Puppet Enterprise Stack

Console Common Problems

There Are No Facts Listed For Nodes / Node Manager Won't Display

/var/log/pe-httpd/puppetmaster.error.log:

[Fri Aug 16 22:49:20 2013] [error] [client 172.16.0.2] Certificate Verification: Error (23): certificate revoked

Page 7: Troubleshooting the Puppet Enterprise Stack

Console Authentication

Logs:
/var/log/pe-httpd/access.log
/var/log/pe-httpd/error.log
/var/log/pe-console-auth/cas.log

Configuration Files:
/etc/puppetlabs/console-auth/cas_client_config.yml
/etc/puppetlabs/rubycas-server/config.yml

Page 8: Troubleshooting the Puppet Enterprise Stack

Console Auth Common Problems

Can't Log In

/var/log/pe-console-auth/cas.log:

Invalid credentials given for user '[email protected]'

Possible cause: bad or lost credentials. To create a new admin user:

$ cd /opt/puppet/share/console-auth
$ sudo /opt/puppet/bin/rake db:create_user USERNAME="[email protected]" PASSWORD="<password>" ROLE="Admin"

Alternatively, if using third-party authentication, check /var/log/pe-httpd/access.log.
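To see which accounts are failing to log in, the cas.log entries can be tallied per user. A minimal sketch: the helper name failed_logins is hypothetical, and it assumes the user name is the first single-quoted string on each "Invalid credentials" line, as in the log excerpt on this slide.

```shell
# Hypothetical helper: count "Invalid credentials" entries per user in a CAS log.
# Assumption: the user name is the first single-quoted string on the line.
failed_logins() {
  awk -F"'" '
    /Invalid credentials given for user/ { count[$2]++ }
    END { for (u in count) print count[u], u }
  ' "$1" | sort -rn
}
```

Usage: failed_logins /var/log/pe-console-auth/cas.log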

Page 9: Troubleshooting the Puppet Enterprise Stack

PuppetDB

Page 10: Troubleshooting the Puppet Enterprise Stack

PuppetDB

Log Files:
/var/log/messages
/var/log/pe-puppetdb/puppetdb.log

Config Files:
/etc/puppetlabs/puppet/puppetdb.conf

Page 11: Troubleshooting the Puppet Enterprise Stack

PuppetDB Common Problems

* SSL Errors

/var/log/messages:

Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Failed to submit 'replace facts' command for agent1.vm to PuppetDB at master0.vm:8081: Server hostname 'master0.vm' did not match server certificate; expected one of master1.vm
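The error above says that the name the master uses to reach PuppetDB (master0.vm) is not one of the names in PuppetDB's certificate (master1.vm). One quick check is to read the server setting from puppetdb.conf and compare it with the "expected one of" list. A minimal sketch: the helper name puppetdb_server is hypothetical, and it assumes the file uses an INI layout with a "server = <hostname>" line.

```shell
# Hypothetical helper: print the PuppetDB hostname the master is configured to use.
# Assumption: puppetdb.conf contains a line of the form "server = <hostname>".
puppetdb_server() {
  file="${1:-/etc/puppetlabs/puppet/puppetdb.conf}"
  awk -F'=' '
    /^[ \t]*server[ \t]*=/ {
      gsub(/[ \t]/, "", $2)   # trim whitespace around the value
      print $2
      exit
    }
  ' "$file"
}
```

If the printed name does not appear in the "expected one of" list from the error, the two need to be brought back in line, either in the configuration or by regenerating the certificate.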

Page 12: Troubleshooting the Puppet Enterprise Stack

PuppetDB Common Problems

PuppetDB Won't Start, Fails Silently

Check /var/log/pe-puppetdb/puppetdb.log for the following error (a heap dump is written to /var/log/pe-puppetdb/puppetdb-oom.hprof):

java.lang.OutOfMemoryError: Java heap space

Fix: Edit the defaults in /etc/default/pe-puppetdb (Debian-based systems) or /etc/sysconfig/pe-puppetdb (RHEL-based systems), and change the 256m to 1024m:

JAVA_ARGS="-Xmx256m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/pe-puppetdb/puppetdb-oom.hprof -Xms256m"
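The heap change can be made with sed rather than by hand. A minimal sketch: the helper name raise_max_heap is hypothetical, and it assumes the slide's "256m" refers to the maximum heap (-Xmx), so -Xms is left alone. It prints the edited file to standard output and does not modify the original.

```shell
# Hypothetical helper: print a defaults file with the JVM maximum heap (-Xmx)
# replaced by the given size. The input file is not modified.
# Assumption: only -Xmx needs raising; -Xms (initial heap) is left unchanged.
raise_max_heap() {
  file="$1"
  size="$2"
  sed "s/-Xmx[0-9][0-9]*[kKmMgG]/-Xmx${size}/" "$file"
}
```

Usage: raise_max_heap /etc/default/pe-puppetdb 1024m > /tmp/pe-puppetdb.new, then review the result, copy it into place, and restart the PuppetDB service.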

Page 13: Troubleshooting the Puppet Enterprise Stack

Live Management

Page 14: Troubleshooting the Puppet Enterprise Stack

Live Management/MCollective

Logs:
/var/log/pe-activemq/activemq.log
/var/log/pe-mcollective/mcollective.log
/var/log/pe-httpd/error.log

Configuration:
/etc/puppetlabs/mcollective/server.cfg

Page 15: Troubleshooting the Puppet Enterprise Stack

MCollective Common Problems

* None of the Nodes Show Up In Live Management

/var/log/pe-httpd/error.log:

No MCollective servers responded. Either MCollective is not yet configured and operational or all MCollective servers are off-line. Check that you can reach your servers with `mco ping`. It may also help to increase the LM_DISCOVERY_TIMEOUT or LM_INVENTORY_RETRIES variables in your Apache configuration.

Page 16: Troubleshooting the Puppet Enterprise Stack

Live Management

Common Problems And What They Look Like

* None of the Nodes Show Up In Live Management

/var/log/pe-activemq/activemq.log:

| WARN | Transport Connection to: tcp://000.00.000.00:0000 failed: java.lang.SecurityException: User name [mcollective] or password is invalid.

Page 17: Troubleshooting the Puppet Enterprise Stack

MCollective Common Problems

* The number of nodes reporting from MCollective commands, or from Live Management, varies

/var/log/pe-activemq/activemq.log:

javax.net.ssl.SSLHandshakeException: Remote host closed connection during handshake

Solution: On the master, edit /opt/puppet/share/puppet/modules/pe_mcollective/server.cfg.erb and change the line registerinterval =

Page 18: Troubleshooting the Puppet Enterprise Stack

Live Management Common Problems And What They Look Like

* Nothing displays but a 500 error

Page 19: Troubleshooting the Puppet Enterprise Stack

Master/Agent

Logs:
* /var/log/messages
* /var/log/pe-httpd/error.log

Configuration: /etc/puppetlabs/puppet/puppet.conf

Page 20: Troubleshooting the Puppet Enterprise Stack

Master/Agent Common Problems And What They Look Like

* Nodes are failing runs

/var/log/messages:

err: /File[/var/opt/lib/pe-puppet/lib]: Failed to generate additional resources using 'eval_generate: Connection timed out - connect(2)
err: Could not retrieve plugin: execution expired

Solution: Enable splay so that agent runs are staggered: http://docs.puppetlabs.com/references/latest/configuration.html#splay

Page 21: Troubleshooting the Puppet Enterprise Stack

Master/Agent Common Problems And What They Look Like

* Nodes are failing runs

/var/log/messages:

Error: Could not request certificate: The certificate retrieved from the master does not match the agent's private key.

To fix this, remove the certificate from both the master and the agent and then start a puppet run, which will automatically regenerate a certificate.

On the master:

   puppet cert clean agentname

Restart pe-httpd.

On the agent:

   rm -f /etc/puppetlabs/puppet/ssl/certs/agentname

   puppet agent -t

Page 22: Troubleshooting the Puppet Enterprise Stack

Master/Agent Common Problems And What They Look Like

* Nodes can't reach the master

Error: Could not request certificate: getaddrinfo: Name or service not known

Troubleshooting
1. telnet master 8140
2. Check /etc/hosts or DNS
3. ping master
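The /etc/hosts check in the troubleshooting list can be scripted. A minimal sketch: the helper name hosts_has is hypothetical. It succeeds only if the given name appears as a hostname or alias on a non-comment line of a hosts-format file. It does not query DNS, so a failure only means the name is missing from that file.

```shell
# Hypothetical helper: succeed if NAME is listed as a hostname or alias in a
# hosts-format file (default: /etc/hosts). Comments are ignored.
# This does not consult DNS.
hosts_has() {
  name="$1"
  file="${2:-/etc/hosts}"
  awk -v n="$name" '
    { sub(/#.*/, "") }                                    # strip comments
    { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }  # skip the IP column
    END { exit found ? 0 : 1 }
  ' "$file"
}
```

Usage: hosts_has master || echo "master is not in /etc/hosts; check DNS"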

Page 23: Troubleshooting the Puppet Enterprise Stack

Red Herrings

/var/log/pe-httpd/error.log:

config.ru:9: warning: already initialized constant argv

/var/log/pe-httpd/puppetdashboard.error.log:

[warn] RSA server certificate CommonName (CN) `pe-internal-dashboard' does NOT match server name!?

/var/log/pe-console-auth/auth.log:

INFO 2013-08-20 01:07 UTC: User (anonymous) accessed read-write url /reports/upload
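When reading these logs it can help to hide the harmless messages listed on this slide so that real errors stand out. A minimal sketch: the helper name filter_red_herrings is hypothetical, and the patterns are taken from the three messages quoted on this slide.

```shell
# Hypothetical helper: print a log with the known-harmless messages removed.
# The patterns are substrings of the three red-herring messages on this slide.
filter_red_herrings() {
  grep -v \
    -e 'already initialized constant' \
    -e 'does NOT match server name' \
    -e 'accessed read-write url /reports/upload' \
    "$@"
}
```

Usage: filter_red_herrings /var/log/pe-httpd/error.log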

Page 24: Troubleshooting the Puppet Enterprise Stack

SSL Errors

Where your certs (mostly) live:
/etc/puppetlabs/puppet/ssl
/opt/puppet/share/puppet-dashboard/certs
/etc/puppetlabs/puppetdb/ssl

Page 25: Troubleshooting the Puppet Enterprise Stack

Regenerating The CA And The Master

1. Delete the contents of the /etc/puppetlabs/puppet/ssl directory on the master.
2. Run `puppet cert list` to regenerate the CA.
3. Stop pe-httpd.
4. Run `puppet master --no-daemonize --verbose` to regenerate the master cert and create a cert request.
5. Check that `puppet cert list -a` returns the master cert.
6. Restart pe-httpd.

Page 26: Troubleshooting the Puppet Enterprise Stack

Regenerating the PuppetDB Certs

1. Stop the PuppetDB service.
2. If PuppetDB is on a separate server, remove the agent certs from /etc/puppetlabs/puppet/ssl/. Remove the PuppetDB certs from /etc/puppetlabs/puppetdb/ssl/.
3. Run `puppet cert clean puppetdbhost.yourdomain` on the master (if not cleaned already and PuppetDB is on a separate host).
4. Regenerate the Puppet agent certs by performing a Puppet run on the PuppetDB host, signing them on the master if necessary.
5. Run /opt/puppet/sbin/puppetdb-ssl-setup -f on the PuppetDB host.
6. Restart the PuppetDB service on its host, and the pe-httpd service on your master.

Page 27: Troubleshooting the Puppet Enterprise Stack

Regenerating The Console’s Certificate

1. cd /opt/puppet/share/puppet-dashboard/certs, and remove any existing contents.
2. sudo /opt/puppet/bin/rake RAILS_ENV=production cert:create_key_pair
3. sudo /opt/puppet/bin/rake RAILS_ENV=production cert:request
4. sudo puppet cert sign pe-internal-dashboard
5. sudo /opt/puppet/bin/rake RAILS_ENV=production cert:retrieve
6. sudo chown -R puppet-dashboard:puppet-dashboard certs/
7. /etc/init.d/pe-httpd restart

Page 28: Troubleshooting the Puppet Enterprise Stack

Regenerating The Agent’s Certificate

On the master:
1. puppet cert clean agenthostname
2. Restart pe-httpd

On the agent:
1. rm -rf /etc/puppetlabs/puppet/ssl
2. puppet agent -t

On the master:
1. puppet cert sign agenthostname

Page 29: Troubleshooting the Puppet Enterprise Stack

Regenerating Your Master’s Certificate

1. Edit your puppet.conf to update any changes to the hostname or alt names.
2. Run `puppet cert clean mastername`.
3. Stop pe-httpd (/etc/init.d/pe-httpd stop).
4. Run `puppet master --no-daemonize --verbose`.

Page 30: Troubleshooting the Puppet Enterprise Stack

Certs that Puppet can Regenerate

pe-internal-broker
pe-internal-mcollective-servers
pe-internal-peadmin-mcollective-client
pe-internal-puppet-console-mcollective-client

Page 31: Troubleshooting the Puppet Enterprise Stack

Regenerating All The Certificates

http://showterm.io/f41a4b7bb5b0b006d8a80

Page 32: Troubleshooting the Puppet Enterprise Stack

Q&A

Page 33: Troubleshooting the Puppet Enterprise Stack

Resources

Ask.Puppetlabs.com

Irc.freenode.net #puppet

PE-Users Mailing List: https://groups.google.com/a/puppetlabs.com/group/pe-users/topics
