Troubleshooting the Puppet Enterprise Stack

Troubleshooting Puppet Enterprise
Celia Cottle, Support Engineer | Puppet Labs | @celiaPDX

Uploaded by puppet-labs on 10-May-2015

DESCRIPTION

A guide to where to look for errors when they happen in the various parts of Puppet Enterprise (the console, Live Management, the puppet master, ActiveMQ, MCollective, and the agent), what some of those errors mean, and which warnings and errors are red herrings that occur during normal operation.

Celia Cottle, Support Engineer, Puppet Labs: Celia Cottle is a Support Engineer at Puppet Labs, where she troubleshoots and resolves issues for Puppet Enterprise customers. She comes from Portland State University, where she worked for the College of Engineering and Computer Science doing technical support while getting her degree in Communication. She’s been working in IT for over five years and enjoys problem solving, working with a wide range of operating systems and software, and the variety of challenges that supporting Puppet Enterprise brings. She currently resides in Portland, Oregon.

TRANSCRIPT

The Stack

Console: The console is Puppet Enterprise’s web GUI.

MCollective/Live Management: Live Management (LM) is an interface to PE’s orchestration engine, MCollective.

PuppetDB: PuppetDB collects data generated by Puppet.

Master/Agent: The master is the central Puppet server. The agent retrieves the client configuration from the puppet master and applies it to the local host.

The Console

Console Logs:
•  /var/log/pe-httpd/puppetdashboard.error.log
•  /var/log/pe-httpd/puppetdashboard.access.log
•  /var/log/pe-httpd/puppetmaster.error.log

Configuration: /etc/puppetlabs/puppet/puppet.conf

Console Common Problems

No nodes are reporting

•  Stop the pe-puppet-dashboard-workers service.

•  Check /opt/puppet/share/puppet-dashboard/tmp/pids for files ending in .pid. With the workers stopped, any that remain are stale; remove them.

•  Restart the pe-puppet-dashboard-workers service.

•  Run ps aux | grep delayed_job and check for entries like dashboard/delayed_job.1 and delayed_job.1_monitor. If they appear, the dashboard has started up properly again.
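The pid-file check in the steps above can be scripted. The following is a minimal Python sketch, assuming the pid directory named on this slide and that each .pid file holds a single process ID; the function names are hypothetical and not part of Puppet Enterprise. Run it after stopping pe-puppet-dashboard-workers: any file it reports is unreadable or points at a process that no longer exists.

```python
import os
import sys
from pathlib import Path
from typing import List

# Directory from the slide; pass another path on the command line if needed.
PID_DIR = "/opt/puppet/share/puppet-dashboard/tmp/pids"


def process_is_running(pid: int) -> bool:
    """True if a process with this PID exists; signal 0 checks without killing."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True


def find_stale_pid_files(pid_dir: str = PID_DIR) -> List[str]:
    """Names of .pid files that are unreadable or point at a dead process."""
    stale = []
    for pid_file in sorted(Path(pid_dir).glob("*.pid")):
        try:
            pid = int(pid_file.read_text().strip())
        except ValueError:
            stale.append(pid_file.name)  # empty or garbled contents
            continue
        # PIDs <= 0 would address process groups, so treat them as invalid.
        if pid <= 0 or not process_is_running(pid):
            stale.append(pid_file.name)
    return stale


if __name__ == "__main__":
    directory = sys.argv[1] if len(sys.argv) > 1 else PID_DIR
    for name in find_stale_pid_files(directory):
        print(f"stale: {name}")
```

The script only reports; deleting the files is left to you so nothing is removed while the workers might still be running.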

Console Common Problems

No Facts Are Listed for Nodes / Node Manager Won’t Display

/var/log/pe-httpd/puppetmaster.error.log:
[Fri Aug 16 22:49:20 2013] [error] [client 172.16.0.2] Certificate Verification: Error (23): certificate revoked
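To pick entries like the one above out of a busy puppetmaster.error.log, it helps to extract the client address and the OpenSSL verify code. This Python sketch assumes the Apache log layout shown on this slide; the regular expression and `parse_cert_error` are hypothetical helpers. The small table holds a few standard OpenSSL X509 verify error numbers, including 23 (certificate revoked); it is not exhaustive.

```python
import re
from typing import NamedTuple, Optional

# A few standard OpenSSL X509 verify error codes (not an exhaustive list).
VERIFY_CODES = {
    10: "certificate has expired",
    18: "self-signed certificate",
    19: "self-signed certificate in certificate chain",
    20: "unable to get local issuer certificate",
    23: "certificate revoked",
}

# Assumes the Apache error-log layout shown on the slide, after whitespace
# has been collapsed to single spaces.
CERT_ERROR = re.compile(
    r"\[client (?P<client>[^\]]+)\] Certificate Verification: "
    r"Error \((?P<code>\d+)\): (?P<message>.+)$"
)


class CertError(NamedTuple):
    client: str
    code: int
    meaning: str


def parse_cert_error(line: str) -> Optional[CertError]:
    """Return the client address and verify code from a log line, or None."""
    match = CERT_ERROR.search(" ".join(line.split()))  # collapse runs of spaces
    if match is None:
        return None
    code = int(match["code"])
    # Fall back to the log's own wording for codes not in the table.
    meaning = VERIFY_CODES.get(code, match["message"].strip())
    return CertError(match["client"], code, meaning)
```

A code 23 result tells you which client presented a revoked certificate; the certificate regeneration procedures later in this deck are where to go from there.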

Console Authentication

Logs:
•  /var/log/pe-httpd/access.log
•  /var/log/pe-httpd/error.log
•  /var/log/pe-console-auth/cas.log

Configuration Files:
•  /etc/puppetlabs/console-auth/cas_client_config.yml
•  /etc/puppetlabs/rubycas-server/config.yml

Console Auth Common Problems

Can’t Log In

/var/log/pe-console-auth/cas.log:
Invalid credentials given for user '<user email>'

Possible cause: bad or lost credentials. To create a new admin user:

$ cd /opt/puppet/share/console-auth
$ sudo /opt/puppet/bin/rake db:create_user USERNAME="<user email>" PASSWORD="<password>" ROLE="Admin"

Alternatively, if you use third-party authentication, check /var/log/pe-httpd/access.log.

PuppetDB

PuppetDB

Log Files:
•  /var/log/messages
•  /var/log/pe-puppetdb/puppetdb.log

Config Files: /etc/puppetlabs/puppet/puppetdb.conf

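PuppetDB Common Problems

SSL Errors

/var/log/messages:
Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Failed to submit 'replace facts' command for agent1.vm to PuppetDB at master0.vm:8081: Server hostname 'master0.vm' did not match server certificate; expected one of master1.vm

The master connected to PuppetDB as master0.vm, but the certificate PuppetDB presented only lists master1.vm. Check that the server name in /etc/puppetlabs/puppet/puppetdb.conf matches a name on PuppetDB’s certificate.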
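The two useful facts in this error are the hostname the master tried to reach and the names the PuppetDB certificate actually carries. Here is a minimal Python sketch that extracts both, assuming the message wording shown above; treating "one of" as optional and splitting multiple names on commas are assumptions, and the function name is hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Assumes the wording in the error above; "one of" is treated as optional and
# multiple certificate names are assumed to be comma-separated.
MISMATCH = re.compile(
    r"Server hostname '(?P<requested>[^']+)' did not match server certificate; "
    r"expected (?:one of )?(?P<names>.+?)\s*$"
)


def parse_hostname_mismatch(message: str) -> Optional[Tuple[str, List[str]]]:
    """Return (hostname the master used, names on PuppetDB's certificate)."""
    match = MISMATCH.search(" ".join(message.split()))  # normalize spacing
    if match is None:
        return None
    names = [n.strip() for n in match["names"].split(",") if n.strip()]
    return match["requested"], names
```

If the name configured in puppetdb.conf is not in the returned list, either point puppetdb.conf at one of those names or regenerate the PuppetDB certificates with the correct name.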

PuppetDB Common Problems

PuppetDB Won’t Start, Fails Silently

Check /var/log/pe-puppetdb/puppetdb.log, and look for /var/log/pe-puppetdb/puppetdb-oom.hprof, the heap dump the JVM writes when it runs out of memory:

java.lang.OutOfMemoryError: Java heap space

Fix: Edit the defaults in /etc/default/pe-puppetdb (Debian-family systems) or /etc/sysconfig/pe-puppetdb (Red Hat-family systems), change both 256m values (-Xmx and -Xms) to 1024m, then restart pe-puppetdb:

JAVA_ARGS="-Xmx256m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/pe-puppetdb/puppetdb-oom.hprof -Xms256m"
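Both -Xmx (maximum heap) and -Xms (initial heap) are 256m in the line above, so both need changing. This Python sketch rewrites them in place, assuming the defaults file contains a JAVA_ARGS line like the one shown; `set_heap_size` is a hypothetical helper, and you still need to restart pe-puppetdb afterwards.

```python
import re
import sys

# Matches -Xmx<size> and -Xms<size>, e.g. -Xmx256m; leaves -XX: options alone.
HEAP_FLAG = re.compile(r"-Xm([xs])\d+[kKmMgG]?")


def set_heap_size(defaults_text: str, size: str = "1024m") -> str:
    """Rewrite -Xmx/-Xms on JAVA_ARGS lines; other lines pass through unchanged."""
    if not re.fullmatch(r"\d+[kKmMgG]?", size):
        raise ValueError(f"not a JVM memory size: {size!r}")
    out = []
    for line in defaults_text.splitlines(keepends=True):
        if line.lstrip().startswith("JAVA_ARGS="):
            line = HEAP_FLAG.sub(lambda m: f"-Xm{m.group(1)}{size}", line)
        out.append(line)
    return "".join(out)


if __name__ == "__main__":
    # Usage: python set_heap.py /etc/sysconfig/pe-puppetdb  (back the file up first)
    path = sys.argv[1]
    with open(path) as fh:
        original = fh.read()
    with open(path, "w") as fh:
        fh.write(set_heap_size(original))
```

Setting -Xms equal to -Xmx, as the default line does, makes the JVM reserve the full heap at startup, so check that the host has 1 GB of memory to spare.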

Live Management

Live Management / MCollective

Logs:
•  /var/log/pe-activemq/activemq.log
•  /var/log/pe-mcollective/mcollective.log
•  /var/log/pe-httpd/error.log

Configuration: /etc/puppetlabs/mcollective/server.cfg

MCollective Common Problems

* None of the Nodes Show Up In Live Management

/var/log/pe-httpd/error.log:
No MCollective servers responded. Either MCollective is not yet configured and operational or all MCollective servers are off-line. Check that you can reach your servers with `mco ping`. It may also help to increase the LM_DISCOVERY_TIMEOUT or LM_INVENTORY_RETRIES variables in your Apache configuration.

Live Management Common Problems And What They Look Like

* None of the Nodes Show Up In Live Management

/var/log/pe-activemq/activemq.log:
| WARN | Transport Connection to: tcp://000.00.000.00:0000 failed: java.lang.SecurityException: User name [mcollective] or password is invalid.
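This warning means ActiveMQ rejected the user name or password an MCollective server presented, so a natural next step is to confirm what server.cfg actually sends. This is a Python sketch, assuming server.cfg uses MCollective's `key = value` format and the ActiveMQ connector settings `plugin.activemq.pool.1.user` and `plugin.activemq.pool.1.password` (your pool index may differ); the helper names are hypothetical.

```python
from typing import Dict, Optional, Tuple


def parse_mco_config(text: str) -> Dict[str, str]:
    """Parse MCollective-style 'key = value' lines, ignoring comments and blanks."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # values may themselves contain '='
        settings[key.strip()] = value.strip()
    return settings


def activemq_credentials(settings: Dict[str, str], pool: int = 1) -> Tuple[Optional[str], Optional[str]]:
    """(user, password) for one ActiveMQ pool entry; None where a key is missing."""
    prefix = f"plugin.activemq.pool.{pool}."
    return settings.get(prefix + "user"), settings.get(prefix + "password")


if __name__ == "__main__":
    with open("/etc/puppetlabs/mcollective/server.cfg") as fh:
        user, password = activemq_credentials(parse_mco_config(fh.read()))
    # Print the user only; compare passwords without echoing them.
    print("user:", user, "| password set:", password is not None)
```

Compare the result with the credentials configured on the broker side; a mismatch on either value produces exactly this SecurityException.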

MCollective Common Problems

* The number of nodes reporting from MCollective commands, or Live Management, varies

/var/log/pe-activemq/activemq.log:
javax.net.ssl.SSLHandshakeException: Remote host closed connection during handshake

Solution: On the master, edit /opt/puppet/share/puppet/modules/pe_mcollective/server.cfg.erb and change the line registerinterval =

Live Management Common Problems And What They Look Like

* Nothing displays but a 500 error

Master/Agent

Logs:
•  /var/log/messages
•  /var/log/pe-httpd/error.log

Configuration: /etc/puppetlabs/puppet/puppet.conf

Master/Agent Common Problems And What They Look Like

* Nodes are failing runs

/var/log/messages:
err: /File[/var/opt/lib/pe-puppet/lib]: Failed to generate additional resources using 'eval_generate': Connection timed out - connect(2)
err: Could not retrieve plugin: execution expired

These timeouts can happen when too many agents contact the master at the same time.

Solution: Splay: http://docs.puppetlabs.com/references/latest/configuration.html#splay
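Splay helps because it spreads agent check-ins over time instead of letting every agent hit the master at once; it is enabled with splay = true in the [agent] section of puppet.conf, and the maximum delay is set by splaylimit. The Python sketch below only illustrates the load-spreading effect and is not Puppet's implementation: it assumes each agent waits a uniformly random whole number of seconds in [0, splaylimit) and counts how many agents start in the busiest minute. The agent counts and the function name are hypothetical.

```python
import random
from collections import Counter
from typing import Optional


def peak_checkins_per_minute(agents: int, splaylimit_s: int, seed: Optional[int] = None) -> int:
    """Simulate one round of agent starts; return the busiest minute's count.

    Illustrative model only: without splay every agent starts at t=0; with splay
    each agent waits a uniformly random whole number of seconds in [0, splaylimit_s).
    """
    if agents <= 0:
        return 0
    if splaylimit_s <= 0:
        return agents  # no splay: everyone arrives in the same minute
    rng = random.Random(seed)
    minutes = Counter(rng.randrange(splaylimit_s) // 60 for _ in range(agents))
    return max(minutes.values())


if __name__ == "__main__":
    for limit in (0, 600, 1800):  # no splay, 10 minutes, 30 minutes
        peak = peak_checkins_per_minute(1000, limit, seed=1)
        print(f"splaylimit={limit}s -> peak {peak} agents/minute")
```

With 1,000 agents and a 30-minute window, the busiest minute sees a few dozen agents rather than all 1,000, which is the kind of reduction that keeps plugin sync from timing out.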

Master/Agent Common Problems And What They Look Like

* Nodes are failing runs

/var/log/messages:
Error: Could not request certificate: The certificate retrieved from the master does not match the agent's private key.

To fix this, remove the certificate from both the master and the agent and then start a puppet run, which will automatically regenerate a certificate.

On the master:

   puppet cert clean agentname

Restart pe-httpd.

On the agent:

   rm -f /etc/puppetlabs/puppet/ssl/certs/agentname.pem

   puppet agent -t

Master/Agent Common Problems And What They Look Like

* Nodes can’t reach the master

Error: Could not request certificate: getaddrinfo: Name or service not known

Troubleshooting:
1. telnet master 8140
2. Check /etc/hosts or DNS
3. ping master
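The first two checks can be scripted together. This Python sketch resolves the master's name (the step that fails with "getaddrinfo: Name or service not known") and then tries a TCP connection to port 8140, roughly what `telnet master 8140` tests. It does not replace ping, which uses ICMP; the function name and return format are hypothetical.

```python
import socket
import sys
from typing import Optional, Tuple


def check_master(host: str, port: int = 8140, timeout: float = 5.0) -> Tuple[bool, Optional[str]]:
    """Return (reachable, problem) for the master at host:port."""
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # Same failure as the agent's getaddrinfo error: check /etc/hosts or DNS.
        return False, f"name resolution failed: {exc}"
    last_error = None
    for family, socktype, proto, _canonname, addr in infos:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(addr)
                return True, None
        except OSError as exc:
            last_error = exc  # try the next resolved address, if any
    return False, f"cannot connect to port {port}: {last_error}"


if __name__ == "__main__":
    # "puppet" is Puppet's default server name; pass your master's hostname instead.
    ok, problem = check_master(sys.argv[1] if len(sys.argv) > 1 else "puppet")
    print("OK" if ok else problem)
```

A name-resolution failure points at /etc/hosts or DNS; a connection failure with a resolved name points at the master service or a firewall between the two hosts.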

Red Herrings

These messages show up during normal operation and can be ignored:

/var/log/pe-httpd/error.log:
config.ru:9: warning: already initialized constant ARGV

/var/log/pe-httpd/puppetdashboard.error.log:
[warn] RSA server certificate CommonName (CN) `pe-internal-dashboard' does NOT match server name!?

/var/log/pe-console-auth/auth.log:
INFO 2013-08-20 01:07 UTC: User (anonymous) accessed read-write url /reports/upload
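When reading these logs, it can save time to drop the known-benign messages listed above and look only at what remains. A minimal Python sketch, assuming pattern matching on whitespace-normalized lines is good enough; the pattern list simply mirrors this slide, and the names are hypothetical.

```python
import re
from typing import Iterable, List

# Messages from this slide that appear during normal operation.
RED_HERRINGS = [
    re.compile(r"warning: already initialized constant ARGV", re.IGNORECASE),
    re.compile(r"RSA server certificate CommonName \(CN\) `pe-internal-dashboard' does NOT match server name"),
    re.compile(r"User \(anonymous\) accessed read-write url /reports/upload"),
]


def interesting_lines(lines: Iterable[str]) -> List[str]:
    """Drop known red herrings; return everything else for review."""
    kept = []
    for line in lines:
        normalized = " ".join(line.split())  # copies of logs often have doubled spaces
        if not any(pattern.search(normalized) for pattern in RED_HERRINGS):
            kept.append(line)
    return kept
```

Extend RED_HERRINGS as you confirm other messages are harmless in your environment, rather than filtering broadly and hiding real errors.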

SSL Errors

Where your certs (mostly) live:
•  /etc/puppetlabs/puppet/ssl
•  /opt/puppet/share/puppet-dashboard/certs
•  /etc/puppetlabs/puppetdb/ssl

Regenerating The CA And The Master

1. Delete the contents of the /etc/puppetlabs/puppet/ssl directory on the master.
2. Run `puppet cert list` to regenerate the CA.
3. Stop pe-httpd.
4. Run `puppet master --no-daemonize --verbose` to regenerate the master cert and create a cert request. Once the certificate has been generated, stop it with Ctrl+C.
5. Check that `puppet cert list -a` returns the master cert.
6. Restart pe-httpd.

Regenerating the PuppetDB Certs

1. Stop the PuppetDB service.
2. If PuppetDB is on a separate server, remove its agent certs from /etc/puppetlabs/puppet/ssl/. Remove the PuppetDB certs from /etc/puppetlabs/puppetdb/ssl/.
3. If PuppetDB is on a separate host and its cert has not been cleaned already, run `puppet cert clean puppetdbhost.yourdomain` on the master.
4. Regenerate the Puppet agent certs by performing a Puppet run on the PuppetDB host, signing them on the master if necessary.
5. Run /opt/puppet/sbin/puppetdb-ssl-setup -f on the PuppetDB host.
6. Restart the PuppetDB service on its host, and the pe-httpd service on your master.

Regenerating The Console’s Certificate

1. cd /opt/puppet/share/puppet-dashboard/certs, and remove any existing contents.
2. cd /opt/puppet/share/puppet-dashboard
3. sudo /opt/puppet/bin/rake RAILS_ENV=production cert:create_key_pair
4. sudo /opt/puppet/bin/rake RAILS_ENV=production cert:request
5. sudo puppet cert sign pe-internal-dashboard
6. sudo /opt/puppet/bin/rake RAILS_ENV=production cert:retrieve
7. sudo chown -R puppet-dashboard:puppet-dashboard certs/
8. sudo /etc/init.d/pe-httpd restart

Regenerating The Agent’s Certificate

On the master:
1. puppet cert clean agenthostname
2. Restart pe-httpd

On the agent:
1. rm -rf /etc/puppetlabs/puppet/ssl
2. puppet agent -t

On the master:
1. puppet cert sign agenthostname

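Regenerating Your Master’s Certificate

1. Edit your puppet.conf to reflect any changes to the hostname or DNS alt names (dns_alt_names).
2. `puppet cert clean mastername`
3. Stop pe-httpd (/etc/init.d/pe-httpd stop).
4. Run `puppet master --no-daemonize --verbose`.
5. Once the new certificate has been generated, stop the foreground master with Ctrl+C and start pe-httpd again.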

Certs that Puppet can Regenerate

•  pe-internal-broker
•  pe-internal-mcollective-servers
•  pe-internal-peadmin-mcollective-client
•  pe-internal-puppet-console-mcollective-client

Regenerating All The Certificates

http://showterm.io/f41a4b7bb5b0b006d8a80

Q&A

Resources

ask.puppetlabs.com

irc.freenode.net #puppet

PE-Users Mailing List: https://groups.google.com/a/puppetlabs.com/group/pe-users/topics
