Troubleshooting the Puppet Enterprise Stack

Post on 10-May-2015

DESCRIPTION

A guide to where to look for errors when they occur in the various parts of Puppet Enterprise (the console, Live Management, the puppet master, ActiveMQ, MCollective, and the agent), what some of those errors mean, and which warnings and errors are red herrings or occur normally.

Celia Cottle, Support Engineer, Puppet Labs

Celia Cottle is a Support Engineer at Puppet Labs, where she troubleshoots and resolves issues for Puppet Enterprise customers. She comes from Portland State University, where she worked for the College of Engineering and Computer Science doing technical support while getting her degree in Communication. She’s been working in IT for over five years and enjoys problem solving, working with a wide range of OSes and software, and the variety of challenges that supporting Puppet Enterprise brings. She currently resides in Portland, Oregon.

TRANSCRIPT

Troubleshooting Puppet Enterprise

Celia Cottle Support Engineer | Puppet Labs celia@puppetlabs.com @celiaPDX

The Stack

Console: The console is Puppet Enterprise’s web GUI.

MCollective/Live Management: Live Management (LM) is an interface to PE’s orchestration engine (MCollective).

PuppetDB: PuppetDB collects data generated by Puppet.

Master/Agent: The master is the central Puppet server. The agent retrieves the client configuration from the puppet master and applies it to the local host.

The Console

Console Logs
/var/log/pe-httpd/puppetdashboard.error.log
/var/log/pe-httpd/puppetdashboard.access.log
/var/log/pe-httpd/puppetmaster.error.log

Configuration
/etc/puppetlabs/puppet/puppet.conf

Console Common Problems

No nodes are reporting

•  Stop the pe-puppet-dashboard-workers service.

•  Check /opt/puppet/share/puppet-dashboard/tmp/pids for files ending in .pid.

•  Restart the pe-puppet-dashboard-workers service.

•  Run ps aux | grep delayed_job and see whether entries like dashboard/delayed_job.1 and delayed_job.1_monitor appear. If they do, the dashboard has started up properly again.
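The .pid check in the second step can be scripted. The following is a minimal sketch and not part of the original talk: `list_pid_files` is a hypothetical helper name, and the default directory assumes the path from the slide with a leading slash. The init-script calls shown in the comments are also an assumption, modeled on the /etc/init.d/pe-httpd convention used elsewhere in this talk.

```shell
#!/bin/sh
# Hypothetical helper: print every file ending in .pid in a directory.
# The default is the dashboard pids directory named on the slide
# (leading slash assumed).
list_pid_files() {
    dir="${1:-/opt/puppet/share/puppet-dashboard/tmp/pids}"
    for f in "$dir"/*.pid; do
        # If nothing matches, the glob stays literal; skip that case.
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
    return 0
}

# Assumed sequence around the helper (run as root on the console host):
#   /etc/init.d/pe-puppet-dashboard-workers stop
#   list_pid_files
#   /etc/init.d/pe-puppet-dashboard-workers start
#   ps aux | grep delayed_job
```

Any file the helper prints while the workers are stopped is worth a closer look before restarting them.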

Console Common Problems

No Facts Are Listed for Nodes / Node Manager Won’t Display

/var/log/pe-httpd/puppetmaster.error.log
[Fri Aug 16 22:49:20 2013] [error] [client 172.16.0.2] Certificate Verification: Error (23): certificate revoked

Console Authentication Logs
/var/log/pe-httpd/access.log
/var/log/pe-httpd/error.log
/var/log/pe-console-auth/cas.log

Configuration Files
/etc/puppetlabs/console-auth/cas_client_config.yml
/etc/puppetlabs/rubycas-server/config.yml

Console Auth Common Problems

Can’t Log In

/var/log/pe-console-auth/cas.log:
Invalid credentials given for user 'console@puppetlabs.test'

Possible cause: bad or lost credentials. To create a new admin user:

$ cd /opt/puppet/share/console-auth
$ sudo /opt/puppet/bin/rake db:create_user USERNAME="adminuser@example.com" PASSWORD="<password>" ROLE="Admin"

Alternatively, if using third-party authentication, check /var/log/pe-httpd/access.log.

PuppetDB

Log Files:
/var/log/messages
/var/log/pe-puppetdb/puppetdb.log

Config Files:
/etc/puppetlabs/puppet/puppetdb.conf

PuppetDB Common Problems

SSL Errors

* /var/log/messages
Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Failed to submit 'replace facts' command for agent1.vm to PuppetDB at master0.vm:8081: Server hostname 'master0.vm' did not match server certificate; expected one of master1.vm

PuppetDB Common Problems

PuppetDB Won’t Start, Fails Silently

/var/log/pe-puppetdb/puppetdb.log
/var/log/pe-puppetdb/puppetdb-oom.hprof

java.lang.OutOfMemoryError: Java heap space

Fix: Edit the defaults in /etc/default/pe-puppetdb or /etc/sysconfig/pe-puppetdb, and change the 256m to 1024m:

JAVA_ARGS="-Xmx256m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/pe-puppetdb/puppetdb-oom.hprof -Xms256m"
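The edit can be previewed with sed before touching the file. This is a minimal sketch under one stated assumption: the slide says only "change the 256m to 1024m", and the sketch raises just the maximum heap (-Xmx) while leaving the initial heap (-Xms) alone. `raise_puppetdb_heap` is a hypothetical helper name, not a Puppet Enterprise tool.

```shell
#!/bin/sh
# Hypothetical helper: print the given defaults file with the maximum
# Java heap raised from 256m to 1024m. The file itself is not modified.
# Assumption: only -Xmx is changed; -Xms stays at its current value.
raise_puppetdb_heap() {
    sed 's/-Xmx256m/-Xmx1024m/' "$1"
}

# Example (path depends on the platform, as noted on the slide):
#   raise_puppetdb_heap /etc/sysconfig/pe-puppetdb
```

After reviewing the output, apply the same change to the real file and restart the PuppetDB service so the new heap size takes effect.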

Live Management

Live Management/MCollective

Logs:
/var/log/pe-activemq/activemq.log
/var/log/pe-mcollective/mcollective.log
/var/log/pe-httpd/error.log

Configuration:
/etc/puppetlabs/mcollective/server.cfg

MCollective Common Problems

* None of the Nodes Show Up In Live Management

/var/log/pe-httpd/error.log
No MCollective servers responded. Either MCollective is not yet configured and operational or all MCollective servers are off-line. Check that you can reach your servers with `mco ping`. It may also help to increase the LM_DISCOVERY_TIMEOUT or LM_INVENTORY_RETRIES variables in your Apache configuration.

Live Management

Common Problems And What They Look Like

* None of the Nodes Show Up In Live Management

/var/log/pe-activemq/activemq.log
| WARN | Transport Connection to: tcp://000.00.000.00:0000 failed: java.lang.SecurityException: User name [mcollective] or password is invalid.

MCollective Common Problems

* The number of nodes reporting from MCollective commands, or Live Management, varies

/var/log/pe-activemq/activemq.log
javax.net.ssl.SSLHandshakeException: Remote host closed connection during handshake

Solution: On the master, edit /opt/puppet/share/puppet/modules/pe_mcollective/server.cfg.erb and edit the line registerinterval =

Live Management Common Problems And What They Look Like

* Nothing displays but a 500 error

Master/Agent

Logs:
* /var/log/messages
* /var/log/pe-httpd/error.log

Configuration:
/etc/puppetlabs/puppet/puppet.conf

Master/Agent Common Problems And What They Look Like

* Nodes are failing runs

/var/log/messages
err: /File[/var/opt/lib/pe-puppet/lib]: Failed to generate additional resources using 'eval_generate: Connection timed out - connect(2)
err: Could not retrieve plugin: execution expired

Solution: Use the splay setting: http://docs.puppetlabs.com/references/latest/configuration.html#splay

Master/Agent Common Problems And What They Look Like

* Nodes are failing runs

/var/log/messages
Error: Could not request certificate: The certificate retrieved from the master does not match the agent's private key.

To fix this, remove the certificate from both the master and the agent and then start a puppet run, which will automatically regenerate a certificate.

On the master:

   puppet cert clean agentname

Restart pe-httpd.

On the agent:

   rm -f /etc/puppetlabs/puppet/ssl/certs/agentname

   puppet agent -t

Master/Agent Common Problems And What They Look Like

* Nodes can’t reach the master

Error: Could not request certificate: getaddrinfo: Name or service not known

Troubleshooting

1. telnet master 8140

2. Check /etc/hosts or DNS

3. ping master
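The /etc/hosts part of that checklist can be scripted. The sketch below is illustrative only: `host_in_hosts_file` is a hypothetical helper name, and "master" in the comments stands in for the real name of your puppet master.

```shell
#!/bin/sh
# Hypothetical helper: succeed if NAME is listed as a hostname or alias
# in a hosts file (default /etc/hosts). Comments are ignored, and the
# address column is never treated as a name.
host_in_hosts_file() {
    name="$1"
    file="${2:-/etc/hosts}"
    awk -v name="$name" '
        { sub(/#.*/, "") }    # strip comments, which re-splits the fields
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$file"
}

# Example sequence on an agent:
#   host_in_hosts_file master || echo "master is not in /etc/hosts; check DNS"
#   ping -c 1 master
#   telnet master 8140
```

If the name resolves and ping succeeds but the telnet connection to port 8140 fails, the problem is more likely a firewall or a stopped master service than name resolution.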

Red Herrings

/var/log/pe-httpd/error.log
config.ru:9: warning: already initialized constant argv

/var/log/pe-httpd/puppetdashboard.error.log
[warn] RSA server certificate CommonName (CN) `pe-internal-dashboard' does NOT match server name!?

/var/log/pe-console-auth/auth.log
INFO 2013-08-20 01:07 UTC: User (anonymous) accessed read-write url /reports/upload
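When reading these logs, it can help to hide the known-benign lines so real errors stand out. This is a minimal sketch and not a Puppet Enterprise tool: `filter_red_herrings` is a hypothetical helper, and its patterns are fragments taken from the three messages above, so they may need adjusting for other PE versions.

```shell
#!/bin/sh
# Hypothetical helper: print log lines, dropping the three red herrings
# listed above. Reads the files given as arguments, or stdin if none.
filter_red_herrings() {
    grep -v \
        -e 'warning: already initialized constant' \
        -e 'does NOT match server name' \
        -e 'accessed read-write url /reports/upload' \
        "$@" || [ "$?" -eq 1 ]   # grep exits 1 when every line was filtered
}

# Examples:
#   filter_red_herrings /var/log/pe-httpd/error.log
#   tail -n 200 /var/log/pe-httpd/puppetdashboard.error.log | filter_red_herrings
```

The patterns are matched as plain substrings of each line, so any message that merely contains one of these fragments is hidden as well.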

SSL Errors

Where your certs (mostly) live:
/etc/puppetlabs/puppet/ssl
/opt/puppet/share/puppet-dashboard/certs
/etc/puppetlabs/puppetdb/ssl

Regenerating The CA And The Master

1. Delete the contents of the /etc/puppetlabs/puppet/ssl directory on the master.

2. Run `puppet cert list` to regenerate the CA.

3. Stop pe-httpd.

4. Run `puppet master --no-daemonize --verbose` to regenerate the master cert and create a cert request.

5. Check that `puppet cert list -a` returns the master cert.

6. Restart pe-httpd.

Regenerating the PuppetDB Certs

1. Stop the PuppetDB service.

2. Remove the agent certs from /etc/puppetlabs/puppet/ssl/ if PuppetDB is on a separate server, and the PuppetDB certs from /etc/puppetlabs/puppetdb/ssl/.

3. Run `puppet cert clean puppetdbhost.yourdomain` on the master (if not cleaned already and on a separate host).

4. Regenerate the Puppet agent certs by performing a Puppet run on the PuppetDB host, signing them on the master if necessary.

5. Run /opt/puppet/sbin/puppetdb-ssl-setup -f on the PuppetDB host.

6. Restart the PuppetDB service on its host, and the pe-httpd service on your master.

Regenerating The Console’s Certificate

1. cd /opt/puppet/share/puppet-dashboard/certs, and remove any existing contents.

2. sudo /opt/puppet/bin/rake RAILS_ENV=production cert:create_key_pair

3. sudo /opt/puppet/bin/rake RAILS_ENV=production cert:request

4. sudo puppet cert sign pe-internal-dashboard

5. sudo /opt/puppet/bin/rake RAILS_ENV=production cert:retrieve

6. sudo chown -R puppet-dashboard:puppet-dashboard certs/

7. /etc/init.d/pe-httpd restart

Regenerating The Agent’s Certificate

On the master:

1. puppet cert clean agenthostname

2. Restart pe-httpd

On the agent:

1. rm -rf /etc/puppetlabs/puppet/ssl

2. puppet agent -t

On the master:

1. puppet cert sign agenthostname

Regenerating Your Master’s Certificate

1. Edit your puppet.conf to reflect any changes to the hostname or alt names.

2. Run `puppet cert clean mastername`.

3. Stop pe-httpd (/etc/init.d/pe-httpd stop).

4. Run `puppet master --no-daemonize --verbose`.

Certs that Puppet can Regenerate

pe-internal-broker
pe-internal-mcollective-servers
pe-internal-peadmin-mcollective-client
pe-internal-puppet-console-mcollective-client

Regenerating All The Certificates

http://showterm.io/f41a4b7bb5b0b006d8a80

Q&A

Resources

Ask.Puppetlabs.com

Irc.freenode.net #puppet

PE-Users Mailing List: https://groups.google.com/a/puppetlabs.com/group/pe-users/topics
