Making your logs work for you: Drupal escalation and disaster recovery

Upload: pantheon

Post on 06-Aug-2015

Category:

Technology


TRANSCRIPT

Page 1: Making your logs work for you: Drupal escalation and disaster recovery
Page 2: Making your logs work for you: Drupal escalation and disaster recovery

When sh*t hits the fan, what do you do?

How to make your logs work for you in times of website sadness

DrupalCon LA 2015

Page 3: Making your logs work for you: Drupal escalation and disaster recovery

Pantheon.io

Most common causes of downtime

30% Weather/Environment
33% IT/Equipment
34% Cyber Attack

Page 4: Making your logs work for you: Drupal escalation and disaster recovery

The top cause of unplanned outages?

48% Human Error

Page 5: Making your logs work for you: Drupal escalation and disaster recovery

52% of those surveyed believe ALL or most unplanned outages can be avoided.

Page 6: Making your logs work for you: Drupal escalation and disaster recovery

What does downtime cost you and your business?

Page 7: Making your logs work for you: Drupal escalation and disaster recovery

How does downtime affect your bottom line?

37%

22%

15%

10%

9%

7%

Cost associated with reputation and brand damage

Revenues lost because of system availability problems

Loss of user productivity and increased frustration

Cost associated with compliance or regulatory failure

Cost of forensics to determine the root causes of disruptions

Cost of technical support to restore systems to an operational state

Page 8: Making your logs work for you: Drupal escalation and disaster recovery

Meet the Speaker

Timani Tunduwani

Customer Support Manager

PANTHEON RUNS 100,000 WEBSITES

WE DO THIS ALL THE TIME

Page 9: Making your logs work for you: Drupal escalation and disaster recovery

Agenda

1. Overview
2. Logging in Drupal & PHP
3. Incident Planning & Management
4. Live Demo
5. Questions

Page 10: Making your logs work for you: Drupal escalation and disaster recovery

Website is down. Why?

How do we get it back up?

Is the infrastructure down?

Is Drupal Sad and Broken again??

What’s going on?

Do I fix it or Pantheon?

WTF?! FIX IT?

What happens when your website is down?

EVERY SECOND YOU DON’T KNOW IS ANOTHER SECOND YOUR WEBSITE IS DOWN

Page 11: Making your logs work for you: Drupal escalation and disaster recovery

What would you do if your website went down… right now?

Website Owner

Who do I contact?

YOU NEED A PLAN! SERIOUSLY!

Website Developer

Project Manager

Drupal Support & Maintenance Team

Page 12: Making your logs work for you: Drupal escalation and disaster recovery

Application log management

<?php watchdog($type, $message, $variables = array(), $severity = WATCHDOG_NOTICE, $link = NULL); ?>

Page 13: Making your logs work for you: Drupal escalation and disaster recovery

A 5 step plan for success

1. Standardize
2. Centralize
3. Aggregate
4. Analyze
5. Alert
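
Step 1 (Standardize) can be sketched as a single function that every log call passes through, so each record carries the same fields in the same shape. The field names, the `standard_log_line()` helper and the one-JSON-object-per-line format are assumptions made for illustration; the talk does not prescribe them.

```php
<?php
// Hypothetical helper: builds one JSON log line with a fixed set of fields.
// Field names are illustrative, not a Drupal or Monolog format.
function standard_log_line($channel, $level, $message, array $context, $timestamp) {
  $record = array(
    // UTC timestamp in ISO 8601 form, so lines from different servers sort together.
    'timestamp' => gmdate('Y-m-d\TH:i:s\Z', $timestamp),
    'channel'   => $channel,
    'level'     => strtoupper($level),
    'message'   => $message,
    'context'   => $context,
  );
  return json_encode($record);
}

echo standard_log_line('php', 'notice', 'Undefined index: authorize', array('line' => 77), time()), "\n";
```

Once every line has this shape, the later steps (centralize, aggregate, analyze, alert) can filter on `level` or `channel` without parsing free text.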

Page 14: Making your logs work for you: Drupal escalation and disaster recovery

Current limitations of watchdog

1. Semi-arbitrary log format
2. Drupal 8 using PSR-3
3. Cannot have saved searches beyond sticky search
4. No reporting dashboard for post mortem
5. No stack traces
6. Not portable. Have you tried to export the watchdog table?

Page 15: Making your logs work for you: Drupal escalation and disaster recovery

Drupal watchdog table

MariaDB [pantheon]> select wid, type, message, variables from watchdog limit 100 \G

*************************** 1. row ***************************
      wid: 1830682
     type: php
  message: %type: !message in %function (line %line of %file).
variables: a:6:{s:5:"%type";s:6:"Notice";s:8:"!message";s:26:"Undefined index: authorize";s:9:"%function";s:40:"FeedsEntityProcessor->entitySaveAccess()";s:5:"%file";s:108:"/srv/bindings/aa7491e7ef954a8fb4f9dc41abccab80/code/sites/all/modules/feeds/plugins/FeedsEntityProcessor.inc";s:5:"%line";i:77;s:14:"severity_level";i:5;}
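
The row above is hard to read because the message template and its values are stored separately: `variables` is a PHP-serialized array keyed by placeholder. Below is a minimal sketch of putting the two back together. It assumes a shortened, hypothetical row of the same shape; `render_watchdog_row()` is an illustrative helper, and plain `strtr()` stands in for the escaping Drupal applies when it renders log messages.

```php
<?php
// Hypothetical row shaped like the query result; the file path is shortened.
$row = array(
  'type'      => 'php',
  'message'   => '%type: !message in %function (line %line of %file).',
  'variables' => serialize(array(
    '%type'          => 'Notice',
    '!message'       => 'Undefined index: authorize',
    '%function'      => 'FeedsEntityProcessor->entitySaveAccess()',
    '%file'          => 'FeedsEntityProcessor.inc',
    '%line'          => 77,
    'severity_level' => 5,
  )),
);

// Illustrative helper: fills the placeholders in the message template.
function render_watchdog_row(array $row) {
  $variables = unserialize($row['variables']);
  if (!is_array($variables)) {
    // Nothing to substitute; return the template unchanged.
    return $row['message'];
  }
  $replacements = array();
  foreach ($variables as $key => $value) {
    // Only keys starting with %, ! or @ are placeholders; skip extras such as severity_level.
    if (is_string($key) && $key !== '' && strpos('%!@', $key[0]) !== false) {
      $replacements[$key] = (string) $value;
    }
  }
  return strtr($row['message'], $replacements);
}

echo render_watchdog_row($row), "\n";
```

Only run `unserialize()` on data from a database you trust; the serialized format is one reason the table is awkward to export to other tools.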

Page 16: Making your logs work for you: Drupal escalation and disaster recovery

Overview

1. PHP Framework Interop Group (PHP-FIG)
2. Monolog
   2.1. Chain of responsibility logging pattern
   2.2. Core concepts

Page 17: Making your logs work for you: Drupal escalation and disaster recovery

Proposing a Standards Recommendation (PSR)

❏ PSR-0: Autoloading Standard
❏ PSR-1: Basic Coding Standard
❏ PSR-2: Coding Style Guide
❏ PSR-3: Logger Interface
❏ PSR-4: Autoloading Standard

Page 18: Making your logs work for you: Drupal escalation and disaster recovery

PSR-3 : A common interface for logging libraries

The goal is to allow libraries to receive a Psr\Log\LoggerInterface object and write logs to it in a simple and universal way.
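
PSR-3 messages may carry `{placeholder}` tokens that are filled from a context array passed alongside the message. The sketch below shows only that substitution step in plain PHP, so it runs without Composer or the `psr/log` package. `interpolate()` is an illustrative helper, not a method of `Psr\Log\LoggerInterface`, and the sample message and values are made up.

```php
<?php
// Illustrative helper: replaces {key} tokens in $message with values from $context.
function interpolate($message, array $context = array()) {
  $replace = array();
  foreach ($context as $key => $value) {
    // Only values that can safely be cast to string are substituted.
    if (is_scalar($value) || (is_object($value) && method_exists($value, '__toString'))) {
      $replace['{' . $key . '}'] = (string) $value;
    }
  }
  return strtr($message, $replace);
}

echo interpolate(
  'User {username} logged in from {ip}',
  array('username' => 'editor', 'ip' => '192.0.2.10')
), "\n";
```

Keeping the template and the context separate lets a log backend group records by template while still indexing the individual values.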

Page 19: Making your logs work for you: Drupal escalation and disaster recovery

Logging Levels - RFC 5424

Level      Code  Description
DEBUG      100   Detailed debug information.
INFO       200   Interesting events. Examples: User logs in, SQL logs.
NOTICE     250   Normal but significant events.
WARNING    300   Exceptional occurrences that are not errors.
ERROR      400   Runtime errors that do not require immediate action.
CRITICAL   500   Critical conditions.
ALERT      550   Action must be taken immediately.
EMERGENCY  600   Emergency: system is unusable.
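
The codes in this table are ordered so that one numeric comparison decides whether a record is serious enough to act on. A small sketch follows, assuming the codes listed above and the RFC 5424 severity numbers 0 (emergency) to 7 (debug), which is also how Drupal 7 numbers its watchdog severities. The helper names `severity_to_code()` and `should_alert()` are hypothetical.

```php
<?php
// Level codes exactly as listed in the table above.
$levelCodes = array(
  'DEBUG' => 100, 'INFO' => 200, 'NOTICE' => 250, 'WARNING' => 300,
  'ERROR' => 400, 'CRITICAL' => 500, 'ALERT' => 550, 'EMERGENCY' => 600,
);

// RFC 5424 severities: lower number means more severe.
$rfc5424 = array(
  0 => 'EMERGENCY', 1 => 'ALERT', 2 => 'CRITICAL', 3 => 'ERROR',
  4 => 'WARNING', 5 => 'NOTICE', 6 => 'INFO', 7 => 'DEBUG',
);

// Hypothetical helper: converts an RFC 5424 severity (0-7) to a level code.
function severity_to_code($severity, array $rfc5424, array $levelCodes) {
  return $levelCodes[$rfc5424[$severity]];
}

// Hypothetical helper: true when the record is at or above the alert threshold.
function should_alert($code, $thresholdCode) {
  return $code >= $thresholdCode;
}

// A watchdog row with severity_level 5 is a NOTICE, which is below an ERROR threshold.
$code = severity_to_code(5, $rfc5424, $levelCodes);
var_dump(should_alert($code, $levelCodes['ERROR']));
```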

Page 20: Making your logs work for you: Drupal escalation and disaster recovery

Monolog + Composer + Drupal

Monolog sends your logs to files, sockets, inboxes, databases and various web services

Page 21: Making your logs work for you: Drupal escalation and disaster recovery

Chain of responsibility pattern

Page 22: Making your logs work for you: Drupal escalation and disaster recovery

Core Concepts

1. Logger
2. Handler
3. Log Levels
4. Formatter
5. Processor
6. Utilities
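
To make the logger, handler and log level concepts concrete, here is a stripped-down sketch of a chain of responsibility: a logger passes each record down a stack of handlers, each handler decides whether the level concerns it, and a handler can stop the record from travelling further. The `SketchLogger` and `SketchHandler` classes are hypothetical stand-ins written for this example, not Monolog's real classes, and formatters and processors are left out.

```php
<?php
// Hypothetical handler: collects messages at or above its minimum level.
class SketchHandler {
  public $records = array();
  private $minLevel;
  private $bubble;

  public function __construct($minLevel, $bubble = true) {
    $this->minLevel = $minLevel;
    $this->bubble = $bubble;
  }

  public function isHandling($level) {
    return $level >= $this->minLevel;
  }

  // Returns true when the record should stop here instead of bubbling on.
  public function handle(array $record) {
    $this->records[] = $record['message'];
    return !$this->bubble;
  }
}

// Hypothetical logger: the most recently pushed handler sees the record first.
class SketchLogger {
  private $handlers = array();

  public function pushHandler(SketchHandler $handler) {
    array_unshift($this->handlers, $handler);
  }

  public function log($level, $message) {
    $record = array('level' => $level, 'message' => $message);
    foreach ($this->handlers as $handler) {
      if (!$handler->isHandling($level)) {
        continue; // Not this handler's concern; pass the record along.
      }
      if ($handler->handle($record)) {
        break; // Handler asked to stop the chain.
      }
    }
  }
}

$file = new SketchHandler(100);  // Keeps everything from DEBUG (100) up.
$pager = new SketchHandler(500); // Only CRITICAL (500) and above.

$logger = new SketchLogger();
$logger->pushHandler($file);
$logger->pushHandler($pager);

$logger->log(200, 'cache cleared');
$logger->log(500, 'database unreachable');

print_r($file->records);
print_r($pager->records);
```

With both handlers bubbling, the file handler keeps a full history while the pager handler only sees the critical record. Passing `false` as the second constructor argument makes a handler swallow the records it handles.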

Page 23: Making your logs work for you: Drupal escalation and disaster recovery

Log system overview

Page 24: Making your logs work for you: Drupal escalation and disaster recovery

Application Performance Monitoring

Page 25: Making your logs work for you: Drupal escalation and disaster recovery

Centralizing application logs

Monolog

Page 26: Making your logs work for you: Drupal escalation and disaster recovery

Features

• On-Call Scheduling
• Auto-Escalation
• International Reach
• Collaboration
• Advanced Analytics
• Reliability
• Monitoring Aggregation
• Easy Setup
• Effective Alerting
• Full stack visibility

Page 27: Making your logs work for you: Drupal escalation and disaster recovery

Cloud-based centralized log management

Page 28: Making your logs work for you: Drupal escalation and disaster recovery

Centralizing application logs

Monolog

Page 29: Making your logs work for you: Drupal escalation and disaster recovery

Features

• Built-in alerting
• Customized dashboards
• Persistent workspaces
• Multiple integrations available
• Advanced Analytics
• Overage protection
• Agentless log collection
• Centralized logging
• Supports multiple log formats
• Automated event parsing
• Powerful search capabilities
• Unlimited saved searches

Page 30: Making your logs work for you: Drupal escalation and disaster recovery

Incident Management

Page 31: Making your logs work for you: Drupal escalation and disaster recovery

Incident Response Goals

1. Verify that an incident occurred.
2. Maintain or Restore Business Continuity.
3. Reduce the incident impact.
4. Determine how the attack was done or the incident happened.
5. Prevent future attacks or incidents.
6. Improve security and incident response.
7. Prosecute illegal activity.
8. Keep management informed of the situation & response.

Page 32: Making your logs work for you: Drupal escalation and disaster recovery

Incident planning

Step 1: Form a Collaborative Planning Team
Step 2: Understand the Situation
Step 3: Determine Goals and Objectives
Step 4: Plan Development
Step 5: Plan Prep, Review & Approval
Step 6: Plan Implementation & Maintenance

Page 33: Making your logs work for you: Drupal escalation and disaster recovery

Incident management system

Page 34: Making your logs work for you: Drupal escalation and disaster recovery

IT incident management platform

Page 35: Making your logs work for you: Drupal escalation and disaster recovery

Features

1. Reliability
2. Monitoring Aggregation
3. Easy Setup
4. Effective Alerting
5. Mobile Incident Management
6. Escalation Policies

1. On-Call Scheduling
2. Auto-Escalation
3. International Reach
4. Collaboration
5. Advanced Analytics
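
An escalation policy with auto-escalation boils down to a list of tiers and a rule for how long an alert may sit unacknowledged before the next tier is notified. The sketch below illustrates that idea only. The tier names, the delays and the `escalation_targets()` helper are all hypothetical and do not reflect any particular incident management product's configuration.

```php
<?php
// Hypothetical policy: who is notified after how many minutes without acknowledgement.
$policy = array(
  array('after_minutes' => 0,  'notify' => 'on-call developer'),
  array('after_minutes' => 15, 'notify' => 'support manager'),
  array('after_minutes' => 30, 'notify' => 'project owner'),
);

// Hypothetical helper: everyone who should have been notified by now.
function escalation_targets(array $policy, $minutesUnacknowledged) {
  $targets = array();
  foreach ($policy as $tier) {
    if ($minutesUnacknowledged >= $tier['after_minutes']) {
      $targets[] = $tier['notify'];
    }
  }
  return $targets;
}

// An alert that has gone 20 minutes without acknowledgement.
print_r(escalation_targets($policy, 20));
```

Writing the tiers down as data, before an outage, is one way to answer the earlier "Who do I contact?" question in advance.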

Page 36: Making your logs work for you: Drupal escalation and disaster recovery

Slack HQ communication platform

Page 37: Making your logs work for you: Drupal escalation and disaster recovery

Live Demo #1

Time to break something. YAY!

Page 38: Making your logs work for you: Drupal escalation and disaster recovery

Fin!