How to do monitoring that won't make your engineers quit


Upload: gil-zellner

Post on 26-Jan-2017


TRANSCRIPT

Page 1: How to do monitoring that won't make your engineers quit

Relaxing picture of Yoga


hunt through logs for 2 hours


Monitoring that will make your engineers give up

Gil Zellner (Cloudify Dev at GigaSpaces)

Twitter: @Heathenaspargus


The cost of hiring a new employee is 1.5-3x their monthly salary


Easy (days)
- no changes to infrastructure
- just policy

Intermediate (months)
- small changes to apps
- logging
- light automation

Hard (years)
- design for better operability
- long term


Frustration: I am unable to complete my task


Time spent inefficiently


https://www.ergoflex.co.uk/blog/category/sleep-research/sleeponomics-could-sleep-deprivation-be-the-real-reason-politicians-make-bad-decisions


Mandatory half day off after a night production issue


Allocate weekly time to resolve or automate issues that kept us up at night


Wider rotation (more people do on-call)


https://www.youtube.com/watch?v=IUoEiDT1nXY

Creating a DevOps Culture: Identifying a “Single Person of Failure”


Knowledge Matrix

Deploy System Mobile Link Backend

Gil V V

Karen V V

Ari V V
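
A knowledge matrix like this can also be checked mechanically for a "single person of failure". The sketch below is only an illustration: the names come from the slide, but the transcript does not preserve which areas each person covers, so the area names and assignments here are hypothetical.

```python
# Hypothetical knowledge matrix: person -> set of areas they can support.
# The assignments are invented for illustration; the slide's real ones are unknown.
knowledge = {
    "Gil": {"deploy", "backend"},
    "Karen": {"mobile", "backend"},
    "Ari": {"deploy", "mobile"},
}


def single_points_of_failure(matrix, min_people=2):
    """Return areas known by fewer than `min_people` people, with who knows them."""
    coverage = {}
    for person, areas in matrix.items():
        for area in areas:
            coverage.setdefault(area, set()).add(person)
    return {
        area: sorted(people)
        for area, people in coverage.items()
        if len(people) < min_people
    }


# Every area has two owners here, so nothing is flagged.
print(single_points_of_failure(knowledge))

# If Ari is unavailable, some areas depend on exactly one person.
without_ari = {person: areas for person, areas in knowledge.items() if person != "Ari"}
print(single_points_of_failure(without_ari))
```

Running a check like this whenever the rotation changes shows which areas need cross-training before the rotation can be widened.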


Easy (days)
- no changes to infrastructure
- just policy

Intermediate (months)
- small changes to apps
- logging
- light automation

Hard (years)
- design for better operability
- long term


Solution: alert only on things that meet the following criteria:

1) Alert on symptoms, not suspected "causes"
2) Actionable
3) Business breaking
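
One way to read the three criteria is as a filter in front of the pager: a rule pages a human only if it passes all three. The sketch below assumes each rule is tagged by hand; the `AlertRule` type, its flags, and the example rules are hypothetical and not taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class AlertRule:
    name: str
    is_symptom: bool            # user-visible symptom, not a suspected cause
    is_actionable: bool         # the on-call engineer can do something about it
    is_business_breaking: bool  # the business is being hurt right now


def should_page(rule: AlertRule) -> bool:
    """Page a human only when all three criteria hold."""
    return rule.is_symptom and rule.is_actionable and rule.is_business_breaking


# Hypothetical rules, tagged by hand.
rules = [
    AlertRule("checkout error rate above 5%", True, True, True),
    AlertRule("CPU above 90% on one host", False, True, False),
    AlertRule("disk 70% full", False, True, False),
]

for rule in rules:
    destination = "page on-call" if should_page(rule) else "dashboard or ticket only"
    print(f"{rule.name}: {destination}")
```

Rules that fail the filter are not thrown away; they simply stop waking people up and can be reviewed during working hours.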


Solution: direct alerts to relevant parties
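
A minimal sketch of what directing alerts could look like, assuming a simple ownership map from service to team. The service names, team names, and fallback are hypothetical.

```python
# Hypothetical ownership map: service name -> team that should receive its alerts.
OWNERS = {
    "payments-api": "payments-team",
    "mobile-gateway": "mobile-team",
}
DEFAULT_TEAM = "ops-on-call"  # fallback for services with no registered owner


def route_alert(service: str) -> str:
    """Return the team that should be notified for an alert on `service`."""
    return OWNERS.get(service, DEFAULT_TEAM)


print(route_alert("payments-api"))  # owned service goes to its team
print(route_alert("legacy-batch"))  # unowned service falls back to the default
```

Alerts that regularly land on the fallback are a hint that a service is missing an owner.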


Companies that are doing this as a service:


Picking the right things to measure


Netflix stream starts per second


What are your KPIs?

- Stream starts per second
- Taxi orders per minute
- API calls per second
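
A business KPI of this kind is a rate of events, and a sudden drop in that rate is a symptom worth alerting on. The sketch below computes events per second over a window and compares it to a baseline. The 50% drop threshold, the function names, and the generated timestamps are assumptions for illustration.

```python
def rate_per_second(event_times, window_start, window_seconds):
    """Events per second in [window_start, window_start + window_seconds)."""
    window_end = window_start + window_seconds
    count = sum(1 for t in event_times if window_start <= t < window_end)
    return count / window_seconds


def kpi_dropped(current_rate, baseline_rate, max_drop=0.5):
    """True when the current rate fell below (1 - max_drop) of the baseline."""
    return current_rate < baseline_rate * (1 - max_drop)


# Hypothetical event timestamps in seconds:
# 10 events per second for the first minute, then 2 per second for the next.
events = [s + i / 10 for s in range(60) for i in range(10)]
events += [60 + s + i / 2 for s in range(60) for i in range(2)]

baseline = rate_per_second(events, 0, 60)
current = rate_per_second(events, 60, 60)
print(baseline, current, kpi_dropped(current, baseline))
```

In practice the baseline would usually come from the same time of day on earlier days, since most business KPIs follow a daily cycle.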


Companies that are doing this as a service:


Auto-remediation basics

1) Make remediation script
2) Make diagnosis script
3) Connect them
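
The three steps can be sketched as a diagnosis function, a table of remediation functions, and a small piece of glue that connects them. Everything here is hypothetical: the status fields, the problem names, and the remediation actions are placeholders that only return a description instead of touching a real system.

```python
from typing import Callable, Dict


def diagnose(status: Dict[str, float]) -> str:
    """Diagnosis step: map observed status to a named problem, or 'healthy'."""
    if not status.get("process_running", 1):
        return "process_down"
    if status.get("disk_used_fraction", 0.0) > 0.9:
        return "disk_full"
    return "healthy"


# Remediation step: one function per known problem.
# Real versions would restart a service or delete files; these only report.
def restart_process() -> str:
    return "restarted process"


def clean_temp_files() -> str:
    return "cleaned temporary files"


REMEDIATIONS: Dict[str, Callable[[], str]] = {
    "process_down": restart_process,
    "disk_full": clean_temp_files,
}


def auto_remediate(status: Dict[str, float]) -> str:
    """Connect the two: diagnose, then run the matching remediation if any."""
    problem = diagnose(status)
    if problem == "healthy":
        return "nothing to do"
    action = REMEDIATIONS.get(problem)
    if action is None:
        return f"no remediation for {problem}, escalate to a human"
    return action()


print(auto_remediate({"process_running": 0}))
print(auto_remediate({"process_running": 1, "disk_used_fraction": 0.95}))
```

Keeping diagnosis and remediation separate means either one can still be run by hand, and a problem without a known fix still escalates to a person.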


Heal Workflows - Cloudify


Easy (days)
- no changes to infrastructure
- just policy

Intermediate (months)
- small changes to apps
- logging
- light automation

Hard (years)
- design for better operability
- long term


Incentive for resilient architecture

99% uptime: 87.6 hours of downtime per year

99.9% uptime: 8.76 hours of downtime per year

99.99% uptime: 52.6 minutes of downtime per year

99.999% uptime: 5.3 minutes of downtime per year
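
These figures are the downtime allowed per year at each availability level, and they follow from simple arithmetic. The sketch below reproduces them, assuming a 365-day year (8760 hours) and availability written as a fraction, so 0.99 means 99%. The function name is illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_hours_per_year(availability: float) -> float:
    """Allowed downtime in hours per year for an availability fraction such as 0.999."""
    return (1 - availability) * HOURS_PER_YEAR


for availability in (0.99, 0.999, 0.9999, 0.99999):
    hours = downtime_hours_per_year(availability)
    print(f"{availability}: {hours:.2f} hours = {hours * 60:.1f} minutes per year")
```

Each extra nine cuts the budget by a factor of ten. Above roughly three nines there is little room left for a human to be paged, wake up, and fix things by hand.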
