Monitoring Complex Systems - Chicago Erlang, 2014

Upload: brian-troutwine

Post on 27-Nov-2014

DESCRIPTION

Imagine being responsible for monitoring 100 servers. Now imagine 1000. Each server has 100 different things to keep track of. What do you pay attention to and what do you ignore? What is important? In this talk Brian will show how Erlang can be used to capture more information without compromising clarity, i.e. to keep track of the forest without losing sight of the trees!

TRANSCRIPT

Monitoring Complex Systems

I do things to/with computers.

I build real-time systems.

I build fault-tolerant systems.

I build critical systems.

AdRoll

Less this.

More this.

Engineering + Mathematics = ads

(you’re welcome)

REAL-TIME BIDDING

The Problem Domain

• Low latency (< 100 ms per transaction)

• Firm real-time system

• Highly concurrent (90 billion transactions per day)

• Global, 24/7 operation

I build Complex Systems

Complex Systems

• Non-linear feedback

• Tightly coupled to external systems

• Difficult to model, understand

Bad things happen when Complex Systems fail.

Humans are bad at predicting the performance of complex systems (…). Our ability to create large and complex systems fools us into believing that we’re also entitled to understand them.

CARLOS BUENO, “MATURE OPTIMIZATION HANDBOOK”

Complex Systems often create worse problems than those they solve.

The key challenge to sustaining a complex system is maintaining our understanding of it.

What can be done?

Ahead-of-time verification is not sufficient.

(don’t scrimp on it, though)

Compile-time guarantees are not sufficient.

(don’t scrimp on them, either)

We need insight into the running system.

What are we looking for?

• VM killers

• Application performance regressions

• Abnormal application behavior

• Surprises

INSTRUMENTATION

The BEAM is ready to ride.

erlang:memory/1

erlang:statistics/1

erlang:system_info/1
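
The three built-ins named on these slides can be combined into a single reading of VM health. What follows is a minimal sketch, not code from the talk: the module name `vm_probe`, the function `snapshot/0`, and the choice of which items to sample are all assumptions made for illustration.

```erlang
-module(vm_probe).
-export([snapshot/0]).

%% Hypothetical helper: gather a few VM-level readings in one map,
%% using only built-in functions of the erlang module.
snapshot() ->
    %% statistics(reductions) returns {Total, SinceLastCall}.
    {Reductions, _SinceLastCall} = erlang:statistics(reductions),
    #{total_memory   => erlang:memory(total),        % bytes
      process_memory => erlang:memory(processes),    % bytes
      run_queue      => erlang:statistics(run_queue),
      reductions     => Reductions,
      process_count  => erlang:system_info(process_count),
      process_limit  => erlang:system_info(process_limit),
      schedulers     => erlang:system_info(schedulers_online)}.
```

Calling `vm_probe:snapshot()` on a timer and watching, say, `process_count` creep toward `process_limit` is one plausible way to spot a "VM killer" before it lands.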

What about our own work?

Exometer

Important Terms

metric: a measurement

entry: a receiver and aggregator of metrics

reporter: that which samples entries periodically and ships them to another system

subscription: the definition of the regular interval on which reporters sample entries
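
To make the four terms concrete, here is a toy model in plain Erlang. It is deliberately not exometer's API: the module `metric_terms` and every function in it are hypothetical, written only to show how a metric, an entry, a reporter, and a subscription relate to one another.

```erlang
-module(metric_terms).
-export([demo/0, new_entry/0, update/2, get_value/1, subscribe/3]).

%% entry: a process that receives metrics and aggregates them
%% (here: a running count and sum).
new_entry() ->
    spawn(fun() -> entry_loop(0, 0) end).

entry_loop(Count, Sum) ->
    receive
        {metric, Value} ->
            entry_loop(Count + 1, Sum + Value);
        {get, From, Ref} ->
            From ! {Ref, [{count, Count}, {sum, Sum}]},
            entry_loop(Count, Sum);
        stop ->
            ok
    end.

%% metric: a single measurement pushed into an entry.
update(Entry, Value) ->
    Entry ! {metric, Value},
    ok.

get_value(Entry) ->
    Ref = make_ref(),
    Entry ! {get, self(), Ref},
    receive {Ref, Value} -> Value after 1000 -> timeout end.

%% subscription: the interval (IntervalMs) on which the reporter samples.
%% reporter: a process that samples the entry and ships the sample to
%% Sink, which stands in for "another system".
subscribe(Entry, IntervalMs, Sink) ->
    spawn(fun() -> reporter_loop(Entry, IntervalMs, Sink) end).

reporter_loop(Entry, IntervalMs, Sink) ->
    timer:sleep(IntervalMs),
    Sink ! {report, get_value(Entry)},
    reporter_loop(Entry, IntervalMs, Sink).

demo() ->
    Entry = new_entry(),
    ok = update(Entry, 12),
    ok = update(Entry, 30),
    Reporter = subscribe(Entry, 50, self()),
    Result = receive {report, Sample} -> Sample after 1000 -> timeout end,
    exit(Reporter, kill),
    Entry ! stop,
    Result.
```

`metric_terms:demo()` returns `[{count,2},{sum,42}]`: two metrics went into the entry, and the reporter shipped the aggregate on its first 50 ms tick. The code that calls `update/2` never learns who is reporting.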

These are all loosely coupled at runtime.

Configuration is static, but you can adapt it on the fly.

Why exometer over the alternatives?

It is extensively documented.

It’s vigorously maintained.

(Ulf Wiger fan club day.)

It is silly fast.

Okay, great. We have instrumentation.

Now what?

MONITORING

This is the hard part.

Visualization

Alerting

Analysis

Visualization tells you how things look but not why.

A good day.

Uh oh.

Growth is good, but steady is better.

Bids are stable.

Budgets are stable.

What happened?

We forgot about Labor Day.

Alerting tells you that something happened, but not why.

A normal day.

Wat?

That’s some cliff.

Timeouts look good.

Errors are okay.

What happened?

“Uh, hey guys, you know Facebook is down, right?”

Analysis gives you why but only if you know how to ask for what.

The memory use of a bidder.

ಠ_ಠ

It’s all binaries.

Not in processes.

Not in ETS.

Come on now.

What happened?

A jiffy bug.

(we think)

Shout out to Miriam Pena for spending two weeks tracking this down.
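
The split the bidder memory slides walk through (binaries versus processes versus ETS) is available directly from `erlang:memory/1`, which accepts a list of memory types. The sketch below shows that one call and nothing more. It is not the procedure used to find the bug, and the module name `mem_breakdown` and both functions are hypothetical.

```erlang
-module(mem_breakdown).
-export([shares/0, largest/0]).

%% For each category, return {Type, Bytes, PercentOfTotal}.
shares() ->
    Total = erlang:memory(total),
    [{Type, Bytes, round(100 * Bytes / Total)}
     || {Type, Bytes} <- erlang:memory([binary, processes, ets])].

%% The category currently holding the most bytes.
largest() ->
    hd(lists:reverse(lists:keysort(2, shares()))).
```

A result from `largest/0` that keeps coming back as `binary` while `processes` and `ets` stay flat would match the shape of the story told above. It tells you where the memory is, not why it is there.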

Okay, great. We have monitoring and instrumentation.

Now all our problems are solved, right?

Not quite.

Instruments make up for our lack of insight.

Monitoring makes up for our frailty.

Every solution brings its own problems.

Instruments may be misleading.

Instruments may be overwhelming.

Instruments may be inaccurate.

Instruments may be ignored.

What can be done?

A little paranoia never hurt anyone.

Use glass displays.

Train.

Keep sight of the main goal.

Have resources you’re willing to sacrifice.

AdRoll is Hiring! :D

Thanks, folks! <3

@bltroutwine