monitoring is never done
![Page 1: Monitoring Is Never Done](https://reader030.vdocuments.us/reader030/viewer/2022032505/55c51e73bb61eba17d8b460d/html5/thumbnails/1.jpg)
Monitoring is Never “Done”
@melaniemj
![Page 2: Monitoring Is Never Done](https://reader030.vdocuments.us/reader030/viewer/2022032505/55c51e73bb61eba17d8b460d/html5/thumbnails/2.jpg)
Responsibilities @ Yardi
Implementation and administration of monitoring, alerting, and log aggregation/analysis tools.
o 15,000+ Devices
o 9 Datacenters
o 5,000+ Customer Installations
o We monitor Windows envs with Linux envs
![Page 3: Monitoring Is Never Done](https://reader030.vdocuments.us/reader030/viewer/2022032505/55c51e73bb61eba17d8b460d/html5/thumbnails/3.jpg)
This was me in 2008 @ Point2
![Page 4: Monitoring Is Never Done](https://reader030.vdocuments.us/reader030/viewer/2022032505/55c51e73bb61eba17d8b460d/html5/thumbnails/4.jpg)
How code is delivered
![Page 5: Monitoring Is Never Done](https://reader030.vdocuments.us/reader030/viewer/2022032505/55c51e73bb61eba17d8b460d/html5/thumbnails/5.jpg)
How code operates in production
![Page 6: Monitoring Is Never Done](https://reader030.vdocuments.us/reader030/viewer/2022032505/55c51e73bb61eba17d8b460d/html5/thumbnails/6.jpg)
A good problem to have
Everyone wants “the monitoring” so they can say “it’s monitored”
![Page 7: Monitoring Is Never Done](https://reader030.vdocuments.us/reader030/viewer/2022032505/55c51e73bb61eba17d8b460d/html5/thumbnails/7.jpg)
Communicating Work
o Classify
o Quantify
o Qualify
![Page 8: Monitoring Is Never Done](https://reader030.vdocuments.us/reader030/viewer/2022032505/55c51e73bb61eba17d8b460d/html5/thumbnails/8.jpg)
Words....
o Logging
o Alerting
o Dashboards
o Reports
o 4-9s
o 24x7x365 this shit can’t go down
![Page 9: Monitoring Is Never Done](https://reader030.vdocuments.us/reader030/viewer/2022032505/55c51e73bb61eba17d8b460d/html5/thumbnails/9.jpg)
Can it be this simple?
Let’s talk about “the monitoring” for X
Be awesome
X is monitored
![Page 10: Monitoring Is Never Done](https://reader030.vdocuments.us/reader030/viewer/2022032505/55c51e73bb61eba17d8b460d/html5/thumbnails/10.jpg)
DCVA (OODA)
![Page 11: Monitoring Is Never Done](https://reader030.vdocuments.us/reader030/viewer/2022032505/55c51e73bb61eba17d8b460d/html5/thumbnails/11.jpg)
1. Definition
I can hit this one page, so it’s up, right?
No thanks, let’s redefine status
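One way to read "redefine status" is that a single reachable page is only one signal among several. The following is a minimal Python sketch under that assumption. The check names (`login_page`, `db_query`, `queue_depth`), the three-level `Status`, and the idea of a "critical" set are all hypothetical illustrations, not something taken from the talk.

```python
from enum import Enum


class Status(Enum):
    OK = 0
    DEGRADED = 1
    DOWN = 2


def overall_status(checks: dict[str, bool], critical: set[str]) -> Status:
    """Combine several check results into one status.

    checks maps a (hypothetical) check name to True when it passes.
    critical names the checks whose failure means the service is down.
    """
    failed = {name for name, passed in checks.items() if not passed}
    if failed & critical:
        return Status.DOWN
    if failed:
        return Status.DEGRADED
    return Status.OK


# The page responds, yet a non-critical dependency is failing.
example = overall_status(
    {"login_page": True, "db_query": True, "queue_depth": False},
    critical={"login_page", "db_query"},
)
print(example.name)  # DEGRADED
```

Here "I can hit this one page" corresponds to `login_page` passing, which on its own does not make the overall status `OK`.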
![Page 12: Monitoring Is Never Done](https://reader030.vdocuments.us/reader030/viewer/2022032505/55c51e73bb61eba17d8b460d/html5/thumbnails/12.jpg)
1. Definition
o What questions are you trying to answer?
o What information do you need when a failure occurs?
o What are the most common failures?
o Who is the audience for the information?
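The four questions above could be written down per monitored thing, so that gaps in the definition are visible before any check is built. This is only a sketch: the `MonitoringDefinition` class, its field names, and the sample values are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MonitoringDefinition:
    """Answers to the four Definition questions for one monitored thing."""

    target: str
    questions: list[str] = field(default_factory=list)        # what are we trying to answer?
    failure_info: list[str] = field(default_factory=list)     # what is needed when a failure occurs?
    common_failures: list[str] = field(default_factory=list)  # what fails most often?
    audience: list[str] = field(default_factory=list)         # who is the information for?

    def unanswered(self) -> list[str]:
        """Return the names of the questions that still have no answer."""
        answers = {
            "questions": self.questions,
            "failure_info": self.failure_info,
            "common_failures": self.common_failures,
            "audience": self.audience,
        }
        return [name for name, value in answers.items() if not value]


# Hypothetical example: three questions answered, one still open.
draft = MonitoringDefinition(
    target="X",
    questions=["Can customers log in?"],
    failure_info=["recent error logs", "current state of dependencies"],
    audience=["on-call engineer"],
)
print(draft.unanswered())  # ['common_failures']
```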
![Page 13: Monitoring Is Never Done](https://reader030.vdocuments.us/reader030/viewer/2022032505/55c51e73bb61eba17d8b460d/html5/thumbnails/13.jpg)
2. Checks & Collections
o Environment & Code
o Data points
o Detailed logs
o Current state
![Page 14: Monitoring Is Never Done](https://reader030.vdocuments.us/reader030/viewer/2022032505/55c51e73bb61eba17d8b460d/html5/thumbnails/14.jpg)
3. Visualization
o Analysis
o Dashboards
o Correlations
![Page 15: Monitoring Is Never Done](https://reader030.vdocuments.us/reader030/viewer/2022032505/55c51e73bb61eba17d8b460d/html5/thumbnails/15.jpg)
4. Action
o Fault detection
o Alerting
o RCA
![Page 16: Monitoring Is Never Done](https://reader030.vdocuments.us/reader030/viewer/2022032505/55c51e73bb61eba17d8b460d/html5/thumbnails/16.jpg)
![Page 17: Monitoring Is Never Done](https://reader030.vdocuments.us/reader030/viewer/2022032505/55c51e73bb61eba17d8b460d/html5/thumbnails/17.jpg)
Cycle
o Definition (What to collect)
o Checks & Collections (How to collect)
o Visualization (Make collections pretty)
o Action (Inform on failure)
![Page 18: Monitoring Is Never Done](https://reader030.vdocuments.us/reader030/viewer/2022032505/55c51e73bb61eba17d8b460d/html5/thumbnails/18.jpg)
Team Time Distribution
![Page 19: Monitoring Is Never Done](https://reader030.vdocuments.us/reader030/viewer/2022032505/55c51e73bb61eba17d8b460d/html5/thumbnails/19.jpg)
Time Distribution (Desired)
![Page 20: Monitoring Is Never Done](https://reader030.vdocuments.us/reader030/viewer/2022032505/55c51e73bb61eba17d8b460d/html5/thumbnails/20.jpg)
Is “X” monitored?
When “X” goes into some degraded state:
o The right people know.
o They have enough information to find the problem, recover, and later do RCA.
o If they don’t, they will revisit definition.
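The test for "monitored" above can be read as a small decision made after each degradation. The following Python sketch assumes a hypothetical `IncidentReview` record filled in by the team afterwards; the field names and return strings are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class IncidentReview:
    """What the team observed the last time "X" degraded (hypothetical record)."""

    right_people_knew: bool
    could_find_problem: bool
    could_recover: bool
    could_do_rca: bool


def next_step(review: IncidentReview) -> str:
    """Say whether "X" counts as monitored or the cycle should restart."""
    enough_information = (
        review.could_find_problem
        and review.could_recover
        and review.could_do_rca
    )
    if review.right_people_knew and enough_information:
        return "monitored"
    # Any gap sends the team back to stage 1 of the cycle.
    return "revisit definition"


# People were alerted and recovered, but lacked the data for RCA.
print(next_step(IncidentReview(True, True, True, False)))  # revisit definition
```

Because any single gap returns "revisit definition", the answer can change after every incident, which is the sense in which monitoring is never "done".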
![Page 21: Monitoring Is Never Done](https://reader030.vdocuments.us/reader030/viewer/2022032505/55c51e73bb61eba17d8b460d/html5/thumbnails/21.jpg)
How does your team
o Classify
o Quantify
o Qualify
![Page 22: Monitoring Is Never Done](https://reader030.vdocuments.us/reader030/viewer/2022032505/55c51e73bb61eba17d8b460d/html5/thumbnails/22.jpg)
Monitoring is Never “Done”
Melanie Cey @melaniemj
Senior Systems Analyst
Systems Reliability Engineering @ Yardi