monitoring swift · monitoring swift openstack summit, austin 2016 adam takvam, sr. systems...
![Page 1: Monitoring Swift · Monitoring Swift OpenStack Summit, Austin 2016 Adam Takvam, Sr. Systems Engineer Martin Lanner, Engagement Manager April 28, 2016](https://reader030.vdocuments.us/reader030/viewer/2022040805/5e42fa0eb0324852346030b3/html5/thumbnails/1.jpg)
Monitoring Swift
OpenStack Summit, Austin 2016
Adam Takvam, Sr. Systems Engineer
Martin Lanner, Engagement Manager
April 28, 2016
SwiftStack Confidential
Overview
• Problems
  - Usage intelligence
  - Capacity planning
  - Operational health
  - Audit trails
• Background
  - Methods: logs + system metrics
  - Interpretation of metrics
  - Actions: thresholds + alerting
• Swift key monitoring concepts
  - What to monitor?
  - How to monitor
• Monitoring methods - demos
  - Logging: ELK
  - Trending/Forecasting: Prometheus + Grafana
  - System monitoring: Zabbix
It's Linux!
Properties of Swift
• Distributed system
• Extremely durable through replication or erasure coding
• No single point of failure
• Even distribution of data
• Resilient
• Self-healing capabilities
• Can take a lot of abuse and negligence
Anatomy of a Monitoring Solution
• Agent: Gathers metrics on a host and either pushes or advertises them
  - Logstash
  - Prometheus Node Exporter
  - Zabbix Agent
  - Nagios NRPE
• Aggregation engine: Collects metrics from agents and provides an API with access to aggregated metric values
  - Nagios
  - Zabbix
  - Elasticsearch
  - Prometheus
• Visualizer: Renders graphs in a human-friendly format for easy comprehension of system state
  - Kibana
  - Grafana
• Alerting: Uses metric thresholds to trigger alerts when metrics fall out of an acceptable range
  - AlertManager
  - PagerDuty
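The agent, aggregation, and alerting roles above can be illustrated with a toy pipeline. This is only a sketch: the `Aggregator` class, its `push` and `mean` methods, and `out_of_range` are hypothetical names invented for illustration and do not correspond to the API of any of the tools listed. The visualizer role is left out.

```python
from collections import defaultdict
from statistics import fmean


class Aggregator:
    """Hypothetical aggregation engine: stores samples pushed by agents
    and exposes an aggregated value per metric."""

    def __init__(self):
        # metric name -> list of (host, value) samples
        self._samples = defaultdict(list)

    def push(self, host, metric, value):
        """Called by an agent to report one sample for one host."""
        self._samples[metric].append((host, value))

    def mean(self, metric):
        """Aggregate: average of all samples seen for this metric."""
        return fmean(value for _, value in self._samples[metric])


def out_of_range(value, low, high):
    """Alerting rule: True when a metric leaves its acceptable range."""
    return not (low <= value <= high)
```

For example, two agents could each push a `cpu_percent` sample, and the alerting step would then test `out_of_range(agg.mean("cpu_percent"), 0.0, 90.0)`.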
Developing a Monitoring Strategy
Forms of Monitoring
• System utilization: CPU, memory, disk I/O, network, auditing cycles, replicator timing
• Performance: Transaction latency
• Errors: Invalid requests or states
• Outages: Service failures
• Feature usage: Understand CRUD operations and traffic patterns
• Audit trail: Who did what, when?
Monitoring Lifecycle
• Measurement
• Reporting
• Characterization
• Thresholds
• Alerting
• Root cause analysis
• Remediation
  - Manual
  - Automated
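The characterization, thresholds, and alerting steps of the lifecycle can be sketched as follows. The slides do not say how a threshold should be derived from measurements; "mean plus k standard deviations of a baseline" is an assumption chosen here for illustration, and `characterize` and `breaches` are hypothetical helper names.

```python
import statistics


def characterize(baseline, k=3.0):
    """Derive an upper threshold from baseline measurements.

    Assumption for illustration: threshold = mean + k * population stdev.
    """
    mean = statistics.fmean(baseline)
    stdev = statistics.pstdev(baseline)
    return mean + k * stdev


def breaches(samples, threshold):
    """Return (index, value) for every sample above the threshold."""
    return [(i, v) for i, v in enumerate(samples) if v > threshold]
```

Each breach would then feed alerting and, from there, root cause analysis and remediation.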
Examples of monitoring methods
• ELK: Usage intelligence
  - Who?
  - Agents
  - HTTP response codes
  - Errors
  - Audit trails
• Prometheus: Capacity planning
  - Data growth
  - Trending analytics
• Zabbix: Operational health
  - Network
  - CPU
  - RAM
Key concepts for monitoring Swift
• Cluster full
  - df
  - Data growth
  - Capacity planning
• Networking
  - Availability
  - Saturation
• Proxy state
  - CPU
  - /healthcheck
• Auditing cycles
• Replication cycle timing
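The "cluster full" check amounts to running df against each storage drive. A minimal Python sketch follows, assuming drives are mounted under /srv/node (the mount point pattern used in the alert rule later in this deck). `fraction_used` and `cluster_node_usage` are hypothetical helper names.

```python
import os
import shutil


def fraction_used(total, free):
    """Used fraction of a filesystem, from total and free bytes."""
    return 0.0 if total == 0 else (total - free) / total


def cluster_node_usage(root="/srv/node"):
    """Map each mounted drive directory under root to its used fraction.

    The root path is an assumption; adjust it to the node's layout.
    Unmounted directories are skipped so that an unmounted drive is not
    reported with the root filesystem's usage.
    """
    usage = {}
    if not os.path.isdir(root):
        return usage
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        if os.path.ismount(path):
            du = shutil.disk_usage(path)
            usage[path] = fraction_used(du.total, du.free)
    return usage
```

Recording these fractions over time gives the data growth series that capacity planning needs.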
Load balancer health checks against Swift proxy servers
• Most load balancers run ICMP checks against all IPs in their pools by default
• Also, consider configuring the load balancer to run HTTP checks against Swift's /healthcheck endpoint
Example:
demo@demo:~$ curl http://swift.swiftstack.oss/healthcheck
OK
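The same probe can be sketched in Python for use in a custom monitoring script. This assumes the healthcheck middleware is enabled on the proxy and answers HTTP 200 with the body "OK", as in the curl example. `is_healthy_response` and `check_proxy` are hypothetical helper names.

```python
import urllib.error
import urllib.request


def is_healthy_response(status, body):
    """A healthy proxy answers 200 with the body "OK"."""
    return status == 200 and body.strip() == b"OK"


def check_proxy(base_url, timeout=2.0):
    """Return True if GET <base_url>/healthcheck looks healthy."""
    url = base_url.rstrip("/") + "/healthcheck"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return is_healthy_response(resp.status, resp.read())
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx answer
        # (HTTPError is a subclass of URLError) all count as unhealthy.
        return False
```

Calling `check_proxy("http://swift.swiftstack.oss")` would probe the host from the example above. Unlike an ICMP ping, this check fails when the proxy process is down even though the host still responds.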
Audit trails with ELK
Object size distribution
Distribution of CRUD operations over time
Zabbix triggers for Swift
Zabbix node memory usage
Zabbix drive utilization events
Disk I/O
Object Replicator Operations
Prometheus + Grafana trending and forecasting
Alerting
Example:
ALERT StorageCritical24Hours
  IF sum(predict_linear(node_filesystem_free{job="swiftstack",mountpoint=~"/srv/node/.*"}[1d], 24*3600)) < sum(node_filesystem_size{job="swiftstack",mountpoint=~"/srv/node/.*"}) * 0.2
  FOR 1h
  LABELS {
    group="storage_admin",
    severity="critical"
  }
Translation: Send a critical alert to all members of the storage_admin group if the total available storage capacity is projected to be less than 20% of the total storage capacity within the next 24 hours and that forecast has held true for at least 1 hour, recalculating every 5 minutes (per server config, not shown).
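The forecasting idea behind the rule can be sketched in plain Python: fit a least-squares line through recent free-space samples and extrapolate it 24 hours ahead. This is an illustration of the technique, not Prometheus's implementation. It treats cluster-wide free space as a single series instead of summing per-filesystem predictions, and it does not model the FOR 1h hold period. `storage_critical` is a hypothetical helper name.

```python
def predict_linear(samples, horizon_s):
    """Extrapolate a least-squares line through (timestamp, value) samples.

    Returns the predicted value horizon_s seconds after the last sample.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one timestamp")
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / var_t
    intercept = mean_v - slope * mean_t
    t_future = samples[-1][0] + horizon_s
    return slope * t_future + intercept


def storage_critical(free_samples, total_bytes,
                     horizon_s=24 * 3600, min_free_fraction=0.2):
    """True if projected free space drops below the given fraction of total."""
    projected = predict_linear(free_samples, horizon_s)
    return projected < total_bytes * min_free_fraction
```

Here `free_samples` would be the last day of (timestamp in seconds, free bytes) readings, matching the [1d] window in the rule.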
Q&A/Demo
Thank you!