![Page 1: Common Sense Performance Indicators in the Cloud](https://reader034.vdocuments.us/reader034/viewer/2022052618/54b2f5684a795972088b461b/html5/thumbnails/1.jpg)
Common Sense Performance Indicators
Nick Gerner
June 24, 2010
![Page 2: Common Sense Performance Indicators in the Cloud](https://reader034.vdocuments.us/reader034/viewer/2022052618/54b2f5684a795972088b461b/html5/thumbnails/2.jpg)
Goals
Common Sense in the Cloud: same as outside the cloud
1. Tune performance
2. Investigate issues
3. Visualize architecture
![Page 3: Common Sense Performance Indicators in the Cloud](https://reader034.vdocuments.us/reader034/viewer/2022052618/54b2f5684a795972088b461b/html5/thumbnails/3.jpg)
Nick Gerner
www.nickgerner.com
@gerner
• Formerly senior engineer at SEOmoz
• Linkscape: index of the web for SEO
• Lead data services
• Developer
• Back-end ops guy
![Page 4: Common Sense Performance Indicators in the Cloud](https://reader034.vdocuments.us/reader034/viewer/2022052618/54b2f5684a795972088b461b/html5/thumbnails/4.jpg)
SEOmoz
• Seattle-based startup (~7 engineers)
• SEO blog and community
• Toolset and platform
OpenSiteExplorer.org
• 300 TB/month processing pipeline
• 5 million API requests/day
![Page 5: Common Sense Performance Indicators in the Cloud](https://reader034.vdocuments.us/reader034/viewer/2022052618/54b2f5684a795972088b461b/html5/thumbnails/5.jpg)
SEOmoz Engineering
• 50 < nodes < 500
• AWS based since 2008
– EC2 – Linux root access to bare VM
– S3 – networked disk
– EBS – local disk I/O
– ELB – load balancing as a service
![Page 6: Common Sense Performance Indicators in the Cloud](https://reader034.vdocuments.us/reader034/viewer/2022052618/54b2f5684a795972088b461b/html5/thumbnails/6.jpg)
SEOmoz Architecture: Processing
[Diagram labels: The Web, Crawlers, Raw Storage; Data Pipeline: Crawlers, Process, Prepare]
![Page 7: Common Sense Performance Indicators in the Cloud](https://reader034.vdocuments.us/reader034/viewer/2022052618/54b2f5684a795972088b461b/html5/thumbnails/7.jpg)
SEOmoz Architecture: API
[Diagram labels: Partners, SEOmoz Apps, ELB, Lighttpd (×3), App (×3), Memcache (×6), S3]
![Page 8: Common Sense Performance Indicators in the Cloud](https://reader034.vdocuments.us/reader034/viewer/2022052618/54b2f5684a795972088b461b/html5/thumbnails/8.jpg)
End-to-End Performance Indicators
Latency
Time to On-load
Conversion Rate
DNS
Web Object Count
![Page 9: Common Sense Performance Indicators in the Cloud](https://reader034.vdocuments.us/reader034/viewer/2022052618/54b2f5684a795972088b461b/html5/thumbnails/9.jpg)
Great... but not the focus of this talk
Latency
Time to On-load
Conversion Rate
DNS
Web Object Count
![Page 10: Common Sense Performance Indicators in the Cloud](https://reader034.vdocuments.us/reader034/viewer/2022052618/54b2f5684a795972088b461b/html5/thumbnails/10.jpg)
Performance Indicators
[Diagram: the App Stack (Database, WS-API, Back-end, Caching, Middleware, Front-End) drives and competes for System Characteristics (CPU, Disk, Net, Mem)]
http://www.flickr.com/photos/dnisbet/3118888630/
![Page 11: Common Sense Performance Indicators in the Cloud](https://reader034.vdocuments.us/reader034/viewer/2022052618/54b2f5684a795972088b461b/html5/thumbnails/11.jpg)
![Page 12: Common Sense Performance Indicators in the Cloud](https://reader034.vdocuments.us/reader034/viewer/2022052618/54b2f5684a795972088b461b/html5/thumbnails/12.jpg)
/proc
• System stats
• Per-process stats
• It all comes from here
...but use tools to see it
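As a rough illustration of the "per-process stats" that live under /proc, here is a minimal sketch that parses the text of a `/proc/<pid>/status` file. The helper names and the sample text are made up for this example; on a real Linux box you would pass in the contents of the file instead.

```python
def parse_status(text):
    """Parse the 'Key:<tab>value' lines of a /proc/<pid>/status file into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def rss_kib(fields):
    """Resident set size in KiB, or None when there is no VmRSS line."""
    value = fields.get("VmRSS")
    return int(value.split()[0]) if value else None


# Hypothetical sample; real input would be open("/proc/<pid>/status").read().
sample = "Name:\tlighttpd\nState:\tS (sleeping)\nVmRSS:\t    2048 kB\n"
info = parse_status(sample)
print(info["Name"], rss_kib(info), "KiB resident")
```

This is exactly the kind of bookkeeping that tools like top and collectd do for you, which is why the slide recommends the tools.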
![Page 13: Common Sense Performance Indicators in the Cloud](https://reader034.vdocuments.us/reader034/viewer/2022052618/54b2f5684a795972088b461b/html5/thumbnails/13.jpg)
System Characteristics
Load Average
CPU
Memory
Disk
Network
![Page 14: Common Sense Performance Indicators in the Cloud](https://reader034.vdocuments.us/reader034/viewer/2022052618/54b2f5684a795972088b461b/html5/thumbnails/14.jpg)
Load Average
• Combines a few things
• Good place to start
• Explains nothing
http://www.flickr.com/photos/maple03/4176389418/
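A small sketch of why load average is only a starting point: on Linux it counts tasks that are runnable plus tasks in uninterruptible sleep (usually waiting on disk), so the same number can mean "CPU bound" or "I/O bound". Dividing by core count at least makes it comparable across node sizes. The function names and the sample line are assumptions for illustration.

```python
def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg text."""
    one, five, fifteen = (float(x) for x in text.split()[:3])
    return one, five, fifteen


def load_per_core(load, cores):
    """Normalize a load average by the number of CPU cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load / cores


# Hypothetical sample in the /proc/loadavg layout.
sample = "3.20 1.10 0.75 2/345 12345"
one, five, fifteen = parse_loadavg(sample)
print("1-min load per core on a 4-core node:", load_per_core(one, 4))
```

A per-core value near or above 1.0 says the node is busy; it still takes the CPU and disk numbers to say why.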
![Page 15: Common Sense Performance Indicators in the Cloud](https://reader034.vdocuments.us/reader034/viewer/2022052618/54b2f5684a795972088b461b/html5/thumbnails/15.jpg)
CPU
• Break out by process
• Break out user vs system
• User, System, I/O wait, Idle
http://www.flickr.com/photos/pacdog/213442876/
![Page 16: Common Sense Performance Indicators in the Cloud](https://reader034.vdocuments.us/reader034/viewer/2022052618/54b2f5684a795972088b461b/html5/thumbnails/16.jpg)
Why watch it?
• Who's doing work
• Is CPU maxed?
• Blocked on I/O?
• Compare to Load Average
http://www.flickr.com/photos/pacdog/213442876/
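The user / system / I/O wait / idle breakdown can be sketched from two samples of the aggregate `cpu` line in `/proc/stat`, whose counters are cumulative ticks. This is a minimal sketch with invented sample values, not how any particular tool implements it.

```python
# Order of the first eight counters on the aggregate "cpu" line of /proc/stat.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu ...' line of /proc/stat into tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return dict(zip(FIELDS, (int(p) for p in parts[1:1 + len(FIELDS)])))


def cpu_percentages(before, after):
    """Share of elapsed ticks spent in each state between two samples."""
    deltas = {name: after[name] - before[name] for name in before}
    total = sum(deltas.values())
    if total == 0:
        return {name: 0.0 for name in deltas}
    return {name: 100.0 * value / total for name, value in deltas.items()}


# Hypothetical samples taken a few seconds apart.
t0 = parse_cpu_line("cpu  100 0 50 800 50 0 0 0 0 0")
t1 = parse_cpu_line("cpu  160 0 70 900 70 0 0 0 0 0")
for name, pct in cpu_percentages(t0, t1).items():
    print(f"{name:8s} {pct:5.1f}%")
```

High iowait with low user time suggests the node is blocked on I/O rather than maxed on CPU, which is the comparison to make against load average.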
![Page 17: Common Sense Performance Indicators in the Cloud](https://reader034.vdocuments.us/reader034/viewer/2022052618/54b2f5684a795972088b461b/html5/thumbnails/17.jpg)
Memory
• Break out by Process
• Free, cached, used
http://www.flickr.com/photos/williamhook/3118248600/
![Page 18: Common Sense Performance Indicators in the Cloud](https://reader034.vdocuments.us/reader034/viewer/2022052618/54b2f5684a795972088b461b/html5/thumbnails/18.jpg)
Why watch it?
• Cached + Free = Available
• Do you have spare memory?
– App uses
– Memcache
– DB cache
http://www.flickr.com/photos/williamhook/3118248600/
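The "Cached + Free = Available" rule of thumb can be sketched against the text of `/proc/meminfo`. The function names and sample figures are made up for illustration; newer kernels also report a `MemAvailable` line, which is a more careful estimate than this sum.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into {field name: value}, values mostly in KiB."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        tokens = rest.split()
        if sep and tokens:
            values[name.strip()] = int(tokens[0])
    return values


def available_kib(mem):
    """The slide's rule of thumb: free memory plus page cache."""
    return mem["MemFree"] + mem["Cached"]


# Hypothetical sample in the /proc/meminfo layout.
sample = (
    "MemTotal:        8000000 kB\n"
    "MemFree:         1000000 kB\n"
    "Buffers:          200000 kB\n"
    "Cached:          3000000 kB\n"
)
mem = parse_meminfo(sample)
spare = available_kib(mem)
print(f"{spare} KiB available ({100.0 * spare / mem['MemTotal']:.0f}% of total)")
```

If that spare share stays large, there may be room to give more memory to the app, Memcache, or the DB cache.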
![Page 19: Common Sense Performance Indicators in the Cloud](https://reader034.vdocuments.us/reader034/viewer/2022052618/54b2f5684a795972088b461b/html5/thumbnails/19.jpg)
Disk
• Read bytes/sec
• Write bytes/sec
• Disk utilization
http://www.flickr.com/photos/robfon/2174992215/
![Page 20: Common Sense Performance Indicators in the Cloud](https://reader034.vdocuments.us/reader034/viewer/2022052618/54b2f5684a795972088b461b/html5/thumbnails/20.jpg)
Why watch it?
• Is disk busy?
• When?
• Who's using it?
http://www.flickr.com/photos/robfon/2174992215/
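Read bytes/sec, write bytes/sec, and utilization can all be derived from two samples of `/proc/diskstats`. The sketch below assumes the standard field layout (sectors are always counted in 512-byte units there); the device name and numbers are invented.

```python
SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


def parse_diskstats(text):
    """Parse /proc/diskstats text into {device: cumulative counters}."""
    stats = {}
    for line in text.splitlines():
        p = line.split()
        if len(p) < 14:
            continue
        stats[p[2]] = {
            "read_bytes": int(p[5]) * SECTOR_BYTES,   # sectors read
            "write_bytes": int(p[9]) * SECTOR_BYTES,  # sectors written
            "busy_ms": int(p[12]),                    # time spent doing I/O
        }
    return stats


def disk_rates(before, after, seconds):
    """Per-second rates and utilization for one device between two samples."""
    return {
        "read_bytes_per_sec": (after["read_bytes"] - before["read_bytes"]) / seconds,
        "write_bytes_per_sec": (after["write_bytes"] - before["write_bytes"]) / seconds,
        "utilization_pct": 100.0 * (after["busy_ms"] - before["busy_ms"]) / (seconds * 1000.0),
    }


# Hypothetical samples for device "sda", taken 10 seconds apart.
s0 = parse_diskstats("   8       0 sda 1000 0 20000 500 2000 0 40000 900 0 1200 1400")
s1 = parse_diskstats("   8       0 sda 1500 0 40000 900 2600 0 50000 1300 0 6200 7000")
print(disk_rates(s0["sda"], s1["sda"], 10))
```

This answers "is disk busy?" and "when?"; "who's using it?" needs per-process data such as iotop.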
![Page 21: Common Sense Performance Indicators in the Cloud](https://reader034.vdocuments.us/reader034/viewer/2022052618/54b2f5684a795972088b461b/html5/thumbnails/21.jpg)
Network
• Read bytes/sec
• Write bytes/sec
• Established connections
http://www.flickr.com/photos/ahkitj/20853609/
![Page 22: Common Sense Performance Indicators in the Cloud](https://reader034.vdocuments.us/reader034/viewer/2022052618/54b2f5684a795972088b461b/html5/thumbnails/22.jpg)
Why watch it?
• Max connections (~1024 is magic)
• Bandwidth is $$$
• When are you busy?
• SOA considerations
http://www.flickr.com/photos/ahkitj/20853609/
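Established connections can be counted from `/proc/net/tcp`, where the `st` column holds the socket state in hex and `01` means ESTABLISHED. In this sketch the default limit of 1024 is an assumption about what the "magic" number refers to (a common default per-process open-file limit); the sample rows are trimmed and invented.

```python
TCP_ESTABLISHED = "01"  # hex state code used in the "st" column of /proc/net/tcp


def count_established(proc_net_tcp_text):
    """Count sockets in the ESTABLISHED state in /proc/net/tcp text."""
    count = 0
    for line in proc_net_tcp_text.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) > 3 and fields[3] == TCP_ESTABLISHED:
            count += 1
    return count


def headroom(established, limit=1024):
    """Connections left before reaching an assumed limit (1024 is a guess)."""
    return limit - established


# Hypothetical, trimmed sample in the /proc/net/tcp layout.
sample = """  sl  local_address rem_address   st tx_queue rx_queue
   0: 0100007F:0050 00000000:0000 0A 00000000:00000000
   1: 0100007F:0050 0100007F:C350 01 00000000:00000000
   2: 0100007F:C350 0100007F:0050 01 00000000:00000000
   3: 0100007F:0050 0100007F:C351 06 00000000:00000000
"""
n = count_established(sample)
print(n, "established;", headroom(n), "left before the assumed limit")
```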
![Page 23: Common Sense Performance Indicators in the Cloud](https://reader034.vdocuments.us/reader034/viewer/2022052618/54b2f5684a795972088b461b/html5/thumbnails/23.jpg)
Perf Monitoring Solution
1. data collection (collectd)
2. data storage (rrdtool)
3. dashboard management (drraw)
FREE, in Apt
![Page 24: Common Sense Performance Indicators in the Cloud](https://reader034.vdocuments.us/reader034/viewer/2022052618/54b2f5684a795972088b461b/html5/thumbnails/24.jpg)
Perf Monitoring Architecture
Cluster
Cluster
Multiple Clusters
Multiple Applications
Nodes come up and go down
![Page 25: Common Sense Performance Indicators in the Cloud](https://reader034.vdocuments.us/reader034/viewer/2022052618/54b2f5684a795972088b461b/html5/thumbnails/25.jpg)
Perf Monitoring Architecture
Cluster
Cluster
collectd agents
new nodes get generic config
node names follow convention according to role
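Naming nodes by role is what lets a dashboard pick up new nodes without per-node setup. The talk does not spell out the convention, so the `<role>-<number>` pattern and the hostnames below are purely hypothetical; the sketch only shows the grouping idea.

```python
import re
from collections import defaultdict

# Hypothetical convention: "<role>-<number>", e.g. "api-03" or "crawler-12".
NAME_PATTERN = re.compile(r"^(?P<role>[a-z]+)-(?P<index>\d+)$")


def group_by_role(hostnames):
    """Group hostnames by the role encoded in their names."""
    groups = defaultdict(list)
    for name in hostnames:
        match = NAME_PATTERN.match(name)
        role = match.group("role") if match else "unknown"
        groups[role].append(name)
    return dict(groups)


nodes = ["api-01", "api-02", "crawler-07", "memcache-03", "oddball"]
for role, members in group_by_role(nodes).items():
    print(role, members)
```

With a scheme like this, a node that comes up as "api-04" lands in the API graphs as soon as its agent reports in.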
![Page 26: Common Sense Performance Indicators in the Cloud](https://reader034.vdocuments.us/reader034/viewer/2022052618/54b2f5684a795972088b461b/html5/thumbnails/26.jpg)
Perf Monitoring Architecture
Perf Monitoring Server
Cluster
Cluster
On its own server:
collectd server
Web server
drraw.cgi
allows connections from new nodes
perf data backed up daily
![Page 27: Common Sense Performance Indicators in the Cloud](https://reader034.vdocuments.us/reader034/viewer/2022052618/54b2f5684a795972088b461b/html5/thumbnails/27.jpg)
Perf Monitoring Architecture
Perf Monitoring Server
Cluster
Cluster
Happy Sysadmin
Visibility into system
history of performance
![Page 28: Common Sense Performance Indicators in the Cloud](https://reader034.vdocuments.us/reader034/viewer/2022052618/54b2f5684a795972088b461b/html5/thumbnails/28.jpg)
Perf Dashboard Features
1. Summarize nodes/systems
2. Visualize data over time
3. Stack measurements
– Per-process
– Per-node
4. Handle new nodes
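"Stacking" measurements means drawing each node's (or process's) series on top of the previous ones, so the top edge shows the total and each band shows who contributes. A minimal sketch of that arithmetic, with made-up series; drraw and rrdtool do this for you when graphing.

```python
def stack_series(series):
    """Turn {name: [values]} into cumulative layers, bottom layer first.

    Each layer is (name, top_edge), where top_edge is the running sum of
    this series and all series below it. All series must be equally long.
    """
    layers = []
    running = None
    for name, values in series.items():
        if running is None:
            running = list(values)
        else:
            if len(values) != len(running):
                raise ValueError("series must have the same length")
            running = [a + b for a, b in zip(running, values)]
        layers.append((name, list(running)))
    return layers


# Hypothetical CPU-percent samples for two nodes over three intervals.
cpu = {"node-a": [10, 20, 30], "node-b": [5, 5, 40]}
for name, top_edge in stack_series(cpu):
    print(name, top_edge)
```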
![Page 29: Common Sense Performance Indicators in the Cloud](https://reader034.vdocuments.us/reader034/viewer/2022052618/54b2f5684a795972088b461b/html5/thumbnails/29.jpg)
Batch Mode Dashboard
![Page 30: Common Sense Performance Indicators in the Cloud](https://reader034.vdocuments.us/reader034/viewer/2022052618/54b2f5684a795972088b461b/html5/thumbnails/30.jpg)
CPU
![Page 31: Common Sense Performance Indicators in the Cloud](https://reader034.vdocuments.us/reader034/viewer/2022052618/54b2f5684a795972088b461b/html5/thumbnails/31.jpg)
Memory
![Page 32: Common Sense Performance Indicators in the Cloud](https://reader034.vdocuments.us/reader034/viewer/2022052618/54b2f5684a795972088b461b/html5/thumbnails/32.jpg)
Disk
![Page 33: Common Sense Performance Indicators in the Cloud](https://reader034.vdocuments.us/reader034/viewer/2022052618/54b2f5684a795972088b461b/html5/thumbnails/33.jpg)
Network
![Page 34: Common Sense Performance Indicators in the Cloud](https://reader034.vdocuments.us/reader034/viewer/2022052618/54b2f5684a795972088b461b/html5/thumbnails/34.jpg)
Web Server Dashboard
![Page 35: Common Sense Performance Indicators in the Cloud](https://reader034.vdocuments.us/reader034/viewer/2022052618/54b2f5684a795972088b461b/html5/thumbnails/35.jpg)
Web Requests
![Page 36: Common Sense Performance Indicators in the Cloud](https://reader034.vdocuments.us/reader034/viewer/2022052618/54b2f5684a795972088b461b/html5/thumbnails/36.jpg)
mod_status
![Page 37: Common Sense Performance Indicators in the Cloud](https://reader034.vdocuments.us/reader034/viewer/2022052618/54b2f5684a795972088b461b/html5/thumbnails/37.jpg)
System-Wide Dashboard
![Page 38: Common Sense Performance Indicators in the Cloud](https://reader034.vdocuments.us/reader034/viewer/2022052618/54b2f5684a795972088b461b/html5/thumbnails/38.jpg)
Per-request
![Page 39: Common Sense Performance Indicators in the Cloud](https://reader034.vdocuments.us/reader034/viewer/2022052618/54b2f5684a795972088b461b/html5/thumbnails/39.jpg)
Graph Summary
• cpu, mem, disk, net
• over time
• per node
• per process
• Throw in relevant app measures, e.g. per-request stats:
– req/sec
– median latency/req
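The two per-request measures can be sketched from a window of request records. The record shape (timestamp, latency in milliseconds) is an assumption for this example; a real setup would pull these from web server or application logs.

```python
from statistics import median


def request_stats(requests, window_seconds):
    """Summarize (timestamp, latency_ms) records collected over one window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    latencies = [latency for _, latency in requests]
    return {
        "req_per_sec": len(requests) / window_seconds,
        "median_latency_ms": median(latencies) if latencies else None,
    }


# Hypothetical records from a 10-second window.
window = [(0.5, 10), (2.1, 200), (4.0, 12), (6.3, 11), (9.8, 13)]
print(request_stats(window, 10))
```

The median is used rather than the mean so that a single slow request, like the 200 ms one here, does not hide what a typical request looks like.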
![Page 40: Common Sense Performance Indicators in the Cloud](https://reader034.vdocuments.us/reader034/viewer/2022052618/54b2f5684a795972088b461b/html5/thumbnails/40.jpg)
Ad-hoc Tools
• $ dstat -cdnml
  system characteristics
• $ iotop
  per-process disk I/O
• $ iostat -x 3
  detailed disk stats
• $ netstat -tnp
  fast, per-process TCP connection stats
![Page 41: Common Sense Performance Indicators in the Cloud](https://reader034.vdocuments.us/reader034/viewer/2022052618/54b2f5684a795972088b461b/html5/thumbnails/41.jpg)
Resources
• Perf Testing: What, How, Why
  http://www.nickgerner.com/2010/02/performance-testing-what-andhow-why/
• Perf Testing Case Study: OSE
  http://www.nickgerner.com/2010/01/performance-testing-case-study-ose/
• S3 Benchmarks
  http://twopieceset.blogspot.com/2009/06/s3-performance-benchmarks.html
• Perf Measurement
  http://twopieceset.blogspot.com/2009/03/performance-measurement-for-small-and.html
![Page 42: Common Sense Performance Indicators in the Cloud](https://reader034.vdocuments.us/reader034/viewer/2022052618/54b2f5684a795972088b461b/html5/thumbnails/42.jpg)
More Resources
• http://www.collectd.org
• http://oss.oetiker.ch/rrdtool/
• http://web.taranis.org/drraw/
• http://dag.wieers.com/home-made/dstat/
• $ man proc
![Page 43: Common Sense Performance Indicators in the Cloud](https://reader034.vdocuments.us/reader034/viewer/2022052618/54b2f5684a795972088b461b/html5/thumbnails/43.jpg)
Q: Why? A: Perf Tuning
[Cycle diagram labels: Test, Interpret, Improve, Validate, Measure]
![Page 44: Common Sense Performance Indicators in the Cloud](https://reader034.vdocuments.us/reader034/viewer/2022052618/54b2f5684a795972088b461b/html5/thumbnails/44.jpg)
Q: Why? A: System Arch
• Better Devs/Ops
• Identify Bottlenecks
• Scaling Considerations
![Page 45: Common Sense Performance Indicators in the Cloud](https://reader034.vdocuments.us/reader034/viewer/2022052618/54b2f5684a795972088b461b/html5/thumbnails/45.jpg)
Q: Why? A: Issue Investigation
• Machine Specific?
• System Wide?
• Which Component?
• Timeline?
• Cascading Failures?
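One way to approach the "machine specific or system wide?" question is to compare each node against the cluster median for the same measure. This is only a sketch: the factor of 2 is an arbitrary threshold chosen for illustration, and the node names and load values are invented.

```python
from statistics import median


def outlier_nodes(value_by_node, factor=2.0):
    """Nodes whose value exceeds `factor` times the cluster median.

    A short list hints at a machine-specific problem; an empty list while
    everything looks bad hints at something system wide.
    """
    if not value_by_node:
        return []
    typical = median(value_by_node.values())
    return sorted(name for name, value in value_by_node.items() if value > factor * typical)


# Hypothetical 1-minute load averages across an API cluster.
loads = {"api-01": 1.0, "api-02": 1.2, "api-03": 6.0}
print("suspect nodes:", outlier_nodes(loads))
```

Running the same comparison over the stored history helps with the timeline question: which node went bad first, and whether the others followed.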