Common Sense Performance Indicators in the Cloud

Common Sense Performance Indicators, Nick Gerner, June 24, 2010

Uploaded by nick-gerner

Posted on 12-Jan-2015

DESCRIPTION

Nick Gerner speaks about performance indicators and measurement tools at Velocity 2010.

TRANSCRIPT

Page 1: Common Sense Performance Indicators in the Cloud

Common Sense Performance Indicators

Nick Gerner
June 24, 2010

Goals

Common sense in the cloud is the same as outside the cloud:

1. Tune performance
2. Investigate issues
3. Visualize architecture

Nick Gerner

www.nickgerner.com
@gerner

• Formerly senior engineer at SEOmoz
• Linkscape: index of the web for SEO
• Lead data services
• Developer
• Back-end ops guy

SEOmoz

• Seattle-based startup (~7 engineers)
• SEO blog and community
• Toolset and platform

OpenSiteExplorer.org

• 300 TB/month processing pipeline
• 5 million API requests/day

SEOmoz Engineering

• 50 < nodes < 500
• AWS-based since 2008
– EC2: Linux root access to a bare VM
– S3: networked object storage
– EBS: block storage that appears as a local disk
– ELB: load balancing as a service

SEOmoz Architecture: Processing

[Diagram: The Web → Crawlers → Raw Storage → Data Pipeline (Process, Prepare)]

SEOmoz Architecture: API

[Diagram, top to bottom: Partners and SEOmoz Apps; ELB; three Lighttpd servers; three App servers; six Memcache nodes; S3]

End-to-End Performance Indicators

• Latency
• Time to On-load
• Conversion Rate
• DNS
• Web Object Count

Great...but not the focus of this talk

Performance Indicators

[Diagram: the App Stack (Front-End, Middleware, Caching, Back-end, Database, WS-API) drives, and competes for, the System Characteristics (CPU, Disk, Net, Mem)]

http://www.flickr.com/photos/dnisbet/3118888630/

/proc

• System stats
• Per-process stats
• It all comes from here

...but use tools to see it
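
As a minimal illustration of pulling per-process stats straight from /proc, the sketch below reads each /proc/<pid>/status file and reports resident memory (VmRSS) per process. It assumes Linux. The function names are hypothetical, and in day-to-day use the tools do this reading for you.

```python
import os


def read_status_field(status_text, field):
    """Return the value of a 'Field:' line in /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key == field:
            return value.strip()
    return None


def rss_kib_by_process(proc_root="/proc"):
    """Map (pid, command name) -> resident set size in KiB for each readable process."""
    result = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # numeric directories are processes; skip everything else
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                text = f.read()
        except OSError:
            continue  # the process exited or we lack permission
        name = read_status_field(text, "Name")
        rss = read_status_field(text, "VmRSS")  # e.g. "2048 kB"; absent for kernel threads
        if name is not None and rss is not None:
            result[(int(entry), name)] = int(rss.split()[0])
    return result


if __name__ == "__main__":
    # Print the five largest processes by resident memory.
    largest = sorted(rss_kib_by_process().items(), key=lambda kv: kv[1], reverse=True)[:5]
    for (pid, name), rss in largest:
        print(f"{pid:>7} {name:<20} {rss} KiB")
```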

System Characteristics

• Load Average
• CPU
• Memory
• Disk
• Network

Load Average

• Combines a few things
• Good place to start
• Explains nothing

http://www.flickr.com/photos/maple03/4176389418/
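
On Linux the load average counts runnable tasks plus tasks in uninterruptible sleep (typically waiting on disk). That is why it combines a few things and explains nothing on its own. Below is a minimal sketch, assuming Linux, that divides the /proc/loadavg figures by the CPU count. A value near 1.0 per CPU then roughly means the box is fully subscribed. The helper name is hypothetical.

```python
import os


def load_per_cpu(loadavg_text, cpu_count):
    """Parse /proc/loadavg text; return the 1-, 5- and 15-minute averages divided by CPU count."""
    one, five, fifteen = (float(x) for x in loadavg_text.split()[:3])
    return one / cpu_count, five / cpu_count, fifteen / cpu_count


if __name__ == "__main__":
    with open("/proc/loadavg") as f:
        # os.cpu_count() can return None, so fall back to 1.
        print(load_per_cpu(f.read(), os.cpu_count() or 1))
```

Python's os.getloadavg() returns the same three raw numbers on Unix systems.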

CPU

• Break out by process
• Break out user vs. system
• User, System, I/O wait, Idle

http://www.flickr.com/photos/pacdog/213442876/
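
The User, System, I/O wait and Idle split comes from the cumulative counters on the first "cpu" line of /proc/stat. The sketch below is minimal and assumes Linux. It samples those counters twice and turns the differences into percentages. Folding nice into user and irq/softirq into system is a choice made here, not something the talk specifies. Function names are hypothetical.

```python
import time


def read_cpu_counters(stat_text):
    """Return the jiffy counters from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        if line.startswith("cpu "):
            return [int(x) for x in line.split()[1:]]
    raise ValueError("no aggregate cpu line found")


def cpu_breakdown(before, after):
    """Percent of CPU time in user, system, I/O wait and idle between two samples."""
    delta = [b - a for a, b in zip(before, after)]
    # Fields: user, nice, system, idle, iowait, irq, softirq, steal (guest time is already in user).
    # Any remainder not reported below is steal: time the hypervisor gave to other VMs.
    total = sum(delta[:8])
    if total == 0:
        return {"user": 0.0, "system": 0.0, "iowait": 0.0, "idle": 0.0}
    return {
        "user": 100.0 * (delta[0] + delta[1]) / total,  # user + nice
        "system": 100.0 * (delta[2] + delta[5] + delta[6]) / total,  # system + irq + softirq
        "iowait": 100.0 * delta[4] / total,
        "idle": 100.0 * delta[3] / total,
    }


if __name__ == "__main__":
    def sample():
        with open("/proc/stat") as f:
            return read_cpu_counters(f.read())

    first = sample()
    time.sleep(1)
    print(cpu_breakdown(first, sample()))
```

Per-process CPU works the same way, using the utime and stime fields of /proc/<pid>/stat.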

Why watch it?

• Who's doing work?
• Is CPU maxed?
• Blocked on I/O?
• Compare to Load Average

Memory

• Break out by process
• Free, cached, used

http://www.flickr.com/photos/williamhook/3118248600/

Why watch it?

• Cached + Free = Available
• Do you have spare memory for:
– App use
– Memcache
– DB cache
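
Below is a minimal sketch, assuming Linux, of the Cached + Free calculation from /proc/meminfo. Treat the result as an approximation, because not every cached page can be dropped. Kernels since 3.14 also publish their own MemAvailable estimate, and the sketch returns it alongside for comparison. Function names are hypothetical.

```python
def parse_meminfo(text):
    """Map /proc/meminfo keys to their numeric value (kB for memory sizes)."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key] = int(fields[0])
    return values


def available_kib(meminfo):
    """Return (MemFree + Cached, kernel MemAvailable or None), both in kB."""
    free_plus_cached = meminfo["MemFree"] + meminfo.get("Cached", 0)
    return free_plus_cached, meminfo.get("MemAvailable")


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        print(available_kib(parse_meminfo(f.read())))
```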

Disk

• Read bytes/sec
• Write bytes/sec
• Disk utilization

http://www.flickr.com/photos/robfon/2174992215/
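
All three disk numbers can be derived from /proc/diskstats by sampling it twice. This is a minimal sketch, assuming Linux's documented layout: in each split line, zero-based columns 5, 9 and 12 hold sectors read, sectors written, and milliseconds spent doing I/O, with sectors always counted as 512 bytes. Utilization computed this way is roughly what iostat -x reports as %util. Names are hypothetical.

```python
import time

SECTOR_BYTES = 512  # /proc/diskstats counts 512-byte sectors regardless of the device


def parse_diskstats(text):
    """Map device name -> (sectors read, sectors written, ms spent doing I/O)."""
    stats = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 14:
            continue  # not a full diskstats row
        stats[f[2]] = (int(f[5]), int(f[9]), int(f[12]))
    return stats


def disk_rates(before, after, interval_s):
    """Read/write bytes per second and percent utilization per device."""
    rates = {}
    for dev, (r0, w0, t0) in before.items():
        if dev not in after:
            continue  # device disappeared between samples
        r1, w1, t1 = after[dev]
        rates[dev] = {
            "read_Bps": (r1 - r0) * SECTOR_BYTES / interval_s,
            "write_Bps": (w1 - w0) * SECTOR_BYTES / interval_s,
            "util_pct": 100.0 * (t1 - t0) / (interval_s * 1000.0),
        }
    return rates


if __name__ == "__main__":
    def sample():
        with open("/proc/diskstats") as f:
            return parse_diskstats(f.read())

    first = sample()
    time.sleep(1)
    for dev, r in disk_rates(first, sample(), 1.0).items():
        print(dev, r)
```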

Why watch it?

• Is disk busy?
• When?
• Who's using it?

Network

• Read bytes/sec
• Write bytes/sec
• Established connections

http://www.flickr.com/photos/ahkitj/20853609/
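
Below is a minimal Linux sketch of the network numbers. Per-interface byte counters come from /proc/net/dev; sample them twice and divide the difference by the interval to get bytes/sec. Established TCP connections come from /proc/net/tcp, where state code 01 means ESTABLISHED. IPv6 connections live in /proc/net/tcp6, which uses the same format. Function names are hypothetical.

```python
def parse_net_dev(text):
    """Map interface name -> (received bytes, transmitted bytes) from /proc/net/dev."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def count_established(tcp_text):
    """Count rows in /proc/net/tcp (or tcp6) whose state column is 01 (ESTABLISHED)."""
    count = 0
    for line in tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) > 3 and fields[3] == "01":
            count += 1
    return count


if __name__ == "__main__":
    with open("/proc/net/dev") as f:
        print(parse_net_dev(f.read()))
    total = 0
    for path in ("/proc/net/tcp", "/proc/net/tcp6"):
        try:
            with open(path) as f:
                total += count_established(f.read())
        except FileNotFoundError:
            pass  # e.g. IPv6 disabled
    print("established:", total)
```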

Why watch it?

• Max connections (~1024 is magic: a common default per-process open-file limit)
• Bandwidth is $$$
• When are you busy?
• SOA considerations
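
Every socket uses a file descriptor, so a process hits its open-file limit (often 1024 by default) before it hits anything network-related. Below is a minimal Linux sketch that compares this process's open descriptors with its own RLIMIT_NOFILE soft limit. For another process, /proc/<pid>/fd and /proc/<pid>/limits hold the same information. The function name is hypothetical.

```python
import os
import resource


def fd_usage(fd_dir="/proc/self/fd"):
    """Return (open descriptors, soft limit, fraction of the soft limit in use) for this process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    open_fds = len(os.listdir(fd_dir))  # includes the descriptor listdir itself opens
    if soft == resource.RLIM_INFINITY:
        return open_fds, soft, 0.0
    return open_fds, soft, open_fds / soft


if __name__ == "__main__":
    print(fd_usage())
```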

Perf Monitoring Solution

1. Data collection (collectd)
2. Data storage (rrdtool)
3. Dashboard management (drraw)

Free, and available in apt

Perf Monitoring Architecture

[Diagram: several clusters of nodes]

• Multiple clusters
• Multiple applications
• Nodes come up and go down

Perf Monitoring Architecture

[Diagram: collectd agents running on the nodes of each cluster]

• New nodes get a generic config
• Node names follow a convention according to role
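
Naming nodes by role lets the dashboards pick up new nodes without per-node setup. The talk does not give the actual scheme. This minimal sketch assumes a hypothetical <role>-<number> convention, such as api-03 or crawler-12, and maps hostnames to roles so graphs can be grouped by cluster.

```python
import re
from collections import defaultdict

# Hypothetical convention: <role>-<number>, where the role may itself contain hyphens.
NODE_NAME = re.compile(r"^(?P<role>[a-z][a-z0-9]*(?:-[a-z][a-z0-9]*)*)-(?P<index>\d+)$")


def role_of(hostname):
    """Return the role part of a conventionally named host, or 'unknown'."""
    m = NODE_NAME.match(hostname)
    return m.group("role") if m else "unknown"


def group_by_role(hostnames):
    """Group hostnames by role, sorted within each group."""
    groups = defaultdict(list)
    for h in sorted(hostnames):
        groups[role_of(h)].append(h)
    return dict(groups)
```

For example, group_by_role(["api-02", "crawler-7", "api-01"]) puts the two api nodes in one group and crawler-7 in another.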

Perf Monitoring Architecture

[Diagram: a Perf Monitoring Server collecting from the clusters]

On its own server:
• collectd server
• Web server with drraw.cgi
• Allows connections from new nodes
• Perf data backed up daily

Perf Monitoring Architecture

[Diagram: the Perf Monitoring Server collecting from the clusters]

Happy sysadmin: visibility into the system's history of performance

Perf Dashboard Features

1. Summarize nodes/systems
2. Visualize data over time
3. Stack measurements
– Per-process
– Per-node
4. Handle new nodes
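
A stacked graph draws each series on top of the ones before it, so the top edge shows the node or cluster total. The sketch below is minimal and covers only that arithmetic. It assumes equal-length series sampled at the same timestamps. Counting a missing sample (None) as zero is one hypothetical way to handle nodes that come up or go down mid-window.

```python
def stack(series):
    """Given {name: [v0, v1, ...]}, return {name: top edge of that layer} in sorted name order.

    None samples (node not yet up, or already gone) count as zero.
    """
    names = sorted(series)
    if not names:
        return {}
    running = [0.0] * len(series[names[0]])
    stacked = {}
    for name in names:
        running = [r + (v if v is not None else 0.0) for r, v in zip(running, series[name])]
        stacked[name] = running
    return stacked
```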

Batch Mode Dashboard

CPU

Memory

Disk

Network

Web Server Dashboard

Web Requests

mod_status

System-Wide Dashboard

Per-request

Graph Summary

• cpu, mem, disk, net
• over time
• per node
• per process
• Throw in relevant app measures

e.g. per-request stats:
• req/sec
• median latency/req
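
One way to put app measures on the same graphs is collectd's exec plugin. It runs a script and reads PUTVAL lines from the script's standard output. The minimal sketch below assumes the app can hand over one interval's request latencies. The plugin instance "api", the type instances, and the sample latencies are hypothetical. "gauge" is a type in collectd's default types.db, and COLLECTD_HOSTNAME and COLLECTD_INTERVAL are environment variables the exec plugin sets.

```python
import os
import statistics


def request_stats(latencies_ms, interval_s):
    """Return (requests/sec, median latency in ms) for requests finished in one interval."""
    if not latencies_ms:
        return 0.0, 0.0
    return len(latencies_ms) / interval_s, statistics.median(latencies_ms)


def putval_lines(host, interval_s, req_per_sec, median_ms):
    """Format the two measures in collectd's exec-plugin text protocol."""
    ident = f"{host}/exec-api"  # plugin "exec", hypothetical plugin instance "api"
    return [
        f'PUTVAL "{ident}/gauge-req_per_sec" interval={interval_s:g} N:{req_per_sec:g}',
        f'PUTVAL "{ident}/gauge-median_latency_ms" interval={interval_s:g} N:{median_ms:g}',
    ]


if __name__ == "__main__":
    host = os.environ.get("COLLECTD_HOSTNAME", "localhost")
    interval = float(os.environ.get("COLLECTD_INTERVAL", "10"))
    sample = [12.0, 15.0, 40.0, 9.0]  # hypothetical latencies from one interval of app logs
    for line in putval_lines(host, interval, *request_stats(sample, interval)):
        print(line, flush=True)  # collectd reads stdout line by line
```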

Ad-hoc Tools

• $ dstat -cdnml: system characteristics

• $ iotop: per-process disk I/O

• $ iostat -x 3: detailed disk stats

• $ netstat -tnp: fast, per-process TCP connection stats

Resources

• Perf Testing: What, How, Why
http://www.nickgerner.com/2010/02/performance-testing-what-andhow-why/

• Perf Testing Case Study: OSE
http://www.nickgerner.com/2010/01/performance-testing-case-study-ose/

• S3 Benchmarks
http://twopieceset.blogspot.com/2009/06/s3-performance-benchmarks.html

• Perf Measurement
http://twopieceset.blogspot.com/2009/03/performance-measurement-for-small-and.html

More Resources

• http://www.collectd.org
• http://oss.oetiker.ch/rrdtool/
• http://web.taranis.org/drraw/
• http://dag.wieers.com/home-made/dstat/
• $ man proc

Q: Why? A: Perf Tuning

[Diagram: perf tuning cycle of Test, Measure, Interpret, Improve, Validate]

Q: Why? A: System Arch

• Better devs/ops
• Identify bottlenecks
• Scaling considerations

Q: Why? A: Issue Investigation

• Machine specific?
• System wide?
• Which component?
• Timeline?
• Cascading failures?
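
Comparing one metric across nodes answers the first two questions quickly. This is a minimal sketch with a hypothetical rule: if most nodes exceed a limit, the problem looks system-wide; if only a few do, it looks machine-specific. The limit, the majority rule, and the median factor are assumptions to tune per metric.

```python
import statistics


def diagnose(metric_by_node, limit):
    """Classify a per-node metric as 'system-wide', 'machine-specific' or 'ok' against a limit."""
    over = sorted(node for node, value in metric_by_node.items() if value > limit)
    if len(over) > len(metric_by_node) / 2:
        return "system-wide", over
    if over:
        return "machine-specific", over
    return "ok", []


def outliers(metric_by_node, factor=2.0):
    """Nodes whose value exceeds `factor` times the cluster median (a relative check, no fixed limit)."""
    median = statistics.median(metric_by_node.values())
    return sorted(node for node, value in metric_by_node.items() if value > factor * median)
```

For example, diagnose({"api-01": 95, "api-02": 30, "api-03": 25}, limit=80) points at api-01 alone. Next, look at that node's timeline to see whether its spike preceded trouble elsewhere.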