intro to linux performance analysis
DESCRIPTION
LOPSA SD 2014.03.27 Presentation on Linux Performance Analysis. An introduction using the USE method and showing how several tools fit into those resource evaluations.
TRANSCRIPT
Intro to Linux Performance Analysis
Chris McEniry
LOPSA-SD March 27, 2014
Me
• Systems Architect
• Sony Network Entertainment
• 18 years running stuff
• Majority of the last 14 years: medium-large Internet services
Read this book…
And look here:
http://www.brendangregg.com/
http://www.brendangregg.com/methodology.html
http://www.brendangregg.com/Slides/LISA2012_methodologies.pdf
http://www.amazon.com/Systems-Performance-Enterprise-Brendan-Gregg/dp/0133390098
The website is down!!! It’s just too slow! The DB is too slow! The disk is too slow!
SLOW!!!
http://farm4.staticflickr.com/3190/2976755407_6a6a574596_o.jpg
SLOW!!!!
• What does slow mean anyways?
• Is it not transferring fast enough?
• Is it handling too many requests, or not enough?
http://commons.wikimedia.org/wiki/File:United_States_sign_-_Slow_Traffic_Ahead.svg
Slow can mean…
• Latency: How long it takes
• ms, s, request time, etc
• Throughput: How much can happen in a given amount of time
• bandwidth, IOPS, rps, tps, etc
http://upload.wikimedia.org/wikipedia/commons/2/2e/Miniature_DNF_Dictionary_055_ubt.JPG
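To make the two meanings of "slow" concrete, here is a minimal sketch that computes both from the same data. The function name `summarize` and the request timings are hypothetical, made up for illustration.

```python
from statistics import mean


def summarize(requests):
    """Summarize a list of (start_s, end_s) tuples, one per completed request."""
    latencies = [end - start for start, end in requests]
    # Observation window: first start to last finish.
    window = max(end for _, end in requests) - min(start for start, _ in requests)
    return {
        "mean_latency_s": mean(latencies),          # how long each one takes
        "max_latency_s": max(latencies),
        "throughput_rps": len(requests) / window,   # how many per second
    }


# Hypothetical timings: four requests over a two-second window.
SAMPLE_REQUESTS = [(0.0, 0.5), (0.0, 1.0), (1.0, 1.5), (1.0, 2.0)]
print(summarize(SAMPLE_REQUESTS))
```

The same set of requests can look fine on one number and bad on the other, which is why it helps to ask which kind of "slow" is being reported.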
Slowness comes from…
• Full utilization of a resource
• Waiting in a saturated queue
• Generated errors!
• The USE Method
http://farm6.staticflickr.com/5181/5614813544_a30d693a50_o.jpg
Utilization
• You have fully used up what’s been allocated
• aka 5 lb bag
http://farm3.staticflickr.com/2524/4000641774_3331fe06fb_o.jpg
Saturation
• Waiting for someone else to get done so you can do yours
• Typically because a resource is fully utilized, but not necessarily directly
http://www.fotocommunity.com/pc/pc/display/30396619
Errors
• Dropped packets
• Incorrect responses
• Deadlocks
• Timeouts
• Not all failures fail fast
http://farm8.staticflickr.com/7001/6509400855_aaaf915871_b.jpg
How do we determine?
• Different types of tools for different examinations
• Depends on what you’re looking for (which can be a problem in and of itself)
http://farm5.staticflickr.com/4083/5086955738_61f6455ace_b.jpg
Resource vs Transaction
• Do you care if…
• a CPU is maxed out?
• processes are blocked?
• packets are lost?
• or if…
• a user’s request fails?
• a user gives up on waiting for a response?
Maturity
• Using tracing tools, especially in production, requires a level of maturity
• I’m not that mature… ;)
• No, really just focusing on the basics first
http://upload.wikimedia.org/wikipedia/commons/b/bd/OFLC_large_R18%2B.svg
http://image.slidesharecdn.com/scalelinuxperformance-130224171331-phpapp01/95/slide-15-638.jpg?cb=1362166290
http://image.slidesharecdn.com/scalelinuxperformance-130224171331-phpapp01/95/slide-16-638.jpg?cb=1362166290
General
?
/var/log/messages
Errors (mostly - sometimes stats go here)
/var/log/messages
CPU
?
uptime
Saturation of the scheduler
uptime
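A rough sketch of reading the same numbers `uptime` shows and turning them into a saturation hint. It assumes the `/proc/loadavg` layout (three load averages first); the sample text and the function names are hypothetical. Note that on Linux the load average also counts tasks in uninterruptible sleep, so a high value is not always CPU scheduler pressure.

```python
import os


def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg text."""
    fields = text.split()
    return float(fields[0]), float(fields[1]), float(fields[2])


def load_per_cpu(load, cpu_count):
    """Load divided by CPU count; values above 1.0 suggest tasks are queueing."""
    return load / cpu_count


# Hypothetical sample; on a live box use open("/proc/loadavg").read() instead.
SAMPLE_LOADAVG = "8.00 4.00 2.00 3/250 12345\n"

one, five, fifteen = parse_loadavg(SAMPLE_LOADAVG)
cpus = os.cpu_count() or 1  # cpu_count() may return None
print(f"1-minute load per CPU: {load_per_cpu(one, cpus):.2f}")
```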
?
top
top
Saturation
Utilization
Memory
?
free
Utilization
free
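As an illustration of the utilization figure behind `free`, this sketch parses `/proc/meminfo`-style text. The sample values and function names are hypothetical, and the fallback (free + buffers + cached) is an assumption for kernels that do not report `MemAvailable`; it only approximates reclaimable memory.

```python
def parse_meminfo(text):
    """Map each /proc/meminfo key to its integer value (kB for most keys)."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])
    return values


def memory_utilization(info):
    """Fraction of memory that is not readily available to new work."""
    total = info["MemTotal"]
    if "MemAvailable" in info:
        available = info["MemAvailable"]
    else:
        # Assumption: treat buffers and page cache as reclaimable.
        available = info["MemFree"] + info.get("Buffers", 0) + info.get("Cached", 0)
    return (total - available) / total


# Hypothetical sample; on a live box use open("/proc/meminfo").read() instead.
SAMPLE_MEMINFO = """MemTotal:        8000000 kB
MemFree:         1000000 kB
Buffers:          500000 kB
Cached:          2500000 kB
"""

print(f"memory utilization: {memory_utilization(parse_meminfo(SAMPLE_MEMINFO)):.0%}")
```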
?
vmstat
vmstat
Saturation
Utilization
Counts
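One way to read memory saturation out of vmstat-style counters is to watch swap activity between two samples (the `si`/`so` columns in `vmstat`). The sketch below assumes the `pswpin`/`pswpout` counters from `/proc/vmstat`; the sample numbers and function names are hypothetical.

```python
def parse_vmstat(text):
    """Parse /proc/vmstat-style 'name value' lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            counters[parts[0]] = int(parts[1])
    return counters


def swap_rates(before, after, interval_s):
    """Pages swapped in and out per second between two counter samples."""
    return {
        name: (after[name] - before[name]) / interval_s
        for name in ("pswpin", "pswpout")
    }


# Hypothetical samples taken 5 seconds apart.
SAMPLE_VMSTAT_BEFORE = "pswpin 100\npswpout 200\n"
SAMPLE_VMSTAT_AFTER = "pswpin 150\npswpout 400\n"

rates = swap_rates(
    parse_vmstat(SAMPLE_VMSTAT_BEFORE), parse_vmstat(SAMPLE_VMSTAT_AFTER), 5
)
print(rates)
```

Sustained non-zero rates are the interesting signal; a single burst may just be old pages being paged back in.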
?
slabtop
Utilization
slabtop
Disk
?
df
Utilization
df
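The capacity figure `df` reports can be approximated from Python's standard library. This is a minimal sketch; `disk_utilization` is a hypothetical helper name. The ratio used here (used / total) can differ slightly from the Use% column in `df`, which accounts for blocks reserved for root.

```python
import shutil


def disk_utilization(path="."):
    """Fraction of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if usage.total == 0:
        return 0.0  # pseudo-filesystems can report zero size
    return usage.used / usage.total


print(f"filesystem utilization for current directory: {disk_utilization():.0%}")
```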
?
iostat -x
You may be able to derive additional utilization figures if you know the device's maximum r/s or w/s, but these are less clear-cut because the maximum depends on the properties of the device and workload.
iostat -x
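For the busy-time view of disk utilization (the `%util` column in `iostat -x`), a sketch can diff the "time spent doing I/Os" counter in `/proc/diskstats` across an interval. The sample lines, the device name `sda`, and the function names are hypothetical. The r/s-over-maximum estimate mentioned on the slide would additionally need a device-specific maximum, which is not assumed here.

```python
def parse_diskstats(text):
    """Map device name -> milliseconds spent doing I/O, from /proc/diskstats text."""
    ticks = {}
    for line in text.splitlines():
        fields = line.split()
        # Layout: major, minor, name, then at least 11 counters;
        # the 10th counter (index 12) is time spent doing I/Os in ms.
        if len(fields) >= 14:
            ticks[fields[2]] = int(fields[12])
    return ticks


def busy_fraction(before_ms, after_ms, interval_s):
    """Share of the interval during which the device had I/O in flight."""
    return (after_ms - before_ms) / (interval_s * 1000.0)


# Hypothetical samples taken 4 seconds apart.
SAMPLE_DISK_BEFORE = "   8       0 sda 1000 10 20000 500 2000 20 40000 1500 0 1200 2000\n"
SAMPLE_DISK_AFTER = "   8       0 sda 1400 12 28000 900 2600 25 52000 2300 0 3200 4100\n"

before = parse_diskstats(SAMPLE_DISK_BEFORE)["sda"]
after = parse_diskstats(SAMPLE_DISK_AFTER)["sda"]
print(f"sda busy: {busy_fraction(before, after, 4):.0%}")
```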
IO (Network)
?
ping
Errors
ping
?
netstat
Saturation
netstat
?
netstat -s
Errors
netstat -s
?
ifconfig
ifconfig
Saturation
Utilization
Errors
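The error and drop counters that `ifconfig` prints are also exposed in `/proc/net/dev`. The sketch below pulls them out per interface; the sample text, the interface names, and the function name are hypothetical, and the column positions assume the usual 16-counter layout (8 receive, 8 transmit).

```python
def parse_net_dev(text):
    """Map interface name -> error and drop counters from /proc/net/dev text."""
    stats = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # the two header lines contain no colon
        fields = rest.split()
        if len(fields) < 16:
            continue
        stats[name.strip()] = {
            "rx_errs": int(fields[2]),
            "rx_drop": int(fields[3]),
            "tx_errs": int(fields[10]),
            "tx_drop": int(fields[11]),
        }
    return stats


# Hypothetical sample; on a live box use open("/proc/net/dev").read() instead.
SAMPLE_NET_DEV = """Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo: 1000 10 0 0 0 0 0 0 1000 10 0 0 0 0 0 0
  eth0: 500000 4000 2 5 0 0 0 0 300000 3000 1 0 0 0 0 0
"""

for iface, counters in parse_net_dev(SAMPLE_NET_DEV).items():
    print(iface, counters)
```

These are cumulative counters since boot, so the useful signal is usually whether they grow between two readings.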
What are your examples?
http://upload.wikimedia.org/wikipedia/commons/f/f3/Uncle_Sam_(pointing_finger).jpg
Applications
Running out of Apache Threads
• Lots of incoming requests
• Apache hits ServerLimit of threads (Utilization!)
• Requests start to get stuck in TCP backlog (Saturation!)
• Apache endpoints are removed from load balancers (Error!)
• Fail!
http://upload.wikimedia.org/wikipedia/commons/9/96/Colorful_Threads_(3965274345).jpg
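The progression above (utilization, then saturation, then errors) can be reproduced with a toy model. This is a deliberately simplified sketch, not how Apache or the kernel actually behaves: it assumes a fixed worker pool, a bounded backlog, a constant arrival rate, and a constant service time, and it counts a request as an error when the backlog is full. All names and parameters are hypothetical.

```python
from collections import deque


def simulate(arrivals_per_tick, workers, backlog, service_ticks, ticks):
    """Return one (utilization, queue_depth, dropped_total) tuple per tick."""
    busy = []        # remaining ticks for each in-flight request
    queue = deque()  # requests waiting in the backlog
    dropped = 0
    history = []
    for _tick in range(ticks):
        # Workers make progress; requests with one tick left finish now.
        busy = [remaining - 1 for remaining in busy if remaining > 1]
        # New arrivals wait in the backlog, or are dropped if it is full.
        for _request in range(arrivals_per_tick):
            if len(queue) < backlog:
                queue.append(1)
            else:
                dropped += 1
        # Idle workers pick up queued requests.
        while queue and len(busy) < workers:
            queue.popleft()
            busy.append(service_ticks)
        history.append((len(busy) / workers, len(queue), dropped))
    return history


# Arrivals (3 per tick) exceed capacity (4 workers / 2 ticks = 2 per tick).
for step, (util, depth, drops) in enumerate(simulate(3, 4, 5, 2, 6), start=1):
    print(f"tick {step}: utilization={util:.0%} backlog={depth} dropped={drops}")
```

In this model the pool fills first, the backlog starts to grow next, and drops appear only after the backlog is full, which mirrors the order of the bullets.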
Cold DB Start
• DBs like to be in memory, but can’t start that way
• All data requests go to disk (which is SAN backed)
• SAN controller CPU gets maxed out (Utilization!)
• HBA queues get deep (Saturation!)
• Requests timeout (Error!)
• Fail!
Summary
Methods > Tools
• Don’t let tools get in the way of solutions
• It’s easy to think that all you’re missing is a tool.
• But are you actually following a method to your performance madness?
http://upload.wikimedia.org/wikipedia/commons/6/6d/Three_Card_Monte.jpg
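One way to keep the method ahead of the tools is to write the USE checklist down as data and look for the gaps. The sketch below uses the tool labels as they appear in these slides; placing `iostat -x` under utilization is an assumption based on the slide note, and the structure and function name are hypothetical. The gaps reflect what the slides label, not everything the tools can report.

```python
METRICS = ("utilization", "saturation", "errors")

# Resource -> metric -> tools, following the labels in this deck.
CHECKLIST = {
    "general": {"errors": ["/var/log/messages"]},
    "cpu": {"utilization": ["top"], "saturation": ["uptime", "top"]},
    "memory": {
        "utilization": ["free", "vmstat", "slabtop"],
        "saturation": ["vmstat"],
    },
    "disk": {"utilization": ["df", "iostat -x"]},  # iostat placement is an assumption
    "network": {
        "utilization": ["ifconfig"],
        "saturation": ["netstat", "ifconfig"],
        "errors": ["ping", "netstat -s", "ifconfig"],
    },
}


def find_gaps(checklist):
    """List (resource, metric) pairs that have no tool assigned yet."""
    return [
        (resource, metric)
        for resource, tools in checklist.items()
        for metric in METRICS
        if not tools.get(metric)
    ]


for resource, metric in find_gaps(CHECKLIST):
    print(f"no tool listed for {resource} {metric}")
```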
Anti-Methods
• Blame Someone Else
• Streetlight
• Drunk Man
• Random Change
• Passive Benchmark
• Don’t do these…
http://www.brendangregg.com/methodology.html
http://upload.wikimedia.org/wikipedia/commons/a/af/Villainc.svg
Methods
• Ad Hoc Checklist
• Problem Statement
• Scientific
• Workload Characterization
• Drill-down Analysis
• By-layer
• Latency Analysis
• Tools
• Stack Profile
• Off-CPU Analysis
• Thread State Analysis
• Active Benchmark
http://www.brendangregg.com/methodology.html
http://memegenerator.net/instance/9192015
Linux Performance Tools
Chris McEniry LOPSA-SD
March 27, 2014