Lab: JVM Production Debugging 101
DESCRIPTION
A lab given at the Reversim Summit on 19 February 2013. http://summit2013.reversim.com/#/sessions/Lab:%20Java%20Production%20Debugging%20101 The code for the sample scenarios can be found on GitHub: https://github.com/holograph/examples/tree/master/reversim-proddbg-lab
TRANSCRIPT
![Page 1: Lab: JVM Production Debugging 101](https://reader035.vdocuments.us/reader035/viewer/2022081515/554f6c07b4c9058a148b4f86/html5/thumbnails/1.jpg)
Java Production Debugging 101
A Reversim Summit Lab, February 2013
![Page 2: Lab: JVM Production Debugging 101](https://reader035.vdocuments.us/reader035/viewer/2022081515/554f6c07b4c9058a148b4f86/html5/thumbnails/2.jpg)
PRODUCTION DEBUGGING
= FORENSICS
![Page 3: Lab: JVM Production Debugging 101](https://reader035.vdocuments.us/reader035/viewer/2022081515/554f6c07b4c9058a148b4f86/html5/thumbnails/3.jpg)
Business Requirements

| Requirements | Prod. Debugging | Forensics |
|---|---|---|
| Timeframe | Severely limited | Hours, days, weeks… |
| Chain of Custody | Meaningless | Sacred |
| Documentation | Useful | Sacred |
![Page 4: Lab: JVM Production Debugging 101](https://reader035.vdocuments.us/reader035/viewer/2022081515/554f6c07b4c9058a148b4f86/html5/thumbnails/4.jpg)
Endgame
Production Debugging Forensics
1. Gather evidence 1. Identify crime in progress
2. Restore functionality 2. Gather evidence
3. Figure out what happened
![Page 5: Lab: JVM Production Debugging 101](https://reader035.vdocuments.us/reader035/viewer/2022081515/554f6c07b4c9058a148b4f86/html5/thumbnails/5.jpg)
Our Forensic Process
Gather Evidence
Restore Production
Analyze Findings
Implement Solution
Post-Mortem
![Page 6: Lab: JVM Production Debugging 101](https://reader035.vdocuments.us/reader035/viewer/2022081515/554f6c07b4c9058a148b4f86/html5/thumbnails/6.jpg)
Evidence toolchain
![Page 7: Lab: JVM Production Debugging 101](https://reader035.vdocuments.us/reader035/viewer/2022081515/554f6c07b4c9058a148b4f86/html5/thumbnails/7.jpg)
WHAT SHALL WE COLLECT?
![Page 8: Lab: JVM Production Debugging 101](https://reader035.vdocuments.us/reader035/viewer/2022081515/554f6c07b4c9058a148b4f86/html5/thumbnails/8.jpg)
Our focus points for today
• Thread dump
• Heap dump
• VM (especially GC) metrics
• System metrics
• Logs
![Page 9: Lab: JVM Production Debugging 101](https://reader035.vdocuments.us/reader035/viewer/2022081515/554f6c07b4c9058a148b4f86/html5/thumbnails/9.jpg)
jstack
• Minimalistic tool
• Against a running process:
jstack <pid>
• Outputs to stdout
• Identifies deadlocks
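For cases where jstack cannot be attached, roughly the same information is available from inside the process through the standard `java.lang.management` API. The following is a minimal sketch, not part of the lab's sample code; the class and method names (`ThreadDumpSketch`, `dumpAllThreads`, `findDeadlocks`) are made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: an in-process approximation of what jstack prints.
class ThreadDumpSketch {

    /** Renders every live thread, including lock details, as text. */
    static String dumpAllThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // true, true = include locked monitors and ownable synchronizers
        for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
            // Note: ThreadInfo.toString() truncates deep stack traces
            out.append(info);
        }
        return out.toString();
    }

    /** Returns the IDs of deadlocked threads, or an empty array if there are none. */
    static long[] findDeadlocks() {
        long[] ids = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
        return ids == null ? new long[0] : ids;
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
        System.out.println("Deadlocked threads: " + findDeadlocks().length);
    }
}
```

Unlike jstack, this only works while the application is still responsive enough to run the code, so it complements the command-line tool rather than replacing it.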
![Page 10: Lab: JVM Production Debugging 101](https://reader035.vdocuments.us/reader035/viewer/2022081515/554f6c07b4c9058a148b4f86/html5/thumbnails/10.jpg)
jmap
• Heap-dump from a running process
  – Lengthy process
  – Freezes VM
• Some extras
• Command:
jmap -dump:format=b,file=<output> <pid>
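On HotSpot-based JVMs the same kind of dump can also be triggered from inside the process. This is a hedged sketch that assumes the HotSpot-specific `com.sun.management.HotSpotDiagnosticMXBean` is available; the class name `HeapDumpSketch` is hypothetical, and the same caveats apply as for jmap (it takes time and pauses the VM).

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;

// Hypothetical helper: in-process counterpart of "jmap -dump" (HotSpot only).
class HeapDumpSketch {

    /**
     * Writes an HPROF heap dump of the current JVM to the given path.
     * The target file must not exist yet; recent JDKs also expect a ".hprof" suffix.
     */
    static void dumpHeap(String path, boolean liveObjectsOnly) {
        HotSpotDiagnosticMXBean diagnostics =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            diagnostics.dumpHeap(path, liveObjectsOnly);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: HeapDumpSketch <output.hprof>");
            return;
        }
        dumpHeap(args[0], true);
        System.out.println("Heap dump written to " + args[0]);
    }
}
```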
![Page 11: Lab: JVM Production Debugging 101](https://reader035.vdocuments.us/reader035/viewer/2022081515/554f6c07b4c9058a148b4f86/html5/thumbnails/11.jpg)
jstat
• JVM metrics: classloader, JIT, GC
• Tracking over time
• Console-based
• jstat -gcutil <pid> 5s
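The counters jstat reports have rough in-process equivalents in the standard management beans. The sketch below is illustrative only (the class name `GcMetricsSketch` is made up) and prints cumulative collection counts and times per collector, which can be sampled periodically to get a trend similar to the `5s` interval above.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper: a single sample of heap usage and GC counters.
class GcMetricsSketch {

    /** Returns heap usage plus per-collector cumulative count and time. */
    static String snapshot() {
        StringBuilder out = new StringBuilder();
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        out.append("heap used=").append(heap.getUsed())
           .append(" committed=").append(heap.getCommitted()).append('\n');
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both values are cumulative since JVM start; -1 means "not available"
            out.append(gc.getName())
               .append(" count=").append(gc.getCollectionCount())
               .append(" timeMs=").append(gc.getCollectionTime()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(snapshot());
    }
}
```

Collector names differ between GC algorithms and JDK versions, so any monitoring built on this should not hard-code them.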
![Page 12: Lab: JVM Production Debugging 101](https://reader035.vdocuments.us/reader035/viewer/2022081515/554f6c07b4c9058a148b4f86/html5/thumbnails/12.jpg)
The JVM GC
![Page 13: Lab: JVM Production Debugging 101](https://reader035.vdocuments.us/reader035/viewer/2022081515/554f6c07b4c9058a148b4f86/html5/thumbnails/13.jpg)
jvisualvm
• Combines most of the above, with GUI
• Remote via X11 forwarding (dreadful!)
![Page 14: Lab: JVM Production Debugging 101](https://reader035.vdocuments.us/reader035/viewer/2022081515/554f6c07b4c9058a148b4f86/html5/thumbnails/14.jpg)
So… SHALL WE DANCE?
![Page 15: Lab: JVM Production Debugging 101](https://reader035.vdocuments.us/reader035/viewer/2022081515/554f6c07b4c9058a148b4f86/html5/thumbnails/15.jpg)
Scenario 1
• Phone call in the middle of the night
  – “The application is stuck!”
• What do you do?
![Page 16: Lab: JVM Production Debugging 101](https://reader035.vdocuments.us/reader035/viewer/2022081515/554f6c07b4c9058a148b4f86/html5/thumbnails/16.jpg)
Scenario 2
• Looks familiar?
  – “The application is crawling to a halt!”
  – “So restart it.”
  – “OK, it’s good now.”
• This is a lie.
  – You will get another call.
![Page 17: Lab: JVM Production Debugging 101](https://reader035.vdocuments.us/reader035/viewer/2022081515/554f6c07b4c9058a148b4f86/html5/thumbnails/17.jpg)
Scenario 3
• 1st tier support engineer (maybe you?) calls:
  – “I get OutOfMemoryErrors on this service.”
  – “Restart it.”
  – “Already have. Happened again.”
  – “Well, shit.”
![Page 18: Lab: JVM Production Debugging 101](https://reader035.vdocuments.us/reader035/viewer/2022081515/554f6c07b4c9058a148b4f86/html5/thumbnails/18.jpg)
BREAK TIME!
![Page 19: Lab: JVM Production Debugging 101](https://reader035.vdocuments.us/reader035/viewer/2022081515/554f6c07b4c9058a148b4f86/html5/thumbnails/19.jpg)
FORENSIC TOOLCHAIN
Without further ado…
![Page 20: Lab: JVM Production Debugging 101](https://reader035.vdocuments.us/reader035/viewer/2022081515/554f6c07b4c9058a148b4f86/html5/thumbnails/20.jpg)
GNU toolchain is your friend
• bash, ps, grep, less, awk
  – ‘nuff said
• … or:
  – http://gnuwin32.sourceforge.net/
![Page 21: Lab: JVM Production Debugging 101](https://reader035.vdocuments.us/reader035/viewer/2022081515/554f6c07b4c9058a148b4f86/html5/thumbnails/21.jpg)
MAT
• Eclipse plugin/standalone
• Reads heap dumps
• Easy drill-down
![Page 22: Lab: JVM Production Debugging 101](https://reader035.vdocuments.us/reader035/viewer/2022081515/554f6c07b4c9058a148b4f86/html5/thumbnails/22.jpg)
And most important…
![Page 23: Lab: JVM Production Debugging 101](https://reader035.vdocuments.us/reader035/viewer/2022081515/554f6c07b4c9058a148b4f86/html5/thumbnails/23.jpg)
RESOLUTION TIME!
![Page 24: Lab: JVM Production Debugging 101](https://reader035.vdocuments.us/reader035/viewer/2022081515/554f6c07b4c9058a148b4f86/html5/thumbnails/24.jpg)
Back to: Scenario 1
• What did we gather?
  – CPU: 100% single-core utilization
  – GC metrics: no useful data
  – Heap dump: no useful data
  – Thread dump
    • java.util.regex * gazillion
• Where the problem is implies… what the problem is
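A thread pinned inside `java.util.regex` at 100% CPU is often a sign of excessive backtracking. The transcript does not show the lab's actual pattern, so the sketch below uses a made-up one, `(a+)+b`, and a hypothetical `CountingSequence` wrapper that counts how many characters the engine reads. How fast the count grows with input length depends on the JDK version, so no specific numbers are claimed here.

```java
import java.util.regex.Pattern;

// Hypothetical demo: count regex engine work for a pattern with nested quantifiers.
class RegexBacktrackingSketch {

    /** CharSequence wrapper that counts how often the regex engine reads a character. */
    static final class CountingSequence implements CharSequence {
        private final String text;
        long reads;

        CountingSequence(String text) { this.text = text; }

        @Override public int length() { return text.length(); }
        @Override public char charAt(int index) { reads++; return text.charAt(index); }
        @Override public CharSequence subSequence(int start, int end) {
            return text.subSequence(start, end);
        }
        @Override public String toString() { return text; }
    }

    // Assumed pattern, not taken from the lab: nested quantifiers give the
    // engine many ways to split a run of 'a's before it can report a failure.
    static final Pattern NESTED = Pattern.compile("(a+)+b");

    /** Matches a run of 'a's (which can never match) and returns the number of character reads. */
    static long readsForInput(int length) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < length; i++) {
            sb.append('a');
        }
        CountingSequence input = new CountingSequence(sb.toString());
        if (NESTED.matcher(input).matches()) {
            throw new IllegalStateException("input has no 'b', so it should never match");
        }
        return input.reads;
    }

    public static void main(String[] args) {
        for (int n = 5; n <= 20; n += 5) {
            System.out.println(n + " chars -> " + readsForInput(n) + " reads");
        }
    }
}
```

Input lengths are kept small on purpose: on engines without mitigation the work can roughly double with each extra character.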
![Page 25: Lab: JVM Production Debugging 101](https://reader035.vdocuments.us/reader035/viewer/2022081515/554f6c07b4c9058a148b4f86/html5/thumbnails/25.jpg)
Back to: Scenario 2
• What did we gather?
  – CPU: 100% single-core utilization
  – Heap dump: no useful data
  – Thread dump
  – GC metrics
    • Frequent, long GCs (GC, FGC, FGCT)
• Rapid HashMap insertions: recipe for disaster
![Page 26: Lab: JVM Production Debugging 101](https://reader035.vdocuments.us/reader035/viewer/2022081515/554f6c07b4c9058a148b4f86/html5/thumbnails/26.jpg)
Back to: Scenario 3
• What did we gather?
  – CPU: low utilization
  – Thread dump: no useful data
  – GC metrics: high heap utilization, low GC
  – Heap dump
    • Predictably high number of strings
    • Strings are abnormally large
    • Strings contain entire HTML subset!
• Substring/regex can be dangerous!
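One way small strings can end up pinning large ones: on JDKs before 7u6, `String.substring` (and therefore `Matcher.group`) returned a string that shared the original's backing `char[]`, so keeping a short extract kept the whole page alive. The sketch below is an assumption about the kind of code involved, not the lab's actual sample; `SubstringRetentionSketch`, `extractTitle`, and the `<title>` pattern are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example: extract a short value from a large HTML page.
class SubstringRetentionSketch {

    private static final Pattern TITLE = Pattern.compile("<title>(.*?)</title>");

    /** Returns the page title, or null if the page has none. */
    static String extractTitle(String html) {
        Matcher matcher = TITLE.matcher(html);
        if (!matcher.find()) {
            return null;
        }
        // On JDKs before 7u6 the group shares the page's char[]; new String(...)
        // copies only the characters needed. On later JDKs substring already
        // copies, so this is merely a redundant (harmless) copy.
        return new String(matcher.group(1));
    }

    public static void main(String[] args) {
        System.out.println(extractTitle("<html><title>Hello</title><body>...</body></html>"));
    }
}
```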
![Page 27: Lab: JVM Production Debugging 101](https://reader035.vdocuments.us/reader035/viewer/2022081515/554f6c07b4c9058a148b4f86/html5/thumbnails/27.jpg)
AFTERWORD
Headache? Take two of these!
![Page 28: Lab: JVM Production Debugging 101](https://reader035.vdocuments.us/reader035/viewer/2022081515/554f6c07b4c9058a148b4f86/html5/thumbnails/28.jpg)
Adieu
• Thank you for attending!
• Presentation and demos:
http://git.io/7LK4fw
• Tomer Gabel
  – [email protected]
  – http://www.tomergabel.com/
  – @tomerg
![Page 29: Lab: JVM Production Debugging 101](https://reader035.vdocuments.us/reader035/viewer/2022081515/554f6c07b4c9058a148b4f86/html5/thumbnails/29.jpg)
Thank you
our sponsors