Performance analysis and troubleshooting using DTrace
Post on 12-Apr-2017
“I keep six honest serving-men (They taught me all I knew); Their names are What and Why and When And How and Where and Who.” - Rudyard Kipling
Graeme Jenkinson - Computer Laboratory
Today’s lecture
Overview of tracing and profiling
DTrace: Dynamic Instrumentation of Production Systems
Performance analysis with DTrace - case study
Tracing and profiling on Linux
Wrap up and questions
Overview of tracing and profiling
• A profile provides an inventory of performance events and timings for the execution as a whole:
• Sampling at a fixed time interval
• Aggregated results - e.g. requests per second
• Absolute chronology of events is ignored
• A trace records the chronology, often with timestamps:
• Absolute chronology of events is preserved
• Trace grows at runtime - must be periodically written out for processing
• Can be used to construct a profile
Focus on tracing
Who did what to whom?
Tracing software as a black box
• print statements:
• printf("Here");
• #ifndef NDEBUG
  printf("State: Connected");
  #endif
• Logging frameworks:
• info!("State: connected");
• logger.error("Failed to write file");
Can declaratively turn on/off logging at compile time
Can declaratively turn on/off logging at runtime
Tracing software as a black box - summary
Print statements
• Pros
  • As simple as it gets
  • Declarative control at compile time
• Cons
  • Adding tracing requires a recompile (in some cases)
  • Ad-hoc adoption
  • Inflexible (requiring compile-time decisions)
Logging frameworks
• Pros
  • Declarative control at runtime or compile time
  • Finer-grained control of logging (per class/module)
  • Structured output
  • Flexibility
• Cons
  • A tiny bit more effort to adopt
  • Added dependency to the codebase
Logging can only get you so far
• Errors frequently the result of complex interactions across the software stack
• Logging limited to a single layer of the software stack
• Logging frameworks don’t capture the events you want to see
Debuggers
• When logging really doesn’t cut it we often break out the debugger
• Set breakpoints, single step code, watch variables/registers
• Debuggers often struggle when facing certain types of problems:
• Heisenbug: “a software bug that seems to disappear or alter its behaviour when one attempts to study it"
Probe effects
• Probe effect - act of measuring disturbs system
• Electronics - probes introduce additional capacitance, resistance or inductance
• Software tracing - probes take time to execute
• Bugs from subtle timing effects are common; probe effects can rarely be ignored
Observability/transparency
• Rather than treating the system as a black box we need observability into the system when executing
• What’s needed is more flexible, less intrusive tools; thankfully there are plenty of them
• Do these tools have a probe effect?
Observability/transparency tools
Observability tools: top
NAME
top - display Linux processes
top (table of processes) - reads kernel state from /proc filesystem (on Linux, FreeBSD); procfs is a pseudo-filesystem that presents information about processes and other system information in a hierarchical filesystem-like structure.
Note: procfs doesn’t come free (probe effect). What operations need to be performed?
Observability tools: strace (dtruss Mac OS X, truss FreeBSD)
Linux
strace - trace system calls and signals
Mac OS X (requires disabling System Integrity Protection - El Capitan+)
dtruss - process system call details. Uses DTrace.
FreeBSD
truss - trace system calls
Utilization, Saturation and Errors (USE)
• Utilization: the average time that the resource was busy servicing work
• Saturation: the degree to which the resource has extra work which it can't service, often queued
• Errors: the error count
• Resources:
• CPUs: sockets, cores, hardware threads (virtual CPUs)
• Memory: capacity
• Network interfaces
• Storage devices: I/O, capacity
• Controllers: storage, network cards
• Interconnects: CPUs, memory, I/O
Layered architectures and observability
• System performance problems are typically introduced at the highest layers of abstraction
• Effects are often first encountered and attributed at the lowest layers of abstraction
• Observability is required across all layers of the stack
In production analysis
• As software engineers we graduate from screwing up our own systems to screwing up other people’s
• In production tracing must be safe
• Probe effect must be zero when disabled - “low” when tracing
DTrace: one tool to rule them all
• Developed by Bryan Cantrill, Mike Shapiro, and Adam Leventhal at Sun Microsystems
• Released in 2005 by Sun Microsystems
• “Performance analysis infrastructure has generally not kept pace with the shift to in-production performance analysis” - Cantrill et al
Illumos, FreeBSD, NetBSD, MacOS X
Linux (ish), Windows (yuck)
DTrace overview (Cantrill ’04)
• Dynamic instrumentation
• Zero probe effect when disabled
• Unified instrumentation
• Instruments both user and kernel level software
• Arbitrary-context kernel instrumentation
• Can instrument any aspect of the kernel including delicate sub-systems such as the scheduler and synchronisation facilities
As a result probes can’t block
What is the impact when enabled?
DTrace overview (cont)
• Data integrity
• Records may be dropped under load - but never silently
• High-level control language
• Scriptable “arbitrary” actions on probe firing
• Predicates that determine whether a probe fires
• Scripts executed by in-kernel virtual machine
• Production safe
• Aborts when system is heavily loaded
• Scalable in kernel aggregations (profiling)
How is safety preserved?
DTrace overview - anatomy of a dynamic tracer
Consumer
Compiler
fbt::vm_fault:entry
{
	fault_count += 1;
}
Probes
Userspace
Buffers (per-CPU)
Kernel
Results
D script
DTrace providers
• fbt: Function Boundary Tracing
• syscall: System Calls
• pid: User space processes
• proc: Process Operations
• sched: Scheduler
• Network Protocols
• IP, UDP, TCP
• lock: Kernel locking points
• io: I/O calls
• vfs: Filesystem Routines
• profile: Timing source
• mac_framework: Mandatory Access Control framework
DTrace providers (cont)
dtrace -l              List all registered probes
dtrace -l -P fbt       List probes registered by the Function Boundary Tracing provider
dtrace -l -f vm_fault  List probes for the function vm_fault
provider:module:function:name
• provider - the DTrace provider
• module - kernel module
• function - function to trace
• name - probe name
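Putting the four-part tuple together, a fully qualified probe can be enabled directly; the following sketch counts vm_fault entries per process (on FreeBSD the core kernel appears under the module name kernel; adjust for your platform):

```d
/* Run as: dtrace -n 'fbt:kernel:vm_fault:entry { @faults[execname] = count(); }' */
fbt:kernel:vm_fault:entry
{
	/* Aggregate fault counts keyed by process name */
	@faults[execname] = count();
}
```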
DTrace providers (cont)
• Statically added probes - added by developers
• Known useful points of instrumentation: starting new threads, TCP state machine transitions
• Dynamically added - the provider automatically works out where to insert probes
• Static probes are stable instrumentation points, dynamic probes depend on the specifics of the implementation and therefore are subject to change
DTrace tracing example
syscall:::entry
/execname == "ls"/
{
	printf("syscall = %s\n", probefunc);
}
dtrace -c ls -n '…'
Predicate - only for the ls command
Action - print the probe’s function name (probefunc is a built-in variable)
DTrace profiling example
BEGIN
{
	syscall_count = 0;
}

syscall:::entry
/execname == "ls"/
{
	syscall_count++;
}

END
{
	printf("syscall_count = %d\n", syscall_count);
}
Initialize the count (global)
Print the count
Instrument all syscall entries
Only for the ls command
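The global-variable count above works, but the same profile is usually written with an in-kernel aggregation - the scalable mechanism mentioned in the overview. A minimal sketch:

```d
/* Per-syscall counts, aggregated in the kernel; printed automatically on exit */
syscall:::entry
/execname == "ls"/
{
	@counts[probefunc] = count();
}
```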
DTrace profiling example 2
syscall:::entry
/execname == "ls"/
{
	t[probefunc] = timestamp;
}

syscall:::return
/t[probefunc] != 0/
{
	@funct[probefunc] = sum(timestamp - t[probefunc]);
	t[probefunc] = 0;
}
Other useful profiling probes include profile-199 - why 199 Hz rather than a round 200?
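A common use of the profile provider is sampling on-CPU kernel stacks; 199 is chosen because a prime rate avoids sampling in lockstep with periodic (e.g. 100 Hz) kernel activity. A sketch (arg0 is the kernel program counter, non-zero when the sample lands in the kernel):

```d
/* Sample kernel stacks at 199 Hz; hottest stacks print on exit */
profile-199
/arg0/
{
	@hot[stack()] = count();
}
```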
DTrace user land example 1
• pid (process ID) provider
• Allows tracing of an arbitrary process, such as a web browser or a database
• Process does not need to be restarted (important in production)
dtrace -n 'pid$target:::entry { trace(timestamp); }' -c ls
dtrace -n 'pid$target:::entry { trace(timestamp); }' -p …
process id to trace
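As a sketch of what the pid provider makes possible, the one-liners above can be extended to time an individual user-space function - malloc here is purely an illustrative target; run with -c or -p as above:

```d
/* Record entry time in a thread-local variable */
pid$target::malloc:entry
{
	self->ts = timestamp;
}

/* On return, add the elapsed time to a power-of-two histogram */
pid$target::malloc:return
/self->ts/
{
	@["malloc latency (ns)"] = quantize(timestamp - self->ts);
	self->ts = 0;
}
```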
DTrace internals - overview
DTrace internals
• DTrace core framework
• Interfaces to userspace through ioctl() to /dev/dtrace/*
• Majority of the code (dtrace.c) is generic framework code
• Delegates instrumentation to providers
• Implements a virtual machine for the D Intermediate Format (DIF)
• Ensures safety of D scripts
• Analogous to Java bytecode
DTrace internals
• Provider
• Holds the knowledge about specific instrumentation techniques
• Enables probes when consumer says so (specific to each provider)
• Transfers control to the DTrace framework by calling dtrace_probe(probe_id)
• The DTrace framework then performs the actions associated with the probe
• dtrace_probe()
• Runs in specialized context known as probe context
• Interrupts are disabled for the CPU executing the probe
• Does no memory allocation and takes no locks
DTrace internals
• D compiler:
• Compiles D scripts to D Intermediate Format (DIF)
• Assembles DIF into DIFOs (for predicate and actions)
• Merges DIFOs into DOF (D Object Format)
• DIF has a RISC-like instruction set that supports D language constructs
dtrace -S -n 'syscall:freebsd:open:entry {printf("%s %s\n", execname, copyinstr(arg0)); }'
There’s no spec but we are working on it
DTrace internals
• Enabling Control Blocks (ECBs)
• Kernel data structure that represents an enabled probe
• Each ECB contains DIFO for predicates and actions associated with probe enabling
• Can be chained together if multiple consumers are interested in a given probe
Probe
ECB ECB
PredicateDIFO
ActionDIFO
ActionDIFO
DTrace internals - Function Boundary Tracing (FBT)
• sys_getpid, no DTrace
<sys_getpid>    push   %rbp
<sys_getpid+1>  mov    %rsp,%rbp
<sys_getpid+4>  mov    0x8(%rdi),%rax
<sys_getpid+8>  movslq 0xbc(%rax),%rax
• sys_getpid, while running dtrace -n 'fbt:kernel:sys_getpid:entry'
  • int3 replaces the first sys_getpid instruction
  • the push instruction is emulated in the kernel
  • the original push instruction is restored after dtrace exits
<sys_getpid>    int3
<sys_getpid+1>  mov    %rsp,%rbp
<sys_getpid+4>  mov    0x8(%rdi),%rax
<sys_getpid+8>  movslq 0xbc(%rax),%rax
Used by debuggers to set breakpoints - architecture dependent
DTrace uses CTF (Compact C Type Format) to identify function entry points
DTrace in the real world
• How many software engineers are there in the world working in systems?
• Boehm estimates 700k work in “infrastructure”, which is a fairly broad definition
• How many people have used DTrace? ~1000s.
• How many people have written DTrace scripts? <1000
• How many people know how DTrace works in any detail? ~100
DTrace case study - Customers are not the enemy
• Scenario: Customer system reporting terrible performance
• NFS server
• SAN backend
• ~ 300 Mb/s
• 4 socket, 6 cores per socket
• Symptoms: Latency is “bad”, our customers are angry. Your product sucks
https://www.infoq.com/presentations/dtrace
Customers don’t report details like “latency is twice the average specified in the service level agreement”!
DTrace case study - Performance goals
• Why work isn’t being done?
• Thread not being scheduled
• Queue of work isn’t being processed
• Get idle into the system
• Identify work being done inefficiently
• Identify where cycles are wasted doing the wrong thing
Useful DTrace probes: off-cpu (sched provider)
DTrace case study - Measuring NFS latency
• Analysis: Develop DTrace scripts for measuring NFS latency
nfsv3:::op-read-start,
nfsv3:::op-write-start
{
	self->ts = timestamp;
}

nfsv3:::op-read-done,
nfsv3:::op-write-done
{
	@[probename == "op-write-done" ? "write" : "read"] =
	    quantize(timestamp - self->ts);
	self->ts = 0;
}

quantize - power-of-two frequency distribution aggregation
DTrace case study
Synchronous:
nfsv3:::op-write-start
{
	self->sync = args[2]->stable != 0;
}

Cache miss:
sdt:::arc-miss,
sdt:::blocked-read
{
	self->io = 1;
}
Cached, synchronous and asynchronous read/writes very different performance - different populations
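Combining the predicates above, the write-latency script can keep the populations separate rather than mixing them into one histogram - a sketch, assuming the illumos nfsv3 provider's args[2]->stable field as on the previous slide:

```d
nfsv3:::op-write-start
{
	self->ts = timestamp;
	/* stable != 0 marks a synchronous write */
	self->sync = args[2]->stable != 0;
}

nfsv3:::op-write-done
/self->ts/
{
	/* Separate latency histograms per population */
	@[self->sync ? "sync write" : "async write"] =
	    quantize(timestamp - self->ts);
	self->ts = 0;
	self->sync = 0;
}
```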
DTrace case study - synchronous write
• Average latency looks pretty good
DTrace case study - synchronous write
Average latency not so useful - outliers dominate
[Latency histograms shown on linear and log scales]
DTrace case study - summary
• Instrumentation with DTrace identified issues with…
• Throttling, inefficiencies in code and a heavily contended lock
• Lessons learnt
• Need to analyse performance on real-world systems
• Right tools, right questions
• Remedies without analysis don’t work and antagonise customers
• Systems shouldn't absorb all the pain
DTrace in the real world
• DTrace Toolkit
• http://www.brendangregg.com/dtracetoolkit.html
• DTrace one liners
• http://www.brendangregg.com/DTrace/dtrace_oneliners.txt
• https://wiki.freebsd.org/DTrace/One-Liners
One tool to rule them all…
• In 2017, “in production” means a distributed system
• Gap opened up once again between the types of in-production analysis we’d like to perform and available tools
• Distributed DTrace - see CADETS project
• Tracing internode flows?
• Volumes of data
• Security?
• What about heterogenous environments?
• Need to instrument Linux, Illumos and FreeBSD environments seamlessly
Linux: One tool to rule them all… not so much
ftrace and perf
• ftrace
• Tracing utility built directly into the Linux kernel
# cd /sys/kernel/debug/tracing
# cat available_tracers
• To enable tracer:
# echo function_graph > current_tracer
perf
• perf - instruments "events", which are a unified interface for different kernel instrumentation frameworks.
• Hardware Events: These instrument low-level processor activity based on CPU performance counters. For example, CPU cycles, instructions retired, memory stall cycles, level 2 cache misses, etc. Some will be listed as Hardware Cache Events.
• Software Events: These are low level events based on kernel counters. For example, CPU migrations, minor faults, major faults, etc.
perf (cont)
• Tracepoint Events: These are kernel-level events based on the ftrace framework. These tracepoints are placed in interesting and logical locations of the kernel, so that higher-level behaviour can be easily traced. For example, system calls, TCP events, file system I/O, disk I/O, etc. These are grouped into libraries of tracepoints; e.g., "sock:" for socket events, "sched:" for CPU scheduler events.
• Dynamic Tracing: Software can be dynamically instrumented, creating events in any location. For kernel software, this uses the kprobes framework. For user-level software, uprobes.
• Timed Profiling: Snapshots can be collected at an arbitrary frequency, using perf record -F <Hz>. This is commonly used for CPU usage profiling, and works by creating custom timed interrupt events.
SystemTap
• Scripting language and tool for dynamically instrumenting running production Linux kernel-based operating systems
• stap -v -e 'probe vfs.read {printf("read performed\n"); exit()}'
virtual file system read
• “safe to use in production?” “yes”
https://sourceware.org/systemtap/wiki/SystemtapDtraceComparison
BPF (nee eBPF)
• Berkeley Packet Filter (BPF)
• Interface to data link layers to capture raw packets (tcpdump)
• BPF virtual machine executes user scripts to filter packets
• JIT BPF instructions to native code - to improve performance
• Bytecode Probe Framework (maybe?)
• DTrace-like facilities attaching BPF scripts to tracepoints: kprobes, uprobes, tracepoints (soon)
• bcc front-end (and others)
• What can we do with BPF?
• “crazy stuff”
Designed for tracing 10GbE networking - highly performant
Walk stacks…
[Diagram: BPF tracing architecture - a userland front-end (e.g. bcc) generates load and compiles BPF bytecode; the kernel attaches it to kprobes, uprobes and tracepoints; results return via maps (statistics, read asynchronously) and the perf output buffer (per-event data), giving both per-event output and in-kernel summarisation]
Any questions?
Tracing tools quickly outperform log-based analysis in production
Vast array of tracing tools in Linux and BSD-based operating systems
DTrace provides unified tracing with guaranteed safety in production
All tracing mechanisms introduce probe effects
Probe effects can change the very behaviour we are investigating
Measure (on the real system), analyse and remediate - don’t guess
• DTrace on Linux isn’t production-ready, but other tools such as SystemTap and eBPF are roughly equivalent