
Page 1: Performance analysis and troubleshooting using DTrace

Performance analysis and troubleshooting using DTrace

“I keep six honest serving-men (They taught me all I knew); Their names are What and Why and When And How and Where and Who.” - Rudyard Kipling

Graeme Jenkinson - Computer Laboratory

Page 2: Performance analysis and troubleshooting using DTrace

Today’s lecture

Overview of tracing and profiling

DTrace: Dynamic Instrumentation of Production Systems

Performance analysis with DTrace - case study

Tracing and profiling on Linux

Wrap up and questions

Page 3: Performance analysis and troubleshooting using DTrace

Overview of tracing and profiling

• A profile provides an inventory of performance events and timings for the execution as a whole
• Sampling at a fixed time interval
• Aggregated results - requests per second
• Absolute chronology of events is ignored

• A trace records the chronology of events, often with timestamps:
• Absolute chronology of events is preserved
• Trace grows at runtime - must be periodically written out for processing
• Can be used to construct a profile

Focus on tracing

Page 4: Performance analysis and troubleshooting using DTrace

Who did what to who?

[Slide figure: crash screen listing “Intuition library” and “Address of task”]

Page 5: Performance analysis and troubleshooting using DTrace

Tracing software as a black box

• print statements:

• printf("Here");

• #ifndef NDEBUG
      printf("State: Connected");
  #endif

• Logging frameworks:

• info!("State: connected");

• logger.error("Failed to write file");

Can declaratively turn on/off logging at compile time

Can declaratively turn on/off logging at runtime

Page 6: Performance analysis and troubleshooting using DTrace

Tracing software as a black box - summary

Print statements
• Pros: as simple as it gets; declarative control at compile time
• Cons: adding tracing requires a recompile (in some cases); ad-hoc adoption; inflexible (requires compile-time decisions)

Logging frameworks
• Pros: declarative control at runtime or compile time; finer-grained control of logging (per class/module); structured output; flexibility
• Cons: a tiny bit more effort to adopt; an added dependency in the codebase

Page 7: Performance analysis and troubleshooting using DTrace

Logging can only get you so far

• Errors frequently the result of complex interactions across the software stack

• Logging limited to a single layer of the software stack

• Logging frameworks don’t capture the events you want to see

Page 8: Performance analysis and troubleshooting using DTrace

Debuggers

• When logging really doesn’t cut it we often break out the debugger

• Set breakpoints, single step code, watch variables/registers

• Debuggers often struggle when facing certain types of problems:

• Heisenbug: “a software bug that seems to disappear or alter its behaviour when one attempts to study it"

Page 9: Performance analysis and troubleshooting using DTrace

Probe effects

• Probe effect - act of measuring disturbs system

• Electronics - probes introduce additional capacitance, resistance or inductance

• Software tracing - probes take time to execute

• Bugs from subtle timing effects are common, so probe effects can rarely be ignored

Page 10: Performance analysis and troubleshooting using DTrace

Observability/transparency

• Rather than treating the system as a black box, we need observability into the system while it is executing

• What’s needed are more flexible, less intrusive tools - thankfully there are plenty of them

• Do these tools have a probe effect?

Page 11: Performance analysis and troubleshooting using DTrace

Observability/transparency tools

Page 12: Performance analysis and troubleshooting using DTrace

Observability tools: top

NAME

top - display Linux processes

top (table of processes) - reads kernel state from /proc filesystem (on Linux, FreeBSD); procfs is a pseudo-filesystem that presents information about processes and other system information in a hierarchical filesystem-like structure.

Note: procfs doesn’t come free (probe effect). What operations need to be performed?
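One way to see this for yourself (a minimal sketch, assuming a DTrace-capable system; the output varies between systems) is to count the system calls top itself makes while it refreshes:

dtrace -n 'syscall:::entry /execname == "top"/ { @[probefunc] = count(); }'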

Page 13: Performance analysis and troubleshooting using DTrace

Observability tools: strace (dtruss Mac OS X, truss FreeBSD)

Linux

strace - trace system calls and signals

Mac OS X (requires disabling System Integrity Protection - El Capitan+)

dtruss - process system call details. Uses DTrace.

FreeBSD

truss - trace system calls

Page 14: Performance analysis and troubleshooting using DTrace

Utilization, Saturation and Errors (USE)

• Utilization: the average time that the resource was busy servicing work

• Saturation: the degree to which the resource has extra work which it can't service, often queued

• Errors: the error count

• Resources:

• CPUs: sockets, cores, hardware threads (virtual CPUs)

• Memory: capacity

• Network interfaces

• Storage devices: I/O, capacity

• Controllers: storage, network cards

• Interconnects: CPUs, memory, I/O

Page 15: Performance analysis and troubleshooting using DTrace

Layered architectures and observability

• System performance problems are typically introduced at the highest layers of abstraction

• Effects are often first encountered and attributed at the lowest layers of abstraction

• Observability is required across all layers of the stack

Page 16: Performance analysis and troubleshooting using DTrace

In production analysis

• As software engineers we graduate from screwing up our own system to screwing up other people’s

• In production tracing must be safe

• Probe effect must be zero when disabled - “low” when tracing

Page 17: Performance analysis and troubleshooting using DTrace

DTrace: one tool to rule them all

• Developed by Bryan Cantrill, Mike Shapiro, and Adam Leventhal at Sun Microsystems

• Release in 2005 by Sun Microsystems

• “Performance analysis infrastructure has generally not kept pace with the shift to in-production performance analysis” - Cantrill et al

Illumos, FreeBSD, NetBSD, MacOS X

Linux (ish), Windows (yuck)

Page 18: Performance analysis and troubleshooting using DTrace

DTrace overview (Cantrill ’04)

• Dynamic instrumentation

• Zero probe effect when disabled

• Unified instrumentation

• Instruments both user and kernel level software

• Arbitrary-context kernel instrumentation

• Can instrument any aspect of the kernel including delicate sub-systems such as the scheduler and synchronisation facilities

As a result probes can’t block

What is the impact when enabled?

Page 19: Performance analysis and troubleshooting using DTrace

DTrace overview (cont)

• Data integrity

• Records may be dropped under load - but never silently

• High-level control language

• Scriptable “arbitrary” actions on probe firing

• Predicates that determine whether a probe fires

• Scripts executed by in-kernel virtual machine

• Production safe

• Aborts when system is heavily loaded

• Scalable in kernel aggregations (profiling)

How is safety preserved?

Page 20: Performance analysis and troubleshooting using DTrace

DTrace overview - anatomy of a dynamic tracer

D script:

fbt::vm_fault:entry
{
    fault_count += 1;
}

[Diagram: the consumer compiles the D script in userspace; enabled probes fire in the kernel and write records into per-CPU buffers, which the consumer reads back as results]

Page 21: Performance analysis and troubleshooting using DTrace

DTrace providers

• fbt: Function Boundary Tracing

• syscall: System Calls

• pid: User space processes

• proc: Process Operations

• sched: Scheduler

• Network Protocols

• IP, UDP, TCP

• lock: Kernel locking points

• io: I/O calls

• vfs: Filesystem Routines

• profile: Timing source

• mac_framework: Mandatory Access Control framework

Page 22: Performance analysis and troubleshooting using DTrace

DTrace providers (cont)

dtrace -l List all registered probes

dtrace -l -P fbt List probes registered by Function Boundary Tracing provider

dtrace -l -f vm_fault List probes for function vm_fault

provider:module:function:name

• provider - DTrace provider
• module - kernel module
• function - function to trace
• name - probe name
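For example, fbt:kernel:vm_fault:entry names the entry probe that the fbt provider attaches to the kernel’s vm_fault function (the module name “kernel” here is FreeBSD-specific; module and function names vary by OS and kernel configuration).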

Page 23: Performance analysis and troubleshooting using DTrace

DTrace providers (cont)

• Statically added probes - added by developers

• Known useful points of instrumentation: starting new threads, TCP state machine transitions

• Dynamically added - the provider automatically works out where to insert probes

• Static probes are stable instrumentation points, dynamic probes depend on the specifics of the implementation and therefore are subject to change

Page 24: Performance analysis and troubleshooting using DTrace

DTrace tracing example

syscall:::entry
/execname == "ls"/
{
    printf("syscall = %s\n", probefunc);
}

dtrace -c ls -n '…'

Predicate - only fire for the ls command

Action - print the probe’s function name (probefunc is a built-in variable)
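A minimal way to run the script above (a sketch - the filename syscalls.d is arbitrary) is to save it to a file and let dtrace launch the command:

dtrace -s syscalls.d -c ls

The -c flag starts ls, limits the lifetime of tracing to that command, and sets the $target macro variable to its process ID.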

Page 25: Performance analysis and troubleshooting using DTrace

DTrace profiling example

BEGIN
{
    syscall_count = 0;
}

syscall:::entry
/execname == "ls"/
{
    syscall_count++;
}

END
{
    printf("syscall_count = %d\n", syscall_count);
}

Initialize the count (global)

Print the count

Instrument all syscall entries

Only for the ls command
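The same count can also be kept with an aggregation rather than a global variable - aggregations are summarised per-CPU in the kernel and printed automatically on exit, so no BEGIN or END clauses are needed (a minimal sketch):

syscall:::entry
/execname == "ls"/
{
    /* count() increments the aggregation bucket keyed by syscall name */
    @counts[probefunc] = count();
}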

Page 26: Performance analysis and troubleshooting using DTrace

DTrace profiling example 2

syscall:::entry
/execname == "ls"/
{
    t[probefunc] = timestamp;
}

syscall:::return
/t[probefunc] != 0/
{
    @funct[probefunc] = sum(timestamp - t[probefunc]);
    t[probefunc] = 0;
}

Other useful profiling probes include profile-199, which fires 199 times per second on every CPU - why 199 Hz and not a round 200? A prime rate avoids sampling in lockstep with periodic system activity.
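A typical use of the profile provider (a sketch, not taken from the slides) is sampling kernel stacks at 199 Hz on every CPU:

profile:::profile-199
/arg0/
{
    /* arg0 is the kernel program counter; the predicate limits
       samples to time spent in the kernel */
    @[stack()] = count();
}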

Page 27: Performance analysis and troubleshooting using DTrace

DTrace user land example 1

• pid (process ID) provider
• Allows tracing of an arbitrary process, such as a web browser or a database
• Process does not need to be restarted (important in production)

dtrace -n 'pid$target:::entry { trace(timestamp); }' -c ls

dtrace -n 'pid$target:::entry { trace(timestamp); }' -p …

process id to trace
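A slightly richer pid-provider sketch (again launching ls purely for illustration) aggregates calls per module and function instead of tracing a timestamp on every entry:

dtrace -n 'pid$target:::entry { @calls[probemod, probefunc] = count(); }' -c ls

Note that pid$target:::entry instruments every function in every library mapped by the target, so the enabled probe effect can be substantial.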

Page 28: Performance analysis and troubleshooting using DTrace

DTrace internals - overview

Page 29: Performance analysis and troubleshooting using DTrace

DTrace internals

• DTrace core framework

• Interfaces to userspace through ioctl() to /dev/dtrace/*

• Majority of the code (dtrace.c) is generic framework code

• Delegates instrumentation to providers

• Implements a virtual machine for the D Intermediate Format (DIF)

• Ensures safety of D scripts

• Analogous to Java bytecode

Page 30: Performance analysis and troubleshooting using DTrace

DTrace internals

• Provider

• Holds the knowledge about specific instrumentation techniques

• Enables probes when the consumer requests them (the mechanism is specific to each provider)

• Transfers control to the DTrace framework by calling dtrace_probe(probe_id)

• The DTrace framework then performs the actions associated with the probe

• dtrace_probe()

• Runs in specialized context known as probe context

• Interrupts are disabled for the CPU executing the probe

• Does no memory allocation and takes no locks

Page 31: Performance analysis and troubleshooting using DTrace

DTrace internals

• D compiler:

• Compiles D scripts to D Intermediate Format (DIF)

• Assembles DIF into DIFOs (one per predicate and action)

• Merges DIFOs into DOF (D Object Format)

• DIF has a RISC-like instruction set that supports D language constructs

dtrace -S -n 'syscall:freebsd:open:entry {printf("%s %s\n", execname, copyinstr(arg0)); }'

There’s no spec but we are working on it

Page 32: Performance analysis and troubleshooting using DTrace

DTrace internals

• Enabling Control Blocks (ECBs)

• Kernel data structure that represents an enabled probe

• Each ECB contains DIFO for predicates and actions associated with probe enabling

• Can be chained together if multiple consumers are interested in a given probe

[Diagram: a probe points to a chain of ECBs; each ECB holds a predicate DIFO and one or more action DIFOs, and ECBs are chained when multiple consumers enable the same probe]

Page 33: Performance analysis and troubleshooting using DTrace

DTrace internals - Function Boundary Tracing (FBT)

• sys_getpid, no DTrace

<sys_getpid>    push   %rbp
<sys_getpid+1>  mov    %rsp,%rbp
<sys_getpid+4>  mov    0x8(%rdi),%rax
<sys_getpid+8>  movslq 0xbc(%rax),%rax

• sys_getpid, while running dtrace -n 'fbt:kernel:sys_getpid:entry'
• int3 replaces the first sys_getpid instruction
• the push instruction is emulated in the kernel
• the original push instruction is restored after dtrace exits

<sys_getpid>    int3
<sys_getpid+1>  mov    %rsp,%rbp
<sys_getpid+4>  mov    0x8(%rdi),%rax
<sys_getpid+8>  movslq 0xbc(%rax),%rax

Used by debuggers to set breakpoints - architecture dependent

DTrace uses CTF (Compact C Type Format) to identify function entry points

Page 34: Performance analysis and troubleshooting using DTrace

DTrace in the real world

• How many software engineers are there in the world working in systems?

• Boehm estimates 700k work in “infrastructure”, which is a fairly broad definition

• How many people have used DTrace? ~1000s.

• How many people have written DTrace scripts? <1000

• How many people know how DTrace works in any detail? ~100

Page 35: Performance analysis and troubleshooting using DTrace

DTrace case study - Customers are not the enemy

• Scenario: Customer system reporting terrible performance

• NFS server

• SAN backend

• ~ 300 Mb/s

• 4 socket, 6 cores per socket

• Symptoms: Latency is “bad”, our customers are angry. Your product sucks

https://www.infoq.com/presentations/dtrace

Customers don’t report details like “latency is twice the average specified in the service-level agreement”!

Page 36: Performance analysis and troubleshooting using DTrace

DTrace case study - Performance goals

• Why isn’t work being done?

• Thread not being scheduled

• Queue of work isn’t being processed

• Idle time getting into the system

• Identify work being done inefficiently

• Identify where cycles are wasted doing the wrong thing

Useful DTrace probes: sched:::off-cpu and sched:::on-cpu
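A minimal off-CPU timing sketch using these probes (the nfsd execname is illustrative):

sched:::off-cpu
/execname == "nfsd"/
{
    /* remember when this thread left the CPU */
    self->off = timestamp;
}

sched:::on-cpu
/self->off/
{
    /* time spent off-CPU, keyed by the stack where the thread slept */
    @offcpu[stack()] = sum(timestamp - self->off);
    self->off = 0;
}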

Page 37: Performance analysis and troubleshooting using DTrace

DTrace case study - Measuring NFS latency

• Analysis: Develop DTrace scripts for measuring NFS latency

nfsv3:::op-read-start,
nfsv3:::op-write-start
{
    self->ts = timestamp;
}

nfsv3:::op-read-done,
nfsv3:::op-write-done
/self->ts/
{
    @[probename == "op-write-done" ? "write" : "read"] =
        quantize(timestamp - self->ts);
    self->ts = 0;
}

quantize - an aggregating action that builds a power-of-two frequency distribution of the measured latencies

Page 38: Performance analysis and troubleshooting using DTrace

DTrace case study

Synchronous

nfsv3:::op-write-start
{
    self->sync = args[2]->stable != 0;
}

Cache miss

sdt:::arc-miss,
sdt:::blocked-read
{
    self->io = 1;
}

Cached, synchronous and asynchronous reads/writes have very different performance - they are different populations
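A hedged sketch of how these per-thread flags could key the earlier quantize() aggregation, so that each population gets its own latency distribution (the key strings are illustrative):

nfsv3:::op-write-done
/self->ts/
{
    /* separate distributions for sync/async and cached/uncached writes */
    @[self->sync ? "sync" : "async", self->io ? "miss" : "cached"] =
        quantize(timestamp - self->ts);
    self->ts = 0;
    self->sync = 0;
    self->io = 0;
}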

Page 39: Performance analysis and troubleshooting using DTrace

DTrace case study - synchronous write

• Average latency looks pretty good

Page 40: Performance analysis and troubleshooting using DTrace

DTrace case study - synchronous write

Average latency not so useful - outliers dominate

[Figure: the same write-latency distribution plotted on linear and log scales]

Page 41: Performance analysis and troubleshooting using DTrace

DTrace case study - summary

• Instrumentation with DTrace identified issues with…

• Throttling, inefficiencies in code and a heavily contended lock

• Lessons learnt

• Need to analyse performance on real-world systems

• Right tools, right questions

• Remedies without analysis don’t work and antagonise customers

• Systems shouldn't absorb all the pain

Page 42: Performance analysis and troubleshooting using DTrace

DTrace in the real world

• DTrace Toolkit

• http://www.brendangregg.com/dtracetoolkit.html

• DTrace one liners

• http://www.brendangregg.com/DTrace/dtrace_oneliners.txt

• https://wiki.freebsd.org/DTrace/One-Liners

Page 43: Performance analysis and troubleshooting using DTrace

One tool to rule them all…

• In 2017, “in production” means a distributed system

• A gap has opened up once again between the types of in-production analysis we’d like to perform and the available tools

• Distributed DTrace - see CADETS project

• Tracing internode flows?

• Volumes of data

• Security?

• What about heterogeneous environments?

• Need to instrument Linux, Illumos and FreeBSD environments seamlessly

Page 44: Performance analysis and troubleshooting using DTrace

Linux: One tool to rule them all… not so much

Page 45: Performance analysis and troubleshooting using DTrace

ftrace and perf

• ftrace

• Tracing utility built directly into the Linux kernel

# cd /sys/kernel/debug/tracing
# cat available_tracers

• To enable tracer:

# echo function_graph > current_tracer

Page 46: Performance analysis and troubleshooting using DTrace

perf

• perf - instruments "events", which are a unified interface for different kernel instrumentation frameworks.

• Hardware Events: These instrument low-level processor activity based on CPU performance counters. For example, CPU cycles, instructions retired, memory stall cycles, level 2 cache misses, etc. Some will be listed as Hardware Cache Events.

• Software Events: These are low level events based on kernel counters. For example, CPU migrations, minor faults, major faults, etc.

Page 47: Performance analysis and troubleshooting using DTrace

perf (cont)

• perf - instruments "events", which are a unified interface for different kernel instrumentation frameworks.

• Tracepoint Events: These are kernel-level events based on the ftrace framework. These tracepoints are placed in interesting and logical locations of the kernel, so that higher-level behaviour can be easily traced. For example, system calls, TCP events, file system I/O, disk I/O, etc. These are grouped into libraries of tracepoints; e.g., "sock:" for socket events, "sched:" for CPU scheduler events.

• Dynamic Tracing: Software can be dynamically instrumented, creating events in any location. For kernel software, this uses the kprobes framework. For user-level software, uprobes.

• Timed Profiling: Snapshots can be collected at an arbitrary frequency, using perf record -F <Hz>. This is commonly used for CPU usage profiling, and works by creating custom timed interrupt events.

Page 48: Performance analysis and troubleshooting using DTrace

SystemTap

• Scripting language and tool for dynamically instrumenting running production Linux kernel-based operating systems

• stap -v -e 'probe vfs.read {printf("read performed\n"); exit()}'

virtual file system read

• “safe to use in production?” “yes”

https://sourceware.org/systemtap/wiki/SystemtapDtraceComparison

Page 49: Performance analysis and troubleshooting using DTrace

BPF (nee eBPF)

• Berkeley Packet Filter (BPF)

• Interface to data link layers to capture raw packets (tcpdump)

• BPF virtual machine executes user scripts to filter packets

• JIT BPF instructions to native code - to improve performance

• Bytecode Probe Framework (maybe?)

• DTrace-like facilities: attaching BPF programs to kprobes, uprobes and tracepoints (soon)

• bcc front-end (and others)

• What can we do with BPF?

• “crazy stuff”

Designed for tracing 10GbE networking - highly performant

Walk stacks…

Page 50: Performance analysis and troubleshooting using DTrace

Bytecode Probe Framework

[Diagram: BPF tracing architecture - userland tooling compiles BPF bytecode and attaches it to kprobes, uprobes and tracepoints in the kernel; per-event data is read back asynchronously via the perf output buffer, while statistics are summarised in-kernel in BPF maps, i.e. both per-event output and in-kernel summarisation are supported]

Page 51: Performance analysis and troubleshooting using DTrace

Any questions?

Tracing tools quickly outperform log based analysis in production

Vast array of tracing tools in Linux and BSD-based operating systems

DTrace provides unified tracing with guaranteed safety in production

All tracing mechanisms introduce probe effects

Probe effects can change the very behaviour we are investigating

Measure (on the real system), analyse and remediate - don’t guess

DTrace on Linux isn’t production ready but other tools such as SystemTap and eBPF are roughly equivalent