TRANSCRIPT
1
Extensible Distributed Tracing from Kernels to Clusters
Úlfar Erlingsson, Google Inc.
Marcus Peinado, Microsoft Research
Simon Peter, Systems Group, ETH Zurich
Mihai Budiu, Microsoft Research
Fay
2
Wouldn’t it be nice if…
• We could know what our clusters were doing?
• We could ask any question… easily, using one simple-to-use system.
• We could collect answers extremely efficiently… so cheaply we might even ask continuously.
3
Let’s imagine...
• Applying data-mining to cluster tracing
• Bag-of-words technique
– Compare documents w/o structural knowledge
– N-dimensional feature vectors
– K-means clustering
• Can apply to clusters, too!
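As a rough illustration of the bag-of-words idea (in plain Python, not Fay's actual query language; all machine names and event data below are invented), each machine is reduced to a frequency vector of the "words" it emits, here hypothetical syscall names, with all ordering and structure deliberately ignored:

```python
from collections import Counter

def feature_vector(events, vocabulary):
    """Bag-of-words: count how often each 'word' (e.g. a syscall name)
    occurs, ignoring all structure and ordering of the events."""
    counts = Counter(events)
    return [counts[w] for w in vocabulary]

# Hypothetical per-machine syscall traces.
traces = {
    "m1": ["read", "read", "write"],
    "m2": ["read", "write", "write"],
    "m3": ["mmap", "mmap", "mmap"],
}
vocab = ["read", "write", "mmap"]
vectors = {m: feature_vector(t, vocab) for m, t in traces.items()}
print(vectors["m3"])  # [0, 0, 3]
```

The resulting vectors can then be fed to k-means to group machines with similar behavior.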
4
Cluster-mining with Fay
• Automatically categorize cluster behavior, based on system call activity
5
Cluster-mining with Fay
• Automatically categorize cluster behavior, based on system call activity
– Without measurable overhead on the execution
– Without any special Fay data-mining support
6
Fay K-Means Behavior-Analysis Code

Vector Nearest(Vector pt, Vectors centers) {
  var near = centers.First();
  foreach (var c in centers)
    if (Norm(pt - c) < Norm(pt - near))
      near = c;
  return near;
}

var kernelFunctionFrequencyVectors =
  cluster.Function(kernel, "syscalls!*")
    .Where(evt => evt.time < Now.AddMinutes(3))
    .Select(evt => new { Machine = fay.MachineID(),
                         Interval = evt.Cycles / CPS,
                         Function = evt.CallerAddr })
    .GroupBy(evt => evt, (k, g) => new { key = k, count = g.Count() });

Vectors OneKMeansStep(Vectors vs, Vectors cs) {
  return vs.GroupBy(v => Nearest(v, cs))
           .Select(g => g.Aggregate((x, y) => x + y) / g.Count());
}

Vectors KMeans(Vectors vs, Vectors cs, int K) {
  for (int i = 0; i < K; ++i)
    cs = OneKMeansStep(vs, cs);
  return cs;
}
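A minimal Python transliteration of the analysis half of this query may help; the tracing half has no analogue outside Fay, so the vectors here are plain tuples and the input data is invented:

```python
def nearest(pt, centers):
    # The center with the smallest (squared) Euclidean distance to pt,
    # mirroring the Nearest() helper in the Fay query.
    return min(centers, key=lambda c: sum((a - b) ** 2 for a, b in zip(pt, c)))

def one_kmeans_step(vs, cs):
    # Group each vector with its nearest center, then average each group,
    # mirroring the GroupBy/Aggregate pipeline of OneKMeansStep().
    groups = {}
    for v in vs:
        groups.setdefault(nearest(v, cs), []).append(v)
    return [tuple(sum(xs) / len(g) for xs in zip(*g)) for g in groups.values()]

def kmeans(vs, cs, k_steps):
    for _ in range(k_steps):
        cs = one_kmeans_step(vs, cs)
    return cs

# Two obvious clusters around (0, 0.5) and (10, 10.5).
vs = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(sorted(kmeans(vs, [(0.0, 0.0), (10.0, 10.0)], 5)))
# [(0.0, 0.5), (10.0, 10.5)]
```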
8
Fay vs. Specialized Tracing
• Could’ve built a specialized tool for this
– Automatic categorization of behavior (Fmeter)
• Fay is general, but can efficiently do
– Tracing across abstractions, systems (Magpie)
– Predicated and windowed tracing (Streams)
– Probabilistic tracing (Chopstix)
– Flight recorders, performance counters, …
9
Key Takeaways
Fay: Flexible monitoring of distributed executions
– Can be applied to existing, live Windows servers
1. Single query specifies both tracing & analysis
– Easy to write & enables automatic optimizations
2. Pervasively data-parallel, scalable processing
– Same model within machines & across clusters
3. Inline, safe machine-code at tracepoints
– Allows us to do computation right at the data source
10
K-Means: Single, Unified Fay Query

Vector Nearest(Vector pt, Vectors centers) {
  var near = centers.First();
  foreach (var c in centers)
    if (Norm(pt - c) < Norm(pt - near))
      near = c;
  return near;
}

var kernelFunctionFrequencyVectors =
  cluster.Function(kernel, "*")
    .Where(evt => evt.time < Now.AddMinutes(3))
    .Select(evt => new { Machine = fay.MachineID(),
                         Interval = evt.Cycles / CPS,
                         Function = evt.CallerAddr })
    .GroupBy(evt => evt, (k, g) => new { key = k, count = g.Count() });

Vectors OneKMeansStep(Vectors vs, Vectors cs) {
  return vs.GroupBy(v => Nearest(v, cs))
           .Select(g => g.Aggregate((x, y) => x + y) / g.Count());
}

Vectors KMeans(Vectors vs, Vectors cs, int K) {
  for (int i = 0; i < K; ++i)
    cs = OneKMeansStep(vs, cs);
  return cs;
}
11
Fay is Data-Parallel on Cluster
• View trace query as distributed computation
• Use cluster for analysis
12
Fay is Data-Parallel on Cluster
System call trace events
• Fay does early aggregation & data reduction
• Fay knows what’s needed for later analysis
13
Fay is Data-Parallel on Cluster
System call trace events
• Fay does early aggregation & data reduction
K-Means analysis
• Fay builds an efficient processing plan from the query
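The early-aggregation idea can be sketched as follows (plain Python, with an invented event stream and a made-up interval length): instead of shipping every trace event off the machine, each node keeps only the per-interval counts that the later analysis actually needs:

```python
from collections import Counter

def raw_events():
    # Hypothetical stream of (cycle_count, function) trace events on one node.
    yield (100, "NtReadFile")
    yield (150, "NtReadFile")
    yield (900, "NtWriteFile")

CYCLES_PER_INTERVAL = 500  # illustrative interval length

def early_aggregate(events):
    """Reduce at the data source: a small (interval, function) -> count
    table is shipped for analysis instead of every individual event."""
    counts = Counter()
    for cycles, fn in events:
        counts[(cycles // CYCLES_PER_INTERVAL, fn)] += 1
    return dict(counts)

print(early_aggregate(raw_events()))
# {(0, 'NtReadFile'): 2, (1, 'NtWriteFile'): 1}
```

Three events collapse to two counters here; at cluster scale the same reduction is what keeps tracing overhead low.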
14
Fay is Data-Parallel within Machines
• Early aggregation
• Inline, in the OS kernel
• Reduces dataflow & kernel/user transitions
• Data-parallel per core/thread
15
Processing w/o Fay Optimizations
• Collect data first (on disk)
• Reduce later
• Inefficient; can suffer data overload
(Diagram: K-Means system-call collection feeding K-Means clustering)
16
Traditional Trace Processing
• First log all data (a deluge)
• Process later (centrally)
• Compose tools via scripting
(Diagram: K-Means system-call collection feeding K-Means clustering)
17
Takeaways so far
Fay: Flexible monitoring of distributed executions
1. Single query specifies both tracing & analysis
2. Pervasively data-parallel, scalable processing
18
Safety of Fay Tracing Probes
• A variant of XFI used for safety [OSDI’06]
– Works well in the kernel or any address space
– Can safely use existing stacks, etc.
– Used instead of a language interpreter (as in DTrace)
– Allows arbitrary, efficient, stateful computation
• Probes can access thread-local/global state
• Probes can try to read any address
– I/O registers are protected
19
Key Takeaways, Again
Fay: Flexible monitoring of distributed executions
1. Single query specifies both tracing & analysis
2. Pervasively data-parallel, scalable processing
3. Inline, safe machine-code at tracepoints
20
Installing and Executing Fay Tracing
• Fay runtime on each machine
• Fay module in each traced address space
• Tracepoints at hotpatched function boundaries
(Diagram: a query creates XFI-verified probes, installed by hotpatching into the kernel and user-space target address spaces; the Fay tracing runtime collects probe output via ETW; ~200 cycles.)
21
Low-level Code Instrumentation
Module with a traced function Foo:

Caller:
  ...
  e8ab62ffff    call Foo
  ...

  ff1508e70600  call [Dispatcher]
Foo:
  ebf8          jmp Foo-6
  cccccc        (int 3 padding)
Foo2:
  57            push rdi
  ...
  c3            ret

• Replace 1st opcode of functions
22
Low-level Code Instrumentation
Module with a traced function Foo:

Caller:
  ...
  e8ab62ffff    call Foo
  ...

  ff1508e70600  call [Dispatcher]
Foo:
  ebf8          jmp Foo-6
  cccccc        (int 3 padding)
Foo2:
  57            push rdi
  ...
  c3            ret

Fay platform module:

Dispatcher:
  t = lookup(return_addr)
  ...
  call t.entry_probes
  ...
  call t.Foo2_trampoline
  ...
  call t.return_probes
  ...
  return  /* to after call Foo */

• Replace 1st opcode of functions
• Fay dispatcher called via trampoline
23
Low-level Code Instrumentation
Module with a traced function Foo:

Caller:
  ...
  e8ab62ffff    call Foo
  ...

  ff1508e70600  call [Dispatcher]
Foo:
  ebf8          jmp Foo-6
  cccccc        (int 3 padding)
Foo2:
  57            push rdi
  ...
  c3            ret

Fay platform module:

Dispatcher:
  t = lookup(return_addr)
  ...
  call t.entry_probes
  ...
  call t.Foo2_trampoline
  ...
  call t.return_probes
  ...
  return  /* to after call Foo */

Fay probes: PF3, PF4, PF5 (each sandboxed with XFI)

• Replace 1st opcode of functions
• Fay dispatcher called via trampoline
• Fay calls the function, and entry & exit probes
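The control flow the dispatcher imposes can be mimicked in ordinary code. This Python sketch does no hotpatching or machine-code work, and every name in it is invented; it only shows the shape of the dispatch: entry probes, then the relocated original function, then return probes:

```python
entry_probes = []
return_probes = []

def dispatch(original_fn, *args):
    """Mimic the Fay dispatcher: run entry probes, call the relocated
    original function (the 'Foo2' trampoline target), run return probes."""
    for p in entry_probes:
        p(args)
    result = original_fn(*args)
    for p in return_probes:
        p(result)
    return result

calls = []
entry_probes.append(lambda args: calls.append(("enter", args)))
return_probes.append(lambda r: calls.append(("exit", r)))

def foo(x):          # stands in for the traced function Foo
    return x + 1

print(dispatch(foo, 41))  # 42
print(calls)  # [('enter', (41,)), ('exit', 42)]
```

The real dispatcher does this per-tracepoint lookup and probe invocation in a few hundred cycles, inline in the traced address space.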
24
What’s Fay’s Performance & Scalability?

• Fay adds 220 to 430 cycles per traced function
• Fay adds 180% CPU to trace all kernel functions
• Both approx. 10x faster than DTrace, SystemTap

(Charts: null-probe overhead in cycles, and whole-system slowdown, for Fay, Solaris DTrace, OS X DTrace, and SystemTap on Linux. Slowdown: Fay 2.8x, Solaris DTrace 17.2x, OS X DTrace 26.7x; SystemTap crashed.)
25
Fay Scalability on a Cluster
• Fay tracing memory allocations, in a loop:
– Ran workload on a 128-node, 1024-core cluster
– Spread work over 128 to 1,280,000 threads
– 100% CPU utilization
• Fay overhead was 1% to 11% (mean 7.8%)
26
More Fay Implementation Details
• Details of query-plan optimizations
• Case studies of different tracing strategies
• Examples of using Fay for performance analysis
• Fay is based on LINQ and Windows specifics
– Could build on Linux using Ftrace, Hadoop, etc.
• Some restrictions apply currently
– E.g., skew towards batch processing due to Dryad
27
Conclusion
• Fay: Flexible tracing of distributed executions
• Both expressive and efficient
– Unified trace queries
– Pervasive data-parallelism
– Safe machine-code probe processing
• Often as efficient as purpose-built tools
28
Backup
29
A Fay Trace Query
from io in cluster.Function("iolib!Read")
where io.time < Now.AddMinutes(5)
let size = io.Arg(2) // request size in bytes
group io by size / 1024 into g
select new { sizeInKilobytes = g.Key,
             countOfReadIOs = g.Count() };
• Aggregates read activity in the iolib module
• Across the cluster, both user-mode & kernel
• Over 5 minutes
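In plain Python, with an invented list of observed read sizes, the aggregation this query performs looks like:

```python
from collections import Counter

# Hypothetical byte sizes of Read calls observed across the cluster.
read_sizes = [1500, 1800, 3000, 5000, 5100, 9000]

# Group into 1 KB buckets and count, as in `group io by size/1024`.
histogram = Counter(size // 1024 for size in read_sizes)
print(sorted(histogram.items()))
# [(1, 2), (2, 1), (4, 2), (8, 1)]
```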
30
A Fay Trace Query
from io in cluster.Function("iolib!Read")
where io.time < Now.AddMinutes(5)
let size = io.Arg(2) // request size in bytes
group io by size / 1024 into g
select new { sizeInKilobytes = g.Key,
             countOfReadIOs = g.Count() };

• Specifies what to trace
– 2nd argument of the Read function in iolib
• And how to aggregate
– Group into KB-size buckets and count

(Histogram: read counts per request-size bucket: 1024, 2048, 4096, 8192.)