
Page 1:

Extensible Distributed Tracing from Kernels to Clusters

Úlfar Erlingsson, Google Inc.
Marcus Peinado, Microsoft Research
Simon Peter, Systems Group, ETH Zurich
Mihai Budiu, Microsoft Research

Fay

Page 2:

Wouldn’t it be nice if…

• We could know what our clusters were doing?

• We could ask any question… easily, using one simple-to-use system.

• We could collect answers extremely efficiently… so cheaply we may even ask continuously.

Page 3:

Let’s imagine...

• Applying data-mining to cluster tracing
• Bag-of-words technique
– Compare documents w/o structural knowledge
– N-dimensional feature vectors
– K-means clustering

• Can apply to clusters, too! (a minimal sketch follows)
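
To make the bag-of-words idea concrete, here is a minimal C# sketch (illustrative only, not Fay code): a machine's activity over an interval becomes a frequency vector keyed by the "words", i.e., the functions it called.

    using System.Collections.Generic;
    using System.Linq;

    static class BagOfWords {
        // Feature vector: how often each function name ("word") occurred
        // in one machine's trace interval.
        public static Dictionary<string, int> ToFeatureVector(IEnumerable<string> calledFunctions) =>
            calledFunctions.GroupBy(f => f)
                           .ToDictionary(g => g.Key, g => g.Count());
    }

Machines doing similar work yield nearby vectors, so k-means can group them without any structural knowledge of the workload.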

Page 4:

Cluster-mining with Fay

• Automatically categorize cluster behavior, based on system call activity

Page 5:

Cluster-mining with Fay

• Automatically categorize cluster behavior, based on system call activity
– Without measurable overhead on the execution
– Without any special Fay data-mining support

Page 6:

Vector Nearest(Vector pt, Vectors centers) {
    var near = centers.First();
    foreach (var c in centers)
        if (Norm(pt - c) < Norm(pt - near))
            near = c;
    return near;
}

var kernelFunctionFrequencyVectors =
    cluster.Function(kernel, "syscalls!*")
        .Where(evt => evt.time < Now.AddMinutes(3))        // 3-minute trace window
        .Select(evt => new { Machine  = fay.MachineID(),
                             Interval = evt.Cycles / CPS,
                             Function = evt.CallerAddr })
        .GroupBy(evt => evt, (k,g) => new { key = k, count = g.Count() });
        // counts per (machine, interval, function): the frequency vectors

Vectors OneKMeansStep(Vectors vs, Vectors cs) {
    return vs.GroupBy(v => Nearest(v, cs))                 // assign each vector to nearest center
             .Select(g => g.Aggregate((x,y) => x+y) / g.Count());  // new center = group mean
}

Vectors KMeans(Vectors vs, Vectors cs, int K) {
    for (int i = 0; i < K; ++i)
        cs = OneKMeansStep(vs, cs);
    return cs;
}

Fay K-Means Behavior-Analysis Code

Page 7:

var kernelFunctionFrequencyVectors =
    cluster.Function(kernel, "syscalls!*")
        .Where(evt => evt.time < Now.AddMinutes(3))
        .Select(evt => new { Machine  = fay.MachineID(),
                             Interval = evt.Cycles / CPS,
                             Function = evt.CallerAddr })
        .GroupBy(evt => evt, (k,g) => new { key = k, count = g.Count() });

Fay K-Means Behavior-Analysis Code

Page 8:

Fay vs. Specialized Tracing

• Could’ve built a specialized tool for this
– Automatic categorization of behavior (Fmeter)

• Fay is general, but can efficiently do
– Tracing across abstractions, systems (Magpie)
– Predicated and windowed tracing (Streams) (sketched below)
– Probabilistic tracing (Chopstix)
– Flight recorders, performance counters, …
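
For instance, predicated and windowed tracing needs no special machinery. A minimal sketch in the style of the deck's own queries (illustrative; not checked against Fay's real API, and the 64 KB threshold is a hypothetical choice): the Where clauses supply the time window and the predicate.

    var largeReads =
        cluster.Function("iolib!Read")
            .Where(io => io.time < Now.AddMinutes(10))   // window: the next 10 minutes
            .Where(io => io.Arg(2) > 64 * 1024)          // predicate: reads larger than 64 KB
            .GroupBy(io => fay.MachineID(),
                     (m, g) => new { machine = m, count = g.Count() });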

Page 9:

Key Takeaways

Fay: Flexible monitoring of distributed executions
– Can be applied to existing, live Windows servers

1. Single query specifies both tracing & analysis
– Easy to write & enables automatic optimizations

2. Pervasively data-parallel, scalable processing
– Same model within machines & across clusters

3. Inline, safe machine-code at tracepoints
– Allows us to do computation right at the data source

Page 10:

K-Means: Single, Unified Fay Query

var kernelFunctionFrequencyVectors =
    cluster.Function(kernel, "*")
        .Where(evt => evt.time < Now.AddMinutes(3))
        .Select(evt => new { Machine  = fay.MachineID(),
                             Interval = evt.Cycles / CPS,
                             Function = evt.CallerAddr })
        .GroupBy(evt => evt, (k,g) => new { key = k, count = g.Count() });

Vector Nearest(Vector pt, Vectors centers) {
    var near = centers.First();
    foreach (var c in centers)
        if (Norm(pt - c) < Norm(pt - near))
            near = c;
    return near;
}

Vectors OneKMeansStep(Vectors vs, Vectors cs) {
    return vs.GroupBy(v => Nearest(v, cs))
             .Select(g => g.Aggregate((x,y) => x+y) / g.Count());
}

Vectors KMeans(Vectors vs, Vectors cs, int K) {
    for (int i = 0; i < K; ++i)
        cs = OneKMeansStep(vs, cs);
    return cs;
}

Page 11:

Fay is Data-Parallel on Cluster

• View trace query as distributed computation
• Use cluster for analysis

Page 12:

Fay is Data-Parallel on Cluster

System call trace events
• Fay does early aggregation & data reduction
• Fay knows what’s needed for later analysis (see the sketch below)
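
Why early aggregation shrinks the data, as a minimal sketch (assumed types, not Fay's internals): the later k-means step only needs counts per (machine, interval, function), so each machine can ship one small count record per key instead of every raw event.

    using System.Collections.Generic;
    using System.Linq;

    static class EarlyAggregation {
        public record Evt(int Machine, long Interval, ulong Function);

        // Runs on each traced machine: collapse raw events into compact
        // count records before anything crosses the network.
        public static IEnumerable<KeyValuePair<(int, long, ulong), int>>
            Aggregate(IEnumerable<Evt> events) =>
            events.GroupBy(e => (e.Machine, e.Interval, e.Function))
                  .Select(g => KeyValuePair.Create(g.Key, g.Count()));
    }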

Page 13:

Fay is Data-Parallel on Cluster

System call trace events
• Fay does early aggregation & data reduction

K-Means analysis
• Fay builds an efficient processing plan from the query

Page 14:

Fay is Data-Parallel within Machines

• Early aggregation
• Inline, in the OS kernel
• Reduce dataflow & kernel/user transitions

• Data-parallel per core/thread (sketch below)
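
A sketch of the per-core scheme (simplified assumption, not Fay's actual implementation): every core aggregates into its own table on the hot path, and the tables are merged only when results are flushed, avoiding locks and extra kernel/user transitions.

    using System;
    using System.Collections.Generic;
    using System.Linq;

    static class PerCoreCounters {
        // One table per core; only the thread running on that core writes it.
        static readonly Dictionary<ulong, long>[] tables =
            Enumerable.Range(0, Environment.ProcessorCount)
                      .Select(_ => new Dictionary<ulong, long>())
                      .ToArray();

        public static void Record(int core, ulong functionAddr) {
            var t = tables[core];
            t[functionAddr] = t.GetValueOrDefault(functionAddr) + 1;  // no locking on the hot path
        }

        // Merge all per-core tables once, at reporting time.
        public static Dictionary<ulong, long> Flush() =>
            tables.SelectMany(t => t)
                  .GroupBy(kv => kv.Key)
                  .ToDictionary(g => g.Key, g => g.Sum(kv => kv.Value));
    }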

Page 15:

Processing w/o Fay Optimizations

• Collect data first (on disk)
• Reduce later
• Inefficient, can suffer data overload

[Diagram: a "K-Means: System calls" stage writes all trace data to disk, followed by a separate "K-Means: Clustering" stage]

Page 16:

Traditional Trace Processing

• First log all data (a deluge)
• Process later (centrally)
• Compose tools via scripting

[Diagram: separate "K-Means: System calls" and "K-Means: Clustering" tools composed via scripts]

Page 17:

Takeaways so far

Fay: Flexible monitoring of distributed executions

1. Single query specifies both tracing & analysis

2. Pervasively data-parallel, scalable processing

Page 18:

Safety of Fay Tracing Probes

• A variant of XFI used for safety [OSDI’06]

– Works well in the kernel or any address space
– Can safely use existing stacks, etc.
– Instead of a language interpreter (DTrace)
– Arbitrary, efficient, stateful computation

• Probes can access thread-local/global state
• Probes can try to read any address
– I/O registers are protected (see the sketch below)
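
The flavor of that protection, as a minimal sketch (the address range and helper are hypothetical; real XFI enforces this via inline guards in the rewritten machine code, not a library call):

    using System.Collections.Generic;

    static class GuardedReads {
        // Hypothetical range of memory-mapped I/O registers a probe must not read.
        static readonly List<(ulong lo, ulong hi)> protectedRanges =
            new() { (0xFEC00000UL, 0xFEE00000UL) };

        public static bool MayRead(ulong addr) {
            foreach (var (lo, hi) in protectedRanges)
                if (addr >= lo && addr < hi)
                    return false;   // device registers: never touched
            return true;            // other reads may be attempted; faults are absorbed safely
        }
    }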

Page 19:

Key Takeaways, Again

Fay: Flexible monitoring of distributed executions

1. Single query specifies both tracing & analysis

2. Pervasively data-parallel, scalable processing

3. Inline, safe machine-code at tracepoints

Page 20:


Installing and Executing Fay Tracing

• Fay runtime on each machine
• Fay module in each traced address space
• Tracepoints at hotpatched function boundaries

[Diagram: a trace query reaches the per-machine Fay tracing runtime, which creates XFI-verified probes and hotpatches tracepoints in the target's user-space and kernel address spaces; probe results flow back via ETW; one transition is labeled "200 cycles"]

Page 21:

Low-level Code Instrumentation

Module with a traced function Foo:

Caller:  ...
         e8ab62ffff    call Foo
         ...

         ff1508e70600  call [Dispatcher]  ; 6-byte call placed just before Foo
Foo:     ebf8          jmp Foo-6          ; patched 1st opcode: 2-byte jump back to that call
         cccccc                           ; padding
Foo2:    57            push rdi           ; rest of the original function body
         ...
         c3            ret

• Replace 1st opcode of functions

Page 22:

Low-level Code Instrumentation

Module with a traced function Foo:

Caller:  ...
         e8ab62ffff    call Foo
         ...

         ff1508e70600  call [Dispatcher]
Foo:     ebf8          jmp Foo-6
         cccccc
Foo2:    57            push rdi
         ...
         c3            ret

Fay platform module:

Dispatcher:
    t = lookup(return_addr)     ; find this tracepoint's state
    ...
    call t.entry_probes         ; run entry probes
    ...
    call t.Foo2_trampoline      ; run the original function body
    ...
    call t.return_probes        ; run return probes
    ...
    return                      /* to after call Foo */

• Replace 1st opcode of functions
• Fay dispatcher called via trampoline

Page 23:

Low-level Code Instrumentation

[Diagram: the same instrumented module and dispatcher as on Page 22, with XFI-protected Fay probe functions (PF3, PF4, PF5) invoked from the dispatcher]

• Replace 1st opcode of functions
• Fay dispatcher called via trampoline
• Fay calls the function, and entry & exit probes

Page 24:

What’s Fay’s Performance & Scalability?

• Fay adds 220 to 430 cycles per traced function
• Fay adds 180% CPU to trace all kernel functions
• Both approx. 10x faster than DTrace and SystemTap

[Charts: "Null-probe overhead" in cycles (axis 0–10,000) for Fay, Solaris DTrace, OS X DTrace, and Linux SystemTap; and "Slowdown (x)" (axis 0–30) with bars at 2.8, 17.2, and 26.7, plus one entry marked "Crash"]

Page 25:

Fay Scalability on a Cluster

• Fay tracing memory allocations, in a loop:
– Ran workload on a 128-node, 1024-core cluster
– Spread work over 128 to 1,280,000 threads
– 100% CPU utilization

• Fay overhead was 1% to 11% (mean 7.8%)

Page 26:

More Fay Implementation Details

• Details of query-plan optimizations
• Case studies of different tracing strategies
• Examples of using Fay for performance analysis

• Fay is based on LINQ and Windows specifics
– Could build on Linux using Ftrace, Hadoop, etc.

• Some restrictions apply currently
– E.g., skew towards batch processing due to Dryad

Page 27:

Conclusion

• Fay: Flexible tracing of distributed executions

• Both expressive and efficient
– Unified trace queries
– Pervasive data-parallelism
– Safe machine-code probe processing

• Often as efficient as purpose-built tools

Page 28:

Backup

Page 29:

A Fay Trace Query

from io in cluster.Function("iolib!Read")
where io.time < Now.AddMinutes(5)
let size = io.Arg(2)                      // request size in bytes
group io by size/1024 into g
select new { sizeInKilobytes = g.Key,
             countOfReadIOs  = g.Count() };

• Aggregates read activity in iolib module
• Across cluster, both user-mode & kernel
• Over 5 minutes

Page 30:

A Fay Trace Query

from io in cluster.Function("iolib!Read")
where io.time < Now.AddMinutes(5)
let size = io.Arg(2)                      // request size in bytes
group io by size/1024 into g
select new { sizeInKilobytes = g.Key,
             countOfReadIOs  = g.Count() };

• Specifies what to trace
– 2nd argument of read function in iolib

• And how to aggregate
– Group into KB-size buckets and count

[Chart: histogram of read counts over size buckets 1024, 2048, 4096, 8192; y-axis 0–6000]