network heartbeat in linux

24

Upload: others

Post on 10-Jun-2022

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Network Heartbeat in Linux

1/24

Network Heartbeat in Linux

Francesco Piermaria

[email protected]

Linux Kernel Hacking Free Course IV Edition

Rome, 14 May 2008

Francesco Piermaria Network Heartbeat in Linux

Page 2: Network Heartbeat in Linux

2/24

Outline Goal Timekeeping Data Structures Implementation Test Conclusions Bibliography

Outline

1 Goal

CAOS

Tick Synchronization

2 Timekeeping Data Structures

Clocksource Device

Clock Event Device

3 Implementation

Network Event Device

Nettick

Timer Interrupt Emulation

4 Test

Environment

Results

5 Conclusions

Francesco Piermaria Network Heartbeat in Linux

Page 3: Network Heartbeat in Linux

3/24

Outline Goal Timekeeping Data Structures Implementation Test Conclusions Bibliography

Goal

Synchronize all cluster nodes ticks

To implement a base component of Cluster Advanced

Operating System (CAOS)

To improve HPC application performances

Francesco Piermaria Network Heartbeat in Linux

Page 4: Network Heartbeat in Linux

4/24

Outline Goal Timekeeping Data Structures Implementation Test Conclusions Bibliography

Cluster Advanced Operating System

Cluster Advanced Operating System (CAOS):

Born from a System Programming Research Group's idea

A forthcoming distributed operating system

What's new:

Synchronize ticks and tick's related activities (timekeeping,

process accounting, pro�ling, etc.)

Globally schedule: forcing all nodes in the cluster to perform inthe same moment

Software timers related activities (�ush cache, etc.)System asynchronous activities (page frame reclaiming, timesharing, etc.)

O�er additional features like distributed data check-pointing,

distributed debugging, process migration etc.

Here we focus on the Network Heartbeat solution, which deals with

the �rst pointFrancesco Piermaria Network Heartbeat in Linux

Page 5: Network Heartbeat in Linux

5/24

Outline Goal Timekeeping Data Structures Implementation Test Conclusions Bibliography

Tick Synchronization

Improve HPC application performance

A possible solution is to force all nodes to perform system activities

in a synchronous way, as shown in �gure

(a) co-scheduled OS noise

It's not easy to achieve global

synchronization of system

activities for the whole cluster, in

such a way to be sure that all

nodes will execute the activities

exactly in the same moment

Francesco Piermaria Network Heartbeat in Linux

Page 6: Network Heartbeat in Linux

6/24

Outline Goal Timekeeping Data Structures Implementation Test Conclusions Bibliography

Tick Synchronization (2)

Why is not easy to achieve global synchronization of system

activities?

Each node uses his own timer device to measure time

Even timer devices of the same type oscillate at slightly

di�erent frequencies

Tick period is slightly di�erent from node to node

Machines have di�erent perception of the �ow of time

Francesco Piermaria Network Heartbeat in Linux

Page 7: Network Heartbeat in Linux

7/24

Outline Goal Timekeeping Data Structures Implementation Test Conclusions Bibliography

Tick Synchronization (3)

Goal: To synchronize cluster's nodes system activities

We have to synchronize the �ow of time among nodes

Network Time Protocol (NTP) is NOT a solution! Itsynchronizes only the Wall Clock Time...

We have to synchronize system ticks

We need a general mechanism to schedule overall cluster's

system activities at the same future time

As a CAOS's �rst implementation step, we introduce the Network

Heartbeat, which allows tick synchronization.

Francesco Piermaria Network Heartbeat in Linux

Page 8: Network Heartbeat in Linux

8/24

Outline Goal Timekeeping Data Structures Implementation Test Conclusions Bibliography

Network Heartbeat Schema

(b) network heartbeat

Francesco Piermaria Network Heartbeat in Linux

Page 9: Network Heartbeat in Linux

9/24

Outline Goal Timekeeping Data Structures Implementation Test Conclusions Bibliography

Timekeeping Kernel's Related Data Structures

Linux kernel o�ers speci�c abstract data structures to represent

time related hardware devices:

struct clocksource: provides a time value to the kernel

struct clock_event_device: noti�es the kernel that a well

de�ned amount of time is elapsed

struct tick_device: represents the best clock event device

available to the kernel

Francesco Piermaria Network Heartbeat in Linux

Page 10: Network Heartbeat in Linux

10/24

Outline Goal Timekeeping Data Structures Implementation Test Conclusions Bibliography

Clocksource Device

The Linux kernel keeps track of the best available clocksource in

the clock global variable

s t r u c t c l o c k s o u r c e {char ∗name ;i n t r a t i n g ;cyc l e_t (∗ r ead ) ( vo id ) ;u32 mult , s h i f t ;unsigned long f l a g s ;c yc l e_t c y c l e_ l a s t ;/∗ . . . ∗/

} ;

The clocksource data type is

used to represent hardware like

TSC, PIT, HPET, ACPI_PM,

TIMEBASE, etc.

I.e. the update_wall_time() function updates wall clock time

using the clock variable

o f f s e t = c lock−>read ( ) − c lock−>cy c l e_ l a s t ;x t ime . tv_nsec += o f f s e t ∗ c lock−>mult ;

Francesco Piermaria Network Heartbeat in Linux

Page 11: Network Heartbeat in Linux

11/24

Outline Goal Timekeeping Data Structures Implementation Test Conclusions Bibliography

Clock Event Device

Clock Event Devices are used to raise hardware timer interrupts at

speci�ed time

s t r u c t c lock_event_dev i ce {const char ∗name ;unsigned long max_delta_ns ,

min_delta_ns ;unsigned long mult , f e a t u r e s ;i n t s h i f t , r a t i n g , i r q ;cpumask_t cpumask ;i n t (∗ set_next_event ) ( unsigned

long ev t ) ;vo id (∗ set_mode ) (enum

clock_event_mode mode ) ;vo id (∗ event_hand l e r ) ( ) ;ktime_t next_event ;/∗ . . . ∗/

} ;

The clock event data type is

used to represent hardware

like LAPIC, HPET,

DECREMENTER, etc.

The function set_next_event(msec) is used to program the time

of the next time event

Francesco Piermaria Network Heartbeat in Linux

Page 12: Network Heartbeat in Linux

12/24

Outline Goal Timekeeping Data Structures Implementation Test Conclusions Bibliography

Tick Device

The Linux kernel keeps track of the best available clock event

device in the tick_cpu_device per-CPU variable (tick device for

short)

If High Resolution Timers are enabled:

The tick device raises an hardware interrupt at the time of the

next time event (not necessarily a tick time event)

s t r u c t t i c k_de v i c e {s t r u c t c lock_event_dev i ce

∗ ev tdev ;enum t ick_device_mode

mode ;} ;

The mode �eld can be

periodic : periodic tick mode

oneshot: dynamic tick mode

Francesco Piermaria Network Heartbeat in Linux

Page 13: Network Heartbeat in Linux

13/24

Outline Goal Timekeeping Data Structures Implementation Test Conclusions Bibliography

Network Heartbeat Main Topics

The Network Heartbeat:

Deals with the problem of the overall cluster's nodes

tick synchronization

Patch for the Linux 2.6.24 kernel

Designed for Symmetric-Multi-Processing (SMP) systems

Developed for the Intel IA32, Intel64 and PowerPC64

architectures

Based on the Ethernet communication channel

Permits to dynamically change node's tick frequency

Francesco Piermaria Network Heartbeat in Linux

Page 14: Network Heartbeat in Linux

14/24

Outline Goal Timekeeping Data Structures Implementation Test Conclusions Bibliography

Network Heartbeat Implementation

The Network Heartbeat Implementation is built on two software

components:

Network Event Device: a virtual per-CPU timer event

device, which replaces local metronomes

Nettick: an interrupt emulation module, which translates the

event �new Ethernet frame� in a timer interrupt event

Francesco Piermaria Network Heartbeat in Linux

Page 15: Network Heartbeat in Linux

15/24

Outline Goal Timekeeping Data Structures Implementation Test Conclusions Bibliography

Network Heartbeat Implementation (2)

The nettick module registers a new network protocol. It

accomplishes the following:

Recognize a new Ethernet frame type, introduced to notify the

tick event type

Send an Inter Processor Interrupt to all the online CPUs, thus

�simulating� a local timer interrupt

The Network Event Device net_event :

Handles the IPI sent by the nettick module and instructs the

CPU on which is registered to execute tick and time

management related functions

Exports an user interface which allows users to choose the

kernel operating mode (global metronome or local

metronomes)

Francesco Piermaria Network Heartbeat in Linux

Page 16: Network Heartbeat in Linux

16/24

Outline Goal Timekeeping Data Structures Implementation Test Conclusions Bibliography

Network Event Device Implementation

The virtual Network Event Device implements the set_mode() and

set_next_event() methods

s t a t i c s t r u c t c lock_event_dev i cenet_event = {. name = "net_event " ,. f e a t u r e s = CLOCK_EVT_FEAT_ONESHOT,. set_mode = net_timer_setup ,. set_next_event = net_next_event ,. set_new_rate = set_new_rating ,. event_hand l e r = net_handle_noop ,. r a t i n g = 0 ,. i r q = −1,. nr_events = 0 ,

} ;DEFINE_PER_CPU( s t r u c t c lock_event_dev ice

, net_events ) ;

The (*event_handler)()

will be re-associated to the

right handler either when

device is registered or

when rating changes.

The timekeeping framework's interface was extended to make

possible to change the rating value of a clock event device

Francesco Piermaria Network Heartbeat in Linux

Page 17: Network Heartbeat in Linux

17/24

Outline Goal Timekeeping Data Structures Implementation Test Conclusions Bibliography

Nettick Implementation

The nettick module registers a new network protocol. As in

example, on the Intel x86 architecture:

s t a t i c s t r u c t packet_typenet t i ck_packe t_type = {/∗ ETH_P_NETTICK 0x88CB ∗/. type = htons (ETH_P_NETTICK) ,. func = ne t t i c k_rcv ,

} ;dev_add_pack(&net t i ck_packet_type ) ;

. . .n e t t i c k_ r c v ( ) {

send_ipi_mask ( cpu_online_mask ,NETTICK_TIMER_VECTOR) ;

}

The nettick protocol

handler sends an Inter

Processor Interrupt to

all online CPUs.

N.B. the smp_call_function() function cannot be used because

the nettick_rcv() handler is executed in softirq context!

Francesco Piermaria Network Heartbeat in Linux

Page 18: Network Heartbeat in Linux

18/24

Outline Goal Timekeeping Data Structures Implementation Test Conclusions Bibliography

Timer Interrupt Emulation

The new network protocol is required to make the implementation

independent of the particular Network Interface Card's driver

(c) From Ethernet Frame to Timer InterruptEmulation

nettick handler is

executed in soft irq

context

tick related activities

must be executed in

hard irq context

IPI is the only way to

interrupt other CPUs

Francesco Piermaria Network Heartbeat in Linux

Page 19: Network Heartbeat in Linux

19/24

Outline Goal Timekeeping Data Structures Implementation Test Conclusions Bibliography

Timer Interrupt Emulation (2)

A new IPI message has to be registered by calling the

set_intr_gate().

The IPI network tick message handler cleans the interrupt

channel and then call the following function

l o c a l_n e t t i c k_ t ime r_ i n t e r r u p t ( ) {s t r u c t c lock_event_dev i ce ∗dev =

&__get_cpu_var ( net_events ) ;dev−>nr_events++;dev−>event_hand l e r ( dev ) ;

}

1 Gets the ref. to the

per-CPU network event

device

2 Increments the device's

event counter

3 Enters the timekeeping

system

The chain of events is identical as the chain originated by a

local timer interrupt!

Francesco Piermaria Network Heartbeat in Linux

Page 20: Network Heartbeat in Linux

20/24

Outline Goal Timekeeping Data Structures Implementation Test Conclusions Bibliography

Test Environment

Tests was performed on a 24-nodes Apple Xserve cluster,

generously o�ered by Italian Defence's General Stu�

Each node is equipped of 2 dual core Intel Xeon 5150

processors (freq. 2.66GHz) overall 96 cores, 4GByte of RAM

and 2 Gigabit Ethernet Network Interface Cards

One of the 24 nodes was employed as master node

(d) Cluster Front View (e) Cluster Back View

Francesco Piermaria Network Heartbeat in Linux

Page 21: Network Heartbeat in Linux

21/24

Outline Goal Timekeeping Data Structures Implementation Test Conclusions Bibliography

Network Heartbeat Test

Tests goal: overhead and scalability

NAS Parallel Benchmark - Embarrassing Parallel

(f) Benchmark EP Results

Francesco Piermaria Network Heartbeat in Linux

Page 22: Network Heartbeat in Linux

22/24

Outline Goal Timekeeping Data Structures Implementation Test Conclusions Bibliography

Network Heartbeat Test (2)

Tests goal: overhead and scalability

NAS Parallel Benchmark - Embarrassing Parallel

(g) Benchmark EP E�ciency

Francesco Piermaria Network Heartbeat in Linux

Page 23: Network Heartbeat in Linux

23/24

Outline Goal Timekeeping Data Structures Implementation Test Conclusions Bibliography

Conclusions and Future Works

The Network Heartbeat allows to synchronize cluster's nodes ticks

by means a global metronome

Tests highlight that proposed approach:

does not introduce overhead

scales adequately when cluster's nodes increase in number

Future works:

Extend nettick protocol to allow master node to schedule

activities in a centralized way

Francesco Piermaria Network Heartbeat in Linux

Page 24: Network Heartbeat in Linux

24/24

Outline Goal Timekeeping Data Structures Implementation Test Conclusions Bibliography

Bibliography

D.P. Bovet, M. Cesati, Understanding the Linux Kernel, 2nd

Edition, O'Reilly

R. Gioiosa and F. Petrini and K. Davis and F.

Lebaillif-Delamare Analysis of System Overhead on Parallel

Computers, The 4th IEEE International Symposium on Signal

Processing and Information Technology (ISSPIT 2004)

D. Tsafrir and Y. Etsion and D.G. Feitelson and S. Kirkpatrick

System noise, OS clock ticks, and �ne-grained parallel

applications, ICS '05: Proceedings of the 19th annual

international conference on Supercomputing, Cambridge,

Massachusetts, 303�312

Francesco Piermaria Network Heartbeat in Linux