
An On/Off Link Activation Method for Low-Power Ethernet in PC Clusters

Michihiro Koibuchi (NII, Japan)
Tomohiro Otsuka (Keio U, Japan)
Hiroki Matsutani (U of Tokyo, Japan)
Hideharu Amano (Keio U / NII, Japan)


HPC PC Clusters with Ethernet

• Host/CPU
  – Various low-power techniques are used
    • DVFS
    • Power gating
• Ethernet switch
  – Always active, ready for packet injection

We propose and evaluate a low-power technique for Ethernet switches in PC clusters.

[Figure: PCs connected to an Ethernet switch]

[Figure: interconnect share in the TOP500 list (Nov 2008); Gigabit Ethernet (GbE) accounts for 56%]


Outline

• Ethernet for HPC
  – Link aggregation (channel group) + multi-paths
• On/Off link activation method
• Evaluations
  – Overhead of On/Off link operation
  – Performance and power consumption of PC clusters


Ethernet on HPC systems

Increasing the number of ports of GbE switches

- 24/48-port switches provide the lowest cost per port

Improving the computation power of hosts (> 10 GFlops)

Link aggregation [IEEE 802.3ad] + multi-path topology [Kudoh, IEEE Cluster 2004] [Viking, Infocom 2004]

- drastically increasing the number of links

[Figure: switches and hosts 0-15 connected by four trees (TREE 1 to TREE 4), giving 4 paths; link aggregation using 3 links]


Power consumption of GbE switches

• Power consumption is almost constant regardless of traffic load
• The number of activated ports dominates the power consumption of switches
  – The power consumption of a port is reduced to zero by the port-shutdown operation

Unit: W

Product   Port   Other (Xbar)   Total (ratio of ports)
PC5324    1.2    14.9           42.9 (65%)
PC6224    2.0    42.5           91.1 (53%)
PC6248    2.1    56.8           155.2 (63%)
SF-420    1.0    32.6           55.4 (41%)
C-3750    1.8    84.5           127.7 (34%)
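The ratio column of the table can be reproduced from the other two columns. The sketch below assumes that the port share is (Total − Other) / Total; the slide does not state this formula, but it matches the printed percentages. The dictionary layout and the name `port_share` are illustrative.

```python
# Values copied from the power table (unit: W).
# Assumption: everything that is not "Other (Xbar)" is consumed by the ports.
switches = {
    "PC5324": {"other": 14.9, "total": 42.9},
    "PC6224": {"other": 42.5, "total": 91.1},
    "PC6248": {"other": 56.8, "total": 155.2},
    "SF-420": {"other": 32.6, "total": 55.4},
    "C-3750": {"other": 84.5, "total": 127.7},
}


def port_share(other_w, total_w):
    """Fraction of the total switch power attributed to its ports."""
    return (total_w - other_w) / total_w


for name, p in switches.items():
    print(f"{name}: {port_share(p['other'], p['total']):.0%}")
```

Under this assumption, every watt saved by shutting down ports comes out of the port share, which is why the share bounds the achievable reduction per switch.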


Overview of the on/off link method

Network load is not always high (e.g. during computation time).

Switch ports consume 40-60% of the total power.

When the traffic load becomes low, a part of the links is turned off.

[Figure: the four-tree topology (TREE 1 to TREE 4, hosts 0-15) before and after a part of the links is turned off]



A framework of the on/off link method

Flowchart:

• Traffic monitoring (e.g. port monitor, IPTraf, pilot execution)
• Do low- or high-load links appear? (No: keep monitoring / Yes: continue)
• Selection of on/off links and paths
• Update of on/off link operation

Very crucial factor: how is it implemented on Ethernet?

[Figure: the four-tree topology (TREE 1 to TREE 4, hosts 0-15) when a low traffic load is detected, showing the paths before and after the old path is deactivated]
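The decision step of this framework can be sketched as a small function. This is only an illustration: the thresholds `low` and `high`, the link names, and the function `control_step` are hypothetical and do not come from the slides, which leave the monitoring tool and the detection criteria open.

```python
def control_step(utilization, low=0.1, high=0.8):
    """Decide what to do for one monitoring sample.

    utilization maps a link name to the fraction of its capacity in use.
    The thresholds are illustrative assumptions, not values from the talk.
    """
    if any(u >= high for u in utilization.values()):
        return "activate"      # a high-load link appeared: add capacity
    if any(u <= low for u in utilization.values()):
        return "deactivate"    # a low-load link appeared: turn links off
    return "keep"              # neither: go back to traffic monitoring


# Three hypothetical samples, e.g. taken from switch port counters.
samples = [
    {"link0": 0.50, "link1": 0.45},
    {"link0": 0.05, "link1": 0.04},
    {"link0": 0.95, "link1": 0.30},
]
for sample in samples:
    print(control_step(sample))
```

Checking for high load first is a deliberate choice in this sketch: when both conditions hold, restoring capacity takes priority over saving power.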


Requirements for the on/off link method

To achieve a practical on/off link activation method:

No update of the MPI communication library

Using existing functions of commercial switches

Hiding the overhead to activate the link

Stabilizing the MAC address tables while updating paths

- Avoiding broadcast storms and communication interruption

[Figure: the four-tree topology (TREE 1 to TREE 4) of switches and hosts 0-15, showing the paths before and after an update]


Changing the paths for on/off link operation

• Using the switch-tagged VLAN routing method [Otsuka, ICPP06]
  – Specifying the path by attaching a VLAN tag to a frame (Port VLAN ID: PVID)
  – Each host sends and receives usual (untagged) frames
    • When a frame arrives at a switch from a host, the switch adds a VLAN tag (PVID) to it
    • When the frame leaves for a host, the switch removes the VLAN tag

[Figure: hosts (computers) 0-7 with two VLANs, v0 and v1; the path of PVID #v0 and the path of PVID #v1 are shown; at a port with PVID v0, VLAN tag #v0 is attached]
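The tagging behaviour at the edge ports can be modelled in a few lines. This is a toy model, not switch configuration: the `Frame` class, the function names, and the host names are hypothetical. It only shows that hosts see untagged frames while the PVID of the ingress port selects the VLAN, and therefore the path.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Frame:
    src: str
    dst: str
    vlan: Optional[str] = None   # None means the frame is untagged


def ingress_from_host(frame, port_pvid):
    """Edge port, host to switch: tag the untagged frame with the port's PVID."""
    return Frame(frame.src, frame.dst, vlan=port_pvid)


def egress_to_host(frame):
    """Edge port, switch to host: remove the VLAN tag before delivery."""
    return Frame(frame.src, frame.dst, vlan=None)


pvid = {"host0": "v0"}                     # PVID of the port that host0 attaches to
sent = Frame("host0", "host7")             # the host sends a usual untagged frame
tagged = ingress_from_host(sent, pvid["host0"])
delivered = egress_to_host(tagged)
print(tagged.vlan, delivered.vlan)
```

Changing `pvid["host0"]` to `"v1"` moves later frames onto the v1 path without any change on the host, which is why the method needs no update of the MPI library.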


When a deactivated link is activated (when the traffic increases)

• (1) Activate the target link
  – Using the no-shutdown command of the switch
• (2) Create VLAN v0 for the new path set that includes the target link, and build its MAC address table
• (3) Update the PVIDs of the ports connecting hosts to v0

[Figure: hosts (computers) 0-7 in three stages: before; steps 1 and 2 (link on, VLAN v0 created); step 3 (PVIDs updated to v0)]
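The three activation steps can be replayed on a toy state to see that the paths in use stay valid after every step. The data structures (`link_up`, `vlan_links`, `pvid`) and both function names are assumptions made for this sketch; on a real switch each step would be a management command.

```python
def paths_usable(link_up, vlan_links, pvid):
    """True if every VLAN selected by some host port uses only links that are up."""
    return all(link_up[link]
               for vlan in set(pvid.values())
               for link in vlan_links[vlan])


def activate_link(target, new_vlan, new_links, link_up, vlan_links, pvid):
    """Apply steps (1) to (3) in order, checking the paths after each step."""
    link_up[target] = True                   # (1) no-shutdown on the target link
    assert paths_usable(link_up, vlan_links, pvid)
    vlan_links[new_vlan] = set(new_links)    # (2) VLAN for the new path set
    assert paths_usable(link_up, vlan_links, pvid)
    for port in pvid:                        # (3) move host ports to the new VLAN
        pvid[port] = new_vlan
    assert paths_usable(link_up, vlan_links, pvid)


# Toy state: link B is off and all hosts use VLAN v1, which avoids B.
link_up = {"A": True, "B": False}
vlan_links = {"v1": {"A"}}
pvid = {"host0": "v1", "host1": "v1"}

activate_link("B", "v0", {"A", "B"}, link_up, vlan_links, pvid)
print(link_up, pvid)
```

Because the PVIDs change only after the link is up, the time the link needs to come up is hidden from the running application.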


When an activated link is deactivated (when the traffic decreases)

• (1) Create VLAN v1 for the new path set that avoids the target link, and build its MAC address table
• (2) Update the PVIDs of the ports connecting hosts to v1
• (3) Deactivate the link

[Figure: hosts (computers) 0-7 in three stages: before (the path of PVID v0); steps 1 and 2 (PVID changed from v0 to v1, the path of PVID v1); step 3 (the link is deactivated)]
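Why the link is shut down last can be checked with a small simulation that applies the deactivation steps in a given order and reports the first step after which host traffic would be mapped onto a link that is down. The two-link state and the step names are hypothetical.

```python
def first_unsafe_step(order):
    """Apply the deactivation steps in `order` on a toy two-link network.

    Returns the first step after which some host port selects a VLAN that
    contains a down link, or None if no such moment occurs.
    """
    link_up = {"A": True, "B": True}          # B is the link to deactivate
    vlan_links = {"v0": {"A", "B"}}           # current path set uses both links
    pvid = {"host0": "v0", "host1": "v0"}
    for step in order:
        if step == "create_vlan":
            vlan_links["v1"] = {"A"}          # new path set that avoids B
        elif step == "update_pvid":
            for port in pvid:
                pvid[port] = "v1"
        elif step == "shutdown":
            link_up["B"] = False
        in_use = {link for vlan in pvid.values() for link in vlan_links[vlan]}
        if not all(link_up[link] for link in in_use):
            return step
    return None


print(first_unsafe_step(["create_vlan", "update_pvid", "shutdown"]))
print(first_unsafe_step(["create_vlan", "shutdown", "update_pvid"]))
```

In this model only the order given on the slide (create the VLAN, update the PVIDs, then deactivate) passes without an unsafe moment.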


Outline

• Ethernet for HPC
  – Link aggregation (channel group) + multi-paths
• On/Off link activation method
• Evaluations
  – Overhead of On/Off link operation
    • On/off link operation
    • Overhead to modify the path set
  – Performance and power consumption of PC clusters

Dell 5324, 6224 (24 ports), 6248 (48 ports), Netgear SF-G0420 (24 ports)

We can buy them for $1,000-3,000


Fund. eval: On/Off overhead

A link is operated continuously: on, off, on

On/Off link operation (sec)
PC5324    4.0
PC6224    3.4
PC6248    2.2
SF-420    12.0

• When STP is enabled, the overhead grows to between some dozens of seconds and 1 min
• To hide this overhead, paths should be updated after the on/off operation completes

[Figure: two hosts (computers) connected through the switch under test]


Fund. eval (2): overhead to update paths

• Measure the overhead of changing paths using VLANs
• Communication is not interrupted!
  – Enabling runtime on/off link activation

Path update (sec)
PC5324    0
PC6224    0
PC6248    0
SF-420    0

[Figure: two hosts (computers) before and after the PVID is updated to v1, switching the path from VLAN v0 to VLAN v1]


Performance evaluation on a PC cluster

• PC cluster
  – 128 hosts, Dual Opteron 1.8GHz x2
  – MPICH 1.2.7p1
• GbE switch
  – Dell PowerConnect 6248
    • 28 hosts per switch
    • 48 ports × 8 switches
• Application
  – NPB 3.2


Topology of the cluster

• Peak: 4×2 torus, 6 links between switches
  – Enabling link aggregation (IEEE 802.3ad)
• Pre-executing the applications to estimate the traffic amount
  – Set up the on/off link set before executing
• Two on/off link selection algorithms
  – Conservative: maintain the maximum amount of traffic on a link
  – Aggressive: further power reduction (details are in the proceedings)

[Figure: torus topology]
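One possible reading of the conservative policy is sketched below; the actual algorithm is described in the proceedings and may differ. The assumptions here are that traffic between two switches is spread evenly over their aggregated links, that the paths stay fixed, and that at least one link per switch pair stays on. A pair then keeps the fewest links for which its per-link traffic does not exceed the largest per-link traffic of the peak (all links on) configuration. The function name and the traffic figures are hypothetical.

```python
import math


def conservative_links(traffic, links_per_pair=6):
    """Return {switch_pair: number of links to keep on}.

    traffic maps a switch pair to the traffic amount measured in the
    pre-execution run (any consistent unit).
    """
    # Largest per-link traffic when all links are on (the peak configuration).
    peak_max = max(traffic.values()) / links_per_pair
    keep = {}
    for pair, amount in traffic.items():
        needed = math.ceil(amount / peak_max) if peak_max > 0 else 1
        # Clamp: never more than the installed links, never fewer than one.
        keep[pair] = max(1, min(links_per_pair, needed))
    return keep


# Hypothetical traffic amounts for three switch pairs.
measured = {("s0", "s1"): 600, ("s1", "s2"): 300, ("s2", "s3"): 50}
kept = conservative_links(measured)
print(kept)
print("links turned off:", sum(6 - n for n in kept.values()))
```

With these figures the busiest pair keeps all 6 links, so the maximum per-link traffic is unchanged, while the lightly loaded pairs give up links.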


Results of NPB (64 procs, PC6248 switches)

[Fig 1: Performance. Relative Mop/s for EP, IS, LU, SP under peak (all links), conservative, and aggressive; bars are labelled with the number of off links (labels as extracted: 35, 14, 24, 10, 40, 11, 40, 53)]

[Fig 2: Power consumption of the networks, PC6248s. Relative power consumption (W) for EP, IS, LU, SP under peak (all links), conservative, and aggressive]

26% of the network power consumption is reduced without performance degradation

The conservative policy maintained almost the peak performance


Results of NPB (64 procs, other switches)

[Fig 3: Power consumption, SF-420s. Relative power consumption (W) for EP, IS, LU, SP under peak (all links), conservative, and aggressive]

[Fig 4: Power consumption, PC5324. Relative power consumption (W) for EP, IS, LU, SP under peak (all links), conservative, and aggressive]

37% of power reduction

The L2 switches reduce a larger ratio of power consumption

A smaller number of services is always running in the L2 switch (PC5324) than in the L3 switch (PC6248)


Related Work

• On/Off interconnection networks
  – Cannot be directly applied to Ethernet
  – M. Alonso [IPDPS05], V. Soteriou [TPDS07]
  – Our on/off link method makes it possible to support some of them on Ethernet
• DVFS for interconnection networks
  – L. Shang [HPCA03], J. M. Stine [CAL04]
  – Using multi-speed Ethernet (10M/100M/GbE/10GE) is similar to the DVFS approach
    • Dell switch PC6248: 10M: 1.1 W, 100M: 1.3 W, GbE: 2.1 W
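The multi-speed figures for the PC6248 can be compared with a port shutdown in a short calculation. The three speed values are taken from this slide; the 0 W value for a shut-down port comes from the earlier slide on switch power consumption. The dictionary and the function name are illustrative.

```python
# Power of one PC6248 port in each mode (W).
port_power_w = {"GbE": 2.1, "100M": 1.3, "10M": 1.1, "off": 0.0}


def saving_vs_gbe(mode):
    """Fraction of the GbE port power saved by switching the port to `mode`."""
    return 1 - port_power_w[mode] / port_power_w["GbE"]


for mode in ("100M", "10M", "off"):
    print(f"{mode}: {saving_vs_gbe(mode):.0%} saved per port")
```

Even the slowest speed keeps roughly half of the GbE port power, whereas a shut-down port consumes none.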


Conclusions

• We propose the on/off link method on Ethernet
  – Using the port-shutdown command to reduce power consumption
    • Switch ports consume up to 60% of the power consumption of a GbE switch
  – Stabilizing the update of the MAC address table
• Evaluations on the PC cluster with GbE switches
  – No overhead to update paths
  – Reducing the network power consumption by up to 37%
• We will provide a total Ethernet solution for low-power PC clusters
  – Link aggregation + multi-path topology + on/off links