Post on 01-Oct-2020


Page 1: Bandwidth-Aware Scheduling for Clustered Multi-Core Systemsperso.ens-lyon.fr/evelyne.blesle/aussois/SLIDES/Sousa...2 CASPER GROUP UCY, Cyprus 1 SiPS GROUP IST, Portugal Bandwidth-Aware

Bandwidth-Aware Scheduling for Clustered Multi-Core Systems

Panayiotis Petrides (2), Frederico Pratas (1), Pedro Trancoso (2) and Leonel Sousa (1)

(1) SiPS Group, IST, Portugal

(2) CASPER Group, UCY, Cyprus


technology from seed

Motivation


Outline

• Target Architectures

• Overall Scheduling/Mapping Method

• Bandwidth and Execution Time Profiling

• Bandwidth-Aware Scheduler (BAS)

• Experimental Results

• Conclusions and Future Work


Target Architectures: Clustered Multi-Core

• Pros:

– Integration: reduces memory latency

– Multiple controllers: larger memory bandwidth

• Cons:

– Die area

– Difficult memory access management

• Bandwidth-Aware Scheduler: balance memory requests among the different controllers

• Existing architectures:

– Intel Single-chip Cloud Computer (SCC)

– Intel Nehalem (approximately the same problem)


Target Architectures: hardware and tools

• Architectural model

Characteristics | SCC-like architecture
Core type | Intel [email protected]
L1 Cache | 4-way, 32KB Data, 32KB Instructions
L2 Cache | 8-way, 2MB Unified
Cache policies | Write-back, write-allocate, no support for coherency
Cluster configurations | 32 (8x4), 64 (16x4), 128 (32x4)
# Controllers | 4
Main memory | DDR3-800 (6.4 GB/s)

• Binary instrumentation

– PIN + cache simulation extension

– Single core execution

• Bandwidth

– The bandwidth is calculated independently for each application using:

» Type of memory accesses

» Number of memory accesses

» Average execution time
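
As a rough illustration of how a per-application bandwidth figure could be derived from the quantities listed on this slide (type and number of memory accesses, average execution time), here is a minimal Python sketch. The 64-byte line size, the 8-byte memory bus width, and the function names are assumptions for illustration; the slides do not state them. The peak figure matches the 6.4 GB/s quoted for DDR3-800 (800 MT/s on an 8-byte bus).

```python
LINE_BYTES = 64  # assumed cache-line size; not given on the slide


def ddr_peak_gbs(mega_transfers_per_s, bus_bytes=8):
    """Peak bandwidth in GB/s for a given transfer rate (MT/s) and bus width."""
    return mega_transfers_per_s * bus_bytes / 1000.0


def app_bandwidth_gbs(read_misses, writebacks, exec_time_s, line_bytes=LINE_BYTES):
    """Average off-chip bandwidth of one application in GB/s.

    Each last-level read miss and each write-back is assumed to move one line.
    """
    traffic_bytes = (read_misses + writebacks) * line_bytes
    return traffic_bytes / exec_time_s / 1e9


if __name__ == "__main__":
    peak = ddr_peak_gbs(800)  # DDR3-800
    demand = app_bandwidth_gbs(40_000_000, 10_000_000, 2.0)  # hypothetical counts
    print(f"peak={peak} GB/s, demand={demand} GB/s, share={demand / peak:.2f}")
```

The miss and write-back counts in the usage example are made up; in the setup described here they would come from the PIN-based cache simulation.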


Overall Scheduling Method

Static

• Memory bandwidth profiling of representative applications from different areas.

• Classification according to the bandwidth requirements

Dynamic

• Dynamic bandwidth sensing

• Use static information to classify applications for run-time scheduling

• Rebalance memory accesses according to the classification

• Distribute (schedule) applications across the multi-core clusters to overcome the “memory wall”

– Main assumption: all cores are busy


Experimental Setup: Set of Applications

Name | Description | Tests
TPC-H | Decision Support benchmark | 16 queries
DEX | Graph-based database query application | 8 queries
MrBayes | Bioinformatics application performing Bayesian inference of phylogeny | 17 DNA data sets
Biobench | Benchmark suite containing different bioinformatics algorithms | phylip_protdist, phylip_protpars, fasta_dna, fasta_protein, and hmmer
NAMD | Computer chemistry application for molecular dynamics simulation | single precision, double precision
PARSEC | Benchmark of representative applications from different areas | Blackscholes, streamcluster, freqmine
Total | | 51 different workloads

Bandwidth and Execution Time Profiling: Classification

• Dimensions considered for characterizing application

– Execution time

– Bandwidth requirements

• Classification used for each application:

Execution time \ Bandwidth | Low (<0.5*AV) | Medium (≥0.5*AV and <1.5*AV) | High (≥1.5*AV)
Short (<10s) | Short-Low | Short-Medium | Short-High
Medium (≥10s and <100s) | Medium-Low | Medium-Medium | Medium-High
Long (≥100s) | Long-Low | Long-Medium | Long-High

• Bandwidth calculation:

– Divide the execution into several phases (quanta).

– Calculate the bandwidth for each phase.

– Calculate the average over all the phases of the application.

• Phase: the smallest period of time considered between two scheduling actions.
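
The per-phase averaging and the 3x3 classification above can be sketched in a few lines of Python. This is only an illustration under assumptions: the slides write the bandwidth thresholds in terms of AV without defining it, so AV is taken here to be an average bandwidth supplied by the caller, and the function names and sample numbers are hypothetical.

```python
def average_bandwidth(bytes_per_phase, phase_seconds):
    """Mean of the per-phase bandwidths (bytes/s), with equal-length phases."""
    per_phase = [b / phase_seconds for b in bytes_per_phase]
    return sum(per_phase) / len(per_phase)


def classify(exec_time_s, bandwidth, av):
    """Return the class label, e.g. 'Short-Low', using the slide's thresholds.

    `av` is the reference average bandwidth (assumed meaning of AV).
    """
    if exec_time_s < 10:
        time_class = "Short"
    elif exec_time_s < 100:
        time_class = "Medium"
    else:
        time_class = "Long"

    if bandwidth < 0.5 * av:
        bw_class = "Low"
    elif bandwidth < 1.5 * av:
        bw_class = "Medium"
    else:
        bw_class = "High"
    return f"{time_class}-{bw_class}"


if __name__ == "__main__":
    bw = average_bandwidth([64, 128, 192], phase_seconds=2.0)  # made-up traffic
    print(bw, classify(exec_time_s=42.0, bandwidth=bw, av=50.0))
```

Boundary values follow the table: an execution time of exactly 10 s is Medium, and a bandwidth of exactly 1.5*AV is High.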

Bandwidth and Execution Time Profiling: Classification (cont.)

• Selection of one representative application per class:

– Calculate the center of each class

– Select the application that is nearest to the center

• Nine representative applications were selected.
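
One way to picture the selection step is the following Python sketch. It assumes each application is a point (execution time, bandwidth), that the class center is the arithmetic mean of those points, and that "nearest" means Euclidean distance; the slides do not say which center or metric was used, and the sample values are invented.

```python
import math


def representative(apps):
    """Pick the application closest to the class center.

    `apps` maps an application name to (exec_time_s, bandwidth).
    """
    n = len(apps)
    center = (
        sum(t for t, _ in apps.values()) / n,
        sum(b for _, b in apps.values()) / n,
    )
    return min(apps, key=lambda name: math.dist(apps[name], center))


if __name__ == "__main__":
    # Hypothetical members of one class.
    short_low = {"app_a": (1.0, 1.0), "app_b": (2.0, 2.0), "app_c": (6.0, 6.0)}
    print(representative(short_low))
```

In practice the two dimensions have very different scales, so normalizing them before measuring distance would probably be needed; that step is left out here.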

Short - Low Short - Medium Short - High

tpch Q3, tpch Q6, tpch Q7, tpch Q8

tpch Q12, tpch Q13, tpch Q16

dex 3 Q4, dex 3 Q8, dex 3 All

dex 4 Q4, dex 4 Q8

dex 5 Q8, dex 5 All

namd single

tpch Q10, tpch Q14 tpch Q15 tpch Q11

Medium - Low Medium - Medium Medium - High

dex 4 All, dex 5 Q4

namd double

freqmine

tpch Q1, tpch Q2 tpch Q9 , streamcluster,

mrbayes 10x5000, 10x20000,

20x5000, 50x1000, 50x1000,

50x5000, 100x1000

Long - Low | Long - Medium | Long - High
phylip protdist, fasta dna, fasta protein | hmmer, phylip protpars | mrbayes 10x50000, 20x20000, 20x50000, 50x20000, 100x5000, 100x20000, 100x50000


Scheduler: Policies Evaluated

• Random Static Scheduler

Bandwidth-agnostic policy representing a common scenario where applications are mapped to cores according to resource availability

• Oracle Scheduler

A policy that uses a priori knowledge of the overall bandwidth characteristics of the applications to define the best static placement of the applications across the chip

• Proposed Bandwidth-Aware Scheduler (BAS)

Proposed policy that takes the different bandwidth demands of the applications into account at run time, in order to satisfy those demands and use the system's bandwidth throughout their execution


BAS: Distributing Applications to Cores

• Different distribution scenarios

– Variation of the number of cores per cluster

– Variation of the distribution of applications:

» Per cluster

» Overall

• Only considered:

– Distributions of applications within the same time category.

– An exploration space reduced to some representative distributions


BAS: Considered Inner-cluster Distributions

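• Considered distributions inside one cluster (share of applications per bandwidth class, one distribution per column):

Low | 100% | 50% | 50% | 50% | 33% | 25% | 0% | 25% | 0% | 0%
Medium | 0% | 50% | 25% | 0% | 33% | 50% | 100% | 25% | 50% | 0%
High | 0% | 0% | 25% | 50% | 33% | 25% | 0% | 50% | 50% | 100%

[Figure legend: distributions are labelled Low:Medium:High by bandwidth class, e.g. 00:00:100, 100:00:00, 50:50:00, 25:25:50]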


BAS: Inner-cluster Distributions (cont.)

• Increasing the number of cores per cluster linearly increases the number of extra phases

• Scalability of multi-core processors is highly dependent on the off-chip memory bandwidth

[Figure: panels for short, medium and long applications]


BAS: Overall Distribution

• Overall distributions considered (one distribution per column):

Low | 50% | 50% | 33% | 0%
Medium | 50% | 0% | 33% | 50%
High | 0% | 50% | 33% | 50%

Note: Distributions with 100% were not considered because they offer no scheduling opportunities. Distributions with 25% were omitted to limit complexity.

• For the random policy all combinations of inner-cluster distributions were considered, for example (one Low:Medium:High distribution per cluster):

– 100:00:00, 00:100:00, 00:00:100, 33:33:33 → Overall = 33:33:33

– 50:50:00, 50:50:00, 00:00:100, 33:33:33 → Overall = 33:33:33

– 25:25:50, 25:50:25, 50:25:25, 33:33:33 → Overall = 33:33:33
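
The three examples above can be checked with a short Python sketch that averages the per-cluster Low:Medium:High percentages. It assumes all four clusters hold the same number of applications, so a plain mean is appropriate; the function name is illustrative only.

```python
def overall_distribution(cluster_distributions):
    """Average (low, medium, high) percentages over equally sized clusters."""
    n = len(cluster_distributions)
    return tuple(
        round(sum(d[k] for d in cluster_distributions) / n) for k in range(3)
    )


EXAMPLES = [
    [(100, 0, 0), (0, 100, 0), (0, 0, 100), (33, 33, 33)],
    [(50, 50, 0), (50, 50, 0), (0, 0, 100), (33, 33, 33)],
    [(25, 25, 50), (25, 50, 25), (50, 25, 25), (33, 33, 33)],
]

if __name__ == "__main__":
    for clusters in EXAMPLES:
        print(clusters, "->", overall_distribution(clusters))
```

Each class sums to 133 across the four clusters, so every example averages to 33.25 and rounds to 33:33:33, even though the inner-cluster placements differ widely.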


BAS algorithm

[Flowchart: Application Execution → Apps. Bandwidth Distribution per cluster → Calculate Bandwidth Utilization → MAX(UBW) > 1? (T: Adaptive Procedure; F: no adaptation)]


BAS algorithm (cont.)

Adaptive Procedure

[Flowchart boxes, in order:]

• UBWa = MAX(UBWi), UBWb = MIN(UBWi); compute BWBal(UBWa, UBWb)

• Get valid new distributions from the Compatibility Distribution Lookup Table

• Calculate UBWai, UBWbi and the new BWBal for the new distributions

• Decision: more ai-bi valid distributions? (T/F)

• Perform application exchanges from cluster a to b

Low complexity: O(n), where n is the size of the compatibility lookup table
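
To make the control flow concrete, here is a minimal Python sketch of one adaptive step. It is not the authors' implementation and rests on several assumptions: UBW of a cluster is taken as the summed bandwidth demand of its applications divided by one controller's peak (6.4 GB/s from the architecture table); BWBal is taken as the utilization gap between the two clusters, which the slides do not define; and the candidate single-application swaps stand in for the entries of the compatibility lookup table.

```python
from itertools import product

CONTROLLER_BW = 6.4  # GB/s per controller, from the architecture table


def utilization(cluster, peak=CONTROLLER_BW):
    """Assumed UBW: requested bandwidth of a cluster over its controller's peak."""
    return sum(cluster) / peak


def adapt(clusters, peak=CONTROLLER_BW):
    """One adaptive step over clusters given as lists of per-app demands (GB/s)."""
    ubw = [utilization(c, peak) for c in clusters]
    if max(ubw) <= 1.0:  # MAX(UBW) > 1 is false: leave the placement alone
        return clusters

    a = ubw.index(max(ubw))  # most loaded cluster
    b = ubw.index(min(ubw))  # least loaded cluster
    best_gap = ubw[a] - ubw[b]  # assumed BWBal(UBWa, UBWb)
    best_swap = None

    # Linear scan over candidate exchanges (stand-in for the lookup table).
    for i, j in product(range(len(clusters[a])), range(len(clusters[b]))):
        delta = (clusters[a][i] - clusters[b][j]) / peak
        gap = abs((ubw[a] - delta) - (ubw[b] + delta))
        if gap < best_gap:
            best_gap, best_swap = gap, (i, j)

    if best_swap is None:  # no exchange improves the balance
        return clusters

    i, j = best_swap
    new_clusters = [list(c) for c in clusters]
    new_clusters[a][i], new_clusters[b][j] = clusters[b][j], clusters[a][i]
    return new_clusters


if __name__ == "__main__":
    placement = [[3.0, 3.0, 1.0, 1.0], [0.5, 0.5, 0.5, 0.5]]  # hypothetical demands
    print(adapt(placement))
```

The scan visits each candidate exchange once, which mirrors the O(n) claim when n is read as the number of candidates. A real scheduler would repeat this step at every scheduling quantum and restrict candidates to distributions marked compatible.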


BAS: Experimental Results

Short applications

[Figure: one panel per overall distribution: 50:50:00, 50:00:50, 00:50:50, 33:33:33]

• The proposed scheduler shows better results

• Only one case with worse results


BAS: Experimental Results

Medium applications

[Figure: one panel per overall distribution: 50:50:00, 50:00:50, 00:50:50, 33:33:33]

• The proposed scheduler shows better results

• Only in two cases is the performance the same as with the Random policy


BAS: Experimental Results

Long applications

[Figure: one panel per overall distribution: 50:50:00, 50:00:50, 00:50:50, 33:33:33]

• The proposed scheduler shows better results in all cases

Application Execution Speedup Using the Bandwidth-Aware Scheduler

Average Speedups

• Short applications: 1.36x

• Medium applications: 1.48x

• Long applications: 1.46x

[Figure: speedup panels for short, medium and long applications]


Results Analysis

• The majority of the application distributions benefit from the proposed bandwidth-aware scheduler

• The performance of the proposed bandwidth-aware scheduler is very close to that of the Oracle policy

• Stable performance of the proposed scheduler

• Multi-core scalability can benefit from the use of the proposed bandwidth-aware scheduler


Conclusions

• We have shown:

– The importance of having a bandwidth-aware scheduling policy in clustered multi-core architectures

– There are benefits even for short applications

– The scaling of multi-cores is highly correlated with the available bandwidth

• We have proposed:

– A simple dynamic bandwidth-aware scheduler

– A set of representative applications with different bandwidth and time requirements

– Scaling multi-cores with the use of the proposed bandwidth-aware scheduler


Future work

• We are performing experimental work on the SCC (Intel donated an SCC system to us)

• Investigation of different but still simple scheduling algorithms, also with more accurate cost functions

• Can we integrate this work in automatic tools at the compiler and OS levels?

– Rephrasing the question: can we expect to have automatic scheduling in this type of system to overcome memory bandwidth limitations?


Thank You!

http://www.sips.inesc-id.pt

http://www.cs.ucy.ac.cy/carch/casper

Questions?