
Department of Electrical and Computer Engineering
University of Massachusetts, Amherst

Xin Huang and Tilman Wolf
{xhuang,wolf}@ecs.umass.edu

A Methodology for Evaluating Runtime Support in Network Processors


Runtime Support in Network Processor

Network processor (NP)
• Multi-core system-on-chip
• Programmability & high packet processing rate

Heterogeneous resources
• Control processors
• Multiple packet processors
• Co-processors
• Memory hierarchy
• Interconnection

Runtime support
• Dynamic task allocation

[Figure: IXP 2800 block diagram with receive and transmit, scratchpad, hash unit, 16 microengines (μE), SRAM and DRAM interface, and XScale control processor]


General Operation of Runtime Support in NP

Input
• Hardware resources
• Workload

Mapping method

Output
• Task allocation

Dynamic adaptation
• Different runtime support systems
• Difficult to compare

[Figure: the NP hardware resources (receive and transmit, scratchpad, hash unit, μEs, SRAM and DRAM interface, XScale control processor, SRAM, Flash, memory-mapped I/O, SDRAM) and the workload (applications AP1, AP2, AP3) feed the runtime mapping, which outputs the task allocation on the processors]


Contributions

Evaluation methodology
• Traffic representation
• Analytical system model based on queuing networks
• Results

Specific: 3 example runtime support systems
I. Ideal Allocation
II. Full Processor Allocation
• R. Kokku, T. Riche, A. Kunze, J. Mudigonda, J. Jason, and H. Vin. A case for run-time adaptation in packet processing systems. In Proc. of the 2nd Workshop on Hot Topics in Networks (HOTNETS-II), Cambridge, MA, Nov. 2003
III. Partitioned Application Allocation
• T. Wolf, N. Weng, and C.-H. Tai. Design considerations for network processor operating systems. In Proc. of ACM/IEEE Symposium on Architectures for Networking and Communication Systems (ANCS), pages 71-80, Princeton, NJ, Oct. 2005


Outline

Introduction
Evaluation Methodology
• Dynamic Workload Model
• Runtime System Model
Results
Summary


Workload

NP workload is characterized by applications and traffic

How to represent workload?


Dynamic Workload Model

Workload graph: $W = (T, U)$
• Application/Task: $T$
• Traffic: $U$
• Processing requirement: $D(t_i)$

Example: [Figure: example workload graph]

Processing requirement:
• R. Ramaswamy and T. Wolf. PacketBench: A tool for workload characterization of network processing. In Proc. of IEEE 6th Annual Workshop on Workload Characterization (WWC-6), pages 42-50, Austin, TX, Oct. 2003
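One way to picture the workload graph is as a small data structure: tasks carry a per-packet processing requirement and edges carry traffic. The sketch below is only an illustration. The class name, the task names, the units (instructions per packet, packets per second) and all numbers are assumptions, not values from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class WorkloadGraph:
    """Workload graph W = (T, U): tasks T and traffic U on the edges between them."""

    # Processing requirement D(t_i); assumed unit: instructions per packet.
    demand: dict = field(default_factory=dict)
    # Traffic U; assumed unit: packets per second on each (source, destination) edge.
    # The source name "in" is a hypothetical marker for external arrivals.
    traffic: dict = field(default_factory=dict)

    def arrival_rate(self, task):
        """Total packet rate entering a task, summed over its incoming edges."""
        return sum(rate for (_, dst), rate in self.traffic.items() if dst == task)

    def offered_load(self, task):
        """Instructions per second demanded by a task: arrival rate times D(t_i)."""
        return self.arrival_rate(task) * self.demand[task]


# Hypothetical three-task workload.
w = WorkloadGraph(
    demand={"classify": 500.0, "forward": 300.0, "encrypt": 2000.0},
    traffic={
        ("in", "classify"): 10_000.0,
        ("classify", "forward"): 8_000.0,
        ("classify", "encrypt"): 2_000.0,
    },
)
total_load = sum(w.offered_load(t) for t in w.demand)
print(total_load)
```

Because traffic sits on the edges, a change in traffic mix only changes the edge weights. The task set and the processing requirements stay fixed, which is what makes the model convenient for dynamic workloads.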


Outline

Introduction
Evaluation Methodology
• Dynamic Workload Model
• Runtime System Model
Results
Summary


Runtime System Model

Unified approach for all runtime systems
• Queuing networks
• Specific solution for each runtime system
• Runtime mapping: $M_t: T_t \to P$
• Graph: $S = (P, Q)$
• Packet arrival rate: $\lambda_{i,j}$
• Service time: $D(t_i, p_j)$

Metrics for all runtime systems
• Processor utilization: $\rho$
• Average number of packets in the system: $K$
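The first metric follows directly from the mapping: the utilization of a processor is the sum, over the tasks mapped to it, of packet arrival rate times service time. The sketch below assumes arrival rates in packets per second and service times in seconds per packet. The function name, the dictionary layout and the numbers are hypothetical.

```python
def processor_utilization(arrival, service_time):
    """Per-processor utilization from a task-to-processor mapping.

    arrival[(task, proc)]      -- packet arrival rate of the task on that processor (packets/s)
    service_time[(task, proc)] -- service time of the task on that processor (s/packet)
    """
    util = {}
    for (task, proc), lam in arrival.items():
        # Each (task, processor) pair contributes rate * service time.
        util[proc] = util.get(proc, 0.0) + lam * service_time[(task, proc)]
    return util


# Hypothetical mapping: t1 runs on p1 only, t2 is split across p1 and p2.
arrival = {("t1", "p1"): 1000.0, ("t2", "p1"): 500.0, ("t2", "p2"): 1500.0}
service_time = {("t1", "p1"): 2e-4, ("t2", "p1"): 4e-4, ("t2", "p2"): 4e-4}
util = processor_utilization(arrival, service_time)
print(util)
```

A utilization of 1 or more on any processor means its queue grows without bound, so a steady-state metric such as the average number of packets only exists while every value stays below 1.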


Three Example Runtime Support Systems

System I: Ideal Allocation
System II: Full Processor Allocation
System III: Partitioned Application Allocation

[Figure: a workload of tasks T1 and T2 mapped onto the processors under each system. Ideal Allocation: every processor runs T1 & T2. Full Processor Allocation: each processor runs either T1 or T2. Partitioned Application Allocation: subtasks T1_1 to T1_4 and T2_1 to T2_4 are spread across the processors.]


Example Evaluation Model – System I

Ideal Allocation
• All processors can process all packets completely
• Unrealistic, but can provide a baseline

M/G/m FCFS single station


M/G/m Single Station Queuing System

Cosmetatos approximation

$$W_{M/G/m} \approx c_B^2\, W_{M/M/m} + (1 - c_B^2)\, W_{M/D/m},$$

where

$$W_{M/M/m} = \frac{P_m}{m\mu(1-\rho)}; \quad P_m = \frac{(m\rho)^m}{m!\,(1-\rho)}\,\pi_0; \quad \pi_0 = \left[\sum_{k=0}^{m-1} \frac{(m\rho)^k}{k!} + \frac{(m\rho)^m}{m!}\cdot\frac{1}{1-\rho}\right]^{-1},$$

and

$$W_{M/D/m} = \frac{1}{2}\cdot\frac{1}{nc_{Dm}}\, W_{M/M/m}; \quad nc_{Dm} = \left(1 + (1-\rho)(m-1)\,\frac{\sqrt{4+5m}-2}{16\rho m}\right)^{-1}$$

Evaluation metrics

$$\rho = \frac{\lambda}{m\mu}; \quad K = \lambda\, W_{M/G/m} + m\rho$$

G. Cosmetatos. Some Approximate Equilibrium Results for the Multiserver Queue (M/G/r). Operational Research Quarterly, pages 615-620, 1976

G. Bolch, S. Greiner, H. de Meer, and K. S. Trivedi. Queueing Networks and Markov Chains: Modeling and Performance Evaluation with Computer Science Applications. John Wiley & Sons, Inc., New York, NY, August 1998
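The Cosmetatos approximation is short enough to evaluate directly. The following sketch computes the M/M/m waiting time, the approximate M/D/m waiting time, their blend weighted by the squared coefficient of variation of the service time, and the resulting utilization and mean number of packets. Function names and the sample parameters are assumptions chosen for illustration.

```python
from math import factorial, sqrt


def mmm_wait(lam, mu, m):
    """Mean waiting time in an M/M/m queue (arrival rate lam, per-server rate mu)."""
    rho = lam / (m * mu)
    if not 0 < rho < 1:
        raise ValueError("need 0 < rho < 1 for a steady state")
    a = m * rho
    tail = a**m / (factorial(m) * (1 - rho))
    pi0 = 1.0 / (sum(a**k / factorial(k) for k in range(m)) + tail)
    p_m = tail * pi0  # probability that an arriving packet has to wait
    return p_m / (m * mu * (1 - rho))


def mgm_wait(lam, mu, m, cb2):
    """Cosmetatos approximation; cb2 is the squared coefficient of variation of service time."""
    rho = lam / (m * mu)
    w_mmm = mmm_wait(lam, mu, m)
    nc = 1.0 / (1.0 + (1 - rho) * (m - 1) * (sqrt(4 + 5 * m) - 2) / (16 * rho * m))
    w_mdm = 0.5 * w_mmm / nc
    return cb2 * w_mmm + (1 - cb2) * w_mdm


def metrics(lam, mu, m, cb2):
    """Return (utilization rho, mean number of packets K) for the single station."""
    rho = lam / (m * mu)
    return rho, lam * mgm_wait(lam, mu, m, cb2) + m * rho


# Hypothetical load: 16 servers, 100,000 packets/s each, 1.2 million packets/s offered.
rho, k = metrics(lam=1.2e6, mu=1.0e5, m=16, cb2=0.5)
print(rho, k)
```

For m = 1 the correction factor equals 1 and the expression reduces to the exact single-server result, which is a convenient sanity check on any implementation.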


Example Evaluation Model – System II

Full Processor Allocation
• Allocate entire tasks to subsets of processors
• Allocate as few processors as possible to save power
• Each processor runs one type of task
• Reallocation is triggered by queue length

BCMP M/M/1-FCFS model (Jackson network)
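The "as few processors as possible" rule can be sketched as a sizing step per task. The slide says reallocation is triggered by queue length; the sketch below swaps in a simpler utilization cap, which is an assumption and not the mechanism from the slides. The 100 MIPS engine speed matches the setup slide later in the deck, while the function name, the cap of 0.9 and the traffic numbers are hypothetical.

```python
def processors_needed(arrival_rate, instr_per_packet, mips=100.0, max_util=0.9):
    """Smallest number of whole processors that keeps one task under max_util.

    arrival_rate     -- packets per second offered to the task
    instr_per_packet -- processing requirement of the task (instructions per packet)
    mips             -- processor speed in millions of instructions per second
    max_util         -- assumed utilization cap standing in for the queue-length trigger
    """
    mu = mips * 1e6 / instr_per_packet  # packets per second one processor can serve
    m = 1  # a task always gets at least one processor in this sketch
    while arrival_rate / (m * mu) > max_util:
        m += 1
    return m


# Hypothetical tasks: (packets/s, instructions/packet).
tasks = {"T1": (250_000.0, 1000.0), "T2": (50_000.0, 1000.0)}
allocation = {name: processors_needed(rate, instr) for name, (rate, instr) in tasks.items()}
print(allocation)
```

Since processors are handed out in whole units, a small rise in traffic can push a task over the cap and add an entire processor, which is one way to see why this scheme shows higher performance variance than a finer-grained allocation.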


BCMP Network

BCMP: Baskett, Chandy, Muntz, and Palacios

Characteristics:
• Open, closed, and mixed queuing networks
• Several job classes
• Four types of nodes: M/M/m-FCFS (class-independent service time), M/G/1-PS, M/G/∞-IS, and M/G/1-LCFS PR

Product-form steady-state solution:

$$\pi(s_1, \ldots, s_N) = \frac{1}{G(K)}\, d(s) \prod_{i=1}^{N} f_i(s_i)$$

Open M/M/1-FCFS BCMP queuing network:

$$\pi(k_1, \ldots, k_N) = \prod_{i=1}^{N} \pi_i(k_i), \qquad \pi_i(k_i) = (1 - \rho_i)\,\rho_i^{k_i}$$

• Evaluation metrics:

$$\rho_i = \sum_{r=1}^{C} \rho_{ir} = \sum_{r=1}^{C} \frac{\lambda_r e_{ir}}{\mu_{ir}}; \qquad K_i = \sum_{r=1}^{C} K_{ir}, \quad K_{ir} = \frac{\rho_{ir}}{1 - \rho_i}$$

F. Baskett, K. Chandy, R. Muntz, and F. Palacios. Open, Closed, and Mixed Networks of Queues with Different Classes of Customers. Journal of the ACM, 22(2): 248-260, April 1975
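For an open network of M/M/1-FCFS nodes, the metrics can be computed node by node, because the product form makes each processor behave like an independent M/M/1 queue. The sketch below takes per-node, per-class arrival and service rates. The function names and all rates are assumptions, and the per-node arrival rates are taken as already known (that is, the traffic equations are assumed solved).

```python
def jackson_metrics(arrival, service):
    """Utilization and mean number of packets per node in an open M/M/1-FCFS network.

    arrival[i][r] -- arrival rate of class-r packets at node i (packets/s)
    service[i][r] -- service rate of class-r packets at node i (packets/s);
                     FCFS nodes need the same rate for every class at a node
    """
    results = []
    for lam_i, mu_i in zip(arrival, service):
        rho_ir = [lam / mu for lam, mu in zip(lam_i, mu_i)]
        rho_i = sum(rho_ir)
        if rho_i >= 1:
            raise ValueError("node is overloaded; no steady state")
        k_ir = [r / (1 - rho_i) for r in rho_ir]
        results.append((rho_i, sum(k_ir)))
    return results


def state_probability(rhos, ks):
    """Product-form probability of seeing ks[i] packets at node i."""
    p = 1.0
    for rho, k in zip(rhos, ks):
        p *= (1 - rho) * rho**k
    return p


# Hypothetical two-node network: node 0 serves two classes, node 1 serves one.
res = jackson_metrics(arrival=[[2.0, 1.0], [3.0]], service=[[10.0, 10.0], [4.0]])
total_packets = sum(k for _, k in res)
print(res, total_packets)
```

The mean number of packets grows as 1/(1 - rho), so a processor at 75% utilization already holds three packets on average, and the count rises sharply as utilization approaches 1.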


Example Evaluation Model – System III

Partitioned Application Allocation
• Tasks are partitioned across multiple processors
• Synchronized pipelines
• Allocate tasks equally across all processors to maximize throughput
• Reallocate at fixed time intervals

Equations for evaluation metrics are the same as System II.

BCMP M/M/1-FCFS model (Jackson network)
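Spreading subtasks equally across processors is a load-balancing problem, and a perfect balance is rarely possible when subtasks have different sizes. The sketch below uses a greedy longest-first heuristic, which is a stand-in chosen for illustration and not the allocation algorithm of the cited system. Subtask names and loads are hypothetical.

```python
import heapq


def balance_subtasks(subtask_loads, num_processors):
    """Greedy longest-first assignment of subtasks to the least-loaded processor.

    subtask_loads  -- mapping from subtask name to its load (any consistent unit)
    num_processors -- number of processors to spread the subtasks over
    Returns (assignment, loads), both keyed by processor index.
    """
    heap = [(0.0, p) for p in range(num_processors)]  # (current load, processor)
    heapq.heapify(heap)
    assignment = {p: [] for p in range(num_processors)}
    # Place the largest subtasks first so small ones can fill the gaps.
    for name, load in sorted(subtask_loads.items(), key=lambda kv: kv[1], reverse=True):
        total, p = heapq.heappop(heap)
        assignment[p].append(name)
        heapq.heappush(heap, (total + load, p))
    loads = {p: sum(subtask_loads[n] for n in names) for p, names in assignment.items()}
    return assignment, loads


# Hypothetical subtasks of one application, spread over two processors.
assignment, loads = balance_subtasks({"a": 5, "b": 4, "c": 3, "d": 3, "e": 2}, 2)
print(assignment, loads)
```

The most heavily loaded processor sets the pace of a synchronized pipeline, so any residual imbalance of this kind shows up as lost throughput relative to the ideal allocation.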


Outline

Introduction
Evaluation Methodology
• Dynamic Workload Model
• Runtime System Model
Results
Summary


Setup

System
• 16 processing engines at 100 MIPS each
• Queue lengths are infinite

Workload

Other assumptions
• Partition applications into 7-15 subtasks


Processor Allocation Over Time

Ideal:
• 16 processors

Full Processor:
• Change with traffic

Partitioned Application:
• 16 processors

[Figure: processor allocation over time for the full processor allocation system]


Processor Utilization Over Time

Ideal:
• Lowest processor utilization

Full Processor:
• Highest processor utilization because it uses fewer processors

Partitioned Application:
• Low processor utilization
• Not equal to the ideal case due to unbalanced task allocation and pipeline overhead


Packets in System Over Time

Ideal:
• Fewest packets

Full Processor:
• Packets queue up due to its high processor utilization

Partitioned Application:
• Most packets, due to unbalanced task allocation and pipeline overhead
• More stable performance because of finer processor allocation granularity


Performance for Different Data Rates

Ideal:
• Smooth increase

Full Processor:
• Periodic peaks

Partitioned Application:
• Smooth increase

The maximum data rate supported by the systems
• Ideal: 100%
• Full Processor: 79.6%
• Partitioned Application: 75.1%


Implication of the Results

Ideal Allocation
• Provides a baseline

Full Processor Allocation
• Allocates as few processors as possible to save power
• Uses an entire processor as the allocation granularity
• Good: High processor utilization
• Bad: High performance variance

Partitioned Application Allocation
• Distributes tasks equally across all processors
• Finer processor allocation granularity
• Good: Stable performance
• Bad: Difficult to get an optimized solution => pipeline synchronization overhead


Summary

Analytical methodology for evaluating different runtime support systems for NPs

Dynamic workload model and runtime system model

Results: 3 example runtime support systems
• Quantitative metrics
• Tradeoffs


Questions?