A Measurement Based Memory Performance Evaluation of Streaming Media Servers. Garba Isa Yau and Abdul Waheed, Department of Computer Engineering, King Fahd University of Petroleum & Minerals, Dhahran, Saudi Arabia. Presented at the 10th Annual IEEE Technical Exchange Meeting, March 23-24, 2003.


A Measurement Based Memory Performance Evaluation of Streaming Media Servers

Garba Isa Yau and Abdul Waheed

Department of Computer Engineering

King Fahd University of Petroleum & Minerals

Dhahran, Saudi Arabia

Presented at the 10th Annual IEEE Technical Exchange Meeting

March 23-24, 2003


Outline

• Introduction

• Motivation

• Experiments

• Results and Discussion

• Operating system impact on performance

• Conclusions and Future Research


Introduction

Basic architecture

[Figure: basic architecture. A streaming media server (disk, memory, network interface, control) delivers content over a network to a media client.]

• Unlike ordinary file downloads or Web applications, streaming media have:

  • stringent timing requirements

  • high bandwidth requirements

  • CPU-intensive processing

  • high memory requirements


Motivation

• CPU-memory speed gap

  • CPU speed doubles about every 18 months (Moore’s Law)

  • Memory access time improves by only one-third in 10 years

• Hierarchical memory architecture was introduced to alleviate the CPU-memory speed gap

  • It relies on locality of reference in the data: temporal locality and spatial locality

• Streaming media content is continuous data

  • The working set is normally large and cannot fit into cache

  • It has very poor temporal locality (data reuse is poor)

• For such workloads, the hierarchical memory architecture becomes ineffective
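The two growth rates quoted for the CPU-memory speed gap can be compounded over the same ten-year period to see how far apart they drift. The sketch below assumes the slide's figures hold exactly (CPU speed doubling every 18 months, memory access time shrinking by one-third per decade); the function names are illustrative and not from the presentation.

```python
def cpu_speedup(years, doubling_months=18.0):
    """CPU speed factor after `years`, assuming one doubling per `doubling_months`."""
    return 2.0 ** (years * 12.0 / doubling_months)


def memory_speedup(years, time_reduction=1.0 / 3.0, period_years=10.0):
    """Memory speed factor, assuming access time shrinks by `time_reduction` per period."""
    return (1.0 / (1.0 - time_reduction)) ** (years / period_years)


cpu = cpu_speedup(10)      # about 101.6x
mem = memory_speedup(10)   # 1.5x
gap = cpu / mem            # about 67.7x

print(f"CPU: {cpu:.1f}x, memory: {mem:.1f}x, gap: {gap:.1f}x")
```

Under these assumptions the relative gap widens by a factor of roughly 68 in ten years, which is the pressure the cache hierarchy is meant to absorb.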


Experiments

Testbed

• Metrics:

  • cache misses (L1 and L2)

  • page fault rate

  • throughput

  • server CPU utilization

• Factors:

  • number of streams

  • media encoding rate (56 kbps and 300 kbps)

  • stream distribution (unique or multiple)
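The factors above define a grid of experiment runs. As a rough sketch, the grid can be enumerated as follows, assuming a full factorial design (the slides do not state that every combination was run). The stream counts are the x-axis values of the result charts; `nominal_load_kbps` is a hypothetical helper field giving the aggregate bit rate a run would demand if every stream were delivered at its full encoding rate.

```python
from itertools import product

# Stream counts are the x-axis values of the result charts; servers, rates, and
# distributions are the factors listed on the slides.
SERVERS = ("dss", "wms")
STREAM_COUNTS = (1, 10, 100, 200, 300, 400, 500, 600, 700, 1000)
ENCODING_RATES_KBPS = (56, 300)
DISTRIBUTIONS = ("unique", "multiple")


def experiment_matrix():
    """Yield one dict per run, with the nominal aggregate bit rate it would demand."""
    for server, rate, dist, streams in product(
        SERVERS, ENCODING_RATES_KBPS, DISTRIBUTIONS, STREAM_COUNTS
    ):
        yield {
            "server": server,
            "encoding_rate_kbps": rate,
            "distribution": dist,
            "streams": streams,
            # Upper bound only: assumes every stream is delivered at its full rate.
            "nominal_load_kbps": streams * rate,
        }


runs = list(experiment_matrix())
print(len(runs))                                   # 80 runs in a full factorial design
print(max(r["nominal_load_kbps"] for r in runs))   # 300000 kbps at 1000 x 300 kbps
```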

[Figure: testbed. A dual-boot server (Windows 2000/Linux Server) connected to dual-boot client machines (Windows 2000/Linux Server).]


Experiments cont.

• Servers:

  • Apple Darwin Streaming Server (DSS)

  • Microsoft Windows Media Server (WMS)

• Clients:

  • DSS: Streaming Load Simulator

  • WMS: Media Load Simulator

• Tools:

  • Intel VTune Performance Analyzer

  • Windows Performance Monitor

  • netstat, vmstat, sar, etc.
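Tools such as vmstat and sar report page fault rates derived from cumulative kernel counters. The following minimal sketch shows the idea for Linux, where `/proc/vmstat` exposes `pgfault` and `pgmajfault` as "name value" lines. It is not the authors' tooling; the helper names and the sample values are made up for illustration.

```python
def parse_vmstat(text):
    """Parse '/proc/vmstat'-style 'name value' lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def faults_per_second(before, after, interval_s, key="pgfault"):
    """Rate of change of a cumulative counter between two samples."""
    return (after[key] - before[key]) / interval_s


# Two made-up samples taken 5 seconds apart (values are illustrative, not measured).
sample_t0 = "pgfault 1000\npgmajfault 10\n"
sample_t1 = "pgfault 2500\npgmajfault 40\n"

t0, t1 = parse_vmstat(sample_t0), parse_vmstat(sample_t1)
print(faults_per_second(t0, t1, 5.0))                    # 300.0
print(faults_per_second(t0, t1, 5.0, key="pgmajfault"))  # 6.0
```

On a Linux host, the sample strings could be replaced by two reads of `/proc/vmstat` taken a known interval apart.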


Results and Discussion

• L1 Cache Performance

[Figure: L1 cache misses (56 kbps) and L1 cache misses (300 kbps). x-axis: number of streams (clients), 1 to 1000; y-axis: number of cache misses (millions); series: dss unique, dss multiple, wms unique, wms multiple.]

• L1 cache misses are mostly influenced by the number of streams

• Worst-case performance occurs when the number of streams is high, the encoding rate is 300 kbps, and clients request multiple media contents


Results and Discussion cont.

• L2 Cache Performance

[Figure: L2 cache misses (300 kbps). x-axis: number of streams (clients), 1 to 1000; y-axis: number of cache misses (millions), 0 to 2500; series: dss unique, dss multiple, wms unique, wms multiple.]

• Comparison

For both L1 and L2 caches, Windows Media Server has better cache performance than Darwin Streaming Server


Results and Discussion cont.

• Memory Performance

[Figure: Page fault rate (300 kbps). x-axis: number of streams (clients), 1 to 1000; y-axis: page faults / sec, 0 to 400; series: dss unique, dss multiple, wms unique, wms multiple.]

• Requests for a unique media object do not incur many page faults, since the object can easily be served from memory

• Requests for multiple objects lead to a high page fault rate, since many data blocks have to be fetched from disk

• A high page fault rate leads to client timeouts due to long delays
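The contrast between unique and multiple object requests can be illustrated with a toy block cache. The sketch below is not a model of either server: it assumes an LRU replacement policy, a hypothetical memory size of 64 blocks, and clients that advance through their objects in lockstep. Each miss stands in for a block that has to be fetched from disk.

```python
from collections import OrderedDict


def simulate_faults(requests, capacity_blocks):
    """Count misses of an LRU block cache over a sequence of (object, block) requests."""
    cache = OrderedDict()
    faults = 0
    for key in requests:
        if key in cache:
            cache.move_to_end(key)          # hit: mark as most recently used
        else:
            faults += 1                     # miss: block must come from disk
            cache[key] = True
            if len(cache) > capacity_blocks:
                cache.popitem(last=False)   # evict least recently used block
    return faults


def interleaved_requests(n_clients, blocks_per_object, unique):
    """Clients advance in lockstep; with unique=True they all read object 0."""
    for block in range(blocks_per_object):
        for client in range(n_clients):
            obj = 0 if unique else client
            yield (obj, block)


CAPACITY = 64  # hypothetical memory size, in blocks
unique_faults = simulate_faults(interleaved_requests(100, 50, unique=True), CAPACITY)
multiple_faults = simulate_faults(interleaved_requests(100, 50, unique=False), CAPACITY)
print(unique_faults, multiple_faults)  # 50 5000
```

With a single shared object, only the first client to reach a block misses. With one object per client, no block is ever reused, so every request misses regardless of cache size.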


Results and Discussion cont.

• Throughput and CPU utilization

[Figure: Throughput (300 kbps). x-axis: number of streams (clients), 1 to 1000; y-axis: throughput (kbps); series: dss unique, dss multiple, wms unique, wms multiple.]

[Figure: CPU utilization (300 kbps). x-axis: number of streams (clients), 1 to 1000; y-axis: CPU utilization (%), 0 to 90; series: dss unique, dss multiple, wms unique, wms multiple.]

• Windows Media Server has higher throughput than Darwin Streaming Server

• For unique streams, CPU utilization scales with the number of streams throughout, which is not the case for multiple streams


Memory Transfer Test

• ECT (extended copy transfer)

  • Characterizes memory performance in order to observe the possible impact of the OS on it

[Figure: memory bandwidth (Mbytes/sec, 0 to 6000) versus block size (working set); series: Linux, Windows.]

• Locality of reference:

  • temporal locality: varying working set size (block size)

  • spatial locality: varying access pattern (strides)
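A copy-bandwidth measurement over varying block sizes can be sketched as below. This is not the ECT benchmark itself, only a rough illustration of the working-set dimension: it copies a block repeatedly and reports the observed rate. The block sizes and the 64 MB total are arbitrary assumptions, strides are not varied, and in Python the interpreter overhead per copy distorts the numbers for small blocks.

```python
import time


def copy_bandwidth_mb_s(block_size, total_bytes=64 * 1024 * 1024):
    """Copy a block repeatedly and return the observed bandwidth in MB/s."""
    src = bytearray(block_size)
    dst = bytearray(block_size)
    repeats = max(1, total_bytes // block_size)
    start = time.perf_counter()
    for _ in range(repeats):
        dst[:] = src                       # one bulk copy of the whole block
    elapsed = time.perf_counter() - start
    return (repeats * block_size) / elapsed / 1e6


results = {}
for size_kb in (4, 64, 1024, 16384):
    results[size_kb] = copy_bandwidth_mb_s(size_kb * 1024)
    print(f"{size_kb:>6} KB block: {results[size_kb]:.0f} MB/s")
```

Running the same script under each operating system on the same hardware would give a coarse Linux versus Windows comparison; a compiled benchmark would be needed for figures comparable to the chart.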


Conclusion

• Both media servers exhibit similar cache/memory behavior

• Worst cache/memory performance occurs at the 300 kbps encoding rate with multiple stream distribution

• High cache miss and page fault counts degrade performance because a significant number of CPU cycles are wasted

• For streaming media servers, apart from the I/O bottleneck, the memory subsystem is a potential performance bottleneck

• Future research: media object pre-fetching and stream batching are techniques we are exploring to improve the memory performance of the servers
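Stream batching, named above as a direction for future research, is not described further in the slides. One common form of the idea is to let requests for the same object that arrive close together share a single stream. The sketch below illustrates that form only; the function name, the 5-second window, and the arrival data are hypothetical.

```python
from collections import defaultdict


def batch_requests(requests, window_s):
    """Group (arrival_time_s, object_id) requests into batches per object.

    A batch opens with the first request for an object and absorbs later
    requests for the same object that arrive within `window_s` seconds.
    """
    batches = defaultdict(list)   # object_id -> list of [start_time, [arrival times]]
    for arrival, obj in sorted(requests):
        open_batches = batches[obj]
        if open_batches and arrival - open_batches[-1][0] <= window_s:
            open_batches[-1][1].append(arrival)
        else:
            open_batches.append([arrival, [arrival]])
    return batches


# Hypothetical arrivals: (time in seconds, requested object).
requests = [(0.0, "a"), (1.0, "a"), (1.5, "b"), (4.0, "a"), (12.0, "a")]
batches = batch_requests(requests, window_s=5.0)
streams_needed = sum(len(b) for b in batches.values())
print(streams_needed)  # 3 streams serve 5 requests
```

Fewer distinct streams means fewer distinct positions in the media data being read at once, which is why batching is a plausible way to reduce cache and page fault pressure. The cost is the start-up delay imposed on requests that wait for their batch.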
