http streaming & optimization areasavereshc/rl_spring20/bekir... · 2020-05-18 · how to train...

32
HTTP Streaming & Optimization Areas 0

Upload: others

Post on 12-Jul-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: HTTP Streaming & Optimization Areasavereshc/rl_spring20/Bekir... · 2020-05-18 · How to Train the ABR Agent ABR agent state Neural Network 240P 480P 720P 1080P policy πθ (s, a)

HTTP Streaming & Optimization Areas

0

Page 2: HTTP Streaming & Optimization Areasavereshc/rl_spring20/Bekir... · 2020-05-18 · How to Train the ABR Agent ABR agent state Neural Network 240P 480P 720P 1080P policy πθ (s, a)

HAS Main Principles

• Store data in small chunks for different Representations (resolution, bitrate, frame rate, codec)

• Monitor network conditions

• Adapt the transmission data rate

1

Page 3: HTTP Streaming & Optimization Areasavereshc/rl_spring20/Bekir... · 2020-05-18 · How to Train the ABR Agent ABR agent state Neural Network 240P 480P 720P 1080P policy πθ (s, a)

Video Client Video Server

Request: next video chunk at bitrate r

Response: video content

InputOutput

1 sec/sec

0

1

2

3

0 50 100 150 200 250 300

Thro

ugh

pu

t

Time

Animation borrowed from Te-Yuan Huang (SIGCOMM ‘14) http://conferences.sigcomm.org/sigcomm/2014/doc/slides/38.pdf

Dynamic Streaming over HTTP (DASH)

2

bitrateAdaptive Bitrate (ABR)

Algorithms

1 sec video

content

bitrate

Page 4: HTTP Streaming & Optimization Areasavereshc/rl_spring20/Bekir... · 2020-05-18 · How to Train the ABR Agent ABR agent state Neural Network 240P 480P 720P 1080P policy πθ (s, a)

Regular ABR Algorithms

• Rate-based: pick bitrate based on predicted throughput• FESTIVE [CoNEXT’12], PANDA [JSAC’14], CS2P [SIGCOMM’16]

• Buffer-based: pick bitrate based on buffer occupancy • BBA [SIGCOMM’14], BOLA [INFOCOM’16]

• Hybrid: use both throughput prediction & buffer occupancy• PBA [HotMobile’15], MPC [SIGCOMM’15]

3

Simplified inaccurate model leads to suboptimal performance

Page 5: HTTP Streaming & Optimization Areasavereshc/rl_spring20/Bekir... · 2020-05-18 · How to Train the ABR Agent ABR agent state Neural Network 240P 480P 720P 1080P policy πθ (s, a)

Why is ABR Challenging?

Network throughput is variable & uncertain

4

Throughput

Video bitrateConflicting QoE goals

• Bitrate

• Rebuffering time

• Smoothness

Cascading effects of decisions

Thro

ugh

pu

t B

itra

te(M

bp

s)

Bu

ffe

r si

ze(s

ec)

Page 6: HTTP Streaming & Optimization Areasavereshc/rl_spring20/Bekir... · 2020-05-18 · How to Train the ABR Agent ABR agent state Neural Network 240P 480P 720P 1080P policy πθ (s, a)

Neural Adaptive Video Streaming with Pensieve

Hongzi Mao

Ravi Netravali Mohammad Alizadeh

Page 7: HTTP Streaming & Optimization Areasavereshc/rl_spring20/Bekir... · 2020-05-18 · How to Train the ABR Agent ABR agent state Neural Network 240P 480P 720P 1080P policy πθ (s, a)

Pensieve

6

buffer

ABR agentbitrates

240P

480P

720P

1080P

network and video measurements

bandwidth

bit rate

720P

Pensieve learns ABR algorithm automatically through experience

Page 8: HTTP Streaming & Optimization Areasavereshc/rl_spring20/Bekir... · 2020-05-18 · How to Train the ABR Agent ABR agent state Neural Network 240P 480P 720P 1080P policy πθ (s, a)

Reinforcement Learning

7

Goal: maximize the cumulative reward

Agent Environment

Observe state

Take action

Reward

Page 9: HTTP Streaming & Optimization Areasavereshc/rl_spring20/Bekir... · 2020-05-18 · How to Train the ABR Agent ABR agent state Neural Network 240P 480P 720P 1080P policy πθ (s, a)

Action

Pensieve Design

State

xt xt-1

n1 n2 nm

bt

ct

lt

Past chunk throughput

Next chunk sizes

Current buffer size

Remaining chunks

Last chunk bit rate

State st

τt τt-1

xt-k+1

τt-k+1

Past chunk download time

btPast chunk bitrate

st

ct

Environment

+ 𝑞 𝑏𝑡 − 𝜇𝑇𝑡 − 𝜆 𝑞 𝑏𝑡 − 𝑞 𝑏𝑡−1

Reward rt

+ (bitrate) - (rebuffering) - (smoothness)

720P

240P

360P

720P

1080P

Act

ion

a t

Reward

1D-CNN

1D-CNN

1D-CNN

1080P

720P

360P

240P

Agent

Page 10: HTTP Streaming & Optimization Areasavereshc/rl_spring20/Bekir... · 2020-05-18 · How to Train the ABR Agent ABR agent state Neural Network 240P 480P 720P 1080P policy πθ (s, a)

9

How to Train the ABR Agent

ABR agent

state

Neural Network

240P

480P

720P

1080P

policy πθ(s, a)

Take action anext bitrate

Observe state s

parameter θ

estimate from empirical data

Training:

Collect experience data: trajectory of [state, action, reward]

Page 11: HTTP Streaming & Optimization Areasavereshc/rl_spring20/Bekir... · 2020-05-18 · How to Train the ABR Agent ABR agent state Neural Network 240P 480P 720P 1080P policy πθ (s, a)

10

Traces for Generalization

3G network trace

• Trace generated from a Hidden Markov model

• Covers a wide range of average throughput and network variation

Synthetic trace

Page 12: HTTP Streaming & Optimization Areasavereshc/rl_spring20/Bekir... · 2020-05-18 · How to Train the ABR Agent ABR agent state Neural Network 240P 480P 720P 1080P policy πθ (s, a)

Shortcomings of Existing ABRs

• Greedy to bitrate (Do not consider perceptual quality)

• Energy consumption generally not included

• Does not reflect real world implementations

• Quality models are linear and have some wrong assumptions• Penalize all rebuffering events same (in the beginning or in the middle)

• Penalize all oscillations same (highest to lowest or highest to 2nd highest)

• Higher bitrate is always necessary for better experience

11

Page 13: HTTP Streaming & Optimization Areasavereshc/rl_spring20/Bekir... · 2020-05-18 · How to Train the ABR Agent ABR agent state Neural Network 240P 480P 720P 1080P policy πθ (s, a)

Optimization ProblemHow to optimize energy consumption without sacrificing quality of

experience?

12

Page 14: HTTP Streaming & Optimization Areasavereshc/rl_spring20/Bekir... · 2020-05-18 · How to Train the ABR Agent ABR agent state Neural Network 240P 480P 720P 1080P policy πθ (s, a)

Preliminary Results

13

Page 15: HTTP Streaming & Optimization Areasavereshc/rl_spring20/Bekir... · 2020-05-18 · How to Train the ABR Agent ABR agent state Neural Network 240P 480P 720P 1080P policy πθ (s, a)

What we have?

• Information about representations• File size

• Quality metrics

• Power model

• Real world network traces

• Simulator Environment

14

Page 16: HTTP Streaming & Optimization Areasavereshc/rl_spring20/Bekir... · 2020-05-18 · How to Train the ABR Agent ABR agent state Neural Network 240P 480P 720P 1080P policy πθ (s, a)

Using Reinforcement Learning

• Environment provides information for• Energy consumption of each available option

• Quality metrics of each available option

• Past network conditions

• Current buffer conditions

15

Page 17: HTTP Streaming & Optimization Areasavereshc/rl_spring20/Bekir... · 2020-05-18 · How to Train the ABR Agent ABR agent state Neural Network 240P 480P 720P 1080P policy πθ (s, a)

Using Reinforcement Learning

• State Space• Buffer size

• Current chunk size/number

• Throughput

• Download time

• Rebuffer time

• Remaining chunks

• Action Space• Different versions of the video for each chunk

16

Page 18: HTTP Streaming & Optimization Areasavereshc/rl_spring20/Bekir... · 2020-05-18 · How to Train the ABR Agent ABR agent state Neural Network 240P 480P 720P 1080P policy πθ (s, a)

Using Reinforcement Learning

• Reward model should contain• Energy consumption

• Quality metrics

• Oscillations

• Rebuffering

17

Page 19: HTTP Streaming & Optimization Areasavereshc/rl_spring20/Bekir... · 2020-05-18 · How to Train the ABR Agent ABR agent state Neural Network 240P 480P 720P 1080P policy πθ (s, a)

Experiments

• On Pensieve• Pensieve is trained with updated reward models

• VMAF-E, VMAF-Q, VMAF-EQ, VMAF-LN

• On DQN based model • Trainings with updated state space

• Trainings with different network architectures• MLP, 1Conv1D+MLP, 2Conv1D+MLP

• Trainings with different reward models• VMAF-LN, VMAF-E, VMAF-ES

18

Page 20: HTTP Streaming & Optimization Areasavereshc/rl_spring20/Bekir... · 2020-05-18 · How to Train the ABR Agent ABR agent state Neural Network 240P 480P 720P 1080P policy πθ (s, a)

Pensieve Experiments

19

Page 21: HTTP Streaming & Optimization Areasavereshc/rl_spring20/Bekir... · 2020-05-18 · How to Train the ABR Agent ABR agent state Neural Network 240P 480P 720P 1080P policy πθ (s, a)

Pensieve Training Results Avg Reward 10K ~ -232

20

Page 22: HTTP Streaming & Optimization Areasavereshc/rl_spring20/Bekir... · 2020-05-18 · How to Train the ABR Agent ABR agent state Neural Network 240P 480P 720P 1080P policy πθ (s, a)

Pensieve Training Results Avg Reward 70K ~ -232

21

Page 23: HTTP Streaming & Optimization Areasavereshc/rl_spring20/Bekir... · 2020-05-18 · How to Train the ABR Agent ABR agent state Neural Network 240P 480P 720P 1080P policy πθ (s, a)

DQN Experiments Different Network Traces

Multiple Traces – Avg reward ~ -20 Single Trace – Avg reward ~ 8

22

Page 24: HTTP Streaming & Optimization Areasavereshc/rl_spring20/Bekir... · 2020-05-18 · How to Train the ABR Agent ABR agent state Neural Network 240P 480P 720P 1080P policy πθ (s, a)

DQN Experiments Different States

St. Space with 5 components (avg 8) St. Space with 6 components (Avg 11)

23

Page 25: HTTP Streaming & Optimization Areasavereshc/rl_spring20/Bekir... · 2020-05-18 · How to Train the ABR Agent ABR agent state Neural Network 240P 480P 720P 1080P policy πθ (s, a)

DQN with Different Neural Networks

MLP Avg ~ -20 1 layer Conv1D + MLP Avg ~ -44

24

Page 26: HTTP Streaming & Optimization Areasavereshc/rl_spring20/Bekir... · 2020-05-18 · How to Train the ABR Agent ABR agent state Neural Network 240P 480P 720P 1080P policy πθ (s, a)

DQN with Different Neural Networks

MLP Avg ~ -20 2 layer Conv1D + MLP Avg ~ - 24

25

Page 27: HTTP Streaming & Optimization Areasavereshc/rl_spring20/Bekir... · 2020-05-18 · How to Train the ABR Agent ABR agent state Neural Network 240P 480P 720P 1080P policy πθ (s, a)

DQN Different Reward Models

26

Page 28: HTTP Streaming & Optimization Areasavereshc/rl_spring20/Bekir... · 2020-05-18 · How to Train the ABR Agent ABR agent state Neural Network 240P 480P 720P 1080P policy πθ (s, a)

DQN Different Reward Models

27

Page 29: HTTP Streaming & Optimization Areasavereshc/rl_spring20/Bekir... · 2020-05-18 · How to Train the ABR Agent ABR agent state Neural Network 240P 480P 720P 1080P policy πθ (s, a)

28

Page 30: HTTP Streaming & Optimization Areasavereshc/rl_spring20/Bekir... · 2020-05-18 · How to Train the ABR Agent ABR agent state Neural Network 240P 480P 720P 1080P policy πθ (s, a)

29

Page 31: HTTP Streaming & Optimization Areasavereshc/rl_spring20/Bekir... · 2020-05-18 · How to Train the ABR Agent ABR agent state Neural Network 240P 480P 720P 1080P policy πθ (s, a)

Q&A

30

Page 32: HTTP Streaming & Optimization Areasavereshc/rl_spring20/Bekir... · 2020-05-18 · How to Train the ABR Agent ABR agent state Neural Network 240P 480P 720P 1080P policy πθ (s, a)

References

• Neural Adaptive Video Streaming with PensieveHongzi Mao, Ravi Netravali, Mohammad AlizadehProceedings of the 2017 ACM SIGCOMM Conference

31