
Learning in situ: a randomized experiment in video streaming
https://puffer.stanford.edu

Francis Y. Yan, Hudson Ayers, Chenzhi Zhu†, Sadjad Fouladi, James Hong, Keyi Zhang, Philip Levis, Keith Winstein

Stanford University, †Tsinghua University

October 22, 2019

Francis Y. Yan (Stanford) October 22, 2019 1 / 32


Outline

• Networked systems present unique challenges for machine learning

• Puffer: a live TV streaming website we built to conduct a randomized experiment

• Fugu: an adaptive bitrate (ABR) algorithm that robustly outperforms other schemes by learning in situ (on data from the real deployment environment, Puffer)


Unique challenges for ML in networking

• We don’t know how to emulate the Internet very accurately

• Mismatch between training environment (simulator, emulator, or testbed) and testing environment (Internet)

• Internet has too much variability and heavy tails


Challenge 1: We don’t know how to emulate the Internet

Figure: Best effort to emulate a wireless path between Nepal and AWS India. Mean error: 19.1%.


Challenge 2: Mismatch between training and testing environments

Indigo in emulation:

Figure: Power of schemes over emulated networks with varying link rates and 50 ms min one-way delay. The schemes are split into two graphs for clarity.


Challenge 2: Mismatch between training and testing environments

Indigo in real life:

Figure: Pantheon result (March 27, 2019, China to AWS Korea), P5985 (Indigo labeled in the plot).


Challenge 3: Internet is highly variable and heavy-tailed

Will show using our randomized experiment:

• Only 3% of the eligible streams had any stalls

• With 1.75 years of data for each ABR scheme, the width of the 95% confidence interval on a scheme’s mean stall ratio is between ±10% and ±17% of the mean value

• Two identical schemes will see considerable variation in average performance

- ...until a substantial amount of data is assembled

Figure: Throughput distribution on FCC traces and in the real world.


High level takeaways

• Networked systems present unique challenges for machine learning

- Training algorithms in emulation: disappointing real-world results
- Evaluating algorithms in emulation: not predictive of real-world results
- Running in real life: requires a substantial amount of data to reduce statistical uncertainty

• Our solution: combining classical control with a learned network predictor, trained with supervised learning in situ on data from the real deployment environment

- It robustly outperforms existing schemes in practice


Puffer: a video streaming website running a randomized experiment

• Live TV streaming website (https://puffer.stanford.edu)

• Approved by Stanford lawyers, opened to public December 2018

• Randomizes sessions to different algorithms

• Goal: realistic testbed and learning environment for research in

- congestion control
- throughput prediction
- adaptive bitrate (ABR)


Algorithms that affect video streaming

• Congestion control: when to send each packet

• Throughput prediction: how fast can server send in near future?

• Adaptive bitrate (ABR): what version of each upcoming “chunk” to send


Demo


Media coverage


More press articles


Tutorial of building a Fire TV app for Puffer


Google ad for “tv streaming”


Reddit ad


Puffer experiment

• Starting from the beginning of 2019, streamed 14.2 years of video to 55,897 users using 61,682 unique IP addresses

• About 7 months were spent on the “primary experiment”: a randomized trial comparing our ABR algorithm with other schemes (MPC, RobustMPC, Pensieve, and BBA)


Experimental-flow diagram in CONSORT format

337,170 sessions underwent randomization
◦ 1,595,356 streams
◦ 56,262 unique IPs
◦ 26.8 client-years of data

97,068 sessions were excluded
◦ 158,077 streams
◦ 3.2 client-years of data
◦ 53,631 streams were assigned CUBIC
◦ 103,446 streams were assigned experimental algorithms for portions of the study duration

47,958 sessions were assigned Fugu (233,190 streams)
◦ 139,981 streams were excluded: 55,301 did not begin playing; 84,640 had watch time less than 4 s; 40 stalled from a slow video decoder
◦ 2,683 streams were truncated because of a loss of contact
◦ 93,209 streams were considered (1.9 client-years of data)

48,703 sessions were assigned MPC-HM (238,651 streams)
◦ 144,832 streams were excluded: 56,845 did not begin playing; 87,958 had watch time less than 4 s; 29 stalled from a slow video decoder
◦ 2,655 streams were truncated because of a loss of contact
◦ 93,819 streams were considered (1.7 client-years of data)

48,082 sessions were assigned RobustMPC-HM (236,120 streams)
◦ 144,586 streams were excluded: 57,119 did not begin playing; 87,426 had watch time less than 4 s; 41 stalled from a slow video decoder
◦ 2,391 streams were truncated because of a loss of contact
◦ 91,534 streams were considered (1.7 client-years of data)

47,584 sessions were assigned Pensieve (229,851 streams)
◦ 138,899 streams were excluded: 59,450 did not begin playing; 79,435 had watch time less than 4 s; 14 stalled from a slow video decoder
◦ 2,599 streams were truncated because of a loss of contact
◦ 90,952 streams were considered (1.6 client-years of data)

47,775 sessions were assigned BBA (231,694 streams)
◦ 142,407 streams were excluded: 55,182 did not begin playing; 87,200 had watch time less than 4 s; 24 stalled from a slow video decoder; 1 sent contradictory data
◦ 2,520 streams were truncated because of a loss of contact
◦ 89,287 streams were considered (1.7 client-years of data)

458,801 streams were considered (8.5 client-years of data)
◦ 2.7 client-days spent in startup
◦ 5.1 client-days spent stalled
◦ 8.5 client-years spent playing
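
The per-arm arithmetic in the diagram can be checked mechanically. The sketch below uses only the stream counts printed on the slide; the dictionary layout and the names `ARMS` and `check_arms` are illustrative and not taken from the Puffer codebase.

```python
# Per-arm stream counts copied from the CONSORT diagram.
# The data layout and names are illustrative assumptions.
ARMS = {
    "Fugu":         {"assigned": 233_190, "excluded": 139_981, "considered": 93_209},
    "MPC-HM":       {"assigned": 238_651, "excluded": 144_832, "considered": 93_819},
    "RobustMPC-HM": {"assigned": 236_120, "excluded": 144_586, "considered": 91_534},
    "Pensieve":     {"assigned": 229_851, "excluded": 138_899, "considered": 90_952},
    "BBA":          {"assigned": 231_694, "excluded": 142_407, "considered": 89_287},
}


def check_arms(arms):
    """Verify assigned - excluded == considered per arm; return the total considered."""
    for name, counts in arms.items():
        assert counts["assigned"] - counts["excluded"] == counts["considered"], name
    return sum(counts["considered"] for counts in arms.values())


total_considered = check_arms(ARMS)
print(total_considered)  # 458801, matching the final box of the diagram
```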


Puffer experiment

• Of the 458,801 streams in primary analysis, only 15,788 (3%) of streams had any stalls

- mirroring the ratio (7%) reported by Google

• With 1.75 years of data for each ABR scheme, the width of the 95% confidence interval on a scheme’s mean stall ratio is between ±10% and ±17% of the mean value

- comparable to the magnitude of total benefit reported by prior work based on traces or real-world experiments lasting hours or days
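
To see why rare, heavy-tailed stalls make the confidence interval so wide, here is a minimal sketch on synthetic data. Everything in it is an assumption made for illustration: the stream generator, the 3% stall probability reused from the slide, the lognormal stall durations, the stall-ratio definition, and the percentile bootstrap. It is not the analysis used in the study.

```python
import random


def synth_streams(n, rng, p_stall=0.03):
    """Synthetic (watch_s, stall_s) pairs; every distribution here is an assumption."""
    streams = []
    for _ in range(n):
        watch = rng.expovariate(1 / 600.0)  # playback time, mean 10 minutes
        stall = rng.lognormvariate(1.0, 1.0) if rng.random() < p_stall else 0.0
        streams.append((watch, stall))
    return streams


def stall_ratio(streams):
    """Total stall time divided by total time (playing plus stalled)."""
    watch = sum(w for w, _ in streams)
    stall = sum(s for _, s in streams)
    return stall / (watch + stall)


def bootstrap_ci(streams, rng, n_boot=200):
    """Percentile-bootstrap 95% CI of the stall ratio, resampling whole streams."""
    stats = sorted(
        stall_ratio(rng.choices(streams, k=len(streams))) for _ in range(n_boot)
    )
    return stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot) - 1]


rng = random.Random(0)
for n in (500, 8000):
    data = synth_streams(n, rng)
    lo, hi = bootstrap_ci(data, rng)
    print(n, f"ratio={stall_ratio(data):.5f}", f"CI=[{lo:.5f}, {hi:.5f}]")
```

Because most streams contribute no stall time at all, only a small fraction of the sample carries information about the numerator, so the interval narrows slowly as streams are added.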


Fugu: an ABR algorithm trained in situ

• Objective of Fugu: select video chunks to maximize cumulative QoE over a finite horizon

Figure: 5-step lookahead over 10 versions of each chunk (e.g. 1080p-22, 1080p-24, 720p-20, 240p-26).


• QoE: +video quality, −quality variation, −rebuffering

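• \( \max \sum_i \mathrm{QoE}_i \), with \( \sum_i \mathrm{QoE}_i = \sum_i \left( \mathrm{SSIM}_i - \lambda \left| \mathrm{SSIM}_i - \mathrm{SSIM}_{i-1} \right| - \mu \cdot \mathrm{Rebuffer}_i \right) \)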
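
As a minimal sketch of the objective above, the function below sums the per-chunk QoE terms for a sequence of chunks. The function name, the argument layout, and the default weights `lam` and `mu` are placeholders chosen for illustration; the slide does not give values for λ or µ.

```python
def plan_qoe(ssim, rebuffer, prev_ssim, lam=1.0, mu=100.0):
    """Sum of SSIM_i - lam * |SSIM_i - SSIM_{i-1}| - mu * Rebuffer_i over a plan.

    ssim:      quality of each chunk in the plan
    rebuffer:  stall time (seconds) incurred while fetching each chunk
    prev_ssim: quality of the chunk sent just before the plan starts
    lam, mu:   placeholder weights (assumptions, not values from the talk)
    """
    total = 0.0
    for s, r in zip(ssim, rebuffer):
        total += s - lam * abs(s - prev_ssim) - mu * r
        prev_ssim = s
    return total


# Three chunks: quality rises once, and the last chunk stalls for 0.5 s.
total = plan_qoe([16.0, 16.5, 16.5], [0.0, 0.0, 0.5], prev_ssim=16.0)
print(total)  # 16.0 + 16.0 - 33.5 = -1.5
```

With weights like these, a single half-second stall outweighs the quality of several chunks, which is the trade-off the objective is meant to encode.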


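• Given a plan of the next 5 chunks to send, and assuming the transmission time of each chunk is known, we can calculate \( \sum_i \mathrm{QoE}_i = \sum_i \left( \mathrm{SSIM}_i - \lambda \left| \mathrm{SSIM}_i - \mathrm{SSIM}_{i-1} \right| - \mu \cdot \mathrm{Rebuffer}_i \right) \)

Figure: playback buffer vs. time for chunks 1 to 5 (labels: Rebuffer, Transmission Time, Chunk Length, Drains 1s/s).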
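
The buffer dynamics in the figure can be sketched as follows, assuming the buffer drains at 1 s/s while a chunk is in flight and gains one chunk length when the chunk arrives. The function name and the 2-second chunk length in the example are illustrative, and the sketch ignores any cap on the buffer size.

```python
def simulate_buffer(buffer_s, tx_times, chunk_len_s):
    """Return (rebuffer time per chunk, final buffer level) for a plan.

    While a chunk is being transmitted the playback buffer drains at 1 s/s.
    If it empties before the chunk arrives, the shortfall is rebuffering.
    Each delivered chunk then adds chunk_len_s seconds of video.
    """
    rebuffers = []
    for t in tx_times:
        rebuffers.append(max(t - buffer_s, 0.0))
        buffer_s = max(buffer_s - t, 0.0) + chunk_len_s
    return rebuffers, buffer_s


# Start with 3 s buffered; the second chunk takes 5 s and causes a 1 s stall.
rebuffers, final_buffer = simulate_buffer(3.0, [1.0, 5.0, 2.0], chunk_len_s=2.0)
print(rebuffers, final_buffer)  # [0.0, 1.0, 0.0] 2.0
```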


Fugu: an ABR algorithm trained in situ

Remaining problems:

• How do we estimate the unknown transmission time of each given chunk?

- the only uncertainty in the control

• How do we compute the optimal plan to maximize QoE?

- more efficient than exhaustive search (10^5 combinations: 10 versions over 5 steps)

• How do we follow the optimal plan?

- send 5 chunks and recompute the optimal plan?


Solving problem 1: Transmission Time Predictor (TTP)

• Neural network predicts “how long would each chunk take?”

• Input:

- sizes and transmission times of past 8 chunks
- low-level TCP statistics (min RTT, RTT, CWND, packets in flight, delivery rate)
- size of the chunk to be transmitted (vs. throughput predictor)

• Output:

- probability distribution over the transmission time (vs. point estimate)

• Training: supervised learning in situ on real data from Puffer
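
A rough sketch of the predictor's interface, under stated assumptions: the inputs listed above are concatenated into one feature vector, and the output is a probability distribution over discretized transmission times rather than a point estimate. The bin centers, the single linear layer, the untrained weights, and all function names are hypothetical stand-ins, not the network described in the talk.

```python
import math

# Hypothetical bin centers (seconds) for the predicted transmission time.
BIN_CENTERS_S = [0.125, 0.375, 0.75, 1.5, 3.0]


def build_features(past_sizes, past_times, tcp_stats, next_size):
    """Concatenate the listed inputs into one flat feature vector."""
    assert len(past_sizes) == len(past_times) == 8
    return list(past_sizes) + list(past_times) + list(tcp_stats) + [next_size]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def predict_distribution(features, weights, biases):
    """One linear layer plus softmax: a stand-in for a trained network."""
    logits = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)


# tcp_stats order (assumed): min RTT, RTT, CWND, packets in flight, delivery rate.
features = build_features([1.0] * 8, [0.5] * 8, [0.04, 0.05, 10, 8, 2.0], 1.2)
weights = [[0.0] * len(features) for _ in BIN_CENTERS_S]  # untrained placeholder
biases = [0.0, 1.0, 2.0, 1.0, 0.0]
probs = predict_distribution(features, weights, biases)
expected_s = sum(p * c for p, c in zip(probs, BIN_CENTERS_S))
print([round(p, 3) for p in probs], round(expected_s, 3))
```

Keeping the full distribution lets a controller weigh the chance of a slow transmission, which a single expected value would hide.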


Solving problem 2: Value Iteration

• Well-known technique to solve MDP

• Let \( v^*_i(B_i, K_{i-1}) \) denote the maximum expected sum of QoE that can be achieved in the lookahead horizon

• Optimal plan can be computed with dynamic programming

It looks like

\[ v^*_i(B_i, K_{i-1}) = \max_{K^s_i} \left\{ \sum_{T_i} \Pr\left[ \hat{T}(K^s_i) = T_i \right] \cdot \left( \mathrm{QoE}(K^s_i, K_{i-1}) + v^*_{i+1}(B_{i+1}, K^s_i) \right) \right\} \]
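
The recursion can be sketched with memoized dynamic programming. This is one reading of the notation, stated as an assumption: B_i is the playback buffer level, K_i^s is version s of chunk i, and the probabilities come from the transmission time predictor. The constants, the buffer discretization, and the function name are hypothetical, and the state tracks the previous chunk's SSIM in place of K_{i-1}.

```python
from functools import lru_cache

LAMBDA, MU = 1.0, 100.0  # hypothetical QoE weights
CHUNK_LEN_S = 2.0        # hypothetical chunk duration (seconds)
MAX_BUF_S = 15.0         # hypothetical buffer cap (seconds)
STEP_S = 0.5             # buffer discretization (seconds)


def best_first_choice(ssim, time_dist, buffer_s, prev_ssim):
    """Return (expected QoE of the best plan, index of the first version to send).

    ssim[i][s]:      quality of version s of chunk i in the horizon
    time_dist[i][s]: list of (transmission_time_s, probability) pairs
    """
    horizon = len(ssim)

    @lru_cache(maxsize=None)
    def v(i, buf_steps, prev_q):
        if i == horizon:
            return 0.0, None
        buf = buf_steps * STEP_S
        best = (float("-inf"), None)
        for s, q in enumerate(ssim[i]):
            expected = 0.0
            for t, p in time_dist[i][s]:
                rebuffer = max(t - buf, 0.0)
                nxt = min(max(buf - t, 0.0) + CHUNK_LEN_S, MAX_BUF_S)
                qoe = q - LAMBDA * abs(q - prev_q) - MU * rebuffer
                expected += p * (qoe + v(i + 1, round(nxt / STEP_S), q)[0])
            if expected > best[0]:
                best = (expected, s)
        return best

    return v(0, round(buffer_s / STEP_S), prev_ssim)


# Two chunks, two versions each: the high version risks a 4 s transmission.
ssim = [[10.0, 15.0], [10.0, 15.0]]
dist = [[[(0.5, 1.0)], [(1.0, 0.5), (4.0, 0.5)]]] * 2
result = best_first_choice(ssim, dist, buffer_s=2.0, prev_ssim=10.0)
print(result)  # (20.0, 0): the safe low version wins at both steps
```

Memoizing on the discretized state is what avoids enumerating every combination of versions across the horizon.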


Solving problem 3: Model Predictive Control (MPC)

• Send only one chunk following the optimal plan

• Replan before sending the next chunk to mitigate accumulation of errors

Figure: 5-step lookahead over 10 versions of each chunk (e.g. 1080p-22, 1080p-24, 720p-20, 240p-26), with one additional chunk column shown.
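
The two bullets above amount to a receding-horizon loop, sketched below. The planner and sender are passed in as functions; `toy_plan` and `toy_send` are hypothetical stand-ins so the example runs, and the chunk length and buffer threshold are placeholders.

```python
def mpc_stream(num_chunks, plan_fn, send_fn, buffer_s, chunk_len_s=2.0, horizon=5):
    """Plan up to `horizon` chunks, send only the first, then replan."""
    sent = []
    for i in range(num_chunks):
        steps = min(horizon, num_chunks - i)
        plan = plan_fn(i, steps, buffer_s)  # list of version indices
        version = plan[0]                   # commit to the first step only
        tx_time = send_fn(i, version)       # observed time, not the prediction
        buffer_s = max(buffer_s - tx_time, 0.0) + chunk_len_s
        sent.append(version)
    return sent, buffer_s


# Toy stand-ins (hypothetical): choose the high version only with a healthy buffer.
def toy_plan(i, steps, buffer_s):
    return [1 if buffer_s >= 4.0 else 0] * steps


def toy_send(i, version):
    return 3.0 if version == 1 else 0.5


sent, final_buffer = mpc_stream(4, toy_plan, toy_send, buffer_s=2.0)
print(sent, final_buffer)  # [0, 0, 1, 1] 3.0
```

Replanning from the observed buffer after every chunk means a bad prediction affects at most one decision before it is corrected.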


Fugu: an ABR algorithm trained in situ

Figure: Fugu architecture, with components Data Aggregation, Transmission Time Predictor, MPC Controller, and Puffer Video Server; arrows are labeled bitrate selection, state update, update model, daily training, and model-based control.


SSIM vs. Stalls (458,801 streams, 8.5 years of data)

Figure: average SSIM (dB) vs. time spent stalled (%) for Fugu, MPC-HM, RobustMPC-HM, Pensieve, and BBA; an arrow marks the direction of better QoE.


SSIM vs. Stalls (< 6 Mbps only, 100,500 streams, 1.3 years of data)

Figure: average SSIM (dB) vs. time spent stalled (%) for Fugu, MPC-HM, RobustMPC-HM, Pensieve, and BBA on streams below 6 Mbps; an arrow marks the direction of better QoE.


First-chunk SSIM vs. Startup delay (cold start)

Figure: average first-chunk SSIM (dB) vs. startup delay (s) for Fugu, Pensieve, BBA, RobustMPC-HM, and MPC-HM; an arrow marks the direction of better QoE. TTP’s use of low-level TCP statistics boosts initial quality.


Mismatch between emulation and real world

Figure: Performance in emulation on FCC traces. Average SSIM (dB) vs. time spent stalled (%) for Fugu, BBA, Pensieve, RobustMPC-HM, and MPC-HM.

Figure: Puffer results during Jan.–Apr. 2019. Average SSIM (dB) vs. time spent stalled (%) for Fugu, MPC-HM, RobustMPC-HM, Pensieve, BBA, and emulation-trained Fugu.


Users randomly assigned to Fugu watched 10%–20% longer

Figure: CCDF of total time on video player (minutes).

- Fugu: mean 32.6 ± 1.1 min [95% CI]
- MPC-HM: mean 27.9 ± 0.9
- RobustMPC-HM: mean 27.4 ± 0.9
- Pensieve: mean 28.5 ± 0.9
- BBA: mean 29.6 ± 1.0
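
The "10%–20% longer" headline can be reproduced from the reported means alone. This is a small arithmetic sketch; it uses the point estimates from the figure legend and ignores the confidence intervals, and the variable names are illustrative.

```python
# Mean total time on the video player (minutes), from the figure legend.
FUGU_MEAN_MIN = 32.6
OTHER_MEANS_MIN = {
    "MPC-HM": 27.9,
    "RobustMPC-HM": 27.4,
    "Pensieve": 28.5,
    "BBA": 29.6,
}

# Relative increase of Fugu's mean over each other scheme, in percent.
increase_pct = {
    name: 100.0 * (FUGU_MEAN_MIN / mean - 1.0)
    for name, mean in OTHER_MEANS_MIN.items()
}
for name, pct in increase_pct.items():
    print(f"Fugu vs. {name}: {pct:.1f}% longer on average")
```

The increases range from roughly 10% (against BBA) to roughly 19% (against RobustMPC-HM).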


Ablation study of Fugu’s TTP


Takeaways

• Networked systems present unique challenges for machine learning

- Training algorithms in emulation: disappointing real-world results
- Evaluating algorithms in emulation: not predictive of real-world results
- Running in real life: requires a substantial amount of data to reduce statistical uncertainty

• Our solution: combining classical control with a learned network predictor, trained with supervised learning in situ on data from the real deployment environment

- It robustly outperforms existing schemes in practice

• We are opening Puffer to the research community for others to develop and deploy congestion control and ABR algorithms on real traffic.

Francis Y. Yan, [email protected]
