Designing QoE experiments to evaluate Peer-to-Peer streaming applications
Tom Z.J. Fu, CUHK
Dah Ming Chiu, CUHK
Zhibin Lei, ASTRI
VCIP 2010, Huang Shan, China
Outline
– Introduction & motivation
– Chunk-level impairment model
– Experiment setting
– Result analysis and insights
– Future work & conclusion
Internet streaming services have become popular; they are mostly implemented with either a server/client (S/C) mechanism or a P2P mechanism.
– Examples: CDN, single- or multiple-tree-based application-layer multicast, and peer-to-peer streaming (live streaming / VoD).
There is a need to evaluate the different mechanisms with a proper methodology, e.g. to compare the different strategies used in P2P systems.
Two types of evaluation method:
– Objective: measurement of objective metrics (packet loss rate (plr), transmission delay);
– Subjective: inviting subjects to give scores.
Introduction and motivation
1. Existing methods are not suitable:
– They consider only a packet-level impairment model for single-link network transmission (packet loss rate, packet end-to-end delay, etc.).
– In most large-scale P2P streaming systems, however, the chunk (much larger than one packet) is the basic unit of almost all building blocks and design issues.
2. Various objective metrics are defined in different systems and analytical models:
– buffer count (UUSee measurement);
– playback continuity (several different definitions: Coolstreaming, PPLive measurement, etc.).
Subjective testing validation is necessary.
[Figure 1: block diagram. Traditional chain: source video (SRC) → video encoder → network transmission → video decoder → processed video (PVS). In the proposed model, network transmission is replaced by a chunk-level impairment module: chunk maker, peer-to-peer mechanism among peers, chunk-level distortion generator, chunk buffer manager and playback controller; chunk size is a parameter.]
Fig. 1: The traditional HRC (hypothetical reference circuit) includes source video (SRC), video encoder, network transmission, video decoder and processed video (PVS).
Chunk-level impairment model
Packet-level impairment: for a single link (e.g. plr, end-to-end delay).
Chunk-level impairment: for dynamic topologies and various strategies.
Video encoder – different media codecs and transmission rates can be chosen at the video encoder component.
Network transmission – the chunk-level impairment module:
– Chunk maker – responsible for organizing video stream packets into chunks.
– Chunk-level distortion generator – three different ways are designed to implement the chunk-level distortion generator.
– Chunk buffer manager and playback controller:
  – manages and keeps the received chunks in a local chunk-level buffer;
  – makes the playback decision for each chunk.
Video decoder – after being decoded by the video decoder component, the processed videos (PVS) are displayed on monitors to the users.
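The chunk maker's role can be illustrated with a minimal Python sketch. It assumes, hypothetically, that packets arrive in order and that a chunk is a fixed number of consecutive packets; a real system may instead size chunks by bytes or by playback duration, and the names `Packet`, `Chunk`, `make_chunks` and `packets_per_chunk` are illustrative only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Packet:
    seq: int        # packet sequence number in the video stream
    payload: bytes  # encoded video data


@dataclass
class Chunk:
    index: int             # chunk number i
    packets: List[Packet]  # consecutive packets grouped into this chunk


def make_chunks(packets: List[Packet], packets_per_chunk: int) -> List[Chunk]:
    """Group an ordered packet stream into fixed-size chunks (hypothetical rule).

    The last chunk holds fewer packets if the stream length is not a
    multiple of packets_per_chunk.
    """
    if packets_per_chunk <= 0:
        raise ValueError("packets_per_chunk must be positive")
    chunks: List[Chunk] = []
    for start in range(0, len(packets), packets_per_chunk):
        chunks.append(Chunk(index=len(chunks),
                            packets=packets[start:start + packets_per_chunk]))
    return chunks


stream = [Packet(seq=k, payload=b"") for k in range(10)]
chunks = make_chunks(stream, packets_per_chunk=4)
print([len(c.packets) for c in chunks])  # [4, 4, 2]
```

The chunk size is the only knob here, which mirrors the "chunk size" parameter shown in Fig. 1.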
Chunk-level impairment model
Notations:
– $T_i^e$: the expected playback time of the $i$th chunk;
– $T_i^s$: the start download time of chunk $i$;
– $T_i^c$: the complete download time of chunk $i$.
1. Chunk-level delay. Chunk $i$ is delayed if $D_i = \{T_i^c - T_i^e\}^+ > 0$, where $\{x\}^+ = x$ when $x > 0$, otherwise 0.
2. Chunk delay distribution (CDD). The chunk delay distribution is the aggregate statistic over all delayed chunks. In the simplest case, it can be represented by a discrete random variable.
3. Chunk receiving pattern (CRP). It describes how chunk $i$ is filled over the whole downloading process. If we denote by $f_i(t)$, $t \in [T_i^s, T_i^c]$, the fraction of chunk $i$ downloaded by time $t$, then mathematically a CRP can be represented by any non-decreasing curve $f_i(t)$ over $t \in [T_i^s, T_i^c]$ with the constraints $f_i(T_i^s) = 0$ and $f_i(T_i^c) = 1$.
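The delay definition and the simplest (discrete) form of the CDD can be sketched in Python as follows. The function names and the example timestamps are hypothetical; the empirical CDD here is just the relative frequency of each delay value among the delayed chunks ($D_i > 0$).

```python
from collections import Counter
from typing import Dict, List


def chunk_delay(t_complete: float, t_expected: float) -> float:
    """D_i = {T_i^c - T_i^e}^+, i.e. max(T_i^c - T_i^e, 0)."""
    return max(t_complete - t_expected, 0.0)


def empirical_cdd(delays: List[float]) -> Dict[float, float]:
    """Empirical pmf of the delays of delayed chunks only (D_i > 0)."""
    delayed = [d for d in delays if d > 0]
    if not delayed:
        return {}
    counts = Counter(delayed)
    return {d: n / len(delayed) for d, n in sorted(counts.items())}


t_expected = [10.0, 11.0, 12.0, 13.0]  # T_i^e (hypothetical values)
t_complete = [9.5, 12.0, 14.0, 14.0]   # T_i^c (hypothetical values)
delays = [chunk_delay(c, e) for c, e in zip(t_complete, t_expected)]
print(delays)                 # [0.0, 1.0, 2.0, 1.0]
print(empirical_cdd(delays))  # {1.0: 0.6666666666666666, 2.0: 0.3333333333333333}
```

The first chunk finishes before its expected playback time, so it contributes $D_1 = 0$ and is excluded from the CDD.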
Chunk-level impairment model
Illustration of different CRPs: curves A, B, C and D have the same start downloading time $T_i^s$ (1 second before $T_i^e$) and finish time $T_i^c$ (4 seconds after $T_i^e$).
At any time, a chunk following curve A has received more content than one following B, C or D.
At $t = T_i^e$, the expected playback time, the chunk following A is 80% complete, while B is only 20% complete, C close to 0% and D 0%.
Note: in this work we apply only the simplest pattern (curve D, i.e. all content arrives at the same time, $T_i^c$); the more complicated curves will be studied in future work.
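Using the timing of the illustration ($T_i^e = 0$, so $T_i^s = -1$ s and $T_i^c = 4$ s), the CRP constraints can be checked numerically. The sketch below implements curve D as a step function and, purely as a hypothetical comparison, a linear CRP; the linear curve is $1/5 = 20\%$ complete at $T_i^e$, but this does not claim that curve B is linear. All names are illustrative.

```python
from typing import Callable

Crp = Callable[[float], float]


def step_crp(t_s: float, t_c: float) -> Crp:
    """Curve D: nothing is received until T_i^c, then the chunk is complete."""
    def f(t: float) -> float:
        return 1.0 if t >= t_c else 0.0
    return f


def linear_crp(t_s: float, t_c: float) -> Crp:
    """Hypothetical curve: content arrives at a constant rate over [T_i^s, T_i^c]."""
    def f(t: float) -> float:
        if t <= t_s:
            return 0.0
        if t >= t_c:
            return 1.0
        return (t - t_s) / (t_c - t_s)
    return f


def is_valid_crp(f: Crp, t_s: float, t_c: float, steps: int = 100) -> bool:
    """Sample f on [t_s, t_c]: non-decreasing, f(t_s) = 0 and f(t_c) = 1."""
    ts = [t_s + (t_c - t_s) * k / steps for k in range(steps + 1)]
    values = [f(t) for t in ts]
    non_decreasing = all(a <= b for a, b in zip(values, values[1:]))
    return non_decreasing and values[0] == 0.0 and values[-1] == 1.0


T_S, T_C = -1.0, 4.0  # 1 s before and 4 s after T_i^e = 0
print(step_crp(T_S, T_C)(0.0), linear_crp(T_S, T_C)(0.0))  # 0.0 0.2
```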
Live experiments: the most detailed CRP for each chunk can be collected and recorded during a real-life experiment.
Simulation results: it is possible to simulate a large network with a large number of users and to make the simulation repeatable; the same kinds of detailed CRP traces can be collected.
Artificial generating: manually create different possible chunk delays (following a certain distribution) or chunk-level receiving patterns (by implementing $f_i(t)$ with different increasing curves and parameters) for subjective testing purposes.
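For the artificial-generating option, here is a minimal Python sketch. It assumes, hypothetically, that each chunk is delayed independently with probability `p_delayed` and that delayed chunks draw their delay from a caller-supplied distribution; seeding the random generator makes the generated trace repeatable across test sessions. All names and parameters are illustrative, not taken from the authors' tool.

```python
import random
from typing import Callable, List


def artificial_delays(n_chunks: int, p_delayed: float,
                      draw_delay: Callable[[random.Random], float],
                      seed: int = 0) -> List[float]:
    """Return one artificial delay D_i per chunk (0.0 for non-delayed chunks)."""
    rng = random.Random(seed)  # seeded so the same trace can be regenerated
    delays: List[float] = []
    for _ in range(n_chunks):
        if rng.random() < p_delayed:
            delays.append(draw_delay(rng))  # delayed chunk: sample its delay
        else:
            delays.append(0.0)              # non-delayed chunk
    return delays


# Hypothetical parameters: ~20% of 30 chunks delayed by 0.5 to 2 seconds.
trace = artificial_delays(30, 0.2, lambda r: r.uniform(0.5, 2.0), seed=42)
```

Because the delay distribution is passed in as a function, the same generator can produce either of the CDDs used later in the testing set.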
Chunk-level distortion generator
In a P2P streaming system, the playback controller (PC) plays an essential role.
Chunks fall into two cases:
– non-delayed chunk: download completes on or before $T_i^e$;
– delayed chunk ($D_i > 0$): not complete when $T_i^e$ is reached.
The PC deals with the two cases as follows:
– non-delayed chunk: move it out of the local buffer and send it to the decoder to be played back;
– delayed chunk: three possible actions might be taken (among others):
  a) wait until the chunk is complete and then send it to the decoder;
  b) send the incomplete chunk directly to the decoder without waiting;
  c) wait for at most the longest waiting time (LWT): as soon as either the timer expires or the chunk is complete, the PC stops waiting and sends the chunk to the decoder immediately.
Simple playback controller
LWT = ∞, case (a)
LWT = 0, case (b)
LWT in between, case (c)
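The three cases differ only in the value of LWT, so a single rule covers them. A minimal Python sketch with hypothetical names: `delay` is $D_i$, and the result is how long the PC waits and whether the chunk is complete when it is sent. The tie $D_i = \mathrm{LWT}$ is treated as incomplete, matching the freeze-and-discard boundary used for the viewing effects.

```python
import math
from typing import Tuple


def playback_decision(delay: float, lwt: float) -> Tuple[float, bool]:
    """Return (waiting time, chunk complete when sent) for one chunk.

    delay: D_i >= 0 (0 means the chunk is ready at its expected playback time)
    lwt:   longest waiting time; math.inf gives case (a), 0 gives case (b)
    """
    if delay <= 0:
        return 0.0, True        # non-delayed chunk: send to the decoder at once
    wait = min(delay, lwt)      # stop when the chunk completes or the timer expires
    complete = delay < lwt      # complete only if it finishes before the timer
    return wait, complete


print(playback_decision(1.5, math.inf))  # (1.5, True)   case (a)
print(playback_decision(1.5, 0.0))       # (0.0, False)  case (b)
print(playback_decision(5.0, 3.0))       # (3.0, False)  case (c), timer expired
```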
Simple playback controller
Note: the PC can be implemented in more complicated ways; this will be studied in future work.
Experiment goals:
– To validate the effectiveness of some well-studied performance metrics, e.g. the average playback (dis)continuity. If a correlation does exist, to find a simple mapping function between the objective and subjective metrics.
– To explore the relationship between the chunk delay distribution (CDD) and subjective QoE.
– To learn useful insights that help in the design of streaming peer software.
Experiment settings:
– 50 source video clips with an average length of 30 seconds;
– 30 subjects (16 male and 14 female), aged 18 to 28;
– assessment scheme: Absolute Category Rating (ACR) with hidden reference.
Experiment setting
[Figure: source videos (SRC)]
Simply deployed decoder:
– if the video chunk sent from the PC is incomplete, discard it; otherwise decode and play it back (valid only for curve D);
– if no chunk is received from the PC at the expected playback time, the decoder simply freezes at the last playable image until new content arrives.
Experiment setting
Because of this decoder implementation, chunk-level distortions can cause three possible viewing effects:
– $D_i = 0$, no distortion. If chunk $i$ is completed on or before its expected playback time, it is decoded and played back normally.
– $0 < D_i < \mathrm{LWT}$, freeze-and-play viewing effect. If chunk $i$ is delayed but still completed before the LWT expires, the PVS first freezes at an image for a duration of $D_i$ and then plays back chunk $i$ normally.
– $D_i \ge \mathrm{LWT}$, freeze-and-discard viewing effect. If chunk $i$ is delayed and remains incomplete until the LWT expires, the PVS freezes at an image for the LWT and then jumps directly to chunk $i + 1$.
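The mapping from $D_i$ to a viewing effect, under the simply deployed decoder and curve D, can be sketched in Python as follows (function names hypothetical). The second function sums the freezing time over a sequence of chunk delays, which is one way to relate a delay trace to the amount of interrupted playback.

```python
from typing import Iterable, Tuple


def viewing_effect(delay: float, lwt: float) -> Tuple[str, float]:
    """Map D_i and the LWT to (viewing effect, freezing duration in seconds)."""
    if delay <= 0:
        return "no distortion", 0.0
    if delay < lwt:
        return "freeze-and-play", delay   # freeze for D_i, then play chunk i
    return "freeze-and-discard", lwt      # freeze for LWT, then jump to chunk i + 1


def total_freeze(delays: Iterable[float], lwt: float) -> float:
    """Sum of freezing durations over a sequence of chunk delays."""
    return sum(viewing_effect(d, lwt)[1] for d in delays)


print(viewing_effect(1.2, 3.0))                   # ('freeze-and-play', 1.2)
print(total_freeze([0.0, 1.0, 3.0, 5.0], 3.0))    # 7.0
```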
Resulting viewing effects
A reduced set of 20 combinations of chunk-level distortions, composed of two factors:
– average discontinuity ($d = 1 - c$, where $c$ is the average playback continuity): 0, 5%, 10%, 15%, 20%, 25%, 30%, 40%, 50%, 60%;
– two types of chunk delay distribution (CDD):
  1. short delay distribution: delays uniformly distributed in [0, 2] seconds; or
  2. long delay distribution: all delays equal to 3 seconds (= LWT; the LWT is set to 3 seconds by default).
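The testing set is the Cartesian product of the 10 discontinuity levels and the 2 CDDs, giving 10 × 2 = 20 combinations. A minimal Python sketch with hypothetical names; it does not model how a given $d$ is turned into a particular number of delayed chunks, since the slides do not specify that step.

```python
import itertools
import random

# Ten average-discontinuity levels d = 1 - c from the testing set.
DISCONTINUITY_LEVELS = [0.0, 0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.40, 0.50, 0.60]
LWT = 3.0  # longest waiting time in seconds (default on the slide)


def short_delay(rng: random.Random) -> float:
    """Short delay distribution: uniform in [0, 2] seconds."""
    return rng.uniform(0.0, 2.0)


def long_delay(rng: random.Random) -> float:
    """Long delay distribution: every delay equals the LWT (3 seconds)."""
    return LWT


CDDS = {"short": short_delay, "long": long_delay}

# All (d, CDD name) pairs: 10 levels x 2 distributions = 20 conditions.
CONDITIONS = list(itertools.product(DISCONTINUITY_LEVELS, CDDS))
print(len(CONDITIONS))  # 20
```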
Testing set
Subjective assessment results for each processed video sequence: MOS values (left), DMOS values (right).
The meaning of the Mean Opinion Score (MOS) and DMOS:
Result analysis and insights
Insights from the subjective assessment results:
1. The DMOS analysis (right) is consistent with the MOS analysis (left), which indicates that the experiment results are reasonable, where:
– DMOS is derived by subtracting the MOS of the PVS from the MOS of the reference video (of the same category and with no distortion);
– the DMOS metric removes the bias in the subjective scoring process caused by individuals' preferences for video content.
2. A correlation between the objective metric and subjective QoE exists.
3. The line derived by linear regression of the subjective score (MOS) on the discontinuity (d) can be used later to predict QoE from the measured discontinuity metric without conducting subjective tests, thus saving cost.
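The quantities in this list can be computed as follows. This is a minimal Python sketch under stated assumptions: ratings are on the usual 5-point ACR scale, DMOS follows the slide's definition (MOS of the reference minus MOS of the PVS), and the mapping is an ordinary least-squares line MOS ≈ a + b·d. The function names are hypothetical, and no numbers from the experiment are reproduced here.

```python
from statistics import mean
from typing import Sequence, Tuple


def mos(ratings: Sequence[float]) -> float:
    """Mean Opinion Score: average of all subjects' ratings of one video."""
    return mean(ratings)


def dmos(ref_ratings: Sequence[float], pvs_ratings: Sequence[float]) -> float:
    """DMOS as defined on the slide: MOS(reference) - MOS(PVS)."""
    return mos(ref_ratings) - mos(pvs_ratings)


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    b = sxy / sxx
    return y_bar - b * x_bar, b


def predict_mos(a: float, b: float, d: float) -> float:
    """Predict QoE from a measured discontinuity d using the fitted line."""
    return a + b * d
```

Usage: given per-PVS pairs (d, MOS) from a subjective test, `a, b = fit_line(ds, moses)` fits the line once, and `predict_mos(a, b, 0.1)` then estimates the MOS at 10% discontinuity without running a new test.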
Result analysis and insights
Comparison between short and long chunk delay distributions: MOS values (left), DMOS values (right):
Insights from the comparison:
1. PVSes with the long delay distribution obtain higher MOSes than those with the short delay distribution when the average d is the same.
2. Subjects care more about the number of screen-freezing events than about the duration of each freezing event.
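Insight 2 can be illustrated with a rough count. Assume (this is not stated on the slides) that the average discontinuity $d$ is the fraction of playback time spent frozen, and take a clip of $T = 30$ s (the average clip length) at $d = 20\%$. Long delays ($D_i = 3$ s $= \mathrm{LWT}$) each cause a 3 s freeze-and-discard, while short delays (uniform in [0, 2] s, mean 1 s) each cause a freeze-and-play of $D_i$ seconds:

```latex
\begin{aligned}
F &= d\,T = 0.2 \times 30\ \text{s} = 6\ \text{s} && \text{(total freezing time)}\\
N_{\text{long}} &= \frac{F}{\mathrm{LWT}} = \frac{6\ \text{s}}{3\ \text{s}} = 2 && \text{(freezing events)}\\
N_{\text{short}} &\approx \frac{F}{\mathbb{E}[D_i]} = \frac{6\ \text{s}}{1\ \text{s}} = 6 && \text{(freezing events)}
\end{aligned}
```

Under this assumption, the short-delay PVS shows about three times as many (shorter) freezing events for the same $d$, which is consistent with subjects penalizing the number of freezes more than their duration.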
Future work & conclusion
Future work:
– Conduct more experiments with different parameter settings.
– Change the implementation of the decoder to support incomplete chunks and a concealment algorithm.
– Based on this framework, study more complicated designs of the playback controller (how long to wait for a delayed chunk).
– Study different chunk receiving patterns.
Conclusion:
– A chunk-level impairment model is proposed for the P2P mechanism.
– By applying this new model, we carry out subjective experiments.
– The results are preliminary, but they still yield some interesting insights.
The end
Thanks!
Q & A