optimising digital content delivery
![Page 1: Optimising digital content delivery](https://reader036.vdocuments.us/reader036/viewer/2022062405/5570d597d8b42afb678b458b/html5/thumbnails/1.jpg)
Optimising digital content delivery
Tamas Jambor
University College London
EPSRC Industrial CASE
![Page 2: Optimising digital content delivery](https://reader036.vdocuments.us/reader036/viewer/2022062405/5570d597d8b42afb678b458b/html5/thumbnails/2.jpg)
Structure of the talk
• Problem description
• Features of the data
• Baseline algorithms
• Modified algorithms for content delivery
– Time-aware models
• Evaluating efficient content delivery
• Future work
![Page 3: Optimising digital content delivery](https://reader036.vdocuments.us/reader036/viewer/2022062405/5570d597d8b42afb678b458b/html5/thumbnails/3.jpg)
Background
• Video traffic increasing over the internet
![Page 4: Optimising digital content delivery](https://reader036.vdocuments.us/reader036/viewer/2022062405/5570d597d8b42afb678b458b/html5/thumbnails/4.jpg)
Increased video traffic
• Peak-time traffic slows connection speed
• Delivering videos beforehand
– Cheaper to deliver
– Reduces peak-time traffic
– User can watch content instantly (slow connection)
– HD content can be delivered (slow connection)
![Page 5: Optimising digital content delivery](https://reader036.vdocuments.us/reader036/viewer/2022062405/5570d597d8b42afb678b458b/html5/thumbnails/5.jpg)
Features of the data
• Film data (views and previews)
– 1 July 2009 – 31 January 2010
– 2.3 million entries, 64 000 users, 1300 assets
• Removing inconsistencies
– Unknown entries
– Assets whose end date is earlier than their start date
• After filtering
– 1.9 million entries, 64 000 users, 1267 items
![Page 6: Optimising digital content delivery](https://reader036.vdocuments.us/reader036/viewer/2022062405/5570d597d8b42afb678b458b/html5/thumbnails/6.jpg)
Training and test sets
• Requirements
– Every user must have at least one preview or view in the training set and at least one view in the test set
– No previews in the test set
• Training
– 1 July 2009 – 31 December 2009
– 1.2 million entries, 26 000 users, 1267 items
• Test
– 1 January 2010 – 31 January 2010
– 72 000 entries, 26 000 users, 1267 items
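The split described above can be sketched as follows. This is a minimal illustration, assuming each log entry is a `(user, item, kind, day)` tuple with `kind` equal to `"view"` or `"preview"`; the tuple layout and the function name `split_log` are hypothetical, only the dates and the requirements come from the slide.

```python
from datetime import date

TRAIN_END = date(2009, 12, 31)  # last day of the training window


def split_log(entries):
    """Split (user, item, kind, day) entries into training and test sets."""
    train = [e for e in entries if e[3] <= TRAIN_END]
    # The test window keeps views only: previews are dropped.
    test = [e for e in entries if e[3] > TRAIN_END and e[2] == "view"]
    # Keep users with at least one training entry and one test view.
    keep = {e[0] for e in train} & {e[0] for e in test}
    train = [e for e in train if e[0] in keep]
    test = [e for e in test if e[0] in keep]
    return train, test
```

Filtering users on both sides is what reduces the user count from the full log to the subset that can actually be evaluated.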
![Page 7: Optimising digital content delivery](https://reader036.vdocuments.us/reader036/viewer/2022062405/5570d597d8b42afb678b458b/html5/thumbnails/7.jpg)
Unique features of the dataset
• Implicit feedback carries less information
– Feedback is expressed before an opinion could be formed
• User might not like the item
– Implicit feedback recommender systems make assumptions about missing rating scores
• User is not interested
• User does not know the item
![Page 8: Optimising digital content delivery](https://reader036.vdocuments.us/reader036/viewer/2022062405/5570d597d8b42afb678b458b/html5/thumbnails/8.jpg)
Unique features of the dataset
• Preview information
– Weak indication of interest
[Figure: bar chart comparing "Purchased within one day" and "Purchased after one day", per item and per user; vertical axis 0 to 0.16]
![Page 9: Optimising digital content delivery](https://reader036.vdocuments.us/reader036/viewer/2022062405/5570d597d8b42afb678b458b/html5/thumbnails/9.jpg)
Baseline algorithm
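• Implicit SVD
$$\min_{q,d} \sum_{u,i} w_{u,i}\,\bigl(r_{u,i} - q_i^T d_u\bigr)^2 + \lambda \Bigl( \sum_i \lVert q_i \rVert^2 + \sum_u \lVert d_u \rVert^2 \Bigr)$$
• Fix item or user
$$d_u = \bigl(Y^T C^u Y + \lambda I\bigr)^{-1}\, Y^T C^u\, r(u)$$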
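A minimal sketch of the fixed-item step, in which one user's factor vector is found by solving a small regularised least-squares system, d_u = (Yᵀ Cᵘ Y + λI)⁻¹ Yᵀ Cᵘ r(u). Here Y holds the item factors, Cᵘ is taken to be the diagonal matrix of that user's weights and λ a regularisation constant. The function names and the plain-Python solver are illustrative assumptions, not the author's implementation; a linear algebra library would normally do the solve.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def update_user(item_factors, weights, ratings, lam):
    """Return d_u for one user while the item factors stay fixed."""
    k = len(item_factors[0])
    # a accumulates Y^T C^u Y + lam * I, b accumulates Y^T C^u r(u).
    a = [[lam if p == q else 0.0 for q in range(k)] for p in range(k)]
    b = [0.0] * k
    for y, w, r in zip(item_factors, weights, ratings):
        for p in range(k):
            b[p] += w * r * y[p]
            for q in range(k):
                a[p][q] += w * y[p] * y[q]
    return solve(a, b)
```

Each call touches only one user's weights and ratings, so the updates for different users do not depend on each other.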
![Page 10: Optimising digital content delivery](https://reader036.vdocuments.us/reader036/viewer/2022062405/5570d597d8b42afb678b458b/html5/thumbnails/10.jpg)
Baseline algorithm
• Advantage of this approach
– Task can be divided into independent chunks (user/item)
– Scalable solution
– It can be computed in parallel
• Weights
– Additional information / assumptions about the data
![Page 11: Optimising digital content delivery](https://reader036.vdocuments.us/reader036/viewer/2022062405/5570d597d8b42afb678b458b/html5/thumbnails/11.jpg)
Weights
• A weight can be assigned to each user-item pair
– Previews
• Items that were previewed before are more likely to be watched
$$w_{u,i} = \alpha\, P(t \mid p, u) + (1 - \alpha)\, P(t \mid p, i)$$
– Confidence decay in time
$$w_{u,i} = e^{-\lambda (t - t_r)}$$
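One way to read "confidence decay in time" is an exponential weight that shrinks as an observation gets older. The sketch below works under that assumption only; the function name and the decay constant `rate` are hypothetical and no value for it is given in the slides.

```python
import math


def decay_weight(days_since_event, rate=0.1):
    """Confidence weight that shrinks exponentially with elapsed time.

    rate is a hypothetical decay constant, chosen here for illustration.
    """
    return math.exp(-rate * days_since_event)
```

An observation made today keeps full weight 1.0, and older observations contribute progressively less to the weighted objective.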
![Page 12: Optimising digital content delivery](https://reader036.vdocuments.us/reader036/viewer/2022062405/5570d597d8b42afb678b458b/html5/thumbnails/12.jpg)
Popular items
| Title | Frequency | Average (days) | SD (days) | Available (days) |
|---|---|---|---|---|
| I Now Pronounce You Chuck & Larry (PictureBox) | 4469 | 8.30 | 8.29 | 28.00 |
| Curious George: A Very Monkey Christmas (PictureBox) | 3753 | 8.73 | 7.21 | 31.00 |
| Kingdom | 3709 | 8.96 | 8.05 | 28.00 |
| Santa Claus (PictureBox) | 3654 | 3.37 | 2.72 | 18.00 |
| Munster's Scary Little Christmas (PictureBox) | 3654 | 8.38 | 8.09 | 28.00 |
| Inside Man (PictureBox) | 3530 | 9.31 | 8.35 | 28.00 |
| Step Up (PictureBox) | 3326 | 9.05 | 8.40 | 28.00 |
| Wiz | 3291 | 14.29 | 12.04 | 41.46 |
| Smokin' Aces (PictureBox) | 3253 | 7.68 | 7.64 | 28.00 |
| Break-Up | 3203 | 9.32 | 7.84 | 27.96 |
| Jarhead (PictureBox) | 3041 | 8.84 | 7.90 | 28.00 |
| Stealing Christmas (PictureBox) | 3026 | 3.69 | 3.03 | 18.00 |
| Hangover | 3006 | 11.10 | 6.88 | 26.56 |
![Page 13: Optimising digital content delivery](https://reader036.vdocuments.us/reader036/viewer/2022062405/5570d597d8b42afb678b458b/html5/thumbnails/13.jpg)
Viewing habits
[Figure: number of views (0 to 70) by date (serial values 40172 to 40179) for Patch Adams and Elizabeth - The Golden Age]
![Page 14: Optimising digital content delivery](https://reader036.vdocuments.us/reader036/viewer/2022062405/5570d597d8b42afb678b458b/html5/thumbnails/14.jpg)
Viewing habits
• Viewing behaviour
– During the day
• Differentiate who is watching
– During the week
• Weekends/weekdays
– Categories
• Some content is likely to be watched at specific times
![Page 15: Optimising digital content delivery](https://reader036.vdocuments.us/reader036/viewer/2022062405/5570d597d8b42afb678b458b/html5/thumbnails/15.jpg)
Viewing habits
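• Gaussian CDF
$$\Phi(t, \mu, \sigma) = \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{t - \mu}{\sqrt{2\sigma^2}}\right)\right]$$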
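The Gaussian CDF can be evaluated directly with the error function from Python's standard library. A minimal sketch; the function name and the argument order `(t, mu, sigma)` are choices made here for illustration.

```python
import math


def gaussian_cdf(t, mu, sigma):
    """0.5 * (1 + erf((t - mu) / sqrt(2 * sigma^2)))."""
    return 0.5 * (1.0 + math.erf((t - mu) / math.sqrt(2.0 * sigma ** 2)))
```

For example, with a hypothetical mean viewing hour of 20 and a spread of 2 hours, `gaussian_cdf(20.0, 20.0, 2.0)` is exactly 0.5, with smaller values before the mean and larger values after it.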
![Page 16: Optimising digital content delivery](https://reader036.vdocuments.us/reader036/viewer/2022062405/5570d597d8b42afb678b458b/html5/thumbnails/16.jpg)
Prediction
• For known items
– Baseline prediction
– Daily Gaussian distribution for category
– Weekly Gaussian distribution for category
$$r_{t,c} = r_b\, \Phi(t_d, \mu_{c,d}, \sigma_{c,d})\, \Phi(t_w, \mu_{c,w}, \sigma_{c,w})$$
• For new items
– Prediction for the category
– Daily Gaussian distribution for category
– Weekly Gaussian distribution for category
$$r_{t,c} = r_c\, \Phi(t_d, \mu_{c,d}, \sigma_{c,d})\, \Phi(t_w, \mu_{c,w}, \sigma_{c,w})$$
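A sketch of how the three ingredients listed for each case could be combined, assuming the base score (the baseline prediction for a known item, or the category prediction for a new one) is multiplied by the daily and the weekly Gaussian terms for the item's category. The multiplicative form, the function names and the parameter layout are assumptions made for illustration.

```python
import math


def phi(t, mu, sigma):
    """Gaussian CDF evaluated at t."""
    return 0.5 * (1.0 + math.erf((t - mu) / math.sqrt(2.0 * sigma ** 2)))


def time_aware_score(base, t_day, day_params, t_week, week_params):
    """Scale a base score by daily and weekly category terms.

    day_params and week_params are (mu, sigma) pairs for the category.
    """
    return base * phi(t_day, *day_params) * phi(t_week, *week_params)
```

The same function serves both cases: only the `base` argument changes between known and new items.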
![Page 17: Optimising digital content delivery](https://reader036.vdocuments.us/reader036/viewer/2022062405/5570d597d8b42afb678b458b/html5/thumbnails/17.jpg)
Evaluation method
• Top-N hit rate
– h = number of watched assets that are among the top-N recommended
– v = total number of assets watched
$$l_u = \frac{h_u}{v_u}$$
• Overall performance
– Average performance across all users (M)
$$l = \frac{1}{M} \sum_{i=1}^{M} l_i$$
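The evaluation above can be sketched in a few lines, assuming recommendations are ranked lists and each evaluated user has at least one watched asset (which the test-set requirement guarantees). The function names and the dictionary layout are hypothetical.

```python
def hit_rate(recommended, watched, n):
    """l_u = h_u / v_u for one user with at least one watched asset."""
    watched = set(watched)
    hits = len(watched & set(recommended[:n]))  # h_u
    return hits / len(watched)                  # v_u = len(watched)


def mean_hit_rate(recs, views, n):
    """Average l_u over the M users that appear in views."""
    rates = [hit_rate(recs.get(u, []), w, n) for u, w in views.items()]
    return sum(rates) / len(rates)
```

Because the average is taken per user, a user with a single watched asset counts as much as a heavy viewer.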
![Page 18: Optimising digital content delivery](https://reader036.vdocuments.us/reader036/viewer/2022062405/5570d597d8b42afb678b458b/html5/thumbnails/18.jpg)
Results: Top-15 Performance
[Figure: Top-15 hit rate (axis 0 to 0.25) and number of users (axis 0 to 9000) for the groups 500 and above, 200-500, 100-200, 50-100, 20-50, 10-20, 5-10, 1-5 and All]
![Page 19: Optimising digital content delivery](https://reader036.vdocuments.us/reader036/viewer/2022062405/5570d597d8b42afb678b458b/html5/thumbnails/19.jpg)
Efficient caching
• Pre-cache items that are predicted to be relevant
– Cheaper to deliver
– Reduces peak-time traffic
– User can watch content instantly (slow connection)
– HD content can be delivered (slow connection)
[Diagram: Content Provider, WCC, STB]
![Page 20: Optimising digital content delivery](https://reader036.vdocuments.us/reader036/viewer/2022062405/5570d597d8b42afb678b458b/html5/thumbnails/20.jpg)
Predictive caching
CUSTOMERS
1. View history (time)
CONTENT
1. Assets
2. Size
3. Schedule (window start/end)
4. Category
MODELS
1. Personalised Top-N
2. Popular items
3. Marketing suggestions
• Cost per customer
• Overall cost
CACHE LIST
![Page 21: Optimising digital content delivery](https://reader036.vdocuments.us/reader036/viewer/2022062405/5570d597d8b42afb678b458b/html5/thumbnails/21.jpg)
Cost function
• Cost of delivering best effort (BE)
• Cost of delivering in real time (AF)
$$c_{all} = c_{be} \cdot n_{be} + c_{af} \cdot n_{af}$$
![Page 22: Optimising digital content delivery](https://reader036.vdocuments.us/reader036/viewer/2022062405/5570d597d8b42afb678b458b/html5/thumbnails/22.jpg)
Assumptions of the model
• Two (or more) price levels for the different delivery methods
• Fixed line speed
• Simplified markets
• Network infrastructure is ignored
![Page 23: Optimising digital content delivery](https://reader036.vdocuments.us/reader036/viewer/2022062405/5570d597d8b42afb678b458b/html5/thumbnails/23.jpg)
Preliminary Evaluation
• Hit rate
– Not sensitive to sparsity
– Good for measuring performance
• Precision
– Sensitive to sparsity and relevant items
![Page 24: Optimising digital content delivery](https://reader036.vdocuments.us/reader036/viewer/2022062405/5570d597d8b42afb678b458b/html5/thumbnails/24.jpg)
Results: Hit rate
[Figure: hit rate (0 to 0.3) against the number of retrieved items (1 to 48)]
![Page 25: Optimising digital content delivery](https://reader036.vdocuments.us/reader036/viewer/2022062405/5570d597d8b42afb678b458b/html5/thumbnails/25.jpg)
Results: Average precision
[Figure: average precision (0 to 0.0018) against the number of retrieved items (1 to 48)]
![Page 26: Optimising digital content delivery](https://reader036.vdocuments.us/reader036/viewer/2022062405/5570d597d8b42afb678b458b/html5/thumbnails/26.jpg)
Sparse data
[Figure: average views in January 2010 (0 to 0.3) against profile size (0 to 230)]
![Page 27: Optimising digital content delivery](https://reader036.vdocuments.us/reader036/viewer/2022062405/5570d597d8b42afb678b458b/html5/thumbnails/27.jpg)
Sparse data – how many items to upload
• Non-personalised
– Upload frequency varies from once a day to once a month
• Personalised
– How many items the user watched recently
![Page 28: Optimising digital content delivery](https://reader036.vdocuments.us/reader036/viewer/2022062405/5570d597d8b42afb678b458b/html5/thumbnails/28.jpg)
Predictive caching
• Error I:
– Predict the number of items the user will watch
• Control the maximum number of items cached
• Error II:
– Prediction accuracy
• Only predict for less risky users
![Page 29: Optimising digital content delivery](https://reader036.vdocuments.us/reader036/viewer/2022062405/5570d597d8b42afb678b458b/html5/thumbnails/29.jpg)
Maximum number of items cached
• Example
– User will watch 5 items in the coming month (predicted)
– Deliver in real time (AF): £0.70
– Deliver beforehand (BE): £0.30
$$n_{u,be} = \frac{c_{af}\, v_u}{c_{be}}$$
$$n_{u,be} = \frac{0.70 \times 5}{0.30} \approx 11.66$$
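The cap on cached items follows from a break-even argument: caching n items at the best-effort price should cost no more than streaming the predicted views at the real-time price. A minimal sketch, assuming prices are passed as whole pence so that the division stays exact; the function name and the pence convention are choices made here.

```python
def max_cached_items(predicted_views, af_pence, be_pence):
    """Largest whole n with n * be_pence <= predicted_views * af_pence."""
    return predicted_views * af_pence // be_pence
```

With the figures from the example, `max_cached_items(5, 70, 30)` gives 11, the whole-item version of the 11.66 computed above.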
![Page 30: Optimising digital content delivery](https://reader036.vdocuments.us/reader036/viewer/2022062405/5570d597d8b42afb678b458b/html5/thumbnails/30.jpg)
Performance
– Hits on cached items
– Number/size of items cached
$$l_u = \frac{h_{u,be}}{n_{u,be}}$$
• Overall performance
$$l = \frac{\sum_{i=1}^{N} h_{i,be}}{\sum_{j=1}^{M} n_{j,be}}$$
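A sketch of the two performance figures, assuming the per-user value is hits divided by cached items and the overall value is the ratio of the two totals rather than an average of per-user ratios. The function names and the `(hits, cached)` pair layout are hypothetical.

```python
def user_performance(hits, cached):
    """Hits per cached item for one user (0.0 if nothing was cached)."""
    return hits / cached if cached else 0.0


def overall_performance(per_user):
    """Total hits over total cached items; per_user holds (hits, cached)."""
    total_hits = sum(h for h, _ in per_user)
    total_cached = sum(n for _, n in per_user)
    return total_hits / total_cached if total_cached else 0.0
```

Taking the ratio of sums means users with large caches weigh more in the overall figure, which matches how delivery cost accumulates.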
![Page 31: Optimising digital content delivery](https://reader036.vdocuments.us/reader036/viewer/2022062405/5570d597d8b42afb678b458b/html5/thumbnails/31.jpg)
Performance of the system
• To save on cost, compare
– The performance of the system
– The ratio between the costs of the two delivery methods
$$l \ge \frac{c_{be}}{c_{af}}$$
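Why comparing the system's performance with the price ratio decides whether caching pays off can be sketched as follows. The derivation assumes that every watched item that is not a cache hit is streamed in real time, so that n_af = v − h; this assumption and the symbol c_stream for the streaming-only cost are introduced here for illustration.

```latex
\begin{aligned}
c_{\text{stream}} &= c_{af}\, v \\
c_{all} &= c_{be}\, n_{be} + c_{af}\,(v - h) \\
c_{\text{stream}} - c_{all} &= c_{af}\, h - c_{be}\, n_{be} \\
c_{\text{stream}} - c_{all} > 0 &\iff \frac{h}{n_{be}} > \frac{c_{be}}{c_{af}}
\end{aligned}
```

With l = h / n_be, a strict inequality gives a strict saving, and equality is the break-even point.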
![Page 32: Optimising digital content delivery](https://reader036.vdocuments.us/reader036/viewer/2022062405/5570d597d8b42afb678b458b/html5/thumbnails/32.jpg)
Example
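– Performance
• 3 hits on 5 delivered items, 2 items streamed
• Deliver in real time (AF): £0.70
• Deliver beforehand (BE): £0.30
$$l_u = \frac{h_{u,be}}{n_{u,be}} = \frac{3}{5} = 0.6$$
$$l \ge \frac{c_{be}}{c_{af}} = \frac{0.3}{0.7} \approx 0.42$$
– Cost
• (expected to be less than streaming only)
$$c_{all} = c_{be} \cdot n_{be} + c_{af} \cdot n_{af} = 0.3 \times 5 + 0.7 \times 2 = 2.9$$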
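The example can be checked numerically against the streaming-only alternative. The sketch assumes the user watches five items in total (the three cache hits plus the two streamed items); the variable and function names are illustrative.

```python
def total_cost(n_be, n_af, c_be, c_af):
    """Cost of n_be best-effort deliveries plus n_af real-time deliveries."""
    return c_be * n_be + c_af * n_af


cached, hits, streamed = 5, 3, 2
c_af, c_be = 0.70, 0.30

with_cache = total_cost(cached, streamed, c_be, c_af)        # about 2.90
stream_only = total_cost(0, hits + streamed, c_be, c_af)     # about 3.50
saving = stream_only - with_cache                            # about 0.60
```

The saving is positive even though two of the five cached items were never watched, because the hit ratio of 0.6 is above the price ratio.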
![Page 33: Optimising digital content delivery](https://reader036.vdocuments.us/reader036/viewer/2022062405/5570d597d8b42afb678b458b/html5/thumbnails/33.jpg)
Evaluation II
• Upload ratio
– Number of items cached
– Example (c_af = £0.7, c_be = £0.3): for every watched item we can cache at most 2.3 items
$$\frac{n_{be}}{v} \le \frac{c_{af}}{c_{be}}$$
• Upload hits
– Performance of the model
– Example (c_af = £0.7, c_be = £0.3): for every cached item we need at least 0.42 hits
$$\frac{h_{be}}{n_{be}} \ge \frac{c_{be}}{c_{af}}$$
• If both are satisfied, a cost saving is guaranteed
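Both conditions can be tested together for a given user or for the whole system. A minimal sketch, assuming `cached` and `watched` are positive counts; the function name and return layout are hypothetical.

```python
def caching_checks(cached, hits, watched, c_af, c_be):
    """Return (upload_ratio_ok, upload_hits_ok) for positive counts."""
    upload_ratio_ok = cached / watched <= c_af / c_be
    upload_hits_ok = hits / cached >= c_be / c_af
    return upload_ratio_ok, upload_hits_ok
```

For instance, `caching_checks(5, 3, 5, 0.7, 0.3)` passes both checks, while caching 12 items for 5 watched items with a single hit fails both.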
![Page 34: Optimising digital content delivery](https://reader036.vdocuments.us/reader036/viewer/2022062405/5570d597d8b42afb678b458b/html5/thumbnails/34.jpg)
Results – Combining personalised and non-personalised recommenders
[Figure: upload hits (0 to 0.02) against the personalised vs popular mix (0 to 0.91)]
![Page 35: Optimising digital content delivery](https://reader036.vdocuments.us/reader036/viewer/2022062405/5570d597d8b42afb678b458b/html5/thumbnails/35.jpg)
Unique characteristics of the system
• Recommender algorithm
– Low-risk approach
– No prediction is made if it is unlikely to be right
• Caching strategy
– Only for users who will use the system
– Predict the number of items to be uploaded
![Page 36: Optimising digital content delivery](https://reader036.vdocuments.us/reader036/viewer/2022062405/5570d597d8b42afb678b458b/html5/thumbnails/36.jpg)
Future work
• Test the system on other datasets
• Redefine baseline algorithm
• Availability might influence choice
• Adaptive temporal approach
– Controlling the update of the system
• How much data is flowing in
• How much performance loss the system expects