Post on 24-Dec-2015
I Tube, You Tube, Everybody Tubes…
Pablo Rodriguez, Telefonica Research, Barcelona
YouTube Video Example
3
“Content is NOT king”
Content Explosion
[Figure: number of TV channels over time (1950, 1980, 1995, today), rising from 3 (broadcast) to 40 (analog cable), 100 (digital cable), and infinite (Internet)]
4
How to search content?
5
Infinite Choice = Overwhelming Confusion
Filters are required to connect users with content that appeals to their interests
Aggregation and Recommendation
6
Video and Social Networks
Trends in video services
• Users generate new videos
• Users help each other find videos
Need to understand users and content
• Video characteristics in YouTube
• User behavior and potential for recommendations
7
Particularities of
“bite-size bits for high-speed munching” [Wired mag. Mar 2007]
Plethora of YouTube clones
UGC is very different
How different?
8
UGC vs. Non-UGC
Massive production scale: 15 days in YouTube to produce 120 years' worth of movies in IMDb!
Extreme publishers: 1000 uploads over a few years vs. 100 movies over 50 years
Short video length: 30 sec–5 min vs. 100-min movies in LoveFilm
The rest: consumption patterns
9
User Participation/Finding Videos
Despite Web 2.0 features, user participation remains low
• Only 0.16%-0.22% of viewers rate videos or comment
47% of videos have pointers from external sites
• But requests from such sites account for less than 3% of the total views
10
Goals and Data
Goals
• Potential for recommendation systems?
• Popularity evolution
• Content duplication
Data
• Crawled YouTube and other UGC systems
• Metadata: video ID, length, views
• 1.6M Entertainment videos, 250K Science videos
11
Part 1: Popularity Distribution
• Static popularity characteristics
• Underlying mechanism
12
Pareto Principle
[Figure: fraction of aggregate views vs. normalized video ranking]
Other online VoD systems show smaller skew!
The 10% most popular videos account for 80% of total views
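The skew quoted above can be checked on any list of per-video view counts. The following is a minimal sketch, not the authors' analysis code: the function name `top_share` and the sample view counts are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
def top_share(views, top_fraction=0.10):
    """Return the fraction of all views captured by the most popular videos.

    views: list of per-video view counts (hypothetical input format).
    top_fraction: share of videos, by rank, to treat as "popular".
    """
    if not views:
        raise ValueError("views must be non-empty")
    ranked = sorted(views, reverse=True)
    # Always keep at least one video in the "popular" group.
    k = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:k]) / sum(ranked)


# Made-up view counts for ten videos, not taken from the crawl.
sample_views = [800, 60, 40, 30, 20, 15, 15, 10, 5, 5]
print(top_share(sample_views))
```

Sweeping `top_fraction` from 0 to 1 would trace the same kind of curve as the figure on this slide.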
13
Dominant Power-Law Behavior
Richer-get-richer principle: if a video has K views, then users will watch the video at a rate proportional to K
Other power-law examples:
- word frequency
- citations of papers
- scale of earthquakes
- web hits
[Figure: frequency (log) vs. city population (log), following y = x^a]
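A small simulation can illustrate how the richer-get-richer rule produces a heavy-tailed ranking. This is a sketch under assumptions that are not in the slides: it uses a Simon/Yule-style process in which each step either uploads a new video (with the hypothetical probability `new_video_prob`) or sends one view to an existing video chosen in proportion to its current view count.

```python
import random


def simulate_rich_get_richer(num_steps, new_video_prob=0.1, seed=0):
    """Simulate view counts under a richer-get-richer rule.

    Returns the per-video view counts sorted from most to least viewed.
    new_video_prob is an assumed parameter, not a value from the talk.
    """
    rng = random.Random(seed)
    views = [1]    # views[i] is the view count of video i
    tickets = [0]  # one entry per view, so uniform sampling is proportional to views
    for _ in range(num_steps):
        if rng.random() < new_video_prob:
            # A new video arrives with a single view.
            views.append(1)
            tickets.append(len(views) - 1)
        else:
            # An existing video is watched with probability proportional to its views.
            video = rng.choice(tickets)
            views[video] += 1
            tickets.append(video)
    return sorted(views, reverse=True)


ranked = simulate_rich_get_richer(20_000)
print(ranked[:5], ranked[-5:])
```

Plotting `ranked` against rank on log-log axes should show a roughly straight line over much of the range, which is the signature the slide refers to.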
14
UGC Video Distribution
Straight-line waist, truncated at both ends
15
Focusing on Popular Videos
Why do popular videos deviate from the power law?
Fetch-at-most-once [SOSP2003]
• Users fetch immutable objects only once (cf. visiting popular web sites many times)
16
Why the Unpopular Tail Falls Off
Natural shape is curved
Sampling bias or pre-filters
• Publishers tend to upload interesting videos
Information filtering or post-filters
• Search results or suggestions favor popular items
17
Impact of Post-Filters
Videos exposed longer to the filtering effect appear more truncated
[Figure: x-axis is video rank]
18
Is it Naturally Curved?
[Figure: Science videos, fitted with Zipf, log-normal, exponential, and Zipf + exponential cutoff curves]
Matlab curve fitting for Science
19
Is it Naturally Curved?
[Figure: Science videos, fitted with Zipf, log-normal, exponential, and Zipf + exponential cutoff curves]
Zipf is scale-free, while the exponential has a characteristic scale: the underlying mechanism is Zipf, and the truncation is due to bottlenecks
Matlab curve fitting for Science
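The scale-free property can be made concrete with a doubling test: for a pure Zipf curve, the ratio f(2r)/f(r) is the same at every rank r, whereas a curve with an exponential cutoff decays faster once r approaches the cutoff scale. The sketch below is illustrative only; the exponent `a` and the `cutoff` value are assumed numbers, not the parameters fitted in the talk.

```python
import math


def zipf(rank, a=1.0):
    """Pure power law: views proportional to rank^(-a)."""
    return rank ** (-a)


def zipf_with_cutoff(rank, a=1.0, cutoff=1000.0):
    """Power law multiplied by an exponential cutoff at the given scale."""
    return rank ** (-a) * math.exp(-rank / cutoff)


def doubling_ratio(f, rank):
    """How much f shrinks when the rank doubles."""
    return f(2 * rank) / f(rank)


for rank in (10, 100, 1000):
    print(rank, doubling_ratio(zipf, rank), doubling_ratio(zipf_with_cutoff, rank))
```

The Zipf column stays constant across ranks, while the cutoff column shrinks as the rank grows, which is why the truncated tail points to a bottleneck with its own scale rather than to the underlying mechanism.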
20
Implication of Our Findings
“Latent demand for products that is suppressed by bottlenecks in the system” [Chris Anderson, The Long Tail]
[Figure: views vs. rankings for Entertainment videos]
40% additional views! How?
• Personalized recommendation
• Enriched metadata
• Abundant videos
21
Part 2: Popularity Evolution
• Relationship between popularity and age
22
Popularity Evolution
So far, we have focused on static popularity. Now we focus on popularity dynamics.
How are requests on any given day distributed across video age?
6-day daily trace of Science videos
• Step 1: Group videos requested at least once by age
• Step 2: Count request volume per age group
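The two steps above can be sketched as a single pass over one day's request log. This is a minimal illustration, not the authors' tooling: the input format (a list of requested video IDs plus a map of upload dates), the function name, and the sample data are all hypothetical.

```python
from collections import Counter
from datetime import date


def request_volume_by_age(requests, upload_dates, trace_day):
    """Count requests per video age (in days) for a single trace day.

    requests: iterable of video IDs, one entry per request (assumed format).
    upload_dates: dict mapping video ID to its upload date.
    Only videos requested at least once contribute, as in Step 1.
    """
    volume = Counter()
    for video_id in requests:
        age_days = (trace_day - upload_dates[video_id]).days
        volume[age_days] += 1  # Step 2: accumulate request volume per age group
    return dict(volume)


# Made-up sample data for illustration only.
sample_uploads = {"a": date(2007, 3, 1), "b": date(2007, 3, 9), "c": date(2007, 1, 1)}
sample_requests = ["a", "b", "a", "c"]
sample_volume = request_volume_by_age(sample_requests, sample_uploads, date(2007, 3, 10))
print(sample_volume)
```

Ages could be bucketed more coarsely (for example by week or month) before counting, which is a choice the slides do not specify.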
23
Request Volume Across Age
User preference is relatively insensitive to age
--> 80% of requests are for videos older than a month
The probability of a video being watched is 43%, 18%, 17%, and 14% for the first 24 hours, 6 days, 3 weeks, and 1 month, respectively
24
Part 4: Content Duplication
• Level of duplication
• Birth of duplicates
25
Content Duplication
Alias: an identical or similar copy of the same content
Aliases dilute the popularity of a single event
• Views are distributed across multiple copies
• Difficulty for recommendation and ranking systems
Test with 51 volunteers
• Find aliases using keyword search
• Identified 1,224 aliases for 184 original videos
26
The Level of Popularity Dilution
Popularity is diluted by up to a few orders of magnitude
Aliases often got more requests than the original (e.g. one alias got >1000 times more requests)
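One way to quantify dilution for a single event is to compare the original's views with the views summed over all of its copies. The sketch below is an assumption about how such a measure could be defined, not the metric used in the study; the function name, the returned fields, and the sample numbers are hypothetical.

```python
def dilution(original_views, alias_views):
    """Summarize how views for one event are split between original and aliases.

    original_views: view count of the original video (must be positive).
    alias_views: list of view counts, one per alias.
    """
    if original_views <= 0:
        raise ValueError("original_views must be positive")
    total = original_views + sum(alias_views)
    return {
        # Views the event would have if all copies were merged.
        "total_views": total,
        # Share of the event's views that the original actually received.
        "original_share": original_views / total,
        # How many times more views the most popular alias got than the original.
        "max_alias_ratio": max(alias_views, default=0) / original_views,
    }


# Made-up numbers for illustration only.
sample = dilution(1000, [500, 2500, 1000])
print(sample)
```

A ranking or recommendation system that merged aliases would rank the event by `total_views` rather than by the views of any single copy.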
27
How Late Do Aliases Appear?
Significant aliases appear within one week
Within the first day of posting the original video, sometimes you get more than 80 aliases
28
Conclusions
UGC is a new form of video social interaction
User interaction remains low
Lots of potential for social recommendations
29
Questions?
Dataset available at http://an.kaist.ac.kr/traces/IMC2007.html