An overview of Hulu’s metrics platform
![Page 2: An overview of Hulu’s metrics platform](https://reader035.vdocuments.us/reader035/viewer/2022062814/56816772550346895ddc610f/html5/thumbnails/2.jpg)
What we do
• Streaming video service
• > 5.5 million subscribers
• > 20 million unique visitors/month
• > 1 billion ads/month
![Page 3: An overview of Hulu’s metrics platform](https://reader035.vdocuments.us/reader035/viewer/2022062814/56816772550346895ddc610f/html5/thumbnails/3.jpg)
It all begins with beacons
Living room device (Roku, Xbox, etc.)
Mobile device (Android, iPhone, etc.)
Web (hulu.com)
Beacon collection service
![Page 4: An overview of Hulu’s metrics platform](https://reader035.vdocuments.us/reader035/viewer/2022062814/56816772550346895ddc610f/html5/thumbnails/4.jpg)
What’s in a beacon
80 2013-04-01 00:00:00 /v3/playback/start?bitrate=650&cdn=Akamai&channel=Anime&client=Explorer&computerguid=EA8FA1000232B8F6986C3E0BE55E9333&contentid=5003673…
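A minimal sketch of how a beacon line like the one above could be split into fields. The layout (a leading number, a timestamp, then a versioned path with a query string) is inferred from the single truncated sample on the slide; the meaning of the leading `80`, the regular expression, and the names `parse_beacon`, `beacon_type`, and `event` are assumptions for illustration, not Hulu's actual parser.

```python
import re
from urllib.parse import parse_qsl

# Sample modelled on the slide. The real beacon is longer (the slide truncates it).
SAMPLE = (
    "80 2013-04-01 00:00:00 /v3/playback/start"
    "?bitrate=650&cdn=Akamai&channel=Anime&client=Explorer"
    "&computerguid=EA8FA1000232B8F6986C3E0BE55E9333&contentid=5003673"
)

# Assumed layout: <number> <date> <time> /<version>/<type>/<event>?<query>
BEACON_RE = re.compile(
    r"^(?P<prefix>\d+)\s+"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s*"
    r"/(?P<version>[^/]+)/(?P<beacon_type>[^/]+)/(?P<event>[^?]+)"
    r"\?(?P<query>.*)$"
)


def parse_beacon(line):
    """Return a dict of beacon fields, or None if the line does not match."""
    match = BEACON_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Turn the query string into a plain dict of string parameters.
    fields["params"] = dict(parse_qsl(fields.pop("query"), keep_blank_values=True))
    return fields


beacon = parse_beacon(SAMPLE)
print(beacon["beacon_type"], beacon["event"], beacon["params"]["cdn"])
```

Returning `None` for unparseable lines keeps the decision about bad input (drop, count, or quarantine) with the caller.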
![Page 5: An overview of Hulu’s metrics platform](https://reader035.vdocuments.us/reader035/viewer/2022062814/56816772550346895ddc610f/html5/thumbnails/5.jpg)
Reporting platform (RP2)
Find Metrics & Dimensions
Design and execute reports
![Page 6: An overview of Hulu’s metrics platform](https://reader035.vdocuments.us/reader035/viewer/2022062814/56816772550346895ddc610f/html5/thumbnails/6.jpg)
The pipeline
[Diagram components: Devices, Beacon collection service, LogCollector/Flume, HDFS, MapReduce jobs/JobScheduler, Hive, Harpy (continuous aggregation), RDBMS, Reporting (RP2), Monitoring (metstat), Developers, Business]
![Page 7: An overview of Hulu’s metrics platform](https://reader035.vdocuments.us/reader035/viewer/2022062814/56816772550346895ddc610f/html5/thumbnails/7.jpg)
[Diagram: Devices -> Load balancer -> Log Collection machines #1 … #11 -> HDFS]
HDFS: files bucketed by beacon type and partitioned by hour
![Page 8: An overview of Hulu’s metrics platform](https://reader035.vdocuments.us/reader035/viewer/2022062814/56816772550346895ddc610f/html5/thumbnails/8.jpg)
Directory hierarchy on HDFS
/user/hadoop/t2
  201401010000/
    playback/
      201401010100_playback_1.seq
      201401010100_playback_2.seq
      …
    revenue/
  201401010100/
    playback/
    revenue/
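The layout above (one directory per hour, one subdirectory per beacon type) can be sketched as a small path helper. This is an illustration only: the root path and the hour format are read off the slide, while the function names are hypothetical and the file-naming rule inside each directory is deliberately left out because the slide does not spell it out.

```python
from collections import defaultdict
from datetime import datetime

ROOT = "/user/hadoop/t2"  # root directory shown on the slide


def hour_bucket(ts):
    """Truncate a timestamp to its hour, formatted like the directory names."""
    return ts.strftime("%Y%m%d%H") + "00"


def bucket_dir(ts, beacon_type):
    """Directory for beacons of one type received during one hour."""
    return f"{ROOT}/{hour_bucket(ts)}/{beacon_type}"


def group_by_bucket(beacons):
    """Group (timestamp, beacon_type, payload) tuples by target directory."""
    groups = defaultdict(list)
    for ts, beacon_type, payload in beacons:
        groups[bucket_dir(ts, beacon_type)].append(payload)
    return dict(groups)


print(bucket_dir(datetime(2014, 1, 1, 0, 37, 12), "playback"))
```

Partitioning by hour and type means a job that needs, say, one hour of playback beacons only has to read a single directory.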
![Page 9: An overview of Hulu’s metrics platform](https://reader035.vdocuments.us/reader035/viewer/2022062814/56816772550346895ddc610f/html5/thumbnails/9.jpg)
MapReduce - going from beacons to basefacts
computerguid EA8FA1000232B8F6986C3E0BE55E9333
userid 5238518
video_id 289696
content_partner_id 398
distribution_partner_id 602
distro_platform_id 14
is_on_hulu 0
…
hourid 383149
watched 76426
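To make the beacon-to-basefact step concrete, here is a rough in-memory sketch of the map and reduce phases: the mapper keys each beacon by its dimensions, and the reducer sums the `watched` fact per key. The real jobs are Java MapReduce running on the cluster; the dimension subset, the input shape, and the function names here are assumptions.

```python
from collections import defaultdict

# A subset of the dimensions shown on the slide, for illustration.
DIMENSIONS = ("hourid", "computerguid", "userid", "video_id")


def map_beacon(beacon):
    """Map phase: emit (dimension tuple, watched) for one beacon."""
    key = tuple(beacon.get(d) for d in DIMENSIONS)
    yield key, int(beacon["watched"])


def reduce_watched(pairs):
    """Reduce phase: sum the watched values for each dimension tuple."""
    totals = defaultdict(int)
    for key, watched in pairs:
        totals[key] += watched
    return [dict(zip(DIMENSIONS, key), watched=total) for key, total in totals.items()]


def run(beacons):
    """Chain map and reduce over an iterable of beacon dicts."""
    pairs = (pair for beacon in beacons for pair in map_beacon(beacon))
    return reduce_watched(pairs)
```

For example, two beacons that share all four dimensions with `watched` values of 100 and 50 collapse into one basefact row with `watched` equal to 150.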
![Page 10: An overview of Hulu’s metrics platform](https://reader035.vdocuments.us/reader035/viewer/2022062814/56816772550346895ddc610f/html5/thumbnails/10.jpg)
If a program manipulates a large amount of data, it does so in a small number of ways.
- Alan Perlis
![Page 11: An overview of Hulu’s metrics platform](https://reader035.vdocuments.us/reader035/viewer/2022062814/56816772550346895ddc610f/html5/thumbnails/11.jpg)
The BeaconSpec compiler
Definitions of beacons and base-facts
-> BeaconSpec compiler
-> Java MapReduce code that can run on the cluster
![Page 12: An overview of Hulu’s metrics platform](https://reader035.vdocuments.us/reader035/viewer/2022062814/56816772550346895ddc610f/html5/thumbnails/12.jpg)
What does our language look like?
basefact playback_watched_uniques from playback/(position|end) {
  dimension harpyhour.id as hourid;
  dimension computerguid as computerguid;
  dimension userid as userid;
  required dimension video.id as video_id;
  required dimension contentPartner.id as content_partner_id;
  …
  dimension siteSessionId.chosen as site_session_id;
  dimension facebook.isfacebookconnected as is_facebook_connected;
  fact sum(watched.out) as watched;
}
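One way to picture what a compiler does with such a definition: treat the spec as data and turn it into a mapper function. The sketch below is not the BeaconSpec compiler (which emits Java MapReduce code). The dict layout, the reading of `required` as "drop the row when the value is missing", and the treatment of dotted names such as `video.id` as plain lookup keys are all assumptions made for illustration.

```python
import re

# Hypothetical data form of part of the basefact definition above.
SPEC = {
    "name": "playback_watched_uniques",
    "source": r"playback/(position|end)",
    # output column -> (input field, required?)
    "dimensions": {
        "userid": ("userid", False),
        "video_id": ("video.id", True),
        "content_partner_id": ("contentPartner.id", True),
    },
    # output column, input field (summed later in the reduce phase)
    "fact": ("watched", "watched.out"),
}


def compile_mapper(spec):
    """Build a mapper function from a declarative basefact spec."""
    source = re.compile(spec["source"])
    fact_name, fact_field = spec["fact"]

    def mapper(beacon_type, fields):
        # Only beacons whose type matches the 'from' pattern contribute.
        if not source.fullmatch(beacon_type):
            return None
        row = {}
        for out_name, (field, required) in spec["dimensions"].items():
            value = fields.get(field)
            if value is None and required:
                return None  # assumed semantics of 'required'
            row[out_name] = value
        row[fact_name] = int(fields.get(fact_field, 0))
        return row

    return mapper
```

Calling `compile_mapper(SPEC)("playback/end", fields)` yields one output row, while a `playback/start` beacon or a beacon without `video.id` yields `None`.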
FAQ: Why didn’t we just use Pig?
![Page 13: An overview of Hulu’s metrics platform](https://reader035.vdocuments.us/reader035/viewer/2022062814/56816772550346895ddc610f/html5/thumbnails/13.jpg)
The superior [program] cultivates itself so as to give rest to [programmers]
- Confucius, the Way of the Superior Man
![Page 14: An overview of Hulu’s metrics platform](https://reader035.vdocuments.us/reader035/viewer/2022062814/56816772550346895ddc610f/html5/thumbnails/14.jpg)
Scheduling jobs
[Diagram components: Outside world, JobScheduler interface, Logmanager databases, JobScheduler, and three JobMonitors, each paired with a MapReduce job]
JobScheduler: checks databases for jobs that are ready to run and whether their dependencies are met
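The readiness check described on this slide can be sketched as follows. An in-memory dict stands in for the Logmanager databases, and the job names, the `depends_on` field, and both function names are hypothetical.

```python
# Hypothetical job table: job name -> names of the jobs it depends on.
JOBS = {
    "playback_basefacts": {"depends_on": []},
    "revenue_basefacts": {"depends_on": []},
    "daily_rollup": {"depends_on": ["playback_basefacts", "revenue_basefacts"]},
}


def ready_jobs(jobs, completed):
    """Jobs that have not run yet and whose dependencies have all completed."""
    return sorted(
        name
        for name, job in jobs.items()
        if name not in completed
        and all(dep in completed for dep in job["depends_on"])
    )


def run_all(jobs, launch):
    """Launch jobs in dependency order and return the launch order."""
    completed = set()
    order = []
    while len(completed) < len(jobs):
        batch = ready_jobs(jobs, completed)
        if not batch:
            raise RuntimeError("no runnable jobs left: unmet or circular dependencies")
        for name in batch:
            launch(name)  # in the real system this would start a MapReduce job
            completed.add(name)
            order.append(name)
    return order


print(run_all(JOBS, launch=lambda name: None))
```

The sketch runs each batch synchronously. In the system on the slide, each launched job gets a JobMonitor, so completion would be reported asynchronously instead.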
![Page 15: An overview of Hulu’s metrics platform](https://reader035.vdocuments.us/reader035/viewer/2022062814/56816772550346895ddc610f/html5/thumbnails/15.jpg)
JobScheduler technology
• The actor model of concurrency
– Communication through async messaging
– Completely encapsulated state
![Page 16: An overview of Hulu’s metrics platform](https://reader035.vdocuments.us/reader035/viewer/2022062814/56816772550346895ddc610f/html5/thumbnails/16.jpg)
Actor creation
Message passing
Central idea: Treat local objects as if they are distributed, as opposed to treating distributed objects as if they are local
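A minimal actor sketch using only Python's standard library, to illustrate actor creation, asynchronous message passing, and encapsulated state. The slides do not say which actor library JobScheduler uses, so the `Actor` and `Counter` classes and the message format are assumptions, not the project's code.

```python
import queue
import threading


class Actor:
    """Owns a mailbox and a thread; the outside world can only send messages."""

    _STOP = object()

    def __init__(self):
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message):
        """Asynchronous send: enqueue the message and return immediately."""
        self._mailbox.put(message)

    def stop(self):
        """Ask the actor to finish queued messages, then wait for it to exit."""
        self._mailbox.put(self._STOP)
        self._thread.join()

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message is self._STOP:
                break
            self.receive(message)

    def receive(self, message):
        raise NotImplementedError


class Counter(Actor):
    def __init__(self):
        self._count = 0  # private state, touched only by the actor's own thread
        super().__init__()

    def receive(self, message):
        kind, payload = message
        if kind == "add":
            self._count += payload
        elif kind == "get":
            payload.put(self._count)  # reply through a queue supplied by the sender


counter = Counter()
for _ in range(5):
    counter.send(("add", 2))
reply = queue.Queue()
counter.send(("get", reply))
print(reply.get(timeout=5))
counter.stop()
```

Even though the counter lives in the same process, callers talk to it only through messages, which is the "treat local objects as if they are distributed" idea from the slide.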
![Page 17: An overview of Hulu’s metrics platform](https://reader035.vdocuments.us/reader035/viewer/2022062814/56816772550346895ddc610f/html5/thumbnails/17.jpg)
Fault-tolerance – let it crash!
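The "let it crash" idea is that a worker does not try to recover from unexpected errors itself; a supervisor discards it and starts a fresh one. Below is a simplified, synchronous sketch of that pattern. The `Worker` class, the `supervise` function, and the restart limit are hypothetical and not taken from JobScheduler.

```python
class Worker:
    """A worker with private state and no defensive error handling."""

    def __init__(self):
        self.processed = []

    def handle(self, message):
        if message == "bad":
            raise ValueError("cannot handle message")
        self.processed.append(message)


def supervise(make_worker, messages, max_restarts=3):
    """Feed messages to a worker, replacing it with a fresh one when it crashes."""
    worker = make_worker()
    restarts = 0
    for message in messages:
        try:
            worker.handle(message)
        except Exception:
            restarts += 1
            if restarts > max_restarts:
                raise  # give up and escalate the failure
            worker = make_worker()  # fresh state; the failing message is dropped
    return worker, restarts


worker, restarts = supervise(Worker, ["a", "bad", "b", "c"])
print(restarts, worker.processed)
```

Restarting with clean state trades the crashed worker's in-memory progress for a known-good starting point, so anything that must survive a crash has to live outside the worker.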
![Page 18: An overview of Hulu’s metrics platform](https://reader035.vdocuments.us/reader035/viewer/2022062814/56816772550346895ddc610f/html5/thumbnails/18.jpg)
Harpy – continuous aggregations
[Diagram components: HDFS, NFS, Metadata, Output DBs, Harpy, DataSync, Publishing, HoldingDB, HoldingSweeper, Agg Scheduler, Queue Processor, Hive]
![Page 19: An overview of Hulu’s metrics platform](https://reader035.vdocuments.us/reader035/viewer/2022062814/56816772550346895ddc610f/html5/thumbnails/19.jpg)
RP2
• Reporting Portal for pulling Metrics + Dimensions
• Quick ‘Demo’
![Page 20: An overview of Hulu’s metrics platform](https://reader035.vdocuments.us/reader035/viewer/2022062814/56816772550346895ddc610f/html5/thumbnails/20.jpg)
Let’s reexamine the pipeline:
[Diagram components: Devices, Beacon collection service, LogCollector/Flume, HDFS, MapReduce jobs/JobScheduler, Hive, Harpy (continuous aggregation), RDBMS, Reporting (RP2), Monitoring (metstat), Developers, Business]
![Page 21: An overview of Hulu’s metrics platform](https://reader035.vdocuments.us/reader035/viewer/2022062814/56816772550346895ddc610f/html5/thumbnails/21.jpg)
![Page 22: An overview of Hulu’s metrics platform](https://reader035.vdocuments.us/reader035/viewer/2022062814/56816772550346895ddc610f/html5/thumbnails/22.jpg)
Metstat
• Python Django app
• Tasks on Celery + RabbitMQ
• jQuery
• Tracks status, status changes and statistics
• Gets data directly from various sources (databases, HDFS)
![Page 23: An overview of Hulu’s metrics platform](https://reader035.vdocuments.us/reader035/viewer/2022062814/56816772550346895ddc610f/html5/thumbnails/23.jpg)
FAQ: Why didn’t we just use Pig?
• Dataflow language – runs on Hadoop
• Pig philosophy (taken from the Apache website)
– Pigs eat anything
– Pigs live anywhere
– Pigs are domestic animals
– Pigs fly
Beaconspec
![Page 24: An overview of Hulu’s metrics platform](https://reader035.vdocuments.us/reader035/viewer/2022062814/56816772550346895ddc610f/html5/thumbnails/24.jpg)
Beware of the Turing tar-pit where everything is possible but nothing of interest is easy - Alan Perlis
REGISTER ./tutorial.jar;
raw = LOAD 'excite.log' USING PigStorage('\t') AS (user, time, query);
clean1 = FILTER raw BY org.apache.pig.tutorial.NonURLDetector(query);
clean2 = FOREACH clean1 GENERATE user, time, org.apache.pig.tutorial.ToLower(query) as query;
Beaconspec
![Page 25: An overview of Hulu’s metrics platform](https://reader035.vdocuments.us/reader035/viewer/2022062814/56816772550346895ddc610f/html5/thumbnails/25.jpg)
FAQ: What is open sourced?
• Slickint – database interface generation for Scala
– github.com/zenbowman/slickint
• Local filesystem caching for Hadoop
– github.com/ZenBowman/luna