Real-time Stream Processing Architecture for Comcast IP Video Strata Conference + Hadoop World 2013 Chris Lintz Gabriel Commeau


Page 1

Real-time Stream Processing Architecture for Comcast IP Video

Strata Conference + Hadoop World 2013

Chris Lintz, Gabriel Commeau

Page 2

Agenda

o Comcast VIPER Overview
o Architecture Overview
o Q & A

Page 3

Comcast Video IP Engineering and Research (VIPER)

[Diagram labels: Packaging, Origination, Storage, Transcoding; iOS, Android, Xbox Live, Samsung; Storm]

Page 4

Why Do We Focus on Real-time?

• Proactively diagnose issues

• Form real-time intelligence

• Help deliver best possible video experience

[Chart labels: Prime Time, Viewership]

Page 5

Video Player Analytics Protocol

• Live and On Demand
• JSON event objects
• Key metrics
• Bitrate
• Frame rate
• Fragments
• Errors

We collect and use all data in accordance with best consumer privacy practices and applicable laws
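The slide does not show the wire format, so the following is only a minimal sketch of what a JSON event object carrying the listed metrics might look like and how it could be parsed. Every field name here (`session_id`, `stream_type`, `timestamp_ms`, `bitrate_kbps`, `frame_rate`, `fragments`, `error`) is an assumption for illustration, not the actual protocol.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlayerEvent:
    """Hypothetical shape of one analytics event; field names are assumed."""
    session_id: str
    stream_type: str              # assumed values: "live" or "vod"
    timestamp_ms: int
    bitrate_kbps: Optional[int] = None
    frame_rate: Optional[float] = None
    fragments: Optional[int] = None
    error: Optional[str] = None


def parse_event(raw: str) -> PlayerEvent:
    """Parse one JSON event object; metric fields are optional."""
    obj = json.loads(raw)
    return PlayerEvent(
        session_id=obj["session_id"],
        stream_type=obj["stream_type"],
        timestamp_ms=obj["timestamp_ms"],
        bitrate_kbps=obj.get("bitrate_kbps"),
        frame_rate=obj.get("frame_rate"),
        fragments=obj.get("fragments"),
        error=obj.get("error"),
    )


sample = json.dumps({
    "session_id": "s-1",
    "stream_type": "live",
    "timestamp_ms": 1382900000000,
    "bitrate_kbps": 3200,
    "frame_rate": 29.97,
    "fragments": 4,
})
event = parse_event(sample)
```

Keeping the metric fields optional lets one event type cover both periodic metric reports and error reports.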

Page 6

Player Sessions: Key to Understanding Video Experience
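As a rough illustration of why sessions matter, the sketch below folds individual events into one summary per player session. The event keys (`session_id`, `bitrate_kbps`, `error`) and the `SessionSummary` type are hypothetical; the deck does not describe how sessions are actually represented.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class SessionSummary:
    """Running totals for one player session (hypothetical structure)."""
    events: int = 0
    errors: int = 0
    bitrate_total: int = 0
    bitrate_samples: int = 0

    @property
    def avg_bitrate_kbps(self) -> Optional[float]:
        # No bitrate reports yet means there is no meaningful average.
        if self.bitrate_samples == 0:
            return None
        return self.bitrate_total / self.bitrate_samples


def summarize(events: Iterable[dict]) -> Dict[str, SessionSummary]:
    """Group events by session_id and accumulate per-session totals."""
    sessions: Dict[str, SessionSummary] = {}
    for event in events:
        summary = sessions.setdefault(event["session_id"], SessionSummary())
        summary.events += 1
        if event.get("error"):
            summary.errors += 1
        if event.get("bitrate_kbps") is not None:
            summary.bitrate_total += event["bitrate_kbps"]
            summary.bitrate_samples += 1
    return sessions
```

A per-session view like this makes it possible to ask questions about a single viewer's experience (for example, error count or average bitrate) rather than about isolated events.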

Page 7

High-Level Architecture and Data Flow

Page 8

Flume: Data Collection Tier

o Collect, aggregate and move large amounts of data
o Distributed, scalable, reliable, customizable
o Multi-tier architecture

Page 9

Storm: Stream Processing Tier

Page 10

Player Sessions in Real-time

o Sessions in Flume?
• Technical issues: consistent hash and exactly-once semantics
• Design goals
• Separation of concerns

o Session write-through rate?
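To make the "consistent hash" issue concrete: building a session requires every event for that session to reach the same worker, even as workers come and go. The sketch below is a generic consistent-hash ring, not the presenters' implementation; the class name `HashRing`, the virtual-node count, and the use of MD5 as the hash are all assumptions. It does not address exactly-once semantics.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


class HashRing:
    """Generic consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes: Iterable[str], vnodes: int = 64) -> None:
        # Each node is placed on the ring at `vnodes` pseudo-random points.
        ring: List[Tuple[int, str]] = []
        for node in nodes:
            for i in range(vnodes):
                ring.append((self._hash(f"{node}#{i}"), node))
        ring.sort()
        self._ring = ring
        self._points = [point for point, _ in ring]

    @staticmethod
    def _hash(key: str) -> int:
        # First 8 bytes of MD5 give a stable 64-bit position on the ring.
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first point past the key, wrapping around.
        idx = bisect.bisect_right(self._points, self._hash(key)) % len(self._points)
        return self._ring[idx][1]


ring = HashRing(["worker-a", "worker-b", "worker-c"])
owner = ring.node_for("session-123")  # same session id always maps to the same worker
```

The property that matters for sessions is stability: when a worker is removed, only the sessions it owned are reassigned, and all others stay where they were.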

Page 11

Flume Edge Tier: Video Player Analytics End Point

o Analytics events over HTTPS
o HTTP Source
o Re-batch with inner sink and source
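The deck does not explain how the inner sink and source re-batch events. One plausible reading is that many small HTTP posts are regrouped into larger, fixed-size batches before moving downstream. The sketch below illustrates only that regrouping idea in plain Python; the function name `rebatch` and the batch size are hypothetical and this is not Flume code.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def rebatch(incoming: Iterable[List[T]], batch_size: int) -> Iterator[List[T]]:
    """Regroup small incoming batches into batches of `batch_size` events."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    buffer: List[T] = []
    for request_events in incoming:      # one list per incoming HTTP post (assumed)
        for event in request_events:
            buffer.append(event)
            if len(buffer) == batch_size:
                yield buffer
                buffer = []
    if buffer:                           # flush the final partial batch
        yield buffer


posts = [[{"id": 1}], [{"id": 2}, {"id": 3}], [{"id": 4}], [{"id": 5}]]
batches = list(rebatch(posts, batch_size=2))
```

A real edge tier would also flush on a timer so that a partial batch does not wait indefinitely; that is omitted here for brevity.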

Page 12

Flume Mid Tier: Processing and Routing Data

o Video Player Event processing
• Geo-location, asset metadata, validation, to-storm

o Replication channel processor:
• HDFS sink
• Storm sink
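The mid tier described above can be pictured as a chain of processing steps followed by a fan-out that copies each event to every sink. The sketch below models that flow in plain Python under stated assumptions: the lookup tables `GEO_BY_IP` and `ASSET_TITLES`, the event keys, and the in-memory lists standing in for the HDFS and Storm sinks are all hypothetical, and no Flume API is used.

```python
from typing import Callable, Dict, List, Optional

Event = Dict[str, str]
Step = Callable[[Event], Optional[Event]]

# Hypothetical lookup data standing in for real geo and asset services.
GEO_BY_IP = {"203.0.113.7": "US-PA"}
ASSET_TITLES = {"asset-1": "Example Show"}


def add_geo(event: Event) -> Optional[Event]:
    return {**event, "region": GEO_BY_IP.get(event.get("client_ip", ""), "unknown")}


def add_asset_metadata(event: Event) -> Optional[Event]:
    return {**event, "asset_title": ASSET_TITLES.get(event.get("asset_id", ""), "unknown")}


def validate(event: Event) -> Optional[Event]:
    # Drop events that cannot be tied to a session.
    return event if "session_id" in event else None


def process(event: Event, steps: List[Step]) -> Optional[Event]:
    """Run the steps in order; a step returning None drops the event."""
    current: Optional[Event] = event
    for step in steps:
        current = step(current)
        if current is None:
            return None
    return current


def replicate(event: Event, sinks: List[List[Event]]) -> None:
    """Give every sink its own copy of the event."""
    for sink in sinks:
        sink.append(dict(event))


hdfs_sink: List[Event] = []    # stand-in for the HDFS sink
storm_sink: List[Event] = []   # stand-in for the Storm sink
steps: List[Step] = [add_geo, add_asset_metadata, validate]

for raw in [
    {"session_id": "s-1", "client_ip": "203.0.113.7", "asset_id": "asset-1"},
    {"client_ip": "203.0.113.7"},  # invalid: no session_id
]:
    enriched = process(raw, steps)
    if enriched is not None:
        replicate(enriched, [hdfs_sink, storm_sink])
```

Copying the event per sink keeps the batch path (HDFS) and the real-time path (Storm) independent of each other.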

Page 13

Bridging Flume to Storm: Flume2Storm Connector

o Service discovery
o Distributed, scalable and reliable
o Low latency

Page 14

Simplified Video Player Storm Topology

Page 15

Requirements for Read/Writes from Storm Bolts

o Functionality beyond key/value stores
o Real-time and historic window queries
o Speed of in-memory writes and durability of disk
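A window query is an example of something a plain key/value store does not handle well. The sketch below uses Python's built-in `sqlite3` purely as a stand-in SQL engine to show the same query serving both a recent ("real-time") window and an older ("historic") one. The table `player_metrics` and its columns are hypothetical; nothing here reflects the actual schema.

```python
import sqlite3
from typing import Optional


def make_store() -> sqlite3.Connection:
    """Create an in-memory table of per-event bitrate samples (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE player_metrics ("
        "session_id TEXT, ts INTEGER, bitrate_kbps INTEGER)"
    )
    return conn


def avg_bitrate(conn: sqlite3.Connection, start_ts: int, end_ts: int) -> Optional[float]:
    """Average bitrate over the half-open time window [start_ts, end_ts)."""
    row = conn.execute(
        "SELECT AVG(bitrate_kbps) FROM player_metrics WHERE ts >= ? AND ts < ?",
        (start_ts, end_ts),
    ).fetchone()
    return row[0]  # None when the window contains no rows


conn = make_store()
conn.executemany(
    "INSERT INTO player_metrics VALUES (?, ?, ?)",
    [("s-1", 100, 2000), ("s-2", 110, 4000), ("s-1", 950, 3000), ("s-2", 990, 5000)],
)
now = 1000
recent = avg_bitrate(conn, now - 60, now)   # "real-time" window: last 60 seconds
historic = avg_bitrate(conn, 0, 200)        # "historic" window: an earlier interval
```

The only difference between the two queries is the time bounds, which is the kind of flexibility the requirement list asks for.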

Page 16

Utilizing MemSQL for Persistence

• Distributed in-memory SQL database

• ACID, highly available, fault tolerant

• Aggregators route queries to leaves

• Leaves are auto-sharded
• Solves our intense read/writes
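The aggregator and leaf roles can be illustrated with a toy scatter-gather model: writes are routed to one leaf by hashing a shard key, and reads fan out to all leaves and combine the partial results. This is a conceptual sketch only; the `Leaf` and `Aggregator` classes, the CRC32 hash, and `session_id` as the shard key are assumptions and do not describe MemSQL's internals.

```python
import zlib
from typing import Callable, Dict, List

Row = Dict[str, object]


class Leaf:
    """Holds one shard of the rows."""

    def __init__(self) -> None:
        self.rows: List[Row] = []

    def insert(self, row: Row) -> None:
        self.rows.append(row)

    def count_where(self, predicate: Callable[[Row], bool]) -> int:
        return sum(1 for row in self.rows if predicate(row))


class Aggregator:
    """Routes writes to a single leaf and fans reads out to every leaf."""

    def __init__(self, leaves: List[Leaf]) -> None:
        self.leaves = leaves

    def _leaf_for(self, shard_key: str) -> Leaf:
        # A stable hash of the shard key selects the owning leaf.
        return self.leaves[zlib.crc32(shard_key.encode()) % len(self.leaves)]

    def insert(self, row: Row) -> None:
        self._leaf_for(str(row["session_id"])).insert(row)

    def count_where(self, predicate: Callable[[Row], bool]) -> int:
        # Scatter the query to all leaves, then sum the partial counts.
        return sum(leaf.count_where(predicate) for leaf in self.leaves)


leaves = [Leaf() for _ in range(4)]
aggregator = Aggregator(leaves)
for i in range(100):
    aggregator.insert({"session_id": f"s-{i % 10}", "error": i % 5 == 0})
error_count = aggregator.count_where(lambda row: bool(row["error"]))
```

Because all rows for a session land on one leaf, per-session work stays local, while cluster-wide questions are answered by combining leaf results at the aggregator.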

Page 17

Isolated Analysts and Ingest Aggregators

Page 18

Achievements In Utilizing MemSQL

• Complex queries in milliseconds

• Fault-tolerant Storm bolt state

• Joins now available outside of Storm bolts
• Foreign key shards
• Complex data streams
• Dynamic alters without locks or downtime
• JSON type

Page 19

Wrapping Up

o Real-time at Comcast scale
• Millions of video players
• Horizontal scale everywhere
• Aggregated metrics across US and complex analysis
• Real-time API

o Builds foundation
• Advanced real-time analytics
• Better platform for innovation
– Alerts on complex objects
– Supplemental real-time data back to clients
– Popularity-based CDN