![Page 1: Nearline systems to improve Netflix recommendations](https://reader034.vdocuments.us/reader034/viewer/2022042508/55aadb061a28abc9598b4717/html5/thumbnails/1.jpg)
Near line systems to improve Netflix recommendations
Gopal Krishnan
Feb 2015
![Page 2: Nearline systems to improve Netflix recommendations](https://reader034.vdocuments.us/reader034/viewer/2022042508/55aadb061a28abc9598b4717/html5/thumbnails/2.jpg)
About me
Gopal Krishnan
Director, Consumer Science Engineering
Netflix, Inc.
Driving innovation by A/B testing the member experience.
Twitter: @sgkrishnan
LinkedIn: https://www.linkedin.com/pub/gopal-krishnan/0/7a7/905
![Page 3: Nearline systems to improve Netflix recommendations](https://reader034.vdocuments.us/reader034/viewer/2022042508/55aadb061a28abc9598b4717/html5/thumbnails/3.jpg)
Netflix: global streaming video service for TV and movies
![Page 4: Nearline systems to improve Netflix recommendations](https://reader034.vdocuments.us/reader034/viewer/2022042508/55aadb061a28abc9598b4717/html5/thumbnails/4.jpg)
Netflix is available on 1000+ devices
![Page 5: Nearline systems to improve Netflix recommendations](https://reader034.vdocuments.us/reader034/viewer/2022042508/55aadb061a28abc9598b4717/html5/thumbnails/5.jpg)
More than 57M members globally
• In more than 50 countries
• Planning to launch in all (200+) countries in 2 years.
![Page 6: Nearline systems to improve Netflix recommendations](https://reader034.vdocuments.us/reader034/viewer/2022042508/55aadb061a28abc9598b4717/html5/thumbnails/6.jpg)
Netflix consumes 34% of peak downstream bandwidth in North America
![Page 7: Nearline systems to improve Netflix recommendations](https://reader034.vdocuments.us/reader034/viewer/2022042508/55aadb061a28abc9598b4717/html5/thumbnails/7.jpg)
Netflix consumes 6% of peak upstream bandwidth in North America
![Page 8: Nearline systems to improve Netflix recommendations](https://reader034.vdocuments.us/reader034/viewer/2022042508/55aadb061a28abc9598b4717/html5/thumbnails/8.jpg)
What does my team do?
• Increase the rate of innovation through A/B testing to improve the member experience
• Infrastructure for algorithmic support
– Feature value store to help model training
– Services to store and serve explicit data sources
– Services to collect, process, validate, and serve implicit data sources
– Caching services
• Data improves our understanding of end-to-end user behavior
![Page 9: Nearline systems to improve Netflix recommendations](https://reader034.vdocuments.us/reader034/viewer/2022042508/55aadb061a28abc9598b4717/html5/thumbnails/9.jpg)
Every part of Netflix is personalized
![Page 10: Nearline systems to improve Netflix recommendations](https://reader034.vdocuments.us/reader034/viewer/2022042508/55aadb061a28abc9598b4717/html5/thumbnails/10.jpg)
Every part of Netflix is personalized
![Page 11: Nearline systems to improve Netflix recommendations](https://reader034.vdocuments.us/reader034/viewer/2022042508/55aadb061a28abc9598b4717/html5/thumbnails/11.jpg)
Every part of Netflix is personalized
![Page 12: Nearline systems to improve Netflix recommendations](https://reader034.vdocuments.us/reader034/viewer/2022042508/55aadb061a28abc9598b4717/html5/thumbnails/12.jpg)
NETFLIX RECOMMENDATIONS WITH ONLINE MICRO SERVICES
![Page 13: Nearline systems to improve Netflix recommendations](https://reader034.vdocuments.us/reader034/viewer/2022042508/55aadb061a28abc9598b4717/html5/thumbnails/13.jpg)
Life Cycle of Netflix Recommendation Data
Devices
Data Collection
Offline Big Data Analysis
Netflix recommendation:
online services
Netflix API
Netflix beacon telemetry
![Page 14: Nearline systems to improve Netflix recommendations](https://reader034.vdocuments.us/reader034/viewer/2022042508/55aadb061a28abc9598b4717/html5/thumbnails/14.jpg)
Data Collection: explicit inputs
Plays
Star ratings
![Page 15: Nearline systems to improve Netflix recommendations](https://reader034.vdocuments.us/reader034/viewer/2022042508/55aadb061a28abc9598b4717/html5/thumbnails/15.jpg)
Data Collection: explicit inputs
![Page 16: Nearline systems to improve Netflix recommendations](https://reader034.vdocuments.us/reader034/viewer/2022042508/55aadb061a28abc9598b4717/html5/thumbnails/16.jpg)
Data Collection: explicit inputs
Virtual plays from new user on-boarding
![Page 17: Nearline systems to improve Netflix recommendations](https://reader034.vdocuments.us/reader034/viewer/2022042508/55aadb061a28abc9598b4717/html5/thumbnails/17.jpg)
Outputs from offline analysis
Devices
Data Collection
Offline Big Data Analysis
Netflix recommendation:
online services
Netflix API
Netflix beacon telemetry
“Implicit” Data Services
Popularity Targeting
User clustering
![Page 18: Nearline systems to improve Netflix recommendations](https://reader034.vdocuments.us/reader034/viewer/2022042508/55aadb061a28abc9598b4717/html5/thumbnails/18.jpg)
Recommendations combine both online and aggregated offline data
Devices
Data Collection
Offline Big Data Analysis
Netflix recommendation:
online services
Netflix API
Netflix beacon telemetry
“Explicit” Data Services
My List On Ramp
Taste pref
“Implicit” Data Services
Popularity Targeting
User clustering
![Page 19: Nearline systems to improve Netflix recommendations](https://reader034.vdocuments.us/reader034/viewer/2022042508/55aadb061a28abc9598b4717/html5/thumbnails/19.jpg)
WHY BOTHER WITH NEAR LINE SYSTEMS THEN?
![Page 20: Nearline systems to improve Netflix recommendations](https://reader034.vdocuments.us/reader034/viewer/2022042508/55aadb061a28abc9598b4717/html5/thumbnails/20.jpg)
Our algorithms became too complex to compute online, which led to higher latency.
Near line systems improve our availability story.
Near line systems allow us to innovate at a greater velocity.
![Page 21: Nearline systems to improve Netflix recommendations](https://reader034.vdocuments.us/reader034/viewer/2022042508/55aadb061a28abc9598b4717/html5/thumbnails/21.jpg)
Near line systems improve agility and availability
Devices
Data Collection
Big Data Analysis (Hadoop, Teradata)
Netflix recommendation:
online services
Pre-computed recommendations
“Explicit” Data Services
“Implicit” Data Services
Post-process at run time
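
The split shown on this slide (pre-computed recommendations plus a light post-process at run time) can be sketched roughly as follows. This is a minimal illustration, not Netflix's implementation: the in-memory `PRECOMPUTED` store, the `FALLBACK` list, the `recommend` function, and the "drop recently played titles" rule are all assumptions made for the example.

```python
from typing import Dict, List, Set

# Hypothetical in-memory stand-in for a store of pre-computed recommendations,
# keyed by member id. A real system would read from a serving store.
PRECOMPUTED: Dict[str, List[str]] = {
    "member-1": ["video-a", "video-b", "video-c", "video-d"],
}

# Hypothetical fallback used when nothing has been pre-computed for a member.
FALLBACK: List[str] = ["video-x", "video-y"]


def recommend(member_id: str, recently_played: Set[str], limit: int = 3) -> List[str]:
    """Look up pre-computed results, then apply a cheap run-time post-process."""
    candidates = PRECOMPUTED.get(member_id, FALLBACK)
    # Post-process step (assumed): drop titles the member has just played.
    fresh = [video for video in candidates if video not in recently_played]
    return fresh[:limit]
```

Because the request path is only a lookup and a filter, latency stays low, and the fallback list means a missing pre-compute result degrades the response instead of failing it.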
![Page 22: Nearline systems to improve Netflix recommendations](https://reader034.vdocuments.us/reader034/viewer/2022042508/55aadb061a28abc9598b4717/html5/thumbnails/22.jpg)
Manhattan pre-compute engine
Manhattan: Netflix pre-compute engine
Video Ranker
Row selection
Similars
Top picks
![Page 23: Nearline systems to improve Netflix recommendations](https://reader034.vdocuments.us/reader034/viewer/2022042508/55aadb061a28abc9598b4717/html5/thumbnails/23.jpg)
What data would improve recommendations even further?
![Page 24: Nearline systems to improve Netflix recommendations](https://reader034.vdocuments.us/reader034/viewer/2022042508/55aadb061a28abc9598b4717/html5/thumbnails/24.jpg)
All UI Events from all key platforms
• Moving beyond explicit inputs from users, we would like to track all member activity to derive deeper insights.
• Challenges include:
– 1000s of device platforms
– Non-standardized UIs across different platforms
– Lack of earlier focus on tracking the browse experience
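
One way to picture the non-standardized UI problem is a mapping from platform-specific event names to a shared vocabulary. The sketch below is hypothetical: the platform names, raw event names, and canonical names are invented for illustration and do not come from the talk.

```python
from typing import Dict, Optional

# Hypothetical per-platform event names mapped to one canonical vocabulary.
CANONICAL_EVENT_NAMES: Dict[str, Dict[str, str]] = {
    "tv": {"rowScroll": "browse_scroll", "tileFocus": "impression"},
    "ios": {"swipe_row": "browse_scroll", "cell_visible": "impression"},
}


def normalize(platform: str, raw_name: str) -> Optional[str]:
    """Return the canonical event name, or None if the event is not recognized."""
    return CANONICAL_EVENT_NAMES.get(platform, {}).get(raw_name)
```

Unrecognized events return `None` rather than raising, so a caller can count them and surface gaps in platform coverage.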
![Page 25: Nearline systems to improve Netflix recommendations](https://reader034.vdocuments.us/reader034/viewer/2022042508/55aadb061a28abc9598b4717/html5/thumbnails/25.jpg)
Patterns arise in aggregate
![Page 26: Nearline systems to improve Netflix recommendations](https://reader034.vdocuments.us/reader034/viewer/2022042508/55aadb061a28abc9598b4717/html5/thumbnails/26.jpg)
Challenges with collecting UI Events
• Consistent data semantics across lots of device and UI platforms.
• Scaling to handle billions of events.
• Near real-time semantic data quality and validation
• Dealing with data loss (low power devices, loss at the network, etc.)
![Page 27: Nearline systems to improve Netflix recommendations](https://reader034.vdocuments.us/reader034/viewer/2022042508/55aadb061a28abc9598b4717/html5/thumbnails/27.jpg)
Canaries for data quality
Near real time feedback and validation on data quality.
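
A data-quality canary of this kind could, for example, validate a sample of incoming events against a schema and fail when too many are malformed. The sketch below assumes a four-field event schema and a 1% threshold; neither is stated in the talk, and the function names are hypothetical.

```python
from typing import Any, Dict, Iterable, List

# Assumed minimal schema for a UI event; the real event fields are not given in the talk.
REQUIRED_FIELDS = {
    "member_id": str,
    "device_type": str,
    "event_type": str,
    "timestamp_ms": int,
}


def validate_event(event: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one event (empty if the event is valid)."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in event:
            problems.append(f"missing {field}")
        elif not isinstance(event[field], expected_type):
            problems.append(f"bad type for {field}")
    return problems


def canary_passes(events: Iterable[Dict[str, Any]], max_invalid_ratio: float = 0.01) -> bool:
    """Pass only if the share of invalid events stays at or below the threshold."""
    total = 0
    invalid = 0
    for event in events:
        total += 1
        if validate_event(event):
            invalid += 1
    if total == 0:
        return False  # a canary that sends no data is itself a failure signal
    return invalid / total <= max_invalid_ratio
```

Running a check like this on a small slice of traffic gives feedback within minutes of a client release, before bad data reaches downstream aggregates.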
![Page 28: Nearline systems to improve Netflix recommendations](https://reader034.vdocuments.us/reader034/viewer/2022042508/55aadb061a28abc9598b4717/html5/thumbnails/28.jpg)
“Trending” on Netflix
Now being A/B tested
![Page 29: Nearline systems to improve Netflix recommendations](https://reader034.vdocuments.us/reader034/viewer/2022042508/55aadb061a28abc9598b4717/html5/thumbnails/29.jpg)
Near line systems for Netflix recommendations
Devices
Data Collection
Big Data Analysis (Hadoop, Teradata)
Netflix recommendation:
online services
Pre-computed recommendations
“Explicit” Data Services
“Implicit” Data Services
Post-process at run time
Near line data processing and serving systems
![Page 30: Nearline systems to improve Netflix recommendations](https://reader034.vdocuments.us/reader034/viewer/2022042508/55aadb061a28abc9598b4717/html5/thumbnails/30.jpg)
“Trending on Netflix” near line system
Take rates (play/impression) Kafka stream
Cassandra
dashboards
Stream processing (ETA: low number of minutes)
Play start (Kafka stream)
1000s / sec
Impressions (Kafka stream)
millions / sec
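
The take rate on this slide is plays divided by impressions. A minimal, single-process sketch of computing it for one time window is shown below. It uses plain Python lists in place of the Kafka streams and stream processor, and the event shape `(timestamp, video id)` and the function name `take_rates` are assumptions made for the example.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Event = Tuple[int, str]  # (timestamp in seconds, video id); an assumed event shape


def take_rates(plays: Iterable[Event], impressions: Iterable[Event],
               window_start: int, window_seconds: int) -> Dict[str, float]:
    """Return plays / impressions per video for one time window."""
    window_end = window_start + window_seconds

    def count(events: Iterable[Event]) -> Counter:
        # Keep only events whose timestamp falls inside [window_start, window_end).
        return Counter(video for ts, video in events if window_start <= ts < window_end)

    play_counts = count(plays)
    impression_counts = count(impressions)
    # Only videos that were actually shown get a take rate, which avoids dividing by zero.
    return {video: play_counts[video] / shown for video, shown in impression_counts.items()}
```

In a streaming setting the same calculation would run repeatedly over sliding or tumbling windows, with the results written to a store for serving and dashboards.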
![Page 31: Nearline systems to improve Netflix recommendations](https://reader034.vdocuments.us/reader034/viewer/2022042508/55aadb061a28abc9598b4717/html5/thumbnails/31.jpg)
“Trending on Netflix” near line system
Play start (Kafka stream)
1000s / sec
Impressions (Kafka stream)
millions / sec
Stream processing
• Windowed operations
• Small batches
• Merging streams
• Flexibility
Take rates
Impressions rollup
Personalized Ranked videos
Merged to generate “Trending on Netflix”
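
The slide does not say how the take rates and the personalized ranking are merged. As one illustrative possibility only, the sketch below blends the two signals with a fixed weight; the `trending_row` function, the linear blend, and the default `weight` are assumptions, not the method described in the talk.

```python
from typing import Dict, List


def trending_row(take_rate: Dict[str, float], personal_score: Dict[str, float],
                 weight: float = 0.5, size: int = 3) -> List[str]:
    """Blend a global trending signal with a per-member score and return the top videos."""
    # Only videos present in both inputs are candidates in this sketch.
    candidates = take_rate.keys() & personal_score.keys()
    blended = {
        video: weight * take_rate[video] + (1.0 - weight) * personal_score[video]
        for video in candidates
    }
    # Sort by blended score, highest first; ties are broken by video id for determinism.
    return sorted(blended, key=lambda video: (-blended[video], video))[:size]
```

With `weight` near 1.0 the row follows what is trending globally; near 0.0 it follows the member's own ranking.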
![Page 32: Nearline systems to improve Netflix recommendations](https://reader034.vdocuments.us/reader034/viewer/2022042508/55aadb061a28abc9598b4717/html5/thumbnails/32.jpg)
Spark Streaming at Netflix
• Collaborating with Databricks to make sure Spark (batch and streaming) works well in a cloud environment
– Resiliency and scalability testing
• Actively studying the scaling needs of our algorithmic workloads on both Spark batch and Spark streaming.
![Page 33: Nearline systems to improve Netflix recommendations](https://reader034.vdocuments.us/reader034/viewer/2022042508/55aadb061a28abc9598b4717/html5/thumbnails/33.jpg)
Spark at Netflix
• Several different use cases where we are interested in Spark – both batch and streaming.
• Our largest Spark batch production cluster is 150 m3.2xlarge instances, used for personalization.
• Netflix has both Spark batch and Spark streaming in production.
![Page 34: Nearline systems to improve Netflix recommendations](https://reader034.vdocuments.us/reader034/viewer/2022042508/55aadb061a28abc9598b4717/html5/thumbnails/34.jpg)
Spark at Netflix
• Integrating with Spark via Scala (mostly), Python, and some SQL.
• Python typically via IPython notebook integration.
• Running in standalone mode or on Mesos.
![Page 35: Nearline systems to improve Netflix recommendations](https://reader034.vdocuments.us/reader034/viewer/2022042508/55aadb061a28abc9598b4717/html5/thumbnails/35.jpg)
Spark: areas to watch
• We have not really tested the multi-tenancy boundaries yet. We are mostly spinning up purpose-built clusters for now.
• Tuning jobs and optimizing their performance remains a challenge as we make steady inroads.
• We are incrementally improving stability and scale as we tackle larger use cases this year.
![Page 36: Nearline systems to improve Netflix recommendations](https://reader034.vdocuments.us/reader034/viewer/2022042508/55aadb061a28abc9598b4717/html5/thumbnails/36.jpg)
Netflix Tech Blog
• A tech blog post about the “Trending on Netflix” row was published today.
• Watch for an upcoming Netflix tech blog post on near line systems, and another about Spark, in the coming weeks.
![Page 37: Nearline systems to improve Netflix recommendations](https://reader034.vdocuments.us/reader034/viewer/2022042508/55aadb061a28abc9598b4717/html5/thumbnails/37.jpg)
Now Hiring leaders and engineers!
Talk to me in person or at
Twitter: @sgkrishnan
LinkedIn: https://www.linkedin.com/pub/gopal-krishnan/0/7a7/905
![Page 38: Nearline systems to improve Netflix recommendations](https://reader034.vdocuments.us/reader034/viewer/2022042508/55aadb061a28abc9598b4717/html5/thumbnails/38.jpg)