Data Science at Netflix with Amazon EMR (BDT306) | AWS re:Invent 2013

Posted on 30-Oct-2014

DESCRIPTION

A few years ago, Netflix had a fairly classic business intelligence tech stack. Things have changed since then. Netflix is a heavy user of AWS for much of its ongoing operations, and Data Science & Engineering (DSE) is no exception. In this talk, we dive into the Netflix DSE architecture: what it is and why it is built that way. Key topics include their use of Big Data technologies (Cassandra, Hadoop, Pig + Python, and Hive); their Amazon S3 central data hub; their multiple persistent Amazon EMR clusters; how they benefit from AWS elasticity; their data science-as-a-service approach; how they made a hybrid AWS/data center setup work well; their open-source Hadoop-related software; and more.

TRANSCRIPT

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

Data Science at Netflix with Amazon EMR

Kurt Brown, Director of Data Platform, Netflix

November 13, 2013

Data Platform

S3

Suro

Aegisthus

Sting

S3

99.999999999%
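
Assuming the 99.999999999% figure on this slide refers to Amazon S3's designed annual object durability (the "eleven nines" AWS publishes for S3), the arithmetic below shows what that means on average. The object count N = 10,000,000 is an illustrative assumption that matches the example AWS itself uses.

```latex
p_{\text{loss}} = 1 - 0.99999999999 = 10^{-11} \ \text{(expected fraction of objects lost per year)}

\mathbb{E}[\text{objects lost per year}] = N \cdot p_{\text{loss}} = 10^{7} \cdot 10^{-11} = 10^{-4}
```

On average, that works out to losing one object about every 1 / 10^{-4} = 10,000 years.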

High SLA

Query

HDFS?

Eventual Consistency

S3mper
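
At the time of this talk, S3 listings were only eventually consistent. A job that listed an output prefix could therefore miss files another job had just written. S3mper, Netflix's open-source library, addresses this by keeping a consistent secondary index of written files (in DynamoDB) and checking listings against it. The code below is a minimal, hypothetical sketch of that idea and is not S3mper's actual API. `ConsistentIndex` is an in-memory stand-in for the secondary index, and `list_s3` is an assumed callable that returns a possibly stale listing.

```python
import time
from typing import Callable, Dict, Iterable, Set


class ConsistentIndex:
    """Hypothetical in-memory stand-in for a strongly consistent metadata store.

    S3mper keeps this kind of index in DynamoDB; a dict is used here only
    to keep the sketch self-contained.
    """

    def __init__(self) -> None:
        self._keys: Dict[str, Set[str]] = {}

    def record_write(self, prefix: str, key: str) -> None:
        # The writer calls this after each successful upload under `prefix`.
        self._keys.setdefault(prefix, set()).add(key)

    def expected(self, prefix: str) -> Set[str]:
        # Every key the index knows was written under `prefix`.
        return set(self._keys.get(prefix, set()))


def consistent_listing(
    prefix: str,
    index: ConsistentIndex,
    list_s3: Callable[[str], Iterable[str]],
    retries: int = 5,
    delay_s: float = 0.1,
) -> Set[str]:
    """List `prefix`, retrying until the listing contains every indexed key.

    `list_s3` is a hypothetical callable returning a possibly stale listing.
    Raises RuntimeError if keys are still missing after `retries` attempts.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    expected = index.expected(prefix)
    missing: Set[str] = set(expected)
    for attempt in range(retries):
        listed = set(list_s3(prefix))
        missing = expected - listed
        if not missing:
            return listed
        if attempt < retries - 1:
            # Give the eventually consistent listing time to catch up.
            time.sleep(delay_s)
    raise RuntimeError(f"listing of {prefix!r} still missing {sorted(missing)}")


if __name__ == "__main__":
    index = ConsistentIndex()
    for key in ("part-00000", "part-00001"):
        index.record_write("out/", key)

    calls = {"n": 0}

    def stale_then_fresh(prefix: str) -> Iterable[str]:
        # Simulated listing: the first call is stale and misses one file.
        calls["n"] += 1
        return ["part-00000"] if calls["n"] == 1 else ["part-00000", "part-00001"]

    # prints ['part-00000', 'part-00001']
    print(sorted(consistent_listing("out/", index, stale_then_fresh, delay_s=0.0)))
```

The sketch fails loudly when keys are still missing, so an incomplete listing is detected rather than silently passed to downstream jobs. Amazon S3 has provided strong read-after-write consistency, including for LIST operations, since December 2020, so this problem mainly affected systems of this era.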

“Data as a Service” • Execution Service • Event Service • Metadata Service

[Diagram: multiple clusters (Query, High SLA, Bonus) running jobs against S3]

Tez

Suro

Aegisthus

Questions?

http://jobs.netflix.com kurtbrown@netflix.com
