Page 1: Com as Mão Sujas de Dados Julio Faerman 1981269-3 FACOM TechWeek 2014

Com as Mão Sujas de Dados (Hands Dirty with Data)

Julio Faerman 1981269-3

FACOM TechWeek 2014

http://jfaerman.com.br/facom14

Page 2:

“In practice, the theory is different...”

Page 3:

16 years

2000+ employees

40 million users

http://aws.amazon.com/solutions/case-studies/netflix/

http://techblog.netflix.com/2013/12/netflix-presentation-videos-from-aws.html

Amazon Web Services for 100% of streaming

34.2% of all downstream traffic during primetime

Page 4:

Amazon Simple Storage Service

• Durable, scalable and fast storage (99.999999999% durability)
• 2+ trillion (10^12) objects
• 1.1+ million requests per second (RPS)
• Native HTTP/S
• And more: permissions, static hosting, logging, versioning, archival and expiration lifecycle, torrent, tags, redundancy, requester pays, encryption, reduced redundancy

http://aws.amazon.com/s3/
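
To put the 99.999999999% figure in perspective, here is a minimal arithmetic sketch. It assumes the figure is read as the probability that a single object survives one year and that objects are lost independently; the function names are hypothetical and not part of any AWS API.

```python
from fractions import Fraction

# Assumption: 99.999999999% ("eleven nines") is the chance that one object
# survives a year, and losses are independent of each other.
ANNUAL_DURABILITY = Fraction(99_999_999_999, 100_000_000_000)


def expected_annual_losses(object_count: int) -> Fraction:
    """Expected number of objects lost per year for a given object count."""
    return object_count * (1 - ANNUAL_DURABILITY)


def years_per_expected_loss(object_count: int) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(object_count)


if __name__ == "__main__":
    # Ten million stored objects under the assumptions above.
    print(years_per_expected_loss(10_000_000))  # 10000
```

Exact fractions are used instead of floats because subtracting eleven nines from 1.0 in floating point introduces rounding noise.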

Page 5:

“Any dataset that is worth retaining is stored on S3. This includes data from billions of streaming events from televisions, laptops, and mobile devices every hour captured by our log data pipeline, plus dimension data from Cassandra supplied by our Aegisthus pipeline.”
http://techblog.netflix.com/2013/01/hadoop-platform-as-service-in-cloud.html

“87% Cost Reduction per Streaming Start.”
http://youtu.be/XBgkZxAljbs

“In terms of scale, we have a 10 petabyte data warehouse on S3.”
http://techblog.netflix.com/2014/10/using-presto-in-our-big-data-platform.html

Page 6:

Once upon a time…

• Structured, relational, on-line: GB-TB-PB
• Semi-structured, MapReduce, batch: TB-PB-EB

Page 7:

Today

Structured

On-Line

GB

TB

PB

EB

Semi-structured

Unstructured

Distributed Cache

In-Memory Data Grid

Map Reduce

ETL (Extract-Transform-Load)

Graph Database

Document Database

Columnar Database

Batch

Real Time

Machine Learning

Relational Database

http://nathanmarz.com/

Data Structure Server

Stream Processing

Rule Engine

NoSQL

Page 8:

Amazon Elastic MapReduce

• Distributed processing with Apache Hadoop
• Near-linear scalability
• Resizable and disposable clusters
• Apache Hadoop ecosystem: Hive, Pig, Impala, Spark, ...
• Instant automatic provisioning
• Simplified administration
• 5.5M+ clusters

http://aws.amazon.com/elasticmapreduce/
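
The map, shuffle and reduce phases that Hadoop distributes across a cluster can be sketched in a single process. This is only an illustration of the programming model, not the Hadoop or EMR API; all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Because each map call sees one line and each reduce call sees one key, both phases can run on many machines at once, which is where the near-linear scalability comes from.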

Page 9:

http://aws.amazon.com/solutions/case-studies/pinterest/

50K -> 17M users in 9 months

12- employees

48M users

8 billion objects

400+ TB of data

Page 10:

April 2013:

400+ Web Engines
400+ API Engines
70x2+ MySQL DBs
100+ Redis Instances
230+ Memcache Instances
10 Redis Task Manager
500 Redis Task Processors
80 Sharded Solr
20 HBase
12 Kafka + Azkaban
8 Zookeeper Instances
12 Varnish

http://www.infoq.com/presentations/scaling-pinterest

Page 11:

Amazon Relational Database Service

• MySQL, PostgreSQL, Oracle or SQL Server
• Highly available (Multi-AZ)
• Read replicas
• Automated backup, patching and scaling

http://aws.amazon.com/rds/
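
One common way an application uses read replicas is to send writes to the primary and spread reads over the replicas. The sketch below assumes that pattern; the class and the endpoint names are hypothetical placeholders, not part of the RDS API.

```python
import itertools
from typing import List


class ReadWriteRouter:
    """Route writes to the primary and rotate reads across replicas."""

    def __init__(self, primary: str, replicas: List[str]) -> None:
        self.primary = primary
        # Fall back to the primary when no replica is configured.
        self._readers = itertools.cycle(replicas or [primary])

    def endpoint_for(self, sql: str) -> str:
        """Pick an endpoint: SELECT statements go to a replica, the rest to the primary."""
        is_read = sql.lstrip().lower().startswith("select")
        return next(self._readers) if is_read else self.primary
```

Replicas are updated asynchronously, so a read routed this way may briefly return stale data; reads that must see the latest write should go to the primary.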

Page 12:

Amazon ElastiCache

• Memcached and Redis
• Replication
• Backup and restore
• Managed patching, failure detection and recovery
• Elastic
• Reliable

http://aws.amazon.com/elasticache/
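
A cache such as Memcached or Redis is often used in a cache-aside pattern: look in the cache first, and only on a miss read the database and store the result. The sketch below assumes that pattern and uses a plain dict in place of a real cache server; the class name and parameters are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class CacheAside:
    """Cache-aside lookup with a time-to-live, backed by an in-process dict."""

    def __init__(self, load: Callable[[str], str], ttl_seconds: float = 60.0) -> None:
        self._load = load  # slow source of truth, e.g. a database query
        self._ttl = ttl_seconds
        self._entries: Dict[str, Tuple[float, str]] = {}
        self.misses = 0

    def get(self, key: str) -> str:
        entry = self._entries.get(key)
        if entry is not None and entry[0] > time.monotonic():
            return entry[1]  # cache hit: the source is not touched
        self.misses += 1
        value = self._load(key)  # cache miss: read from the source
        self._entries[key] = (time.monotonic() + self._ttl, value)
        return value
```

The time-to-live bounds how long a stale value can be served after the underlying row changes.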

Page 13:

Amazon Redshift

• Petabyte-scale data warehousing
• Massively parallel online analytical processing (OLAP)
• Resizable without downtime
• Managed provisioning and administration
• Compatible with PostgreSQL

http://aws.amazon.com/redshift/

Page 14:

Amazon Redshift Architecture

Leader Node
• SQL endpoint
• Stores metadata
• Coordinates query execution

Compute Nodes
• Local, columnar storage
• Execute queries in parallel
• Load, backup, restore via Amazon S3; load from Amazon DynamoDB or SSH

Two hardware platforms
• Optimized for data processing
• DW1: HDD; scale from 2TB to 1.6PB
• DW2: SSD; scale from 160GB to 256TB

[Diagram: SQL clients/BI tools connect over JDBC/ODBC to a leader node; the leader node and three compute nodes are linked by 10 GigE (HPC), with node specs of 128GB RAM, 16TB disk and 16 cores; ingestion, backup and restore go through Amazon S3 / DynamoDB / SSH.]
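
The division of labour between the leader node and the compute nodes can be sketched as a scatter-gather aggregate. Threads stand in for nodes here, and the round-robin split of the data is an assumption made for illustration; none of this is Redshift code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def compute_node_sum(column_slice: Sequence[int]) -> int:
    """Work done on one compute node: aggregate its local slice of a column."""
    return sum(column_slice)


def leader_sum(values: Sequence[int], node_count: int) -> int:
    """Leader role: split the work, run the slices in parallel, merge the partials."""
    slices: List[Sequence[int]] = [values[i::node_count] for i in range(node_count)]
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(compute_node_sum, slices))
    return sum(partials)
```

Only the small partial results travel back to the leader, which is why adding compute nodes can shorten a scan over a large table.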

Page 15:

ETL from EMR/Hive to Amazon Redshift through Amazon S3

EMR -> S3 -> Redshift

Extract & Transform (EMR to S3), Load (S3 to Redshift)

• EMR: unstructured, unclean
• S3: structured, clean
• Redshift: columnar, compressed
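
The flow on this slide (clean and structure raw data, stage it, then load it into columns) can be sketched on a toy scale. The log format, the column names and both functions are hypothetical, a string stands in for the staged file on S3, and compression is left out.

```python
import csv
import io
from typing import Dict, List

# Hypothetical raw log: two valid events and one malformed line.
RAW_LOG = "2014-10-01 play  user=42\nnot a log line\n2014-10-01 stop user=7\n"


def extract_transform(raw: str) -> str:
    """Drop malformed lines and emit clean CSV, the staged form."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in raw.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[2].startswith("user="):
            writer.writerow([parts[0], parts[1], parts[2][len("user="):]])
    return out.getvalue()


def load_columnar(staged_csv: str) -> Dict[str, List[str]]:
    """Load staged rows into one list per column, a toy columnar layout."""
    columns: Dict[str, List[str]] = {"day": [], "event": [], "user_id": []}
    for day, event, user_id in csv.reader(io.StringIO(staged_csv)):
        columns["day"].append(day)
        columns["event"].append(event)
        columns["user_id"].append(user_id)
    return columns
```

Storing each column contiguously means a query that touches only `event` never has to read `day` or `user_id`, and runs of similar values compress well.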

Page 16:

Amazon Redshift at Pinterest Today

• 16-node, 256TB cluster
• 2TB of data per day
• 100+ regular users
• 500+ queries per day: 75% <= 35 seconds, 90% <= 2 minutes
• Operational effort <= 5 hours/week
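
A quick back-of-the-envelope check of these numbers, as a sketch. The cluster size and ingest rate come from the slide; the assumption that ingest stays constant and that nothing is compressed or expired is mine, so the second figure is only a rough bound.

```python
CLUSTER_NODES = 16         # from the slide
CLUSTER_CAPACITY_TB = 256  # from the slide
DAILY_INGEST_TB = 2        # from the slide


def capacity_per_node_tb() -> float:
    """Storage per node if capacity is spread evenly across the cluster."""
    return CLUSTER_CAPACITY_TB / CLUSTER_NODES


def days_until_full(used_tb: float = 0.0) -> float:
    """Assumption: ingest stays at 2TB/day and no data is compressed or expired."""
    return (CLUSTER_CAPACITY_TB - used_tb) / DAILY_INGEST_TB


if __name__ == "__main__":
    print(capacity_per_node_tb())  # 16.0
    print(days_until_full())       # 128.0
```

The 16TB per node matches the "16TB disk" label in the architecture diagram.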

Page 17:

Shazam @ Super Bowl

http://www.allthingsdistributed.com/2012/06/amazon-dynamodb-growth.html

Page 18:
Page 19:

Relational Index vs. Key-Value

Page 20:

B? Tree vs. Distributed? Hash Table

Page 21:

O(log n) vs. O(1)
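
The difference between the two lookup costs can be made concrete by counting probes. In this sketch a binary search over a sorted list stands in for a tree index (a real B-tree has a much higher fan-out, so it needs fewer levels), and a Python dict stands in for a hash table; both function names are hypothetical.

```python
from typing import Dict, List, Tuple


def index_lookup(sorted_keys: List[int], key: int) -> Tuple[bool, int]:
    """Binary search over sorted keys; returns (found, number of probes)."""
    lo, hi, probes = 0, len(sorted_keys), 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_keys[mid] < key:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(sorted_keys) and sorted_keys[lo] == key
    return found, probes


def hash_lookup(table: Dict[int, str], key: int) -> bool:
    """Hash-table lookup: expected constant time, independent of table size."""
    return key in table
```

With a million keys the binary search needs at most 20 probes, and doubling the data adds one more; the hash lookup does not grow with the data, but it cannot answer range queries the way a sorted index can.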

Page 22:

Amazon DynamoDB

• NoSQL database
• Provisioned throughput
• Seamless scalability
• Zero admin
• Single-digit millisecond latency

http://aws.amazon.com/dynamodb/
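
The distributed hash table idea behind a key-value store can be sketched by hashing each key to pick a partition. This illustrates the general technique only; the slides do not describe DynamoDB's internal partitioning, and the class below is hypothetical.

```python
import hashlib
from typing import Dict, List, Optional


class PartitionedStore:
    """Toy key-value store that spreads keys over partitions by hashing the key."""

    def __init__(self, partition_count: int) -> None:
        self._partitions: List[Dict[str, str]] = [{} for _ in range(partition_count)]

    def partition_for(self, key: str) -> int:
        # A stable hash (unlike Python's salted hash()) keeps routing repeatable.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self._partitions)

    def put(self, key: str, value: str) -> None:
        self._partitions[self.partition_for(key)][key] = value

    def get(self, key: str) -> Optional[str]:
        return self._partitions[self.partition_for(key)].get(key)
```

Each request touches exactly one partition, so throughput grows with the number of partitions. The simple modulo placement used here would move most keys if the partition count changed, which production systems avoid with more careful schemes.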

Page 23:

~5TB in the database

1 billion requests/month

67,000 requests/minute

34 million recommendations/day

4 million products

27 million users

"We can't afford the luxury of throwing information away"

Page 24:
Page 25:

Beginnings

Availability Zone

Tomcat 6

MySQL

Page 26:

Stage 1

Availability Zone

Tomcat 6 + EhCache

MySQL

Backup

Page 27:

Stage 2

Availability Zone

Tomcat 6 + EhCache + NewRelic

MySQL Primary (EBS RAID0)

Availability Zone

MySQL Secondary (EBS RAID0)

Replication

Page 28:

Stage 3

Availability Zone, Availability Zone, Availability Zone

Tomcat 6 + EhCache

Nginx HAProxy

Availability Zone, Availability Zone

MySQL 1 (EBS RAID0)

MySQL 2 (EBS RAID0)

Replication

Memcached

Elastic Load Balancer
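
What a load balancer in front of several application servers does can be sketched as round-robin selection that skips unhealthy backends. This is a simplified illustration, not how Elastic Load Balancing, Nginx or HAProxy are implemented; the class and backend names are hypothetical.

```python
from typing import Dict, List


class RoundRobinBalancer:
    """Hand requests to healthy backends in turn."""

    def __init__(self, backends: List[str]) -> None:
        self._backends = list(backends)
        self._healthy: Dict[str, bool] = {name: True for name in backends}
        self._next = 0

    def mark(self, backend: str, healthy: bool) -> None:
        """Record the result of a health check for one backend."""
        self._healthy[backend] = healthy

    def pick(self) -> str:
        # Try each backend at most once, starting where the last pick stopped.
        for _ in range(len(self._backends)):
            candidate = self._backends[self._next]
            self._next = (self._next + 1) % len(self._backends)
            if self._healthy[candidate]:
                return candidate
        raise RuntimeError("no healthy backend available")
```

Placing backends in different Availability Zones means a zone failure only marks some of them unhealthy, and traffic keeps flowing to the rest.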

Page 29:

Stage 4

Auto Scaling group

Nginx, HAProxy, Jetty, EhCache

Availability Zone

Memcached

Availability Zone, Availability Zone

region, region

Page 30:

Architecture Evolution

Page 31:

Coming up in the next episodes…

Amazon Kinesis
http://aws.amazon.com/kinesis/

Amazon Data Pipeline
http://aws.amazon.com/datapipeline/

Page 32:

Where to begin?

Page 33:

http://aws.amazon.com/training/intro_series/

Page 34:

http://aws.amazon.com/free/

Page 35:

http://aws.amazon.com/training/

Page 36:

https://www.youtube.com/user/AmazonWebServices

http://aws.amazon.com/podcasts/aws-podcast/

Page 37:

http://aws.amazon.com/blogs/aws/

http://awshub.com.br

Page 38:
Page 39:

Julio Faerman

http://jfaerman.com.br/facom14

Thank you! Questions?