Workshop part2 – Big Data

Upload: amazon-web-services

Post on 27-Nov-2014

Category: Technology


DESCRIPTION

Webit 2014 AWS workshop

TRANSCRIPT

Page 1: Workshop part2 – Big Data
Page 2: Workshop part2 – Big Data

THE MORE DATA YOU COLLECT, THE MORE VALUE YOU CAN DERIVE FROM IT

Page 3: Workshop part2 – Big Data
Page 4: Workshop part2 – Big Data
Page 5: Workshop part2 – Big Data

THE COST OF DATA GENERATION IS FALLING

Page 6: Workshop part2 – Big Data
Page 7: Workshop part2 – Big Data
Page 8: Workshop part2 – Big Data

GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE

Page 9: Workshop part2 – Big Data

GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE

Lower cost, higher throughput

Page 10: Workshop part2 – Big Data

GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE

Lower cost, higher throughput

Highly constrained

Page 11: Workshop part2 – Big Data

+ ELASTIC AND HIGHLY SCALABLE
+ NO UPFRONT CAPITAL EXPENSE
+ ONLY PAY FOR WHAT YOU USE
+ AVAILABLE ON-DEMAND

= REMOVE CONSTRAINTS

Page 12: Workshop part2 – Big Data

GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE

Page 13: Workshop part2 – Big Data

GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE

AWS Import/Export, AWS Direct Connect

Page 14: Workshop part2 – Big Data

Inbound data transfer is free
Multipart upload to S3
Physical media
AWS Direct Connect
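
The multipart upload item above can be illustrated with a small planning sketch. It does not call S3; it only shows how a large object might be cut into numbered parts before each part is uploaded separately. The 5 MiB minimum part size (for every part except the last) and the 10,000-part ceiling are S3's documented limits; the function name `plan_parts` and the 8 MiB default are assumptions made for this example.

```python
MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB   # S3 minimum for every part except the last
MAX_PARTS = 10_000        # S3 maximum number of parts per upload


def plan_parts(total_size, part_size=8 * MIB):
    """Return (part_number, start_offset, length) tuples covering total_size bytes.

    Hypothetical helper: it only plans the split, it does not upload anything.
    """
    if total_size <= 0:
        raise ValueError("total_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MiB minimum")
    # Double the part size until the object fits within the part-count limit.
    while -(-total_size // part_size) > MAX_PARTS:
        part_size *= 2
    parts = []
    offset = 0
    number = 1  # S3 part numbers start at 1
    while offset < total_size:
        length = min(part_size, total_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts


parts = plan_parts(20 * MIB)
for number, offset, length in parts:
    print(number, offset, length)
```

Each planned part could then be sent in parallel and retried independently, which is the main reason multipart upload helps with large inbound transfers.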

Page 15: Workshop part2 – Big Data

GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE

Amazon S3, Amazon Glacier, Amazon DynamoDB, Amazon RDS, Amazon Redshift, AWS Storage Gateway, Data on Amazon EC2

Page 16: Workshop part2 – Big Data

AMAZON S3 SIMPLE STORAGE SERVICE

Page 17: Workshop part2 – Big Data

CASE STUDY: SPOTIFY ADDS 20,000 TRACKS/DAY TO ITS CATALOGUE

Page 18: Workshop part2 – Big Data

AMAZON DYNAMODB

HIGH-PERFORMANCE, FULLY MANAGED NoSQL DATABASE SERVICE

Page 19: Workshop part2 – Big Data

DURABLE & AVAILABLE

CONSISTENT, DISK-ONLY WRITES (SSD)

Page 20: Workshop part2 – Big Data

LOW LATENCY

AVERAGE READS < 5 MS, WRITES < 10 MS

Page 21: Workshop part2 – Big Data

NO ADMINISTRATION

Page 22: Workshop part2 – Big Data

CASE STUDY: SHAZAM SUPPORTED 500,000 WRITES/SEC DURING SUPER BOWL

Page 23: Workshop part2 – Big Data

AMAZON REDSHIFT

FULLY MANAGED, PETABYTE-SCALE DATA WAREHOUSE ON AWS

Page 24: Workshop part2 – Big Data
Page 25: Workshop part2 – Big Data
Page 26: Workshop part2 – Big Data

30 MINUTES DOWN TO 12 SECONDS

Page 27: Workshop part2 – Big Data
Page 28: Workshop part2 – Big Data

AMAZON REDSHIFT LETS YOU START SMALL AND GROW BIG

Extra Large Node (HS1.XL)
Single Node (2 TB)
Cluster 2-32 Nodes (4 TB – 64 TB)

Eight Extra Large Node (HS1.8XL)
Cluster 2-100 Nodes (32 TB – 1.6 PB)

[Diagram: one single XL node, a 32-node XL cluster, and a 100-node 8XL cluster]
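
The capacity ranges on this slide follow from a fixed amount of storage per node. As a rough check, the sketch below derives them from the node counts. The per-node figures are inferred from the slide itself (one HS1.XL node holds 2 TB; two HS1.8XL nodes hold 32 TB, so 16 TB each); the dictionary keys and function names are illustrative, not Redshift API identifiers.

```python
# Storage per node in TB, inferred from the slide's own figures.
NODE_TB = {"hs1.xl": 2, "hs1.8xl": 16}
# Allowed cluster sizes (min nodes, max nodes) as stated on the slide.
NODE_RANGE = {"hs1.xl": (2, 32), "hs1.8xl": (2, 100)}


def cluster_capacity_tb(node_type, nodes):
    """Return total cluster storage in TB for a given node type and count."""
    low, high = NODE_RANGE[node_type]
    if not low <= nodes <= high:
        raise ValueError(f"{node_type} clusters take {low}-{high} nodes")
    return NODE_TB[node_type] * nodes


def capacity_range_tb(node_type):
    """Return (smallest, largest) cluster capacity in TB for a node type."""
    low, high = NODE_RANGE[node_type]
    return cluster_capacity_tb(node_type, low), cluster_capacity_tb(node_type, high)


for node_type in NODE_TB:
    print(node_type, capacity_range_tb(node_type))
```

The largest HS1.8XL result, 1600 TB, is the 1.6 PB quoted on the slide.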

Page 29: Workshop part2 – Big Data

JDBC/ODBC

Page 30: Workshop part2 – Big Data
Page 31: Workshop part2 – Big Data
Page 32: Workshop part2 – Big Data
Page 33: Workshop part2 – Big Data
Page 34: Workshop part2 – Big Data

GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE

Amazon EC2, Amazon Elastic MapReduce

Page 35: Workshop part2 – Big Data

AMAZON EC2 ELASTIC COMPUTE CLOUD

Page 36: Workshop part2 – Big Data
Page 37: Workshop part2 – Big Data
Page 38: Workshop part2 – Big Data
Page 39: Workshop part2 – Big Data
Page 40: Workshop part2 – Big Data
Page 41: Workshop part2 – Big Data

3 HOURS FOR $4828.85/hr

Page 42: Workshop part2 – Big Data

Instead of $20+ million in infrastructure
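
For scale, the total bill behind the previous slide's figure can be worked out directly. This is a minimal sketch that assumes the quoted hourly rate applied for the full three hours; `Decimal` is used so the currency arithmetic stays exact.

```python
from decimal import Decimal

hourly_rate = Decimal("4828.85")  # USD per hour, as quoted on the slide
hours = 3                         # run length, as quoted on the slide

total = hourly_rate * hours
print(f"Total for the run: ${total}")
```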

Page 43: Workshop part2 – Big Data

GPU INSTANCES

Instance  GPU                      CPU                           Price
G2        1x NVIDIA Kepler GK104   8 vCPU (Intel Xeon E5-2670)   $0.65/h
CG1       2x NVIDIA Fermi M2050    16 vCPU (Intel Xeon X5570)    $2.10/h

Page 44: Workshop part2 – Big Data

ON A SINGLE INSTANCE

COMPUTE TIME: 4h

COST: 4h x $2.1 = $8.4

Page 45: Workshop part2 – Big Data

ON MULTIPLE INSTANCES

COMPUTE TIME: 1h

COST: 1h x 4 x $2.1 = $8.4
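
The two slides above make the point that, with pay-per-hour pricing, spreading a job over more instances shortens the wait without raising the bill. A minimal sketch of that arithmetic follows. It assumes the work splits perfectly evenly across instances, which real workloads rarely do; the function name `job_cost` is made up for this example.

```python
def job_cost(total_instance_hours, instances, hourly_rate):
    """Return (wall_clock_hours, cost), assuming perfectly linear scaling."""
    wall_clock = total_instance_hours / instances
    cost = wall_clock * instances * hourly_rate
    return wall_clock, cost


# The slide's numbers: 4 instance-hours of work at $2.10/h.
single = job_cost(4, 1, 2.10)
parallel = job_cost(4, 4, 2.10)

print("single instance:", single)
print("four instances: ", parallel)
```

Both calls yield the same cost; only the wall-clock time differs (4 hours against 1 hour).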

Page 46: Workshop part2 – Big Data
Page 47: Workshop part2 – Big Data
Page 48: Workshop part2 – Big Data
Page 49: Workshop part2 – Big Data
Page 50: Workshop part2 – Big Data

AMAZON ELASTIC MAPREDUCE

HADOOP AS A SERVICE

Page 51: Workshop part2 – Big Data

CASE STUDY: "WITH AMAZON EMR WE CAN ANALYZE 100% OF THE DATA, NOT JUST A SAMPLE" - Sanjeevan Bala, Head of Data Planning & Analytics, Channel 4

Page 52: Workshop part2 – Big Data

GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE

Amazon S3, Amazon DynamoDB, Amazon RDS, Amazon Redshift, Data on Amazon EC2

Page 53: Workshop part2 – Big Data

PUBLIC DATA SETS

http://aws.amazon.com/publicdatasets

Page 54: Workshop part2 – Big Data
Page 55: Workshop part2 – Big Data
Page 56: Workshop part2 – Big Data

GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE

Page 57: Workshop part2 – Big Data

GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE

BATCH PROCESSING

Page 58: Workshop part2 – Big Data

GENERATE ➔ STREAM PROCESSING ➔ SHARE

Page 59: Workshop part2 – Big Data

AMAZON KINESIS

REAL-TIME DATA STREAM PROCESSING

Page 60: Workshop part2 – Big Data

Hourly server logs: how your systems went wrong an hour ago

Weekly/monthly bill: what you spent this past billing cycle

Daily customer report from your website: tells you what deal or ad to try next time

Daily fraud reports: tells you if there was fraud yesterday

Daily business reports: tells me how customers used AWS services yesterday

Real-time metrics: what just went wrong now

Real-time spending alerts/caps: guaranteeing you can’t overspend

Real-time analysis: what to offer the current customer now

Real-time detection: blocks fraudulent use now

Fast ETL into Amazon Redshift: how are customers using services now
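
The contrast in the list above (a report about yesterday against a signal about right now) can be sketched with a sliding-window counter that evaluates every event as it arrives instead of waiting for a scheduled batch. This is a plain-Python illustration, not the Amazon Kinesis API; the class name, the 60-second window, and the threshold are assumptions chosen for the example, and events are assumed to arrive in timestamp order.

```python
from collections import deque


class SlidingWindowAlert:
    """Fire when more than `threshold` errors arrive within `window_seconds`."""

    def __init__(self, window_seconds, threshold):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._timestamps = deque()

    def observe(self, timestamp):
        """Record one error event; return True if the alert should fire now."""
        self._timestamps.append(timestamp)
        # Drop events that have fallen out of the window.
        cutoff = timestamp - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) > self.threshold


alert = SlidingWindowAlert(window_seconds=60, threshold=2)
events = [0, 10, 20, 200, 210]  # error timestamps in seconds
fired = [alert.observe(t) for t in events]
print(fired)
```

A batch job would report the burst at 0-20 seconds only when the next hourly log run completes; the streaming version flags it on the third event.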

Page 61: Workshop part2 – Big Data
Page 62: Workshop part2 – Big Data

GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE

Page 63: Workshop part2 – Big Data

GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE

Amazon S3, Amazon DynamoDB, Amazon RDS, Amazon Redshift, Data on Amazon EC2

Amazon EC2, Amazon Elastic MapReduce

Amazon S3, Amazon Glacier, Amazon DynamoDB, Amazon RDS, Amazon Redshift, AWS Storage Gateway, Data on Amazon EC2

AWS Import/Export, AWS Direct Connect

Page 64: Workshop part2 – Big Data

GENERATE ➔ STREAM PROCESSING ➔ SHARE

Page 65: Workshop part2 – Big Data

GENERATE ➔ STREAM PROCESSING ➔ SHARE

Amazon S3, Amazon DynamoDB, Amazon RDS, Amazon Redshift, Data on Amazon EC2

Amazon Kinesis, Stream Processing on Amazon EC2

Page 66: Workshop part2 – Big Data

FROM DATA TO ACTIONABLE INFORMATION

Page 67: Workshop part2 – Big Data