Structured, Unstructured and Streaming Big Data on AWS

Post on 15-Apr-2017


© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Markku Lepistö

Principal Technology Evangelist, APAC

Structured, Unstructured and Streaming Big Data on Amazon Web Services

Agenda

1:00pm - 2:00pm  Registration – Lunch & Meet AWS SAs
2:00pm - 2:20pm  Welcome & Introduction
2:20pm - 3:40pm  Structured, Unstructured and Streaming Big Data on the AWS Platform
3:40pm - 4:00pm  Break
4:00pm - 5:15pm  Building an Amazon Redshift Data Warehouse
5:15pm - 5:30pm  Q&A
5:30pm           Close

Big Data End to End Framework

Amazon Redshift

Amazon Kinesis

Amazon S3

Amazon RDS

Impala

Apache Storm

PIG

Amazon Machine Learning

Amazon EMR

Amazon Glacier

Amazon DynamoDB

"I got kicked out of the bookshop last week, because I moved all of the Big Data books into the Religion section"

Ingest Store Process Analyse Data Answers

Simplify Big Data Processing

Databases

Database Flat Files Database

Data

File Data

IoT Device

Android iOS

Streaming Data

Sales Data Customer Data

Web Logs Server Logs

Clickstream data Sensor data

Database

INGEST STORE

Databases

Database Flat Files Database

Data

File Data

IoT Device

Android iOS

Streaming Data

Sales Data Customer Data

Web Logs Server Logs

Clickstream data Sensor data

Database

INGEST

Amazon Redshift

Amazon RDS

STORE

Data Tier

Search Cache Object Store

RDBMS NoSQL Data Warehouse

logging analytics

webscale transactions

rich search hot reads complex queries and transactions

Data Tier

Amazon DynamoDB

Amazon RDS

Amazon ElastiCache

Amazon S3

Amazon Redshift

Amazon CloudSearch

Traditional Relational Database

               Amazon RDS       Amazon Redshift
Scaling        Vertical         Horizontal
Storage        Row              Column
Workload       Transactional    Analytical
Architecture   SMP              MPP
Type           SQL Relational   SQL Relational
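The Storage row of this comparison (row versus column) can be illustrated with a minimal Python sketch. The sales records, field names and helper functions are made up for illustration, and neither service stores data as Python lists; the point is only that an aggregate over a column layout reads a single column, while a row layout keeps each record together for transactional access.

```python
# Hypothetical sales records, used only to illustrate the two layouts.
rows = [
    {"order_id": 1, "customer": "alice", "amount": 30.0},
    {"order_id": 2, "customer": "bob", "amount": 12.5},
    {"order_id": 3, "customer": "alice", "amount": 7.5},
]


def get_order(row_store, order_id):
    """Row layout: a whole record is stored together and fetched at once."""
    for record in row_store:
        if record["order_id"] == order_id:
            return record
    return None


def to_columns(row_store):
    """Column layout: the values of each column are stored together."""
    columns = {}
    for record in row_store:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# An analytical aggregate touches only the one column it needs.
total_amount = sum(columns["amount"])
```
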

Databases

Database Flat Files Database

Data

File Data

Event Producer

Android iOS

Streaming Data

Sales Data Customer Data

Web Logs Server Logs

Clickstream data Sensor data

Storage

INGEST

Amazon Redshift

Amazon RDS

Application

Amazon S3

STORE

Impala PIG

Amazon EMR

Amazon S3

Amazon Redshift

Amazon EMR

Glacier

Amazon DynamoDB

Amazon Machine Learning

Applications

Amazon Redshift

Scaling Add nodes Automatic

Speed Fastest Fast

Cost Higher Lower

Durability Configurable Built-in

Amazon S3

Databases

Database Flat Files Database

Data

File Data

Event Producer

Android iOS

Streaming Data

Sales Data Customer Data

Web Logs Server Logs

Clickstream data Sensor data

Stream Processor

INGEST

Amazon Redshift

Amazon RDS

Amazon S3

Amazon Kinesis

STORE

Why Stream Storage?

Sensors Amazon Kinesis

Apache Kafka

Availability Zone

Availability Zone

Availability Zone

Data Sources

Logging

Metrics

Analysis

Processing

S3

DynamoDB

Redshift

Lambda Amazon Kinesis

Stream

Amazon Redshift

              Amazon Kinesis   Apache Kafka
Ordering      Yes              Yes
Persistence   24 Hours         Configurable
Size          50 KB            Configurable
Scaling       High             High
Latency       Low              Low
Managed       Yes              No
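Ordering in Amazon Kinesis holds within a shard, and the service picks the shard by taking the MD5 hash of a record's partition key and finding the shard whose hash key range contains it. A minimal sketch of that mapping follows. The shard count, the even split of the key space, the device names and the helper names are assumptions for illustration; a real stream reports its own hash key ranges.

```python
import hashlib

MAX_HASH = 2 ** 128  # MD5 produces a 128-bit value


def even_hash_ranges(shard_count):
    """Split the 128-bit key space into contiguous, near-equal ranges."""
    step = MAX_HASH // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * step
        # The last range absorbs any remainder so the space is fully covered.
        end = MAX_HASH - 1 if i == shard_count - 1 else (i + 1) * step - 1
        ranges.append((start, end))
    return ranges


def shard_for(partition_key, ranges):
    """Return the index of the range that contains the key's MD5 hash."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    value = int.from_bytes(digest, "big")
    for index, (start, end) in enumerate(ranges):
        if start <= value <= end:
            return index
    raise ValueError("hash value outside all ranges")


ranges = even_hash_ranges(4)

# Records that share a partition key always map to the same shard,
# which is what keeps their relative order intact.
events = [("device-1", "t=1"), ("device-2", "t=1"), ("device-1", "t=2")]
placement = [(key, shard_for(key, ranges)) for key, _ in events]
```

Choosing a partition key with many distinct values (a device or user id, for example) spreads records across shards while still keeping each key's records in order.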

"The world of gaming never sleeps. We owe every player a great experience, and AWS is our main tool to make that happen." - Sami Yliharju, Services Lead

INGEST STORE PROCESS

Event Producer

Android iOS

Databases Amazon Redshift

Amazon Kinesis

Amazon S3

Amazon RDS

Impala

Amazon Redshift

Amazon EMR

Flat Files Database

Data

Event Data

Streaming Data

Interactive

Batch

Streaming

              Amazon Redshift   Hadoop       Impala
Scaling       2 PB+             Nodes        Nodes
Storage       Native            HDFS/S3      HDFS
BI Tools      High              Medium       Medium
Durability    High              High         High
Latency       Low               Low          Low
Managed       Fully             Semi (EMR)   Semi (EMR)

INGEST STORE PROCESS

Event Producer

Android iOS

Databases Amazon Redshift

Amazon Kinesis

Amazon S3

Amazon RDS

Impala

Amazon Redshift

Flat Files Database

Data

Event Data

Streaming Data

Interactive

Batch

PIG

Streaming

Amazon EMR

Hadoop

PIG

SQL on Hadoop

Eats anything

New Processing Engine

Amplab Big Data Benchmark https://amplab.cs.berkeley.edu/benchmark/

INGEST STORE PROCESS

Event Producer

Android iOS

Databases Amazon Redshift

Amazon Kinesis

Amazon S3

Amazon RDS

Impala

Amazon Redshift

Apache Storm

Flat Files Database

Data

Event Data

Streaming Data

Interactive

Batch

Streaming

PIG

Amazon EMR

Hadoop

AWS Lambda

INGEST STORE PROCESS

Event Producer

Android iOS

Databases Amazon Redshift

Amazon Kinesis

Amazon S3

Amazon RDS

Impala

Amazon Redshift

Apache Storm

Flat Files Database

Data

Event Data

Streaming Data

Interactive

Batch

Streaming

PIG

ANALYSE

Amazon Machine Learning

Amazon EMR

Hadoop

AWS Lambda

Use Cases

FOMO

Amazon EMR

Hadoop

Amazon Machine Learning

Kinesis Producer

Android iOS

Databases Amazon Redshift

Amazon Kinesis

Amazon S3

Amazon RDS

Impala

Amazon Redshift

Apache Storm

Kinesis Consumer

Flat Files Database

Data

Event Data

Streaming Data

Databases Amazon Redshift

Amazon Redshift

Database Data

SQL Analytics

Amazon Machine Learning

Kinesis Producer

Android iOS

Databases Amazon Redshift

Amazon Kinesis

Amazon S3

Amazon RDS

Impala

Amazon Redshift

Apache Storm

Kinesis Consumer

Amazon Elastic MapReduce

Flat Files Database

Data

Event Data

Streaming Data

Clickstream Analysis - Batch

Amazon Elastic MapReduce

Event Data
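The batch path can be pictured as a map and reduce pass over stored event data. The sketch below counts page views per URL in plain Python; it is not the job used in the talk. The log line format, the sample lines and the function names are hypothetical, and a real job on Amazon EMR would read its input from storage such as Amazon S3 rather than from a list.

```python
from collections import defaultdict

# Hypothetical clickstream log lines: "<timestamp> <user_id> <url>".
log_lines = [
    "2015-06-01T10:00:01 u1 /home",
    "2015-06-01T10:00:05 u2 /home",
    "2015-06-01T10:00:09 u1 /cart",
    "malformed line",
    "2015-06-01T10:01:30 u3 /home",
]


def map_phase(line):
    """Emit (url, 1) for each well-formed line and skip anything else."""
    parts = line.split()
    if len(parts) == 3:
        yield parts[2], 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Sum the counts emitted for one URL."""
    return key, sum(values)


mapped = [pair for line in log_lines for pair in map_phase(line)]
page_views = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```
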

Amazon EMR

Hadoop

Amazon Machine Learning

Kinesis Producer

Android iOS

Databases Amazon Redshift

Amazon Kinesis

Amazon S3

Amazon RDS

Impala

Amazon Redshift

Apache Storm

Kinesis Consumer

Amazon Elastic MapReduce

Flat Files Database

Data

Event Data

Streaming Data

Clickstream Analysis – Near Real Time

Event Producer

Amazon Kinesis

Amazon S3

Amazon Redshift

Kinesis Consumers Streaming

Data
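For the near-real-time path, a consumer typically aggregates records over short time windows as they arrive. The following is a minimal sketch of a tumbling-window click count, assuming a 60-second window and records with hypothetical "ts" (epoch seconds) and "url" fields. The records are supplied as a list here rather than read from a stream.

```python
from collections import Counter, defaultdict

WINDOW_SECONDS = 60  # assumed window size


def window_start(epoch_seconds):
    """Align a timestamp to the start of its tumbling window."""
    return epoch_seconds - (epoch_seconds % WINDOW_SECONDS)


def count_clicks(records):
    """Count clicks per URL within each window."""
    windows = defaultdict(Counter)
    for record in records:
        windows[window_start(record["ts"])][record["url"]] += 1
    return windows


# Hypothetical click events.
records = [
    {"ts": 1000, "url": "/home"},
    {"ts": 1010, "url": "/home"},
    {"ts": 1015, "url": "/cart"},
    {"ts": 1025, "url": "/home"},
]

clicks_per_window = count_clicks(records)
```

Each window's counts could then be written to a store such as Amazon S3 or Amazon Redshift, matching the flow on this slide.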

Demo

Real-time Twitter analytics using Amazon Kinesis, AWS Lambda and open source software

vs

Amazon Kinesis

Twitter Stream AWS Lambda
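When AWS Lambda reads from an Amazon Kinesis stream, the function receives a batch of records whose payloads are base64-encoded under `Records[].kinesis.data`. The sketch below shows what a handler for that event shape could look like. It is not the code used in the demo: the hashtag counting, the JSON payload with a "text" field and the `make_event` helper are assumptions for illustration.

```python
import base64
import json
from collections import Counter


def handler(event, context):
    """Count hashtags in one batch of Kinesis records.

    Assumes each payload is a JSON object with a "text" field
    (a hypothetical tweet shape).
    """
    counts = Counter()
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])
        tweet = json.loads(payload)
        for word in tweet.get("text", "").split():
            if word.startswith("#") and len(word) > 1:
                counts[word.lower()] += 1
    return dict(counts)


def make_event(tweets):
    """Build an event shaped like a Kinesis batch, for local testing only."""
    return {
        "Records": [
            {
                "kinesis": {
                    "data": base64.b64encode(
                        json.dumps(tweet).encode("utf-8")
                    ).decode("ascii")
                }
            }
            for tweet in tweets
        ]
    }


sample_event = make_event(
    [{"text": "Loving #AWS and #BigData"}, {"text": "#aws summit"}]
)
result = handler(sample_event, None)
```
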

Demo: Live Twitter Feed Analysis

Twitter, on a typical day:
•  More than 500 million Tweets sent*
•  Average 5,700 TPS

* https://blog.twitter.com/2013/new-tweets-per-second-record-and-how
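As a rough check on these figures, dividing 500 million Tweets by the number of seconds in a day gives an average rate in the same range as the quoted 5,700 TPS. The variable names are illustrative only.

```python
TWEETS_PER_DAY = 500_000_000      # "more than 500 million" from the slide
SECONDS_PER_DAY = 24 * 60 * 60    # 86,400 seconds

# Average Tweets per second implied by the daily total.
average_tps = TWEETS_PER_DAY / SECONDS_PER_DAY
print(round(average_tps))  # 5787
```
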

Thank You!
