instructure-aws-bigdata -...

Posted on 17-Mar-2018


© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Big Data on AWS

Quick overview and deep’ish dive on real-time analytics

Ben Snively

AWS Specialist SA – Big Data and Analytics

snivelyb@amazon.com

Growing Data, Faster data, from more sources

1.2 ZB in 2015, 44 ZB by 2020, 180 ZB by 2025

Data is being generated faster and faster

More and more data sources: 80 billion devices by 2025, 500 million tweets daily

[Figure: data scale growing from GB and TB through PB and EB to ZB, fed by IoT, social media, and enterprise systems. Source: IDC, 2015]
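To make "faster and faster" concrete, the slide's own figures (1.2 ZB in 2015, 44 ZB by 2020, 180 ZB by 2025) can be turned into an implied compound annual growth rate. This is a minimal sketch that takes those three numbers at face value and assumes smooth yearly compounding between them; the function name `cagr` is just an illustrative helper.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Return the constant yearly growth rate that turns `start` into `end`."""
    return (end / start) ** (1 / years) - 1

# Figures taken from the slide (IDC, 2015), in zettabytes.
growth_2015_2020 = cagr(1.2, 44, 5)   # 1.2 ZB (2015) -> 44 ZB (2020)
growth_2020_2025 = cagr(44, 180, 5)   # 44 ZB (2020) -> 180 ZB (2025)

print(f"2015-2020: {growth_2015_2020:.0%} per year")
print(f"2020-2025: {growth_2020_2025:.0%} per year")
```

Under these figures the total volume roughly doubles every year through 2020 (about 106% per year), then keeps growing by about a third per year on a much larger base.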

Drivers for Big Data (the 5 Vs): Volume, Velocity, Variety, Veracity, Variability

Requirements for a Solution (the 7 Vs add): Value, Visualization

Big Data was Meant for the Cloud

Big data vs. cloud computing:

• Big data: variety, volume, and velocity requiring new tools. Cloud computing: variety of compute, storage, and networking options.

• Big data: potentially massive datasets. Cloud computing: massive, virtually unlimited capacity.

• Big data: iterative, experimental style of data manipulation and analysis. Cloud computing: iterative, experimental style of infrastructure deployment/usage.

• Big data: frequently not a steady-state workload; peaks and valleys. Cloud computing: at its most efficient with highly variable workloads.

• Big data: absolute performance not as critical as “time to results”; shared resources are a bottleneck. Cloud computing: parallel compute projects allow each workgroup to have more autonomy and get faster results.
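Why variable workloads favor elastic capacity can be shown with a little arithmetic. The sketch below uses an entirely hypothetical one-day workload (20 quiet hours needing 2 nodes, a 4-hour batch window needing 10) and compares a cluster sized for the peak against capacity that follows demand hour by hour; it ignores pricing differences and scaling delays.

```python
# Hypothetical workload: nodes needed for each hour of one day.
# 20 quiet hours need 2 nodes; a 4-hour batch window needs 10.
hourly_demand = [2] * 20 + [10] * 4

# Fixed cluster: sized for the peak and kept running all day.
fixed_node_hours = max(hourly_demand) * len(hourly_demand)

# Elastic capacity: pay only for the nodes each hour actually needs.
elastic_node_hours = sum(hourly_demand)

# Fraction of the fixed cluster's capacity that is actually used.
utilization = elastic_node_hours / fixed_node_hours

print(f"fixed: {fixed_node_hours} node-hours")
print(f"elastic: {elastic_node_hours} node-hours")
print(f"fixed-cluster utilization: {utilization:.0%}")
```

With peaks and valleys like these, the peak-sized cluster burns 240 node-hours to do 80 node-hours of work, so two thirds of its capacity sits idle.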

Common Big Data Flow

Generate → Collect, Orchestrate, Store → Analyze

Lower cost and increased velocity, compared with a flow that has traditionally been highly constrained.

One tool to rule them all

AWS Big Data Platform

Stages: Collect, Orchestrate, Store, Analyze

Services shown: AWS Import/Export, AWS Snowball, AWS Direct Connect, Amazon Kinesis, AWS IoT, AWS Database Migration Service, AWS Data Pipeline, Amazon SWF, Amazon SNS, AWS Lambda, AWS Glue, Amazon S3, Amazon Glacier, Amazon DynamoDB, Amazon Aurora, Amazon Redshift, Amazon EMR, Amazon EC2, Amazon Machine Learning, Amazon Kinesis Analytics, Amazon Athena, Amazon QuickSight

Real-time Analytics: Amazon Kinesis Platform

Amazon Kinesis: Streaming Data Done the AWS Way

Makes it easy to capture, deliver, and process real-time data streams

Pay as you go, no up-front costs

Elastically scalable

Right services for your specific use cases

Real-time latencies

Easy to provision, deploy, and manage
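Kinesis Streams scales by adding shards, and each record's partition key decides which shard receives it: the key is hashed with MD5 to a 128-bit integer, and each shard owns a range of that hash space. The sketch below models that routing under a simplifying assumption, namely that the hash space is split into equal contiguous ranges (after splits and merges real shard ranges can differ). `shard_for_key` is a hypothetical helper, not part of any AWS SDK.

```python
import hashlib

HASH_SPACE = 2 ** 128  # partition-key hashes are 128-bit unsigned integers

def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Return the index of the shard whose hash-key range contains the key.

    Assumes the hash space is divided into `shard_count` equal, contiguous
    ranges; this is a simplification of real shard layouts.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")  # interpret MD5 as an unsigned integer
    range_size = HASH_SPACE // shard_count
    # min() keeps any remainder at the top of the space on the last shard
    return min(hash_value // range_size, shard_count - 1)

# The same key always lands on the same shard, which is what keeps
# records for one device in order relative to each other.
for device in ["sensor-1", "sensor-2", "sensor-3"]:
    print(device, "-> shard", shard_for_key(device, 4))
```

Choosing a high-cardinality partition key (device ID, user ID) spreads load across shards; a single hot key would pin all of its traffic to one shard no matter how many shards the stream has.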

Amazon Kinesis Streams: for technical developers. Build your own custom applications that process or analyze streaming data.

Amazon Kinesis Firehose: for all developers, data scientists. Easily load massive volumes of streaming data into Amazon S3, Amazon Redshift, and Amazon Elasticsearch.

Amazon Kinesis Analytics: for all developers, data scientists. Easily analyze data streams using standard SQL queries.
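A typical streaming SQL query groups records into fixed, non-overlapping time windows (tumbling windows) and aggregates within each one. The Python below is only a stand-in that mimics what such a windowed GROUP BY computes; it is not the Kinesis Analytics API, and the `(timestamp_seconds, key)` event shape and `tumbling_window_counts` name are assumptions for illustration.

```python
from collections import Counter, defaultdict

def tumbling_window_counts(events, window_seconds=60):
    """Count events per key in fixed, non-overlapping time windows.

    `events` is an iterable of (timestamp_seconds, key) pairs.
    Returns {window_start: Counter({key: count, ...}), ...}.
    """
    windows = defaultdict(Counter)
    for ts, key in events:
        # Every timestamp falls into exactly one window, aligned to multiples of window_seconds.
        window_start = int(ts // window_seconds) * window_seconds
        windows[window_start][key] += 1
    return dict(windows)

events = [(5, "click"), (42, "view"), (59, "click"), (61, "click"), (130, "view")]
result = tumbling_window_counts(events)
for start in sorted(result):
    print(start, dict(result[start]))
```

A real stream never ends, so a streaming engine emits each window's result once the window closes rather than after reading all input, but the per-window grouping is the same idea.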

Amazon Kinesis: Streaming data made easy

Services make it easy to capture, deliver, and process streams on AWS

[Figure: region availability at the time of the talk for Kinesis Streams, Kinesis Firehose, Kinesis Analytics, and Lambda]

Sending & Reading Data from Kinesis Streams

Sending: AWS SDK, AWS Mobile SDK, Kinesis Producer Library, Log4j, Flume, Fluentd

Consuming: Get* APIs, Kinesis Client Library + Connector Library, Apache Storm, Amazon Elastic MapReduce, AWS Lambda, Apache Spark, Apache Flink
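Of the consumers listed, AWS Lambda is the lightest-weight: Lambda polls the stream and invokes a function with a batch of records, each carrying its payload base64-encoded under `Records[i]["kinesis"]["data"]`. The sketch below assumes producers write JSON objects with a numeric `value` field (a hypothetical schema) and tests the handler locally with a hand-built event that contains only the fields the handler reads, not a full Lambda event.

```python
import base64
import json

def handler(event, context):
    """Decode each Kinesis record in a Lambda event and sum a numeric field."""
    total = 0
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])  # raw bytes sent by the producer
        reading = json.loads(payload)
        total += reading["value"]  # "value" is a hypothetical field name
    return {"records": len(event["Records"]), "total": total}

# Local test: build a minimal event shaped like the Kinesis portion of a Lambda event.
def make_record(obj):
    data = base64.b64encode(json.dumps(obj).encode("utf-8")).decode("ascii")
    return {"kinesis": {"partitionKey": "sensor-1", "data": data}}

sample_event = {"Records": [make_record({"value": 3}), make_record({"value": 4})]}
result = handler(sample_event, None)
print(result)
```

Keeping the decoding logic in a plain function like this makes it testable without deploying anything; the same handler body works whether the batch holds one record or hundreds.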

Amazon Kinesis Streams vs. Amazon Kinesis Firehose

Amazon Kinesis Streams is for use cases that require custom processing, per incoming record, with sub-1 second processing latency, and a choice of stream processing frameworks.

Amazon Kinesis Firehose is for use cases that require zero administration, the ability to use existing analytics tools based on Amazon S3, Amazon Redshift, and Amazon Elasticsearch, and a data latency of 60 seconds or higher.
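The latency difference comes largely from buffering: Firehose accumulates incoming data and delivers it when either a buffer size or a buffer interval is reached, whichever comes first. The class below is a toy model of that size-or-interval rule using simulated timestamps; it is not the Firehose API, and `BufferingDelivery` and its parameters are hypothetical names.

```python
class BufferingDelivery:
    """Toy model of size-or-interval buffering (not the Firehose API)."""

    def __init__(self, max_bytes, max_seconds):
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.buffer = []
        self.buffered_bytes = 0
        self.first_arrival = None  # time the oldest buffered record arrived
        self.deliveries = []       # list of (delivery_time, records)

    def add(self, now, record: bytes):
        if self.first_arrival is None:
            self.first_arrival = now
        self.buffer.append(record)
        self.buffered_bytes += len(record)
        self._maybe_flush(now)

    def tick(self, now):
        """Advance the clock without new data; may trigger an interval flush."""
        self._maybe_flush(now)

    def _maybe_flush(self, now):
        if not self.buffer:
            return
        size_reached = self.buffered_bytes >= self.max_bytes
        interval_reached = now - self.first_arrival >= self.max_seconds
        if size_reached or interval_reached:
            self.deliveries.append((now, self.buffer))
            self.buffer = []
            self.buffered_bytes = 0
            self.first_arrival = None

d = BufferingDelivery(max_bytes=1000, max_seconds=60)
d.add(0, b"x" * 10)     # small records: wait for the interval
d.add(30, b"y" * 10)
d.tick(59)              # interval not yet reached, nothing delivered
d.tick(60)              # 60 s after the first record: both delivered together
d.add(61, b"z" * 1000)  # a full buffer is delivered immediately
print([(t, len(recs)) for t, recs in d.deliveries])
```

With low traffic, a record can wait for the full interval before delivery, which is why Firehose is described in terms of tens of seconds, while a Kinesis Streams consumer sees each record as soon as it reads it.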

Demonstration

S3 is the Data Lake

Data Lake Reference Architecture

Ingest: Kinesis Streams & Firehose

Across the data lake: Athena, Glue

Processing & Analytics:

• Real-time: AWS Lambda, Apache Storm on EMR, Apache Flink on EMR, Spark Streaming on EMR, Elasticsearch Service, Kinesis Analytics, Kinesis Streams

• Batch: EMR (Hadoop, Spark, Presto), Redshift (data warehouse), Athena (query service)

• AI & Predictive: Amazon Lex (speech recognition), Amazon Rekognition, Amazon Polly (text to speech), Machine Learning (predictive analytics)

• BI & Data Visualization

• Transactional & RDBMS: DynamoDB (NoSQL DB), Aurora (relational database)
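When S3 is the data lake, the object key layout matters for tools like Athena and Glue. A common convention is Hive-style `key=value` prefixes (for example by date), so that a table defined with those partition columns lets the query engine skip partitions a query does not touch. The helper below is a minimal sketch of building such keys; the `clickstream` prefix, file name, and year/month/day scheme are hypothetical choices, not a required layout.

```python
from datetime import datetime, timezone

def partitioned_key(prefix: str, event_time: datetime, filename: str) -> str:
    """Build a Hive-style (key=value) S3 object key partitioned by date."""
    return (
        f"{prefix}/year={event_time:%Y}/month={event_time:%m}/day={event_time:%d}/{filename}"
    )

key = partitioned_key(
    "clickstream",  # hypothetical dataset prefix
    datetime(2017, 3, 17, 8, 30, tzinfo=timezone.utc),
    "part-0000.json.gz",  # hypothetical file name
)
print(key)
```

Using UTC timestamps for the partition values avoids records from one local day being split across two partitions.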

Data Lake Architecture with AWS Tools

• Data sources → Amazon Kinesis Streams & Firehose → Amazon S3 data lake

• Streaming analytics tools: AWS Lambda, Apache Storm on EMR, Apache Flink on EMR, Spark Streaming on EMR, Amazon Kinesis Analytics

• Hadoop / Spark: Amazon EMR

• Data warehouse: Amazon Redshift

• NoSQL DB & graph DB: Amazon DynamoDB

• Amazon Elasticsearch Service

• Relational database: Amazon Aurora

• Machine learning: Amazon Machine Learning

• Data science sandbox: open source tool of choice on EC2

• Visualization / reporting

Summary

• AWS enables you to build sophisticated big data applications: retrospective, real-time, and predictive

• You can build incrementally, adding use cases and increasing scale as you go

• AWS provides a broad range of security and auditing features to enable you to meet your security requirements

• AWS makes it easy to build hybrid applications that span your data centers and the AWS Cloud

https://aws.amazon.com/big-data/, also /iot
