© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Big Data on AWS
Quick overview and deep-ish dive on real-time analytics
Ben Snively
AWS Specialist SA – Big Data and [email protected]
Growing Data, Faster data, from more sources
1.2 ZB in 2015
44 ZB by 2020
180 ZB by 2025
Data is being generated faster and faster
More and more data sources
80 billion devices - 2025
500 million tweets daily
GB, TB, PB, EB, ZB
Source: IDC, 2015
IoT
Social Media
Enterprise Systems
Volume, Velocity, Variety
(5 Vs) Veracity, Variability
(7 Vs) Visualization, Value
Requirements for Solution
Drivers for Big Data
Big Data was Meant for the Cloud
Big Data | Cloud Computing
Variety, volume, and velocity requiring new tools | Variety of compute, storage, and networking options
Potentially massive datasets | Massive, virtually unlimited capacity
Iterative, experimental style of data manipulation and analysis | Iterative, experimental style of infrastructure deployment/usage
Frequently not steady-state workload; peaks and valleys | At its most efficient with highly variable workloads
Absolute performance not as critical as "time to results"; shared resources are a bottleneck | Parallel compute projects allow each workgroup to have more autonomy, get faster results
Generate
Collect, Orchestrate, Store
Analyze
Lower Cost, Increased Velocity
Traditionally - Highly constrained
Common Big Data Flow
One tool to rule them all
AWS Big Data Platform
EMR EC2
Glacier
S3
Import/Export
Kinesis
Direct Connect
Machine Learning
Redshift
DynamoDB
AWS Database Migration Service
Collect Orchestrate Store Analyze
AWS Lambda
AWS IoT
AWS Data Pipeline
Kinesis Analytics
Amazon SNS
AWS Snowball
Amazon SWF
Amazon Athena
Amazon QuickSight
Amazon Aurora
AWS Glue
Real-time Analytics - Amazon Kinesis Platform
Amazon Kinesis: Streaming Data Done the AWS Way
Makes it easy to capture, deliver, and process real-time data streams
Pay as you go, no up-front costs
Elastically scalable
Right services for your specific use cases
Real-time latencies
Easy to provision, deploy, and manage
Amazon Kinesis Streams
For technical developers
Build your own custom applications that process or analyze streaming data
Amazon Kinesis Firehose
For all developers, data scientists
Easily load massive volumes of streaming data into Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service
Amazon Kinesis Analytics
For all developers, data scientists
Easily analyze data streams using standard SQL queries
Amazon Kinesis: Streaming data made easy
Services make it easy to capture, deliver, and process streams on AWS
Region Availability (today)
Kinesis Streams, Kinesis Firehose, Kinesis Analytics, Lambda
Sending & Reading Data from Kinesis Streams
Sending: AWS SDK, AWS Mobile SDK, Kinesis Producer Library, LOG4J, Flume, Fluentd
Consuming: Get* APIs, Kinesis Client Library + Connector Library, AWS Lambda, Amazon Elastic MapReduce, Apache Storm, Apache Spark, Apache Flink
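The sending and consuming paths above can be sketched in Python. This is a minimal illustration, not the presenter's demo code. It assumes a client object exposing `put_records`, `get_shard_iterator`, and `get_records` with the keyword arguments used by the AWS SDK for Python (boto3). The `InMemoryKinesis` class, the stream name `demo`, and the `device_id` field are hypothetical stand-ins so the sketch runs without an AWS account.

```python
import json
from typing import Any, Dict, Iterable, List


def build_records(events: Iterable[Dict[str, Any]], key_field: str) -> List[Dict[str, Any]]:
    """Turn dict events into the {Data, PartitionKey} shape that PutRecords expects."""
    return [
        {"Data": json.dumps(e).encode("utf-8"), "PartitionKey": str(e[key_field])}
        for e in events
    ]


def send_events(client, stream_name: str, events, key_field: str, batch_size: int = 500) -> int:
    """Send events in batches (PutRecords accepts at most 500 records per call).

    Returns the total number of records the service reported as failed.
    """
    records = build_records(events, key_field)
    failed = 0
    for start in range(0, len(records), batch_size):
        response = client.put_records(
            StreamName=stream_name, Records=records[start:start + batch_size]
        )
        failed += response.get("FailedRecordCount", 0)
    return failed


def read_shard(client, stream_name: str, shard_id: str, limit: int = 100) -> List[Dict[str, Any]]:
    """Read one batch from the oldest available record of a shard via the Get* APIs."""
    iterator = client.get_shard_iterator(
        StreamName=stream_name, ShardId=shard_id, ShardIteratorType="TRIM_HORIZON"
    )["ShardIterator"]
    response = client.get_records(ShardIterator=iterator, Limit=limit)
    return [json.loads(r["Data"]) for r in response["Records"]]


class InMemoryKinesis:
    """Hypothetical single-shard stand-in for the real client, for offline runs only."""

    def __init__(self):
        self._records = []

    def put_records(self, StreamName, Records):
        self._records.extend(Records)
        return {"FailedRecordCount": 0, "Records": [{} for _ in Records]}

    def get_shard_iterator(self, StreamName, ShardId, ShardIteratorType):
        # The iterator is just an offset into the in-memory list.
        return {"ShardIterator": "0"}

    def get_records(self, ShardIterator, Limit):
        start = int(ShardIterator)
        batch = self._records[start:start + Limit]
        return {"Records": batch, "NextShardIterator": str(start + len(batch))}


client = InMemoryKinesis()
events = [{"device_id": i % 3, "temp": 20 + i} for i in range(7)]
failed = send_events(client, "demo", events, "device_id", batch_size=3)
received = read_shard(client, "demo", "shardId-000000000000")
```

With boto3 installed and credentials configured, passing `boto3.client("kinesis")` in place of the stand-in should exercise the same two functions against a real stream. A production consumer would more likely use the Kinesis Client Library or AWS Lambda than poll the Get* APIs by hand.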
Amazon Kinesis Streams vs. Amazon Kinesis Firehose
Amazon Kinesis Streams is for use cases that require custom processing, per incoming record, with sub-1 second processing latency, and a choice of stream processing frameworks.
Amazon Kinesis Firehose is for use cases that require zero administration, ability to use existing analytics tools based on Amazon S3, Amazon Redshift and Amazon Elasticsearch, and a data latency of 60 seconds or higher.
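The comparison above can be read as a simple rule of thumb. The sketch below encodes it as a small Python function. The function name and parameters are hypothetical, and the 60-second threshold is taken directly from the slide rather than from any service limit checked here.

```python
def choose_kinesis_service(max_latency_seconds: float, needs_custom_processing: bool) -> str:
    """Pick a service using the rule of thumb from the slide.

    Streams: custom per-record processing, or latency tighter than Firehose offers.
    Firehose: zero administration, when a data latency of 60 seconds or more is fine.
    """
    if needs_custom_processing or max_latency_seconds < 60:
        return "Kinesis Streams"
    return "Kinesis Firehose"


# A clickstream dashboard that must react within a second.
dashboard = choose_kinesis_service(max_latency_seconds=0.5, needs_custom_processing=False)
# Log archival to S3 where a five-minute delay is acceptable.
archive = choose_kinesis_service(max_latency_seconds=300, needs_custom_processing=False)
```

The two services are not mutually exclusive: a workload can use Streams for the low-latency path and Firehose for delivery into S3.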
Demonstration
S3 is the Data Lake
Data Lake Reference Architecture
Athena, Glue
Processing & Analytics
Real-time: AWS Lambda, Apache Storm on EMR, Apache Flink on EMR, Spark Streaming on EMR, Elasticsearch Service, Kinesis Analytics, Kinesis Streams
Batch: EMR (Hadoop, Spark, Presto), Redshift (Data Warehouse), Athena (Query Service)
AI & Predictive: Amazon Lex (Speech recognition), Amazon Rekognition, Amazon Polly (Text to speech), Machine Learning (Predictive analytics)
BI & Data Visualization
Transactional & RDBMS: DynamoDB (NoSQL DB), Aurora (Relational Database)
Kinesis Streams & Firehose
Data Lake Architecture with AWS Tools
Data Sources
Amazon Kinesis Streams & Firehose
Amazon S3 Data Lake
Streaming Analytics Tools: AWS Lambda, Apache Storm on EMR, Apache Flink on EMR, Spark Streaming on EMR
Amazon Kinesis Analytics
Amazon EMR (Hadoop / Spark)
Amazon Redshift (Data Warehouse)
Amazon DynamoDB (NoSQL DB & Graph DB)
Amazon Elasticsearch Service
Amazon Aurora (Relational Database)
Amazon Machine Learning
Open Source Tool of Choice on EC2
Data Science Sandbox
Visualization / Reporting
Summary
• AWS enables you to build sophisticated big data applications
  • Retrospective, Real-time, Predictive
• You can build incrementally, adding use cases and increasing scale as you go
• AWS provides a broad range of security and auditing features to enable you to meet your security requirements
• AWS makes it easy to build hybrid applications that span across your datacenters and the AWS Cloud
https://aws.amazon.com/big-data/ (also https://aws.amazon.com/iot/)