Big Data Analytics on AWS
Post on 22-Jan-2018
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Dickson Yue, Solutions Architect
17 June 2016
Big Data Analytics on AWS Digital Innovation & e-Commerce Track
How to get started?
Data Answers
START HERE WITH A BUSINESS CASE
Revenue Lift
Market acquisition
Product recommendation
Improve user experience
Operation intelligence
Data Answers
Time to Answer (Latency) Throughput
Cost
Ingest/Collect -> Store -> Process/Analyze -> Consume/Visualize
Data -> Ingest/Collect -> Store -> Process/Analyze -> Consume/Visualize -> Answers
Amazon S3 Amazon Kinesis Amazon DynamoDB Amazon RDS
Amazon EMR
Amazon Redshift
Amazon Machine Learning
Storage Processing Visualize
Amazon Elasticsearch Service
QuickSight
ElastiCache
Tracking Clickstream, user retention
Answer
• User retention
• High spending customer navigation pattern
• Product recommendation
• User journey in the shop
• UX improvement
• What deal/ad to try next
Use case
Data source
• Page
• Click event
• Web log
• Thing event
JavaScript (Snowplow)
AWS SDK
Logstash
Fluentd
Ingest Store
@ 30km/s a.k.a 300 rps
HTTP Post
Amazon S3
Storage
@ 100km/s Ingest Store
JavaScript (Snowplow)
AWS SDK
LOG4J
Flume
Fluentd
HTTP Post
Amazon Kinesis
Firehose
API Server Streaming Buffer
24hrs-7days
Web Servers
Amazon S3
Storage Data lake
@ 100km/s Ingest Store
JavaScript (Snowplow)
AWS SDK
LOG4J
Flume
Fluentd
HTTP Post
Amazon S3
Amazon Kinesis
Firehose
API Gateway
API Server Streaming Buffer
24hrs-7days
Storage Data lake
Amazon S3
Storage Data lake
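The ingest path above (events -> Amazon Kinesis Firehose -> Amazon S3) could be sketched roughly as follows. This is a minimal illustration, not the presenter's code: the event fields and the stream name are hypothetical, and the `client` argument is assumed to be a boto3 Firehose client (or anything exposing the same `put_record_batch` call). The sketch respects the 500-records-per-call limit of PutRecordBatch but ignores the per-call size limit and does not retry failed records.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List

# Firehose PutRecordBatch accepts at most 500 records per call.
MAX_RECORDS_PER_BATCH = 500


def to_firehose_record(event: Dict[str, Any]) -> Dict[str, bytes]:
    """Serialize one event as a JSON line so the objects landing in S3 stay line-oriented."""
    return {"Data": (json.dumps(event, sort_keys=True) + "\n").encode("utf-8")}


def chunked(records: List[Dict[str, bytes]], size: int) -> Iterator[List[Dict[str, bytes]]]:
    """Yield consecutive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def send_events(client: Any, stream_name: str, events: Iterable[Dict[str, Any]]) -> int:
    """Send events in batches and return how many records Firehose reported as failed."""
    records = [to_firehose_record(e) for e in events]
    failed = 0
    for batch in chunked(records, MAX_RECORDS_PER_BATCH):
        response = client.put_record_batch(DeliveryStreamName=stream_name, Records=batch)
        failed += response.get("FailedPutCount", 0)
    return failed
```

With boto3 installed and credentials configured, usage might look like `send_events(boto3.client("firehose"), "clickstream", events)`, where "clickstream" is a placeholder delivery stream name.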
Store Process/Analyze
EMR
Redshift
Redshift EMR ETL
Visualize
JDBC ODBC
JDBC ODBC
QuickSight
Amazon S3
Store Process
EMR
Visualize
JDBC ODBC
Redshift Basket
CRM ERP DBs
Log file
QuickSight
Day-14 retention over time
User retention and growth
N-day retention
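The slides do not define N-day retention, so the sketch below assumes one common definition: for each cohort of users first seen on a given day, the share who were active again exactly N days later. The input shape, a list of `(user_id, day)` activity pairs, is also an assumption made for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict, Iterable, Set, Tuple


def n_day_retention(events: Iterable[Tuple[str, date]], n: int) -> Dict[date, float]:
    """Per cohort day, return the share of new users active again exactly n days later."""
    # Collect the set of active days for every user.
    active_days: Dict[str, Set[date]] = defaultdict(set)
    for user_id, day in events:
        active_days[user_id].add(day)

    cohort_size: Dict[date, int] = defaultdict(int)
    retained: Dict[date, int] = defaultdict(int)
    for days in active_days.values():
        first_seen = min(days)  # the user's cohort is the first day they appear
        cohort_size[first_seen] += 1
        if first_seen + timedelta(days=n) in days:
            retained[first_seen] += 1

    return {day: retained[day] / size for day, size in cohort_size.items()}
```

Calling `n_day_retention(events, 14)` for each cohort day gives the series behind a "Day-14 retention over time" chart. At the scale discussed here the same grouping would more likely run as a query in Redshift or a job on EMR than in a single Python process.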
Social listening Social CRM, Chatbot
Answer
• Campaign performance
• Customer service automation
• Building Chatbot
Use case
Data
• Brand page activity
• Post
• #hashtag
• User profile
Logstash
AWS SDK
Ingest Store
Bot AWS SDK
App
Crawlers AWS SDK
Amazon Kinesis
Firehose
Store
Amazon S3 Data Lake
Elasticsearch (last 120 mins)
Analysts
AWS SDK
Why do we need machine learning for this?
The social media stream is high-volume, and most of the messages are not CS-actionable
Logstash
AWS SDK
Ingest Store
Bot AWS SDK
App
Crawlers AWS SDK
Amazon Kinesis
Process
AWS Lambda
Analyze
AWS SDK
Machine learning
Notification
Action
Support issue
Database
Feature request
Keep training the ML model with new data
Action
Amazon S3
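The Kinesis -> Lambda -> machine learning -> action flow above might look roughly like the handler below. It is a sketch under stated assumptions: the message field `text` is hypothetical, and `predict_actionable` is a keyword stub standing in for a call to a trained model (for example one hosted in Amazon Machine Learning), which the slides do not show. The only AWS-specific detail relied on is that Lambda delivers Kinesis records base64-encoded under `Records[].kinesis.data`.

```python
import base64
import json
from typing import Any, Dict, List

# Hypothetical keyword list standing in for a trained classifier.
ACTIONABLE_KEYWORDS = ("refund", "broken", "not working", "cancel", "help")


def predict_actionable(text: str) -> bool:
    """Stub prediction: True if the message looks like something customer service should act on."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in ACTIONABLE_KEYWORDS)


def handler(event: Dict[str, Any], context: Any) -> Dict[str, List[Dict[str, Any]]]:
    """Decode each Kinesis record and split messages into actionable and ignored."""
    actionable: List[Dict[str, Any]] = []
    ignored: List[Dict[str, Any]] = []
    for record in event.get("Records", []):
        payload = base64.b64decode(record["kinesis"]["data"])
        message = json.loads(payload)
        if predict_actionable(message.get("text", "")):
            actionable.append(message)  # e.g. notify support or open a ticket
        else:
            ignored.append(message)  # still worth archiving as future training data
    return {"actionable": actionable, "ignored": ignored}
```

In a real deployment the actionable branch would trigger the notification or database write shown in the diagram, and both branches would be written to Amazon S3 so the model can be retrained on new data.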
AWS SDK
Ingest Store
Bot AWS SDK
Messenger
Amazon Kinesis
Process
AWS Lambda
Analysts
Machine learning
Action
Bot
App
Get prediction
Keep training the ML model with new data Amazon S3
Operation intelligence (OI) from a business view, with custom sources
Refrigerator
POS
Door sensor
Water
Camera
Storefront
Kitchen
Lambda
SQS
AWS IoT
SQSPoller
Http Event Collector
Serverless Architecture
Our Big Data Scale
• Total ~25 PB DW on Amazon S3
• Read ~10% DW daily
• Write ~10% of read data daily
• ~550 billion events daily
• ~350 active platform users
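To put the percentages above into absolute terms, here is a quick back-of-the-envelope calculation. It assumes decimal units (1 PB = 1000 TB) and an even spread of events over the day; the slide states neither. It works out to about 2.5 PB read and 250 TB written per day, and roughly 6.4 million events per second on average.

```python
PB_IN_TB = 1000  # assumption: decimal units; the slide does not say
SECONDS_PER_DAY = 24 * 60 * 60

dw_total_pb = 25          # ~25 PB data warehouse on Amazon S3
events_per_day = 550e9    # ~550 billion events daily

daily_read_pb = dw_total_pb * 0.10                 # read ~10% of the DW daily
daily_write_tb = daily_read_pb * 0.10 * PB_IN_TB   # write ~10% of what is read
events_per_second = events_per_day / SECONDS_PER_DAY  # average, assuming an even spread

print(f"daily read:  {daily_read_pb:.1f} PB")
print(f"daily write: {daily_write_tb:.0f} TB")
print(f"events/sec:  {events_per_second:,.0f}")
```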
Predict what you want to watch before you watch it.
Netflix Prize - best collaborative filtering algorithm
Storage Compute Service Tools
Big Data Portal
API Portal
Big Data API
Amazon S3
Data -> Ingest/Collect -> Store -> Process/Analyze -> Consume/Visualize -> Answers
START WITH A BUSINESS CASE
MATCH AVAILABLE DATA
CHOOSE BEST FIT
Amazon S3 Amazon Kinesis Amazon DynamoDB Amazon RDS
Amazon EMR
Amazon Redshift
Amazon Machine Learning
Storage Processing Visualize
Amazon Elasticsearch Service
QuickSight
ElastiCache
Source DBs
3rd Party Data
Log Data
Reporting
Analysis
Processing
Data Lake
S3
Source of truth
Remember to complete your evaluations!
Thank you
CRM ERP DBs
Log file
AWStats
days
MB
2002 Big bang
<2005 Hello world
Page/Event tracking
GA
hours
GB
SOLOMO
minutes - hours
TB
<2008 New customer service
New System monitoring New QA
IoT
O2O
seconds – hours PB
2016 Fast and big
data driven marketing
Analytics
ETL
Interactive data exploration
Interactive slice & dice
RT analytics & iterative/ML algo and more ...
Different Big Data Processing Needs