(ARC303) Panning for Gold: Analyzing Unstructured Data | AWS re:Invent 2014

Upload: amazon-web-services

Post on 14-Jul-2015

Category: Technology


TRANSCRIPT

[Chart: Global Data in Zettabytes (y-axis, 0 to 45) by Year (x-axis, 2005 to 2020)]

1 ZB = 10²¹ bytes = 1,000,000,000,000,000,000,000 bytes = 1,000 exabytes

In binary units, 1 ZiB = 2⁷⁰ bytes = 1,024 EiB, or about 1.18 × 10²¹ bytes
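The unit arithmetic can be checked directly. The following is a minimal sketch; the constant names, the helper function, and the 40 ZB total used in the usage line are illustrative assumptions, not figures read off the chart.

```python
# Decimal (SI) and binary (IEC) definitions of the storage units.
ZETTABYTE = 10**21   # 1 ZB  (SI)
EXABYTE = 10**18     # 1 EB  (SI)
ZEBIBYTE = 2**70     # 1 ZiB (IEC)
EXBIBYTE = 2**60     # 1 EiB (IEC)


def unstructured_zb(total_zb: float, share: float = 0.85) -> float:
    """Zettabytes of unstructured data, using the slide's 'about 85%' share."""
    return total_zb * share


print(ZETTABYTE // EXABYTE)    # exabytes per zettabyte (decimal)
print(ZEBIBYTE // EXBIBYTE)    # exbibytes per zebibyte (binary)
print(f"{ZEBIBYTE:,}")         # bytes in one zebibyte
print(unstructured_zb(40))     # hypothetical 40 ZB total, for illustration only
```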

About 85% is unstructured data

Volume

Variety

Variability

Velocity

90% of the world’s data has been generated in the last 2 years

Limited View of Customer

Internal Data: Server Logs, Data Center, Database

Structured

• Customer Profile

• Product Purchase Statistics

• Product Catalog & Inventory

Unstructured

• Surveys & Customer Reviews

• Emails & Support Requests

• Audio & Video Discussions*

Objective: To personalize and improve online user experience

Complete View of Customer

Internal Data: Server Logs, Data Center, Database

Structured

• Customer Profile

• Product Purchase Statistics

• Product Catalog & Inventory

Unstructured

• Surveys & Customer Reviews

• Emails & Support Requests

• Audio & Video Discussions*

External Data: Social, Reports

Structured

• Social & Professional Profile

• Data from External APIs

• User Location Details

Unstructured

• External Panel Data, Webpages

• Blogs, Reviews, Social Activity

• Likes, Connections, Videos

From: Device/Form Factors

Mobile, PC, Tablet

*Across Browsers/Apps

How: Data Collection

APIs

Third Party Data Providers

Client Data

Social

Chat

What: Data Variants

User Profile

HTML & Images

Location & Time

Surveys & Reviews

Feeds

Social

Chat

Reports

Base EC2 Node

Input Configuration

Amazon SQS

Amazon S3 Input

Launch Instances

Amazon S3 Code & Input List

Amazon S3 Output

Send Job Messages

Pull Job Messages in Parallel

Fabric

Read Input Files from Amazon S3 in Parallel

Write Output Files to Amazon S3 in Parallel

Alarms

Email & Notifications

Process

Logs
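The flow named in the labels above (send job messages, pull them in parallel, read inputs, write outputs) can be sketched without any AWS dependency. This is an illustration only, not the presenters' implementation: a `queue.Queue` stands in for Amazon SQS, two dictionaries stand in for the Amazon S3 input and output buckets, and the `process` step and file names are hypothetical.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

# In-memory stand-ins (assumptions): job_queue plays the role of Amazon SQS,
# s3_input and s3_output play the roles of the S3 input and output buckets.
job_queue = queue.Queue()
s3_input = {
    "page1.html": "<a href='#'>one</a>",
    "page2.html": "<a>two</a><a>three</a>",
}
s3_output = {}


def process(body):
    """Hypothetical processing step: count opening anchor tags."""
    return body.lower().count("<a")


def worker():
    """Pull job messages until the queue is empty; return how many were handled."""
    handled = 0
    while True:
        try:
            key = job_queue.get_nowait()          # pull one job message
        except queue.Empty:
            return handled
        s3_output[key] = process(s3_input[key])   # read input, write output
        handled += 1


def run(num_workers=4):
    """Send one job message per input file, then drain the queue in parallel."""
    for key in s3_input:
        job_queue.put(key)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [pool.submit(worker) for _ in range(num_workers)]
        return sum(f.result() for f in futures)


handled = run()
print(handled, s3_output)
```

Because each worker only pulls from the shared queue, adding workers changes throughput but not the job definitions, which is what lets the worker count be adjusted independently of the input list.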

[Diagram: HTML document tree. Root HTML node with DIV 1, DIV 2, DIV 3 and DIV 4; nested <UL>, <OL> and <LI> list nodes, A (anchor) nodes and an IMG node; shown alongside labels Feature 1 through Feature 6.]

Panel & Web Logs

Social

Rules Engine

Data Parser

• Tweets

• Comments

• Likes

• Shares

• Blogs

• Reviews

• Clickstream

• HTML

• Images

• Audio*

• Video*

Feature     Type    Detail
Feature 1   Image   600*400
Feature 2   Link    #
Feature 3   Price   $200
Feature 4   Star    3.5
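A feature table of this shape could be produced by walking a parsed HTML page and emitting one row per element of interest. The sketch below uses Python's standard `html.parser`; the sample page, the `price` and `stars` class names, and the `FeatureParser` class are hypothetical and are not taken from the talk.

```python
from html.parser import HTMLParser


class FeatureParser(HTMLParser):
    """Collect (type, detail) pairs from a product page fragment."""

    # Assumed markup convention: text features carry one of these class names.
    TEXT_CLASSES = {"price": "Price", "stars": "Star"}

    def __init__(self):
        super().__init__()
        self.features = []     # (type, detail) tuples in document order
        self._pending = None   # feature type whose text is still expected

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img":
            size = f"{attrs.get('width')}*{attrs.get('height')}"
            self.features.append(("Image", size))
        elif tag == "a" and "href" in attrs:
            self.features.append(("Link", attrs["href"]))
        elif attrs.get("class") in self.TEXT_CLASSES:
            self._pending = self.TEXT_CLASSES[attrs["class"]]

    def handle_data(self, data):
        # Attach the first non-blank text after a price/stars element to it.
        if self._pending and data.strip():
            self.features.append((self._pending, data.strip()))
            self._pending = None


page = """
<div><img src="item.png" width="600" height="400"></div>
<div><a href="#">Details</a></div>
<div><span class="price">$200</span> <span class="stars">3.5</span></div>
"""

parser = FeatureParser()
parser.feed(page)
for number, (kind, detail) in enumerate(parser.features, start=1):
    print(f"Feature {number}  {kind:<6} {detail}")
```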

Tweet     Time    View
Tweet 1   12:00   Positive
Tweet 2   12:05   Neutral
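One simple way to assign a view such as Positive or Neutral to a tweet is to count hits against word lists. The slides do not say which method was used, so the following is only a lexicon-based sketch; the word lists and the tweet texts are invented for illustration.

```python
import re

# Tiny illustrative lexicon (an assumption, not the presenters' word list).
POSITIVE = {"love", "great", "excellent", "good", "fast"}
NEGATIVE = {"hate", "bad", "slow", "broken", "terrible"}


def classify(text):
    """Label text Positive, Negative, or Neutral by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"


# Hypothetical tweet texts; only the labels and times mirror the table.
tweets = [
    ("Tweet 1", "12:00", "Love the new checkout flow, great job"),
    ("Tweet 2", "12:05", "Ordered a phone case today"),
]
for name, time, text in tweets:
    print(name, time, classify(text))
```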

AWS technology                                   Use
AWS Identity and Access Management (IAM)         Security and access
Amazon CloudWatch                                Monitoring infrastructure
Auto Scaling                                     Rule-based dynamic scaling
Amazon Simple Email Service (Amazon SES)         Notification and emails
Amazon Simple Notification Service (Amazon SNS)  Alarms and notification
AWS CloudTrail                                   User activity and change tracking
AWS CloudFormation                               Deployment templates
AWS Trusted Advisor                              Cloud optimization
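To make "rule-based dynamic scaling" concrete, a rule can be written as a pure function from a load metric to a desired worker count. This is a sketch of the idea only: real Auto Scaling policies are configured in AWS rather than written this way, and the backlog metric, the jobs-per-worker ratio, and the limits below are all hypothetical.

```python
import math


def desired_workers(backlog, per_worker=100, min_workers=1, max_workers=20):
    """Rule: one worker per `per_worker` queued jobs, clamped to [min, max]."""
    needed = math.ceil(backlog / per_worker)
    return max(min_workers, min(max_workers, needed))


for backlog in (0, 250, 5000):
    print(backlog, desired_workers(backlog))
```

Clamping to a minimum and maximum keeps the fleet from scaling to zero when the queue is briefly empty and from growing without bound during a spike.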

On Demand (66%)

Spot (33%)

[Architecture diagram; labels in reading order]

Data Provider

Real Time

API

Social

Chat

Batch

Batch

Reviews

Surveys

Client

Amazon Redshift

Unified Data Store

IAM

AWS CloudFormation

AWS CloudFormation

Operations Tracking for SLA

Amazon DynamoDB

Amazon Kinesis

Auto Scaling

Sentiment Parser Workers

Amazon EC2

Indexers

Scaled-Up Input Data Receiver

Amazon S3

Amazon EMR

Amazon S3

Amazon EC2

Content Crawlers

Amazon EC2

Lexical Analyzer Workers

Amazon S3

Alarms

Email Notification

Operations Logs

Amazon S3

Amazon CloudWatch

Devices

“Platform we have built has given business teams the muscle and insight that they have never seen before”

“This unique user view has given Product teams an excellent lens into what drives user behaviour and how they can positively impact it!”

Please give us your feedback on this session.

Complete session evaluations and earn re:Invent swag.

http://bit.ly/awsevals