(SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014

Post on 30-Jun-2015


Category:

Technology


DESCRIPTION

An increasingly common form of computing is computation in response to recently occurring events. These might be newly arrived or changed data, such as an uploaded Amazon S3 image file or an update to an Amazon DynamoDB table, or they might be changes in the state of some system or service, such as termination of an EC2 instance. Support for this form of computing requires both a means of efficiently surfacing events as a sequence of change records, as well as frameworks for processing such change logs. This session provides an overview of how AWS intends to facilitate event-driven computing through support for both change logs as well as various means of processing them.

TRANSCRIPT

© 2014 Amazon Web Services, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon Web Services, Inc.

November 13, 2014 | Las Vegas

Event-Driven Computing @ AWS

Marvin Theimer & Khawaja Shams

It’s All About Timeliness and Agility

Traditional Way of Doing Things

• My phone or my camera uploads an image file to my S3 bucket

– Maybe my phone is smart enough to index my photos by their GPS coordinates

• Now I buy an app that creates photo albums by city, region, and country

– Don’t want to run it on my phone!

– Could be a web service – but now I’m handing all my personal photos to a 3rd party

– Could be a cloud app that I periodically invoke – but now I have to remember to invoke it and remember which photos to run it on

– Could set up a recurring work flow to run it – but now I’m paying to run it all the time

• Now I buy another app that does face recognition and tags all my friends

• Now I buy another app that does ____

• …

Traditional Way of Doing Things

• Launch an EC2 instance as part of your application

• Your application is highly available

– Did you set up all the appropriate alarms on the new instance?

• Your application is secure

– Did you tell your intrusion detection service about the new instance?

• Your company is cost conscious

– Did you tag the instance with the appropriate cost allocation tags?

Event-Driven Way of Doing Things

• Your phone or camera uploads an image to your S3 bucket

• An event is generated telling all interested parties about the upload

• Your photo gets indexed by GPS location

• Your photo gets added to all relevant photo albums

• Your photo gets tagged with friend references

• Ideally: You didn’t have to do anything other than purchase all those cool apps

Event-Driven Way of Doing Things

• An EC2 instance is launched

• An event is generated telling all interested parties about the creation of the new instance

• Appropriate monitors and alarms are created

• Intrusion detection learns of the new instance

• The appropriate cost allocation tags are added

• Ideally:

– You didn’t have to do anything other than launch the EC2 instance

– Application developers don’t need to know about all the “auxiliary” activities that have to happen

How Do You Discover New Events?

Anti-Pattern in Discovering New Events

Periodically Scan Entire Dataset

List S3-buckets (ListingA): {millions of objects}

List S3-buckets (ListingB): {millions of objects + 3 objects}

Diff (ListingA – ListingB): 3 objects
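The cost of this scan-and-diff pattern can be illustrated with a minimal Python sketch. It assumes the two listings are already in memory as sets; the key names are made up, the sizes are scaled down, and no AWS call is made.

```python
def diff_listings(previous, current):
    """Return keys present in the current listing but not in the previous one."""
    return current - previous


# Hypothetical stand-ins for two full listings of the same bucket.
listing_a = {f"photos/img-{i:07d}.jpg" for i in range(100_000)}
listing_b = listing_a | {"photos/new-1.jpg", "photos/new-2.jpg", "photos/new-3.jpg"}

new_keys = diff_listings(listing_a, listing_b)

# Every key in both listings has to be fetched and compared,
# even though only a handful of objects actually changed.
keys_examined = len(listing_a) + len(listing_b)
print(f"examined {keys_examined} keys to find {len(new_keys)} new objects")
```

The work grows with the size of the whole dataset rather than with the number of changes, which is the inefficiency an event log avoids.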

Anti-Pattern in Discovering New Events

Periodically polling all system state

Ec2-describe-instances (ListingA): {thousands of instances}

Ec2-describe-instances (ListingB): {thousands of instances + 3 instances}

Diff (ListingA – ListingB): 3 instances

Event Logs reduce the problem back to traditional computing

CloudTrail event log for API calls

Event Driven Computing in AWS Up Till Now

Customer 1

Customer 2

Customer 3

S3

Event Driven Computing in AWS Today

S3 event notifications

SQS
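As a rough illustration of consuming such a notification from a queue, the sketch below parses one message body and extracts the uploaded objects. The slides do not show the message format; the shape used here (a `Records` list with `eventName`, `s3.bucket.name`, and a URL-encoded `s3.object.key`) follows the publicly documented S3 notification structure, and the bucket and key values are invented for the example.

```python
import json
from urllib.parse import unquote_plus

# Hypothetical message body, shaped like an S3 event notification.
SAMPLE_BODY = json.dumps({
    "Records": [
        {
            "eventSource": "aws:s3",
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "my-photo-bucket"},
                "object": {"key": "photos/las+vegas.jpg", "size": 1024},
            },
        }
    ]
})


def uploaded_objects(message_body):
    """Yield (bucket, key) for each object-created record in a notification."""
    for record in json.loads(message_body).get("Records", []):
        if record.get("eventName", "").startswith("ObjectCreated:"):
            s3 = record["s3"]
            # Object keys arrive URL-encoded, so decode before use.
            yield s3["bucket"]["name"], unquote_plus(s3["object"]["key"])


uploads = list(uploaded_objects(SAMPLE_BODY))
print(uploads)
```

In a real consumer the body would come from a received queue message rather than a constant.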

Event Driven Computing in AWS Today

DynamoDB Streams

Event Driven Computing in AWS Tomorrow

Event logs for asynchronous service events

Event logs from other data storage services

Customer 1

Customer 2

Customer 3

Vision: Unified Event Log Approach

Kinesis

SQS

DynamoDB Streams

S3 Archive Objects

S3 Archive Objects

SNS

Challenge:

A Unified Event Log Approach

SQS (Unordered Events)

Kinesis (Ordered Events)

Plus: easy conversion to other standard forms: S3 archive format, SNS, …

Benefits of Unification

Layers of Commonality -- Storage

Sequence of bytes

Layers of Commonality -- Storage

File System:

- Everyone can have their own sequence of bytes

- Tools for managing and manipulating byte sequences

Layers of Commonality -- Storage

Typed files:

- Application-specific state

- Tools for managing and manipulating structured information across many files; e.g. search

Layers of Commonality – Event Logs

Sequence of un-interpreted records

Layers of Commonality – Event Logs

Event logs:

- Everyone can have their own sequence of records

- Tools for managing and manipulating sequences of records

Layers of Commonality – Event Logs

Typed event logs:

- DynamoDB update streams

- Tools for managing and manipulating structured information across many files; e.g. cross-region replication

Reusable Processing Infrastructure

Challenge: Cloud Scale

Challenge: High Availability

X

Free pool

Challenge: Elastic / Highly variable workloads

Free pool

Standard “Big Data” processing framework that automates most of the “muck”

Foundations

Lowest level of abstraction: un-interpreted sequence of records

A key characteristic: ordered vs. unordered

- Ordered (e.g. multi-item transactions or “delta” updates)

- Unordered (e.g. S3 image upload notifications)
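Why ordering matters for “delta” updates but not for upload notifications can be shown with a small Python sketch. The record shapes (`("set", v)` and `("add", v)` tuples, and file names) are invented for illustration and are not a format used by any AWS service.

```python
def apply_deltas(initial, deltas):
    """Apply ("set", v) and ("add", v) updates in the order given."""
    value = initial
    for op, amount in deltas:
        value = amount if op == "set" else value + amount
    return value


# Ordered case: the same updates replayed in a different order
# leave the item in a different final state.
deltas = [("set", 10), ("add", 5), ("set", 0), ("add", 1)]
in_order = apply_deltas(0, deltas)
out_of_order = apply_deltas(0, list(reversed(deltas)))
print(in_order, out_of_order)

# Unordered case: each upload notification is handled independently,
# so the set of indexed photos is the same for any arrival order.
uploads = ["a.jpg", "b.jpg", "c.jpg"]
indexed_forward = set(uploads)
indexed_backward = set(reversed(uploads))
print(indexed_forward == indexed_backward)
```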

Unordered Log Processing Using SQS

SQS queue X

ASG
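The worker-failure scenario on this slide can be sketched in Python without touching AWS. `FakeQueue` below is a hypothetical in-memory stand-in that imitates only two queue behaviours relevant here: messages are handed out in no particular order, and a message that was received but never deleted becomes visible again. It is an illustration under those assumptions, not the SQS API.

```python
import random


class FakeQueue:
    """In-memory stand-in for an unordered queue (hypothetical, illustration only)."""

    def __init__(self, bodies, seed=0):
        self._visible = list(bodies)
        self._in_flight = {}
        self._rng = random.Random(seed)
        self._next_handle = 0

    def receive(self):
        """Hand out one message in arbitrary order, or None if none are visible."""
        if not self._visible:
            return None
        body = self._visible.pop(self._rng.randrange(len(self._visible)))
        handle = self._next_handle
        self._next_handle += 1
        self._in_flight[handle] = body
        return handle, body

    def delete(self, handle):
        self._in_flight.pop(handle, None)

    def expire_visibility(self):
        """Make received-but-undeleted messages visible again."""
        self._visible.extend(self._in_flight.values())
        self._in_flight.clear()


def run_worker(queue, processed, fail_on=None):
    """Drain the queue; return False if the worker 'crashes' mid-message."""
    while (message := queue.receive()) is not None:
        handle, body = message
        if body == fail_on:
            return False  # simulated crash: the message is never deleted
        processed.add(body)  # idempotent, so a redelivery is harmless
        queue.delete(handle)
    return True


queue = FakeQueue(["img-1.jpg", "img-2.jpg", "img-3.jpg"])
processed = set()
run_worker(queue, processed, fail_on="img-2.jpg")  # first worker dies
queue.expire_visibility()                          # timeout elapses
run_worker(queue, processed)                       # replacement worker finishes
print(sorted(processed))
```

Because the handler is idempotent and order-independent, a replacement worker from the auto-scaling group can simply pick up where the failed one left off.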

Ordered Log Processing

Kinesis

Ordered Log Processing Using the Kinesis Client Library

Shard mgmt table

User State

Kinesis-enabled application

ASG

Use of the KCL

Mostly writing business logic
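The division of labour described above might look roughly like the following Python sketch. None of these names come from the real Kinesis Client Library: `CountingProcessor` plays the role of the user's business logic, `run_shard` is a hypothetical stand-in for what the library does on the user's behalf, and the `checkpoints` dict stands in for the shard management table.

```python
class CountingProcessor:
    """Business logic only: receives ordered batches of records from one shard."""

    def __init__(self):
        self.counts = {}

    def process_records(self, records):
        for _sequence_number, data in records:
            self.counts[data] = self.counts.get(data, 0) + 1


def run_shard(shard, processor, checkpoints, shard_id, batch_size=2):
    """Hypothetical library loop: resume after the checkpoint, deliver in order."""
    last_done = checkpoints.get(shard_id, 0)
    pending = [record for record in shard if record[0] > last_done]
    for start in range(0, len(pending), batch_size):
        batch = pending[start:start + batch_size]
        processor.process_records(batch)
        # Record progress so another worker can take over this shard.
        checkpoints[shard_id] = batch[-1][0]


# Records are (sequence_number, data) pairs, already ordered within the shard.
shard = [(1, "a"), (2, "b"), (3, "a"), (4, "c"), (5, "a")]
checkpoints = {"shard-0": 2}  # a previous worker already handled records 1 and 2

processor = CountingProcessor()
run_shard(shard, processor, checkpoints, "shard-0")
print(processor.counts, checkpoints)
```

Only `CountingProcessor` is application code; shard assignment, ordering, and checkpointing live in the surrounding loop.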

Kinesis vs. DynamoDB Update Streams

• The Kinesis API and the DynamoDB Update Streams API differ

– Different max record sizes

– DynamoDB controls all aspects of writing to streams

• this includes naming of streams, provisioning, sharding, and resharding

– ListStream and DescribeStream in DynamoDB include service-specific semantics (e.g. Describe returns the table schema)

• Kinesis Client Library (KCL) abstracts these differences away

– Best way to write applications for either Kinesis streams or DynamoDB update streams

– Applications that are agnostic to which type of stream is being processed can transparently target either type

Higher-Level Processing Frameworks

• SQS and Kinesis-enabled applications are low-level frameworks:

– You still need to create AMIs, launch EC2 instances, configure auto-scaling groups, etc.

– “All I want is X. Can’t someone just create that for me?”

• Lambda eliminates the operations/management tasks

• Opportunity: High level capabilities – e.g. archive-to-S3, upload-to-Redshift, or publish-to-SNS – can be provided as predefined functions that can be attached to an event log
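A function attached to an event log could be as small as the Python sketch below. It assumes a Lambda-style `handler(event, context)` entry point and a batch of DynamoDB-stream-like change records with `eventName` and `dynamodb.Keys` fields; the slides do not show either, and the table key `PhotoId` and its values are invented. The function only formats messages and leaves the actual publish step out.

```python
def describe_change(record):
    """Turn one change record into a short human-readable line."""
    keys = record["dynamodb"]["Keys"]
    key_text = ", ".join(
        f"{name}={list(value.values())[0]}" for name, value in sorted(keys.items())
    )
    return f"{record['eventName']} {key_text}"


def handler(event, context):
    """Lambda-style entry point: one message per change record in the batch."""
    return [describe_change(record) for record in event.get("Records", [])]


# Hypothetical batch of change records for a table keyed by PhotoId.
SAMPLE_EVENT = {
    "Records": [
        {"eventName": "INSERT", "dynamodb": {"Keys": {"PhotoId": {"S": "img-1"}}}},
        {"eventName": "REMOVE", "dynamodb": {"Keys": {"PhotoId": {"S": "img-2"}}}},
    ]
}

messages = handler(SAMPLE_EVENT, None)
print(messages)
```

There is no AMI, instance, or auto-scaling group in this code; the function body is the entire deployment unit.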

Example: Cloud Mashups

Example: Cross Region Replication

How Many Event Logs?

Good for customer understanding of a particular service:

- What just happened?

- List everything that happened recently

Not so easy to understand things across multiple services

Too expensive for “data plane” events; wrong granularity:

- Log per S3 bucket

- Log per DynamoDB table

Event log per customer per service

Cust. 2593

Cust. 2593

Cust. 2593

Cust. 7302

Cust. 7302

Cust. 3826

Cust. 8941

Cust. 2590

Cust. 4198

Cust. 8368

Cust. 2505

Cust. 7731

How Many Event Logs?

Per customer event log of all control plane events

- Traffic volume small enough to simply merge all of it

- Makes it easy to understand the bigger picture

Cust. 2593

Cust. 7302

Cust. 3826

Optionally generated per “entity” event log for data storage services

- The right granularity

- Only incur traffic costs where necessary

Bucket Y

Bucket W

Table A

Table B

Table C

Bucket X

Bucket Z

Event Logs for Customers’ Services

Vision: customers’ services and applications leverage the AWS event log infrastructure

Cust. 2593

Cust. 7302

Cust. 3826

Widget A

Widget B

Widget C

www.widget.com

Per-customer control plane events sent to per-customer unified control plane log

Create & manage optional per-entity data plane event logs (e.g. as Kinesis streams)

Summary:

It’s All About Timeliness and Agility

• “Cycle time compression may be the most underestimated force in determining winners & losers in tech” – Marc Andreessen

• Real-time events let you create real-time applications

• Published events let 3rd parties independently innovate on top of each other

• Platform-wide event architecture lets independent parties start building composable tools and functions

• Low friction processing frameworks (e.g. Lambda) compress the development & operations cycle time

Summary:

Enablers for Pervasive Event-Driven Computing

• Efficient way of surfacing events: event logs

• Standards for discovery, access, semi-structured data formats, and processing of event logs

• Low and high-level processing frameworks that enable various degrees of control vs. simplicity

Summary:

AWS Offerings

• Pre-existing:

– CloudTrail

– SQS and KCL-enabled processing frameworks

• Newly-introduced:

– S3 event notifications

– DynamoDB Streams

– Lambda Cloud Functions

Opportunity for an Ecosystem

Enablers:

– Enumerate/discover event logs

– Standard, semi-structured data formats

– Standard processing frameworks – e.g. SQS, KCL, Lambda

Opportunity for an Ecosystem

Marketplace for free/for-fee software & services

– KCL-enabled libraries, Lambda functions, etc.

– Services that consume/emit streams of records – e.g. SQS or Kinesis records

Please give us your feedback on this presentation


Join the conversation on Twitter with

#reinvent

SPOT305

For further details attend the following deep dive sessions about S3 event notifications and DynamoDB streams:

- S3 event notifications: SDD413 (@2:15)

- DynamoDB streams: SDD424 (@5:30)
