(SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014

© 2014 Amazon Web Services, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon Web Services, Inc. November 13, 2014 | Las Vegas Event-Driven Computing @ AWS Marvin Theimer & Khawaja Shams

Upload: amazon-web-services

Post on 30-Jun-2015


DESCRIPTION

An increasingly common form of computing is computation in response to recently occurring events. These might be newly arrived or changed data, such as an uploaded Amazon S3 image file or an update to an Amazon DynamoDB table, or they might be changes in the state of some system or service, such as termination of an EC2 instance. Support for this form of computing requires both a means of efficiently surfacing events as a sequence of change records and frameworks for processing such change logs. This session provides an overview of how AWS intends to facilitate event-driven computing through support for change logs and for various means of processing them.

TRANSCRIPT

Page 1: (SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014

© 2014 Amazon Web Services, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon Web Services, Inc.

November 13, 2014 | Las Vegas

Event-Driven Computing @ AWS

Marvin Theimer & Khawaja Shams

Page 2: (SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014

It’s All About Timeliness and Agility

Page 3: (SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014

Traditional Way of Doing Things

• My phone or my camera uploads an image file to my S3 bucket

– Maybe my phone is smart enough to index my photos by their GPS coordinates

• Now I buy an app that creates photo albums by city, region, and country

– Don’t want to run it on my phone!

– Could be a web service – but now I’m handing all my personal photos to a 3rd party

– Could be a cloud app that I periodically invoke – but now I have to remember to invoke it and remember which photos to run it on

– Could set up a recurring work flow to run it – but now I’m paying to run it all the time

• Now I buy another app that does face recognition and tags all my friends

• Now I buy another app that does ____

• …

Page 4: (SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014

Traditional Way of Doing Things

• Launch an EC2 instance as part of your application

• Your application is highly available

– Did you set up all the appropriate alarms on the new instance?

• Your application is secure

– Did you tell your intrusion detection service about the new instance?

• Your company is cost conscious

– Did you tag the instance with the appropriate cost allocation tags?

Page 5: (SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014

Event-Driven Way of Doing Things

• Your phone or camera uploads an image to your S3 bucket

• An event is generated telling all interested parties about the upload

• Your photo gets indexed by GPS location

• Your photo gets added to all relevant photo albums

• Your photo gets tagged with friend references

• Ideally: You didn’t have to do anything other than purchase all those cool apps

Page 6: (SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014

Event-Driven Way of Doing Things

• An EC2 instance is launched

• An event is generated telling all interested parties about the creation of the new instance

• Appropriate monitors and alarms are created

• Intrusion detection learns of the new instance

• The appropriate cost allocation tags are added

• Ideally:

– You didn’t have to do anything other than launch the EC2 instance

– Application developers don’t need to know about all the “auxiliary” activities that have to happen

Page 7: (SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014

How Do You Discover New Events?

Page 8: (SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014

Anti-Pattern in Discovering New Events

Periodically Scan Entire Dataset

List S3-buckets (ListingA): {millions of objects}

List S3-buckets (ListingB): {millions of objects + 3 objects}

Diff (ListingA – ListingB): 3 objects

Page 9: (SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014

Anti-Pattern in Discovering New Events

Periodically polling all system state

ec2-describe-instances (ListingA): {thousands of instances}

ec2-describe-instances (ListingB): {thousands of instances + 3 instances}

Diff (ListingA – ListingB): 3 instances
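The cost difference between the two approaches can be sketched in a few lines of Python. This is only an illustration: the listings, the key names, and the list-based change log are made-up stand-ins, not an AWS API.

```python
def diff_listings(listing_a, listing_b):
    """Full-scan approach: compare two complete listings to find what is new."""
    return sorted(set(listing_b) - set(listing_a))


def read_log_from(change_log, offset):
    """Change-log approach: read only the records appended after `offset`."""
    return change_log[offset:]


# Hypothetical data: a large existing listing plus three new objects.
listing_a = [f"photo-{i:07d}.jpg" for i in range(100_000)]
new_objects = ["photo-new-1.jpg", "photo-new-2.jpg", "photo-new-3.jpg"]
listing_b = listing_a + new_objects

# Hypothetical change log: one record per object, in arrival order.
change_log = list(listing_b)
last_processed = len(listing_a)  # position reached by the previous run

scanned = diff_listings(listing_a, listing_b)       # touches ~200,000 entries
logged = read_log_from(change_log, last_processed)  # touches 3 entries

print(scanned == logged, len(logged))
```

Both calls find the same three objects, but the scan does work proportional to the whole dataset, while the log read does work proportional to the number of changes.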

Page 10: (SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014

Event Logs reduce the problem back to traditional computing

Page 11: (SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014

Event Driven Computing in AWS Up Till Now

CloudTrail event log for API calls

Customer 1

Customer 2

Customer 3

S3

Page 12: (SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014

Event Driven Computing in AWS Today

S3 event notifications

SQS
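A consumer of S3 event notifications delivered through SQS mostly has to unpack the notification body. The sketch below assumes the JSON notification layout S3 publishes (a `Records` list with `eventName` and `s3.bucket.name` / `s3.object.key`, keys URL-encoded). The bucket and key in the sample are invented, and no AWS SDK call is made.

```python
import json
from urllib.parse import unquote_plus


def uploaded_objects(message_body):
    """Return (bucket, key) pairs for object-created events in one message body."""
    notification = json.loads(message_body)
    uploads = []
    for record in notification.get("Records", []):
        if record.get("eventName", "").startswith("ObjectCreated:"):
            bucket = record["s3"]["bucket"]["name"]
            # Object keys arrive URL-encoded, so decode them before use.
            key = unquote_plus(record["s3"]["object"]["key"])
            uploads.append((bucket, key))
    return uploads


# Hypothetical message body, as it might be read from the queue.
sample_body = json.dumps({
    "Records": [{
        "eventSource": "aws:s3",
        "eventName": "ObjectCreated:Put",
        "s3": {
            "bucket": {"name": "my-photos"},
            "object": {"key": "2014/las+vegas/img_0001.jpg", "size": 1024},
        },
    }]
})

uploads = uploaded_objects(sample_body)
print(uploads)
```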

Page 13: (SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014

Event Driven Computing in AWS Today

DynamoDB Streams
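As a rough illustration of what a consumer does with a table's change records, the sketch below replays them against an in-memory replica. It assumes records shaped like DynamoDB Streams records (`eventName` of `INSERT`, `MODIFY`, or `REMOVE`, plus `dynamodb.Keys` and `dynamodb.NewImage`) and assumes new images are included in the stream. The table contents are invented.

```python
def apply_change(replica, record):
    """Apply one change record to a dict that mirrors the table."""
    change = record["dynamodb"]
    # Keys maps attribute name -> {type: value}; turn it into a hashable key.
    key = tuple(sorted(
        (name, next(iter(typed.values()))) for name, typed in change["Keys"].items()
    ))
    if record["eventName"] in ("INSERT", "MODIFY"):
        replica[key] = change["NewImage"]
    elif record["eventName"] == "REMOVE":
        replica.pop(key, None)
    return replica


# Hypothetical change records, in the order the stream delivers them.
records = [
    {"eventName": "INSERT", "dynamodb": {
        "Keys": {"PhotoId": {"S": "p1"}},
        "NewImage": {"PhotoId": {"S": "p1"}, "City": {"S": "Las Vegas"}}}},
    {"eventName": "INSERT", "dynamodb": {
        "Keys": {"PhotoId": {"S": "p2"}},
        "NewImage": {"PhotoId": {"S": "p2"}, "City": {"S": "Reno"}}}},
    {"eventName": "MODIFY", "dynamodb": {
        "Keys": {"PhotoId": {"S": "p1"}},
        "NewImage": {"PhotoId": {"S": "p1"}, "City": {"S": "Seattle"}}}},
    {"eventName": "REMOVE", "dynamodb": {"Keys": {"PhotoId": {"S": "p2"}}}},
]

replica = {}
for record in records:
    apply_change(replica, record)

print(replica)
```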

Page 14: (SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014

Event Driven Computing in AWS Tomorrow

Event logs for asynchronous service events

Event logs from other data storage services

Customer 1

Customer 2

Customer 3

Page 15: (SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014

Vision: Unified Event Log Approach

Kinesis

SQS

DynamoDB Streams

S3 Archive Objects

S3 Archive Objects

SNS

Challenge:

Page 16: (SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014

A Unified Event Log Approach

SQS (Unordered Events)

Kinesis (Ordered Events)

Plus: easy conversion to other standard forms: S3 archive format, SNS, …

Page 17: (SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014

Benefits of Unification

Page 18: (SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014

Layers of Commonality -- Storage

Sequence of bytes

Page 19: (SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014

Layers of Commonality -- Storage

File System:

- Everyone can have their own sequence of bytes

- Tools for managing and manipulating byte sequences

Page 20: (SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014

Layers of Commonality -- Storage

Typed files:

- Application-specific state

- Tools for managing and manipulating structured information across many files; e.g. search

Page 21: (SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014

Layers of Commonality – Event Logs

Sequence of un-interpreted records

Page 22: (SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014

Layers of Commonality – Event Logs

Event logs:

- Everyone can have their own sequence of records

- Tools for managing and manipulating sequences of records

Page 23: (SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014

Layers of Commonality – Event Logs

Typed event logs:

- DynamoDB update streams

- Tools for managing and manipulating structured information across many files; e.g. cross-region replication

Page 24: (SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014

Reusable Processing Infrastructure

Page 25: (SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014

Challenge: Cloud Scale

Page 26: (SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014

Challenge: High Availability

X

Free pool

Page 27: (SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014

Challenge: Elastic / Highly variable workloads

Free pool

Page 28: (SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014

Standard “Big Data” processing framework that automates most of the “muck”

Page 29: (SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014

Foundations

Lowest level of abstraction: un-interpreted sequence of records

A key characteristic: ordered vs. unordered

- Ordered (e.g. multi-item transactions or “delta” updates)

- Unordered (e.g. S3 image upload notifications)
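Why ordering matters for "delta" updates but not for upload notifications can be checked with a small experiment. The update operations and file names below are invented for illustration. The code replays each event set in every possible order and collects the distinct outcomes.

```python
from itertools import permutations


def apply_deltas(events):
    """Replay 'set' and 'add' updates against one value; order changes the result."""
    value = 0
    for op, amount in events:
        value = amount if op == "set" else value + amount
    return value


def index_uploads(events):
    """Collect uploaded keys into a set; order does not change the result."""
    return frozenset(events)


# Hypothetical events.
deltas = [("set", 10), ("add", 5), ("set", 2)]
uploads = ["a.jpg", "b.jpg", "c.jpg"]

delta_outcomes = {apply_deltas(order) for order in permutations(deltas)}
upload_outcomes = {index_uploads(order) for order in permutations(uploads)}

print(sorted(delta_outcomes))  # several different final values
print(len(upload_outcomes))    # a single outcome regardless of order
```

The delta log has four possible final values depending on delivery order, so it needs an ordered log. The upload notifications always produce the same index, so an unordered log is sufficient.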

Page 30: (SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014

Unordered Log Processing Using SQS

SQS queue X

ASG
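The receive, process, delete cycle behind this picture can be simulated in memory. `FakeQueue` below is a hypothetical stand-in and not the SQS API. It only mimics the property that a message received but never deleted becomes visible again, so a replacement worker (for example, one launched by the Auto Scaling group) can pick it up.

```python
import itertools


class FakeQueue:
    """In-memory stand-in for a queue with receive/delete semantics."""

    def __init__(self, bodies):
        self._visible = list(bodies)
        self._in_flight = {}
        self._receipts = itertools.count(1)

    def receive(self):
        if not self._visible:
            return None
        body = self._visible.pop(0)
        receipt = next(self._receipts)
        self._in_flight[receipt] = body
        return receipt, body

    def delete(self, receipt):
        self._in_flight.pop(receipt, None)

    def expire_in_flight(self):
        """Simulate visibility timeouts expiring for undeleted messages."""
        self._visible.extend(self._in_flight.values())
        self._in_flight.clear()


def drain(queue, processed, crash_on=None):
    """Process messages until the queue is empty or a simulated crash occurs."""
    while (message := queue.receive()) is not None:
        receipt, body = message
        if body == crash_on:
            return False  # worker died before deleting the message
        processed.add(body)  # idempotent: handling a body twice is harmless
        queue.delete(receipt)
    return True


queue = FakeQueue(["img1.jpg", "img2.jpg", "img3.jpg"])
processed = set()

first_run_ok = drain(queue, processed, crash_on="img2.jpg")  # worker X fails
queue.expire_in_flight()                                     # message reappears
second_run_ok = drain(queue, processed)                      # replacement worker

print(first_run_ok, second_run_ok, sorted(processed))
```

Because delivery order is not guaranteed and a message may be seen more than once, the handler is written to be idempotent.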

Page 31: (SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014

Ordered Log Processing

16 5 4 3 2

?

Page 32: (SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014

Ordered Log Processing Using the Kinesis Client Library

Kinesis

Shard mgmt table

User State

Kinesis-enabled application

ASG

Use of the KCL

Mostly writing business logic
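The division of labor can be sketched as follows. The method names (`initialize`, `process_records`, `checkpoint`) are modeled loosely on a KCL-style record processor, but the classes, the driver, and the integer sequence numbers are simplified assumptions, not the library's actual API.

```python
class InMemoryCheckpointer:
    """Hypothetical stand-in for the shard management table."""

    def __init__(self):
        self.sequence_number = None

    def checkpoint(self, sequence_number):
        self.sequence_number = sequence_number


class CityCounter:
    """Business logic only: count photo uploads per city for one shard."""

    def initialize(self, shard_id):
        self.shard_id = shard_id
        self.counts = {}

    def process_records(self, records, checkpointer):
        for record in records:
            city = record["data"]
            self.counts[city] = self.counts.get(city, 0) + 1
        if records:
            # Record progress so a restarted worker can resume from here.
            checkpointer.checkpoint(records[-1]["sequence_number"])


def run_shard(processor, shard_id, batches, checkpointer):
    """Hypothetical driver standing in for the library: feed batches in order."""
    processor.initialize(shard_id)
    for batch in batches:
        last = checkpointer.sequence_number
        fresh = [r for r in batch if last is None or r["sequence_number"] > last]
        processor.process_records(fresh, checkpointer)
    return processor


# Hypothetical shard contents.
batches = [
    [{"sequence_number": 1, "data": "Las Vegas"},
     {"sequence_number": 2, "data": "Seattle"}],
    [{"sequence_number": 3, "data": "Las Vegas"}],
]

checkpointer = InMemoryCheckpointer()
first = run_shard(CityCounter(), "shard-0", batches, checkpointer)
# A restarted worker sees the same batches but skips checkpointed records.
restarted = run_shard(CityCounter(), "shard-0", batches, checkpointer)

print(first.counts, checkpointer.sequence_number, restarted.counts)
```

Only `CityCounter` is application code. Everything else represents work the library and its shard management table would take over.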

Page 33: (SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014

Kinesis vs. DynamoDB Update Streams

• The Kinesis API and the DynamoDB Update Streams API differ

– Different max record sizes

– DynamoDB controls all aspects of writing to streams

• this includes naming of streams, provisioning, sharding, and resharding

– ListStream and DescribeStream in DynamoDB include service-specific semantics (e.g. Describe returns the table schema)

• Kinesis Client Library (KCL) abstracts these differences away

– Best way to write applications for either Kinesis streams or DynamoDB update streams

– Applications that are agnostic to which type of stream is being processed can transparently target either type

Page 34: (SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014

Higher-Level Processing Frameworks

• SQS and Kinesis-enabled applications are low-level frameworks:

– You still need to create AMIs, launch EC2 instances, configure auto-scaling groups, etc.

– “All I want is X. Can’t someone just create that for me?”

• Lambda eliminates the operations/management tasks

• Opportunity: High level capabilities – e.g. archive-to-S3, upload-to-Redshift, or publish-to-SNS – can be provided as predefined functions that can be attached to an event log
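A function attached to an event log can be very small. The sketch below is a Python handler in the `handler(event, context)` style. It assumes the batch layout Lambda uses for Kinesis records (`Records[].kinesis.data`, base64-encoded) and assumes JSON payloads. The slides show no code, so the handler name, the payloads, and the idea of returning the decoded records in place of a real archive step are all illustrative.

```python
import base64
import json


def handler(event, context):
    """Decode each Kinesis record in the batch and return the payloads to archive."""
    archived = []
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])
        archived.append(json.loads(payload))
    return archived


def encode(payload):
    """Helper for building a sample event: JSON-encode, then base64-encode."""
    return base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")


# Hypothetical event with two records.
sample_event = {
    "Records": [
        {"eventSource": "aws:kinesis",
         "kinesis": {"partitionKey": "p1", "sequenceNumber": "1",
                     "data": encode({"key": "img1.jpg"})}},
        {"eventSource": "aws:kinesis",
         "kinesis": {"partitionKey": "p1", "sequenceNumber": "2",
                     "data": encode({"key": "img2.jpg"})}},
    ]
}

result = handler(sample_event, None)
print(result)
```

There is no AMI, instance, or auto-scaling configuration in this sketch. The function body is the whole application.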

Page 35: (SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014

Example: Cloud Mashups

Page 36: (SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014
Page 37: (SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014

Example: Cross Region Replication

Page 38: (SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014

How Many Event Logs?

Event log per customer per service

Good for customer understanding of a particular service:

- What just happened?

- List everything that happened recently

Not so easy to understand things across multiple services

Too expensive for “data plane” events; wrong granularity:

- Log per S3 bucket

- Log per DynamoDB table

[Diagram: per-service event logs labeled Cust. 2593 (×3), Cust. 7302 (×2), Cust. 3826, Cust. 8941, Cust. 2590, Cust. 4198, Cust. 8368, Cust. 2505, Cust. 7731]

Page 39: (SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014

How Many Event Logs?

Per customer event log of all control plane events

- Traffic volume small enough to simply merge all of it

- Makes it easy to understand the bigger picture

Cust. 2593

Cust. 7302

Cust. 3826

Optionally generated per “entity” event log for data storage services

- The right granularity

- Only incur traffic costs where necessary

Bucket Y

Bucket W

Table A

Table B

Table C

Bucket X

Bucket Z
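The two-tier layout can be expressed as a routing rule. Everything in this sketch is hypothetical: the event fields, the log naming scheme, and the set of entities with logging switched on are invented to illustrate the idea.

```python
# Hypothetical: entities whose owners opted in to a data plane event log.
ENABLED_ENTITIES = {"bucket/X", "table/A"}


def target_log(event):
    """Pick the log an event is appended to, or None if it is not logged."""
    if event["plane"] == "control":
        # All control plane events for a customer merge into one log.
        return f"customer/{event['customer']}/control-plane"
    # Data plane events are logged per entity, and only where enabled.
    if event["entity"] in ENABLED_ENTITIES:
        return f"entity/{event['entity']}"
    return None


events = [
    {"plane": "control", "customer": "2593", "entity": "instance/i-1"},
    {"plane": "control", "customer": "2593", "entity": "bucket/X"},
    {"plane": "data", "customer": "2593", "entity": "bucket/X"},
    {"plane": "data", "customer": "2593", "entity": "bucket/Y"},
]

routes = [target_log(event) for event in events]
print(routes)
```

Control plane traffic from different services lands in the same per-customer log, while data plane traffic costs are incurred only for `bucket/X`.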

Page 40: (SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014

Event Logs for Customers’ Services

Vision: customers’ services and applications leverage the AWS event log infrastructure

Cust. 2593

Cust. 7302

Cust. 3826

Widget A

Widget B

Widget C

www.widget.com

Per-customer control plane events sent to per-customer unified control plane log

Create & manage optional per-entity data plane event logs (e.g. as Kinesis streams)

Page 41: (SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014

Summary:

It’s All About Timeliness and Agility

• “Cycle time compression may be the most underestimated force in determining winners & losers in tech” – Marc Andreessen

• Real-time events let you create real-time applications

• Published events let 3rd parties independently innovate on top of each other

• Platform-wide event architecture lets independent parties start building composable tools and functions

• Low friction processing frameworks (e.g. Lambda) compress the development & operations cycle time

Page 42: (SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014

Summary:

Enablers for Pervasive Event-Driven Computing

• Efficient way of surfacing events: event logs

• Standards for discovery, access, semi-structured data formats, and processing of event logs

• Low and high-level processing frameworks that enable various degrees of control vs. simplicity

Page 43: (SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014

Summary:

AWS Offerings

• Pre-existing:

– CloudTrail

– SQS and KCL-enabled processing frameworks

• Newly-introduced:

– S3 event notifications

– DynamoDB Streams

– Lambda Cloud Functions

Page 44: (SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014

Opportunity for an Ecosystem

Enablers:

– Enumerate/discover event logs

– Standard, semi-structured data formats

– Standard processing frameworks – e.g. SQS, KCL, Lambda

Page 45: (SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014

Opportunity for an Ecosystem

Marketplace for free/for-fee software & services

– KCL-enabled libraries, Lambda functions, etc.

– Services that consume/emit streams of records – e.g. SQS or Kinesis records

Page 46: (SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014

Please give us your feedback on this presentation


Join the conversation on Twitter with #reinvent

SPOT305

For further details attend the following deep dive sessions about S3 event notifications and DynamoDB streams:

- S3 event notifications: SDD413 (@2:15)

- DynamoDB streams: SDD424 (@5:30)