(spot305) event-driven computing on change logs in aws | aws re:invent 2014
DESCRIPTION
An increasingly common form of computing is computation in response to recently occurring events. These might be newly arrived or changed data, such as an uploaded Amazon S3 image file or an update to an Amazon DynamoDB table, or they might be changes in the state of some system or service, such as termination of an EC2 instance. Support for this form of computing requires both a means of efficiently surfacing events as a sequence of change records and frameworks for processing such change logs. This session provides an overview of how AWS intends to facilitate event-driven computing through support for both change logs and various means of processing them.
TRANSCRIPT
© 2014 Amazon Web Services, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon Web Services, Inc.
November 13, 2014 | Las Vegas
Event-Driven Computing @ AWS
Marvin Theimer & Khawaja Shams
It’s All About Timeliness
and Agility
Traditional Way of Doing Things
• My phone or my camera uploads an image file to my S3 bucket
– Maybe my phone is smart enough to index my photos by their GPS coordinates
• Now I buy an app that creates photo albums by city, region, and country
– Don’t want to run it on my phone!
– Could be a web service – but now I’m handing all my personal photos to a 3rd party
– Could be a cloud app that I periodically invoke – but now I have to remember to invoke it and remember which photos to run it on
– Could set up a recurring work flow to run it – but now I’m paying to run it all the time
• Now I buy another app that does face recognition and tags all my friends
• Now I buy another app that does ____
• …
Traditional Way of Doing Things
• Launch an EC2 instance as part of your application
• Your application is highly available
– Did you set up all the appropriate alarms on the new instance?
• Your application is secure
– Did you tell your intrusion detection service about the new instance?
• Your company is cost conscious
– Did you tag the instance with the appropriate cost allocation tags?
Event-Driven Way of Doing Things
• Your phone or camera uploads an image to your S3 bucket
• An event is generated telling all interested parties about the upload
• Your photo gets indexed by GPS location
• Your photo gets added to all relevant photo albums
• Your photo gets tagged with friend references
• Ideally: You didn’t have to do anything other than purchase all those cool apps
Event-Driven Way of Doing Things
• An EC2 instance is launched
• An event is generated telling all interested parties about the creation of the new instance
• Appropriate monitors and alarms are created
• Intrusion detection learns of the new instance
• The appropriate cost allocation tags are added
• Ideally:
– You didn’t have to do anything other than launch the EC2 instance
– Application developers don’t need to know about all the “auxiliary” activities that have to happen
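The fan-out described on this slide can be sketched as a tiny in-process publish/subscribe loop. This is only an illustration of the idea: `EventBus`, the event type name, and the instance ID are hypothetical and do not correspond to any AWS API.

```python
from collections import defaultdict


class EventBus:
    """Hypothetical in-process stand-in for an event notification service."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, callback):
        self._subscribers[event_type].append(callback)

    def publish(self, event_type, detail):
        # Every interested party is told about the event; the publisher
        # does not need to know who they are or what they do.
        for callback in self._subscribers[event_type]:
            callback(detail)


actions = []


def create_alarms(detail):
    actions.append(f"alarms created for {detail['instance_id']}")


def register_with_intrusion_detection(detail):
    actions.append(f"intrusion detection told about {detail['instance_id']}")


def add_cost_tags(detail):
    actions.append(f"cost allocation tags added to {detail['instance_id']}")


bus = EventBus()
for subscriber in (create_alarms, register_with_intrusion_detection, add_cost_tags):
    bus.subscribe("instance-launched", subscriber)

# The application only launches the instance; the auxiliary work follows.
bus.publish("instance-launched", {"instance_id": "i-12345678"})
print(actions)
```

New auxiliary activities can be added by subscribing another callback, without changing the code that launches the instance.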
How Do You Discover
New Events?
Anti-Pattern in Discovering New Events
Periodically Scan Entire Dataset
List S3-buckets (ListingA): {millions of objects}
List S3-buckets (ListingB): {millions of objects + 3 objects}
Diff (ListingA – ListingB): 3 objects
Anti-Pattern in Discovering New Events
Periodically polling all system state
ec2-describe-instances (ListingA): {thousands of instances}
ec2-describe-instances (ListingB): {thousands of instances + 3 instances}
Diff (ListingA – ListingB): 3 instances
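The cost difference between diffing two full listings and reading a change log can be sketched as follows. This is a minimal illustration with in-memory sets; `ChangeRecord`, `diff_listings`, and `new_keys_from_log` are hypothetical names, and 100,000 keys stand in for "millions of objects".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeRecord:
    """Hypothetical change record: one entry in an event log."""
    action: str  # "created" or "deleted"
    key: str


def diff_listings(listing_a, listing_b):
    """Anti-pattern: compare two complete listings to find what is new.

    The work grows with the size of the whole dataset, even when only
    a handful of items changed between the two scans.
    """
    return listing_b - listing_a


def new_keys_from_log(log):
    """Event-log approach: the work grows with the number of changes."""
    created = set()
    for record in log:
        if record.action == "created":
            created.add(record.key)
        elif record.action == "deleted":
            created.discard(record.key)
    return created


# Two full scans taken at different times.
listing_a = {f"photo-{i}.jpg" for i in range(100_000)}
listing_b = listing_a | {"new-1.jpg", "new-2.jpg", "new-3.jpg"}

# The same three uploads expressed as a change log.
log = [ChangeRecord("created", f"new-{i}.jpg") for i in (1, 2, 3)]

found_by_diff = diff_listings(listing_a, listing_b)
found_by_log = new_keys_from_log(log)
print(sorted(found_by_log))
```

Both approaches find the same three objects, but the first touches every key in both listings while the second touches three records.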
Event Logs
reduce the problem back to
traditional computing
CloudTrail event log for API calls
Event Driven Computing in AWS Up Till Now
Customer 1
Customer 2
Customer 3
S3
Event Driven Computing in AWS Today
S3 event notifications
SQS
Event Driven Computing in AWS Today
DynamoDB Streams
Event Driven Computing in AWS Tomorrow
Event logs for asynchronous service events
Event logs from other data storage services
Customer 1
Customer 2
Customer 3
Vision: Unified Event Log Approach
Kinesis
SQS
DynamoDB Streams
S3 Archive Objects
S3 Archive Objects
SNS
Challenge:
A Unified Event Log Approach
SQS (Unordered Events)
Kinesis (Ordered Events)
Plus: easy conversion to other standard forms: S3 archive format, SNS, …
Benefits of Unification
Layers of Commonality -- Storage
Sequence of bytes
Layers of Commonality -- Storage
File System:
- Everyone can have their own sequence of bytes
- Tools for managing and manipulating byte sequences
Layers of Commonality -- Storage
Typed files:
- Application-specific state
- Tools for managing and manipulating structured information across many files; e.g. search
Layers of Commonality – Event Logs
Sequence of un-interpreted records
Layers of Commonality – Event Logs
Event logs:
- Everyone can have their own sequence of records
- Tools for managing and manipulating sequences of records
Layers of Commonality – Event Logs
Typed event logs:
- DynamoDB update streams
- Tools for managing and manipulating structured information across many files; e.g. cross-region replication
Reusable Processing Infrastructure
Challenge: Cloud Scale
Challenge: High Availability
X
Free pool
Challenge: Elastic / Highly variable workloads
Free pool
Standard “Big Data” processing framework that automates most of the “muck”
Foundations
Lowest level of abstraction: un-interpreted sequence of records
A key characteristic: unordered vs. ordered
– Unordered (e.g. S3 image upload notifications)
– Ordered (e.g. multi-item transactions or “delta” updates)
Unordered Log Processing Using SQS
SQS queue X
ASG
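Unordered processing with a pool of interchangeable workers can be sketched as follows. This is a local simulation only: Python's `queue.Queue` stands in for the SQS queue, threads stand in for the instances of the Auto Scaling group (ASG), and `process` is a hypothetical placeholder for the real work.

```python
import queue
import threading


def process(message):
    """Hypothetical business logic; here it just upper-cases the key."""
    return message.upper()


def worker(q, results, lock):
    while True:
        message = q.get()
        if message is None:  # sentinel: no more work for this worker
            return
        outcome = process(message)
        with lock:
            results.add(outcome)


def run_workers(messages, num_workers=3):
    """Drain the queue with several workers; completion order is not fixed."""
    q = queue.Queue()
    results = set()
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(q, results, lock))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for message in messages:
        q.put(message)
    for _ in threads:
        q.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return results


done = run_workers(["a.jpg", "b.jpg", "c.jpg", "d.jpg"])
print(sorted(done))
```

Because no message depends on another, any worker can take any message, and workers can be added or removed without coordination. Results are collected in a set to emphasize that arrival order carries no meaning.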
Ordered Log Processing
[Figure: a sequence of numbered records passing through Kinesis]
Ordered Log Processing Using the
Kinesis Client Library
Shard mgmt
table
User
State
Kinesis-enabled application
ASG
Use of the KCL
Mostly writing
business logic
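The division of labor shown on these slides, where the library tracks shard progress and the application supplies business logic, can be sketched roughly as below. This is not the KCL API: `ShardWorker` is a hypothetical stand-in, a plain dict plays the role of the shard management table, and records are simple (sequence number, value) pairs.

```python
class ShardWorker:
    """Hypothetical stand-in for the per-shard loop a library would manage."""

    def __init__(self, shard_id, checkpoint_table, handle):
        self.shard_id = shard_id
        self.checkpoint_table = checkpoint_table  # stands in for the shard mgmt table
        self.handle = handle                      # the application's business logic

    def run(self, records, batch_size=2):
        # Resume strictly after the last checkpointed sequence number.
        last = self.checkpoint_table.get(self.shard_id, 0)
        pending = [r for r in records if r[0] > last]
        for start in range(0, len(pending), batch_size):
            batch = pending[start:start + batch_size]
            for sequence_number, value in batch:
                self.handle(sequence_number, value)  # records handled in order
            # Record progress only after the whole batch has been handled.
            self.checkpoint_table[self.shard_id] = batch[-1][0]


# Business logic: apply "delta" updates, where order matters.
balance = {"value": 0}


def apply_delta(sequence_number, delta):
    balance["value"] += delta


table = {}
records = [(1, +10), (2, -3), (3, +5)]
ShardWorker("shard-0", table, apply_delta).run(records)

# A replacement worker later sees the same shard plus two new records.
records += [(4, -2), (5, +1)]
ShardWorker("shard-0", table, apply_delta).run(records)
print(balance["value"], table)
```

The second worker skips records 1 to 3 because of the checkpoint, so each delta is applied once and in order.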
Kinesis vs. DynamoDB Update Streams
• The Kinesis API and the DynamoDB Update Streams API differ
– Different max record sizes
– DynamoDB controls all aspects of writing to streams
• this includes naming of streams, provisioning, sharding, and resharding
– ListStream and DescribeStream in DynamoDB include service-specific semantics (e.g. Describe returns the table schema)
• Kinesis Client Library (KCL) abstracts these differences away
– Best way to write applications for either Kinesis streams or DynamoDB update streams
– Applications that are agnostic to which type of stream is being processed can transparently target either type
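One way to picture a stream-agnostic application is as code written against a small common interface, with an adapter per stream type. The sketch below is an assumption about how such an abstraction could look, not the KCL's actual design; `RecordSource`, `FakeKinesisStream`, and `FakeTableUpdateStream` are hypothetical names backed by in-memory data.

```python
from typing import Iterator, Protocol, Tuple

Record = Tuple[str, bytes]  # (sequence number, payload)


class RecordSource(Protocol):
    """Hypothetical common interface the application is written against."""

    def records(self) -> Iterator[Record]: ...


class FakeKinesisStream:
    """In-memory stand-in for a stream whose writer supplies raw payloads."""

    def __init__(self, payloads):
        self._payloads = list(payloads)

    def records(self):
        for i, payload in enumerate(self._payloads, start=1):
            yield (str(i), payload)


class FakeTableUpdateStream:
    """In-memory stand-in for a stream of table updates."""

    def __init__(self, updates):
        self._updates = list(updates)

    def records(self):
        for i, (key, value) in enumerate(self._updates, start=1):
            yield (str(i), f"{key}={value}".encode())


def count_bytes(source: RecordSource) -> int:
    """Application logic that does not care which kind of stream it reads."""
    return sum(len(payload) for _, payload in source.records())


print(count_bytes(FakeKinesisStream([b"ab", b"cde"])))
print(count_bytes(FakeTableUpdateStream([("k", "v")])))
```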
Higher-Level Processing Frameworks
• SQS and Kinesis-enabled applications are low-level frameworks:
– You still need to create AMIs, launch EC2 instances, configure auto-scaling groups, etc.
– “All I want is X . Can’t someone just create that for me?”
• Lambda eliminates the operations/management tasks
• Opportunity: High level capabilities – e.g. archive-to-S3, upload-to-Redshift, or publish-to-SNS – can be provided as predefined functions that can be attached to an event log
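A function attached to an event log might look like the sketch below, which pulls the bucket and key out of an S3 upload notification. It is written in Python purely for illustration (the runtime is an assumption); the field names follow the S3 event notification message structure as documented by AWS, and the bucket and key in `sample_event` are made up.

```python
from urllib.parse import unquote_plus


def handler(event, context):
    """Return (bucket, key) for every object-created record in the event."""
    uploads = []
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue  # ignore deletions and other event types
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in the notification.
        key = unquote_plus(record["s3"]["object"]["key"])
        uploads.append((bucket, key))
    return uploads


# Hypothetical notification for a single uploaded photo.
sample_event = {
    "Records": [
        {
            "eventSource": "aws:s3",
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "my-photos"},
                "object": {"key": "vacation/las+vegas.jpg", "size": 1024},
            },
        }
    ]
}

print(handler(sample_event, None))
```

The function contains only business logic: no AMIs, instances, or scaling groups appear in the code.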
Example: Cloud Mashups
Example: Cross Region Replication
How Many Event Logs?
Good for customer understanding of a particular service:
- What just happened?
- List everything that happened recently
Not so easy to understand things across multiple services
Too expensive for “data plane” events; wrong granularity:
- Log per S3 bucket
- Log per DynamoDB table
Event log per customer per service
Cust. 2593
Cust. 2593
Cust. 2593
Cust. 7302
Cust. 7302
Cust. 3826
Cust. 8941
Cust. 2590
Cust. 4198
Cust. 8368
Cust. 2505
Cust. 7731
How Many Event Logs?
Per customer event log of all control plane events
- Traffic volume small enough to simply merge all of it
- Makes it easy to understand the bigger picture
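Merging several per-service logs into one per-customer log can be sketched with a standard k-way merge. The sketch assumes each per-service log is already sorted by timestamp; the timestamps are invented, and `merge_logs` is a hypothetical helper.

```python
import heapq

# Hypothetical per-service control plane logs for one customer,
# each already ordered by timestamp.
per_service_logs = {
    "ec2": [(1, "RunInstances"), (4, "TerminateInstances")],
    "s3": [(2, "CreateBucket")],
    "dynamodb": [(3, "CreateTable"), (5, "DeleteTable")],
}


def merge_logs(per_service):
    """Merge sorted per-service logs into one log ordered by timestamp."""
    tagged = [
        [(timestamp, service, name) for timestamp, name in events]
        for service, events in per_service.items()
    ]
    return list(heapq.merge(*tagged))


unified = merge_logs(per_service_logs)
for entry in unified:
    print(entry)
```

Reading the unified log top to bottom answers "what just happened?" across services, which the separate logs cannot do on their own.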
Cust. 2593
Cust. 7302
Cust. 3826
Optionally generated per “entity” event log for data storage services
- The right granularity
- Only incur traffic costs where necessary
Bucket Y
Bucket W
Table A
Table B
Table C
Bucket X
Bucket Z
Event Logs for Customers’ Services
Vision: customers’ services and applications leverage the AWS event log infrastructure
Cust. 2593
Cust. 7302
Cust. 3826
Widget A
Widget B
Widget C
www.widget.com
Per-customer control plane events sent to per-customer unified control plane log
Create & manage optional per-entity data plane event logs (e.g. as Kinesis streams)
Summary:
It’s All About Timeliness and Agility
• “Cycle time compression may be the most underestimated force in determining winners & losers in tech” – Marc Andreessen
• Real-time events let you create real-time applications
• Published events let 3rd parties independently innovate on top of each other
• Platform-wide event architecture lets independent parties start building composable tools and functions
• Low friction processing frameworks (e.g. Lambda) compress the development & operations cycle time
Summary:
Enablers for Pervasive Event-Driven Computing
• Efficient way of surfacing events: event logs
• Standards for discovery, access, semi-structured data formats, and processing of event logs
• Low and high-level processing frameworks that enable various degrees of control vs. simplicity
Summary:
AWS Offerings
• Pre-existing:
– CloudTrail
– SQS and KCL-enabled processing frameworks
• Newly-introduced:
– S3 event notifications
– DynamoDB Streams
– Lambda Cloud Functions
Opportunity for an Ecosystem
Enablers:
– Enumerate/discover event logs
– Standard, semi-structured data formats
– Standard processing frameworks – e.g. SQS, KCL, Lambda
Opportunity for an Ecosystem
Marketplace for free /for-fee software & services
– KCL-enabled libraries, Lambda functions, etc.
– Services that consume/emit streams of records – e.g. SQS or Kinesis records
Please give us your feedback on this
presentation
Join the conversation on Twitter with
#reinvent
SPOT305
For further details attend the
following deep dive sessions
about S3 event notifications
and DynamoDB streams:
- S3 event notifications: SDD413 (@2:15)
- DynamoDB streams: SDD424 (@5:30)