Real-Time Data Processing with Amazon DynamoDB Streams and AWS Lambda
TRANSCRIPT
© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Presenter: Vyom Nagrani, Sr. Product Manager, AWS Lambda
Q&A Moderator: Ajay Nair, Sr. Product Manager, AWS Lambda
July 30th, 2015
Best Practices: Real-time Data Processing with Amazon DynamoDB Streams and AWS Lambda
Amazon DynamoDB Streams – time-ordered sequence of item-level changes
• Time- and partition-ordered log
• Provides a stream of inserts, updates, and deletes, each carrying:
  • Old item
  • New item
  • Primary key
  • Change type
• Stream records delivered exactly once
• Streams are asynchronous
• Scales with your table
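The old item, new item, primary key, and change type listed above arrive together in each stream record. A hedged sketch of that record shape (field values here are illustrative, for the NEW_AND_OLD_IMAGES view):

```python
# Illustrative shape of one DynamoDB Streams record as delivered to a consumer.
record = {
    "eventName": "MODIFY",  # change type: INSERT, MODIFY, or REMOVE
    "dynamodb": {
        "Keys": {"Id": {"S": "user-123"}},                            # primary key
        "OldImage": {"Id": {"S": "user-123"}, "Score": {"N": "10"}},  # old item
        "NewImage": {"Id": {"S": "user-123"}, "Score": {"N": "42"}},  # new item
        "StreamViewType": "NEW_AND_OLD_IMAGES",
    },
}

def describe_change(rec):
    """Summarize the item-level change carried by one stream record."""
    key = rec["dynamodb"]["Keys"]["Id"]["S"]
    return f"{rec['eventName']} on item {key}"
```

Calling `describe_change(record)` on the sample above yields `"MODIFY on item user-123"`.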
Benefits of DynamoDB Streams for real-time data processing
Durability & high availability
• High-throughput consensus protocol
• Replicated across multiple AZs
Managed streams
• Simply enable streaming
Performance
• Designed for sub-second latency
Native integration with AWS Lambda
• DynamoDB Triggers invoke a Lambda function to run your custom code
[Diagram: DynamoDB → DynamoDB Streams → DynamoDB Triggers → Lambda function runs custom code]
AWS Lambda: A compute service that runs your code in response to events
Lambda functions: Stateless, trigger-based code execution
Triggered by events:
• Direct sync and async invocations
• Put to an Amazon S3 bucket
• Table update on Amazon DynamoDB
• And many more …
Makes it easy to:
• Build back-end services that perform at scale
• Perform data-driven auditing, analysis, and notification
“Productivity focused compute platform to build powerful, dynamic, modular applications in the cloud”

No Infrastructure to manage
• Focus on business logic, not infrastructure. You upload code; AWS Lambda handles everything else.

Pay only for what you use
• Lambda automatically matches capacity to your request rate. Purchase compute in 100 ms increments.

Bring Your Own Code
• Run code in a choice of standard languages. Use threads, processes, files, and shell scripts normally.

High performance at any scale; cost-effective and efficient
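The 100 ms billing increment mentioned above can be illustrated with a small rounding sketch (the increment comes from the slide; the helper function is ours):

```python
import math

def billed_ms(duration_ms):
    """Round a function's running time up to the next 100 ms billing increment."""
    return math.ceil(duration_ms / 100) * 100
```

For example, a function that runs for 123 ms is billed as 200 ms, while one that runs exactly 100 ms is billed as 100 ms.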
Benefits of AWS Lambda for building a server-less data processing engine
DynamoDB Streams + Lambda = Database Triggers
Run multiple real-time applications in parallel
• DynamoDB Streams natively supports Cross-Region Replication
• Triggers enable filtering, monitoring, auditing, notifications, aggregation, etc.
• No charge for reads/polls that your AWS Lambda function makes to the DynamoDB Stream associated with the table
Walkthrough of a simple stream logging application workflow
[Diagram: New table updates → Amazon DynamoDB → Streams → AWS Lambda → Amazon CloudWatch Logs]
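A minimal sketch of the logging function in this workflow. In Lambda, anything written to stdout lands in CloudWatch Logs, so logging each change is one print per record. This is illustrative, not the webinar's actual code; the event shape follows the DynamoDB Streams integration:

```python
import json

def lambda_handler(event, context):
    """Log every item-level change delivered by the DynamoDB stream."""
    records = event.get("Records", [])
    for record in records:
        # In Lambda, stdout is captured by CloudWatch Logs automatically.
        print(json.dumps({
            "change": record["eventName"],
            "keys": record["dynamodb"]["Keys"],
        }))
    return len(records)  # number of records processed
```

Lambda invokes this handler once per batch of stream records; no polling code is needed on your side.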
Walkthrough of setting up DynamoDB Triggers and Lambda functions through the AWS Console
Today’s demo: Workflow of cross-region replication and real-time data auditing
[Diagram: Original table (Amazon DynamoDB) → data stream → AWS Lambda → cross-region replication to a second Amazon DynamoDB table, plus audit notification via Amazon SNS]
• Loop through the event array
• Replicate each item to a different table
• Send a notification if a record looks suspicious
• In both cases, wait for callbacks before exiting
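The demo's function logic (loop, replicate, flag suspicious records) can be sketched with in-memory stand-ins for the destination table and the SNS topic. The "suspicious" rule and all names below are hypothetical, not the demo's actual code:

```python
def process_stream(event, replica_table, notifications):
    """Replicate each change and flag suspicious records (illustrative logic)."""
    for record in event["Records"]:
        ddb = record["dynamodb"]
        key = ddb["Keys"]["Id"]["S"]
        if record["eventName"] == "REMOVE":
            replica_table.pop(key, None)          # mirror deletes in the replica
        else:
            replica_table[key] = ddb["NewImage"]  # mirror inserts and updates
        # Hypothetical audit rule: very large scores are "suspicious".
        new_image = ddb.get("NewImage", {})
        if int(new_image.get("Score", {}).get("N", "0")) > 100:
            notifications.append(f"Suspicious record: {key}")  # stand-in for an SNS publish
    return len(event["Records"])
```

In a real function, the dict writes would be DynamoDB `PutItem`/`DeleteItem` calls on the destination-region table and the list append would be an SNS publish, with the function waiting for those calls to complete before returning.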
Demo: Cross region replication and real-time data auditing using Amazon DynamoDB and AWS Lambda
Attaching Lambda functions to DynamoDB Streams
• Automatic Shards: One Lambda function concurrently invoked per DynamoDB shard
• Each individual shard follows ordered processing
• A given key will be present in at most one concurrently active shard
• All changes (insert, remove, modify) are available on a rolling 24-hour basis
[Diagram: Source table → DynamoDB Streams (shards) → pollers → Lambda functions → Destination 1, Destination 2]

DynamoDB Streams scales by grouping records into shards; Lambda scales automatically to match.
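The shard guarantees above can be illustrated with a toy partitioner. This is not the service's actual algorithm, only a sketch of the two properties it provides: a given key lands in exactly one shard, and ordering is preserved within a shard:

```python
from collections import defaultdict

def group_into_shards(records, num_shards):
    """Toy partitioner: route each record to a shard by its key."""
    shards = defaultdict(list)
    for rec in records:
        # Same key -> same shard, and appends preserve per-shard order,
        # so all changes to one item are processed in sequence.
        shards[hash(rec["key"]) % num_shards].append(rec)
    return shards
```

One Lambda invocation would then drain each shard's list independently, which is why unrelated keys can be processed in parallel while each key's history stays ordered.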
Attaching Lambda functions to DynamoDB Streams
• Reading the stream: the stream is exposed via the familiar Amazon Kinesis Client Library interface
  • Read the stream using https://github.com/awslabs/dynamodb-streams-kinesis-adapter
  • Records can be retrieved at ~2x the rate of the table’s provisioned write capacity
• Automatic scaling: both DynamoDB and Lambda scale automatically with PUT rates
• Default limit of 100 concurrent Lambda executions; can be raised via the AWS Support Center
Performance tuning DynamoDB as an event source
• Batch size: the maximum number of records AWS Lambda retrieves from DynamoDB when invoking your function
  • Increasing the batch size causes fewer Lambda invocations, with more data processed per invocation
• Starting position: the position in the stream where Lambda starts reading
  • Set to “Trim Horizon” to start with the oldest record
  • Set to “Latest” to start with the most recent data
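Both knobs are set on the event source mapping that attaches the function to the stream. A sketch of the parameters, as you might pass them to Lambda's CreateEventSourceMapping API; the stream ARN and function name are placeholders:

```python
# Placeholder parameters for attaching a Lambda function to a DynamoDB stream.
params = {
    "EventSourceArn": (
        "arn:aws:dynamodb:us-east-1:123456789012:"
        "table/MyTable/stream/2015-07-30T00:00:00.000"   # placeholder stream ARN
    ),
    "FunctionName": "processStream",      # placeholder function name
    "BatchSize": 100,                     # max records retrieved per invocation
    "StartingPosition": "TRIM_HORIZON",   # or "LATEST" for the most recent data
}
# With boto3, these would be passed as:
#   boto3.client("lambda").create_event_source_mapping(**params)
```

Raising `BatchSize` trades invocation count for per-invocation work, exactly as described above.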
Best practices for creating Lambda functions
• Memory: CPU is allocated proportionally to the memory configured
  • Increasing memory makes your code execute faster (if CPU bound)
• Timeout: increasing the timeout allows longer-running functions, but means more waiting in case of errors
• Retries: for DynamoDB Streams, Lambda retries without limit (until the data expires)
• Permission model: Lambda pulls data from DynamoDB, so no resource policy is needed; only an execution role that allows Lambda access to DynamoDB
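The execution role mentioned above needs read access to the stream, plus CloudWatch Logs permissions for logging. A sketch of such a policy expressed as a Python dict; the stream ARN is a placeholder, and the DynamoDB actions mirror the stream-reading APIs:

```python
# Illustrative execution-role policy for a stream-processing function.
execution_role_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {   # Let Lambda poll and read the table's stream.
            "Effect": "Allow",
            "Action": [
                "dynamodb:DescribeStream",
                "dynamodb:GetRecords",
                "dynamodb:GetShardIterator",
                "dynamodb:ListStreams",
            ],
            "Resource": "arn:aws:dynamodb:*:*:table/MyTable/stream/*",  # placeholder
        },
        {   # Let the function write its logs to CloudWatch Logs.
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents",
            ],
            "Resource": "*",
        },
    ],
}
```

No resource policy is attached to the stream itself, since Lambda pulls from it using this role.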
Monitoring and Debugging Lambda functions
• Console dashboard
  • Lists all Lambda functions
  • Easy editing of resources, event sources, and other settings
  • At-a-glance metrics
• Metrics in CloudWatch: requests, errors, latency, throttles
• Logging in CloudWatch Logs
Three Next Steps
1. Enable DynamoDB Streams for your existing DynamoDB tables. DynamoDB Streams provides a time-ordered sequence of item-level changes made to data in a table in the last 24 hours.
2. Create and test your first Lambda function. With AWS Lambda, there are no new languages, tools, or frameworks to learn. You can use any third party library, even native ones.
3. Use AWS Lambda with DynamoDB Streams to create DynamoDB Triggers: no infrastructure to manage, and a clean, lightweight implementation of database triggers, NoSQL style!
Thank you!
Visit http://aws.amazon.com/dynamodb, the AWS blog, and the DynamoDB forum to learn more and get started using DynamoDB.
Visit http://aws.amazon.com/lambda, the AWS Compute blog, and the Lambda forum to learn more and get started using Lambda.