AWS Webinar - Dynamo DB + Redshift 13_09_19
DESCRIPTION
Learn how Digital Advertising customers are leveraging the integration between Amazon DynamoDB and Amazon Redshift to manage their high scale data, from creation to analysis. In this session, we will describe the three essential ingredients of efficient data flow in the cloud, and introduce a reference architecture that enables customers to meet the demands for low latency and high volume encountered in the Digital Advertising industry. Using existing SQL-based tools and business intelligence systems, you will learn how to gain deeper insight from your data at lower cost. The design principles presented here will be useful to every environment where managing data at scale is a challenge.
TRANSCRIPT
![Page 1: AWS Webinar - Dynamo DB + Redshift 13_09_19](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c638884a7959c9388b4666/html5/thumbnails/1.jpg)
Designing for Scale Three steps to optimal data performance
using DynamoDB and Redshift
David Pearson Business Development
![Page 2: AWS Webinar - Dynamo DB + Redshift 13_09_19](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c638884a7959c9388b4666/html5/thumbnails/2.jpg)
Amazon RDS
Amazon DynamoDB Amazon Redshift
Amazon ElastiCache
Compute Storage
AWS Global Infrastructure
Database
Application Services
Deployment & Administration
Networking
AWS Database
Services
Scalable High Performance
Application Storage in the Cloud
![Page 3: AWS Webinar - Dynamo DB + Redshift 13_09_19](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c638884a7959c9388b4666/html5/thumbnails/3.jpg)
provision
manage
scale
EFFORT
differentiated?
![Page 4: AWS Webinar - Dynamo DB + Redshift 13_09_19](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c638884a7959c9388b4666/html5/thumbnails/4.jpg)
Introduction to AWS Big Data Services
Redshift DynamoDB
Elastic MapReduce Amazon S3
Object Storage
Batch Processing
Real-Time Transactions
Online Analysis and Reporting
![Page 5: AWS Webinar - Dynamo DB + Redshift 13_09_19](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c638884a7959c9388b4666/html5/thumbnails/5.jpg)
Amazon DynamoDB
![Page 6: AWS Webinar - Dynamo DB + Redshift 13_09_19](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c638884a7959c9388b4666/html5/thumbnails/6.jpg)
NoSQL Database
Predictable performance
Seamless & massive scalability
Fully managed; zero admin
Amazon DynamoDB
![Page 7: AWS Webinar - Dynamo DB + Redshift 13_09_19](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c638884a7959c9388b4666/html5/thumbnails/7.jpg)
Amazon’s Path to DynamoDB
RDBMS DynamoDB
![Page 8: AWS Webinar - Dynamo DB + Redshift 13_09_19](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c638884a7959c9388b4666/html5/thumbnails/8.jpg)
Amazon DynamoDB
DEVS
OPS
USERS
![Page 9: AWS Webinar - Dynamo DB + Redshift 13_09_19](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c638884a7959c9388b4666/html5/thumbnails/9.jpg)
Fast Application Development
Time to Build New Applications
• Flexible data models
• Simple API
• High-scale queries
• Laptop development
Amazon DynamoDB
DEVS
OPS
USERS
![Page 10: AWS Webinar - Dynamo DB + Redshift 13_09_19](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c638884a7959c9388b4666/html5/thumbnails/10.jpg)
Amazon DynamoDB
DEVS
OPS
USERS
Admin-Free (at any scale)
![Page 11: AWS Webinar - Dynamo DB + Redshift 13_09_19](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c638884a7959c9388b4666/html5/thumbnails/11.jpg)
request-based capacity provisioning model
Provisioned Throughput
Throughput is declared and updated via the API or the console
CreateTable (foo, reads/sec = 100, writes/sec = 150)
UpdateTable (foo, reads/sec=10000, writes/sec=4500)
DynamoDB handles the rest
Capacity is reserved and available when needed
Scaling-up triggers repartitioning and reallocation
No impact to performance or availability
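The two calls on this slide can be sketched as request payloads. This is a minimal illustration, assuming the parameter names of the DynamoDB low-level API (`ProvisionedThroughput`, `ReadCapacityUnits`, `WriteCapacityUnits`) and treating the slide's reads/sec and writes/sec as capacity units. The key attribute `id` and the helper function names are hypothetical, and nothing is sent to AWS.

```python
# Sketch only: builds request parameters as plain dicts; no network calls.
# The key attribute name "id" is a hypothetical placeholder.

def provisioned_throughput(reads_per_sec, writes_per_sec):
    """Return the throughput declaration shared by CreateTable and UpdateTable."""
    if reads_per_sec < 1 or writes_per_sec < 1:
        raise ValueError("throughput values must be positive")
    return {
        "ReadCapacityUnits": reads_per_sec,
        "WriteCapacityUnits": writes_per_sec,
    }


def create_table_request(table, reads_per_sec, writes_per_sec, hash_key="id"):
    """Parameters for creating a table with a single hash key."""
    return {
        "TableName": table,
        "AttributeDefinitions": [
            {"AttributeName": hash_key, "AttributeType": "S"},
        ],
        "KeySchema": [{"AttributeName": hash_key, "KeyType": "HASH"}],
        "ProvisionedThroughput": provisioned_throughput(
            reads_per_sec, writes_per_sec
        ),
    }


def update_table_request(table, reads_per_sec, writes_per_sec):
    """Parameters for changing the throughput of an existing table."""
    return {
        "TableName": table,
        "ProvisionedThroughput": provisioned_throughput(
            reads_per_sec, writes_per_sec
        ),
    }


# The two calls from the slide.
create_req = create_table_request("foo", 100, 150)
update_req = update_table_request("foo", 10000, 4500)
```

With an SDK such as boto3, dictionaries of this shape could be passed as keyword arguments to the client's `create_table` and `update_table` methods; the application only declares the new numbers and the service handles repartitioning.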
![Page 12: AWS Webinar - Dynamo DB + Redshift 13_09_19](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c638884a7959c9388b4666/html5/thumbnails/12.jpg)
Amazon DynamoDB
DEVS
OPS
USERS Durable Low Latency
![Page 13: AWS Webinar - Dynamo DB + Redshift 13_09_19](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c638884a7959c9388b4666/html5/thumbnails/13.jpg)
WRITES Replicated continuously to 3 AZs
Persisted to disk (custom SSD)
READS Strongly or eventually consistent
No latency trade-off
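The read consistency choice is made per request. The sketch below assumes the `ConsistentRead` flag of the DynamoDB GetItem API; the table name `foo`, the key attribute `id`, and the helper name are hypothetical placeholders.

```python
# Sketch only: builds GetItem parameters; no network calls.

def get_item_request(table, key, strongly_consistent=False):
    """Build GetItem parameters; reads are eventually consistent by default."""
    return {
        "TableName": table,
        "Key": key,
        "ConsistentRead": strongly_consistent,
    }


key = {"id": {"S": "item-1"}}  # hypothetical key value
eventual_read = get_item_request("foo", key)
strong_read = get_item_request("foo", key, strongly_consistent=True)
```

The same item can be read either way, so the consistency level can differ between code paths of one application.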
![Page 14: AWS Webinar - Dynamo DB + Redshift 13_09_19](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c638884a7959c9388b4666/html5/thumbnails/14.jpg)
Latest News… DynamoDB Local
• Disconnected development
• Full API support
• Download from http://aws.amazon.com/dynamodb/resources/#testing
![Page 15: AWS Webinar - Dynamo DB + Redshift 13_09_19](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c638884a7959c9388b4666/html5/thumbnails/15.jpg)
“Compared to similar products, DynamoDB
provides an amazing feature set, including super
low latencies, (literally) push-button scaling,
automatic data persistence, and seamless
integration with Redshift and other AWS services.”
Peter Bogunovich, RightAction Inc
![Page 16: AWS Webinar - Dynamo DB + Redshift 13_09_19](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c638884a7959c9388b4666/html5/thumbnails/16.jpg)
AD SERVING
![Page 17: AWS Webinar - Dynamo DB + Redshift 13_09_19](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c638884a7959c9388b4666/html5/thumbnails/17.jpg)
EC2
Profiles Database
ad request
ad url
visitor
Ad Servers
DynamoDB
1. Visitor loads a web page
2. Web page issues a request to ad servers on EC2
3. Query to DynamoDB returns the ad to display
4. Link is returned to visitor
cookie hash=userid range=timestamp
user-profile hash=userid
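The two table layouts named on this slide can be sketched as key schemas plus the lookups an ad server might issue. This is an illustration under assumptions: `userid` is taken to be a string attribute, the helper names are hypothetical, and the parameter names follow the DynamoDB Query and GetItem APIs. Nothing is sent to AWS.

```python
# cookie table: hash=userid, range=timestamp (as on the slide).
COOKIE_KEY_SCHEMA = [
    {"AttributeName": "userid", "KeyType": "HASH"},
    {"AttributeName": "timestamp", "KeyType": "RANGE"},
]

# user-profile table: hash=userid only.
USER_PROFILE_KEY_SCHEMA = [
    {"AttributeName": "userid", "KeyType": "HASH"},
]


def latest_cookies_query(user_id, limit=1):
    """Query the newest cookie entries for one user (descending range key)."""
    return {
        "TableName": "cookie",
        "KeyConditionExpression": "userid = :u",
        "ExpressionAttributeValues": {":u": {"S": user_id}},
        "ScanIndexForward": False,  # newest timestamp first
        "Limit": limit,
    }


def profile_lookup(user_id):
    """Fetch the single profile item for one user."""
    return {
        "TableName": "user-profile",
        "Key": {"userid": {"S": user_id}},
    }


cookie_req = latest_cookies_query("user-42")
profile_req = profile_lookup("user-42")
```

Keeping the time-ordered cookie entries under a range key lets one query return only the most recent items for a user, while the profile table stays a single-item lookup.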
![Page 18: AWS Webinar - Dynamo DB + Redshift 13_09_19](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c638884a7959c9388b4666/html5/thumbnails/18.jpg)
EC2
Profiles Database Ad Servers
DynamoDB
Real-time bidding platform
Bidder DynamoDB
Ads Profiles Queues and Buffer Bid response
20 ms
20 ms 20 ms 40 ms
Request network transit
Response network transit Decision on best ad and bid price based on optimization that needs multiple data look-ups
Contingency time buffer
…
Bid request
real-time bidding
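The timing budget on this slide can be worked through as simple arithmetic. The slide lists four budgets (20, 20, 20 and 40 ms) and four stages, but the extracted text does not show which number belongs to which stage, so the assignment below is an assumption, as are the helper name and the sample latencies.

```python
# Assumption: which stage gets which budget is a guess; only the four
# numbers and the four stage names come from the slide.
BUDGET_MS = {
    "request_network_transit": 20,
    "response_network_transit": 20,
    "contingency_buffer": 20,
    "decision_and_lookups": 40,
}

TOTAL_BUDGET_MS = sum(BUDGET_MS.values())


def lookups_fit(lookup_latencies_ms, decision_overhead_ms=0.0):
    """True if sequential data look-ups plus overhead fit the decision budget."""
    spent = sum(lookup_latencies_ms) + decision_overhead_ms
    return spent <= BUDGET_MS["decision_and_lookups"]


# Three hypothetical look-ups of 5 ms each plus 10 ms of bid logic.
fits = lookups_fit([5, 5, 5], decision_overhead_ms=10)

# Three hypothetical look-ups of 15 ms each exceed the decision budget.
too_slow = lookups_fit([15, 15, 15])
```

Under this split, several look-ups must share the decision stage, which is why predictable single-digit millisecond reads matter for the bidder.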
![Page 19: AWS Webinar - Dynamo DB + Redshift 13_09_19](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c638884a7959c9388b4666/html5/thumbnails/19.jpg)
EC2
Profiles Database
ad request
ad url
visitor
Ad Servers
DynamoDB
1. Ad files are downloaded from CloudFront
2. Impressions captured in logs to S3
CloudFront
advertisement
impression logs
Static Repository Files
Amazon S3
![Page 20: AWS Webinar - Dynamo DB + Redshift 13_09_19](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c638884a7959c9388b4666/html5/thumbnails/20.jpg)
CloudFront
advertisement
impression logs
Static Repository Files
Amazon S3
Profiles Database
EC2 (MAZ)
ad request
ad url
Ad Servers
DynamoDB Elastic Load Balancing
visitor
Click-through Servers
click through log files
click through requests
Elastic Load Balancing
![Page 21: AWS Webinar - Dynamo DB + Redshift 13_09_19](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c638884a7959c9388b4666/html5/thumbnails/21.jpg)
Amazon Redshift
![Page 22: AWS Webinar - Dynamo DB + Redshift 13_09_19](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c638884a7959c9388b4666/html5/thumbnails/22.jpg)
Relational data warehouse
Massively parallel
Petabyte scale
Fully managed; zero admin
Amazon Redshift
![Page 23: AWS Webinar - Dynamo DB + Redshift 13_09_19](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c638884a7959c9388b4666/html5/thumbnails/23.jpg)
• Direct-attached storage
• Large data block sizes
• Columnar storage
• Data compression
• Zone maps
Redshift dramatically reduces I/O
Id Age State
123 20 CA
345 25 WA
678 40 FL
Row storage Column storage
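The I/O saving from columnar storage and zone maps can be illustrated with the three rows from this slide. This is a toy model in plain Python, not how Redshift is implemented; the longer `ages` list, the block size, and the function names are hypothetical.

```python
# The three rows shown on the slide: (Id, Age, State).
rows = [(123, 20, "CA"), (345, 25, "WA"), (678, 40, "FL")]

# Row storage keeps each record together; column storage keeps each
# attribute together.
row_layout = [value for row in rows for value in row]
columns = {
    "Id": [r[0] for r in rows],
    "Age": [r[1] for r in rows],
    "State": [r[2] for r in rows],
}

# To average Age, a row store touches every value of every row,
# while a column store touches only the Age column.
values_read_row_store = len(row_layout)
values_read_column_store = len(columns["Age"])
average_age = sum(columns["Age"]) / len(columns["Age"])


def build_zone_map(values, block_size):
    """Split a column into blocks and record (min, max, block) for each."""
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    return [(min(b), max(b), b) for b in blocks]


def scan_with_zone_map(zone_map, lo, hi):
    """Return matching values and the number of blocks actually read."""
    matches, blocks_read = [], 0
    for block_min, block_max, block in zone_map:
        if block_max < lo or block_min > hi:
            continue  # the whole block lies outside the range: skip it
        blocks_read += 1
        matches.extend(v for v in block if lo <= v <= hi)
    return matches, blocks_read


# Hypothetical longer Age column, stored sorted in blocks of two values.
ages = [20, 22, 25, 31, 40, 44]
zone_map = build_zone_map(ages, block_size=2)
matches, blocks_read = scan_with_zone_map(zone_map, lo=24, hi=35)
```

In this toy case the range scan reads one block out of three, and the aggregate reads three values instead of nine.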
![Page 24: AWS Webinar - Dynamo DB + Redshift 13_09_19](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c638884a7959c9388b4666/html5/thumbnails/24.jpg)
• Load
• Query
• Resize
• Backup
• Restore
Redshift parallelizes and distributes everything
Compute Node 16TB
10 GigE (HPC)
Ingestion Backup Restore
SQL Clients / BI Tools
Amazon S3
Client VPC
Compute Node 16TB
Compute Node 16TB
Leader Node
![Page 25: AWS Webinar - Dynamo DB + Redshift 13_09_19](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c638884a7959c9388b4666/html5/thumbnails/25.jpg)
Start small and grow big
Extra Large Node (HS1.XL): 3 spindles, 2 TB, 16 GB RAM, 2 cores, 10 GigE
Cluster 2-32 Nodes (4 TB – 64 TB)
Eight Extra Large Node (HS1.8XL): 24 spindles, 16 TB, 128 GB RAM, 16 cores, 10 GigE
Cluster 2-100 Nodes (32 TB – 1.6 PB)
note: nodes not to scale
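The capacity range quoted for the HS1.8XL cluster follows from the per-node storage. A small worked calculation, assuming capacity is simply nodes times per-node storage and 1 PB = 1000 TB; the function name is hypothetical.

```python
HS1_XL_TB = 2     # storage per Extra Large node, from the slide
HS1_8XL_TB = 16   # storage per Eight Extra Large node, from the slide


def cluster_capacity_tb(nodes, tb_per_node):
    """Raw cluster capacity in TB: node count times per-node storage."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return nodes * tb_per_node


smallest_8xl_tb = cluster_capacity_tb(2, HS1_8XL_TB)     # 2 x 16 TB
largest_8xl_tb = cluster_capacity_tb(100, HS1_8XL_TB)    # 100 x 16 TB
largest_8xl_pb = largest_8xl_tb / 1000                   # TB to PB
two_node_xl_tb = cluster_capacity_tb(2, HS1_XL_TB)       # 2 x 2 TB
```

Two HS1.8XL nodes give 32 TB and one hundred give 1600 TB, or 1.6 PB, which matches the range on the slide; a two-node HS1.XL cluster holds 4 TB.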
![Page 26: AWS Webinar - Dynamo DB + Redshift 13_09_19](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c638884a7959c9388b4666/html5/thumbnails/26.jpg)
Monitor query performance
![Page 27: AWS Webinar - Dynamo DB + Redshift 13_09_19](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c638884a7959c9388b4666/html5/thumbnails/27.jpg)
View explain plans
![Page 28: AWS Webinar - Dynamo DB + Redshift 13_09_19](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c638884a7959c9388b4666/html5/thumbnails/28.jpg)
Redshift works with existing BI tools
JDBC/ODBC
Amazon Redshift
More coming soon…
![Page 29: AWS Webinar - Dynamo DB + Redshift 13_09_19](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c638884a7959c9388b4666/html5/thumbnails/29.jpg)
Redshift is Priced to Analyze All Your Data
$0.85 per hour for on-demand (2TB)
$999 per TB per year (3-yr reservation)
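The two prices can be put on the same footing with a short calculation. This sketch assumes the $0.85 hourly rate applies to one 2 TB node running continuously for a 365-day year; the variable and function names are hypothetical.

```python
ON_DEMAND_PER_HOUR = 0.85      # USD per hour for a 2 TB node, from the slide
NODE_TB = 2
RESERVED_PER_TB_YEAR = 999.0   # USD per TB per year, 3-year reservation
HOURS_PER_YEAR = 24 * 365


def on_demand_cost_per_tb_year(rate_per_hour, node_tb, hours=HOURS_PER_YEAR):
    """Yearly on-demand cost per TB for a node that runs all year."""
    return round(rate_per_hour * hours / node_tb, 2)


on_demand_per_tb_year = on_demand_cost_per_tb_year(ON_DEMAND_PER_HOUR, NODE_TB)
reserved_discount_factor = on_demand_per_tb_year / RESERVED_PER_TB_YEAR
```

Under these assumptions, on-demand works out to about $3,723 per TB per year, a little under four times the reserved figure.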
![Page 30: AWS Webinar - Dynamo DB + Redshift 13_09_19](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c638884a7959c9388b4666/html5/thumbnails/30.jpg)
“Amazon Redshift introduces a major
opportunity to improve the performance of
our real-time reporting, allowing us to run
queries up to 50 times faster than our current
OLAP solution.” – Niek Sanders, VP Engineering
Realized a 20x – 40x
reduction in query times
“Redshift is the
real deal”
![Page 31: AWS Webinar - Dynamo DB + Redshift 13_09_19](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c638884a7959c9388b4666/html5/thumbnails/31.jpg)
Analysis
![Page 32: AWS Webinar - Dynamo DB + Redshift 13_09_19](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c638884a7959c9388b4666/html5/thumbnails/32.jpg)
CloudFront
advertisement
impression logs
Static Repository Files
Amazon S3
Profiles Database
EC2 (MAZ)
ad request
ad url
Ad Servers
DynamoDB Elastic Load Balancing
visitor
Amazon Redshift
bid history user history
ETL Click-through Servers
click through log files
click through requests
Elastic Load Balancing
Amazon EMR
updated profiles
impressions
new requests user history
![Page 33: AWS Webinar - Dynamo DB + Redshift 13_09_19](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c638884a7959c9388b4666/html5/thumbnails/33.jpg)
Amazon Redshift
Optimizing with Redshift
Bid Optimization
Drive qualified users to advertiser’s sites
• Ad server logs
• 3rd party data
• Bid history
• User history
Cost Optimization
Optimize return on advertising expenditure
• Impressions
• 3rd party data
• User history
• Enrichment
![Page 34: AWS Webinar - Dynamo DB + Redshift 13_09_19](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c638884a7959c9388b4666/html5/thumbnails/34.jpg)
1. Describe the full lifecycle of data
Identify data consumption patterns, expected data volumes and SLAs (latency, availability, durability) at each point on the timeline
2. Leverage specialized options
DynamoDB – real-time transaction processing
Redshift – online reporting and analysis
EMR – enrichment
S3 – data staging
Three steps to optimal data performance
![Page 35: AWS Webinar - Dynamo DB + Redshift 13_09_19](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c638884a7959c9388b4666/html5/thumbnails/35.jpg)
3. Optimize access patterns
Design database schemas for maximum efficiency
DynamoDB
» minimize payloads
» separate hot data from cold
Redshift
» good distribution and sort key selection – test as needed
» efficient ingestion (from DynamoDB and S3)
Three steps to optimal data performance
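The Redshift points on this slide (distribution key, sort key, ingestion from DynamoDB and S3) can be sketched as the SQL an ETL job might generate. This is an illustration under assumptions: the `impressions` table, its columns, the role ARN and the S3 path are hypothetical placeholders, and the helper names are invented for this example. The statements use the Redshift `DISTKEY` and `SORTKEY` column attributes and the `COPY` command with its `READRATIO` option.

```python
# Sketch only: generates SQL text; it does not connect to a cluster.
ROLE_ARN = "arn:aws:iam::123456789012:role/example-redshift-load"  # placeholder


def create_table_sql():
    """Distribute rows by user and sort by event time."""
    return (
        "CREATE TABLE impressions (\n"
        "  userid VARCHAR(64) DISTKEY,\n"
        "  event_time TIMESTAMP SORTKEY,\n"
        "  ad_id VARCHAR(64)\n"
        ");"
    )


def copy_from_dynamodb_sql(table, dynamodb_table, read_ratio=50):
    """Load from DynamoDB, capped at a share of its provisioned reads."""
    if not 1 <= read_ratio <= 200:
        raise ValueError("read_ratio must be between 1 and 200")
    return (
        f"COPY {table} FROM 'dynamodb://{dynamodb_table}' "
        f"IAM_ROLE '{ROLE_ARN}' READRATIO {read_ratio};"
    )


def copy_from_s3_sql(table, s3_prefix):
    """Load gzip-compressed, pipe-delimited files under one S3 prefix."""
    return (
        f"COPY {table} FROM '{s3_prefix}' "
        f"IAM_ROLE '{ROLE_ARN}' GZIP DELIMITER '|';"
    )


ddl = create_table_sql()
dynamo_copy = copy_from_dynamodb_sql("impressions", "impressions-live")
s3_copy = copy_from_s3_sql("impressions", "s3://example-bucket/impressions/")
```

Capping `READRATIO` keeps a bulk load from consuming the read throughput that live traffic on the DynamoDB table depends on, and loading many files from one S3 prefix lets the cluster ingest in parallel.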
![Page 36: AWS Webinar - Dynamo DB + Redshift 13_09_19](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c638884a7959c9388b4666/html5/thumbnails/36.jpg)
DynamoDB
• Best Practices, How-Tos, and Tools
• http://aws.amazon.com/dynamodb/resources/
• Download DynamoDB Local
• http://aws.amazon.com/dynamodb/resources/#testing
Redshift
• Best practices for loading data
• http://docs.aws.amazon.com/redshift/latest/dg/c_loading-data-best-practices.html
• Best practices for designing tables
• http://docs.aws.amazon.com/redshift/latest/dg/c_designing-tables-best-practices.html
Resources
![Page 37: AWS Webinar - Dynamo DB + Redshift 13_09_19](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c638884a7959c9388b4666/html5/thumbnails/37.jpg)
Questions