Riga Dev Day: Lambda Architecture at AWS
TRANSCRIPT
Is Lambda Architecture really a new normal for cloud native apps?
λ+
:~ whoami: Antons Kranga
Full stack developer, ~15 years
Cloud Architect
DevOps evangelist
Innovation Center of Accenture Cloud Platform
Speaker
Marathon runner
Motivation
What is Streaming?
We often want to deploy data models based on new data that continuously arrives from multiple sources
Challenges
Users expect data to appear immediately after it arrives
Fault tolerance
Distributed data consistency
Scalability (how not to lose data when scaling down)
What is “λ”
[Diagram: new data enters both the Speed Layer, which maintains a realtime view, and the Batch Layer, which holds master data and map-reduces it into views; the Serving Layer answers queries over the views.]
What is “λ” architecture
Batch Layer: Master data sets and pre-computed aggregations
• Slow data ingestion – intervals of minutes to days
• Append-only data sets eventually supersede data captured in the speed layer
Speed Layer: High-throughput, near-real-time data ingestion
• Fast data ingestion – intervals of seconds
• Concurrent information processing
• Retrieval of the most recent information
Serving Layer: Provides query capability over the Batch Layer
• Low-latency ad-hoc queries
• May also provide access to speed layer views
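The three layers described above can be sketched in a few lines of plain Java. This is only an illustration under simplifying assumptions: everything lives in memory, the "view" is a per-key event count, and the class and method names (LambdaSketch, ingest, runBatch, query) are hypothetical rather than anything from the talk or from AWS.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy lambda architecture: counts events per key across a batch and a speed layer. */
final class LambdaSketch {
    // Batch layer: append-only master data set plus a view pre-computed from it.
    private final List<String> masterData = new ArrayList<>();
    private Map<String, Long> batchView = new HashMap<>();
    // Speed layer: incremental view over events the last batch run has not seen.
    private final Map<String, Long> realtimeView = new HashMap<>();

    /** New data goes to both layers: appended to master data, counted in the realtime view. */
    void ingest(String key) {
        masterData.add(key);
        realtimeView.merge(key, 1L, Long::sum);
    }

    /** Slow path: recompute the batch view from all master data; it supersedes the speed layer. */
    void runBatch() {
        Map<String, Long> fresh = new HashMap<>();
        for (String key : masterData) {
            fresh.merge(key, 1L, Long::sum);
        }
        batchView = fresh;
        realtimeView.clear();
    }

    /** Serving layer: merge the batch view with the realtime view at query time. */
    long query(String key) {
        return batchView.getOrDefault(key, 0L) + realtimeView.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        LambdaSketch app = new LambdaSketch();
        app.ingest("page-a");
        app.ingest("page-a");
        app.runBatch();          // batch view now holds page-a = 2
        app.ingest("page-a");    // only the speed layer has seen this event
        System.out.println(app.query("page-a")); // prints 3
    }
}
```

The point of the split is that a query never waits for the slow batch run: whatever the batch view has not absorbed yet is answered from the realtime view.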
Why go Cloud Native?
Cloud Provider Lock-In
Avoid “Yak shaving”
Rely on managed services
DevOps automation
Lower operating costs
Transparent integration with other “Cloud Native” services
AWS Blueprint for Lambda Architectures
https://d0.awsstatic.com/whitepapers/lambda-architecure-on-for-batch-aws.pdf
Published in July 2015
Amazon Kinesis
Amazon Kinesis–enabled app
S3 buckets
Amazon EMR
speed layer
batch layer
EMR on serving and merging layer
Data services from AWS
Kinesis
aws region
az1 az2 az3
Lambda
S3 storage
Redshift
consumers
EC2 Instance
EMR
producers
AmazonKinesis kinesis = ...
...
PutRecordRequest putRecord = new PutRecordRequest();
putRecord.setStreamName(streamName);
putRecord.setData(ByteBuffer.wrap(bytes));
putRecord.setSequenceNumberForOrdering(null);
...
kinesis.putRecord(putRecord);
Producer
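AmazonKinesisClient kinesisClient = ...
GetShardIteratorRequest iteratorReq = ...
iteratorReq.setStreamName("my-kinesis");
iteratorReq.setShardIteratorType("TRIM_HORIZON");
...
// getRecords needs a shard iterator, so obtain one first
String shardIterator = kinesisClient.getShardIterator(iteratorReq).getShardIterator();
GetRecordsRequest req = new GetRecordsRequest();
req.setShardIterator(shardIterator);
GetRecordsResult result = kinesisClient.getRecords(req);
List<Record> records = result.getRecords();
for (Record record : records) {
    ... = record.getData();
}
Consumer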
Kinesis streams
What: Enables building near-real-time data processing applications
Use cases:
• Real-time analytics
• Log file processing
• Reporting
Durability: data streams are replicated across 3 AZs
Kinesis streams
Cost Model:
Shard Hour:
• 5 read transactions per second
• 2 MB data read per second
• 1,000 write transactions per second
• 1 MB data write per second
approx. 12.5 USD/Mo
Extended data retention
• Up to 7 days
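As a rough sizing aid, the per-shard data rates above (1 MB/s written, 2 MB/s read) can be turned into a shard count. The sketch below is an assumption-laden estimate, not an AWS tool: it ignores the per-second transaction limits, treats the approximate 12.5 USD/Mo figure as a per-shard price, and the names ShardSizing, shardsNeeded and monthlyCostUsd are made up for illustration.

```java
/** Rough shard sizing from the per-shard data-rate limits listed above. */
final class ShardSizing {
    static final double WRITE_MB_PER_SHARD = 1.0;   // 1 MB data write per second
    static final double READ_MB_PER_SHARD = 2.0;    // 2 MB data read per second
    static final double USD_PER_SHARD_MONTH = 12.5; // approximate figure from the slide (assumed per shard)

    /** Shards needed so that neither the write nor the read data rate exceeds a shard's limit. */
    static int shardsNeeded(double writeMbPerSec, double readMbPerSec) {
        int forWrites = (int) Math.ceil(writeMbPerSec / WRITE_MB_PER_SHARD);
        int forReads = (int) Math.ceil(readMbPerSec / READ_MB_PER_SHARD);
        return Math.max(1, Math.max(forWrites, forReads));
    }

    /** Approximate monthly cost for a given number of shards. */
    static double monthlyCostUsd(int shards) {
        return shards * USD_PER_SHARD_MONTH;
    }

    public static void main(String[] args) {
        // Hypothetical workload: 3.5 MB/s written, 10 MB/s read by all consumers together.
        int shards = shardsNeeded(3.5, 10.0);
        System.out.println(shards + " shards, about " + monthlyCostUsd(shards) + " USD/Mo");
    }
}
```

For the hypothetical workload in main, reads dominate: 10 MB/s at 2 MB/s per shard needs 5 shards, while the writes alone would need 4.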
Kinesis streams
Not good when:
• Small-scale throughput (less than 200 KB/sec)
• Long-term data storage (more than 24 h)
Lambda
What: Lambda lets you write functions without running an actual server
Use cases:
• Real-time stream processing
• Tiny ETL
• In a few cases can replace EC2
• Process IaaS events
Runtimes: Java 8, NodeJS, Python
Backed by: provides /tmp for ephemeral storage.
Durability: No maintenance windows, 3 retries before failure
Lambda
Cost Model:
Requests per function:
• Billed in GB-seconds
• Billing step of 100 ms
• $0.20 per million requests; $0.00001667 per GB-second
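To see how these rates combine, here is a small estimate in Java. It is a sketch under stated assumptions: it uses only the two rates and the 100 ms billing step quoted above, ignores any free tier, and the workload numbers and names (LambdaCost, billedMillis, monthlyCostUsd) are hypothetical.

```java
/** Estimates a monthly Lambda bill from the rates quoted above (free tier ignored). */
final class LambdaCost {
    static final double USD_PER_MILLION_REQUESTS = 0.20;
    static final double USD_PER_GB_SECOND = 0.00001667;
    static final long BILLING_STEP_MS = 100;

    /** Rounds a duration up to the next 100 ms billing step. */
    static long billedMillis(long durationMs) {
        return ((durationMs + BILLING_STEP_MS - 1) / BILLING_STEP_MS) * BILLING_STEP_MS;
    }

    /** Request charge plus compute charge (billed seconds times memory in GB). */
    static double monthlyCostUsd(long requests, long avgDurationMs, int memoryMb) {
        double requestCost = requests / 1_000_000.0 * USD_PER_MILLION_REQUESTS;
        double gbSeconds = requests * (billedMillis(avgDurationMs) / 1000.0) * (memoryMb / 1024.0);
        return requestCost + gbSeconds * USD_PER_GB_SECOND;
    }

    public static void main(String[] args) {
        // Hypothetical workload: 3 million requests, 120 ms each, 512 MB of memory.
        System.out.printf("%.2f USD/Mo%n", monthlyCostUsd(3_000_000L, 120, 512));
    }
}
```

Note the effect of the billing step: a 120 ms invocation is billed as 200 ms, so in this example the compute part (300,000 GB-seconds, about 5.00 USD) far outweighs the request part (0.60 USD).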
Lambda
Not good when:
• Maximum timeout of 300 sec (cannot be raised)
• Forces the developer to think stateless
• Highly dynamic web sites
• Competes with t2.nano ($4.75/month)
[Diagram: Lambda shown with S3 storage, SNS, Kinesis, and consumers; the function is packaged as myApp.ZIP (Java 8, Python, NodeJS).]
EMR
What: Managed Apache Hadoop service
Use cases:
• MapReduce data processing
• Large data ETL jobs
• Data movement
• Log processing and analytics
Backed by: a single EC2 instance or a cluster of them
Durability: provided by S3 at the storage level
See more: https://media.amazonwebservices.com/AWS_Amazon_EMR_Best_Practices.pdf
EMR
Cost Model:
• Charges follow the EC2 instance-size model
• S3 storage charges apply ($0.03 per GB/Mo)
EMR
Not good when:
• Small to medium data sets
• ACID (atomicity, consistency, isolation, durability) is required
• Competes with RDS (Aurora) and DynamoDB
S3
What: Fully managed persistent storage
Use cases:
• Static content web sites
• File storage (primarily for reading)
• Archive storage
Backed by: covered by AWS S3 SLA
Durability: storage: 99.999999999%; availability: 99.99%
S3
Cost Model: GB/Mo
• Standard Storage: $0.03 per GB/Mo
• Infrequent Access Storage: $0.0125 per GB/Mo
• Glacier Storage: $0.007 per GB/Mo
S3
Not good when:
• S3 writes can be slow
• Glacier can restore only up to 5% of storage per month
Redshift
What: Petabyte-scale data warehouse as a managed service
Use cases:
• Data warehouse (OLAP)
• BI and ETL
• Storing large historical data
Backed by: AWS provides automatic data backup
Durability: provided by S3 at the storage level
Scaling: Start with a 160 GB node and scale from there
Redshift
Cost Model:
• Charges follow the EC2 instance-size model
• S3 storage charges apply ($0.03 per GB/Mo)
Redshift
Not good when:
• OLTP (online transaction processing)
• Unstructured data
• Blob storage
[Diagram: a producer writes to Kinesis shards. Speed layer: EC2, Process Stream. Batch layer: S3 Bucket, Map Red. Serving layer: View in DynamoDB. A Primer Lambda runs every hour.]
[Diagram: computation per hour. A Lambda runs every hour along the time axis t (h0, h1, h2, h3), with the batch layer and the speed layer marked on the timeline.]
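One way to read the hourly timeline is that closed hours are answered from the batch view while the hour still in progress is answered from the speed layer. The sketch below illustrates that reading only; it is not the implementation from the talk. An in-memory list stands in for the S3 bucket, a method call stands in for the hourly Lambda, and the names HourlyViews, onEvent, hourlyJob and count are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hourly hand-off: closed hours come from the batch view, the open hour from the speed view. */
final class HourlyViews {
    static final long HOUR_MS = 3_600_000L;

    private final List<Long> masterData = new ArrayList<>();      // stand-in for the S3 bucket
    private Map<Long, Long> batchView = new HashMap<>();          // hour index -> event count
    private final Map<Long, Long> speedView = new HashMap<>();    // hour index -> event count

    static long hourOf(long epochMillis) {
        return epochMillis / HOUR_MS;
    }

    /** Every event is stored in master data and counted at once in the speed view. */
    void onEvent(long epochMillis) {
        masterData.add(epochMillis);
        speedView.merge(hourOf(epochMillis), 1L, Long::sum);
    }

    /** Hourly job: recompute counts for all closed hours, then drop those hours from the speed view. */
    void hourlyJob(long nowMillis) {
        long openHour = hourOf(nowMillis);
        Map<Long, Long> fresh = new HashMap<>();
        for (long t : masterData) {
            long hour = hourOf(t);
            if (hour < openHour) {
                fresh.merge(hour, 1L, Long::sum);
            }
        }
        batchView = fresh;
        speedView.keySet().removeIf(hour -> hour < openHour);
    }

    /** Serving layer: prefer the batch result for an hour, fall back to the speed view. */
    long count(long hour) {
        Long batched = batchView.get(hour);
        return batched != null ? batched : speedView.getOrDefault(hour, 0L);
    }

    public static void main(String[] args) {
        HourlyViews views = new HourlyViews();
        views.onEvent(10L);               // falls into h0
        views.onEvent(HOUR_MS + 5L);      // falls into h1
        views.onEvent(HOUR_MS + 6L);      // falls into h1
        views.hourlyJob(HOUR_MS + 7L);    // runs during h1: h0 is closed, h1 is still open
        System.out.println(views.count(0) + " " + views.count(1)); // prints "1 2"
    }
}
```

A design consequence worth noting: an event that arrives late for an already closed hour only shows up after the next hourly run, because the batch result takes precedence for that hour.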
[Diagram: the same pipeline (producer, Kinesis shards, speed layer on EC2, batch layer with S3 Bucket and Map Red, serving layer View in DynamoDB, Primer Lambda every hour) with a second Lambda (every hour) and a Presentation Layer: JS app, Lambda.]
Lessons learned
It is better but not simple
Not everything is automated
Questions?
Thank you!