Download - Roger Barga - Processing Big Data in Motion
![Page 1: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/1.jpg)
Processing Big Data in MotionStreaming Data Ingestion and Processing
Roger Barga, General Manager, Kinesis Streaming Services, AWS
April 7, 2016
![Page 2: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/2.jpg)
Riding the Streaming Rapids
2011 20152007 & 2008 2013201220102009 2016
Azure Stream Analytics
Complex Event Processingover Streaming Data
Relational Semanticsand Implementation
Streaming Map Reduce& Machine Learning over Streams
![Page 3: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/3.jpg)
Interest in and demand for
stream data processing is rapidly
increasing*…* Understatement of the year…
![Page 4: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/4.jpg)
Most data is produced continuously
Why?
{
"payerId": "Joe",
"productCode": "AmazonS3",
"clientProductCode": "AmazonS3",
"usageType": "Bandwidth",
"operation": "PUT",
"value": "22490",
"timestamp": "1216674828"
}
Metering Record
127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif
HTTP/1.0" 200 2326
Common Log Entry
<165>1 2003-10-11T22:14:15.003Z
mymachine.example.com evntslog - ID47
[exampleSDID@32473 iut="3" eventSource="Application"
eventID="1011"][examplePriority@32473 class="high"]
Syslog Entry
“SeattlePublicWater/Kinesis/123/Realtime” –
412309129140
MQTT Record<R,AMZN,T,G,R1>
NASDAQ OMX Record
![Page 5: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/5.jpg)
Time is money• Perishable Insights (Forrester)
Why?
• Hourly server logs: how your systems were misbehaving an hour ago
• Weekly / Monthly Bill: What you spent this past billing
cycle?
• Daily fraud reports: tells you if there was fraud yesterday
• CloudWatch metrics: what just went wrong now
• Real-time spending alerts/caps: guaranteeing you
can’t overspend
• Real-time detection: blocks fraudulent use now
![Page 6: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/6.jpg)
Time is money• Perishable Insights (Forrester)
• A more efficient implementation
• Most ‘Big Data’ deployments process
continuously generated data (batched)
Why?
![Page 7: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/7.jpg)
Availability Variety of stream data processing systems,
active ecosystem but still early days…
Why?
![Page 8: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/8.jpg)
Disruptive Foundational for business critical workflows
Enable new class of applications & services
that process data continuously.
Why?
![Page 9: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/9.jpg)
Need to begin thinking about applications &
services in terms of streams of data and
continuous processing.
You
A change in perspective is worth 80 IQ points… – Alan Kay
![Page 10: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/10.jpg)
• Scalable & Durable Data Ingest A quick word on our motivation
Kinesis Streams, through a simple example
• Continuous Stream Data Processing Kinesis Client Library (KCL)
One select design challenge: dynamic resharding
How customers are using Kinesis Streams today
• Building on Kinesis Streams Kinesis Firehose
AWS Event Driven Computing
Agenda
![Page 11: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/11.jpg)
Our Motivation for Continuous Processing
AWS Metering service• 100s of millions of billing records per second
• Terabytes++ per hour
• Hundreds of thousands of sources
• For each customer: gather all metering records & compute monthly bill
• Auditors guarantee 100% accuracy at months end
Seem perfectly reasonable to run as a batch, but relentless pressure for realtime…
With a Data Warehouse to load• 1000s extract-transform-load (ETL) jobs every day
• Hundreds of thousands of files per load cycle
• Thousands of daily users, hundreds of queries per hour
![Page 12: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/12.jpg)
Our Motivation for Continuous Processing
AWS Metering service• 100s of millions of billing records per second
• Terabytes++ per hour
• Hundreds of thousands of sources
• For each customer: gather all metering records & compute monthly bill
• Auditors guarantee 100% accuracy at months end
Other Service Teams, Similar Requirements• CloudWatch Logs and CloudWatch Metrics
• CloudFront API logging
• ‘Snitch’ internal datacenter hardware metrics
![Page 13: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/13.jpg)
Real-time Ingest
• Highly Scalable
• Durable
• Replayable Reads
Continuous Processing
• Support multiple simultaneous
data processing applications
• Load-balancing incoming
streams, scale out processing
• Fault-tolerance, Checkpoint /
Replay
Right Tool for the Job Enable Streaming Data Ingestion and Processing
![Page 14: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/14.jpg)
twitter-trends.com
Elastic Beanstalk
twitter-trends.com
Example applicationtwitter-trends.com website
![Page 15: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/15.jpg)
twitter-trends.com
Too big to handle on one box
![Page 16: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/16.jpg)
twitter-trends.com
The solution: streaming map/reduce
My top-10
My top-10
My top-10
Global top-10
![Page 17: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/17.jpg)
twitter-trends.com
Core concepts
My top-10
My top-10
My top-10
Global top-10
Data recordStream
Partition key
ShardWorker
Shard: 14 17 18 21 23
Data record
Sequence number
![Page 18: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/18.jpg)
twitter-trends.com
How this relates to Kinesis
KinesisKinesis application
![Page 19: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/19.jpg)
Kinesis Streaming Data Ingestion
• Streams are made of Shards
• Each Shard ingests data up to
1MB/sec, and up to 1000 TPS
• Producers use a PUT call to store
data in a Stream: PutRecord {Data,
PartitionKey, StreamName}
• Each Shard emits up to 2 MB/sec
• All data is stored for 24 hours, 7
days if extended retention is ‘ON’
• Scale Kinesis streams by adding
or removing Shards
• Replay data from retention period
![Page 20: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/20.jpg)
Amazon Web Services
AZ AZ AZ
Durable, highly consistent storage replicates dataacross three data centers (availability zones)
Aggregate andarchive to S3
Millions ofsources producing100s of terabytes
per hour
FrontEnd
AuthenticationAuthorization
Ordered streamof events supportsmultiple readers
Real-timedashboardsand alarms
Machine learningalgorithms or
sliding windowanalytics
Aggregate analysisin Hadoop or adata warehouse
Inexpensive: $0.028 per million puts
Real-Time Streaming Data Ingestion
Custom-built
Streaming
Applications
(KCL)
Inexpensive: $0.014 per 1,000,000 PUT Payload Units
25 – 40ms 100 – 150ms
![Page 21: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/21.jpg)
Kinesis Client Library
![Page 22: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/22.jpg)
twitter-trends.com
Using the Kinesis API directly
K
I
N
E
S
I
S
![Page 23: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/23.jpg)
twitter-trends.com
Using the Kinesis API directlyK
I
N
E
S
I
S
iterator = getShardIterator(shardId, LATEST);
while (true) {
[records, iterator] =
getNextRecords(iterator, maxRecsToReturn);
process(records);
}
process(records): {
for (record in records) {
updateLocalTop10(record);
}
if (timeToDoOutput()) {
writeLocalTop10ToDDB();
}
}
while (true) {
localTop10Lists =
scanDDBTable();
updateGlobalTop10List(
localTop10Lists);
sleep(10);
}
![Page 24: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/24.jpg)
K
I
N
E
S
I
S
twitter-trends.com
Challenges with using the Kinesis API directly
Kinesis
application
Manual creation of workers and
assignment to shards
![Page 25: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/25.jpg)
K
I
N
E
S
I
S
twitter-trends.com
Challenges with using the Kinesis API directly
Kinesis
application
How many workers
per EC2 instance?
![Page 26: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/26.jpg)
K
I
N
E
S
I
S
twitter-trends.com
Challenges with using the Kinesis API directly
Kinesis
application
How many EC2 instances?
![Page 27: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/27.jpg)
K
I
N
E
S
I
S
twitter-trends.com
Using the Kinesis Client Library
Kinesis
application
Shard mgmt
table
![Page 28: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/28.jpg)
K
I
N
E
S
I
S
twitter-trends.com
Elasticity and Load Balancing
Shard mgmt
table
![Page 29: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/29.jpg)
K
I
N
E
S
I
S
twitter-trends.com
Elasticity and Load Balancing
Auto
scaling
Group
Shard mgmt
table
![Page 30: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/30.jpg)
K
I
N
E
S
I
S
twitter-trends.com
Elasticity and Load Balancing
Auto
scaling
Group
Shard mgmt
table
![Page 31: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/31.jpg)
K
I
N
E
S
I
S
twitter-trends.com
Elasticity and Load Balancing
Shard mgmt
table
Auto
scaling
Group
![Page 32: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/32.jpg)
K
I
N
E
S
I
S
twitter-trends.com
Elasticity and Load Balancing
Shard mgmt
table
Auto
scaling
Group
![Page 33: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/33.jpg)
K
I
N
E
S
I
S
twitter-trends.com
Fault Tolerance Support
Shard mgmt
table
Availability Zone
1
Availability Zone
3
![Page 34: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/34.jpg)
K
I
N
E
S
I
S
twitter-trends.com
Fault Tolerance Support
Shard mgmt
table
XAvailability Zone
1
Availability Zone
3
![Page 35: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/35.jpg)
K
I
N
E
S
I
S
twitter-trends.com
Fault Tolerance Support
Shard mgmt
table
XAvailability Zone
1
Availability Zone
3
![Page 36: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/36.jpg)
K
I
N
E
S
I
S
twitter-trends.com
Fault Tolerance Support
Shard mgmt
tableAvailability Zone
3
![Page 37: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/37.jpg)
Worker Fail Over
Amazon.com Confidential 37
Shard-0
Shard-1
Shard-2
Worker1
Worker2
Worker3
LeaseKey LeaseOwner LeaseCounter
Shard-0 Worker1 85
Shard-1 Worker2 94
Shard-2 Worker3 76
![Page 38: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/38.jpg)
Worker Fail Over
Amazon.com Confidential 38
Shard-0
Shard-1
Shard-2
Worker1
Worker2
Worker3
LeaseKey LeaseOwner LeaseCounter
Shard-0 Worker1 85 86
Shard-1 Worker2 94
Shard-2 Worker3 76 77X
![Page 39: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/39.jpg)
Worker Fail Over
Amazon.com Confidential 39
Shard-0
Shard-1
Shard-2
Worker1
Worker2
Worker3
LeaseKey LeaseOwner LeaseCounter
Shard-0 Worker1 85 86 87
Shard-1 Worker2 94
Shard-2 Worker3 76 77 78X
![Page 40: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/40.jpg)
Worker Fail Over
Amazon.com Confidential 40
Shard-0
Shard-1
Shard-2
Worker1
Worker2
Worker3
LeaseKey LeaseOwner LeaseCounter
Shard-0 Worker1 85 86 87 88
Shard-1 Worker3 94 95
Shard-2 Worker3 76 77 78 79X
![Page 41: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/41.jpg)
Worker Load Balancing
Amazon.com Confidential 41
Shard-0
Shard-1
Shard-2
Worker1
Worker2
Worker3
Worker4
LeaseKey LeaseOwner LeaseCounter
Shard-0 Worker1 88
Shard-1 Worker3 96
Shard-2 Worker3 78X
![Page 42: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/42.jpg)
Worker Load Balancing
Amazon.com Confidential 42
Shard-0
Shard-1
Shard-2
Worker1
Worker2
Worker3
Worker4
LeaseKey LeaseOwner LeaseCounter
Shard-0 Worker1 88
Shard-1 Worker3 96
Shard-2 Worker4 79X
![Page 43: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/43.jpg)
Resharding
Amazon.com Confidential 43
Shard-0Worker1
Worker2
LeaseKey LeaseOwner LeaseCounter checkpoint
Shard-0 Worker1 90 SHARD_END
Shard-0Shard-1
Shard-2
![Page 44: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/44.jpg)
Resharding
Amazon.com Confidential 44
Shard-0
Shard-1
Shard-2
Worker1
Worker2
LeaseKey LeaseOwner LeaseCounter checkpoint
Shard-0 Worker1 90 SHARD_END
Shard-1 0 TRIM_HORIZON
Shard-2 0 TRIM_HORIZON
Shard-0Shard-1
Shard-2
![Page 45: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/45.jpg)
Resharding
Amazon.com Confidential 45
Shard-0
Shard-1
Shard-2
Worker1
Worker2
LeaseKey LeaseOwner LeaseCounter checkpoint
Shard-0 Worker1 90 SHARD_END
Shard-1 Worker1 2 TRIM_HORIZON
Shard-2 Worker2 3 TRIM_HORIZON
Shard-0Shard-1
Shard-2
![Page 46: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/46.jpg)
Resharding
Amazon.com Confidential 46
Shard-1
Shard-2
Worker1
Worker2
LeaseKey LeaseOwner LeaseCounter checkpoint
Shard-1 Worker1 2 TRIM_HORIZON
Shard-2 Worker2 3 TRIM_HORIZON
Shard-0Shard-1
Shard-2
![Page 47: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/47.jpg)
500MM tweets/day = ~ 5,800 tweets/sec
2k/tweet is ~12MB/sec (~1TB/day)
$0.015/hour per shard, $0.014/million PUTS
Kinesis cost is $0.47/hour
Redshift cost is $0.850/hour (for a 2TB node)
Total: $1.32/hour
Cost &
Scale
Putting this into production
![Page 48: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/48.jpg)
Design Challenge(s)• Dynamic Resharding & Scale Out
• Enforcing Quotas (think proxy fleet with 1Ks servers)
• Distributed Denial of Service Attack (unintentional)
• Dynamic Load Balancing on Storage Servers
• Heterogeneous Workloads (tip of stream vs 7 day)
• Optimizing Fleet Utilization (proxy, control, data planes)
• Avoid Scaling Cliffs
• …
![Page 49: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/49.jpg)
Kinesis Streams: Streaming Data the AWS Way
• Pay as you go, no up front costs
• Elastically scalable
• Choose the service, or combination of
services, for your specific use cases.
• Real-time latencies
Deploy • Easy to provision, deploy, and manage
![Page 50: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/50.jpg)
Sushiro: Kaiten Sushi Restaurants380 stores stream data from sushi plate sensors and stream to Kinesis
![Page 51: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/51.jpg)
Sushiro: Kaiten Sushi Restaurants380 stores stream data from sushi plate sensors and stream to Kinesis
![Page 52: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/52.jpg)
Real-Time Streaming Data with Kinesis Streams
5 billion events/wk from
connected devices | IoT
17 PB of game data per
season | Entertainment
100 billion ad
impressions/day, 30 ms
response time | Ad Tech
100 GB/day click streams
250+ sites | Enterprise
50 billion ad
impressions/day sub-50
ms responses | Ad Tech
17 million events/day
| Technology
1 billion transactions per
day | Bitcoin
1 TB+/day game data
analyzed in real-time
| Gaming
![Page 53: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/53.jpg)
Streams provide a foundational
abstraction on which to build higher
level services
![Page 54: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/54.jpg)
Amazon Kinesis Firehose
• Zero Admin: Capture and deliver streaming data into S3, Redshift, and other
destinations without writing an application or managing infrastructure
• Direct-to-data store integration: Batch, compress, and encrypt streaming data
for delivery into S3, and other destinations in as little as 60 secs, set up in minutes
• Seamless elasticity: Seamlessly scales to match data throughput
Capture and submit
streaming data to Firehose
Firehose loads streaming data
continuously into S3 and Redshift
Analyze streaming data using your favorite
BI tools
![Page 55: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/55.jpg)
AW
S En
dp
oin
t
[Batch, Compress, Encrypt]
Data Sources
S3No Partition Keys
No Provisioning
End to End Elastic
Amazon Kinesis Firehose Fully Managed Service for Delivering Data Streams into AWS Destinations
Redshift
![Page 56: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/56.jpg)
AWS Event-Driven Computing
• Compute in response to recently occurring events
• Newly arrived/changed data
– Example: generate thumbnail for an image uploaded to S3
• Newly occurring system state changes
– Example: EC2 instance created
– Example: DynamoDB table deleted
– Example: Auto-scaling group membership change
– Example: RDS-HA primary fail-over occurs
![Page 57: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/57.jpg)
Event Driven Computing in AWS Today
SQS
S3 event notifications
![Page 58: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/58.jpg)
Event Driven Computing in AWS Today
DynamoDB Update Streams
![Page 59: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/59.jpg)
Cloudtrails event log for API calls
Event Driven Computing in AWS Today
S3
Customer 1
Customer 2
Customer 3
![Page 60: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/60.jpg)
Event Driven Computing in AWS Tomorrow
Single Event logs for asynchronous
service events
Customer 1
Customer 2
Customer 3
![Page 61: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/61.jpg)
Event Driven Computing in AWS Tomorrow
Event logs for asynchronous service events
Event logs from other data storage services
Customer 1
Customer 2
Customer 3
![Page 62: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/62.jpg)
A Unified Event Log Approach
KinesisSQS
(Unordered Events) (Ordered Events)
![Page 63: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/63.jpg)
K
I
N
E
S
I
S
Ordered Event Log Using Kinesis Streams
and the Kinesis Client Library
Shard mgmt
table
User
State
AWS EDC
ASG
Use of the KCL
Mostly writing business logic
EDC Rules Language
Simple CloudWatch actions in
response to matching rules.
![Page 64: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/64.jpg)
Event Logs for Customers’ Services
Vision: customers’ services and applications leverage the AWS
event log infrastructure
Cust. 2593
Cust. 7302
Cust. 3826
Widget A
Widget B
Widget C
www.widget.com
Per-customer control plane events sent
to customer’s unified control plane log
Per-entity data plane event logs
![Page 65: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/65.jpg)
Streaming data is highly prevalent and relevant;
Stream data processing is on the rise;
A key part of business critical workflows today, a
powerful abstraction for building a new class of
applications & data intensive services tomorrow.
A rich area for distributed systems, programming
model, IoT, and new service(s) research.
Closing Thoughts
![Page 66: Roger Barga - Processing Big Data in Motion](https://reader033.vdocuments.us/reader033/viewer/2022052606/584a2cdc1a28ab0b678b6b72/html5/thumbnails/66.jpg)
Questions