introducing athena: 08/19 big data application meetup, talk #3
TRANSCRIPT
![Page 1: Introducing Athena: 08/19 Big Data Application Meetup, Talk #3](https://reader031.vdocuments.us/reader031/viewer/2022022413/58eeca631a28ab3e098b46ff/html5/thumbnails/1.jpg)
Athena Streaming with Samza
Goddess of Wisdom
![Page 2: Introducing Athena: 08/19 Big Data Application Meetup, Talk #3](https://reader031.vdocuments.us/reader031/viewer/2022022413/58eeca631a28ab3e098b46ff/html5/thumbnails/2.jpg)
The team
![Page 3: Introducing Athena: 08/19 Big Data Application Meetup, Talk #3](https://reader031.vdocuments.us/reader031/viewer/2022022413/58eeca631a28ab3e098b46ff/html5/thumbnails/3.jpg)
Brief Background Stream Processing, Samza, Athena
Kafka Uber uses Kafka as logging/event system
Message streams
Near-real time computation
Stream Processing Framework: Apache Samza
Platform built on top of Samza: Athena
![Page 4: Introducing Athena: 08/19 Big Data Application Meetup, Talk #3](https://reader031.vdocuments.us/reader031/viewer/2022022413/58eeca631a28ab3e098b46ff/html5/thumbnails/4.jpg)
Current use cases
![Page 5: Introducing Athena: 08/19 Big Data Application Meetup, Talk #3](https://reader031.vdocuments.us/reader031/viewer/2022022413/58eeca631a28ab3e098b46ff/html5/thumbnails/5.jpg)
Current use cases Aggregation
![Page 6: Introducing Athena: 08/19 Big Data Application Meetup, Talk #3](https://reader031.vdocuments.us/reader031/viewer/2022022413/58eeca631a28ab3e098b46ff/html5/thumbnails/6.jpg)
Pricing
Assess supply and demand in real time to calculate accurate surge multiples
Kafka Samza location updates
Elasticsearch
S3 Spark location updates
HTTP Service
Realtime
Batch
Queries
![Page 7: Introducing Athena: 08/19 Big Data Application Meetup, Talk #3](https://reader031.vdocuments.us/reader031/viewer/2022022413/58eeca631a28ab3e098b46ff/html5/thumbnails/7.jpg)
Mapper Job
Mapper Job
Mapper Job
Event parsing, filtering, classification
Event Aggregation
Raw events
Riak
Inserter Job
Inserter Job
Tiles
1m Agg Tiles
Reducer
Job
Reducer
Job
1m Agg Tiles
Per Contract Common
Artemis Samza
![Page 8: Introducing Athena: 08/19 Big Data Application Meetup, Talk #3](https://reader031.vdocuments.us/reader031/viewer/2022022413/58eeca631a28ab3e098b46ff/html5/thumbnails/8.jpg)
Current use cases Event driven update engine
![Page 9: Introducing Athena: 08/19 Big Data Application Meetup, Talk #3](https://reader031.vdocuments.us/reader031/viewer/2022022413/58eeca631a28ab3e098b46ff/html5/thumbnails/9.jpg)
Driver Activation
KAFKA SAMZA Driver status updates
Onboarding Service
Fetch driver info
RocksDB
Retry queue
Update status
![Page 10: Introducing Athena: 08/19 Big Data Application Meetup, Talk #3](https://reader031.vdocuments.us/reader031/viewer/2022022413/58eeca631a28ab3e098b46ff/html5/thumbnails/10.jpg)
Upcoming use cases
![Page 11: Introducing Athena: 08/19 Big Data Application Meetup, Talk #3](https://reader031.vdocuments.us/reader031/viewer/2022022413/58eeca631a28ab3e098b46ff/html5/thumbnails/11.jpg)
Fraud monitoring and alerting
KAFKA Partitioner
Real Time
Hourly
Alerting Service
Monitoring Service
Uberx metrics
![Page 12: Introducing Athena: 08/19 Big Data Application Meetup, Talk #3](https://reader031.vdocuments.us/reader031/viewer/2022022413/58eeca631a28ab3e098b46ff/html5/thumbnails/12.jpg)
Da Vinci: Streaming platform for data science
KAFKA Aggregation job (Model generation)
RocksDB
Historical state
Timeseries data
Model evaluation
Database / Elasticsearch
updates
Model Updates
![Page 13: Introducing Athena: 08/19 Big Data Application Meetup, Talk #3](https://reader031.vdocuments.us/reader031/viewer/2022022413/58eeca631a28ab3e098b46ff/html5/thumbnails/13.jpg)
Samza Architecture Overview
![Page 14: Introducing Athena: 08/19 Big Data Application Meetup, Talk #3](https://reader031.vdocuments.us/reader031/viewer/2022022413/58eeca631a28ab3e098b46ff/html5/thumbnails/14.jpg)
Samza Architecture
![Page 15: Introducing Athena: 08/19 Big Data Application Meetup, Talk #3](https://reader031.vdocuments.us/reader031/viewer/2022022413/58eeca631a28ab3e098b46ff/html5/thumbnails/15.jpg)
Basic Structure of a Task
![Page 16: Introducing Athena: 08/19 Big Data Application Meetup, Talk #3](https://reader031.vdocuments.us/reader031/viewer/2022022413/58eeca631a28ab3e098b46ff/html5/thumbnails/16.jpg)
Task
![Page 17: Introducing Athena: 08/19 Big Data Application Meetup, Talk #3](https://reader031.vdocuments.us/reader031/viewer/2022022413/58eeca631a28ab3e098b46ff/html5/thumbnails/17.jpg)
Deployment in Uber
![Page 18: Introducing Athena: 08/19 Big Data Application Meetup, Talk #3](https://reader031.vdocuments.us/reader031/viewer/2022022413/58eeca631a28ab3e098b46ff/html5/thumbnails/18.jpg)
![Page 19: Introducing Athena: 08/19 Big Data Application Meetup, Talk #3](https://reader031.vdocuments.us/reader031/viewer/2022022413/58eeca631a28ab3e098b46ff/html5/thumbnails/19.jpg)
Athena Tooling
![Page 20: Introducing Athena: 08/19 Big Data Application Meetup, Talk #3](https://reader031.vdocuments.us/reader031/viewer/2022022413/58eeca631a28ab3e098b46ff/html5/thumbnails/20.jpg)
Tooling
● Athena manager ● Job configuration ● Unit test framework ● Graphite integration ● Codahale library support ● Maven archetype ● Artifactory support
![Page 21: Introducing Athena: 08/19 Big Data Application Meetup, Talk #3](https://reader031.vdocuments.us/reader031/viewer/2022022413/58eeca631a28ab3e098b46ff/html5/thumbnails/21.jpg)
Athena Manager
![Page 22: Introducing Athena: 08/19 Big Data Application Meetup, Talk #3](https://reader031.vdocuments.us/reader031/viewer/2022022413/58eeca631a28ab3e098b46ff/html5/thumbnails/22.jpg)
![Page 23: Introducing Athena: 08/19 Big Data Application Meetup, Talk #3](https://reader031.vdocuments.us/reader031/viewer/2022022413/58eeca631a28ab3e098b46ff/html5/thumbnails/23.jpg)
![Page 24: Introducing Athena: 08/19 Big Data Application Meetup, Talk #3](https://reader031.vdocuments.us/reader031/viewer/2022022413/58eeca631a28ab3e098b46ff/html5/thumbnails/24.jpg)
Job configuration
reference.conf
Sandbox Staging Production Dev
athena-core-lib
application.conf
Sandbox Staging Production Dev
Samza job
![Page 25: Introducing Athena: 08/19 Big Data Application Meetup, Talk #3](https://reader031.vdocuments.us/reader031/viewer/2022022413/58eeca631a28ab3e098b46ff/html5/thumbnails/25.jpg)
Job configuration projects { artemis { job_1 { mapper { envs.common { task.inputs = "kafka.topic_1" task.class = "com.uber.athena.SampleTaskClass" } envs.local = ${projects.artemis.job_1.mapper.envs.common} envs.local = { task.window.ms = 30000 } envs.sandbox = ${projects.artemis.job_1.mapper.envs.local} envs.sandbox = { yarn.package.path = "http://artifactory..../artifactory/libs-snapshot-local/com/uber/athena/.../hello-athena.tar.gz" } envs.staging = ${projects.artemis.job_1.mapper.envs.sandbox} envs.production = ${projects.artemis.job_1.mapper.envs.sandbox} }
![Page 26: Introducing Athena: 08/19 Big Data Application Meetup, Talk #3](https://reader031.vdocuments.us/reader031/viewer/2022022413/58eeca631a28ab3e098b46ff/html5/thumbnails/26.jpg)
Job configuration
![Page 27: Introducing Athena: 08/19 Big Data Application Meetup, Talk #3](https://reader031.vdocuments.us/reader031/viewer/2022022413/58eeca631a28ab3e098b46ff/html5/thumbnails/27.jpg)
Unit test framework
Stream Job
StreamTask InitableTask WindowableTask
TaskUnitTestHarness
Message Listener
Inject data Custom IncomingMessageEnvelope
Custom MessageCollector
![Page 28: Introducing Athena: 08/19 Big Data Application Meetup, Talk #3](https://reader031.vdocuments.us/reader031/viewer/2022022413/58eeca631a28ab3e098b46ff/html5/thumbnails/28.jpg)
Unit test framework
String classTaskName = "com.uber.athena.test.TestProcessTask"; // Register the job to the test harness TaskUnitTestHarness<String, Integer> testProcessTask = new TaskUnitTestHarness<>(classTaskName, false, true); // Register a listener for validating the output of every process function testProcessTask.registerMessageListenerOnProcess(new KeyAsserterListener()); // Start the job testProcessTask.start(); // Inject data testProcessTask.inject(key,value); // Get the full output testWindowTask.getResult()
![Page 29: Introducing Athena: 08/19 Big Data Application Meetup, Talk #3](https://reader031.vdocuments.us/reader031/viewer/2022022413/58eeca631a28ab3e098b46ff/html5/thumbnails/29.jpg)
Tooling
● Athena manager ● Job configuration ● Unit test framework ● Graphite integration ● Codahale library support ● Maven archetype ● Artifactory support
![Page 30: Introducing Athena: 08/19 Big Data Application Meetup, Talk #3](https://reader031.vdocuments.us/reader031/viewer/2022022413/58eeca631a28ab3e098b46ff/html5/thumbnails/30.jpg)
Observations
![Page 31: Introducing Athena: 08/19 Big Data Application Meetup, Talk #3](https://reader031.vdocuments.us/reader031/viewer/2022022413/58eeca631a28ab3e098b46ff/html5/thumbnails/31.jpg)
Observations
● YARN is not bad ! ● Offset lag and Buffered messages ● Kafka Appender for ELK ● Checkpoint topic partition incorrect count ● Config validation needs improvement ● Job restarts are complicated ● Built-in Metrics are insufficient
![Page 32: Introducing Athena: 08/19 Big Data Application Meetup, Talk #3](https://reader031.vdocuments.us/reader031/viewer/2022022413/58eeca631a28ab3e098b46ff/html5/thumbnails/32.jpg)
● Seamless upgrades ● Custom built-in serde support ● Config validation enhancement ● Auto benchmarking a Samza job ● Unit test framework enhancement
Upcoming Samza improvements
![Page 33: Introducing Athena: 08/19 Big Data Application Meetup, Talk #3](https://reader031.vdocuments.us/reader031/viewer/2022022413/58eeca631a28ab3e098b46ff/html5/thumbnails/33.jpg)
Q&A