Introduction Old Architecture New Architecture Decoupling Streaming Conclusion
1
2
Introduction Old Architecture New Architecture Decoupling Streaming Conclusion
● Introduction
● Old Architecture
● New Architecture
● Decoupling
● Streaming
● Conclusion
● Legacy Java Process
  ○ “Crunches” data
  ○ Sends data downstream to our own datastores and to 3rd-party analytics
  ○ Runs every hour
● Growth
  ○ Process can run for over an hour
  ○ Heap grew from 12 GB to 24 GB in less than 1 year
  ○ Cron is a horrible job management system
  ○ A failure requires rerunning a job from the beginning
● 2.0
  ○ Horizontally scalable
  ○ Real-time ETL
  ○ Reusable
ETL @ Vungle
● ~1 Billion Events / Day
● Deduplication
● Calculating $$$
● Outputting data to various destinations
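The three steps listed above (deduplication, calculating revenue, and outputting to several destinations) can be sketched roughly as follows. This is a minimal in-memory illustration, not the pipeline from the talk: the field names `event_id` and `payout`, and the idea of a sink as a plain callable, are assumptions made for the example.

```python
from typing import Callable, Dict, Iterable, List

Event = Dict[str, object]


def deduplicate(events: Iterable[Event]) -> List[Event]:
    """Keep the first occurrence of each event_id, preserving arrival order."""
    seen = set()
    unique = []
    for event in events:
        event_id = event["event_id"]  # hypothetical field name
        if event_id in seen:
            continue
        seen.add(event_id)
        unique.append(event)
    return unique


def total_payout(events: Iterable[Event]) -> float:
    """Sum the (hypothetical) payout field; events without one count as 0."""
    return sum(float(event.get("payout", 0.0)) for event in events)


def fan_out(events: List[Event], sinks: Iterable[Callable[[List[Event]], None]]) -> None:
    """Hand the same batch of events to every destination in turn."""
    for sink in sinks:
        sink(events)
```

At roughly a billion events per day, a single in-memory `seen` set would not be enough; a real deployment would more likely rely on a keyed store or a unique index to reject duplicates.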
Old Architecture
New Architecture
Decoupling
Set up the connection and Spark streams
Map each log line to a Mongo object and insert it into Mongo
Set up the connection and Spark streams
Mapping to Mongo objects and insertions
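The code from this slide is not preserved in the text, so the following is only a hedged sketch of the mapping and insertion step, written in plain Python rather than as a Spark job. It assumes the log lines are JSON and carry an `event_id` field. `line_to_document`, `insert_lines`, and `InMemoryCollection` are hypothetical names; the in-memory class stands in for a real Mongo collection, exposing an `insert_many` method of the same name as the one on PyMongo collections.

```python
import json
from typing import Dict, Iterable, List, Optional


def line_to_document(line: str) -> Optional[Dict[str, object]]:
    """Parse one JSON log line into a Mongo-style document, or None if unusable."""
    line = line.strip()
    if not line:
        return None
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None  # skip malformed lines instead of failing the whole batch
    if not isinstance(record, dict) or "event_id" not in record:
        return None
    document = dict(record)
    # Assumption: reuse the event id as the document key so repeats collide.
    document["_id"] = document.pop("event_id")
    return document


class InMemoryCollection:
    """Stand-in for a Mongo collection; keeps the first document per _id."""

    def __init__(self) -> None:
        self.documents: Dict[object, Dict[str, object]] = {}

    def insert_many(self, documents: List[Dict[str, object]]) -> None:
        for document in documents:
            self.documents.setdefault(document["_id"], document)


def insert_lines(lines: Iterable[str], collection: InMemoryCollection) -> int:
    """Map each line to a document and insert the valid ones; return the count."""
    documents = [doc for doc in map(line_to_document, lines) if doc is not None]
    if documents:
        collection.insert_many(documents)
    return len(documents)
```

Using the event id as `_id` is one possible design choice: it lets the datastore itself reject repeats, which fits the deduplication requirement mentioned earlier in the deck.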
Questions
Streaming
Ingestion
Ingestion Table Schema
Columns: Event ID | Request | View | Install | ... | Request Added | View Added | Install Added | Value
Fact Table Schema
Columns: ... | Date | Time | Deliveries | Views | Installs | Processed Deliveries | Processed Views | Processed Installs
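The slides give only column names, so how the ingestion table feeds the fact table is not stated. One plausible reading, sketched below purely as an assumption, is that the "Added" columns mark which events have already been counted, so that a rerun does not count them twice. The dictionary keys, the `date` and `hour` fields, and the mapping of requests to deliveries are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

FactKey = Tuple[str, int]  # (date, hour); hypothetical bucketing

# (seen flag, added flag, fact column); request -> deliveries is an assumption
EVENT_COLUMNS = (
    ("request", "request_added", "deliveries"),
    ("view", "view_added", "views"),
    ("install", "install_added", "installs"),
)


def new_fact_table() -> Dict[FactKey, Dict[str, int]]:
    """Create an empty fact table with zeroed counters per (date, hour)."""
    return defaultdict(lambda: {"deliveries": 0, "views": 0, "installs": 0})


def roll_up(rows: Iterable[dict], facts: Dict[FactKey, Dict[str, int]]) -> None:
    """Count each seen-but-not-yet-added event once, then mark it as added."""
    for row in rows:
        counters = facts[(row["date"], row["hour"])]
        for seen, added, column in EVENT_COLUMNS:
            if row.get(seen) and not row.get(added):
                counters[column] += 1
                row[added] = True  # makes a second pass over this row a no-op
```

Under this reading, reprocessing the same ingestion rows after a failure leaves the fact counters unchanged, which would address the "rerun from the beginning" problem of the old hourly job.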
Ingestion
Process
Next Steps
● Switching from JSON to ProtoBuf
● Using YARN to run multiple jobs on one cluster
● Data Science
● Who knows?