![Page 1: Scaling Through Simplicity—How a 300 million User Chat App Reduced Data Engineering Efforts by 70%: Spark Summit East talk by Joel Cumming](https://reader031.vdocuments.us/reader031/viewer/2022030206/58abca611a28ab68068b58d1/html5/thumbnails/1.jpg)
Scaling Through Simplicity: How a 300 million user chat app reduced data engineering efforts by 70%
Joel Cumming, Kik Interactive
At Kik, we believe that everyone has the right to be curious.
Data should be available to everyone and should be super easy to use.
We have dashboards to glance at, reports to analyze, and a data lake for exploration.
However, Kik is a startup, and we have to move very quickly.
Moving quickly often comes at the expense of scalable data engineering.
How can we compete with Facebook and Google (and their data teams) with a tiny team and very little time to master new tools?
Data v1 @ Kik
Data Lake & Transformations
Exploration & Analysis
KPIs
We decided to make eight changes:
1. Streamline Data Collection via Kinesis Firehose
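The slide only names the change, so here is a rough sketch of the producer side, assuming events are small JSON-serializable dicts. It batches events for Firehose's `PutRecordBatch` call through a boto3 client (`put_record_batch`, which accepts at most 500 records per call). The client is passed in as a parameter so the batching logic can be exercised without AWS; nothing here is taken from Kik's actual code.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List

# PutRecordBatch accepts at most 500 records per call. Byte-size limits per
# call and per record also apply; this sketch only enforces the count.
MAX_BATCH = 500


def to_record(event: Dict[str, Any]) -> Dict[str, bytes]:
    """Serialize one event as compact, newline-terminated JSON.

    By default Firehose concatenates record bytes as-is when writing to S3,
    so the trailing newline keeps the delivered objects line-delimited."""
    line = json.dumps(event, separators=(",", ":")) + "\n"
    return {"Data": line.encode("utf-8")}


def batches(events: Iterable[Dict[str, Any]],
            size: int = MAX_BATCH) -> Iterator[List[Dict[str, bytes]]]:
    """Group serialized records into lists of at most `size` items."""
    batch: List[Dict[str, bytes]] = []
    for event in events:
        batch.append(to_record(event))
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def send_events(client: Any, stream_name: str,
                events: Iterable[Dict[str, Any]]) -> int:
    """Ship events through a boto3 Firehose client; return how many failed.

    A production version would retry the failed entries listed in the
    response's RequestResponses instead of only counting them."""
    failed = 0
    for batch in batches(events):
        response = client.put_record_batch(DeliveryStreamName=stream_name,
                                           Records=batch)
        failed += response.get("FailedPutCount", 0)
    return failed
```

With real credentials this would be called as `send_events(boto3.client("firehose"), "chat-events", events)`, where `"chat-events"` is a hypothetical delivery stream name. Keeping the client as an argument is a design choice that makes the function easy to test with a stub.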
2. Standardize Transformations with Spark SQL
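Standardizing on SQL means a transformation is just a query over raw events. The sketch below uses Python's built-in `sqlite3` purely as a local stand-in engine for Spark SQL so it runs anywhere; the table and column names (`raw_events`, `user_id`, `event_time`) and the daily-active-users metric are hypothetical illustrations, not Kik's schema.

```python
import sqlite3

# Hypothetical transformation: daily active users from raw chat events.
# Only the SQL is meant to carry over to Spark SQL; sqlite3 is a stand-in
# engine so the example runs without a cluster.
DAU_SQL = """
SELECT substr(event_time, 1, 10) AS event_date,
       COUNT(DISTINCT user_id) AS daily_active_users
FROM raw_events
GROUP BY substr(event_time, 1, 10)
ORDER BY event_date
"""


def daily_active_users(rows):
    """Load (user_id, ISO-8601 event_time) rows into a scratch table, run DAU_SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE raw_events (user_id TEXT, event_time TEXT)")
        conn.executemany("INSERT INTO raw_events VALUES (?, ?)", rows)
        return conn.execute(DAU_SQL).fetchall()
    finally:
        conn.close()


events = [
    ("u1", "2017-02-08T10:00:00"),
    ("u1", "2017-02-08T11:30:00"),
    ("u2", "2017-02-08T12:15:00"),
    ("u1", "2017-02-09T09:00:00"),
]
print(daily_active_users(events))  # [('2017-02-08', 2), ('2017-02-09', 1)]
```

In Spark, with the raw events registered as a table or temporary view named `raw_events`, the same query string could be passed to `spark.sql(DAU_SQL)`. The point of the design is that anyone who can write SQL can write a transformation.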
3. Build a Data Lake (Caspian) in S3
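The talk does not describe Caspian's layout. One common convention for an S3 data lake queried by Spark is Hive-style `key=value` partition directories; the sketch below assumes hourly partitions and uses a hypothetical bucket and table name to build the prefix a given event hour would land under.

```python
from datetime import datetime, timedelta, timezone


def partition_prefix(bucket: str, table: str, ts: datetime) -> str:
    """Return the Hive-style S3 prefix for the UTC hour containing `ts`.

    Assumed layout: s3://<bucket>/<table>/dt=YYYY-MM-DD/hour=HH/
    Spark's partition discovery exposes dt and hour as columns, so a query
    filtering on dt can skip non-matching partitions entirely."""
    if ts.tzinfo is None:
        raise ValueError("ts must be timezone-aware")
    ts = ts.astimezone(timezone.utc)  # partition on UTC, not local time
    return f"s3://{bucket}/{table}/dt={ts:%Y-%m-%d}/hour={ts:%H}/"


# Hypothetical bucket/table; a fixed UTC-5 offset shows the UTC conversion.
utc_minus_5 = timezone(timedelta(hours=-5))
print(partition_prefix("example-bucket", "chat_events",
                       datetime(2017, 2, 8, 8, 5, tzinfo=utc_minus_5)))
# s3://example-bucket/chat_events/dt=2017-02-08/hour=13/
```

Requiring timezone-aware timestamps is deliberate: silently treating naive times as local time is a frequent source of events landing in the wrong hourly partition.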
4. Move from EMR to Managed Spark
5. Collaborate via Notebooks
6. Get Serious About Committing Code
7. Move to Airflow for Orchestration Flexibility
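Orchestrators such as Airflow model a pipeline as a directed acyclic graph (DAG) of tasks and start each task once its upstream tasks have succeeded. To illustrate that scheduling idea without Airflow itself, the sketch below uses Python's standard-library `graphlib` (3.9+), not Airflow's API; the task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical hourly pipeline: each task maps to the tasks it depends on.
PIPELINE = {
    "load_raw_events": set(),
    "build_sessions": {"load_raw_events"},
    "build_dau": {"load_raw_events"},
    "refresh_kpi_reports": {"build_sessions", "build_dau"},
}


def ready_waves(graph):
    """Group tasks into waves. Every task in a wave has all of its
    dependencies finished, so tasks within one wave could run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # pretend every task in the wave succeeded
    return waves


print(ready_waves(PIPELINE))
# [['load_raw_events'], ['build_dau', 'build_sessions'], ['refresh_kpi_reports']]
```

In Airflow, the same dependencies would typically be declared between operators with the `>>` operator, e.g. `load >> [sessions, dau] >> reports`, and the scheduler handles retries, backfills, and parallelism on top of this ordering.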
8. Standardize Reporting on re:dash
Data v2 @ Kik
Recall: Data v1 @ Kik
Data Lake & Transformations
Exploration & Analysis
KPIs
Data v2 @ Kik: Scaling through Simplicity
Data Lake & Transformations
Exploration & Analysis
KPIs
SQL
New data is available within an hour in a query-optimized format. Transformations can be built and scheduled in minutes, and reports can be developed just as quickly.
We estimate we save about 70% of our prior effort.
(Chart: % effort savings per change, based on hours invested in related activities, v1 vs. v2, on an axis from 0 to 20, for Data Collection, Spark SQL, Data Lake, Managed Spark, Notebooks, Committing Code, Better Orchestration, and Standardize Reporting.)
What’s Next?
1. Spark as a data warehouse (DW)?
2. Structured Streaming
3. Data Lake Cataloging
Thank you!