spark summit eu talk by sudeep das and aish faenton
TRANSCRIPT
![Page 1: Spark Summit EU talk by Sudeep Das and Aish Faenton](https://reader030.vdocuments.us/reader030/viewer/2022020411/586f75da1a28ab10258b61f3/html5/thumbnails/1.jpg)
![Page 2: Spark Summit EU talk by Sudeep Das and Aish Faenton](https://reader030.vdocuments.us/reader030/viewer/2022020411/586f75da1a28ab10258b61f3/html5/thumbnails/2.jpg)
@NetflixResearch@aishfenton @datamusing
The missing MatPlotLib for Scala/Spark
![Page 3: Spark Summit EU talk by Sudeep Das and Aish Faenton](https://reader030.vdocuments.us/reader030/viewer/2022020411/586f75da1a28ab10258b61f3/html5/thumbnails/3.jpg)
![Page 4: Spark Summit EU talk by Sudeep Das and Aish Faenton](https://reader030.vdocuments.us/reader030/viewer/2022020411/586f75da1a28ab10258b61f3/html5/thumbnails/4.jpg)
![Page 5: Spark Summit EU talk by Sudeep Das and Aish Faenton](https://reader030.vdocuments.us/reader030/viewer/2022020411/586f75da1a28ab10258b61f3/html5/thumbnails/5.jpg)
Jan 6th, 2016
![Page 6: Spark Summit EU talk by Sudeep Das and Aish Faenton](https://reader030.vdocuments.us/reader030/viewer/2022020411/586f75da1a28ab10258b61f3/html5/thumbnails/6.jpg)
![Page 7: Spark Summit EU talk by Sudeep Das and Aish Faenton](https://reader030.vdocuments.us/reader030/viewer/2022020411/586f75da1a28ab10258b61f3/html5/thumbnails/7.jpg)
MACHINE LEARNINGSYSTEMS CAN GET QUITECOMPLICATED
Real life workflow
![Page 8: Spark Summit EU talk by Sudeep Das and Aish Faenton](https://reader030.vdocuments.us/reader030/viewer/2022020411/586f75da1a28ab10258b61f3/html5/thumbnails/8.jpg)
![Page 9: Spark Summit EU talk by Sudeep Das and Aish Faenton](https://reader030.vdocuments.us/reader030/viewer/2022020411/586f75da1a28ab10258b61f3/html5/thumbnails/9.jpg)
STATISTICAL VISUALIZATIONS CAN BE PAINFULConsider a researcher at Netflix who who has raw data in a spark dataframe with columns:
show_id, num_of_views, country, timestamp, video_age
The researcher wants to make the following plot:
Plot the five most popular titles,
according to total number of views,
in the last 5 hours,
as bar charts faceted by country,
where the bars are color coded by video_age.
Sorting
Aggregating after filtering by timestamp
Grouping data by a categorical value (Country)
Mapping a quantitative column to a color
One could perform all these operations on the DF first and then create one bar plot per country via a loop.
Painful indeed!
![Page 10: Spark Summit EU talk by Sudeep Das and Aish Faenton](https://reader030.vdocuments.us/reader030/viewer/2022020411/586f75da1a28ab10258b61f3/html5/thumbnails/10.jpg)
DECLARATIVESTATISTICALVISUALIZATION GRAMMAR
IN SCALA
You tell is WHAT should be done with the data, and it knowsHOW to do it!
Operations such as filtering, aggregation, faceting are built into the visualization, rather than putting the burden on the user to massage the data into shape.
Complex visualizations can be built with a few high level abstractions:
DATA
TRANS-FORMS
SCALES
GUIDES
MARKS
cf : Altair Talk by Brian Granger in PyData 2016 https://youtu.be/v5mrwq7yJc4
![Page 11: Spark Summit EU talk by Sudeep Das and Aish Faenton](https://reader030.vdocuments.us/reader030/viewer/2022020411/586f75da1a28ab10258b61f3/html5/thumbnails/11.jpg)
Anatomy of a plot
X/Y channel
Shape Channel
Size Channel
Color Channel
![Page 12: Spark Summit EU talk by Sudeep Das and Aish Faenton](https://reader030.vdocuments.us/reader030/viewer/2022020411/586f75da1a28ab10258b61f3/html5/thumbnails/12.jpg)
Features…
![Page 13: Spark Summit EU talk by Sudeep Das and Aish Faenton](https://reader030.vdocuments.us/reader030/viewer/2022020411/586f75da1a28ab10258b61f3/html5/thumbnails/13.jpg)
1. Supports most plot types
![Page 14: Spark Summit EU talk by Sudeep Das and Aish Faenton](https://reader030.vdocuments.us/reader030/viewer/2022020411/586f75da1a28ab10258b61f3/html5/thumbnails/14.jpg)
2. Trellis plots
![Page 15: Spark Summit EU talk by Sudeep Das and Aish Faenton](https://reader030.vdocuments.us/reader030/viewer/2022020411/586f75da1a28ab10258b61f3/html5/thumbnails/15.jpg)
3. Layers
Layer 1.
Layer 2.
Layer 3.
![Page 16: Spark Summit EU talk by Sudeep Das and Aish Faenton](https://reader030.vdocuments.us/reader030/viewer/2022020411/586f75da1a28ab10258b61f3/html5/thumbnails/16.jpg)
4. Notebook and Consoles
![Page 17: Spark Summit EU talk by Sudeep Das and Aish Faenton](https://reader030.vdocuments.us/reader030/viewer/2022020411/586f75da1a28ab10258b61f3/html5/thumbnails/17.jpg)
5. Built-in spark support
Vegas.withDataFrame(myDataFrame).encodeX(“population”).encodeY(“age”)
Mapped Columns
Pass In DF.
![Page 18: Spark Summit EU talk by Sudeep Das and Aish Faenton](https://reader030.vdocuments.us/reader030/viewer/2022020411/586f75da1a28ab10258b61f3/html5/thumbnails/18.jpg)
6. Visual statistics
● Advanced Binning
● Sorting
● Scaling
● Custom Transforms
● Time Series
● Aggregation
● Filtering
● Math functions (log, etc)
● Missing data support
● Descriptive Statistics
![Page 19: Spark Summit EU talk by Sudeep Das and Aish Faenton](https://reader030.vdocuments.us/reader030/viewer/2022020411/586f75da1a28ab10258b61f3/html5/thumbnails/19.jpg)
How It Works !
![Page 20: Spark Summit EU talk by Sudeep Das and Aish Faenton](https://reader030.vdocuments.us/reader030/viewer/2022020411/586f75da1a28ab10258b61f3/html5/thumbnails/20.jpg)
VEGA
D3JS
VEGA - LITE
VEGAS SCALA DSL EMITS TYPE-CHECKED
VEGA-LITE JSON
VEGA-LITE CONVERTS INTERNALLY TO VEGA JSON SPEC
VEGA TRANSLATES JSON TO D3JS CODE THAT CAN BE VERY VERBOSE
A SCALA DSL FOR VEGA-LITE
![Page 21: Spark Summit EU talk by Sudeep Das and Aish Faenton](https://reader030.vdocuments.us/reader030/viewer/2022020411/586f75da1a28ab10258b61f3/html5/thumbnails/21.jpg)
![Page 22: Spark Summit EU talk by Sudeep Das and Aish Faenton](https://reader030.vdocuments.us/reader030/viewer/2022020411/586f75da1a28ab10258b61f3/html5/thumbnails/22.jpg)
![Page 23: Spark Summit EU talk by Sudeep Das and Aish Faenton](https://reader030.vdocuments.us/reader030/viewer/2022020411/586f75da1a28ab10258b61f3/html5/thumbnails/23.jpg)
![Page 24: Spark Summit EU talk by Sudeep Das and Aish Faenton](https://reader030.vdocuments.us/reader030/viewer/2022020411/586f75da1a28ab10258b61f3/html5/thumbnails/24.jpg)
![Page 25: Spark Summit EU talk by Sudeep Das and Aish Faenton](https://reader030.vdocuments.us/reader030/viewer/2022020411/586f75da1a28ab10258b61f3/html5/thumbnails/25.jpg)
![Page 26: Spark Summit EU talk by Sudeep Das and Aish Faenton](https://reader030.vdocuments.us/reader030/viewer/2022020411/586f75da1a28ab10258b61f3/html5/thumbnails/26.jpg)
![Page 27: Spark Summit EU talk by Sudeep Das and Aish Faenton](https://reader030.vdocuments.us/reader030/viewer/2022020411/586f75da1a28ab10258b61f3/html5/thumbnails/27.jpg)
![Page 28: Spark Summit EU talk by Sudeep Das and Aish Faenton](https://reader030.vdocuments.us/reader030/viewer/2022020411/586f75da1a28ab10258b61f3/html5/thumbnails/28.jpg)
![Page 29: Spark Summit EU talk by Sudeep Das and Aish Faenton](https://reader030.vdocuments.us/reader030/viewer/2022020411/586f75da1a28ab10258b61f3/html5/thumbnails/29.jpg)
![Page 30: Spark Summit EU talk by Sudeep Das and Aish Faenton](https://reader030.vdocuments.us/reader030/viewer/2022020411/586f75da1a28ab10258b61f3/html5/thumbnails/30.jpg)