Operational Analytics - on-demand.gputechconf.com · End-to-End Accelerated GPU Data Science...
TRANSCRIPT
![Page 1: Operational Analytics - on-demand.gputechconf.com · End-to-End Accelerated GPU Data Science Introducing the Open-Source RAPIDS Library Suite cuDF cuIO DataFrame GPU Memory Data Preparation](https://reader034.vdocuments.us/reader034/viewer/2022042308/5ed4baa20b1c4b116053bc6e/html5/thumbnails/1.jpg)
100x Operational Analytics
The RAPIDS SQL Engine
![Page 2: Operational Analytics - on-demand.gputechconf.com · End-to-End Accelerated GPU Data Science Introducing the Open-Source RAPIDS Library Suite cuDF cuIO DataFrame GPU Memory Data Preparation](https://reader034.vdocuments.us/reader034/viewer/2022042308/5ed4baa20b1c4b116053bc6e/html5/thumbnails/2.jpg)
SQL in Python on GPUs
gdf = bc.sql('select count(*) from table').get()
@blazingsql
![Page 3: Operational Analytics - on-demand.gputechconf.com · End-to-End Accelerated GPU Data Science Introducing the Open-Source RAPIDS Library Suite cuDF cuIO DataFrame GPU Memory Data Preparation](https://reader034.vdocuments.us/reader034/viewer/2022042308/5ed4baa20b1c4b116053bc6e/html5/thumbnails/3.jpg)
conda install
launch a notebook
run queries
![Page 4: Operational Analytics - on-demand.gputechconf.com · End-to-End Accelerated GPU Data Science Introducing the Open-Source RAPIDS Library Suite cuDF cuIO DataFrame GPU Memory Data Preparation](https://reader034.vdocuments.us/reader034/viewer/2022042308/5ed4baa20b1c4b116053bc6e/html5/thumbnails/4.jpg)
Faster
Cheaper
Easier
![Page 5: Operational Analytics - on-demand.gputechconf.com · End-to-End Accelerated GPU Data Science Introducing the Open-Source RAPIDS Library Suite cuDF cuIO DataFrame GPU Memory Data Preparation](https://reader034.vdocuments.us/reader034/viewer/2022042308/5ed4baa20b1c4b116053bc6e/html5/thumbnails/5.jpg)
End-to-End Accelerated GPU Data Science
Introducing the Open-Source RAPIDS Library Suite
Data Preparation, Model Training, Visualization
• cuDF, cuIO: DataFrame
• cuML: Machine Learning
• cuGraph: Graph Analytics
• PyTorch, Chainer, MxNet: Deep Learning
• cuXfilter <> pyViz: Visualization
• Dask
• GPU Memory
![Page 6: Operational Analytics - on-demand.gputechconf.com · End-to-End Accelerated GPU Data Science Introducing the Open-Source RAPIDS Library Suite cuDF cuIO DataFrame GPU Memory Data Preparation](https://reader034.vdocuments.us/reader034/viewer/2022042308/5ed4baa20b1c4b116053bc6e/html5/thumbnails/6.jpg)
End-to-End Accelerated GPU Data Science
Introducing the Open-Source RAPIDS Library Suite
The same library suite as on the previous slide, with one addition:
• BlazingSQL: SQL Engine
![Page 7: Operational Analytics - on-demand.gputechconf.com · End-to-End Accelerated GPU Data Science Introducing the Open-Source RAPIDS Library Suite cuDF cuIO DataFrame GPU Memory Data Preparation](https://reader034.vdocuments.us/reader034/viewer/2022042308/5ed4baa20b1c4b116053bc6e/html5/thumbnails/7.jpg)
Storage Plugins
Supported:
• AWS S3
• Google Cloud Storage
• HDFS
Coming Soon:
• Azure Blob
File Readers (cuIO):
• CSV
• JSON
• Apache Parquet
• Apache ORC
Data Lake
GPU Memory
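The four file formats listed above each have a reader function in cuDF (`read_csv`, `read_json`, `read_parquet`, `read_orc`). As a minimal sketch, and not a description of how BlazingSQL itself picks a reader, the hypothetical helpers below choose a reader name from a file extension. The extension spellings and the example path are assumptions, and the cuDF import is deferred so the lookup can be tried on a machine without a GPU.

```python
from pathlib import PurePosixPath

# Extension -> cuDF reader function name. The four formats come from the slide;
# the extension spellings are an assumption made for this illustration.
READERS = {
    ".csv": "read_csv",
    ".json": "read_json",
    ".parquet": "read_parquet",
    ".orc": "read_orc",
}


def reader_name_for(path):
    """Return the cuDF reader name for a path, or raise ValueError."""
    suffix = PurePosixPath(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"no reader registered for {suffix!r}")
    return READERS[suffix]


def read_to_gpu(path):
    """Load a file into a cuDF DataFrame (requires a GPU and cuDF)."""
    import cudf  # deferred: only needed when data is actually read

    return getattr(cudf, reader_name_for(path))(path)


# Hypothetical file name, used only to exercise the lookup.
print(reader_name_for("orders/part-0.parquet"))
```

Only `read_to_gpu` touches the GPU; `reader_name_for` is plain Python and can be checked anywhere.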
![Page 8: Operational Analytics - on-demand.gputechconf.com · End-to-End Accelerated GPU Data Science Introducing the Open-Source RAPIDS Library Suite cuDF cuIO DataFrame GPU Memory Data Preparation](https://reader034.vdocuments.us/reader034/viewer/2022042308/5ed4baa20b1c4b116053bc6e/html5/thumbnails/8.jpg)
YOUR DATA: CSV, GDF, Pandas, Parquet, JSON
ETL, Feature Engineering
BlazingSQL >> cuDF > XGBoost
MACHINE LEARNING
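from blazingsql import BlazingContext
import cudf
# Create a context, then register the S3 bucket under the prefix 'bsql'
bc = BlazingContext()
bc.s3('bsql', bucket_name='bsql', access_key_id='<access_key>', secret_key='<secret_key>')
# Create a table from the files under orders/ and query it into a GPU DataFrame
bc.create_table('orders', 's3://bsql/orders/')
gdf = bc.sql('select * from orders').get()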
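The slide code uses placeholder credentials in the source. A hedged variant, reusing only the calls shown on the slide (`bc.s3`, `bc.create_table`, `bc.sql(...).get()`), reads them from environment variables instead. The variable names `BSQL_ACCESS_KEY` and `BSQL_SECRET_KEY` and both helper functions are hypothetical, and newer BlazingSQL releases may differ from the API shown on the slide.

```python
import os


def table_uri(prefix, folder):
    """Build the 's3://<prefix>/<folder>/' path form used on the slide."""
    return f"s3://{prefix}/{folder.strip('/')}/"


def count_orders(env=os.environ):
    """Register the bucket, create the table, and count its rows on the GPU."""
    from blazingsql import BlazingContext  # deferred: needs a GPU install

    bc = BlazingContext()
    bc.s3(
        "bsql",
        bucket_name="bsql",
        access_key_id=env["BSQL_ACCESS_KEY"],  # hypothetical variable name
        secret_key=env["BSQL_SECRET_KEY"],     # hypothetical variable name
    )
    bc.create_table("orders", table_uri("bsql", "orders"))
    return bc.sql("select count(*) from orders").get()


print(table_uri("bsql", "orders"))
```

Calling `count_orders()` needs a working BlazingSQL install and both variables set; `table_uri` runs anywhere.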
![Page 9: Operational Analytics - on-demand.gputechconf.com · End-to-End Accelerated GPU Data Science Introducing the Open-Source RAPIDS Library Suite cuDF cuIO DataFrame GPU Memory Data Preparation](https://reader034.vdocuments.us/reader034/viewer/2022042308/5ed4baa20b1c4b116053bc6e/html5/thumbnails/9.jpg)
[Figure: Netflow Demo Timings (BlazingSQL >> cuDF > Graphistry). Axis: time in seconds, 0 to 100. Labels: T4 GPU, 4 nodes. Labeled value: 84.40.]
[Figure: XGBoost Demo Timings (BlazingSQL >> cuDF > XGBoost). Axis: time in seconds, 0 to 3000. Labels: 15.6GB (1 x T4), 15.6GB (4 nodes).]
[Figure: Cost to run the ETL workloads on Google Cloud Platform. Labeled values: $0.90, $0.04, 0.87, 84.40.]
![Page 10: Operational Analytics - on-demand.gputechconf.com · End-to-End Accelerated GPU Data Science Introducing the Open-Source RAPIDS Library Suite cuDF cuIO DataFrame GPU Memory Data Preparation](https://reader034.vdocuments.us/reader034/viewer/2022042308/5ed4baa20b1c4b116053bc6e/html5/thumbnails/10.jpg)
GCP: 5 x n1-standard-4 (Tesla T4 GPU) w/ Local NVMe
• TPC-H SF100 Query Times - NVMe Storage
![Page 11: Operational Analytics - on-demand.gputechconf.com · End-to-End Accelerated GPU Data Science Introducing the Open-Source RAPIDS Library Suite cuDF cuIO DataFrame GPU Memory Data Preparation](https://reader034.vdocuments.us/reader034/viewer/2022042308/5ed4baa20b1c4b116053bc6e/html5/thumbnails/11.jpg)
GCP: 5 x n1-standard-4 (Tesla T4 GPU)
• TPC-H SF100 Query Times - GCS Storage
![Page 12: Operational Analytics - on-demand.gputechconf.com · End-to-End Accelerated GPU Data Science Introducing the Open-Source RAPIDS Library Suite cuDF cuIO DataFrame GPU Memory Data Preparation](https://reader034.vdocuments.us/reader034/viewer/2022042308/5ed4baa20b1c4b116053bc6e/html5/thumbnails/12.jpg)
GCP: 15 x n1-standard-4 (Tesla T4 GPU)
• TPC-H SF300 Query Times - GCS Storage
![Page 13: Operational Analytics - on-demand.gputechconf.com · End-to-End Accelerated GPU Data Science Introducing the Open-Source RAPIDS Library Suite cuDF cuIO DataFrame GPU Memory Data Preparation](https://reader034.vdocuments.us/reader034/viewer/2022042308/5ed4baa20b1c4b116053bc6e/html5/thumbnails/13.jpg)
• TPC-H SF100 vs SF300 - GCS Storage
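The two GCS benchmark configurations scale the cluster and the data together: 5 nodes for SF100 and 15 nodes for SF300. A small sketch of that arithmetic follows, assuming the usual TPC-H convention that scale factor N is roughly N GB of raw data (the slides do not state dataset sizes) and that work is spread evenly across nodes.

```python
# (scale factor, node count) pairs taken from the two GCS benchmark slides.
CONFIGS = {"SF100": (100, 5), "SF300": (300, 15)}

# Assumption: TPC-H scale factor N is about N GB of raw data.
GB_PER_SCALE_FACTOR = 1.0


def gb_per_node(scale_factor, nodes):
    """Approximate raw data each node handles if work is spread evenly."""
    return scale_factor * GB_PER_SCALE_FACTOR / nodes


for name, (sf, nodes) in CONFIGS.items():
    print(f"{name}: {nodes} nodes, ~{gb_per_node(sf, nodes):.0f} GB per node")
```

Under these assumptions both configurations come out at about 20 GB per node, so the comparison can be read as a weak-scaling test: the per-node load stays constant while the cluster grows.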
![Page 14: Operational Analytics - on-demand.gputechconf.com · End-to-End Accelerated GPU Data Science Introducing the Open-Source RAPIDS Library Suite cuDF cuIO DataFrame GPU Memory Data Preparation](https://reader034.vdocuments.us/reader034/viewer/2022042308/5ed4baa20b1c4b116053bc6e/html5/thumbnails/14.jpg)
Demos
![Page 15: Operational Analytics - on-demand.gputechconf.com · End-to-End Accelerated GPU Data Science Introducing the Open-Source RAPIDS Library Suite cuDF cuIO DataFrame GPU Memory Data Preparation](https://reader034.vdocuments.us/reader034/viewer/2022042308/5ed4baa20b1c4b116053bc6e/html5/thumbnails/15.jpg)
Scale out with RAPIDS
Axes: Scale Up / Accelerate, Scale out / Parallelize
PyData: NumPy, Pandas, Scikit-Learn, Numba and many more
Single CPU core, in-memory data
RAPIDS and Others: Accelerated on single GPU
• NumPy -> CuPy/PyTorch/..
• Pandas -> cuDF
• Scikit-Learn -> cuML
• Numba -> Numba
Dask: Multi-core and Distributed PyData
• NumPy -> Dask Array
• Pandas -> Dask DataFrame
• Scikit-Learn -> Dask-ML
• … -> Dask Futures
RAPIDS, BlazingSQL + Dask + OpenUCX
Multi-GPU, on single node (DGX) or across a cluster
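The CPU-to-GPU pairings on this slide can be read as a lookup from a PyData library to its accelerated counterpart. A minimal sketch, assuming only the libraries' import names (`cupy`, `cudf`, `cuml`), checks which counterpart is installed without importing it. The helpers `is_installed` and `resolve` are hypothetical and not part of RAPIDS.

```python
import importlib.util

# CPU library -> GPU counterpart, by import name, following the slide's pairings.
# NumPy maps to CuPy here; the slide also lists PyTorch as an option.
GPU_COUNTERPART = {
    "numpy": "cupy",
    "pandas": "cudf",
    "sklearn": "cuml",
    "numba": "numba",
}


def is_installed(module_name):
    """True if the top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def resolve(cpu_name):
    """Return the GPU counterpart if it is installed, else the CPU name."""
    gpu_name = GPU_COUNTERPART.get(cpu_name, cpu_name)
    return gpu_name if is_installed(gpu_name) else cpu_name


for name in GPU_COUNTERPART:
    print(f"{name} -> {resolve(name)}")
```

On a machine without RAPIDS each name resolves to itself; with RAPIDS installed, `pandas` resolves to `cudf` and so on.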
![Page 16: Operational Analytics - on-demand.gputechconf.com · End-to-End Accelerated GPU Data Science Introducing the Open-Source RAPIDS Library Suite cuDF cuIO DataFrame GPU Memory Data Preparation](https://reader034.vdocuments.us/reader034/viewer/2022042308/5ed4baa20b1c4b116053bc6e/html5/thumbnails/16.jpg)
GET STARTED NOW
It’s easy to get started with BlazingSQL + RAPIDS.ai
CONDA
BlazingSQL can be installed with conda (miniconda, or the full Anaconda distribution) from the blazingsql channel.
https://anaconda.org/blazingsql
DOCKER HUB
To run BlazingSQL on your own infrastructure, you can use our container on Docker Hub.
https://hub.docker.com/u/blazingdb
GITHUB
BlazingSQL, the GPU-accelerated SQL engine of the RAPIDS ecosystem, is now 100% open-source, licensed under Apache 2.0!
https://github.com/BlazingDB/