Strata Beijing 2017: Jumpy, a Python interface for nd4j
TRANSCRIPT
Who are we?
This slide shows that GPUs should complement the big data stack in the Hadoop ecosystem rather than try to replace Hadoop outright. Wholesale replacement of the big data stack would be cost-prohibitive for many clients. We believe the right approach is to sell GPUs for accelerated computation and a few other use cases. That’s our beachhead. (Obviously, the widening functionality of Volta will change the GPU ecosystem.)
Founded 2014
Distributed worldwide
Lots of activity in China
Skymind in China
Most JVM python interfaces
● Network based. Requires a gateway and py4j
● Tons of overhead. Often a bottleneck with real Spark jobs
● Places a focus on “pushing logic down to Scala”
● Doesn’t interop well with the existing Python ecosystem
● Often API compatibility issues
● “Good enough” for basic use cases despite the overhead
Basic facts about overhead
● In-depth paper: https://arxiv.org/pdf/1612.01437.pdf
● Python vs Scala: 15x slower
● Much of this is due to network traffic
● Serialization is another big problem
● Imagine saving objects every time you run compute.
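To make the network and serialization cost concrete, here is a minimal sketch using only the Python standard library. It is not py4j or Spark code: `gateway_worker` is a hypothetical stand-in for the JVM side, and `pickle` over a local socket pair stands in for whatever wire format a real gateway uses. The point is only that every compute call serializes its arguments, crosses a socket, and deserializes the result.

```python
import pickle
import socket
import struct
import threading


def send_msg(sock, obj):
    """Serialize obj and send it with a 4-byte length prefix."""
    payload = pickle.dumps(obj)
    sock.sendall(struct.pack("!I", len(payload)) + payload)
    return len(payload)


def recv_exact(sock, n):
    """Read exactly n bytes from the socket."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("socket closed early")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_msg(sock):
    """Receive one length-prefixed message and deserialize it."""
    (size,) = struct.unpack("!I", recv_exact(sock, 4))
    return pickle.loads(recv_exact(sock, size))


def gateway_worker(sock):
    """Hypothetical 'JVM side': receive a vector, send back its sum."""
    while True:
        values = recv_msg(sock)
        if values is None:  # shutdown signal
            break
        send_msg(sock, sum(values))


def remote_sum(sock, values):
    """One compute call through the gateway; returns (result, bytes_sent)."""
    sent = send_msg(sock, values)
    return recv_msg(sock), sent


client, server = socket.socketpair()
worker = threading.Thread(target=gateway_worker, args=(server,), daemon=True)
worker.start()

data = list(range(100_000))
local_result = sum(data)  # in-process call: no serialization, no socket
remote_result, bytes_sent = remote_sum(client, data)

send_msg(client, None)  # stop the worker
worker.join()
client.close()
server.close()

print(local_result == remote_result, bytes_sent)
```

Both paths give the same answer, but the gateway path has to serialize and ship the whole input on every call, which is the "saving objects every time you run compute" problem.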
Distributed Deep Learning bottlenecks
● Network overhead from parameter servers
● Data movement between CPU and GPU
● Buffer allocation for compute
● Data loading and input creation (creating tensors from data)
Linear Algebra in python
● C based internally
● Python is just an interface
● Tends to interop with numpy pointers directly
● Supports CPU and GPU
● For DL, often varied engines (MPI, gRPC, ...)
● Often extended in C
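The "interop with pointers directly" idea can be sketched with the standard library alone. This is an illustration under assumptions, not numpy or any specific library's API: a stdlib `array` stands in for a numpy array's data buffer, and `scale_in_place` is a hypothetical stand-in for a native routine that receives the raw address.

```python
import ctypes
from array import array

# A contiguous buffer of C doubles, standing in for a numpy array's data.
values = array("d", [1.0, 2.0, 3.0, 4.0])

# buffer_info() returns (address, element_count): the raw pointer and
# length that a native library would be handed.
address, count = values.buffer_info()

# View the same memory as a C double array. Nothing is copied.
c_view = (ctypes.c_double * count).from_address(address)


def scale_in_place(buf, n, factor):
    """Hypothetical stand-in for a native routine working on the pointer."""
    for i in range(n):
        buf[i] *= factor


scale_in_place(c_view, count, 10.0)

# The Python-side array sees the change because both names share memory.
print(values.tolist())  # [10.0, 20.0, 30.0, 40.0]
```

Because only an address and a length cross the boundary, the cost of the hand-off does not grow with the size of the data.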
Linear Algebra in spark
● Based on Breeze and netlib-java (not maintained anymore, limited to CPU)
● Most routines are Scala based
● On-heap memory (bad for latency)
● CUDA support is sparse at best
● Doesn’t conform to industry standards (Python)
● Not meant for heavy compute (hardware acceleration)
● Relies on Spark for most ops (you can’t do this with deep learning)
Minor conclusions
● One of these is not like the other
● Hard to interop with the Python ecosystem
● Spark tries to be something it’s not re: linear algebra
● Spark should do data loading, not linear algebra, which is better handled by C++ (SIMD, GPUs, ...)
● Alternatives are needed (more specialization) (a focus on C++ with Pythonic conventions)
Nd4j
● Java-based API, C++ core
● Own off-heap memory management (even for GPU)
● Soon: autodiff and graph execution (graph of operations) and sparse
● Similar architecture to numpy (easy interop) (http://nd4j.org/userguide)
● Works with BLAS/LAPACK
● Generally faster than numpy even from Python (as we’ll see soon)
● It’s not Python though!
Nd4j Parameter Server
Aeron: more stable latency than gRPC and way faster (25x!) than TF
Jumpy: A better python interface
● Low latency using C internally
● Interface with nd4j <-> numpy via direct pointers
● Syntax sugar similar to numpy
● Uses jnius underneath (https://github.com/kivy/pyjnius)
● JNIUS starts and manages a JVM for you. Interops via JNI and Cython
● Easy to extend
Jumpy examples
Thanks! Join our QQ group:
Conclusions and future work
● No networks! An actual path to improvement
● Reflection can be a bottleneck
● Like most useful things in Python, most of it is C!
● Plans to optimize pyjnius itself
● Can enable us to interop with other parts of Python
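The reflection point can be illustrated with a pure-Python analogy. This is a sketch under assumptions and does not use pyjnius or Java reflection: `CountingResolver` and `Vector` are hypothetical names, and `getattr` by name stands in for a reflective method lookup. It shows why resolving a method once and reusing it is one plausible way to cut that cost.

```python
class CountingResolver:
    """Resolves methods by name and counts how many lookups were done."""

    def __init__(self, target):
        self.target = target
        self.lookups = 0
        self._cache = {}

    def resolve(self, name):
        # Name-based lookup, analogous to a reflective method search.
        self.lookups += 1
        return getattr(self.target, name)

    def resolve_cached(self, name):
        # Look the method up once, then reuse the bound method.
        if name not in self._cache:
            self._cache[name] = self.resolve(name)
        return self._cache[name]


class Vector:
    def __init__(self, values):
        self.values = values

    def total(self):
        return sum(self.values)


vec = Vector([1, 2, 3])

uncached = CountingResolver(vec)
results_uncached = [uncached.resolve("total")() for _ in range(1000)]

cached = CountingResolver(vec)
results_cached = [cached.resolve_cached("total")() for _ in range(1000)]

print(uncached.lookups, cached.lookups)  # 1000 1
```

The results are identical either way; only the number of name-based lookups changes, from one per call to one in total.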