fosdem2017 scientific computing on jruby

Scientific Computing on JRuby github.com/prasunanand

Upload: prasun-anand

Post on 07-Feb-2017


Page 1: Fosdem2017  Scientific computing on Jruby

Scientific Computing on JRuby

github.com/prasunanand


Objective

●A scientific library is memory intensive, and speed counts. How to use JRuby effectively to create a great tool/gem?

●A general-purpose GPU library for Ruby that can be used by industry in production and by academia for research.


●Ruby Science Foundation

●SciRuby has been trying to push Ruby for scientific computing.

●Popular Rubygems:

1.NMatrix

2.Daru

3.Mixed_models

4.Nyaplot

5.IPython Notebook


NMatrix

●NMatrix is SciRuby’s numerical matrix core, implementing dense matrices as well as two types of sparse matrices (linked-list-based and Yale/CSR).

●It currently relies on ATLAS/CBLAS/CLAPACK and standard LAPACK for several of its linear algebra operations.
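To make the "Yale/CSR" storage idea concrete, here is a minimal sketch of generic compressed sparse row storage in plain Ruby. It is an illustration only: the helper names `to_csr` and `csr_get` are hypothetical, and NMatrix's own Yale implementation differs in its details.

```ruby
# Hypothetical helpers illustrating generic CSR storage (not NMatrix's API).

# Convert a dense Array-of-rows into three flat arrays:
#   values  - the non-zero entries, row by row
#   col_idx - the column of each stored entry
#   row_ptr - where each row starts inside values (plus one end marker)
def to_csr(dense)
  values  = []
  col_idx = []
  row_ptr = [0]
  dense.each do |row|
    row.each_with_index do |value, j|
      next if value == 0
      values  << value
      col_idx << j
    end
    row_ptr << values.size
  end
  { values: values, col_idx: col_idx, row_ptr: row_ptr }
end

# Look up element (i, j); entries that are not stored are zero.
def csr_get(csr, i, j)
  (csr[:row_ptr][i]...csr[:row_ptr][i + 1]).each do |k|
    return csr[:values][k] if csr[:col_idx][k] == j
  end
  0
end

csr = to_csr([[5, 0, 0],
              [0, 0, 8],
              [0, 3, 0]])
puts csr.inspect
puts csr_get(csr, 1, 2)
```

Only the three non-zero entries are stored, at the cost of a short search within a row on lookup.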


Daru


Mixed_models


Nyaplot


SciRuby vs SciPy

●We love Ruby.

●We love Rails.

●Expressiveness of Ruby.


●JRuby is known for performance: it is 10 times faster than CRuby.

●With Truffle it’s around 40 times faster than CRuby. Truffle is supported by Oracle.


Say Hello!


NMatrix for JRuby

●Parallelism => no Global Interpreter Lock, unlike MRI

●Easy deployment (Warbler gem)

●Automatic garbage collection.

●Speed

●NMatrix for JRuby relies on Apache Commons Math


MDArray

●There is no unified interface for SciRuby gems => why not build a wrapper around MDArray?

●MDArray is a great gem for linear algebra.

●MDArray used Parallel Colt, which was deprecated.

●However, every gem that used NMatrix as a dependency would have needed to be reimplemented with MDArray.

●Hence, effort was put into optimization.


How does NMatrix work?

●N-Dimensional

●2-Dimensional NMatrix


N-dimensional matrices are stored as a one-dimensional Array!
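A minimal sketch of how an N-dimensional coordinate can be mapped onto a one-dimensional array, assuming row-major order (the last dimension varies fastest). The helper names `strides_for` and `flat_index` are hypothetical and are not taken from NMatrix.

```ruby
# Hypothetical helpers; row-major layout is an assumption made for illustration.

# Stride of a dimension = how many flat positions one step along it skips.
def strides_for(shape)
  strides = Array.new(shape.size, 1)
  (shape.size - 2).downto(0) do |i|
    strides[i] = strides[i + 1] * shape[i + 1]
  end
  strides
end

# Flat position of a coordinate = sum of coordinate * stride per dimension.
def flat_index(coords, strides)
  coords.zip(strides).sum { |coord, stride| coord * stride }
end

shape    = [2, 3, 4]                        # a 2 x 3 x 4 matrix
storage  = (0...shape.reduce(:*)).to_a      # 24 elements in one flat Array
strides  = strides_for(shape)
position = flat_index([1, 2, 3], strides)   # last element of the matrix
puts strides.inspect
puts storage[position]
```

With this layout, slicing and iteration only need the shape and the strides; the data itself is never reshaped.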


NMatrix Architecture

MRI JRuby


N-dimensional Matrix


Elementwise Operation

●[:add, :subtract, :sin, :gamma]

●Iterate through the elements.

●Access the element; do the operation, return it
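The three steps above can be sketched in plain Ruby, with flat Arrays standing in for dense storage. The helper names `elementwise_binary` and `elementwise_unary` are hypothetical; this is not NMatrix's implementation, only the shape of the loop it describes.

```ruby
# Hypothetical helpers, not NMatrix's real API: flat Arrays stand in for storage.

# Binary elementwise operation: visit each position once, combine, collect.
def elementwise_binary(left, right)
  raise ArgumentError, "size mismatch" unless left.size == right.size
  Array.new(left.size) { |i| yield(left[i], right[i]) }
end

# Unary elementwise operation, e.g. :sin or :gamma from Ruby's Math module.
def elementwise_unary(storage, function)
  storage.map { |x| Math.public_send(function, x) }
end

a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]

sum        = elementwise_binary(a, b) { |x, y| x + y }
difference = elementwise_binary(a, b) { |x, y| x - y }
sines      = elementwise_unary(a, :sin)
gammas     = elementwise_unary(a, :gamma)

puts sum.inspect
puts difference.inspect
puts sines.inspect
puts gammas.inspect
```

Because the storage is flat, the same loop works for any number of dimensions.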


Challenges

●Autoboxing and multiple data types

●Minimise copying of data


Errors that can’t be reproduced :p

[ 0.11, 0.05, 0.34, 0.14 ]

+ [ 0.21, 0.05, 0.14, 0.14 ]

= [ 0, 0, 0, 0]

([ 0.11, 0.05, 0.34, 0.14 ] + 5)

+ ([ 0.21, 0.05, 0.14, 0.14 ] + 5)

- 10

= [ 0.32, 0.1, 0.48, 0.28]


Autoboxing

● :float64 => double only

● Strict dtypes => creating data type in Java. Can’t rely on reflection

● @s = Array.new()

● @s = Java::double[rows*cols].new()


Autoboxing and Enumerators

def each_with_indices
  nmatrix = create_dummy_nmatrix
  stride = get_stride(self)
  offset = 0
  coords = Array.new(dim){ 0 }
  shape_copy = Array.new(dim)

  (0...size).each do |k|
    # Turn flat position k into coordinates, then back into a storage position.
    dense_storage_coords(nmatrix, k, coords, stride, offset)
    slice_index = dense_storage_pos(coords,stride)
    ary = Array.new

    if (@dtype == :object)
      ary << self.s[slice_index]
    else
      # The Java storage is converted to a Ruby Array here, on every iteration.
      ary << self.s.toArray.to_a[slice_index]
    end

    # Yield the value followed by its coordinates.
    (0...dim).each do |p|
      ary << coords[p]
    end
    yield(ary)
  end if block_given?

  return nmatrix
end
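The enumerator above has to recover the coordinates of every flat position. A minimal sketch of that step in plain Ruby follows, assuming row-major order; `coords_for` and `each_with_coords` are hypothetical names and are not the `dense_storage_coords` helper used in the slide.

```ruby
# Hypothetical helpers; row-major layout is an assumption made for illustration.

# Recover the coordinates of a flat position by peeling off one
# dimension at a time, starting with the fastest-varying (last) one.
def coords_for(position, shape)
  coords = Array.new(shape.size, 0)
  (shape.size - 1).downto(0) do |d|
    position, coords[d] = position.divmod(shape[d])
  end
  coords
end

# Yield [value, coord_0, coord_1, ...] for every element of a flat storage.
def each_with_coords(storage, shape)
  storage.each_with_index do |value, position|
    yield([value, *coords_for(position, shape)])
  end
end

shape   = [2, 3]
storage = [10, 11, 12, 20, 21, 22]
rows    = []
each_with_coords(storage, shape) { |entry| rows << entry }
puts rows.inspect
```

Each yielded entry holds the value first and its coordinates after it, which mirrors the layout of `ary` in the slide's code.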


Minimise copying of data

●Make sure you don’t make copies of data.

●Pass-by-Reference in action:

○ Use static methods as helpers.
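A small plain-Ruby sketch of the difference between mutating storage through a passed reference and returning a copy. The module `StorageHelpers` is hypothetical; its module-level methods stand in for the static helper methods mentioned above.

```ruby
# Hypothetical helper module; module-level methods play the role of static helpers.
module StorageHelpers
  # Scales the storage in place: the caller's Array is modified,
  # and no second Array is allocated.
  def self.scale!(storage, factor)
    storage.map! { |x| x * factor }
  end

  # Returns a new Array: the caller now holds two copies of the data.
  def self.scale(storage, factor)
    storage.map { |x| x * factor }
  end
end

storage  = [1.0, 2.0, 3.0]
in_place = StorageHelpers.scale!(storage, 2.0)
copied   = StorageHelpers.scale(storage, 2.0)

puts in_place.equal?(storage)   # same object as the original storage?
puts copied.equal?(storage)     # a separate object?
```

For matrices with hundreds of millions of elements, the second form doubles the memory footprint of every operation, which is why the in-place form is preferred.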


2-dimensional Matrix


2-dimensional Matrix Operations

●[:dot, :det, :factorize_lu]

●In NMatrix-MRI, BLAS level-3 and LAPACK routines are implemented using their respective libraries.

●NMatrix-JRuby depends on Java functions.


Challenges

●Converting a 1-D array to 2-D array

●Array Size and Accessing elements

●Speed and Memory Required
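A minimal sketch of the first challenge, converting flat storage into rows and back, in plain Ruby. The helper names `to_2d` and `to_flat` are hypothetical, and row-major order is assumed. Note that the conversion itself copies the data, which is where the speed and memory concerns come from.

```ruby
# Hypothetical helpers; row-major layout is an assumption made for illustration.

# Split a flat storage Array into rows of `cols` elements each.
def to_2d(flat, rows, cols)
  raise ArgumentError, "shape does not match storage size" unless flat.size == rows * cols
  flat.each_slice(cols).to_a
end

# Inverse operation: concatenate the rows back into one flat Array.
def to_flat(two_d)
  two_d.flatten(1)
end

flat   = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
matrix = to_2d(flat, 2, 3)
puts matrix.inspect
puts matrix[1][2]          # element at row 1, column 2
puts to_flat(matrix) == flat
```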


Ruby Code

require 'benchmark'
require 'java'

b = Java::double[15_000,15_000].new
c = Java::double[15_000,15_000].new

# Fill b with consecutive values.
index=0
puts Benchmark.measure{
  (0...15000).each do |i|
    (0...15000).each do |j|
      b[i][j] = index
      index+=1
    end
  end
}
#43.260000 3.250000 46.510000 ( 39.606356)

# Copy b into c.
index=0
puts Benchmark.measure{
  (0...15000).each do |i|
    (0...15000).each do |j|
      c[i][j] = b[i][j]
      index+=1
    end
  end
}
#67.790000 0.070000 67.860000 ( 65.126546)
#RAM consumed => 5.4GB


Java Code

public class MatrixGenerator{
  // Assumed static fields: the slide declares b and c inside test1
  // and does not show where row and col are defined.
  static final int row = 15000;
  static final int col = 15000;
  static double[][] b = new double[row][col];
  static double[][] c = new double[row][col];

  // Fill b with consecutive values.
  public static void test1(){
    for (int index=0, i=0; i < row ; i++){
      for (int j=0; j < col; j++){
        b[i][j]= index;
        index++;
      }
    }
  }

  // Copy b into c.
  public static void test2(){
    for (int index=0, i=0; i < row ; i++){
      for (int j=0; j < col; j++){
        c[i][j]= b[i][j];
        index++;
      }
    }
  }
}

Called from JRuby:

puts Benchmark.measure{MatrixGenerator.test1}
#0.032000 0.001000 00.032000 ( 00.03100)

puts Benchmark.measure{MatrixGenerator.test2}
#0.034000 0.001000 00.034000 ( 00.03300)
#RAM consumed => 300MB


Results

Improves:

●1000 times the speed

●10 times the memory


Mixed models

●After NMatrix for doubles was ready, I tested it with mixed_models.


Benchmarking NMatrix functionalities


System Specifications

●CPU: AMD FX-8350 octa-core 4.2GHz

●RAM: 16GB


Addition


Subtraction


Gamma


Matrix Multiplication


Determinant


Factorization


Benchmark conclusion

●NMatrix-JRuby is much faster for N-dimensional matrices where elementwise operations are concerned.

●NMatrix-MRI is faster for 2-dimensional matrices when computing matrix multiplication, determinants and factorization.


Improvements

●Make NMatrix-JRuby faster than NMatrix-MRI using BLAS level-3 and LAPACK routines.

●How?

●Why not jblas?


MRI JRuby


Future Work

●Add support for complex dtype.

●Convert NMatrix-JRuby Enumerators to Java code.

●Add sparse support.


Am I done?


Nope!


Enter GPU


A General-Purpose GPU library

●Combine the beauty of Ruby with transparent GPU processing

●This will work both on client computers and on servers that make use of NVIDIA Tesla and Intel Xeon Phi solutions.

●Developer activity and support for the current projects are mixed at best, and the projects are tough to use: they involve writing kernels and require a lot of effort on buffer/RAM optimisation.


ArrayFire-rb

●Wraps the ArrayFire library


ArrayFire

●ArrayFire is an open-source GPGPU library written in C++ that uses JIT compilation.

●ArrayFire supports CUDA-capable NVIDIA GPUs, OpenCL devices, and a C-programming backend.

●It abstracts away the difficult tasks of writing kernels for multiple architectures, handling memory management, and performing tuning and optimisation.


Using ArrayFire


MRI

●C extension

●Architecture is inspired by NMatrix and NArray

●The C++ function is placed in a namespace (e.g., namespace arf { }) or is declared static if possible. The C function receives the prefix arf_, e.g., arf_multiply() (this function also happens to be static).

●C macros are capitalized and generally have the prefix ARF_, as with ARF_DTYPE().

●C functions (and macros, for consistency) are placed within extern "C" { } blocks to turn off C++ mangling.

●C macros (in extern blocks) may represent C++ constants (which are always defined in namespace arf {} or a child thereof).


#include <ruby.h>

typedef struct AF_STRUCT
{
  size_t ndims;
  size_t count;
  size_t* dimension;
  double* array;
}afstruct;

/* Assumed declarations: the slide uses these without showing them. */
VALUE ArrayFire = Qnil;
VALUE Blas = Qnil;

/* METHOD (a function-pointer cast) and arf_free are defined elsewhere
   and are not shown on the slide. */
static VALUE arf_matmul(VALUE self, VALUE left_val, VALUE right_val);

void Init_arrayfire() {
  ArrayFire = rb_define_module("ArrayFire");
  Blas = rb_define_class_under(ArrayFire, "BLAS", rb_cObject);
  rb_define_singleton_method(Blas, "matmul", (METHOD)arf_matmul, 2);
}

static VALUE arf_matmul(VALUE self, VALUE left_val, VALUE right_val){
  afstruct* left;
  afstruct* right;
  afstruct* result = ALLOC(afstruct);

  Data_Get_Struct(left_val, afstruct, left);
  Data_Get_Struct(right_val, afstruct, right);

  result->ndims = left->ndims;
  /* Heap-allocate the shape: a stack array would dangle after return. */
  result->dimension = ALLOC_N(size_t, 2);
  result->dimension[0] = left->dimension[0];
  result->dimension[1] = right->dimension[1];
  result->count = result->dimension[0]*result->dimension[1];

  arf::matmul(result, left, right);
  return Data_Wrap_Struct(CLASS_OF(left_val), NULL, arf_free, result);
}


#include <arrayfire.h>

namespace arf {
  using namespace af;

  static void matmul(afstruct *result, afstruct *left, afstruct *right)
  {
    array l = array(left->dimension[0], left->dimension[1], left->array);
    array r = array(right->dimension[0], right->dimension[1], right->array);
    array res = matmul(l,r);
    result->array = res.host<double>();
  }
}

extern "C" {
  #include "arrayfire.c"
}


JRuby

●The approach is the same as for NMatrix-JRuby.

●Java Native Interface (JNI)

●Work on ArrayFire-Java.


●Place 'libaf.so' in the load path.

require 'ext/vendor/ArrayFire.jar'

class Af_Array
  attr_accessor :dims, :elements

  def matmul(other)
    Blas.matmul(self.arr, other)
  end
end


Benchmarking ArrayFire


System Specification

CPU: AMD FX octa-core 4.2GHz

RAM: 16GB

GPU: Nvidia GTX 750Ti

GPU RAM: 4GB GDDR5


Matrix Addition


Matrix Multiplication


Matrix Determinant


Factorization


Transparency

●Integrate with NArray

●Integrate with NMatrix

●Integrate with Rails


Applications

●Endless possibilities ;)

●Bioinformatics

●Integrate TensorFlow

●Image Processing

●Computational Fluid Dynamics


Conclusion


Useful Links

●https://github.com/sciruby/nmatrix

●https://github.com/arrayfire/arrayfire-rb

●https://github.com/prasunanand/arrayfire-rb/tree/temp


Acknowledgements

1.Pjotr Prins

2.Charles Nutter

3.John Woods

4.Alexej Gossmann

5.Sameer Deshmukh

6.Pradeep Garigipati


Thank You

GitHub: prasunanand

Twitter: @prasun_anand

Blog: prasunanand.com