keras’s phylanx backend - louisiana state...

Keras’s Phylanx Backend

Bita Hasheminezhad

STE||AR GROUP

Outline


– Available Deep Learning Platforms

– What’s Special about Keras

– Keras Backends

– Inference Example 1; Multi-Class classification

– Inference Example 2; Sentiment Analysis

– Keras In Future

– Conclusion

Deep Learning Platforms


– Spark Apache

– DistBelief Google

– TensorFlow Google

– CNTK Microsoft

– Project Adam Microsoft

– MXNet Apache

– CoreML Apple

– Theano Université de Montréal

– Caffe Berkeley AI Research

– Caffe2 Facebook

– PyTorch Facebook

– SINGA National University of Singapore

– Chainer Preferred Networks

Support Keras



– Spark Apache

– DistBelief -> TensorFlow Google

– CNTK Microsoft

– Project Adam Microsoft

– MXNet Apache

– CoreML Apple


– Caffe -> Caffe2 -> PyTorch Facebook

– SINGA National University of Singapore

– Chainer Preferred Networks

What is Keras?


– Keras is a high-level neural networks API, written in Python and capable of running on top of a deferred execution backend.

User friendly

Modular

Easily extensible

Fig 1. Number of publications during the last decade having the name of the DL platform in their full text [1]

https://keras.io/why-use-keras/

[1] https://app.dimensions.ai/discover/publication

Deferred Style

Imperative or Eager Style



– TensorFlow Google

– CNTK Microsoft


– MXNet Apache

– CoreML Apple

– Caffe -> Caffe2 -> PyTorch Facebook

[2] Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., ... & Kudlur, M. (2016). TensorFlow: A system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16) (pp. 265-283).

– Deferred execution has two distinct phases: the first defines the program as a symbolic graph, and the second executes an optimized version of the program on the set of available devices. [2]
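The two phases can be illustrated with a toy engine in plain Python. This is only a sketch: the names `Node`, `placeholder`, `optimize`, and `execute` are hypothetical and do not belong to Keras, TensorFlow, or Phylanx, and constant folding stands in for real graph optimization.

```python
"""A toy deferred-execution engine: build a symbolic graph, then run it."""


class Node:
    """One symbolic operation; constructing it performs no arithmetic."""

    def __init__(self, op, *inputs, value=None, name=None):
        self.op, self.inputs, self.value, self.name = op, inputs, value, name

    def __add__(self, other):
        return Node("add", self, _wrap(other))

    def __mul__(self, other):
        return Node("mul", self, _wrap(other))


def _wrap(x):
    """Turn a plain number into a constant node."""
    return x if isinstance(x, Node) else Node("const", value=x)


def placeholder(name):
    return Node("placeholder", name=name)


def optimize(node):
    """Constant folding, used here as a stand-in for graph optimization."""
    if node.op in ("const", "placeholder"):
        return node
    left, right = (optimize(n) for n in node.inputs)
    if left.op == "const" and right.op == "const":
        if node.op == "add":
            return Node("const", value=left.value + right.value)
        return Node("const", value=left.value * right.value)
    return Node(node.op, left, right)


def execute(node, feed):
    """Evaluate the graph once concrete inputs are available."""
    if node.op == "const":
        return node.value
    if node.op == "placeholder":
        return feed[node.name]
    a, b = (execute(n, feed) for n in node.inputs)
    return a + b if node.op == "add" else a * b


# Phase 1: define the program symbolically; nothing is computed yet.
x = placeholder("x")
graph = x * (_wrap(2) + 3)

# Phase 2: optimize the graph, then execute it with real data.
optimized = optimize(graph)
result = execute(optimized, {"x": 4})
```

After optimization the sub-expression `2 + 3` has been folded into a single constant, so the executed graph contains one multiplication only.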

Keras different backends


Platform | Data Parallelism | Model Parallelism
TensorFlow | Synchronous or asynchronous through parameter servers | Supported using greedy heuristics
CNTK | Bounded asynchronous through a parameter server model |
Theano | Not on multiple nodes |
MXNet | Synchronous or asynchronous through parameter servers | Not on multiple nodes

Table 1. Investigating parallelism in the deep learning platforms supported by Keras
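As a rough illustration of what synchronous data parallelism through a parameter server means, the sketch below fits y = w*x in plain Python. The workers are simulated sequentially, and `shard_gradient` and `synchronous_step` are hypothetical names, not part of any platform in the table.

```python
def shard_gradient(w, shard):
    """Mean-squared-error gradient of y ~ w*x on one worker's shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def synchronous_step(w, shards, lr):
    """Parameter-server step: wait for every worker, average, then update."""
    grads = [shard_gradient(w, s) for s in shards]  # one gradient per worker
    return w - lr * sum(grads) / len(grads)


# Toy data generated from y = 3x, split evenly across two simulated workers.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
shards = [data[0::2], data[1::2]]

w = 0.0
for _ in range(200):
    w = synchronous_step(w, shards, lr=0.01)
```

Because the shards have equal size, the averaged gradient equals the full-batch gradient. An asynchronous scheme would instead apply each worker's gradient as soon as it arrives.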

– “When gradient nodes are automatically added to the graph, the user has less control, and the heuristics may break down.” [2]

The solution to the problem

[3] Zhang, Z., Yin, L., Peng, Y., & Li, D. (2018, December). A Quick Survey on Large Scale Distributed Deep Learning Systems. In 2018 IEEE 24th International Conference on Parallel and Distributed Systems (ICPADS) (pp. 1052-1056). IEEE.

Problem:

On a single node, training ResNet50 on the ImageNet data set on an NVIDIA M40 GPU takes 14 days! [3]

Solution:

A High-Performance Keras Backend which

Is deferred style; can optimize the expression graph

Is distributed and can run on multiple nodes

Uses asynchronous computations; avoids straggler problem

Let’s use HPX!
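The straggler point can be sketched with Python's standard `concurrent.futures` module. This is an analogy only: threads stand in for distributed workers, and none of the code below is HPX or Phylanx API. The `partial_sum` helper and the delays are made up for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def partial_sum(worker_id, values, delay):
    """Simulated worker; 'delay' models a slow node (a straggler)."""
    time.sleep(delay)
    return worker_id, sum(values)


chunks = {0: [1, 2, 3], 1: [4, 5, 6], 2: [7, 8, 9]}
delays = {0: 0.3, 1: 0.01, 2: 0.01}  # worker 0 is the straggler

completion_order = []
total = 0
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(partial_sum, w, chunks[w], delays[w]) for w in chunks]
    # Consume each result as soon as it is ready instead of waiting at a barrier.
    for future in as_completed(futures):
        worker_id, value = future.result()
        completion_order.append(worker_id)
        total += value
```

The fast workers' results are folded into `total` while the slow worker is still running, so useful work is not held back until the slowest participant finishes.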

HPX as a backend for Keras


– Using hints from the user and the optimization step, the expression graph is passed to the HPX runtime, which schedules work and infers the data layout on each compute locale. [4]

[4] http://phylanx.stellar-group.org/

Diagram components: Keras (Python), Phylanx (Python frontend, C++ backend), HPX (C++)

How to implement a Keras Backend


backend epsilon set_epsilon
floatx cast_to_floatx
set_floatx image_data_format
set_image_data_format
normalize_data_format
reset_uids get_uid
learning_phase set_learning_phase
is_sparse to_dense
variable eval
update update_add update_sub
is_keras_tensor is_tensor placeholder is_placeholder
moving_average_update identity
random_uniform_variable random_normal_variable
get_value set_value
name_scope
print_tensor function
gradients stop_gradient
in_test_phase in_train_phase
constant zeros ones eye zeros_like ones_like
max min sum prod cumsum cumprod argmax argmin var std mean square sqrt abs exp log logsumexp
dtype cast dot batch_dot transpose gather in_top_k
any all equal not_equal greater greater_equal less less_equal round sign pow clip maximum minimum permute_dimensions
shape int_shape ndim count_params reshape
cos sin concatenate stack repeat_elements repeat tile arange flatten expand_dims squeeze one_hot reverse
slice switch bias_add dropout l2_normalize
truncated_normal random_normal random_uniform random_binomial
resize_images resize_volumes
conv1d conv2d conv3d separable_conv1d separable_conv2d
local_conv1d local_conv2d
pool2d pool3d
temporal_padding spatial_2d_padding spatial_3d_padding conv2d_transpose conv3d_transpose depthwise_conv2d
map_fn foldl foldr
rnn ctc_decode ctc_label_dense_to_sparse
ctc_batch_cost
relu elu tanh softmax softplus softsign
sigmoid hard_sigmoid
binary_crossentropy categorical_crossentropy sparse_categorical_crossentropy
batch_get_value batch_set_value
normalize_batch_in_training batch_flatten batch_normalization

Category and status labels on the slide: Keras related, main, Basic math, Convolutional, Recurrent, Batch related, Late Binding, Activations and Losses, Up to 4D, Inference, Not yet, Training
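To give a feel for what implementing these functions involves, here is a minimal sketch of a few of them on top of NumPy. It is not the Phylanx implementation: the `NumpyBackend` class is hypothetical, it covers only a handful of the names listed above, and it is eager rather than deferred. The signatures follow the Keras backend functions of the same names.

```python
import numpy as np


class NumpyBackend:
    """Hypothetical, NumPy-only stand-in for a handful of backend functions."""

    _epsilon = 1e-7
    _floatx = "float32"

    @classmethod
    def epsilon(cls):
        return cls._epsilon

    @classmethod
    def set_epsilon(cls, value):
        cls._epsilon = float(value)

    @classmethod
    def floatx(cls):
        return cls._floatx

    @classmethod
    def cast_to_floatx(cls, x):
        return np.asarray(x, dtype=cls._floatx)

    @classmethod
    def variable(cls, value, dtype=None):
        return np.array(value, dtype=dtype or cls._floatx)

    @staticmethod
    def eval(x):
        # NumPy is eager, so evaluation only has to return the array.
        return np.asarray(x)

    @staticmethod
    def argmax(x, axis=-1):
        return np.argmax(x, axis=axis)

    @staticmethod
    def equal(x, y):
        return np.equal(x, y)

    @staticmethod
    def cast(x, dtype):
        return np.asarray(x).astype(dtype)

    @staticmethod
    def sum(x, axis=None, keepdims=False):
        return np.sum(x, axis=axis, keepdims=keepdims)


# Usage in the style of "from keras import backend as K".
K = NumpyBackend
scores = K.variable([[0.1, 0.9], [0.8, 0.2]])
labels = K.argmax(scores, axis=1)
corrects = K.cast(K.equal(labels, [1, 1]), "int64")
n_correct = K.sum(corrects)
```

A deferred backend such as the one described in this talk would return graph nodes from these functions and do the actual work in `eval` and `get_value`.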

Inference Example 1; Multi-Class Classification


from keras import backend as K
from keras.datasets import mnist
from keras.utils import to_categorical
import numpy as np
import pandas as pd

# only the labels are used here; the predictions are read from a CSV file
(_, y_train), (x_test, y_test) = mnist.load_data()
num_classes = len(np.unique(y_train))

# convert class vectors to binary class matrices
print("y_train shape:", y_train.shape)
y_train = to_categorical(y_train, num_classes)
print("y_train shape, after one_hot encoding:", y_train.shape)

# load the precomputed class scores, one row per test image
df = pd.read_csv('class_pred.csv')
class_pred = df.values
print("class_predict shape:", class_pred.shape)
print("A sample of class_predict:", class_pred[0])

# the predicted label is the class with the highest score
labels_pred = K.argmax(class_pred, axis=1)
print("Predicted labels:", K.get_value(labels_pred))
print("What we have on y_test", K.eval(y_test))

# compare the predictions with the true labels
corrects = K.equal(labels_pred, y_test)
corrects = K.cast(corrects, 'int64')
print("Correct labels:", K.get_value(corrects))
number_of_corrects = K.get_value(K.sum(corrects))
print("Number of corrects predictions: %d "%number_of_corrects)
corrects = K.expand_dims(corrects, axis=0)
num_images = K.int_shape(corrects)[1]
print("Accuracy: %.2f%%" % ((number_of_corrects*100)/num_images))

# Misclassified
incorrects = K.not_equal(corrects, 1)
incorrects = K.eval(incorrects)
# multiplying by the mask zeroes out the correctly classified entries
labels_error = (lambda x: x[0] * x[1])([K.eval(labels_pred), incorrects])
labels_true = (lambda x: x[0] * x[1])([K.eval(y_test), incorrects])
# inspect the first 500 test images only
labels_true_slice = K.slice(K.squeeze(K.variable(labels_true),0),[0],[500])
labels_error_slice = K.slice(K.flatten(K.variable(labels_error)),[0],[500])
for i,j in zip(K.get_value(labels_true_slice), K.get_value(labels_error_slice)):
    if i != 0:
        print("Label",i,"is misclassified as",j)

Output:

Using Phylanx backend.
y_train shape: (60000,)
y_train shape, after one_hot encoding: (60000, 10)
class_predict shape: (10000, 10)
A sample of class_predict: [3.27987540e-37 1.93442800e-25 5.78854500e-25 1.94946260e-21
 3.15305600e-31 1.03375155e-32 0.00000000e+00 1.00000000e+00 4.98417950e-32 3.93246830e-21]
Predicted labels: [7 2 1 ... 4 5 6]
What we have on y_test [7 2 1 ... 4 5 6]
Correct labels: [1 1 1 ... 1 1 1]
Number of corrects predictions: 9837
Accuracy: 98.37%
Label 4 is misclassified as 2
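For readers without a Phylanx installation, the same accuracy computation can be sketched in plain NumPy. The arrays below are small made-up stand-ins for `class_pred.csv` and `y_test`, so the numbers have no connection to the MNIST run above.

```python
import numpy as np

# Hypothetical stand-ins for class_pred.csv and y_test (4 samples, 3 classes).
class_pred = np.array([[0.1, 0.7, 0.2],
                       [0.8, 0.1, 0.1],
                       [0.2, 0.3, 0.5],
                       [0.6, 0.3, 0.1]])
y_test = np.array([1, 0, 1, 0])

labels_pred = np.argmax(class_pred, axis=1)         # like K.argmax(..., axis=1)
corrects = (labels_pred == y_test).astype("int64")  # like K.equal + K.cast
accuracy = 100.0 * corrects.sum() / corrects.size

# Pair each misclassified true label with the label predicted for it.
wrong = np.flatnonzero(corrects == 0)
misclassified = [(int(y_test[i]), int(labels_pred[i])) for i in wrong]
```

Selecting the wrong entries by index is a deliberate departure from the slide's multiply-by-mask approach. With the mask, a misclassified image whose true label is 0 produces a 0 and is skipped by the `i != 0` test.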

Inference Example 2; Sentiment Analysis

from keras import backend as K
from keras.datasets import imdb
import matplotlib.pyplot as plt  # used for the histogram; import not shown on the slide
import numpy as np
import pandas as pd
from phylanx import Phylanx  # assumed import for the decorator; not shown on the slide

@Phylanx
def unique_eager(x):
    return np.unique(x)

unique = Phylanx.lazy(unique_eager)

@Phylanx
def argsort_eager(x):
    return np.argsort(x)

argsort = Phylanx.lazy(argsort_eager)

@Phylanx
def where_eager(x):
    return np.where(x)

where = Phylanx.lazy(where_eager)

(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=10000)
classes = unique(K.variable(y_test))
print("classes:", K.get_value(classes))
largest_index = y_test.size - 1

# load the precomputed scores, one per review
df = pd.read_csv('labels_pred.csv')
labels_pred = K.squeeze(df.values, 1)
print("Predicted labels:", K.get_value(labels_pred))
plt.hist(K.get_value(labels_pred))
plt.show()

# a prediction is correct when it is within 0.5 of the true label
corrects = K.less(K.abs(labels_pred - K.variable(y_test)), .5)
print("Accuracy:", K.get_value(K.sum(K.cast(corrects,"int32")))*100/
      K.int_shape(corrects)[0])
y_true = K.equal(y_test, 1.)

# sort scores and corresponding truth values
indices = argsort(labels_pred)
desc_score_indices = K.eval(K.reverse(indices, 0))
print("desc_score_indices", desc_score_indices)
y_score = K.gather(labels_pred, desc_score_indices)
y_true = K.gather(y_true, desc_score_indices)
y_true = K.cast(y_true, "int64")
print("y_true", K.get_value(y_true))
print("y_score", K.get_value(y_score))

# indices at which the sorted score changes value
diff = np.diff(K.eval(y_score))
distinct_value_indices = where(K.not_equal(diff, 0))
distinct_value_indices = K.get_value(distinct_value_indices)[0]
print("distinct_value_indices", distinct_value_indices)
threshold_idxs = K.eval(K.concatenate(
    [K.variable(distinct_value_indices), K.variable(np.array([largest_index]))], 0))

# accumulate the true positives with decreasing threshold
tps = K.get_value(K.gather(K.cumsum(y_true), threshold_idxs))
print("True Positives:", tps)
fps = 1 + threshold_idxs - tps
print("False Positives:", fps)
thresholds = K.get_value(K.gather(y_score, threshold_idxs))
print("Decreasing Threshold:", thresholds)

# plot_roc_curve is a plotting helper whose definition is not shown on the slide
plot_roc_curve(tps, fps, thresholds)

Output:

Using Phylanx backend.
classes: [0 1]
Predicted labels: [9.2614290e-03 9.9999920e-01 9.9997926e-01
 ... 6.2763690e-05 3.3009052e-03 6.0482204e-01]
Accuracy: 86.704
desc_score_indices [12420 1594 2351 ... 11280 13389 18853]
y_true [1 1 1 ... 0 0 0]
y_score [1. 1. 1. ... 0. 0. 0.]
distinct_value_indices [ 484 593 981 ... 23598 23794 23973]
True Positives: [ 483 591 975 ... 12493 12495 12500]
False Positives: [ 2 3 7 ... 11302 11479 12500]
Decreasing Threshold: [1.0000000e+00 9.9999994e-01
 9.9999990e-01 ... 5.9604645e-08 2.9802322e-08
 0.0000000e+00]

Confusion matrix (rows: actual class, columns: predicted class):

      P     N
P     TP    FN
N     FP    TN

TPR = TP/P, FPR = FP/N
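The rates TPR = TP/P and FPR = FP/N can be computed for every threshold with a few lines of plain Python. This is a sketch that follows the same steps as the slide code (sort by descending score, cumulative sum of true labels, `fps = 1 + index - tps`). The function name `roc_points` and the four-sample input are made up for illustration, and the function assumes both classes are present.

```python
def roc_points(y_true, y_score):
    """Return (fpr, tpr, thresholds), one point per distinct score."""
    # Sort the samples by descending score.
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    labels = [y_true[i] for i in order]
    scores = [y_score[i] for i in order]

    # Keep the last index of each run of equal scores, plus the final index.
    idxs = [i for i in range(len(scores) - 1) if scores[i] != scores[i + 1]]
    idxs.append(len(scores) - 1)

    # Cumulative count of positives seen so far.
    cum_tp, running = [], 0
    for label in labels:
        running += label
        cum_tp.append(running)

    tps = [cum_tp[i] for i in idxs]
    fps = [1 + i - tp for i, tp in zip(idxs, tps)]

    # At the lowest threshold everything is predicted positive,
    # so the last counts are the totals P and N (both assumed non-zero).
    p, n = tps[-1], fps[-1]
    tpr = [tp / p for tp in tps]
    fpr = [fp / n for fp in fps]
    return fpr, tpr, [scores[i] for i in idxs]


fpr, tpr, thresholds = roc_points([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1])
```

Plotting `tpr` against `fpr` gives the ROC curve.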

Keras in future

[5] https://github.com/keras-team/keras/releases

TensorFlow Eager (2.0)

[6] https://www.tensorflow.org/guide/effective_tf2

Deferred Style

Imperative or Eager Style

– TensorFlow 1.0

– CNTK

– Theano

– MXNet

– CoreML

– PyTorch

TensorFlow 2.0


Performance of TF Eager


– “We expect most real-world models to fall somewhere between these two, and to be able to recover performance by staging as required.” [7]

– “TensorFlow Eager is an evolving technology and closing the gap between imperative and staged performance is being worked on.” [7]

[7] Agrawal, A., Modi, A. N., Passos, A., Lavoie, A., Agarwal, A., Shankar, A., ... & Cai, S. (2019). Tensorflow eager: A multi-stage, python-embedded dsl for machine learning. arXiv preprint arXiv:1903.01855.

Fig 6. Examples per second training ResNet-50 on a GPU

Fig 7. Examples per second training L2HMC on a CPU

Where we are now


– We have made good progress on the Phylanx backend for Keras

Many of the needed primitives are implemented in Phylanx [8]

BlazeTensor has acceptable support for 3D and 4D arrays [9]

– We need higher dimensionalities because DL platforms usually add batch and channel dimensions to the data dimensions.

– “As one part of the development of TensorFlow, our team has extended the open source Eigen library with support for arbitrary dimensionality tensor operations.” [10]

The majority of the Keras backend tests pass [11]
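The dimensionality point can be made concrete with NumPy shapes. The sizes below and the channels-last layout are assumptions chosen for illustration, not values taken from the talk.

```python
import numpy as np

image = np.zeros((28, 28))                     # one grayscale image: 2D
with_channel = np.expand_dims(image, axis=-1)  # (28, 28, 1): channel axis added
batch = np.stack([with_channel] * 32)          # (32, 28, 28, 1): batch axis added

# A batch of short video clips (hypothetical sizes) already needs 5D:
# batch, frames, height, width, channels.
clips = np.zeros((32, 16, 28, 28, 3))
```

A 2D image becomes a 4D tensor once the channel and batch axes are added, so volumetric or video data needs a fifth dimension.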

[8] https://github.com/STEllAR-GROUP/phylanx

[9] https://github.com/STEllAR-GROUP/blaze_tensor

[10] Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., ... & Ghemawat, S. (2016). TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467.

[11] https://github.com/STEllAR-GROUP/keras

Thank you for your attention
