deep learning jeff-shomaker_1-20-17_final_

Presentation at Global AI Conference

Santa Clara, CA

1-20-17

Jeff Shomaker

21 SP, Inc.

Deep Learning: Comparing Open-Source Frameworks

21 SP, Inc. Proprietary and Confidential

Introduction

• Neural network software has been open-sourced so it can be used widely.

• I’ll discuss the following:
– Neural networks: what are they?
– Uses of neural networks
– TensorFlow
– Torch
– CNTK
– Caffe
– Theano
– Comparative metrics
– Further reading


Neural Networks


Neural networks are a paradigm for processing information loosely based on the idea of neurons that communicate information in the brain and spinal cord. 2)

Source: 1) Raschka, S. (2016). What is the difference between deep learning and ‘Regular’ machine learning. www.kdnuggets.com, Diagram accessed 7-1-16. 2) Geoffrey Hinton, et al (2012). Neural networks for machine learning course. U of Toronto, Coursera.com, Oct 2012. Accessed 2013.
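To make the definition concrete, here is a minimal sketch of a forward pass through a tiny fully connected network. The layer sizes, the hand-picked weights and the sigmoid activation are illustrative assumptions, not taken from the sources above.

```python
import math


def sigmoid(x):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dense_layer(inputs, weights, biases):
    """One layer of artificial neurons: weighted sum of inputs plus bias, then sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Pass the inputs through each (weights, biases) layer in turn."""
    activations = inputs
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations


# Hypothetical network: 2 inputs, 2 hidden neurons, 1 output (weights chosen by hand).
EXAMPLE_LAYERS = [
    ([[0.5, -0.4], [0.3, 0.8]], [0.0, -0.1]),  # hidden layer
    ([[1.2, -0.7]], [0.05]),                   # output layer
]

if __name__ == "__main__":
    print(forward([1.0, 0.0], EXAMPLE_LAYERS))
```

Each "neuron" here only combines its inputs and passes a signal on; in a real network the weights are learned from data rather than written by hand.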


Deep Learning Architectures E1)

• Deep Neural Networks
– General approach for classification and regression
– Widely used and successful in many areas

• Deep Belief Networks (DBNs)
– A composition of Restricted Boltzmann Machines (RBMs)*
– Used for unsupervised and supervised problems

• Recurrent Neural Networks (RNNs)**
– Good for analyzing streams of data
– Successful in natural language processing

• Convolutional Neural Networks (CNNs)
– Good for 2D data (usually labeled), like images
– Inputs are transformed to 3D outputs

• E1) Ravi, D., et al (2016). See End Notes for full citation and annotation.

• * RBMs are a type of stochastic NN. They are good for modeling probabilities between variables.

• ** Long Short-Term Memory (LSTM) is a variation of RNNs.
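As an illustration of why CNNs suit 2D data such as images, the sketch below slides a small kernel over a toy image. The image, the kernel values and the function name are hypothetical; like most deep learning frameworks, it computes a cross-correlation (the kernel is not flipped), with no padding and a stride of 1.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image and sum the element-wise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Hypothetical 4x4 image: dark left half (0), bright right half (1).
IMAGE = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-picked kernel that responds to a dark-to-bright vertical edge.
EDGE_KERNEL = [
    [-1, 1],
    [-1, 1],
]

if __name__ == "__main__":
    for row in conv2d_valid(IMAGE, EDGE_KERNEL):
        print(row)  # each row is [0, 2, 0]: the response peaks at the edge
```

In a trained CNN the kernel values are learned, and many kernels are applied in parallel to produce a stack of feature maps.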



Examples of Neural Network (NN) Use

• Medicine

– Per the IOM (Institute of Medicine, 2015), one of ten patient deaths in the US is due to misdiagnosis.
– NNs can be used in the diagnosis of multiple sclerosis, colon cancer, pancreatic disease, gynecological diseases, diabetes, coronary artery disease, breast/thyroid cancer and others. 1)

• Finance
– In 2014, card-not-present fraud was $2.9B in the US; it is expected to reach $6.4B by 2018.
– NNs can be used for credit card fraud detection along with other machine learning approaches such as Support Vector Machines, K-nearest neighbor, etc. 2)

• Network Security
– The direct annual loss in 2011 from global cyber crime was $114B.
– The authors propose an Artificial Immune System that uses neural networks as detectors. 3)

• Energy Efficiency
– During the next 10 years, electricity demand is expected to grow by 13% to 15% per year.
– The authors describe a system using neural networks that can communicate with electricity grids.
– It is expected to reduce energy loss from 16% to between 3% and 5%. 4)

• 1) Amato, F., et al (2013). Artificial neural networks in medical diagnosis. J Applied Biomedicine. 11:47-58.

• 2) Deshpande, PM, et al., (2016 Jan). Applications of data mining techniques for fraud detection in credit-debit card transactions. ISJRD, Conference on Technological Advancement and Automatization in Engineering. 339-345.

• 3) Komar, M., et al (2016). Intelligent cyber defense system. ICTERI, Kyiv, Ukraine, June 21-24 meeting, 534-549.

• 4) Buyuk, OO, et al (2016). A novel application to increase energy efficiency using artificial neural networks. IEEE. 1-5.



TensorFlow

• What is it:
– Neural network software for numerical computation that uses data flow graphs
– Developed at Google’s machine intelligence research organization

• What can it be used for:
– Any machine learning neural network problem

• Video Demonstration
– Six-minute video introduction to TensorFlow on YouTube.

• Further information:
– www.tensorflow.org
– https://www.youtube.com/watch?v=bYeBL92v99Y
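The data flow graph idea can be sketched in plain Python. This is not the TensorFlow API: the `Node` class and the helper functions below are hypothetical names used only to show that the graph is built first and evaluated later, once input values are fed in.

```python
class Node:
    """A graph node: an operation plus the nodes whose outputs feed it."""

    def __init__(self, op, inputs=(), name=None):
        self.op = op
        self.inputs = tuple(inputs)
        self.name = name


def placeholder(name):
    """An input node whose value is supplied when the graph is run."""
    return Node(op=None, name=name)


def constant(value):
    return Node(op=lambda: value)


def add(a, b):
    return Node(op=lambda x, y: x + y, inputs=(a, b))


def multiply(a, b):
    return Node(op=lambda x, y: x * y, inputs=(a, b))


def run(node, feed, cache=None):
    """Evaluate a node by first evaluating everything that flows into it."""
    cache = {} if cache is None else cache
    if id(node) in cache:
        return cache[id(node)]
    if node.op is None:  # placeholder: look up the fed value
        value = feed[node.name]
    else:
        value = node.op(*(run(n, feed, cache) for n in node.inputs))
    cache[id(node)] = value
    return value


# Build the graph for y = w * x + b; no arithmetic happens at this point.
x = placeholder("x")
w = constant(3.0)
b = constant(1.0)
y = add(multiply(w, x), b)

if __name__ == "__main__":
    print(run(y, {"x": 2.0}))  # 3.0 * 2.0 + 1.0 = 7.0
```

Separating graph construction from execution is what lets a framework analyze the whole computation and place parts of it on different devices before any data flows through it.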



Example using TensorFlow E2)

• Used a Convolutional Neural Network (CNN) to build a natural language understanding (NLU) system

• CNN designed to capture fluent customer responses in Sweden and route phone calls

• The model’s results were compared to those of Support Vector Machines (SVM), Naïve Bayes (NB) classifiers and Telia Company’s own models.

• The CNN outperformed all on two of four data sets

• Expected that CNN would beat all with larger data sets

E2) Kjellgren, F. (2016). See End Notes for full citation and annotations.



Torch

• What is it:
– Torch is a scientific computing framework for machine learning.
– The goal is flexibility and speed in building scientific algorithms; it contains neural network and optimization libraries.

• What can it be used for:
– Machine learning neural network problems

• Video Demonstration
– Three-minute introduction on YouTube.

• Further information:
– http://torch.ch/
– https://www.youtube.com/watch?v=uxja6iwOnc4&list=PLjJh1vlSEYgvGod9wWiydumYl8hOXixNu&index=19



Example using Torch E3)

• Created two deep neural network (NN) models with Torch

• The first model was a volumetric (3D) convolution-based deep NN and the second was an LSTM (Long Short-Term Memory) based deep NN.

• Models designed to use audio and video as input and produce five personality traits as output: Conscientiousness, Neuroticism, Agreeableness, Extraversion, and Openness.

• The models used the ChaLearn LAP 2016 APA dataset that includes 10,000 videos.

• The second model was entered in the ChaLearn LAP APA2016 Challenge and won second place with eight teams ranked*

E3) Subramaniam, A., et al (2016). See End Notes for full citation and annotations.

* ChaLearn Looking at People ECCV Workshop 2016, 14th European Conference on Computer Vision – Amsterdam, The Netherlands. www.eccv2016.org, Accessed 1-18-17.



CNTK

• What is it:
– CNTK stands for Computational Network Toolkit; it was created by Microsoft.
– Designed for use with CPUs or GPUs (i.e., graphics processing units)

• What can it be used for:
– Image classification problems, video analysis, speech recognition and natural language processing.

• Video Demonstration
– A two-minute introduction on YouTube.

• Further information:
– https://www.cntk.ai/
– https://www.youtube.com/watch?v=-mLdConF1EU



Example using CNTK E4)

• A multi-task deep learning feed-forward neural network (NN) was built called MtNet that solves the classification problem of whether a file is malware or not and places a malware file into a family.

• State-of-the-art results have been achieved with deep learning in speech and visual object recognition, but not in malware systems.

• MtNet was trained (with labeled data) and tested on 6.5 million files. It limited the binary malware error rate to 0.358% and the family error rate to 2.94%, which is a big improvement over previous work.

• Models were trained on a single NVIDIA Tesla K40 GPU.*

• Results showed for the first time that adding hidden layers to a NN can improve the malware classification task.

E4) Huang, W., et al (2016). See End Notes for full citation and annotations.

* A Tesla GPU accelerator for servers. Per Nvidia Corp, Teslas are designed to produce maximum throughput for large data flows. www.nvidia.com. Accessed 1-16-17.
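The multi-task idea described above (one network, two outputs) can be sketched as a shared hidden layer feeding a binary "malware or not" head and a "which family" head. This is an illustration only, not the MtNet implementation: the layer sizes, weights, ReLU activation and all names below are assumptions.

```python
import math


def affine(inputs, weights, biases):
    """Weighted sums plus biases, one output per row of weights."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


def multitask_forward(features, params):
    """Shared hidden layer, then two task-specific output heads."""
    hidden = relu(affine(features, *params["shared"]))
    p_malware = sigmoid(affine(hidden, *params["binary_head"])[0])
    family_probs = softmax(affine(hidden, *params["family_head"]))
    return p_malware, family_probs


# Hypothetical parameters: 3 input features, 2 shared hidden units, 3 families.
TOY_PARAMS = {
    "shared": ([[0.6, -0.2, 0.1], [-0.3, 0.9, 0.4]], [0.0, 0.1]),
    "binary_head": ([[1.5, -1.0]], [-0.2]),
    "family_head": ([[0.7, 0.1], [-0.5, 0.8], [0.2, -0.6]], [0.0, 0.0, 0.0]),
}

if __name__ == "__main__":
    p, families = multitask_forward([1.0, 0.0, 2.0], TOY_PARAMS)
    print("P(malware):", p)
    print("Family probabilities:", families)
```

Because both heads read the same hidden representation, training on one task can help the other, which is the usual motivation for a multi-task design.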



Caffe

• What is it:
– A deep learning framework designed to be modular and fast; used with CPUs or GPUs.
– Developed by the Berkeley Vision and Learning Center (BVLC) and community contributors.

• What can it be used for:
– Originally for machine vision, but now able to handle speech and text problems.

• Video Demonstration
– A three-minute introduction on YouTube.

• Further information:
– http://caffe.berkeleyvision.org/
– https://www.youtube.com/watch?v=bOIZ74rOik0



Example using Caffe E5)

• The authors created a highway data set of over 616,000 images and used it to train a convolutional neural network (CNN) to detect lanes and other cars.

• CNN models have been the best at image recognition during the last several years.

• The Caffe framework was used to develop deep learning models that were then used in self-driving cars on highways in the San Francisco Bay area.

• A 2014 Infiniti Q50 was used as the research vehicle.

• Results showed that CNNs can perform well on highways

E5) Huval, B., et al (2015). See End Notes for full citation and annotations.



Theano

• What is it:
– Theano is a library that uses the Python language to build mathematical expressions; it is especially useful with multi-dimensional arrays.
– Developed by the machine learning group at the University of Montreal.

• What can it be used for:
– When complicated math is used repeatedly and speed is important

• Video Demonstration
– A three-minute introduction on YouTube.

• Further information:
– http://deeplearning.net/software/theano/
– https://www.youtube.com/watch?v=fWkArbYtQbM&index=17&list=PLjJh1vlSEYgvGod9wWiydumYl8hOXixNu
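The sketch below illustrates, in plain Python, what building a mathematical expression symbolically means: the expression is defined once as a tree, then evaluated (and differentiated) as often as needed. It does not use Theano itself. All class names are hypothetical, and it handles scalars only, whereas Theano works with multi-dimensional arrays.

```python
class Expr:
    """Base class: overloading + and * lets ordinary syntax build an expression tree."""

    def __add__(self, other):
        return Add(self, wrap(other))

    def __mul__(self, other):
        return Mul(self, wrap(other))


class Var(Expr):
    def __init__(self, name):
        self.name = name

    def evaluate(self, env):
        return env[self.name]

    def grad(self, var):
        return Const(1.0 if self is var else 0.0)


class Const(Expr):
    def __init__(self, value):
        self.value = value

    def evaluate(self, env):
        return self.value

    def grad(self, var):
        return Const(0.0)


class Add(Expr):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def evaluate(self, env):
        return self.left.evaluate(env) + self.right.evaluate(env)

    def grad(self, var):
        # Sum rule: (u + v)' = u' + v'
        return Add(self.left.grad(var), self.right.grad(var))


class Mul(Expr):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def evaluate(self, env):
        return self.left.evaluate(env) * self.right.evaluate(env)

    def grad(self, var):
        # Product rule: (u * v)' = u' * v + u * v'
        return Add(Mul(self.left.grad(var), self.right),
                   Mul(self.left, self.right.grad(var)))


def wrap(value):
    """Allow plain numbers to appear in expressions."""
    return value if isinstance(value, Expr) else Const(float(value))


# Define f(x) = x*x + 3x once; derive f'(x) = 2x + 3 symbolically.
x = Var("x")
f = x * x + x * 3
df_dx = f.grad(x)

if __name__ == "__main__":
    print(f.evaluate({"x": 2.0}))      # 10.0
    print(df_dx.evaluate({"x": 2.0}))  # 7.0
```

A real symbolic library would also simplify and compile the expression tree, which is where the speed benefit for repeated evaluation comes from.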


Example using Theano E6)

• Used a Convolutional Neural Network (CNN) to create a deep learning system that classifies 2D CT scans of nodules in lung tissue into six categories for further analysis

• The 2D images are transformed into 3D images along with the probability that each nodule belongs to one of the following types: solid, calcified, part-solid, non-solid, perifissural, or spiculated.

• Different types have a different likelihood of being cancerous.

• System trained on data from 943 patients and validated on 468 separate patients.

• System output was consistent with human performance and is well suited for high volume lung cancer screening.

E6) Ciompi, F., et al (2016). See End Notes for full citation and annotations.



Deep Learning Framework Features

Table 1: Deep Learning Framework Features E8)

Platform | TensorFlow | CNTK | Caffe | Theano | Torch
Release Date | 2016 | 2016 | 2014 | 2010 | 2011
Core Language | C++ | C++ | C++ | C++ | C
APIs | C++, Python | NDL 2) | Python, MATLAB | Python | Lua 3)
Deep Learning Models 1) | DBN, CNN, RNN | DBN, CNN, RNN | DBN, CNN, RNN | DBN, CNN, RNN | DBN, CNN, RNN
Visualization | Graph (interactive), training monitoring | Graph (static) | Summary statistics | Graph (static) | Plots

E8) Fox, J., et al (2016). Software Frameworks for Deep Learning at Scale. See End Notes for full citation.
1) DBN - Deep Belief Networks, CNN - Convolutional Neural Networks, RNN - Recurrent Neural Networks.
2) NDL - high level domain specific language for implementing networks.
3) Lua - user interface scripting language.



Comparative Study of Five Frameworks E9)

• Caffe, TensorFlow, Theano, Torch, and Neon* were evaluated on: 1) extensibility, 2) hardware utilization, and 3) speed.

• All comparisons were based on running on a single machine using either 1) a multi-threaded CPU, or 2) a GPU (Nvidia Titan X).

• Results
– Theano and Torch are the most extensible (with respect to handling deep architectures and including supported libraries).
– Torch has the best performance on deep network architectures, with Theano second.
– Torch is the best for GPU-based convolutional and fully connected networks, with Theano second.
– Theano is the best for recurrent networks (LSTMs).
– TensorFlow is very flexible, but it suffers from poor performance compared to the other frameworks on a single GPU.

E9) Bahrampour, S., et al (2016). See End Notes for full citation and annotations.

* Neon is a fairly new Python-based framework from Nervana. It is fast, but does not have all the capabilities of the other more mature frameworks.



Running Time Benchmarks E7)

• FCNs*, CNNs and RNNs were benchmarked on CPUs and GPUs**

• Caffe, CNTK, TensorFlow, Torch and Theano (not studied here) can run on multi-core CPUs and many-core GPUs.

• Results
– CPU-only platform: no overall fastest
  • FCNs – Torch the best
  • CNNs (AlexNet) – Caffe and TensorFlow the best
  • RNNs – CNTK very fast
– GPUs
  • FCNs – Caffe and CNTK the best
  • CNNs (AlexNet on GTX 980 & K80 cards) – Caffe the fastest
  • RNNs – CNTK the fastest

• E7) Shi, S., et al (2016). See End Notes for full citation and annotations.
• * FCNs – Fully Connected Neural Networks.
• ** GPUs significantly reduce training time: a 10-30X increase in speed over CPUs.
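For readers who want to run a comparison of their own, here is a minimal timing harness, assuming average wall-clock time per mini-batch is the metric of interest. The function names and the stand-in workload are hypothetical; a real benchmark would replace `toy_step` with one training iteration in the framework under test.

```python
import time


def time_per_batch(step_fn, batches, warmup=1):
    """Average wall-clock seconds per mini-batch, after discarding warm-up runs."""
    timed = batches[warmup:]
    if not timed:
        raise ValueError("need more batches than warm-up runs")
    for batch in batches[:warmup]:
        step_fn(batch)  # warm-up: caches, lazy initialization, etc.
    start = time.perf_counter()
    for batch in timed:
        step_fn(batch)
    return (time.perf_counter() - start) / len(timed)


def speedup(baseline_seconds, accelerated_seconds):
    """How many times faster the accelerated run is than the baseline."""
    return baseline_seconds / accelerated_seconds


def toy_step(batch):
    """Stand-in workload (sum of squares); not a real training step."""
    return sum(v * v for v in batch)


if __name__ == "__main__":
    batches = [list(range(10_000)) for _ in range(6)]
    print("seconds per batch:", time_per_batch(toy_step, batches))
    # Example: 3.0 s per batch on a CPU versus 0.1 s on a GPU is a 30X speedup.
    print("speedup:", speedup(3.0, 0.1))
```

Discarding the first iteration matters because start-up costs can otherwise dominate short runs.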



End Notes

• E1) Ravi, D., et al (2016 Dec 28). Deep Learning for Health Informatics. IEEE J of Biomedical and Health Informatics, v PP, Is 99: 1-18. Accessed 1-10-17. [A good discussion of the different types of Deep Learning Architectures, nine software packages, and methods and applications in Bioinformatics, Medical Imaging, Pervasive Sensing, Medical Informatics, and Public Health].

• E2) Kjellgren, F., & Nordstrom, J. (2016). Convolutional Neural Networks for Semantic Classification of Fluent Speech Phone Calls. www.cs.umu.se, SLTC_2016_paper_48-1.PDF, Accessed 1-7-17 [Used TensorFlow Framework].

• E3) Subramaniam, A., et al (2016 Oct 31). Bi-modal First Impressions Recognition using Temporally Ordered Deep Audio and Stochastic Visual Features. 1-13. arXiv preprint arXiv:1610.10048v1 [cs.CV]. Accessed 1-17-17. [Used Torch Framework].

• E4) Huang, W. & Stokes, J.W. (2016). MtNet: A Multitask Neural Network for Dynamic Malware Classification. In Detection of Intrusions and Malware, and Vulnerability Assessment, v 9721: 399-418. Psu.edu, 4750.pdf. Accessed 1-7-17. [Used CNTK Framework].

• E5) Huval, B., et al (2015 Apr 17). An Experimental Evaluation of Deep Learning on Highway Driving. 1-7. arXiv preprint. arXiv:1504.01716v3 [cs.RO]. Accessed 1-7-17. [Used Caffe Framework].

• E6) Ciompi, F., et al (2016 Oct 16). Toward Automatic Pulmonary Nodule Management in Lung Cancer Screening with Deep Learning. 1-10. arXiv preprint arXiv:1610.09157v1 [cs.CV]. Accessed 1-7-17. [Used Theano Framework].



End Notes (cont.)

• E7) Shi, S., et al (2016 Sep 19). Benchmarking State-of-the-Art Deep Learning Software Tools. 1-7. arXiv preprint, arXiv:1608.07249v5 [cs.DC]. arxiv.org. Accessed 1-7-17.

• E8) Fox, J., et al (2016). Software Frameworks for Deep Learning at Scale. 1-5. dsc.soic.indiana.edu. Accessed 1-8-17.

• E9) Bahrampour, S., et al (2016 Mar 30). Comparative Study of Deep Learning Software Frameworks. 1-9. arXiv preprint, arXiv:1511.06435v3 [cs.LG]. Accessed 1-15-17. [Authors evaluate five frameworks: Caffe, TensorFlow, Theano, Torch and Neon. A high quality paper].



Further References

• What is a neural network – Episode 2 in Deep Learning Simplified, DeepLearning.TV, www.youtube.com.

• Zhang, Zhongheng (2016). A gentle introduction to artificial neural networks. Ann Translational Med. 1-5.

• Soniya, et al (2016). A review on advances in deep learning. IEEE, 1-6.

• Andrew Ng. Machine Learning Course, Stanford University, Coursera.com. https://www.coursera.org/learn/machine-learning

• Yaser Abu-Mostafa. Learning from Data: Introductory Machine Learning Course. CalTech. April 2012. Available on YouTube. https://www.youtube.com/watch?v=mbyG85GZ0PI

• Geoffrey Hinton. Neural Networks for Machine Learning Course, University of Toronto, Coursera.com, October 2012. https://www.coursera.org/learn/neural-networks



Contacts

• Jeff Shomaker – Founder/President, 21 SP, Inc.
– [email protected]
– www.21spinc.com
– 650-455-7261

• 21 SP, Inc. is a small privately held startup developing decision support software to use in genetic-based personalized medicine. The company's mission is to create tools that will reduce the use of traditional trial-and-error medicine by using pharmacogenetics and other evidence-based data, such as the results of high quality clinical trials, in the medical clinic.
