Big Data Deep Learning with Apache Apex (Native Hadoop)

Upload: datatorrent

Post on 21-Apr-2017

Deep Learning with Apache Apex

Priyanka Gugale (Shah)

Agenda

• Apache Apex
• Deep Learning
• Deeplearning4j library
• Using deeplearning4j with Apex
• Architecture
• Demo screenshots
• Challenges

Apache Apex

• Platform and runtime engine that enables development of scalable and fault-tolerant distributed applications
• Hadoop native
  ○ No separate service to manage stream processing
  ○ Streaming engine built into the Application Master and containers
• Processes streaming or batch big data
• High throughput and low latency
• Library of commonly needed business logic
• Write any custom business logic in your application

Deep Learning

• Deep learning is a branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data.
• Deep learning reduces the need for manual feature engineering.
• It works effectively on unlabeled data (unsupervised learning).

Deep Learning

• Deep learning uses deep neural networks to model the high-level abstractions in data.
• Deep neural networks are neural networks with more than one hidden layer.
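
To make "more than one hidden layer" concrete, here is a minimal sketch in plain Java of a forward pass through a network with two hidden layers. The class name `DeepNetSketch`, the ReLU activation, and the hand-picked weights are illustrative assumptions; none of them come from the talk or from Deeplearning4j.

```java
import java.util.Arrays;

/** Illustrative only: forward pass through a tiny network with two hidden layers. */
final class DeepNetSketch {

    /** Computes relu(W * x + b) for one fully connected layer. */
    static double[] denseRelu(double[][] weights, double[] bias, double[] input) {
        double[] out = new double[weights.length];
        for (int i = 0; i < weights.length; i++) {
            double sum = bias[i];
            for (int j = 0; j < input.length; j++) {
                sum += weights[i][j] * input[j];
            }
            out[i] = Math.max(0.0, sum); // ReLU, assumed here for simplicity
        }
        return out;
    }

    /** Applies the layers in order; each layer's output feeds the next layer. */
    static double[] forward(double[][][] weights, double[][] biases, double[] input) {
        double[] activation = input;
        for (int layer = 0; layer < weights.length; layer++) {
            activation = denseRelu(weights[layer], biases[layer], activation);
        }
        return activation;
    }

    public static void main(String[] args) {
        // Hypothetical weights: 2 inputs -> hidden layer 1 (2 units)
        // -> hidden layer 2 (2 units) -> 1 output unit.
        double[][][] weights = {
            {{1.0, 0.0}, {0.0, 1.0}},
            {{1.0, 1.0}, {1.0, -1.0}},
            {{0.5, 0.5}}
        };
        double[][] biases = {{0.0, 0.0}, {0.0, 0.0}, {0.0}};
        double[] output = forward(weights, biases, new double[] {1.0, 2.0});
        System.out.println(Arrays.toString(output));
    }
}
```

Stacking the second hidden layer is what lets the network build features on top of features; a real model would learn the weights by training rather than use fixed values.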

Applications of Deep Learning

• Object classification in photographs
• Image caption generation
• Automatic game playing
• Handwriting recognition
• Adding sound to silent movies
• Colorization of black and white images

Deeplearning4j

● An open source deep learning library (released under the Apache 2.0 license)
● DL4J is distributed
● Written for Java and Scala
● Integrated with Hadoop
● Skymind is its commercial support arm
● DL4J provides a range of neural network types, such as Long Short-Term Memory (LSTM) networks, convolutional neural networks for image processing, deep autoencoders, restricted Boltzmann machines, recurrent networks, and denoising autoencoders

Deeplearning4j

[Slide image not included in transcript]

Using deeplearning4j with Apex

● Training deep learning models on a single processor is extremely slow.
● DL4J works with multi-CPU and multi-GPU systems.
● Integrating DL4J with Apex brings deep learning models to distributed and stream-processing environments.

Architecture

• We achieve distributed training of neural networks using data parallelism.
• In data parallelism, every machine holds a complete copy of the model; each machine simply gets a different portion of the data.
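
The data-parallel setup described on this slide can be sketched in plain Java as follows. This is not the Apex operator code from the talk: `DataParallelSketch`, `shard`, and `replicate` are hypothetical names, and round-robin assignment is just one assumed way to split the data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: every worker gets a full model copy and a distinct data shard. */
final class DataParallelSketch {

    /** Splits the examples round-robin into one shard per worker. */
    static <T> List<List<T>> shard(List<T> examples, int workers) {
        if (workers <= 0) {
            throw new IllegalArgumentException("workers must be positive");
        }
        List<List<T>> shards = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            shards.add(new ArrayList<>());
        }
        for (int i = 0; i < examples.size(); i++) {
            shards.get(i % workers).add(examples.get(i));
        }
        return shards;
    }

    /** Gives every worker its own independent copy of the full parameter vector. */
    static double[][] replicate(double[] params, int workers) {
        double[][] copies = new double[workers][];
        for (int w = 0; w < workers; w++) {
            copies[w] = params.clone();
        }
        return copies;
    }

    public static void main(String[] args) {
        List<Integer> exampleIds = Arrays.asList(0, 1, 2, 3, 4);
        List<List<Integer>> shards = shard(exampleIds, 2);
        double[][] models = replicate(new double[] {0.1, 0.2}, 2);
        for (int w = 0; w < shards.size(); w++) {
            System.out.println("worker " + w + " trains on " + shards.get(w)
                + " starting from " + Arrays.toString(models[w]));
        }
    }
}
```

Each worker then trains its own copy on its own shard, so the copies drift apart until they are synchronized again.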

Architecture

● We use a method called parameter averaging to combine and synchronize models trained on different machines.
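
At its core, parameter averaging is an element-wise mean over the workers' parameter vectors. A minimal sketch in plain Java, assuming each model is flattened into a `double[]` of equal length and every worker is weighted equally (`ParameterAveragingSketch` and `average` are hypothetical names, not DL4J or Apex APIs):

```java
import java.util.Arrays;

/** Illustrative only: combines per-worker parameters by element-wise averaging. */
final class ParameterAveragingSketch {

    /** Returns the element-wise mean of the workers' parameter vectors. */
    static double[] average(double[][] workerParams) {
        if (workerParams.length == 0) {
            throw new IllegalArgumentException("need at least one worker");
        }
        int size = workerParams[0].length;
        double[] mean = new double[size];
        for (double[] params : workerParams) {
            if (params.length != size) {
                throw new IllegalArgumentException("parameter vectors differ in length");
            }
            for (int i = 0; i < size; i++) {
                mean[i] += params[i];
            }
        }
        for (int i = 0; i < size; i++) {
            mean[i] /= workerParams.length;
        }
        return mean;
    }

    public static void main(String[] args) {
        // Hypothetical parameters after two workers trained on different data shards.
        double[][] workerParams = {{1.0, 2.0}, {3.0, 6.0}};
        double[] synced = average(workerParams);
        // The averaged vector would be sent back to every worker before the next round.
        System.out.println(Arrays.toString(synced));
    }
}
```

After averaging, all workers restart from the same parameters, which is what keeps the independently trained copies synchronized.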

Apex Application DAG

[Diagram not included in transcript]

Demo

[Demo screenshots not included in transcript]

Challenges

• We had to change the default packaging of the Apex application.
• We used the Maven Shade plugin to package the app.
• Certain components of Nd4j are incompatible with KryoSerializer.
• We use the Java serializer for those components.
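
The Java serializer fallback mentioned above relies on the standard `java.io.Serializable` mechanism. The sketch below shows only that mechanism, as a byte round trip of a hypothetical `ModelParams` class standing in for an Nd4j component; it is not the talk's code. In Apex operators this kind of fallback is often declared per field with Kryo's `@FieldSerializer.Bind(JavaSerializer.class)` annotation, but whether the authors did so is not stated in the slides.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;

/** Illustrative only: round trip through standard Java serialization. */
final class JavaSerializationSketch {

    /** Hypothetical stand-in for a component that Kryo cannot serialize directly. */
    static final class ModelParams implements Serializable {
        private static final long serialVersionUID = 1L;
        final float[] values;

        ModelParams(float[] values) {
            this.values = values.clone();
        }
    }

    /** Serializes the value to bytes with ObjectOutputStream. */
    static byte[] toBytes(Serializable value) {
        try (ByteArrayOutputStream buffer = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }

    /** Restores an object from bytes produced by toBytes. */
    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("deserialization failed", e);
        }
    }

    public static void main(String[] args) {
        ModelParams original = new ModelParams(new float[] {0.5f, -1.25f});
        ModelParams copy = (ModelParams) fromBytes(toBytes(original));
        System.out.println(Arrays.equals(original.values, copy.values));
    }
}
```

Java serialization is generally slower and more verbose than Kryo, which is a reason to limit it to the components that actually need it.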

Resources

• Apache Apex - http://apex.apache.org/
• Subscribe to forums
  ○ Apex - http://apex.apache.org/community.html
  ○ DataTorrent - https://groups.google.com/forum/#!forum/dt-users
• Download - https://datatorrent.com/download/
• Twitter
  ○ @ApacheApex; Follow - https://twitter.com/apacheapex
  ○ @DataTorrent; Follow - https://twitter.com/datatorrent
• Meetups - http://meetup.com/topics/apache-apex
• Webinars - https://datatorrent.com/webinars/
• Videos - https://youtube.com/user/DataTorrent
• Slides - http://slideshare.net/DataTorrent/presentations
• Startup Accelerator - Free full featured enterprise product
  ○ https://datatorrent.com/product/startup-accelerator/
• Big Data Application Templates Hub - https://datatorrent.com/apphub

Thank You!