
Understanding Images - The ConvNet Way

meetup.com/IASI-AI/ facebook.com/AI.in.Iasi/

By: Ciprian Talmacel

Agenda:

1. Convolutional Neural Network Basics (~20 min)

2. Applications (~50 min)

3. Practical aspects (~20 min)

4. Code


Convolutional Neural Network Basics


Neural networks

1. Input: pixels, prices, sensor data

2. Output: object class, stock price, weather prediction


Neural Networks - Neurons

1. Sums the weighted inputs

2. Activates (or not)

The weights must be learned!
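A minimal sketch of a single neuron, in plain Python. The slide does not name an activation function, so ReLU is used here as an assumption, and the `bias` term and the example numbers are purely illustrative.

```python
def neuron(inputs, weights, bias=0.0):
    """Weighted sum of the inputs followed by an activation (ReLU assumed)."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    # ReLU: positive sums pass through, everything else is silenced.
    return max(0.0, total)


# Same inputs, different weights: one neuron activates, the other does not.
active = neuron([1.0, 2.0], [0.5, 0.25])
silent = neuron([1.0, 2.0], [-1.0, -1.0])
```

The inputs are fixed by the data; only the weights (and the bias) change during learning.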


Neural Networks - Learning

1. Compute the output error for each input

2. Evaluate the contribution of each weight to that error

3. Make a tiny adjustment to the weights to lower the error
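The three steps above can be sketched for the smallest possible "network": one weight, one input, squared error. The model, the learning rate and the toy data are assumptions chosen for illustration; real networks apply the same idea to millions of weights through backpropagation.

```python
def train(samples, weight=0.0, learning_rate=0.05, epochs=200):
    """Fit output = weight * x with plain gradient descent."""
    for _ in range(epochs):
        for x, target in samples:
            error = weight * x - target          # 1. output error for this input
            gradient = 2 * error * x             # 2. contribution of the weight: d(error^2)/d(weight)
            weight -= learning_rate * gradient   # 3. tiny adjustment that lowers the error
    return weight


# Toy data generated by target = 3 * x, so the weight should approach 3.
samples = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]
learned = train(samples)
```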


Neural Networks

1. To a plain neural network, the input order does not matter

2. Images have structure that should be exploited

3. Convolutional neural networks take spatial structure into account


Convolutional Neural Networks

1. Look at regions of pixels and try to understand what they represent

2. The deeper the layer, the more complex the concepts it identifies


Convolutional Neural Networks - The convolution operation

Move the filter through the image and try to detect interesting stuff regardless of the position.
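A rough sketch of the operation in plain Python, assuming no padding and a stride of 1. As in most deep learning libraries, the filter is not flipped, so strictly speaking this is a cross-correlation. The tiny image and the hand-written edge filter are illustrative; in a CNN the filter values are learned.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1, no kernel flip)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the filter with the region under it and sum up.
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        output.append(row)
    return output


# Dark left half, bright right half.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
edge_filter = [[-1, 1]]  # fires where brightness jumps from left to right
response = convolve2d(image, edge_filter)
```

The same two filter weights are reused at every position, which is why the edge is detected in every row regardless of where it sits.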


Convolutional Neural Networks - Visualisation

1. The same objects should activate roughly the same neurons

2. Take a random set of neurons and see what activates them most


Convolutional Neural Networks - Visualisation - layer 1


Convolutional Neural Networks - Visualisation - layer 4


Convolutional Neural Networks - Recap

1. Learn hierarchical spatial structures

2. Reuse the same filters over the whole image

3. Each filter learns to detect something

4. The output of each filter is used by the subsequent filters to detect more complex stuff

5. Each neuron corresponds to some image region

6. Very efficient parallel implementation


Applications


Object classification on ImageNet

Input: 1.2 million images spanning 1000 categories (including 120 dog breeds)

Output: Classify 100k unseen images into their respective categories


Object classification performance history


Object classification with VGG-16 (2014)

1. 16 layers

2. 8.8% error

3. homogeneous → simple

4. 138 million parameters

5. 128.64 ms on GTX 1080



Object classification with GoogLeNet (2014)

1. 22 layers

2. ultra-mega-efficient Inception module

3. very ‘hairy’

4. 10.07% error (6.66% in ensemble)

5. 6.7 million parameters

6. 39.14 ms on GTX 1080


Object classification with ResNet (2015)

1. 152 layers

2. adds skip-connections

3. became state of the art in many tasks

4. ~6% error (~3% in ensemble)

5. 217 ms on GTX 1080


Detection - Classification - Segmentation

Input: Image

Output:

1. Bounding boxes

2. Object labels

3. Pixel-level semantic segmentation


Detection - Classification - Segmentation with R-CNN (2014)

1. Propose bounding boxes through selective search

2. Pass each bounding box through a CNN to extract features

3. Classify with an SVM to see if it is an object

4. Given the class, tighten the box to fit well



Downsides:

1. Pass ~2000 regions through the CNN

2. 3 separate models

3. No pixel-level segmentation

Solutions:

1. Select regions after computing features, through RoIAlign

2. Train it all at once

3. Add a branch that outputs a binary mask


Detection - Classification - Segmentation with Mask R-CNN (2017)


Face Recognition with FaceNet (2015)

1. Obtain embeddings such that 2 similar faces are very close with respect to Euclidean distance

2. Discern between people by putting a threshold on the distance between embeddings


Face Recognition with FaceNet (2015)

1. Select 3 faces: 2 of the same person and one that is different

2. Compute embeddings for each of them using CNN

3. Minimize the distance between embeddings of the same person

4. Maximize the distance between embeddings of different persons
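Steps 3 and 4 are usually combined into a single triplet loss, and step 2 of the previous slide becomes a simple threshold check. The sketch below assumes squared Euclidean distance; the `margin` and `threshold` values and the 2-dimensional embeddings are illustrative placeholders, not values taken from this talk. In practice the embeddings come out of the CNN.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Zero once the different face is at least `margin` farther than the same face."""
    same = squared_distance(anchor, positive)
    different = squared_distance(anchor, negative)
    return max(0.0, same - different + margin)


def same_person(a, b, threshold=1.0):
    """Decide identity by thresholding the distance (threshold is hypothetical)."""
    return squared_distance(a, b) < threshold


anchor, positive = [0.0, 0.0], [0.0, 0.5]
easy = triplet_loss(anchor, positive, negative=[2.0, 0.0])  # already well separated
hard = triplet_loss(anchor, positive, negative=[0.5, 0.0])  # as close as the positive
```

Only the hard triplet produces a non-zero loss, so only it would push the network to move the embeddings.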


DeepDream (2015)

Input: Image

Objective: Optimize the image so that it activates a desired neuron

Result: CNNs become... psychedelic



NeuralStyle (2015)

Input: Image + Work of Art

Objective: The original image in the style of the work of art

Result: ... Van Gogh is still living (in the cloud)



Adversarial nets (2014)

1. “Coolest idea ever” - Yann LeCun (inventor of CNN)

2. Take 2 nets: a generator and a discriminator

3. Generator tries to produce real-looking images

4. Discriminator tries to distinguish between real and generated images

5. No need for labeled data!



Adversarial nets - BEGAN (2017) - state of the art in image synthesis


Adversarial nets - CycleGAN (2017) - unsupervised image translation


CNNs in real business

1. Take images and return some tags: Clarifai, Google API, Vize.ai

2. Eye tracking: Smart Eye

3. Gesture recognition for home appliances: PointGrab

4. Bots for precision agriculture: Lettucebot

5. Driving assistance (self-driving cars are not ready yet)

6. 3D Reconstructions (images to 3D models)

7. Medical imaging for diagnosis

8. Facial recognition: Chui - the intelligent doorbell

9. Style Transfer: Prisma

10. Arrhythmia detection (iRhythm Technologies & Stanford)

11. Facebook: “in the first 2 seconds your photo goes through 3 CNNs”


Some other cool existing/possible applications in no particular order

1. Image clustering, organisation, search (Google Photos)

2. Image captioning (generate text from image)

3. WaveNet (generate realistic speech from text)

4. Playing Atari games, Mario

5. Space exploration

6. Driving cars

7. AR/VR

8. Curing cancer

9. Mass surveillance


How to bring ConvNets into submission?


Architectures and training

1. Don’t reinvent the wheel! Use whatever you find in literature!

2. Look for help on reddit, facebook groups, quora, github

3. Take a very good look at regularization techniques

4. Don’t forget to split your dataset into training, validation and test sets

5. Stop early

6. Go from simple to complex
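Two of the tips above (splitting the dataset and stopping early) can be sketched with the standard library alone. The split fractions, the `patience` value and the list of validation losses are illustrative assumptions; in a real run the losses would be measured on the validation set after each epoch.

```python
import random


def split_dataset(items, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle once, then cut into training, validation and test sets."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test_fraction)
    n_val = int(len(items) * val_fraction)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test


def early_stopping_epoch(val_losses, patience=2):
    """Return the epoch to keep: the best one seen before patience ran out."""
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop training
    return best_epoch


train_set, val_set, test_set = split_dataset(range(100))
best = early_stopping_epoch([0.9, 0.7, 0.6, 0.65, 0.7, 0.5])
```

In this example training stops at epoch 4, so the later loss of 0.5 is never reached; that is the trade-off a small `patience` makes. The test set is touched only once, at the very end.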


Transfer learning

1. Train on a big dataset (ImageNet) to discover good features

2. Collect small dataset for your particular problem

3. Tune the initial net to fit your data

4. You can download trained nets and adapt them to your use-case


Data

1. Augment your data: flip, change colors a bit, translate

2. Consider pre-processing

3. Take into consideration the distribution of the labels

4. Use multiple objectives

5. Use public datasets

6. Crawl the web for data

7. Make sure you can use the data (or get a lawyer)
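The augmentations from the first tip can be sketched on a tiny grayscale image stored as nested lists. The helper names are hypothetical, and a brightness shift stands in for "change colors a bit"; a real pipeline would apply the framework's own image operations to full colour images.

```python
def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def translate_right(image, shift, fill=0):
    """Move pixels `shift` columns right (0 <= shift <= width), padding with `fill`."""
    return [[fill] * shift + row[:len(row) - shift] for row in image]


def brighten(image, delta, max_value=255):
    """Add `delta` to every pixel, clipping to the valid range."""
    return [[min(max_value, max(0, p + delta)) for p in row] for row in image]


image = [[10, 20, 30],
         [40, 50, 60]]
flipped = flip_horizontal(image)
shifted = translate_right(image, 1)
brighter = brighten(image, 200)
```

Each variant keeps the original label, so one labelled image yields several training examples.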


Making CNNs smaller (to fit the mobile)

1. Architectural choices (SqueezeNet has 0.5 MB, also see Inception)

2. Pruning - eliminate unimportant parameters

3. Distillation - train small net using big net

4. Quantization - 32bit -> 8bit, Huffman encoding
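A minimal sketch of the 32bit -> 8bit idea, assuming simple linear (min/max) quantization of one weight list; the function names and example weights are illustrative, and Huffman encoding of the resulting codes is not shown. Production toolchains use more careful schemes.

```python
def quantize(weights, levels=256):
    """Map floats onto integer codes 0..levels-1 (8 bits when levels=256)."""
    lo, hi = min(weights), max(weights)
    # Fall back to a scale of 1.0 when all weights are equal.
    scale = (hi - lo) / (levels - 1) or 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize(codes, scale, lo):
    """Recover approximate float weights from the integer codes."""
    return [lo + c * scale for c in codes]


weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
codes, scale, lo = quantize(weights)
restored = dequantize(codes, scale, lo)
```

Each weight now needs one byte instead of four, at the cost of an error of at most half a quantization step per weight.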


Frameworks - TensorFlow

1. Python and C++

2. Created by Google

3. Lots of open source projects

4. Great for both research and production

5. Works everywhere (server, mobile, Raspberry Pi)

6. High level wrappers (Keras, TF-Slim, TFLearn, Sonnet)

7. TensorBoard and TensorFlow Serving

8. Not the best performance, but improving

9. Static inputs (bad for RNNs)


Frameworks - Torch and PyTorch

1. Lua and Python

2. Supported by Facebook

3. Lots of open source projects

4. Great for research

5. Very flexible

6. Dynamic inputs (good for RNNs)

7. Great performance

8. Not so great in production


Frameworks - Caffe2

1. Python

2. Created by Facebook

3. Great for production

4. Relatively new so not so many projects

5. Developed especially for mobile

6. Architected by the creator of Caffe


Frameworks - Others

1. CNTK (Microsoft)

2. MXNet (Carnegie Mellon, Amazon)

3. CoreML (iOS)

4. Theano

5. Caffe

6. Chainer

7. Deeplearning4j

... everybody launches a deep learning framework these days


Hardware

1. CPUs - awful, don’t even think about it

2. GPUs - Nvidia 1080Ti rocks!

3. Cloud - http://minimaxir.com/2017/07/cpu-or-gpu/

4. TPUs - 30-40x speedup on inference

Read data from SSD, preprocess on CPU, train on GPU.


Where to begin?


Resources - some cool public datasets

1. ImageNet - millions of images of thousands of categories

2. COCO - segmentations, keypoints, categories, captions

3. LSUN - millions of rooms, churches, towers etc

4. Medical ImageNet - peta-scale medical images coming soon

5. MegaFace - millions of faces with identity coming soon

6. Chars74k - 74k characters for OCR (0-9, A-Z, a-z)


Resources - most wanted

1. CS231n - CNNs for Visual Recognition - best course ever!

2. Deep Learning Book - the pure essence of deep learning

3. https://arxiv.org/pdf/1603.07285.pdf - wonderful tutorial on convolution arithmetic

4. http://colah.github.io - awesome deep learning blog

5. https://petewarden.com/2015/04/20/why-gemm-is-at-the-heart-of-deep-learning/ - how CNNs run on GPUs beautifully explained

6. http://lamda.nju.edu.cn/weixs/project/CNNTricks/CNNTricks.html

7. www.google.com - most amazing tutorials, articles, projects etc



Resources - some useful collections

1. https://github.com/jtoy/awesome-tensorflow

2. https://github.com/caesar0301/awesome-public-datasets

3. https://github.com/rushter/data-science-blogs

4. http://deeplearninggallery.com/

5. https://github.com/guillaume-chevalier/awesome-deep-learning-resources

6. https://github.com/ChristosChristofidis/awesome-deep-learning

7. https://github.com/songrotek/Deep-Learning-Papers-Reading-Roadmap

8. https://github.com/off99555/machine-learning-curriculum

9. https://github.com/jbhuang0604/awesome-computer-vision

10. https://github.com/ceobillionaire/WHAT-AI-CAN-DO-FOR-YOU/



Thank you!

