deep learning cases: text and image processing
![Page 1: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/1.jpg)
Deep Learning Cases: Text and Image Processing
Grigory Sapunov
Founders & Developers: Deep Learning Unicorns
Moscow, 03.04.2016
![Page 2: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/2.jpg)
“Simple” Image & Video Processing
![Page 3: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/3.jpg)
Simple tasks: Classification and Detection
http://tutorial.caffe.berkeleyvision.org/caffe-cvpr15-detection.pdf
The detection task is harder than classification, but both are almost solved, and with better-than-human quality.
![Page 4: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/4.jpg)
Case #1: IJCNN 2011
The German Traffic Sign Recognition Benchmark
● Classification, >40 classes
● >50,000 real-life images
● First Superhuman Visual Pattern Recognition
○ 2x better than humans
○ 3x better than the closest artificial competitor
○ 6x better than the best non-neural method
http://benchmark.ini.rub.de/index.php?section=gtsrb&subsection=results#
| Rank | Method | Correct (Error) |
|------|--------|-----------------|
| 1 | Committee of CNNs | 99.46 % (0.54 %) |
| 2 | Human Performance | 98.84 % (1.16 %) |
| 3 | Multi-Scale CNNs | 98.31 % (1.69 %) |
| 4 | Random Forests | 96.14 % (3.86 %) |
http://people.idsia.ch/~juergen/superhumanpatternrecognition.html
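The "N times better" figures can be checked against the error rates listed above. The sketch below is only an arithmetic illustration; the names `errors` and `error_ratio` are made up for this example. With these four rows it gives roughly 2.1x for humans and 3.1x for the multi-scale CNNs, while the random-forest ratio comes out somewhat above the quoted 6x.

```python
# Error rates (%) copied from the results listed above.
errors = {
    "Committee of CNNs": 0.54,
    "Human Performance": 1.16,
    "Multi-Scale CNNs": 1.69,
    "Random Forests": 3.86,
}

def error_ratio(method, baseline="Committee of CNNs"):
    """Return how many times larger the error of `method` is than the baseline's."""
    return errors[method] / errors[baseline]

for name in errors:
    print(f"{name}: {error_ratio(name):.2f}x the winner's error")
```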
![Page 5: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/5.jpg)
Case #2: ILSVRC 2010-2015
Large Scale Visual Recognition Challenge (ILSVRC)
● Object detection (200 categories, ~0.5M images)
● Classification + localization (1000 categories, 1.2M images)
![Page 6: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/6.jpg)
Case #2: ILSVRC 2010-2015
● Blue: Traditional CV
● Purple: Deep Learning
● Red: Human
![Page 7: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/7.jpg)
Examples: Object Detection
![Page 8: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/8.jpg)
Example: Face Detection + Emotion Classification
![Page 9: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/9.jpg)
Example: Face Detection + Classification + Regression
![Page 10: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/10.jpg)
Examples: Food Recognition
![Page 11: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/11.jpg)
Examples: Computer Vision on the Road
![Page 12: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/12.jpg)
Examples: Pedestrian Detection
![Page 13: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/13.jpg)
Examples: Activity Recognition
![Page 14: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/14.jpg)
Examples: Road Sign Recognition (on mobile!)
![Page 15: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/15.jpg)
● NVIDIA Jetson TK1/TX1
○ 192/256 CUDA cores
○ Quad-core ARM Cortex-A15 (32-bit) / Cortex-A57 (64-bit) CPU, 2/4 GB memory
● Raspberry Pi 3
○ 1.2 GHz 64-bit quad-core ARM Cortex-A53, 1 GB SDRAM, US$35
● Tablets, smartphones
● Google Project Tango
Deep Learning goes mobile!
![Page 16: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/16.jpg)
...even more mobile
http://www.digitaltrends.com/cool-tech/swiss-drone-ai-follows-trails/
This drone can automatically follow forest trails to track down lost hikers
![Page 17: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/17.jpg)
...even homemade automobile
Meet the 26-Year-Old Hacker Who Built a Self-Driving Car... in His Garage
https://www.youtube.com/watch?v=KTrgRYa2wbI
![Page 18: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/18.jpg)
More complex Image & Video Processing
![Page 19: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/19.jpg)
https://www.youtube.com/watch?v=ZJMtDRbqH40 NYU Semantic Segmentation with a Convolutional Network (33 categories)
Semantic Segmentation
![Page 20: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/20.jpg)
Caption Generation
http://arxiv.org/abs/1411.4555 “Show and Tell: A Neural Image Caption Generator”
![Page 21: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/21.jpg)
![Page 22: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/22.jpg)
Example: NeuralTalk and Walk
Ingredients:
● https://github.com/karpathy/neuraltalk2 Project for learning Multimodal Recurrent Neural Networks that describe images with sentences
● Webcam/notebook
Result:
● https://vimeo.com/146492001
![Page 23: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/23.jpg)
More hacking: NeuralTalk and Walk
![Page 24: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/24.jpg)
Product of the near future: DenseCap and ?
http://arxiv.org/abs/1511.07571 DenseCap: Fully Convolutional Localization Networks for Dense Captioning
![Page 25: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/25.jpg)
Image Colorization
http://richzhang.github.io/colorization/
![Page 26: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/26.jpg)
Visual Question Answering
https://avisingh599.github.io/deeplearning/visual-qa/
![Page 27: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/27.jpg)
Reinforcement Learning
Driving a simulated car from video input (2013)
http://people.idsia.ch/~juergen/gecco2013torcs.pdf
http://people.idsia.ch/~juergen/compressednetworksearch.html
![Page 28: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/28.jpg)
Reinforcement Learning
![Page 29: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/29.jpg)
Reinforcement Learning
Human-level control through deep reinforcement learning (2015)
http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html
Playing Atari with Deep Reinforcement Learning (2013)
http://arxiv.org/abs/1312.5602
![Page 30: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/30.jpg)
Reinforcement Learning
![Page 31: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/31.jpg)
Fun: Deep Dream
http://blogs.wsj.com/digits/2016/02/29/googles-computers-paint-like-van-gogh-and-the-art-sells-for-thousands/
![Page 32: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/32.jpg)
![Page 33: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/33.jpg)
More Fun: Neural Style
http://www.dailymail.co.uk/sciencetech/article-3214634/The-algorithm-learn-copy-artist-Neural-network-recreate-snaps-style-Van-Gogh-Picasso.html
![Page 34: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/34.jpg)
More Fun: Neural Style
http://www.boredpanda.com/inceptionism-neural-network-deep-dream-art/
![Page 35: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/35.jpg)
More Fun: Photo-realistic Synthesis
http://arxiv.org/abs/1601.04589 Combining Markov Random Fields and Convolutional Neural Networks for Image Synthesis
![Page 36: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/36.jpg)
More Fun: Neural Doodle
http://arxiv.org/abs/1603.01768 Semantic Style Transfer and Turning Two-Bit Doodles into Fine Artworks
(a) Original painting by Renoir, (b) semantic annotations,(c) desired layout, (d) generated output.
![Page 37: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/37.jpg)
Text Processing / NLP
![Page 38: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/38.jpg)
Deep Learning and NLP
Variety of tasks:
● Finding synonyms
● Fact extraction: people and company names, geography, prices, dates, product names, …
● Classification: genre and topic detection, positive/negative sentiment analysis, authorship detection, …
● Machine translation
● Search (written and spoken)
● Question answering
● Dialog systems
● Language modeling, part-of-speech tagging
![Page 39: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/39.jpg)
https://code.google.com/archive/p/word2vec/
Example: Semantic Spaces (word2vec, GloVe)
![Page 40: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/40.jpg)
http://nlp.stanford.edu/projects/glove/
Example: Semantic Spaces (word2vec, GloVe)
![Page 41: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/41.jpg)
Encoding semantics
Using word2vec vectors instead of word indexes lets you deal with word meanings better (e.g. there is no need to enumerate all synonyms, because their vectors are already close to each other).
But the naive way of working with word2vec vectors still gives you a “bag of words” model, in which the phrases “The man killed the tiger” and “The tiger killed the man” are indistinguishable.
You need models that pay attention to word order: paragraph2vec, sentence embeddings (using RNN/LSTM), even World2Vec (LeCun @CVPR2015).
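The bag-of-words limitation is easy to see in a minimal sketch. The 3-dimensional vectors below are invented for illustration (real word2vec embeddings are learned and much larger), and `average_embedding` is a hypothetical helper that stands in for the "naive" approach of averaging word vectors.

```python
import math

# Toy embeddings, invented for this illustration only.
toy_vectors = {
    "the":    [0.1, 0.0, 0.1],
    "man":    [0.9, 0.2, 0.1],
    "killed": [0.2, 0.9, 0.3],
    "tiger":  [0.1, 0.3, 0.9],
}

def average_embedding(sentence):
    """Represent a sentence as the mean of its word vectors (order is ignored)."""
    words = sentence.lower().split()
    dims = len(next(iter(toy_vectors.values())))
    total = [0.0] * dims
    for word in words:
        for i, value in enumerate(toy_vectors[word]):
            total[i] += value
    return [t / len(words) for t in total]

a = average_embedding("The man killed the tiger")
b = average_embedding("The tiger killed the man")

# Both sentences contain the same words, so their averaged vectors coincide
# (up to floating-point rounding), although their meanings are opposite.
same = all(math.isclose(x, y) for x, y in zip(a, b))
print(same)
```

Any order-insensitive pooling (sum, mean, max) has the same property, which is why sequence models are needed to tell the two sentences apart.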
![Page 42: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/42.jpg)
Multi-modal learning
http://arxiv.org/abs/1411.2539 Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models
![Page 43: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/43.jpg)
Example: More multi-modal learning
![Page 44: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/44.jpg)
![Page 45: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/45.jpg)
Case: Sentiment analysis
http://nlp.stanford.edu/sentiment/
Can capture complex cases where bag-of-words models fail.
“This movie was actually neither that funny, nor super witty.”
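A tiny word-counting scorer shows why this sentence is hard for bag-of-words models. This is only a sketch: the lexicon `LEXICON` and the function `bag_of_words_sentiment` are hypothetical and not taken from the Stanford system.

```python
import re

# Hypothetical mini-lexicon, invented for this illustration.
LEXICON = {"funny": 1, "witty": 1, "boring": -1, "dull": -1}

def bag_of_words_sentiment(text):
    """Sum per-word polarity scores, ignoring word order and negation."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(token, 0) for token in tokens)

review = "This movie was actually neither that funny, nor super witty."
score = bag_of_words_sentiment(review)
print(score)  # 2: the scorer sees "funny" and "witty" and misses "neither ... nor"
```

The positive score contradicts the clearly negative review, since the negation is carried by the sentence structure rather than by any single word.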
![Page 46: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/46.jpg)
Case: Machine Translation
Sequence to Sequence Learning with Neural Networks, http://arxiv.org/abs/1409.3215
![Page 47: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/47.jpg)
Case: Automated Speech Translation
Translating voice calls and video calls in 7 languages and instant messages in over 50.
https://www.skype.com/en/features/skype-translator/
![Page 48: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/48.jpg)
Case: Baidu Automated Speech Recognition (ASR)
![Page 49: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/49.jpg)
More Fun: MtG cards
http://www.escapistmagazine.com/articles/view/scienceandtech/14276-Magic-The-Gathering-Cards-Made-by-Artificial-Intelligence
![Page 50: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/50.jpg)
Case: Question Answering
A Neural Network for Factoid Question Answering over Paragraphs, https://cs.umd.edu/~miyyer/qblearn/
![Page 51: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/51.jpg)
Case: Dialogue Systems
A Neural Conversational Model, Oriol Vinyals, Quoc Le
http://arxiv.org/abs/1506.05869
![Page 52: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/52.jpg)
What for: Conversational Commerce
https://medium.com/chris-messina/2016-will-be-the-year-of-conversational-commerce-1586e85e3991
![Page 53: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/53.jpg)
What for: Conversational Commerce
![Page 54: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/54.jpg)
Summary
![Page 55: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/55.jpg)
Why is Deep Learning helpful? Or even a game-changer
● Works on raw data (pixels, sound, text or chars), no need for feature engineering
○ Some features are really hard to develop (requiring years of work by a group of experts)
○ Some features are patented (e.g. SIFT, SURF for images)
● Allows end-to-end learning (pixels to category, sound to sentence, English sentence to Chinese sentence, etc.)
○ No need to do segmentation, etc. (a lot of manual labor)
⇒ You can iterate faster (and get superior quality at the same time!)
![Page 56: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/56.jpg)
Still some issues exist
● No dataset -- no deep learning
A lot of data is available (and deep learning requires it; otherwise simpler models may do better)
○ But sometimes you have no dataset…
■ Nonetheless, some hacks are available: transfer learning, data augmentation, Mechanical Turk, …
● Requires a lot of computation
Without a cluster or GPU machines, training takes much more time
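Of the dataset hacks listed above, data augmentation is the simplest to sketch. The example below uses plain Python lists as a stand-in for images. The helper names (`horizontal_flip`, `random_crop`, `augment`) are assumptions made for this illustration; in practice you would use the augmentation utilities of your framework.

```python
import random

def horizontal_flip(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [row[::-1] for row in image]

def random_crop(image, crop_h, crop_w, rng):
    """Cut a crop_h x crop_w window at a random position inside the image."""
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - crop_h)    # randint bounds are inclusive
    left = rng.randint(0, width - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]

def augment(image, crop_h, crop_w, rng):
    """Produce one randomly flipped and cropped variant of the image."""
    if rng.random() < 0.5:
        image = horizontal_flip(image)
    return random_crop(image, crop_h, crop_w, rng)

rng = random.Random(0)  # fixed seed so that runs are repeatable
image = [[r * 4 + c for c in range(4)] for r in range(4)]  # toy 4x4 "image"
samples = [augment(image, 3, 3, rng) for _ in range(5)]
```

Each call yields a slightly different 3x3 training example from the same source image, which stretches a small dataset at almost no cost.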
![Page 57: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/57.jpg)
So what to do next?
![Page 58: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/58.jpg)
Universal Libraries and Frameworks
● Torch7 (http://torch.ch/)
● TensorFlow (https://www.tensorflow.org/)
● Theano (http://deeplearning.net/software/theano/)
○ Keras (http://keras.io/)
○ Lasagne (https://github.com/Lasagne/Lasagne)
○ blocks (https://github.com/mila-udem/blocks)
○ pylearn2 (https://github.com/lisa-lab/pylearn2)
● CNTK (http://www.cntk.ai/)
● Neon (http://neon.nervanasys.com/)
● Deeplearning4j (http://deeplearning4j.org/)
● Google Prediction API (https://cloud.google.com/prediction/)
● …
● http://deeplearning.net/software_links/
![Page 59: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/59.jpg)
Libraries & Frameworks for image/video processing
● OpenCV (http://opencv.org/)
● Caffe (http://caffe.berkeleyvision.org/)
● Torch7 (http://torch.ch/)
● clarifai (http://clarif.ai/)
● Google Vision API (https://cloud.google.com/vision/)
● …
● + all universal libraries
![Page 60: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/60.jpg)
Libraries & Frameworks for speech
● CNTK (http://www.cntk.ai/)
● KALDI (http://kaldi-asr.org/)
● Google Speech API (https://cloud.google.com/)
● Yandex SpeechKit (https://tech.yandex.ru/speechkit/)
● Baidu Speech API (http://www.baidu.com/)
● wit.ai (https://wit.ai/)
● …
![Page 61: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/61.jpg)
Libraries & Frameworks for text processing
● Torch7 (http://torch.ch/)
● Theano/Keras/…
● TensorFlow (https://www.tensorflow.org/)
● MetaMind (https://www.metamind.io/)
● Google Translate API (https://cloud.google.com/translate/)
● …
● + all universal libraries
![Page 62: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/62.jpg)
What to read and where to study?
- CS231n: Convolutional Neural Networks for Visual Recognition, Fei-Fei Li, Andrej Karpathy, Stanford (http://vision.stanford.edu/teaching/cs231n/index.html)
- CS224d: Deep Learning for Natural Language Processing, Richard Socher, Stanford (http://cs224d.stanford.edu/index.html)
- Neural Networks for Machine Learning, Geoffrey Hinton (https://www.coursera.org/course/neuralnets)
- Computer Vision course collection (http://eclass.cc/courselists/111_computer_vision_and_navigation)
- Deep learning course collection (http://eclass.cc/courselists/117_deep_learning)
- Book “Deep Learning”, Ian Goodfellow, Yoshua Bengio and Aaron Courville (http://www.deeplearningbook.org/)
![Page 63: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/63.jpg)
What to read and where to study?
- Google+ Deep Learning community (https://plus.google.com/communities/112866381580457264725)
- VK Deep Learning community (http://vk.com/deeplearning)
- Quora (https://www.quora.com/topic/Deep-Learning)
- FB Deep Learning Moscow (https://www.facebook.com/groups/1505369016451458/)
- Twitter Deep Learning Hub (https://twitter.com/DeepLearningHub)
- NVidia blog (https://devblogs.nvidia.com/parallelforall/tag/deep-learning/)
- IEEE Spectrum blog (http://spectrum.ieee.org/blog/cars-that-think)
- http://deeplearning.net/
- Arxiv Sanity Preserver (http://www.arxiv-sanity.com/)
- ...
![Page 64: Deep Learning Cases: Text and Image Processing](https://reader034.vdocuments.us/reader034/viewer/2022042706/5871860f1a28ab2c198b4e97/html5/thumbnails/64.jpg)
Whom to follow?
- Jürgen Schmidhuber (http://people.idsia.ch/~juergen/)
- Geoffrey E. Hinton (http://www.cs.toronto.edu/~hinton/)
- Google DeepMind (http://deepmind.com/)
- Yann LeCun (http://yann.lecun.com, https://www.facebook.com/yann.lecun)
- Yoshua Bengio (http://www.iro.umontreal.ca/~bengioy, https://www.quora.com/profile/Yoshua-Bengio)
- Andrej Karpathy (http://karpathy.github.io/)
- Andrew Ng (http://www.andrewng.org/)
- ...