deep learning behind prisma
![Page 1: Deep Learning behind Prisma](https://reader034.vdocuments.us/reader034/viewer/2022050614/5887933d1a28ab5b1a8b569f/html5/thumbnails/1.jpg)
Deep Learning behind Prisma: Image style transfer with Convolutional Neural Networks
lostleaf
![Page 2: Deep Learning behind Prisma](https://reader034.vdocuments.us/reader034/viewer/2022050614/5887933d1a28ab5b1a8b569f/html5/thumbnails/2.jpg)
Agenda
• Introduce deep learning models for image style transfer via recent papers
• Prisma is something of a stunt, but it probably uses similar techniques
• Agenda
• A brief introduction to convolutional neural network
• Neural style
• Real-Time Style Transfer
![Page 3: Deep Learning behind Prisma](https://reader034.vdocuments.us/reader034/viewer/2022050614/5887933d1a28ab5b1a8b569f/html5/thumbnails/3.jpg)
Prisma
• A Russian mobile app
• Turns your photos into awesome artworks
• With Deep Learning!!!
Hotel Ukraine rendered by Prisma from Premier Medvedev’s Instagram
![Page 4: Deep Learning behind Prisma](https://reader034.vdocuments.us/reader034/viewer/2022050614/5887933d1a28ab5b1a8b569f/html5/thumbnails/4.jpg)
Image Style Transfer
Arch + Starry Night (van Gogh)
Arch painted by van Gogh
![Page 5: Deep Learning behind Prisma](https://reader034.vdocuments.us/reader034/viewer/2022050614/5887933d1a28ab5b1a8b569f/html5/thumbnails/5.jpg)
A brief introduction to convolutional neural network
Some of the images are from Prof. Fei-Fei Li’s lecture notes
![Page 6: Deep Learning behind Prisma](https://reader034.vdocuments.us/reader034/viewer/2022050614/5887933d1a28ab5b1a8b569f/html5/thumbnails/6.jpg)
Neuron
• w: weight, b: bias
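A minimal sketch of a single neuron, assuming the usual form output = f(w · x + b) with ReLU picked as f purely for illustration. The input values and the function name `neuron` are made up for this example.

```python
import numpy as np

def neuron(x, w, b, f=lambda z: np.maximum(z, 0.0)):
    """Single neuron: weighted sum of the inputs plus a bias, then activation f."""
    return f(np.dot(w, x) + b)

x = np.array([1.0, 2.0, 3.0])     # inputs (illustrative values)
w = np.array([0.5, -1.0, 0.25])   # w: weight
b = 1.0                           # b: bias
out = neuron(x, w, b)             # f(0.5 - 2.0 + 0.75 + 1.0)
```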
![Page 7: Deep Learning behind Prisma](https://reader034.vdocuments.us/reader034/viewer/2022050614/5887933d1a28ab5b1a8b569f/html5/thumbnails/7.jpg)
Activation function (common ones)
• Activation functions are nonlinear functions
• ReLU: thresholding, preferred in modern network structures
• Sigmoid / tanh: slower (exponentials) and harder to train (vanishing gradient)
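A small sketch of the trade-off on this slide, assuming the "thresholding" function is ReLU and the exponential-based one is the sigmoid. It shows that the sigmoid gradient shrinks towards zero for large inputs, which is what makes deep networks harder to train with it.

```python
import numpy as np

def relu(z):
    """Thresholding at zero: cheap to compute, gradient is 1 for z > 0."""
    return np.maximum(z, 0.0)

def sigmoid(z):
    """Needs an exponential; saturates for large |z|."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    """Derivative of the sigmoid: s * (1 - s), at most 0.25."""
    s = sigmoid(z)
    return s * (1.0 - s)

z = np.array([-10.0, 0.0, 10.0])
relu_out = relu(z)            # negative inputs are clipped to zero
sig_grad = sigmoid_grad(z)    # nearly zero at both ends: vanishing gradient
```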
![Page 8: Deep Learning behind Prisma](https://reader034.vdocuments.us/reader034/viewer/2022050614/5887933d1a28ab5b1a8b569f/html5/thumbnails/8.jpg)
Fully connected neural network
![Page 9: Deep Learning behind Prisma](https://reader034.vdocuments.us/reader034/viewer/2022050614/5887933d1a28ab5b1a8b569f/html5/thumbnails/9.jpg)
Convolution
• The brown numbers in the yellow part are called the convolution kernel / filter
• Convolve the filter with the image: slide over the image spatially, computing dot products
• Right: a 3*3 convolution whose filter sums up the two diagonals
From Prof. Andrew Ng’s UFLDL tutorial
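The sliding dot product can be sketched as follows. The 5*5 binary image is an illustrative input, not necessarily the one in the figure, and the kernel has ones on both diagonals so each output value is the sum of the diagonal pixels under the window. As in most CNN code, the kernel is not flipped (strictly speaking this is cross-correlation).

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1), taking dot products."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow), dtype=image.dtype)
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])   # ones on both diagonals
feature_map = conv2d_valid(image, kernel)   # 3*3 output
```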
![Page 10: Deep Learning behind Prisma](https://reader034.vdocuments.us/reader034/viewer/2022050614/5887933d1a28ab5b1a8b569f/html5/thumbnails/10.jpg)
Convolutional layer
• Filters always extend the full depth of the input volume
• Why *3? 3 channels: R, G & B
![Page 11: Deep Learning behind Prisma](https://reader034.vdocuments.us/reader034/viewer/2022050614/5887933d1a28ab5b1a8b569f/html5/thumbnails/11.jpg)
Convolutional layer
1 number: the result of taking a dot product between the filter and a small 5*5*3 chunk of the image
![Page 12: Deep Learning behind Prisma](https://reader034.vdocuments.us/reader034/viewer/2022050614/5887933d1a28ab5b1a8b569f/html5/thumbnails/12.jpg)
Convolutional layer
Transform with activation function f
![Page 13: Deep Learning behind Prisma](https://reader034.vdocuments.us/reader034/viewer/2022050614/5887933d1a28ab5b1a8b569f/html5/thumbnails/13.jpg)
Convolutional layer
• A convolutional layer consists of several filters
• For example, if we had six 5*5 filters, we’ll get 6 separate activation maps
• Stack these up to get a tensor of size 28*28*6
• May add padding to obtain same output size
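A rough sketch of the shape arithmetic, assuming a 32*32*3 input (inferred from 28 = 32 - 5 + 1; the input size is not stated in the text), stride 1 and ReLU as the activation. The helper names are hypothetical.

```python
import numpy as np

def conv_output_size(n, k, pad=0, stride=1):
    """Spatial output size of a convolution along one dimension."""
    return (n + 2 * pad - k) // stride + 1

def conv_layer(x, filters, bias):
    """x: (H, W, C); filters: (K, k, k, C); bias: (K,). No padding, stride 1, ReLU."""
    H, W, _ = x.shape
    K, k = filters.shape[0], filters.shape[1]
    oh, ow = conv_output_size(H, k), conv_output_size(W, k)
    out = np.zeros((oh, ow, K))
    for i in range(oh):
        for j in range(ow):
            patch = x[i:i + k, j:j + k, :]   # small k*k*C chunk of the image
            # one dot product per filter, over the full depth of the input
            out[i, j, :] = np.tensordot(filters, patch, axes=([1, 2, 3], [0, 1, 2])) + bias
    return np.maximum(out, 0.0)

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 32, 3))          # assumed input size
filters = rng.standard_normal((6, 5, 5, 3))   # six 5*5 filters, depth 3
maps = conv_layer(x, filters, np.zeros(6))    # six stacked activation maps
same = conv_output_size(32, 5, pad=2)         # padding of 2 keeps the size at 32
```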
![Page 14: Deep Learning behind Prisma](https://reader034.vdocuments.us/reader034/viewer/2022050614/5887933d1a28ab5b1a8b569f/html5/thumbnails/14.jpg)
Why convolution?
• Each value can be considered the output of a neuron
• Features of image data:
• pixels are related only to a small neighborhood (local connection)
• patterns repeat and content moves around (weight sharing)
• Reduces the complexity and computation of the neural network by exploiting these properties of images
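To make the saving concrete, here is a back-of-the-envelope parameter count. It assumes a 32*32*3 input mapped to a 28*28*6 output (the input size is an assumption consistent with the 5*5 filter example) and compares a fully connected layer with a convolutional layer of six 5*5 filters.

```python
def fc_params(n_in, n_out):
    """Fully connected: every output neuron has its own weight per input, plus a bias."""
    return n_in * n_out + n_out

def conv_params(k, in_channels, n_filters):
    """Convolution: each filter has k*k*in_channels shared weights, plus a bias."""
    return (k * k * in_channels + 1) * n_filters

fc = fc_params(32 * 32 * 3, 28 * 28 * 6)   # no locality, no sharing
conv = conv_params(5, 3, 6)                # local connection + weight sharing
ratio = fc / conv
```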
![Page 15: Deep Learning behind Prisma](https://reader034.vdocuments.us/reader034/viewer/2022050614/5887933d1a28ab5b1a8b569f/html5/thumbnails/15.jpg)
Pooling Layer
• Right: max pooling for example
• Operates independently on every depth slice of the input
• Reduces the spatial size of the activation map (fewer parameters and less computation in later layers)
• Increases shift invariance
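A minimal sketch of 2*2 max pooling with stride 2, assuming the height and width are even. The input values are illustrative, and each depth slice is pooled independently.

```python
import numpy as np

def max_pool_2x2(x):
    """x: (H, W, C) with even H and W. Takes the max of every 2*2 block per channel."""
    H, W, C = x.shape
    blocks = x.reshape(H // 2, 2, W // 2, 2, C)   # split rows and columns into pairs
    return blocks.max(axis=(1, 3))

x = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]], dtype=float)[:, :, None]   # one depth slice
pooled = max_pool_2x2(x)   # spatial size halves: 4*4 -> 2*2
```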
![Page 16: Deep Learning behind Prisma](https://reader034.vdocuments.us/reader034/viewer/2022050614/5887933d1a28ab5b1a8b569f/html5/thumbnails/16.jpg)
Case study 1: MNIST & LeNet
• MNIST handwritten digits recognition
• “hello world” of deep learning
![Page 17: Deep Learning behind Prisma](https://reader034.vdocuments.us/reader034/viewer/2022050614/5887933d1a28ab5b1a8b569f/html5/thumbnails/17.jpg)
LeNet
LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86.11 (1998): 2278-2324.
![Page 18: Deep Learning behind Prisma](https://reader034.vdocuments.us/reader034/viewer/2022050614/5887933d1a28ab5b1a8b569f/html5/thumbnails/18.jpg)
Case study 2: ImageNet & VggNet
• ImageNet: a large image dataset with thousands of classes
![Page 19: Deep Learning behind Prisma](https://reader034.vdocuments.us/reader034/viewer/2022050614/5887933d1a28ab5b1a8b569f/html5/thumbnails/19.jpg)
VggNet(Vgg19)
Image by Mark Chang
• Runner-up of Imagenet challenge 2014
• 19 trainable layers
• 16 convolutional layers (3*3)
• 3 fully connected layers
• 5 max pooling layers (2*2)
Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).
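The layer counts can be checked with a short sketch. The channel widths below follow the 19-layer configuration in the cited paper, and the 224*224 input size is the one used there; the function name `summarize` is hypothetical. Padding is assumed to keep the spatial size through each 3*3 convolution, so only pooling shrinks it.

```python
# Numbers are output channels of a 3*3 convolution; "M" is a 2*2 max pooling layer.
VGG19_CFG = [64, 64, "M",
             128, 128, "M",
             256, 256, 256, 256, "M",
             512, 512, 512, 512, "M",
             512, 512, 512, 512, "M"]

def summarize(cfg, in_size=224):
    """Count conv and pooling layers and track the spatial size of the feature maps."""
    size, n_conv, n_pool = in_size, 0, 0
    for item in cfg:
        if item == "M":
            size //= 2        # each 2*2 pooling halves height and width
            n_pool += 1
        else:
            n_conv += 1       # padded 3*3 convolution keeps the spatial size
    return n_conv, n_pool, size

n_conv, n_pool, final_size = summarize(VGG19_CFG)
n_trainable = n_conv + 3      # plus the 3 fully connected layers
```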
![Page 20: Deep Learning behind Prisma](https://reader034.vdocuments.us/reader034/viewer/2022050614/5887933d1a28ab5b1a8b569f/html5/thumbnails/20.jpg)
Typical architecture
• Convolutional part & Fully connected part
• [(CONV-RELU)*N-POOL?]*M-(FC-RELU)*K,SOFTMAX
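The pattern can be expanded mechanically. This is only a sketch of the notation: the function name is hypothetical and the layers are plain strings, not real layers.

```python
def typical_architecture(n, m, k, pool=True):
    """Expand [(CONV-RELU)*N-POOL?]*M-(FC-RELU)*K,SOFTMAX into a list of layer names."""
    layers = []
    for _ in range(m):
        layers += ["CONV", "RELU"] * n      # convolutional part
        if pool:                            # "POOL?" means pooling is optional
            layers.append("POOL")
    layers += ["FC", "RELU"] * k            # fully connected part
    layers.append("SOFTMAX")
    return layers

layers = typical_architecture(n=2, m=2, k=1)
```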
![Page 21: Deep Learning behind Prisma](https://reader034.vdocuments.us/reader034/viewer/2022050614/5887933d1a28ab5b1a8b569f/html5/thumbnails/21.jpg)
Neural style
Gatys, Leon A., Alexander S. Ecker, and Matthias Bethge. "A neural algorithm of artistic style." arXiv preprint arXiv:1508.06576 (2015).
Gatys, Leon A., Alexander S. Ecker, and Matthias Bethge. "Image style transfer using convolutional neural networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
![Page 22: Deep Learning behind Prisma](https://reader034.vdocuments.us/reader034/viewer/2022050614/5887933d1a28ab5b1a8b569f/html5/thumbnails/22.jpg)
Intuition
• A convolutional neural network trained well on a large dataset (e.g. VggNet) can be a powerful feature extractor, like the human brain
• Human painters are talented at combining content and style
![Page 23: Deep Learning behind Prisma](https://reader034.vdocuments.us/reader034/viewer/2022050614/5887933d1a28ab5b1a8b569f/html5/thumbnails/23.jpg)
Goal
• Given a content image p and a style image a
• Find an image x that is
• Similar to p in content
• Similar to a in style
p ≈ x ≈ a
![Page 24: Deep Learning behind Prisma](https://reader034.vdocuments.us/reader034/viewer/2022050614/5887933d1a28ab5b1a8b569f/html5/thumbnails/24.jpg)
Formulation
• Use Vgg19 (convolutional part) for feature extraction
• Two loss functions
• Content loss: difference in content between x and p
• Style loss: difference in style between x and a
• Find an image x that minimizes the weighted sum of the content and style losses
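A sketch of the two losses for a single layer, following the definitions in the cited Gatys et al. papers: the content loss is a squared difference of feature maps, and the style loss is a squared difference of Gram matrices (filter correlations). Random arrays stand in for Vgg19 features here, and the default weights `alpha` and `beta` are arbitrary illustrative values. In the papers the style loss is additionally summed over several layers.

```python
import numpy as np

def content_loss(F, P):
    """F, P: (N, M) features of x and p at one layer (N filters, M positions)."""
    return 0.5 * np.sum((F - P) ** 2)

def gram_matrix(F):
    """Correlations between filter responses; spatial arrangement is discarded."""
    return F @ F.T

def style_loss_layer(F, S):
    """F, S: (N, M) features of x and a at one layer."""
    N, M = F.shape
    G, A = gram_matrix(F), gram_matrix(S)
    return np.sum((G - A) ** 2) / (4.0 * N ** 2 * M ** 2)

def total_loss(F_x, F_p, F_a, alpha=1.0, beta=1000.0):
    """Weighted sum of content and style loss (weights are hypothetical)."""
    return alpha * content_loss(F_x, F_p) + beta * style_loss_layer(F_x, F_a)

rng = np.random.default_rng(0)
F_p = rng.standard_normal((4, 16))   # stand-in for features of the content image p
F_a = rng.standard_normal((4, 16))   # stand-in for features of the style image a
loss_at_p = total_loss(F_p, F_p, F_a)   # x = p: content term is zero, style term is not
```

Because the Gram matrix sums over positions, shuffling the spatial positions of a feature map leaves the style loss unchanged, which is why it captures texture rather than layout.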
![Page 25: Deep Learning behind Prisma](https://reader034.vdocuments.us/reader034/viewer/2022050614/5887933d1a28ab5b1a8b569f/html5/thumbnails/25.jpg)
How to find x
Image by Mark Chang
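The key idea is that the network weights stay fixed and gradient descent updates the pixels of x. The sketch below is a toy stand-in, not the actual method: it replaces the Vgg19 features with the raw pixels, so both losses become plain squared distances and the minimizer is known in closed form. All values and names are illustrative.

```python
import numpy as np

def toy_loss_and_grad(x, p, a, alpha, beta):
    """Toy loss: alpha * content distance + beta * style distance, in pixel space."""
    loss = 0.5 * alpha * np.sum((x - p) ** 2) + 0.5 * beta * np.sum((x - a) ** 2)
    grad = alpha * (x - p) + beta * (x - a)   # gradient with respect to the image x
    return loss, grad

rng = np.random.default_rng(0)
p = rng.random((8, 8))    # "content image"
a = rng.random((8, 8))    # "style image"
x = rng.random((8, 8))    # start from random noise
alpha, beta, lr = 1.0, 3.0, 0.1

losses = []
for _ in range(200):      # iterative optimization of the image itself
    loss, grad = toy_loss_and_grad(x, p, a, alpha, beta)
    losses.append(loss)
    x -= lr * grad

x_star = (alpha * p + beta * a) / (alpha + beta)   # closed-form minimizer of the toy loss
```

In the real setting the gradient with respect to x comes from backpropagating the content and style losses through the fixed network, which is why every rendered image needs many forward and backward passes.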
![Page 26: Deep Learning behind Prisma](https://reader034.vdocuments.us/reader034/viewer/2022050614/5887933d1a28ab5b1a8b569f/html5/thumbnails/26.jpg)
Some results
J.M.W. Turner
Vincent van Gogh Edvard Munch
![Page 27: Deep Learning behind Prisma](https://reader034.vdocuments.us/reader034/viewer/2022050614/5887933d1a28ab5b1a8b569f/html5/thumbnails/27.jpg)
Cont’d
Pablo Picasso Wassily Kandinsky
![Page 28: Deep Learning behind Prisma](https://reader034.vdocuments.us/reader034/viewer/2022050614/5887933d1a28ab5b1a8b569f/html5/thumbnails/28.jpg)
Balance content and style
• Weights of content and style: hyperparameters
• Try multiple combinations to suit personal taste
![Page 29: Deep Learning behind Prisma](https://reader034.vdocuments.us/reader034/viewer/2022050614/5887933d1a28ab5b1a8b569f/html5/thumbnails/29.jpg)
Photorealistic style transfer
New York London
![Page 30: Deep Learning behind Prisma](https://reader034.vdocuments.us/reader034/viewer/2022050614/5887933d1a28ab5b1a8b569f/html5/thumbnails/30.jpg)
Drawbacks
• Iterative optimization
• Slow: 65s to render the 600 * 400 arch image on a GTX 980M
• Power-hungry: not acceptable for mobile apps like Prisma
![Page 31: Deep Learning behind Prisma](https://reader034.vdocuments.us/reader034/viewer/2022050614/5887933d1a28ab5b1a8b569f/html5/thumbnails/31.jpg)
Real-Time Style Transfer
![Page 32: Deep Learning behind Prisma](https://reader034.vdocuments.us/reader034/viewer/2022050614/5887933d1a28ab5b1a8b569f/html5/thumbnails/32.jpg)
Intuition
• Style transfer is essentially an image transformation problem: image in, image out
• Generative CNNs have proved powerful in many other image transformation problems
![Page 33: Deep Learning behind Prisma](https://reader034.vdocuments.us/reader034/viewer/2022050614/5887933d1a28ab5b1a8b569f/html5/thumbnails/33.jpg)
Goal
• For a specific style image a, train a CNN that
• Accepts a content image p as input
• Outputs a synthesized image x that has content similar to p and style similar to a
![Page 34: Deep Learning behind Prisma](https://reader034.vdocuments.us/reader034/viewer/2022050614/5887933d1a28ab5b1a8b569f/html5/thumbnails/34.jpg)
Generative CNN
• Pre-trained VggNet for formulating the loss function
• Style target: a fixed style image, e.g. Starry Night
• Input image & content target: images sampled from a large dataset
• Image Transform Net: fully convolutional network (and some fancy new stuff)
Johnson, Justin, Alexandre Alahi, and Li Fei-Fei. "Perceptual losses for real-time style transfer and super-resolution." arXiv preprint arXiv:1603.08155 (2016).
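The training setup can be illustrated with a deliberately tiny toy, not the architecture from the paper: the "transform net" is just x = w * p + b, the loss network is replaced by pixel-space squared distances, and the style target a is fixed. Training samples many content images; afterwards a new image is stylized with one forward pass. All names and values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16                        # number of pixels in each toy image
a = rng.random(d)             # fixed style target
alpha, beta, lr = 1.0, 1.0, 0.05

w, b = 0.0, np.zeros(d)       # parameters of the toy transform net x = w * p + b

for _ in range(2000):                      # training iterations
    P = rng.random((32, d))                # batch of content images from a "dataset"
    X = w * P + b                          # forward pass of the transform net
    G = alpha * (X - P) + beta * (X - a)   # gradient of the toy loss w.r.t. the outputs
    w -= lr * np.mean(np.sum(G * P, axis=1))   # chain rule: dx/dw = p
    b -= lr * np.mean(G, axis=0)               # chain rule: dx/db = 1

p_new = rng.random(d)                      # unseen content image
x_fast = w * p_new + b                     # a single forward pass, no optimization loop
x_opt = (alpha * p_new + beta * a) / (alpha + beta)   # what per-image optimization would reach
```

The cost moves from rendering time to training time: the loop runs once per style, and each new image only needs the final line of arithmetic.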
![Page 35: Deep Learning behind Prisma](https://reader034.vdocuments.us/reader034/viewer/2022050614/5887933d1a28ab5b1a8b569f/html5/thumbnails/35.jpg)
Details & Improvements
• Image size 256 * 256
• Trained on a large image dataset for 4h on a GTX Titan X
• 200 to 1000x rendering speedup
![Page 36: Deep Learning behind Prisma](https://reader034.vdocuments.us/reader034/viewer/2022050614/5887933d1a28ab5b1a8b569f/html5/thumbnails/36.jpg)
Some results
![Page 37: Deep Learning behind Prisma](https://reader034.vdocuments.us/reader034/viewer/2022050614/5887933d1a28ab5b1a8b569f/html5/thumbnails/37.jpg)
Cont’d
![Page 38: Deep Learning behind Prisma](https://reader034.vdocuments.us/reader034/viewer/2022050614/5887933d1a28ab5b1a8b569f/html5/thumbnails/38.jpg)
Comparison
• Original neural style: hundreds of optimization iterations
• Generative CNN: tens of thousands of training iterations, one forward pass for synthesis
• Prisma's offline mode probably uses similar technologies
![Page 39: Deep Learning behind Prisma](https://reader034.vdocuments.us/reader034/viewer/2022050614/5887933d1a28ab5b1a8b569f/html5/thumbnails/39.jpg)
Parallel work — Texture Network
Ulyanov, Dmitry, et al. "Texture Networks: Feed-forward Synthesis of Textures and Stylized Images." arXiv preprint arXiv:1603.03417 (2016).
![Page 40: Deep Learning behind Prisma](https://reader034.vdocuments.us/reader034/viewer/2022050614/5887933d1a28ab5b1a8b569f/html5/thumbnails/40.jpg)
Take home
• What makes up a CNN
• Convolution, pooling, fully connected layer...
• How neural style works
• CNN for feature extraction & iterative optimization
• Fast style transfer
• Train a generative CNN for a specific style
![Page 41: Deep Learning behind Prisma](https://reader034.vdocuments.us/reader034/viewer/2022050614/5887933d1a28ab5b1a8b569f/html5/thumbnails/41.jpg)
![Page 42: Deep Learning behind Prisma](https://reader034.vdocuments.us/reader034/viewer/2022050614/5887933d1a28ab5b1a8b569f/html5/thumbnails/42.jpg)
Some open course resources
• Introduction to Computer Vision, Udacity
• Deep Learning, Udacity
• Convolutional Neural Networks for Visual Recognition, Stanford CS231n *
• Deep Learning for Natural Language Processing, Stanford CS224d