Deep learning overview and practical use in marketing and cyber-security
TRANSCRIPT
![Page 1: Deep learning overview and practical use in marketing and cyber-security](https://reader031.vdocuments.us/reader031/viewer/2022030316/5876ba7f1a28abad1a8b6af7/html5/thumbnails/1.jpg)
Natalino Busa
Head of Applied Data Science
![Page 2: Deep learning overview and practical use in marketing and cyber-security](https://reader031.vdocuments.us/reader031/viewer/2022030316/5876ba7f1a28abad1a8b6af7/html5/thumbnails/2.jpg)
Data Scientist, Big and Fast Data Architect
Currently at Teradata
Previously: Enterprise Data Architect at ING, Senior Researcher at Philips Research
Interests: Spark, Flink, Cassandra, Akka, Kafka, Mesos, Anomaly Detection, Time Series, Deep Learning
![Page 3: Deep learning overview and practical use in marketing and cyber-security](https://reader031.vdocuments.us/reader031/viewer/2022030316/5876ba7f1a28abad1a8b6af7/html5/thumbnails/3.jpg)
Data Science: approaches
Supervised:
- You know what the outcome must be
Unsupervised:
- You don’t know what the outcome must be
Semi-Supervised:
- You know the outcome only for some samples
![Page 4: Deep learning overview and practical use in marketing and cyber-security](https://reader031.vdocuments.us/reader031/viewer/2022030316/5876ba7f1a28abad1a8b6af7/html5/thumbnails/4.jpg)
Popularity of Neural Networks: “The cat neuron”
Andrew Ng, Jeff Dean et al.:
- 1000 machines
- 10 million images
- 1 billion connections
- Trained for 3 days
http://research.google.com/archive/unsupervised_icml2012.html
![Page 5: Deep learning overview and practical use in marketing and cyber-security](https://reader031.vdocuments.us/reader031/viewer/2022030316/5876ba7f1a28abad1a8b6af7/html5/thumbnails/5.jpg)
Popularity of Neural Networks: “AI at Facebook”
Yann LeCun, Director of AI Research at Facebook
Ask the AI what it sees in the image:
- “Is there a baby?” Facebook’s AI: “Yes.”
- “What is the man doing?” Facebook’s AI: “Typing.”
- “Is the baby sitting on his lap?” Facebook’s AI: “Yes.”
http://www.wired.com/2015/11/heres-how-smart-facebooks-ai-has-become/
![Page 6: Deep learning overview and practical use in marketing and cyber-security](https://reader031.vdocuments.us/reader031/viewer/2022030316/5876ba7f1a28abad1a8b6af7/html5/thumbnails/6.jpg)
Data Science: approaches
Supervised:
- You know what the outcome must be
Unsupervised:
- You don’t know what the outcome must be
Semi-Supervised:
- You know the outcome only for some samples
![Page 7: Deep learning overview and practical use in marketing and cyber-security](https://reader031.vdocuments.us/reader031/viewer/2022030316/5876ba7f1a28abad1a8b6af7/html5/thumbnails/7.jpg)
Unsupervised Learning
- Clustering, feature extraction
Imaging, medical data, genetics, crime patterns, recommender systems, climate hot spots analysis, anomaly detection, …
Given a set of items, it answers the question “how can we efficiently describe the collection?”
It defines a measure of “similarity” between items.
![Page 8: Deep learning overview and practical use in marketing and cyber-security](https://reader031.vdocuments.us/reader031/viewer/2022030316/5876ba7f1a28abad1a8b6af7/html5/thumbnails/8.jpg)
Supervised Learning
- Classification
Marketing churn, credit loans, success rates, insurance defaulting, health conditions and pathologies, categorization of wine, real estate, …
Given the values of some properties, it answers the question “to which class/group does this item belong?”
![Page 9: Deep learning overview and practical use in marketing and cyber-security](https://reader031.vdocuments.us/reader031/viewer/2022030316/5876ba7f1a28abad1a8b6af7/html5/thumbnails/9.jpg)
Classification: Dimensionality matters
- Number of dimensions or features of your input data
- Statistical relations, smoothness of the data
- Embedded space
Example (28x28 pixels): input: 784 dimensions, output: 10 classes
Example: input: 4 dimensions, output: 3 classes
![Page 10: Deep learning overview and practical use in marketing and cyber-security](https://reader031.vdocuments.us/reader031/viewer/2022030316/5876ba7f1a28abad1a8b6af7/html5/thumbnails/10.jpg)
AI, complexity and models
Flowchart:
- Does it do well on training data?
  - No: bigger neural network (rocket engine), or a different architecture (new rocket)
  - Yes: go to the next question
- Does it do well on test data?
  - No: more data (rocket fuel), or a different architecture (new rocket)
  - Yes: done
https://www.youtube.com/watch?v=CLDisFuDnog
![Page 11: Deep learning overview and practical use in marketing and cyber-security](https://reader031.vdocuments.us/reader031/viewer/2022030316/5876ba7f1a28abad1a8b6af7/html5/thumbnails/11.jpg)
Evolution of Machine Learning
Rule-based System: Input -> Hand Designed Program -> Output
Prof. Yoshua Bengio - Deep Learning: https://youtu.be/15h6MeikZNg
![Page 12: Deep learning overview and practical use in marketing and cyber-security](https://reader031.vdocuments.us/reader031/viewer/2022030316/5876ba7f1a28abad1a8b6af7/html5/thumbnails/12.jpg)
Evolution of Machine Learning
Rule-based System: Input -> Hand Designed Program -> Output
Classic Machine Learning: Input -> Hand Designed Features -> Mapping from features -> Output
Prof. Yoshua Bengio - Deep Learning: https://youtu.be/15h6MeikZNg
![Page 13: Deep learning overview and practical use in marketing and cyber-security](https://reader031.vdocuments.us/reader031/viewer/2022030316/5876ba7f1a28abad1a8b6af7/html5/thumbnails/13.jpg)
Evolution of Machine Learning
Rule-based System: Input -> Hand Designed Program -> Output
Classic Machine Learning: Input -> Hand Designed Features -> Mapping from features -> Output
Representational Machine Learning: Input -> Learned Features -> Mapping from features -> Output
Prof. Yoshua Bengio - Deep Learning: https://youtu.be/15h6MeikZNg
![Page 14: Deep learning overview and practical use in marketing and cyber-security](https://reader031.vdocuments.us/reader031/viewer/2022030316/5876ba7f1a28abad1a8b6af7/html5/thumbnails/14.jpg)
Evolution of Machine Learning
Rule-based System: Input -> Hand Designed Program -> Output
Classic Machine Learning: Input -> Hand Designed Features -> Mapping from features -> Output
Representational Machine Learning: Input -> Learned Features -> Mapping from features -> Output
Deep Learning: Input -> Learned Features -> Learned Complex Features -> Mapping from features -> Output
Prof. Yoshua Bengio - Deep Learning: https://youtu.be/15h6MeikZNg
![Page 15: Deep learning overview and practical use in marketing and cyber-security](https://reader031.vdocuments.us/reader031/viewer/2022030316/5876ba7f1a28abad1a8b6af7/html5/thumbnails/15.jpg)
From Biology to a Mathematical Model
Diagram labels: “dendrites” (inputs), activation function, axon’s response (output)
![Page 16: Deep learning overview and practical use in marketing and cyber-security](https://reader031.vdocuments.us/reader031/viewer/2022030316/5876ba7f1a28abad1a8b6af7/html5/thumbnails/16.jpg)
Logit model: Perceptron
1 Layer Neural Network
Takes n input features and maps them to a soft “binary” space.
Diagram: inputs x1, x2, ..., xn -> weighted sum ∑ -> activation f
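The single unit in the diagram can be sketched in a few lines of plain Python. This is a minimal illustration, assuming the activation f is the logistic (sigmoid) function, as the "logit model" title suggests; the weights, bias and input values are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def perceptron(x, w, b):
    """Weighted sum of the inputs followed by the activation f."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)

# Hypothetical weights and bias, chosen only for illustration.
weights = [0.5, -1.0, 2.0]
bias = 0.0

# 0.5*1.0 - 1.0*2.0 + 2.0*0.75 = 0.0, and sigmoid(0) = 0.5
print(perceptron([1.0, 2.0, 0.75], weights, bias))
```

An output close to 1 or 0 reads as a confident "yes" or "no"; an output near 0.5 means the unit is undecided.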
![Page 17: Deep learning overview and practical use in marketing and cyber-security](https://reader031.vdocuments.us/reader031/viewer/2022030316/5876ba7f1a28abad1a8b6af7/html5/thumbnails/17.jpg)
Multiple classes: Softmax
From soft binary space to predicting probabilities:
Take n inputs, exponentiate each one, and divide by the sum of the exponentiated values.
Diagram: inputs x1, x2, ..., xn -> one ∑ and f per class -> softmax -> Cat: 95%, Dog: 5%
- Values between 0 and 1
- Sum of all outcomes = 1
It behaves like a probability, but it’s just an estimate!
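A small sketch of softmax in plain Python may make the two properties concrete. The raw scores below are hypothetical and were picked so that the result lands near the 95% / 5% split used on the slide.

```python
import math

def softmax(scores):
    """Turn raw scores into values in (0, 1) that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for the classes "cat" and "dog".
probs = softmax([3.0, 0.0])
print(probs)  # roughly 0.95 for cat and 0.05 for dog
```

Subtracting the maximum score does not change the result, because the same factor cancels in the numerator and the denominator.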
![Page 18: Deep learning overview and practical use in marketing and cyber-security](https://reader031.vdocuments.us/reader031/viewer/2022030316/5876ba7f1a28abad1a8b6af7/html5/thumbnails/18.jpg)
Cost function: Supervised Learning
The actual outcome is different from the desired outcome.
We measure the difference! This measure can be done in various ways:
- Mean absolute error (MAE)
- Mean squared error (MSE)
- Categorical cross-entropy: compares estimated probability vs actual probability
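The three measures listed above can be written out directly. This is a sketch in plain Python; the function names and the `eps` guard against log(0) are choices made here for illustration.

```python
import math

def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences; large errors weigh more."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def categorical_cross_entropy(p_true, p_pred, eps=1e-12):
    """Cross-entropy between a one-hot target and estimated probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(p_true, p_pred))

print(mean_absolute_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))   # 2/3
print(mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))    # 4/3
print(categorical_cross_entropy([1.0, 0.0], [0.95, 0.05]))     # -log(0.95)
```

With a one-hot target, the cross-entropy reduces to minus the logarithm of the probability assigned to the correct class, so a confident correct prediction costs almost nothing.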
![Page 19: Deep learning overview and practical use in marketing and cyber-security](https://reader031.vdocuments.us/reader031/viewer/2022030316/5876ba7f1a28abad1a8b6af7/html5/thumbnails/19.jpg)
Minimize cost: How to Learn?
The cost function depends on:
- Parameters of the model
- How the model “composes”
Goal : modify the parameters to reduce the error!
Vintage math from last century
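Modifying the parameters to reduce the error can be sketched with plain gradient descent. The example below is an assumption-laden toy: a one-parameter model y = w * x, a mean squared error cost, and hypothetical data and learning rate.

```python
def cost(w, xs, ys):
    """Mean squared error of the one-parameter model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def cost_gradient(w, xs, ys):
    """Derivative of the cost with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def gradient_descent(xs, ys, w=0.0, learning_rate=0.05, steps=200):
    """Repeatedly move w a small step against the gradient."""
    for _ in range(steps):
        w -= learning_rate * cost_gradient(w, xs, ys)
    return w

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # toy data generated with w = 2
w_fit = gradient_descent(xs, ys)
print(w_fit, cost(w_fit, xs, ys))
```

Each step multiplies the distance to the optimum by a constant factor smaller than one here, so w approaches 2 quickly; a learning rate that is too large would make the same loop diverge.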
![Page 20: Deep learning overview and practical use in marketing and cyber-security](https://reader031.vdocuments.us/reader031/viewer/2022030316/5876ba7f1a28abad1a8b6af7/html5/thumbnails/20.jpg)
Build deeper networks
Stack layers of perceptrons
- “Sequential Network”
- Back propagate the error
Diagram: input parameters -> stacked layers (feed-forward) -> SOFTMAX -> classes (estimated probabilities) -> cost function, compared against the actual output (supervised) -> correct parameters
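The feed-forward and correction loop in the diagram can be sketched for a tiny two-layer network. To stay short, this illustration departs from the slide in labeled ways: it uses a single sigmoid output instead of softmax, a squared error cost, no bias terms, and hypothetical weights.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, W2):
    """Feed-forward pass: input -> hidden layer -> single output unit."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    output = sigmoid(sum(w * h for w, h in zip(W2, hidden)))
    return hidden, output

def backward(x, target, W1, W2):
    """Back-propagate the squared error 0.5 * (output - target) ** 2."""
    hidden, output = forward(x, W1, W2)
    # Chain rule at the output unit: derivative of the error w.r.t. its input
    delta_out = (output - target) * output * (1.0 - output)
    grad_W2 = [delta_out * h for h in hidden]
    # Propagate the error back through W2 to each hidden unit
    delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(W2, hidden)]
    grad_W1 = [[d * xi for xi in x] for d in delta_hidden]
    return grad_W1, grad_W2

def train_step(x, target, W1, W2, learning_rate=0.5):
    """Correct the parameters with one gradient step."""
    grad_W1, grad_W2 = backward(x, target, W1, W2)
    new_W1 = [[w - learning_rate * g for w, g in zip(w_row, g_row)]
              for w_row, g_row in zip(W1, grad_W1)]
    new_W2 = [w - learning_rate * g for w, g in zip(W2, grad_W2)]
    return new_W1, new_W2

# Hypothetical input, target and initial weights.
x, target = [0.5, -1.0], 1.0
W1, W2 = [[0.1, 0.2], [-0.3, 0.4]], [0.5, -0.6]
before = forward(x, W1, W2)[1]
W1_new, W2_new = train_step(x, target, W1, W2)
after = forward(x, W1_new, W2_new)[1]
print(before, after)  # the output moves toward the target
```

The hand-written `backward` function is exactly the part that becomes tedious and error prone as the network grows.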
![Page 21: Deep learning overview and practical use in marketing and cyber-security](https://reader031.vdocuments.us/reader031/viewer/2022030316/5876ba7f1a28abad1a8b6af7/html5/thumbnails/21.jpg)
Some problems
- Calculating the derivative of the cost function
  - Can be error prone
  - Automation would be nice!
  - Complex network graph = complex derivative
- Dense layers (fully connected)
  - Harder to converge
  - Number of parameters grows fast!
- Overfitting and parsimony
  - Learn “well”, generalization capacity
  - Be efficient in the number of parameters
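How fast the number of parameters grows in fully connected layers is easy to check with a short calculation. The layer sizes below are illustrative assumptions: a small network with 4 inputs and 3 outputs, and a hypothetical larger one with 784 inputs and 10 outputs.

```python
def dense_parameters(layer_sizes):
    """Weights plus biases of a stack of fully connected layers."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

print(dense_parameters([4, 15, 10, 3]))        # 268
print(dense_parameters([784, 512, 512, 10]))   # 669706
```

Each pair of adjacent layers contributes the product of their sizes, which is why wide dense layers on high-dimensional input dominate the parameter budget.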
![Page 22: Deep learning overview and practical use in marketing and cyber-security](https://reader031.vdocuments.us/reader031/viewer/2022030316/5876ba7f1a28abad1a8b6af7/html5/thumbnails/22.jpg)
Some Solutions
- Calculating the derivative of the cost function
  - Software libraries
  - GPU support for computing vectorial and tensorial data
- New layer types
  - Convolution layers 2D/3D
  - Dropout layer
  - Fast activation functions
- Faster learning methods
  - Derived from Stochastic Gradient Descent (SGD)
  - Weight initialization with Auto-Encoders and RBMs
![Page 23: Deep learning overview and practical use in marketing and cyber-security](https://reader031.vdocuments.us/reader031/viewer/2022030316/5876ba7f1a28abad1a8b6af7/html5/thumbnails/23.jpg)
Convolutional Networks
Idea 1: reuse the same weights while scanning across the image
Idea 2: subsample the results from layer to layer
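Both ideas can be sketched in plain Python. The sketch assumes a single-channel image stored as a list of rows, a hypothetical 2x2 kernel that responds to a dark-to-bright vertical edge, and max pooling as the subsampling step (the slide does not say which subsampling is meant).

```python
def convolve2d(image, kernel):
    """Slide one small kernel over the image, reusing the same weights.

    As in most deep learning libraries, the kernel is not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(kernel[i][j] * image[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

def max_pool(feature_map, size=2):
    """Subsample by keeping the maximum of each size x size block."""
    return [[max(feature_map[r + i][c + j]
                 for i in range(size) for j in range(size))
             for c in range(0, len(feature_map[0]) - size + 1, size)]
            for r in range(0, len(feature_map) - size + 1, size)]

# Toy 5x5 image: dark on the left, bright on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]  # hypothetical vertical-edge detector

feature_map = convolve2d(image, kernel)
pooled = max_pool(feature_map)
print(feature_map)  # four rows of [0, 2, 0, 0]
print(pooled)       # [[2, 0], [2, 0]]
```

The kernel has only 4 weights regardless of the image size, in contrast to a dense layer, whose weight count scales with the number of pixels.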
![Page 24: Deep learning overview and practical use in marketing and cyber-security](https://reader031.vdocuments.us/reader031/viewer/2022030316/5876ba7f1a28abad1a8b6af7/html5/thumbnails/24.jpg)
Fast Activation Functions
Idea: avoid costly exponential functions; piecewise-linear functions are fast to compute and easy to differentiate!
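A short comparison shows what "fast" means here. The sketch below defines the rectified linear unit (ReLU) and a hard sigmoid; the hard sigmoid formula, clip(0.2 * z + 0.5, 0, 1), is one common definition and is an assumption, since libraries differ in the slope they use.

```python
import math

def sigmoid(z):
    """Smooth activation: needs an exponential on every call."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    """Rectified linear unit: a comparison, nothing else."""
    return max(0.0, z)

def relu_derivative(z):
    """The derivative is a constant on each side of zero."""
    return 1.0 if z > 0 else 0.0

def hard_sigmoid(z):
    """Piecewise-linear stand-in for the sigmoid (one common definition)."""
    return min(1.0, max(0.0, 0.2 * z + 0.5))

for z in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(z, relu(z), hard_sigmoid(z), round(sigmoid(z), 3))
```

The hard sigmoid stays within a few hundredths of the true sigmoid near zero and saturates exactly at 0 and 1 further out.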
![Page 25: Deep learning overview and practical use in marketing and cyber-security](https://reader031.vdocuments.us/reader031/viewer/2022030316/5876ba7f1a28abad1a8b6af7/html5/thumbnails/25.jpg)
Dropout Layer, Batch Normalization
Dropout: Randomly set some of the inputs to zero. It improves generalization and makes the network more robust to errors.
Batch Normalization: Normalize the activations of the previous layer at each batch.
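Both layers are simple to sketch for a single feature. Two choices below are assumptions not stated on the slide: the surviving values are rescaled after dropout so that the expected sum is unchanged, and the normalization omits the learned scale and shift that library implementations usually add.

```python
import random

def dropout(values, rate, rng):
    """Zero each value with probability `rate`; rescale the survivors."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]

def batch_normalize(batch, eps=1e-5):
    """Shift and scale one feature across a batch to zero mean, unit variance."""
    mean = sum(batch) / len(batch)
    var = sum((v - mean) ** 2 for v in batch) / len(batch)
    return [(v - mean) / (var + eps) ** 0.5 for v in batch]

rng = random.Random(0)  # fixed seed so the run is repeatable
dropped = dropout([1.0] * 1000, rate=0.1, rng=rng)
print(dropped.count(0.0))  # about 100 of the 1000 values are zeroed

normalized = batch_normalize([1.0, 2.0, 3.0, 4.0])
print(normalized)
```

Dropout is applied only during training; at prediction time all inputs are kept.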
![Page 26: Deep learning overview and practical use in marketing and cyber-security](https://reader031.vdocuments.us/reader031/viewer/2022030316/5876ba7f1a28abad1a8b6af7/html5/thumbnails/26.jpg)
Efficient Symbolic Differentiation
There are good libraries which symbolically calculate the derivatives of an arbitrary number of stacked layers:
- efficient symbolic differentiation
- dynamic C code generation
- transparent use of a GPU
CNTK
![Page 27: Deep learning overview and practical use in marketing and cyber-security](https://reader031.vdocuments.us/reader031/viewer/2022030316/5876ba7f1a28abad1a8b6af7/html5/thumbnails/27.jpg)
Efficient Symbolic Differentiation (2)
There are good libraries which symbolically calculate the derivatives of an arbitrary number of stacked layers:
- efficient symbolic differentiation
- dynamic C code generation
- transparent use of a GPU
>>> import theano
>>> import theano.tensor as T
>>> from theano import pp
>>> x = T.dscalar('x')
>>> y = x ** 2
>>> gy = T.grad(y, x)
>>> f = theano.function([x], gy)
>>> pp(f.maker.fgraph.outputs[0])
'(2.0 * x)'
![Page 28: Deep learning overview and practical use in marketing and cyber-security](https://reader031.vdocuments.us/reader031/viewer/2022030316/5876ba7f1a28abad1a8b6af7/html5/thumbnails/28.jpg)
Higher Abstraction Layer: Keras
Keras: Deep Learning library for Theano and TensorFlow
- Easier to stack layers
- Easier to train and test
- More ready-made blocks
http://keras.io/
![Page 29: Deep learning overview and practical use in marketing and cyber-security](https://reader031.vdocuments.us/reader031/viewer/2022030316/5876ba7f1a28abad1a8b6af7/html5/thumbnails/29.jpg)
Example 1: Iris classification
Categorize Iris flowers based on:
- Sepal length/width
- Petal length/width
3 classes:
- Iris Setosa
- Iris Versicolour
- Iris Virginica
Dataset is quite small (150 samples)
input: 4 dimensions
output: 3 classes
![Page 30: Deep learning overview and practical use in marketing and cyber-security](https://reader031.vdocuments.us/reader031/viewer/2022030316/5876ba7f1a28abad1a8b6af7/html5/thumbnails/30.jpg)
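Iris classification: Network
# Imports assume the standalone Keras package
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout
nb_classes = 3  # Setosa, Versicolour, Virginica
model = Sequential()
# First hidden layer: 4 input features -> 15 units
model.add(Dense(15, input_shape=(4,)))
model.add(Activation('relu'))
model.add(Dropout(0.1))
# Second hidden layer: 10 units
model.add(Dense(10))
model.add(Activation('relu'))
model.add(Dropout(0.1))
# Output layer: one estimated probability per class
model.add(Dense(nb_classes))
model.add(Activation('softmax'))
Diagram: input -> RELU (Dropout 10%) -> RELU (Dropout 10%) -> SOFTMAX -> Setosa, Versicolour, Virginica
Train-test split: 80% - 20%
Test accuracy: 96%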
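The 80% - 20% split and the accuracy figure quoted for the Iris example can be illustrated without any library. This is a sketch with hypothetical helper names; the slides do not say how the split was actually performed.

```python
import random

def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle the samples and hold out a fraction for testing."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted class matches the true class."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)

# 150 sample indices stand in for the 150 Iris records.
train, test = train_test_split(range(150))
print(len(train), len(test))  # 120 30

# Toy labels: 3 of 4 predictions are correct.
print(accuracy([0, 1, 2, 2], [0, 1, 2, 1]))  # 0.75
```

With only 30 held-out samples, a single misclassified flower moves the test accuracy by about 3.3 percentage points, so accuracies measured on such a small test set are coarse.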
![Page 31: Deep learning overview and practical use in marketing and cyber-security](https://reader031.vdocuments.us/reader031/viewer/2022030316/5876ba7f1a28abad1a8b6af7/html5/thumbnails/31.jpg)
Example 2: telecom customer marketing
Semi-synthetic dataset
The "churn" data set was developed to predict telecom customer churn based on information about their account. The data files state that the data are "artificial based on claims similar to real world". These data are also contained in the C50 R package.
1 class (churn)
Dataset is quite small (about 3000 samples)
17 input dimensions:
state, account length, area code, phone number, international plan, voice mail plan, number vmail messages, total day minutes, total day calls, total day charge, total eve minutes, total eve calls, total eve charge, total night minutes, total night calls, total night charge, total intl minutes, total intl calls, total intl charge, number customer service calls
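The slide names 20 columns but states 17 input dimensions. One way the two numbers could be reconciled, and this is an assumption rather than something the slides state, is to drop the three identifier-like columns (state, area code, phone number) and encode the yes/no plan columns as 1/0. The sketch below shows that preprocessing on a hypothetical record.

```python
COLUMNS = [
    "state", "account length", "area code", "phone number",
    "international plan", "voice mail plan", "number vmail messages",
    "total day minutes", "total day calls", "total day charge",
    "total eve minutes", "total eve calls", "total eve charge",
    "total night minutes", "total night calls", "total night charge",
    "total intl minutes", "total intl calls", "total intl charge",
    "number customer service calls",
]

# Assumption: these identifier-like columns are not fed to the network.
DROPPED = {"state", "area code", "phone number"}

def encode_row(row):
    """Turn one record (a dict of strings) into a list of floats."""
    features = []
    for name in COLUMNS:
        if name in DROPPED:
            continue
        value = row[name]
        if value in ("yes", "no"):
            features.append(1.0 if value == "yes" else 0.0)
        else:
            features.append(float(value))
    return features

# Hypothetical record; the values are invented for illustration only.
example = dict.fromkeys(COLUMNS, "0")
example.update({"state": "XX", "phone number": "000-0000",
                "international plan": "no", "voice mail plan": "yes",
                "total day minutes": "120.5"})

features = encode_row(example)
print(len(features))  # 17
```

In practice the numeric columns would usually also be rescaled to comparable ranges before training.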
![Page 32: Deep learning overview and practical use in marketing and cyber-security](https://reader031.vdocuments.us/reader031/viewer/2022030316/5876ba7f1a28abad1a8b6af7/html5/thumbnails/32.jpg)
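Churn telecom: Network
# Imports assume the standalone Keras package
from keras.models import Sequential
from keras.layers import Dense, Activation, BatchNormalization, Dropout
model = Sequential()
# First hidden layer: 17 input features -> 50 units
model.add(Dense(50, input_shape=(17,)))
model.add(Activation("hard_sigmoid"))
model.add(BatchNormalization())
model.add(Dropout(0.1))
# Second hidden layer: 10 units
model.add(Dense(10))
model.add(Activation("hard_sigmoid"))
model.add(BatchNormalization())
model.add(Dropout(0.1))
# Output layer: a single estimated churn probability
model.add(Dense(1))
model.add(Activation("sigmoid"))
Diagram labels: SOFTMAX, RELU, RELU, Dropout 10%, Dropout 10%; outputs: Churn, No-Churn
Train-test split: 80% - 20%
Test accuracy: 82%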
![Page 33: Deep learning overview and practical use in marketing and cyber-security](https://reader031.vdocuments.us/reader031/viewer/2022030316/5876ba7f1a28abad1a8b6af7/html5/thumbnails/33.jpg)
Models: Small Data, Big Data
- Not all domains have large amounts of data
  - Think of clinical tests, or lengthy/costly experiments
- Small specialized data sets and Neural Networks
  - Good for complex non-linear separation of classes
Interesting read: https://medium.com/@ShaliniAnanda1/an-open-letter-to-yann-lecun-22b244fc0a5a#.ngpal1ojx
![Page 34: Deep learning overview and practical use in marketing and cyber-security](https://reader031.vdocuments.us/reader031/viewer/2022030316/5876ba7f1a28abad1a8b6af7/html5/thumbnails/34.jpg)
Conclusions
- Neural Networks can be used for small data as well
  - Other methods might be more efficient in these scenarios
- Neural Networks are an extension to GLMs and linear regression
  - Learn Linear Regression, GLM, SVM as well
  - Random Forests and Boosted Trees are an alternative
- More data = Bigger and better Neural Networks
  - We have some tools to jump start analysis