Deep Learning at NMC
TRANSCRIPT
Devin Jones
● Machine Learning & Statistics
  ○ Research
    ■ Classification
    ■ Inference
    ■ Time Series
  ○ Application
    ■ Large scale
    ■ Streaming
Introduction
● Columbia University
  ○ CS/ML
● Rutgers University
  ○ Statistics
  ○ Econ
  ○ Operations
Research
● Ad Tech (7 years)
“Used to build larger audiences from smaller audience segments to create reach for advertisers. In theory, they reflect similar characteristics to a benchmark set of characteristics the original audience segment represents, such as in-market kitchen-appliance shoppers.” (adage.com)
The ML Challenge at NMC
Look Alike Modeling
Supervised What?
Machine Learning has two main categories:
Supervised Learning: Inferences on Labeled Data
Unsupervised Learning: Inferences on Unlabeled Data
Supervised vs Unsupervised Learning
● Supervised: Spam or Ham?
● Unsupervised: Clustering Wikipedia Articles
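To make the contrast concrete, here is a toy scikit-learn sketch (illustrative only; the data and models are invented for this example): the same feature vectors fit once with labels and once without.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Toy feature vectors (e.g., word counts for four emails)
X = np.array([[0.0, 1.0], [1.0, 0.0], [0.9, 0.1], [0.1, 0.9]])
y = np.array([0, 1, 1, 0])  # labels: 0 = ham, 1 = spam

# Supervised: learn a decision boundary from labeled data
clf = LogisticRegression().fit(X, y)

# Unsupervised: find structure in the same data, no labels used
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(X)
```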
The Feature Set & Scale
The quality of the data used to train a model will influence the model’s success.
At NMC, we have access to high-dimensional, sparse data:
● ~4,000 Segments + ~200 Publishers + User Agent + Geographic Info (zip code)
● Resulting in over 100k features to choose from
Models are trained in batches of 100,000 to 100,000,000 users, depending on the purpose.
ML Algorithms at NMC
To date, we have implemented these algorithms in our real-time scoring engine:
● Binary Linear Model
● kNN
● Multinomial Linear Models
● Online Learning for Linear Models
● Random Forest
● And of course… Deep Learning
We score billions of events per day using these models and our ML infrastructure.
Motivation
● NMC data is similar to Natural Language Processing (NLP) data
● Certain ad targeting problems can be framed as expressive, hierarchical relationships
Deep Learning: Recent Success
▪ AlphaGo defeats the world’s top professional Go players
▪ Image and speech recognition exceed human abilities
▪ AI in consumer products: Amazon Echo, Google Home, autonomous driving
All of these recent AI breakthroughs are based on Deep Neural Networks!
NMC Data & NLP Data
NLP data:
Observation: [‘This’, ‘is’, ‘a’, ‘tokenized’, ‘feature’, ‘vector’, ‘used’, ‘for’, ‘machine’, ‘learning’, ‘in’, ‘NLP’]
NMC data:
User: [ ‘segment: Likes Outdoors’, ‘segment: Male 25-35’, ‘location: New York, NY’]
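Since both are just lists of sparse string tokens, the same encoding tricks apply. A minimal sketch (not NMC’s actual pipeline; the hashed dimension is a made-up illustration) of turning a user’s token list into a sparse indicator vector via feature hashing:

```python
import numpy as np

NUM_FEATURES = 2**17  # hypothetical hashed feature space, comfortably over 100k

def encode(tokens):
    """Hash a list of string features into a fixed-width indicator vector."""
    vec = np.zeros(NUM_FEATURES, dtype=np.float32)
    for tok in tokens:
        vec[hash(tok) % NUM_FEATURES] = 1.0
    return vec

user = ['segment: Likes Outdoors', 'segment: Male 25-35', 'location: New York, NY']
x = encode(user)  # same kind of input a bag-of-words NLP model consumes
```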
Neural Network: Neuron
[Diagram: a single neuron computing a weighted sum of binary inputs]
● Lives in NYC? = Yes, weight 0.5
● Orders from Dominos? = No, weight 0.01
● Works in ad tech? = Yes, weight 0.7
● Output: 0.5·1 + 0.01·0 + 0.7·1 = 1.2
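The slide’s single neuron, written out as code (weights and answers taken directly from the diagram):

```python
# Binary inputs and fixed weights from the diagram above
inputs = {'Lives in NYC?': 1,         # Yes
          'Orders from Dominos?': 0,  # No
          'Works in ad tech?': 1}     # Yes
weights = {'Lives in NYC?': 0.5,
           'Orders from Dominos?': 0.01,
           'Works in ad tech?': 0.7}

# The neuron outputs the weighted sum of its inputs
output = sum(weights[k] * v for k, v in inputs.items())
print(output)  # 1.2
```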
Definition Summary
● Training
● Inference
○ Matrix Multiplication
● Nodes
● Layers
● Network
● Features
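A minimal sketch tying those definitions together (layer sizes are invented for illustration): the network is a pair of weight matrices, and inference is just the matrix multiplications between them.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(1000, 64))  # features -> hidden layer (nodes)
W2 = rng.normal(size=(64, 2))     # hidden layer -> output scores

def relu(z):
    return np.maximum(z, 0.0)

def infer(x):
    """Forward pass: each layer is one matrix multiplication."""
    hidden = relu(x @ W1)
    return hidden @ W2

scores = infer(rng.normal(size=1000))
```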
DNN Architecture
Image Processing :: Convolutional Networks
Speech Recognition :: Recurrent Networks
AlphaGo :: Reinforcement Learning
Residual Network Convergence
Figure 1. Convergence of neural network model without forward shortcut (regular net)
Figure 2. Convergence of neural network model with forward shortcut (Residual Net)
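A sketch of what the “forward shortcut” in Figure 2 means (a generic residual block, not the exact NMC architecture): the block adds its input back onto its output, which is what lets the deeper models converge.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, W, b):
    """y = f(x) + x: the '+ x' is the forward shortcut."""
    return relu(x @ W + b) + x

# Shapes must match for the addition, so W maps a layer onto itself
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(64, 64))
b = np.zeros(64)
y = residual_block(rng.normal(size=64), W, b)
```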
Multi-level Hierarchical Classification
Category: City Prosperity
● World-Class Wealth
● Uptown Elite
● Penthouse Chic
● Metro High-Flyers
Category: Prestige Positions
● Premium Fortunes
● Diamond Days
● Alpha Families
● Bank of Mum and Dad
● Empty-Nest Adventure
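One way to read “multi-level” as code (a hedged sketch; the two linear models and their weights are hypothetical, only the hierarchy comes from the slide): first score the category, then score the segments within the winning category.

```python
import numpy as np

HIERARCHY = {
    'City Prosperity': ['World-Class Wealth', 'Uptown Elite',
                        'Penthouse Chic', 'Metro High-Flyers'],
    'Prestige Positions': ['Premium Fortunes', 'Diamond Days', 'Alpha Families',
                           'Bank of Mum and Dad', 'Empty-Nest Adventure'],
}

def predict(x, W_category, W_segment):
    """Two-level classification with hypothetical linear models.

    W_category: (n_features, n_categories) weight matrix
    W_segment:  dict mapping category -> (n_features, n_segments) matrix
    """
    categories = list(HIERARCHY)
    category = categories[int(np.argmax(x @ W_category))]
    segments = HIERARCHY[category]
    segment = segments[int(np.argmax(x @ W_segment[category]))]
    return category, segment
```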
GPU vs CPU
We are not batching matrix algebra operations: NMC Serving operates on one request at a time!
With batch size 1, a GPU cannot amortize its data transfer and kernel launch overhead, so inference on a CPU can be the faster option.
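An illustration of why batch size matters (toy numpy example, not NMC’s serving code): with one request at a time, each layer is a small matrix-vector product rather than the large batched matrix-matrix product GPUs are built for.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(1000, 64))

# Batched scoring: one (n_requests x n_features) @ (n_features x n_nodes) matmul
batch = rng.normal(size=(10_000, 1000))
H_batched = batch @ W      # GEMM: where a GPU shines

# NMC-style serving: one request at a time
x = rng.normal(size=1000)  # a single request's features
h_single = x @ W           # GEMV: small, latency-bound, CPU-friendly
```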
Trimming
● WEAK CONNECTIONS: most connections in a deep neural network are very weak and can be removed
● LOW ACCURACY IMPACT: the trimming has very little impact on accuracy
● COMPRESSED DATA: the trimmed models can be described by sparse matrices, and thus the data in the models is highly compressed
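A minimal sketch of trimming, assuming the simple magnitude threshold the table below reports (the layer shapes are invented): zero out near-zero weights and store what survives as a sparse matrix.

```python
import numpy as np
from scipy import sparse

def trim(W, threshold=0.001):  # 0.001 is the threshold from the table below
    """Drop weak connections, keep the rest in compressed sparse form."""
    W = W.copy()
    W[np.abs(W) < threshold] = 0.0
    return sparse.csr_matrix(W)

rng = np.random.default_rng(0)
W = rng.normal(scale=0.001, size=(1000, 64))  # many weights near zero
W_trimmed = trim(W)

# Inference still works; the product only touches surviving weights
x = rng.normal(size=1000)
h = W_trimmed.T.dot(x)
```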
Trimming: Space, Time & Performance

Model        File Size (MB)  Trimming Threshold  Accuracy  Scoring Time (ms)
Not trimmed  108             0.0                 13.29     10.0
Trimmed      2.7             0.001               13.30     0.22
50x improvement in inference, in CPU time and storage
Key Takeaways
Architecture:
● Residual Networks saved the day
● Leverage the expressive power of DNNs for your data
Inference:
● You might not need a GPU for Deep Learning
● Improvements can be made on Sparse Matrix Algebra libraries
● Use trimming