Deep Learning at NMC
TRANSCRIPT
Devin Jones
● Machine Learning & Statistics
  ○ Research
    ■ Classification
    ■ Inference
    ■ Time Series
  ○ Application
    ■ Large scale
    ■ Streaming
Introduction
● Columbia University
  ○ CS/ML
● Rutgers University
  ○ Statistics
  ○ Econ
  ○ Operations
Research
● Ad Tech (7 years)
“Used to build larger audiences from smaller audience segments to create reach for advertisers. In theory, they reflect similar characteristics to a benchmark set of characteristics the original audience segment represents, such as in-market kitchen-appliance shoppers.” (adage.com)
The ML Challenge at NMC
Look Alike Modeling
Supervised What?
Machine Learning has two main categories:
Supervised Learning: Inferences on Labeled Data
Unsupervised Learning: Inferences on Unlabeled Data
Supervised vs Unsupervised Learning
● Supervised: Spam or Ham?
● Unsupervised: Clustering Wikipedia Articles
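To make the contrast concrete, here is a toy scikit-learn sketch (illustrative only; the data and models are invented for this example): the same feature vectors fit once with labels and once without.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Toy feature vectors (e.g., word counts for four emails)
X = np.array([[0.0, 1.0], [1.0, 0.0], [0.9, 0.1], [0.1, 0.9]])
y = np.array([0, 1, 1, 0])  # labels: 0 = ham, 1 = spam

# Supervised: learn a decision boundary from labeled data
clf = LogisticRegression().fit(X, y)

# Unsupervised: find structure in the same data, no labels used
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(X)
```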
The Feature Set & Scale
The quality of the data used to train a model will influence the model’s success.
At NMC, we have access to high-dimensional, sparse data:
● ~4,000 Segments + ~200 Publishers + User Agent + Geographic Info (zip code)
● Resulting in over 100k features to choose from
Models are trained in batches of 100,000 to 100,000,000 users, depending on the purpose.
ML Algorithms at NMC
To date, we have implemented these algorithms in our real-time scoring engine:
● Binary Linear Model
● kNN
● Multinomial Linear Models
● Online Learning for Linear Models
● Random Forest
● And of course… Deep Learning
We score billions of events per day using these models and our ML infrastructure.
Motivation
● NMC data is similar to Natural Language Processing (NLP) data
● Certain ad targeting problems can be framed as expressive, hierarchical relationships
Deep Learning: Recent Success
▪ AlphaGo defeats the world’s top professional Go players
▪ Image and speech recognition exceed human abilities
▪ AI in consumer products: Amazon Echo, Google Home, autonomous driving
All of these recent AI breakthroughs are based on Deep Neural Networks!
NMC Data & NLP Data
NLP data:
Observation: [‘This’, ‘is’, ‘a’, ‘tokenized’, ‘feature’, ‘vector’, ‘used’, ‘for’, ‘machine’, ‘learning’, ‘in’, ‘NLP’]
NMC data:
User: [ ‘segment: Likes Outdoors’, ‘segment: Male 25-35’, ‘location: New York, NY’]
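Since both are just lists of sparse string tokens, the same encoding tricks apply. A minimal sketch (not NMC’s actual pipeline; the hashed dimension is a made-up illustration) of turning a user’s token list into a sparse indicator vector via feature hashing:

```python
import numpy as np

NUM_FEATURES = 2**17  # hypothetical hashed feature space, comfortably over 100k

def encode(tokens):
    """Hash a list of string features into a fixed-width indicator vector."""
    vec = np.zeros(NUM_FEATURES, dtype=np.float32)
    for tok in tokens:
        vec[hash(tok) % NUM_FEATURES] = 1.0
    return vec

user = ['segment: Likes Outdoors', 'segment: Male 25-35', 'location: New York, NY']
x = encode(user)  # same kind of input a bag-of-words NLP model consumes
```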
Neural Network: Neuron
[Diagram: a single neuron computing a weighted sum of binary inputs]
● Lives in NYC? = Yes, weight 0.5
● Orders from Dominos? = No, weight 0.01
● Works in ad tech? = Yes, weight 0.7
● Output: 0.5·1 + 0.01·0 + 0.7·1 = 1.2
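The slide’s single neuron, written out as code (weights and answers taken directly from the diagram):

```python
# Binary inputs and fixed weights from the diagram above
inputs = {'Lives in NYC?': 1,         # Yes
          'Orders from Dominos?': 0,  # No
          'Works in ad tech?': 1}     # Yes
weights = {'Lives in NYC?': 0.5,
           'Orders from Dominos?': 0.01,
           'Works in ad tech?': 0.7}

# The neuron outputs the weighted sum of its inputs
output = sum(weights[k] * v for k, v in inputs.items())
print(output)  # 1.2
```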
Definition Summary
● Training
● Inference
○ Matrix Multiplication
● Nodes
● Layers
● Network
● Features
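A minimal sketch tying those definitions together (layer sizes are invented for illustration): the network is a pair of weight matrices, and inference is just the matrix multiplications between them.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(1000, 64))  # features -> hidden layer (nodes)
W2 = rng.normal(size=(64, 2))     # hidden layer -> output scores

def relu(z):
    return np.maximum(z, 0.0)

def infer(x):
    """Forward pass: each layer is one matrix multiplication."""
    hidden = relu(x @ W1)
    return hidden @ W2

scores = infer(rng.normal(size=1000))
```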
DNN Architecture
Image Processing :: Convolutional Networks
Speech Recognition :: Recurrent Networks
AlphaGo :: Reinforcement Learning
Residual Network Convergence
Figure 1. Convergence of neural network model without forward shortcut (regular net)
Figure 2. Convergence of neural network model with forward shortcut (Residual Net)
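A sketch of what the “forward shortcut” in Figure 2 means (a generic residual block, not the exact NMC architecture): the block adds its input back onto its output, which is what lets the deeper models converge.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, W, b):
    """y = f(x) + x: the '+ x' is the forward shortcut."""
    return relu(x @ W + b) + x

# Shapes must match for the addition, so W maps a layer onto itself
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(64, 64))
b = np.zeros(64)
y = residual_block(rng.normal(size=64), W, b)
```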
Multi-level Hierarchical Classification
Category: City Prosperity
● World-Class Wealth
● Uptown Elite
● Penthouse Chic
● Metro High-Flyers
Category: Prestige Positions
● Premium Fortunes
● Diamond Days
● Alpha Families
● Bank of Mum and Dad
● Empty-Nest Adventure
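One way to read “multi-level” as code (a hedged sketch; the two linear models and their weights are hypothetical, only the hierarchy comes from the slide): first score the category, then score the segments within the winning category.

```python
import numpy as np

HIERARCHY = {
    'City Prosperity': ['World-Class Wealth', 'Uptown Elite',
                        'Penthouse Chic', 'Metro High-Flyers'],
    'Prestige Positions': ['Premium Fortunes', 'Diamond Days', 'Alpha Families',
                           'Bank of Mum and Dad', 'Empty-Nest Adventure'],
}

def predict(x, W_category, W_segment):
    """Two-level classification with hypothetical linear models.

    W_category: (n_features, n_categories) weight matrix
    W_segment:  dict mapping category -> (n_features, n_segments) matrix
    """
    categories = list(HIERARCHY)
    category = categories[int(np.argmax(x @ W_category))]
    segments = HIERARCHY[category]
    segment = segments[int(np.argmax(x @ W_segment[category]))]
    return category, segment
```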
GPU vs CPU
We are not batching matrix algebra operations: NMC Serving operates on one request at a time!
With batch size 1, a GPU cannot amortize its data transfer and kernel launch overhead, so inference on a CPU can be the faster option.
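An illustration of why batch size matters (toy numpy example, not NMC’s serving code): with one request at a time, each layer is a small matrix-vector product rather than the large batched matrix-matrix product GPUs are built for.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(1000, 64))

# Batched scoring: one (n_requests x n_features) @ (n_features x n_nodes) matmul
batch = rng.normal(size=(10_000, 1000))
H_batched = batch @ W      # GEMM: where a GPU shines

# NMC-style serving: one request at a time
x = rng.normal(size=1000)  # a single request's features
h_single = x @ W           # GEMV: small, latency-bound, CPU-friendly
```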
Trimming
● WEAK CONNECTIONS: most connections in a deep neural network are very weak and can be removed
● LOW ACCURACY IMPACT: the trimming has very little impact on accuracy
● COMPRESSED DATA: the trimmed models can be described by sparse matrices, and thus the data in the models is highly compressed
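A minimal sketch of trimming, assuming the simple magnitude threshold the table below reports (the layer shapes are invented): zero out near-zero weights and store what survives as a sparse matrix.

```python
import numpy as np
from scipy import sparse

def trim(W, threshold=0.001):  # 0.001 is the threshold from the table below
    """Drop weak connections, keep the rest in compressed sparse form."""
    W = W.copy()
    W[np.abs(W) < threshold] = 0.0
    return sparse.csr_matrix(W)

rng = np.random.default_rng(0)
W = rng.normal(scale=0.001, size=(1000, 64))  # many weights near zero
W_trimmed = trim(W)

# Inference still works; the product only touches surviving weights
x = rng.normal(size=1000)
h = W_trimmed.T.dot(x)
```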
Trimming: Space, Time & Performance

Model        File Size (MB)  Trimming Threshold  Accuracy  Scoring Time (ms)
Not trimmed  108             0.0                 13.29     10.0
Trimmed      2.7             0.001               13.30     0.22
50x improvement in inference, in CPU time and storage
Key Takeaways
Architecture:
● Residual Networks saved the day
● Leverage the expressive power of DNNs for your data
Inference:
● You might not need a GPU for Deep Learning
● Improvements can be made on Sparse Matrix Algebra libraries
● Use trimming