Introduction to TensorFlow for Optical Character Recognition (OCR)


TensorFlow for Optical Character Recognition

DevFest 2017

Hello! I am Vincenzo Santopietro

You can contact me at linkedin.com/in/vincenzosantopietro

Machine Learning and Deep Learning

Machine learning refers to the use of algorithms that parse data, process it and learn from it, in order to make predictions or determinations about something.

One of the best applications of machine learning is computer vision: OCR, object tracking, object recognition, etc.

MACHINE LEARNING

MACHINE LEARNING AND DEEP LEARNING

Deep learning is a subfield of machine learning concerned with algorithms inspired by the structure and function of the brain, called artificial neural networks (ANNs).

Compared to older machine-learning algorithms, deep learning performs better when a large amount of data is available.

DEEP LEARNING

MACHINE LEARNING AND DEEP LEARNING


SUPERVISED VS UNSUPERVISED

Our data are not labeled. Unsupervised algorithms use similarity measures among samples in order to create homogeneous clusters.

Best-known technique: clustering (k-means, hierarchical, etc.)

UNSUPERVISED LEARNING
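As a concrete example of the clustering mentioned above, here is a minimal k-means sketch in NumPy (illustrative only, not from the slides): alternate between assigning each sample to its nearest centroid and moving each centroid to the mean of its assigned samples.

```python
import numpy as np

def kmeans(points, k, n_iters=10, seed=0):
    """Minimal k-means: alternate assignment and centroid-update steps."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k random distinct samples.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Assign each point to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of the points assigned to it.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    return labels, centroids

# Two well-separated blobs: k-means should recover them as two clusters.
pts = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
labels, centroids = kmeans(pts, k=2)
```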

All data have been labeled (supervised) by an expert. Thanks to this labeling process, we can help the network learn the differences between classes (even though sometimes this does not happen).

Some techniques: neural networks (NNs), SVMs, etc.

SUPERVISED LEARNING

CLASSIFICATION VS PREDICTION

Prediction refers to the problem of estimating the behaviour of a phenomenon by analysing its “previous history”,

e.g. object tracking, forecasting, etc.

PREDICTION

Given an input observation, classification is the problem of identifying to which of a set of categories (classes) the new observation belongs, e.g. traffic-sign recognition, emotion recognition, etc.

CLASSIFICATION

TRAINING A LOGISTIC CLASSIFIER

The logistic classifier is based on the linear model y = WX + b, where X represents the input data matrix, W is the weights matrix, b contains the bias terms and y is the output of the classifier.

The goal is to tune the values of W and b so as to obtain the lowest possible loss value.

WEIGHTS AND BIAS TERMS
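The linear part of the classifier can be sketched in NumPy (shapes and values are illustrative, not from the slides): each class score is a weighted sum of the input features plus a bias.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))  # 4 input samples with 3 features each
W = rng.normal(size=(3, 2))  # weights mapping 3 features to 2 classes
b = np.zeros(2)              # one bias term per class

logits = X @ W + b           # class scores ("logits"), one row per sample
```

These raw scores are then passed through softmax to become class probabilities.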


Training

Softmax is a function of the logits: it takes a vector of scores and transforms them into probabilities.

SOFTMAX
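A minimal NumPy sketch of softmax (illustrative): exponentiate the scores and normalise so that the results sum to one.

```python
import numpy as np

def softmax(scores):
    # Subtract the max for numerical stability; the result is unchanged
    # mathematically because the shift cancels in the ratio.
    e = np.exp(scores - np.max(scores))
    return e / e.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))
```

Higher scores map to higher probabilities, and the output is a valid probability distribution.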

MEASURING THE LOSS

Given an input sample, it’s possible to estimate the distance between the output of the classifier and the ground-truth value.

CROSS ENTROPY

THE LOSS FUNCTION

MEASURING THE LOSS

We measure the loss of the training process by computing the previous formula over the entire training set. The loss depends on the W and b seen before. We want to minimise the average cross-entropy.
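The average cross-entropy described above can be sketched in NumPy (the probabilities and labels are made up for illustration): for each sample, sum -L_i * log(S_i) over the classes, then average over the training set.

```python
import numpy as np

def cross_entropy(probs, onehot):
    """Cross-entropy for one sample: -sum_i L_i * log(S_i)."""
    return -np.sum(onehot * np.log(probs))

# Average the loss over a tiny "training set" of two samples.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])   # classifier outputs (after softmax)
labels = np.array([[1, 0, 0],
                   [0, 1, 0]])        # one-hot ground-truth labels
avg_loss = np.mean([cross_entropy(p, l) for p, l in zip(probs, labels)])
```

The closer the predicted probability of the correct class is to 1, the smaller the loss.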

We’ll use Gradient Descent to minimise the loss function.

GRADIENT DESCENT

MINIMISING THE LOSS
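Gradient descent itself is simple to sketch (a toy one-dimensional example, not the slides’ classifier): repeatedly step the parameter in the direction opposite to the gradient of the loss.

```python
# Minimise the toy loss f(w) = (w - 3)^2 with gradient descent.
def grad(w):
    return 2.0 * (w - 3.0)   # derivative of (w - 3)^2

w = 0.0                      # initial guess
lr = 0.1                     # learning rate
for _ in range(100):
    w -= lr * grad(w)        # step against the gradient
```

After enough steps, w converges to the minimiser w = 3; in the classifier the same update is applied to every entry of W and b.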

TensorFlow

TensorFlow is an open-source library for numerical computation and machine learning.

Its basic principle is simple: in Python you build a graph of computations to perform, and TensorFlow then runs it efficiently using optimised C++ code.

TensorFlow supports computation across multiple CPUs and GPUs

How does it work?

TENSORFLOW’S GRAPHS

Software that uses TensorFlow is often divided into two phases: graph building and execution

In order to evaluate this graph we must run the session and all its initialisers


Running a simple graph

TENSORFLOW’S GRAPHS

x = tf.Variable(3, name="x")
y = tf.Variable(4, name="y")
f = x*x + 2*y + 5

sess = tf.Session()
sess.run(x.initializer)
sess.run(y.initializer)
res = sess.run(f)
print(res)


Running a simple graph

TENSORFLOW’S GRAPHS

x = tf.Variable(3, name="x")
y = tf.Variable(4, name="y")
f = x*x + 2*y + 5

with tf.Session() as session:
    x.initializer.run()
    y.initializer.run()
    result = f.eval()

When you evaluate a node, TensorFlow determines the set of nodes that it depends on and evaluates these nodes first.

NB: TensorFlow won’t reuse values computed in a previous run: in the first snippet x is recomputed for both y and z, while the second snippet evaluates them in a single run.

Node values

LIFECYCLE OF A NODE VALUE

w = tf.constant(3)
x = w + 2
y = x + 5
z = x * 3

with tf.Session() as session:
    print(y.eval())
    print(z.eval())

with tf.Session() as session:
    y_val, z_val = session.run([y, z])
    print(y_val)
    print(z_val)

Placeholder nodes don’t perform any computation. They just output the data you tell them to output at runtime.

These nodes are useful for batched learning.

When creating a placeholder node, you have to specify its shape: None means any size along that dimension.

Placeholder nodes

FEEDING DATA TO THE TRAINING ALGORITHM
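What gets fed into a placeholder at each training step can be sketched framework-free (names and shapes are illustrative): shuffle the training set and hand the model one mini-batch at a time.

```python
import numpy as np

def minibatches(X, y, batch_size, seed=0):
    """Yield shuffled (X_batch, y_batch) pairs, one per training step --
    the same data you would pass to a placeholder at runtime."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))          # shuffle once per epoch
    for start in range(0, len(X), batch_size):
        take = idx[start:start + batch_size]
        yield X[take], y[take]

X = np.arange(10).reshape(10, 1).astype(float)
y = np.arange(10)
batches = list(minibatches(X, y, batch_size=4))  # batch sizes 4, 4, 2
```

The last batch is smaller when the set size is not a multiple of the batch size, which is why the placeholder’s batch dimension is declared as None.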

TensorFlow lets you save your model at regular intervals because the training process might last for hours, days or even weeks.

All you need to do is call the save method on a Saver object.

If you want to restore the model, you have to call the restore method instead.

Checkpoints

SAVING/RESTORING MODELS

saver = tf.train.Saver()

. . .

with tf.Session() as session:
    . . .
    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            save_path = saver.save(session, "/tmp/my_model.ckpt")
    save_path = saver.save(session, "/tmp/my_model.ckpt")

with tf.Session() as session:
    saver.restore(session, "/tmp/my_model.ckpt")
    . . .

“A person who never made a mistake never tried anything new”

TIME TO CODE