Deep Learning Pipeline

Building a Deep Learning Model with TensorFlow

Hisham El-Amir

Mahmoud Hamdy

Deep Learning Pipeline: Building a Deep Learning Model with TensorFlow

ISBN-13 (pbk): 978-1-4842-5348-9

ISBN-13 (electronic): 978-1-4842-5349-6

https://doi.org/10.1007/978-1-4842-5349-6

Copyright © 2020 by Hisham El-Amir and Mahmoud Hamdy

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

Trademarked names, logos, and images may appear in this book. Rather than use a trademark symbol with every occurrence of a trademarked name, logo, or image we use the names, logos, and images only in an editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the trademark.

The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Managing Director, Apress Media LLC: Welmoed Spahr

Acquisitions Editor: Aaron Black

Development Editor: James Markham

Coordinating Editor: Jessica Vakili

Distributed to the book trade worldwide by Springer Science+Business Media New York, 233 Spring Street, 6th Floor, New York, NY 10013. Phone 1-800-SPRINGER, fax (201) 348-4505, e-mail [email protected], or visit www.springeronline.com. Apress Media, LLC is a California LLC and the sole member (owner) is Springer Science + Business Media Finance Inc (SSBM Finance Inc). SSBM Finance Inc is a Delaware corporation.

For information on translations, please e-mail [email protected], or visit www.apress.com/rights-permissions.

Apress titles may be purchased in bulk for academic, corporate, or promotional use. eBook versions and licenses are also available for most titles. For more information, reference our Print and eBook Bulk Sales web page at www.apress.com/bulk-sales.

Any source code or other supplementary material referenced by the author in this book is available to readers on GitHub via the book’s product page, located at www.apress.com/978-1-4842-5348-9. For more detailed information, please visit www.apress.com/source-code.

Printed on acid-free paper

Hisham El-Amir
Jizah, Egypt

Mahmoud Hamdy
Jizah, Egypt

Table of Contents

About the Authors

About the Technical Reviewer

Introduction

Part I: Introduction

Chapter 1: A Gentle Introduction

Information Theory, Probability Theory, and Decision Theory

Information Theory

Probability Theory

Decision Theory

Introduction to Machine Learning

Predictive Analytics and Its Connection with Machine Learning

Machine Learning Approaches

From Machine Learning to Deep Learning

Let’s See What Some Heroes of Machine Learning Say About the Field

Connections Between Machine Learning and Deep Learning

Difference Between ML and DL

Why Should We Learn About Deep Learning (Advantages of Deep Learning)?

Disadvantages of Deep Learning (Cost of Greatness)

Introduction to Deep Learning

Machine Learning Mathematical Notations

Summary

Chapter 2: Setting Up Your Environment

Background

Python 2 vs. Python 3

Installing Python

Python Packages

IPython

Jupyter

Packages Used in the Book

NumPy

SciPy

Pandas

Matplotlib

NLTK

Scikit-learn

Gensim

TensorFlow

Keras

Summary

Chapter 3: A Tour Through the Deep Learning Pipeline

Deep Learning Approaches

What Is Deep Learning

Biological Deep Learning

What Are Neural Networks Architectures?

Deep Learning Pipeline

Define and Prepare Problem

Summarize and Understand Data

Process and Prepare Data

Evaluate Algorithms

Improve Results

Fast Preview of the TensorFlow Pipeline

Tensors—the Main Data Structure

First Session

Data Flow Graphs

Tensor Properties

Summary

Chapter 4: Build Your First Toy TensorFlow App

Basic Development of TensorFlow

Hello World with TensorFlow

Simple Iterations

Prepare the Input Data

Doing the Gradients

Linear Regression

Why Linear Regression?

What Is Linear Regression?

Dataset Description

Full Source Code

XOR Implementation Using TensorFlow

Full Source Code

Summary

Part II: Data

Chapter 5: Defining Data

Defining Data

Why Should You Read This Chapter?

Structured, Semistructured, and Unstructured Data


Tidy Data

Divide and Conquer

Tabular Data

Quantitative vs. Qualitative Data

Example—the Titanic

Divide and Conquer

Making a Checkpoint

The Four Levels of Data

The Nominal Level

The Ordinal Level

Quick Recap and Check

The Interval Level

Examples of Interval Level Data

What Data Is Like at the Interval Level

Mathematical Operations Allowed for Interval

The Ratio Level

Summarizing All Levels Table 5-1

Text Data

What Is Text Processing and What Is the Level of Importance of Text Processing?

IMDB—Example

Images Data

Summary

Chapter 6: Data Wrangling and Preprocessing

The Data Fields Pipelines Revisited

Giving You a Reason

Where Is Data Cleaning in the Process?


Data Loading and Preprocessing

Fast and Easy Data Loading

Missing Data

Empties

Is It Ever Useful to Fill Missing Data Using a Zero Instead of an Empty or Null?

Managing Missing Features

Dealing with Big Datasets

Accessing Other Data Formats

Data Preprocessing

Data Augmentation

Image Crop

Crop and Resize

Crop to Bounding Box

Flipping

Rotate Image

Translation

Transform

Adding Salt and Pepper Noise

Convert RGB to Grayscale

Change Brightness

Adjust Contrast

Adjust Hue

Adjust Saturation

Categorical and Text Data

Data Encoding

Performing One-Hot Encoding on Nominal Features

Can You Spot the Problem?


A Special Type of Data: Text

So Far, Everything Has Been Pretty Good, Hasn’t It?

Tokenization, Stemming, and Stop Words

Summary

Chapter 7: Data Resampling

Creating Training and Test Sets

Cross-Validation

Validation Set Technique

Leave-One-Out Cross-Validation (LOOCV)

K-Fold Cross-Validation

Bootstrap

Bootstrap in Statistics

Tips to Use Bootstrap (Resampling with Replacement)

Generators

What Are Keras Generators?

Data Generator

Callback

Summary

Chapter 8: Feature Selection and Feature Engineering

Dataset Used in This Chapter

Dimensionality Reduction—Questions to Answer

What Is Dimensionality Reduction?

When Should I Use Dimensionality Reduction?

Unsupervised Dimensionality Reduction via Principal Component Analysis (PCA)

Total and Explained Variance

Feature Selection and Filtering


Principal Component Analysis

Nonnegative Matrix Factorization

Sparse PCA

Kernel PCA

Atom Extraction and Dictionary Learning

Latent Dirichlet Allocation (LDA)

Latent Dirichlet Allocation (LDA in NLP)

Code Example Using gensim

LDA vs. PCA

ZCA Whitening

Summary

Part III: TensorFlow

Chapter 9: Deep Learning Fundamentals

Perceptron

Single Perceptron

Multilayer Perceptron

Recap

Different Neural Network Layers

Input Layer

Hidden Layer(s)

Output Layer

Shallow vs. Deep Neural Networks

Activation Functions

Types of Activation Functions

Recap

Gradient Descent

Recap


Batch vs. Stochastic vs. Mini-Batch Gradient Descent

Batch Gradient Descent

Stochastic Gradient Descent

Mini-Batch Gradient Descent

Recap

Loss Function and Backpropagation

Loss Function

Backpropagation

Exploding Gradients

Re-Design the Network Model

Use Long Short-Term Memory Networks

Use Gradient Clipping

Use Weight Regularization

Vanishing Gradients

Vanishing Gradients Problem

TensorFlow Basics

Placeholder vs. Variable vs. Constant

Gradient-Descent Optimization Methods from a Deep-Learning Perspective

Learning Rate in the Mini-Batch Approach to Stochastic Gradient Descent

Summary

Chapter 10: Improving Deep Neural Networks

Optimizers in TensorFlow

The Notation to Use

Momentum

Nesterov Accelerated Gradient

Adagrad


Adadelta

RMSprop

Adam

Nadam (Adam + NAG)

Choosing the Learning Rate

Dropout Layers and Regularization

Normalization Techniques

Batch Normalization

Weight Normalization

Layer Normalization

Instance Normalization

Group Normalization

Summary

Chapter 11: Convolutional Neural Network

What Is a Convolutional Neural Network

Convolution Operation

One-Dimensional Convolution

Two-Dimensional Convolution

Padding and Stride

Common Image-Processing Filters

Mean and Median Filters

Gaussian Filter

Sobel Edge-Detection Filter

Identity Transform

Convolutional Neural Networks


Layers of Convolutional Neural Networks

Input Layer

Convolutional Layer

Pooling Layer

Backpropagation Through the Convolutional and Pooling Layers

Weight Sharing Through Convolution and Its Advantages

Translation Equivariance and Invariance

Case Study—Digit Recognition on the CIFAR-10 Dataset

Summary

Chapter 12: Sequential Models

Recurrent Neural Networks

Language Modeling

Backpropagation Through Time

Vanishing and Exploding Gradient Problems in RNN

The Solution to Vanishing and Exploding Gradients Problems in RNNs

Long Short-Term Memory

Case Study—Digit Identification on the MNIST Dataset

Gated Recurrent Unit

Bidirectional RNN (Bi-RNN)

Summary

Part IV: Applying What You’ve Learned

Chapter 13: Selected Topics in Computer Vision

Different Architectures in Convolutional Neural Networks

LeNet

AlexNet

VGG

ResNet


Transfer Learning

What Is a Pretrained Model, and Why Use It?

How to Use a Pretrained Model?

Ways to Fine-Tune the Model

Pretrained VGG19

Summary

Chapter 14: Selected Topics in Natural Language Processing

Vector Space Model

Vector Representation of Words

Word2Vec

Continuous Bag of Words

Skip-Gram Model for Word Embeddings

GloVe

Summary

Chapter 15: Applications

Case Study—Tabular Dataset

Understanding the Dataset

Preprocessing Dataset

Building the Model

Case Study—IMDB Movie Review Data with Word2Vec

Case Study—Image Segmentation

Summary

Index


About the Authors

Hisham El-Amir is a data scientist with expertise in machine learning, deep learning, and statistics. He currently lives and works in Cairo, Egypt. In his work projects, he faces challenges ranging from natural language processing (NLP), behavioral analysis, and machine learning to distributed processing. He is very passionate about his job and always tries to stay up to date with the latest developments in data science technologies by attending meetups, conferences, and other events.

Mahmoud Hamdy is a machine learning engineer who works and lives in Egypt. His primary area of study is the overlap between knowledge, logic, language, and learning. He helps train machine learning and deep learning models to distill large amounts of unstructured, semistructured, and structured data into new knowledge about the world, using methods ranging from deep learning to statistical relational learning. He applies strong theoretical and practical skills in several areas of machine learning to find novel and effective solutions for interesting and challenging problems where these areas intersect.

About the Technical Reviewer

Vishwesh Ravi Shrimali graduated in 2018 from BITS Pilani, where he studied mechanical engineering. Since then, he has been working with BigVision LLC on deep learning and computer vision, and is also involved in creating official OpenCV courses. He has a keen interest in programming and AI, and has applied that interest in mechanical engineering projects. He has also written multiple blog posts on OpenCV and deep learning for LearnOpenCV, a leading blog on computer vision, and has co-authored Machine Learning for OpenCV 4 (2nd edition). When he is not writing blog posts or working on projects, he likes to go on long walks or play his acoustic guitar.

Introduction

Artificial intelligence (AI) is the field of embedding human thinking into computers: in other words, creating an artificial brain that mimics the functions of the biological brain. Whatever humans can do intelligently is now expected to be transferred to machines. First-generation AI focuses on problems that can be formally described by humans. In this approach, the steps for doing something intelligent are written as instructions, and machines follow those instructions exactly, without changing them. These features are characteristic of the first era of AI.

Humans can fully describe only simple, well-defined problems such as chess, and fail to describe more complicated problems. In chess, the problem can be explained simply by representing the board as a matrix of size 8×8, describing each piece and how it moves, and describing the goals. Machines are restricted to the tasks that humans have formally described. By programming such instructions, machines can play chess intelligently. The machine's intelligence is artificial in this sense: the machine itself is not intelligent, but humans have transferred their intelligence to the machine in the form of several static lines of code. “Static” means that the behavior is the same in all cases. The machine, in this case, is tied to the human and can’t work on its own. This is like a master–slave relationship: the human is the master and the machines are the slaves, which just follow the human’s orders and no more.

To make the machine able to recognize objects, we can give it prior knowledge from experts in a form the machine can understand. Such knowledge-based systems form the second era of AI. One of the challenges in such systems is how to handle uncertainty and unknowns. Humans can recognize objects even in different and complex environments, and are able to handle uncertainty and unknowns intelligently, but machines can’t.
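To make the earlier chess description concrete, here is a minimal sketch of what "representing the board as a matrix of size 8×8" and "describing each piece and how it moves" might look like in Python. The integer piece codes, the sign convention for the two sides, and the helper names `initial_board` and `knight_moves` are assumptions made for this illustration only; they are not taken from the book's code.

```python
# Illustrative sketch only: the integer piece codes and sign convention are
# assumptions for this example, not an encoding taken from the book.
EMPTY, PAWN, KNIGHT, BISHOP, ROOK, QUEEN, KING = range(7)


def initial_board():
    """Return the starting position as an 8x8 list of lists.

    Positive codes are white pieces, negative codes are black pieces,
    and 0 marks an empty square. Row 0 is black's back rank.
    """
    back_rank = [ROOK, KNIGHT, BISHOP, QUEEN, KING, BISHOP, KNIGHT, ROOK]
    board = [[EMPTY] * 8 for _ in range(8)]
    board[0] = [-piece for piece in back_rank]
    board[1] = [-PAWN] * 8
    board[6] = [PAWN] * 8
    board[7] = list(back_rank)
    return board


def knight_moves(row, col):
    """List the on-board squares a knight at (row, col) can jump to.

    Occupancy is ignored; this is only the fixed, hand-written movement rule.
    """
    offsets = [(-2, -1), (-2, 1), (-1, -2), (-1, 2),
               (1, -2), (1, 2), (2, -1), (2, 1)]
    return [(row + dr, col + dc)
            for dr, dc in offsets
            if 0 <= row + dr < 8 and 0 <= col + dc < 8]


if __name__ == "__main__":
    board = initial_board()
    print(len(board), len(board[0]))  # board dimensions
    print(knight_moves(0, 1))         # squares reachable from black's knight
```

Every rule here is written by hand and never changes, which is the sense in which such a program is "static": the machine applies the description it was given and learns nothing beyond it.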

The Goal

Deep learning is a branch of machine learning where you model the world in terms of a hierarchy of concepts. This pattern of learning is similar to the way a human brain learns, and it allows computers to model complex concepts that often go unnoticed in other traditional methods of modeling. Hence, in the modern computing paradigm, deep learning plays a vital role in modeling complex real-world problems, especially by leveraging the massive amount of unstructured data available today.

Because of the complexities involved in a deep learning model, it is often treated as a black box by the people using it. However, to derive the maximum benefit from this branch of machine learning, one needs to uncover the hidden mystery by looking at the science and mathematics associated with it. In this book, great care has been taken to explain the concepts and techniques associated with deep learning from a mathematical as well as a scientific viewpoint. Also, the first chapter is dedicated entirely to building the mathematical base required to comprehend deep learning concepts with ease. TensorFlow has been chosen as the deep learning package because of its flexibility for research purposes and its ease of use. Another reason for choosing TensorFlow is that its serving capabilities make it easy to load models in a live production environment.

In summary, Deep Learning Pipeline should provide practical expertise

so you can learn the deep learning pipeline from scratch in such a way that

you can deploy meaningful deep learning solutions. This book will

allow you to get up to speed quickly using TensorFlow and to optimize

different deep learning architectures. All the practical aspects of deep

learning that are relevant in any industry are emphasized in this book.

You will be able to use the prototypes demonstrated to build new deep

learning applications. The code presented in the book is available in the

form of IPython notebooks and scripts that allow you to try out examples

and extend them in interesting ways. You will be equipped with the

mathematical foundation and scientific knowledge to pursue research in

this field and give back to the community.

All code in the book is implemented using Python. Because handling

images in plain Python is cumbersome, several libraries are used to

help produce efficient implementations of the applications across the

chapters.

Who This Book Is For

This book is for data scientists and machine learning professionals looking

at deep learning solutions to solve complex business problems, software

developers working on deep learning solutions through TensorFlow, and

graduate students and open source enthusiasts with a constant desire to

learn.

Prerequisites

Python and all the deep learning tools mentioned in the book, from

IPython to TensorFlow to the models that you will use, are free of charge

and can be freely downloaded from the Internet. To run the code that

accompanies the book, you need a computer that uses a Windows, Linux,

or Mac OS operating system. The book will introduce you step-by-step to

the process of installing the Python interpreter and all the tools and data

that you need to run the examples.


How This Book Is Organized

Parts

• Part I: Introduction—In this part, we prepare the

readers by giving them all the prerequisites needed

to start the journey from machine learning to deep

learning.

• Part II: Data—As the first step of the pipeline, readers

need to know everything about data, from data

collection and understanding information from data to

data processing and preparation.

• Part III: TensorFlow—In this part, we start the

interesting stuff. First, we illustrate the fundamental

and important concepts of deep learning; then we deep

dive into the core of neural networks and the types of

neural networks, describing each type; and show the

important concepts of the equation of deep learning.

We also show a real-life example of each

type.

• Part IV: Applying What You’ve Learned—This part

is designed to ensure readers practice using

TensorFlow and building the pipeline.

Chapters

• Chapter 1: A Gentle Introduction—This chapter provides

the big picture: what field the book covers, an

introduction to that field, and the mathematical

equations and notation that describe

how machine learning works.


• Chapter 2: Setting Up Your Environment—This chapter

introduces the programming tools and packages

you need in this book and some theories to help in

understanding; it also includes a bit of introduction to

the Python programming language.

• Chapter 3: A Nice Tour Through the Deep Learning

Pipeline—In chapter 3 we introduce the pipeline that

the whole book is built around; the deep learning approaches

and subfields; the steps of the deep learning pipeline;

and the extras added to TensorFlow that make it unique

compared with other deep learning frameworks.

• Chapter 4: Build Your First Toy TensorFlow App—To

make sure that readers are not lost partway through

the book, we show a small TensorFlow example

that moves quickly through each step of the deep

learning pipeline, so that readers know what each

step is, why it is important, and how to use it.

• Chapter 5: Defining Data—This chapter, as its name

implies, is about defining data. Readers should know

what type of data they are dealing with, and that’s very

important so they can choose the right approach for

preparing the data.

• Chapter 6: Data Wrangling and Preprocessing—After

understanding the data, readers must choose

the approaches and methodologies for preparing it;

this chapter helps them choose the right ones

for this step.


• Chapter 7: Data Resampling—After cleaning and

preparing the dataset, now the reader should know

how to sample this dataset in the right way. Choosing

the wrong samples from your data may influence the

result of your models, so in this chapter we illustrate

all techniques and approaches needed to sample your

dataset in the right way.

• Chapter 8: Feature Selection and Feature Engineering—

In this chapter we describe a very important topic

in the data step of the pipeline: feature selection and

engineering. Readers should know how to select the

input features that contribute

most to the output feature in which they are interested.

Feature engineering is the process of using domain

knowledge of the data to create features that make

machine learning algorithms work. Feature selection

and engineering are fundamental to the application of

machine and deep learning, and readers should know

when and how to use them.

• Chapter 9: Deep Learning Fundamentals—In this

chapter we describe a very important topic in deep

learning fundamentals, the basic functions that deep

learning is built on. Then we try to build layers from

these functions and combine these layers together

to get a more complex model that will help us solve

more complex problems. All of this is illustrated with

TensorFlow examples.

• Chapter 10: Improving Deep Neural Networks—In this

chapter we describe an important topic: after building

the deep learning models, the improvement starts. This

chapter concerns optimization, tuning and choosing


hyperparameter techniques, and weight normalization

and how these make the learning process easier

and faster. By the end, readers should know how to

evaluate, optimize, and tune the model parameters to

reach the optimal solution and a satisfactory accuracy.

• Chapter 11: Convolutional Neural Network—One of the

important classes of deep learning is the convolutional

neural network. In this chapter we cover CNNs

from the one-dimensional mask to

advanced topics such as weight sharing and the difference

between equivariance and invariance. We also present a

case study using the well-known CIFAR-10 dataset.

• Chapter 12: Sequential Models—Another class of

deep learning is sequential models. In this chapter we

describe the problem of sequential data, the rise

of recurrent neural networks and their problems, and

the evolution of the GRU and LSTM; we also

include a case study.

• Chapter 13: Selected Topics in Computer Vision—After

covering CNNs in Part III, this chapter adds practical

knowledge that makes readers' work easier,

such as using prebuilt architectures and transfer learning.

• Chapter 14: Selected Topics in Natural Language

Processing—This chapter fills the gaps readers may have

in working with text, covering advanced

approaches and techniques of natural language processing.

• Chapter 15: Applications—Here we show some case

studies to make sure that readers get the full knowledge

and understanding of how to build a pipeline, with

real-life examples.

