Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016
![Page 1: Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016](https://reader031.vdocuments.us/reader031/viewer/2022030301/587fb16d1a28ab107e8b595f/html5/thumbnails/1.jpg)
MLconf ATL!
Sept 23rd, 2016
Chris Fregly
Research Scientist @ PipelineIO
![Page 2: Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016](https://reader031.vdocuments.us/reader031/viewer/2022030301/587fb16d1a28ab107e8b595f/html5/thumbnails/2.jpg)
Who am I?
Chris Fregly, Research Scientist @ PipelineIO, San Francisco
Previously, Engineer @ Netflix, Databricks, and IBM Spark
Contributor @ Apache Spark, Committer @ Netflix OSS
Founder @ Advanced Spark and TensorFlow Meetup
Author @ Advanced Spark (advancedspark.com)
![Page 3: Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016](https://reader031.vdocuments.us/reader031/viewer/2022030301/587fb16d1a28ab107e8b595f/html5/thumbnails/3.jpg)
Advanced Spark and TensorFlow Meetup
![Page 4: Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016](https://reader031.vdocuments.us/reader031/viewer/2022030301/587fb16d1a28ab107e8b595f/html5/thumbnails/4.jpg)
ATL Spark Meetup (9/22)
http://www.slideshare.net/cfregly/atlanta-spark-user-meetup-09-22-2016
![Page 5: Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016](https://reader031.vdocuments.us/reader031/viewer/2022030301/587fb16d1a28ab107e8b595f/html5/thumbnails/5.jpg)
ATL Hadoop Meetup (9/21)
http://www.slideshare.net/cfregly/atlanta-hadoop-users-meetup-09-21-2016
![Page 6: Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016](https://reader031.vdocuments.us/reader031/viewer/2022030301/587fb16d1a28ab107e8b595f/html5/thumbnails/6.jpg)
![Page 7: Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016](https://reader031.vdocuments.us/reader031/viewer/2022030301/587fb16d1a28ab107e8b595f/html5/thumbnails/7.jpg)
Confession #1
I Failed Linguistics in College!
Chose Pass/Fail Option
(90 (mid-term) + 70 (final)) / 2 = 80 = C+
How did a C+ turn into an F?
ZERO (0) CLASS PARTICIPATION?!
![Page 8: Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016](https://reader031.vdocuments.us/reader031/viewer/2022030301/587fb16d1a28ab107e8b595f/html5/thumbnails/8.jpg)
Confession #2
I Hated Statistics in College
2 Degrees: Mechanical + Manufacturing Engineering
Approximations were Bad!
I Wasn’t a Fluffy Physics Major
Though, I Kinda Wish I Was!
![Page 9: Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016](https://reader031.vdocuments.us/reader031/viewer/2022030301/587fb16d1a28ab107e8b595f/html5/thumbnails/9.jpg)
Wait… Please Don’t Leave!
I’m Older and Wiser Now
Approximate is the New Exact
Computational Linguistics and NLP are My Jam!
![Page 10: Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016](https://reader031.vdocuments.us/reader031/viewer/2022030301/587fb16d1a28ab107e8b595f/html5/thumbnails/10.jpg)
Agenda
TensorFlow + Neural Nets
NLP Fundamentals
NLP Models
![Page 11: Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016](https://reader031.vdocuments.us/reader031/viewer/2022030301/587fb16d1a28ab107e8b595f/html5/thumbnails/11.jpg)
What is TensorFlow?
General Purpose Numerical Computation Engine
Happens to be good for neural nets!
Tooling
TensorBoard (port 6006 == `goog`)
DAG-based like Spark!
Computation graph is logical plan
Stored in Protobufs
TF converts logical -> physical plan
Lots of Libraries
TFLearn (TensorFlow’s Scikit-learn Impl)
TensorFlow Serving (Prediction Layer)
Distributed and GPU-Optimized
![Page 12: Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016](https://reader031.vdocuments.us/reader031/viewer/2022030301/587fb16d1a28ab107e8b595f/html5/thumbnails/12.jpg)
What are Neural Networks?
Like All ML, Goal is to Minimize Loss (Error)
Error relative to known outcome of labeled data
Mostly Supervised Learning Classification
Labeled training data
Training Steps
Step 1: Randomly Guess Input Weights
Step 2: Calculate Error Against Labeled Data
Step 3: Determine Gradient Value, +/- Direction
Step 4: Back-propagate Gradient to Update Each Input Weight
Step 5: Repeat from Step 2 with New Weights until Convergence
Activation Function
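The five training steps above can be sketched in plain Python. This is a minimal illustration, not code from the talk: it assumes a single sigmoid neuron, a squared-error loss, full-batch gradient descent, and a toy OR dataset. The names `train_neuron`, `predict`, and `OR_DATA` are made up for the example.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, inputs):
    """Forward pass of a single sigmoid neuron."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train_neuron(data, learning_rate=0.5, epochs=5000, seed=0):
    """Fit one neuron to (inputs, label) pairs with full-batch gradient descent."""
    rng = random.Random(seed)
    n_inputs = len(data[0][0])
    # Step 1: randomly guess the input weights (and the bias).
    weights = [rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
    bias = rng.uniform(-1.0, 1.0)
    for _ in range(epochs):
        weight_grads = [0.0] * n_inputs
        bias_grad = 0.0
        for inputs, label in data:
            # Step 2: calculate the error against the labeled data.
            prediction = predict(weights, bias, inputs)
            error = prediction - label
            # Step 3: gradient of 0.5 * error^2 with respect to the pre-activation.
            grad_z = error * prediction * (1.0 - prediction)
            # Step 4: propagate the gradient back to each input weight.
            for i, x in enumerate(inputs):
                weight_grads[i] += grad_z * x
            bias_grad += grad_z
        weights = [w - learning_rate * g for w, g in zip(weights, weight_grads)]
        bias -= learning_rate * bias_grad
        # Step 5: the loop repeats from Step 2 with the new weights.
    return weights, bias


# Hypothetical toy dataset: logical OR.
OR_DATA = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
weights, bias = train_neuron(OR_DATA)
```

A fixed epoch count stands in for a real convergence check here; in practice training stops when the loss stops improving.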
![Page 13: Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016](https://reader031.vdocuments.us/reader031/viewer/2022030301/587fb16d1a28ab107e8b595f/html5/thumbnails/13.jpg)
Activation Functions
Goal: Learn and Train a Model on Input Data
Non-Linear Functions Find Non-Linear Fit of Input Data
Common Activation Functions
Sigmoid Function (sigmoid): output in (0, 1)
Hyperbolic Tangent (tanh): output in (-1, 1)
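As a quick illustration of the two activation functions and their output ranges, here is a small standard-library sketch. The sample inputs are arbitrary.

```python
import math


def sigmoid(z):
    """Squashes any real input into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def tanh(z):
    """Squashes any real input into the open interval (-1, 1)."""
    return math.tanh(z)


# Print each activation at a few arbitrary inputs.
for z in (-4.0, 0.0, 4.0):
    print(z, round(sigmoid(z), 3), round(tanh(z), 3))
```

The two are closely related: tanh(z) equals 2 * sigmoid(2z) - 1, so tanh is a rescaled sigmoid centered on zero.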
![Page 14: Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016](https://reader031.vdocuments.us/reader031/viewer/2022030301/587fb16d1a28ab107e8b595f/html5/thumbnails/14.jpg)
Back Propagation
http://kratzert.github.io/2016/02/12/understanding-the-gradient-flow-through-the-batch-normalization-layer.html
Gradients Calculated by Comparing to Known Label
Use Gradients to Adjust Input Weights
Chain Rule
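To make the chain rule concrete, the sketch below computes the gradient of a one-weight sigmoid neuron analytically and checks it against a finite-difference estimate. The squared-error loss and the sample values of `w`, `x`, and `y` are assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(w, x, y):
    """Squared error of a one-weight sigmoid neuron against label y."""
    return 0.5 * (sigmoid(w * x) - y) ** 2


def backprop_gradient(w, x, y):
    """dLoss/dw via the chain rule: dL/dp * dp/dz * dz/dw."""
    p = sigmoid(w * x)
    dL_dp = p - y          # comparison to the known label
    dp_dz = p * (1.0 - p)  # derivative of the sigmoid
    dz_dw = x              # derivative of w * x with respect to w
    return dL_dp * dp_dz * dz_dw


def numerical_gradient(w, x, y, eps=1e-6):
    """Central finite difference, used only to sanity-check the chain rule."""
    return (loss(w + eps, x, y) - loss(w - eps, x, y)) / (2.0 * eps)


w, x, y = 0.3, 2.0, 1.0
analytic = backprop_gradient(w, x, y)
numeric = numerical_gradient(w, x, y)
```

The gradient is negative here because the prediction is below the label, so gradient descent would increase `w`.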
![Page 15: Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016](https://reader031.vdocuments.us/reader031/viewer/2022030301/587fb16d1a28ab107e8b595f/html5/thumbnails/15.jpg)
Loss/Error Optimizers
Gradient Descent
Batch (entire dataset)
Per-record (don’t do this!)
Mini-batch (empirically 16 -> 512)
Stochastic (approximation)
Momentum (optimization)
AdaGrad
SGD with adaptive learning rates per feature
Set initial learning rate
More likely to incorrectly converge on local minima
http://www.slideshare.net/cfregly/gradient-descent-back-propagation-and-auto-differentiation-advanced-spark-and-tensorflow-meetup-08042016
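The difference between one global learning rate and AdaGrad's per-feature rates can be sketched on a toy problem. Everything here is an assumption made for illustration: the two-weight quadratic loss, the step counts, and the function names. It is not the optimizer code used in the talk.

```python
import math


def gradient(weights):
    """Gradient of the toy loss f(w) = 0.5 * (10 * w0^2 + 0.1 * w1^2)."""
    return [10.0 * weights[0], 0.1 * weights[1]]


def gradient_descent(weights, learning_rate=0.05, steps=100):
    """Plain gradient descent: every feature shares one learning rate."""
    weights = list(weights)
    for _ in range(steps):
        grads = gradient(weights)
        weights = [w - learning_rate * g for w, g in zip(weights, grads)]
    return weights


def adagrad(weights, initial_learning_rate=0.5, steps=100, eps=1e-8):
    """AdaGrad: each feature's step shrinks with its own accumulated squared gradients."""
    weights = list(weights)
    squared_grad_sums = [0.0] * len(weights)
    for _ in range(steps):
        grads = gradient(weights)
        for i, g in enumerate(grads):
            squared_grad_sums[i] += g * g
            step_size = initial_learning_rate / (math.sqrt(squared_grad_sums[i]) + eps)
            weights[i] -= step_size * g
    return weights


plain = gradient_descent([1.0, 1.0])
adaptive = adagrad([1.0, 1.0])
```

With a shared rate small enough to keep the steep `w0` direction stable, the shallow `w1` direction barely moves. AdaGrad rescales each direction separately, so both weights shrink toward zero at a similar pace.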
![Page 16: Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016](https://reader031.vdocuments.us/reader031/viewer/2022030301/587fb16d1a28ab107e8b595f/html5/thumbnails/16.jpg)
The Math
Linear Algebra
Matrix Multiplication
Very Parallelizable
Calculus
Derivatives
Chain Rule
![Page 17: Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016](https://reader031.vdocuments.us/reader031/viewer/2022030301/587fb16d1a28ab107e8b595f/html5/thumbnails/17.jpg)
Convolutional Neural Networks
Feed-forward
Do not form a cycle
Apply Many Layers (aka Filters) to Input
Each Layer/Filter Picks up on Features
Features not necessarily human-grokkable
Examples of Human-grokkable Filters
3 color filters: RGB
Moving AVG for time series
Brute Force
Try Diff numLayers & layerSizes
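The moving-average example can be written as a one-dimensional filter slid across a time series, which is the same operation a convolutional layer applies with learned filter values. This is a hand-rolled sketch with made-up data, not a CNN implementation. Like most deep learning libraries, it does not flip the kernel, so strictly speaking it computes cross-correlation.

```python
def convolve_1d(signal, kernel):
    """Slide the kernel over the signal, keeping only fully overlapping positions."""
    width = len(kernel)
    return [
        sum(k * s for k, s in zip(kernel, signal[start:start + width]))
        for start in range(len(signal) - width + 1)
    ]


def moving_average_kernel(width):
    """A human-grokkable filter: equal weights that sum to 1."""
    return [1.0 / width] * width


# Hypothetical time series with a spike in the middle.
series = [1.0, 2.0, 3.0, 10.0, 3.0, 2.0, 1.0]
smoothed = convolve_1d(series, moving_average_kernel(3))
# A second hand-picked filter that responds to changes rather than levels.
changes = convolve_1d(series, [-1.0, 0.0, 1.0])
```

A trained CNN learns the kernel values instead of having them chosen by hand, which is why many learned filters are not human-grokkable.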
![Page 18: Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016](https://reader031.vdocuments.us/reader031/viewer/2022030301/587fb16d1a28ab107e8b595f/html5/thumbnails/18.jpg)
CNN Use Case: Stitch Fix
Stitch Fix Also Uses NLP to Analyze Return/Reject Comments
Stitch Fix Strata Conf SF 2016: Using Deep Learning to Create New Clothing Styles!
![Page 19: Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016](https://reader031.vdocuments.us/reader031/viewer/2022030301/587fb16d1a28ab107e8b595f/html5/thumbnails/19.jpg)
Recurrent Neural Networks
Forms a Cycle (vs. Feed-forward)
Maintains State over Time
Keep track of context
Learns sequential patterns
Decay over time
Use Cases
Speech
Text/NLP Prediction
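A scalar sketch of the recurrent update shows both the cycle (the state feeds back into the next step) and the decay over time. The weights `w_x`, `w_h`, and `b` are arbitrary illustrative values, not trained parameters.

```python
import math


def rnn_step(x, h_prev, w_x=0.8, w_h=0.5, b=0.0):
    """One vanilla RNN update: the new state mixes the input with the previous state."""
    return math.tanh(w_x * x + w_h * h_prev + b)


def run_rnn(inputs, h0=0.0):
    """Feed a sequence through the cell, carrying the state forward each step."""
    states = []
    h = h0
    for x in inputs:
        h = rnn_step(x, h)  # the cycle: h is fed back in on the next step
        states.append(h)
    return states


# One input pulse followed by silence: the state remembers it, then fades.
states = run_rnn([1.0, 0.0, 0.0, 0.0])
```

After the single input pulse, each later state is smaller than the one before it. This fading memory is the limitation that motivates gated cells such as the LSTM.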
![Page 20: Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016](https://reader031.vdocuments.us/reader031/viewer/2022030301/587fb16d1a28ab107e8b595f/html5/thumbnails/20.jpg)
RNN Sequences
Input: Image, Output: Classification
Input: Image, Output: Text (Captions)
Input: Text, Output: Class (Sentiment)
Input: Text (English), Output: Text (Spanish)
Input Layer
Hidden Layer
Output Layer
http://karpathy.github.io/2015/05/21/rnn-effectiveness/
![Page 21: Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016](https://reader031.vdocuments.us/reader031/viewer/2022030301/587fb16d1a28ab107e8b595f/html5/thumbnails/21.jpg)
Character-based RNNs
Tokens are Characters vs. Words/Phrases
Microsoft trains every 3 characters
Fewer Combinations of Possible Neighbors
Only 26 alpha character tokens vs. millions of word tokens
Preserves state between 1st and 2nd ‘l’
Improves prediction
![Page 22: Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016](https://reader031.vdocuments.us/reader031/viewer/2022030301/587fb16d1a28ab107e8b595f/html5/thumbnails/22.jpg)
Long Short Term Memory (LSTM)
More Complex State Update Function than Vanilla RNN
![Page 23: Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016](https://reader031.vdocuments.us/reader031/viewer/2022030301/587fb16d1a28ab107e8b595f/html5/thumbnails/23.jpg)
LSTM State Update
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Cell State
Forget Gate Layer (Sigmoid)
Input Gate Layer (Sigmoid)
Candidate Gate Layer (tanh)
Output Layer
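The gate layers listed above combine into one state update. The sketch below follows the standard LSTM equations from the linked post, reduced to scalars for readability. The `params` layout and all weight values are hypothetical, chosen so the forget gate stays open and the input gate stays shut.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One scalar LSTM update; params maps a gate name to (w_x, w_h, bias)."""

    def layer(name, activation):
        w_x, w_h, bias = params[name]
        return activation(w_x * x + w_h * h_prev + bias)

    forget = layer("forget", sigmoid)           # how much old cell state to keep
    admit = layer("input", sigmoid)             # how much of the candidate to let in
    candidate = layer("candidate", math.tanh)   # proposed new cell content
    output = layer("output", sigmoid)           # how much cell state to expose
    c = forget * c_prev + admit * candidate     # cell state update
    h = output * math.tanh(c)                   # new hidden state
    return h, c


# Hypothetical parameters: forget gate saturated open, input gate saturated shut.
REMEMBER_PARAMS = {
    "forget": (0.0, 0.0, 20.0),
    "input": (0.0, 0.0, -20.0),
    "candidate": (1.0, 1.0, 0.0),
    "output": (0.0, 0.0, 0.0),
}
h, c = lstm_step(x=1.0, h_prev=0.5, c_prev=0.7, params=REMEMBER_PARAMS)
```

With these settings the cell state passes through almost unchanged, which is how an LSTM can carry context across many steps where a vanilla RNN's state would decay.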
![Page 24: Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016](https://reader031.vdocuments.us/reader031/viewer/2022030301/587fb16d1a28ab107e8b595f/html5/thumbnails/24.jpg)
Transfer Learning
![Page 25: Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016](https://reader031.vdocuments.us/reader031/viewer/2022030301/587fb16d1a28ab107e8b595f/html5/thumbnails/25.jpg)
Agenda
TensorFlow + Neural Nets
NLP Fundamentals
NLP Models
![Page 26: Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016](https://reader031.vdocuments.us/reader031/viewer/2022030301/587fb16d1a28ab107e8b595f/html5/thumbnails/26.jpg)
Use Cases
Document Summary
TextRank: TF/IDF + PageRank
Article Classification and Similarity
LDA: calculate top `k` topic distribution
Machine Translation
word2vec: compare word embedding vectors
Must Convert Text to Numbers!
![Page 27: Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016](https://reader031.vdocuments.us/reader031/viewer/2022030301/587fb16d1a28ab107e8b595f/html5/thumbnails/27.jpg)
Core Concepts
Corpus
Collection of text, e.g. documents, articles, genetic codes
Embeddings
Tokens represented/embedded in vector space
Learned, hidden features (~PCA, SVD)
Similar tokens cluster together, analogies cluster apart
k-skip-gram
Skip k neighbors when defining tokens
n-gram
Treat n consecutive tokens as a single token
Composable: 1-skip, bi-gram (every other word)
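The n-gram and k-skip-gram definitions can be sketched as two small helpers. One assumption to flag: `skip_bigrams` skips exactly `k` tokens, matching the "every other word" example, whereas some definitions also include every smaller skip. The function names and sample sentence are invented.

```python
def ngrams(tokens, n):
    """Treat every run of n consecutive tokens as a single token (tuple)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skip_bigrams(tokens, k):
    """Pair each token with the one that follows after skipping exactly k neighbors."""
    gap = k + 1
    return [(tokens[i], tokens[i + gap]) for i in range(len(tokens) - gap)]


tokens = "the quick brown fox".split()
bigrams = ngrams(tokens, 2)
one_skip_bigrams = skip_bigrams(tokens, 1)
```

For this four-word sentence the bi-grams are the three adjacent pairs, while the 1-skip bi-grams pair "the" with "brown" and "quick" with "fox".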
![Page 28: Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016](https://reader031.vdocuments.us/reader031/viewer/2022030301/587fb16d1a28ab107e8b595f/html5/thumbnails/28.jpg)
Parsers and POS Taggers
Describe grammatical sentence structure
Requires context of entire sentence
Helps reason about sentence
80% obvious, simple token neighbors
Major bottleneck in NLP pipeline!
![Page 29: Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016](https://reader031.vdocuments.us/reader031/viewer/2022030301/587fb16d1a28ab107e8b595f/html5/thumbnails/29.jpg)
Pre-trained Parsers and Taggers
Penn Treebank
Parser and Part-of-Speech Tagger
Human-annotated (!)
Trained on 4.5 million words
Parsey McParseface
Trained by SyntaxNet
![Page 30: Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016](https://reader031.vdocuments.us/reader031/viewer/2022030301/587fb16d1a28ab107e8b595f/html5/thumbnails/30.jpg)
Feature Engineering
Lower-case
Preserve proper nouns using caret (`^`)
“MLconf” => “^m^lconf”
“Varsity” => “^varsity”
Encode Common N-grams (Phrases)
Create a single token using underscore (`_`)
“Senior Developer” => “senior_developer”
Stemming and Lemmatization
Try to avoid: let the neural network figure this out
Can preserve part of speech (POS) using “_noun”, “_verb”
“banking” => “banking_verb”
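The three encodings above are simple string transforms. This sketch reproduces the slide's examples. The helper names are made up, the caret rule only handles ASCII capitals, and the part-of-speech tag is passed in by the caller rather than inferred.

```python
import re


def mark_capitals(text):
    """Lower-case the text, prefixing each originally upper-case letter with a caret."""
    return re.sub(r"[A-Z]", lambda match: "^" + match.group(0).lower(), text)


def join_phrase(phrase):
    """Collapse a known multi-word phrase into one lower-case token."""
    return "_".join(phrase.lower().split())


def tag_pos(token, pos):
    """Append a part-of-speech suffix supplied by an upstream tagger."""
    return f"{token.lower()}_{pos}"


print(mark_capitals("MLconf"))
print(join_phrase("Senior Developer"))
print(tag_pos("banking", "verb"))
```

Marking capitals instead of discarding them keeps the vocabulary small while still letting a model tell a proper noun from a common word.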
![Page 31: Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016](https://reader031.vdocuments.us/reader031/viewer/2022030301/587fb16d1a28ab107e8b595f/html5/thumbnails/31.jpg)
Agenda
TensorFlow + Neural Nets
NLP Fundamentals
NLP Models
![Page 32: Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016](https://reader031.vdocuments.us/reader031/viewer/2022030301/587fb16d1a28ab107e8b595f/html5/thumbnails/32.jpg)
Count-based Models
Goal: Convert Text to Vector of Neighbor Co-occurrences
Bag of Words (BOW)
Simple hashmap with word counts
Loses neighbor context
Term Frequency / Inverse Document Frequency (TF/IDF)
Normalizes based on token frequency
GloVe
Matrix factorization on co-occurrence matrix
Highly parallelizable, reduce dimensions, capture global co-occurrence stats
Log smoothing of probability ratios
Stores word vector diffs for fast analogy lookups
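Bag of words and TF/IDF fit in a few lines of standard-library Python. The weighting used here, term frequency times log(N / document frequency), is one common variant among several, and the three tiny documents are invented for the example.

```python
import math
from collections import Counter


def bag_of_words(tokens):
    """Simple hashmap of word counts; word order is lost."""
    return Counter(tokens)


def tf_idf(documents):
    """Return one {token: weight} dict per tokenized document."""
    n_docs = len(documents)
    # Document frequency: in how many documents does each token appear?
    doc_freq = Counter(token for doc in documents for token in set(doc))
    weights = []
    for doc in documents:
        counts = bag_of_words(doc)
        weights.append({
            token: (count / len(doc)) * math.log(n_docs / doc_freq[token])
            for token, count in counts.items()
        })
    return weights


# Hypothetical mini-corpus.
docs = [
    "spark is fast".split(),
    "tensorflow is flexible".split(),
    "spark is not tensorflow".split(),
]
weights = tf_idf(docs)
```

The token "is" appears in every document, so its weight drops to zero, while "fast" appears in only one document and gets the highest weight in the first one.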
![Page 33: Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016](https://reader031.vdocuments.us/reader031/viewer/2022030301/587fb16d1a28ab107e8b595f/html5/thumbnails/33.jpg)
Neural-based Predictive Models
Goal: Predict Text using Learned Embedding Vectors
word2vec
Shallow neural network
Local: nearby words predict each other
Fixed word embedding vector size (e.g. 300)
Optimizer: Mini-batch Stochastic Gradient Descent (SGD)
SyntaxNet
Deep(er) neural network
Global(er)
Not a Recurrent Neural Net (RNN)!
Can combine with BOW-based models (e.g. word2vec CBOW)
![Page 34: Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016](https://reader031.vdocuments.us/reader031/viewer/2022030301/587fb16d1a28ab107e8b595f/html5/thumbnails/34.jpg)
word2vec
CBOW word2vec
Predict target word from source context
A single source context is an observation
Loses useful distribution information
Good for small datasets
Skip-gram word2vec (Inverse of CBOW)
Predict source context words from target word
Each (source context, target word) tuple is observation
Better for large datasets
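The difference between the two word2vec variants shows up in how training observations are built from a sentence. This sketch only generates the observations and does not train embeddings. The window size of 1 and the helper names are assumptions for illustration.

```python
def context_window(tokens, index, window):
    """Tokens within `window` positions on either side of tokens[index]."""
    start = max(0, index - window)
    return tokens[start:index] + tokens[index + 1:index + 1 + window]


def cbow_examples(tokens, window=1):
    """CBOW: one observation per position, (whole context, target word)."""
    return [
        (context_window(tokens, i, window), target)
        for i, target in enumerate(tokens)
    ]


def skipgram_examples(tokens, window=1):
    """Skip-gram: one observation per (target word, single context word) pair."""
    return [
        (target, context_word)
        for i, target in enumerate(tokens)
        for context_word in context_window(tokens, i, window)
    ]


tokens = "the quick brown fox".split()
cbow = cbow_examples(tokens)
skipgram = skipgram_examples(tokens)
```

CBOW yields one observation per word (four here), while skip-gram yields one per word-neighbor pair (six here). Skip-gram's larger number of observations is part of why it tends to suit larger datasets.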
![Page 35: Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016](https://reader031.vdocuments.us/reader031/viewer/2022030301/587fb16d1a28ab107e8b595f/html5/thumbnails/35.jpg)
word2vec Libraries
gensim
Python only
Most popular
Spark ML
Python + Java/Scala
Supports only synonyms
![Page 36: Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016](https://reader031.vdocuments.us/reader031/viewer/2022030301/587fb16d1a28ab107e8b595f/html5/thumbnails/36.jpg)
*2vec
lda2vec
LDA (global) + word2vec (local)
From Chris Moody @ Stitch Fix
like2vec
Embedding-based Recommender
![Page 37: Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016](https://reader031.vdocuments.us/reader031/viewer/2022030301/587fb16d1a28ab107e8b595f/html5/thumbnails/37.jpg)
word2vec vs. GloVe
Both are Fundamentally Similar
Capture local co-occurrence statistics (neighbors)
Capture distance between embedding vectors (analogies)
GloVe
Count-based
Also captures global co-occurrence statistics
Requires upfront pass through entire dataset
![Page 38: Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016](https://reader031.vdocuments.us/reader031/viewer/2022030301/587fb16d1a28ab107e8b595f/html5/thumbnails/38.jpg)
SyntaxNet POS Tagging
Determine coarse-grained grammatical role of each word
Multiple contexts, multiple roles
Neural Net Inputs: stack, buffer
Results: POS probability distribution
Already Tagged
![Page 39: Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016](https://reader031.vdocuments.us/reader031/viewer/2022030301/587fb16d1a28ab107e8b595f/html5/thumbnails/39.jpg)
SyntaxNet Dependency Parser
Determine fine-grained roles using grammatical relationships
“Transition-based”, Incremental Dependency Parser
Globally Normalized using Beam Search with Early Update
Parsey McParseface: Pre-trained Parser/Tagger available in 40 languages
Fine-grained
Coarse-grained
![Page 40: Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016](https://reader031.vdocuments.us/reader031/viewer/2022030301/587fb16d1a28ab107e8b595f/html5/thumbnails/40.jpg)
SyntaxNet Use Case: Nutrition
Nutrition and Health Startup in SF (Stealth)
Using Google’s SyntaxNet
Rate Recipes and Menus by Nutritional Value
Correct
Incorrect
![Page 41: Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016](https://reader031.vdocuments.us/reader031/viewer/2022030301/587fb16d1a28ab107e8b595f/html5/thumbnails/41.jpg)
Model Validation
Unsupervised Learning Requires Validation
Google has Published Analogy Tests for Model Validation
Thanks, Google!
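An analogy test asks whether vector arithmetic on the embeddings recovers a known answer, for example "man is to king as woman is to ?". The sketch below shows the scoring logic only. The three-dimensional vectors and the two questions are invented for illustration and are not taken from Google's published test set or from any trained model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def solve_analogy(embeddings, a, b, c):
    """Answer 'a is to b as c is to ?' with the vector b - a + c."""
    target = [
        vb - va + vc
        for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])
    ]
    # The question words themselves are excluded from the candidate answers.
    candidates = [word for word in embeddings if word not in (a, b, c)]
    return max(candidates, key=lambda word: cosine(embeddings[word], target))


def analogy_accuracy(embeddings, questions):
    """Fraction of (a, b, c, expected) questions answered correctly."""
    correct = sum(
        solve_analogy(embeddings, a, b, c) == expected
        for a, b, c, expected in questions
    )
    return correct / len(questions)


# Hypothetical toy embeddings; dimensions loosely mean (royal, male, female).
TOY_EMBEDDINGS = {
    "man": [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "king": [0.9, 0.9, 0.1],
    "queen": [0.9, 0.1, 0.9],
    "prince": [0.8, 0.9, 0.1],
    "apple": [0.0, 0.1, 0.1],
}
QUESTIONS = [
    ("man", "king", "woman", "queen"),
    ("king", "queen", "man", "woman"),
]
accuracy = analogy_accuracy(TOY_EMBEDDINGS, QUESTIONS)
```

On real embeddings the same accuracy score, computed over thousands of questions, gives a single number for comparing models or hyperparameter settings.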
![Page 42: Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016](https://reader031.vdocuments.us/reader031/viewer/2022030301/587fb16d1a28ab107e8b595f/html5/thumbnails/42.jpg)
Thank You, Atlanta!
Chris Fregly, Research Scientist @ PipelineIO
All Source Code, Demos, and Docker Images @ pipeline.io
Join the Global Meetup for all Slides and Videos @ advancedspark.com