Small Data Classification for Natural Language Processing
Michael Thorne, Head of Data Science, CaliberMind

Uploaded by calibermind, 13-Apr-2017
TRANSCRIPT

Page 1: Small Data Classification for NLP

Small Data Classification for Natural Language Processing

Michael Thorne, Head of Data Science, CaliberMind

Page 2: Small Data Classification for NLP

2 | ©2016 CaliberMind

Goals

• Intro

• What Makes NLP Different

• Solutions

• Questions

Page 3: Small Data Classification for NLP

Michael Thorne

Head of Data Science, CaliberMind

MS Data Science Program, GalvanizeU

B.S. Physics, Fordham University

NSA Analytic Lead

US Navy Digital Network Intelligence Analyst / Cryptolinguist

Obligatory Speaker Bio

Page 4: Small Data Classification for NLP

CaliberMind

• B2B marketing SaaS

• Persona modeling and personality insights

• Content matching across buyer journey for high-value, complex purchase decisions

• Our core competency is natural language processing

Page 5: Small Data Classification for NLP

What’s So Special About NLP?

• Not random (Zipf’s Law)

• Huge feature space

• Subjective Criteria
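
Zipf's Law is easy to see even in a toy corpus: word frequency falls off roughly as 1/rank, so a few terms dominate while most of the huge feature space stays nearly empty. A minimal sketch, using only the standard library and made-up text:

```python
from collections import Counter

# Toy corpus; any natural-language text shows the same rank-frequency pattern.
text = (
    "the cat sat on the mat and the dog sat on the rug "
    "the cat and the dog are friends and the mat is red"
)
counts = Counter(text.split())

# Zipf's Law: frequency is roughly proportional to 1 / rank,
# so rank * frequency stays roughly constant for the top terms.
for rank, (word, freq) in enumerate(counts.most_common(5), start=1):
    print(rank, word, freq, rank * freq)
```

Even in this tiny sample, "the" dwarfs everything else, which is exactly why raw term counts are such a skewed feature space.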

Page 6: Small Data Classification for NLP

Small Data NLP

Page 7: Small Data Classification for NLP

Persona Status Quo

• Assumptive Personas

• Qualitative Criteria

• Subjective Labels

• Static Output

Page 8: Small Data Classification for NLP

Starting Point

Demographics

Psychographics

Firmographics

Page 9: Small Data Classification for NLP

Let’s Validate the Status Quo

Page 10: Small Data Classification for NLP

CaliberMind’s Data Challenge

• We match the right message to the right person at the right time

• We operate at the upper limits of human-scale problems (100s to 10,000s of documents)

• Our results weren't as accurate as we expected

Page 11: Small Data Classification for NLP

Our Friend: The Central Limit Theorem

• This is the theorem that lets us assume our data is well behaved, provided we have enough of it

• Let's look at a classic example: coin tosses

Page 12: Small Data Classification for NLP

Coin Flip Distribution

Page 13: Small Data Classification for NLP

1 Trial

Page 14: Small Data Classification for NLP

100 Trials
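
The coin-toss illustration above is easy to reproduce. A single trial of 100 flips can land almost anywhere, but the means of many trials pile up in a tight bell curve around 0.5. A minimal sketch using only the standard library:

```python
import random
from statistics import mean, stdev

random.seed(42)  # fixed seed so the sketch is reproducible

def trial(n_flips=100):
    """Proportion of heads in one trial of n_flips fair-coin flips."""
    return sum(random.random() < 0.5 for _ in range(n_flips)) / n_flips

# One trial tells us little; many trials give a bell curve of sample
# proportions centered on 0.5 -- the Central Limit Theorem in action.
one = trial()
many = [trial() for _ in range(1000)]

print(f"single trial: {one:.2f}")
print(f"mean of 1000 trials: {mean(many):.3f}  (std {stdev(many):.3f})")
```

With small data we are stuck near the "single trial" end of this picture, which is why the well-behaved-distribution assumption quietly fails.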

Page 15: Small Data Classification for NLP

Example: K-Means

• K-means is a workhorse algorithm for unsupervised learning

• What assumptions do we make when we use k-means?

  • Spherical clusters

  • Equal variance

  • Equal prior probability

• It turns out NLP data is none of these things
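
One way to see this is to hand k-means data that violates its assumptions. The sketch below uses synthetic scikit-learn blobs (an illustrative assumption, not the slides' actual data): one diffuse, common cluster and one tight, rare one. K-means typically carves off part of the big blob instead of recovering the true split.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Two blobs with very different variances and sizes -- like real NLP data,
# which violates k-means' spherical / equal-variance / equal-prior assumptions.
X_big, _ = make_blobs(n_samples=450, centers=[[0, 0]], cluster_std=3.0, random_state=0)
X_small, _ = make_blobs(n_samples=50, centers=[[6, 0]], cluster_std=0.3, random_state=0)
X = np.vstack([X_big, X_small])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
sizes = np.bincount(labels)
print("cluster sizes:", sizes)  # typically far from the true 450 / 50 split
```

The boundary falls midway between centroids, so the dense little cluster pulls a large chunk of the diffuse one across the line.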

Page 16: Small Data Classification for NLP

Happy K-Means

Page 17: Small Data Classification for NLP

NLP K-Means

Page 18: Small Data Classification for NLP

But Wait, It Gets Better

• Our documents tend to be of vastly different sizes within the same corpus

• Unbalanced Classes

• Qualitative Criteria

• Unlabeled data

• Human labeling is time-intensive

Page 19: Small Data Classification for NLP

Our Solution

Page 20: Small Data Classification for NLP

Dimensionality Reduction

• Dimensionality was the first thing we tackled

• Manual dictionaries to collapse similar terms, e.g. mark = [‘growth hacker’, ‘marketer’, ‘demand gen’]

• LSA to remove low-information terms

• Automating the process using word2vec, DBpedia, and skip-gram similarities

• As we aggregate more data, we’re able to do this process more effectively
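
A hedged sketch of the first two steps above, using scikit-learn; the synonym dictionary and toy documents here are illustrative stand-ins, not our production dictionaries:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.pipeline import make_pipeline

docs = [
    "growth hacker focused on demand gen and funnel metrics",
    "marketer running demand gen campaigns and growth experiments",
    "sysops engineer hardening servers and network security",
    "security analyst reviewing firewall and intrusion logs",
]

# Step 1: collapse similar terms with a manual dictionary (toy example).
synonyms = {"growth hacker": "marketer", "demand gen": "marketer"}
def normalize(doc):
    for term, canon in synonyms.items():
        doc = doc.replace(term, canon)
    return doc

# Step 2: LSA (TF-IDF + truncated SVD) projects the sparse term space
# down to a few dense latent dimensions, dropping low-information terms.
lsa = make_pipeline(TfidfVectorizer(), TruncatedSVD(n_components=2, random_state=0))
embeddings = lsa.fit_transform([normalize(d) for d in docs])
print(embeddings.shape)  # four documents, two latent dimensions
```

With only a few hundred documents, shrinking the feature space this way matters far more than it would at web scale.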

Page 21: Small Data Classification for NLP

Spiky Data

Page 22: Small Data Classification for NLP

Metrics Over Raw Scores

• Especially important when comparing data of different sizes

• Measuring how many standard deviations a score sits off the mean works better than a raw similarity score

• Pick the best similarity score (with NLP, it’s not cosine)
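
As a sketch of the z-score idea: converting raw similarity scores to standard deviations off the mean makes a modest-looking score stand out. The numbers below are made up for illustration:

```python
import numpy as np

# Raw similarity scores for documents of very different lengths
# aren't directly comparable; standard deviations off the corpus mean are.
scores = np.array([0.12, 0.15, 0.11, 0.14, 0.45, 0.13])

z = (scores - scores.mean()) / scores.std()
print(np.round(z, 2))

# The outlier stands out in z-space even though its raw score is well below 1.0.
best = int(np.argmax(z))
print("most anomalous document:", best)  # index 4
```

A raw threshold like "similarity > 0.5" would have missed document 4 entirely; the z-score flags it as more than two standard deviations above its corpus.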

Page 23: Small Data Classification for NLP

Pretend We Have Labeled Data

• Rules-based scoring algorithm for a first pass

• Take a small subset of high-scoring people as exemplars

• Use a latent semantic analysis of these exemplars to make a template

• Compare remaining data rows against each exemplar cluster

• Assign each row to its highest-scoring exemplar cluster, broadening that cluster's definition

• Continue until all data rows are assigned

• Any row with a similarity below a set threshold is labeled ‘Unknown’, which indicates additional, underlying personas
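
The loop above can be sketched roughly as follows. This is a schematic, not our production code: 2-D toy vectors stand in for LSA document vectors, and plain cosine similarity stands in for the template comparison.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors (epsilon guards zero norms)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def label_rows(rows, exemplars, threshold=0.2, max_rounds=10):
    """Iteratively assign rows to the nearest exemplar cluster (mutates inputs).

    rows:      dict of name -> feature vector (unlabeled data)
    exemplars: dict of persona -> list of feature vectors (rule-scored seeds)
    Rows still below `threshold` after the final round are labeled 'Unknown'.
    """
    labels = {}
    for _ in range(max_rounds):
        centroids = {p: np.mean(vs, axis=0) for p, vs in exemplars.items()}
        progress = False
        for name, vec in list(rows.items()):
            sims = {p: cosine(vec, c) for p, c in centroids.items()}
            persona = max(sims, key=sims.get)
            if sims[persona] >= threshold:
                labels[name] = persona
                exemplars[persona].append(vec)  # broaden the cluster's definition
                del rows[name]
                progress = True
        if not progress:
            break
    labels.update({name: "Unknown" for name in rows})
    return labels

# Toy usage: two seed personas, two unlabeled rows.
rows = {
    "Randy P": np.array([0.8, 0.3]),
    "Lucas M": np.array([-0.9, -0.9]),
}
exemplars = {
    "Value":    [np.array([1.0, 0.0])],
    "Security": [np.array([0.6, 0.8])],
}
result = label_rows(rows, exemplars)
print(result)
```

Each round recomputes centroids from the grown exemplar sets, so a confident assignment in one round can pull in borderline rows in the next; anything that never clears the threshold falls out as ‘Unknown’.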

Page 24: Small Data Classification for NLP

Round 1 (Rules)

Name     | Title         | Similarity Score | Persona
---------|---------------|------------------|---------
Luke J   | VP Marketing  | 1.0              | Value
Randy P  | Founder       |                  |
Lucas M  | Growth Ninja  |                  |
Bec G    | Tech Guru     |                  |
Fiona F  | Sysops        | 1.0              | Security
Claude S | Growth Hacker |                  |
Art L    | Data Analyst  |                  |

Page 25: Small Data Classification for NLP

Round 2 (LSA)

Name     | Title         | Similarity Score | Persona
---------|---------------|------------------|---------
Luke J   | VP Marketing  | 1.0              | Value
Randy P  | Founder       | 0.45             |
Lucas M  | Growth Ninja  | 0.11             |
Bec G    | Tech Guru     | 0.71             | Security
Fiona F  | Sysops        | 1.0              | Security
Claude S | Growth Hacker | 0.87             | Value
Art L    | Data Analyst  | 0.41             |

Page 26: Small Data Classification for NLP

Round 3 (LSA)

Name     | Title         | Similarity Score | Persona
---------|---------------|------------------|---------
Luke J   | VP Marketing  | 1.0              | Value
Randy P  | Founder       | 0.68             | Security
Lucas M  | Growth Ninja  | 0.18             |
Bec G    | Tech Guru     | 0.86             | Security
Fiona F  | Sysops        | 1.0              | Security
Claude S | Growth Hacker | 0.89             | Value
Art L    | Data Analyst  | 0.72             | Security

Page 27: Small Data Classification for NLP

Round 4 (LSA)

Name     | Title         | Similarity Score | Persona
---------|---------------|------------------|---------
Luke J   | VP Marketing  | 1.0              | Value
Randy P  | Founder       | 0.71             | Security
Lucas M  | Growth Ninja  | 0.16             | Unknown
Bec G    | Tech Guru     | 0.88             | Security
Fiona F  | Sysops        | 1.0              | Security
Claude S | Growth Hacker | 0.91             | Value
Art L    | Data Analyst  | 0.78             | Security

Page 28: Small Data Classification for NLP

Example

Page 29: Small Data Classification for NLP

Example

Page 30: Small Data Classification for NLP

Example

Page 31: Small Data Classification for NLP

Takeaways

• Human-generated data is never really random

• Small data models are hyper-sensitive

• Validate assumptions