Intro to Flair: Open Source NLP Framework
Alan Akbik, Zalando Research
Berlin ML Meetup, December 2018
alanakbik.github.io/talks/ML_Meetup_2018.pdf

TEXT DATA IN FASHION

[...] How I style the basics that fill my wardrobe changes from season to season. And city to city, too, come to think of it. In Berlin, I paired this dress with a moto jacket and ankle boots, while in Paris, I added an oversized hat and classic pumps. [...]

www.cocoandvera.com

DOCUMENT CLASSIFICATION

Task: automatically categorize your documents into one or more classes

Document 1: [...] quick delivery as always. Thank you very much! [...]
Class: DELIVERY-FAST

Document 2: [...] waited for three days until the package finally arrived! [...]
Class: DELIVERY-SLOW

SEQUENCE LABELING

Fashion NER

Fashion Entity Types: NamedLocation, NamedPerson, NominalProduct, NamedProduct, Look, NamedEvent, NamedOrganizationRetailer

How I style the basics that fill my wardrobe changes from season to season. And city to city, too, come to think of it. In Berlin, I paired this dress with a moto jacket and ankle boots, while in Paris, I added an oversized hat and classic pumps. For my evening shoot in downtown Winnipeg with Christa Wong, I chose all of my current wardrobe favourites, including Dior-inspired pumps from Zara and marble statement earrings from Olive + Piper. [...]

Entity types annotated in the example: NamedLocation, NominalProduct, NamedPerson, NamedOrganizationRetailer
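Sequence labeling assigns one tag to every token, and entity spans are then read off the tag sequence. The sketch below assumes the common BIO tagging scheme (the slides do not say which scheme is used), and the tags for the example sentence are hand-assigned for illustration only. The function name `bio_to_spans` is hypothetical.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (entity text, entity type) pairs from a BIO tag sequence."""
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None
    for token, tag in zip(tokens, tags):
        # A span continues only on an I- tag of the same entity type.
        continues = tag.startswith("I-") and current_type == tag[2:]
        if not continues and current_tokens:
            # Close the span that was open so far.
            spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
        if continues:
            current_tokens.append(token)
        elif tag.startswith("B-") or tag.startswith("I-"):
            # B- opens a span; a stray I- is treated leniently as an opener.
            current_tokens, current_type = [token], tag[2:]
    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))
    return spans


# Illustrative, hand-assigned tags (an assumption, not the talk's annotation).
tokens = ["In", "Berlin", ",", "I", "paired", "this", "dress",
          "with", "a", "moto", "jacket"]
tags = ["O", "B-NamedLocation", "O", "O", "O", "O", "B-NominalProduct",
        "O", "O", "B-NominalProduct", "I-NominalProduct"]

spans = bio_to_spans(tokens, tags)
print(spans)
```

With these tags the function returns three spans: "Berlin" as NamedLocation, and "dress" and "moto jacket" as NominalProduct.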

FLAIR FRAMEWORK

a very simple framework for state-of-the-art natural language processing (NLP)

● current state-of-the-art across many NLP tasks

● very simple to use

THIS TALK

1. New type of word embeddings

2. New state-of-the-art scores across sequence labeling tasks

Task          Our approach    Previous best
NER English   93.09 ± 0.12    92.22 ± 0.1 (Peters et al., 2018)
NER German    88.32 ± 0.2     78.76 (Lample et al., 2016)
Chunking      96.72 ± 0.05    96.37 ± 0.05 (Peters et al., 2017)
PoS tagging   97.85 ± 0.01    97.64 (Choi, 2016)

3. Introduce Flair framework

Talk Outline

Overview

Character-level neural language models

Limitations of classic word embeddings

Flair Embeddings

Comparative evaluation

Usage Example

Flair Framework

WORD EMBEDDINGS

Classic word embeddings learn a vector representation for each word in a fixed vocabulary

Problem: Words are just strings

Problem 1: Word ambiguity

● “Washington”
  ○ Last name
  ○ State / city
  ○ Sports team
  ○ …

● Classic word embeddings conflate all meanings into single vector

● Contextualized embeddings?

Problem 2: Fixed vocabulary

● What is a word? Tokenizer decides?
  ○ “48-year-old”
  ○ “Hotelzimmer” (hotel room)

● Long-tailed distribution of words
  ○ Rare words?
  ○ Out of vocabulary words?
  ○ “cooooooool”

● Meaningful embeddings for any word?
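Both problems can be seen in a toy lookup table. This is a minimal sketch, not any particular embedding library: the table `EMBEDDINGS`, its two-dimensional vectors, and the `<unk>` fallback are all made up for illustration.

```python
# Toy fixed-vocabulary embedding table; the numbers are arbitrary.
EMBEDDINGS = {
    "washington": [0.2, 0.7],
    "cool": [0.9, 0.1],
    "<unk>": [0.0, 0.0],
}


def embed(word):
    """Look up a word; every unknown string shares the single <unk> vector."""
    return EMBEDDINGS.get(word.lower(), EMBEDDINGS["<unk>"])


# Problem 1: the lookup never sees the context, so the vector is the same
# whether "Washington" is a last name or a place.
as_last_name = embed("Washington")   # e.g. "George Washington was ..."
as_place = embed("Washington")       # e.g. "She flew to Washington ..."
same_vector = as_last_name == as_place

# Problem 2: strings outside the fixed vocabulary collapse to <unk>.
oov_words = ["cooooooool", "Hotelzimmer", "48-year-old"]
oov_vectors = [embed(w) for w in oov_words]

print(same_vector, oov_vectors)
```

Here `same_vector` is `True`, and all three out-of-vocabulary strings receive the same uninformative vector even though "cooooooool" is clearly related to "cool".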


CONTEXTUAL STRING EMBEDDINGS

We propose contextual string embeddings that are:

● Contextualized by their usage in text

● Fundamentally model words as strings of characters

● Pre-trained on very large corpora

We produce these embeddings using neural character-level language modeling

NEURAL LANGUAGE MODELING

Language modeling:

● Train recurrent neural network (RNN) to predict the next word in a sequence of words

Character-level language modeling:

● Train RNN to predict the next character in a sequence of characters

● No tokenization

● Small vocabulary
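The difference between the two setups shows up in how training examples are built. In this minimal sketch, `str.split` stands in for a real tokenizer, and the helper `next_char_pairs` and its fixed `context_size` are assumptions for illustration (an RNN carries its context in a hidden state instead of a fixed window).

```python
text = "because it was hungry, the cat ate"

# Word-level view: a tokenizer has to decide what a word is.
# A plain whitespace split leaves "hungry," with its comma attached.
word_vocab = sorted(set(text.split()))

# Character-level view: no tokenization, the vocabulary is just the
# set of characters that occur in the text.
char_vocab = sorted(set(text))


def next_char_pairs(s, context_size=5):
    """Build (context, next character) training examples."""
    return [(s[i - context_size:i], s[i]) for i in range(context_size, len(s))]


pairs = next_char_pairs(text)
print(word_vocab)
print(char_vocab)
print(pairs[:3])
```

On a single sentence the two vocabularies are of similar size. The point of the slide is that the character vocabulary stays bounded by the alphabet as the corpus grows, while the word vocabulary keeps growing.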

because it was hungry, the cat ___
what is the next word? ate

because it was hungry, the cat ate ____
what is the next word? the

because it was hungry, the cat ate the ____
what is the next word?

The model learns

● Shallow syntax
  ○ nouns, verbs, adjectives
  ○ tense, number

● Sentence-level syntax
  ○ constituents
  ○ subordinate clauses
  ○ punctuation, capitalization

● Shallow semantics
  ○ sentiment
  ○ topic

WHAT DOES THIS NEURAL LANGUAGE MODEL KNOW?

We can sample the LM to generate text:

(1) According to a giant external film crew , the visible food contained " weirdness or unknown " firestorms .

(2) According to ADA attorney Stacy Baileil , prosecutors have requested that all of the county bend to him rather than require him to accept the legal fees .

(3) Iran 's Deputy Marine Ministry inspector general last week criticised security forces for testing changes in a military base when attackers began putting metal plates in , he said .
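Sampling means repeatedly drawing the next character from the model's predicted distribution and appending it to the output. The sketch below swaps the RNN for a count-based character bigram model so that it runs with the standard library only. The tiny corpus and the names `train_char_bigram` and `sample` are made up for illustration, and the output is far cruder than the samples above.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy corpus; a real LM is trained on very large corpora.
corpus = "because it was hungry, the cat ate. because it was late, the cat slept."


def train_char_bigram(text):
    """Count which character follows which (a stand-in for a trained RNN)."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1
    return counts


def sample(counts, start, length, seed=0):
    """Generate text by drawing each next character in proportion to its count."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length):
        followers = counts.get(out[-1])
        if not followers:
            break  # no observed continuation for this character
        chars = list(followers.keys())
        weights = list(followers.values())
        out.append(rng.choices(chars, weights=weights, k=1)[0])
    return "".join(out)


model = train_char_bigram(corpus)
generated = sample(model, start="b", length=40)
print(generated)
```

A bigram model only conditions on one previous character, so its samples are gibberish at the word level. The recurrent model in the talk conditions on the whole preceding sequence, which is why its samples are largely well-formed sentences.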

INTERNAL LM REPRESENTATIONS

Model represents syntactic and semantic properties!

(Radford et al., 2017)

PROPOSED APPROACH

● Pass sentence as sequence of characters into two character-level language models

● Retrieve the internal states before first and after last character for each word

● Combine forward and backward states to form embedding
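The three steps above mostly come down to index arithmetic over the character sequence. This sketch shows one reading of them: the forward state is taken after a word's last character, and the backward state after the backward model has read from the end of the sentence back to the word's first character. `toy_lm_states` is a hypothetical stand-in for a trained character LM, with a two-dimensional state and arbitrary update rule.

```python
import re


def toy_lm_states(chars):
    """Stand-in for a character LM: states[i] summarizes chars[:i]."""
    state = (0.0, 0.0)
    states = [state]
    for ch in chars:
        # Arbitrary recurrence so that each state depends on all earlier characters.
        state = (
            0.5 * state[0] + (ord(ch) % 7) / 7.0,
            0.5 * state[1] + (ord(ch) % 11) / 11.0,
        )
        states.append(state)
    return states


def contextual_string_embeddings(sentence):
    """Return (word, embedding) pairs for whitespace-separated words."""
    n = len(sentence)
    forward = toy_lm_states(sentence)         # forward[i] has read sentence[:i]
    backward = toy_lm_states(sentence[::-1])  # backward[j] has read the last j characters
    result = []
    for match in re.finditer(r"\S+", sentence):
        start, end = match.span()
        fwd = forward[end]         # state after the word's last character
        bwd = backward[n - start]  # state after reading back to the word's first character
        result.append((match.group(), fwd + bwd))  # concatenate both states
    return result


first = dict(contextual_string_embeddings("Washington was born"))
second = dict(contextual_string_embeddings("went to Washington"))
print(first["Washington"])
print(second["Washington"])
```

The two vectors for "Washington" differ because each one is computed from the surrounding characters, and a string such as "cooooooool" would get an embedding too, since nothing is looked up in a word vocabulary.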

TRANSFER LEARNING

Supervised task

Unsupervised task

COMPARATIVE EVALUATION

Tasks:

● CoNLL-03 Named Entity Recognition for English and German

● CoNLL-2000 Chunking

● WSJ Part-of-Speech Tagging

Setup:

● BiLSTM-CRF architecture (Huang et al., 2015)
  ○ Only classic word embeddings (Huang et al., 2015)
  ○ Word and character embeddings (Lample et al., 2016)
  ○ ELMo embeddings (Peters et al., 2017; 2018)
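In a BiLSTM-CRF, the CRF layer picks the best tag sequence jointly rather than token by token, which is usually done with Viterbi decoding. The sketch below shows only that decoding step. All scores are invented for illustration (in a real tagger the per-token scores come from the BiLSTM and the transition scores are learned), and the tag set and example tokens are assumptions.

```python
def viterbi(emissions, transitions, tags):
    """Return the highest-scoring tag sequence.

    emissions: one {tag: score} dict per token.
    transitions: {(previous tag, current tag): score}, default 0.0.
    """
    # best[t][tag] = (score of the best path ending in tag at position t, previous tag)
    best = [{tag: (emissions[0][tag], None) for tag in tags}]
    for t in range(1, len(emissions)):
        column = {}
        for cur in tags:
            prev_tag, score = max(
                ((p, best[t - 1][p][0] + transitions.get((p, cur), 0.0)) for p in tags),
                key=lambda item: item[1],
            )
            column[cur] = (score + emissions[t][cur], prev_tag)
        best.append(column)
    # Backtrack from the best final tag.
    path = [max(tags, key=lambda tag: best[-1][tag][0])]
    for t in range(len(emissions) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    return list(reversed(path))


tags = ["O", "B-LOC", "I-LOC"]
# Hypothetical per-token scores for the tokens "in", "New", "York".
emissions = [
    {"O": 2.0, "B-LOC": 0.1, "I-LOC": 0.0},
    {"O": 0.5, "B-LOC": 1.0, "I-LOC": 1.2},
    {"O": 0.3, "B-LOC": 0.2, "I-LOC": 1.5},
]
# Hypothetical transition score: an I- tag directly after O is heavily penalized.
transitions = {("O", "I-LOC"): -10.0}

decoded = viterbi(emissions, transitions, tags)
greedy = [max(tags, key=lambda tag: e[tag]) for e in emissions]
print(decoded)  # ['O', 'B-LOC', 'I-LOC']
print(greedy)   # ['O', 'I-LOC', 'I-LOC']
```

Picking the best tag per token independently yields an I-LOC that follows O, which is not a valid span. Joint decoding trades a slightly lower score on the second token for a consistent sequence.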

RESULTS

EVALUATION RESULTS

Takeaways [1]:

● Combination of Contextual String Embeddings and Classic Word Embeddings consistently gives us state-of-the-art results

● Task-trained character-level features not necessary

● Character-level LM embeddings match or outperform word-level LM embeddings (ELMo)

● Also state-of-the-art for Polish [2]

[1] Contextual String Embeddings for Sequence Labeling. Alan Akbik, Duncan Blythe, Roland Vollgraf. 27th International Conference on Computational Linguistics, COLING 2018.

[2] Approaching nested named entity recognition with parallel LSTM-CRFs. Łukasz Borchmann, Andrzej Gretkowski, Filip Graliński. Proceedings of the PolEval 2018 Workshop, PolEval 2018.

Talk Outline

Overview

Character-level neural language models

Limitations of classic word embeddings

Contextual String Embeddings

Results of comparative evaluation

Baselines and experimental setup

Sequence Labeling Experiments

OPEN SOURCE RELEASE

Flair: a very simple framework for state-of-the-art NLP

pip install flair

Flair is:

● A Python library installable through pip

● Built on PyTorch

● Currently at version 0.3.2

Use Flair to:

● Apply our pre-trained taggers on your text

● Train your own NLP models

TAG A SENTENCE

from flair.data import Sentence
from flair.models import SequenceTagger

# make a sentence
sentence = Sentence('I love Berlin .')

# load the NER tagger
tagger = SequenceTagger.load('ner')

# run NER over sentence
tagger.predict(sentence)

print(sentence.to_tagged_string())

I love Berlin <S-LOC> .

print(sentence)

Sentence: "I love Berlin ." - 4 Tokens

SPAN ANNOTATIONS

# make a sentence
sentence = Sentence('George Washington was born in Washington .')

# run NER over sentence
tagger.predict(sentence)

# print each entity span found by the tagger
for entity in sentence.get_spans('ner'):
    print(entity)

PER-span [1,2]: "George Washington"

LOC-span [6]: "Washington"
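
The `<S-LOC>` tag and the span output are consistent with a BIOES-style tagging scheme, in which `S-` marks a single-token entity and `B-`/`I-`/`E-` mark the beginning, inside and end of a multi-token entity. As a minimal sketch in plain Python (not Flair's own implementation; the function name `bioes_to_spans`, the shorter example sentence and its tags are hypothetical), per-token tags could be grouped into spans with 1-based token positions like this:

```python
def bioes_to_spans(tokens, tags):
    """Group BIOES tags into (label, [1-based token positions], text) spans."""
    spans = []
    current = None  # (label, positions, words) of the span being built
    for position, (token, tag) in enumerate(zip(tokens, tags), start=1):
        prefix, _, label = tag.partition("-")
        if prefix == "S":
            # single-token entity
            spans.append((label, [position], token))
            current = None
        elif prefix == "B":
            # start of a multi-token entity
            current = (label, [position], [token])
        elif prefix in ("I", "E") and current is not None and current[0] == label:
            current[1].append(position)
            current[2].append(token)
            if prefix == "E":
                spans.append((current[0], current[1], " ".join(current[2])))
                current = None
        else:
            # "O" or an inconsistent tag sequence: drop any unfinished span
            current = None
    return spans


tokens = "George Washington went to Washington .".split()
tags = ["B-PER", "E-PER", "O", "O", "S-LOC", "O"]  # hypothetical tagger output
for label, positions, text in bioes_to_spans(tokens, tags):
    print(f'{label}-span [{",".join(map(str, positions))}]: "{text}"')
```

The sketch silently drops spans that never reach an `E-` tag; a real decoder may prefer to repair or report such sequences.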

EMBED A SENTENCE

from flair.data import Sentence
from flair.embeddings import WordEmbeddings

# init embedding
glove_embedding = WordEmbeddings('glove')

# create sentence
sentence = Sentence('The grass is green .')

# embed the sentence using GloVe
glove_embedding.embed(sentence)

FLAIR, ELMO AND BERT EMBEDDINGS

from flair.embeddings import FlairEmbeddings, ELMoEmbeddings, BertEmbeddings, StackedEmbeddings

# contextual string embeddings
flair_embedding = FlairEmbeddings('news-forward')

# ELMo embeddings (Peters et al., 2018)
elmo_embedding = ELMoEmbeddings('medium')

# Google's BERT embeddings (Devlin et al., 2018)
bert_embedding = BertEmbeddings('large-uncased')

# stacked embeddings
embedding = StackedEmbeddings([flair_embedding, elmo_embedding, bert_embedding])
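
Conceptually, stacking combines several embeddings of the same token into one longer vector. A minimal sketch in plain Python, assuming that stacking means per-token concatenation (the toy `embed_*` functions and their dimensions are hypothetical stand-ins, not Flair classes):

```python
def embed_lowercase_length(token):
    # toy 2-dimensional "embedding": token length and whether it is lowercase
    return [float(len(token)), 1.0 if token.islower() else 0.0]


def embed_first_char(token):
    # toy 1-dimensional "embedding": code point of the first character
    return [float(ord(token[0]))]


def stack(embedders, tokens):
    """Concatenate the vectors from every embedder, token by token."""
    stacked = []
    for token in tokens:
        vector = []
        for embed in embedders:
            vector.extend(embed(token))
        stacked.append(vector)
    return stacked


vectors = stack([embed_lowercase_length, embed_first_char], ["The", "grass"])
print(len(vectors[0]))  # the stacked dimension is the sum of the parts (2 + 1)
```

Because the parts are simply concatenated, the downstream model sees every embedding type at once and its input size grows with each embedding added to the stack.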

TRAIN YOUR OWN MODELS

Data fetchers

● Automatically download publicly available NLP datasets

● Data readers for common NLP formats

Model trainer

● Training mechanisms: annealing, checkpointing, restarts, etc.

● Automatic hyperparameter selection
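
One common reading of "annealing" in a model trainer is learning-rate annealing on a plateau: the learning rate is reduced whenever the validation score stops improving. A minimal sketch in plain Python under that assumption; the function name and the `factor` and `patience` values are hypothetical and are not taken from Flair's trainer:

```python
def anneal_on_plateau(dev_scores, learning_rate=0.1, factor=0.5, patience=2):
    """Return the learning rate used in each epoch.

    The rate is multiplied by `factor` once the dev score has failed to
    improve for more than `patience` consecutive epochs.
    """
    best = float("-inf")
    bad_epochs = 0
    rates = []
    for score in dev_scores:
        rates.append(learning_rate)
        if score > best:
            best = score
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs > patience:
                learning_rate *= factor
                bad_epochs = 0
    return rates


# hypothetical dev F1 scores over eight epochs
scores = [0.80, 0.85, 0.84, 0.85, 0.83, 0.86, 0.86, 0.85]
print(anneal_on_plateau(scores))
```

Checkpointing and restarts would wrap the same loop: save the model whenever `best` improves, and reload that saved state before continuing at the reduced rate.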

Tutorials online to get you started

JOIN THE TEAM!

Help develop it

● Growing number of contributors

● New features / bug fixes / languages

● Frequent releases

Flair is on GitHub

Use it

● Install through pip or clone

THANK YOU!

Questions?

(BTW: we’re hiring!)

Text Data Mining Research

At Zalando

Dr. Alan Akbik
Zalando Research

Backup Slides

ZALANDO AT A GLANCE

● ~4.4 billion EUR net sales in 2017

● ~214 million visits per month

● ~16,000 employees in Europe

● >50% return rate across all categories

● ~24 million active customers

● ~300,000 product choices

● >2,000 brands

● 17 countries

TRAINING CHARACTER LANGUAGE MODELS

Hidden states, layers

● 1 GPU, 1 week

ELMo model:

● 32 GPUs, 5 weeks

QUALITATIVE INSPECTION

DIRECT PROJECTION

DISCLAIMER

This presentation and its contents are strictly confidential. It may not, in whole or in part, be reproduced, redistributed, published or passed on to any other person by the recipient.

The information in this presentation has not been independently verified. No representation or warranty, express or implied, is made as to the accuracy or completeness of the presentation and the information contained herein and no reliance should be placed on such information. No responsibility is accepted for any liability for any loss howsoever arising, directly or indirectly, from this presentation or its contents.