GPU Accelerated Natural Language Processing, by Guillermo Moliní

Upload: big-data-spain

Post on 10-Jan-2017

Powered by WAVECRAFTERS

GPU Accelerated Natural Language Processing

Roadmap
• What’s NLP?
• Traditional Search.
• Modern NLP. Vector Embeddings.
• Speech to Text
• Demo

Natural Language Processing

Computational techniques used for analysing and representing text for the purpose of achieving human-like language processing.

Uses
• Searching
• Information Extraction
• Summarization
• Question Answering
• Customer Interaction
• Sentiment Analysis
• Speech to Text

How does traditional searching work?
• Stemming
• Synonyms
• Tags
• Misspelling support
• Ranking

An Example

A seal

Traditional text search

It's just matching!
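To make the "just matching" point concrete, here is a minimal sketch of keyword search with stemming, synonyms and ranking. The stemmer, the synonym table and the documents are all illustrative assumptions, not the speakers' system. Note that a query for "seal" matches the animal and the wax stamp alike, because only the tokens are compared, never the meaning.

```python
# Toy keyword search: stemming + synonyms + term-count ranking.
# The stemmer, synonym table and documents are illustrative assumptions.

SYNONYMS = {"stamp": "seal"}  # hypothetical synonym table


def stem(word):
    """Very crude stemmer: strip a few common English suffixes."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text):
    """Lowercase, strip punctuation, map synonyms, then stem."""
    tokens = []
    for raw in text.lower().split():
        word = raw.strip(".,!?")
        word = SYNONYMS.get(word, word)
        tokens.append(stem(word))
    return tokens


def search(query, documents):
    """Rank documents by how many of their tokens match a query term."""
    terms = set(normalize(query))
    scored = []
    for doc in documents:
        score = sum(1 for tok in normalize(doc) if tok in terms)
        if score > 0:
            scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]


DOCS = [
    "Seals swim near the harbour.",
    "The king sealed the letter with a wax seal.",
    "Football fans filled the stadium.",
]

results = search("seal", DOCS)
```

Both the animal document and the wax-stamp document come back, ranked only by how often the stemmed token occurs.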

How do Vector Embeddings work?

Seal → (0.13, -0.01, 0.56, 0.32, 0.39, -0.79, 0.86, 0.55, 0.22, 0.19)

Vector of n dimensions

Powered by WAVECRAFTERS

How do Vector Embeddings work? (II) Training

How do Vector Embeddings work? (III) Training
• Different training algorithms: GloVe (Pennington, Socher and Manning, Stanford University), Word2Vec (Mikolov et al., Google), Doc2Vec (Le and Mikolov, Google).
• We will shortly be releasing our own GPU-based version of GloVe as open source.

How do Vector Embeddings work? (IV)
• The cosine of the angle between two vectors gives us their semantic closeness.

Figure: two pairs of word vectors, Ball and Mars, Ball and Football.

But we can also do much more! Adding, subtracting…
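As a rough illustration of cosine closeness, the sketch below compares made-up 3-dimensional vectors; real embeddings have many more dimensions, and these values are assumptions chosen only so that "ball" points in nearly the same direction as "football" and away from "mars".

```python
import math


def cosine(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Made-up toy embeddings (assumption, not trained vectors).
EMBEDDINGS = {
    "ball": [0.9, 0.8, 0.1],
    "football": [0.8, 0.9, 0.2],
    "mars": [0.1, 0.2, 0.9],
}

close = cosine(EMBEDDINGS["ball"], EMBEDDINGS["football"])
far = cosine(EMBEDDINGS["ball"], EMBEDDINGS["mars"])
print(f"ball/football: {close:.2f}, ball/mars: {far:.2f}")
```

A value near 1 means the two words point in almost the same direction; a value near 0 means they are unrelated in this toy space.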

How do Vector Embeddings work? (VI)

Figure labels: United Kingdom, London, Spain, Madrid.
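The adding and subtracting idea can be sketched with the country and capital example: take the offset from a country to its capital and apply it to another country. The 2-dimensional vectors below are invented for illustration, and the nearest word is picked by Euclidean distance (`math.dist`, Python 3.8+) rather than cosine to keep the sketch short.

```python
import math

# Invented toy vectors (assumption): the capital sits one unit "above" its country.
VECTORS = {
    "united_kingdom": [1.0, 0.0],
    "london": [1.0, 1.0],
    "spain": [3.0, 0.5],
    "madrid": [3.0, 1.5],
    "ball": [-2.0, 4.0],
}


def analogy(a, b, c, vectors):
    """Return the word closest to vec(b) - vec(a) + vec(c), excluding a, b and c."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return min(candidates, key=lambda w: math.dist(vectors[w], target))


# "United Kingdom is to London as Spain is to ...?"
answer = analogy("united_kingdom", "london", "spain", VECTORS)
```

With these toy values the answer is "madrid"; with trained embeddings the match is approximate rather than exact.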

Why aren’t Vector Embeddings widespread?
• Steep learning curve. Math can be complicated.
• Lots of computational power needed. Slow and expensive.

Our solution: GPU Computing

Advantages of GPUs (II)

Chart: semantic closeness to 10,000,000 documents. Lower is better!
• GPU execution time: 115 ms
• CPU execution time: 11,632 ms
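The two timings above can be turned into a speedup and a throughput figure with a few lines of arithmetic. The helper name `throughput` is an assumption introduced here; the only inputs are the numbers on the slide.

```python
GPU_MS = 115            # GPU execution time from the slide
CPU_MS = 11632          # CPU execution time from the slide
N_DOCS = 10_000_000     # documents compared against the query


def throughput(n_docs, elapsed_ms):
    """Documents scored per second for a given elapsed time in milliseconds."""
    return n_docs / (elapsed_ms / 1000.0)


speedup = CPU_MS / GPU_MS
print(f"speedup: {speedup:.1f}x")  # speedup: 101.1x
print(f"GPU: {throughput(N_DOCS, GPU_MS):,.0f} docs/s")
print(f"CPU: {throughput(N_DOCS, CPU_MS):,.0f} docs/s")
```

In other words, the GPU run is roughly two orders of magnitude faster on this workload, which is plausible because each document's similarity to the query can be computed independently of the others.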

Demo

Searching in a database of web scraped news.

Speech to Text.

Ability to automatically transcribe video / audio into its written form.

Speech to Text. Uses
• Information Extraction
• Closed Captioning
• Summarization
• Searching

Speech To Text (I). Get the phonemes.

Phoneme fragments on the slide: the, co, tea, mema, lo.

Speech To Text (II). From Phonemes to Words

Candidate words on the slide: A, Ball, Ballet, Bull, Market, Mars, Marsh.

Probabilities on the slide: 0.15, 0.12, 0.10, 0.58, 0.24, 0.13, 0.05, 0.03, 0.07, 0.56.

Speech To Text (III). From Phonemes to Words

Phoneme fragments on the slide: the, co, tea, mema, lo.

Dictionary of probabilities

A ball market is a good chance for investors
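One way to picture the dictionary-of-probabilities step is a greedy pick of the most probable word for each slot of the utterance. The slots and probabilities below are invented for illustration (the slide's own word-to-probability pairing is not recoverable from the transcript), and real recognisers search over whole sequences rather than picking slot by slot.

```python
# Invented acoustic scores (assumption): for each slot, candidate words
# and the probability that the phonemes heard correspond to that word.
LATTICE = [
    {"a": 0.90, "the": 0.10},
    {"ball": 0.55, "bull": 0.45},
    {"market": 0.70, "mars": 0.20, "marsh": 0.10},
]


def greedy_decode(lattice):
    """Pick the highest-probability candidate independently in every slot."""
    return [max(slot, key=slot.get) for slot in lattice]


words = greedy_decode(LATTICE)
print(" ".join(words))  # a ball market
```

Because "ball" sounds marginally more likely than "bull" in isolation, the greedy pick produces the wrong word even though the sentence is about finance.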

Speech to Text (IV)

A ball market is a good chance for investors

Speech to Text (IV)

A bull market is a good chance for investors

Speech to Text (VI). Improving the Error Rate
• Do several rounds of processing.
• In each one, use NLP to find out the theme of the conversation, then produce a new Language Model (Dictionary) that fits the theme.
• Reprocess the input.
• Costly and slow!
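The rounds described above can be sketched as a loop: decode, guess the theme from the transcript, reweight the candidates with a theme-specific dictionary, and decode again. Everything here is a hypothetical stand-in: the keyword-overlap theme detector, the `THEME_LM` weights and the probabilities are assumptions for illustration, not the speakers' implementation.

```python
# Invented acoustic scores per slot (assumption).
LATTICE = [
    {"a": 0.90, "the": 0.10},
    {"ball": 0.55, "bull": 0.45},
    {"market": 0.80, "marsh": 0.20},
    {"investors": 0.90, "investor": 0.10},
]

# Hypothetical theme detector data: keywords that signal each theme.
THEME_KEYWORDS = {
    "finance": {"market", "investors"},
    "sports": {"football", "goal"},
}

# Hypothetical per-theme language model: multipliers applied to word scores.
THEME_LM = {
    "finance": {"bull": 3.0, "ball": 0.5},
    "sports": {"ball": 3.0, "bull": 0.5},
}


def decode(lattice, boosts=None):
    """Pick the best word per slot after applying optional theme multipliers."""
    boosts = boosts or {}
    return [max(slot, key=lambda w: slot[w] * boosts.get(w, 1.0)) for slot in lattice]


def detect_theme(words):
    """Choose the theme whose keywords overlap most with the transcript."""
    return max(THEME_KEYWORDS, key=lambda t: len(THEME_KEYWORDS[t] & set(words)))


def multi_pass(lattice, rounds=2):
    """First pass without a theme, then re-decode with the detected theme's model."""
    words = decode(lattice)
    for _ in range(rounds - 1):
        theme = detect_theme(words)
        words = decode(lattice, THEME_LM[theme])
    return words


first_pass = decode(LATTICE)
final = multi_pass(LATTICE)
```

With these toy values the first pass yields "a ball market investors", the theme is detected as finance, and the second pass yields "a bull market investors". Each extra round repeats the full decode, which is where the cost noted on the slide comes from.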

Q&A

Vicente Cuéllar

Guillermo Moliní