Text mining lab (summer 2017) - Word Vector Representation

Summer 2017
Elvis Saravia
PhD, Information Systems and Applications
[email protected]
GitHub username: omarsar
Questions: sli.do (#Z217)

Uploaded by elvis-saravia on 21-Jan-2018

Page 1: Text mining lab (summer 2017) - Word Vector Representation

Page 3

● Knowledge Discovery (KDD) Process

Page 6

ConceptNet

Page 8

Motel = [0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0]
Hotel = [0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0]

One-hot representation
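The one-hot vectors above can be reproduced and compared directly. A minimal sketch, assuming NumPy and the 16-word vocabulary implied by the slide's vectors; `one_hot` is a hypothetical helper. Because any two distinct one-hot vectors are orthogonal, their dot product is 0, so this encoding says nothing about motel and hotel being related.

```python
import numpy as np

VOCAB_SIZE = 16  # vocabulary size implied by the slide's 16-element vectors


def one_hot(index, size=VOCAB_SIZE):
    """Hypothetical helper: a zero vector with a single 1 at `index`."""
    vec = np.zeros(size, dtype=int)
    vec[index] = 1
    return vec


motel = one_hot(8)  # 1 at position 8, as on the slide
hotel = one_hot(5)  # 1 at position 5, as on the slide

print(motel)
print(hotel)
# Distinct one-hot vectors never overlap, so their dot product is always 0:
# "motel" looks exactly as unrelated to "hotel" as to any other word.
print(int(motel @ hotel))
```

The vector length also grows with the vocabulary, which is a second reason to move to the dense representation on the next slide.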

Page 9

hotel = [0.728 0.234 -0.23 0.223]

Distributed representation (low-dimensional vector)
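Dense vectors make similarity measurable. A minimal sketch, assuming NumPy: the `hotel` vector is copied from the slide, while the `motel` and `banana` vectors are invented for illustration and do not come from any trained model. Cosine similarity (the cosine of the angle between two vectors, always in [-1, 1]) is the usual way to compare word vectors; in a trained model, related words such as hotel and motel would be expected to score higher than unrelated ones.

```python
import numpy as np

# "hotel" is copied from the slide (4 dimensions).
hotel = np.array([0.728, 0.234, -0.23, 0.223])
# Hypothetical vectors, invented for illustration only (not from the slides
# or from any trained model).
motel = np.array([0.701, 0.198, -0.25, 0.310])
banana = np.array([-0.412, 0.655, 0.501, -0.120])


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; the result lies in [-1, 1]."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


print(round(cosine_similarity(hotel, motel), 3))   # vectors pointing the same way
print(round(cosine_similarity(hotel, banana), 3))  # vectors pointing apart
```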

Page 10

Paper source: https://arxiv.org/pdf/1301.3781.pdf

Page 11

Paper source: https://arxiv.org/pdf/1301.3781.pdf

Feedforward Neural Net Language Model (NNLM)

variables to optimize
denotes window range
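The cited paper (arXiv:1301.3781) compares models by their per-example training cost Q. The sketch below transcribes two of its cost formulas: the feedforward NNLM, Q = N·D + N·D·H + H·V (with the H·V output term reduced to H·log2(V) under hierarchical softmax), and skip-gram with hierarchical softmax, Q = C·(D + D·log2(V)). Here N is the number of context words, D the vector size, H the hidden-layer size, V the vocabulary size and C the maximum context distance. The sizes plugged in are arbitrary illustrative values, not numbers from the slides.

```python
import math


def nnlm_cost(n, d, h, v, hierarchical_softmax=False):
    """Per-example training cost Q of the feedforward NNLM.

    Full softmax:         Q = N*D + N*D*H + H*V
    Hierarchical softmax: the H*V output term becomes H*log2(V).
    """
    output = h * math.log2(v) if hierarchical_softmax else h * v
    return n * d + n * d * h + output


def skipgram_cost(c, d, v):
    """Per-example training cost Q of skip-gram with hierarchical softmax:
    Q = C * (D + D*log2(V)), where C is the maximum context distance."""
    return c * (d + d * math.log2(v))


# Illustrative sizes (not from the slides): 10 context words, 500-dim vectors,
# 500 hidden units and a vocabulary of 2**20 (about one million) words.
N, D, H, V, C = 10, 500, 500, 2**20, 10
print("NNLM, full softmax:        ", nnlm_cost(N, D, H, V))
print("NNLM, hierarchical softmax:", nnlm_cost(N, D, H, V, hierarchical_softmax=True))
print("Skip-gram:                 ", skipgram_cost(C, D, V))
```

With these values, even after hierarchical softmax the NNLM's N·D·H hidden-layer term (2,500,000) dominates its cost, and that non-linear hidden layer is exactly what skip-gram drops.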

Page 13

$$P(\text{the} \mid \text{over})\,P(\text{fox} \mid \text{over})\,P(\text{jumped} \mid \text{over})\,P(\text{the} \mid \text{over})\,P(\text{lazy} \mid \text{over})\,P(\text{dog} \mid \text{over})$$
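The product of factors above is the skip-gram view of one training position: the center word "over" should predict each word around it. A minimal sketch of how such (center, context) pairs can be generated, assuming the sentence "the fox jumped over the lazy dog" and a window of 3 words on each side; both are inferred from the six factors rather than stated on the slide, and the helper names are hypothetical.

```python
from __future__ import annotations


def context_window(tokens: list[str], center: int, window: int) -> list[str]:
    """Words within `window` positions of tokens[center], excluding the center."""
    left = tokens[max(0, center - window):center]
    right = tokens[center + 1:center + 1 + window]
    return left + right


def skipgram_pairs(tokens: list[str], window: int) -> list[tuple[str, str]]:
    """Every (center, context) pair a skip-gram model would train on."""
    return [
        (tokens[i], ctx)
        for i in range(len(tokens))
        for ctx in context_window(tokens, i, window)
    ]


sentence = "the fox jumped over the lazy dog".split()
center = sentence.index("over")
print(context_window(sentence, center, window=3))
# ['the', 'fox', 'jumped', 'the', 'lazy', 'dog']
```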

$P(v_{\text{OUT}} \mid v_{\text{IN}})$: how do we define this probability distribution?

Determines similarity in [-1, 1]

Get a probability in [0, 1] out of a similarity in [-1, 1]
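The standard word2vec answer to this question is a softmax over dot products: P(o | c) = exp(u_o · v_c) / Σ_w exp(u_w · v_c), where v_c is the input (center) vector and u_o, u_w are output (context) vectors. The sketch below assumes NumPy, uses random vectors as stand-ins for trained ones, and computes these probabilities for the center word "over" together with the log of the product above. Note that a dot product is only confined to [-1, 1] for unit-length vectors (that is, cosine similarity); softmax accepts any real score, so the probabilities are valid either way.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "fox", "jumped", "over", "lazy", "dog"]
DIM = 8
# Random stand-ins for learned parameters: one input (center) vector and one
# output (context) vector per vocabulary word.
V_in = rng.normal(scale=0.1, size=(len(vocab), DIM))
U_out = rng.normal(scale=0.1, size=(len(vocab), DIM))


def softmax(scores):
    """Turn arbitrary real scores into probabilities that are >= 0 and sum to 1."""
    exp = np.exp(scores - scores.max())  # shifting by the max avoids overflow
    return exp / exp.sum()


def p_out_given_in(center):
    """P(v_OUT | v_IN = center) for every word in the vocabulary."""
    v_c = V_in[vocab.index(center)]
    scores = U_out @ v_c  # dot-product similarity of each output vector with v_c
    return dict(zip(vocab, softmax(scores)))


probs = p_out_given_in("over")
# Log of P(the|over) P(fox|over) ... P(dog|over); training adjusts V_in and
# U_out to make this log-likelihood as large as possible.
context = ["the", "fox", "jumped", "the", "lazy", "dog"]
log_likelihood = float(sum(np.log(probs[w]) for w in context))
print({w: round(float(p), 3) for w, p in probs.items()})
print(log_likelihood)
```

Summing over the whole vocabulary in the denominator is expensive for large V, which is why practical implementations replace the full softmax with approximations.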

Page 15

https://www.healthvault.com/en-us/health-bot/

Page 17

● https://goo.gl/ppHX65

  ○ Gensim guide for word2vec: https://goo.gl/i2UrdH

● https://goo.gl/7b72S9

● https://goo.gl/uNJDrs
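The Gensim guide above trains word2vec through a library call. To show what such training does internally, here is a from-scratch NumPy sketch of skip-gram with a full softmax on a made-up three-sentence corpus; the corpus, window, dimension, learning rate and epoch count are all arbitrary choices, and `most_similar` is a hypothetical helper. Gensim's implementation differs (it uses optimizations such as negative sampling or hierarchical softmax and runs in compiled code), so treat this as an illustration rather than a substitute.

```python
import numpy as np

# Hypothetical toy corpus, for illustration only.
corpus = [
    "the fox jumped over the lazy dog".split(),
    "the dog slept in the hotel".split(),
    "the fox slept in the motel".split(),
]
vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}
WINDOW, DIM, LR, EPOCHS = 2, 10, 0.05, 200

# (center, context) index pairs within WINDOW positions of each other.
pairs = [
    (idx[s[i]], idx[s[j]])
    for s in corpus
    for i in range(len(s))
    for j in range(max(0, i - WINDOW), min(len(s), i + WINDOW + 1))
    if j != i
]

rng = np.random.default_rng(42)
W_in = rng.normal(scale=0.1, size=(len(vocab), DIM))   # center vectors (the embeddings)
W_out = rng.normal(scale=0.1, size=(len(vocab), DIM))  # context (output) vectors

losses = []
for epoch in range(EPOCHS):
    total = 0.0
    for c, o in pairs:
        v_c = W_in[c]
        scores = W_out @ v_c
        p = np.exp(scores - scores.max())
        p /= p.sum()                       # softmax over the whole vocabulary
        total += -np.log(p[o])             # cross-entropy loss for this pair
        grad_scores = p.copy()
        grad_scores[o] -= 1.0              # d(loss)/d(scores) for softmax + cross-entropy
        grad_v_c = W_out.T @ grad_scores   # computed before W_out is updated
        W_out -= LR * np.outer(grad_scores, v_c)
        W_in[c] -= LR * grad_v_c
    losses.append(total / len(pairs))


def most_similar(word, topn=3):
    """Rank other words by cosine similarity of their learned center vectors."""
    normed = W_in / np.linalg.norm(W_in, axis=1, keepdims=True)
    sims = normed @ normed[idx[word]]
    order = [i for i in np.argsort(-sims) if i != idx[word]]
    return [(vocab[i], round(float(sims[i]), 3)) for i in order[:topn]]


print(f"average loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
print(most_similar("hotel"))
```

On a corpus this small the neighbour lists are noisy; the point is the mechanics (pairs, softmax, gradient step), not the quality of the vectors.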

Page 24

● https://goo.gl/KYacjz

● https://goo.gl/JezgYg

Page 25

a. Build API: (Flask/Django recommended)
b. Pretrained models: (Guide: https://goo.gl/5qt2Ki)
c. Visualization: d3js / plotly / tensorboard
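For the visualization item, word vectors usually have to be reduced to two dimensions before they can be plotted. PCA is a common choice (t-SNE is another); the sketch below assumes NumPy, implements PCA with an SVD, and uses invented vectors apart from `hotel`, which is copied from the earlier slide. The resulting (word, x, y) rows could then be exported for plotly or d3js; TensorBoard's embedding projector can also perform this kind of projection itself.

```python
import numpy as np


def project_2d(vectors):
    """Project row vectors onto their first two principal components (PCA via SVD)."""
    centered = vectors - vectors.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


# "hotel" is from the slides; "motel" and "banana" are hypothetical vectors.
words = ["hotel", "motel", "banana"]
vectors = np.array([
    [0.728, 0.234, -0.23, 0.223],
    [0.701, 0.198, -0.25, 0.310],
    [-0.412, 0.655, 0.501, -0.120],
])
coords = project_2d(vectors)
for word, (x, y) in zip(words, coords):
    print(f"{word}\t{x:.3f}\t{y:.3f}")
```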

a. LSTM - (Guide: http://colah.github.io/posts/2015-08-Understanding-LSTMs/)
b. CNN - (Guide: https://goo.gl/PgLUs7)
c. RNN - (Guide: https://goo.gl/5L9kci)

a. Starting point: https://rare-technologies.com/word2vec-tutorial#app