Introduction to Artificial Intelligence: CoreNLP, Semantic Analysis, Naive Bayes Classifier
Janyl Jumadinova
November 18, 2016



CoreNLP

- Reference: http://stanfordnlp.github.io/CoreNLP/
- Package available in /opt/corenlp/
- Run:
    java -cp "/opt/corenlp/stanford-corenlp-3.7.0/*" -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner -file input.txt


CoreNLP Annotators

http://stanfordnlp.github.io/CoreNLP/annotators.html

- tokenize: Splits the given text into tokens.
- ssplit: Splits a sequence of tokens into sentences.
- pos: Assigns part-of-speech (POS) tags to tokens.
- ner: Performs named entity recognition (NER) on tokens.
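To make the first two annotators concrete, here is a minimal Python sketch of tokenization followed by sentence splitting. It is not how CoreNLP implements `tokenize` or `ssplit`; the regular expression and the function names `tokenize` and `split_sentences` are assumptions chosen only for illustration.

```python
import re

def tokenize(text):
    """Split text into word and punctuation tokens (rough stand-in for `tokenize`)."""
    # \w+(?:'\w+)? keeps contractions such as "didn't" together;
    # [^\w\s] emits every other punctuation mark as its own token.
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)

def split_sentences(tokens):
    """Group tokens into sentences, closing one at '.', '!' or '?' (rough stand-in for `ssplit`)."""
    sentences, current = [], []
    for token in tokens:
        current.append(token)
        if token in {".", "!", "?"}:
            sentences.append(current)
            current = []
    if current:  # keep trailing tokens that lack final punctuation
        sentences.append(current)
    return sentences

tokens = tokenize("Stanford is in California. I didn't know that!")
sentences = split_sentences(tokens)
```

Real sentence splitters also have to cope with abbreviations and decimal numbers, which this sketch would split incorrectly.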


- lemma: Generates word lemmas for tokens.
    - The goal of lemmatization (as of stemming) is to reduce related forms of a word to a common base form.
    - Lemmatization usually uses a vocabulary and morphological analysis of words to:
        - remove inflectional endings only, and
        - return the base or dictionary form of a word, which is known as the lemma.
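The difference between stemming and lemmatization can be sketched in a few lines of Python. The suffix list and the tiny `LEMMA_TABLE` vocabulary below are hypothetical and far smaller than anything a real lemmatizer (including CoreNLP's) would use; they only show that a stemmer chops endings while a lemmatizer looks up a dictionary form.

```python
# Hypothetical mini-vocabulary mapping inflected forms to lemmas (illustrative only).
LEMMA_TABLE = {"was": "be", "are": "be", "better": "good", "mice": "mouse", "studies": "study"}

def crude_stem(word):
    """Strip one common suffix without consulting any vocabulary."""
    for suffix in ("ing", "ies", "ed", "es", "s"):
        # Require at least three characters to remain after stripping.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def lemmatize(word):
    """Look the word up in the vocabulary; fall back to the lowercased word."""
    return LEMMA_TABLE.get(word.lower(), word.lower())

for w in ("studies", "was", "mice"):
    print(w, crude_stem(w), lemmatize(w))
```

The stemmer turns "studies" into the non-word "stud" and leaves "mice" untouched, while the dictionary lookup returns "study" and "mouse".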


Sentiment Analysis


- https://www.csc.ncsu.edu/faculty/healey/tweet_viz/tweet_app/
- http://www.alchemyapi.com/developers/getting-started-guide/twitter-sentiment-analysis
- www.sentiment140.com


Sentiment analysis has many other names

- Opinion extraction
- Opinion mining
- Sentiment mining
- Subjectivity analysis


Sentiment analysis is the detection of attitudes

- “enduring, affectively colored beliefs, dispositions towards objects or persons”


Attitudes

- Holder (source) of attitude
- Target (aspect) of attitude
- Type of attitude
    - From a set of types: like, love, hate, value, desire, etc.
    - Or (more commonly) simple weighted polarity: positive, negative, neutral, together with strength
- Text containing the attitude
    - Sentence or entire document
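The four components above map naturally onto a small record type. The following Python sketch is one possible representation; the class name `Attitude`, the field names, and the 0.0 to 1.0 strength scale are assumptions, since the slides do not prescribe a data structure or a range.

```python
from dataclasses import dataclass

@dataclass
class Attitude:
    holder: str      # source of the attitude
    target: str      # aspect the attitude is about
    polarity: str    # "positive", "negative" or "neutral"
    strength: float  # hypothetical 0.0-1.0 scale; no range is given in the slides
    text: str        # sentence or document containing the attitude

example = Attitude(
    holder="reviewer",
    target="battery life",
    polarity="negative",
    strength=0.8,
    text="The battery life is terrible.",
)
```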


Sentiment analysis

- Simplest task: Is the attitude of this text positive or negative?
- More complex: Rank the attitude of this text from 1 to 5
- Advanced: Detect the target, source, or complex attitude types


Baseline Algorithm

- Tokenization
- Feature extraction
- Classification using different classifiers
    - Naive Bayes
    - MaxEnt
    - SVM


Sentiment Tokenization Issues

- Deal with HTML and XML markup
- Twitter/Facebook/... markup (names, hashtags)
- Capitalization (preserve for words in all caps)
- Phone numbers, dates
- Emoticons
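Several of these issues can be handled with a single regular expression, as in the Python sketch below. The pattern is an assumption written for illustration: it drops HTML/XML tags, keeps @names, #hashtags and a handful of emoticons as single tokens, and preserves words written in all caps. Phone numbers and dates are not handled here.

```python
import re

TOKEN_PATTERN = re.compile(
    r"""
    <[^>]+>                 # HTML/XML tags (matched so they can be dropped)
    | [@\#]\w+              # @usernames and #hashtags
    | [:;=][-']?[()DPp]     # a few emoticons, e.g. a colon followed by a parenthesis
    | \w+(?:'\w+)?          # words, keeping contractions together
    | [^\w\s]               # any other single punctuation mark
    """,
    re.VERBOSE,
)

def tokenize(text):
    tokens = []
    for match in TOKEN_PATTERN.finditer(text):
        token = match.group()
        if token.startswith("<") and token.endswith(">"):
            continue  # drop markup entirely
        # Keep multi-character all-caps tokens as they are; lowercase everything else.
        if not (token.isupper() and len(token) > 1):
            token = token.lower()
        tokens.append(token)
    return tokens

print(tokenize("<b>LOVED</b> it :) thanks @Bob #awesome"))
```

The order of the alternatives matters: the emoticon pattern has to come before the generic punctuation pattern, otherwise ":)" would be split into two tokens.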


Extracting Features for Sentiment Classification

- How to handle negation: “I didn’t like this movie” vs. “I really like this movie”
- Which words to use?
    - Only adjectives
    - All words


Negation

Add NOT to every word between a negation word and the following punctuation mark.
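The negation rule can be sketched in Python as follows. The `NOT_` prefix spelling, the small `NEGATIONS` set, and the punctuation set are assumptions made for illustration; the rule itself is the one stated above.

```python
NEGATIONS = {"not", "no", "never"}          # illustrative, not exhaustive
PUNCTUATION = {".", ",", ":", ";", "!", "?"}

def mark_negation(tokens):
    """Prefix NOT_ to every token between a negation word and the next punctuation mark."""
    marked, negating = [], False
    for token in tokens:
        if token in PUNCTUATION:
            negating = False            # punctuation ends the negated span
            marked.append(token)
        elif negating:
            marked.append("NOT_" + token)
        else:
            marked.append(token)
            # Contractions such as "didn't" also open a negated span.
            if token.lower() in NEGATIONS or token.lower().endswith("n't"):
                negating = True
    return marked

print(mark_negation("I didn't like this movie , but I".split()))
```

After marking, "like" and "NOT_like" are distinct features, so a classifier can learn different weights for the two.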


Naive Bayes Algorithm

- Simple (“naive”) classification method based on Bayes' rule
- Relies on a very simple representation of the document:
    - Bag of words
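A bag of words keeps only how often each word occurs and discards word order. A minimal Python sketch, assuming whitespace-separated tokens and lowercasing (both are simplifications):

```python
from collections import Counter

def bag_of_words(tokens):
    """Map each lowercased token to the number of times it occurs."""
    return Counter(token.lower() for token in tokens)

bag = bag_of_words("I love this movie I love it".split())
print(bag)
```

Two documents with the same words in a different order produce the same bag, which is exactly the information the classifier gives up.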


For a document d and a class c
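A multinomial Naive Bayes classifier picks the class c that maximizes P(c) multiplied by the product of P(w | c) over the words w in d. The Python sketch below is one common textbook formulation, not necessarily the exact variant shown in the lecture: it assumes add-one (Laplace) smoothing, works in log space to avoid underflow, and ignores words never seen in training. The function names and the toy training set are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train(labeled_docs):
    """labeled_docs: list of (tokens, label) pairs. Returns (log_priors, log_likelihoods, vocabulary)."""
    doc_counts = Counter(label for _, label in labeled_docs)
    word_counts = defaultdict(Counter)
    for tokens, label in labeled_docs:
        word_counts[label].update(tokens)
    vocabulary = {w for counts in word_counts.values() for w in counts}
    log_priors = {c: math.log(n / len(labeled_docs)) for c, n in doc_counts.items()}
    log_likelihoods = {}
    for c, counts in word_counts.items():
        total = sum(counts.values()) + len(vocabulary)  # add-one smoothing denominator
        log_likelihoods[c] = {w: math.log((counts[w] + 1) / total) for w in vocabulary}
    return log_priors, log_likelihoods, vocabulary

def classify(tokens, model):
    """Return the class with the highest log P(c) + sum of log P(w | c)."""
    log_priors, log_likelihoods, vocabulary = model
    scores = {
        c: log_priors[c] + sum(log_likelihoods[c][w] for w in tokens if w in vocabulary)
        for c in log_priors
    }
    return max(scores, key=scores.get)

# Toy training data, invented for illustration.
model = train([
    ("great fantastic movie".split(), "pos"),
    ("loved it great".split(), "pos"),
    ("boring terrible movie".split(), "neg"),
    ("hated it boring".split(), "neg"),
])
print(classify("great movie".split(), model))
```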


Binarized (Boolean feature) Multinomial Naive Bayes

Intuition:

- Word occurrence may matter more than word frequency
- The occurrence of the word "fantastic" tells us a lot
- The fact that it occurs 5 times may not tell us much more

Boolean Multinomial Naive Bayes: clips all the word counts in each document at 1
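Clipping counts at 1 amounts to removing duplicate words from each document before counting. A minimal Python sketch, with a hypothetical helper name `binarize` and an invented example review:

```python
from collections import Counter

def binarize(tokens):
    """Clip each word's count within one document at 1 by removing duplicates."""
    return Counter(set(tokens))

review = "fantastic fantastic fantastic fantastic fantastic movie".split()
raw_counts = Counter(review)        # ordinary multinomial counts
boolean_counts = binarize(review)   # binarized counts
print(raw_counts["fantastic"], boolean_counts["fantastic"])
```

The clipping is applied per document; once documents of a class are combined, a word's total count for that class can still exceed 1.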


Neural Networks and Deep Learning: Next!

- http://nlp.stanford.edu/sentiment/
- java -cp "/opt/corenlp/stanford-corenlp-3.7.0/*" -Xmx2g edu.stanford.nlp.sentiment.SentimentPipeline -file input.txt
