Natural Language Processing (highlights) Fall 2012 : Chambers

Upload: howell

Post on 24-Feb-2016


Page 1: Natural Language Processing (highlights)

Natural Language Processing (highlights)

Fall 2012 : Chambers

Page 2: Natural Language Processing (highlights)

Early NLP
• Dave: Open the pod bay doors, HAL.
• HAL: I’m sorry Dave. I’m afraid I can’t do that.

Page 3: Natural Language Processing (highlights)

Commercial NLP

Page 4: Natural Language Processing (highlights)

NLP is hard. (news headlines)
1. Minister Accused Of Having 8 Wives In Jail
2. Juvenile Court to Try Shooting Defendant
3. Teacher Strikes Idle Kids
4. Miners refuse to work after death
5. Local High School Dropouts Cut in Half
6. Red Tape Holds Up New Bridges
7. Clinton Wins on Budget, but More Lies Ahead
8. Hospitals Are Sued by 7 Foot Doctors
9. Police: Crack Found in Man's Buttocks

Page 5: Natural Language Processing (highlights)

NLP needs to adapt.

Page 6: Natural Language Processing (highlights)

NLP needs to adapt.

http://xkcd.com/1083/

Page 7: Natural Language Processing (highlights)

NLP is also a Knowledge Problem

Page 8: Natural Language Processing (highlights)

Language Models
• Language Modeling
  • Build probabilities of words and phrases
• Author Detection
  • Who wrote this email? (is it spam?)
  • Historical analysis: who was the author of this book?
  • Intelligence community: who wrote this incendiary blog?

Page 9: Natural Language Processing (highlights)

Language Models: Author ID

It was the year of Our Lord one thousand seven hundred and seventy-five. Spiritual revelations were conceded to England at that favoured period, as at this. Mrs. Southcott had recently attained her five-and-twentieth blessed birthday.
- Charles Dickens

Mr. Bennet was among the earliest of those who waited on Mr. Bingley. He had always intended to visit him, though to the last always assuring his wife that he should not go; and till the evening after the visit was paid she had no knowledge of it.
- Jane Austen

Baby, baby, baby oooh
Like baby, baby, baby nooo
Like baby, baby, baby oooh
I thought you'd always be mine
- Justin Bieber

Page 10: Natural Language Processing (highlights)

Motivation
• We want to predict something.
• We have some text related to this something.
• something = target label Y
• text = text features X

Given X, what is the most probable Y?

Page 11: Natural Language Processing (highlights)

Motivation: Author Detection

X = “Alas the day! take heed of him; he stabbed me in mine own house, and that most beastly: in good faith, he cares not what mischief he does. If his weapon be out: he will foin like any devil; he will spare neither man, woman, nor child.”

Y = { Charles Dickens, William Shakespeare, Herman Melville, Jane Austen, Homer, Leo Tolstoy }

$$\hat{Y} = \arg\max_{y_k} P(Y = y_k)\, P(X \mid Y = y_k)$$
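The rule for picking the most probable author can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: it assumes a unigram Naïve Bayes model with add-one smoothing, and the names `tokenize`, `train`, `classify`, and the three-snippet training set `DOCS` are hypothetical. The snippets are taken from passages quoted in these slides; a real system would train on whole books.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase and keep alphabetic tokens only (a simplifying assumption).
    return re.findall(r"[a-z']+", text.lower())

def train(docs):
    """docs: list of (author, text). Returns priors, per-author word counts, vocabulary."""
    doc_counts = Counter()
    word_counts = {}
    vocab = set()
    for author, text in docs:
        words = tokenize(text)
        doc_counts[author] += 1
        word_counts.setdefault(author, Counter()).update(words)
        vocab.update(words)
    total = sum(doc_counts.values())
    priors = {author: n / total for author, n in doc_counts.items()}
    return priors, word_counts, vocab

def classify(text, priors, word_counts, vocab):
    """Return argmax over authors of log P(Y) + sum of log P(word | Y)."""
    best_author, best_score = None, -math.inf
    for author, prior in priors.items():
        counts = word_counts[author]
        denom = sum(counts.values()) + len(vocab)  # add-one smoothing
        score = math.log(prior)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are skipped
                score += math.log((counts[word] + 1) / denom)
        if score > best_score:
            best_author, best_score = author, score
    return best_author

# Tiny illustrative training set (hypothetical choice of snippets).
DOCS = [
    ("Charles Dickens", "It was the year of Our Lord one thousand seven hundred and seventy-five"),
    ("Jane Austen", "Mr. Bennet was among the earliest of those who waited on Mr. Bingley"),
    ("William Shakespeare", "he will foin like any devil; he will spare neither man, woman, nor child"),
]

priors, word_counts, vocab = train(DOCS)
print(classify("he will spare neither man", priors, word_counts, vocab))
```

Working in log space avoids underflow when many small probabilities are multiplied together.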

Page 12: Natural Language Processing (highlights)

N-gram Terminology
• Unigrams: single words
• Bigrams: pairs of words
• Trigrams: three-word phrases
• 4-grams, 5-grams, 6-grams, etc.

“I saw a lizard yesterday”

Unigrams
• I
• saw
• a
• lizard
• yesterday
• </s>

Bigrams
• <s> I
• I saw
• saw a
• a lizard
• lizard yesterday
• yesterday </s>

Trigrams
• <s> <s> I
• <s> I saw
• I saw a
• saw a lizard
• a lizard yesterday
• lizard yesterday </s>
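The n-gram lists for the lizard sentence can be generated with a short function. This sketch assumes the padding convention the example appears to use: n-1 start markers `<s>` and a single end marker `</s>`. The function name `ngrams` is hypothetical, and whitespace splitting stands in for a real tokenizer.

```python
def ngrams(sentence, n):
    """Return the n-grams of a sentence, padded with <s> and </s> markers."""
    tokens = ["<s>"] * (n - 1) + sentence.split() + ["</s>"]
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

sentence = "I saw a lizard yesterday"
for n in (1, 2, 3):
    print(n, ngrams(sentence, n))
```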

Page 13: Natural Language Processing (highlights)

Sentiment Analysis

Page 14: Natural Language Processing (highlights)

It's about finding out what people think...

Page 15: Natural Language Processing (highlights)

Online social media sentiment apps

● Several Sentiment Sites
● Twitter sentiment: http://twittersentiment.appspot.com/
● Twendz: http://twendz.waggeneredstrom.com/
● Twitrratr: http://twitrratr.com/

Page 16: Natural Language Processing (highlights)

Or was she?

Page 17: Natural Language Processing (highlights)

Twitter for Stock Market Prediction

“Hey Jon, Derek in Atlanta is having a bacon and egg, er, sandwich. Is that good for wheat futures?”

Page 18: Natural Language Processing (highlights)
Page 19: Natural Language Processing (highlights)

Sometimes science is hype
• The Bollen paper has since been strongly questioned by others in the field.
• It overused statistical significance tests, which may have overstated how well sentiment actually aligned with market movements.
• Nobody has been able to recreate their findings.

Page 20: Natural Language Processing (highlights)

Monitor Real-World Events

Page 21: Natural Language Processing (highlights)

Learn a Lexicon
1. Find some data that is labeled
  • Movie reviews have star ratings
  • Manually label data yourself
  • Use a noisy label, such as “#angry” on tweets
2. Learn a model from the labeled data
  • Naïve Bayes Classifier
  • MaxEnt Model (which you have not yet learned)
  • Decision Trees
  • etc.

Try it now!
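One simple way to turn labeled text into a lexicon is to score each word by how much more likely it is under one label than the other. The sketch below is an assumption about how that could look, not the method used in the course: it computes a smoothed log-odds score per word, and the four labeled examples in `LABELED` are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical labeled examples; real data would be star-rated reviews
# or tweets carrying a noisy hashtag label.
LABELED = [
    ("pos", "great movie loved it"),
    ("pos", "loved the acting great fun"),
    ("neg", "terrible movie hated it"),
    ("neg", "hated the plot terrible acting"),
]

def learn_lexicon(labeled):
    """Map each word to log P(word | pos) - log P(word | neg), with add-one smoothing."""
    counts = {"pos": Counter(), "neg": Counter()}
    for label, text in labeled:
        counts[label].update(text.lower().split())
    vocab = set(counts["pos"]) | set(counts["neg"])
    n_pos = sum(counts["pos"].values()) + len(vocab)
    n_neg = sum(counts["neg"].values()) + len(vocab)
    return {
        word: math.log((counts["pos"][word] + 1) / n_pos)
        - math.log((counts["neg"][word] + 1) / n_neg)
        for word in vocab
    }

lexicon = learn_lexicon(LABELED)
for word, score in sorted(lexicon.items(), key=lambda item: item[1]):
    print(f"{word:10s} {score:+.2f}")
```

Words with a positive score lean positive, words with a negative score lean negative, and words that appear equally under both labels score near zero.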

Page 22: Natural Language Processing (highlights)

Track Population Moods

Page 23: Natural Language Processing (highlights)

Information Extraction

http://www.youtube.com/watch?v=YLR1byL0U8M

Page 24: Natural Language Processing (highlights)

Current Examples
• Fact extraction about people. Instant biographies.
  • Search “tom hanks” on google
• Never-ending Language Learning
  • http://rtw.ml.cmu.edu/rtw/

Page 25: Natural Language Processing (highlights)

Where is the Naval Academy?
• The United States Naval Academy (also known as USNA, Annapolis, or Navy) is a four-year coeducational federal service academy located in Annapolis.

• Start your tour at the Armel-Leftwich Visitor Center of the United States Naval Academy, Annapolis, Md.

• this is a great place to walk around, whether you are a 1st time or frequent visitor to annapolis. the academy's campus is situated along the creek, thus offering beautiful views of the water and horizons.

P(annapolis | sentence) = P(annapolis | features/ngrams/etc.)

Page 26: Natural Language Processing (highlights)

Extracting structured knowledge

LLNL EQ Lawrence Livermore National Laboratory
LLNL LOC-IN California
Livermore LOC-IN California
LLNL IS-A scientific research laboratory
LLNL FOUNDED-BY University of California
LLNL FOUNDED-IN 1952

Each article can contain hundreds or thousands of items of knowledge...

“The Lawrence Livermore National Laboratory (LLNL) in Livermore, California is a scientific research laboratory founded by the University of California in 1952.”
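To make the idea of turning a sentence into relation tuples concrete, here is a deliberately naive sketch using one hand-written regular expression. The pattern `PATTERN` and the function `extract_facts` are hypothetical and fit only this sentence shape; real extraction systems typically learn many patterns or classifiers from data rather than rely on a single rule.

```python
import re

SENTENCE = (
    "The Lawrence Livermore National Laboratory (LLNL) in Livermore, "
    "California is a scientific research laboratory founded by the "
    "University of California in 1952."
)

# Hand-written pattern for "The NAME (ABBR) in CITY, STATE is a KIND
# founded by FOUNDER in YEAR." (an illustrative assumption).
PATTERN = re.compile(
    r"The (?P<name>.+?) \((?P<abbr>[A-Z]+)\) in (?P<city>[^,]+), (?P<state>.+?) "
    r"is an? (?P<kind>.+?) founded by (?:the )?(?P<founder>.+?) in (?P<year>\d{4})\."
)

def extract_facts(sentence):
    """Return (subject, relation, object) tuples, or an empty list if no match."""
    match = PATTERN.match(sentence)
    if match is None:
        return []
    g = match.groupdict()
    return [
        (g["abbr"], "EQ", g["name"]),
        (g["abbr"], "LOC-IN", g["state"]),
        (g["city"], "LOC-IN", g["state"]),
        (g["abbr"], "IS-A", g["kind"]),
        (g["abbr"], "FOUNDED-BY", g["founder"]),
        (g["abbr"], "FOUNDED-IN", g["year"]),
    ]

for fact in extract_facts(SENTENCE):
    print(fact)
```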

Page 27: Natural Language Processing (highlights)

Sentence Parsing

Page 28: Natural Language Processing (highlights)

Sentence Parsing
• “Fed raises interest rates”

Page 29: Natural Language Processing (highlights)

Example 2
“I saw the man on the hill with a telescope.”

Page 30: Natural Language Processing (highlights)

Words barely affect structure.

[Figure: “telescopes” marked Correct!!!, “planets” marked Incorrect]

Page 31: Natural Language Processing (highlights)

Machine Translation
Start at ~6min in.
http://www.youtube.com/watch?v=Nu-nlQqFCKg

Page 32: Natural Language Processing (highlights)

Machine Translation
• Commercial-grade translation
  • translate.google.com

Page 33: Natural Language Processing (highlights)

Machine Translation
• How to model translations?
  • Words: P( casa | house )
  • Spurious words: P( a | null )
  • Fertility: Pn( 1 | house )
    • English word translates to one Spanish word
  • Distortion: Pd( 5 | 2 )
    • The 2nd English word maps to the 5th Spanish word
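The pieces listed above can be multiplied together to score one candidate alignment. The sketch below is a simplified illustration rather than a full translation model: spurious (null-generated) words are left out, the function `score` is hypothetical, and every number in the three probability tables is made up. Distortion entries are keyed as (Spanish position, English position), matching the Pd( 5 | 2 ) notation.

```python
import math
from collections import Counter

# Made-up probabilities for illustration only (not learned from data).
P_WORD = {("la", "the"): 0.6, ("casa", "house"): 0.7, ("blanca", "white"): 0.8}
P_FERTILITY = {(1, "the"): 0.9, (1, "house"): 0.9, (1, "white"): 0.9}
P_DISTORTION = {(1, 1): 0.8, (2, 2): 0.6, (3, 3): 0.6, (3, 2): 0.3, (2, 3): 0.3}
UNSEEN = 1e-6  # small floor for anything missing from a table

def score(english, spanish, alignment):
    """Log-probability of an alignment given as 1-based (english_pos, spanish_pos) pairs."""
    logp = 0.0
    fertility = Counter(e_pos for e_pos, _ in alignment)
    for e_pos, word in enumerate(english, start=1):
        logp += math.log(P_FERTILITY.get((fertility[e_pos], word), UNSEEN))
    for e_pos, s_pos in alignment:
        pair = (spanish[s_pos - 1], english[e_pos - 1])
        logp += math.log(P_WORD.get(pair, UNSEEN))
        logp += math.log(P_DISTORTION.get((s_pos, e_pos), UNSEEN))
    return logp

english = ["the", "white", "house"]
spanish = ["la", "casa", "blanca"]
crossing = [(1, 1), (2, 3), (3, 2)]   # white -> blanca, house -> casa
monotone = [(1, 1), (2, 2), (3, 3)]   # white -> casa, house -> blanca
print(score(english, spanish, crossing), score(english, spanish, monotone))
```

With these toy numbers the crossing alignment scores higher: the distortion table prefers the diagonal, but the word probabilities outweigh it.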

Page 34: Natural Language Processing (highlights)

Distortion
• Encourage translations to follow the diagonal…
  • P( 4 | 4 ) * P( 5 | 5 ) * …

Page 35: Natural Language Processing (highlights)

Learning Translations
• Huge corpus of “aligned sentences”.
• Europarl
  • Corpus of European Parliament proceedings
  • The EU is mandated to translate its proceedings into its official languages
  • 21 languages, (semi-) aligned to each other
• P( casa | house ) = (count all casa/house pairs!)
• Pd( 5 | 2 ) = (count all sentences where 2nd word went to 5th word)
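The counting idea can be sketched as follows, assuming the word alignments are already known. That assumption is a big one: in practice alignments are not given and have to be estimated, commonly with an iterative procedure such as EM. The toy corpus `ALIGNED` and the helpers `estimate` and `prob` are hypothetical. Counts are normalized so that each result is a conditional probability.

```python
from collections import Counter, defaultdict

# Toy hand-aligned corpus (hypothetical). Each entry is
# (English words, Spanish words, [(english_pos, spanish_pos), ...]), positions 1-based.
ALIGNED = [
    (["the", "house"], ["la", "casa"], [(1, 1), (2, 2)]),
    (["the", "white", "house"], ["la", "casa", "blanca"], [(1, 1), (2, 3), (3, 2)]),
    (["my", "house"], ["mi", "hogar"], [(1, 1), (2, 2)]),
]

def estimate(aligned):
    """Count word pairs and position pairs over all alignment links."""
    pair_counts = defaultdict(Counter)  # English word -> Spanish word counts
    dist_counts = defaultdict(Counter)  # English position -> Spanish position counts
    for english, spanish, alignment in aligned:
        for e_pos, s_pos in alignment:
            pair_counts[english[e_pos - 1]][spanish[s_pos - 1]] += 1
            dist_counts[e_pos][s_pos] += 1
    return pair_counts, dist_counts

def prob(counts, given, outcome):
    """Relative frequency P(outcome | given); 0.0 if 'given' was never seen."""
    seen = counts.get(given, Counter())
    total = sum(seen.values())
    return seen[outcome] / total if total else 0.0

pair_counts, dist_counts = estimate(ALIGNED)
print("P( casa | house ) =", prob(pair_counts, "house", "casa"))
print("Pd( 3 | 2 ) =", prob(dist_counts, 2, 3))
```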

Page 36: Natural Language Processing (highlights)

Machine Translation Technology
• Hand-held devices for military
  • Speak English -> recognition -> translation -> generate Urdu
• Translate web documents
• Education technology?
  • Doesn’t yet receive much of a focus

Page 37: Natural Language Processing (highlights)

Text Influence

Page 38: Natural Language Processing (highlights)

Text Influence
• Can text style influence people?
• Can a computer learn to adapt language to accomplish a goal?
• Obama 2012 campaign
  • Sent emails to people every day asking for donations
  • Sent variations of email, and learned what features caused more donations
  • http://www.businessweek.com/articles/2012-11-29/the-science-behind-those-obama-campaign-e-mails

Page 39: Natural Language Processing (highlights)

Mobile Devices

Page 40: Natural Language Processing (highlights)

Mobile Devices
• Keystroke prediction has been around for a while now.
• New idea: learn individual user preferences
• New idea: use a user’s social media text to train on
• http://www.youtube.com/watch?v=3hQT-o8ch0o
• http://www.youtube.com/watch?v=kA5Horw_SOE
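A personalized predictor can be as simple as bigram counts collected from one user's own text. The sketch below assumes that setup: the messages in `USER_TEXTS` are invented stand-ins for a user's social media posts, and `train_bigrams` and `predict_next` are hypothetical names. Production keyboards use far richer models.

```python
from collections import Counter, defaultdict

def train_bigrams(texts):
    """Count, for each word, which words follow it in the user's text."""
    following = defaultdict(Counter)
    for text in texts:
        words = text.lower().split()
        for prev, nxt in zip(words, words[1:]):
            following[prev][nxt] += 1
    return following

def predict_next(following, prev_word, k=3):
    """Return up to k most frequent next words after prev_word."""
    candidates = following.get(prev_word.lower(), Counter())
    return [word for word, _ in candidates.most_common(k)]

# Hypothetical messages from a single user.
USER_TEXTS = [
    "see you at the gym",
    "heading to the gym now",
    "at the office today",
]

model = train_bigrams(USER_TEXTS)
print(predict_next(model, "the"))
```

Training on a different user's text would change the counts and therefore the suggestions, which is the point of learning individual preferences.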