CS 124/LINGUIST 180 From Languages to Information Dan Jurafsky Stanford University Introduction and Course Overview

Page 1: CS 124/LINGUIST 180 From Languages to Information · advisor lab research management finish. ... Question Answering: IBM’s Watson. Recommendation Engines. Personal Assistants. Why

CS 124 / LINGUIST 180: From Languages to Information

Dan Jurafsky

Stanford University

Introduction and Course Overview

From Languages to Information

Automatically extracting meaning and structure from:
◦ Human language text and speech (news, social media, etc.)
◦ Social networks
◦ Genome sequences

Interacting with humans via language
◦ Dialog systems / Chatbots
◦ Question Answering
◦ Recommendation Systems

Commercial World

1. Extracting information from language

Information Retrieval

6,586,013,574 web searches every day (by one estimate)
Text-based information retrieval is thus likely the most frequently used piece of software in the world
How does it work?
Can you build an IR engine?
Programming Assignment 4: Search!
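One common starting point for "How does it work?" is an inverted index: a map from each word to the documents that contain it. The sketch below is only an illustration of that idea, not the course's Programming Assignment 4; the toy documents and the names `build_index` and `search` are assumptions made for this example.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return sorted ids of documents containing every query token (boolean AND)."""
    tokens = query.lower().split()
    if not tokens:
        return []
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return sorted(result)


# Hypothetical toy collection (phrases borrowed from later slides).
docs = [
    "the fries are like crack",
    "we waited 45 minutes for our entrees",
    "garlic noodles are my drug of choice",
]
index = build_index(docs)
print(search(index, "are"))
print(search(index, "garlic are"))
```

A real engine would also rank the matching documents (for example with tf-idf weights) rather than return an unordered boolean match.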

Extracting Sentiment and Social Meaning

Lots of meaning is in connotation
"connotation: an idea or feeling that a word invokes in addition to its literal or primary meaning."

Extracting connotation is generally called sentiment analysis

Sentiment Analysis

Extracting Social Meaning from Speech

Uncertainty (students in tutoring)
Annoyance
◦ callers to dialog systems
Deception
Emotion
Intoxication
Flirtation, Romantic interest
◦ McFarland, Jurafsky, Ranganath

What do you do for fun? Dance? Uh, dance, uh, I like to go, like camping. Uh, snowboarding, but I'm not good, but I like to go anyway. You like boarding. Yeah. I like to do anything. Like I, I'm up for anything. Really? Yeah. Are you open-minded about most everything? Not everything, but a lot of stuff- What is not everything [laugh] I don't know. Think of something, and I'll say if I do it or not. [laugh] Okay. [unintelligible]. Skydiving. I wouldn't do skydiving I don't think. Yeah I'm afraid of heights. F: Yeah, yeah, me too. M: [laugh] Are you afraid of heights? F: [laugh] Yeah [laugh]

What do flirters do?

Women when flirting:
◦ raise pitch ceiling
◦ laugh at themselves
◦ say “I”

Men when flirting:
◦ raise their pitch floor
◦ laugh at their date (teasing?)
◦ say “you” and “you know”
◦ don’t use words related to academics

Rajesh Ranganath, Dan Jurafsky, and Daniel A. McFarland. 2013. Detecting friendly, flirtatious, awkward, and assertive speech in speed-dates. Computer Speech and Language. 27:1, 89-115

Unlikely words for male flirting

academia, interview, teacher, phd, advisor, lab, research, management, finish

Sentiment in Restaurant Reviews

Dan Jurafsky, Victor Chahuneau, Bryan R. Routledge, and Noah A. Smith. 2014. Narrative framing of consumer sentiment in online restaurant reviews. First Monday 19:4

900,000 Yelp reviews online

A very bad (one-star) review:

The bartender... absolutely horrible... we waited 10 min before we even got her attention... and then we had to wait 45 - FORTY FIVE! - minutes for our entrees… stalk the waitress to get the cheque… she didn't make eye contact or even break her stride to wait for a response…

What is the language of bad reviews?

Negative sentiment language
horrible, awful, terrible, bad, disgusting

Past narratives about people
waited, didn’t, was
he, she, his, her, manager, customer, waitress, waiter

Frequent mentions of we and us
...we were ignored until we flagged down a waiter to get our waitress…
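The cues on this slide can be counted directly. The sketch below is a minimal lexicon-style feature counter, assuming a tiny hand-built word list taken from the slide; the function name `review_features` and the tokenization rule are assumptions for illustration, not the method of the cited paper.

```python
import re

# Word lists copied from the slide; a real sentiment lexicon would be far larger.
NEGATIVE = {"horrible", "awful", "terrible", "bad", "disgusting"}
WE_US = {"we", "us"}


def review_features(text):
    """Count negative-sentiment words and first-person-plural pronouns in a review."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return {
        "negative": sum(token in NEGATIVE for token in tokens),
        "we_us": sum(token in WE_US for token in tokens),
        "tokens": len(tokens),
    }


review = ("The bartender was absolutely horrible. "
          "We waited 10 min before we even got her attention.")
features = review_features(review)
print(features)
```

Counts like these could serve as input features for a classifier of one-star versus five-star reviews.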

Other narratives with this language

A genre using: Past tense, we/us, negative, people narratives

Texts written by people suffering trauma
◦ James Pennebaker lab
◦ Past tense as distancing
◦ Use of “we”: seeking solace in community

1-star reviews are trauma narratives!
The lesson of reviews: It’s all about personal interaction

What about positive reviews?

Sex, Drugs, and Dessert

• orgasmic pastry
• sexy food
• seductively seared foie gras

• addicted to pepper shooters
• garlic noodles… my drug of choice
• the fries are like crack

Computational Biology: Comparing Sequences

Slide stuff from Serafim Batzoglou

AGGCTATCACCTGACCTCCAGGCCGATGCCC
TAGCTATCACGACCGCGGTCGATTTGCCCGAC

-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
 || |||||||  ||||x|  || |||  |||||
TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC

Sequence comparison is key to
• Finding genes
• Determining function
• Uncovering evolutionary processes

This is also how spell checkers work!

We'll learn: edit distance algorithms (Quiz 1)
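As a preview of edit distance, here is a minimal dynamic-programming sketch of minimum edit distance between two strings. It assumes insertions and deletions cost 1 and makes the substitution cost a parameter (`sub_cost`, a name chosen for this example), since different presentations use 1 or 2.

```python
def edit_distance(source, target, sub_cost=1):
    """Minimum cost of insertions, deletions, and substitutions turning source into target."""
    n, m = len(source), len(target)
    # dist[i][j] = cost of converting source[:i] into target[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i  # delete every character of source[:i]
    for j in range(1, m + 1):
        dist[0][j] = j  # insert every character of target[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + 1,                             # deletion
                dist[i][j - 1] + 1,                             # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost)  # match or substitution
            )
    return dist[n][m]


print(edit_distance("kitten", "sitting"))
print(edit_distance("AGGCTATCAC", "TAGCTATCAC"))
```

The same table works for DNA strings and for spelling correction; keeping back-pointers alongside the costs would recover the alignment itself.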

Social Networks

The network formed by your friends or other relations offline or online
◦ Can we compute properties of these networks?
◦ Extract information from them?

High school dating

Peter S. Bearman, James Moody, and Katherine Stovel. Chains of affection: The structure of adolescent romantic and sexual networks. American Journal of Sociology 110, 44-91 (2004). Image drawn by Mark Newman.

What is the structure of social relations?
Imagine a graph of high school
• people are nodes
• links are romantic relationships

What will the shape of this graph be?
A densely connected graph? A line? A cycle?
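Questions like "a line or a cycle?" can be asked of a graph in code. The sketch below assumes an undirected simple graph given as a list of edges; the node names and the helper names `build_graph`, `connected_components`, and `has_cycle` are hypothetical, and the data is made up rather than taken from the cited study.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (a, b) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def connected_components(graph):
    """Return the connected components as a list of node sets (depth-first search)."""
    seen, components = set(), []
    for start in list(graph):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components


def has_cycle(graph, component):
    """A connected component with at least as many edges as nodes contains a cycle."""
    edge_count = sum(len(graph[node]) for node in component) // 2
    return edge_count >= len(component)


# Made-up relationships: one chain (A-B-C-D) and one triangle (E-F-G).
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F"), ("F", "G"), ("G", "E")]
graph = build_graph(edges)
components = connected_components(graph)
for component in components:
    print(sorted(component), "cycle" if has_cycle(graph, component) else "no cycle")
```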

2. Interacting with humans via language

Question Answering: IBM’s Watson

Recommendation Engines

Personal Assistants

Why is language interpretation hard?

Ambiguity

Resolving ambiguity is hard

Ambiguity

Find at least 6 meanings of this sentence:

I made her duck

Ambiguity

Find at least 6 meanings of this sentence:

I made her duck

I cooked waterfowl for her benefit (to eat)
I cooked waterfowl belonging to her
I created the (plaster?) waterfowl she owns
I caused her to quickly lower her head or body
I recognized the true identity of her spy waterfowl
I waved my magic wand and turned her into undifferentiated waterfowl

Ambiguity is Pervasive

I caused her to quickly lower her head or body
Part of speech: “duck” can be a Noun or Verb

I cooked waterfowl belonging to her.
Part of speech:
“her” is a possessive pronoun (“of her”)
“her” is a dative pronoun (“for her”)

I made the (plaster) duck statue she owns
Word Meaning: “make” can mean “create” or “cook”

Ambiguity is Pervasive

Grammar: make can be:
Transitive: (verb has a noun direct object)
I cooked [waterfowl belonging to her]
Ditransitive: (verb has 2 noun objects)
I made [her] (into) [undifferentiated waterfowl]
Action-transitive (verb has a direct object + verb)
I caused [her] [to move her body]

Ambiguity is Pervasive: Phonetics!!!!!

I mate or duck
I’m eight or duck
Eye maid; her duck
Aye mate, her duck
I maid her duck
I’m aid her duck
I mate her duck
I’m ate her duck
I’m ate or duck
I mate or duck

More difficulties: Non-standard language

Great job @justinbieber! Were SOO PROUD of what youve accomplished! U taught us 2 #neversaynever& you yourself should never give up either♥

And neologisms: unfriend, retweet, bromance

Making progress on this problem…

The task is difficult! What tools do we need?
◦ Knowledge about language and the world
◦ A way to combine knowledge sources

How we generally do this:
◦ probabilistic models built from language data
P(“maison” → “house”) high
P(“L’avocat général” → “the general avocado”) low

Models and Tools

Regular Expressions
Edit distance and alignment
Word embeddings
◦ vector/neural models of meaning

Language models (word prediction)
Machine Learning classifiers
◦ Naïve Bayes
◦ Logistic Regression

Network algorithms
◦ PageRank

Recommendation algorithms
◦ Collaborative filtering

Linguistic tools
◦ Sentiment lexicons

Course logistics in brief

Instructor: Dan Jurafsky
Time: TuTh 3:00-4:20, 420-040
cs124.stanford.edu

TAs:
Will Hamilton (head TA)
Jeff Pyke
Kelly Shen
Stephanie Tang
Lucy Wang
Rob Voigt
Robin Jia
Rafael Musa
Kate Park
Charissa Plattner
Janette Cheng
Tim Dozat
Ashkon Farhangi
Gaspar Garcia

Evidence Based Pedagogy!

http://www.knewton.com/flipped-classroom/

Why the flipped classroom (1)

Mastery learning: Learn until you master
Benjamin Bloom, 1968

Bloom's mastery learning

Personalized, goal-driven practice, driven by feedback
1. Watch (and re-watch) lectures at your own pace and learn when it's best for you
2. Videos have embedded mini quizzes. If you get one wrong, it gives you feedback about why you misunderstood.
3. You have 2 chances at each of the weekly Tuesday Quizzes, so you can go back to the lecture and retake them.
4. With programming assignments you can see your performance on the training set to see what you're doing wrong!

Why the videos have embedded quizzes: “summative” vs “formative” assessment

Summative assessment
◦ Final exams: goal is grading

Formative assessment
◦ Along the way: goal is for you to find out what you don’t know so you can learn

Why the flipped classroom (2)

Attention span: everyone spaces out during long lectures
◦ Middendorf and Kalish, 1995, Johnstone and Percival 1976, Burns 1985

“the class started 1:00. The student sitting in front of me took copious notes until 1:20. Then he just nodded off… motionless, with eyes shut for about a minute and a half, pen still poised. Then he awoke and continued his rapid note-taking as if he hadn’t missed a beat.”

Student remembered only the first 15-20 minutes

Why the flipped classroom (3)

Active learning: Be in charge of your learning
◦ Obviously most important: programming assignments
◦ Active learning (“constructivism”), learning by doing

Collaborative learning: Learn from each other
◦ Use class time for group activities, worked problems
◦ “Small group active learning”

cs124: Semi-flipped classroom

Lectures on video: I expect you to:
◦ Watch video lectures (and/or read textbook chapters)
◦ On average about 90 minutes of video content each week
◦ Some people watch it speeded up

Some lectures live:
◦ 8 lectures and 1 group session are required (on final exam, no videos)
◦ I will also re-lecture (double-cover) a few of the videos
◦ Some people like the engagement of in-class lectures

In-class group sessions (“active learning”)
◦ Optional but recommended

Logistics More Specifically

Online Video Lectures with embedded quizzes (before class)
Weekly online Review Quizzes (Tue of following week)
Roughly weekly Python homeworks (Fri of following week)
Final Exam (Tuesday March 20, 3:30-6:30)
Class sessions: All encouraged; 8 live lectures required
◦ Full lectures
◦ Mini-lectures
◦ Group worked problems

The Open Platform: EdX!

https://lagunita.stanford.edu/about
http://edx.readthedocs.io/projects/edx-guide-for-students/en/latest/index.html
https://open.edx.org/about-open-edx

Learning Goals

At the end of this course, you will be able to:

Learning goals

Write efficient regular expressions to solve any kind of text-based extraction task

Learning goals

Apply the edit distance algorithm to all sorts of text sequence problems

Learning goals

Build a supervised classifier to solve problems like sentiment classification
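To make "supervised classifier" concrete, here is a minimal sketch of a multinomial Naïve Bayes sentiment classifier with add-one smoothing. The four training reviews, the labels "pos" and "neg", and the function names `train_nb` and `classify` are all assumptions for this illustration; it is not the course's assignment code.

```python
import math
from collections import Counter, defaultdict


def train_nb(examples):
    """Collect per-class document counts, per-class word counts, and the vocabulary."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def classify(model, text):
    """Return the class with the highest log prior plus summed log likelihoods."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are skipped
                count = word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training data for illustration only.
training = [
    ("horrible awful service", "neg"),
    ("terrible food bad waiter", "neg"),
    ("great dessert", "pos"),
    ("sexy food great fries", "pos"),
]
model = train_nb(training)
print(classify(model, "awful waiter"))
print(classify(model, "great fries"))
```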

Learning goals

Build a search engine

Learning goals

Build a recommendation engine
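One simple form of collaborative filtering predicts a user's rating for an item as a similarity-weighted average of other users' ratings for it. The sketch below assumes cosine similarity over raw ratings; the users, items, ratings, and the names `cosine` and `predict_rating` are made up for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse rating dictionaries."""
    shared = set(u) & set(v)
    dot = sum(u[item] * v[item] for item in shared)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0


def predict_rating(ratings, user, item):
    """Similarity-weighted average of other users' ratings for item, or None if nobody rated it."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        similarity = cosine(ratings[user], other_ratings)
        weighted_sum += similarity * other_ratings[item]
        weight_total += abs(similarity)
    return weighted_sum / weight_total if weight_total else None


# Hypothetical ratings on a 1-5 scale.
ratings = {
    "ann": {"fries": 5, "noodles": 4},
    "bob": {"fries": 5, "noodles": 5, "pastry": 4},
    "cat": {"fries": 1, "noodles": 5, "pastry": 1},
}
prediction = predict_rating(ratings, "ann", "pastry")
print(prediction)
```

Practical systems usually subtract each user's mean rating before computing similarity, which this sketch omits for brevity.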

Learning goals

Build a computational model of word meaning (using lexicons and embeddings)

Learning goals

Build a chatbot

Learning goals

Understand and implement PageRank
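A minimal sketch of PageRank by power iteration follows. It assumes a damping factor of 0.85, a fixed number of iterations, and that a page with no outgoing links spreads its rank evenly over all pages; the three-page link graph and the function name `pagerank` are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank scores for a dict mapping each page to the pages it links to."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every page receives the "random jump" share first.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: distribute its rank uniformly.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: A links to B and C, B links to C, C links to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
for page, score in sorted(ranks.items()):
    print(page, round(score, 3))
```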

This class is the undergrad intro to:

Win 2018: cs224N Natural Language Processing w/ Deep Learning
Win 2018: cs246 Mining Massive Data Sets
Spr 2018: cs224U Natural Language Understanding
Aut 2018: cs224W Analysis of Networks
Spr 2019: cs276 Information Retrieval and Web Search
TBD: cs224S Spoken Language Processing

Syllabus

http://web.stanford.edu/class/cs124

Coming up next class (Thursday)

Unix for poets
grep
sort

PA1: Spam Lord!

Write regular expressions to spread evil throughout the galaxy!
By extracting email addresses and phone numbers from the web!
jur a fs ky at st anford dot e d u

Goes live Friday!
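To give a flavor of the task, here is a minimal sketch of one regular expression that catches both plain and lightly obfuscated addresses ("name at domain dot edu"). The pattern, the supported endings (edu, com, org), and the function name `extract_emails` are assumptions for this example; it handles neither subdomains nor the heavily spaced-out address shown on the slide, which is the kind of case the assignment asks you to work out.

```python
import re

# user, then "@" or " at ", then a one-part domain, then "." or " dot ", then the ending.
EMAIL = re.compile(
    r"([A-Za-z0-9._-]+)(?:\s*@\s*|\s+at\s+)"
    r"([A-Za-z0-9-]+)(?:\s*\.\s*|\s+dot\s+)"
    r"(edu|com|org)\b",
    re.IGNORECASE,
)


def extract_emails(text):
    """Return normalized user@domain.tld strings found in text."""
    return [
        f"{user}@{domain}.{tld}".lower()
        for user, domain, tld in EMAIL.findall(text)
    ]


print(extract_emails("Contact jurafsky at stanford dot edu for details"))
print(extract_emails("or write to JURAFSKY@Stanford.EDU"))
```

A pattern this loose will also produce false positives on ordinary prose such as "look at example dot com", so precision and recall need to be balanced.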

Action Items Before Thursday!

1) Read the syllabus web page at cs124.stanford.edu

2) Sign up for piazza and edX
◦ For edX, you'll need to first sign in with your SUnet ID at suclass.stanford.edu, and then click on the EdX button at the top of the cs124.stanford.edu web page

3) Watch the first half of this week’s videos (“Basic Text Processing”) before class!

4) Download this file to your laptop

http://cs124.stanford.edu/nyt_200811.txt.gz