Leveraging Procedural Knowledge for Task-Oriented Search
Zi Yang, Eric Nyberg
Language Technologies Institute, School of Computer Science, Carnegie Mellon University
{ziy,ehn}@cs.cmu.edu

Outline
• Background
• Problem Definition
• Proposed Approach
• Experiment
• Conclusion

Entity-centric vs. Task-oriented Search
• Entity-centric search
– Seek attribute, feature, related entity, action, etc.
• Task-oriented search
– Solution seeking and decision support.
Example: organize a conference
– choose a hotel
– compare banquet options
– recruit volunteers
– contact the publisher
– consider the number and size of conference rooms
– arrange meal catering and menu plan
– check for discounted rate
How do searchers accomplish tasks using interactive search?
• Decompose the task into required subtasks manually
• Formulate queries manually

How do Search Engines Assist Searchers?
• Query suggestion as an example
Entity-centric search (existing solutions)
– Suggest attribute, feature, related entity, action, etc.
– Descriptive knowledge: knowledge of attributes and features
– Descriptive knowledge base
Task-oriented search (problem studied in this work)
– Suggest required subtasks, actions, solutions, etc.
– Procedural knowledge: knowledge exercised in the accomplishment of a task, i.e. how to do things
– Procedural knowledge base

Think Reversely!
• Can we learn procedural knowledge from users’ search activities and/or query suggestions, and build a PKB automatically?
Task-oriented search (problem also studied in this work)
– Suggest required subtasks, actions, solutions, etc.
– Procedural knowledge: knowledge exercised in the accomplishment of a task, i.e. how to do things
– Automatically built PKB (procedural knowledge base)

Related Work
• Search intent & task-oriented search
– Complex search task assistant from query log [Hassan et al. 2012, 2014]
– Task-oriented questions and how-to Web queries [Weber 2012]
– IMine, Subtask Mining @ NTCIR [Liu 2014]
• Procedural knowledge acquisition
– Ontologies proposed for structured representation of procedural knowledge [Fukazawa 2010, Pareti 2014]
– Extraction based on structural information [Jung 2010], definition of rules or templates [Addis 2009]
– Terminology: goal vs. target vs. purpose, instruction vs. action sequence, step vs. action, etc.

Outline
• Background
• Problem Definition
– Terminology
– Problem 1: Search Task Suggestion (STS)
– Problem 2: Automatic Procedural Knowledge Base Construction (APKBC)
– STS and APKBC
• Proposed Approach
• Experiment
• Conclusion

Terminology
Procedural knowledge graph/base (PKB)
Examples: How to Clean a Birdbath, How to Fix a Leaky Faucet
• A task
– A short and concise summary
– A detailed explanation
• Is-achieved-by relation between a parent task and a list of subtasks
– Numbered “Steps”
– Bulleted substeps
– Outgoing free links

Problem 1: Search Task Suggestion (STS)
• When users turn to search engines for information seeking and problem solving, how can existing procedural knowledge be leveraged to suggest sub search tasks (i.e. queries)?
Search Task Suggestion: Given a procedural knowledge graph G and a task-oriented search task q, we aim to
– 1(a) identify the task t from T that the user intends to accomplish
– 1(b) retrieve a list of n subtasks s1, …, sn
– 1(c) suggest the corresponding sub search tasks p1, …, pk
[Diagram: search task q and search tasks p1, …, pk on the task-oriented search side; task t and tasks s1, …, sn on the procedural knowledge base side]

Problem 2: Automatic Procedural Knowledge Base Construction (APKBC)
• Users still face ad hoc situations (tasks) that are not covered by an existing PKB, but other searchers may have interacted with search engines to attempt a solution.
• Can we construct a PKB using search queries and relevant documents returned from search engines?
Automatic Procedural Knowledge Base Construction: Given a task t, we aim to
– 2(a) identify a search task q
– 2(b) collect k related search tasks p1, …, pk
– 2(c) identify n (≤ k) search tasks to generate n tasks s1, …, sn, with text descriptions, that can be performed to accomplish the task t
[Diagram: task t and tasks s1, …, sn on the procedural knowledge base side; search task q and search tasks p1, …, pk on the task-oriented search side]

Outline
• Background
• Problem Definition
• Proposed Approach
– Basic Idea
– Three-way Parallel Corpus Construction
– Feature Definition and Model Construction
• Experiment
• Conclusion

Queryable Phrase/Task Description Extraction: Basic Idea
• Joint learning from available artifacts
Existing PKBs
• Can indicate how to accomplish tasks
• Are not optimized for interactive search
Existing search log
• Can reveal how to formulate queries
• Cannot cover how to search for procedural knowledge
Existing Web documents
• Can exemplify how to describe tasks
• Do not focus on procedure
Can we take advantage of all the artifacts and learn from each other?
• Three-way parallel corpus construction
• Query phrase extraction
• Task description extraction

Three-way Parallel Corpus Construction
• Parallel corpus := a set of matching triples ⟨a query q, a task t, a textual context c⟩
• Example: Grow Taller, http://www.wikihow.com/Grow-Taller

Three-way Parallel Corpus Construction (cont’d)
• Step 1: Extracting seed triples from search query log
– Scan through the entire search query log to find each query q that matches the description of task t.
– Extract the textual content from the top relevant documents to retrieve the context c.
– Exact matching is used in the experiment.
Task descriptions in PKBs (Grow Taller)
• If you’re from a tall family and you’re not growing by your mid-teens, or if your height hasn’t changed much from before puberty or during puberty, then it’s a good idea to see a doctor…
• The human growth hormone (HGH) is produced naturally in our bodies, especially during deep or slow wave sleep. Getting good, sound sleep will encourage the production of HGH, which is created in the pituitary gland.
• …There are tons of “grow taller” exercises on the Internet, which claim to help you grow…
Contexts retrieved from the Web
• …If you’re from a tall family and you’re not growing by your mid-teens, or if your height hasn’t changed much from before puberty to during puberty, then it’s a good idea to see a doctor.
• The growth hormone (HGH) is produced naturally in the pituitary gland during deep or slow wave sleep.
Search queries in a session
• grow taller
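The exact-matching scan in Step 1 can be illustrated with a minimal sketch. It is not the authors' code: the `normalize` and `find_seed_pairs` helpers and the toy data are hypothetical, and the normalization (lowercasing, dropping everything except letters and digits) is an assumption.

```python
import re

def normalize(text: str) -> str:
    """Lowercase and drop everything except letters and digits (assumed normalization)."""
    return re.sub(r"[^a-z0-9]", "", text.lower())

def find_seed_pairs(query_log, task_summaries):
    """Return (query, task summary) pairs whose normalized forms are identical."""
    by_key = {normalize(s): s for s in task_summaries}
    pairs = []
    for q in query_log:
        key = normalize(q)
        if key in by_key:
            pairs.append((q, by_key[key]))
    return pairs

# Toy data, for illustration only.
query_log = ["grow taller", "cheap flights", "Fix a leaky faucet!"]
task_summaries = ["Grow Taller", "Fix a Leaky Faucet", "Clean a Birdbath"]
seed_pairs = find_seed_pairs(query_log, task_summaries)
```

Each matched pair would then be completed into a triple by retrieving a context c for the query.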

Three-way Parallel Corpus Construction (cont’d)
• Step 2 (optional): Manually creating search tasks for tasks in the PKB
– Use the summary of the task t to form a search query q and issue it to the search engine to extract context c.
– Exclude this triple due to “artificiality”!
Search queries in a session
• grow taller

Three-way Parallel Corpus Construction (cont’d)
• Step 3: Collecting related queries
– Combine the user-issued queries from the same session (from Step 1) and the list of queries suggested by the search engine (from Steps 1 and 2).
Search queries in a session
• grow taller
• human growth hormone
• grow taller exercises
• …

Three-way Parallel Corpus Construction (cont’d)
• Step 4: Expanding parallel corpus
– For each related query p, find the subtask among s1, …, sn that contains p in its summary or explanation, and retrieve its context d.
– Discard unmatched related queries or task descriptions.
– Exact matching is used in the experiment.
Search queries in a session
• grow taller
• human growth hormone
• grow taller exercises
• …
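A minimal sketch of the matching in Step 4, assuming case-insensitive substring matching as the "exact matching" rule. The `Subtask` class, the `match_related_queries` helper, and the two subtask summaries are hypothetical and used for illustration only.

```python
from dataclasses import dataclass

@dataclass
class Subtask:
    summary: str
    explanation: str

def match_related_queries(related_queries, subtasks):
    """Map each related query to the first subtask whose text contains it verbatim."""
    matched = {}
    for p in related_queries:
        needle = p.lower()
        for s in subtasks:
            if needle in s.summary.lower() or needle in s.explanation.lower():
                matched[p] = s
                break  # keep the first matching subtask only
    # Queries that match no subtask never enter `matched`, i.e. they are discarded.
    return matched

# Hypothetical subtasks; the summaries are invented for this example.
subtasks = [
    Subtask("See a doctor", "If your height hasn't changed much, it's a good idea to see a doctor."),
    Subtask("Get enough sleep", "The human growth hormone (HGH) is produced during deep sleep."),
]
related = ["human growth hormone", "grow taller shoes"]
matched = match_related_queries(related, subtasks)
```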

Three-way Parallel Corpus Construction (cont’d)
• Step 5: Annotating BIO
– Find the contiguous sequence of words from the task t (context c) that is most relevant to the query q (task t’s summary or explanation).
– Exact matching is used for annotating the task in the experiment.
– For the context, selected the sentences that contain all the tokens in the task summary and 70%+ of the tokens in the task explanation, and annotated the minimal span that contains those overlapping tokens.
Search queries in a session
• grow taller
• human growth hormone
• grow taller exercises
• …
Example label sequences: BQ IQ; BQ IQ IQ; BTE ITE …
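The BIO annotation of a query inside a task description can be sketched as below. This covers only the exact-matching case for query labels (BQ, IQ, O); the `bio_annotate` helper and its case-insensitive comparison are assumptions, not the authors' implementation.

```python
def bio_annotate(tokens, query_tokens, begin="BQ", inside="IQ", outside="O"):
    """Label the first exact occurrence of query_tokens inside tokens with B/I tags."""
    labels = [outside] * len(tokens)
    n = len(query_tokens)
    lowered = [t.lower() for t in tokens]
    target = [t.lower() for t in query_tokens]
    for i in range(len(tokens) - n + 1):
        if n and lowered[i:i + n] == target:
            labels[i] = begin
            for j in range(i + 1, i + n):
                labels[j] = inside
            break  # annotate only the first occurrence
    return labels

tokens = "The human growth hormone is produced naturally".split()
labels = bio_annotate(tokens, ["human", "growth", "hormone"])
```

The same function would produce task summary or explanation labels by passing `begin="BTS", inside="ITS"` or `begin="BTE", inside="ITE"`.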

Feature Definition
• Feature list for both context and task

| Category | Description/Motivation | Count |
|---|---|---|
| Location (LOC) | Appears in the task summary and explanation. “Skimmable information that readers can quickly understand” should be provided in the title and the beginning sentence of each step. | 2 |
| Part of speech (POS) | Both the article title and the first sentence in each step begin with a verb in bare infinitive form. | 36 |
| Parse (PAR) | Basic Stanford dependency types | 50 |
| Parse (PAR) | Named entity, noun phrase, verb phrase. Identify the task facets (subsidiary resources or constraints, etc.) | 3 |
| Word | Surface, stem, TF-IDF score | 3 |
| Context | Surface, stem, TF-IDF score, POS tags of previous/next word | 78 |

Model Construction
• Word sequence labeling for query construction, task summary construction, and task explanation construction
• Problem: all three are word sequence labeling problems.
• Model: M_Q (query construction), M_TS (task summary construction), M_TE (task explanation construction).
• Features: the same feature set, except that location is only used for query construction.
• Training set: features X_t and labels Y_t extracted from the task description (M_Q); features X_c and labels Y_c extracted from the context (M_TS, M_TE).
• Prediction objective:
$$y_t^{*} = \arg\max_{y_t \in \{\mathrm{BQ}, \mathrm{IQ}, \mathrm{O}\}^{|t|}} p(y_t \mid x_t; M_Q)$$
$$y_c^{*} = \arg\max_{y_c \in \{\mathrm{BTS}, \mathrm{ITS}, \mathrm{O}\}^{|c|}} p(y_c \mid x_c; M_{TS})$$
$$y_c^{*} = \arg\max_{y_c \in \{\mathrm{BTE}, \mathrm{ITE}, \mathrm{O}\}^{|c|}} p(y_c \mid x_c; M_{TE})$$
• Output: y_t* = O … O BQ IQ IQ O … O; y_c* = O … O BTS ITS ITS O … O BTE ITE ITE O … O
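Once a model has produced a label sequence such as O … O BQ IQ IQ O … O, the labeled phrase still has to be read off the tokens. A minimal sketch of that decoding step follows; the `extract_spans` helper and the example sentence are hypothetical, and the model itself is not shown.

```python
def extract_spans(tokens, labels, begin, inside):
    """Collect phrases that start with a `begin` tag and continue with `inside` tags."""
    spans, current = [], []
    for token, label in zip(tokens, labels):
        if label == begin:
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif label == inside and current:
            current.append(token)
        else:
            # Any other label (or a dangling `inside`) closes the open span.
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans

tokens = "You should get enough sleep every night".split()
labels = ["O", "O", "BQ", "IQ", "IQ", "O", "O"]
queries = extract_spans(tokens, labels, "BQ", "IQ")
```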

STS and APKBC
Search task suggestion (STS)
• 1(a) identify task t from search task q
• 1(b) retrieve subtasks s1, …, sn
• 1(c) suggest and create sub search tasks p1, …, pk
Automatic procedural knowledge base construction (APKBC)
• 2(a) identify search task q for task t
• 2(b) collect related search tasks p1, …, pk
• 2(c) identify and generate subtasks s1, …, sn
Methods
• Exact matching or retrieval-based method
• Refer to PKB to retrieve related subtasks
• Need a search intent model to retrieve task-oriented search tasks (future work)
• Generate queryable phrases/task descriptions using an algorithm that learns how searchers formulate queries/editors describe procedural knowledge

Outline
• Background
• Problem Definition
• Proposed Approach
• Experiment
– Data Preparation
– Experiment Settings
– Search Task Suggestion Result
– Procedural Knowledge Base Construction Result
• Conclusion

Data Preparation
• English wikiHow data dump
• AOL search query log
• Queries suggested by search engines
• Context extracted from search engines

Experiment Settings
• Sequence labeling vs. end-to-end evaluation

| | Sequence labeling evaluation | End-to-end evaluation |
|---|---|---|
| Gold standard | Automatically labeled parallel corpus | Manual judgment |
| Test set | 10-fold cross validation | 50 randomly sampled triples |
| Evaluation methods | Precision, Recall, F-1, averaged on all test instances (macro-averaged) and on each task then across all tasks (micro-averaged); F-1 based ROUGE-2 and -S4 | Macro-averaged and micro-averaged Precision@8, MAP |
| Baseline methods | CRF (proposed), HMM (surface), LR, SVM, feature ablation | Google, Bing, wikiHow |

Feature extractors and learners: Stanford CoreNLP (sentence, token, stem, POS, dependency parse, chunk, named entity); MALLET (CRF, HMM); LibLinear (LR, SVM)
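The end-to-end metrics can be illustrated with a small sketch. It uses common textbook definitions, which are an assumption here: Precision@8 divides by k even when fewer suggestions are returned, and average precision is normalized by the number of relevant items in the judged list. The macro/micro aggregation described in the settings above is not reproduced, and the judgments are made up.

```python
def precision_at_k(relevance, k=8):
    """Fraction of the top-k suggestions judged relevant (missing ranks count as misses)."""
    top = relevance[:k]
    return sum(top) / k

def average_precision(relevance):
    """Mean of the precision values at each relevant rank within the judged list."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

# Hypothetical binary judgments for one task's eight suggestions.
judgments = [1, 0, 1, 1, 0, 0, 0, 0]
p8 = precision_at_k(judgments)
ap = average_precision(judgments)
```

MAP would then be the mean of `average_precision` over all judged tasks.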

Search Task Suggestion Result
• Query construction result
– The proposed CRF-based approach outperforms other classifiers*, esp. independent classifiers (max. SVM).
– Also outperforms each feature category** (max. W/ WORD), and the LOU study (ns) (max. W/O POS).
[Bar chart: Macro F1, Micro F1, ROUGE-2, ROUGE-S4 (axis 0 to 0.8) for CRF, HMM, SVM, LR, TFIDF, W/ POS, W/ PAR, W/ LOC, W/ WORD, W/O POS, W/O PAR, W/O LOC, W/O WORD, LOCAL, CONTEXT. Data labels in extraction order, series not recoverable: .7471 .6930 .8112 .8087; .6855 .6612 .7922 .7892; .6803 .6175 .7713 .7657; .7466 .6870 .8113 .8082]

Search Task Suggestion Result (cont’d)
• End-to-end example
– Slim down
– Play red alert 2

Task: slim down

| PROPOSED | GOOGLE | BING |
|---|---|---|
| weight loss | slim down diet | the slim down club |
| heavy food | 7 day slim down | how to slim down fast |
| junk food | weight loss | slim down challenge |
| keep up the mood | slim down thighs | how to slim down legs |

Task: play red alert 2

| PROPOSED | GOOGLE | BING |
|---|---|---|
| build a barracks | red alert 2 complete (iso) original 2 disc | play red alert 2 game |
| build a war factory | play red alert 2 free | play ra2 online |
| radar chould | play red alert 2 online free | red alert 2 download |
| build a power plant/tesla reactor | play red alert 3 | free red alert 3 |

Search Task Suggestion Result (cont’d)
• End-to-end evaluation
– Proposed approach is tailored for task-oriented search.
– Current general-purpose commercial search engines are designed for entity-centric search.
– Current search engines tend to suggest queries by appending keywords such as product, image, logo, online, free, etc.

| | Macro P | Micro P | MAP |
|---|---|---|---|
| PROPOSED | .4457 | .4457 | .3361 |
| GOOGLE | .0972 | .0973 | .0553 |
| BING | .0333 | .0313 | .0120 |
| LOG | .0676 | .0612 | .0549 |

Automatic Procedural Knowledge Base Construction Result
• Task summary generation result
– All scores are lower than in the query construction task.
– CRF outperforms other classifiers* (max. SVM), each feature category (ns) (max. W/ POS), and the LOU study (ns) (max. W/O WORD).
[Bar chart: Macro F1, Micro F1, ROUGE-2, ROUGE-S4 (axis 0 to 0.4) for CRF, HMM, SVM, LR, TFIDF, W/ POS, W/ PAR, W/ WORD, W/O POS, W/O PAR, W/O WORD, LOCAL, CONTEXT. Data labels in extraction order, series not recoverable: .4207 .3455 .4463 .4392; .1175 .1119 .2425 .2301; .3556 .3153 .3822 .3788; .4129 .3198 .4170 .4118]

Automatic Procedural Knowledge Base Construction Result (cont’d)
• Task explanation generation result
– CRF outperforms other classifiers* (max. HMM, implying the importance of surface forms and the sequence labeling nature).
– Also outperforms each feature category (ns) (max. W/ WORD).
– LOU study shows W/O PAR performs the best in terms of ROUGE.
[Bar chart: Macro F1, Micro F1, ROUGE-2, ROUGE-S4 (axis 0 to 0.4) for CRF, HMM, SVM, LR, TFIDF, W/ POS, W/ PAR, W/ WORD, W/O POS, W/O PAR, W/O WORD, LOCAL, CONTEXT. Data labels in extraction order, series not recoverable: .3853 .3577 .3698 .3686; .0000 .0050 .2450 .2324; .3639 .3176 .3489 .3472; .3718 .3468 .3804 .3793]

Automatic Procedural Knowledge Base Construction Result (cont’d)
• End-to-end example
– Search engine would suggest “sign up for airbnb coupon” for “sign up for airbnb”, which implies an important resource for the task.
Task: sign up for airbnb
– Airbnb is no longer running the $50 OFF $200 promo but you can still save $25 OFF Your First Airbnb Stay of $75 or more by copying and pasting this link into your browser…
Task: make blueberry banana bread
– Please don’t use regular whole wheat in this recipe – the loaf will turn out very dense
– Add the wet ingredients – the egg mixture to the flour mixture and stir with a rubber spatula until just combined
– If you’re in need of a quick, easy and delicious way to use up the ripe bananas in your house… definitely
Task: become a cell phone dealer
– However, the cell phone provider may place restrictions on the manner in which you can use its company name, phone brands and images
– Visit the state’s business licensing agency’s website and your city’s occupational/business licensing department’s website to determine if you need a license for your prepaid cell phone business

Automatic Procedural Knowledge Base Construction Result (cont’d)
• End-to-end evaluation
– The automatic approach performs worse than manual curation in building a new PKB from scratch.
– But it still discovers relevant subtasks that are not covered in the current PKB, delivering fresh information that a manual process can hardly add or update instantly.

| | Macro P | Micro P | MAP |
|---|---|---|---|
| Proposed Summary Generation | .0997 | .0995 | .0527 |
| Proposed Explanation Generation | .2046 | .2041 | .1331 |
| wikiHow | .9677 | .9515 | .9404 |

Outline
• Background
• Problem Definition
• Proposed Approach
• Experiment
• Conclusion

Conclusion
• Investigated two problems
– Search task suggestion using procedural knowledge
– Automatic procedural knowledge base construction from search activities
• Proposed to create a three-way parallel corpus of queries, query contexts, and task descriptions.
• Applied CRF-based sequence labeling models for query construction and task description generation.
• Future work
– User study
– Joint ranking
– APKBC using a natural language generation approach

Thanks! Questions?
Code & Resources: http://github.com/ziy/pkb
Related Workshop Talk: Answering Task-Oriented Questions from the Web, WebQA Workshop, Thursday 11am
Contact: Zi Yang, Language Technologies Institute, School of Computer Science, Carnegie Mellon University, ziy@cs.cmu.edu
Acknowledgement: Travel is sponsored by SIGIR Student Travel Grant!

Parallel Corpus Construction Result
• Related query to subtask mapping
– Identified 1,182 query-task pairs using exact matching.
• Task to context mapping
– Selected the sentences that contain all the tokens in the task summary and 70%+ of the tokens in the task explanation.
– Annotated the minimal span that contains those overlapping tokens.

How Do Search Engines and Users Respond to Task-Oriented Queries?
• The number (and percentage) of suggested queries (or queries issued in the same session) that are mentioned within the description of some subtask.
– “New Words”: e.g. slim down -> slim down diet
– Low quality may be due to an over-simplified session detection method
[Two bar charts comparing Bing and Log for “Full phrase” and “New words”: averaged number (axis 0 to 0.8) and percentage (axis 0 to 10%)]

Search Task Suggestion
Given a task-oriented search task represented by query q
(a) Identify task
– Retrieve a list of candidate tasks from the PKB that mention the query q in either the summary or explanation.
– Select the task t that maximizes the likelihood of each candidate occurrence, i.e. $p(y_t = \mathrm{BQ\ IQ \ldots IQ} \mid x_t; M_Q)$.
(b) Retrieve subtasks
– Retrieve the first-level subtasks s1, …, sn of task t.
(c) Suggest and create sub search task
– Extract query candidates for each subtask si using M_Q again.
– Rank by $p(y_{s_i} = \mathrm{BQ\ IQ \ldots IQ} \mid x_{s_i}; M_Q)$.
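The control flow of steps (a) to (c) can be sketched as follows. This is an illustration under strong assumptions, not the authors' system: the `Task` class and `suggest` function are hypothetical, `toy_score` stands in for the M_Q likelihood, and the sketch returns subtask summaries rather than query phrases extracted by a trained model.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Task:
    summary: str
    explanation: str = ""
    subtasks: List["Task"] = field(default_factory=list)

def suggest(query: str, pkb: List[Task], score: Callable[[Task], float]) -> List[str]:
    """Steps (a)-(c), with `score` standing in for the M_Q likelihood (an assumption)."""
    q = query.lower()
    # (a) candidate tasks mention the query in their summary or explanation
    candidates = [t for t in pkb if q in t.summary.lower() or q in t.explanation.lower()]
    if not candidates:
        return []
    task = max(candidates, key=score)
    # (b) take the first-level subtasks, (c) rank them by the stand-in score
    ranked = sorted(task.subtasks, key=score, reverse=True)
    return [s.summary for s in ranked]

def toy_score(task: Task) -> float:
    """Hypothetical scorer: shorter summaries score higher."""
    return 1.0 / len(task.summary.split())

# Toy PKB; the subtask summaries are invented for this example.
pkb = [
    Task("Grow Taller", subtasks=[Task("See a doctor"), Task("Get good, sound sleep")]),
    Task("Grow Taller Naturally and Quickly", subtasks=[Task("Eat well")]),
]
suggestions = suggest("grow taller", pkb, toy_score)
```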

Automatic Procedural Knowledge Base Construction
Given a task t,
(a) Identify search task
– Apply M_Q to extract a task-oriented search query q.
(b) Collect related search tasks
– Identify the queries pi related to q in both search logs and suggested queries.
(c) Identify and generate subtasks
– Extract relevant document snippets for each related query pi from search engines.
– Apply M_TS/M_TE to extract task summary and explanation.
Notes
• Search engines are able to correctly suggest related tasks to the user, rather than related entities or attributes.
• Search logs reveal how a specific user works to accomplish a task.

Data Preparation
• English wikiHow data dump
– Used a modified version of the WikiTeam tool.
– Obtained 149,975 articles that are non-redirect, in namespace “0”, non-stub, with “Introduction” and “Steps”.
– Created a PKB of 1,488,587 tasks, 1,439,217 relations.
• AOL search query log
– 21M (10M unique) queries in total.
– After lowercasing and removing non-alphanumeric characters, 639 unique queries match 619 task summaries (ignoring whitespace and punctuation marks).
– Identified 33,548 related query candidates by collecting the queries that were issued by the same user within 30 minutes after each matching query was issued.
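The 30-minute rule for related query candidates can be sketched as below. The log format (user id, timestamp, query), the `related_queries` helper, and the sample entries are hypothetical; only the "same user, within 30 minutes after the matching query" rule comes from the slide.

```python
from datetime import datetime, timedelta

def related_queries(log, matching_query, window=timedelta(minutes=30)):
    """Collect queries a user issued within `window` after issuing matching_query.

    `log` is a list of (user_id, timestamp, query) tuples (hypothetical format).
    """
    related = []
    for user, start, query in log:
        if query != matching_query:
            continue
        for other_user, ts, other in log:
            same_user = other_user == user
            in_window = start < ts <= start + window
            if same_user and in_window and other != matching_query:
                related.append(other)
    return related

# Made-up log entries for illustration.
log = [
    ("u1", datetime(2006, 3, 1, 10, 0), "grow taller"),
    ("u1", datetime(2006, 3, 1, 10, 5), "human growth hormone"),
    ("u1", datetime(2006, 3, 1, 11, 0), "cheap flights"),
    ("u2", datetime(2006, 3, 1, 10, 2), "grow taller exercises"),
]
found = related_queries(log, "grow taller")
```

The nested scan is quadratic in the log size; a real pass over millions of queries would group entries by user and sort by time first.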

Data Preparation (cont’d)
• Queries suggested by search engines
– Randomly sampled 1,000 non-primitive tasks from the PKB that do not appear in the query log.
– Collected 9,906 related queries suggested by Google (avg. 6.11, max. 8) and 9,715 (avg. 5.99, max. 13) related queries suggested by Bing for the 1,639 queries.
• Context extracted from search engines
– Extracted URLs from Google’s first search result page and excluded the wikihow.com domain (for generalizability), the google.com domain, and URLs that have no subpaths (navigational search results), and downloaded 7,440 context documents.
– Used Boilerpipe to extract 7,437 documents as contexts, and an additional 3,512 documents for end-to-end evaluation.
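The URL exclusion rules above can be sketched with the standard library. The `keep_url` helper and the sample URLs are hypothetical, and treating subdomains such as www.wikihow.com as part of the excluded domain is an assumption.

```python
from urllib.parse import urlparse

EXCLUDED_DOMAINS = ("wikihow.com", "google.com")

def keep_url(url: str) -> bool:
    """Return True if the URL passes the domain and subpath filters."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if any(host == d or host.endswith("." + d) for d in EXCLUDED_DOMAINS):
        return False
    # A bare domain such as http://example.com/ has no subpath and is dropped.
    return parsed.path.strip("/") != ""

# Sample URLs for illustration only.
urls = [
    "http://www.wikihow.com/Grow-Taller",
    "https://www.google.com/search?q=grow+taller",
    "http://example.com/",
    "http://example.org/health/grow-taller.html",
]
kept = [u for u in urls if keep_url(u)]
```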

Search Task Suggestion Result (cont’d)
• 5 most contributing non-word features
– Query phrases are more likely extracted from the summary part of a description due to its clarity and conciseness.
– Singular nouns and verbs are indicators to begin a query.
– Verb phrase is used to decide whether to continue a query.

| Rank | O → BQ | BQ → IQ | IQ → IQ |
|---|---|---|---|
| 1 | POS: NNP | POS-1: VB | LOC: sum |
| 2 | LOC: sum | LOC: sum | POS-1: IN |
| 3 | DEP: ccomp | POS-1: VBP | VP |
| 4 | POS: VB | POS-1: NNP | DEP: dobj |
| 5 | DEP: nsubjpass | POS-1: NN | POS+1: JJ |

Automatic Procedural Knowledge Base Construction Result (cont’d)
• 5 most contributing non-word features
– Nouns and verbs are crucial for constructing task descriptions.
– Verbs are preferred over nouns to begin the summary.
– To begin an explanation, it prefers the “begin” of a sentence and/or a dependency label of nsubj.
– Verb phrases are also important.

| Rank | Summary: O → BTS | Summary: BTS → ITS | Summary: ITS → ITS | Explanation: O → BTE | Explanation: BTE → ITE | Explanation: ITE → ITE |
|---|---|---|---|---|---|---|
| 1 | POS: VB | POS-1: VB | POS-1: VBP | Begin | VP | POS-1: NN |
| 2 | POS: VBP | POS-1: VBP | POS: NNP | POS: VBG | POS-1: NN | VP |
| 3 | POS: NN | POS-1: NNP | POS-1: IN | POS: NN | POS-1: DT | POS-1: NNS |
| 4 | DEP: appos | POS-1: NN | DEP: xcomp | DEP: compound | NP | POS-1: , |
| 5 | POS: NNP | DEP: case | POS: JJR | DEP: nsubj | POS-1: VB | POS-1: NNP |