Understanding Short Texts - Part II: Explicit Representation


Part II: Explicit Representation for Short Text Understanding

Zhongyuan Wang (Microsoft Research)

Haixun Wang (Facebook Inc.)

Tutorial Website: http://www.wangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/

What is explicit understanding?

Add Common Sense to Computing

[Slide examples: Pablo Picasso (25 Oct 1881, Spanish); the "kiki" vs. "bouba" experiment (which is "kiki" and which is "bouba"? sound vs. shape, zigzaggedness); conceptualization examples: China, India, Brazil -> country, emerging market; engineer, apple -> IT company; "The [engineer] is eating an [apple (fruit)]"; body, taste, smell -> attributes of wine.]

Outline

• Knowledge Bases

• Explicit Representation Models

• Applications

Linguistic Linked Open Data Cloud: http://linghub.lider-project.eu/llod-cloud

Cyc

[Diagram: short text -> understanding (internal representation) -> answer, supported by knowledge.]

1. "Python Tutorial" (needs linguistic / common sense knowledge)

2. "Who was the US President when the Angels won the World Series?" (needs encyclopedia knowledge)

Common Sense Knowledge vs. Encyclopedia Knowledge

Common Sense Knowledge Base:
• Common sense / linguistic knowledge among terms
• Relations: isA, isPropertyOf, co-occurrence, …
• Typicality, basic level of categorization
• Examples: WordNet, KnowItAll, NELL, Probase, …

Encyclopedia Knowledge Base:
• Entities / facts
• Relations: DayOfBirth, LocatedIn, SpouseOf, …
• Black-or-white precision
• Examples: Freebase, Yago, DBPedia, Google Knowledge Graph, …

Special cases

WordNet [Stark et al 1998]

• WordNet® is a large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept.

• S: (n) China, People's Republic of China, mainland China, Communist China, Red China, PRC, Cathay (a communist nation that covers a vast territory in eastern Asia; the most populous country in the world)

• The project began in the Princeton University Department of Psychology and is currently housed in the Department of Computer Science.

• Homepage: http://wordnet.princeton.edu/wordnet/about-wordnet/
• Download: http://wordnet.princeton.edu/wordnet/download/

Brief Introduction

Statistics

Sample

Authors

URLs

POS        Unique Strings   Synsets   Total Word-Sense Pairs

Noun       117,798          82,115    146,312
Verb       11,529           13,767    25,047
Adjective  21,479           18,156    30,002
Adverb     4,481            3,621     5,580
Totals     155,287          117,659   206,941

KnowItAll: Extract high-quality knowledge from the Web [Banko et al. 2007, Etzioni et al. 2011]

• OpenIE distills semantic relations from Web-scale natural language texts

• TextRunner -> ReVerb -> Open IE, part of KnowItAll

• Yielding over 5 billion extractions from over a billion web pages

• From "US president Barack Obama gave his inaugural address on January 20, 2013"
  To (Barack Obama, is president of, US)
     (Barack Obama, gave, [his inaugural address, on January 20, 2013])

• OpenIE v4.1.3 has been released

• Turing Center at the University of Washington

• http://openie.allenai.org
• http://reverb.cs.washington.edu

Brief Introduction

Statistics

Sample

News

Authors

URLs

NELL Never-Ending Language Learning [Carlson et al 2010]

• NELL is a research project that attempts to create a computer system that learns over time to read the web (since January 2010)

• Over 50 million candidate beliefs acquired by reading the web, held at different levels of confidence

• Out of the 50 million, NELL has high confidence in 2,817,156 beliefs

Brief Introduction

Statistics

Sample

NELL Never-Ending Language Learning

• It is continually learning facts from the web. Resources are publicly available

• NELL research team at CMU

• Homepage: http://rtw.ml.cmu.edu/rtw/
• Download: http://rtw.ml.cmu.edu/rtw/resources

News

Authors

URLs

Probase [Wu et al 2012]

• Probase is a semantic network to make machines "aware" of the mental world of human beings, so that machines can better understand human communication

Brief Introduction

Probase network

Nodes: Concepts ("Spanish Artists"), Entities ("Pablo Picasso"), Attributes ("Birthday"), Verbs/Adjectives ("Eat", "Sweet")

Edges: isA (concept-entity), isPropertyOf (attributes), co-occurrence (isCEOof, LocatedIn, etc.)

• 5,401,933 unique concepts
• 12,551,613 unique instances
• 87,603,947 isA relations

countries Basic watercolor techniques

Celebrity wedding dress designers

Probase

• Microsoft Research

• Public release coming soon in Aug/Sept 2016
• Project homepage: http://research.microsoft.com/probase

Concepts

Authors

URLs

Probase isA error rate: < 1.1%, and < 10% for random pairs

Freebase [Bollacker et al 2008]

• Freebase is a well-known collaborative knowledge base consisting of data composed mainly by its community

• Freebase contains more than 23 million entities
• Freebase contains 1.9 billion triples
• Each triple is organized in the form of

<subject> <predicate> <object>

Brief Introduction

Statistics

• Freebase is a collection of facts
• Freebase only contains nodes and links
• Freebase is a labeled graph

Freebase -> Wikidata

• Freebase data was integrated into Wikidata
• The Freebase API will be completely shut down on Aug 31, 2016, replaced by the Google Knowledge Graph API

• Freebase Community

• Homepage: http://wiki.freebase.com/wiki/Main_Page
• Download: https://developers.google.com/freebase/
• Wikidata: https://www.wikidata.org

News

Authors

URLs

Google Knowledge Graph

• The Knowledge Graph is a knowledge base used by Google to enhance its search engine's search results with semantic-search information gathered from a wide variety of sources

• 570 million objects and more than 18 billion facts about relationships between different objects

• Google Inc.

• Homepage: https://www.google.com/intl/es419/insidesearch/features/search/knowledge.html

Brief Introduction

Statistics

Sample

Authors

URLs

YAGO [Suchanek et al 2007]

• YAGO is a huge semantic knowledge base, derived from GeoNames, WordNet and Wikipedia (10 Wikipedias in different languages)

• More than 10 million entities (persons, organizations, cities, etc.)
• More than 120 million facts about entities
• More than 35,000 classes assigned to entities
• Many of its facts and entities have a temporal dimension and a spatial dimension attached

Brief Introduction

Sample: <Albert_Einstein> <isMarriedTo> <Elsa_Einstein>

Statistics

YAGO

News
• An evaluated version of YAGO3 (combining information from Wikipedias in different languages) was released [15 Sep 2015]

Authors
• Max Planck Institute for Informatics in Saarbrücken, Germany, and the DBWeb group at Télécom ParisTech University

URLs
• Homepage: http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/
• Download: http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/downloads/

Outline

• Knowledge Bases

• Explicit Representation Models

• Applications

Statistics of Search Queries

(a) By traffic: 1 term 44%, 2 terms 29%, 3 terms 17%, 4 terms 7%, 5 terms 2%, more than 5 terms 1%

(b) By # of distinct queries: 1 term 10%, 2 terms 26%, 3 terms 34%, 4 terms 19%, 5 terms 7%, more than 5 terms 4%

[Pie charts: number of instances per query (1 instance, 2 instances, ..., more than 5 instances); multi-word instance examples: Pokémon Go, Microsoft HoloLens]

If the short text is a single instance…

• Python
• Microsoft
• Apple
• …

Single Instance Understanding

• Is this instance ambiguous?

• What are its basic-level concepts?

• What are its similar instances?

Word Ambiguity
• Word sense disambiguation relies on dictionaries (e.g., WordNet)

Take a seat on this chair.

The chair of the Math Department.

Instance Ambiguity

• Instance sense disambiguation: extra knowledge needed

I have an apple pie for lunch.

He bought an apple ipad.

Here "apple" is a proper noun.

Ambiguity [Hua et al. 2016]

• Many instances are ambiguous

• Intuition: ambiguous instances have multiple senses

short text            instance       sense

population china      china          china (country)
glass vs. china       china          china (fragile item)
pear apple            apple          apple (fruit)
microsoft apple       apple          apple (company)
read harry potter     harry potter   harry potter (book)
watch harry potter    harry potter   harry potter (movie)
age of harry potter   harry potter   harry potter (character)

Pre-definition for Ambiguity (1): Sense [Hua et al. 2016]

• What is a Sense in semantic networks?
• A sense is a hierarchy of concept clusters

[Example hierarchies: region (country, state, city); creature (animal, predator); crop / food (fruit, vegetable, meat); entity example: Germany]

Pre-definition for Ambiguity (2): Concept Cluster [Li et al. 2013, Li et al. 2015]

• What is a Concept Cluster (CL)?
• Cluster similar concepts into a concept cluster using a K-Means-like approach (k-Medoids)

Fruit cluster: Fruit, Fresh fruit, Juice, Tropical fruit, Berry, Exotic fruit, Seasonal fruit, Fruit juice, Citrus fruit, Soft fruit, Dry fruit, Wild fruit, Local fruit, …

Company cluster: Company, Client, Firm, Manufacturer, Corporation, large company, Rival, Giant, big company, local company, large corporation, international company, …

Definitions of Instance Ambiguity [Hua et al. 2016]

• 3 levels of instance ambiguity

• Level 0: unambiguous
  • Contains only 1 sense
  • E.g., dog (animal), beijing (city), potato (vegetable)

• Level 1: unambiguous and ambiguous both make sense
  • Contains 2 or more senses, but these senses are related
  • E.g., google (company & search engine), french (language & country), truck (vehicle & public transport service)

• Level 2: ambiguous
  • Contains 2 or more senses, and the senses are very different from each other
  • E.g., apple (fruit & company), jaguar (animal & company), python (animal & language)

Ambiguity Score

• Use the top-2 senses to calculate the ambiguity score:

score =
  0,                                                      if level = 0
  w(s2|e) / w(s1|e) * (1 - similarity(s1, s2)),           if level = 1
  1 + w(sc2|e) / w(sc1|e) * (1 - similarity(sc1, sc2)),   if level = 2

Denote the top-2 senses as s1 and s2, and the top-2 sense clusters as sc1 and sc2. Denote the similarity of two sense clusters as the maximum similarity of their senses:

  similarity(sc1, sc2) = max similarity(si ∈ sc1, sj ∈ sc2)

For an entity e, denote the weight (popularity) of a sense si as the sum of the weights of its concept clusters:

  w(si|e) = w(Hi|e) = Σ_{CLj ∈ Hi} P(CLj|e)

For an entity e, denote the weight (popularity) of a sense cluster sci as the sum of the weights of its senses:

  w(sci|e) = Σ_{sj ∈ sci} w(sj|e)
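To make the scoring concrete, here is a minimal Python sketch of the ambiguity score. The data layout is hypothetical (an entity's senses as concept-cluster weight maps, plus a cluster-similarity function), and relatedness of the top-2 senses is approximated with a simple threshold, which is a simplification of the level definitions above.

```python
# Minimal sketch of the ambiguity score (hypothetical data layout, not the authors' code).
# senses: {sense_id: {concept_cluster: P(cluster|e)}}; cluster_sim(c1, c2) returns a
# similarity between two concept clusters.

def sense_weight(sense):                      # w(s_i|e) = sum of its cluster weights
    return sum(sense.values())

def ambiguity_score(senses, cluster_sim, related_threshold=0.5):
    if len(senses) < 2:                       # level 0: a single sense -> unambiguous
        return 0.0
    ranked = sorted(senses.values(), key=sense_weight, reverse=True)
    s1, s2 = ranked[0], ranked[1]             # top-2 senses by weight
    # similarity of two senses = max similarity over their concept-cluster pairs
    sim = max(cluster_sim(c1, c2) for c1 in s1 for c2 in s2)
    ratio = sense_weight(s2) / sense_weight(s1)
    if sim >= related_threshold:              # treated as level 1: related senses
        return ratio * (1 - sim)
    return 1.0 + ratio * (1 - sim)            # treated as level 2: distinct senses

# Example: "apple" with a fruit sense and a company sense
senses = {"fruit":   {"fruit": 0.45, "food": 0.09},
          "company": {"company": 0.20, "brand": 0.07}}
print(ambiguity_score(senses, lambda a, b: 0.1))   # level 2 -> score > 1
```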

Examples

• Level 0
  • california: country, state, city, region, institution (0.943)
  • fruit: food, product, snack, carbs, crop (0.827)
  • alcohol: substance, drug, solvent, food, addiction (0.523)
  • computer: device, product, electronics, technology, appliance (0.537)
  • coffee: beverage, product, food, crop, stimulant (0.73)
  • potato: vegetable, food, crop, carbs, product (0.896)
  • bean: food, vegetable, crop, legume, carbs (0.801)

Examples (cont.)

• Level 1
  • nike, score = 0.034
    • company, store: 0.861
    • brand: 0.035
    • shoe, product: 0.033
  • twitter, score = 0.035
    • website, tool: 0.612
    • network: 0.165
    • application: 0.033
    • company: 0.031
  • facebook, score = 0.037
    • website, tool: 0.595
    • network: 0.17
    • company: 0.053
    • application: 0.029
  • yahoo, score = 0.38
    • search engine: 0.457
    • company, provider, account: 0.281
    • website: 0.0656
  • google, score = 0.507
    • search engine: 0.46
    • company, provider, organization: 0.377
    • website: 0.0449

Examples (cont.)

• Level 2
  • jordan, score = 1.02
    • country, state, company, regime: 0.92
    • shoe: 0.02
  • fox, score = 1.09
    • animal, predator, species: 0.74
    • network: 0.064
    • company: 0.035
  • puma, score = 1.15
    • brand, company, shoe: 0.655
    • species, cat: 0.116
  • gold, score = 1.21
    • metal, material, mineral resource, mineral: 0.62
    • color: 0.128

Examples (cont.)

• Level 2
  • soap, score = 1.22
    • product, toiletry, substance: 0.49
    • technology, industry standard: 0.11
  • silver, score = 1.24
    • metal, material, mineral resource, mineral: 0.638
    • color: 0.156
  • python, score = 1.29
    • language: 0.667
    • snake, animal, reptile, skin: 0.193
  • apple, score = 1.41
    • fruit, food, tree: 0.537
    • company, brand: 0.271

Single Instance

• Is this instance ambiguous?

• What are its basic-level concepts?

• What are its similar instances?

A Concept View of "Microsoft"

[Concept graph: "Microsoft" linked to concepts such as company, software company, international company, technology leader, largest desktop OS vendor. Which of company, software company, or largest desktop OS vendor best names it?]

Basic-level Conceptualization (BLC) [Rosch et al. 1976]

[Examples: KFC, BMW and their basic-level concepts]

Basic-level conceptualization

How to Make BLC?

• Naive approaches
  • Typicality: an important measure for understanding the relationship between an object and its concept
  • Pointwise Mutual Information (PMI): a common measure of the strength of association between two terms

Naive Approach 1: Typicality

P(robin | bird) > P(penguin | bird): "robin" is a more typical bird than a "penguin"

P(USA | country) > P(Seychelles | country): "USA" is a more typical country than "Seychelles"

Using Typicality for BLC

• Associate each isA relationship (e isA c) with typicality scores P(e|c) and P(c|e):

  P(e|c) = n(c, e) / n(c)        P(c|e) = n(c, e) / n(e)

• P(e|c) indicates how typical (or popular) e is in the given concept c

• P(c|e) indicates how typical (or popular) the concept c is given e

• However, for Microsoft: "company" has high typicality P(c|e), while "largest desktop OS vendor" has high typicality P(e|c); neither measure alone gives the basic-level concept

Naive Approach 2: PMI [Manning and Schutze 1999]

• Pointwise mutual information (PMI) is a measure of association used in information theory and statistics

• Consider using the PMI between concept c and instance e to find the basic-level concepts, as follows:

  PMI(e, c) = log [ P(e, c) / (P(e) P(c)) ] = log P(e|c) - log P(e)

• However:
  • In basic-level categorization we are interested in finding a concept for a given e, which means P(e) is a constant
  • Thus ranking by PMI(e, c) is the same as ranking by P(e|c)

Using Rep(e, c) for BLC [Wang et al. 2015b]

• The measure Rep(e, c) = P(c|e) * P(e|c) means:
  • Given e, c should be its typical concept (shortest distance)
  • Given c, e should be its typical instance (shortest distance)

• (With PMI) If we take the logarithm of the scoring function, we get:

  log Rep(e, c) = log [ P(c|e) * P(e|c) ]
                = log [ P(e, c)/P(e) * P(e, c)/P(c) ]
                = log [ P(e, c)^2 / (P(e) P(c)) ]
                = PMI(e, c) + log P(e, c)
                = PMI^2(e, c)

• (With Commute Time) The commute time between an instance e and a concept c is:

  Time(e, c) = Σ_{k=1}^{∞} 2k * P_k(e, c)
             = Σ_{k=1}^{T} 2k * P_k(e, c) + Σ_{k=T+1}^{∞} 2k * P_k(e, c)
             ≥ Σ_{k=1}^{T} 2k * P_k(e, c) + 2(T+1) * (1 - Σ_{k=1}^{T} P_k(e, c))
             = 4 - 2 * Rep(e, c)    (taking T = 1, with P_1(e, c) = P(c|e) P(e|c) = Rep(e, c))

• BLC is thus a process of finding concept nodes having the shortest expected distance to e
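A minimal sketch of ranking candidate basic-level concepts by Rep(e, c) = P(c|e) * P(e|c) follows. The isA co-occurrence counts n(c, e) are hypothetical; this only illustrates the measure, not the authors' implementation.

```python
# Sketch of scoring candidate concepts for basic-level conceptualization (illustrative).
# isa_counts[(e, c)] holds n(c, e), the count of instance e observed under concept c.
from collections import defaultdict

def blc_rank(e, isa_counts, top_k=3):
    n_e = sum(n for (ei, _), n in isa_counts.items() if ei == e)          # n(e)
    n_c = defaultdict(int)
    for (_, c), n in isa_counts.items():
        n_c[c] += n                                                       # n(c)
    scores = {}
    for (ei, c), n in isa_counts.items():
        if ei != e:
            continue
        p_c_given_e = n / n_e            # how typical concept c is for e
        p_e_given_c = n / n_c[c]         # how typical e is within concept c
        scores[c] = p_c_given_e * p_e_given_c        # Rep(e, c)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

counts = {("microsoft", "company"): 1000, ("microsoft", "software company"): 400,
          ("microsoft", "largest desktop OS vendor"): 50,
          ("google", "company"): 900, ("apple", "company"): 950}
print(blc_rank("microsoft", counts))
# "software company" ranks first here: both P(c|e) and P(e|c) are non-trivial for it
```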

Evaluations on Different Measures for BLC

Precision@k (k = 1, 2, 3, 5, 10, 15, 20)

No smoothing
  MI(e)              0.769  0.692  0.705  0.685  0.719  0.705  0.690
  PMI3(e)            0.885  0.769  0.756  0.800  0.754  0.733  0.721
  NPMI(e)            0.692  0.692  0.667  0.638  0.627  0.610  0.610
  Typicality P(c|e)  0.462  0.577  0.603  0.577  0.569  0.564  0.556
  Typicality P(e|c)  0.500  0.462  0.526  0.523  0.523  0.510  0.521
  Rep(e)             0.846  0.865  0.872  0.862  0.758  0.731  0.719

Smoothing = 0.001
  MI(e)              0.577  0.615  0.628  0.600  0.612  0.605  0.592
  PMI3(e)            0.731  0.673  0.692  0.654  0.669  0.644  0.623
  NPMI(e)            0.923  0.827  0.769  0.746  0.731  0.695  0.671
  Typicality P(c|e)  0.462  0.577  0.603  0.577  0.569  0.564  0.554
  Typicality P(e|c)  0.885  0.865  0.872  0.831  0.785  0.741  0.704
  Rep(e)             0.846  0.731  0.718  0.723  0.700  0.669  0.638

Smoothing = 0.0001
  MI(e)              0.615  0.615  0.654  0.608  0.635  0.628  0.612
  PMI3(e)            0.846  0.731  0.731  0.715  0.723  0.685  0.677
  NPMI(e)            0.885  0.904  0.885  0.869  0.823  0.777  0.752
  Typicality P(c|e)  0.462  0.577  0.603  0.577  0.569  0.564  0.556
  Typicality P(e|c)  0.885  0.904  0.910  0.877  0.831  0.813  0.777
  Rep(e)             0.923  0.846  0.833  0.815  0.781  0.736  0.719

Smoothing = 1e-5
  MI(e)              0.615  0.635  0.667  0.662  0.677  0.656  0.646
  PMI3(e)            0.885  0.769  0.744  0.777  0.758  0.731  0.710
  NPMI(e)            0.885  0.846  0.872  0.869  0.831  0.810  0.787
  Typicality P(c|e)  0.462  0.577  0.603  0.577  0.569  0.564  0.556
  Typicality P(e|c)  0.769  0.808  0.846  0.823  0.808  0.782  0.765
  Rep(e)             0.885  0.904  0.872  0.862  0.812  0.800  0.767

Smoothing = 1e-6
  MI(e)              0.769  0.673  0.705  0.677  0.700  0.692  0.679
  PMI3(e)            0.885  0.769  0.756  0.785  0.773  0.726  0.723
  NPMI(e)            0.885  0.846  0.821  0.815  0.750  0.726  0.719
  Typicality P(c|e)  0.462  0.577  0.603  0.577  0.569  0.564  0.556
  Typicality P(e|c)  0.538  0.615  0.615  0.615  0.608  0.613  0.615
  Rep(e)             0.846  0.885  0.897  0.877  0.788  0.777  0.765

Smoothing = 1e-7
  MI(e)              0.769  0.692  0.705  0.685  0.719  0.703  0.688
  PMI3(e)            0.885  0.769  0.756  0.792  0.758  0.736  0.725
  NPMI(e)            0.769  0.750  0.718  0.700  0.650  0.641  0.633
  Typicality P(c|e)  0.462  0.577  0.603  0.577  0.569  0.564  0.556
  Typicality P(e|c)  0.500  0.481  0.526  0.523  0.531  0.523  0.523
  Rep(e)             0.846  0.865  0.872  0.854  0.765  0.749  0.733

NDCG@k (k = 1, 2, 3, 5, 10, 15, 20)

No smoothing
  MI(e)              0.516  0.531  0.519  0.531  0.562  0.574  0.594
  PMI3(e)            0.725  0.664  0.652  0.660  0.628  0.631  0.646
  NPMI(e)            0.599  0.597  0.579  0.554  0.540  0.539  0.549
  Typicality P(c|e)  0.297  0.380  0.409  0.422  0.438  0.446  0.461
  Typicality P(e|c)  0.401  0.386  0.396  0.398  0.401  0.410  0.428
  Rep(e)             0.758  0.771  0.745  0.723  0.656  0.647  0.661

Smoothing = 1e-3
  MI(e)              0.374  0.414  0.441  0.448  0.473  0.481  0.495
  PMI3(e)            0.484  0.511  0.509  0.502  0.519  0.525  0.533
  NPMI(e)            0.692  0.652  0.607  0.603  0.585  0.585  0.592
  Typicality P(c|e)  0.297  0.380  0.409  0.422  0.438  0.446  0.460
  Typicality P(e|c)  0.703  0.697  0.704  0.681  0.637  0.628  0.626
  Rep(e)             0.621  0.580  0.554  0.561  0.554  0.555  0.559

Smoothing = 1e-4
  MI(e)              0.407  0.430  0.458  0.462  0.492  0.503  0.512
  PMI3(e)            0.648  0.604  0.579  0.575  0.578  0.576  0.590
  NPMI(e)            0.747  0.777  0.761  0.737  0.700  0.685  0.688
  Typicality P(c|e)  0.297  0.380  0.409  0.422  0.438  0.446  0.461
  Typicality P(e|c)  0.791  0.795  0.802  0.767  0.738  0.729  0.724
  Rep(e)             0.758  0.714  0.711  0.689  0.653  0.636  0.653

Smoothing = 1e-5
  MI(e)              0.429  0.465  0.478  0.501  0.517  0.528  0.545
  PMI3(e)            0.725  0.647  0.642  0.642  0.627  0.624  0.638
  NPMI(e)            0.813  0.779  0.778  0.765  0.730  0.723  0.729
  Typicality P(c|e)  0.297  0.380  0.409  0.422  0.438  0.446  0.461
  Typicality P(e|c)  0.709  0.728  0.735  0.722  0.702  0.696  0.703
  Rep(e)             0.791  0.787  0.762  0.739  0.707  0.703  0.706

Smoothing = 1e-6
  MI(e)              0.516  0.510  0.515  0.526  0.546  0.563  0.579
  PMI3(e)            0.725  0.655  0.651  0.654  0.641  0.631  0.649
  NPMI(e)            0.791  0.766  0.732  0.728  0.673  0.659  0.668
  Typicality P(c|e)  0.297  0.380  0.409  0.422  0.438  0.446  0.461
  Typicality P(e|c)  0.495  0.516  0.520  0.508  0.512  0.521  0.540
  Rep(e)             0.758  0.784  0.767  0.755  0.691  0.686  0.694

Smoothing = 1e-7
  MI(e)              0.516  0.531  0.519  0.530  0.562  0.571  0.592
  PMI3(e)            0.725  0.664  0.652  0.658  0.630  0.631  0.647
  NPMI(e)            0.670  0.655  0.633  0.604  0.575  0.570  0.581
  Typicality P(c|e)  0.297  0.380  0.409  0.422  0.438  0.446  0.461
  Typicality P(e|c)  0.423  0.421  0.415  0.407  0.414  0.424  0.438
  Rep(e)             0.758  0.771  0.745  0.725  0.663  0.661  0.668

Single Instance

• Is this instance ambiguous?

• What are its basic-level concepts?

• What are its similar instances?

What is the Semantic Similarity?
• Are the following instance pairs similar?

• <apple, microsoft>

• <apple, pear>

• <apple, fruit>

• <apple, food>

• <apple, ipad>

• <car, journey>

Approaches on Term Similarity

• Categories of approaches for semantic similarity
  • String based approach
  • Knowledge based approach
    • Use preexisting thesauri, taxonomies or encyclopedias, such as WordNet
  • Corpus based approach
    • Use contexts of terms extracted from web pages, web search snippets or other text repositories
  • Embedding based approach
    • Will be introduced in detail in "Part 3: Implicit Understanding"

Approaches on Term Similarity (2)

• Categories

[Chart of approach families with representative work, as labeled in the figure: string based approaches; knowledge based approaches (WordNet), split into path length / lexical chain-based and information content-based; corpus based approaches, split into graph learning algorithm based and snippet search based. Representative methods: Rada 1989, Resnik 1995, Jcn 1997, Lin 1998, Hirst 1998, Ban 2002, HunTray 2005, Chen 2006, Alvarez 2007, Do 2009, Agirre 2010, Bol 2011, Sánchez 2011; state-of-the-art approaches are highlighted.]

• Framework

Term Similarity Using Semantic Networks [Li et al. 2013, Li et al. 2015]

Given a term pair <t1, t2>:

Step 1: Type Checking: decide whether the pair is a concept pair, an entity pair, or a concept-entity pair

Step 2: Context Representation (Vector)
  • Concept pair: collect entity-distribution contexts, giving context vectors T(t1) and T(t2)
  • Entity pair: collect concept-distribution contexts, cluster the concepts, and build cluster context vectors Cx(t1) and Cy(t2)
  • Concept-entity pair: collect the concepts of the entity term t1, cluster them, and for each cluster Ci(t1) select the top-k concepts cx

Step 3: Context Similarity
  • Concept pair: similarity = Cosine(T(t1), T(t2))
  • Entity pair: similarity = max over (x, y) of Cosine(Cx(t1), Cy(t2))
  • Concept-entity pair: for each pair <t2, cx>, take the maximum sim(t2, cx) as the similarity of <t1, t2>
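As an illustration of Steps 2 and 3, here is a small sketch that represents each term by concept-cluster vectors and takes the maximum cosine over cluster pairs as the term similarity. The vectors are hypothetical stand-ins for the concept distributions mined from the semantic network.

```python
# Sketch of concept-cluster based term similarity (illustrative, not the authors' code).
import math

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def term_similarity(clusters1, clusters2):
    # clusters1 / clusters2: one {concept: weight} vector per concept cluster of the term
    return max(cosine(c1, c2) for c1 in clusters1 for c2 in clusters2)

banana = [{"fruit": 0.6, "tropical fruit": 0.3, "food": 0.1}]
pear   = [{"fruit": 0.7, "seasonal fruit": 0.2, "food": 0.1}]
print(round(term_similarity(banana, pear), 3))   # high, as in the <banana, pear> example
```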

An Example [Li et al. 2013, Li et al. 2015]

For example: <banana, pear>

Step 1: Type Checking -> entity pair

Step 2: Context Representation (Vector) -> concept context collection

Step 3: Context Similarity -> Similarity Evaluation: Cosine(T(t1), T(t2)) = 0.916

Examples

Term 1               Term 2              Similarity

lunch                dinner              0.9987
tiger                jaguar              0.9792
car                  plane               0.9711
television           radio               0.9465
technology company   microsoft           0.8208
high impact sport    competitive sport   0.8155
employer             large corporation   0.5353
fruit                green pepper        0.2949
travel               meal                0.0426
music                lunch               0.0116
alcoholic beverage   sports equipment    0.0314
company              table tennis        0.0003

Complete results: http://adapt.seiee.sjtu.edu.cn/similarity/SimCompleteResults.pdf

Statistics of Search Queries (recap)

(a) By traffic: 1 term 44%, 2 terms 29%, 3 terms 17%, 4 terms 7%, 5 terms 2%, more than 5 terms 1%
(b) By # of distinct queries: 1 term 10%, 2 terms 26%, 3 terms 34%, 4 terms 19%, 5 terms 7%, more than 5 terms 4%

[Pie charts: number of instances per query; multi-word instance examples: Pokémon Go, Microsoft HoloLens]

If the short text has context for the instance…

• python tutorial
• dangerous python
• moon earth distance
• …

Short Text Understanding

• How to segment this short text?

• What does this short text mean (its intent, senses, or concepts)?

• What are the relations among terms in the short text?

• How to calculate the similarity between short texts?

Supervised Segmentation [Bergsma et al. 2007]

• Problem: divide a query into semantic units

• Approach: turn segmentation into position-based binary classification

Example query: two man power saw

[two man] [power saw]
[two] [man] [power saw]
[two] [man power] [saw]

Input: a query and its positions

Output: the decision for making a segmentation break at each position

Supervised Segmentation

• Features
  • Decision-boundary features (e.g., indicators such as "the", POS tags in the query, position features, forward/backward)
  • Statistical features (e.g., mutual information between the left and right parts: "bank loan amortization schedule")
  • Context features (context information: "female bus driver")
  • Dependency features ("female" depends on "driver")

Supervised Segmentation

• Segmentation overview

Input query: two man power saw

[SVM classifier over the learned features]

Output: segmentation decision for each position (yes/no)

Unsupervised Segmentation [Tan et al. 2008]

• Unsupervised learning for query segmentation

Probability of a generated segmentation S for query Q:

  P(S|Q) = P(s1) P(s2|s1) … P(sm|s1 s2 … s_{m-1}) ≈ ∏_{si ∈ S} P(si)    (unigram model over segments)

A position is a valid segment boundary if and only if the pointwise mutual information between the two segments resulting from the split is negative:

  MI(sk, sk+1) = log [ Pc([sk sk+1]) / (Pc(sk) · Pc(sk+1)) ] < 0

Example: "new york times subscription"

  log [ Pc([new york]) / (Pc(new) · Pc(york)) ] > 0  =>  no segment boundary between "new" and "york"

Unsupervised Segmentation

• Find the top-k segmentations by dynamic programming

• Use EM optimization on the fly

Input: query w1 w2 … wn, concept probability distribution

Output: top-k segmentations with the highest likelihood
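The unigram segmentation model above can be implemented with a simple dynamic program. The sketch below is illustrative: a tiny hand-made table of segment probabilities stands in for the real concept/n-gram statistics, and it returns only the single best segmentation rather than the top-k.

```python
# Sketch of unigram-model query segmentation via dynamic programming (illustrative).
import math
from functools import lru_cache

seg_prob = {"new": 1e-3, "york": 8e-4, "times": 9e-4, "subscription": 2e-4,
            "new york": 6e-4, "new york times": 3e-4, "york times": 1e-6}

def best_segmentation(words, max_len=4):
    @lru_cache(None)
    def best(i):
        # best (log-probability, segmentation) of the suffix words[i:]
        if i == len(words):
            return 0.0, []
        candidates = []
        for j in range(i + 1, min(len(words), i + max_len) + 1):
            seg = " ".join(words[i:j])
            p = seg_prob.get(seg, 1e-9)          # crude back-off for unseen segments
            rest_lp, rest = best(j)
            candidates.append((math.log(p) + rest_lp, [seg] + rest))
        return max(candidates)
    return best(0)[1]

print(best_segmentation("new york times subscription".split()))
# -> ['new york times', 'subscription']
```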

Exploit Click-through [Li et al. 2011]

• Motivation
  • Probabilistic query segmentation
  • Use click-through data (query Q -> clicked URL -> document D)

Input query: bank of america online banking

Output: top-3 segmentations
  [bank of america] [online banking]     0.502
  [bank of america online banking]       0.428
  [bank of] [america] [online banking]   0.001

Exploit Click-through

• Segmentation model: an interpolated model combining global information with click-through information

Query: [credit card] [bank of america]

Clicked HTML documents:
  1. bank of america credit cards contact us overview
  2. secured visa credit card from bank of america
  3. credit cards overview: find the right bank of america credit card for you

Short Text Understanding

• How to segment this short text?

• What does this short text mean (its intent, senses, or concepts)?

• What are the relations among terms in the short text?

• How to calculate the similarity between short texts?

Sense Changes with Different Context

watch harry potter -> Movie     read harry potter -> Book     age harry potter -> Character

harry potter walkthrough -> Game

Entity Recognition in Query [Guo et al. 2009]

• Motivation: detect a named entity in a short text and categorize it

Single-named-entity query: harry potter walkthrough

Example triple <e, t, c>: ("harry potter", "walkthrough", "game")
  e: the (ambiguous) named entity, t: the context terms, c: the class of the entity

Entity Recognition in Query

• Probabilistic Generative Model

Goal: given a query q, find the triple <e, t, c> that maximizes the probability of generating the triple:

  Pr(e, t, c) = Pr(e) Pr(c|e) Pr(t|c)    (assume the context only depends on the class)

Objective: given query q, find argmax over <e, t, c> of Pr(e, t, c)

The problem then becomes how to estimate Pr(e), Pr(c|e) and Pr(t|c)

E.g., "walkthrough" only depends on the class game, not on harry potter itself

Entity Recognition in Query

• Probability Estimation by Learning

Learning objective:  max ∏_{i=1}^{N} P(e_i, t_i, c_i)

Challenge: it is difficult as well as time-consuming to manually assign class labels to named entities in queries

Build a training set T = {(e_i, t_i)} and view c_i as a hidden variable

New learning problem:

  max ∏_{i=1}^{N} P(e_i, t_i) = max ∏_{i=1}^{N} Σ_c P(e_i) P(c|e_i) P(t_i|c)

Solved with the topic model WS-LDA
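Once Pr(e), Pr(c|e) and Pr(t|c) have been estimated (e.g., with WS-LDA), query interpretation reduces to scoring candidate triples. The sketch below illustrates only that scoring step, with made-up probability tables and a simplistic substring-based candidate generator.

```python
# Sketch of scoring (entity, context, class) triples with the generative model above
# (illustrative; the probability tables are invented for the example).
p_e = {"harry potter": 0.6, "harry": 0.1}
p_c_given_e = {"harry potter": {"book": 0.4, "movie": 0.4, "game": 0.2}}
p_t_given_c = {"game": {"walkthrough": 0.3, "cheats": 0.2},
               "movie": {"trailer": 0.3, "cast": 0.2},
               "book": {"author": 0.3, "summary": 0.2}}

def best_triple(query):
    best, best_score = None, 0.0
    for e in p_e:                                   # try each known entity as a substring
        if e not in query:
            continue
        t = query.replace(e, "").strip()            # remaining words = context terms
        for c, pc in p_c_given_e.get(e, {}).items():
            score = p_e[e] * pc * p_t_given_c.get(c, {}).get(t, 1e-6)
            if score > best_score:
                best, best_score = (e, t, c), score
    return best, best_score

print(best_triple("harry potter walkthrough"))
# -> (('harry potter', 'walkthrough', 'game'), ...)
```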

Signal from Click [Pantel et al. 2012]

• Motivation: predict entity type in Web search

[Diagram: entity, user intent, context and click jointly generate the query; a query type distribution over 73 types feeds a generative model over entity types.]

Signal from Click

• Joint model for prediction

[Plate diagram over Q queries with variables t, τ, i, n, c and distributions θ, φ, ω: for each query, pick a type from the distribution over types, pick an entity (entity distribution), pick an intent (intent distribution), pick context words (word distribution), and pick a click (host distribution).]

Telegraphic Query Interpretation [Sawant et al. 2013, Joshi et al. 2014]

• Entity-seeking telegraphic queries

• Interpretation = Segmentation + Annotation

[Combines a knowledge base (accuracy) with a large corpus (recall); e.g., query "Germany capital" -> result entity "Berlin"]

• Overview

Joint Interpretation and Ranking [Sawant et al. 2013, Joshi et al. 2014]

• Two models for interpretation and ranking over an annotated corpus: a generative model and a discriminative model; input is a telegraphic query, output is a ranked list of entities e1, e2, e3, …

• Generative Model [Sawant et al. 2013], based on probabilistic language models

[Example, borrowed from U. Sawant (2013): query q = "losing team baseball world series 1998"; the answer entity E = San Diego Padres (a major league baseball team) with answer type T hinted by "baseball team", supported by the corpus snippet "Padres have been to two World Series, losing in 1984 and 1998"; a switch variable Z and context matchers decide which query words act as type hints ("baseball team") and which as context ("lost 1998 world series").]

• Discriminative Model [Sawant et al. 2013], based on max-margin discriminative learning

[Example: for the query "losing team baseball world series 1998", the correct entity San_Diego_Padres is interpreted with the type hint t = baseball team, while an incorrect entity such as 1998_World_Series is interpreted with t = series; the model learns to rank the correct interpretation higher.]

• Queries seek answer entities (e2)

• They contain (query) entities (e1), target types (t2), relations (r) and selectors (s)

Telegraphic Query Interpretation [Joshi et al. 2014]

Example interpretations (borrowed from M. Joshi, 2014); columns: query, e1, r, t2, s:

query: dave navarro first band
  e1 = dave navarro, r = band, t2 = band, s = first
  e1 = dave navarro, r = -, t2 = band, s = first

query: spider automobile company
  e1 = spider, r = automobile company, t2 = automobile company, s = -
  e1 = -, r = automobile company, t2 = company, s = spider

Improved Generative Model

• Generative Model [Sawant et al. 2013] -> [Joshi et al. 2014]: additionally consider e1 (in q) and r

Improved Discriminative Model

• Discriminative Model [Sawant et al. 2013] -> [Joshi et al. 2014]: additionally consider e1 (in q) and r

Understand Short Texts with a Multi-tiered Model [Hua et al. 2015 (ICDE Best Paper)]

• Input: a short text

• Output: semantic interpretation

• Three steps in understanding a short text, e.g., "wanna watch eagles band":

Step 1: Text Segmentation: divide the text into a sequence of terms in the vocabulary
  wanna watch eagles band -> watch eagles band

Step 2: Type Detection: determine the best type of each term
  watch[verb] eagles[entity] band[concept]

Step 3: Concept Labeling: infer the best concept of each entity within context
  watch[verb] eagles[entity](band) band[concept]

Text Segmentation

• Observations
  • Mutual Exclusion: terms containing the same word mutually exclude each other
  • Mutual Reinforcement: related terms mutually reinforce each other

• Build a Candidate Term Graph (CTG)

[Example CTGs for "vacation april in paris" (candidate terms: vacation, april, paris, april in paris) and "watch harry potter" (candidate terms: watch, harry, potter, harry potter), with edge weights such as 0.029, 0.047, 0.041, 0.092 and fractions such as 1/3 and 2/3.]

Find the Best Segmentation

• Best segmentation = sub-graph of the CTG which:
  • Is a complete graph (clique)
  • Has no mutual exclusion
  • Has 100% word coverage (except for stopwords)
  • Has the largest average edge weight

[Figures: candidate term graphs for "vacation april in paris" and "watch harry potter", with the sub-graphs that qualify as segmentations and the best segmentation highlighted.]

Find the Best Segmentation

[Same slide, annotated: the candidate sub-graphs are the maximal cliques of the CTG; the maximal clique that satisfies the constraints and has the largest average edge weight is chosen as the best segmentation.]
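The clique-based criterion can be prototyped directly on a candidate term graph. The sketch below (using networkx, with hypothetical edge weights and a toy stopword set) enumerates maximal cliques and keeps the covering clique with the largest average edge weight; mutual exclusion is encoded simply by not adding edges between conflicting candidate terms.

```python
# Sketch of picking the best segmentation from a candidate term graph (illustrative).
import itertools
import networkx as nx

def best_segmentation(ctg, words, stopwords=frozenset({"in"})):
    best, best_score = None, -1.0
    for clique in nx.find_cliques(ctg):                       # maximal cliques only
        covered = set(itertools.chain.from_iterable(t.split() for t in clique))
        if not (set(words) - stopwords) <= covered:           # require 100% word coverage
            continue
        edges = list(itertools.combinations(clique, 2))
        score = (sum(ctg[u][v]["weight"] for u, v in edges) / len(edges)) if edges else 0.0
        if score > best_score:
            best, best_score = clique, score
    return best

g = nx.Graph()
g.add_edge("vacation", "april in paris", weight=0.041)
g.add_edge("vacation", "april", weight=0.029)
g.add_edge("vacation", "paris", weight=0.047)
g.add_edge("april", "paris", weight=0.005)
print(best_segmentation(g, "vacation april in paris".split()))
# the {"vacation", "april in paris"} clique wins: highest average edge weight among covering cliques
```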

Type Detection

• Pairwise model
  • Find the best typed-term for each term so that the maximum spanning tree of the resulting sub-graph between typed-terms has the largest weight

[Example for "watch free movie": candidate typed-terms watch[v], watch[e], watch[c]; free[adj], free[v]; movie[c], movie[e]]

Concept Labeling

• Entity disambiguation is the most important task of concept labeling
  • Filter / re-rank the original concept cluster vector

• Weighted Vote
  • The final score of each concept cluster is a combination of its original score and the support from context, using concept co-occurrence

watch harry potter -> movie        read harry potter -> book

Example of Entity Disambiguation [Hua et al. 2015 (ICDE Best Paper), Hua et al. 2016]

[Pipeline: a short text is parsed and conceptualized against the semantic (isA) network and the co-occurrence network, with term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, and concept orthogonalization producing a concept vector (c1, p1), (c2, p2), (c3, p3), …]

Example: "ipad apple"
  • Initial isA concepts of "apple": fruit…, company…, food…, product…
  • Concepts of "ipad": product…, device…
  • Filtering by co-occurrence with "ipad"'s concepts keeps company…, brand…, product… for "apple"

Mining Lexical Relationships [Wang et al. 2015b]

• Lexical knowledge is represented by three kinds of probabilities (notation: e = instance, t = term, c = concept, z = role):
  ① p(z|t), e.g., p(verb | watch), p(instance | watch)
  ② p(c|t, z), e.g., p(movie | watch, verb)
  ③ p(c|e) = p(c | t, z = instance), e.g., p(movie | harry potter), p(book | harry potter)

[Figure: for "watch harry potter", candidate concepts include verb, product, book, movie]

Understanding Queries [Wang et al. 2015b]

• Goal: rank the concepts and find argmax_c p(c | t, q)

[Pipeline: the query is expanded into all possible segmentations, an online subgraph of the offline semantic network is built, and concepts are ranked by random walk with restart [Sun et al. 2005] on that subgraph.]
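A bare-bones random-walk-with-restart over a small typed-term/concept subgraph looks like the sketch below; the adjacency weights are invented for illustration and only the ranking behavior matters.

```python
# Minimal random-walk-with-restart sketch over a tiny term/concept subgraph (illustrative).
import numpy as np

def rwr(adj, seed_idx, restart=0.15, iters=100):
    # adj: (n x n) nonnegative weight matrix; returns relevance scores w.r.t. the seeds
    P = adj / adj.sum(axis=0, keepdims=True)          # column-normalized transition matrix
    r = np.zeros(adj.shape[0])
    r[seed_idx] = 1.0 / len(seed_idx)                 # restart distribution on query terms
    x = r.copy()
    for _ in range(iters):
        x = (1 - restart) * P @ x + restart * r
    return x

# nodes: 0 "watch"(verb), 1 "harry potter", 2 movie concept, 3 book concept
adj = np.array([[0, 0, 3, 0],     # "watch" connects strongly to the movie concept
                [0, 0, 2, 2],     # "harry potter" connects to both movie and book
                [3, 2, 0, 0],
                [0, 2, 0, 0]], dtype=float)
scores = rwr(adj, seed_idx=[0, 1])
print(scores[2] > scores[3])      # True: "movie" outranks "book" for "watch harry potter"
```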

Short Text Understanding

• How to segment this short text?

• What does this short text mean (its intent, senses, or concepts)?

• What are the relations among terms in the short text?

• How to calculate the similarity between short texts?

Head, Modifier and Constraint Detection in Short Texts [Wang et al. 2014b]

• Example: "popular smart cover iphone 5s"

• Definitions:
  • Head: acts to name the general (semantic) category to which the whole short text belongs; usually the head is the intent of the short text
    • "smart cover": intent of the query
  • Constraints: distinguish this member from other members of the same category
    • "iphone 5s": limits the type of the head
  • Non-Constraint Modifiers (a.k.a. Pure Modifiers): subjective modifiers which can be dropped without changing the intent
    • "popular": subjective, can be neglected

Non-Constraint Modifier Mining: Construct Modifier Networks

Edges form a Modifier Network

[Concept hierarchy tree in the "Country" domain: Country -> Asian country, Developed country, Western country, Western developed country, Top western country, Large Asian country, Large developed country, Top developed country, …]

[Modifier network in the "Country" domain over the modifiers Asian, Western, Developed, Large, Top; in this case "Large" and "Top" are pure modifiers.]

Non-Constraint Modifier Mining: Betweenness Centrality

• Betweenness centrality is a measure of a node's centrality in a network

• The betweenness of node v is defined as

  g(v) = Σ_{s ≠ v ≠ t} σ_st(v) / σ_st

  where σ_st is the total number of shortest paths from node s to node t, and σ_st(v) is the number of those paths that pass through v

• Normalization & Aggregation
  • A pure modifier should have a low aggregated betweenness-centrality score PMS(t)
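The sketch below illustrates the computation with networkx: normalized betweenness is computed per (toy) domain modifier network and aggregated into a PMS-style score, with low-scoring terms treated as pure-modifier candidates. The domain graphs here are invented and only show the mechanics, not the paper's data.

```python
# Sketch of a betweenness-based pure-modifier score (toy graphs, illustrative only).
from collections import defaultdict
import networkx as nx

# hypothetical modifier networks for two domains, built from observed concept strings
domains = {
    "country": nx.Graph([("asian", "developed"), ("western", "developed"),
                         ("asian", "western"), ("large", "developed"), ("top", "developed")]),
    "company": nx.Graph([("software", "international"), ("software", "technology"),
                         ("international", "technology"), ("large", "software"), ("top", "technology")]),
}

pms = defaultdict(list)
for g in domains.values():
    for term, b in nx.betweenness_centrality(g, normalized=True).items():
        pms[term].append(b)

# aggregate: average normalized betweenness across the domains a modifier appears in
scores = {t: sum(v) / len(v) for t, v in pms.items()}
for term, s in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{term:15s} PMS = {s:.3f}")     # low-PMS terms are pure-modifier candidates
```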

Head-Constraint Mining [Wang et al. 2014b]

• A term can be a head in some cases and a constraint in others

• E.g., in "Seattle hotel", "hotel" is the head and "Seattle" is a constraint; in "Seattle hotel job", "job" is the head and "Seattle" and "hotel" are constraints

Head-Constraint Mining: Acquiring Concept Patterns

[Pipeline for building a concept pattern dictionary from query logs:
  1. Extract preposition patterns from the query log: "A for B", "A of B", "A with B", "A in B", "A on B", "A at B", … (e.g., "cover for iphone 6s", "battery for sony a7r", "wicked on broadway")
  2. Get entity pairs (entity1 = head, entity2 = constraint) from the query log
  3. Conceptualize each entity: entity1 -> concept11, concept12, concept13, concept14; entity2 -> concept21, concept22, concept23
  4. Generate concept patterns for each preposition: (concept11, concept21), (concept11, concept22), (concept11, concept23), …
  5. Store the patterns in the Concept Pattern Dictionary]

Why Concepts Can't Be Too General

• It may cause too many concept-pattern conflicts: we can't distinguish head and modifier for general concept pairs

Derived concept pattern: device (Head) / company (Modifier)
  Supporting entity pairs: (iphone 4, verizon), (modem, comcast), (wireless router, comcast), (iphone 4, tmobile)

Derived concept pattern: company (Head) / device (Modifier)
  Supporting entity pairs: (amazon books, kindle), (netflix, touchpad), (skype, windows phone), (netflix, ps3)

Conflict!

Why Concepts Can't Be Too Specific

• It may generate concepts with little coverage
  • The concept regresses to the entity
  • Large storage space: up to (millions × millions) of patterns

  device | largest desktop OS vendor
  device | largest software development company
  device | largest global corporation
  device | latest windows and office provider
  … …

Basic-level Conceptualization (BLC) is a good choice [Wang et al. 2015b]

Top Concept Patterns

Cluster size   Sum of cluster score   Head / Constraint                   Score

615            211.4691               breed / state                       3.57298460224501
296            77.52357               game / platform                     6.27403476771856
153            34.66804               accessory / vehicle                 5.3393705094809
70             11.8259                browser / platform                  1.32612807637391
22             10.10993               requirement / school                2.71407526294823
34             9.489159               drug / disease                      1.54602405333541
42             8.992995               cosmetic / skin condition           8.14659415003929
16             7.421599               job / city                          2.7903732555528
32             7.10403                accessory / phone                   2.46513830851194
18             6.692376               software / platform                 2.10126322725878
20             6.444603               test / disease                      2.39774028397537
27             5.994205               clothes / breed                     9.8773996282851
19             5.913545               penalty / crime                     2.00544192793488
25             5.848804               tax / state                         2.40081818612579
16             5.465424               sauce / meat                        1.83592863621553
18             4.809389               credit card / country               1.42919087972152
14             4.730792               food / holiday                      1.4554140330924
11             4.536199               mod / game                          2.57163856882439
29             4.350954               garment / sport                     4.71533326845442
23             3.994886               career information / professional   7.32726483731257
15             3.86065                song / instrument                   1.28189481818135
18             3.78213                bait / fish                         7.80426514113169
22             3.722948               study guide / book                  5.08339765053921
19             3.408953               plugins / browser                   5.50326072627126
14             3.305753               recipe / meat                       8.82779863422951
18             3.214226               currency / country                  1.10825444188352
13             3.180272               lens / camera                       1.86081673263957
9              3.16973                decoration / holiday                1.30055844126533
16             3.14875                food / animal                       7.338544366514

Example cluster: Game (Head) / Platform (Modifier)

[Related concept pairs in the cluster, as listed in the figure: game platform, game device, video game platform, game console, game pad, game gaming platform]

Supporting entity pairs: (angry birds, android), (angry birds, ios), (angry birds, windows 10), …

Head-Modifier Relationship Detection

• Train a classifier on (head-embedding, modifier-embedding)

• Training data:
  • Positive: (head, modifier)
  • Negative: (modifier, head)

• Precision >= 0.9, Recall >= 0.9

• Disadvantage: not interpretable
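A minimal version of such a classifier can be trained on concatenated (first-term, second-term) embeddings; the sketch below uses random vectors and scikit-learn purely to illustrate the setup (positive = (head, modifier), negative = reversed order), not the paper's actual training data or features.

```python
# Sketch of a head/modifier order classifier over term embeddings (illustrative).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
emb = {w: rng.normal(size=50) for w in ["game", "platform", "accessory", "vehicle",
                                        "drug", "disease", "job", "city"]}

pairs = [("game", "platform"), ("accessory", "vehicle"), ("drug", "disease"), ("job", "city")]
X, y = [], []
for head, modifier in pairs:
    X.append(np.concatenate([emb[head], emb[modifier]])); y.append(1)   # positive: (head, modifier)
    X.append(np.concatenate([emb[modifier], emb[head]])); y.append(0)   # negative: reversed order

clf = LogisticRegression(max_iter=1000).fit(np.array(X), y)
test = np.concatenate([emb["game"], emb["platform"]])
print(clf.predict([test])[0])   # label 1 means the first term of the pair is predicted to be the head
```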

Syntactic Parsing based on HM

• Information is incomplete
  • Prepositions and other function words
  • Within a noun compound: el capitan macbook pro

• Why not train a parser for web queries?

Syntactic Parsing of Short Texts [Sun et al. EMNLP 2016]

• Syntactic structures are valuable for short text understanding

• Examples

Challenges: Short Texts Lack Grammatical Signals
• They lack function words and word order
  • "toys queries" has ambiguous intent
  • "distance earth moon" has clear intent, but many equivalent forms: "earth moon distance", "earth distance moon", …

Challenges: Syntactic Parsing of Queries

• No standard

• No ground truth

Why is syntactic parsing of queries even a legitimate problem?

Derive Syntax from Semantics [Sun et al. 2016]

• Query: "thai food houston"

• Clicked sentence

• Project the dependency parse onto the query

A Treebank for Short Texts

• Given a query q

• and q's clicked sentences s:

• Parse each s

• Project dependencies from s to q

• Aggregate the dependencies

Algorithm of Projection

Result Examples

Results

• Random queries:                 QueryParser UAS 0.83, LAS 0.75;   Stanford UAS 0.72, LAS 0.64

• Queries with no function words: QueryParser UAS 0.82, LAS 0.73;   Stanford UAS 0.70, LAS 0.61

• Queries with function words:    QueryParser UAS 0.90, LAS 0.85;   Stanford UAS 0.86, LAS 0.80

Short Text Understanding

• How to segment this short text?

• What does this short text mean (its intent, senses, or concepts)?

• What are the relations among terms in the short text?

• How to calculate the similarity between short texts?

Short Text Similarity Using Word Embeddings [Kenter and de Rijke 2015]

• Measures similarity between two short texts or sentences

• Basic idea: word-by-word comparison using embedding vectors

• Use a saliency-weighted semantic graph to compute similarity

Short Text Similarity Using Word Embeddings [Kenter and de Rijke 2015]

• Features acquired: bins of all edges, bins of max edges

• Similarity measurement between two short texts s_l and s_s, inspired by BM25, where sem(w, s_s) is the semantic similarity of term w to the short text s_s:

  f_sts(s_l, s_s) = Σ_{w ∈ s_l} IDF(w) · sem(w, s_s) · (k1 + 1) / ( sem(w, s_s) + k1 · (1 - b + b · |s_s| / avgsl) )
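The similarity function can be implemented directly once word vectors and IDF weights are available; in the sketch below, tiny hand-made vectors and IDF values stand in for pretrained embeddings and corpus statistics.

```python
# Sketch of the saliency-weighted short-text similarity above (illustrative values).
import numpy as np

vec = {"thai": np.array([1.0, 0.2]), "food": np.array([0.9, 0.4]),
       "cuisine": np.array([0.85, 0.45]), "houston": np.array([0.1, 1.0]),
       "restaurant": np.array([0.6, 0.7])}
idf = {"thai": 2.0, "food": 1.2, "cuisine": 1.5, "houston": 1.8, "restaurant": 1.3}

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def sem(w, short_text):                       # best semantic match of w in the other text
    return max(cos(vec[w], vec[w2]) for w2 in short_text)

def f_sts(s_l, s_s, k1=1.2, b=0.75, avgsl=4.0):
    total = 0.0
    for w in s_l:
        s = sem(w, s_s)
        total += idf[w] * (s * (k1 + 1)) / (s + k1 * (1 - b + b * len(s_s) / avgsl))
    return total

print(f_sts(["thai", "food", "houston"], ["thai", "cuisine", "restaurant"]))
```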

From the Concept View

From the Concept View [Wang et al. 2015a]

[Pipeline: each short text is parsed and conceptualized against the semantic network and the co-occurrence network (term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, concept orthogonalization), producing bags of concepts: Concept Vector 1 = [(c1, score1), (c2, score2), …] and Concept Vector 2 = [(c1', score1'), (c2', score2'), …]; similarity is computed between the two concept vectors.]

Outline

• Knowledge Bases

• Explicit Representation Models

• Applications

Applications

• Explicit short text understanding benefits a lot of application scenarios:
  • Ads/search semantic matching
  • Definition mining
  • Query recommendation
  • Web table understanding
  • Semantic search
  • …

Ads Keyword Selection [Wang et al. 2015a]

[Bar charts: ads keyword selection results by query decile (Decile 4 through Decile 10), shown separately for Mainline Ads (y-axis 0.00-6.00) and Sidebar Ads (y-axis 0.00-0.60).]

Definition Mining [Hao et al. 2016]

• Definition scenarios: search engines, QnA, etc.

• Why conceptualization is useful for definition mining
  • Example: "What is Emphysema?"

Answer 1: Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year.

Answer 2: Emphysema is an incurable, progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing.

• This sentence has the form of a definition
• Embedding is helpful to some extent, but it also returns a high similarity score for (emphysema, disease) and for (emphysema, smoking)
• Conceptualization can provide strong semantics
• Contextual embedding can also provide semantic similarity beyond isA

Definition Mining [Hao et al 2016]

Concept-based Short Text Classification and Ranking [Wang et al. 2014a]

[Pipeline: Offline, training data (e.g., <Music, score>) is used for concept weighting and model learning, producing a concept model per class (Class 1 … Class i … Class N). Online, an original short text such as "justin bieber graduates" goes through entity extraction and conceptualization against the knowledge base, candidates are generated from the concept vector, and classification & ranking is performed against the class concept models.]

Concept-based Short Text Classification and Ranking [Wang et al. 2014a]

[Illustration: article titles/tags in a category such as TV are mapped into the concept space as points p_i, p_j; each category (Music, Movie, TV, …) thus occupies a region ω_i, ω_j of the concept space; a query is mapped into the same concept space and assigned to the closest categories.]
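At prediction time, classification reduces to comparing the short text's concept vector with each class's concept model. The sketch below illustrates this with hypothetical class models and a simple weighted-overlap score; the paper's actual weighting and ranking scheme is richer.

```python
# Sketch of concept-based short-text classification (illustrative class models).
class_models = {"Music": {"singer": 0.5, "album": 0.3, "celebrity": 0.2},
                "TV":    {"tv show": 0.5, "channel": 0.3, "celebrity": 0.2}}

def score(concept_vector, model):
    # weighted overlap between the text's concept vector and a class concept model
    return sum(w * model.get(c, 0.0) for c, w in concept_vector.items())

def classify(concept_vector):
    return sorted(class_models, key=lambda cls: score(concept_vector, class_models[cls]),
                  reverse=True)

# "justin bieber graduates" conceptualized (hypothetically) as mostly singer/celebrity
print(classify({"singer": 0.6, "celebrity": 0.4}))   # -> ['Music', 'TV']
```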

Precision performance on each category [Wang et al. 2014a]

Category   BocSTC   LM_ch   SVM    VSM_cosine   LM_d   Entity_ESA

Movie      0.71     0.91    0.84   0.81         0.72   0.56
Money      0.97     0.95    0.54   0.57         0.52   0.74
Music      0.97     0.90    0.88   0.73         0.68   0.58
TV         0.96     0.46    0.92   0.56         0.51   0.55

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

• [Stark et al. 1998] Michael M. Stark and Richard F. Riesenfeld. WordNet: An Electronic Lexical Database. Proceedings of the 11th Eurographics Workshop on Rendering, 1998.

• [Banko et al. 2007] Michele Banko, Michael J. Cafarella, Stephen Soderland, Matt Broadhead and Oren Etzioni. Open Information Extraction from the Web. In IJCAI, 2007.

• [Etzioni et al. 2011] Oren Etzioni, Anthony Fader, Janara Christensen, Stephen Soderland and Mausam Mausam. Open Information Extraction: The Second Generation. In IJCAI, vol. 11, pp. 3-10, 2011.

• [Carlson et al. 2010] A. Carlson, J. Betteridge, B. Kisiel, B. Settles, E.R. Hruschka Jr. and T.M. Mitchell. Toward an Architecture for Never-Ending Language Learning. In Proceedings of the Conference on Artificial Intelligence (AAAI), 2010.

• [Wu et al. 2012] Wentao Wu, Hongsong Li, Haixun Wang and Kenny Zhu. Probase: A Probabilistic Taxonomy for Text Understanding. In ACM International Conference on Management of Data (SIGMOD), May 2012.

• [Bollacker et al. 2008] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge and Jamine Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In SIGMOD, 2008.

• [Auer et al. 2007] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak and Zachary G. Ives. DBpedia: A Nucleus for a Web of Open Data. In ISWC/ASWC, 2007.

References

• [Suchanek et al. 2007] Fabian M. Suchanek, Gjergji Kasneci and Gerhard Weikum. Yago: a core of semantic knowledge. In WWW, 2007.

• [Wu et al. 2015] Sen Wu, Ce Zhang, Christopher De Sa, Jaeho Shin, Feiran Wang and C. Ré. Incremental Knowledge Base Construction Using DeepDive. In VLDB, 2015.

• [Navigli et al. 2012] R. Navigli and S. Ponzetto. BabelNet: The Automatic Construction, Evaluation and Application of a Wide-Coverage Multilingual Semantic Network. In Artificial Intelligence, 2012.

• [Nastase et al. 2010] Vivi Nastase, Michael Strube, Benjamin Börschinger, Cäcilia Zirn and Anas Elghafari. WikiNet: A very large scale multi-lingual concept network. In LREC, 2010.

• [Speer et al. 2013] Robert Speer and Catherine Havasi. ConceptNet 5: A large semantic network for relational knowledge. The People's Web Meets NLP, Springer Berlin Heidelberg, 2013.

• [Hua et al. 2016] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng and Xiaofang Zhou. Understand Short Texts by Harvesting and Analyzing Semantic Knowledge. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2016.

• [Hua et al. 2015] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng and Xiaofang Zhou. Short Text Understanding Through Lexical-Semantic Analysis. In International Conference on Data Engineering (ICDE), April 2015.

References

• [Li et al. 2013] Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang and Xindong Wu. Computing term similarity by large probabilistic isA knowledge. In ACM International Conference on Information and Knowledge Management (CIKM), 2013.

• [Li et al. 2015] Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang, Xue-Gang Hu and Xindong Wu. A Large Probabilistic Semantic Network based Approach to Compute Term Similarity. IEEE Transactions on Knowledge and Data Engineering (TKDE), 27(10): 2604-2617, 2015.

• [Rosch et al. 1976] Eleanor Rosch, Carolyn B. Mervis, Wayne D. Gray, David M. Johnson and Penny Boyes-Braem. Basic objects in natural categories. Cognitive Psychology, 8(3): 382-439, 1976.

• [Manning and Schutze 1999] Christopher D. Manning and Hinrich Schutze. Foundations of statistical natural language processing. Volume 999, MIT Press, 1999.

• [Wang et al. 2015b] Zhongyuan Wang, Kejun Zhao, Haixun Wang, Xiaofeng Meng and Ji-Rong Wen. Query Understanding through Knowledge-Based Conceptualization. In IJCAI, July 2015.

• [Bergsma et al. 2007] Shane Bergsma and Qin Iris Wang. Learning Noun Phrase Query Segmentation. In EMNLP-CoNLL, 2007: 819-826.

• [Tan et al. 2008] Bin Tan and Fuchun Peng. Unsupervised query segmentation using generative language models and Wikipedia. In WWW, 2008: 347-356.

References

• [Li et al. 2011] Yanen Li, Bo-June Paul Hsu, ChengXiang Zhai and Kuansan Wang. Unsupervised query segmentation using clickthrough for information retrieval. In SIGIR, 2011: 285-294.

• [Guo et al. 2009] Jiafeng Guo, Gu Xu, Xueqi Cheng and Hang Li. Named entity recognition in query. In SIGIR, 2009: 267-274.

• [Pantel et al. 2012] Patrick Pantel, Thomas Lin and Michael Gamon. Mining Entity Types from Query Logs via User Intent Modeling. In ACL, 2012: 563-571.

• [Joshi et al. 2014] Mandar Joshi, Uma Sawant and Soumen Chakrabarti. Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries. In EMNLP, 2014: 1104-1114.

• [Sawant et al. 2013] Uma Sawant and Soumen Chakrabarti. Learning joint query interpretation and response ranking. In WWW, 2013: 1099-1110.

• [Wang et al. 2014b] Zhongyuan Wang, Haixun Wang and Zhirui Hu. Head, Modifier, and Constraint Detection in Short Texts. In International Conference on Data Engineering (ICDE), 2014.

• [Sun et al. 2016] Xiangyan Sun, Haixun Wang, Yanghua Xiao and Zhongyuan Wang. Syntactic Parsing of Web Queries. In EMNLP, 2016.

References

• [Kenter and Rijke 2015] Tom Kenter and Maarten de Rijke. Short text similarity with word embeddings. In CIKM, 2015.

• [Wang et al. 2015a] Zhongyuan Wang, Haixun Wang, Ji-Rong Wen and Yanghua Xiao. An Inference Approach to Basic Level of Categorization. In CIKM, October 2015.

• [Hao et al. 2016] Zehui Hao, Zhongyuan Wang, Xiaofeng Meng and Jun Yan. Combining Language Model with Conceptualization for Definition Ranking. MSR Technical Report, 2016.

• [Wang et al. 2014a] Fang Wang, Zhongyuan Wang, Zhoujun Li and Ji-Rong Wen. Concept-based Short Text Classification and Ranking. In CIKM, 2014.

• [Wang et al. 2012a] Jingjing Wang, Haixun Wang, Zhongyuan Wang and Kenny Zhu. Understanding Tables on the Web. In International Conference on Conceptual Modeling, October 2012.

• [Wang et al. 2012b] Yue Wang, Hongsong Li, Haixun Wang and Kenny Zhu. Toward Topic Search on the Web. In International Conference on Conceptual Modeling, October 2012.

Page 2: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

What is explicit understanding

Add Common Sense to Computing

25 Oct 1881Pablo Picasso Spanish

Which is ldquokikirdquo and which is ldquoboubardquo

sound shape

prime119948ത119942119948ത119942

zigzaggedness

China India

country

Brazil

emerging market

engineer apple

IT company

The is eating an

fruit

body taste

wine

smell

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

httplinghublider-projecteullod-cloud

Linguistic Linked Open Data Cloud

Cyc

short text understanding

(internal representation)

answer

knowledge knowledge

1 ldquoPython Tutorialrdquo2 ldquoWho was the US President when the Angels won the

World Seriesrdquo

linguistic common sense knowledge

Encyclopedia knowledge

Common Sense Knowledge vs Encyclopedia Knowledge

Common Sense Knowledge Base

Encyclopedia Knowledge Base

Common senselinguistic knowledge among terms

EntitiesFacts

isAisPropertyOf

co-occurrencehellip

DayOfBirthLocatedInSpouseOf

hellip

Typicality basic level of categorization

Black or WhitePrecision

WordNet KnowItAll NELLProbase hellip

Freebase Yago DBPedia Google knowledge graph hellip

Special cases

WordNet [Stark et al 1998]

bull WordNetreg is a large lexical database of English Nouns verbs adjectives and adverbs are grouped into sets of cognitive synonyms (synsets) each expressing a distinct concept

bull S (n) China Peoples Republic of China mainland China Communist China Red China PRC Cathay (a communist nation that covers a vast territory in eastern Asia the most populous country in the world)

bull The project began in the Princeton University Department of Psychology and is currently housed in the Department of Computer Science

bull Homepage httpwordnetprincetoneduwordnetabout-wordnetbull Download httpwordnetprincetoneduwordnetdownload

Brief Introduction

Statistics

Sample

Authors

URLs

POS Unique Synsets TotalStrings Word-Sense Pairs

Noun 117798 82115 146312Verb 11529 13767 25047Adjective 21479 18156 30002Adverb 4481 3621 5580Totals 155287 117659 206941

KnowItAll Extract high-quality knowledgefrom the Web [Banko et al 2007 Etzioni et al 2011]

bull OpenIE distills semantic relations fromWeb-scale natural language texts

bull TextRunner -gt ReVerb -gt Open IE part of KnowItAll

bull Yielding over 5 billion extraction from over a billion web pages

bull From ldquoUS president Barack Obama gave his inaugural address on January 20 2013rdquoTo (Barack Obama is president of US)

(Barack Obama gave [his inaugural address on January 20 2013])

bull OpenIE v413 has been released

bull Turing Center at the University of Washington

bull httpopenieallenaiorgbull httpreverbcswashingtonedu

Brief Introduction

Statistics

Sample

News

Authors

URLs

NELL Never-Ending Language Learning [Carlson et al 2010]

bull NELL is a research project that attempts to create a computer system that learns over time to read the web Since January 2010

bull Over 50 million candidate beliefs by reading the web They are considered at different levels of confidence

bull Out of 50 million high confidence in 2817156 beliefs

Brief Introduction

Statistics

Sample

NELL Never-Ending Language Learning

bull It is continually learning facts on web Resources is publicly available

bull NELL research team at CMU

bull Homepage httprtwmlcmuedurtwbull Download httprtwmlcmuedurtwresources

News

Authors

URLs

Probase [Wu et al 2012]

bull Probase is a semantic network to make machines ldquoawarerdquo of the mental world of human beings so that machines can better understand human communication

Brief Introduction

Probase network

isA isPropertyOf Co-occurrence(concept entities) (attributes) (isCEOof LocatedIn etc)

Concepts Entities(ldquoSpanish Artistsrdquo) (ldquoPablo Picasordquo)

Nodes

Edges

Attributes(ldquoBirthdayrdquo)

VerbsAdjectives(ldquoEatrdquo ldquoSweetrdquo)

bull 5401933 unique concepts bull 12551613 unique instancesbull 87603947 IsA relations

countries Basic watercolor techniques

Celebrity wedding dress designers

Probase

bull Microsoft Research

bull Public release coming soon in AugSept 2016 bull Project homepage httpresearchmicrosoftcomprobase

Concepts

Authors

URLs

Probase isA error rate lt1 1 and lt10 for random pair

Freebase [Bollacker et al 2008]

bull Freebase is a well-known collaborative knowledge base consisting of data composed mainly by its community

bull Freebase contains more than 23 million entitiesbull Freebase contains 19 billion triplesbull Each triple is organized as form of

ltsubjectgt ltpredicategt ltobjectgt

Brief Introduction

Statistics

bull Freebase is a collection of factsbull Freebase only contains nodes

and linksbull Freebase is a labeled graph

Freebase -gt Wiki Data

bull Freebase data was integrated into Wikidatabull The Freebase API will be completely shut-down on Aug 31 2016

replaced by Google Knowledge Graph API

bull Freebase Community

bull Homepage httpwikifreebasecomwikiMain_Pagebull Download httpsdevelopersgooglecomfreebasebull Wikidata httpswwwwikidataorg

News

Authors

URLs

Google Knowledge Graph

• Knowledge Graph is a knowledge base used by Google to enhance its search engine's results with semantic-search information gathered from a wide variety of sources

• 570 million objects and more than 18 billion facts about relationships between different objects

bull Google Inc

• Homepage: https://www.google.com/intl/es419/insidesearch/features/search/knowledge.html

Brief Introduction

Statistics

Sample

Authors

URLs

YAGO [Suchanek et al 2007]

bull YAGO is a huge semantic knowledge base derived from GeoNames WordNet and Wikipedia (10 Wikipedias in different languages)

• More than 10 million entities (persons, organizations, cities, etc.)
• More than 120 million facts about these entities
• More than 35,000 classes assigned to entities
• Many of its facts and entities are annotated with a temporal and a spatial dimension

Brief Introduction

Sample: <Albert_Einstein> <isMarriedTo> <Elsa_Einstein>

Statistics

YAGO

News
• An evaluated version of YAGO3 (combining information from Wikipedias in different languages) was released [15 Sep 2015]

Authors
• Max Planck Institute for Informatics in Saarbrücken, Germany, and the DBWeb group at Télécom ParisTech University

URLs
• Homepage: http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/
• Download: http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/downloads/

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Statistics of Search Queries

(a) By traffic: 1 term 44%, 2 terms 29%, 3 terms 17%, 4 terms 7%, 5 terms 2%, more than 5 terms 1%

(b) By # of distinct queries: 1 term 10%, 2 terms 26%, 3 terms 34%, 4 terms 19%, 5 terms 7%, more than 5 terms 4%

(A similar breakdown groups queries by number of instances: 1, 2, 3, 4, 5, more than 5 instances; e.g., "Pokémon Go" and "Microsoft HoloLens" are single-instance queries.)

If the short text is a single instance…

• Python
• Microsoft
• Apple
• …

Single Instance Understanding

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

Word Ambiguity
• Word sense disambiguation relies on dictionaries (e.g., WordNet)

Take a seat on this chair

The chair of the Math Department

Instance Ambiguity

bull Instance sense disambiguation extra knowledge needed

I have an apple pie for lunch

He bought an apple ipad

Here "apple" is a proper noun

Ambiguity [Hua et al 2016]

bull Many instances are ambiguous

bull Intuition ambiguous instances have multiple senses

short text            instance       sense
population china      china          country
glass vs china        china          fragile item
pear apple            apple          fruit
microsoft apple       apple          company
read harry potter     harry potter   book
watch harry potter    harry potter   movie
age of harry potter   harry potter   character

Pre-definition for Ambiguity (1) Sense [Hua et al 2016]

• What is a Sense in semantic networks?
• A sense is a hierarchy of concept clusters

(Figure: example sense hierarchies, e.g. region → {country, state, city} for "Germany"; creature → animal → predator; crop/food → {fruit, vegetable, meat}.)

Pre-definition for Ambiguity (2) Concept Cluster [Li et al 2013 Li et al 2015]

• What is a Concept Cluster (CL)?
• Similar concepts are clustered into a concept cluster using a K-Means-like approach (k-Medoids)

Example clusters:
• "Fruit" cluster: fruit, fresh fruit, juice, tropical fruit, berry, exotic fruit, seasonal fruit, fruit juice, citrus fruit, soft fruit, dry fruit, wild fruit, local fruit, …
• "Company" cluster: company, client, firm, manufacturer, corporation, large company, rival, giant, big company, local company, large corporation, international company, …

Definitions of Instance Ambiguity [Hua et al 2016]

• 3 levels of instance ambiguity
• Level 0: unambiguous
  • Contains only 1 sense
  • E.g., dog (animal), beijing (city), potato (vegetable)
• Level 1: unambiguous and ambiguous both make sense
  • Contains 2 or more senses, but these senses are related
  • E.g., google (company & search engine), french (language & country), truck (vehicle & public transport service)
• Level 2: ambiguous
  • Contains 2 or more senses, and the senses are very different from each other
  • E.g., apple (fruit & company), jaguar (animal & company), python (animal & language)

Ambiguity Score

• Using the top-2 senses to calculate the ambiguity score:

  score = 0,                                                            if level = 0
  score = [w(s_2|e) / w(s_1|e)] * (1 - similarity(s_1, s_2)),           if level = 1
  score = 1 + [w(sc_2|e) / w(sc_1|e)] * (1 - similarity(sc_1, sc_2)),   if level = 2

Denote the top-2 senses as s_1 and s_2, and the top-2 sense clusters as sc_1 and sc_2.
The similarity of two sense clusters is the maximum similarity of their senses:

  similarity(sc_1, sc_2) = max similarity(s_i ∈ sc_1, s_j ∈ sc_2)

For an entity e, the weight (popularity) of a sense s_i is the sum of the weights of its concept clusters:

  w(s_i|e) = w(H_i|e) = Σ_{CL_j ∈ H_i} P(CL_j|e)

For an entity e, the weight (popularity) of a sense cluster sc_i is the sum of the weights of its senses:

  w(sc_i|e) = Σ_{s_j ∈ sc_i} w(s_j|e)
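To make the scoring concrete, here is a toy sketch (not the authors' implementation; the sense-cluster weights and the similarity function are made up) that plugs hypothetical values into the level-1/level-2 formula above.

```python
# Toy sketch: ambiguity score from the top-2 sense clusters of an entity.
# `sense_clusters` maps a cluster id to its weight w(sc|e); `similarity`
# returns the maximum sense-to-sense similarity of two clusters.

def ambiguity_score(sense_clusters, similarity, level):
    """sense_clusters: dict {cluster_id: weight}; level in {0, 1, 2}."""
    if level == 0 or len(sense_clusters) < 2:
        return 0.0
    (c1, w1), (c2, w2) = sorted(sense_clusters.items(),
                                key=lambda kv: kv[1], reverse=True)[:2]
    ratio = (w2 / w1) * (1.0 - similarity(c1, c2))
    return ratio if level == 1 else 1.0 + ratio

# Hypothetical numbers in the spirit of the "apple" example above:
weights = {"fruit/food/tree": 0.537, "company/brand": 0.271}
sim = lambda a, b: 0.05          # assume the two clusters are nearly unrelated
print(round(ambiguity_score(weights, sim, level=2), 2))   # ~1.48
```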

Examples

• Level 0
  • california: country, state, city, region, institution (0.943)
  • fruit: food, product, snack, carbs, crop (0.827)
  • alcohol: substance, drug, solvent, food, addiction (0.523)
  • computer: device, product, electronics, technology, appliance (0.537)
  • coffee: beverage, product, food, crop, stimulant (0.73)
  • potato: vegetable, food, crop, carbs, product (0.896)
  • bean: food, vegetable, crop, legume, carbs (0.801)

Examples (cont.)
• Level 1
  • nike, score = 0.034
    • company, store: 0.861   • brand: 0.035   • shoe, product: 0.033
  • twitter, score = 0.035
    • website, tool: 0.612   • network: 0.165   • application: 0.033   • company: 0.031
  • facebook, score = 0.037
    • website, tool: 0.595   • network: 0.17   • company: 0.053   • application: 0.029
  • yahoo, score = 0.38
    • search engine: 0.457   • company, provider, account: 0.281   • website: 0.0656
  • google, score = 0.507
    • search engine: 0.46   • company, provider, organization: 0.377   • website: 0.0449

Examples (cont)

• Level 2
  • jordan, score = 1.02
    • country, state, company, regime: 0.92   • shoe: 0.02
  • fox, score = 1.09
    • animal, predator, species: 0.74   • network: 0.064   • company: 0.035
  • puma, score = 1.15
    • brand, company, shoe: 0.655   • species, cat: 0.116
  • gold, score = 1.21
    • metal, material, mineral resource, mineral: 0.62   • color: 0.128

Examples (cont)

• Level 2
  • soap, score = 1.22
    • product, toiletry, substance: 0.49   • technology, industry standard: 0.11
  • silver, score = 1.24
    • metal, material, mineral resource, mineral: 0.638   • color: 0.156
  • python, score = 1.29
    • language: 0.667   • snake, animal, reptile, skin: 0.193
  • apple, score = 1.41
    • fruit, food, tree: 0.537   • company, brand: 0.271

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

A Concept View of "Microsoft"

company

largest desktop OS vendor

softwarecompany

international company

technology leader

Microsoft

largest desktop OS vendor | company | … | software company

Basic-level Conceptualization (BLC)[Rosch et al 1976]

KFC

BMW

Basic-level conceptualization

How to Make BLC

• Naive approaches
  • Typicality: an important measure for understanding the relationship between an object and its concept
  • Pointwise Mutual Information (PMI): a common measure of the strength of association between two terms

Naive Approach 1: Typicality

P(robin | bird) > P(penguin | bird): "robin" is a more typical bird than "penguin"

P(USA | country) > P(Seychelles | country): "USA" is a more typical country than "Seychelles"

Using Typicality for BLC

• Associate each isA relationship (e isA c) with typicality scores P(e|c) and P(c|e):

  P(e|c) = n(c, e) / n(c)        P(c|e) = n(c, e) / n(e)

• P(e|c) indicates how typical (or popular) e is in the given concept c

• P(c|e) indicates how typical (or popular) the concept c is, given e

• However, for "Microsoft": "company" has high typicality P(c|e), while "largest desktop OS vendor" has high typicality P(e|c); neither score alone identifies the basic-level concept

Naive Approach 2 PMI[Manning and Schutze 1999]

bull Pointwise mutual information (PMI) is a measure of association used in information theory and statistics

bull Consider using the PMI between concept c and instance e to find the basic-level concepts as follows

  PMI(e, c) = log [ P(e, c) / (P(e) P(c)) ] = log P(e|c) - log P(e)

• However:
  • In basic-level categorization we are interested in finding a concept for a given e, which means P(e) is a constant
  • Thus, ranking by PMI(e, c) is the same as ranking by P(e|c)

Using Rep(e c) for BLC [Wang et al 2015b]

• The measure Rep(e, c) = P(c|e) * P(e|c) means:

• (With PMI) If we take the logarithm of our scoring function, we get

  log Rep(e, c) = log [ P(c|e) * P(e|c) ]
               = log [ (P(e, c)/P(e)) * (P(e, c)/P(c)) ]
               = log [ P(e, c)^2 / (P(e) P(c)) ]
               = PMI(e, c) + log P(e, c)
               = PMI^2(e, c)

• (With Commute Time) The commute time between an instance e and a concept c is

  Time(e, c) = Σ_{k=1}^{∞} 2k * P_k(e, c)
             = Σ_{k=1}^{T} 2k * P_k(e, c) + Σ_{k=T+1}^{∞} 2k * P_k(e, c)
             ≥ Σ_{k=1}^{T} 2k * P_k(e, c) + 2(T+1) * (1 - Σ_{k=1}^{T} P_k(e, c))
             = 4 - 2 * Rep(e, c)   (taking T = 1, since P_1(e, c) = P(c|e) P(e|c) = Rep(e, c))

• Given e, the c should be its typical concept (shortest distance)

• Given c, the e should be its typical instance (shortest distance)

A process of finding concept nodes having the shortest expected distance to e
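A minimal sketch, assuming toy isA counts n(c, e) rather than real Probase data, of how P(c|e), P(e|c) and Rep(e, c) = P(c|e) · P(e|c) could be computed to pick a basic-level concept:

```python
from collections import defaultdict

# Toy isA counts n(c, e): how often entity e appears under concept c (made-up numbers).
n = {
    ("company", "microsoft"): 9000,
    ("software company", "microsoft"): 3000,
    ("largest desktop os vendor", "microsoft"): 40,
}
n_c = defaultdict(int)   # n(c) = sum over entities
n_e = defaultdict(int)   # n(e) = sum over concepts
for (c, e), cnt in n.items():
    n_c[c] += cnt
    n_e[e] += cnt
# Pretend other entities also contribute to each concept's total.
n_c["company"] += 500000
n_c["software company"] += 20000
n_c["largest desktop os vendor"] += 10

def rep(e, c):
    p_c_given_e = n[(c, e)] / n_e[e]
    p_e_given_c = n[(c, e)] / n_c[c]
    return p_c_given_e * p_e_given_c

best = max((c for (c, e) in n if e == "microsoft"), key=lambda c: rep("microsoft", c))
print(best)   # "software company": high in both P(c|e) and P(e|c)
```

The very popular concept ("company") and the very specific one ("largest desktop OS vendor") both lose to the basic-level "software company", which is the behavior the Rep measure is designed to capture.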

Precision / NDCG at k = 1, 2, 3, 5, 10, 15, 20

No smoothing
  MI(e)               0.769  0.692  0.705  0.685  0.719  0.705  0.690
  PMI3(e)             0.885  0.769  0.756  0.800  0.754  0.733  0.721
  NPMI(e)             0.692  0.692  0.667  0.638  0.627  0.610  0.610
  Typicality P(c|e)   0.462  0.577  0.603  0.577  0.569  0.564  0.556
  Typicality P(e|c)   0.500  0.462  0.526  0.523  0.523  0.510  0.521
  Rep(e)              0.846  0.865  0.872  0.862  0.758  0.731  0.719

Smoothing = 0.001
  MI(e)               0.577  0.615  0.628  0.600  0.612  0.605  0.592
  PMI3(e)             0.731  0.673  0.692  0.654  0.669  0.644  0.623
  NPMI(e)             0.923  0.827  0.769  0.746  0.731  0.695  0.671
  Typicality P(c|e)   0.462  0.577  0.603  0.577  0.569  0.564  0.554
  Typicality P(e|c)   0.885  0.865  0.872  0.831  0.785  0.741  0.704
  Rep(e)              0.846  0.731  0.718  0.723  0.700  0.669  0.638

Smoothing = 0.0001
  MI(e)               0.615  0.615  0.654  0.608  0.635  0.628  0.612
  PMI3(e)             0.846  0.731  0.731  0.715  0.723  0.685  0.677
  NPMI(e)             0.885  0.904  0.885  0.869  0.823  0.777  0.752
  Typicality P(c|e)   0.462  0.577  0.603  0.577  0.569  0.564  0.556
  Typicality P(e|c)   0.885  0.904  0.910  0.877  0.831  0.813  0.777
  Rep(e)              0.923  0.846  0.833  0.815  0.781  0.736  0.719

Smoothing = 1e-5
  MI(e)               0.615  0.635  0.667  0.662  0.677  0.656  0.646
  PMI3(e)             0.885  0.769  0.744  0.777  0.758  0.731  0.710
  NPMI(e)             0.885  0.846  0.872  0.869  0.831  0.810  0.787
  Typicality P(c|e)   0.462  0.577  0.603  0.577  0.569  0.564  0.556
  Typicality P(e|c)   0.769  0.808  0.846  0.823  0.808  0.782  0.765
  Rep(e)              0.885  0.904  0.872  0.862  0.812  0.800  0.767

Smoothing = 1e-6
  MI(e)               0.769  0.673  0.705  0.677  0.700  0.692  0.679
  PMI3(e)             0.885  0.769  0.756  0.785  0.773  0.726  0.723
  NPMI(e)             0.885  0.846  0.821  0.815  0.750  0.726  0.719
  Typicality P(c|e)   0.462  0.577  0.603  0.577  0.569  0.564  0.556
  Typicality P(e|c)   0.538  0.615  0.615  0.615  0.608  0.613  0.615
  Rep(e)              0.846  0.885  0.897  0.877  0.788  0.777  0.765

Smoothing = 1e-7
  MI(e)               0.769  0.692  0.705  0.685  0.719  0.703  0.688
  PMI3(e)             0.885  0.769  0.756  0.792  0.758  0.736  0.725
  NPMI(e)             0.769  0.750  0.718  0.700  0.650  0.641  0.633
  Typicality P(c|e)   0.462  0.577  0.603  0.577  0.569  0.564  0.556
  Typicality P(e|c)   0.500  0.481  0.526  0.523  0.531  0.523  0.523
  Rep(e)              0.846  0.865  0.872  0.854  0.765  0.749  0.733

No smoothing
  MI(e)               0.516  0.531  0.519  0.531  0.562  0.574  0.594
  PMI3(e)             0.725  0.664  0.652  0.660  0.628  0.631  0.646
  NPMI(e)             0.599  0.597  0.579  0.554  0.540  0.539  0.549
  Typicality P(c|e)   0.297  0.380  0.409  0.422  0.438  0.446  0.461
  Typicality P(e|c)   0.401  0.386  0.396  0.398  0.401  0.410  0.428
  Rep(e)              0.758  0.771  0.745  0.723  0.656  0.647  0.661

Smoothing = 1e-3
  MI(e)               0.374  0.414  0.441  0.448  0.473  0.481  0.495
  PMI3(e)             0.484  0.511  0.509  0.502  0.519  0.525  0.533
  NPMI(e)             0.692  0.652  0.607  0.603  0.585  0.585  0.592
  Typicality P(c|e)   0.297  0.380  0.409  0.422  0.438  0.446  0.460
  Typicality P(e|c)   0.703  0.697  0.704  0.681  0.637  0.628  0.626
  Rep(e)              0.621  0.580  0.554  0.561  0.554  0.555  0.559

Smoothing = 1e-4
  MI(e)               0.407  0.430  0.458  0.462  0.492  0.503  0.512
  PMI3(e)             0.648  0.604  0.579  0.575  0.578  0.576  0.590
  NPMI(e)             0.747  0.777  0.761  0.737  0.700  0.685  0.688
  Typicality P(c|e)   0.297  0.380  0.409  0.422  0.438  0.446  0.461
  Typicality P(e|c)   0.791  0.795  0.802  0.767  0.738  0.729  0.724
  Rep(e)              0.758  0.714  0.711  0.689  0.653  0.636  0.653

Smoothing = 1e-5
  MI(e)               0.429  0.465  0.478  0.501  0.517  0.528  0.545
  PMI3(e)             0.725  0.647  0.642  0.642  0.627  0.624  0.638
  NPMI(e)             0.813  0.779  0.778  0.765  0.730  0.723  0.729
  Typicality P(c|e)   0.297  0.380  0.409  0.422  0.438  0.446  0.461
  Typicality P(e|c)   0.709  0.728  0.735  0.722  0.702  0.696  0.703
  Rep(e)              0.791  0.787  0.762  0.739  0.707  0.703  0.706

Smoothing = 1e-6
  MI(e)               0.516  0.510  0.515  0.526  0.546  0.563  0.579
  PMI3(e)             0.725  0.655  0.651  0.654  0.641  0.631  0.649
  NPMI(e)             0.791  0.766  0.732  0.728  0.673  0.659  0.668
  Typicality P(c|e)   0.297  0.380  0.409  0.422  0.438  0.446  0.461
  Typicality P(e|c)   0.495  0.516  0.520  0.508  0.512  0.521  0.540
  Rep(e)              0.758  0.784  0.767  0.755  0.691  0.686  0.694

Smoothing = 1e-7
  MI(e)               0.516  0.531  0.519  0.530  0.562  0.571  0.592
  PMI3(e)             0.725  0.664  0.652  0.658  0.630  0.631  0.647
  NPMI(e)             0.670  0.655  0.633  0.604  0.575  0.570  0.581
  Typicality P(c|e)   0.297  0.380  0.409  0.422  0.438  0.446  0.461
  Typicality P(e|c)   0.423  0.421  0.415  0.407  0.414  0.424  0.438
  Rep(e)              0.758  0.771  0.745  0.725  0.663  0.661  0.668

Evaluations on Different Measures for BLC

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

What is Semantic Similarity?
• Are the following instance pairs similar?

  • <apple, microsoft>
  • <apple, pear>
  • <apple, fruit>
  • <apple, food>
  • <apple, ipad>
  • <car, journey>

Approaches on Term Similarity

• Categories of approaches for semantic similarity
  • String-based approach
  • Knowledge-based approach
    • Use preexisting thesauri, taxonomies or encyclopedias, such as WordNet
  • Corpus-based approach
    • Use contexts of terms extracted from web pages, web search snippets or other text repositories
  • Embedding-based approach
    • Will be introduced in detail in "Part 3: Implicit Understanding"

Approaches on Term Similarity (2)

• Categories (figure): knowledge-based approaches (WordNet), including path-length / lexical-chain-based and information-content-based methods; corpus-based approaches, including graph-learning-based and snippet/search-based methods; and string-based approaches. Representative work: Rada 1989, Resnik 1995, Jcn 1997, Lin 1998, Sánchez 2011, Agirre 2010, Alvarez 2007, HunTray 2005, Hirst 1998, Do 2009, Bol 2011, Chen 2006, Ban 2002; state-of-the-art approaches are highlighted.

bull Framework


Term Similarity Using Semantic Networks [Li et al 2013 Li et al 2015]

(Framework: a term pair <t1, t2> first goes through type checking, which yields a concept pair, an entity pair, or a concept-entity pair. For entities, concept-distribution context vectors are collected; for concepts, entity-distribution context vectors are collected and similar concepts are grouped by concept clustering. Similarity is then evaluated as the cosine of the context vectors T(t1) and T(t2), taking the maximum over clusters, or over the top-k concepts cx of the entity term for a concept-entity pair.)

Step 1: Type Checking

Step 2: Context Representation (vector)

Step 3: Context Similarity

An example [Li et al 2013 Li et al 2015]

For example, <banana, pear>:

Step 1: Type Checking recognizes an entity pair
Step 2: Context Representation collects the concept contexts of each entity
Step 3: Context Similarity evaluates Cosine(T(t1), T(t2)) = 0.916

Examples
Term 1                Term 2              Similarity
lunch                 dinner              0.9987
tiger                 jaguar              0.9792
car                   plane               0.9711
television            radio               0.9465
technology company    microsoft           0.8208
high impact sport     competitive sport   0.8155
employer              large corporation   0.5353
fruit                 green pepper        0.2949
travel                meal                0.0426
music                 lunch               0.0116
alcoholic beverage    sports equipment    0.0314
company               table tennis        0.0003

Full results: http://adapt.seiee.sjtu.edu.cn/similarity/SimCompleteResults.pdf
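The following sketch shows only the core idea behind such similarity scores, with hypothetical concept distributions standing in for the real concept-cluster context vectors of [Li et al 2013, Li et al 2015]:

```python
import math

# Hypothetical concept-cluster distributions P(cluster | entity) for two entity terms.
banana = {"fruit": 0.55, "food": 0.25, "crop": 0.15, "company": 0.0}
pear   = {"fruit": 0.60, "food": 0.20, "crop": 0.18, "company": 0.0}

def cosine(u, v):
    keys = set(u) | set(v)
    dot = sum(u.get(k, 0.0) * v.get(k, 0.0) for k in keys)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

print(round(cosine(banana, pear), 3))   # high similarity, as in the <banana, pear> example
```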

Statistics of Search Queries

(a) By traffic: 1 term 44%, 2 terms 29%, 3 terms 17%, 4 terms 7%, 5 terms 2%, more than 5 terms 1%

(b) By # of distinct queries: 1 term 10%, 2 terms 26%, 3 terms 34%, 4 terms 19%, 5 terms 7%, more than 5 terms 4%

(A similar breakdown groups queries by number of instances: 1, 2, 3, 4, 5, more than 5 instances; e.g., "Pokémon Go" and "Microsoft HoloLens" are single-instance queries.)

If the short text has context for the instance…

• python tutorial
• dangerous python
• moon earth distance
• …

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Supervised Segmentation [Bergsma et al 2007]

bull Problem divide query into semantic units

bull Approach turn segmentation into position-based binary classification

Example Query

Two man power saw

[two man] [power saw]
[two] [man] [power saw]
[two] [man power] [saw]

Input a query and its positions

Output the decision for making segmentation at each position

Supervised Segmentation

• Features
  • Decision-boundary features: e.g., indicator and POS tags of the tokens at the position, and position features (forward/backward)
  • Statistical features: e.g., mutual information between the left and right parts ("bank loan | amortization schedule")
  • Context features: e.g., surrounding context information ("female bus driver")
  • Dependency features: e.g., "female" depends on "driver"

Supervised Segmentation

• Segmentation overview

  Input query: "two man power saw"
  → extract learning features for each position (two | man | power | saw)
  → SVM classifier
  → output: a yes/no segmentation decision at each position
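As an illustration of position-based binary classification (a toy feature set and invented counts, not the features of [Bergsma et al 2007]), one training instance is built per word boundary and fed to a linear classifier:

```python
import math
from sklearn.linear_model import LogisticRegression

# Toy n-gram counts; in practice these come from a large corpus or query log.
unigram = {"two": 500, "man": 400, "power": 300, "saw": 250}
bigram  = {("two", "man"): 60, ("man", "power"): 5, ("power", "saw"): 120}
N = 10000

def pmi(left, right):
    p_l, p_r = unigram.get(left, 1) / N, unigram.get(right, 1) / N
    p_lr = bigram.get((left, right), 1) / N
    return math.log(p_lr / (p_l * p_r))

def boundary_features(words, i):
    # Features for the boundary between words[i] and words[i + 1].
    return [pmi(words[i], words[i + 1]), i / len(words)]

# Tiny hand-labeled set: 1 = insert a break at this boundary, 0 = keep together.
query = ["two", "man", "power", "saw"]
X = [boundary_features(query, i) for i in range(3)]
y = [0, 1, 0]                     # gold segmentation: [two man] [power saw]
clf = LogisticRegression().fit(X, y)
print(clf.predict(X))             # per-boundary segmentation decisions
```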

Unsupervised Segmentation [Tan et al 2008]

bull Unsupervised learning for query segmentation

Probability of a generated segmentation S for query Q:

  P(S|Q) = P(s_1) P(s_2|s_1) … P(s_m|s_1 s_2 … s_{m-1}) ≈ Π_{s_i ∈ S} P(s_i)   (unigram model over segments)

A split point is a valid segment boundary if and only if the pointwise mutual information between the two segments resulting from the split is negative:

  MI(s_k, s_{k+1}) = log [ P_c([s_k s_{k+1}]) / (P_c(s_k) · P_c(s_{k+1})) ] < 0

Example: "new york times subscription" (s_1 = new, s_2 = york):

  log [ P_c([new york]) / (P_c(new) · P_c(york)) ] > 0   →   no segment boundary here

Unsupervised Segmentation

• Find the top-k segmentations: dynamic programming

• Using EM optimization on the fly

Input: query w_1 w_2 … w_n (the words in the query) and the concept probability distribution

Output: the top-k segmentations with the highest likelihood
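A minimal dynamic-programming sketch of the unigram segmentation model P(S|Q) ≈ Π P(s_i), using made-up segment probabilities instead of the EM-estimated ones:

```python
import math
from functools import lru_cache

# Toy segment probabilities P(s); unseen segments get a small smoothing value.
P = {"new york": 0.02, "new york times": 0.008, "times": 0.004,
     "new": 0.01, "york": 0.005, "subscription": 0.003}
FLOOR = 1e-7

def seg_logprob(segment):
    return math.log(P.get(segment, FLOOR))

def best_segmentation(words):
    @lru_cache(maxsize=None)
    def best(i):
        # Best (log-probability, segmentation) of the suffix words[i:].
        if i == len(words):
            return 0.0, []
        candidates = []
        for j in range(i + 1, len(words) + 1):
            seg = " ".join(words[i:j])
            lp, rest = best(j)
            candidates.append((seg_logprob(seg) + lp, [seg] + rest))
        return max(candidates, key=lambda c: c[0])
    return best(0)[1]

print(best_segmentation("new york times subscription".split()))
# ['new york times', 'subscription']
```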

Exploit Click-through [Li et al 2011]

• Motivation
  • Probabilistic query segmentation
  • Use click-through data (Q → URL → D: query, clicked document)

Input query: bank of america online banking

Output, top-3 segmentations:
  [bank of america] [online banking]      0.502
  [bank of america online banking]        0.428
  [bank of] [america] [online banking]    0.001

Exploit Click-through

• Segmentation Model

An interpolated model combining global information with click-through information.

Example query: [credit card] [bank of america]

Clicked HTML documents:
  1. bank of america credit cards contact us overview
  2. secured visa credit card from bank of america
  3. credit cards overview: find the right bank of america credit card for you

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Sense Changes with Different Context

watch harry potter → Movie     read harry potter → Book     age harry potter → Character     harry potter walkthrough → Game

Entity Recognition in Query [Guo et al 2009]

• Motivation

Detect the named entity in a short text and categorize it.

Example (single-named-entity query): "harry potter walkthrough"

  → triple <e, t, c> = ("harry potter", "walkthrough", "game"),
    where e is the (possibly ambiguous) named entity, t the context term(s), and c the class of the entity

Entity Recognition in Query

• Probabilistic Generative Model

Goal: given a query q, find the triple <e, t, c> that maximizes the probability of generating the triple:

  Pr(e, t, c) = Pr(e) · Pr(c|e) · Pr(t|c)   (assuming the context depends only on the class)

  E.g., "walkthrough" depends only on the class "game", not on "harry potter"

Objective: given query q, find argmax_{<e,t,c>} Pr(e, t, c)

The problem then becomes how to estimate Pr(e), Pr(c|e) and Pr(t|c)

Entity Recognition in Query

• Probability Estimation by Learning

Learning objective:

  max Π_{i=1}^{N} P(e_i, t_i, c_i)

Challenge: it is difficult as well as time-consuming to manually assign class labels to named entities in queries.

Build a training set T = {(e_i, t_i)} and view c_i as a hidden variable.

New learning problem:

  max Π_{i=1}^{N} P(e_i, t_i) = max Π_{i=1}^{N} Σ_c P(e_i) P(c|e_i) P(t_i|c)

solved with the topic model WS-LDA
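For intuition, a toy scoring sketch of the factorization Pr(e) · Pr(c|e) · Pr(t|c) with made-up probability tables (in the paper these are estimated from query logs with WS-LDA):

```python
# Made-up estimates; not real query-log statistics.
P_e = {"harry potter": 0.6, "potter": 0.4}
P_c_given_e = {"harry potter": {"game": 0.3, "movie": 0.5, "book": 0.2},
               "potter": {"craftsman": 1.0}}
P_t_given_c = {"game": {"walkthrough": 0.4}, "movie": {"walkthrough": 0.01},
               "book": {"walkthrough": 0.01}, "craftsman": {"walkthrough": 0.001}}

def score(e, t, c):
    return (P_e.get(e, 0.0)
            * P_c_given_e.get(e, {}).get(c, 0.0)
            * P_t_given_c.get(c, {}).get(t, 0.0))

triples = [(e, "walkthrough", c) for e in P_e for c in P_c_given_e[e]]
print(max(triples, key=lambda tr: score(*tr)))
# ('harry potter', 'walkthrough', 'game'): the context term selects the "game" class
```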

Signal from Click [Pantel et al 2012]

bull Motivation

Predict entity type in Web search

(Generative model relating entity, entity type, user intent, context and click; the query type distribution covers 73 types.)

Signal from Click

bull Joint Model for Prediction

(Plate diagram: for each query, pick a type from the distribution over types, pick an entity, pick an intent from the intent distribution, then pick context words from the word distribution and a click from the host and entity distributions; parameters θ, φ, ω.)

Telegraphic Query interpretation [Sawant et al 2013 Joshi et al 2014]

bull Entity-seeking Telegraphic Queries

bull Interpretation = Segmentation + Annotation

Combines a knowledge base (accuracy) with a large corpus (recall).

Example: query "Germany capital" → result entity "Berlin"

bull Overview

Joint Interpretation and Ranking [Sawant et al 2013 Joshi et al 2014]

An annotated corpus and the telegraphic query feed two models for joint interpretation and ranking, a generative model and a discriminative model, whose output is a ranked list of candidate entities e1, e2, e3, …

bull Generative Model

Joint Interpretation and Ranking [Sawant et al 2013]

(Figure, borrowed from U. Sawant (2013), based on probabilistic language models: for the query q = "losing team baseball world series 1998", a switch variable Z decides whether each query word is generated by the type model or the context model of a candidate entity E, here San Diego Padres with type T = "major league baseball team". The type hint "baseball team" matches the type model, while context matchers such as "lost 1998 world series" match corpus snippets like "Padres have been to two World Series, losing in 1984 and 1998".)

bull Discriminative Model

Joint Interpretation and Ranking [Sawant et al 2013]

(Figure: max-margin discriminative learning over candidate interpretations of "losing team baseball world series 1998". The correct answer entity San_Diego_Padres, interpreted with target type t = baseball team, should be ranked above incorrect candidates such as 1998_World_Series with t = series.)

• Queries seek answer entities (e2)

• They contain (query) entities (e1), target types (t2), relations (r) and selectors (s)

query                       e1             r                    t2                   s
dave navarro first band     dave navarro   band                 band                 first
                            dave navarro   -                    band                 first
spider automobile company   spider         automobile company   automobile company   -
                            -              automobile company   company              spider

Borrowed from M. Joshi (2014)

Improved Generative Model

• The generative model of [Sawant et al 2013] is extended in [Joshi et al 2014] to also consider e1 (in q) and the relation r

Improved Discriminative Model

• Likewise, the discriminative model of [Sawant et al 2013] is extended in [Joshi et al 2014] to consider e1 (in q) and r

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

• Input: a short text, e.g., "wanna watch eagles band"

• Output: a semantic interpretation, e.g., watch[verb] eagles[entity](band) band[concept]

• Three steps in understanding a short text:

  Step 1: Text Segmentation - divide the text into a sequence of terms in the vocabulary ("wanna watch eagles band" → watch | eagles | band)

  Step 2: Type Detection - determine the best type of each term (watch[verb] eagles[entity] band[concept])

  Step 3: Concept Labeling - infer the best concept of each entity within context (eagles[entity] → band)

Text segmentation
• Observations
  • Mutual Exclusion - terms containing the same word mutually exclude each other
  • Mutual Reinforcement - related terms mutually reinforce each other

• Build a Candidate Term Graph (CTG)

(Figure: candidate term graphs for "vacation april in paris" (nodes: vacation, april, paris, april in paris) and "watch harry potter" (nodes: watch, harry, potter, harry potter), with weighted edges between compatible terms.)

Find best segmentation

• Best segmentation = the sub-graph of the CTG which
  • is a complete graph (clique) with no mutual exclusion (i.e., it is a segmentation)
  • has 100% word coverage (except for stopwords)
  • has the largest average edge weight (i.e., it is the best segmentation)

(Figure: in each CTG, the maximal clique with full word coverage and the largest average edge weight, e.g. {vacation, april in paris} and {watch, harry potter}, is chosen as the best segmentation; a sketch of this search follows.)
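An illustrative brute-force sketch of that search (fine for the handful of candidate terms in a query; the term graph and edge weights below are made up): enumerate subsets of candidate terms, keep those with full word coverage and no overlap, and pick the one with the largest average edge weight.

```python
from itertools import combinations

words = {"watch", "harry", "potter"}
terms = {"watch": {"watch"}, "harry": {"harry"}, "potter": {"potter"},
         "harry potter": {"harry", "potter"}}           # term -> words it covers
edge = {frozenset(["watch", "harry potter"]): 0.092,    # illustrative affinity weights
        frozenset(["watch", "harry"]): 0.014,
        frozenset(["watch", "potter"]): 0.018,
        frozenset(["harry", "potter"]): 0.053}

def valid(subset):
    covered = set().union(*(terms[t] for t in subset))
    no_overlap = sum(len(terms[t]) for t in subset) == len(covered)
    return covered == words and no_overlap        # full coverage, mutual exclusion

def avg_weight(subset):
    pairs = list(combinations(subset, 2))
    return sum(edge.get(frozenset(p), 0.0) for p in pairs) / len(pairs) if pairs else 0.0

candidates = [s for r in range(1, len(terms) + 1)
              for s in combinations(terms, r) if valid(s)]
print(max(candidates, key=avg_weight))   # ('watch', 'harry potter')
```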

Type Detection

• Pairwise Model
  • Find the best typed-term for each term so that the maximum spanning tree of the resulting sub-graph between typed-terms has the largest weight
  • Example "watch free movie": candidate typed-terms are watch[v] / watch[e] / watch[c], free[adj] / free[v], and movie[c] / movie[e]

Concept Labeling

• Entity disambiguation is the most important task of concept labeling
  • Filter / re-rank the original concept cluster vector
  • Weighted vote: the final score of each concept cluster combines its original score with the support from context, using concept co-occurrence
  • Example: watch harry potter → movie; read harry potter → book

Example of Entity Disambiguation[Hua et al 2015 (ICDE Best Paper) Hua et al 2016]

(Figure: the conceptualization pipeline - parsing, term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, and concept orthogonalization - maps a short text to a concept vector [(c1, p1), (c2, p2), (c3, p3), …], combining the semantic (isA) network with the co-occurrence network. For "ipad apple", isA gives apple → {fruit, company, food, product, …} and ipad → {product, device, …}; co-occurrence filtering keeps the company/brand/product senses of apple.)
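A hedged sketch of the weighted-vote re-ranking (hypothetical scores and co-occurrence strengths, not Probase values): each concept cluster's final score mixes its original conceptualization score with support from the context term's concepts.

```python
# Original concept-cluster scores for the ambiguous entity "apple" (hypothetical).
apple = {"fruit": 0.55, "company": 0.35, "food": 0.10}
# Concepts of the context term "ipad" and concept co-occurrence strengths (hypothetical).
context = {"device": 0.7, "product": 0.3}
cooccur = {("company", "device"): 0.8, ("company", "product"): 0.6,
           ("fruit", "device"): 0.02, ("fruit", "product"): 0.05,
           ("food", "device"): 0.01, ("food", "product"): 0.04}

def weighted_vote(entity_concepts, context_concepts, alpha=0.5):
    rescored = {}
    for c, s in entity_concepts.items():
        support = sum(w * cooccur.get((c, cc), 0.0) for cc, w in context_concepts.items())
        rescored[c] = alpha * s + (1 - alpha) * support
    total = sum(rescored.values()) or 1.0
    return {c: v / total for c, v in rescored.items()}

print(weighted_vote(apple, context))   # "company" now outranks "fruit" for "ipad apple"
```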

Mining Lexical Relationships[Wang et al 2015b]

• Lexical knowledge is represented by probabilities over terms, roles and concepts (e: instance, t: term, c: concept, z: role):

  ① p(z|t), e.g., p(verb|watch), p(instance|watch)
  ② p(c|e) = p(c | t, z = instance), e.g., p(movie|harry potter), p(book|harry potter)
  ③ p(c | t, z), e.g., p(movie | watch, verb)

Example "watch harry potter": "watch" as a verb is associated with concepts such as product, book and movie; "harry potter" as an instance has concepts movie and book.

Understanding Queries [Wang et al 2015b]

• Goal: rank the concepts and find argmax_c p(c | t, q)

The offline semantic network is instantiated for the query: all possible segmentations form an online subgraph, and concepts are ranked by random walk with restart [Sun et al 2005] on that subgraph (a small sketch follows).
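For intuition, a small random-walk-with-restart sketch over a toy adjacency matrix (not the actual offline semantic network):

```python
import numpy as np

# Toy graph: 0 = watch (term), 1 = eagles (term), 2 = band (concept), 3 = movie (concept)
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 1],
              [0, 1, 0, 0],
              [1, 1, 0, 0]], dtype=float)
P = A / A.sum(axis=0, keepdims=True)      # column-normalized transition matrix

def rwr(P, seed, restart=0.15, iters=100):
    r = np.zeros(P.shape[0]); r[seed] = 1.0
    p = r.copy()
    for _ in range(iters):
        p = (1 - restart) * P @ p + restart * r
    return p

print(rwr(P, seed=1).round(3))   # relevance of every node to "eagles" in this toy context
```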

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

• Example: "popular smart cover iphone 5s"

• Definitions:
  • Head: names the general (semantic) category to which the whole short text belongs; usually the head is the intent of the short text
    • "smart cover": the intent of the query
  • Constraints: distinguish this member from other members of the same category
    • "iphone 5s": limits the type of the head
  • Non-Constraint Modifiers (aka Pure Modifiers): subjective modifiers which can be dropped without changing the intent
    • "popular": subjective, can be neglected

Non-Constraint Modifier Mining: Construct Modifier Networks

(Figure: in the "Country" domain, the concept hierarchy tree contains country, Asian country, developed country, western country, western developed country, large Asian country, large/top developed country, top western country, etc. Edges between co-occurring modifiers form a modifier network over {Asian, Western, Developed, Large, Top}; in this case "Large" and "Top" are pure modifiers.)

• Betweenness centrality is a measure of a node's centrality in a network

• The betweenness of node v is defined as g(v) = Σ_{s ≠ v ≠ t} σ_st(v) / σ_st, where σ_st is the total number of shortest paths from node s to node t and σ_st(v) is the number of those paths that pass through v

• Normalization & aggregation: a pure modifier should have a low aggregated betweenness-centrality score PMS(t)

Non-Constraint Modifier Mining: Betweenness Centrality
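A small sketch using the standard betweenness_centrality routine from networkx on a toy modifier network (the edges are illustrative, not mined); modifiers whose aggregated centrality is near zero would be treated as pure modifiers:

```python
import networkx as nx

# Toy modifier network for the "country" domain (illustrative edges only).
G = nx.Graph()
G.add_edges_from([("Asian", "Developed"), ("Western", "Developed"),
                  ("Asian", "Western"),
                  ("Large", "Asian"), ("Large", "Developed"),
                  ("Top", "Western"), ("Top", "Developed")])

centrality = nx.betweenness_centrality(G, normalized=True)
pure = [m for m, c in sorted(centrality.items(), key=lambda kv: kv[1]) if c < 0.05]
print(centrality)
print("candidate pure modifiers:", pure)   # "Large" and "Top" score lowest here
```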

Head-Constraints Mining [Wang et al 2014b]

• A term can be a head in some cases and a constraint in others

• E.g., "seattle hotel" (head: hotel, constraint: seattle) vs. "seattle hotel job" (head: job, constraints: seattle, hotel)

Head-Constraints Mining: Acquiring Concept Patterns

(Pipeline: from query logs, extract preposition patterns - A for B, A of B, A with B, A in B, A on B, A at B, … - e.g., "cover for iphone 6s", "battery for sony a7r", "wicked on broadway". Each match gives an entity pair (entity1 = head, entity2 = constraint); conceptualize both entities and enumerate the concept pairs (concept1i, concept2j) to build a concept pattern dictionary.)

Why Concepts Can't Be Too General
• It may cause too many concept pattern conflicts: head and modifier can't be distinguished for overly general concept pairs

  Derived concept pattern: device (head) / company (modifier)
  Supporting entity pairs: iphone 4 / verizon, modem / comcast, wireless router / comcast, iphone 4 / tmobile

  Derived concept pattern: company (head) / device (modifier)
  Supporting entity pairs: amazon books / kindle, netflix / touchpad, skype / windows phone, netflix / ps3

  → the two patterns conflict

Why Concepts Can't Be Too Specific
• It may generate concepts with little coverage: the concept regresses to the entity
• Large storage space: up to (millions × millions) of patterns, e.g., (device, largest desktop OS vendor), (device, largest software development company), (device, largest global corporation), (device, latest windows and office provider), …

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept Patterns
Cluster size   Sum of cluster scores   Head / Constraint                   Score
615            211.4691                breed / state                       357298460224501
296            77.52357                game / platform                     627403476771856
153            34.66804                accessory / vehicle                 53393705094809
70             11.8259                 browser / platform                  132612807637391
22             10.10993                requirement / school                271407526294823
34             9.489159                drug / disease                      154602405333541
42             8.992995                cosmetic / skin condition           814659415003929
16             7.421599                job / city                          27903732555528
32             7.10403                 accessory / phone                   246513830851194
18             6.692376                software / platform                 210126322725878
20             6.444603                test / disease                      239774028397537
27             5.994205                clothes / breed                     98773996282851
19             5.913545                penalty / crime                     200544192793488
25             5.848804                tax / state                         240081818612579
16             5.465424                sauce / meat                        183592863621553
18             4.809389                credit card / country               142919087972152
14             4.730792                food / holiday                      14554140330924
11             4.536199                mod / game                          257163856882439
29             4.350954                garment / sport                     471533326845442
23             3.994886                career information / professional   732726483731257
15             3.86065                 song / instrument                   128189481818135
18             3.78213                 bait / fish                         780426514113169
22             3.722948                study guide / book                  508339765053921
19             3.408953                plugins / browser                   550326072627126
14             3.305753                recipe / meat                       882779863422951
18             3.214226                currency / country                  110825444188352
13             3.180272                lens / camera                       186081673263957
9              3.16973                 decoration / holiday                130055844126533
16             3.14875                 food / animal                       7338544366514

Pattern cluster example: game platform, game device, video game platform, game console, game pad, game gaming platform → Game (Head), Platform (Modifier)

Supporting entity pairs: angry birds / android, angry birds / ios, angry birds / windows 10, …

→ used for head-modifier detection

Head-Modifier Relationship

• Train a classifier on (head-embedding, modifier-embedding) pairs

• Training data:
  • Positive: (head, modifier)
  • Negative: (modifier, head)

• Precision >= 0.9, Recall >= 0.9

• Disadvantage: not interpretable
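A hedged sketch of such a classifier, with random vectors standing in for real word embeddings: concatenate the candidate head and modifier embeddings and train a binary classifier, using the reversed (modifier, head) order as negatives.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
emb = {w: rng.normal(size=50) for w in
       ["cover", "iphone", "hotel", "seattle", "job", "tutorial", "python"]}

# (head, modifier) pairs as positives; the reversed order as negatives.
positives = [("cover", "iphone"), ("hotel", "seattle"), ("tutorial", "python")]
X, y = [], []
for h, m in positives:
    X.append(np.concatenate([emb[h], emb[m]])); y.append(1)
    X.append(np.concatenate([emb[m], emb[h]])); y.append(0)

clf = LogisticRegression(max_iter=1000).fit(X, y)
test = np.concatenate([emb["job"], emb["seattle"]])   # candidate head "job", modifier "seattle"
print(clf.predict([test]), clf.predict_proba([test]))
```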

Syntactic Parsing based on HM

• Information is incomplete
  • Prepositions and other function words are missing
  • Within a noun compound: el capitan macbook pro

• Why not train a parser for web queries?

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges: Short Texts Lack Grammatical Signals
• Lack of function words and word order
  • "toys queries" has ambiguous intent
  • "distance earth moon" has clear intent
  • Many equivalent forms: "earth moon distance", "earth distance moon", …

Challenges Syntactic Parsing of Queries

• No standard

• No ground truth

Why is syntactic parsing of queries even a legitimate problem?

Derive Syntax from Semantics[Sun et al 2016]

• Query: "thai food houston"

• Take a clicked sentence (from a document the user clicked for this query)

• Project its dependencies onto the query

A Treebank for Short Texts

• Given a query q

• Given q's clicked sentences s

• Parse each s

• Project the dependencies from s to q

• Aggregate the dependencies

Algorithm of Projection

Result Examples

Results

• Random queries
  QueryParser: UAS 0.83, LAS 0.75; Stanford: UAS 0.72, LAS 0.64

• Queries with no function words
  QueryParser: UAS 0.82, LAS 0.73; Stanford: UAS 0.70, LAS 0.61

• Queries with function words
  QueryParser: UAS 0.90, LAS 0.85; Stanford: UAS 0.86, LAS 0.80

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

• Measures similarity between two short texts or sentences

• Basic idea: word-by-word comparison using embedding vectors

• Uses a saliency-weighted semantic network to compute similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired

Bins of all edges Bins of max edges

119908isin119904119868

119868119863119865(119908) sdot)119904119890119898(119908 119904119904) sdot (1198961 + 1

൰119904119890119898(119908 119904119904) + 1198961 sdot (1 minus 119887 + 119887 sdot|119904119904|119886119907119892119897

Similarity measurement

termShort texts

Inspired by BM25

Semantic similarity

119891119904119905119904(119904119897 119904119904) =
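A sketch of the saliency-weighted similarity above (toy IDF values and random embeddings; k1 and b follow the BM25 convention):

```python
import numpy as np

rng = np.random.default_rng(1)
emb = {w: rng.normal(size=20) for w in
       ["cheap", "flight", "low", "cost", "ticket", "airfare"]}
idf = {"cheap": 1.2, "flight": 1.5, "low": 1.0, "cost": 1.1, "ticket": 1.4, "airfare": 2.0}

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def sem(w, short_text):
    # Best embedding match of term w against any term of the shorter text.
    return max(cos(emb[w], emb[t]) for t in short_text)

def f_sts(longer, shorter, k1=1.2, b=0.75, avgsl=3.0):
    score = 0.0
    for w in longer:
        s = sem(w, shorter)
        score += idf[w] * (s * (k1 + 1)) / (s + k1 * (1 - b + b * len(shorter) / avgsl))
    return score

print(round(f_sts(["cheap", "flight", "ticket"], ["low", "cost", "airfare"]), 3))
```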

From the Concept View

From the Concept View [Wang et al 2015a]

(Both short texts are conceptualized (parsing; term clustering by isA; concept filtering by co-occurrence; head/modifier analysis; concept orthogonalization), using the semantic network and the co-occurrence network, into bags of concepts: Concept Vector 1 [(c1, score1), (c2, score2), …] and Concept Vector 2 [(c1', score1'), (c2', score2'), …]. Their similarity is then computed between the two concept vectors.)

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

• Explicit short text understanding benefits a lot of application scenarios:
  • Ads/search semantic matching
  • Definition mining
  • Query recommendation
  • Web table understanding
  • Semantic search
  • …

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

(Figure: two bar charts of keyword-selection performance by decile (Decile 4 through Decile 10), one for Mainline Ads and one for Sidebar Ads.)

Definition Mining [Hao et al 2016]

• Definition scenarios: search engines, QnA, etc.

• Why conceptualization is useful for definition mining; example: "What is Emphysema?"

  Answer 1: "Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year."

  Answer 2: "Emphysema is an incurable, progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing."

• Both answers have the form of a definition. Embedding is helpful to some extent, but it also returns high similarity scores for both (emphysema, disease) and (emphysema, smoking).

• Conceptualization can provide strong semantics; contextual embedding can also provide semantic similarity beyond isA.

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

(Framework: offline, training data and the knowledge base are used for concept weighting and model learning, producing one concept model per class (Class 1 … Class i … Class N). Online, an original short text such as "justin bieber graduates" goes through entity extraction, conceptualization into a concept vector, candidate generation, and classification & ranking, yielding, e.g., <Music, score>.)

Concept based Short Text Classification and Ranking [Wang et al 2014a]

(Figures: each category, e.g. TV, Music, Movie, is represented in the concept space by the article titles/tags in that category, giving per-concept weights ω_i; a query is mapped into the same concept space with probabilities p_i and is classified and ranked by comparing its concept vector against each category's weighted concept vector.)

Precision performance on each category [Wang et al 2014a]

Category   BocSTC   LM_ch   SVM    VSM_cosine   LM_d   Entity_ESA
Movie      0.71     0.91    0.84   0.81         0.72   0.56
Money      0.97     0.95    0.54   0.57         0.52   0.74
Music      0.97     0.90    0.88   0.73         0.68   0.58
TV         0.96     0.46    0.92   0.56         0.51   0.55

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledge in SIGMOD 2008

bull [ Auer et al 2007 ] Sören Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWC/ASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Ré Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Börschinger Cäcilia Zirn and Anas Elghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and Xindong Wu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10): 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny Boyes-Braem Basic objects in natural categories Cognitive psychology 8(3): 382-439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny Zhu Toward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 3: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Add Common Sense to Computing

25 Oct 1881Pablo Picasso Spanish

Which is ldquokikirdquo and which is ldquoboubardquo

sound shape

prime119948ത119942119948ത119942

zigzaggedness

China India

country

Brazil

emerging market

engineer apple

IT company

The is eating an

fruit

body taste

wine

smell

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

httplinghublider-projecteullod-cloud

Linguistic Linked Open Data Cloud

Cyc

short text understanding

(internal representation)

answer

knowledge knowledge

1 ldquoPython Tutorialrdquo2 ldquoWho was the US President when the Angels won the

World Seriesrdquo

linguistic common sense knowledge

Encyclopedia knowledge

Common Sense Knowledge vs Encyclopedia Knowledge

Common Sense Knowledge Base

Encyclopedia Knowledge Base

Common senselinguistic knowledge among terms

EntitiesFacts

isAisPropertyOf

co-occurrencehellip

DayOfBirthLocatedInSpouseOf

hellip

Typicality basic level of categorization

Black or WhitePrecision

WordNet KnowItAll NELLProbase hellip

Freebase Yago DBPedia Google knowledge graph hellip

Special cases

WordNet [Stark et al 1998]

bull WordNetreg is a large lexical database of English Nouns verbs adjectives and adverbs are grouped into sets of cognitive synonyms (synsets) each expressing a distinct concept

bull S (n) China Peoples Republic of China mainland China Communist China Red China PRC Cathay (a communist nation that covers a vast territory in eastern Asia the most populous country in the world)

bull The project began in the Princeton University Department of Psychology and is currently housed in the Department of Computer Science

bull Homepage httpwordnetprincetoneduwordnetabout-wordnetbull Download httpwordnetprincetoneduwordnetdownload

Brief Introduction

Statistics

Sample

Authors

URLs

POS Unique Synsets TotalStrings Word-Sense Pairs

Noun 117798 82115 146312Verb 11529 13767 25047Adjective 21479 18156 30002Adverb 4481 3621 5580Totals 155287 117659 206941

KnowItAll Extract high-quality knowledgefrom the Web [Banko et al 2007 Etzioni et al 2011]

bull OpenIE distills semantic relations fromWeb-scale natural language texts

bull TextRunner -gt ReVerb -gt Open IE part of KnowItAll

bull Yielding over 5 billion extraction from over a billion web pages

bull From ldquoUS president Barack Obama gave his inaugural address on January 20 2013rdquoTo (Barack Obama is president of US)

(Barack Obama gave [his inaugural address on January 20 2013])

bull OpenIE v413 has been released

bull Turing Center at the University of Washington

bull httpopenieallenaiorgbull httpreverbcswashingtonedu

Brief Introduction

Statistics

Sample

News

Authors

URLs

NELL Never-Ending Language Learning [Carlson et al 2010]

bull NELL is a research project that attempts to create a computer system that learns over time to read the web Since January 2010

bull Over 50 million candidate beliefs by reading the web They are considered at different levels of confidence

bull Out of 50 million high confidence in 2817156 beliefs

Brief Introduction

Statistics

Sample

NELL Never-Ending Language Learning

bull It is continually learning facts on web Resources is publicly available

bull NELL research team at CMU

bull Homepage httprtwmlcmuedurtwbull Download httprtwmlcmuedurtwresources

News

Authors

URLs

Probase [Wu et al 2012]

bull Probase is a semantic network to make machines ldquoawarerdquo of the mental world of human beings so that machines can better understand human communication

Brief Introduction

Probase network

isA isPropertyOf Co-occurrence(concept entities) (attributes) (isCEOof LocatedIn etc)

Concepts Entities(ldquoSpanish Artistsrdquo) (ldquoPablo Picasordquo)

Nodes

Edges

Attributes(ldquoBirthdayrdquo)

VerbsAdjectives(ldquoEatrdquo ldquoSweetrdquo)

bull 5401933 unique concepts bull 12551613 unique instancesbull 87603947 IsA relations

countries Basic watercolor techniques

Celebrity wedding dress designers

Probase

bull Microsoft Research

bull Public release coming soon in AugSept 2016 bull Project homepage httpresearchmicrosoftcomprobase

Concepts

Authors

URLs

Probase isA error rate lt1 1 and lt10 for random pair

Freebase [Bollacker et al 2008]

bull Freebase is a well-known collaborative knowledge base consisting of data composed mainly by its community

bull Freebase contains more than 23 million entitiesbull Freebase contains 19 billion triplesbull Each triple is organized as form of

ltsubjectgt ltpredicategt ltobjectgt

Brief Introduction

Statistics

bull Freebase is a collection of factsbull Freebase only contains nodes

and linksbull Freebase is a labeled graph

Freebase -gt Wiki Data

bull Freebase data was integrated into Wikidatabull The Freebase API will be completely shut-down on Aug 31 2016

replaced by Google Knowledge Graph API

bull Freebase Community

bull Homepage httpwikifreebasecomwikiMain_Pagebull Download httpsdevelopersgooglecomfreebasebull Wikidata httpswwwwikidataorg

News

Authors

URLs

Google Knowledge Graph

bull Knowledge Graph is a knowledge base used by Google to enhance its search engines search results with semantic-search information gathered from a wide variety of sources

bull 570 million objects and more than 18 billion facts about relationshipsbetween different objects

bull Google Inc

bull Homepage httpswwwgooglecomintles419insidesearchfeaturessearchknowledgehtml

Brief Introduction

Statistics

Sample

Authors

URLs

YAGO [Suchanek et al 2007]

bull YAGO is a huge semantic knowledge base derived from GeoNames WordNet and Wikipedia (10 Wikipedias in different languages)

bull More than 10 million entities(persons organizations cities etc)bull More than 120 million facts about entitiesbull More than 35000 classes assigned to entitiesbull Many of its facts and entities are attached a temporal dimension and a spatial dimension

Brief Introduction

SampleltAlbert_Einsteingt ltisMarriedTogt ltElsa_Einsteingt

Statistics

YAGO

Newsbull An evaluated version of YAGO3 (Combining information from Wikipedia from different

languages) is released [15 Sep 2015]

Authorsbull Max Planck Institute for Informatics in SaarbruumlckenGermany and DBWeb group at Teacuteleacutecom ParisTech University

URLsbull Homepage httpwwwmpi-infmpgdedepartmentsdatabases-and-

information-systemsresearchyago-nagayagobull Download httpwwwmpi-infmpgdedepartmentsdatabases-and-

information-systemsresearchyago-nagayagodownloads

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Statistics of Search Queries

44

29

17

7

2 1

1 Term2 Terms3 Terms4 Terms5 Termsmore than 5 Terms

10

26

34

19

74

1 Term2 Terms3 Terms4 Terms5 Termsmore than 5 Terms

(a) By traffic

(b) By of distinct queries

Pokeacutemon Go Microsoft HoloLens

Instance 1 Instance 21 Instance2 Instances3 Instances4 Instances5 InstancesMore than 5 Instances

1 Instance2 Instances3 Instances4 Instances5 InstancesMore than 5 Instances

If the short text is a single instancehellip

bull Pythonbull Microsoftbull Applebull hellip

Single Instance Understanding

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

Word Ambiguity bull Word sense disambiguation rely on dictionaries

(WordNet)

Take a seat on this chair

The chair of the Math Department

Instance Ambiguity

bull Instance sense disambiguation extra knowledge needed

I have an apple pie for lunch

He bought an apple ipad

Here ldquoapplerdquo is a proper noun

Ambiguity [Hua et al 2016]

bull Many instances are ambiguous

bull Intuition ambiguous instances have multiple senses

short text instance sense

population china china country

glass vs china china fragile item

pear apple apple fruit

microsoft apple apple company

read harry potter harry potter book

watch harry potter harry potter movie

age of harry potter harry potter character

Pre-definition for Ambiguity (1) Sense [Hua et al 2016]

bull What is a Sense in semantic networksbull A sense as a hierarchy of concept clusters

region

country state city

creature

animal

predator

crop food

fruit vegetable meat

Germany

Pre-definition for Ambiguity (2) Concept Cluster [Li et al 2013 Li et al 2015]

bull What is a Concept Cluster (CL)bull Cluster similar concepts into a concept cluster using K-

Means like approach (k-Medoids)

FruitFresh fruit

JuiceTropical fruit

BerryExotic fruit

Seasonal fruitFruit juiceCitrus fruitSoft fruitDry fruit

Wild fruitLocal fruit

hellip

company

CompanyClientFirm

ManufacturerCorporation

large companyRivalGiant

big companylocal company

large corporationinternational

companyhellip Fruit

Definitions of Instance Ambiguity [Hua et al 2016]

bull 3 levels of instance ambiguitybull Level 0 unambiguous

bull Contains only 1 sensebull Eg dog (animal) beijing (city) potato (vegetable)

bull Level 1 unambiguous and ambiguous both make sensebull Contains 2 or more senses but these senses are relatedbull Eg google (company amp search engine) french (language amp

country) truck(vehicle amp public transport service)

bull Level 2 ambiguous bull Contains 2 or more senses and the senses are very different from

each otherbull Eg apple (fruit amp company) jaguar(animal amp company) python

(animal amp language)

Ambiguity Score

bull Using top-2 senses to calculate the ambiguity score

119904119888119900119903119890 =

0 119897119890119907119890119897 = 0119908 1199042 119890

119908 1199041 119890lowast 1 minus 119904119894119898119894119897119886119903119894119905119910 1199041 1199042 119897119890119907119890119897 = 1

score = 1 +119908(1199041198882|119890)

119908(1199041198881|119890)lowast 1 minus 119904119894119898119894119897119886119903119894119905119910 1199041198881 1199041198882 119897119890119907119890119897 = 2

Denote top-2 senses as 1199041 and 1199042 top-2 sense clusters as 1199041198881 and 1199041198882 Denote similarity of two sense clusters as the maximum similarity of their senses

119904119894119898119894119897119886119903119894119905119910 1199041198881 1199041198882 = 119950119938119961119904119894119898119894119897119886119903119894119905119910(119904119894 isin 1199041198881 119904119895 isin 1199041198882) For an entity 119890 denote the weight (popularity) of a sense 119904119894 as the sum of weights of its concept clusters

119908 119904119894|119890 = 119908 119867119894|119890 =119862119871119895isin119867119894

119875(119862119871119895|119890)

For an entity 119890 denote the weight (popularity) of a sense cluster 119904119888119894 as the sum of weights of its senses

119908 119904119888119894 119890 =119904119895isin119904119888119894

119908(119904119895|119890)

Examples

bull Level 0bull california

bull country state city region institution 0943bull fruit

bull food product snack carbs crop 0827bull alcohol

bull substance drug solvent food addiction 0523bull computer

bull device product electronics technology appliance 0537bull coffee

bull beverage product food crop stimulant 073bull potato

bull vegetable food crop carbs product 0896bull bean

bull food vegetable crop legume carbs 0801

Examples (cont.)

• Level 1
  • nike (score = 0.034): company, store (0.861); brand (0.035); shoe, product (0.033)
  • twitter (score = 0.035): website, tool (0.612); network (0.165); application (0.033); company (0.031)
  • facebook (score = 0.037): website, tool (0.595); network (0.17); company (0.053); application (0.029)
  • yahoo (score = 0.38): search engine (0.457); company, provider, account (0.281); website (0.0656)
  • google (score = 0.507): search engine (0.46); company, provider, organization (0.377); website (0.0449)

Examples (cont)

• Level 2
  • jordan (score = 1.02): country, state, company, regime (0.92); shoe (0.02)
  • fox (score = 1.09): animal, predator, species (0.74); network (0.064); company (0.035)
  • puma (score = 1.15): brand, company, shoe (0.655); species, cat (0.116)
  • gold (score = 1.21): metal, material, mineral resource, mineral (0.62); color (0.128)

Examples (cont)

• Level 2
  • soap (score = 1.22): product, toiletry, substance (0.49); technology, industry standard (0.11)
  • silver (score = 1.24): metal, material, mineral resource, mineral (0.638); color (0.156)
  • python (score = 1.29): language (0.667); snake, animal, reptile, skin (0.193)
  • apple (score = 1.41): fruit, food, tree (0.537); company, brand (0.271)

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

A Concept View of "Microsoft"

Microsoft: company, software company, international company, technology leader, largest desktop OS vendor, …

Basic-level Conceptualization (BLC)[Rosch et al 1976]

KFC

BMW

Basic-level conceptualization

How to Make BLC?

• Naive approaches
  • Typicality: an important measure for understanding the relationship between an object and its concept
  • Pointwise Mutual Information (PMI): a common measure of the strength of association between two terms

Naive Approach 1: Typicality

P(robin|bird) > P(penguin|bird): "robin" is a more typical bird than "penguin".
P(USA|country) > P(Seychelles|country): "USA" is a more typical country than "Seychelles".

Using Typicality for BLC

• Associate each isA relationship ($e$ isA $c$) with typicality scores $P(e|c)$ and $P(c|e)$:

$$P(e|c) = \frac{n(c, e)}{n(c)}, \qquad P(c|e) = \frac{n(c, e)}{n(e)}$$

bull P(e|c) indicates how typical (or popular) e is in the given concept c

bull P(c|e) indicates how typical (or popular) the concept c is given e

• However: for Microsoft, "company" has high typicality $P(c|e)$, while "largest desktop OS vendor" has high typicality $P(e|c)$; neither score alone identifies the basic-level concept.
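A toy Python sketch of estimating these typicality scores from isA co-occurrence counts n(c, e); the counts below are invented for illustration:

```python
from collections import defaultdict

def typicality(counts):
    """counts: {(concept, entity): n(c, e)} harvested from isA extractions."""
    n_c, n_e = defaultdict(float), defaultdict(float)
    for (c, e), n in counts.items():
        n_c[c] += n          # n(c): total count of concept c
        n_e[e] += n          # n(e): total count of entity e
    p_e_given_c = {(c, e): n / n_c[c] for (c, e), n in counts.items()}
    p_c_given_e = {(c, e): n / n_e[e] for (c, e), n in counts.items()}
    return p_e_given_c, p_c_given_e

counts = {("bird", "robin"): 50, ("bird", "penguin"): 5,
          ("country", "usa"): 900, ("country", "seychelles"): 3}
p_ec, p_ce = typicality(counts)
assert p_ec[("bird", "robin")] > p_ec[("bird", "penguin")]   # robin is the more typical bird
```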

Naive Approach 2: PMI [Manning and Schutze 1999]

• Pointwise mutual information (PMI) is a measure of association used in information theory and statistics.
• Consider using the PMI between concept c and instance e to find the basic-level concepts as follows:

$$PMI(e, c) = \log\frac{P(e, c)}{P(e)P(c)} = \log P(e|c) - \log P(e)$$

• However:
  • In basic level of categorization we are interested in finding a concept for a given e, which means P(e) is a constant.
  • Thus ranking by PMI(e, c) is the same as ranking by P(e|c).

Using Rep(e, c) for BLC [Wang et al 2015b]

• The measure $Rep(e, c) = P(c|e) \cdot P(e|c)$ means:
  • Given e, c should be its typical concept (shortest distance).
  • Given c, e should be its typical instance (shortest distance).

• (Relation to PMI) Taking the logarithm of the scoring function:

$$\log Rep(e, c) = \log\big(P(c|e) \cdot P(e|c)\big) = \log\left(\frac{P(e, c)}{P(e)} \cdot \frac{P(e, c)}{P(c)}\right) = \log\frac{P(e, c)^2}{P(e)P(c)} = PMI(e, c) + \log P(e, c) = PMI^2(e, c)$$

• (Relation to Commute Time) The commute time between an instance e and a concept c is

$$Time(e, c) = \sum_{k=1}^{\infty} 2k \cdot P_k(e, c) = \sum_{k=1}^{T} 2k \cdot P_k(e, c) + \sum_{k=T+1}^{\infty} 2k \cdot P_k(e, c) \ge \sum_{k=1}^{T} 2k \cdot P_k(e, c) + 2(T+1)\left(1 - \sum_{k=1}^{T} P_k(e, c)\right)$$

which for T = 1 (where $P_1(e, c) = Rep(e, c)$) equals $4 - 2 \cdot Rep(e, c)$.

• So ranking by Rep(e, c) is a process of finding concept nodes having the shortest expected distance to e.
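A small sketch of ranking candidate concepts by Rep(e, c); the probability tables are illustrative stand-ins, not real Probase statistics:

```python
# Rep(e, c) = P(c|e) * P(e|c): penalizes concepts that are too general
# (high P(c|e), tiny P(e|c)) and too specific (tiny P(c|e), high P(e|c)).
p_c_given_e = {("company", "microsoft"): 0.40,
               ("software company", "microsoft"): 0.25,
               ("largest desktop OS vendor", "microsoft"): 0.01}
p_e_given_c = {("company", "microsoft"): 0.002,
               ("software company", "microsoft"): 0.05,
               ("largest desktop OS vendor", "microsoft"): 0.95}

def basic_level_concepts(entity, top_k=3):
    scored = [(c, p_ce * p_e_given_c[(c, e)])
              for (c, e), p_ce in p_c_given_e.items() if e == entity]
    return sorted(scored, key=lambda x: x[1], reverse=True)[:top_k]

print(basic_level_concepts("microsoft"))   # "software company" ranks first here
```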

Evaluations on Different Measures for BLC (Precision@k and NDCG@k, for k = 1, 2, 3, 5, 10, 15, 20)

Precision@k
Measure             k=1    k=2    k=3    k=5    k=10   k=15   k=20
No smoothing
MI(e)               0.769  0.692  0.705  0.685  0.719  0.705  0.690
PMI3(e)             0.885  0.769  0.756  0.800  0.754  0.733  0.721
NPMI(e)             0.692  0.692  0.667  0.638  0.627  0.610  0.610
Typicality P(c|e)   0.462  0.577  0.603  0.577  0.569  0.564  0.556
Typicality P(e|c)   0.500  0.462  0.526  0.523  0.523  0.510  0.521
Rep(e)              0.846  0.865  0.872  0.862  0.758  0.731  0.719
Smoothing=0.001
MI(e)               0.577  0.615  0.628  0.600  0.612  0.605  0.592
PMI3(e)             0.731  0.673  0.692  0.654  0.669  0.644  0.623
NPMI(e)             0.923  0.827  0.769  0.746  0.731  0.695  0.671
Typicality P(c|e)   0.462  0.577  0.603  0.577  0.569  0.564  0.554
Typicality P(e|c)   0.885  0.865  0.872  0.831  0.785  0.741  0.704
Rep(e)              0.846  0.731  0.718  0.723  0.700  0.669  0.638
Smoothing=0.0001
MI(e)               0.615  0.615  0.654  0.608  0.635  0.628  0.612
PMI3(e)             0.846  0.731  0.731  0.715  0.723  0.685  0.677
NPMI(e)             0.885  0.904  0.885  0.869  0.823  0.777  0.752
Typicality P(c|e)   0.462  0.577  0.603  0.577  0.569  0.564  0.556
Typicality P(e|c)   0.885  0.904  0.910  0.877  0.831  0.813  0.777
Rep(e)              0.923  0.846  0.833  0.815  0.781  0.736  0.719
Smoothing=1e-5
MI(e)               0.615  0.635  0.667  0.662  0.677  0.656  0.646
PMI3(e)             0.885  0.769  0.744  0.777  0.758  0.731  0.710
NPMI(e)             0.885  0.846  0.872  0.869  0.831  0.810  0.787
Typicality P(c|e)   0.462  0.577  0.603  0.577  0.569  0.564  0.556
Typicality P(e|c)   0.769  0.808  0.846  0.823  0.808  0.782  0.765
Rep(e)              0.885  0.904  0.872  0.862  0.812  0.800  0.767
Smoothing=1e-6
MI(e)               0.769  0.673  0.705  0.677  0.700  0.692  0.679
PMI3(e)             0.885  0.769  0.756  0.785  0.773  0.726  0.723
NPMI(e)             0.885  0.846  0.821  0.815  0.750  0.726  0.719
Typicality P(c|e)   0.462  0.577  0.603  0.577  0.569  0.564  0.556
Typicality P(e|c)   0.538  0.615  0.615  0.615  0.608  0.613  0.615
Rep(e)              0.846  0.885  0.897  0.877  0.788  0.777  0.765
Smoothing=1e-7
MI(e)               0.769  0.692  0.705  0.685  0.719  0.703  0.688
PMI3(e)             0.885  0.769  0.756  0.792  0.758  0.736  0.725
NPMI(e)             0.769  0.750  0.718  0.700  0.650  0.641  0.633
Typicality P(c|e)   0.462  0.577  0.603  0.577  0.569  0.564  0.556
Typicality P(e|c)   0.500  0.481  0.526  0.523  0.531  0.523  0.523
Rep(e)              0.846  0.865  0.872  0.854  0.765  0.749  0.733

NDCG@k
Measure             k=1    k=2    k=3    k=5    k=10   k=15   k=20
No smoothing
MI(e)               0.516  0.531  0.519  0.531  0.562  0.574  0.594
PMI3(e)             0.725  0.664  0.652  0.660  0.628  0.631  0.646
NPMI(e)             0.599  0.597  0.579  0.554  0.540  0.539  0.549
Typicality P(c|e)   0.297  0.380  0.409  0.422  0.438  0.446  0.461
Typicality P(e|c)   0.401  0.386  0.396  0.398  0.401  0.410  0.428
Rep(e)              0.758  0.771  0.745  0.723  0.656  0.647  0.661
Smoothing=1e-3
MI(e)               0.374  0.414  0.441  0.448  0.473  0.481  0.495
PMI3(e)             0.484  0.511  0.509  0.502  0.519  0.525  0.533
NPMI(e)             0.692  0.652  0.607  0.603  0.585  0.585  0.592
Typicality P(c|e)   0.297  0.380  0.409  0.422  0.438  0.446  0.460
Typicality P(e|c)   0.703  0.697  0.704  0.681  0.637  0.628  0.626
Rep(e)              0.621  0.580  0.554  0.561  0.554  0.555  0.559
Smoothing=1e-4
MI(e)               0.407  0.430  0.458  0.462  0.492  0.503  0.512
PMI3(e)             0.648  0.604  0.579  0.575  0.578  0.576  0.590
NPMI(e)             0.747  0.777  0.761  0.737  0.700  0.685  0.688
Typicality P(c|e)   0.297  0.380  0.409  0.422  0.438  0.446  0.461
Typicality P(e|c)   0.791  0.795  0.802  0.767  0.738  0.729  0.724
Rep(e)              0.758  0.714  0.711  0.689  0.653  0.636  0.653
Smoothing=1e-5
MI(e)               0.429  0.465  0.478  0.501  0.517  0.528  0.545
PMI3(e)             0.725  0.647  0.642  0.642  0.627  0.624  0.638
NPMI(e)             0.813  0.779  0.778  0.765  0.730  0.723  0.729
Typicality P(c|e)   0.297  0.380  0.409  0.422  0.438  0.446  0.461
Typicality P(e|c)   0.709  0.728  0.735  0.722  0.702  0.696  0.703
Rep(e)              0.791  0.787  0.762  0.739  0.707  0.703  0.706
Smoothing=1e-6
MI(e)               0.516  0.510  0.515  0.526  0.546  0.563  0.579
PMI3(e)             0.725  0.655  0.651  0.654  0.641  0.631  0.649
NPMI(e)             0.791  0.766  0.732  0.728  0.673  0.659  0.668
Typicality P(c|e)   0.297  0.380  0.409  0.422  0.438  0.446  0.461
Typicality P(e|c)   0.495  0.516  0.520  0.508  0.512  0.521  0.540
Rep(e)              0.758  0.784  0.767  0.755  0.691  0.686  0.694
Smoothing=1e-7
MI(e)               0.516  0.531  0.519  0.530  0.562  0.571  0.592
PMI3(e)             0.725  0.664  0.652  0.658  0.630  0.631  0.647
NPMI(e)             0.670  0.655  0.633  0.604  0.575  0.570  0.581
Typicality P(c|e)   0.297  0.380  0.409  0.422  0.438  0.446  0.461
Typicality P(e|c)   0.423  0.421  0.415  0.407  0.414  0.424  0.438
Rep(e)              0.758  0.771  0.745  0.725  0.663  0.661  0.668

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

What is Semantic Similarity?

• Are the following instance pairs similar?
  • <apple, microsoft>
  • <apple, pear>
  • <apple, fruit>
  • <apple, food>
  • <apple, ipad>
  • <car, journey>

Approaches on Term Similarity

• Categories of approaches for semantic similarity
  • String based approach
  • Knowledge based approach
    • Use preexisting thesauri, taxonomies, or encyclopedias such as WordNet
  • Corpus based approach
    • Use contexts of terms extracted from web pages, web search snippets, or other text repositories
  • Embedding based approach
    • Will be introduced in detail in "Part 3: Implicit Understanding"

Approaches on Term Similarity (2)

• Categories: string based approaches; knowledge based approaches (WordNet), including path length / lexical chain-based and information content-based measures; corpus based approaches, including graph learning algorithm based and snippet search based measures. Representative state-of-the-art approaches cited: Rada 1989, Resnik 1995, Jcn 1997, Lin 1998, Sánchez 2011, Agirre 2010, Alvarez 2007, HunTray 2005, Hirst 1998, Do 2009, Bol 2011, Chen 2006, Ban 2002.

• Framework

Term Similarity Using Semantic Networks [Li et al 2013 Li et al 2015]

Framework, for a term pair <t1, t2>:

Step 1: Type Checking: decide whether <t1, t2> is a concept pair, an entity pair, or a concept-entity pair.
Step 2: Context Representation (vector): collect entity-distribution contexts for concept terms and concept-distribution contexts for entity terms, grouping similar concepts by concept clustering; this yields context vectors T(t1), T(t2) (or cluster context vectors Cx(t1), Cy(t2)). For a concept-entity pair, collect the concept set of the entity term t1 and select the top-k concepts cx of each cluster.
Step 3: Context Similarity: evaluate Cosine(T(t1), T(t2)), or max over (x, y) of Cosine(Cx(t1), Cy(t2)), or max over cx of sim(t2, cx) for concept-entity pairs.

An example [Li et al 2013 Li et al 2015]

For example: <banana, pear>

Step 1: Type Checking: <banana, pear> is an entity pair.
Step 2: Context Representation (vector): collect the concept contexts of banana and pear.
Step 3: Context Similarity: Cosine(T(t1), T(t2)) = 0.916.

Examples:

Term 1                 Term 2               Similarity
lunch                  dinner               0.9987
tiger                  jaguar               0.9792
car                    plane                0.9711
television             radio                0.9465
technology company     microsoft            0.8208
high impact sport      competitive sport    0.8155
employer               large corporation    0.5353
fruit                  green pepper         0.2949
travel                 meal                 0.0426
music                  lunch                0.0116
alcoholic beverage     sports equipment     0.0314
company                table tennis         0.0003

Complete results: http://adapt.seiee.sjtu.edu.cn/similarity/SimCompleteResults.pdf
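A minimal sketch of the cosine comparison behind such scores, with made-up concept-context vectors for <banana, pear>:

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse (concept -> weight) vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

banana = {"fruit": 0.5, "tropical fruit": 0.3, "food": 0.2}   # illustrative P(c|e) weights
pear   = {"fruit": 0.6, "seasonal fruit": 0.2, "food": 0.2}
print(round(cosine(banana, pear), 3))   # high, as in the 0.916 example with the real vectors
```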

Statistics of Search Queries

(a) By traffic: 1 term 44%, 2 terms 29%, 3 terms 17%, 4 terms 7%, 5 terms 2%, more than 5 terms 1%.
(b) By # of distinct queries: 1 term 10%, 2 terms 26%, 3 terms 34%, 4 terms 19%, 5 terms 7%, more than 5 terms 4%.

Example instances: Pokémon Go, Microsoft HoloLens.
(Corresponding charts break queries down by number of instances: 1 instance, 2 instances, ..., more than 5 instances.)

If the short text has context for the instance…

• python tutorial
• dangerous python
• moon earth distance
• …

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Supervised Segmentation [Bergsma et al 2007]

• Problem: divide a query into semantic units
• Approach: turn segmentation into position-based binary classification

Example query: "two man power saw"
Candidate segmentations: [two man] [power saw]; [two] [man] [power saw]; [two] [man power] [saw]

Input: a query and its positions
Output: the decision for making a segmentation break at each position

Supervised Segmentation

• Features
  • Decision boundary features: e.g., indicators, the POS tags in the query, position features (forward/backward)
  • Statistical features: e.g., mutual information between the left and right parts ("bank loan amortization schedule")
  • Context features: context information
  • Dependency features: e.g., "female" and "bus driver" in "female bus driver" (dependency)

Supervised Segmentation

• Segmentation overview

Input query: "two man power saw" -> learning features -> SVM classifier -> output: segmentation decision (yes/no) for each position.
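A hedged sketch of the position-based classifier; the feature vectors, labels, and feature set below are hypothetical stand-ins for the richer features described above:

```python
from sklearn.svm import LinearSVC

# One feature vector per word gap, e.g. [PMI(left, right), indicator, fwd_pos, bwd_pos]
X_train = [[5.1, 0, 1, 3], [0.2, 0, 2, 2], [6.3, 0, 3, 1]]
y_train = [0, 1, 0]   # 1 = insert a segment break at this gap ("two man | power saw")
clf = LinearSVC().fit(X_train, y_train)

def segment(words, gap_features):
    """Cut the query at every gap the classifier labels 1."""
    breaks = clf.predict(gap_features)
    segments, current = [], [words[0]]
    for word, b in zip(words[1:], breaks):
        if b:
            segments.append(current)
            current = []
        current.append(word)
    segments.append(current)
    return segments

# Expected: [['two', 'man'], ['power', 'saw']] for these toy features
print(segment(["two", "man", "power", "saw"], X_train))
```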

Unsupervised Segmentation [Tan et al 2008]

• Unsupervised learning for query segmentation

Probability of a generated segmentation S (a sequence of segments $s_i$) for query Q:

$$P(S|Q) = P(s_1)\, P(s_2|s_1) \cdots P(s_m|s_1 s_2 \cdots s_{m-1}) \approx \prod_{s_i \in S} P(s_i) \quad \text{(unigram model)}$$

A segment boundary is valid if and only if the pointwise mutual information between the two segments resulting from the split is negative:

$$MI(s_k, s_{k+1}) = \log\frac{P_c([s_k, s_{k+1}])}{P_c(s_k) \cdot P_c(s_{k+1})} < 0$$

Example ("new york times subscription"): $\log\frac{P_c([new\ york])}{P_c(new) \cdot P_c(york)} > 0$, so there is no segment boundary between "new" and "york".

Unsupervised Segmentation

• Find the top-k segmentations: dynamic programming
• Using EM optimization on the fly

Input: query $w_1 w_2 \cdots w_n$ (the words in a query), concept probability distribution
Output: top-k segmentations with the highest likelihood
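A small dynamic-programming sketch of segmenting under the unigram model; the segment probabilities P_c(s) below are invented for illustration:

```python
import math

P = {"new york": 1e-4, "new york times": 5e-5, "times": 1e-3, "new": 2e-3,
     "york": 1e-4, "york times": 1e-6, "subscription": 5e-4}

def best_segmentation(words, max_len=3):
    n = len(words)
    # best[i] = (log-probability, segment list) of the best segmentation of words[:i]
    best = [(0.0, [])] + [(-math.inf, None)] * n
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            seg = " ".join(words[j:i])
            if seg in P and best[j][0] + math.log(P[seg]) > best[i][0]:
                best[i] = (best[j][0] + math.log(P[seg]), best[j][1] + [seg])
    return best[n]

# Keeps "new york times" together: log P_c is higher than splitting it.
print(best_segmentation("new york times subscription".split()))
```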

Exploit Click-through [Li et al 2011]

• Motivation
  • Probabilistic query segmentation
  • Use click-through data (query -> URL -> document)

Input query: "bank of america online banking"
Output: top-3 segmentations
  [bank of america] [online banking]        0.502
  [bank of america online banking]          0.428
  [bank of] [america] [online banking]      0.001

Exploit Click-through

• Segmentation Model: an interpolated model combining global information with click-through information.

Example query: [credit card] [bank of america]
Clicked HTML documents:
  1. bank of america credit cards contact us overview
  2. secured visa credit card from bank of america
  3. credit cards overview: find the right bank of america credit card for you

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Sense Changes with Different Context

watch harry potter read harry potter age harry potter

Movie Book Character

harry potter walkthrough

Game

Entity Recognition in Query [Guo et al 2009]

bull Motivation

Detect named entity in a short text and categorize it

harry potter walkthrough

Single-named-entity query

Example: triple <e, t, c> = ("harry potter", "walkthrough", "game"), where e is the (possibly ambiguous) named entity, t the context term(s), and c the class of the entity.

Entity Recognition in Query

• Probabilistic Generative Model

Goal: given a query q, find the triple <e, t, c> that maximizes the probability of generating the triple, assuming the context depends only on the class (e.g., "walkthrough" depends only on "game", not on "harry potter").

Objective: given query q, find the most likely <e, t, c>. The problem then becomes how to estimate Pr(e), Pr(c|e), and Pr(t|c).

Entity Recognition in Query

• Probability Estimation by Learning

Learning objective: $\max \prod_{i=1}^{N} P(e_i, t_i, c_i)$

Challenge: it is difficult as well as time consuming to manually assign class labels to named entities in queries.

Build a training set $T = \{(e_i, t_i)\}$ and view $c_i$ as a hidden variable. The new learning problem becomes

$$\max \prod_{i=1}^{N} P(e_i, t_i) = \max \prod_{i=1}^{N} \sum_{c} P(e_i)\, P(c|e_i)\, P(t_i|c)$$

solved with topic model WS-LDA
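A toy sketch of picking the best triple <e, t, c> with the estimated probabilities; the tables below are made up, and WS-LDA training itself is not shown:

```python
P_e = {"harry potter": 0.6, "harry": 0.4}
P_c_given_e = {"harry potter": {"book": 0.5, "movie": 0.3, "game": 0.2}}
P_t_given_c = {"game": {"walkthrough": 0.4},
               "book": {"walkthrough": 0.01},
               "movie": {"walkthrough": 0.02}}

def best_triple(query):
    best = (None, 0.0)
    for e in P_e:                                  # candidate entity spans
        if e in query:
            t = query.replace(e, "").strip()       # remaining context terms
            for c, p_ce in P_c_given_e.get(e, {}).items():
                p = P_e[e] * p_ce * P_t_given_c.get(c, {}).get(t, 1e-9)
                if p > best[1]:
                    best = ((e, t, c), p)
    return best

print(best_triple("harry potter walkthrough"))  # -> (('harry potter', 'walkthrough', 'game'), ...)
```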

Signal from Click [Pantel et al 2012]

• Motivation: predict entity types in Web search from the entity, the user intent, the context, and clicks.
• Query type distribution (73 types); a generative model over entity types.

Signal from Click

• Joint Model for Prediction

Generative story (plate notation over queries Q, with latent type t, intent i, entity e, context words, and click c): pick a type from the distribution over types, pick an entity (entity distribution), pick an intent (intent distribution), pick context words (word distribution), and pick a click (host distribution).

Telegraphic Query interpretation [Sawant et al 2013 Joshi et al 2014]

bull Entity-seeking Telegraphic Queries

bull Interpretation = Segmentation + Annotation

Knowledge base Large corpus

accuracy recall

Germany capital

Berlin

Query

Result Entity

bull Overview

Joint Interpretation and Ranking [Sawant et al 2013 Joshi et al 2014]

Annotated Corpus

Telegraphic Query

e1e2e3

Two Models for Interpretation and Ranking

Generative Model

Discriminative Model

Output

bull Generative Model

Joint Interpretation and Ranking [Sawant et al 2013]

Example query q = "losing team baseball world series 1998": the answer entity E is San Diego Padres, a "major league baseball team". A switch variable Z routes query words either to type hints ("team", matching type "baseball team") or to context matchers ("losing", "1998", "world series") that match corpus snippets such as "Padres have been to two World Series, losing in 1984 and 1998". Based on probabilistic language models. (Figure adapted from U. Sawant, 2013.)

• Discriminative Model

Joint Interpretation and Ranking [Sawant et al 2013]: for the same query "losing team baseball world series 1998", candidate (entity, type) pairs such as (San_Diego_Padres, t = baseball team) for the correct entity and (1998_World_Series, t = series) for an incorrect entity are scored jointly; the model is based on max-margin discriminative learning.

• Queries seek answer entities (e2)
• They contain (query) entities (e1), target types (t2), relations (r), and selectors (s)

Telegraphic Query Interpretation [Joshi et al 2014]: example interpretations (query -> e1, r, t2, s), from M. Joshi (2014):
  "dave navarro first band": e1 = dave navarro, r = band, t2 = band, s = first; or e1 = dave navarro, r = -, t2 = band, s = first
  "spider automobile company": e1 = spider, r = automobile company, t2 = automobile company, s = -; or r = automobile company, t2 = company, s = spider

Improved Generative Model

• Generative Model [Sawant et al 2013]; the improved model [Joshi et al 2014] also considers e1 (in q) and the relation r.

Improved Discriminative Model

• Discriminative Model [Sawant et al 2013]; the improved model [Joshi et al 2014] also considers e1 (in q) and the relation r.

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

bull Input a short text

bull Output semantic interpretation

bull Three steps in understanding a short text

Input: "wanna watch eagles band"  ->  Output: watch[verb] eagles[entity](band) band[concept]

Step 1: Text Segmentation: divide the text into a sequence of terms in the vocabulary ("watch eagles band" -> watch | eagles | band)
Step 2: Type Detection: determine the best type of each term (watch[verb] eagles[entity] band[concept])
Step 3: Concept Labeling: infer the best concept of each entity within context (eagles[entity] -> band)

Text segmentation

• Observations
  • Mutual Exclusion: terms containing the same word mutually exclude each other
  • Mutual Reinforcement: related terms mutually reinforce each other
• Build a Candidate Term Graph (CTG)

Example CTGs for "vacation april in paris" (candidate terms: vacation, april in paris, april, paris) and "watch harry potter" (candidate terms: watch, harry potter): nodes are candidate terms, mutually exclusive candidates are marked, and edges between compatible terms carry affinity weights.

Find the best segmentation

• Best segmentation = sub-graph of the CTG which
  • is a complete graph (clique),
  • contains no mutual exclusion,
  • has 100% word coverage (except for stopwords), and
  • has the largest average edge weight.

• The best segmentation therefore corresponds to a maximal clique of the CTG satisfying these constraints, e.g. {vacation, april in paris} for "vacation april in paris" and {watch, harry potter} for "watch harry potter".
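A small sketch of this clique search, using networkx and toy edge weights (the numbers are illustrative):

```python
import itertools
import networkx as nx

G = nx.Graph()
G.add_edge("vacation", "april in paris", weight=0.041)   # related terms reinforce each other
G.add_edge("vacation", "april", weight=0.029)
G.add_edge("vacation", "paris", weight=0.047)
G.add_edge("april", "paris", weight=0.005)
# mutually exclusive candidates ("april in paris" vs "april"/"paris") share no edge

words, stopwords = {"vacation", "april", "in", "paris"}, {"in"}

def covered(terms):
    return set(w for t in terms for w in t.split()) >= words - stopwords

def avg_weight(terms):
    pairs = list(itertools.combinations(terms, 2))
    return sum(G[u][v]["weight"] for u, v in pairs) / len(pairs) if pairs else 0.0

cliques = [c for c in nx.find_cliques(G) if covered(c)]
print(max(cliques, key=avg_weight))   # e.g. ['vacation', 'april in paris']
```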

Type Detection

• Pairwise Model
  • Find the best typed-term for each term so that the Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight.
  • E.g., for "watch free movie": candidate typed-terms are watch[v] / watch[e] / watch[c], free[adj] / free[v], and movie[c] / movie[e].

Concept Labeling

• Entity disambiguation is the most important task of concept labeling
  • Filter / re-rank the original concept cluster vector
• Weighted-Vote
  • The final score of each concept cluster is a combination of its original score and the support from context, using concept co-occurrence
  • E.g., "watch harry potter" -> movie; "read harry potter" -> book
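A minimal sketch of the weighted vote; the candidate scores, context concepts, and co-occurrence strengths below are illustrative:

```python
def weighted_vote(candidates, context_concepts, cooccur, alpha=0.5):
    """candidates: {concept cluster: original score};
    cooccur[(c1, c2)]: co-occurrence strength between concept clusters."""
    reranked = {}
    for c, score in candidates.items():
        support = sum(cooccur.get((c, ctx), 0.0) for ctx in context_concepts)
        reranked[c] = alpha * score + (1 - alpha) * support
    return sorted(reranked.items(), key=lambda kv: kv[1], reverse=True)

harry_potter = {"movie": 0.4, "book": 0.45, "character": 0.15}
cooccur = {("movie", "activity"): 0.8, ("book", "activity"): 0.3}
# context from "watch ..." conceptualized (here loosely) as an activity-like concept
print(weighted_vote(harry_potter, ["activity"], cooccur))   # "movie" is promoted
```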

Example of Entity Disambiguation[Hua et al 2015 (ICDE Best Paper) Hua et al 2016]

Pipeline: short text -> parsing -> conceptualization against the semantic (isA) network -> term clustering by isA -> concept filtering by co-occurrence (using the co-occurrence network) -> head/modifier analysis -> concept orthogonalization -> concept vector [(c1, p1), (c2, p2), (c3, p3), …].

Example "ipad apple": "apple" alone conceptualizes to fruit, company, food, product, …; isA evidence from "ipad" (product, device, brand, company, …) plus co-occurrence filtering keeps the company/product/device senses.

Mining Lexical Relationships[Wang et al 2015b]

• Lexical knowledge is represented by probabilities, e.g. for "watch harry potter" (with candidate concepts verb, product, book, movie): p(verb | watch), p(instance | watch), p(movie | harry potter), p(book | harry potter), p(movie | watch, verb).
• In general the model uses p(z | t) (the role of a term), p(c | t, z) (the concept of a term in a given role), and p(c | e) = p(c | t, z = instance), where e is an instance, t a term, c a concept, and z a role.

Understanding Queries [Wang et al 2015b]

• Goal: rank the concepts and find $\arg\max_c p(c \mid t, q)$.

Offline: the semantic network. Online: the query and all of its possible segmentations form an online subgraph, which is ranked by random walk with restart [Sun et al 2005] on that subgraph.
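A compact sketch of random walk with restart on a toy subgraph; the nodes, edges, and restart vector are invented for illustration:

```python
import numpy as np

def rwr(P, restart, c=0.15, iters=100):
    """P: row-stochastic transition matrix; restart: restart distribution over nodes."""
    r = restart.copy()
    for _ in range(iters):
        r = (1 - c) * P.T @ r + c * restart
    return r

nodes = ["watch", "harry potter", "movie", "book"]
P = np.array([[0.0, 0.6, 0.4, 0.0],      # toy edges: watch -> harry potter / movie
              [0.3, 0.0, 0.4, 0.3],
              [0.5, 0.5, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])
restart = np.array([0.5, 0.5, 0.0, 0.0])  # restart mass on the query terms
scores = rwr(P, restart)
print(dict(zip(nodes, np.round(scores, 3))))   # concepts reachable from both terms rank high
```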

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

• Example: "popular smart cover iphone 5s"
• Definitions:
  • Head: names the general (semantic) category to which the whole short text belongs; usually the head is the intent of the short text.
    • "smart cover": the intent of the query
  • Constraints: distinguish this member from other members of the same category.
    • "iphone 5s": limits the type of the head
  • Non-Constraint Modifiers (aka Pure Modifiers): subjective modifiers which can be dropped without changing the intent.
    • "popular": subjective, can be neglected

Non-Constraint Modifiers Mining Construct Modifier Networks

Edges form a Modifier Network

The concept hierarchy tree in the "country" domain (country; Asian country, developed country, Western country; Western developed country, large Asian country, large developed country, top Western country, top developed country, …) induces a modifier network over {Asian, Western, Developed, Large, Top} around the head "country". In this case, "Large" and "Top" are pure modifiers.

Non-Constraint Modifiers Mining: Betweenness Centrality

• Betweenness centrality is a measure of a node's centrality in a network.
• The betweenness of node v is defined as

$$g(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$$

where $\sigma_{st}$ is the total number of shortest paths from node s to node t, and $\sigma_{st}(v)$ is the number of those paths that pass through v.

• Normalization & aggregation: a pure modifier should have a low betweenness-centrality aggregation score PMS(t).
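A short sketch of the betweenness computation on a toy modifier network for the "country" domain; the edges are illustrative:

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("country", "asian"), ("country", "western"), ("country", "developed"),
    ("asian", "developed"), ("western", "developed"),   # "asian developed country", ...
    ("country", "large"), ("country", "top"),           # "large country", "top country"
])

centrality = nx.betweenness_centrality(G, normalized=True)
for node, score in sorted(centrality.items(), key=lambda kv: kv[1]):
    print(node, round(score, 3))
# "large" and "top" score 0.0 here; aggregating such scores over many head
# concepts gives the PMS(t) used to flag non-constraint (pure) modifiers.
```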

Head-Constraints Mining [Wang et al 2014b]

• A term can be a head in some queries and a constraint in others.
  • E.g., "seattle hotel" (hotel = head, seattle = constraint) vs. "seattle hotel job" (job = head; seattle and hotel = constraints)

Head-Constraints Mining Acquiring Concept Patterns

Building the concept pattern dictionary from query logs: get entity pairs from the query log using preposition patterns ("A for B", "A of B", "A with B", "A in B", "A on B", "A at B", …), e.g. "cover for iphone 6s", "battery for sony a7r", "wicked on broadway"; treat entity 1 as the head and entity 2 as the constraint; conceptualize both entities (entity 1 -> concept11, concept12, …; entity 2 -> concept21, concept22, …); extract concept patterns (concept11, concept21), (concept11, concept22), (concept11, concept23), … into the Concept Pattern Dictionary.

Why Concepts Can't Be Too General

• It may cause too many concept pattern conflicts: we can't distinguish head and modifier for overly general concept pairs.

  Derived concept pattern: device (head) / company (modifier)
  Supporting entity pairs: (iphone 4, verizon), (modem, comcast), (wireless router, comcast), (iphone 4, tmobile)

  Derived concept pattern: company (head) / device (modifier)
  Supporting entity pairs: (amazon books, kindle), (netflix, touchpad), (skype, windows phone), (netflix, ps3)

  The two patterns conflict.

Why Concepts Can't Be Too Specific

• It may generate concepts with little coverage
  • The concept regresses to the entity
  • Large storage space: up to (million x million) patterns
  • E.g., device / largest desktop OS vendor, device / largest software development company, device / largest global corporation, device / latest windows and office provider, …

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b].

Top Concept Patterns (head / constraint), by supporting cluster size: breed / state, game / platform, accessory / vehicle, browser / platform, requirement / school, drug / disease, cosmetic / skin condition, job / city, accessory / phone, software / platform, test / disease, clothes / breed, penalty / crime, tax / state, sauce / meat, credit card / country, food / holiday, mod / game, garment / sport, career information / professional, song / instrument, bait / fish, study guide / book, plugins / browser, recipe / meat, currency / country, lens / camera, decoration / holiday, food / animal, …

Example: Game (head) / Platform (modifier) covers patterns such as game platform, game device, video game platform, game console, game pad, game gaming platform, with supporting entity pairs like (angry birds, android), (angry birds, ios), (angry birds, windows 10), …

Head-Modifier Relationship Detection

• Train a classifier on (head-embedding, modifier-embedding)
• Training data:
  • Positive: (head, modifier)
  • Negative: (modifier, head)
• Precision >= 0.9, Recall >= 0.9
• Disadvantage: not interpretable
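A hedged sketch of such a classifier, with random vectors standing in for real term embeddings:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
emb = {t: rng.normal(size=50) for t in
       ["game", "platform", "accessory", "vehicle", "cover", "iphone"]}

pairs = [("game", "platform"), ("accessory", "vehicle"), ("cover", "iphone")]  # (head, modifier)
X = [np.concatenate([emb[h], emb[m]]) for h, m in pairs] + \
    [np.concatenate([emb[m], emb[h]]) for h, m in pairs]      # reversed pairs as negatives
y = [1] * len(pairs) + [0] * len(pairs)

clf = LogisticRegression(max_iter=1000).fit(X, y)
# Expected 1: the first term ("cover") is predicted to be the head of the pair.
print(clf.predict([np.concatenate([emb["cover"], emb["iphone"]])]))
```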

Syntactic Parsing based on HM

• Information is incomplete
  • Prepositions and other function words are missing
  • Within a noun compound: "el capitan macbook pro"
• Why not train a parser for web queries?

Syntactic Parsing of Short Texts [Sun et al EMNLP 2016]

• Syntactic structures are valuable for short text understanding
• Examples

Challenges: Short Texts Lack Grammatical Signals
• Lack of function words and word order
  • "toys queries" has ambiguous intent
  • "distance earth moon" has clear intent, but many equivalent forms: "earth moon distance", "earth distance moon", …

Challenges: Syntactic Parsing of Queries

• No standard
• No ground truth

Why is syntactic parsing of queries even a legitimate problem?

Derive Syntax from Semantics [Sun et al 2016]

• Query: "thai food houston"
• Clicked sentence
• Project the dependencies onto the query

A Treebank for Short Texts

• Given query q
• Given q's clicked sentence s
• Parse each s
• Project dependencies from s to q
• Aggregate dependencies

Algorithm of Projection

Result Examples

Results

• Random queries: QueryParser UAS 0.83, LAS 0.75; Stanford UAS 0.72, LAS 0.64
• Queries with no function words: QueryParser UAS 0.82, LAS 0.73; Stanford UAS 0.70, LAS 0.61
• Queries with function words: QueryParser UAS 0.90, LAS 0.85; Stanford UAS 0.86, LAS 0.80

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

• Measuring similarity between two short texts or sentences
• Basic idea: word-by-word comparison using embedding vectors
• Use a saliency-weighted semantic graph to compute similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired: bins of all edges and bins of max edges from the saliency-weighted semantic graph.

Similarity measurement (inspired by BM25) between short texts $s_l$ and $s_s$, where $sem(w, s_s)$ is the semantic similarity of term w to the short text $s_s$:

$$f_{sts}(s_l, s_s) = \sum_{w \in s_l} IDF(w) \cdot \frac{sem(w, s_s) \cdot (k_1 + 1)}{sem(w, s_s) + k_1 \cdot \left(1 - b + b \cdot \frac{|s_s|}{avgsl}\right)}$$
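A minimal sketch of this similarity, assuming word vectors and IDF weights are available (both are random or made up here):

```python
import numpy as np

def sem(w, other, vec):
    """Best cosine match of word w against the words of the other short text."""
    v = vec[w]
    return max(float(v @ vec[u] / (np.linalg.norm(v) * np.linalg.norm(vec[u]))) for u in other)

def f_sts(s_l, s_s, vec, idf, k1=1.2, b=0.75, avgsl=4.0):
    score = 0.0
    for w in s_l:
        s = sem(w, s_s, vec)
        score += idf.get(w, 1.0) * s * (k1 + 1) / (s + k1 * (1 - b + b * len(s_s) / avgsl))
    return score

rng = np.random.default_rng(1)
vec = {w: rng.normal(size=20) for w in ["cheap", "flights", "low", "cost", "airline"]}
print(f_sts(["cheap", "flights"], ["low", "cost", "airline"], vec,
            {"cheap": 1.5, "flights": 2.0}))
```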

From the Concept View [Wang et al 2015a]

Bags of concepts: each short text is parsed and conceptualized (term clustering by isA, concept filtering by co-occurrence using the co-occurrence network and the semantic network, head/modifier analysis, concept orthogonalization) into a concept vector [(c1, score1), (c2, score2), …]; the similarity of two short texts is then computed between their concept vectors.

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

• Explicit short text understanding benefits a lot of application scenarios:
  • Ads / search semantic match
  • Definition mining
  • Query recommendation
  • Web table understanding
  • Semantic search
  • …

Ads Keyword Selection [Wang et al 2015a]

[Bar charts: keyword-selection improvement by query decile (Decile 4 to Decile 10) for Mainline Ads (y-axis 0.00 to 6.00) and Sidebar Ads (y-axis 0.00 to 0.60).]

Definition Mining [Hao et al 2016]

• Definition scenarios: search engines, QnA, etc.
• Why conceptualization is useful for definition mining; example: "What is Emphysema?"

  Answer 1: "Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year."
  Answer 2: "Emphysema is an incurable, progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing."

• This sentence has the form of a definition.
• Embedding is helpful to some extent, but it also returns high similarity scores for (emphysema, disease) and (emphysema, smoking).
• Conceptualization can provide strong semantics; contextual embedding can also provide semantic similarity beyond is-A.

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

Offline: model learning from training data; concept weighting produces a concept model per class (Model 1 … Model i … Model N for Class 1 … Class i … Class N).
Online: an original short text (e.g., "justin bieber graduates") goes through entity extraction, conceptualization against the knowledge base into a concept vector, candidate generation, and classification & ranking, producing outputs such as <Music, Score>.

Concept based Short Text Classification and Ranking [Wang et al 2014a]

Each category (e.g., TV, Music, Movie) is mapped into the concept space using the article titles/tags in that category, giving a weighted concept representation (ω_i, ω_j, …); a query is conceptualized into the same space (p_i, p_j, …) and matched against the categories.
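A tiny bag-of-concepts matching sketch; the category and query concept weights are illustrative:

```python
import math

def cos(u, v):
    dot = sum(u[c] * v[c] for c in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

categories = {"Music": {"singer": 0.6, "album": 0.3, "celebrity": 0.1},
              "TV":    {"show": 0.5, "channel": 0.3, "celebrity": 0.2}}
query_concepts = {"singer": 0.7, "celebrity": 0.3}        # e.g. "justin bieber graduates"
print(max(categories, key=lambda cat: cos(categories[cat], query_concepts)))  # Music
```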

Precision performance on each category [Wang et al 2014a]

Precision by category (methods: BocSTC, LM_ch, SVM, VSM_cosine, LM_d, Entity_ESA):

Category   BocSTC  LM_ch  SVM   VSM_cosine  LM_d  Entity_ESA
Movie      0.71    0.91   0.84  0.81        0.72  0.56
Money      0.97    0.95   0.54  0.57        0.52  0.74
Music      0.97    0.90   0.88  0.73        0.68  0.58
TV         0.96    0.46   0.92  0.56        0.51  0.55

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

• [ Bollacker et al 2008 ] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, Jamie Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In SIGMOD 2008.

• [ Auer et al 2007 ] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, Zachary G. Ives. DBpedia: A Nucleus for a Web of Open Data. In ISWC/ASWC 2007.

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

• [ Wu et al 2015 ] Sen Wu, Ce Zhang, Christopher De Sa, Jaeho Shin, Feiran Wang, and C. Ré. Incremental Knowledge Base Construction Using DeepDive. In VLDB 2015.

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

• [ Nastase et al 2010 ] Vivi Nastase, Michael Strube, Benjamin Börschinger, Cäcilia Zirn, and Anas Elghafari. WikiNet: A very large scale multi-lingual concept network. In LREC 2010.

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

• [ Hua et al 2016 ] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng, and Xiaofang Zhou. "Understand Short Texts by Harvesting and Analyzing Semantic Knowledge." IEEE Transactions on Knowledge and Data Engineering (TKDE), 2016.

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

• [ Li et al 2015 ] Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang, Xue-Gang Hu, and Xindong Wu. A Large Probabilistic Semantic Network based Approach to Compute Term Similarity. IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10): 2604-2617, 2015.

• [ Rosch et al 1976 ] Eleanor Rosch, Carolyn B. Mervis, Wayne D. Gray, David M. Johnson, and Penny Boyes-Braem. Basic objects in natural categories. Cognitive Psychology 8(3): 382-439, 1976.

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

• [ Wang et al 2012b ] Yue Wang, Hongsong Li, Haixun Wang, and Kenny Zhu. Toward Topic Search on the Web. In International Conference on Conceptual Modeling, October 2012.

Page 4: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Which is ldquokikirdquo and which is ldquoboubardquo

sound shape

prime119948ത119942119948ത119942

zigzaggedness

China India

country

Brazil

emerging market

engineer apple

IT company

The is eating an

fruit

body taste

wine

smell

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

httplinghublider-projecteullod-cloud

Linguistic Linked Open Data Cloud

Cyc

short text understanding

(internal representation)

answer

knowledge knowledge

1 ldquoPython Tutorialrdquo2 ldquoWho was the US President when the Angels won the

World Seriesrdquo

linguistic common sense knowledge

Encyclopedia knowledge

Common Sense Knowledge vs Encyclopedia Knowledge

Common Sense Knowledge Base

Encyclopedia Knowledge Base

Common senselinguistic knowledge among terms

EntitiesFacts

isAisPropertyOf

co-occurrencehellip

DayOfBirthLocatedInSpouseOf

hellip

Typicality basic level of categorization

Black or WhitePrecision

WordNet KnowItAll NELLProbase hellip

Freebase Yago DBPedia Google knowledge graph hellip

Special cases

WordNet [Stark et al 1998]

bull WordNetreg is a large lexical database of English Nouns verbs adjectives and adverbs are grouped into sets of cognitive synonyms (synsets) each expressing a distinct concept

bull S (n) China Peoples Republic of China mainland China Communist China Red China PRC Cathay (a communist nation that covers a vast territory in eastern Asia the most populous country in the world)

bull The project began in the Princeton University Department of Psychology and is currently housed in the Department of Computer Science

bull Homepage httpwordnetprincetoneduwordnetabout-wordnetbull Download httpwordnetprincetoneduwordnetdownload

Brief Introduction

Statistics

Sample

Authors

URLs

POS Unique Synsets TotalStrings Word-Sense Pairs

Noun 117798 82115 146312Verb 11529 13767 25047Adjective 21479 18156 30002Adverb 4481 3621 5580Totals 155287 117659 206941

KnowItAll Extract high-quality knowledgefrom the Web [Banko et al 2007 Etzioni et al 2011]

bull OpenIE distills semantic relations fromWeb-scale natural language texts

bull TextRunner -gt ReVerb -gt Open IE part of KnowItAll

bull Yielding over 5 billion extraction from over a billion web pages

bull From ldquoUS president Barack Obama gave his inaugural address on January 20 2013rdquoTo (Barack Obama is president of US)

(Barack Obama gave [his inaugural address on January 20 2013])

bull OpenIE v413 has been released

bull Turing Center at the University of Washington

bull httpopenieallenaiorgbull httpreverbcswashingtonedu

Brief Introduction

Statistics

Sample

News

Authors

URLs

NELL Never-Ending Language Learning [Carlson et al 2010]

bull NELL is a research project that attempts to create a computer system that learns over time to read the web Since January 2010

bull Over 50 million candidate beliefs by reading the web They are considered at different levels of confidence

bull Out of 50 million high confidence in 2817156 beliefs

Brief Introduction

Statistics

Sample

NELL Never-Ending Language Learning

bull It is continually learning facts on web Resources is publicly available

bull NELL research team at CMU

bull Homepage httprtwmlcmuedurtwbull Download httprtwmlcmuedurtwresources

News

Authors

URLs

Probase [Wu et al 2012]

bull Probase is a semantic network to make machines ldquoawarerdquo of the mental world of human beings so that machines can better understand human communication

Brief Introduction

Probase network

isA isPropertyOf Co-occurrence(concept entities) (attributes) (isCEOof LocatedIn etc)

Concepts Entities(ldquoSpanish Artistsrdquo) (ldquoPablo Picasordquo)

Nodes

Edges

Attributes(ldquoBirthdayrdquo)

VerbsAdjectives(ldquoEatrdquo ldquoSweetrdquo)

bull 5401933 unique concepts bull 12551613 unique instancesbull 87603947 IsA relations

countries Basic watercolor techniques

Celebrity wedding dress designers

Probase

bull Microsoft Research

bull Public release coming soon in AugSept 2016 bull Project homepage httpresearchmicrosoftcomprobase

Concepts

Authors

URLs

Probase isA error rate lt1 1 and lt10 for random pair

Freebase [Bollacker et al 2008]

bull Freebase is a well-known collaborative knowledge base consisting of data composed mainly by its community

bull Freebase contains more than 23 million entitiesbull Freebase contains 19 billion triplesbull Each triple is organized as form of

ltsubjectgt ltpredicategt ltobjectgt

Brief Introduction

Statistics

bull Freebase is a collection of factsbull Freebase only contains nodes

and linksbull Freebase is a labeled graph

Freebase -gt Wiki Data

bull Freebase data was integrated into Wikidatabull The Freebase API will be completely shut-down on Aug 31 2016

replaced by Google Knowledge Graph API

bull Freebase Community

bull Homepage httpwikifreebasecomwikiMain_Pagebull Download httpsdevelopersgooglecomfreebasebull Wikidata httpswwwwikidataorg

News

Authors

URLs

Google Knowledge Graph

bull Knowledge Graph is a knowledge base used by Google to enhance its search engines search results with semantic-search information gathered from a wide variety of sources

bull 570 million objects and more than 18 billion facts about relationshipsbetween different objects

bull Google Inc

bull Homepage httpswwwgooglecomintles419insidesearchfeaturessearchknowledgehtml

Brief Introduction

Statistics

Sample

Authors

URLs

YAGO [Suchanek et al 2007]

bull YAGO is a huge semantic knowledge base derived from GeoNames WordNet and Wikipedia (10 Wikipedias in different languages)

bull More than 10 million entities(persons organizations cities etc)bull More than 120 million facts about entitiesbull More than 35000 classes assigned to entitiesbull Many of its facts and entities are attached a temporal dimension and a spatial dimension

Brief Introduction

SampleltAlbert_Einsteingt ltisMarriedTogt ltElsa_Einsteingt

Statistics

YAGO

Newsbull An evaluated version of YAGO3 (Combining information from Wikipedia from different

languages) is released [15 Sep 2015]

Authorsbull Max Planck Institute for Informatics in SaarbruumlckenGermany and DBWeb group at Teacuteleacutecom ParisTech University

URLsbull Homepage httpwwwmpi-infmpgdedepartmentsdatabases-and-

information-systemsresearchyago-nagayagobull Download httpwwwmpi-infmpgdedepartmentsdatabases-and-

information-systemsresearchyago-nagayagodownloads

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Statistics of Search Queries

44

29

17

7

2 1

1 Term2 Terms3 Terms4 Terms5 Termsmore than 5 Terms

10

26

34

19

74

1 Term2 Terms3 Terms4 Terms5 Termsmore than 5 Terms

(a) By traffic

(b) By of distinct queries

Pokeacutemon Go Microsoft HoloLens

Instance 1 Instance 21 Instance2 Instances3 Instances4 Instances5 InstancesMore than 5 Instances

1 Instance2 Instances3 Instances4 Instances5 InstancesMore than 5 Instances

If the short text is a single instancehellip

bull Pythonbull Microsoftbull Applebull hellip

Single Instance Understanding

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

Word Ambiguity bull Word sense disambiguation rely on dictionaries

(WordNet)

Take a seat on this chair

The chair of the Math Department

Instance Ambiguity

bull Instance sense disambiguation extra knowledge needed

I have an apple pie for lunch

He bought an apple ipad

Here ldquoapplerdquo is a proper noun

Ambiguity [Hua et al 2016]

bull Many instances are ambiguous

bull Intuition ambiguous instances have multiple senses

short text instance sense

population china china country

glass vs china china fragile item

pear apple apple fruit

microsoft apple apple company

read harry potter harry potter book

watch harry potter harry potter movie

age of harry potter harry potter character

Pre-definition for Ambiguity (1) Sense [Hua et al 2016]

bull What is a Sense in semantic networksbull A sense as a hierarchy of concept clusters

region

country state city

creature

animal

predator

crop food

fruit vegetable meat

Germany

Pre-definition for Ambiguity (2) Concept Cluster [Li et al 2013 Li et al 2015]

bull What is a Concept Cluster (CL)bull Cluster similar concepts into a concept cluster using K-

Means like approach (k-Medoids)

FruitFresh fruit

JuiceTropical fruit

BerryExotic fruit

Seasonal fruitFruit juiceCitrus fruitSoft fruitDry fruit

Wild fruitLocal fruit

hellip

company

CompanyClientFirm

ManufacturerCorporation

large companyRivalGiant

big companylocal company

large corporationinternational

companyhellip Fruit

Definitions of Instance Ambiguity [Hua et al 2016]

bull 3 levels of instance ambiguitybull Level 0 unambiguous

bull Contains only 1 sensebull Eg dog (animal) beijing (city) potato (vegetable)

bull Level 1 unambiguous and ambiguous both make sensebull Contains 2 or more senses but these senses are relatedbull Eg google (company amp search engine) french (language amp

country) truck(vehicle amp public transport service)

bull Level 2 ambiguous bull Contains 2 or more senses and the senses are very different from

each otherbull Eg apple (fruit amp company) jaguar(animal amp company) python

(animal amp language)

Ambiguity Score

bull Using top-2 senses to calculate the ambiguity score

119904119888119900119903119890 =

0 119897119890119907119890119897 = 0119908 1199042 119890

119908 1199041 119890lowast 1 minus 119904119894119898119894119897119886119903119894119905119910 1199041 1199042 119897119890119907119890119897 = 1

score = 1 +119908(1199041198882|119890)

119908(1199041198881|119890)lowast 1 minus 119904119894119898119894119897119886119903119894119905119910 1199041198881 1199041198882 119897119890119907119890119897 = 2

Denote top-2 senses as 1199041 and 1199042 top-2 sense clusters as 1199041198881 and 1199041198882 Denote similarity of two sense clusters as the maximum similarity of their senses

119904119894119898119894119897119886119903119894119905119910 1199041198881 1199041198882 = 119950119938119961119904119894119898119894119897119886119903119894119905119910(119904119894 isin 1199041198881 119904119895 isin 1199041198882) For an entity 119890 denote the weight (popularity) of a sense 119904119894 as the sum of weights of its concept clusters

119908 119904119894|119890 = 119908 119867119894|119890 =119862119871119895isin119867119894

119875(119862119871119895|119890)

For an entity 119890 denote the weight (popularity) of a sense cluster 119904119888119894 as the sum of weights of its senses

119908 119904119888119894 119890 =119904119895isin119904119888119894

119908(119904119895|119890)

Examples

bull Level 0bull california

bull country state city region institution 0943bull fruit

bull food product snack carbs crop 0827bull alcohol

bull substance drug solvent food addiction 0523bull computer

bull device product electronics technology appliance 0537bull coffee

bull beverage product food crop stimulant 073bull potato

bull vegetable food crop carbs product 0896bull bean

bull food vegetable crop legume carbs 0801

Examples (cont)bull Level 1

bull nike score = 0034bull company store 0861bull brand 0035bull shoe product 0033

bull twitter score = 0035bull website tool 0612bull network 0165bull application 0033bull company 0031

bull facebook score = 0037bull website tool 0595bull network 017bull company 0053bull application 0029

bull yahoo score = 038bull search engine 0457bull company provider account 0281bull website 00656

bull google score = 0507bull search engine 046bull company provider organization 0377bull website 00449

Examples (cont)

bull Level 2bull jordan score = 102

bull country state company regime 092bull shoe 002

bull fox score = 109bull animal predator species 074bull network 0064bull company 0035

bull puma score = 115bull brand company shoe 0655bull species cat 0116

bull gold score = 121bull metal material mineral resource mineral062bull color 0128

Examples (cont)

bull Level 2bull soap score = 122

bull product toiletry substance 049bull technology industry standard 011

bull silver score = 124bull metal material mineral resource mineral 0638bull color 0156

bull python score = 129bull language 0667bull snake animal reptile skin 0193

bull apple score = 141bull fruit food tree 0537bull company brand 0271

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

A Concept View of ldquoMicrosoftrdquo

company

largest desktop OS vendor

softwarecompany

international company

technology leader

Microsoft

largest desktop OS vendorcompany hellip hellip

software company

Basic-level Conceptualization (BLC)[Rosch et al 1976]

KFC

BMW

Basic-level conceptualization

How to Make BLC

bull Naive approachesbull Typicality an important measure for understanding the

relationship between an object and its concept

bull Pointwise Mutual Information (PMI) a common measure of the strength of association between two terms

bird

Naive Approach 1 Typicality

P(robin|bird) gt P(penguin|bird)ldquorobinrdquo is a more typical bird than a ldquopenguinrdquo

country

SeychellesUSA

P(USA|country) gt P(Seychelles|country)ldquoUSArdquo is a more typical country than ldquoSeychellesrdquo

penguinrobin

Using Typicality for BLC

bull Associate each isA relationship (119890 is 119888) with typicality scores 119875 119890 119888 and 119875 119888 119890

119875 119890 119888 =119899 119888 119890

119899 119888119875(119888|119890) =

119899 119888 119890

119899(119890)

bull P(e|c) indicates how typical (or popular) e is in the given concept c

bull P(c|e) indicates how typical (or popular) the concept c is given e

bull However

Microsoft

largest desktop OS vendorcompanyhigh typicality p(c|e) high typicality p(e|c)

Naive Approach 2 PMI[Manning and Schutze 1999]

bull Pointwise mutual information (PMI) is a measure of association used in information theory and statistics

bull Consider using the PMI between concept c and instance e to find the basic-level concepts as follows

119875119872119868(119890 119888) = log119875(119890 119888)

119875(119890)119875(119888)= log119875(119890|119888) minus log119875(119890)

bull However bull In basic level of categorization we are interested in finding a

concept for a given e which means P(e) is a constant

bull Thus ranking by PMI(e c) is the same as ranking by P(e|c)

Using Rep(e c) for BLC [Wang et al 2015b]

bull The measure 119877119890119901 119890 119888 = 119875(119888|119890) lowast 119875(119890|119888) means

bull (With PMI) If we take the logarithm of our scoring function we get

log119877119890119901 119890 119888 = log119875 119888 119890 lowast 119875(119890|119888) = log119875(119890 119888)

119875(119890)lowast119875(119890 119888)

119875(119888)= log

119875(119890 119888)2

119875(119890)119875(119888)= 119875119872119868 119890 119888 + log119875 119890 119888

= 1198751198721198682

bull (With Commute Time) The commute time between an instance e and a concept c is

119879119894119898119890(119890 119888) =

119896=1

infin

(2119896) lowast 119875119896(119890 119888) =

119896=1

119879

2119896 lowast 119875119896 119890 119888 +

119896=119879+1

infin

2119896 lowast 119875119896 119890 119888

ge σ119896=1119879 (2119896) lowast 119875119896(119890 119888) + 2(119879 + 1) lowast (1 minus σ119896=1

119879 119875119896(119890 119888)) = 4 minus 2 lowast 119877119890119901(119890 119888)

Given e the c should be its typical concept (shortest distance)

Given c the e should be its typical instance (shortest distance)

A process of finding concept nodes having shortest expected distance with e

Evaluations on Different Measures for BLC (Precision / NDCG at k = 1, 2, 3, 5, 10, 15, 20)

Result table 1:

Measure              @1     @2     @3     @5     @10    @15    @20
No smoothing
MI(e)                0.769  0.692  0.705  0.685  0.719  0.705  0.690
PMI3(e)              0.885  0.769  0.756  0.800  0.754  0.733  0.721
NPMI(e)              0.692  0.692  0.667  0.638  0.627  0.610  0.610
Typicality P(c|e)    0.462  0.577  0.603  0.577  0.569  0.564  0.556
Typicality P(e|c)    0.500  0.462  0.526  0.523  0.523  0.510  0.521
Rep(e)               0.846  0.865  0.872  0.862  0.758  0.731  0.719
Smoothing = 0.001
MI(e)                0.577  0.615  0.628  0.600  0.612  0.605  0.592
PMI3(e)              0.731  0.673  0.692  0.654  0.669  0.644  0.623
NPMI(e)              0.923  0.827  0.769  0.746  0.731  0.695  0.671
Typicality P(c|e)    0.462  0.577  0.603  0.577  0.569  0.564  0.554
Typicality P(e|c)    0.885  0.865  0.872  0.831  0.785  0.741  0.704
Rep(e)               0.846  0.731  0.718  0.723  0.700  0.669  0.638
Smoothing = 0.0001
MI(e)                0.615  0.615  0.654  0.608  0.635  0.628  0.612
PMI3(e)              0.846  0.731  0.731  0.715  0.723  0.685  0.677
NPMI(e)              0.885  0.904  0.885  0.869  0.823  0.777  0.752
Typicality P(c|e)    0.462  0.577  0.603  0.577  0.569  0.564  0.556
Typicality P(e|c)    0.885  0.904  0.910  0.877  0.831  0.813  0.777
Rep(e)               0.923  0.846  0.833  0.815  0.781  0.736  0.719
Smoothing = 1e-5
MI(e)                0.615  0.635  0.667  0.662  0.677  0.656  0.646
PMI3(e)              0.885  0.769  0.744  0.777  0.758  0.731  0.710
NPMI(e)              0.885  0.846  0.872  0.869  0.831  0.810  0.787
Typicality P(c|e)    0.462  0.577  0.603  0.577  0.569  0.564  0.556
Typicality P(e|c)    0.769  0.808  0.846  0.823  0.808  0.782  0.765
Rep(e)               0.885  0.904  0.872  0.862  0.812  0.800  0.767
Smoothing = 1e-6
MI(e)                0.769  0.673  0.705  0.677  0.700  0.692  0.679
PMI3(e)              0.885  0.769  0.756  0.785  0.773  0.726  0.723
NPMI(e)              0.885  0.846  0.821  0.815  0.750  0.726  0.719
Typicality P(c|e)    0.462  0.577  0.603  0.577  0.569  0.564  0.556
Typicality P(e|c)    0.538  0.615  0.615  0.615  0.608  0.613  0.615
Rep(e)               0.846  0.885  0.897  0.877  0.788  0.777  0.765
Smoothing = 1e-7
MI(e)                0.769  0.692  0.705  0.685  0.719  0.703  0.688
PMI3(e)              0.885  0.769  0.756  0.792  0.758  0.736  0.725
NPMI(e)              0.769  0.750  0.718  0.700  0.650  0.641  0.633
Typicality P(c|e)    0.462  0.577  0.603  0.577  0.569  0.564  0.556
Typicality P(e|c)    0.500  0.481  0.526  0.523  0.531  0.523  0.523
Rep(e)               0.846  0.865  0.872  0.854  0.765  0.749  0.733

Result table 2:

Measure              @1     @2     @3     @5     @10    @15    @20
No smoothing
MI(e)                0.516  0.531  0.519  0.531  0.562  0.574  0.594
PMI3(e)              0.725  0.664  0.652  0.660  0.628  0.631  0.646
NPMI(e)              0.599  0.597  0.579  0.554  0.540  0.539  0.549
Typicality P(c|e)    0.297  0.380  0.409  0.422  0.438  0.446  0.461
Typicality P(e|c)    0.401  0.386  0.396  0.398  0.401  0.410  0.428
Rep(e)               0.758  0.771  0.745  0.723  0.656  0.647  0.661
Smoothing = 1e-3
MI(e)                0.374  0.414  0.441  0.448  0.473  0.481  0.495
PMI3(e)              0.484  0.511  0.509  0.502  0.519  0.525  0.533
NPMI(e)              0.692  0.652  0.607  0.603  0.585  0.585  0.592
Typicality P(c|e)    0.297  0.380  0.409  0.422  0.438  0.446  0.460
Typicality P(e|c)    0.703  0.697  0.704  0.681  0.637  0.628  0.626
Rep(e)               0.621  0.580  0.554  0.561  0.554  0.555  0.559
Smoothing = 1e-4
MI(e)                0.407  0.430  0.458  0.462  0.492  0.503  0.512
PMI3(e)              0.648  0.604  0.579  0.575  0.578  0.576  0.590
NPMI(e)              0.747  0.777  0.761  0.737  0.700  0.685  0.688
Typicality P(c|e)    0.297  0.380  0.409  0.422  0.438  0.446  0.461
Typicality P(e|c)    0.791  0.795  0.802  0.767  0.738  0.729  0.724
Rep(e)               0.758  0.714  0.711  0.689  0.653  0.636  0.653
Smoothing = 1e-5
MI(e)                0.429  0.465  0.478  0.501  0.517  0.528  0.545
PMI3(e)              0.725  0.647  0.642  0.642  0.627  0.624  0.638
NPMI(e)              0.813  0.779  0.778  0.765  0.730  0.723  0.729
Typicality P(c|e)    0.297  0.380  0.409  0.422  0.438  0.446  0.461
Typicality P(e|c)    0.709  0.728  0.735  0.722  0.702  0.696  0.703
Rep(e)               0.791  0.787  0.762  0.739  0.707  0.703  0.706
Smoothing = 1e-6
MI(e)                0.516  0.510  0.515  0.526  0.546  0.563  0.579
PMI3(e)              0.725  0.655  0.651  0.654  0.641  0.631  0.649
NPMI(e)              0.791  0.766  0.732  0.728  0.673  0.659  0.668
Typicality P(c|e)    0.297  0.380  0.409  0.422  0.438  0.446  0.461
Typicality P(e|c)    0.495  0.516  0.520  0.508  0.512  0.521  0.540
Rep(e)               0.758  0.784  0.767  0.755  0.691  0.686  0.694
Smoothing = 1e-7
MI(e)                0.516  0.531  0.519  0.530  0.562  0.571  0.592
PMI3(e)              0.725  0.664  0.652  0.658  0.630  0.631  0.647
NPMI(e)              0.670  0.655  0.633  0.604  0.575  0.570  0.581
Typicality P(c|e)    0.297  0.380  0.409  0.422  0.438  0.446  0.461
Typicality P(e|c)    0.423  0.421  0.415  0.407  0.414  0.424  0.438
Rep(e)               0.758  0.771  0.745  0.725  0.663  0.661  0.668

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

What is Semantic Similarity?

• Are the following instance pairs similar?
  • <apple, microsoft>
  • <apple, pear>
  • <apple, fruit>
  • <apple, food>
  • <apple, ipad>
  • <car, journey>

Approaches on Term Similarity

• Categories of approaches for semantic similarity:
  • String based approach
  • Knowledge based approach: use preexisting thesauri, taxonomies, or encyclopedias such as WordNet
  • Corpus based approach: use contexts of terms extracted from web pages, web search snippets, or other text repositories
  • Embedding based approach: introduced in detail in "Part 3: Implicit Understanding"

Approaches on Term Similarity (2)

• Categories (figure): string based approaches; knowledge based approaches (WordNet), including path length/lexical chain-based and information content-based measures; corpus based approaches, including graph learning algorithm based and snippet/search based measures; plus the state-of-the-art approaches. Representative work cited in the figure: Rada 1989, Resnik 1995, Jcn 1997, Lin 1998, Sánch 2011, Agirre 2010, Alvarez 2007, HunTray 2005, Hirst 1998, Do 2009, Bol 2011, Chen 2006, Ban 2002.

Term Similarity Using Semantic Networks [Li et al 2013, Li et al 2015]

• Framework:

• Step 1: Type Checking. A term pair <t1, t2> is classified as a concept pair, an entity pair, or a concept-entity pair.
• Step 2: Context Representation (vector). Collect concept-distribution contexts for concept terms and entity-distribution contexts for entity terms; similar concepts are clustered, and for a concept-entity pair the concept collection of the entity term t1 is gathered.
• Step 3: Context Similarity. Evaluate similarity between the context vectors, e.g. Cosine(T(t1), T(t2)); for clustered representations, take the maximum over cluster pairs, Max(x, y) Cosine(Cx(t1), Cy(t2)), and for concept-entity pairs take the maximum sim(t2, cx) over the top-k concepts cx of each cluster of t1.

An example [Li et al 2013, Li et al 2015]

For example, <banana, pear>:
• Step 1: Type Checking. <banana, pear> is an entity pair.
• Step 2: Context Representation (vector). Collect the concept contexts of both entities.
• Step 3: Context Similarity. Cosine(T(t1), T(t2)) = 0.916.
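A minimal sketch of Step 3 (my own illustration, not the authors' code): each term is represented as a sparse concept-distribution vector and the pair is scored by cosine similarity; the concept weights below are invented toy values.

    import math

    # Toy concept-distribution contexts for two entity terms (invented weights).
    T_banana = {"fruit": 0.55, "food": 0.25, "tropical fruit": 0.15, "plant": 0.05}
    T_pear   = {"fruit": 0.60, "food": 0.20, "tree": 0.12, "juice": 0.08}

    def cosine(u, v):
        """Cosine similarity of two sparse vectors given as dicts."""
        dot = sum(w * v.get(c, 0.0) for c, w in u.items())
        nu = math.sqrt(sum(w * w for w in u.values()))
        nv = math.sqrt(sum(w * w for w in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    print(round(cosine(T_banana, T_pear), 3))  # high score: the two terms share concepts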

Examples

Term 1                 Term 2               Similarity
lunch                  dinner               0.9987
tiger                  jaguar               0.9792
car                    plane                0.9711
television             radio                0.9465
technology company     microsoft            0.8208
high impact sport      competitive sport    0.8155
employer               large corporation    0.5353
fruit                  green pepper         0.2949
travel                 meal                 0.0426
music                  lunch                0.0116
alcoholic beverage     sports equipment     0.0314
company                table tennis         0.0003

Full results: http://adapt.seiee.sjtu.edu.cn/similarity/SimCompleteResults.pdf

Statistics of Search Queries

• (a) By traffic: 1 term 44%, 2 terms 29%, 3 terms 17%, 4 terms 7%, 5 terms 2%, more than 5 terms 1%
• (b) By # of distinct queries: 1 term 10%, 2 terms 26%, 3 terms 34%, 4 terms 19%, 5 terms 7%, more than 5 terms 4%
• (Figure: analogous breakdown by number of instances per query, from 1 instance up to more than 5 instances; example instances: "Pokémon Go", "Microsoft HoloLens")

If the short text has context for the instance...

• python tutorial
• dangerous python
• moon earth distance
• ...

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Supervised Segmentation [Bergsma et al 2007]

bull Problem divide query into semantic units

bull Approach turn segmentation into position-based binary classification

Example query: "two man power saw"
Candidate segmentations: [two man] [power saw]; [two] [man] [power saw]; [two] [man power] [saw]

Input: a query and its candidate split positions
Output: the decision of whether to place a segmentation break at each position

Supervised Segmentation

• Features:
  • Decision-boundary features, e.g. indicator features such as the POS tags in the query, and forward/backward position features
  • Statistical features, e.g. mutual information between the left and right parts of the split ("bank loan | amortization schedule")
  • Context features: context information around the boundary
  • Dependency features, e.g. "female" depends on "bus driver"

Supervised Segmentation

• Segmentation overview (figure): the input query "two man power saw" is split into candidate boundary positions; an SVM classifier trained on the learning features above outputs a yes/no segmentation decision for each position
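A sketch of the position-based binary classification idea under simplifying assumptions: each word gap gets a small feature vector (here just an association score of the adjacent words and a relative position) and a break/no-break label. The features, toy training data, and logistic-regression classifier are placeholders, not the exact setup of [Bergsma et al 2007].

    from sklearn.linear_model import LogisticRegression
    import numpy as np

    # One row per word gap: [association score of adjacent words, relative gap position].
    # Labels: 1 = break here, 0 = keep the words together.
    # For "two man power saw" with gold segmentation [two man][power saw]:
    #   "two|man" -> no break, "man|power" -> break, "power|saw" -> no break
    X_train = np.array([
        [3.2, 0.33],   # "two man": strong association, keep together
        [0.4, 0.66],   # "man power": weak association, break
        [4.1, 1.00],   # "power saw": strong association, keep together
        [0.2, 0.50],   # weak-association gap from another query
        [2.9, 0.50],   # strong-association gap from another query
    ])
    y_train = np.array([0, 1, 0, 1, 0])

    clf = LogisticRegression().fit(X_train, y_train)

    def segment(words, gap_features, model):
        """Turn per-gap break decisions into a list of segments."""
        breaks = model.predict(np.array(gap_features))
        segments, current = [], [words[0]]
        for word, brk in zip(words[1:], breaks):
            if brk:
                segments.append(" ".join(current))
                current = []
            current.append(word)
        segments.append(" ".join(current))
        return segments

    print(segment(["bank", "loan", "amortization", "schedule"],
                  [[3.0, 0.33], [0.3, 0.66], [3.5, 1.0]], clf))
    # -> ['bank loan', 'amortization schedule']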

Unsupervised Segmentation [Tan et al 2008]

bull Unsupervised learning for query segmentation

Probability of a generated segmentation S for query Q (unigram model over segments s_i):

P(S|Q) = P(s_1) · P(s_2|s_1) ⋯ P(s_m|s_1 s_2 ⋯ s_{m−1}) ≈ ∏_{s_i ∈ S} P(s_i)

A valid segment boundary exists if and only if the pointwise mutual information between the two segments resulting from the split is negative:

MI(s_k, s_{k+1}) = log [ P_c([s_k s_{k+1}]) / (P_c(s_k) · P_c(s_{k+1})) ] < 0

Example, "new york times subscription": log [ P_c([new york]) / (P_c(new) · P_c(york)) ] > 0, so there is no segment boundary between "new" and "york".

Unsupervised Segmentation

• Find the top-k segmentations with dynamic programming
• Use EM to optimize the segment probabilities on the fly
• Input: query w_1 w_2 ... w_n (the words in a query) and a concept/segment probability distribution
• Output: the top-k segmentations with the highest likelihood
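A sketch of the unigram scoring with dynamic programming over split points, assuming a toy table of segment probabilities P(s); the probabilities are invented, only the single best segmentation is returned, and the EM step that re-estimates P(s) is omitted.

    from functools import lru_cache
    import math

    # Toy unigram probabilities P(s) for candidate segments (invented values).
    P = {
        "new york": 1e-4, "new york times": 5e-5, "times": 1e-3,
        "new": 2e-3, "york": 1e-3, "subscription": 5e-4,
    }
    MIN_P = 1e-9  # back-off probability for unseen segments

    def seg_prob(s):
        return P.get(s, MIN_P)

    def best_segmentation(words):
        """Maximize the sum of log P(s_i) over all ways to cut the word sequence."""
        @lru_cache(maxsize=None)
        def best(i):
            if i == len(words):
                return 0.0, []
            best_score, best_segs = -math.inf, None
            for j in range(i + 1, len(words) + 1):
                seg = " ".join(words[i:j])
                score, segs = best(j)
                score += math.log(seg_prob(seg))
                if score > best_score:
                    best_score, best_segs = score, [seg] + segs
            return best_score, best_segs
        return best(0)

    print(best_segmentation(("new", "york", "times", "subscription")))
    # -> ['new york times', 'subscription'] under these toy probabilities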

Exploit Click-through [Li et al 2011]

• Motivation:
  • Probabilistic query segmentation
  • Use click-through data (query Q -> clicked URL -> document D)

Input query: "bank of america online banking"
Output, top-3 segmentations:
  [bank of america] [online banking]    0.502
  [bank of america online banking]      0.428
  [bank of] [america] [online banking]  0.001

Exploit Click-through

• Segmentation model: an interpolated model combining global information with click-through information
  • Example query: [credit card] [bank of america]
  • Clicked HTML documents: 1. "bank of america credit cards contact us overview"; 2. "secured visa credit card from bank of america"; 3. "credit cards overview find the right bank of america credit card for you"

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Sense Changes with Different Context

• "watch harry potter" -> Movie
• "read harry potter" -> Book
• "age harry potter" -> Character
• "harry potter walkthrough" -> Game

Entity Recognition in Query [Guo et al 2009]

• Motivation: detect the named entity in a short text and categorize it
• Example: the single-named-entity query "harry potter walkthrough" is represented as a triple <e, t, c> = ("harry potter", "walkthrough", "game"), where e is the (possibly ambiguous) named entity, t the context terms, and c the class of the entity

Entity Recognition in Query

• Probabilistic generative model
  • Goal: given a query q, find the triple <e, t, c> that maximizes its probability
  • The probability of generating a triple factorizes as Pr(e, t, c) = Pr(e) · Pr(c|e) · Pr(t|c), assuming the context depends only on the class (e.g. "walkthrough" depends only on the class game, not on "harry potter")
  • Objective: given query q, find the triple with the highest Pr(e) · Pr(c|e) · Pr(t|c)
  • The problem then becomes how to estimate Pr(e), Pr(c|e), and Pr(t|c)

Entity Recognition in Query

• Probability estimation by learning
  • Learning objective: max ∏_{i=1..N} Pr(e_i, t_i, c_i)
  • Challenge: it is difficult as well as time consuming to manually assign class labels to named entities in queries
  • Build a training set T = {(e_i, t_i)} and view c_i as a hidden variable
  • New learning problem: max ∏_{i=1..N} Pr(e_i, t_i) = max ∏_{i=1..N} Σ_c Pr(e_i) · Pr(c|e_i) · Pr(t_i|c)
  • Solved with a topic model, WS-LDA (weakly supervised LDA)
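A sketch of the generative scoring Pr(e) · Pr(c|e) · Pr(t|c) over candidate query splits, with made-up probability tables; in the actual approach these distributions are learned (via WS-LDA) rather than hand-set as below.

    # Made-up probability tables for illustration.
    P_e = {"harry potter": 0.6, "harry": 0.4}
    P_c_given_e = {
        "harry potter": {"book": 0.45, "movie": 0.40, "game": 0.15},
        "harry":        {"person": 1.0},
    }
    P_t_given_c = {
        "book":   {"read": 0.3, "author": 0.2},
        "movie":  {"watch": 0.4, "cast": 0.2},
        "game":   {"walkthrough": 0.5, "cheats": 0.3},
        "person": {"age": 0.2},
    }

    def interpret(query):
        """Enumerate <e, t, c> splits of the query and rank by Pr(e)Pr(c|e)Pr(t|c)."""
        words = query.split()
        scored = []
        for i in range(len(words)):
            for j in range(i + 1, len(words) + 1):
                e = " ".join(words[i:j])
                if e not in P_e:
                    continue
                t = " ".join(words[:i] + words[j:])  # remaining words = context
                for c, p_ce in P_c_given_e[e].items():
                    score = P_e[e] * p_ce * P_t_given_c.get(c, {}).get(t, 1e-6)
                    scored.append((score, e, t, c))
        return sorted(scored, reverse=True)

    print(interpret("harry potter walkthrough")[0])
    # -> highest score for e = "harry potter", t = "walkthrough", c = "game"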

Signal from Click [Pantel et al 2012]

• Motivation: predict entity types in Web search from the entity, its context, the user intent, and clicks
• A generative model over a query type distribution covering 73 entity types

Signal from Click

• Joint model for prediction (plate diagram): for each query, pick an entity type from the distribution over types, pick an entity from the entity distribution, pick an intent from the intent distribution, pick context words from the word distribution, and pick a click from the host distribution

Telegraphic Query interpretation [Sawant et al 2013 Joshi et al 2014]

• Entity-seeking telegraphic queries
• Interpretation = segmentation + annotation
• Example: the query "Germany capital" should return the result entity "Berlin"; a knowledge base contributes accuracy while a large corpus contributes recall

Joint Interpretation and Ranking [Sawant et al 2013, Joshi et al 2014]

• Overview: a telegraphic query is matched against an annotated corpus, and two models, a generative model and a discriminative model, jointly interpret the query and rank candidate answer entities (e1, e2, e3, ...) for output

Joint Interpretation and Ranking [Sawant et al 2013]

• Generative model, based on probabilistic language models: for the query q = "losing team baseball world series 1998", the type hint "baseball team" is matched against candidate types (e.g. "major league baseball team") of a candidate entity E such as the San Diego Padres, while context matchers ("lost 1998 world series") score the remaining query words against corpus evidence such as "Padres have been to two World Series, losing in 1984 and 1998"; a switch variable Z decides which model generates each query word (figure adapted from U. Sawant, 2013)

Joint Interpretation and Ranking [Sawant et al 2013]

• Discriminative model, based on max-margin discriminative learning: candidate interpretations pair the words of "losing team baseball world series 1998" with a target type (t = baseball team vs. t = series) and a candidate answer entity, so that the correct entity (San_Diego_Padres) is separated from incorrect ones (1998_World_Series)

Telegraphic Query Interpretation [Joshi et al 2014]

• Queries seek answer entities (e2)
• They contain (query) entities (e1), target types (t2), relations (r), and selectors (s)
• Example interpretations (adapted from M. Joshi, 2014): for "dave navarro first band", e1 = dave navarro, t2 = band, s = first, with or without an explicit relation r = band; for "spider automobile company", one reading takes e1 = spider with t2 = automobile company, while another treats "spider" as a selector s with t2 = company

Improved Generative Model

• [Joshi et al 2014] extends the generative model of [Sawant et al 2013] to also consider the query entity e1 (in q) and the relation r

Improved Discriminative Model

• [Joshi et al 2014] extends the discriminative model of [Sawant et al 2013] to also consider the query entity e1 (in q) and the relation r

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

• Input: a short text
• Output: a semantic interpretation
• Three steps in understanding a short text, e.g. "wanna watch eagles band":
  • Step 1: Text segmentation, divide the text into a sequence of terms in the vocabulary ("watch", "eagles", "band")
  • Step 2: Type detection, determine the best type of each term (watch[verb] eagles[entity] band[concept])
  • Step 3: Concept labeling, infer the best concept of each entity within context (eagles[entity] -> the band)

Text Segmentation

• Observations:
  • Mutual exclusion: terms containing the same word mutually exclude each other
  • Mutual reinforcement: related terms mutually reinforce each other
• Build a Candidate Term Graph (CTG), e.g. for "vacation april in paris" and "watch harry potter"
• (Figure: CTG nodes such as "april in paris", "vacation", "april", "paris" and "watch", "harry potter", with node scores, e.g. 1/3 and 2/3, and edge weights, e.g. 0.029, 0.005, 0.047, 0.041 and 0.014, 0.092, 0.053, 0.018)

Find the Best Segmentation

• Best segmentation = a sub-graph of the CTG which:
  • is a complete graph (clique)
  • contains no mutual exclusion
  • has 100% word coverage (except for stopwords)
  • has the largest average edge weight
• (Figure: candidate sub-graphs of the "vacation april in paris" and "watch harry potter" CTGs, distinguishing a sub-graph that merely is a segmentation from the best segmentation)

• The best segmentation is therefore found as a maximal clique of the CTG that satisfies the criteria above and has the largest average edge weight, illustrated again on the "vacation april in paris" and "watch harry potter" graphs
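A sketch of the clique search on a toy candidate term graph using networkx; the edge weights are loosely based on the figure but their mapping to edges is assumed, mutual exclusion is encoded simply by not adding an edge between conflicting terms, and stopword handling is reduced to ignoring "in".

    import networkx as nx
    from itertools import combinations

    # Toy CTG for "vacation april in paris" (edge weights assumed for illustration).
    # Mutually exclusive terms ("april in paris" vs "april"/"paris") get no edge.
    G = nx.Graph()
    G.add_edge("vacation", "april in paris", weight=0.005)
    G.add_edge("vacation", "april", weight=0.029)
    G.add_edge("vacation", "paris", weight=0.047)
    G.add_edge("april", "paris", weight=0.041)

    query_words = {"vacation", "april", "paris"}  # "in" treated as a stopword

    def words_of(term):
        return set(term.split()) - {"in"}

    def covers(clique):
        return set().union(*(words_of(t) for t in clique)) == query_words

    def avg_edge_weight(clique):
        edges = list(combinations(clique, 2))
        return sum(G[u][v]["weight"] for u, v in edges) / len(edges) if edges else 0.0

    # Candidate segmentations = maximal cliques with full word coverage;
    # the best one has the largest average edge weight.
    cliques = [c for c in nx.find_cliques(G) if len(c) > 1 and covers(c)]
    print(max(cliques, key=avg_edge_weight))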

Type Detection

• Pairwise model: find the best typed-term for each term so that the maximum spanning tree of the resulting sub-graph between typed-terms has the largest weight
• (Figure: for "watch free movie", candidate typed-terms are watch[v], watch[e], watch[c]; free[adj], free[v]; movie[c], movie[e])

Concept Labeling

• Entity disambiguation is the most important task of concept labeling
  • Filter and re-rank the original concept cluster vector
• Weighted vote: the final score of each concept cluster is a combination of its original score and the support from the context, using concept co-occurrence
  • e.g. "watch harry potter" -> movie, "read harry potter" -> book
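A sketch of the weighted-vote idea under assumptions: each concept cluster's final score is a simple linear mix of its original conceptualization score and the co-occurrence support contributed by the context terms; the co-occurrence table and the mixing weight alpha are made up.

    # Original concept-cluster scores for the entity "harry potter" (toy values).
    original = {"movie": 0.40, "book": 0.45, "character": 0.10, "game": 0.05}

    # Toy concept co-occurrence: support each cluster gets from a context term.
    cooccur = {
        "watch": {"movie": 0.8, "book": 0.1, "character": 0.05, "game": 0.05},
        "read":  {"movie": 0.1, "book": 0.8, "character": 0.05, "game": 0.05},
    }

    def weighted_vote(original_scores, context_terms, alpha=0.5):
        """final(c) = alpha * original(c) + (1 - alpha) * average context support for c."""
        final = {}
        for c, s in original_scores.items():
            support = sum(cooccur.get(t, {}).get(c, 0.0) for t in context_terms)
            support /= max(len(context_terms), 1)
            final[c] = alpha * s + (1 - alpha) * support
        return sorted(final.items(), key=lambda kv: -kv[1])

    print(weighted_vote(original, ["watch"])[0])  # ('movie', ...)
    print(weighted_vote(original, ["read"])[0])   # ('book', ...)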

Example of Entity Disambiguation [Hua et al 2015 (ICDE Best Paper), Hua et al 2016]

• (Figure) The short text is parsed against a semantic (isA) network and a concept co-occurrence network; conceptualization then applies term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, and concept orthogonalization to produce a concept vector (c1, p1), (c2, p2), (c3, p3), ...
• Example "ipad apple": isA lookups give apple -> {fruit, company, food, product, ...} and ipad -> {product, device, brand, ...}; co-occurrence filtering keeps the company/device/product senses

Mining Lexical Relationships [Wang et al 2015b]

• Lexical knowledge is represented by probabilities over a graph linking terms (e.g. "watch", "harry potter"), roles, and concepts (product, book, movie), where e is an instance, t a term, c a concept, and z a role:
  • p(z|t), the role of a term, e.g. p(verb|watch), p(instance|watch)
  • p(c|t, z), the concept of a term given its role, e.g. p(movie|harry potter), p(book|harry potter), p(movie|watch, verb)
  • p(c|e) = p(c|t, z = instance)

Understanding Queries [Wang et al 2015b]

• Goal: rank the concepts and find argmax_c p(c | t, q)
• The query and all of its possible segmentations are mapped onto the offline semantic network, and a random walk with restart [Sun et al 2005] is run on the resulting online subgraph
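A sketch of random walk with restart on a small term-concept subgraph; the adjacency weights are toy values and the construction of the online subgraph from the query's segmentations is not shown.

    import numpy as np

    # Toy online subgraph over terms and concepts.
    nodes = ["watch", "harry potter", "movie", "book", "verb"]
    W = np.array([
        # watch  hp    movie  book  verb
        [0.0,   1.0,   0.6,   0.1,  0.9],   # watch
        [1.0,   0.0,   0.8,   0.7,  0.0],   # harry potter
        [0.6,   0.8,   0.0,   0.0,  0.0],   # movie
        [0.1,   0.7,   0.0,   0.0,  0.0],   # book
        [0.9,   0.0,   0.0,   0.0,  0.0],   # verb
    ])

    def rwr(W, restart, c=0.3, iters=100):
        """Random walk with restart: r = (1 - c) * P^T r + c * restart."""
        P = W / W.sum(axis=1, keepdims=True)   # row-normalized transition matrix
        r = restart.copy()
        for _ in range(iters):
            r = (1 - c) * P.T @ r + c * restart
        return r

    restart = np.array([0.5, 0.5, 0.0, 0.0, 0.0])   # restart mass on the query terms
    scores = rwr(W, restart)
    print(sorted(zip(nodes, scores), key=lambda kv: -kv[1]))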

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

• Example: "popular smart cover iphone 5s"
• Definitions:
  • Head: names the general (semantic) category to which the whole short text belongs; usually the head is the intent of the short text ("smart cover" is the intent of the query)
  • Constraints: distinguish this member from other members of the same category ("iphone 5s" limits the type of the head)
  • Non-constraint modifiers (aka pure modifiers): subjective modifiers that can be dropped without changing the intent ("popular" is subjective and can be neglected)

Non-Constraint Modifiers Mining: Construct Modifier Networks

• Edges form a modifier network
• (Figure) Concept hierarchy tree in the "Country" domain: Country with sub-concepts such as Asian country, Developed country, Western country, Western developed country, Large Asian country, Large developed country, Top developed country, Top western country
• (Figure) Modifier network in the "Country" domain: Country connected to the modifiers Asian, Developed, Western, Large, Top; in this case "Large" and "Top" are pure modifiers

Non-Constraint Modifiers Mining: Betweenness Centrality

• Betweenness centrality is a measure of a node's centrality in a network
• The betweenness of node v is defined as g(v) = Σ_{s ≠ v ≠ t} σ_st(v) / σ_st, where σ_st is the total number of shortest paths from node s to node t and σ_st(v) is the number of those paths that pass through v
• Normalization & aggregation
• A pure modifier should have a low betweenness-centrality aggregation score PMS(t)
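A sketch using networkx on a toy modifier network for the "Country" domain; the edge set loosely follows the figure, and the normalization and aggregation of scores across domains into PMS(t) is only hinted at. Pure modifiers such as "Large" and "Top" sit on the periphery, so their betweenness stays low.

    import networkx as nx

    # Toy modifier network in the "Country" domain (edges assumed from the figure).
    G = nx.Graph()
    G.add_edges_from([
        ("Country", "Asian"), ("Country", "Developed"), ("Country", "Western"),
        ("Asian", "Large"), ("Developed", "Large"), ("Developed", "Top"),
        ("Western", "Top"), ("Western", "Developed"),
    ])

    # Betweenness centrality g(v) = sum over s != v != t of sigma_st(v) / sigma_st.
    bc = nx.betweenness_centrality(G, normalized=True)

    # A candidate's (low) centrality here would be normalized and aggregated
    # over many domains into the pure-modifier score PMS(t) in the full method.
    for term in ("Large", "Top", "Asian", "Developed", "Country"):
        print(term, round(bc[term], 3))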

Head-Constraints Mining [Wang et al 2014b]

• A term can be a head in some cases and a constraint in others
• E.g. "seattle hotel" (seattle = constraint, hotel = head) vs. "seattle hotel job" (seattle, hotel = constraints, job = head)

Head-Constraints Mining: Acquiring Concept Patterns

• Extract patterns such as "A for B", "A of B", "A with B", "A in B", "A on B", "A at B", ... from query logs (e.g. "cover for iphone 6s", "battery for sony a7r", "wicked on broadway")
• Get entity pairs (entity1 = head, entity2 = constraint) from the query log for each preposition
• Conceptualize both entities (concept11, concept12, concept13, ... for entity1; concept21, concept22, concept23, ... for entity2) and generate concept pairs (concept11, concept21), (concept11, concept22), (concept11, concept23), ...
• Aggregate the concept pairs into a concept pattern dictionary

Why Concepts Can't Be Too General

• It may cause too many concept pattern conflicts: we can't distinguish head and modifier for general concept pairs
• Derived concept pattern (head = device, modifier = company), supporting entity pairs: (iphone 4, verizon), (modem, comcast), (wireless router, comcast), (iphone 4, tmobile)
• Derived concept pattern (head = company, modifier = device), supporting entity pairs: (amazon books, kindle), (netflix, touchpad), (skype, windows phone), (netflix, ps3)
• The two patterns conflict

Why Concepts Can't Be Too Specific

• It may generate concepts with little coverage
  • The concept regresses to the entity
  • Large storage space: up to (million × million) patterns
• e.g. (device, largest desktop OS vendor), (device, largest software development company), (device, largest global corporation), (device, latest windows and office provider), ...
• Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept Patterns

Cluster size   Sum of cluster score   Head \ Constraint                    Score
615            211.4691               breed \ state                        357298460224501
296            77.52357               game \ platform                      627403476771856
153            34.66804               accessory \ vehicle                  53393705094809
70             11.8259                browser \ platform                   132612807637391
22             10.10993               requirement \ school                 271407526294823
34             9.489159               drug \ disease                       154602405333541
42             8.992995               cosmetic \ skin condition            814659415003929
16             7.421599               job \ city                           27903732555528
32             7.10403                accessory \ phone                    246513830851194
18             6.692376               software \ platform                  210126322725878
20             6.444603               test \ disease                       239774028397537
27             5.994205               clothes \ breed                      98773996282851
19             5.913545               penalty \ crime                      200544192793488
25             5.848804               tax \ state                          240081818612579
16             5.465424               sauce \ meat                         183592863621553
18             4.809389               credit card \ country                142919087972152
14             4.730792               food \ holiday                       14554140330924
11             4.536199               mod \ game                           257163856882439
29             4.350954               garment \ sport                      471533326845442
23             3.994886               career information \ professional    732726483731257
15             3.86065                song \ instrument                    128189481818135
18             3.78213                bait \ fish                          780426514113169
22             3.722948               study guide \ book                   508339765053921
19             3.408953               plugins \ browser                    550326072627126
14             3.305753               recipe \ meat                        882779863422951
18             3.214226               currency \ country                   110825444188352
13             3.180272               lens \ camera                        186081673263957
9              3.16973                decoration \ holiday                 130055844126533
16             3.14875                food \ animal                        7338544366514

• Example: the cluster for Game (head) \ Platform (modifier) groups related concept patterns such as game \ platform, game \ device, video game \ platform, game \ console, game \ game pad, game \ gaming platform
• Supporting entity pairs: (angry birds, android), (angry birds, ios), (angry birds, windows 10), ...

Head-Modifier Relationship Detection

• Train a classifier on (head-embedding, modifier-embedding) pairs
• Training data: positive = (head, modifier), negative = (modifier, head)
• Precision >= 0.9, recall >= 0.9
• Disadvantage: not interpretable
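A sketch of the embedding-pair classifier described above, under assumptions: the term vectors stand in for pre-trained embeddings (random toy vectors here), the feature is the concatenation of the head and modifier embeddings, and a plain logistic regression replaces whatever model the slides actually used.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    DIM = 16

    # Toy embeddings standing in for pre-trained term vectors.
    emb = {t: rng.normal(size=DIM) for t in
           ["game", "platform", "accessory", "vehicle", "drug", "disease"]}

    # Known head/modifier pairs; positives are (head, modifier), negatives the reverse.
    pairs = [("game", "platform"), ("accessory", "vehicle"), ("drug", "disease")]
    X = [np.concatenate([emb[h], emb[m]]) for h, m in pairs] + \
        [np.concatenate([emb[m], emb[h]]) for h, m in pairs]
    y = [1] * len(pairs) + [0] * len(pairs)

    clf = LogisticRegression(max_iter=1000).fit(np.array(X), y)

    def is_head_modifier(a, b):
        """Probability that a is the head and b the modifier."""
        return clf.predict_proba([np.concatenate([emb[a], emb[b]])])[0, 1]

    print(round(is_head_modifier("game", "platform"), 2))
    print(round(is_head_modifier("platform", "game"), 2))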

Syntactic Parsing based on HM

• Information is incomplete:
  • Prepositions and other function words
  • Within a noun compound: "el capitan macbook pro"
• Why not train a parser for web queries?

Syntactic Parsing of Short Texts [Sun et al EMNLP 2016]

• Syntactic structures are valuable for short text understanding
• Examples (figure)

Challenges: Short Texts Lack Grammatical Signals

• Lack of function words and word order
  • "toys queries" has ambiguous intent
  • "distance earth moon" has clear intent, but many equivalent forms: "earth moon distance", "earth distance moon", ...

Challenges: Syntactic Parsing of Queries

• No standard
• No ground truth
• Why is syntactic parsing of queries even a legitimate problem?

Derive Syntax from Semantics [Sun et al 2016]

• Query: "thai food houston"
• Clicked sentence (figure)
• Project the dependencies of the clicked sentence onto the query

A Treebank for Short Texts

• Given a query q and q's clicked sentence s:
  • Parse each s
  • Project the dependencies from s to q
  • Aggregate the dependencies

Algorithm of Projection (figure)

Result Examples (figure)

Results

• Random queries: QueryParser UAS 0.83, LAS 0.75; Stanford UAS 0.72, LAS 0.64
• Queries with no function words: QueryParser UAS 0.82, LAS 0.73; Stanford UAS 0.70, LAS 0.61
• Queries with function words: QueryParser UAS 0.90, LAS 0.85; Stanford UAS 0.86, LAS 0.80

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

• Measuring similarity between two short texts or sentences
• Basic idea: word-by-word comparison using embedding vectors
• Use a saliency-weighted semantic graph to compute similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

• Features acquired: bins of all edges and bins of max edges of the saliency-weighted semantic graph
• Similarity measurement, inspired by BM25, between two short texts s_l and s_s, where sem(w, s_s) is the semantic similarity of term w to the short text s_s:

f_sts(s_l, s_s) = Σ_{w ∈ s_l} IDF(w) · [ sem(w, s_s) · (k_1 + 1) ] / [ sem(w, s_s) + k_1 · (1 − b + b · |s_s| / avgl) ]
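A sketch of this BM25-style scoring, assuming sem(w, s) is the best cosine between w's embedding and the embeddings of the words of s; the embeddings and IDF values are toy stand-ins, and k1, b are set to common BM25 defaults rather than tuned values.

    import numpy as np

    rng = np.random.default_rng(1)
    emb = {w: rng.normal(size=8) for w in
           ["cheap", "flight", "ticket", "low", "cost", "airline"]}
    idf = {w: 1.0 for w in emb}            # toy IDF values

    def cos(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    def sem(w, text):
        """Semantic match of word w against a short text: best cosine over its words."""
        return max(cos(emb[w], emb[x]) for x in text)

    def f_sts(s_l, s_s, k1=1.2, b=0.75, avgl=3.0):
        """Saliency-weighted similarity of short text s_l against s_s."""
        score = 0.0
        for w in s_l:
            s = sem(w, s_s)
            score += idf[w] * (s * (k1 + 1)) / (s + k1 * (1 - b + b * len(s_s) / avgl))
        return score

    print(round(f_sts(["cheap", "flight", "ticket"], ["low", "cost", "airline"]), 3))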

From the Concept View [Wang et al 2015a]

• (Figure) Each short text is parsed and conceptualized against the semantic (isA) network and the concept co-occurrence network, via term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, and concept orthogonalization, yielding a bag of concepts: Concept Vector 1 = [(c1, score1), (c2, score2), ...] and Concept Vector 2 = [(c1', score1'), (c2', score2'), ...]; the similarity of the two short texts is then computed between the two concept vectors

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

• Explicit short text understanding benefits many application scenarios:
  • Ads/search semantic match
  • Definition mining
  • Query recommendation
  • Web table understanding
  • Semantic search
  • ...

Ads Keyword Selection [Wang et al 2015a]

• (Figure: two bar charts of results by decile, Decile 4 through Decile 10, one for mainline ads on a 0.00-6.00 scale and one for sidebar ads on a 0.00-0.60 scale)

Definition Mining [Hao et al 2016]

• Definition scenarios: search engines, QnA, etc.
• Why conceptualization is useful for definition mining; example: "What is Emphysema?"
  • Answer 1: "Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year."
  • Answer 2: "Emphysema is an incurable, progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing."
  • Each candidate sentence has the form of a definition; embedding is helpful to some extent, but it also returns high similarity scores for (emphysema, disease) and (emphysema, smoking)
  • Conceptualization can provide strong semantics; contextual embedding can also provide semantic similarity beyond is-A

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

• (Figure) Offline: training data and the knowledge base feed concept weighting and model learning, yielding a concept model for each class (Class 1 ... Class i ... Class N). Online: an original short text such as "justin bieber graduates" goes through entity extraction, conceptualization into a concept vector, candidate generation, and classification & ranking, producing outputs such as <Music, Score>

Concept based Short Text Classification and Ranking [Wang et al 2014a]

• (Figure) Each category (e.g. TV, Music, Movie) is represented in a concept space built from the article titles/tags in that category, with concept weights ω_i, ω_j; a query is conceptualized into the same space with probabilities p_i, p_j and matched against the category concept vectors

Precision performance on each category [Wang et al 2014a]

Category   BocSTC   LM_ch   SVM    VSM_cosine   LM_d   Entity_ESA
Movie      0.71     0.91    0.84   0.81         0.72   0.56
Money      0.97     0.95    0.54   0.57         0.52   0.74
Music      0.97     0.90    0.88   0.73         0.68   0.58
TV         0.96     0.46    0.92   0.56         0.51   0.55

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, Jamie Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In SIGMOD 2008

bull [ Auer et al 2007 ] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, Zachary G. Ives. DBpedia: A Nucleus for a Web of Open Data. In ISWC/ASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu, Ce Zhang, Christopher De Sa, Jaeho Shin, Feiran Wang and C. Ré. Incremental Knowledge Base Construction Using DeepDive. In VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase, Michael Strube, Benjamin Börschinger, Cäcilia Zirn and Anas Elghafari. WikiNet: A very large scale multi-lingual concept network. In LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Catherine Havasi. ConceptNet 5: A large semantic network for relational knowledge. The People's Web Meets NLP, Springer Berlin Heidelberg, 2013

bull [ Hua et al 2016 ] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng and Xiaofang Zhou. "Understand Short Texts by Harvesting and Analyzing Semantic Knowledge". IEEE Transactions on Knowledge and Data Engineering (TKDE), 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang, Xue-Gang Hu and Xindong Wu. A Large Probabilistic Semantic Network based Approach to Compute Term Similarity. In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10): 2604-2617, 2015

bull [ Rosch et al 1976 ] Eleanor Rosch, Carolyn B. Mervis, Wayne D. Gray, David M. Johnson and Penny Boyes-Braem. Basic objects in natural categories. Cognitive Psychology 8(3): 382-439, 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ] Shane Bergsma, Qin Iris Wang. Learning Noun Phrase Query Segmentation. In EMNLP-CoNLL 2007: 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke. Short text similarity with word embeddings. In CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang, Hongsong Li, Haixun Wang and Kenny Zhu. Toward Topic Search on the Web. In International Conference on Conceptual Modeling, October 2012



119899(119890)

bull P(e|c) indicates how typical (or popular) e is in the given concept c

bull P(c|e) indicates how typical (or popular) the concept c is given e

bull However

Microsoft

largest desktop OS vendorcompanyhigh typicality p(c|e) high typicality p(e|c)

Naive Approach 2 PMI[Manning and Schutze 1999]

bull Pointwise mutual information (PMI) is a measure of association used in information theory and statistics

bull Consider using the PMI between concept c and instance e to find the basic-level concepts as follows

119875119872119868(119890 119888) = log119875(119890 119888)

119875(119890)119875(119888)= log119875(119890|119888) minus log119875(119890)

bull However bull In basic level of categorization we are interested in finding a

concept for a given e which means P(e) is a constant

bull Thus ranking by PMI(e c) is the same as ranking by P(e|c)

Using Rep(e c) for BLC [Wang et al 2015b]

bull The measure 119877119890119901 119890 119888 = 119875(119888|119890) lowast 119875(119890|119888) means

bull (With PMI) If we take the logarithm of our scoring function we get

log119877119890119901 119890 119888 = log119875 119888 119890 lowast 119875(119890|119888) = log119875(119890 119888)

119875(119890)lowast119875(119890 119888)

119875(119888)= log

119875(119890 119888)2

119875(119890)119875(119888)= 119875119872119868 119890 119888 + log119875 119890 119888

= 1198751198721198682

bull (With Commute Time) The commute time between an instance e and a concept c is

119879119894119898119890(119890 119888) =

119896=1

infin

(2119896) lowast 119875119896(119890 119888) =

119896=1

119879

2119896 lowast 119875119896 119890 119888 +

119896=119879+1

infin

2119896 lowast 119875119896 119890 119888

ge σ119896=1119879 (2119896) lowast 119875119896(119890 119888) + 2(119879 + 1) lowast (1 minus σ119896=1

119879 119875119896(119890 119888)) = 4 minus 2 lowast 119877119890119901(119890 119888)

Given e the c should be its typical concept (shortest distance)

Given c the e should be its typical instance (shortest distance)

A process of finding concept nodes having shortest expected distance with e

PrecisionNDCGNo smoothing 1 2 3 5 10 15 20

MI(e) 0769 0692 0705 0685 0719 0705 0690

PMI3(e) 0885 0769 0756 0800 0754 0733 0721

NPMI(e) 0692 0692 0667 0638 0627 0610 0610

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0500 0462 0526 0523 0523 0510 0521

Rep(e) 0846 0865 0872 0862 0758 0731 0719

Smoothing=0001

MI(e) 0577 0615 0628 0600 0612 0605 0592

PMI3(e) 0731 0673 0692 0654 0669 0644 0623

NPMI(e) 0923 0827 0769 0746 0731 0695 0671

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0554

Typicality P(e|c) 0885 0865 0872 0831 0785 0741 0704

Rep(e) 0846 0731 0718 0723 0700 0669 0638

Smoothing=00001

MI(e) 0615 0615 0654 0608 0635 0628 0612

PMI3(e) 0846 0731 0731 0715 0723 0685 0677

NPMI(e) 0885 0904 0885 0869 0823 0777 0752

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0885 0904 0910 0877 0831 0813 0777

Rep(e) 0923 0846 0833 0815 0781 0736 0719

Smoothing=1e-5

MI(e) 0615 0635 0667 0662 0677 0656 0646

PMI3(e) 0885 0769 0744 0777 0758 0731 0710

NPMI(e) 0885 0846 0872 0869 0831 0810 0787

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0769 0808 0846 0823 0808 0782 0765

Rep(e) 0885 0904 0872 0862 0812 0800 0767

Smoothing=1e-6

MI(e) 0769 0673 0705 0677 0700 0692 0679

PMI3(e) 0885 0769 0756 0785 0773 0726 0723

NPMI(e) 0885 0846 0821 0815 0750 0726 0719

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0538 0615 0615 0615 0608 0613 0615

Rep(e) 0846 0885 0897 0877 0788 0777 0765

Smoothing=1e-7

MI(e) 0769 0692 0705 0685 0719 0703 0688

PMI3(e) 0885 0769 0756 0792 0758 0736 0725

NPMI(e) 0769 0750 0718 0700 0650 0641 0633

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0500 0481 0526 0523 0531 0523 0523

Rep(e) 0846 0865 0872 0854 0765 0749 0733

No Smoothing 1 2 3 5 10 15 20

MI(e) 0516 0531 0519 0531 0562 0574 0594

PMI3(e) 0725 0664 0652 0660 0628 0631 0646

NPMI(e) 0599 0597 0579 0554 0540 0539 0549

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0401 0386 0396 0398 0401 0410 0428

Rep(e) 0758 0771 0745 0723 0656 0647 0661

Smoothing=1e-3

MI(e) 0374 0414 0441 0448 0473 0481 0495

PMI3(e) 0484 0511 0509 0502 0519 0525 0533

NPMI(e) 0692 0652 0607 0603 0585 0585 0592

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0460

Typicality P(e|c) 0703 0697 0704 0681 0637 0628 0626

Rep(e) 0621 0580 0554 0561 0554 0555 0559

Smoothing=1e-4

MI(e) 0407 0430 0458 0462 0492 0503 0512

PMI3(e) 0648 0604 0579 0575 0578 0576 0590

NPMI(e) 0747 0777 0761 0737 0700 0685 0688

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0791 0795 0802 0767 0738 0729 0724

Rep(e) 0758 0714 0711 0689 0653 0636 0653

Smoothing=1e-5

MI(e) 0429 0465 0478 0501 0517 0528 0545

PMI3(e) 0725 0647 0642 0642 0627 0624 0638

NPMI(e) 0813 0779 0778 0765 0730 0723 0729

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0709 0728 0735 0722 0702 0696 0703

Rep(e) 0791 0787 0762 0739 0707 0703 0706

Smoothing=1e-6

MI(e) 0516 0510 0515 0526 0546 0563 0579

PMI3(e) 0725 0655 0651 0654 0641 0631 0649

NPMI(e) 0791 0766 0732 0728 0673 0659 0668

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0495 0516 0520 0508 0512 0521 0540

Rep(e) 0758 0784 0767 0755 0691 0686 0694

Smoothing=1e-7

MI(e) 0516 0531 0519 0530 0562 0571 0592

PMI3(e) 0725 0664 0652 0658 0630 0631 0647

NPMI(e) 0670 0655 0633 0604 0575 0570 0581

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0423 0421 0415 0407 0414 0424 0438

Rep(e) 0758 0771 0745 0725 0663 0661 0668

Evaluations on Different Measures for BLC

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

What is the Semantic Similaritybull Are the following instance pairs similar

bull ltapple microsoftgt

bull ltapple peargt

bull ltapple fruitgt

bull ltapple foodgt

bull ltapple ipadgt

bull ltcar journeygt

Approaches on Term Similarity

bull Categories of approaches for semantic similaritybull String based approach

bull Knowledge based approachbull Use preexisting thesauri taxonomy or encyclopedia such as

WordNet

bull Corpus based approachbull Use contexts of terms extracted from web pages web search

snippets or other text repositories

bull Embedding based approachbull Will introduce in detail in ldquoPart 3 Implicit Understandingrdquo

79

Approaches on Term Similarity (2)

bull Categories

80

Knowledge based approaches

(WordNet)

Corpus based

approaches

Path lengthlexical

chain-based

Information

content-based

Graph learning

algorithm basedSnippet search based

Rada

1989

Resnik

1995

Jcn

1997

Lin

1998

Saacutench

2011

Agirre

2010Alvarez

2007

String based

approaches

HunTray

2005

Hirst

1998

Do

2009

Bol

2011Chen

2006

State-of-the-art approaches

Ban

2002

bull Framework

83

Term Similarity Using Semantic Networks [Li et al 2013 Li et al 2015]

Term pairs ltt1 t2gt

Type Checking

Concept Pairs Entity Pairs

Entity-distribution Context Collection

Concept-distribution Context Collection

Concept-Entity Pairs

Concept Collection for the Entity Term t1

Similarity EvaluationCosine(T(t1) T(t2))

for each pairltt2cxgt

Context vector T(t1) and T(t2)

Get maxsim(t2cx) for ltt1 t2gt

End

End

Concept Clustering

Cluster Context vector Cx(t1) and Cy(t2)

Similarity Evaluation

Max(xy) Cosine(Cx(t1) Cy(t2))

End

Concept Clustering

for each Cluster Ci(t1)

Select top k Concept namely cx

Step 1 Type Checking

Step 2 Context Representation(Vector)

Step 3 Context Similarity

An example [Li et al 2013 Li et al 2015]

For example

ltbanana peargt

88

ltbanana peargt

Entity PairsType Checking

Concept Context Collection

Similarity Evaluation Cosine(T(t1) T(t2)) 0916

Step 1 Type Checking

Step 2 Context Representation(Vector)

Step 3 Context Similarity

ExamplesTerm 1 Term 2 Similarity

lunch dinner 09987

tiger jaguar 09792

car plane 09711

television radio 09465

technology company microsoft 08208

high impact sport competitive sport 08155

employer large corporation 05353

fruit green pepper 02949

travel meal 00426

music lunch 00116

alcoholic beverage sports equipment 00314

company table tennis 00003

96httpadaptseieesjtueducnsimilaritySimCompleteResultspdf

Statistics of Search Queries

44

29

17

7

2 1

1 Term2 Terms3 Terms4 Terms5 Termsmore than 5 Terms

10

26

34

19

74

1 Term2 Terms3 Terms4 Terms5 Termsmore than 5 Terms

(a) By traffic

(b) By of distinct queries

Pokeacutemon Go Microsoft HoloLens

Instance 1 Instance 21 Instance2 Instances3 Instances4 Instances5 InstancesMore than 5 Instances

1 Instance2 Instances3 Instances4 Instances5 InstancesMore than 5 Instances

If the short text has context for the instancehellip

bull python tutorialbull dangerous pythonbull moon earth distancebull hellip

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Supervised Segmentation [Bergsma et al 2007]

bull Problem divide query into semantic units

bull Approach turn segmentation into position-based binary classification

Example Query

Two man power saw

[two man] [power saw][two] [man] [power saw][two] [man power] [saw]

Input a query and its positions

Output the decision for making segmentation at each position

Supervised Segmentation
• Features
  • Decision-boundary features — e.g., indicators of the POS tags in the query, position features (forward/backward)
  • Statistical features — e.g., mutual information between the left and right parts ("bank loan | amortization schedule")
  • Context features — surrounding context information (e.g., "female bus driver")
  • Dependency features — e.g., "female" depends on "driver" in "female bus driver"

Supervised Segmentation
• Segmentation overview (figure): input query "two man power saw" → learning features at each position (two | man | power | saw) → SVM classifier → output: segmentation decision (yes/no) for each position
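A minimal sketch of the position-based classification idea, assuming scikit-learn is available; the feature set and PMI values below are toy stand-ins, not the paper's exact features.

# Each gap between adjacent words becomes one binary example (break / no break).
from sklearn.svm import LinearSVC

def gap_features(words, i, pmi):
    left, right = words[i], words[i + 1]
    return [
        pmi.get((left, right), 0.0),   # statistical feature: PMI of the adjacent pair
        float(i == 0),                 # position feature: first gap
        float(i == len(words) - 2),    # position feature: last gap
        float(len(left) > 4),          # crude token-shape indicators
        float(len(right) > 4),
    ]

def featurize(query, pmi):
    words = query.split()
    return [gap_features(words, i, pmi) for i in range(len(words) - 1)]

# toy PMI table and training data: label 1 = insert a break at this gap
pmi = {("two", "man"): 0.2, ("man", "power"): -0.5, ("power", "saw"): 1.5}
X = featurize("two man power saw", pmi)
y = [0, 1, 0]   # gold segmentation: [two man] [power saw]
clf = LinearSVC().fit(X, y)
print(clf.predict(featurize("two man power saw", pmi)))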

Unsupervised Segmentation [Tan et al 2008]

• Unsupervised learning for query segmentation
Probability of a generated segmentation S (a sequence of segments $s_i$) for query Q:
  $P(S|Q) = P(s_1)\,P(s_2|s_1)\cdots P(s_m|s_1 s_2 \cdots s_{m-1}) \approx \prod_{s_i \in S} P(s_i)$   (unigram model over segments)
A segment boundary is valid if and only if the pointwise mutual information between the two segments resulting from the split is negative:
  $MI(s_k, s_{k+1}) = \log \frac{P_c([s_k\, s_{k+1}])}{P_c(s_k)\cdot P_c(s_{k+1})} < 0$
Example: "new york | times subscription" ($s_1$, $s_2$); but $\log \frac{P_c([\text{new york}])}{P_c(\text{new})\cdot P_c(\text{york})} > 0$, so there is no segment boundary between "new" and "york".

Unsupervised Segmentation
• Find the top-k segmentations by dynamic programming
• Use EM optimization on the fly
Input: query $w_1 w_2 \cdots w_n$ (the words in the query) and a concept/segment probability distribution
Output: top-k segmentations with the highest likelihood
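A minimal sketch of the dynamic program under the unigram segment model (top-1 only, no EM); the segment probabilities are assumed to come from a distribution estimated offline, and the values below are made up.

import math

def best_segmentation(words, seg_prob, max_len=4):
    n = len(words)
    best = [(-math.inf, None)] * (n + 1)   # best[i] = (log-prob, backpointer) for words[:i]
    best[0] = (0.0, None)
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            seg = " ".join(words[j:i])
            p = seg_prob.get(seg)
            if p is None or best[j][0] == -math.inf:
                continue
            score = best[j][0] + math.log(p)
            if score > best[i][0]:
                best[i] = (score, j)
    segs, i = [], n                         # reconstruct segments via backpointers
    while i > 0:
        j = best[i][1]
        segs.append(" ".join(words[j:i]))
        i = j
    return list(reversed(segs))

seg_prob = {"new york times": 1e-4, "new york": 1e-3, "times": 1e-2,
            "subscription": 1e-2, "new": 1e-2, "york": 1e-3}
print(best_segmentation("new york times subscription".split(), seg_prob))
# -> ['new york times', 'subscription']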

Exploit Click-through [Li et al 2011]
• Motivation
  • Probabilistic query segmentation
  • Use click-through data (query → clicked URL → document)
Input query: "bank of america online banking"
Output: top-3 segmentations
  [bank of america] [online banking]    0.502
  [bank of america online banking]      0.428
  [bank of] [america] [online banking]  0.001

Exploit Click-through
• Segmentation Model: an interpolated model combining global info with click-through info
Example segmentation: [credit card] [bank of America]
Clicked HTML documents:
  1. bank of america credit cards contact us overview
  2. secured visa credit card from bank of america
  3. credit cards overview: find the right bank of america credit card for you

Short Text Understanding
• How to segment this short text?
• What does this short text mean (its intent, senses, or concepts)?
• What are the relations among terms in the short text?
• How to calculate the similarity between short texts?

Sense Changes with Different Context
  watch harry potter → Movie
  read harry potter → Book
  age harry potter → Character
  harry potter walkthrough → Game

Entity Recognition in Query [Guo et al 2009]
• Motivation: detect a named entity in a short text and categorize it
Example: single-named-entity query "harry potter walkthrough" → triple <e, t, c> = ("harry potter", "_ walkthrough", "game"), where e is the (ambiguous) named-entity term, t is the context (the remaining terms), and c is the class of the entity.

Entity Recognition in Query
• Probabilistic Generative Model
  • Goal: given a query q, find the triple <e, t, c> that maximizes the probability
  • Probability to generate a triple, assuming the context depends only on the class: $\Pr(e, t, c) = \Pr(e)\Pr(c|e)\Pr(t|c)$
    E.g., "walkthrough" depends only on the class game, not on "harry potter" itself
  • Objective: given query q, find $\arg\max_{<e,t,c>} \Pr(e, t, c)$
  • The problem then becomes how to estimate Pr(e), Pr(c|e), and Pr(t|c)

Entity Recognition in Query
• Probability Estimation by Learning
  • Learning objective (with labeled triples): $\max \prod_{i=1}^{N} P(e_i, t_i, c_i)$
  • Challenge: it is difficult as well as time-consuming to manually assign class labels to named entities in queries
  • Build a training set $T = \{(e_i, t_i)\}$ and view $c_i$ as a hidden variable
  • New learning problem: $\max \prod_{i=1}^{N} P(e_i, t_i) = \max \prod_{i=1}^{N} \sum_{c} P(e_i)\,P(c|e_i)\,P(t_i|c)$
  • Solved with a topic model, WS-LDA (weakly supervised LDA)
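A minimal sketch of scoring with the generative factorization above once the three probability tables have been estimated; the tables here are toy values, and the "#" placeholder for the entity in the context pattern is an assumption of this sketch.

P_e = {"harry potter": 0.6, "potter": 0.1}
P_c_given_e = {"harry potter": {"book": 0.5, "movie": 0.3, "game": 0.2}}
P_t_given_c = {"game": {"# walkthrough": 0.4}, "book": {"read #": 0.3}, "movie": {"watch #": 0.3}}

def candidate_triples(query):
    # enumerate candidate (entity, context, class) triples and score each by
    # Pr(e) * Pr(c|e) * Pr(t|c)
    words = query.split()
    for i in range(len(words)):
        for j in range(i + 1, len(words) + 1):
            e = " ".join(words[i:j])
            if e not in P_e:
                continue
            t = " ".join(words[:i] + ["#"] + words[j:]).strip()
            for c, p_ce in P_c_given_e.get(e, {}).items():
                p_tc = P_t_given_c.get(c, {}).get(t, 1e-6)
                yield (e, t, c), P_e[e] * p_ce * p_tc

best = max(candidate_triples("harry potter walkthrough"), key=lambda x: x[1])
print(best)   # (('harry potter', '# walkthrough', 'game'), 0.048)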

Signal from Click [Pantel et al 2012]
• Motivation: predict entity types in Web search by jointly modeling the entity, the user intent, the context, and the click
• Query type distribution over 73 types; a generative model over entity types (figure)

Signal from Click
• Joint Model for Prediction (plate diagram over a query collection Q, with variables t, τ, i, n, c and parameters θ, φ, ω): for each query, pick a type from the distribution over types, pick an entity (entity distribution), pick an intent (intent distribution), pick the context words (word distribution), and pick a click (host distribution).

Telegraphic Query Interpretation [Sawant et al 2013, Joshi et al 2014]
• Entity-seeking telegraphic queries
• Interpretation = Segmentation + Annotation
• Example: query "Germany capital" → result entity "Berlin"
• Combines a knowledge base (accuracy) with a large corpus (recall)
• Overview

Joint Interpretation and Ranking [Sawant et al 2013, Joshi et al 2014]
(figure) A telegraphic query and an annotated corpus feed two models for interpretation and ranking — a generative model and a discriminative model — which output a ranked list of candidate answer entities e1, e2, e3, …

• Generative Model
Joint Interpretation and Ranking [Sawant et al 2013]
(figure, adapted from U. Sawant 2013) Query q = "losing team baseball world series 1998". The type hint "baseball team" matches a candidate answer entity E = San Diego Padres with type T = major league baseball team; context matchers (switch variable Z) align the remaining words ("lost 1998 world series") with corpus evidence such as "Padres have been to two World Series, losing in 1984 and 1998". Based on probabilistic language models.

• Discriminative Model
Joint Interpretation and Ranking [Sawant et al 2013]
(figure) For q = "losing team baseball world series 1998", features are collected for the correct entity San_Diego_Padres (t = baseball team) versus an incorrect entity such as 1998_World_Series (t = series). Based on max-margin discriminative learning.

• Queries seek answer entities (e2)
• They contain (query) entities (e1), target types (t2), relations (r), and selectors (s)
Telegraphic Query Interpretation [Joshi et al 2014]
Examples (adapted from M. Joshi 2014):
  query "dave navarro first band" — e1: dave navarro; r: band (or none); t2: band; s: first
  query "spider automobile company" — either e1: spider with r/t2: automobile company, or no e1 with t2: automobile company / company and selector s: spider

Improved Generative Model
• The generative model of [Sawant et al 2013] is extended in [Joshi et al 2014] to also consider e1 (the entity in q) and the relation r
Improved Discriminative Model
• The discriminative model of [Sawant et al 2013] is likewise extended in [Joshi et al 2014] to consider e1 (in q) and r

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]
• Input: a short text; Output: its semantic interpretation
• Three steps in understanding a short text, e.g., "wanna watch eagles band":
  • Step 1: Text Segmentation – divide the text into a sequence of terms in the vocabulary: "wanna watch eagles band" → watch | eagles | band
  • Step 2: Type Detection – determine the best type of each term: watch[verb] eagles[entity] band[concept]
  • Step 3: Concept Labeling – infer the best concept of each entity within context: watch[verb] eagles[entity](band) band[concept]

Text Segmentation
• Observations
  • Mutual Exclusion – terms containing the same word mutually exclude each other
  • Mutual Reinforcement – related terms mutually reinforce each other
• Build a Candidate Term Graph (CTG)
(figures) CTGs for "vacation april in paris" (candidate terms: vacation, april, paris, april in paris) and "watch harry potter" (candidate terms: watch, harry potter), with node weights (e.g., 1/3, 2/3) and edge weights (e.g., 0.029, 0.005, 0.047, 0.041, 0.014, 0.092, 0.053, 0.018).

Find the best segmentation
• Best segmentation = the sub-graph of the CTG which
  • is a complete graph (clique) — i.e., is a segmentation
  • contains no mutual exclusion
  • has 100% word coverage (except for stopwords)
  • has the largest average edge weight

(figures) In the CTGs for "vacation april in paris" and "watch harry potter", the sub-graphs satisfying these conditions are highlighted.

Find the best segmentation (cont.)
• Equivalently: the best segmentation is the maximal clique with no mutual exclusion, full word coverage (except stopwords), and the largest average edge weight.

(figures) The resulting best segmentations (highlighted maximal cliques), e.g., [vacation] [april in paris] and [watch] [harry potter].
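A minimal sketch of the clique-based segmentation step, assuming networkx is available; the graph below is hand-built and the edge weights are taken loosely from the figure, so the exact assignment of weights to edges is illustrative.

import itertools
import networkx as nx

def best_segmentation(ctg, words, stopwords=frozenset({"in"})):
    best, best_score = None, float("-inf")
    for clique in nx.find_cliques(ctg):                     # enumerate maximal cliques
        covered = set(itertools.chain.from_iterable(t.split() for t in clique))
        if not (set(words) - stopwords) <= covered:
            continue                                        # require full word coverage (minus stopwords)
        edges = list(itertools.combinations(clique, 2))
        score = (sum(ctg[u][v]["weight"] for u, v in edges) / len(edges)) if edges else 0.0
        if score > best_score:                              # keep the largest average edge weight
            best, best_score = clique, score
    return best

ctg = nx.Graph()
ctg.add_edge("vacation", "april in paris", weight=0.047)
ctg.add_edge("vacation", "april", weight=0.029)
ctg.add_edge("vacation", "paris", weight=0.041)
ctg.add_edge("april", "paris", weight=0.005)
# "april"/"paris" are mutually exclusive with "april in paris", so those edges are absent.
print(best_segmentation(ctg, "vacation april in paris".split()))
# -> ['vacation', 'april in paris']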

Type Detection
• Pairwise Model
  • Find the best typed-term for each term so that the Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight
(figure) For "watch free movie", the candidate typed-terms are watch[v] / watch[e] / watch[c], free[adj] / free[v], and movie[c] / movie[e].
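A minimal sketch of the pairwise model: enumerate one typed-term per term and keep the assignment whose maximum spanning tree has the largest total weight. The pairwise affinities are toy values, and networkx is assumed available.

import itertools
import networkx as nx

candidates = {"watch": ["watch[v]", "watch[e]", "watch[c]"],
              "free": ["free[adj]", "free[v]"],
              "movie": ["movie[c]", "movie[e]"]}
affinity = {("watch[v]", "movie[c]"): 0.9, ("watch[v]", "free[adj]"): 0.4,
            ("free[adj]", "movie[c]"): 0.8, ("watch[e]", "movie[e]"): 0.1}

def pair_weight(a, b):
    return affinity.get((a, b), affinity.get((b, a), 0.01))

best, best_w = None, float("-inf")
for assignment in itertools.product(*candidates.values()):
    g = nx.Graph()
    for a, b in itertools.combinations(assignment, 2):
        g.add_edge(a, b, weight=pair_weight(a, b))
    w = sum(d["weight"] for _, _, d in nx.maximum_spanning_tree(g).edges(data=True))
    if w > best_w:
        best, best_w = assignment, w
print(best)   # e.g., ('watch[v]', 'free[adj]', 'movie[c]')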

Concept Labeling
• Entity disambiguation is the most important task of concept labeling
  • Filter / re-rank the original concept cluster vector
• Weighted Vote
  • The final score of each concept cluster is a combination of its original score and the support from the context, using concept co-occurrence
  • Example: "watch harry potter" → movie; "read harry potter" → book
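A minimal sketch of the weighted-vote re-ranking, assuming the original concept scores and a concept co-occurrence table are given; the mixing weight alpha and all numbers are illustrative, not the paper's.

def rerank(concept_scores, context_concepts, cooccur, alpha=0.5):
    # final score = alpha * original score + (1 - alpha) * co-occurrence support
    reranked = {}
    for concept, score in concept_scores.items():
        support = sum(cooccur.get(frozenset({concept, ctx}), 0.0)
                      for ctx in context_concepts)
        reranked[concept] = alpha * score + (1 - alpha) * support
    return sorted(reranked.items(), key=lambda kv: -kv[1])

harry_potter = {"book": 0.5, "movie": 0.3, "character": 0.2}   # original concept vector
cooccur = {frozenset({"movie", "activity"}): 0.9,              # "watch" conceptualizes to an activity-like context
           frozenset({"book", "activity"}): 0.2}
print(rerank(harry_potter, context_concepts=["activity"], cooccur=cooccur))
# movie overtakes book when the context is "watch ..."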

Example of Entity Disambiguation [Hua et al 2015 (ICDE Best Paper), Hua et al 2016]
(figure) Conceptualization pipeline: short text → parsing → term clustering by isA → concept filtering by co-occurrence → head/modifier analysis → concept orthogonalization → concept vector {(c1, p1), (c2, p2), (c3, p3), …}, using a semantic (isA) network and a co-occurrence network.
Example "ipad apple": isA gives apple → {fruit…, company…, food…, product…} and ipad → {product…, device…}; co-occurrence filtering keeps the company/brand/device/product senses, so "apple" is disambiguated to the company.

Mining Lexical Relationships[Wang et al 2015b]

• Lexical knowledge represented by probabilities (example "watch harry potter", with roles verb / instance and concepts product, book, movie):
  ① $p(z|t)$ — the role of a term, e.g., $p(\text{verb}\mid\text{watch})$, $p(\text{instance}\mid\text{watch})$
  ② $p(c|e) = p(c\mid t, z=\text{instance})$ — e.g., $p(\text{movie}\mid\text{harry potter})$, $p(\text{book}\mid\text{harry potter})$
  ③ $p(c\mid t, z)$ — e.g., $p(\text{movie}\mid\text{watch}, \text{verb})$
  (e: instance, t: term, c: concept, z: role)

Understanding Queries [Wang et al 2015b]
• Goal: rank the concepts and find $\arg\max_{c} p(c\mid t, q)$
(figure) The query and all its possible segmentations are matched against the offline semantic network, and a random walk with restart [Sun et al 2005] is run on the resulting online subgraph.
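A minimal sketch of random walk with restart on a small subgraph, as a way to propagate scores from query terms to candidate concepts; the adjacency structure and weights are illustrative, not the actual online subgraph.

import numpy as np

def rwr(adj, restart, alpha=0.15, iters=100):
    # adj: column-stochastic transition matrix; restart: restart distribution
    r = restart.copy()
    for _ in range(iters):
        r = (1 - alpha) * adj @ r + alpha * restart
    return r

nodes = ["watch", "harry potter", "movie", "book"]
A = np.array([[0.0, 0.0, 0.5, 0.0],    # edges: watch--movie, harry potter--movie/book
              [0.0, 0.0, 0.5, 1.0],
              [0.7, 0.5, 0.0, 0.0],
              [0.3, 0.5, 0.0, 0.0]])
A = A / A.sum(axis=0, keepdims=True)      # make columns stochastic
restart = np.array([0.5, 0.5, 0.0, 0.0])  # restart at the query terms
scores = rwr(A, restart)
print(dict(zip(nodes, scores.round(3))))  # "movie" outranks "book" for "watch harry potter"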

Short Text Understanding
• How to segment this short text?
• What does this short text mean (its intent, senses, or concepts)?
• What are the relations among terms in the short text?
• How to calculate the similarity between short texts?

Head, Modifier, and Constraint Detection in Short Texts [Wang et al 2014b]
• Example: "popular smart cover iphone 5s"
• Definitions
  • Head: names the general (semantic) category to which the whole short text belongs; usually the head is the intent of the short text
    • "smart cover" — the intent of the query
  • Constraints: distinguish this member from other members of the same category
    • "iphone 5s" — limits the type of the head
  • Non-Constraint Modifiers (aka Pure Modifiers): subjective modifiers which can be dropped without changing the intent
    • "popular" — subjective, can be neglected

Non-Constraint Modifiers Mining: Construct Modifier Networks
(figures) A concept hierarchy tree in the "Country" domain (country → Asian country, developed country, Western country, Western developed country, top Western country, large Asian country, large developed country, top developed country, …). The modifier edges form a Modifier Network over {Asian, Western, Developed, Large, Top}. In this case, "Large" and "Top" are pure modifiers.

Non-Constraint Modifiers Mining: Betweenness Centrality
• Betweenness centrality is a measure of a node's centrality in a network
• The betweenness of node v is defined as $g(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$, where $\sigma_{st}$ is the total number of shortest paths from node s to node t and $\sigma_{st}(v)$ is the number of those paths that pass through v
• Normalization & aggregation
• A pure modifier should have a low betweenness-centrality aggregation score PMS(t)
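A minimal sketch, assuming networkx is available: compute betweenness centrality on a toy modifier network and flag low-centrality modifiers as candidate pure modifiers. The graph and the threshold are illustrative; the paper's actual PMS(t) aggregation over many domains is not reproduced here.

import networkx as nx

g = nx.Graph()
g.add_edges_from([("Asian", "Developed"), ("Asian", "Western"),
                  ("Developed", "Western"), ("Large", "Asian"),
                  ("Large", "Developed"), ("Top", "Western"), ("Top", "Developed")])
centrality = nx.betweenness_centrality(g, normalized=True)
pure = [m for m, c in sorted(centrality.items(), key=lambda kv: kv[1]) if c < 0.05]
print(centrality)
print("candidate pure modifiers:", pure)   # "Large" and "Top" come out with low centrality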

Head-Constraints Mining [Wang et al 2014b]
• A term can be a head in some cases and a constraint in others
• E.g., "Seattle hotel" (Seattle: constraint, hotel: head) vs. "Seattle hotel job" (Seattle: constraint, hotel: constraint, job: head)

Head-Constraints Mining: Acquiring Concept Patterns
(figure) Building the concept pattern dictionary from query logs (e.g., "cover for iphone 6s", "battery for sony a7r", "wicked on broadway"):
  1. Get entity pairs from the query log by extracting preposition patterns — "A for B", "A of B", "A with B", "A in B", "A on B", "A at B", … — where entity 1 is the head and entity 2 is the constraint
  2. Conceptualize each entity (entity 1 → concept11, concept12, concept13, concept14; entity 2 → concept21, concept22, concept23)
  3. Enumerate the concept pairs — (concept11, concept21), (concept11, concept22), (concept11, concept23), … — into the Concept Pattern Dictionary

Why Concepts Can't Be Too General
• It may cause too many concept pattern conflicts — we can't distinguish head and modifier for general concept pairs:
  Derived concept pattern: device (head) / company (modifier); supporting entity pairs: (iphone 4, verizon), (modem, comcast), (wireless router, comcast), (iphone 4, tmobile)
  Derived concept pattern: company (head) / device (modifier); supporting entity pairs: (amazon books, kindle), (netflix, touchpad), (skype, windows phone), (netflix, ps3)
  → Conflict

Why Concepts Can't Be Too Specific
• It may generate concepts with little coverage
  • The concept regresses to the entity
  • Large storage space: up to (millions × millions) of patterns, e.g., (device, largest desktop OS vendor), (device, largest software development company), (device, largest global corporation), (device, latest windows and office provider), …
Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept Patterns (table: cluster size, sum of cluster score, head/constraint, score)
The largest clusters (head/constraint) include: breed/state, game/platform, accessory/vehicle, browser/platform, requirement/school, drug/disease, cosmetic/skin condition, job/city, accessory/phone, software/platform, test/disease, clothes/breed, penalty/crime, tax/state, sauce/meat, credit card/country, food/holiday, mod/game, garment/sport, career information/professional, song/instrument, bait/fish, study guide/book, plugins/browser, recipe/meat, currency/country, lens/camera, decoration/holiday, food/animal.
Example — Game (Head) / Platform (Modifier): the cluster groups patterns such as game/platform, game/device, video game/platform, game console/game pad, game/gaming platform, with supporting entity pairs (angry birds, android), (angry birds, ios), (angry birds, windows 10), …
Head-Modifier Relationship Detection

• Train a classifier on (head-embedding, modifier-embedding)
• Training data
  • Positive: (head, modifier)
  • Negative: (modifier, head)
• Precision >= 0.9, Recall >= 0.9
• Disadvantage: not interpretable
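A minimal sketch of this direction classifier, assuming scikit-learn and numpy are available: concatenate (head, modifier) embeddings as positives and the reversed pair as negatives, then train a linear classifier. The embeddings below are tiny random stand-ins, not real word vectors.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
emb = {w: rng.normal(size=8) for w in ["game", "platform", "accessory", "vehicle",
                                       "recipe", "meat", "lens", "camera"]}
pairs = [("game", "platform"), ("accessory", "vehicle"),
         ("recipe", "meat"), ("lens", "camera")]            # (head, modifier)

X, y = [], []
for head, mod in pairs:
    X.append(np.concatenate([emb[head], emb[mod]])); y.append(1)   # correct order
    X.append(np.concatenate([emb[mod], emb[head]])); y.append(0)   # reversed = negative
clf = LogisticRegression(max_iter=1000).fit(X, y)

test = np.concatenate([emb["game"], emb["platform"]])
print(clf.predict([test]))   # 1 => the first term is predicted to be the head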

Syntactic Parsing based on Head-Modifier (HM)
• The information is incomplete
  • Prepositions and other function words
  • Within a noun compound: "el capitan macbook pro"
• Why not train a parser for web queries?

Syntactic Parsing of Short Texts [Sun et al EMNLP 2016]
• Syntactic structures are valuable for short text understanding
• Examples
Challenges: Short Texts Lack Grammatical Signals
• They lack function words and word order
  • "toys queries" has ambiguous intent
  • "distance earth moon" has clear intent, but many equivalent forms: "earth moon distance", "earth distance moon", …

Challenges: Syntactic Parsing of Queries
• No standard
• No ground truth
Why is syntactic parsing of queries even a legitimate problem?

Derive Syntax from Semantics [Sun et al 2016]
• Query: "thai food houston"
• Take a clicked sentence for the query
• Project its dependency structure onto the query
A Treebank for Short Texts
• Given a query q and q's clicked sentence s
• Parse each s
• Project the dependencies from s to q
• Aggregate the dependencies

Algorithm of Projection

Result Examples

Results
• Random queries: QueryParser UAS 0.83, LAS 0.75; Stanford UAS 0.72, LAS 0.64
• Queries with no function words: QueryParser UAS 0.82, LAS 0.73; Stanford UAS 0.70, LAS 0.61
• Queries with function words: QueryParser UAS 0.90, LAS 0.85; Stanford UAS 0.86, LAS 0.80

Short Text Understanding
• How to segment this short text?
• What does this short text mean (its intent, senses, or concepts)?
• What are the relations among terms in the short text?
• How to calculate the similarity between short texts?

Short Text Similarity Using Word Embeddings [Kenter and de Rijke 2015]
• Measuring similarity between two short texts or sentences
• Basic idea: word-by-word comparison using embedding vectors
• Use a saliency-weighted semantic graph to compute similarity

Short Text Similarity Using Word Embeddings [Kenter and de Rijke 2015]
• Features acquired: bins of all edges, bins of max edges
• Similarity measurement, inspired by BM25, between a longer short text $s_l$ and a shorter one $s_s$, with semantic similarity $sem(w, s_s)$ between a term $w$ and the short text $s_s$:
  $f_{sts}(s_l, s_s) = \sum_{w \in s_l} IDF(w) \cdot \frac{sem(w, s_s)\cdot(k_1 + 1)}{sem(w, s_s) + k_1\cdot\left(1 - b + b\cdot\frac{|s_s|}{avgl}\right)}$
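A minimal sketch of evaluating the formula above, taking sem(w, s) as the maximum cosine similarity between w and any word of s; real pre-trained embeddings and corpus IDF values would be needed for meaningful scores — the ones below are toy stand-ins, and this omits the paper's learned feature bins.

import numpy as np

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def f_sts(s_l, s_s, emb, idf, k1=1.2, b=0.75, avgl=3.0):
    score = 0.0
    for w in s_l:
        sem = max(cos(emb[w], emb[x]) for x in s_s)          # semantic match of w against s_s
        score += idf.get(w, 1.0) * (sem * (k1 + 1)) / (sem + k1 * (1 - b + b * len(s_s) / avgl))
    return score

rng = np.random.default_rng(1)
emb = {w: rng.normal(size=16) for w in ["cheap", "flights", "low", "cost", "airline", "recipe"]}
idf = {"cheap": 1.5, "flights": 2.0, "low": 1.2, "cost": 1.3, "airline": 2.2, "recipe": 2.5}
print(f_sts(["cheap", "flights"], ["low", "cost", "airline"], emb, idf))
print(f_sts(["cheap", "flights"], ["recipe"], emb, idf))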

From the Concept View [Wang et al 2015a]
(figure) Each short text is conceptualized — parsing, term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, concept orthogonalization — against the semantic (isA) network and the co-occurrence network, yielding bags of concepts: Concept Vector 1 = [(c1, score1), (c2, score2), …] and Concept Vector 2 = [(c1', score1'), (c2', score2'), …]. Similarity is then computed between the two concept vectors.

Outline
• Knowledge Bases
• Explicit Representation Models
• Applications

Applications
• Explicit short text understanding benefits a lot of application scenarios:
  • Ads/search semantic match
  • Definition mining
  • Query recommendation
  • Web table understanding
  • Semantic search
  • …

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

(charts) Results by decile (Decile 4 through Decile 10) for mainline ads (y-axis 0.00–6.00) and sidebar ads (y-axis 0.00–0.60).

Definition Mining [Hao et al 2016]
• Definition scenarios: search engines, QnA, etc.
• Why conceptualization is useful for definition mining — example: "What is Emphysema?"
  • Answer 1: "Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year."
  • Answer 2: "Emphysema is an incurable, progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing."
• Each sentence has the form of a definition; embedding is helpful to some extent, but it also returns a high similarity score for (emphysema, disease) and (emphysema, smoking)
• Conceptualization can provide strong semantics; contextual embedding can also provide semantic similarity beyond is-A

Definition Mining [Hao et al 2016]

Concept-based Short Text Classification and Ranking [Wang et al 2014a]
(figure) Offline: training data → concept weighting → model learning → one concept model per class (Class 1 … Class i … Class N). Online: an original short text (e.g., "justin bieber graduates") → entity extraction → conceptualization against the knowledge base → concept vector → candidate generation → classification & ranking (e.g., <Music, score>).

Concept-based Short Text Classification and Ranking [Wang et al 2014a] (cont.)
(figures) A category (e.g., TV) is represented in the concept space by conceptualizing the article titles/tags in that category into weighted concept points $p_i$, $p_j$ with weights $\omega_i$, $\omega_j$; other categories (Music, Movie, …) are represented the same way, and a query is mapped into the same concept space and scored against each category.

Precision performance on each category [Wang et al 2014a]
  Category | BocSTC | LM_ch | SVM  | VSM_cosine | LM_d | Entity_ESA
  Movie    | 0.71   | 0.91  | 0.84 | 0.81       | 0.72 | 0.56
  Money    | 0.97   | 0.95  | 0.54 | 0.57       | 0.52 | 0.74
  Music    | 0.97   | 0.90  | 0.88 | 0.73       | 0.68 | 0.58
  TV       | 0.96   | 0.46  | 0.92 | 0.56       | 0.51 | 0.55

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

• [Stark et al 1998] Michael M. Stark and Richard F. Riesenfeld. WordNet: An Electronic Lexical Database. Proceedings of 11th Eurographics Workshop on Rendering, 1998.

• [Banko et al 2007] Michele Banko, Michael J. Cafarella, Stephen Soderland, Matt Broadhead and Oren Etzioni. Open Information Extraction from the Web. In IJCAI, 2007.

• [Etzioni et al 2011] Oren Etzioni, Anthony Fader, Janara Christensen, Stephen Soderland and Mausam Mausam. Open Information Extraction: The Second Generation. In IJCAI, vol. 11, pp. 3-10, 2011.

• [Carlson et al 2010] A. Carlson, J. Betteridge, B. Kisiel, B. Settles, E. R. Hruschka Jr. and T. M. Mitchell. Toward an Architecture for Never-Ending Language Learning. In Proceedings of the Conference on Artificial Intelligence (AAAI), 2010.

• [Wu et al 2012] Wentao Wu, Hongsong Li, Haixun Wang and Kenny Zhu. Probase: A Probabilistic Taxonomy for Text Understanding. In ACM International Conference on Management of Data (SIGMOD), May 2012.

• [Bollacker et al 2008] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, Jamie Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In SIGMOD, 2008.

• [Auer et al 2007] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, Zachary G. Ives. DBpedia: A Nucleus for a Web of Open Data. In ISWC/ASWC, 2007.

References

• [Suchanek et al 2007] Fabian M. Suchanek, Gjergji Kasneci, Gerhard Weikum. Yago: a core of semantic knowledge. In WWW, 2007.

• [Wu et al 2015] Sen Wu, Ce Zhang, Christopher De Sa, Jaeho Shin, Feiran Wang and C. Ré. Incremental Knowledge Base Construction Using DeepDive. In VLDB, 2015.

• [Navigli et al 2012] R. Navigli and S. Ponzetto. BabelNet: The Automatic Construction, Evaluation and Application of a Wide-Coverage Multilingual Semantic Network. In Artificial Intelligence, 2012.

• [Nastase et al 2010] Vivi Nastase, Michael Strube, Benjamin Börschinger, Cäcilia Zirn and Anas Elghafari. WikiNet: A very large scale multi-lingual concept network. In LREC, 2010.

• [Speer et al 2013] Robert Speer and Catherine Havasi. ConceptNet 5: A large semantic network for relational knowledge. The People's Web Meets NLP, Springer Berlin Heidelberg, 2013.

• [Hua et al 2016] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng and Xiaofang Zhou. Understand Short Texts by Harvesting and Analyzing Semantic Knowledge. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2016.

• [Hua et al 2015] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng and Xiaofang Zhou. Short Text Understanding Through Lexical-Semantic Analysis. In International Conference on Data Engineering (ICDE), April 2015.

References

• [Li et al 2013] Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang and Xindong Wu. Computing term similarity by large probabilistic isA knowledge. In ACM International Conference on Information and Knowledge Management (CIKM), 2013.

• [Li et al 2015] Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang, Xue-Gang Hu and Xindong Wu. A Large Probabilistic Semantic Network based Approach to Compute Term Similarity. In IEEE Transactions on Knowledge and Data Engineering (TKDE), 27(10): 2604-2617, 2015.

• [Rosch et al 1976] Eleanor Rosch, Carolyn B. Mervis, Wayne D. Gray, David M. Johnson and Penny Boyes-Braem. Basic objects in natural categories. Cognitive Psychology, 8(3): 382–439, 1976.

• [Manning and Schutze 1999] Christopher D. Manning and Hinrich Schutze. Foundations of Statistical Natural Language Processing. Volume 999, MIT Press, 1999.

• [Wang et al 2015b] Zhongyuan Wang, Kejun Zhao, Haixun Wang, Xiaofeng Meng and Ji-Rong Wen. Query Understanding through Knowledge-Based Conceptualization. In IJCAI, July 2015.

• [Bergsma et al 2007] Shane Bergsma, Qin Iris Wang. Learning Noun Phrase Query Segmentation. In EMNLP-CoNLL 2007: 819-826.

• [Tan et al 2008] Bin Tan, Fuchun Peng. Unsupervised query segmentation using generative language models and Wikipedia. In WWW 2008: 347-356.

References

• [Li et al 2011] Yanen Li, Bo-June Paul Hsu, ChengXiang Zhai, Kuansan Wang. Unsupervised query segmentation using clickthrough for information retrieval. In SIGIR 2011: 285-294.

• [Guo et al 2009] Jiafeng Guo, Gu Xu, Xueqi Cheng, Hang Li. Named entity recognition in query. In SIGIR 2009: 267-274.

• [Pantel et al 2012] Patrick Pantel, Thomas Lin, Michael Gamon. Mining Entity Types from Query Logs via User Intent Modeling. In ACL 2012: 563-571.

• [Joshi et al 2014] Mandar Joshi, Uma Sawant, Soumen Chakrabarti. Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries. In EMNLP 2014: 1104-1114.

• [Sawant et al 2013] Uma Sawant, Soumen Chakrabarti. Learning joint query interpretation and response ranking. In WWW 2013: 1099-1110.

• [Wang et al 2014b] Zhongyuan Wang, Haixun Wang and Zhirui Hu. Head, Modifier, and Constraint Detection in Short Texts. In International Conference on Data Engineering (ICDE), 2014.

• [Sun et al 2016] Xiangyan Sun, Haixun Wang, Yanghua Xiao, Zhongyuan Wang. Syntactic Parsing of Web Queries. In EMNLP, 2016.

References

• [Kenter and Rijke 2015] Tom Kenter and Maarten de Rijke. Short text similarity with word embeddings. In CIKM, 2015.

• [Wang et al 2015a] Zhongyuan Wang, Haixun Wang, Ji-Rong Wen and Yanghua Xiao. An Inference Approach to Basic Level of Categorization. In CIKM, October 2015.

• [Hao et al 2016] Zehui Hao, Zhongyuan Wang, Xiaofeng Meng and Jun Yan. Combining Language Model with Conceptualization for Definition Ranking. MSR Technical Report, 2016.

• [Wang et al 2014a] Fang Wang, Zhongyuan Wang, Zhoujun Li and Ji-Rong Wen. Concept-based Short Text Classification and Ranking. In CIKM, 2014.

• [Wang et al 2012a] Jingjing Wang, Haixun Wang, Zhongyuan Wang and Kenny Zhu. Understanding Tables on the Web. In International Conference on Conceptual Modeling, October 2012.

• [Wang et al 2012b] Yue Wang, Hongsong Li, Haixun Wang and Kenny Zhu. Toward Topic Search on the Web. In International Conference on Conceptual Modeling, October 2012.



What is the Semantic Similaritybull Are the following instance pairs similar

bull ltapple microsoftgt

bull ltapple peargt

bull ltapple fruitgt

bull ltapple foodgt

bull ltapple ipadgt

bull ltcar journeygt

Approaches on Term Similarity

bull Categories of approaches for semantic similaritybull String based approach

bull Knowledge based approachbull Use preexisting thesauri taxonomy or encyclopedia such as

WordNet

bull Corpus based approachbull Use contexts of terms extracted from web pages web search

snippets or other text repositories

bull Embedding based approachbull Will introduce in detail in ldquoPart 3 Implicit Understandingrdquo

79

Approaches on Term Similarity (2)

bull Categories

80

Knowledge based approaches

(WordNet)

Corpus based

approaches

Path lengthlexical

chain-based

Information

content-based

Graph learning

algorithm basedSnippet search based

Rada

1989

Resnik

1995

Jcn

1997

Lin

1998

Saacutench

2011

Agirre

2010Alvarez

2007

String based

approaches

HunTray

2005

Hirst

1998

Do

2009

Bol

2011Chen

2006

State-of-the-art approaches

Ban

2002

bull Framework

83

Term Similarity Using Semantic Networks [Li et al 2013 Li et al 2015]

Term pairs ltt1 t2gt

Type Checking

Concept Pairs Entity Pairs

Entity-distribution Context Collection

Concept-distribution Context Collection

Concept-Entity Pairs

Concept Collection for the Entity Term t1

Similarity EvaluationCosine(T(t1) T(t2))

for each pairltt2cxgt

Context vector T(t1) and T(t2)

Get maxsim(t2cx) for ltt1 t2gt

End

End

Concept Clustering

Cluster Context vector Cx(t1) and Cy(t2)

Similarity Evaluation

Max(xy) Cosine(Cx(t1) Cy(t2))

End

Concept Clustering

for each Cluster Ci(t1)

Select top k Concept namely cx

Step 1 Type Checking

Step 2 Context Representation(Vector)

Step 3 Context Similarity

An example [Li et al 2013 Li et al 2015]

For example

ltbanana peargt

88

ltbanana peargt

Entity PairsType Checking

Concept Context Collection

Similarity Evaluation Cosine(T(t1) T(t2)) 0916

Step 1 Type Checking

Step 2 Context Representation(Vector)

Step 3 Context Similarity

ExamplesTerm 1 Term 2 Similarity

lunch dinner 09987

tiger jaguar 09792

car plane 09711

television radio 09465

technology company microsoft 08208

high impact sport competitive sport 08155

employer large corporation 05353

fruit green pepper 02949

travel meal 00426

music lunch 00116

alcoholic beverage sports equipment 00314

company table tennis 00003

96httpadaptseieesjtueducnsimilaritySimCompleteResultspdf

Statistics of Search Queries

44

29

17

7

2 1

1 Term2 Terms3 Terms4 Terms5 Termsmore than 5 Terms

10

26

34

19

74

1 Term2 Terms3 Terms4 Terms5 Termsmore than 5 Terms

(a) By traffic

(b) By of distinct queries

Pokeacutemon Go Microsoft HoloLens

Instance 1 Instance 21 Instance2 Instances3 Instances4 Instances5 InstancesMore than 5 Instances

1 Instance2 Instances3 Instances4 Instances5 InstancesMore than 5 Instances

If the short text has context for the instancehellip

bull python tutorialbull dangerous pythonbull moon earth distancebull hellip

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Supervised Segmentation [Bergsma et al 2007]

bull Problem divide query into semantic units

bull Approach turn segmentation into position-based binary classification

Example Query

Two man power saw

[two man] [power saw][two] [man] [power saw][two] [man power] [saw]

Input a query and its positions

Output the decision for making segmentation at each position

Supervised Segmentation

bull Featuresbull Decision boundary features

bull Statistical features

bull Context features

bull Dependency features

eg Indicators thePOS tags in query isPosition features forwardbackward

Mutual information between left and right parts

Bank loan amortization schedule

Context information

bus driverfemale

depend

Supervised Segmentation

bull Segmentation Overview

saw

SVMclassifier

Input query two man power saw

two man power

Output segmentation decision for each position (yesno)

learning features

Unsupervised Segmentation [Tan et al 2008]

bull Unsupervised learning for query segmentation

Probability of generated segmentation S for query Q

119875 119878119876 = 119875 1199041 P 1199042|1199041 hellipP 119904119898 11990411199042hellip119904119898minus1

asympෑ

119904119894isin119878

119875(119904119894)Unigram model

segments

Valid segment boundary if and only if the pointwise mutual information between the two segments resulting from the split is negative

new york times subscription

1199041 1199042

119872119868 119904119896 119904119896+1 = log119875119888([119904119896 119904119896+1])

119875119888 119904119896 ∙ 119875119888 (119904119896+1)lt 0

Example log119875119888([119899119890119908 119910119900119903119896])

119875119888( 119899119890119908) ∙ 119875119888 (119910119900119903119896)gt 0

no segment boundary here

Unsupervised Segmentation

bull Find top k segmentations dynamic programming

bull Using EM optimization on the fly

Input query 11990811199082hellip119908119899 concept probability distribution

Output top k segmentations with highest likehood

Words in a query

Exploit Click-through [Li et al 2011]

bull Motivationbull Probabilistic query segmentation

bull Use click-through data

Output top-3 segmentation

[bank of america] [online banking] 0502

bank of america online banking] 0428

[bank of ] [ america] [online banking] 0001

Q -gt URL -gt D query

document

click data

Input Query bank of america online banking

Exploit Click-through

bull Segmentation Model

An interpolated model

global info Click-throughinfo

[credit card] [bank of America]

1 bank of america credit cards contact us overview2 secured visa credit card from bank of america3 credit cards overview find the right bank of america credit card for you

Query

Clicked html documents

global info

Click-through info

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Sense Changes with Different Context

watch harry potter read harry potter age harry potter

Movie Book Character

harry potter walkthrough

Game

Entity Recognition in Query [Guo et al 2009]

bull Motivation

Detect named entity in a short text and categorize it

harry potter walkthrough

Single-named-entity query

Example

(ldquoharry potterrdquo ldquo walkthroughrdquo ldquogamerdquo)

triple lte t cgt

class of entity

context terms

ambiguous term

contextterm class

Entity Recognition in Query

bull Probabilistic Generative Model

Goal Given a query q find triple lte t cgt maximize the probability

Probability to generate triple

assume context only depends on class

Objective given query q find

The problem then becomes how to estimate Pr(e) Pr(c|e) and Pr(t|c)

Eg ldquowalkthroughrdquo only depends on game instead of happy potter

Entity Recognition in Query

bull Probability Estimation by Learning

learning objective

N

1i

iii )ctP(emax

Challenge difficult as well as time consuming to manually assign class labels to named entities in queries

Build training set 119879 = (119890119894 119905119894) view 119888119894 as a hidden variable

New Learning problem

N

1i

ii

N

1i

i

N

1i

ii c)|)P(te|P(c)P(emax)tP(emax c

solved with topic model WS-LDA

Signal from Click [Pantel et al 2012]

bull Motivation

Predict entity type in Web search

entity

user intent

context

click

Query type distribution (73 types)

Generative model

entity type

T

TK

K2

Signal from Click

bull Joint Model for Prediction

t

τ

i

n c

θ

φ

ω

Q

Distribution over types

Intent distribution

Pick type

Pick entity

Pick intent

Pick click

Word distribution

Host distribution

Entity distribution

For each Query

Pick context words

Telegraphic Query interpretation [Sawant et al 2013 Joshi et al 2014]

bull Entity-seeking Telegraphic Queries

bull Interpretation = Segmentation + Annotation

Knowledge base Large corpus

accuracy recall

Germany capital

Berlin

Query

Result Entity

bull Overview

Joint Interpretation and Ranking [Sawant et al 2013 Joshi et al 2014]

Annotated Corpus

Telegraphic Query

e1e2e3

Two Models for Interpretation and Ranking

Generative Model

Discriminative Model

Output

bull Generative Model

Joint Interpretation and Ranking [Sawant et al 2013]

San Diego Padres

Major league baseball team

type context

E

T Padres have been to two World

Series losing in 1984 and 1998

Type hint

baseball team

losing team baseball world series 1998

Z

Context matchers

lost 1998 world seriesswitch

model model

q losing team baseball world series 1998

Borrow from U Sawant (2013)

Based on Probabilistic Language Models

bull Discriminative Model

Joint Interpretation and Ranking [Sawant et al 2013]

San_Diego_Padres

losing team baseball world

series 1998

(baseball team)

losing team baseball world

series 1998

(baseball team)

losing team baseball world

series 1998

(t = baseball team)

1998_World_Series

losing team baseball world

series 1998

(series)

losing team baseball world

series 1998

(series)

losing team baseball world

series 1998

(t = series)

Correct entity Incorrect entity

Based on max-margin discriminative learning

bull Queries seek answer entities (e2)

bull Contain (query) entities (e1) target types (t2) relations (r) and selectors (s)

Telegraphic Query Interpretation [Joshi et al 2014]

query | e1 | r | t2 | s
dave navarro first band | dave navarro | band | band | first
dave navarro first band | dave navarro | - | band | first
spider automobile company | spider | automobile company | automobile company | -
spider automobile company | - | automobile company | company | spider

Borrow from M Joshi (2014)

Improved Generative Model

bull Generative Model[Sawant et al 2013]

[Joshi et al 2014]Consider e1

(in q) and r

Improved Discriminative Model

bull Discriminative Model[Sawant et al 2013]

[Joshi et al 2014]

Consider e1

(in q) and r

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

bull Input a short text

bull Output semantic interpretation

bull Three steps in understanding a short text

wanna watch eagles band

watch[verb] eagles[entity](band) band[concept]

wanna watch eagles band watch[verb] eagles[entity](band) band[concept]

watch eagles band watch[verb] eagles[entity] band[concept]

Step 1 Text Segmentation ndash divide into a sequence of terms in vocabulary

Step 2 Type detection ndash determine the best type of each term

Step 3 Concept Labeling ndash infer the best concept of each entity within context

Text segmentation
• Observations:
  • Mutual Exclusion: terms containing the same word mutually exclude each other
  • Mutual Reinforcement: related terms mutually reinforce each other

• Build a Candidate Term Graph (CTG)

"vacation april in paris"  /  "watch harry potter"

(Candidate Term Graphs: for "vacation april in paris" the candidate terms are vacation, april, paris and april in paris; for "watch harry potter" they are watch and harry potter. Edges connect related terms and carry affinity weights such as 0.029, 0.047, 0.041 and 0.092; terms that share a word, e.g. april and april in paris, are mutually exclusive.)

Find best segmentation

• Best segmentation = the sub-graph of the CTG which:
  • is a complete graph (clique)
  • has no mutual exclusion
  • has 100% word coverage (except for stopwords)
  • has the largest average edge weight

Is a segmentation

Best segmentation

(Same candidate term graphs as above: sub-graphs that satisfy the constraints are valid segmentations, and the one with the largest average edge weight is the best segmentation.)

Find best segmentation

• Best segmentation = the sub-graph of the CTG which:
  • is a complete graph (clique)
  • has no mutual exclusion
  • has 100% word coverage (except for stopwords)
  • has the largest average edge weight

Maximal Clique

Best segmentation

(Same candidate term graphs as above: the best segmentation corresponds to a maximal clique with the largest average edge weight. A code sketch of this search follows below.)
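A minimal Python sketch of the clique-based search just described, on a toy candidate term graph; the candidate terms, edge weights and the best_segmentation helper are illustrative assumptions, not the paper's implementation.

```python
# Pick the best segmentation from a Candidate Term Graph: the chosen sub-graph must be
# a clique under the relatedness edges, contain no mutually exclusive terms (terms that
# share a word), cover every non-stopword, and maximize the average edge weight.
from itertools import combinations

STOPWORDS = {"in", "wanna", "the"}

def words(term):
    return set(term.split())

def best_segmentation(query, candidate_terms, edge_weight):
    query_words = {w for w in query.split() if w not in STOPWORDS}
    best, best_score = None, -1.0
    # enumerate all subsets of candidate terms (fine for short queries)
    for r in range(1, len(candidate_terms) + 1):
        for subset in combinations(candidate_terms, r):
            covered = set().union(*[words(t) for t in subset]) - STOPWORDS
            if covered != query_words:
                continue
            # mutual exclusion: two chosen terms must not share a word
            if any(words(a) & words(b) for a, b in combinations(subset, 2)):
                continue
            pairs = list(combinations(subset, 2))
            if any((a, b) not in edge_weight and (b, a) not in edge_weight for a, b in pairs):
                continue                      # not a clique
            score = (sum(edge_weight.get((a, b), edge_weight.get((b, a), 0.0))
                         for a, b in pairs) / len(pairs)) if pairs else 0.0
            if score > best_score:
                best, best_score = subset, score
    return best, best_score

# toy graph for "vacation april in paris"; the weights are made up for illustration
terms = ["vacation", "april", "paris", "april in paris"]
w = {("vacation", "april"): 0.029, ("vacation", "paris"): 0.047,
     ("april", "paris"): 0.041, ("vacation", "april in paris"): 0.005}
print(best_segmentation("vacation april in paris", terms, w))
```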

Type Detection

• Pairwise Model:
  • Find the best typed-term for each term so that the maximum spanning tree of the resulting sub-graph between typed-terms has the largest weight

(Example "watch free movie": the candidate typed-terms are watch[verb] / watch[entity] / watch[concept], free[adjective] / free[verb], and movie[concept] / movie[entity]; the pairwise model selects one type per term.)

Concept Labeling

• Entity disambiguation is the most important task of concept labeling:
  • filter and re-rank the original concept cluster vector

• Weighted-Vote:
  • the final score of each concept cluster combines its original score with the support from the context, using concept co-occurrence

watch harry potter read harry potter

movie book
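A toy Python sketch of the Weighted-Vote step: each candidate concept cluster's final score combines its original conceptualization score with co-occurrence support from the context. All tables and the weighted_vote helper are assumptions for illustration.

```python
# Weighted vote for concept labeling: combine the original cluster score with the
# support it receives from context concepts via concept co-occurrence.
def weighted_vote(candidate_scores, context_concepts, cooccur, alpha=0.5):
    final = {}
    for concept, base in candidate_scores.items():
        support = sum(cooccur.get((concept, ctx), 0.0) for ctx in context_concepts)
        final[concept] = alpha * base + (1 - alpha) * support
    return max(final, key=final.get), final

# "watch harry potter": candidate clusters for "harry potter", context concept of "watch"
candidates = {"movie": 0.45, "book": 0.40, "character": 0.15}
context = ["act of viewing"]            # concept evoked by the verb "watch" (toy label)
cooccur = {("movie", "act of viewing"): 0.8, ("book", "act of viewing"): 0.1}
print(weighted_vote(candidates, context, cooccur))   # "movie" wins with these numbers
```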

Example of Entity Disambiguation[Hua et al 2015 (ICDE Best Paper) Hua et al 2016]

Co-occurrence network

Concept Vector

Semantic network

Short Text

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

(The output is a concept vector (c1, p1), (c2, p2), (c3, p3), … For "ipad apple": the isA concepts of apple (fruit, company, food, product, …) and of ipad (product, device, …) are filtered by concept co-occurrence, which keeps the concepts supported by both terms, such as company, brand, product and device, and drops the fruit / food senses.)

Mining Lexical Relationships[Wang et al 2015b]

bull Lexical knowledge represented by the probabilities

(Semantic graph for "watch harry potter": the term nodes watch and harry potter connect to the role verb and to concepts such as product, book and movie.)

p(verb | watch), p(instance | watch): the role distribution p(z | t)

p(movie | harry potter), p(book | harry potter), p(movie | watch, verb): the concept distribution p(c | t, z)

p(c | e) = p(c | t, z = instance)

(e: instance, t: term, c: concept, z: role)

Understanding Queries [Wang et al 2015b]

• Goal: rank the concepts and find arg max_c p(c | t, q)

The offline semantic network

Query → all possible segmentations → online subgraph

Random walk with restart [Sun et al 2005] on the online subgraph
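A generic random-walk-with-restart sketch over a tiny made-up term/concept subgraph; the adjacency matrix and node labels are assumptions, and the rwr helper follows the standard power-iteration formulation rather than any particular implementation from the paper.

```python
# Random walk with restart (after [Sun et al. 2005]) on a small term/concept subgraph.
import numpy as np

def rwr(adj, restart_idx, c=0.15, iters=100, tol=1e-9):
    """adj: (n, n) nonnegative weight matrix; restart_idx: nodes of the query terms."""
    n = adj.shape[0]
    col_sum = adj.sum(axis=0)
    col_sum[col_sum == 0] = 1.0
    W = adj / col_sum                      # column-normalized transition matrix
    r = np.zeros(n)
    r[restart_idx] = 1.0 / len(restart_idx)
    p = r.copy()
    for _ in range(iters):
        p_new = (1 - c) * W @ p + c * r
        if np.abs(p_new - p).sum() < tol:
            break
        p = p_new
    return p

# toy nodes: 0 "watch", 1 "harry potter", 2 concept "movie", 3 concept "book"
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 1],
                [1, 1, 0, 0],
                [0, 1, 0, 0]], dtype=float)
scores = rwr(adj, restart_idx=[0, 1])
print(scores[2], scores[3])   # "movie" is supported by both query terms, "book" by one
```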

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

bull Example ldquopopular smart cover iphone 5srdquo

• Definition:
  • Head: names the general (semantic) category to which the whole short text belongs; usually the head is the intent of the short text
    • "smart cover": the intent of the query
  • Constraints: distinguish this member from other members of the same category
    • "iphone 5s": limits the type of the head
  • Non-Constraint Modifiers (aka Pure Modifiers): subjective modifiers which can be dropped without changing the intent
    • "popular": subjective, can be neglected

Non-Constraint Modifiers Mining Construct Modifier Networks

Edges form a Modifier Network

Concept Hierarchy Tree in ldquoCountryrdquo domain

Modifier Network in ldquoCountryrdquo domainIn this case ldquoLargerdquo and ldquoToprdquo are pure modifiers

Country

Asian country

Developed country

Western country

Asian Developed Western

Western developed

country

Top western country

Large

Large Top

Top

WesternLarge Asian

country

Large developed

country

Top developed

country

Country

Asian Western

Developed

Large Top

• Betweenness centrality is a measure of a node's centrality in a network

• The betweenness of node v is defined as $g(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$

• where $\sigma_{st}$ is the total number of shortest paths from node s to node t, and $\sigma_{st}(v)$ is the number of those paths that pass through v

• Normalization & Aggregation

• A pure modifier should have a low aggregated betweenness-centrality score PMS(t)

Non-Constraint Modifiers Mining: Betweenness centrality
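A short sketch using networkx to compute betweenness centrality on a small modifier network; the edge list is a hand-made approximation of the "Country" example above, and the aggregation into PMS(t) across domains is left out.

```python
# Score candidate pure modifiers by their betweenness centrality in a modifier network;
# terms with low aggregated betweenness are candidates for pure (droppable) modifiers.
import networkx as nx

G = nx.Graph()
edges = [("Asian", "Country"), ("Developed", "Country"), ("Western", "Country"),
         ("Western", "Developed"), ("Top", "Western"), ("Top", "Developed"),
         ("Large", "Asian"), ("Large", "Developed")]     # assumed toy network
G.add_edges_from(edges)

bc = nx.betweenness_centrality(G, normalized=True)
for term, score in sorted(bc.items(), key=lambda kv: kv[1]):
    print(f"{term:10s} {score:.3f}")
```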

Head-Constraints Mining [Wang et al 2014b]

bull A term can be a head sometimes and be a constraint in some other cases

• E.g., "Seattle hotel" vs. "Seattle hotel job"

In "Seattle hotel", hotel is the head and Seattle the constraint; in "Seattle hotel job", job is the head and both Seattle and hotel are constraints

Head-Constraints Mining Acquiring Concept Patterns

Get entity pairs from query log

Conceptualization

Concept Patterns for each

prepositionsentity1 entity2

Extract Patterns

A for B A of BA with B A in BA on B A at B hellip

entity 1head entity 2constraint

concept11

concept12

concept13

concept14

concept21

concept22

concept23

(concept11 concept21) (concept11 concept22)(concept11 concept23)hellip

Concept Pattern Dictionary

Building concept pattern dictionary Query Logs

cover for iphone 6s / battery for sony a7r / wicked on broadway
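A rough sketch of harvesting concept patterns from such query-log strings; the regular expression, the toy isA lookup and the concept_patterns helper are simplifications of the pipeline described above, not the actual system.

```python
# Build concept-pattern counts from "A for B"-style queries: split on a preposition,
# conceptualize both entities, and count the resulting (head concept, constraint concept) pairs.
import re
from collections import Counter

PREPOSITIONS = r"(?: for | of | with | in | on | at )"

# toy isA lookup standing in for conceptualization against the knowledge base
isa = {"cover": ["accessory"], "iphone 6s": ["phone", "device"],
       "battery": ["accessory", "part"], "sony a7r": ["camera", "device"],
       "wicked": ["musical", "show"], "broadway": ["theater district", "place"]}

def concept_patterns(queries):
    counts = Counter()
    for q in queries:
        parts = re.split(PREPOSITIONS, q)
        if len(parts) != 2:
            continue
        head_entity, constraint_entity = parts[0].strip(), parts[1].strip()
        for c1 in isa.get(head_entity, []):
            for c2 in isa.get(constraint_entity, []):
                counts[(c1, c2)] += 1          # (head concept, constraint concept)
    return counts

queries = ["cover for iphone 6s", "battery for sony a7r", "wicked on broadway"]
print(concept_patterns(queries).most_common(5))
```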

Why Concepts Can't Be Too General
• It may cause too many concept pattern conflicts: we can't distinguish head and modifier for general concept pairs

Head Modifier

Derived Concept Pattern device company

Supporting Entity Pairs iphone 4 verizon

modem comcast

wireless router comcast

iphone 4 tmobile

Head Modifier

Derived Concept Pattern company device

Supporting Entity Pairs amazon books kindle

netflix touchpad

skype windows phone

netflix ps3

Conflict

Why Concepts Can't Be Too Specific
• It may generate concepts with little coverage

• The concept regresses to an entity
• Large storage space: up to (million × million) patterns

hellip hellip

device largest desktop OS vendor

device largest software development company

device largest global corporation

device latest windows and office provider

hellip hellip

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept Patterns

Cluster size | Sum of cluster score | Head / Constraint | Score
615 | 2114691 | breed / state | 357298460224501
296 | 7752357 | game / platform | 627403476771856
153 | 3466804 | accessory / vehicle | 53393705094809
70 | 118259 | browser / platform | 132612807637391
22 | 1010993 | requirement / school | 271407526294823
34 | 9489159 | drug / disease | 154602405333541
42 | 8992995 | cosmetic / skin condition | 814659415003929
16 | 7421599 | job / city | 27903732555528
32 | 710403 | accessory / phone | 246513830851194
18 | 6692376 | software / platform | 210126322725878
20 | 6444603 | test / disease | 239774028397537
27 | 5994205 | clothes / breed | 98773996282851
19 | 5913545 | penalty / crime | 200544192793488
25 | 5848804 | tax / state | 240081818612579
16 | 5465424 | sauce / meat | 183592863621553
18 | 4809389 | credit card / country | 142919087972152
14 | 4730792 | food / holiday | 14554140330924
11 | 4536199 | mod / game | 257163856882439
29 | 4350954 | garment / sport | 471533326845442
23 | 3994886 | career information / professional | 732726483731257
15 | 386065 | song / instrument | 128189481818135
18 | 378213 | bait / fish | 780426514113169
22 | 3722948 | study guide / book | 508339765053921
19 | 3408953 | plugins / browser | 550326072627126
14 | 3305753 | recipe / meat | 882779863422951
18 | 3214226 | currency / country | 110825444188352
13 | 3180272 | lens / camera | 186081673263957
9 | 316973 | decoration / holiday | 130055844126533
16 | 314875 | food / animal | 7338544366514

The "game / platform" cluster contains concept pairs such as (game, platform), (game, device), (video game, platform), (game console, game pad), (game, gaming platform), …

Game (Head) | Platform (Modifier)
angry birds | android
angry birds | ios
angry birds | windows 10
… | …

Detection

Head Modifier Relationship

• Train a classifier on the concatenation (head-embedding, modifier-embedding)

• Training data:
  • Positive: (head, modifier)
  • Negative: (modifier, head)

• Precision >= 0.9, Recall >= 0.9

bull Disadvantage not interpretable
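A sketch of this embedding-based head/modifier classifier idea with scikit-learn; the embeddings and training pairs are random placeholders, so it only illustrates the data layout (concatenated head and modifier vectors, with reversed pairs as negatives), not the reported numbers.

```python
# Train a binary classifier on concatenated (head, modifier) embeddings;
# the reversed order (modifier, head) serves as the negative class.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
emb = {t: rng.normal(size=50) for t in ["game", "platform", "cover", "iphone", "recipe", "meat"]}
pairs = [("game", "platform"), ("cover", "iphone"), ("recipe", "meat")]   # assumed (head, modifier) pairs

X, y = [], []
for h, m in pairs:
    X.append(np.concatenate([emb[h], emb[m]])); y.append(1)   # positive: (head, modifier)
    X.append(np.concatenate([emb[m], emb[h]])); y.append(0)   # negative: reversed order
clf = LogisticRegression(max_iter=1000).fit(np.array(X), y)

test = np.concatenate([emb["game"], emb["platform"]]).reshape(1, -1)
print("P(first term is the head) =", clf.predict_proba(test)[0, 1])
```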

Syntactic Parsing based on HM

• Head-modifier information alone is incomplete:
  • prepositions and other function words are ignored
  • so are relations within a noun compound, e.g. "el capitan macbook pro"

• Why not train a parser for web queries?

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges: Short Texts Lack Grammatical Signals
• Lack of function words and word order
  • "toys queries" has ambiguous intent
  • "distance earth moon" has clear intent, but many equivalent forms: "earth moon distance", "earth distance moon", …

Challenges Syntactic Parsing of Queries

bullNo standard

bullNo ground-truth

Why is syntactic parsing of queries even a legitimate problem

Derive Syntax from Semantics[Sun et al 2016]

bull Query ldquothai food houstonrdquo

bull Clicked sentence

bull Project dependency to the query

A Treebank for Short Texts

bull Given query 119902

bull Given 119902rsquos clicked sentence 119904

bull Parse each 119904

bull Project dependency from 119904 to 119902

bull Aggregate dependencies

Algorithm of Projection
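A simplified sketch of the projection step, using spaCy as a stand-in sentence parser (an assumption; the paper builds its own treebank pipeline): parse the clicked sentence and keep a dependency arc only when both of its words also appear in the query. Arcs collected over many (query, clicked sentence) pairs are then aggregated into the treebank.

```python
# Project dependencies from a clicked sentence onto the query words.
import spacy

nlp = spacy.load("en_core_web_sm")

def project(query, clicked_sentence):
    qwords = set(query.lower().split())
    doc = nlp(clicked_sentence)
    arcs = []
    for tok in doc:
        # keep (head -> child) arcs whose both endpoints occur in the query
        if tok.lower_ in qwords and tok.head.lower_ in qwords and tok.head is not tok:
            arcs.append((tok.head.lower_, tok.dep_, tok.lower_))
    return arcs

print(project("thai food houston",
              "Here is a list of the best Thai food restaurants in Houston."))
```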

Result Examples

Results

• Random queries: QueryParser UAS 0.83, LAS 0.75; Stanford UAS 0.72, LAS 0.64

• Queries with no function words: QueryParser UAS 0.82, LAS 0.73; Stanford UAS 0.70, LAS 0.61

• Queries with function words: QueryParser UAS 0.90, LAS 0.85; Stanford UAS 0.86, LAS 0.80

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

• Measuring similarity between two short texts or sentences

• Basic idea: word-by-word comparison using embedding vectors

• Use a saliency-weighted semantic graph to compute the similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired: bins of all edges, bins of max edges

Similarity measurement (inspired by BM25), where w ranges over the terms of short text $s_l$ and $sem(w, s_s)$ is the semantic similarity of w to the short text $s_s$:

$f_{sts}(s_l, s_s) = \sum_{w \in s_l} IDF(w) \cdot \frac{sem(w, s_s) \cdot (k_1 + 1)}{sem(w, s_s) + k_1 \cdot \left(1 - b + b \cdot \frac{|s_s|}{avgsl}\right)}$
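A small Python sketch of this saliency-weighted, BM25-style similarity with a stand-in embedding table and toy IDF values; sem(w, s) is approximated by the best cosine match of w against the words of the other text.

```python
# Saliency-weighted short-text similarity: IDF-weighted, BM25-style combination of
# per-word semantic matches between the two texts.
import numpy as np

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def f_sts(s_l, s_s, emb, idf, k1=1.2, b=0.75, avgsl=4.0):
    score = 0.0
    for w in s_l.split():
        sem = max((cos(emb[w], emb[u]) for u in s_s.split() if u in emb and w in emb),
                  default=0.0)
        denom = sem + k1 * (1 - b + b * len(s_s.split()) / avgsl)
        score += idf.get(w, 1.0) * sem * (k1 + 1) / denom
    return score

rng = np.random.default_rng(1)
emb = {w: rng.normal(size=25) for w in "cheap flights to paris low cost airfare france".split()}
idf = {"paris": 2.0, "flights": 1.5, "cheap": 1.2}   # toy IDF values
print(f_sts("cheap flights to paris", "low cost airfare france", emb, idf))
```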

From the Concept View

From the Concept View [Wang et al 2015a]

Co-occurrence Network

Bags of Concepts

Semantic Network

Short Text 1

Short Text 2

Concept Vector 1[(c1 score1) (c2 score2)hellip]

Concept Vector 2[(c1rsquo score1rsquo) (c2rsquo score2rsquo)hellip]

Similarity

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

• Explicit short text understanding benefits many application scenarios:
  • Ads / search semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

(Bar charts: keyword-selection results broken down by query decile, Decile 4 through Decile 10, shown separately for Mainline Ads and Sidebar Ads; the y-axis values are omitted.)

Definition Mining [Hao et al 2016]

bull Definition scenarios search engines QnA etc

• Why is conceptualization useful for definition mining?
  • Example: "What is Emphysema?"

Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

• Both sentences have the form of a definition
• Embedding is helpful to some extent, but it also returns a high similarity score for (emphysema, disease) and (emphysema, smoking)

• Conceptualization can provide strong semantics
• Contextual embedding can also provide semantic similarity beyond isA

Answer 1

Answer 2

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

OfflineOffline OnlineOnline

Original Short textjustin bieber graduates

hellip

Knowledge base

Conceptualiztion

Concept Vector

Entity Extraction

Candidates Generation

Classification amp Ranking

Model LearningModel Learning

Concept Weighting

Model Model NModel i

Concept Model Concept Model

Class 1 Class NClass i

TrainingData

ltMusic Scoregt

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept SpaceArticle titlestagsin this category

119901119894

119901119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

hellip

hellip

120596119894

120596119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

120596119894

120596119895119901119894

119901119895

Precision performance on each category [Wang et al 2014a]

Precision | BocSTC | LM_ch | SVM | VSM_cosine | LM_d | Entity_ESA
Movie | 0.71 | 0.91 | 0.84 | 0.81 | 0.72 | 0.56
Money | 0.97 | 0.95 | 0.54 | 0.57 | 0.52 | 0.74
Music | 0.97 | 0.90 | 0.88 | 0.73 | 0.68 | 0.58
TV | 0.96 | 0.46 | 0.92 | 0.56 | 0.51 | 0.55

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

• [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamie Taylor Freebase: a collaboratively created graph database for structuring human knowledge in SIGMOD 2008

• [ Auer et al 2007 ] Sören Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia: A Nucleus for a Web of Open Data In ISWC/ASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

• [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Ré Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

• [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Börschinger Cäcilia Zirn and Anas Elghafari WikiNet: A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

• [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou "Understand Short Texts by Harvesting and Analyzing Semantic Knowledge" IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

• [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and Xindong Wu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10): 2604-2617 2015

• [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny Boyes-Braem Basic objects in natural categories Cognitive Psychology 8(3): 382-439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

• [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddings In CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

• [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny Zhu Toward Topic Search on the Web In International Conference on Conceptual Modeling October 2012


NELL Never-Ending Language Learning [Carlson et al 2010]

• NELL is a research project that attempts to create a computer system that learns over time to read the web; it has been running since January 2010

• It has accumulated over 50 million candidate beliefs by reading the web, held at different levels of confidence

• Out of those 50 million, it has high confidence in 2,817,156 beliefs

Brief Introduction

Statistics

Sample

NELL Never-Ending Language Learning

• It is continually learning facts on the web; the resources are publicly available

• NELL research team at CMU

bull Homepage httprtwmlcmuedurtwbull Download httprtwmlcmuedurtwresources

News

Authors

URLs

Probase [Wu et al 2012]

bull Probase is a semantic network to make machines ldquoawarerdquo of the mental world of human beings so that machines can better understand human communication

Brief Introduction

Probase network

isA isPropertyOf Co-occurrence(concept entities) (attributes) (isCEOof LocatedIn etc)

Concepts Entities(ldquoSpanish Artistsrdquo) (ldquoPablo Picasordquo)

Nodes

Edges

Attributes(ldquoBirthdayrdquo)

VerbsAdjectives(ldquoEatrdquo ldquoSweetrdquo)

• 5,401,933 unique concepts
• 12,551,613 unique instances
• 87,603,947 isA relations

countries Basic watercolor techniques

Celebrity wedding dress designers

Probase

bull Microsoft Research

bull Public release coming soon in AugSept 2016 bull Project homepage httpresearchmicrosoftcomprobase

Concepts

Authors

URLs

Probase isA error rate: < 1.1%, and < 10% for random pairs

Freebase [Bollacker et al 2008]

bull Freebase is a well-known collaborative knowledge base consisting of data composed mainly by its community

• Freebase contains more than 23 million entities
• Freebase contains 1.9 billion triples
• Each triple is organized in the form <subject> <predicate> <object>

Brief Introduction

Statistics

bull Freebase is a collection of factsbull Freebase only contains nodes

and linksbull Freebase is a labeled graph

Freebase -gt Wiki Data

• Freebase data was integrated into Wikidata
• The Freebase API will be completely shut down on Aug 31, 2016,

replaced by Google Knowledge Graph API

bull Freebase Community

bull Homepage httpwikifreebasecomwikiMain_Pagebull Download httpsdevelopersgooglecomfreebasebull Wikidata httpswwwwikidataorg

News

Authors

URLs

Google Knowledge Graph

bull Knowledge Graph is a knowledge base used by Google to enhance its search engines search results with semantic-search information gathered from a wide variety of sources

bull 570 million objects and more than 18 billion facts about relationshipsbetween different objects

bull Google Inc

bull Homepage httpswwwgooglecomintles419insidesearchfeaturessearchknowledgehtml

Brief Introduction

Statistics

Sample

Authors

URLs

YAGO [Suchanek et al 2007]

bull YAGO is a huge semantic knowledge base derived from GeoNames WordNet and Wikipedia (10 Wikipedias in different languages)

• More than 10 million entities (persons, organizations, cities, etc.)
• More than 120 million facts about entities
• More than 35,000 classes assigned to entities
• Many of its facts and entities are attached to a temporal dimension and a spatial dimension

Brief Introduction

SampleltAlbert_Einsteingt ltisMarriedTogt ltElsa_Einsteingt

Statistics

YAGO

Newsbull An evaluated version of YAGO3 (Combining information from Wikipedia from different

languages) is released [15 Sep 2015]

Authors
• Max Planck Institute for Informatics in Saarbrücken, Germany, and the DBWeb group at Télécom ParisTech University

URLsbull Homepage httpwwwmpi-infmpgdedepartmentsdatabases-and-

information-systemsresearchyago-nagayagobull Download httpwwwmpi-infmpgdedepartmentsdatabases-and-

information-systemsresearchyago-nagayagodownloads

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Statistics of Search Queries

(Pie charts of query length. (a) By traffic: 1 term 44%, 2 terms 29%, 3 terms 17%, 4 terms 7%, 5 terms 2%, more than 5 terms 1%. (b) By # of distinct queries: 1 term 10%, 2 terms 26%, 3 terms 34%, 4 terms 19%, 5 terms 7%, more than 5 terms 4%.)

(A companion pair of charts breaks queries down by the number of instances they contain: 1 instance, 2 instances, …, more than 5 instances; e.g. "Pokémon Go", "Microsoft HoloLens".)

If the short text is a single instance…
• Python
• Microsoft
• Apple
• …

Single Instance Understanding

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

Word Ambiguity bull Word sense disambiguation rely on dictionaries

(WordNet)

Take a seat on this chair

The chair of the Math Department

Instance Ambiguity

bull Instance sense disambiguation extra knowledge needed

I have an apple pie for lunch

He bought an apple ipad

Here ldquoapplerdquo is a proper noun

Ambiguity [Hua et al 2016]

bull Many instances are ambiguous

bull Intuition ambiguous instances have multiple senses

short text | instance | sense
population china | china | country
glass vs china | china | fragile item
pear apple | apple | fruit
microsoft apple | apple | company
read harry potter | harry potter | book
watch harry potter | harry potter | movie
age of harry potter | harry potter | character

Pre-definition for Ambiguity (1) Sense [Hua et al 2016]

bull What is a Sense in semantic networksbull A sense as a hierarchy of concept clusters

region

country state city

creature

animal

predator

crop food

fruit vegetable meat

Germany

Pre-definition for Ambiguity (2) Concept Cluster [Li et al 2013 Li et al 2015]

bull What is a Concept Cluster (CL)bull Cluster similar concepts into a concept cluster using K-

Means like approach (k-Medoids)

Concept cluster "Fruit": fruit, fresh fruit, juice, tropical fruit, berry, exotic fruit, seasonal fruit, fruit juice, citrus fruit, soft fruit, dry fruit, wild fruit, local fruit, …

Concept cluster "Company": company, client, firm, manufacturer, corporation, large company, rival, giant, big company, local company, large corporation, international company, …

Definitions of Instance Ambiguity [Hua et al 2016]

• 3 levels of instance ambiguity
  • Level 0: unambiguous
    • Contains only 1 sense
    • E.g. dog (animal), beijing (city), potato (vegetable)
  • Level 1: both the unambiguous and the ambiguous reading make sense
    • Contains 2 or more senses, but these senses are related
    • E.g. google (company & search engine), french (language & country), truck (vehicle & public transport service)
  • Level 2: ambiguous
    • Contains 2 or more senses, and the senses are very different from each other
    • E.g. apple (fruit & company), jaguar (animal & company), python (animal & language)

bull Using top-2 senses to calculate the ambiguity score

119904119888119900119903119890 =

0 119897119890119907119890119897 = 0119908 1199042 119890

119908 1199041 119890lowast 1 minus 119904119894119898119894119897119886119903119894119905119910 1199041 1199042 119897119890119907119890119897 = 1

score = 1 +119908(1199041198882|119890)

119908(1199041198881|119890)lowast 1 minus 119904119894119898119894119897119886119903119894119905119910 1199041198881 1199041198882 119897119890119907119890119897 = 2

Denote top-2 senses as 1199041 and 1199042 top-2 sense clusters as 1199041198881 and 1199041198882 Denote similarity of two sense clusters as the maximum similarity of their senses

119904119894119898119894119897119886119903119894119905119910 1199041198881 1199041198882 = 119950119938119961119904119894119898119894119897119886119903119894119905119910(119904119894 isin 1199041198881 119904119895 isin 1199041198882) For an entity 119890 denote the weight (popularity) of a sense 119904119894 as the sum of weights of its concept clusters

119908 119904119894|119890 = 119908 119867119894|119890 =119862119871119895isin119867119894

119875(119862119871119895|119890)

For an entity 119890 denote the weight (popularity) of a sense cluster 119904119888119894 as the sum of weights of its senses

119908 119904119888119894 119890 =119904119895isin119904119888119894

119908(119904119895|119890)

Examples

• Level 0
  • california: country, state, city, region, institution (0.943)
  • fruit: food, product, snack, carbs, crop (0.827)
  • alcohol: substance, drug, solvent, food, addiction (0.523)
  • computer: device, product, electronics, technology, appliance (0.537)
  • coffee: beverage, product, food, crop, stimulant (0.73)
  • potato: vegetable, food, crop, carbs, product (0.896)
  • bean: food, vegetable, crop, legume, carbs (0.801)

Examples (cont.)
• Level 1
  • nike, score = 0.034: company, store (0.861); brand (0.035); shoe, product (0.033)
  • twitter, score = 0.035: website, tool (0.612); network (0.165); application (0.033); company (0.031)
  • facebook, score = 0.037: website, tool (0.595); network (0.17); company (0.053); application (0.029)
  • yahoo, score = 0.38: search engine (0.457); company, provider, account (0.281); website (0.0656)
  • google, score = 0.507: search engine (0.46); company, provider, organization (0.377); website (0.0449)

Examples (cont.)
• Level 2
  • jordan, score = 1.02: country, state, company, regime (0.92); shoe (0.02)
  • fox, score = 1.09: animal, predator, species (0.74); network (0.064); company (0.035)
  • puma, score = 1.15: brand, company, shoe (0.655); species, cat (0.116)
  • gold, score = 1.21: metal, material, mineral resource, mineral (0.62); color (0.128)

Examples (cont.)
• Level 2
  • soap, score = 1.22: product, toiletry, substance (0.49); technology, industry standard (0.11)
  • silver, score = 1.24: metal, material, mineral resource, mineral (0.638); color (0.156)
  • python, score = 1.29: language (0.667); snake, animal, reptile, skin (0.193)
  • apple, score = 1.41: fruit, food, tree (0.537); company, brand (0.271)

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

A Concept View of ldquoMicrosoftrdquo

company

largest desktop OS vendor

softwarecompany

international company

technology leader

Microsoft

largest desktop OS vendorcompany hellip hellip

software company

Basic-level Conceptualization (BLC)[Rosch et al 1976]

KFC

BMW

Basic-level conceptualization

How to Make BLC

• Naive approaches:
  • Typicality: an important measure for understanding the relationship between an object and its concept
  • Pointwise Mutual Information (PMI): a common measure of the strength of association between two terms

bird

Naive Approach 1 Typicality

P(robin | bird) > P(penguin | bird): "robin" is a more typical bird than "penguin"

(country: Seychelles vs. USA)

P(USA | country) > P(Seychelles | country): "USA" is a more typical country than "Seychelles"

penguinrobin

Using Typicality for BLC

• Associate each isA relationship (e isA c) with typicality scores P(e|c) and P(c|e):

$P(e \mid c) = \frac{n(c, e)}{n(c)}, \qquad P(c \mid e) = \frac{n(c, e)}{n(e)}$

bull P(e|c) indicates how typical (or popular) e is in the given concept c

bull P(c|e) indicates how typical (or popular) the concept c is given e

bull However

Microsoft

largest desktop OS vendorcompanyhigh typicality p(c|e) high typicality p(e|c)

Naive Approach 2 PMI[Manning and Schutze 1999]

bull Pointwise mutual information (PMI) is a measure of association used in information theory and statistics

bull Consider using the PMI between concept c and instance e to find the basic-level concepts as follows

$PMI(e, c) = \log \frac{P(e, c)}{P(e)\,P(c)} = \log P(e \mid c) - \log P(e)$

• However, in basic-level categorization we are interested in finding a concept for a given e, which means P(e) is a constant
• Thus ranking by PMI(e, c) is the same as ranking by P(e | c)

Using Rep(e c) for BLC [Wang et al 2015b]

• The measure $Rep(e, c) = P(c \mid e) \cdot P(e \mid c)$ means:

• (With PMI) Taking the logarithm of the scoring function:

$\log Rep(e, c) = \log\big(P(c \mid e)\,P(e \mid c)\big) = \log\left(\frac{P(e, c)}{P(e)} \cdot \frac{P(e, c)}{P(c)}\right) = \log\frac{P(e, c)^2}{P(e)\,P(c)} = PMI(e, c) + \log P(e, c)$

i.e. the $PMI^2$ measure

• (With Commute Time) The expected commute time between an instance e and a concept c is

$Time(e, c) = \sum_{k=1}^{\infty} 2k \cdot P_k(e, c) = \sum_{k=1}^{T} 2k \cdot P_k(e, c) + \sum_{k=T+1}^{\infty} 2k \cdot P_k(e, c) \ge \sum_{k=1}^{T} 2k \cdot P_k(e, c) + 2(T+1)\Big(1 - \sum_{k=1}^{T} P_k(e, c)\Big)$

With $T = 1$ and $P_1(e, c) = P(c \mid e) \cdot P(e \mid c)$ (the probability of the length-2 round trip $e \to c \to e$), this bound becomes $2\,Rep(e, c) + 4\,(1 - Rep(e, c)) = 4 - 2 \cdot Rep(e, c)$, so maximizing Rep(e, c) minimizes a bound on the expected distance

Given e the c should be its typical concept (shortest distance)

Given c the e should be its typical instance (shortest distance)

A process of finding concept nodes having shortest expected distance with e
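A small Python sketch of ranking concepts by Rep(e, c) = P(c|e) * P(e|c) from isA co-occurrence counts; the counts below are invented for illustration.

```python
# Basic-level conceptualization via Rep(e, c) = P(c|e) * P(e|c),
# where both conditionals come from isA co-occurrence counts n(c, e).
from collections import defaultdict

n = {("company", "microsoft"): 9000, ("software company", "microsoft"): 5000,
     ("largest desktop OS vendor", "microsoft"): 60,
     ("company", "apple"): 8000, ("fruit", "apple"): 7000}    # toy counts

n_c, n_e = defaultdict(int), defaultdict(int)
for (c, e), cnt in n.items():
    n_c[c] += cnt
    n_e[e] += cnt

def rep(e):
    scores = {}
    for (c, ee), cnt in n.items():
        if ee == e:
            scores[c] = (cnt / n_e[e]) * (cnt / n_c[c])   # P(c|e) * P(e|c)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(rep("microsoft"))
# with these toy counts, "software company" comes out on top: it is a typical concept
# of microsoft AND microsoft is a typical instance of it, whereas the overly broad
# "company" and the overly specific "largest desktop OS vendor" rank lower
```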

Evaluations on Different Measures for BLC: Precision and NDCG at k = 1, 2, 3, 5, 10, 15, 20

First table (k = 1 / 2 / 3 / 5 / 10 / 15 / 20)
No smoothing
MI(e) 0.769 0.692 0.705 0.685 0.719 0.705 0.690
PMI3(e) 0.885 0.769 0.756 0.800 0.754 0.733 0.721
NPMI(e) 0.692 0.692 0.667 0.638 0.627 0.610 0.610
Typicality P(c|e) 0.462 0.577 0.603 0.577 0.569 0.564 0.556
Typicality P(e|c) 0.500 0.462 0.526 0.523 0.523 0.510 0.521
Rep(e) 0.846 0.865 0.872 0.862 0.758 0.731 0.719
Smoothing = 0.001
MI(e) 0.577 0.615 0.628 0.600 0.612 0.605 0.592
PMI3(e) 0.731 0.673 0.692 0.654 0.669 0.644 0.623
NPMI(e) 0.923 0.827 0.769 0.746 0.731 0.695 0.671
Typicality P(c|e) 0.462 0.577 0.603 0.577 0.569 0.564 0.554
Typicality P(e|c) 0.885 0.865 0.872 0.831 0.785 0.741 0.704
Rep(e) 0.846 0.731 0.718 0.723 0.700 0.669 0.638
Smoothing = 0.0001
MI(e) 0.615 0.615 0.654 0.608 0.635 0.628 0.612
PMI3(e) 0.846 0.731 0.731 0.715 0.723 0.685 0.677
NPMI(e) 0.885 0.904 0.885 0.869 0.823 0.777 0.752
Typicality P(c|e) 0.462 0.577 0.603 0.577 0.569 0.564 0.556
Typicality P(e|c) 0.885 0.904 0.910 0.877 0.831 0.813 0.777
Rep(e) 0.923 0.846 0.833 0.815 0.781 0.736 0.719
Smoothing = 1e-5
MI(e) 0.615 0.635 0.667 0.662 0.677 0.656 0.646
PMI3(e) 0.885 0.769 0.744 0.777 0.758 0.731 0.710
NPMI(e) 0.885 0.846 0.872 0.869 0.831 0.810 0.787
Typicality P(c|e) 0.462 0.577 0.603 0.577 0.569 0.564 0.556
Typicality P(e|c) 0.769 0.808 0.846 0.823 0.808 0.782 0.765
Rep(e) 0.885 0.904 0.872 0.862 0.812 0.800 0.767
Smoothing = 1e-6
MI(e) 0.769 0.673 0.705 0.677 0.700 0.692 0.679
PMI3(e) 0.885 0.769 0.756 0.785 0.773 0.726 0.723
NPMI(e) 0.885 0.846 0.821 0.815 0.750 0.726 0.719
Typicality P(c|e) 0.462 0.577 0.603 0.577 0.569 0.564 0.556
Typicality P(e|c) 0.538 0.615 0.615 0.615 0.608 0.613 0.615
Rep(e) 0.846 0.885 0.897 0.877 0.788 0.777 0.765
Smoothing = 1e-7
MI(e) 0.769 0.692 0.705 0.685 0.719 0.703 0.688
PMI3(e) 0.885 0.769 0.756 0.792 0.758 0.736 0.725
NPMI(e) 0.769 0.750 0.718 0.700 0.650 0.641 0.633
Typicality P(c|e) 0.462 0.577 0.603 0.577 0.569 0.564 0.556
Typicality P(e|c) 0.500 0.481 0.526 0.523 0.531 0.523 0.523
Rep(e) 0.846 0.865 0.872 0.854 0.765 0.749 0.733

Second table (k = 1 / 2 / 3 / 5 / 10 / 15 / 20)
No smoothing
MI(e) 0.516 0.531 0.519 0.531 0.562 0.574 0.594
PMI3(e) 0.725 0.664 0.652 0.660 0.628 0.631 0.646
NPMI(e) 0.599 0.597 0.579 0.554 0.540 0.539 0.549
Typicality P(c|e) 0.297 0.380 0.409 0.422 0.438 0.446 0.461
Typicality P(e|c) 0.401 0.386 0.396 0.398 0.401 0.410 0.428
Rep(e) 0.758 0.771 0.745 0.723 0.656 0.647 0.661
Smoothing = 1e-3
MI(e) 0.374 0.414 0.441 0.448 0.473 0.481 0.495
PMI3(e) 0.484 0.511 0.509 0.502 0.519 0.525 0.533
NPMI(e) 0.692 0.652 0.607 0.603 0.585 0.585 0.592
Typicality P(c|e) 0.297 0.380 0.409 0.422 0.438 0.446 0.460
Typicality P(e|c) 0.703 0.697 0.704 0.681 0.637 0.628 0.626
Rep(e) 0.621 0.580 0.554 0.561 0.554 0.555 0.559
Smoothing = 1e-4
MI(e) 0.407 0.430 0.458 0.462 0.492 0.503 0.512
PMI3(e) 0.648 0.604 0.579 0.575 0.578 0.576 0.590
NPMI(e) 0.747 0.777 0.761 0.737 0.700 0.685 0.688
Typicality P(c|e) 0.297 0.380 0.409 0.422 0.438 0.446 0.461
Typicality P(e|c) 0.791 0.795 0.802 0.767 0.738 0.729 0.724
Rep(e) 0.758 0.714 0.711 0.689 0.653 0.636 0.653
Smoothing = 1e-5
MI(e) 0.429 0.465 0.478 0.501 0.517 0.528 0.545
PMI3(e) 0.725 0.647 0.642 0.642 0.627 0.624 0.638
NPMI(e) 0.813 0.779 0.778 0.765 0.730 0.723 0.729
Typicality P(c|e) 0.297 0.380 0.409 0.422 0.438 0.446 0.461
Typicality P(e|c) 0.709 0.728 0.735 0.722 0.702 0.696 0.703
Rep(e) 0.791 0.787 0.762 0.739 0.707 0.703 0.706
Smoothing = 1e-6
MI(e) 0.516 0.510 0.515 0.526 0.546 0.563 0.579
PMI3(e) 0.725 0.655 0.651 0.654 0.641 0.631 0.649
NPMI(e) 0.791 0.766 0.732 0.728 0.673 0.659 0.668
Typicality P(c|e) 0.297 0.380 0.409 0.422 0.438 0.446 0.461
Typicality P(e|c) 0.495 0.516 0.520 0.508 0.512 0.521 0.540
Rep(e) 0.758 0.784 0.767 0.755 0.691 0.686 0.694
Smoothing = 1e-7
MI(e) 0.516 0.531 0.519 0.530 0.562 0.571 0.592
PMI3(e) 0.725 0.664 0.652 0.658 0.630 0.631 0.647
NPMI(e) 0.670 0.655 0.633 0.604 0.575 0.570 0.581
Typicality P(c|e) 0.297 0.380 0.409 0.422 0.438 0.446 0.461
Typicality P(e|c) 0.423 0.421 0.415 0.407 0.414 0.424 0.438
Rep(e) 0.758 0.771 0.745 0.725 0.663 0.661 0.668

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

What is Semantic Similarity?
• Are the following instance pairs similar?
  • <apple, microsoft>
  • <apple, pear>
  • <apple, fruit>
  • <apple, food>
  • <apple, ipad>
  • <car, journey>

Approaches on Term Similarity

• Categories of approaches for semantic similarity:
  • String-based approaches
  • Knowledge-based approaches: use pre-existing thesauri, taxonomies or encyclopedias such as WordNet
  • Corpus-based approaches: use contexts of terms extracted from web pages, web search snippets or other text repositories
  • Embedding-based approaches: introduced in detail in "Part III: Implicit Understanding"

Approaches on Term Similarity (2)

bull Categories

(Figure: a taxonomy of the approaches: string-based, knowledge-based (WordNet: path-length / lexical-chain-based, information-content-based, graph-learning-based) and corpus-based (snippet / search-based), with representative works Rada 1989, Resnik 1995, Jcn 1997, Hirst 1998, Lin 1998, Ban 2002, HunTray 2005, Chen 2006, Alvarez 2007, Do 2009, Agirre 2010, Bol 2011 and Sánchez 2011; the state-of-the-art approaches are highlighted.)

bull Framework

(Framework flowchart. Step 1, type checking: an input term pair <t1, t2> is classified as a concept pair, an entity pair, or a concept-entity pair. Step 2, context representation: context vectors are collected as concept distributions and/or entity distributions, and the concepts are clustered. Step 3, context similarity: for concept or entity pairs the similarity is Cosine(T(t1), T(t2)) over the context vectors, or Max(x,y) Cosine(Cx(t1), Cy(t2)) over cluster context vectors; for a concept-entity pair, the top-k concepts cx of the entity term are collected and max sim(t2, cx) is returned.)
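A compact Python sketch of the concept-vector view of term similarity: each term is represented by a weighted concept (or concept-cluster) vector and similarity is the cosine between the two vectors. The concept weights below are invented for illustration.

```python
# Concept-based term similarity: cosine between weighted concept vectors.
import math

def cosine(u, v):
    keys = set(u) | set(v)
    dot = sum(u.get(k, 0.0) * v.get(k, 0.0) for k in keys)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv + 1e-12)

banana = {"fruit": 0.6, "tropical fruit": 0.25, "food": 0.15}   # toy concept vectors
pear   = {"fruit": 0.55, "seasonal fruit": 0.25, "food": 0.20}
apple_ = {"fruit": 0.45, "company": 0.35, "brand": 0.20}

print("banana ~ pear :", round(cosine(banana, pear), 3))
print("banana ~ apple:", round(cosine(banana, apple_), 3))   # lower, due to the company sense
```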

An example [Li et al 2013 Li et al 2015]

For example, <banana, pear>:
  Step 1 (type checking): <banana, pear> is an entity pair
  Step 2 (context representation): collect the concept context vectors of both entities
  Step 3 (context similarity): Cosine(T(t1), T(t2)) = 0.916

Examples

Term 1 | Term 2 | Similarity
lunch | dinner | 0.9987
tiger | jaguar | 0.9792
car | plane | 0.9711
television | radio | 0.9465
technology company | microsoft | 0.8208
high impact sport | competitive sport | 0.8155
employer | large corporation | 0.5353
fruit | green pepper | 0.2949
travel | meal | 0.0426
music | lunch | 0.0116
alcoholic beverage | sports equipment | 0.0314
company | table tennis | 0.0003

Full results: http://adapt.seiee.sjtu.edu.cn/similarity/SimCompleteResults.pdf

Statistics of Search Queries

(The same query-length statistics as shown earlier: by traffic, single-term queries account for 44% and two-term queries for 29%; by distinct queries, single-term queries account for only 10% while two- and three-term queries dominate. A companion breakdown counts the number of instances per query; e.g. "Pokémon Go", "Microsoft HoloLens".)

If the short text has context for the instance…
• python tutorial
• dangerous python
• moon earth distance
• …

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Supervised Segmentation [Bergsma et al 2007]

bull Problem divide query into semantic units

bull Approach turn segmentation into position-based binary classification

Example Query

Two man power saw

Candidate segmentations: [two man] [power saw]; [two] [man] [power saw]; [two] [man power] [saw]

Input a query and its positions

Output the decision for making segmentation at each position

Supervised Segmentation

bull Featuresbull Decision boundary features

bull Statistical features

bull Context features

bull Dependency features

eg Indicators thePOS tags in query isPosition features forwardbackward

Mutual information between left and right parts

Bank loan amortization schedule

Context information

bus driverfemale

depend

Supervised Segmentation

bull Segmentation Overview

saw

SVMclassifier

Input query two man power saw

two man power

Output segmentation decision for each position (yesno)

learning features

Unsupervised Segmentation [Tan et al 2008]

bull Unsupervised learning for query segmentation

Probability of generated segmentation S for query Q

$P(S \mid Q) = P(s_1)\,P(s_2 \mid s_1) \cdots P(s_m \mid s_1 s_2 \cdots s_{m-1}) \approx \prod_{s_i \in S} P(s_i)$  (unigram model over the segments $s_1, \ldots, s_m$)

A split point is a valid segment boundary if and only if the pointwise mutual information between the two segments resulting from the split is negative:

$MI(s_k, s_{k+1}) = \log \frac{P_c([s_k\, s_{k+1}])}{P_c(s_k) \cdot P_c(s_{k+1})} < 0$

Example: "new york times subscription" with $s_1$ = "new", $s_2$ = "york":

$\log \frac{P_c([\text{new york}])}{P_c(\text{new}) \cdot P_c(\text{york})} > 0$, so there is no segment boundary between "new" and "york"

Unsupervised Segmentation

bull Find top k segmentations dynamic programming

bull Using EM optimization on the fly

Input: query $w_1 w_2 \cdots w_n$ and the segment probability distribution

Output: top-k segmentations with the highest likelihood

Words in a query
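A Python sketch of the dynamic-programming search for the most likely segmentation under the unigram segment model; the segment probabilities are toy values and the EM re-estimation step is omitted.

```python
# Dynamic programming for the best query segmentation under P(S) ~ prod P(s_i).
import math

P = {"new york": 0.02, "new york times": 0.005, "times": 0.01,
     "new": 0.03, "york": 0.008, "subscription": 0.004}   # toy segment probabilities
FLOOR = 1e-9   # probability for unseen segments

def best_segmentation(words, max_len=3):
    n = len(words)
    best = [(-math.inf, None)] * (n + 1)
    best[0] = (0.0, None)
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            seg = " ".join(words[j:i])
            score = best[j][0] + math.log(P.get(seg, FLOOR))
            if score > best[i][0]:
                best[i] = (score, j)
    segs, i = [], n          # backtrack
    while i > 0:
        j = best[i][1]
        segs.append(" ".join(words[j:i]))
        i = j
    return list(reversed(segs))

print(best_segmentation("new york times subscription".split()))
# -> ['new york times', 'subscription'] with these toy probabilities
```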

Exploit Click-through [Li et al 2011]

bull Motivationbull Probabilistic query segmentation

bull Use click-through data

Output: top-3 segmentations

[bank of america] [online banking]   0.502

[bank of america online banking]   0.428

[bank of] [america] [online banking]   0.001

Q -gt URL -gt D query

document

click data

Input Query bank of america online banking

Exploit Click-through

bull Segmentation Model

An interpolated model

global info Click-throughinfo

[credit card] [bank of America]

1. bank of america credit cards contact us overview
2. secured visa credit card from bank of america
3. credit cards overview find the right bank of america credit card for you

Query

Clicked html documents

global info

Click-through info

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Sense Changes with Different Context

watch harry potter read harry potter age harry potter

Movie Book Character

harry potter walkthrough

Game

Entity Recognition in Query [Guo et al 2009]

bull Motivation

Detect named entity in a short text and categorize it

harry potter walkthrough

Single-named-entity query

Example

(ldquoharry potterrdquo ldquo walkthroughrdquo ldquogamerdquo)

triple lte t cgt

class of entity

context terms

ambiguous term

contextterm class

Entity Recognition in Query

bull Probabilistic Generative Model

Goal Given a query q find triple lte t cgt maximize the probability

Probability to generate triple

assume context only depends on class

Objective given query q find

The problem then becomes how to estimate Pr(e) Pr(c|e) and Pr(t|c)

Eg ldquowalkthroughrdquo only depends on game instead of happy potter

Entity Recognition in Query

bull Probability Estimation by Learning

learning objective

N

1i

iii )ctP(emax

Challenge difficult as well as time consuming to manually assign class labels to named entities in queries

Build training set 119879 = (119890119894 119905119894) view 119888119894 as a hidden variable

New Learning problem

N

1i

ii

N

1i

i

N

1i

ii c)|)P(te|P(c)P(emax)tP(emax c

solved with topic model WS-LDA

Signal from Click [Pantel et al 2012]

bull Motivation

Predict entity type in Web search

entity

user intent

context

click

Query type distribution (73 types)

Generative model

entity type

T

TK

K2

Signal from Click

bull Joint Model for Prediction

t

τ

i

n c

θ

φ

ω

Q

Distribution over types

Intent distribution

Pick type

Pick entity

Pick intent

Pick click

Word distribution

Host distribution

Entity distribution

For each Query

Pick context words

Telegraphic Query interpretation [Sawant et al 2013 Joshi et al 2014]

bull Entity-seeking Telegraphic Queries

bull Interpretation = Segmentation + Annotation

Knowledge base Large corpus

accuracy recall

Germany capital

Berlin

Query

Result Entity

bull Overview

Joint Interpretation and Ranking [Sawant et al 2013 Joshi et al 2014]

Annotated Corpus

Telegraphic Query

e1e2e3

Two Models for Interpretation and Ranking

Generative Model

Discriminative Model

Output

bull Generative Model

Joint Interpretation and Ranking [Sawant et al 2013]

San Diego Padres

Major league baseball team

type context

E

T Padres have been to two World

Series losing in 1984 and 1998

Type hint

baseball team

losing team baseball world series 1998

Z

Context matchers

lost 1998 world seriesswitch

model model

q losing team baseball world series 1998

Borrow from U Sawant (2013)

Based on Probabilistic Language Models

bull Discriminative Model

Joint Interpretation and Ranking [Sawant et al 2013]

San_Diego_Padres

losing team baseball world

series 1998

(baseball team)

losing team baseball world

series 1998

(baseball team)

losing team baseball world

series 1998

(t = baseball team)

1998_World_Series

losing team baseball world

series 1998

(series)

losing team baseball world

series 1998

(series)

losing team baseball world

series 1998

(t = series)

Correct entity Incorrect entity

Based on max-margin discriminative learning

bull Queries seek answer entities (e2)

bull Contain (query) entities (e1) target types (t2) relations (r) and selectors (s)

Telegraphic Query Interpretation [Joshi et al 2014]

query e1 r t2 s

dave navarro first band

dave navarro band band first

dave navarro - band first

spider automobile company

spider automobile company

automobile company

-

automobile company company spider

Borrow from M Joshi (2014)

Improved Generative Model

bull Generative Model[Sawant et al 2013]

[Joshi et al 2014]Consider e1

(in q) and r

Improved Discriminative Model

bull Discriminative Model[Sawant et al 2013]

[Joshi et al 2014]

Consider e1

(in q) and r

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

bull Input a short text

bull Output semantic interpretation

bull Three steps in understanding a short text

wanna watch eagles band

watch[verb] eagles[entity](band) band[concept]

wanna watch eagles band watch[verb] eagles[entity](band) band[concept]

watch eagles band watch[verb] eagles[entity] band[concept]

Step 1 Text Segmentation ndash divide into a sequence of terms in vocabulary

Step 2 Type detection ndash determine the best type of each term

Step 3 Concept Labeling ndash infer the best concept of each entity within context

Text segmentationbull Observations

bull Mutual Exclusion ndash terms containing the same word mutually exclude each other

bull Mutual Reinforcement ndash related terms mutually reinforce each other

bull Build a Candidate Term Graph (CTG)

ldquovacation april in parisrdquo ldquowatch harry potterrdquo

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Find best segmentation

bull Best segmentation= sub-graph in CTG whichbull Is a complete graph (clique)

bull No mutual exclusion

bull Has 100 word coveragebull Except for stopwords

bull Has the largest average edge weight

Is a segmentation

Best segmentation

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Find best segmentation

bull Best segmentation= sub-graph in CTG whichbull Is a complete graph (clique)

bull No mutual exclusion

bull Has 100 word coveragebull Except for stopwords

bull Has the largest average edge weight

Maximal Clique

Best segmentation

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Type Detection

bull Pairwise Modelbull Find the best typed-term for each term so that the

Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight

watch[v]

watch[e]

watch[c]

watch

free[adj]

free[v]

movie[c]

movie[e]

free

movie

Concept Labeling

bull Entity disambiguation is the most important task of concept labelingbull Filterre-rank of the original concept cluster vector

bull Weighted-Votebull The final score of each concept cluster is a combination

of its original score and the support from context using concept co-occurrence

watch harry potter read harry potter

movie book

Example of Entity Disambiguation[Hua et al 2015 (ICDE Best Paper) Hua et al 2016]

Co-occurrence network

Concept Vector

Semantic network

Short Text

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

c1 p1

c2 p2

c3 p3

hellip

ipad apple

fruithellip

companyhellip

foodhellip

producthellip

Is-A

filtering

ldquoipad applerdquo

producthellip

devicehellip

producthellip

brandhellip

companyhellip

devicehellip

co-occur

Is-A

Mining Lexical Relationships[Wang et al 2015b]

bull Lexical knowledge represented by the probabilities

verb

product book

movie

watch harry potter

119901 119907119890119903119887 119908119886119905119888ℎ

119901 119894119899119904119905119886119899119888119890 119908119886119905119888ℎ

119901 119898119900119907119894119890 ℎ119886119903119903119910 119901119900119905119905119890119903

119901 119898119900119907119894119890 119908119886119905119888ℎ 119907119890119903119887

119901 119887119900119900119896 ℎ119886119903119903119910 119901119900119905119905119890119903

119901 119888 119905 119911

119901 119888 119890 =119901 119888 119905 119911 = 119894119899119904119905119886119899119888119890

119901 119911 119905 ①②

③e instancet termc conceptz role

Understanding Queries [Wang et al 2015b]

bull Goal to rank the concepts and findarg max

119888119901(119888|119905 119902)

The offline semantic network

QueryAll possible

segmentations

Random walk with restart [Sun et al 2005]on the online subgraph

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

bull Example ldquopopular smart cover iphone 5srdquo

bull Definition bull Head acts to name the general (semantic) category to which the

whole short text belongs Usually the head is the intent of the short text

bull ldquosmart coverrdquo intent of the query

bull Constraints distinguish this member from other members of the same category

bull ldquoiphone 5srdquo limit the type of the head

bull Non-Constraint Modifiers (aka Pure Modifiers) are subjectivemodifiers which can be dropped without changing intent

bull ldquopopularrdquo subjective can be neglected

Non-Constraint Modifiers Mining Construct Modifier Networks

Edges form a Modifier Network

Concept Hierarchy Tree in ldquoCountryrdquo domain

Modifier Network in ldquoCountryrdquo domainIn this case ldquoLargerdquo and ldquoToprdquo are pure modifiers

Country

Asian country

Developed country

Western country

Asian Developed Western

Western developed

country

Top western country

Large

Large Top

Top

WesternLarge Asian

country

Large developed

country

Top developed

country

Country

Asian Western

Developed

Large Top

• Betweenness centrality is a measure of a node's centrality in a network
• The betweenness of node v is defined as g(v) = Σ_{s ≠ v ≠ t} σ_st(v) / σ_st
• where σ_st is the total number of shortest paths from node s to node t, and σ_st(v) is the number of those paths that pass through v
• Normalization & Aggregation
• A pure modifier should have a low betweenness-centrality aggregation score PMS(t)

Non-Constraint Modifiers Mining Betweenness centrality
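As an illustration of the betweenness-based test, a sketch using networkx on a toy modifier network; the edge list is an assumption for illustration, not the real network mined from the concept hierarchy and query logs, where "Large" and "Top" end up with the lowest aggregated PMS(t) scores.

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("Asian", "Developed"), ("Asian", "Western"), ("Developed", "Western"),
    ("Large", "Asian"), ("Large", "Developed"), ("Large", "Western"),
    ("Top", "Developed"), ("Top", "Western"),
])
bc = nx.betweenness_centrality(G, normalized=True)
for node, score in sorted(bc.items(), key=lambda kv: kv[1]):
    print(f"{node}: {score:.3f}")  # low-centrality nodes are pure-modifier candidates
```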

Head-Constraints Mining [Wang et al 2014b]

bull A term can be a head sometimes and be a constraint in some other cases

bull Eg Seattle hotel Seattle hotel job

head headconstraintconstraintconstraint

Head-Constraints Mining Acquiring Concept Patterns

Get entity pairs from query log

Conceptualization

Concept Patterns for each

prepositionsentity1 entity2

Extract Patterns

A for B A of BA with B A in BA on B A at B hellip

entity 1head entity 2constraint

concept11

concept12

concept13

concept14

concept21

concept22

concept23

(concept11 concept21) (concept11 concept22)(concept11 concept23)hellip

Concept Pattern Dictionary

Building concept pattern dictionary Query Logs

cover for iphone 6sbattery for sony a7rwicked on broadway

Why Concepts Can't Be Too General
• It may cause too many concept pattern conflicts
• Can't distinguish head and modifier for general concept pairs

Head Modifier

Derived Concept Pattern device company

Supporting Entity Pairs iphone 4 verizon

modem comcast

wireless router comcast

iphone 4 tmobile

Head Modifier

Derived Concept Pattern company device

Supporting Entity Pairs amazon books kindle

netflix touchpad

skype windows phone

netflix ps3

Conflict

Why Concepts Can't Be Too Specific
• It may generate concepts with little coverage
• The concept regresses to the entity
• Large storage space: up to (million × million) patterns

hellip hellip

device largest desktop OS vendor

device largest software development company

device largest global corporation

device latest windows and office provider

hellip hellip

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept PatternsCluster size Sum of Cluster Score headconstraintscore

615 2114691 breedstate357298460224501

296 7752357 gameplatform627403476771856

153 3466804 accessoryvehicle53393705094809

70 118259 browserplatform132612807637391

22 1010993 requirementschool271407526294823

34 9489159 drugdisease154602405333541

42 8992995 cosmeticskin condition814659415003929

16 7421599 jobcity27903732555528

32 710403 accessoryphone246513830851194

18 6692376 softwareplatform210126322725878

20 6444603 testdisease239774028397537

27 5994205 clothesbreed98773996282851

19 5913545 penaltycrime200544192793488

25 5848804 taxstate240081818612579

16 5465424 saucemeat183592863621553

18 4809389 credit cardcountry142919087972152

14 4730792 foodholiday14554140330924

11 4536199 modgame257163856882439

29 4350954 garmentsport471533326845442

23 3994886 career informationprofessional732726483731257

15 386065 songinstrument128189481818135

18 378213 baitfish780426514113169

22 3722948 study guidebook508339765053921

19 3408953 pluginsbrowser550326072627126

14 3305753 recipemeat882779863422951

18 3214226 currencycountry110825444188352

13 3180272 lenscamera186081673263957

9 316973 decorationholiday130055844126533

16 314875 foodanimal7338544366514

game platform

game device

video game platform

game console game pad

game gaming platform

Game (Head) Platform (Modifier)

angry birds android

angry birds ios

angry birds windows 10

hellip hellip

Detection

Head Modifier Relationship

bull Train a classifier on

(head-embedding modifier-embedding)

• Training data:
  • Positive: (head, modifier)
  • Negative: (modifier, head)
• Precision >= 0.9, Recall >= 0.9
• Disadvantage: not interpretable

Syntactic Parsing based on HM

bull Information is incompletebull Preposition and other function words

bull Within a noun compound el capitan macbook pro

bull Why not train a parser for web queries

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges Short Texts Lack Grammatical Signalsbull Lack function words word order

bull ldquotoys queriesrdquo has ambiguous intent

bull ldquodistance earth moonrdquo has clear intentbull many equivalent forms ldquoearth moon distancerdquo ldquoearth

distance moonrdquo hellip

Challenges Syntactic Parsing of Queries

bullNo standard

bullNo ground-truth

Why is syntactic parsing of queries even a legitimate problem

Derive Syntax from Semantics[Sun et al 2016]

bull Query ldquothai food houstonrdquo

bull Clicked sentence

bull Project dependency to the query

A Treebank for Short Texts

bull Given query 119902

bull Given 119902rsquos clicked sentence 119904

bull Parse each 119904

bull Project dependency from 119904 to 119902

bull Aggregate dependencies

Algorithm of Projection

Result Examples

Results

bull Random queries

QueryParser: UAS 0.83, LAS 0.75; Stanford: UAS 0.72, LAS 0.64
• Queries with no function words
QueryParser: UAS 0.82, LAS 0.73; Stanford: UAS 0.70, LAS 0.61
• Queries with function words
QueryParser: UAS 0.90, LAS 0.85; Stanford: UAS 0.86, LAS 0.80

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

bull Measuring similarity between two short texts and sentences

bull Basic idea word-by-word comparison using embedding vector

bull Use saliency-weighted semantic graph to computer similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired

Bins of all edges Bins of max edges

f_sts(s_l, s_s) = Σ_{w ∈ s_l} IDF(w) · [ sem(w, s_s) · (k1 + 1) ] / [ sem(w, s_s) + k1 · (1 − b + b · |s_s| / avgsl) ]
(s_l, s_s: the two short texts; sem(w, s_s): semantic similarity between term w and short text s_s; the weighting is inspired by BM25)
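A minimal sketch of this saliency-weighted similarity, assuming word vectors `emb` and IDF weights `idf` are available:

```python
import numpy as np

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def sem(w, other_text, emb):
    # Best word-by-word match of w against the other short text.
    return max(cos(emb[w], emb[w2]) for w2 in other_text)

def f_sts(s_l, s_s, emb, idf, k1=1.2, b=0.75, avgsl=5.0):
    """s_l, s_s: the two short texts as lists of words."""
    total = 0.0
    for w in s_l:
        s = sem(w, s_s, emb)
        total += idf.get(w, 1.0) * (s * (k1 + 1)) / (s + k1 * (1 - b + b * len(s_s) / avgsl))
    return total
```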

From the Concept View

From the Concept View [Wang et al 2015a]

Co-occurrence Network

Bags of Concepts

Semantic Network

Short Text 1

Short Text 2

Concept Vector 1[(c1 score1) (c2 score2)hellip]

Concept Vector 2[(c1rsquo score1rsquo) (c2rsquo score2rsquo)hellip]

Similarity

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

• Explicit short text understanding benefits a lot of application scenarios:
• Ads/search semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

[Bar charts: Ads keyword selection results over Decile 4–Decile 10; left panel Mainline Ads (scale 0.00–6.00), right panel Sidebar Ads (scale 0.00–0.60). Per-decile bar values are not recoverable from the transcript.]

Definition Mining [Hao et al 2016]

bull Definition scenarios search engines QnA etc

• Why is conceptualization useful for definition mining?
• Example: "What is Emphysema?"

Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

• This sentence has the form of a definition
• Embedding is helpful to some extent, but it also returns high similarity scores for (emphysema, disease) and (emphysema, smoking)

• Conceptualization can provide strong semantics
• Contextual embedding can also provide semantic similarity beyond is-A

Answer 1

Answer 2

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

OfflineOffline OnlineOnline

Original Short textjustin bieber graduates

hellip

Knowledge base

Conceptualization

Concept Vector

Entity Extraction

Candidates Generation

Classification amp Ranking

Model LearningModel Learning

Concept Weighting

Model Model NModel i

Concept Model Concept Model

Class 1 Class NClass i

TrainingData

ltMusic Scoregt

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept SpaceArticle titlestagsin this category

119901119894

119901119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

hellip

hellip

120596119894

120596119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

120596119894

120596119895119901119894

119901119895

Precision performance on each category [Wang et al 2014a]

Category | BoCSTC | LM_ch | SVM | VSM_cosine | LM_d | Entity_ESA
Movie | 0.71 | 0.91 | 0.84 | 0.81 | 0.72 | 0.56
Money | 0.97 | 0.95 | 0.54 | 0.57 | 0.52 | 0.74
Music | 0.97 | 0.90 | 0.88 | 0.73 | 0.68 | 0.58
TV | 0.96 | 0.46 | 0.92 | 0.56 | 0.51 | 0.55

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledgeltigt in SIGMOD 2008

bull [ Auer et al 2007 ] Soumlren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Boumlrschinger Caumlcilia Zirn and AnasElghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and XindongWu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny BoyesBraem Basic objects in natural categories Cognitive psychology 8(3)382ndash439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny ZhuToward Topic Search on the Web In International Conference on Conceptual Modeling October 2012


body taste

wine

smell

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

httplinghublider-projecteullod-cloud

Linguistic Linked Open Data Cloud

Cyc

short text understanding

(internal representation)

answer

knowledge knowledge

1 ldquoPython Tutorialrdquo2 ldquoWho was the US President when the Angels won the

World Seriesrdquo

linguistic common sense knowledge

Encyclopedia knowledge

Common Sense Knowledge vs Encyclopedia Knowledge

Common Sense Knowledge Base

Encyclopedia Knowledge Base

Common senselinguistic knowledge among terms

EntitiesFacts

isAisPropertyOf

co-occurrencehellip

DayOfBirthLocatedInSpouseOf

hellip

Typicality basic level of categorization

Black or WhitePrecision

WordNet KnowItAll NELLProbase hellip

Freebase Yago DBPedia Google knowledge graph hellip

Special cases

WordNet [Stark et al 1998]

bull WordNetreg is a large lexical database of English Nouns verbs adjectives and adverbs are grouped into sets of cognitive synonyms (synsets) each expressing a distinct concept

bull S (n) China Peoples Republic of China mainland China Communist China Red China PRC Cathay (a communist nation that covers a vast territory in eastern Asia the most populous country in the world)

bull The project began in the Princeton University Department of Psychology and is currently housed in the Department of Computer Science

bull Homepage httpwordnetprincetoneduwordnetabout-wordnetbull Download httpwordnetprincetoneduwordnetdownload

Brief Introduction

Statistics

Sample

Authors

URLs

POS Unique Synsets TotalStrings Word-Sense Pairs

Noun 117798 82115 146312Verb 11529 13767 25047Adjective 21479 18156 30002Adverb 4481 3621 5580Totals 155287 117659 206941

KnowItAll Extract high-quality knowledgefrom the Web [Banko et al 2007 Etzioni et al 2011]

bull OpenIE distills semantic relations fromWeb-scale natural language texts

bull TextRunner -gt ReVerb -gt Open IE part of KnowItAll

bull Yielding over 5 billion extraction from over a billion web pages

bull From ldquoUS president Barack Obama gave his inaugural address on January 20 2013rdquoTo (Barack Obama is president of US)

(Barack Obama gave [his inaugural address on January 20 2013])

bull OpenIE v413 has been released

bull Turing Center at the University of Washington

bull httpopenieallenaiorgbull httpreverbcswashingtonedu

Brief Introduction

Statistics

Sample

News

Authors

URLs

NELL Never-Ending Language Learning [Carlson et al 2010]

• NELL is a research project that attempts to create a computer system that learns over time to read the web; it has been running since January 2010
• It has gathered over 50 million candidate beliefs by reading the web, held at different levels of confidence
• Out of those 50 million, it has high confidence in 2,817,156 beliefs

Brief Introduction

Statistics

Sample

NELL Never-Ending Language Learning

bull It is continually learning facts on web Resources is publicly available

bull NELL research team at CMU

bull Homepage httprtwmlcmuedurtwbull Download httprtwmlcmuedurtwresources

News

Authors

URLs

Probase [Wu et al 2012]

bull Probase is a semantic network to make machines ldquoawarerdquo of the mental world of human beings so that machines can better understand human communication

Brief Introduction

Probase network

isA isPropertyOf Co-occurrence(concept entities) (attributes) (isCEOof LocatedIn etc)

Concepts Entities(ldquoSpanish Artistsrdquo) (ldquoPablo Picasordquo)

Nodes

Edges

Attributes(ldquoBirthdayrdquo)

VerbsAdjectives(ldquoEatrdquo ldquoSweetrdquo)

• 5,401,933 unique concepts
• 12,551,613 unique instances
• 87,603,947 isA relations

countries Basic watercolor techniques

Celebrity wedding dress designers

Probase

bull Microsoft Research

• Public release coming soon in Aug/Sept 2016
• Project homepage: http://research.microsoft.com/probase

Concepts

Authors

URLs

Probase isA error rate lt1 1 and lt10 for random pair

Freebase [Bollacker et al 2008]

bull Freebase is a well-known collaborative knowledge base consisting of data composed mainly by its community

• Freebase contains more than 23 million entities
• Freebase contains 1.9 billion triples
• Each triple is organized in the form <subject> <predicate> <object>

Brief Introduction

Statistics

• Freebase is a collection of facts
• Freebase only contains nodes and links
• Freebase is a labeled graph

Freebase -gt Wiki Data

• Freebase data was integrated into Wikidata
• The Freebase API will be completely shut down on Aug 31, 2016,

replaced by Google Knowledge Graph API

bull Freebase Community

bull Homepage httpwikifreebasecomwikiMain_Pagebull Download httpsdevelopersgooglecomfreebasebull Wikidata httpswwwwikidataorg

News

Authors

URLs

Google Knowledge Graph

bull Knowledge Graph is a knowledge base used by Google to enhance its search engines search results with semantic-search information gathered from a wide variety of sources

bull 570 million objects and more than 18 billion facts about relationshipsbetween different objects

bull Google Inc

bull Homepage httpswwwgooglecomintles419insidesearchfeaturessearchknowledgehtml

Brief Introduction

Statistics

Sample

Authors

URLs

YAGO [Suchanek et al 2007]

bull YAGO is a huge semantic knowledge base derived from GeoNames WordNet and Wikipedia (10 Wikipedias in different languages)

• More than 10 million entities (persons, organizations, cities, etc.)
• More than 120 million facts about entities
• More than 35,000 classes assigned to entities
• Many of its facts and entities are annotated with a temporal and a spatial dimension

Brief Introduction

SampleltAlbert_Einsteingt ltisMarriedTogt ltElsa_Einsteingt

Statistics

YAGO

Newsbull An evaluated version of YAGO3 (Combining information from Wikipedia from different

languages) is released [15 Sep 2015]

Authors
• Max Planck Institute for Informatics in Saarbrücken, Germany, and the DBWeb group at Télécom ParisTech University

URLsbull Homepage httpwwwmpi-infmpgdedepartmentsdatabases-and-

information-systemsresearchyago-nagayagobull Download httpwwwmpi-infmpgdedepartmentsdatabases-and-

information-systemsresearchyago-nagayagodownloads

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Statistics of Search Queries

(a) By traffic: 1 term 44%, 2 terms 29%, 3 terms 17%, 4 terms 7%, 5 terms 2%, more than 5 terms 1%
(b) By # of distinct queries: 1 term 10%, 2 terms 26%, 3 terms 34%, 4 terms 19%, 5 terms 7%, more than 5 terms 4%
Example queries: Pokémon Go, Microsoft HoloLens
[A second pair of pie charts breaks queries down by number of instances (1, 2, 3, 4, 5, more than 5); the slice values are not recoverable from the transcript.]

If the short text is a single instance…
• Python • Microsoft • Apple • …

Single Instance Understanding

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

Word Ambiguity bull Word sense disambiguation rely on dictionaries

(WordNet)

Take a seat on this chair

The chair of the Math Department

Instance Ambiguity

bull Instance sense disambiguation extra knowledge needed

I have an apple pie for lunch

He bought an apple ipad

Here ldquoapplerdquo is a proper noun

Ambiguity [Hua et al 2016]

bull Many instances are ambiguous

bull Intuition ambiguous instances have multiple senses

short text instance sense

population china china country

glass vs china china fragile item

pear apple apple fruit

microsoft apple apple company

read harry potter harry potter book

watch harry potter harry potter movie

age of harry potter harry potter character

Pre-definition for Ambiguity (1) Sense [Hua et al 2016]

bull What is a Sense in semantic networksbull A sense as a hierarchy of concept clusters

region

country state city

creature

animal

predator

crop food

fruit vegetable meat

Germany

Pre-definition for Ambiguity (2) Concept Cluster [Li et al 2013 Li et al 2015]

bull What is a Concept Cluster (CL)bull Cluster similar concepts into a concept cluster using K-

Means like approach (k-Medoids)

FruitFresh fruit

JuiceTropical fruit

BerryExotic fruit

Seasonal fruitFruit juiceCitrus fruitSoft fruitDry fruit

Wild fruitLocal fruit

hellip

company

CompanyClientFirm

ManufacturerCorporation

large companyRivalGiant

big companylocal company

large corporationinternational

companyhellip Fruit

Definitions of Instance Ambiguity [Hua et al 2016]

• 3 levels of instance ambiguity
• Level 0: unambiguous
  • Contains only 1 sense
  • E.g. dog (animal), beijing (city), potato (vegetable)
• Level 1: unambiguous and ambiguous both make sense
  • Contains 2 or more senses, but these senses are related
  • E.g. google (company & search engine), french (language & country), truck (vehicle & public transport service)
• Level 2: ambiguous
  • Contains 2 or more senses, and the senses are very different from each other
  • E.g. apple (fruit & company), jaguar (animal & company), python (animal & language)

Ambiguity Score

• Using the top-2 senses to calculate the ambiguity score:
  score = 0                                                         if level = 0
  score = ( w(s2|e) / w(s1|e) ) · (1 − similarity(s1, s2))           if level = 1
  score = 1 + ( w(sc2|e) / w(sc1|e) ) · (1 − similarity(sc1, sc2))   if level = 2
Denote the top-2 senses as s1 and s2, and the top-2 sense clusters as sc1 and sc2. The similarity of two sense clusters is the maximum similarity of their senses:
  similarity(sc1, sc2) = max similarity(s_i ∈ sc1, s_j ∈ sc2)
For an entity e, the weight (popularity) of a sense s_i is the sum of the weights of its concept clusters:
  w(s_i|e) = w(H_i|e) = Σ_{CL_j ∈ H_i} P(CL_j|e)
For an entity e, the weight (popularity) of a sense cluster sc_i is the sum of the weights of its senses:
  w(sc_i|e) = Σ_{s_j ∈ sc_i} w(s_j|e)
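A minimal sketch of the score, assuming the entity's senses have already been grouped into weighted sense clusters; how level 1 vs. level 2 is decided is not spelled out above, so a similarity threshold is assumed here.

```python
def ambiguity_score(sense_clusters, sim, related_threshold=0.5):
    """sense_clusters: list of clusters, each a list of (sense, weight) pairs."""
    if len(sense_clusters) < 2:
        return 0.0  # level 0: a single sense cluster
    ranked = sorted(sense_clusters, key=lambda cl: sum(w for _, w in cl), reverse=True)
    c1, c2 = ranked[0], ranked[1]
    w1 = sum(w for _, w in c1)
    w2 = sum(w for _, w in c2)
    # Cluster similarity = maximum similarity between any pair of member senses.
    cluster_sim = max(sim(s1, s2) for s1, _ in c1 for s2, _ in c2)
    score = (w2 / w1) * (1 - cluster_sim)
    # Assumed rule: related top senses -> level 1, unrelated -> level 2.
    return score if cluster_sim >= related_threshold else 1 + score
```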

Examples

bull Level 0bull california

bull country state city region institution 0943bull fruit

bull food product snack carbs crop 0827bull alcohol

bull substance drug solvent food addiction 0523bull computer

bull device product electronics technology appliance 0537bull coffee

bull beverage product food crop stimulant 073bull potato

bull vegetable food crop carbs product 0896bull bean

bull food vegetable crop legume carbs 0801

Examples (cont)bull Level 1

bull nike score = 0034bull company store 0861bull brand 0035bull shoe product 0033

bull twitter score = 0035bull website tool 0612bull network 0165bull application 0033bull company 0031

bull facebook score = 0037bull website tool 0595bull network 017bull company 0053bull application 0029

bull yahoo score = 038bull search engine 0457bull company provider account 0281bull website 00656

bull google score = 0507bull search engine 046bull company provider organization 0377bull website 00449

Examples (cont)

bull Level 2bull jordan score = 102

bull country state company regime 092bull shoe 002

bull fox score = 109bull animal predator species 074bull network 0064bull company 0035

bull puma score = 115bull brand company shoe 0655bull species cat 0116

bull gold score = 121bull metal material mineral resource mineral062bull color 0128

Examples (cont)

bull Level 2bull soap score = 122

bull product toiletry substance 049bull technology industry standard 011

bull silver score = 124bull metal material mineral resource mineral 0638bull color 0156

bull python score = 129bull language 0667bull snake animal reptile skin 0193

bull apple score = 141bull fruit food tree 0537bull company brand 0271

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

A Concept View of ldquoMicrosoftrdquo

company

largest desktop OS vendor

softwarecompany

international company

technology leader

Microsoft

largest desktop OS vendorcompany hellip hellip

software company

Basic-level Conceptualization (BLC)[Rosch et al 1976]

KFC

BMW

Basic-level conceptualization

How to Make BLC

bull Naive approachesbull Typicality an important measure for understanding the

relationship between an object and its concept

bull Pointwise Mutual Information (PMI) a common measure of the strength of association between two terms

bird

Naive Approach 1 Typicality

P(robin|bird) gt P(penguin|bird)ldquorobinrdquo is a more typical bird than a ldquopenguinrdquo

country

SeychellesUSA

P(USA|country) gt P(Seychelles|country)ldquoUSArdquo is a more typical country than ldquoSeychellesrdquo

penguinrobin

Using Typicality for BLC

• Associate each isA relationship (e isA c) with typicality scores P(e|c) and P(c|e):
  P(e|c) = n(c, e) / n(c),   P(c|e) = n(c, e) / n(e)

bull P(e|c) indicates how typical (or popular) e is in the given concept c

bull P(c|e) indicates how typical (or popular) the concept c is given e

bull However

Microsoft

largest desktop OS vendorcompanyhigh typicality p(c|e) high typicality p(e|c)

Naive Approach 2 PMI[Manning and Schutze 1999]

bull Pointwise mutual information (PMI) is a measure of association used in information theory and statistics

bull Consider using the PMI between concept c and instance e to find the basic-level concepts as follows

PMI(e, c) = log [ P(e, c) / (P(e) P(c)) ] = log P(e|c) − log P(e)

• However, in basic-level categorization we are interested in finding a concept for a given e, which means P(e) is a constant

bull Thus ranking by PMI(e c) is the same as ranking by P(e|c)

Using Rep(e c) for BLC [Wang et al 2015b]

• The measure Rep(e, c) = P(c|e) · P(e|c) means:
  • Given e, c should be its typical concept (shortest distance)
  • Given c, e should be its typical instance (shortest distance)
• (With PMI) Taking the logarithm of the scoring function:
  log Rep(e, c) = log [ P(c|e) · P(e|c) ] = log [ (P(e, c)/P(e)) · (P(e, c)/P(c)) ] = log [ P(e, c)^2 / (P(e) P(c)) ] = PMI(e, c) + log P(e, c) = PMI^2
• (With Commute Time) The commute time between an instance e and a concept c is
  Time(e, c) = Σ_{k=1..∞} (2k) · P_k(e, c)
             = Σ_{k=1..T} (2k) · P_k(e, c) + Σ_{k=T+1..∞} (2k) · P_k(e, c)
             ≥ Σ_{k=1..T} (2k) · P_k(e, c) + 2(T+1) · (1 − Σ_{k=1..T} P_k(e, c))
             = 4 − 2 · Rep(e, c)   (for T = 1, where P_1(e, c) = P(c|e) · P(e|c) = Rep(e, c))
• Maximizing Rep(e, c) is thus a process of finding the concept nodes having the shortest expected distance to e
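A minimal sketch that computes the competing measures (typicality, PMI, Rep) from isA co-occurrence counts n(c, e), as compared in the tables below; the counts are assumed to come from a Probase-style semantic network.

```python
import math
from collections import defaultdict

def blc_scores(counts):
    """counts: {(concept, entity): n(c, e)} from an isA network."""
    n_c, n_e = defaultdict(float), defaultdict(float)
    total = float(sum(counts.values()))
    for (c, e), n in counts.items():
        n_c[c] += n
        n_e[e] += n
    scores = {}
    for (c, e), n in counts.items():
        p_e_given_c = n / n_c[c]          # typicality P(e|c)
        p_c_given_e = n / n_e[e]          # typicality P(c|e)
        pmi = math.log((n / total) / ((n_e[e] / total) * (n_c[c] / total)))
        rep = p_c_given_e * p_e_given_c   # Rep(e, c) = P(c|e) * P(e|c)
        scores[(c, e)] = {"P(e|c)": p_e_given_c, "P(c|e)": p_c_given_e, "PMI": pmi, "Rep": rep}
    return scores
```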

PrecisionNDCGNo smoothing 1 2 3 5 10 15 20

MI(e) 0769 0692 0705 0685 0719 0705 0690

PMI3(e) 0885 0769 0756 0800 0754 0733 0721

NPMI(e) 0692 0692 0667 0638 0627 0610 0610

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0500 0462 0526 0523 0523 0510 0521

Rep(e) 0846 0865 0872 0862 0758 0731 0719

Smoothing=0001

MI(e) 0577 0615 0628 0600 0612 0605 0592

PMI3(e) 0731 0673 0692 0654 0669 0644 0623

NPMI(e) 0923 0827 0769 0746 0731 0695 0671

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0554

Typicality P(e|c) 0885 0865 0872 0831 0785 0741 0704

Rep(e) 0846 0731 0718 0723 0700 0669 0638

Smoothing=00001

MI(e) 0615 0615 0654 0608 0635 0628 0612

PMI3(e) 0846 0731 0731 0715 0723 0685 0677

NPMI(e) 0885 0904 0885 0869 0823 0777 0752

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0885 0904 0910 0877 0831 0813 0777

Rep(e) 0923 0846 0833 0815 0781 0736 0719

Smoothing=1e-5

MI(e) 0615 0635 0667 0662 0677 0656 0646

PMI3(e) 0885 0769 0744 0777 0758 0731 0710

NPMI(e) 0885 0846 0872 0869 0831 0810 0787

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0769 0808 0846 0823 0808 0782 0765

Rep(e) 0885 0904 0872 0862 0812 0800 0767

Smoothing=1e-6

MI(e) 0769 0673 0705 0677 0700 0692 0679

PMI3(e) 0885 0769 0756 0785 0773 0726 0723

NPMI(e) 0885 0846 0821 0815 0750 0726 0719

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0538 0615 0615 0615 0608 0613 0615

Rep(e) 0846 0885 0897 0877 0788 0777 0765

Smoothing=1e-7

MI(e) 0769 0692 0705 0685 0719 0703 0688

PMI3(e) 0885 0769 0756 0792 0758 0736 0725

NPMI(e) 0769 0750 0718 0700 0650 0641 0633

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0500 0481 0526 0523 0531 0523 0523

Rep(e) 0846 0865 0872 0854 0765 0749 0733

No Smoothing 1 2 3 5 10 15 20

MI(e) 0516 0531 0519 0531 0562 0574 0594

PMI3(e) 0725 0664 0652 0660 0628 0631 0646

NPMI(e) 0599 0597 0579 0554 0540 0539 0549

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0401 0386 0396 0398 0401 0410 0428

Rep(e) 0758 0771 0745 0723 0656 0647 0661

Smoothing=1e-3

MI(e) 0374 0414 0441 0448 0473 0481 0495

PMI3(e) 0484 0511 0509 0502 0519 0525 0533

NPMI(e) 0692 0652 0607 0603 0585 0585 0592

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0460

Typicality P(e|c) 0703 0697 0704 0681 0637 0628 0626

Rep(e) 0621 0580 0554 0561 0554 0555 0559

Smoothing=1e-4

MI(e) 0407 0430 0458 0462 0492 0503 0512

PMI3(e) 0648 0604 0579 0575 0578 0576 0590

NPMI(e) 0747 0777 0761 0737 0700 0685 0688

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0791 0795 0802 0767 0738 0729 0724

Rep(e) 0758 0714 0711 0689 0653 0636 0653

Smoothing=1e-5

MI(e) 0429 0465 0478 0501 0517 0528 0545

PMI3(e) 0725 0647 0642 0642 0627 0624 0638

NPMI(e) 0813 0779 0778 0765 0730 0723 0729

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0709 0728 0735 0722 0702 0696 0703

Rep(e) 0791 0787 0762 0739 0707 0703 0706

Smoothing=1e-6

MI(e) 0516 0510 0515 0526 0546 0563 0579

PMI3(e) 0725 0655 0651 0654 0641 0631 0649

NPMI(e) 0791 0766 0732 0728 0673 0659 0668

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0495 0516 0520 0508 0512 0521 0540

Rep(e) 0758 0784 0767 0755 0691 0686 0694

Smoothing=1e-7

MI(e) 0516 0531 0519 0530 0562 0571 0592

PMI3(e) 0725 0664 0652 0658 0630 0631 0647

NPMI(e) 0670 0655 0633 0604 0575 0570 0581

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0423 0421 0415 0407 0414 0424 0438

Rep(e) 0758 0771 0745 0725 0663 0661 0668

Evaluations on Different Measures for BLC

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

What is the Semantic Similaritybull Are the following instance pairs similar

bull ltapple microsoftgt

bull ltapple peargt

bull ltapple fruitgt

bull ltapple foodgt

bull ltapple ipadgt

bull ltcar journeygt

Approaches on Term Similarity

bull Categories of approaches for semantic similaritybull String based approach

bull Knowledge based approachbull Use preexisting thesauri taxonomy or encyclopedia such as

WordNet

bull Corpus based approachbull Use contexts of terms extracted from web pages web search

snippets or other text repositories

bull Embedding based approachbull Will introduce in detail in ldquoPart 3 Implicit Understandingrdquo

79

Approaches on Term Similarity (2)

bull Categories

80

Knowledge based approaches

(WordNet)

Corpus based

approaches

Path lengthlexical

chain-based

Information

content-based

Graph learning

algorithm basedSnippet search based

Rada

1989

Resnik

1995

Jcn

1997

Lin

1998

Saacutench

2011

Agirre

2010Alvarez

2007

String based

approaches

HunTray

2005

Hirst

1998

Do

2009

Bol

2011Chen

2006

State-of-the-art approaches

Ban

2002

bull Framework

83

Term Similarity Using Semantic Networks [Li et al 2013 Li et al 2015]

Term pairs ltt1 t2gt

Type Checking

Concept Pairs Entity Pairs

Entity-distribution Context Collection

Concept-distribution Context Collection

Concept-Entity Pairs

Concept Collection for the Entity Term t1

Similarity EvaluationCosine(T(t1) T(t2))

for each pairltt2cxgt

Context vector T(t1) and T(t2)

Get maxsim(t2cx) for ltt1 t2gt

End

End

Concept Clustering

Cluster Context vector Cx(t1) and Cy(t2)

Similarity Evaluation

Max(xy) Cosine(Cx(t1) Cy(t2))

End

Concept Clustering

for each Cluster Ci(t1)

Select top k Concept namely cx

Step 1 Type Checking

Step 2 Context Representation(Vector)

Step 3 Context Similarity

An example [Li et al 2013 Li et al 2015]

For example

ltbanana peargt

88

ltbanana peargt

Entity PairsType Checking

Concept Context Collection

Similarity Evaluation Cosine(T(t1) T(t2)) 0916

Step 1 Type Checking

Step 2 Context Representation(Vector)

Step 3 Context Similarity
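A minimal sketch of the entity-pair branch of this framework, assuming a `concept_vector` function that conceptualizes a term into a weighted concept-cluster vector:

```python
import math

def cosine_sparse(a, b):
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb + 1e-12)

def term_similarity(t1, t2, concept_vector):
    # Both terms are treated as entities represented in concept-cluster space;
    # type checking and the concept-pair branch are omitted in this sketch.
    return cosine_sparse(concept_vector(t1), concept_vector(t2))
```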

Examples:
Term 1 | Term 2 | Similarity
lunch | dinner | 0.9987
tiger | jaguar | 0.9792
car | plane | 0.9711
television | radio | 0.9465
technology company | microsoft | 0.8208
high impact sport | competitive sport | 0.8155
employer | large corporation | 0.5353
fruit | green pepper | 0.2949
travel | meal | 0.0426
music | lunch | 0.0116
alcoholic beverage | sports equipment | 0.0314
company | table tennis | 0.0003
(Complete results: adapt.seiee.sjtu.edu.cn/similarity/SimCompleteResults.pdf)

Statistics of Search Queries

(a) By traffic: 1 term 44%, 2 terms 29%, 3 terms 17%, 4 terms 7%, 5 terms 2%, more than 5 terms 1%
(b) By # of distinct queries: 1 term 10%, 2 terms 26%, 3 terms 34%, 4 terms 19%, 5 terms 7%, more than 5 terms 4%
Example queries: Pokémon Go, Microsoft HoloLens
[A second pair of pie charts breaks queries down by number of instances (1, 2, 3, 4, 5, more than 5); the slice values are not recoverable from the transcript.]

If the short text has context for the instance…
• python tutorial • dangerous python • moon earth distance • …

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Supervised Segmentation [Bergsma et al 2007]

bull Problem divide query into semantic units

bull Approach turn segmentation into position-based binary classification

Example Query

Two man power saw

[two man] [power saw][two] [man] [power saw][two] [man power] [saw]

Input a query and its positions

Output the decision for making segmentation at each position

Supervised Segmentation

bull Featuresbull Decision boundary features

bull Statistical features

bull Context features

bull Dependency features

eg Indicators thePOS tags in query isPosition features forwardbackward

Mutual information between left and right parts

Bank loan amortization schedule

Context information

bus driverfemale

depend

Supervised Segmentation

bull Segmentation Overview

saw

SVMclassifier

Input query two man power saw

two man power

Output segmentation decision for each position (yesno)

learning features

Unsupervised Segmentation [Tan et al 2008]

bull Unsupervised learning for query segmentation

Probability of a generated segmentation S for query Q:
P(S|Q) = P(s1) · P(s2|s1) · … · P(s_m | s1 s2 … s_{m−1}) ≈ Π_{s_i ∈ S} P(s_i)   (unigram model over segments)
A split is a valid segment boundary if and only if the pointwise mutual information between the two segments resulting from the split is negative:
MI(s_k, s_{k+1}) = log [ P_c([s_k s_{k+1}]) / (P_c(s_k) · P_c(s_{k+1})) ] < 0
Example: for "new york times subscription", log [ P_c([new york]) / (P_c(new) · P_c(york)) ] > 0, so there is no segment boundary between "new" and "york"

Unsupervised Segmentation

bull Find top k segmentations dynamic programming

bull Using EM optimization on the fly

Input: query w1 w2 … wn and the concept probability distribution
Output: top-k segmentations with the highest likelihood

Words in a query
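A minimal sketch of the dynamic program, assuming a `seg_prob` function that returns the (smoothed) probability of a candidate segment; the EM re-estimation step is omitted.

```python
import math

def best_segmentation(words, seg_prob, max_len=5):
    """words: the query tokens; seg_prob(s): smoothed probability of segment s."""
    n = len(words)
    best = [(-math.inf, -1)] * (n + 1)
    best[0] = (0.0, -1)
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            seg = " ".join(words[j:i])
            cand = best[j][0] + math.log(seg_prob(seg))
            if cand > best[i][0]:
                best[i] = (cand, j)
    # Backtrack the highest-likelihood segmentation.
    segs, i = [], n
    while i > 0:
        j = best[i][1]
        segs.append(" ".join(words[j:i]))
        i = j
    return list(reversed(segs))

# e.g. best_segmentation("new york times subscription".split(), seg_prob)
# should return ["new york times", "subscription"] under suitable segment statistics.
```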

Exploit Click-through [Li et al 2011]

bull Motivationbull Probabilistic query segmentation

bull Use click-through data

Output top-3 segmentation

[bank of america] [online banking] 0502

bank of america online banking] 0428

[bank of ] [ america] [online banking] 0001

Q -gt URL -gt D query

document

click data

Input Query bank of america online banking

Exploit Click-through

bull Segmentation Model

An interpolated model

global info Click-throughinfo

[credit card] [bank of America]

1 bank of america credit cards contact us overview2 secured visa credit card from bank of america3 credit cards overview find the right bank of america credit card for you

Query

Clicked html documents

global info

Click-through info

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Sense Changes with Different Context

watch harry potter read harry potter age harry potter

Movie Book Character

harry potter walkthrough

Game

Entity Recognition in Query [Guo et al 2009]

bull Motivation

Detect named entity in a short text and categorize it

harry potter walkthrough

Single-named-entity query

Example

("harry potter", "walkthrough", "game")
triple <e, t, c>

class of entity

context terms

ambiguous term

contextterm class

Entity Recognition in Query

bull Probabilistic Generative Model

Goal: given a query q, find the triple <e, t, c> that maximizes the probability

Probability to generate triple

assume context only depends on class

Objective given query q find

The problem then becomes how to estimate Pr(e) Pr(c|e) and Pr(t|c)

E.g. "walkthrough" depends only on the class game, not on harry potter

Entity Recognition in Query

bull Probability Estimation by Learning

Learning objective: max Π_{i=1..N} P(e_i, t_i, c_i)
Challenge: it is difficult as well as time-consuming to manually assign class labels to named entities in queries
Build a training set T = {(e_i, t_i)} and view c_i as a hidden variable
New learning problem: max Π_{i=1..N} P(e_i, t_i) = max Π_{i=1..N} P(e_i) · Σ_c P(c|e_i) · P(t_i|c)

solved with topic model WS-LDA
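A minimal sketch of how a candidate interpretation <e, t, c> is scored under this factorization, assuming the three distributions have already been estimated (e.g. by WS-LDA):

```python
def score_triple(e, t, c, p_e, p_c_given_e, p_t_given_c):
    """Pr(e, t, c) = Pr(e) * Pr(c|e) * Pr(t|c)."""
    return p_e.get(e, 0.0) * p_c_given_e.get((c, e), 0.0) * p_t_given_c.get((t, c), 0.0)

def best_interpretation(query_splits, classes, p_e, p_c_given_e, p_t_given_c):
    """query_splits: candidate (entity, context) splits; classes: {entity: [class, ...]}."""
    scored = (
        ((e, t, c), score_triple(e, t, c, p_e, p_c_given_e, p_t_given_c))
        for e, t in query_splits
        for c in classes.get(e, [])
    )
    return max(scored, key=lambda kv: kv[1])

# For "harry potter walkthrough", ("harry potter", "walkthrough", "game")
# should outscore ("harry potter", "walkthrough", "book").
```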

Signal from Click [Pantel et al 2012]

bull Motivation

Predict entity type in Web search

entity

user intent

context

click

Query type distribution (73 types)

Generative model

entity type

T

TK

K2

Signal from Click

bull Joint Model for Prediction

t

τ

i

n c

θ

φ

ω

Q

Distribution over types

Intent distribution

Pick type

Pick entity

Pick intent

Pick click

Word distribution

Host distribution

Entity distribution

For each Query

Pick context words

Telegraphic Query interpretation [Sawant et al 2013 Joshi et al 2014]

bull Entity-seeking Telegraphic Queries

bull Interpretation = Segmentation + Annotation

Knowledge base Large corpus

accuracy recall

Germany capital

Berlin

Query

Result Entity

bull Overview

Joint Interpretation and Ranking [Sawant et al 2013 Joshi et al 2014]

Annotated Corpus

Telegraphic Query

e1e2e3

Two Models for Interpretation and Ranking

Generative Model

Discriminative Model

Output

bull Generative Model

Joint Interpretation and Ranking [Sawant et al 2013]

San Diego Padres

Major league baseball team

type context

E

T Padres have been to two World

Series losing in 1984 and 1998

Type hint

baseball team

losing team baseball world series 1998

Z

Context matchers

lost 1998 world seriesswitch

model model

q losing team baseball world series 1998

Borrow from U Sawant (2013)

Based on Probabilistic Language Models

bull Discriminative Model

Joint Interpretation and Ranking [Sawant et al 2013]

San_Diego_Padres

losing team baseball world

series 1998

(baseball team)

losing team baseball world

series 1998

(baseball team)

losing team baseball world

series 1998

(t = baseball team)

1998_World_Series

losing team baseball world

series 1998

(series)

losing team baseball world

series 1998

(series)

losing team baseball world

series 1998

(t = series)

Correct entity Incorrect entity

Based on max-margin discriminative learning

bull Queries seek answer entities (e2)

bull Contain (query) entities (e1) target types (t2) relations (r) and selectors (s)

Telegraphic Query Interpretation [Joshi et al 2014]

query e1 r t2 s

dave navarro first band

dave navarro band band first

dave navarro - band first

spider automobile company

spider automobile company

automobile company

-

automobile company company spider

Borrow from M Joshi (2014)

Improved Generative Model

bull Generative Model[Sawant et al 2013]

[Joshi et al 2014]Consider e1

(in q) and r

Improved Discriminative Model

bull Discriminative Model[Sawant et al 2013]

[Joshi et al 2014]

Consider e1

(in q) and r

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

bull Input a short text

bull Output semantic interpretation

bull Three steps in understanding a short text

wanna watch eagles band

watch[verb] eagles[entity](band) band[concept]

wanna watch eagles band watch[verb] eagles[entity](band) band[concept]

watch eagles band watch[verb] eagles[entity] band[concept]

Step 1 Text Segmentation ndash divide into a sequence of terms in vocabulary

Step 2 Type detection ndash determine the best type of each term

Step 3 Concept Labeling ndash infer the best concept of each entity within context

Text segmentationbull Observations

bull Mutual Exclusion ndash terms containing the same word mutually exclude each other

bull Mutual Reinforcement ndash related terms mutually reinforce each other

bull Build a Candidate Term Graph (CTG)

ldquovacation april in parisrdquo ldquowatch harry potterrdquo

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Find best segmentation

bull Best segmentation= sub-graph in CTG whichbull Is a complete graph (clique)

bull No mutual exclusion

bull Has 100 word coveragebull Except for stopwords

bull Has the largest average edge weight

Is a segmentation

Best segmentation

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Find best segmentation

bull Best segmentation= sub-graph in CTG whichbull Is a complete graph (clique)

bull No mutual exclusion

bull Has 100 word coveragebull Except for stopwords

bull Has the largest average edge weight

Maximal Clique

Best segmentation

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23
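A minimal sketch of the clique-based search over the candidate term graph, assuming term relatedness weights and word spans are given; stop-word handling and the exact scoring details are simplified.

```python
import itertools
import networkx as nx

def segment_by_cliques(terms, spans, weight, n_words):
    """terms: candidate terms; spans[t]: set of word positions t covers;
    weight(t1, t2): relatedness of two compatible terms."""
    G = nx.Graph()
    G.add_nodes_from(terms)
    for t1, t2 in itertools.combinations(terms, 2):
        if not (spans[t1] & spans[t2]):  # overlapping terms mutually exclude each other
            G.add_edge(t1, t2, w=weight(t1, t2))
    best, best_score = None, -1.0
    for clique in nx.find_cliques(G):    # maximal cliques = candidate segmentations
        covered = set().union(*(spans[t] for t in clique))
        if len(covered) < n_words:       # require full word coverage (stopwords aside)
            continue
        edges = list(itertools.combinations(clique, 2))
        score = sum(G[u][v]["w"] for u, v in edges) / len(edges) if edges else 0.0
        if score > best_score:
            best, best_score = clique, score
    return best
```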

Type Detection

bull Pairwise Modelbull Find the best typed-term for each term so that the

Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight

watch[v]

watch[e]

watch[c]

watch

free[adj]

free[v]

movie[c]

movie[e]

free

movie

Concept Labeling

• Entity disambiguation is the most important task of concept labeling
• Filter/re-rank the original concept cluster vector

• Weighted-Vote
• The final score of each concept cluster is a combination of its original score and the support from context, using concept co-occurrence

watch harry potter read harry potter

movie book

Example of Entity Disambiguation[Hua et al 2015 (ICDE Best Paper) Hua et al 2016]

Co-occurrence network

Concept Vector

Semantic network

Short Text

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

c1 p1

c2 p2

c3 p3

hellip

ipad apple

fruithellip

companyhellip

foodhellip

producthellip

Is-A

filtering

ldquoipad applerdquo

producthellip

devicehellip

producthellip

brandhellip

companyhellip

devicehellip

co-occur

Is-A

Mining Lexical Relationships[Wang et al 2015b]

bull Lexical knowledge represented by the probabilities

verb

product book

movie

watch harry potter

Example probabilities: p(verb | watch), p(instance | watch), p(movie | harry potter), p(movie | watch, verb), p(book | harry potter)
General forms: p(z | t); p(c | t, z); p(c | e) = p(c | t, z = instance)
(e: instance, t: term, c: concept, z: role)

Understanding Queries [Wang et al 2015b]

• Goal: rank the concepts and find argmax_c p(c | t, q)
The offline semantic network
Query → all possible segmentations
Random walk with restart [Sun et al 2005] on the online subgraph

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

bull Example ldquopopular smart cover iphone 5srdquo

bull Definition bull Head acts to name the general (semantic) category to which the

whole short text belongs Usually the head is the intent of the short text

bull ldquosmart coverrdquo intent of the query

bull Constraints distinguish this member from other members of the same category

bull ldquoiphone 5srdquo limit the type of the head

bull Non-Constraint Modifiers (aka Pure Modifiers) are subjectivemodifiers which can be dropped without changing intent

bull ldquopopularrdquo subjective can be neglected

Non-Constraint Modifiers Mining Construct Modifier Networks

Edges form a Modifier Network

Concept Hierarchy Tree in ldquoCountryrdquo domain

Modifier Network in ldquoCountryrdquo domainIn this case ldquoLargerdquo and ldquoToprdquo are pure modifiers

Country

Asian country

Developed country

Western country

Asian Developed Western

Western developed

country

Top western country

Large

Large Top

Top

WesternLarge Asian

country

Large developed

country

Top developed

country

Country

Asian Western

Developed

Large Top

• Betweenness centrality is a measure of a node's centrality in a network
• The betweenness of node v is defined as g(v) = Σ_{s ≠ v ≠ t} σ_st(v) / σ_st
• where σ_st is the total number of shortest paths from node s to node t, and σ_st(v) is the number of those paths that pass through v
• Normalization & Aggregation
• A pure modifier should have a low betweenness-centrality aggregation score PMS(t)

Non-Constraint Modifiers Mining Betweenness centrality

Head-Constraints Mining [Wang et al 2014b]

bull A term can be a head sometimes and be a constraint in some other cases

bull Eg Seattle hotel Seattle hotel job

head headconstraintconstraintconstraint

Head-Constraints Mining Acquiring Concept Patterns

Get entity pairs from query log

Conceptualization

Concept Patterns for each

prepositionsentity1 entity2

Extract Patterns

A for B A of BA with B A in BA on B A at B hellip

entity 1head entity 2constraint

concept11

concept12

concept13

concept14

concept21

concept22

concept23

(concept11 concept21) (concept11 concept22)(concept11 concept23)hellip

Concept Pattern Dictionary

Building concept pattern dictionary Query Logs

cover for iphone 6sbattery for sony a7rwicked on broadway

Why Concepts Can't Be Too General
• It may cause too many concept pattern conflicts
• Can't distinguish head and modifier for general concept pairs

Head Modifier

Derived Concept Pattern device company

Supporting Entity Pairs iphone 4 verizon

modem comcast

wireless router comcast

iphone 4 tmobile

Head Modifier

Derived Concept Pattern company device

Supporting Entity Pairs amazon books kindle

netflix touchpad

skype windows phone

netflix ps3

Conflict

Why Concepts Can't Be Too Specific
• It may generate concepts with little coverage
• The concept regresses to the entity
• Large storage space: up to (million × million) patterns

hellip hellip

device largest desktop OS vendor

device largest software development company

device largest global corporation

device latest windows and office provider

hellip hellip

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept PatternsCluster size Sum of Cluster Score headconstraintscore

615 2114691 breedstate357298460224501

296 7752357 gameplatform627403476771856

153 3466804 accessoryvehicle53393705094809

70 118259 browserplatform132612807637391

22 1010993 requirementschool271407526294823

34 9489159 drugdisease154602405333541

42 8992995 cosmeticskin condition814659415003929

16 7421599 jobcity27903732555528

32 710403 accessoryphone246513830851194

18 6692376 softwareplatform210126322725878

20 6444603 testdisease239774028397537

27 5994205 clothesbreed98773996282851

19 5913545 penaltycrime200544192793488

25 5848804 taxstate240081818612579

16 5465424 saucemeat183592863621553

18 4809389 credit cardcountry142919087972152

14 4730792 foodholiday14554140330924

11 4536199 modgame257163856882439

29 4350954 garmentsport471533326845442

23 3994886 career informationprofessional732726483731257

15 386065 songinstrument128189481818135

18 378213 baitfish780426514113169

22 3722948 study guidebook508339765053921

19 3408953 pluginsbrowser550326072627126

14 3305753 recipemeat882779863422951

18 3214226 currencycountry110825444188352

13 3180272 lenscamera186081673263957

9 316973 decorationholiday130055844126533

16 314875 foodanimal7338544366514

game platform

game device

video game platform

game console game pad

game gaming platform

Game (Head) Platform (Modifier)

angry birds android

angry birds ios

angry birds windows 10

hellip hellip

Detection

Head Modifier Relationship

bull Train a classifier on

(head-embedding modifier-embedding)

• Training data:
  • Positive: (head, modifier)
  • Negative: (modifier, head)
• Precision >= 0.9, Recall >= 0.9
• Disadvantage: not interpretable
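A minimal sketch of such a classifier using scikit-learn, assuming an embedding lookup `emb` and (head, modifier) pairs mined from query logs:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def build_dataset(pairs, emb):
    """pairs: known (head, modifier) pairs; emb: term -> embedding vector."""
    X, y = [], []
    for head, modifier in pairs:
        X.append(np.concatenate([emb[head], emb[modifier]])); y.append(1)
        X.append(np.concatenate([emb[modifier], emb[head]])); y.append(0)
    return np.array(X), np.array(y)

def train_head_modifier_classifier(pairs, emb):
    X, y = build_dataset(pairs, emb)
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    return clf  # clf.predict_proba([...])[0, 1] ~ P(first term is the head of the second)
```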

Syntactic Parsing based on HM

bull Information is incompletebull Preposition and other function words

bull Within a noun compound el capitan macbook pro

bull Why not train a parser for web queries

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges Short Texts Lack Grammatical Signalsbull Lack function words word order

bull ldquotoys queriesrdquo has ambiguous intent

bull ldquodistance earth moonrdquo has clear intentbull many equivalent forms ldquoearth moon distancerdquo ldquoearth

distance moonrdquo hellip

Challenges Syntactic Parsing of Queries

bullNo standard

bullNo ground-truth

Why is syntactic parsing of queries even a legitimate problem

Derive Syntax from Semantics[Sun et al 2016]

bull Query ldquothai food houstonrdquo

bull Clicked sentence

bull Project dependency to the query

A Treebank for Short Texts

bull Given query 119902

bull Given 119902rsquos clicked sentence 119904

bull Parse each 119904

bull Project dependency from 119904 to 119902

bull Aggregate dependencies

Algorithm of Projection
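A minimal sketch of the projection step using spaCy (the model name is an example); arcs are kept only when both endpoints occur in the query, which simplifies the full projection and aggregation algorithm.

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # example model name

def project_dependencies(query, clicked_sentence):
    query_words = set(query.lower().split())
    arcs = []
    for token in nlp(clicked_sentence):
        head = token.head
        # Keep an arc only when both endpoints also occur in the query.
        if token.lower_ in query_words and head.lower_ in query_words and token is not head:
            arcs.append((head.lower_, token.dep_, token.lower_))
    return arcs

# project_dependencies("thai food houston", "Find the best Thai food in Houston.")
# keeps e.g. ("food", "amod", "thai"); arcs passing through function words
# ("in" -> "houston") need the fuller projection step.
```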

Result Examples

Results

bull Random queries

QueryParser: UAS 0.83, LAS 0.75; Stanford: UAS 0.72, LAS 0.64
• Queries with no function words
QueryParser: UAS 0.82, LAS 0.73; Stanford: UAS 0.70, LAS 0.61
• Queries with function words
QueryParser: UAS 0.90, LAS 0.85; Stanford: UAS 0.86, LAS 0.80

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

bull Measuring similarity between two short texts and sentences

bull Basic idea word-by-word comparison using embedding vector

bull Use saliency-weighted semantic graph to computer similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired

Bins of all edges Bins of max edges

f_sts(s_l, s_s) = Σ_{w ∈ s_l} IDF(w) · [ sem(w, s_s) · (k1 + 1) ] / [ sem(w, s_s) + k1 · (1 − b + b · |s_s| / avgsl) ]
(s_l, s_s: the two short texts; sem(w, s_s): semantic similarity between term w and short text s_s; the weighting is inspired by BM25)

From the Concept View

From the Concept View [Wang et al 2015a]

Co-occurrence Network

Bags of Concepts

Semantic Network

Short Text 1

Short Text 2

Concept Vector 1[(c1 score1) (c2 score2)hellip]

Concept Vector 2[(c1rsquo score1rsquo) (c2rsquo score2rsquo)hellip]

Similarity

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

bull Explicit short text understanding benefit lot of application scenariosbull Adssearch semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

000

100

200

300

400

500

600

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

000

010

020

030

040

050

060

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

Mainline Ads Sidebar Ads

Definition Mining [Hao et al 2016]

bull Definition scenarios search engines QnA etc

bull Why Conceptualization is useful for definition miningbull Examples ldquoWhat is Emphysemardquo

Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

bull This sentence has the form of definitionbull Embedding is helpful to some extent but it also return high similarity

score for (emphysema disease) and (emphysema smoking)

bull Conceptualization can provide strong semanticsbull Contextual embedding can also provide semantic similarity beyond Is-A

Answer 1

Answer 2

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

OfflineOffline OnlineOnline

Original Short textjustin bieber graduates

hellip

Knowledge base

Conceptualiztion

Concept Vector

Entity Extraction

Candidates Generation

Classification amp Ranking

Model LearningModel Learning

Concept Weighting

Model Model NModel i

Concept Model Concept Model

Class 1 Class NClass i

TrainingData

ltMusic Scoregt

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept SpaceArticle titlestagsin this category

119901119894

119901119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

hellip

hellip

120596119894

120596119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

120596119894

120596119895119901119894

119901119895

Precision performance on each category [Wang et al 2014a]

BocSTC LM_ch SVMVSM_cosi

neLM_d Entity_ESA

Movie 071 091 084 081 072 056

Money 097 095 054 057 052 074

Music 097 090 088 073 068 058

TV 096 046 092 056 051 055

0304050607080910

Pre

cisi

on

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledgeltigt in SIGMOD 2008

bull [ Auer et al 2007 ] Soumlren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Boumlrschinger Caumlcilia Zirn and AnasElghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and XindongWu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny BoyesBraem Basic objects in natural categories Cognitive psychology 8(3)382ndash439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny ZhuToward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 9: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

httplinghublider-projecteullod-cloud

Linguistic Linked Open Data Cloud

Cyc

short text understanding

(internal representation)

answer

knowledge knowledge

1 ldquoPython Tutorialrdquo2 ldquoWho was the US President when the Angels won the

World Seriesrdquo

linguistic common sense knowledge

Encyclopedia knowledge

Common Sense Knowledge vs Encyclopedia Knowledge

Common Sense Knowledge Base

Encyclopedia Knowledge Base

Common senselinguistic knowledge among terms

EntitiesFacts

isAisPropertyOf

co-occurrencehellip

DayOfBirthLocatedInSpouseOf

hellip

Typicality basic level of categorization

Black or WhitePrecision

WordNet KnowItAll NELLProbase hellip

Freebase Yago DBPedia Google knowledge graph hellip

Special cases

WordNet [Stark et al 1998]

bull WordNetreg is a large lexical database of English Nouns verbs adjectives and adverbs are grouped into sets of cognitive synonyms (synsets) each expressing a distinct concept

bull S (n) China Peoples Republic of China mainland China Communist China Red China PRC Cathay (a communist nation that covers a vast territory in eastern Asia the most populous country in the world)

bull The project began in the Princeton University Department of Psychology and is currently housed in the Department of Computer Science

bull Homepage httpwordnetprincetoneduwordnetabout-wordnetbull Download httpwordnetprincetoneduwordnetdownload

Brief Introduction

Statistics

Sample

Authors

URLs

POS Unique Synsets TotalStrings Word-Sense Pairs

Noun 117798 82115 146312Verb 11529 13767 25047Adjective 21479 18156 30002Adverb 4481 3621 5580Totals 155287 117659 206941

KnowItAll Extract high-quality knowledgefrom the Web [Banko et al 2007 Etzioni et al 2011]

bull OpenIE distills semantic relations fromWeb-scale natural language texts

bull TextRunner -gt ReVerb -gt Open IE part of KnowItAll

bull Yielding over 5 billion extraction from over a billion web pages

bull From ldquoUS president Barack Obama gave his inaugural address on January 20 2013rdquoTo (Barack Obama is president of US)

(Barack Obama gave [his inaugural address on January 20 2013])

bull OpenIE v413 has been released

bull Turing Center at the University of Washington

bull httpopenieallenaiorgbull httpreverbcswashingtonedu

Brief Introduction

Statistics

Sample

News

Authors

URLs

NELL Never-Ending Language Learning [Carlson et al 2010]

bull NELL is a research project that attempts to create a computer system that learns over time to read the web It has been running since January 2010

bull It has accumulated over 50 million candidate beliefs by reading the web held at different levels of confidence

bull Out of these 50 million NELL has high confidence in 2817156 beliefs

Brief Introduction

Statistics

Sample

NELL Never-Ending Language Learning

bull It is continually learning facts from the web and its resources are publicly available

bull NELL research team at CMU

bull Homepage httprtwmlcmuedurtwbull Download httprtwmlcmuedurtwresources

News

Authors

URLs

Probase [Wu et al 2012]

bull Probase is a semantic network to make machines ldquoawarerdquo of the mental world of human beings so that machines can better understand human communication

Brief Introduction

Probase network

isA isPropertyOf Co-occurrence(concept entities) (attributes) (isCEOof LocatedIn etc)

Concepts Entities(ldquoSpanish Artistsrdquo) (ldquoPablo Picasordquo)

Nodes

Edges

Attributes(ldquoBirthdayrdquo)

VerbsAdjectives(ldquoEatrdquo ldquoSweetrdquo)

bull 5,401,933 unique concepts bull 12,551,613 unique instances bull 87,603,947 isA relations

countries Basic watercolor techniques

Celebrity wedding dress designers

Probase

bull Microsoft Research

bull Public release coming soon in AugSept 2016 bull Project homepage httpresearchmicrosoftcomprobase

Concepts

Authors

URLs

Probase isA error rate: < 1.1% and < 10% for a random pair

Freebase [Bollacker et al 2008]

bull Freebase is a well-known collaborative knowledge base consisting of data composed mainly by its community

bull Freebase contains more than 23 million entities bull Freebase contains about 1.9 billion triples bull Each triple is organized in the form of

<subject> <predicate> <object>

Brief Introduction

Statistics

bull Freebase is a collection of facts bull Freebase only contains nodes and links bull Freebase is a labeled graph

Freebase -gt Wiki Data

bull Freebase data was integrated into Wikidata bull The Freebase API will be completely shut down on Aug 31 2016

replaced by Google Knowledge Graph API

bull Freebase Community

bull Homepage httpwikifreebasecomwikiMain_Pagebull Download httpsdevelopersgooglecomfreebasebull Wikidata httpswwwwikidataorg

News

Authors

URLs

Google Knowledge Graph

bull Knowledge Graph is a knowledge base used by Google to enhance its search engines search results with semantic-search information gathered from a wide variety of sources

bull 570 million objects and more than 18 billion facts about relationships between different objects

bull Google Inc

bull Homepage httpswwwgooglecomintles419insidesearchfeaturessearchknowledgehtml

Brief Introduction

Statistics

Sample

Authors

URLs

YAGO [Suchanek et al 2007]

bull YAGO is a huge semantic knowledge base derived from GeoNames WordNet and Wikipedia (10 Wikipedias in different languages)

bull More than 10 million entities (persons organizations cities etc) bull More than 120 million facts about entities bull More than 35,000 classes assigned to entities bull Many of its facts and entities have a temporal and a spatial dimension attached

Brief Introduction

SampleltAlbert_Einsteingt ltisMarriedTogt ltElsa_Einsteingt

Statistics

YAGO

Newsbull An evaluated version of YAGO3 (Combining information from Wikipedia from different

languages) is released [15 Sep 2015]

Authors bull Max Planck Institute for Informatics in Saarbrücken, Germany, and the DBWeb group at Télécom ParisTech University

URLsbull Homepage httpwwwmpi-infmpgdedepartmentsdatabases-and-

information-systemsresearchyago-nagayagobull Download httpwwwmpi-infmpgdedepartmentsdatabases-and-

information-systemsresearchyago-nagayagodownloads

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Statistics of Search Queries

Query length distribution:

(a) By traffic: 1 term 44%, 2 terms 29%, 3 terms 17%, 4 terms 7%, 5 terms 2%, more than 5 terms 1%

(b) By # of distinct queries: 1 term 10%, 2 terms 26%, 3 terms 34%, 4 terms 19%, 5 terms 7%, more than 5 terms 4%

Examples: Pokémon Go, Microsoft HoloLens

[Two pie charts: number of instances per query (1 instance, 2 instances, 3 instances, 4 instances, 5 instances, more than 5 instances)]

If the short text is a single instancehellip

bull Python bull Microsoft bull Apple bull hellip

Single Instance Understanding

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

Word Ambiguity bull Word sense disambiguation rely on dictionaries

(WordNet)

Take a seat on this chair

The chair of the Math Department

Instance Ambiguity

bull Instance sense disambiguation extra knowledge needed

I have an apple pie for lunch

He bought an apple ipad

Here ldquoapplerdquo is a proper noun

Ambiguity [Hua et al 2016]

bull Many instances are ambiguous

bull Intuition ambiguous instances have multiple senses

short text             instance        sense
population china       china           country
glass vs china         china           fragile item
pear apple             apple           fruit
microsoft apple        apple           company
read harry potter      harry potter    book
watch harry potter     harry potter    movie
age of harry potter    harry potter    character

Pre-definition for Ambiguity (1) Sense [Hua et al 2016]

bull What is a Sense in semantic networksbull A sense as a hierarchy of concept clusters

[Figure: a sense as a hierarchy of concept clusters, eg region (country, state, city), creature (animal, predator), crop/food (fruit, vegetable, meat); example instance: Germany]

Pre-definition for Ambiguity (2) Concept Cluster [Li et al 2013 Li et al 2015]

bull What is a Concept Cluster (CL)bull Cluster similar concepts into a concept cluster using K-

Means like approach (k-Medoids)

FruitFresh fruit

JuiceTropical fruit

BerryExotic fruit

Seasonal fruitFruit juiceCitrus fruitSoft fruitDry fruit

Wild fruitLocal fruit

hellip

company

CompanyClientFirm

ManufacturerCorporation

large companyRivalGiant

big companylocal company

large corporationinternational

companyhellip Fruit

Definitions of Instance Ambiguity [Hua et al 2016]

bull 3 levels of instance ambiguity

bull Level 0 unambiguous bull Contains only 1 sense bull Eg dog (animal) beijing (city) potato (vegetable)

bull Level 1 unambiguous and ambiguous both make sense bull Contains 2 or more senses but these senses are related bull Eg google (company & search engine) french (language & country) truck (vehicle & public transport service)

bull Level 2 ambiguous bull Contains 2 or more senses and the senses are very different from each other bull Eg apple (fruit & company) jaguar (animal & company) python (animal & language)

Ambiguity Score

bull Using the top-2 senses to calculate the ambiguity score:

$$\mathrm{score} = \begin{cases} 0 & \text{level} = 0 \\ \dfrac{w(s_2 \mid e)}{w(s_1 \mid e)} \cdot \big(1 - \mathrm{similarity}(s_1, s_2)\big) & \text{level} = 1 \\ 1 + \dfrac{w(sc_2 \mid e)}{w(sc_1 \mid e)} \cdot \big(1 - \mathrm{similarity}(sc_1, sc_2)\big) & \text{level} = 2 \end{cases}$$

Denote the top-2 senses as $s_1$ and $s_2$, and the top-2 sense clusters as $sc_1$ and $sc_2$. The similarity of two sense clusters is the maximum similarity of their senses:

$$\mathrm{similarity}(sc_1, sc_2) = \max \mathrm{similarity}(s_i \in sc_1, s_j \in sc_2)$$

For an entity $e$, the weight (popularity) of a sense $s_i$ is the sum of the weights of its concept clusters:

$$w(s_i \mid e) = w(H_i \mid e) = \sum_{CL_j \in H_i} P(CL_j \mid e)$$

For an entity $e$, the weight (popularity) of a sense cluster $sc_i$ is the sum of the weights of its senses:

$$w(sc_i \mid e) = \sum_{s_j \in sc_i} w(s_j \mid e)$$

Examples

bull Level 0
bull california: country, state, city, region, institution (0.943)
bull fruit: food, product, snack, carbs, crop (0.827)
bull alcohol: substance, drug, solvent, food, addiction (0.523)
bull computer: device, product, electronics, technology, appliance (0.537)
bull coffee: beverage, product, food, crop, stimulant (0.73)
bull potato: vegetable, food, crop, carbs, product (0.896)
bull bean: food, vegetable, crop, legume, carbs (0.801)

Examples (cont) bull Level 1

bull nike, score = 0.034: company/store 0.861, brand 0.035, shoe/product 0.033

bull twitter, score = 0.035: website/tool 0.612, network 0.165, application 0.033, company 0.031

bull facebook, score = 0.037: website/tool 0.595, network 0.17, company 0.053, application 0.029

bull yahoo, score = 0.38: search engine 0.457, company/provider/account 0.281, website 0.0656

bull google, score = 0.507: search engine 0.46, company/provider/organization 0.377, website 0.0449

Examples (cont)

bull Level 2

bull jordan, score = 1.02: country/state/company/regime 0.92, shoe 0.02

bull fox, score = 1.09: animal/predator/species 0.74, network 0.064, company 0.035

bull puma, score = 1.15: brand/company/shoe 0.655, species/cat 0.116

bull gold, score = 1.21: metal/material/mineral resource/mineral 0.62, color 0.128

Examples (cont)

bull Level 2

bull soap, score = 1.22: product/toiletry/substance 0.49, technology/industry standard 0.11

bull silver, score = 1.24: metal/material/mineral resource/mineral 0.638, color 0.156

bull python, score = 1.29: language 0.667, snake/animal/reptile/skin 0.193

bull apple, score = 1.41: fruit/food/tree 0.537, company/brand 0.271

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

A Concept View of ldquoMicrosoftrdquo

company

largest desktop OS vendor

softwarecompany

international company

technology leader

Microsoft

largest desktop OS vendorcompany hellip hellip

software company

Basic-level Conceptualization (BLC)[Rosch et al 1976]

KFC

BMW

Basic-level conceptualization

How to Make BLC

bull Naive approachesbull Typicality an important measure for understanding the

relationship between an object and its concept

bull Pointwise Mutual Information (PMI) a common measure of the strength of association between two terms

bird

Naive Approach 1 Typicality

P(robin|bird) gt P(penguin|bird)ldquorobinrdquo is a more typical bird than a ldquopenguinrdquo

country

SeychellesUSA

P(USA|country) gt P(Seychelles|country)ldquoUSArdquo is a more typical country than ldquoSeychellesrdquo

penguinrobin

Using Typicality for BLC

bull Associate each isA relationship ($e$ isA $c$) with typicality scores $P(e \mid c)$ and $P(c \mid e)$:

$$P(e \mid c) = \frac{n(c, e)}{n(c)}, \qquad P(c \mid e) = \frac{n(c, e)}{n(e)}$$

bull P(e|c) indicates how typical (or popular) e is in the given concept c

bull P(c|e) indicates how typical (or popular) the concept c is given e

bull However, for the instance Microsoft: "company" has high typicality $P(c \mid e)$, while "largest desktop OS vendor" has high typicality $P(e \mid c)$, and neither score alone picks out the basic-level concept

Naive Approach 2 PMI[Manning and Schutze 1999]

bull Pointwise mutual information (PMI) is a measure of association used in information theory and statistics

bull Consider using the PMI between concept c and instance e to find the basic-level concepts as follows

$$\mathrm{PMI}(e, c) = \log \frac{P(e, c)}{P(e)\,P(c)} = \log P(e \mid c) - \log P(e)$$

bull However bull In basic level of categorization we are interested in finding a

concept for a given e which means P(e) is a constant

bull Thus ranking by PMI(e c) is the same as ranking by P(e|c)

Using Rep(e c) for BLC [Wang et al 2015b]

bull The measure $\mathrm{Rep}(e, c) = P(c \mid e) \cdot P(e \mid c)$ means:
bull Given $e$, the $c$ should be its typical concept (shortest distance)
bull Given $c$, the $e$ should be its typical instance (shortest distance)

bull (With PMI) If we take the logarithm of our scoring function we get

$$\log \mathrm{Rep}(e, c) = \log\big(P(c \mid e) \cdot P(e \mid c)\big) = \log\left(\frac{P(e, c)}{P(e)} \cdot \frac{P(e, c)}{P(c)}\right) = \log\frac{P(e, c)^2}{P(e)\,P(c)} = \mathrm{PMI}(e, c) + \log P(e, c) = \mathrm{PMI}^2(e, c)$$

bull (With Commute Time) The commute time between an instance $e$ and a concept $c$ is

$$\mathrm{Time}(e, c) = \sum_{k=1}^{\infty} 2k \cdot P_k(e, c) = \sum_{k=1}^{T} 2k \cdot P_k(e, c) + \sum_{k=T+1}^{\infty} 2k \cdot P_k(e, c) \ge \sum_{k=1}^{T} 2k \cdot P_k(e, c) + 2(T+1)\Big(1 - \sum_{k=1}^{T} P_k(e, c)\Big) = 4 - 2 \cdot \mathrm{Rep}(e, c)$$

where the last equality takes $T = 1$, with $P_1(e, c) = P(c \mid e) \cdot P(e \mid c) = \mathrm{Rep}(e, c)$ the probability of a direct round trip $e \to c \to e$

bull So basic-level conceptualization is a process of finding the concept nodes having the shortest expected distance from $e$
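A small sketch of Rep-based basic-level conceptualization computed from isA co-occurrence counts; the counts and names below are hypothetical:

```python
from collections import defaultdict

# Hypothetical isA co-occurrence counts n(e, c) between instances and concepts.
counts = {
    ("microsoft", "company"): 900,
    ("microsoft", "software company"): 400,
    ("microsoft", "largest desktop OS vendor"): 30,
    ("google", "company"): 800,
    ("apple", "company"): 850,
}

n_e = defaultdict(int)   # n(e): total count of the instance
n_c = defaultdict(int)   # n(c): total count of the concept
for (e, c), n in counts.items():
    n_e[e] += n
    n_c[c] += n

def rep(e, c):
    """Rep(e, c) = P(c|e) * P(e|c), estimated from the counts."""
    n_ec = counts.get((e, c), 0)
    return (n_ec / n_e[e]) * (n_ec / n_c[c])

# Rank candidate concepts of "microsoft" by Rep: the broad "company" and the
# over-specific "largest desktop OS vendor" both fall below "software company".
cands = [c for (e, c) in counts if e == "microsoft"]
print(sorted(cands, key=lambda c: rep("microsoft", c), reverse=True))
```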

Evaluations on Different Measures for BLC

Precision@k (k = 1, 2, 3, 5, 10, 15, 20)

No smoothing
MI(e)               0.769 0.692 0.705 0.685 0.719 0.705 0.690
PMI^3(e)            0.885 0.769 0.756 0.800 0.754 0.733 0.721
NPMI(e)             0.692 0.692 0.667 0.638 0.627 0.610 0.610
Typicality P(c|e)   0.462 0.577 0.603 0.577 0.569 0.564 0.556
Typicality P(e|c)   0.500 0.462 0.526 0.523 0.523 0.510 0.521
Rep(e)              0.846 0.865 0.872 0.862 0.758 0.731 0.719

Smoothing = 0.001
MI(e)               0.577 0.615 0.628 0.600 0.612 0.605 0.592
PMI^3(e)            0.731 0.673 0.692 0.654 0.669 0.644 0.623
NPMI(e)             0.923 0.827 0.769 0.746 0.731 0.695 0.671
Typicality P(c|e)   0.462 0.577 0.603 0.577 0.569 0.564 0.554
Typicality P(e|c)   0.885 0.865 0.872 0.831 0.785 0.741 0.704
Rep(e)              0.846 0.731 0.718 0.723 0.700 0.669 0.638

Smoothing = 0.0001
MI(e)               0.615 0.615 0.654 0.608 0.635 0.628 0.612
PMI^3(e)            0.846 0.731 0.731 0.715 0.723 0.685 0.677
NPMI(e)             0.885 0.904 0.885 0.869 0.823 0.777 0.752
Typicality P(c|e)   0.462 0.577 0.603 0.577 0.569 0.564 0.556
Typicality P(e|c)   0.885 0.904 0.910 0.877 0.831 0.813 0.777
Rep(e)              0.923 0.846 0.833 0.815 0.781 0.736 0.719

Smoothing = 1e-5
MI(e)               0.615 0.635 0.667 0.662 0.677 0.656 0.646
PMI^3(e)            0.885 0.769 0.744 0.777 0.758 0.731 0.710
NPMI(e)             0.885 0.846 0.872 0.869 0.831 0.810 0.787
Typicality P(c|e)   0.462 0.577 0.603 0.577 0.569 0.564 0.556
Typicality P(e|c)   0.769 0.808 0.846 0.823 0.808 0.782 0.765
Rep(e)              0.885 0.904 0.872 0.862 0.812 0.800 0.767

Smoothing = 1e-6
MI(e)               0.769 0.673 0.705 0.677 0.700 0.692 0.679
PMI^3(e)            0.885 0.769 0.756 0.785 0.773 0.726 0.723
NPMI(e)             0.885 0.846 0.821 0.815 0.750 0.726 0.719
Typicality P(c|e)   0.462 0.577 0.603 0.577 0.569 0.564 0.556
Typicality P(e|c)   0.538 0.615 0.615 0.615 0.608 0.613 0.615
Rep(e)              0.846 0.885 0.897 0.877 0.788 0.777 0.765

Smoothing = 1e-7
MI(e)               0.769 0.692 0.705 0.685 0.719 0.703 0.688
PMI^3(e)            0.885 0.769 0.756 0.792 0.758 0.736 0.725
NPMI(e)             0.769 0.750 0.718 0.700 0.650 0.641 0.633
Typicality P(c|e)   0.462 0.577 0.603 0.577 0.569 0.564 0.556
Typicality P(e|c)   0.500 0.481 0.526 0.523 0.531 0.523 0.523
Rep(e)              0.846 0.865 0.872 0.854 0.765 0.749 0.733

NDCG@k (k = 1, 2, 3, 5, 10, 15, 20)

No smoothing
MI(e)               0.516 0.531 0.519 0.531 0.562 0.574 0.594
PMI^3(e)            0.725 0.664 0.652 0.660 0.628 0.631 0.646
NPMI(e)             0.599 0.597 0.579 0.554 0.540 0.539 0.549
Typicality P(c|e)   0.297 0.380 0.409 0.422 0.438 0.446 0.461
Typicality P(e|c)   0.401 0.386 0.396 0.398 0.401 0.410 0.428
Rep(e)              0.758 0.771 0.745 0.723 0.656 0.647 0.661

Smoothing = 1e-3
MI(e)               0.374 0.414 0.441 0.448 0.473 0.481 0.495
PMI^3(e)            0.484 0.511 0.509 0.502 0.519 0.525 0.533
NPMI(e)             0.692 0.652 0.607 0.603 0.585 0.585 0.592
Typicality P(c|e)   0.297 0.380 0.409 0.422 0.438 0.446 0.460
Typicality P(e|c)   0.703 0.697 0.704 0.681 0.637 0.628 0.626
Rep(e)              0.621 0.580 0.554 0.561 0.554 0.555 0.559

Smoothing = 1e-4
MI(e)               0.407 0.430 0.458 0.462 0.492 0.503 0.512
PMI^3(e)            0.648 0.604 0.579 0.575 0.578 0.576 0.590
NPMI(e)             0.747 0.777 0.761 0.737 0.700 0.685 0.688
Typicality P(c|e)   0.297 0.380 0.409 0.422 0.438 0.446 0.461
Typicality P(e|c)   0.791 0.795 0.802 0.767 0.738 0.729 0.724
Rep(e)              0.758 0.714 0.711 0.689 0.653 0.636 0.653

Smoothing = 1e-5
MI(e)               0.429 0.465 0.478 0.501 0.517 0.528 0.545
PMI^3(e)            0.725 0.647 0.642 0.642 0.627 0.624 0.638
NPMI(e)             0.813 0.779 0.778 0.765 0.730 0.723 0.729
Typicality P(c|e)   0.297 0.380 0.409 0.422 0.438 0.446 0.461
Typicality P(e|c)   0.709 0.728 0.735 0.722 0.702 0.696 0.703
Rep(e)              0.791 0.787 0.762 0.739 0.707 0.703 0.706

Smoothing = 1e-6
MI(e)               0.516 0.510 0.515 0.526 0.546 0.563 0.579
PMI^3(e)            0.725 0.655 0.651 0.654 0.641 0.631 0.649
NPMI(e)             0.791 0.766 0.732 0.728 0.673 0.659 0.668
Typicality P(c|e)   0.297 0.380 0.409 0.422 0.438 0.446 0.461
Typicality P(e|c)   0.495 0.516 0.520 0.508 0.512 0.521 0.540
Rep(e)              0.758 0.784 0.767 0.755 0.691 0.686 0.694

Smoothing = 1e-7
MI(e)               0.516 0.531 0.519 0.530 0.562 0.571 0.592
PMI^3(e)            0.725 0.664 0.652 0.658 0.630 0.631 0.647
NPMI(e)             0.670 0.655 0.633 0.604 0.575 0.570 0.581
Typicality P(c|e)   0.297 0.380 0.409 0.422 0.438 0.446 0.461
Typicality P(e|c)   0.423 0.421 0.415 0.407 0.414 0.424 0.438
Rep(e)              0.758 0.771 0.745 0.725 0.663 0.661 0.668

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

What is Semantic Similarity? bull Are the following instance pairs similar?

bull ltapple microsoftgt

bull ltapple peargt

bull ltapple fruitgt

bull ltapple foodgt

bull ltapple ipadgt

bull ltcar journeygt

Approaches on Term Similarity

bull Categories of approaches for semantic similarity bull String based approach

bull Knowledge based approach bull Use preexisting thesauri taxonomies or encyclopedias such as WordNet

bull Corpus based approach bull Use contexts of terms extracted from web pages web search snippets or other text repositories

bull Embedding based approach bull Will be introduced in detail in "Part 3: Implicit Understanding"


Approaches on Term Similarity (2)

bull Categories

[Figure: a taxonomy of term-similarity approaches: string based approaches; knowledge based approaches (WordNet), divided into path length / lexical chain based and information content based; and corpus based approaches, divided into graph learning algorithm based and snippet search based; with representative and state-of-the-art work including Rada 1989, Resnik 1995, Jcn 1997, Hirst 1998, Lin 1998, Ban 2002, HunTray 2005, Chen 2006, Alvarez 2007, Do 2009, Agirre 2010, Bol 2011 and Sánchez 2011]

bull Framework


Term Similarity Using Semantic Networks [Li et al 2013 Li et al 2015]

[Framework flowchart] Given a term pair <t1, t2>:

Step 1 Type Checking: decide whether the pair is a concept pair, an entity pair or a concept-entity pair

Step 2 Context Representation (Vector): collect entity-distribution contexts for concept terms and concept-distribution contexts for entity terms; cluster the concepts and build context vectors, eg T(t1) and T(t2), or cluster context vectors Cx(t1) and Cy(t2); for a concept-entity pair, collect the concept set of the entity term t1 and select the top-k concepts cx of each cluster

Step 3 Context Similarity: evaluate similarity with cosine, eg Cosine(T(t1), T(t2)), the maximum over (x, y) of Cosine(Cx(t1), Cy(t2)), or the maximum sim(t2, cx) over each pair <t2, cx>

An example [Li et al 2013 Li et al 2015]

For example, for the pair <banana, pear>:

Step 1 Type Checking: both terms are entities, so this is an entity pair

Step 2 Context Representation (Vector): collect the concept context vectors T(banana) and T(pear)

Step 3 Context Similarity: Similarity Evaluation Cosine(T(t1), T(t2)) = 0.916
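A minimal sketch of the Step 3 cosine comparison, assuming the concept context vectors have already been collected from the semantic network (the vectors below are hypothetical):

```python
import math

# Hypothetical concept context vectors T(t) for two entity terms.
T_banana = {"fruit": 0.55, "food": 0.25, "crop": 0.12, "snack": 0.08}
T_pear   = {"fruit": 0.60, "food": 0.22, "crop": 0.10, "tree": 0.08}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

print(round(cosine(T_banana, T_pear), 3))  # high similarity, close to 1
```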

Examples

Term 1                Term 2               Similarity
lunch                 dinner               0.9987
tiger                 jaguar               0.9792
car                   plane                0.9711
television            radio                0.9465
technology company    microsoft            0.8208
high impact sport     competitive sport    0.8155
employer              large corporation    0.5353
fruit                 green pepper         0.2949
travel                meal                 0.0426
music                 lunch                0.0116
alcoholic beverage    sports equipment     0.0314
company               table tennis         0.0003

Complete results: httpadaptseieesjtueducnsimilaritySimCompleteResultspdf

Statistics of Search Queries

Query length distribution:

(a) By traffic: 1 term 44%, 2 terms 29%, 3 terms 17%, 4 terms 7%, 5 terms 2%, more than 5 terms 1%

(b) By # of distinct queries: 1 term 10%, 2 terms 26%, 3 terms 34%, 4 terms 19%, 5 terms 7%, more than 5 terms 4%

Examples: Pokémon Go, Microsoft HoloLens

[Two pie charts: number of instances per query (1 instance, 2 instances, 3 instances, 4 instances, 5 instances, more than 5 instances)]

If the short text has context for the instancehellip

bull python tutorial bull dangerous python bull moon earth distance bull hellip

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Supervised Segmentation [Bergsma et al 2007]

bull Problem divide query into semantic units

bull Approach turn segmentation into position-based binary classification

Example Query

Two man power saw

[two man] [power saw]   [two] [man] [power saw]   [two] [man power] [saw]

Input a query and its positions

Output the decision for making segmentation at each position

Supervised Segmentation

bull Features bull Decision boundary features

bull Statistical features

bull Context features

bull Dependency features

eg indicator features ("the", "is", POS tags in the query); position features (forward/backward)

Mutual information between left and right parts

Bank loan amortization schedule

Context information

bus driverfemale

depend

Supervised Segmentation

bull Segmentation Overview

saw

SVMclassifier

Input query two man power saw

two man power

Output segmentation decision for each position (yesno)

learning features
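A minimal sketch of the position-based classification view; the features and classifier below are illustrative stand-ins rather than the exact feature set of [Bergsma et al 2007]:

```python
from sklearn.linear_model import LogisticRegression

# Each training example is one gap between adjacent query tokens.
# Illustrative features: [mutual information between the left and right parts,
#                         1 if a function word is adjacent to the gap, else 0]
X_train = [[4.2, 0], [3.8, 0], [3.5, 1],     # strongly associated sides: keep together
           [-0.5, 0], [-1.2, 0], [-0.9, 1]]  # weakly associated sides: segment break
y_train = [0, 0, 0, 1, 1, 1]                 # 1 = insert a segment boundary at this gap

clf = LogisticRegression().fit(X_train, y_train)

# Query "two man power saw" has three gaps; hypothetical feature values:
gaps = [[3.3, 0],    # two | man
        [-0.7, 0],   # two man | power
        [3.9, 0]]    # power | saw ("power saw" is a strong unit)
print(clf.predict(gaps))  # 1 marks a boundary; expected to recover [two man] [power saw]
```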

Unsupervised Segmentation [Tan et al 2008]

bull Unsupervised learning for query segmentation

Probability of a generated segmentation $S$ for query $Q$, where $s_1, \dots, s_m$ are the segments:

$$P(S \mid Q) = P(s_1)\,P(s_2 \mid s_1) \cdots P(s_m \mid s_1 s_2 \cdots s_{m-1}) \approx \prod_{s_i \in S} P(s_i) \quad \text{(unigram model)}$$

Valid segment boundary if and only if the pointwise mutual information between the two segments resulting from the split is negative:

$$MI(s_k, s_{k+1}) = \log \frac{P_c([s_k\, s_{k+1}])}{P_c(s_k) \cdot P_c(s_{k+1})} < 0$$

Example query "new york times subscription" with a candidate split into $s_1$ and $s_2$:

$$\log \frac{P_c([\text{new york}])}{P_c(\text{new}) \cdot P_c(\text{york})} > 0$$

so there is no segment boundary between "new" and "york"

Unsupervised Segmentation

bull Find top k segmentations dynamic programming

bull Using EM optimization on the fly

Input: query $w_1 w_2 \cdots w_n$ and the concept probability distribution

Output top k segmentations with highest likehood

Words in a query
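A sketch of the dynamic program that returns the highest-likelihood segmentation under the unigram model; the segment probabilities below are hypothetical:

```python
import math
from functools import lru_cache

# Hypothetical unigram probabilities P(s) for candidate segments.
P = {
    "new": 1e-4, "york": 8e-5, "times": 9e-5, "subscription": 2e-5,
    "new york": 6e-5, "new york times": 4e-5, "york times": 1e-6,
}
DEFAULT = 1e-9  # back-off probability for unseen segments

def best_segmentation(words):
    @lru_cache(maxsize=None)
    def best(i):
        # Best (log-likelihood, segmentation) of the suffix words[i:].
        if i == len(words):
            return 0.0, []
        cands = []
        for j in range(i + 1, len(words) + 1):
            seg = " ".join(words[i:j])
            ll, rest = best(j)
            cands.append((math.log(P.get(seg, DEFAULT)) + ll, [seg] + rest))
        return max(cands, key=lambda x: x[0])
    return best(0)[1]

print(best_segmentation("new york times subscription".split()))
# expected: ['new york times', 'subscription']
```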

Exploit Click-through [Li et al 2011]

bull Motivationbull Probabilistic query segmentation

bull Use click-through data

Output top-3 segmentations:

[bank of america] [online banking]    0.502
[bank of america online banking]      0.428
[bank of] [america] [online banking]  0.001

Q (query) -> URL (click data) -> D (document)

Input query: bank of america online banking

Exploit Click-through

bull Segmentation Model

An interpolated model

global info + click-through info

[credit card] [bank of America]

1 bank of america credit cards contact us overview
2 secured visa credit card from bank of america
3 credit cards overview find the right bank of america credit card for you

Query

Clicked html documents

global info

Click-through info

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Sense Changes with Different Context

watch harry potter read harry potter age harry potter

Movie Book Character

harry potter walkthrough

Game

Entity Recognition in Query [Guo et al 2009]

bull Motivation

Detect named entity in a short text and categorize it

harry potter walkthrough

Single-named-entity query

Example

("harry potter", "walkthrough", "game")

triple <e, t, c>: e = the (possibly ambiguous) named entity, t = the context term(s), c = the class of the entity

Entity Recognition in Query

bull Probabilistic Generative Model

Goal: given a query q, find the triple <e, t, c> that maximizes the probability

Probability to generate the triple (assuming the context only depends on the class): $\Pr(e, t, c) = \Pr(e) \cdot \Pr(c \mid e) \cdot \Pr(t \mid c)$

Objective: given query q, find $(e, t, c)^{*} = \arg\max \Pr(e, t, c)$

The problem then becomes how to estimate Pr(e), Pr(c|e) and Pr(t|c)

Eg "walkthrough" only depends on the class game instead of the entity harry potter

Entity Recognition in Query

bull Probability Estimation by Learning

learning objective:

$$\max \prod_{i=1}^{N} P(e_i, t_i, c_i)$$

Challenge: it is difficult as well as time consuming to manually assign class labels to named entities in queries

Build a training set $T = \{(e_i, t_i)\}$ and view $c_i$ as a hidden variable

New learning problem:

$$\max \prod_{i=1}^{N} P(e_i, t_i) = \max \prod_{i=1}^{N} \sum_{c} P(e_i)\, P(c \mid e_i)\, P(t_i \mid c)$$

solved with the topic model WS-LDA
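A sketch of the inference step once Pr(e), Pr(c|e) and Pr(t|c) have been estimated; all probability tables below are hypothetical:

```python
# Hypothetical model parameters learned offline (eg with WS-LDA).
P_e = {"harry potter": 0.7, "harry": 0.3}
P_c_given_e = {"harry potter": {"game": 0.3, "movie": 0.4, "book": 0.3}}
P_t_given_c = {"game": {"walkthrough": 0.2, "cheats": 0.1},
               "movie": {"trailer": 0.2, "cast": 0.1},
               "book": {"author": 0.1}}

def interpret(query, entity):
    """Return the best triple (e, t, c) for a single-named-entity query."""
    t = query.replace(entity, "").strip()      # context = query minus the entity
    best_c = max(P_c_given_e[entity],
                 key=lambda c: P_e.get(entity, 0.0)
                               * P_c_given_e[entity][c]
                               * P_t_given_c.get(c, {}).get(t, 1e-6))
    return entity, t, best_c

print(interpret("harry potter walkthrough", "harry potter"))
# ('harry potter', 'walkthrough', 'game')
```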

Signal from Click [Pantel et al 2012]

bull Motivation

Predict entity type in Web search

[Diagram: entity, user intent, context and click jointly determine the query type distribution (over 73 entity types) in a generative model]

Signal from Click

bull Joint Model for Prediction

[Plate diagram of the joint model, repeated for each of the Q queries: pick a type from the distribution over types (θ), pick an entity (entity distribution), pick an intent (intent distribution φ), pick context words (word distribution ω) and pick a click (host distribution); latent variables t, τ, i, n, c]

Telegraphic Query interpretation [Sawant et al 2013 Joshi et al 2014]

bull Entity-seeking Telegraphic Queries

bull Interpretation = Segmentation + Annotation

Knowledge base (accuracy) + large corpus (recall)

Example query: "Germany capital" -> result entity: Berlin

bull Overview

Joint Interpretation and Ranking [Sawant et al 2013 Joshi et al 2014]

Annotated Corpus

Telegraphic Query

e1e2e3

Two Models for Interpretation and Ranking

Generative Model

Discriminative Model

Output

bull Generative Model

Joint Interpretation and Ranking [Sawant et al 2013]

[Example, borrowed from U. Sawant (2013): for the query q = "losing team baseball world series 1998", the answer entity E = San Diego Padres (a major league baseball team) is scored by a type model, driven by the type hint "baseball team" (type T), and a context model, whose context matchers such as "lost 1998 world series" match corpus evidence like "Padres have been to two World Series, losing in 1984 and 1998" (context variable Z), with a switch choosing between the two models]

Based on probabilistic language models

bull Discriminative Model

Joint Interpretation and Ranking [Sawant et al 2013]

San_Diego_Padres

losing team baseball world

series 1998

(baseball team)

losing team baseball world

series 1998

(baseball team)

losing team baseball world

series 1998

(t = baseball team)

1998_World_Series

losing team baseball world

series 1998

(series)

losing team baseball world

series 1998

(series)

losing team baseball world

series 1998

(t = series)

Correct entity Incorrect entity

Based on max-margin discriminative learning

bull Queries seek answer entities (e2)

bull Contain (query) entities (e1) target types (t2) relations (r) and selectors (s)

Telegraphic Query Interpretation [Joshi et al 2014]

query e1 r t2 s

dave navarro first band

dave navarro band band first

dave navarro - band first

spider automobile company

spider automobile company

automobile company

-

automobile company company spider

Borrow from M Joshi (2014)

Improved Generative Model

bull Generative Model[Sawant et al 2013]

[Joshi et al 2014]Consider e1

(in q) and r

Improved Discriminative Model

bull Discriminative Model[Sawant et al 2013]

[Joshi et al 2014]

Consider e1

(in q) and r

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

bull Input a short text

bull Output semantic interpretation

bull Three steps in understanding a short text

wanna watch eagles band

watch[verb] eagles[entity](band) band[concept]

wanna watch eagles band watch[verb] eagles[entity](band) band[concept]

watch eagles band watch[verb] eagles[entity] band[concept]

Step 1 Text Segmentation ndash divide into a sequence of terms in vocabulary

Step 2 Type detection ndash determine the best type of each term

Step 3 Concept Labeling ndash infer the best concept of each entity within context

Text segmentationbull Observations

bull Mutual Exclusion ndash terms containing the same word mutually exclude each other

bull Mutual Reinforcement ndash related terms mutually reinforce each other

bull Build a Candidate Term Graph (CTG)

"vacation april in paris"   "watch harry potter"

[Candidate term graphs for the two queries: nodes are candidate terms (vacation, april, paris, april in paris; watch, harry potter, and their sub-terms), edges carry term-relatedness weights (eg 0.029, 0.005, 0.047, 0.041 and 0.014, 0.092, 0.053, 0.018), and nodes are annotated with fractions such as 1/3 and 2/3]

Find best segmentation

bull Best segmentation= sub-graph in CTG whichbull Is a complete graph (clique)

bull No mutual exclusion

bull Has 100 word coveragebull Except for stopwords

bull Has the largest average edge weight

Is a segmentation

Best segmentation

[Candidate term graphs for "vacation april in paris" and "watch harry potter", as above]

Find best segmentation

bull Best segmentation= sub-graph in CTG whichbull Is a complete graph (clique)

bull No mutual exclusion

bull Has 100 word coveragebull Except for stopwords

bull Has the largest average edge weight

Maximal Clique

Best segmentation

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23
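A sketch of the clique-based search over a candidate term graph, using a hypothetical version of the "vacation april in paris" example: keep sub-graphs that are cliques, violate no mutual exclusion, cover all non-stopword query words, and pick the one with the largest average edge weight:

```python
from itertools import combinations

# Candidate terms with the query words they cover, plus relatedness edges.
terms = {"vacation": {"vacation"}, "april": {"april"}, "paris": {"paris"},
         "april in paris": {"april", "paris"}}
edges = {("vacation", "april"): 0.029, ("vacation", "paris"): 0.041,
         ("vacation", "april in paris"): 0.047, ("april", "paris"): 0.005}
query_words = {"vacation", "april", "paris"}   # "in" treated as a stopword

def weight(a, b):
    return edges.get((a, b), edges.get((b, a)))

def mutually_exclusive(a, b):
    return bool(terms[a] & terms[b])           # share a word -> exclude each other

def best_segmentation():
    best, best_score = None, -1.0
    for k in range(1, len(terms) + 1):
        for cand in combinations(terms, k):
            pairs = list(combinations(cand, 2))
            if any(mutually_exclusive(a, b) or weight(a, b) is None for a, b in pairs):
                continue                        # not a clique, or violates exclusion
            if set().union(*(terms[t] for t in cand)) != query_words:
                continue                        # incomplete word coverage
            score = sum(weight(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
            if score > best_score:
                best, best_score = cand, score
    return best

print(best_segmentation())   # ('vacation', 'april in paris')
```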

Type Detection

bull Pairwise Modelbull Find the best typed-term for each term so that the

Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight

watch[v]

watch[e]

watch[c]

watch

free[adj]

free[v]

movie[c]

movie[e]

free

movie

Concept Labeling

bull Entity disambiguation is the most important task of concept labelingbull Filterre-rank of the original concept cluster vector

bull Weighted-Votebull The final score of each concept cluster is a combination

of its original score and the support from context using concept co-occurrence

watch harry potter read harry potter

movie book

Example of Entity Disambiguation[Hua et al 2015 (ICDE Best Paper) Hua et al 2016]

Co-occurrence network

Concept Vector

Semantic network

Short Text

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

[Example: for the short text "ipad apple", the isA network gives "apple" a concept vector (c1, p1), (c2, p2), (c3, p3), … over candidate concepts such as fruit…, company…, food…, product…; filtering by concept co-occurrence with the concepts of "ipad" (product…, device…) keeps the co-occurring concepts (product…, brand…, company…, device…) and drops the fruit/food senses]

Mining Lexical Relationships[Wang et al 2015b]

bull Lexical knowledge represented by the probabilities

Example "watch harry potter", with candidate concepts such as verb, product, book and movie. Three kinds of probabilities are used:

bull the role distribution of a term, $p(z \mid t)$: eg $p(\text{verb} \mid \text{watch})$, $p(\text{instance} \mid \text{watch})$

bull the concept distribution of an instance, $p(c \mid e) = p(c \mid t, z = \text{instance})$: eg $p(\text{movie} \mid \text{harry potter})$, $p(\text{book} \mid \text{harry potter})$

bull the concept distribution of a term given its role, $p(c \mid t, z)$: eg $p(\text{movie} \mid \text{watch}, \text{verb})$

where $e$ denotes an instance, $t$ a term, $c$ a concept and $z$ a role

Understanding Queries [Wang et al 2015b]

bull Goal: to rank the concepts and find $\arg\max_{c} p(c \mid t, q)$

bull Build an online subgraph from the offline semantic network, covering the query and all its possible segmentations

bull Random walk with restart [Sun et al 2005] on the online subgraph
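A sketch of random walk with restart on a small online subgraph (the nodes and transition weights below are hypothetical); the converged scores can then be used to rank concepts for the query:

```python
import numpy as np

# Nodes of a hypothetical online subgraph for the query "watch harry potter".
nodes = ["watch", "harry potter", "movie", "book", "verb"]
# Column-stochastic transition matrix: W[i, j] = P(node j -> node i).
W = np.array([
    [0.0, 0.1, 0.2, 0.1, 0.6],   # -> watch
    [0.3, 0.0, 0.5, 0.6, 0.0],   # -> harry potter
    [0.4, 0.5, 0.0, 0.3, 0.2],   # -> movie
    [0.1, 0.4, 0.1, 0.0, 0.2],   # -> book
    [0.2, 0.0, 0.2, 0.0, 0.0],   # -> verb
])
W = W / W.sum(axis=0)            # make sure columns sum to 1

restart = np.array([0.5, 0.5, 0.0, 0.0, 0.0])   # restart at the query terms
c = 0.15                                         # restart probability

r = restart.copy()
for _ in range(100):             # power iteration until (approximate) convergence
    r = (1 - c) * W @ r + c * restart

print(sorted(zip(nodes, r), key=lambda x: -x[1]))
# "movie", supported by both "watch" and "harry potter", should score highly
```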

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

bull Example ldquopopular smart cover iphone 5srdquo

bull Definition bull Head acts to name the general (semantic) category to which the

whole short text belongs Usually the head is the intent of the short text

bull ldquosmart coverrdquo intent of the query

bull Constraints distinguish this member from other members of the same category

bull ldquoiphone 5srdquo limit the type of the head

bull Non-Constraint Modifiers (aka Pure Modifiers) are subjectivemodifiers which can be dropped without changing intent

bull ldquopopularrdquo subjective can be neglected

Non-Constraint Modifiers Mining Construct Modifier Networks

Edges form a Modifier Network

Concept Hierarchy Tree in ldquoCountryrdquo domain

Modifier Network in ldquoCountryrdquo domainIn this case ldquoLargerdquo and ldquoToprdquo are pure modifiers

Country

Asian country

Developed country

Western country

Asian Developed Western

Western developed

country

Top western country

Large

Large Top

Top

WesternLarge Asian

country

Large developed

country

Top developed

country

Country

Asian Western

Developed

Large Top

bull Betweenness centrality is a measure of a nodes centrality in a network

bull Betweenness of node $v$ is defined as

$$g(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$$

bull where $\sigma_{st}$ is the total number of shortest paths from node $s$ to node $t$ and $\sigma_{st}(v)$ is the number of those paths that pass through $v$

bull Normalization & Aggregation

bull For a pure modifier it should have low betweenness centrality aggregation score PMS(t)

Non-Constraint Modifiers Mining Betweenness centrality
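A sketch of scoring candidate pure modifiers by betweenness centrality, on a toy modifier network built here with networkx (the graph is hypothetical):

```python
import networkx as nx

# Hypothetical modifier network for the "country" domain: edges connect terms
# that co-occur in observed concepts such as "western developed country" or
# "large asian country".
G = nx.Graph()
G.add_edges_from([
    ("country", "asian"), ("country", "western"), ("country", "developed"),
    ("asian", "developed"), ("western", "developed"),
    ("large", "asian"), ("large", "developed"), ("large", "western"),
    ("top", "western"), ("top", "developed"),
])

bc = nx.betweenness_centrality(G, normalized=True)

# Pure (non-constraint) modifiers should sit on few shortest paths,
# i.e. have a low betweenness-based score PMS(t).
for term, score in sorted(bc.items(), key=lambda x: x[1]):
    print(f"{term:10s} {score:.3f}")
```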

Head-Constraints Mining [Wang et al 2014b]

bull A term can be a head sometimes and be a constraint in some other cases

bull Eg Seattle hotel Seattle hotel job

head headconstraintconstraintconstraint

Head-Constraints Mining Acquiring Concept Patterns

[Pipeline for building the concept pattern dictionary]

1 Get entity pairs (entity 1 = head, entity 2 = constraint) joined by prepositions from the query log, by extracting patterns such as A for B, A of B, A with B, A in B, A on B, A at B, …; eg "cover for iphone 6s", "battery for sony a7r", "wicked on broadway"

2 Conceptualization: map each entity to its concepts, eg entity1 -> concept11, concept12, concept13, concept14 and entity2 -> concept21, concept22, concept23

3 Extract concept patterns for each pair, (concept11, concept21), (concept11, concept22), (concept11, concept23), …, and aggregate them into the Concept Pattern Dictionary

Why Concepts Can't Be Too General bull It may cause too many concept pattern conflicts: we can't distinguish head and modifier for general concept pairs

Derived concept pattern (head = device, modifier = company), supporting entity pairs: (iphone 4, verizon), (modem, comcast), (wireless router, comcast), (iphone 4, tmobile)

Derived concept pattern (head = company, modifier = device), supporting entity pairs: (amazon books, kindle), (netflix, touchpad), (skype, windows phone), (netflix, ps3)

Conflict: the two patterns give opposite head/modifier directions for the same concept pair

Why Concepts Can't Be Too Specific bull It may generate concepts with little coverage

bull The concept regresses to the entity bull Large storage space: up to (million × million) patterns

… (device, largest desktop OS vendor), (device, largest software development company), (device, largest global corporation), (device, latest windows and office provider), …

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept Patterns

Cluster size   Sum of cluster score   Head / Constraint                   Score
615            21.14691               breed / state                       357298460224501
296            7.752357               game / platform                     627403476771856
153            3.466804               accessory / vehicle                 53393705094809
70             1.18259                browser / platform                  132612807637391
22             1.010993               requirement / school                271407526294823
34             0.9489159              drug / disease                      154602405333541
42             0.8992995              cosmetic / skin condition           814659415003929
16             0.7421599              job / city                          27903732555528
32             0.710403               accessory / phone                   246513830851194
18             0.6692376              software / platform                 210126322725878
20             0.6444603              test / disease                      239774028397537
27             0.5994205              clothes / breed                     98773996282851
19             0.5913545              penalty / crime                     200544192793488
25             0.5848804              tax / state                         240081818612579
16             0.5465424              sauce / meat                        183592863621553
18             0.4809389              credit card / country               142919087972152
14             0.4730792              food / holiday                      14554140330924
11             0.4536199              mod / game                          257163856882439
29             0.4350954              garment / sport                     471533326845442
23             0.3994886              career information / professional   732726483731257
15             0.386065               song / instrument                   128189481818135
18             0.378213               bait / fish                         780426514113169
22             0.3722948              study guide / book                  508339765053921
19             0.3408953              plugins / browser                   550326072627126
14             0.3305753              recipe / meat                       882779863422951
18             0.3214226              currency / country                  110825444188352
13             0.3180272              lens / camera                       186081673263957
9              0.316973               decoration / holiday                130055844126533
16             0.314875               food / animal                       7338544366514

Example cluster "game / platform": member concept patterns include game/platform, game/device, video game/platform, game console/game pad, game/gaming platform

Game (Head)      Platform (Modifier)
angry birds      android
angry birds      ios
angry birds      windows 10
…                …

Detection

Head Modifier Relationship

bull Train a classifier on

(head-embedding modifier-embedding)

bull Training data bull Positive: (head, modifier) bull Negative: (modifier, head)

bull Precision >= 0.9, Recall >= 0.9

bull Disadvantage not interpretable
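A sketch of this detector: train any off-the-shelf classifier on concatenated (head-embedding, modifier-embedding) vectors, using reversed pairs as negatives; the embeddings and pairs below are tiny hypothetical examples:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny hypothetical word/phrase embeddings.
emb = {
    "angry birds": np.array([0.9, 0.1, 0.0]), "android":   np.array([0.1, 0.8, 0.1]),
    "modern warfare": np.array([0.8, 0.2, 0.1]), "ios":    np.array([0.2, 0.9, 0.0]),
    "smart cover": np.array([0.7, 0.0, 0.3]), "iphone 5s": np.array([0.1, 0.6, 0.4]),
}

known_pairs = [("angry birds", "android"), ("modern warfare", "ios")]  # (head, modifier)

X, y = [], []
for head, mod in known_pairs:
    X.append(np.concatenate([emb[head], emb[mod]])); y.append(1)   # correct order
    X.append(np.concatenate([emb[mod], emb[head]])); y.append(0)   # reversed = negative

clf = LogisticRegression().fit(X, y)

def is_head_modifier(a, b):
    """True if a is predicted to be the head and b the modifier."""
    return clf.predict([np.concatenate([emb[a], emb[b]])])[0] == 1

print(is_head_modifier("smart cover", "iphone 5s"))
# should print True: "smart cover" is the head, "iphone 5s" the modifier
```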

Syntactic Parsing based on HM

bull Information is incomplete bull Prepositions and other function words

bull Within a noun compound el capitan macbook pro

bull Why not train a parser for web queries

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges: Short Texts Lack Grammatical Signals bull Lack of function words and word order

bull "toys queries" has ambiguous intent

bull "distance earth moon" has clear intent bull many equivalent forms: "earth moon distance", "earth distance moon", hellip

Challenges Syntactic Parsing of Queries

bullNo standard

bullNo ground-truth

Why is syntactic parsing of queries even a legitimate problem

Derive Syntax from Semantics[Sun et al 2016]

bull Query ldquothai food houstonrdquo

bull Clicked sentence

bull Project dependency to the query

A Treebank for Short Texts

bull Given query 119902

bull Given 119902rsquos clicked sentence 119904

bull Parse each 119904

bull Project dependency from 119904 to 119902

bull Aggregate dependencies

Algorithm of Projection

Result Examples

Results

bull Random queries: QueryParser UAS 0.83, LAS 0.75; Stanford UAS 0.72, LAS 0.64

bull Queries with no function words: QueryParser UAS 0.82, LAS 0.73; Stanford UAS 0.70, LAS 0.61

bull Queries with function words: QueryParser UAS 0.90, LAS 0.85; Stanford UAS 0.86, LAS 0.80

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

bull Measuring similarity between two short texts or sentences

bull Basic idea: word-by-word comparison using embedding vectors

bull Use a saliency-weighted semantic graph to compute similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired: bins of all edges, bins of max edges

Similarity measurement (inspired by BM25), for two short texts $s_l$ and $s_s$, where $sem(w, s_s)$ is the semantic similarity of term $w$ to the short text $s_s$:

$$f_{sts}(s_l, s_s) = \sum_{w \in s_l} IDF(w) \cdot \frac{sem(w, s_s) \cdot (k_1 + 1)}{sem(w, s_s) + k_1 \cdot \left(1 - b + b \cdot \frac{|s_s|}{avgl}\right)}$$
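A sketch of the saliency-weighted similarity above, taking sem(w, s) to be the best cosine between w and any word of the other text; the word vectors and IDF values are hypothetical:

```python
import numpy as np

# Hypothetical word embeddings and IDF values.
vec = {"thai": np.array([0.9, 0.1]), "food": np.array([0.7, 0.5]),
       "restaurant": np.array([0.6, 0.6]), "houston": np.array([0.1, 0.9]),
       "texas": np.array([0.2, 0.9])}
idf = {"thai": 2.0, "food": 1.2, "restaurant": 1.3, "houston": 2.2, "texas": 2.1}

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def sem(w, s):
    # Semantic match of word w against short text s: its best-matching word.
    return max(cos(vec[w], vec[w2]) for w2 in s)

def f_sts(sl, ss, k1=1.2, b=0.75, avgl=3.0):
    score = 0.0
    for w in sl:
        s = sem(w, ss)
        score += idf[w] * (s * (k1 + 1)) / (s + k1 * (1 - b + b * len(ss) / avgl))
    return score

print(round(f_sts(["thai", "food", "houston"], ["thai", "restaurant", "texas"]), 3))
```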

From the Concept View

From the Concept View [Wang et al 2015a]

Co-occurrence Network

Bags of Concepts

Semantic Network

Short Text 1

Short Text 2

Concept Vector 1[(c1 score1) (c2 score2)hellip]

Concept Vector 2[(c1rsquo score1rsquo) (c2rsquo score2rsquo)hellip]

Similarity

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

bull Explicit short text understanding benefits a lot of application scenarios bull Ads/search semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

[Bar charts over query deciles 4 to 10: Mainline Ads (left, y-axis 0.00 to 6.00) and Sidebar Ads (right, y-axis 0.00 to 0.60)]

Definition Mining [Hao et al 2016]

bull Definition scenarios search engines QnA etc

bull Why conceptualization is useful for definition mining bull Example: "What is Emphysema?"

Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

bull This sentence has the form of a definition bull Embedding is helpful to some extent, but it also returns high similarity scores for (emphysema, disease) and (emphysema, smoking)

bull Conceptualization can provide strong semantics bull Contextual embedding can also provide semantic similarity beyond isA

Answer 1

Answer 2

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

Offline | Online

Original short text: justin bieber graduates

hellip

Knowledge base

Conceptualization

Concept Vector

Entity Extraction

Candidates Generation

Classification & Ranking

Model Learning

Concept Weighting

Model hellip Model i hellip Model N

Concept Model

Class 1 hellip Class i hellip Class N

Training Data

<Music, Score>

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept SpaceArticle titlestagsin this category

$p_i$, $p_j$

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

hellip

hellip

$\omega_i$, $\omega_j$

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

$\omega_i$, $\omega_j$, $p_i$, $p_j$
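A sketch of the online scoring step: compare the short text's concept vector against each category's concept model in the shared concept space (all weights below are hypothetical):

```python
import math

# Hypothetical category concept models (concept -> weight omega).
category_models = {
    "Music": {"singer": 0.6, "album": 0.3, "celebrity": 0.1},
    "Movie": {"actor": 0.5, "film": 0.4, "celebrity": 0.1},
    "TV":    {"tv show": 0.6, "host": 0.3, "celebrity": 0.1},
}

# Conceptualization of "justin bieber graduates" (concept -> score p).
query_concepts = {"singer": 0.7, "celebrity": 0.25, "teen star": 0.05}

def cosine(u, v):
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

scores = {cat: cosine(query_concepts, model) for cat, model in category_models.items()}
print(max(scores, key=scores.get))   # -> 'Music'
```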

Precision performance on each category [Wang et al 2014a]

           BocSTC   LM_ch   SVM    VSM_cosine   LM_d   Entity_ESA
Movie      0.71     0.91    0.84   0.81         0.72   0.56
Money      0.97     0.95    0.54   0.57         0.52   0.74
Music      0.97     0.90    0.88   0.73         0.68   0.58
TV         0.96     0.46    0.92   0.56         0.51   0.55

[Bar chart of precision per category for each method, y-axis from 0.3 to 1.0]

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledge in SIGMOD 2008

bull [ Auer et al 2007 ] Sören Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWC/ASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Börschinger Cäcilia Zirn and Anas Elghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The People's Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou "Understand Short Texts by Harvesting and Analyzing Semantic Knowledge" IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and Xindong Wu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny BoyesBraem Basic objects in natural categories Cognitive psychology 8(3)382ndash439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny ZhuToward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 10: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

httplinghublider-projecteullod-cloud

Linguistic Linked Open Data Cloud

Cyc

short text understanding

(internal representation)

answer

knowledge knowledge

1 ldquoPython Tutorialrdquo2 ldquoWho was the US President when the Angels won the

World Seriesrdquo

linguistic common sense knowledge

Encyclopedia knowledge

Common Sense Knowledge vs Encyclopedia Knowledge

Common Sense Knowledge Base

Encyclopedia Knowledge Base

Common senselinguistic knowledge among terms

EntitiesFacts

isAisPropertyOf

co-occurrencehellip

DayOfBirthLocatedInSpouseOf

hellip

Typicality basic level of categorization

Black or WhitePrecision

WordNet KnowItAll NELLProbase hellip

Freebase Yago DBPedia Google knowledge graph hellip

Special cases

WordNet [Stark et al 1998]

bull WordNetreg is a large lexical database of English Nouns verbs adjectives and adverbs are grouped into sets of cognitive synonyms (synsets) each expressing a distinct concept

bull S (n) China Peoples Republic of China mainland China Communist China Red China PRC Cathay (a communist nation that covers a vast territory in eastern Asia the most populous country in the world)

bull The project began in the Princeton University Department of Psychology and is currently housed in the Department of Computer Science

bull Homepage httpwordnetprincetoneduwordnetabout-wordnetbull Download httpwordnetprincetoneduwordnetdownload

Brief Introduction

Statistics

Sample

Authors

URLs

POS Unique Synsets TotalStrings Word-Sense Pairs

Noun 117798 82115 146312Verb 11529 13767 25047Adjective 21479 18156 30002Adverb 4481 3621 5580Totals 155287 117659 206941

KnowItAll Extract high-quality knowledgefrom the Web [Banko et al 2007 Etzioni et al 2011]

bull OpenIE distills semantic relations fromWeb-scale natural language texts

bull TextRunner -gt ReVerb -gt Open IE part of KnowItAll

bull Yielding over 5 billion extraction from over a billion web pages

bull From ldquoUS president Barack Obama gave his inaugural address on January 20 2013rdquoTo (Barack Obama is president of US)

(Barack Obama gave [his inaugural address on January 20 2013])

bull OpenIE v413 has been released

bull Turing Center at the University of Washington

bull httpopenieallenaiorgbull httpreverbcswashingtonedu

Brief Introduction

Statistics

Sample

News

Authors

URLs

NELL Never-Ending Language Learning [Carlson et al 2010]

• NELL is a research project that attempts to create a computer system that learns over time to read the web It has been running continuously since January 2010

• It has accumulated over 50 million candidate beliefs by reading the web held at different levels of confidence

• Out of these 50 million candidate beliefs 2,817,156 are held with high confidence

Brief Introduction

Statistics

Sample

NELL Never-Ending Language Learning

• It is continually learning facts on the web and its resources are publicly available

bull NELL research team at CMU

bull Homepage httprtwmlcmuedurtwbull Download httprtwmlcmuedurtwresources

News

Authors

URLs

Probase [Wu et al 2012]

• Probase is a semantic network that aims to make machines "aware" of the mental world of human beings, so that machines can better understand human communication

Brief Introduction

Probase network

Nodes: Concepts ("Spanish Artists"), Entities ("Pablo Picasso"), Attributes ("Birthday"), Verbs/Adjectives ("Eat", "Sweet")

Edges: isA (concept-entity), isPropertyOf (attributes), Co-occurrence (isCEOof, LocatedIn, etc.)

• 5,401,933 unique concepts • 12,551,613 unique instances • 87,603,947 isA relations

Example concepts: countries, basic watercolor techniques, celebrity wedding dress designers

Probase

bull Microsoft Research

• Public release coming soon in Aug/Sept 2016 • Project homepage: http://research.microsoft.com/probase

Concepts

Authors

URLs

Probase isA error rate: < 1.1%, and < 10% for random pairs

Freebase [Bollacker et al 2008]

bull Freebase is a well-known collaborative knowledge base consisting of data composed mainly by its community

• Freebase contains more than 23 million entities • Freebase contains 1.9 billion triples • Each triple is organized in the form of

<subject> <predicate> <object>

Brief Introduction

Statistics

• Freebase is a collection of facts • Freebase only contains nodes and links • Freebase is a labeled graph

Freebase -> Wikidata

• Freebase data was integrated into Wikidata • The Freebase API will be completely shut down on Aug 31, 2016, replaced by the Google Knowledge Graph API

• Freebase Community

• Homepage: http://wiki.freebase.com/wiki/Main_Page • Download: https://developers.google.com/freebase • Wikidata: https://www.wikidata.org

News

Authors

URLs

Google Knowledge Graph

• Knowledge Graph is a knowledge base used by Google to enhance its search engine's results with semantic-search information gathered from a wide variety of sources

• 570 million objects and more than 18 billion facts about relationships between different objects

• Google Inc

• Homepage: https://www.google.com/intl/es419/insidesearch/features/search/knowledge.html

Brief Introduction

Statistics

Sample

Authors

URLs

YAGO [Suchanek et al 2007]

bull YAGO is a huge semantic knowledge base derived from GeoNames WordNet and Wikipedia (10 Wikipedias in different languages)

• More than 10 million entities (persons, organizations, cities, etc.) • More than 120 million facts about entities • More than 35,000 classes assigned to entities • Many of its facts and entities have a temporal and a spatial dimension attached

Brief Introduction

Sample: <Albert_Einstein> <isMarriedTo> <Elsa_Einstein>

Statistics

YAGO

News • An evaluated version of YAGO3 (combining information from Wikipedias in different languages) was released [15 Sep 2015]

Authors • Max Planck Institute for Informatics in Saarbrücken, Germany, and the DBWeb group at Télécom ParisTech University

URLs • Homepage: http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago • Download: http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/downloads

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Statistics of Search Queries

(a) By traffic: 1 term 44%, 2 terms 29%, 3 terms 17%, 4 terms 7%, 5 terms 2%, more than 5 terms 1%

(b) By # of distinct queries: 1 term 10%, 2 terms 26%, 3 terms 34%, 4 terms 19%, 5 terms 7%, more than 5 terms 4%

Example queries: "Pokémon Go", "Microsoft HoloLens"; the same queries can also be broken down by number of instances (1 instance, 2 instances, 3 instances, 4 instances, 5 instances, more than 5 instances)

If the short text is a single instance…

• Python • Microsoft • Apple • …

Single Instance Understanding

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

Word Ambiguity • Word sense disambiguation relies on dictionaries

(e.g., WordNet)

Take a seat on this chair

The chair of the Math Department

Instance Ambiguity

bull Instance sense disambiguation extra knowledge needed

I have an apple pie for lunch

He bought an apple ipad

Here "apple" is a proper noun

Ambiguity [Hua et al 2016]

bull Many instances are ambiguous

bull Intuition ambiguous instances have multiple senses

short text instance sense

population china china country

glass vs china china fragile item

pear apple apple fruit

microsoft apple apple company

read harry potter harry potter book

watch harry potter harry potter movie

age of harry potter harry potter character

Pre-definition for Ambiguity (1) Sense [Hua et al 2016]

• What is a Sense in semantic networks? • A sense is a hierarchy of concept clusters

(Figure: for example, the sense region is a hierarchy over concept clusters such as country, state, city, covering an instance like Germany; similarly, creature covers animal and predator, and crop/food covers fruit, vegetable and meat)

Pre-definition for Ambiguity (2) Concept Cluster [Li et al 2013 Li et al 2015]

• What is a Concept Cluster (CL)? • Similar concepts are clustered into a concept cluster using a K-Means-like approach (k-Medoids)

Example cluster "Fruit": Fruit, Fresh fruit, Juice, Tropical fruit, Berry, Exotic fruit, Seasonal fruit, Fruit juice, Citrus fruit, Soft fruit, Dry fruit, Wild fruit, Local fruit, …

Example cluster "Company": Company, Client, Firm, Manufacturer, Corporation, large company, Rival, Giant, big company, local company, large corporation, international company, …

Definitions of Instance Ambiguity [Hua et al 2016]

• 3 levels of instance ambiguity

• Level 0: unambiguous. Contains only 1 sense, e.g. dog (animal), beijing (city), potato (vegetable)

• Level 1: unambiguous and ambiguous both make sense. Contains 2 or more senses, but these senses are related, e.g. google (company & search engine), french (language & country), truck (vehicle & public transport service)

• Level 2: ambiguous. Contains 2 or more senses, and the senses are very different from each other, e.g. apple (fruit & company), jaguar (animal & company), python (animal & language)

Ambiguity Score

• Using the top-2 senses to calculate the ambiguity score:

score = 0, if level = 0
score = w(s_2|e) / w(s_1|e) * (1 - similarity(s_1, s_2)), if level = 1
score = 1 + w(sc_2|e) / w(sc_1|e) * (1 - similarity(sc_1, sc_2)), if level = 2

Denote the top-2 senses as s_1 and s_2, and the top-2 sense clusters as sc_1 and sc_2. The similarity of two sense clusters is the maximum similarity of their senses:

similarity(sc_1, sc_2) = max similarity(s_i in sc_1, s_j in sc_2)

For an entity e, the weight (popularity) of a sense s_i is the sum of the weights of its concept clusters:

w(s_i|e) = w(H_i|e) = sum_{CL_j in H_i} P(CL_j|e)

For an entity e, the weight (popularity) of a sense cluster sc_i is the sum of the weights of its senses:

w(sc_i|e) = sum_{s_j in sc_i} w(s_j|e)
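To make the scoring concrete, here is a minimal sketch in Python; the sense weights, sense-cluster weights and the pairwise similarity function are assumed inputs (toy data structures, not part of the original slides).

def ambiguity_score(level, sense_weights, cluster_weights, similarity):
    # sense_weights: sense -> w(s|e); cluster_weights: sense cluster -> w(sc|e)
    # similarity: function over two senses or two sense clusters, values in [0, 1]
    if level == 0:
        return 0.0
    if level == 1:
        (s1, w1), (s2, w2) = sorted(sense_weights.items(), key=lambda kv: -kv[1])[:2]
        return (w2 / w1) * (1.0 - similarity(s1, s2))
    # level == 2: use the top-2 sense clusters instead of the top-2 senses
    (c1, w1), (c2, w2) = sorted(cluster_weights.items(), key=lambda kv: -kv[1])[:2]
    return 1.0 + (w2 / w1) * (1.0 - similarity(c1, c2))

# e.g. for "apple": cluster_weights roughly {"fruit/food/tree": 0.537, "company/brand": 0.271};
# with a low inter-cluster similarity this yields a score around 1.4, as in the Level-2 examples below.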

Examples

• Level 0
• california: country, state, city, region, institution (0.943)
• fruit: food, product, snack, carbs, crop (0.827)
• alcohol: substance, drug, solvent, food, addiction (0.523)
• computer: device, product, electronics, technology, appliance (0.537)
• coffee: beverage, product, food, crop, stimulant (0.73)
• potato: vegetable, food, crop, carbs, product (0.896)
• bean: food, vegetable, crop, legume, carbs (0.801)

Examples (cont) • Level 1

• nike, score = 0.034: company/store 0.861, brand 0.035, shoe/product 0.033
• twitter, score = 0.035: website/tool 0.612, network 0.165, application 0.033, company 0.031
• facebook, score = 0.037: website/tool 0.595, network 0.17, company 0.053, application 0.029
• yahoo, score = 0.38: search engine 0.457, company/provider/account 0.281, website 0.0656
• google, score = 0.507: search engine 0.46, company/provider/organization 0.377, website 0.0449

Examples (cont)

• Level 2
• jordan, score = 1.02: country/state/company/regime 0.92, shoe 0.02
• fox, score = 1.09: animal/predator/species 0.74, network 0.064, company 0.035
• puma, score = 1.15: brand/company/shoe 0.655, species/cat 0.116
• gold, score = 1.21: metal/material/mineral resource/mineral 0.62, color 0.128
• soap, score = 1.22: product/toiletry/substance 0.49, technology/industry standard 0.11
• silver, score = 1.24: metal/material/mineral resource/mineral 0.638, color 0.156
• python, score = 1.29: language 0.667, snake/animal/reptile/skin 0.193
• apple, score = 1.41: fruit/food/tree 0.537, company/brand 0.271

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

A Concept View of "Microsoft"

(Figure: "Microsoft" is linked to concepts at different levels of abstraction, e.g. company, software company, international company, technology leader, largest desktop OS vendor, …)

Basic-level Conceptualization (BLC)[Rosch et al 1976]

(Figure: basic-level conceptualization examples for instances such as "KFC" and "BMW")

How to Make BLC

• Naive approaches
• Typicality: an important measure for understanding the relationship between an object and its concept
• Pointwise Mutual Information (PMI): a common measure of the strength of association between two terms

Naive Approach 1: Typicality

P(robin | bird) > P(penguin | bird): "robin" is a more typical bird than "penguin"

P(USA | country) > P(Seychelles | country): "USA" is a more typical country than "Seychelles"

Using Typicality for BLC

• Associate each isA relationship (e isA c) with typicality scores P(e|c) and P(c|e):

P(e|c) = n(c, e) / n(c)        P(c|e) = n(c, e) / n(e)

• P(e|c) indicates how typical (or popular) e is in the given concept c

• P(c|e) indicates how typical (or popular) the concept c is given e

• However, neither score alone gives the basic-level concept: for "Microsoft", "company" has high typicality P(c|e), while "largest desktop OS vendor" has high typicality P(e|c)

Naive Approach 2: PMI [Manning and Schutze 1999]

• Pointwise mutual information (PMI) is a measure of association used in information theory and statistics

• Consider using the PMI between concept c and instance e to find the basic-level concepts as follows:

PMI(e, c) = log [ P(e, c) / (P(e) P(c)) ] = log P(e|c) - log P(e)

• However: in basic level of categorization we are interested in finding a concept for a given e, which means P(e) is a constant

• Thus ranking by PMI(e, c) is the same as ranking by P(e|c)

Using Rep(e, c) for BLC [Wang et al 2015b]

• The measure Rep(e, c) = P(c|e) * P(e|c)

• (With PMI) If we take the logarithm of our scoring function we get:

log Rep(e, c) = log [ P(c|e) * P(e|c) ] = log [ (P(e, c)/P(e)) * (P(e, c)/P(c)) ] = log [ P(e, c)^2 / (P(e) P(c)) ] = PMI(e, c) + log P(e, c) = PMI^2

• (With Commute Time) The commute time between an instance e and a concept c is:

Time(e, c) = sum_{k=1..inf} 2k * P_k(e, c) = sum_{k=1..T} 2k * P_k(e, c) + sum_{k=T+1..inf} 2k * P_k(e, c)
           >= sum_{k=1..T} 2k * P_k(e, c) + 2(T+1) * (1 - sum_{k=1..T} P_k(e, c)); taking T = 1, this lower bound equals 4 - 2 * Rep(e, c)

• Given e, the chosen c should be its typical concept (shortest expected distance)

• Given c, the chosen e should be its typical instance (shortest expected distance)

• Ranking by Rep(e, c) is thus a process of finding the concept nodes having the shortest expected distance to e
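As a concrete illustration, here is a minimal sketch that ranks candidate basic-level concepts of an instance by Rep(e, c) = P(c|e) * P(e|c); the isA counts and concept names below are made-up toy values, not Probase data.

from collections import defaultdict

# n[(c, e)] = frequency of the isA pair "e isA c"; toy counts for illustration
n = {
    ("company", "microsoft"): 8000,
    ("software company", "microsoft"): 3000,
    ("largest desktop os vendor", "microsoft"): 50,
}
n_c = defaultdict(int)   # n(c): total count of concept c
n_e = defaultdict(int)   # n(e): total count of instance e
for (c, e), f in n.items():
    n_c[c] += f
    n_e[e] += f
# in a real taxonomy n(c) also counts the concept's other instances; stand-in totals here
n_c["company"] += 2_000_000
n_c["software company"] += 50_000
n_c["largest desktop os vendor"] += 10

def rep(e, c):
    p_e_given_c = n[(c, e)] / n_c[c]   # P(e|c) = n(c,e) / n(c)
    p_c_given_e = n[(c, e)] / n_e[e]   # P(c|e) = n(c,e) / n(e)
    return p_e_given_c * p_c_given_e

e = "microsoft"
ranked = sorted({c for (c, ee) in n if ee == e}, key=lambda c: rep(e, c), reverse=True)
print(ranked)  # a mid-level concept such as "software company" wins with these toy counts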

Precision / NDCG, k = 1, 2, 3, 5, 10, 15, 20

No smoothing:
MI(e)              0.769 0.692 0.705 0.685 0.719 0.705 0.690
PMI3(e)            0.885 0.769 0.756 0.800 0.754 0.733 0.721
NPMI(e)            0.692 0.692 0.667 0.638 0.627 0.610 0.610
Typicality P(c|e)  0.462 0.577 0.603 0.577 0.569 0.564 0.556
Typicality P(e|c)  0.500 0.462 0.526 0.523 0.523 0.510 0.521
Rep(e)             0.846 0.865 0.872 0.862 0.758 0.731 0.719

Smoothing = 0.001:
MI(e)              0.577 0.615 0.628 0.600 0.612 0.605 0.592
PMI3(e)            0.731 0.673 0.692 0.654 0.669 0.644 0.623
NPMI(e)            0.923 0.827 0.769 0.746 0.731 0.695 0.671
Typicality P(c|e)  0.462 0.577 0.603 0.577 0.569 0.564 0.554
Typicality P(e|c)  0.885 0.865 0.872 0.831 0.785 0.741 0.704
Rep(e)             0.846 0.731 0.718 0.723 0.700 0.669 0.638

Smoothing = 0.0001:
MI(e)              0.615 0.615 0.654 0.608 0.635 0.628 0.612
PMI3(e)            0.846 0.731 0.731 0.715 0.723 0.685 0.677
NPMI(e)            0.885 0.904 0.885 0.869 0.823 0.777 0.752
Typicality P(c|e)  0.462 0.577 0.603 0.577 0.569 0.564 0.556
Typicality P(e|c)  0.885 0.904 0.910 0.877 0.831 0.813 0.777
Rep(e)             0.923 0.846 0.833 0.815 0.781 0.736 0.719

Smoothing = 1e-5:
MI(e)              0.615 0.635 0.667 0.662 0.677 0.656 0.646
PMI3(e)            0.885 0.769 0.744 0.777 0.758 0.731 0.710
NPMI(e)            0.885 0.846 0.872 0.869 0.831 0.810 0.787
Typicality P(c|e)  0.462 0.577 0.603 0.577 0.569 0.564 0.556
Typicality P(e|c)  0.769 0.808 0.846 0.823 0.808 0.782 0.765
Rep(e)             0.885 0.904 0.872 0.862 0.812 0.800 0.767

Smoothing = 1e-6:
MI(e)              0.769 0.673 0.705 0.677 0.700 0.692 0.679
PMI3(e)            0.885 0.769 0.756 0.785 0.773 0.726 0.723
NPMI(e)            0.885 0.846 0.821 0.815 0.750 0.726 0.719
Typicality P(c|e)  0.462 0.577 0.603 0.577 0.569 0.564 0.556
Typicality P(e|c)  0.538 0.615 0.615 0.615 0.608 0.613 0.615
Rep(e)             0.846 0.885 0.897 0.877 0.788 0.777 0.765

Smoothing = 1e-7:
MI(e)              0.769 0.692 0.705 0.685 0.719 0.703 0.688
PMI3(e)            0.885 0.769 0.756 0.792 0.758 0.736 0.725
NPMI(e)            0.769 0.750 0.718 0.700 0.650 0.641 0.633
Typicality P(c|e)  0.462 0.577 0.603 0.577 0.569 0.564 0.556
Typicality P(e|c)  0.500 0.481 0.526 0.523 0.531 0.523 0.523
Rep(e)             0.846 0.865 0.872 0.854 0.765 0.749 0.733

k = 1, 2, 3, 5, 10, 15, 20

No smoothing:
MI(e)              0.516 0.531 0.519 0.531 0.562 0.574 0.594
PMI3(e)            0.725 0.664 0.652 0.660 0.628 0.631 0.646
NPMI(e)            0.599 0.597 0.579 0.554 0.540 0.539 0.549
Typicality P(c|e)  0.297 0.380 0.409 0.422 0.438 0.446 0.461
Typicality P(e|c)  0.401 0.386 0.396 0.398 0.401 0.410 0.428
Rep(e)             0.758 0.771 0.745 0.723 0.656 0.647 0.661

Smoothing = 1e-3:
MI(e)              0.374 0.414 0.441 0.448 0.473 0.481 0.495
PMI3(e)            0.484 0.511 0.509 0.502 0.519 0.525 0.533
NPMI(e)            0.692 0.652 0.607 0.603 0.585 0.585 0.592
Typicality P(c|e)  0.297 0.380 0.409 0.422 0.438 0.446 0.460
Typicality P(e|c)  0.703 0.697 0.704 0.681 0.637 0.628 0.626
Rep(e)             0.621 0.580 0.554 0.561 0.554 0.555 0.559

Smoothing = 1e-4:
MI(e)              0.407 0.430 0.458 0.462 0.492 0.503 0.512
PMI3(e)            0.648 0.604 0.579 0.575 0.578 0.576 0.590
NPMI(e)            0.747 0.777 0.761 0.737 0.700 0.685 0.688
Typicality P(c|e)  0.297 0.380 0.409 0.422 0.438 0.446 0.461
Typicality P(e|c)  0.791 0.795 0.802 0.767 0.738 0.729 0.724
Rep(e)             0.758 0.714 0.711 0.689 0.653 0.636 0.653

Smoothing = 1e-5:
MI(e)              0.429 0.465 0.478 0.501 0.517 0.528 0.545
PMI3(e)            0.725 0.647 0.642 0.642 0.627 0.624 0.638
NPMI(e)            0.813 0.779 0.778 0.765 0.730 0.723 0.729
Typicality P(c|e)  0.297 0.380 0.409 0.422 0.438 0.446 0.461
Typicality P(e|c)  0.709 0.728 0.735 0.722 0.702 0.696 0.703
Rep(e)             0.791 0.787 0.762 0.739 0.707 0.703 0.706

Smoothing = 1e-6:
MI(e)              0.516 0.510 0.515 0.526 0.546 0.563 0.579
PMI3(e)            0.725 0.655 0.651 0.654 0.641 0.631 0.649
NPMI(e)            0.791 0.766 0.732 0.728 0.673 0.659 0.668
Typicality P(c|e)  0.297 0.380 0.409 0.422 0.438 0.446 0.461
Typicality P(e|c)  0.495 0.516 0.520 0.508 0.512 0.521 0.540
Rep(e)             0.758 0.784 0.767 0.755 0.691 0.686 0.694

Smoothing = 1e-7:
MI(e)              0.516 0.531 0.519 0.530 0.562 0.571 0.592
PMI3(e)            0.725 0.664 0.652 0.658 0.630 0.631 0.647
NPMI(e)            0.670 0.655 0.633 0.604 0.575 0.570 0.581
Typicality P(c|e)  0.297 0.380 0.409 0.422 0.438 0.446 0.461
Typicality P(e|c)  0.423 0.421 0.415 0.407 0.414 0.424 0.438
Rep(e)             0.758 0.771 0.745 0.725 0.663 0.661 0.668

Evaluations on Different Measures for BLC

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

What is Semantic Similarity? • Are the following instance pairs similar?

• <apple, microsoft>

• <apple, pear>

• <apple, fruit>

• <apple, food>

• <apple, ipad>

• <car, journey>

Approaches on Term Similarity

• Categories of approaches for semantic similarity
• String based approach
• Knowledge based approach: uses preexisting thesauri, taxonomies or encyclopedias such as WordNet
• Corpus based approach: uses contexts of terms extracted from web pages, web search snippets or other text repositories
• Embedding based approach: introduced in detail in "Part 3: Implicit Understanding"

Approaches on Term Similarity (2)

• Categories (figure): string based approaches, knowledge based approaches over WordNet (path length / lexical chain based, and information content based), and corpus based approaches (graph learning algorithm based, and snippet search based), with representative methods including Rada 1989, Hirst 1998, Resnik 1995, Jcn 1997, Lin 1998, Ban 2002, HunTray 2005, Chen 2006, Alvarez 2007, Do 2009, Agirre 2010, Bol 2011 and Sánchez 2011; the figure marks a subset of these as state-of-the-art approaches

• Framework

Term Similarity Using Semantic Networks [Li et al 2013, Li et al 2015]

Given a term pair <t1, t2>:

Step 1: Type Checking: decide whether <t1, t2> is a concept pair, an entity pair, or a concept-entity pair

Step 2: Context Representation (Vector): collect entity-distribution contexts for concept terms and concept-distribution contexts for entity terms, grouping similar concepts into concept clusters; for a concept-entity pair, additionally select the top-k concepts cx from each concept cluster Ci(t1) of the entity term

Step 3: Context Similarity: evaluate similarity over the context vectors, e.g. Cosine(T(t1), T(t2)), max over cluster pairs Cosine(Cx(t1), Cy(t2)), or max sim(t2, cx) over the selected concepts (a sketch of the entity-pair case follows)
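A minimal sketch of the entity-pair case, assuming each term has already been mapped to a bag of weighted concepts; the concept vectors here are made-up toy values, not output of a real semantic network.

import math

def cosine(u, v):
    # u, v: dict mapping concept -> weight
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# toy concept-distribution contexts T(t) for two entity terms
T_banana = {"fruit": 0.55, "tropical fruit": 0.20, "food": 0.15, "crop": 0.10}
T_pear   = {"fruit": 0.60, "seasonal fruit": 0.18, "food": 0.14, "tree": 0.08}

print(round(cosine(T_banana, T_pear), 3))  # a high similarity, in the spirit of the 0.916 example below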

An example [Li et al 2013, Li et al 2015]

For example, <banana, pear>:

Step 1: Type Checking: <banana, pear> is an entity pair

Step 2: Context Representation (Vector): concept context collection for both terms

Step 3: Context Similarity: Similarity Evaluation Cosine(T(t1), T(t2)) = 0.916

Examples

Term 1               Term 2              Similarity
lunch                dinner              0.9987
tiger                jaguar              0.9792
car                  plane               0.9711
television           radio               0.9465
technology company   microsoft           0.8208
high impact sport    competitive sport   0.8155
employer             large corporation   0.5353
fruit                green pepper        0.2949
travel               meal                0.0426
music                lunch               0.0116
alcoholic beverage   sports equipment    0.0314
company              table tennis        0.0003

Complete results: http://adapt.seiee.sjtu.edu.cn/similarity/SimCompleteResults.pdf

Statistics of Search Queries

(Same charts as before: (a) by traffic, 1 term 44%, 2 terms 29%, 3 terms 17%, 4 terms 7%, 5 terms 2%, more than 5 terms 1%; (b) by # of distinct queries, 1 term 10%, 2 terms 26%, 3 terms 34%, 4 terms 19%, 5 terms 7%, more than 5 terms 4%; example queries "Pokémon Go" and "Microsoft HoloLens", with analogous breakdowns by number of instances)

If the short text has context for the instance…

• python tutorial • dangerous python • moon earth distance • …

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Supervised Segmentation [Bergsma et al 2007]

• Problem: divide a query into semantic units

• Approach: turn segmentation into position-based binary classification

Example query: "two man power saw", with candidate segmentations [two man] [power saw], [two] [man] [power saw], [two] [man power] [saw]

Input: a query and its positions

Output: the decision of whether to place a segmentation break at each position

Supervised Segmentation

• Features
• Decision-boundary features, e.g. indicator and POS-tag features of the words around the position, and forward/backward position features
• Statistical features, e.g. mutual information between the left and right parts ("bank loan | amortization schedule")
• Context features, e.g. surrounding context information
• Dependency features, e.g. in "female bus driver", "female" depends on "bus driver"

Supervised Segmentation

• Segmentation Overview

(Figure: input query "two man power saw" -> learning features -> SVM classifier -> output: a yes/no segmentation decision at each position)

Unsupervised Segmentation [Tan et al 2008]

• Unsupervised learning for query segmentation

Probability of a generated segmentation S for query Q:

P(S|Q) = P(s_1) P(s_2|s_1) ... P(s_m|s_1 s_2 ... s_{m-1}) ≈ prod_{s_i in S} P(s_i)   (unigram model over segments s_i)

Valid segment boundary if and only if the pointwise mutual information between the two segments resulting from the split is negative:

MI(s_k, s_{k+1}) = log [ P_c([s_k s_{k+1}]) / (P_c(s_k) * P_c(s_{k+1})) ] < 0

Example: "new york times subscription", with candidate segments s_1, s_2:

log [ P_c([new york]) / (P_c(new) * P_c(york)) ] > 0, so there is no segment boundary between "new" and "york"

Unsupervised Segmentation

• Find top-k segmentations: dynamic programming

• Using EM optimization on the fly

Input: query w_1 w_2 ... w_n (the words in the query) and the segment probability distribution

Output: top-k segmentations with the highest likelihood (a sketch of the dynamic program follows)
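A minimal sketch of the unigram-model dynamic program (top-1 only, for brevity); the segment probabilities P_c are a made-up toy table, and a real system would estimate them from a corpus or query log and refine them with EM.

import math

# toy unigram segment probabilities P_c(s); unseen segments get a small floor probability
P_c = {
    "new york": 3e-4, "new york times": 2e-4, "times": 5e-4,
    "subscription": 4e-4, "new": 8e-4, "york": 1e-4,
}
FLOOR = 1e-9
MAX_SEG_LEN = 4

def best_segmentation(words):
    n = len(words)
    best = [(-math.inf, None)] * (n + 1)   # best[i] = (log-prob, backpointer) for words[:i]
    best[0] = (0.0, None)
    for i in range(1, n + 1):
        for j in range(max(0, i - MAX_SEG_LEN), i):
            seg = " ".join(words[j:i])
            score = best[j][0] + math.log(P_c.get(seg, FLOOR))
            if score > best[i][0]:
                best[i] = (score, j)
    # recover the segments by following the backpointers
    segs, i = [], n
    while i > 0:
        j = best[i][1]
        segs.append(" ".join(words[j:i]))
        i = j
    return list(reversed(segs))

print(best_segmentation("new york times subscription".split()))
# -> ['new york times', 'subscription'] with these toy probabilities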

Exploit Click-through [Li et al 2011]

• Motivation
• Probabilistic query segmentation
• Use click-through data

Input query: "bank of america online banking", with click data Q -> URL -> D (query -> clicked URL -> document)

Output: top-3 segmentations
[bank of america] [online banking] 0.502
[bank of america online banking] 0.428
[bank of] [america] [online banking] 0.001

Exploit Click-through

bull Segmentation Model

An interpolated model combining global info with click-through info

Example: query "[credit card] [bank of america]", with clicked html documents such as 1. "bank of america credit cards contact us overview", 2. "secured visa credit card from bank of america", 3. "credit cards overview: find the right bank of america credit card for you"

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Sense Changes with Different Context

watch harry potter -> Movie; read harry potter -> Book; age harry potter -> Character; harry potter walkthrough -> Game

Entity Recognition in Query [Guo et al 2009]

• Motivation

Detect a named entity in a short text and categorize it

Example: the single-named-entity query "harry potter walkthrough" is interpreted as the triple <e, t, c> = ("harry potter", "walkthrough", "game"), where e is the (possibly ambiguous) named entity, t the context term(s), and c the class of the entity

Entity Recognition in Query

• Probabilistic Generative Model

Goal: given a query q, find the triple <e, t, c> that maximizes the probability of generating the triple, assuming the context only depends on the class:

Pr(e, t, c) = Pr(e) Pr(c|e) Pr(t|c)

Objective: given query q, find (e, t, c)* = arg max Pr(e) Pr(c|e) Pr(t|c)

The problem then becomes how to estimate Pr(e), Pr(c|e) and Pr(t|c)

E.g. "walkthrough" only depends on the class game, not on the specific entity harry potter
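A minimal sketch of scoring candidate interpretations under this factorization; the probability tables are toy values for illustration, not estimates from a real query log.

# toy probability tables (illustrative only)
Pr_e = {"harry potter": 0.7, "harry": 0.3}
Pr_c_given_e = {"harry potter": {"game": 0.3, "movie": 0.4, "book": 0.3}}
Pr_t_given_c = {"game": {"walkthrough": 0.2, "trailer": 0.01},
                "movie": {"walkthrough": 0.001, "trailer": 0.2},
                "book": {"walkthrough": 0.001, "trailer": 0.001}}

def score(e, t, c):
    # Pr(e, t, c) = Pr(e) * Pr(c|e) * Pr(t|c)
    return (Pr_e.get(e, 0.0)
            * Pr_c_given_e.get(e, {}).get(c, 0.0)
            * Pr_t_given_c.get(c, {}).get(t, 0.0))

candidates = [("harry potter", "walkthrough", c) for c in ("game", "movie", "book")]
best = max(candidates, key=lambda etc: score(*etc))
print(best)  # ('harry potter', 'walkthrough', 'game') with these toy numbers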

Entity Recognition in Query

• Probability Estimation by Learning

Learning objective: max prod_{i=1..N} P(e_i, t_i, c_i)

Challenge: it is difficult as well as time consuming to manually assign class labels to named entities in queries

Build a training set T = {(e_i, t_i)} and view c_i as a hidden variable

New learning problem: max prod_{i=1..N} P(e_i, t_i) = max prod_{i=1..N} sum_c P(e_i) P(c|e_i) P(t_i|c)

Solved with the topic model WS-LDA

Signal from Click [Pantel et al 2012]

• Motivation

Predict entity type in Web search with a generative model over the entity, its type, the user intent, the query context and the click, using a query type distribution over 73 types

Signal from Click

• Joint Model for Prediction

(Plate diagram: for each query, the model picks an entity type from a distribution over types, picks an entity, picks an intent from the intent distribution, picks context words from a word distribution, and picks a click from a host distribution; the latent parameters include the type, intent, word, host and entity distributions)

Telegraphic Query interpretation [Sawant et al 2013 Joshi et al 2014]

• Entity-seeking Telegraphic Queries

• Interpretation = Segmentation + Annotation

• Combines a knowledge base (accuracy) with a large corpus (recall); e.g. the query "Germany capital" returns the result entity Berlin

• Overview

Joint Interpretation and Ranking [Sawant et al 2013, Joshi et al 2014]

(Figure: a telegraphic query and an annotated corpus feed two models for joint interpretation and ranking, a generative model and a discriminative model, which output ranked candidate answer entities e1, e2, e3)

• Generative Model

Joint Interpretation and Ranking [Sawant et al 2013]

(Figure, after U. Sawant 2013, based on probabilistic language models: for the query q = "losing team baseball world series 1998", a switch decides whether each query word is generated by the type model or the context model; the type hint "baseball team" matches the type of the answer entity San Diego Padres, a major league baseball team, while context matchers link "lost 1998 world series" to corpus evidence such as "Padres have been to two World Series, losing in 1984 and 1998")

• Discriminative Model

Joint Interpretation and Ranking [Sawant et al 2013]

(Figure, based on max-margin discriminative learning: for the query "losing team baseball world series 1998", feature vectors are built for candidate entity-type interpretations, e.g. the correct entity San_Diego_Padres with t = baseball team versus the incorrect entity 1998_World_Series with t = series)

bull Queries seek answer entities (e2)

bull Contain (query) entities (e1) target types (t2) relations (r) and selectors (s)

Telegraphic Query Interpretation [Joshi et al 2014]

query e1 r t2 s

dave navarro first band

dave navarro band band first

dave navarro - band first

spider automobile company

spider automobile company

automobile company

-

automobile company company spider

Borrow from M Joshi (2014)

Improved Generative Model

bull Generative Model[Sawant et al 2013]

[Joshi et al 2014]Consider e1

(in q) and r

Improved Discriminative Model

bull Discriminative Model[Sawant et al 2013]

[Joshi et al 2014]

Consider e1

(in q) and r

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

bull Input a short text

bull Output semantic interpretation

bull Three steps in understanding a short text

"wanna watch eagles band" -> watch[verb] eagles[entity](band) band[concept]

Step 1: Text Segmentation: divide the short text into a sequence of terms in the vocabulary ("wanna watch eagles band" -> watch | eagles | band)

Step 2: Type Detection: determine the best type of each term (watch[verb] eagles[entity] band[concept])

Step 3: Concept Labeling: infer the best concept of each entity within context (eagles[entity] -> the band)

Text segmentation
• Observations
• Mutual Exclusion: terms containing the same word mutually exclude each other
• Mutual Reinforcement: related terms mutually reinforce each other

• Build a Candidate Term Graph (CTG)

(Candidate term graphs for "vacation april in paris" and "watch harry potter": nodes are candidate terms such as {april in paris, vacation, april, paris} and {harry potter, watch, harry, potter}, with mutual-exclusion and reinforcement edges carrying weights such as 1/3, 2/3, 0.029, 0.005, 0.047, 0.041, 0.014, 0.092, 0.053, 0.018)

Find best segmentation

• Best segmentation = the sub-graph of the CTG which
• Is a complete graph (clique)
• Contains no mutual exclusion
• Has 100% word coverage (except for stopwords)
• Has the largest average edge weight

(Figure: among the sub-graphs of the candidate term graphs above that qualify as segmentations, the best segmentation is highlighted)

Find best segmentation

• Best segmentation = the sub-graph of the CTG which
• Is a complete graph (clique)
• Contains no mutual exclusion
• Has 100% word coverage (except for stopwords)
• Has the largest average edge weight

(Figure: the best segmentation corresponds to a maximal clique, e.g. {vacation, april in paris} and {watch, harry potter}; a sketch of the search follows)
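A minimal brute-force sketch of this search, which is fine for the handful of candidate terms a short text produces; the candidate terms, mutual-exclusion pairs and edge weights are toy values standing in for the affinity scores mined offline.

from itertools import combinations

query_words = {"vacation", "april", "in", "paris"}
stopwords = {"in"}
# candidate terms with the words they cover, plus pairwise edge weights (toy values)
covers = {"vacation": {"vacation"}, "april in paris": {"april", "in", "paris"},
          "april": {"april"}, "paris": {"paris"}}
weight = {("vacation", "april in paris"): 0.047, ("vacation", "april"): 0.029,
          ("vacation", "paris"): 0.041, ("april", "paris"): 0.005}
mutually_exclusive = {("april in paris", "april"), ("april in paris", "paris")}

def ok(pair):
    a, b = sorted(pair)
    return (a, b) not in mutually_exclusive and (b, a) not in mutually_exclusive

def avg_weight(terms):
    pairs = list(combinations(sorted(terms), 2))
    return sum(weight.get(p, weight.get((p[1], p[0]), 0.0)) for p in pairs) / max(len(pairs), 1)

best, best_score = None, -1.0
for r in range(1, len(covers) + 1):
    for terms in combinations(covers, r):
        if not all(ok(p) for p in combinations(terms, 2)):
            continue  # not a clique free of mutual exclusion
        covered = set().union(*(covers[t] for t in terms))
        if query_words - stopwords <= covered:          # 100% word coverage (minus stopwords)
            s = avg_weight(terms)
            if s > best_score:
                best, best_score = set(terms), s
print(best)  # {'vacation', 'april in paris'} with these toy weights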

Type Detection

• Pairwise Model
• Find the best typed-term for each term so that the Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight

(Figure: for the query "watch free movie", the candidate typed-terms are watch[v], watch[e], watch[c]; free[adj], free[v]; movie[c], movie[e])

Concept Labeling

• Entity disambiguation is the most important task of concept labeling
• Filter and re-rank the original concept cluster vector

• Weighted-Vote
• The final score of each concept cluster is a combination of its original score and the support from context, using concept co-occurrence (see the sketch below)

Example: "watch harry potter" -> movie; "read harry potter" -> book
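A minimal sketch of the weighted-vote re-ranking; the concept scores, co-occurrence supports and the mixing weight lam are toy values, and the exact combination used in the paper may differ.

# original concept-cluster scores for the entity "harry potter" (toy values)
original = {"movie": 0.40, "book": 0.38, "character": 0.22}
# support for each candidate cluster from the context term, via concept co-occurrence (toy values)
context_support = {"watch": {"movie": 0.9, "book": 0.1, "character": 0.2},
                   "read":  {"movie": 0.2, "book": 0.9, "character": 0.1}}

def weighted_vote(entity_scores, context_term, lam=0.5):
    support = context_support.get(context_term, {})
    combined = {c: (1 - lam) * s + lam * support.get(c, 0.0) for c, s in entity_scores.items()}
    total = sum(combined.values()) or 1.0
    return {c: v / total for c, v in combined.items()}   # re-normalized cluster vector

for ctx in ("watch", "read"):
    scores = weighted_vote(original, ctx)
    print(ctx, max(scores, key=scores.get))   # watch -> movie, read -> book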

Example of Entity Disambiguation [Hua et al 2015 (ICDE Best Paper), Hua et al 2016]

(Figure: a short text such as "ipad apple" is conceptualized against the semantic network and the co-occurrence network through parsing, term clustering by isA, concept filtering by co-occurrence, head/modifier analysis and concept orthogonalization; isA lookup initially proposes concepts such as fruit, company, food, product for "apple" and product, device for "ipad", and co-occurrence filtering keeps the company/brand/device reading, yielding a concept vector [(c1, p1), (c2, p2), (c3, p3), …])

Mining Lexical Relationships [Wang et al 2015b]

• Lexical knowledge represented by the probabilities (e = instance, t = term, c = concept, z = role): p(z|t), e.g. p(verb | watch), p(instance | watch); p(c|t, z), e.g. p(movie | watch, z = verb), p(movie | harry potter), p(book | harry potter); and p(c|e) = p(c | t, z = instance)

(Figure: for "watch harry potter", the candidate concepts of "harry potter" include product, book and movie, while "watch" plays the role of a verb)

Understanding Queries [Wang et al 2015b]

• Goal: to rank the concepts and find arg max_c p(c | t, q)

(Figure: the query and all of its possible segmentations are matched against the offline semantic network to build an online subgraph, and concepts are ranked by a random walk with restart [Sun et al 2005] on the online subgraph; a sketch follows)
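A minimal sketch of random walk with restart on a small term-concept subgraph; the adjacency weights are toy numbers, and the restart vector seeds the query terms.

import numpy as np

# toy online subgraph: nodes 0-1 are query terms, nodes 2-4 are candidate concepts
nodes = ["watch", "harry potter", "movie", "book", "device"]
W = np.array([[0, 0.2, 0.5, 0.1, 0.4],     # watch
              [0.2, 0, 0.6, 0.5, 0.0],     # harry potter
              [0.5, 0.6, 0, 0.2, 0.0],     # movie
              [0.1, 0.5, 0.2, 0, 0.0],     # book
              [0.4, 0.0, 0.0, 0.0, 0]], dtype=float)   # device
P = W / W.sum(axis=0, keepdims=True)        # column-stochastic transition matrix

restart = np.array([0.5, 0.5, 0, 0, 0])     # restart mass on the query terms
c = 0.15                                    # restart probability
r = restart.copy()
for _ in range(100):                        # power iteration until (approximate) convergence
    r = (1 - c) * P @ r + c * restart

concept_scores = {nodes[i]: r[i] for i in (2, 3, 4)}
print(max(concept_scores, key=concept_scores.get))  # 'movie' with these toy weights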

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

• Example: "popular smart cover iphone 5s"

• Definition
• Head: names the general (semantic) category to which the whole short text belongs; usually the head is the intent of the short text
• "smart cover": the intent of the query
• Constraints: distinguish this member from other members of the same category
• "iphone 5s": limits the type of the head
• Non-Constraint Modifiers (aka Pure Modifiers): subjective modifiers which can be dropped without changing the intent
• "popular": subjective, can be neglected

Non-Constraint Modifiers Mining: Construct Modifier Networks

Edges form a Modifier Network

(Figure: the concept hierarchy tree in the "Country" domain contains concepts such as country, Asian country, developed country, Western country, Western developed country, top Western country, large Asian country, large developed country and top developed country; the modifiers Asian, Western, Developed, Large and Top form the corresponding modifier network in the "Country" domain; in this case "Large" and "Top" are pure modifiers)

• Betweenness centrality is a measure of a node's centrality in a network

• The betweenness of node v is defined as

g(v) = sum_{s != v != t} sigma_st(v) / sigma_st

• where sigma_st is the total number of shortest paths from node s to node t and sigma_st(v) is the number of those paths that pass through v

• Normalization & Aggregation

• A pure modifier should have a low betweenness-centrality aggregation score PMS(t)

Non-Constraint Modifiers Mining: Betweenness centrality
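A minimal sketch using networkx on a toy modifier network in the spirit of the "Country" example; real modifier networks are built from query-log co-modification, and PMS(t) aggregates normalized betweenness across domains.

import networkx as nx

# toy modifier network for the "Country" domain (edges between co-occurring modifiers)
G = nx.Graph()
G.add_edges_from([("Asian", "Developed"), ("Asian", "Western"), ("Developed", "Western"),
                  ("Large", "Asian"), ("Large", "Developed"), ("Top", "Western"), ("Top", "Developed")])

bc = nx.betweenness_centrality(G, normalized=True)
# pure-modifier candidates have low betweenness (here "Large" and "Top" sit on the periphery)
for term, score in sorted(bc.items(), key=lambda kv: kv[1]):
    print(f"{term}: {score:.3f}")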

Head-Constraints Mining [Wang et al 2014b]

• A term can be a head in some cases and a constraint in others

• E.g. "Seattle hotel" (head: hotel; constraint: Seattle) vs "Seattle hotel job" (head: job; constraints: Seattle, hotel)

Head-Constraints Mining: Acquiring Concept Patterns

Building a concept pattern dictionary from query logs: extract preposition patterns ("A for B", "A of B", "A with B", "A in B", "A on B", "A at B", …) from queries such as "cover for iphone 6s", "battery for sony a7r", "wicked on broadway"; get entity pairs (entity1 = head, entity2 = constraint) for each preposition; conceptualize both entities (concept11, concept12, concept13, concept14 and concept21, concept22, concept23) and record the concept patterns (concept11, concept21), (concept11, concept22), (concept11, concept23), …; aggregate them into the Concept Pattern Dictionary

Why Concepts Can't Be Too General
• It may cause too many concept pattern conflicts: we can't distinguish head and modifier for overly general concept pairs

Conflict example:
• Derived concept pattern device (head) / company (modifier), supported by entity pairs such as (iphone 4, verizon), (modem, comcast), (wireless router, comcast), (iphone 4, tmobile)
• Derived concept pattern company (head) / device (modifier), supported by entity pairs such as (amazon books, kindle), (netflix, touchpad), (skype, windows phone), (netflix, ps3)

Why Concepts Can't Be Too Specific
• It may generate concepts with little coverage
• The concept regresses to the entity • Large storage space, up to (millions x millions) of patterns

Example over-specific patterns: device / largest desktop OS vendor, device / largest software development company, device / largest global corporation, device / latest windows and office provider, …

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept Patterns (columns: cluster size, sum of cluster score, head/constraint, score); head/constraint pairs with their cluster sizes:

breed/state (615), game/platform (296), accessory/vehicle (153), browser/platform (70), requirement/school (22), drug/disease (34), cosmetic/skin condition (42), job/city (16), accessory/phone (32), software/platform (18), test/disease (20), clothes/breed (27), penalty/crime (19), tax/state (25), sauce/meat (16), credit card/country (18), food/holiday (14), mod/game (11), garment/sport (29), career information/professional (23), song/instrument (15), bait/fish (18), study guide/book (22), plugins/browser (19), recipe/meat (14), currency/country (18), lens/camera (13), decoration/holiday (9), food/animal (16)

Example cluster: Game (Head) / Platform (Modifier), containing concept patterns such as game/platform, game/device, video game/platform, game/game console, game/game pad, game/gaming platform, supported by entity pairs like (angry birds, android), (angry birds, ios), (angry birds, windows 10), …

Detection

Head Modifier Relationship

• Train a classifier on (head-embedding, modifier-embedding) pairs

• Training data • Positive: (head, modifier) • Negative: (modifier, head)

• Precision >= 0.9, Recall >= 0.9

• Disadvantage: not interpretable (a sketch follows)
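A minimal sketch with scikit-learn; the embedding lookup and the list of labeled (head, modifier) pairs are assumed inputs (toy stand-ins here), and since the slides do not name the classifier, logistic regression is just one possible choice.

import numpy as np
from sklearn.linear_model import LogisticRegression

dim = 50
rng = np.random.default_rng(0)
emb = {w: rng.normal(size=dim) for w in ["smart cover", "iphone 5s", "hotel", "seattle", "job"]}

# labeled pairs: (head, modifier) -> positive, reversed order -> negative
pairs = [("smart cover", "iphone 5s"), ("hotel", "seattle"), ("job", "hotel")]
X, y = [], []
for h, m in pairs:
    X.append(np.concatenate([emb[h], emb[m]])); y.append(1)   # head-modifier order
    X.append(np.concatenate([emb[m], emb[h]])); y.append(0)   # reversed order
clf = LogisticRegression(max_iter=1000).fit(np.array(X), y)

def is_head_modifier(a, b):
    return clf.predict([np.concatenate([emb[a], emb[b]])])[0] == 1

print(is_head_modifier("smart cover", "iphone 5s"))   # expected True (a training example)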

Syntactic Parsing based on HM

• Head-modifier information alone is incomplete
• Prepositions and other function words are missing
• So is the structure within a noun compound, e.g. "el capitan macbook pro"
• Why not train a parser for web queries?

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges: Short Texts Lack Grammatical Signals
• Lack of function words and word order
• "toys queries" has ambiguous intent
• "distance earth moon" has clear intent, but many equivalent forms: "earth moon distance", "earth distance moon", …

Challenges Syntactic Parsing of Queries

• No standard

• No ground-truth

Why is syntactic parsing of queries even a legitimate problem

Derive Syntax from Semantics[Sun et al 2016]

• Query: "thai food houston"
• Take a clicked sentence that contains the query words
• Project the sentence's dependencies onto the query

A Treebank for Short Texts

• Given a query q

• Given q's clicked sentences s

• Parse each s

• Project dependencies from s to q

• Aggregate the dependencies

Algorithm of Projection
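The projection algorithm itself appears only as a figure in the slides; below is a minimal sketch of one plausible version, assuming the clicked sentence has already been parsed into (token, head-index) pairs by some dependency parser and that query tokens are aligned to sentence tokens by exact string match.

def project_dependencies(query_tokens, sent_tokens, sent_heads):
    """sent_heads[i] = index of the head of sent_tokens[i] (-1 for the root)."""
    # align each query token to the first unused matching sentence token (exact-match assumption)
    align, used = {}, set()
    for qi, qt in enumerate(query_tokens):
        for si, st in enumerate(sent_tokens):
            if st == qt and si not in used:
                align[qi] = si
                used.add(si)
                break
    # keep an arc between query tokens whose aligned sentence tokens are head and dependent
    sent_to_query = {si: qi for qi, si in align.items()}
    arcs = []
    for qi, si in align.items():
        head_si = sent_heads[si]
        if head_si in sent_to_query:
            arcs.append((sent_to_query[head_si], qi))   # (head position, dependent position)
    return arcs

# "thai food houston" projected from a parsed clicked sentence (toy parse; -1 marks the root)
sent = ["the", "best", "thai", "food", "in", "houston"]
heads = [3, 3, 3, -1, 3, 4]    # "food" is the root; "thai" and "in" attach to "food"; "houston" to "in"
print(project_dependencies(["thai", "food", "houston"], sent, heads))
# -> [(1, 0)]  i.e. food -> thai; "houston" attaches via "in", which is absent from the query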

Result Examples

Results

bull Random queries

QueryParser UAS 083 LAS 075Stanford UAS 072 LAS 064

bull Queries with no function words

QueryParser UAS 082 LAS 073Stanford UAS 070 LAS 061

bull Queries with function words

QueryParser UAS 090 LAS 085Stanford UAS 086 LAS 080

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

• Measuring similarity between two short texts or sentences

• Basic idea: word-by-word comparison using embedding vectors

• Use a saliency-weighted semantic graph to compute similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired: bins of all edges, bins of max edges

Similarity measurement (inspired by BM25), where w ranges over the terms of short text s_l, sem(w, s_s) is the semantic (embedding) similarity between w and the other short text s_s, |s_s| is its length, avgsl the average short-text length, and k_1, b are BM25-style parameters:

f_sts(s_l, s_s) = sum_{w in s_l} IDF(w) * [ sem(w, s_s) * (k_1 + 1) ] / [ sem(w, s_s) + k_1 * (1 - b + b * |s_s| / avgsl) ]
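A minimal sketch of this saliency-weighted similarity; the word vectors and IDF table are toy values (a real system would use trained embeddings such as word2vec and corpus IDF statistics), sem is taken as the maximum cosine with the other text's terms, and k1, b follow the usual BM25 defaults.

import numpy as np

def sem(w, other_terms, vec):
    # semantic similarity of w to the other short text = max cosine with its terms
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max((cos(vec[w], vec[t]) for t in other_terms if t in vec), default=0.0)

def f_sts(s_l, s_s, vec, idf, avgsl, k1=1.2, b=0.75):
    total = 0.0
    for w in s_l:
        if w not in vec or w not in idf:
            continue
        s = sem(w, s_s, vec)
        total += idf[w] * (s * (k1 + 1)) / (s + k1 * (1 - b + b * len(s_s) / avgsl))
    return total

rng = np.random.default_rng(1)
vocab = ["cheap", "flights", "low", "cost", "airline", "tickets"]
vec = {w: rng.normal(size=50) for w in vocab}          # toy embeddings (random, so the score is arbitrary)
idf = {w: 1.0 for w in vocab}                          # toy IDF values
print(f_sts(["cheap", "flights"], ["low", "cost", "airline", "tickets"], vec, idf, avgsl=3.0))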

From the Concept View

From the Concept View [Wang et al 2015a]

(Figure: Short Text 1 and Short Text 2 are each conceptualized against the semantic network and the co-occurrence network, via parsing, term clustering by isA, concept filtering by co-occurrence, head/modifier analysis and concept orthogonalization, into bags of concepts: Concept Vector 1 [(c1, score1), (c2, score2), …] and Concept Vector 2 [(c1', score1'), (c2', score2'), …]; the similarity of the two short texts is then computed between the two concept vectors)

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

• Explicit short text understanding benefits a lot of application scenarios
• Ads/search semantic match
• Definition mining
• Query recommendation
• Web table understanding
• Semantic search
• …

Ads Keyword Selection [Wang et al 2015a]

(Figure: bar charts of performance by query decile, deciles 4 through 10: mainline ads on a 0.00 to 6.00 scale, sidebar ads on a 0.00 to 0.60 scale)

Definition Mining [Hao et al 2016]

• Definition scenarios: search engines, QnA, etc.

• Why conceptualization is useful for definition mining • Example: "What is Emphysema?"

Answer 1: Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Answer 2: Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

• This sentence has the form of a definition • Embedding is helpful to some extent, but it also returns a high similarity score for (emphysema, disease) and (emphysema, smoking)

• Conceptualization can provide strong semantics • Contextual embedding can also provide semantic similarity beyond Is-A

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

(Figure: Offline, training data for each class, e.g. Class 1 … Class i … Class N, is conceptualized against the knowledge base and concept weighting produces a concept model per class, Model 1 … Model i … Model N. Online, an original short text such as "justin bieber graduates" goes through entity extraction, conceptualization into a concept vector, candidate generation, and classification & ranking against the per-class concept models, producing outputs such as <Music, Score>; a sketch follows)
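A minimal sketch of the online scoring step, assuming the offline stage has already produced one weighted concept vector per class; all vectors here are toy values.

import math

def cosine(u, v):
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values())); nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# offline: per-class concept models (toy weights)
class_models = {
    "Music": {"singer": 0.6, "album": 0.3, "celebrity": 0.4},
    "TV":    {"show": 0.6, "channel": 0.4, "celebrity": 0.3},
    "Money": {"bank": 0.7, "currency": 0.5},
}

# online: a precomputed concept vector for the short text "justin bieber graduates"
query_concepts = {"singer": 0.7, "celebrity": 0.5, "teen idol": 0.2}

ranked = sorted(class_models, key=lambda cls: cosine(query_concepts, class_models[cls]), reverse=True)
print(ranked[0])  # 'Music' with these toy vectors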

Concept based Short Text Classification and Ranking [Wang et al 2014a]

(Figures: each category, e.g. TV, Music, Movie, is mapped into the concept space by conceptualizing the article titles and tags in that category, giving per-category concept weights ω_i, ω_j; a query is conceptualized into the same space with weights p_i, p_j and is classified and ranked by comparing its concept vector with each category's concept vector)

Precision performance on each category [Wang et al 2014a]

Precision   BocSTC   LM_ch   SVM    VSM_cosine   LM_d   Entity_ESA
Movie       0.71     0.91    0.84   0.81         0.72   0.56
Money       0.97     0.95    0.54   0.57         0.52   0.74
Music       0.97     0.90    0.88   0.73         0.68   0.58
TV          0.96     0.46    0.92   0.56         0.51   0.55

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledge in SIGMOD 2008

bull [ Auer et al 2007 ] Sören Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWC/ASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Ré Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Börschinger Cäcilia Zirn and Anas Elghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Catherine Havasi ConceptNet 5 A large semantic network for relational knowledge The People's Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou "Understand Short Texts by Harvesting and Analyzing Semantic Knowledge" IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and Xindong Wu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny Boyes-Braem Basic objects in natural categories Cognitive Psychology 8(3):382-439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ] Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddings In CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny Zhu Toward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 11: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

short text understanding

(internal representation)

answer

knowledge knowledge

1 ldquoPython Tutorialrdquo2 ldquoWho was the US President when the Angels won the

World Seriesrdquo

linguistic common sense knowledge

Encyclopedia knowledge

Common Sense Knowledge vs Encyclopedia Knowledge

Common Sense Knowledge Base

Encyclopedia Knowledge Base

Common senselinguistic knowledge among terms

EntitiesFacts

isAisPropertyOf

co-occurrencehellip

DayOfBirthLocatedInSpouseOf

hellip

Typicality basic level of categorization

Black or WhitePrecision

WordNet KnowItAll NELLProbase hellip

Freebase Yago DBPedia Google knowledge graph hellip

Special cases

WordNet [Stark et al 1998]

bull WordNetreg is a large lexical database of English Nouns verbs adjectives and adverbs are grouped into sets of cognitive synonyms (synsets) each expressing a distinct concept

bull S (n) China Peoples Republic of China mainland China Communist China Red China PRC Cathay (a communist nation that covers a vast territory in eastern Asia the most populous country in the world)

bull The project began in the Princeton University Department of Psychology and is currently housed in the Department of Computer Science

bull Homepage httpwordnetprincetoneduwordnetabout-wordnetbull Download httpwordnetprincetoneduwordnetdownload

Brief Introduction

Statistics

Sample

Authors

URLs

POS Unique Synsets TotalStrings Word-Sense Pairs

Noun 117798 82115 146312Verb 11529 13767 25047Adjective 21479 18156 30002Adverb 4481 3621 5580Totals 155287 117659 206941

KnowItAll Extract high-quality knowledgefrom the Web [Banko et al 2007 Etzioni et al 2011]

bull OpenIE distills semantic relations fromWeb-scale natural language texts

bull TextRunner -gt ReVerb -gt Open IE part of KnowItAll

bull Yielding over 5 billion extraction from over a billion web pages

bull From ldquoUS president Barack Obama gave his inaugural address on January 20 2013rdquoTo (Barack Obama is president of US)

(Barack Obama gave [his inaugural address on January 20 2013])

bull OpenIE v413 has been released

bull Turing Center at the University of Washington

bull httpopenieallenaiorgbull httpreverbcswashingtonedu

Brief Introduction

Statistics

Sample

News

Authors

URLs

NELL Never-Ending Language Learning [Carlson et al 2010]

bull NELL is a research project that attempts to create a computer system that learns over time to read the web Since January 2010

bull Over 50 million candidate beliefs by reading the web They are considered at different levels of confidence

bull Out of 50 million high confidence in 2817156 beliefs

Brief Introduction

Statistics

Sample

NELL Never-Ending Language Learning

bull It is continually learning facts on web Resources is publicly available

bull NELL research team at CMU

bull Homepage httprtwmlcmuedurtwbull Download httprtwmlcmuedurtwresources

News

Authors

URLs

Probase [Wu et al 2012]

• Probase is a semantic network built to make machines “aware” of the mental world of human beings, so that machines can better understand human communication.

Brief Introduction

Probase network:
• Nodes: Concepts (“Spanish Artists”), Entities (“Pablo Picasso”), Attributes (“Birthday”), Verbs/Adjectives (“Eat”, “Sweet”)
• Edges: isA (concept, entities), isPropertyOf (attributes), Co-occurrence (isCEOof, LocatedIn, etc.)

• 5,401,933 unique concepts  • 12,551,613 unique instances  • 87,603,947 isA relations

Example concepts: countries, basic watercolor techniques, celebrity wedding dress designers

Probase

• Microsoft Research

• Public release coming soon in Aug/Sept 2016  • Project homepage: http://research.microsoft.com/probase/

Concepts

Authors

URLs

Probase isA error rate: < 1.1%, and < 10% for a random pair

Freebase [Bollacker et al 2008]

• Freebase is a well-known collaborative knowledge base consisting of data composed mainly by its community.

• Freebase contains more than 23 million entities  • Freebase contains about 1.9 billion triples  • Each triple is organized in the form

<subject> <predicate> <object>

Brief Introduction

Statistics

• Freebase is a collection of facts  • Freebase only contains nodes and links  • Freebase is a labeled graph

Freebase → Wikidata

• Freebase data was integrated into Wikidata  • The Freebase API will be completely shut down on Aug 31, 2016, replaced by the Google Knowledge Graph API

• Freebase community

• Homepage: http://wiki.freebase.com/wiki/Main_Page  • Download: https://developers.google.com/freebase/  • Wikidata: https://www.wikidata.org

News

Authors

URLs

Google Knowledge Graph

• The Knowledge Graph is a knowledge base used by Google to enhance its search engine's results with semantic-search information gathered from a wide variety of sources.

• 570 million objects and more than 18 billion facts about relationships between different objects

• Google Inc.

• Homepage: https://www.google.com/intl/es419/insidesearch/features/search/knowledge.html

Brief Introduction

Statistics

Sample

Authors

URLs

YAGO [Suchanek et al 2007]

• YAGO is a huge semantic knowledge base derived from GeoNames, WordNet, and Wikipedia (10 Wikipedias in different languages).

• More than 10 million entities (persons, organizations, cities, etc.)  • More than 120 million facts about these entities  • More than 35,000 classes assigned to entities  • Many of its facts and entities have an attached temporal dimension and a spatial dimension

Brief Introduction

Sample: <Albert_Einstein> <isMarriedTo> <Elsa_Einstein>

Statistics

YAGO

News: • An evaluated version of YAGO3 (combining information from Wikipedias in different languages) was released [15 Sep 2015]

Authors: • Max Planck Institute for Informatics in Saarbrücken, Germany, and the DBWeb group at Télécom ParisTech University

URLs: • Homepage: http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/  • Download: http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/downloads/

Outline

• Knowledge Bases

• Explicit Representation Models

• Applications

Statistics of Search Queries

[Pie charts: distribution of query lengths]
(a) By traffic: 1 term 44%, 2 terms 29%, 3 terms 17%, 4 terms 7%, 5 terms 2%, more than 5 terms 1%.
(b) By # of distinct queries: 1 term 10%, 2 terms 26%, 3 terms 34%, 4 terms 19%, 5 terms 7%, more than 5 terms 4%.

Example instances: Pokémon Go, Microsoft HoloLens.

[Pie charts: distribution of the number of instances per query (1, 2, 3, 4, 5, more than 5 instances), by traffic and by # of distinct queries]

If the short text is a single instance…

• Python  • Microsoft  • Apple  • …

Single Instance Understanding

• Is this instance ambiguous?

• What are its basic-level concepts?

• What are its similar instances?

Word Ambiguity
• Word sense disambiguation relies on dictionaries (WordNet):

Take a seat on this chair.

The chair of the Math Department.

Instance Ambiguity

• Instance sense disambiguation: extra knowledge is needed.

I have an apple pie for lunch.

He bought an apple ipad.

Here “apple” is a proper noun.

Ambiguity [Hua et al 2016]

• Many instances are ambiguous

• Intuition: ambiguous instances have multiple senses

short text: population china | instance: china | sense: country
short text: glass vs china | instance: china | sense: fragile item
short text: pear apple | instance: apple | sense: fruit
short text: microsoft apple | instance: apple | sense: company
short text: read harry potter | instance: harry potter | sense: book
short text: watch harry potter | instance: harry potter | sense: movie
short text: age of harry potter | instance: harry potter | sense: character

Pre-definition for Ambiguity (1) Sense [Hua et al 2016]

• What is a Sense in semantic networks?  • A sense is a hierarchy of concept clusters.

[Figure: example sense hierarchies, e.g., region covering country, state, city; creature covering animal and predator; crop/food covering fruit, vegetable, meat; example instance: Germany]

Pre-definition for Ambiguity (2) Concept Cluster [Li et al 2013 Li et al 2015]

• What is a Concept Cluster (CL)?  • Cluster similar concepts into a concept cluster using a K-Means-like approach (k-Medoids), as sketched below.

Example “Fruit” cluster: Fruit, Fresh fruit, Juice, Tropical fruit, Berry, Exotic fruit, Seasonal fruit, Fruit juice, Citrus fruit, Soft fruit, Dry fruit, Wild fruit, Local fruit, …

Example “company” cluster: Company, Client, Firm, Manufacturer, Corporation, Large company, Rival, Giant, Big company, Local company, Large corporation, International company, …
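As a concrete illustration of the clustering step, here is a minimal k-medoids sketch over a precomputed concept-similarity matrix. The concept list and similarity values are toy assumptions, not Probase statistics, and the medoid update is the plain textbook variant rather than the exact procedure of [Li et al. 2013].

```python
# Minimal k-medoids sketch over a toy concept-similarity matrix.
import random

concepts = ["fruit", "fresh fruit", "tropical fruit", "company", "firm", "corporation"]
SIM = {("fruit", "fresh fruit"): 0.9, ("fruit", "tropical fruit"): 0.8,
       ("fresh fruit", "tropical fruit"): 0.85, ("company", "firm"): 0.9,
       ("company", "corporation"): 0.88, ("firm", "corporation"): 0.86}

def sim(a, b):
    if a == b:
        return 1.0
    return SIM.get((a, b), SIM.get((b, a), 0.05))   # assumed low default similarity

def k_medoids(items, k, iters=20, seed=0):
    random.seed(seed)
    medoids = random.sample(items, k)
    for _ in range(iters):
        clusters = {m: [] for m in medoids}
        for x in items:                                   # assign to the closest medoid
            clusters[max(medoids, key=lambda m: sim(x, m))].append(x)
        new_medoids = [max(members, key=lambda c: sum(sim(c, o) for o in members))
                       for members in clusters.values()]  # most central member
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return clusters

print(k_medoids(concepts, k=2))   # groups fruit-like and company-like concepts
```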

Definitions of Instance Ambiguity [Hua et al 2016]

• 3 levels of instance ambiguity

• Level 0: unambiguous
  • Contains only 1 sense
  • E.g., dog (animal), beijing (city), potato (vegetable)

• Level 1: unambiguous and ambiguous both make sense
  • Contains 2 or more senses, but these senses are related
  • E.g., google (company & search engine), french (language & country), truck (vehicle & public transport service)

• Level 2: ambiguous
  • Contains 2 or more senses, and the senses are very different from each other
  • E.g., apple (fruit & company), jaguar (animal & company), python (animal & language)

Ambiguity Score

• Using the top-2 senses to calculate the ambiguity score:

$$score = \begin{cases} 0 & level = 0 \\[4pt] \dfrac{w(s_2|e)}{w(s_1|e)} \cdot \left(1 - similarity(s_1, s_2)\right) & level = 1 \\[8pt] 1 + \dfrac{w(sc_2|e)}{w(sc_1|e)} \cdot \left(1 - similarity(sc_1, sc_2)\right) & level = 2 \end{cases}$$

Denote the top-2 senses as $s_1$ and $s_2$, and the top-2 sense clusters as $sc_1$ and $sc_2$. The similarity of two sense clusters is the maximum similarity of their senses:

$$similarity(sc_1, sc_2) = \max_{s_i \in sc_1,\, s_j \in sc_2} similarity(s_i, s_j)$$

For an entity $e$, the weight (popularity) of a sense $s_i$ is the sum of the weights of its concept clusters:

$$w(s_i|e) = w(H_i|e) = \sum_{CL_j \in H_i} P(CL_j|e)$$

For an entity $e$, the weight (popularity) of a sense cluster $sc_i$ is the sum of the weights of its senses:

$$w(sc_i|e) = \sum_{s_j \in sc_i} w(s_j|e)$$
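A small computational sketch of the score above, assuming the top-2 sense clusters, their weights w(sc|e), and a sense-to-sense similarity function are already available; all numbers below are toy values, not outputs of the real system.

```python
# Minimal sketch of the ambiguity score defined above (toy inputs).

def cluster_similarity(sc1, sc2, sim):
    # similarity(sc1, sc2) = max similarity over sense pairs
    return max(sim(s1, s2) for s1 in sc1 for s2 in sc2)

def ambiguity_score(level, top2, sim):
    """top2 = [(w(sc1|e), senses of sc1), (w(sc2|e), senses of sc2)]."""
    if level == 0:
        return 0.0
    (w1, sc1), (w2, sc2) = top2
    base = (w2 / w1) * (1.0 - cluster_similarity(sc1, sc2, sim))
    return base if level == 1 else 1.0 + base

# Toy example: "apple" with a fruit-like and a company-like sense cluster.
def toy_sim(a, b):
    return 1.0 if a == b else 0.05        # assumed sense-to-sense similarity

top2 = [(0.537, ["fruit", "food"]), (0.271, ["company", "brand"])]
print(round(ambiguity_score(2, top2, toy_sim), 3))   # about 1.48 with these toy numbers
```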

Examples

• Level 0
  • california: country, state, city, region, institution (0.943)
  • fruit: food, product, snack, carbs, crop (0.827)
  • alcohol: substance, drug, solvent, food, addiction (0.523)
  • computer: device, product, electronics, technology, appliance (0.537)
  • coffee: beverage, product, food, crop, stimulant (0.73)
  • potato: vegetable, food, crop, carbs, product (0.896)
  • bean: food, vegetable, crop, legume, carbs (0.801)

Examples (cont.)

• Level 1
  • nike, score = 0.034: company/store 0.861; brand 0.035; shoe/product 0.033
  • twitter, score = 0.035: website/tool 0.612; network 0.165; application 0.033; company 0.031
  • facebook, score = 0.037: website/tool 0.595; network 0.17; company 0.053; application 0.029
  • yahoo, score = 0.38: search engine 0.457; company/provider/account 0.281; website 0.0656
  • google, score = 0.507: search engine 0.46; company/provider/organization 0.377; website 0.0449

Examples (cont.)

• Level 2
  • jordan, score = 1.02: country/state/company/regime 0.92; shoe 0.02
  • fox, score = 1.09: animal/predator/species 0.74; network 0.064; company 0.035
  • puma, score = 1.15: brand/company/shoe 0.655; species/cat 0.116
  • gold, score = 1.21: metal/material/mineral resource/mineral 0.62; color 0.128

Examples (cont.)

• Level 2
  • soap, score = 1.22: product/toiletry/substance 0.49; technology/industry standard 0.11
  • silver, score = 1.24: metal/material/mineral resource/mineral 0.638; color 0.156
  • python, score = 1.29: language 0.667; snake/animal/reptile/skin 0.193
  • apple, score = 1.41: fruit/food/tree 0.537; company/brand 0.271

Single Instance

• Is this instance ambiguous?

• What are its basic-level concepts?

• What are its similar instances?

A Concept View of “Microsoft”

[Figure: “Microsoft” linked to concepts of different granularity: company, software company, international company, technology leader, largest desktop OS vendor, …]

Basic-level Conceptualization (BLC) [Rosch et al. 1976]

[Figure: example instances KFC and BMW mapped to their basic-level concepts]

How to Make BLC

• Naive approaches:
  • Typicality: an important measure for understanding the relationship between an object and its concept
  • Pointwise Mutual Information (PMI): a common measure of the strength of association between two terms

Naive Approach 1: Typicality

P(robin | bird) > P(penguin | bird): “robin” is a more typical bird than “penguin”.

P(USA | country) > P(Seychelles | country): “USA” is a more typical country than “Seychelles”.

Using Typicality for BLC

• Associate each isA relationship ($e$ isA $c$) with typicality scores $P(e|c)$ and $P(c|e)$:

$$P(e|c) = \frac{n(c,e)}{n(c)}, \qquad P(c|e) = \frac{n(c,e)}{n(e)}$$

• $P(e|c)$ indicates how typical (or popular) $e$ is in the given concept $c$.

• $P(c|e)$ indicates how typical (or popular) the concept $c$ is for the given $e$.

• However, for “Microsoft”: “company” has high typicality $P(c|e)$, while “largest desktop OS vendor” has high typicality $P(e|c)$; neither score alone identifies the basic-level concept.

Naive Approach 2: PMI [Manning and Schutze 1999]

• Pointwise mutual information (PMI) is a measure of association used in information theory and statistics.

• Consider using the PMI between concept $c$ and instance $e$ to find the basic-level concepts, as follows:

$$PMI(e,c) = \log\frac{P(e,c)}{P(e)P(c)} = \log P(e|c) - \log P(e)$$

• However: in basic-level categorization we are interested in finding a concept for a given $e$, which means $P(e)$ is a constant.

• Thus, ranking by $PMI(e,c)$ is the same as ranking by $P(e|c)$.

Using Rep(e, c) for BLC [Wang et al. 2015b]

• The measure $Rep(e,c) = P(c|e) \cdot P(e|c)$ can be interpreted in two ways:

• (With PMI) Taking the logarithm of the scoring function gives

$$\log Rep(e,c) = \log\left(P(c|e)\cdot P(e|c)\right) = \log\left(\frac{P(e,c)}{P(e)}\cdot\frac{P(e,c)}{P(c)}\right) = \log\frac{P(e,c)^2}{P(e)P(c)} = PMI(e,c) + \log P(e,c) = PMI^2(e,c)$$

• (With Commute Time) The commute time between an instance $e$ and a concept $c$ is

$$Time(e,c) = \sum_{k=1}^{\infty} 2k\, P_k(e,c) = \sum_{k=1}^{T} 2k\, P_k(e,c) + \sum_{k=T+1}^{\infty} 2k\, P_k(e,c) \;\ge\; \sum_{k=1}^{T} 2k\, P_k(e,c) + 2(T+1)\Big(1 - \sum_{k=1}^{T} P_k(e,c)\Big) = 4 - 2\,Rep(e,c)$$

where the last equality takes $T = 1$, with $P_1(e,c) = Rep(e,c)$ as the one-step round-trip probability.

• Given $e$, the concept $c$ should be its typical concept (shortest distance).
• Given $c$, the instance $e$ should be its typical instance (shortest distance).
• BLC is thus a process of finding the concept nodes with the shortest expected distance to $e$.
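To make the comparison of these measures concrete, here is a minimal sketch that computes typicality P(e|c), P(c|e), PMI, and Rep(e,c) from isA co-occurrence counts n(c, e). The counts are invented for illustration; with them, ranking by P(e|c) (equivalently, PMI) favors the overly specific concept, while Rep(e,c) picks the basic-level one.

```python
# Minimal sketch of typicality, PMI, and Rep(e,c) from toy isA counts n(c, e).
import math
from collections import defaultdict

counts = {  # n(c, e), made up for illustration (not Probase data)
    ("company", "microsoft"): 9000,
    ("company", "other corp"): 200000,
    ("software company", "microsoft"): 5000,
    ("software company", "adobe"): 2000,
    ("largest desktop os vendor", "microsoft"): 40,
}

n_c, n_e, total = defaultdict(int), defaultdict(int), 0
for (c, e), n in counts.items():
    n_c[c] += n; n_e[e] += n; total += n

def p_e_given_c(e, c): return counts.get((c, e), 0) / n_c[c]
def p_c_given_e(e, c): return counts.get((c, e), 0) / n_e[e]
def pmi(e, c):
    return math.log((counts.get((c, e), 0) / total) / ((n_e[e] / total) * (n_c[c] / total)))
def rep(e, c): return p_c_given_e(e, c) * p_e_given_c(e, c)

e = "microsoft"
cands = [c for (c, ent) in counts if ent == e]
print("ranked by P(e|c) (same order as PMI):",
      sorted(cands, key=lambda c: p_e_given_c(e, c), reverse=True))
print("ranked by Rep(e,c):", sorted(cands, key=lambda c: rep(e, c), reverse=True))
```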

Evaluations on different measures for BLC: Precision@k (first table) and NDCG@k (second table), for k = 1, 2, 3, 5, 10, 15, 20.

Precision@k
No smoothing
  MI(e)              0.769  0.692  0.705  0.685  0.719  0.705  0.690
  PMI3(e)            0.885  0.769  0.756  0.800  0.754  0.733  0.721
  NPMI(e)            0.692  0.692  0.667  0.638  0.627  0.610  0.610
  Typicality P(c|e)  0.462  0.577  0.603  0.577  0.569  0.564  0.556
  Typicality P(e|c)  0.500  0.462  0.526  0.523  0.523  0.510  0.521
  Rep(e)             0.846  0.865  0.872  0.862  0.758  0.731  0.719
Smoothing = 0.001
  MI(e)              0.577  0.615  0.628  0.600  0.612  0.605  0.592
  PMI3(e)            0.731  0.673  0.692  0.654  0.669  0.644  0.623
  NPMI(e)            0.923  0.827  0.769  0.746  0.731  0.695  0.671
  Typicality P(c|e)  0.462  0.577  0.603  0.577  0.569  0.564  0.554
  Typicality P(e|c)  0.885  0.865  0.872  0.831  0.785  0.741  0.704
  Rep(e)             0.846  0.731  0.718  0.723  0.700  0.669  0.638
Smoothing = 0.0001
  MI(e)              0.615  0.615  0.654  0.608  0.635  0.628  0.612
  PMI3(e)            0.846  0.731  0.731  0.715  0.723  0.685  0.677
  NPMI(e)            0.885  0.904  0.885  0.869  0.823  0.777  0.752
  Typicality P(c|e)  0.462  0.577  0.603  0.577  0.569  0.564  0.556
  Typicality P(e|c)  0.885  0.904  0.910  0.877  0.831  0.813  0.777
  Rep(e)             0.923  0.846  0.833  0.815  0.781  0.736  0.719
Smoothing = 1e-5
  MI(e)              0.615  0.635  0.667  0.662  0.677  0.656  0.646
  PMI3(e)            0.885  0.769  0.744  0.777  0.758  0.731  0.710
  NPMI(e)            0.885  0.846  0.872  0.869  0.831  0.810  0.787
  Typicality P(c|e)  0.462  0.577  0.603  0.577  0.569  0.564  0.556
  Typicality P(e|c)  0.769  0.808  0.846  0.823  0.808  0.782  0.765
  Rep(e)             0.885  0.904  0.872  0.862  0.812  0.800  0.767
Smoothing = 1e-6
  MI(e)              0.769  0.673  0.705  0.677  0.700  0.692  0.679
  PMI3(e)            0.885  0.769  0.756  0.785  0.773  0.726  0.723
  NPMI(e)            0.885  0.846  0.821  0.815  0.750  0.726  0.719
  Typicality P(c|e)  0.462  0.577  0.603  0.577  0.569  0.564  0.556
  Typicality P(e|c)  0.538  0.615  0.615  0.615  0.608  0.613  0.615
  Rep(e)             0.846  0.885  0.897  0.877  0.788  0.777  0.765
Smoothing = 1e-7
  MI(e)              0.769  0.692  0.705  0.685  0.719  0.703  0.688
  PMI3(e)            0.885  0.769  0.756  0.792  0.758  0.736  0.725
  NPMI(e)            0.769  0.750  0.718  0.700  0.650  0.641  0.633
  Typicality P(c|e)  0.462  0.577  0.603  0.577  0.569  0.564  0.556
  Typicality P(e|c)  0.500  0.481  0.526  0.523  0.531  0.523  0.523
  Rep(e)             0.846  0.865  0.872  0.854  0.765  0.749  0.733

NDCG@k
No smoothing
  MI(e)              0.516  0.531  0.519  0.531  0.562  0.574  0.594
  PMI3(e)            0.725  0.664  0.652  0.660  0.628  0.631  0.646
  NPMI(e)            0.599  0.597  0.579  0.554  0.540  0.539  0.549
  Typicality P(c|e)  0.297  0.380  0.409  0.422  0.438  0.446  0.461
  Typicality P(e|c)  0.401  0.386  0.396  0.398  0.401  0.410  0.428
  Rep(e)             0.758  0.771  0.745  0.723  0.656  0.647  0.661
Smoothing = 1e-3
  MI(e)              0.374  0.414  0.441  0.448  0.473  0.481  0.495
  PMI3(e)            0.484  0.511  0.509  0.502  0.519  0.525  0.533
  NPMI(e)            0.692  0.652  0.607  0.603  0.585  0.585  0.592
  Typicality P(c|e)  0.297  0.380  0.409  0.422  0.438  0.446  0.460
  Typicality P(e|c)  0.703  0.697  0.704  0.681  0.637  0.628  0.626
  Rep(e)             0.621  0.580  0.554  0.561  0.554  0.555  0.559
Smoothing = 1e-4
  MI(e)              0.407  0.430  0.458  0.462  0.492  0.503  0.512
  PMI3(e)            0.648  0.604  0.579  0.575  0.578  0.576  0.590
  NPMI(e)            0.747  0.777  0.761  0.737  0.700  0.685  0.688
  Typicality P(c|e)  0.297  0.380  0.409  0.422  0.438  0.446  0.461
  Typicality P(e|c)  0.791  0.795  0.802  0.767  0.738  0.729  0.724
  Rep(e)             0.758  0.714  0.711  0.689  0.653  0.636  0.653
Smoothing = 1e-5
  MI(e)              0.429  0.465  0.478  0.501  0.517  0.528  0.545
  PMI3(e)            0.725  0.647  0.642  0.642  0.627  0.624  0.638
  NPMI(e)            0.813  0.779  0.778  0.765  0.730  0.723  0.729
  Typicality P(c|e)  0.297  0.380  0.409  0.422  0.438  0.446  0.461
  Typicality P(e|c)  0.709  0.728  0.735  0.722  0.702  0.696  0.703
  Rep(e)             0.791  0.787  0.762  0.739  0.707  0.703  0.706
Smoothing = 1e-6
  MI(e)              0.516  0.510  0.515  0.526  0.546  0.563  0.579
  PMI3(e)            0.725  0.655  0.651  0.654  0.641  0.631  0.649
  NPMI(e)            0.791  0.766  0.732  0.728  0.673  0.659  0.668
  Typicality P(c|e)  0.297  0.380  0.409  0.422  0.438  0.446  0.461
  Typicality P(e|c)  0.495  0.516  0.520  0.508  0.512  0.521  0.540
  Rep(e)             0.758  0.784  0.767  0.755  0.691  0.686  0.694
Smoothing = 1e-7
  MI(e)              0.516  0.531  0.519  0.530  0.562  0.571  0.592
  PMI3(e)            0.725  0.664  0.652  0.658  0.630  0.631  0.647
  NPMI(e)            0.670  0.655  0.633  0.604  0.575  0.570  0.581
  Typicality P(c|e)  0.297  0.380  0.409  0.422  0.438  0.446  0.461
  Typicality P(e|c)  0.423  0.421  0.415  0.407  0.414  0.424  0.438
  Rep(e)             0.758  0.771  0.745  0.725  0.663  0.661  0.668

Single Instance

• Is this instance ambiguous?

• What are its basic-level concepts?

• What are its similar instances?

What is Semantic Similarity?

• Are the following instance pairs similar?
  • <apple, microsoft>
  • <apple, pear>
  • <apple, fruit>
  • <apple, food>
  • <apple, ipad>
  • <car, journey>

Approaches on Term Similarity

• Categories of approaches for semantic similarity:
  • String-based approach
  • Knowledge-based approach: use preexisting thesauri, taxonomies, or encyclopedias such as WordNet
  • Corpus-based approach: use contexts of terms extracted from web pages, web search snippets, or other text repositories
  • Embedding-based approach: introduced in detail in “Part 3: Implicit Understanding”

Approaches on Term Similarity (2)

• Categories:

[Figure: a taxonomy of term-similarity approaches: string-based approaches; knowledge-based approaches (WordNet), split into path length / lexical chain-based and information content-based; and corpus-based approaches, split into graph-learning-algorithm-based and snippet-search-based, with state-of-the-art approaches highlighted. Representative works in the figure include Rada 1989, Resnik 1995, Jcn 1997, Lin 1998, Sánch. 2011, Agirre 2010, Alvarez 2007, HunTray 2005, Hirst 1998, Do 2009, Bol 2011, Chen 2006, and Ban 2002.]

• Framework

Term Similarity Using Semantic Networks [Li et al 2013 Li et al 2015]

[Flowchart: given a term pair <t1, t2>, Step 1 (type checking) decides whether it is a concept pair, an entity pair, or a concept-entity pair. Step 2 (context representation) collects entity-distribution contexts for concept pairs and concept-distribution contexts for entity pairs, applying concept clustering; for a concept-entity pair, the concepts of the entity term are collected and the top-k concepts cx are selected from each cluster. Step 3 (context similarity) compares the context vectors: Cosine(T(t1), T(t2)) for entity pairs, max over cluster pairs Cosine(Cx(t1), Cy(t2)) for concept pairs, and max sim(t2, cx) over the selected concepts for concept-entity pairs.]

An example [Li et al 2013 Li et al 2015]

For example: <banana, pear>

Step 1 (type checking): <banana, pear> is an entity pair.
Step 2 (context representation): collect the concept-distribution context vector of each entity.
Step 3 (context similarity): Cosine(T(banana), T(pear)) = 0.916.

Examples:

Term 1 | Term 2 | Similarity
lunch | dinner | 0.9987
tiger | jaguar | 0.9792
car | plane | 0.9711
television | radio | 0.9465
technology company | microsoft | 0.8208
high impact sport | competitive sport | 0.8155
employer | large corporation | 0.5353
fruit | green pepper | 0.2949
travel | meal | 0.0426
music | lunch | 0.0116
alcoholic beverage | sports equipment | 0.0314
company | table tennis | 0.0003

Full results: http://adapt.seiee.sjtu.edu.cn/similarity/SimCompleteResults.pdf

Statistics of Search Queries

(Same query-length and instance-count distributions as in the earlier “Statistics of Search Queries” slide.)

If the short text has context for the instance…

• python tutorial  • dangerous python  • moon earth distance  • …

Short Text Understanding

• How to segment this short text?

• What does this short text mean (its intent, senses, or concepts)?

• What are the relations among terms in the short text?

• How to calculate the similarity between short texts?

Supervised Segmentation [Bergsma et al 2007]

• Problem: divide a query into semantic units.

• Approach: turn segmentation into position-based binary classification.

Example query: “two man power saw”
Candidate segmentations: [two man] [power saw]; [two] [man] [power saw]; [two] [man power] [saw]

Input: a query and its positions
Output: the decision on whether to place a segmentation break at each position

Supervised Segmentation

• Features:
  • Decision-boundary features, e.g., indicator features (such as the word “the”), POS tags in the query (such as “is”), and position features (forward/backward)
  • Statistical features, e.g., mutual information between the left and right parts (“bank loan | amortization schedule”)
  • Context features, e.g., context information such as “female bus driver”
  • Dependency features, e.g., “female” depends on “driver”

Supervised Segmentation

• Segmentation overview: input query “two man power saw” → learning features → SVM classifier → output: a segmentation decision (yes/no) for each position

Unsupervised Segmentation [Tan et al 2008]

• Unsupervised learning for query segmentation.

Probability of a generated segmentation $S$ for query $Q$, using a unigram model over segments:

$$P(S|Q) = P(s_1)\,P(s_2|s_1)\cdots P(s_m|s_1 s_2 \cdots s_{m-1}) \approx \prod_{s_i \in S} P(s_i)$$

A position is a valid segment boundary if and only if the pointwise mutual information between the two segments resulting from the split is negative:

$$MI(s_k, s_{k+1}) = \log\frac{P_c([s_k\ s_{k+1}])}{P_c(s_k)\cdot P_c(s_{k+1})} < 0$$

Example: for “new york times subscription”, $\log\frac{P_c([\text{new york}])}{P_c(\text{new})\cdot P_c(\text{york})} > 0$, so there is no segment boundary between “new” and “york”.

Unsupervised Segmentation

• Find the top-k segmentations with dynamic programming.

• Use EM optimization on the fly.

Input: a query $w_1 w_2 \ldots w_n$ (the words in the query) and a concept probability distribution
Output: the top-k segmentations with the highest likelihood
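A minimal sketch of unigram-model segmentation in the spirit of the approach above: dynamic programming over segment boundaries with P(S) ≈ Π P(s_i). The segment probabilities are toy values; a real system estimates them from a large corpus or query log and refines them with EM.

```python
# Minimal sketch of unigram-model query segmentation via dynamic programming.
import math
from functools import lru_cache

P = {  # toy segment probabilities (assumed, not real counts)
    "new": 1e-3, "york": 8e-4, "times": 9e-4, "subscription": 2e-4,
    "new york": 6e-4, "new york times": 3e-4, "york times": 1e-6,
    "times subscription": 1e-7, "new york times subscription": 1e-9,
}

def logp(seg):
    return math.log(P.get(seg, 1e-12))   # back-off for unseen segments

def segment(words):
    """Return the max-likelihood segmentation under P(S) = product of P(s_i)."""
    @lru_cache(maxsize=None)
    def best(i):
        if i == len(words):
            return 0.0, ()
        options = []
        for j in range(i + 1, len(words) + 1):
            seg = " ".join(words[i:j])
            score, rest = best(j)
            options.append((logp(seg) + score, (seg,) + rest))
        return max(options)
    return best(0)[1]

print(segment(tuple("new york times subscription".split())))
# With these toy numbers: ('new york times', 'subscription')
```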

Exploit Click-through [Li et al 2011]

• Motivation:
  • Probabilistic query segmentation
  • Use click-through data

Input query: “bank of america online banking”
Click data: query → URL → document

Output (top-3 segmentations):
  [bank of america] [online banking]    0.502
  [bank of america online banking]      0.428
  [bank of] [america] [online banking]  0.001

Exploit Click-through

• Segmentation model: an interpolated model combining global information with click-through information.

Example query: [credit card] [bank of america]
Clicked HTML documents:
  1. bank of america credit cards contact us overview
  2. secured visa credit card from bank of america
  3. credit cards overview: find the right bank of america credit card for you

Short Text Understanding

• How to segment this short text?

• What does this short text mean (its intent, senses, or concepts)?

• What are the relations among terms in the short text?

• How to calculate the similarity between short texts?

Sense Changes with Different Context

watch harry potter → Movie;  read harry potter → Book;  age harry potter → Character;  harry potter walkthrough → Game

Entity Recognition in Query [Guo et al 2009]

• Motivation: detect the named entity in a short text and categorize it.

Example: the single-named-entity query “harry potter walkthrough” is interpreted as the triple <e, t, c> = (“harry potter”, “walkthrough”, “game”), where e is the (possibly ambiguous) named entity, t the context term(s), and c the class of the entity.

Entity Recognition in Query

• Probabilistic generative model:

Goal: given a query q, find the triple <e, t, c> that maximizes the probability of generating q.

The probability of generating a triple factorizes as Pr(e, t, c) = Pr(e) Pr(c|e) Pr(t|c), assuming the context depends only on the class; e.g., “walkthrough” depends only on the class game, not on “harry potter” itself.

Objective: given query q, find the <e, t, c> maximizing Pr(e) Pr(c|e) Pr(t|c). The problem then becomes how to estimate Pr(e), Pr(c|e), and Pr(t|c).
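A small sketch of scoring <e, t, c> triples with the factorization Pr(e) · Pr(c|e) · Pr(t|c); all probability tables below are toy values, not estimates learned from query logs.

```python
# Minimal sketch of triple scoring for a single-named-entity query.
Pr_e = {"harry potter": 0.6, "potter": 0.1}
Pr_c_given_e = {("game", "harry potter"): 0.2, ("book", "harry potter"): 0.5,
                ("movie", "harry potter"): 0.3}
Pr_t_given_c = {("walkthrough", "game"): 0.05, ("walkthrough", "book"): 0.0001,
                ("walkthrough", "movie"): 0.0005}

def score(e, t, c):
    return (Pr_e.get(e, 0.0)
            * Pr_c_given_e.get((c, e), 0.0)
            * Pr_t_given_c.get((t, c), 0.0))

candidates = [("harry potter", "walkthrough", c) for c in ("game", "book", "movie")]
print(max(candidates, key=lambda etc: score(*etc)))
# ('harry potter', 'walkthrough', 'game') with these toy numbers
```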

Entity Recognition in Query

• Probability estimation by learning:

Learning objective: $\max \prod_{i=1}^{N} P(e_i, t_i, c_i)$

Challenge: it is difficult and time-consuming to manually assign class labels to named entities in queries.

So build a training set $T = \{(e_i, t_i)\}$ and view $c_i$ as a hidden variable. The new learning problem becomes

$$\max \prod_{i=1}^{N} P(e_i, t_i) = \max \prod_{i=1}^{N} \sum_{c} P(e_i)\, P(c|e_i)\, P(t_i|c)$$

which is solved with a topic model, WS-LDA (weakly supervised LDA).

Signal from Click [Pantel et al 2012]

• Motivation: predict entity types in Web search, jointly modeling the entity, the user intent, the context, and the clicks.

• Query type distribution (73 types); a generative model over entity types.

[Figure: plate-notation sketch of the model]

Signal from Click

• Joint model for prediction:

[Figure: plate diagram. For each query: pick an entity type from the distribution over types, pick an entity from the entity distribution, pick an intent from the intent distribution, pick context words from the word distribution, and pick a click from the host distribution; θ, φ, ω denote the corresponding distributions.]

Telegraphic Query interpretation [Sawant et al 2013 Joshi et al 2014]

• Entity-seeking telegraphic queries

• Interpretation = Segmentation + Annotation

• Combine a knowledge base (for accuracy) with a large corpus (for recall)

Example: query “Germany capital” → result entity: Berlin

• Overview

Joint Interpretation and Ranking [Sawant et al. 2013, Joshi et al. 2014]

[Figure: an annotated corpus and the telegraphic query feed two models for joint interpretation and ranking, a generative model and a discriminative model, which output ranked candidate entities e1, e2, e3, …]

• Generative model (based on probabilistic language models):

Joint Interpretation and Ranking [Sawant et al. 2013]

[Figure: for the query q = “losing team baseball world series 1998”, the answer entity E = San Diego Padres, of type T = “Major league baseball team” (type hint: “baseball team”), is supported by corpus context Z such as “Padres have been to two World Series, losing in 1984 and 1998”; context matchers and a switch variable link the selectors “lost 1998 world series” to the snippet. (Adapted from U. Sawant, 2013.)]

• Discriminative model (based on max-margin discriminative learning):

Joint Interpretation and Ranking [Sawant et al. 2013]

[Figure: for “losing team baseball world series 1998” with target type t = “baseball team”, the correct entity San_Diego_Padres is scored against incorrect entities such as 1998_World_Series (type t = “series”).]

Telegraphic Query Interpretation [Joshi et al. 2014]

• Queries seek answer entities (e2).

• They contain (query) entities (e1), target types (t2), relations (r), and selectors (s).

Example interpretations (adapted from M. Joshi, 2014): for “dave navarro first band”, e1 = dave navarro, t2 = band, s = first, with or without an explicit relation r = band; for “spider automobile company”, one interpretation takes e1 = spider and t2 = automobile company, while another takes t2 = company with selector s = spider.

Improved Generative Model

• The generative model of [Sawant et al. 2013] is extended in [Joshi et al. 2014] to also consider the query entity e1 (in q) and the relation r.

Improved Discriminative Model

• Likewise, the discriminative model of [Sawant et al. 2013] is extended in [Joshi et al. 2014] to consider e1 (in q) and r.

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

• Input: a short text

• Output: a semantic interpretation

• Three steps in understanding a short text, e.g., “wanna watch eagles band”:

Step 1: Text segmentation: divide the text into a sequence of vocabulary terms (“wanna | watch | eagles | band”)
Step 2: Type detection: determine the best type of each term (watch[verb] eagles[entity] band[concept])
Step 3: Concept labeling: infer the best concept of each entity within context (watch[verb] eagles[entity](band) band[concept])

Text Segmentation

• Observations:
  • Mutual exclusion: terms containing the same word mutually exclude each other.
  • Mutual reinforcement: related terms mutually reinforce each other.

• Build a Candidate Term Graph (CTG).

[Figure: candidate term graphs for “vacation april in paris” (candidate terms: april in paris, vacation, april, paris) and “watch harry potter” (candidate terms: harry potter, watch, harry, potter), with mutual-exclusion links between overlapping terms and reinforcement edge weights such as 0.029, 0.005, 0.047, 0.041 and 0.014, 0.092, 0.053, 0.018.]

Find the Best Segmentation

• The best segmentation is the sub-graph of the CTG which:
  • is a complete graph (clique),
  • contains no mutual exclusion,
  • has 100% word coverage (except for stopwords), and
  • has the largest average edge weight.

[Figure: the qualifying sub-graphs of the two CTGs marked as valid segmentations, with the best one highlighted.]

Find the Best Segmentation (cont.)

• Equivalently, the search looks for a maximal clique of the CTG with the largest average edge weight.

[Figure: maximal cliques highlighted in the CTGs for “vacation april in paris” and “watch harry potter”; the clique with the largest average edge weight gives the best segmentation.]
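A minimal sketch of the CTG idea: enumerate candidate term sets that cover the text without overlap (mutual exclusion) and keep the one with the largest average pairwise edge weight. The vocabulary and edge weights are toy assumptions, not values mined from a real semantic network.

```python
# Minimal sketch of CTG-style segmentation scoring (toy vocabulary and weights).
from itertools import combinations

VOCAB = {"harry potter", "harry", "potter", "watch"}
EDGE = {  # assumed relatedness weights between candidate terms
    ("watch", "harry potter"): 0.092, ("watch", "harry"): 0.014,
    ("watch", "potter"): 0.018, ("harry", "potter"): 0.053,
}
def w(a, b):
    return EDGE.get((a, b), EDGE.get((b, a), 0.0))

def candidates(words):
    return [(i, j, " ".join(words[i:j]))
            for i in range(len(words)) for j in range(i + 1, len(words) + 1)
            if " ".join(words[i:j]) in VOCAB]

def segmentations(words):
    cands = candidates(words)
    for r in range(1, len(cands) + 1):
        for combo in combinations(cands, r):
            spans = sorted((i, j) for i, j, _ in combo)
            contiguous = all(a[1] == b[0] for a, b in zip(spans, spans[1:]))
            if contiguous and spans[0][0] == 0 and spans[-1][1] == len(words):
                yield [t for _, _, t in combo]   # full, non-overlapping cover

def best_segmentation(text):
    def avg_weight(seg):
        pairs = list(combinations(seg, 2))
        return sum(w(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
    return max(segmentations(text.split()), key=avg_weight)

print(best_segmentation("watch harry potter"))   # ['watch', 'harry potter']
```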

Type Detection

• Pairwise model: find the best typed-term for each term so that the maximum spanning tree of the resulting sub-graph between typed-terms has the largest weight.

[Figure: for “watch free movie”, each word has candidate typed-terms (watch[v], watch[e], watch[c]; free[adj], free[v]; movie[c], movie[e]); the chosen combination maximizes the spanning-tree weight.]

Concept Labeling

• Entity disambiguation is the most important task of concept labeling: filter and re-rank the original concept cluster vector.

• Weighted vote: the final score of each concept cluster combines its original score with the support from the context, using concept co-occurrence.

Example: “watch harry potter” → movie;  “read harry potter” → book

Example of Entity Disambiguation [Hua et al. 2015 (ICDE Best Paper), Hua et al. 2016]

[Figure: the conceptualization pipeline for “ipad apple”. The short text is parsed against a semantic network and a co-occurrence network; terms are clustered by isA, concepts are filtered by co-occurrence, and head/modifier analysis plus concept orthogonalization produce a concept vector (c1, p1), (c2, p2), (c3, p3), …. For “ipad apple”, isA lookup suggests fruit, company, food, product for “apple” and product, device for “ipad”; co-occurrence filtering keeps the company/brand/device/product senses and discards the fruit/food sense.]

Mining Lexical Relationships [Wang et al. 2015b]

• Lexical knowledge is represented by probabilities. For the short text “watch harry potter” (candidate concepts: movie, book, product; candidate role of “watch”: verb), these include

$$p(\text{verb}\mid\text{watch}),\quad p(\text{instance}\mid\text{watch}),\quad p(\text{movie}\mid\text{harry potter}),\quad p(\text{book}\mid\text{harry potter}),\quad p(\text{movie}\mid\text{watch},\text{verb})$$

• The key quantities are (1) $p(c \mid t, z)$, (2) $p(c \mid e) = p(c \mid t, z = \text{instance})$, and (3) $p(z \mid t)$, where $e$ denotes an instance, $t$ a term, $c$ a concept, and $z$ a role.

Understanding Queries [Wang et al 2015b]

• Goal: rank the concepts and find $\arg\max_{c} p(c \mid t, q)$.

• Given a query, enumerate all possible segmentations, build an online subgraph from the offline semantic network, and run random walk with restart [Sun et al. 2005] on that online subgraph.
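A small sketch of random walk with restart on a toy term/concept subgraph, restarting at the query terms and reading off concept scores; the graph, edge weights, and restart probability are illustrative assumptions only.

```python
# Minimal sketch of random walk with restart (RWR) for concept ranking.
import numpy as np

nodes = ["watch", "harry potter", "movie", "book", "verb"]
A = np.array([  # assumed adjacency weights
    [0, 1, 1, 0, 1],
    [1, 0, 1, 1, 0],
    [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
], dtype=float)
P = A / A.sum(axis=1, keepdims=True)          # row-stochastic transition matrix

def rwr(seed_idx, restart=0.3, iters=100):
    r = np.zeros(len(nodes))
    r[seed_idx] = 1.0 / len(seed_idx)         # restart distribution over query terms
    x = r.copy()
    for _ in range(iters):
        x = (1 - restart) * P.T @ x + restart * r
    return x

scores = rwr(seed_idx=[0, 1])                 # restart at "watch" and "harry potter"
concept_scores = {n: s for n, s in zip(nodes, scores) if n in {"movie", "book", "verb"}}
print(max(concept_scores, key=concept_scores.get))   # likely "movie" for this toy graph
```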

Short Text Understanding

• How to segment this short text?

• What does this short text mean (its intent, senses, or concepts)?

• What are the relations among terms in the short text?

• How to calculate the similarity between short texts?

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

• Example: “popular smart cover iphone 5s”

• Definitions:
  • Head: names the general (semantic) category to which the whole short text belongs; usually the head is the intent of the short text. Here “smart cover” is the intent of the query.
  • Constraints: distinguish this member from other members of the same category. Here “iphone 5s” limits the type of the head.
  • Non-constraint modifiers (a.k.a. pure modifiers): subjective modifiers which can be dropped without changing the intent. Here “popular” is subjective and can be neglected.

Non-Constraint Modifier Mining: Construct Modifier Networks

[Figure: a concept hierarchy tree in the “Country” domain (country, Asian country, developed country, Western country, Western developed country, top western country, large Asian country, large developed country, top developed country, …) and the corresponding modifier network over {Asian, Western, Developed, Large, Top}; the edges between modifiers form the modifier network. In this case “Large” and “Top” are pure modifiers.]

Non-Constraint Modifier Mining: Betweenness Centrality

• Betweenness centrality is a measure of a node's centrality in a network.

• The betweenness of node $v$ is defined as

$$g(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$$

where $\sigma_{st}$ is the total number of shortest paths from node $s$ to node $t$, and $\sigma_{st}(v)$ is the number of those paths that pass through $v$.

• Normalization & aggregation: a pure modifier should have a low betweenness-centrality aggregation score $PMS(t)$.
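A minimal sketch of the pure-modifier test using betweenness centrality on a toy modifier network with networkx; the edges and the cutoff threshold are assumptions for illustration, not mined from a real query log.

```python
# Minimal sketch: low-betweenness modifiers as candidate pure modifiers.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("Asian", "Developed"), ("Asian", "Western"), ("Developed", "Western"),
    ("Large", "Asian"), ("Large", "Developed"),   # "Large" attaches broadly
    ("Top", "Western"), ("Top", "Developed"),     # "Top" attaches broadly
])

bc = nx.betweenness_centrality(G, normalized=True)
threshold = 0.05                                  # assumed cutoff
pure_modifiers = [m for m, score in sorted(bc.items(), key=lambda kv: kv[1])
                  if score < threshold]
print(bc)
print("candidate pure modifiers:", pure_modifiers)   # "Large" and "Top" here
```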

Head-Constraints Mining [Wang et al 2014b]

• A term can be a head in some cases and a constraint in others.

• E.g., in “seattle hotel”, “hotel” is the head and “seattle” a constraint; in “seattle hotel job”, “job” is the head and “seattle” and “hotel” are constraints.

Head-Constraints Mining: Acquiring Concept Patterns

[Figure: building the concept-pattern dictionary from query logs. Extract preposition patterns such as “A for B”, “A of B”, “A with B”, “A in B”, “A on B”, “A at B” from the query log (e.g., “cover for iphone 6s”, “battery for sony a7r”, “wicked on broadway”); collect entity pairs (entity1 = head, entity2 = constraint) for each preposition; conceptualize both entities (concept11, concept12, concept13, concept14 for entity1; concept21, concept22, concept23 for entity2); and store the resulting concept patterns, e.g., (concept11, concept21), (concept11, concept22), (concept11, concept23), …, in the concept-pattern dictionary.]

Why Concepts Can't Be Too General

• It may cause too many concept-pattern conflicts: we cannot distinguish head and modifier for overly general concept pairs.

Derived concept pattern (head, modifier) = (device, company); supporting entity pairs: (iphone 4, verizon), (modem, comcast), (wireless router, comcast), (iphone 4, tmobile)

Derived concept pattern (head, modifier) = (company, device); supporting entity pairs: (amazon books, kindle), (netflix, touchpad), (skype, windows phone), (netflix, ps3)

→ Conflict between the two derived patterns.

Why Concepts Can't Be Too Specific

• It may generate concepts with little coverage: the concept regresses to the entity.
• Large storage space: up to (million × million) patterns.

Examples: (device, largest desktop OS vendor), (device, largest software development company), (device, largest global corporation), (device, latest windows and office provider), …

Basic-level Conceptualization (BLC) is a good choice [Wang et al. 2015b].

Top Concept Patterns (columns: cluster size, sum of cluster scores, head / constraint, score):

615  2114691  breed / state  357298460224501
296  7752357  game / platform  627403476771856
153  3466804  accessory / vehicle  53393705094809
70  118259  browser / platform  132612807637391
22  1010993  requirement / school  271407526294823
34  9489159  drug / disease  154602405333541
42  8992995  cosmetic / skin condition  814659415003929
16  7421599  job / city  27903732555528
32  710403  accessory / phone  246513830851194
18  6692376  software / platform  210126322725878
20  6444603  test / disease  239774028397537
27  5994205  clothes / breed  98773996282851
19  5913545  penalty / crime  200544192793488
25  5848804  tax / state  240081818612579
16  5465424  sauce / meat  183592863621553
18  4809389  credit card / country  142919087972152
14  4730792  food / holiday  14554140330924
11  4536199  mod / game  257163856882439
29  4350954  garment / sport  471533326845442
23  3994886  career information / professional  732726483731257
15  386065  song / instrument  128189481818135
18  378213  bait / fish  780426514113169
22  3722948  study guide / book  508339765053921
19  3408953  plugins / browser  550326072627126
14  3305753  recipe / meat  882779863422951
18  3214226  currency / country  110825444188352
13  3180272  lens / camera  186081673263957
9  316973  decoration / holiday  130055844126533
16  314875  food / animal  7338544366514

Example: the game / platform pattern cluster contains the patterns (game, platform), (game, device), (video game, platform), (game console, game pad), (game, gaming platform); with Game as the head and Platform as the modifier, detected instances include (angry birds, android), (angry birds, ios), (angry birds, windows 10), …

Head-Modifier Relationship Detection

• Train a classifier on (head-embedding, modifier-embedding) pairs.

• Training data: positive examples (head, modifier); negative examples (modifier, head).

• Precision >= 0.9, Recall >= 0.9

• Disadvantage: not interpretable.
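A small sketch of the classifier described above: concatenate (head, modifier) embeddings and train a binary classifier, using swapped pairs as negatives. The embeddings here are random stand-ins for pretrained word vectors, and the training pairs are illustrative.

```python
# Minimal sketch of a head/modifier order classifier on concatenated embeddings.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
DIM = 50
emb = {w: rng.normal(size=DIM) for w in
       ["game", "platform", "cover", "iphone", "hotel", "seattle", "job"]}

pairs = [("game", "platform"), ("cover", "iphone"), ("hotel", "seattle"), ("job", "hotel")]
X, y = [], []
for head, modifier in pairs:
    X.append(np.concatenate([emb[head], emb[modifier]])); y.append(1)   # (head, modifier)
    X.append(np.concatenate([emb[modifier], emb[head]])); y.append(0)   # swapped: negative

clf = LogisticRegression(max_iter=1000).fit(np.array(X), y)
test = np.concatenate([emb["cover"], emb["iphone"]]).reshape(1, -1)
print(clf.predict(test))   # 1 means the first argument is predicted to be the head
```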

Syntactic Parsing based on HM

• The head/modifier information is incomplete:
  • prepositions and other function words are not covered;
  • neither are relations within a noun compound, e.g., “el capitan macbook pro”.

• Why not train a parser for web queries?

Syntactic Parsing of Short Texts [Sun et al. EMNLP 2016]

• Syntactic structures are valuable for short text understanding.

• Examples:

Challenges: Short Texts Lack Grammatical Signals

• They lack function words and reliable word order:
  • “toys queries” has ambiguous intent;
  • “distance earth moon” has clear intent, but many equivalent forms: “earth moon distance”, “earth distance moon”, …

Challenges: Syntactic Parsing of Queries

• No standard

• No ground truth

Why is syntactic parsing of queries even a legitimate problem?

Derive Syntax from Semantics [Sun et al. 2016]

• Query: “thai food houston”

• Take a sentence clicked for this query.

• Project the dependencies of the clicked sentence onto the query.

A Treebank for Short Texts

• Given a query q and q's clicked sentence s: parse each s, project the dependencies from s to q, and aggregate the dependencies.

Algorithm of Projection

Result Examples

Results

• Random queries: QueryParser UAS 0.83, LAS 0.75; Stanford UAS 0.72, LAS 0.64

• Queries with no function words: QueryParser UAS 0.82, LAS 0.73; Stanford UAS 0.70, LAS 0.61

• Queries with function words: QueryParser UAS 0.90, LAS 0.85; Stanford UAS 0.86, LAS 0.80

Short Text Understanding

• How to segment this short text?

• What does this short text mean (its intent, senses, or concepts)?

• What are the relations among terms in the short text?

• How to calculate the similarity between short texts?

Short Text Similarity Using Word Embedding [Kenter and Rijke 2015]

• Measuring similarity between two short texts or sentences.

• Basic idea: word-by-word comparison using embedding vectors.

• Use a saliency-weighted semantic network to compute similarity.

Features acquired: bins of all edges, bins of max edges, and a BM25-inspired similarity measurement between the longer text $s_l$ and the shorter text $s_s$:

$$f_{sts}(s_l, s_s) = \sum_{w \in s_l} IDF(w)\cdot\frac{sem(w, s_s)\cdot(k_1 + 1)}{sem(w, s_s) + k_1\cdot\left(1 - b + b\cdot\frac{|s_s|}{avgsl}\right)}$$

where $sem(w, s_s)$ is the semantic similarity of term $w$ to the short text $s_s$, and $k_1$, $b$, $avgsl$ are BM25-style parameters.
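A minimal sketch of the BM25-inspired feature above, with sem(w, s_s) taken as the maximum cosine similarity between w and the words of the other text; the word vectors and IDF values are toy stand-ins for pretrained embeddings and corpus statistics.

```python
# Minimal sketch of the saliency-weighted, BM25-inspired text similarity feature.
import numpy as np

rng = np.random.default_rng(1)
vec = {w: rng.normal(size=20) for w in ["cheap", "flights", "low", "cost", "airline", "tickets"]}
idf = {w: 1.0 for w in vec}                       # assumed IDF values

def sem(w, text):
    """Max cosine similarity between word w and any word of the other text."""
    def cos(a, b): return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(cos(vec[w], vec[t]) for t in text)

def f_sts(s_long, s_short, k1=1.2, b=0.75, avgsl=4.0):
    score = 0.0
    for w in s_long:
        s = sem(w, s_short)
        score += idf[w] * (s * (k1 + 1)) / (s + k1 * (1 - b + b * len(s_short) / avgsl))
    return score

print(f_sts(["cheap", "flights"], ["low", "cost", "airline", "tickets"]))
```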

From the Concept View [Wang et al. 2015a]

[Figure: both short texts are conceptualized (parsing, term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, concept orthogonalization) against a semantic network and a co-occurrence network into bags of concepts: Concept Vector 1 = [(c1, score1), (c2, score2), …] and Concept Vector 2 = [(c1', score1'), (c2', score2'), …]; their similarity is then computed between the two concept vectors.]
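A small sketch of the final comparison step: cosine similarity between two bag-of-concepts vectors. The concept vectors are illustrative, as if produced by the conceptualization pipeline above.

```python
# Minimal sketch: cosine similarity between bag-of-concepts vectors.
import math

def cosine(u, v):
    concepts = set(u) | set(v)
    dot = sum(u.get(c, 0.0) * v.get(c, 0.0) for c in concepts)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

cv1 = {"fruit": 0.05, "company": 0.62, "brand": 0.21, "device": 0.12}   # e.g. "apple ipad"
cv2 = {"company": 0.55, "brand": 0.25, "search engine": 0.20}           # e.g. "google"
print(round(cosine(cv1, cv2), 3))
```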

Outline

• Knowledge Bases

• Explicit Representation Models

• Applications

Applications

• Explicit short text understanding benefits a lot of application scenarios:
  • Ads/search semantic matching
  • Definition mining
  • Query recommendation
  • Web table understanding
  • Semantic search
  • …

Ads Keyword Selection [Wang et al. 2015a]

[Bar charts: results by decile (Decile 4 through Decile 10) for Mainline Ads (y-axis from 0.00 to 6.00) and Sidebar Ads (y-axis from 0.00 to 0.60).]

Definition Mining [Hao et al 2016]

• Definition scenarios: search engines, QnA, etc.

• Why conceptualization is useful for definition mining. Example: “What is Emphysema?”

Answer 1: “Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year.”

Answer 2: “Emphysema is an incurable, progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing.”

• The first sentence has the form of a definition; embedding is helpful to some extent, but it also returns high similarity scores for both (emphysema, disease) and (emphysema, smoking).

• Conceptualization can provide strong semantics; contextual embedding can also provide semantic similarity beyond isA.

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

[Figure: system overview. Offline: training data and a knowledge base are used for concept weighting and model learning, producing one concept model per class (Class 1 … Class i … Class N). Online: an original short text (e.g., “justin bieber graduates”) goes through entity extraction, conceptualization into a concept vector, candidate generation, and classification & ranking, producing outputs such as <Music, score>.]

[Figures: each category (e.g., TV, Music, Movie) is represented in a concept space built from the article titles/tags in that category, yielding per-category concept weights; an incoming query is mapped into the same concept space and compared against the category representations.]

Precision performance on each category [Wang et al. 2014a]

Category | BocSTC | LM_ch | SVM | VSM_cosine | LM_d | Entity_ESA
Movie | 0.71 | 0.91 | 0.84 | 0.81 | 0.72 | 0.56
Money | 0.97 | 0.95 | 0.54 | 0.57 | 0.52 | 0.74
Music | 0.97 | 0.90 | 0.88 | 0.73 | 0.68 | 0.58
TV | 0.96 | 0.46 | 0.92 | 0.56 | 0.51 | 0.55

[Bar chart of the same precision numbers; y-axis from 0.3 to 1.0]

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

• [ Bollacker et al. 2008 ] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, Jamie Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In SIGMOD 2008.

• [ Auer et al. 2007 ] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, Zachary G. Ives. DBpedia: A Nucleus for a Web of Open Data. In ISWC/ASWC 2007.

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

• [ Wu et al. 2015 ] Sen Wu, Ce Zhang, Christopher De Sa, Jaeho Shin, Feiran Wang, and C. Ré. Incremental Knowledge Base Construction Using DeepDive. In VLDB 2015.

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

• [ Nastase et al. 2010 ] Vivi Nastase, Michael Strube, Benjamin Börschinger, Cäcilia Zirn, and Anas Elghafari. WikiNet: A very large scale multi-lingual concept network. In LREC 2010.

• [ Speer et al. 2013 ] Robert Speer and Catherine Havasi. ConceptNet 5: A large semantic network for relational knowledge. The People's Web Meets NLP, Springer Berlin Heidelberg, 2013.

• [ Hua et al. 2016 ] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng, and Xiaofang Zhou. “Understand Short Texts by Harvesting and Analyzing Semantic Knowledge.” IEEE Transactions on Knowledge and Data Engineering (TKDE), 2016.

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

• [ Li et al. 2015 ] Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang, Xue-Gang Hu, and Xindong Wu. A Large Probabilistic Semantic Network based Approach to Compute Term Similarity. IEEE Transactions on Knowledge and Data Engineering (TKDE), 27(10): 2604-2617, 2015.

• [ Rosch et al. 1976 ] Eleanor Rosch, Carolyn B. Mervis, Wayne D. Gray, David M. Johnson, and Penny Boyes-Braem. Basic objects in natural categories. Cognitive Psychology, 8(3): 382-439, 1976.

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny ZhuToward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 12: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Common Sense Knowledge vs Encyclopedia Knowledge

Common Sense Knowledge Base

Encyclopedia Knowledge Base

Common senselinguistic knowledge among terms

EntitiesFacts

isAisPropertyOf

co-occurrencehellip

DayOfBirthLocatedInSpouseOf

hellip

Typicality basic level of categorization

Black or WhitePrecision

WordNet KnowItAll NELLProbase hellip

Freebase Yago DBPedia Google knowledge graph hellip

Special cases

WordNet [Stark et al 1998]

bull WordNetreg is a large lexical database of English Nouns verbs adjectives and adverbs are grouped into sets of cognitive synonyms (synsets) each expressing a distinct concept

bull S (n) China Peoples Republic of China mainland China Communist China Red China PRC Cathay (a communist nation that covers a vast territory in eastern Asia the most populous country in the world)

bull The project began in the Princeton University Department of Psychology and is currently housed in the Department of Computer Science

bull Homepage httpwordnetprincetoneduwordnetabout-wordnetbull Download httpwordnetprincetoneduwordnetdownload

Brief Introduction

Statistics

Sample

Authors

URLs

POS Unique Synsets TotalStrings Word-Sense Pairs

Noun 117798 82115 146312Verb 11529 13767 25047Adjective 21479 18156 30002Adverb 4481 3621 5580Totals 155287 117659 206941

KnowItAll Extract high-quality knowledgefrom the Web [Banko et al 2007 Etzioni et al 2011]

bull OpenIE distills semantic relations fromWeb-scale natural language texts

bull TextRunner -gt ReVerb -gt Open IE part of KnowItAll

bull Yielding over 5 billion extraction from over a billion web pages

bull From ldquoUS president Barack Obama gave his inaugural address on January 20 2013rdquoTo (Barack Obama is president of US)

(Barack Obama gave [his inaugural address on January 20 2013])

bull OpenIE v413 has been released

bull Turing Center at the University of Washington

bull httpopenieallenaiorgbull httpreverbcswashingtonedu

Brief Introduction

Statistics

Sample

News

Authors

URLs

NELL Never-Ending Language Learning [Carlson et al 2010]

bull NELL is a research project that attempts to create a computer system that learns over time to read the web Since January 2010

bull Over 50 million candidate beliefs by reading the web They are considered at different levels of confidence

bull Out of 50 million high confidence in 2817156 beliefs

Brief Introduction

Statistics

Sample

NELL Never-Ending Language Learning

bull It is continually learning facts on web Resources is publicly available

bull NELL research team at CMU

bull Homepage httprtwmlcmuedurtwbull Download httprtwmlcmuedurtwresources

News

Authors

URLs

Probase [Wu et al 2012]

bull Probase is a semantic network to make machines ldquoawarerdquo of the mental world of human beings so that machines can better understand human communication

Brief Introduction

Probase network

isA isPropertyOf Co-occurrence(concept entities) (attributes) (isCEOof LocatedIn etc)

Concepts Entities(ldquoSpanish Artistsrdquo) (ldquoPablo Picasordquo)

Nodes

Edges

Attributes(ldquoBirthdayrdquo)

VerbsAdjectives(ldquoEatrdquo ldquoSweetrdquo)

bull 5401933 unique concepts bull 12551613 unique instancesbull 87603947 IsA relations

countries Basic watercolor techniques

Celebrity wedding dress designers

Probase

bull Microsoft Research

bull Public release coming soon in AugSept 2016 bull Project homepage httpresearchmicrosoftcomprobase

Concepts

Authors

URLs

Probase isA error rate lt1 1 and lt10 for random pair

Freebase [Bollacker et al 2008]

bull Freebase is a well-known collaborative knowledge base consisting of data composed mainly by its community

bull Freebase contains more than 23 million entitiesbull Freebase contains 19 billion triplesbull Each triple is organized as form of

ltsubjectgt ltpredicategt ltobjectgt

Brief Introduction

Statistics

bull Freebase is a collection of factsbull Freebase only contains nodes

and linksbull Freebase is a labeled graph

Freebase -gt Wiki Data

bull Freebase data was integrated into Wikidatabull The Freebase API will be completely shut-down on Aug 31 2016

replaced by Google Knowledge Graph API

bull Freebase Community

bull Homepage httpwikifreebasecomwikiMain_Pagebull Download httpsdevelopersgooglecomfreebasebull Wikidata httpswwwwikidataorg

News

Authors

URLs

Google Knowledge Graph

bull Knowledge Graph is a knowledge base used by Google to enhance its search engines search results with semantic-search information gathered from a wide variety of sources

bull 570 million objects and more than 18 billion facts about relationshipsbetween different objects

bull Google Inc

bull Homepage httpswwwgooglecomintles419insidesearchfeaturessearchknowledgehtml

Brief Introduction

Statistics

Sample

Authors

URLs

YAGO [Suchanek et al 2007]

bull YAGO is a huge semantic knowledge base derived from GeoNames WordNet and Wikipedia (10 Wikipedias in different languages)

bull More than 10 million entities(persons organizations cities etc)bull More than 120 million facts about entitiesbull More than 35000 classes assigned to entitiesbull Many of its facts and entities are attached a temporal dimension and a spatial dimension

Brief Introduction

SampleltAlbert_Einsteingt ltisMarriedTogt ltElsa_Einsteingt

Statistics

YAGO

Newsbull An evaluated version of YAGO3 (Combining information from Wikipedia from different

languages) is released [15 Sep 2015]

Authorsbull Max Planck Institute for Informatics in SaarbruumlckenGermany and DBWeb group at Teacuteleacutecom ParisTech University

URLsbull Homepage httpwwwmpi-infmpgdedepartmentsdatabases-and-

information-systemsresearchyago-nagayagobull Download httpwwwmpi-infmpgdedepartmentsdatabases-and-

information-systemsresearchyago-nagayagodownloads

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Statistics of Search Queries

44

29

17

7

2 1

1 Term2 Terms3 Terms4 Terms5 Termsmore than 5 Terms

10

26

34

19

74

1 Term2 Terms3 Terms4 Terms5 Termsmore than 5 Terms

(a) By traffic

(b) By of distinct queries

Pokeacutemon Go Microsoft HoloLens

Instance 1 Instance 21 Instance2 Instances3 Instances4 Instances5 InstancesMore than 5 Instances

1 Instance2 Instances3 Instances4 Instances5 InstancesMore than 5 Instances

If the short text is a single instancehellip

bull Pythonbull Microsoftbull Applebull hellip

Single Instance Understanding

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

Word Ambiguity bull Word sense disambiguation rely on dictionaries

(WordNet)

Take a seat on this chair

The chair of the Math Department

Instance Ambiguity

bull Instance sense disambiguation extra knowledge needed

I have an apple pie for lunch

He bought an apple ipad

Here ldquoapplerdquo is a proper noun

Ambiguity [Hua et al 2016]

bull Many instances are ambiguous

bull Intuition ambiguous instances have multiple senses

short text instance sense

population china china country

glass vs china china fragile item

pear apple apple fruit

microsoft apple apple company

read harry potter harry potter book

watch harry potter harry potter movie

age of harry potter harry potter character

Pre-definition for Ambiguity (1) Sense [Hua et al 2016]

bull What is a Sense in semantic networksbull A sense as a hierarchy of concept clusters

region

country state city

creature

animal

predator

crop food

fruit vegetable meat

Germany

Pre-definition for Ambiguity (2) Concept Cluster [Li et al 2013 Li et al 2015]

bull What is a Concept Cluster (CL)bull Cluster similar concepts into a concept cluster using K-

Means like approach (k-Medoids)

FruitFresh fruit

JuiceTropical fruit

BerryExotic fruit

Seasonal fruitFruit juiceCitrus fruitSoft fruitDry fruit

Wild fruitLocal fruit

hellip

company

CompanyClientFirm

ManufacturerCorporation

large companyRivalGiant

big companylocal company

large corporationinternational

companyhellip Fruit

Definitions of Instance Ambiguity [Hua et al 2016]

bull 3 levels of instance ambiguitybull Level 0 unambiguous

bull Contains only 1 sensebull Eg dog (animal) beijing (city) potato (vegetable)

bull Level 1 unambiguous and ambiguous both make sensebull Contains 2 or more senses but these senses are relatedbull Eg google (company amp search engine) french (language amp

country) truck(vehicle amp public transport service)

bull Level 2 ambiguous bull Contains 2 or more senses and the senses are very different from

each otherbull Eg apple (fruit amp company) jaguar(animal amp company) python

(animal amp language)

Ambiguity Score

• Use the top-2 senses to calculate the ambiguity score:

  score = 0                                                       if level = 0
  score = w(s2|e) / w(s1|e) * (1 - similarity(s1, s2))            if level = 1
  score = 1 + w(sc2|e) / w(sc1|e) * (1 - similarity(sc1, sc2))    if level = 2

Denote the top-2 senses as s1 and s2, and the top-2 sense clusters as sc1 and sc2. The similarity of two sense clusters is the maximum similarity of their senses:
  similarity(sc1, sc2) = max similarity(si ∈ sc1, sj ∈ sc2)
For an entity e, the weight (popularity) of a sense si is the sum of the weights of its concept clusters:
  w(si|e) = w(Hi|e) = Σ_{CLj ∈ Hi} P(CLj|e)
For an entity e, the weight (popularity) of a sense cluster sci is the sum of the weights of its senses:
  w(sci|e) = Σ_{sj ∈ sci} w(sj|e)
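A small sketch of the scoring rule above, assuming the top sense-cluster weights and a cluster-similarity function are already available. The 0.5 relatedness threshold and the example weights are illustrative, and senses and sense clusters are conflated for brevity.

```python
def ambiguity_score(cluster_weights, similarity):
    """cluster_weights: w(sc_i|e) for the top sense clusters, sorted descending.
    similarity(i, j): max similarity between senses of clusters i and j.
    Returns (level, score) following the 3-level definition above."""
    if len(cluster_weights) < 2:
        return 0, 0.0                      # level 0: a single sense, unambiguous
    w1, w2 = cluster_weights[0], cluster_weights[1]
    sim = similarity(0, 1)
    if sim > 0.5:                          # related senses -> level 1 (threshold is illustrative)
        return 1, (w2 / w1) * (1.0 - sim)
    return 2, 1.0 + (w2 / w1) * (1.0 - sim)

# "apple": a fruit-like cluster vs. a company-like cluster, nearly unrelated
level, score = ambiguity_score([0.537, 0.271], similarity=lambda i, j: 0.05)
print(level, round(score, 3))   # -> level 2, score about 1.48 (the slide reports 1.41 with real data)
```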

Examples

• Level 0
  • california: country, state, city, region, institution (0.943)
  • fruit: food, product, snack, carbs, crop (0.827)
  • alcohol: substance, drug, solvent, food, addiction (0.523)
  • computer: device, product, electronics, technology, appliance (0.537)
  • coffee: beverage, product, food, crop, stimulant (0.73)
  • potato: vegetable, food, crop, carbs, product (0.896)
  • bean: food, vegetable, crop, legume, carbs (0.801)

Examples (cont.)
• Level 1
  • nike, score = 0.034: company, store (0.861); brand (0.035); shoe, product (0.033)
  • twitter, score = 0.035: website, tool (0.612); network (0.165); application (0.033); company (0.031)
  • facebook, score = 0.037: website, tool (0.595); network (0.17); company (0.053); application (0.029)
  • yahoo, score = 0.38: search engine (0.457); company, provider, account (0.281); website (0.0656)
  • google, score = 0.507: search engine (0.46); company, provider, organization (0.377); website (0.0449)

Examples (cont.)
• Level 2
  • jordan, score = 1.02: country, state, company, regime (0.92); shoe (0.02)
  • fox, score = 1.09: animal, predator, species (0.74); network (0.064); company (0.035)
  • puma, score = 1.15: brand, company, shoe (0.655); species, cat (0.116)
  • gold, score = 1.21: metal, material, mineral resource, mineral (0.62); color (0.128)
  • soap, score = 1.22: product, toiletry, substance (0.49); technology, industry standard (0.11)
  • silver, score = 1.24: metal, material, mineral resource, mineral (0.638); color (0.156)
  • python, score = 1.29: language (0.667); snake, animal, reptile, skin (0.193)
  • apple, score = 1.41: fruit, food, tree (0.537); company, brand (0.271)

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

A Concept View of ldquoMicrosoftrdquo

[Figure: a concept view of "Microsoft": company, software company, international company, technology leader, largest desktop OS vendor, …]

Basic-level Conceptualization (BLC)[Rosch et al 1976]

KFC

BMW

Basic-level conceptualization

How to Make BLC

• Naive approaches:
  • Typicality: an important measure for understanding the relationship between an object and its concept
  • Pointwise Mutual Information (PMI): a common measure of the strength of association between two terms

Naive Approach 1 Typicality

P(robin|bird) > P(penguin|bird): "robin" is a more typical bird than "penguin".
P(USA|country) > P(Seychelles|country): "USA" is a more typical country than "Seychelles".

Using Typicality for BLC

• Associate each isA relationship (e isA c) with typicality scores P(e|c) and P(c|e):

  P(e|c) = n(c, e) / n(c)        P(c|e) = n(c, e) / n(e)

bull P(e|c) indicates how typical (or popular) e is in the given concept c

bull P(c|e) indicates how typical (or popular) the concept c is given e

• However, a single typicality score is not enough: for "Microsoft", "company" has high typicality P(c|e) while "largest desktop OS vendor" has high typicality P(e|c), yet neither alone gives the basic-level concept.

Naive Approach 2 PMI[Manning and Schutze 1999]

• Pointwise mutual information (PMI) is a measure of association used in information theory and statistics
• Consider using the PMI between concept c and instance e to find the basic-level concepts as follows:

  PMI(e, c) = log [ P(e, c) / (P(e) P(c)) ] = log P(e|c) - log P(e)

• However:
  • In basic-level categorization we are interested in finding a concept for a given e, which means P(e) is a constant
  • Thus ranking by PMI(e, c) is the same as ranking by P(e|c)

Using Rep(e c) for BLC [Wang et al 2015b]

• The measure Rep(e, c) = P(c|e) * P(e|c) means:
  • given e, c should be its typical concept (shortest distance), and
  • given c, e should be its typical instance (shortest distance).
• (Relation to PMI) Taking the logarithm of the scoring function:

  log Rep(e, c) = log [ P(c|e) * P(e|c) ] = log [ P(e, c)/P(e) * P(e, c)/P(c) ] = log [ P(e, c)^2 / (P(e) P(c)) ] = PMI(e, c) + log P(e, c) = PMI^2(e, c)

• (Relation to Commute Time) The commute time between an instance e and a concept c is

  Time(e, c) = Σ_{k=1..∞} 2k * P_k(e, c)
             = Σ_{k=1..T} 2k * P_k(e, c) + Σ_{k=T+1..∞} 2k * P_k(e, c)
             ≥ Σ_{k=1..T} 2k * P_k(e, c) + 2(T + 1) * (1 - Σ_{k=1..T} P_k(e, c)) = 4 - 2 * Rep(e, c)   (for T = 1)

  So ranking concepts by Rep(e, c) is a process of finding the concept nodes having the shortest expected distance to e.
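A minimal sketch of Rep(e, c) computed from isA co-occurrence counts n(c, e), as defined in the typicality slide. The counts below are made up for illustration and are not Probase statistics.

```python
from collections import defaultdict

# isA co-occurrence counts n(c, e): how often instance e is observed under concept c
n = {("company", "microsoft"): 9000, ("software company", "microsoft"): 5000,
     ("largest desktop os vendor", "microsoft"): 120,
     ("company", "apple"): 8000, ("company", "ibm"): 6000}

n_c = defaultdict(int)   # n(c) = sum over e of n(c, e)
n_e = defaultdict(int)   # n(e) = sum over c of n(c, e)
for (c, e), cnt in n.items():
    n_c[c] += cnt
    n_e[e] += cnt

def rep(e, c):
    """Rep(e, c) = P(c|e) * P(e|c), the basic-level scoring used above."""
    cnt = n.get((c, e), 0)
    return (cnt / n_e[e]) * (cnt / n_c[c]) if cnt else 0.0

concepts = {c for (c, e) in n if e == "microsoft"}
# -> "software company": neither the too-general "company" nor the too-specific vendor
print(max(concepts, key=lambda c: rep("microsoft", c)))
```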

Evaluations on Different Measures for BLC (columns: top-k = 1, 2, 3, 5, 10, 15, 20)

Precision@k
  No smoothing
    MI(e)              0.769 0.692 0.705 0.685 0.719 0.705 0.690
    PMI3(e)            0.885 0.769 0.756 0.800 0.754 0.733 0.721
    NPMI(e)            0.692 0.692 0.667 0.638 0.627 0.610 0.610
    Typicality P(c|e)  0.462 0.577 0.603 0.577 0.569 0.564 0.556
    Typicality P(e|c)  0.500 0.462 0.526 0.523 0.523 0.510 0.521
    Rep(e)             0.846 0.865 0.872 0.862 0.758 0.731 0.719
  Smoothing = 0.001
    MI(e)              0.577 0.615 0.628 0.600 0.612 0.605 0.592
    PMI3(e)            0.731 0.673 0.692 0.654 0.669 0.644 0.623
    NPMI(e)            0.923 0.827 0.769 0.746 0.731 0.695 0.671
    Typicality P(c|e)  0.462 0.577 0.603 0.577 0.569 0.564 0.554
    Typicality P(e|c)  0.885 0.865 0.872 0.831 0.785 0.741 0.704
    Rep(e)             0.846 0.731 0.718 0.723 0.700 0.669 0.638
  Smoothing = 0.0001
    MI(e)              0.615 0.615 0.654 0.608 0.635 0.628 0.612
    PMI3(e)            0.846 0.731 0.731 0.715 0.723 0.685 0.677
    NPMI(e)            0.885 0.904 0.885 0.869 0.823 0.777 0.752
    Typicality P(c|e)  0.462 0.577 0.603 0.577 0.569 0.564 0.556
    Typicality P(e|c)  0.885 0.904 0.910 0.877 0.831 0.813 0.777
    Rep(e)             0.923 0.846 0.833 0.815 0.781 0.736 0.719
  Smoothing = 1e-5
    MI(e)              0.615 0.635 0.667 0.662 0.677 0.656 0.646
    PMI3(e)            0.885 0.769 0.744 0.777 0.758 0.731 0.710
    NPMI(e)            0.885 0.846 0.872 0.869 0.831 0.810 0.787
    Typicality P(c|e)  0.462 0.577 0.603 0.577 0.569 0.564 0.556
    Typicality P(e|c)  0.769 0.808 0.846 0.823 0.808 0.782 0.765
    Rep(e)             0.885 0.904 0.872 0.862 0.812 0.800 0.767
  Smoothing = 1e-6
    MI(e)              0.769 0.673 0.705 0.677 0.700 0.692 0.679
    PMI3(e)            0.885 0.769 0.756 0.785 0.773 0.726 0.723
    NPMI(e)            0.885 0.846 0.821 0.815 0.750 0.726 0.719
    Typicality P(c|e)  0.462 0.577 0.603 0.577 0.569 0.564 0.556
    Typicality P(e|c)  0.538 0.615 0.615 0.615 0.608 0.613 0.615
    Rep(e)             0.846 0.885 0.897 0.877 0.788 0.777 0.765
  Smoothing = 1e-7
    MI(e)              0.769 0.692 0.705 0.685 0.719 0.703 0.688
    PMI3(e)            0.885 0.769 0.756 0.792 0.758 0.736 0.725
    NPMI(e)            0.769 0.750 0.718 0.700 0.650 0.641 0.633
    Typicality P(c|e)  0.462 0.577 0.603 0.577 0.569 0.564 0.556
    Typicality P(e|c)  0.500 0.481 0.526 0.523 0.531 0.523 0.523
    Rep(e)             0.846 0.865 0.872 0.854 0.765 0.749 0.733

NDCG@k
  No smoothing
    MI(e)              0.516 0.531 0.519 0.531 0.562 0.574 0.594
    PMI3(e)            0.725 0.664 0.652 0.660 0.628 0.631 0.646
    NPMI(e)            0.599 0.597 0.579 0.554 0.540 0.539 0.549
    Typicality P(c|e)  0.297 0.380 0.409 0.422 0.438 0.446 0.461
    Typicality P(e|c)  0.401 0.386 0.396 0.398 0.401 0.410 0.428
    Rep(e)             0.758 0.771 0.745 0.723 0.656 0.647 0.661
  Smoothing = 1e-3
    MI(e)              0.374 0.414 0.441 0.448 0.473 0.481 0.495
    PMI3(e)            0.484 0.511 0.509 0.502 0.519 0.525 0.533
    NPMI(e)            0.692 0.652 0.607 0.603 0.585 0.585 0.592
    Typicality P(c|e)  0.297 0.380 0.409 0.422 0.438 0.446 0.460
    Typicality P(e|c)  0.703 0.697 0.704 0.681 0.637 0.628 0.626
    Rep(e)             0.621 0.580 0.554 0.561 0.554 0.555 0.559
  Smoothing = 1e-4
    MI(e)              0.407 0.430 0.458 0.462 0.492 0.503 0.512
    PMI3(e)            0.648 0.604 0.579 0.575 0.578 0.576 0.590
    NPMI(e)            0.747 0.777 0.761 0.737 0.700 0.685 0.688
    Typicality P(c|e)  0.297 0.380 0.409 0.422 0.438 0.446 0.461
    Typicality P(e|c)  0.791 0.795 0.802 0.767 0.738 0.729 0.724
    Rep(e)             0.758 0.714 0.711 0.689 0.653 0.636 0.653
  Smoothing = 1e-5
    MI(e)              0.429 0.465 0.478 0.501 0.517 0.528 0.545
    PMI3(e)            0.725 0.647 0.642 0.642 0.627 0.624 0.638
    NPMI(e)            0.813 0.779 0.778 0.765 0.730 0.723 0.729
    Typicality P(c|e)  0.297 0.380 0.409 0.422 0.438 0.446 0.461
    Typicality P(e|c)  0.709 0.728 0.735 0.722 0.702 0.696 0.703
    Rep(e)             0.791 0.787 0.762 0.739 0.707 0.703 0.706
  Smoothing = 1e-6
    MI(e)              0.516 0.510 0.515 0.526 0.546 0.563 0.579
    PMI3(e)            0.725 0.655 0.651 0.654 0.641 0.631 0.649
    NPMI(e)            0.791 0.766 0.732 0.728 0.673 0.659 0.668
    Typicality P(c|e)  0.297 0.380 0.409 0.422 0.438 0.446 0.461
    Typicality P(e|c)  0.495 0.516 0.520 0.508 0.512 0.521 0.540
    Rep(e)             0.758 0.784 0.767 0.755 0.691 0.686 0.694
  Smoothing = 1e-7
    MI(e)              0.516 0.531 0.519 0.530 0.562 0.571 0.592
    PMI3(e)            0.725 0.664 0.652 0.658 0.630 0.631 0.647
    NPMI(e)            0.670 0.655 0.633 0.604 0.575 0.570 0.581
    Typicality P(c|e)  0.297 0.380 0.409 0.422 0.438 0.446 0.461
    Typicality P(e|c)  0.423 0.421 0.415 0.407 0.414 0.424 0.438
    Rep(e)             0.758 0.771 0.745 0.725 0.663 0.661 0.668

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

What is Semantic Similarity?
• Are the following instance pairs similar?
  • <apple, microsoft>
  • <apple, pear>
  • <apple, fruit>
  • <apple, food>
  • <apple, ipad>
  • <car, journey>

Approaches on Term Similarity

• Categories of approaches for semantic similarity:
  • String based approach
  • Knowledge based approach: use preexisting thesauri, taxonomies, or encyclopedias such as WordNet
  • Corpus based approach: use contexts of terms extracted from web pages, web search snippets, or other text repositories
  • Embedding based approach: introduced in detail in "Part 3: Implicit Understanding"

Approaches on Term Similarity (2)

• Categories
  [Figure: taxonomy of term-similarity approaches: string based approaches; knowledge based approaches (WordNet), split into path length/lexical chain-based and information content-based methods; and corpus based approaches, split into graph learning algorithm based and snippet search based methods, with state-of-the-art approaches highlighted. Representative works shown include Rada 1989, Hirst 1998, Resnik 1995, Jcn 1997, Lin 1998, Sánchez 2011, Agirre 2010, Alvarez 2007, Ban 2002, HunTray 2005, Chen 2006, Do 2009, and Bol 2011.]

• Framework
Term Similarity Using Semantic Networks [Li et al 2013, Li et al 2015]
  Input: term pairs <t1, t2>
  Step 1: Type Checking: decide whether the pair is a concept pair, an entity pair, or a concept-entity pair
  Step 2: Context Representation (vectors):
    • Concept pairs: collect entity-distribution contexts, giving context vectors T(t1) and T(t2)
    • Entity pairs: collect concept-distribution contexts and run concept clustering, giving cluster context vectors Cx(t1) and Cy(t2)
    • Concept-entity pairs: collect the concepts of the entity term t1, cluster them, and for each cluster Ci(t1) select the top-k concepts cx
  Step 3: Context Similarity:
    • Concept pairs: similarity = Cosine(T(t1), T(t2))
    • Entity pairs: similarity = max over (x, y) of Cosine(Cx(t1), Cy(t2))
    • Concept-entity pairs: for each pair <t2, cx>, evaluate the similarity and take max sim(t2, cx) as the similarity of <t1, t2>

An example [Li et al 2013 Li et al 2015]

For example: <banana, pear>
  Step 1: Type Checking: <banana, pear> is an entity pair
  Step 2: Context Representation (vector): concept context collection for each term
  Step 3: Context Similarity: Cosine(T(t1), T(t2)) = 0.916
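A tiny sketch of the context-similarity step: cosine similarity between two concept-distribution vectors. The concept weights for banana and pear below are illustrative, not the values behind the 0.916 reported above.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# concept-distribution contexts T(t) for two entity terms (illustrative weights)
T_banana = Counter({"fruit": 0.62, "tropical fruit": 0.21, "food": 0.10, "crop": 0.05})
T_pear   = Counter({"fruit": 0.58, "seasonal fruit": 0.19, "food": 0.12, "tree": 0.06})
print(round(cosine(T_banana, T_pear), 3))   # high, as in the <banana, pear> example
```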

Examples
  Term 1              | Term 2             | Similarity
  lunch               | dinner             | 0.9987
  tiger               | jaguar             | 0.9792
  car                 | plane              | 0.9711
  television          | radio              | 0.9465
  technology company  | microsoft          | 0.8208
  high impact sport   | competitive sport  | 0.8155
  employer            | large corporation  | 0.5353
  fruit               | green pepper       | 0.2949
  travel              | meal               | 0.0426
  music               | lunch              | 0.0116
  alcoholic beverage  | sports equipment   | 0.0314
  company             | table tennis       | 0.0003

96httpadaptseieesjtueducnsimilaritySimCompleteResultspdf

Statistics of Search Queries

[Figure: query statistics. (a) By traffic: 1 term 44%, 2 terms 29%, 3 terms 17%, 4 terms 7%, 5 terms 2%, more than 5 terms 1%. (b) By # of distinct queries: 1 term 10%, 2 terms 26%, 3 terms 34%, 4 terms 19%, 5 terms 7%, more than 5 terms 4%. A similar breakdown is shown by number of instances per query (1 instance, 2 instances, …, more than 5 instances). Example instances: Pokémon Go, Microsoft HoloLens.]

If the short text has context for the instance…
• python tutorial
• dangerous python
• moon earth distance
• …

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Supervised Segmentation [Bergsma et al 2007]

• Problem: divide the query into semantic units
• Approach: turn segmentation into position-based binary classification
  Example query: "two man power saw"
  Candidate segmentations: [two man] [power saw]; [two] [man] [power saw]; [two] [man power] [saw]
  Input: a query and its positions
  Output: the decision on whether to place a segmentation break at each position

Supervised Segmentation

• Features:
  • Decision-boundary features: e.g., indicator for "the", POS tags in the query, position features (forward/backward)
  • Statistical features: e.g., mutual information between the left and right parts ("bank loan | amortization schedule")
  • Context features: e.g., context information around the query
  • Dependency features: e.g., "female" depends on "bus driver"

Supervised Segmentation

• Segmentation overview:
  Input query: "two man power saw"; for each position between adjacent words, an SVM classifier over the learned features outputs a segmentation decision (yes/no). A small sketch follows below.
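A hedged sketch of the position-based binary classification idea using scikit-learn's LinearSVC. The feature vectors (a PMI-like statistic plus simple position indicators) and the toy labels are illustrative stand-ins for the full feature set of [Bergsma et al 2007].

```python
from sklearn.svm import LinearSVC

# one example per gap between adjacent words; features are an illustrative subset:
# [PMI(left, right), left_is_stopword, right_is_stopword, forward_position, backward_position]
X = [[3.1, 0, 0, 1, 3],   # "two | man"                  -> no break
     [0.2, 0, 0, 2, 2],   # "two man | power"            -> break
     [4.0, 0, 0, 3, 1],   # "power | saw"                -> no break
     [0.1, 0, 0, 2, 2]]   # "bank loan | amortization"   -> break
y = [0, 1, 0, 1]          # 1 = insert a segment boundary at this gap

clf = LinearSVC(C=1.0).fit(X, y)
print(clf.predict([[0.3, 0, 0, 2, 2]]))   # low PMI at the gap, likely predicts a boundary
```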

Unsupervised Segmentation [Tan et al 2008]

bull Unsupervised learning for query segmentation

Probability of a generated segmentation S for query Q (unigram model over segments):

  P(S|Q) = P(s1) P(s2|s1) … P(sm|s1 s2 … s_{m-1}) ≈ Π_{si ∈ S} P(si)

A split point is a valid segment boundary if and only if the pointwise mutual information between the two resulting segments is negative:

  MI(sk, sk+1) = log [ Pc([sk sk+1]) / (Pc(sk) * Pc(sk+1)) ] < 0

Example: for "new york times subscription", log [ Pc([new york]) / (Pc(new) * Pc(york)) ] > 0, so there is no segment boundary between "new" and "york".

Unsupervised Segmentation

• Find the top-k segmentations by dynamic programming (a sketch follows below)
• Use EM optimization on the fly
  Input: a query w1 w2 … wn (the words in the query) and a concept probability distribution
  Output: the top-k segmentations with the highest likelihood
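A minimal sketch of unigram-model segmentation with dynamic programming, returning only the single best segmentation rather than the top-k and omitting the EM re-estimation. The segment probabilities are illustrative.

```python
import math
from functools import lru_cache

# unigram probabilities of candidate segments (illustrative values)
P = {"new": 1e-4, "york": 8e-5, "times": 2e-4, "subscription": 1e-4,
     "new york": 6e-5, "new york times": 4e-5, "york times": 1e-7}
UNSEEN = 1e-9

def segment(words, max_len=3):
    """Return the segmentation maximizing the product of segment probabilities."""
    @lru_cache(maxsize=None)
    def best(i):
        if i == len(words):
            return 0.0, ()
        options = []
        for j in range(i + 1, min(len(words), i + max_len) + 1):
            seg = " ".join(words[i:j])
            logp_rest, rest = best(j)
            options.append((math.log(P.get(seg, UNSEEN)) + logp_rest, (seg,) + rest))
        return max(options)
    return best(0)[1]

print(segment(("new", "york", "times", "subscription")))
# -> ('new york times', 'subscription')
```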

Exploit Click-through [Li et al 2011]

• Motivation:
  • Probabilistic query segmentation
  • Use click-through data: query Q -> clicked URL -> document D
Input query: "bank of america online banking"
Output: top-3 segmentations
  [bank of america] [online banking]       0.502
  [bank of america online banking]         0.428
  [bank of] [america] [online banking]     0.001

Exploit Click-through

• Segmentation model: an interpolated model combining global info with click-through info
  Query: [credit card] [bank of america]
  Clicked HTML documents:
    1. bank of america credit cards contact us overview
    2. secured visa credit card from bank of america
    3. credit cards overview: find the right bank of america credit card for you

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Sense Changes with Different Context

watch harry potter -> Movie; read harry potter -> Book; age harry potter -> Character; harry potter walkthrough -> Game

Entity Recognition in Query [Guo et al 2009]

• Motivation: detect the named entity in a short text and categorize it
  Example: the single-named-entity query "harry potter walkthrough" is represented as a triple <e, t, c> = ("harry potter", "walkthrough", "game"), where e is the (ambiguous) named entity, t the context term(s), and c the class of the entity

Entity Recognition in Query

• Probabilistic generative model
  Goal: given a query q, find the triple <e, t, c> that maximizes the probability of generating the triple
  Pr(e, t, c) = Pr(e) Pr(c|e) Pr(t|c), assuming the context depends only on the class (e.g., "walkthrough" depends only on the class game, not on "harry potter")
  Objective: given query q, find argmax over <e, t, c> of Pr(e) Pr(c|e) Pr(t|c)
  The problem then becomes how to estimate Pr(e), Pr(c|e), and Pr(t|c)

Entity Recognition in Query

• Probability estimation by learning
  Learning objective: max Π_{i=1..N} P(e_i, t_i, c_i)
  Challenge: it is difficult and time-consuming to manually assign class labels to named entities in queries
  So build a training set T = {(e_i, t_i)} and view c_i as a hidden variable
  New learning problem: max Π_{i=1..N} P(e_i, t_i) = max Π_{i=1..N} Σ_c P(e_i) P(c|e_i) P(t_i|c)
  Solved with a topic model, WS-LDA
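A small sketch of the generative scoring once Pr(e), Pr(c|e), and Pr(t|c) have been estimated (in the paper via WS-LDA). The probability tables and the "#" placeholder convention for the entity slot are illustrative.

```python
# toy estimates; real values would come from the learning step above
Pr_e = {"harry potter": 0.7, "angry birds": 0.3}
Pr_c_given_e = {"harry potter": {"book": 0.5, "movie": 0.3, "game": 0.2},
                "angry birds": {"game": 0.9, "movie": 0.1}}
Pr_t_given_c = {"book": {"read #": 0.6, "# walkthrough": 0.0},
                "movie": {"watch #": 0.7, "# walkthrough": 0.01},
                "game": {"# walkthrough": 0.5, "watch #": 0.1}}

def interpret(query):
    """Return argmax over <e, t, c> of Pr(e) * Pr(c|e) * Pr(t|c) for entities found in the query."""
    best_score, best_triple = 0.0, None
    for e in Pr_e:
        if e not in query:
            continue
        t = query.replace(e, "#").strip()          # remaining context with an entity slot "#"
        for c, pce in Pr_c_given_e[e].items():
            score = Pr_e[e] * pce * Pr_t_given_c[c].get(t, 1e-6)
            if score > best_score:
                best_score, best_triple = score, (e, t, c)
    return best_triple

print(interpret("harry potter walkthrough"))   # -> ('harry potter', '# walkthrough', 'game')
```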

Signal from Click [Pantel et al 2012]

• Motivation: predict entity types in Web search from the entity, the user intent, the context, and the click, using a generative model over a query type distribution (73 types)

Signal from Click

• Joint model for prediction (generative story): for each query, pick an entity type t from the distribution over types, pick an entity, pick an intent from the intent distribution, pick context words from the word distribution, and pick a click from the host distribution; θ, φ, and ω denote the model's latent distributions.

Telegraphic Query interpretation [Sawant et al 2013 Joshi et al 2014]

• Entity-seeking telegraphic queries
• Interpretation = Segmentation + Annotation
  A knowledge base provides accuracy and a large corpus provides recall.
  Example: query "Germany capital" -> result entity Berlin

• Overview
Joint Interpretation and Ranking [Sawant et al 2013, Joshi et al 2014]
  Given a telegraphic query and an annotated corpus, two models for interpretation and ranking (a generative model and a discriminative model) produce the output: ranked answer entities e1, e2, e3, …

• Generative model
Joint Interpretation and Ranking [Sawant et al 2013]
  Based on probabilistic language models. For the query q = "losing team baseball world series 1998", the type hint "baseball team" selects a target type T, context matchers match selectors such as "lost", "1998", "world series" against the corpus, and the answer entity E = San Diego Padres (a major league baseball team) is supported by snippets such as "Padres have been to two World Series, losing in 1984 and 1998". (Figure adapted from U. Sawant, 2013.)

• Discriminative model
Joint Interpretation and Ranking [Sawant et al 2013]
  Based on max-margin discriminative learning. For the same query, candidate interpretations pair an answer entity with a target type and the selector words, e.g., San_Diego_Padres with t = baseball team (correct entity) versus 1998_World_Series with t = series (incorrect entity).

• Queries seek answer entities (e2)
• They contain (query) entities (e1), target types (t2), relations (r), and selectors (s)
Telegraphic Query Interpretation [Joshi et al 2014]
  Example interpretations (query -> e1, r, t2, s):
    "dave navarro first band": e1 = dave navarro, r = band, t2 = band, s = first; or e1 = dave navarro, r unspecified, t2 = band, s = first
    "spider automobile company": e1 = spider, r = automobile company, t2 = automobile company; or r = automobile company, t2 = company, s = spider
  (Adapted from M. Joshi, 2014.)

Improved Generative Model
• Generative model [Sawant et al 2013] -> [Joshi et al 2014]: additionally consider e1 (in q) and r
Improved Discriminative Model
• Discriminative model [Sawant et al 2013] -> [Joshi et al 2014]: additionally consider e1 (in q) and r

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

• Input: a short text
• Output: a semantic interpretation
• Three steps in understanding a short text, e.g., "wanna watch eagles band" -> watch[verb] eagles[entity](band) band[concept]:
  Step 1: Text Segmentation: divide the text into a sequence of terms in the vocabulary
  Step 2: Type Detection: determine the best type of each term
  Step 3: Concept Labeling: infer the best concept of each entity within its context

Text Segmentation
• Observations:
  • Mutual Exclusion: terms containing the same word mutually exclude each other
  • Mutual Reinforcement: related terms mutually reinforce each other
• Build a Candidate Term Graph (CTG)
  [Figure: CTGs for "vacation april in paris" (candidate terms: vacation, april, paris, april in paris) and "watch harry potter" (candidate terms: watch, harry, potter, harry potter), with term weights such as 1/3 and 2/3 and edge weights such as 0.029, 0.005, 0.047, 0.041 and 0.014, 0.092, 0.053, 0.018.]

Find the best segmentation
• Best segmentation = the sub-graph of the CTG which:
  • is a complete graph (clique),
  • contains no mutual exclusion,
  • has 100% word coverage (except for stopwords), and
  • has the largest average edge weight.
  [Figure: on the "vacation april in paris" and "watch harry potter" CTGs, any maximal clique satisfying the constraints is a segmentation; the one with the largest average edge weight is the best segmentation.]
  A small clique-search sketch follows below.
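A hedged sketch of the clique-based search using networkx, on the illustrative "vacation april in paris" graph from the figure. The edge weights, stopword handling, and word-coverage check are simplified stand-ins.

```python
import networkx as nx

# Candidate Term Graph for "vacation april in paris"; "april in paris" and the single
# terms "april"/"paris" are mutually exclusive, so no edge connects them.
G = nx.Graph()
G.add_weighted_edges_from([("vacation", "april in paris", 0.047),
                           ("vacation", "april", 0.029),
                           ("vacation", "paris", 0.041),
                           ("april", "paris", 0.005)])

def covers_all_words(clique, words={"vacation", "april", "paris"}):
    # 100% word coverage, treating "in" as a stopword
    return set(w for term in clique for w in term.split() if w != "in") >= words

def avg_edge_weight(clique):
    pairs = [(a, b) for i, a in enumerate(clique) for b in clique[i + 1:]]
    return sum(G[a][b]["weight"] for a, b in pairs) / len(pairs) if pairs else 0.0

candidates = [c for c in nx.find_cliques(G) if len(c) > 1 and covers_all_words(c)]
print(max(candidates, key=avg_edge_weight))   # the clique {vacation, april in paris} wins
```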

Type Detection

• Pairwise model: find the best typed-term for each term so that the Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight
  [Figure: candidate typed-terms for "watch free movie": watch[v], watch[e], watch[c]; free[adj], free[v]; movie[c], movie[e].]

Concept Labeling

• Entity disambiguation is the most important task of concept labeling
  • Filter / re-rank the original concept cluster vector
• Weighted vote
  • The final score of each concept cluster combines its original score with the support from the context, using concept co-occurrence
  Example: "watch harry potter" -> movie; "read harry potter" -> book

Example of Entity Disambiguation[Hua et al 2015 (ICDE Best Paper) Hua et al 2016]

[Figure: entity disambiguation for "ipad apple". Conceptualization over a semantic network (isA) and a co-occurrence network involves parsing, term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, and concept orthogonalization, producing a concept vector (c1, p1), (c2, p2), (c3, p3), …. For "apple", isA candidates such as fruit, company, food, product are filtered by co-occurrence with "ipad" (product, device, brand, company), so the company/device reading is kept.]

Mining Lexical Relationships[Wang et al 2015b]

• Lexical knowledge is represented by probabilities (e: instance, t: term, c: concept, z: role):
  ① p(z|t), e.g., p(verb|watch), p(instance|watch)
  ② p(c|e) = p(c|t, z = instance), e.g., p(movie|harry potter), p(book|harry potter)
  ③ p(c|t, z), e.g., p(movie|watch, verb)
  Example: "watch harry potter", with candidate concepts product, book, movie.

Understanding Queries [Wang et al 2015b]

• Goal: rank the concepts and find argmax_c p(c|t, q)
  Given a query, generate all possible segmentations, build an online subgraph from the offline semantic network, and run random walk with restart [Sun et al 2005] on the online subgraph (a sketch follows below).
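A minimal random-walk-with-restart sketch over a tiny online subgraph. The adjacency matrix, node meanings, and restart probability are illustrative, not the actual semantic network.

```python
import numpy as np

def rwr(adj: np.ndarray, seed: int, restart: float = 0.15, iters: int = 100) -> np.ndarray:
    """Random walk with restart: iterate p = (1 - c) * W @ p + c * e_seed."""
    W = adj / adj.sum(axis=0, keepdims=True)   # column-normalized transition matrix
    p = np.zeros(adj.shape[0])
    p[seed] = 1.0
    e = p.copy()
    for _ in range(iters):
        p = (1.0 - restart) * W @ p + restart * e
    return p

# tiny online subgraph: 0 = "watch" (term), 1 = "harry potter" (term), 2 = movie, 3 = book
adj = np.array([[0, 0, 1, 0],
                [0, 0, 1, 1],
                [1, 1, 0, 0],
                [0, 1, 0, 0]], dtype=float)
scores = rwr(adj, seed=0)                              # restart at the query term "watch"
print("movie" if scores[2] > scores[3] else "book")    # the context favors the movie concept
```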

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

• Example: "popular smart cover iphone 5s"
• Definitions:
  • Head: names the general (semantic) category to which the whole short text belongs; usually the head is the intent of the short text. "smart cover" is the intent of the query.
  • Constraints: distinguish this member from other members of the same category. "iphone 5s" limits the type of the head.
  • Non-Constraint Modifiers (aka Pure Modifiers): subjective modifiers which can be dropped without changing the intent. "popular" is subjective and can be neglected.

Non-Constraint Modifier Mining: Construct Modifier Networks
  Edges form a Modifier Network.
  [Figure: the concept hierarchy tree in the "Country" domain (Country with children such as Asian country, Developed country, Western country, Western developed country, Large Asian country, Large developed country, Top developed country, Top western country) and the corresponding modifier network over Asian, Developed, Western, Large, Top. In this case "Large" and "Top" are pure modifiers.]

• Betweenness centrality is a measure of a node's centrality in a network
• The betweenness of node v is defined as g(v) = Σ_{s ≠ v ≠ t} σ_st(v) / σ_st, where σ_st is the total number of shortest paths from node s to node t and σ_st(v) is the number of those paths that pass through v
• Normalization & aggregation over the modifier network
• A pure modifier should have a low betweenness-centrality aggregation score PMS(t)
Non-Constraint Modifier Mining: Betweenness Centrality (a sketch follows below)
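A small sketch of scoring modifiers by betweenness centrality with networkx. The edges of the "Country"-domain modifier network are reconstructed loosely from the figure, so treat the exact graph as an assumption.

```python
import networkx as nx

# modifier network in the "Country" domain (illustrative reconstruction of the figure)
G = nx.Graph([("Asian", "Developed"), ("Asian", "Western"), ("Developed", "Western"),
              ("Large", "Asian"), ("Large", "Developed"),
              ("Top", "Developed"), ("Top", "Western")])

bc = nx.betweenness_centrality(G, normalized=True)
# pure-modifier score PMS(t): here simply the normalized betweenness; lower = more likely pure
for t, score in sorted(bc.items(), key=lambda kv: kv[1]):
    print(f"{t:10s} {score:.3f}")
# "Large" and "Top" come out lowest, matching the pure modifiers in the figure
```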

Head-Constraints Mining [Wang et al 2014b]

• A term can be a head in some cases and a constraint in others
  E.g., "seattle hotel": seattle = constraint, hotel = head; "seattle hotel job": seattle = constraint, hotel = constraint, job = head

Head-Constraints Mining Acquiring Concept Patterns

• Get entity pairs from the query log by extracting patterns such as "A for B", "A of B", "A with B", "A in B", "A on B", "A at B", … (e.g., "cover for iphone 6s", "battery for sony a7r", "wicked on broadway"); entity 1 is the head and entity 2 the constraint
• Conceptualize both entities (concept11, concept12, … and concept21, concept22, …) and generate concept patterns (concept11, concept21), (concept11, concept22), …
• Aggregate the patterns into a concept pattern dictionary

Why Concepts Can't Be Too General
• Overly general concepts cause too many concept pattern conflicts: we cannot distinguish head and modifier for general concept pairs
  Derived concept pattern: head = device, modifier = company; supporting entity pairs: (iphone 4, verizon), (modem, comcast), (wireless router, comcast), (iphone 4, tmobile)
  Derived concept pattern: head = company, modifier = device; supporting entity pairs: (amazon books, kindle), (netflix, touchpad), (skype, windows phone), (netflix, ps3)
  The two patterns conflict.

Why Concepts Can't Be Too Specific
• Overly specific concepts have little coverage:
  • the concept regresses to the entity
  • storage blows up, with up to (millions x millions) of patterns
  Examples: (device, largest desktop OS vendor), (device, largest software development company), (device, largest global corporation), (device, latest windows and office provider), …
Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b].

Top Concept Patterns (head/constraint pairs, with cluster sizes):
  breed/state (615), game/platform (296), accessory/vehicle (153), browser/platform (70), requirement/school (22), drug/disease (34), cosmetic/skin condition (42), job/city (16), accessory/phone (32), software/platform (18), test/disease (20), clothes/breed (27), penalty/crime (19), tax/state (25), sauce/meat (16), credit card/country (18), food/holiday (14), mod/game (11), garment/sport (29), career information/professional (23), song/instrument (15), bait/fish (18), study guide/book (22), plugins/browser (19), recipe/meat (14), currency/country (18), lens/camera (13), decoration/holiday (9), food/animal (16)
  Example cluster: game (head) / platform (modifier), with variants such as game/device, video game/platform, game/game console, game/game pad, game/gaming platform; matching instance pairs include (angry birds, android), (angry birds, ios), (angry birds, windows 10), …

Head-Modifier Relationship Detection

• Train a classifier on (head-embedding, modifier-embedding) pairs
• Training data: positive = (head, modifier), negative = (modifier, head)
• Precision >= 0.9, Recall >= 0.9
• Disadvantage: not interpretable (a sketch of this setup follows below)
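A hedged sketch of the classifier described above, trained on concatenated (head-embedding, modifier-embedding) vectors. Random vectors stand in for real word embeddings, and LogisticRegression stands in for whatever classifier was actually used.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
dim = 50
emb = {w: rng.normal(size=dim) for w in ["game", "platform", "accessory", "vehicle",
                                         "drug", "disease", "job", "city"]}

# positive = (head, modifier); negative = the reversed pair, as described above
pos = [("game", "platform"), ("accessory", "vehicle"), ("drug", "disease"), ("job", "city")]
X = [np.concatenate([emb[h], emb[m]]) for h, m in pos] + \
    [np.concatenate([emb[m], emb[h]]) for h, m in pos]
y = [1] * len(pos) + [0] * len(pos)

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict([np.concatenate([emb["game"], emb["platform"]])]))   # -> [1], head-first order
```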

Syntactic Parsing based on HM

• Head/modifier information alone is incomplete:
  • prepositions and other function words are ignored
  • so is structure within a noun compound, e.g., "el capitan macbook pro"
• Why not train a parser for web queries?

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

• Syntactic structures are valuable for short text understanding
• Examples
Challenges: Short Texts Lack Grammatical Signals
• They lack function words and word order
  • "toys queries" has ambiguous intent
  • "distance earth moon" has clear intent, but many equivalent forms: "earth moon distance", "earth distance moon", …
Challenges: Syntactic Parsing of Queries
• No standard
• No ground truth
Why is syntactic parsing of queries even a legitimate problem?

Derive Syntax from Semantics[Sun et al 2016]

• Query: "thai food houston"
• Take a clicked sentence that contains the query words
• Project the sentence's dependencies onto the query
A Treebank for Short Texts
• Given a query q and q's clicked sentences s:
  • parse each s
  • project the dependencies from s to q
  • aggregate the dependencies
Algorithm of Projection
Result Examples

Results

• Random queries: QueryParser UAS 0.83, LAS 0.75; Stanford UAS 0.72, LAS 0.64
• Queries with no function words: QueryParser UAS 0.82, LAS 0.73; Stanford UAS 0.70, LAS 0.61
• Queries with function words: QueryParser UAS 0.90, LAS 0.85; Stanford UAS 0.86, LAS 0.80

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

• Measuring similarity between two short texts or sentences
• Basic idea: word-by-word comparison using embedding vectors
• Use a saliency-weighted semantic graph to compute similarity
Saliency-weighted semantic similarity (inspired by BM25), for a longer text s_l and a shorter text s_s:

  f_sts(s_l, s_s) = Σ_{w ∈ s_l} IDF(w) * sem(w, s_s) * (k1 + 1) / ( sem(w, s_s) + k1 * (1 - b + b * |s_s| / avgsl) )

where sem(w, s_s) is the semantic similarity of term w to the short text s_s. Features acquired: bins of all edges and bins of the max edges of the semantic graph (a sketch follows below).
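A minimal sketch of the saliency-weighted similarity f_sts above. The embeddings and IDF weights are random/illustrative, so the printed number only shows how the pieces fit together.

```python
import math
import numpy as np

def sem(w, short_text, emb):
    """Semantic similarity of term w to the short text: max cosine with its words."""
    cos = lambda a, b: float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(cos(emb[w], emb[v]) for v in short_text)

def f_sts(s_long, s_short, emb, idf, k1=1.2, b=0.75, avgsl=4.0):
    """BM25-inspired saliency-weighted semantic similarity, following the formula above."""
    total = 0.0
    for w in s_long:
        s = sem(w, s_short, emb)
        total += idf.get(w, 1.0) * (s * (k1 + 1)) / (s + k1 * (1 - b + b * len(s_short) / avgsl))
    return total

# toy embeddings and IDF weights (illustrative)
rng = np.random.default_rng(1)
vocab = ["cheap", "flights", "to", "paris", "low", "cost", "airline", "france"]
emb = {w: rng.normal(size=20) for w in vocab}
idf = {w: 1.0 + math.log(8 / (1 + i)) for i, w in enumerate(vocab)}
print(f_sts(["cheap", "flights", "paris"], ["low", "cost", "airline", "france"], emb, idf))
```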

From the Concept View [Wang et al 2015a]
  [Figure: bag-of-concepts similarity. Each short text is conceptualized against a semantic network and a co-occurrence network (parsing, term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, concept orthogonalization), yielding concept vectors [(c1, score1), (c2, score2), …] and [(c1', score1'), (c2', score2'), …]; the similarity of the two short texts is computed between the two concept vectors.]

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

• Explicit short text understanding benefits many application scenarios:
  • Ads/search semantic matching
  • Definition mining
  • Query recommendation
  • Web table understanding
  • Semantic search
  • …

Ads Keyword Selection [Wang et al 2015a]

[Figure: results by decile (Decile 4 through Decile 10) for mainline ads and for sidebar ads.]

Definition Mining [Hao et al 2016]

• Definition scenarios: search engines, QnA, etc.
• Why conceptualization is useful for definition mining: example "What is Emphysema?"
  Answer 1: "Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year."
  Answer 2: "Emphysema is an incurable, progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing."
  • Such sentences have the form of a definition
  • Embedding helps to some extent, but it also returns high similarity scores for (emphysema, disease) and (emphysema, smoking)
  • Conceptualization can provide strong semantics; contextual embedding can also provide semantic similarity beyond is-A

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

[Figure: system overview. Offline: training data for each class (Class 1 … Class N) is conceptualized against the knowledge base, concepts are weighted, and a concept model is learned per class (Model 1 … Model N). Online: the original short text (e.g., "justin bieber graduates") goes through entity extraction, conceptualization into a concept vector, candidate generation, and classification & ranking, producing outputs such as <Music, score>.]

Concept-based Short Text Classification and Ranking [Wang et al 2014a]
  [Figure: a concept space is built from the article titles/tags in each category (e.g., TV), giving concept weights p_i, p_j; each category model (e.g., Music, Movie, TV) is a weighted set of concepts ω_i, ω_j; an incoming query is mapped into the same concept space and scored against the category models.]

Precision performance on each category [Wang et al 2014a]
  Category | BocSTC | LM_ch | SVM  | VSM_cosine | LM_d | Entity_ESA
  Movie    | 0.71   | 0.91  | 0.84 | 0.81       | 0.72 | 0.56
  Money    | 0.97   | 0.95  | 0.54 | 0.57       | 0.52 | 0.74
  Music    | 0.97   | 0.90  | 0.88 | 0.73       | 0.68 | 0.58
  TV       | 0.96   | 0.46  | 0.92 | 0.56       | 0.51 | 0.55

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledge in SIGMOD 2008

bull [ Auer et al 2007 ] Sören Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Ré Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Börschinger Cäcilia Zirn and Anas Elghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Catherine Havasi ConceptNet 5 A large semantic network for relational knowledge The People's Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and XindongWu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny Boyes-Braem Basic objects in natural categories Cognitive Psychology 8(3):382-439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny ZhuToward Topic Search on the Web In International Conference on Conceptual Modeling October 2012



Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Statistics of Search Queries

44

29

17

7

2 1

1 Term2 Terms3 Terms4 Terms5 Termsmore than 5 Terms

10

26

34

19

74

1 Term2 Terms3 Terms4 Terms5 Termsmore than 5 Terms

(a) By traffic

(b) By of distinct queries

Pokeacutemon Go Microsoft HoloLens

Instance 1 Instance 21 Instance2 Instances3 Instances4 Instances5 InstancesMore than 5 Instances

1 Instance2 Instances3 Instances4 Instances5 InstancesMore than 5 Instances

If the short text is a single instancehellip

bull Pythonbull Microsoftbull Applebull hellip

Single Instance Understanding

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

Word Ambiguity bull Word sense disambiguation rely on dictionaries

(WordNet)

Take a seat on this chair

The chair of the Math Department

Instance Ambiguity

bull Instance sense disambiguation extra knowledge needed

I have an apple pie for lunch

He bought an apple ipad

Here ldquoapplerdquo is a proper noun

Ambiguity [Hua et al 2016]

bull Many instances are ambiguous

bull Intuition ambiguous instances have multiple senses

short text instance sense

population china china country

glass vs china china fragile item

pear apple apple fruit

microsoft apple apple company

read harry potter harry potter book

watch harry potter harry potter movie

age of harry potter harry potter character

Pre-definition for Ambiguity (1) Sense [Hua et al 2016]

bull What is a Sense in semantic networksbull A sense as a hierarchy of concept clusters

region

country state city

creature

animal

predator

crop food

fruit vegetable meat

Germany

Pre-definition for Ambiguity (2) Concept Cluster [Li et al 2013 Li et al 2015]

bull What is a Concept Cluster (CL)bull Cluster similar concepts into a concept cluster using K-

Means like approach (k-Medoids)

FruitFresh fruit

JuiceTropical fruit

BerryExotic fruit

Seasonal fruitFruit juiceCitrus fruitSoft fruitDry fruit

Wild fruitLocal fruit

hellip

company

CompanyClientFirm

ManufacturerCorporation

large companyRivalGiant

big companylocal company

large corporationinternational

companyhellip Fruit

Definitions of Instance Ambiguity [Hua et al 2016]

bull 3 levels of instance ambiguitybull Level 0 unambiguous

bull Contains only 1 sensebull Eg dog (animal) beijing (city) potato (vegetable)

bull Level 1 unambiguous and ambiguous both make sensebull Contains 2 or more senses but these senses are relatedbull Eg google (company amp search engine) french (language amp

country) truck(vehicle amp public transport service)

bull Level 2 ambiguous bull Contains 2 or more senses and the senses are very different from

each otherbull Eg apple (fruit amp company) jaguar(animal amp company) python

(animal amp language)

Ambiguity Score

bull Using top-2 senses to calculate the ambiguity score

119904119888119900119903119890 =

0 119897119890119907119890119897 = 0119908 1199042 119890

119908 1199041 119890lowast 1 minus 119904119894119898119894119897119886119903119894119905119910 1199041 1199042 119897119890119907119890119897 = 1

score = 1 +119908(1199041198882|119890)

119908(1199041198881|119890)lowast 1 minus 119904119894119898119894119897119886119903119894119905119910 1199041198881 1199041198882 119897119890119907119890119897 = 2

Denote top-2 senses as 1199041 and 1199042 top-2 sense clusters as 1199041198881 and 1199041198882 Denote similarity of two sense clusters as the maximum similarity of their senses

119904119894119898119894119897119886119903119894119905119910 1199041198881 1199041198882 = 119950119938119961119904119894119898119894119897119886119903119894119905119910(119904119894 isin 1199041198881 119904119895 isin 1199041198882) For an entity 119890 denote the weight (popularity) of a sense 119904119894 as the sum of weights of its concept clusters

119908 119904119894|119890 = 119908 119867119894|119890 =119862119871119895isin119867119894

119875(119862119871119895|119890)

For an entity 119890 denote the weight (popularity) of a sense cluster 119904119888119894 as the sum of weights of its senses

119908 119904119888119894 119890 =119904119895isin119904119888119894

119908(119904119895|119890)

Examples

bull Level 0bull california

bull country state city region institution 0943bull fruit

bull food product snack carbs crop 0827bull alcohol

bull substance drug solvent food addiction 0523bull computer

bull device product electronics technology appliance 0537bull coffee

bull beverage product food crop stimulant 073bull potato

bull vegetable food crop carbs product 0896bull bean

bull food vegetable crop legume carbs 0801

Examples (cont)bull Level 1

bull nike score = 0034bull company store 0861bull brand 0035bull shoe product 0033

bull twitter score = 0035bull website tool 0612bull network 0165bull application 0033bull company 0031

bull facebook score = 0037bull website tool 0595bull network 017bull company 0053bull application 0029

bull yahoo score = 038bull search engine 0457bull company provider account 0281bull website 00656

bull google score = 0507bull search engine 046bull company provider organization 0377bull website 00449

Examples (cont)

bull Level 2bull jordan score = 102

bull country state company regime 092bull shoe 002

bull fox score = 109bull animal predator species 074bull network 0064bull company 0035

bull puma score = 115bull brand company shoe 0655bull species cat 0116

bull gold score = 121bull metal material mineral resource mineral062bull color 0128

Examples (cont)

bull Level 2bull soap score = 122

bull product toiletry substance 049bull technology industry standard 011

bull silver score = 124bull metal material mineral resource mineral 0638bull color 0156

bull python score = 129bull language 0667bull snake animal reptile skin 0193

bull apple score = 141bull fruit food tree 0537bull company brand 0271

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

A Concept View of ldquoMicrosoftrdquo

company

largest desktop OS vendor

softwarecompany

international company

technology leader

Microsoft

largest desktop OS vendorcompany hellip hellip

software company

Basic-level Conceptualization (BLC)[Rosch et al 1976]

KFC

BMW

Basic-level conceptualization

How to Make BLC

bull Naive approachesbull Typicality an important measure for understanding the

relationship between an object and its concept

bull Pointwise Mutual Information (PMI) a common measure of the strength of association between two terms

bird

Naive Approach 1 Typicality

P(robin|bird) gt P(penguin|bird)ldquorobinrdquo is a more typical bird than a ldquopenguinrdquo

country

SeychellesUSA

P(USA|country) gt P(Seychelles|country)ldquoUSArdquo is a more typical country than ldquoSeychellesrdquo

penguinrobin

Using Typicality for BLC

bull Associate each isA relationship (119890 is 119888) with typicality scores 119875 119890 119888 and 119875 119888 119890

119875 119890 119888 =119899 119888 119890

119899 119888119875(119888|119890) =

119899 119888 119890

119899(119890)

bull P(e|c) indicates how typical (or popular) e is in the given concept c

bull P(c|e) indicates how typical (or popular) the concept c is given e

bull However

Microsoft

largest desktop OS vendorcompanyhigh typicality p(c|e) high typicality p(e|c)

Naive Approach 2 PMI[Manning and Schutze 1999]

bull Pointwise mutual information (PMI) is a measure of association used in information theory and statistics

bull Consider using the PMI between concept c and instance e to find the basic-level concepts as follows

119875119872119868(119890 119888) = log119875(119890 119888)

119875(119890)119875(119888)= log119875(119890|119888) minus log119875(119890)

bull However bull In basic level of categorization we are interested in finding a

concept for a given e which means P(e) is a constant

bull Thus ranking by PMI(e c) is the same as ranking by P(e|c)

Using Rep(e c) for BLC [Wang et al 2015b]

bull The measure 119877119890119901 119890 119888 = 119875(119888|119890) lowast 119875(119890|119888) means

bull (With PMI) If we take the logarithm of our scoring function we get

log119877119890119901 119890 119888 = log119875 119888 119890 lowast 119875(119890|119888) = log119875(119890 119888)

119875(119890)lowast119875(119890 119888)

119875(119888)= log

119875(119890 119888)2

119875(119890)119875(119888)= 119875119872119868 119890 119888 + log119875 119890 119888

= 1198751198721198682

bull (With Commute Time) The commute time between an instance e and a concept c is

119879119894119898119890(119890 119888) =

119896=1

infin

(2119896) lowast 119875119896(119890 119888) =

119896=1

119879

2119896 lowast 119875119896 119890 119888 +

119896=119879+1

infin

2119896 lowast 119875119896 119890 119888

ge σ119896=1119879 (2119896) lowast 119875119896(119890 119888) + 2(119879 + 1) lowast (1 minus σ119896=1

119879 119875119896(119890 119888)) = 4 minus 2 lowast 119877119890119901(119890 119888)

Given e the c should be its typical concept (shortest distance)

Given c the e should be its typical instance (shortest distance)

A process of finding concept nodes having shortest expected distance with e

PrecisionNDCGNo smoothing 1 2 3 5 10 15 20

MI(e) 0769 0692 0705 0685 0719 0705 0690

PMI3(e) 0885 0769 0756 0800 0754 0733 0721

NPMI(e) 0692 0692 0667 0638 0627 0610 0610

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0500 0462 0526 0523 0523 0510 0521

Rep(e) 0846 0865 0872 0862 0758 0731 0719

Smoothing=0001

MI(e) 0577 0615 0628 0600 0612 0605 0592

PMI3(e) 0731 0673 0692 0654 0669 0644 0623

NPMI(e) 0923 0827 0769 0746 0731 0695 0671

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0554

Typicality P(e|c) 0885 0865 0872 0831 0785 0741 0704

Rep(e) 0846 0731 0718 0723 0700 0669 0638

Smoothing=00001

MI(e) 0615 0615 0654 0608 0635 0628 0612

PMI3(e) 0846 0731 0731 0715 0723 0685 0677

NPMI(e) 0885 0904 0885 0869 0823 0777 0752

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0885 0904 0910 0877 0831 0813 0777

Rep(e) 0923 0846 0833 0815 0781 0736 0719

Smoothing=1e-5

MI(e) 0615 0635 0667 0662 0677 0656 0646

PMI3(e) 0885 0769 0744 0777 0758 0731 0710

NPMI(e) 0885 0846 0872 0869 0831 0810 0787

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0769 0808 0846 0823 0808 0782 0765

Rep(e) 0885 0904 0872 0862 0812 0800 0767

Smoothing=1e-6

MI(e) 0769 0673 0705 0677 0700 0692 0679

PMI3(e) 0885 0769 0756 0785 0773 0726 0723

NPMI(e) 0885 0846 0821 0815 0750 0726 0719

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0538 0615 0615 0615 0608 0613 0615

Rep(e) 0846 0885 0897 0877 0788 0777 0765

Smoothing=1e-7

MI(e) 0769 0692 0705 0685 0719 0703 0688

PMI3(e) 0885 0769 0756 0792 0758 0736 0725

NPMI(e) 0769 0750 0718 0700 0650 0641 0633

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0500 0481 0526 0523 0531 0523 0523

Rep(e) 0846 0865 0872 0854 0765 0749 0733

Evaluation set 2 (columns: k = 1, 2, 3, 5, 10, 15, 20)

No smoothing
  MI(e)               0.516  0.531  0.519  0.531  0.562  0.574  0.594
  PMI3(e)             0.725  0.664  0.652  0.660  0.628  0.631  0.646
  NPMI(e)             0.599  0.597  0.579  0.554  0.540  0.539  0.549
  Typicality P(c|e)   0.297  0.380  0.409  0.422  0.438  0.446  0.461
  Typicality P(e|c)   0.401  0.386  0.396  0.398  0.401  0.410  0.428
  Rep(e)              0.758  0.771  0.745  0.723  0.656  0.647  0.661

Smoothing = 1e-3
  MI(e)               0.374  0.414  0.441  0.448  0.473  0.481  0.495
  PMI3(e)             0.484  0.511  0.509  0.502  0.519  0.525  0.533
  NPMI(e)             0.692  0.652  0.607  0.603  0.585  0.585  0.592
  Typicality P(c|e)   0.297  0.380  0.409  0.422  0.438  0.446  0.460
  Typicality P(e|c)   0.703  0.697  0.704  0.681  0.637  0.628  0.626
  Rep(e)              0.621  0.580  0.554  0.561  0.554  0.555  0.559

Smoothing = 1e-4
  MI(e)               0.407  0.430  0.458  0.462  0.492  0.503  0.512
  PMI3(e)             0.648  0.604  0.579  0.575  0.578  0.576  0.590
  NPMI(e)             0.747  0.777  0.761  0.737  0.700  0.685  0.688
  Typicality P(c|e)   0.297  0.380  0.409  0.422  0.438  0.446  0.461
  Typicality P(e|c)   0.791  0.795  0.802  0.767  0.738  0.729  0.724
  Rep(e)              0.758  0.714  0.711  0.689  0.653  0.636  0.653

Smoothing = 1e-5
  MI(e)               0.429  0.465  0.478  0.501  0.517  0.528  0.545
  PMI3(e)             0.725  0.647  0.642  0.642  0.627  0.624  0.638
  NPMI(e)             0.813  0.779  0.778  0.765  0.730  0.723  0.729
  Typicality P(c|e)   0.297  0.380  0.409  0.422  0.438  0.446  0.461
  Typicality P(e|c)   0.709  0.728  0.735  0.722  0.702  0.696  0.703
  Rep(e)              0.791  0.787  0.762  0.739  0.707  0.703  0.706

Smoothing = 1e-6
  MI(e)               0.516  0.510  0.515  0.526  0.546  0.563  0.579
  PMI3(e)             0.725  0.655  0.651  0.654  0.641  0.631  0.649
  NPMI(e)             0.791  0.766  0.732  0.728  0.673  0.659  0.668
  Typicality P(c|e)   0.297  0.380  0.409  0.422  0.438  0.446  0.461
  Typicality P(e|c)   0.495  0.516  0.520  0.508  0.512  0.521  0.540
  Rep(e)              0.758  0.784  0.767  0.755  0.691  0.686  0.694

Smoothing = 1e-7
  MI(e)               0.516  0.531  0.519  0.530  0.562  0.571  0.592
  PMI3(e)             0.725  0.664  0.652  0.658  0.630  0.631  0.647
  NPMI(e)             0.670  0.655  0.633  0.604  0.575  0.570  0.581
  Typicality P(c|e)   0.297  0.380  0.409  0.422  0.438  0.446  0.461
  Typicality P(e|c)   0.423  0.421  0.415  0.407  0.414  0.424  0.438
  Rep(e)              0.758  0.771  0.745  0.725  0.663  0.661  0.668

Evaluations on Different Measures for BLC

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

What is Semantic Similarity?

• Are the following instance pairs similar?
  • <apple, microsoft>
  • <apple, pear>
  • <apple, fruit>
  • <apple, food>
  • <apple, ipad>
  • <car, journey>

Approaches on Term Similarity

• Categories of approaches for semantic similarity:
  • String based approach
  • Knowledge based approach
    • Uses preexisting thesauri, taxonomies or encyclopedias such as WordNet
  • Corpus based approach
    • Uses contexts of terms extracted from web pages, web search snippets or other text repositories
  • Embedding based approach
    • Introduced in detail in "Part 3: Implicit Understanding"

Approaches on Term Similarity (2)

• Categories

  [Figure: taxonomy of state-of-the-art approaches: string based approaches; knowledge based approaches (WordNet), split into path length / lexical chain based and information content based; and corpus based approaches, split into graph learning algorithm based and snippet search based. Representative methods: Rada 1989, Resnik 1995, Jcn 1997, Lin 1998, Hirst 1998, Ban 2002, HunTray 2005, Chen 2006, Alvarez 2007, Do 2009, Agirre 2010, Bol 2011, Sánchez 2011.]

bull Framework

83

Term Similarity Using Semantic Networks [Li et al 2013 Li et al 2015]

Term pairs ltt1 t2gt

Type Checking

Concept Pairs Entity Pairs

Entity-distribution Context Collection

Concept-distribution Context Collection

Concept-Entity Pairs

Concept Collection for the Entity Term t1

Similarity EvaluationCosine(T(t1) T(t2))

for each pairltt2cxgt

Context vector T(t1) and T(t2)

Get maxsim(t2cx) for ltt1 t2gt

End

End

Concept Clustering

Cluster Context vector Cx(t1) and Cy(t2)

Similarity Evaluation

Max(xy) Cosine(Cx(t1) Cy(t2))

End

Concept Clustering

for each Cluster Ci(t1)

Select top k Concept namely cx

Step 1 Type Checking

Step 2 Context Representation(Vector)

Step 3 Context Similarity

An example [Li et al 2013, Li et al 2015]

For example: <banana, pear>

  Step 1: Type Checking. <banana, pear> is an entity pair.
  Step 2: Context Representation (Vector). Concept context collection for both entities.
  Step 3: Context Similarity. Similarity Evaluation: Cosine(T(t1), T(t2)) = 0.916
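A minimal sketch of the final cosine step, assuming each term has already been mapped to a concept-distribution context vector (the toy vectors below are invented):

```python
import math

def cosine(u, v):
    # u, v: dicts mapping concept -> weight
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy concept-distribution contexts for two entity terms (illustrative weights only).
banana = {"fruit": 0.62, "tropical fruit": 0.21, "food": 0.12, "crop": 0.05}
pear   = {"fruit": 0.58, "seasonal fruit": 0.18, "food": 0.16, "tree": 0.08}

print(round(cosine(banana, pear), 3))   # high similarity, driven by the shared concepts
```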

Examples

  Term 1               Term 2              Similarity
  lunch                dinner              0.9987
  tiger                jaguar              0.9792
  car                  plane               0.9711
  television           radio               0.9465
  technology company   microsoft           0.8208
  high impact sport    competitive sport   0.8155
  employer             large corporation   0.5353
  fruit                green pepper        0.2949
  travel               meal                0.0426
  music                lunch               0.0116
  alcoholic beverage   sports equipment    0.0314
  company              table tennis        0.0003

  Complete results: http://adapt.seiee.sjtu.edu.cn/similarity/SimCompleteResults.pdf

Statistics of Search Queries

  (a) By traffic: 1 term 44%, 2 terms 29%, 3 terms 17%, 4 terms 7%, 5 terms 2%, more than 5 terms 1%.
  (b) By # of distinct queries: 1 term 10%, 2 terms 26%, 3 terms 34%, 4 terms 19%, 5 terms 7%, more than 5 terms 4%.

  Example multi-word instances: "Pokémon Go", "Microsoft HoloLens".
  [Figure: matching pie charts broken down by number of instances per query (1, 2, 3, 4, 5, more than 5 instances).]

If the short text has context for the instance…

  • python tutorial
  • dangerous python
  • moon earth distance
  • …

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Supervised Segmentation [Bergsma et al 2007]

• Problem: divide a query into semantic units.

• Approach: turn segmentation into position-based binary classification.

  Example query: "two man power saw"
    [two man] [power saw]
    [two] [man] [power saw]
    [two] [man power] [saw]

  Input: a query and its positions.
  Output: the decision for making a segmentation break at each position.

Supervised Segmentation

• Features
  • Decision-boundary features: e.g. indicator features such as whether a token is "the", POS tags in the query, forward/backward position features.
  • Statistical features: e.g. mutual information between the left and right parts ("bank loan | amortization schedule").
  • Context features: surrounding context information.
  • Dependency features: e.g. "female" depends on "driver" in "female bus driver".

Supervised Segmentation

• Segmentation overview: for the input query "two man power saw", learning features are extracted at each position ("two | man | power | saw") and fed to an SVM classifier, which outputs a segmentation decision (yes/no) for each position.
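A sketch of the position-based binary classification, assuming a tiny hand-labeled set of boundary decisions and a simplified feature map (the real feature set is richer, as listed above):

```python
from sklearn.svm import LinearSVC

# Toy boundary examples: (left token, right token, PMI between left/right parts) -> break? (invented)
train = [
    (("two", "man", 0.8), 0),      # "two man" stays together
    (("man", "power", -0.5), 1),   # break between "man" and "power"
    (("power", "saw", 1.2), 0),    # "power saw" stays together
    (("bank", "loan", 0.9), 0),
    (("loan", "amortization", 0.7), 0),
]

def featurize(left, right, pmi):
    # Simplified decision-boundary + statistical features.
    return [pmi, float(left == "the"), float(right == "the"), len(left), len(right)]

X = [featurize(*x) for x, _ in train]
y = [label for _, label in train]
clf = LinearSVC().fit(X, y)

def segment(tokens, pmi_fn):
    """Insert a boundary after position i whenever the classifier predicts 'break'."""
    segments, cur = [], [tokens[0]]
    for i in range(len(tokens) - 1):
        feats = featurize(tokens[i], tokens[i + 1], pmi_fn(tokens[i], tokens[i + 1]))
        if clf.predict([feats])[0] == 1:
            segments.append(cur)
            cur = []
        cur.append(tokens[i + 1])
    segments.append(cur)
    return segments
```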

Unsupervised Segmentation [Tan et al 2008]

• Unsupervised learning for query segmentation.

  Probability of a generated segmentation S (into segments s_i) for query Q:

  P(S|Q) = P(s_1) P(s_2|s_1) … P(s_m|s_1 s_2 … s_{m−1}) ≈ Π_{s_i ∈ S} P(s_i)    (unigram model over segments)

  A split is a valid segment boundary if and only if the pointwise mutual information between the two segments resulting from the split is negative:

  MI(s_k, s_{k+1}) = log [ P_c([s_k s_{k+1}]) / (P_c(s_k) · P_c(s_{k+1})) ] < 0

  Example: "new york times subscription".

  log [ P_c([new york]) / (P_c(new) · P_c(york)) ] > 0  ⇒  no segment boundary between "new" and "york".

Unsupervised Segmentation

• Find the top-k segmentations by dynamic programming.

• Uses EM optimization on the fly.

  Input: query w_1 w_2 … w_n (the words in the query) and the concept probability distribution.
  Output: top-k segmentations with the highest likelihood.
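A minimal sketch of the unigram-model dynamic program (top-1 only, with a hypothetical segment-probability table P; the real system estimates P with EM and keeps the top-k hypotheses):

```python
import math

# Hypothetical segment probabilities P(s) (e.g. estimated from a corpus / query log via EM).
P = {
    "new": 1e-3, "york": 8e-4, "times": 9e-4, "subscription": 5e-4,
    "new york": 6e-4, "new york times": 3e-4, "times subscription": 1e-6,
}

def best_segmentation(words, max_len=4):
    n = len(words)
    best = [(-math.inf, None)] * (n + 1)   # best[i] = (log-likelihood, split point) for words[:i]
    best[0] = (0.0, None)
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            seg = " ".join(words[j:i])
            if seg in P and best[j][0] + math.log(P[seg]) > best[i][0]:
                best[i] = (best[j][0] + math.log(P[seg]), j)
    segs, i = [], n
    while i > 0:                            # backtrack
        j = best[i][1]
        segs.append(" ".join(words[j:i]))
        i = j
    return list(reversed(segs))

print(best_segmentation("new york times subscription".split()))
# ['new york times', 'subscription']
```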

Exploit Click-through [Li et al 2011]

• Motivation
  • Probabilistic query segmentation
  • Use click-through data (query → clicked URL → document)

  Input query: "bank of america online banking"

  Output, top-3 segmentations:
    [bank of america] [online banking]      0.502
    [bank of america online banking]        0.428
    [bank of] [america] [online banking]    0.001

Exploit Click-through

• Segmentation Model: an interpolated model combining global information with click-through information.

  Example query: "[credit card] [bank of America]"
  Clicked HTML documents (click-through info):
    1. bank of america credit cards contact us overview
    2. secured visa credit card from bank of america
    3. credit cards overview: find the right bank of america credit card for you
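A small sketch of the interpolation itself, assuming a global segment language model and a per-query click-through model are already available (both hypothetical here):

```python
def interpolated_segment_prob(seg, p_global, p_click, lam=0.6):
    """P(seg) = lam * P_clickthrough(seg) + (1 - lam) * P_global(seg)."""
    return lam * p_click.get(seg, 0.0) + (1 - lam) * p_global.get(seg, 0.0)

p_global = {"bank of america": 4e-5, "online banking": 2e-5, "bank of": 9e-4}   # invented
p_click  = {"bank of america": 3e-2, "online banking": 1e-2}   # from this query's clicked documents

for seg in ["bank of america", "online banking", "bank of"]:
    print(seg, interpolated_segment_prob(seg, p_global, p_click))
```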

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Sense Changes with Different Context

watch harry potter read harry potter age harry potter

Movie Book Character

harry potter walkthrough

Game

Entity Recognition in Query [Guo et al 2009]

• Motivation: detect a named entity in a short text and categorize it.

  Example: "harry potter walkthrough" (a single-named-entity query) is interpreted as the triple <e, t, c> = ("harry potter", "walkthrough", "game"), where e is the (ambiguous) named entity, t the context terms, and c the class of the entity.

Entity Recognition in Query

• Probabilistic Generative Model

  Goal: given a query q, find the triple <e, t, c> that maximizes the probability of generating the triple, assuming the context depends only on the class (e.g. "walkthrough" depends only on the class game, not on "harry potter").

  The problem then becomes how to estimate Pr(e), Pr(c|e) and Pr(t|c).
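A small sketch of scoring the candidate triples <e, t, c> by Pr(e) · Pr(c|e) · Pr(t|c), with made-up probability tables:

```python
# Made-up probability tables for illustration only.
P_e = {"harry potter": 1e-4}
P_c_given_e = {"harry potter": {"book": 0.4, "movie": 0.35, "game": 0.2, "character": 0.05}}
P_t_given_c = {
    "game":      {"walkthrough": 0.12, "cheats": 0.10},
    "movie":     {"trailer": 0.15, "walkthrough": 0.001},
    "book":      {"pdf": 0.08, "walkthrough": 0.0005},
    "character": {"age": 0.05},
}

def best_class(entity, context):
    scored = [(P_e[entity] * p_ce * P_t_given_c.get(c, {}).get(context, 1e-9), c)
              for c, p_ce in P_c_given_e[entity].items()]
    return max(scored)[1]

print(best_class("harry potter", "walkthrough"))   # -> "game"
```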

Entity Recognition in Query

• Probability Estimation by Learning

  Learning objective:  max Π_{i=1}^{N} Pr(e_i, t_i, c_i)

  Challenge: it is difficult as well as time consuming to manually assign class labels to named entities in queries.

  So build a training set T = {(e_i, t_i)} and view c_i as a hidden variable. The new learning problem is

  max Π_{i=1}^{N} Pr(e_i, t_i) = max Π_{i=1}^{N} Σ_c Pr(e_i) Pr(c|e_i) Pr(t_i|c),

  solved with the topic model WS-LDA.

Signal from Click [Pantel et al 2012]

• Motivation: predict entity type in Web search by jointly modeling the entity, the user intent, the context, and the click.

  A generative model over a query type distribution (73 entity types).

Signal from Click

• Joint Model for Prediction

  [Figure: plate diagram of the joint generative model (variables t, τ, i, n, c with distributions θ, φ, ω over Q queries). For each query: pick an entity type from the distribution over types, pick an entity from the entity distribution, pick an intent from the intent distribution, pick context words from the word distribution, and pick a click from the host distribution.]

Telegraphic Query Interpretation [Sawant et al 2013, Joshi et al 2014]

• Entity-seeking telegraphic queries, e.g. query "Germany capital" → result entity "Berlin".

• Interpretation = Segmentation + Annotation, combining a knowledge base (accuracy) with a large corpus (recall).

Joint Interpretation and Ranking [Sawant et al 2013, Joshi et al 2014]

• Overview: a telegraphic query is matched against an annotated corpus; two models, a generative model and a discriminative model, perform interpretation and ranking jointly and output ranked answer entities e1, e2, e3, …

• Generative Model

Joint Interpretation and Ranking [Sawant et al 2013] (based on probabilistic language models; example borrowed from U. Sawant 2013)

  Query q: "losing team baseball world series 1998". A switch variable Z decides which query words act as a type hint ("baseball team") and which act as context/selectors ("lost 1998 world series"). The answer entity E = San Diego Padres matches the type hint through its type T (major league baseball team, via the type model), and matches the context through corpus evidence such as "Padres have been to two World Series, losing in 1984 and 1998" (via the context matchers / context model).

Based on Probabilistic Language Models

bull Discriminative Model

Joint Interpretation and Ranking [Sawant et al 2013]

San_Diego_Padres

losing team baseball world

series 1998

(baseball team)

losing team baseball world

series 1998

(baseball team)

losing team baseball world

series 1998

(t = baseball team)

1998_World_Series

losing team baseball world

series 1998

(series)

losing team baseball world

series 1998

(series)

losing team baseball world

series 1998

(t = series)

Correct entity Incorrect entity

Based on max-margin discriminative learning

Telegraphic Query Interpretation [Joshi et al 2014]

• Queries seek answer entities (e2).

• They contain (query) entities (e1), target types (t2), relations (r) and selectors (s). Example interpretations (borrowed from M. Joshi 2014):

  "dave navarro first band": e1 = dave navarro, t2 = band, s = first (with or without an explicit relation r = band).
  "spider automobile company": no query entity; t2 = automobile company / company, s = spider.

Improved Generative and Discriminative Models

• [Joshi et al 2014] improves both the generative model and the discriminative model of [Sawant et al 2013] by additionally considering the query entity e1 (in q) and the relation r.

Understand Short Texts with a Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

• Input: a short text. Output: its semantic interpretation.

• Three steps in understanding a short text. Example: "wanna watch eagles band" → watch[verb] eagles[entity](band) band[concept]

  Step 1: Text Segmentation: divide the short text into a sequence of terms in the vocabulary ("watch | eagles | band").
  Step 2: Type Detection: determine the best type of each term (watch[verb], eagles[entity], band[concept]).
  Step 3: Concept Labeling: infer the best concept of each entity within its context (eagles[entity] → the band).

Text Segmentation

• Observations
  • Mutual Exclusion: terms containing the same word mutually exclude each other.
  • Mutual Reinforcement: related terms mutually reinforce each other.

• Build a Candidate Term Graph (CTG):

  [Figure: candidate term graphs for "vacation april in paris" (candidate terms "april in paris", "vacation", "april", "paris") and "watch harry potter" (candidate terms "watch", "harry potter", …), with mutual-exclusion links and weighted mutual-reinforcement edges.]

Find the best segmentation

• Best segmentation = the sub-graph of the CTG which
  • is a complete graph (clique), i.e. a maximal clique,
  • has no mutual exclusion,
  • has 100% word coverage (except for stopwords),
  • and has the largest average edge weight.

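A brute-force sketch of this clique search over a tiny candidate term graph; the candidate terms, edge weights and stopword handling are invented for illustration:

```python
from itertools import combinations

terms = ["april in paris", "vacation", "april", "paris"]
words = {"april in paris": {"april", "in", "paris"}, "vacation": {"vacation"},
         "april": {"april"}, "paris": {"paris"}}
weight = {("april in paris", "vacation"): 0.047, ("april", "paris"): 0.005,
          ("vacation", "april"): 0.029, ("vacation", "paris"): 0.041}
STOPWORDS = {"in"}

def w(a, b):
    return weight.get((a, b), weight.get((b, a), 0.0))

def best_segmentation(query_words):
    best, best_score = None, -1.0
    for r in range(1, len(terms) + 1):
        for cand in combinations(terms, r):
            if any(words[a] & words[b] for a, b in combinations(cand, 2)):
                continue                                   # mutual exclusion violated
            covered = set().union(*(words[t] for t in cand))
            if not query_words <= covered | STOPWORDS:
                continue                                   # must cover all non-stopwords
            edges = list(combinations(cand, 2))
            score = sum(w(a, b) for a, b in edges) / len(edges) if edges else 0.0
            if score > best_score:
                best, best_score = cand, score             # largest average edge weight wins
    return best

print(best_segmentation({"april", "in", "paris", "vacation"}))
# ('april in paris', 'vacation')
```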

Type Detection

• Pairwise Model
  • Find the best typed-term for each term so that the Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight.

  [Figure: candidate typed-terms for "watch free movie": watch[v] / watch[e] / watch[c], free[adj] / free[v], movie[c] / movie[e].]

Concept Labeling

• Entity disambiguation is the most important task of concept labeling:
  • filter / re-rank the original concept cluster vector.

• Weighted Vote
  • The final score of each concept cluster is a combination of its original score and the support from context, using concept co-occurrence.
  • Example: "watch harry potter" → movie; "read harry potter" → book.
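A sketch of the weighted vote, combining each concept cluster's original score with co-occurrence support from the context terms (all numbers invented):

```python
# Original concept-cluster scores for the entity "harry potter" (invented).
original = {"movie": 0.35, "book": 0.40, "game": 0.20, "character": 0.05}

# Co-occurrence support of each context term for each concept cluster (invented).
cooccur = {"watch": {"movie": 0.9, "book": 0.05, "game": 0.3},
           "read":  {"book": 0.9, "movie": 0.1}}

def weighted_vote(entity_scores, context_terms, alpha=0.5):
    scores = {}
    for concept, base in entity_scores.items():
        support = sum(cooccur.get(t, {}).get(concept, 0.0) for t in context_terms)
        support /= max(len(context_terms), 1)
        scores[concept] = alpha * base + (1 - alpha) * support   # original score + context support
    return max(scores, key=scores.get)

print(weighted_vote(original, ["watch"]))   # movie
print(weighted_vote(original, ["read"]))    # book
```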

Example of Entity Disambiguation [Hua et al 2015 (ICDE Best Paper), Hua et al 2016]

  The short text is parsed against the semantic (isA) network and the co-occurrence network, then conceptualized (term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, concept orthogonalization) into a concept vector [(c1, p1), (c2, p2), (c3, p3), …].

  Example "ipad apple": isA gives {fruit…, company…, food…, product…} for "apple" and {product…, device…, brand…} for "ipad"; concept filtering by co-occurrence keeps the company/device/product senses, so "apple" is labeled as the company rather than the fruit.

Mining Lexical Relationships [Wang et al 2015b]

• Lexical knowledge is represented by probabilities over terms, roles and concepts (e = instance, t = term, c = concept, z = role):

  ① role probabilities p(z|t), e.g. p(verb | watch), p(instance | watch);
  ② concept probabilities p(c|e) = p(c | t, z = instance), e.g. p(movie | harry potter), p(book | harry potter);
  ③ context-dependent concept probabilities p(c | t, z), e.g. p(movie | watch, verb).

  Example: "watch harry potter".

Understanding Queries [Wang et al 2015b]

• Goal: rank the concepts and find argmax_c p(c | t, q).

  The offline semantic network and all possible segmentations of the query form an online subgraph; concepts are ranked by random walk with restart [Sun et al 2005] on the online subgraph.
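A minimal sketch of random walk with restart on a tiny online subgraph (adjacency weights invented), restarting from the query terms and reading off concept scores:

```python
import numpy as np

nodes = ["watch", "harry potter", "movie", "book", "device"]
A = np.array([                       # symmetric adjacency of the online subgraph (invented weights)
    [0.0, 0.5, 0.6, 0.1, 0.3],
    [0.5, 0.0, 0.7, 0.6, 0.0],
    [0.6, 0.7, 0.0, 0.0, 0.0],
    [0.1, 0.6, 0.0, 0.0, 0.0],
    [0.3, 0.0, 0.0, 0.0, 0.0],
])
P = A / A.sum(axis=0, keepdims=True)          # column-normalized transition matrix

def rwr(restart_nodes, c=0.3, iters=100):
    q = np.zeros(len(nodes))
    for n in restart_nodes:
        q[nodes.index(n)] = 1.0 / len(restart_nodes)
    r = q.copy()
    for _ in range(iters):
        r = (1 - c) * (P @ r) + c * q         # r = (1 - c) P r + c q
    return dict(zip(nodes, r))

scores = rwr(["watch", "harry potter"])
print(max(["movie", "book", "device"], key=scores.get))   # "movie"
```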

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head, Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

• Example: "popular smart cover iphone 5s"

• Definitions:
  • Head: acts to name the general (semantic) category to which the whole short text belongs; usually the head is the intent of the short text.
    • "smart cover": the intent of the query.
  • Constraints: distinguish this member from other members of the same category.
    • "iphone 5s": limits the type of the head.
  • Non-Constraint Modifiers (a.k.a. Pure Modifiers): subjective modifiers which can be dropped without changing the intent.
    • "popular": subjective, can be neglected.

Non-Constraint Modifiers Mining: Construct Modifier Networks

  Edges form a Modifier Network.

  [Figure: the concept hierarchy tree in the "Country" domain (country → Asian country, developed country, western country, western developed country, large/top developed country, …) and the corresponding modifier network over {Asian, Western, Developed, Large, Top}. In this case "Large" and "Top" are pure modifiers.]

Non-Constraint Modifiers Mining: Betweenness Centrality

• Betweenness centrality is a measure of a node's centrality in a network.

• The betweenness of node v is defined as g(v) = Σ_{s ≠ v ≠ t} σ_st(v) / σ_st, where σ_st is the total number of shortest paths from node s to node t, and σ_st(v) is the number of those paths that pass through v.

• Normalization & aggregation: a pure modifier should have a low betweenness-centrality aggregation score PMS(t).
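A sketch of the betweenness computation with networkx on a toy modifier network (edges invented); terms whose aggregated, normalized betweenness PMS(t) is low are pure-modifier candidates:

```python
import networkx as nx

# Toy modifier network: nodes are modifiers, edges link modifiers that co-occur
# within concepts of the same domain (edges invented).
G = nx.Graph([("asian", "developed"), ("western", "developed"), ("top", "western"),
              ("large", "asian"), ("large", "developed"), ("top", "developed")])

bc = nx.betweenness_centrality(G, normalized=True)

# With a single domain, PMS(t) is just the normalized betweenness itself;
# across domains it would be aggregated per term.
PMS = dict(bc)
print(sorted(PMS.items(), key=lambda kv: kv[1]))   # low PMS -> pure-modifier candidates
```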

Head-Constraints Mining [Wang et al 2014b]

• A term can be a head in some queries and a constraint in others.

• E.g. "seattle hotel": seattle = constraint, hotel = head; "seattle hotel job": seattle and hotel = constraints, job = head.

Head-Constraints Mining: Acquiring Concept Patterns

  Building the concept pattern dictionary from query logs:
    1. Get entity pairs from the query log via preposition patterns (A for B, A of B, A with B, A in B, A on B, A at B, …), e.g. "cover for iphone 6s", "battery for sony a7r", "wicked on broadway"; entity 1 is the head, entity 2 the constraint.
    2. Conceptualize both entities: entity 1 → concept11, concept12, concept13, concept14; entity 2 → concept21, concept22, concept23.
    3. Extract concept patterns, (concept11, concept21), (concept11, concept22), (concept11, concept23), …, into the Concept Pattern Dictionary.

Why Concepts Can't Be Too General

• It may cause too many concept pattern conflicts: we can't distinguish head and modifier for general concept pairs.

  Derived concept pattern: device (Head) \ company (Modifier)
    Supporting entity pairs: iphone 4 verizon; modem comcast; wireless router comcast; iphone 4 tmobile

  Derived concept pattern: company (Head) \ device (Modifier)
    Supporting entity pairs: amazon books kindle; netflix touchpad; skype windows phone; netflix ps3

  → Conflict between the two patterns.

Why Concepts Can't Be Too Specific

• It may generate concepts with little coverage:
  • the concept regresses to the entity;
  • large storage space: up to (#million × #million) patterns, e.g. device \ largest desktop OS vendor, device \ largest software development company, device \ largest global corporation, device \ latest windows and office provider, …

  Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b].

Top Concept Patterns (cluster size, sum of cluster score, head \ constraint, score):

  615  2114691  breed \ state             357298460224501
  296  7752357  game \ platform           627403476771856
  153  3466804  accessory \ vehicle       53393705094809
  70   118259   browser \ platform        132612807637391
  22   1010993  requirement \ school      271407526294823
  34   9489159  drug \ disease            154602405333541
  42   8992995  cosmetic \ skin condition 814659415003929
  16   7421599  job \ city                27903732555528
  32   710403   accessory \ phone         246513830851194
  18   6692376  software \ platform       210126322725878
  20   6444603  test \ disease            239774028397537
  27   5994205  clothes \ breed           98773996282851
  19   5913545  penalty \ crime           200544192793488
  25   5848804  tax \ state               240081818612579
  16   5465424  sauce \ meat              183592863621553
  18   4809389  credit card \ country     142919087972152
  14   4730792  food \ holiday            14554140330924
  11   4536199  mod \ game                257163856882439
  29   4350954  garment \ sport           471533326845442
  23   3994886  career information \ professional 732726483731257
  15   386065   song \ instrument         128189481818135
  18   378213   bait \ fish               780426514113169
  22   3722948  study guide \ book        508339765053921
  19   3408953  plugins \ browser         550326072627126
  14   3305753  recipe \ meat             882779863422951
  18   3214226  currency \ country        110825444188352
  13   3180272  lens \ camera             186081673263957
  9    316973   decoration \ holiday      130055844126533
  16   314875   food \ animal             7338544366514

  Example: the game \ platform pattern (Game = Head, Platform = Modifier).
    Cluster members include: game platform, game device, video game platform, game console, game pad, gaming platform.
    Supporting entity pairs include: (angry birds, android), (angry birds, ios), (angry birds, windows 10), …

Detection: Head-Modifier Relationship

• Train a classifier on (head-embedding, modifier-embedding) pairs.

• Training data:
  • Positive: (head, modifier)
  • Negative: (modifier, head)

• Precision >= 0.9, Recall >= 0.9

• Disadvantage: not interpretable.
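A sketch of such a classifier, assuming pre-trained word vectors are available (the `load_vectors()` helper and the training pairs below are placeholders):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def load_vectors():
    # Placeholder: in practice, load pre-trained embeddings (e.g. word2vec / GloVe).
    rng = np.random.default_rng(0)
    vocab = ["game", "platform", "cover", "iphone", "battery", "camera"]
    return {w: rng.normal(size=50) for w in vocab}

vec = load_vectors()

def pair_features(head, modifier):
    return np.concatenate([vec[head], vec[modifier]])

pairs = [("game", "platform"), ("cover", "iphone"), ("battery", "camera")]
X = [pair_features(h, m) for h, m in pairs] + [pair_features(m, h) for h, m in pairs]
y = [1] * len(pairs) + [0] * len(pairs)          # 1 = (head, modifier), 0 = swapped

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict([pair_features("game", "platform")]))   # expected: [1]
```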

Syntactic Parsing based on Head-Modifier Pairs

• The information is incomplete:
  • prepositions and other function words are missing;
  • so is the structure within a noun compound, e.g. "el capitan macbook pro".

• Why not train a parser for web queries?

Syntactic Parsing of Short Texts [Sun et al EMNLP 2016]

• Syntactic structures are valuable for short text understanding.

Challenges: Short Texts Lack Grammatical Signals
• Lack of function words and word order:
  • "toys queries" has ambiguous intent;
  • "distance earth moon" has clear intent, but many equivalent forms: "earth moon distance", "earth distance moon", …

Challenges: Syntactic Parsing of Queries

• No standard
• No ground truth

Why is syntactic parsing of queries even a legitimate problem?

Derive Syntax from Semantics [Sun et al 2016]

• Query: "thai food houston"
• Take a clicked sentence (from a document clicked for this query) and project its dependency parse onto the query.

A Treebank for Short Texts

• Given query q and q's clicked sentences s:
  • parse each s;
  • project the dependencies from s to q;
  • aggregate the dependencies.

Result Examples

Results

bull Random queries

QueryParser UAS 083 LAS 075Stanford UAS 072 LAS 064

bull Queries with no function words

QueryParser UAS 082 LAS 073Stanford UAS 070 LAS 061

bull Queries with function words

QueryParser UAS 090 LAS 085Stanford UAS 086 LAS 080

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

• Measuring similarity between two short texts or sentences.

• Basic idea: word-by-word comparison using embedding vectors.

• Use a saliency-weighted semantic graph to compute similarity.

Short Text Similarity Using Word Embedding [Kenter and Rijke 2015]

  Features acquired: bins of all edges and bins of max edges (from the saliency-weighted semantic graph).

  Similarity measurement (inspired by BM25), for a longer text s_l and a shorter text s_s:

  f_sts(s_l, s_s) = Σ_{w ∈ s_l} IDF(w) · [ sem(w, s_s) · (k_1 + 1) ] / [ sem(w, s_s) + k_1 · (1 − b + b · |s_s| / avgl) ]

  where sem(w, s_s) is the semantic similarity between term w and the short text s_s, computed from word embeddings.
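A sketch of this scoring function with a stand-in sem() that uses the best-matching term's cosine similarity; the vectors and IDF values are invented:

```python
import numpy as np

def sem(w, s_s, vec):
    """Semantic similarity of term w to short text s_s: best cosine over the terms of s_s."""
    cos = lambda a, b: float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(cos(vec[w], vec[t]) for t in s_s)

def f_sts(s_l, s_s, vec, idf, k1=1.2, b=0.75, avgl=4.0):
    score = 0.0
    for w in s_l:
        s = sem(w, s_s, vec)
        score += idf.get(w, 1.0) * (s * (k1 + 1)) / (s + k1 * (1 - b + b * len(s_s) / avgl))
    return score

rng = np.random.default_rng(1)                      # invented toy vectors and IDF values
vec = {w: rng.normal(size=20) for w in ["cheap", "flights", "low", "cost", "airline", "tickets"]}
idf = {"flights": 2.0, "airline": 2.0, "cheap": 1.5, "low": 1.2, "cost": 1.2, "tickets": 1.8}

print(f_sts(["cheap", "flights"], ["low", "cost", "airline", "tickets"], vec, idf))
```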

From the Concept View [Wang et al 2015a]

  Short Text 1 and Short Text 2 are each parsed against the semantic (isA) network and the co-occurrence network, then conceptualized (term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, concept orthogonalization) into bags of concepts:

    Concept Vector 1: [(c1, score1), (c2, score2), …]
    Concept Vector 2: [(c1', score1'), (c2', score2'), …]

  The similarity of the two short texts is then computed between the two concept vectors.

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

• Explicit short text understanding benefits a lot of application scenarios:
  • Ads/search semantic match
  • Definition mining
  • Query recommendation
  • Web table understanding
  • Semantic search
  • …

Ads Keyword Selection [Wang et al 2015a]

  [Figure: ads keyword selection gains by query decile (Decile 4 through Decile 10), shown separately for Mainline Ads (scale 0.00 to 6.00) and Sidebar Ads (scale 0.00 to 0.60).]

Definition Mining [Hao et al 2016]

• Definition scenarios: search engines, QnA, etc.

• Why is conceptualization useful for definition mining? Example: "What is Emphysema?"

  Answer 1: "Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year."
  Answer 2: "Emphysema is an incurable, progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing."

  • Both sentences have the form of a definition.
  • Embedding is helpful to some extent, but it also returns a high similarity score for (emphysema, disease) and for (emphysema, smoking).
  • Conceptualization can provide strong semantics; contextual embedding can also provide semantic similarity beyond isA.

Concept-based Short Text Classification and Ranking [Wang et al 2014a]

  Offline: learn a concept model for each class (Class 1 … Class N) from training data, with concept weighting.
  Online: for an input short text (e.g. "justin bieber graduates"), extract entities against the knowledge base, conceptualize them into a concept vector, generate candidates, then classify and rank (e.g. <Music, score>).

Concept-based Short Text Classification and Ranking [Wang et al 2014a]

  [Figure: categories (e.g. TV, Music, Movie) and the query are all mapped into a shared concept space: the article titles/tags in a category give category concept weights ω_i, ω_j, and the query gives concept weights p_i, p_j, so the query can be matched against the categories in concept space.]
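A sketch of the online matching step: the query's concept vector is compared (by cosine) against each category's concept model learned offline; all weights are invented:

```python
import math

def cosine(u, v):
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Per-category concept models learned offline from article titles / tags (invented weights).
category_concepts = {
    "Music": {"singer": 0.5, "album": 0.3, "celebrity": 0.2},
    "Movie": {"film": 0.5, "actor": 0.3, "celebrity": 0.2},
    "TV":    {"tv show": 0.6, "host": 0.2, "celebrity": 0.2},
}

# Online: conceptualization of the short text "justin bieber graduates" (invented weights).
query_concepts = {"singer": 0.6, "celebrity": 0.3, "teen idol": 0.1}

ranked = sorted(category_concepts,
                key=lambda cat: cosine(query_concepts, category_concepts[cat]), reverse=True)
print(ranked[0])   # Music
```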

Precision performance on each category [Wang et al 2014a]

          BocSTC  LM_ch  SVM   VSM_cosine  LM_d  Entity_ESA
  Movie   0.71    0.91   0.84  0.81        0.72  0.56
  Money   0.97    0.95   0.54  0.57        0.52  0.74
  Music   0.97    0.90   0.88  0.73        0.68  0.58
  TV      0.96    0.46   0.92  0.56        0.51  0.55

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

• [ Bollacker et al 2008 ] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, Jamine Taylor. Freebase: a collaboratively created graph database for structuring human knowledge, in SIGMOD 2008

• [ Auer et al 2007 ] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, Zachary G Ives. DBpedia: A Nucleus for a Web of Open Data. In ISWC/ASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

• [ Wu et al 2015 ] Sen Wu, Ce Zhang, Christopher De Sa, Jaeho Shin, Feiran Wang and C Ré. Incremental Knowledge Base Construction Using DeepDive, in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

• [ Nastase et al 2010 ] Vivi Nastase, Michael Strube, Benjamin Börschinger, Cäcilia Zirn and Anas Elghafari. WikiNet: A very large scale multi-lingual concept network, in LREC 2010

• [ Speer et al 2013 ] Robert Speer and Catherine Havasi. ConceptNet 5: A large semantic network for relational knowledge. The People's Web Meets NLP, Springer Berlin Heidelberg, 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

• [ Li et al 2015 ] Peipei Li, Haixun Wang, Kenny Q Zhu, Zhongyuan Wang, Xue-Gang Hu and Xindong Wu. A Large Probabilistic Semantic Network based Approach to Compute Term Similarity. In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10): 2604-2617, 2015

• [ Rosch et al 1976 ] Eleanor Rosch, Carolyn B Mervis, Wayne D Gray, David M Johnson and Penny Boyes-Braem. Basic objects in natural categories. Cognitive psychology 8(3): 382–439, 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

• [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke. Short text similarity with word embeddings. In CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

• [ Wang et al 2012b ] Yue Wang, Hongsong Li, Haixun Wang and Kenny Zhu. Toward Topic Search on the Web. In International Conference on Conceptual Modeling, October 2012



bull Consider using the PMI between concept c and instance e to find the basic-level concepts as follows

119875119872119868(119890 119888) = log119875(119890 119888)

119875(119890)119875(119888)= log119875(119890|119888) minus log119875(119890)

bull However bull In basic level of categorization we are interested in finding a

concept for a given e which means P(e) is a constant

bull Thus ranking by PMI(e c) is the same as ranking by P(e|c)

Using Rep(e c) for BLC [Wang et al 2015b]

bull The measure 119877119890119901 119890 119888 = 119875(119888|119890) lowast 119875(119890|119888) means

bull (With PMI) If we take the logarithm of our scoring function we get

log119877119890119901 119890 119888 = log119875 119888 119890 lowast 119875(119890|119888) = log119875(119890 119888)

119875(119890)lowast119875(119890 119888)

119875(119888)= log

119875(119890 119888)2

119875(119890)119875(119888)= 119875119872119868 119890 119888 + log119875 119890 119888

= 1198751198721198682

bull (With Commute Time) The commute time between an instance e and a concept c is

119879119894119898119890(119890 119888) =

119896=1

infin

(2119896) lowast 119875119896(119890 119888) =

119896=1

119879

2119896 lowast 119875119896 119890 119888 +

119896=119879+1

infin

2119896 lowast 119875119896 119890 119888

ge σ119896=1119879 (2119896) lowast 119875119896(119890 119888) + 2(119879 + 1) lowast (1 minus σ119896=1

119879 119875119896(119890 119888)) = 4 minus 2 lowast 119877119890119901(119890 119888)

Given e the c should be its typical concept (shortest distance)

Given c the e should be its typical instance (shortest distance)

A process of finding concept nodes having shortest expected distance with e

PrecisionNDCGNo smoothing 1 2 3 5 10 15 20

MI(e) 0769 0692 0705 0685 0719 0705 0690

PMI3(e) 0885 0769 0756 0800 0754 0733 0721

NPMI(e) 0692 0692 0667 0638 0627 0610 0610

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0500 0462 0526 0523 0523 0510 0521

Rep(e) 0846 0865 0872 0862 0758 0731 0719

Smoothing=0001

MI(e) 0577 0615 0628 0600 0612 0605 0592

PMI3(e) 0731 0673 0692 0654 0669 0644 0623

NPMI(e) 0923 0827 0769 0746 0731 0695 0671

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0554

Typicality P(e|c) 0885 0865 0872 0831 0785 0741 0704

Rep(e) 0846 0731 0718 0723 0700 0669 0638

Smoothing=00001

MI(e) 0615 0615 0654 0608 0635 0628 0612

PMI3(e) 0846 0731 0731 0715 0723 0685 0677

NPMI(e) 0885 0904 0885 0869 0823 0777 0752

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0885 0904 0910 0877 0831 0813 0777

Rep(e) 0923 0846 0833 0815 0781 0736 0719

Smoothing=1e-5

MI(e) 0615 0635 0667 0662 0677 0656 0646

PMI3(e) 0885 0769 0744 0777 0758 0731 0710

NPMI(e) 0885 0846 0872 0869 0831 0810 0787

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0769 0808 0846 0823 0808 0782 0765

Rep(e) 0885 0904 0872 0862 0812 0800 0767

Smoothing=1e-6

MI(e) 0769 0673 0705 0677 0700 0692 0679

PMI3(e) 0885 0769 0756 0785 0773 0726 0723

NPMI(e) 0885 0846 0821 0815 0750 0726 0719

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0538 0615 0615 0615 0608 0613 0615

Rep(e) 0846 0885 0897 0877 0788 0777 0765

Smoothing=1e-7

MI(e) 0769 0692 0705 0685 0719 0703 0688

PMI3(e) 0885 0769 0756 0792 0758 0736 0725

NPMI(e) 0769 0750 0718 0700 0650 0641 0633

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0500 0481 0526 0523 0531 0523 0523

Rep(e) 0846 0865 0872 0854 0765 0749 0733

No Smoothing 1 2 3 5 10 15 20

MI(e) 0516 0531 0519 0531 0562 0574 0594

PMI3(e) 0725 0664 0652 0660 0628 0631 0646

NPMI(e) 0599 0597 0579 0554 0540 0539 0549

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0401 0386 0396 0398 0401 0410 0428

Rep(e) 0758 0771 0745 0723 0656 0647 0661

Smoothing=1e-3

MI(e) 0374 0414 0441 0448 0473 0481 0495

PMI3(e) 0484 0511 0509 0502 0519 0525 0533

NPMI(e) 0692 0652 0607 0603 0585 0585 0592

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0460

Typicality P(e|c) 0703 0697 0704 0681 0637 0628 0626

Rep(e) 0621 0580 0554 0561 0554 0555 0559

Smoothing=1e-4

MI(e) 0407 0430 0458 0462 0492 0503 0512

PMI3(e) 0648 0604 0579 0575 0578 0576 0590

NPMI(e) 0747 0777 0761 0737 0700 0685 0688

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0791 0795 0802 0767 0738 0729 0724

Rep(e) 0758 0714 0711 0689 0653 0636 0653

Smoothing=1e-5

MI(e) 0429 0465 0478 0501 0517 0528 0545

PMI3(e) 0725 0647 0642 0642 0627 0624 0638

NPMI(e) 0813 0779 0778 0765 0730 0723 0729

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0709 0728 0735 0722 0702 0696 0703

Rep(e) 0791 0787 0762 0739 0707 0703 0706

Smoothing=1e-6

MI(e) 0516 0510 0515 0526 0546 0563 0579

PMI3(e) 0725 0655 0651 0654 0641 0631 0649

NPMI(e) 0791 0766 0732 0728 0673 0659 0668

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0495 0516 0520 0508 0512 0521 0540

Rep(e) 0758 0784 0767 0755 0691 0686 0694

Smoothing=1e-7

MI(e) 0516 0531 0519 0530 0562 0571 0592

PMI3(e) 0725 0664 0652 0658 0630 0631 0647

NPMI(e) 0670 0655 0633 0604 0575 0570 0581

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0423 0421 0415 0407 0414 0424 0438

Rep(e) 0758 0771 0745 0725 0663 0661 0668

Evaluations on Different Measures for BLC

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

What is the Semantic Similaritybull Are the following instance pairs similar

bull ltapple microsoftgt

bull ltapple peargt

bull ltapple fruitgt

bull ltapple foodgt

bull ltapple ipadgt

bull ltcar journeygt

Approaches on Term Similarity

bull Categories of approaches for semantic similaritybull String based approach

bull Knowledge based approachbull Use preexisting thesauri taxonomy or encyclopedia such as

WordNet

bull Corpus based approachbull Use contexts of terms extracted from web pages web search

snippets or other text repositories

bull Embedding based approachbull Will introduce in detail in ldquoPart 3 Implicit Understandingrdquo

79

Approaches on Term Similarity (2)

bull Categories

80

Knowledge based approaches

(WordNet)

Corpus based

approaches

Path lengthlexical

chain-based

Information

content-based

Graph learning

algorithm basedSnippet search based

Rada

1989

Resnik

1995

Jcn

1997

Lin

1998

Saacutench

2011

Agirre

2010Alvarez

2007

String based

approaches

HunTray

2005

Hirst

1998

Do

2009

Bol

2011Chen

2006

State-of-the-art approaches

Ban

2002

bull Framework

83

Term Similarity Using Semantic Networks [Li et al 2013 Li et al 2015]

Term pairs ltt1 t2gt

Type Checking

Concept Pairs Entity Pairs

Entity-distribution Context Collection

Concept-distribution Context Collection

Concept-Entity Pairs

Concept Collection for the Entity Term t1

Similarity EvaluationCosine(T(t1) T(t2))

for each pairltt2cxgt

Context vector T(t1) and T(t2)

Get maxsim(t2cx) for ltt1 t2gt

End

End

Concept Clustering

Cluster Context vector Cx(t1) and Cy(t2)

Similarity Evaluation

Max(xy) Cosine(Cx(t1) Cy(t2))

End

Concept Clustering

for each Cluster Ci(t1)

Select top k Concept namely cx

Step 1 Type Checking

Step 2 Context Representation(Vector)

Step 3 Context Similarity

An example [Li et al 2013 Li et al 2015]

For example

ltbanana peargt

88

ltbanana peargt

Entity PairsType Checking

Concept Context Collection

Similarity Evaluation Cosine(T(t1) T(t2)) 0916

Step 1 Type Checking

Step 2 Context Representation(Vector)

Step 3 Context Similarity

ExamplesTerm 1 Term 2 Similarity

lunch dinner 09987

tiger jaguar 09792

car plane 09711

television radio 09465

technology company microsoft 08208

high impact sport competitive sport 08155

employer large corporation 05353

fruit green pepper 02949

travel meal 00426

music lunch 00116

alcoholic beverage sports equipment 00314

company table tennis 00003

96httpadaptseieesjtueducnsimilaritySimCompleteResultspdf

Statistics of Search Queries

44

29

17

7

2 1

1 Term2 Terms3 Terms4 Terms5 Termsmore than 5 Terms

10

26

34

19

74

1 Term2 Terms3 Terms4 Terms5 Termsmore than 5 Terms

(a) By traffic

(b) By of distinct queries

Pokeacutemon Go Microsoft HoloLens

Instance 1 Instance 21 Instance2 Instances3 Instances4 Instances5 InstancesMore than 5 Instances

1 Instance2 Instances3 Instances4 Instances5 InstancesMore than 5 Instances

If the short text has context for the instancehellip

bull python tutorialbull dangerous pythonbull moon earth distancebull hellip

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Supervised Segmentation [Bergsma et al 2007]

bull Problem divide query into semantic units

bull Approach turn segmentation into position-based binary classification

Example Query

Two man power saw

[two man] [power saw][two] [man] [power saw][two] [man power] [saw]

Input a query and its positions

Output the decision for making segmentation at each position

Supervised Segmentation

bull Featuresbull Decision boundary features

bull Statistical features

bull Context features

bull Dependency features

eg Indicators thePOS tags in query isPosition features forwardbackward

Mutual information between left and right parts

Bank loan amortization schedule

Context information

bus driverfemale

depend

Supervised Segmentation

bull Segmentation Overview

saw

SVMclassifier

Input query two man power saw

two man power

Output segmentation decision for each position (yesno)

learning features

Unsupervised Segmentation [Tan et al 2008]

bull Unsupervised learning for query segmentation

Probability of generated segmentation S for query Q

119875 119878119876 = 119875 1199041 P 1199042|1199041 hellipP 119904119898 11990411199042hellip119904119898minus1

asympෑ

119904119894isin119878

119875(119904119894)Unigram model

segments

Valid segment boundary if and only if the pointwise mutual information between the two segments resulting from the split is negative

new york times subscription

1199041 1199042

119872119868 119904119896 119904119896+1 = log119875119888([119904119896 119904119896+1])

119875119888 119904119896 ∙ 119875119888 (119904119896+1)lt 0

Example log119875119888([119899119890119908 119910119900119903119896])

119875119888( 119899119890119908) ∙ 119875119888 (119910119900119903119896)gt 0

no segment boundary here

Unsupervised Segmentation

bull Find top k segmentations dynamic programming

bull Using EM optimization on the fly

Input query 11990811199082hellip119908119899 concept probability distribution

Output top k segmentations with highest likehood

Words in a query

Exploit Click-through [Li et al 2011]

bull Motivationbull Probabilistic query segmentation

bull Use click-through data

Output top-3 segmentation

[bank of america] [online banking] 0502

bank of america online banking] 0428

[bank of ] [ america] [online banking] 0001

Q -gt URL -gt D query

document

click data

Input Query bank of america online banking

Exploit Click-through

bull Segmentation Model

An interpolated model

global info Click-throughinfo

[credit card] [bank of America]

1 bank of america credit cards contact us overview2 secured visa credit card from bank of america3 credit cards overview find the right bank of america credit card for you

Query

Clicked html documents

global info

Click-through info

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Sense Changes with Different Context

watch harry potter read harry potter age harry potter

Movie Book Character

harry potter walkthrough

Game

Entity Recognition in Query [Guo et al 2009]

bull Motivation

Detect named entity in a short text and categorize it

harry potter walkthrough

Single-named-entity query

Example

("harry potter", "walkthrough", "game")

triple <e, t, c>: e = the (ambiguous) named-entity term, t = the context terms, c = the class of the entity

Entity Recognition in Query

bull Probabilistic Generative Model

Goal: given a query q, find the triple <e, t, c> that maximizes the probability

Probability to generate triple

assume context only depends on class

Objective given query q find

The problem then becomes how to estimate Pr(e) Pr(c|e) and Pr(t|c)

E.g., "walkthrough" only depends on the class game instead of on harry potter
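A small sketch of this scoring with made-up probability tables: every split of the query into an entity part e and context t is scored by P(e) · P(c|e) · P(t|c), and the best triple wins.

```python
# Sketch of generative scoring for single-named-entity queries; tables are illustrative.
P_e = {"harry potter": 0.6, "harry": 0.1}
P_c_given_e = {"harry potter": {"book": 0.4, "movie": 0.4, "game": 0.2}}
P_t_given_c = {"game": {"walkthrough": 0.3}, "book": {"read": 0.2}, "movie": {"watch": 0.25}}

def best_triple(query):
    tokens = query.split()
    candidates = []
    for i in range(len(tokens)):
        for j in range(i + 1, len(tokens) + 1):
            e = " ".join(tokens[i:j])                      # candidate named entity
            t = " ".join(tokens[:i] + tokens[j:]) or "#"   # remaining context terms
            for c, p_ce in P_c_given_e.get(e, {}).items():
                score = P_e.get(e, 0) * p_ce * P_t_given_c.get(c, {}).get(t, 1e-6)
                candidates.append((score, (e, t, c)))
    return max(candidates, default=(0, None))

print(best_triple("harry potter walkthrough"))
# -> highest-scoring triple is ('harry potter', 'walkthrough', 'game')
```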

Entity Recognition in Query

bull Probability Estimation by Learning

Learning objective:

max ∏_{i=1}^{N} P(e_i, t_i, c_i)

Challenge: it is difficult as well as time consuming to manually assign class labels to named entities in queries.

Build a training set T = {(e_i, t_i)} and view c_i as a hidden variable.

New learning problem:

max ∏_{i=1}^{N} P(e_i, t_i) = max ∏_{i=1}^{N} Σ_c P(e_i) P(c|e_i) P(t_i|c)

solved with topic model WS-LDA

Signal from Click [Pantel et al 2012]

bull Motivation

Predict entity type in Web search

entity

user intent

context

click

Query type distribution (73 types)

Generative model

[Plate diagram: generative model over entities and their entity types]

Signal from Click

bull Joint Model for Prediction

[Plate diagram: for each query, pick an entity (entity distribution), pick a type from the entity's distribution over types, pick an intent from the intent distribution, then pick the context words (word distribution) and the click (host distribution)]

Telegraphic Query interpretation [Sawant et al 2013 Joshi et al 2014]

bull Entity-seeking Telegraphic Queries

bull Interpretation = Segmentation + Annotation

Knowledge base Large corpus

accuracy recall

Germany capital

Berlin

Query

Result Entity

bull Overview

Joint Interpretation and Ranking [Sawant et al 2013 Joshi et al 2014]

Annotated Corpus

Telegraphic Query

e1e2e3

Two Models for Interpretation and Ranking

Generative Model

Discriminative Model

Output

bull Generative Model

Joint Interpretation and Ranking [Sawant et al 2013]

[Figure: for the query q = "losing team baseball world series 1998", the answer entity E = San Diego Padres has type T = major league baseball team; a corpus snippet reads "Padres have been to two World Series, losing in 1984 and 1998". The type hint "baseball team" is explained by the type model and the remaining selectors by the context model ("lost 1998 world series"), with a switch variable Z choosing between the two models. Borrowed from U. Sawant (2013)]

Based on Probabilistic Language Models

bull Discriminative Model

Joint Interpretation and Ranking [Sawant et al 2013]

[Figure: feature vectors pair the query "losing team baseball world series 1998" with candidate interpretations (t = baseball team vs. t = series); San_Diego_Padres is a correct answer entity, 1998_World_Series an incorrect one]

Based on max-margin discriminative learning

bull Queries seek answer entities (e2)

bull Contain (query) entities (e1) target types (t2) relations (r) and selectors (s)

Telegraphic Query Interpretation [Joshi et al 2014]

query: dave navarro first band
  e1 = dave navarro, r = band, t2 = band, s = first
  e1 = dave navarro, r = -, t2 = band, s = first

query: spider automobile company
  e1 = spider, r = automobile company, t2 = automobile company, s = -
  e1 = -, r = automobile company, t2 = company, s = spider

Borrow from M Joshi (2014)

Improved Generative Model

bull Generative Model[Sawant et al 2013]

[Joshi et al 2014]Consider e1

(in q) and r

Improved Discriminative Model

bull Discriminative Model[Sawant et al 2013]

[Joshi et al 2014]

Consider e1

(in q) and r

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

bull Input a short text

bull Output semantic interpretation

bull Three steps in understanding a short text

wanna watch eagles band

watch[verb] eagles[entity](band) band[concept]

wanna watch eagles band watch[verb] eagles[entity](band) band[concept]

watch eagles band watch[verb] eagles[entity] band[concept]

Step 1: Text Segmentation – divide the short text into a sequence of terms in the vocabulary

Step 2: Type Detection – determine the best type of each term

Step 3: Concept Labeling – infer the best concept of each entity within its context

Text segmentationbull Observations

bull Mutual Exclusion ndash terms containing the same word mutually exclude each other

bull Mutual Reinforcement ndash related terms mutually reinforce each other

bull Build a Candidate Term Graph (CTG)

"vacation april in paris"  "watch harry potter"

[Candidate term graphs: for "vacation april in paris" the candidate terms are vacation, april, paris and april in paris; for "watch harry potter" they are watch and harry potter. Edges between compatible terms carry affinity weights (e.g. 0.029, 0.047, 0.092), and terms sharing a word (e.g. april and april in paris) mutually exclude each other]

Find best segmentation

• Best segmentation = the sub-graph of the CTG which:
  • is a complete graph (clique)
  • contains no mutual exclusion
  • has 100% word coverage (except for stopwords)
  • has the largest average edge weight

Is a segmentation

Best segmentation

[Candidate term graphs as above; the cliques with full word coverage, e.g. {vacation, april, paris} and {vacation, april in paris}, are the valid segmentations]

Find best segmentation

• Best segmentation = the sub-graph of the CTG which:
  • is a complete graph (clique)
  • contains no mutual exclusion
  • has 100% word coverage (except for stopwords)
  • has the largest average edge weight

Maximal Clique

Best segmentation

[Candidate term graphs as above; among the valid segmentations, the maximal clique with the largest average edge weight is the best segmentation: {vacation, april in paris} and {watch, harry potter}]
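A brute-force sketch of this search over one of the example graphs; the candidate terms, word coverage and edge weights are taken loosely from the figure placeholder above and are illustrative.

```python
# Sketch: best segmentation = covering clique with no mutual exclusion and largest avg edge weight.
from itertools import combinations

terms = {"vacation": {"vacation"}, "april": {"april"},
         "april in paris": {"april", "paris"}, "paris": {"paris"}}   # term -> words it covers
edges = {("vacation", "april in paris"): 0.047, ("vacation", "april"): 0.029,
         ("vacation", "paris"): 0.041, ("april", "paris"): 0.005}    # weights roughly as in the figure
words = {"vacation", "april", "paris"}            # stopword "in" ignored

def weight(a, b):
    return edges.get((a, b), edges.get((b, a)))

def best_segmentation():
    best, best_score = None, -1.0
    for k in range(1, len(terms) + 1):
        for cand in combinations(terms, k):
            cover = set().union(*(terms[t] for t in cand))
            exclusive = any(terms[a] & terms[b] for a, b in combinations(cand, 2))
            pairs = list(combinations(cand, 2))
            if cover != words or exclusive or any(weight(a, b) is None for a, b in pairs):
                continue
            score = (sum(weight(a, b) for a, b in pairs) / len(pairs)) if pairs else 0.0
            if score > best_score:
                best, best_score = cand, score
    return best, best_score

print(best_segmentation())   # -> (('vacation', 'april in paris'), 0.047)
```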

Type Detection

• Pairwise Model: find the best typed-term for each term so that the maximum spanning tree of the resulting sub-graph between typed-terms has the largest weight

[Example for "watch free movie": candidate typed-terms are watch[verb] / watch[entity] / watch[concept], free[adjective] / free[verb], and movie[concept] / movie[entity]]
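A small sketch of the pairwise model using networkx: enumerate one typed-term per term and keep the assignment whose maximum spanning tree over the induced sub-graph is heaviest. The candidate types and affinity weights are illustrative.

```python
# Sketch: type detection by maximizing the maximum-spanning-tree weight over typed-terms.
from itertools import product
import networkx as nx

candidates = {"watch": ["watch[v]", "watch[e]", "watch[c]"],
              "free": ["free[adj]", "free[v]"],
              "movie": ["movie[c]", "movie[e]"]}
affinity = {("watch[v]", "movie[c]"): 0.9, ("watch[v]", "free[adj]"): 0.3,
            ("free[adj]", "movie[c]"): 0.7, ("watch[e]", "movie[c]"): 0.1,
            ("watch[c]", "movie[e]"): 0.2, ("free[v]", "movie[c]"): 0.1}

def score(assignment):
    G = nx.Graph()
    G.add_nodes_from(assignment)
    for a, b in product(assignment, assignment):
        w = affinity.get((a, b)) or affinity.get((b, a))
        if a != b and w:
            G.add_edge(a, b, weight=w)
    if not nx.is_connected(G):
        return 0.0
    return nx.maximum_spanning_tree(G).size(weight="weight")   # total MST weight

best = max(product(*candidates.values()), key=score)
print(best)   # -> ('watch[v]', 'free[adj]', 'movie[c]')
```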

Concept Labeling

• Entity disambiguation is the most important task of concept labeling: filter and re-rank the original concept cluster vector
• Weighted vote: the final score of each concept cluster is a combination of its original score and the support from the context, using concept co-occurrence

watch harry potter read harry potter

movie book
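A toy sketch of the weighted vote: each candidate concept's final score mixes its prior score with co-occurrence support from the context; all numbers are illustrative.

```python
# Sketch: weighted-vote concept labeling for an ambiguous entity.
prior = {"movie": 0.45, "book": 0.40, "character": 0.15}     # concept vector of "harry potter"
cooccur = {("verb:watch", "movie"): 0.8, ("verb:watch", "book"): 0.1,
           ("verb:read", "book"): 0.9, ("verb:read", "movie"): 0.1}

def weighted_vote(prior, context, alpha=0.5):
    scores = {}
    for concept, p in prior.items():
        support = sum(cooccur.get((ctx, concept), 0.0) for ctx in context) / max(len(context), 1)
        scores[concept] = alpha * p + (1 - alpha) * support   # prior + context support
    return max(scores, key=scores.get), scores

print(weighted_vote(prior, ["verb:watch"]))   # -> 'movie' wins for "watch harry potter"
print(weighted_vote(prior, ["verb:read"]))    # -> 'book' wins for "read harry potter"
```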

Example of Entity Disambiguation[Hua et al 2015 (ICDE Best Paper) Hua et al 2016]

[Pipeline: the short text is parsed; terms are clustered by isA against the semantic network into a concept vector (c1 p1, c2 p2, c3 p3, …); concepts are then filtered by co-occurrence, followed by head/modifier analysis and concept orthogonalization]

Example "ipad apple": isA gives {fruit, company, food, product, …} for apple and {product, device, …} for ipad; filtering by co-occurrence keeps the product / device / brand / company senses.

Mining Lexical Relationships[Wang et al 2015b]

• Lexical knowledge represented by the probabilities (example: "watch harry potter"; concepts such as verb, product, movie, book):

① p(z|t) — the role of a term, e.g. p(verb|watch), p(instance|watch)
② p(c|t, z) — the concept of a term given its role, e.g. p(movie|watch, verb)
③ p(c|e) = p(c|t, z = instance), e.g. p(movie|harry potter), p(book|harry potter)

(e: instance, t: term, c: concept, z: role)

Understanding Queries [Wang et al 2015b]

• Goal: to rank the concepts and find argmax_c p(c|t, q)

The offline semantic network

QueryAll possible

segmentations

Random walk with restart [Sun et al 2005]on the online subgraph
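A compact sketch of random walk with restart on a tiny online subgraph (numpy power iteration); the nodes, edge weights and restart vector are illustrative, not the actual offline semantic network.

```python
# Sketch: random walk with restart (RWR) over a small term/concept subgraph.
import numpy as np

nodes = ["watch", "harry potter", "movie", "book", "verb"]
W = np.array([[0.0, 0.0, 0.3, 0.1, 0.6],     # watch
              [0.0, 0.0, 0.5, 0.5, 0.0],     # harry potter
              [0.3, 0.7, 0.0, 0.0, 0.0],     # movie
              [0.1, 0.9, 0.0, 0.0, 0.0],     # book
              [1.0, 0.0, 0.0, 0.0, 0.0]])    # verb

def rwr(W, restart, c=0.15, iters=100):
    P = W / W.sum(axis=1, keepdims=True)     # row-stochastic transition matrix
    r = restart.copy()
    for _ in range(iters):
        r = c * restart + (1 - c) * P.T @ r  # keep restarting at the query terms
    return r

restart = np.array([0.5, 0.5, 0.0, 0.0, 0.0])      # query terms: "watch", "harry potter"
scores = rwr(W, restart)
print(dict(zip(nodes, scores.round(3))))           # here "movie" scores above "book"
```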

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

• Example: "popular smart cover iphone 5s"

• Definitions:
  • Head: names the general (semantic) category to which the whole short text belongs; usually the head is the intent of the short text. "smart cover" is the intent of the query.
  • Constraints: distinguish this member from other members of the same category. "iphone 5s" limits the type of the head.
  • Non-Constraint Modifiers (a.k.a. Pure Modifiers): subjective modifiers which can be dropped without changing the intent. "popular" is subjective and can be neglected.

Non-Constraint Modifiers Mining Construct Modifier Networks

Edges form a Modifier Network

Concept Hierarchy Tree in ldquoCountryrdquo domain

Modifier Network in ldquoCountryrdquo domainIn this case ldquoLargerdquo and ldquoToprdquo are pure modifiers

Country

Asian country

Developed country

Western country

Asian Developed Western

Western developed

country

Top western country

Large

Large Top

Top

WesternLarge Asian

country

Large developed

country

Top developed

country

Country

Asian Western

Developed

Large Top

• Betweenness centrality is a measure of a node's centrality in a network.

• The betweenness of node v is defined as g(v) = Σ_{s≠v≠t} σ_st(v) / σ_st,
  where σ_st is the total number of shortest paths from node s to node t, and σ_st(v) is the number of those paths that pass through v.

• Normalization & aggregation: a pure modifier should have a low aggregated betweenness-centrality score PMS(t).

Non-Constraint Modifiers Mining Betweenness centrality
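A sketch of the computation with networkx on a toy modifier network for the "country" domain; which modifiers come out as pure depends on the real data, so the graph below is only illustrative.

```python
# Sketch: betweenness centrality over a small modifier network.
import networkx as nx

# An edge joins two modifiers (or a modifier and the head concept) that co-occur in concepts.
G = nx.Graph()
G.add_edges_from([
    ("country", "asian"), ("country", "western"), ("country", "developed"),
    ("western", "developed"),                       # "western developed country"
    ("large", "asian"), ("large", "developed"),     # "large asian/developed country"
    ("top", "western"), ("top", "developed"),       # "top western/developed country"
])

bc = nx.betweenness_centrality(G, normalized=True)
for node, score in sorted(bc.items(), key=lambda kv: kv[1]):
    print(f"{node:10s} {score:.3f}")
# Modifiers whose normalized betweenness stays low when aggregated across domains
# (the PMS(t) score) are treated as pure modifiers, e.g. "large", "top".
```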

Head-Constraints Mining [Wang et al 2014b]

• A term can be a head in some cases and a constraint in others.

• E.g., "Seattle hotel" (Seattle = constraint, hotel = head) vs. "Seattle hotel job" (Seattle = constraint, hotel = constraint, job = head)

Head-Constraints Mining Acquiring Concept Patterns

Get entity pairs from query log

Conceptualization

Concept Patterns for each

entity1 + preposition + entity2

Extract patterns: A for B, A of B, A with B, A in B, A on B, A at B, …

entity 1 = head, entity 2 = constraint

concept11

concept12

concept13

concept14

concept21

concept22

concept23

(concept11, concept21), (concept11, concept22), (concept11, concept23), …

Concept Pattern Dictionary

Building the concept pattern dictionary from query logs, e.g.: cover for iphone 6s, battery for sony a7r, wicked on broadway

Why Concepts Can't Be Too General
• It may cause too many concept pattern conflicts: we can't distinguish head and modifier for general concept pairs

Derived concept pattern: head = device, modifier = company
Supporting entity pairs: (iphone 4, verizon), (modem, comcast), (wireless router, comcast), (iphone 4, tmobile)

Derived concept pattern: head = company, modifier = device
Supporting entity pairs: (amazon books, kindle), (netflix, touchpad), (skype, windows phone), (netflix, ps3)

Conflict: the two general patterns assign head and modifier in opposite directions.

Why Concepts Can't Be Too Specific
• It may generate concepts with little coverage: the concept regresses to an entity, and storage blows up to as many as (million × million) patterns

hellip hellip

device largest desktop OS vendor

device largest software development company

device largest global corporation

device latest windows and office provider

hellip hellip

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept Patterns: cluster size | sum of cluster score | head/constraint | score

615 2114691 breedstate357298460224501

296 7752357 gameplatform627403476771856

153 3466804 accessoryvehicle53393705094809

70 118259 browserplatform132612807637391

22 1010993 requirementschool271407526294823

34 9489159 drugdisease154602405333541

42 8992995 cosmeticskin condition814659415003929

16 7421599 jobcity27903732555528

32 710403 accessoryphone246513830851194

18 6692376 softwareplatform210126322725878

20 6444603 testdisease239774028397537

27 5994205 clothesbreed98773996282851

19 5913545 penaltycrime200544192793488

25 5848804 taxstate240081818612579

16 5465424 saucemeat183592863621553

18 4809389 credit cardcountry142919087972152

14 4730792 foodholiday14554140330924

11 4536199 modgame257163856882439

29 4350954 garmentsport471533326845442

23 3994886 career informationprofessional732726483731257

15 386065 songinstrument128189481818135

18 378213 baitfish780426514113169

22 3722948 study guidebook508339765053921

19 3408953 pluginsbrowser550326072627126

14 3305753 recipemeat882779863422951

18 3214226 currencycountry110825444188352

13 3180272 lenscamera186081673263957

9 316973 decorationholiday130055844126533

16 314875 foodanimal7338544366514

game platform

game device

video game platform

game console game pad

game gaming platform

Game (Head) Platform (Modifier)

angry birds android

angry birds ios

angry birds windows 10

hellip hellip

Head-Modifier Relationship Detection

• Train a classifier on (head-embedding, modifier-embedding)
• Training data: positive = (head, modifier); negative = (modifier, head)
• Precision >= 0.9, Recall >= 0.9
• Disadvantage: not interpretable
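A minimal sketch of the direction classifier described above: concatenate the two embeddings and predict whether the first term is the head; random vectors stand in for real embeddings and the pair list is illustrative.

```python
# Sketch: head->modifier direction classification from concatenated pair embeddings.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
emb = {w: rng.normal(size=8) for w in ["game", "platform", "accessory", "phone", "drug", "disease"]}
positive = [("game", "platform"), ("accessory", "phone"), ("drug", "disease")]   # (head, modifier)

X = [np.concatenate([emb[h], emb[m]]) for h, m in positive] + \
    [np.concatenate([emb[m], emb[h]]) for h, m in positive]   # reversed pairs as negatives
y = [1] * len(positive) + [0] * len(positive)

clf = LogisticRegression(max_iter=1000).fit(X, y)
test = np.concatenate([emb["game"], emb["platform"]])
print(clf.predict([test]))   # 1 = the first term is predicted to be the head
```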

Syntactic Parsing based on HM

bull Information is incompletebull Preposition and other function words

bull Within a noun compound el capitan macbook pro

bull Why not train a parser for web queries

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges: Short Texts Lack Grammatical Signals
• Lack of function words and word order
• "toys queries" has ambiguous intent
• "distance earth moon" has a clear intent, but many equivalent forms: "earth moon distance", "earth distance moon", …

Challenges Syntactic Parsing of Queries

bullNo standard

bullNo ground-truth

Why is syntactic parsing of queries even a legitimate problem

Derive Syntax from Semantics[Sun et al 2016]

bull Query ldquothai food houstonrdquo

bull Clicked sentence

bull Project dependency to the query

A Treebank for Short Texts

• Given a query q
• Given q's clicked sentences s
• Parse each s
• Project the dependencies from s to q
• Aggregate the projected dependencies

Algorithm of Projection
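The projection step itself can be sketched as follows, assuming the clicked sentence has already been parsed into (head, dependent, label) arcs by some parser and that query words are aligned to sentence words by simple string match (both simplifying assumptions).

```python
# Sketch: project dependency arcs from a clicked sentence onto a query.
def project(query_tokens, sentence_arcs):
    qset = {w.lower() for w in query_tokens}
    projected = []
    for head, dep, label in sentence_arcs:
        # keep an arc only if both endpoints also occur in the query
        if head.lower() in qset and dep.lower() in qset:
            projected.append((head.lower(), dep.lower(), label))
    return projected

# clicked sentence: "Houston has many restaurants serving Thai food."
sentence_arcs = [("restaurants", "Houston", "nmod"), ("restaurants", "many", "amod"),
                 ("serving", "restaurants", "acl"), ("serving", "food", "dobj"),
                 ("food", "Thai", "amod")]
print(project("thai food houston".split(), sentence_arcs))
# -> [('food', 'thai', 'amod')]; arcs touching words outside the query are dropped,
#    and the surviving arcs are aggregated over many clicked sentences per query.
```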

Result Examples

Results

• Random queries: QueryParser UAS 0.83, LAS 0.75; Stanford UAS 0.72, LAS 0.64

• Queries with no function words: QueryParser UAS 0.82, LAS 0.73; Stanford UAS 0.70, LAS 0.61

• Queries with function words: QueryParser UAS 0.90, LAS 0.85; Stanford UAS 0.86, LAS 0.80

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

• Measuring similarity between two short texts or sentences

• Basic idea: word-by-word comparison using embedding vectors

• Use a saliency-weighted semantic graph to compute similarity

Short Text Similarity Using Word Embedding [Kenter and Rijke 2015]

Features acquired: bins of all edges, bins of max edges

Similarity measurement (inspired by BM25), between a longer short text s_l and a shorter one s_s, where sem(w, s_s) is the semantic similarity of term w to the short text s_s and avgl is the average short-text length:

f_sts(s_l, s_s) = Σ_{w ∈ s_l} IDF(w) · [ sem(w, s_s) · (k1 + 1) ] / [ sem(w, s_s) + k1 · (1 − b + b · |s_s| / avgl) ]
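A direct sketch of the formula above with toy embeddings and IDF values; sem(w, s) is taken here as the best cosine between w and any word of the short text s.

```python
# Sketch: BM25-style, saliency-weighted short-text similarity with word embeddings.
import numpy as np

emb = {"cheap": np.array([1.0, 0.2]), "budget": np.array([0.9, 0.3]),
       "flight": np.array([0.1, 1.0]), "ticket": np.array([0.2, 0.9])}
idf = {"cheap": 1.2, "flight": 0.8}

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def sem(w, s):
    return max(cos(emb[w], emb[v]) for v in s)      # best match of w inside the short text s

def f_sts(s_l, s_s, k1=1.2, b=0.75, avgl=2.0):
    score = 0.0
    for w in s_l:
        s_w = sem(w, s_s)
        score += idf.get(w, 1.0) * (s_w * (k1 + 1)) / (s_w + k1 * (1 - b + b * len(s_s) / avgl))
    return score

print(f_sts(["cheap", "flight"], ["budget", "ticket"]))
```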

From the Concept View

From the Concept View [Wang et al 2015a]

Co-occurrence Network

Bags of Concepts

Semantic Network

Short Text 1

Short Text 2

Concept Vector 1[(c1 score1) (c2 score2)hellip]

Concept Vector 2[(c1rsquo score1rsquo) (c2rsquo score2rsquo)hellip]

Similarity

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization
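A minimal sketch of the final comparison in this pipeline: two short texts are reduced to sparse concept vectors (by conceptualization) and compared with cosine; the vectors here are illustrative.

```python
# Sketch: bag-of-concepts similarity between two short texts.
import math

def cosine(u, v):
    dot = sum(u[c] * v[c] for c in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

cv1 = {"fruit": 0.1, "company": 0.7, "device": 0.2}             # e.g. conceptualized "apple ipad"
cv2 = {"company": 0.6, "search engine": 0.3, "website": 0.1}    # e.g. conceptualized "google search"
print(round(cosine(cv1, cv2), 3))
```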

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

bull Explicit short text understanding benefit lot of application scenariosbull Adssearch semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

[Bar charts: ads keyword selection performance over traffic deciles 4–10, shown separately for mainline ads and sidebar ads]

Definition Mining [Hao et al 2016]

• Definition scenarios: search engines, QnA, etc.

• Why is conceptualization useful for definition mining? Example: "What is Emphysema?"

Answer 1: Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year.

Answer 2: Emphysema is an incurable, progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing.

• This sentence has the form of a definition.
• Embedding is helpful to some extent, but it also returns a high similarity score for (emphysema, disease) and for (emphysema, smoking).
• Conceptualization can provide strong semantics.
• Contextual embedding can also provide semantic similarity beyond is-A.

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

[Pipeline — Offline: training data for each class (Class 1 … Class i … Class N) goes through concept weighting and model learning, producing a concept model per class (Model 1 … Model i … Model N). Online: the original short text ("justin bieber graduates") goes through entity extraction and conceptualization against the knowledge base to get a concept vector, then candidate generation and classification & ranking against the class models, producing e.g. <Music, Score>.]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept SpaceArticle titlestagsin this category

119901119894

119901119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

hellip

hellip

120596119894

120596119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

120596119894

120596119895119901119894

119901119895

Precision performance on each category [Wang et al 2014a]

Precision on each category (methods: BocSTC, LM_ch, SVM, VSM_cosine, LM_d, Entity_ESA):

Movie: 0.71, 0.91, 0.84, 0.81, 0.72, 0.56
Money: 0.97, 0.95, 0.54, 0.57, 0.52, 0.74
Music: 0.97, 0.90, 0.88, 0.73, 0.68, 0.58
TV:    0.96, 0.46, 0.92, 0.56, 0.51, 0.55

[Bar chart, y-axis: precision 0.3–1.0]

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

• [ Bollacker et al 2008 ] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, Jamine Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In SIGMOD 2008

bull [ Auer et al 2007 ] Soumlren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

• [ Nastase et al 2010 ] Vivi Nastase, Michael Strube, Benjamin Börschinger, Cäcilia Zirn and Anas Elghafari. WikiNet: A very large scale multi-lingual concept network. In LREC 2010

• [ Speer et al 2013 ] Robert Speer and Catherine Havasi. ConceptNet 5: A large semantic network for relational knowledge. The People's Web Meets NLP, Springer Berlin Heidelberg, 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and XindongWu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

• [ Rosch et al 1976 ] Eleanor Rosch, Carolyn B Mervis, Wayne D Gray, David M Johnson and Penny Boyes-Braem. Basic objects in natural categories. Cognitive Psychology, 8(3):382–439, 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

• [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke. Short text similarity with word embeddings. In CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

• [ Wang et al 2012b ] Yue Wang, Hongsong Li, Haixun Wang and Kenny Zhu. Toward Topic Search on the Web. In International Conference on Conceptual Modeling, October 2012


NELL Never-Ending Language Learning [Carlson et al 2010]

bull NELL is a research project that attempts to create a computer system that learns over time to read the web Since January 2010

bull Over 50 million candidate beliefs by reading the web They are considered at different levels of confidence

bull Out of 50 million high confidence in 2817156 beliefs

Brief Introduction

Statistics

Sample

NELL Never-Ending Language Learning

• It is continually learning facts from the web, and the resources are publicly available

• NELL research team at CMU

• Homepage: http://rtw.ml.cmu.edu/rtw/  • Download: http://rtw.ml.cmu.edu/rtw/resources

News

Authors

URLs

Probase [Wu et al 2012]

bull Probase is a semantic network to make machines ldquoawarerdquo of the mental world of human beings so that machines can better understand human communication

Brief Introduction

Probase network

Edges: isA (concept–entities), isPropertyOf (attributes), co-occurrence (isCEOof, LocatedIn, etc.)

Nodes: concepts ("Spanish artists"), entities ("Pablo Picasso"), attributes ("birthday"), verbs/adjectives ("eat", "sweet")

• 5,401,933 unique concepts
• 12,551,613 unique instances
• 87,603,947 isA relations

Example concepts: countries, basic watercolor techniques, celebrity wedding dress designers

Probase

bull Microsoft Research

• Public release coming soon in Aug/Sept 2016  • Project homepage: http://research.microsoft.com/probase

Concepts

Authors

URLs

Probase isA error rate: <1.1%, and <10% for random pairs

Freebase [Bollacker et al 2008]

bull Freebase is a well-known collaborative knowledge base consisting of data composed mainly by its community

• Freebase contains more than 23 million entities
• Freebase contains 1.9 billion triples
• Each triple is organized in the form

<subject> <predicate> <object>

Brief Introduction

Statistics

bull Freebase is a collection of factsbull Freebase only contains nodes

and linksbull Freebase is a labeled graph

Freebase -gt Wiki Data

• Freebase data was integrated into Wikidata
• The Freebase API will be completely shut down on Aug 31, 2016,

replaced by Google Knowledge Graph API

bull Freebase Community

• Homepage: http://wiki.freebase.com/wiki/Main_Page
• Download: https://developers.google.com/freebase
• Wikidata: https://www.wikidata.org

News

Authors

URLs

Google Knowledge Graph

bull Knowledge Graph is a knowledge base used by Google to enhance its search engines search results with semantic-search information gathered from a wide variety of sources

bull 570 million objects and more than 18 billion facts about relationshipsbetween different objects

bull Google Inc

• Homepage: https://www.google.com/intl/es419/insidesearch/features/search/knowledge.html

Brief Introduction

Statistics

Sample

Authors

URLs

YAGO [Suchanek et al 2007]

bull YAGO is a huge semantic knowledge base derived from GeoNames WordNet and Wikipedia (10 Wikipedias in different languages)

bull More than 10 million entities(persons organizations cities etc)bull More than 120 million facts about entitiesbull More than 35000 classes assigned to entitiesbull Many of its facts and entities are attached a temporal dimension and a spatial dimension

Brief Introduction

Sample: <Albert_Einstein> <isMarriedTo> <Elsa_Einstein>

Statistics

YAGO

News
• An evaluated version of YAGO3 (combining information from Wikipedias in different languages) was released [15 Sep 2015]

Authors
• Max Planck Institute for Informatics in Saarbrücken, Germany, and the DBWeb group at Télécom ParisTech University

URLs
• Homepage: http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago
• Download: http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/downloads

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Statistics of Search Queries

[Pie charts: distribution of query lengths]
(a) By traffic: 1 term 44%, 2 terms 29%, 3 terms 17%, 4 terms 7%, 5 terms 2%, more than 5 terms 1%
(b) By # of distinct queries: 1 term 10%, 2 terms 26%, 3 terms 34%, 4 terms 19%, 5 terms 7%, more than 5 terms 4%

Example queries: Pokémon Go, Microsoft HoloLens

[Pie charts: distribution of the number of instances per query (1, 2, 3, 4, 5, more than 5 instances)]

If the short text is a single instance…

• Python
• Microsoft
• Apple
• …

Single Instance Understanding

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

Word Ambiguity bull Word sense disambiguation rely on dictionaries

(WordNet)

Take a seat on this chair

The chair of the Math Department

Instance Ambiguity

bull Instance sense disambiguation extra knowledge needed

I have an apple pie for lunch

He bought an apple ipad

Here ldquoapplerdquo is a proper noun

Ambiguity [Hua et al 2016]

• Many instances are ambiguous

• Intuition: ambiguous instances have multiple senses

short text          | instance     | sense
population china    | china        | country
glass vs china      | china        | fragile item
pear apple          | apple        | fruit
microsoft apple     | apple        | company
read harry potter   | harry potter | book
watch harry potter  | harry potter | movie
age of harry potter | harry potter | character

Pre-definition for Ambiguity (1): Sense [Hua et al 2016]

• What is a sense in a semantic network? A sense is a hierarchy of concept clusters

region

country state city

creature

animal

predator

crop food

fruit vegetable meat

Germany

Pre-definition for Ambiguity (2): Concept Cluster [Li et al 2013, Li et al 2015]

• What is a Concept Cluster (CL)? Cluster similar concepts into a concept cluster using a K-Means-like approach (k-Medoids)

Fruit cluster: fruit, fresh fruit, juice, tropical fruit, berry, exotic fruit, seasonal fruit, fruit juice, citrus fruit, soft fruit, dry fruit, wild fruit, local fruit, …

Company cluster: company, client, firm, manufacturer, corporation, large company, rival, giant, big company, local company, large corporation, international company, …

Definitions of Instance Ambiguity [Hua et al 2016]

• 3 levels of instance ambiguity:
  • Level 0: unambiguous — contains only 1 sense. E.g., dog (animal), beijing (city), potato (vegetable)
  • Level 1: unambiguous and ambiguous both make sense — contains 2 or more senses, but the senses are related. E.g., google (company & search engine), french (language & country), truck (vehicle & public transport service)
  • Level 2: ambiguous — contains 2 or more senses, and the senses are very different from each other. E.g., apple (fruit & company), jaguar (animal & company), python (animal & language)

Ambiguity Score

• Using the top-2 senses to calculate the ambiguity score:

score = 0, if level = 0
score = [ w(s2|e) / w(s1|e) ] · (1 − similarity(s1, s2)), if level = 1
score = 1 + [ w(sc2|e) / w(sc1|e) ] · (1 − similarity(sc1, sc2)), if level = 2

Denote the top-2 senses as s1 and s2, and the top-2 sense clusters as sc1 and sc2. The similarity of two sense clusters is the maximum similarity of their senses:

similarity(sc1, sc2) = max similarity(si ∈ sc1, sj ∈ sc2)

For an entity e, the weight (popularity) of a sense si is the sum of the weights of its concept clusters:

w(si|e) = w(Hi|e) = Σ_{CLj ∈ Hi} P(CLj|e)

and the weight (popularity) of a sense cluster sci is the sum of the weights of its senses:

w(sci|e) = Σ_{sj ∈ sci} w(sj|e)
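A direct sketch of the score above; the sense-cluster weights and similarity are illustrative inputs, chosen so the level-2 case lands near the value listed for "apple" in the examples that follow.

```python
# Sketch of the ambiguity score; weights and similarity are illustrative inputs.
def ambiguity_score(level, w_top1, w_top2, similarity):
    if level == 0:
        return 0.0
    ratio = w_top2 / w_top1
    if level == 1:
        return ratio * (1 - similarity)
    return 1 + ratio * (1 - similarity)        # level == 2

# "apple": top sense cluster (fruit, ...) w = 0.537, second (company, brand) w = 0.271,
# assumed cluster similarity 0.17 -> score close to the value listed for apple below.
print(round(ambiguity_score(2, 0.537, 0.271, 0.17), 2))
```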

Examples

• Level 0
  • california: country/state/city/region/institution 0.943
  • fruit: food/product/snack/carbs/crop 0.827
  • alcohol: substance/drug/solvent/food/addiction 0.523
  • computer: device/product/electronics/technology/appliance 0.537
  • coffee: beverage/product/food/crop/stimulant 0.73
  • potato: vegetable/food/crop/carbs/product 0.896
  • bean: food/vegetable/crop/legume/carbs 0.801

Examples (cont.)

• Level 1
  • nike, score = 0.034: company/store 0.861; brand 0.035; shoe/product 0.033
  • twitter, score = 0.035: website/tool 0.612; network 0.165; application 0.033; company 0.031
  • facebook, score = 0.037: website/tool 0.595; network 0.17; company 0.053; application 0.029
  • yahoo, score = 0.38: search engine 0.457; company/provider/account 0.281; website 0.0656
  • google, score = 0.507: search engine 0.46; company/provider/organization 0.377; website 0.0449

Examples (cont.)

• Level 2
  • jordan, score = 1.02: country/state/company/regime 0.92; shoe 0.02
  • fox, score = 1.09: animal/predator/species 0.74; network 0.064; company 0.035
  • puma, score = 1.15: brand/company/shoe 0.655; species/cat 0.116
  • gold, score = 1.21: metal/material/mineral resource/mineral 0.62; color 0.128

Examples (cont.)

• Level 2
  • soap, score = 1.22: product/toiletry/substance 0.49; technology/industry standard 0.11
  • silver, score = 1.24: metal/material/mineral resource/mineral 0.638; color 0.156
  • python, score = 1.29: language 0.667; snake/animal/reptile/skin 0.193
  • apple, score = 1.41: fruit/food/tree 0.537; company/brand 0.271

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

A Concept View of "Microsoft"

[Figure: Microsoft linked to concepts at different levels of abstraction — company, software company, international company, technology leader, largest desktop OS vendor]

Basic-level Conceptualization (BLC)[Rosch et al 1976]

KFC

BMW

Basic-level conceptualization

How to Make BLC

bull Naive approachesbull Typicality an important measure for understanding the

relationship between an object and its concept

bull Pointwise Mutual Information (PMI) a common measure of the strength of association between two terms

bird

Naive Approach 1 Typicality

P(robin|bird) > P(penguin|bird): "robin" is a more typical bird than "penguin"

P(USA|country) > P(Seychelles|country): "USA" is a more typical country than "Seychelles"

Using Typicality for BLC

• Associate each isA relationship (e isA c) with the typicality scores P(e|c) and P(c|e):

P(e|c) = n(c, e) / n(c),   P(c|e) = n(c, e) / n(e)

• P(e|c) indicates how typical (or popular) e is for the given concept c

• P(c|e) indicates how typical (or popular) the concept c is given e

• However, for "Microsoft" the general concept "company" has high typicality P(c|e), while the very specific concept "largest desktop OS vendor" has high typicality P(e|c) — so neither score alone identifies the basic-level concept

Naive Approach 2 PMI[Manning and Schutze 1999]

bull Pointwise mutual information (PMI) is a measure of association used in information theory and statistics

• Consider using the PMI between concept c and instance e to find the basic-level concepts, as follows:

PMI(e, c) = log [ P(e, c) / (P(e) P(c)) ] = log P(e|c) − log P(e)

• However, in basic-level categorization we are interested in finding a concept for a given e, which means P(e) is a constant.

• Thus ranking by PMI(e, c) is the same as ranking by P(e|c)

Using Rep(e, c) for BLC [Wang et al 2015b]

• The measure Rep(e, c) = P(c|e) · P(e|c) means:

• (Relation to PMI) Taking the logarithm of the scoring function:

log Rep(e, c) = log [ P(c|e) · P(e|c) ] = log [ P(e, c)/P(e) · P(e, c)/P(c) ] = log [ P(e, c)^2 / (P(e) P(c)) ] = PMI(e, c) + log P(e, c) = PMI^2

• (Relation to commute time) The commute time between an instance e and a concept c is

Time(e, c) = Σ_{k=1}^{∞} 2k · P_k(e, c) = Σ_{k=1}^{T} 2k · P_k(e, c) + Σ_{k=T+1}^{∞} 2k · P_k(e, c)
          ≥ Σ_{k=1}^{T} 2k · P_k(e, c) + 2(T+1) · (1 − Σ_{k=1}^{T} P_k(e, c)) = 4 − 2 · Rep(e, c)

Given e the c should be its typical concept (shortest distance)

Given c the e should be its typical instance (shortest distance)

A process of finding concept nodes having shortest expected distance with e
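A small sketch of ranking candidate concepts of an entity by Rep(e, c) = P(c|e) · P(e|c), computed from toy isA counts n(c, e); the counts are invented to mimic the Microsoft example.

```python
# Sketch: rank candidate basic-level concepts of an entity by Rep(e,c) = P(c|e) * P(e|c).
n = {("company", "microsoft"): 9000, ("company", "apple"): 9500, ("company", "google"): 9000,
     ("company", "ibm"): 8000, ("company", "other firms"): 22000,   # long tail under "company"
     ("software company", "microsoft"): 4000, ("software company", "adobe"): 1500,
     ("largest desktop os vendor", "microsoft"): 50}                # toy isA counts n(c, e)

def rep_ranking(e):
    n_e = sum(cnt for (c, e2), cnt in n.items() if e2 == e)
    scores = {}
    for (c, e2), cnt in n.items():
        if e2 != e:
            continue
        n_c = sum(v for (c2, _), v in n.items() if c2 == c)
        scores[c] = (cnt / n_e) * (cnt / n_c)            # P(c|e) * P(e|c)
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(rep_ranking("microsoft"))
# -> "software company" first: "company" is too general (low P(e|c)),
#    "largest desktop os vendor" too specific (low P(c|e)).
```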

Precision / NDCG for different measures, k = 1, 2, 3, 5, 10, 15, 20 (no smoothing):

MI(e) 0769 0692 0705 0685 0719 0705 0690

PMI3(e) 0885 0769 0756 0800 0754 0733 0721

NPMI(e) 0692 0692 0667 0638 0627 0610 0610

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0500 0462 0526 0523 0523 0510 0521

Rep(e) 0846 0865 0872 0862 0758 0731 0719

Smoothing=0001

MI(e) 0577 0615 0628 0600 0612 0605 0592

PMI3(e) 0731 0673 0692 0654 0669 0644 0623

NPMI(e) 0923 0827 0769 0746 0731 0695 0671

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0554

Typicality P(e|c) 0885 0865 0872 0831 0785 0741 0704

Rep(e) 0846 0731 0718 0723 0700 0669 0638

Smoothing=00001

MI(e) 0615 0615 0654 0608 0635 0628 0612

PMI3(e) 0846 0731 0731 0715 0723 0685 0677

NPMI(e) 0885 0904 0885 0869 0823 0777 0752

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0885 0904 0910 0877 0831 0813 0777

Rep(e) 0923 0846 0833 0815 0781 0736 0719

Smoothing=1e-5

MI(e) 0615 0635 0667 0662 0677 0656 0646

PMI3(e) 0885 0769 0744 0777 0758 0731 0710

NPMI(e) 0885 0846 0872 0869 0831 0810 0787

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0769 0808 0846 0823 0808 0782 0765

Rep(e) 0885 0904 0872 0862 0812 0800 0767

Smoothing=1e-6

MI(e) 0769 0673 0705 0677 0700 0692 0679

PMI3(e) 0885 0769 0756 0785 0773 0726 0723

NPMI(e) 0885 0846 0821 0815 0750 0726 0719

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0538 0615 0615 0615 0608 0613 0615

Rep(e) 0846 0885 0897 0877 0788 0777 0765

Smoothing=1e-7

MI(e) 0769 0692 0705 0685 0719 0703 0688

PMI3(e) 0885 0769 0756 0792 0758 0736 0725

NPMI(e) 0769 0750 0718 0700 0650 0641 0633

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0500 0481 0526 0523 0531 0523 0523

Rep(e) 0846 0865 0872 0854 0765 0749 0733

No Smoothing 1 2 3 5 10 15 20

MI(e) 0516 0531 0519 0531 0562 0574 0594

PMI3(e) 0725 0664 0652 0660 0628 0631 0646

NPMI(e) 0599 0597 0579 0554 0540 0539 0549

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0401 0386 0396 0398 0401 0410 0428

Rep(e) 0758 0771 0745 0723 0656 0647 0661

Smoothing=1e-3

MI(e) 0374 0414 0441 0448 0473 0481 0495

PMI3(e) 0484 0511 0509 0502 0519 0525 0533

NPMI(e) 0692 0652 0607 0603 0585 0585 0592

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0460

Typicality P(e|c) 0703 0697 0704 0681 0637 0628 0626

Rep(e) 0621 0580 0554 0561 0554 0555 0559

Smoothing=1e-4

MI(e) 0407 0430 0458 0462 0492 0503 0512

PMI3(e) 0648 0604 0579 0575 0578 0576 0590

NPMI(e) 0747 0777 0761 0737 0700 0685 0688

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0791 0795 0802 0767 0738 0729 0724

Rep(e) 0758 0714 0711 0689 0653 0636 0653

Smoothing=1e-5

MI(e) 0429 0465 0478 0501 0517 0528 0545

PMI3(e) 0725 0647 0642 0642 0627 0624 0638

NPMI(e) 0813 0779 0778 0765 0730 0723 0729

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0709 0728 0735 0722 0702 0696 0703

Rep(e) 0791 0787 0762 0739 0707 0703 0706

Smoothing=1e-6

MI(e) 0516 0510 0515 0526 0546 0563 0579

PMI3(e) 0725 0655 0651 0654 0641 0631 0649

NPMI(e) 0791 0766 0732 0728 0673 0659 0668

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0495 0516 0520 0508 0512 0521 0540

Rep(e) 0758 0784 0767 0755 0691 0686 0694

Smoothing=1e-7

MI(e) 0516 0531 0519 0530 0562 0571 0592

PMI3(e) 0725 0664 0652 0658 0630 0631 0647

NPMI(e) 0670 0655 0633 0604 0575 0570 0581

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0423 0421 0415 0407 0414 0424 0438

Rep(e) 0758 0771 0745 0725 0663 0661 0668

Evaluations on Different Measures for BLC

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

What is Semantic Similarity?
• Are the following instance pairs similar?

• <apple, microsoft>
• <apple, pear>
• <apple, fruit>
• <apple, food>
• <apple, ipad>
• <car, journey>

Approaches on Term Similarity

• Categories of approaches for semantic similarity:
  • String based approach
  • Knowledge based approach: use preexisting thesauri, taxonomies or encyclopedias such as WordNet
  • Corpus based approach: use contexts of terms extracted from web pages, web search snippets or other text repositories
  • Embedding based approach: introduced in detail in "Part 3: Implicit Understanding"

Approaches on Term Similarity (2)

• Categories:

[Figure: taxonomy of approaches — string based approaches; knowledge based approaches (WordNet), split into path length / lexical chain-based and information content-based; corpus based approaches, split into graph learning algorithm based and snippet search based. Representative works: Rada 1989, Resnik 1995, Jcn 1997, Lin 1998, Sánchez 2011, Agirre 2010, Alvarez 2007, HunTray 2005, Hirst 1998, Do 2009, Bol 2011, Chen 2006, Ban 2002; state-of-the-art approaches highlighted]

• Framework

Term Similarity Using Semantic Networks [Li et al 2013, Li et al 2015]

[Flowchart — Step 1, type checking: a term pair <t1, t2> is classified as a concept pair, an entity pair, or a concept–entity pair.
Step 2, context representation (vectors): for concept pairs, collect entity-distribution contexts; for entity pairs, collect concept-distribution contexts and cluster the concepts; for concept–entity pairs, collect the concept clusters of the entity term and the top-k concepts of each cluster.
Step 3, context similarity: evaluate the context vectors, e.g. Cosine(T(t1), T(t2)), the maximum over cluster pairs of Cosine(Cx(t1), Cy(t2)), or the maximum of sim(t2, cx) over the selected concepts cx.]

An example [Li et al 2013 Li et al 2015]

For example, <banana, pear>:

Step 1, type checking: <banana, pear> is an entity pair.
Step 2, context representation: collect the concept-distribution context of each entity.
Step 3, context similarity: similarity evaluation Cosine(T(t1), T(t2)) = 0.916

Examples:

Term 1              | Term 2            | Similarity
lunch               | dinner            | 0.9987
tiger               | jaguar            | 0.9792
car                 | plane             | 0.9711
television          | radio             | 0.9465
technology company  | microsoft         | 0.8208
high impact sport   | competitive sport | 0.8155
employer            | large corporation | 0.5353
fruit               | green pepper      | 0.2949
travel              | meal              | 0.0426
music               | lunch             | 0.0116
alcoholic beverage  | sports equipment  | 0.0314
company             | table tennis      | 0.0003

http://adapt.seiee.sjtu.edu.cn/similarity/SimCompleteResults.pdf


18 6692376 softwareplatform210126322725878

20 6444603 testdisease239774028397537

27 5994205 clothesbreed98773996282851

19 5913545 penaltycrime200544192793488

25 5848804 taxstate240081818612579

16 5465424 saucemeat183592863621553

18 4809389 credit cardcountry142919087972152

14 4730792 foodholiday14554140330924

11 4536199 modgame257163856882439

29 4350954 garmentsport471533326845442

23 3994886 career informationprofessional732726483731257

15 386065 songinstrument128189481818135

18 378213 baitfish780426514113169

22 3722948 study guidebook508339765053921

19 3408953 pluginsbrowser550326072627126

14 3305753 recipemeat882779863422951

18 3214226 currencycountry110825444188352

13 3180272 lenscamera186081673263957

9 316973 decorationholiday130055844126533

16 314875 foodanimal7338544366514

game platform

game device

video game platform

game console game pad

game gaming platform

Game (Head) Platform (Modifier)

angry birds android

angry birds ios

angry birds windows 10

hellip hellip

Detection

Head Modifier Relationship

bull Train a classifier on

(head-embedding modifier-embedding)

bull Training data bull Positive (head modifier)bull Negative (modifier head)

bull Precision gt= 09 Recall gt= 09

bull Disadvantage not interpretable

Syntactic Parsing based on HM

bull Information is incompletebull Preposition and other function words

bull Within a noun compound el capitan macbook pro

bull Why not train a parser for web queries

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges Short Texts Lack Grammatical Signalsbull Lack function words word order

bull ldquotoys queriesrdquo has ambiguous intent

bull ldquodistance earth moonrdquo has clear intentbull many equivalent forms ldquoearth moon distancerdquo ldquoearth

distance moonrdquo hellip

Challenges Syntactic Parsing of Queries

bullNo standard

bullNo ground-truth

Why is syntactic parsing of queries even a legitimate problem

Derive Syntax from Semantics[Sun et al 2016]

bull Query ldquothai food houstonrdquo

bull Clicked sentence

bull Project dependency to the query

A Treebank for Short Texts

bull Given query 119902

bull Given 119902rsquos clicked sentence 119904

bull Parse each 119904

bull Project dependency from 119904 to 119902

bull Aggregate dependencies

Algorithm of Projection

Result Examples

Results

bull Random queries

QueryParser UAS 083 LAS 075Stanford UAS 072 LAS 064

bull Queries with no function words

QueryParser UAS 082 LAS 073Stanford UAS 070 LAS 061

bull Queries with function words

QueryParser UAS 090 LAS 085Stanford UAS 086 LAS 080

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

bull Measuring similarity between two short texts and sentences

bull Basic idea word-by-word comparison using embedding vector

bull Use saliency-weighted semantic graph to computer similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired

Bins of all edges Bins of max edges

119908isin119904119868

119868119863119865(119908) sdot)119904119890119898(119908 119904119904) sdot (1198961 + 1

൰119904119890119898(119908 119904119904) + 1198961 sdot (1 minus 119887 + 119887 sdot|119904119904|119886119907119892119897

Similarity measurement

termShort texts

Inspired by BM25

Semantic similarity

119891119904119905119904(119904119897 119904119904) =

From the Concept View

From the Concept View [Wang et al 2015a]

Co-occurrence Network

Bags of Concepts

Semantic Network

Short Text 1

Short Text 2

Concept Vector 1[(c1 score1) (c2 score2)hellip]

Concept Vector 2[(c1rsquo score1rsquo) (c2rsquo score2rsquo)hellip]

Similarity

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

bull Explicit short text understanding benefit lot of application scenariosbull Adssearch semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

000

100

200

300

400

500

600

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

000

010

020

030

040

050

060

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

Mainline Ads Sidebar Ads

Definition Mining [Hao et al 2016]

bull Definition scenarios search engines QnA etc

bull Why Conceptualization is useful for definition miningbull Examples ldquoWhat is Emphysemardquo

Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

bull This sentence has the form of definitionbull Embedding is helpful to some extent but it also return high similarity

score for (emphysema disease) and (emphysema smoking)

bull Conceptualization can provide strong semanticsbull Contextual embedding can also provide semantic similarity beyond Is-A

Answer 1

Answer 2

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

OfflineOffline OnlineOnline

Original Short textjustin bieber graduates

hellip

Knowledge base

Conceptualiztion

Concept Vector

Entity Extraction

Candidates Generation

Classification amp Ranking

Model LearningModel Learning

Concept Weighting

Model Model NModel i

Concept Model Concept Model

Class 1 Class NClass i

TrainingData

ltMusic Scoregt

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept SpaceArticle titlestagsin this category

119901119894

119901119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

hellip

hellip

120596119894

120596119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

120596119894

120596119895119901119894

119901119895

Precision performance on each category [Wang et al 2014a]

BocSTC LM_ch SVMVSM_cosi

neLM_d Entity_ESA

Movie 071 091 084 081 072 056

Money 097 095 054 057 052 074

Music 097 090 088 073 068 058

TV 096 046 092 056 051 055

0304050607080910

Pre

cisi

on

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledgeltigt in SIGMOD 2008

bull [ Auer et al 2007 ] Soumlren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Boumlrschinger Caumlcilia Zirn and AnasElghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and XindongWu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny BoyesBraem Basic objects in natural categories Cognitive psychology 8(3)382ndash439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny ZhuToward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 16: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

NELL Never-Ending Language Learning

• NELL is continually learning facts from the web; its resources are publicly available

• NELL research team at CMU

• Homepage: http://rtw.ml.cmu.edu/rtw/
• Download: http://rtw.ml.cmu.edu/rtw/resources

News

Authors

URLs

Probase [Wu et al 2012]

• Probase is a semantic network that makes machines "aware" of the mental world of human beings, so that machines can better understand human communication

Brief Introduction

Probase network

Edges: isA (concept, entities), isPropertyOf (attributes), co-occurrence (isCEOof, LocatedIn, etc.)

Nodes: Concepts ("Spanish Artists"), Entities ("Pablo Picasso"), Attributes ("Birthday"), Verbs/Adjectives ("Eat", "Sweet")

• 5,401,933 unique concepts
• 12,551,613 unique instances
• 87,603,947 isA relations

Example concepts: countries, basic watercolor techniques, celebrity wedding dress designers

Probase

• Microsoft Research

• Public release coming soon in Aug/Sept 2016
• Project homepage: http://research.microsoft.com/probase

Concepts

Authors

URLs

Probase isA error rate: < 1.1%, and < 10% for a random pair

Freebase [Bollacker et al 2008]

• Freebase is a well-known collaborative knowledge base consisting of data composed mainly by its community

• Freebase contains more than 23 million entities
• Freebase contains 1.9 billion triples
• Each triple is organized in the form of <subject> <predicate> <object>

Brief Introduction

Statistics

• Freebase is a collection of facts
• Freebase only contains nodes and links
• Freebase is a labeled graph

Freebase → Wikidata

• Freebase data was integrated into Wikidata
• The Freebase API will be completely shut down on Aug 31, 2016, replaced by the Google Knowledge Graph API

• Freebase Community

• Homepage: http://wiki.freebase.com/wiki/Main_Page
• Download: https://developers.google.com/freebase
• Wikidata: https://www.wikidata.org

News

Authors

URLs

Google Knowledge Graph

• Knowledge Graph is a knowledge base used by Google to enhance its search engine's results with semantic-search information gathered from a wide variety of sources

• 570 million objects and more than 18 billion facts about relationships between different objects

• Google Inc.

• Homepage: https://www.google.com/intl/es419/insidesearch/features/search/knowledge.html

Brief Introduction

Statistics

Sample

Authors

URLs

YAGO [Suchanek et al 2007]

• YAGO is a huge semantic knowledge base derived from GeoNames, WordNet and Wikipedia (10 Wikipedias in different languages)

• More than 10 million entities (persons, organizations, cities, etc.)
• More than 120 million facts about entities
• More than 35,000 classes assigned to entities
• Many of its facts and entities have a temporal and a spatial dimension attached

Brief Introduction

Sample: <Albert_Einstein> <isMarriedTo> <Elsa_Einstein>

Statistics

YAGO

News
• An evaluated version of YAGO3 (combining information from Wikipedias in different languages) was released [15 Sep 2015]

Authors
• Max Planck Institute for Informatics in Saarbrücken, Germany, and the DBWeb group at Télécom ParisTech University

URLs
• Homepage: http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago
• Download: http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/downloads

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Statistics of Search Queries

(a) By traffic: 1 term 44%, 2 terms 29%, 3 terms 17%, 4 terms 7%, 5 terms 2%, more than 5 terms 1%

(b) By # of distinct queries: 1 term 10%, 2 terms 26%, 3 terms 34%, 4 terms 19%, 5 terms 7%, more than 5 terms 4%

Example queries: Pokémon Go, Microsoft HoloLens

(Corresponding pie charts break queries down by number of instances: 1 instance, 2 instances, 3, 4, 5, more than 5 instances.)

If the short text is a single instance…

• Python
• Microsoft
• Apple
• …

Single Instance Understanding

• Is this instance ambiguous?

• What are its basic-level concepts?

• What are its similar instances?

Word Ambiguity
• Word sense disambiguation relies on dictionaries (WordNet)

"Take a seat on this chair."

"The chair of the Math Department."

Instance Ambiguity

• Instance sense disambiguation: extra knowledge needed

"I have an apple pie for lunch."

"He bought an apple ipad."

Here "apple" is a proper noun.

Ambiguity [Hua et al 2016]

bull Many instances are ambiguous

bull Intuition ambiguous instances have multiple senses

short text instance sense

population china china country

glass vs china china fragile item

pear apple apple fruit

microsoft apple apple company

read harry potter harry potter book

watch harry potter harry potter movie

age of harry potter harry potter character

Pre-definition for Ambiguity (1) Sense [Hua et al 2016]

• What is a Sense in semantic networks?
• A sense is a hierarchy of concept clusters

region

country state city

creature

animal

predator

crop food

fruit vegetable meat

Germany

Pre-definition for Ambiguity (2) Concept Cluster [Li et al 2013 Li et al 2015]

• What is a Concept Cluster (CL)?
• Cluster similar concepts into a concept cluster using a K-Means-like approach (k-Medoids)

FruitFresh fruit

JuiceTropical fruit

BerryExotic fruit

Seasonal fruitFruit juiceCitrus fruitSoft fruitDry fruit

Wild fruitLocal fruit

hellip

company

CompanyClientFirm

ManufacturerCorporation

large companyRivalGiant

big companylocal company

large corporationinternational

companyhellip Fruit

Definitions of Instance Ambiguity [Hua et al 2016]

• 3 levels of instance ambiguity

• Level 0: unambiguous
  • Contains only 1 sense
  • E.g., dog (animal), beijing (city), potato (vegetable)

• Level 1: unambiguous and ambiguous both make sense
  • Contains 2 or more senses, but these senses are related
  • E.g., google (company & search engine), french (language & country), truck (vehicle & public transport service)

• Level 2: ambiguous
  • Contains 2 or more senses, and the senses are very different from each other
  • E.g., apple (fruit & company), jaguar (animal & company), python (animal & language)

Ambiguity Score

• Using the top-2 senses to calculate the ambiguity score:

$$score = \begin{cases} 0 & level = 0 \\ \frac{w(s_2|e)}{w(s_1|e)} \cdot \left(1 - similarity(s_1, s_2)\right) & level = 1 \\ 1 + \frac{w(sc_2|e)}{w(sc_1|e)} \cdot \left(1 - similarity(sc_1, sc_2)\right) & level = 2 \end{cases}$$

Denote the top-2 senses as $s_1$ and $s_2$, and the top-2 sense clusters as $sc_1$ and $sc_2$.

Denote the similarity of two sense clusters as the maximum similarity of their senses:
$$similarity(sc_1, sc_2) = \max similarity(s_i \in sc_1, s_j \in sc_2)$$

For an entity $e$, the weight (popularity) of a sense $s_i$ is the sum of the weights of its concept clusters:
$$w(s_i|e) = w(H_i|e) = \sum_{CL_j \in H_i} P(CL_j|e)$$

For an entity $e$, the weight (popularity) of a sense cluster $sc_i$ is the sum of the weights of its senses:
$$w(sc_i|e) = \sum_{s_j \in sc_i} w(s_j|e)$$
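To make the scoring concrete, here is a minimal Python sketch of the ambiguity score above. It collapses the level-1 and level-2 cases into one helper for brevity (level 2 really operates on sense clusters), and every name and number below (`sense_weight`, `toy_sim`, the cluster weights) is an illustrative assumption, not data from the paper.

```python
# Toy sketch of the ambiguity score; all weights and similarities are made up.

def sense_weight(sense):
    # w(s_i | e): sum of P(CL_j | e) over the sense's concept clusters
    return sum(sense["cluster_weights"])

def ambiguity_score(level, senses, sim):
    """senses: senses (or sense clusters) sorted by weight; the top-2 are used."""
    if level == 0:
        return 0.0
    s1, s2 = senses[0], senses[1]
    base = (sense_weight(s2) / sense_weight(s1)) * (1.0 - sim(s1, s2))
    return base if level == 1 else 1.0 + base

# "apple": a fruit sense and a company sense that are almost unrelated (level 2).
fruit   = {"name": "fruit",   "cluster_weights": [0.41, 0.13]}
company = {"name": "company", "cluster_weights": [0.22, 0.05]}
toy_sim = lambda a, b: 0.05          # assumed similarity between the two senses

print(round(ambiguity_score(2, [fruit, company], toy_sim), 3))   # 1.475
```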

Examples

• Level 0
  • california: country, state, city, region, institution (0.943)
  • fruit: food, product, snack, carbs, crop (0.827)
  • alcohol: substance, drug, solvent, food, addiction (0.523)
  • computer: device, product, electronics, technology, appliance (0.537)
  • coffee: beverage, product, food, crop, stimulant (0.73)
  • potato: vegetable, food, crop, carbs, product (0.896)
  • bean: food, vegetable, crop, legume, carbs (0.801)

Examples (cont.)

• Level 1
  • nike, score = 0.034: company, store (0.861); brand (0.035); shoe, product (0.033)
  • twitter, score = 0.035: website, tool (0.612); network (0.165); application (0.033); company (0.031)
  • facebook, score = 0.037: website, tool (0.595); network (0.17); company (0.053); application (0.029)
  • yahoo, score = 0.38: search engine (0.457); company, provider, account (0.281); website (0.0656)
  • google, score = 0.507: search engine (0.46); company, provider, organization (0.377); website (0.0449)

Examples (cont.)

• Level 2
  • jordan, score = 1.02: country, state, company, regime (0.92); shoe (0.02)
  • fox, score = 1.09: animal, predator, species (0.74); network (0.064); company (0.035)
  • puma, score = 1.15: brand, company, shoe (0.655); species, cat (0.116)
  • gold, score = 1.21: metal, material, mineral resource, mineral (0.62); color (0.128)

Examples (cont.)

• Level 2
  • soap, score = 1.22: product, toiletry, substance (0.49); technology, industry standard (0.11)
  • silver, score = 1.24: metal, material, mineral resource, mineral (0.638); color (0.156)
  • python, score = 1.29: language (0.667); snake, animal, reptile, skin (0.193)
  • apple, score = 1.41: fruit, food, tree (0.537); company, brand (0.271)

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

A Concept View of ldquoMicrosoftrdquo

company

largest desktop OS vendor

softwarecompany

international company

technology leader

Microsoft

largest desktop OS vendorcompany hellip hellip

software company

Basic-level Conceptualization (BLC)[Rosch et al 1976]

KFC

BMW

Basic-level conceptualization

How to Make BLC

bull Naive approachesbull Typicality an important measure for understanding the

relationship between an object and its concept

bull Pointwise Mutual Information (PMI) a common measure of the strength of association between two terms

bird

Naive Approach 1: Typicality

P(robin|bird) > P(penguin|bird)
"robin" is a more typical bird than "penguin"

country

Seychelles, USA

P(USA|country) > P(Seychelles|country)
"USA" is a more typical country than "Seychelles"

penguin, robin

Using Typicality for BLC

• Associate each isA relationship ($e$ isA $c$) with typicality scores $P(e|c)$ and $P(c|e)$:

$$P(e|c) = \frac{n(c, e)}{n(c)}, \qquad P(c|e) = \frac{n(c, e)}{n(e)}$$

• P(e|c) indicates how typical (or popular) e is in the given concept c

• P(c|e) indicates how typical (or popular) the concept c is given e

• However, for "Microsoft": "company" has high typicality P(c|e), while "largest desktop OS vendor" has high typicality P(e|c); neither score alone picks out the basic-level concept.

Naive Approach 2 PMI[Manning and Schutze 1999]

• Pointwise mutual information (PMI) is a measure of association used in information theory and statistics

• Consider using the PMI between concept c and instance e to find the basic-level concepts as follows:

$$PMI(e, c) = \log \frac{P(e, c)}{P(e)P(c)} = \log P(e|c) - \log P(e)$$

• However:
  • In basic level of categorization we are interested in finding a concept for a given e, which means P(e) is a constant
  • Thus ranking by PMI(e, c) is the same as ranking by P(e|c)

Using Rep(e c) for BLC [Wang et al 2015b]

• The measure $Rep(e, c) = P(c|e) \cdot P(e|c)$ means:
  • Given e, c should be its typical concept (shortest distance)
  • Given c, e should be its typical instance (shortest distance)

• (Relation to PMI) Taking the logarithm of the scoring function:

$$\log Rep(e, c) = \log \left( P(c|e) \cdot P(e|c) \right) = \log \left( \frac{P(e, c)}{P(e)} \cdot \frac{P(e, c)}{P(c)} \right) = \log \frac{P(e, c)^2}{P(e)P(c)} = PMI(e, c) + \log P(e, c) = PMI^2$$

• (Relation to commute time) The commute time between an instance e and a concept c is

$$Time(e, c) = \sum_{k=1}^{\infty} 2k \cdot P_k(e, c) = \sum_{k=1}^{T} 2k \cdot P_k(e, c) + \sum_{k=T+1}^{\infty} 2k \cdot P_k(e, c) \ge \sum_{k=1}^{T} 2k \cdot P_k(e, c) + 2(T+1)\left(1 - \sum_{k=1}^{T} P_k(e, c)\right) = 4 - 2 \cdot Rep(e, c) \quad (\text{taking } T = 1)$$

So ranking concepts by Rep(e, c) is a process of finding the concept nodes with the shortest expected distance to e.
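The following is a minimal Python sketch (not the authors' implementation) of ranking concepts by Rep(e, c) = P(c|e)·P(e|c) from raw isA counts; the counts in `isa_counts` are invented toy values, not Probase statistics.

```python
# Toy sketch of basic-level conceptualization with Rep(e, c) = P(c|e) * P(e|c).

from collections import defaultdict

isa_counts = {                                       # n(c, e)
    ("company", "microsoft"): 13000,
    ("company", "acme corp"): 200000,                # "company" is huge and unspecific
    ("software company", "microsoft"): 5200,
    ("software company", "google"): 3100,
    ("largest desktop OS vendor", "microsoft"): 12,  # tiny, over-specific concept
}

n_c, n_e = defaultdict(int), defaultdict(int)
for (c, e), n in isa_counts.items():
    n_c[c] += n
    n_e[e] += n

def rep(e, c):
    n_ce = isa_counts.get((c, e), 0)
    return (n_ce / n_c[c]) * (n_ce / n_e[e])         # P(e|c) * P(c|e)

def basic_level_concepts(e, k=3):
    concepts = {c for (c, ee) in isa_counts if ee == e}
    return sorted(concepts, key=lambda c: rep(e, c), reverse=True)[:k]

print(basic_level_concepts("microsoft"))
# For these toy counts "software company" outranks both the over-general
# "company" and the over-specific "largest desktop OS vendor".
```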

Precision / NDCG, no smoothing; columns: k = 1, 2, 3, 5, 10, 15, 20

MI(e) 0769 0692 0705 0685 0719 0705 0690

PMI3(e) 0885 0769 0756 0800 0754 0733 0721

NPMI(e) 0692 0692 0667 0638 0627 0610 0610

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0500 0462 0526 0523 0523 0510 0521

Rep(e) 0846 0865 0872 0862 0758 0731 0719

Smoothing=0001

MI(e) 0577 0615 0628 0600 0612 0605 0592

PMI3(e) 0731 0673 0692 0654 0669 0644 0623

NPMI(e) 0923 0827 0769 0746 0731 0695 0671

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0554

Typicality P(e|c) 0885 0865 0872 0831 0785 0741 0704

Rep(e) 0846 0731 0718 0723 0700 0669 0638

Smoothing=00001

MI(e) 0615 0615 0654 0608 0635 0628 0612

PMI3(e) 0846 0731 0731 0715 0723 0685 0677

NPMI(e) 0885 0904 0885 0869 0823 0777 0752

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0885 0904 0910 0877 0831 0813 0777

Rep(e) 0923 0846 0833 0815 0781 0736 0719

Smoothing=1e-5

MI(e) 0615 0635 0667 0662 0677 0656 0646

PMI3(e) 0885 0769 0744 0777 0758 0731 0710

NPMI(e) 0885 0846 0872 0869 0831 0810 0787

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0769 0808 0846 0823 0808 0782 0765

Rep(e) 0885 0904 0872 0862 0812 0800 0767

Smoothing=1e-6

MI(e) 0769 0673 0705 0677 0700 0692 0679

PMI3(e) 0885 0769 0756 0785 0773 0726 0723

NPMI(e) 0885 0846 0821 0815 0750 0726 0719

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0538 0615 0615 0615 0608 0613 0615

Rep(e) 0846 0885 0897 0877 0788 0777 0765

Smoothing=1e-7

MI(e) 0769 0692 0705 0685 0719 0703 0688

PMI3(e) 0885 0769 0756 0792 0758 0736 0725

NPMI(e) 0769 0750 0718 0700 0650 0641 0633

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0500 0481 0526 0523 0531 0523 0523

Rep(e) 0846 0865 0872 0854 0765 0749 0733

No smoothing; columns: k = 1, 2, 3, 5, 10, 15, 20

MI(e) 0516 0531 0519 0531 0562 0574 0594

PMI3(e) 0725 0664 0652 0660 0628 0631 0646

NPMI(e) 0599 0597 0579 0554 0540 0539 0549

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0401 0386 0396 0398 0401 0410 0428

Rep(e) 0758 0771 0745 0723 0656 0647 0661

Smoothing=1e-3

MI(e) 0374 0414 0441 0448 0473 0481 0495

PMI3(e) 0484 0511 0509 0502 0519 0525 0533

NPMI(e) 0692 0652 0607 0603 0585 0585 0592

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0460

Typicality P(e|c) 0703 0697 0704 0681 0637 0628 0626

Rep(e) 0621 0580 0554 0561 0554 0555 0559

Smoothing=1e-4

MI(e) 0407 0430 0458 0462 0492 0503 0512

PMI3(e) 0648 0604 0579 0575 0578 0576 0590

NPMI(e) 0747 0777 0761 0737 0700 0685 0688

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0791 0795 0802 0767 0738 0729 0724

Rep(e) 0758 0714 0711 0689 0653 0636 0653

Smoothing=1e-5

MI(e) 0429 0465 0478 0501 0517 0528 0545

PMI3(e) 0725 0647 0642 0642 0627 0624 0638

NPMI(e) 0813 0779 0778 0765 0730 0723 0729

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0709 0728 0735 0722 0702 0696 0703

Rep(e) 0791 0787 0762 0739 0707 0703 0706

Smoothing=1e-6

MI(e) 0516 0510 0515 0526 0546 0563 0579

PMI3(e) 0725 0655 0651 0654 0641 0631 0649

NPMI(e) 0791 0766 0732 0728 0673 0659 0668

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0495 0516 0520 0508 0512 0521 0540

Rep(e) 0758 0784 0767 0755 0691 0686 0694

Smoothing=1e-7

MI(e) 0516 0531 0519 0530 0562 0571 0592

PMI3(e) 0725 0664 0652 0658 0630 0631 0647

NPMI(e) 0670 0655 0633 0604 0575 0570 0581

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0423 0421 0415 0407 0414 0424 0438

Rep(e) 0758 0771 0745 0725 0663 0661 0668

Evaluations on Different Measures for BLC

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

What is Semantic Similarity?
• Are the following instance pairs similar?

• <apple, microsoft>

• <apple, pear>

• <apple, fruit>

• <apple, food>

• <apple, ipad>

• <car, journey>

Approaches on Term Similarity

• Categories of approaches for semantic similarity:
  • String based approach

  • Knowledge based approach
    • Use preexisting thesauri, taxonomies or encyclopedias, such as WordNet

  • Corpus based approach
    • Use contexts of terms extracted from web pages, web search snippets or other text repositories

  • Embedding based approach
    • Will be introduced in detail in "Part 3: Implicit Understanding"

Approaches on Term Similarity (2)

• Categories (figure): string based approaches; knowledge based approaches (WordNet), including path length / lexical chain-based and information content-based methods; and corpus based approaches, including graph learning algorithm based and snippet search based methods. Representative works in the figure: Rada 1989, Resnik 1995, Jcn 1997, Lin 1998, Hirst 1998, Ban 2002, HunTray 2005, Chen 2006, Alvarez 2007, Do 2009, Agirre 2010, Bol 2011, Sánchez 2011 (state-of-the-art approaches highlighted).

• Framework

Term Similarity Using Semantic Networks [Li et al 2013, Li et al 2015]

Input: term pairs <t1, t2>

Step 1: Type Checking – the pair is a concept pair, an entity pair, or a concept-entity pair

Step 2: Context Representation (Vector)
  • Concept pair: entity-distribution context collection, giving context vectors T(t1) and T(t2)
  • Entity pair: concept-distribution context collection, then concept clustering, giving cluster context vectors Cx(t1) and Cy(t2)
  • Concept-entity pair: concept collection for the entity term t1, then concept clustering; for each cluster Ci(t1) select the top-k concepts cx

Step 3: Context Similarity
  • Concept pair: Cosine(T(t1), T(t2))
  • Entity pair: max over (x, y) of Cosine(Cx(t1), Cy(t2))
  • Concept-entity pair: for each pair <t2, cx>, take max sim(t2, cx) as the similarity of <t1, t2>

An example [Li et al 2013, Li et al 2015]

For example: <banana, pear>

Step 1: Type Checking – <banana, pear> is an entity pair
Step 2: Context Representation (Vector) – concept context collection for each entity
Step 3: Context Similarity – Similarity Evaluation: Cosine(T(t1), T(t2)) = 0.916

Examples

Term 1 | Term 2 | Similarity
lunch | dinner | 0.9987
tiger | jaguar | 0.9792
car | plane | 0.9711
television | radio | 0.9465
technology company | microsoft | 0.8208
high impact sport | competitive sport | 0.8155
employer | large corporation | 0.5353
fruit | green pepper | 0.2949
travel | meal | 0.0426
music | lunch | 0.0116
alcoholic beverage | sports equipment | 0.0314
company | table tennis | 0.0003

Full results: http://adapt.seiee.sjtu.edu.cn/similarity/SimCompleteResults.pdf
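As a rough illustration of the entity-entity branch of the framework above, the sketch below represents each term by a bag of concept weights from an isA network and scores the pair by cosine similarity; the `concept_context` weights are assumed toy values, not output of the actual system.

```python
# Toy sketch: term similarity as cosine between concept-context vectors.

import math

def cosine(u, v):
    keys = set(u) | set(v)
    dot = sum(u.get(k, 0.0) * v.get(k, 0.0) for k in keys)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

concept_context = {   # assumed P(concept | entity) style weights
    "banana": {"fruit": 0.52, "tropical fruit": 0.21, "food": 0.18, "crop": 0.09},
    "pear":   {"fruit": 0.57, "tree": 0.11, "food": 0.23, "crop": 0.09},
    "microsoft": {"company": 0.61, "software company": 0.27, "brand": 0.12},
}

print(round(cosine(concept_context["banana"], concept_context["pear"]), 3))      # high (~0.92)
print(round(cosine(concept_context["banana"], concept_context["microsoft"]), 3)) # 0.0
```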

Statistics of Search Queries

(a) By traffic: 1 term 44%, 2 terms 29%, 3 terms 17%, 4 terms 7%, 5 terms 2%, more than 5 terms 1%

(b) By # of distinct queries: 1 term 10%, 2 terms 26%, 3 terms 34%, 4 terms 19%, 5 terms 7%, more than 5 terms 4%

Example queries: Pokémon Go, Microsoft HoloLens

(Corresponding pie charts break queries down by number of instances: 1 instance, 2 instances, 3, 4, 5, more than 5 instances.)

If the short text has context for the instance…

• python tutorial
• dangerous python
• moon earth distance
• …

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Supervised Segmentation [Bergsma et al 2007]

• Problem: divide a query into semantic units

• Approach: turn segmentation into position-based binary classification

Example Query: "two man power saw"

Candidate segmentations:
[two man] [power saw]
[two] [man] [power saw]
[two] [man power] [saw]

Input: a query and its positions
Output: the decision for making a segmentation break at each position

Supervised Segmentation

• Features
  • Decision-boundary features: e.g., indicator features such as the POS tags at the position, and position features (forward/backward)

  • Statistical features: mutual information between the left and right parts, e.g., "Bank loan | amortization schedule"

  • Context features

  • Dependency features: context information, e.g., in "female bus driver", "female" depends on "driver"

Supervised Segmentation

• Segmentation Overview

Input query: "two man power saw"

An SVM classifier, using the learned features, makes a break / no-break decision at each position between adjacent words ("two | man | power | saw").

Output: segmentation decision for each position (yes/no)
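Below is a minimal sketch of the position-based binary classification idea, not Bergsma & Wang's full feature set or data: each gap between adjacent words becomes one training example with a couple of toy features (here just a PMI score and a position), and a logistic-regression classifier stands in for the SVM. All corpus counts and labels are invented.

```python
# Toy sketch: query segmentation as per-gap binary classification.

import math
from sklearn.linear_model import LogisticRegression

unigram = {"two": 500, "man": 400, "power": 300, "saw": 250, "new": 900, "york": 350}
bigram  = {("power", "saw"): 120, ("new", "york"): 300, ("two", "man"): 40}
N = 10_000

def pmi(w1, w2):
    p12 = bigram.get((w1, w2), 0.5) / N
    return math.log(p12 / ((unigram.get(w1, 1) / N) * (unigram.get(w2, 1) / N)))

def gap_features(words, i):
    # features for the gap between words[i] and words[i + 1]
    return [pmi(words[i], words[i + 1]), i / len(words)]

# toy training data: 1 = insert a break at this gap, 0 = keep the words together
train = [(["two", "man"], 0, 0), (["power", "saw"], 0, 0),
         (["new", "york"], 0, 0), (["man", "power"], 0, 1)]
X = [gap_features(w, i) for w, i, _ in train]
y = [label for _, _, label in train]
clf = LogisticRegression().fit(X, y)

query = ["two", "man", "power", "saw"]
breaks = [int(clf.predict([gap_features(query, i)])[0]) for i in range(len(query) - 1)]
print(breaks)   # e.g. [0, 1, 0]  ->  [two man] [power saw]
```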

Unsupervised Segmentation [Tan et al 2008]

• Unsupervised learning for query segmentation

Probability of a generated segmentation S for query Q (with segments $s_i$):

$$P(S|Q) = P(s_1) P(s_2|s_1) \cdots P(s_m|s_1 s_2 \cdots s_{m-1}) \approx \prod_{s_i \in S} P(s_i) \quad \text{(unigram model)}$$

A position is a valid segment boundary if and only if the pointwise mutual information between the two segments resulting from the split is negative:

$$MI(s_k, s_{k+1}) = \log \frac{P_c([s_k\ s_{k+1}])}{P_c(s_k) \cdot P_c(s_{k+1})} < 0$$

Example: "new york times subscription"

$$\log \frac{P_c([\text{new york}])}{P_c(\text{new}) \cdot P_c(\text{york})} > 0 \;\Rightarrow\; \text{no segment boundary between "new" and "york"}$$

Unsupervised Segmentation

• Find the top-k segmentations by dynamic programming

• Use EM optimization on the fly

Input: query $w_1 w_2 \cdots w_n$ (the words in the query) and a concept probability distribution
Output: the top-k segmentations with the highest likelihood
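The sketch below shows the dynamic-programming part only (best single segmentation under the unigram model), with assumed segment probabilities in `seg_prob`; the EM re-estimation and top-k bookkeeping of the actual method are omitted.

```python
# Toy sketch: best segmentation under a unigram segment model, via DP.

import math
from functools import lru_cache

seg_prob = {   # assumed unigram segment probabilities P(s)
    "new": 1e-3, "york": 4e-4, "times": 6e-4, "subscription": 2e-4,
    "new york": 8e-4, "new york times": 5e-4, "york times": 1e-6,
    "times subscription": 1e-7, "new york times subscription": 1e-9,
}

def best_segmentation(words):
    @lru_cache(maxsize=None)
    def best(i):
        if i == len(words):
            return (0.0, ())
        candidates = []
        for j in range(i + 1, len(words) + 1):
            seg = " ".join(words[i:j])
            p = seg_prob.get(seg, 1e-12)          # unseen segments get a tiny floor
            tail_score, tail = best(j)
            candidates.append((math.log(p) + tail_score, (seg,) + tail))
        return max(candidates)
    return best(0)[1]

print(best_segmentation(("new", "york", "times", "subscription")))
# ('new york times', 'subscription')
```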

Exploit Click-through [Li et al 2011]

• Motivation
  • Probabilistic query segmentation
  • Use click-through data

Input Query: "bank of america online banking"

Output: top-3 segmentations
[bank of america] [online banking]    0.502
[bank of america online banking]      0.428
[bank of] [america] [online banking]  0.001

Click data: Q → URL → D (query → clicked document)

Exploit Click-through

• Segmentation Model

An interpolated model: global information + click-through information

Example: query "[credit card] [bank of America]"

Clicked HTML documents:
1. bank of america credit cards contact us overview
2. secured visa credit card from bank of america
3. credit cards overview: find the right bank of america credit card for you

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Sense Changes with Different Context

watch harry potter read harry potter age harry potter

Movie Book Character

harry potter walkthrough

Game

Entity Recognition in Query [Guo et al 2009]

• Motivation

Detect a named entity in a short text and categorize it

Example: "harry potter walkthrough" (a single-named-entity query)

Triple <e, t, c> = ("harry potter", "walkthrough", "game")
  • e: the (ambiguous) named entity
  • t: the context terms
  • c: the class of the entity

Entity Recognition in Query

• Probabilistic Generative Model

Goal: given a query q, find the triple <e, t, c> that maximizes the probability of generating the triple,

$$\Pr(e, t, c) = \Pr(e) \cdot \Pr(c|e) \cdot \Pr(t|c)$$

assuming the context depends only on the class (e.g., "walkthrough" depends only on "game", not on "harry potter").

Objective: given query q, find $\arg\max_{(e,t,c)} \Pr(e, t, c)$.

The problem then becomes how to estimate Pr(e), Pr(c|e) and Pr(t|c).
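A minimal sketch of scoring interpretations with this factorization is shown below; every probability table is an invented toy value, and the query handling (splitting off the entity string to get the context) is deliberately simplistic.

```python
# Toy sketch of Pr(e) * Pr(c|e) * Pr(t|c) scoring for a single-named-entity query.

p_e = {"harry potter": 0.6}                                    # Pr(e)
p_c_given_e = {"harry potter": {"book": 0.45, "movie": 0.40, "game": 0.15}}
p_t_given_c = {                                                # Pr(t | c)
    "book":  {"read": 0.50, "review": 0.30, "walkthrough": 0.01},
    "movie": {"watch": 0.55, "trailer": 0.30, "walkthrough": 0.02},
    "game":  {"walkthrough": 0.60, "cheats": 0.30, "watch": 0.01},
}

def best_interpretation(query):
    best_score, best_triple = -1.0, None
    for e, pe in p_e.items():
        if e not in query:
            continue
        t = query.replace(e, "").strip()        # remaining words = context
        for c, pce in p_c_given_e[e].items():
            score = pe * pce * p_t_given_c[c].get(t, 1e-6)
            if score > best_score:
                best_score, best_triple = score, (e, t, c)
    return best_triple, best_score

print(best_interpretation("harry potter walkthrough"))
# -> ('harry potter', 'walkthrough', 'game') with score ~0.054
```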

Entity Recognition in Query

• Probability Estimation by Learning

Learning objective:

$$\max \prod_{i=1}^{N} P(e_i, t_i, c_i)$$

Challenge: it is difficult as well as time-consuming to manually assign class labels to named entities in queries.

Build a training set $T = \{(e_i, t_i)\}$ and view $c_i$ as a hidden variable.

New learning problem:

$$\max \prod_{i=1}^{N} P(e_i, t_i) = \max \prod_{i=1}^{N} \sum_{c} P(e_i) \, P(c|e_i) \, P(t_i|c)$$

solved with the topic model WS-LDA.

Signal from Click [Pantel et al 2012]

• Motivation

Predict entity types in Web search, modeling the entity, the user intent, the context and the click together

Query type distribution (73 types)

Generative model over entities and entity types

Signal from Click

• Joint Model for Prediction

(Plate diagram: for each query, pick an entity type t from the distribution over types, pick an entity from the entity distribution, pick an intent i from the intent distribution, then pick the context words from the word distribution, the clicked host from the host distribution, and the click c.)

Telegraphic Query interpretation [Sawant et al 2013 Joshi et al 2014]

bull Entity-seeking Telegraphic Queries

bull Interpretation = Segmentation + Annotation

Knowledge base Large corpus

accuracy recall

Germany capital

Berlin

Query

Result Entity

bull Overview

Joint Interpretation and Ranking [Sawant et al 2013 Joshi et al 2014]

Annotated Corpus

Telegraphic Query

e1e2e3

Two Models for Interpretation and Ranking

Generative Model

Discriminative Model

Output

bull Generative Model

Joint Interpretation and Ranking [Sawant et al 2013]

San Diego Padres

Major league baseball team

type context

E

T Padres have been to two World

Series losing in 1984 and 1998

Type hint

baseball team

losing team baseball world series 1998

Z

Context matchers

lost 1998 world seriesswitch

model model

q losing team baseball world series 1998

Borrow from U Sawant (2013)

Based on Probabilistic Language Models

• Discriminative Model

Joint Interpretation and Ranking [Sawant et al 2013]

Query: "losing team baseball world series 1998", interpreted with type t = baseball team vs. t = series.
Correct entity: San_Diego_Padres (a baseball team). Incorrect entity: 1998_World_Series (a series).

Based on max-margin discriminative learning

bull Queries seek answer entities (e2)

bull Contain (query) entities (e1) target types (t2) relations (r) and selectors (s)

Telegraphic Query Interpretation [Joshi et al 2014]

query e1 r t2 s

dave navarro first band

dave navarro band band first

dave navarro - band first

spider automobile company

spider automobile company

automobile company

-

automobile company company spider

Borrow from M Joshi (2014)

Improved Generative Model

bull Generative Model[Sawant et al 2013]

[Joshi et al 2014]Consider e1

(in q) and r

Improved Discriminative Model

bull Discriminative Model[Sawant et al 2013]

[Joshi et al 2014]

Consider e1

(in q) and r

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

• Input: a short text

• Output: a semantic interpretation

• Three steps in understanding a short text:

"wanna watch eagles band" → watch[verb] eagles[entity](band) band[concept]

Step 1: Text Segmentation – divide the short text into a sequence of terms in the vocabulary
Step 2: Type Detection – determine the best type of each term
Step 3: Concept Labeling – infer the best concept of each entity within context

Text Segmentation
• Observations:
  • Mutual Exclusion – terms containing the same word mutually exclude each other
  • Mutual Reinforcement – related terms mutually reinforce each other

• Build a Candidate Term Graph (CTG)

Examples: "vacation april in paris" and "watch harry potter"

(Candidate Term Graphs with candidate terms as nodes, e.g. vacation, april, paris, april in paris and watch, harry potter; edges carry relatedness weights such as 0.029, 0.005, 0.047, 0.041 and 0.014, 0.092, 0.053, 0.018, and terms that share a word are mutually exclusive.)

Find the best segmentation

• Best segmentation = the sub-graph of the CTG which:
  • Is a complete graph (clique)
  • Has no mutual exclusion
  • Has 100% word coverage (except for stopwords)
  • Has the largest average edge weight

(A sketch of this search follows the figure below.)

(Figure: among the candidate sub-graphs, only cliques with full word coverage are valid segmentations; the best segmentation is the maximal clique with the largest average edge weight, e.g. {vacation, april in paris} for "vacation april in paris" and {watch, harry potter} for "watch harry potter".)
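As promised above, here is a minimal brute-force sketch of that search (fine for short texts, and not the paper's exact algorithm); the candidate terms and edge weights are toy values, and in this tiny graph every surviving term pair has an edge, so the clique check reduces to the coverage and mutual-exclusion tests.

```python
# Toy sketch: best segmentation = valid term subset with max average edge weight.

from itertools import combinations

words = ["watch", "harry", "potter"]
terms = {"watch": {"watch"}, "harry potter": {"harry", "potter"}, "potter": {"potter"}}
edge = {("watch", "harry potter"): 0.092, ("watch", "potter"): 0.014}   # relatedness

def avg_edge_weight(subset):
    pairs = list(combinations(sorted(subset), 2))
    if not pairs:
        return 0.0
    return sum(edge.get(p, edge.get(p[::-1], 0.0)) for p in pairs) / len(pairs)

def best_segmentation():
    best = (-1.0, None)
    for r in range(1, len(terms) + 1):
        for subset in combinations(terms, r):
            covered = set().union(*(terms[t] for t in subset))
            overlaps = sum(len(terms[t]) for t in subset) != len(covered)
            if overlaps or covered != set(words):    # mutual exclusion / coverage
                continue
            best = max(best, (avg_edge_weight(subset), subset))
    return best

print(best_segmentation())   # (0.092, ('watch', 'harry potter'))
```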

Type Detection

• Pairwise Model
  • Find the best typed-term for each term so that the Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight

watch[v]

watch[e]

watch[c]

watch

free[adj]

free[v]

movie[c]

movie[e]

free

movie

Concept Labeling

• Entity disambiguation is the most important task of concept labeling
  • Filter / re-rank the original concept cluster vector

• Weighted-Vote
  • The final score of each concept cluster is a combination of its original score and the support from the context, using concept co-occurrence

watch harry potter read harry potter

movie book
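The sketch below illustrates the weighted-vote idea with a simple linear combination; the mixing weight `alpha`, the candidate scores and the co-occurrence values are all assumed toy numbers rather than the paper's exact scoring.

```python
# Toy sketch of weighted-vote re-ranking for concept labeling.

def weighted_vote(candidates, context_concepts, cooccur, alpha=0.5):
    # candidates: {concept: original score}; context_concepts: {concept: weight}
    reranked = {}
    for c, base in candidates.items():
        support = sum(w * cooccur.get(frozenset((c, ctx)), 0.0)
                      for ctx, w in context_concepts.items())
        reranked[c] = alpha * base + (1 - alpha) * support
    return sorted(reranked.items(), key=lambda kv: kv[1], reverse=True)

harry_potter = {"movie": 0.4, "book": 0.45, "character": 0.15}   # original scores
watch_context = {"activity": 0.3, "device": 0.2}                 # concepts of "watch"
cooccur = {frozenset(("movie", "activity")): 0.9,
           frozenset(("book", "activity")): 0.3,
           frozenset(("character", "activity")): 0.1}

print(weighted_vote(harry_potter, watch_context, cooccur))
# "movie" overtakes "book" once the context of "watch" votes.
```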

Example of Entity Disambiguation[Hua et al 2015 (ICDE Best Paper) Hua et al 2016]

(Pipeline figure: the short text is parsed; terms are clustered by isA against the semantic network; concepts are filtered by co-occurrence against the co-occurrence network; head/modifier analysis and concept orthogonalization produce the final concept vector [(c1, p1), (c2, p2), (c3, p3), …].)

Example "ipad apple": the isA step proposes concepts such as fruit, company, food, product, device, brand for the two terms; co-occurrence filtering keeps the coherent ones (product, device, company, brand) and drops fruit and food.

Mining Lexical Relationships[Wang et al 2015b]

• Lexical knowledge is represented by probabilities, e.g. for "watch harry potter" (candidate labels: verb, product, book, movie):

$p(verb \mid watch)$, $p(instance \mid watch)$, $p(movie \mid harry\ potter)$, $p(book \mid harry\ potter)$, $p(movie \mid watch, verb)$

• The building blocks are ① $p(z \mid t)$, ② $p(c \mid t, z)$, and ③ $p(c \mid e) = p(c \mid t, z = instance)$, where e = instance, t = term, c = concept, z = role.

Understanding Queries [Wang et al 2015b]

• Goal: to rank the concepts and find $\arg\max_{c} p(c \mid t, q)$

The offline semantic network supplies a subgraph covering the query and all of its possible segmentations; concepts are ranked by a random walk with restart [Sun et al 2005] on this online subgraph.
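For intuition, here is a small self-contained random-walk-with-restart sketch on a toy term/concept subgraph; the node set, transition weights and restart constant are illustrative assumptions and this is not the paper's exact graph construction.

```python
# Toy sketch: random walk with restart (RWR) on a small query subgraph.

import numpy as np

nodes = ["watch", "harry potter", "movie", "book", "device"]
W = np.array([            # column-stochastic transition matrix (toy weights)
    [0.0, 0.1, 0.2, 0.0, 0.5],
    [0.4, 0.0, 0.5, 0.7, 0.0],
    [0.4, 0.5, 0.0, 0.3, 0.0],
    [0.1, 0.4, 0.1, 0.0, 0.0],
    [0.1, 0.0, 0.2, 0.0, 0.5],
])

def rwr(restart_nodes, c=0.3, iters=100):
    q = np.zeros(len(nodes))
    q[[nodes.index(n) for n in restart_nodes]] = 1.0 / len(restart_nodes)
    p = q.copy()
    for _ in range(iters):
        p = (1 - c) * W @ p + c * q      # one walk step plus restart
    return dict(zip(nodes, np.round(p, 3)))

# Restart from the query terms; concept nodes close to *both* terms rank high.
print(rwr(["watch", "harry potter"]))
```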

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

• Example: "popular smart cover iphone 5s"

• Definitions:
  • Head: names the general (semantic) category to which the whole short text belongs; usually the head is the intent of the short text
    • "smart cover": the intent of the query
  • Constraints: distinguish this member from other members of the same category
    • "iphone 5s": limits the type of the head
  • Non-Constraint Modifiers (aka Pure Modifiers): subjective modifiers which can be dropped without changing the intent
    • "popular": subjective, can be neglected

Non-Constraint Modifiers Mining Construct Modifier Networks

Edges form a Modifier Network

Concept Hierarchy Tree in the "Country" domain

Modifier Network in the "Country" domain: in this case "Large" and "Top" are pure modifiers

Country

Asian country

Developed country

Western country

Asian Developed Western

Western developed

country

Top western country

Large

Large Top

Top

WesternLarge Asian

country

Large developed

country

Top developed

country

Country

Asian Western

Developed

Large Top

• Betweenness centrality is a measure of a node's centrality in a network

• The betweenness of node v is defined as

$$g(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$$

• where $\sigma_{st}$ is the total number of shortest paths from node s to node t, and $\sigma_{st}(v)$ is the number of those paths that pass through v

• Normalization & Aggregation
  • A pure modifier should have a low betweenness-centrality aggregation score PMS(t)

Non-Constraint Modifiers Mining Betweenness centrality
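The sketch below only illustrates the betweenness-centrality signal (using networkx) on a toy modifier network mirroring the "Country" example; the normalization/aggregation into PMS(t) described above is not implemented here.

```python
# Toy sketch: pure modifiers tend to have near-zero betweenness centrality.

import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("country", "asian"), ("country", "developed"), ("country", "western"),
    ("asian", "developed"), ("western", "developed"),
    ("large", "country"), ("top", "country"),
])

centrality = nx.betweenness_centrality(G, normalized=True)
for term, score in sorted(centrality.items(), key=lambda kv: kv[1]):
    print(f"{term:10s} {score:.3f}")
# "large" and "top" sit on (almost) no shortest paths between other nodes,
# so their betweenness is ~0, while the hub "country" dominates.
```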

Head-Constraints Mining [Wang et al 2014b]

• A term can be a head in some cases and a constraint in others

• E.g., "Seattle hotel" (Seattle = constraint, hotel = head) vs. "Seattle hotel job" (Seattle, hotel = constraints, job = head)

Head-Constraints Mining Acquiring Concept Patterns

Get entity pairs from the query log

Extract patterns using prepositions: A for B, A of B, A with B, A in B, A on B, A at B, …
(entity1 = head, entity2 = constraint)

Conceptualization: entity1 → concept11, concept12, concept13, concept14; entity2 → concept21, concept22, concept23

Concept patterns for each pair: (concept11, concept21), (concept11, concept22), (concept11, concept23), …

Building the concept pattern dictionary from query logs, e.g.:
cover for iphone 6s
battery for sony a7r
wicked on broadway

Why Concepts Can't Be Too General
• It may cause too many concept pattern conflicts: we can't distinguish head and modifier for general concept pairs

Head Modifier

Derived Concept Pattern device company

Supporting Entity Pairs iphone 4 verizon

modem comcast

wireless router comcast

iphone 4 tmobile

Head Modifier

Derived Concept Pattern company device

Supporting Entity Pairs amazon books kindle

netflix touchpad

skype windows phone

netflix ps3

Conflict

Why Concepts Can't Be Too Specific
• It may generate concepts with little coverage
  • The concept regresses to the entity
  • Large storage space: up to (millions × millions) of patterns

hellip hellip

device largest desktop OS vendor

device largest software development company

device largest global corporation

device latest windows and office provider

hellip hellip

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept Patterns

Cluster size | Sum of cluster score | Head / Constraint | Score
615 | 2114691 | breed / state | 357298460224501
296 | 7752357 | game / platform | 627403476771856
153 | 3466804 | accessory / vehicle | 53393705094809
70 | 118259 | browser / platform | 132612807637391
22 | 1010993 | requirement / school | 271407526294823
34 | 9489159 | drug / disease | 154602405333541
42 | 8992995 | cosmetic / skin condition | 814659415003929
16 | 7421599 | job / city | 27903732555528
32 | 710403 | accessory / phone | 246513830851194
18 | 6692376 | software / platform | 210126322725878
20 | 6444603 | test / disease | 239774028397537
27 | 5994205 | clothes / breed | 98773996282851
19 | 5913545 | penalty / crime | 200544192793488
25 | 5848804 | tax / state | 240081818612579
16 | 5465424 | sauce / meat | 183592863621553
18 | 4809389 | credit card / country | 142919087972152
14 | 4730792 | food / holiday | 14554140330924
11 | 4536199 | mod / game | 257163856882439
29 | 4350954 | garment / sport | 471533326845442
23 | 3994886 | career information / professional | 732726483731257
15 | 386065 | song / instrument | 128189481818135
18 | 378213 | bait / fish | 780426514113169
22 | 3722948 | study guide / book | 508339765053921
19 | 3408953 | plugins / browser | 550326072627126
14 | 3305753 | recipe / meat | 882779863422951
18 | 3214226 | currency / country | 110825444188352
13 | 3180272 | lens / camera | 186081673263957
9 | 316973 | decoration / holiday | 130055844126533
16 | 314875 | food / animal | 7338544366514

game platform

game device

video game platform

game console game pad

game gaming platform

Game (Head) Platform (Modifier)

angry birds android

angry birds ios

angry birds windows 10

hellip hellip

Head Modifier Relationship Detection

• Train a classifier on (head-embedding, modifier-embedding)

• Training data:
  • Positive: (head, modifier)
  • Negative: (modifier, head)

• Precision >= 0.9, Recall >= 0.9

• Disadvantage: not interpretable
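A minimal sketch of this setup is shown below; it uses tiny random vectors purely so the code runs stand-alone, so the specific prediction is meaningless. In practice, pre-trained word embeddings (and many labeled pairs) would replace the assumed `emb` table, and the paper does not prescribe this particular classifier.

```python
# Toy sketch: head/modifier classifier on concatenated embeddings.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
vocab = ["game", "platform", "accessory", "phone", "recipe", "meat",
         "angry birds", "android"]
emb = {w: rng.normal(size=8) for w in vocab}      # stand-in for real embeddings

pairs = [("game", "platform"), ("accessory", "phone"), ("recipe", "meat")]  # (head, modifier)
X, y = [], []
for head, mod in pairs:
    X.append(np.concatenate([emb[head], emb[mod]])); y.append(1)   # positive
    X.append(np.concatenate([emb[mod], emb[head]])); y.append(0)   # reversed = negative

clf = LogisticRegression(max_iter=1000).fit(X, y)

test = np.concatenate([emb["angry birds"], emb["android"]])  # is "angry birds" the head?
print(int(clf.predict([test])[0]), clf.predict_proba([test])[0])
```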

Syntactic Parsing based on HM

• The information is incomplete:
  • Prepositions and other function words
  • Relations within a noun compound: "el capitan macbook pro"

• Why not train a parser for web queries?

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges: Short Texts Lack Grammatical Signals
• Lack of function words and word order
  • "toys queries" has ambiguous intent
  • "distance earth moon" has clear intent, but many equivalent forms: "earth moon distance", "earth distance moon", …

Challenges: Syntactic Parsing of Queries

• No standard

• No ground truth

Why is syntactic parsing of queries even a legitimate problem?

Derive Syntax from Semantics[Sun et al 2016]

bull Query ldquothai food houstonrdquo

bull Clicked sentence

bull Project dependency to the query

A Treebank for Short Texts

• Given a query q

• Given q's clicked sentence s

• Parse each s

• Project the dependencies from s to q

• Aggregate the dependencies

Algorithm of Projection

Result Examples

Results

• Random queries
  QueryParser: UAS 0.83, LAS 0.75;  Stanford: UAS 0.72, LAS 0.64

• Queries with no function words
  QueryParser: UAS 0.82, LAS 0.73;  Stanford: UAS 0.70, LAS 0.61

• Queries with function words
  QueryParser: UAS 0.90, LAS 0.85;  Stanford: UAS 0.86, LAS 0.80

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

• Measuring similarity between two short texts or sentences

• Basic idea: word-by-word comparison using embedding vectors

• Use a saliency-weighted semantic graph to compute similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired: bins of all edges, bins of max edges

Similarity measurement (inspired by BM25), for a longer text $s_l$, a shorter text $s_s$, and a semantic similarity $sem(w, s_s)$ between term $w$ and short text $s_s$:

$$f_{sts}(s_l, s_s) = \sum_{w \in s_l} IDF(w) \cdot \frac{sem(w, s_s) \cdot (k_1 + 1)}{sem(w, s_s) + k_1 \cdot \left(1 - b + b \cdot \frac{|s_s|}{avgsl}\right)}$$
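A direct, minimal implementation of this formula is sketched below, where sem(w, s_s) is taken as the best cosine match of w against the words of the short text; the embeddings, IDF values and BM25-style parameters are assumed toy numbers.

```python
# Toy sketch of the saliency-weighted (BM25-inspired) short text similarity.

import numpy as np

emb = {   # assumed word embeddings
    "cheap": np.array([0.9, 0.1]), "affordable": np.array([0.85, 0.2]),
    "hotel": np.array([0.1, 0.95]), "rooms": np.array([0.2, 0.9]),
}
idf = {"cheap": 1.8, "hotel": 1.2, "affordable": 1.7, "rooms": 1.1}

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def sem(w, s_short):
    return max(cos(emb[w], emb[w2]) for w2 in s_short)

def f_sts(s_long, s_short, k1=1.2, b=0.75, avgsl=2.0):
    score = 0.0
    for w in s_long:
        s = sem(w, s_short)
        score += idf[w] * (s * (k1 + 1)) / (s + k1 * (1 - b + b * len(s_short) / avgsl))
    return score

print(round(f_sts(["cheap", "hotel"], ["affordable", "rooms"]), 3))
```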

From the Concept View

From the Concept View [Wang et al 2015a]

(Pipeline figure: Short Text 1 and Short Text 2 are each parsed and conceptualized against the semantic network (term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, concept orthogonalization), yielding Concept Vector 1 [(c1, score1), (c2, score2), …] and Concept Vector 2 [(c1', score1'), (c2', score2'), …], which are then compared for similarity.)
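A bare-bones sketch of this bag-of-concepts comparison is given below: each term is conceptualized with an isA table, the concept weights are summed into one vector per short text, and the two vectors are compared with cosine. The `isa` table and its weights are toy assumptions, and the filtering/orthogonalization steps of the full pipeline are omitted.

```python
# Toy sketch: short text similarity via bags of concepts.

import math
from collections import defaultdict

isa = {   # assumed P(concept | term)
    "angry birds": {"game": 0.7, "app": 0.3},
    "android": {"platform": 0.6, "operating system": 0.4},
    "candy crush": {"game": 0.8, "app": 0.2},
    "ios": {"platform": 0.7, "operating system": 0.3},
}

def concept_vector(terms):
    vec = defaultdict(float)
    for t in terms:
        for c, p in isa.get(t, {}).items():
            vec[c] += p
    return vec

def cosine(u, v):
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

t1 = concept_vector(["angry birds", "android"])
t2 = concept_vector(["candy crush", "ios"])
print(round(cosine(t1, t2), 3))   # high: both are "game on platform" texts
```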

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

• Explicit short text understanding benefits a lot of application scenarios:
  • Ads/search semantic match

  • Definition mining

  • Query recommendation

  • Web table understanding

  • Semantic search

  • …

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

(Bar charts by decile, Decile 4 through Decile 10: Mainline Ads on a 0.00-6.00 axis and Sidebar Ads on a 0.00-0.60 axis.)

Definition Mining [Hao et al 2016]

• Definition scenarios: search engines, QnA, etc.

• Why is conceptualization useful for definition mining?
  • Example: "What is Emphysema?"

Answer 1: "Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year."

Answer 2: "Emphysema is an incurable, progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing."

• This sentence has the form of a definition
• Embedding is helpful to some extent, but it also returns a high similarity score for (emphysema, disease) and for (emphysema, smoking)
• Conceptualization can provide strong semantics
• Contextual embedding can also provide semantic similarity beyond Is-A

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

OfflineOffline OnlineOnline

Original Short textjustin bieber graduates

hellip

Knowledge base

Conceptualiztion

Concept Vector

Entity Extraction

Candidates Generation

Classification amp Ranking

Model LearningModel Learning

Concept Weighting

Model Model NModel i

Concept Model Concept Model

Class 1 Class NClass i

TrainingData

ltMusic Scoregt

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept SpaceArticle titlestagsin this category

119901119894

119901119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

hellip

hellip

120596119894

120596119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

120596119894

120596119895119901119894

119901119895

Precision performance on each category [Wang et al 2014a]

BocSTC LM_ch SVMVSM_cosi

neLM_d Entity_ESA

Movie 071 091 084 081 072 056

Money 097 095 054 057 052 074

Music 097 090 088 073 068 058

TV 096 046 092 056 051 055

0304050607080910

Pre

cisi

on

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledgeltigt in SIGMOD 2008

bull [ Auer et al 2007 ] Soumlren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Boumlrschinger Caumlcilia Zirn and AnasElghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and XindongWu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny BoyesBraem Basic objects in natural categories Cognitive psychology 8(3)382ndash439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny ZhuToward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 17: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Probase [Wu et al 2012]

• Probase is a semantic network to make machines "aware" of the mental world of human beings, so that machines can better understand human communication

Brief Introduction

Probase network

Nodes: Concepts ("Spanish Artists"), Entities ("Pablo Picasso"), Attributes ("Birthday"), Verbs/Adjectives ("Eat", "Sweet")

Edges: isA (concept, entities), isPropertyOf (attributes), Co-occurrence (isCEOof, LocatedIn, etc.)

• 5,401,933 unique concepts  • 12,551,613 unique instances  • 87,603,947 isA relations

Example concepts: countries; basic watercolor techniques; celebrity wedding dress designers

Probase

bull Microsoft Research

• Public release coming soon in Aug/Sept 2016  • Project homepage: http://research.microsoft.com/probase

Concepts

Authors

URLs

Probase isA error rate: below 1.1%, and below 10% for a random pair

Freebase [Bollacker et al 2008]

bull Freebase is a well-known collaborative knowledge base consisting of data composed mainly by its community

• Freebase contains more than 23 million entities
• Freebase contains 1.9 billion triples
• Each triple is organized in the form of <subject> <predicate> <object>

Brief Introduction

Statistics

• Freebase is a collection of facts
• Freebase only contains nodes and links
• Freebase is a labeled graph

Freebase -gt Wiki Data

• Freebase data was integrated into Wikidata
• The Freebase API will be completely shut down on Aug 31, 2016, replaced by the Google Knowledge Graph API

bull Freebase Community

• Homepage: http://wiki.freebase.com/wiki/Main_Page  • Download: https://developers.google.com/freebase  • Wikidata: https://www.wikidata.org

News

Authors

URLs

Google Knowledge Graph

bull Knowledge Graph is a knowledge base used by Google to enhance its search engines search results with semantic-search information gathered from a wide variety of sources

• 570 million objects and more than 18 billion facts about relationships between different objects

bull Google Inc

• Homepage: https://www.google.com/intl/es-419/insidesearch/features/search/knowledge.html

Brief Introduction

Statistics

Sample

Authors

URLs

YAGO [Suchanek et al 2007]

bull YAGO is a huge semantic knowledge base derived from GeoNames WordNet and Wikipedia (10 Wikipedias in different languages)

• More than 10 million entities (persons, organizations, cities, etc.)
• More than 120 million facts about entities
• More than 35,000 classes assigned to entities
• Many of its facts and entities have a temporal dimension and a spatial dimension attached

Brief Introduction

SampleltAlbert_Einsteingt ltisMarriedTogt ltElsa_Einsteingt

Statistics

YAGO

News
• An evaluated version of YAGO3 (combining information from Wikipedias in different languages) was released [15 Sep 2015]

Authors
• Max Planck Institute for Informatics in Saarbrücken, Germany, and the DBWeb group at Télécom ParisTech University

URLs
• Homepage: http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago
• Download: http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/downloads

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Statistics of Search Queries

[Pie charts: query length distribution. (a) By traffic: 1 term 44%, 2 terms 29%, 3 terms 17%, 4 terms 7%, 5 terms 2%, more than 5 terms 1%. (b) By # of distinct queries: 1 term 10%, 2 terms 26%, 3 terms 34%, 4 terms 19%, 5 or more terms ~11%. A second pair of charts shows how many instances (e.g., "Pokémon Go", "Microsoft HoloLens") appear per query, from 1 instance to more than 5.]

If the short text is a single instance…

• Python  • Microsoft  • Apple  • …

Single Instance Understanding

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

Word Ambiguity
• Word sense disambiguation relies on dictionaries (WordNet)

Take a seat on this chair

The chair of the Math Department

Instance Ambiguity

bull Instance sense disambiguation extra knowledge needed

I have an apple pie for lunch

He bought an apple ipad

Here ldquoapplerdquo is a proper noun

Ambiguity [Hua et al 2016]

bull Many instances are ambiguous

bull Intuition ambiguous instances have multiple senses

short text instance sense

population china china country

glass vs china china fragile item

pear apple apple fruit

microsoft apple apple company

read harry potter harry potter book

watch harry potter harry potter movie

age of harry potter harry potter character

Pre-definition for Ambiguity (1) Sense [Hua et al 2016]

bull What is a Sense in semantic networksbull A sense as a hierarchy of concept clusters

region

country state city

creature

animal

predator

crop food

fruit vegetable meat

Germany

Pre-definition for Ambiguity (2) Concept Cluster [Li et al 2013 Li et al 2015]

bull What is a Concept Cluster (CL)bull Cluster similar concepts into a concept cluster using K-

Means like approach (k-Medoids)

FruitFresh fruit

JuiceTropical fruit

BerryExotic fruit

Seasonal fruitFruit juiceCitrus fruitSoft fruitDry fruit

Wild fruitLocal fruit

hellip

company

CompanyClientFirm

ManufacturerCorporation

large companyRivalGiant

big companylocal company

large corporationinternational

companyhellip Fruit

Definitions of Instance Ambiguity [Hua et al 2016]

bull 3 levels of instance ambiguitybull Level 0 unambiguous

bull Contains only 1 sensebull Eg dog (animal) beijing (city) potato (vegetable)

bull Level 1 unambiguous and ambiguous both make sensebull Contains 2 or more senses but these senses are relatedbull Eg google (company amp search engine) french (language amp

country) truck(vehicle amp public transport service)

bull Level 2 ambiguous bull Contains 2 or more senses and the senses are very different from

each otherbull Eg apple (fruit amp company) jaguar(animal amp company) python

(animal amp language)

Ambiguity Score

bull Using top-2 senses to calculate the ambiguity score

$$\mathit{score}=\begin{cases}0 & \text{level}=0\\ \dfrac{w(s_2\mid e)}{w(s_1\mid e)}\cdot\bigl(1-\mathit{similarity}(s_1,s_2)\bigr) & \text{level}=1\\ 1+\dfrac{w(sc_2\mid e)}{w(sc_1\mid e)}\cdot\bigl(1-\mathit{similarity}(sc_1,sc_2)\bigr) & \text{level}=2\end{cases}$$

Denote the top-2 senses as $s_1$ and $s_2$, and the top-2 sense clusters as $sc_1$ and $sc_2$. The similarity of two sense clusters is the maximum similarity of their senses:

$$\mathit{similarity}(sc_1,sc_2)=\max_{s_i\in sc_1,\ s_j\in sc_2}\mathit{similarity}(s_i,s_j)$$

For an entity $e$, the weight (popularity) of a sense $s_i$ is the sum of the weights of its concept clusters:

$$w(s_i\mid e)=w(H_i\mid e)=\sum_{CL_j\in H_i}P(CL_j\mid e)$$

and the weight of a sense cluster $sc_i$ is the sum of the weights of its senses:

$$w(sc_i\mid e)=\sum_{s_j\in sc_i}w(s_j\mid e)$$
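As a minimal illustration, the following Python sketch computes this ambiguity score from the top-2 sense-cluster weights; the weights, the cluster similarity, and the level are toy assumptions rather than actual Probase statistics:

# Toy sketch of the ambiguity score over an entity's top-2 sense clusters.
def ambiguity_score(cluster_weights, cluster_similarity, level):
    """cluster_weights: [w(sc1|e), w(sc2|e), ...], sorted descending."""
    if level == 0 or len(cluster_weights) < 2:
        return 0.0
    w1, w2 = cluster_weights[0], cluster_weights[1]
    base = (w2 / w1) * (1.0 - cluster_similarity)
    return base if level == 1 else 1.0 + base

# Example: "apple" with two very different sense clusters (fruit vs. company).
weights = [0.537, 0.271]          # assumed top-2 cluster weights for "apple"
print(ambiguity_score(weights, cluster_similarity=0.19, level=2))
# ~1.41 with these toy numbers, close to the "apple" score in the examples below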

Examples

bull Level 0bull california

bull country state city region institution 0943bull fruit

bull food product snack carbs crop 0827bull alcohol

bull substance drug solvent food addiction 0523bull computer

bull device product electronics technology appliance 0537bull coffee

bull beverage product food crop stimulant 073bull potato

bull vegetable food crop carbs product 0896bull bean

bull food vegetable crop legume carbs 0801

Examples (cont)bull Level 1

bull nike score = 0034bull company store 0861bull brand 0035bull shoe product 0033

bull twitter score = 0035bull website tool 0612bull network 0165bull application 0033bull company 0031

bull facebook score = 0037bull website tool 0595bull network 017bull company 0053bull application 0029

bull yahoo score = 038bull search engine 0457bull company provider account 0281bull website 00656

bull google score = 0507bull search engine 046bull company provider organization 0377bull website 00449

Examples (cont)

• Level 2:
  • jordan: score = 1.02  • country, state, company, regime: 0.92  • shoe: 0.02
  • fox: score = 1.09  • animal, predator, species: 0.74  • network: 0.064  • company: 0.035
  • puma: score = 1.15  • brand, company, shoe: 0.655  • species, cat: 0.116
  • gold: score = 1.21  • metal, material, mineral resource, mineral: 0.62  • color: 0.128

Examples (cont)

• Level 2 (cont.):
  • soap: score = 1.22  • product, toiletry, substance: 0.49  • technology, industry standard: 0.11
  • silver: score = 1.24  • metal, material, mineral resource, mineral: 0.638  • color: 0.156
  • python: score = 1.29  • language: 0.667  • snake, animal, reptile, skin: 0.193
  • apple: score = 1.41  • fruit, food, tree: 0.537  • company, brand: 0.271

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

A Concept View of ldquoMicrosoftrdquo

company

largest desktop OS vendor

softwarecompany

international company

technology leader

Microsoft

largest desktop OS vendorcompany hellip hellip

software company

Basic-level Conceptualization (BLC)[Rosch et al 1976]

KFC

BMW

Basic-level conceptualization

How to Make BLC

bull Naive approachesbull Typicality an important measure for understanding the

relationship between an object and its concept

bull Pointwise Mutual Information (PMI) a common measure of the strength of association between two terms

bird

Naive Approach 1 Typicality

P(robin|bird) gt P(penguin|bird)ldquorobinrdquo is a more typical bird than a ldquopenguinrdquo

country

SeychellesUSA

P(USA|country) gt P(Seychelles|country)ldquoUSArdquo is a more typical country than ldquoSeychellesrdquo

penguinrobin

Using Typicality for BLC

• Associate each isA relationship ($e$ isA $c$) with typicality scores $P(e\mid c)$ and $P(c\mid e)$:

$$P(e\mid c)=\frac{n(c,e)}{n(c)},\qquad P(c\mid e)=\frac{n(c,e)}{n(e)}$$

bull P(e|c) indicates how typical (or popular) e is in the given concept c

bull P(c|e) indicates how typical (or popular) the concept c is given e

bull However

For "Microsoft": "company" has high typicality P(c|e), while "largest desktop OS vendor" has high typicality P(e|c); neither score alone identifies the basic-level concept.

Naive Approach 2 PMI[Manning and Schutze 1999]

bull Pointwise mutual information (PMI) is a measure of association used in information theory and statistics

bull Consider using the PMI between concept c and instance e to find the basic-level concepts as follows

$$PMI(e,c)=\log\frac{P(e,c)}{P(e)P(c)}=\log P(e\mid c)-\log P(e)$$

bull However bull In basic level of categorization we are interested in finding a

concept for a given e which means P(e) is a constant

bull Thus ranking by PMI(e c) is the same as ranking by P(e|c)

Using Rep(e c) for BLC [Wang et al 2015b]

• The measure $Rep(e,c)=P(c\mid e)\cdot P(e\mid c)$ means:

• (With PMI) Taking the logarithm of the scoring function gives

$$\log Rep(e,c)=\log\bigl(P(c\mid e)\cdot P(e\mid c)\bigr)=\log\left(\frac{P(e,c)}{P(e)}\cdot\frac{P(e,c)}{P(c)}\right)=\log\frac{P(e,c)^2}{P(e)P(c)}=PMI(e,c)+\log P(e,c)=PMI^2(e,c)$$

• (With Commute Time) The commute time between an instance $e$ and a concept $c$ is

$$Time(e,c)=\sum_{k=1}^{\infty}2k\cdot P_k(e,c)=\sum_{k=1}^{T}2k\cdot P_k(e,c)+\sum_{k=T+1}^{\infty}2k\cdot P_k(e,c)\ \ge\ \sum_{k=1}^{T}2k\cdot P_k(e,c)+2(T+1)\Bigl(1-\sum_{k=1}^{T}P_k(e,c)\Bigr)$$

which for $T=1$ gives $Time(e,c)\ge 4-2\cdot Rep(e,c)$.

Given e the c should be its typical concept (shortest distance)

Given c the e should be its typical instance (shortest distance)

A process of finding concept nodes having shortest expected distance with e
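To make these measures concrete, here is a small Python sketch that scores candidate concepts for an instance by typicality and Rep from raw isA co-occurrence counts n(c, e); the counts are made up for illustration (in the tutorial they would come from Probase):

# Made-up isA counts n(c, e): how often instance e appears under concept c.
counts = {
    ("company", "microsoft"): 9000, ("company", "apple"): 8000,
    ("company", "google"): 8000, ("company", "many other firms"): 60000,
    ("software company", "microsoft"): 3000, ("software company", "adobe"): 1000,
    ("software company", "oracle"): 1000,
    ("largest desktop os vendor", "microsoft"): 40,
}

n_c, n_e = {}, {}
for (c, e), n in counts.items():
    n_c[c] = n_c.get(c, 0) + n      # n(c)
    n_e[e] = n_e.get(e, 0) + n      # n(e)

def p_e_given_c(e, c): return counts.get((c, e), 0) / n_c[c]
def p_c_given_e(e, c): return counts.get((c, e), 0) / n_e[e]

def rep(e, c):
    # Rep(e, c) = P(c|e) * P(e|c); its log equals PMI(e, c) + log P(e, c).
    return p_c_given_e(e, c) * p_e_given_c(e, c)

for c in ["company", "software company", "largest desktop os vendor"]:
    print(c, round(p_c_given_e("microsoft", c), 3),
          round(p_e_given_c("microsoft", c), 3), round(rep("microsoft", c), 3))
# With these toy counts "software company" gets the highest Rep (a basic-level
# concept), while "company" is too general (low P(e|c)) and
# "largest desktop os vendor" is too specific (low P(c|e)).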

PrecisionNDCGNo smoothing 1 2 3 5 10 15 20

MI(e) 0769 0692 0705 0685 0719 0705 0690

PMI3(e) 0885 0769 0756 0800 0754 0733 0721

NPMI(e) 0692 0692 0667 0638 0627 0610 0610

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0500 0462 0526 0523 0523 0510 0521

Rep(e) 0846 0865 0872 0862 0758 0731 0719

Smoothing=0001

MI(e) 0577 0615 0628 0600 0612 0605 0592

PMI3(e) 0731 0673 0692 0654 0669 0644 0623

NPMI(e) 0923 0827 0769 0746 0731 0695 0671

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0554

Typicality P(e|c) 0885 0865 0872 0831 0785 0741 0704

Rep(e) 0846 0731 0718 0723 0700 0669 0638

Smoothing=00001

MI(e) 0615 0615 0654 0608 0635 0628 0612

PMI3(e) 0846 0731 0731 0715 0723 0685 0677

NPMI(e) 0885 0904 0885 0869 0823 0777 0752

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0885 0904 0910 0877 0831 0813 0777

Rep(e) 0923 0846 0833 0815 0781 0736 0719

Smoothing=1e-5

MI(e) 0615 0635 0667 0662 0677 0656 0646

PMI3(e) 0885 0769 0744 0777 0758 0731 0710

NPMI(e) 0885 0846 0872 0869 0831 0810 0787

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0769 0808 0846 0823 0808 0782 0765

Rep(e) 0885 0904 0872 0862 0812 0800 0767

Smoothing=1e-6

MI(e) 0769 0673 0705 0677 0700 0692 0679

PMI3(e) 0885 0769 0756 0785 0773 0726 0723

NPMI(e) 0885 0846 0821 0815 0750 0726 0719

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0538 0615 0615 0615 0608 0613 0615

Rep(e) 0846 0885 0897 0877 0788 0777 0765

Smoothing=1e-7

MI(e) 0769 0692 0705 0685 0719 0703 0688

PMI3(e) 0885 0769 0756 0792 0758 0736 0725

NPMI(e) 0769 0750 0718 0700 0650 0641 0633

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0500 0481 0526 0523 0531 0523 0523

Rep(e) 0846 0865 0872 0854 0765 0749 0733

No Smoothing 1 2 3 5 10 15 20

MI(e) 0516 0531 0519 0531 0562 0574 0594

PMI3(e) 0725 0664 0652 0660 0628 0631 0646

NPMI(e) 0599 0597 0579 0554 0540 0539 0549

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0401 0386 0396 0398 0401 0410 0428

Rep(e) 0758 0771 0745 0723 0656 0647 0661

Smoothing=1e-3

MI(e) 0374 0414 0441 0448 0473 0481 0495

PMI3(e) 0484 0511 0509 0502 0519 0525 0533

NPMI(e) 0692 0652 0607 0603 0585 0585 0592

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0460

Typicality P(e|c) 0703 0697 0704 0681 0637 0628 0626

Rep(e) 0621 0580 0554 0561 0554 0555 0559

Smoothing=1e-4

MI(e) 0407 0430 0458 0462 0492 0503 0512

PMI3(e) 0648 0604 0579 0575 0578 0576 0590

NPMI(e) 0747 0777 0761 0737 0700 0685 0688

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0791 0795 0802 0767 0738 0729 0724

Rep(e) 0758 0714 0711 0689 0653 0636 0653

Smoothing=1e-5

MI(e) 0429 0465 0478 0501 0517 0528 0545

PMI3(e) 0725 0647 0642 0642 0627 0624 0638

NPMI(e) 0813 0779 0778 0765 0730 0723 0729

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0709 0728 0735 0722 0702 0696 0703

Rep(e) 0791 0787 0762 0739 0707 0703 0706

Smoothing=1e-6

MI(e) 0516 0510 0515 0526 0546 0563 0579

PMI3(e) 0725 0655 0651 0654 0641 0631 0649

NPMI(e) 0791 0766 0732 0728 0673 0659 0668

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0495 0516 0520 0508 0512 0521 0540

Rep(e) 0758 0784 0767 0755 0691 0686 0694

Smoothing=1e-7

MI(e) 0516 0531 0519 0530 0562 0571 0592

PMI3(e) 0725 0664 0652 0658 0630 0631 0647

NPMI(e) 0670 0655 0633 0604 0575 0570 0581

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0423 0421 0415 0407 0414 0424 0438

Rep(e) 0758 0771 0745 0725 0663 0661 0668

Evaluations on Different Measures for BLC

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

What is the Semantic Similaritybull Are the following instance pairs similar

bull ltapple microsoftgt

bull ltapple peargt

bull ltapple fruitgt

bull ltapple foodgt

bull ltapple ipadgt

bull ltcar journeygt

Approaches on Term Similarity

bull Categories of approaches for semantic similaritybull String based approach

bull Knowledge based approachbull Use preexisting thesauri taxonomy or encyclopedia such as

WordNet

bull Corpus based approachbull Use contexts of terms extracted from web pages web search

snippets or other text repositories

bull Embedding based approachbull Will introduce in detail in ldquoPart 3 Implicit Understandingrdquo

79

Approaches on Term Similarity (2)

bull Categories


[Diagram: categories of term-similarity approaches. Knowledge-based approaches (WordNet): path length / lexical chain-based, information content-based, and graph learning algorithm-based; corpus-based approaches: snippet/search-based; string-based approaches. Representative work shown: Rada 1989, Resnik 1995, Jcn 1997, Lin 1998, Sánchez 2011, Agirre 2010, Alvarez 2007, HunTray 2005, Hirst 1998, Do 2009, Bol 2011, Chen 2006, Ban 2002; the state-of-the-art approaches are highlighted.]

bull Framework


Term Similarity Using Semantic Networks [Li et al 2013 Li et al 2015]

Term pairs ltt1 t2gt

Type Checking

Concept Pairs Entity Pairs

Entity-distribution Context Collection

Concept-distribution Context Collection

Concept-Entity Pairs

Concept Collection for the Entity Term t1

Similarity EvaluationCosine(T(t1) T(t2))

for each pairltt2cxgt

Context vector T(t1) and T(t2)

Get maxsim(t2cx) for ltt1 t2gt

End

End

Concept Clustering

Cluster Context vector Cx(t1) and Cy(t2)

Similarity Evaluation

Max(xy) Cosine(Cx(t1) Cy(t2))

End

Concept Clustering

for each Cluster Ci(t1)

Select top k Concept namely cx

Step 1 Type Checking

Step 2 Context Representation(Vector)

Step 3 Context Similarity

An example [Li et al 2013 Li et al 2015]

For example

<banana, pear>

Entity PairsType Checking

Concept Context Collection

Similarity Evaluation: Cosine(T(t1), T(t2)) = 0.916

Step 1 Type Checking

Step 2 Context Representation(Vector)

Step 3 Context Similarity
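A minimal sketch of the final step (cosine similarity between concept-distribution contexts); the concept vectors below are toy stand-ins for what the concept clustering over Probase would actually produce:

import math

# Toy concept-cluster distributions for two entity terms (assumed values).
banana = {"fruit": 0.55, "food": 0.25, "crop": 0.15, "snack": 0.05}
pear   = {"fruit": 0.60, "food": 0.20, "crop": 0.10, "tree": 0.10}

def cosine(u, v):
    keys = set(u) | set(v)
    dot = sum(u.get(k, 0.0) * v.get(k, 0.0) for k in keys)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv)

print(round(cosine(banana, pear), 3))   # high similarity, cf. the 0.916 reported above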

Examples

Term 1 | Term 2 | Similarity
lunch | dinner | 0.9987
tiger | jaguar | 0.9792
car | plane | 0.9711
television | radio | 0.9465
technology company | microsoft | 0.8208
high impact sport | competitive sport | 0.8155
employer | large corporation | 0.5353
fruit | green pepper | 0.2949
travel | meal | 0.0426
music | lunch | 0.0116
alcoholic beverage | sports equipment | 0.0314
company | table tennis | 0.0003

Full results: http://adapt.seiee.sjtu.edu.cn/similarity/SimCompleteResults.pdf

Statistics of Search Queries

[Query-length and instance-count pie charts repeated from the earlier "Statistics of Search Queries" slide.]

If the short text has context for the instance…

• python tutorial  • dangerous python  • moon earth distance  • …

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Supervised Segmentation [Bergsma et al 2007]

bull Problem divide query into semantic units

bull Approach turn segmentation into position-based binary classification

Example Query

Two man power saw

[two man] [power saw]
[two] [man] [power saw]
[two] [man power] [saw]

Input a query and its positions

Output the decision for making segmentation at each position

Supervised Segmentation

bull Featuresbull Decision boundary features

bull Statistical features

bull Context features

bull Dependency features

eg Indicators thePOS tags in query isPosition features forwardbackward

Mutual information between left and right parts

Bank loan amortization schedule

Context information

bus driverfemale

depend

Supervised Segmentation

bull Segmentation Overview

saw

SVMclassifier

Input query two man power saw

two man power

Output segmentation decision for each position (yesno)

learning features

Unsupervised Segmentation [Tan et al 2008]

bull Unsupervised learning for query segmentation

Probability of generated segmentation S for query Q

Probability of the segmentation $S$ generated for query $Q$ (unigram model over segments):

$$P(S\mid Q)=P(s_1)\,P(s_2\mid s_1)\cdots P(s_m\mid s_1 s_2\cdots s_{m-1})\ \approx\ \prod_{s_i\in S}P(s_i)$$

A split point is a valid segment boundary if and only if the pointwise mutual information between the two segments resulting from the split is negative:

$$MI(s_k,s_{k+1})=\log\frac{P_c([s_k\ s_{k+1}])}{P_c(s_k)\cdot P_c(s_{k+1})}<0$$

Example: in "new york times subscription", $\log\frac{P_c([\text{new york}])}{P_c(\text{new})\cdot P_c(\text{york})}>0$, so there is no segment boundary between "new" and "york".

Unsupervised Segmentation

bull Find top k segmentations dynamic programming

bull Using EM optimization on the fly

Input: a query $w_1 w_2 \cdots w_n$ (the words in the query) and the segment/concept probability distribution
Output: top-k segmentations with the highest likelihood (a minimal sketch follows)
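A minimal dynamic-programming sketch of this unigram segmentation; the segment probabilities P_c(s) are toy values, whereas the real ones come from a large corpus and are refined by EM:

import math

# Toy segment probabilities P_c(s); unseen segments get a small smoothing value.
P = {"new york": 1e-4, "new york times": 5e-5, "times": 1e-3,
     "new": 2e-3, "york": 1e-4, "subscription": 5e-4}
SMOOTH = 1e-9

def seg_prob(s): return P.get(s, SMOOTH)

def best_segmentation(words, max_len=4):
    # dp[i] = (best log-prob of words[:i], segmentation as a list of segments)
    dp = [(0.0, [])] + [(-math.inf, None)] * len(words)
    for i in range(1, len(words) + 1):
        for j in range(max(0, i - max_len), i):
            seg = " ".join(words[j:i])
            score = dp[j][0] + math.log(seg_prob(seg))
            if score > dp[i][0]:
                dp[i] = (score, dp[j][1] + [seg])
    return dp[len(words)][1]

print(best_segmentation("new york times subscription".split()))
# -> ['new york times', 'subscription'] with these toy probabilities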

Exploit Click-through [Li et al 2011]

bull Motivationbull Probabilistic query segmentation

bull Use click-through data

Output top-3 segmentation

[bank of america] [online banking]  0.502
[bank of america online banking]  0.428
[bank of] [america] [online banking]  0.001

Q -gt URL -gt D query

document

click data

Input Query bank of america online banking

Exploit Click-through

bull Segmentation Model

An interpolated model

global info Click-throughinfo

[credit card] [bank of America]

1 bank of america credit cards contact us overview2 secured visa credit card from bank of america3 credit cards overview find the right bank of america credit card for you

Query

Clicked html documents

global info

Click-through info

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Sense Changes with Different Context

watch harry potter read harry potter age harry potter

Movie Book Character

harry potter walkthrough

Game

Entity Recognition in Query [Guo et al 2009]

bull Motivation

Detect named entity in a short text and categorize it

harry potter walkthrough

Single-named-entity query

Example

("harry potter", "walkthrough", "game")

triple <e, t, c>

class of entity

context terms

ambiguous term

contextterm class

Entity Recognition in Query

bull Probabilistic Generative Model

Goal Given a query q find triple lte t cgt maximize the probability

Probability to generate triple

assume context only depends on class

Objective given query q find

The problem then becomes how to estimate Pr(e) Pr(c|e) and Pr(t|c)

E.g., "walkthrough" only depends on the class "game", not on the entity "harry potter"

Entity Recognition in Query

bull Probability Estimation by Learning

Learning objective:

$$\max\ \prod_{i=1}^{N}P(e_i,t_i,c_i)$$

Challenge: it is difficult as well as time-consuming to manually assign class labels to named entities in queries.

Build a training set $T=\{(e_i,t_i)\}$ and view $c_i$ as a hidden variable. The new learning problem becomes

$$\max\ \prod_{i=1}^{N}P(e_i,t_i)=\max\ \prod_{i=1}^{N}\sum_{c}P(e_i)\,P(c\mid e_i)\,P(t_i\mid c)$$

solved with topic model WS-LDA
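Once the probabilities are estimated, the inference step can be sketched as follows; the probability tables here are toy numbers standing in for what WS-LDA would actually learn:

# Toy probability tables standing in for learned Pr(e), Pr(c|e), Pr(t|c).
P_e = {"harry potter": 0.6, "kung fu panda": 0.4}
P_c_given_e = {"harry potter": {"movie": 0.4, "book": 0.4, "game": 0.2},
               "kung fu panda": {"movie": 0.7, "game": 0.3}}
P_t_given_c = {"movie": {"watch": 0.5, "trailer": 0.3, "walkthrough": 0.01},
               "book": {"read": 0.6, "author": 0.3, "walkthrough": 0.01},
               "game": {"walkthrough": 0.5, "cheats": 0.3}}

def interpret(query):
    """Return the triple (e, t, c) maximizing Pr(e) * Pr(c|e) * Pr(t|c)."""
    best = (None, 0.0)
    for e in P_e:
        if e not in query:
            continue
        t = query.replace(e, "").strip()          # remaining context terms
        for c, p_ce in P_c_given_e[e].items():
            p = P_e[e] * p_ce * P_t_given_c.get(c, {}).get(t, 1e-6)
            if p > best[1]:
                best = ((e, t, c), p)
    return best

print(interpret("harry potter walkthrough"))
# -> (('harry potter', 'walkthrough', 'game'), ...) with these toy tables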

Signal from Click [Pantel et al 2012]

bull Motivation

Predict entity type in Web search

[Diagram: a query is modeled in terms of its entity, user intent, context, and click; the output is a query type distribution over 73 entity types, produced by a generative model.]

Signal from Click

bull Joint Model for Prediction

[Plate diagram of the joint generative model: for each query Q, a type is picked from the distribution over types, then an entity, an intent from the intent distribution, the context words, and the click, using per-type word, host, and entity distributions (variables t, τ, i, n, c, θ, φ, ω).]

Telegraphic Query interpretation [Sawant et al 2013 Joshi et al 2014]

bull Entity-seeking Telegraphic Queries

bull Interpretation = Segmentation + Annotation

Knowledge base Large corpus

accuracy recall

Germany capital

Berlin

Query

Result Entity

bull Overview

Joint Interpretation and Ranking [Sawant et al 2013 Joshi et al 2014]

Annotated Corpus

Telegraphic Query

e1e2e3

Two Models for Interpretation and Ranking

Generative Model

Discriminative Model

Output

bull Generative Model

Joint Interpretation and Ranking [Sawant et al 2013]

San Diego Padres

Major league baseball team

type context

E

T Padres have been to two World

Series losing in 1984 and 1998

Type hint

baseball team

losing team baseball world series 1998

Z

Context matchers

lost 1998 world seriesswitch

model model

q losing team baseball world series 1998

Borrow from U Sawant (2013)

Based on Probabilistic Language Models

bull Discriminative Model

Joint Interpretation and Ranking [Sawant et al 2013]

San_Diego_Padres

losing team baseball world

series 1998

(baseball team)

losing team baseball world

series 1998

(baseball team)

losing team baseball world

series 1998

(t = baseball team)

1998_World_Series

losing team baseball world

series 1998

(series)

losing team baseball world

series 1998

(series)

losing team baseball world

series 1998

(t = series)

Correct entity Incorrect entity

Based on max-margin discriminative learning

bull Queries seek answer entities (e2)

bull Contain (query) entities (e1) target types (t2) relations (r) and selectors (s)

Telegraphic Query Interpretation [Joshi et al 2014]

query e1 r t2 s

dave navarro first band

dave navarro band band first

dave navarro - band first

spider automobile company

spider automobile company

automobile company

-

automobile company company spider

Borrow from M Joshi (2014)

Improved Generative Model

bull Generative Model[Sawant et al 2013]

[Joshi et al 2014]Consider e1

(in q) and r

Improved Discriminative Model

bull Discriminative Model[Sawant et al 2013]

[Joshi et al 2014]

Consider e1

(in q) and r

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

bull Input a short text

bull Output semantic interpretation

bull Three steps in understanding a short text

wanna watch eagles band

watch[verb] eagles[entity](band) band[concept]

wanna watch eagles band watch[verb] eagles[entity](band) band[concept]

watch eagles band watch[verb] eagles[entity] band[concept]

Step 1 Text Segmentation ndash divide into a sequence of terms in vocabulary

Step 2 Type detection ndash determine the best type of each term

Step 3 Concept Labeling ndash infer the best concept of each entity within context

Text segmentationbull Observations

bull Mutual Exclusion ndash terms containing the same word mutually exclude each other

bull Mutual Reinforcement ndash related terms mutually reinforce each other

bull Build a Candidate Term Graph (CTG)

"vacation april in paris"  /  "watch harry potter"

[Figures: candidate term graphs (CTGs) for the two short texts. For "vacation april in paris" the candidate terms are vacation, april, paris, and april in paris; for "watch harry potter" they are watch and harry potter. Edges connect compatible terms and carry affinity weights (e.g., 0.047, 0.041, 0.029, 0.005 in the first graph; 0.092, 0.053, 0.018, 0.014 in the second).]

Find best segmentation

• Best segmentation = the sub-graph of the CTG which:
  • is a complete graph (clique), i.e., is a valid segmentation
  • contains no mutual exclusion
  • has 100% word coverage (except for stopwords)
  • has the largest average edge weight

[Same candidate term graphs as above, with the candidate sub-graphs highlighted.]

Find best segmentation

bull Best segmentation= sub-graph in CTG whichbull Is a complete graph (clique)

bull No mutual exclusion

bull Has 100 word coveragebull Except for stopwords

bull Has the largest average edge weight

[Same figures, highlighting the maximal clique that gives the best segmentation: {vacation, april in paris} and {watch, harry potter}.]
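A brute-force sketch of choosing the best segmentation from a candidate term graph; the candidate terms, mutual-exclusion rule, and edge weights are taken from the toy example above (the weights stand in for the real affinity scores from the knowledge base):

from itertools import combinations

# Candidate terms for "vacation april in paris", with toy affinity edge weights.
terms = ["vacation", "april", "paris", "april in paris"]
words_of = {"vacation": {"vacation"}, "april": {"april"}, "paris": {"paris"},
            "april in paris": {"april", "paris"}}        # "in" treated as a stopword
weight = {frozenset(["vacation", "april in paris"]): 0.047,
          frozenset(["vacation", "april"]): 0.005,
          frozenset(["vacation", "paris"]): 0.029,
          frozenset(["april", "paris"]): 0.041}
query_words = {"vacation", "april", "paris"}

def mutually_exclusive(a, b):
    return words_of[a] & words_of[b]        # terms sharing a word cannot co-occur

def best_segmentation():
    best, best_score = None, -1.0
    for r in range(1, len(terms) + 1):
        for cand in combinations(terms, r):
            if any(mutually_exclusive(a, b) for a, b in combinations(cand, 2)):
                continue                     # violates mutual exclusion
            if set().union(*(words_of[t] for t in cand)) != query_words:
                continue                     # must cover all (non-stop)words
            edges = [weight.get(frozenset(p), 0.0) for p in combinations(cand, 2)]
            score = sum(edges) / len(edges) if edges else 0.0
            if score > best_score:
                best, best_score = cand, score
    return best, best_score

print(best_segmentation())   # -> (('vacation', 'april in paris'), 0.047) with these toy weights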

Type Detection

• Pairwise Model: find the best typed-term for each term so that the Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight

watch[v]

watch[e]

watch[c]

watch

free[adj]

free[v]

movie[c]

movie[e]

free

movie

Concept Labeling

• Entity disambiguation is the most important task of concept labeling
• Filter/re-rank the original concept cluster vector
• Weighted-Vote: the final score of each concept cluster is a combination of its original score and the support from the context, computed using concept co-occurrence (a toy sketch follows)
• Example: "watch harry potter" → movie; "read harry potter" → book
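A toy sketch of the weighted-vote re-ranking; the original concept scores, the co-occurrence supports, and the interpolation weight ALPHA are invented for illustration:

# Original concept-cluster scores for the ambiguous entity "harry potter".
original = {"movie": 0.40, "book": 0.40, "character": 0.20}
# Toy co-occurrence support of each concept with the context term "watch".
support_from_context = {"movie": 0.80, "book": 0.10, "character": 0.10}
ALPHA = 0.5   # assumed interpolation weight between original score and context support

def weighted_vote(original, support, alpha=ALPHA):
    scores = {c: (1 - alpha) * original[c] + alpha * support.get(c, 0.0)
              for c in original}
    norm = sum(scores.values())
    return {c: s / norm for c, s in scores.items()}

print(weighted_vote(original, support_from_context))
# "movie" now dominates for "watch harry potter"; with a context term like
# "read", a different support table would instead push "book" to the top.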

Example of Entity Disambiguation[Hua et al 2015 (ICDE Best Paper) Hua et al 2016]

Co-occurrence network

Concept Vector

Semantic network

Short Text

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

c1 p1

c2 p2

c3 p3

hellip

ipad apple

fruithellip

companyhellip

foodhellip

producthellip

Is-A

filtering

ldquoipad applerdquo

producthellip

devicehellip

producthellip

brandhellip

companyhellip

devicehellip

co-occur

Is-A

Mining Lexical Relationships[Wang et al 2015b]

bull Lexical knowledge represented by the probabilities

[Diagram: for "watch harry potter", the term "watch" links to the role verb and "harry potter" to the concepts product, book, and movie.]

Examples: $p(\text{verb}\mid\text{watch})$, $p(\text{instance}\mid\text{watch})$, $p(\text{movie}\mid\text{harry potter})$, $p(\text{book}\mid\text{harry potter})$, $p(\text{movie}\mid\text{watch},\text{verb})$.

In general the model uses $p(c\mid t,z)$, $p(z\mid t)$, and $p(c\mid e)=p(c\mid t,z=\text{instance})$, where $e$ = instance, $t$ = term, $c$ = concept, $z$ = role.

Understanding Queries [Wang et al 2015b]

• Goal: to rank the concepts and find $\arg\max_{c}\ p(c\mid t,q)$

The offline semantic network

QueryAll possible

segmentations

Random walk with restart [Sun et al 2005] on the online subgraph
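A compact sketch of random walk with restart on a small online sub-graph; the graph, edge weights, and restart probability are toy assumptions, not the actual semantic network:

import numpy as np

# Toy online sub-graph for the query "watch harry potter":
# nodes = terms and candidate concepts; edges carry lexical/semantic weights.
nodes = ["watch", "harry potter", "movie", "book", "verb"]
W = np.array([[0.0, 1.0, 0.8, 0.1, 0.6],     # watch
              [1.0, 0.0, 0.7, 0.7, 0.0],     # harry potter
              [0.8, 0.7, 0.0, 0.2, 0.0],     # movie
              [0.1, 0.7, 0.2, 0.0, 0.0],     # book
              [0.6, 0.0, 0.0, 0.0, 0.0]])    # verb

def rwr(W, restart_nodes, c=0.3, iters=100):
    P = W / W.sum(axis=0, keepdims=True)          # column-normalized transition matrix
    r = np.zeros(len(W)); r[restart_nodes] = 1.0 / len(restart_nodes)
    p = r.copy()
    for _ in range(iters):
        p = (1 - c) * P @ p + c * r               # walk step plus restart at the query terms
    return p

scores = rwr(W, restart_nodes=[0, 1])
for n, s in sorted(zip(nodes, scores), key=lambda x: -x[1]):
    print(n, round(s, 3))
# With these toy weights "movie" outranks "book" as the concept for "harry potter".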

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

• Example: "popular smart cover iphone 5s"
• Definitions:
  • Head: names the general (semantic) category to which the whole short text belongs; usually the head is the intent of the short text
    • "smart cover": the intent of the query
  • Constraints: distinguish this member from other members of the same category
    • "iphone 5s": limits the type of the head
  • Non-Constraint Modifiers (aka Pure Modifiers): subjective modifiers which can be dropped without changing the intent
    • "popular": subjective, can be neglected

Non-Constraint Modifiers Mining Construct Modifier Networks

Edges form a Modifier Network

Concept Hierarchy Tree in ldquoCountryrdquo domain

Modifier Network in ldquoCountryrdquo domainIn this case ldquoLargerdquo and ldquoToprdquo are pure modifiers

Country

Asian country

Developed country

Western country

Asian Developed Western

Western developed

country

Top western country

Large

Large Top

Top

WesternLarge Asian

country

Large developed

country

Top developed

country

Country

Asian Western

Developed

Large Top

• Betweenness centrality is a measure of a node's centrality in a network
• The betweenness of node $v$ is defined as $g(v)=\sum_{s\neq v\neq t}\frac{\sigma_{st}(v)}{\sigma_{st}}$
• where $\sigma_{st}$ is the total number of shortest paths from node $s$ to node $t$, and $\sigma_{st}(v)$ is the number of those paths that pass through $v$
• Normalization & Aggregation
• A pure modifier should have a low betweenness-centrality aggregation score PMS(t)

Non-Constraint Modifiers Mining: Betweenness centrality
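As a small illustration with networkx, using a toy version of the "country"-domain modifier network above (the specific edges are assumptions, and the PMS aggregation across domains is not shown):

import networkx as nx

# Toy modifier network for the "country" domain:
# constraint modifiers (asian, western, developed) interconnect,
# while pure modifiers (large, top) only hang off other nodes.
G = nx.Graph()
G.add_edges_from([("asian", "western"), ("asian", "developed"),
                  ("western", "developed"),
                  ("large", "asian"), ("large", "developed"),
                  ("top", "western"), ("top", "developed")])

centrality = nx.betweenness_centrality(G, normalized=True)
for node, score in sorted(centrality.items(), key=lambda x: x[1]):
    print(node, round(score, 3))
# "large" and "top" end up with the lowest betweenness, consistent with
# treating them as pure (non-constraint) modifiers.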

Head-Constraints Mining [Wang et al 2014b]

bull A term can be a head sometimes and be a constraint in some other cases

bull Eg Seattle hotel Seattle hotel job

head headconstraintconstraintconstraint

Head-Constraints Mining Acquiring Concept Patterns

Get entity pairs from query log

Conceptualization

Concept Patterns for each

prepositionsentity1 entity2

Extract Patterns

A for B A of BA with B A in BA on B A at B hellip

entity 1head entity 2constraint

concept11

concept12

concept13

concept14

concept21

concept22

concept23

(concept11 concept21) (concept11 concept22)(concept11 concept23)hellip

Concept Pattern Dictionary

Building concept pattern dictionary Query Logs

Example queries: cover for iphone 6s; battery for sony a7r; wicked on broadway (a toy lookup sketch follows)
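A toy lookup illustrating how a concept-pattern dictionary decides head vs. constraint for a new entity pair; the conceptualization results and dictionary scores are made up for illustration:

# Toy basic-level concepts for entities and a toy concept-pattern dictionary
# mapping (head concept, constraint concept) -> support score.
concept_of = {"smart cover": "accessory", "iphone 5s": "phone",
              "battery": "accessory", "sony a7r": "camera",
              "angry birds": "game", "android": "platform"}
pattern_dict = {("accessory", "phone"): 24.65, ("game", "platform"): 62.74,
                ("accessory", "camera"): 12.10}

def head_constraint(e1, e2):
    c1, c2 = concept_of[e1], concept_of[e2]
    s12 = pattern_dict.get((c1, c2), 0.0)    # e1 as head, e2 as constraint
    s21 = pattern_dict.get((c2, c1), 0.0)    # e2 as head, e1 as constraint
    if s12 == s21 == 0.0:
        return None                          # no evidence either way
    return (e1, e2) if s12 >= s21 else (e2, e1)

print(head_constraint("smart cover", "iphone 5s"))   # ('smart cover', 'iphone 5s')
print(head_constraint("android", "angry birds"))     # ('angry birds', 'android')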

Why Concepts Can't Be Too General
• It may cause too many concept-pattern conflicts: we can't distinguish head and modifier for general concept pairs

Head Modifier

Derived Concept Pattern device company

Supporting Entity Pairs iphone 4 verizon

modem comcast

wireless router comcast

iphone 4 tmobile

Head Modifier

Derived Concept Pattern company device

Supporting Entity Pairs amazon books kindle

netflix touchpad

skype windows phone

netflix ps3

Conflict

Why Concepts Can't Be Too Specific
• It may generate concepts with little coverage: the concept regresses to the entity
• Large storage space: up to (million × million) patterns

hellip hellip

device largest desktop OS vendor

device largest software development company

device largest global corporation

device latest windows and office provider

hellip hellip

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept PatternsCluster size Sum of Cluster Score headconstraintscore

615 2114691 breedstate357298460224501

296 7752357 gameplatform627403476771856

153 3466804 accessoryvehicle53393705094809

70 118259 browserplatform132612807637391

22 1010993 requirementschool271407526294823

34 9489159 drugdisease154602405333541

42 8992995 cosmeticskin condition814659415003929

16 7421599 jobcity27903732555528

32 710403 accessoryphone246513830851194

18 6692376 softwareplatform210126322725878

20 6444603 testdisease239774028397537

27 5994205 clothesbreed98773996282851

19 5913545 penaltycrime200544192793488

25 5848804 taxstate240081818612579

16 5465424 saucemeat183592863621553

18 4809389 credit cardcountry142919087972152

14 4730792 foodholiday14554140330924

11 4536199 modgame257163856882439

29 4350954 garmentsport471533326845442

23 3994886 career informationprofessional732726483731257

15 386065 songinstrument128189481818135

18 378213 baitfish780426514113169

22 3722948 study guidebook508339765053921

19 3408953 pluginsbrowser550326072627126

14 3305753 recipemeat882779863422951

18 3214226 currencycountry110825444188352

13 3180272 lenscamera186081673263957

9 316973 decorationholiday130055844126533

16 314875 foodanimal7338544366514

game platform

game device

video game platform

game console game pad

game gaming platform

Game (Head) Platform (Modifier)

angry birds android

angry birds ios

angry birds windows 10

hellip hellip

Detection

Head Modifier Relationship

bull Train a classifier on

(head-embedding modifier-embedding)

bull Training data bull Positive (head modifier)bull Negative (modifier head)

bull Precision gt= 09 Recall gt= 09

bull Disadvantage not interpretable

Syntactic Parsing based on HM

• Head-modifier information alone is incomplete:
  • prepositions and other function words are not covered
  • neither are relations within a noun compound, e.g., "el capitan macbook pro"
• Why not train a parser for web queries?

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges: Short Texts Lack Grammatical Signals
• Lack of function words and word order
  • "toys queries" has ambiguous intent
  • "distance earth moon" has clear intent, but many equivalent forms: "earth moon distance", "earth distance moon", …

Challenges Syntactic Parsing of Queries

bullNo standard

bullNo ground-truth

Why is syntactic parsing of queries even a legitimate problem

Derive Syntax from Semantics[Sun et al 2016]

bull Query ldquothai food houstonrdquo

bull Clicked sentence

bull Project dependency to the query

A Treebank for Short Texts

bull Given query 119902

bull Given 119902rsquos clicked sentence 119904

bull Parse each 119904

bull Project dependency from 119904 to 119902

bull Aggregate dependencies

Algorithm of Projection
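A minimal sketch of the projection step; the tokenized sentence, its dependency arcs, and the word alignment are toy inputs (real arcs come from parsing the clicked sentences):

# Clicked sentence tokens and dependency arcs (head_index, dependent_index, label),
# a toy output of a standard dependency parser; 0-based indices, -1 = root.
sentence = ["the", "best", "thai", "food", "in", "houston"]
arcs = [(3, 0, "det"), (3, 1, "amod"), (3, 2, "amod"),
        (-1, 3, "root"), (3, 4, "case"), (3, 5, "nmod")]   # simplified attachment of "houston"

query = ["thai", "food", "houston"]
# Alignment: query position -> sentence position (here by exact word match).
align = {q_i: sentence.index(w) for q_i, w in enumerate(query)}
inv = {s_i: q_i for q_i, s_i in align.items()}

def project(arcs, inv):
    """Keep arcs whose head and dependent both map onto query tokens."""
    projected = []
    for head, dep, label in arcs:
        if head in inv and dep in inv:
            projected.append((inv[head], inv[dep], label))
        elif head == -1 and dep in inv:
            projected.append((-1, inv[dep], "root"))
    return projected

print(project(arcs, inv))
# -> [(1, 0, 'amod'), (-1, 1, 'root'), (1, 2, 'nmod')]:
#    "food" becomes the root of the query, with "thai" and "houston" attached to it.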

Result Examples

Results

• Random queries: QueryParser UAS 0.83, LAS 0.75; Stanford UAS 0.72, LAS 0.64
• Queries with no function words: QueryParser UAS 0.82, LAS 0.73; Stanford UAS 0.70, LAS 0.61
• Queries with function words: QueryParser UAS 0.90, LAS 0.85; Stanford UAS 0.86, LAS 0.80

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

bull Measuring similarity between two short texts and sentences

bull Basic idea word-by-word comparison using embedding vector

bull Use saliency-weighted semantic graph to computer similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired

Bins of all edges Bins of max edges

$$f_{sts}(s_l,s_s)=\sum_{w\in s_l}IDF(w)\cdot\frac{sem(w,s_s)\cdot(k_1+1)}{sem(w,s_s)+k_1\cdot\bigl(1-b+b\cdot\frac{|s_s|}{avgl}\bigr)}$$

Here $s_l$ and $s_s$ are the two short texts, $w$ ranges over the terms of $s_l$, and $sem(w,s_s)$ is the semantic (embedding-based) similarity of $w$ to the short text $s_s$; the weighting scheme is inspired by BM25.
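A sketch of this saliency-weighted similarity, using toy two-dimensional word vectors, toy IDF values, and assumed BM25 hyperparameters in place of real word embeddings and corpus statistics:

import math

# Toy word embeddings and IDF values (stand-ins for word2vec vectors and corpus IDF).
vec = {"cheap": [0.9, 0.1], "low": [0.85, 0.2], "cost": [0.8, 0.3],
       "flights": [0.2, 0.9], "airfare": [0.25, 0.85]}
idf = {"cheap": 1.2, "flights": 1.5, "low": 1.1, "cost": 1.3, "airfare": 1.6}
K1, B, AVG_LEN = 1.2, 0.75, 3.0   # BM25-style hyperparameters (assumed)

def cos(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a*a for a in u)) * math.sqrt(sum(b*b for b in v)))

def sem(w, short_text):
    # Best embedding match of w against the words of the other short text.
    return max(cos(vec[w], vec[w2]) for w2 in short_text)

def f_sts(s_l, s_s):
    score = 0.0
    for w in s_l:
        s = sem(w, s_s)
        score += idf[w] * (s * (K1 + 1)) / (s + K1 * (1 - B + B * len(s_s) / AVG_LEN))
    return score

print(round(f_sts(["cheap", "flights"], ["low", "cost", "airfare"]), 3))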

From the Concept View

From the Concept View [Wang et al 2015a]

Co-occurrence Network

Bags of Concepts

Semantic Network

Short Text 1

Short Text 2

Concept Vector 1[(c1 score1) (c2 score2)hellip]

Concept Vector 2[(c1rsquo score1rsquo) (c2rsquo score2rsquo)hellip]

Similarity

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

• Explicit short text understanding benefits a lot of application scenarios:
• Ads/search semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

[Bar charts: ads keyword selection results by query decile (Decile 4 through Decile 10), shown separately for Mainline Ads (y-axis 0.00–6.00) and Sidebar Ads (y-axis 0.00–0.60).]

Definition Mining [Hao et al 2016]

• Definition scenarios: search engines, QnA, etc.
• Why is conceptualization useful for definition mining? Example: "What is Emphysema?"
  • Answer 1: "Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year."
  • Answer 2: "Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing."
• These sentences have the form of a definition; embedding is helpful to some extent, but it also returns a high similarity score for (emphysema, disease) and (emphysema, smoking)
• Conceptualization can provide strong semantics; contextual embedding can also provide semantic similarity beyond Is-A

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

[System diagram. Online: an original short text (e.g., "justin bieber graduates") goes through entity extraction, conceptualization against the knowledge base, concept weighting to form a concept vector, candidate generation, and finally classification & ranking, producing labels such as <Music, Score>. Offline: training data and model learning produce one concept model per class (Class 1 … Class i … Class N).]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

[Diagrams: categories (TV, Music, Movie, …) and queries are mapped into a shared concept space; each category is represented by concepts derived from the article titles/tags in that category (denoted p_i, p_j and ω_i, ω_j), and an incoming query is conceptualized into the same space and compared against the category representations.]

Precision performance on each category [Wang et al 2014a]

Category | BocSTC | LM_ch | SVM | VSM_cosine | LM_d | Entity_ESA
Movie | 0.71 | 0.91 | 0.84 | 0.81 | 0.72 | 0.56
Money | 0.97 | 0.95 | 0.54 | 0.57 | 0.52 | 0.74
Music | 0.97 | 0.90 | 0.88 | 0.73 | 0.68 | 0.58
TV | 0.96 | 0.46 | 0.92 | 0.56 | 0.51 | 0.55

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

• [ Bollacker et al 2008 ] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, Jamie Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In SIGMOD 2008

• [ Auer et al 2007 ] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, Zachary G. Ives. DBpedia: A Nucleus for a Web of Open Data. In ISWC/ASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

• [ Wu et al 2015 ] Sen Wu, Ce Zhang, Christopher De Sa, Jaeho Shin, Feiran Wang, and C. Ré. Incremental Knowledge Base Construction Using DeepDive. In VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

• [ Nastase et al 2010 ] Vivi Nastase, Michael Strube, Benjamin Börschinger, Cäcilia Zirn, and Anas Elghafari. WikiNet: A very large scale multi-lingual concept network. In LREC 2010

• [ Speer et al 2013 ] Robert Speer and Catherine Havasi. ConceptNet 5: A large semantic network for relational knowledge. The People's Web Meets NLP, Springer Berlin Heidelberg, 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

• [ Li et al 2015 ] Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang, Xue-Gang Hu, and Xindong Wu. A Large Probabilistic Semantic Network based Approach to Compute Term Similarity. IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10): 2604-2617, 2015

• [ Rosch et al 1976 ] Eleanor Rosch, Carolyn B. Mervis, Wayne D. Gray, David M. Johnson, and Penny Boyes-Braem. Basic objects in natural categories. Cognitive Psychology 8(3): 382-439, 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

• [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke. Short text similarity with word embeddings. In CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

• [ Wang et al 2012b ] Yue Wang, Hongsong Li, Haixun Wang, and Kenny Zhu. Toward Topic Search on the Web. In International Conference on Conceptual Modeling, October 2012

Page 18: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

countries Basic watercolor techniques

Celebrity wedding dress designers

Probase

bull Microsoft Research

bull Public release coming soon in AugSept 2016 bull Project homepage httpresearchmicrosoftcomprobase

Concepts

Authors

URLs

Probase isA error rate lt1 1 and lt10 for random pair

Freebase [Bollacker et al 2008]

bull Freebase is a well-known collaborative knowledge base consisting of data composed mainly by its community

bull Freebase contains more than 23 million entitiesbull Freebase contains 19 billion triplesbull Each triple is organized as form of

ltsubjectgt ltpredicategt ltobjectgt

Brief Introduction

Statistics

bull Freebase is a collection of factsbull Freebase only contains nodes

and linksbull Freebase is a labeled graph

Freebase -gt Wiki Data

bull Freebase data was integrated into Wikidatabull The Freebase API will be completely shut-down on Aug 31 2016

replaced by Google Knowledge Graph API

bull Freebase Community

bull Homepage httpwikifreebasecomwikiMain_Pagebull Download httpsdevelopersgooglecomfreebasebull Wikidata httpswwwwikidataorg

News

Authors

URLs

Google Knowledge Graph

bull Knowledge Graph is a knowledge base used by Google to enhance its search engines search results with semantic-search information gathered from a wide variety of sources

bull 570 million objects and more than 18 billion facts about relationshipsbetween different objects

bull Google Inc

bull Homepage httpswwwgooglecomintles419insidesearchfeaturessearchknowledgehtml

Brief Introduction

Statistics

Sample

Authors

URLs

YAGO [Suchanek et al 2007]

bull YAGO is a huge semantic knowledge base derived from GeoNames WordNet and Wikipedia (10 Wikipedias in different languages)

• More than 10 million entities (persons, organizations, cities, etc.)
• More than 120 million facts about entities
• More than 35,000 classes assigned to entities
• Many of its facts and entities are attached to a temporal dimension and a spatial dimension

Brief Introduction

SampleltAlbert_Einsteingt ltisMarriedTogt ltElsa_Einsteingt

Statistics

YAGO

News
• An evaluated version of YAGO3 (combining information from Wikipedias in different languages) was released [15 Sep 2015]

Authors
• Max Planck Institute for Informatics in Saarbrücken, Germany, and the DBWeb group at Télécom ParisTech University

URLs
• Homepage: http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago
• Download: http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/downloads

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Statistics of Search Queries

[Pie charts: distribution of query length. (a) By traffic: 1 term 44%, 2 terms 29%, 3 terms 17%, 4 terms 7%, 5 terms 2%, more than 5 terms 1%. (b) By # of distinct queries: 1 term 10%, 2 terms 26%, 3 terms 34%, 4 terms 19%, 5 terms 7%, more than 5 terms 4%. Example queries: Pokémon Go, Microsoft HoloLens. Companion charts show the distribution of the number of instances per query (1, 2, 3, 4, 5, or more than 5 instances).]

If the short text is a single instancehellip

• Python  • Microsoft  • Apple  • …

Single Instance Understanding

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

Word Ambiguity
• Word sense disambiguation relies on dictionaries (e.g., WordNet)

Take a seat on this chair

The chair of the Math Department

Instance Ambiguity

bull Instance sense disambiguation extra knowledge needed

I have an apple pie for lunch

He bought an apple ipad

Here "apple" is a proper noun

Ambiguity [Hua et al 2016]

bull Many instances are ambiguous

bull Intuition ambiguous instances have multiple senses

short text | instance | sense
population china | china | country
glass vs china | china | fragile item
pear apple | apple | fruit
microsoft apple | apple | company
read harry potter | harry potter | book
watch harry potter | harry potter | movie
age of harry potter | harry potter | character

Pre-definition for Ambiguity (1) Sense [Hua et al 2016]

• What is a Sense in semantic networks?
• A sense is a hierarchy of concept clusters

[Figure: example senses as hierarchies of concept clusters, e.g., region (country, state, city) for "Germany"; creature (animal, predator); crop/food (fruit, vegetable, meat).]

Pre-definition for Ambiguity (2) Concept Cluster [Li et al 2013 Li et al 2015]

• What is a Concept Cluster (CL)?
• Cluster similar concepts into a concept cluster using a K-Means-like approach (k-Medoids)

[Figure: two example concept clusters. "Fruit": fruit, fresh fruit, juice, tropical fruit, berry, exotic fruit, seasonal fruit, fruit juice, citrus fruit, soft fruit, dry fruit, wild fruit, local fruit, … "Company": company, client, firm, manufacturer, corporation, large company, rival, giant, big company, local company, large corporation, international company, …]

Definitions of Instance Ambiguity [Hua et al 2016]

• 3 levels of instance ambiguity
• Level 0: unambiguous
  • Contains only 1 sense
  • E.g., dog (animal), beijing (city), potato (vegetable)
• Level 1: unambiguous and ambiguous both make sense
  • Contains 2 or more senses, but these senses are related
  • E.g., google (company & search engine), french (language & country), truck (vehicle & public transport service)
• Level 2: ambiguous
  • Contains 2 or more senses, and the senses are very different from each other
  • E.g., apple (fruit & company), jaguar (animal & company), python (animal & language)

Ambiguity Score

• Using the top-2 senses to calculate the ambiguity score:

  score = 0                                                      if level = 0
  score = w(s2|e) / w(s1|e) * (1 - similarity(s1, s2))           if level = 1
  score = 1 + w(sc2|e) / w(sc1|e) * (1 - similarity(sc1, sc2))   if level = 2

Denote the top-2 senses as s1 and s2, and the top-2 sense clusters as sc1 and sc2. Denote the similarity of two sense clusters as the maximum similarity of their senses:

  similarity(sc1, sc2) = max similarity(si ∈ sc1, sj ∈ sc2)

For an entity e, denote the weight (popularity) of a sense si as the sum of the weights of its concept clusters:

  w(si|e) = w(Hi|e) = Σ_{CLj ∈ Hi} P(CLj|e)

For an entity e, denote the weight (popularity) of a sense cluster sci as the sum of the weights of its senses:

  w(sci|e) = Σ_{sj ∈ sci} w(sj|e)
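To make the scoring concrete, here is a minimal Python sketch (not the paper's implementation) that ranks an entity's sense clusters by weight w(sc|e) and combines the top two; the 0.5 relatedness threshold separating level 1 from level 2 is an assumption for illustration only.

```python
from typing import Dict, Tuple

def ambiguity_score(sense_cluster_weights: Dict[str, float],
                    similarity: Dict[Tuple[str, str], float]) -> float:
    """Ambiguity score from the top-2 sense clusters of an entity.

    sense_cluster_weights: w(sc|e) for each sense cluster of the entity.
    similarity: pairwise similarity of sense clusters (max similarity of
    their member senses), assumed symmetric and in [0, 1].
    """
    ranked = sorted(sense_cluster_weights.items(), key=lambda kv: -kv[1])
    if len(ranked) < 2:
        return 0.0                       # level 0: only one sense cluster
    (sc1, w1), (sc2, w2) = ranked[0], ranked[1]
    sim = similarity.get((sc1, sc2), similarity.get((sc2, sc1), 0.0))
    ratio = w2 / w1 if w1 > 0 else 0.0
    if sim > 0.5:                        # assumed threshold: related senses -> level 1
        return ratio * (1.0 - sim)
    return 1.0 + ratio * (1.0 - sim)     # unrelated senses -> level 2

# toy example: "apple" with a fruit-like and a company-like sense cluster
print(ambiguity_score({"fruit/food/tree": 0.537, "company/brand": 0.271},
                      {("fruit/food/tree", "company/brand"): 0.05}))
```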

Examples

• Level 0
  • california: country, state, city, region, institution (0.943)
  • fruit: food, product, snack, carbs, crop (0.827)
  • alcohol: substance, drug, solvent, food, addiction (0.523)
  • computer: device, product, electronics, technology, appliance (0.537)
  • coffee: beverage, product, food, crop, stimulant (0.73)
  • potato: vegetable, food, crop, carbs, product (0.896)
  • bean: food, vegetable, crop, legume, carbs (0.801)

Examples (cont.)
• Level 1
  • nike, score = 0.034: company, store 0.861; brand 0.035; shoe, product 0.033
  • twitter, score = 0.035: website, tool 0.612; network 0.165; application 0.033; company 0.031
  • facebook, score = 0.037: website, tool 0.595; network 0.17; company 0.053; application 0.029
  • yahoo, score = 0.38: search engine 0.457; company, provider, account 0.281; website 0.0656
  • google, score = 0.507: search engine 0.46; company, provider, organization 0.377; website 0.0449

Examples (cont.)
• Level 2
  • jordan, score = 1.02: country, state, company, regime 0.92; shoe 0.02
  • fox, score = 1.09: animal, predator, species 0.74; network 0.064; company 0.035
  • puma, score = 1.15: brand, company, shoe 0.655; species, cat 0.116
  • gold, score = 1.21: metal, material, mineral resource, mineral 0.62; color 0.128

Examples (cont.)
• Level 2
  • soap, score = 1.22: product, toiletry, substance 0.49; technology, industry, standard 0.11
  • silver, score = 1.24: metal, material, mineral resource, mineral 0.638; color 0.156
  • python, score = 1.29: language 0.667; snake, animal, reptile, skin 0.193
  • apple, score = 1.41: fruit, food, tree 0.537; company, brand 0.271

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

A Concept View of "Microsoft"

[Figure: "Microsoft" is mapped to a weighted set of concepts, e.g., company, software company, international company, technology leader, largest desktop OS vendor, …]

Basic-level Conceptualization (BLC) [Rosch et al 1976]

[Figure: basic-level conceptualization maps instances such as KFC and BMW to their basic-level concepts.]

How to Make BLC

• Naive approaches
  • Typicality: an important measure for understanding the relationship between an object and its concept
  • Pointwise Mutual Information (PMI): a common measure of the strength of association between two terms

Naive Approach 1: Typicality

P(robin | bird) > P(penguin | bird): "robin" is a more typical bird than "penguin".
P(USA | country) > P(Seychelles | country): "USA" is a more typical country than "Seychelles".

Using Typicality for BLC

• Associate each isA relationship (e isA c) with typicality scores P(e|c) and P(c|e):

  P(e|c) = n(c, e) / n(c)        P(c|e) = n(c, e) / n(e)

• P(e|c) indicates how typical (or popular) e is in the given concept c
• P(c|e) indicates how typical (or popular) the concept c is given e
• However, for "Microsoft": "company" has high typicality P(c|e), while "largest desktop OS vendor" has high typicality P(e|c)

Naive Approach 2: PMI [Manning and Schutze 1999]

• Pointwise mutual information (PMI) is a measure of association used in information theory and statistics
• Consider using the PMI between concept c and instance e to find the basic-level concepts as follows:

  PMI(e, c) = log [ P(e, c) / (P(e) P(c)) ] = log P(e|c) - log P(e)

• However: in basic level of categorization we are interested in finding a concept for a given e, which means P(e) is a constant
• Thus ranking by PMI(e, c) is the same as ranking by P(e|c)

Using Rep(e c) for BLC [Wang et al 2015b]

• The measure Rep(e, c) = P(c|e) * P(e|c) means:
  • Given e, c should be its typical concept (shortest distance)
  • Given c, e should be its typical instance (shortest distance)

• (Relation to PMI) If we take the logarithm of the scoring function, we get

  log Rep(e, c) = log [ P(c|e) * P(e|c) ]
               = log [ P(e, c)/P(e) * P(e, c)/P(c) ]
               = log [ P(e, c)^2 / (P(e) P(c)) ]
               = PMI(e, c) + log P(e, c)
               = PMI^2(e, c)

• (Relation to Commute Time) The commute time between an instance e and a concept c is

  Time(e, c) = Σ_{k=1}^{∞} (2k) * P_k(e, c)
             = Σ_{k=1}^{T} (2k) * P_k(e, c) + Σ_{k=T+1}^{∞} (2k) * P_k(e, c)
             ≥ Σ_{k=1}^{T} (2k) * P_k(e, c) + 2(T+1) * (1 - Σ_{k=1}^{T} P_k(e, c))
             = 4 - 2 * Rep(e, c)   (taking T = 1, with P_1(e, c) = P(c|e) * P(e|c) = Rep(e, c))

  so ranking by Rep(e, c) is a process of finding concept nodes having the shortest expected distance to e
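A toy sketch of the three scores discussed above (P(e|c), P(c|e), and Rep(e, c) = P(c|e)·P(e|c)) computed from hypothetical isA co-occurrence counts; the counts and concept names below are made up for illustration and are not Probase data.

```python
from collections import defaultdict

# isA co-occurrence counts n(c, e): how often instance e appears under concept c
# in an isA network (toy numbers, for illustration only)
n_ce = {
    ("company", "microsoft"): 6000,
    ("software company", "microsoft"): 5000,
    ("largest desktop os vendor", "microsoft"): 30,
    ("company", "apple"): 8000,
    ("software company", "adobe"): 1000,
}

n_c = defaultdict(int)   # n(c) = sum over e of n(c, e)
n_e = defaultdict(int)   # n(e) = sum over c of n(c, e)
for (c, e), n in n_ce.items():
    n_c[c] += n
    n_e[e] += n

def p_e_given_c(e, c): return n_ce.get((c, e), 0) / n_c[c]
def p_c_given_e(e, c): return n_ce.get((c, e), 0) / n_e[e]

def rep(e, c):
    # Rep(e, c) = P(c|e) * P(e|c): balances typicality in both directions
    return p_c_given_e(e, c) * p_e_given_c(e, c)

concepts = [c for (c, e) in n_ce if e == "microsoft"]
print(sorted(concepts, key=lambda c: -rep("microsoft", c)))
```

With these toy counts the very general "company" and the very specific "largest desktop os vendor" both lose to "software company", which is the intended basic-level behaviour.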

Precision / NDCG, k = 1, 2, 3, 5, 10, 15, 20

No smoothing

MI(e) 0769 0692 0705 0685 0719 0705 0690

PMI3(e) 0885 0769 0756 0800 0754 0733 0721

NPMI(e) 0692 0692 0667 0638 0627 0610 0610

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0500 0462 0526 0523 0523 0510 0521

Rep(e) 0846 0865 0872 0862 0758 0731 0719

Smoothing = 0.001

MI(e) 0577 0615 0628 0600 0612 0605 0592

PMI3(e) 0731 0673 0692 0654 0669 0644 0623

NPMI(e) 0923 0827 0769 0746 0731 0695 0671

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0554

Typicality P(e|c) 0885 0865 0872 0831 0785 0741 0704

Rep(e) 0846 0731 0718 0723 0700 0669 0638

Smoothing = 0.0001

MI(e) 0615 0615 0654 0608 0635 0628 0612

PMI3(e) 0846 0731 0731 0715 0723 0685 0677

NPMI(e) 0885 0904 0885 0869 0823 0777 0752

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0885 0904 0910 0877 0831 0813 0777

Rep(e) 0923 0846 0833 0815 0781 0736 0719

Smoothing=1e-5

MI(e) 0615 0635 0667 0662 0677 0656 0646

PMI3(e) 0885 0769 0744 0777 0758 0731 0710

NPMI(e) 0885 0846 0872 0869 0831 0810 0787

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0769 0808 0846 0823 0808 0782 0765

Rep(e) 0885 0904 0872 0862 0812 0800 0767

Smoothing=1e-6

MI(e) 0769 0673 0705 0677 0700 0692 0679

PMI3(e) 0885 0769 0756 0785 0773 0726 0723

NPMI(e) 0885 0846 0821 0815 0750 0726 0719

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0538 0615 0615 0615 0608 0613 0615

Rep(e) 0846 0885 0897 0877 0788 0777 0765

Smoothing=1e-7

MI(e) 0769 0692 0705 0685 0719 0703 0688

PMI3(e) 0885 0769 0756 0792 0758 0736 0725

NPMI(e) 0769 0750 0718 0700 0650 0641 0633

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0500 0481 0526 0523 0531 0523 0523

Rep(e) 0846 0865 0872 0854 0765 0749 0733

No smoothing, k = 1, 2, 3, 5, 10, 15, 20

MI(e) 0516 0531 0519 0531 0562 0574 0594

PMI3(e) 0725 0664 0652 0660 0628 0631 0646

NPMI(e) 0599 0597 0579 0554 0540 0539 0549

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0401 0386 0396 0398 0401 0410 0428

Rep(e) 0758 0771 0745 0723 0656 0647 0661

Smoothing=1e-3

MI(e) 0374 0414 0441 0448 0473 0481 0495

PMI3(e) 0484 0511 0509 0502 0519 0525 0533

NPMI(e) 0692 0652 0607 0603 0585 0585 0592

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0460

Typicality P(e|c) 0703 0697 0704 0681 0637 0628 0626

Rep(e) 0621 0580 0554 0561 0554 0555 0559

Smoothing=1e-4

MI(e) 0407 0430 0458 0462 0492 0503 0512

PMI3(e) 0648 0604 0579 0575 0578 0576 0590

NPMI(e) 0747 0777 0761 0737 0700 0685 0688

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0791 0795 0802 0767 0738 0729 0724

Rep(e) 0758 0714 0711 0689 0653 0636 0653

Smoothing=1e-5

MI(e) 0429 0465 0478 0501 0517 0528 0545

PMI3(e) 0725 0647 0642 0642 0627 0624 0638

NPMI(e) 0813 0779 0778 0765 0730 0723 0729

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0709 0728 0735 0722 0702 0696 0703

Rep(e) 0791 0787 0762 0739 0707 0703 0706

Smoothing=1e-6

MI(e) 0516 0510 0515 0526 0546 0563 0579

PMI3(e) 0725 0655 0651 0654 0641 0631 0649

NPMI(e) 0791 0766 0732 0728 0673 0659 0668

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0495 0516 0520 0508 0512 0521 0540

Rep(e) 0758 0784 0767 0755 0691 0686 0694

Smoothing=1e-7

MI(e) 0516 0531 0519 0530 0562 0571 0592

PMI3(e) 0725 0664 0652 0658 0630 0631 0647

NPMI(e) 0670 0655 0633 0604 0575 0570 0581

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0423 0421 0415 0407 0414 0424 0438

Rep(e) 0758 0771 0745 0725 0663 0661 0668

Evaluations on Different Measures for BLC

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

What is the Semantic Similarity?
• Are the following instance pairs similar?
  • <apple, microsoft>
  • <apple, pear>
  • <apple, fruit>
  • <apple, food>
  • <apple, ipad>
  • <car, journey>

Approaches on Term Similarity

• Categories of approaches for semantic similarity
  • String based approach
  • Knowledge based approach: use preexisting thesauri, taxonomies or encyclopedias, such as WordNet
  • Corpus based approach: use contexts of terms extracted from web pages, web search snippets or other text repositories
  • Embedding based approach: introduced in detail in "Part 3: Implicit Understanding"


Approaches on Term Similarity (2)

• Categories

[Figure: a taxonomy of term-similarity approaches: string based approaches; knowledge based approaches (WordNet), including path length / lexical chain-based and information content-based methods; and corpus based approaches, including graph learning algorithm based and snippet search based methods. Representative works: Rada 1989, Resnik 1995, Jcn 1997, Lin 1998, Sánchez 2011, Agirre 2010, Alvarez 2007, HunTray 2005, Hirst 1998, Do 2009, Bol 2011, Chen 2006, Ban 2002; state-of-the-art approaches are marked.]

• Framework

Term Similarity Using Semantic Networks [Li et al 2013, Li et al 2015]

[Figure: the framework takes a term pair <t1, t2> and proceeds in three steps.
Step 1: Type checking, which routes the pair to the concept-pair, entity-pair, or concept-entity-pair case.
Step 2: Context representation, collecting entity-distribution or concept-distribution context vectors T(t1) and T(t2); for an entity term, its concepts are collected and clustered, and the top-k concepts of each cluster are selected.
Step 3: Context similarity, e.g., cosine(T(t1), T(t2)) for entity pairs, max over cluster pairs cosine(Cx(t1), Cy(t2)) after concept clustering, or max similarity over <t2, cx> pairs for concept-entity pairs.]

An example [Li et al 2013 Li et al 2015]

For example, <banana, pear>:
  Step 1: Type checking → entity pair
  Step 2: Context representation → collect concept context vectors T(banana) and T(pear)
  Step 3: Context similarity → cosine(T(t1), T(t2)) = 0.916
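A minimal sketch of Step 3 for an entity pair, assuming the concept context vectors T(t1) and T(t2) have already been produced by conceptualization; the toy vectors below are illustrative and are not output of [Li et al 2013].

```python
import math

# toy concept context vectors T(t) for two instance terms
T = {
    "banana": {"fruit": 0.62, "food": 0.21, "tropical fruit": 0.12, "snack": 0.05},
    "pear":   {"fruit": 0.58, "food": 0.25, "tree": 0.10, "snack": 0.07},
}

def cosine(u, v):
    # cosine similarity of two sparse vectors stored as dicts
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

print(round(cosine(T["banana"], T["pear"]), 3))
```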

Examples

Term 1 | Term 2 | Similarity
lunch | dinner | 0.9987
tiger | jaguar | 0.9792
car | plane | 0.9711
television | radio | 0.9465
technology company | microsoft | 0.8208
high impact sport | competitive sport | 0.8155
employer | large corporation | 0.5353
fruit | green pepper | 0.2949
travel | meal | 0.0426
music | lunch | 0.0116
alcoholic beverage | sports equipment | 0.0314
company | table tennis | 0.0003

Complete results: http://adapt.seiee.sjtu.edu.cn/similarity/SimCompleteResults.pdf

Statistics of Search Queries

[Pie charts: distribution of query length. (a) By traffic: 1 term 44%, 2 terms 29%, 3 terms 17%, 4 terms 7%, 5 terms 2%, more than 5 terms 1%. (b) By # of distinct queries: 1 term 10%, 2 terms 26%, 3 terms 34%, 4 terms 19%, 5 terms 7%, more than 5 terms 4%. Example queries: Pokémon Go, Microsoft HoloLens. Companion charts show the distribution of the number of instances per query (1, 2, 3, 4, 5, or more than 5 instances).]

If the short text has context for the instancehellip

• python tutorial  • dangerous python  • moon earth distance  • …

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Supervised Segmentation [Bergsma et al 2007]

bull Problem divide query into semantic units

bull Approach turn segmentation into position-based binary classification

Example Query

Two man power saw

[two man] [power saw]  |  [two] [man] [power saw]  |  [two] [man power] [saw]

Input a query and its positions

Output the decision for making segmentation at each position

Supervised Segmentation

• Features
  • Decision boundary features: e.g., indicators, the POS tags in the query, position features (forward/backward)
  • Statistical features: mutual information between the left and right parts, e.g., "bank loan | amortization schedule"
  • Context features: context information, e.g., "female bus driver"
  • Dependency features: e.g., "female" depends on "driver" in "female bus driver"

Supervised Segmentation

• Segmentation Overview

[Figure: the input query "two man power saw" and its learned features are fed to an SVM classifier, which outputs a yes/no segmentation decision for each position.]
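A hedged sketch of the position-based binary classification idea, using scikit-learn and two stand-in features (a bigram count across the decision point and the word lengths); the real system uses the richer decision-boundary, statistical, context and dependency features listed above, and an SVM rather than logistic regression.

```python
from sklearn.linear_model import LogisticRegression

# toy corpus statistic used as a feature across each decision point
bigram_count = {("power", "saw"): 500, ("man", "power"): 40, ("two", "man"): 300}

def features(left, right):
    return [bigram_count.get((left, right), 0), len(left), len(right)]

# toy training data: 1 = insert a break between the two words, 0 = keep together
X = [features("two", "man"), features("man", "power"), features("power", "saw")]
y = [0, 1, 0]   # gold segmentation: [two man] [power saw]

clf = LogisticRegression().fit(X, y)

def segment(query):
    words = query.split()
    segs, cur = [], [words[0]]
    for left, right in zip(words, words[1:]):
        if clf.predict([features(left, right)])[0] == 1:
            segs.append(" ".join(cur))
            cur = []
        cur.append(right)
    segs.append(" ".join(cur))
    return segs

print(segment("two man power saw"))
```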

Unsupervised Segmentation [Tan et al 2008]

• Unsupervised learning for query segmentation

Probability of a generated segmentation S for query Q (s1 … sm are the segments):

  P(S|Q) = P(s1) P(s2|s1) … P(sm|s1 s2 … s_{m-1}) ≈ Π_{si ∈ S} P(si)    (unigram model)

A position is a valid segment boundary if and only if the pointwise mutual information between the two segments resulting from the split is negative:

  MI(sk, sk+1) = log [ Pc([sk sk+1]) / (Pc(sk) · Pc(sk+1)) ] < 0

Example, "new york times subscription" with s1 = "new", s2 = "york":

  log [ Pc([new york]) / (Pc(new) · Pc(york)) ] > 0  →  no segment boundary here

Unsupervised Segmentation

• Find the top-k segmentations by dynamic programming
• Using EM optimization on the fly

Input: query w1 w2 … wn (the words in the query), concept probability distribution
Output: top-k segmentations with the highest likelihood
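A small sketch of the unigram segmentation model: dynamic programming over candidate segments with log-probabilities. The segment probabilities P(s) below are toy numbers standing in for the corpus/Wikipedia estimates and the EM refinement used in the paper, and only the single best segmentation is returned rather than the top k.

```python
import math
from functools import lru_cache

# toy unigram probabilities P(s) for candidate segments
P = {"new": 1e-3, "york": 8e-4, "new york": 6e-4, "times": 5e-4,
     "new york times": 2e-4, "subscription": 1e-4}

def best_segmentation(words, max_len=3):
    @lru_cache(maxsize=None)
    def best(i):
        # returns (log-likelihood, segmentation) for the suffix words[i:]
        if i == len(words):
            return 0.0, ()
        cands = []
        for j in range(i + 1, min(i + max_len, len(words)) + 1):
            seg = " ".join(words[i:j])
            if seg in P:
                ll, rest = best(j)
                cands.append((math.log(P[seg]) + ll, (seg,) + rest))
        return max(cands) if cands else (float("-inf"), ())
    return best(0)

print(best_segmentation(tuple("new york times subscription".split())))
```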

Exploit Click-through [Li et al 2011]

• Motivation
  • Probabilistic query segmentation
  • Use click-through data (query Q → clicked URL → document D)

Input query: bank of america online banking
Output, top-3 segmentations:
  [bank of america] [online banking]    0.502
  [bank of america online banking]      0.428
  [bank of] [america] [online banking]  0.001

Exploit Click-through

bull Segmentation Model

An interpolated model

global info Click-throughinfo

[credit card] [bank of America]

1 bank of america credit cards contact us overview2 secured visa credit card from bank of america3 credit cards overview find the right bank of america credit card for you

Query

Clicked html documents

global info

Click-through info

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Sense Changes with Different Context

watch harry potter read harry potter age harry potter

Movie Book Character

harry potter walkthrough

Game

Entity Recognition in Query [Guo et al 2009]

• Motivation: detect a named entity in a short text (a single-named-entity query such as "harry potter walkthrough") and categorize it

Example: the query is interpreted as a triple <e, t, c> = ("harry potter", "walkthrough", "game"), where e is the (possibly ambiguous) named entity, t the context term(s), and c the class of the entity.

Entity Recognition in Query

• Probabilistic Generative Model

Goal: given a query q, find the triple <e, t, c> that maximizes its probability. The probability to generate a triple factorizes as Pr(e, t, c) = Pr(e) Pr(c|e) Pr(t|c), assuming the context only depends on the class (e.g., "walkthrough" only depends on the class game, not on "harry potter").

Objective: given query q, find the triple with the highest Pr(e, t, c). The problem then becomes how to estimate Pr(e), Pr(c|e) and Pr(t|c).

Entity Recognition in Query

• Probability Estimation by Learning

Learning objective:   max Π_{i=1}^{N} P(e_i, t_i, c_i)

Challenge: it is difficult as well as time-consuming to manually assign class labels to named entities in queries.

Build a training set T = {(e_i, t_i)} and view c_i as a hidden variable. The new learning problem becomes

  max Π_{i=1}^{N} P(e_i, t_i) = max Π_{i=1}^{N} P(e_i) Σ_c P(c|e_i) P(t_i|c)

solved with the topic model WS-LDA.
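A toy sketch of the argmax over triples <e, t, c> under the factorization Pr(e) Pr(c|e) Pr(t|c); the probability tables are invented for illustration (in the paper they are estimated with WS-LDA rather than written by hand).

```python
# toy model parameters (in practice learned with WS-LDA)
P_e = {"harry potter": 0.6, "kung fu panda": 0.4}                 # Pr(e)
P_c_given_e = {"harry potter": {"movie": 0.5, "book": 0.3, "game": 0.2},
               "kung fu panda": {"movie": 0.7, "game": 0.3}}      # Pr(c|e)
P_t_given_c = {"movie": {"trailer": 0.4, "cast": 0.3, "walkthrough": 0.01},
               "book": {"review": 0.5, "author": 0.3, "walkthrough": 0.01},
               "game": {"walkthrough": 0.6, "cheats": 0.3}}       # Pr(t|c)

def best_triple(query: str):
    """argmax over <e, t, c> of Pr(e) * Pr(c|e) * Pr(t|c), for entities found in the query."""
    best_score, best = 0.0, None
    for e in P_e:
        if e not in query:
            continue
        t = query.replace(e, "").strip()          # remaining words = context term
        for c, p_ce in P_c_given_e[e].items():
            score = P_e[e] * p_ce * P_t_given_c[c].get(t, 1e-6)
            if score > best_score:
                best_score, best = score, (e, t, c)
    return best, best_score

print(best_triple("harry potter walkthrough"))   # -> ("harry potter", "walkthrough", "game")
```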

Signal from Click [Pantel et al 2012]

• Motivation: predict entity type in Web search

[Figure: a query is modeled via its entity, user intent, context and click signals; a generative model assigns each query a distribution over 73 entity types.]

Signal from Click

• Joint Model for Prediction

[Figure: plate diagram of the joint generative model. For each query: pick an entity type from the distribution over types, pick an entity from the entity distribution, pick an intent from the intent distribution, pick context words from the word distribution, and pick a click from the host distribution.]

Telegraphic Query interpretation [Sawant et al 2013 Joshi et al 2014]

• Entity-seeking Telegraphic Queries
• Interpretation = Segmentation + Annotation

[Figure: the query "Germany capital" is answered with the result entity "Berlin" by combining a knowledge base (for accuracy) with a large corpus (for recall).]

• Overview

Joint Interpretation and Ranking [Sawant et al 2013 Joshi et al 2014]

[Figure: a telegraphic query and an annotated corpus feed two models for joint interpretation and ranking, a generative model and a discriminative model, which output candidate answer entities e1, e2, e3, …]

Joint Interpretation and Ranking [Sawant et al 2013]

• Generative Model (based on probabilistic language models)

[Figure: for the query "losing team baseball world series 1998", the type hint "baseball team" and a switch variable Z choose between a type model and a context model; the answer entity San Diego Padres (a "major league baseball team") is generated together with context evidence such as "Padres have been to two World Series, losing in 1984 and 1998". Borrowed from U. Sawant (2013).]

Joint Interpretation and Ranking [Sawant et al 2013]

• Discriminative Model (based on max-margin discriminative learning)

[Figure: for the query "losing team baseball world series 1998", candidate interpretations pair an answer entity with a target type, e.g., the correct entity San_Diego_Padres with t = baseball team versus the incorrect entity 1998_World_Series with t = series.]

bull Queries seek answer entities (e2)

bull Contain (query) entities (e1) target types (t2) relations (r) and selectors (s)

Telegraphic Query Interpretation [Joshi et al 2014]

query e1 r t2 s

dave navarro first band

dave navarro band band first

dave navarro - band first

spider automobile company

spider automobile company

automobile company

-

automobile company company spider

Borrow from M Joshi (2014)

Improved Generative Model

bull Generative Model[Sawant et al 2013]

[Joshi et al 2014]Consider e1

(in q) and r

Improved Discriminative Model

bull Discriminative Model[Sawant et al 2013]

[Joshi et al 2014]

Consider e1

(in q) and r

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

bull Input a short text

bull Output semantic interpretation

bull Three steps in understanding a short text

wanna watch eagles band

watch[verb] eagles[entity](band) band[concept]

wanna watch eagles band watch[verb] eagles[entity](band) band[concept]

watch eagles band watch[verb] eagles[entity] band[concept]

Step 1: Text Segmentation – divide the short text into a sequence of terms in the vocabulary
Step 2: Type Detection – determine the best type of each term
Step 3: Concept Labeling – infer the best concept of each entity within context

Text segmentation
• Observations
  • Mutual Exclusion – terms containing the same word mutually exclude each other
  • Mutual Reinforcement – related terms mutually reinforce each other
• Build a Candidate Term Graph (CTG)

[Figure: candidate term graphs for "vacation april in paris" (nodes: vacation, april, paris, april in paris; edge weights 0.029, 0.005, 0.047, 0.041; node priors 1/3, 2/3) and "watch harry potter" (nodes: watch, harry, potter, harry potter; edge weights 0.014, 0.092, 0.053, 0.018).]

Find best segmentation

• Best segmentation = sub-graph of the CTG which
  • is a complete graph (clique)
  • has no mutual exclusion
  • has 100% word coverage (except for stopwords)
  • has the largest average edge weight

[Figure: the candidate term graphs again, with one sub-graph marked "is a segmentation" and another marked "best segmentation".]

Find best segmentation

• Best segmentation = sub-graph of the CTG which
  • is a complete graph (clique) → a maximal clique
  • has no mutual exclusion
  • has 100% word coverage (except for stopwords)
  • has the largest average edge weight

[Figure: the same candidate term graphs, with the best segmentation highlighted as a maximal clique.]
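A sketch of the clique-based search over a candidate term graph using networkx; the node names, affinity weights, and the mutual-exclusion test (candidate terms sharing a query word) are illustrative assumptions, not the paper's exact scoring.

```python
import networkx as nx
from itertools import combinations

# candidate terms for "vacation april in paris" and the query words each one covers
terms = {"vacation": {"vacation"}, "april": {"april"},
         "paris": {"paris"}, "april in paris": {"april", "paris"}}
# toy affinity (edge) weights; mutually exclusive terms simply have no edge
affinity = {("vacation", "april"): 0.005, ("vacation", "paris"): 0.047,
            ("vacation", "april in paris"): 0.041, ("april", "paris"): 0.029}

G = nx.Graph()
G.add_nodes_from(terms)
for (a, b), w in affinity.items():
    G.add_edge(a, b, weight=w)

query_words = {"vacation", "april", "paris"}   # stopword "in" ignored

def avg_weight(clique):
    edges = list(combinations(clique, 2))
    return sum(G[a][b]["weight"] for a, b in edges) / len(edges) if edges else 0.0

best = None
for clique in nx.find_cliques(G):              # maximal cliques of the CTG
    covered = set().union(*(terms[t] for t in clique))
    exclusive = all(terms[a].isdisjoint(terms[b]) for a, b in combinations(clique, 2))
    if covered == query_words and exclusive:
        cand = (avg_weight(clique), sorted(clique))
        best = max(best, cand) if best else cand

print(best)   # -> the segmentation [april in paris] [vacation]
```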

Type Detection

• Pairwise Model
  • Find the best typed-term for each term so that the Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight

[Figure: for "watch free movie", each term has typed candidates (watch[v], watch[e], watch[c]; free[adj], free[v]; movie[c], movie[e]); the pairwise model selects one typed-term per term.]

Concept Labeling

• Entity disambiguation is the most important task of concept labeling
  • Filter / re-rank the original concept cluster vector
• Weighted Vote
  • The final score of each concept cluster is a combination of its original score and the support from context, using concept co-occurrence
  • E.g., "watch harry potter" → movie; "read harry potter" → book

Example of Entity Disambiguation [Hua et al 2015 (ICDE Best Paper), Hua et al 2016]

[Figure: a short text is parsed and conceptualized against a semantic (isA) network and a co-occurrence network: term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, and concept orthogonalization produce a concept vector [(c1, p1), (c2, p2), (c3, p3), …]. For "ipad apple", the isA candidates of "apple" (fruit, company, food, product, …) are filtered by co-occurrence with the concepts of "ipad" (product, device, brand, …), leaving the company/brand/product sense.]

Mining Lexical Relationships [Wang et al 2015b]

• Lexical knowledge represented by probabilities (e = instance, t = term, c = concept, z = role), illustrated on "watch harry potter" with concepts such as verb, product, book, movie:

  p(z|t): the role of a term, e.g., p(verb | watch), p(instance | watch)
  p(c|t, z): the concept of a term used in a role, e.g., p(movie | watch, verb)
  p(c|e) = p(c | t, z = instance): the concept of an instance, e.g., p(movie | harry potter), p(book | harry potter)

Understanding Queries [Wang et al 2015b]

• Goal: rank the concepts and find argmax_c p(c|t, q)

[Figure: the query is expanded into all possible segmentations, which are matched against the offline semantic network; concepts are ranked by random walk with restart [Sun et al 2005] on the online subgraph.]

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

• Example: "popular smart cover iphone 5s"
• Definitions
  • Head: acts to name the general (semantic) category to which the whole short text belongs; usually the head is the intent of the short text
    • "smart cover": the intent of the query
  • Constraints: distinguish this member from other members of the same category
    • "iphone 5s": limits the type of the head
  • Non-Constraint Modifiers (aka Pure Modifiers): subjective modifiers which can be dropped without changing the intent
    • "popular": subjective, can be neglected

Non-Constraint Modifiers Mining: Construct Modifier Networks

[Figure: the concept hierarchy tree in the "Country" domain (country → Asian country, developed country, western country, large Asian country, western developed country, top western country, large/top developed country, …) induces a modifier network over the modifiers Asian, Developed, Western, Large, Top; the edges form the Modifier Network. In this case "Large" and "Top" are pure modifiers.]

Non-Constraint Modifiers Mining: Betweenness Centrality

• Betweenness centrality is a measure of a node's centrality in a network
• The betweenness of node v is defined as

  g(v) = Σ_{s ≠ v ≠ t} σ_st(v) / σ_st

  where σ_st is the total number of shortest paths from node s to node t and σ_st(v) is the number of those paths that pass through v
• Normalization & Aggregation: a pure modifier should have a low betweenness-centrality aggregation score PMS(t)
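A minimal illustration, using networkx, of how betweenness centrality can separate pure modifiers from constraint-like modifiers in a small hand-built modifier network for the "country" domain; the edge list is an assumption for illustration, not the paper's data.

```python
import networkx as nx

# toy modifier network: an edge (u, v) when modifiers u and v are observed
# stacked together in concepts such as "top western country", "large asian country"
G = nx.Graph()
G.add_edges_from([("top", "western"), ("top", "developed"),
                  ("large", "asian"), ("large", "developed"),
                  ("western", "developed"), ("asian", "developed")])

bc = nx.betweenness_centrality(G, normalized=True)
# pure (non-constraint) modifiers such as "top" and "large" tend to sit at the
# periphery of the network, so their betweenness stays low
for modifier, score in sorted(bc.items(), key=lambda kv: kv[1]):
    print(f"{modifier:10s} {score:.3f}")
```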

Head-Constraints Mining [Wang et al 2014b]

• A term can be a head sometimes and a constraint in other cases
• E.g., "seattle hotel": head = hotel, constraint = seattle; "seattle hotel job": head = job, constraints = seattle, hotel

Head-Constraints Mining Acquiring Concept Patterns

[Figure: building the concept pattern dictionary from query logs. Entity pairs are extracted from queries with prepositional patterns ("A for B", "A of B", "A with B", "A in B", "A on B", "A at B", …, e.g., "cover for iphone 6s", "battery for sony a7r", "wicked on broadway"), where entity 1 is the head and entity 2 the constraint. Both entities are conceptualized (concept11, concept12, … / concept21, concept22, …), and the resulting concept pairs (concept11, concept21), (concept11, concept22), … are aggregated into the Concept Pattern Dictionary.]

Why Concepts Can't Be Too General
• It may cause too many concept pattern conflicts: we can't distinguish head and modifier for general concept pairs

Derived concept pattern: device (Head) / company (Modifier)
  Supporting entity pairs: iphone 4 / verizon; modem / comcast; wireless router / comcast; iphone 4 / tmobile

Derived concept pattern: company (Head) / device (Modifier)
  Supporting entity pairs: amazon books / kindle; netflix / touchpad; skype / windows phone; netflix / ps3

→ Conflict

Why Concepts Can't Be Too Specific
• It may generate concepts with little coverage
  • The concept regresses to the entity
  • Large storage space: up to (millions × millions) of patterns
  • E.g., device / largest desktop OS vendor; device / largest software development company; device / largest global corporation; device / latest windows and office provider; …

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept Patterns

[Table: top head/constraint concept patterns with cluster size, sum of cluster score, and pattern score; example patterns include breed/state, game/platform, accessory/vehicle, browser/platform, requirement/school, drug/disease, cosmetic/skin condition, job/city, accessory/phone, software/platform, test/disease, clothes/breed, penalty/crime, tax/state, sauce/meat, credit card/country, food/holiday, mod/game, garment/sport, career information/professional, song/instrument, bait/fish, study guide/book, plugins/browser, recipe/meat, currency/country, lens/camera, decoration/holiday, food/animal.]

Example: the pattern Game (Head) / Platform (Modifier) groups concept pairs such as game/platform, game/device, video game/platform, game console/game pad, game/gaming platform, and covers queries such as "angry birds android", "angry birds ios", "angry birds windows 10".

Head-Modifier Relationship Detection

• Train a classifier on (head-embedding, modifier-embedding) pairs
• Training data
  • Positive: (head, modifier)
  • Negative: (modifier, head)
• Precision ≥ 0.9, Recall ≥ 0.9
• Disadvantage: not interpretable
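A hedged sketch of such a classifier with scikit-learn: random vectors stand in for pre-trained word embeddings, and the positives/negatives are a handful of (head, modifier) concept pairs taken from the table above; it is not the authors' trained model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# stand-in embeddings; in practice use pre-trained vectors (word2vec, GloVe, ...)
emb = {w: rng.normal(size=16) for w in
       ["game", "platform", "accessory", "vehicle", "drug", "disease",
        "tax", "state", "lens", "camera", "recipe", "meat"]}

head_mod_pairs = [("game", "platform"), ("accessory", "vehicle"), ("drug", "disease"),
                  ("tax", "state"), ("lens", "camera"), ("recipe", "meat")]

def feat(h, m):
    # concatenated (head-embedding, modifier-embedding) feature vector
    return np.concatenate([emb[h], emb[m]])

# positives are (head, modifier) pairs; negatives are the reversed pairs
X = [feat(h, m) for h, m in head_mod_pairs] + [feat(m, h) for h, m in head_mod_pairs]
y = [1] * len(head_mod_pairs) + [0] * len(head_mod_pairs)

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict([feat("game", "platform"), feat("platform", "game")]))
```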

Syntactic Parsing based on HM

• Information is incomplete
  • Prepositions and other function words
  • Within a noun compound: "el capitan macbook pro"
• Why not train a parser for web queries?

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges: Short Texts Lack Grammatical Signals
• Lack of function words and word order
  • "toys queries" has ambiguous intent
  • "distance earth moon" has clear intent, but many equivalent forms: "earth moon distance", "earth distance moon", …

Challenges: Syntactic Parsing of Queries

• No standard
• No ground truth

Why is syntactic parsing of queries even a legitimate problem?

Derive Syntax from Semantics[Sun et al 2016]

bull Query ldquothai food houstonrdquo

bull Clicked sentence

bull Project dependency to the query

A Treebank for Short Texts

• Given query q
• Given q's clicked sentence s
• Parse each s
• Project the dependencies from s to q
• Aggregate dependencies

Algorithm of Projection

Result Examples

Results

• Random queries: QueryParser UAS 0.83, LAS 0.75; Stanford UAS 0.72, LAS 0.64
• Queries with no function words: QueryParser UAS 0.82, LAS 0.73; Stanford UAS 0.70, LAS 0.61
• Queries with function words: QueryParser UAS 0.90, LAS 0.85; Stanford UAS 0.86, LAS 0.80

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

• Measuring similarity between two short texts or sentences

• Basic idea: word-by-word comparison using embedding vectors

• Use a saliency-weighted semantic graph to compute similarity

Short Text Similarity Using Word Embedding [Kenter and Rijke 2015]

Features acquired: bins of all edges, bins of max edges.

Similarity measurement (inspired by BM25), for short texts s_l and s_s and terms w of s_l:

  f_sts(s_l, s_s) = Σ_{w ∈ s_l} IDF(w) · sem(w, s_s) · (k1 + 1) / ( sem(w, s_s) + k1 · (1 - b + b · |s_s| / avgsl) )

where sem(w, s_s) is the semantic similarity of w to the short text s_s.
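A small sketch of the saliency-weighted similarity above, with toy embeddings and IDF weights; k1 = 1.2 and b = 0.75 are the usual BM25 defaults, assumed here rather than taken from the paper.

```python
import numpy as np

# toy unit-normalized embeddings and IDF weights
emb = {w: v / np.linalg.norm(v) for w, v in {
    "cheap":   np.array([1.0, 0.2, 0.1]),
    "flights": np.array([0.1, 1.0, 0.3]),
    "low":     np.array([0.9, 0.3, 0.2]),
    "cost":    np.array([0.8, 0.4, 0.1]),
    "airfare": np.array([0.2, 0.9, 0.4])}.items()}
idf = {"cheap": 1.2, "flights": 1.5, "low": 1.1, "cost": 1.0, "airfare": 1.6}
k1, b, avgsl = 1.2, 0.75, 4.0

def sem(w, short_text):
    # semantic match of w against the best-matching word of the other text
    return max(float(emb[w] @ emb[v]) for v in short_text)

def f_sts(s_l, s_s):
    score = 0.0
    for w in s_l:
        s = sem(w, s_s)
        score += idf[w] * s * (k1 + 1) / (s + k1 * (1 - b + b * len(s_s) / avgsl))
    return score

print(f_sts(["cheap", "flights"], ["low", "cost", "airfare"]))
```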

From the Concept View [Wang et al 2015a]

[Figure: two short texts are parsed and conceptualized against a semantic network and a co-occurrence network (term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, concept orthogonalization) into bags of concepts, i.e., concept vectors [(c1, score1), (c2, score2), …] and [(c1', score1'), (c2', score2'), …], whose similarity is then computed.]
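A minimal bag-of-concepts similarity sketch: conceptualize() is a stand-in for the full pipeline in the figure and simply returns hand-made concept vectors, and the similarity is their cosine.

```python
import math

def conceptualize(short_text):
    # stand-in for parsing + isA lookup + co-occurrence filtering + head/modifier
    # analysis; returns a weighted concept vector (toy values)
    toy = {"watch harry potter": {"movie": 0.7, "character": 0.2, "book": 0.1},
           "read harry potter":  {"book": 0.8, "character": 0.15, "movie": 0.05}}
    return toy[short_text]

def cosine(u, v):
    dot = sum(u[c] * v[c] for c in u.keys() & v.keys())
    return dot / (math.sqrt(sum(x * x for x in u.values())) *
                  math.sqrt(sum(x * x for x in v.values())))

print(cosine(conceptualize("watch harry potter"),
             conceptualize("read harry potter")))
```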

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

• Explicit short text understanding benefits a lot of application scenarios
  • Ads/search semantic match
  • Definition mining
  • Query recommendation
  • Web table understanding
  • Semantic search
  • …

Ads Keyword Selection [Wang et al 2015a]

[Figure: bar charts by decile (Decile 4 to Decile 10) for mainline ads (y-axis 0.00 to 6.00) and sidebar ads (y-axis 0.00 to 0.60).]

Definition Mining [Hao et al 2016]

• Definition scenarios: search engines, QnA, etc.
• Why is conceptualization useful for definition mining? Example: "What is Emphysema?"
  • Answer 1: "Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year."
  • Answer 2: "Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing."
• This sentence has the form of a definition
• Embedding is helpful to some extent, but it also returns a high similarity score for (emphysema, disease) and (emphysema, smoking)
• Conceptualization can provide strong semantics
• Contextual embedding can also provide semantic similarity beyond isA

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

[Figure: system overview. Offline: training data per class (Class 1 … Class N) is used for concept weighting and model learning, producing a concept model for each class (e.g., <Music, Score>). Online: the original short text (e.g., "justin bieber graduates") goes through entity extraction, conceptualization against the knowledge base, candidate generation, and classification & ranking against the concept models.]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

[Figures: categories (e.g., TV, Music, Movie) and the article titles/tags in each category are mapped into a shared concept space (category concept models ω_i, ω_j); an incoming query is conceptualized into the same space (points p_i, p_j) and scored against each category's concept model.]

Precision performance on each category [Wang et al 2014a]

Category | BoC-STC | LM_ch | SVM | VSM_cosine | LM_d | Entity_ESA
Movie | 0.71 | 0.91 | 0.84 | 0.81 | 0.72 | 0.56
Money | 0.97 | 0.95 | 0.54 | 0.57 | 0.52 | 0.74
Music | 0.97 | 0.90 | 0.88 | 0.73 | 0.68 | 0.58
TV | 0.96 | 0.46 | 0.92 | 0.56 | 0.51 | 0.55

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase: a collaboratively created graph database for structuring human knowledge in SIGMOD 2008

bull [ Auer et al 2007 ] Soumlren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Börschinger Cäcilia Zirn and Anas Elghafari WikiNet: A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou "Understand Short Texts by Harvesting and Analyzing Semantic Knowledge" IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and XindongWu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny BoyesBraem Basic objects in natural categories Cognitive psychology 8(3)382ndash439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddings In CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny Zhu Toward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 19: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Freebase [Bollacker et al 2008]

bull Freebase is a well-known collaborative knowledge base consisting of data composed mainly by its community

bull Freebase contains more than 23 million entitiesbull Freebase contains 19 billion triplesbull Each triple is organized as form of

ltsubjectgt ltpredicategt ltobjectgt

Brief Introduction

Statistics

bull Freebase is a collection of factsbull Freebase only contains nodes

and linksbull Freebase is a labeled graph

Freebase -gt Wiki Data

bull Freebase data was integrated into Wikidatabull The Freebase API will be completely shut-down on Aug 31 2016

replaced by Google Knowledge Graph API

bull Freebase Community

bull Homepage httpwikifreebasecomwikiMain_Pagebull Download httpsdevelopersgooglecomfreebasebull Wikidata httpswwwwikidataorg

News

Authors

URLs

Google Knowledge Graph

bull Knowledge Graph is a knowledge base used by Google to enhance its search engines search results with semantic-search information gathered from a wide variety of sources

bull 570 million objects and more than 18 billion facts about relationshipsbetween different objects

bull Google Inc

bull Homepage httpswwwgooglecomintles419insidesearchfeaturessearchknowledgehtml

Brief Introduction

Statistics

Sample

Authors

URLs

YAGO [Suchanek et al 2007]

bull YAGO is a huge semantic knowledge base derived from GeoNames WordNet and Wikipedia (10 Wikipedias in different languages)

bull More than 10 million entities(persons organizations cities etc)bull More than 120 million facts about entitiesbull More than 35000 classes assigned to entitiesbull Many of its facts and entities are attached a temporal dimension and a spatial dimension

Brief Introduction

SampleltAlbert_Einsteingt ltisMarriedTogt ltElsa_Einsteingt

Statistics

YAGO

Newsbull An evaluated version of YAGO3 (Combining information from Wikipedia from different

languages) is released [15 Sep 2015]

Authorsbull Max Planck Institute for Informatics in SaarbruumlckenGermany and DBWeb group at Teacuteleacutecom ParisTech University

URLsbull Homepage httpwwwmpi-infmpgdedepartmentsdatabases-and-

information-systemsresearchyago-nagayagobull Download httpwwwmpi-infmpgdedepartmentsdatabases-and-

information-systemsresearchyago-nagayagodownloads

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Statistics of Search Queries

44

29

17

7

2 1

1 Term2 Terms3 Terms4 Terms5 Termsmore than 5 Terms

10

26

34

19

74

1 Term2 Terms3 Terms4 Terms5 Termsmore than 5 Terms

(a) By traffic

(b) By of distinct queries

Pokeacutemon Go Microsoft HoloLens

Instance 1 Instance 21 Instance2 Instances3 Instances4 Instances5 InstancesMore than 5 Instances

1 Instance2 Instances3 Instances4 Instances5 InstancesMore than 5 Instances

If the short text is a single instancehellip

bull Pythonbull Microsoftbull Applebull hellip

Single Instance Understanding

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

Word Ambiguity bull Word sense disambiguation rely on dictionaries

(WordNet)

Take a seat on this chair

The chair of the Math Department

Instance Ambiguity

bull Instance sense disambiguation extra knowledge needed

I have an apple pie for lunch

He bought an apple ipad

Here ldquoapplerdquo is a proper noun

Ambiguity [Hua et al 2016]

bull Many instances are ambiguous

bull Intuition ambiguous instances have multiple senses

short text instance sense

population china china country

glass vs china china fragile item

pear apple apple fruit

microsoft apple apple company

read harry potter harry potter book

watch harry potter harry potter movie

age of harry potter harry potter character

Pre-definition for Ambiguity (1) Sense [Hua et al 2016]

bull What is a Sense in semantic networksbull A sense as a hierarchy of concept clusters

region

country state city

creature

animal

predator

crop food

fruit vegetable meat

Germany

Pre-definition for Ambiguity (2) Concept Cluster [Li et al 2013 Li et al 2015]

bull What is a Concept Cluster (CL)bull Cluster similar concepts into a concept cluster using K-

Means like approach (k-Medoids)

FruitFresh fruit

JuiceTropical fruit

BerryExotic fruit

Seasonal fruitFruit juiceCitrus fruitSoft fruitDry fruit

Wild fruitLocal fruit

hellip

company

CompanyClientFirm

ManufacturerCorporation

large companyRivalGiant

big companylocal company

large corporationinternational

companyhellip Fruit

Definitions of Instance Ambiguity [Hua et al 2016]

bull 3 levels of instance ambiguitybull Level 0 unambiguous

bull Contains only 1 sensebull Eg dog (animal) beijing (city) potato (vegetable)

bull Level 1 unambiguous and ambiguous both make sensebull Contains 2 or more senses but these senses are relatedbull Eg google (company amp search engine) french (language amp

country) truck(vehicle amp public transport service)

bull Level 2 ambiguous bull Contains 2 or more senses and the senses are very different from

each otherbull Eg apple (fruit amp company) jaguar(animal amp company) python

(animal amp language)

Ambiguity Score

bull Using top-2 senses to calculate the ambiguity score

119904119888119900119903119890 =

0 119897119890119907119890119897 = 0119908 1199042 119890

119908 1199041 119890lowast 1 minus 119904119894119898119894119897119886119903119894119905119910 1199041 1199042 119897119890119907119890119897 = 1

score = 1 +119908(1199041198882|119890)

119908(1199041198881|119890)lowast 1 minus 119904119894119898119894119897119886119903119894119905119910 1199041198881 1199041198882 119897119890119907119890119897 = 2

Denote top-2 senses as 1199041 and 1199042 top-2 sense clusters as 1199041198881 and 1199041198882 Denote similarity of two sense clusters as the maximum similarity of their senses

119904119894119898119894119897119886119903119894119905119910 1199041198881 1199041198882 = 119950119938119961119904119894119898119894119897119886119903119894119905119910(119904119894 isin 1199041198881 119904119895 isin 1199041198882) For an entity 119890 denote the weight (popularity) of a sense 119904119894 as the sum of weights of its concept clusters

119908 119904119894|119890 = 119908 119867119894|119890 =119862119871119895isin119867119894

119875(119862119871119895|119890)

For an entity 119890 denote the weight (popularity) of a sense cluster 119904119888119894 as the sum of weights of its senses

119908 119904119888119894 119890 =119904119895isin119904119888119894

119908(119904119895|119890)

Examples

bull Level 0bull california

bull country state city region institution 0943bull fruit

bull food product snack carbs crop 0827bull alcohol

bull substance drug solvent food addiction 0523bull computer

bull device product electronics technology appliance 0537bull coffee

bull beverage product food crop stimulant 073bull potato

bull vegetable food crop carbs product 0896bull bean

bull food vegetable crop legume carbs 0801

Examples (cont)bull Level 1

bull nike score = 0034bull company store 0861bull brand 0035bull shoe product 0033

bull twitter score = 0035bull website tool 0612bull network 0165bull application 0033bull company 0031

bull facebook score = 0037bull website tool 0595bull network 017bull company 0053bull application 0029

bull yahoo score = 038bull search engine 0457bull company provider account 0281bull website 00656

bull google score = 0507bull search engine 046bull company provider organization 0377bull website 00449

Examples (cont)

bull Level 2bull jordan score = 102

bull country state company regime 092bull shoe 002

bull fox score = 109bull animal predator species 074bull network 0064bull company 0035

bull puma score = 115bull brand company shoe 0655bull species cat 0116

bull gold score = 121bull metal material mineral resource mineral062bull color 0128

Examples (cont)

bull Level 2bull soap score = 122

bull product toiletry substance 049bull technology industry standard 011

bull silver score = 124bull metal material mineral resource mineral 0638bull color 0156

bull python score = 129bull language 0667bull snake animal reptile skin 0193

bull apple score = 141bull fruit food tree 0537bull company brand 0271

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

A Concept View of ldquoMicrosoftrdquo

company

largest desktop OS vendor

softwarecompany

international company

technology leader

Microsoft

largest desktop OS vendorcompany hellip hellip

software company

Basic-level Conceptualization (BLC)[Rosch et al 1976]

KFC

BMW

Basic-level conceptualization

How to Make BLC

bull Naive approachesbull Typicality an important measure for understanding the

relationship between an object and its concept

bull Pointwise Mutual Information (PMI) a common measure of the strength of association between two terms

bird

Naive Approach 1 Typicality

P(robin|bird) gt P(penguin|bird)ldquorobinrdquo is a more typical bird than a ldquopenguinrdquo

country

SeychellesUSA

P(USA|country) gt P(Seychelles|country)ldquoUSArdquo is a more typical country than ldquoSeychellesrdquo

penguinrobin

Using Typicality for BLC

bull Associate each isA relationship (119890 is 119888) with typicality scores 119875 119890 119888 and 119875 119888 119890

119875 119890 119888 =119899 119888 119890

119899 119888119875(119888|119890) =

119899 119888 119890

119899(119890)

bull P(e|c) indicates how typical (or popular) e is in the given concept c

bull P(c|e) indicates how typical (or popular) the concept c is given e

bull However

Microsoft

largest desktop OS vendorcompanyhigh typicality p(c|e) high typicality p(e|c)

Naive Approach 2 PMI[Manning and Schutze 1999]

bull Pointwise mutual information (PMI) is a measure of association used in information theory and statistics

bull Consider using the PMI between concept c and instance e to find the basic-level concepts as follows

119875119872119868(119890 119888) = log119875(119890 119888)

119875(119890)119875(119888)= log119875(119890|119888) minus log119875(119890)

bull However bull In basic level of categorization we are interested in finding a

concept for a given e which means P(e) is a constant

bull Thus ranking by PMI(e c) is the same as ranking by P(e|c)

Using Rep(e c) for BLC [Wang et al 2015b]

bull The measure 119877119890119901 119890 119888 = 119875(119888|119890) lowast 119875(119890|119888) means

bull (With PMI) If we take the logarithm of our scoring function we get

log119877119890119901 119890 119888 = log119875 119888 119890 lowast 119875(119890|119888) = log119875(119890 119888)

119875(119890)lowast119875(119890 119888)

119875(119888)= log

119875(119890 119888)2

119875(119890)119875(119888)= 119875119872119868 119890 119888 + log119875 119890 119888

= 1198751198721198682

bull (With Commute Time) The commute time between an instance e and a concept c is

119879119894119898119890(119890 119888) =

119896=1

infin

(2119896) lowast 119875119896(119890 119888) =

119896=1

119879

2119896 lowast 119875119896 119890 119888 +

119896=119879+1

infin

2119896 lowast 119875119896 119890 119888

ge σ119896=1119879 (2119896) lowast 119875119896(119890 119888) + 2(119879 + 1) lowast (1 minus σ119896=1

119879 119875119896(119890 119888)) = 4 minus 2 lowast 119877119890119901(119890 119888)

Given e the c should be its typical concept (shortest distance)

Given c the e should be its typical instance (shortest distance)

A process of finding concept nodes having shortest expected distance with e

Evaluations on Different Measures for BLC

Precision / NDCG at top-k (columns: k = 1, 2, 3, 5, 10, 15, 20)

Table (a):

No smoothing
  MI(e)               0.769  0.692  0.705  0.685  0.719  0.705  0.690
  PMI3(e)             0.885  0.769  0.756  0.800  0.754  0.733  0.721
  NPMI(e)             0.692  0.692  0.667  0.638  0.627  0.610  0.610
  Typicality P(c|e)   0.462  0.577  0.603  0.577  0.569  0.564  0.556
  Typicality P(e|c)   0.500  0.462  0.526  0.523  0.523  0.510  0.521
  Rep(e)              0.846  0.865  0.872  0.862  0.758  0.731  0.719

Smoothing = 0.001
  MI(e)               0.577  0.615  0.628  0.600  0.612  0.605  0.592
  PMI3(e)             0.731  0.673  0.692  0.654  0.669  0.644  0.623
  NPMI(e)             0.923  0.827  0.769  0.746  0.731  0.695  0.671
  Typicality P(c|e)   0.462  0.577  0.603  0.577  0.569  0.564  0.554
  Typicality P(e|c)   0.885  0.865  0.872  0.831  0.785  0.741  0.704
  Rep(e)              0.846  0.731  0.718  0.723  0.700  0.669  0.638

Smoothing = 0.0001
  MI(e)               0.615  0.615  0.654  0.608  0.635  0.628  0.612
  PMI3(e)             0.846  0.731  0.731  0.715  0.723  0.685  0.677
  NPMI(e)             0.885  0.904  0.885  0.869  0.823  0.777  0.752
  Typicality P(c|e)   0.462  0.577  0.603  0.577  0.569  0.564  0.556
  Typicality P(e|c)   0.885  0.904  0.910  0.877  0.831  0.813  0.777
  Rep(e)              0.923  0.846  0.833  0.815  0.781  0.736  0.719

Smoothing = 1e-5
  MI(e)               0.615  0.635  0.667  0.662  0.677  0.656  0.646
  PMI3(e)             0.885  0.769  0.744  0.777  0.758  0.731  0.710
  NPMI(e)             0.885  0.846  0.872  0.869  0.831  0.810  0.787
  Typicality P(c|e)   0.462  0.577  0.603  0.577  0.569  0.564  0.556
  Typicality P(e|c)   0.769  0.808  0.846  0.823  0.808  0.782  0.765
  Rep(e)              0.885  0.904  0.872  0.862  0.812  0.800  0.767

Smoothing = 1e-6
  MI(e)               0.769  0.673  0.705  0.677  0.700  0.692  0.679
  PMI3(e)             0.885  0.769  0.756  0.785  0.773  0.726  0.723
  NPMI(e)             0.885  0.846  0.821  0.815  0.750  0.726  0.719
  Typicality P(c|e)   0.462  0.577  0.603  0.577  0.569  0.564  0.556
  Typicality P(e|c)   0.538  0.615  0.615  0.615  0.608  0.613  0.615
  Rep(e)              0.846  0.885  0.897  0.877  0.788  0.777  0.765

Smoothing = 1e-7
  MI(e)               0.769  0.692  0.705  0.685  0.719  0.703  0.688
  PMI3(e)             0.885  0.769  0.756  0.792  0.758  0.736  0.725
  NPMI(e)             0.769  0.750  0.718  0.700  0.650  0.641  0.633
  Typicality P(c|e)   0.462  0.577  0.603  0.577  0.569  0.564  0.556
  Typicality P(e|c)   0.500  0.481  0.526  0.523  0.531  0.523  0.523
  Rep(e)              0.846  0.865  0.872  0.854  0.765  0.749  0.733

Table (b):

No smoothing
  MI(e)               0.516  0.531  0.519  0.531  0.562  0.574  0.594
  PMI3(e)             0.725  0.664  0.652  0.660  0.628  0.631  0.646
  NPMI(e)             0.599  0.597  0.579  0.554  0.540  0.539  0.549
  Typicality P(c|e)   0.297  0.380  0.409  0.422  0.438  0.446  0.461
  Typicality P(e|c)   0.401  0.386  0.396  0.398  0.401  0.410  0.428
  Rep(e)              0.758  0.771  0.745  0.723  0.656  0.647  0.661

Smoothing = 1e-3
  MI(e)               0.374  0.414  0.441  0.448  0.473  0.481  0.495
  PMI3(e)             0.484  0.511  0.509  0.502  0.519  0.525  0.533
  NPMI(e)             0.692  0.652  0.607  0.603  0.585  0.585  0.592
  Typicality P(c|e)   0.297  0.380  0.409  0.422  0.438  0.446  0.460
  Typicality P(e|c)   0.703  0.697  0.704  0.681  0.637  0.628  0.626
  Rep(e)              0.621  0.580  0.554  0.561  0.554  0.555  0.559

Smoothing = 1e-4
  MI(e)               0.407  0.430  0.458  0.462  0.492  0.503  0.512
  PMI3(e)             0.648  0.604  0.579  0.575  0.578  0.576  0.590
  NPMI(e)             0.747  0.777  0.761  0.737  0.700  0.685  0.688
  Typicality P(c|e)   0.297  0.380  0.409  0.422  0.438  0.446  0.461
  Typicality P(e|c)   0.791  0.795  0.802  0.767  0.738  0.729  0.724
  Rep(e)              0.758  0.714  0.711  0.689  0.653  0.636  0.653

Smoothing = 1e-5
  MI(e)               0.429  0.465  0.478  0.501  0.517  0.528  0.545
  PMI3(e)             0.725  0.647  0.642  0.642  0.627  0.624  0.638
  NPMI(e)             0.813  0.779  0.778  0.765  0.730  0.723  0.729
  Typicality P(c|e)   0.297  0.380  0.409  0.422  0.438  0.446  0.461
  Typicality P(e|c)   0.709  0.728  0.735  0.722  0.702  0.696  0.703
  Rep(e)              0.791  0.787  0.762  0.739  0.707  0.703  0.706

Smoothing = 1e-6
  MI(e)               0.516  0.510  0.515  0.526  0.546  0.563  0.579
  PMI3(e)             0.725  0.655  0.651  0.654  0.641  0.631  0.649
  NPMI(e)             0.791  0.766  0.732  0.728  0.673  0.659  0.668
  Typicality P(c|e)   0.297  0.380  0.409  0.422  0.438  0.446  0.461
  Typicality P(e|c)   0.495  0.516  0.520  0.508  0.512  0.521  0.540
  Rep(e)              0.758  0.784  0.767  0.755  0.691  0.686  0.694

Smoothing = 1e-7
  MI(e)               0.516  0.531  0.519  0.530  0.562  0.571  0.592
  PMI3(e)             0.725  0.664  0.652  0.658  0.630  0.631  0.647
  NPMI(e)             0.670  0.655  0.633  0.604  0.575  0.570  0.581
  Typicality P(c|e)   0.297  0.380  0.409  0.422  0.438  0.446  0.461
  Typicality P(e|c)   0.423  0.421  0.415  0.407  0.414  0.424  0.438
  Rep(e)              0.758  0.771  0.745  0.725  0.663  0.661  0.668

Single Instance

• Is this instance ambiguous?

• What are its basic-level concepts?

• What are its similar instances?

What is Semantic Similarity?

• Are the following instance pairs similar?
  • <apple, microsoft>
  • <apple, pear>
  • <apple, fruit>
  • <apple, food>
  • <apple, ipad>
  • <car, journey>

Approaches on Term Similarity

• Categories of approaches for semantic similarity:
  • String based approach
  • Knowledge based approach: use preexisting thesauri, taxonomies, or encyclopedias such as WordNet
  • Corpus based approach: use contexts of terms extracted from web pages, web search snippets, or other text repositories
  • Embedding based approach: introduced in detail in "Part 3: Implicit Understanding"

Approaches on Term Similarity (2)

• Categories:

[Figure: landscape of term-similarity approaches: string based approaches; knowledge based approaches (WordNet), divided into path length / lexical chain-based and information content-based; and corpus based approaches, divided into graph learning algorithm based and snippet search based. Representative work includes Rada 1989, Resnik 1995, Jcn 1997, Hirst 1998, Lin 1998, Ban 2002, HunTray 2005, Chen 2006, Alvarez 2007, Do 2009, Agirre 2010, Sánchez 2011, and Bol 2011; the most recent are marked as state-of-the-art approaches.]

Term Similarity Using Semantic Networks [Li et al 2013, Li et al 2015]

• Framework:

  Step 1: Type Checking. A term pair <t1, t2> is classified as a concept pair, an entity pair, or a concept-entity pair.
  Step 2: Context Representation (Vector). Collect context vectors: entity-distribution contexts and concept-distribution contexts; for a concept-entity pair, collect the concepts of the entity term, cluster them (concept clustering), and select the top-k concepts c_x of each cluster C_i(t1).
  Step 3: Context Similarity. For entity pairs, evaluate Cosine(T(t1), T(t2)) over the context vectors T(t1) and T(t2); for concept pairs, take max_(x,y) Cosine(Cx(t1), Cy(t2)) over clustered context vectors; for concept-entity pairs, take max_x sim(t2, c_x) for <t1, t2>.

An Example [Li et al 2013, Li et al 2015]

For example: <banana, pear>

  Step 1: Type Checking. <banana, pear> is an entity pair.
  Step 2: Context Representation (Vector). Collect the concept context of each entity.
  Step 3: Context Similarity. Cosine(T(t1), T(t2)) = 0.916.

Examples

Term 1              | Term 2             | Similarity
lunch               | dinner             | 0.9987
tiger               | jaguar             | 0.9792
car                 | plane              | 0.9711
television          | radio              | 0.9465
technology company  | microsoft          | 0.8208
high impact sport   | competitive sport  | 0.8155
employer            | large corporation  | 0.5353
fruit               | green pepper       | 0.2949
travel              | meal               | 0.0426
music               | lunch              | 0.0116
alcoholic beverage  | sports equipment   | 0.0314
company             | table tennis       | 0.0003

Complete results: http://adapt.seiee.sjtu.edu.cn/similarity/SimCompleteResults.pdf

Statistics of Search Queries

[Figure: query length distribution. (a) By traffic: 1 term 44%, 2 terms 29%, 3 terms 17%, 4 terms 7%, 5 terms 2%, more than 5 terms 1%. (b) By # of distinct queries: 1 term 10%, 2 terms 26%, 3 terms 34%, 4 terms 19%, 5 terms 7%, more than 5 terms 4%. Example queries: Pokémon Go, Microsoft HoloLens. A companion pair of charts shows the same breakdown by number of instances per query (1, 2, 3, 4, 5, more than 5 instances).]

If the short text has context for the instance…

• python tutorial
• dangerous python
• moon earth distance
• …

Short Text Understanding

• How to segment this short text?

• What does this short text mean (its intent, senses, or concepts)?

• What are the relations among terms in the short text?

• How to calculate the similarity between short texts?

Supervised Segmentation [Bergsma et al 2007]

• Problem: divide a query into semantic units

• Approach: turn segmentation into position-based binary classification

Example query: "two man power saw"
Candidate segmentations:
  [two man] [power saw]
  [two] [man] [power saw]
  [two] [man power] [saw]

Input: a query and its positions
Output: the decision on whether to place a segmentation break at each position

Supervised Segmentation

• Features:
  • Decision-boundary features, e.g., indicator words such as "the" and "is", POS tags in the query, and forward/backward position features
  • Statistical features, e.g., mutual information between the left and right parts: "bank loan | amortization schedule"
  • Context features, e.g., surrounding context information
  • Dependency features, e.g., "female" depends on "driver" in "female bus driver"

Supervised Segmentation

• Segmentation overview:

  Input query: "two man power saw"
  Each position between tokens (two | man | power | saw) is converted into learning features and fed to an SVM classifier.
  Output: a segmentation decision for each position (yes/no).
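A tiny illustration of the position-based classification idea follows. All counts and features below are toy values and not the feature set of [Bergsma et al 2007]; any standard binary classifier could stand in for the SVM.

```python
from sklearn.svm import LinearSVC

# Toy n-gram counts; a real system would use web-scale statistics.
unigram = {"two": 500, "man": 400, "power": 300, "saw": 250}
bigram = {("two", "man"): 50, ("man", "power"): 2, ("power", "saw"): 120}

def boundary_features(left, right):
    """Features for the potential break between tokens `left` and `right`."""
    pair = bigram.get((left, right), 0)
    return [
        pair,                                     # raw co-occurrence of the two tokens
        pair / (unigram[left] * unigram[right]),  # PMI-like association across the boundary
        len(left),
        len(right),
    ]

# One training example per position; label 1 means "insert a break here".
# Gold segmentation of "two man power saw" is [two man] [power saw].
query = ["two", "man", "power", "saw"]
X = [boundary_features(query[i], query[i + 1]) for i in range(len(query) - 1)]
y = [0, 1, 0]  # break only between "man" and "power"

clf = LinearSVC(C=10.0, max_iter=10000).fit(X, y)
print(clf.predict(X))  # segmentation decision for each position
```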

Unsupervised Segmentation [Tan et al 2008]

• Unsupervised learning for query segmentation

The probability of a generated segmentation S (a sequence of segments s_i) for query Q:

  P(S|Q) = P(s1) P(s2|s1) … P(s_m|s1 s2 … s_{m-1}) ≈ ∏_{s_i ∈ S} P(s_i)    (unigram model)

A segment boundary is valid if and only if the pointwise mutual information between the two segments resulting from the split is negative:

  MI(s_k, s_{k+1}) = log [ P_c([s_k s_{k+1}]) / (P_c(s_k) · P_c(s_{k+1})) ] < 0

Example: "new york | times subscription"

  log [ P_c([new york]) / (P_c(new) · P_c(york)) ] > 0  =>  no segment boundary between "new" and "york"

Unsupervised Segmentation

• Find the top-k segmentations with dynamic programming

• Use EM optimization on the fly

  Input: query w1 w2 … wn (the words in the query) and a concept probability distribution
  Output: the top-k segmentations with the highest likelihood
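A minimal sketch of unigram-model segmentation with dynamic programming is shown below. The segment probabilities are made-up toy numbers; in [Tan et al 2008] they come from a web-scale language/concept model and are refined with EM.

```python
import math

# Toy unigram segment probabilities P(s); values are made up for illustration.
seg_prob = {
    "new": 1e-3, "york": 8e-4, "times": 6e-4, "subscription": 2e-4,
    "new york": 5e-4, "new york times": 3e-4, "york times": 1e-6,
    "times subscription": 1e-7, "new york times subscription": 1e-9,
}

def best_segmentation(words, max_len=4):
    """Dynamic program: best[i] = highest log-likelihood segmentation of words[:i]."""
    n = len(words)
    best = [(-math.inf, None)] * (n + 1)
    best[0] = (0.0, None)
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            seg = " ".join(words[j:i])
            p = seg_prob.get(seg)
            if p is None or best[j][0] == -math.inf:
                continue
            score = best[j][0] + math.log(p)
            if score > best[i][0]:
                best[i] = (score, j)
    # Backtrack to recover the segments.
    segs, i = [], n
    while i > 0:
        j = best[i][1]
        segs.append(" ".join(words[j:i]))
        i = j
    return list(reversed(segs)), best[n][0]

print(best_segmentation("new york times subscription".split()))
# -> (['new york times', 'subscription'], ...) under these toy probabilities
```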

Exploit Click-through [Li et al 2011]

• Motivation:
  • Probabilistic query segmentation
  • Use click-through data (query -> clicked URL -> document)

Input query: "bank of america online banking"

Output: top-3 segmentations
  [bank of america] [online banking]      0.502
  [bank of america online banking]        0.428
  [bank of] [america] [online banking]    0.001

Exploit Click-through

• Segmentation model: an interpolated model that combines global information with click-through information.

  Query: [credit card] [bank of America]
  Clicked HTML documents:
    1. bank of america credit cards contact us overview
    2. secured visa credit card from bank of america
    3. credit cards overview find the right bank of america credit card for you

Short Text Understanding

• How to segment this short text?

• What does this short text mean (its intent, senses, or concepts)?

• What are the relations among terms in the short text?

• How to calculate the similarity between short texts?

Sense Changes with Different Context

watch harry potter read harry potter age harry potter

Movie Book Character

harry potter walkthrough

Game

Entity Recognition in Query [Guo et al 2009]

• Motivation: detect the named entity in a short text and categorize it.

  Example: "harry potter walkthrough" is a single-named-entity query, represented as a triple <e, t, c> = ("harry potter", "walkthrough", "game"), where e is the ambiguous entity term, t the context terms, and c the class of the entity.

Entity Recognition in Query

• Probabilistic Generative Model

  Goal: given a query q, find the triple <e, t, c> that maximizes the probability of generating the triple, assuming the context depends only on the class (e.g., "walkthrough" depends only on the class game, not on "harry potter").

  The problem then becomes how to estimate Pr(e), Pr(c|e), and Pr(t|c).

Entity Recognition in Query

• Probability Estimation by Learning

  Learning objective:  max ∏_{i=1..N} P(e_i, t_i, c_i)

  Challenge: it is difficult as well as time-consuming to manually assign class labels to named entities in queries.

  Instead, build a training set T = {(e_i, t_i)} and view c_i as a hidden variable. The new learning problem is

  max ∏_{i=1..N} P(e_i, t_i) = max ∏_{i=1..N} Σ_c P(e_i) P(c|e_i) P(t_i|c)

  which is solved with the topic model WS-LDA.
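A small sketch of the inference step only, i.e., scoring candidate triples <e, t, c> by Pr(e) Pr(c|e) Pr(t|c). All distributions below are made-up toy values rather than the WS-LDA estimates of [Guo et al 2009].

```python
import math

# Toy distributions; in practice these are learned from query logs.
p_e = {"harry potter": 0.6, "kung fu panda": 0.4}
p_c_given_e = {
    "harry potter": {"book": 0.5, "movie": 0.3, "game": 0.2},
    "kung fu panda": {"movie": 0.7, "game": 0.3},
}
p_t_given_c = {
    "book": {"read": 0.4, "author": 0.3, "walkthrough": 0.01},
    "movie": {"watch": 0.5, "trailer": 0.3, "walkthrough": 0.02},
    "game": {"walkthrough": 0.6, "cheats": 0.3},
}

def best_interpretation(query):
    """Enumerate (e, t, c) splits of the query and score Pr(e) Pr(c|e) Pr(t|c)."""
    best = (None, -math.inf)
    for e, classes in p_c_given_e.items():
        if e not in query:
            continue
        t = query.replace(e, "").strip()          # remaining words = context terms
        for c, pce in classes.items():
            ptc = p_t_given_c.get(c, {}).get(t, 1e-6)
            score = math.log(p_e[e]) + math.log(pce) + math.log(ptc)
            if score > best[1]:
                best = ((e, t, c), score)
    return best

print(best_interpretation("harry potter walkthrough"))
# -> (('harry potter', 'walkthrough', 'game'), ...)
```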

Signal from Click [Pantel et al 2012]

• Motivation: predict entity types in Web search.

  A generative model relates the entity, the user intent, the context, and the click, and yields a query type distribution over 73 entity types.

Signal from Click

• Joint Model for Prediction

[Figure: plate diagram of the joint model over Q queries. For each query: pick an entity type t from the distribution over types, pick an entity, pick an intent from the intent distribution, then pick context words and pick a click, using word, host, and entity distributions (variables t, τ, i, n, c with parameters θ, φ, ω).]

Telegraphic Query Interpretation [Sawant et al 2013, Joshi et al 2014]

• Entity-seeking telegraphic queries

• Interpretation = Segmentation + Annotation

  Example: query "Germany capital" -> result entity "Berlin", answered over a knowledge base plus a large corpus (for accuracy and recall).

• Overview: Joint Interpretation and Ranking [Sawant et al 2013, Joshi et al 2014]

  A telegraphic query and an annotated corpus are fed into two models for joint interpretation and ranking, a generative model and a discriminative model, which output ranked candidate entities e1, e2, e3, …

• Generative Model: Joint Interpretation and Ranking [Sawant et al 2013]

  Based on probabilistic language models. Example query q = "losing team baseball world series 1998":
  • Type hint: "baseball team" (type T), matching the candidate entity E = San Diego Padres, a "major league baseball team"
  • Context matchers (Z): "lost 1998 world series" matches the corpus snippet "Padres have been to two World Series, losing in 1984 and 1998"
  • A switch decides whether each query word is generated by the type model or the context model

  (Figure borrowed from U. Sawant, 2013.)

• Discriminative Model: Joint Interpretation and Ranking [Sawant et al 2013]

  Based on max-margin discriminative learning. For the same query "losing team baseball world series 1998", candidate interpretations pair an annotation of the query (e.g., target type t = "baseball team" vs. t = "series") with a candidate answer entity: San_Diego_Padres is a correct entity, 1998_World_Series an incorrect one.

Telegraphic Query Interpretation [Joshi et al 2014]

• Queries seek answer entities (e2)

• They contain (query) entities (e1), target types (t2), relations (r), and selectors (s)

  Example interpretations (query -> e1, r, t2, s):
  • "dave navarro first band" -> (dave navarro, band, band, first) or (dave navarro, -, band, first)
  • "spider automobile company" -> "spider" is treated either as the query entity e1 with target type "automobile company", or as a selector s with target type "company"

  (Examples borrowed from M. Joshi, 2014.)

Improved Generative Model

• The generative model of [Sawant et al 2013] is extended in [Joshi et al 2014] to also consider e1 (the query entity in q) and the relation r.

Improved Discriminative Model

• The discriminative model of [Sawant et al 2013] is likewise extended in [Joshi et al 2014] to consider e1 (in q) and r.

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

• Input: a short text

• Output: semantic interpretation

• Three steps in understanding a short text, e.g., "wanna watch eagles band" -> watch[verb] eagles[entity](band) band[concept]:

  Step 1: Text Segmentation. Divide the text into a sequence of terms in the vocabulary ("watch | eagles | band").
  Step 2: Type Detection. Determine the best type of each term (watch[verb] eagles[entity] band[concept]).
  Step 3: Concept Labeling. Infer the best concept of each entity within context (eagles -> the band sense).

Text Segmentation

• Observations:
  • Mutual Exclusion: terms containing the same word mutually exclude each other
  • Mutual Reinforcement: related terms mutually reinforce each other

• Build a Candidate Term Graph (CTG)

[Figure: candidate term graphs for "vacation april in paris" and "watch harry potter"; nodes are candidate terms (e.g., "april in paris", "vacation", "april", "paris"; "harry potter", "watch", "harry", "potter") and edges carry weights such as 1/3, 2/3, 0.029, 0.005, 0.047, 0.041, 0.014, 0.092, 0.053, 0.018.]

Find the Best Segmentation

• The best segmentation is the sub-graph of the CTG which:
  • Is a complete graph (clique)
  • Contains no mutual exclusion
  • Has 100% word coverage (except for stopwords)
  • Has the largest average edge weight

  In the examples above the best segmentation is a maximal clique, e.g., [vacation] [april in paris] and [watch] [harry potter].

Type Detection

• Pairwise Model: find the best typed-term for each term so that the Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight.

  Example: "watch free movie", with candidate typed-terms watch[verb] / watch[entity] / watch[concept], free[adjective] / free[verb], and movie[concept] / movie[entity].
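A rough sketch of the pairwise idea, assuming we already have pairwise affinity scores between typed-terms (the toy `affinity` table below is invented): it enumerates one typed-term per term and scores each assignment by the weight of its maximum spanning tree.

```python
import itertools
import networkx as nx

# Toy candidate types per term and pairwise affinity scores between typed-terms.
# Real scores would come from a semantic network; these are made up.
candidates = {
    "watch": ["watch[v]", "watch[e]", "watch[c]"],
    "free": ["free[adj]", "free[v]"],
    "movie": ["movie[c]", "movie[e]"],
}
affinity = {
    ("watch[v]", "movie[c]"): 0.9, ("watch[v]", "free[adj]"): 0.4,
    ("free[adj]", "movie[c]"): 0.8, ("watch[e]", "movie[e]"): 0.1,
}

def pair_score(a, b):
    return affinity.get((a, b), affinity.get((b, a), 0.01))

def best_typing(cands):
    """Pick one typed-term per term, maximizing the maximum-spanning-tree weight."""
    best, best_w = None, float("-inf")
    for combo in itertools.product(*cands.values()):
        g = nx.Graph()
        for a, b in itertools.combinations(combo, 2):
            g.add_edge(a, b, weight=pair_score(a, b))
        mst = nx.maximum_spanning_tree(g, weight="weight")
        w = mst.size(weight="weight")
        if w > best_w:
            best, best_w = combo, w
    return best, best_w

print(best_typing(candidates))
# -> (('watch[v]', 'free[adj]', 'movie[c]'), ...) under these toy scores
```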

Concept Labeling

• Entity disambiguation is the most important task of concept labeling: filter and re-rank the original concept cluster vector.

• Weighted Vote: the final score of each concept cluster combines its original score with the support from the context, using concept co-occurrence.

  Example: "watch harry potter" -> movie, while "read harry potter" -> book.
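A minimal weighted-vote sketch, assuming stand-alone concept scores for the entity and co-occurrence-based support scores between context terms and concepts. All numbers are illustrative, and the linear combination with weight alpha is an assumption, not the exact formula of [Hua et al 2015].

```python
# Toy stand-alone concept scores and context support scores.
concept_scores = {"harry potter": {"movie": 0.45, "book": 0.40, "character": 0.15}}
context_support = {
    ("watch", "movie"): 0.9, ("watch", "book"): 0.1,
    ("read", "book"): 0.9, ("read", "movie"): 0.1,
}

def weighted_vote(entity, context_terms, alpha=0.5):
    """final(c) = alpha * original(c) + (1 - alpha) * average context support for c."""
    ranked = {}
    for concept, orig in concept_scores[entity].items():
        support = [context_support.get((t, concept), 0.0) for t in context_terms]
        avg_support = sum(support) / len(support) if support else 0.0
        ranked[concept] = alpha * orig + (1 - alpha) * avg_support
    return sorted(ranked.items(), key=lambda kv: kv[1], reverse=True)

print(weighted_vote("harry potter", ["watch"]))  # movie ranked first
print(weighted_vote("harry potter", ["read"]))   # book ranked first
```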

Example of Entity Disambiguation [Hua et al 2015 (ICDE Best Paper), Hua et al 2016]

[Figure: conceptualization pipeline. A short text is parsed against a semantic network and a co-occurrence network; conceptualization combines term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, and concept orthogonalization, and outputs a concept vector (c1, p1), (c2, p2), (c3, p3), … For "ipad apple", the isA candidates of apple (fruit, company, food, product, …) are filtered by co-occurrence with the concepts of ipad (product, device, brand, company, …), so the company/brand sense of apple is kept.]

Mining Lexical Relationships [Wang et al 2015b]

• Lexical knowledge is represented by probabilities over terms (t), instances (e), concepts (c), and roles (z). For "watch harry potter":

  p(verb | watch), p(instance | watch)              : role probabilities p(z | t)
  p(movie | harry potter), p(book | harry potter)   : concept probabilities p(c | e) = p(c | t, z = instance)
  p(movie | watch, verb)                            : context-dependent concept probabilities p(c | t, z)

Understanding Queries [Wang et al 2015b]

• Goal: rank the concepts and find argmax_c p(c | t, q).

  All possible segmentations of the query are matched against the offline semantic network, and concepts are ranked by random walk with restart [Sun et al 2005] on the online subgraph.

Short Text Understanding

• How to segment this short text?

• What does this short text mean (its intent, senses, or concepts)?

• What are the relations among terms in the short text?

• How to calculate the similarity between short texts?

Head, Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

• Example: "popular smart cover iphone 5s"

• Definitions:
  • Head: names the general (semantic) category to which the whole short text belongs; usually the head is the intent of the short text. Here "smart cover" is the intent of the query.
  • Constraints: distinguish this member from other members of the same category. Here "iphone 5s" limits the type of the head.
  • Non-Constraint Modifiers (aka Pure Modifiers): subjective modifiers that can be dropped without changing the intent. Here "popular" is subjective and can be neglected.

Non-Constraint Modifier Mining: Construct Modifier Networks

[Figure: a concept hierarchy tree in the "Country" domain (country; Asian country; developed country; Western country; Western developed country; top Western country; large Asian country; large developed country; top developed country; …) is converted into a modifier network whose nodes are the modifiers (Asian, Developed, Western, Large, Top) and whose edges record how the modifiers combine. In this case "Large" and "Top" are pure modifiers.]

Non-Constraint Modifier Mining: Betweenness Centrality

• Betweenness centrality is a measure of a node's centrality in a network.

• The betweenness of node v is defined as

  g(v) = Σ_{s ≠ v ≠ t} σ_st(v) / σ_st

  where σ_st is the total number of shortest paths from node s to node t, and σ_st(v) is the number of those paths that pass through v.

• Normalization & aggregation: a pure modifier should have a low betweenness-centrality aggregation score PMS(t).
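For illustration, betweenness centrality over a small (invented) modifier network can be computed directly with networkx; pure modifiers such as "Large" and "Top" end up with low scores here.

```python
import networkx as nx

# A small modifier network in the "country" domain; edges are illustrative only.
# In practice the network is built from concept hierarchies mined at web scale.
g = nx.Graph()
g.add_edges_from([
    ("Asian", "Developed"), ("Asian", "Western"), ("Developed", "Western"),
    ("Large", "Asian"), ("Large", "Developed"),
    ("Top", "Western"), ("Top", "Developed"),
])

# Normalized betweenness centrality; candidate pure modifiers sit on few
# shortest paths and therefore get low scores.
bc = nx.betweenness_centrality(g, normalized=True)
for modifier, score in sorted(bc.items(), key=lambda kv: kv[1]):
    print(f"{modifier:10s} {score:.3f}")
```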

Head-Constraint Mining [Wang et al 2014b]

• A term can be a head in some queries and a constraint in others.

  E.g., in "Seattle hotel", "hotel" is the head and "Seattle" a constraint; in "Seattle hotel job", "job" is the head while "Seattle" and "hotel" are constraints.

Head-Constraint Mining: Acquiring Concept Patterns

Building the concept pattern dictionary from query logs:
  1. Extract preposition patterns (A for B, A of B, A with B, A in B, A on B, A at B, …) from queries such as "cover for iphone 6s", "battery for sony a7r", "wicked on broadway"; entity1 is the head and entity2 the constraint.
  2. Get entity pairs from the query log and conceptualize each entity (entity1 -> concept11, concept12, concept13, concept14; entity2 -> concept21, concept22, concept23).
  3. Generate concept patterns for each preposition, e.g., (concept11, concept21), (concept11, concept22), (concept11, concept23), …, and store them in the concept pattern dictionary.

Why Concepts Can't Be Too General

• Too-general concepts cause too many concept pattern conflicts: head and modifier cannot be distinguished for general concept pairs.

  Derived concept pattern: device (head) \ company (modifier)
    Supporting entity pairs: (iphone 4, verizon), (modem, comcast), (wireless router, comcast), (iphone 4, tmobile)

  Derived concept pattern: company (head) \ device (modifier)
    Supporting entity pairs: (amazon books, kindle), (netflix, touchpad), (skype, windows phone), (netflix, ps3)

  The two patterns conflict.

Why Concepts Can't Be Too Specific

• It may generate concepts with little coverage: the concept regresses to the entity, and storage blows up to as many as (million x million) patterns, e.g., device \ largest desktop OS vendor, device \ largest software development company, device \ largest global corporation, device \ latest windows and office provider, …

• Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b].

Top Concept Patterns

[Table: top head \ constraint concept patterns with their cluster sizes and scores, including breed\state, game\platform, accessory\vehicle, browser\platform, requirement\school, drug\disease, cosmetic\skin condition, job\city, accessory\phone, software\platform, test\disease, clothes\breed, penalty\crime, tax\state, sauce\meat, credit card\country, food\holiday, mod\game, garment\sport, career information\professional, song\instrument, bait\fish, study guide\book, plugins\browser, recipe\meat, currency\country, lens\camera, decoration\holiday, and food\animal.]

Example: the Game (head) \ Platform (modifier) cluster contains patterns such as game\platform, game\device, video game\platform, game console\game pad, and game\gaming platform, and is supported by entity pairs like (angry birds, android), (angry birds, ios), (angry birds, windows 10), …

Head Modifier Relationship Detection

• Train a classifier on (head-embedding, modifier-embedding) pairs

• Training data:
  • Positive: (head, modifier)
  • Negative: (modifier, head)

• Precision >= 0.9, Recall >= 0.9

• Disadvantage: not interpretable
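A minimal sketch of the embedding-pair classifier; the slide does not say which classifier is used, so logistic regression stands in here, and the random vectors below are placeholders for real word embeddings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder embeddings and a few known (head, modifier) pairs for illustration.
rng = np.random.default_rng(0)
emb = {w: rng.normal(size=50) for w in ["cover", "iphone", "hotel", "seattle", "job"]}
head_modifier_pairs = [("cover", "iphone"), ("hotel", "seattle"), ("job", "hotel")]

X, y = [], []
for head, modifier in head_modifier_pairs:
    X.append(np.concatenate([emb[head], emb[modifier]]))  # positive: (head, modifier)
    y.append(1)
    X.append(np.concatenate([emb[modifier], emb[head]]))  # negative: (modifier, head)
    y.append(0)

clf = LogisticRegression(max_iter=1000).fit(np.array(X), y)

def is_head_modifier(a, b):
    """Return True if the classifier predicts that a is the head and b the modifier."""
    return clf.predict([np.concatenate([emb[a], emb[b]])])[0] == 1

print(is_head_modifier("cover", "iphone"))
```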

Syntactic Parsing based on Head/Modifier

• Head/modifier information alone is incomplete:
  • Prepositions and other function words are ignored
  • Structure within a noun compound is lost, e.g., "el capitan macbook pro"
• Why not train a parser for web queries?

Syntactic Parsing of Short Texts [Sun et al EMNLP 2016]

• Syntactic structures are valuable for short text understanding (see the examples)

Challenges: Short Texts Lack Grammatical Signals

• Short texts lack function words and word order:
  • "toys queries" has ambiguous intent
  • "distance earth moon" has clear intent, but many equivalent forms: "earth moon distance", "earth distance moon", …

Challenges: Syntactic Parsing of Queries

• No standard
• No ground truth
• Why is syntactic parsing of queries even a legitimate problem?

Derive Syntax from Semantics [Sun et al 2016]

• Query: "thai food houston"

• Parse the clicked sentence, then project its dependencies onto the query.

A Treebank for Short Texts

• Given a query q and q's clicked sentences s: parse each s, project the dependencies from s to q, and aggregate the projected dependencies.

Algorithm of Projection

Result Examples

Results

• Random queries:                  QueryParser UAS 0.83, LAS 0.75; Stanford UAS 0.72, LAS 0.64

• Queries with no function words:  QueryParser UAS 0.82, LAS 0.73; Stanford UAS 0.70, LAS 0.61

• Queries with function words:     QueryParser UAS 0.90, LAS 0.85; Stanford UAS 0.86, LAS 0.80

Short Text Understanding

• How to segment this short text?

• What does this short text mean (its intent, senses, or concepts)?

• What are the relations among terms in the short text?

• How to calculate the similarity between short texts?

Short Text Similarity Using Word Embeddings [Kenter and de Rijke 2015]

• Measuring similarity between two short texts or sentences

• Basic idea: word-by-word comparison using embedding vectors

• Use a saliency-weighted semantic graph to compute similarity

Short Text Similarity Using Word Embeddings [Kenter and de Rijke 2015]

Features are acquired from bins of all edges and bins of maximum edges of the semantic graph. The similarity of a (longer) text s_l against a short text s_s is measured with a BM25-inspired formula in which the semantic similarity sem(w, s_s) of term w to the short text plays the role of term frequency:

  f_sts(s_l, s_s) = Σ_{w ∈ s_l} IDF(w) · sem(w, s_s) · (k1 + 1) / ( sem(w, s_s) + k1 · (1 - b + b · |s_s| / avgsl) )
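A small sketch of the BM25-style scoring function, assuming precomputed word embeddings and IDF values (random toy vectors below). Here sem(w, s_s) is taken as the best cosine match of w against the words of s_s, which is one common choice rather than necessarily the paper's exact definition.

```python
import numpy as np

# Toy embeddings and IDF values; real systems would use trained embeddings.
rng = np.random.default_rng(1)
emb = {w: rng.normal(size=20) for w in ["cheap", "flights", "low", "cost", "airline", "tickets"]}
idf = {w: 1.0 for w in emb}

def cos(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def sem(w, text):
    """Semantic contribution of word w to a short text: best embedding match."""
    return max(cos(emb[w], emb[v]) for v in text)

def f_sts(s_l, s_s, k1=1.2, b=0.75, avgsl=3.0):
    """Saliency-weighted, BM25-style similarity of s_l against s_s."""
    score = 0.0
    norm = k1 * (1 - b + b * len(s_s) / avgsl)
    for w in s_l:
        s = sem(w, s_s)
        score += idf[w] * s * (k1 + 1) / (s + norm)
    return score

print(f_sts(["cheap", "flights"], ["low", "cost", "airline", "tickets"]))
```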

From the Concept View [Wang et al 2015a]

[Figure: two short texts are parsed against a semantic network and a co-occurrence network; conceptualization (term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, concept orthogonalization) turns each short text into a bag of concepts, i.e., concept vector 1 [(c1, score1), (c2, score2), …] and concept vector 2 [(c1', score1'), (c2', score2'), …], and the similarity of the two short texts is computed between the concept vectors.]
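A minimal sketch of the final step, comparing two bags of concepts with cosine similarity; the concept vectors below are invented rather than produced by a real conceptualization pipeline.

```python
import math

def cosine(v1, v2):
    """Cosine similarity between two sparse concept vectors (dicts)."""
    common = set(v1) & set(v2)
    dot = sum(v1[c] * v2[c] for c in common)
    n1 = math.sqrt(sum(x * x for x in v1.values()))
    n2 = math.sqrt(sum(x * x for x in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

text1_concepts = {"fruit": 0.1, "company": 0.7, "brand": 0.2}          # e.g. "apple ipad"
text2_concepts = {"company": 0.6, "search engine": 0.3, "brand": 0.1}  # e.g. "google android"
text3_concepts = {"fruit": 0.8, "food": 0.2}                           # e.g. "banana pear"

print(cosine(text1_concepts, text2_concepts))  # higher: both about companies/brands
print(cosine(text1_concepts, text3_concepts))  # lower: company sense vs. fruit sense
```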

Outline

• Knowledge Bases

• Explicit Representation Models

• Applications

Applications

• Explicit short text understanding benefits a lot of application scenarios:
  • Ads/search semantic match
  • Definition mining
  • Query recommendation
  • Web table understanding
  • Semantic search
  • …

Ads Keyword Selection [Wang et al 2015a]

[Figure: ads keyword selection results by query decile (Decile 4 through Decile 10), shown separately for mainline ads (y-axis roughly 0.00 to 6.00) and sidebar ads (y-axis roughly 0.00 to 0.60).]

Definition Mining [Hao et al 2016]

• Definition scenarios: search engines, QnA, etc.

• Why conceptualization is useful for definition mining. Example: "What is Emphysema?"

  Answer 1: "Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year."
  Answer 2: "Emphysema is an incurable, progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing."

• Each sentence has the form of a definition.
• Embedding is helpful to some extent, but it also returns high similarity scores for (emphysema, disease) and (emphysema, smoking).
• Conceptualization can provide strong semantics.
• Contextual embedding can also provide semantic similarity beyond isA.

Definition Mining [Hao et al 2016]

Concept-based Short Text Classification and Ranking [Wang et al 2014a]

[Figure: offline/online architecture. Offline, training data for each class (Class 1 … Class i … Class N) goes through concept weighting and model learning, producing a concept model per class (Model 1 … Model i … Model N). Online, an original short text such as "justin bieber graduates" goes through entity extraction and conceptualization against the knowledge base to obtain a concept vector; candidate classes are generated, and classification and ranking output results such as <Music, score>.]

Concept-based Short Text Classification and Ranking [Wang et al 2014a]

[Figure: each category (e.g., TV, Music, Movie) is projected into the concept space using the article titles/tags in that category, giving weighted concept representations (ω_i, ω_j, …); a query is conceptualized into the same space (p_i, p_j, …) and matched against the category representations.]

Precision Performance on Each Category [Wang et al 2014a]

Category | BocSTC | LM_ch | SVM  | VSM_cosine | LM_d | Entity_ESA
Movie    | 0.71   | 0.91  | 0.84 | 0.81       | 0.72 | 0.56
Money    | 0.97   | 0.95  | 0.54 | 0.57       | 0.52 | 0.74
Music    | 0.97   | 0.90  | 0.88 | 0.73       | 0.68 | 0.58
TV       | 0.96   | 0.46  | 0.92 | 0.56       | 0.51 | 0.55

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

• [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld. WordNet: An Electronic Lexical Database. Proceedings of the 11th Eurographics Workshop on Rendering, 1998.

• [ Banko et al 2007 ] Michele Banko, Michael J Cafarella, Stephen Soderland, Matt Broadhead, and Oren Etzioni. Open Information Extraction from the Web. In IJCAI, 2007.

• [ Etzioni et al 2011 ] Oren Etzioni, Anthony Fader, Janara Christensen, Stephen Soderland, and Mausam. Open Information Extraction: The Second Generation. In IJCAI, vol 11, pp 3-10, 2011.

• [ Carlson et al 2010 ] A Carlson, J Betteridge, B Kisiel, B Settles, E R Hruschka Jr, and T M Mitchell. Toward an Architecture for Never-Ending Language Learning. In Proceedings of the Conference on Artificial Intelligence (AAAI), 2010.

• [ Wu et al 2012 ] Wentao Wu, Hongsong Li, Haixun Wang, and Kenny Zhu. Probase: A Probabilistic Taxonomy for Text Understanding. In ACM International Conference on Management of Data (SIGMOD), May 2012.

• [ Bollacker et al 2008 ] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In SIGMOD, 2008.

• [ Auer et al 2007 ] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary G Ives. DBpedia: A Nucleus for a Web of Open Data. In ISWC/ASWC, 2007.

References

• [ Suchanek et al 2007 ] Fabian M Suchanek, Gjergji Kasneci, and Gerhard Weikum. Yago: a core of semantic knowledge. In WWW, 2007.

• [ Wu et al 2015 ] Sen Wu, Ce Zhang, Christopher De Sa, Jaeho Shin, Feiran Wang, and C Ré. Incremental Knowledge Base Construction Using DeepDive. In VLDB, 2015.

• [ Navigli et al 2012 ] R Navigli and S Ponzetto. BabelNet: The Automatic Construction, Evaluation and Application of a Wide-Coverage Multilingual Semantic Network. In Artificial Intelligence, 2012.

• [ Nastase et al 2010 ] Vivi Nastase, Michael Strube, Benjamin Börschinger, Cäcilia Zirn, and Anas Elghafari. WikiNet: A very large scale multi-lingual concept network. In LREC, 2010.

• [ Speer et al 2013 ] Robert Speer and Catherine Havasi. ConceptNet 5: A large semantic network for relational knowledge. The People's Web Meets NLP, Springer Berlin Heidelberg, 2013.

• [ Hua et al 2016 ] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng, and Xiaofang Zhou. Understand Short Texts by Harvesting and Analyzing Semantic Knowledge. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2016.

• [ Hua et al 2015 ] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng, and Xiaofang Zhou. Short Text Understanding Through Lexical-Semantic Analysis. In International Conference on Data Engineering (ICDE), April 2015.

References

• [ Li et al 2013 ] Peipei Li, Haixun Wang, Kenny Q Zhu, Zhongyuan Wang, and Xindong Wu. Computing term similarity by large probabilistic isA knowledge. In ACM International Conference on Information and Knowledge Management (CIKM), 2013.

• [ Li et al 2015 ] Peipei Li, Haixun Wang, Kenny Q Zhu, Zhongyuan Wang, Xue-Gang Hu, and Xindong Wu. A Large Probabilistic Semantic Network based Approach to Compute Term Similarity. IEEE Transactions on Knowledge and Data Engineering (TKDE), 27(10): 2604-2617, 2015.

• [ Rosch et al 1976 ] Eleanor Rosch, Carolyn B Mervis, Wayne D Gray, David M Johnson, and Penny Boyes-Braem. Basic objects in natural categories. Cognitive Psychology, 8(3): 382-439, 1976.

• [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze. Foundations of statistical natural language processing. Volume 999, MIT Press, 1999.

• [ Wang et al 2015b ] Zhongyuan Wang, Kejun Zhao, Haixun Wang, Xiaofeng Meng, and Ji-Rong Wen. Query Understanding through Knowledge-Based Conceptualization. In IJCAI, July 2015.

• [ Bergsma et al 2007 ] Shane Bergsma and Qin Iris Wang. Learning Noun Phrase Query Segmentation. In EMNLP-CoNLL, 2007: 819-826.

• [ Tan et al 2008 ] Bin Tan and Fuchun Peng. Unsupervised query segmentation using generative language models and Wikipedia. In WWW, 2008: 347-356.

References

• [ Li et al 2011 ] Yanen Li, Bo-June Paul Hsu, ChengXiang Zhai, and Kuansan Wang. Unsupervised query segmentation using clickthrough for information retrieval. In SIGIR, 2011: 285-294.

• [ Guo et al 2009 ] Jiafeng Guo, Gu Xu, Xueqi Cheng, and Hang Li. Named entity recognition in query. In SIGIR, 2009: 267-274.

• [ Pantel et al 2012 ] Patrick Pantel, Thomas Lin, and Michael Gamon. Mining Entity Types from Query Logs via User Intent Modeling. In ACL, 2012: 563-571.

• [ Joshi et al 2014 ] Mandar Joshi, Uma Sawant, and Soumen Chakrabarti. Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries. In EMNLP, 2014: 1104-1114.

• [ Sawant et al 2013 ] Uma Sawant and Soumen Chakrabarti. Learning joint query interpretation and response ranking. In WWW, 2013: 1099-1110.

• [ Wang et al 2014b ] Zhongyuan Wang, Haixun Wang, and Zhirui Hu. Head, Modifier, and Constraint Detection in Short Texts. In International Conference on Data Engineering (ICDE), 2014.

• [ Sun et al 2016 ] Xiangyan Sun, Haixun Wang, Yanghua Xiao, and Zhongyuan Wang. Syntactic Parsing of Web Queries. In EMNLP, 2016.

References

• [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke. Short text similarity with word embeddings. In CIKM, 2015.

• [ Wang et al 2015a ] Zhongyuan Wang, Haixun Wang, Ji-Rong Wen, and Yanghua Xiao. An Inference Approach to Basic Level of Categorization. In CIKM, October 2015.

• [ Hao et al 2016 ] Zehui Hao, Zhongyuan Wang, Xiaofeng Meng, and Jun Yan. Combining Language Model with Conceptualization for Definition Ranking. MSR Technical Report, 2016.

• [ Wang et al 2014a ] Fang Wang, Zhongyuan Wang, Zhoujun Li, and Ji-Rong Wen. Concept-based Short Text Classification and Ranking. In CIKM, 2014.

• [ Wang et al 2012a ] Jingjing Wang, Haixun Wang, Zhongyuan Wang, and Kenny Zhu. Understanding Tables on the Web. In International Conference on Conceptual Modeling, October 2012.

• [ Wang et al 2012b ] Yue Wang, Hongsong Li, Haixun Wang, and Kenny Zhu. Toward Topic Search on the Web. In International Conference on Conceptual Modeling, October 2012.


bull silver score = 124bull metal material mineral resource mineral 0638bull color 0156

bull python score = 129bull language 0667bull snake animal reptile skin 0193

bull apple score = 141bull fruit food tree 0537bull company brand 0271

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

A Concept View of ldquoMicrosoftrdquo

company

largest desktop OS vendor

softwarecompany

international company

technology leader

Microsoft

largest desktop OS vendorcompany hellip hellip

software company

Basic-level Conceptualization (BLC)[Rosch et al 1976]

KFC

BMW

Basic-level conceptualization

How to Make BLC

bull Naive approachesbull Typicality an important measure for understanding the

relationship between an object and its concept

bull Pointwise Mutual Information (PMI) a common measure of the strength of association between two terms

bird

Naive Approach 1 Typicality

P(robin|bird) gt P(penguin|bird)ldquorobinrdquo is a more typical bird than a ldquopenguinrdquo

country

SeychellesUSA

P(USA|country) gt P(Seychelles|country)ldquoUSArdquo is a more typical country than ldquoSeychellesrdquo

penguinrobin

Using Typicality for BLC

bull Associate each isA relationship (119890 is 119888) with typicality scores 119875 119890 119888 and 119875 119888 119890

119875 119890 119888 =119899 119888 119890

119899 119888119875(119888|119890) =

119899 119888 119890

119899(119890)

bull P(e|c) indicates how typical (or popular) e is in the given concept c

bull P(c|e) indicates how typical (or popular) the concept c is given e

bull However

Microsoft

largest desktop OS vendorcompanyhigh typicality p(c|e) high typicality p(e|c)

Naive Approach 2 PMI[Manning and Schutze 1999]

bull Pointwise mutual information (PMI) is a measure of association used in information theory and statistics

bull Consider using the PMI between concept c and instance e to find the basic-level concepts as follows

119875119872119868(119890 119888) = log119875(119890 119888)

119875(119890)119875(119888)= log119875(119890|119888) minus log119875(119890)

bull However bull In basic level of categorization we are interested in finding a

concept for a given e which means P(e) is a constant

bull Thus ranking by PMI(e c) is the same as ranking by P(e|c)

Using Rep(e c) for BLC [Wang et al 2015b]

bull The measure 119877119890119901 119890 119888 = 119875(119888|119890) lowast 119875(119890|119888) means

bull (With PMI) If we take the logarithm of our scoring function we get

log119877119890119901 119890 119888 = log119875 119888 119890 lowast 119875(119890|119888) = log119875(119890 119888)

119875(119890)lowast119875(119890 119888)

119875(119888)= log

119875(119890 119888)2

119875(119890)119875(119888)= 119875119872119868 119890 119888 + log119875 119890 119888

= 1198751198721198682

bull (With Commute Time) The commute time between an instance e and a concept c is

119879119894119898119890(119890 119888) =

119896=1

infin

(2119896) lowast 119875119896(119890 119888) =

119896=1

119879

2119896 lowast 119875119896 119890 119888 +

119896=119879+1

infin

2119896 lowast 119875119896 119890 119888

ge σ119896=1119879 (2119896) lowast 119875119896(119890 119888) + 2(119879 + 1) lowast (1 minus σ119896=1

119879 119875119896(119890 119888)) = 4 minus 2 lowast 119877119890119901(119890 119888)

Given e the c should be its typical concept (shortest distance)

Given c the e should be its typical instance (shortest distance)

A process of finding concept nodes having shortest expected distance with e

PrecisionNDCGNo smoothing 1 2 3 5 10 15 20

MI(e) 0769 0692 0705 0685 0719 0705 0690

PMI3(e) 0885 0769 0756 0800 0754 0733 0721

NPMI(e) 0692 0692 0667 0638 0627 0610 0610

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0500 0462 0526 0523 0523 0510 0521

Rep(e) 0846 0865 0872 0862 0758 0731 0719

Smoothing=0001

MI(e) 0577 0615 0628 0600 0612 0605 0592

PMI3(e) 0731 0673 0692 0654 0669 0644 0623

NPMI(e) 0923 0827 0769 0746 0731 0695 0671

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0554

Typicality P(e|c) 0885 0865 0872 0831 0785 0741 0704

Rep(e) 0846 0731 0718 0723 0700 0669 0638

Smoothing=00001

MI(e) 0615 0615 0654 0608 0635 0628 0612

PMI3(e) 0846 0731 0731 0715 0723 0685 0677

NPMI(e) 0885 0904 0885 0869 0823 0777 0752

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0885 0904 0910 0877 0831 0813 0777

Rep(e) 0923 0846 0833 0815 0781 0736 0719

Smoothing=1e-5

MI(e) 0615 0635 0667 0662 0677 0656 0646

PMI3(e) 0885 0769 0744 0777 0758 0731 0710

NPMI(e) 0885 0846 0872 0869 0831 0810 0787

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0769 0808 0846 0823 0808 0782 0765

Rep(e) 0885 0904 0872 0862 0812 0800 0767

Smoothing=1e-6

MI(e) 0769 0673 0705 0677 0700 0692 0679

PMI3(e) 0885 0769 0756 0785 0773 0726 0723

NPMI(e) 0885 0846 0821 0815 0750 0726 0719

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0538 0615 0615 0615 0608 0613 0615

Rep(e) 0846 0885 0897 0877 0788 0777 0765

Smoothing=1e-7

MI(e) 0769 0692 0705 0685 0719 0703 0688

PMI3(e) 0885 0769 0756 0792 0758 0736 0725

NPMI(e) 0769 0750 0718 0700 0650 0641 0633

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0500 0481 0526 0523 0531 0523 0523

Rep(e) 0846 0865 0872 0854 0765 0749 0733

No Smoothing 1 2 3 5 10 15 20

MI(e) 0516 0531 0519 0531 0562 0574 0594

PMI3(e) 0725 0664 0652 0660 0628 0631 0646

NPMI(e) 0599 0597 0579 0554 0540 0539 0549

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0401 0386 0396 0398 0401 0410 0428

Rep(e) 0758 0771 0745 0723 0656 0647 0661

Smoothing=1e-3

MI(e) 0374 0414 0441 0448 0473 0481 0495

PMI3(e) 0484 0511 0509 0502 0519 0525 0533

NPMI(e) 0692 0652 0607 0603 0585 0585 0592

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0460

Typicality P(e|c) 0703 0697 0704 0681 0637 0628 0626

Rep(e) 0621 0580 0554 0561 0554 0555 0559

Smoothing=1e-4

MI(e) 0407 0430 0458 0462 0492 0503 0512

PMI3(e) 0648 0604 0579 0575 0578 0576 0590

NPMI(e) 0747 0777 0761 0737 0700 0685 0688

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0791 0795 0802 0767 0738 0729 0724

Rep(e) 0758 0714 0711 0689 0653 0636 0653

Smoothing=1e-5

MI(e) 0429 0465 0478 0501 0517 0528 0545

PMI3(e) 0725 0647 0642 0642 0627 0624 0638

NPMI(e) 0813 0779 0778 0765 0730 0723 0729

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0709 0728 0735 0722 0702 0696 0703

Rep(e) 0791 0787 0762 0739 0707 0703 0706

Smoothing=1e-6

MI(e) 0516 0510 0515 0526 0546 0563 0579

PMI3(e) 0725 0655 0651 0654 0641 0631 0649

NPMI(e) 0791 0766 0732 0728 0673 0659 0668

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0495 0516 0520 0508 0512 0521 0540

Rep(e) 0758 0784 0767 0755 0691 0686 0694

Smoothing=1e-7

MI(e) 0516 0531 0519 0530 0562 0571 0592

PMI3(e) 0725 0664 0652 0658 0630 0631 0647

NPMI(e) 0670 0655 0633 0604 0575 0570 0581

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0423 0421 0415 0407 0414 0424 0438

Rep(e) 0758 0771 0745 0725 0663 0661 0668

Evaluations on Different Measures for BLC

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

What is the Semantic Similaritybull Are the following instance pairs similar

bull ltapple microsoftgt

bull ltapple peargt

bull ltapple fruitgt

bull ltapple foodgt

bull ltapple ipadgt

bull ltcar journeygt

Approaches on Term Similarity

bull Categories of approaches for semantic similaritybull String based approach

bull Knowledge based approachbull Use preexisting thesauri taxonomy or encyclopedia such as

WordNet

bull Corpus based approachbull Use contexts of terms extracted from web pages web search

snippets or other text repositories

bull Embedding based approachbull Will introduce in detail in ldquoPart 3 Implicit Understandingrdquo

79

Approaches on Term Similarity (2)

bull Categories

80

Knowledge based approaches

(WordNet)

Corpus based

approaches

Path lengthlexical

chain-based

Information

content-based

Graph learning

algorithm basedSnippet search based

Rada

1989

Resnik

1995

Jcn

1997

Lin

1998

Saacutench

2011

Agirre

2010Alvarez

2007

String based

approaches

HunTray

2005

Hirst

1998

Do

2009

Bol

2011Chen

2006

State-of-the-art approaches

Ban

2002

bull Framework

83

Term Similarity Using Semantic Networks [Li et al 2013 Li et al 2015]

Term pairs ltt1 t2gt

Type Checking

Concept Pairs Entity Pairs

Entity-distribution Context Collection

Concept-distribution Context Collection

Concept-Entity Pairs

Concept Collection for the Entity Term t1

Similarity EvaluationCosine(T(t1) T(t2))

for each pairltt2cxgt

Context vector T(t1) and T(t2)

Get maxsim(t2cx) for ltt1 t2gt

End

End

Concept Clustering

Cluster Context vector Cx(t1) and Cy(t2)

Similarity Evaluation

Max(xy) Cosine(Cx(t1) Cy(t2))

End

Concept Clustering

for each Cluster Ci(t1)

Select top k Concept namely cx

Step 1 Type Checking

Step 2 Context Representation(Vector)

Step 3 Context Similarity

An example [Li et al 2013 Li et al 2015]

For example

ltbanana peargt

88

ltbanana peargt

Entity PairsType Checking

Concept Context Collection

Similarity Evaluation Cosine(T(t1) T(t2)) 0916

Step 1 Type Checking

Step 2 Context Representation(Vector)

Step 3 Context Similarity

ExamplesTerm 1 Term 2 Similarity

lunch dinner 09987

tiger jaguar 09792

car plane 09711

television radio 09465

technology company microsoft 08208

high impact sport competitive sport 08155

employer large corporation 05353

fruit green pepper 02949

travel meal 00426

music lunch 00116

alcoholic beverage sports equipment 00314

company table tennis 00003

96httpadaptseieesjtueducnsimilaritySimCompleteResultspdf

Statistics of Search Queries

44

29

17

7

2 1

1 Term2 Terms3 Terms4 Terms5 Termsmore than 5 Terms

10

26

34

19

74

1 Term2 Terms3 Terms4 Terms5 Termsmore than 5 Terms

(a) By traffic

(b) By of distinct queries

Pokeacutemon Go Microsoft HoloLens

Instance 1 Instance 21 Instance2 Instances3 Instances4 Instances5 InstancesMore than 5 Instances

1 Instance2 Instances3 Instances4 Instances5 InstancesMore than 5 Instances

If the short text has context for the instancehellip

bull python tutorialbull dangerous pythonbull moon earth distancebull hellip

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Supervised Segmentation [Bergsma et al 2007]

bull Problem divide query into semantic units

bull Approach turn segmentation into position-based binary classification

Example Query

Two man power saw

[two man] [power saw][two] [man] [power saw][two] [man power] [saw]

Input a query and its positions

Output the decision for making segmentation at each position

Supervised Segmentation

bull Featuresbull Decision boundary features

bull Statistical features

bull Context features

bull Dependency features

eg Indicators thePOS tags in query isPosition features forwardbackward

Mutual information between left and right parts

Bank loan amortization schedule

Context information

bus driverfemale

depend

Supervised Segmentation

bull Segmentation Overview

saw

SVMclassifier

Input query two man power saw

two man power

Output segmentation decision for each position (yesno)

learning features

Unsupervised Segmentation [Tan et al 2008]

bull Unsupervised learning for query segmentation

Probability of generated segmentation S for query Q

119875 119878119876 = 119875 1199041 P 1199042|1199041 hellipP 119904119898 11990411199042hellip119904119898minus1

asympෑ

119904119894isin119878

119875(119904119894)Unigram model

segments

Valid segment boundary if and only if the pointwise mutual information between the two segments resulting from the split is negative

new york times subscription

1199041 1199042

119872119868 119904119896 119904119896+1 = log119875119888([119904119896 119904119896+1])

119875119888 119904119896 ∙ 119875119888 (119904119896+1)lt 0

Example log119875119888([119899119890119908 119910119900119903119896])

119875119888( 119899119890119908) ∙ 119875119888 (119910119900119903119896)gt 0

no segment boundary here

Unsupervised Segmentation

bull Find top k segmentations dynamic programming

bull Using EM optimization on the fly

Input query 11990811199082hellip119908119899 concept probability distribution

Output top k segmentations with highest likehood

Words in a query

Exploit Click-through [Li et al 2011]

bull Motivationbull Probabilistic query segmentation

bull Use click-through data

Output top-3 segmentation

[bank of america] [online banking] 0502

bank of america online banking] 0428

[bank of ] [ america] [online banking] 0001

Q -gt URL -gt D query

document

click data

Input Query bank of america online banking

Exploit Click-through

bull Segmentation Model

An interpolated model

global info Click-throughinfo

[credit card] [bank of America]

1 bank of america credit cards contact us overview2 secured visa credit card from bank of america3 credit cards overview find the right bank of america credit card for you

Query

Clicked html documents

global info

Click-through info

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Sense Changes with Different Context

watch harry potter read harry potter age harry potter

Movie Book Character

harry potter walkthrough

Game

Entity Recognition in Query [Guo et al 2009]

bull Motivation

Detect named entity in a short text and categorize it

harry potter walkthrough

Single-named-entity query

Example

triple <e, t, c> = ("harry potter", "walkthrough", "game")
  e: the ambiguous term (named entity); t: the context term(s); c: the class of the entity

Entity Recognition in Query

bull Probabilistic Generative Model

Goal: given a query q, find the triple <e, t, c> that maximizes the probability of generating the triple

Pr(e, t, c) = Pr(e) · Pr(c|e) · Pr(t|c), assuming the context depends only on the class

Objective: given query q, find argmax_{<e,t,c>} Pr(e, t, c)

The problem then becomes how to estimate Pr(e), Pr(c|e), and Pr(t|c)

E.g., "walkthrough" depends only on the class game, not on the entity harry potter
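A small illustration of the triple-scoring step, with hand-filled toy probabilities standing in for the Pr(e), Pr(c|e), and Pr(t|c) estimates that the method learns from query logs; the query is assumed to be pre-segmented into terms.

```python
# Minimal sketch (toy probabilities): score every candidate triple <e, t, c>
# for a pre-segmented query and keep the one maximizing Pr(e)*Pr(c|e)*Pr(t|c).
Pr_e = {"harry potter": 0.6, "walkthrough": 0.01}
Pr_c_given_e = {"harry potter": {"game": 0.3, "movie": 0.4, "book": 0.3}}
Pr_t_given_c = {"game": {"walkthrough": 0.2, "trailer": 0.01},
                "movie": {"walkthrough": 0.001, "trailer": 0.3},
                "book": {"walkthrough": 0.001, "trailer": 0.001}}

def best_triple(query_terms):
    """Try each term as the entity e, the rest as context t, and pick the
    class c maximizing Pr(e) * Pr(c|e) * prod_t Pr(t|c)."""
    best, best_score = None, 0.0
    for i, e in enumerate(query_terms):
        context = [t for j, t in enumerate(query_terms) if j != i]
        for c, p_ce in Pr_c_given_e.get(e, {}).items():
            score = Pr_e.get(e, 0.0) * p_ce
            for t in context:
                score *= Pr_t_given_c.get(c, {}).get(t, 1e-6)
            if score > best_score:
                best, best_score = (e, tuple(context), c), score
    return best, best_score

print(best_triple(["harry potter", "walkthrough"]))  # expects class "game"
```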

Entity Recognition in Query

bull Probability Estimation by Learning

learning objective:  max ∏_{i=1}^{N} P(ei, ti, ci)

Challenge difficult as well as time consuming to manually assign class labels to named entities in queries

Build training set 119879 = (119890119894 119905119894) view 119888119894 as a hidden variable

New Learning problem

  max ∏_{i=1}^{N} P(ei, ti) = max ∏_{i=1}^{N} Σ_c P(ei) P(c|ei) P(ti|c)

solved with topic model WS-LDA

Signal from Click [Pantel et al 2012]

bull Motivation

Predict entity type in Web search

entity

user intent

context

click

Query type distribution (73 types)

Generative model over entities, entity types, user intents, contexts, and clicks
[Figure: graphical model diagram]

Signal from Click

bull Joint Model for Prediction

[Figure: plate diagram of the joint model over a query collection Q, with parameters θ, φ, ω]
For each query: pick an entity (entity distribution), pick its type (distribution over types), pick a user intent (intent distribution), pick the context words (word distribution), and pick the click (host distribution)

Telegraphic Query interpretation [Sawant et al 2013 Joshi et al 2014]

bull Entity-seeking Telegraphic Queries

bull Interpretation = Segmentation + Annotation

Combine a knowledge base (for accuracy) with a large corpus (for recall)
Example: query "Germany capital" → result entity "Berlin"

bull Overview

Joint Interpretation and Ranking [Sawant et al 2013 Joshi et al 2014]

[Figure: a telegraphic query plus an annotated corpus feed two models for joint interpretation and ranking, a generative model and a discriminative model, whose output is a ranked list of entities e1, e2, e3]

bull Generative Model

Joint Interpretation and Ranking [Sawant et al 2013]

Example query q: "losing team baseball world series 1998"
  Answer entity E: San Diego Padres, of type T: major league baseball team
  Supporting snippet: "Padres have been to two World Series, losing in 1984 and 1998"
  A switch variable Z decides whether each query word matches a type hint ("baseball team") or the context ("lost 1998 world series")
Based on probabilistic language models. Borrowed from U. Sawant (2013).

bull Discriminative Model

Joint Interpretation and Ranking [Sawant et al 2013]

[Figure: for the query "losing team baseball world series 1998", feature vectors pair the query with a correct entity (San_Diego_Padres, t = baseball team) and with an incorrect entity (1998_World_Series, t = series)]
Based on max-margin discriminative learning

bull Queries seek answer entities (e2)

bull Contain (query) entities (e1) target types (t2) relations (r) and selectors (s)

Telegraphic Query Interpretation [Joshi et al 2014]

query: dave navarro first band
  e1 = dave navarro, r = band, t2 = band, s = {first}
  e1 = dave navarro, r = -, t2 = band, s = {first}
query: spider automobile company
  e1 = spider, r = automobile company, t2 = automobile company, s = -
  e1 = -, r = automobile company, t2 = company, s = {spider}

Borrowed from M. Joshi (2014)

Improved Generative Model

• Generative Model [Sawant et al 2013] → [Joshi et al 2014]: additionally consider e1 (in q) and r

Improved Discriminative Model

• Discriminative Model [Sawant et al 2013] → [Joshi et al 2014]: additionally consider e1 (in q) and r

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

bull Input a short text

bull Output semantic interpretation

bull Three steps in understanding a short text

Input: "wanna watch eagles band"
Output: watch[verb] eagles[entity](band) band[concept]

Step 1 Text Segmentation ndash divide into a sequence of terms in vocabulary

Step 2 Type detection ndash determine the best type of each term

Step 3 Concept Labeling ndash infer the best concept of each entity within context

Text segmentation
• Observations
  • Mutual Exclusion – terms containing the same word mutually exclude each other
  • Mutual Reinforcement – related terms mutually reinforce each other
• Build a Candidate Term Graph (CTG)
[Figure: candidate term graphs for "vacation april in paris" (nodes: vacation, april, april in paris, paris; edge weights 0.029, 0.005, 0.047, 0.041) and "watch harry potter" (nodes: watch, harry, harry potter, potter; edge weights 0.014, 0.092, 0.053, 0.018)]

Find best segmentation
• Best segmentation = sub-graph of the CTG which
  • Is a complete graph (clique)
  • Has no mutual exclusion
  • Has 100% word coverage (except for stopwords)
  • Has the largest average edge weight
[Figure: the candidate term graphs above, with "is a segmentation" sub-graphs and the best segmentation highlighted]

Find best segmentation
• Best segmentation = sub-graph of the CTG which
  • Is a complete graph (clique)
  • Has no mutual exclusion
  • Has 100% word coverage (except for stopwords)
  • Has the largest average edge weight
i.e., the best segmentation is a maximal clique of the CTG
[Figure: the same candidate term graphs with the chosen maximal clique highlighted as the best segmentation]
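To make the clique view concrete, a minimal sketch using networkx: it builds a toy candidate term graph for "vacation april in paris" (the edge weights come from the figure, but their assignment to particular edges is an assumption), filters maximal cliques by mutual exclusion and word coverage, and keeps the one with the largest average edge weight.

```python
# Minimal sketch (toy CTG): best segmentation = full-coverage clique with the
# largest average edge weight.
import itertools
import networkx as nx

words = ["vacation", "april", "in", "paris"]
stopwords = {"in"}
terms = {"vacation": {"vacation"}, "april": {"april"},
         "paris": {"paris"}, "april in paris": {"april", "in", "paris"}}
weights = {("vacation", "april in paris"): 0.047, ("vacation", "april"): 0.029,
           ("vacation", "paris"): 0.041, ("april", "paris"): 0.005}

G = nx.Graph()
G.add_nodes_from(terms)
for (a, b), w in weights.items():
    if not terms[a] & terms[b]:        # mutual exclusion: no edge if terms share a word
        G.add_edge(a, b, weight=w)

def avg_weight(clique):
    pairs = list(itertools.combinations(clique, 2))
    return sum(G[a][b]["weight"] for a, b in pairs) / len(pairs) if pairs else 0.0

best = None
for clique in nx.find_cliques(G):                  # maximal cliques of the CTG
    covered = set().union(*(terms[t] for t in clique))
    if set(words) - stopwords <= covered:          # 100% word coverage, stopwords excepted
        if best is None or avg_weight(clique) > avg_weight(best):
            best = clique

print(best)  # expected: ['vacation', 'april in paris']
```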

Type Detection

• Pairwise Model
  • Find the best typed-term for each term so that the Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight
[Figure: typed-term candidates for "watch free movie" – watch[v], watch[e], watch[c]; free[adj], free[v]; movie[c], movie[e]]
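A minimal sketch of the pairwise model with assumed affinity scores between typed-terms: enumerate one typed-term per term and keep the combination whose maximum spanning tree has the largest total weight.

```python
# Minimal sketch (assumed affinities): pick one typed-term per term so that the
# maximum spanning tree over the chosen typed-terms has the largest weight.
import itertools
import networkx as nx

candidates = {"watch": ["watch[v]", "watch[e]", "watch[c]"],
              "free": ["free[adj]", "free[v]"],
              "movie": ["movie[c]", "movie[e]"]}
affinity = {("watch[v]", "movie[c]"): 0.9, ("watch[v]", "free[adj]"): 0.3,
            ("free[adj]", "movie[c]"): 0.7, ("watch[e]", "movie[e]"): 0.2,
            ("watch[c]", "movie[e]"): 0.1}

def mst_weight(typed_terms):
    G = nx.Graph()
    G.add_nodes_from(typed_terms)
    for a, b in itertools.combinations(typed_terms, 2):
        w = affinity.get((a, b), affinity.get((b, a), 0.01))
        G.add_edge(a, b, weight=w)
    T = nx.maximum_spanning_tree(G)
    return T.size(weight="weight")     # total weight of the spanning tree

best = max(itertools.product(*candidates.values()), key=mst_weight)
print(best)  # expected: ('watch[v]', 'free[adj]', 'movie[c]')
```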

Concept Labeling

• Entity disambiguation is the most important task of concept labeling
  • Filter/re-rank the original concept cluster vector
• Weighted-Vote
  • The final score of each concept cluster combines its original score with the support from context, using concept co-occurrence
  • e.g., "watch harry potter" → movie; "read harry potter" → book
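A small sketch of the weighted-vote combination with made-up scores and co-occurrence weights; the interpolation weight and the concept names are assumptions, not values from the paper.

```python
# Minimal sketch (toy numbers): final score of each concept cluster = mix of its
# original conceptualization score and the support from the context concepts.
original = {"movie": 0.45, "book": 0.40, "character": 0.15}   # for "harry potter"
context_concepts = {"activity:watch": 1.0}                    # from the verb "watch"
cooccur = {("activity:watch", "movie"): 0.8,
           ("activity:watch", "book"): 0.1,
           ("activity:watch", "character"): 0.05}
lam = 0.5   # interpolation weight (assumed)

final = {}
for c, s in original.items():
    support = sum(w * cooccur.get((ctx, c), 0.0) for ctx, w in context_concepts.items())
    final[c] = lam * s + (1 - lam) * support

print(max(final, key=final.get))  # "movie" for "watch harry potter"
```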

Example of Entity Disambiguation[Hua et al 2015 (ICDE Best Paper) Hua et al 2016]

[Figure: disambiguation pipeline – the short text is parsed and conceptualized (term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, concept orthogonalization) against the semantic (isA) network and the co-occurrence network, producing a concept vector (c1, p1), (c2, p2), (c3, p3), …]
Example "ipad apple": the isA concepts of "apple" (fruit, company, food, product, …) are filtered by their co-occurrence with the concepts of "ipad" (product, device, …), keeping company/brand/product and dropping fruit/food.

Mining Lexical Relationships[Wang et al 2015b]

• Lexical knowledge is represented by probabilities (e: instance, t: term, c: concept, z: role), e.g., for "watch harry potter":
  p(z|t) – e.g., p(verb|watch), p(instance|watch)
  p(c|e) = p(c|t, z = instance) – e.g., p(movie|harry potter), p(book|harry potter)
  p(c|t, z) – e.g., p(movie|watch, verb)

Understanding Queries [Wang et al 2015b]

• Goal: rank the concepts and find argmax_c p(c|t, q)

The query's possible segmentations induce an online subgraph of the offline semantic network; concepts are ranked by random walk with restart [Sun et al 2005] on this online subgraph
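A minimal sketch of random walk with restart on a toy term-concept subgraph; the node set and adjacency weights are invented for illustration (the real walk runs on the online subgraph built from the query's segmentations and the offline semantic network).

```python
# Minimal sketch (toy graph): random walk with restart by power iteration,
# r = (1 - alpha) * W r + alpha * restart.
import numpy as np

nodes = ["watch", "harry potter", "movie", "book", "verb"]
A = np.array([                       # assumed symmetric edge weights
    [0.0, 0.2, 0.3, 0.0, 0.5],
    [0.2, 0.0, 0.5, 0.3, 0.0],
    [0.3, 0.5, 0.0, 0.0, 0.0],
    [0.0, 0.3, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0, 0.0, 0.0],
])
W = A / A.sum(axis=0, keepdims=True)          # column-normalized transition matrix

restart = np.array([0.5, 0.5, 0.0, 0.0, 0.0])  # restart mass on the query terms
alpha = 0.15
r = restart.copy()
for _ in range(100):                           # iterate to (near) convergence
    r = (1 - alpha) * W @ r + alpha * restart

for n, score in sorted(zip(nodes, r), key=lambda x: -x[1]):
    print(f"{n}: {score:.3f}")
```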

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

• Example: "popular smart cover iphone 5s"
• Definitions
  • Head: names the general (semantic) category to which the whole short text belongs; usually the head carries the intent of the short text
    • "smart cover" – the intent of the query
  • Constraints: distinguish this member from other members of the same category
    • "iphone 5s" – limits the type of the head
  • Non-Constraint Modifiers (aka Pure Modifiers): subjective modifiers which can be dropped without changing the intent
    • "popular" – subjective, can be neglected

Non-Constraint Modifiers Mining Construct Modifier Networks

Edges form a Modifier Network
[Figure: concept hierarchy tree in the "Country" domain (Country with children such as Asian country, Developed country, Western country, Western developed country, Large Asian country, Large developed country, Top western country, Top developed country) and the modifier network it induces over {Asian, Developed, Western, Large, Top}]
In this case "Large" and "Top" are pure modifiers.

Non-Constraint Modifiers Mining: Betweenness Centrality
• Betweenness centrality is a measure of a node's centrality in a network
• The betweenness of node v is defined as
    g(v) = Σ_{s ≠ v ≠ t} σst(v) / σst
  where σst is the total number of shortest paths from node s to node t, and σst(v) is the number of those paths that pass through v
• Normalization & Aggregation
  • A pure modifier should have a low betweenness-centrality aggregation score PMS(t)
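A small sketch using networkx's betweenness centrality on a toy modifier network (the edges are an assumed approximation of the "Country" domain figure); the nodes with the lowest betweenness come out as pure-modifier candidates, which is the signal that PMS(t) aggregates.

```python
# Minimal sketch (toy modifier network): rank nodes by betweenness centrality;
# low-betweenness modifiers ("large", "top") are pure-modifier candidates.
import networkx as nx

G = nx.Graph([("country", "asian"), ("country", "developed"), ("country", "western"),
              ("asian", "developed"), ("western", "developed"),
              ("large", "country"), ("top", "country")])

bc = nx.betweenness_centrality(G, normalized=True)
for node, score in sorted(bc.items(), key=lambda x: x[1]):
    print(f"{node}: {score:.3f}")
```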

Head-Constraints Mining [Wang et al 2014b]

bull A term can be a head sometimes and be a constraint in some other cases

• E.g., "Seattle hotel": Seattle = constraint, hotel = head; "Seattle hotel job": Seattle = constraint, hotel = constraint, job = head

Head-Constraints Mining Acquiring Concept Patterns

Building the concept pattern dictionary from query logs:
  1. Get entity pairs from the query log by extracting preposition patterns ("A for B", "A of B", "A with B", "A in B", "A on B", "A at B", …), e.g., "cover for iphone 6s", "battery for sony a7r", "wicked on broadway"; entity 1 is the head and entity 2 the constraint
  2. Conceptualize entity 1 (concept11, concept12, concept13, concept14, …) and entity 2 (concept21, concept22, concept23, …)
  3. Record the concept patterns (concept11, concept21), (concept11, concept22), (concept11, concept23), … in the Concept Pattern Dictionary

Why Concepts Can't Be Too General
• It may cause too many concept pattern conflicts: head and modifier cannot be distinguished for overly general concept pairs
  Derived concept pattern: device (head) / company (modifier)
    Supporting entity pairs: (iphone 4, verizon), (modem, comcast), (wireless router, comcast), (iphone 4, tmobile)
  Derived concept pattern: company (head) / device (modifier)
    Supporting entity pairs: (amazon books, kindle), (netflix, touchpad), (skype, windows phone), (netflix, ps3)
  → Conflict

Why Concepts Can't Be Too Specific
• It may generate concepts with little coverage
  • The concept regresses to the entity
  • Large storage space: up to (million × million) patterns
  Examples: device / largest desktop OS vendor, device / largest software development company, device / largest global corporation, device / latest windows and office provider, …

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept Patterns (head/constraint, with supporting cluster size):
  breed/state (615), game/platform (296), accessory/vehicle (153), browser/platform (70), requirement/school (22), drug/disease (34), cosmetic/skin condition (42), job/city (16), accessory/phone (32), software/platform (18), test/disease (20), clothes/breed (27), penalty/crime (19), tax/state (25), sauce/meat (16), credit card/country (18), food/holiday (14), mod/game (11), garment/sport (29), career information/professional (23), song/instrument (15), bait/fish (18), study guide/book (22), plugins/browser (19), recipe/meat (14), currency/country (18), lens/camera (13), decoration/holiday (9), food/animal (16)

Patterns in the game/platform cluster: game/platform, game/device, video game/platform, game console/game pad, game/gaming platform
Supporting entity pairs for Game (head) / Platform (modifier): (angry birds, android), (angry birds, ios), (angry birds, windows 10), …

Head-Modifier Relationship Detection
• Train a classifier on (head-embedding, modifier-embedding)
• Training data
  • Positive: (head, modifier)
  • Negative: (modifier, head)
• Precision ≥ 0.9, Recall ≥ 0.9
• Disadvantage: not interpretable
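A minimal sketch of the classifier setup above, with random vectors standing in for real word embeddings and only a handful of (head, modifier) pairs, so the learned decision itself is not meaningful; it only shows how the inputs and labels are constructed.

```python
# Minimal sketch (random stand-in embeddings): classify head/modifier order from
# the concatenated pair embedding.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
emb = {w: rng.normal(size=50) for w in
       ["game", "platform", "accessory", "phone", "drug", "disease"]}

pairs = [("game", "platform"), ("accessory", "phone"), ("drug", "disease")]
X, y = [], []
for head, mod in pairs:
    X.append(np.concatenate([emb[head], emb[mod]])); y.append(1)  # (head, modifier)
    X.append(np.concatenate([emb[mod], emb[head]])); y.append(0)  # reversed order

clf = LogisticRegression(max_iter=1000).fit(np.array(X), y)
test = np.concatenate([emb["game"], emb["platform"]])
print(clf.predict([test])[0])   # 1 = first word is predicted to be the head
```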

Syntactic Parsing based on HM

• The head-modifier information is incomplete
  • Prepositions and other function words
  • Within a noun compound: "el capitan macbook pro"
• Why not train a parser for web queries?

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges: Short Texts Lack Grammatical Signals
• Lack of function words and word order
  • "toys queries" has ambiguous intent
  • "distance earth moon" has clear intent, but many equivalent forms: "earth moon distance", "earth distance moon", …

Challenges Syntactic Parsing of Queries

• No standard
• No ground truth
Why is syntactic parsing of queries even a legitimate problem?

Derive Syntax from Semantics[Sun et al 2016]

• Query: "thai food houston"
• Clicked sentence (dependency-parsed)
• Project the dependencies onto the query

A Treebank for Short Texts

• Given a query q
• Given q's clicked sentence s
• Parse each s
• Project dependencies from s to q
• Aggregate the projected dependencies

Algorithm of Projection
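A minimal sketch of the projection step under the simplifying assumption that an arc is kept only when both its head and dependent words occur in the query (the full algorithm also handles arcs that pass through non-query words); the parsed arcs of the clicked sentence are hand-written for illustration.

```python
# Minimal sketch (hand-written arcs): project dependency arcs of a clicked
# sentence onto the query, then aggregate counts over many clicked sentences.
from collections import Counter

def project(query_words, sentence_arcs):
    """sentence_arcs: (head_word, dependent_word, label) triples from a parsed
    clicked sentence; keep the arcs whose endpoints both appear in the query."""
    qset = set(query_words)
    return [(h, d, lab) for h, d, lab in sentence_arcs if h in qset and d in qset]

query = ["thai", "food", "houston"]
arcs = [("has", "houston", "nsubj"), ("has", "food", "dobj"),
        ("food", "thai", "amod"), ("food", "great", "amod"),
        ("lot", "a", "det")]

print(project(query, arcs))          # [('food', 'thai', 'amod')]

counts = Counter()
for sent_arcs in [arcs, arcs]:       # stand-in for many clicked sentences
    counts.update(project(query, sent_arcs))
print(counts.most_common(1))         # most frequent projected arc forms the treebank tree
```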

Result Examples

Results

• Random queries: QueryParser UAS 0.83, LAS 0.75; Stanford UAS 0.72, LAS 0.64
• Queries with no function words: QueryParser UAS 0.82, LAS 0.73; Stanford UAS 0.70, LAS 0.61
• Queries with function words: QueryParser UAS 0.90, LAS 0.85; Stanford UAS 0.86, LAS 0.80

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

• Measuring similarity between two short texts or sentences
• Basic idea: word-by-word comparison using embedding vectors
• Use a saliency-weighted semantic graph to compute similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired: bins of all edges, bins of max edges

Similarity measurement, inspired by BM25, between a longer text sl and a shorter text ss:

  f_sts(sl, ss) = Σ_{w ∈ sl} IDF(w) · [ sem(w, ss) · (k1 + 1) ] / [ sem(w, ss) + k1 · (1 − b + b · |ss| / avgsl) ]

where sem(w, ss) is the semantic (embedding) similarity of term w to the short text ss
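A minimal sketch of the BM25-inspired feature above, with a toy embedding table and toy IDF values standing in for trained word vectors and corpus statistics; sem(w, ss) is taken here as the best cosine similarity between w and any term of the other text.

```python
# Minimal sketch (toy vectors and IDF): saliency-weighted semantic similarity.
import numpy as np

emb = {"cheap": np.array([0.9, 0.1]), "affordable": np.array([0.85, 0.2]),
       "restaurants": np.array([0.1, 0.9]), "places": np.array([0.2, 0.8]),
       "to": np.array([0.5, 0.5]), "eat": np.array([0.15, 0.85])}
idf = {"cheap": 1.2, "restaurants": 1.5, "affordable": 1.3, "places": 0.9,
       "to": 0.1, "eat": 1.1}

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def sem(w, short_text):
    # best embedding similarity of w against the terms of the other short text
    return max(cos(emb[w], emb[v]) for v in short_text)

def f_sts(s_l, s_s, k1=1.2, b=0.75, avgsl=4.0):
    score = 0.0
    for w in s_l:
        s = sem(w, s_s)
        score += idf[w] * (s * (k1 + 1)) / (s + k1 * (1 - b + b * len(s_s) / avgsl))
    return score

print(f_sts("cheap restaurants".split(), "affordable places to eat".split()))
```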

From the Concept View

From the Concept View [Wang et al 2015a]

[Figure: each short text is parsed and conceptualized (term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, concept orthogonalization) against the semantic network and the co-occurrence network into a bag of concepts; Short Text 1 → Concept Vector 1 [(c1, score1), (c2, score2), …], Short Text 2 → Concept Vector 2 [(c1′, score1′), (c2′, score2′), …]; the similarity of the two short texts is computed between the two concept vectors]
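Once both short texts are conceptualized, one simple way to compare them is the cosine between their sparse concept vectors; a minimal sketch with invented concept scores follows.

```python
# Minimal sketch (made-up concept scores): cosine similarity between two
# sparse bag-of-concepts vectors.
import math

def cosine(v1, v2):
    dot = sum(s * v2.get(c, 0.0) for c, s in v1.items())
    n1 = math.sqrt(sum(s * s for s in v1.values()))
    n2 = math.sqrt(sum(s * s for s in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

cv1 = {"movie": 0.7, "fantasy film": 0.2, "book": 0.1}   # "watch harry potter"
cv2 = {"book": 0.6, "novel": 0.3, "movie": 0.1}          # "read harry potter"
print(round(cosine(cv1, cv2), 3))
```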

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

• Explicit short text understanding benefits many application scenarios
  • Ads/search semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

[Charts: ads keyword selection gains by query decile (Decile 4 through Decile 10), for Mainline Ads and for Sidebar Ads]

Definition Mining [Hao et al 2016]

• Definition scenarios: search engines, QnA, etc.
• Why is Conceptualization useful for definition mining?
  • Example: "What is Emphysema?"
    Answer 1: Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year.
    Answer 2: Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing.
  • This sentence has the form of a definition
  • Embedding is helpful to some extent, but it also returns high similarity scores for (emphysema, disease) and (emphysema, smoking)
  • Conceptualization can provide strong semantics
  • Contextual embedding can also provide semantic similarity beyond Is-A

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

[Figure: concept-based short text classification pipeline. Offline: learn a concept model per class (Class 1 … Class i … Class N) from training data via concept weighting. Online: the input short text ("justin bieber graduates") goes through entity extraction, conceptualization into a concept vector against the knowledge base, candidate generation, and classification & ranking, producing outputs such as <Music, score>]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

[Figure: each category (TV, Music, Movie, …) is mapped into the concept space using the article titles/tags in that category, giving per-category concept weights ωi, ωj; a query is conceptualized into the same concept space with probabilities pi, pj and scored against each category]

Precision performance on each category [Wang et al 2014a]

Precision by category:
  Category | BocSTC | LM_ch | SVM  | VSM_cosine | LM_d | Entity_ESA
  Movie    | 0.71   | 0.91  | 0.84 | 0.81       | 0.72 | 0.56
  Money    | 0.97   | 0.95  | 0.54 | 0.57       | 0.52 | 0.74
  Music    | 0.97   | 0.90  | 0.88 | 0.73       | 0.68 | 0.58
  TV       | 0.96   | 0.46  | 0.92 | 0.56       | 0.51 | 0.55

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

• [Stark et al. 1998] Michael M. Stark and Richard F. Riesenfeld. WordNet: An Electronic Lexical Database. Proceedings of the 11th Eurographics Workshop on Rendering, 1998.
• [Banko et al. 2007] Michele Banko, Michael J. Cafarella, Stephen Soderland, Matt Broadhead, and Oren Etzioni. Open Information Extraction from the Web. In IJCAI, 2007.
• [Etzioni et al. 2011] Oren Etzioni, Anthony Fader, Janara Christensen, Stephen Soderland, and Mausam Mausam. Open Information Extraction: The Second Generation. In IJCAI, vol. 11, pp. 3-10, 2011.
• [Carlson et al. 2010] A. Carlson, J. Betteridge, B. Kisiel, B. Settles, E. R. Hruschka Jr., and T. M. Mitchell. Toward an Architecture for Never-Ending Language Learning. In Proceedings of the Conference on Artificial Intelligence (AAAI), 2010.
• [Wu et al. 2012] Wentao Wu, Hongsong Li, Haixun Wang, and Kenny Zhu. Probase: A Probabilistic Taxonomy for Text Understanding. In ACM International Conference on Management of Data (SIGMOD), May 2012.
• [Bollacker et al. 2008] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamine Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In SIGMOD, 2008.
• [Auer et al. 2007] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary G. Ives. DBpedia: A Nucleus for a Web of Open Data. In ISWC/ASWC, 2007.

References

• [Suchanek et al. 2007] Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. Yago: a core of semantic knowledge. In WWW, 2007.
• [Wu et al. 2015] Sen Wu, Ce Zhang, Christopher De Sa, Jaeho Shin, Feiran Wang, and C. Ré. Incremental Knowledge Base Construction Using DeepDive. In VLDB, 2015.
• [Navigli et al. 2012] R. Navigli and S. Ponzetto. BabelNet: The Automatic Construction, Evaluation and Application of a Wide-Coverage Multilingual Semantic Network. In Artificial Intelligence, 2012.
• [Nastase et al. 2010] Vivi Nastase, Michael Strube, Benjamin Börschinger, Cäcilia Zirn, and Anas Elghafari. WikiNet: A very large scale multi-lingual concept network. In LREC, 2010.
• [Speer et al. 2013] Robert Speer and Catherine Havasi. ConceptNet 5: A large semantic network for relational knowledge. The People's Web Meets NLP, Springer Berlin Heidelberg, 2013.
• [Hua et al. 2016] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng, and Xiaofang Zhou. Understand Short Texts by Harvesting and Analyzing Semantic Knowledge. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2016.
• [Hua et al. 2015] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng, and Xiaofang Zhou. Short Text Understanding Through Lexical-Semantic Analysis. In International Conference on Data Engineering (ICDE), April 2015.

References

• [Li et al. 2013] Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang, and Xindong Wu. Computing term similarity by large probabilistic isA knowledge. In ACM International Conference on Information and Knowledge Management (CIKM), 2013.
• [Li et al. 2015] Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang, Xue-Gang Hu, and Xindong Wu. A Large Probabilistic Semantic Network based Approach to Compute Term Similarity. IEEE Transactions on Knowledge and Data Engineering (TKDE), 27(10): 2604-2617, 2015.
• [Rosch et al. 1976] Eleanor Rosch, Carolyn B. Mervis, Wayne D. Gray, David M. Johnson, and Penny Boyes-Braem. Basic objects in natural categories. Cognitive Psychology, 8(3): 382-439, 1976.
• [Manning and Schutze 1999] Christopher D. Manning and Hinrich Schutze. Foundations of Statistical Natural Language Processing. MIT Press, 1999.
• [Wang et al. 2015b] Zhongyuan Wang, Kejun Zhao, Haixun Wang, Xiaofeng Meng, and Ji-Rong Wen. Query Understanding through Knowledge-Based Conceptualization. In IJCAI, July 2015.
• [Bergsma et al. 2007] Shane Bergsma and Qin Iris Wang. Learning Noun Phrase Query Segmentation. In EMNLP-CoNLL, 2007: 819-826.
• [Tan et al. 2008] Bin Tan and Fuchun Peng. Unsupervised query segmentation using generative language models and Wikipedia. In WWW, 2008: 347-356.

References

• [Li et al. 2011] Yanen Li, Bo-June Paul Hsu, ChengXiang Zhai, and Kuansan Wang. Unsupervised query segmentation using clickthrough for information retrieval. In SIGIR, 2011: 285-294.
• [Guo et al. 2009] Jiafeng Guo, Gu Xu, Xueqi Cheng, and Hang Li. Named entity recognition in query. In SIGIR, 2009: 267-274.
• [Pantel et al. 2012] Patrick Pantel, Thomas Lin, and Michael Gamon. Mining Entity Types from Query Logs via User Intent Modeling. In ACL, 2012: 563-571.
• [Joshi et al. 2014] Mandar Joshi, Uma Sawant, and Soumen Chakrabarti. Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries. In EMNLP, 2014: 1104-1114.
• [Sawant et al. 2013] Uma Sawant and Soumen Chakrabarti. Learning joint query interpretation and response ranking. In WWW, 2013: 1099-1110.
• [Wang et al. 2014b] Zhongyuan Wang, Haixun Wang, and Zhirui Hu. Head, Modifier, and Constraint Detection in Short Texts. In International Conference on Data Engineering (ICDE), 2014.
• [Sun et al. 2016] Xiangyan Sun, Haixun Wang, Yanghua Xiao, and Zhongyuan Wang. Syntactic Parsing of Web Queries. In EMNLP, 2016.

References

• [Kenter and Rijke 2015] Tom Kenter and Maarten de Rijke. Short text similarity with word embeddings. In CIKM, 2015.
• [Wang et al. 2015a] Zhongyuan Wang, Haixun Wang, Ji-Rong Wen, and Yanghua Xiao. An Inference Approach to Basic Level of Categorization. In CIKM, October 2015.
• [Hao et al. 2016] Zehui Hao, Zhongyuan Wang, Xiaofeng Meng, and Jun Yan. Combining Language Model with Conceptualization for Definition Ranking. MSR Technical Report, 2016.
• [Wang et al. 2014a] Fang Wang, Zhongyuan Wang, Zhoujun Li, and Ji-Rong Wen. Concept-based Short Text Classification and Ranking. In CIKM, 2014.
• [Wang et al. 2012a] Jingjing Wang, Haixun Wang, Zhongyuan Wang, and Kenny Zhu. Understanding Tables on the Web. In International Conference on Conceptual Modeling, October 2012.
• [Wang et al. 2012b] Yue Wang, Hongsong Li, Haixun Wang, and Kenny Zhu. Toward Topic Search on the Web. In International Conference on Conceptual Modeling, October 2012.


Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Supervised Segmentation [Bergsma et al 2007]

bull Problem divide query into semantic units

bull Approach turn segmentation into position-based binary classification

Example Query

Two man power saw

[two man] [power saw][two] [man] [power saw][two] [man power] [saw]

Input a query and its positions

Output the decision for making segmentation at each position

Supervised Segmentation

bull Featuresbull Decision boundary features

bull Statistical features

bull Context features

bull Dependency features

eg Indicators thePOS tags in query isPosition features forwardbackward

Mutual information between left and right parts

Bank loan amortization schedule

Context information

bus driverfemale

depend

Supervised Segmentation

bull Segmentation Overview

saw

SVMclassifier

Input query two man power saw

two man power

Output segmentation decision for each position (yesno)

learning features

Unsupervised Segmentation [Tan et al 2008]

bull Unsupervised learning for query segmentation

Probability of generated segmentation S for query Q

119875 119878119876 = 119875 1199041 P 1199042|1199041 hellipP 119904119898 11990411199042hellip119904119898minus1

asympෑ

119904119894isin119878

119875(119904119894)Unigram model

segments

Valid segment boundary if and only if the pointwise mutual information between the two segments resulting from the split is negative

new york times subscription

1199041 1199042

119872119868 119904119896 119904119896+1 = log119875119888([119904119896 119904119896+1])

119875119888 119904119896 ∙ 119875119888 (119904119896+1)lt 0

Example log119875119888([119899119890119908 119910119900119903119896])

119875119888( 119899119890119908) ∙ 119875119888 (119910119900119903119896)gt 0

no segment boundary here

Unsupervised Segmentation

bull Find top k segmentations dynamic programming

bull Using EM optimization on the fly

Input query 11990811199082hellip119908119899 concept probability distribution

Output top k segmentations with highest likehood

Words in a query

Exploit Click-through [Li et al 2011]

bull Motivationbull Probabilistic query segmentation

bull Use click-through data

Output top-3 segmentation

[bank of america] [online banking] 0502

bank of america online banking] 0428

[bank of ] [ america] [online banking] 0001

Q -gt URL -gt D query

document

click data

Input Query bank of america online banking

Exploit Click-through

bull Segmentation Model

An interpolated model

global info Click-throughinfo

[credit card] [bank of America]

1 bank of america credit cards contact us overview2 secured visa credit card from bank of america3 credit cards overview find the right bank of america credit card for you

Query

Clicked html documents

global info

Click-through info

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Sense Changes with Different Context

watch harry potter read harry potter age harry potter

Movie Book Character

harry potter walkthrough

Game

Entity Recognition in Query [Guo et al 2009]

bull Motivation

Detect named entity in a short text and categorize it

harry potter walkthrough

Single-named-entity query

Example

(ldquoharry potterrdquo ldquo walkthroughrdquo ldquogamerdquo)

triple lte t cgt

class of entity

context terms

ambiguous term

contextterm class

Entity Recognition in Query

bull Probabilistic Generative Model

Goal Given a query q find triple lte t cgt maximize the probability

Probability to generate triple

assume context only depends on class

Objective given query q find

The problem then becomes how to estimate Pr(e) Pr(c|e) and Pr(t|c)

Eg ldquowalkthroughrdquo only depends on game instead of happy potter

Entity Recognition in Query

bull Probability Estimation by Learning

learning objective

N

1i

iii )ctP(emax

Challenge difficult as well as time consuming to manually assign class labels to named entities in queries

Build training set 119879 = (119890119894 119905119894) view 119888119894 as a hidden variable

New Learning problem

N

1i

ii

N

1i

i

N

1i

ii c)|)P(te|P(c)P(emax)tP(emax c

solved with topic model WS-LDA

Signal from Click [Pantel et al 2012]

bull Motivation

Predict entity type in Web search

entity

user intent

context

click

Query type distribution (73 types)

Generative model

entity type

T

TK

K2

Signal from Click

bull Joint Model for Prediction

t

τ

i

n c

θ

φ

ω

Q

Distribution over types

Intent distribution

Pick type

Pick entity

Pick intent

Pick click

Word distribution

Host distribution

Entity distribution

For each Query

Pick context words

Telegraphic Query interpretation [Sawant et al 2013 Joshi et al 2014]

bull Entity-seeking Telegraphic Queries

bull Interpretation = Segmentation + Annotation

Knowledge base Large corpus

accuracy recall

Germany capital

Berlin

Query

Result Entity

bull Overview

Joint Interpretation and Ranking [Sawant et al 2013 Joshi et al 2014]

Annotated Corpus

Telegraphic Query

e1e2e3

Two Models for Interpretation and Ranking

Generative Model

Discriminative Model

Output

bull Generative Model

Joint Interpretation and Ranking [Sawant et al 2013]

San Diego Padres

Major league baseball team

type context

E

T Padres have been to two World

Series losing in 1984 and 1998

Type hint

baseball team

losing team baseball world series 1998

Z

Context matchers

lost 1998 world seriesswitch

model model

q losing team baseball world series 1998

Borrow from U Sawant (2013)

Based on Probabilistic Language Models

bull Discriminative Model

Joint Interpretation and Ranking [Sawant et al 2013]

San_Diego_Padres

losing team baseball world

series 1998

(baseball team)

losing team baseball world

series 1998

(baseball team)

losing team baseball world

series 1998

(t = baseball team)

1998_World_Series

losing team baseball world

series 1998

(series)

losing team baseball world

series 1998

(series)

losing team baseball world

series 1998

(t = series)

Correct entity Incorrect entity

Based on max-margin discriminative learning

bull Queries seek answer entities (e2)

bull Contain (query) entities (e1) target types (t2) relations (r) and selectors (s)

Telegraphic Query Interpretation [Joshi et al 2014]

query e1 r t2 s

dave navarro first band

dave navarro band band first

dave navarro - band first

spider automobile company

spider automobile company

automobile company

-

automobile company company spider

Borrow from M Joshi (2014)

Improved Generative Model

bull Generative Model[Sawant et al 2013]

[Joshi et al 2014]Consider e1

(in q) and r

Improved Discriminative Model

bull Discriminative Model[Sawant et al 2013]

[Joshi et al 2014]

Consider e1

(in q) and r

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

bull Input a short text

bull Output semantic interpretation

bull Three steps in understanding a short text

wanna watch eagles band

watch[verb] eagles[entity](band) band[concept]

wanna watch eagles band watch[verb] eagles[entity](band) band[concept]

watch eagles band watch[verb] eagles[entity] band[concept]

Step 1 Text Segmentation ndash divide into a sequence of terms in vocabulary

Step 2 Type detection ndash determine the best type of each term

Step 3 Concept Labeling ndash infer the best concept of each entity within context

Text segmentationbull Observations

bull Mutual Exclusion ndash terms containing the same word mutually exclude each other

bull Mutual Reinforcement ndash related terms mutually reinforce each other

bull Build a Candidate Term Graph (CTG)

ldquovacation april in parisrdquo ldquowatch harry potterrdquo

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Find best segmentation

bull Best segmentation= sub-graph in CTG whichbull Is a complete graph (clique)

bull No mutual exclusion

bull Has 100 word coveragebull Except for stopwords

bull Has the largest average edge weight

Is a segmentation

Best segmentation

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Find best segmentation

bull Best segmentation= sub-graph in CTG whichbull Is a complete graph (clique)

bull No mutual exclusion

bull Has 100 word coveragebull Except for stopwords

bull Has the largest average edge weight

Maximal Clique

Best segmentation

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Type Detection

bull Pairwise Modelbull Find the best typed-term for each term so that the

Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight

watch[v]

watch[e]

watch[c]

watch

free[adj]

free[v]

movie[c]

movie[e]

free

movie

Concept Labeling

bull Entity disambiguation is the most important task of concept labelingbull Filterre-rank of the original concept cluster vector

bull Weighted-Votebull The final score of each concept cluster is a combination

of its original score and the support from context using concept co-occurrence

watch harry potter read harry potter

movie book

Example of Entity Disambiguation[Hua et al 2015 (ICDE Best Paper) Hua et al 2016]

Co-occurrence network

Concept Vector

Semantic network

Short Text

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

c1 p1

c2 p2

c3 p3

hellip

ipad apple

fruithellip

companyhellip

foodhellip

producthellip

Is-A

filtering

ldquoipad applerdquo

producthellip

devicehellip

producthellip

brandhellip

companyhellip

devicehellip

co-occur

Is-A

Mining Lexical Relationships[Wang et al 2015b]

bull Lexical knowledge represented by the probabilities

verb

product book

movie

watch harry potter

119901 119907119890119903119887 119908119886119905119888ℎ

119901 119894119899119904119905119886119899119888119890 119908119886119905119888ℎ

119901 119898119900119907119894119890 ℎ119886119903119903119910 119901119900119905119905119890119903

119901 119898119900119907119894119890 119908119886119905119888ℎ 119907119890119903119887

119901 119887119900119900119896 ℎ119886119903119903119910 119901119900119905119905119890119903

119901 119888 119905 119911

119901 119888 119890 =119901 119888 119905 119911 = 119894119899119904119905119886119899119888119890

119901 119911 119905 ①②

③e instancet termc conceptz role

Understanding Queries [Wang et al 2015b]

bull Goal to rank the concepts and findarg max

119888119901(119888|119905 119902)

The offline semantic network

QueryAll possible

segmentations

Random walk with restart [Sun et al 2005]on the online subgraph

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

bull Example ldquopopular smart cover iphone 5srdquo

bull Definition bull Head acts to name the general (semantic) category to which the

whole short text belongs Usually the head is the intent of the short text

bull ldquosmart coverrdquo intent of the query

bull Constraints distinguish this member from other members of the same category

bull ldquoiphone 5srdquo limit the type of the head

bull Non-Constraint Modifiers (aka Pure Modifiers) are subjectivemodifiers which can be dropped without changing intent

bull ldquopopularrdquo subjective can be neglected

Non-Constraint Modifiers Mining Construct Modifier Networks

Edges form a Modifier Network

Concept Hierarchy Tree in ldquoCountryrdquo domain

Modifier Network in ldquoCountryrdquo domainIn this case ldquoLargerdquo and ldquoToprdquo are pure modifiers

Country

Asian country

Developed country

Western country

Asian Developed Western

Western developed

country

Top western country

Large

Large Top

Top

WesternLarge Asian

country

Large developed

country

Top developed

country

Country

Asian Western

Developed

Large Top

bull Betweenness centrality is a measure of a nodes centrality in a network

bull Betweennes of node v is defined as

bull where 120590119904119905 is the total number of shortest paths from node s to node t and 120590119904119905(119907) is the number of those paths that pass through v

bull Normalization amp Aggregation

bull For a pure modifier it should have low betweenness centrality aggregation score PMS(t)

Non-Constraint Modifiers Mining Betweenness centrality

Head-Constraints Mining [Wang et al 2014b]

bull A term can be a head sometimes and be a constraint in some other cases

bull Eg Seattle hotel Seattle hotel job

head headconstraintconstraintconstraint

Head-Constraints Mining Acquiring Concept Patterns

Get entity pairs from query log

Conceptualization

Concept Patterns for each

prepositionsentity1 entity2

Extract Patterns

A for B A of BA with B A in BA on B A at B hellip

entity 1head entity 2constraint

concept11

concept12

concept13

concept14

concept21

concept22

concept23

(concept11 concept21) (concept11 concept22)(concept11 concept23)hellip

Concept Pattern Dictionary

Building concept pattern dictionary Query Logs

cover for iphone 6sbattery for sony a7rwicked on broadway

Why Concepts Canrsquot Be Too Generalbull It may cause too many concept pattern conflicts

canrsquot distinguish head and modifier for general concept pairs

Head Modifier

Derived Concept Pattern device company

Supporting Entity Pairs iphone 4 verizon

modem comcast

wireless router comcast

iphone 4 tmobile

Head Modifier

Derived Concept Pattern company device

Supporting Entity Pairs amazon books kindle

netflix touchpad

skype windows phone

netflix ps3

Conflict

Why Concepts Can't Be Too Specific
• It may generate concepts with little coverage
• The concept regresses to the entity
• Large storage space: up to (millions × millions) of patterns

… device / largest desktop OS vendor, device / largest software development company, device / largest global corporation, device / latest windows and office provider, …

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept Patterns (cluster size | sum of cluster score | head / constraint | score)

615 | 2114691 | breed / state | 357298460224501
296 | 7752357 | game / platform | 627403476771856
153 | 3466804 | accessory / vehicle | 53393705094809
70 | 118259 | browser / platform | 132612807637391
22 | 1010993 | requirement / school | 271407526294823
34 | 9489159 | drug / disease | 154602405333541
42 | 8992995 | cosmetic / skin condition | 814659415003929
16 | 7421599 | job / city | 27903732555528
32 | 710403 | accessory / phone | 246513830851194
18 | 6692376 | software / platform | 210126322725878
20 | 6444603 | test / disease | 239774028397537
27 | 5994205 | clothes / breed | 98773996282851
19 | 5913545 | penalty / crime | 200544192793488
25 | 5848804 | tax / state | 240081818612579
16 | 5465424 | sauce / meat | 183592863621553
18 | 4809389 | credit card / country | 142919087972152
14 | 4730792 | food / holiday | 14554140330924
11 | 4536199 | mod / game | 257163856882439
29 | 4350954 | garment / sport | 471533326845442
23 | 3994886 | career information / professional | 732726483731257
15 | 386065 | song / instrument | 128189481818135
18 | 378213 | bait / fish | 780426514113169
22 | 3722948 | study guide / book | 508339765053921
19 | 3408953 | plugins / browser | 550326072627126
14 | 3305753 | recipe / meat | 882779863422951
18 | 3214226 | currency / country | 110825444188352
13 | 3180272 | lens / camera | 186081673263957
9 | 316973 | decoration / holiday | 130055844126533
16 | 314875 | food / animal | 7338544366514

Concept pattern cluster: game / platform, game / device, video game / platform, game console / game pad, game / gaming platform

Matching instances, with Game as head and Platform as modifier: angry birds / android, angry birds / ios, angry birds / windows 10, …

Head-Modifier Relationship Detection

• Train a classifier on (head-embedding, modifier-embedding)

• Training data: positive examples are (head, modifier) pairs; negative examples are the flipped (modifier, head) pairs

• Precision >= 0.9, Recall >= 0.9

• Disadvantage: not interpretable
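The slide does not name the classifier, so the sketch below uses logistic regression over concatenated embeddings as a stand-in; `embed` is an assumed lookup from a term to a pretrained vector.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_head_modifier_classifier(pairs, embed):
    """Train a direction classifier on (head-embedding, modifier-embedding).

    pairs : list of (head, modifier) term pairs, e.g. ("game", "platform").
    embed : assumed lookup, term -> 1-D embedding vector.
    Positive examples keep the (head, modifier) order; negatives are flipped.
    """
    X, y = [], []
    for head, modifier in pairs:
        h, m = embed(head), embed(modifier)
        X.append(np.concatenate([h, m])); y.append(1)   # correct direction
        X.append(np.concatenate([m, h])); y.append(0)   # flipped direction
    return LogisticRegression(max_iter=1000).fit(np.array(X), np.array(y))
```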

Syntactic Parsing Based on Head-Modifier Pairs

• The head-modifier information is incomplete:
• prepositions and other function words are not covered
• neither are relations within a noun compound, e.g., "el capitan macbook pro"

• Why not train a parser for web queries?

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges: Short Texts Lack Grammatical Signals
• Function words and word order are often missing
• "toys queries" has ambiguous intent
• "distance earth moon" has clear intent, but many equivalent forms: "earth moon distance", "earth distance moon", …

Challenges Syntactic Parsing of Queries

• No standard

• No ground truth

Why is syntactic parsing of queries even a legitimate problem?

Derive Syntax from Semantics[Sun et al 2016]

• Query: "thai food houston"

• Clicked sentence

• Project the dependencies of the clicked sentence onto the query

A Treebank for Short Texts

• Given a query q and q's clicked sentences s:
• parse each s
• project the dependencies from s to q
• aggregate the projected dependencies

Algorithm of Projection
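A simplified sketch of the projection idea, assuming spaCy as an off-the-shelf dependency parser; the actual algorithm in [Sun et al 2016] handles word alignment and aggregation into a treebank more carefully than this word-match heuristic.

```python
import spacy
from collections import Counter

nlp = spacy.load("en_core_web_sm")  # assumed parser; any dependency parser works

def project_dependencies(query, clicked_sentences):
    """Project dependency arcs from clicked sentences onto a query.

    For each clicked sentence, parse it and keep the arcs whose head and
    dependent both occur as query words; aggregate arc counts over sentences.
    """
    q_words = set(query.lower().split())
    arcs = Counter()
    for sent in clicked_sentences:
        for tok in nlp(sent):
            head = tok.head
            if tok is head:                      # skip the root's self-arc
                continue
            if tok.text.lower() in q_words and head.text.lower() in q_words:
                arcs[(head.text.lower(), tok.text.lower(), tok.dep_)] += 1
    return arcs

# project_dependencies("thai food houston",
#                      ["Find the best Thai food in Houston."])
```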

Result Examples

Results

• Random queries: QueryParser UAS 0.83, LAS 0.75; Stanford UAS 0.72, LAS 0.64

• Queries with no function words: QueryParser UAS 0.82, LAS 0.73; Stanford UAS 0.70, LAS 0.61

• Queries with function words: QueryParser UAS 0.90, LAS 0.85; Stanford UAS 0.86, LAS 0.80

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

• Measuring similarity between two short texts or sentences

• Basic idea: word-by-word comparison using embedding vectors

• Use a saliency-weighted semantic graph to compute similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired: bins of all edges, bins of max edges.

Similarity measurement (inspired by BM25), where sem(w, s_s) is the semantic similarity between term w and short text s_s:

f_sts(s_l, s_s) = Σ_{w ∈ s_l} IDF(w) · [ sem(w, s_s) · (k1 + 1) ] / [ sem(w, s_s) + k1 · (1 − b + b · |s_s| / avgsl) ]
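A sketch of the saliency-weighted similarity above, assuming `vec` is a dict of word embeddings and `idf` a dict of IDF weights; the default values of k1, b, and avgsl are placeholders, not the settings of [Kenter and Rijke 2015].

```python
import numpy as np

def _cos(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def sem(w, s_s, vec):
    """Semantic similarity of word w to short text s_s: max cosine to its words."""
    sims = [_cos(vec[w], vec[w2]) for w2 in s_s if w2 in vec]
    return max(sims) if sims else 0.0

def f_sts(s_l, s_s, vec, idf, k1=1.2, b=0.75, avgsl=10.0):
    """Saliency-weighted (BM25-style) similarity of word lists s_l and s_s."""
    score = 0.0
    for w in s_l:
        if w not in vec:
            continue                      # out-of-vocabulary words are skipped
        s = sem(w, s_s, vec)
        score += idf.get(w, 1.0) * (s * (k1 + 1)) / (
            s + k1 * (1 - b + b * len(s_s) / avgsl))
    return score
```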

From the Concept View

From the Concept View [Wang et al 2015a]

Co-occurrence Network

Bags of Concepts

Semantic Network

Short Text 1

Short Text 2

Concept Vector 1[(c1 score1) (c2 score2)hellip]

Concept Vector 2[(c1rsquo score1rsquo) (c2rsquo score2rsquo)hellip]

Similarity

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization
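The concept-view similarity reduces to comparing two concept vectors. A minimal sketch, assuming a hypothetical `conceptualize(short_text)` that runs the pipeline above (parsing, is-A clustering, co-occurrence filtering, head/modifier analysis, orthogonalization) and returns a list of (concept, score) pairs.

```python
import math

def concept_cosine(cv1, cv2):
    """Cosine similarity of two concept vectors given as {concept: score}."""
    common = set(cv1) & set(cv2)
    num = sum(cv1[c] * cv2[c] for c in common)
    den = math.sqrt(sum(v * v for v in cv1.values())) * \
          math.sqrt(sum(v * v for v in cv2.values()))
    return num / den if den else 0.0

def short_text_similarity(text1, text2, conceptualize):
    """Bag-of-concepts similarity of two short texts."""
    cv1 = dict(conceptualize(text1))
    cv2 = dict(conceptualize(text2))
    return concept_cosine(cv1, cv2)
```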

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

• Explicit short text understanding benefits a lot of application scenarios:
• Ads/search semantic matching

• Definition mining

• Query recommendation

• Web table understanding

• Semantic search

• …

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

[Bar charts by decile (Decile 4 to Decile 10): Mainline Ads (y-axis 0.00 to 6.00) and Sidebar Ads (y-axis 0.00 to 0.60)]

Definition Mining [Hao et al 2016]

• Definition scenarios: search engines, QnA, etc.

• Why conceptualization is useful for definition mining. Example: "What is Emphysema?"

Answer 1: Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year.

Answer 2: Emphysema is an incurable, progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing.

• This sentence has the form of a definition.
• Embedding is helpful to some extent, but it also returns a high similarity score for both (emphysema, disease) and (emphysema, smoking).
• Conceptualization can provide strong semantics; contextual embedding can also provide semantic similarity beyond is-A.

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

Online: original short text ("justin bieber graduates", …) → entity extraction → conceptualization against the knowledge base → concept vector → candidate generation → classification & ranking → output such as <Music, score>

Offline: training data → concept weighting → model learning → one concept model per class (Model 1 … Model i … Model N for Class 1 … Class i … Class N)

Concept based Short Text Classification and Ranking [Wang et al 2014a]

[Diagrams: each category (TV, Music, Movie, …) is mapped into the concept space; the article titles/tags p_i, p_j in a category are conceptualized into concept weights ω_i, ω_j, and an incoming query is projected into the same concept space and compared against the category representations]

Precision performance on each category [Wang et al 2014a]

Category | BoC-STC | LM_ch | SVM | VSM_cosine | LM_d | Entity_ESA
Movie | 0.71 | 0.91 | 0.84 | 0.81 | 0.72 | 0.56
Money | 0.97 | 0.95 | 0.54 | 0.57 | 0.52 | 0.74
Music | 0.97 | 0.90 | 0.88 | 0.73 | 0.68 | 0.58
TV | 0.96 | 0.46 | 0.92 | 0.56 | 0.51 | 0.55

[Bar chart: precision, y-axis 0.3 to 1.0]

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledge in SIGMOD 2008

bull [ Auer et al 2007 ] Sören Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWC/ASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Ré Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Börschinger Cäcilia Zirn and Anas Elghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and XindongWu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny Boyes Braem Basic objects in natural categories Cognitive psychology 8(3):382-439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddings In CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny Zhu Toward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

YAGO [Suchanek et al 2007]

bull YAGO is a huge semantic knowledge base derived from GeoNames WordNet and Wikipedia (10 Wikipedias in different languages)

• More than 10 million entities (persons, organizations, cities, etc.)
• More than 120 million facts about entities
• More than 35,000 classes assigned to entities
• Many of its facts and entities have a temporal and a spatial dimension attached

Brief Introduction

Sample
<Albert_Einstein> <isMarriedTo> <Elsa_Einstein>

Statistics (YAGO)

News
• An evaluated version of YAGO3 (combining information from Wikipedias in different languages) was released [15 Sep 2015]

Authors
• Max Planck Institute for Informatics in Saarbrücken, Germany, and the DBWeb group at Télécom ParisTech University

URLs
• Homepage: http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago
• Download: http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/downloads

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Statistics of Search Queries

(a) By traffic: 1 term 44%, 2 terms 29%, 3 terms 17%, 4 terms 7%, 5 terms 2%, more than 5 terms 1%
(b) By # of distinct queries: 1 term 10%, 2 terms 26%, 3 terms 34%, 4 terms 19%, 5 terms 7%, more than 5 terms 4%

Examples: Pokémon Go, Microsoft HoloLens

[Pie charts: the same queries broken down by number of instances (1, 2, 3, 4, 5, more than 5 instances)]

If the short text is a single instance…
• Python
• Microsoft
• Apple
• …

Single Instance Understanding

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

Word Ambiguity
• Word sense disambiguation relies on dictionaries (e.g., WordNet)

Take a seat on this chair

The chair of the Math Department

Instance Ambiguity

• Instance sense disambiguation: extra knowledge is needed

I have an apple pie for lunch.

He bought an apple ipad.

Here "apple" is a proper noun.

Ambiguity [Hua et al 2016]

bull Many instances are ambiguous

bull Intuition ambiguous instances have multiple senses

short text | instance | sense
population china | china | country
glass vs china | china | fragile item
pear apple | apple | fruit
microsoft apple | apple | company
read harry potter | harry potter | book
watch harry potter | harry potter | movie
age of harry potter | harry potter | character

Pre-definition for Ambiguity (1) Sense [Hua et al 2016]

• What is a sense in a semantic network?
• A sense is a hierarchy of concept clusters

country state city

creature

animal

predator

crop food

fruit vegetable meat

Germany

Pre-definition for Ambiguity (2) Concept Cluster [Li et al 2013 Li et al 2015]

• What is a concept cluster (CL)?
• Similar concepts are clustered into a concept cluster using a K-Means-like approach (k-Medoids)

FruitFresh fruit

JuiceTropical fruit

BerryExotic fruit

Seasonal fruitFruit juiceCitrus fruitSoft fruitDry fruit

Wild fruitLocal fruit

hellip

company

CompanyClientFirm

ManufacturerCorporation

large companyRivalGiant

big companylocal company

large corporationinternational

companyhellip Fruit

Definitions of Instance Ambiguity [Hua et al 2016]

• 3 levels of instance ambiguity:

• Level 0: unambiguous
• contains only 1 sense
• e.g., dog (animal), beijing (city), potato (vegetable)

• Level 1: both the unambiguous and the ambiguous reading make sense
• contains 2 or more senses, but the senses are related
• e.g., google (company & search engine), french (language & country), truck (vehicle & public transport service)

• Level 2: ambiguous
• contains 2 or more senses, and the senses are very different from each other
• e.g., apple (fruit & company), jaguar (animal & company), python (animal & language)

Ambiguity Score

• Use the top-2 senses to calculate the ambiguity score:

score = 0, if level = 0
score = w(s2|e) / w(s1|e) · (1 − similarity(s1, s2)), if level = 1
score = 1 + w(sc2|e) / w(sc1|e) · (1 − similarity(sc1, sc2)), if level = 2

Denote the top-2 senses as s1 and s2, and the top-2 sense clusters as sc1 and sc2. The similarity of two sense clusters is the maximum similarity of their senses:

similarity(sc1, sc2) = max similarity(s_i ∈ sc1, s_j ∈ sc2)

For an entity e, the weight (popularity) of a sense s_i is the sum of the weights of its concept clusters:

w(s_i|e) = w(H_i|e) = Σ_{CL_j ∈ H_i} P(CL_j|e)

and the weight of a sense cluster sc_i is the sum of the weights of its senses:

w(sc_i|e) = Σ_{s_j ∈ sc_i} w(s_j|e)
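A direct transcription of the scoring rule into code; the sense weights and the sense (or sense-cluster) similarity are assumed to come from the semantic network as defined above.

```python
def ambiguity_score(sense_weights, top2_similarity, level):
    """Ambiguity score from the top-2 senses or sense clusters.

    sense_weights   : weights w(.|e) of the senses/sense clusters, sorted
                      in descending order
    top2_similarity : similarity of the top-2 senses/sense clusters in [0, 1]
    level           : 0, 1 or 2 as defined above
    """
    if level == 0 or len(sense_weights) < 2:
        return 0.0
    ratio = sense_weights[1] / sense_weights[0]
    base = ratio * (1.0 - top2_similarity)
    return base if level == 1 else 1.0 + base

# e.g. apple (level 2): top-2 sense-cluster weights 0.537 ("fruit, food, tree")
# and 0.271 ("company, brand"); a cluster similarity of roughly 0.19 yields
# 1 + 0.271/0.537 * 0.81 ≈ 1.41, matching the example score below.
```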

Examples

• Level 0:
• california: country, state, city, region, institution (0.943)
• fruit: food, product, snack, carbs, crop (0.827)
• alcohol: substance, drug, solvent, food, addiction (0.523)
• computer: device, product, electronics, technology, appliance (0.537)
• coffee: beverage, product, food, crop, stimulant (0.73)
• potato: vegetable, food, crop, carbs, product (0.896)
• bean: food, vegetable, crop, legume, carbs (0.801)

Examples (cont.)

• Level 1:
• nike, score = 0.034: company, store 0.861; brand 0.035; shoe, product 0.033
• twitter, score = 0.035: website, tool 0.612; network 0.165; application 0.033; company 0.031
• facebook, score = 0.037: website, tool 0.595; network 0.17; company 0.053; application 0.029
• yahoo, score = 0.38: search engine 0.457; company, provider, account 0.281; website 0.0656
• google, score = 0.507: search engine 0.46; company, provider, organization 0.377; website 0.0449

Examples (cont.)

• Level 2:
• jordan, score = 1.02: country, state, company, regime 0.92; shoe 0.02
• fox, score = 1.09: animal, predator, species 0.74; network 0.064; company 0.035
• puma, score = 1.15: brand, company, shoe 0.655; species, cat 0.116
• gold, score = 1.21: metal, material, mineral resource, mineral 0.62; color 0.128
• soap, score = 1.22: product, toiletry, substance 0.49; technology, industry standard 0.11
• silver, score = 1.24: metal, material, mineral resource, mineral 0.638; color 0.156
• python, score = 1.29: language 0.667; snake, animal, reptile, skin 0.193
• apple, score = 1.41: fruit, food, tree 0.537; company, brand 0.271

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

A Concept View of ldquoMicrosoftrdquo

company

largest desktop OS vendor

softwarecompany

international company

technology leader

Microsoft

largest desktop OS vendorcompany hellip hellip

software company

Basic-level Conceptualization (BLC)[Rosch et al 1976]

KFC

BMW

Basic-level conceptualization

How to Make BLC

bull Naive approachesbull Typicality an important measure for understanding the

relationship between an object and its concept

bull Pointwise Mutual Information (PMI) a common measure of the strength of association between two terms

bird

Naive Approach 1 Typicality

P(robin|bird) gt P(penguin|bird)ldquorobinrdquo is a more typical bird than a ldquopenguinrdquo

country

SeychellesUSA

P(USA|country) gt P(Seychelles|country)ldquoUSArdquo is a more typical country than ldquoSeychellesrdquo

penguinrobin

Using Typicality for BLC

• Associate each isA relationship (e isA c) with typicality scores P(e|c) and P(c|e):

P(e|c) = n(c, e) / n(c),    P(c|e) = n(c, e) / n(e)

• P(e|c) indicates how typical (or popular) e is in the given concept c

• P(c|e) indicates how typical (or popular) the concept c is given e

• However: for "Microsoft", "company" has high typicality P(c|e) while "largest desktop OS vendor" has high typicality P(e|c), so neither score alone yields the basic-level concept

Naive Approach 2 PMI[Manning and Schutze 1999]

• Pointwise mutual information (PMI) is a measure of association used in information theory and statistics

• Consider using the PMI between concept c and instance e to find the basic-level concepts:

PMI(e, c) = log [ P(e, c) / (P(e) P(c)) ] = log P(e|c) − log P(e)

• However, in basic-level categorization we are interested in finding a concept for a given e, which means P(e) is a constant

• Thus ranking by PMI(e, c) is the same as ranking by P(e|c)

Using Rep(e c) for BLC [Wang et al 2015b]

• The measure Rep(e, c) = P(c|e) · P(e|c) means:
• given e, c should be its typical concept (shortest distance), and
• given c, e should be its typical instance (shortest distance)

• (Relation to PMI) Taking the logarithm of the scoring function:

log Rep(e, c) = log [ P(c|e) · P(e|c) ] = log [ P(e, c)/P(e) · P(e, c)/P(c) ] = log [ P(e, c)² / (P(e) P(c)) ] = PMI(e, c) + log P(e, c) = PMI²(e, c)

• (Relation to commute time) The expected commute time between an instance e and a concept c is

Time(e, c) = Σ_{k=1..∞} 2k · P_k(e, c) = Σ_{k=1..T} 2k · P_k(e, c) + Σ_{k=T+1..∞} 2k · P_k(e, c)
          ≥ Σ_{k=1..T} 2k · P_k(e, c) + 2(T+1) · (1 − Σ_{k=1..T} P_k(e, c)) = 4 − 2 · Rep(e, c)   (taking T = 1)

so finding the basic-level concept is a process of finding the concept nodes with the shortest expected distance to e.
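A minimal sketch that scores candidate concepts by typicality and Rep(e, c) from raw is-A co-occurrence counts (the smoothing variants evaluated below are omitted); `isa_counts` is an assumed dictionary of n(c, e) counts from a network such as Probase.

```python
from collections import defaultdict

def blc_scores(isa_counts, entity):
    """Score candidate concepts of one entity with typicality and Rep.

    isa_counts : dict {(entity, concept): n(c, e)} co-occurrence counts of
                 is-A pairs harvested from a taxonomy.
    Returns {concept: (P_e_given_c, P_c_given_e, rep)}; the basic-level
    concept is the one maximizing Rep(e, c) = P(c|e) * P(e|c).
    """
    n_e = defaultdict(float)   # n(e): total count of the entity
    n_c = defaultdict(float)   # n(c): total count of the concept
    for (e, c), n in isa_counts.items():
        n_e[e] += n
        n_c[c] += n
    scores = {}
    for (e, c), n in isa_counts.items():
        if e != entity:
            continue
        p_e_c = n / n_c[c]          # typicality P(e|c) = n(c, e) / n(c)
        p_c_e = n / n_e[e]          # typicality P(c|e) = n(c, e) / n(e)
        scores[c] = (p_e_c, p_c_e, p_e_c * p_c_e)
    return scores

# basic_level = max(blc_scores(counts, "microsoft").items(),
#                   key=lambda kv: kv[1][2])[0]   # e.g. "company"
```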

Precision / NDCG, no smoothing; K = 1 2 3 5 10 15 20

MI(e) 0769 0692 0705 0685 0719 0705 0690

PMI3(e) 0885 0769 0756 0800 0754 0733 0721

NPMI(e) 0692 0692 0667 0638 0627 0610 0610

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0500 0462 0526 0523 0523 0510 0521

Rep(e) 0846 0865 0872 0862 0758 0731 0719

Smoothing = 0.001

MI(e) 0577 0615 0628 0600 0612 0605 0592

PMI3(e) 0731 0673 0692 0654 0669 0644 0623

NPMI(e) 0923 0827 0769 0746 0731 0695 0671

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0554

Typicality P(e|c) 0885 0865 0872 0831 0785 0741 0704

Rep(e) 0846 0731 0718 0723 0700 0669 0638

Smoothing = 0.0001

MI(e) 0615 0615 0654 0608 0635 0628 0612

PMI3(e) 0846 0731 0731 0715 0723 0685 0677

NPMI(e) 0885 0904 0885 0869 0823 0777 0752

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0885 0904 0910 0877 0831 0813 0777

Rep(e) 0923 0846 0833 0815 0781 0736 0719

Smoothing=1e-5

MI(e) 0615 0635 0667 0662 0677 0656 0646

PMI3(e) 0885 0769 0744 0777 0758 0731 0710

NPMI(e) 0885 0846 0872 0869 0831 0810 0787

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0769 0808 0846 0823 0808 0782 0765

Rep(e) 0885 0904 0872 0862 0812 0800 0767

Smoothing=1e-6

MI(e) 0769 0673 0705 0677 0700 0692 0679

PMI3(e) 0885 0769 0756 0785 0773 0726 0723

NPMI(e) 0885 0846 0821 0815 0750 0726 0719

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0538 0615 0615 0615 0608 0613 0615

Rep(e) 0846 0885 0897 0877 0788 0777 0765

Smoothing=1e-7

MI(e) 0769 0692 0705 0685 0719 0703 0688

PMI3(e) 0885 0769 0756 0792 0758 0736 0725

NPMI(e) 0769 0750 0718 0700 0650 0641 0633

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0500 0481 0526 0523 0531 0523 0523

Rep(e) 0846 0865 0872 0854 0765 0749 0733

No smoothing; K = 1 2 3 5 10 15 20

MI(e) 0516 0531 0519 0531 0562 0574 0594

PMI3(e) 0725 0664 0652 0660 0628 0631 0646

NPMI(e) 0599 0597 0579 0554 0540 0539 0549

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0401 0386 0396 0398 0401 0410 0428

Rep(e) 0758 0771 0745 0723 0656 0647 0661

Smoothing=1e-3

MI(e) 0374 0414 0441 0448 0473 0481 0495

PMI3(e) 0484 0511 0509 0502 0519 0525 0533

NPMI(e) 0692 0652 0607 0603 0585 0585 0592

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0460

Typicality P(e|c) 0703 0697 0704 0681 0637 0628 0626

Rep(e) 0621 0580 0554 0561 0554 0555 0559

Smoothing=1e-4

MI(e) 0407 0430 0458 0462 0492 0503 0512

PMI3(e) 0648 0604 0579 0575 0578 0576 0590

NPMI(e) 0747 0777 0761 0737 0700 0685 0688

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0791 0795 0802 0767 0738 0729 0724

Rep(e) 0758 0714 0711 0689 0653 0636 0653

Smoothing=1e-5

MI(e) 0429 0465 0478 0501 0517 0528 0545

PMI3(e) 0725 0647 0642 0642 0627 0624 0638

NPMI(e) 0813 0779 0778 0765 0730 0723 0729

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0709 0728 0735 0722 0702 0696 0703

Rep(e) 0791 0787 0762 0739 0707 0703 0706

Smoothing=1e-6

MI(e) 0516 0510 0515 0526 0546 0563 0579

PMI3(e) 0725 0655 0651 0654 0641 0631 0649

NPMI(e) 0791 0766 0732 0728 0673 0659 0668

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0495 0516 0520 0508 0512 0521 0540

Rep(e) 0758 0784 0767 0755 0691 0686 0694

Smoothing=1e-7

MI(e) 0516 0531 0519 0530 0562 0571 0592

PMI3(e) 0725 0664 0652 0658 0630 0631 0647

NPMI(e) 0670 0655 0633 0604 0575 0570 0581

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0423 0421 0415 0407 0414 0424 0438

Rep(e) 0758 0771 0745 0725 0663 0661 0668

Evaluations on Different Measures for BLC

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

What is Semantic Similarity?
• Are the following instance pairs similar?

• <apple, microsoft>

• <apple, pear>

• <apple, fruit>

• <apple, food>

• <apple, ipad>

• <car, journey>

Approaches on Term Similarity

• Categories of approaches for semantic similarity:
• String-based approaches

• Knowledge-based approaches
• use preexisting thesauri, taxonomies, or encyclopedias such as WordNet

• Corpus-based approaches
• use contexts of terms extracted from web pages, web search snippets, or other text repositories

• Embedding-based approaches
• introduced in detail in "Part 3: Implicit Understanding"

Approaches on Term Similarity (2)

• Categories (timeline of state-of-the-art approaches):

[Figure: string-based approaches; knowledge-based approaches (WordNet), split into path-length / lexical-chain-based and information-content-based; and corpus-based approaches, split into graph-learning-algorithm-based and snippet-search-based. Representative methods include Rada 1989, Resnik 1995, Jcn 1997, Hirst 1998, Lin 1998, Ban 2002, HunTray 2005, Chen 2006, Alvarez 2007, Do 2009, Agirre 2010, Bol 2011, Sánchez 2011]

bull Framework

Term Similarity Using Semantic Networks [Li et al 2013, Li et al 2015]

Step 1: Type checking. Term pairs <t1, t2> are classified into concept pairs, entity pairs, or concept-entity pairs.

Step 2: Context representation (vectors). Concept pairs: collect entity-distribution contexts and build context vectors T(t1), T(t2). Entity pairs: collect concept-distribution contexts, cluster the concepts, and build cluster context vectors Cx(t1), Cy(t2). Concept-entity pairs: collect the concepts of the entity term t1, cluster them, and select the top-k concepts cx of each cluster Ci(t1).

Step 3: Context similarity. Concept pairs: cosine(T(t1), T(t2)). Entity pairs: max over (x, y) of cosine(Cx(t1), Cy(t2)). Concept-entity pairs: max over cx of sim(t2, cx) for <t1, t2>.

An example [Li et al 2013 Li et al 2015]

For example: <banana, pear>

Step 1: Type checking: <banana, pear> is an entity pair.
Step 2: Context representation (vector): concept context collection for both terms.
Step 3: Context similarity: similarity evaluation cosine(T(t1), T(t2)) = 0.916

Examples (Term 1 | Term 2 | Similarity)

lunch | dinner | 0.9987
tiger | jaguar | 0.9792
car | plane | 0.9711
television | radio | 0.9465
technology company | microsoft | 0.8208
high impact sport | competitive sport | 0.8155
employer | large corporation | 0.5353
fruit | green pepper | 0.2949
travel | meal | 0.0426
music | lunch | 0.0116
alcoholic beverage | sports equipment | 0.0314
company | table tennis | 0.0003

Complete results: http://adapt.seiee.sjtu.edu.cn/similarity/SimCompleteResults.pdf
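For the entity-pair case, the similarity boils down to comparing the two terms' clustered concept distributions. A sketch under that assumption, with `concept_clusters` as a hypothetical lookup from a term to its concept-cluster weights:

```python
import math

def _cosine(u, v):
    shared = set(u) & set(v)
    num = sum(u[k] * v[k] for k in shared)
    den = math.sqrt(sum(x * x for x in u.values())) * \
          math.sqrt(sum(x * x for x in v.values()))
    return num / den if den else 0.0

def entity_term_similarity(t1, t2, concept_clusters):
    """Similarity of two entity terms from their concept-cluster contexts.

    concept_clusters : assumed lookup, term -> {cluster_id: weight}, i.e. the
    clustered is-A concept distribution (k-Medoids clusters) of the term.
    """
    return _cosine(concept_clusters(t1), concept_clusters(t2))

# entity_term_similarity("banana", "pear", concept_clusters)
# would be about 0.916 under the example above.
```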

Statistics of Search Queries

(a) By traffic: 1 term 44%, 2 terms 29%, 3 terms 17%, 4 terms 7%, 5 terms 2%, more than 5 terms 1%
(b) By # of distinct queries: 1 term 10%, 2 terms 26%, 3 terms 34%, 4 terms 19%, 5 terms 7%, more than 5 terms 4%

Examples: Pokémon Go, Microsoft HoloLens

[Pie charts: the same queries broken down by number of instances]

If the short text has context for the instance…
• python tutorial
• dangerous python
• moon earth distance
• …

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Supervised Segmentation [Bergsma et al 2007]

• Problem: divide a query into semantic units

• Approach: turn segmentation into position-based binary classification

Example query: "two man power saw"
Candidate segmentations: [two man] [power saw]; [two] [man] [power saw]; [two] [man power] [saw]

Input: a query and its positions
Output: the decision whether to place a segmentation break at each position

Supervised Segmentation

• Features:
• Decision-boundary features, e.g., indicator and POS tags of the tokens at the position, and forward/backward position features
• Statistical features, e.g., mutual information between the left and right parts ("bank loan | amortization schedule")
• Context features, e.g., surrounding context information ("female bus driver")
• Dependency features, e.g., "female" depends on "driver"

Supervised Segmentation

• Segmentation overview:

Input query "two man power saw" → features for each position → SVM classifier (learned features) → output: a yes/no segmentation decision for each position
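A toy version of the position-based classifier, using a linear SVM and only a few of the feature families listed above; `mi_score` is an assumed helper returning the mutual information between the left and right parts of the query at a position.

```python
from sklearn.svm import LinearSVC

def boundary_features(words, i, mi_score):
    """Features for the potential break between words[i] and words[i + 1]."""
    left, right = words[i], words[i + 1]
    return [
        float(i) / len(words),                   # position (forward)
        float(len(words) - 1 - i) / len(words),  # position (backward)
        mi_score(left, right),                   # statistical association
        float(left in {"the", "of", "in"}),      # crude function-word indicator
    ]

def train_segmenter(queries, boundary_labels, mi_score):
    """queries: list of word lists; boundary_labels[i][j] = 1 if a segment
    break follows word j of query i (the position-based binary view)."""
    X, y = [], []
    for words, labels in zip(queries, boundary_labels):
        for j in range(len(words) - 1):
            X.append(boundary_features(words, j, mi_score))
            y.append(labels[j])
    return LinearSVC().fit(X, y)
```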

Unsupervised Segmentation [Tan et al 2008]

bull Unsupervised learning for query segmentation

Probability of a generated segmentation S for query Q (unigram model over segments):

P(S|Q) = P(s1) · P(s2|s1) · … · P(sm|s1 s2 … s_{m-1}) ≈ Π_{si ∈ S} P(si)

A position is a valid segment boundary if and only if the pointwise mutual information between the two segments resulting from the split is negative:

MI(sk, sk+1) = log [ Pc([sk, sk+1]) / (Pc(sk) · Pc(sk+1)) ] < 0

Example: "new york times subscription". Since log [ Pc([new york]) / (Pc(new) · Pc(york)) ] > 0, there is no segment boundary between "new" and "york".

Unsupervised Segmentation

• Find the top-k segmentations with dynamic programming

• Use EM optimization on the fly

Input: a query w1 w2 … wn and the concept probability distribution
Output: the top-k segmentations with the highest likelihood
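A dynamic-programming sketch of the unigram segmentation model (without the EM re-estimation), assuming a `log_p(segment)` function backed by web n-gram or concept statistics:

```python
import math
from functools import lru_cache

def best_segmentation(words, log_p, max_len=4):
    """Highest-likelihood segmentation under P(S|Q) ≈ prod_i P(s_i).

    words   : list of query words
    log_p   : assumed function, segment string -> log P(segment)
    max_len : maximum number of words per segment
    Returns (log-likelihood, tuple of segments).
    """
    n = len(words)

    @lru_cache(maxsize=None)
    def solve(i):
        if i == n:
            return 0.0, ()
        best = (-math.inf, ())
        for j in range(i + 1, min(n, i + max_len) + 1):
            seg = " ".join(words[i:j])
            score, rest = solve(j)
            cand = (log_p(seg) + score, (seg,) + rest)
            if cand[0] > best[0]:
                best = cand
        return best

    return solve(0)

# best_segmentation("new york times subscription".split(), log_p)
# -> e.g. ("new york times", "subscription")
```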

Exploit Click-through [Li et al 2011]

• Motivation:
• probabilistic query segmentation
• use click-through data

Input query: "bank of america online banking"

Output: top-3 segmentations
[bank of america] [online banking]  0.502
[bank of america online banking]  0.428
[bank of] [america] [online banking]  0.001

Click data: query Q → clicked URL → document D

Exploit Click-through

• Segmentation model: an interpolated model that combines global information with click-through information

Example query: [credit card] [bank of america]
Clicked HTML documents:
1. bank of america credit cards contact us overview
2. secured visa credit card from bank of america
3. credit cards overview: find the right bank of america credit card for you

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Sense Changes with Different Context

watch harry potter → Movie;  read harry potter → Book;  age harry potter → Character;  harry potter walkthrough → Game

Entity Recognition in Query [Guo et al 2009]

• Motivation: detect the named entity in a short text and categorize it

Example: the single-named-entity query "harry potter walkthrough" is represented as the triple <e, t, c> = ("harry potter", "walkthrough", "game"), where e is the (possibly ambiguous) named entity, t the context term(s), and c the class of the entity.

Entity Recognition in Query

• Probabilistic generative model

Goal: given a query q, find the triple <e, t, c> that maximizes the probability of generating the triple, assuming the context depends only on the class:

Pr(e, t, c) = Pr(e) · Pr(c|e) · Pr(t|c)

E.g., "walkthrough" depends only on the class game, not on "harry potter".

Objective: given query q, find the most probable triple. The problem then becomes how to estimate Pr(e), Pr(c|e), and Pr(t|c).

Entity Recognition in Query

• Probability estimation by learning

Learning objective: max Π_{i=1..N} P(e_i, t_i, c_i)

Challenge: it is difficult and time-consuming to manually assign class labels to the named entities in queries.

So build a training set T = {(e_i, t_i)} and treat c_i as a hidden variable.

New learning problem: max Π_{i=1..N} P(e_i, t_i) = max Π_{i=1..N} Σ_c P(e_i) · P(c|e_i) · P(t_i|c)

This is solved with a topic model, WS-LDA.

Signal from Click [Pantel et al 2012]

bull Motivation

Predict entity type in Web search

[Generative model over the entity, its entity type, the user intent, the context, and the click; the output is a query type distribution over 73 types]

Signal from Click

• Joint model for prediction

[Plate diagram: for each query Q, pick an entity type from the distribution over types, pick an entity from the entity distribution, pick an intent from the intent distribution, pick context words from the word distribution, and pick a click from the host distribution (parameters θ, φ, ω, τ)]

Telegraphic Query interpretation [Sawant et al 2013 Joshi et al 2014]

• Entity-seeking telegraphic queries

• Interpretation = segmentation + annotation

• Combine a knowledge base (for accuracy) with a large corpus (for recall)

Example: query "Germany capital" → result entity: Berlin

Joint Interpretation and Ranking [Sawant et al 2013 Joshi et al 2014]

[Overview: a telegraphic query plus an annotated corpus feed two models for joint interpretation and ranking, a generative model and a discriminative model, whose output is a ranked list of entities e1, e2, e3, …]

bull Generative Model

Joint Interpretation and Ranking [Sawant et al 2013]

Example query q: "losing team baseball world series 1998"

The answer entity E = San Diego Padres has type T = major league baseball team, which matches the type hint "baseball team"; the supporting corpus snippet is "Padres have been to two World Series, losing in 1984 and 1998". A switch variable Z decides whether each query word is generated by the type model or by the context model (context matchers: "lost … 1998 world series").

Based on probabilistic language models. (Figure borrowed from U. Sawant, 2013.)

bull Discriminative Model

Joint Interpretation and Ranking [Sawant et al 2013]

[Feature vectors over the query "losing team baseball world series 1998" for candidate entities: San_Diego_Padres with t = baseball team (the correct entity) versus 1998_World_Series with t = series (an incorrect entity)]

Based on max-margin discriminative learning.

bull Queries seek answer entities (e2)

bull Contain (query) entities (e1) target types (t2) relations (r) and selectors (s)

Telegraphic Query Interpretation [Joshi et al 2014]

Examples (columns: query, e1, r, t2, s):
• "dave navarro first band": e1 = dave navarro; r = band or none; t2 = band; s = first
• "spider automobile company": e1 = spider; r = automobile company or none; t2 = automobile company; s = company, spider

(Borrowed from M. Joshi, 2014.)

Improved Generative Model

• Improved generative model: [Joshi et al 2014] extends the generative model of [Sawant et al 2013] by additionally considering the query entity e1 (in q) and the relation r

Improved Discriminative Model

• Improved discriminative model: [Joshi et al 2014] likewise extends the discriminative model of [Sawant et al 2013] to consider e1 (in q) and r

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

• Input: a short text
• Output: its semantic interpretation
• Three steps in understanding a short text, e.g., "wanna watch eagles band":

Step 1: Text segmentation: divide the text into a sequence of terms in the vocabulary: "wanna watch eagles band" → watch | eagles | band
Step 2: Type detection: determine the best type of each term: watch[verb] eagles[entity] band[concept]
Step 3: Concept labeling: infer the best concept of each entity within context: watch[verb] eagles[entity](band) band[concept]

Text segmentation
• Observations:
• Mutual exclusion: terms containing the same word mutually exclude each other
• Mutual reinforcement: related terms mutually reinforce each other

• Build a Candidate Term Graph (CTG)

[Candidate term graphs for "vacation april in paris" (candidate terms: vacation, april in paris, april, paris) and "watch harry potter" (candidate terms: watch, harry potter); nodes carry priors such as 1/3 and 2/3, and edges carry affinity weights such as 0.029, 0.005, 0.047, 0.041, 0.014, 0.092, 0.053, 0.018]

Find the best segmentation

• The best segmentation is a sub-graph of the CTG which:
• is a complete graph (a clique),
• contains no mutually exclusive terms,
• has 100% word coverage (except for stopwords),
• and has the largest average edge weight,
i.e., a maximal clique.

[Figures: among the cliques of the candidate term graphs above, {vacation, april in paris} and {watch, harry potter} are the best segmentations]
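A sketch of the clique search, assuming the CTG is a networkx graph whose nodes store the query words they cover and whose edges store affinity weights (mutually exclusive terms share no edge, so they can never appear in the same clique):

```python
import networkx as nx

def best_segmentation_ctg(ctg, stopwords=frozenset()):
    """Pick the best segmentation from a candidate term graph (CTG).

    ctg : nx.Graph; each node has a 'words' attribute (the set of query words
          the candidate term covers) and each edge an 'affinity' weight.
    Returns the clique with full word coverage and the largest average
    edge weight.
    """
    all_words = set().union(*(ctg.nodes[t]["words"] for t in ctg)) - stopwords
    best, best_score = None, float("-inf")
    for clique in nx.find_cliques(ctg):                 # maximal cliques
        covered = set().union(*(ctg.nodes[t]["words"] for t in clique))
        if not all_words <= covered | stopwords:        # require 100% coverage
            continue
        edges = [(u, v) for i, u in enumerate(clique) for v in clique[i + 1:]]
        score = (sum(ctg[u][v]["affinity"] for u, v in edges) / len(edges)
                 if edges else 0.0)
        if score > best_score:
            best, best_score = clique, score
    return best
```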

Type Detection

• Pairwise model: find the best typed-term for each term, so that the maximum spanning tree of the resulting sub-graph between the typed-terms has the largest weight

[Example "watch free movie": candidate typed-terms watch[v], watch[e], watch[c]; free[adj], free[v]; movie[c], movie[e]]

Concept Labeling

• Entity disambiguation is the most important task of concept labeling: filter and re-rank the original concept cluster vector

• Weighted vote: the final score of each concept cluster is a combination of its original score and the support from the context, using concept co-occurrence

watch harry potter → movie;  read harry potter → book
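A minimal sketch of the weighted vote; the interpolation weight and the max-support aggregation are assumptions for illustration, not the exact combination used in [Hua et al 2015].

```python
def weighted_vote(concept_scores, context_concepts, cooccur, alpha=0.5):
    """Re-rank an entity's concept clusters with support from the context.

    concept_scores   : {concept_cluster: original score} for the entity
    context_concepts : concept clusters of the surrounding terms
    cooccur(c1, c2)  : co-occurrence strength of two concept clusters in [0, 1]
    alpha            : assumed interpolation weight
    """
    reranked = {}
    for c, score in concept_scores.items():
        support = max((cooccur(c, ctx) for ctx in context_concepts), default=0.0)
        reranked[c] = alpha * score + (1 - alpha) * support
    return dict(sorted(reranked.items(), key=lambda kv: -kv[1]))

# For "watch harry potter" the movie cluster receives more co-occurrence
# support from "watch" than the book cluster, so it ranks first.
```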

Example of Entity Disambiguation[Hua et al 2015 (ICDE Best Paper) Hua et al 2016]

[Pipeline: the short text is parsed; terms are clustered by is-A using the semantic network; concepts are filtered by co-occurrence using the co-occurrence network; head/modifier analysis and concept orthogonalization follow; conceptualization yields a concept vector (c1, p1), (c2, p2), (c3, p3), …

Example "ipad apple": the is-A candidates for apple (fruit, company, food, product, …) are filtered by their co-occurrence with ipad's concepts (product, device, brand, company), so apple resolves to company/brand rather than fruit]

Mining Lexical Relationships[Wang et al 2015b]

• Lexical knowledge is represented by probabilities over terms, roles, and concepts, e.g., for "watch harry potter": p(verb | watch), p(instance | watch), p(movie | harry potter), p(book | harry potter), p(movie | watch, verb)

• Notation: e = instance, t = term, c = concept, z = role; the model uses (1) p(c | t, z), (2) p(z | t), and (3) p(c | e) = p(c | t, z = instance)

Understanding Queries [Wang et al 2015b]

• Goal: rank the concepts and find arg max_c p(c | t, q)

The offline semantic network

Query → all possible segmentations

Random walk with restart [Sun et al 2005] on the online subgraph

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

bull Example ldquopopular smart cover iphone 5srdquo

bull Definition bull Head acts to name the general (semantic) category to which the

whole short text belongs Usually the head is the intent of the short text

bull ldquosmart coverrdquo intent of the query

bull Constraints distinguish this member from other members of the same category

bull ldquoiphone 5srdquo limit the type of the head

bull Non-Constraint Modifiers (aka Pure Modifiers) are subjectivemodifiers which can be dropped without changing intent

bull ldquopopularrdquo subjective can be neglected

Non-Constraint Modifiers Mining Construct Modifier Networks

Edges form a Modifier Network

Concept Hierarchy Tree in ldquoCountryrdquo domain

Modifier Network in ldquoCountryrdquo domainIn this case ldquoLargerdquo and ldquoToprdquo are pure modifiers

Country

Asian country

Developed country

Western country

Asian Developed Western

Western developed

country

Top western country

Large

Large Top

Top

WesternLarge Asian

country

Large developed

country

Top developed

country

Country

Asian Western

Developed

Large Top

bull Betweenness centrality is a measure of a nodes centrality in a network

bull Betweennes of node v is defined as

bull where 120590119904119905 is the total number of shortest paths from node s to node t and 120590119904119905(119907) is the number of those paths that pass through v

bull Normalization amp Aggregation

bull For a pure modifier it should have low betweenness centrality aggregation score PMS(t)

Non-Constraint Modifiers Mining Betweenness centrality

Head-Constraints Mining [Wang et al 2014b]

bull A term can be a head sometimes and be a constraint in some other cases

bull Eg Seattle hotel Seattle hotel job

head headconstraintconstraintconstraint

Head-Constraints Mining Acquiring Concept Patterns

Get entity pairs from query log

Conceptualization

Concept Patterns for each

prepositionsentity1 entity2

Extract Patterns

A for B A of BA with B A in BA on B A at B hellip

entity 1head entity 2constraint

concept11

concept12

concept13

concept14

concept21

concept22

concept23

(concept11 concept21) (concept11 concept22)(concept11 concept23)hellip

Concept Pattern Dictionary

Building concept pattern dictionary Query Logs

cover for iphone 6sbattery for sony a7rwicked on broadway

Why Concepts Canrsquot Be Too Generalbull It may cause too many concept pattern conflicts

canrsquot distinguish head and modifier for general concept pairs

Head Modifier

Derived Concept Pattern device company

Supporting Entity Pairs iphone 4 verizon

modem comcast

wireless router comcast

iphone 4 tmobile

Head Modifier

Derived Concept Pattern company device

Supporting Entity Pairs amazon books kindle

netflix touchpad

skype windows phone

netflix ps3

Conflict

Why Concepts Canrsquot Be Too Specificbull It may generate concepts with little coverage

bull Concept regresses to entitybull Large storage space up to (million million) patterns

hellip hellip

device largest desktop OS vendor

device largest software development company

device largest global corporation

device latest windows and office provider

hellip hellip

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept PatternsCluster size Sum of Cluster Score headconstraintscore

615 2114691 breedstate357298460224501

296 7752357 gameplatform627403476771856

153 3466804 accessoryvehicle53393705094809

70 118259 browserplatform132612807637391

22 1010993 requirementschool271407526294823

34 9489159 drugdisease154602405333541

42 8992995 cosmeticskin condition814659415003929

16 7421599 jobcity27903732555528

32 710403 accessoryphone246513830851194

18 6692376 softwareplatform210126322725878

20 6444603 testdisease239774028397537

27 5994205 clothesbreed98773996282851

19 5913545 penaltycrime200544192793488

25 5848804 taxstate240081818612579

16 5465424 saucemeat183592863621553

18 4809389 credit cardcountry142919087972152

14 4730792 foodholiday14554140330924

11 4536199 modgame257163856882439

29 4350954 garmentsport471533326845442

23 3994886 career informationprofessional732726483731257

15 386065 songinstrument128189481818135

18 378213 baitfish780426514113169

22 3722948 study guidebook508339765053921

19 3408953 pluginsbrowser550326072627126

14 3305753 recipemeat882779863422951

18 3214226 currencycountry110825444188352

13 3180272 lenscamera186081673263957

9 316973 decorationholiday130055844126533

16 314875 foodanimal7338544366514

game platform

game device

video game platform

game console game pad

game gaming platform

Game (Head) Platform (Modifier)

angry birds android

angry birds ios

angry birds windows 10

hellip hellip

Detection

Head Modifier Relationship

bull Train a classifier on

(head-embedding modifier-embedding)

bull Training data bull Positive (head modifier)bull Negative (modifier head)

bull Precision gt= 09 Recall gt= 09

bull Disadvantage not interpretable

Syntactic Parsing based on HM

bull Information is incompletebull Preposition and other function words

bull Within a noun compound el capitan macbook pro

bull Why not train a parser for web queries

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges Short Texts Lack Grammatical Signalsbull Lack function words word order

bull ldquotoys queriesrdquo has ambiguous intent

bull ldquodistance earth moonrdquo has clear intentbull many equivalent forms ldquoearth moon distancerdquo ldquoearth

distance moonrdquo hellip

Challenges Syntactic Parsing of Queries

bullNo standard

bullNo ground-truth

Why is syntactic parsing of queries even a legitimate problem

Derive Syntax from Semantics[Sun et al 2016]

bull Query ldquothai food houstonrdquo

bull Clicked sentence

bull Project dependency to the query

A Treebank for Short Texts

bull Given query 119902

bull Given 119902rsquos clicked sentence 119904

bull Parse each 119904

bull Project dependency from 119904 to 119902

bull Aggregate dependencies

Algorithm of Projection

Result Examples

Results

bull Random queries

QueryParser UAS 083 LAS 075Stanford UAS 072 LAS 064

bull Queries with no function words

QueryParser UAS 082 LAS 073Stanford UAS 070 LAS 061

bull Queries with function words

QueryParser UAS 090 LAS 085Stanford UAS 086 LAS 080

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

bull Measuring similarity between two short texts and sentences

bull Basic idea word-by-word comparison using embedding vector

bull Use saliency-weighted semantic graph to computer similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired

Bins of all edges Bins of max edges

119908isin119904119868

119868119863119865(119908) sdot)119904119890119898(119908 119904119904) sdot (1198961 + 1

൰119904119890119898(119908 119904119904) + 1198961 sdot (1 minus 119887 + 119887 sdot|119904119904|119886119907119892119897

Similarity measurement

termShort texts

Inspired by BM25

Semantic similarity

119891119904119905119904(119904119897 119904119904) =

From the Concept View

From the Concept View [Wang et al 2015a]

Co-occurrence Network

Bags of Concepts

Semantic Network

Short Text 1

Short Text 2

Concept Vector 1[(c1 score1) (c2 score2)hellip]

Concept Vector 2[(c1rsquo score1rsquo) (c2rsquo score2rsquo)hellip]

Similarity

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

bull Explicit short text understanding benefit lot of application scenariosbull Adssearch semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

000

100

200

300

400

500

600

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

000

010

020

030

040

050

060

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

Mainline Ads Sidebar Ads

Definition Mining [Hao et al 2016]

bull Definition scenarios search engines QnA etc

bull Why Conceptualization is useful for definition miningbull Examples ldquoWhat is Emphysemardquo

Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

bull This sentence has the form of definitionbull Embedding is helpful to some extent but it also return high similarity

score for (emphysema disease) and (emphysema smoking)

bull Conceptualization can provide strong semanticsbull Contextual embedding can also provide semantic similarity beyond Is-A

Answer 1

Answer 2

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

OfflineOffline OnlineOnline

Original Short textjustin bieber graduates

hellip

Knowledge base

Conceptualiztion

Concept Vector

Entity Extraction

Candidates Generation

Classification amp Ranking

Model LearningModel Learning

Concept Weighting

Model Model NModel i

Concept Model Concept Model

Class 1 Class NClass i

TrainingData

ltMusic Scoregt

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept SpaceArticle titlestagsin this category

119901119894

119901119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

hellip

hellip

120596119894

120596119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

120596119894

120596119895119901119894

119901119895

Precision performance on each category [Wang et al 2014a]

BocSTC LM_ch SVMVSM_cosi

neLM_d Entity_ESA

Movie 071 091 084 081 072 056

Money 097 095 054 057 052 074

Music 097 090 088 073 068 058

TV 096 046 092 056 051 055

0304050607080910

Pre

cisi

on

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

YAGO

News
• An evaluated version of YAGO3 (combining information from Wikipedias in different languages) was released [15 Sep 2015]

Authors
• Max Planck Institute for Informatics in Saarbrücken, Germany, and the DBWeb group at Télécom ParisTech University

URLs
• Homepage: http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/
• Download: http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/downloads/

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Statistics of Search Queries

(Figure: two pie charts of query length. (a) By traffic: 1 term 44%, 2 terms 29%, 3 terms 17%, 4 terms 7%, 5 terms 2%, more than 5 terms 1%. (b) By # of distinct queries: 1 term 10%, 2 terms 26%, 3 terms 34%, 4 terms 19%, the remainder 5 or more terms. A second pair of charts breaks queries down by the number of instances they contain — 1, 2, 3, 4, 5, or more than 5 instances. Example queries: Pokémon Go, Microsoft HoloLens.)

If the short text is a single instance…

• Python
• Microsoft
• Apple
• …

Single Instance Understanding

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

Word Ambiguity
• Word sense disambiguation relies on dictionaries (WordNet)

Take a seat on this chair.
The chair of the Math Department.

Instance Ambiguity
• Instance sense disambiguation: extra knowledge is needed

I have an apple pie for lunch.
He bought an apple ipad.

Here "apple" is a proper noun.

Ambiguity [Hua et al 2016]

• Many instances are ambiguous
• Intuition: ambiguous instances have multiple senses

short text | instance | sense
population china | china | country
glass vs china | china | fragile item
pear apple | apple | fruit
microsoft apple | apple | company
read harry potter | harry potter | book
watch harry potter | harry potter | movie
age of harry potter | harry potter | character

Pre-definition for Ambiguity (1) Sense [Hua et al 2016]

• What is a Sense in semantic networks?
• A sense is a hierarchy of concept clusters

(Figure: example sense hierarchies — region → country, state, city; creature → animal → predator; crop/food → fruit, vegetable, meat — with "Germany" as an instance under the region sense.)

Pre-definition for Ambiguity (2) Concept Cluster [Li et al 2013 Li et al 2015]

• What is a Concept Cluster (CL)?
• Cluster similar concepts into a concept cluster using a K-Means-like approach (k-Medoids)

(Figure: e.g. a "Fruit" cluster — fruit, fresh fruit, juice, tropical fruit, berry, exotic fruit, seasonal fruit, fruit juice, citrus fruit, soft fruit, dry fruit, wild fruit, local fruit, … — and a "company" cluster — company, client, firm, manufacturer, corporation, large company, rival, giant, big company, local company, large corporation, international company, ….)

Definitions of Instance Ambiguity [Hua et al 2016]

• 3 levels of instance ambiguity
  • Level 0: unambiguous
    • Contains only 1 sense
    • E.g. dog (animal), beijing (city), potato (vegetable)
  • Level 1: unambiguous and ambiguous both make sense
    • Contains 2 or more senses, but these senses are related
    • E.g. google (company & search engine), french (language & country), truck (vehicle & public transport service)
  • Level 2: ambiguous
    • Contains 2 or more senses, and the senses are very different from each other
    • E.g. apple (fruit & company), jaguar (animal & company), python (animal & language)

Ambiguity Score

• Using the top-2 senses to calculate the ambiguity score:

$$score = \begin{cases} 0 & level = 0 \\ \dfrac{w(s_2|e)}{w(s_1|e)} \cdot \big(1 - similarity(s_1, s_2)\big) & level = 1 \\ 1 + \dfrac{w(sc_2|e)}{w(sc_1|e)} \cdot \big(1 - similarity(sc_1, sc_2)\big) & level = 2 \end{cases}$$

Denote the top-2 senses as $s_1$ and $s_2$, and the top-2 sense clusters as $sc_1$ and $sc_2$.

The similarity of two sense clusters is the maximum similarity over their senses:
$$similarity(sc_1, sc_2) = \max_{s_i \in sc_1,\ s_j \in sc_2} similarity(s_i, s_j)$$

For an entity $e$, the weight (popularity) of a sense $s_i$ is the sum of the weights of its concept clusters:
$$w(s_i|e) = w(H_i|e) = \sum_{CL_j \in H_i} P(CL_j|e)$$

For an entity $e$, the weight (popularity) of a sense cluster $sc_i$ is the sum of the weights of its senses:
$$w(sc_i|e) = \sum_{s_j \in sc_i} w(s_j|e)$$
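A minimal sketch of the ambiguity score above, assuming toy sense-cluster weights and a hand-set sense-cluster similarity function (both illustrative, not values from the tutorial's data):

```python
def ambiguity_score(sense_cluster_weights, similarity, level):
    """sense_cluster_weights: dict sense_cluster -> w(sc|e), already aggregated
    over senses and concept clusters; similarity(a, b): max sense similarity."""
    if level == 0:
        return 0.0
    # rank sense clusters by weight and take the top two
    ranked = sorted(sense_cluster_weights.items(), key=lambda kv: kv[1], reverse=True)
    (sc1, w1), (sc2, w2) = ranked[0], ranked[1]
    penalty = (w2 / w1) * (1.0 - similarity(sc1, sc2))
    return penalty if level == 1 else 1.0 + penalty

# toy numbers loosely following the "apple" example below
weights = {"fruit/food/tree": 0.537, "company/brand": 0.271}
sim = lambda a, b: 0.05          # assume the two sense clusters are nearly unrelated
print(ambiguity_score(weights, sim, level=2))   # 1 + (0.271/0.537) * 0.95 ≈ 1.48
```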

Examples

• Level 0
  • california — country, state, city, region, institution: 0.943
  • fruit — food, product, snack, carbs, crop: 0.827
  • alcohol — substance, drug, solvent, food, addiction: 0.523
  • computer — device, product, electronics, technology, appliance: 0.537
  • coffee — beverage, product, food, crop, stimulant: 0.73
  • potato — vegetable, food, crop, carbs, product: 0.896
  • bean — food, vegetable, crop, legume, carbs: 0.801

Examples (cont.)

• Level 1
  • nike, score = 0.034: company/store 0.861, brand 0.035, shoe/product 0.033
  • twitter, score = 0.035: website/tool 0.612, network 0.165, application 0.033, company 0.031
  • facebook, score = 0.037: website/tool 0.595, network 0.17, company 0.053, application 0.029
  • yahoo, score = 0.38: search engine 0.457, company/provider/account 0.281, website 0.0656
  • google, score = 0.507: search engine 0.46, company/provider/organization 0.377, website 0.0449

Examples (cont.)

• Level 2
  • jordan, score = 1.02: country/state/company/regime 0.92, shoe 0.02
  • fox, score = 1.09: animal/predator/species 0.74, network 0.064, company 0.035
  • puma, score = 1.15: brand/company/shoe 0.655, species/cat 0.116
  • gold, score = 1.21: metal/material/mineral resource/mineral 0.62, color 0.128

Examples (cont.)

• Level 2
  • soap, score = 1.22: product/toiletry/substance 0.49, technology/industry standard 0.11
  • silver, score = 1.24: metal/material/mineral resource/mineral 0.638, color 0.156
  • python, score = 1.29: language 0.667, snake/animal/reptile/skin 0.193
  • apple, score = 1.41: fruit/food/tree 0.537, company/brand 0.271

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

A Concept View of "Microsoft"

(Figure: "Microsoft" maps to concepts such as company, largest desktop OS vendor, software company, international company, technology leader — ranging from very general to very specific.)

Basic-level Conceptualization (BLC) [Rosch et al 1976]

(Figure: basic-level conceptualization examples for Microsoft, KFC and BMW — the basic-level concept sits between overly general concepts like "company" and overly specific ones like "largest desktop OS vendor".)

How to Make BLC

• Naive approaches
  • Typicality: an important measure for understanding the relationship between an object and its concept
  • Pointwise Mutual Information (PMI): a common measure of the strength of association between two terms

Naive Approach 1: Typicality

P(robin | bird) > P(penguin | bird): "robin" is a more typical bird than "penguin"
P(USA | country) > P(Seychelles | country): "USA" is a more typical country than "Seychelles"

Using Typicality for BLC

• Associate each isA relationship ($e$ isA $c$) with typicality scores $P(e|c)$ and $P(c|e)$:

$$P(e|c) = \frac{n(c, e)}{n(c)}, \qquad P(c|e) = \frac{n(c, e)}{n(e)}$$

• P(e|c) indicates how typical (or popular) e is in the given concept c
• P(c|e) indicates how typical (or popular) the concept c is given e

• However: for Microsoft, "company" has high typicality P(c|e) while "largest desktop OS vendor" has high typicality P(e|c); neither typicality measure alone identifies the basic-level concept.

Naive Approach 2: PMI [Manning and Schutze 1999]

• Pointwise mutual information (PMI) is a measure of association used in information theory and statistics
• Consider using the PMI between concept c and instance e to find the basic-level concepts as follows:

$$PMI(e, c) = \log\frac{P(e, c)}{P(e)P(c)} = \log P(e|c) - \log P(e)$$

• However:
  • In basic level of categorization, we are interested in finding a concept for a given e, which means P(e) is a constant
  • Thus, ranking by PMI(e, c) is the same as ranking by P(e|c)

Using Rep(e, c) for BLC [Wang et al 2015b]

• The measure $Rep(e, c) = P(c|e) \cdot P(e|c)$ means:
  • Given e, c should be its typical concept (shortest distance)
  • Given c, e should be its typical instance (shortest distance)

• (Relation to PMI) Taking the logarithm of the scoring function:

$$\log Rep(e, c) = \log\big(P(c|e) \cdot P(e|c)\big) = \log\left(\frac{P(e, c)}{P(e)} \cdot \frac{P(e, c)}{P(c)}\right) = \log\frac{P(e, c)^2}{P(e)P(c)} = PMI(e, c) + \log P(e, c) = PMI^2$$

• (Relation to Commute Time) The expected commute time between an instance e and a concept c is

$$Time(e, c) = \sum_{k=1}^{\infty} 2k \cdot P_k(e, c) = \sum_{k=1}^{T} 2k \cdot P_k(e, c) + \sum_{k=T+1}^{\infty} 2k \cdot P_k(e, c) \ \ge\ \sum_{k=1}^{T} 2k \cdot P_k(e, c) + 2(T+1)\Big(1 - \sum_{k=1}^{T} P_k(e, c)\Big) = 4 - 2 \cdot Rep(e, c)$$

So ranking by Rep(e, c) is a process of finding concept nodes having the shortest expected distance to e.
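A minimal sketch comparing the three scoring functions above on toy isA counts (the counts are made up; the real system uses Probase-scale data):

```python
from collections import defaultdict

# n[(concept, entity)] = co-occurrence count of the isA pair in a corpus
n = {("company", "microsoft"): 9000, ("company", "apple"): 80000,
     ("company", "kfc"): 11000, ("software company", "microsoft"): 4000,
     ("software company", "adobe"): 1500,
     ("largest desktop os vendor", "microsoft"): 50}

n_c, n_e = defaultdict(int), defaultdict(int)   # n(c), n(e)
for (c, e), cnt in n.items():
    n_c[c] += cnt
    n_e[e] += cnt

def p_e_given_c(e, c): return n.get((c, e), 0) / n_c[c]
def p_c_given_e(e, c): return n.get((c, e), 0) / n_e[e]
def rep(e, c): return p_c_given_e(e, c) * p_e_given_c(e, c)

e = "microsoft"
for c in ["company", "software company", "largest desktop os vendor"]:
    print(c, round(p_c_given_e(e, c), 3), round(p_e_given_c(e, c), 3), round(rep(e, c), 3))
# With these toy counts: P(c|e) alone prefers "company", P(e|c) (and hence PMI)
# prefers "largest desktop os vendor", while Rep prefers "software company".
```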

Evaluations on Different Measures for BLC
(Precision / NDCG at k = 1, 2, 3, 5, 10, 15, 20)

Table 1
No smoothing
  MI(e)              0.769 0.692 0.705 0.685 0.719 0.705 0.690
  PMI3(e)            0.885 0.769 0.756 0.800 0.754 0.733 0.721
  NPMI(e)            0.692 0.692 0.667 0.638 0.627 0.610 0.610
  Typicality P(c|e)  0.462 0.577 0.603 0.577 0.569 0.564 0.556
  Typicality P(e|c)  0.500 0.462 0.526 0.523 0.523 0.510 0.521
  Rep(e)             0.846 0.865 0.872 0.862 0.758 0.731 0.719
Smoothing = 0.001
  MI(e)              0.577 0.615 0.628 0.600 0.612 0.605 0.592
  PMI3(e)            0.731 0.673 0.692 0.654 0.669 0.644 0.623
  NPMI(e)            0.923 0.827 0.769 0.746 0.731 0.695 0.671
  Typicality P(c|e)  0.462 0.577 0.603 0.577 0.569 0.564 0.554
  Typicality P(e|c)  0.885 0.865 0.872 0.831 0.785 0.741 0.704
  Rep(e)             0.846 0.731 0.718 0.723 0.700 0.669 0.638
Smoothing = 0.0001
  MI(e)              0.615 0.615 0.654 0.608 0.635 0.628 0.612
  PMI3(e)            0.846 0.731 0.731 0.715 0.723 0.685 0.677
  NPMI(e)            0.885 0.904 0.885 0.869 0.823 0.777 0.752
  Typicality P(c|e)  0.462 0.577 0.603 0.577 0.569 0.564 0.556
  Typicality P(e|c)  0.885 0.904 0.910 0.877 0.831 0.813 0.777
  Rep(e)             0.923 0.846 0.833 0.815 0.781 0.736 0.719
Smoothing = 1e-5
  MI(e)              0.615 0.635 0.667 0.662 0.677 0.656 0.646
  PMI3(e)            0.885 0.769 0.744 0.777 0.758 0.731 0.710
  NPMI(e)            0.885 0.846 0.872 0.869 0.831 0.810 0.787
  Typicality P(c|e)  0.462 0.577 0.603 0.577 0.569 0.564 0.556
  Typicality P(e|c)  0.769 0.808 0.846 0.823 0.808 0.782 0.765
  Rep(e)             0.885 0.904 0.872 0.862 0.812 0.800 0.767
Smoothing = 1e-6
  MI(e)              0.769 0.673 0.705 0.677 0.700 0.692 0.679
  PMI3(e)            0.885 0.769 0.756 0.785 0.773 0.726 0.723
  NPMI(e)            0.885 0.846 0.821 0.815 0.750 0.726 0.719
  Typicality P(c|e)  0.462 0.577 0.603 0.577 0.569 0.564 0.556
  Typicality P(e|c)  0.538 0.615 0.615 0.615 0.608 0.613 0.615
  Rep(e)             0.846 0.885 0.897 0.877 0.788 0.777 0.765
Smoothing = 1e-7
  MI(e)              0.769 0.692 0.705 0.685 0.719 0.703 0.688
  PMI3(e)            0.885 0.769 0.756 0.792 0.758 0.736 0.725
  NPMI(e)            0.769 0.750 0.718 0.700 0.650 0.641 0.633
  Typicality P(c|e)  0.462 0.577 0.603 0.577 0.569 0.564 0.556
  Typicality P(e|c)  0.500 0.481 0.526 0.523 0.531 0.523 0.523
  Rep(e)             0.846 0.865 0.872 0.854 0.765 0.749 0.733

Table 2
No smoothing
  MI(e)              0.516 0.531 0.519 0.531 0.562 0.574 0.594
  PMI3(e)            0.725 0.664 0.652 0.660 0.628 0.631 0.646
  NPMI(e)            0.599 0.597 0.579 0.554 0.540 0.539 0.549
  Typicality P(c|e)  0.297 0.380 0.409 0.422 0.438 0.446 0.461
  Typicality P(e|c)  0.401 0.386 0.396 0.398 0.401 0.410 0.428
  Rep(e)             0.758 0.771 0.745 0.723 0.656 0.647 0.661
Smoothing = 1e-3
  MI(e)              0.374 0.414 0.441 0.448 0.473 0.481 0.495
  PMI3(e)            0.484 0.511 0.509 0.502 0.519 0.525 0.533
  NPMI(e)            0.692 0.652 0.607 0.603 0.585 0.585 0.592
  Typicality P(c|e)  0.297 0.380 0.409 0.422 0.438 0.446 0.460
  Typicality P(e|c)  0.703 0.697 0.704 0.681 0.637 0.628 0.626
  Rep(e)             0.621 0.580 0.554 0.561 0.554 0.555 0.559
Smoothing = 1e-4
  MI(e)              0.407 0.430 0.458 0.462 0.492 0.503 0.512
  PMI3(e)            0.648 0.604 0.579 0.575 0.578 0.576 0.590
  NPMI(e)            0.747 0.777 0.761 0.737 0.700 0.685 0.688
  Typicality P(c|e)  0.297 0.380 0.409 0.422 0.438 0.446 0.461
  Typicality P(e|c)  0.791 0.795 0.802 0.767 0.738 0.729 0.724
  Rep(e)             0.758 0.714 0.711 0.689 0.653 0.636 0.653
Smoothing = 1e-5
  MI(e)              0.429 0.465 0.478 0.501 0.517 0.528 0.545
  PMI3(e)            0.725 0.647 0.642 0.642 0.627 0.624 0.638
  NPMI(e)            0.813 0.779 0.778 0.765 0.730 0.723 0.729
  Typicality P(c|e)  0.297 0.380 0.409 0.422 0.438 0.446 0.461
  Typicality P(e|c)  0.709 0.728 0.735 0.722 0.702 0.696 0.703
  Rep(e)             0.791 0.787 0.762 0.739 0.707 0.703 0.706
Smoothing = 1e-6
  MI(e)              0.516 0.510 0.515 0.526 0.546 0.563 0.579
  PMI3(e)            0.725 0.655 0.651 0.654 0.641 0.631 0.649
  NPMI(e)            0.791 0.766 0.732 0.728 0.673 0.659 0.668
  Typicality P(c|e)  0.297 0.380 0.409 0.422 0.438 0.446 0.461
  Typicality P(e|c)  0.495 0.516 0.520 0.508 0.512 0.521 0.540
  Rep(e)             0.758 0.784 0.767 0.755 0.691 0.686 0.694
Smoothing = 1e-7
  MI(e)              0.516 0.531 0.519 0.530 0.562 0.571 0.592
  PMI3(e)            0.725 0.664 0.652 0.658 0.630 0.631 0.647
  NPMI(e)            0.670 0.655 0.633 0.604 0.575 0.570 0.581
  Typicality P(c|e)  0.297 0.380 0.409 0.422 0.438 0.446 0.461
  Typicality P(e|c)  0.423 0.421 0.415 0.407 0.414 0.424 0.438
  Rep(e)             0.758 0.771 0.745 0.725 0.663 0.661 0.668

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

What is the Semantic Similarity?
• Are the following instance pairs similar?

bull ltapple microsoftgt

bull ltapple peargt

bull ltapple fruitgt

bull ltapple foodgt

bull ltapple ipadgt

bull ltcar journeygt

Approaches on Term Similarity

• Categories of approaches for semantic similarity
  • String based approach
  • Knowledge based approach
    • Use preexisting thesauri, taxonomy or encyclopedia such as WordNet
  • Corpus based approach
    • Use contexts of terms extracted from web pages, web search snippets or other text repositories
  • Embedding based approach
    • Will be introduced in detail in "Part 3: Implicit Understanding"

Approaches on Term Similarity (2)

• Categories

(Figure: a taxonomy of state-of-the-art approaches — string based approaches; knowledge based approaches (WordNet), split into path length / lexical chain-based and information content-based; and corpus based approaches, split into graph learning algorithm based and snippet search based — with representative work such as Rada 1989, Resnik 1995, Jcn 1997, Lin 1998, Sánch 2011, Agirre 2010, Alvarez 2007, HunTray 2005, Hirst 1998, Do 2009, Bol 2011, Chen 2006, Ban 2002.)

Term Similarity Using Semantic Networks [Li et al 2013, Li et al 2015]

• Framework

(Figure: the pipeline for a term pair <t1, t2>. Step 1, Type Checking — decide whether the pair is a concept pair, an entity pair, or a concept–entity pair. Step 2, Context Representation (vector) — collect entity-distribution contexts for concept pairs, concept-distribution contexts for entity pairs (with concept clustering and selection of the top-k concepts per cluster), and a concept collection for the entity side of a concept–entity pair. Step 3, Context Similarity — e.g. Cosine(T(t1), T(t2)) over context vectors, the maximum Cosine(Cx(t1), Cy(t2)) over cluster pairs, or max sim(t2, cx) over the concepts cx of t1.)

An example [Li et al 2013, Li et al 2015]

For example, <banana, pear>:
• Step 1, Type Checking: <banana, pear> is an entity pair
• Step 2, Context Representation (vector): collect the concept-distribution context of each entity
• Step 3, Context Similarity: Similarity Evaluation Cosine(T(t1), T(t2)) = 0.916

Examples

Term 1 | Term 2 | Similarity
lunch | dinner | 0.9987
tiger | jaguar | 0.9792
car | plane | 0.9711
television | radio | 0.9465
technology company | microsoft | 0.8208
high impact sport | competitive sport | 0.8155
employer | large corporation | 0.5353
fruit | green pepper | 0.2949
travel | meal | 0.0426
music | lunch | 0.0116
alcoholic beverage | sports equipment | 0.0314
company | table tennis | 0.0003

Complete results: http://adapt.seiee.sjtu.edu.cn/similarity/SimCompleteResults.pdf
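A minimal sketch of entity-pair similarity via concept-distribution contexts, assuming a toy isA table of P(concept | entity) rather than the Probase-scale network used in the tutorial:

```python
import math

def concept_vector(entity, isa):
    """Return the concept-distribution context T(entity) as concept -> weight."""
    return isa.get(entity, {})

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# toy P(concept | entity) distributions
isa = {"banana": {"fruit": 0.55, "tropical fruit": 0.25, "food": 0.20},
       "pear":   {"fruit": 0.60, "seasonal fruit": 0.15, "food": 0.25},
       "microsoft": {"company": 0.7, "software company": 0.3}}

print(cosine(concept_vector("banana", isa), concept_vector("pear", isa)))      # high
print(cosine(concept_vector("banana", isa), concept_vector("microsoft", isa))) # ~0
```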

Statistics of Search Queries

(The same charts shown earlier: query length by traffic — 1 term 44%, 2 terms 29%, 3 terms 17%, 4 terms 7%, 5 terms 2%, more than 5 terms 1% — and by # of distinct queries — 1 term 10%, 2 terms 26%, 3 terms 34%, 4 terms 19%, the remainder 5 or more terms — plus the breakdown by number of instances per query. Example queries: Pokémon Go, Microsoft HoloLens.)

If the short text has context for the instance…

• python tutorial
• dangerous python
• moon earth distance
• …

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Supervised Segmentation [Bergsma et al 2007]

• Problem: divide a query into semantic units
• Approach: turn segmentation into position-based binary classification

Example query: "two man power saw"
Candidate segmentations: [two man] [power saw]; [two] [man] [power saw]; [two] [man power] [saw]

Input: a query and its positions
Output: the decision for making a segmentation break at each position

Supervised Segmentation

• Features
  • Decision boundary features — e.g. indicator features, the POS tags in the query, position features (forward/backward)
  • Statistical features — e.g. mutual information between the left and right parts ("bank loan amortization schedule")
  • Context features — context information around the boundary
  • Dependency features — e.g. in "female bus driver", "female" depends on "bus driver"

Supervised Segmentation

• Segmentation Overview

(Figure: for the input query "two man power saw", learned features feed an SVM classifier, which outputs a segmentation decision (yes/no) for each position.)
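A minimal sketch of the position-based binary classification framing, assuming scikit-learn, a toy association statistic as the only feature, and a tiny hand-labelled training set (the real system uses many more features and much more data):

```python
from sklearn.svm import LinearSVC

# toy association statistic between adjacent words (stands in for MI/statistical features)
assoc = {("power", "saw"): 3.2, ("man", "power"): 0.1, ("two", "man"): 1.5}

def features(tokens, pos):
    """Features for the boundary between tokens[pos] and tokens[pos + 1]."""
    left, right = tokens[pos], tokens[pos + 1]
    return [assoc.get((left, right), 0.0),       # association across the boundary
            pos, len(tokens) - 2 - pos]          # forward / backward position

# (query, position, break?) — break = 1 means "insert a segment boundary here"
train = [(["two", "man", "power", "saw"], 0, 0),
         (["two", "man", "power", "saw"], 1, 1),
         (["two", "man", "power", "saw"], 2, 0),
         (["bank", "loan", "amortization", "schedule"], 1, 1)]

clf = LinearSVC().fit([features(q, p) for q, p, _ in train],
                      [label for _, _, label in train])

q = ["two", "man", "power", "saw"]
print([p for p in range(len(q) - 1) if clf.predict([features(q, p)])[0] == 1])
# positions where the classifier places a segment boundary
```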

Unsupervised Segmentation [Tan et al 2008]

bull Unsupervised learning for query segmentation

Probability of a generated segmentation S for query Q (with segments $s_i$):

$$P(S|Q) = P(s_1)\,P(s_2|s_1)\cdots P(s_m|s_1 s_2 \cdots s_{m-1}) \approx \prod_{s_i \in S} P(s_i) \qquad \text{(unigram model)}$$

A position is a valid segment boundary if and only if the pointwise mutual information between the two segments resulting from the split is negative:

$$MI(s_k, s_{k+1}) = \log\frac{P_c([s_k\ s_{k+1}])}{P_c(s_k) \cdot P_c(s_{k+1})} < 0$$

Example: in "new york times subscription", since $\log\frac{P_c([new\ york])}{P_c(new) \cdot P_c(york)} > 0$, there is no segment boundary between "new" and "york".

Unsupervised Segmentation

• Find the top-k segmentations: dynamic programming
• Using EM optimization on the fly

Input: query $w_1 w_2 \cdots w_n$ (the words in the query) and a concept probability distribution
Output: the top-k segmentations with the highest likelihood
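A minimal sketch of unigram-model segmentation by dynamic programming, assuming toy segment probabilities P(s) with a small back-off for unseen segments (the real system estimates these from concepts/corpora and returns the top k, not just the best one):

```python
import math
from functools import lru_cache

P = {"new york": 3e-4, "new york times": 2e-4, "times": 1e-3,
     "subscription": 5e-4, "new": 2e-3, "york": 1e-4}

def logp(segment):
    return math.log(P.get(segment, 1e-9))      # back-off for unseen segments

def best_segmentation(words):
    @lru_cache(maxsize=None)
    def best(i):
        """Best (log-probability, segmentation) of words[i:]."""
        if i == len(words):
            return 0.0, []
        candidates = []
        for j in range(i + 1, len(words) + 1):
            seg = " ".join(words[i:j])
            score, rest = best(j)
            candidates.append((logp(seg) + score, [seg] + rest))
        return max(candidates)                 # maximize total log-probability
    return best(0)[1]

print(best_segmentation("new york times subscription".split()))
# ['new york times', 'subscription'] with these toy probabilities
```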

Exploit Click-through [Li et al 2011]

• Motivation
  • Probabilistic query segmentation
  • Use click-through data (query → clicked URL → document)

Input query: "bank of america online banking"
Output: top-3 segmentations
  [bank of america] [online banking]     0.502
  [bank of america online banking]       0.428
  [bank of] [america] [online banking]   0.001

Exploit Click-through

• Segmentation Model
  • An interpolated model combining global information with click-through information

(Figure: for the query "credit card bank of america", the clicked HTML documents — "bank of america credit cards contact us overview", "secured visa credit card from bank of america", "credit cards overview: find the right bank of america credit card for you" — support the segmentation [credit card] [bank of America].)

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Sense Changes with Different Context

watch harry potter read harry potter age harry potter

Movie Book Character

harry potter walkthrough

Game

Entity Recognition in Query [Guo et al 2009]

• Motivation: detect the named entity in a short text and categorize it

Example: the single-named-entity query "harry potter walkthrough" is represented as the triple <e, t, c> = ("harry potter", "walkthrough", "game"), where e is the (possibly ambiguous) named entity, t the context term(s), and c the class of the entity.

Entity Recognition in Query

• Probabilistic Generative Model
  • Goal: given a query q, find the triple <e, t, c> that maximizes the probability of generating the triple, $\Pr(e, t, c) = \Pr(e)\Pr(c|e)\Pr(t|c)$, assuming the context depends only on the class (e.g. "walkthrough" depends only on "game", not on "harry potter")
  • Objective: given query q, find $\arg\max_{(e,t,c)} \Pr(e)\Pr(c|e)\Pr(t|c)$
  • The problem then becomes how to estimate Pr(e), Pr(c|e) and Pr(t|c)

Entity Recognition in Query

• Probability Estimation by Learning
  • Learning objective: $\max \prod_{i=1}^{N} \Pr(e_i, t_i, c_i)$
  • Challenge: it is difficult as well as time consuming to manually assign class labels to named entities in queries
  • Build a training set $T = \{(e_i, t_i)\}$ and view $c_i$ as a hidden variable
  • New learning problem: $\max \prod_{i=1}^{N} \Pr(e_i, t_i) = \max \prod_{i=1}^{N} \sum_{c} \Pr(e_i)\Pr(c|e_i)\Pr(t_i|c)$
  • Solved with the topic model WS-LDA
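A minimal sketch of scoring candidate <entity, context, class> triples with Pr(e)·Pr(c|e)·Pr(t|c), assuming tiny hand-made probability tables (the real model estimates them with WS-LDA over query logs):

```python
pr_e = {"harry potter": 0.6, "harry": 0.4}
pr_c_given_e = {"harry potter": {"book": 0.4, "movie": 0.4, "game": 0.2},
                "harry": {"person": 1.0}}
pr_t_given_c = {"game": {"walkthrough": 0.3, "cheats": 0.2},
                "book": {"read": 0.3, "author": 0.2},
                "movie": {"watch": 0.4, "cast": 0.2},
                "person": {}}

def candidate_triples(query):
    """Enumerate (e, t, c): e is a contiguous span, t the remaining words."""
    words = query.split()
    for i in range(len(words)):
        for j in range(i + 1, len(words) + 1):
            e, t = " ".join(words[i:j]), " ".join(words[:i] + words[j:])
            if e in pr_e:
                for c in pr_c_given_e[e]:
                    yield e, t, c

def score(e, t, c):
    return pr_e[e] * pr_c_given_e[e][c] * pr_t_given_c[c].get(t, 1e-6)

best = max(candidate_triples("harry potter walkthrough"), key=lambda etc: score(*etc))
print(best)   # ('harry potter', 'walkthrough', 'game') with these toy tables
```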

Signal from Click [Pantel et al 2012]

• Motivation: predict entity type in Web search — from the entity, its context, the user intent and the click signal — over a query type distribution (73 types), using a generative model

Signal from Click

• Joint Model for Prediction

(Figure: plate notation of the joint generative model over a query — pick a type from the distribution over types, pick an entity from the entity distribution, pick an intent from the intent distribution, pick context words from the word distribution, and pick a click from the host distribution; variables t, τ, i, n, c with distributions θ, φ, ω.)

Telegraphic Query interpretation [Sawant et al 2013 Joshi et al 2014]

• Entity-seeking Telegraphic Queries
• Interpretation = Segmentation + Annotation

(Figure: the query "Germany capital" is answered with the result entity "Berlin" by combining a knowledge base, for accuracy, with a large corpus, for recall.)

• Overview

Joint Interpretation and Ranking [Sawant et al 2013 Joshi et al 2014]

(Figure: a telegraphic query is matched against an annotated corpus; two models for joint interpretation and ranking — a generative model and a discriminative model — output a ranked list of candidate entities e1, e2, e3, ….)

Joint Interpretation and Ranking [Sawant et al 2013]

• Generative Model — based on probabilistic language models

(Figure, borrowed from U. Sawant (2013): for the query q = "losing team baseball world series 1998", the answer entity E = San Diego Padres is generated through its type T = "major league baseball team", matched against the type hint "baseball team", and through context matchers over corpus snippets such as "Padres have been to two World Series, losing in 1984 and 1998"; a switch variable Z decides whether each query word comes from the type model or the context model.)

Joint Interpretation and Ranking [Sawant et al 2013]

• Discriminative Model — based on max-margin discriminative learning

(Figure: for the query "losing team baseball world series 1998", interpretations that pair the correct entity San_Diego_Padres with the target type t = "baseball team" are ranked above interpretations built around an incorrect entity such as 1998_World_Series with t = "series".)

Telegraphic Query Interpretation [Joshi et al 2014]

• Queries seek answer entities (e2)
• Contain (query) entities (e1), target types (t2), relations (r) and selectors (s)

(Table, borrowed from M. Joshi (2014), with example decompositions: "dave navarro first band" — e1 = dave navarro, t2 = band, s = first, with r either "band" or unspecified ("-"); "spider automobile company" — t2 = automobile company, with "spider" either the query entity e1 or a selector.)

Improved Generative Model

• The generative model of [Sawant et al 2013] is extended in [Joshi et al 2014] to also consider the query entity e1 (in q) and the relation r

Improved Discriminative Model

• The discriminative model of [Sawant et al 2013] is likewise extended in [Joshi et al 2014] to consider e1 (in q) and r

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

• Input: a short text
• Output: semantic interpretation
• Three steps in understanding a short text, e.g. "wanna watch eagles band" → watch[verb] eagles[entity](band) band[concept]:
  • Step 1, Text Segmentation – divide the text into a sequence of terms in the vocabulary: "watch eagles band"
  • Step 2, Type Detection – determine the best type of each term: watch[verb] eagles[entity] band[concept]
  • Step 3, Concept Labeling – infer the best concept of each entity within context: eagles[entity](band)

Text segmentation
• Observations
  • Mutual Exclusion – terms containing the same word mutually exclude each other
  • Mutual Reinforcement – related terms mutually reinforce each other
• Build a Candidate Term Graph (CTG)

(Figure: CTGs for "vacation april in paris" — candidate terms april in paris, vacation, april, paris — and "watch harry potter" — candidate terms watch and harry potter, among others — with node weights and edge weights between compatible terms.)

Find best segmentation

• Best segmentation = the sub-graph of the CTG which
  • Is a complete graph (clique)
  • Has no mutual exclusion
  • Has 100% word coverage (except for stopwords)
  • Has the largest average edge weight

(Figure: among the sub-graphs that form a valid segmentation, the best one is highlighted for "vacation april in paris" and "watch harry potter".)

Find best segmentation

• The best segmentation is therefore a maximal clique in the CTG: complete, with no mutual exclusion, covering all non-stopwords, and with the largest average edge weight

(Figure: the same two CTGs with the maximal clique corresponding to the best segmentation highlighted.)
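A minimal sketch of clique-based segmentation over a toy candidate term graph (as described above): enumerate subsets of candidate terms, keep those that are pairwise compatible and cover every non-stopword, and return the one with the largest average edge weight. The term scores are illustrative only.

```python
from itertools import combinations

STOPWORDS = {"in", "the", "a"}

def compatible(t1, t2):
    return not (set(t1.split()) & set(t2.split()))    # mutual exclusion

def segment(query, candidate_terms, edge_weight):
    words = [w for w in query.split() if w not in STOPWORDS]
    best, best_score = None, float("-inf")
    for k in range(1, len(candidate_terms) + 1):
        for subset in combinations(candidate_terms, k):
            # clique: every pair of chosen terms must be compatible
            if any(not compatible(a, b) for a, b in combinations(subset, 2)):
                continue
            covered = set(w for t in subset for w in t.split())
            if not all(w in covered for w in words):    # 100% word coverage
                continue
            pairs = list(combinations(subset, 2))
            score = (sum(edge_weight(a, b) for a, b in pairs) / len(pairs)
                     if pairs else 0.0)
            if score > best_score:
                best, best_score = subset, score
    return best

weights = {frozenset({"watch", "harry potter"}): 0.092,
           frozenset({"watch", "harry"}): 0.014,
           frozenset({"watch", "potter"}): 0.018,
           frozenset({"harry", "potter"}): 0.053}
edge = lambda a, b: weights.get(frozenset({a, b}), 0.0)

print(segment("watch harry potter", ["watch", "harry potter", "harry", "potter"], edge))
# ('watch', 'harry potter') — the clique with the best average edge weight
```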

Type Detection

• Pairwise Model
  • Find the best typed-term for each term so that the Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight

(Figure: for "watch free movie", each term has several candidate typed-terms — watch[v], watch[e], watch[c]; free[adj], free[v]; movie[c], movie[e] — and the pairwise model picks one per term.)

Concept Labeling

• Entity disambiguation is the most important task of concept labeling
  • Filter/re-rank the original concept cluster vector
• Weighted-Vote
  • The final score of each concept cluster is a combination of its original score and the support from context, using concept co-occurrence
  • E.g. "watch harry potter" → movie; "read harry potter" → book

Example of Entity Disambiguation[Hua et al 2015 (ICDE Best Paper) Hua et al 2016]

(Figure: the conceptualization pipeline for the short text "ipad apple". Parsing, term clustering by isA over the semantic network, concept filtering by co-occurrence over the co-occurrence network, head/modifier analysis and concept orthogonalization produce a concept vector (c1, p1), (c2, p2), (c3, p3), …. For "ipad apple", the isA concepts of "apple" — fruit, company, food, product, … — are filtered by their co-occurrence with the concepts of "ipad" — product, device, brand, … — so the company/brand senses survive.)
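A minimal sketch of the weighted-vote filtering step: each concept's original score is combined with co-occurrence support from the context's concepts. Concept vectors and the co-occurrence table are toy values.

```python
def rerank(target_concepts, context_concepts, cooccur, alpha=0.5):
    """Combine each concept's original score with support from the context."""
    reranked = {}
    for c, p in target_concepts.items():
        support = sum(q * cooccur.get(frozenset({c, c_ctx}), 0.0)
                      for c_ctx, q in context_concepts.items())
        reranked[c] = alpha * p + (1 - alpha) * support
    total = sum(reranked.values()) or 1.0
    return {c: s / total for c, s in sorted(reranked.items(), key=lambda kv: -kv[1])}

apple = {"fruit": 0.45, "company": 0.35, "food": 0.20}   # concepts of "apple"
ipad = {"device": 0.6, "product": 0.4}                    # concepts of "ipad"
cooccur = {frozenset({"company", "device"}): 0.9,
           frozenset({"company", "product"}): 0.7,
           frozenset({"fruit", "device"}): 0.05,
           frozenset({"food", "device"}): 0.05}

print(rerank(apple, ipad, cooccur))
# "company" overtakes "fruit" once the context concepts of "ipad" vote
```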

Mining Lexical Relationships[Wang et al 2015b]

• Lexical knowledge represented by probabilities

(Figure: for "watch harry potter" — p(verb | watch), p(instance | watch), p(movie | harry potter), p(book | harry potter) and p(movie | watch, verb) connect terms to roles and concepts such as verb, product, book, movie.)

Three kinds of knowledge are used:
① $p(c \mid t, z)$  ② $p(c \mid e) = p(c \mid t, z = instance)$  ③ $p(z \mid t)$
where e = instance, t = term, c = concept, z = role.

Understanding Queries [Wang et al 2015b]

• Goal: to rank the concepts and find $\arg\max_{c} p(c \mid t, q)$

(Figure: the query and all its possible segmentations are matched against the offline semantic network to build an online subgraph, and concepts are ranked by Random Walk with Restart [Sun et al 2005] on that online subgraph.)
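A minimal sketch of Random Walk with Restart on a small subgraph, assuming a toy adjacency matrix over query terms and candidate concepts (nodes and weights are illustrative only):

```python
import numpy as np

def rwr(adjacency, seed, restart=0.15, iters=100):
    """Return RWR scores for a restart distribution `seed`."""
    A = np.asarray(adjacency, dtype=float)
    col_sums = A.sum(axis=0)
    P = A / np.where(col_sums == 0, 1, col_sums)      # column-normalized transitions
    r = seed.copy()
    for _ in range(iters):
        r = (1 - restart) * P @ r + restart * seed
    return r

nodes = ["watch", "harry potter", "movie", "book", "verb"]
A = np.array([[0, 0, 1, 0, 2],     # watch        — movie, verb
              [0, 0, 3, 2, 0],     # harry potter — movie, book
              [1, 3, 0, 0, 0],
              [0, 2, 0, 0, 0],
              [2, 0, 0, 0, 0]])
seed = np.array([0.5, 0.5, 0.0, 0.0, 0.0])   # restart mass on the query terms

for n, s in sorted(zip(nodes, rwr(A, seed)), key=lambda x: -x[1]):
    print(f"{n}: {s:.3f}")
# "movie" scores highest among the concept nodes for "watch harry potter"
```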

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

• Example: "popular smart cover iphone 5s"
• Definition:
  • Head: names the general (semantic) category to which the whole short text belongs; usually the head is the intent of the short text
    • "smart cover" — the intent of the query
  • Constraints: distinguish this member from other members of the same category
    • "iphone 5s" — limits the type of the head
  • Non-Constraint Modifiers (aka Pure Modifiers): subjective modifiers which can be dropped without changing the intent
    • "popular" — subjective, can be neglected

Non-Constraint Modifiers Mining: Construct Modifier Networks

• Edges form a Modifier Network

(Figure: from the concept hierarchy tree in the "Country" domain — country, Asian country, developed country, Western country, Western developed country, large Asian country, large/top developed country, top western country, … — a modifier network over {Asian, Western, Developed, Large, Top} is constructed; in this case "Large" and "Top" are pure modifiers.)

Non-Constraint Modifiers Mining: Betweenness Centrality

• Betweenness centrality is a measure of a node's centrality in a network
• The betweenness of node v is defined as
  $$g(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$$
  where $\sigma_{st}$ is the total number of shortest paths from node s to node t and $\sigma_{st}(v)$ is the number of those paths that pass through v
• Normalization & Aggregation
  • A pure modifier should have a low betweenness-centrality aggregation score PMS(t)
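A minimal sketch of spotting pure modifiers by low betweenness centrality, using networkx on a toy modifier network for the "Country" domain (the edge choices are illustrative only):

```python
import networkx as nx

G = nx.Graph()
# edges between modifiers that co-occur in concepts like "Western developed country"
G.add_edges_from([("Asian", "Developed"), ("Western", "Developed"),
                  ("Asian", "Western"),
                  ("Large", "Asian"), ("Large", "Developed"),
                  ("Top", "Western"), ("Top", "Developed")])

centrality = nx.betweenness_centrality(G, normalized=True)
for modifier, score in sorted(centrality.items(), key=lambda kv: kv[1]):
    print(f"{modifier}: {score:.3f}")
# "Large" and "Top" sit on no shortest paths between other modifiers, so their
# scores are low — the signature of pure modifiers in this construction.
```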

Head-Constraints Mining [Wang et al 2014b]

• A term can be a head in some cases and a constraint in others
• E.g. "Seattle hotel" — Seattle (constraint), hotel (head); "Seattle hotel job" — Seattle (constraint), hotel (constraint), job (head)

Head-Constraints Mining Acquiring Concept Patterns

(Figure: building the concept pattern dictionary from query logs. Extract patterns such as "A for B", "A of B", "A with B", "A in B", "A on B", "A at B", … from queries like "cover for iphone 6s", "battery for sony a7r", "wicked on broadway", which yields entity pairs (entity1 = head, entity2 = constraint) per preposition. Conceptualize both entities — entity1 into concept11, concept12, concept13, concept14, …, entity2 into concept21, concept22, concept23, … — and record the concept pairs (concept11, concept21), (concept11, concept22), (concept11, concept23), … in a Concept Pattern Dictionary.)

Why Concepts Can't Be Too General
• It may cause too many concept pattern conflicts: we can't distinguish head and modifier for general concept pairs

Derived concept pattern (head = device, modifier = company), supporting entity pairs: iphone 4 / verizon, modem / comcast, wireless router / comcast, iphone 4 / tmobile

Derived concept pattern (head = company, modifier = device), supporting entity pairs: amazon books / kindle, netflix / touchpad, skype / windows phone, netflix / ps3

→ The two patterns conflict.

Why Concepts Can't Be Too Specific
• It may generate concepts with little coverage
  • The concept regresses to the entity
  • Large storage space: up to (million × million) patterns
  • E.g. patterns like (device, largest desktop OS vendor), (device, largest software development company), (device, largest global corporation), (device, latest windows and office provider), …

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept Patterns (columns: cluster size, sum of cluster scores, head/constraint, score)

615 2114691 breedstate357298460224501

296 7752357 gameplatform627403476771856

153 3466804 accessoryvehicle53393705094809

70 118259 browserplatform132612807637391

22 1010993 requirementschool271407526294823

34 9489159 drugdisease154602405333541

42 8992995 cosmeticskin condition814659415003929

16 7421599 jobcity27903732555528

32 710403 accessoryphone246513830851194

18 6692376 softwareplatform210126322725878

20 6444603 testdisease239774028397537

27 5994205 clothesbreed98773996282851

19 5913545 penaltycrime200544192793488

25 5848804 taxstate240081818612579

16 5465424 saucemeat183592863621553

18 4809389 credit cardcountry142919087972152

14 4730792 foodholiday14554140330924

11 4536199 modgame257163856882439

29 4350954 garmentsport471533326845442

23 3994886 career informationprofessional732726483731257

15 386065 songinstrument128189481818135

18 378213 baitfish780426514113169

22 3722948 study guidebook508339765053921

19 3408953 pluginsbrowser550326072627126

14 3305753 recipemeat882779863422951

18 3214226 currencycountry110825444188352

13 3180272 lenscamera186081673263957

9 316973 decorationholiday130055844126533

16 314875 foodanimal7338544366514

(Example: the "game / platform" concept pattern cluster, with related patterns such as game / device, video game / platform, game / gaming platform, game console, game pad; supporting entity pairs (Game as head, Platform as modifier) include angry birds / android, angry birds / ios, angry birds / windows 10, ….)

Head Modifier Relationship Detection

• Train a classifier on (head-embedding, modifier-embedding)
• Training data
  • Positive: (head, modifier)
  • Negative: (modifier, head)
• Precision ≥ 0.9, Recall ≥ 0.9
• Disadvantage: not interpretable
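A minimal sketch of the order classifier above, assuming toy 4-dimensional word embeddings and only a few training pairs (real systems use pretrained embeddings and far more data):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

emb = {"game": np.array([0.9, 0.1, 0.0, 0.2]), "platform": np.array([0.1, 0.8, 0.1, 0.0]),
       "cover": np.array([0.7, 0.2, 0.1, 0.1]), "iphone": np.array([0.2, 0.7, 0.3, 0.0]),
       "recipe": np.array([0.8, 0.0, 0.2, 0.3]), "meat": np.array([0.1, 0.6, 0.2, 0.1])}

def pair_features(head, modifier):
    return np.concatenate([emb[head], emb[modifier]])

# positive = correct (head, modifier) order; negative = the swapped order
pairs = [("game", "platform"), ("cover", "iphone"), ("recipe", "meat")]
X = [pair_features(h, m) for h, m in pairs] + [pair_features(m, h) for h, m in pairs]
y = [1] * len(pairs) + [0] * len(pairs)

clf = LogisticRegression().fit(X, y)
print(clf.predict([pair_features("cover", "iphone"),
                   pair_features("iphone", "cover")]))   # expect [1, 0]
```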

Syntactic Parsing based on HM

• Head/modifier information is incomplete
  • Prepositions and other function words are missing
  • So is the structure within a noun compound, e.g. "el capitan macbook pro"
• Why not train a parser for web queries?

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges: Short Texts Lack Grammatical Signals
• Lack of function words and word order
  • "toys queries" has ambiguous intent
  • "distance earth moon" has clear intent, but many equivalent forms: "earth moon distance", "earth distance moon", …

Challenges Syntactic Parsing of Queries

• No standard
• No ground-truth
• Why is syntactic parsing of queries even a legitimate problem?

Derive Syntax from Semantics [Sun et al 2016]

• Query: "thai food houston"
• Clicked sentence (parsed with a standard dependency parser)
• Project the dependencies onto the query

A Treebank for Short Texts

• Given a query q
• Given q's clicked sentence s
• Parse each s
• Project the dependencies from s to q (see the sketch below)
• Aggregate the dependencies
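A minimal sketch of the projection step: keep an arc of the parsed clicked sentence when both of its words occur in the query. The parse of the sentence is hard-coded here; a real pipeline would obtain it from a dependency parser, and the arc labels below are illustrative.

```python
def project(query, parsed_sentence):
    """parsed_sentence: list of (dependent, head, label) word triples."""
    q_words = set(query.split())
    arcs = []
    for dep, head, label in parsed_sentence:
        if dep.lower() in q_words and head.lower() in q_words:
            arcs.append((dep.lower(), head.lower(), label))
    return arcs

# toy parse of a clicked sentence for the query "thai food houston"
sentence_parse = [("Thai", "food", "amod"),
                  ("food", "restaurants", "compound"),
                  ("restaurants", "find", "obj"),
                  ("Houston", "food", "nmod"),
                  ("in", "Houston", "case")]

print(project("thai food houston", sentence_parse))
# [('thai', 'food', 'amod'), ('houston', 'food', 'nmod')]
```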

Algorithm of Projection

Result Examples

Results

• Random queries: QueryParser UAS 0.83, LAS 0.75 vs. Stanford UAS 0.72, LAS 0.64
• Queries with no function words: QueryParser UAS 0.82, LAS 0.73 vs. Stanford UAS 0.70, LAS 0.61
• Queries with function words: QueryParser UAS 0.90, LAS 0.85 vs. Stanford UAS 0.86, LAS 0.80

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

• Measuring similarity between two short texts or sentences
• Basic idea: word-by-word comparison using embedding vectors
• Use a saliency-weighted semantic graph to compute similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired: bins of all edges and bins of max edges of the semantic graph, plus a BM25-inspired saliency-weighted semantic similarity between a longer text $s_l$ and a shorter text $s_s$:

$$f_{sts}(s_l, s_s) = \sum_{w \in s_l} IDF(w) \cdot \frac{sem(w, s_s) \cdot (k_1 + 1)}{sem(w, s_s) + k_1 \cdot \left(1 - b + b \cdot \frac{|s_s|}{avgsl}\right)}$$

where $sem(w, s_s)$ is the semantic similarity of term w to the short text $s_s$, and $k_1$, $b$, $avgsl$ play the same roles as in BM25.
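A minimal sketch of the saliency-weighted similarity above, assuming toy 2-dimensional word embeddings and uniform IDF values; negative embedding similarities are clamped to zero to keep the BM25-style ratio well behaved.

```python
import numpy as np

emb = {"cheap": np.array([0.9, 0.1]), "flights": np.array([0.1, 0.9]),
       "low": np.array([0.85, 0.2]), "cost": np.array([0.8, 0.3]),
       "airfare": np.array([0.2, 0.95]), "pizza": np.array([-0.5, 0.1])}
idf = {w: 1.0 for w in emb}

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def sem(w, short_text):
    """Best embedding similarity of w against any word of the short text."""
    return max(cos(emb[w], emb[x]) for x in short_text)

def f_sts(s_l, s_s, k1=1.2, b=0.75, avgsl=3.0):
    score = 0.0
    for w in s_l:
        s = max(0.0, sem(w, s_s))          # clamp negative similarities
        score += idf[w] * (s * (k1 + 1)) / (s + k1 * (1 - b + b * len(s_s) / avgsl))
    return score

print(f_sts(["cheap", "flights"], ["low", "cost", "airfare"]))   # high
print(f_sts(["cheap", "flights"], ["pizza"]))                    # much lower
```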

From the Concept View [Wang et al 2015a]

(Figure: each short text is conceptualized — parsing, term clustering by isA over the semantic network, concept filtering by co-occurrence over the co-occurrence network, head/modifier analysis, concept orthogonalization — into a bag of concepts, i.e. a concept vector [(c1, score1), (c2, score2), …]; the similarity of Short Text 1 and Short Text 2 is then computed between Concept Vector 1 and Concept Vector 2.)

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

• Explicit short text understanding benefits a lot of application scenarios
  • Ads/search semantic match
  • Definition mining
  • Query recommendation
  • Web table understanding
  • Semantic search
  • …

Ads Keyword Selection [Wang et al 2015a]

(Figure: two bar charts of performance over traffic deciles 4–10, one for Mainline Ads and one for Sidebar Ads.)

Definition Mining [Hao et al 2016]

• Definition scenarios: search engines, QnA, etc.
• Why conceptualization is useful for definition mining — example: "What is Emphysema?"
  • Answer 1: "Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year."
  • Answer 2: "Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing."
  • These sentences have the form of a definition; embedding is helpful to some extent, but it also returns a high similarity score for (emphysema, disease) and for (emphysema, smoking)
  • Conceptualization can provide strong semantics; contextual embedding can also provide semantic similarity beyond is-A

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

(Figure: system overview. Offline, training data for each class (Class 1 … Class i … Class N) is used for concept weighting and model learning, producing a concept model per class. Online, an original short text such as "justin bieber graduates" goes through entity extraction, conceptualization against the knowledge base, and candidate generation to build a concept vector, which is then classified and ranked against the per-class concept models to produce outputs like <Music, Score>.)

Concept based Short Text Classification and Ranking [Wang et al 2014a]

(Figure: each category, e.g. TV, Music, Movie, is represented in the concept space by a weighted concept vector ω built from the article titles and tags in that category; a query is mapped into the same concept space as a vector p and compared against the category vectors.)
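A minimal sketch of concept-based classification: represent each category and each short text as a concept vector and rank categories by cosine similarity. The concept vectors below are toy values, not learned weights.

```python
import math

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# per-category concept models (omega), e.g. built from article titles/tags
category_models = {"Music": {"singer": 0.6, "album": 0.3, "celebrity": 0.1},
                   "TV":    {"show": 0.5, "channel": 0.3, "celebrity": 0.2},
                   "Movie": {"film": 0.6, "actor": 0.3, "celebrity": 0.1}}

# concept vector (p) of the short text "justin bieber graduates",
# e.g. obtained by conceptualizing the entity "justin bieber"
text_vector = {"singer": 0.55, "celebrity": 0.35, "album": 0.10}

for category, model in sorted(category_models.items(),
                              key=lambda kv: cosine(text_vector, kv[1]), reverse=True):
    print(category, round(cosine(text_vector, model), 3))
# "Music" ranks first for this concept vector
```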

Precision performance on each category [Wang et al 2014a]

Category | BocSTC | LM_ch | SVM  | VSM_cosine | LM_d | Entity_ESA
Movie    | 0.71   | 0.91  | 0.84 | 0.81       | 0.72 | 0.56
Money    | 0.97   | 0.95  | 0.54 | 0.57       | 0.52 | 0.74
Music    | 0.97   | 0.90  | 0.88 | 0.73       | 0.68 | 0.58
TV       | 0.96   | 0.46  | 0.92 | 0.56       | 0.51 | 0.55

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

• [ Bollacker et al 2008 ] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, Jamie Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In SIGMOD 2008

• [ Auer et al 2007 ] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, Zachary G Ives. DBpedia: A Nucleus for a Web of Open Data. In ISWC/ASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

• [ Nastase et al 2010 ] Vivi Nastase, Michael Strube, Benjamin Börschinger, Cäcilia Zirn and Anas Elghafari. WikiNet: A very large scale multi-lingual concept network. In LREC 2010

• [ Speer et al 2013 ] Robert Speer and Catherine Havasi. ConceptNet 5: A large semantic network for relational knowledge. The People's Web Meets NLP, Springer Berlin Heidelberg, 2013

• [ Hua et al 2016 ] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng and Xiaofang Zhou. "Understand Short Texts by Harvesting and Analyzing Semantic Knowledge". IEEE Transactions on Knowledge and Data Engineering (TKDE), 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

• [ Li et al 2015 ] Peipei Li, Haixun Wang, Kenny Q Zhu, Zhongyuan Wang, Xue-Gang Hu and Xindong Wu. A Large Probabilistic Semantic Network based Approach to Compute Term Similarity. In IEEE Transactions on Knowledge and Data Engineering (TKDE), 27(10): 2604-2617, 2015

• [ Rosch et al 1976 ] Eleanor Rosch, Carolyn B Mervis, Wayne D Gray, David M Johnson and Penny Boyes-Braem. Basic objects in natural categories. Cognitive Psychology, 8(3):382–439, 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

• [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke. Short text similarity with word embeddings. In CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

• [ Wang et al 2012b ] Yue Wang, Hongsong Li, Haixun Wang and Kenny Zhu. Toward Topic Search on the Web. In International Conference on Conceptual Modeling, October 2012

Page 24: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Statistics of Search Queries

44

29

17

7

2 1

1 Term2 Terms3 Terms4 Terms5 Termsmore than 5 Terms

10

26

34

19

74

1 Term2 Terms3 Terms4 Terms5 Termsmore than 5 Terms

(a) By traffic

(b) By of distinct queries

Pokeacutemon Go Microsoft HoloLens

Instance 1 Instance 21 Instance2 Instances3 Instances4 Instances5 InstancesMore than 5 Instances

1 Instance2 Instances3 Instances4 Instances5 InstancesMore than 5 Instances

If the short text is a single instancehellip

bull Pythonbull Microsoftbull Applebull hellip

Single Instance Understanding

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

Word Ambiguity bull Word sense disambiguation rely on dictionaries

(WordNet)

Take a seat on this chair

The chair of the Math Department

Instance Ambiguity

bull Instance sense disambiguation extra knowledge needed

I have an apple pie for lunch

He bought an apple ipad

Here ldquoapplerdquo is a proper noun

Ambiguity [Hua et al 2016]

bull Many instances are ambiguous

bull Intuition ambiguous instances have multiple senses

short text instance sense

population china china country

glass vs china china fragile item

pear apple apple fruit

microsoft apple apple company

read harry potter harry potter book

watch harry potter harry potter movie

age of harry potter harry potter character

Pre-definition for Ambiguity (1) Sense [Hua et al 2016]

bull What is a Sense in semantic networksbull A sense as a hierarchy of concept clusters

region

country state city

creature

animal

predator

crop food

fruit vegetable meat

Germany

Pre-definition for Ambiguity (2) Concept Cluster [Li et al 2013 Li et al 2015]

bull What is a Concept Cluster (CL)bull Cluster similar concepts into a concept cluster using K-

Means like approach (k-Medoids)

FruitFresh fruit

JuiceTropical fruit

BerryExotic fruit

Seasonal fruitFruit juiceCitrus fruitSoft fruitDry fruit

Wild fruitLocal fruit

hellip

company

CompanyClientFirm

ManufacturerCorporation

large companyRivalGiant

big companylocal company

large corporationinternational

companyhellip Fruit

Definitions of Instance Ambiguity [Hua et al 2016]

bull 3 levels of instance ambiguitybull Level 0 unambiguous

bull Contains only 1 sensebull Eg dog (animal) beijing (city) potato (vegetable)

bull Level 1 unambiguous and ambiguous both make sensebull Contains 2 or more senses but these senses are relatedbull Eg google (company amp search engine) french (language amp

country) truck(vehicle amp public transport service)

bull Level 2 ambiguous bull Contains 2 or more senses and the senses are very different from

each otherbull Eg apple (fruit amp company) jaguar(animal amp company) python

(animal amp language)

Ambiguity Score

bull Using top-2 senses to calculate the ambiguity score

119904119888119900119903119890 =

0 119897119890119907119890119897 = 0119908 1199042 119890

119908 1199041 119890lowast 1 minus 119904119894119898119894119897119886119903119894119905119910 1199041 1199042 119897119890119907119890119897 = 1

score = 1 +119908(1199041198882|119890)

119908(1199041198881|119890)lowast 1 minus 119904119894119898119894119897119886119903119894119905119910 1199041198881 1199041198882 119897119890119907119890119897 = 2

Denote top-2 senses as 1199041 and 1199042 top-2 sense clusters as 1199041198881 and 1199041198882 Denote similarity of two sense clusters as the maximum similarity of their senses

119904119894119898119894119897119886119903119894119905119910 1199041198881 1199041198882 = 119950119938119961119904119894119898119894119897119886119903119894119905119910(119904119894 isin 1199041198881 119904119895 isin 1199041198882) For an entity 119890 denote the weight (popularity) of a sense 119904119894 as the sum of weights of its concept clusters

119908 119904119894|119890 = 119908 119867119894|119890 =119862119871119895isin119867119894

119875(119862119871119895|119890)

For an entity 119890 denote the weight (popularity) of a sense cluster 119904119888119894 as the sum of weights of its senses

119908 119904119888119894 119890 =119904119895isin119904119888119894

119908(119904119895|119890)

Examples

bull Level 0bull california

bull country state city region institution 0943bull fruit

bull food product snack carbs crop 0827bull alcohol

bull substance drug solvent food addiction 0523bull computer

bull device product electronics technology appliance 0537bull coffee

bull beverage product food crop stimulant 073bull potato

bull vegetable food crop carbs product 0896bull bean

bull food vegetable crop legume carbs 0801

Examples (cont)bull Level 1

bull nike score = 0034bull company store 0861bull brand 0035bull shoe product 0033

bull twitter score = 0035bull website tool 0612bull network 0165bull application 0033bull company 0031

bull facebook score = 0037bull website tool 0595bull network 017bull company 0053bull application 0029

bull yahoo score = 038bull search engine 0457bull company provider account 0281bull website 00656

bull google score = 0507bull search engine 046bull company provider organization 0377bull website 00449

Examples (cont)

bull Level 2bull jordan score = 102

bull country state company regime 092bull shoe 002

bull fox score = 109bull animal predator species 074bull network 0064bull company 0035

bull puma score = 115bull brand company shoe 0655bull species cat 0116

bull gold score = 121bull metal material mineral resource mineral062bull color 0128

Examples (cont)

bull Level 2bull soap score = 122

bull product toiletry substance 049bull technology industry standard 011

bull silver score = 124bull metal material mineral resource mineral 0638bull color 0156

bull python score = 129bull language 0667bull snake animal reptile skin 0193

bull apple score = 141bull fruit food tree 0537bull company brand 0271

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

A Concept View of ldquoMicrosoftrdquo

company

largest desktop OS vendor

softwarecompany

international company

technology leader

Microsoft

largest desktop OS vendorcompany hellip hellip

software company

Basic-level Conceptualization (BLC)[Rosch et al 1976]

KFC

BMW

Basic-level conceptualization

How to Make BLC

bull Naive approachesbull Typicality an important measure for understanding the

relationship between an object and its concept

bull Pointwise Mutual Information (PMI) a common measure of the strength of association between two terms

bird

Naive Approach 1 Typicality

P(robin|bird) gt P(penguin|bird)ldquorobinrdquo is a more typical bird than a ldquopenguinrdquo

country

SeychellesUSA

P(USA|country) gt P(Seychelles|country)ldquoUSArdquo is a more typical country than ldquoSeychellesrdquo

penguinrobin

Using Typicality for BLC

bull Associate each isA relationship (119890 is 119888) with typicality scores 119875 119890 119888 and 119875 119888 119890

119875 119890 119888 =119899 119888 119890

119899 119888119875(119888|119890) =

119899 119888 119890

119899(119890)

bull P(e|c) indicates how typical (or popular) e is in the given concept c

bull P(c|e) indicates how typical (or popular) the concept c is given e

bull However

Microsoft

largest desktop OS vendorcompanyhigh typicality p(c|e) high typicality p(e|c)

Naive Approach 2 PMI[Manning and Schutze 1999]

bull Pointwise mutual information (PMI) is a measure of association used in information theory and statistics

bull Consider using the PMI between concept c and instance e to find the basic-level concepts as follows

119875119872119868(119890 119888) = log119875(119890 119888)

119875(119890)119875(119888)= log119875(119890|119888) minus log119875(119890)

bull However bull In basic level of categorization we are interested in finding a

concept for a given e which means P(e) is a constant

bull Thus ranking by PMI(e c) is the same as ranking by P(e|c)

Using Rep(e c) for BLC [Wang et al 2015b]

bull The measure 119877119890119901 119890 119888 = 119875(119888|119890) lowast 119875(119890|119888) means

bull (With PMI) If we take the logarithm of our scoring function we get

log119877119890119901 119890 119888 = log119875 119888 119890 lowast 119875(119890|119888) = log119875(119890 119888)

119875(119890)lowast119875(119890 119888)

119875(119888)= log

119875(119890 119888)2

119875(119890)119875(119888)= 119875119872119868 119890 119888 + log119875 119890 119888

= 1198751198721198682

bull (With Commute Time) The commute time between an instance e and a concept c is

119879119894119898119890(119890 119888) =

119896=1

infin

(2119896) lowast 119875119896(119890 119888) =

119896=1

119879

2119896 lowast 119875119896 119890 119888 +

119896=119879+1

infin

2119896 lowast 119875119896 119890 119888

ge σ119896=1119879 (2119896) lowast 119875119896(119890 119888) + 2(119879 + 1) lowast (1 minus σ119896=1

119879 119875119896(119890 119888)) = 4 minus 2 lowast 119877119890119901(119890 119888)

Given e the c should be its typical concept (shortest distance)

Given c the e should be its typical instance (shortest distance)

A process of finding concept nodes having shortest expected distance with e

Precision / NDCG at k = 1, 2, 3, 5, 10, 15, 20

Result set (a)
No smoothing        1      2      3      5      10     15     20
MI(e)               0.769  0.692  0.705  0.685  0.719  0.705  0.690
PMI3(e)             0.885  0.769  0.756  0.800  0.754  0.733  0.721
NPMI(e)             0.692  0.692  0.667  0.638  0.627  0.610  0.610
Typicality P(c|e)   0.462  0.577  0.603  0.577  0.569  0.564  0.556
Typicality P(e|c)   0.500  0.462  0.526  0.523  0.523  0.510  0.521
Rep(e)              0.846  0.865  0.872  0.862  0.758  0.731  0.719
Smoothing = 0.001
MI(e)               0.577  0.615  0.628  0.600  0.612  0.605  0.592
PMI3(e)             0.731  0.673  0.692  0.654  0.669  0.644  0.623
NPMI(e)             0.923  0.827  0.769  0.746  0.731  0.695  0.671
Typicality P(c|e)   0.462  0.577  0.603  0.577  0.569  0.564  0.554
Typicality P(e|c)   0.885  0.865  0.872  0.831  0.785  0.741  0.704
Rep(e)              0.846  0.731  0.718  0.723  0.700  0.669  0.638
Smoothing = 0.0001
MI(e)               0.615  0.615  0.654  0.608  0.635  0.628  0.612
PMI3(e)             0.846  0.731  0.731  0.715  0.723  0.685  0.677
NPMI(e)             0.885  0.904  0.885  0.869  0.823  0.777  0.752
Typicality P(c|e)   0.462  0.577  0.603  0.577  0.569  0.564  0.556
Typicality P(e|c)   0.885  0.904  0.910  0.877  0.831  0.813  0.777
Rep(e)              0.923  0.846  0.833  0.815  0.781  0.736  0.719
Smoothing = 1e-5
MI(e)               0.615  0.635  0.667  0.662  0.677  0.656  0.646
PMI3(e)             0.885  0.769  0.744  0.777  0.758  0.731  0.710
NPMI(e)             0.885  0.846  0.872  0.869  0.831  0.810  0.787
Typicality P(c|e)   0.462  0.577  0.603  0.577  0.569  0.564  0.556
Typicality P(e|c)   0.769  0.808  0.846  0.823  0.808  0.782  0.765
Rep(e)              0.885  0.904  0.872  0.862  0.812  0.800  0.767
Smoothing = 1e-6
MI(e)               0.769  0.673  0.705  0.677  0.700  0.692  0.679
PMI3(e)             0.885  0.769  0.756  0.785  0.773  0.726  0.723
NPMI(e)             0.885  0.846  0.821  0.815  0.750  0.726  0.719
Typicality P(c|e)   0.462  0.577  0.603  0.577  0.569  0.564  0.556
Typicality P(e|c)   0.538  0.615  0.615  0.615  0.608  0.613  0.615
Rep(e)              0.846  0.885  0.897  0.877  0.788  0.777  0.765
Smoothing = 1e-7
MI(e)               0.769  0.692  0.705  0.685  0.719  0.703  0.688
PMI3(e)             0.885  0.769  0.756  0.792  0.758  0.736  0.725
NPMI(e)             0.769  0.750  0.718  0.700  0.650  0.641  0.633
Typicality P(c|e)   0.462  0.577  0.603  0.577  0.569  0.564  0.556
Typicality P(e|c)   0.500  0.481  0.526  0.523  0.531  0.523  0.523
Rep(e)              0.846  0.865  0.872  0.854  0.765  0.749  0.733

Result set (b)
No smoothing        1      2      3      5      10     15     20
MI(e)               0.516  0.531  0.519  0.531  0.562  0.574  0.594
PMI3(e)             0.725  0.664  0.652  0.660  0.628  0.631  0.646
NPMI(e)             0.599  0.597  0.579  0.554  0.540  0.539  0.549
Typicality P(c|e)   0.297  0.380  0.409  0.422  0.438  0.446  0.461
Typicality P(e|c)   0.401  0.386  0.396  0.398  0.401  0.410  0.428
Rep(e)              0.758  0.771  0.745  0.723  0.656  0.647  0.661
Smoothing = 1e-3
MI(e)               0.374  0.414  0.441  0.448  0.473  0.481  0.495
PMI3(e)             0.484  0.511  0.509  0.502  0.519  0.525  0.533
NPMI(e)             0.692  0.652  0.607  0.603  0.585  0.585  0.592
Typicality P(c|e)   0.297  0.380  0.409  0.422  0.438  0.446  0.460
Typicality P(e|c)   0.703  0.697  0.704  0.681  0.637  0.628  0.626
Rep(e)              0.621  0.580  0.554  0.561  0.554  0.555  0.559
Smoothing = 1e-4
MI(e)               0.407  0.430  0.458  0.462  0.492  0.503  0.512
PMI3(e)             0.648  0.604  0.579  0.575  0.578  0.576  0.590
NPMI(e)             0.747  0.777  0.761  0.737  0.700  0.685  0.688
Typicality P(c|e)   0.297  0.380  0.409  0.422  0.438  0.446  0.461
Typicality P(e|c)   0.791  0.795  0.802  0.767  0.738  0.729  0.724
Rep(e)              0.758  0.714  0.711  0.689  0.653  0.636  0.653
Smoothing = 1e-5
MI(e)               0.429  0.465  0.478  0.501  0.517  0.528  0.545
PMI3(e)             0.725  0.647  0.642  0.642  0.627  0.624  0.638
NPMI(e)             0.813  0.779  0.778  0.765  0.730  0.723  0.729
Typicality P(c|e)   0.297  0.380  0.409  0.422  0.438  0.446  0.461
Typicality P(e|c)   0.709  0.728  0.735  0.722  0.702  0.696  0.703
Rep(e)              0.791  0.787  0.762  0.739  0.707  0.703  0.706
Smoothing = 1e-6
MI(e)               0.516  0.510  0.515  0.526  0.546  0.563  0.579
PMI3(e)             0.725  0.655  0.651  0.654  0.641  0.631  0.649
NPMI(e)             0.791  0.766  0.732  0.728  0.673  0.659  0.668
Typicality P(c|e)   0.297  0.380  0.409  0.422  0.438  0.446  0.461
Typicality P(e|c)   0.495  0.516  0.520  0.508  0.512  0.521  0.540
Rep(e)              0.758  0.784  0.767  0.755  0.691  0.686  0.694
Smoothing = 1e-7
MI(e)               0.516  0.531  0.519  0.530  0.562  0.571  0.592
PMI3(e)             0.725  0.664  0.652  0.658  0.630  0.631  0.647
NPMI(e)             0.670  0.655  0.633  0.604  0.575  0.570  0.581
Typicality P(c|e)   0.297  0.380  0.409  0.422  0.438  0.446  0.461
Typicality P(e|c)   0.423  0.421  0.415  0.407  0.414  0.424  0.438
Rep(e)              0.758  0.771  0.745  0.725  0.663  0.661  0.668

Evaluations on Different Measures for BLC

Single Instance

• Is this instance ambiguous?
• What are its basic-level concepts?
• What are its similar instances?

What is the Semantic Similarity?
• Are the following instance pairs similar?
  • <apple, microsoft>
  • <apple, pear>
  • <apple, fruit>
  • <apple, food>
  • <apple, ipad>
  • <car, journey>

Approaches on Term Similarity

• Categories of approaches for semantic similarity:
  • String based approach
  • Knowledge based approach — use preexisting thesauri, taxonomies, or encyclopedias such as WordNet
  • Corpus based approach — use contexts of terms extracted from web pages, web search snippets, or other text repositories
  • Embedding based approach — introduced in detail in "Part 3: Implicit Understanding"

Approaches on Term Similarity (2)

• Categories: string based approaches; knowledge based approaches (WordNet), including path length / lexical chain-based and information content-based methods; corpus based approaches, including graph learning algorithm based and snippet search based methods
• Representative and state-of-the-art approaches include: Rada 1989, Resnik 1995, Jcn 1997, Lin 1998, Sánchez 2011, Agirre 2010, Alvarez 2007, Hun&Tray 2005, Hirst 1998, Do 2009, Bol 2011, Chen 2006, Ban 2002

• Framework

Term Similarity Using Semantic Networks [Li et al 2013, Li et al 2015]

• Step 1: Type Checking — split the input term pairs <t1, t2> into concept pairs, entity pairs, and concept–entity pairs
• Step 2: Context Representation (Vector) — collect concept-distribution contexts for concept terms and entity-distribution contexts for entity terms; for a concept–entity pair, collect the concept set of the entity term; cluster the concepts and keep the top-k concepts of each cluster as context vectors
• Step 3: Context Similarity — for concept/entity pairs, evaluate Max(x,y) Cosine(Cx(t1), Cy(t2)) over the cluster context vectors; for concept–entity pairs, evaluate Cosine(T(t1), T(t2)), taking the best-matching <t2, cx> pair

An example [Li et al 2013, Li et al 2015]

For example: <banana, pear>

• Step 1: Type Checking — <banana, pear> is an entity pair
• Step 2: Context Representation (Vector) — concept context collection for both entities
• Step 3: Context Similarity — Similarity Evaluation: Cosine(T(t1), T(t2)) = 0.916
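
A minimal sketch of Steps 2–3, assuming each term already has a concept distribution standing in for its context vector T(t); the distributions below are invented for illustration.

```python
import math

# Toy concept distributions P(c | term) standing in for the concept-context
# vectors T(t) built from a probabilistic isA network.
T = {
    "banana": {"fruit": 0.55, "tropical fruit": 0.25, "food": 0.15, "crop": 0.05},
    "pear":   {"fruit": 0.60, "juice": 0.10, "food": 0.20, "tree": 0.10},
    "lunch":  {"meal": 0.7, "food": 0.2, "event": 0.1},
}

def cosine(u, v):
    keys = set(u) | set(v)
    dot = sum(u.get(k, 0.0) * v.get(k, 0.0) for k in keys)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

print(cosine(T["banana"], T["pear"]))   # high: the two entities share concepts
print(cosine(T["banana"], T["lunch"]))  # low: little concept overlap
```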

Examples

  Term 1               Term 2              Similarity
  lunch                dinner              0.9987
  tiger                jaguar              0.9792
  car                  plane               0.9711
  television           radio               0.9465
  technology company   microsoft           0.8208
  high impact sport    competitive sport   0.8155
  employer             large corporation   0.5353
  fruit                green pepper        0.2949
  travel               meal                0.0426
  music                lunch               0.0116
  alcoholic beverage   sports equipment    0.0314
  company              table tennis        0.0003

Complete results: http://adapt.seiee.sjtu.edu.cn/similarity/SimCompleteResults.pdf

Statistics of Search Queries

Query length distribution:
  (a) By traffic: 1 term 44%, 2 terms 29%, 3 terms 17%, 4 terms 7%, 5 terms 2%, more than 5 terms 1%
  (b) By # of distinct queries: 1 term 10%, 2 terms 26%, 3 terms 34%, 4 terms 19%, 5 terms 7%, more than 5 terms 4%

(Companion charts break queries down by the number of instances they contain: 1 instance, 2, 3, 4, 5, more than 5 instances — e.g. "Pokémon Go" and "Microsoft HoloLens" are multi-term but single-instance queries)

If the short text has context for the instance…
• python tutorial
• dangerous python
• moon earth distance
• …

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Supervised Segmentation [Bergsma et al 2007]

• Problem: divide a query into semantic units
• Approach: turn segmentation into position-based binary classification

Example query: "two man power saw"
Candidate segmentations:
  [two man] [power saw]
  [two] [man] [power saw]
  [two] [man power] [saw]

Input: a query and its positions
Output: the decision for making a segmentation break at each position

Supervised Segmentation

• Features:
  • Decision-boundary features — e.g. indicator tokens ("the", "is"), POS tags in the query, forward/backward position features
  • Statistical features — e.g. mutual information between the left and right parts ("bank loan | amortization schedule")
  • Context features — e.g. context information such as "female bus driver"
  • Dependency features — e.g. whether one side depends on the other ("female" → "driver")

Supervised Segmentation

• Segmentation Overview

  Input query: "two man power saw" → per-position learning features → SVM classifier → output: segmentation decision for each position (yes/no)
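
A hedged sketch of the position-based classification idea: each word gap becomes one training example with a tiny hand-made feature vector (a stand-in for the paper's feature set), and an off-the-shelf linear SVM predicts break / no-break.

```python
from sklearn.svm import LinearSVC

# One example per word gap in a query. The features are only a stand-in for
# the decision-boundary / statistical / context / dependency features:
# [association score of (left, right), left token is a determiner, gap index].
# Label 1 = insert a segment break at this gap, 0 = keep the words together.
X_train = [
    [0.8, 0, 1],    # strongly associated pair -> no break
    [-0.3, 0, 2],   # weak association -> break
    [-0.9, 0, 3],
    [1.2, 0, 1],
    [-0.5, 1, 2],
]
y_train = [0, 1, 1, 0, 1]

clf = LinearSVC(C=1.0)
clf.fit(X_train, y_train)

# Gaps of the query "two man power saw", featurized the same way (toy values).
gaps = [[0.6, 0, 1], [-0.4, 0, 2], [0.9, 0, 3]]
print(clf.predict(gaps))   # e.g. [0 1 0] -> [two man] [power saw]
```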

Unsupervised Segmentation [Tan et al 2008]

• Unsupervised learning for query segmentation

  Probability of a generated segmentation S (into segments s_1 … s_m) for query Q:

  P(S|Q) = P(s_1) P(s_2|s_1) … P(s_m|s_1 s_2 … s_{m−1}) ≈ ∏_{s_i ∈ S} P(s_i)    (unigram model)

• A split is a valid segment boundary if and only if the pointwise mutual information between the two segments resulting from the split is negative:

  MI(s_k, s_{k+1}) = log [ P_c([s_k s_{k+1}]) / (P_c(s_k) · P_c(s_{k+1})) ] < 0

  Example ("new york times subscription"): log [ P_c([new york]) / (P_c(new) · P_c(york)) ] > 0, so there is no segment boundary between "new" and "york"

Unsupervised Segmentation

• Find the top-k segmentations by dynamic programming
• Use EM optimization on the fly

  Input: query w_1 w_2 … w_n (the words in the query) and a concept probability distribution
  Output: top-k segmentations with the highest likelihood
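
A small dynamic-programming sketch of the unigram-model search, assuming segment probabilities P(s) are already given; the probabilities below are invented and the EM re-estimation step is omitted.

```python
import math

# Toy segment (phrase) probabilities P(s) from a phrase/concept language model.
P = {
    "new": 0.020, "york": 0.010, "times": 0.015, "subscription": 0.005,
    "new york": 0.012, "york times": 0.004, "new york times": 0.009,
    "times subscription": 0.0005,
}

def best_segmentation(words, max_len=3):
    """Dynamic program: argmax over segmentations of the sum of log P(segment)."""
    n = len(words)
    best = [(-math.inf, None)] * (n + 1)
    best[0] = (0.0, None)
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            seg = " ".join(words[j:i])
            p = P.get(seg)
            if p is None or best[j][0] == -math.inf:
                continue
            score = best[j][0] + math.log(p)
            if score > best[i][0]:
                best[i] = (score, j)
    segs, i = [], n          # backtrack
    while i > 0:
        j = best[i][1]
        segs.append(" ".join(words[j:i]))
        i = j
    return list(reversed(segs))

print(best_segmentation("new york times subscription".split()))
# -> ['new york times', 'subscription']
```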

Exploit Click-through [Li et al 2011]

• Motivation:
  • Probabilistic query segmentation
  • Use click-through data (query → clicked URL → document)

  Input query: "bank of america online banking"
  Output (top-3 segmentations):
    [bank of america] [online banking]    0.502
    [bank of america online banking]      0.428
    [bank of] [america] [online banking]  0.001

Exploit Click-through

• Segmentation Model: an interpolated model that combines global information with click-through information

  Query: [credit card] [bank of America]
  Clicked html documents:
    1. bank of america credit cards contact us overview
    2. secured visa credit card from bank of america
    3. credit cards overview: find the right bank of america credit card for you

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Sense Changes with Different Context

watch harry potter read harry potter age harry potter

Movie Book Character

harry potter walkthrough

Game

Entity Recognition in Query [Guo et al 2009]

• Motivation: detect the named entity in a short text and categorize it

  Example: "harry potter walkthrough" — a single-named-entity query — yields the triple <e, t, c> = ("harry potter", "walkthrough", "game"), where e is the (ambiguous) entity term, t the context term, and c the class of the entity

Entity Recognition in Query

• Probabilistic Generative Model

  Goal: given a query q, find the triple <e, t, c> that maximizes the probability of generating the triple
  Assume the context only depends on the class — e.g. "walkthrough" only depends on game, not on harry potter
  Objective: given query q, find the most probable <e, t, c>; the problem then becomes how to estimate Pr(e), Pr(c|e), and Pr(t|c)
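
A toy illustration of the factorization Pr(e) · Pr(c|e) · Pr(t|c): enumerate entity/context splits of the query and keep the highest-scoring <e, t, c> triple. The probability tables are invented; in the paper they are estimated with WS-LDA.

```python
# All probabilities below are toy values for illustration only.
P_e = {"harry potter": 0.7, "harry": 0.3}
P_c_given_e = {"harry potter": {"game": 0.3, "movie": 0.4, "book": 0.3},
               "harry": {"person": 1.0}}
P_t_given_c = {"game": {"walkthrough": 0.2, "cheats": 0.1},
               "movie": {"trailer": 0.2, "walkthrough": 0.001},
               "book": {"pdf": 0.1, "walkthrough": 0.001},
               "person": {"walkthrough": 0.0001}}

def interpretations(query):
    words = query.split()
    for split in range(1, len(words) + 1):
        e, t = " ".join(words[:split]), " ".join(words[split:]) or "<empty>"
        for c, p_ce in P_c_given_e.get(e, {}).items():
            p_tc = P_t_given_c.get(c, {}).get(t, 1e-6)
            yield (e, t, c), P_e.get(e, 0.0) * p_ce * p_tc

best = max(interpretations("harry potter walkthrough"), key=lambda x: x[1])
print(best)   # (('harry potter', 'walkthrough', 'game'), ...)
```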

Entity Recognition in Query

• Probability Estimation by Learning

  Learning objective:  max ∏_{i=1}^{N} P(e_i, t_i, c_i)

  Challenge: it is difficult as well as time-consuming to manually assign class labels to named entities in queries

  Build a training set T = {(e_i, t_i)} and view c_i as a hidden variable

  New learning problem:  max ∏_{i=1}^{N} P(e_i, t_i) = max ∏_{i=1}^{N} Σ_c P(e_i) P(c|e_i) P(t_i|c)

  Solved with the topic model WS-LDA

Signal from Click [Pantel et al 2012]

• Motivation: predict the entity type in Web search, from the entity, the user intent, the context, and the click
• Query type distribution (73 types); a generative model over entity types

Signal from Click

• Joint Model for Prediction — a generative model over the queries Q with variables t, τ, i, n, c: for each query, pick a type (distribution over types θ), pick an entity (entity distribution), pick an intent (intent distribution), pick a click host (host distribution ω), and pick the context words (word distribution φ)

Telegraphic Query Interpretation [Sawant et al 2013, Joshi et al 2014]

• Entity-seeking telegraphic queries — e.g. query "Germany capital" → result entity Berlin
• Interpretation = Segmentation + Annotation, against a knowledge base (for accuracy) and a large corpus (for recall)

• Overview

Joint Interpretation and Ranking [Sawant et al 2013, Joshi et al 2014]

  Telegraphic query + annotated corpus → two models for interpretation and ranking (a generative model and a discriminative model) → output: ranked answer entities e1, e2, e3, …

• Generative Model

Joint Interpretation and Ranking [Sawant et al 2013]

Example (borrowed from U. Sawant, 2013): query q = "losing team baseball world series 1998". The answer entity E = San Diego Padres has type T = major league baseball team and appears in the corpus context "Padres have been to two World Series, losing in 1984 and 1998". A switch variable Z decides whether each query token is generated by the type model (type hint "baseball team") or by the context model (context matchers "lost 1998 world series").

Based on probabilistic language models

• Discriminative Model

Joint Interpretation and Ranking [Sawant et al 2013]

For the query "losing team baseball world series 1998", each candidate answer entity is paired with candidate interpretations of the query (e.g. target type t = "baseball team" vs. t = "series"): San_Diego_Padres with t = baseball team is a correct entity/interpretation pair, while 1998_World_Series with t = series is an incorrect one.

Based on max-margin discriminative learning

bull Queries seek answer entities (e2)

bull Contain (query) entities (e1) target types (t2) relations (r) and selectors (s)

Telegraphic Query Interpretation [Joshi et al 2014]

query e1 r t2 s

dave navarro first band

dave navarro band band first

dave navarro - band first

spider automobile company

spider automobile company

automobile company

-

automobile company company spider

Borrow from M Joshi (2014)

Improved Generative Model

bull Generative Model[Sawant et al 2013]

[Joshi et al 2014]Consider e1

(in q) and r

Improved Discriminative Model

bull Discriminative Model[Sawant et al 2013]

[Joshi et al 2014]

Consider e1

(in q) and r

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

bull Input a short text

bull Output semantic interpretation

bull Three steps in understanding a short text

wanna watch eagles band

watch[verb] eagles[entity](band) band[concept]

wanna watch eagles band watch[verb] eagles[entity](band) band[concept]

watch eagles band watch[verb] eagles[entity] band[concept]

Step 1 Text Segmentation ndash divide into a sequence of terms in vocabulary

Step 2 Type detection ndash determine the best type of each term

Step 3 Concept Labeling ndash infer the best concept of each entity within context

Text segmentation
• Observations:
  • Mutual Exclusion — terms containing the same word mutually exclude each other
  • Mutual Reinforcement — related terms mutually reinforce each other
• Build a Candidate Term Graph (CTG)

(Example CTGs for "vacation april in paris" — candidate terms: vacation, april, paris, april in paris — and for "watch harry potter" — candidate terms: watch, harry, potter, harry potter; edges carry affinity weights such as 0.029, 0.047, 0.092, 1/3, 2/3, plus mutual-exclusion links)

Find best segmentation

• Best segmentation = sub-graph of the CTG which:
  • Is a complete graph (clique)
  • Contains no mutual exclusion
  • Has 100% word coverage (except for stopwords)
  • Has the largest average edge weight

(In the example CTGs, several sub-graphs qualify as "a segmentation"; the highlighted one is the best segmentation)

Find best segmentation

• Best segmentation = sub-graph of the CTG which:
  • Is a complete graph (clique)
  • Contains no mutual exclusion
  • Has 100% word coverage (except for stopwords)
  • Has the largest average edge weight
• In other words, the best segmentation is a maximal clique of the CTG with the largest average edge weight
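
A brute-force sketch of this clique search on a toy candidate term graph for "vacation april in paris"; the edge weights follow the example figure, while the exclusion and coverage handling is simplified and the names are illustrative.

```python
from itertools import combinations

# Candidate terms, mutual-exclusion pairs (terms sharing a word), and
# illustrative affinity weights between terms.
terms = ["vacation", "april", "paris", "april in paris"]
exclusive = {("april", "april in paris"), ("paris", "april in paris")}
weight = {("vacation", "april"): 0.029, ("vacation", "paris"): 0.041,
          ("vacation", "april in paris"): 0.047, ("april", "paris"): 0.005}
words = set("vacation april in paris".split()) - {"in"}   # drop stopwords

def compatible(a, b):
    return (a, b) not in exclusive and (b, a) not in exclusive

def avg_edge_weight(cand):
    pairs = list(combinations(cand, 2))
    return sum(weight.get(p, weight.get(p[::-1], 0.0)) for p in pairs) / len(pairs)

best, best_score = None, -1.0
for k in range(1, len(terms) + 1):
    for cand in combinations(terms, k):
        if any(not compatible(a, b) for a, b in combinations(cand, 2)):
            continue                        # mutual exclusion -> not a valid clique
        covered = {w for t in cand for w in t.split()}
        if not words <= covered:
            continue                        # must cover all non-stopwords
        score = avg_edge_weight(cand) if k > 1 else 0.0
        if score > best_score:
            best, best_score = cand, score

print(best)   # ('vacation', 'april in paris') under these toy weights
```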

Type Detection

• Pairwise Model: find the best typed-term for each term so that the Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight

(Example for "watch free movie": candidate typed-terms watch[v] / watch[e] / watch[c], free[adj] / free[v], movie[c] / movie[e])

Concept Labeling

• Entity disambiguation is the most important task of concept labeling
  • Filter / re-rank the original concept cluster vector
• Weighted-Vote
  • The final score of each concept cluster is a combination of its original score and the support from context, using concept co-occurrence
  • e.g. "watch harry potter" → movie; "read harry potter" → book

Example of Entity Disambiguation[Hua et al 2015 (ICDE Best Paper) Hua et al 2016]

(Pipeline: the short text is parsed and conceptualized against the semantic (isA) network and the co-occurrence network — term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, concept orthogonalization — producing a concept vector [(c1, p1), (c2, p2), (c3, p3), …].
Example "ipad apple": isA lookup gives apple → {fruit…, company…, food…, product…} and ipad → {product…, device…}; filtering by concept co-occurrence keeps the senses that cohere with the context, e.g. apple → {company…, brand…} and ipad → {product…, device…}.)
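
A weighted-vote sketch for "ipad apple": re-rank the concept clusters of "apple" by combining their original isA scores with support from the context entity's concepts via concept co-occurrence. All numbers are toy values and the combination rule is a simplification.

```python
# Toy isA scores and concept co-occurrence strengths, for illustration only.
apple_concepts = {"fruit": 0.45, "company": 0.30, "food": 0.15, "brand": 0.10}
ipad_concepts = {"device": 0.6, "product": 0.4}
cooccur = {("company", "device"): 0.8, ("brand", "device"): 0.6,
           ("company", "product"): 0.7, ("brand", "product"): 0.5,
           ("fruit", "device"): 0.05, ("food", "device"): 0.02,
           ("fruit", "product"): 0.10, ("food", "product"): 0.08}

alpha = 0.5   # weight of the original score vs. the context vote

def rerank(target, context):
    out = {}
    for c, s in target.items():
        support = sum(w * cooccur.get((c, c2), cooccur.get((c2, c), 0.0))
                      for c2, w in context.items())
        out[c] = alpha * s + (1 - alpha) * support
    return dict(sorted(out.items(), key=lambda kv: -kv[1]))

print(rerank(apple_concepts, ipad_concepts))
# "company" and "brand" now outrank "fruit" for apple in the context of ipad.
```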

Mining Lexical Relationships [Wang et al 2015b]

• Lexical knowledge is represented by probabilities over e (instance), t (term), c (concept), and z (role):
  ① p(z|t), e.g. p(verb | watch), p(instance | watch)
  ② p(c|e) = p(c | t, z = instance), e.g. p(movie | harry potter), p(book | harry potter)
  ③ p(c | t, z), e.g. p(movie | watch, verb)

  Example short text: "watch harry potter", with candidate concepts product, book, movie

Understanding Queries [Wang et al 2015b]

• Goal: rank the concepts and find  argmax_c p(c | t, q)

• Offline: the semantic network. Online: the query, all possible segmentations, and a random walk with restart [Sun et al 2005] on the online subgraph
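
A minimal random-walk-with-restart sketch over an invented online subgraph; the graph, edge weights, and restart vector are illustrative and do not come from the offline semantic network.

```python
import numpy as np

# Tiny online subgraph (terms + concepts) for the query "apple ipad".
nodes = ["apple", "ipad", "fruit", "company", "device"]
idx = {n: i for i, n in enumerate(nodes)}
edges = [("apple", "fruit", 0.5), ("apple", "company", 0.5),
         ("ipad", "device", 0.9), ("ipad", "company", 0.1),
         ("company", "device", 0.3)]

A = np.zeros((len(nodes), len(nodes)))
for a, b, w in edges:
    A[idx[a], idx[b]] = A[idx[b], idx[a]] = w
W = A / A.sum(axis=0, keepdims=True)          # column-stochastic transition matrix

restart = np.zeros(len(nodes))
restart[[idx["apple"], idx["ipad"]]] = 0.5    # restart on the query terms
c = 0.15                                      # restart probability
p = restart.copy()
for _ in range(100):                          # power iteration to (near) fixpoint
    p = (1 - c) * W @ p + c * restart

for n in ["fruit", "company", "device"]:
    print(n, round(float(p[idx[n]]), 3))      # "company"/"device" rank above "fruit"
```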

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head, Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

• Example: "popular smart cover iphone 5s"
• Definitions:
  • Head: names the general (semantic) category to which the whole short text belongs; usually the head is the intent of the short text — "smart cover" is the intent of the query
  • Constraints: distinguish this member from other members of the same category — "iphone 5s" limits the type of the head
  • Non-Constraint Modifiers (aka Pure Modifiers): subjective modifiers that can be dropped without changing the intent — "popular" is subjective and can be neglected

Non-Constraint Modifiers Mining: Construct Modifier Networks

• Edges form a Modifier Network
• Example in the "Country" domain: the concept hierarchy tree contains Country with children such as Asian country, Developed country, Western country (and combinations like Western developed country, Large Asian country, Large/Top developed country, Top western country); the corresponding modifier network links Country to the modifiers Asian, Western, Developed, Large, and Top
• In this case "Large" and "Top" are pure modifiers

Non-Constraint Modifiers Mining: Betweenness Centrality

• Betweenness centrality is a measure of a node's centrality in a network
• The betweenness of node v is defined as

  g(v) = Σ_{s ≠ v ≠ t} σ_st(v) / σ_st

  where σ_st is the total number of shortest paths from node s to node t, and σ_st(v) is the number of those paths that pass through v
• Normalization & Aggregation
• A pure modifier should have a low betweenness-centrality aggregation score PMS(t)
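
A sketch of the betweenness computation on a toy modifier network using networkx; the graph loosely follows the country-domain example above, and the normalization/aggregation into PMS(t) across domains is not shown.

```python
import networkx as nx

# Toy country-domain network: concept nodes plus modifier nodes.
# "large" and "top" attach to several concepts but rarely lie on shortest
# paths between them, so their betweenness stays comparatively low.
G = nx.Graph()
G.add_edges_from([
    ("country", "asian country"), ("country", "developed country"),
    ("country", "western country"),
    ("developed country", "western developed country"),
    ("western country", "western developed country"),
    ("large", "country"), ("large", "asian country"), ("large", "developed country"),
    ("top", "western country"), ("top", "developed country"),
])

bc = nx.betweenness_centrality(G, normalized=True)
for node in ("country", "western country", "large", "top"):
    print(node, round(bc[node], 3))
# Low-betweenness modifiers such as "large"/"top" would be flagged as pure
# modifiers once the scores are normalized and aggregated into PMS(t).
```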

Head-Constraints Mining [Wang et al 2014b]

• A term can be a head in some queries and a constraint in others
• E.g. "seattle hotel" → seattle (constraint) + hotel (head); "seattle hotel job" → seattle (constraint) + hotel (constraint) + job (head)

Head-Constraints Mining: Acquiring Concept Patterns

Building a concept pattern dictionary from query logs:
1. Get entity pairs from the query log by extracting preposition patterns ("A for B", "A of B", "A with B", "A in B", "A on B", "A at B", …), e.g. "cover for iphone 6s", "battery for sony a7r", "wicked on broadway" — entity 1 is the head, entity 2 the constraint
2. Conceptualize entity 1 and entity 2 (concept1_1, concept1_2, … ; concept2_1, concept2_2, …)
3. Generate concept patterns for each pair, e.g. (concept1_1, concept2_1), (concept1_1, concept2_2), (concept1_1, concept2_3), …
4. Aggregate the patterns into a Concept Pattern Dictionary

Why Concepts Can't Be Too General
• It may cause too many concept pattern conflicts: head and modifier cannot be distinguished for overly general concept pairs

  Derived concept pattern (head / modifier): device / company
  Supporting entity pairs: (iphone 4, verizon), (modem, comcast), (wireless router, comcast), (iphone 4, tmobile)

  Derived concept pattern (head / modifier): company / device
  Supporting entity pairs: (amazon books, kindle), (netflix, touchpad), (skype, windows phone), (netflix, ps3)

  → Conflict: the same concept pair appears with both orientations

Why Concepts Can't Be Too Specific
• It may generate concepts with little coverage
  • The concept regresses to the entity
  • Large storage space: up to (million × million) patterns

  … device / largest desktop OS vendor, device / largest software development company, device / largest global corporation, device / latest windows and office provider, …

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept Patterns (head / constraint, with cluster size):
breed/state (615), game/platform (296), accessory/vehicle (153), browser/platform (70), requirement/school (22), drug/disease (34), cosmetic/skin condition (42), job/city (16), accessory/phone (32), software/platform (18), test/disease (20), clothes/breed (27), penalty/crime (19), tax/state (25), sauce/meat (16), credit card/country (18), food/holiday (14), mod/game (11), garment/sport (29), career information/professional (23), song/instrument (15), bait/fish (18), study guide/book (22), plugins/browser (19), recipe/meat (14), currency/country (18), lens/camera (13), decoration/holiday (9), food/animal (16), …

Example pattern — Game (head) / Platform (modifier): the cluster covers game platform, game device, video game platform, game console, game pad, gaming platform; supporting entity pairs include (angry birds, android), (angry birds, ios), (angry birds, windows 10), …

Head Modifier Relationship Detection

• Train a classifier on (head-embedding, modifier-embedding)
• Training data:
  • Positive: (head, modifier)
  • Negative: (modifier, head)
• Precision >= 0.9, Recall >= 0.9
• Disadvantage: not interpretable
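
A hedged sketch of such a direction classifier: random vectors stand in for the head/modifier embeddings, and logistic regression replaces whatever classifier was actually used.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Random vectors stand in for real word/concept embeddings; the pairs and the
# classifier choice are illustrative, not the paper's setup.
rng = np.random.default_rng(0)
emb = {t: rng.normal(size=50) for t in
       ["game", "platform", "accessory", "vehicle", "drug", "disease"]}

pairs = [("game", "platform"), ("accessory", "vehicle"), ("drug", "disease")]
X, y = [], []
for head, mod in pairs:
    X.append(np.concatenate([emb[head], emb[mod]])); y.append(1)   # head first
    X.append(np.concatenate([emb[mod], emb[head]])); y.append(0)   # reversed

clf = LogisticRegression(max_iter=1000).fit(np.array(X), y)
test = np.concatenate([emb["game"], emb["platform"]]).reshape(1, -1)
print(clf.predict(test))   # expect [1]: "game" is the head, "platform" the modifier
```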

Syntactic Parsing based on HM

• Information is incomplete:
  • Prepositions and other function words
  • Within a noun compound: "el capitan macbook pro"
• Why not train a parser for web queries?

Syntactic Parsing of Short Texts [Sun et al EMNLP 2016]

• Syntactic structures are valuable for short text understanding

Challenges: Short Texts Lack Grammatical Signals
• Lack of function words and word order
  • "toys queries" has ambiguous intent
  • "distance earth moon" has clear intent, but many equivalent forms: "earth moon distance", "earth distance moon", …

Challenges: Syntactic Parsing of Queries
• No standard
• No ground truth
• Why is syntactic parsing of queries even a legitimate problem?

Derive Syntax from Semantics [Sun et al 2016]

• Query: "thai food houston"
• Take a clicked sentence for the query
• Project its dependency parse onto the query

A Treebank for Short Texts
• Given a query q and q's clicked sentence s:
  • Parse each s
  • Project the dependencies from s to q
  • Aggregate the projected dependencies

Algorithm of Projection
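
A minimal projection sketch under the assumption that query tokens are matched to sentence tokens by exact string match; the dependency arcs below are hand-written for illustration, not output of a real parser.

```python
query = ["thai", "food", "houston"]
sentence = ["the", "best", "thai", "food", "in", "houston"]
# (dependent_index, head_index, label) over the sentence, 0-based.
sent_arcs = [(0, 3, "det"), (1, 3, "amod"), (2, 3, "amod"),
             (4, 5, "case"), (5, 3, "nmod")]

def project(query, sentence, arcs):
    # Map each query token to its first occurrence in the sentence.
    pos = {w: i for i, w in reversed(list(enumerate(sentence)))}
    keep = {pos[w]: qi for qi, w in enumerate(query) if w in pos}
    projected = []
    for dep, head, label in arcs:
        if dep in keep and head in keep:   # keep arcs fully inside the query
            projected.append((query[keep[dep]], query[keep[head]], label))
    return projected

print(project(query, sentence, sent_arcs))
# -> [('thai', 'food', 'amod'), ('houston', 'food', 'nmod')]
```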

Result Examples

Results

• Random queries:                  QueryParser UAS 0.83, LAS 0.75; Stanford UAS 0.72, LAS 0.64
• Queries with no function words:  QueryParser UAS 0.82, LAS 0.73; Stanford UAS 0.70, LAS 0.61
• Queries with function words:     QueryParser UAS 0.90, LAS 0.85; Stanford UAS 0.86, LAS 0.80

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding [Kenter and Rijke 2015]

• Measuring similarity between two short texts or sentences
• Basic idea: word-by-word comparison using embedding vectors
• Use a saliency-weighted semantic graph to compute similarity

Features acquired: bins of all edges, bins of max edges

Similarity measurement (inspired by BM25), for a longer short text s_l and a shorter one s_s:

  f_sts(s_l, s_s) = Σ_{w ∈ s_l} IDF(w) · [ sem(w, s_s) · (k1 + 1) ] / [ sem(w, s_s) + k1 · (1 − b + b · |s_s| / avgsl) ]

  where sem(w, s_s) is the semantic similarity between term w and short text s_s
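
A small sketch of the saliency-weighted score with toy two-dimensional embeddings and IDF values; sem(w, s) is approximated here by the best cosine match against the words of the other text, following the formula above.

```python
import numpy as np

# Toy embeddings and IDF values, for illustration only.
emb = {"cheap": np.array([0.9, 0.1]), "flights": np.array([0.2, 0.95]),
       "low": np.array([0.85, 0.2]), "cost": np.array([0.8, 0.3]),
       "airfare": np.array([0.3, 0.9])}
idf = {"cheap": 1.2, "flights": 1.5, "low": 1.0, "cost": 1.1, "airfare": 1.6}
k1, b, avgsl = 1.2, 0.75, 2.0

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def sem(w, short):
    return max(cos(emb[w], emb[x]) for x in short)   # best match in the other text

def f_sts(s_l, s_s):
    norm = k1 * (1 - b + b * len(s_s) / avgsl)
    return sum(idf[w] * sem(w, s_s) * (k1 + 1) / (sem(w, s_s) + norm) for w in s_l)

print(round(f_sts(["cheap", "flights"], ["low", "cost", "airfare"]), 3))
```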

From the Concept View [Wang et al 2015a]

Short Text 1 → Parsing → Conceptualization (term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, concept orthogonalization, using the semantic network and the co-occurrence network) → Concept Vector 1 [(c1, score1), (c2, score2), …]
Short Text 2 → (same pipeline) → Concept Vector 2 [(c1', score1'), (c2', score2'), …]
Similarity = similarity of the two bags of concepts

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

• Explicit short text understanding benefits a lot of application scenarios:
  • Ads/search semantic match
  • Definition mining
  • Query recommendation
  • Web table understanding
  • Semantic search
  • …

Ads Keyword Selection [Wang et al 2015a]

(Bar charts over Decile 4 – Decile 10, for Mainline Ads and Sidebar Ads)

Definition Mining [Hao et al 2016]

• Definition scenarios: search engines, QnA, etc.
• Why conceptualization is useful for definition mining — example: "What is Emphysema?"

  Answer 1: "Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year."
  Answer 2: "Emphysema is an incurable, progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing."

• This sentence has the form of a definition
• Embedding is helpful to some extent, but it also returns a high similarity score for (emphysema, disease) and for (emphysema, smoking)
• Conceptualization can provide strong semantics
• Contextual embedding can also provide semantic similarity beyond is-A

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

Offline: concept weighting and model learning — for each class i (Class 1 … Class N), learn a concept model <concept, score> (e.g. <Music, score>) from training data
Online: for an input short text (e.g. "justin bieber graduates"): entity extraction → conceptualization against the knowledge base → concept vector → candidate generation → classification & ranking against the per-class concept models

Concept based Short Text Classification and Ranking [Wang et al 2014a]

(Figure: each category, e.g. TV, Music, Movie, is represented in the concept space by the article titles/tags in that category, with weights p_i, p_j and ω_i, ω_j; an incoming query is mapped into the same concept space and compared against the category representations)
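
A sketch of the online scoring step, assuming each class already has an offline concept model and the query has been conceptualized; all concept names and weights below are invented.

```python
import math

# Toy per-class concept models learned offline (illustrative weights).
class_models = {
    "Music": {"singer": 0.5, "album": 0.3, "concert": 0.2},
    "TV":    {"tv show": 0.6, "episode": 0.25, "channel": 0.15},
    "Movie": {"film": 0.5, "actor": 0.3, "director": 0.2},
}
# Concept vector for "justin bieber graduates" after conceptualization (toy values).
query_concepts = {"singer": 0.7, "celebrity": 0.2, "album": 0.1}

def cosine(u, v):
    dot = sum(u.get(k, 0.0) * v.get(k, 0.0) for k in set(u) | set(v))
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

scores = {cls: cosine(query_concepts, model) for cls, model in class_models.items()}
print(max(scores, key=scores.get), scores)   # "Music" scores highest here
```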

Precision performance on each category [Wang et al 2014a]

          BocSTC  LM_ch  SVM   VSM_cosine  LM_d  Entity_ESA
  Movie   0.71    0.91   0.84  0.81        0.72  0.56
  Money   0.97    0.95   0.54  0.57        0.52  0.74
  Music   0.97    0.90   0.88  0.73        0.68  0.58
  TV      0.96    0.46   0.92  0.56        0.51  0.55

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamie Taylor Freebase: a collaboratively created graph database for structuring human knowledge in SIGMOD 2008

bull [ Auer et al 2007 ] Sören Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWC/ASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Ré Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Börschinger Cäcilia Zirn and Anas Elghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Catherine Havasi ConceptNet 5: A large semantic network for relational knowledge The People's Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and Xindong Wu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10): 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny Boyes-Braem Basic objects in natural categories Cognitive psychology 8(3): 382–439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny Zhu Toward Topic Search on the Web In International Conference on Conceptual Modeling October 2012


A Concept View of ldquoMicrosoftrdquo

company

largest desktop OS vendor

softwarecompany

international company

technology leader

Microsoft

largest desktop OS vendorcompany hellip hellip

software company

Basic-level Conceptualization (BLC)[Rosch et al 1976]

KFC

BMW

Basic-level conceptualization

How to Make BLC

bull Naive approachesbull Typicality an important measure for understanding the

relationship between an object and its concept

bull Pointwise Mutual Information (PMI) a common measure of the strength of association between two terms

bird

Naive Approach 1 Typicality

P(robin|bird) gt P(penguin|bird)ldquorobinrdquo is a more typical bird than a ldquopenguinrdquo

country

SeychellesUSA

P(USA|country) gt P(Seychelles|country)ldquoUSArdquo is a more typical country than ldquoSeychellesrdquo

penguinrobin

Using Typicality for BLC

bull Associate each isA relationship (119890 is 119888) with typicality scores 119875 119890 119888 and 119875 119888 119890

119875 119890 119888 =119899 119888 119890

119899 119888119875(119888|119890) =

119899 119888 119890

119899(119890)

bull P(e|c) indicates how typical (or popular) e is in the given concept c

bull P(c|e) indicates how typical (or popular) the concept c is given e

bull However

Microsoft

largest desktop OS vendorcompanyhigh typicality p(c|e) high typicality p(e|c)

Naive Approach 2 PMI[Manning and Schutze 1999]

bull Pointwise mutual information (PMI) is a measure of association used in information theory and statistics

bull Consider using the PMI between concept c and instance e to find the basic-level concepts as follows

119875119872119868(119890 119888) = log119875(119890 119888)

119875(119890)119875(119888)= log119875(119890|119888) minus log119875(119890)

bull However bull In basic level of categorization we are interested in finding a

concept for a given e which means P(e) is a constant

bull Thus ranking by PMI(e c) is the same as ranking by P(e|c)

Using Rep(e c) for BLC [Wang et al 2015b]

bull The measure 119877119890119901 119890 119888 = 119875(119888|119890) lowast 119875(119890|119888) means

bull (With PMI) If we take the logarithm of our scoring function we get

log119877119890119901 119890 119888 = log119875 119888 119890 lowast 119875(119890|119888) = log119875(119890 119888)

119875(119890)lowast119875(119890 119888)

119875(119888)= log

119875(119890 119888)2

119875(119890)119875(119888)= 119875119872119868 119890 119888 + log119875 119890 119888

= 1198751198721198682

bull (With Commute Time) The commute time between an instance e and a concept c is

119879119894119898119890(119890 119888) =

119896=1

infin

(2119896) lowast 119875119896(119890 119888) =

119896=1

119879

2119896 lowast 119875119896 119890 119888 +

119896=119879+1

infin

2119896 lowast 119875119896 119890 119888

ge σ119896=1119879 (2119896) lowast 119875119896(119890 119888) + 2(119879 + 1) lowast (1 minus σ119896=1

119879 119875119896(119890 119888)) = 4 minus 2 lowast 119877119890119901(119890 119888)

Given e the c should be its typical concept (shortest distance)

Given c the e should be its typical instance (shortest distance)

A process of finding concept nodes having shortest expected distance with e

PrecisionNDCGNo smoothing 1 2 3 5 10 15 20

MI(e) 0769 0692 0705 0685 0719 0705 0690

PMI3(e) 0885 0769 0756 0800 0754 0733 0721

NPMI(e) 0692 0692 0667 0638 0627 0610 0610

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0500 0462 0526 0523 0523 0510 0521

Rep(e) 0846 0865 0872 0862 0758 0731 0719

Smoothing=0001

MI(e) 0577 0615 0628 0600 0612 0605 0592

PMI3(e) 0731 0673 0692 0654 0669 0644 0623

NPMI(e) 0923 0827 0769 0746 0731 0695 0671

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0554

Typicality P(e|c) 0885 0865 0872 0831 0785 0741 0704

Rep(e) 0846 0731 0718 0723 0700 0669 0638

Smoothing=00001

MI(e) 0615 0615 0654 0608 0635 0628 0612

PMI3(e) 0846 0731 0731 0715 0723 0685 0677

NPMI(e) 0885 0904 0885 0869 0823 0777 0752

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0885 0904 0910 0877 0831 0813 0777

Rep(e) 0923 0846 0833 0815 0781 0736 0719

Smoothing=1e-5

MI(e) 0615 0635 0667 0662 0677 0656 0646

PMI3(e) 0885 0769 0744 0777 0758 0731 0710

NPMI(e) 0885 0846 0872 0869 0831 0810 0787

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0769 0808 0846 0823 0808 0782 0765

Rep(e) 0885 0904 0872 0862 0812 0800 0767

Smoothing=1e-6

MI(e) 0769 0673 0705 0677 0700 0692 0679

PMI3(e) 0885 0769 0756 0785 0773 0726 0723

NPMI(e) 0885 0846 0821 0815 0750 0726 0719

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0538 0615 0615 0615 0608 0613 0615

Rep(e) 0846 0885 0897 0877 0788 0777 0765

Smoothing=1e-7

MI(e) 0769 0692 0705 0685 0719 0703 0688

PMI3(e) 0885 0769 0756 0792 0758 0736 0725

NPMI(e) 0769 0750 0718 0700 0650 0641 0633

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0500 0481 0526 0523 0531 0523 0523

Rep(e) 0846 0865 0872 0854 0765 0749 0733

No Smoothing 1 2 3 5 10 15 20

MI(e) 0516 0531 0519 0531 0562 0574 0594

PMI3(e) 0725 0664 0652 0660 0628 0631 0646

NPMI(e) 0599 0597 0579 0554 0540 0539 0549

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0401 0386 0396 0398 0401 0410 0428

Rep(e) 0758 0771 0745 0723 0656 0647 0661

Smoothing=1e-3

MI(e) 0374 0414 0441 0448 0473 0481 0495

PMI3(e) 0484 0511 0509 0502 0519 0525 0533

NPMI(e) 0692 0652 0607 0603 0585 0585 0592

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0460

Typicality P(e|c) 0703 0697 0704 0681 0637 0628 0626

Rep(e) 0621 0580 0554 0561 0554 0555 0559

Smoothing=1e-4

MI(e) 0407 0430 0458 0462 0492 0503 0512

PMI3(e) 0648 0604 0579 0575 0578 0576 0590

NPMI(e) 0747 0777 0761 0737 0700 0685 0688

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0791 0795 0802 0767 0738 0729 0724

Rep(e) 0758 0714 0711 0689 0653 0636 0653

Smoothing=1e-5

MI(e) 0429 0465 0478 0501 0517 0528 0545

PMI3(e) 0725 0647 0642 0642 0627 0624 0638

NPMI(e) 0813 0779 0778 0765 0730 0723 0729

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0709 0728 0735 0722 0702 0696 0703

Rep(e) 0791 0787 0762 0739 0707 0703 0706

Smoothing=1e-6

MI(e) 0516 0510 0515 0526 0546 0563 0579

PMI3(e) 0725 0655 0651 0654 0641 0631 0649

NPMI(e) 0791 0766 0732 0728 0673 0659 0668

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0495 0516 0520 0508 0512 0521 0540

Rep(e) 0758 0784 0767 0755 0691 0686 0694

Smoothing=1e-7

MI(e) 0516 0531 0519 0530 0562 0571 0592

PMI3(e) 0725 0664 0652 0658 0630 0631 0647

NPMI(e) 0670 0655 0633 0604 0575 0570 0581

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0423 0421 0415 0407 0414 0424 0438

Rep(e) 0758 0771 0745 0725 0663 0661 0668

Evaluations on Different Measures for BLC

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

What is the Semantic Similaritybull Are the following instance pairs similar

bull ltapple microsoftgt

bull ltapple peargt

bull ltapple fruitgt

bull ltapple foodgt

bull ltapple ipadgt

bull ltcar journeygt

Approaches on Term Similarity

bull Categories of approaches for semantic similaritybull String based approach

bull Knowledge based approachbull Use preexisting thesauri taxonomy or encyclopedia such as

WordNet

bull Corpus based approachbull Use contexts of terms extracted from web pages web search

snippets or other text repositories

bull Embedding based approachbull Will introduce in detail in ldquoPart 3 Implicit Understandingrdquo

79

Approaches on Term Similarity (2)

bull Categories

80

Knowledge based approaches

(WordNet)

Corpus based

approaches

Path lengthlexical

chain-based

Information

content-based

Graph learning

algorithm basedSnippet search based

Rada

1989

Resnik

1995

Jcn

1997

Lin

1998

Saacutench

2011

Agirre

2010Alvarez

2007

String based

approaches

HunTray

2005

Hirst

1998

Do

2009

Bol

2011Chen

2006

State-of-the-art approaches

Ban

2002

bull Framework

83

Term Similarity Using Semantic Networks [Li et al 2013 Li et al 2015]

Term pairs ltt1 t2gt

Type Checking

Concept Pairs Entity Pairs

Entity-distribution Context Collection

Concept-distribution Context Collection

Concept-Entity Pairs

Concept Collection for the Entity Term t1

Similarity EvaluationCosine(T(t1) T(t2))

for each pairltt2cxgt

Context vector T(t1) and T(t2)

Get maxsim(t2cx) for ltt1 t2gt

End

End

Concept Clustering

Cluster Context vector Cx(t1) and Cy(t2)

Similarity Evaluation

Max(xy) Cosine(Cx(t1) Cy(t2))

End

Concept Clustering

for each Cluster Ci(t1)

Select top k Concept namely cx

Step 1 Type Checking

Step 2 Context Representation(Vector)

Step 3 Context Similarity

An example [Li et al 2013 Li et al 2015]

For example

ltbanana peargt

88

ltbanana peargt

Entity PairsType Checking

Concept Context Collection

Similarity Evaluation Cosine(T(t1) T(t2)) 0916

Step 1 Type Checking

Step 2 Context Representation(Vector)

Step 3 Context Similarity

ExamplesTerm 1 Term 2 Similarity

lunch dinner 09987

tiger jaguar 09792

car plane 09711

television radio 09465

technology company microsoft 08208

high impact sport competitive sport 08155

employer large corporation 05353

fruit green pepper 02949

travel meal 00426

music lunch 00116

alcoholic beverage sports equipment 00314

company table tennis 00003

96httpadaptseieesjtueducnsimilaritySimCompleteResultspdf

Statistics of Search Queries

44

29

17

7

2 1

1 Term2 Terms3 Terms4 Terms5 Termsmore than 5 Terms

10

26

34

19

74

1 Term2 Terms3 Terms4 Terms5 Termsmore than 5 Terms

(a) By traffic

(b) By of distinct queries

Pokeacutemon Go Microsoft HoloLens

Instance 1 Instance 21 Instance2 Instances3 Instances4 Instances5 InstancesMore than 5 Instances

1 Instance2 Instances3 Instances4 Instances5 InstancesMore than 5 Instances

If the short text has context for the instancehellip

bull python tutorialbull dangerous pythonbull moon earth distancebull hellip

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Supervised Segmentation [Bergsma et al 2007]

bull Problem divide query into semantic units

bull Approach turn segmentation into position-based binary classification

Example Query

Two man power saw

[two man] [power saw][two] [man] [power saw][two] [man power] [saw]

Input a query and its positions

Output the decision for making segmentation at each position

Supervised Segmentation

bull Featuresbull Decision boundary features

bull Statistical features

bull Context features

bull Dependency features

eg Indicators thePOS tags in query isPosition features forwardbackward

Mutual information between left and right parts

Bank loan amortization schedule

Context information

bus driverfemale

depend

Supervised Segmentation

bull Segmentation Overview

saw

SVMclassifier

Input query two man power saw

two man power

Output segmentation decision for each position (yesno)

learning features

Unsupervised Segmentation [Tan et al 2008]

bull Unsupervised learning for query segmentation

Probability of generated segmentation S for query Q

119875 119878119876 = 119875 1199041 P 1199042|1199041 hellipP 119904119898 11990411199042hellip119904119898minus1

asympෑ

119904119894isin119878

119875(119904119894)Unigram model

segments

Valid segment boundary if and only if the pointwise mutual information between the two segments resulting from the split is negative

new york times subscription

1199041 1199042

119872119868 119904119896 119904119896+1 = log119875119888([119904119896 119904119896+1])

119875119888 119904119896 ∙ 119875119888 (119904119896+1)lt 0

Example log119875119888([119899119890119908 119910119900119903119896])

119875119888( 119899119890119908) ∙ 119875119888 (119910119900119903119896)gt 0

no segment boundary here

Unsupervised Segmentation

bull Find top k segmentations dynamic programming

bull Using EM optimization on the fly

Input query 11990811199082hellip119908119899 concept probability distribution

Output top k segmentations with highest likehood

Words in a query

Exploit Click-through [Li et al 2011]

bull Motivationbull Probabilistic query segmentation

bull Use click-through data

Output top-3 segmentation

[bank of america] [online banking] 0502

bank of america online banking] 0428

[bank of ] [ america] [online banking] 0001

Q -gt URL -gt D query

document

click data

Input Query bank of america online banking

Exploit Click-through

bull Segmentation Model

An interpolated model

global info Click-throughinfo

[credit card] [bank of America]

1 bank of america credit cards contact us overview2 secured visa credit card from bank of america3 credit cards overview find the right bank of america credit card for you

Query

Clicked html documents

global info

Click-through info

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Sense Changes with Different Context

watch harry potter read harry potter age harry potter

Movie Book Character

harry potter walkthrough

Game

Entity Recognition in Query [Guo et al 2009]

bull Motivation

Detect named entity in a short text and categorize it

harry potter walkthrough

Single-named-entity query

Example

(ldquoharry potterrdquo ldquo walkthroughrdquo ldquogamerdquo)

triple lte t cgt

class of entity

context terms

ambiguous term

contextterm class

Entity Recognition in Query

bull Probabilistic Generative Model

Goal Given a query q find triple lte t cgt maximize the probability

Probability to generate triple

assume context only depends on class

Objective given query q find

The problem then becomes how to estimate Pr(e) Pr(c|e) and Pr(t|c)

Eg ldquowalkthroughrdquo only depends on game instead of happy potter

Entity Recognition in Query

bull Probability Estimation by Learning

learning objective

N

1i

iii )ctP(emax

Challenge difficult as well as time consuming to manually assign class labels to named entities in queries

Build training set 119879 = (119890119894 119905119894) view 119888119894 as a hidden variable

New Learning problem

N

1i

ii

N

1i

i

N

1i

ii c)|)P(te|P(c)P(emax)tP(emax c

solved with topic model WS-LDA

Signal from Click [Pantel et al 2012]

bull Motivation

Predict entity type in Web search

entity

user intent

context

click

Query type distribution (73 types)

Generative model

entity type

T

TK

K2

Signal from Click

bull Joint Model for Prediction

t

τ

i

n c

θ

φ

ω

Q

Distribution over types

Intent distribution

Pick type

Pick entity

Pick intent

Pick click

Word distribution

Host distribution

Entity distribution

For each Query

Pick context words

Telegraphic Query interpretation [Sawant et al 2013 Joshi et al 2014]

bull Entity-seeking Telegraphic Queries

bull Interpretation = Segmentation + Annotation

Knowledge base Large corpus

accuracy recall

Germany capital

Berlin

Query

Result Entity

bull Overview

Joint Interpretation and Ranking [Sawant et al 2013 Joshi et al 2014]

Annotated Corpus

Telegraphic Query

e1e2e3

Two Models for Interpretation and Ranking

Generative Model

Discriminative Model

Output

bull Generative Model

Joint Interpretation and Ranking [Sawant et al 2013]

San Diego Padres

Major league baseball team

type context

E

T Padres have been to two World

Series losing in 1984 and 1998

Type hint

baseball team

losing team baseball world series 1998

Z

Context matchers

lost 1998 world seriesswitch

model model

q losing team baseball world series 1998

Borrow from U Sawant (2013)

Based on Probabilistic Language Models

• Discriminative Model
Joint Interpretation and Ranking [Sawant et al 2013]
Example query: "losing team baseball world series 1998"
Correct candidate: San_Diego_Padres with target type t = baseball team; incorrect candidate: 1998_World_Series with target type t = series
The query terms, the candidate entity, and its type define the features for ranking correct over incorrect entities

Based on max-margin discriminative learning

• Queries seek answer entities (e2)
• They contain (query) entities (e1), target types (t2), relations (r), and selectors (s)

Telegraphic Query Interpretation [Joshi et al 2014]

query | e1 | r | t2 | s
dave navarro first band | dave navarro | band | band | first
dave navarro first band | dave navarro | - | band | first
spider automobile company | spider | automobile company | automobile company | -
spider automobile company | - | automobile company | company | spider
(borrowed from M. Joshi, 2014)

Improved Generative Model

• Generative Model [Sawant et al 2013]
• [Joshi et al 2014]: additionally consider the query entity e1 (in q) and the relation r

Improved Discriminative Model

• Discriminative Model [Sawant et al 2013]
• [Joshi et al 2014]: additionally consider the query entity e1 (in q) and the relation r

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

• Input: a short text
• Output: semantic interpretation
• Three steps in understanding a short text:
Example: "wanna watch eagles band" → watch[verb] eagles[entity](band) band[concept]
Step 1: Text Segmentation – divide the text into a sequence of terms in the vocabulary: watch | eagles | band
Step 2: Type Detection – determine the best type of each term: watch[verb] eagles[entity] band[concept]
Step 3: Concept Labeling – infer the best concept of each entity within context: eagles[entity](band)

Text Segmentation
• Observations:
  • Mutual Exclusion – terms containing the same word mutually exclude each other
  • Mutual Reinforcement – related terms mutually reinforce each other
• Build a Candidate Term Graph (CTG)
Examples: "vacation april in paris", "watch harry potter"

[Candidate Term Graphs: for "vacation april in paris" the candidate terms are vacation, april, paris, april in paris; for "watch harry potter" they are watch, harry, potter, harry potter; edges carry mutual-reinforcement weights (e.g., 0.029, 0.005, 0.047, 0.041 and 0.014, 0.092, 0.053, 0.018)]

Find best segmentation

• Best segmentation = the sub-graph (a maximal clique) of the CTG which:
  • is a complete graph (clique) with no mutual exclusion
  • has 100% word coverage (except for stopwords)
  • has the largest average edge weight

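A small sketch of the clique-based selection, assuming networkx is available; the edge weights reuse the numbers shown for the "watch harry potter" example, but their assignment to particular edges is an assumption.

```python
# Pick the best segmentation as the maximal clique of the candidate term
# graph that covers all (non-stopword) words and has the largest average
# edge weight.
import itertools
import networkx as nx

words = {"watch", "harry", "potter"}                       # words to cover
terms = {"watch": {"watch"}, "harry": {"harry"},
         "potter": {"potter"}, "harry potter": {"harry", "potter"}}

G = nx.Graph()
G.add_edge("watch", "harry potter", weight=0.092)
G.add_edge("watch", "harry", weight=0.014)
G.add_edge("watch", "potter", weight=0.018)
G.add_edge("harry", "potter", weight=0.053)
# no edge between "harry potter" and "harry"/"potter": mutual exclusion

def avg_weight(nodes):
    edges = list(itertools.combinations(nodes, 2))
    return sum(G[u][v]["weight"] for u, v in edges) / max(len(edges), 1)

best = None
for clique in nx.find_cliques(G):                          # maximal cliques
    covered = set().union(*(terms[t] for t in clique))
    if covered == words and (best is None or avg_weight(clique) > avg_weight(best)):
        best = clique
print(best)   # e.g. ['watch', 'harry potter']
```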


Type Detection

• Pairwise Model: find the best typed-term for each term so that the Maximum Spanning Tree of the resulting sub-graph over the chosen typed-terms has the largest weight
Example "watch free movie": candidate typed-terms are watch[v] / watch[e] / watch[c], free[adj] / free[v], and movie[c] / movie[e]
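A brute-force sketch of the pairwise model on the "watch free movie" example; the pairwise scores are assumptions standing in for statistics mined from a knowledge base.

```python
# Choose one typed-term per term so that the maximum spanning tree over the
# chosen typed-terms has the largest total weight.
import itertools
import networkx as nx

candidates = {"watch": ["watch[v]", "watch[e]", "watch[c]"],
              "free":  ["free[adj]", "free[v]"],
              "movie": ["movie[c]", "movie[e]"]}

def pair_score(a, b):
    # assumed affinity scores between typed-terms
    table = {("watch[v]", "movie[c]"): 0.9, ("free[adj]", "movie[c]"): 0.8,
             ("watch[v]", "free[adj]"): 0.3}
    return table.get((a, b), table.get((b, a), 0.1))

best, best_w = None, -1.0
for combo in itertools.product(*candidates.values()):
    G = nx.Graph()
    for a, b in itertools.combinations(combo, 2):
        G.add_edge(a, b, weight=pair_score(a, b))
    w = nx.maximum_spanning_tree(G).size(weight="weight")
    if w > best_w:
        best, best_w = combo, w
print(best)   # -> ('watch[v]', 'free[adj]', 'movie[c]')
```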

Concept Labeling

• Entity disambiguation is the most important task of concept labeling: filter / re-rank the original concept cluster vector
• Weighted-Vote: the final score of each concept cluster is a combination of its original score and the support from the context, using concept co-occurrence
Example: watch harry potter → movie; read harry potter → book
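A minimal sketch of the weighted-vote combination; the mixing weight alpha, the original concept scores, and the co-occurrence values are illustrative assumptions.

```python
# Final score of each concept cluster = alpha * original score
#                                      + (1 - alpha) * support from context concepts.

def weighted_vote(original, cooccur, context_concepts, alpha=0.5):
    final = {}
    for concept, score in original.items():
        support = sum(cooccur.get((concept, ctx), 0.0) for ctx in context_concepts)
        final[concept] = alpha * score + (1 - alpha) * support
    return final

# "watch harry potter": concepts of "harry potter", context concept of "watch"
original = {"movie": 0.45, "book": 0.50, "character": 0.05}
cooccur  = {("movie", "verb:watch"): 0.8, ("book", "verb:watch"): 0.1}
print(weighted_vote(original, cooccur, ["verb:watch"]))
# movie now outranks book in the context of "watch"
```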

Example of Entity Disambiguation [Hua et al 2015 (ICDE Best Paper), Hua et al 2016]
Pipeline: Short Text → Parsing → Conceptualization (term clustering by is-A, head/modifier analysis, concept filtering by co-occurrence, concept orthogonalization) → Concept Vector [(c1, p1), (c2, p2), (c3, p3), ...], supported by a semantic (is-A) network and a concept co-occurrence network
Example "ipad apple": is-A lookup gives apple → {fruit, company, food, product, ...} and ipad → {product, device, ...}; co-occurrence filtering keeps the concepts that co-occur with ipad's concepts (company, brand, product, device, ...) and drops the fruit/food sense

Mining Lexical Relationships [Wang et al 2015b]
• Lexical knowledge is represented by probabilities over terms, roles, and concepts (e: instance, t: term, c: concept, z: role):
  p(z | t), e.g. p(verb | watch), p(instance | watch)
  p(c | t, z), e.g. p(movie | watch, verb)
  p(c | e) = p(c | t, z = instance), e.g. p(movie | harry potter), p(book | harry potter)
Example "watch harry potter": watch → verb; harry potter → product, book, movie

Understanding Queries [Wang et al 2015b]

• Goal: rank the concepts and find argmax_c p(c | t, q)
Approach: generate all possible segmentations of the query, build an online subgraph of the offline semantic network, and run random walk with restart [Sun et al 2005] on the online subgraph
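A compact sketch of random walk with restart on a toy online subgraph; the adjacency matrix, restart vector, and restart probability are assumptions rather than the paper's actual graph construction.

```python
# Rank concept nodes by their random-walk-with-restart score when the walk
# restarts at the query's term nodes.
import numpy as np

def rwr(A, restart, c=0.15, iters=100):
    P = A / np.maximum(A.sum(axis=0, keepdims=True), 1e-12)   # column-normalize
    r = restart.copy()
    for _ in range(iters):
        r = (1 - c) * P @ r + c * restart
    return r

# toy subgraph: nodes 0-1 are query terms, nodes 2-4 are candidate concepts
A = np.array([[0, 1, 1, 1, 0],
              [1, 0, 1, 0, 1],
              [1, 1, 0, 0, 0],
              [1, 0, 0, 0, 0],
              [0, 1, 0, 0, 0]], dtype=float)
restart = np.array([0.5, 0.5, 0, 0, 0])     # restart mass on the query terms
print(rwr(A, restart)[2:])                  # scores used to rank the concepts
```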

Short Text Understanding

• How to segment this short text?
• What does this short text mean (its intent, senses, or concepts)?
• What are the relations among terms in the short text?
• How to calculate the similarity between short texts?

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

• Example: "popular smart cover iphone 5s"
• Definitions:
  • Head: acts to name the general (semantic) category to which the whole short text belongs; usually the head is the intent of the short text
    • "smart cover": the intent of the query
  • Constraints: distinguish this member from other members of the same category
    • "iphone 5s": limits the type of the head
  • Non-Constraint Modifiers (a.k.a. Pure Modifiers): subjective modifiers which can be dropped without changing the intent
    • "popular": subjective, can be neglected

Non-Constraint Modifiers Mining: Construct Modifier Networks
Edges between modifiers and heads form a Modifier Network
Concept Hierarchy Tree in the "Country" domain: Country → {Asian country, Developed country, Western country} → {Western developed country, Top western country, Large Asian country, Large developed country, Top developed country, ...}
Modifier Network in the "Country" domain: nodes {Country, Asian, Western, Developed, Large, Top}; in this case "Large" and "Top" are pure modifiers

Non-Constraint Modifiers Mining: Betweenness Centrality
• Betweenness centrality is a measure of a node's centrality in a network
• Betweenness of node v is defined as g(v) = Σ_{s ≠ v ≠ t} σ_st(v) / σ_st
• where σ_st is the total number of shortest paths from node s to node t, and σ_st(v) is the number of those paths that pass through v
• Normalization & Aggregation: a pure modifier should have a low aggregated (normalized) betweenness centrality score PMS(t)
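A small sketch of scoring pure-modifier candidates with networkx; the toy modifier network loosely mimics the "country" example above, and its edges are assumptions.

```python
# A pure modifier should have low betweenness centrality in each domain's
# modifier network; PMS(t) aggregates the normalized score over domains.
import networkx as nx

domain_networks = {
    "country": [("Country", "Asian"), ("Country", "Developed"), ("Country", "Western"),
                ("Asian", "Western"), ("Developed", "Western"),
                ("Large", "Country"), ("Top", "Country")],
}

def pms(term):
    scores = []
    for edges in domain_networks.values():
        G = nx.Graph(edges)
        if term in G:
            scores.append(nx.betweenness_centrality(G, normalized=True)[term])
    return sum(scores) / len(scores) if scores else None

for t in ["Western", "Large", "Top"]:
    print(t, pms(t))   # "Large" and "Top" score 0.0 -> pure-modifier candidates
```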

Head-Constraints Mining [Wang et al 2014b]

• A term can be a head in some cases and a constraint in others
• E.g., "Seattle hotel" (head: hotel, constraint: Seattle) vs. "Seattle hotel job" (head: job, constraints: Seattle, hotel)

Head-Constraints Mining: Acquiring Concept Patterns
Building the concept pattern dictionary from query logs:
1. Extract preposition patterns (A for B, A of B, A with B, A in B, A on B, A at B, ...) from the query log to get entity pairs: entity1 (head), entity2 (constraint)
2. Conceptualization: entity1 → {concept11, concept12, concept13, concept14}, entity2 → {concept21, concept22, concept23}
3. Emit concept patterns for each pair: (concept11, concept21), (concept11, concept22), (concept11, concept23), ... into the Concept Pattern Dictionary
Example queries: "cover for iphone 6s", "battery for sony a7r", "wicked on broadway"
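A rough sketch of this dictionary-building step; the regular expression and the tiny is-A lookup table are stand-ins for the real preposition pattern set and knowledge base.

```python
# Split queries on preposition patterns ("A for B", "A on B", ...), conceptualize
# both sides, and count the resulting (head-concept, constraint-concept) pairs.
import re
from collections import Counter

PATTERN = re.compile(r"^(?P<A>.+?)\s+(for|of|with|in|on|at)\s+(?P<B>.+)$")
isa = {"cover": ["accessory"], "iphone 6s": ["phone"],
       "battery": ["accessory"], "sony a7r": ["camera"],
       "wicked": ["show"], "broadway": ["venue"]}          # assumed is-A knowledge

def concept_patterns(query):
    m = PATTERN.match(query)
    if not m:
        return []
    heads, constraints = isa.get(m.group("A"), []), isa.get(m.group("B"), [])
    return [(h, c) for h in heads for c in constraints]

dictionary = Counter()
for q in ["cover for iphone 6s", "battery for sony a7r", "wicked on broadway"]:
    dictionary.update(concept_patterns(q))
print(dictionary)   # Counter({('accessory', 'phone'): 1, ('accessory', 'camera'): 1, ('show', 'venue'): 1})
```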

Why Concepts Can't Be Too General
• It may cause too many concept pattern conflicts: we can't distinguish head and modifier for general concept pairs
Derived concept pattern Head = device, Modifier = company; supporting entity pairs: (iphone 4, verizon), (modem, comcast), (wireless router, comcast), (iphone 4, tmobile)
Derived concept pattern Head = company, Modifier = device; supporting entity pairs: (amazon books, kindle), (netflix, touchpad), (skype, windows phone), (netflix, ps3)
→ Conflict: the two derived patterns contradict each other

Why Concepts Can't Be Too Specific
• It may generate concepts with little coverage: the concept regresses to the entity
• Large storage space: up to (millions × millions) of patterns
Examples of overly specific patterns: (device, largest desktop OS vendor), (device, largest software development company), (device, largest global corporation), (device, latest windows and office provider), ...

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept Patterns (cluster size | sum of cluster score | head/constraint | score):
615 | 2114691 | breed/state | 357298460224501
296 | 7752357 | game/platform | 627403476771856
153 | 3466804 | accessory/vehicle | 53393705094809
70 | 118259 | browser/platform | 132612807637391
22 | 1010993 | requirement/school | 271407526294823
34 | 9489159 | drug/disease | 154602405333541
42 | 8992995 | cosmetic/skin condition | 814659415003929
16 | 7421599 | job/city | 27903732555528
32 | 710403 | accessory/phone | 246513830851194
18 | 6692376 | software/platform | 210126322725878
20 | 6444603 | test/disease | 239774028397537
27 | 5994205 | clothes/breed | 98773996282851
19 | 5913545 | penalty/crime | 200544192793488
25 | 5848804 | tax/state | 240081818612579
16 | 5465424 | sauce/meat | 183592863621553
18 | 4809389 | credit card/country | 142919087972152
14 | 4730792 | food/holiday | 14554140330924
11 | 4536199 | mod/game | 257163856882439
29 | 4350954 | garment/sport | 471533326845442
23 | 3994886 | career information/professional | 732726483731257
15 | 386065 | song/instrument | 128189481818135
18 | 378213 | bait/fish | 780426514113169
22 | 3722948 | study guide/book | 508339765053921
19 | 3408953 | plugins/browser | 550326072627126
14 | 3305753 | recipe/meat | 882779863422951
18 | 3214226 | currency/country | 110825444188352
13 | 3180272 | lens/camera | 186081673263957
9 | 316973 | decoration/holiday | 130055844126533
16 | 314875 | food/animal | 7338544366514

Example cluster (Head = game, Modifier = platform); member patterns include: game/platform, game/device, video game/platform, game/console, game/game pad, game/gaming platform
Example instances (Game = head, Platform = modifier): (angry birds, android), (angry birds, ios), (angry birds, windows 10), ...

Head-Modifier Relationship Detection
• Train a classifier on (head-embedding, modifier-embedding)
• Training data:
  • Positive: (head, modifier)
  • Negative: (modifier, head)
• Precision >= 0.9, Recall >= 0.9
• Disadvantage: not interpretable
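A sketch of the classifier setup described above, using scikit-learn; the random vectors stand in for real word embeddings, and the training pairs are taken from the examples on these slides.

```python
# Concatenate (head-embedding, modifier-embedding) as features; swapping the
# pair order yields the negative examples.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
emb = {w: rng.normal(size=50) for w in
       ["game", "platform", "accessory", "vehicle", "angry birds", "android"]}

pairs = [("game", "platform"), ("accessory", "vehicle"), ("angry birds", "android")]
X, y = [], []
for head, modifier in pairs:
    X.append(np.concatenate([emb[head], emb[modifier]])); y.append(1)   # (head, modifier)
    X.append(np.concatenate([emb[modifier], emb[head]])); y.append(0)   # (modifier, head)

clf = LogisticRegression(max_iter=1000).fit(np.array(X), y)
print(clf.predict([np.concatenate([emb["game"], emb["platform"]])]))    # expect [1]
```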

Syntactic Parsing based on HM

• Head-modifier information alone is incomplete:
  • Prepositions and other function words are not covered
  • Neither are relations within a noun compound, e.g., "el capitan macbook pro"
• Why not train a parser for web queries?

Syntactic Parsing of Short Texts [Sun et al EMNLP 2016]
• Syntactic structures are valuable for short text understanding
• Examples:
Challenges: Short Texts Lack Grammatical Signals
• Lack of function words and word order
  • "toys queries" has ambiguous intent
  • "distance earth moon" has clear intent, but many equivalent forms: "earth moon distance", "earth distance moon", ...

Challenges: Syntactic Parsing of Queries
• No standard
• No ground-truth
Why is syntactic parsing of queries even a legitimate problem?

Derive Syntax from Semantics [Sun et al 2016]
• Query: "thai food houston"
• Take a sentence that was clicked for the query
• Project its dependency structure onto the query

A Treebank for Short Texts

• Given a query q
• Given q's clicked sentence s
• Parse each s
• Project dependencies from s to q
• Aggregate the dependencies

Algorithm of Projection

Result Examples

Results

• Random queries: QueryParser UAS 0.83, LAS 0.75; Stanford UAS 0.72, LAS 0.64
• Queries with no function words: QueryParser UAS 0.82, LAS 0.73; Stanford UAS 0.70, LAS 0.61
• Queries with function words: QueryParser UAS 0.90, LAS 0.85; Stanford UAS 0.86, LAS 0.80

Short Text Understanding

• How to segment this short text?
• What does this short text mean (its intent, senses, or concepts)?
• What are the relations among terms in the short text?
• How to calculate the similarity between short texts?

Short Text Similarity Using Word Embedding [Kenter and Rijke 2015]
• Measures similarity between two short texts or sentences
• Basic idea: word-by-word comparison using embedding vectors
• Uses a saliency-weighted semantic graph to compute the similarity

Short Text Similarity Using Word Embedding [Kenter and Rijke 2015]
Features acquired: bins of all edges, bins of max edges
Similarity measurement (inspired by BM25), where w ranges over the terms of short text s_l, sem(w, s_s) is the semantic similarity between term w and short text s_s, |s_s| is the length of s_s, and avgsl is the average short-text length:
f_sts(s_l, s_s) = Σ_{w ∈ s_l} IDF(w) · sem(w, s_s) · (k1 + 1) / ( sem(w, s_s) + k1 · (1 − b + b · |s_s| / avgsl) )
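A sketch of evaluating this feature with word embeddings, where sem(w, s_s) is approximated by the maximum cosine similarity between w and the words of s_s; the embedding table, IDF values, and BM25 constants below are placeholders.

```python
import numpy as np

emb = {"thai": np.array([1.0, 0.2]), "food": np.array([0.1, 1.0]),
       "cuisine": np.array([0.2, 0.9]), "houston": np.array([0.7, 0.7])}
idf = {"thai": 2.0, "food": 1.2, "houston": 1.5}

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def sem(w, s_s):
    # semantic similarity of term w to the short text s_s
    return max(cos(emb[w], emb[v]) for v in s_s)

def f_sts(s_l, s_s, k1=1.2, b=0.75, avgsl=3.0):
    score = 0.0
    for w in s_l:
        s = sem(w, s_s)
        score += idf.get(w, 1.0) * s * (k1 + 1) / (s + k1 * (1 - b + b * len(s_s) / avgsl))
    return score

print(f_sts(["thai", "food", "houston"], ["thai", "cuisine", "houston"]))
```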

From the Concept View

From the Concept View [Wang et al 2015a]

Short Text 1 → Concept Vector 1 [(c1, score1), (c2, score2), ...]; Short Text 2 → Concept Vector 2 [(c1', score1'), (c2', score2'), ...]; similarity is computed between the two bags of concepts
Each short text is parsed and conceptualized (term clustering by is-A, concept filtering by co-occurrence, head/modifier analysis, concept orthogonalization), supported by a semantic network and a co-occurrence network
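A minimal sketch of the bag-of-concepts comparison; the concept vectors are illustrative conceptualizer outputs, not real system output.

```python
# Conceptualize each short text into a weighted concept vector and compare
# the two vectors with cosine similarity.
import math

def cosine(u, v):
    dot = sum(u.get(k, 0.0) * v.get(k, 0.0) for k in set(u) | set(v))
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

cv1 = {"movie": 0.6, "fantasy novel": 0.3, "character": 0.1}   # "watch harry potter"
cv2 = {"movie": 0.5, "tv series": 0.4, "actor": 0.1}           # "watch game of thrones"
print(cosine(cv1, cv2))   # overlap on "movie" drives the similarity
```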

Outline

• Knowledge Bases
• Explicit Representation Models
• Applications

Applications

• Explicit short text understanding benefits a lot of application scenarios:
  • Ads/search semantic match
  • Definition mining
  • Query recommendation
  • Web table understanding
  • Semantic search
  • ...

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

[Bar charts: keyword-selection performance by decile (Decile 4 through Decile 10), shown separately for Mainline Ads (y-axis 0.00 to 6.00) and Sidebar Ads (y-axis 0.00 to 0.60)]

Definition Mining [Hao et al 2016]

• Definition scenarios: search engines, QnA, etc.
• Why conceptualization is useful for definition mining. Example: "What is Emphysema?"
  Answer 1: Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year
  Answer 2: Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing
• This sentence has the form of a definition
• Embedding is helpful to some extent, but it also returns high similarity scores for (emphysema, disease) and (emphysema, smoking)
• Conceptualization can provide strong semantics; contextual embedding can also provide semantic similarity beyond is-A

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

Offline: training data → concept weighting → model learning → a concept model per class (Class 1 ... Class i ... Class N)
Online: original short text (e.g., "justin bieber graduates") → entity extraction → conceptualization (using the knowledge base) → concept vector → candidate generation → classification & ranking → output such as <Music, score>

Concept based Short Text Classification and Ranking [Wang et al 2014a]
• Each category (e.g., TV, Music, Movie, ...) is mapped into a concept space built from the article titles/tags in that category, giving category concept weights ω_i, ω_j
• A query is conceptualized into the same concept space (concept scores p_i, p_j) and matched against the category concept vectors

Precision performance on each category [Wang et al 2014a]

Category | BocSTC | LM_ch | SVM | VSM_cosine | LM_d | Entity_ESA
Movie | 0.71 | 0.91 | 0.84 | 0.81 | 0.72 | 0.56
Money | 0.97 | 0.95 | 0.54 | 0.57 | 0.52 | 0.74
Music | 0.97 | 0.90 | 0.88 | 0.73 | 0.68 | 0.58
TV | 0.96 | 0.46 | 0.92 | 0.56 | 0.51 | 0.55

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

• [Stark et al 1998] Michael M Stark and Richard F Riesenfeld. WordNet: An Electronic Lexical Database. Proceedings of the 11th Eurographics Workshop on Rendering, 1998.

• [Banko et al 2007] Michele Banko, Michael J Cafarella, Stephen Soderland, Matt Broadhead and Oren Etzioni. Open Information Extraction from the Web. In IJCAI 2007.

• [Etzioni et al 2011] Oren Etzioni, Anthony Fader, Janara Christensen, Stephen Soderland and Mausam. Open Information Extraction: The Second Generation. In IJCAI, vol. 11, pp. 3-10, 2011.

• [Carlson et al 2010] A Carlson, J Betteridge, B Kisiel, B Settles, E R Hruschka Jr and T M Mitchell. Toward an Architecture for Never-Ending Language Learning. In Proceedings of the Conference on Artificial Intelligence (AAAI), 2010.

• [Wu et al 2012] Wentao Wu, Hongsong Li, Haixun Wang and Kenny Zhu. Probase: A Probabilistic Taxonomy for Text Understanding. In ACM International Conference on Management of Data (SIGMOD), May 2012.

• [Bollacker et al 2008] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, Jamie Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In SIGMOD 2008.

• [Auer et al 2007] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, Zachary G Ives. DBpedia: A Nucleus for a Web of Open Data. In ISWC/ASWC 2007.

References

• [Suchanek et al 2007] Fabian M Suchanek, Gjergji Kasneci, Gerhard Weikum. Yago: a core of semantic knowledge. In WWW 2007.

• [Wu et al 2015] Sen Wu, Ce Zhang, Christopher De Sa, Jaeho Shin, Feiran Wang and C Ré. Incremental Knowledge Base Construction Using DeepDive. In VLDB 2015.

• [Navigli et al 2012] R Navigli and S Ponzetto. BabelNet: The Automatic Construction, Evaluation and Application of a Wide-Coverage Multilingual Semantic Network. In Artificial Intelligence, 2012.

• [Nastase et al 2010] Vivi Nastase, Michael Strube, Benjamin Börschinger, Cäcilia Zirn and Anas Elghafari. WikiNet: A very large scale multi-lingual concept network. In LREC 2010.

• [Speer et al 2013] Robert Speer and Catherine Havasi. ConceptNet 5: A large semantic network for relational knowledge. The People's Web Meets NLP, Springer Berlin Heidelberg, 2013.

• [Hua et al 2016] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng and Xiaofang Zhou. "Understand Short Texts by Harvesting and Analyzing Semantic Knowledge." IEEE Transactions on Knowledge and Data Engineering (TKDE), 2016.

• [Hua et al 2015] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng and Xiaofang Zhou. Short Text Understanding Through Lexical-Semantic Analysis. In International Conference on Data Engineering (ICDE), April 2015.

References

• [Li et al 2013] Peipei Li, Haixun Wang, Kenny Q Zhu, Zhongyuan Wang and Xindong Wu. Computing term similarity by large probabilistic isA knowledge. In ACM International Conference on Information and Knowledge Management (CIKM), 2013.

• [Li et al 2015] Peipei Li, Haixun Wang, Kenny Q Zhu, Zhongyuan Wang, Xue-Gang Hu and Xindong Wu. A Large Probabilistic Semantic Network based Approach to Compute Term Similarity. In IEEE Transactions on Knowledge and Data Engineering (TKDE), 27(10): 2604-2617, 2015.

• [Rosch et al 1976] Eleanor Rosch, Carolyn B Mervis, Wayne D Gray, David M Johnson and Penny Boyes-Braem. Basic objects in natural categories. Cognitive Psychology, 8(3): 382–439, 1976.

• [Manning and Schutze 1999] Christopher D Manning and Hinrich Schutze. Foundations of Statistical Natural Language Processing. Volume 999, MIT Press, 1999.

• [Wang et al 2015b] Zhongyuan Wang, Kejun Zhao, Haixun Wang, Xiaofeng Meng and Ji-Rong Wen. Query Understanding through Knowledge-Based Conceptualization. In IJCAI, July 2015.

• [Bergsma et al 2007] Shane Bergsma, Qin Iris Wang. Learning Noun Phrase Query Segmentation. In EMNLP-CoNLL 2007: 819-826.

• [Tan et al 2008] Bin Tan, Fuchun Peng. Unsupervised query segmentation using generative language models and Wikipedia. In WWW 2008: 347-356.

References

• [Li et al 2011] Yanen Li, Bo-June Paul Hsu, ChengXiang Zhai, Kuansan Wang. Unsupervised query segmentation using clickthrough for information retrieval. In SIGIR 2011: 285-294.

• [Guo et al 2009] Jiafeng Guo, Gu Xu, Xueqi Cheng, Hang Li. Named entity recognition in query. In SIGIR 2009: 267-274.

• [Pantel et al 2012] Patrick Pantel, Thomas Lin, Michael Gamon. Mining Entity Types from Query Logs via User Intent Modeling. In ACL 2012: 563-571.

• [Joshi et al 2014] Mandar Joshi, Uma Sawant, Soumen Chakrabarti. Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries. In EMNLP 2014: 1104-1114.

• [Sawant et al 2013] Uma Sawant, Soumen Chakrabarti. Learning joint query interpretation and response ranking. In WWW 2013: 1099-1110.

• [Wang et al 2014b] Zhongyuan Wang, Haixun Wang and Zhirui Hu. Head Modifier and Constraint Detection in Short Texts. In International Conference on Data Engineering (ICDE), 2014.

• [Sun et al 2016] Xiangyan Sun, Haixun Wang, Yanghua Xiao, Zhongyuan Wang. Syntactic Parsing of Web Queries. In EMNLP 2016.

References

• [Kenter and Rijke 2015] Tom Kenter and Maarten de Rijke. Short text similarity with word embeddings. In CIKM 2015.

• [Wang et al 2015a] Zhongyuan Wang, Haixun Wang, Ji-Rong Wen and Yanghua Xiao. An Inference Approach to Basic Level of Categorization. In CIKM, October 2015.

• [Hao et al 2016] Zehui Hao, Zhongyuan Wang, Xiaofeng Meng and Jun Yan. Combining Language Model with Conceptualization for Definition Ranking. MSR Technical Report, 2016.

• [Wang et al 2014a] Fang Wang, Zhongyuan Wang, Zhoujun Li and Ji-Rong Wen. Concept-based Short Text Classification and Ranking. In CIKM 2014.

• [Wang et al 2012a] Jingjing Wang, Haixun Wang, Zhongyuan Wang and Kenny Zhu. Understanding Tables on the Web. In International Conference on Conceptual Modeling, October 2012.

• [Wang et al 2012b] Yue Wang, Hongsong Li, Haixun Wang and Kenny Zhu. Toward Topic Search on the Web. In International Conference on Conceptual Modeling, October 2012.


bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledgeltigt in SIGMOD 2008

bull [ Auer et al 2007 ] Soumlren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Boumlrschinger Caumlcilia Zirn and AnasElghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and XindongWu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny BoyesBraem Basic objects in natural categories Cognitive psychology 8(3)382ndash439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny ZhuToward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 27: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Single Instance Understanding

• Is this instance ambiguous?

• What are its basic-level concepts?

• What are its similar instances?

Word Ambiguity

• Word sense disambiguation relies on dictionaries (WordNet)

"Take a seat on this chair."

"The chair of the Math Department."

Instance Ambiguity

• Instance sense disambiguation: extra knowledge is needed

"I have an apple pie for lunch."

"He bought an apple ipad."

Here "apple" is a proper noun.

Ambiguity [Hua et al 2016]

• Many instances are ambiguous

• Intuition: ambiguous instances have multiple senses

| short text | instance | sense |
| population china | china | country |
| glass vs china | china | fragile item |
| pear apple | apple | fruit |
| microsoft apple | apple | company |
| read harry potter | harry potter | book |
| watch harry potter | harry potter | movie |
| age of harry potter | harry potter | character |

Pre-definition for Ambiguity (1): Sense [Hua et al 2016]

• What is a Sense in semantic networks?

• A sense is a hierarchy of concept clusters

(Figure: example sense hierarchies, e.g. region -> {country, state, city}; creature -> {animal, predator}; crop/food -> {fruit, vegetable, meat}; instance example: Germany)

Pre-definition for Ambiguity (2): Concept Cluster [Li et al 2013, Li et al 2015]

• What is a Concept Cluster (CL)?

• Cluster similar concepts into a concept cluster using a K-Means-like approach (k-Medoids)

Fruit cluster: Fruit, Fresh fruit, Juice, Tropical fruit, Berry, Exotic fruit, Seasonal fruit, Fruit juice, Citrus fruit, Soft fruit, Dry fruit, Wild fruit, Local fruit, …

Company cluster: Company, Client, Firm, Manufacturer, Corporation, large company, Rival, Giant, big company, local company, large corporation, international company, …

Definitions of Instance Ambiguity [Hua et al 2016]

• 3 levels of instance ambiguity

• Level 0: unambiguous
  • Contains only 1 sense
  • E.g. dog (animal), beijing (city), potato (vegetable)

• Level 1: unambiguous and ambiguous both make sense
  • Contains 2 or more senses, but these senses are related
  • E.g. google (company & search engine), french (language & country), truck (vehicle & public transport service)

• Level 2: ambiguous
  • Contains 2 or more senses, and the senses are very different from each other
  • E.g. apple (fruit & company), jaguar (animal & company), python (animal & language)

Ambiguity Score

• Use the top-2 senses to calculate the ambiguity score:

score = 0, if level = 0
score = w(s2|e) / w(s1|e) · (1 − similarity(s1, s2)), if level = 1
score = 1 + w(sc2|e) / w(sc1|e) · (1 − similarity(sc1, sc2)), if level = 2

Denote the top-2 senses as s1 and s2, and the top-2 sense clusters as sc1 and sc2. The similarity of two sense clusters is the maximum similarity of their senses:

similarity(sc1, sc2) = max similarity(s_i ∈ sc1, s_j ∈ sc2)

For an entity e, the weight (popularity) of a sense s_i is the sum of the weights of its concept clusters:

w(s_i|e) = w(H_i|e) = Σ_{CL_j ∈ H_i} P(CL_j|e)

For an entity e, the weight (popularity) of a sense cluster sc_i is the sum of the weights of its senses:

w(sc_i|e) = Σ_{s_j ∈ sc_i} w(s_j|e)
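A minimal sketch of how this score could be computed, assuming the sense weights w(s|e) and a sense-similarity function are already available (e.g. derived from a probabilistic isA network); all function and variable names are illustrative:

```python
# Minimal sketch of the ambiguity score above. Assumes the sense weights w(s|e)
# and a sense-similarity function are already available (e.g. derived from a
# probabilistic isA network); all names here are illustrative.

def sense_cluster_similarity(sc1, sc2, sense_sim):
    """similarity(sc1, sc2) = max similarity over pairs of senses."""
    return max(sense_sim(s1, s2) for s1 in sc1 for s2 in sc2)

def ambiguity_score(level, sense_weights, clusters=None, sense_sim=None):
    """level: 0, 1 or 2; sense_weights: dict sense -> w(s|e);
    clusters: for level 2, a list of sense clusters (each a list of senses)."""
    if level == 0:
        return 0.0
    if level == 1:
        (s1, w1), (s2, w2) = sorted(sense_weights.items(),
                                    key=lambda kv: kv[1], reverse=True)[:2]
        return (w2 / w1) * (1.0 - sense_sim(s1, s2))
    # level == 2: w(sc|e) is the sum of the weights of the senses in the cluster
    cluster_weights = [(sc, sum(sense_weights[s] for s in sc)) for sc in clusters]
    (sc1, w1), (sc2, w2) = sorted(cluster_weights,
                                  key=lambda kv: kv[1], reverse=True)[:2]
    return 1.0 + (w2 / w1) * (1.0 - sense_cluster_similarity(sc1, sc2, sense_sim))

# "apple" with two distant sense clusters (weights taken from the later example
# slide); a cluster similarity of about 0.19 reproduces its score of ~1.41.
w = {"fruit, food, tree": 0.537, "company, brand": 0.271}
print(ambiguity_score(2, w,
                      clusters=[["fruit, food, tree"], ["company, brand"]],
                      sense_sim=lambda a, b: 0.19))   # -> ~1.41
```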

Examples

• Level 0
  • california (country, state, city, region, institution): 0.943
  • fruit (food, product, snack, carbs, crop): 0.827
  • alcohol (substance, drug, solvent, food, addiction): 0.523
  • computer (device, product, electronics, technology, appliance): 0.537
  • coffee (beverage, product, food, crop, stimulant): 0.73
  • potato (vegetable, food, crop, carbs, product): 0.896
  • bean (food, vegetable, crop, legume, carbs): 0.801

Examples (cont.)

• Level 1
  • nike, score = 0.034
    • company, store: 0.861
    • brand: 0.035
    • shoe, product: 0.033
  • twitter, score = 0.035
    • website, tool: 0.612
    • network: 0.165
    • application: 0.033
    • company: 0.031
  • facebook, score = 0.037
    • website, tool: 0.595
    • network: 0.17
    • company: 0.053
    • application: 0.029
  • yahoo, score = 0.38
    • search engine: 0.457
    • company, provider, account: 0.281
    • website: 0.0656
  • google, score = 0.507
    • search engine: 0.46
    • company, provider, organization: 0.377
    • website: 0.0449

Examples (cont.)

• Level 2
  • jordan, score = 1.02
    • country, state, company, regime: 0.92
    • shoe: 0.02
  • fox, score = 1.09
    • animal, predator, species: 0.74
    • network: 0.064
    • company: 0.035
  • puma, score = 1.15
    • brand, company, shoe: 0.655
    • species, cat: 0.116
  • gold, score = 1.21
    • metal, material, mineral resource, mineral: 0.62
    • color: 0.128

Examples (cont.)

• Level 2
  • soap, score = 1.22
    • product, toiletry, substance: 0.49
    • technology, industry standard: 0.11
  • silver, score = 1.24
    • metal, material, mineral resource, mineral: 0.638
    • color: 0.156
  • python, score = 1.29
    • language: 0.667
    • snake, animal, reptile, skin: 0.193
  • apple, score = 1.41
    • fruit, food, tree: 0.537
    • company, brand: 0.271

Single Instance

• Is this instance ambiguous?

• What are its basic-level concepts?

• What are its similar instances?

A Concept View of "Microsoft"

(Figure: "Microsoft" is linked to concepts of different granularity: company, software company, international company, technology leader, largest desktop OS vendor, …)

Basic-level Conceptualization (BLC) [Rosch et al 1976]

(Figure: example instances KFC and BMW with their basic-level concepts)

bull Naive approachesbull Typicality an important measure for understanding the

relationship between an object and its concept

bull Pointwise Mutual Information (PMI) a common measure of the strength of association between two terms

bird

Naive Approach 1 Typicality

P(robin|bird) gt P(penguin|bird)ldquorobinrdquo is a more typical bird than a ldquopenguinrdquo

country

SeychellesUSA

P(USA|country) gt P(Seychelles|country)ldquoUSArdquo is a more typical country than ldquoSeychellesrdquo

penguinrobin

Using Typicality for BLC

bull Associate each isA relationship (119890 is 119888) with typicality scores 119875 119890 119888 and 119875 119888 119890

119875 119890 119888 =119899 119888 119890

119899 119888119875(119888|119890) =

119899 119888 119890

119899(119890)

bull P(e|c) indicates how typical (or popular) e is in the given concept c

bull P(c|e) indicates how typical (or popular) the concept c is given e

bull However

Microsoft

largest desktop OS vendorcompanyhigh typicality p(c|e) high typicality p(e|c)

Naive Approach 2: PMI [Manning and Schutze 1999]

• Pointwise mutual information (PMI) is a measure of association used in information theory and statistics

• Consider using the PMI between concept c and instance e to find the basic-level concepts:

PMI(e, c) = log [ P(e, c) / (P(e) P(c)) ] = log P(e|c) − log P(e)

• However:
  • In basic level of categorization we are interested in finding a concept for a given e, which means P(e) is a constant
  • Thus ranking by PMI(e, c) is the same as ranking by P(e|c)

Using Rep(e, c) for BLC [Wang et al 2015b]

• The measure Rep(e, c) = P(c|e) · P(e|c) means:
  • Given e, c should be its typical concept (shortest distance)
  • Given c, e should be its typical instance (shortest distance)

• (Relation to PMI) Taking the logarithm of the scoring function:

log Rep(e, c) = log [ P(c|e) · P(e|c) ] = log [ P(e, c)/P(e) · P(e, c)/P(c) ] = log [ P(e, c)² / (P(e) P(c)) ] = PMI(e, c) + log P(e, c)

i.e. Rep corresponds to the PMI² measure.

• (Relation to Commute Time) The commute time between an instance e and a concept c is

Time(e, c) = Σ_{k=1}^{∞} 2k · P_k(e, c) = Σ_{k=1}^{T} 2k · P_k(e, c) + Σ_{k=T+1}^{∞} 2k · P_k(e, c)
           ≥ Σ_{k=1}^{T} 2k · P_k(e, c) + 2(T+1) · (1 − Σ_{k=1}^{T} P_k(e, c)) = 4 − 2 · Rep(e, c)

so maximizing Rep(e, c) is a process of finding the concept nodes that have the shortest expected distance to e.
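A small sketch of Rep-based ranking, assuming raw isA co-occurrence counts n(c, e) are available; the counts below are made-up toy values, not real Probase statistics:

```python
# Minimal sketch of scoring basic-level concepts with Rep(e, c) = P(c|e) * P(e|c),
# assuming raw isA co-occurrence counts n(c, e) from a Probase-style taxonomy.
from collections import defaultdict

counts = {  # n(c, e) -- toy values
    ("company", "microsoft"): 9000,
    ("software company", "microsoft"): 7000,
    ("largest desktop os vendor", "microsoft"): 120,
    ("company", "apple"): 8000,
    ("fruit", "apple"): 6000,
}

n_c = defaultdict(int)   # n(c) = sum over instances
n_e = defaultdict(int)   # n(e) = sum over concepts
for (c, e), n in counts.items():
    n_c[c] += n
    n_e[e] += n

def rep(e, c):
    n_ce = counts.get((c, e), 0)
    return (n_ce / n_c[c]) * (n_ce / n_e[e])   # P(e|c) * P(c|e)

def basic_level_concepts(e, k=3):
    cands = [c for (c, ee) in counts if ee == e]
    return sorted(cands, key=lambda c: rep(e, c), reverse=True)[:k]

print(basic_level_concepts("microsoft"))
# A very general concept has high P(c|e) but low P(e|c); a very specific one the
# opposite; Rep favors the balance (here "software company" ranks first).
```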

Evaluations on Different Measures for BLC

Precision / NDCG at k = 1, 2, 3, 5, 10, 15, 20 for each measure and smoothing setting.

Table 1:

No smoothing
MI(e): 0.769 0.692 0.705 0.685 0.719 0.705 0.690
PMI3(e): 0.885 0.769 0.756 0.800 0.754 0.733 0.721
NPMI(e): 0.692 0.692 0.667 0.638 0.627 0.610 0.610
Typicality P(c|e): 0.462 0.577 0.603 0.577 0.569 0.564 0.556
Typicality P(e|c): 0.500 0.462 0.526 0.523 0.523 0.510 0.521
Rep(e): 0.846 0.865 0.872 0.862 0.758 0.731 0.719

Smoothing = 0.001
MI(e): 0.577 0.615 0.628 0.600 0.612 0.605 0.592
PMI3(e): 0.731 0.673 0.692 0.654 0.669 0.644 0.623
NPMI(e): 0.923 0.827 0.769 0.746 0.731 0.695 0.671
Typicality P(c|e): 0.462 0.577 0.603 0.577 0.569 0.564 0.554
Typicality P(e|c): 0.885 0.865 0.872 0.831 0.785 0.741 0.704
Rep(e): 0.846 0.731 0.718 0.723 0.700 0.669 0.638

Smoothing = 0.0001
MI(e): 0.615 0.615 0.654 0.608 0.635 0.628 0.612
PMI3(e): 0.846 0.731 0.731 0.715 0.723 0.685 0.677
NPMI(e): 0.885 0.904 0.885 0.869 0.823 0.777 0.752
Typicality P(c|e): 0.462 0.577 0.603 0.577 0.569 0.564 0.556
Typicality P(e|c): 0.885 0.904 0.910 0.877 0.831 0.813 0.777
Rep(e): 0.923 0.846 0.833 0.815 0.781 0.736 0.719

Smoothing = 1e-5
MI(e): 0.615 0.635 0.667 0.662 0.677 0.656 0.646
PMI3(e): 0.885 0.769 0.744 0.777 0.758 0.731 0.710
NPMI(e): 0.885 0.846 0.872 0.869 0.831 0.810 0.787
Typicality P(c|e): 0.462 0.577 0.603 0.577 0.569 0.564 0.556
Typicality P(e|c): 0.769 0.808 0.846 0.823 0.808 0.782 0.765
Rep(e): 0.885 0.904 0.872 0.862 0.812 0.800 0.767

Smoothing = 1e-6
MI(e): 0.769 0.673 0.705 0.677 0.700 0.692 0.679
PMI3(e): 0.885 0.769 0.756 0.785 0.773 0.726 0.723
NPMI(e): 0.885 0.846 0.821 0.815 0.750 0.726 0.719
Typicality P(c|e): 0.462 0.577 0.603 0.577 0.569 0.564 0.556
Typicality P(e|c): 0.538 0.615 0.615 0.615 0.608 0.613 0.615
Rep(e): 0.846 0.885 0.897 0.877 0.788 0.777 0.765

Smoothing = 1e-7
MI(e): 0.769 0.692 0.705 0.685 0.719 0.703 0.688
PMI3(e): 0.885 0.769 0.756 0.792 0.758 0.736 0.725
NPMI(e): 0.769 0.750 0.718 0.700 0.650 0.641 0.633
Typicality P(c|e): 0.462 0.577 0.603 0.577 0.569 0.564 0.556
Typicality P(e|c): 0.500 0.481 0.526 0.523 0.531 0.523 0.523
Rep(e): 0.846 0.865 0.872 0.854 0.765 0.749 0.733

Table 2:

No smoothing
MI(e): 0.516 0.531 0.519 0.531 0.562 0.574 0.594
PMI3(e): 0.725 0.664 0.652 0.660 0.628 0.631 0.646
NPMI(e): 0.599 0.597 0.579 0.554 0.540 0.539 0.549
Typicality P(c|e): 0.297 0.380 0.409 0.422 0.438 0.446 0.461
Typicality P(e|c): 0.401 0.386 0.396 0.398 0.401 0.410 0.428
Rep(e): 0.758 0.771 0.745 0.723 0.656 0.647 0.661

Smoothing = 1e-3
MI(e): 0.374 0.414 0.441 0.448 0.473 0.481 0.495
PMI3(e): 0.484 0.511 0.509 0.502 0.519 0.525 0.533
NPMI(e): 0.692 0.652 0.607 0.603 0.585 0.585 0.592
Typicality P(c|e): 0.297 0.380 0.409 0.422 0.438 0.446 0.460
Typicality P(e|c): 0.703 0.697 0.704 0.681 0.637 0.628 0.626
Rep(e): 0.621 0.580 0.554 0.561 0.554 0.555 0.559

Smoothing = 1e-4
MI(e): 0.407 0.430 0.458 0.462 0.492 0.503 0.512
PMI3(e): 0.648 0.604 0.579 0.575 0.578 0.576 0.590
NPMI(e): 0.747 0.777 0.761 0.737 0.700 0.685 0.688
Typicality P(c|e): 0.297 0.380 0.409 0.422 0.438 0.446 0.461
Typicality P(e|c): 0.791 0.795 0.802 0.767 0.738 0.729 0.724
Rep(e): 0.758 0.714 0.711 0.689 0.653 0.636 0.653

Smoothing = 1e-5
MI(e): 0.429 0.465 0.478 0.501 0.517 0.528 0.545
PMI3(e): 0.725 0.647 0.642 0.642 0.627 0.624 0.638
NPMI(e): 0.813 0.779 0.778 0.765 0.730 0.723 0.729
Typicality P(c|e): 0.297 0.380 0.409 0.422 0.438 0.446 0.461
Typicality P(e|c): 0.709 0.728 0.735 0.722 0.702 0.696 0.703
Rep(e): 0.791 0.787 0.762 0.739 0.707 0.703 0.706

Smoothing = 1e-6
MI(e): 0.516 0.510 0.515 0.526 0.546 0.563 0.579
PMI3(e): 0.725 0.655 0.651 0.654 0.641 0.631 0.649
NPMI(e): 0.791 0.766 0.732 0.728 0.673 0.659 0.668
Typicality P(c|e): 0.297 0.380 0.409 0.422 0.438 0.446 0.461
Typicality P(e|c): 0.495 0.516 0.520 0.508 0.512 0.521 0.540
Rep(e): 0.758 0.784 0.767 0.755 0.691 0.686 0.694

Smoothing = 1e-7
MI(e): 0.516 0.531 0.519 0.530 0.562 0.571 0.592
PMI3(e): 0.725 0.664 0.652 0.658 0.630 0.631 0.647
NPMI(e): 0.670 0.655 0.633 0.604 0.575 0.570 0.581
Typicality P(c|e): 0.297 0.380 0.409 0.422 0.438 0.446 0.461
Typicality P(e|c): 0.423 0.421 0.415 0.407 0.414 0.424 0.438
Rep(e): 0.758 0.771 0.745 0.725 0.663 0.661 0.668

Single Instance

• Is this instance ambiguous?

• What are its basic-level concepts?

• What are its similar instances?

What is the Semantic Similarity?

• Are the following instance pairs similar?
  • <apple, microsoft>
  • <apple, pear>
  • <apple, fruit>
  • <apple, food>
  • <apple, ipad>
  • <car, journey>

Approaches on Term Similarity

• Categories of approaches for semantic similarity
  • String based approach
  • Knowledge based approach
    • Use preexisting thesauri, taxonomies or encyclopedias, such as WordNet
  • Corpus based approach
    • Use contexts of terms extracted from web pages, web search snippets or other text repositories
  • Embedding based approach
    • Introduced in detail in "Part 3: Implicit Understanding"

Approaches on Term Similarity (2)

• Categories

(Figure: map of representative approaches. Categories: string based approaches; knowledge based approaches (WordNet), including path length / lexical chain-based, information content-based, and graph learning algorithm based; corpus based approaches, including snippet / search based. Representative works shown: Rada 1989, Resnik 1995, Jcn 1997, Lin 1998, Hirst 1998, Ban 2002, HunTray 2005, Chen 2006, Alvarez 2007, Do 2009, Agirre 2010, Bol 2011, Sánchez 2011; state-of-the-art approaches are highlighted.)

Term Similarity Using Semantic Networks [Li et al 2013, Li et al 2015]

• Framework

Step 1 (Type Checking): split input term pairs <t1, t2> into concept pairs, entity pairs, and concept-entity pairs.

Step 2 (Context Representation, Vector):
  • Concept pairs: collect entity-distribution contexts, cluster the concepts, and build cluster context vectors Cx(t1) and Cy(t2).
  • Entity pairs: collect concept-distribution contexts and build context vectors T(t1) and T(t2).
  • Concept-entity pairs: collect the concepts of the entity term t1, cluster them, and select the top-k concepts cx from each cluster Ci(t1).

Step 3 (Context Similarity):
  • Concept pairs: similarity = max over (x, y) of Cosine(Cx(t1), Cy(t2)).
  • Entity pairs: similarity = Cosine(T(t1), T(t2)).
  • Concept-entity pairs: for each pair <t2, cx>, take the maximum sim(t2, cx) as the similarity of <t1, t2>.

An example [Li et al 2013, Li et al 2015]

For example: <banana, pear>

Step 1 (Type Checking): <banana, pear> is an entity pair.

Step 2 (Context Representation, Vector): collect the concept contexts T(banana) and T(pear).

Step 3 (Context Similarity): Similarity Evaluation Cosine(T(t1), T(t2)) = 0.916

Examples

| Term 1 | Term 2 | Similarity |
| lunch | dinner | 0.9987 |
| tiger | jaguar | 0.9792 |
| car | plane | 0.9711 |
| television | radio | 0.9465 |
| technology company | microsoft | 0.8208 |
| high impact sport | competitive sport | 0.8155 |
| employer | large corporation | 0.5353 |
| fruit | green pepper | 0.2949 |
| travel | meal | 0.0426 |
| music | lunch | 0.0116 |
| alcoholic beverage | sports equipment | 0.0314 |
| company | table tennis | 0.0003 |

Complete results: http://adapt.seiee.sjtu.edu.cn/similarity/SimCompleteResults.pdf
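A minimal sketch of the entity-pair branch above (cosine over concept-distribution contexts); the concept distributions are illustrative values, not real Probase statistics:

```python
# Minimal sketch: represent each entity by its concept distribution (its isA
# context) and compare the distributions with cosine similarity.
import math

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

banana = {"fruit": 0.55, "food": 0.25, "crop": 0.15, "snack": 0.05}
pear   = {"fruit": 0.60, "food": 0.22, "crop": 0.13, "tree": 0.05}
apple  = {"fruit": 0.40, "company": 0.35, "food": 0.15, "brand": 0.10}

print(cosine(banana, pear))   # high: both are dominated by fruit/food concepts
print(cosine(apple, pear))    # lower: apple's company/brand senses pull it away
```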

Statistics of Search Queries

(Figure: distribution of query length. (a) By traffic: 1 term 44%, 2 terms 29%, 3 terms 17%, 4 terms 7%, 5 terms 2%, more than 5 terms 1%. (b) By # of distinct queries: 1 term 10%, 2 terms 26%, 3 terms 34%, 4 terms 19%, 5 terms 7%, more than 5 terms 4%. Example queries: Pokémon Go, Microsoft HoloLens.)

(Figure: distribution of the number of instances per query, by traffic and by # of distinct queries: 1 instance, 2 instances, …, more than 5 instances.)

If the short text has context for the instance…

• python tutorial
• dangerous python
• moon earth distance
• …

Short Text Understanding

• How to segment this short text?

• What does this short text mean (its intent, senses or concepts)?

• What are the relations among terms in the short text?

• How to calculate the similarity between short texts?

Supervised Segmentation [Bergsma et al 2007]

• Problem: divide a query into semantic units

• Approach: turn segmentation into position-based binary classification

Example query: "two man power saw"
Candidate segmentations: [two man] [power saw]; [two] [man] [power saw]; [two] [man power] [saw]

Input: a query and its positions
Output: the decision for making a segmentation at each position

Supervised Segmentation

• Features
  • Decision-boundary features: e.g. indicator tokens ("the", "is"), POS tags in the query, forward/backward position features
  • Statistical features: e.g. mutual information between the left and right parts ("bank loan | amortization schedule")
  • Context features: e.g. context information such as "female" occurring with "bus driver"
  • Dependency features: e.g. "female" depends on "bus driver"

Supervised Segmentation

• Segmentation overview

Input query: "two man power saw"
Learning features at each position (two | man | power | saw)
SVM classifier
Output: segmentation decision for each position (yes/no)
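A minimal sketch of this position-based classification, using a linear SVM and toy stand-in features (not the exact feature set of Bergsma & Wang 2007); the PMI table and the tiny training set below are hypothetical:

```python
# Each gap between adjacent tokens becomes one example, labelled 1 if a segment
# boundary should be inserted there.
from sklearn.svm import LinearSVC

def gap_features(left, right, pmi):
    return [
        pmi.get((left, right), 0.0),         # association of the two tokens
        1.0 if left == "the" else 0.0,       # indicator-token feature
        float(len(left)), float(len(right))  # crude lexical features
    ]

# Hypothetical PMI table and labelled gaps from annotated queries
pmi = {("power", "saw"): 3.1, ("two", "man"): 1.8, ("man", "power"): -0.4}
train = [
    (("two", "man"), 0),     # "two man" stays together
    (("man", "power"), 1),   # boundary between "two man" and "power saw"
    (("power", "saw"), 0),
]
X = [gap_features(l, r, pmi) for (l, r), _ in train]
y = [label for _, label in train]
clf = LinearSVC().fit(X, y)

query = ["two", "man", "power", "saw"]
gaps = [gap_features(query[i], query[i + 1], pmi) for i in range(len(query) - 1)]
print(clf.predict(gaps))   # 1 marks a segmentation decision at that position
```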

Unsupervised Segmentation [Tan et al 2008]

• Unsupervised learning for query segmentation

Probability of a generated segmentation S for query Q (unigram model over segments):

P(S|Q) = P(s1) P(s2|s1) … P(sm|s1 s2 … s_{m-1}) ≈ Π_{s_i ∈ S} P(s_i)

A split point is a valid segment boundary if and only if the pointwise mutual information between the two segments resulting from the split is negative:

MI(s_k, s_{k+1}) = log [ P_c([s_k s_{k+1}]) / (P_c(s_k) · P_c(s_{k+1})) ] < 0

Example: "new york times subscription"

log [ P_c([new york]) / (P_c(new) · P_c(york)) ] > 0, so there is no segment boundary between "new" and "york".

Unsupervised Segmentation

• Find the top-k segmentations: dynamic programming

• Using EM optimization on the fly

Input: query w1 w2 … wn (words in a query), concept probability distribution
Output: top-k segmentations with the highest likelihood
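A minimal sketch of unigram-model segmentation by dynamic programming (without the EM step); the segment probabilities below are illustrative values standing in for corpus/Wikipedia statistics:

```python
import math
from functools import lru_cache

seg_prob = {                      # P(s) for candidate segments (toy values)
    "new": 1e-3, "york": 8e-4, "times": 1e-3, "subscription": 5e-4,
    "new york": 6e-4, "new york times": 4e-4, "york times": 1e-6,
    "times subscription": 1e-7,
}

def best_segmentation(words):
    n = len(words)

    @lru_cache(maxsize=None)
    def solve(i):
        """Return (log-prob, segments) of the best segmentation of words[i:]."""
        if i == n:
            return 0.0, ()
        best_lp, best_segs = -math.inf, None
        for j in range(i + 1, n + 1):
            seg = " ".join(words[i:j])
            p = seg_prob.get(seg)
            if p is None:
                continue
            rest_lp, rest_segs = solve(j)
            if rest_segs is None:
                continue
            lp = math.log(p) + rest_lp
            if lp > best_lp:
                best_lp, best_segs = lp, (seg,) + rest_segs
        return best_lp, best_segs

    return solve(0)

print(best_segmentation(("new", "york", "times", "subscription")))
# -> the best split is ('new york times', 'subscription') under these toy values
```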

Exploit Click-through [Li et al 2011]

• Motivation
  • Probabilistic query segmentation
  • Use click-through data (query -> clicked URL -> document)

Input query: "bank of america online banking"

Output: top-3 segmentations
  [bank of america] [online banking]    0.502
  [bank of america online banking]      0.428
  [bank of] [america] [online banking]  0.001

Exploit Click-through

• Segmentation Model: an interpolated model combining global information with click-through information

Query: [credit card] [bank of america]

Clicked html documents:
1. bank of america credit cards contact us overview
2. secured visa credit card from bank of america
3. credit cards overview: find the right bank of america credit card for you

Short Text Understanding

• How to segment this short text?

• What does this short text mean (its intent, senses or concepts)?

• What are the relations among terms in the short text?

• How to calculate the similarity between short texts?

Sense Changes with Different Context

watch harry potter -> Movie; read harry potter -> Book; age harry potter -> Character; harry potter walkthrough -> Game

Entity Recognition in Query [Guo et al 2009]

• Motivation: detect the named entity in a short text and categorize it

Example: the single-named-entity query "harry potter walkthrough" is represented as a triple <e, t, c> = ("harry potter", "walkthrough", "game"), where e is the (ambiguous) entity term, t the context term(s), and c the class of the entity.

Entity Recognition in Query

• Probabilistic Generative Model

Goal: given a query q, find the triple <e, t, c> that maximizes the probability.

Probability to generate a triple: Pr(e, t, c) = Pr(e) Pr(c|e) Pr(t|c), assuming the context depends only on the class (e.g. "walkthrough" depends only on "game", not on "harry potter").

Objective: given query q, find the maximizing triple. The problem then becomes how to estimate Pr(e), Pr(c|e) and Pr(t|c).

Entity Recognition in Query

• Probability Estimation by Learning

Learning objective: max Π_{i=1}^{N} P(e_i, t_i, c_i)

Challenge: it is difficult as well as time consuming to manually assign class labels to named entities in queries.

Build a training set T = {(e_i, t_i)} and view c_i as a hidden variable. The new learning problem is

max Π_{i=1}^{N} P(e_i, t_i) = max Π_{i=1}^{N} Σ_c P(e_i) P(c|e_i) P(t_i|c)

solved with the topic model WS-LDA.
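A minimal sketch of scoring query interpretations with Pr(e, t, c) = P(e) P(c|e) P(t|c); all probability tables are illustrative toy values, not learned WS-LDA output:

```python
p_e = {"harry potter": 0.6, "kung fu panda": 0.4}
p_c_given_e = {
    "harry potter": {"movie": 0.45, "book": 0.35, "game": 0.20},
    "kung fu panda": {"movie": 0.70, "game": 0.30},
}
p_t_given_c = {
    "movie": {"trailer": 0.3, "cast": 0.2, "walkthrough": 0.01},
    "book": {"author": 0.3, "summary": 0.2, "walkthrough": 0.01},
    "game": {"walkthrough": 0.5, "cheats": 0.3},
}

def interpret(query):
    best = None
    for e in p_e:                         # candidate entity mention
        if e not in query:
            continue
        t = query.replace(e, "").strip()  # remaining context terms
        for c, pce in p_c_given_e[e].items():
            score = p_e[e] * pce * p_t_given_c[c].get(t, 1e-6)
            if best is None or score > best[0]:
                best = (score, (e, t, c))
    return best

print(interpret("harry potter walkthrough"))
# -> the triple ("harry potter", "walkthrough", "game") gets the highest score
```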

Signal from Click [Pantel et al 2012]

• Motivation: predict the entity type in Web search, modeling the entity, user intent, context and clicks

• Query type distribution (73 types)

• A generative model over entities and entity types

Signal from Click

• Joint Model for Prediction

(Figure: plate diagram of the joint generative model over queries Q with variables t, τ, i, n, c and distributions θ (distribution over types), φ (word distribution), ω (host distribution), plus intent and entity distributions. For each query: pick a type, pick an entity, pick an intent, pick context words, and pick a click.)

Telegraphic Query Interpretation [Sawant et al 2013, Joshi et al 2014]

• Entity-seeking telegraphic queries

• Interpretation = Segmentation + Annotation

• Combine a knowledge base (accuracy) with a large corpus (recall)

Example: query "Germany capital" -> result entity "Berlin"

Joint Interpretation and Ranking [Sawant et al 2013, Joshi et al 2014]

• Overview: the telegraphic query is matched against an annotated corpus; two models for interpretation and ranking (a generative model and a discriminative model) produce the ranked answer entities e1, e2, e3, … as output.

Joint Interpretation and Ranking [Sawant et al 2013]

• Generative Model, based on probabilistic language models

(Figure: for the query q = "losing team baseball world series 1998", the answer entity E = San Diego Padres, of type T = "major league baseball team", generates the type hint "baseball team" and context matchers such as "lost 1998 world series", supported by corpus evidence like "Padres have been to two World Series, losing in 1984 and 1998"; a switch variable Z chooses between the type model and the context model for each query word. Borrowed from U. Sawant (2013).)

Joint Interpretation and Ranking [Sawant et al 2013]

• Discriminative Model, based on max-margin discriminative learning

(Figure: for the query "losing team baseball world series 1998", candidate interpretations pair an answer entity with a target type t, e.g. the correct entity San_Diego_Padres with t = "baseball team" versus the incorrect entity 1998_World_Series with t = "series".)

Telegraphic Query Interpretation [Joshi et al 2014]

• Queries seek answer entities (e2)

• They contain (query) entities (e1), target types (t2), relations (r) and selectors (s)

Query: "dave navarro first band"
  e1 = dave navarro, r = band, t2 = band, s = first
  e1 = dave navarro, r = -, t2 = band, s = first

Query: "spider automobile company"
  e1 = spider, r = automobile company, t2 = automobile company, s = -
  e1 = -, r = automobile company, t2 = company, s = spider

(Borrowed from M. Joshi (2014))

Improved Generative Model

• The generative model of [Sawant et al 2013] is extended in [Joshi et al 2014] to also consider e1 (in q) and r

Improved Discriminative Model

• The discriminative model of [Sawant et al 2013] is likewise extended in [Joshi et al 2014] to consider e1 (in q) and r

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

• Input: a short text, e.g. "wanna watch eagles band"

• Output: a semantic interpretation, e.g. watch[verb] eagles[entity](band) band[concept]

• Three steps in understanding a short text:

Step 1 (Text Segmentation): divide the text into a sequence of terms in the vocabulary ("watch", "eagles", "band")

Step 2 (Type Detection): determine the best type of each term (watch[verb] eagles[entity] band[concept])

Step 3 (Concept Labeling): infer the best concept of each entity within context (watch[verb] eagles[entity](band) band[concept])

Text segmentation

• Observations
  • Mutual Exclusion: terms containing the same word mutually exclude each other
  • Mutual Reinforcement: related terms mutually reinforce each other

• Build a Candidate Term Graph (CTG)

(Figure: CTGs for "vacation april in paris" and "watch harry potter"; nodes are candidate terms such as "vacation", "april", "paris", "april in paris" or "watch", "harry potter", connected by mutual-exclusion edges and weighted reinforcement edges.)

Find best segmentation

• Best segmentation = the sub-graph of the CTG which
  • is a complete graph (clique)
  • contains no mutual exclusion
  • has 100% word coverage (except for stopwords)
  • has the largest average edge weight

(Figure: in the CTGs for "vacation april in paris" and "watch harry potter", the best segmentation is the maximal clique with the largest average edge weight.)
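A minimal sketch of this selection step on a tiny candidate term graph; the candidate terms and edge weights loosely follow the figure and are otherwise illustrative:

```python
# Enumerate subsets of candidate terms, keep the conflict-free ones that cover
# all non-stopwords, and take the subset with the largest average edge weight.
from itertools import combinations

terms = {  # candidate term -> set of covered words
    "vacation": {"vacation"},
    "april": {"april"},
    "paris": {"paris"},
    "april in paris": {"april", "paris"},
}
weight = {  # symmetric relatedness (reinforcement) between candidate terms
    ("vacation", "april in paris"): 0.047,
    ("vacation", "april"): 0.029,
    ("vacation", "paris"): 0.041,
    ("april", "paris"): 0.005,
}
stopwords = {"in"}
query_words = {"vacation", "april", "paris"}

def w(a, b):
    return weight.get((a, b), weight.get((b, a), 0.0))

def best_segmentation():
    best, best_score = None, -1.0
    cands = list(terms)
    for r in range(1, len(cands) + 1):
        for subset in combinations(cands, r):
            covered = set().union(*(terms[t] for t in subset))
            overlap = sum(len(terms[t]) for t in subset) != len(covered)
            if overlap or covered != query_words - stopwords:
                continue   # mutual exclusion violated or incomplete coverage
            pairs = list(combinations(subset, 2))
            score = sum(w(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
            if score > best_score:
                best, best_score = subset, score
    return best, best_score

print(best_segmentation())
# -> (('vacation', 'april in paris'), 0.047) with these toy weights
```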

Type Detection

• Pairwise Model
  • Find the best typed-term for each term so that the Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight

(Figure: for "watch free movie", candidate typed-terms are watch[v], watch[e], watch[c]; free[adj], free[v]; movie[c], movie[e].)

Concept Labeling

• Entity disambiguation is the most important task of concept labeling
  • Filter / re-rank the original concept cluster vector

• Weighted-Vote
  • The final score of each concept cluster is a combination of its original score and the support from context, using concept co-occurrence
  • E.g. watch harry potter -> movie; read harry potter -> book
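A minimal sketch of this weighted-vote re-ranking; the original scores, co-occurrence values and mixing weight are illustrative:

```python
# Re-rank an entity's concept clusters by combining the original
# conceptualization score with co-occurrence support from the context.
def weighted_vote(concept_scores, context_terms, cooccur, alpha=0.5):
    """final(c) = alpha * original(c) + (1 - alpha) * support(c | context)."""
    reranked = {}
    for c, score in concept_scores.items():
        support = sum(cooccur.get((t, c), 0.0) for t in context_terms)
        support /= max(len(context_terms), 1)
        reranked[c] = alpha * score + (1 - alpha) * support
    return sorted(reranked.items(), key=lambda kv: kv[1], reverse=True)

harry_potter = {"movie": 0.4, "book": 0.4, "character": 0.2}
cooccur = {("watch", "movie"): 0.9, ("watch", "book"): 0.1,
           ("read", "book"): 0.9, ("read", "movie"): 0.2}

print(weighted_vote(harry_potter, ["watch"], cooccur))  # movie moves to the top
print(weighted_vote(harry_potter, ["read"], cooccur))   # book moves to the top
```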

Example of Entity Disambiguation [Hua et al 2015 (ICDE Best Paper), Hua et al 2016]

(Figure: conceptualization of "ipad apple". Parsing against the semantic (isA) network and the co-occurrence network, followed by term clustering by isA, concept filtering by co-occurrence, head/modifier analysis and concept orthogonalization, yields a concept vector [(c1, p1), (c2, p2), (c3, p3), …]. Candidate isA concepts of "apple" (fruit, company, food, product, …) are filtered by their co-occurrence with the concepts of "ipad" (product, device, brand, …), so the company-like concepts survive.)

Mining Lexical Relationships [Wang et al 2015b]

• Lexical knowledge represented by probabilities

(Figure: "watch harry potter", with concepts product, book, movie for "harry potter"; labelled probabilities include p(verb | watch), p(instance | watch), p(movie | harry potter), p(book | harry potter), and p(movie | watch, verb).)

• Notation: e = instance, t = term, c = concept, z = role. The model uses (1) p(c | t, z), (2) p(z | t), and (3) p(c | e) = p(c | t, z = instance).

Understanding Queries [Wang et al 2015b]

• Goal: rank the concepts and find argmax_c p(c | t, q)

• The query and all its possible segmentations are linked to the offline semantic network; concepts are then ranked by random walk with restart [Sun et al 2005] on the online subgraph
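A minimal sketch of random walk with restart on a small term/concept graph, used here to rank concepts for a query; the graph, edge weights and restart vector are illustrative:

```python
import numpy as np

nodes = ["watch", "harry potter", "movie", "book", "verb"]
adj = np.array([
    # from: watch, harry potter, movie, book, verb
    [0.0, 0.2, 0.3, 0.1, 1.0],   # edges into "watch"
    [0.2, 0.0, 0.4, 0.4, 0.0],   # edges into "harry potter"
    [0.4, 0.4, 0.0, 0.0, 0.0],   # edges into "movie"
    [0.1, 0.4, 0.0, 0.0, 0.0],   # edges into "book"
    [0.3, 0.0, 0.0, 0.0, 0.0],   # edges into "verb"
])
P = adj / adj.sum(axis=0, keepdims=True)         # column-stochastic transitions

restart = np.array([0.5, 0.5, 0.0, 0.0, 0.0])    # restart on the query terms
r, c = restart.copy(), 0.15                      # c = restart probability
for _ in range(100):                             # power iteration
    r = (1 - c) * P @ r + c * restart

for name, score in sorted(zip(nodes, r), key=lambda x: -x[1]):
    print(f"{name:14s} {score:.3f}")             # "movie" outranks "book" here
```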

Short Text Understanding

• How to segment this short text?

• What does this short text mean (its intent, senses or concepts)?

• What are the relations among terms in the short text?

• How to calculate the similarity between short texts?

Head, Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

• Example: "popular smart cover iphone 5s"

• Definitions
  • Head: acts to name the general (semantic) category to which the whole short text belongs; usually the head is the intent of the short text
    • "smart cover": the intent of the query
  • Constraints: distinguish this member from other members of the same category
    • "iphone 5s": limits the type of the head
  • Non-Constraint Modifiers (aka Pure Modifiers): subjective modifiers which can be dropped without changing the intent
    • "popular": subjective, can be neglected

Non-Constraint Modifiers Mining: Construct Modifier Networks

(Figure: a concept hierarchy tree in the "Country" domain (country, Asian country, developed country, Western country, western developed country, large developed country, large Asian country, top developed country, top western country, …) is transformed into a modifier network whose nodes are the modifiers (Asian, Developed, Western, Large, Top) and whose edges connect modifiers that co-occur in concepts. In this case, "Large" and "Top" are pure modifiers.)

Non-Constraint Modifiers Mining: Betweenness Centrality

• Betweenness centrality is a measure of a node's centrality in a network

• The betweenness of node v is defined as

g(v) = Σ_{s ≠ v ≠ t} σ_st(v) / σ_st

where σ_st is the total number of shortest paths from node s to node t, and σ_st(v) is the number of those paths that pass through v

• Normalization & aggregation: a pure modifier should have a low aggregated betweenness centrality score PMS(t)
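A minimal sketch of scoring modifiers by betweenness centrality on a toy modifier network (using networkx); the edges are illustrative, not mined from a real taxonomy:

```python
import networkx as nx

G = nx.Graph()
# modifiers that co-occur inside concepts of the "country" domain
G.add_edges_from([
    ("asian", "developed"), ("western", "developed"), ("asian", "western"),
    ("large", "developed"),   # "large developed country"
    ("top", "western"),       # "top western country"
])

pms = nx.betweenness_centrality(G, normalized=True)
for modifier, score in sorted(pms.items(), key=lambda kv: kv[1]):
    print(f"{modifier:10s} {score:.3f}")   # lowest scores: pure modifier candidates
```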

Head-Constraints Mining [Wang et al 2014b]

• A term can be a head sometimes and be a constraint in some other cases
  • E.g. in "seattle hotel", "hotel" is the head and "seattle" is a constraint; in "seattle hotel job", "job" is the head while "seattle" and "hotel" are constraints

Head-Constraints Mining: Acquiring Concept Patterns

Building the concept pattern dictionary from query logs:

1. Extract preposition patterns from the query log: A for B, A of B, A with B, A in B, A on B, A at B, … (e.g. "cover for iphone 6s", "battery for sony a7r", "wicked on broadway")
2. Get entity pairs (entity1 = head, entity2 = constraint) for each preposition
3. Conceptualize both entities (concept11, concept12, concept13, … / concept21, concept22, concept23, …)
4. Generate concept patterns (concept11, concept21), (concept11, concept22), (concept11, concept23), … and store them in the concept pattern dictionary

Why Concepts Can't Be Too General

• It may cause too many concept pattern conflicts: we can't distinguish head and modifier for general concept pairs

Derived concept pattern: device (head) / company (modifier)
Supporting entity pairs: (iphone 4, verizon), (modem, comcast), (wireless router, comcast), (iphone 4, tmobile)

Derived concept pattern: company (head) / device (modifier)
Supporting entity pairs: (amazon books, kindle), (netflix, touchpad), (skype, windows phone), (netflix, ps3)

Conflict!

Why Concepts Can't Be Too Specific

• It may generate concepts with little coverage
  • The concept regresses to the entity
  • Large storage space: up to (million × million) patterns
  • E.g. device / largest desktop OS vendor, device / largest software development company, device / largest global corporation, device / latest windows and office provider, …

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept Patterns

| Cluster size | Sum of cluster score | Head / Constraint | Score |
| 615 | 2114691 | breed / state | 357298460224501 |
| 296 | 7752357 | game / platform | 627403476771856 |
| 153 | 3466804 | accessory / vehicle | 53393705094809 |
| 70 | 118259 | browser / platform | 132612807637391 |
| 22 | 1010993 | requirement / school | 271407526294823 |
| 34 | 9489159 | drug / disease | 154602405333541 |
| 42 | 8992995 | cosmetic / skin condition | 814659415003929 |
| 16 | 7421599 | job / city | 27903732555528 |
| 32 | 710403 | accessory / phone | 246513830851194 |
| 18 | 6692376 | software / platform | 210126322725878 |
| 20 | 6444603 | test / disease | 239774028397537 |
| 27 | 5994205 | clothes / breed | 98773996282851 |
| 19 | 5913545 | penalty / crime | 200544192793488 |
| 25 | 5848804 | tax / state | 240081818612579 |
| 16 | 5465424 | sauce / meat | 183592863621553 |
| 18 | 4809389 | credit card / country | 142919087972152 |
| 14 | 4730792 | food / holiday | 14554140330924 |
| 11 | 4536199 | mod / game | 257163856882439 |
| 29 | 4350954 | garment / sport | 471533326845442 |
| 23 | 3994886 | career information / professional | 732726483731257 |
| 15 | 386065 | song / instrument | 128189481818135 |
| 18 | 378213 | bait / fish | 780426514113169 |
| 22 | 3722948 | study guide / book | 508339765053921 |
| 19 | 3408953 | plugins / browser | 550326072627126 |
| 14 | 3305753 | recipe / meat | 882779863422951 |
| 18 | 3214226 | currency / country | 110825444188352 |
| 13 | 3180272 | lens / camera | 186081673263957 |
| 9 | 316973 | decoration / holiday | 130055844126533 |
| 16 | 314875 | food / animal | 7338544366514 |

The game / platform cluster contains concept patterns such as: game platform, game device, video game platform, game console game pad, game gaming platform.

| Game (Head) | Platform (Modifier) |
| angry birds | android |
| angry birds | ios |
| angry birds | windows 10 |
| … | … |

Detection: Head-Modifier Relationship

• Train a classifier on (head-embedding, modifier-embedding)

• Training data
  • Positive: (head, modifier)
  • Negative: (modifier, head)

• Precision >= 0.9, Recall >= 0.9

• Disadvantage: not interpretable
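A minimal sketch of such a classifier: concatenate the two word embeddings and train a binary classifier, with (head, modifier) order as the positive class and the swapped order as the negative class. The embeddings, word list and training pairs below are toy stand-ins:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
emb = {w: rng.normal(size=50) for w in
       ["cover", "iphone", "game", "android", "recipe", "chicken"]}

pairs = [("cover", "iphone"), ("game", "android"), ("recipe", "chicken")]  # (head, modifier)

def feats(a, b):
    return np.concatenate([emb[a], emb[b]])

X = [feats(h, m) for h, m in pairs] + [feats(m, h) for h, m in pairs]
y = [1] * len(pairs) + [0] * len(pairs)          # 1 = head-modifier order
clf = LogisticRegression(max_iter=1000).fit(X, y)

print(clf.predict([feats("cover", "iphone"), feats("iphone", "cover")]))
# expected: [1 0] -- the model recognizes which word acts as the head
```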

Syntactic Parsing based on HM

• Information is incomplete
  • Prepositions and other function words
  • Within a noun compound: "el capitan macbook pro"

• Why not train a parser for web queries?

Syntactic Parsing of Short Texts [Sun et al EMNLP 2016]

• Syntactic structures are valuable for short text understanding

• Examples

Challenges: Short Texts Lack Grammatical Signals

• Lack of function words and word order
  • "toys queries" has ambiguous intent
  • "distance earth moon" has clear intent, but many equivalent forms: "earth moon distance", "earth distance moon", …

Challenges: Syntactic Parsing of Queries

• No standard

• No ground-truth

Why is syntactic parsing of queries even a legitimate problem?

Derive Syntax from Semantics [Sun et al 2016]

• Query: "thai food houston"

• Clicked sentence

• Project the dependency structure of the clicked sentence onto the query

A Treebank for Short Texts

• Given a query q and q's clicked sentences s:
  • Parse each s
  • Project the dependencies from s to q
  • Aggregate the projected dependencies

Algorithm of Projection

Result Examples

Results

• Random queries: QueryParser UAS 0.83, LAS 0.75; Stanford UAS 0.72, LAS 0.64

• Queries with no function words: QueryParser UAS 0.82, LAS 0.73; Stanford UAS 0.70, LAS 0.61

• Queries with function words: QueryParser UAS 0.90, LAS 0.85; Stanford UAS 0.86, LAS 0.80

Short Text Understanding

• How to segment this short text?

• What does this short text mean (its intent, senses or concepts)?

• What are the relations among terms in the short text?

• How to calculate the similarity between short texts?

Short Text Similarity Using Word Embedding [Kenter and Rijke 2015]

• Measuring similarity between two short texts or sentences

• Basic idea: word-by-word comparison using embedding vectors

• Use a saliency-weighted semantic graph to compute similarity

Features acquired: bins of all edges, bins of max edges

Similarity measurement (inspired by BM25), where s_l and s_s are the two short texts, w ranges over the terms of s_l, and sem(w, s_s) is the semantic similarity of w to the short text s_s:

f_sts(s_l, s_s) = Σ_{w ∈ s_l} IDF(w) · [ sem(w, s_s) · (k1 + 1) ] / [ sem(w, s_s) + k1 · (1 − b + b · |s_s| / avgl) ]
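A minimal sketch of this BM25-inspired similarity: sem(w, s) is taken as the best cosine match of w against the words of s. The 3-d "embeddings", IDF values and constants are hand-crafted toy stand-ins for real word vectors and corpus statistics:

```python
import numpy as np

emb = {
    "cheap":   np.array([1.0, 0.2, 0.0]),
    "low":     np.array([0.9, 0.3, 0.1]),
    "cost":    np.array([0.8, 0.4, 0.0]),
    "flights": np.array([0.1, 1.0, 0.0]),
    "airline": np.array([0.2, 0.9, 0.1]),
    "tickets": np.array([0.3, 0.8, 0.2]),
    "recipe":  np.array([0.0, 0.1, 1.0]),
}
idf = {w: 1.0 for w in emb}
k1, b, avgl = 1.2, 0.75, 3.0      # BM25-style constants (toy values)

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def sem(w, short_text):
    """Semantic match of word w against short text s_s."""
    return max(cos(emb[w], emb[x]) for x in short_text)

def f_sts(s_l, s_s):
    score = 0.0
    for w in s_l:
        s = sem(w, s_s)
        score += idf[w] * (s * (k1 + 1)) / (s + k1 * (1 - b + b * len(s_s) / avgl))
    return score

print(f_sts(["cheap", "flights"], ["low", "cost", "airline", "tickets"]))  # higher
print(f_sts(["cheap", "flights"], ["recipe"]))                             # lower
```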

From the Concept View [Wang et al 2015a]

(Figure: each short text is conceptualized into a bag of concepts using the semantic network and the co-occurrence network; the pipeline is parsing, term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, and concept orthogonalization. Short Text 1 -> Concept Vector 1 [(c1, score1), (c2, score2), …]; Short Text 2 -> Concept Vector 2 [(c1', score1'), (c2', score2'), …]; the similarity is computed between the two concept vectors.)

Outline

• Knowledge Bases

• Explicit Representation Models

• Applications

Applications

• Explicit short text understanding benefits a lot of application scenarios
  • Ads/search semantic match
  • Definition mining
  • Query recommendation
  • Web table understanding
  • Semantic search
  • …

Ads Keyword Selection [Wang et al 2015a]

(Figure: impact of concept-based ads keyword selection by query decile (Decile 4 to Decile 10), for Mainline Ads (left, y-axis 0.00 to 6.00) and Sidebar Ads (right, y-axis 0.00 to 0.60).)

Definition Mining [Hao et al 2016]

• Definition scenarios: search engines, QnA, etc.

• Why is conceptualization useful for definition mining? Example: "What is Emphysema?"

Answer 1: "Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year."

Answer 2: "Emphysema is an incurable, progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing."

• Both sentences have the form of a definition
• Embedding is helpful to some extent, but it also returns a high similarity score for both (emphysema, disease) and (emphysema, smoking)
• Conceptualization can provide strong semantics
• Contextual embedding can also provide semantic similarity beyond Is-A

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

(Figure: offline/online pipeline. Online: the original short text, e.g. "justin bieber graduates", goes through entity extraction, candidates generation, conceptualization against the knowledge base, and concept weighting to form a concept vector, which is then classified and ranked, e.g. <Music, Score>. Offline: model learning builds a concept model for each class (Class 1 … Class i … Class N) from training data.)

Concept based Short Text Classification and Ranking [Wang et al 2014a]

(Figure: categories such as TV, Music, Movie are mapped into a concept space. The article titles/tags in each category are conceptualized into concept vectors p_i, p_j; each category is represented by weighted concept vectors ω_i, ω_j; a query is conceptualized into the same concept space and matched against the category vectors.)

Precision performance on each category [Wang et al 2014a]

| Category | BocSTC | LM_ch | SVM | VSM_cosine | LM_d | Entity_ESA |
| Movie | 0.71 | 0.91 | 0.84 | 0.81 | 0.72 | 0.56 |
| Money | 0.97 | 0.95 | 0.54 | 0.57 | 0.52 | 0.74 |
| Music | 0.97 | 0.90 | 0.88 | 0.73 | 0.68 | 0.58 |
| TV | 0.96 | 0.46 | 0.92 | 0.56 | 0.51 | 0.55 |

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, Jamie Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In SIGMOD 2008

bull [ Auer et al 2007 ] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, Zachary G. Ives. DBpedia: A Nucleus for a Web of Open Data. In ISWC/ASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu, Ce Zhang, Christopher De Sa, Jaeho Shin, Feiran Wang and C. Ré. Incremental Knowledge Base Construction Using DeepDive. In VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase, Michael Strube, Benjamin Börschinger, Cäcilia Zirn and Anas Elghafari. WikiNet: A very large scale multi-lingual concept network. In LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Catherine Havasi. ConceptNet 5: A large semantic network for relational knowledge. The People's Web Meets NLP, Springer Berlin Heidelberg, 2013

bull [ Hua et al 2016 ] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng and Xiaofang Zhou. "Understand Short Texts by Harvesting and Analyzing Semantic Knowledge". IEEE Transactions on Knowledge and Data Engineering (TKDE), 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang, Xue-Gang Hu and Xindong Wu. A Large Probabilistic Semantic Network based Approach to Compute Term Similarity. In IEEE Transactions on Knowledge and Data Engineering (TKDE), 27(10): 2604-2617, 2015

bull [ Rosch et al 1976 ] Eleanor Rosch, Carolyn B. Mervis, Wayne D. Gray, David M. Johnson and Penny Boyes-Braem. Basic objects in natural categories. Cognitive Psychology, 8(3): 382-439, 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke. Short text similarity with word embeddings. In CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang, Hongsong Li, Haixun Wang and Kenny Zhu. Toward Topic Search on the Web. In International Conference on Conceptual Modeling, October 2012

Page 28: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Word Ambiguity bull Word sense disambiguation rely on dictionaries

(WordNet)

Take a seat on this chair

The chair of the Math Department

Instance Ambiguity

bull Instance sense disambiguation extra knowledge needed

I have an apple pie for lunch

He bought an apple ipad

Here ldquoapplerdquo is a proper noun

Ambiguity [Hua et al 2016]

bull Many instances are ambiguous

bull Intuition ambiguous instances have multiple senses

short text instance sense

population china china country

glass vs china china fragile item

pear apple apple fruit

microsoft apple apple company

read harry potter harry potter book

watch harry potter harry potter movie

age of harry potter harry potter character

Pre-definition for Ambiguity (1) Sense [Hua et al 2016]

bull What is a Sense in semantic networksbull A sense as a hierarchy of concept clusters

region

country state city

creature

animal

predator

crop food

fruit vegetable meat

Germany

Pre-definition for Ambiguity (2) Concept Cluster [Li et al 2013 Li et al 2015]

bull What is a Concept Cluster (CL)bull Cluster similar concepts into a concept cluster using K-

Means like approach (k-Medoids)

FruitFresh fruit

JuiceTropical fruit

BerryExotic fruit

Seasonal fruitFruit juiceCitrus fruitSoft fruitDry fruit

Wild fruitLocal fruit

hellip

company

CompanyClientFirm

ManufacturerCorporation

large companyRivalGiant

big companylocal company

large corporationinternational

companyhellip Fruit

Definitions of Instance Ambiguity [Hua et al 2016]

bull 3 levels of instance ambiguitybull Level 0 unambiguous

bull Contains only 1 sensebull Eg dog (animal) beijing (city) potato (vegetable)

bull Level 1 unambiguous and ambiguous both make sensebull Contains 2 or more senses but these senses are relatedbull Eg google (company amp search engine) french (language amp

country) truck(vehicle amp public transport service)

bull Level 2 ambiguous bull Contains 2 or more senses and the senses are very different from

each otherbull Eg apple (fruit amp company) jaguar(animal amp company) python

(animal amp language)

Ambiguity Score

bull Using top-2 senses to calculate the ambiguity score

119904119888119900119903119890 =

0 119897119890119907119890119897 = 0119908 1199042 119890

119908 1199041 119890lowast 1 minus 119904119894119898119894119897119886119903119894119905119910 1199041 1199042 119897119890119907119890119897 = 1

score = 1 +119908(1199041198882|119890)

119908(1199041198881|119890)lowast 1 minus 119904119894119898119894119897119886119903119894119905119910 1199041198881 1199041198882 119897119890119907119890119897 = 2

Denote top-2 senses as 1199041 and 1199042 top-2 sense clusters as 1199041198881 and 1199041198882 Denote similarity of two sense clusters as the maximum similarity of their senses

119904119894119898119894119897119886119903119894119905119910 1199041198881 1199041198882 = 119950119938119961119904119894119898119894119897119886119903119894119905119910(119904119894 isin 1199041198881 119904119895 isin 1199041198882) For an entity 119890 denote the weight (popularity) of a sense 119904119894 as the sum of weights of its concept clusters

119908 119904119894|119890 = 119908 119867119894|119890 =119862119871119895isin119867119894

119875(119862119871119895|119890)

For an entity 119890 denote the weight (popularity) of a sense cluster 119904119888119894 as the sum of weights of its senses

119908 119904119888119894 119890 =119904119895isin119904119888119894

119908(119904119895|119890)

Examples

• Level 0
  • california: country, state, city, region, institution (0.943)
  • fruit: food, product, snack, carbs, crop (0.827)
  • alcohol: substance, drug, solvent, food, addiction (0.523)
  • computer: device, product, electronics, technology, appliance (0.537)
  • coffee: beverage, product, food, crop, stimulant (0.73)
  • potato: vegetable, food, crop, carbs, product (0.896)
  • bean: food, vegetable, crop, legume, carbs (0.801)

Examples (cont.)
• Level 1
  • nike, score = 0.034: company, store (0.861); brand (0.035); shoe, product (0.033)
  • twitter, score = 0.035: website, tool (0.612); network (0.165); application (0.033); company (0.031)
  • facebook, score = 0.037: website, tool (0.595); network (0.17); company (0.053); application (0.029)
  • yahoo, score = 0.38: search engine (0.457); company, provider, account (0.281); website (0.0656)
  • google, score = 0.507: search engine (0.46); company, provider, organization (0.377); website (0.0449)

Examples (cont)

• Level 2
  • jordan, score = 1.02: country, state, company, regime (0.92); shoe (0.02)
  • fox, score = 1.09: animal, predator, species (0.74); network (0.064); company (0.035)
  • puma, score = 1.15: brand, company, shoe (0.655); species, cat (0.116)
  • gold, score = 1.21: metal, material, mineral resource, mineral (0.62); color (0.128)

Examples (cont)

• Level 2
  • soap, score = 1.22: product, toiletry, substance (0.49); technology, industry standard (0.11)
  • silver, score = 1.24: metal, material, mineral resource, mineral (0.638); color (0.156)
  • python, score = 1.29: language (0.667); snake, animal, reptile, skin (0.193)
  • apple, score = 1.41: fruit, food, tree (0.537); company, brand (0.271)

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

A Concept View of ldquoMicrosoftrdquo

company

largest desktop OS vendor

softwarecompany

international company

technology leader

Microsoft

company, largest desktop OS vendor, …

software company

Basic-level Conceptualization (BLC)[Rosch et al 1976]

KFC

BMW

Basic-level conceptualization

How to Make BLC

• Naive approaches
  • Typicality: an important measure for understanding the relationship between an object and its concept
  • Pointwise Mutual Information (PMI): a common measure of the strength of association between two terms

bird

Naive Approach 1 Typicality

P(robin|bird) > P(penguin|bird): "robin" is a more typical bird than "penguin"

country

SeychellesUSA

P(USA|country) > P(Seychelles|country): "USA" is a more typical country than "Seychelles"

penguinrobin

Using Typicality for BLC

• Associate each isA relationship (e isA c) with typicality scores P(e|c) and P(c|e):

  P(e|c) = n(c, e) / n(c)          P(c|e) = n(c, e) / n(e)

  where n(c, e) is the observed isA co-occurrence count of instance e with concept c, n(c) = Σ_e n(c, e), and n(e) = Σ_c n(c, e)

bull P(e|c) indicates how typical (or popular) e is in the given concept c

bull P(c|e) indicates how typical (or popular) the concept c is given e

• However, for "Microsoft": "company" has high typicality P(c|e), while "largest desktop OS vendor" has high typicality P(e|c); neither score alone picks out a basic-level concept such as "software company"

Naive Approach 2 PMI[Manning and Schutze 1999]

bull Pointwise mutual information (PMI) is a measure of association used in information theory and statistics

• Consider using the PMI between concept c and instance e to find the basic-level concepts, as follows:

  PMI(e, c) = log [ P(e, c) / ( P(e) · P(c) ) ] = log P(e|c) − log P(e)

• However:
  • In basic-level categorization we are interested in finding a concept for a given e, which means P(e) is a constant
  • Thus ranking by PMI(e, c) is the same as ranking by P(e|c)

Using Rep(e c) for BLC [Wang et al 2015b]

• The measure Rep(e, c) = P(c|e) · P(e|c) means:
  • Given e, c should be its typical concept (shortest distance)
  • Given c, e should be its typical instance (shortest distance)

• (Relation to PMI) Taking the logarithm of the scoring function:

  log Rep(e, c) = log [ P(c|e) · P(e|c) ]
                = log [ P(e, c)/P(e) · P(e, c)/P(c) ]
                = log [ P(e, c)² / ( P(e) · P(c) ) ]
                = PMI(e, c) + log P(e, c)
                = PMI²(e, c)

• (Relation to Commute Time) The expected commute time between an instance e and a concept c is

  Time(e, c) = Σ_{k=1..∞} 2k · P_k(e, c)
             = Σ_{k=1..T} 2k · P_k(e, c) + Σ_{k=T+1..∞} 2k · P_k(e, c)
             ≥ Σ_{k=1..T} 2k · P_k(e, c) + 2(T+1) · ( 1 − Σ_{k=1..T} P_k(e, c) )
             = 4 − 2 · Rep(e, c)        (taking T = 1, with P_1(e, c) = P(c|e) · P(e|c) = Rep(e, c))

  So ranking by Rep(e, c) is a process of finding concept nodes having the shortest expected distance to e.
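To make the comparison of P(e|c), P(c|e) and Rep(e, c) concrete, here is a small Python sketch; the isA counts below are invented stand-ins for a semantic network such as Probase, not real data.

from collections import defaultdict

# Assumed isA co-occurrence counts n(c, e); all numbers are illustrative.
n = {("company", "microsoft"): 13000,
     ("company", "kfc"): 8000,
     ("company", "bmw"): 12000,
     ("software company", "microsoft"): 9000,
     ("software company", "adobe"): 3000,
     ("largest desktop os vendor", "microsoft"): 100}

n_c = defaultdict(int)   # n(c) = sum over e of n(c, e)
n_e = defaultdict(int)   # n(e) = sum over c of n(c, e)
for (c, e), cnt in n.items():
    n_c[c] += cnt
    n_e[e] += cnt

def p_e_given_c(e, c): return n[(c, e)] / n_c[c]
def p_c_given_e(e, c): return n[(c, e)] / n_e[e]
def rep(e, c):         return p_c_given_e(e, c) * p_e_given_c(e, c)

e = "microsoft"
for c in sorted({c for (c, _) in n}, key=lambda c: rep(e, c), reverse=True):
    print(c, round(rep(e, c), 4))
# With these toy counts, "software company" outranks both the very general
# "company" and the very specific "largest desktop os vendor".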

Evaluations on Different Measures for BLC

Precision@k (k = 1, 2, 3, 5, 10, 15, 20)

No smoothing
  MI(e)              0.769 0.692 0.705 0.685 0.719 0.705 0.690
  PMI3(e)            0.885 0.769 0.756 0.800 0.754 0.733 0.721
  NPMI(e)            0.692 0.692 0.667 0.638 0.627 0.610 0.610
  Typicality P(c|e)  0.462 0.577 0.603 0.577 0.569 0.564 0.556
  Typicality P(e|c)  0.500 0.462 0.526 0.523 0.523 0.510 0.521
  Rep(e)             0.846 0.865 0.872 0.862 0.758 0.731 0.719
Smoothing = 0.001
  MI(e)              0.577 0.615 0.628 0.600 0.612 0.605 0.592
  PMI3(e)            0.731 0.673 0.692 0.654 0.669 0.644 0.623
  NPMI(e)            0.923 0.827 0.769 0.746 0.731 0.695 0.671
  Typicality P(c|e)  0.462 0.577 0.603 0.577 0.569 0.564 0.554
  Typicality P(e|c)  0.885 0.865 0.872 0.831 0.785 0.741 0.704
  Rep(e)             0.846 0.731 0.718 0.723 0.700 0.669 0.638
Smoothing = 0.0001
  MI(e)              0.615 0.615 0.654 0.608 0.635 0.628 0.612
  PMI3(e)            0.846 0.731 0.731 0.715 0.723 0.685 0.677
  NPMI(e)            0.885 0.904 0.885 0.869 0.823 0.777 0.752
  Typicality P(c|e)  0.462 0.577 0.603 0.577 0.569 0.564 0.556
  Typicality P(e|c)  0.885 0.904 0.910 0.877 0.831 0.813 0.777
  Rep(e)             0.923 0.846 0.833 0.815 0.781 0.736 0.719
Smoothing = 1e-5
  MI(e)              0.615 0.635 0.667 0.662 0.677 0.656 0.646
  PMI3(e)            0.885 0.769 0.744 0.777 0.758 0.731 0.710
  NPMI(e)            0.885 0.846 0.872 0.869 0.831 0.810 0.787
  Typicality P(c|e)  0.462 0.577 0.603 0.577 0.569 0.564 0.556
  Typicality P(e|c)  0.769 0.808 0.846 0.823 0.808 0.782 0.765
  Rep(e)             0.885 0.904 0.872 0.862 0.812 0.800 0.767
Smoothing = 1e-6
  MI(e)              0.769 0.673 0.705 0.677 0.700 0.692 0.679
  PMI3(e)            0.885 0.769 0.756 0.785 0.773 0.726 0.723
  NPMI(e)            0.885 0.846 0.821 0.815 0.750 0.726 0.719
  Typicality P(c|e)  0.462 0.577 0.603 0.577 0.569 0.564 0.556
  Typicality P(e|c)  0.538 0.615 0.615 0.615 0.608 0.613 0.615
  Rep(e)             0.846 0.885 0.897 0.877 0.788 0.777 0.765
Smoothing = 1e-7
  MI(e)              0.769 0.692 0.705 0.685 0.719 0.703 0.688
  PMI3(e)            0.885 0.769 0.756 0.792 0.758 0.736 0.725
  NPMI(e)            0.769 0.750 0.718 0.700 0.650 0.641 0.633
  Typicality P(c|e)  0.462 0.577 0.603 0.577 0.569 0.564 0.556
  Typicality P(e|c)  0.500 0.481 0.526 0.523 0.531 0.523 0.523
  Rep(e)             0.846 0.865 0.872 0.854 0.765 0.749 0.733

NDCG@k (k = 1, 2, 3, 5, 10, 15, 20)

No smoothing
  MI(e)              0.516 0.531 0.519 0.531 0.562 0.574 0.594
  PMI3(e)            0.725 0.664 0.652 0.660 0.628 0.631 0.646
  NPMI(e)            0.599 0.597 0.579 0.554 0.540 0.539 0.549
  Typicality P(c|e)  0.297 0.380 0.409 0.422 0.438 0.446 0.461
  Typicality P(e|c)  0.401 0.386 0.396 0.398 0.401 0.410 0.428
  Rep(e)             0.758 0.771 0.745 0.723 0.656 0.647 0.661
Smoothing = 1e-3
  MI(e)              0.374 0.414 0.441 0.448 0.473 0.481 0.495
  PMI3(e)            0.484 0.511 0.509 0.502 0.519 0.525 0.533
  NPMI(e)            0.692 0.652 0.607 0.603 0.585 0.585 0.592
  Typicality P(c|e)  0.297 0.380 0.409 0.422 0.438 0.446 0.460
  Typicality P(e|c)  0.703 0.697 0.704 0.681 0.637 0.628 0.626
  Rep(e)             0.621 0.580 0.554 0.561 0.554 0.555 0.559
Smoothing = 1e-4
  MI(e)              0.407 0.430 0.458 0.462 0.492 0.503 0.512
  PMI3(e)            0.648 0.604 0.579 0.575 0.578 0.576 0.590
  NPMI(e)            0.747 0.777 0.761 0.737 0.700 0.685 0.688
  Typicality P(c|e)  0.297 0.380 0.409 0.422 0.438 0.446 0.461
  Typicality P(e|c)  0.791 0.795 0.802 0.767 0.738 0.729 0.724
  Rep(e)             0.758 0.714 0.711 0.689 0.653 0.636 0.653
Smoothing = 1e-5
  MI(e)              0.429 0.465 0.478 0.501 0.517 0.528 0.545
  PMI3(e)            0.725 0.647 0.642 0.642 0.627 0.624 0.638
  NPMI(e)            0.813 0.779 0.778 0.765 0.730 0.723 0.729
  Typicality P(c|e)  0.297 0.380 0.409 0.422 0.438 0.446 0.461
  Typicality P(e|c)  0.709 0.728 0.735 0.722 0.702 0.696 0.703
  Rep(e)             0.791 0.787 0.762 0.739 0.707 0.703 0.706
Smoothing = 1e-6
  MI(e)              0.516 0.510 0.515 0.526 0.546 0.563 0.579
  PMI3(e)            0.725 0.655 0.651 0.654 0.641 0.631 0.649
  NPMI(e)            0.791 0.766 0.732 0.728 0.673 0.659 0.668
  Typicality P(c|e)  0.297 0.380 0.409 0.422 0.438 0.446 0.461
  Typicality P(e|c)  0.495 0.516 0.520 0.508 0.512 0.521 0.540
  Rep(e)             0.758 0.784 0.767 0.755 0.691 0.686 0.694
Smoothing = 1e-7
  MI(e)              0.516 0.531 0.519 0.530 0.562 0.571 0.592
  PMI3(e)            0.725 0.664 0.652 0.658 0.630 0.631 0.647
  NPMI(e)            0.670 0.655 0.633 0.604 0.575 0.570 0.581
  Typicality P(c|e)  0.297 0.380 0.409 0.422 0.438 0.446 0.461
  Typicality P(e|c)  0.423 0.421 0.415 0.407 0.414 0.424 0.438
  Rep(e)             0.758 0.771 0.745 0.725 0.663 0.661 0.668

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

What is the Semantic Similarity? • Are the following instance pairs similar?

bull ltapple microsoftgt

bull ltapple peargt

bull ltapple fruitgt

bull ltapple foodgt

bull ltapple ipadgt

bull ltcar journeygt

Approaches on Term Similarity

• Categories of approaches for semantic similarity
  • String based approach
  • Knowledge based approach: use preexisting thesauri, taxonomies or encyclopedias such as WordNet
  • Corpus based approach: use contexts of terms extracted from web pages, web search snippets or other text repositories
  • Embedding based approach: introduced in detail in "Part 3: Implicit Understanding"

79

Approaches on Term Similarity (2)

bull Categories

80

[Figure: a taxonomy of term-similarity approaches: string based approaches; knowledge based approaches (WordNet), covering path length / lexical chain-based and information content-based methods; corpus based approaches, covering graph learning algorithm based and snippet search based methods. Representative works shown: Rada 1989, Resnik 1995, Jcn 1997, Lin 1998, Sánchez 2011, Agirre 2010, Alvarez 2007, HunTray 2005, Hirst 1998, Do 2009, Bol 2011, Chen 2006, Ban 2002; state-of-the-art approaches are highlighted.]

bull Framework

83

Term Similarity Using Semantic Networks [Li et al 2013 Li et al 2015]

Input: term pairs <t1, t2>

Step 1: Type Checking. The pair is classified as a concept pair, an entity pair, or a concept-entity pair.

Step 2: Context Representation (vector).
  Concept pair: collect the entity-distribution context of each concept, cluster the concepts, and build cluster context vectors Cx(t1) and Cy(t2).
  Entity pair: collect the concept-distribution context of each entity and build context vectors T(t1) and T(t2).
  Concept-entity pair: collect the concepts of the entity term t1, cluster them, and for each cluster Ci(t1) select the top-k concepts cx.

Step 3: Context Similarity.
  Entity pair: Cosine(T(t1), T(t2)).
  Concept pair: Max over (x, y) of Cosine(Cx(t1), Cy(t2)).
  Concept-entity pair: for each pair <t2, cx>, get max sim(t2, cx) as the similarity of <t1, t2>.

An example [Li et al 2013 Li et al 2015]

For example: <banana, pear>

Step 1: Type Checking → an entity pair
Step 2: Context Representation → concept-distribution context vectors T(banana), T(pear)
Step 3: Context Similarity → Cosine(T(banana), T(pear)) = 0.916
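A minimal sketch of the entity-pair branch (Step 3), assuming the concept-distribution context vectors have already been collected; the distributions below are illustrative, so the printed value only approximates the 0.916 reported above.

from math import sqrt

def cosine(u, v):
    # u, v: {concept: weight} context vectors
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

banana = {"fruit": 0.55, "tropical fruit": 0.25, "food": 0.15, "crop": 0.05}
pear   = {"fruit": 0.60, "seasonal fruit": 0.20, "food": 0.15, "tree": 0.05}
print(round(cosine(banana, pear), 3))   # high similarity, about 0.87 with these toy vectors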

Examples

Term 1               Term 2              Similarity
lunch                dinner              0.9987
tiger                jaguar              0.9792
car                  plane               0.9711
television           radio               0.9465
technology company   microsoft           0.8208
high impact sport    competitive sport   0.8155
employer             large corporation   0.5353
fruit                green pepper        0.2949
travel               meal                0.0426
music                lunch               0.0116
alcoholic beverage   sports equipment    0.0314
company              table tennis        0.0003

Complete results: http://adapt.seiee.sjtu.edu.cn/similarity/SimCompleteResults.pdf

Statistics of Search Queries

(a) By traffic: 1 term 44%, 2 terms 29%, 3 terms 17%, 4 terms 7%, 5 terms 2%, more than 5 terms 1%
(b) By # of distinct queries: 1 term 10%, 2 terms 26%, 3 terms 34%, 4 terms 19%, 5 terms 7%, more than 5 terms 4%

[Figure also shows the distribution of queries by number of instances (1, 2, 3, 4, 5, more than 5); example instances: Pokémon Go, Microsoft HoloLens]

If the short text has context for the instancehellip

• python tutorial  • dangerous python  • moon earth distance  • …

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Supervised Segmentation [Bergsma et al 2007]

bull Problem divide query into semantic units

bull Approach turn segmentation into position-based binary classification

Example Query

Two man power saw

[two man] [power saw]
[two] [man] [power saw]
[two] [man power] [saw]

Input a query and its positions

Output the decision for making segmentation at each position

Supervised Segmentation

• Features
  • Decision boundary features: e.g. indicators such as whether the POS tag at the position is "the"; forward/backward position features
  • Statistical features: mutual information between the left and right parts (e.g. "bank loan | amortization schedule")
  • Context features: context information (e.g. "female bus driver")
  • Dependency features: e.g. "female" depends on "bus driver"

Supervised Segmentation

• Segmentation Overview
  Input query: "two man power saw"
  Each position between words ("two | man | power | saw") is fed, with its learned features, to an SVM classifier
  Output: segmentation decision for each position (yes/no)
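A small sketch of how a query is turned into per-position classification instances; the feature names and PMI values are toy stand-ins, and any off-the-shelf classifier (e.g. an SVM) could consume them.

# Turn segmentation into position-based binary classification instances.
def position_instances(words, pmi):
    instances = []
    for i in range(1, len(words)):                 # one instance per boundary position
        left, right = words[i - 1], words[i]
        feats = {"pmi(left,right)": pmi.get((left, right), 0.0),   # statistical feature
                 "left_is_the": float(left == "the"),              # toy decision-boundary feature
                 "relative_position": i / len(words)}              # position feature
        instances.append(((left, right), feats))
    return instances

pmi = {("two", "man"): 2.1, ("man", "power"): -0.4, ("power", "saw"): 3.5}
for boundary, feats in position_instances("two man power saw".split(), pmi):
    print(boundary, feats)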

Unsupervised Segmentation [Tan et al 2008]

bull Unsupervised learning for query segmentation

Probability of the generated segmentation S for query Q:

  P(S|Q) = P(s1) · P(s2|s1) · … · P(sm|s1 s2 … s(m-1)) ≈ ∏_{si ∈ S} P(si)        (unigram model over the segments s1 … sm)

A split is a valid segment boundary if and only if the pointwise mutual information between the two segments resulting from the split is negative:

  MI(sk, sk+1) = log [ Pc([sk sk+1]) / ( Pc(sk) · Pc(sk+1) ) ] < 0

Example: "new york times subscription"

  log [ Pc([new york]) / ( Pc(new) · Pc(york) ) ] > 0  →  no segment boundary between "new" and "york"
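A minimal sketch of the unigram-model segmentation found by dynamic programming (top-1 only, no EM); the segment probabilities in Pc are invented for illustration.

from math import log

Pc = {"new york": 1e-4, "new york times": 5e-5, "times": 1e-3,
      "new": 2e-3, "york": 1e-4, "subscription": 5e-4}

def best_segmentation(words, default=1e-9):
    n = len(words)
    best = [(0.0, [])] + [(-1e18, None)] * n       # best[i]: (score, segments) for words[:i]
    for i in range(1, n + 1):
        for j in range(i):
            seg = " ".join(words[j:i])
            score = best[j][0] + log(Pc.get(seg, default))
            if score > best[i][0]:
                best[i] = (score, best[j][1] + [seg])
    return best[n][1]

print(best_segmentation("new york times subscription".split()))
# ['new york times', 'subscription'] with these toy probabilities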

Unsupervised Segmentation

bull Find top k segmentations dynamic programming

bull Using EM optimization on the fly

Input: query w1 w2 … wn (the words in a query); concept probability distribution

Output: top-k segmentations with the highest likelihood

Exploit Click-through [Li et al 2011]

bull Motivationbull Probabilistic query segmentation

bull Use click-through data

Output top-3 segmentation

[bank of america] [online banking]      0.502
[bank of america online banking]        0.428
[bank of] [america] [online banking]    0.001

Q -gt URL -gt D query

document

click data

Input Query bank of america online banking

Exploit Click-through

bull Segmentation Model

An interpolated model

global info Click-throughinfo

[credit card] [bank of America]

1. bank of america credit cards contact us overview
2. secured visa credit card from bank of america
3. credit cards overview: find the right bank of america credit card for you

Query

Clicked html documents

global info

Click-through info

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Sense Changes with Different Context

watch harry potter read harry potter age harry potter

Movie Book Character

harry potter walkthrough

Game

Entity Recognition in Query [Guo et al 2009]

bull Motivation

Detect named entity in a short text and categorize it

harry potter walkthrough

Single-named-entity query

Example

triple <e, t, c> = ("harry potter", "walkthrough", "game")

  e: the ambiguous term (named entity), t: the context terms, c: the class of the entity

Entity Recognition in Query

bull Probabilistic Generative Model

Goal: given a query q, find the triple <e, t, c> that maximizes the probability

  Pr(e, t, c) = Pr(e) · Pr(c|e) · Pr(t|c)        (assume the context only depends on the class)

Objective: given query q, find argmax over <e, t, c> of Pr(e) · Pr(c|e) · Pr(t|c)

The problem then becomes how to estimate Pr(e), Pr(c|e) and Pr(t|c)

E.g. "walkthrough" only depends on the class game, not on "harry potter"
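A toy Python sketch of scoring triples with Pr(e) · Pr(c|e) · Pr(t|c); the probability tables are made up, and for brevity only one entity/context split pattern (prefix entity, suffix context) is enumerated.

P_e = {"harry potter": 1e-4, "harry": 5e-5}
P_c_given_e = {"harry potter": {"book": 0.5, "movie": 0.3, "game": 0.2}}
P_t_given_c = {"game": {"walkthrough": 0.05},
               "book": {"walkthrough": 1e-4},
               "movie": {"walkthrough": 1e-4}}

def best_triple(query):
    words = query.split()
    best, argbest = 0.0, None
    for i in range(1, len(words) + 1):
        e, t = " ".join(words[:i]), " ".join(words[i:])      # candidate entity / context split
        for c, p_ce in P_c_given_e.get(e, {}).items():
            p = P_e.get(e, 0.0) * p_ce * P_t_given_c.get(c, {}).get(t, 1e-9)
            if p > best:
                best, argbest = p, (e, t, c)
    return argbest

print(best_triple("harry potter walkthrough"))   # ('harry potter', 'walkthrough', 'game')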

Entity Recognition in Query

bull Probability Estimation by Learning

Learning objective:

  max ∏_{i=1..N} P(e_i, t_i, c_i)

Challenge: it is difficult as well as time-consuming to manually assign class labels to named entities in queries.

Build a training set T = {(e_i, t_i)} and view c_i as a hidden variable.

New learning problem:

  max ∏_{i=1..N} P(e_i, t_i) = max ∏_{i=1..N} Σ_c P(e_i) · P(c|e_i) · P(t_i|c)

Solved with the topic model WS-LDA.

Signal from Click [Pantel et al 2012]

bull Motivation

Predict entity type in Web search

entity

user intent

context

click

Query type distribution (73 types)

Generative model

entity type


Signal from Click

bull Joint Model for Prediction

[Figure: plate diagram of the joint model. For each query: pick a type from the distribution over types, pick an entity (entity distribution), pick an intent (intent distribution), pick context words (word distribution), and pick a click (host distribution).]

Telegraphic Query interpretation [Sawant et al 2013 Joshi et al 2014]

bull Entity-seeking Telegraphic Queries

bull Interpretation = Segmentation + Annotation

Example: query "Germany capital" → result entity "Berlin"
(Knowledge base for accuracy; large corpus for recall)

bull Overview

Joint Interpretation and Ranking [Sawant et al 2013 Joshi et al 2014]

Annotated Corpus

Telegraphic Query

e1e2e3

Two Models for Interpretation and Ranking

Generative Model

Discriminative Model

Output

bull Generative Model

Joint Interpretation and Ranking [Sawant et al 2013]

[Figure, borrowed from U. Sawant (2013): for the query "losing team baseball world series 1998", the answer entity San Diego Padres (a major league baseball team) is generated via a type hint ("baseball team") and context matchers ("lost 1998 world series"); a switch variable Z chooses between the type model and the context model over corpus evidence such as "Padres have been to two World Series, losing in 1984 and 1998".]

Based on probabilistic language models

bull Discriminative Model

Joint Interpretation and Ranking [Sawant et al 2013]

[Figure: for the query "losing team baseball world series 1998", features are collected per candidate entity and type; San_Diego_Padres with type t = baseball team is the correct entity, while 1998_World_Series with type t = series is an incorrect entity.]

Based on max-margin discriminative learning

bull Queries seek answer entities (e2)

bull Contain (query) entities (e1) target types (t2) relations (r) and selectors (s)

Telegraphic Query Interpretation [Joshi et al 2014]

query                       e1              r                    t2                   s
dave navarro first band     dave navarro    band                 band                 first
                            dave navarro    -                    band                 first
spider automobile company   spider          automobile company   automobile company   -
                            -               automobile company   company              spider

Borrowed from M. Joshi (2014)

Improved Generative Model
• The generative model of [Sawant et al 2013] is extended in [Joshi et al 2014] to also consider e1 (in q) and r

Improved Discriminative Model
• The discriminative model of [Sawant et al 2013] is extended in [Joshi et al 2014] to also consider e1 (in q) and r

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

bull Input a short text

bull Output semantic interpretation

bull Three steps in understanding a short text

Input: "wanna watch eagles band"
Output: watch[verb] eagles[entity](band) band[concept]

Step 1: Text Segmentation: divide the short text into a sequence of terms in the vocabulary ("wanna | watch | eagles | band")
Step 2: Type Detection: determine the best type of each term (watch[verb], eagles[entity], band[concept])
Step 3: Concept Labeling: infer the best concept of each entity within context (eagles → the band)

Text segmentation
• Observations
  • Mutual Exclusion: terms containing the same word mutually exclude each other
  • Mutual Reinforcement: related terms mutually reinforce each other
• Build a Candidate Term Graph (CTG)

[Figure: candidate term graphs for "vacation april in paris" (nodes: vacation, april, paris, april in paris) and "watch harry potter" (nodes: watch, harry, potter, harry potter), with mutual-exclusion edges and weighted affinity edges]

Find best segmentation

• Best segmentation = the sub-graph of the CTG which
  • is a complete graph (clique)
  • contains no mutual exclusion
  • has 100% word coverage (except for stopwords)
  • has the largest average edge weight

[Figure: the two candidate term graphs again, distinguishing sub-graphs that are merely valid segmentations from the best segmentation]

Find best segmentation

• The best segmentation is therefore a maximal clique of the CTG with the largest average edge weight (see the sketch after this slide)

[Figure: the maximal cliques selected as best segmentations in the two example graphs]
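A small sketch of this clique search over a toy candidate term graph; the candidate terms and affinity weights are illustrative and do not reproduce the exact numbers in the figures.

from itertools import combinations

candidates = {"april in paris": {"april", "in", "paris"},
              "april": {"april"}, "paris": {"paris"}, "vacation": {"vacation"}}
affinity = {("vacation", "april in paris"): 0.047,
            ("vacation", "april"): 0.029, ("vacation", "paris"): 0.041,
            ("april", "paris"): 0.005}
words = {"vacation", "april", "paris"}          # "in" treated as a stopword

def avg_affinity(terms):
    pairs = list(combinations(sorted(terms), 2))
    return sum(affinity.get(p, affinity.get(p[::-1], 0.0)) for p in pairs) / max(len(pairs), 1)

best = None
for r in range(1, len(candidates) + 1):
    for terms in combinations(candidates, r):
        covered = set().union(*(candidates[t] for t in terms))
        overlap = sum(len(candidates[t]) for t in terms) != len(covered)   # mutual exclusion check
        if not overlap and words <= covered:                               # 100% word coverage
            score = avg_affinity(terms)
            if best is None or score > best[0]:
                best = (score, terms)
print(best)   # picks {'vacation', 'april in paris'} with these toy weights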

Type Detection

• Pairwise Model
  • Find the best typed-term for each term so that the Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight
  Example: "watch free movie" with candidate typed-terms watch[v], watch[e], watch[c]; free[adj], free[v]; movie[c], movie[e]

Concept Labeling

• Entity disambiguation is the most important task of concept labeling: filter/re-rank the original concept cluster vector
• Weighted-Vote: the final score of each concept cluster is a combination of its original score and the support from context, using concept co-occurrence

watch harry potter read harry potter

movie book
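A minimal sketch of the weighted-vote re-ranking, assuming candidate concept scores and concept co-occurrence statistics are given; all values and the interpolation weight alpha are illustrative.

def rerank(candidates, context_concepts, cooccur, alpha=0.5):
    """candidates: {concept: original score}; returns concepts re-ranked by the combined score."""
    out = {}
    for c, s in candidates.items():
        support = sum(cooccur.get((c, ctx), 0.0) for ctx in context_concepts)
        out[c] = alpha * s + (1 - alpha) * support
    return dict(sorted(out.items(), key=lambda kv: kv[1], reverse=True))

harry_potter = {"book": 0.45, "movie": 0.40, "character": 0.15}
cooccur = {("movie", "verb:watch"): 0.9, ("book", "verb:watch"): 0.3}   # toy co-occurrence support
print(rerank(harry_potter, context_concepts=["verb:watch"], cooccur=cooccur))
# for "watch harry potter", "movie" overtakes "book"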

Example of Entity Disambiguation[Hua et al 2015 (ICDE Best Paper) Hua et al 2016]

[Figure: pipeline for "ipad apple", using the semantic network and the co-occurrence network: parsing, term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, concept orthogonalization, and conceptualization into a concept vector (c1 p1, c2 p2, c3 p3, …). The isA candidates of "apple" (fruit…, company…, food…, product…) are filtered by co-occurrence with the concepts of "ipad" (product…, device…), leaving the brand…/company…/device… senses.]

Mining Lexical Relationships[Wang et al 2015b]

• Lexical knowledge represented by the probabilities (example: "watch harry potter", with concepts verb, product, book, movie):

  p(verb | watch), p(instance | watch): the role distribution of a term
  p(movie | harry potter), p(book | harry potter): the concept distribution of an instance
  p(movie | watch, verb): the concept distribution given a term and its role

  ① p(z | t)   ② p(c | t, z)   ③ p(c | e) = p(c | t, z = instance)

  where e is an instance, t a term, c a concept, and z a role

Understanding Queries [Wang et al 2015b]

• Goal: to rank the concepts and find argmax_c p(c | t, q)
  Given a query, all possible segmentations are matched against the offline semantic network, and concepts are ranked by random walk with restart [Sun et al 2005] on the online subgraph
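A small sketch of random walk with restart on a toy term/concept subgraph for "watch harry potter"; the adjacency matrix and restart weights are illustrative only.

import numpy as np

def rwr(W, restart, c=0.15, iters=100):
    # W: column-normalized adjacency matrix; restart: seed distribution over nodes
    p = restart.copy()
    for _ in range(iters):
        p = (1 - c) * W @ p + c * restart
    return p

nodes = ["watch", "harry potter", "movie", "book"]
A = np.array([[0, 0, 1, 0],      # watch -- movie
              [0, 0, 1, 1],      # harry potter -- movie, book
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
W = A / A.sum(axis=0)            # column-normalize
restart = np.array([0.5, 0.5, 0.0, 0.0])   # seed the query terms
print(dict(zip(nodes, np.round(rwr(W, restart), 3))))   # "movie" ends up ranked above "book"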

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

• Example: "popular smart cover iphone 5s"
• Definitions
  • Head: acts to name the general (semantic) category to which the whole short text belongs; usually the head is the intent of the short text
    • "smart cover": intent of the query
  • Constraints: distinguish this member from other members of the same category
    • "iphone 5s": limits the type of the head
  • Non-Constraint Modifiers (aka Pure Modifiers): subjective modifiers which can be dropped without changing the intent
    • "popular": subjective, can be neglected

Non-Constraint Modifiers Mining: Construct Modifier Networks

Edges form a Modifier Network.

[Figure: the concept hierarchy tree in the "Country" domain (Country; Asian country, Developed country, Western country; Western developed country, Top western country, Large Asian country, Large developed country, Top developed country) and the corresponding modifier network over {Asian, Western, Developed, Large, Top}. In this case "Large" and "Top" are pure modifiers.]

• Betweenness centrality is a measure of a node's centrality in a network
• The betweenness of node v is defined as
  g(v) = Σ_{s ≠ v ≠ t} σ_st(v) / σ_st
  where σ_st is the total number of shortest paths from node s to node t, and σ_st(v) is the number of those paths that pass through v
• Normalization & aggregation: a pure modifier should have a low betweenness-centrality aggregation score PMS(t)

Non-Constraint Modifiers Mining Betweenness centrality
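A minimal sketch using networkx on a toy modifier network; the graph below is illustrative and not the exact network in the figure.

import networkx as nx

G = nx.Graph()
G.add_edges_from([("country", "asian"), ("country", "developed"),
                  ("country", "western"), ("western", "developed"),
                  ("large", "country"), ("top", "country")])
bc = nx.betweenness_centrality(G)
for node, score in sorted(bc.items(), key=lambda kv: kv[1]):
    print(node, round(score, 3))
# The pure modifiers "large" and "top" sit on the periphery, so they get zero
# betweenness in this toy graph, while "country" is highly central.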

Head-Constraints Mining [Wang et al 2014b]

• A term can be a head in some cases and a constraint in others
• E.g. "seattle hotel": hotel = head, seattle = constraint; "seattle hotel job": job = head, seattle and hotel = constraints

Head-Constraints Mining: Acquiring Concept Patterns

Building the concept pattern dictionary from query logs:
  Extract patterns such as "A for B", "A of B", "A with B", "A in B", "A on B", "A at B", … (e.g. "cover for iphone 6s", "battery for sony a7r", "wicked on broadway")
  Get entity pairs <entity1 (head), entity2 (constraint)> for each preposition from the query log
  Conceptualize both entities (concept11, concept12, concept13, concept14; concept21, concept22, concept23) and generate concept patterns (concept11, concept21), (concept11, concept22), (concept11, concept23), …
  Aggregate the patterns into a Concept Pattern Dictionary

Why Concepts Can't Be Too General
• It may cause too many concept pattern conflicts: we can't distinguish head and modifier for general concept pairs
  Derived concept pattern (head, modifier) = (device, company); supporting entity pairs: (iphone 4, verizon), (modem, comcast), (wireless router, comcast), (iphone 4, tmobile)
  Derived concept pattern (head, modifier) = (company, device); supporting entity pairs: (amazon books, kindle), (netflix, touchpad), (skype, windows phone), (netflix, ps3)
  → Conflict

Why Concepts Can't Be Too Specific
• It may generate concepts with little coverage
  • The concept regresses to the entity
  • Large storage space: up to (million × million) patterns
  e.g. (device, largest desktop OS vendor), (device, largest software development company), (device, largest global corporation), (device, latest windows and office provider), …

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept Patterns (cluster size | sum of cluster scores | head / constraint | score as printed)

615  2114.691   breed / state                       357298460224501
296   775.2357  game / platform                     627403476771856
153   346.6804  accessory / vehicle                 53393705094809
 70   118.259   browser / platform                  132612807637391
 22   101.0993  requirement / school                271407526294823
 34    94.89159 drug / disease                      154602405333541
 42    89.92995 cosmetic / skin condition           814659415003929
 16    74.21599 job / city                          27903732555528
 32    71.0403  accessory / phone                   246513830851194
 18    66.92376 software / platform                 210126322725878
 20    64.44603 test / disease                      239774028397537
 27    59.94205 clothes / breed                     98773996282851
 19    59.13545 penalty / crime                     200544192793488
 25    58.48804 tax / state                         240081818612579
 16    54.65424 sauce / meat                        183592863621553
 18    48.09389 credit card / country               142919087972152
 14    47.30792 food / holiday                      14554140330924
 11    45.36199 mod / game                          257163856882439
 29    43.50954 garment / sport                     471533326845442
 23    39.94886 career information / professional   732726483731257
 15    38.6065  song / instrument                   128189481818135
 18    37.8213  bait / fish                         780426514113169
 22    37.22948 study guide / book                  508339765053921
 19    34.08953 plugins / browser                   550326072627126
 14    33.05753 recipe / meat                       882779863422951
 18    32.14226 currency / country                  110825444188352
 13    31.80272 lens / camera                       186081673263957
  9    31.6973  decoration / holiday                130055844126533
 16    31.4875  food / animal                       7338544366514

Example cluster: Game (Head) / Platform (Modifier), grouping concept pairs such as game / platform, game / device, video game / platform, game console / game pad, game / gaming platform; supporting entity pairs include (angry birds, android), (angry birds, ios), (angry birds, windows 10), …

Detection

Head Modifier Relationship

• Train a classifier on (head-embedding, modifier-embedding)
• Training data
  • Positive: (head, modifier)
  • Negative: (modifier, head)
• Precision >= 0.9, Recall >= 0.9
• Disadvantage: not interpretable
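A small sketch of such a classifier with scikit-learn; random vectors stand in for real term embeddings, and the two training pairs are only illustrative.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
emb = {t: rng.normal(size=50) for t in ["game", "platform", "angry birds", "android"]}

def features(a, b):
    # concatenated embeddings of the candidate (head, modifier) pair
    return np.concatenate([emb[a], emb[b]])

pairs = [("game", "platform"), ("angry birds", "android")]          # gold (head, modifier) pairs
X = np.array([features(h, m) for h, m in pairs] + [features(m, h) for h, m in pairs])
y = np.array([1] * len(pairs) + [0] * len(pairs))                   # 1 = correct direction

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict([features("game", "platform")]))                  # expect [1] on training data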

Syntactic Parsing based on HM

• Information is incomplete
  • Prepositions and other function words
  • Within a noun compound: "el capitan macbook pro"
• Why not train a parser for web queries?

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges: Short Texts Lack Grammatical Signals
• Lack function words and word order
  • "toys queries" has ambiguous intent
  • "distance earth moon" has clear intent, but many equivalent forms: "earth moon distance", "earth distance moon", …

Challenges: Syntactic Parsing of Queries

• No standard

• No ground truth

Why is syntactic parsing of queries even a legitimate problem?

Derive Syntax from Semantics[Sun et al 2016]

• Query: "thai food houston"

bull Clicked sentence

bull Project dependency to the query

A Treebank for Short Texts

• Given query q
• Given q's clicked sentence s
• Parse each s
• Project dependencies from s to q
• Aggregate dependencies

Algorithm of Projection

Result Examples

Results

• Random queries: QueryParser UAS 0.83, LAS 0.75; Stanford UAS 0.72, LAS 0.64
• Queries with no function words: QueryParser UAS 0.82, LAS 0.73; Stanford UAS 0.70, LAS 0.61
• Queries with function words: QueryParser UAS 0.90, LAS 0.85; Stanford UAS 0.86, LAS 0.80

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

bull Measuring similarity between two short texts and sentences

bull Basic idea word-by-word comparison using embedding vector

bull Use a saliency-weighted semantic graph to compute similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired: bins of all edges, bins of max edges

Similarity measurement (inspired by BM25), where s_l and s_s are the two short texts and sem(w, s_s) is the semantic similarity of term w to the short text s_s:

  f_sts(s_l, s_s) = Σ_{w ∈ s_l} IDF(w) · sem(w, s_s) · (k1 + 1) / ( sem(w, s_s) + k1 · (1 − b + b · |s_s| / avgsl) )
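A minimal sketch of this saliency-weighted similarity, with toy word vectors and IDF values standing in for trained embeddings and corpus statistics.

import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def f_sts(s_l, s_s, vec, idf, k1=1.2, b=0.75, avgsl=4.0):
    total = 0.0
    for w in s_l:
        sem = max(cosine(vec[w], vec[w2]) for w2 in s_s)   # best semantic match of w in s_s
        total += idf.get(w, 1.0) * sem * (k1 + 1) / (sem + k1 * (1 - b + b * len(s_s) / avgsl))
    return total

rng = np.random.default_rng(1)
vec = {w: rng.normal(size=20) for w in ["cheap", "flights", "low", "cost", "airline"]}
idf = {"cheap": 1.5, "flights": 2.0}
print(f_sts(["cheap", "flights"], ["low", "cost", "airline"], vec, idf))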

From the Concept View

From the Concept View [Wang et al 2015a]

Short Text 1 → Concept Vector 1 [(c1, score1), (c2, score2), …]
Short Text 2 → Concept Vector 2 [(c1', score1'), (c2', score2'), …]
Similarity = similarity between the two bags of concepts

Pipeline (using the semantic network and co-occurrence network): parsing; term clustering by isA; concept filtering by co-occurrence; head/modifier analysis; concept orthogonalization; conceptualization

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

• Explicit short text understanding benefits a lot of application scenarios
  • Ads/search semantic match
  • Definition mining
  • Query recommendation
  • Web table understanding
  • Semantic search
  • …

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

[Figure: results by decile (Decile 4 to Decile 10) for Mainline Ads (y-axis 0.00 to 6.00) and Sidebar Ads (y-axis 0.00 to 0.60)]

Definition Mining [Hao et al 2016]

bull Definition scenarios search engines QnA etc

bull Why Conceptualization is useful for definition miningbull Examples ldquoWhat is Emphysemardquo

Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

bull This sentence has the form of definitionbull Embedding is helpful to some extent but it also return high similarity

score for (emphysema disease) and (emphysema smoking)

bull Conceptualization can provide strong semanticsbull Contextual embedding can also provide semantic similarity beyond Is-A

Answer 1

Answer 2

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

Offline: training data → concept weighting → concept models (Model 1 … Model i … Model N), one per class (Class 1 … Class N)

Online: original short text (e.g. "justin bieber graduates") → entity extraction → candidates generation → conceptualization (with the knowledge base) → concept vector → classification & ranking → <Music, score>

Concept based Short Text Classification and Ranking [Wang et al 2014a]

Offline, for each category (e.g. TV): map the article titles/tags in this category into the concept space with weights p_i, p_j, and aggregate them into a class concept model with concept weights ω_i, ω_j (one model per class: Music, Movie, TV, …)

Online: map the query into the same concept space and compare it with each class's concept weights
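A small sketch of the online scoring step, assuming per-class concept weights have been learned offline; all weights and the example query concepts are illustrative.

class_models = {"Music": {"singer": 0.6, "album": 0.3, "celebrity": 0.1},
                "TV":    {"tv show": 0.5, "channel": 0.3, "celebrity": 0.2}}

def score(query_concepts, model):
    # dot product between the query's concept vector and the class concept weights
    return sum(p * model.get(c, 0.0) for c, p in query_concepts.items())

query = {"singer": 0.7, "celebrity": 0.3}     # e.g. a conceptualized "justin bieber graduates"
ranking = sorted(class_models, key=lambda m: score(query, class_models[m]), reverse=True)
print(ranking)   # ['Music', 'TV'] for this toy query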

Precision performance on each category [Wang et al 2014a]

          BocSTC   LM_ch   SVM    VSM_cosine   LM_d   Entity_ESA
Movie     0.71     0.91    0.84   0.81         0.72   0.56
Money     0.97     0.95    0.54   0.57         0.52   0.74
Music     0.97     0.90    0.88   0.73         0.68   0.58
TV        0.96     0.46    0.92   0.56         0.51   0.55

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledge in SIGMOD 2008

bull [ Auer et al 2007 ] Sören Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Börschinger Cäcilia Zirn and Anas Elghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and Xindong Wu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10): 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny Boyes-Braem Basic objects in natural categories Cognitive psychology 8(3): 382-439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddings In CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny Zhu Toward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 29: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Instance Ambiguity

bull Instance sense disambiguation extra knowledge needed

I have an apple pie for lunch

He bought an apple ipad

Here ldquoapplerdquo is a proper noun

Ambiguity [Hua et al 2016]

bull Many instances are ambiguous

bull Intuition ambiguous instances have multiple senses

short text instance sense

population china china country

glass vs china china fragile item

pear apple apple fruit

microsoft apple apple company

read harry potter harry potter book

watch harry potter harry potter movie

age of harry potter harry potter character

Pre-definition for Ambiguity (1) Sense [Hua et al 2016]

bull What is a Sense in semantic networksbull A sense as a hierarchy of concept clusters

region

country state city

creature

animal

predator

crop food

fruit vegetable meat

Germany

Pre-definition for Ambiguity (2) Concept Cluster [Li et al 2013 Li et al 2015]

bull What is a Concept Cluster (CL)bull Cluster similar concepts into a concept cluster using K-

Means like approach (k-Medoids)

FruitFresh fruit

JuiceTropical fruit

BerryExotic fruit

Seasonal fruitFruit juiceCitrus fruitSoft fruitDry fruit

Wild fruitLocal fruit

hellip

company

CompanyClientFirm

ManufacturerCorporation

large companyRivalGiant

big companylocal company

large corporationinternational

companyhellip Fruit

Definitions of Instance Ambiguity [Hua et al 2016]

bull 3 levels of instance ambiguitybull Level 0 unambiguous

bull Contains only 1 sensebull Eg dog (animal) beijing (city) potato (vegetable)

bull Level 1 unambiguous and ambiguous both make sensebull Contains 2 or more senses but these senses are relatedbull Eg google (company amp search engine) french (language amp

country) truck(vehicle amp public transport service)

bull Level 2 ambiguous bull Contains 2 or more senses and the senses are very different from

each otherbull Eg apple (fruit amp company) jaguar(animal amp company) python

(animal amp language)

Ambiguity Score

bull Using top-2 senses to calculate the ambiguity score

119904119888119900119903119890 =

0 119897119890119907119890119897 = 0119908 1199042 119890

119908 1199041 119890lowast 1 minus 119904119894119898119894119897119886119903119894119905119910 1199041 1199042 119897119890119907119890119897 = 1

score = 1 +119908(1199041198882|119890)

119908(1199041198881|119890)lowast 1 minus 119904119894119898119894119897119886119903119894119905119910 1199041198881 1199041198882 119897119890119907119890119897 = 2

Denote top-2 senses as 1199041 and 1199042 top-2 sense clusters as 1199041198881 and 1199041198882 Denote similarity of two sense clusters as the maximum similarity of their senses

119904119894119898119894119897119886119903119894119905119910 1199041198881 1199041198882 = 119950119938119961119904119894119898119894119897119886119903119894119905119910(119904119894 isin 1199041198881 119904119895 isin 1199041198882) For an entity 119890 denote the weight (popularity) of a sense 119904119894 as the sum of weights of its concept clusters

119908 119904119894|119890 = 119908 119867119894|119890 =119862119871119895isin119867119894

119875(119862119871119895|119890)

For an entity 119890 denote the weight (popularity) of a sense cluster 119904119888119894 as the sum of weights of its senses

119908 119904119888119894 119890 =119904119895isin119904119888119894

119908(119904119895|119890)

Examples

bull Level 0bull california

bull country state city region institution 0943bull fruit

bull food product snack carbs crop 0827bull alcohol

bull substance drug solvent food addiction 0523bull computer

bull device product electronics technology appliance 0537bull coffee

bull beverage product food crop stimulant 073bull potato

bull vegetable food crop carbs product 0896bull bean

bull food vegetable crop legume carbs 0801

Examples (cont)bull Level 1

bull nike score = 0034bull company store 0861bull brand 0035bull shoe product 0033

bull twitter score = 0035bull website tool 0612bull network 0165bull application 0033bull company 0031

bull facebook score = 0037bull website tool 0595bull network 017bull company 0053bull application 0029

bull yahoo score = 038bull search engine 0457bull company provider account 0281bull website 00656

bull google score = 0507bull search engine 046bull company provider organization 0377bull website 00449

Examples (cont)

bull Level 2bull jordan score = 102

bull country state company regime 092bull shoe 002

bull fox score = 109bull animal predator species 074bull network 0064bull company 0035

bull puma score = 115bull brand company shoe 0655bull species cat 0116

bull gold score = 121bull metal material mineral resource mineral062bull color 0128

Examples (cont)

bull Level 2bull soap score = 122

bull product toiletry substance 049bull technology industry standard 011

bull silver score = 124bull metal material mineral resource mineral 0638bull color 0156

bull python score = 129bull language 0667bull snake animal reptile skin 0193

bull apple score = 141bull fruit food tree 0537bull company brand 0271

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

A Concept View of ldquoMicrosoftrdquo

company

largest desktop OS vendor

softwarecompany

international company

technology leader

Microsoft

largest desktop OS vendorcompany hellip hellip

software company

Basic-level Conceptualization (BLC)[Rosch et al 1976]

KFC

BMW

Basic-level conceptualization

How to Make BLC

bull Naive approachesbull Typicality an important measure for understanding the

relationship between an object and its concept

bull Pointwise Mutual Information (PMI) a common measure of the strength of association between two terms

bird

Naive Approach 1 Typicality

P(robin|bird) gt P(penguin|bird)ldquorobinrdquo is a more typical bird than a ldquopenguinrdquo

country

SeychellesUSA

P(USA|country) gt P(Seychelles|country)ldquoUSArdquo is a more typical country than ldquoSeychellesrdquo

penguinrobin

Using Typicality for BLC

bull Associate each isA relationship (119890 is 119888) with typicality scores 119875 119890 119888 and 119875 119888 119890

119875 119890 119888 =119899 119888 119890

119899 119888119875(119888|119890) =

119899 119888 119890

119899(119890)

bull P(e|c) indicates how typical (or popular) e is in the given concept c

bull P(c|e) indicates how typical (or popular) the concept c is given e

bull However

Microsoft

largest desktop OS vendorcompanyhigh typicality p(c|e) high typicality p(e|c)

Naive Approach 2 PMI[Manning and Schutze 1999]

bull Pointwise mutual information (PMI) is a measure of association used in information theory and statistics

bull Consider using the PMI between concept c and instance e to find the basic-level concepts as follows

119875119872119868(119890 119888) = log119875(119890 119888)

119875(119890)119875(119888)= log119875(119890|119888) minus log119875(119890)

bull However bull In basic level of categorization we are interested in finding a

concept for a given e which means P(e) is a constant

bull Thus ranking by PMI(e c) is the same as ranking by P(e|c)

Using Rep(e c) for BLC [Wang et al 2015b]

bull The measure 119877119890119901 119890 119888 = 119875(119888|119890) lowast 119875(119890|119888) means

bull (With PMI) If we take the logarithm of our scoring function we get

log119877119890119901 119890 119888 = log119875 119888 119890 lowast 119875(119890|119888) = log119875(119890 119888)

119875(119890)lowast119875(119890 119888)

119875(119888)= log

119875(119890 119888)2

119875(119890)119875(119888)= 119875119872119868 119890 119888 + log119875 119890 119888

= 1198751198721198682

bull (With Commute Time) The commute time between an instance e and a concept c is

119879119894119898119890(119890 119888) =

119896=1

infin

(2119896) lowast 119875119896(119890 119888) =

119896=1

119879

2119896 lowast 119875119896 119890 119888 +

119896=119879+1

infin

2119896 lowast 119875119896 119890 119888

ge σ119896=1119879 (2119896) lowast 119875119896(119890 119888) + 2(119879 + 1) lowast (1 minus σ119896=1

119879 119875119896(119890 119888)) = 4 minus 2 lowast 119877119890119901(119890 119888)

Given e the c should be its typical concept (shortest distance)

Given c the e should be its typical instance (shortest distance)

A process of finding concept nodes having shortest expected distance with e

PrecisionNDCGNo smoothing 1 2 3 5 10 15 20

MI(e) 0769 0692 0705 0685 0719 0705 0690

PMI3(e) 0885 0769 0756 0800 0754 0733 0721

NPMI(e) 0692 0692 0667 0638 0627 0610 0610

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0500 0462 0526 0523 0523 0510 0521

Rep(e) 0846 0865 0872 0862 0758 0731 0719

Smoothing=0001

MI(e) 0577 0615 0628 0600 0612 0605 0592

PMI3(e) 0731 0673 0692 0654 0669 0644 0623

NPMI(e) 0923 0827 0769 0746 0731 0695 0671

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0554

Typicality P(e|c) 0885 0865 0872 0831 0785 0741 0704

Rep(e) 0846 0731 0718 0723 0700 0669 0638

Smoothing=00001

MI(e) 0615 0615 0654 0608 0635 0628 0612

PMI3(e) 0846 0731 0731 0715 0723 0685 0677

NPMI(e) 0885 0904 0885 0869 0823 0777 0752

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0885 0904 0910 0877 0831 0813 0777

Rep(e) 0923 0846 0833 0815 0781 0736 0719

Smoothing=1e-5

MI(e) 0615 0635 0667 0662 0677 0656 0646

PMI3(e) 0885 0769 0744 0777 0758 0731 0710

NPMI(e) 0885 0846 0872 0869 0831 0810 0787

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0769 0808 0846 0823 0808 0782 0765

Rep(e) 0885 0904 0872 0862 0812 0800 0767

Smoothing=1e-6

MI(e) 0769 0673 0705 0677 0700 0692 0679

PMI3(e) 0885 0769 0756 0785 0773 0726 0723

NPMI(e) 0885 0846 0821 0815 0750 0726 0719

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0538 0615 0615 0615 0608 0613 0615

Rep(e) 0846 0885 0897 0877 0788 0777 0765

Smoothing=1e-7

MI(e) 0769 0692 0705 0685 0719 0703 0688

PMI3(e) 0885 0769 0756 0792 0758 0736 0725

NPMI(e) 0769 0750 0718 0700 0650 0641 0633

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0500 0481 0526 0523 0531 0523 0523

Rep(e) 0846 0865 0872 0854 0765 0749 0733

No Smoothing 1 2 3 5 10 15 20

MI(e) 0516 0531 0519 0531 0562 0574 0594

PMI3(e) 0725 0664 0652 0660 0628 0631 0646

NPMI(e) 0599 0597 0579 0554 0540 0539 0549

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0401 0386 0396 0398 0401 0410 0428

Rep(e) 0758 0771 0745 0723 0656 0647 0661

Smoothing=1e-3

MI(e) 0374 0414 0441 0448 0473 0481 0495

PMI3(e) 0484 0511 0509 0502 0519 0525 0533

NPMI(e) 0692 0652 0607 0603 0585 0585 0592

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0460

Typicality P(e|c) 0703 0697 0704 0681 0637 0628 0626

Rep(e) 0621 0580 0554 0561 0554 0555 0559

Smoothing=1e-4

MI(e) 0407 0430 0458 0462 0492 0503 0512

PMI3(e) 0648 0604 0579 0575 0578 0576 0590

NPMI(e) 0747 0777 0761 0737 0700 0685 0688

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0791 0795 0802 0767 0738 0729 0724

Rep(e) 0758 0714 0711 0689 0653 0636 0653

Smoothing=1e-5

MI(e) 0429 0465 0478 0501 0517 0528 0545

PMI3(e) 0725 0647 0642 0642 0627 0624 0638

NPMI(e) 0813 0779 0778 0765 0730 0723 0729

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0709 0728 0735 0722 0702 0696 0703

Rep(e) 0791 0787 0762 0739 0707 0703 0706

Smoothing=1e-6

MI(e) 0516 0510 0515 0526 0546 0563 0579

PMI3(e) 0725 0655 0651 0654 0641 0631 0649

NPMI(e) 0791 0766 0732 0728 0673 0659 0668

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0495 0516 0520 0508 0512 0521 0540

Rep(e) 0758 0784 0767 0755 0691 0686 0694

Smoothing=1e-7

MI(e) 0516 0531 0519 0530 0562 0571 0592

PMI3(e) 0725 0664 0652 0658 0630 0631 0647

NPMI(e) 0670 0655 0633 0604 0575 0570 0581

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0423 0421 0415 0407 0414 0424 0438

Rep(e) 0758 0771 0745 0725 0663 0661 0668

Evaluations on Different Measures for BLC

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

What is the Semantic Similaritybull Are the following instance pairs similar

bull ltapple microsoftgt

bull ltapple peargt

bull ltapple fruitgt

bull ltapple foodgt

bull ltapple ipadgt

bull ltcar journeygt

Approaches on Term Similarity

bull Categories of approaches for semantic similaritybull String based approach

bull Knowledge based approachbull Use preexisting thesauri taxonomy or encyclopedia such as

WordNet

bull Corpus based approachbull Use contexts of terms extracted from web pages web search

snippets or other text repositories

bull Embedding based approachbull Will introduce in detail in ldquoPart 3 Implicit Understandingrdquo

79

Approaches on Term Similarity (2)

bull Categories

80

Knowledge based approaches

(WordNet)

Corpus based

approaches

Path lengthlexical

chain-based

Information

content-based

Graph learning

algorithm basedSnippet search based

Rada

1989

Resnik

1995

Jcn

1997

Lin

1998

Saacutench

2011

Agirre

2010Alvarez

2007

String based

approaches

HunTray

2005

Hirst

1998

Do

2009

Bol

2011Chen

2006

State-of-the-art approaches

Ban

2002

bull Framework

83

Term Similarity Using Semantic Networks [Li et al 2013 Li et al 2015]

Term pairs ltt1 t2gt

Type Checking

Concept Pairs Entity Pairs

Entity-distribution Context Collection

Concept-distribution Context Collection

Concept-Entity Pairs

Concept Collection for the Entity Term t1

Similarity EvaluationCosine(T(t1) T(t2))

for each pairltt2cxgt

Context vector T(t1) and T(t2)

Get maxsim(t2cx) for ltt1 t2gt

End

End

Concept Clustering

Cluster Context vector Cx(t1) and Cy(t2)

Similarity Evaluation

Max(xy) Cosine(Cx(t1) Cy(t2))

End

Concept Clustering

for each Cluster Ci(t1)

Select top k Concept namely cx

Step 1 Type Checking

Step 2 Context Representation(Vector)

Step 3 Context Similarity

An example [Li et al 2013 Li et al 2015]

For example

ltbanana peargt

88

ltbanana peargt

Entity PairsType Checking

Concept Context Collection

Similarity Evaluation Cosine(T(t1) T(t2)) 0916

Step 1 Type Checking

Step 2 Context Representation(Vector)

Step 3 Context Similarity

ExamplesTerm 1 Term 2 Similarity

lunch dinner 09987

tiger jaguar 09792

car plane 09711

television radio 09465

technology company microsoft 08208

high impact sport competitive sport 08155

employer large corporation 05353

fruit green pepper 02949

travel meal 00426

music lunch 00116

alcoholic beverage sports equipment 00314

company table tennis 00003

96httpadaptseieesjtueducnsimilaritySimCompleteResultspdf

Statistics of Search Queries

44

29

17

7

2 1

1 Term2 Terms3 Terms4 Terms5 Termsmore than 5 Terms

10

26

34

19

74

1 Term2 Terms3 Terms4 Terms5 Termsmore than 5 Terms

(a) By traffic

(b) By of distinct queries

Pokeacutemon Go Microsoft HoloLens

Instance 1 Instance 21 Instance2 Instances3 Instances4 Instances5 InstancesMore than 5 Instances

1 Instance2 Instances3 Instances4 Instances5 InstancesMore than 5 Instances

If the short text has context for the instancehellip

bull python tutorialbull dangerous pythonbull moon earth distancebull hellip

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Supervised Segmentation [Bergsma et al 2007]

bull Problem divide query into semantic units

bull Approach turn segmentation into position-based binary classification

Example Query

Two man power saw

[two man] [power saw][two] [man] [power saw][two] [man power] [saw]

Input a query and its positions

Output the decision for making segmentation at each position

Supervised Segmentation

bull Featuresbull Decision boundary features

bull Statistical features

bull Context features

bull Dependency features

eg Indicators thePOS tags in query isPosition features forwardbackward

Mutual information between left and right parts

Bank loan amortization schedule

Context information

bus driverfemale

depend

Supervised Segmentation

bull Segmentation Overview

saw

SVMclassifier

Input query two man power saw

two man power

Output segmentation decision for each position (yesno)

learning features

Unsupervised Segmentation [Tan et al 2008]

• Unsupervised learning for query segmentation

Probability of a generated segmentation S for query Q:
$P(S|Q) = P(s_1)\,P(s_2|s_1)\cdots P(s_m|s_1 s_2 \cdots s_{m-1}) \approx \prod_{s_i \in S} P(s_i)$  (unigram model over segments)

A split point is a valid segment boundary if and only if the pointwise mutual information between the two segments resulting from the split is negative:
$MI(s_k, s_{k+1}) = \log \dfrac{P_c([s_k\, s_{k+1}])}{P_c(s_k)\cdot P_c(s_{k+1})} < 0$

Example: "new york times subscription" – since $\log \dfrac{P_c([\text{new york}])}{P_c(\text{new})\cdot P_c(\text{york})} > 0$, there is no segment boundary between "new" and "york".

Unsupervised Segmentation

• Find the top-k segmentations with dynamic programming

• Using EM optimization on the fly

Input: query $w_1 w_2 \cdots w_n$ (the words in the query) and a concept probability distribution
Output: top-k segmentations with the highest likelihood
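A compact sketch of the dynamic program that keeps the top-k segmentations under the unigram model P(S|Q) ≈ ∏ P(s_i); seg_prob is a stand-in for the segment/concept probability distribution (here a dict with a small smoothing default), not the authors' estimator.

import heapq
import math

def top_k_segmentations(words, seg_prob, k=3, max_len=4, floor=1e-9):
    # best[i] holds up to k (log-prob, segmentation) pairs covering words[:i]
    best = [[] for _ in range(len(words) + 1)]
    best[0] = [(0.0, [])]
    for i in range(1, len(words) + 1):
        cands = []
        for j in range(max(0, i - max_len), i):
            seg = " ".join(words[j:i])
            lp = math.log(seg_prob.get(seg, floor))
            for prev_lp, prev_segs in best[j]:
                cands.append((prev_lp + lp, prev_segs + [seg]))
        best[i] = heapq.nlargest(k, cands, key=lambda x: x[0])
    return best[len(words)]

probs = {"new york": 0.002, "new york times": 0.001, "times": 0.01,
         "new": 0.02, "york": 0.005, "subscription": 0.008}
print(top_k_segmentations("new york times subscription".split(), probs, k=2))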

Exploit Click-through [Li et al 2011]

• Motivation
  • Probabilistic query segmentation
  • Use click-through data: Q → URL → D (query, clicked URL, document)

Input query: "bank of america online banking"
Output: top-3 segmentations
  [bank of america] [online banking]       0.502
  [bank of america online banking]         0.428
  [bank of] [america] [online banking]     0.001

Exploit Click-through

• Segmentation Model: an interpolated model combining global information with click-through information

Example query: [credit card] [bank of america]
Clicked HTML documents (click-through info):
  1. bank of america credit cards contact us overview
  2. secured visa credit card from bank of america
  3. credit cards overview find the right bank of america credit card for you
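A minimal sketch of an interpolated segment score in this spirit, assuming a global (web n-gram) probability table and a click-through probability table are already estimated; lambda and the fallback value are assumptions.

def interpolated_prob(segment, p_global, p_click, lam=0.6):
    # Interpolate global corpus statistics with click-through statistics.
    return lam * p_global.get(segment, 1e-9) + (1 - lam) * p_click.get(segment, 1e-9)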

Short Text Understanding

• How to segment this short text?

• What does this short text mean (its intent, senses, or concepts)?

• What are the relations among terms in the short text?

• How to calculate the similarity between short texts?

Sense Changes with Different Context

watch harry potter → Movie    read harry potter → Book    age harry potter → Character    harry potter walkthrough → Game

Entity Recognition in Query [Guo et al 2009]

• Motivation: detect a named entity in a short text and categorize it

Example: "harry potter walkthrough" (a single-named-entity query) → triple <e, t, c> = ("harry potter", "walkthrough", "game"), where e is the (ambiguous) entity term, t the context term, and c the class of the entity

Entity Recognition in Query

• Probabilistic Generative Model

Goal: given a query q, find the triple <e, t, c> that maximizes the probability.

Probability to generate a triple (assuming the context only depends on the class):
$\Pr(e, t, c) = \Pr(e)\,\Pr(c|e)\,\Pr(t|c)$

Objective: given query q, find $\arg\max_{\langle e,t,c\rangle} \Pr(e)\,\Pr(c|e)\,\Pr(t|c)$

The problem then becomes how to estimate Pr(e), Pr(c|e) and Pr(t|c).

E.g. "walkthrough" only depends on "game" instead of "harry potter".
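A sketch of scoring candidate triples <e, t, c> with Pr(e)·Pr(c|e)·Pr(t|c), assuming the three probability tables have already been estimated (e.g. by WS-LDA below); the tables are toy values.

def best_triple(query_terms, entities, p_e, p_c_given_e, p_t_given_c):
    # Enumerate ways to match a known entity inside the query; the rest is context t.
    best, best_score = None, 0.0
    query = " ".join(query_terms)
    for e in entities:
        if e in query:
            t = query.replace(e, "#").strip()           # context with a placeholder
            for c, p_ce in p_c_given_e.get(e, {}).items():
                score = p_e.get(e, 0.0) * p_ce * p_t_given_c.get(c, {}).get(t, 1e-9)
                if score > best_score:
                    best, best_score = (e, t, c), score
    return best, best_score

p_e = {"harry potter": 0.01}
p_c_given_e = {"harry potter": {"game": 0.2, "movie": 0.5, "book": 0.3}}
p_t_given_c = {"game": {"# walkthrough": 0.05}, "movie": {"# walkthrough": 0.001},
               "book": {"# walkthrough": 0.001}}
print(best_triple("harry potter walkthrough".split(), ["harry potter"],
                  p_e, p_c_given_e, p_t_given_c))      # -> game wins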

Entity Recognition in Query

• Probability Estimation by Learning

Learning objective: $\max \prod_{i=1}^{N} P(e_i, t_i, c_i)$

Challenge: it is difficult as well as time-consuming to manually assign class labels to named entities in queries.

Build a training set $T = \{(e_i, t_i)\}$ and view $c_i$ as a hidden variable.

New learning problem: $\max \prod_{i=1}^{N} P(e_i, t_i) = \max \prod_{i=1}^{N} \sum_{c} P(e_i)\,P(c|e_i)\,P(t_i|c)$

Solved with the topic model WS-LDA.

Signal from Click [Pantel et al 2012]

• Motivation: predict the entity type in Web search, jointly modeling the entity, its context, the user intent, and the click

Query type distribution (73 types); a generative model over entity types

Signal from Click

• Joint Model for Prediction – generative story for each query Q: pick a type (distribution over types), pick an entity (entity distribution), pick an intent (intent distribution), pick the context words (word distribution), and pick a click (host distribution)

Telegraphic Query interpretation [Sawant et al 2013 Joshi et al 2014]

• Entity-seeking telegraphic queries: e.g. query "Germany capital" → result entity "Berlin"

• Interpretation = Segmentation + Annotation, combining a knowledge base (accuracy) with a large corpus (recall)

• Overview

Joint Interpretation and Ranking [Sawant et al 2013 Joshi et al 2014]

A telegraphic query is interpreted against an annotated corpus; two models for interpretation and ranking – a generative model and a discriminative model – produce the output ranking of entities e1, e2, e3, …

• Generative Model

Joint Interpretation and Ranking [Sawant et al 2013] – based on probabilistic language models. For the query q = "losing team baseball world series 1998": the type hint ("baseball team") switches between a type model and a context model; the candidate entity E = San Diego Padres has type T = major league baseball team, and the selector words Z ("lost 1998 world series") are matched by context matchers against corpus snippets such as "Padres have been to two World Series, losing in 1984 and 1998". (Figure borrowed from U. Sawant, 2013.)

• Discriminative Model

Joint Interpretation and Ranking [Sawant et al 2013] – based on max-margin discriminative learning. For the query "losing team baseball world series 1998": interpreting the type hint as t = baseball team leads to the correct entity San_Diego_Padres, while interpreting it as t = series leads to the incorrect entity 1998_World_Series.

• Queries seek answer entities (e2)

• Queries contain (query) entities (e1), target types (t2), relations (r) and selectors (s)

Telegraphic Query Interpretation [Joshi et al 2014] – example interpretations:

  query "dave navarro first band" → (e1: dave navarro, r: band, t2: band, s: first) or (e1: dave navarro, r: –, t2: band, s: first)
  query "spider automobile company" → (e1: spider, r: automobile company, t2: automobile company, s: –) or (e1: –, r: automobile company, t2: company, s: spider)

(Borrowed from M. Joshi, 2014.)

Improved Generative Model

• [Joshi et al 2014] extends the generative model of [Sawant et al 2013] to also consider e1 (in q) and r

Improved Discriminative Model

• [Joshi et al 2014] extends the discriminative model of [Sawant et al 2013] to also consider e1 (in q) and r

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

• Input: a short text

• Output: a semantic interpretation, e.g. "wanna watch eagles band" → watch[verb] eagles[entity](band) band[concept]

• Three steps in understanding a short text:
  Step 1: Text Segmentation – divide the text into a sequence of terms in the vocabulary ("wanna watch eagles band" → watch | eagles | band)
  Step 2: Type Detection – determine the best type of each term (watch[verb] eagles[entity] band[concept])
  Step 3: Concept Labeling – infer the best concept of each entity within context (eagles[entity] → band)

Text segmentation
• Observations:
  • Mutual Exclusion – terms containing the same word mutually exclude each other
  • Mutual Reinforcement – related terms mutually reinforce each other
• Build a Candidate Term Graph (CTG)

[CTG examples for "vacation april in paris" (candidate terms: vacation, april in paris, april, paris) and "watch harry potter" (candidate terms: watch, harry potter), with node and edge weights such as 1/3, 2/3, 0.029, 0.005, 0.047, 0.041, 0.014, 0.092, 0.053, 0.018]

Find best segmentation
• Best segmentation = sub-graph of the CTG which:
  • is a complete graph (clique)
  • has no mutual exclusion
  • has 100% word coverage (except for stopwords)
  • has the largest average edge weight

The best segmentation therefore corresponds to a maximal clique of the CTG: {vacation, april in paris} for "vacation april in paris" and {watch, harry potter} for "watch harry potter" in the examples above.
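A sketch of the clique-based search using networkx, assuming the candidate terms, mutual-exclusion constraints, and edge weights have already been produced from the vocabulary; the weights and the stopword set are illustrative.

import itertools
import networkx as nx

def best_segmentation(candidates, weights, words, stopwords=frozenset({"in"})):
    # candidates: candidate terms; weights: dict of pairwise affinity weights
    G = nx.Graph()
    G.add_nodes_from(candidates)
    for a, b in itertools.combinations(candidates, 2):
        if not set(a.split()) & set(b.split()):     # mutual exclusion: no shared word
            G.add_edge(a, b, weight=weights.get((a, b), weights.get((b, a), 0.0)))
    best, best_avg = None, -1.0
    for clique in nx.find_cliques(G):               # maximal cliques
        covered = set(w for t in clique for w in t.split())
        if not (set(words) - stopwords) <= covered:
            continue                                # must cover all non-stopwords
        edges = list(itertools.combinations(clique, 2))
        avg = (sum(G[a][b]["weight"] for a, b in edges) / len(edges)) if edges else 0.0
        if avg > best_avg:
            best, best_avg = clique, avg
    return best

cands = ["april in paris", "vacation", "april", "paris"]
w = {("vacation", "april in paris"): 0.047, ("vacation", "april"): 0.029,
     ("vacation", "paris"): 0.041, ("april", "paris"): 0.005}
print(best_segmentation(cands, w, "vacation april in paris".split()))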

Type Detection

• Pairwise Model: find the best typed-term for each term so that the Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight

Example "watch free movie": candidate typed-terms watch[v] / watch[e] / watch[c], free[adj] / free[v], movie[c] / movie[e]
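A sketch of the pairwise model with networkx: enumerate one typed-term per term, build the affinity sub-graph, and keep the assignment whose maximum spanning tree has the largest total weight. The candidate types and affinity scores are illustrative.

import itertools
import networkx as nx

def detect_types(candidates, affinity):
    # candidates: {term: [typed-term, ...]}; affinity(a, b): pairwise score
    best, best_w = None, -1.0
    for combo in itertools.product(*candidates.values()):
        G = nx.Graph()
        G.add_nodes_from(combo)
        for a, b in itertools.combinations(combo, 2):
            G.add_edge(a, b, weight=affinity(a, b))
        mst = nx.maximum_spanning_tree(G)
        w = sum(d["weight"] for _, _, d in mst.edges(data=True))
        if w > best_w:
            best, best_w = combo, w
    return best

cands = {"watch": ["watch[v]", "watch[e]", "watch[c]"],
         "free": ["free[adj]", "free[v]"],
         "movie": ["movie[c]", "movie[e]"]}
scores = {("watch[v]", "movie[c]"): 0.9, ("free[adj]", "movie[c]"): 0.7,
          ("watch[v]", "free[adj]"): 0.3}
aff = lambda a, b: scores.get((a, b), scores.get((b, a), 0.01))
print(detect_types(cands, aff))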

Concept Labeling

• Entity disambiguation is the most important task of concept labeling: filter / re-rank the original concept cluster vector

• Weighted Vote: the final score of each concept cluster is a combination of its original score and the support from context, using concept co-occurrence

Example: "watch harry potter" → movie; "read harry potter" → book
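A small sketch of the weighted-vote re-ranking: each concept cluster's score is interpolated with the co-occurrence support it gets from the context terms; the alpha value and the toy co-occurrence table are assumptions.

def weighted_vote(concept_scores, context_terms, cooccur, alpha=0.5):
    # concept_scores: {cluster: original score}; cooccur: {(cluster, context term): support}
    reranked = {}
    for c, s in concept_scores.items():
        support = sum(cooccur.get((c, ctx), cooccur.get((ctx, c), 0.0))
                      for ctx in context_terms)
        reranked[c] = alpha * s + (1 - alpha) * support
    return sorted(reranked.items(), key=lambda kv: kv[1], reverse=True)

harry = {"movie": 0.45, "book": 0.40, "character": 0.15}
print(weighted_vote(harry, ["watch"], {("movie", "watch"): 0.9, ("book", "watch"): 0.1}))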

Example of Entity Disambiguation [Hua et al 2015 (ICDE Best Paper), Hua et al 2016]

Pipeline: the short text is parsed and conceptualized against the semantic network (term clustering by isA, head/modifier analysis, concept orthogonalization), producing a concept vector [(c1, p1), (c2, p2), (c3, p3), …]; the co-occurrence network then filters the concepts by co-occurrence.

Example "ipad apple": the isA network alone suggests fruit…, company…, food…, product… for "apple" and product…, device… for "ipad"; after isA filtering and co-occurrence filtering, the surviving concepts are product…, brand…, company…, device…

Mining Lexical Relationships [Wang et al 2015b]

• Lexical knowledge represented by probabilities. Example "watch harry potter": $p(\text{verb}|\text{watch})$, $p(\text{instance}|\text{watch})$, $p(\text{movie}|\text{harry potter})$, $p(\text{book}|\text{harry potter})$, $p(\text{movie}|\text{watch}, \text{verb})$

Key quantities: $p(z|t)$ (the role of a term), $p(c|t, z)$ (the concept of a term in a given role), and $p(c|e) = p(c|t, z=\text{instance})$. Notation: e = instance, t = term, c = concept, z = role.

Understanding Queries [Wang et al 2015b]

• Goal: rank the concepts and find $\arg\max_c p(c|t, q)$

The query and all of its possible segmentations are matched against the offline semantic network, and concepts are ranked by random walk with restart [Sun et al 2005] on the resulting online subgraph
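A generic random-walk-with-restart sketch over such an online subgraph (nodes = query terms and candidate concepts), using numpy power iteration; the toy transition matrix, restart vector, and damping value are illustrative, not the exact construction of [Wang et al 2015b].

import numpy as np

def rwr(adj, restart, c=0.15, iters=100):
    # adj: (n, n) column-stochastic transition matrix; restart: (n,) restart distribution
    p = restart.copy()
    for _ in range(iters):
        p = (1 - c) * adj @ p + c * restart
    return p

# Tiny illustrative subgraph: 0 = "watch", 1 = "harry potter", 2 = movie, 3 = book
adj = np.array([[0.0, 0.0, 0.5, 0.0],
                [0.0, 0.0, 0.5, 1.0],
                [0.7, 0.6, 0.0, 0.0],
                [0.3, 0.4, 0.0, 0.0]])
restart = np.array([0.5, 0.5, 0.0, 0.0])     # restart on the query terms
print(rwr(adj, restart))                     # concepts ranked by stationary score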

Short Text Understanding

• How to segment this short text?

• What does this short text mean (its intent, senses, or concepts)?

• What are the relations among terms in the short text?

• How to calculate the similarity between short texts?

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

• Example: "popular smart cover iphone 5s"

• Definition:
  • Head: acts to name the general (semantic) category to which the whole short text belongs; usually the head is the intent of the short text – "smart cover" is the intent of the query
  • Constraints: distinguish this member from other members of the same category – "iphone 5s" limits the type of the head
  • Non-Constraint Modifiers (a.k.a. Pure Modifiers): subjective modifiers which can be dropped without changing the intent – "popular" is subjective and can be neglected

Non-Constraint Modifiers Mining Construct Modifier Networks

Concept hierarchy tree in the "Country" domain: Country → {Asian country, Developed country, Western country} → {Western developed country, Top western country, Large Asian country, Large developed country, Top developed country, …}

The modifier edges (Asian, Developed, Western, Large, Top) form a Modifier Network for the "Country" domain; in this case "Large" and "Top" are pure modifiers.

Non-Constraint Modifiers Mining: Betweenness Centrality

• Betweenness centrality is a measure of a node's centrality in a network

• Betweenness of node v is defined as $g(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$, where $\sigma_{st}$ is the total number of shortest paths from node s to node t and $\sigma_{st}(v)$ is the number of those paths that pass through v

• Normalization & aggregation: a pure modifier should have a low aggregated betweenness-centrality score PMS(t)
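A sketch of spotting pure-modifier candidates with networkx's betweenness centrality on a small modifier network; the graph below is a toy version of the "Country" example, and the aggregation into PMS(t) is simplified to the raw centrality score.

import networkx as nx

# Toy modifier network for the "Country" domain (edges follow the concept hierarchy).
G = nx.Graph([("Country", "Asian"), ("Country", "Developed"), ("Country", "Western"),
              ("Western", "Developed"),
              ("Asian", "Large"), ("Developed", "Large"),
              ("Developed", "Top"), ("Western", "Top")])

bc = nx.betweenness_centrality(G, normalized=True)
# Pure-modifier candidates have low betweenness (they only hang off other modifiers).
for term, score in sorted(bc.items(), key=lambda kv: kv[1]):
    print(f"{term:10s} {score:.3f}")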

Head-Constraints Mining [Wang et al 2014b]

• A term can be a head in some queries and a constraint in others

• E.g. "Seattle hotel": Seattle = constraint, hotel = head; "Seattle hotel job": Seattle = constraint, hotel = constraint, job = head

Head-Constraints Mining Acquiring Concept Patterns

Building the concept pattern dictionary from query logs (e.g. "cover for iphone 6s", "battery for sony a7r", "wicked on broadway"):
  1. Extract preposition patterns: A for B, A of B, A with B, A in B, A on B, A at B, …
  2. Get entity pairs (entity1 = head, entity2 = constraint) from the query log for each preposition
  3. Conceptualize both entities (concept11, concept12, concept13, concept14 / concept21, concept22, concept23)
  4. Collect the resulting concept patterns (concept11, concept21), (concept11, concept22), (concept11, concept23), … into the Concept Pattern Dictionary

Why Concepts Can't Be Too General
• It may cause too many concept pattern conflicts: we can't distinguish head and modifier for general concept pairs

  Derived concept pattern (head: device, modifier: company) – supporting entity pairs: (iphone 4, verizon), (modem, comcast), (wireless router, comcast), (iphone 4, tmobile)
  Derived concept pattern (head: company, modifier: device) – supporting entity pairs: (amazon books, kindle), (netflix, touchpad), (skype, windows phone), (netflix, ps3)
  → Conflict

Why Concepts Can't Be Too Specific
• It may generate concepts with little coverage
  • The concept regresses to the entity
  • Large storage space: up to (million × million) patterns

  … (device, largest desktop OS vendor), (device, largest software development company), (device, largest global corporation), (device, latest windows and office provider), …

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept Patterns — columns: cluster size | sum of cluster score | head/constraint | score

615  2114691  breed/state                      357298460224501
296  7752357  game/platform                    627403476771856
153  3466804  accessory/vehicle                53393705094809
70   118259   browser/platform                 132612807637391
22   1010993  requirement/school               271407526294823
34   9489159  drug/disease                     154602405333541
42   8992995  cosmetic/skin condition          814659415003929
16   7421599  job/city                         27903732555528
32   710403   accessory/phone                  246513830851194
18   6692376  software/platform                210126322725878
20   6444603  test/disease                     239774028397537
27   5994205  clothes/breed                    98773996282851
19   5913545  penalty/crime                    200544192793488
25   5848804  tax/state                        240081818612579
16   5465424  sauce/meat                       183592863621553
18   4809389  credit card/country              142919087972152
14   4730792  food/holiday                     14554140330924
11   4536199  mod/game                         257163856882439
29   4350954  garment/sport                    471533326845442
23   3994886  career information/professional  732726483731257
15   386065   song/instrument                  128189481818135
18   378213   bait/fish                        780426514113169
22   3722948  study guide/book                 508339765053921
19   3408953  plugins/browser                  550326072627126
14   3305753  recipe/meat                      882779863422951
18   3214226  currency/country                 110825444188352
13   3180272  lens/camera                      186081673263957
9    316973   decoration/holiday               130055844126533
16   314875   food/animal                      7338544366514

Example concept-pattern cluster – Game (Head) / Platform (Modifier): (game, platform), (game, device), (video game, platform), (game console, game pad), (game, gaming platform); supporting instance pairs: (angry birds, android), (angry birds, ios), (angry birds, windows 10), …

Head Modifier Relationship Detection

• Train a classifier on (head-embedding, modifier-embedding)

• Training data:
  • Positive: (head, modifier)
  • Negative: (modifier, head)

• Precision ≥ 0.9, Recall ≥ 0.9

• Disadvantage: not interpretable
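A rough sketch of the embedding-based head/modifier classifier: concatenate the two term embeddings and train a binary classifier, generating negatives by swapping the pair order. The embedding lookup table and the use of logistic regression (rather than the unspecified classifier above) are assumptions.

import numpy as np
from sklearn.linear_model import LogisticRegression

def make_dataset(pairs, emb):
    # pairs: list of (head, modifier) strings present in the embedding table emb
    X, y = [], []
    for head, mod in pairs:
        X.append(np.concatenate([emb[head], emb[mod]])); y.append(1)   # (head, modifier)
        X.append(np.concatenate([emb[mod], emb[head]])); y.append(0)   # swapped -> negative
    return np.array(X), np.array(y)

def train_head_modifier(pairs, emb):
    X, y = make_dataset(pairs, emb)
    return LogisticRegression(max_iter=1000).fit(X, y)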

Syntactic Parsing based on HM

• The head/modifier information is incomplete:
  • prepositions and other function words
  • within a noun compound: "el capitan macbook pro"

• Why not train a parser for web queries?

Syntactic Parsing of Short Texts [Sun et al EMNLP 2016]

• Syntactic structures are valuable for short text understanding

• Examples

Challenges: Short Texts Lack Grammatical Signals
• Lack of function words and word order
  • "toys queries" has ambiguous intent
  • "distance earth moon" has clear intent, but many equivalent forms: "earth moon distance", "earth distance moon", …

Challenges: Syntactic Parsing of Queries

• No standard

• No ground truth

Why is syntactic parsing of queries even a legitimate problem?

Derive Syntax from Semantics [Sun et al 2016]

• Query: "thai food houston"

• Take a clicked sentence for the query

• Project its dependency parse onto the query

A Treebank for Short Texts

• Given a query q and q's clicked sentence s:
  • parse each s
  • project the dependencies from s to q
  • aggregate the dependencies

Algorithm of Projection

Result Examples

Results

• Random queries: QueryParser UAS 0.83, LAS 0.75; Stanford UAS 0.72, LAS 0.64

• Queries with no function words: QueryParser UAS 0.82, LAS 0.73; Stanford UAS 0.70, LAS 0.61

• Queries with function words: QueryParser UAS 0.90, LAS 0.85; Stanford UAS 0.86, LAS 0.80

Short Text Understanding

• How to segment this short text?

• What does this short text mean (its intent, senses, or concepts)?

• What are the relations among terms in the short text?

• How to calculate the similarity between short texts?

Short Text Similarity Using Word Embedding [Kenter and Rijke 2015]

• Measuring similarity between two short texts or sentences

• Basic idea: word-by-word comparison using embedding vectors

• Use a saliency-weighted semantic graph to compute similarity

Features acquired: bins of all edges, bins of max edges

Similarity measurement (inspired by BM25), for a longer short text $s_l$ and a shorter one $s_s$:

$f_{sts}(s_l, s_s) = \sum_{w \in s_l} IDF(w) \cdot \dfrac{sem(w, s_s) \cdot (k_1 + 1)}{sem(w, s_s) + k_1 \cdot \left(1 - b + b \cdot \frac{|s_s|}{avgsl}\right)}$

where $sem(w, s_s)$ is the semantic similarity of term w to the short text $s_s$.
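A sketch of the BM25-style saliency-weighted similarity f_sts above, with sem(w, s_s) taken as the maximum cosine between w and any word of s_s; the IDF table, k1, b, avgsl and the embedding lookup are illustrative assumptions.

import numpy as np

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def f_sts(s_l, s_s, emb, idf, k1=1.2, b=0.75, avgsl=5.0):
    # s_l, s_s: lists of words; emb: {word: vector}; idf: {word: IDF weight}
    score = 0.0
    for w in s_l:
        sem = max(cos(emb[w], emb[w2]) for w2 in s_s)        # sem(w, s_s)
        score += idf.get(w, 1.0) * (sem * (k1 + 1)) / (
            sem + k1 * (1 - b + b * len(s_s) / avgsl))
    return score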

From the Concept View [Wang et al 2015a]

Represent each short text as a bag of concepts: Short Text 1 and Short Text 2 are each parsed and conceptualized (term clustering by isA, head/modifier analysis, concept orthogonalization, concept filtering by co-occurrence against the semantic and co-occurrence networks), yielding Concept Vector 1 [(c1, score1), (c2, score2), …] and Concept Vector 2 [(c1', score1'), (c2', score2'), …]; similarity is then computed between the two concept vectors.

Outline

• Knowledge Bases

• Explicit Representation Models

• Applications

Applications

• Explicit short text understanding benefits a lot of application scenarios:
  • Ads / search semantic match
  • Definition mining
  • Query recommendation
  • Web table understanding
  • Semantic search
  • …

Ads Keyword Selection [Wang et al 2015a]

[Bar charts: performance by decile (Decile 4 – Decile 10) for Mainline Ads and Sidebar Ads]

Definition Mining [Hao et al 2016]

• Definition scenarios: search engines, QnA, etc.

• Why conceptualization is useful for definition mining – example: "What is Emphysema?"
  Answer 1: "Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year."
  Answer 2: "Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing."

• These sentences have the form of a definition; embedding is helpful to some extent, but it also returns a high similarity score for (emphysema, disease) and for (emphysema, smoking)

• Conceptualization can provide strong semantics; contextual embedding can also provide semantic similarity beyond Is-A

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

System overview – Offline: training data drives model learning and concept weighting, producing a concept model for each class (Class 1 … Class i … Class N). Online: an original short text (e.g. "justin bieber graduates") goes through entity extraction, conceptualization against the knowledge base into a concept vector, candidate generation, and classification & ranking, producing outputs such as <Music, Score>.

Concept based Short Text Classification and Ranking [Wang et al 2014a]

Each category (e.g. TV, Music, Movie) is mapped into the concept space using the article titles/tags in that category, giving concept weights ω_i, ω_j; a query is conceptualized into the same concept space with probabilities p_i, p_j and matched against the category representations.

Precision performance on each category [Wang et al 2014a]

Category   BocSTC  LM_ch  SVM   VSM_cosine  LM_d  Entity_ESA
Movie      0.71    0.91   0.84  0.81        0.72  0.56
Money      0.97    0.95   0.54  0.57        0.52  0.74
Music      0.97    0.90   0.88  0.73        0.68  0.58
TV         0.96    0.46   0.92  0.56        0.51  0.55

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

• [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

• [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

• [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

• [ Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

• [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

• [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamie Taylor Freebase a collaboratively created graph database for structuring human knowledge in SIGMOD 2008

• [ Auer et al 2007 ] Sören Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWC/ASWC 2007

References

• [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

• [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Ré Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

• [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

• [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Börschinger Cäcilia Zirn and Anas Elghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

• [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The People's Web Meets NLP Springer Berlin Heidelberg 2013

• [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou "Understand Short Texts by Harvesting and Analyzing Semantic Knowledge" IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

• [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

• [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

• [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and Xindong Wu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10): 2604-2617 2015

• [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny Boyes-Braem Basic objects in natural categories Cognitive psychology 8(3):382–439 1976

• [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

• [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

• [ Bergsma et al 2007 ] Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

• [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

• [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

• [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

• [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

• [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

• [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

• [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

• [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

• [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddings In CIKM 2015

• [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

• [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

• [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

• [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

• [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny Zhu Toward Topic Search on the Web In International Conference on Conceptual Modeling October 2012


Ambiguity [Hua et al 2016]

• Many instances are ambiguous

• Intuition: ambiguous instances have multiple senses

  short text             instance        sense
  population china       china           country
  glass vs china         china           fragile item
  pear apple             apple           fruit
  microsoft apple        apple           company
  read harry potter      harry potter    book
  watch harry potter     harry potter    movie
  age of harry potter    harry potter    character

Pre-definition for Ambiguity (1) Sense [Hua et al 2016]

• What is a Sense in semantic networks? A sense is a hierarchy of concept clusters

  Examples of senses: region {country, state, city} (e.g. Germany), creature {animal, predator}, crop/food {fruit, vegetable, meat}

Pre-definition for Ambiguity (2) Concept Cluster [Li et al 2013 Li et al 2015]

• What is a Concept Cluster (CL)? Cluster similar concepts into a concept cluster using a K-Means-like approach (k-Medoids)

  Cluster "Fruit": fruit, fresh fruit, juice, tropical fruit, berry, exotic fruit, seasonal fruit, fruit juice, citrus fruit, soft fruit, dry fruit, wild fruit, local fruit, …
  Cluster "Company": company, client, firm, manufacturer, corporation, large company, rival, giant, big company, local company, large corporation, international company, …

Definitions of Instance Ambiguity [Hua et al 2016]

• 3 levels of instance ambiguity
  • Level 0: unambiguous – contains only 1 sense; e.g. dog (animal), beijing (city), potato (vegetable)
  • Level 1: unambiguous and ambiguous both make sense – contains 2 or more senses, but these senses are related; e.g. google (company & search engine), french (language & country), truck (vehicle & public transport service)
  • Level 2: ambiguous – contains 2 or more senses, and the senses are very different from each other; e.g. apple (fruit & company), jaguar (animal & company), python (animal & language)

Ambiguity Score

• Use the top-2 senses to calculate the ambiguity score:

$\text{score} = \begin{cases} 0 & \text{level} = 0 \\ \dfrac{w(s_2|e)}{w(s_1|e)} \cdot \bigl(1 - \text{similarity}(s_1, s_2)\bigr) & \text{level} = 1 \\ 1 + \dfrac{w(sc_2|e)}{w(sc_1|e)} \cdot \bigl(1 - \text{similarity}(sc_1, sc_2)\bigr) & \text{level} = 2 \end{cases}$

Denote the top-2 senses as $s_1$ and $s_2$, and the top-2 sense clusters as $sc_1$ and $sc_2$. The similarity of two sense clusters is the maximum similarity of their senses: $\text{similarity}(sc_1, sc_2) = \max \text{similarity}(s_i \in sc_1, s_j \in sc_2)$.

For an entity $e$, the weight (popularity) of a sense $s_i$ is the sum of the weights of its concept clusters: $w(s_i|e) = w(H_i|e) = \sum_{CL_j \in H_i} P(CL_j|e)$; the weight of a sense cluster $sc_i$ is the sum of the weights of its senses: $w(sc_i|e) = \sum_{s_j \in sc_i} w(s_j|e)$.
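A sketch of the level-2 branch of the ambiguity score: take the top-2 sense clusters by weight and combine their weight ratio with (1 − similarity). The sense-cluster weights and the similarity function below are placeholders.

def ambiguity_level2(sense_cluster_weights, similarity):
    # sense_cluster_weights: {sense_cluster: w(sc|e)}; similarity: f(sc1, sc2) in [0, 1]
    (sc1, w1), (sc2, w2) = sorted(sense_cluster_weights.items(),
                                  key=lambda kv: kv[1], reverse=True)[:2]
    return 1 + (w2 / w1) * (1 - similarity(sc1, sc2))

# "apple": fruit-like senses vs. company-like senses (illustrative weights and similarity)
weights = {"fruit/food/tree": 0.537, "company/brand": 0.271}
print(ambiguity_level2(weights, lambda a, b: 0.05))   # score > 1, i.e. clearly ambiguous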

Examples

• Level 0
  • california – country, state, city, region, institution – 0.943
  • fruit – food, product, snack, carbs, crop – 0.827
  • alcohol – substance, drug, solvent, food, addiction – 0.523
  • computer – device, product, electronics, technology, appliance – 0.537
  • coffee – beverage, product, food, crop, stimulant – 0.73
  • potato – vegetable, food, crop, carbs, product – 0.896
  • bean – food, vegetable, crop, legume, carbs – 0.801

Examples (cont.)
• Level 1
  • nike, score = 0.034: company, store 0.861; brand 0.035; shoe, product 0.033
  • twitter, score = 0.035: website, tool 0.612; network 0.165; application 0.033; company 0.031
  • facebook, score = 0.037: website, tool 0.595; network 0.17; company 0.053; application 0.029
  • yahoo, score = 0.38: search engine 0.457; company, provider, account 0.281; website 0.0656
  • google, score = 0.507: search engine 0.46; company, provider, organization 0.377; website 0.0449

Examples (cont.)
• Level 2
  • jordan, score = 1.02: country, state, company, regime 0.92; shoe 0.02
  • fox, score = 1.09: animal, predator, species 0.74; network 0.064; company 0.035
  • puma, score = 1.15: brand, company, shoe 0.655; species, cat 0.116
  • gold, score = 1.21: metal, material, mineral resource, mineral 0.62; color 0.128

Examples (cont.)
• Level 2
  • soap, score = 1.22: product, toiletry, substance 0.49; technology, industry standard 0.11
  • silver, score = 1.24: metal, material, mineral resource, mineral 0.638; color 0.156
  • python, score = 1.29: language 0.667; snake, animal, reptile, skin 0.193
  • apple, score = 1.41: fruit, food, tree 0.537; company, brand 0.271

Single Instance

• Is this instance ambiguous?

• What are its basic-level concepts?

• What are its similar instances?

A Concept View of "Microsoft"

Microsoft → company, software company, international company, technology leader, largest desktop OS vendor, …

Basic-level Conceptualization (BLC) [Rosch et al 1976]

Basic-level conceptualization: e.g. what are the basic-level concepts of "KFC" and "BMW"?

How to Make BLC

• Naive approaches:
  • Typicality – an important measure for understanding the relationship between an object and its concept
  • Pointwise Mutual Information (PMI) – a common measure of the strength of association between two terms

Naive Approach 1: Typicality

P(robin|bird) > P(penguin|bird): "robin" is a more typical bird than "penguin"
P(USA|country) > P(Seychelles|country): "USA" is a more typical country than "Seychelles"

Using Typicality for BLC

• Associate each isA relationship (e isA c) with typicality scores P(e|c) and P(c|e):

$P(e|c) = \dfrac{n(c, e)}{n(c)}, \qquad P(c|e) = \dfrac{n(c, e)}{n(e)}$

• P(e|c) indicates how typical (or popular) e is in the given concept c

• P(c|e) indicates how typical (or popular) the concept c is given e

• However: for "Microsoft", "company" has high typicality P(c|e) while "largest desktop OS vendor" has high typicality P(e|c) – neither measure alone picks out the basic-level concept

Naive Approach 2: PMI [Manning and Schutze 1999]

• Pointwise mutual information (PMI) is a measure of association used in information theory and statistics

• Consider using the PMI between concept c and instance e to find the basic-level concepts:

$PMI(e, c) = \log \dfrac{P(e, c)}{P(e)P(c)} = \log P(e|c) - \log P(e)$

• However:
  • In basic level of categorization we are interested in finding a concept for a given e, which means P(e) is a constant
  • Thus ranking by PMI(e, c) is the same as ranking by P(e|c)

Using Rep(e, c) for BLC [Wang et al 2015b]

• The measure $Rep(e, c) = P(c|e) \cdot P(e|c)$ means: given e, c should be its typical concept (shortest distance), and given c, e should be its typical instance (shortest distance)

• (Relation to PMI) Taking the logarithm of the scoring function:

$\log Rep(e, c) = \log\bigl(P(c|e)\,P(e|c)\bigr) = \log\Bigl(\dfrac{P(e,c)}{P(e)} \cdot \dfrac{P(e,c)}{P(c)}\Bigr) = \log \dfrac{P(e,c)^2}{P(e)P(c)} = PMI(e, c) + \log P(e, c)$, i.e. the $PMI^2$ measure

• (Relation to commute time) The commute time between an instance e and a concept c is

$Time(e, c) = \sum_{k=1}^{\infty} 2k \cdot P_k(e, c) = \sum_{k=1}^{T} 2k \cdot P_k(e, c) + \sum_{k=T+1}^{\infty} 2k \cdot P_k(e, c) \ge \sum_{k=1}^{T} 2k \cdot P_k(e, c) + 2(T+1)\Bigl(1 - \sum_{k=1}^{T} P_k(e, c)\Bigr)$,

which for $T = 1$ (with $P_1(e, c) = P(c|e)P(e|c) = Rep(e, c)$) equals $4 - 2 \cdot Rep(e, c)$.

Maximizing Rep(e, c) is therefore a process of finding concept nodes having the shortest expected distance to e.
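A sketch of basic-level concept scoring with Rep(e, c) = P(c|e) · P(e|c) from isA co-occurrence counts n(c, e); the counts are toy values, not Probase statistics.

from collections import defaultdict

def blc_scores(entity, counts):
    # counts: {(concept, entity): n(c, e)}
    n_e = sum(n for (c, e), n in counts.items() if e == entity)
    n_c = defaultdict(int)
    for (c, e), n in counts.items():
        n_c[c] += n
    scores = {}
    for (c, e), n in counts.items():
        if e == entity:
            scores[c] = (n / n_e) * (n / n_c[c])     # P(c|e) * P(e|c) = Rep(e, c)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

counts = {("company", "microsoft"): 9000, ("software company", "microsoft"): 4000,
          ("largest desktop OS vendor", "microsoft"): 120,
          ("company", "apple"): 8000, ("software company", "oracle"): 1500,
          ("largest desktop OS vendor", "other"): 5}
# The overly specific concept scores low even though its P(e|c) is high.
print(blc_scores("microsoft", counts))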

Precision / NDCG at k = 1, 2, 3, 5, 10, 15, 20 – first result table:

No smoothing
MI(e)               0.769  0.692  0.705  0.685  0.719  0.705  0.690
PMI3(e)             0.885  0.769  0.756  0.800  0.754  0.733  0.721
NPMI(e)             0.692  0.692  0.667  0.638  0.627  0.610  0.610
Typicality P(c|e)   0.462  0.577  0.603  0.577  0.569  0.564  0.556
Typicality P(e|c)   0.500  0.462  0.526  0.523  0.523  0.510  0.521
Rep(e)              0.846  0.865  0.872  0.862  0.758  0.731  0.719

Smoothing = 0.001
MI(e)               0.577  0.615  0.628  0.600  0.612  0.605  0.592
PMI3(e)             0.731  0.673  0.692  0.654  0.669  0.644  0.623
NPMI(e)             0.923  0.827  0.769  0.746  0.731  0.695  0.671
Typicality P(c|e)   0.462  0.577  0.603  0.577  0.569  0.564  0.554
Typicality P(e|c)   0.885  0.865  0.872  0.831  0.785  0.741  0.704
Rep(e)              0.846  0.731  0.718  0.723  0.700  0.669  0.638

Smoothing = 0.0001
MI(e)               0.615  0.615  0.654  0.608  0.635  0.628  0.612
PMI3(e)             0.846  0.731  0.731  0.715  0.723  0.685  0.677
NPMI(e)             0.885  0.904  0.885  0.869  0.823  0.777  0.752
Typicality P(c|e)   0.462  0.577  0.603  0.577  0.569  0.564  0.556
Typicality P(e|c)   0.885  0.904  0.910  0.877  0.831  0.813  0.777
Rep(e)              0.923  0.846  0.833  0.815  0.781  0.736  0.719

Smoothing = 1e-5
MI(e)               0.615  0.635  0.667  0.662  0.677  0.656  0.646
PMI3(e)             0.885  0.769  0.744  0.777  0.758  0.731  0.710
NPMI(e)             0.885  0.846  0.872  0.869  0.831  0.810  0.787
Typicality P(c|e)   0.462  0.577  0.603  0.577  0.569  0.564  0.556
Typicality P(e|c)   0.769  0.808  0.846  0.823  0.808  0.782  0.765
Rep(e)              0.885  0.904  0.872  0.862  0.812  0.800  0.767

Smoothing = 1e-6
MI(e)               0.769  0.673  0.705  0.677  0.700  0.692  0.679
PMI3(e)             0.885  0.769  0.756  0.785  0.773  0.726  0.723
NPMI(e)             0.885  0.846  0.821  0.815  0.750  0.726  0.719
Typicality P(c|e)   0.462  0.577  0.603  0.577  0.569  0.564  0.556
Typicality P(e|c)   0.538  0.615  0.615  0.615  0.608  0.613  0.615
Rep(e)              0.846  0.885  0.897  0.877  0.788  0.777  0.765

Smoothing = 1e-7
MI(e)               0.769  0.692  0.705  0.685  0.719  0.703  0.688
PMI3(e)             0.885  0.769  0.756  0.792  0.758  0.736  0.725
NPMI(e)             0.769  0.750  0.718  0.700  0.650  0.641  0.633
Typicality P(c|e)   0.462  0.577  0.603  0.577  0.569  0.564  0.556
Typicality P(e|c)   0.500  0.481  0.526  0.523  0.531  0.523  0.523
Rep(e)              0.846  0.865  0.872  0.854  0.765  0.749  0.733

Second result table, k = 1, 2, 3, 5, 10, 15, 20:

No smoothing
MI(e)               0.516  0.531  0.519  0.531  0.562  0.574  0.594
PMI3(e)             0.725  0.664  0.652  0.660  0.628  0.631  0.646
NPMI(e)             0.599  0.597  0.579  0.554  0.540  0.539  0.549
Typicality P(c|e)   0.297  0.380  0.409  0.422  0.438  0.446  0.461
Typicality P(e|c)   0.401  0.386  0.396  0.398  0.401  0.410  0.428
Rep(e)              0.758  0.771  0.745  0.723  0.656  0.647  0.661

Smoothing = 1e-3
MI(e)               0.374  0.414  0.441  0.448  0.473  0.481  0.495
PMI3(e)             0.484  0.511  0.509  0.502  0.519  0.525  0.533
NPMI(e)             0.692  0.652  0.607  0.603  0.585  0.585  0.592
Typicality P(c|e)   0.297  0.380  0.409  0.422  0.438  0.446  0.460
Typicality P(e|c)   0.703  0.697  0.704  0.681  0.637  0.628  0.626
Rep(e)              0.621  0.580  0.554  0.561  0.554  0.555  0.559

Smoothing = 1e-4
MI(e)               0.407  0.430  0.458  0.462  0.492  0.503  0.512
PMI3(e)             0.648  0.604  0.579  0.575  0.578  0.576  0.590
NPMI(e)             0.747  0.777  0.761  0.737  0.700  0.685  0.688
Typicality P(c|e)   0.297  0.380  0.409  0.422  0.438  0.446  0.461
Typicality P(e|c)   0.791  0.795  0.802  0.767  0.738  0.729  0.724
Rep(e)              0.758  0.714  0.711  0.689  0.653  0.636  0.653

Smoothing = 1e-5
MI(e)               0.429  0.465  0.478  0.501  0.517  0.528  0.545
PMI3(e)             0.725  0.647  0.642  0.642  0.627  0.624  0.638
NPMI(e)             0.813  0.779  0.778  0.765  0.730  0.723  0.729
Typicality P(c|e)   0.297  0.380  0.409  0.422  0.438  0.446  0.461
Typicality P(e|c)   0.709  0.728  0.735  0.722  0.702  0.696  0.703
Rep(e)              0.791  0.787  0.762  0.739  0.707  0.703  0.706

Smoothing = 1e-6
MI(e)               0.516  0.510  0.515  0.526  0.546  0.563  0.579
PMI3(e)             0.725  0.655  0.651  0.654  0.641  0.631  0.649
NPMI(e)             0.791  0.766  0.732  0.728  0.673  0.659  0.668
Typicality P(c|e)   0.297  0.380  0.409  0.422  0.438  0.446  0.461
Typicality P(e|c)   0.495  0.516  0.520  0.508  0.512  0.521  0.540
Rep(e)              0.758  0.784  0.767  0.755  0.691  0.686  0.694

Smoothing = 1e-7
MI(e)               0.516  0.531  0.519  0.530  0.562  0.571  0.592
PMI3(e)             0.725  0.664  0.652  0.658  0.630  0.631  0.647
NPMI(e)             0.670  0.655  0.633  0.604  0.575  0.570  0.581
Typicality P(c|e)   0.297  0.380  0.409  0.422  0.438  0.446  0.461
Typicality P(e|c)   0.423  0.421  0.415  0.407  0.414  0.424  0.438
Rep(e)              0.758  0.771  0.745  0.725  0.663  0.661  0.668

Evaluations on Different Measures for BLC

Single Instance

• Is this instance ambiguous?

• What are its basic-level concepts?

• What are its similar instances?

What is the Semantic Similarity?
• Are the following instance pairs similar?
  • <apple, microsoft>
  • <apple, pear>
  • <apple, fruit>
  • <apple, food>
  • <apple, ipad>
  • <car, journey>

Approaches on Term Similarity

• Categories of approaches for semantic similarity:
  • String based approach
  • Knowledge based approach – use preexisting thesauri, taxonomies or encyclopedias such as WordNet
  • Corpus based approach – use contexts of terms extracted from web pages, web search snippets or other text repositories
  • Embedding based approach – introduced in detail in "Part 3: Implicit Understanding"

Approaches on Term Similarity (2)

• Categories:
  • Knowledge based approaches (WordNet): path length / lexical chain-based and information content-based methods
  • Corpus based approaches: graph learning algorithm based and snippet search based methods
  • String based approaches

Representative work shown in the taxonomy: Rada 1989, Resnik 1995, Jcn 1997, Lin 1998, Sánchez 2011, Agirre 2010, Alvarez 2007, HunTray 2005, Hirst 1998, Do 2009, Bol 2011, Chen 2006, Ban 2002 (state-of-the-art approaches highlighted)

bull Framework

83

Term Similarity Using Semantic Networks [Li et al 2013 Li et al 2015]

Term pairs ltt1 t2gt

Type Checking

Concept Pairs Entity Pairs

Entity-distribution Context Collection

Concept-distribution Context Collection

Concept-Entity Pairs

Concept Collection for the Entity Term t1

Similarity EvaluationCosine(T(t1) T(t2))

for each pairltt2cxgt

Context vector T(t1) and T(t2)

Get maxsim(t2cx) for ltt1 t2gt

End

End

Concept Clustering

Cluster Context vector Cx(t1) and Cy(t2)

Similarity Evaluation

Max(xy) Cosine(Cx(t1) Cy(t2))

End

Concept Clustering

for each Cluster Ci(t1)

Select top k Concept namely cx

Step 1 Type Checking

Step 2 Context Representation(Vector)

Step 3 Context Similarity

An example [Li et al 2013 Li et al 2015]

For example

ltbanana peargt

88

ltbanana peargt

Entity PairsType Checking

Concept Context Collection

Similarity Evaluation Cosine(T(t1) T(t2)) 0916

Step 1 Type Checking

Step 2 Context Representation(Vector)

Step 3 Context Similarity

ExamplesTerm 1 Term 2 Similarity

lunch dinner 09987

tiger jaguar 09792

car plane 09711

television radio 09465

technology company microsoft 08208

high impact sport competitive sport 08155

employer large corporation 05353

fruit green pepper 02949

travel meal 00426

music lunch 00116

alcoholic beverage sports equipment 00314

company table tennis 00003

96httpadaptseieesjtueducnsimilaritySimCompleteResultspdf

Statistics of Search Queries

44

29

17

7

2 1

1 Term2 Terms3 Terms4 Terms5 Termsmore than 5 Terms

10

26

34

19

74

1 Term2 Terms3 Terms4 Terms5 Termsmore than 5 Terms

(a) By traffic

(b) By of distinct queries

Pokeacutemon Go Microsoft HoloLens

Instance 1 Instance 21 Instance2 Instances3 Instances4 Instances5 InstancesMore than 5 Instances

1 Instance2 Instances3 Instances4 Instances5 InstancesMore than 5 Instances

If the short text has context for the instancehellip

bull python tutorialbull dangerous pythonbull moon earth distancebull hellip

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Supervised Segmentation [Bergsma et al 2007]

bull Problem divide query into semantic units

bull Approach turn segmentation into position-based binary classification

Example Query

Two man power saw

[two man] [power saw][two] [man] [power saw][two] [man power] [saw]

Input a query and its positions

Output the decision for making segmentation at each position

Supervised Segmentation

bull Featuresbull Decision boundary features

bull Statistical features

bull Context features

bull Dependency features

eg Indicators thePOS tags in query isPosition features forwardbackward

Mutual information between left and right parts

Bank loan amortization schedule

Context information

bus driverfemale

depend

Supervised Segmentation

bull Segmentation Overview

saw

SVMclassifier

Input query two man power saw

two man power

Output segmentation decision for each position (yesno)

learning features

Unsupervised Segmentation [Tan et al 2008]

bull Unsupervised learning for query segmentation

Probability of generated segmentation S for query Q

119875 119878119876 = 119875 1199041 P 1199042|1199041 hellipP 119904119898 11990411199042hellip119904119898minus1

asympෑ

119904119894isin119878

119875(119904119894)Unigram model

segments

Valid segment boundary if and only if the pointwise mutual information between the two segments resulting from the split is negative

new york times subscription

1199041 1199042

119872119868 119904119896 119904119896+1 = log119875119888([119904119896 119904119896+1])

119875119888 119904119896 ∙ 119875119888 (119904119896+1)lt 0

Example log119875119888([119899119890119908 119910119900119903119896])

119875119888( 119899119890119908) ∙ 119875119888 (119910119900119903119896)gt 0

no segment boundary here

Unsupervised Segmentation

bull Find top k segmentations dynamic programming

bull Using EM optimization on the fly

Input query 11990811199082hellip119908119899 concept probability distribution

Output top k segmentations with highest likehood

Words in a query

Exploit Click-through [Li et al 2011]

bull Motivationbull Probabilistic query segmentation

bull Use click-through data

Output top-3 segmentation

[bank of america] [online banking] 0502

bank of america online banking] 0428

[bank of ] [ america] [online banking] 0001

Q -gt URL -gt D query

document

click data

Input Query bank of america online banking

Exploit Click-through

bull Segmentation Model

An interpolated model

global info Click-throughinfo

[credit card] [bank of America]

1 bank of america credit cards contact us overview2 secured visa credit card from bank of america3 credit cards overview find the right bank of america credit card for you

Query

Clicked html documents

global info

Click-through info

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Sense Changes with Different Context

watch harry potter read harry potter age harry potter

Movie Book Character

harry potter walkthrough

Game

Entity Recognition in Query [Guo et al 2009]

bull Motivation

Detect named entity in a short text and categorize it

harry potter walkthrough

Single-named-entity query

Example

(ldquoharry potterrdquo ldquo walkthroughrdquo ldquogamerdquo)

triple lte t cgt

class of entity

context terms

ambiguous term

contextterm class

Entity Recognition in Query

bull Probabilistic Generative Model

Goal Given a query q find triple lte t cgt maximize the probability

Probability to generate triple

assume context only depends on class

Objective given query q find

The problem then becomes how to estimate Pr(e) Pr(c|e) and Pr(t|c)

Eg ldquowalkthroughrdquo only depends on game instead of happy potter

Entity Recognition in Query

bull Probability Estimation by Learning

learning objective

N

1i

iii )ctP(emax

Challenge difficult as well as time consuming to manually assign class labels to named entities in queries

Build training set 119879 = (119890119894 119905119894) view 119888119894 as a hidden variable

New Learning problem

N

1i

ii

N

1i

i

N

1i

ii c)|)P(te|P(c)P(emax)tP(emax c

solved with topic model WS-LDA

Signal from Click [Pantel et al 2012]

bull Motivation

Predict entity type in Web search

entity

user intent

context

click

Query type distribution (73 types)

Generative model

entity type

T

TK

K2

Signal from Click

bull Joint Model for Prediction

t

τ

i

n c

θ

φ

ω

Q

Distribution over types

Intent distribution

Pick type

Pick entity

Pick intent

Pick click

Word distribution

Host distribution

Entity distribution

For each Query

Pick context words

Telegraphic Query interpretation [Sawant et al 2013 Joshi et al 2014]

bull Entity-seeking Telegraphic Queries

bull Interpretation = Segmentation + Annotation

Knowledge base Large corpus

accuracy recall

Germany capital

Berlin

Query

Result Entity

bull Overview

Joint Interpretation and Ranking [Sawant et al 2013 Joshi et al 2014]

Annotated Corpus

Telegraphic Query

e1e2e3

Two Models for Interpretation and Ranking

Generative Model

Discriminative Model

Output

bull Generative Model

Joint Interpretation and Ranking [Sawant et al 2013]

San Diego Padres

Major league baseball team

type context

E

T Padres have been to two World

Series losing in 1984 and 1998

Type hint

baseball team

losing team baseball world series 1998

Z

Context matchers

lost 1998 world seriesswitch

model model

q losing team baseball world series 1998

Borrow from U Sawant (2013)

Based on Probabilistic Language Models

bull Discriminative Model

Joint Interpretation and Ranking [Sawant et al 2013]

San_Diego_Padres

losing team baseball world

series 1998

(baseball team)

losing team baseball world

series 1998

(baseball team)

losing team baseball world

series 1998

(t = baseball team)

1998_World_Series

losing team baseball world

series 1998

(series)

losing team baseball world

series 1998

(series)

losing team baseball world

series 1998

(t = series)

Correct entity Incorrect entity

Based on max-margin discriminative learning

bull Queries seek answer entities (e2)

bull Contain (query) entities (e1) target types (t2) relations (r) and selectors (s)

Telegraphic Query Interpretation [Joshi et al 2014]

query e1 r t2 s

dave navarro first band

dave navarro band band first

dave navarro - band first

spider automobile company

spider automobile company

automobile company

-

automobile company company spider

Borrow from M Joshi (2014)

Improved Generative Model

bull Generative Model[Sawant et al 2013]

[Joshi et al 2014]Consider e1

(in q) and r

Improved Discriminative Model

bull Discriminative Model[Sawant et al 2013]

[Joshi et al 2014]

Consider e1

(in q) and r

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

bull Input a short text

bull Output semantic interpretation

bull Three steps in understanding a short text

wanna watch eagles band

watch[verb] eagles[entity](band) band[concept]

wanna watch eagles band watch[verb] eagles[entity](band) band[concept]

watch eagles band watch[verb] eagles[entity] band[concept]

Step 1 Text Segmentation ndash divide into a sequence of terms in vocabulary

Step 2 Type detection ndash determine the best type of each term

Step 3 Concept Labeling ndash infer the best concept of each entity within context

Text segmentationbull Observations

bull Mutual Exclusion ndash terms containing the same word mutually exclude each other

bull Mutual Reinforcement ndash related terms mutually reinforce each other

bull Build a Candidate Term Graph (CTG)

ldquovacation april in parisrdquo ldquowatch harry potterrdquo

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Find best segmentation

bull Best segmentation= sub-graph in CTG whichbull Is a complete graph (clique)

bull No mutual exclusion

bull Has 100 word coveragebull Except for stopwords

bull Has the largest average edge weight

Is a segmentation

Best segmentation

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Find best segmentation

bull Best segmentation= sub-graph in CTG whichbull Is a complete graph (clique)

bull No mutual exclusion

bull Has 100 word coveragebull Except for stopwords

bull Has the largest average edge weight

Maximal Clique

Best segmentation

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Type Detection

bull Pairwise Modelbull Find the best typed-term for each term so that the

Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight

watch[v]

watch[e]

watch[c]

watch

free[adj]

free[v]

movie[c]

movie[e]

free

movie

Concept Labeling

bull Entity disambiguation is the most important task of concept labelingbull Filterre-rank of the original concept cluster vector

bull Weighted-Votebull The final score of each concept cluster is a combination

of its original score and the support from context using concept co-occurrence

watch harry potter read harry potter

movie book

Example of Entity Disambiguation[Hua et al 2015 (ICDE Best Paper) Hua et al 2016]

Co-occurrence network

Concept Vector

Semantic network

Short Text

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

c1 p1

c2 p2

c3 p3

hellip

ipad apple

fruithellip

companyhellip

foodhellip

producthellip

Is-A

filtering

ldquoipad applerdquo

producthellip

devicehellip

producthellip

brandhellip

companyhellip

devicehellip

co-occur

Is-A

Mining Lexical Relationships[Wang et al 2015b]

bull Lexical knowledge represented by the probabilities

verb

product book

movie

watch harry potter

119901 119907119890119903119887 119908119886119905119888ℎ

119901 119894119899119904119905119886119899119888119890 119908119886119905119888ℎ

119901 119898119900119907119894119890 ℎ119886119903119903119910 119901119900119905119905119890119903

119901 119898119900119907119894119890 119908119886119905119888ℎ 119907119890119903119887

119901 119887119900119900119896 ℎ119886119903119903119910 119901119900119905119905119890119903

119901 119888 119905 119911

119901 119888 119890 =119901 119888 119905 119911 = 119894119899119904119905119886119899119888119890

119901 119911 119905 ①②

③e instancet termc conceptz role

Understanding Queries [Wang et al 2015b]

bull Goal to rank the concepts and findarg max

119888119901(119888|119905 119902)

The offline semantic network

QueryAll possible

segmentations

Random walk with restart [Sun et al 2005]on the online subgraph

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

bull Example ldquopopular smart cover iphone 5srdquo

bull Definition bull Head acts to name the general (semantic) category to which the

whole short text belongs Usually the head is the intent of the short text

bull ldquosmart coverrdquo intent of the query

bull Constraints distinguish this member from other members of the same category

bull ldquoiphone 5srdquo limit the type of the head

bull Non-Constraint Modifiers (aka Pure Modifiers) are subjectivemodifiers which can be dropped without changing intent

bull ldquopopularrdquo subjective can be neglected

Non-Constraint Modifiers Mining Construct Modifier Networks

Edges form a Modifier Network

Concept Hierarchy Tree in ldquoCountryrdquo domain

Modifier Network in ldquoCountryrdquo domainIn this case ldquoLargerdquo and ldquoToprdquo are pure modifiers

Country

Asian country

Developed country

Western country

Asian Developed Western

Western developed

country

Top western country

Large

Large Top

Top

WesternLarge Asian

country

Large developed

country

Top developed

country

Country

Asian Western

Developed

Large Top

bull Betweenness centrality is a measure of a nodes centrality in a network

bull Betweennes of node v is defined as

bull where 120590119904119905 is the total number of shortest paths from node s to node t and 120590119904119905(119907) is the number of those paths that pass through v

bull Normalization amp Aggregation

bull For a pure modifier it should have low betweenness centrality aggregation score PMS(t)

Non-Constraint Modifiers Mining Betweenness centrality

Head-Constraints Mining [Wang et al 2014b]

bull A term can be a head sometimes and be a constraint in some other cases

bull Eg Seattle hotel Seattle hotel job

head headconstraintconstraintconstraint

Head-Constraints Mining Acquiring Concept Patterns

Get entity pairs from query log

Conceptualization

Concept Patterns for each

prepositionsentity1 entity2

Extract Patterns

A for B A of BA with B A in BA on B A at B hellip

entity 1head entity 2constraint

concept11

concept12

concept13

concept14

concept21

concept22

concept23

(concept11 concept21) (concept11 concept22)(concept11 concept23)hellip

Concept Pattern Dictionary

Building concept pattern dictionary Query Logs

cover for iphone 6sbattery for sony a7rwicked on broadway

Why Concepts Canrsquot Be Too Generalbull It may cause too many concept pattern conflicts

canrsquot distinguish head and modifier for general concept pairs

Head Modifier

Derived Concept Pattern device company

Supporting Entity Pairs iphone 4 verizon

modem comcast

wireless router comcast

iphone 4 tmobile

Head Modifier

Derived Concept Pattern company device

Supporting Entity Pairs amazon books kindle

netflix touchpad

skype windows phone

netflix ps3

Conflict

Why Concepts Canrsquot Be Too Specificbull It may generate concepts with little coverage

bull Concept regresses to entitybull Large storage space up to (million million) patterns

hellip hellip

device largest desktop OS vendor

device largest software development company

device largest global corporation

device latest windows and office provider

hellip hellip

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept Patterns

Cluster size | Sum of cluster score | Head / Constraint
615          | 2114691              | breed / state
296          | 7752357              | game / platform
153          | 3466804              | accessory / vehicle
70           | 118259               | browser / platform
22           | 1010993              | requirement / school
34           | 9489159              | drug / disease
42           | 8992995              | cosmetic / skin condition
16           | 7421599              | job / city
32           | 710403               | accessory / phone
18           | 6692376              | software / platform
20           | 6444603              | test / disease
27           | 5994205              | clothes / breed
19           | 5913545              | penalty / crime
25           | 5848804              | tax / state
16           | 5465424              | sauce / meat
18           | 4809389              | credit card / country
14           | 4730792              | food / holiday
11           | 4536199              | mod / game
29           | 4350954              | garment / sport
23           | 3994886              | career information / professional
15           | 386065               | song / instrument
18           | 378213               | bait / fish
22           | 3722948              | study guide / book
19           | 3408953              | plugins / browser
14           | 3305753              | recipe / meat
18           | 3214226              | currency / country
13           | 3180272              | lens / camera
9            | 316973               | decoration / holiday
16           | 314875               | food / animal

• Example: the game/platform pattern cluster covers concept pairs such as (game, platform), (game, device), (video game, platform), (game console, game pad), (game, gaming platform); supporting entity pairs with the game as head and the platform as modifier include (angry birds, android), (angry birds, ios), (angry birds, windows 10), …

Head-Modifier Relationship Detection

• Train a classifier on (head-embedding, modifier-embedding).
• Training data: positive = (head, modifier); negative = (modifier, head).
• Precision >= 0.9, Recall >= 0.9.
• Disadvantage: not interpretable.
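A minimal sketch of such a direction classifier, assuming pre-trained word embeddings (random vectors stand in for them here) and a plain logistic-regression learner rather than whatever model the authors actually used:

```python
# Sketch: classify head-modifier direction from concatenated embeddings.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
emb = {w: rng.normal(size=50) for w in ["game", "android", "recipe", "meat",
                                        "tax", "state", "lens", "camera"]}
pairs = [("game", "android"), ("recipe", "meat"), ("tax", "state"), ("lens", "camera")]

X, y = [], []
for head, modifier in pairs:
    X.append(np.concatenate([emb[head], emb[modifier]])); y.append(1)  # positive
    X.append(np.concatenate([emb[modifier], emb[head]])); y.append(0)  # negative

clf = LogisticRegression(max_iter=1000).fit(np.array(X), y)
test = np.concatenate([emb["game"], emb["android"]]).reshape(1, -1)
print(clf.predict(test))   # 1 => the first term is predicted to be the head
```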

Syntactic Parsing Based on Head-Modifier Detection

• Head-modifier information alone is incomplete:
  • it ignores prepositions and other function words;
  • it cannot resolve structure within a noun compound, e.g. "el capitan macbook pro".
• Why not train a parser for web queries?

Syntactic Parsing of Short Texts [Sun et al. EMNLP 2016]

• Syntactic structures are valuable for short text understanding.

Challenge: Short Texts Lack Grammatical Signals

• Function words and word order are often missing:
  • "toys queries" has ambiguous intent;
  • "distance earth moon" has clear intent, but many equivalent forms exist: "earth moon distance", "earth distance moon", …

Challenge: Syntactic Parsing of Queries

• No standard and no ground truth — why is syntactic parsing of queries even a legitimate problem?

Derive Syntax from Semantics [Sun et al. 2016]

• Query: "thai food houston"
• Take a sentence clicked for this query, parse it, and project its dependencies onto the query.

A Treebank for Short Texts

• Given a query q and its clicked sentences s: parse each s, project the dependencies from s to q, and aggregate the projected dependencies.

Algorithm of Projection
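A simplified sketch of the projection idea (not the exact algorithm of Sun et al. 2016): keep the parsed arcs of the clicked sentence whose head and dependent both match query words, then aggregate votes over many clicked sentences. The toy sentence, arcs, and word-matching rule are assumptions.

```python
# Sketch: project dependency arcs from a clicked sentence onto the query.
from collections import Counter

def project(query_tokens, sent_tokens, sent_arcs):
    """sent_arcs: list of (head_idx, dep_idx, label) over sent_tokens."""
    positions = {w: sent_tokens.index(w) for w in query_tokens if w in sent_tokens}
    matched = set(positions.values())
    projected = []
    for head, dep, label in sent_arcs:
        if head in matched and dep in matched:      # both ends appear in the query
            projected.append((sent_tokens[head], sent_tokens[dep], label))
    return projected

# "thai food houston" with a clicked sentence "houston has great thai food"
sent = ["houston", "has", "great", "thai", "food"]
arcs = [(1, 0, "nsubj"), (1, 4, "dobj"), (4, 3, "amod"), (4, 2, "amod")]
votes = Counter(project(["thai", "food", "houston"], sent, arcs))
print(votes)   # aggregated arcs become the query treebank annotation
```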

Result Examples

Results (UAS / LAS)

• Random queries:                  QueryParser 0.83 / 0.75;  Stanford 0.72 / 0.64
• Queries with no function words:  QueryParser 0.82 / 0.73;  Stanford 0.70 / 0.61
• Queries with function words:     QueryParser 0.90 / 0.85;  Stanford 0.86 / 0.80

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding [Kenter and Rijke 2015]

• Measures the similarity between two short texts or sentences.
• Basic idea: word-by-word comparison using embedding vectors.
• Uses a saliency-weighted semantic graph to compute similarity; the features acquired are bins over all edges and bins over the maximum edges of that graph.
• Similarity measurement, inspired by BM25, for a longer text s_l and a shorter text s_s:

  f_sts(s_l, s_s) = Σ_{w ∈ s_l} IDF(w) · [ sem(w, s_s) · (k_1 + 1) ] / [ sem(w, s_s) + k_1 · (1 − b + b · |s_s| / avgsl) ]

  where sem(w, s_s) is the semantic similarity of term w to the short text s_s (its best embedding match against the words of s_s), k_1 and b are BM25-style parameters, and avgsl is the average short-text length.
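A sketch of this feature as I read it from the formula above; the toy embeddings, IDF values, and parameter settings are assumptions.

```python
# Sketch: saliency-weighted, BM25-style short-text similarity feature.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def sem(w, short_text, emb):
    # semantic match of term w against the short text: best cosine similarity
    return max(cosine(emb[w], emb[x]) for x in short_text)

def f_sts(s_long, s_short, emb, idf, k1=1.2, b=0.75, avgsl=3.0):
    score = 0.0
    for w in s_long:
        s = sem(w, s_short, emb)
        score += idf.get(w, 1.0) * (s * (k1 + 1)) / (
            s + k1 * (1 - b + b * len(s_short) / avgsl))
    return score

rng = np.random.default_rng(1)
emb = {w: rng.normal(size=20) for w in ["cheap", "flights", "low", "cost", "airline"]}
idf = {"flights": 2.0, "airline": 2.0, "cheap": 1.5, "low": 1.2, "cost": 1.2}
print(f_sts(["cheap", "flights"], ["low", "cost", "airline"], emb, idf))
```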

From the Concept View [Wang et al. 2015a]

• [Figure] Each short text is parsed and conceptualized against the semantic network and the co-occurrence network (term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, concept orthogonalization), yielding a bag of concepts:
  • Short Text 1 → Concept Vector 1 [(c1, score1), (c2, score2), …]
  • Short Text 2 → Concept Vector 2 [(c1', score1'), (c2', score2'), …]
• Similarity is then computed between the two concept vectors.
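A minimal sketch of the final comparison step, assuming the concept vectors have already been produced by conceptualization (the toy vectors below are invented):

```python
# Sketch: cosine similarity between two weighted bags of concepts.
import math

def concept_cosine(cv1, cv2):
    dot = sum(w * cv2.get(c, 0.0) for c, w in cv1.items())
    n1 = math.sqrt(sum(w * w for w in cv1.values()))
    n2 = math.sqrt(sum(w * w for w in cv2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

cv_text1 = {"company": 0.6, "device": 0.3, "product": 0.1}   # e.g. "apple ipad"
cv_text2 = {"company": 0.5, "brand": 0.3, "product": 0.2}    # e.g. "microsoft surface"
print(concept_cosine(cv_text1, cv_text2))
```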

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

• Explicit short text understanding benefits many application scenarios:
  • Ads/search semantic matching
  • Definition mining
  • Query recommendation
  • Web table understanding
  • Semantic search
  • …

Ads Keyword Selection [Wang et al. 2015a]

• [Figure] Performance of ads keyword selection by query decile (Decile 4–10), shown separately for Mainline Ads and Sidebar Ads.

Definition Mining [Hao et al. 2016]

• Definition scenarios: search engines, QnA, etc.
• Why conceptualization is useful for definition mining — example: "What is Emphysema?"
  • Answer 1: "Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year."
  • Answer 2: "Emphysema is an incurable, progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing."
• Both sentences have the form of a definition. Embedding is helpful to some extent, but it also returns high similarity scores for both (emphysema, disease) and (emphysema, smoking).
• Conceptualization can provide strong isA semantics; contextual embedding can also provide semantic similarity beyond isA.

Definition Mining [Hao et al. 2016]

Concept-based Short Text Classification and Ranking [Wang et al. 2014a]

• [Figure] System overview:
  • Offline: model learning — for each class (Class 1 … Class i … Class N), learn a concept model from training data, with concept weighting.
  • Online: given an original short text (e.g. "justin bieber graduates"), perform entity extraction and conceptualization against the knowledge base to obtain a concept vector, generate candidate classes, then classify and rank them (e.g. <Music, Score>).

Concept-based Short Text Classification and Ranking [Wang et al. 2014a]

• [Figures] Each category (e.g. TV, Music, Movie) is represented in the concept space by conceptualizing the article titles/tags in that category; the concepts p_i, p_j carry category weights ω_i, ω_j. An incoming query is mapped into the same concept space and matched against the category representations.

Precision performance on each category [Wang et al. 2014a]

Category | BocSTC | LM_ch | SVM  | VSM_cosine | LM_d | Entity_ESA
Movie    | 0.71   | 0.91  | 0.84 | 0.81       | 0.72 | 0.56
Money    | 0.97   | 0.95  | 0.54 | 0.57       | 0.52 | 0.74
Music    | 0.97   | 0.90  | 0.88 | 0.73       | 0.68 | 0.58
TV       | 0.96   | 0.46  | 0.92 | 0.56       | 0.51 | 0.55

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

• [Stark et al. 1998] Michael M. Stark and Richard F. Riesenfeld. WordNet: An Electronic Lexical Database. Proceedings of the 11th Eurographics Workshop on Rendering, 1998.

• [Banko et al. 2007] Michele Banko, Michael J. Cafarella, Stephen Soderland, Matt Broadhead and Oren Etzioni. Open Information Extraction from the Web. In IJCAI 2007.

• [Etzioni et al. 2011] Oren Etzioni, Anthony Fader, Janara Christensen, Stephen Soderland and Mausam Mausam. Open Information Extraction: The Second Generation. In IJCAI, vol. 11, pp. 3-10, 2011.

• [Carlson et al. 2010] A. Carlson, J. Betteridge, B. Kisiel, B. Settles, E.R. Hruschka Jr. and T.M. Mitchell. Toward an Architecture for Never-Ending Language Learning. In Proceedings of the Conference on Artificial Intelligence (AAAI), 2010.

• [Wu et al. 2012] Wentao Wu, Hongsong Li, Haixun Wang and Kenny Zhu. Probase: A Probabilistic Taxonomy for Text Understanding. In ACM International Conference on Management of Data (SIGMOD), May 2012.

• [Bollacker et al. 2008] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, Jamie Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In SIGMOD 2008.

• [Auer et al. 2007] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, Zachary G. Ives. DBpedia: A Nucleus for a Web of Open Data. In ISWC/ASWC 2007.

References

• [Suchanek et al. 2007] Fabian M. Suchanek, Gjergji Kasneci, Gerhard Weikum. Yago: a core of semantic knowledge. In WWW 2007.

• [Wu et al. 2015] Sen Wu, Ce Zhang, Christopher De Sa, Jaeho Shin, Feiran Wang and C. Ré. Incremental Knowledge Base Construction Using DeepDive. In VLDB 2015.

• [Navigli et al. 2012] R. Navigli and S. Ponzetto. BabelNet: The Automatic Construction, Evaluation and Application of a Wide-Coverage Multilingual Semantic Network. Artificial Intelligence, 2012.

• [Nastase et al. 2010] Vivi Nastase, Michael Strube, Benjamin Börschinger, Cäcilia Zirn and Anas Elghafari. WikiNet: A very large scale multi-lingual concept network. In LREC 2010.

• [Speer et al. 2013] Robert Speer and Catherine Havasi. ConceptNet 5: A large semantic network for relational knowledge. The People's Web Meets NLP, Springer Berlin Heidelberg, 2013.

• [Hua et al. 2016] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng and Xiaofang Zhou. "Understand Short Texts by Harvesting and Analyzing Semantic Knowledge." IEEE Transactions on Knowledge and Data Engineering (TKDE), 2016.

• [Hua et al. 2015] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng and Xiaofang Zhou. Short Text Understanding Through Lexical-Semantic Analysis. In International Conference on Data Engineering (ICDE), April 2015.

References

• [Li et al. 2013] Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang and Xindong Wu. Computing term similarity by large probabilistic isA knowledge. In ACM International Conference on Information and Knowledge Management (CIKM), 2013.

• [Li et al. 2015] Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang, Xue-Gang Hu and Xindong Wu. A Large Probabilistic Semantic Network based Approach to Compute Term Similarity. IEEE Transactions on Knowledge and Data Engineering (TKDE), 27(10): 2604-2617, 2015.

• [Rosch et al. 1976] Eleanor Rosch, Carolyn B. Mervis, Wayne D. Gray, David M. Johnson and Penny Boyes-Braem. Basic objects in natural categories. Cognitive Psychology, 8(3): 382–439, 1976.

• [Manning and Schutze 1999] Christopher D. Manning and Hinrich Schutze. Foundations of Statistical Natural Language Processing. Volume 999, MIT Press, 1999.

• [Wang et al. 2015b] Zhongyuan Wang, Kejun Zhao, Haixun Wang, Xiaofeng Meng and Ji-Rong Wen. Query Understanding through Knowledge-Based Conceptualization. In IJCAI, July 2015.

• [Bergsma et al. 2007] Shane Bergsma, Qin Iris Wang. Learning Noun Phrase Query Segmentation. In EMNLP-CoNLL 2007: 819-826.

• [Tan et al. 2008] Bin Tan, Fuchun Peng. Unsupervised query segmentation using generative language models and Wikipedia. In WWW 2008: 347-356.

References

• [Li et al. 2011] Yanen Li, Bo-June (Paul) Hsu, ChengXiang Zhai, Kuansan Wang. Unsupervised query segmentation using clickthrough for information retrieval. In SIGIR 2011: 285-294.

• [Guo et al. 2009] Jiafeng Guo, Gu Xu, Xueqi Cheng, Hang Li. Named entity recognition in query. In SIGIR 2009: 267-274.

• [Pantel et al. 2012] Patrick Pantel, Thomas Lin, Michael Gamon. Mining Entity Types from Query Logs via User Intent Modeling. In ACL 2012: 563-571.

• [Joshi et al. 2014] Mandar Joshi, Uma Sawant, Soumen Chakrabarti. Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries. In EMNLP 2014: 1104-1114.

• [Sawant et al. 2013] Uma Sawant, Soumen Chakrabarti. Learning joint query interpretation and response ranking. In WWW 2013: 1099-1110.

• [Wang et al. 2014b] Zhongyuan Wang, Haixun Wang and Zhirui Hu. Head, Modifier, and Constraint Detection in Short Texts. In International Conference on Data Engineering (ICDE), 2014.

• [Sun et al. 2016] Xiangyan Sun, Haixun Wang, Yanghua Xiao, Zhongyuan Wang. Syntactic Parsing of Web Queries. In EMNLP 2016.

References

• [Kenter and Rijke 2015] Tom Kenter and Maarten de Rijke. Short text similarity with word embeddings. In CIKM 2015.

• [Wang et al. 2015a] Zhongyuan Wang, Haixun Wang, Ji-Rong Wen and Yanghua Xiao. An Inference Approach to Basic Level of Categorization. In CIKM, October 2015.

• [Hao et al. 2016] Zehui Hao, Zhongyuan Wang, Xiaofeng Meng and Jun Yan. Combining Language Model with Conceptualization for Definition Ranking. MSR Technical Report, 2016.

• [Wang et al. 2014a] Fang Wang, Zhongyuan Wang, Zhoujun Li and Ji-Rong Wen. Concept-based Short Text Classification and Ranking. In CIKM 2014.

• [Wang et al. 2012a] Jingjing Wang, Haixun Wang, Zhongyuan Wang and Kenny Zhu. Understanding Tables on the Web. In International Conference on Conceptual Modeling, October 2012.

• [Wang et al. 2012b] Yue Wang, Hongsong Li, Haixun Wang and Kenny Zhu. Toward Topic Search on the Web. In International Conference on Conceptual Modeling, October 2012.


Pre-definition for Ambiguity (1): Sense [Hua et al. 2016]

• What is a sense in a semantic network? A sense is a hierarchy of concept clusters.
• [Figure] Example concept-cluster hierarchies: region → {country, state, city}; creature → {animal, predator}; crop, food → {fruit, vegetable, meat}; example entity: Germany.

Pre-definition for Ambiguity (2): Concept Cluster [Li et al. 2013, Li et al. 2015]

• What is a Concept Cluster (CL)? Similar concepts are grouped into a concept cluster using a K-Means-like approach (k-Medoids).
• Example clusters:
  • Fruit: Fruit, Fresh fruit, Juice, Tropical fruit, Berry, Exotic fruit, Seasonal fruit, Fruit juice, Citrus fruit, Soft fruit, Dry fruit, Wild fruit, Local fruit, …
  • Company: Company, Client, Firm, Manufacturer, Corporation, Large company, Rival, Giant, Big company, Local company, Large corporation, International company, …

Definitions of Instance Ambiguity [Hua et al. 2016]

• Three levels of instance ambiguity:
  • Level 0: unambiguous — contains only 1 sense. E.g. dog (animal), beijing (city), potato (vegetable).
  • Level 1: both readings make sense — contains 2 or more senses, but the senses are related. E.g. google (company & search engine), french (language & country), truck (vehicle & public transport service).
  • Level 2: ambiguous — contains 2 or more senses, and the senses are very different from each other. E.g. apple (fruit & company), jaguar (animal & company), python (animal & language).

Ambiguity Score

• Use the top-2 senses to calculate the ambiguity score:

  score = 0                                                        if level = 0
  score = w(s2|e) / w(s1|e) · (1 − similarity(s1, s2))             if level = 1
  score = 1 + w(sc2|e) / w(sc1|e) · (1 − similarity(sc1, sc2))     if level = 2

• Denote the top-2 senses as s1 and s2, and the top-2 sense clusters as sc1 and sc2.
• The similarity of two sense clusters is the maximum similarity of their senses:
  similarity(sc1, sc2) = max similarity(s_i ∈ sc1, s_j ∈ sc2)
• For an entity e, the weight (popularity) of a sense s_i with concept-cluster hierarchy H_i is the sum of the weights of its concept clusters:
  w(s_i|e) = w(H_i|e) = Σ_{CL_j ∈ H_i} P(CL_j|e)
• For an entity e, the weight (popularity) of a sense cluster sc_i is the sum of the weights of its senses:
  w(sc_i|e) = Σ_{s_j ∈ sc_i} w(s_j|e)

Examples

• Level 0:
  • california — country, state, city, region, institution: 0.943
  • fruit — food, product, snack, carbs, crop: 0.827
  • alcohol — substance, drug, solvent, food, addiction: 0.523
  • computer — device, product, electronics, technology, appliance: 0.537
  • coffee — beverage, product, food, crop, stimulant: 0.73
  • potato — vegetable, food, crop, carbs, product: 0.896
  • bean — food, vegetable, crop, legume, carbs: 0.801

Examples (cont.)

• Level 1:
  • nike, score = 0.034 — company, store: 0.861; brand: 0.035; shoe, product: 0.033
  • twitter, score = 0.035 — website, tool: 0.612; network: 0.165; application: 0.033; company: 0.031
  • facebook, score = 0.037 — website, tool: 0.595; network: 0.17; company: 0.053; application: 0.029
  • yahoo, score = 0.38 — search engine: 0.457; company, provider, account: 0.281; website: 0.0656
  • google, score = 0.507 — search engine: 0.46; company, provider, organization: 0.377; website: 0.0449

Examples (cont.)

• Level 2:
  • jordan, score = 1.02 — country, state, company, regime: 0.92; shoe: 0.02
  • fox, score = 1.09 — animal, predator, species: 0.74; network: 0.064; company: 0.035
  • puma, score = 1.15 — brand, company, shoe: 0.655; species, cat: 0.116
  • gold, score = 1.21 — metal, material, mineral resource, mineral: 0.62; color: 0.128

Examples (cont.)

• Level 2:
  • soap, score = 1.22 — product, toiletry, substance: 0.49; technology, industry standard: 0.11
  • silver, score = 1.24 — metal, material, mineral resource, mineral: 0.638; color: 0.156
  • python, score = 1.29 — language: 0.667; snake, animal, reptile, skin: 0.193
  • apple, score = 1.41 — fruit, food, tree: 0.537; company, brand: 0.271

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

A Concept View of "Microsoft"

• [Figure] "Microsoft" maps to concepts at different levels of abstraction: company, software company, international company, technology leader, largest desktop OS vendor, … On the spectrum from the very general ("company") to the very specific ("largest desktop OS vendor"), "software company" sits at the basic level.

Basic-level Conceptualization (BLC) [Rosch et al. 1976]

• [Figure] Basic-level conceptualization of example entities such as KFC and BMW.

How to Make BLC

• Naive approaches:
  • Typicality: an important measure for understanding the relationship between an object and its concept.
  • Pointwise Mutual Information (PMI): a common measure of the strength of association between two terms.

Naive Approach 1: Typicality

• P(robin|bird) > P(penguin|bird): "robin" is a more typical bird than "penguin".
• P(USA|country) > P(Seychelles|country): "USA" is a more typical country than "Seychelles".

Using Typicality for BLC

• Associate each isA relationship (e isA c) with typicality scores P(e|c) and P(c|e):

  P(e|c) = n(c, e) / n(c),    P(c|e) = n(c, e) / n(e)

• P(e|c) indicates how typical (or popular) e is within the given concept c.
• P(c|e) indicates how typical (or popular) the concept c is for the given e.
• However, neither alone works for "Microsoft": "company" has high typicality P(c|e), while "largest desktop OS vendor" has high typicality P(e|c) — yet neither is the basic-level concept.

Naive Approach 2: PMI [Manning and Schutze 1999]

• Pointwise mutual information (PMI) is a measure of association used in information theory and statistics.
• Consider using the PMI between concept c and instance e to find the basic-level concepts:

  PMI(e, c) = log [ P(e, c) / (P(e) P(c)) ] = log P(e|c) − log P(e)

• However, in basic-level categorization we are interested in finding a concept for a given e, so P(e) is a constant — ranking by PMI(e, c) is the same as ranking by P(e|c).

Using Rep(e, c) for BLC [Wang et al. 2015b]

• The measure Rep(e, c) = P(c|e) · P(e|c) means:
  • given e, c should be a typical concept of e (short distance from e to c);
  • given c, e should be a typical instance of c (short distance from c to e).
• (Relation to PMI) Taking the logarithm of the scoring function:

  log Rep(e, c) = log [ P(c|e) · P(e|c) ] = log [ P(e, c)² / (P(e) P(c)) ] = PMI(e, c) + log P(e, c) = PMI²(e, c)

• (Relation to commute time) The expected commute time between an instance e and a concept c is

  Time(e, c) = Σ_{k=1}^{∞} (2k) · P_k(e, c)
             = Σ_{k=1}^{T} (2k) · P_k(e, c) + Σ_{k=T+1}^{∞} (2k) · P_k(e, c)
             ≥ Σ_{k=1}^{T} (2k) · P_k(e, c) + 2(T+1) · (1 − Σ_{k=1}^{T} P_k(e, c))
             = 4 − 2 · Rep(e, c)    (for T = 1)

  so maximizing Rep(e, c) is a process of finding the concept nodes with the shortest expected distance to e.
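A toy sketch of ranking concepts by Rep(e, c) from isA co-occurrence counts n(c, e); the counts and the additive smoothing below are invented for illustration and are not the evaluation setup reported next.

```python
# Sketch: basic-level concept scoring with Rep(e, c) = P(c|e) * P(e|c).
def blc_scores(entity, counts, smoothing=0.0):
    n_e = sum(n for (c, e), n in counts.items() if e == entity)
    scores = {}
    for (c, e), n in counts.items():
        if e != entity:
            continue
        n_c = sum(m for (c2, e2), m in counts.items() if c2 == c)
        p_c_given_e = (n + smoothing) / (n_e + smoothing)
        p_e_given_c = (n + smoothing) / (n_c + smoothing)
        scores[c] = p_c_given_e * p_e_given_c        # Rep(e, c)
    return sorted(scores.items(), key=lambda kv: -kv[1])

counts = {("company", "microsoft"): 1000, ("software company", "microsoft"): 600,
          ("largest desktop os vendor", "microsoft"): 5,
          ("company", "kfc"): 4000, ("company", "bmw"): 4000,
          ("software company", "oracle"): 400,
          ("largest desktop os vendor", "apple"): 1}
print(blc_scores("microsoft", counts))   # "software company" ranks first here
```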

Evaluations on Different Measures for BLC

The slide reports Precision and NDCG at cutoffs k = 1, 2, 3, 5, 10, 15, 20 under different smoothing settings.

First table (k = 1, 2, 3, 5, 10, 15, 20):

No smoothing
  MI(e)              0.769 0.692 0.705 0.685 0.719 0.705 0.690
  PMI3(e)            0.885 0.769 0.756 0.800 0.754 0.733 0.721
  NPMI(e)            0.692 0.692 0.667 0.638 0.627 0.610 0.610
  Typicality P(c|e)  0.462 0.577 0.603 0.577 0.569 0.564 0.556
  Typicality P(e|c)  0.500 0.462 0.526 0.523 0.523 0.510 0.521
  Rep(e)             0.846 0.865 0.872 0.862 0.758 0.731 0.719
Smoothing = 0.001
  MI(e)              0.577 0.615 0.628 0.600 0.612 0.605 0.592
  PMI3(e)            0.731 0.673 0.692 0.654 0.669 0.644 0.623
  NPMI(e)            0.923 0.827 0.769 0.746 0.731 0.695 0.671
  Typicality P(c|e)  0.462 0.577 0.603 0.577 0.569 0.564 0.554
  Typicality P(e|c)  0.885 0.865 0.872 0.831 0.785 0.741 0.704
  Rep(e)             0.846 0.731 0.718 0.723 0.700 0.669 0.638
Smoothing = 0.0001
  MI(e)              0.615 0.615 0.654 0.608 0.635 0.628 0.612
  PMI3(e)            0.846 0.731 0.731 0.715 0.723 0.685 0.677
  NPMI(e)            0.885 0.904 0.885 0.869 0.823 0.777 0.752
  Typicality P(c|e)  0.462 0.577 0.603 0.577 0.569 0.564 0.556
  Typicality P(e|c)  0.885 0.904 0.910 0.877 0.831 0.813 0.777
  Rep(e)             0.923 0.846 0.833 0.815 0.781 0.736 0.719
Smoothing = 1e-5
  MI(e)              0.615 0.635 0.667 0.662 0.677 0.656 0.646
  PMI3(e)            0.885 0.769 0.744 0.777 0.758 0.731 0.710
  NPMI(e)            0.885 0.846 0.872 0.869 0.831 0.810 0.787
  Typicality P(c|e)  0.462 0.577 0.603 0.577 0.569 0.564 0.556
  Typicality P(e|c)  0.769 0.808 0.846 0.823 0.808 0.782 0.765
  Rep(e)             0.885 0.904 0.872 0.862 0.812 0.800 0.767
Smoothing = 1e-6
  MI(e)              0.769 0.673 0.705 0.677 0.700 0.692 0.679
  PMI3(e)            0.885 0.769 0.756 0.785 0.773 0.726 0.723
  NPMI(e)            0.885 0.846 0.821 0.815 0.750 0.726 0.719
  Typicality P(c|e)  0.462 0.577 0.603 0.577 0.569 0.564 0.556
  Typicality P(e|c)  0.538 0.615 0.615 0.615 0.608 0.613 0.615
  Rep(e)             0.846 0.885 0.897 0.877 0.788 0.777 0.765
Smoothing = 1e-7
  MI(e)              0.769 0.692 0.705 0.685 0.719 0.703 0.688
  PMI3(e)            0.885 0.769 0.756 0.792 0.758 0.736 0.725
  NPMI(e)            0.769 0.750 0.718 0.700 0.650 0.641 0.633
  Typicality P(c|e)  0.462 0.577 0.603 0.577 0.569 0.564 0.556
  Typicality P(e|c)  0.500 0.481 0.526 0.523 0.531 0.523 0.523
  Rep(e)             0.846 0.865 0.872 0.854 0.765 0.749 0.733

Second table (k = 1, 2, 3, 5, 10, 15, 20):

No smoothing
  MI(e)              0.516 0.531 0.519 0.531 0.562 0.574 0.594
  PMI3(e)            0.725 0.664 0.652 0.660 0.628 0.631 0.646
  NPMI(e)            0.599 0.597 0.579 0.554 0.540 0.539 0.549
  Typicality P(c|e)  0.297 0.380 0.409 0.422 0.438 0.446 0.461
  Typicality P(e|c)  0.401 0.386 0.396 0.398 0.401 0.410 0.428
  Rep(e)             0.758 0.771 0.745 0.723 0.656 0.647 0.661
Smoothing = 1e-3
  MI(e)              0.374 0.414 0.441 0.448 0.473 0.481 0.495
  PMI3(e)            0.484 0.511 0.509 0.502 0.519 0.525 0.533
  NPMI(e)            0.692 0.652 0.607 0.603 0.585 0.585 0.592
  Typicality P(c|e)  0.297 0.380 0.409 0.422 0.438 0.446 0.460
  Typicality P(e|c)  0.703 0.697 0.704 0.681 0.637 0.628 0.626
  Rep(e)             0.621 0.580 0.554 0.561 0.554 0.555 0.559
Smoothing = 1e-4
  MI(e)              0.407 0.430 0.458 0.462 0.492 0.503 0.512
  PMI3(e)            0.648 0.604 0.579 0.575 0.578 0.576 0.590
  NPMI(e)            0.747 0.777 0.761 0.737 0.700 0.685 0.688
  Typicality P(c|e)  0.297 0.380 0.409 0.422 0.438 0.446 0.461
  Typicality P(e|c)  0.791 0.795 0.802 0.767 0.738 0.729 0.724
  Rep(e)             0.758 0.714 0.711 0.689 0.653 0.636 0.653
Smoothing = 1e-5
  MI(e)              0.429 0.465 0.478 0.501 0.517 0.528 0.545
  PMI3(e)            0.725 0.647 0.642 0.642 0.627 0.624 0.638
  NPMI(e)            0.813 0.779 0.778 0.765 0.730 0.723 0.729
  Typicality P(c|e)  0.297 0.380 0.409 0.422 0.438 0.446 0.461
  Typicality P(e|c)  0.709 0.728 0.735 0.722 0.702 0.696 0.703
  Rep(e)             0.791 0.787 0.762 0.739 0.707 0.703 0.706
Smoothing = 1e-6
  MI(e)              0.516 0.510 0.515 0.526 0.546 0.563 0.579
  PMI3(e)            0.725 0.655 0.651 0.654 0.641 0.631 0.649
  NPMI(e)            0.791 0.766 0.732 0.728 0.673 0.659 0.668
  Typicality P(c|e)  0.297 0.380 0.409 0.422 0.438 0.446 0.461
  Typicality P(e|c)  0.495 0.516 0.520 0.508 0.512 0.521 0.540
  Rep(e)             0.758 0.784 0.767 0.755 0.691 0.686 0.694
Smoothing = 1e-7
  MI(e)              0.516 0.531 0.519 0.530 0.562 0.571 0.592
  PMI3(e)            0.725 0.664 0.652 0.658 0.630 0.631 0.647
  NPMI(e)            0.670 0.655 0.633 0.604 0.575 0.570 0.581
  Typicality P(c|e)  0.297 0.380 0.409 0.422 0.438 0.446 0.461
  Typicality P(e|c)  0.423 0.421 0.415 0.407 0.414 0.424 0.438
  Rep(e)             0.758 0.771 0.745 0.725 0.663 0.661 0.668

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

What is Semantic Similarity?

• Are the following instance pairs similar?
  • <apple, microsoft>
  • <apple, pear>
  • <apple, fruit>
  • <apple, food>
  • <apple, ipad>
  • <car, journey>

Approaches on Term Similarity

• Categories of approaches for semantic similarity:
  • String-based approaches.
  • Knowledge-based approaches: use preexisting thesauri, taxonomies, or encyclopedias such as WordNet.
  • Corpus-based approaches: use contexts of terms extracted from web pages, web search snippets, or other text repositories.
  • Embedding-based approaches: introduced in detail in "Part III: Implicit Understanding".

Approaches on Term Similarity (2)

• [Figure] Categories of state-of-the-art approaches: string-based approaches; knowledge-based approaches over WordNet (path-length / lexical-chain-based and information-content-based); and corpus-based approaches (graph-learning-algorithm-based and snippet-search-based). Representative work includes Rada 1989, Resnik 1995, Jcn 1997, Hirst 1998, Lin 1998, Ban 2002, HunTray 2005, Chen 2006, Alvarez 2007, Do 2009, Agirre 2010, Bol 2011 and Sánchez 2011.

Term Similarity Using Semantic Networks [Li et al. 2013, Li et al. 2015]

• Framework, for a term pair <t1, t2>:
  • Step 1 — Type checking: decide whether <t1, t2> is a concept pair, an entity pair, or a concept–entity pair.
  • Step 2 — Context representation (vectors):
    • entity pairs: collect concept-distribution context vectors T(t1) and T(t2);
    • concept pairs: collect entity-distribution contexts and cluster them into context vectors Cx(t1) and Cy(t2);
    • concept–entity pairs: collect the concepts of the entity term t1, cluster them into Ci(t1), and select the top-k concepts cx of each cluster.
  • Step 3 — Context similarity:
    • entity pairs: cosine(T(t1), T(t2));
    • concept pairs: max over cluster pairs (x, y) of cosine(Cx(t1), Cy(t2));
    • concept–entity pairs: max over the similarities of <t2, cx> pairs for <t1, t2>.

An Example [Li et al. 2013, Li et al. 2015]

• For example, <banana, pear>:
  • Step 1 — Type checking: <banana, pear> is an entity pair.
  • Step 2 — Context representation: collect the concept contexts of each entity as vectors T(t1) and T(t2).
  • Step 3 — Context similarity: cosine(T(t1), T(t2)) = 0.916.

Examples

Term 1                | Term 2              | Similarity
lunch                 | dinner              | 0.9987
tiger                 | jaguar              | 0.9792
car                   | plane               | 0.9711
television            | radio               | 0.9465
technology company    | microsoft           | 0.8208
high impact sport     | competitive sport   | 0.8155
employer              | large corporation   | 0.5353
fruit                 | green pepper        | 0.2949
travel                | meal                | 0.0426
music                 | lunch               | 0.0116
alcoholic beverage    | sports equipment    | 0.0314
company               | table tennis        | 0.0003

Complete results: http://adapt.seiee.sjtu.edu.cn/similarity/SimCompleteResults.pdf

Statistics of Search Queries

• Query length distribution:
  • (a) By traffic: 1 term 44%, 2 terms 29%, 3 terms 17%, 4 terms 7%, 5 terms 2%, more than 5 terms 1%.
  • (b) By number of distinct queries: 1 term 10%, 2 terms 26%, 3 terms 34%, 4 terms 19%, 5 terms 7%, more than 5 terms 4%.
• [Figure] Distribution of the number of instances per query (1, 2, 3, 4, 5, more than 5), with example instances such as "Pokémon Go" and "Microsoft HoloLens".

If the short text has context for the instance…

• python tutorial
• dangerous python
• moon earth distance
• …

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Supervised Segmentation [Bergsma et al. 2007]

• Problem: divide a query into semantic units.
• Approach: turn segmentation into position-based binary classification.
• Example query: "two man power saw", with candidate segmentations [two man] [power saw], [two] [man] [power saw], [two] [man power] [saw].
• Input: a query and its break positions. Output: the decision on whether to segment at each position.

Supervised Segmentation: Features

• Decision-boundary features: e.g. indicator words (such as "the"), POS tags in the query, and forward/backward position features.
• Statistical features: e.g. mutual information between the left and right parts ("bank loan | amortization schedule").
• Context features: context information around the boundary.
• Dependency features: e.g. "female" depends on "driver" in "female bus driver".

Supervised Segmentation: Overview

• [Figure] Input query "two man power saw" → an SVM classifier over the learned features → a yes/no segmentation decision for each position.

Unsupervised Segmentation [Tan et al. 2008]

• Unsupervised learning for query segmentation.
• Probability of a segmentation S = (s1, s2, …, sm) for query Q, under a unigram model over segments:

  P(S|Q) = P(s1) · P(s2|s1) · … · P(sm|s1 s2 … s_{m-1}) ≈ Π_{s_i ∈ S} P(s_i)

• A position is a valid segment boundary if and only if the pointwise mutual information between the two segments resulting from the split is negative:

  MI(s_k, s_{k+1}) = log [ P_c([s_k, s_{k+1}]) / (P_c(s_k) · P_c(s_{k+1})) ] < 0

• Example: "new york times subscription" — log [ P_c([new york]) / (P_c(new) · P_c(york)) ] > 0, so there is no segment boundary between "new" and "york".

Unsupervised Segmentation (cont.)

• Find the top-k segmentations by dynamic programming, using EM optimization on the fly.
• Input: a query w1 w2 … wn and a concept probability distribution. Output: the top-k segmentations with the highest likelihood.
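A minimal top-1 version of this dynamic program, assuming segment log-probabilities are already available (the toy values below are invented); the top-k and EM refinements of Tan & Peng are omitted.

```python
# Sketch: Viterbi-style segmentation under the unigram segment model.
import math

def best_segmentation(words, seg_logprob, max_len=4):
    n = len(words)
    best = [(-math.inf, None)] * (n + 1)
    best[0] = (0.0, None)
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            seg = " ".join(words[j:i])
            score = best[j][0] + seg_logprob(seg)
            if score > best[i][0]:
                best[i] = (score, j)
    segs, i = [], n            # backtrack
    while i > 0:
        j = best[i][1]
        segs.append(" ".join(words[j:i]))
        i = j
    return list(reversed(segs))

toy = {"new york times": -2.0, "new york": -2.5, "times": -4.0,
       "subscription": -3.5, "new": -5.0, "york": -5.5}
lp = lambda s: toy.get(s, -12.0)   # unseen segments get a low log-probability
print(best_segmentation("new york times subscription".split(), lp))
```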

Exploit Click-through [Li et al. 2011]

• Motivation: probabilistic query segmentation that uses click-through data (query → clicked URL → document).
• Input query: "bank of america online banking". Output: the top-3 segmentations with probabilities:
  • [bank of america] [online banking]      0.502
  • [bank of america online banking]        0.428
  • [bank of] [america] [online banking]    0.001

Exploit Click-through: Segmentation Model

• An interpolated model combining global information with click-through information.
• Example: query "[credit card] [bank of America]" and its clicked documents:
  1. bank of america credit cards contact us overview
  2. secured visa credit card from bank of america
  3. credit cards overview: find the right bank of america credit card for you

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Sense Changes with Different Context

• watch harry potter → Movie;  read harry potter → Book;  age harry potter → Character;  harry potter walkthrough → Game

Entity Recognition in Query [Guo et al. 2009]

• Motivation: detect the named entity in a short text and categorize it.
• Focus on single-named-entity queries. Example: "harry potter walkthrough" → triple <e, t, c> = ("harry potter", "walkthrough", "game"), where e is the (ambiguous) named entity, t the context term(s), and c the class of the entity.

Entity Recognition in Query: Probabilistic Generative Model

• Goal: given a query q, find the triple <e, t, c> that maximizes the probability.
• The probability of generating a triple factorizes as Pr(e, t, c) = Pr(e) · Pr(c|e) · Pr(t|c), assuming the context depends only on the class — e.g. "walkthrough" depends only on the class game, not on "harry potter".
• Objective: given query q, find the maximizing <e, t, c>. The problem then becomes how to estimate Pr(e), Pr(c|e) and Pr(t|c).

Entity Recognition in Query: Probability Estimation by Learning

• Learning objective: max Π_{i=1}^{N} P(e_i, t_i, c_i).
• Challenge: it is difficult as well as time-consuming to manually assign class labels to named entities in queries.
• Instead, build a training set T = {(e_i, t_i)} and view c_i as a hidden variable. The new learning problem is

  max Π_{i=1}^{N} P(e_i, t_i) = max Π_{i=1}^{N} Σ_c P(e_i) · P(c|e_i) · P(t_i|c)

  which is solved with the weakly supervised topic model WS-LDA.

Signal from Click [Pantel et al. 2012]

• Motivation: predict entity types in web search by jointly modeling the entity, the user intent, the context words, and the click, using a query type distribution over 73 types and a generative model.

Signal from Click: Joint Model for Prediction

• [Figure] Plate diagram of the joint generative model over Q queries, with latent type t and intent i: for each query, pick an entity type (type distribution θ), pick an entity (entity distribution), pick an intent (intent distribution), then pick the context words (word distribution φ) and the click (host distribution ω).

Telegraphic Query Interpretation [Sawant et al. 2013, Joshi et al. 2014]

• Entity-seeking telegraphic queries, e.g. query "Germany capital" → result entity "Berlin".
• Interpretation = Segmentation + Annotation, combining a knowledge base (for accuracy) with a large corpus (for recall).

Joint Interpretation and Ranking [Sawant et al. 2013, Joshi et al. 2014]

• Overview: an annotated corpus and the telegraphic query are fed into two models for joint interpretation and ranking — a generative model and a discriminative model — whose output is a ranked list of candidate entities e1, e2, e3, …

Joint Interpretation and Ranking: Generative Model [Sawant et al. 2013]

• Based on probabilistic language models.
• [Figure] Example query q = "losing team baseball world series 1998": a switch variable Z decides whether each query word is a type hint or context. The type hint "baseball team" matches the candidate entity San Diego Padres (type: major league baseball team, via the type model), while the remaining selectors are matched by context matchers ("lost 1998 world series") against corpus snippets such as "Padres have been to two World Series, losing in 1984 and 1998" (context model). (Figure borrowed from U. Sawant, 2013.)

Joint Interpretation and Ranking: Discriminative Model [Sawant et al. 2013]

• Based on max-margin discriminative learning.
• [Figure] For "losing team baseball world series 1998", interpretations pairing the query with the correct entity (San_Diego_Padres, t = baseball team) should be ranked above interpretations pairing it with an incorrect entity (1998_World_Series, t = series).

Telegraphic Query Interpretation [Joshi et al. 2014]

• Queries seek answer entities (e2) and contain (query) entities (e1), target types (t2), relations (r) and selectors (s). Examples (borrowed from M. Joshi, 2014):

  Query: "dave navarro first band"    → e1: dave navarro, r: band, t2: band, s: first
                                       → e1: dave navarro, r: -,    t2: band, s: first
  Query: "spider automobile company"  → e1: spider, r: automobile company, t2: automobile company, s: -
                                       → e1: spider, r: -, t2: automobile company, s: company

Improved Generative Model

• [Joshi et al. 2014] extends the generative model of [Sawant et al. 2013] to also consider the query entity e1 (in q) and the relation r.

Improved Discriminative Model

• [Joshi et al. 2014] likewise extends the discriminative model of [Sawant et al. 2013] to consider e1 (in q) and r.

Understand Short Texts with a Multi-tiered Model [Hua et al. 2015 (ICDE Best Paper)]

• Input: a short text. Output: its semantic interpretation.
• Three steps in understanding a short text, e.g. "wanna watch eagles band":
  • Step 1 — Text segmentation: divide the text into a sequence of vocabulary terms: watch | eagles | band.
  • Step 2 — Type detection: determine the best type of each term: watch[verb], eagles[entity], band[concept].
  • Step 3 — Concept labeling: infer the best concept of each entity within context: eagles[entity] → (band).

Text Segmentation

• Observations:
  • Mutual exclusion — terms containing the same word mutually exclude each other.
  • Mutual reinforcement — related terms mutually reinforce each other.
• Build a Candidate Term Graph (CTG) over the candidate terms.
• [Figures] CTGs for "vacation april in paris" (candidate terms: vacation, april, paris, "april in paris") and "watch harry potter" (candidate terms: watch, harry, potter, "harry potter"), with mutual-exclusion edges and weighted reinforcement edges.

Find the Best Segmentation

• The best segmentation is the sub-graph of the CTG which:
  • is a complete graph (clique), i.e. a consistent segmentation with no mutual exclusion;
  • covers 100% of the words (except stopwords); and
  • has the largest average edge weight.
• In other words, search for the maximal clique with the largest average edge weight.
• [Figures] For "vacation april in paris" the best segmentation is [vacation] [april in paris]; for "watch harry potter" it is [watch] [harry potter].
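A toy sketch of this clique search; the candidate terms, edge weights, and the omission of the stopword exception are all simplifying assumptions.

```python
# Sketch: pick the maximal clique of compatible terms with the largest
# average edge weight and full word coverage.
from itertools import combinations
import networkx as nx

def best_ctg_segmentation(words, terms, weights):
    g = nx.Graph()
    g.add_nodes_from(terms)
    for a, b in combinations(terms, 2):
        if not (set(a.split()) & set(b.split())):      # not mutually exclusive
            g.add_edge(a, b, weight=weights.get((a, b), weights.get((b, a), 0.0)))
    best, best_score = None, -1.0
    for clique in nx.find_cliques(g):                  # maximal cliques
        covered = set(w for t in clique for w in t.split())
        if not set(words) <= covered:                  # require full coverage
            continue
        edges = list(combinations(clique, 2))
        score = (sum(g[a][b]["weight"] for a, b in edges) / len(edges)) if edges else 0.0
        if score > best_score:
            best, best_score = clique, score
    return best

words = ["watch", "harry", "potter"]
terms = ["watch", "harry potter", "harry", "potter"]
weights = {("watch", "harry potter"): 0.09, ("watch", "harry"): 0.01,
           ("watch", "potter"): 0.01, ("harry", "potter"): 0.02}
print(best_ctg_segmentation(words, terms, weights))    # the clique {watch, harry potter} wins
```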

Type Detection

• Pairwise model: find the best typed-term for each term so that the maximum spanning tree of the resulting sub-graph between typed-terms has the largest weight.
• [Figure] Example "watch free movie": candidate typed-terms watch[v] / watch[e] / watch[c], free[adj] / free[v], movie[c] / movie[e]; the selected typed-terms are the ones connected by the heaviest spanning tree.

Concept Labeling

• Entity disambiguation is the most important task of concept labeling: filter and re-rank the original concept cluster vector.
• Weighted vote: the final score of each concept cluster is a combination of its original score and the support from context, using concept co-occurrence.
• Example: "watch harry potter" → movie; "read harry potter" → book.

Example of Entity Disambiguation [Hua et al. 2015 (ICDE Best Paper), Hua et al. 2016]

• [Figure] The short text is parsed against the semantic network and the co-occurrence network; term clustering by isA, concept filtering by co-occurrence, head/modifier analysis and concept orthogonalization produce a concept vector [(c1, p1), (c2, p2), (c3, p3), …].
• Example "ipad apple": "apple" initially conceptualizes to {fruit…, company…, food…, product…}; isA-based clustering and co-occurrence filtering against "ipad" ({product…, device…, brand…}) keep the coherent concepts (company, product, device) and drop the incoherent ones (fruit, food).

Mining Lexical Relationships [Wang et al. 2015b]

• Lexical knowledge is represented by probabilities over terms, roles and concepts (e: instance, t: term, c: concept, z: role). For "watch harry potter":
  ① p(c|t, z): the concept distribution of a term under a role, e.g. p(movie | watch, verb);
  ② p(c|e) = p(c | t, z = instance): the concept distribution of an instance, e.g. p(movie | harry potter), p(book | harry potter);
  ③ p(z|t): the role distribution of a term, e.g. p(verb | watch) vs. p(instance | watch).

Understanding Queries [Wang et al. 2015b]

• Goal: rank the concepts and find argmax_c p(c|t, q).
• All possible segmentations of the query are matched against the offline semantic network, and a random walk with restart [Sun et al. 2005] is run on the resulting online subgraph.
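A generic sketch of random walk with restart over such an online subgraph (the toy graph, edge weights, and restart parameter are assumptions, not the paper's construction):

```python
# Sketch: random walk with restart (RWR), restarting at the query terms.
import numpy as np

def rwr(adj, nodes, seeds, restart=0.15, iters=100):
    n = len(nodes)
    idx = {v: i for i, v in enumerate(nodes)}
    M = np.zeros((n, n))
    for u, nbrs in adj.items():
        total = sum(nbrs.values())
        for v, w in nbrs.items():
            M[idx[v], idx[u]] = w / total          # column-stochastic transitions
    r = np.zeros(n)
    for s in seeds:
        r[idx[s]] = 1.0 / len(seeds)               # restart distribution = query terms
    p = r.copy()
    for _ in range(iters):
        p = (1 - restart) * (M @ p) + restart * r
    return dict(zip(nodes, p))

nodes = ["python", "tutorial", "language", "snake", "document"]
adj = {"python": {"language": 0.6, "snake": 0.4},
       "tutorial": {"document": 0.7, "language": 0.3},
       "language": {"python": 0.5, "tutorial": 0.5},
       "snake": {"python": 1.0},
       "document": {"tutorial": 1.0}}
scores = rwr(adj, nodes, seeds=["python", "tutorial"])
print(max(("language", "snake"), key=lambda c: scores[c]))   # "language" wins for "python tutorial"
```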

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

bull Example ldquopopular smart cover iphone 5srdquo

bull Definition bull Head acts to name the general (semantic) category to which the

whole short text belongs Usually the head is the intent of the short text

bull ldquosmart coverrdquo intent of the query

bull Constraints distinguish this member from other members of the same category

bull ldquoiphone 5srdquo limit the type of the head

bull Non-Constraint Modifiers (aka Pure Modifiers) are subjectivemodifiers which can be dropped without changing intent

bull ldquopopularrdquo subjective can be neglected

Non-Constraint Modifiers Mining Construct Modifier Networks

Edges form a Modifier Network

Concept Hierarchy Tree in ldquoCountryrdquo domain

Modifier Network in ldquoCountryrdquo domainIn this case ldquoLargerdquo and ldquoToprdquo are pure modifiers

Country

Asian country

Developed country

Western country

Asian Developed Western

Western developed

country

Top western country

Large

Large Top

Top

WesternLarge Asian

country

Large developed

country

Top developed

country

Country

Asian Western

Developed

Large Top

bull Betweenness centrality is a measure of a nodes centrality in a network

bull Betweennes of node v is defined as

bull where 120590119904119905 is the total number of shortest paths from node s to node t and 120590119904119905(119907) is the number of those paths that pass through v

bull Normalization amp Aggregation

bull For a pure modifier it should have low betweenness centrality aggregation score PMS(t)

Non-Constraint Modifiers Mining Betweenness centrality

Head-Constraints Mining [Wang et al 2014b]

bull A term can be a head sometimes and be a constraint in some other cases

bull Eg Seattle hotel Seattle hotel job

head headconstraintconstraintconstraint

Head-Constraints Mining Acquiring Concept Patterns

Get entity pairs from query log

Conceptualization

Concept Patterns for each

prepositionsentity1 entity2

Extract Patterns

A for B A of BA with B A in BA on B A at B hellip

entity 1head entity 2constraint

concept11

concept12

concept13

concept14

concept21

concept22

concept23

(concept11 concept21) (concept11 concept22)(concept11 concept23)hellip

Concept Pattern Dictionary

Building concept pattern dictionary Query Logs

cover for iphone 6sbattery for sony a7rwicked on broadway

Why Concepts Canrsquot Be Too Generalbull It may cause too many concept pattern conflicts

canrsquot distinguish head and modifier for general concept pairs

Head Modifier

Derived Concept Pattern device company

Supporting Entity Pairs iphone 4 verizon

modem comcast

wireless router comcast

iphone 4 tmobile

Head Modifier

Derived Concept Pattern company device

Supporting Entity Pairs amazon books kindle

netflix touchpad

skype windows phone

netflix ps3

Conflict

Why Concepts Canrsquot Be Too Specificbull It may generate concepts with little coverage

bull Concept regresses to entitybull Large storage space up to (million million) patterns

hellip hellip

device largest desktop OS vendor

device largest software development company

device largest global corporation

device latest windows and office provider

hellip hellip

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept PatternsCluster size Sum of Cluster Score headconstraintscore

615 2114691 breedstate357298460224501

296 7752357 gameplatform627403476771856

153 3466804 accessoryvehicle53393705094809

70 118259 browserplatform132612807637391

22 1010993 requirementschool271407526294823

34 9489159 drugdisease154602405333541

42 8992995 cosmeticskin condition814659415003929

16 7421599 jobcity27903732555528

32 710403 accessoryphone246513830851194

18 6692376 softwareplatform210126322725878

20 6444603 testdisease239774028397537

27 5994205 clothesbreed98773996282851

19 5913545 penaltycrime200544192793488

25 5848804 taxstate240081818612579

16 5465424 saucemeat183592863621553

18 4809389 credit cardcountry142919087972152

14 4730792 foodholiday14554140330924

11 4536199 modgame257163856882439

29 4350954 garmentsport471533326845442

23 3994886 career informationprofessional732726483731257

15 386065 songinstrument128189481818135

18 378213 baitfish780426514113169

22 3722948 study guidebook508339765053921

19 3408953 pluginsbrowser550326072627126

14 3305753 recipemeat882779863422951

18 3214226 currencycountry110825444188352

13 3180272 lenscamera186081673263957

9 316973 decorationholiday130055844126533

16 314875 foodanimal7338544366514

game platform

game device

video game platform

game console game pad

game gaming platform

Game (Head) Platform (Modifier)

angry birds android

angry birds ios

angry birds windows 10

hellip hellip

Detection

Head Modifier Relationship

bull Train a classifier on

(head-embedding modifier-embedding)

bull Training data bull Positive (head modifier)bull Negative (modifier head)

bull Precision gt= 09 Recall gt= 09

bull Disadvantage not interpretable

Syntactic Parsing based on HM

bull Information is incompletebull Preposition and other function words

bull Within a noun compound el capitan macbook pro

bull Why not train a parser for web queries

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges Short Texts Lack Grammatical Signalsbull Lack function words word order

bull ldquotoys queriesrdquo has ambiguous intent

bull ldquodistance earth moonrdquo has clear intentbull many equivalent forms ldquoearth moon distancerdquo ldquoearth

distance moonrdquo hellip

Challenges Syntactic Parsing of Queries

bullNo standard

bullNo ground-truth

Why is syntactic parsing of queries even a legitimate problem

Derive Syntax from Semantics[Sun et al 2016]

bull Query ldquothai food houstonrdquo

bull Clicked sentence

bull Project dependency to the query

A Treebank for Short Texts

bull Given query 119902

bull Given 119902rsquos clicked sentence 119904

bull Parse each 119904

bull Project dependency from 119904 to 119902

bull Aggregate dependencies

Algorithm of Projection

Result Examples

Results

bull Random queries

QueryParser UAS 083 LAS 075Stanford UAS 072 LAS 064

bull Queries with no function words

QueryParser UAS 082 LAS 073Stanford UAS 070 LAS 061

bull Queries with function words

QueryParser UAS 090 LAS 085Stanford UAS 086 LAS 080

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

bull Measuring similarity between two short texts and sentences

bull Basic idea word-by-word comparison using embedding vector

bull Use saliency-weighted semantic graph to computer similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired

Bins of all edges Bins of max edges

119908isin119904119868

119868119863119865(119908) sdot)119904119890119898(119908 119904119904) sdot (1198961 + 1

൰119904119890119898(119908 119904119904) + 1198961 sdot (1 minus 119887 + 119887 sdot|119904119904|119886119907119892119897

Similarity measurement

termShort texts

Inspired by BM25

Semantic similarity

119891119904119905119904(119904119897 119904119904) =

From the Concept View

From the Concept View [Wang et al 2015a]

Co-occurrence Network

Bags of Concepts

Semantic Network

Short Text 1

Short Text 2

Concept Vector 1[(c1 score1) (c2 score2)hellip]

Concept Vector 2[(c1rsquo score1rsquo) (c2rsquo score2rsquo)hellip]

Similarity

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

bull Explicit short text understanding benefit lot of application scenariosbull Adssearch semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

000

100

200

300

400

500

600

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

000

010

020

030

040

050

060

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

Mainline Ads Sidebar Ads

Definition Mining [Hao et al 2016]

bull Definition scenarios search engines QnA etc

bull Why Conceptualization is useful for definition miningbull Examples ldquoWhat is Emphysemardquo

Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

bull This sentence has the form of definitionbull Embedding is helpful to some extent but it also return high similarity

score for (emphysema disease) and (emphysema smoking)

bull Conceptualization can provide strong semanticsbull Contextual embedding can also provide semantic similarity beyond Is-A

Answer 1

Answer 2

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

OfflineOffline OnlineOnline

Original Short textjustin bieber graduates

hellip

Knowledge base

Conceptualiztion

Concept Vector

Entity Extraction

Candidates Generation

Classification amp Ranking

Model LearningModel Learning

Concept Weighting

Model Model NModel i

Concept Model Concept Model

Class 1 Class NClass i

TrainingData

ltMusic Scoregt

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept SpaceArticle titlestagsin this category

119901119894

119901119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

hellip

hellip

120596119894

120596119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

120596119894

120596119895119901119894

119901119895

Precision performance on each category [Wang et al 2014a]

BocSTC LM_ch SVMVSM_cosi

neLM_d Entity_ESA

Movie 071 091 084 081 072 056

Money 097 095 054 057 052 074

Music 097 090 088 073 068 058

TV 096 046 092 056 051 055

0304050607080910

Pre

cisi

on

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledgeltigt in SIGMOD 2008

bull [ Auer et al 2007 ] Soumlren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Boumlrschinger Caumlcilia Zirn and AnasElghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and XindongWu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny BoyesBraem Basic objects in natural categories Cognitive psychology 8(3)382ndash439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny ZhuToward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 32: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Pre-definition for Ambiguity (2) Concept Cluster [Li et al 2013 Li et al 2015]

bull What is a Concept Cluster (CL)bull Cluster similar concepts into a concept cluster using K-

Means like approach (k-Medoids)

FruitFresh fruit

JuiceTropical fruit

BerryExotic fruit

Seasonal fruitFruit juiceCitrus fruitSoft fruitDry fruit

Wild fruitLocal fruit

hellip

company

CompanyClientFirm

ManufacturerCorporation

large companyRivalGiant

big companylocal company

large corporationinternational

companyhellip Fruit

Definitions of Instance Ambiguity [Hua et al 2016]

• 3 levels of instance ambiguity
  • Level 0: unambiguous
    • Contains only 1 sense
    • E.g., dog (animal), beijing (city), potato (vegetable)
  • Level 1: unambiguous and ambiguous both make sense
    • Contains 2 or more senses, but these senses are related
    • E.g., google (company & search engine), french (language & country), truck (vehicle & public transport service)
  • Level 2: ambiguous
    • Contains 2 or more senses, and the senses are very different from each other
    • E.g., apple (fruit & company), jaguar (animal & company), python (animal & language)

Ambiguity Score

• Use the top-2 senses to calculate the ambiguity score:

$$\mathrm{score} = \begin{cases} 0 & \text{level} = 0 \\ \dfrac{w(s_2|e)}{w(s_1|e)} \cdot \big(1 - \mathrm{similarity}(s_1, s_2)\big) & \text{level} = 1 \\ 1 + \dfrac{w(sc_2|e)}{w(sc_1|e)} \cdot \big(1 - \mathrm{similarity}(sc_1, sc_2)\big) & \text{level} = 2 \end{cases}$$

Denote the top-2 senses as $s_1$ and $s_2$, and the top-2 sense clusters as $sc_1$ and $sc_2$. The similarity of two sense clusters is the maximum similarity of their senses:

$$\mathrm{similarity}(sc_1, sc_2) = \max_{s_i \in sc_1,\ s_j \in sc_2} \mathrm{similarity}(s_i, s_j)$$

For an entity $e$, the weight (popularity) of a sense $s_i$ is the sum of the weights of its concept clusters:

$$w(s_i|e) = w(H_i|e) = \sum_{CL_j \in H_i} P(CL_j|e)$$

For an entity $e$, the weight (popularity) of a sense cluster $sc_i$ is the sum of the weights of its senses:

$$w(sc_i|e) = \sum_{s_j \in sc_i} w(s_j|e)$$
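A minimal sketch of this scoring, assuming the sense(-cluster) weights w(.|e) and a pairwise similarity function are already available; the toy numbers below are illustrative assumptions.

```python
# Hedged sketch: ambiguity score from the top-2 senses / sense clusters of an entity.
def ambiguity_score(level, weights, similarity):
    """weights: dict mapping sense (or sense-cluster) name -> w(.|e);
    similarity: function over two sense(-cluster) names."""
    if level == 0:
        return 0.0
    s1, s2 = sorted(weights, key=weights.get, reverse=True)[:2]
    base = (weights[s2] / weights[s1]) * (1.0 - similarity(s1, s2))
    return base if level == 1 else 1.0 + base

# Toy example loosely mirroring "apple" (level 2): two distant sense clusters.
w = {"fruit/food/tree": 0.537, "company/brand": 0.271}
sim = lambda a, b: 0.0 if a != b else 1.0   # assume the two clusters share no similar senses
print(ambiguity_score(2, w, sim))           # 1 + 0.271/0.537 * (1 - 0) ~ 1.50
# (The tutorial's reported 1.41 for "apple" uses a nonzero cluster similarity.)
```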

Examples

• Level 0
  • california: country, state, city, region, institution (0.943)
  • fruit: food, product, snack, carbs, crop (0.827)
  • alcohol: substance, drug, solvent, food, addiction (0.523)
  • computer: device, product, electronics, technology, appliance (0.537)
  • coffee: beverage, product, food, crop, stimulant (0.73)
  • potato: vegetable, food, crop, carbs, product (0.896)
  • bean: food, vegetable, crop, legume, carbs (0.801)

Examples (cont.)

• Level 1
  • nike (score = 0.034): company/store 0.861; brand 0.035; shoe/product 0.033
  • twitter (score = 0.035): website/tool 0.612; network 0.165; application 0.033; company 0.031
  • facebook (score = 0.037): website/tool 0.595; network 0.17; company 0.053; application 0.029
  • yahoo (score = 0.38): search engine 0.457; company/provider/account 0.281; website 0.0656
  • google (score = 0.507): search engine 0.46; company/provider/organization 0.377; website 0.0449

Examples (cont.)

• Level 2
  • jordan (score = 1.02): country/state/company/regime 0.92; shoe 0.02
  • fox (score = 1.09): animal/predator/species 0.74; network 0.064; company 0.035
  • puma (score = 1.15): brand/company/shoe 0.655; species/cat 0.116
  • gold (score = 1.21): metal/material/mineral resource/mineral 0.62; color 0.128

Examples (cont.)

• Level 2
  • soap (score = 1.22): product/toiletry/substance 0.49; technology/industry standard 0.11
  • silver (score = 1.24): metal/material/mineral resource/mineral 0.638; color 0.156
  • python (score = 1.29): language 0.667; snake/animal/reptile/skin 0.193
  • apple (score = 1.41): fruit/food/tree 0.537; company/brand 0.271

Single Instance

• Is this instance ambiguous?

• What are its basic-level concepts?

• What are its similar instances?

A Concept View of "Microsoft"

(Figure: concepts of "Microsoft" ranging from general to specific: company, international company, software company, technology leader, largest desktop OS vendor. "Software company" is the basic-level concept, while "company" is too general and "largest desktop OS vendor" is too specific.)

Basic-level Conceptualization (BLC) [Rosch et al 1976]

(Figure: basic-level conceptualization examples such as KFC and BMW, each mapped to its basic-level concept.)

How to Make BLC?

• Naive approaches
  • Typicality: an important measure for understanding the relationship between an object and its concept
  • Pointwise Mutual Information (PMI): a common measure of the strength of association between two terms

Naive Approach 1: Typicality

• P(robin|bird) > P(penguin|bird): "robin" is a more typical bird than "penguin"
• P(USA|country) > P(Seychelles|country): "USA" is a more typical country than "Seychelles"

Using Typicality for BLC

• Associate each isA relationship ($e$ isA $c$) with typicality scores $P(e|c)$ and $P(c|e)$:

$$P(e|c) = \frac{n(c, e)}{n(c)}, \qquad P(c|e) = \frac{n(c, e)}{n(e)}$$

• $P(e|c)$ indicates how typical (or popular) $e$ is in the given concept $c$
• $P(c|e)$ indicates how typical (or popular) the concept $c$ is given $e$
• However: for "Microsoft", "company" has high typicality $P(c|e)$ while "largest desktop OS vendor" has high typicality $P(e|c)$, so neither measure alone yields the basic-level concept

Naive Approach 2: PMI [Manning and Schutze 1999]

• Pointwise mutual information (PMI) is a measure of association used in information theory and statistics
• Consider using the PMI between concept $c$ and instance $e$ to find the basic-level concepts:

$$PMI(e, c) = \log\frac{P(e, c)}{P(e)P(c)} = \log P(e|c) - \log P(e)$$

• However:
  • In basic level of categorization, we are interested in finding a concept for a given $e$, which means $P(e)$ is a constant
  • Thus ranking by $PMI(e, c)$ is the same as ranking by $P(e|c)$

Using Rep(e, c) for BLC [Wang et al 2015b]

• The measure $Rep(e, c) = P(c|e) \cdot P(e|c)$ means:
  • Given $e$, $c$ should be its typical concept (shortest distance)
  • Given $c$, $e$ should be its typical instance (shortest distance)

• (Relation to PMI) Taking the logarithm of the scoring function:

$$\log Rep(e, c) = \log\big(P(c|e) \cdot P(e|c)\big) = \log\left(\frac{P(e, c)}{P(e)} \cdot \frac{P(e, c)}{P(c)}\right) = \log\frac{P(e, c)^2}{P(e)P(c)} = PMI(e, c) + \log P(e, c) = PMI^2(e, c)$$

• (Relation to commute time) The commute time between an instance $e$ and a concept $c$ is

$$Time(e, c) = \sum_{k=1}^{\infty} 2k \cdot P_k(e, c) = \sum_{k=1}^{T} 2k \cdot P_k(e, c) + \sum_{k=T+1}^{\infty} 2k \cdot P_k(e, c) \ge \sum_{k=1}^{T} 2k \cdot P_k(e, c) + 2(T+1)\left(1 - \sum_{k=1}^{T} P_k(e, c)\right) = 4 - 2 \cdot Rep(e, c)$$

(the last step takes $T = 1$, where $P_1(e, c) = P(c|e)\,P(e|c) = Rep(e, c)$ is the probability of a one-step round trip $e \to c \to e$). Ranking by $Rep(e, c)$ is therefore a process of finding concept nodes having the shortest expected distance to $e$.
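To make the ranking concrete, here is a minimal sketch that computes P(e|c), P(c|e), and Rep(e, c) from isA co-occurrence counts; the counts are made-up toy values, not Probase data.

```python
# Hedged sketch: rank candidate basic-level concepts for an entity by
# Rep(e, c) = P(c|e) * P(e|c), computed from isA co-occurrence counts n(c, e).
from collections import defaultdict

# Toy counts n(c, e); "<all other entities>" aggregates the rest of a concept's instances.
n_ce = {
    ("company", "microsoft"): 9000,
    ("company", "<all other entities>"): 2_000_000,
    ("software company", "microsoft"): 2500,
    ("software company", "<all other entities>"): 20_000,
    ("largest desktop os vendor", "microsoft"): 50,
}

n_c, n_e = defaultdict(int), defaultdict(int)
for (c, e), cnt in n_ce.items():
    n_c[c] += cnt          # n(c)
    n_e[e] += cnt          # n(e)

def rep(e, c):
    cnt = n_ce.get((c, e), 0)
    return (cnt / n_c[c]) * (cnt / n_e[e])    # P(e|c) * P(c|e)

e = "microsoft"
for c in sorted({c for (c, ent) in n_ce if ent == e}, key=lambda c: rep(e, c), reverse=True):
    print(f"{c}: Rep = {rep(e, c):.5f}")
# With these toy counts the basic-level "software company" beats both the
# too-general "company" and the too-specific "largest desktop os vendor".
```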

Evaluations on Different Measures for BLC
(Two tables: Precision and NDCG at k = 1, 2, 3, 5, 10, 15, 20, under different smoothing settings.)

Table 1 (k =)            1      2      3      5      10     15     20
No smoothing
  MI(e)                0.769  0.692  0.705  0.685  0.719  0.705  0.690
  PMI3(e)              0.885  0.769  0.756  0.800  0.754  0.733  0.721
  NPMI(e)              0.692  0.692  0.667  0.638  0.627  0.610  0.610
  Typicality P(c|e)    0.462  0.577  0.603  0.577  0.569  0.564  0.556
  Typicality P(e|c)    0.500  0.462  0.526  0.523  0.523  0.510  0.521
  Rep(e)               0.846  0.865  0.872  0.862  0.758  0.731  0.719
Smoothing = 0.001
  MI(e)                0.577  0.615  0.628  0.600  0.612  0.605  0.592
  PMI3(e)              0.731  0.673  0.692  0.654  0.669  0.644  0.623
  NPMI(e)              0.923  0.827  0.769  0.746  0.731  0.695  0.671
  Typicality P(c|e)    0.462  0.577  0.603  0.577  0.569  0.564  0.554
  Typicality P(e|c)    0.885  0.865  0.872  0.831  0.785  0.741  0.704
  Rep(e)               0.846  0.731  0.718  0.723  0.700  0.669  0.638
Smoothing = 0.0001
  MI(e)                0.615  0.615  0.654  0.608  0.635  0.628  0.612
  PMI3(e)              0.846  0.731  0.731  0.715  0.723  0.685  0.677
  NPMI(e)              0.885  0.904  0.885  0.869  0.823  0.777  0.752
  Typicality P(c|e)    0.462  0.577  0.603  0.577  0.569  0.564  0.556
  Typicality P(e|c)    0.885  0.904  0.910  0.877  0.831  0.813  0.777
  Rep(e)               0.923  0.846  0.833  0.815  0.781  0.736  0.719
Smoothing = 1e-5
  MI(e)                0.615  0.635  0.667  0.662  0.677  0.656  0.646
  PMI3(e)              0.885  0.769  0.744  0.777  0.758  0.731  0.710
  NPMI(e)              0.885  0.846  0.872  0.869  0.831  0.810  0.787
  Typicality P(c|e)    0.462  0.577  0.603  0.577  0.569  0.564  0.556
  Typicality P(e|c)    0.769  0.808  0.846  0.823  0.808  0.782  0.765
  Rep(e)               0.885  0.904  0.872  0.862  0.812  0.800  0.767
Smoothing = 1e-6
  MI(e)                0.769  0.673  0.705  0.677  0.700  0.692  0.679
  PMI3(e)              0.885  0.769  0.756  0.785  0.773  0.726  0.723
  NPMI(e)              0.885  0.846  0.821  0.815  0.750  0.726  0.719
  Typicality P(c|e)    0.462  0.577  0.603  0.577  0.569  0.564  0.556
  Typicality P(e|c)    0.538  0.615  0.615  0.615  0.608  0.613  0.615
  Rep(e)               0.846  0.885  0.897  0.877  0.788  0.777  0.765
Smoothing = 1e-7
  MI(e)                0.769  0.692  0.705  0.685  0.719  0.703  0.688
  PMI3(e)              0.885  0.769  0.756  0.792  0.758  0.736  0.725
  NPMI(e)              0.769  0.750  0.718  0.700  0.650  0.641  0.633
  Typicality P(c|e)    0.462  0.577  0.603  0.577  0.569  0.564  0.556
  Typicality P(e|c)    0.500  0.481  0.526  0.523  0.531  0.523  0.523
  Rep(e)               0.846  0.865  0.872  0.854  0.765  0.749  0.733

Table 2 (k =)            1      2      3      5      10     15     20
No smoothing
  MI(e)                0.516  0.531  0.519  0.531  0.562  0.574  0.594
  PMI3(e)              0.725  0.664  0.652  0.660  0.628  0.631  0.646
  NPMI(e)              0.599  0.597  0.579  0.554  0.540  0.539  0.549
  Typicality P(c|e)    0.297  0.380  0.409  0.422  0.438  0.446  0.461
  Typicality P(e|c)    0.401  0.386  0.396  0.398  0.401  0.410  0.428
  Rep(e)               0.758  0.771  0.745  0.723  0.656  0.647  0.661
Smoothing = 1e-3
  MI(e)                0.374  0.414  0.441  0.448  0.473  0.481  0.495
  PMI3(e)              0.484  0.511  0.509  0.502  0.519  0.525  0.533
  NPMI(e)              0.692  0.652  0.607  0.603  0.585  0.585  0.592
  Typicality P(c|e)    0.297  0.380  0.409  0.422  0.438  0.446  0.460
  Typicality P(e|c)    0.703  0.697  0.704  0.681  0.637  0.628  0.626
  Rep(e)               0.621  0.580  0.554  0.561  0.554  0.555  0.559
Smoothing = 1e-4
  MI(e)                0.407  0.430  0.458  0.462  0.492  0.503  0.512
  PMI3(e)              0.648  0.604  0.579  0.575  0.578  0.576  0.590
  NPMI(e)              0.747  0.777  0.761  0.737  0.700  0.685  0.688
  Typicality P(c|e)    0.297  0.380  0.409  0.422  0.438  0.446  0.461
  Typicality P(e|c)    0.791  0.795  0.802  0.767  0.738  0.729  0.724
  Rep(e)               0.758  0.714  0.711  0.689  0.653  0.636  0.653
Smoothing = 1e-5
  MI(e)                0.429  0.465  0.478  0.501  0.517  0.528  0.545
  PMI3(e)              0.725  0.647  0.642  0.642  0.627  0.624  0.638
  NPMI(e)              0.813  0.779  0.778  0.765  0.730  0.723  0.729
  Typicality P(c|e)    0.297  0.380  0.409  0.422  0.438  0.446  0.461
  Typicality P(e|c)    0.709  0.728  0.735  0.722  0.702  0.696  0.703
  Rep(e)               0.791  0.787  0.762  0.739  0.707  0.703  0.706
Smoothing = 1e-6
  MI(e)                0.516  0.510  0.515  0.526  0.546  0.563  0.579
  PMI3(e)              0.725  0.655  0.651  0.654  0.641  0.631  0.649
  NPMI(e)              0.791  0.766  0.732  0.728  0.673  0.659  0.668
  Typicality P(c|e)    0.297  0.380  0.409  0.422  0.438  0.446  0.461
  Typicality P(e|c)    0.495  0.516  0.520  0.508  0.512  0.521  0.540
  Rep(e)               0.758  0.784  0.767  0.755  0.691  0.686  0.694
Smoothing = 1e-7
  MI(e)                0.516  0.531  0.519  0.530  0.562  0.571  0.592
  PMI3(e)              0.725  0.664  0.652  0.658  0.630  0.631  0.647
  NPMI(e)              0.670  0.655  0.633  0.604  0.575  0.570  0.581
  Typicality P(c|e)    0.297  0.380  0.409  0.422  0.438  0.446  0.461
  Typicality P(e|c)    0.423  0.421  0.415  0.407  0.414  0.424  0.438
  Rep(e)               0.758  0.771  0.745  0.725  0.663  0.661  0.668

Single Instance

• Is this instance ambiguous?

• What are its basic-level concepts?

• What are its similar instances?

What is Semantic Similarity?

• Are the following instance pairs similar?
  • <apple, microsoft>
  • <apple, pear>
  • <apple, fruit>
  • <apple, food>
  • <apple, ipad>
  • <car, journey>

Approaches on Term Similarity

• Categories of approaches for semantic similarity
  • String based approach
  • Knowledge based approach
    • Use preexisting thesauri, taxonomies, or encyclopedias such as WordNet
  • Corpus based approach
    • Use contexts of terms extracted from web pages, web search snippets, or other text repositories
  • Embedding based approach
    • Will be introduced in detail in "Part 3: Implicit Understanding"

Approaches on Term Similarity (2)

• Categories (figure: a taxonomy of approaches)
  • String based approaches
  • Knowledge based approaches (WordNet): path length / lexical chain-based; information content-based
  • Corpus based approaches: graph learning algorithm based; snippet search based
  • Representative methods placed in the figure: Rada 1989, Resnik 1995, Jcn 1997, Lin 1998, Sánchez 2011, Agirre 2010, Alvarez 2007, HunTray 2005, Hirst 1998, Do 2009, Bol 2011, Chen 2006, Ban 2002; state-of-the-art approaches are highlighted

Term Similarity Using Semantic Networks [Li et al 2013, Li et al 2015]

• Framework (for a term pair <t1, t2>):
  • Step 1: Type Checking: decide whether <t1, t2> is a concept pair, an entity pair, or a concept-entity pair
  • Step 2: Context Representation (vector):
    • Entity pairs: collect concept-distribution contexts, yielding context vectors T(t1) and T(t2)
    • Concept pairs: collect entity-distribution contexts and cluster them, yielding cluster context vectors Cx(t1) and Cy(t2)
    • Concept-entity pairs: cluster the concepts collected for the entity term t1 and select the top-k concepts cx of each cluster Ci(t1)
  • Step 3: Context Similarity:
    • Entity pairs: Cosine(T(t1), T(t2))
    • Concept pairs: max over cluster pairs (x, y) of Cosine(Cx(t1), Cy(t2))
    • Concept-entity pairs: max of sim(t2, cx) over the selected concepts cx for <t1, t2>

An example [Li et al 2013, Li et al 2015]

For example, <banana, pear>:
  • Step 1: Type Checking: <banana, pear> is an entity pair
  • Step 2: Context Representation (vector): concept context collection for each term
  • Step 3: Context Similarity: Cosine(T(t1), T(t2)) = 0.916
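A minimal sketch of the entity-pair case, assuming each entity already comes with a concept distribution; the toy concept weights below are illustrative, not Probase values.

```python
# Hedged sketch: entity-entity similarity as the cosine of their concept vectors.
import math

def cosine(u, v):
    keys = set(u) | set(v)
    dot = sum(u.get(k, 0.0) * v.get(k, 0.0) for k in keys)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy concept distributions P(concept | entity), illustrative only.
banana = {"fruit": 0.55, "food": 0.25, "tropical fruit": 0.15, "plant": 0.05}
pear   = {"fruit": 0.60, "food": 0.20, "tree": 0.12, "juice": 0.08}
print(f"sim(banana, pear) = {cosine(banana, pear):.3f}")   # high, both are fruits
```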

Examples

Term 1              | Term 2             | Similarity
lunch               | dinner             | 0.9987
tiger               | jaguar             | 0.9792
car                 | plane              | 0.9711
television          | radio              | 0.9465
technology company  | microsoft          | 0.8208
high impact sport   | competitive sport  | 0.8155
employer            | large corporation  | 0.5353
fruit               | green pepper       | 0.2949
travel              | meal               | 0.0426
music               | lunch              | 0.0116
alcoholic beverage  | sports equipment   | 0.0314
company             | table tennis       | 0.0003

Complete results: http://adapt.seiee.sjtu.edu.cn/similarity/SimCompleteResults.pdf

Statistics of Search Queries

(Pie charts of query length and of the number of instances per query.)
(a) By traffic: 1 term 44%, 2 terms 29%, 3 terms 17%, 4 terms 7%, 5 terms 2%, more than 5 terms 1%
(b) By # of distinct queries: 1 term 10%, 2 terms 26%, 3 terms 34%, 4 terms 19%, 5 terms 7%, more than 5 terms 4%
Example instances: Pokémon Go, Microsoft HoloLens

If the short text has context for the instance…
• python tutorial
• dangerous python
• moon earth distance
• …

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Supervised Segmentation [Bergsma et al 2007]

• Problem: divide a query into semantic units
• Approach: turn segmentation into position-based binary classification

Example query: "two man power saw"
Candidate segmentations: [two man] [power saw]; [two] [man] [power saw]; [two] [man power] [saw]

Input: a query and its positions
Output: the decision for making a segmentation break at each position

Supervised Segmentation

• Features
  • Decision-boundary features: e.g., token indicator features (is the token "the"?), POS tags in the query, position features (forward/backward)
  • Statistical features: mutual information between the left and right parts, e.g., "bank loan | amortization schedule"
  • Context features: context information around the query
  • Dependency features: e.g., in "female bus driver", "female" depends on "bus driver"

Supervised Segmentation

• Segmentation overview
  • Input query: "two man power saw"
  • For each position, extract the learning features and apply an SVM classifier
  • Output: a segmentation decision for each position (yes/no)
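A rough sketch of position-based boundary classification; the features, PMI table, and training pairs are invented placeholders, while the original work uses much richer decision-boundary, statistical, context, and dependency features.

```python
# Hedged sketch: query segmentation as per-position binary classification.
import numpy as np
from sklearn.svm import SVC

def boundary_features(left, right, pmi_table):
    # Tiny illustrative feature vector for the gap between `left` and `right` tokens.
    return [
        pmi_table.get((left, right), 0.0),        # association across the gap
        float(left in {"the", "of", "in"}),       # left token is a function word
        float(right in {"the", "of", "in"}),      # right token is a function word
        len(left), len(right),
    ]

pmi = {("power", "saw"): 3.2, ("man", "power"): 0.1, ("two", "man"): 1.5,
       ("new", "york"): 4.0, ("york", "times"): 3.5}

# Toy training data: 1 = insert a break at this gap, 0 = keep tokens together.
X = np.array([boundary_features(*pair, pmi) for pair in
              [("two", "man"), ("man", "power"), ("power", "saw"), ("new", "york")]])
y = np.array([0, 1, 0, 0])

clf = SVC(kernel="linear").fit(X, y)
query = ["two", "man", "power", "saw"]
gaps = [boundary_features(a, b, pmi) for a, b in zip(query, query[1:])]
print(clf.predict(np.array(gaps)))   # break decisions for the 3 gaps
```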

Unsupervised Segmentation [Tan et al 2008]

• Unsupervised learning for query segmentation

Probability of a generated segmentation $S$ (a sequence of segments $s_1 \dots s_m$) for query $Q$:

$$P(S|Q) = P(s_1)\,P(s_2|s_1)\cdots P(s_m|s_1 s_2 \dots s_{m-1}) \approx \prod_{s_i \in S} P(s_i) \quad \text{(unigram model)}$$

A split point is a valid segment boundary if and only if the pointwise mutual information between the two segments resulting from the split is negative:

$$MI(s_k, s_{k+1}) = \log \frac{P_c([s_k\ s_{k+1}])}{P_c(s_k) \cdot P_c(s_{k+1})} < 0$$

Example: for "new york times subscription", $\log \dfrac{P_c([\text{new york}])}{P_c(\text{new}) \cdot P_c(\text{york})} > 0$, so there is no segment boundary between "new" and "york".

Unsupervised Segmentation

• Find the top-k segmentations by dynamic programming
• Use EM optimization on the fly
• Input: query $w_1 w_2 \dots w_n$ (the words in the query) and the concept probability distribution
• Output: top-k segmentations with the highest likelihood
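A minimal dynamic-programming sketch for the unigram segmentation model; the segment probabilities are toy values, and the original method additionally re-estimates them with EM.

```python
# Hedged sketch: best segmentation under a unigram segment model, P(S) ~ prod P(s_i).
import math

def best_segmentation(words, seg_prob, max_len=4, unk=1e-9):
    n = len(words)
    best = [(0.0, [])] + [(-math.inf, None)] * n   # best[i] = (log-prob, segments) for words[:i]
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            seg = " ".join(words[j:i])
            score = best[j][0] + math.log(seg_prob.get(seg, unk))
            if score > best[i][0]:
                best[i] = (score, best[j][1] + [seg])
    return best[n]

seg_prob = {"new york times": 1e-4, "new york": 3e-4, "times": 2e-3,
            "subscription": 1e-3, "new": 5e-3, "york": 1e-4}
score, segs = best_segmentation("new york times subscription".split(), seg_prob)
print(segs, score)   # ['new york times', 'subscription'] under these toy probabilities
```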

Exploit Click-through [Li et al 2011]

• Motivation
  • Probabilistic query segmentation
  • Use click-through data (query -> clicked URL -> document)

Input query: "bank of america online banking"
Output: top-3 segmentations
  [bank of america] [online banking]    0.502
  [bank of america online banking]      0.428
  [bank of] [america] [online banking]  0.001

Exploit Click-through

• Segmentation model: an interpolated model that combines global info with click-through info
• Example query: "[credit card] [bank of America]", with clicked HTML documents such as:
  1. bank of america credit cards contact us overview
  2. secured visa credit card from bank of america
  3. credit cards overview find the right bank of america credit card for you

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Sense Changes with Different Context

• watch harry potter -> Movie
• read harry potter -> Book
• age harry potter -> Character
• harry potter walkthrough -> Game

Entity Recognition in Query [Guo et al 2009]

• Motivation: detect the named entity in a short text and categorize it
• Focus on single-named-entity queries
• Example: "harry potter walkthrough" is represented as the triple <e, t, c> = ("harry potter", "walkthrough", "game"), where e is the (ambiguous) entity term, t the context terms, and c the class of the entity

Entity Recognition in Query

• Probabilistic generative model
  • Goal: given a query q, find the triple <e, t, c> that maximizes the probability of generating q
  • Probability to generate a triple: Pr(e, t, c) = Pr(e) Pr(c|e) Pr(t|c), assuming the context depends only on the class (e.g., "walkthrough" depends only on the class game, not on "harry potter")
  • Objective: given query q, find the triple <e, t, c> consistent with q that maximizes this probability
  • The problem then becomes how to estimate Pr(e), Pr(c|e), and Pr(t|c)

Entity Recognition in Query

• Probability estimation by learning
  • Learning objective: $\max \prod_{i=1}^{N} P(e_i, t_i, c_i)$
  • Challenge: it is difficult as well as time-consuming to manually assign class labels to named entities in queries
  • Instead, build a training set $T = \{(e_i, t_i)\}$ and view $c_i$ as a hidden variable
  • New learning problem: $\max \prod_{i=1}^{N} P(e_i, t_i) = \max \prod_{i=1}^{N} \sum_{c} P(e_i)\,P(c|e_i)\,P(t_i|c)$
  • Solved with the topic model WS-LDA (weakly supervised LDA)
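A small sketch of how the learned distributions could score candidate triples at query time; the probability tables are invented toy numbers, not values from the paper.

```python
# Hedged sketch: score candidate <entity, context, class> triples for a query
# with Pr(e) * Pr(c|e) * Pr(t|c), then pick the best interpretation.
p_e = {"harry potter": 0.6, "potter": 0.4}
p_c_given_e = {"harry potter": {"game": 0.3, "movie": 0.5, "book": 0.2},
               "potter": {"profession": 1.0}}
p_t_given_c = {"game": {"walkthrough": 0.4, "cheats": 0.3},
               "movie": {"walkthrough": 0.01, "trailer": 0.5},
               "book": {"walkthrough": 0.01, "author": 0.4},
               "profession": {"walkthrough": 0.001}}

def score(e, t, c):
    return (p_e.get(e, 0.0)
            * p_c_given_e.get(e, {}).get(c, 0.0)
            * p_t_given_c.get(c, {}).get(t, 0.0))

candidates = [("harry potter", "walkthrough", c) for c in ("game", "movie", "book")]
best = max(candidates, key=lambda etc: score(*etc))
print(best, score(*best))   # ('harry potter', 'walkthrough', 'game') wins here
```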

Signal from Click [Pantel et al 2012]

• Motivation: predict entity types in Web search
• Signals: the entity, its context, the user intent, and the click
• Output: a query type distribution over 73 entity types, produced by a generative model

Signal from Click

• Joint model for prediction (plate diagram with variables t, τ, i, n, c and parameters θ, φ, ω over the query collection Q): for each query, pick a type t (distribution over types θ), pick an entity (entity distribution), pick an intent i (intent distribution), pick the context words (word distribution φ), and pick a click (host distribution ω)

Telegraphic Query Interpretation [Sawant et al 2013, Joshi et al 2014]

• Entity-seeking telegraphic queries
• Interpretation = Segmentation + Annotation
• Combine a knowledge base (accuracy) with a large corpus (recall)
• Example: query "Germany capital" -> result entity "Berlin"

Joint Interpretation and Ranking [Sawant et al 2013, Joshi et al 2014]

• Overview: a telegraphic query is matched against an annotated corpus; candidate entities (e1, e2, e3, …) are scored by two models for joint interpretation and ranking, a generative model and a discriminative model, which produce the output entity ranking

• Generative Model [Sawant et al 2013]
  • Based on probabilistic language models
  • Example query q: "losing team baseball world series 1998"
  • A candidate answer entity E (e.g., San Diego Padres, a "major league baseball team") is scored by how well its type T matches the query's type hint ("baseball team") and how well its corpus context (e.g., "Padres have been to two World Series, losing in 1984 and 1998") matches the remaining query words, with a switch variable Z selecting between the type model and the context-matcher model
  (Figure borrowed from U. Sawant, 2013)

• Discriminative Model [Sawant et al 2013]
  • Based on max-margin discriminative learning
  • For the query "losing team baseball world series 1998", feature vectors are built for each candidate interpretation; the correct entity San_Diego_Padres (with t = baseball team) should be ranked above incorrect entities such as 1998_World_Series (with t = series)

• Queries seek answer entities (e2)
• They contain (query) entities (e1), target types (t2), relations (r), and selectors (s)

Telegraphic Query Interpretation [Joshi et al 2014]

Example interpretations (query: e1, r, t2, s), recovered approximately from the slide table:
  • "dave navarro first band": e1 = dave navarro, r = band, t2 = band, s = first
  • "dave navarro first band": e1 = dave navarro, r = -, t2 = band, s = first
  • "spider automobile company": e1 = spider, r = automobile company, t2 = automobile company, s = -
  • "spider automobile company": e1 = -, r = automobile company, t2 = company, s = spider

(Table borrowed from M. Joshi, 2014)

Improved Generative Model

• The generative model of [Sawant et al 2013] is extended in [Joshi et al 2014] to also consider the query entity e1 (in q) and the relation r

Improved Discriminative Model

• Likewise, the discriminative model of [Sawant et al 2013] is extended in [Joshi et al 2014] to consider e1 (in q) and r

Understand Short Texts with a Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

• Input: a short text; Output: its semantic interpretation
• Example: "wanna watch eagles band" -> watch[verb] eagles[entity](band) band[concept]
• Three steps in understanding a short text:
  • Step 1: Text segmentation: divide the text into a sequence of terms in the vocabulary ("wanna watch eagles band" -> watch, eagles, band)
  • Step 2: Type detection: determine the best type of each term (watch[verb] eagles[entity] band[concept])
  • Step 3: Concept labeling: infer the best concept of each entity within context (eagles -> the band sense)

Text Segmentation

• Observations
  • Mutual exclusion: terms containing the same word mutually exclude each other
  • Mutual reinforcement: related terms mutually reinforce each other
• Build a Candidate Term Graph (CTG)
  (Figure: CTGs for "vacation april in paris" and "watch harry potter"; overlapping candidate terms such as "april in paris" vs. "april"/"paris" are connected by mutual-exclusion edges, and related terms by weighted reinforcement edges.)

Find the Best Segmentation

• Best segmentation = the sub-graph of the CTG which
  • is a complete graph (clique),
  • contains no mutual exclusion,
  • has 100% word coverage (except for stopwords), and
  • has the largest average edge weight
• The first three conditions make it a valid segmentation; the last picks the best one
  (Figure: the CTGs for "vacation april in paris" and "watch harry potter" with the best-segmentation sub-graphs highlighted.)

Find the Best Segmentation

• The valid segmentations are maximal cliques of the CTG (complete, no mutual exclusion, full word coverage except stopwords); among them, the best segmentation is the one with the largest average edge weight
  (Figure: the maximal clique corresponding to the best segmentation highlighted in each CTG.)
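A rough sketch of this clique search with networkx; the candidate terms and edge weights are toy assumptions, whereas a real CTG would come from the vocabulary and co-occurrence statistics.

```python
# Hedged sketch: pick the best segmentation as the maximal clique of the candidate
# term graph with full word coverage and the largest average edge weight.
import itertools
import networkx as nx

words = {"vacation", "april", "in", "paris"}
stopwords = {"in"}
candidates = {"vacation": {"vacation"}, "april": {"april"},
              "paris": {"paris"}, "april in paris": {"april", "in", "paris"}}

G = nx.Graph()
G.add_nodes_from(candidates)
for a, b in itertools.combinations(candidates, 2):
    if candidates[a] & candidates[b]:
        continue                                    # mutual exclusion: no edge
    # Toy reinforcement weights; real weights would reflect term relatedness.
    weight = 0.9 if {a, b} == {"vacation", "april in paris"} else 0.1
    G.add_edge(a, b, weight=weight)

def avg_weight(clique):
    edges = list(itertools.combinations(clique, 2))
    return sum(G[u][v]["weight"] for u, v in edges) / len(edges) if edges else 0.0

best = None
for clique in nx.find_cliques(G):
    covered = set().union(*(candidates[t] for t in clique))
    if words - stopwords <= covered:                # 100% word coverage (minus stopwords)
        if best is None or avg_weight(clique) > avg_weight(best):
            best = clique
print(best)   # expected: ['vacation', 'april in paris'] (order may vary)
```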

Type Detection

• Pairwise model
  • Each term has several candidate typed-terms, e.g., watch[verb] / watch[entity] / watch[concept], free[adj] / free[verb], movie[concept] / movie[entity]
  • Find the best typed-term for each term so that the maximum spanning tree of the resulting sub-graph between typed-terms has the largest weight

Concept Labeling

• Entity disambiguation is the most important task of concept labeling
  • Filter / re-rank the original concept cluster vector
• Weighted vote
  • The final score of each concept cluster is a combination of its original score and the support from context, using concept co-occurrence
  • Example: "watch harry potter" -> movie; "read harry potter" -> book
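A minimal sketch of such a weighted vote; the mixing weight, co-occurrence scores, and concept lists are illustrative assumptions.

```python
# Hedged sketch: re-rank an entity's concept clusters by combining the original
# conceptualization score with support from context concepts via co-occurrence.
def weighted_vote(candidate_scores, context_concepts, cooccur, lam=0.5):
    reranked = {}
    for concept, original in candidate_scores.items():
        support = sum(cooccur.get(frozenset((concept, ctx)), 0.0)
                      for ctx in context_concepts)
        support /= max(len(context_concepts), 1)
        reranked[concept] = lam * original + (1 - lam) * support
    return sorted(reranked.items(), key=lambda kv: kv[1], reverse=True)

# "harry potter" candidates, and the context concept contributed by "watch" vs. "read".
harry_potter = {"movie": 0.45, "book": 0.40, "character": 0.10, "game": 0.05}
cooccur = {frozenset(("movie", "watchable thing")): 0.9,
           frozenset(("book", "readable thing")): 0.9,
           frozenset(("book", "watchable thing")): 0.1}

print(weighted_vote(harry_potter, ["watchable thing"], cooccur))  # movie ranked first
print(weighted_vote(harry_potter, ["readable thing"], cooccur))   # book ranked first
```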

Example of Entity Disambiguation [Hua et al 2015 (ICDE Best Paper), Hua et al 2016]

• The conceptualization pipeline combines a semantic (isA) network with a co-occurrence network: parsing, term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, and concept orthogonalization produce a concept vector [(c1, p1), (c2, p2), (c3, p3), …] for the short text
• Example "ipad apple": by isA, "apple" yields concepts such as fruit, company, food, product and "ipad" yields product, device; co-occurrence-based filtering keeps the concepts consistent with "ipad" (product, brand, company, device), so "apple" is disambiguated to the company sense

Mining Lexical Relationships [Wang et al 2015b]

• Lexical knowledge is represented by probabilities over terms, concepts, and roles (e: instance, t: term, c: concept, z: role), e.g., for "watch harry potter":
  • $p(\text{verb}\,|\,\text{watch})$ and $p(\text{instance}\,|\,\text{watch})$: role probabilities of the term "watch"
  • $p(\text{movie}\,|\,\text{harry potter})$ and $p(\text{book}\,|\,\text{harry potter})$: concept probabilities of "harry potter"
  • $p(\text{movie}\,|\,\text{watch}, \text{verb})$: concept probability conditioned on a context term and its role
• The model combines ① $p(c\,|\,t, z)$, ② $p(c\,|\,e) = p(c\,|\,t, z = \text{instance})$, and ③ $p(z\,|\,t)$

Understanding Queries [Wang et al 2015b]

• Goal: rank the concepts and find $\arg\max_{c} p(c\,|\,t, q)$
• Given a query, enumerate all possible segmentations, build the online subgraph from the offline semantic network, and run random walk with restart [Sun et al 2005] on the online subgraph
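A compact random-walk-with-restart sketch; the small adjacency matrix stands in for the online subgraph induced by a query, and the restart probability and node names are assumptions.

```python
# Hedged sketch: random walk with restart (RWR) on a small term/concept subgraph.
import numpy as np

nodes = ["watch", "harry potter", "movie", "book", "game"]
A = np.array([  # toy edge weights of the online subgraph (symmetric, illustrative)
    [0.0, 1.0, 2.0, 0.2, 0.5],
    [1.0, 0.0, 2.0, 1.5, 1.0],
    [2.0, 2.0, 0.0, 0.0, 0.0],
    [0.2, 1.5, 0.0, 0.0, 0.0],
    [0.5, 1.0, 0.0, 0.0, 0.0],
])
P = A / A.sum(axis=0, keepdims=True)      # column-stochastic transition matrix

restart = np.zeros(len(nodes))
restart[nodes.index("watch")] = 0.5       # restart mass on the query terms
restart[nodes.index("harry potter")] = 0.5

alpha, r = 0.15, np.full(len(nodes), 1.0 / len(nodes))
for _ in range(100):                      # power iteration until (approximate) convergence
    r = (1 - alpha) * P @ r + alpha * restart

for name, score in sorted(zip(nodes, r), key=lambda x: -x[1]):
    print(f"{name}: {score:.3f}")         # concept nodes close to the query rank high
```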

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head, Modifier, and Constraint Detection in Short Texts [Wang et al 2014b]

• Example: "popular smart cover iphone 5s"
• Definitions
  • Head: names the general (semantic) category to which the whole short text belongs; usually the head is the intent of the short text ("smart cover" is the intent of the query)
  • Constraints: distinguish this member from other members of the same category ("iphone 5s" limits the type of the head)
  • Non-constraint modifiers (a.k.a. pure modifiers): subjective modifiers that can be dropped without changing the intent ("popular" is subjective and can be neglected)

Non-Constraint Modifier Mining: Construct Modifier Networks

• Edges between modifiers form a modifier network
• Example in the "Country" domain: the concept hierarchy tree contains Country with sub-concepts Asian country, Developed country, Western country, and observed phrases such as "Large Asian country", "Western developed country", "Top western country", "Large developed country", "Top developed country"
• In the resulting modifier network over {Asian, Developed, Western, Large, Top}, "Large" and "Top" are pure modifiers

Non-Constraint Modifier Mining: Betweenness Centrality

• Betweenness centrality is a measure of a node's centrality in a network
• The betweenness of node $v$ is defined as

$$g(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$$

where $\sigma_{st}$ is the total number of shortest paths from node $s$ to node $t$ and $\sigma_{st}(v)$ is the number of those paths that pass through $v$

• After normalization and aggregation, a pure modifier should have a low aggregated betweenness-centrality score $PMS(t)$
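A short networkx sketch of this idea; the modifier network below is a toy version of the "Country" example, so both its edges and any threshold on the scores are assumptions.

```python
# Hedged sketch: use betweenness centrality over a modifier network to find
# candidate pure (non-constraint) modifiers, which should score low.
import networkx as nx

# Toy modifier network loosely based on the "Country" example; in the real
# system the graph is built from observed concept phrases.
G = nx.Graph()
G.add_edges_from([
    ("Country", "Asian"), ("Country", "Western"), ("Country", "Developed"),
    ("Asian", "Western"), ("Western", "Developed"),
    ("Large", "Developed"), ("Top", "Developed"), ("Large", "Top"),
])

centrality = nx.betweenness_centrality(G, normalized=True)
for modifier, score in sorted(centrality.items(), key=lambda kv: kv[1]):
    print(f"{modifier:10s} {score:.3f}")
# Modifiers whose aggregated (normalized) betweenness stays low across domains,
# e.g. purely subjective ones like "Large" or "Top", are treated as pure modifiers.
```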

Head-Constraint Mining [Wang et al 2014b]

• A term can be a head in some cases and a constraint in others
  • E.g., "Seattle hotel" (head: hotel, constraint: Seattle) vs. "Seattle hotel job" (head: job, constraints: Seattle, hotel)

Head-Constraint Mining: Acquiring Concept Patterns

• Build a concept pattern dictionary from query logs:
  • Extract preposition patterns (A for B, A of B, A with B, A in B, A on B, A at B, …) from queries such as "cover for iphone 6s", "battery for sony a7r", "wicked on broadway"
  • Get entity pairs (entity1 = head, entity2 = constraint) for each preposition
  • Conceptualize both entities (entity1 -> concept11, concept12, concept13, …; entity2 -> concept21, concept22, concept23, …)
  • Generate concept patterns as the cross product, e.g., (concept11, concept21), (concept11, concept22), (concept11, concept23), …, and store them in the concept pattern dictionary

Why Concepts Can't Be Too General

• Overly general concepts cause too many concept pattern conflicts: we can't distinguish head and modifier for general concept pairs
  • Derived concept pattern (head: device, modifier: company), supporting entity pairs: (iphone 4, verizon), (modem, comcast), (wireless router, comcast), (iphone 4, tmobile)
  • Derived concept pattern (head: company, modifier: device), supporting entity pairs: (amazon books, kindle), (netflix, touchpad), (skype, windows phone), (netflix, ps3)
  • These two patterns conflict

Why Concepts Can't Be Too Specific

• Overly specific concepts have little coverage: the concept regresses to the entity, and the dictionary would need up to (million x million) patterns for storage
  • E.g., device + largest desktop OS vendor, device + largest software development company, device + largest global corporation, device + latest windows and office provider, …
• Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept Patterns

Cluster size | Sum of cluster score | Head / Constraint | Score
615 | 2114691 | breed / state               | 357298460224501
296 | 7752357 | game / platform             | 627403476771856
153 | 3466804 | accessory / vehicle         | 53393705094809
70  | 118259  | browser / platform          | 132612807637391
22  | 1010993 | requirement / school        | 271407526294823
34  | 9489159 | drug / disease              | 154602405333541
42  | 8992995 | cosmetic / skin condition   | 814659415003929
16  | 7421599 | job / city                  | 27903732555528
32  | 710403  | accessory / phone           | 246513830851194
18  | 6692376 | software / platform         | 210126322725878
20  | 6444603 | test / disease              | 239774028397537
27  | 5994205 | clothes / breed             | 98773996282851
19  | 5913545 | penalty / crime             | 200544192793488
25  | 5848804 | tax / state                 | 240081818612579
16  | 5465424 | sauce / meat                | 183592863621553
18  | 4809389 | credit card / country       | 142919087972152
14  | 4730792 | food / holiday              | 14554140330924
11  | 4536199 | mod / game                  | 257163856882439
29  | 4350954 | garment / sport             | 471533326845442
23  | 3994886 | career information / professional | 732726483731257
15  | 386065  | song / instrument           | 128189481818135
18  | 378213  | bait / fish                 | 780426514113169
22  | 3722948 | study guide / book          | 508339765053921
19  | 3408953 | plugins / browser           | 550326072627126
14  | 3305753 | recipe / meat               | 882779863422951
18  | 3214226 | currency / country          | 110825444188352
13  | 3180272 | lens / camera               | 186081673263957
9   | 316973  | decoration / holiday        | 130055844126533
16  | 314875  | food / animal               | 7338544366514

Example: the pattern (head: game, constraint: platform) clusters similar patterns such as game / device, video game platform, game console / game pad, game / gaming platform, and is supported by entity pairs like (angry birds, android), (angry birds, ios), (angry birds, windows 10), …

Detecting the Head-Modifier Relationship

• Train a classifier on (head-embedding, modifier-embedding) pairs
• Training data: positive examples (head, modifier); negative examples (modifier, head)
• Precision >= 0.9, Recall >= 0.9
• Disadvantage: not interpretable
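A minimal sketch of such a direction classifier; random vectors stand in for real word embeddings, and the feature layout and model choice are assumptions.

```python
# Hedged sketch: classify whether an ordered pair of embeddings is (head, modifier).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
dim = 50
emb = {w: rng.normal(size=dim) for w in
       ["cover", "iphone", "game", "platform", "recipe", "meat", "lens", "camera"]}

# Positive = (head, modifier), negative = (modifier, head); pairs are illustrative.
head_modifier_pairs = [("cover", "iphone"), ("game", "platform"),
                       ("recipe", "meat"), ("lens", "camera")]
X, y = [], []
for h, m in head_modifier_pairs:
    X.append(np.concatenate([emb[h], emb[m]])); y.append(1)   # correct order
    X.append(np.concatenate([emb[m], emb[h]])); y.append(0)   # reversed order

clf = LogisticRegression(max_iter=1000).fit(np.array(X), np.array(y))
test = np.concatenate([emb["cover"], emb["iphone"]]).reshape(1, -1)
print("P(first term is the head) =", clf.predict_proba(test)[0, 1])
```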

Syntactic Parsing Based on Head-Modifier Detection?

• Head-modifier information alone is incomplete
  • Prepositions and other function words are ignored
  • Within a noun compound (e.g., "el capitan macbook pro"), the internal structure is missed
• Why not train a parser for web queries?

Syntactic Parsing of Short Texts [Sun et al EMNLP 2016]

• Syntactic structures are valuable for short text understanding (examples follow)

Challenges: Short Texts Lack Grammatical Signals

• Function words and word order are often missing
  • "toys queries" has ambiguous intent
  • "distance earth moon" has clear intent, but many equivalent forms: "earth moon distance", "earth distance moon", …

Challenges: Syntactic Parsing of Queries

• No standard
• No ground truth
• Why is syntactic parsing of queries even a legitimate problem?

Derive Syntax from Semantics [Sun et al 2016]

• Query: "thai food houston"
• Take a sentence that users clicked for this query, parse it, and project its dependencies onto the query

A Treebank for Short Texts

• Given a query q and q's clicked sentences s:
  • Parse each s
  • Project the dependencies from s to q
  • Aggregate the projected dependencies

Algorithm of Projection (figure)

Result Examples (figure)

Results

• Random queries: QueryParser UAS 0.83, LAS 0.75; Stanford UAS 0.72, LAS 0.64
• Queries with no function words: QueryParser UAS 0.82, LAS 0.73; Stanford UAS 0.70, LAS 0.61
• Queries with function words: QueryParser UAS 0.90, LAS 0.85; Stanford UAS 0.86, LAS 0.80

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embeddings [Kenter and Rijke 2015]

• Measure the similarity between two short texts or sentences
• Basic idea: word-by-word comparison using embedding vectors
• Use a saliency-weighted semantic graph to compute similarity

• Features acquired: bins of all edges, and bins of max edges of the semantic graph
• Similarity measurement, inspired by BM25, between a longer text $s_l$ and a shorter text $s_s$:

$$f_{sts}(s_l, s_s) = \sum_{w \in s_l} IDF(w) \cdot \frac{sem(w, s_s) \cdot (k_1 + 1)}{sem(w, s_s) + k_1 \cdot \left(1 - b + b \cdot \frac{|s_s|}{avgsl}\right)}$$

where $sem(w, s_s)$ is the semantic similarity of term $w$ to the short text $s_s$ and $avgsl$ is the average sentence length.
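A compact sketch of this saliency-weighted similarity, taking sem(w, s) as the maximum cosine between w and the words of s; the embeddings and IDF values are toy assumptions (real use would plug in word2vec/GloVe vectors).

```python
# Hedged sketch: BM25-inspired, saliency-weighted short-text similarity over word embeddings.
import numpy as np

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def sem(w, text, emb):
    return max(cos(emb[w], emb[w2]) for w2 in text)       # best-matching word in the other text

def f_sts(s_long, s_short, emb, idf, k1=1.2, b=0.75, avgsl=5.0):
    score = 0.0
    for w in s_long:
        s = max(sem(w, s_short, emb), 0.0)                # clamp negative similarities
        score += idf.get(w, 1.0) * (s * (k1 + 1)) / (s + k1 * (1 - b + b * len(s_short) / avgsl))
    return score

rng = np.random.default_rng(1)
vocab = ["cheap", "flights", "low", "cost", "airline", "banana"]
emb = {w: rng.normal(size=20) for w in vocab}             # toy vectors, illustrative only
emb["low"] = emb["cheap"] + 0.1 * rng.normal(size=20)     # make related words close
emb["airline"] = emb["flights"] + 0.1 * rng.normal(size=20)
idf = {"cheap": 1.5, "flights": 2.0, "low": 1.5, "cost": 1.2, "airline": 2.0, "banana": 1.0}

print(f_sts(["cheap", "flights"], ["low", "cost", "airline"], emb, idf))
print(f_sts(["cheap", "flights"], ["banana"], emb, idf))  # should be lower
```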

From the Concept View [Wang et al 2015a]

• Each short text is parsed and conceptualized (term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, concept orthogonalization), using the semantic network and the co-occurrence network, into a bag-of-concepts vector [(c1, score1), (c2, score2), …]
• The similarity between two short texts is then the similarity between their concept vectors

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

• Explicit short text understanding benefits a lot of application scenarios:
  • Ads/search semantic match
  • Definition mining
  • Query recommendation
  • Web table understanding
  • Semantic search
  • …

Ads Keyword Selection [Wang et al 2015a]

(Two bar charts over query Deciles 4 through 10: Mainline Ads, y-axis 0.00 to 6.00; Sidebar Ads, y-axis 0.00 to 0.60.)

Definition Mining [Hao et al 2016]

• Definition scenarios: search engines, QnA, etc.
• Why is conceptualization useful for definition mining? Example: "What is Emphysema?"
  • Answer 1: "Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year."
  • Answer 2: "Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing."
  • Such a sentence has the form of a definition
  • Embedding is helpful to some extent, but it also returns high similarity scores for both (emphysema, disease) and (emphysema, smoking)
  • Conceptualization can provide strong semantics; contextual embedding can also provide semantic similarity beyond isA

Definition Mining [Hao et al 2016]

Concept-based Short Text Classification and Ranking [Wang et al 2014a]

• Online: the original short text (e.g., "justin bieber graduates") goes through entity extraction and conceptualization against the knowledge base, producing a weighted concept vector; candidate classes are generated and then classified and ranked (e.g., <Music, score>)
• Offline: concept models (Model 1 … Model N, one per class) are learned from training data with concept weighting

Concept-based Short Text Classification and Ranking [Wang et al 2014a]

• Each category (e.g., TV, Music, Movie, …) is mapped into the concept space by conceptualizing the article titles and tags in that category, giving category concept weights ω_i, ω_j, …
• A query is conceptualized into the same concept space with weights p_i, p_j, … and is scored against each category's concept representation

Precision performance on each category [Wang et al 2014a]

Category | BocSTC | LM_ch | SVM  | VSM_cosine | LM_d | Entity_ESA
Movie    | 0.71   | 0.91  | 0.84 | 0.81       | 0.72 | 0.56
Money    | 0.97   | 0.95  | 0.54 | 0.57       | 0.52 | 0.74
Music    | 0.97   | 0.90  | 0.88 | 0.73       | 0.68 | 0.58
TV       | 0.96   | 0.46  | 0.92 | 0.56       | 0.51 | 0.55

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

• [ Bollacker et al 2008 ] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, Jamie Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In SIGMOD 2008

• [ Auer et al 2007 ] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, Zachary G Ives. DBpedia: A Nucleus for a Web of Open Data. In ISWC/ASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

• [ Wu et al 2015 ] Sen Wu, Ce Zhang, Christopher De Sa, Jaeho Shin, Feiran Wang and C Ré. Incremental Knowledge Base Construction Using DeepDive. In VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

• [ Nastase et al 2010 ] Vivi Nastase, Michael Strube, Benjamin Börschinger, Cäcilia Zirn and Anas Elghafari. WikiNet: A very large scale multi-lingual concept network. In LREC 2010

• [ Speer et al 2013 ] Robert Speer and Catherine Havasi. ConceptNet 5: A large semantic network for relational knowledge. The People's Web Meets NLP, Springer Berlin Heidelberg, 2013

• [ Hua et al 2016 ] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng and Xiaofang Zhou. "Understand Short Texts by Harvesting and Analyzing Semantic Knowledge". IEEE Transactions on Knowledge and Data Engineering (TKDE), 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

• [ Li et al 2015 ] Peipei Li, Haixun Wang, Kenny Q Zhu, Zhongyuan Wang, Xue-Gang Hu and Xindong Wu. A Large Probabilistic Semantic Network based Approach to Compute Term Similarity. In IEEE Transactions on Knowledge and Data Engineering (TKDE), 27(10): 2604-2617, 2015

• [ Rosch et al 1976 ] Eleanor Rosch, Carolyn B Mervis, Wayne D Gray, David M Johnson and Penny Boyes-Braem. Basic objects in natural categories. Cognitive Psychology, 8(3):382-439, 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

• [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke. Short text similarity with word embeddings. In CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

• [ Wang et al 2012b ] Yue Wang, Hongsong Li, Haixun Wang and Kenny Zhu. Toward Topic Search on the Web. In International Conference on Conceptual Modeling, October 2012

Page 33: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Definitions of Instance Ambiguity [Hua et al 2016]

bull 3 levels of instance ambiguitybull Level 0 unambiguous

bull Contains only 1 sensebull Eg dog (animal) beijing (city) potato (vegetable)

bull Level 1 unambiguous and ambiguous both make sensebull Contains 2 or more senses but these senses are relatedbull Eg google (company amp search engine) french (language amp

country) truck(vehicle amp public transport service)

bull Level 2 ambiguous bull Contains 2 or more senses and the senses are very different from

each otherbull Eg apple (fruit amp company) jaguar(animal amp company) python

(animal amp language)

Ambiguity Score

bull Using top-2 senses to calculate the ambiguity score

119904119888119900119903119890 =

0 119897119890119907119890119897 = 0119908 1199042 119890

119908 1199041 119890lowast 1 minus 119904119894119898119894119897119886119903119894119905119910 1199041 1199042 119897119890119907119890119897 = 1

score = 1 +119908(1199041198882|119890)

119908(1199041198881|119890)lowast 1 minus 119904119894119898119894119897119886119903119894119905119910 1199041198881 1199041198882 119897119890119907119890119897 = 2

Denote top-2 senses as 1199041 and 1199042 top-2 sense clusters as 1199041198881 and 1199041198882 Denote similarity of two sense clusters as the maximum similarity of their senses

119904119894119898119894119897119886119903119894119905119910 1199041198881 1199041198882 = 119950119938119961119904119894119898119894119897119886119903119894119905119910(119904119894 isin 1199041198881 119904119895 isin 1199041198882) For an entity 119890 denote the weight (popularity) of a sense 119904119894 as the sum of weights of its concept clusters

119908 119904119894|119890 = 119908 119867119894|119890 =119862119871119895isin119867119894

119875(119862119871119895|119890)

For an entity 119890 denote the weight (popularity) of a sense cluster 119904119888119894 as the sum of weights of its senses

119908 119904119888119894 119890 =119904119895isin119904119888119894

119908(119904119895|119890)

Examples

bull Level 0bull california

bull country state city region institution 0943bull fruit

bull food product snack carbs crop 0827bull alcohol

bull substance drug solvent food addiction 0523bull computer

bull device product electronics technology appliance 0537bull coffee

bull beverage product food crop stimulant 073bull potato

bull vegetable food crop carbs product 0896bull bean

bull food vegetable crop legume carbs 0801

Examples (cont)bull Level 1

bull nike score = 0034bull company store 0861bull brand 0035bull shoe product 0033

bull twitter score = 0035bull website tool 0612bull network 0165bull application 0033bull company 0031

bull facebook score = 0037bull website tool 0595bull network 017bull company 0053bull application 0029

bull yahoo score = 038bull search engine 0457bull company provider account 0281bull website 00656

bull google score = 0507bull search engine 046bull company provider organization 0377bull website 00449

Examples (cont)

bull Level 2bull jordan score = 102

bull country state company regime 092bull shoe 002

bull fox score = 109bull animal predator species 074bull network 0064bull company 0035

bull puma score = 115bull brand company shoe 0655bull species cat 0116

bull gold score = 121bull metal material mineral resource mineral062bull color 0128

Examples (cont)

bull Level 2bull soap score = 122

bull product toiletry substance 049bull technology industry standard 011

bull silver score = 124bull metal material mineral resource mineral 0638bull color 0156

bull python score = 129bull language 0667bull snake animal reptile skin 0193

bull apple score = 141bull fruit food tree 0537bull company brand 0271

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

A Concept View of ldquoMicrosoftrdquo

company

largest desktop OS vendor

softwarecompany

international company

technology leader

Microsoft

largest desktop OS vendorcompany hellip hellip

software company

Basic-level Conceptualization (BLC)[Rosch et al 1976]

KFC

BMW

Basic-level conceptualization

How to Make BLC

bull Naive approachesbull Typicality an important measure for understanding the

relationship between an object and its concept

bull Pointwise Mutual Information (PMI) a common measure of the strength of association between two terms

bird

Naive Approach 1 Typicality

P(robin|bird) gt P(penguin|bird)ldquorobinrdquo is a more typical bird than a ldquopenguinrdquo

country

SeychellesUSA

P(USA|country) gt P(Seychelles|country)ldquoUSArdquo is a more typical country than ldquoSeychellesrdquo

penguinrobin

Using Typicality for BLC

bull Associate each isA relationship (119890 is 119888) with typicality scores 119875 119890 119888 and 119875 119888 119890

119875 119890 119888 =119899 119888 119890

119899 119888119875(119888|119890) =

119899 119888 119890

119899(119890)

bull P(e|c) indicates how typical (or popular) e is in the given concept c

bull P(c|e) indicates how typical (or popular) the concept c is given e

bull However

Microsoft

largest desktop OS vendorcompanyhigh typicality p(c|e) high typicality p(e|c)

Naive Approach 2 PMI[Manning and Schutze 1999]

bull Pointwise mutual information (PMI) is a measure of association used in information theory and statistics

bull Consider using the PMI between concept c and instance e to find the basic-level concepts as follows

119875119872119868(119890 119888) = log119875(119890 119888)

119875(119890)119875(119888)= log119875(119890|119888) minus log119875(119890)

bull However bull In basic level of categorization we are interested in finding a

concept for a given e which means P(e) is a constant

bull Thus ranking by PMI(e c) is the same as ranking by P(e|c)

Using Rep(e c) for BLC [Wang et al 2015b]

bull The measure 119877119890119901 119890 119888 = 119875(119888|119890) lowast 119875(119890|119888) means

bull (With PMI) If we take the logarithm of our scoring function we get

log119877119890119901 119890 119888 = log119875 119888 119890 lowast 119875(119890|119888) = log119875(119890 119888)

119875(119890)lowast119875(119890 119888)

119875(119888)= log

119875(119890 119888)2

119875(119890)119875(119888)= 119875119872119868 119890 119888 + log119875 119890 119888

= 1198751198721198682

bull (With Commute Time) The commute time between an instance e and a concept c is

119879119894119898119890(119890 119888) =

119896=1

infin

(2119896) lowast 119875119896(119890 119888) =

119896=1

119879

2119896 lowast 119875119896 119890 119888 +

119896=119879+1

infin

2119896 lowast 119875119896 119890 119888

ge σ119896=1119879 (2119896) lowast 119875119896(119890 119888) + 2(119879 + 1) lowast (1 minus σ119896=1

119879 119875119896(119890 119888)) = 4 minus 2 lowast 119877119890119901(119890 119888)

Given e the c should be its typical concept (shortest distance)

Given c the e should be its typical instance (shortest distance)

A process of finding concept nodes having shortest expected distance with e

PrecisionNDCGNo smoothing 1 2 3 5 10 15 20

MI(e) 0769 0692 0705 0685 0719 0705 0690

PMI3(e) 0885 0769 0756 0800 0754 0733 0721

NPMI(e) 0692 0692 0667 0638 0627 0610 0610

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0500 0462 0526 0523 0523 0510 0521

Rep(e) 0846 0865 0872 0862 0758 0731 0719

Smoothing=0001

MI(e) 0577 0615 0628 0600 0612 0605 0592

PMI3(e) 0731 0673 0692 0654 0669 0644 0623

NPMI(e) 0923 0827 0769 0746 0731 0695 0671

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0554

Typicality P(e|c) 0885 0865 0872 0831 0785 0741 0704

Rep(e) 0846 0731 0718 0723 0700 0669 0638

Smoothing=00001

MI(e) 0615 0615 0654 0608 0635 0628 0612

PMI3(e) 0846 0731 0731 0715 0723 0685 0677

NPMI(e) 0885 0904 0885 0869 0823 0777 0752

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0885 0904 0910 0877 0831 0813 0777

Rep(e) 0923 0846 0833 0815 0781 0736 0719

Smoothing=1e-5

MI(e) 0615 0635 0667 0662 0677 0656 0646

PMI3(e) 0885 0769 0744 0777 0758 0731 0710

NPMI(e) 0885 0846 0872 0869 0831 0810 0787

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0769 0808 0846 0823 0808 0782 0765

Rep(e) 0885 0904 0872 0862 0812 0800 0767

Smoothing=1e-6

MI(e) 0769 0673 0705 0677 0700 0692 0679

PMI3(e) 0885 0769 0756 0785 0773 0726 0723

NPMI(e) 0885 0846 0821 0815 0750 0726 0719

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0538 0615 0615 0615 0608 0613 0615

Rep(e) 0846 0885 0897 0877 0788 0777 0765

Smoothing=1e-7

MI(e) 0769 0692 0705 0685 0719 0703 0688

PMI3(e) 0885 0769 0756 0792 0758 0736 0725

NPMI(e) 0769 0750 0718 0700 0650 0641 0633

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0500 0481 0526 0523 0531 0523 0523

Rep(e) 0846 0865 0872 0854 0765 0749 0733

No Smoothing 1 2 3 5 10 15 20

MI(e) 0516 0531 0519 0531 0562 0574 0594

PMI3(e) 0725 0664 0652 0660 0628 0631 0646

NPMI(e) 0599 0597 0579 0554 0540 0539 0549

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0401 0386 0396 0398 0401 0410 0428

Rep(e) 0758 0771 0745 0723 0656 0647 0661

Smoothing=1e-3

MI(e) 0374 0414 0441 0448 0473 0481 0495

PMI3(e) 0484 0511 0509 0502 0519 0525 0533

NPMI(e) 0692 0652 0607 0603 0585 0585 0592

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0460

Typicality P(e|c) 0703 0697 0704 0681 0637 0628 0626

Rep(e) 0621 0580 0554 0561 0554 0555 0559

Smoothing=1e-4

MI(e) 0407 0430 0458 0462 0492 0503 0512

PMI3(e) 0648 0604 0579 0575 0578 0576 0590

NPMI(e) 0747 0777 0761 0737 0700 0685 0688

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0791 0795 0802 0767 0738 0729 0724

Rep(e) 0758 0714 0711 0689 0653 0636 0653

Smoothing=1e-5

MI(e) 0429 0465 0478 0501 0517 0528 0545

PMI3(e) 0725 0647 0642 0642 0627 0624 0638

NPMI(e) 0813 0779 0778 0765 0730 0723 0729

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0709 0728 0735 0722 0702 0696 0703

Rep(e) 0791 0787 0762 0739 0707 0703 0706

Smoothing=1e-6

MI(e) 0516 0510 0515 0526 0546 0563 0579

PMI3(e) 0725 0655 0651 0654 0641 0631 0649

NPMI(e) 0791 0766 0732 0728 0673 0659 0668

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0495 0516 0520 0508 0512 0521 0540

Rep(e) 0758 0784 0767 0755 0691 0686 0694

Smoothing=1e-7

MI(e) 0516 0531 0519 0530 0562 0571 0592

PMI3(e) 0725 0664 0652 0658 0630 0631 0647

NPMI(e) 0670 0655 0633 0604 0575 0570 0581

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0423 0421 0415 0407 0414 0424 0438

Rep(e) 0758 0771 0745 0725 0663 0661 0668

Evaluations on Different Measures for BLC

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

What is the Semantic Similaritybull Are the following instance pairs similar

bull ltapple microsoftgt

bull ltapple peargt

bull ltapple fruitgt

bull ltapple foodgt

bull ltapple ipadgt

bull ltcar journeygt

Approaches on Term Similarity

bull Categories of approaches for semantic similaritybull String based approach

bull Knowledge based approachbull Use preexisting thesauri taxonomy or encyclopedia such as

WordNet

bull Corpus based approachbull Use contexts of terms extracted from web pages web search

snippets or other text repositories

bull Embedding based approachbull Will introduce in detail in ldquoPart 3 Implicit Understandingrdquo

79

Approaches on Term Similarity (2)

bull Categories

80

Knowledge based approaches

(WordNet)

Corpus based

approaches

Path lengthlexical

chain-based

Information

content-based

Graph learning

algorithm basedSnippet search based

Rada

1989

Resnik

1995

Jcn

1997

Lin

1998

Saacutench

2011

Agirre

2010Alvarez

2007

String based

approaches

HunTray

2005

Hirst

1998

Do

2009

Bol

2011Chen

2006

State-of-the-art approaches

Ban

2002

bull Framework

83

Term Similarity Using Semantic Networks [Li et al 2013 Li et al 2015]

Term pairs ltt1 t2gt

Type Checking

Concept Pairs Entity Pairs

Entity-distribution Context Collection

Concept-distribution Context Collection

Concept-Entity Pairs

Concept Collection for the Entity Term t1

Similarity EvaluationCosine(T(t1) T(t2))

for each pairltt2cxgt

Context vector T(t1) and T(t2)

Get maxsim(t2cx) for ltt1 t2gt

End

End

Concept Clustering

Cluster Context vector Cx(t1) and Cy(t2)

Similarity Evaluation

Max(xy) Cosine(Cx(t1) Cy(t2))

End

Concept Clustering

for each Cluster Ci(t1)

Select top k Concept namely cx

Step 1 Type Checking

Step 2 Context Representation(Vector)

Step 3 Context Similarity

An example [Li et al 2013 Li et al 2015]

For example

ltbanana peargt

88

ltbanana peargt

Entity PairsType Checking

Concept Context Collection

Similarity Evaluation Cosine(T(t1) T(t2)) 0916

Step 1 Type Checking

Step 2 Context Representation(Vector)

Step 3 Context Similarity

ExamplesTerm 1 Term 2 Similarity

lunch dinner 09987

tiger jaguar 09792

car plane 09711

television radio 09465

technology company microsoft 08208

high impact sport competitive sport 08155

employer large corporation 05353

fruit green pepper 02949

travel meal 00426

music lunch 00116

alcoholic beverage sports equipment 00314

company table tennis 00003

96httpadaptseieesjtueducnsimilaritySimCompleteResultspdf

Statistics of Search Queries

44

29

17

7

2 1

1 Term2 Terms3 Terms4 Terms5 Termsmore than 5 Terms

10

26

34

19

74

1 Term2 Terms3 Terms4 Terms5 Termsmore than 5 Terms

(a) By traffic

(b) By of distinct queries

Pokeacutemon Go Microsoft HoloLens

Instance 1 Instance 21 Instance2 Instances3 Instances4 Instances5 InstancesMore than 5 Instances

1 Instance2 Instances3 Instances4 Instances5 InstancesMore than 5 Instances

If the short text has context for the instancehellip

bull python tutorialbull dangerous pythonbull moon earth distancebull hellip

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Supervised Segmentation [Bergsma et al 2007]

bull Problem divide query into semantic units

bull Approach turn segmentation into position-based binary classification

Example Query

Two man power saw

[two man] [power saw][two] [man] [power saw][two] [man power] [saw]

Input a query and its positions

Output the decision for making segmentation at each position

Supervised Segmentation

• Features
  • Decision boundary features: e.g., indicator features such as the POS tags at the position, and forward/backward position features
  • Statistical features: e.g., mutual information between the left and right parts ("bank loan | amortization schedule")
  • Context features: context information (e.g., "female" in "female bus driver")
  • Dependency features: e.g., "female" depends on "driver" in "female bus driver"

Supervised Segmentation

• Segmentation overview

(figure) Input query "two man power saw" → learning features at each position (e.g., "two man power | saw") → SVM classifier → output: segmentation decision for each position (yes/no)
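A toy sketch of this position-based binary classification with a linear SVM follows; the features, values, and training pairs are invented and are not Bergsma et al.'s actual feature set.

```python
from sklearn.svm import LinearSVC

# Each example describes one break position between two adjacent words:
# [mutual information of left/right parts, left word is adjective?, right word is noun?]
X_train = [
    [0.1, 0, 1],   # "two | man"     -> break (label 1)
    [2.5, 0, 1],   # "power | saw"   -> keep together (label 0)
    [0.2, 1, 1],   # "cheap | hotel" -> break
    [3.1, 0, 1],   # "new | york"    -> keep together
]
y_train = [1, 0, 1, 0]

clf = LinearSVC()
clf.fit(X_train, y_train)

# Decide each position of "two man power saw": (two|man), (man|power), (power|saw)
X_query = [[0.1, 0, 1], [0.3, 0, 1], [2.7, 0, 1]]
print(clf.predict(X_query))   # e.g. [1 1 0] -> segmentation [two] [man] [power saw]
```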

Unsupervised Segmentation [Tan et al. 2008]

• Unsupervised learning for query segmentation

Probability of a generated segmentation S (a sequence of segments s_i) for query Q:

    P(S|Q) = P(s_1) · P(s_2|s_1) ⋯ P(s_m|s_1 s_2 ⋯ s_{m-1}) ≈ ∏_{s_i ∈ S} P(s_i)    (unigram model)

A split point is a valid segment boundary if and only if the pointwise mutual information between the two segments resulting from the split is negative:

    MI(s_k, s_{k+1}) = log [ P_c([s_k s_{k+1}]) / (P_c(s_k) · P_c(s_{k+1})) ] < 0

Example: "new york | times subscription" (segments s_1, s_2). Since log [ P_c([new york]) / (P_c(new) · P_c(york)) ] > 0, there is no segment boundary between "new" and "york".

Unsupervised Segmentation

• Find the top-k segmentations by dynamic programming
• Use EM to optimize the segment probabilities on the fly

Input: query w_1 w_2 … w_n (the words in the query) and a concept probability distribution
Output: top-k segmentations with the highest likelihood
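The dynamic program for the unigram model can be sketched as below; the segment probabilities are illustrative stand-ins for corpus estimates, and this is a simplification of the actual algorithm (no EM, single best segmentation only).

```python
import math

P = {  # hypothetical unigram probabilities of candidate segments
    "new": 1e-3, "york": 1e-4, "times": 1e-3, "subscription": 1e-4,
    "new york": 5e-4, "new york times": 2e-4, "york times": 1e-6,
}

def best_segmentation(words, max_len=3):
    n = len(words)
    # best[i] = (log-probability, segments) for the prefix words[:i]
    best = [(0.0, [])] + [(-math.inf, None)] * n
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            seg = " ".join(words[j:i])
            if seg in P and best[j][0] > -math.inf:
                score = best[j][0] + math.log(P[seg])
                if score > best[i][0]:
                    best[i] = (score, best[j][1] + [seg])
    return best[n]

print(best_segmentation("new york times subscription".split()))
# -> segments like ['new york times', 'subscription'] with these probabilities
```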

Exploit Click-through [Li et al. 2011]

• Motivation
  • Probabilistic query segmentation
  • Use click-through data (Q → URL → D: query, document, click data)

Input query: "bank of america online banking"
Output: top-3 segmentations
  [bank of america] [online banking]       0.502
  [bank of america online banking]         0.428
  [bank of] [america] [online banking]     0.001

Exploit Click-through

• Segmentation Model: an interpolated model combining global info and click-through info

Example query: "[credit card] [bank of america]"
Clicked html documents:
  1. bank of america credit cards contact us overview
  2. secured visa credit card from bank of america
  3. credit cards overview find the right bank of america credit card for you
The documents supply both the global info and the click-through info used in the interpolation.

Short Text Understanding

• How to segment this short text?
• What does this short text mean (its intent, senses, or concepts)?
• What are the relations among terms in the short text?
• How to calculate the similarity between short texts?

Sense Changes with Different Context

watch harry potter → Movie    read harry potter → Book    age harry potter → Character    harry potter walkthrough → Game

Entity Recognition in Query [Guo et al. 2009]

• Motivation: detect the named entity in a short text and categorize it

Example: "harry potter walkthrough" is a single-named-entity query, interpreted as the triple <e, t, c> = ("harry potter", "walkthrough", "game"), where e is the ambiguous term (the named entity), t the context term(s), and c the class of the entity.

Entity Recognition in Query

• Probabilistic Generative Model

Goal: given a query q, find the triple <e, t, c> that maximizes the probability of generating the triple.

The probability to generate a triple factorizes as Pr(e, t, c) = Pr(e) · Pr(c|e) · Pr(t|c), assuming the context only depends on the class; e.g., "walkthrough" only depends on "game" instead of "harry potter".

Objective: given query q, find the triple <e, t, c> maximizing Pr(e) · Pr(c|e) · Pr(t|c).

The problem then becomes how to estimate Pr(e), Pr(c|e), and Pr(t|c).
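A small sketch of scoring candidate triples with this factorization follows; all probabilities are invented for illustration, and real systems estimate them from query logs (see the learning slide next).

```python
# Score <e, t, c> by Pr(e) * Pr(c|e) * Pr(t|c) and return the best triple.
P_e = {"harry potter": 0.7, "harry": 0.3}
P_c_given_e = {"harry potter": {"book": 0.5, "movie": 0.3, "game": 0.2}}
P_t_given_c = {"game": {"walkthrough": 0.4},
               "book": {"walkthrough": 0.001},
               "movie": {"walkthrough": 0.01}}

def best_triple(query):
    candidates = []
    for e in P_e:                                # candidate named entities in the query
        if e in query:
            t = query.replace(e, "").strip()     # remaining context term(s)
            for c, p_ce in P_c_given_e.get(e, {}).items():
                p = P_e[e] * p_ce * P_t_given_c.get(c, {}).get(t, 1e-9)
                candidates.append((p, e, t, c))
    return max(candidates)

print(best_triple("harry potter walkthrough"))
# -> (0.056, 'harry potter', 'walkthrough', 'game'): Pr(walkthrough|game) dominates.
```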

Entity Recognition in Query

• Probability Estimation by Learning

Learning objective:  max ∏_{i=1}^{N} P(e_i, t_i, c_i)

Challenge: it is difficult as well as time consuming to manually assign class labels c_i to named entities in queries.

Instead, build a training set T = {(e_i, t_i)} and view c_i as a hidden variable. The new learning problem is

    max ∏_{i=1}^{N} P(e_i, t_i) = max ∏_{i=1}^{N} Σ_c P(e_i) · P(c|e_i) · P(t_i|c)

which is solved with a topic model, WS-LDA.

Signal from Click [Pantel et al. 2012]

• Motivation: predict the entity type in Web search, jointly modeling the entity, user intent, context, and click

• A generative model over a query type distribution (73 entity types)

(figure: graphical-model sketch relating entity types T, contexts, and clicks)

Signal from Click

• Joint Model for Prediction

(figure: plate diagram over Q queries with variables t, τ, i, n, c and distributions θ, φ, ω) For each query: pick a type from the distribution over types, pick an entity from the entity distribution, pick an intent from the intent distribution, pick context words from the word distribution, and pick a click from the host distribution.

Telegraphic Query Interpretation [Sawant et al. 2013, Joshi et al. 2014]

• Entity-seeking telegraphic queries
• Interpretation = Segmentation + Annotation
• Resources: a knowledge base (accuracy) and a large corpus (recall)

Example: query "Germany capital" → result entity: Berlin

Joint Interpretation and Ranking [Sawant et al. 2013, Joshi et al. 2014]

• Overview

(figure) A telegraphic query and an annotated corpus feed two models for interpretation and ranking, a generative model and a discriminative model, whose output is a ranked list of entities e1, e2, e3, …

Joint Interpretation and Ranking [Sawant et al. 2013]

• Generative Model: based on probabilistic language models

(figure, borrowed from U. Sawant (2013)) For the query q = "losing team baseball world series 1998", a switch variable Z routes each query word to either the type model or the context model: the type hint "baseball team" matches the answer type T (San Diego Padres, a major league baseball team), while context matchers align "lost 1998 world series" with corpus evidence E such as "Padres have been to two World Series, losing in 1984 and 1998".

Joint Interpretation and Ranking [Sawant et al. 2013]

• Discriminative Model: based on max-margin discriminative learning

(figure) For the query "losing team baseball world series 1998", candidate interpretations pair an entity with a target type: the correct entity San_Diego_Padres with t = baseball team, versus an incorrect entity such as 1998_World_Series with t = series.

Telegraphic Query Interpretation [Joshi et al. 2014]

• Queries seek answer entities (e2)
• They contain (query) entities (e1), target types (t2), relations (r), and selectors (s)

query                      | e1           | r                  | t2                 | s
dave navarro first band    | dave navarro | band               | band               | first
dave navarro first band    | dave navarro | -                  | band               | first
spider automobile company  | spider       | automobile company | automobile company | -
spider automobile company  | -            | automobile company | company            | spider

(Borrowed from M. Joshi (2014))

Improved Generative Model

• Generative Model [Sawant et al. 2013] → [Joshi et al. 2014]: additionally consider e1 (in q) and r

Improved Discriminative Model

• Discriminative Model [Sawant et al. 2013] → [Joshi et al. 2014]: additionally consider e1 (in q) and r

Understand Short Texts with A Multi-tiered Model [Hua et al. 2015 (ICDE Best Paper)]

• Input: a short text
• Output: semantic interpretation
• Three steps in understanding a short text, e.g., "wanna watch eagles band":
  • Step 1: Text Segmentation: divide the text into a sequence of terms in the vocabulary ("watch eagles band")
  • Step 2: Type Detection: determine the best type of each term (watch[verb] eagles[entity] band[concept])
  • Step 3: Concept Labeling: infer the best concept of each entity within context (watch[verb] eagles[entity](band) band[concept])

Text segmentation

• Observations
  • Mutual Exclusion: terms containing the same word mutually exclude each other
  • Mutual Reinforcement: related terms mutually reinforce each other
• Build a Candidate Term Graph (CTG)

(figure: CTGs for "vacation april in paris", with candidate terms vacation, april, paris, april in paris, and for "watch harry potter", with candidate terms watch and harry potter, connected by weighted edges such as 0.029, 0.005, 0.047, 0.041, 0.014, 0.092, 0.053, 0.018)

Find best segmentation

• Best segmentation = the sub-graph of the CTG which
  • is a complete graph (clique), i.e., it is a segmentation
  • contains no mutual exclusion
  • has 100% word coverage (except for stopwords)
  • has the largest average edge weight
• In other words, the best segmentation is a maximal clique of the CTG (see the sketch below).

(figure: the CTGs for "vacation april in paris" and "watch harry potter" with the best-segmentation cliques highlighted)
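The following is a rough sketch (not the paper's implementation) of picking the best segmentation as the maximal clique with full word coverage and the largest average edge weight; the candidate terms and edge weights are illustrative.

```python
import itertools
import networkx as nx

query_words = {"vacation", "april", "paris"}   # stopword "in" ignored

G = nx.Graph()
# candidate terms that do not mutually exclude each other get a weighted edge
G.add_edge("vacation", "april", weight=0.047)
G.add_edge("vacation", "paris", weight=0.041)
G.add_edge("vacation", "april in paris", weight=0.005)
G.add_edge("april", "paris", weight=0.029)

def words_of(term):
    return set(term.split()) - {"in"}

def avg_weight(clique):
    edges = list(itertools.combinations(clique, 2))
    if not edges:
        return 0.0
    return sum(G[u][v]["weight"] for u, v in edges) / len(edges)

best = max(
    (c for c in nx.find_cliques(G)                                   # maximal cliques
     if set().union(*(words_of(t) for t in c)) == query_words),      # 100% coverage
    key=avg_weight,
)
print(best)   # e.g. ['vacation', 'april', 'paris'] with these illustrative weights
```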

Type Detection

• Pairwise Model
  • Find the best typed-term for each term so that the Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight (a toy version is sketched below)

(figure: for "watch free movie", candidate typed-terms watch[v] / watch[e] / watch[c], free[adj] / free[v], and movie[c] / movie[e])
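Here is a brute-force sketch of the pairwise type-detection idea with toy affinity scores (not the paper's trained scores): choose one typed-term per term so that the maximum spanning tree over the chosen typed-terms has the largest total weight.

```python
import itertools
import networkx as nx

candidates = {            # possible typed-terms for each term of "watch free movie"
    "watch": ["watch[v]", "watch[e]"],
    "free":  ["free[adj]", "free[v]"],
    "movie": ["movie[c]", "movie[e]"],
}
score = {                 # pairwise affinity between typed-terms (illustrative)
    ("watch[v]", "movie[c]"): 0.9, ("watch[v]", "free[adj]"): 0.4,
    ("free[adj]", "movie[c]"): 0.7, ("watch[e]", "movie[e]"): 0.2,
    ("watch[v]", "movie[e]"): 0.3, ("free[v]", "movie[c]"): 0.1,
}

def mst_weight(typed_terms):
    G = nx.Graph()
    G.add_nodes_from(typed_terms)
    for u, v in itertools.combinations(typed_terms, 2):
        w = score.get((u, v), score.get((v, u), 0.0))
        G.add_edge(u, v, weight=w)
    T = nx.maximum_spanning_tree(G)
    return sum(d["weight"] for _, _, d in T.edges(data=True))

best = max(itertools.product(*candidates.values()), key=mst_weight)
print(best)   # -> ('watch[v]', 'free[adj]', 'movie[c]') with these scores
```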

Concept Labeling

• Entity disambiguation is the most important task of concept labeling
  • Filter / re-rank the original concept cluster vector
• Weighted-Vote
  • The final score of each concept cluster is a combination of its original score and the support from context, using concept co-occurrence (see the sketch below)

watch harry potter → movie      read harry potter → book
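A minimal sketch of the weighted-vote combination follows; the concept scores, co-occurrence support, and interpolation weight are all invented, not the paper's tuned values.

```python
# Combine the original conceptualization score with context support via co-occurrence.
original = {"movie": 0.45, "book": 0.40, "character": 0.15}   # clusters for "harry potter"
context_support = {"movie": 0.8, "book": 0.1, "character": 0.1}  # co-occurrence with "watch"
alpha = 0.5   # interpolation weight (an assumption)

final = {c: alpha * s + (1 - alpha) * context_support.get(c, 0.0)
         for c, s in original.items()}
print(max(final, key=final.get))   # -> "movie" for "watch harry potter"
```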

Example of Entity Disambiguation [Hua et al. 2015 (ICDE Best Paper), Hua et al. 2016]

(figure: pipeline) Short Text → Parsing (head/modifier analysis, term clustering by isA, concept filtering by co-occurrence, concept orthogonalization) → Conceptualization, driven by a semantic (isA) network and a co-occurrence network, producing a concept vector [(c1, p1), (c2, p2), (c3, p3), …]

(figure: example "ipad apple") "apple" initially maps to concepts such as fruit…, company…, food…, product…, and "ipad" to product…, device…; isA filtering plus co-occurrence with "ipad" keeps company…, brand…, product…, device… and drops the fruit/food senses.

Mining Lexical Relationships [Wang et al. 2015b]

• Lexical knowledge is represented by probabilities; for "watch harry potter" (with candidate labels verb, product, book, movie) these include p(verb|watch), p(instance|watch), p(movie|harry potter), p(book|harry potter), and p(movie|watch, verb)
• In general the model uses p(z|t), p(c|t, z), and p(c|e) = p(c|t, z = instance), where e = instance, t = term, c = concept, z = role

Understanding Queries [Wang et al. 2015b]

• Goal: rank the concepts and find argmax_c p(c|t, q)

(figure) The query is expanded into all possible segmentations, which select an online subgraph of the offline semantic network; concepts are then ranked by random walk with restart [Sun et al. 2005] on the online subgraph (a toy RWR is sketched below).
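Below is a compact sketch of random walk with restart on a small toy subgraph; the nodes, transition weights, and restart vector are invented and only illustrate the ranking mechanism, not the actual semantic network.

```python
import numpy as np

nodes = ["watch", "harry potter", "movie", "book", "verb"]
A = np.array([   # A[i, j] = transition probability from node j to node i (toy values)
    [0.0, 0.1, 0.3, 0.1, 0.9],
    [0.2, 0.0, 0.6, 0.8, 0.0],
    [0.5, 0.5, 0.0, 0.1, 0.1],
    [0.1, 0.4, 0.1, 0.0, 0.0],
    [0.2, 0.0, 0.0, 0.0, 0.0],
])
A = A / A.sum(axis=0)                           # make columns stochastic

restart = np.array([0.5, 0.5, 0.0, 0.0, 0.0])   # restart mass on the query terms
c = 0.15                                        # restart probability
p = restart.copy()
for _ in range(100):                            # power iteration until near convergence
    p = (1 - c) * A @ p + c * restart

print(sorted(zip(nodes, p), key=lambda x: -x[1]))
# with these weights, "movie" ranks above "book" for "watch harry potter"
```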

Short Text Understanding

• How to segment this short text?
• What does this short text mean (its intent, senses, or concepts)?
• What are the relations among terms in the short text?
• How to calculate the similarity between short texts?

Head, Modifier, and Constraint Detection in Short Texts [Wang et al. 2014b]

• Example: "popular smart cover iphone 5s"
• Definitions
  • Head: names the general (semantic) category to which the whole short text belongs; usually the head is the intent of the short text ("smart cover" is the intent of the query)
  • Constraints: distinguish this member from other members of the same category ("iphone 5s" limits the type of the head)
  • Non-Constraint Modifiers (aka Pure Modifiers): subjective modifiers that can be dropped without changing the intent ("popular" is subjective and can be neglected)

Non-Constraint Modifiers Mining: Construct Modifier Networks

Edges form a Modifier Network.

(figure: Concept Hierarchy Tree in the "Country" domain: Country, with modified concepts such as Asian country, Developed country, Western country, Western developed country, Large Asian country, Large developed country, Top developed country, Top western country)

(figure: Modifier Network in the "Country" domain, with nodes Country, Asian, Western, Developed, Large, Top; in this case "Large" and "Top" are pure modifiers)

Non-Constraint Modifiers Mining: Betweenness Centrality

• Betweenness centrality is a measure of a node's centrality in a network
• The betweenness of node v is defined as g(v) = Σ_{s ≠ v ≠ t} σ_st(v) / σ_st, where σ_st is the total number of shortest paths from node s to node t and σ_st(v) is the number of those paths that pass through v
• Normalization & aggregation
• A pure modifier should have a low betweenness-centrality aggregation score PMS(t)
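A brief sketch using networkx on a toy modifier network follows; the graph is illustrative, and the aggregation into PMS(t) is omitted.

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("country", "asian"), ("country", "western"), ("country", "developed"),
    ("asian", "developed"), ("western", "developed"),
    ("large", "country"), ("top", "country"),   # pure modifiers hang off the side
])

bc = nx.betweenness_centrality(G, normalized=True)
for node, score in sorted(bc.items(), key=lambda x: x[1]):
    print(f"{node:10s} {score:.3f}")
# "large" and "top" get betweenness 0.0, consistent with being pure modifiers.
```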

Head-Constraints Mining [Wang et al. 2014b]

• A term can be a head in some cases and a constraint in others
• E.g., "Seattle hotel" (head: hotel, constraint: Seattle) vs. "Seattle hotel job" (head: job, constraints: Seattle, hotel)

Head-Constraints Mining: Acquiring Concept Patterns

(figure: building the concept pattern dictionary from query logs)
• Get entity pairs from the query log by extracting preposition patterns: "A for B", "A of B", "A with B", "A in B", "A on B", "A at B", …; e.g., "cover for iphone 6s", "battery for sony a7r", "wicked on broadway", giving entity1 (head) and entity2 (constraint)
• Conceptualization: map entity1 to concepts concept11, concept12, concept13, concept14 and entity2 to concept21, concept22, concept23
• Emit concept patterns for each pair, (concept11, concept21), (concept11, concept22), (concept11, concept23), …, into the Concept Pattern Dictionary

Why Concepts Can't Be Too General
• It may cause too many concept pattern conflicts: we can't distinguish head and modifier for general concept pairs

Derived concept pattern <head = device, modifier = company>, supported by entity pairs: (iphone 4, verizon), (modem, comcast), (wireless router, comcast), (iphone 4, tmobile)

Derived concept pattern <head = company, modifier = device>, supported by entity pairs: (amazon books, kindle), (netflix, touchpad), (skype, windows phone), (netflix, ps3)

→ Conflict

Why Concepts Can't Be Too Specific
• It may generate concepts with little coverage
  • The concept regresses to the entity
  • Large storage space: up to (million × million) patterns

Examples: (device, largest desktop OS vendor), (device, largest software development company), (device, largest global corporation), (device, latest windows and office provider), …

Basic-level Conceptualization (BLC) is a good choice [Wang et al. 2015b]

Top Concept Patterns:

Cluster size | Sum of cluster score | head / constraint                 | score
615          | 2114.691             | breed / state                     | 3.57298460224501
296          | 775.2357             | game / platform                   | 6.27403476771856
153          | 346.6804             | accessory / vehicle               | 5.3393705094809
70           | 118.259              | browser / platform                | 1.32612807637391
22           | 101.0993             | requirement / school              | 2.71407526294823
34           | 94.89159             | drug / disease                    | 1.54602405333541
42           | 89.92995             | cosmetic / skin condition         | 8.14659415003929
16           | 74.21599             | job / city                        | 2.7903732555528
32           | 71.0403              | accessory / phone                 | 2.46513830851194
18           | 66.92376             | software / platform               | 2.10126322725878
20           | 64.44603             | test / disease                    | 2.39774028397537
27           | 59.94205             | clothes / breed                   | 9.8773996282851
19           | 59.13545             | penalty / crime                   | 2.00544192793488
25           | 58.48804             | tax / state                       | 2.40081818612579
16           | 54.65424             | sauce / meat                      | 1.83592863621553
18           | 48.09389             | credit card / country             | 1.42919087972152
14           | 47.30792             | food / holiday                    | 1.4554140330924
11           | 45.36199             | mod / game                        | 2.57163856882439
29           | 43.50954             | garment / sport                   | 4.71533326845442
23           | 39.94886             | career information / professional | 7.32726483731257
15           | 38.6065              | song / instrument                 | 1.28189481818135
18           | 37.8213              | bait / fish                       | 7.80426514113169
22           | 37.22948             | study guide / book                | 5.08339765053921
19           | 34.08953             | plugins / browser                 | 5.50326072627126
14           | 33.05753             | recipe / meat                     | 8.82779863422951
18           | 32.14226             | currency / country                | 1.10825444188352
13           | 31.80272             | lens / camera                     | 1.86081673263957
9            | 31.6973              | decoration / holiday              | 1.30055844126533
16           | 31.4875              | food / animal                     | 7.338544366514

Example: the game / platform cluster contains concept patterns such as (game, platform), (game, device), (video game, platform), (game console, game pad), (game, gaming platform), supported by entity pairs:

Game (Head)   | Platform (Modifier)
angry birds   | android
angry birds   | ios
angry birds   | windows 10
…             | …

Head Modifier Relationship Detection

• Train a classifier on (head-embedding, modifier-embedding)
• Training data
  • Positive: (head, modifier)
  • Negative: (modifier, head)
• Precision >= 0.9, Recall >= 0.9
• Disadvantage: not interpretable
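A sketch of this classifier setup follows, with random stand-in embeddings and a handful of hypothetical training pairs; it only illustrates the data layout (concatenated head/modifier vectors, reversed pairs as negatives), not the reported accuracy.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
dim = 50
emb = {w: rng.normal(size=dim)
       for w in ["game", "platform", "accessory", "vehicle", "job", "city"]}

head_modifier_pairs = [("game", "platform"), ("accessory", "vehicle"), ("job", "city")]

X, y = [], []
for h, m in head_modifier_pairs:
    X.append(np.concatenate([emb[h], emb[m]])); y.append(1)   # positive: (head, modifier)
    X.append(np.concatenate([emb[m], emb[h]])); y.append(0)   # negative: (modifier, head)

clf = LogisticRegression(max_iter=1000).fit(np.array(X), y)
test = np.concatenate([emb["game"], emb["platform"]]).reshape(1, -1)
print(clf.predict(test))   # 1 -> "game" is predicted to be the head of (game, platform)
```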

Syntactic Parsing based on HM

• Head/modifier information alone is incomplete
  • Prepositions and other function words
  • Within a noun compound: "el capitan macbook pro"
• Why not train a parser for web queries?

Syntactic Parsing of Short Texts [Sun et al. EMNLP 2016]

• Syntactic structures are valuable for short text understanding
• Examples

Challenges: Short Texts Lack Grammatical Signals
• Lack of function words and word order
  • "toys queries" has ambiguous intent
  • "distance earth moon" has clear intent, but many equivalent forms: "earth moon distance", "earth distance moon", …

Challenges: Syntactic Parsing of Queries

• No standard
• No ground-truth

Why is syntactic parsing of queries even a legitimate problem?

Derive Syntax from Semantics [Sun et al. 2016]

• Query: "thai food houston"
• Clicked sentence
• Project the dependency structure of the clicked sentence onto the query

A Treebank for Short Texts

bull Given query 119902

bull Given 119902rsquos clicked sentence 119904

bull Parse each 119904

bull Project dependency from 119904 to 119902

bull Aggregate dependencies

Algorithm of Projection

Result Examples

Results

• Random queries:                  QueryParser UAS 0.83, LAS 0.75;  Stanford UAS 0.72, LAS 0.64
• Queries with no function words:  QueryParser UAS 0.82, LAS 0.73;  Stanford UAS 0.70, LAS 0.61
• Queries with function words:     QueryParser UAS 0.90, LAS 0.85;  Stanford UAS 0.86, LAS 0.80

Short Text Understanding

• How to segment this short text?
• What does this short text mean (its intent, senses, or concepts)?
• What are the relations among terms in the short text?
• How to calculate the similarity between short texts?

Short Text Similarity Using Word Embedding [Kenter and Rijke 2015]

• Measuring similarity between two short texts or sentences
• Basic idea: word-by-word comparison using embedding vectors
• Use a saliency-weighted semantic graph to compute similarity

Short Text Similarity Using Word Embedding [Kenter and Rijke 2015]

Features acquired: bins of all edges and bins of max edges of the saliency-weighted semantic graph between the two short texts.

Similarity measurement (inspired by BM25), for a longer short text s_l, a shorter short text s_s, and terms w:

    f_sts(s_l, s_s) = Σ_{w ∈ s_l} IDF(w) · [ sem(w, s_s) · (k_1 + 1) ] / [ sem(w, s_s) + k_1 · (1 - b + b · |s_s| / avgsl) ]

where sem(w, s_s) is the semantic similarity of term w to the short text s_s.
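A simplified sketch of this saliency-weighted score follows; sem(w, s_s) is approximated here as the maximum cosine between w and any word of s_s, and the tiny word vectors and IDF values are invented for illustration.

```python
import numpy as np

vec = {   # toy word embeddings
    "cheap": np.array([0.9, 0.1]), "flight": np.array([0.1, 0.9]),
    "low":   np.array([0.8, 0.2]), "cost":  np.array([0.7, 0.3]),
    "airline": np.array([0.2, 0.9]),
}
idf = {"cheap": 1.2, "flight": 1.0, "low": 1.1, "cost": 1.1, "airline": 1.3}

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def sem(w, s):
    return max(cos(vec[w], vec[w2]) for w2 in s)

def f_sts(s_l, s_s, k1=1.2, b=0.75, avgsl=3.0):
    score = 0.0
    for w in s_l:
        s = sem(w, s_s)
        score += idf[w] * (s * (k1 + 1)) / (s + k1 * (1 - b + b * len(s_s) / avgsl))
    return score

print(f_sts(["cheap", "flight"], ["low", "cost", "airline"]))
```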

From the Concept View [Wang et al. 2015a]

(figure) Short Text 1 and Short Text 2 each go through Parsing (head/modifier analysis, term clustering by isA, concept filtering by co-occurrence, concept orthogonalization) and Conceptualization, backed by a semantic network and a co-occurrence network, yielding bags of concepts: Concept Vector 1 [(c1, score1), (c2, score2), …] and Concept Vector 2 [(c1', score1'), (c2', score2'), …], whose similarity is then computed.

Outline

• Knowledge Bases
• Explicit Representation Models
• Applications

Applications

• Explicit short text understanding benefits a lot of application scenarios
  • Ads/search semantic match
  • Definition mining
  • Query recommendation
  • Web table understanding
  • Semantic search
  • …

Ads Keyword Selection [Wang et al. 2015a]

(figure: bar charts of ads keyword selection performance by decile, Decile 4 through Decile 10, for Mainline Ads and Sidebar Ads)

Definition Mining [Hao et al. 2016]

• Definition scenarios: search engines, QnA, etc.
• Why conceptualization is useful for definition mining; example: "What is Emphysema?"
  • Answer 1: "Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year."
  • Answer 2: "Emphysema is an incurable, progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing."
• This sentence has the form of a definition
• Embedding is helpful to some extent, but it also returns high similarity scores for both (emphysema, disease) and (emphysema, smoking)
• Conceptualization can provide strong semantics
• Contextual embedding can also provide semantic similarity beyond is-A

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al. 2014a]

(figure: system overview) Offline: training data per class (Class 1 … Class i … Class N) is conceptualized against the knowledge base, concept weighting is applied, and model learning produces a concept model per class (Model 1 … Model i … Model N). Online: an original short text (e.g., "justin bieber graduates") goes through entity extraction, conceptualization into a concept vector, candidate generation, and classification & ranking against the learned models, producing outputs such as <Music, Score>.

Concept based Short Text Classification and Ranking [Wang et al. 2014a]

(figures: concept space view) Categories such as TV, Music, and Movie are represented in a concept space; article titles/tags in each category map to concept points p_i, p_j, and each category is summarized by weighted concept representations ω_i, ω_j. A query is mapped into the same concept space and compared against the category representations.

Precision performance on each category [Wang et al. 2014a]

Category | BocSTC | LM_ch | SVM  | VSM_cosine | LM_d | Entity_ESA
Movie    | 0.71   | 0.91  | 0.84 | 0.81       | 0.72 | 0.56
Money    | 0.97   | 0.95  | 0.54 | 0.57       | 0.52 | 0.74
Music    | 0.97   | 0.90  | 0.88 | 0.73       | 0.68 | 0.58
TV       | 0.96   | 0.46  | 0.92 | 0.56       | 0.51 | 0.55

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

• [Stark et al. 1998] Michael M. Stark and Richard F. Riesenfeld. WordNet: An Electronic Lexical Database. Proceedings of the 11th Eurographics Workshop on Rendering, 1998.
• [Banko et al. 2007] Michele Banko, Michael J. Cafarella, Stephen Soderland, Matt Broadhead, and Oren Etzioni. Open Information Extraction from the Web. In IJCAI 2007.
• [Etzioni et al. 2011] Oren Etzioni, Anthony Fader, Janara Christensen, Stephen Soderland, and Mausam. Open Information Extraction: The Second Generation. In IJCAI, vol. 11, pp. 3-10, 2011.
• [Carlson et al. 2010] A. Carlson, J. Betteridge, B. Kisiel, B. Settles, E. R. Hruschka Jr., and T. M. Mitchell. Toward an Architecture for Never-Ending Language Learning. In Proceedings of the Conference on Artificial Intelligence (AAAI), 2010.
• [Wu et al. 2012] Wentao Wu, Hongsong Li, Haixun Wang, and Kenny Zhu. Probase: A Probabilistic Taxonomy for Text Understanding. In ACM International Conference on Management of Data (SIGMOD), May 2012.
• [Bollacker et al. 2008] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In SIGMOD 2008.
• [Auer et al. 2007] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary G. Ives. DBpedia: A Nucleus for a Web of Open Data. In ISWC/ASWC 2007.

References

• [Suchanek et al. 2007] Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. Yago: a core of semantic knowledge. In WWW 2007.
• [Wu et al. 2015] Sen Wu, Ce Zhang, Christopher De Sa, Jaeho Shin, Feiran Wang, and C. Ré. Incremental Knowledge Base Construction Using DeepDive. In VLDB 2015.
• [Navigli et al. 2012] R. Navigli and S. Ponzetto. BabelNet: The Automatic Construction, Evaluation and Application of a Wide-Coverage Multilingual Semantic Network. In Artificial Intelligence, 2012.
• [Nastase et al. 2010] Vivi Nastase, Michael Strube, Benjamin Börschinger, Cäcilia Zirn, and Anas Elghafari. WikiNet: A very large scale multi-lingual concept network. In LREC 2010.
• [Speer et al. 2013] Robert Speer and Catherine Havasi. ConceptNet 5: A large semantic network for relational knowledge. The People's Web Meets NLP, Springer Berlin Heidelberg, 2013.
• [Hua et al. 2016] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng, and Xiaofang Zhou. "Understand Short Texts by Harvesting and Analyzing Semantic Knowledge." IEEE Transactions on Knowledge and Data Engineering (TKDE), 2016.
• [Hua et al. 2015] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng, and Xiaofang Zhou. Short Text Understanding Through Lexical-Semantic Analysis. In International Conference on Data Engineering (ICDE), April 2015.

References

• [Li et al. 2013] Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang, and Xindong Wu. Computing term similarity by large probabilistic isA knowledge. In ACM International Conference on Information and Knowledge Management (CIKM), 2013.
• [Li et al. 2015] Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang, Xue-Gang Hu, and Xindong Wu. A Large Probabilistic Semantic Network based Approach to Compute Term Similarity. IEEE Transactions on Knowledge and Data Engineering (TKDE), 27(10): 2604-2617, 2015.
• [Rosch et al. 1976] Eleanor Rosch, Carolyn B. Mervis, Wayne D. Gray, David M. Johnson, and Penny Boyes-Braem. Basic objects in natural categories. Cognitive Psychology, 8(3): 382-439, 1976.
• [Manning and Schutze 1999] Christopher D. Manning and Hinrich Schutze. Foundations of statistical natural language processing. Volume 999, MIT Press, 1999.
• [Wang et al. 2015b] Zhongyuan Wang, Kejun Zhao, Haixun Wang, Xiaofeng Meng, and Ji-Rong Wen. Query Understanding through Knowledge-Based Conceptualization. In IJCAI, July 2015.
• [Bergsma et al. 2007] Shane Bergsma and Qin Iris Wang. Learning Noun Phrase Query Segmentation. In EMNLP-CoNLL 2007: 819-826.
• [Tan et al. 2008] Bin Tan and Fuchun Peng. Unsupervised query segmentation using generative language models and Wikipedia. In WWW 2008: 347-356.

References

• [Li et al. 2011] Yanen Li, Bo-June Paul Hsu, ChengXiang Zhai, and Kuansan Wang. Unsupervised query segmentation using clickthrough for information retrieval. In SIGIR 2011: 285-294.
• [Guo et al. 2009] Jiafeng Guo, Gu Xu, Xueqi Cheng, and Hang Li. Named entity recognition in query. In SIGIR 2009: 267-274.
• [Pantel et al. 2012] Patrick Pantel, Thomas Lin, and Michael Gamon. Mining Entity Types from Query Logs via User Intent Modeling. In ACL 2012: 563-571.
• [Joshi et al. 2014] Mandar Joshi, Uma Sawant, and Soumen Chakrabarti. Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries. In EMNLP 2014: 1104-1114.
• [Sawant et al. 2013] Uma Sawant and Soumen Chakrabarti. Learning joint query interpretation and response ranking. In WWW 2013: 1099-1110.
• [Wang et al. 2014b] Zhongyuan Wang, Haixun Wang, and Zhirui Hu. Head, Modifier, and Constraint Detection in Short Texts. In International Conference on Data Engineering (ICDE), 2014.
• [Sun et al. 2016] Xiangyan Sun, Haixun Wang, Yanghua Xiao, and Zhongyuan Wang. Syntactic Parsing of Web Queries. In EMNLP 2016.

References

• [Kenter and Rijke 2015] Tom Kenter and Maarten de Rijke. Short text similarity with word embeddings. In CIKM 2015.
• [Wang et al. 2015a] Zhongyuan Wang, Haixun Wang, Ji-Rong Wen, and Yanghua Xiao. An Inference Approach to Basic Level of Categorization. In CIKM, October 2015.
• [Hao et al. 2016] Zehui Hao, Zhongyuan Wang, Xiaofeng Meng, and Jun Yan. Combining Language Model with Conceptualization for Definition Ranking. MSR Technical Report, 2016.
• [Wang et al. 2014a] Fang Wang, Zhongyuan Wang, Zhoujun Li, and Ji-Rong Wen. Concept-based Short Text Classification and Ranking. In CIKM 2014.
• [Wang et al. 2012a] Jingjing Wang, Haixun Wang, Zhongyuan Wang, and Kenny Zhu. Understanding Tables on the Web. In International Conference on Conceptual Modeling, October 2012.
• [Wang et al. 2012b] Yue Wang, Hongsong Li, Haixun Wang, and Kenny Zhu. Toward Topic Search on the Web. In International Conference on Conceptual Modeling, October 2012.


Ambiguity Score

• Use the top-2 senses to calculate the ambiguity score:

    score = 0                                                              if level = 0
    score = (w(s_2|e) / w(s_1|e)) · (1 - similarity(s_1, s_2))             if level = 1
    score = 1 + (w(s_c2|e) / w(s_c1|e)) · (1 - similarity(s_c1, s_c2))     if level = 2

Denote the top-2 senses as s_1 and s_2, and the top-2 sense clusters as s_c1 and s_c2. The similarity of two sense clusters is the maximum similarity of their senses:

    similarity(s_c1, s_c2) = max similarity(s_i ∈ s_c1, s_j ∈ s_c2)

For an entity e, the weight (popularity) of a sense s_i is the sum of the weights of its concept clusters:

    w(s_i|e) = w(H_i|e) = Σ_{CL_j ∈ H_i} P(CL_j|e)

For an entity e, the weight (popularity) of a sense cluster sc_i is the sum of the weights of its senses:

    w(sc_i|e) = Σ_{s_j ∈ sc_i} w(s_j|e)
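A small sketch of the level-2 form of this score follows; the sense-cluster weights and the cluster-to-cluster similarity are illustrative values, not taken from the knowledge base.

```python
def ambiguity_score(cluster_weights, cluster_similarity):
    """cluster_weights: weights w(sc_i|e) of the sense clusters, sorted descending."""
    if len(cluster_weights) < 2:          # level 0: a single (unambiguous) sense
        return 0.0
    w1, w2 = cluster_weights[0], cluster_weights[1]
    return 1 + (w2 / w1) * (1 - cluster_similarity)   # level-2 form of the score

# an "apple"-like entity: {fruit, food, tree} vs {company, brand}, low cluster similarity
print(round(ambiguity_score([0.537, 0.271], cluster_similarity=0.05), 2))  # ~1.48
# a single dominant sense cluster
print(round(ambiguity_score([0.90], cluster_similarity=0.0), 2))           # 0.0
```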

Examples

• Level 0
  • california: country, state, city, region, institution (0.943)
  • fruit: food, product, snack, carbs, crop (0.827)
  • alcohol: substance, drug, solvent, food, addiction (0.523)
  • computer: device, product, electronics, technology, appliance (0.537)
  • coffee: beverage, product, food, crop, stimulant (0.73)
  • potato: vegetable, food, crop, carbs, product (0.896)
  • bean: food, vegetable, crop, legume, carbs (0.801)

Examples (cont.)

• Level 1
  • nike, score = 0.034: company/store 0.861; brand 0.035; shoe/product 0.033
  • twitter, score = 0.035: website/tool 0.612; network 0.165; application 0.033; company 0.031
  • facebook, score = 0.037: website/tool 0.595; network 0.17; company 0.053; application 0.029
  • yahoo, score = 0.38: search engine 0.457; company/provider/account 0.281; website 0.0656
  • google, score = 0.507: search engine 0.46; company/provider/organization 0.377; website 0.0449

Examples (cont.)

• Level 2
  • jordan, score = 1.02: country/state/company/regime 0.92; shoe 0.02
  • fox, score = 1.09: animal/predator/species 0.74; network 0.064; company 0.035
  • puma, score = 1.15: brand/company/shoe 0.655; species/cat 0.116
  • gold, score = 1.21: metal/material/mineral resource/mineral 0.62; color 0.128

Examples (cont.)

• Level 2
  • soap, score = 1.22: product/toiletry/substance 0.49; technology/industry standard 0.11
  • silver, score = 1.24: metal/material/mineral resource/mineral 0.638; color 0.156
  • python, score = 1.29: language 0.667; snake/animal/reptile/skin 0.193
  • apple, score = 1.41: fruit/food/tree 0.537; company/brand 0.271

Single Instance

• Is this instance ambiguous?
• What are its basic-level concepts?
• What are its similar instances?

A Concept View of "Microsoft"

(figure) "Microsoft" maps to concepts ranging from the very general to the very specific: company, international company, technology leader, software company, largest desktop OS vendor, forming a spectrum company … software company … largest desktop OS vendor.

Basic-level Conceptualization (BLC) [Rosch et al. 1976]

(figure) Examples such as KFC and BMW illustrate basic-level conceptualization: the preferred concept sits at an intermediate level of abstraction.

How to Make BLC

• Naive approaches
  • Typicality: an important measure for understanding the relationship between an object and its concept
  • Pointwise Mutual Information (PMI): a common measure of the strength of association between two terms

Naive Approach 1: Typicality

(figure: bird with robin vs. penguin; country with USA vs. Seychelles)
P(robin|bird) > P(penguin|bird): "robin" is a more typical bird than a "penguin"
P(USA|country) > P(Seychelles|country): "USA" is a more typical country than "Seychelles"

Using Typicality for BLC

• Associate each isA relationship (e isA c) with typicality scores P(e|c) and P(c|e):

    P(e|c) = n(c, e) / n(c)        P(c|e) = n(c, e) / n(e)

• P(e|c) indicates how typical (or popular) e is in the given concept c
• P(c|e) indicates how typical (or popular) the concept c is given e
• However, for "Microsoft": "company" has high typicality P(c|e), while "largest desktop OS vendor" has high typicality P(e|c); neither measure alone identifies the basic-level concept.

Naive Approach 2: PMI [Manning and Schutze 1999]

• Pointwise mutual information (PMI) is a measure of association used in information theory and statistics
• Consider using the PMI between concept c and instance e to find the basic-level concepts:

    PMI(e, c) = log [ P(e, c) / (P(e) · P(c)) ] = log P(e|c) - log P(e)

• However, in basic-level categorization we are interested in finding a concept for a given e, which means P(e) is a constant
• Thus ranking by PMI(e, c) is the same as ranking by P(e|c)

Using Rep(e, c) for BLC [Wang et al. 2015b]

• The measure Rep(e, c) = P(c|e) · P(e|c) means:
  • given e, c should be its typical concept (shortest distance)
  • given c, e should be its typical instance (shortest distance)

• (With PMI) Taking the logarithm of the scoring function:

    log Rep(e, c) = log [ P(c|e) · P(e|c) ] = log [ P(e, c)/P(e) · P(e, c)/P(c) ] = log [ P(e, c)² / (P(e) P(c)) ] = PMI(e, c) + log P(e, c) = PMI²

• (With commute time) The commute time between an instance e and a concept c is

    Time(e, c) = Σ_{k=1}^{∞} 2k · P_k(e, c) = Σ_{k=1}^{T} 2k · P_k(e, c) + Σ_{k=T+1}^{∞} 2k · P_k(e, c)
               ≥ Σ_{k=1}^{T} 2k · P_k(e, c) + 2(T + 1) · (1 - Σ_{k=1}^{T} P_k(e, c)) = 4 - 2 · Rep(e, c)

  so maximizing Rep(e, c) is a process of finding concept nodes having the shortest expected distance to e.
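A minimal sketch of ranking concepts by Rep(e, c) = P(c|e) · P(e|c) from co-occurrence counts n(c, e) follows; the counts are invented for illustration and only show why the product favors an intermediate, basic-level concept.

```python
n = {   # n(c, e): how often instance e appears with concept c (toy counts)
    ("company", "microsoft"): 8000,
    ("software company", "microsoft"): 1500,
    ("largest desktop os vendor", "microsoft"): 30,
}
n_c = {"company": 500000, "software company": 6000, "largest desktop os vendor": 32}
n_e = {"microsoft": 12000}

def rep(e, c):
    p_c_given_e = n[(c, e)] / n_e[e]   # typicality P(c|e)
    p_e_given_c = n[(c, e)] / n_c[c]   # typicality P(e|c)
    return p_c_given_e * p_e_given_c

for c in n_c:
    print(f"{c:30s} Rep = {rep('microsoft', c):.4f}")
# "software company" wins: "company" has high P(c|e) but tiny P(e|c), and
# "largest desktop os vendor" has high P(e|c) but tiny P(c|e).
```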

Evaluations on Different Measures for BLC

Table 1 (Precision at k = 1, 2, 3, 5, 10, 15, 20):

Smoothing   | Measure            | 1     | 2     | 3     | 5     | 10    | 15    | 20
None        | MI(e)              | 0.769 | 0.692 | 0.705 | 0.685 | 0.719 | 0.705 | 0.690
None        | PMI3(e)            | 0.885 | 0.769 | 0.756 | 0.800 | 0.754 | 0.733 | 0.721
None        | NPMI(e)            | 0.692 | 0.692 | 0.667 | 0.638 | 0.627 | 0.610 | 0.610
None        | Typicality P(c|e)  | 0.462 | 0.577 | 0.603 | 0.577 | 0.569 | 0.564 | 0.556
None        | Typicality P(e|c)  | 0.500 | 0.462 | 0.526 | 0.523 | 0.523 | 0.510 | 0.521
None        | Rep(e)             | 0.846 | 0.865 | 0.872 | 0.862 | 0.758 | 0.731 | 0.719
0.001       | MI(e)              | 0.577 | 0.615 | 0.628 | 0.600 | 0.612 | 0.605 | 0.592
0.001       | PMI3(e)            | 0.731 | 0.673 | 0.692 | 0.654 | 0.669 | 0.644 | 0.623
0.001       | NPMI(e)            | 0.923 | 0.827 | 0.769 | 0.746 | 0.731 | 0.695 | 0.671
0.001       | Typicality P(c|e)  | 0.462 | 0.577 | 0.603 | 0.577 | 0.569 | 0.564 | 0.554
0.001       | Typicality P(e|c)  | 0.885 | 0.865 | 0.872 | 0.831 | 0.785 | 0.741 | 0.704
0.001       | Rep(e)             | 0.846 | 0.731 | 0.718 | 0.723 | 0.700 | 0.669 | 0.638
0.0001      | MI(e)              | 0.615 | 0.615 | 0.654 | 0.608 | 0.635 | 0.628 | 0.612
0.0001      | PMI3(e)            | 0.846 | 0.731 | 0.731 | 0.715 | 0.723 | 0.685 | 0.677
0.0001      | NPMI(e)            | 0.885 | 0.904 | 0.885 | 0.869 | 0.823 | 0.777 | 0.752
0.0001      | Typicality P(c|e)  | 0.462 | 0.577 | 0.603 | 0.577 | 0.569 | 0.564 | 0.556
0.0001      | Typicality P(e|c)  | 0.885 | 0.904 | 0.910 | 0.877 | 0.831 | 0.813 | 0.777
0.0001      | Rep(e)             | 0.923 | 0.846 | 0.833 | 0.815 | 0.781 | 0.736 | 0.719
1e-5        | MI(e)              | 0.615 | 0.635 | 0.667 | 0.662 | 0.677 | 0.656 | 0.646
1e-5        | PMI3(e)            | 0.885 | 0.769 | 0.744 | 0.777 | 0.758 | 0.731 | 0.710
1e-5        | NPMI(e)            | 0.885 | 0.846 | 0.872 | 0.869 | 0.831 | 0.810 | 0.787
1e-5        | Typicality P(c|e)  | 0.462 | 0.577 | 0.603 | 0.577 | 0.569 | 0.564 | 0.556
1e-5        | Typicality P(e|c)  | 0.769 | 0.808 | 0.846 | 0.823 | 0.808 | 0.782 | 0.765
1e-5        | Rep(e)             | 0.885 | 0.904 | 0.872 | 0.862 | 0.812 | 0.800 | 0.767
1e-6        | MI(e)              | 0.769 | 0.673 | 0.705 | 0.677 | 0.700 | 0.692 | 0.679
1e-6        | PMI3(e)            | 0.885 | 0.769 | 0.756 | 0.785 | 0.773 | 0.726 | 0.723
1e-6        | NPMI(e)            | 0.885 | 0.846 | 0.821 | 0.815 | 0.750 | 0.726 | 0.719
1e-6        | Typicality P(c|e)  | 0.462 | 0.577 | 0.603 | 0.577 | 0.569 | 0.564 | 0.556
1e-6        | Typicality P(e|c)  | 0.538 | 0.615 | 0.615 | 0.615 | 0.608 | 0.613 | 0.615
1e-6        | Rep(e)             | 0.846 | 0.885 | 0.897 | 0.877 | 0.788 | 0.777 | 0.765
1e-7        | MI(e)              | 0.769 | 0.692 | 0.705 | 0.685 | 0.719 | 0.703 | 0.688
1e-7        | PMI3(e)            | 0.885 | 0.769 | 0.756 | 0.792 | 0.758 | 0.736 | 0.725
1e-7        | NPMI(e)            | 0.769 | 0.750 | 0.718 | 0.700 | 0.650 | 0.641 | 0.633
1e-7        | Typicality P(c|e)  | 0.462 | 0.577 | 0.603 | 0.577 | 0.569 | 0.564 | 0.556
1e-7        | Typicality P(e|c)  | 0.500 | 0.481 | 0.526 | 0.523 | 0.531 | 0.523 | 0.523
1e-7        | Rep(e)             | 0.846 | 0.865 | 0.872 | 0.854 | 0.765 | 0.749 | 0.733

Table 2 (NDCG at k = 1, 2, 3, 5, 10, 15, 20):

Smoothing   | Measure            | 1     | 2     | 3     | 5     | 10    | 15    | 20
None        | MI(e)              | 0.516 | 0.531 | 0.519 | 0.531 | 0.562 | 0.574 | 0.594
None        | PMI3(e)            | 0.725 | 0.664 | 0.652 | 0.660 | 0.628 | 0.631 | 0.646
None        | NPMI(e)            | 0.599 | 0.597 | 0.579 | 0.554 | 0.540 | 0.539 | 0.549
None        | Typicality P(c|e)  | 0.297 | 0.380 | 0.409 | 0.422 | 0.438 | 0.446 | 0.461
None        | Typicality P(e|c)  | 0.401 | 0.386 | 0.396 | 0.398 | 0.401 | 0.410 | 0.428
None        | Rep(e)             | 0.758 | 0.771 | 0.745 | 0.723 | 0.656 | 0.647 | 0.661
1e-3        | MI(e)              | 0.374 | 0.414 | 0.441 | 0.448 | 0.473 | 0.481 | 0.495
1e-3        | PMI3(e)            | 0.484 | 0.511 | 0.509 | 0.502 | 0.519 | 0.525 | 0.533
1e-3        | NPMI(e)            | 0.692 | 0.652 | 0.607 | 0.603 | 0.585 | 0.585 | 0.592
1e-3        | Typicality P(c|e)  | 0.297 | 0.380 | 0.409 | 0.422 | 0.438 | 0.446 | 0.460
1e-3        | Typicality P(e|c)  | 0.703 | 0.697 | 0.704 | 0.681 | 0.637 | 0.628 | 0.626
1e-3        | Rep(e)             | 0.621 | 0.580 | 0.554 | 0.561 | 0.554 | 0.555 | 0.559
1e-4        | MI(e)              | 0.407 | 0.430 | 0.458 | 0.462 | 0.492 | 0.503 | 0.512
1e-4        | PMI3(e)            | 0.648 | 0.604 | 0.579 | 0.575 | 0.578 | 0.576 | 0.590
1e-4        | NPMI(e)            | 0.747 | 0.777 | 0.761 | 0.737 | 0.700 | 0.685 | 0.688
1e-4        | Typicality P(c|e)  | 0.297 | 0.380 | 0.409 | 0.422 | 0.438 | 0.446 | 0.461
1e-4        | Typicality P(e|c)  | 0.791 | 0.795 | 0.802 | 0.767 | 0.738 | 0.729 | 0.724
1e-4        | Rep(e)             | 0.758 | 0.714 | 0.711 | 0.689 | 0.653 | 0.636 | 0.653
1e-5        | MI(e)              | 0.429 | 0.465 | 0.478 | 0.501 | 0.517 | 0.528 | 0.545
1e-5        | PMI3(e)            | 0.725 | 0.647 | 0.642 | 0.642 | 0.627 | 0.624 | 0.638
1e-5        | NPMI(e)            | 0.813 | 0.779 | 0.778 | 0.765 | 0.730 | 0.723 | 0.729
1e-5        | Typicality P(c|e)  | 0.297 | 0.380 | 0.409 | 0.422 | 0.438 | 0.446 | 0.461
1e-5        | Typicality P(e|c)  | 0.709 | 0.728 | 0.735 | 0.722 | 0.702 | 0.696 | 0.703
1e-5        | Rep(e)             | 0.791 | 0.787 | 0.762 | 0.739 | 0.707 | 0.703 | 0.706
1e-6        | MI(e)              | 0.516 | 0.510 | 0.515 | 0.526 | 0.546 | 0.563 | 0.579
1e-6        | PMI3(e)            | 0.725 | 0.655 | 0.651 | 0.654 | 0.641 | 0.631 | 0.649
1e-6        | NPMI(e)            | 0.791 | 0.766 | 0.732 | 0.728 | 0.673 | 0.659 | 0.668
1e-6        | Typicality P(c|e)  | 0.297 | 0.380 | 0.409 | 0.422 | 0.438 | 0.446 | 0.461
1e-6        | Typicality P(e|c)  | 0.495 | 0.516 | 0.520 | 0.508 | 0.512 | 0.521 | 0.540
1e-6        | Rep(e)             | 0.758 | 0.784 | 0.767 | 0.755 | 0.691 | 0.686 | 0.694
1e-7        | MI(e)              | 0.516 | 0.531 | 0.519 | 0.530 | 0.562 | 0.571 | 0.592
1e-7        | PMI3(e)            | 0.725 | 0.664 | 0.652 | 0.658 | 0.630 | 0.631 | 0.647
1e-7        | NPMI(e)            | 0.670 | 0.655 | 0.633 | 0.604 | 0.575 | 0.570 | 0.581
1e-7        | Typicality P(c|e)  | 0.297 | 0.380 | 0.409 | 0.422 | 0.438 | 0.446 | 0.461
1e-7        | Typicality P(e|c)  | 0.423 | 0.421 | 0.415 | 0.407 | 0.414 | 0.424 | 0.438
1e-7        | Rep(e)             | 0.758 | 0.771 | 0.745 | 0.725 | 0.663 | 0.661 | 0.668

Single Instance

• Is this instance ambiguous?
• What are its basic-level concepts?
• What are its similar instances?

What is the Semantic Similarity?
• Are the following instance pairs similar?
  • <apple, microsoft>
  • <apple, pear>
  • <apple, fruit>
  • <apple, food>
  • <apple, ipad>
  • <car, journey>

Approaches on Term Similarity

bull Categories of approaches for semantic similaritybull String based approach

bull Knowledge based approachbull Use preexisting thesauri taxonomy or encyclopedia such as

WordNet

bull Corpus based approachbull Use contexts of terms extracted from web pages web search

snippets or other text repositories

bull Embedding based approachbull Will introduce in detail in ldquoPart 3 Implicit Understandingrdquo

79

Approaches on Term Similarity (2)

bull Categories

80

Knowledge based approaches

(WordNet)

Corpus based

approaches

Path lengthlexical

chain-based

Information

content-based

Graph learning

algorithm basedSnippet search based

Rada

1989

Resnik

1995

Jcn

1997

Lin

1998

Saacutench

2011

Agirre

2010Alvarez

2007

String based

approaches

HunTray

2005

Hirst

1998

Do

2009

Bol

2011Chen

2006

State-of-the-art approaches

Ban

2002

bull Framework

83

Term Similarity Using Semantic Networks [Li et al 2013 Li et al 2015]

Term pairs ltt1 t2gt

Type Checking

Concept Pairs Entity Pairs

Entity-distribution Context Collection

Concept-distribution Context Collection

Concept-Entity Pairs

Concept Collection for the Entity Term t1

Similarity EvaluationCosine(T(t1) T(t2))

for each pairltt2cxgt

Context vector T(t1) and T(t2)

Get maxsim(t2cx) for ltt1 t2gt

End

End

Concept Clustering

Cluster Context vector Cx(t1) and Cy(t2)

Similarity Evaluation

Max(xy) Cosine(Cx(t1) Cy(t2))

End

Concept Clustering

for each Cluster Ci(t1)

Select top k Concept namely cx

Step 1 Type Checking

Step 2 Context Representation(Vector)

Step 3 Context Similarity

An example [Li et al 2013 Li et al 2015]

For example

ltbanana peargt

88

ltbanana peargt

Entity PairsType Checking

Concept Context Collection

Similarity Evaluation Cosine(T(t1) T(t2)) 0916

Step 1 Type Checking

Step 2 Context Representation(Vector)

Step 3 Context Similarity

ExamplesTerm 1 Term 2 Similarity

lunch dinner 09987

tiger jaguar 09792

car plane 09711

television radio 09465

technology company microsoft 08208

high impact sport competitive sport 08155

employer large corporation 05353

fruit green pepper 02949

travel meal 00426

music lunch 00116

alcoholic beverage sports equipment 00314

company table tennis 00003

96httpadaptseieesjtueducnsimilaritySimCompleteResultspdf

Statistics of Search Queries

44

29

17

7

2 1

1 Term2 Terms3 Terms4 Terms5 Termsmore than 5 Terms

10

26

34

19

74

1 Term2 Terms3 Terms4 Terms5 Termsmore than 5 Terms

(a) By traffic

(b) By of distinct queries

Pokeacutemon Go Microsoft HoloLens

Instance 1 Instance 21 Instance2 Instances3 Instances4 Instances5 InstancesMore than 5 Instances

1 Instance2 Instances3 Instances4 Instances5 InstancesMore than 5 Instances

If the short text has context for the instancehellip

bull python tutorialbull dangerous pythonbull moon earth distancebull hellip

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Supervised Segmentation [Bergsma et al 2007]

bull Problem divide query into semantic units

bull Approach turn segmentation into position-based binary classification

Example Query

Two man power saw

[two man] [power saw][two] [man] [power saw][two] [man power] [saw]

Input a query and its positions

Output the decision for making segmentation at each position

Supervised Segmentation

bull Featuresbull Decision boundary features

bull Statistical features

bull Context features

bull Dependency features

eg Indicators thePOS tags in query isPosition features forwardbackward

Mutual information between left and right parts

Bank loan amortization schedule

Context information

bus driverfemale

depend

Supervised Segmentation

bull Segmentation Overview

saw

SVMclassifier

Input query two man power saw

two man power

Output segmentation decision for each position (yesno)

learning features

Unsupervised Segmentation [Tan et al 2008]

bull Unsupervised learning for query segmentation

Probability of generated segmentation S for query Q

119875 119878119876 = 119875 1199041 P 1199042|1199041 hellipP 119904119898 11990411199042hellip119904119898minus1

asympෑ

119904119894isin119878

119875(119904119894)Unigram model

segments

Valid segment boundary if and only if the pointwise mutual information between the two segments resulting from the split is negative

new york times subscription

1199041 1199042

119872119868 119904119896 119904119896+1 = log119875119888([119904119896 119904119896+1])

119875119888 119904119896 ∙ 119875119888 (119904119896+1)lt 0

Example log119875119888([119899119890119908 119910119900119903119896])

119875119888( 119899119890119908) ∙ 119875119888 (119910119900119903119896)gt 0

no segment boundary here

Unsupervised Segmentation

bull Find top k segmentations dynamic programming

bull Using EM optimization on the fly

Input query 11990811199082hellip119908119899 concept probability distribution

Output top k segmentations with highest likehood

Words in a query

Exploit Click-through [Li et al 2011]

bull Motivationbull Probabilistic query segmentation

bull Use click-through data

Output top-3 segmentation

[bank of america] [online banking] 0502

bank of america online banking] 0428

[bank of ] [ america] [online banking] 0001

Q -gt URL -gt D query

document

click data

Input Query bank of america online banking

Exploit Click-through

bull Segmentation Model

An interpolated model

global info Click-throughinfo

[credit card] [bank of America]

1 bank of america credit cards contact us overview2 secured visa credit card from bank of america3 credit cards overview find the right bank of america credit card for you

Query

Clicked html documents

global info

Click-through info

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Sense Changes with Different Context

watch harry potter read harry potter age harry potter

Movie Book Character

harry potter walkthrough

Game

Entity Recognition in Query [Guo et al 2009]

bull Motivation

Detect named entity in a short text and categorize it

harry potter walkthrough

Single-named-entity query

Example

(ldquoharry potterrdquo ldquo walkthroughrdquo ldquogamerdquo)

triple lte t cgt

class of entity

context terms

ambiguous term

contextterm class

Entity Recognition in Query

bull Probabilistic Generative Model

Goal Given a query q find triple lte t cgt maximize the probability

Probability to generate triple

assume context only depends on class

Objective given query q find

The problem then becomes how to estimate Pr(e) Pr(c|e) and Pr(t|c)

Eg ldquowalkthroughrdquo only depends on game instead of happy potter

Entity Recognition in Query

bull Probability Estimation by Learning

learning objective

N

1i

iii )ctP(emax

Challenge difficult as well as time consuming to manually assign class labels to named entities in queries

Build training set 119879 = (119890119894 119905119894) view 119888119894 as a hidden variable

New Learning problem

N

1i

ii

N

1i

i

N

1i

ii c)|)P(te|P(c)P(emax)tP(emax c

solved with topic model WS-LDA

Signal from Click [Pantel et al 2012]

bull Motivation

Predict entity type in Web search

entity

user intent

context

click

Query type distribution (73 types)

Generative model

entity type

T

TK

K2

Signal from Click

bull Joint Model for Prediction

t

τ

i

n c

θ

φ

ω

Q

Distribution over types

Intent distribution

Pick type

Pick entity

Pick intent

Pick click

Word distribution

Host distribution

Entity distribution

For each Query

Pick context words

Telegraphic Query interpretation [Sawant et al 2013 Joshi et al 2014]

bull Entity-seeking Telegraphic Queries

bull Interpretation = Segmentation + Annotation

Knowledge base Large corpus

accuracy recall

Germany capital

Berlin

Query

Result Entity

bull Overview

Joint Interpretation and Ranking [Sawant et al 2013 Joshi et al 2014]

Annotated Corpus

Telegraphic Query

e1e2e3

Two Models for Interpretation and Ranking

Generative Model

Discriminative Model

Output

bull Generative Model

Joint Interpretation and Ranking [Sawant et al 2013]

San Diego Padres

Major league baseball team

type context

E

T Padres have been to two World

Series losing in 1984 and 1998

Type hint

baseball team

losing team baseball world series 1998

Z

Context matchers

lost 1998 world seriesswitch

model model

q losing team baseball world series 1998

Borrow from U Sawant (2013)

Based on Probabilistic Language Models

bull Discriminative Model

Joint Interpretation and Ranking [Sawant et al 2013]

San_Diego_Padres

losing team baseball world

series 1998

(baseball team)

losing team baseball world

series 1998

(baseball team)

losing team baseball world

series 1998

(t = baseball team)

1998_World_Series

losing team baseball world

series 1998

(series)

losing team baseball world

series 1998

(series)

losing team baseball world

series 1998

(t = series)

Correct entity Incorrect entity

Based on max-margin discriminative learning

bull Queries seek answer entities (e2)

bull Contain (query) entities (e1) target types (t2) relations (r) and selectors (s)

Telegraphic Query Interpretation [Joshi et al 2014]

query e1 r t2 s

dave navarro first band

dave navarro band band first

dave navarro - band first

spider automobile company

spider automobile company

automobile company

-

automobile company company spider

Borrow from M Joshi (2014)

Improved Generative Model

bull Generative Model[Sawant et al 2013]

[Joshi et al 2014]Consider e1

(in q) and r

Improved Discriminative Model

bull Discriminative Model[Sawant et al 2013]

[Joshi et al 2014]

Consider e1

(in q) and r

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

bull Input a short text

bull Output semantic interpretation

bull Three steps in understanding a short text

wanna watch eagles band

watch[verb] eagles[entity](band) band[concept]

wanna watch eagles band watch[verb] eagles[entity](band) band[concept]

watch eagles band watch[verb] eagles[entity] band[concept]

Step 1 Text Segmentation ndash divide into a sequence of terms in vocabulary

Step 2 Type detection ndash determine the best type of each term

Step 3 Concept Labeling ndash infer the best concept of each entity within context

Text segmentationbull Observations

bull Mutual Exclusion ndash terms containing the same word mutually exclude each other

bull Mutual Reinforcement ndash related terms mutually reinforce each other

bull Build a Candidate Term Graph (CTG)

ldquovacation april in parisrdquo ldquowatch harry potterrdquo

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Find best segmentation

bull Best segmentation= sub-graph in CTG whichbull Is a complete graph (clique)

bull No mutual exclusion

bull Has 100 word coveragebull Except for stopwords

bull Has the largest average edge weight

Is a segmentation

Best segmentation

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Find best segmentation

bull Best segmentation= sub-graph in CTG whichbull Is a complete graph (clique)

bull No mutual exclusion

bull Has 100 word coveragebull Except for stopwords

bull Has the largest average edge weight

Maximal Clique

Best segmentation

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Type Detection

bull Pairwise Modelbull Find the best typed-term for each term so that the

Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight

watch[v]

watch[e]

watch[c]

watch

free[adj]

free[v]

movie[c]

movie[e]

free

movie

Concept Labeling

bull Entity disambiguation is the most important task of concept labelingbull Filterre-rank of the original concept cluster vector

bull Weighted-Votebull The final score of each concept cluster is a combination

of its original score and the support from context using concept co-occurrence

watch harry potter read harry potter

movie book

Example of Entity Disambiguation[Hua et al 2015 (ICDE Best Paper) Hua et al 2016]

Co-occurrence network

Concept Vector

Semantic network

Short Text

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

c1 p1

c2 p2

c3 p3

hellip

ipad apple

fruithellip

companyhellip

foodhellip

producthellip

Is-A

filtering

ldquoipad applerdquo

producthellip

devicehellip

producthellip

brandhellip

companyhellip

devicehellip

co-occur

Is-A

Mining Lexical Relationships[Wang et al 2015b]

bull Lexical knowledge represented by the probabilities

verb

product book

movie

watch harry potter

119901 119907119890119903119887 119908119886119905119888ℎ

119901 119894119899119904119905119886119899119888119890 119908119886119905119888ℎ

119901 119898119900119907119894119890 ℎ119886119903119903119910 119901119900119905119905119890119903

119901 119898119900119907119894119890 119908119886119905119888ℎ 119907119890119903119887

119901 119887119900119900119896 ℎ119886119903119903119910 119901119900119905119905119890119903

119901 119888 119905 119911

119901 119888 119890 =119901 119888 119905 119911 = 119894119899119904119905119886119899119888119890

119901 119911 119905 ①②

③e instancet termc conceptz role

Understanding Queries [Wang et al 2015b]

bull Goal to rank the concepts and findarg max

119888119901(119888|119905 119902)

The offline semantic network

QueryAll possible

segmentations

Random walk with restart [Sun et al 2005]on the online subgraph

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

bull Example ldquopopular smart cover iphone 5srdquo

bull Definition bull Head acts to name the general (semantic) category to which the

whole short text belongs Usually the head is the intent of the short text

bull ldquosmart coverrdquo intent of the query

bull Constraints distinguish this member from other members of the same category

bull ldquoiphone 5srdquo limit the type of the head

bull Non-Constraint Modifiers (aka Pure Modifiers) are subjectivemodifiers which can be dropped without changing intent

bull ldquopopularrdquo subjective can be neglected

Non-Constraint Modifiers Mining Construct Modifier Networks

Edges form a Modifier Network

Concept Hierarchy Tree in ldquoCountryrdquo domain

Modifier Network in ldquoCountryrdquo domainIn this case ldquoLargerdquo and ldquoToprdquo are pure modifiers

Country

Asian country

Developed country

Western country

Asian Developed Western

Western developed

country

Top western country

Large

Large Top

Top

WesternLarge Asian

country

Large developed

country

Top developed

country

Country

Asian Western

Developed

Large Top

bull Betweenness centrality is a measure of a nodes centrality in a network

bull Betweennes of node v is defined as

bull where 120590119904119905 is the total number of shortest paths from node s to node t and 120590119904119905(119907) is the number of those paths that pass through v

bull Normalization amp Aggregation

bull For a pure modifier it should have low betweenness centrality aggregation score PMS(t)

Non-Constraint Modifiers Mining Betweenness centrality

Head-Constraints Mining [Wang et al 2014b]

bull A term can be a head sometimes and be a constraint in some other cases

bull Eg Seattle hotel Seattle hotel job

head headconstraintconstraintconstraint

Head-Constraints Mining Acquiring Concept Patterns

Get entity pairs from query log

Conceptualization

Concept Patterns for each

prepositionsentity1 entity2

Extract Patterns

A for B A of BA with B A in BA on B A at B hellip

entity 1head entity 2constraint

concept11

concept12

concept13

concept14

concept21

concept22

concept23

(concept11 concept21) (concept11 concept22)(concept11 concept23)hellip

Concept Pattern Dictionary

Building concept pattern dictionary Query Logs

cover for iphone 6sbattery for sony a7rwicked on broadway

Why Concepts Canrsquot Be Too Generalbull It may cause too many concept pattern conflicts

canrsquot distinguish head and modifier for general concept pairs

Head Modifier

Derived Concept Pattern device company

Supporting Entity Pairs iphone 4 verizon

modem comcast

wireless router comcast

iphone 4 tmobile

Head Modifier

Derived Concept Pattern company device

Supporting Entity Pairs amazon books kindle

netflix touchpad

skype windows phone

netflix ps3

Conflict

Why Concepts Canrsquot Be Too Specificbull It may generate concepts with little coverage

bull Concept regresses to entitybull Large storage space up to (million million) patterns

hellip hellip

device largest desktop OS vendor

device largest software development company

device largest global corporation

device latest windows and office provider

hellip hellip

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept PatternsCluster size Sum of Cluster Score headconstraintscore

615 2114691 breedstate357298460224501

296 7752357 gameplatform627403476771856

153 3466804 accessoryvehicle53393705094809

70 118259 browserplatform132612807637391

22 1010993 requirementschool271407526294823

34 9489159 drugdisease154602405333541

42 8992995 cosmeticskin condition814659415003929

16 7421599 jobcity27903732555528

32 710403 accessoryphone246513830851194

18 6692376 softwareplatform210126322725878

20 6444603 testdisease239774028397537

27 5994205 clothesbreed98773996282851

19 5913545 penaltycrime200544192793488

25 5848804 taxstate240081818612579

16 5465424 saucemeat183592863621553

18 4809389 credit cardcountry142919087972152

14 4730792 foodholiday14554140330924

11 4536199 modgame257163856882439

29 4350954 garmentsport471533326845442

23 3994886 career informationprofessional732726483731257

15 386065 songinstrument128189481818135

18 378213 baitfish780426514113169

22 3722948 study guidebook508339765053921

19 3408953 pluginsbrowser550326072627126

14 3305753 recipemeat882779863422951

18 3214226 currencycountry110825444188352

13 3180272 lenscamera186081673263957

9 316973 decorationholiday130055844126533

16 314875 foodanimal7338544366514

Concept-pattern cluster "game / platform": game platform, game device, video game platform, game console / game pad, game gaming platform

Game (Head) | Platform (Modifier)
angry birds | android
angry birds | ios
angry birds | windows 10
… | …

Head Modifier Relationship Detection

• Train a classifier on (head-embedding, modifier-embedding), as sketched below

• Training data:
  • Positive: (head, modifier)
  • Negative: (modifier, head)

• Precision >= 0.9, Recall >= 0.9

• Disadvantage: not interpretable
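A minimal sketch of this setup, assuming pretrained word embeddings are available (random vectors stand in for them here), and using scikit-learn logistic regression as one possible classifier; the pair list is toy data.

```python
# Sketch: classify the direction of a (head, modifier) pair from concatenated embeddings.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
DIM = 50
vocab = ["game", "platform", "accessory", "phone", "drug", "disease"]
emb = {w: rng.normal(size=DIM) for w in vocab}   # placeholder embeddings

# Known (head, modifier) pairs, e.g. taken from the concept-pattern dictionary.
pairs = [("game", "platform"), ("accessory", "phone"), ("drug", "disease")]

X, y = [], []
for head, mod in pairs:
    X.append(np.concatenate([emb[head], emb[mod]]))  # positive: (head, modifier) order
    y.append(1)
    X.append(np.concatenate([emb[mod], emb[head]]))  # negative: reversed order
    y.append(0)

clf = LogisticRegression(max_iter=1000).fit(np.array(X), y)

# Predict whether the first term of a new pair is the head.
test = np.concatenate([emb["game"], emb["platform"]]).reshape(1, -1)
print("P(first term is head) =", clf.predict_proba(test)[0, 1])
```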

Syntactic Parsing based on HM

bull Information is incompletebull Preposition and other function words

bull Within a noun compound el capitan macbook pro

bull Why not train a parser for web queries

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges: Short Texts Lack Grammatical Signals
• Lack function words, word order

• "toys queries" has ambiguous intent

• "distance earth moon" has clear intent
• many equivalent forms: "earth moon distance", "earth distance moon", …

Challenges Syntactic Parsing of Queries

bullNo standard

bullNo ground-truth

Why is syntactic parsing of queries even a legitimate problem

Derive Syntax from Semantics[Sun et al 2016]

bull Query ldquothai food houstonrdquo

bull Clicked sentence

bull Project dependency to the query

A Treebank for Short Texts

bull Given query 119902

bull Given 119902rsquos clicked sentence 119904

bull Parse each 119904

bull Project dependency from 119904 to 119902

bull Aggregate dependencies

Algorithm of Projection

Result Examples

Results

• Random queries

QueryParser: UAS 0.83, LAS 0.75; Stanford: UAS 0.72, LAS 0.64

• Queries with no function words

QueryParser: UAS 0.82, LAS 0.73; Stanford: UAS 0.70, LAS 0.61

• Queries with function words

QueryParser: UAS 0.90, LAS 0.85; Stanford: UAS 0.86, LAS 0.80

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

bull Measuring similarity between two short texts and sentences

bull Basic idea word-by-word comparison using embedding vector

• Use a saliency-weighted semantic graph to compute similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired: bins of all edges, bins of max edges.

Similarity measurement (inspired by BM25), where s_l and s_s are the two short texts (longest and shortest), w ranges over the terms of s_l, and sem(w, s_s) is the semantic similarity of term w to the short text s_s:

f_sts(s_l, s_s) = Σ_{w ∈ s_l} IDF(w) · sem(w, s_s) · (k_1 + 1) / ( sem(w, s_s) + k_1 · (1 − b + b · |s_s| / avgsl) )
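A minimal sketch of this feature, assuming some word embeddings and an IDF table are available; the toy vectors, the IDF values, and the parameters k1, b and avg_len below are all illustrative assumptions, not values from the paper.

```python
# Sketch of the saliency-weighted (BM25-inspired) short-text similarity feature.
import numpy as np

rng = np.random.default_rng(1)
emb = {w: rng.normal(size=25) for w in
       ["cheap", "flights", "to", "new", "york", "low", "cost", "nyc", "airfare"]}
idf = {w: 1.0 for w in emb}          # toy IDF table

def cos(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def sem(w, sentence):
    """Best embedding match of word w against the terms of the other short text."""
    return max(cos(emb[w], emb[w2]) for w2 in sentence)

def f_sts(s_long, s_short, k1=1.2, b=0.75, avg_len=4.0):
    score = 0.0
    for w in s_long:
        s = sem(w, s_short)
        score += idf[w] * (s * (k1 + 1)) / (s + k1 * (1 - b + b * len(s_short) / avg_len))
    return score

print(f_sts(["cheap", "flights", "to", "new", "york"], ["low", "cost", "nyc", "airfare"]))
```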

From the Concept View

From the Concept View [Wang et al 2015a]

Co-occurrence Network

Bags of Concepts

Semantic Network

Short Text 1

Short Text 2

Concept Vector 1[(c1 score1) (c2 score2)hellip]

Concept Vector 2[(c1rsquo score1rsquo) (c2rsquo score2rsquo)hellip]

Similarity

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization
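The concept-view comparison above boils down to a cosine between two weighted bags of concepts. Below is a minimal sketch under that assumption; conceptualize() is a toy stand-in for the full knowledge-based pipeline (parsing, isA clustering, co-occurrence filtering).

```python
# Sketch: compare two short texts in the explicit concept space.
import math

def conceptualize(text):
    toy = {
        "watch harry potter": {"movie": 0.7, "character": 0.2, "book": 0.1},
        "read harry potter":  {"book": 0.8, "novel": 0.15, "movie": 0.05},
    }
    return toy[text]

def cosine(v1, v2):
    common = set(v1) & set(v2)
    num = sum(v1[c] * v2[c] for c in common)
    den = (math.sqrt(sum(x * x for x in v1.values()))
           * math.sqrt(sum(x * x for x in v2.values())))
    return num / den if den else 0.0

cv1 = conceptualize("watch harry potter")
cv2 = conceptualize("read harry potter")
print(round(cosine(cv1, cv2), 3))
```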

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

• Explicit short text understanding benefits a lot of application scenarios:
• Ads/search semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

[Charts: performance by decile (Decile 4 through Decile 10) for Mainline Ads (left panel, y-axis 0.00 to 6.00) and Sidebar Ads (right panel, y-axis 0.00 to 0.60).]

Definition Mining [Hao et al 2016]

bull Definition scenarios search engines QnA etc

• Why is Conceptualization useful for definition mining?
• Example: "What is Emphysema?"

Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

• This sentence has the form of a definition
• Embedding is helpful to some extent, but it also returns a high similarity score for (emphysema, disease) and (emphysema, smoking)

• Conceptualization can provide strong semantics
• Contextual embedding can also provide semantic similarity beyond Is-A

Answer 1

Answer 2

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

OfflineOffline OnlineOnline

Original Short textjustin bieber graduates

hellip

Knowledge base

Conceptualization

Concept Vector

Entity Extraction

Candidates Generation

Classification amp Ranking

Model LearningModel Learning

Concept Weighting

Model Model NModel i

Concept Model Concept Model

Class 1 Class NClass i

TrainingData

ltMusic Scoregt

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space: article titles/tags in this category (points p_i, p_j)

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

hellip

hellip

120596119894

120596119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

120596119894

120596119895119901119894

119901119895

Precision performance on each category [Wang et al 2014a]

Category | BocSTC | LM_ch | SVM | VSM_cosine | LM_d | Entity_ESA
Movie | 0.71 | 0.91 | 0.84 | 0.81 | 0.72 | 0.56
Money | 0.97 | 0.95 | 0.54 | 0.57 | 0.52 | 0.74
Music | 0.97 | 0.90 | 0.88 | 0.73 | 0.68 | 0.58
TV | 0.96 | 0.46 | 0.92 | 0.56 | 0.51 | 0.55

[Bar chart of these precision values; y-axis 0.3 to 1.0, labeled "Precision".]

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledge in SIGMOD 2008

bull [ Auer et al 2007 ] Sören Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWC/ASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Ré Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Börschinger Cäcilia Zirn and Anas Elghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The People's Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and Xindong Wu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny Boyes-Braem Basic objects in natural categories Cognitive psychology 8(3) 382-439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddings In CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny Zhu Toward Topic Search on the Web In International Conference on Conceptual Modeling October 2012


Examples

• Level 0
  • california: country, state, city, region, institution (0.943)
  • fruit: food, product, snack, carbs, crop (0.827)
  • alcohol: substance, drug, solvent, food, addiction (0.523)
  • computer: device, product, electronics, technology, appliance (0.537)
  • coffee: beverage, product, food, crop, stimulant (0.73)
  • potato: vegetable, food, crop, carbs, product (0.896)
  • bean: food, vegetable, crop, legume, carbs (0.801)

Examples (cont.)

• Level 1
  • nike (score = 0.034): company, store 0.861; brand 0.035; shoe, product 0.033
  • twitter (score = 0.035): website, tool 0.612; network 0.165; application 0.033; company 0.031
  • facebook (score = 0.037): website, tool 0.595; network 0.17; company 0.053; application 0.029
  • yahoo (score = 0.38): search engine 0.457; company, provider, account 0.281; website 0.0656
  • google (score = 0.507): search engine 0.46; company, provider, organization 0.377; website 0.0449

Examples (cont.)

• Level 2
  • jordan (score = 1.02): country, state, company, regime 0.92; shoe 0.02
  • fox (score = 1.09): animal, predator, species 0.74; network 0.064; company 0.035
  • puma (score = 1.15): brand, company, shoe 0.655; species, cat 0.116
  • gold (score = 1.21): metal, material, mineral resource, mineral 0.62; color 0.128

Examples (cont.)

• Level 2
  • soap (score = 1.22): product, toiletry, substance 0.49; technology, industry standard 0.11
  • silver (score = 1.24): metal, material, mineral resource, mineral 0.638; color 0.156
  • python (score = 1.29): language 0.667; snake, animal, reptile, skin 0.193
  • apple (score = 1.41): fruit, food, tree 0.537; company, brand 0.271

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

A Concept View of ldquoMicrosoftrdquo

company

largest desktop OS vendor

softwarecompany

international company

technology leader

Microsoft

largest desktop OS vendorcompany hellip hellip

software company

Basic-level Conceptualization (BLC)[Rosch et al 1976]

KFC

BMW

Basic-level conceptualization

How to Make BLC

bull Naive approachesbull Typicality an important measure for understanding the

relationship between an object and its concept

bull Pointwise Mutual Information (PMI) a common measure of the strength of association between two terms

bird

Naive Approach 1 Typicality

P(robin|bird) > P(penguin|bird): "robin" is a more typical bird than "penguin"

country

SeychellesUSA

P(USA|country) > P(Seychelles|country): "USA" is a more typical country than "Seychelles"

penguinrobin

Using Typicality for BLC

• Associate each isA relationship (e isA c) with typicality scores P(e|c) and P(c|e):

P(e|c) = n(c, e) / n(c),    P(c|e) = n(c, e) / n(e)

where n(c, e) is the observed count of the isA pair, n(c) = Σ_e n(c, e) and n(e) = Σ_c n(c, e).

• P(e|c) indicates how typical (or popular) e is in the given concept c

• P(c|e) indicates how typical (or popular) the concept c is given e

• However, for Microsoft, "company" has high typicality P(c|e) while "largest desktop OS vendor" has high typicality P(e|c), so neither score alone identifies the basic-level concept (see the sketch below).
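A minimal sketch of the two typicality scores, assuming isA counts n(c, e) are available; the counts below are toy numbers, not Probase data.

```python
# Sketch: typicality P(e|c) and P(c|e) from isA co-occurrence counts n(c, e).
from collections import defaultdict

n = defaultdict(int)
n[("company", "microsoft")] = 7000
n[("largest desktop os vendor", "microsoft")] = 12
n[("company", "google")] = 6500
n[("company", "a small startup")] = 1

def p_e_given_c(e, c):
    total_c = sum(v for (cc, _), v in n.items() if cc == c)   # n(c)
    return n[(c, e)] / total_c if total_c else 0.0

def p_c_given_e(e, c):
    total_e = sum(v for (_, ee), v in n.items() if ee == e)   # n(e)
    return n[(c, e)] / total_e if total_e else 0.0

print(p_c_given_e("microsoft", "company"))                    # high P(c|e)
print(p_e_given_c("microsoft", "largest desktop os vendor"))  # high P(e|c)
```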

Naive Approach 2 PMI[Manning and Schutze 1999]

bull Pointwise mutual information (PMI) is a measure of association used in information theory and statistics

bull Consider using the PMI between concept c and instance e to find the basic-level concepts as follows

PMI(e, c) = log [ P(e, c) / (P(e) P(c)) ] = log P(e|c) - log P(e)

• However: in basic level of categorization we are interested in finding a concept for a given e, which means P(e) is a constant

• Thus ranking by PMI(e, c) is the same as ranking by P(e|c)

Using Rep(e c) for BLC [Wang et al 2015b]

• The measure Rep(e, c) = P(c|e) * P(e|c) means:
  • given e, c should be its typical concept (shortest distance);
  • given c, e should be its typical instance (shortest distance).

• (With PMI) If we take the logarithm of our scoring function we get

log Rep(e, c) = log [ P(c|e) * P(e|c) ] = log [ (P(e, c)/P(e)) * (P(e, c)/P(c)) ] = log [ P(e, c)^2 / (P(e) P(c)) ] = PMI(e, c) + log P(e, c) = PMI^2(e, c)

• (With Commute Time) The commute time between an instance e and a concept c is

Time(e, c) = Σ_{k=1..∞} 2k * P_k(e, c) = Σ_{k=1..T} 2k * P_k(e, c) + Σ_{k=T+1..∞} 2k * P_k(e, c)
           ≥ Σ_{k=1..T} 2k * P_k(e, c) + 2(T + 1) * (1 - Σ_{k=1..T} P_k(e, c)) = 4 - 2 * Rep(e, c)   (taking T = 1)

so ranking by Rep(e, c) is a process of finding concept nodes having the shortest expected distance to e.
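The ranking itself is a one-liner once the typicality scores exist. Below is a minimal sketch; the candidate concepts and their typicality values are toy numbers chosen to mirror the "Microsoft" example, not measured data.

```python
# Sketch: rank candidate concepts of an instance by Rep(e, c) = P(c|e) * P(e|c).
candidates = {
    # concept: (P(c|e), P(e|c))
    "company": (0.60, 0.01),
    "software company": (0.25, 0.10),
    "largest desktop os vendor": (0.02, 0.95),
}

def rep(p_c_given_e, p_e_given_c):
    return p_c_given_e * p_e_given_c

ranked = sorted(candidates.items(), key=lambda kv: rep(*kv[1]), reverse=True)
for concept, (pce, pec) in ranked:
    # with these toy values "software company" comes out on top as the basic-level concept
    print(f"{concept:28s} Rep = {rep(pce, pec):.4f}")
```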

Precision/NDCG at k = 1, 2, 3, 5, 10, 15, 20 (first result block):

No smoothing
  MI(e)               0.769 0.692 0.705 0.685 0.719 0.705 0.690
  PMI3(e)             0.885 0.769 0.756 0.800 0.754 0.733 0.721
  NPMI(e)             0.692 0.692 0.667 0.638 0.627 0.610 0.610
  Typicality P(c|e)   0.462 0.577 0.603 0.577 0.569 0.564 0.556
  Typicality P(e|c)   0.500 0.462 0.526 0.523 0.523 0.510 0.521
  Rep(e)              0.846 0.865 0.872 0.862 0.758 0.731 0.719

Smoothing = 0.001
  MI(e)               0.577 0.615 0.628 0.600 0.612 0.605 0.592
  PMI3(e)             0.731 0.673 0.692 0.654 0.669 0.644 0.623
  NPMI(e)             0.923 0.827 0.769 0.746 0.731 0.695 0.671
  Typicality P(c|e)   0.462 0.577 0.603 0.577 0.569 0.564 0.554
  Typicality P(e|c)   0.885 0.865 0.872 0.831 0.785 0.741 0.704
  Rep(e)              0.846 0.731 0.718 0.723 0.700 0.669 0.638

Smoothing = 0.0001
  MI(e)               0.615 0.615 0.654 0.608 0.635 0.628 0.612
  PMI3(e)             0.846 0.731 0.731 0.715 0.723 0.685 0.677
  NPMI(e)             0.885 0.904 0.885 0.869 0.823 0.777 0.752
  Typicality P(c|e)   0.462 0.577 0.603 0.577 0.569 0.564 0.556
  Typicality P(e|c)   0.885 0.904 0.910 0.877 0.831 0.813 0.777
  Rep(e)              0.923 0.846 0.833 0.815 0.781 0.736 0.719

Smoothing = 1e-5
  MI(e)               0.615 0.635 0.667 0.662 0.677 0.656 0.646
  PMI3(e)             0.885 0.769 0.744 0.777 0.758 0.731 0.710
  NPMI(e)             0.885 0.846 0.872 0.869 0.831 0.810 0.787
  Typicality P(c|e)   0.462 0.577 0.603 0.577 0.569 0.564 0.556
  Typicality P(e|c)   0.769 0.808 0.846 0.823 0.808 0.782 0.765
  Rep(e)              0.885 0.904 0.872 0.862 0.812 0.800 0.767

Smoothing = 1e-6
  MI(e)               0.769 0.673 0.705 0.677 0.700 0.692 0.679
  PMI3(e)             0.885 0.769 0.756 0.785 0.773 0.726 0.723
  NPMI(e)             0.885 0.846 0.821 0.815 0.750 0.726 0.719
  Typicality P(c|e)   0.462 0.577 0.603 0.577 0.569 0.564 0.556
  Typicality P(e|c)   0.538 0.615 0.615 0.615 0.608 0.613 0.615
  Rep(e)              0.846 0.885 0.897 0.877 0.788 0.777 0.765

Smoothing = 1e-7
  MI(e)               0.769 0.692 0.705 0.685 0.719 0.703 0.688
  PMI3(e)             0.885 0.769 0.756 0.792 0.758 0.736 0.725
  NPMI(e)             0.769 0.750 0.718 0.700 0.650 0.641 0.633
  Typicality P(c|e)   0.462 0.577 0.603 0.577 0.569 0.564 0.556
  Typicality P(e|c)   0.500 0.481 0.526 0.523 0.531 0.523 0.523
  Rep(e)              0.846 0.865 0.872 0.854 0.765 0.749 0.733

Second result block, same columns (k = 1, 2, 3, 5, 10, 15, 20):

No smoothing
  MI(e)               0.516 0.531 0.519 0.531 0.562 0.574 0.594
  PMI3(e)             0.725 0.664 0.652 0.660 0.628 0.631 0.646
  NPMI(e)             0.599 0.597 0.579 0.554 0.540 0.539 0.549
  Typicality P(c|e)   0.297 0.380 0.409 0.422 0.438 0.446 0.461
  Typicality P(e|c)   0.401 0.386 0.396 0.398 0.401 0.410 0.428
  Rep(e)              0.758 0.771 0.745 0.723 0.656 0.647 0.661

Smoothing = 1e-3
  MI(e)               0.374 0.414 0.441 0.448 0.473 0.481 0.495
  PMI3(e)             0.484 0.511 0.509 0.502 0.519 0.525 0.533
  NPMI(e)             0.692 0.652 0.607 0.603 0.585 0.585 0.592
  Typicality P(c|e)   0.297 0.380 0.409 0.422 0.438 0.446 0.460
  Typicality P(e|c)   0.703 0.697 0.704 0.681 0.637 0.628 0.626
  Rep(e)              0.621 0.580 0.554 0.561 0.554 0.555 0.559

Smoothing = 1e-4
  MI(e)               0.407 0.430 0.458 0.462 0.492 0.503 0.512
  PMI3(e)             0.648 0.604 0.579 0.575 0.578 0.576 0.590
  NPMI(e)             0.747 0.777 0.761 0.737 0.700 0.685 0.688
  Typicality P(c|e)   0.297 0.380 0.409 0.422 0.438 0.446 0.461
  Typicality P(e|c)   0.791 0.795 0.802 0.767 0.738 0.729 0.724
  Rep(e)              0.758 0.714 0.711 0.689 0.653 0.636 0.653

Smoothing = 1e-5
  MI(e)               0.429 0.465 0.478 0.501 0.517 0.528 0.545
  PMI3(e)             0.725 0.647 0.642 0.642 0.627 0.624 0.638
  NPMI(e)             0.813 0.779 0.778 0.765 0.730 0.723 0.729
  Typicality P(c|e)   0.297 0.380 0.409 0.422 0.438 0.446 0.461
  Typicality P(e|c)   0.709 0.728 0.735 0.722 0.702 0.696 0.703
  Rep(e)              0.791 0.787 0.762 0.739 0.707 0.703 0.706

Smoothing = 1e-6
  MI(e)               0.516 0.510 0.515 0.526 0.546 0.563 0.579
  PMI3(e)             0.725 0.655 0.651 0.654 0.641 0.631 0.649
  NPMI(e)             0.791 0.766 0.732 0.728 0.673 0.659 0.668
  Typicality P(c|e)   0.297 0.380 0.409 0.422 0.438 0.446 0.461
  Typicality P(e|c)   0.495 0.516 0.520 0.508 0.512 0.521 0.540
  Rep(e)              0.758 0.784 0.767 0.755 0.691 0.686 0.694

Smoothing = 1e-7
  MI(e)               0.516 0.531 0.519 0.530 0.562 0.571 0.592
  PMI3(e)             0.725 0.664 0.652 0.658 0.630 0.631 0.647
  NPMI(e)             0.670 0.655 0.633 0.604 0.575 0.570 0.581
  Typicality P(c|e)   0.297 0.380 0.409 0.422 0.438 0.446 0.461
  Typicality P(e|c)   0.423 0.421 0.415 0.407 0.414 0.424 0.438
  Rep(e)              0.758 0.771 0.745 0.725 0.663 0.661 0.668

Evaluations on Different Measures for BLC

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

What is the Semantic Similarity?
• Are the following instance pairs similar?

• <apple, microsoft>

• <apple, pear>

• <apple, fruit>

• <apple, food>

• <apple, ipad>

• <car, journey>

Approaches on Term Similarity

• Categories of approaches for semantic similarity:
  • String based approach
  • Knowledge based approach: use preexisting thesauri, taxonomies or encyclopedias such as WordNet
  • Corpus based approach: use contexts of terms extracted from web pages, web search snippets or other text repositories
  • Embedding based approach: will be introduced in detail in "Part 3: Implicit Understanding"

Approaches on Term Similarity (2)

bull Categories

[Figure: taxonomy of term-similarity approaches. String based approaches; knowledge based approaches (WordNet): path length / lexical chain-based and information content-based; corpus based approaches: graph-learning-algorithm based and snippet/search based. Representative work: Rada 1989, Resnik 1995, Jcn 1997, Hirst 1998, Lin 1998, Ban 2002, HunTray 2005, Chen 2006, Alvarez 2007, Do 2009, Agirre 2010, Bol 2011, Sánchez 2011; state-of-the-art approaches highlighted.]

bull Framework

Term Similarity Using Semantic Networks [Li et al 2013, Li et al 2015]

Term pairs <t1, t2>

Type Checking

Concept Pairs Entity Pairs

Entity-distribution Context Collection

Concept-distribution Context Collection

Concept-Entity Pairs

Concept Collection for the Entity Term t1

Similarity EvaluationCosine(T(t1) T(t2))

for each pair <t2, cx>

Context vector T(t1) and T(t2)

Get maxsim(t2cx) for ltt1 t2gt

End

End

Concept Clustering

Cluster Context vector Cx(t1) and Cy(t2)

Similarity Evaluation

Max(xy) Cosine(Cx(t1) Cy(t2))

End

Concept Clustering

for each Cluster Ci(t1)

Select top k Concept namely cx

Step 1 Type Checking

Step 2 Context Representation(Vector)

Step 3 Context Similarity

An example [Li et al 2013 Li et al 2015]

For example

<banana, pear>

<banana, pear>

Entity Pairs (Type Checking)

Concept Context Collection

Similarity Evaluation: Cosine(T(t1), T(t2)) = 0.916

Step 1 Type Checking

Step 2 Context Representation(Vector)

Step 3 Context Similarity
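A minimal sketch of steps 2 and 3 above: each term is represented by one or more concept-cluster context vectors, and the pair is scored by the best cosine over cluster pairs. The cluster vectors here are toy values, not taken from a real semantic network.

```python
# Sketch: term similarity as max cosine over concept-cluster context vectors.
import math

def cosine(v1, v2):
    keys = set(v1) | set(v2)
    num = sum(v1.get(k, 0.0) * v2.get(k, 0.0) for k in keys)
    den = (math.sqrt(sum(x * x for x in v1.values()))
           * math.sqrt(sum(x * x for x in v2.values())))
    return num / den if den else 0.0

# Each term -> a few concept clusters, each cluster a weighted concept vector.
clusters = {
    "banana": [{"fruit": 0.8, "food": 0.5, "crop": 0.2}],
    "pear":   [{"fruit": 0.7, "food": 0.6, "tree": 0.1}],
    "apple":  [{"fruit": 0.6, "food": 0.4},          # fruit sense
               {"company": 0.7, "brand": 0.5}],      # company sense
}

def term_similarity(t1, t2):
    return max(cosine(c1, c2) for c1 in clusters[t1] for c2 in clusters[t2])

print(round(term_similarity("banana", "pear"), 3))
print(round(term_similarity("apple", "pear"), 3))
```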

Examples

Term 1 | Term 2 | Similarity
lunch | dinner | 0.9987
tiger | jaguar | 0.9792
car | plane | 0.9711
television | radio | 0.9465
technology company | microsoft | 0.8208
high impact sport | competitive sport | 0.8155
employer | large corporation | 0.5353
fruit | green pepper | 0.2949
travel | meal | 0.0426
music | lunch | 0.0116
alcoholic beverage | sports equipment | 0.0314
company | table tennis | 0.0003

Full results: http://adapt.seiee.sjtu.edu.cn/similarity/SimCompleteResults.pdf

Statistics of Search Queries

(a) By traffic: 1 term 44%, 2 terms 29%, 3 terms 17%, 4 terms 7%, 5 terms 2%, more than 5 terms 1%.

(b) By # of distinct queries: 1 term 10%, 2 terms 26%, 3 terms 34%, 4 terms 19%, with the remaining share for queries of 5 or more terms.

Example single-instance queries: Pokémon Go, Microsoft HoloLens.

[Legend residue: companion charts break queries down by number of instances (1, 2, 3, 4, 5, more than 5).]

If the short text has context for the instancehellip

• python tutorial
• dangerous python
• moon earth distance
• …

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Supervised Segmentation [Bergsma et al 2007]

bull Problem divide query into semantic units

bull Approach turn segmentation into position-based binary classification

Example Query

Two man power saw

[two man] [power saw]
[two] [man] [power saw]
[two] [man power] [saw]

Input a query and its positions

Output the decision for making segmentation at each position

Supervised Segmentation

• Features
  • Decision boundary features: indicators such as the POS tags in the query, position features (forward/backward)
  • Statistical features: mutual information between the left and right parts, e.g. "Bank loan | amortization schedule"
  • Context features: surrounding context information
  • Dependency features: e.g. "female" depends on "bus driver"

Supervised Segmentation

bull Segmentation Overview

Input query: "two man power saw", with one candidate boundary after each of "two", "man", "power"

SVM classifier (learned features)

Output: segmentation decision for each position (yes/no)
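A minimal sketch of the position-classification view, assuming some bigram statistics are available as features; the feature set, bigram counts and tiny training set below are illustrative assumptions, and a real system would use the full feature groups listed above.

```python
# Sketch: query segmentation as per-position binary classification (break / no break).
from sklearn.svm import LinearSVC

BIGRAM_COUNT = {("new", "york"): 500, ("york", "times"): 300,
                ("power", "saw"): 120, ("two", "man"): 40, ("man", "power"): 2}

def features(words, i):
    left, right = words[i], words[i + 1]
    return [BIGRAM_COUNT.get((left, right), 0), i, len(words)]

# Training data: (query words, break decision between adjacent words), 1 = boundary.
train = [(["two", "man", "power", "saw"], [0, 1, 0]),
         (["new", "york", "times", "subscription"], [0, 0, 1])]

X, y = [], []
for words, labels in train:
    for i, lab in enumerate(labels):
        X.append(features(words, i))
        y.append(lab)

clf = LinearSVC(C=1.0).fit(X, y)

query = ["two", "man", "power", "saw"]
decisions = [clf.predict([features(query, i)])[0] for i in range(len(query) - 1)]
print(decisions)   # 1 marks a segment boundary between position i and i+1
```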

Unsupervised Segmentation [Tan et al 2008]

bull Unsupervised learning for query segmentation

Probability of a generated segmentation S for query Q (unigram model over segments):

P(S|Q) = P(s_1) P(s_2|s_1) … P(s_m|s_1 s_2 … s_{m-1}) ≈ ∏_{s_i ∈ S} P(s_i)

Valid segment boundary if and only if the pointwise mutual information between the two segments resulting from the split is negative:

MI(s_k, s_{k+1}) = log [ P_c([s_k s_{k+1}]) / (P_c(s_k) · P_c(s_{k+1})) ] < 0

Example: "new york times subscription" with s_1 = "new", s_2 = "york":
log [ P_c([new york]) / (P_c(new) · P_c(york)) ] > 0, so there is no segment boundary between "new" and "york".

Unsupervised Segmentation

bull Find top k segmentations dynamic programming

bull Using EM optimization on the fly

Input: query w_1 w_2 … w_n (the words in a query), concept probability distribution

Output: top-k segmentations with the highest likelihood (a sketch of the dynamic program follows below)
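The following is a minimal sketch of the unigram-model dynamic program, assuming segment probabilities are already estimated; the probability table, back-off constant and maximum segment length are toy assumptions (a real system estimates P from a large corpus, refining it with EM on the fly).

```python
# Sketch: best segmentation under a unigram segment model, by dynamic programming.
import math

P = {"new york": 1e-4, "new york times": 5e-5, "times": 1e-3,
     "new": 2e-3, "york": 8e-4, "subscription": 5e-4}
MIN_P = 1e-9   # back-off probability for unseen segments

def seg_logp(words):
    return math.log(P.get(" ".join(words), MIN_P))

def best_segmentation(tokens, max_len=3):
    n = len(tokens)
    best = [(-math.inf, None)] * (n + 1)   # best[i] = (score, backpointer) for prefix of length i
    best[0] = (0.0, None)
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            score = best[j][0] + seg_logp(tokens[j:i])
            if score > best[i][0]:
                best[i] = (score, j)
    segs, i = [], n
    while i > 0:
        j = best[i][1]
        segs.append(" ".join(tokens[j:i]))
        i = j
    return list(reversed(segs))

print(best_segmentation("new york times subscription".split()))
```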

Exploit Click-through [Li et al 2011]

bull Motivationbull Probabilistic query segmentation

bull Use click-through data

Output: top-3 segmentations

[bank of america] [online banking] 0.502

[bank of america online banking] 0.428

[bank of] [america] [online banking] 0.001

Q -> URL -> D: query, document, click data

Input Query: bank of america online banking

Exploit Click-through

bull Segmentation Model

An interpolated model

global info Click-throughinfo

[credit card] [bank of America]

1. bank of america credit cards contact us overview
2. secured visa credit card from bank of america
3. credit cards overview find the right bank of america credit card for you

Query

Clicked html documents

global info

Click-through info

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Sense Changes with Different Context

watch harry potter read harry potter age harry potter

Movie Book Character

harry potter walkthrough

Game

Entity Recognition in Query [Guo et al 2009]

bull Motivation

Detect named entity in a short text and categorize it

harry potter walkthrough

Single-named-entity query

Example: ("harry potter", "walkthrough", "game")

triple <e, t, c>: e = the ambiguous term (named entity), t = the context terms, c = the class of the entity

Entity Recognition in Query

bull Probabilistic Generative Model

Goal: given a query q, find the triple <e, t, c> that maximizes the probability

Probability to generate the triple: assume the context only depends on the class, so Pr(e, t, c) = Pr(e) Pr(c|e) Pr(t|c)

Objective: given query q, find the triple <e, t, c> with the largest Pr(e) Pr(c|e) Pr(t|c)

The problem then becomes how to estimate Pr(e), Pr(c|e) and Pr(t|c)

E.g. "walkthrough" only depends on the class game instead of the entity harry potter

Entity Recognition in Query

• Probability Estimation by Learning

Learning objective: max Σ_{i=1}^{N} P(e_i, t_i, c_i)

Challenge: difficult as well as time consuming to manually assign class labels to named entities in queries

Build a training set T = {(e_i, t_i)} and view c_i as a hidden variable

New learning problem: max Σ_{i=1}^{N} P(e_i, t_i) = max Σ_{i=1}^{N} Σ_c P(e_i) P(c|e_i) P(t_i|c)

solved with topic model WS-LDA
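Once the three probability tables are estimated, query interpretation is a small argmax. The sketch below illustrates only the scoring step with hand-set toy tables; it does not implement WS-LDA, and the entity/class/context values are illustrative assumptions.

```python
# Sketch: interpret a query by maximizing Pr(e) * Pr(c|e) * Pr(t|c) over candidate triples.
PR_E = {"harry potter": 0.6, "kung fu panda": 0.4}
PR_C_GIVEN_E = {"harry potter": {"movie": 0.4, "book": 0.4, "game": 0.2}}
PR_T_GIVEN_C = {"game": {"walkthrough": 0.5, "cheats": 0.3},
                "movie": {"trailer": 0.4, "cast": 0.3},
                "book": {"author": 0.4, "pdf": 0.2}}

def interpret(query):
    best = (0.0, None)
    for e in PR_E:                              # candidate named entities in the query
        if e not in query:
            continue
        t = query.replace(e, "").strip()        # remaining context terms
        for c, p_ce in PR_C_GIVEN_E.get(e, {}).items():
            score = PR_E[e] * p_ce * PR_T_GIVEN_C.get(c, {}).get(t, 1e-6)
            if score > best[0]:
                best = (score, (e, t, c))
    return best

print(interpret("harry potter walkthrough"))    # the class "game" should score highest
```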

Signal from Click [Pantel et al 2012]

bull Motivation

Predict entity type in Web search

entity

user intent

context

click

Query type distribution (73 types)

Generative model

entity type


Signal from Click

bull Joint Model for Prediction

[Plate-diagram residue, joint model for prediction: variables t, τ, i, n, c with parameters θ, φ, ω over Q queries; a distribution over types, an intent distribution, a word distribution, a host distribution and an entity distribution; for each query: pick a type, pick an entity, pick an intent, pick the context words, pick a click.]

Telegraphic Query interpretation [Sawant et al 2013 Joshi et al 2014]

bull Entity-seeking Telegraphic Queries

bull Interpretation = Segmentation + Annotation

Knowledge base Large corpus

accuracy recall

Germany capital

Berlin

Query

Result Entity

bull Overview

Joint Interpretation and Ranking [Sawant et al 2013 Joshi et al 2014]

Annotated Corpus

Telegraphic Query

e1e2e3

Two Models for Interpretation and Ranking

Generative Model

Discriminative Model

Output

bull Generative Model

Joint Interpretation and Ranking [Sawant et al 2013]

San Diego Padres

Major league baseball team

type context

E

T Padres have been to two World

Series losing in 1984 and 1998

Type hint

baseball team

losing team baseball world series 1998

Z

Context matchers

lost 1998 world seriesswitch

model model

q losing team baseball world series 1998

Borrow from U Sawant (2013)

Based on Probabilistic Language Models

bull Discriminative Model

Joint Interpretation and Ranking [Sawant et al 2013]

[Figure: discriminative ranking of candidate interpretations for the query "losing team baseball world series 1998": the correct entity San_Diego_Padres with type hint t = baseball team, versus the incorrect entity 1998_World_Series with t = series.]

Based on max-margin discriminative learning

bull Queries seek answer entities (e2)

bull Contain (query) entities (e1) target types (t2) relations (r) and selectors (s)

Telegraphic Query Interpretation [Joshi et al 2014]

query e1 r t2 s

dave navarro first band

dave navarro band band first

dave navarro - band first

spider automobile company

spider automobile company

automobile company

-

automobile company company spider

Borrow from M Joshi (2014)

Improved Generative Model

bull Generative Model[Sawant et al 2013]

[Joshi et al 2014]Consider e1

(in q) and r

Improved Discriminative Model

bull Discriminative Model[Sawant et al 2013]

[Joshi et al 2014]

Consider e1

(in q) and r

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

bull Input a short text

bull Output semantic interpretation

bull Three steps in understanding a short text

wanna watch eagles band

watch[verb] eagles[entity](band) band[concept]

wanna watch eagles band watch[verb] eagles[entity](band) band[concept]

watch eagles band watch[verb] eagles[entity] band[concept]

Step 1 Text Segmentation ndash divide into a sequence of terms in vocabulary

Step 2 Type detection ndash determine the best type of each term

Step 3 Concept Labeling ndash infer the best concept of each entity within context

Text segmentation
• Observations:

• Mutual Exclusion: terms containing the same word mutually exclude each other

• Mutual Reinforcement: related terms mutually reinforce each other

• Build a Candidate Term Graph (CTG)

Examples: "vacation april in paris", "watch harry potter"

[Figure: candidate term graphs, with nodes such as "vacation", "april", "paris", "april in paris" and "watch", "harry potter", annotated with term scores and weighted edges between compatible terms.]

Find best segmentation

bull Best segmentation= sub-graph in CTG whichbull Is a complete graph (clique)

bull No mutual exclusion

bull Has 100 word coveragebull Except for stopwords

bull Has the largest average edge weight

In the candidate term graphs above, many sub-graphs are valid segmentations, but the best segmentation is the maximal clique with full word coverage and the largest average edge weight, e.g. {vacation, april in paris} and {watch, harry potter}.
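A minimal brute-force sketch of this selection step over a toy candidate term graph; the candidate terms, word coverage sets and edge weights are illustrative assumptions (a real system would enumerate cliques far more efficiently).

```python
# Sketch: best segmentation = clique of compatible candidate terms with full word
# coverage and the largest average edge weight.
from itertools import combinations

words = ["watch", "harry", "potter"]
candidates = {"watch": {0}, "harry": {1}, "potter": {2}, "harry potter": {1, 2}}
weight = {("watch", "harry potter"): 0.092, ("watch", "harry"): 0.014,
          ("watch", "potter"): 0.018, ("harry", "potter"): 0.053}

def edge(a, b):
    return weight.get((a, b), weight.get((b, a)))

def avg_weight(terms):
    pairs = list(combinations(terms, 2))
    return sum(edge(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0

best, best_score = None, -1.0
for r in range(1, len(candidates) + 1):
    for terms in combinations(candidates, r):
        covered = set().union(*(candidates[t] for t in terms))
        overlap = sum(len(candidates[t]) for t in terms) != len(covered)  # mutual exclusion
        if overlap or covered != set(range(len(words))):
            continue
        if all(edge(a, b) is not None for a, b in combinations(terms, 2)):  # is a clique
            score = avg_weight(terms)
            if score > best_score:
                best, best_score = terms, score

print(best, round(best_score, 3))   # expected: ('watch', 'harry potter')
```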

Type Detection

• Pairwise Model
• Find the best typed-term for each term so that the Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight

[Figure: typed-term candidates for "watch free movie": watch[v] / watch[e] / watch[c], free[adj] / free[v], movie[c] / movie[e].]

Concept Labeling

• Entity disambiguation is the most important task of concept labeling
• Filter/re-rank the original concept cluster vector

• Weighted-Vote (sketched below)
• The final score of each concept cluster is a combination of its original score and the support from context, using concept co-occurrence

watch harry potter → movie;  read harry potter → book
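A minimal sketch of the weighted vote; the concept scores, co-occurrence values and the mixing weight alpha are illustrative assumptions, not values from the paper.

```python
# Sketch: re-rank candidate concepts by mixing the original conceptualization score
# with co-occurrence support from the context.
CO_OCCUR = {("movie", "verb:watch"): 0.9, ("book", "verb:watch"): 0.2,
            ("movie", "verb:read"): 0.1, ("book", "verb:read"): 0.9}

def weighted_vote(concept_scores, context, alpha=0.5):
    reranked = {}
    for concept, original in concept_scores.items():
        support = sum(CO_OCCUR.get((concept, ctx), 0.0) for ctx in context) / len(context)
        reranked[concept] = alpha * original + (1 - alpha) * support
    return sorted(reranked.items(), key=lambda kv: kv[1], reverse=True)

harry_potter = {"movie": 0.45, "book": 0.40, "character": 0.15}
print(weighted_vote(harry_potter, context=["verb:watch"]))   # movie should win
print(weighted_vote(harry_potter, context=["verb:read"]))    # book should win
```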

Example of Entity Disambiguation[Hua et al 2015 (ICDE Best Paper) Hua et al 2016]

Co-occurrence network

Concept Vector

Semantic network

Short Text

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

[Figure: for "ipad apple", each term gets a concept vector [(c1, p1), (c2, p2), (c3, p3), …] from the isA network ("apple": fruit…, company…, food…, product…; "ipad": product…, device…); co-occurrence-based filtering keeps the coherent senses (company, brand, product, device).]

Mining Lexical Relationships[Wang et al 2015b]

• Lexical knowledge represented by the probabilities, where e is an instance, t a term, c a concept and z a role:

  p(z|t), e.g. p(verb|watch), p(instance|watch)
  p(c|t, z), e.g. p(movie|watch, verb)
  p(c|e) = p(c | t, z = instance), e.g. p(movie|harry potter), p(book|harry potter)

Example "watch harry potter": concepts such as movie, book and product compete, and the role of "watch" (verb) helps select the movie sense.

Understanding Queries [Wang et al 2015b]

• Goal: to rank the concepts and find arg max_c p(c | t, q)

The offline semantic network; the query and all of its possible segmentations define an online subgraph

Random walk with restart [Sun et al 2005] on the online subgraph (a sketch follows below)
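A minimal sketch of random walk with restart over a small term/concept subgraph; the nodes, edge weights, restart vector and restart probability are toy choices for illustration, not values from the paper.

```python
# Sketch: random walk with restart (RWR) to rank concepts for a query subgraph.
import numpy as np

nodes = ["watch", "harry potter", "movie", "book", "device"]
A = np.array([   # symmetric edge weights (term-term, term-concept, concept co-occurrence)
    [0.0, 0.6, 0.5, 0.1, 0.3],
    [0.6, 0.0, 0.7, 0.6, 0.0],
    [0.5, 0.7, 0.0, 0.2, 0.0],
    [0.1, 0.6, 0.2, 0.0, 0.0],
    [0.3, 0.0, 0.0, 0.0, 0.0],
])
P = A / A.sum(axis=1, keepdims=True)             # row-normalized transition matrix

restart = np.array([0.5, 0.5, 0.0, 0.0, 0.0])    # restart at the query terms
c = 0.3                                          # restart probability
r = restart.copy()
for _ in range(100):                             # power iteration to (near) convergence
    r = (1 - c) * (P.T @ r) + c * restart

for name, score in sorted(zip(nodes, r), key=lambda x: -x[1]):
    if name in ("movie", "book", "device"):      # rank only the concept nodes
        print(f"{name:7s} {score:.3f}")
```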

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

• Example: "popular smart cover iphone 5s"

• Definitions:
  • Head: acts to name the general (semantic) category to which the whole short text belongs. Usually the head is the intent of the short text.
    • "smart cover": intent of the query
  • Constraints: distinguish this member from other members of the same category.
    • "iphone 5s": limits the type of the head
  • Non-Constraint Modifiers (aka Pure Modifiers): subjective modifiers which can be dropped without changing the intent.
    • "popular": subjective, can be neglected

Non-Constraint Modifiers Mining Construct Modifier Networks

Edges form a Modifier Network

Concept Hierarchy Tree in the "Country" domain

Modifier Network in the "Country" domain. In this case, "Large" and "Top" are pure modifiers.

[Figure: concept hierarchy Country → Asian / Developed / Western country → Western developed country, Top western country, with modifier nodes Asian, Developed, Western, Large, Top.]

Large Top

Top

WesternLarge Asian

country

Large developed

country

Top developed

country

Country

Asian Western

Developed

Large Top

bull Betweenness centrality is a measure of a nodes centrality in a network

bull Betweennes of node v is defined as

bull where 120590119904119905 is the total number of shortest paths from node s to node t and 120590119904119905(119907) is the number of those paths that pass through v

bull Normalization amp Aggregation

bull For a pure modifier it should have low betweenness centrality aggregation score PMS(t)

Non-Constraint Modifiers Mining Betweenness centrality

Head-Constraints Mining [Wang et al 2014b]

bull A term can be a head sometimes and be a constraint in some other cases

bull Eg Seattle hotel Seattle hotel job

head headconstraintconstraintconstraint

Head-Constraints Mining Acquiring Concept Patterns

Get entity pairs from query log

Conceptualization

Concept Patterns for each

prepositionsentity1 entity2

Extract Patterns

A for B A of BA with B A in BA on B A at B hellip

entity 1head entity 2constraint

concept11

concept12

concept13

concept14

concept21

concept22

concept23

(concept11 concept21) (concept11 concept22)(concept11 concept23)hellip

Concept Pattern Dictionary

Building concept pattern dictionary Query Logs

cover for iphone 6sbattery for sony a7rwicked on broadway

Why Concepts Canrsquot Be Too Generalbull It may cause too many concept pattern conflicts

canrsquot distinguish head and modifier for general concept pairs

Head Modifier

Derived Concept Pattern device company

Supporting Entity Pairs iphone 4 verizon

modem comcast

wireless router comcast

iphone 4 tmobile

Head Modifier

Derived Concept Pattern company device

Supporting Entity Pairs amazon books kindle

netflix touchpad

skype windows phone

netflix ps3

Conflict

Why Concepts Canrsquot Be Too Specificbull It may generate concepts with little coverage

bull Concept regresses to entitybull Large storage space up to (million million) patterns

hellip hellip

device largest desktop OS vendor

device largest software development company

device largest global corporation

device latest windows and office provider

hellip hellip

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept PatternsCluster size Sum of Cluster Score headconstraintscore

615 2114691 breedstate357298460224501

296 7752357 gameplatform627403476771856

153 3466804 accessoryvehicle53393705094809

70 118259 browserplatform132612807637391

22 1010993 requirementschool271407526294823

34 9489159 drugdisease154602405333541

42 8992995 cosmeticskin condition814659415003929

16 7421599 jobcity27903732555528

32 710403 accessoryphone246513830851194

18 6692376 softwareplatform210126322725878

20 6444603 testdisease239774028397537

27 5994205 clothesbreed98773996282851

19 5913545 penaltycrime200544192793488

25 5848804 taxstate240081818612579

16 5465424 saucemeat183592863621553

18 4809389 credit cardcountry142919087972152

14 4730792 foodholiday14554140330924

11 4536199 modgame257163856882439

29 4350954 garmentsport471533326845442

23 3994886 career informationprofessional732726483731257

15 386065 songinstrument128189481818135

18 378213 baitfish780426514113169

22 3722948 study guidebook508339765053921

19 3408953 pluginsbrowser550326072627126

14 3305753 recipemeat882779863422951

18 3214226 currencycountry110825444188352

13 3180272 lenscamera186081673263957

9 316973 decorationholiday130055844126533

16 314875 foodanimal7338544366514

game platform

game device

video game platform

game console game pad

game gaming platform

Game (Head) Platform (Modifier)

angry birds android

angry birds ios

angry birds windows 10

hellip hellip

Detection

Head Modifier Relationship

bull Train a classifier on

(head-embedding modifier-embedding)

bull Training data bull Positive (head modifier)bull Negative (modifier head)

bull Precision gt= 09 Recall gt= 09

bull Disadvantage not interpretable

Syntactic Parsing based on HM

bull Information is incompletebull Preposition and other function words

bull Within a noun compound el capitan macbook pro

bull Why not train a parser for web queries

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges Short Texts Lack Grammatical Signalsbull Lack function words word order

bull ldquotoys queriesrdquo has ambiguous intent

bull ldquodistance earth moonrdquo has clear intentbull many equivalent forms ldquoearth moon distancerdquo ldquoearth

distance moonrdquo hellip

Challenges Syntactic Parsing of Queries

bullNo standard

bullNo ground-truth

Why is syntactic parsing of queries even a legitimate problem

Derive Syntax from Semantics[Sun et al 2016]

bull Query ldquothai food houstonrdquo

bull Clicked sentence

bull Project dependency to the query

A Treebank for Short Texts

bull Given query 119902

bull Given 119902rsquos clicked sentence 119904

bull Parse each 119904

bull Project dependency from 119904 to 119902

bull Aggregate dependencies

Algorithm of Projection

Result Examples

Results

bull Random queries

QueryParser UAS 083 LAS 075Stanford UAS 072 LAS 064

bull Queries with no function words

QueryParser UAS 082 LAS 073Stanford UAS 070 LAS 061

bull Queries with function words

QueryParser UAS 090 LAS 085Stanford UAS 086 LAS 080

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

bull Measuring similarity between two short texts and sentences

bull Basic idea word-by-word comparison using embedding vector

bull Use saliency-weighted semantic graph to computer similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired

Bins of all edges Bins of max edges

119908isin119904119868

119868119863119865(119908) sdot)119904119890119898(119908 119904119904) sdot (1198961 + 1

൰119904119890119898(119908 119904119904) + 1198961 sdot (1 minus 119887 + 119887 sdot|119904119904|119886119907119892119897

Similarity measurement

termShort texts

Inspired by BM25

Semantic similarity

119891119904119905119904(119904119897 119904119904) =

From the Concept View

From the Concept View [Wang et al 2015a]

Co-occurrence Network

Bags of Concepts

Semantic Network

Short Text 1

Short Text 2

Concept Vector 1[(c1 score1) (c2 score2)hellip]

Concept Vector 2[(c1rsquo score1rsquo) (c2rsquo score2rsquo)hellip]

Similarity

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

bull Explicit short text understanding benefit lot of application scenariosbull Adssearch semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

000

100

200

300

400

500

600

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

000

010

020

030

040

050

060

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

Mainline Ads Sidebar Ads

Definition Mining [Hao et al 2016]

bull Definition scenarios search engines QnA etc

bull Why Conceptualization is useful for definition miningbull Examples ldquoWhat is Emphysemardquo

Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

bull This sentence has the form of definitionbull Embedding is helpful to some extent but it also return high similarity

score for (emphysema disease) and (emphysema smoking)

bull Conceptualization can provide strong semanticsbull Contextual embedding can also provide semantic similarity beyond Is-A

Answer 1

Answer 2

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

OfflineOffline OnlineOnline

Original Short textjustin bieber graduates

hellip

Knowledge base

Conceptualiztion

Concept Vector

Entity Extraction

Candidates Generation

Classification amp Ranking

Model LearningModel Learning

Concept Weighting

Model Model NModel i

Concept Model Concept Model

Class 1 Class NClass i

TrainingData

ltMusic Scoregt

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept SpaceArticle titlestagsin this category

119901119894

119901119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

hellip

hellip

120596119894

120596119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

120596119894

120596119895119901119894

119901119895

Precision performance on each category [Wang et al 2014a]

BocSTC LM_ch SVMVSM_cosi

neLM_d Entity_ESA

Movie 071 091 084 081 072 056

Money 097 095 054 057 052 074

Music 097 090 088 073 068 058

TV 096 046 092 056 051 055

0304050607080910

Pre

cisi

on

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledgeltigt in SIGMOD 2008

bull [ Auer et al 2007 ] Soumlren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Boumlrschinger Caumlcilia Zirn and AnasElghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and XindongWu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny BoyesBraem Basic objects in natural categories Cognitive psychology 8(3)382ndash439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny ZhuToward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 36: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Examples (cont)bull Level 1

bull nike score = 0034bull company store 0861bull brand 0035bull shoe product 0033

bull twitter score = 0035bull website tool 0612bull network 0165bull application 0033bull company 0031

bull facebook score = 0037bull website tool 0595bull network 017bull company 0053bull application 0029

bull yahoo score = 038bull search engine 0457bull company provider account 0281bull website 00656

bull google score = 0507bull search engine 046bull company provider organization 0377bull website 00449

Examples (cont)

bull Level 2bull jordan score = 102

bull country state company regime 092bull shoe 002

bull fox score = 109bull animal predator species 074bull network 0064bull company 0035

bull puma score = 115bull brand company shoe 0655bull species cat 0116

bull gold score = 121bull metal material mineral resource mineral062bull color 0128

Examples (cont)

bull Level 2bull soap score = 122

bull product toiletry substance 049bull technology industry standard 011

bull silver score = 124bull metal material mineral resource mineral 0638bull color 0156

bull python score = 129bull language 0667bull snake animal reptile skin 0193

bull apple score = 141bull fruit food tree 0537bull company brand 0271

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

A Concept View of ldquoMicrosoftrdquo

company

largest desktop OS vendor

softwarecompany

international company

technology leader

Microsoft

largest desktop OS vendorcompany hellip hellip

software company

Basic-level Conceptualization (BLC)[Rosch et al 1976]

KFC

BMW

Basic-level conceptualization

How to Make BLC

bull Naive approachesbull Typicality an important measure for understanding the

relationship between an object and its concept

bull Pointwise Mutual Information (PMI) a common measure of the strength of association between two terms

bird

Naive Approach 1 Typicality

P(robin|bird) gt P(penguin|bird)ldquorobinrdquo is a more typical bird than a ldquopenguinrdquo

country

SeychellesUSA

P(USA|country) gt P(Seychelles|country)ldquoUSArdquo is a more typical country than ldquoSeychellesrdquo

penguinrobin

Using Typicality for BLC

bull Associate each isA relationship (119890 is 119888) with typicality scores 119875 119890 119888 and 119875 119888 119890

119875 119890 119888 =119899 119888 119890

119899 119888119875(119888|119890) =

119899 119888 119890

119899(119890)

bull P(e|c) indicates how typical (or popular) e is in the given concept c

bull P(c|e) indicates how typical (or popular) the concept c is given e

bull However

Microsoft

largest desktop OS vendorcompanyhigh typicality p(c|e) high typicality p(e|c)

Naive Approach 2 PMI[Manning and Schutze 1999]

bull Pointwise mutual information (PMI) is a measure of association used in information theory and statistics

bull Consider using the PMI between concept c and instance e to find the basic-level concepts as follows

119875119872119868(119890 119888) = log119875(119890 119888)

119875(119890)119875(119888)= log119875(119890|119888) minus log119875(119890)

bull However bull In basic level of categorization we are interested in finding a

concept for a given e which means P(e) is a constant

bull Thus ranking by PMI(e c) is the same as ranking by P(e|c)

Using Rep(e, c) for BLC [Wang et al 2015b]
• The measure Rep(e, c) = P(c|e) · P(e|c) means:
  • given e, c should be its typical concept (shortest distance), and
  • given c, e should be its typical instance (shortest distance)
• (Relation to PMI) Taking the logarithm of the scoring function:

  log Rep(e, c) = log [ P(c|e) · P(e|c) ] = log [ P(e, c)/P(e) · P(e, c)/P(c) ] = log [ P(e, c)² / (P(e) P(c)) ] = PMI(e, c) + log P(e, c) = PMI²(e, c)

• (Relation to Commute Time) The commute time between an instance e and a concept c is

  Time(e, c) = Σ_{k=1..∞} 2k · P_k(e, c) = Σ_{k=1..T} 2k · P_k(e, c) + Σ_{k=T+1..∞} 2k · P_k(e, c)
             ≥ Σ_{k=1..T} 2k · P_k(e, c) + 2(T + 1) · (1 − Σ_{k=1..T} P_k(e, c)) = 4 − 2 · Rep(e, c)

• Basic-level conceptualization is thus a process of finding the concept nodes having the shortest expected distance from e
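A minimal sketch of how Rep(e, c) can be used to rank candidate basic-level concepts; the isA counts below are invented for illustration and the helper function is hypothetical, not the authors' implementation:

```python
# Rank candidate concepts of an instance by Rep(e, c) = P(c|e) * P(e|c).
def rank_basic_level_concepts(e, isa_counts):
    n_e = sum(n for (c, inst), n in isa_counts.items() if inst == e)
    scores = {}
    for (c, inst), n in isa_counts.items():
        if inst != e:
            continue
        n_c = sum(m for (cc, _), m in isa_counts.items() if cc == c)
        p_c_given_e = n / n_e
        p_e_given_c = n / n_c
        scores[c] = p_c_given_e * p_e_given_c          # Rep(e, c)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

toy_counts = {
    ("company", "microsoft"): 800,
    ("company", "google"): 40000,
    ("company", "ibm"): 40000,
    ("software company", "microsoft"): 300,
    ("software company", "oracle"): 200,
    ("largest desktop os vendor", "microsoft"): 5,
}
# "software company" scores highest: too-general concepts are penalized by
# P(e|c), too-specific concepts are penalized by P(c|e).
print(rank_basic_level_concepts("microsoft", toy_counts))
```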

Evaluations on Different Measures for BLC
(Precision / NDCG at k = 1, 2, 3, 5, 10, 15, 20; the two result tables below are from the original slide.)

Table 1 (k = 1, 2, 3, 5, 10, 15, 20)

No smoothing:
• MI(e): 0.769, 0.692, 0.705, 0.685, 0.719, 0.705, 0.690
• PMI3(e): 0.885, 0.769, 0.756, 0.800, 0.754, 0.733, 0.721
• NPMI(e): 0.692, 0.692, 0.667, 0.638, 0.627, 0.610, 0.610
• Typicality P(c|e): 0.462, 0.577, 0.603, 0.577, 0.569, 0.564, 0.556
• Typicality P(e|c): 0.500, 0.462, 0.526, 0.523, 0.523, 0.510, 0.521
• Rep(e): 0.846, 0.865, 0.872, 0.862, 0.758, 0.731, 0.719

Smoothing = 0.001:
• MI(e): 0.577, 0.615, 0.628, 0.600, 0.612, 0.605, 0.592
• PMI3(e): 0.731, 0.673, 0.692, 0.654, 0.669, 0.644, 0.623
• NPMI(e): 0.923, 0.827, 0.769, 0.746, 0.731, 0.695, 0.671
• Typicality P(c|e): 0.462, 0.577, 0.603, 0.577, 0.569, 0.564, 0.554
• Typicality P(e|c): 0.885, 0.865, 0.872, 0.831, 0.785, 0.741, 0.704
• Rep(e): 0.846, 0.731, 0.718, 0.723, 0.700, 0.669, 0.638

Smoothing = 0.0001:
• MI(e): 0.615, 0.615, 0.654, 0.608, 0.635, 0.628, 0.612
• PMI3(e): 0.846, 0.731, 0.731, 0.715, 0.723, 0.685, 0.677
• NPMI(e): 0.885, 0.904, 0.885, 0.869, 0.823, 0.777, 0.752
• Typicality P(c|e): 0.462, 0.577, 0.603, 0.577, 0.569, 0.564, 0.556
• Typicality P(e|c): 0.885, 0.904, 0.910, 0.877, 0.831, 0.813, 0.777
• Rep(e): 0.923, 0.846, 0.833, 0.815, 0.781, 0.736, 0.719

Smoothing = 1e-5:
• MI(e): 0.615, 0.635, 0.667, 0.662, 0.677, 0.656, 0.646
• PMI3(e): 0.885, 0.769, 0.744, 0.777, 0.758, 0.731, 0.710
• NPMI(e): 0.885, 0.846, 0.872, 0.869, 0.831, 0.810, 0.787
• Typicality P(c|e): 0.462, 0.577, 0.603, 0.577, 0.569, 0.564, 0.556
• Typicality P(e|c): 0.769, 0.808, 0.846, 0.823, 0.808, 0.782, 0.765
• Rep(e): 0.885, 0.904, 0.872, 0.862, 0.812, 0.800, 0.767

Smoothing = 1e-6:
• MI(e): 0.769, 0.673, 0.705, 0.677, 0.700, 0.692, 0.679
• PMI3(e): 0.885, 0.769, 0.756, 0.785, 0.773, 0.726, 0.723
• NPMI(e): 0.885, 0.846, 0.821, 0.815, 0.750, 0.726, 0.719
• Typicality P(c|e): 0.462, 0.577, 0.603, 0.577, 0.569, 0.564, 0.556
• Typicality P(e|c): 0.538, 0.615, 0.615, 0.615, 0.608, 0.613, 0.615
• Rep(e): 0.846, 0.885, 0.897, 0.877, 0.788, 0.777, 0.765

Smoothing = 1e-7:
• MI(e): 0.769, 0.692, 0.705, 0.685, 0.719, 0.703, 0.688
• PMI3(e): 0.885, 0.769, 0.756, 0.792, 0.758, 0.736, 0.725
• NPMI(e): 0.769, 0.750, 0.718, 0.700, 0.650, 0.641, 0.633
• Typicality P(c|e): 0.462, 0.577, 0.603, 0.577, 0.569, 0.564, 0.556
• Typicality P(e|c): 0.500, 0.481, 0.526, 0.523, 0.531, 0.523, 0.523
• Rep(e): 0.846, 0.865, 0.872, 0.854, 0.765, 0.749, 0.733

Table 2 (k = 1, 2, 3, 5, 10, 15, 20)

No smoothing:
• MI(e): 0.516, 0.531, 0.519, 0.531, 0.562, 0.574, 0.594
• PMI3(e): 0.725, 0.664, 0.652, 0.660, 0.628, 0.631, 0.646
• NPMI(e): 0.599, 0.597, 0.579, 0.554, 0.540, 0.539, 0.549
• Typicality P(c|e): 0.297, 0.380, 0.409, 0.422, 0.438, 0.446, 0.461
• Typicality P(e|c): 0.401, 0.386, 0.396, 0.398, 0.401, 0.410, 0.428
• Rep(e): 0.758, 0.771, 0.745, 0.723, 0.656, 0.647, 0.661

Smoothing = 1e-3:
• MI(e): 0.374, 0.414, 0.441, 0.448, 0.473, 0.481, 0.495
• PMI3(e): 0.484, 0.511, 0.509, 0.502, 0.519, 0.525, 0.533
• NPMI(e): 0.692, 0.652, 0.607, 0.603, 0.585, 0.585, 0.592
• Typicality P(c|e): 0.297, 0.380, 0.409, 0.422, 0.438, 0.446, 0.460
• Typicality P(e|c): 0.703, 0.697, 0.704, 0.681, 0.637, 0.628, 0.626
• Rep(e): 0.621, 0.580, 0.554, 0.561, 0.554, 0.555, 0.559

Smoothing = 1e-4:
• MI(e): 0.407, 0.430, 0.458, 0.462, 0.492, 0.503, 0.512
• PMI3(e): 0.648, 0.604, 0.579, 0.575, 0.578, 0.576, 0.590
• NPMI(e): 0.747, 0.777, 0.761, 0.737, 0.700, 0.685, 0.688
• Typicality P(c|e): 0.297, 0.380, 0.409, 0.422, 0.438, 0.446, 0.461
• Typicality P(e|c): 0.791, 0.795, 0.802, 0.767, 0.738, 0.729, 0.724
• Rep(e): 0.758, 0.714, 0.711, 0.689, 0.653, 0.636, 0.653

Smoothing = 1e-5:
• MI(e): 0.429, 0.465, 0.478, 0.501, 0.517, 0.528, 0.545
• PMI3(e): 0.725, 0.647, 0.642, 0.642, 0.627, 0.624, 0.638
• NPMI(e): 0.813, 0.779, 0.778, 0.765, 0.730, 0.723, 0.729
• Typicality P(c|e): 0.297, 0.380, 0.409, 0.422, 0.438, 0.446, 0.461
• Typicality P(e|c): 0.709, 0.728, 0.735, 0.722, 0.702, 0.696, 0.703
• Rep(e): 0.791, 0.787, 0.762, 0.739, 0.707, 0.703, 0.706

Smoothing = 1e-6:
• MI(e): 0.516, 0.510, 0.515, 0.526, 0.546, 0.563, 0.579
• PMI3(e): 0.725, 0.655, 0.651, 0.654, 0.641, 0.631, 0.649
• NPMI(e): 0.791, 0.766, 0.732, 0.728, 0.673, 0.659, 0.668
• Typicality P(c|e): 0.297, 0.380, 0.409, 0.422, 0.438, 0.446, 0.461
• Typicality P(e|c): 0.495, 0.516, 0.520, 0.508, 0.512, 0.521, 0.540
• Rep(e): 0.758, 0.784, 0.767, 0.755, 0.691, 0.686, 0.694

Smoothing = 1e-7:
• MI(e): 0.516, 0.531, 0.519, 0.530, 0.562, 0.571, 0.592
• PMI3(e): 0.725, 0.664, 0.652, 0.658, 0.630, 0.631, 0.647
• NPMI(e): 0.670, 0.655, 0.633, 0.604, 0.575, 0.570, 0.581
• Typicality P(c|e): 0.297, 0.380, 0.409, 0.422, 0.438, 0.446, 0.461
• Typicality P(e|c): 0.423, 0.421, 0.415, 0.407, 0.414, 0.424, 0.438
• Rep(e): 0.758, 0.771, 0.745, 0.725, 0.663, 0.661, 0.668

Single Instance
• Is this instance ambiguous?
• What are its basic-level concepts?
• What are its similar instances?

What is Semantic Similarity?
• Are the following instance pairs similar?
  • <apple, microsoft>
  • <apple, pear>
  • <apple, fruit>
  • <apple, food>
  • <apple, ipad>
  • <car, journey>

Approaches on Term Similarity
• Categories of approaches for semantic similarity:
  • String-based approach
  • Knowledge-based approach — use preexisting thesauri, taxonomies, or encyclopedias such as WordNet
  • Corpus-based approach — use contexts of terms extracted from web pages, web search snippets, or other text repositories
  • Embedding-based approach — introduced in detail in "Part 3: Implicit Understanding"

Approaches on Term Similarity (2)
• Categories (figure: a taxonomy of state-of-the-art approaches):
  • String-based approaches
  • Knowledge-based approaches (WordNet): path length / lexical chain-based, and information content-based
  • Corpus-based approaches: graph learning algorithm-based, and snippet search-based
  • Representative works cited in the figure: Rada 1989, Resnik 1995, Jcn 1997, Lin 1998, Sánchez 2011, Agirre 2010, Alvarez 2007, HunTray 2005, Hirst 1998, Do 2009, Bol 2011, Chen 2006, Ban 2002

Term Similarity Using Semantic Networks [Li et al 2013, Li et al 2015]
• Framework for a term pair <t1, t2>:
  • Step 1: Type checking — decide whether the pair is a concept pair, an entity pair, or a concept-entity pair
  • Step 2: Context representation (vector) — collect concept-distribution or entity-distribution contexts; for an entity term, collect its concepts, cluster them, and select the top-k concepts of each cluster as its context vector
  • Step 3: Context similarity — e.g., Cosine(T(t1), T(t2)) for entity pairs; the maximum Cosine(Cx(t1), Cy(t2)) over cluster pairs for concept pairs; the maximum sim(t2, cx) over concepts cx for concept-entity pairs

An example [Li et al 2013, Li et al 2015]: <banana, pear>
• Step 1: Type checking — <banana, pear> is an entity pair
• Step 2: Context representation (vector) — collect the concept contexts of "banana" and "pear"
• Step 3: Context similarity — Cosine(T(t1), T(t2)) = 0.916
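A minimal sketch of Steps 2-3 under the assumption that each entity term has already been mapped to a concept-weight vector; the vectors below are toy stand-ins, not values from the probabilistic isA network:

```python
# Cosine similarity between two concept-weight vectors (sparse dicts).
import math

def cosine(u, v):
    keys = set(u) | set(v)
    dot = sum(u.get(k, 0.0) * v.get(k, 0.0) for k in keys)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# hypothetical concept distributions T(t) for the two entity terms
T_banana = {"fruit": 0.62, "food": 0.25, "plant": 0.08}
T_pear   = {"fruit": 0.58, "food": 0.30, "tree": 0.07}

print(round(cosine(T_banana, T_pear), 3))   # high similarity, as in the <banana, pear> example
```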

Examples (full results: http://adapt.seiee.sjtu.edu.cn/similarity/SimCompleteResults.pdf)

| Term 1 | Term 2 | Similarity |
| lunch | dinner | 0.9987 |
| tiger | jaguar | 0.9792 |
| car | plane | 0.9711 |
| television | radio | 0.9465 |
| technology company | microsoft | 0.8208 |
| high impact sport | competitive sport | 0.8155 |
| employer | large corporation | 0.5353 |
| fruit | green pepper | 0.2949 |
| travel | meal | 0.0426 |
| music | lunch | 0.0116 |
| alcoholic beverage | sports equipment | 0.0314 |
| company | table tennis | 0.0003 |

Statistics of Search Queries
• (a) By traffic: 1 term 44%, 2 terms 29%, 3 terms 17%, 4 terms 7%, 5 terms 2%, more than 5 terms 1%
• (b) By # of distinct queries: 1 term 10%, 2 terms 26%, 3 terms 34%, 4 terms 19%, 5 terms 7%, more than 5 terms 4%
• Queries are also broken down by the number of instances they contain (1, 2, 3, 4, 5, more than 5); single-instance queries include examples such as "Pokémon Go" and "Microsoft HoloLens"

If the short text has context for the instance...
• python tutorial
• dangerous python
• moon earth distance
• ...

Short Text Understanding
• How to segment this short text?
• What does this short text mean (its intent, senses, or concepts)?
• What are the relations among terms in the short text?
• How to calculate the similarity between short texts?

Supervised Segmentation [Bergsma et al 2007]
• Problem: divide a query into semantic units
• Approach: turn segmentation into position-based binary classification
• Example query: "two man power saw"
  • Candidate segmentations: [two man] [power saw]; [two] [man] [power saw]; [two] [man power] [saw]
• Input: a query and its positions
• Output: the decision on whether to place a segmentation break at each position

Supervised Segmentation
• Features:
  • Decision-boundary features — e.g., indicator features, POS tags in the query, forward/backward position features
  • Statistical features — e.g., mutual information between the left and right parts ("bank loan | amortization schedule")
  • Context features — e.g., context information such as "female bus driver"
  • Dependency features — e.g., "bus" depends on "driver"
• Segmentation overview: input query "two man power saw" → SVM classifier over the learned features → output a segmentation decision (yes/no) for each position

Unsupervised Segmentation [Tan et al 2008]
• Unsupervised learning for query segmentation
• Probability of a generated segmentation S for query Q, using a unigram model over segments:

  P(S|Q) = P(s1) P(s2|s1) ... P(sm|s1 s2 ... s(m−1)) ≈ Π_{si ∈ S} P(si)

• A split point is a valid segment boundary if and only if the pointwise mutual information between the two segments resulting from the split is negative:

  MI(sk, sk+1) = log [ Pc([sk, sk+1]) / (Pc(sk) · Pc(sk+1)) ] < 0

• Example: in "new york times subscription", log [ Pc([new york]) / (Pc(new) · Pc(york)) ] > 0, so there is no segment boundary between "new" and "york"
• Find the top-k segmentations by dynamic programming, using EM optimization on the fly
• Input: query w1 w2 ... wn (the words in the query) and the concept/segment probability distribution; Output: the top-k segmentations with the highest likelihood
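A minimal sketch of the two ingredients above — the MI test for segment boundaries and a dynamic program that maximizes the unigram likelihood; the segment probabilities are toy values, not counts from a real corpus:

```python
# Unigram-model query segmentation with an MI boundary check (toy probabilities).
import math
from functools import lru_cache

P = {  # hypothetical segment probabilities Pc(s)
    "new": 0.02, "york": 0.01, "times": 0.015, "subscription": 0.005,
    "new york": 0.008, "new york times": 0.004, "york times": 0.0001,
    "times subscription": 0.00001, "new york times subscription": 1e-7,
}

def prob(segment):
    return P.get(segment, 1e-9)

def mi(s1, s2):
    # MI between two adjacent segments; negative MI would justify a boundary
    return math.log(prob(f"{s1} {s2}") / (prob(s1) * prob(s2)))

def best_segmentation(words):
    @lru_cache(maxsize=None)
    def best(i):                      # best log-likelihood segmentation of words[i:]
        if i == len(words):
            return 0.0, []
        top = (-math.inf, [])
        for j in range(i + 1, len(words) + 1):
            seg = " ".join(words[i:j])
            score, rest = best(j)
            cand = math.log(prob(seg)) + score
            if cand > top[0]:
                top = (cand, [seg] + rest)
        return top
    return best(0)[1]

print(round(mi("new", "york"), 2))    # positive: no boundary between "new" and "york"
print(best_segmentation(("new", "york", "times", "subscription")))
# expected: ['new york times', 'subscription']
```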

Exploit Click-through [Li et al 2011]
• Motivation:
  • Probabilistic query segmentation
  • Use click-through data: query Q → clicked URL → document D
• Input query: "bank of america online banking"; output top-3 segmentations:
  • [bank of america] [online banking] — 0.502
  • [bank of america online banking] — 0.428
  • [bank of] [america] [online banking] — 0.001
• Segmentation model: an interpolated model combining global information with click-through information
  • Example: the query "[credit card] [bank of America]" with clicked documents such as "bank of america credit cards contact us overview", "secured visa credit card from bank of america", and "credit cards overview: find the right bank of america credit card for you"

Short Text Understanding
• How to segment this short text?
• What does this short text mean (its intent, senses, or concepts)?
• What are the relations among terms in the short text?
• How to calculate the similarity between short texts?

Sense Changes with Different Context
• watch harry potter → Movie; read harry potter → Book; age harry potter → Character; harry potter walkthrough → Game

Entity Recognition in Query [Guo et al 2009]
• Motivation: detect a named entity in a short text and categorize it
• Example: "harry potter walkthrough" is a single-named-entity query, represented as the triple <e, t, c> = ("harry potter", "walkthrough", "game"), where e is the (ambiguous) entity term, t the context term, and c the class of the entity

Entity Recognition in Query
• Probabilistic Generative Model
  • Goal: given a query q, find the triple <e, t, c> that maximizes its probability
  • The probability of generating a triple assumes the context depends only on the class; e.g., "walkthrough" depends only on the class game, not on "harry potter"
  • The problem then becomes how to estimate Pr(e), Pr(c|e), and Pr(t|c)
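A minimal sketch of scoring candidate triples <e, t, c> by Pr(e) · Pr(c|e) · Pr(t|c), as in the generative model above; all probabilities are made-up toy values:

```python
# Score query interpretations <e, t, c> with a toy generative model.
P_e = {"harry potter": 0.6}
P_c_given_e = {"harry potter": {"book": 0.5, "movie": 0.3, "game": 0.2}}
P_t_given_c = {"game": {"walkthrough": 0.4},
               "book": {"walkthrough": 0.001},
               "movie": {"walkthrough": 0.01}}

def score(e, t, c):
    return (P_e.get(e, 0.0)
            * P_c_given_e.get(e, {}).get(c, 0.0)
            * P_t_given_c.get(c, {}).get(t, 0.0))

# candidate interpretations of "harry potter walkthrough"
candidates = [("harry potter", "walkthrough", c) for c in ("book", "movie", "game")]
print(max(candidates, key=lambda tri: score(*tri)))
# expected: ('harry potter', 'walkthrough', 'game')
```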

Entity Recognition in Query
• Probability Estimation by Learning
  • Learning objective: max Π_{i=1..N} P(e_i, t_i, c_i)
  • Challenge: it is difficult as well as time-consuming to manually assign class labels to named entities in queries
  • Build a training set T = {(e_i, t_i)} and view c_i as a hidden variable
  • New learning problem: max Π_{i=1..N} P(e_i, t_i) = max Π_{i=1..N} Σ_c P(e_i) P(c|e_i) P(t_i|c)
  • Solved with a topic model, WS-LDA

Signal from Click [Pantel et al 2012]
• Motivation: predict entity types in Web search by jointly modeling the entity, the user intent, the context, and the click, using a query type distribution over 73 types and a generative model over entity types
• Joint model for prediction (figure: plate diagram over Q queries with variables t, τ, i, n, c and distributions θ, φ, ω): for each query, pick a type from the distribution over types, pick an entity from the entity distribution, pick an intent from the intent distribution, pick context words from the word distribution, and pick a click from the host distribution

Telegraphic Query Interpretation [Sawant et al 2013, Joshi et al 2014]
• Entity-seeking telegraphic queries, e.g., query "Germany capital" → result entity "Berlin"
• Interpretation = Segmentation + Annotation, combining a knowledge base (for accuracy) with a large corpus (for recall)
• Overview — Joint Interpretation and Ranking [Sawant et al 2013, Joshi et al 2014]: the annotated corpus and the telegraphic query feed two models for interpretation and ranking, a generative model and a discriminative model, whose output is a ranked list of candidate entities (e1, e2, e3, ...)

Joint Interpretation and Ranking [Sawant et al 2013]
• Generative Model (based on probabilistic language models)
  • Example query q = "losing team baseball world series 1998": the answer entity E = San Diego Padres has type T = "major league baseball team" (matching the type hint "baseball team"), supported by corpus context such as "Padres have been to two World Series, losing in 1984 and 1998"; a switch variable Z routes each query word either to the type model or to the context model ("lost 1998 world series"). (Borrowed from U. Sawant, 2013)

Joint Interpretation and Ranking [Sawant et al 2013]
• Discriminative Model (based on max-margin discriminative learning)
  • For the query "losing team baseball world series 1998", candidate interpretations pair the query words with an entity and a target type; San_Diego_Padres (t = baseball team) is the correct entity, while 1998_World_Series (t = series) is an incorrect entity

Telegraphic Query Interpretation [Joshi et al 2014]
• Queries seek answer entities (e2)
• They contain (query) entities (e1), target types (t2), relations (r), and selectors (s)

| query | e1 | r | t2 | s |
| dave navarro first band | dave navarro | band | band | first |
| dave navarro first band | dave navarro | - | band | first |
| spider automobile company | spider | automobile company | automobile company | - |
| spider automobile company | - | automobile company | company | spider |

(Borrowed from M. Joshi, 2014)

Improved Generative Model
• The generative model of [Sawant et al 2013] is extended in [Joshi et al 2014] to also consider e1 (in q) and r

Improved Discriminative Model
• The discriminative model of [Sawant et al 2013] is likewise extended in [Joshi et al 2014] to consider e1 (in q) and r

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]
• Input: a short text; Output: a semantic interpretation
• Three steps in understanding a short text, e.g., "wanna watch eagles band":
  • Step 1: Text segmentation — divide the text into a sequence of terms in the vocabulary: wanna | watch | eagles | band
  • Step 2: Type detection — determine the best type of each term: watch[verb] eagles[entity] band[concept]
  • Step 3: Concept labeling — infer the best concept of each entity within its context: watch[verb] eagles[entity](band) band[concept]

Text segmentation
• Observations:
  • Mutual Exclusion — terms containing the same word mutually exclude each other
  • Mutual Reinforcement — related terms mutually reinforce each other
• Build a Candidate Term Graph (CTG); e.g., for "vacation april in paris" the candidate terms are "vacation", "april", "paris", and "april in paris", and for "watch harry potter" they are "watch" and "harry potter", with edge weights reflecting how strongly the terms reinforce each other

Find best segmentation
• Best segmentation = the sub-graph of the CTG which:
  • is a complete graph (clique),
  • contains no mutual exclusion,
  • has 100% word coverage (except for stopwords), and
  • has the largest average edge weight

Find best segmentation (cont.)
• The best segmentation is therefore a maximal clique of the CTG; in the examples above, [vacation] [april in paris] and [watch] [harry potter]
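A minimal sketch of the clique-based criterion above using networkx; the candidate term graph and its edge weights are assumed toy values, not the weights of the actual CTG:

```python
# Pick the best segmentation of "vacation april in paris" as the maximal clique
# with full word coverage, no mutual exclusion, and the largest average weight.
import itertools
import networkx as nx

words = {"april in paris": {"april", "in", "paris"}, "april": {"april"},
         "paris": {"paris"}, "vacation": {"vacation"}}
edges = [("vacation", "april in paris", 0.041), ("vacation", "april", 0.029),
         ("vacation", "paris", 0.047), ("april", "paris", 0.005)]

G = nx.Graph()
G.add_weighted_edges_from(edges)

QUERY_WORDS = frozenset({"vacation", "april", "paris"})   # "in" treated as a stopword

def coverage_ok(terms):
    covered = set().union(*(words[t] for t in terms))
    return QUERY_WORDS <= covered

def no_mutual_exclusion(terms):
    # terms sharing a word mutually exclude each other
    return all(words[a].isdisjoint(words[b]) for a, b in itertools.combinations(terms, 2))

best, best_score = None, -1.0
for clique in nx.find_cliques(G):                          # maximal cliques of the CTG
    if len(clique) < 2 or not coverage_ok(clique) or not no_mutual_exclusion(clique):
        continue
    ws = [G[a][b]["weight"] for a, b in itertools.combinations(clique, 2)]
    score = sum(ws) / len(ws)                              # average edge weight
    if score > best_score:
        best, best_score = clique, score

print(best)   # with these toy weights: {'vacation', 'april in paris'}
```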

Type Detection
• Pairwise model: find the best typed-term for each term so that the Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight
• Example "watch free movie": candidate typed-terms are watch[v], watch[e], watch[c]; free[adj], free[v]; movie[c], movie[e]

Concept Labeling
• Entity disambiguation is the most important task of concept labeling: filter and re-rank the original concept cluster vector
• Weighted-Vote: the final score of each concept cluster is a combination of its original score and the support from context, using concept co-occurrence
• Example: "watch harry potter" → movie; "read harry potter" → book
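A minimal sketch of the weighted-vote idea: combine each concept cluster's original score with co-occurrence support from the context (all scores and co-occurrence weights below are illustrative):

```python
# Re-rank an entity's concept clusters using context support via concept co-occurrence.
def weighted_vote(concept_scores, context_concepts, cooccur, alpha=0.5):
    reranked = {}
    for c, s in concept_scores.items():
        support = sum(cooccur.get((c, ctx), 0.0) for ctx in context_concepts)
        reranked[c] = alpha * s + (1 - alpha) * support
    return sorted(reranked.items(), key=lambda kv: kv[1], reverse=True)

# "watch harry potter": the context term "watch" (as a verb) supports the movie sense
harry_concepts = {"book": 0.5, "movie": 0.3, "game": 0.2}
cooccur = {("movie", "verb:watch"): 0.9, ("book", "verb:watch"): 0.1, ("game", "verb:watch"): 0.2}
print(weighted_vote(harry_concepts, ["verb:watch"], cooccur))   # movie is ranked first
```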

Example of Entity Disambiguation [Hua et al 2015 (ICDE Best Paper), Hua et al 2016]
• Pipeline: a short text is parsed and conceptualized against a semantic (isA) network and a co-occurrence network — term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, concept orthogonalization — producing a concept vector [(c1, p1), (c2, p2), (c3, p3), ...]
• Example "ipad apple": the isA candidates of "apple" (fruit, company, food, product, brand, ...) are filtered by their co-occurrence with the concepts of "ipad" (product, device, ...), so the company/brand senses survive

Mining Lexical Relationships [Wang et al 2015b]
• Lexical knowledge is represented by probabilities over terms (t), roles (z), concepts (c), and instances (e); for example, for "watch harry potter":
  • p(verb | watch), p(instance | watch) — how likely "watch" plays the role of a verb or an instance
  • p(movie | harry potter), p(book | harry potter) — concept probabilities of the instance
  • p(movie | watch, verb) — the concept suggested for the object when "watch" is used as a verb
• These correspond to the network probabilities p(z | t), p(c | e) = p(c | t, z = instance), and p(c | t, z)

Understanding Queries [Wang et al 2015b]
• Goal: rank the concepts and find argmax_c p(c | t, q)
• The query and all of its possible segmentations are matched against the offline semantic network, and concepts are ranked by random walk with restart [Sun et al 2005] on the resulting online subgraph
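A minimal sketch of random walk with restart on a small term/concept subgraph; the adjacency weights and node set are toy values, only meant to illustrate how the query terms bias the stationary scores of concepts:

```python
# Random walk with restart (RWR) on a toy term/concept subgraph.
import numpy as np

nodes = ["watch", "harry potter", "movie", "book", "verb"]
A = np.array([
    # watch  hp   movie book verb
    [0.0,   0.0,  0.6,  0.1, 0.3],   # watch
    [0.0,   0.0,  0.5,  0.5, 0.0],   # harry potter
    [0.6,   0.5,  0.0,  0.0, 0.0],   # movie
    [0.1,   0.5,  0.0,  0.0, 0.0],   # book
    [0.3,   0.0,  0.0,  0.0, 0.0],   # verb
])
P = A / A.sum(axis=0, keepdims=True)            # column-stochastic transition matrix

restart = np.array([0.5, 0.5, 0.0, 0.0, 0.0])   # restart at the query terms
r = restart.copy()
c = 0.3                                         # restart probability
for _ in range(100):                            # power iteration until (approximate) convergence
    r = (1 - c) * P @ r + c * restart

for node, score in sorted(zip(nodes, r), key=lambda kv: -kv[1]):
    print(node, round(float(score), 3))
```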

Short Text Understanding
• How to segment this short text?
• What does this short text mean (its intent, senses, or concepts)?
• What are the relations among terms in the short text?
• How to calculate the similarity between short texts?

Head, Modifier, and Constraint Detection in Short Texts [Wang et al 2014b]
• Example: "popular smart cover iphone 5s"
• Definitions:
  • Head — names the general (semantic) category to which the whole short text belongs; usually the head is the intent of the short text ("smart cover" is the intent of the query)
  • Constraints — distinguish this member from other members of the same category ("iphone 5s" limits the type of the head)
  • Non-Constraint Modifiers (a.k.a. Pure Modifiers) — subjective modifiers which can be dropped without changing the intent ("popular" is subjective and can be neglected)

Non-Constraint Modifiers Mining: Construct Modifier Networks
• Concept hierarchy tree in the "Country" domain: Country → Asian country, Developed country, Western country, Western developed country, Top western country, Large Asian country, Large developed country, Top developed country, ...
• The head and modifiers form a Modifier Network over the nodes {Country, Asian, Western, Developed, Large, Top}, with edges between terms that co-occur in the same concept; in this case "Large" and "Top" are pure modifiers

Non-Constraint Modifiers Mining: Betweenness Centrality
• Betweenness centrality is a measure of a node's centrality in a network
• The betweenness of node v is defined as g(v) = Σ_{s ≠ v ≠ t} σ_st(v) / σ_st, where σ_st is the total number of shortest paths from node s to node t and σ_st(v) is the number of those paths that pass through v
• Normalization & aggregation: the normalized betweenness scores of a term are aggregated into a score PMS(t); a pure modifier should have a low PMS(t)
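A minimal sketch showing the betweenness-centrality computation with networkx on an assumed toy modifier network; in the paper the normalized scores are further aggregated across domains into PMS(t), which this sketch does not do:

```python
# Betweenness centrality on a toy modifier network for the "Country" domain;
# the edge set is illustrative (edges connect terms co-occurring in a concept).
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("country", "asian"), ("country", "western"), ("country", "developed"),
    ("western", "developed"),                     # "western developed country"
    ("large", "asian"), ("large", "developed"),   # "large asian / developed country"
    ("top", "western"), ("top", "developed"),     # "top western / developed country"
])

# Normalized betweenness of every node; modifiers whose aggregated score PMS(t)
# is low are treated as pure (non-constraint) modifier candidates.
bc = nx.betweenness_centrality(G, normalized=True)
for node, score in sorted(bc.items(), key=lambda kv: kv[1]):
    print(f"{node}: {score:.3f}")
```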

Head-Constraints Mining [Wang et al 2014b]
• A term can be a head in some cases and a constraint in others
• E.g., "Seattle hotel" (head = hotel, constraint = Seattle) vs. "Seattle hotel job" (head = job, constraints = Seattle, hotel)

Head-Constraints Mining: Acquiring Concept Patterns
• Get entity pairs from the query log by extracting preposition patterns ("A for B", "A of B", "A with B", "A in B", "A on B", "A at B", ...), e.g., "cover for iphone 6s", "battery for sony a7r", "wicked on broadway"
• For each pair (entity 1 = head, entity 2 = constraint), conceptualize both entities and generate the concept patterns (concept1i, concept2j)
• Aggregate the patterns into a concept pattern dictionary

Why Concepts Can't Be Too General
• Too-general concepts cause too many concept pattern conflicts: head and modifier cannot be distinguished for general concept pairs
• Example conflict:
  • Derived concept pattern (head = device, modifier = company); supporting entity pairs: (iphone 4, verizon), (modem, comcast), (wireless router, comcast), (iphone 4, tmobile)
  • Derived concept pattern (head = company, modifier = device); supporting entity pairs: (amazon books, kindle), (netflix, touchpad), (skype, windows phone), (netflix, ps3)

Why Concepts Can't Be Too Specific
• Too-specific concepts have little coverage: the concept regresses to the entity
• They also require large storage space, up to (million × million) patterns, e.g., (device, largest desktop OS vendor), (device, largest software development company), (device, largest global corporation), (device, latest windows and office provider), ...
• Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept Patterns

| Cluster size | Sum of cluster score | Head / Constraint | Score |
| 615 | 2114691 | breed / state | 357298460224501 |
| 296 | 7752357 | game / platform | 627403476771856 |
| 153 | 3466804 | accessory / vehicle | 53393705094809 |
| 70 | 118259 | browser / platform | 132612807637391 |
| 22 | 1010993 | requirement / school | 271407526294823 |
| 34 | 9489159 | drug / disease | 154602405333541 |
| 42 | 8992995 | cosmetic / skin condition | 814659415003929 |
| 16 | 7421599 | job / city | 27903732555528 |
| 32 | 710403 | accessory / phone | 246513830851194 |
| 18 | 6692376 | software / platform | 210126322725878 |
| 20 | 6444603 | test / disease | 239774028397537 |
| 27 | 5994205 | clothes / breed | 98773996282851 |
| 19 | 5913545 | penalty / crime | 200544192793488 |
| 25 | 5848804 | tax / state | 240081818612579 |
| 16 | 5465424 | sauce / meat | 183592863621553 |
| 18 | 4809389 | credit card / country | 142919087972152 |
| 14 | 4730792 | food / holiday | 14554140330924 |
| 11 | 4536199 | mod / game | 257163856882439 |
| 29 | 4350954 | garment / sport | 471533326845442 |
| 23 | 3994886 | career information / professional | 732726483731257 |
| 15 | 386065 | song / instrument | 128189481818135 |
| 18 | 378213 | bait / fish | 780426514113169 |
| 22 | 3722948 | study guide / book | 508339765053921 |
| 19 | 3408953 | plugins / browser | 550326072627126 |
| 14 | 3305753 | recipe / meat | 882779863422951 |
| 18 | 3214226 | currency / country | 110825444188352 |
| 13 | 3180272 | lens / camera | 186081673263957 |
| 9 | 316973 | decoration / holiday | 130055844126533 |
| 16 | 314875 | food / animal | 7338544366514 |

• Concept patterns in the game/platform cluster: game / platform, game / device, video game / platform, game console / game pad, game / gaming platform
• Example instances of the pattern Game (Head) / Platform (Modifier): angry birds / android, angry birds / ios, angry birds / windows 10, ...

Head-Modifier Relationship Detection
• Train a classifier on (head-embedding, modifier-embedding)
• Training data: positive = (head, modifier); negative = (modifier, head)
• Precision >= 0.9, Recall >= 0.9
• Disadvantage: not interpretable
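A minimal sketch of such a classifier using scikit-learn and random toy embeddings; a real system would use pretrained term embeddings and far more training pairs:

```python
# Predict head-vs-modifier order from the concatenation of two term embeddings.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
dim = 50
emb = {t: rng.normal(size=dim) for t in ["game", "platform", "cover", "iphone", "hotel", "seattle"]}

# positive examples: (head, modifier); negatives: the reversed pair
pairs = [("game", "platform"), ("cover", "iphone"), ("hotel", "seattle")]
X = [np.concatenate([emb[h], emb[m]]) for h, m in pairs] + \
    [np.concatenate([emb[m], emb[h]]) for h, m in pairs]
y = [1] * len(pairs) + [0] * len(pairs)

clf = LogisticRegression(max_iter=1000).fit(X, y)
# scoring a seen pair; class 1 means the first term is predicted to be the head
print(clf.predict([np.concatenate([emb["game"], emb["platform"]])]))
```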

Syntactic Parsing based on Head-Modifier Detection
• Information is incomplete:
  • Prepositions and other function words are missing
  • Structure within a noun compound, e.g., "el capitan macbook pro"
• Why not train a parser for web queries?

Syntactic Parsing of Short Texts [Sun et al EMNLP 2016]
• Syntactic structures are valuable for short text understanding
• Challenge: short texts lack grammatical signals
  • They lack function words and word order
  • "toys queries" has ambiguous intent
  • "distance earth moon" has clear intent, but many equivalent forms: "earth moon distance", "earth distance moon", ...

Challenges: Syntactic Parsing of Queries
• No standard
• No ground truth
• Why is syntactic parsing of queries even a legitimate problem?

Derive Syntax from Semantics [Sun et al 2016]
• Query: "thai food houston"
• Parse the clicked sentence and project its dependencies onto the query

A Treebank for Short Texts
• Given a query q and q's clicked sentences s:
  • Parse each s
  • Project the dependencies from s to q
  • Aggregate the dependencies
• (The projection algorithm and result examples are shown in the original slides.)

Results
• Random queries: QueryParser UAS 0.83, LAS 0.75; Stanford UAS 0.72, LAS 0.64
• Queries with no function words: QueryParser UAS 0.82, LAS 0.73; Stanford UAS 0.70, LAS 0.61
• Queries with function words: QueryParser UAS 0.90, LAS 0.85; Stanford UAS 0.86, LAS 0.80

Short Text Understanding
• How to segment this short text?
• What does this short text mean (its intent, senses, or concepts)?
• What are the relations among terms in the short text?
• How to calculate the similarity between short texts?

Short Text Similarity Using Word Embeddings [Kenter and Rijke 2015]
• Measures the similarity between two short texts or sentences
• Basic idea: word-by-word comparison using embedding vectors
• Uses a saliency-weighted semantic graph to compute similarity; features are bins of all edges and bins of max edges
• Similarity measurement (inspired by BM25), where w ranges over the terms of the longer text s_l, s_s is the shorter text, and sem(w, s_s) is the semantic similarity of w to its best-matching term in s_s:

  f_sts(s_l, s_s) = Σ_{w ∈ s_l} IDF(w) · sem(w, s_s) · (k1 + 1) / ( sem(w, s_s) + k1 · (1 − b + b · |s_s| / avgsl) )
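A minimal sketch of the saliency-weighted similarity f_sts above, with toy embeddings and IDF values standing in for real ones:

```python
# BM25-style saliency-weighted semantic similarity between two short texts.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu, nv = math.sqrt(sum(a * a for a in u)), math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def sem(w, short_text, emb):
    # best semantic match of word w against the shorter text
    return max(cosine(emb[w], emb[w2]) for w2 in short_text)

def f_sts(long_text, short_text, emb, idf, k1=1.2, b=0.75, avgsl=5.0):
    score = 0.0
    for w in long_text:
        s = sem(w, short_text, emb)
        score += idf.get(w, 1.0) * (s * (k1 + 1)) / (s + k1 * (1 - b + b * len(short_text) / avgsl))
    return score

emb = {"cheap": [1, 0], "flights": [0, 1], "low": [0.9, 0.1], "cost": [0.8, 0.3], "airline": [0.2, 1]}
idf = {"cheap": 1.5, "flights": 2.0}
print(round(f_sts(["cheap", "flights"], ["low", "cost", "airline"], emb, idf), 3))
```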

From the Concept View [Wang et al 2015a]
• Each short text is parsed and conceptualized against a semantic (isA) network and a co-occurrence network — term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, concept orthogonalization — yielding a bag of concepts:
  • Short Text 1 → Concept Vector 1 [(c1, score1), (c2, score2), ...]
  • Short Text 2 → Concept Vector 2 [(c1', score1'), (c2', score2'), ...]
• The similarity of the two short texts is then computed between their concept vectors

Outline
• Knowledge Bases
• Explicit Representation Models
• Applications

Applications
• Explicit short text understanding benefits many application scenarios:
  • Ads/search semantic matching
  • Definition mining
  • Query recommendation
  • Web table understanding
  • Semantic search
  • ...

Ads Keyword Selection [Wang et al 2015a]
• (Figure: ads keyword selection results by decile, Decile 4 through Decile 10, reported separately for Mainline Ads and Sidebar Ads.)

Definition Mining [Hao et al 2016]
• Definition scenarios: search engines, QnA, etc.
• Why conceptualization is useful for definition mining — example: "What is Emphysema?"
  • Answer 1: "Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year."
  • Answer 2: "Emphysema is an incurable, progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing."
  • Both sentences have the form of a definition
  • Embedding is helpful to some extent, but it also returns high similarity scores for (emphysema, disease) and for (emphysema, smoking)
  • Conceptualization can provide strong semantics, and contextual embedding can also provide semantic similarity beyond isA

Definition Mining [Hao et al 2016]

Concept-based Short Text Classification and Ranking [Wang et al 2014a]
• Offline: for each class (Class 1, ..., Class i, ..., Class N), learn a concept model from training data (e.g., <Music, Score>) via concept weighting
• Online: given an original short text (e.g., "justin bieber graduates"), perform entity extraction and conceptualization against the knowledge base to get a concept vector, generate candidates, and classify & rank them with the learned concept models

Concept-based Short Text Classification and Ranking [Wang et al 2014a]
• (Figures: a category such as "TV" is mapped into the concept space using the article titles/tags in that category; each category — TV, Music, Movie, ... — thus becomes a weighted region (ω_i, ω_j) in the concept space; an incoming query is conceptualized into the same space (p_i, p_j) and compared against the categories.)

Precision performance on each category [Wang et al 2014a]

| Category | BocSTC | LM_ch | SVM | VSM_cosine | LM_d | Entity_ESA |
| Movie | 0.71 | 0.91 | 0.84 | 0.81 | 0.72 | 0.56 |
| Money | 0.97 | 0.95 | 0.54 | 0.57 | 0.52 | 0.74 |
| Music | 0.97 | 0.90 | 0.88 | 0.73 | 0.68 | 0.58 |
| TV | 0.96 | 0.46 | 0.92 | 0.56 | 0.51 | 0.55 |

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

• [Stark et al 1998] Michael M Stark and Richard F Riesenfeld. WordNet: An Electronic Lexical Database. Proceedings of 11th Eurographics Workshop on Rendering, 1998.
• [Banko et al 2007] Michele Banko, Michael J Cafarella, Stephen Soderland, Matt Broadhead and Oren Etzioni. Open Information Extraction from the Web. In IJCAI 2007.
• [Etzioni et al 2011] Oren Etzioni, Anthony Fader, Janara Christensen, Stephen Soderland and Mausam Mausam. Open Information Extraction: The Second Generation. In IJCAI, vol. 11, pp. 3-10, 2011.
• [Carlson et al 2010] A Carlson, J Betteridge, B Kisiel, B Settles, E R Hruschka Jr and T M Mitchell. Toward an Architecture for Never-Ending Language Learning. In Proceedings of the Conference on Artificial Intelligence (AAAI), 2010.
• [Wu et al 2012] Wentao Wu, Hongsong Li, Haixun Wang and Kenny Zhu. Probase: A Probabilistic Taxonomy for Text Understanding. In ACM International Conference on Management of Data (SIGMOD), May 2012.
• [Bollacker et al 2008] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, Jamie Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In SIGMOD 2008.
• [Auer et al 2007] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, Zachary G Ives. DBpedia: A Nucleus for a Web of Open Data. In ISWC/ASWC 2007.
• [Suchanek et al 2007] Fabian M Suchanek, Gjergji Kasneci, Gerhard Weikum. Yago: a core of semantic knowledge. In WWW 2007.
• [Wu et al 2015] Sen Wu, Ce Zhang, Christopher De Sa, Jaeho Shin, Feiran Wang and C Ré. Incremental Knowledge Base Construction Using DeepDive. In VLDB 2015.
• [Navigli et al 2012] R Navigli and S Ponzetto. BabelNet: The Automatic Construction, Evaluation and Application of a Wide-Coverage Multilingual Semantic Network. In Artificial Intelligence, 2012.
• [Nastase et al 2010] Vivi Nastase, Michael Strube, Benjamin Börschinger, Cäcilia Zirn and Anas Elghafari. WikiNet: A very large scale multi-lingual concept network. In LREC 2010.
• [Speer et al 2013] Robert Speer and Catherine Havasi. ConceptNet 5: A large semantic network for relational knowledge. The People's Web Meets NLP, Springer Berlin Heidelberg, 2013.
• [Hua et al 2016] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng and Xiaofang Zhou. Understand Short Texts by Harvesting and Analyzing Semantic Knowledge. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2016.
• [Hua et al 2015] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng and Xiaofang Zhou. Short Text Understanding Through Lexical-Semantic Analysis. In International Conference on Data Engineering (ICDE), April 2015.
• [Li et al 2013] Peipei Li, Haixun Wang, Kenny Q Zhu, Zhongyuan Wang and Xindong Wu. Computing term similarity by large probabilistic isA knowledge. In ACM International Conference on Information and Knowledge Management (CIKM), 2013.
• [Li et al 2015] Peipei Li, Haixun Wang, Kenny Q Zhu, Zhongyuan Wang, Xue-Gang Hu and Xindong Wu. A Large Probabilistic Semantic Network based Approach to Compute Term Similarity. IEEE Transactions on Knowledge and Data Engineering (TKDE), 27(10): 2604-2617, 2015.
• [Rosch et al 1976] Eleanor Rosch, Carolyn B Mervis, Wayne D Gray, David M Johnson and Penny Boyes-Braem. Basic objects in natural categories. Cognitive Psychology, 8(3): 382-439, 1976.
• [Manning and Schutze 1999] Christopher D Manning and Hinrich Schutze. Foundations of Statistical Natural Language Processing. Volume 999, MIT Press, 1999.
• [Wang et al 2015b] Zhongyuan Wang, Kejun Zhao, Haixun Wang, Xiaofeng Meng and Ji-Rong Wen. Query Understanding through Knowledge-Based Conceptualization. In IJCAI, July 2015.
• [Bergsma et al 2007] Shane Bergsma, Qin Iris Wang. Learning Noun Phrase Query Segmentation. In EMNLP-CoNLL 2007: 819-826.
• [Tan et al 2008] Bin Tan, Fuchun Peng. Unsupervised query segmentation using generative language models and Wikipedia. In WWW 2008: 347-356.
• [Li et al 2011] Yanen Li, Bo-June Paul Hsu, ChengXiang Zhai, Kuansan Wang. Unsupervised query segmentation using clickthrough for information retrieval. In SIGIR 2011: 285-294.
• [Guo et al 2009] Jiafeng Guo, Gu Xu, Xueqi Cheng, Hang Li. Named entity recognition in query. In SIGIR 2009: 267-274.
• [Pantel et al 2012] Patrick Pantel, Thomas Lin, Michael Gamon. Mining Entity Types from Query Logs via User Intent Modeling. In ACL 2012: 563-571.
• [Joshi et al 2014] Mandar Joshi, Uma Sawant, Soumen Chakrabarti. Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries. In EMNLP 2014: 1104-1114.
• [Sawant et al 2013] Uma Sawant, Soumen Chakrabarti. Learning joint query interpretation and response ranking. In WWW 2013: 1099-1110.
• [Wang et al 2014b] Zhongyuan Wang, Haixun Wang and Zhirui Hu. Head, Modifier, and Constraint Detection in Short Texts. In International Conference on Data Engineering (ICDE), 2014.
• [Sun et al 2016] Xiangyan Sun, Haixun Wang, Yanghua Xiao, Zhongyuan Wang. Syntactic Parsing of Web Queries. In EMNLP 2016.
• [Kenter and Rijke 2015] Tom Kenter and Maarten de Rijke. Short text similarity with word embeddings. In CIKM 2015.
• [Wang et al 2015a] Zhongyuan Wang, Haixun Wang, Ji-Rong Wen and Yanghua Xiao. An Inference Approach to Basic Level of Categorization. In CIKM, October 2015.
• [Hao et al 2016] Zehui Hao, Zhongyuan Wang, Xiaofeng Meng and Jun Yan. Combining Language Model with Conceptualization for Definition Ranking. MSR Technical Report, 2016.
• [Wang et al 2014a] Fang Wang, Zhongyuan Wang, Zhoujun Li and Ji-Rong Wen. Concept-based Short Text Classification and Ranking. In CIKM 2014.
• [Wang et al 2012a] Jingjing Wang, Haixun Wang, Zhongyuan Wang and Kenny Zhu. Understanding Tables on the Web. In International Conference on Conceptual Modeling, October 2012.
• [Wang et al 2012b] Yue Wang, Hongsong Li, Haixun Wang and Kenny Zhu. Toward Topic Search on the Web. In International Conference on Conceptual Modeling, October 2012.

Page 37: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Examples (cont)

bull Level 2bull jordan score = 102

bull country state company regime 092bull shoe 002

bull fox score = 109bull animal predator species 074bull network 0064bull company 0035

bull puma score = 115bull brand company shoe 0655bull species cat 0116

bull gold score = 121bull metal material mineral resource mineral062bull color 0128

Examples (cont)

bull Level 2bull soap score = 122

bull product toiletry substance 049bull technology industry standard 011

bull silver score = 124bull metal material mineral resource mineral 0638bull color 0156

bull python score = 129bull language 0667bull snake animal reptile skin 0193

bull apple score = 141bull fruit food tree 0537bull company brand 0271

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

A Concept View of ldquoMicrosoftrdquo

company

largest desktop OS vendor

softwarecompany

international company

technology leader

Microsoft

largest desktop OS vendorcompany hellip hellip

software company

Basic-level Conceptualization (BLC)[Rosch et al 1976]

KFC

BMW

Basic-level conceptualization

How to Make BLC

bull Naive approachesbull Typicality an important measure for understanding the

relationship between an object and its concept

bull Pointwise Mutual Information (PMI) a common measure of the strength of association between two terms

bird

Naive Approach 1 Typicality

P(robin|bird) gt P(penguin|bird)ldquorobinrdquo is a more typical bird than a ldquopenguinrdquo

country

SeychellesUSA

P(USA|country) gt P(Seychelles|country)ldquoUSArdquo is a more typical country than ldquoSeychellesrdquo

penguinrobin

Using Typicality for BLC

bull Associate each isA relationship (119890 is 119888) with typicality scores 119875 119890 119888 and 119875 119888 119890

119875 119890 119888 =119899 119888 119890

119899 119888119875(119888|119890) =

119899 119888 119890

119899(119890)

bull P(e|c) indicates how typical (or popular) e is in the given concept c

bull P(c|e) indicates how typical (or popular) the concept c is given e

bull However

Microsoft

largest desktop OS vendorcompanyhigh typicality p(c|e) high typicality p(e|c)

Naive Approach 2 PMI[Manning and Schutze 1999]

bull Pointwise mutual information (PMI) is a measure of association used in information theory and statistics

bull Consider using the PMI between concept c and instance e to find the basic-level concepts as follows

119875119872119868(119890 119888) = log119875(119890 119888)

119875(119890)119875(119888)= log119875(119890|119888) minus log119875(119890)

bull However bull In basic level of categorization we are interested in finding a

concept for a given e which means P(e) is a constant

bull Thus ranking by PMI(e c) is the same as ranking by P(e|c)

Using Rep(e c) for BLC [Wang et al 2015b]

bull The measure 119877119890119901 119890 119888 = 119875(119888|119890) lowast 119875(119890|119888) means

bull (With PMI) If we take the logarithm of our scoring function we get

log119877119890119901 119890 119888 = log119875 119888 119890 lowast 119875(119890|119888) = log119875(119890 119888)

119875(119890)lowast119875(119890 119888)

119875(119888)= log

119875(119890 119888)2

119875(119890)119875(119888)= 119875119872119868 119890 119888 + log119875 119890 119888

= 1198751198721198682

bull (With Commute Time) The commute time between an instance e and a concept c is

119879119894119898119890(119890 119888) =

119896=1

infin

(2119896) lowast 119875119896(119890 119888) =

119896=1

119879

2119896 lowast 119875119896 119890 119888 +

119896=119879+1

infin

2119896 lowast 119875119896 119890 119888

ge σ119896=1119879 (2119896) lowast 119875119896(119890 119888) + 2(119879 + 1) lowast (1 minus σ119896=1

119879 119875119896(119890 119888)) = 4 minus 2 lowast 119877119890119901(119890 119888)

Given e the c should be its typical concept (shortest distance)

Given c the e should be its typical instance (shortest distance)

A process of finding concept nodes having shortest expected distance with e

PrecisionNDCGNo smoothing 1 2 3 5 10 15 20

MI(e) 0769 0692 0705 0685 0719 0705 0690

PMI3(e) 0885 0769 0756 0800 0754 0733 0721

NPMI(e) 0692 0692 0667 0638 0627 0610 0610

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0500 0462 0526 0523 0523 0510 0521

Rep(e) 0846 0865 0872 0862 0758 0731 0719

Smoothing=0001

MI(e) 0577 0615 0628 0600 0612 0605 0592

PMI3(e) 0731 0673 0692 0654 0669 0644 0623

NPMI(e) 0923 0827 0769 0746 0731 0695 0671

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0554

Typicality P(e|c) 0885 0865 0872 0831 0785 0741 0704

Rep(e) 0846 0731 0718 0723 0700 0669 0638

Smoothing=00001

MI(e) 0615 0615 0654 0608 0635 0628 0612

PMI3(e) 0846 0731 0731 0715 0723 0685 0677

NPMI(e) 0885 0904 0885 0869 0823 0777 0752

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0885 0904 0910 0877 0831 0813 0777

Rep(e) 0923 0846 0833 0815 0781 0736 0719

Smoothing=1e-5

MI(e) 0615 0635 0667 0662 0677 0656 0646

PMI3(e) 0885 0769 0744 0777 0758 0731 0710

NPMI(e) 0885 0846 0872 0869 0831 0810 0787

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0769 0808 0846 0823 0808 0782 0765

Rep(e) 0885 0904 0872 0862 0812 0800 0767

Smoothing=1e-6

MI(e) 0769 0673 0705 0677 0700 0692 0679

PMI3(e) 0885 0769 0756 0785 0773 0726 0723

NPMI(e) 0885 0846 0821 0815 0750 0726 0719

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0538 0615 0615 0615 0608 0613 0615

Rep(e) 0846 0885 0897 0877 0788 0777 0765

Smoothing=1e-7

MI(e) 0769 0692 0705 0685 0719 0703 0688

PMI3(e) 0885 0769 0756 0792 0758 0736 0725

NPMI(e) 0769 0750 0718 0700 0650 0641 0633

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0500 0481 0526 0523 0531 0523 0523

Rep(e) 0846 0865 0872 0854 0765 0749 0733

No Smoothing 1 2 3 5 10 15 20

MI(e) 0516 0531 0519 0531 0562 0574 0594

PMI3(e) 0725 0664 0652 0660 0628 0631 0646

NPMI(e) 0599 0597 0579 0554 0540 0539 0549

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0401 0386 0396 0398 0401 0410 0428

Rep(e) 0758 0771 0745 0723 0656 0647 0661

Smoothing=1e-3

MI(e) 0374 0414 0441 0448 0473 0481 0495

PMI3(e) 0484 0511 0509 0502 0519 0525 0533

NPMI(e) 0692 0652 0607 0603 0585 0585 0592

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0460

Typicality P(e|c) 0703 0697 0704 0681 0637 0628 0626

Rep(e) 0621 0580 0554 0561 0554 0555 0559

Smoothing=1e-4

MI(e) 0407 0430 0458 0462 0492 0503 0512

PMI3(e) 0648 0604 0579 0575 0578 0576 0590

NPMI(e) 0747 0777 0761 0737 0700 0685 0688

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0791 0795 0802 0767 0738 0729 0724

Rep(e) 0758 0714 0711 0689 0653 0636 0653

Smoothing=1e-5

MI(e) 0429 0465 0478 0501 0517 0528 0545

PMI3(e) 0725 0647 0642 0642 0627 0624 0638

NPMI(e) 0813 0779 0778 0765 0730 0723 0729

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0709 0728 0735 0722 0702 0696 0703

Rep(e) 0791 0787 0762 0739 0707 0703 0706

Smoothing=1e-6

MI(e) 0516 0510 0515 0526 0546 0563 0579

PMI3(e) 0725 0655 0651 0654 0641 0631 0649

NPMI(e) 0791 0766 0732 0728 0673 0659 0668

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0495 0516 0520 0508 0512 0521 0540

Rep(e) 0758 0784 0767 0755 0691 0686 0694

Smoothing=1e-7

MI(e) 0516 0531 0519 0530 0562 0571 0592

PMI3(e) 0725 0664 0652 0658 0630 0631 0647

NPMI(e) 0670 0655 0633 0604 0575 0570 0581

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0423 0421 0415 0407 0414 0424 0438

Rep(e) 0758 0771 0745 0725 0663 0661 0668

Evaluations on Different Measures for BLC

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

What is the Semantic Similaritybull Are the following instance pairs similar

bull ltapple microsoftgt

bull ltapple peargt

bull ltapple fruitgt

bull ltapple foodgt

bull ltapple ipadgt

bull ltcar journeygt

Approaches on Term Similarity

bull Categories of approaches for semantic similaritybull String based approach

bull Knowledge based approachbull Use preexisting thesauri taxonomy or encyclopedia such as

WordNet

bull Corpus based approachbull Use contexts of terms extracted from web pages web search

snippets or other text repositories

bull Embedding based approachbull Will introduce in detail in ldquoPart 3 Implicit Understandingrdquo

79

Approaches on Term Similarity (2)

bull Categories

80

Knowledge based approaches

(WordNet)

Corpus based

approaches

Path lengthlexical

chain-based

Information

content-based

Graph learning

algorithm basedSnippet search based

Rada

1989

Resnik

1995

Jcn

1997

Lin

1998

Saacutench

2011

Agirre

2010Alvarez

2007

String based

approaches

HunTray

2005

Hirst

1998

Do

2009

Bol

2011Chen

2006

State-of-the-art approaches

Ban

2002

bull Framework

83

Term Similarity Using Semantic Networks [Li et al 2013 Li et al 2015]

Term pairs ltt1 t2gt

Type Checking

Concept Pairs Entity Pairs

Entity-distribution Context Collection

Concept-distribution Context Collection

Concept-Entity Pairs

Concept Collection for the Entity Term t1

Similarity EvaluationCosine(T(t1) T(t2))

for each pairltt2cxgt

Context vector T(t1) and T(t2)

Get maxsim(t2cx) for ltt1 t2gt

End

End

Concept Clustering

Cluster Context vector Cx(t1) and Cy(t2)

Similarity Evaluation

Max(xy) Cosine(Cx(t1) Cy(t2))

End

Concept Clustering

for each Cluster Ci(t1)

Select top k Concept namely cx

Step 1 Type Checking

Step 2 Context Representation(Vector)

Step 3 Context Similarity

An example [Li et al 2013 Li et al 2015]

For example

ltbanana peargt

88

ltbanana peargt

Entity PairsType Checking

Concept Context Collection

Similarity Evaluation Cosine(T(t1) T(t2)) 0916

Step 1 Type Checking

Step 2 Context Representation(Vector)

Step 3 Context Similarity

ExamplesTerm 1 Term 2 Similarity

lunch dinner 09987

tiger jaguar 09792

car plane 09711

television radio 09465

technology company microsoft 08208

high impact sport competitive sport 08155

employer large corporation 05353

fruit green pepper 02949

travel meal 00426

music lunch 00116

alcoholic beverage sports equipment 00314

company table tennis 00003

96httpadaptseieesjtueducnsimilaritySimCompleteResultspdf

Statistics of Search Queries

44

29

17

7

2 1

1 Term2 Terms3 Terms4 Terms5 Termsmore than 5 Terms

10

26

34

19

74

1 Term2 Terms3 Terms4 Terms5 Termsmore than 5 Terms

(a) By traffic

(b) By of distinct queries

Pokeacutemon Go Microsoft HoloLens

Instance 1 Instance 21 Instance2 Instances3 Instances4 Instances5 InstancesMore than 5 Instances

1 Instance2 Instances3 Instances4 Instances5 InstancesMore than 5 Instances

If the short text has context for the instancehellip

bull python tutorialbull dangerous pythonbull moon earth distancebull hellip

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Supervised Segmentation [Bergsma et al 2007]

bull Problem divide query into semantic units

bull Approach turn segmentation into position-based binary classification

Example Query

Two man power saw

[two man] [power saw][two] [man] [power saw][two] [man power] [saw]

Input a query and its positions

Output the decision for making segmentation at each position

Supervised Segmentation

bull Featuresbull Decision boundary features

bull Statistical features

bull Context features

bull Dependency features

eg Indicators thePOS tags in query isPosition features forwardbackward

Mutual information between left and right parts

Bank loan amortization schedule

Context information

bus driverfemale

depend

Supervised Segmentation

bull Segmentation Overview

saw

SVMclassifier

Input query two man power saw

two man power

Output segmentation decision for each position (yesno)

learning features

Unsupervised Segmentation [Tan et al 2008]

bull Unsupervised learning for query segmentation

Probability of generated segmentation S for query Q

119875 119878119876 = 119875 1199041 P 1199042|1199041 hellipP 119904119898 11990411199042hellip119904119898minus1

asympෑ

119904119894isin119878

119875(119904119894)Unigram model

segments

Valid segment boundary if and only if the pointwise mutual information between the two segments resulting from the split is negative

new york times subscription

1199041 1199042

119872119868 119904119896 119904119896+1 = log119875119888([119904119896 119904119896+1])

119875119888 119904119896 ∙ 119875119888 (119904119896+1)lt 0

Example log119875119888([119899119890119908 119910119900119903119896])

119875119888( 119899119890119908) ∙ 119875119888 (119910119900119903119896)gt 0

no segment boundary here

Unsupervised Segmentation

bull Find top k segmentations dynamic programming

bull Using EM optimization on the fly

Input query 11990811199082hellip119908119899 concept probability distribution

Output top k segmentations with highest likehood

Words in a query

Exploit Click-through [Li et al 2011]

bull Motivationbull Probabilistic query segmentation

bull Use click-through data

Output top-3 segmentation

[bank of america] [online banking] 0502

bank of america online banking] 0428

[bank of ] [ america] [online banking] 0001

Q -gt URL -gt D query

document

click data

Input Query bank of america online banking

Exploit Click-through

bull Segmentation Model

An interpolated model

global info Click-throughinfo

[credit card] [bank of America]

1 bank of america credit cards contact us overview2 secured visa credit card from bank of america3 credit cards overview find the right bank of america credit card for you

Query

Clicked html documents

global info

Click-through info

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Sense Changes with Different Context

watch harry potter read harry potter age harry potter

Movie Book Character

harry potter walkthrough

Game

Entity Recognition in Query [Guo et al 2009]

bull Motivation

Detect named entity in a short text and categorize it

harry potter walkthrough

Single-named-entity query

Example

(ldquoharry potterrdquo ldquo walkthroughrdquo ldquogamerdquo)

triple lte t cgt

class of entity

context terms

ambiguous term

contextterm class

Entity Recognition in Query

bull Probabilistic Generative Model

Goal Given a query q find triple lte t cgt maximize the probability

Probability to generate triple

assume context only depends on class

Objective given query q find

The problem then becomes how to estimate Pr(e) Pr(c|e) and Pr(t|c)

Eg ldquowalkthroughrdquo only depends on game instead of happy potter

Entity Recognition in Query

bull Probability Estimation by Learning

learning objective

N

1i

iii )ctP(emax

Challenge difficult as well as time consuming to manually assign class labels to named entities in queries

Build training set 119879 = (119890119894 119905119894) view 119888119894 as a hidden variable

New Learning problem

N

1i

ii

N

1i

i

N

1i

ii c)|)P(te|P(c)P(emax)tP(emax c

solved with topic model WS-LDA

Signal from Click [Pantel et al 2012]

bull Motivation

Predict entity type in Web search

entity

user intent

context

click

Query type distribution (73 types)

Generative model

entity type

T

TK

K2

Signal from Click

bull Joint Model for Prediction

t

τ

i

n c

θ

φ

ω

Q

Distribution over types

Intent distribution

Pick type

Pick entity

Pick intent

Pick click

Word distribution

Host distribution

Entity distribution

For each Query

Pick context words

Telegraphic Query interpretation [Sawant et al 2013 Joshi et al 2014]

bull Entity-seeking Telegraphic Queries

bull Interpretation = Segmentation + Annotation

Knowledge base Large corpus

accuracy recall

Germany capital

Berlin

Query

Result Entity

bull Overview

Joint Interpretation and Ranking [Sawant et al 2013 Joshi et al 2014]

Annotated Corpus

Telegraphic Query

e1e2e3

Two Models for Interpretation and Ranking

Generative Model

Discriminative Model

Output

bull Generative Model

Joint Interpretation and Ranking [Sawant et al 2013]

San Diego Padres

Major league baseball team

type context

E

T Padres have been to two World

Series losing in 1984 and 1998

Type hint

baseball team

losing team baseball world series 1998

Z

Context matchers

lost 1998 world seriesswitch

model model

q losing team baseball world series 1998

Borrow from U Sawant (2013)

Based on Probabilistic Language Models

bull Discriminative Model

Joint Interpretation and Ranking [Sawant et al 2013]

San_Diego_Padres

losing team baseball world

series 1998

(baseball team)

losing team baseball world

series 1998

(baseball team)

losing team baseball world

series 1998

(t = baseball team)

1998_World_Series

losing team baseball world

series 1998

(series)

losing team baseball world

series 1998

(series)

losing team baseball world

series 1998

(t = series)

Correct entity Incorrect entity

Based on max-margin discriminative learning

bull Queries seek answer entities (e2)

bull Contain (query) entities (e1) target types (t2) relations (r) and selectors (s)

Telegraphic Query Interpretation [Joshi et al 2014]

query e1 r t2 s

dave navarro first band

dave navarro band band first

dave navarro - band first

spider automobile company

spider automobile company

automobile company

-

automobile company company spider

Borrow from M Joshi (2014)

Improved Generative Model

bull Generative Model[Sawant et al 2013]

[Joshi et al 2014]Consider e1

(in q) and r

Improved Discriminative Model

bull Discriminative Model[Sawant et al 2013]

[Joshi et al 2014]

Consider e1

(in q) and r

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

bull Input a short text

bull Output semantic interpretation

bull Three steps in understanding a short text

wanna watch eagles band

watch[verb] eagles[entity](band) band[concept]

wanna watch eagles band watch[verb] eagles[entity](band) band[concept]

watch eagles band watch[verb] eagles[entity] band[concept]

Step 1 Text Segmentation ndash divide into a sequence of terms in vocabulary

Step 2 Type detection ndash determine the best type of each term

Step 3 Concept Labeling ndash infer the best concept of each entity within context

Text segmentationbull Observations

bull Mutual Exclusion ndash terms containing the same word mutually exclude each other

bull Mutual Reinforcement ndash related terms mutually reinforce each other

bull Build a Candidate Term Graph (CTG)

ldquovacation april in parisrdquo ldquowatch harry potterrdquo

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Find best segmentation

bull Best segmentation= sub-graph in CTG whichbull Is a complete graph (clique)

bull No mutual exclusion

bull Has 100 word coveragebull Except for stopwords

bull Has the largest average edge weight

Is a segmentation

Best segmentation

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Find best segmentation

bull Best segmentation= sub-graph in CTG whichbull Is a complete graph (clique)

bull No mutual exclusion

bull Has 100 word coveragebull Except for stopwords

bull Has the largest average edge weight

Maximal Clique

Best segmentation

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Type Detection

bull Pairwise Modelbull Find the best typed-term for each term so that the

Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight

watch[v]

watch[e]

watch[c]

watch

free[adj]

free[v]

movie[c]

movie[e]

free

movie

Concept Labeling

bull Entity disambiguation is the most important task of concept labelingbull Filterre-rank of the original concept cluster vector

bull Weighted-Votebull The final score of each concept cluster is a combination

of its original score and the support from context using concept co-occurrence

watch harry potter read harry potter

movie book

Example of Entity Disambiguation[Hua et al 2015 (ICDE Best Paper) Hua et al 2016]

Co-occurrence network

Concept Vector

Semantic network

Short Text

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

c1 p1

c2 p2

c3 p3

hellip

ipad apple

fruithellip

companyhellip

foodhellip

producthellip

Is-A

filtering

ldquoipad applerdquo

producthellip

devicehellip

producthellip

brandhellip

companyhellip

devicehellip

co-occur

Is-A

Mining Lexical Relationships[Wang et al 2015b]

bull Lexical knowledge represented by the probabilities

verb

product book

movie

watch harry potter

119901 119907119890119903119887 119908119886119905119888ℎ

119901 119894119899119904119905119886119899119888119890 119908119886119905119888ℎ

119901 119898119900119907119894119890 ℎ119886119903119903119910 119901119900119905119905119890119903

119901 119898119900119907119894119890 119908119886119905119888ℎ 119907119890119903119887

119901 119887119900119900119896 ℎ119886119903119903119910 119901119900119905119905119890119903

119901 119888 119905 119911

119901 119888 119890 =119901 119888 119905 119911 = 119894119899119904119905119886119899119888119890

119901 119911 119905 ①②

③e instancet termc conceptz role

Understanding Queries [Wang et al 2015b]

bull Goal to rank the concepts and findarg max

119888119901(119888|119905 119902)

The offline semantic network

QueryAll possible

segmentations

Random walk with restart [Sun et al 2005]on the online subgraph

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

bull Example ldquopopular smart cover iphone 5srdquo

bull Definition bull Head acts to name the general (semantic) category to which the

whole short text belongs Usually the head is the intent of the short text

bull ldquosmart coverrdquo intent of the query

bull Constraints distinguish this member from other members of the same category

bull ldquoiphone 5srdquo limit the type of the head

bull Non-Constraint Modifiers (aka Pure Modifiers) are subjectivemodifiers which can be dropped without changing intent

bull ldquopopularrdquo subjective can be neglected

Non-Constraint Modifiers Mining Construct Modifier Networks

Edges form a Modifier Network

Concept Hierarchy Tree in ldquoCountryrdquo domain

Modifier Network in ldquoCountryrdquo domainIn this case ldquoLargerdquo and ldquoToprdquo are pure modifiers

Country

Asian country

Developed country

Western country

Asian Developed Western

Western developed

country

Top western country

Large

Large Top

Top

WesternLarge Asian

country

Large developed

country

Top developed

country

Country

Asian Western

Developed

Large Top

bull Betweenness centrality is a measure of a nodes centrality in a network

bull Betweennes of node v is defined as

bull where 120590119904119905 is the total number of shortest paths from node s to node t and 120590119904119905(119907) is the number of those paths that pass through v

bull Normalization amp Aggregation

bull For a pure modifier it should have low betweenness centrality aggregation score PMS(t)

Non-Constraint Modifiers Mining Betweenness centrality

Head-Constraints Mining [Wang et al 2014b]

bull A term can be a head sometimes and be a constraint in some other cases

bull Eg Seattle hotel Seattle hotel job

head headconstraintconstraintconstraint

Head-Constraints Mining Acquiring Concept Patterns

Get entity pairs from query log

Conceptualization

Concept Patterns for each

prepositionsentity1 entity2

Extract Patterns

A for B A of BA with B A in BA on B A at B hellip

entity 1head entity 2constraint

concept11

concept12

concept13

concept14

concept21

concept22

concept23

(concept11 concept21) (concept11 concept22)(concept11 concept23)hellip

Concept Pattern Dictionary

Building concept pattern dictionary Query Logs

cover for iphone 6sbattery for sony a7rwicked on broadway

Why Concepts Canrsquot Be Too Generalbull It may cause too many concept pattern conflicts

canrsquot distinguish head and modifier for general concept pairs

Head Modifier

Derived Concept Pattern device company

Supporting Entity Pairs iphone 4 verizon

modem comcast

wireless router comcast

iphone 4 tmobile

Head Modifier

Derived Concept Pattern company device

Supporting Entity Pairs amazon books kindle

netflix touchpad

skype windows phone

netflix ps3

Conflict

Why Concepts Canrsquot Be Too Specificbull It may generate concepts with little coverage

bull Concept regresses to entitybull Large storage space up to (million million) patterns

hellip hellip

device largest desktop OS vendor

device largest software development company

device largest global corporation

device latest windows and office provider

hellip hellip

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept PatternsCluster size Sum of Cluster Score headconstraintscore

615 2114691 breedstate357298460224501

296 7752357 gameplatform627403476771856

153 3466804 accessoryvehicle53393705094809

70 118259 browserplatform132612807637391

22 1010993 requirementschool271407526294823

34 9489159 drugdisease154602405333541

42 8992995 cosmeticskin condition814659415003929

16 7421599 jobcity27903732555528

32 710403 accessoryphone246513830851194

18 6692376 softwareplatform210126322725878

20 6444603 testdisease239774028397537

27 5994205 clothesbreed98773996282851

19 5913545 penaltycrime200544192793488

25 5848804 taxstate240081818612579

16 5465424 saucemeat183592863621553

18 4809389 credit cardcountry142919087972152

14 4730792 foodholiday14554140330924

11 4536199 modgame257163856882439

29 4350954 garmentsport471533326845442

23 3994886 career informationprofessional732726483731257

15 386065 songinstrument128189481818135

18 378213 baitfish780426514113169

22 3722948 study guidebook508339765053921

19 3408953 pluginsbrowser550326072627126

14 3305753 recipemeat882779863422951

18 3214226 currencycountry110825444188352

13 3180272 lenscamera186081673263957

9 316973 decorationholiday130055844126533

16 314875 foodanimal7338544366514

game platform

game device

video game platform

game console game pad

game gaming platform

Game (Head) Platform (Modifier)

angry birds android

angry birds ios

angry birds windows 10

hellip hellip

Detection

Head Modifier Relationship

bull Train a classifier on

(head-embedding modifier-embedding)

bull Training data bull Positive (head modifier)bull Negative (modifier head)

bull Precision gt= 09 Recall gt= 09

bull Disadvantage not interpretable

Syntactic Parsing based on HM

bull Information is incompletebull Preposition and other function words

bull Within a noun compound el capitan macbook pro

bull Why not train a parser for web queries

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges Short Texts Lack Grammatical Signalsbull Lack function words word order

bull ldquotoys queriesrdquo has ambiguous intent

bull ldquodistance earth moonrdquo has clear intentbull many equivalent forms ldquoearth moon distancerdquo ldquoearth

distance moonrdquo hellip

Challenges Syntactic Parsing of Queries

bullNo standard

bullNo ground-truth

Why is syntactic parsing of queries even a legitimate problem

Derive Syntax from Semantics[Sun et al 2016]

bull Query ldquothai food houstonrdquo

bull Clicked sentence

bull Project dependency to the query

A Treebank for Short Texts

bull Given query 119902

bull Given 119902rsquos clicked sentence 119904

bull Parse each 119904

bull Project dependency from 119904 to 119902

bull Aggregate dependencies

Algorithm of Projection

Result Examples

Results

bull Random queries

QueryParser UAS 083 LAS 075Stanford UAS 072 LAS 064

bull Queries with no function words

QueryParser UAS 082 LAS 073Stanford UAS 070 LAS 061

bull Queries with function words

QueryParser UAS 090 LAS 085Stanford UAS 086 LAS 080

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

bull Measuring similarity between two short texts and sentences

bull Basic idea word-by-word comparison using embedding vector

bull Use saliency-weighted semantic graph to computer similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired

Bins of all edges Bins of max edges

119908isin119904119868

119868119863119865(119908) sdot)119904119890119898(119908 119904119904) sdot (1198961 + 1

൰119904119890119898(119908 119904119904) + 1198961 sdot (1 minus 119887 + 119887 sdot|119904119904|119886119907119892119897

Similarity measurement

termShort texts

Inspired by BM25

Semantic similarity

119891119904119905119904(119904119897 119904119904) =

From the Concept View

From the Concept View [Wang et al 2015a]

Co-occurrence Network

Bags of Concepts

Semantic Network

Short Text 1

Short Text 2

Concept Vector 1[(c1 score1) (c2 score2)hellip]

Concept Vector 2[(c1rsquo score1rsquo) (c2rsquo score2rsquo)hellip]

Similarity

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

bull Explicit short text understanding benefit lot of application scenariosbull Adssearch semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

000

100

200

300

400

500

600

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

000

010

020

030

040

050

060

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

Mainline Ads Sidebar Ads

Definition Mining [Hao et al 2016]

bull Definition scenarios search engines QnA etc

bull Why Conceptualization is useful for definition miningbull Examples ldquoWhat is Emphysemardquo

Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

bull This sentence has the form of definitionbull Embedding is helpful to some extent but it also return high similarity

score for (emphysema disease) and (emphysema smoking)

bull Conceptualization can provide strong semanticsbull Contextual embedding can also provide semantic similarity beyond Is-A

Answer 1

Answer 2

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

OfflineOffline OnlineOnline

Original Short textjustin bieber graduates

hellip

Knowledge base

Conceptualiztion

Concept Vector

Entity Extraction

Candidates Generation

Classification amp Ranking

Model LearningModel Learning

Concept Weighting

Model Model NModel i

Concept Model Concept Model

Class 1 Class NClass i

TrainingData

ltMusic Scoregt

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept SpaceArticle titlestagsin this category

119901119894

119901119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

hellip

hellip

120596119894

120596119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

120596119894

120596119895119901119894

119901119895

Precision performance on each category [Wang et al 2014a]

BocSTC LM_ch SVMVSM_cosi

neLM_d Entity_ESA

Movie 071 091 084 081 072 056

Money 097 095 054 057 052 074

Music 097 090 088 073 068 058

TV 096 046 092 056 051 055

0304050607080910

Pre

cisi

on

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledgeltigt in SIGMOD 2008

bull [ Auer et al 2007 ] Soumlren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Boumlrschinger Caumlcilia Zirn and AnasElghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and XindongWu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny BoyesBraem Basic objects in natural categories Cognitive psychology 8(3)382ndash439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny ZhuToward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 38: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Examples (cont)

• Level 2
  • soap, score = 1.22
    • product, toiletry, substance: 0.49
    • technology, industry standard: 0.11
  • silver, score = 1.24
    • metal, material, mineral resource, mineral: 0.638
    • color: 0.156
  • python, score = 1.29
    • language: 0.667
    • snake, animal, reptile, skin: 0.193
  • apple, score = 1.41
    • fruit, food, tree: 0.537
    • company, brand: 0.271

Single Instance
• Is this instance ambiguous?
• What are its basic-level concepts?
• What are its similar instances?

A Concept View of "Microsoft"
• (Figure) Concepts associated with "Microsoft": company, largest desktop OS vendor, software company, international company, technology leader, …

Basic-level Conceptualization (BLC) [Rosch et al 1976]
• (Figure) Basic-level conceptualization examples: KFC, BMW

How to Make BLC?
• Naive approaches:
  • Typicality: an important measure for understanding the relationship between an object and its concept
  • Pointwise Mutual Information (PMI): a common measure of the strength of association between two terms

Naive Approach 1: Typicality
• P(robin|bird) > P(penguin|bird): "robin" is a more typical bird than "penguin"
• P(USA|country) > P(Seychelles|country): "USA" is a more typical country than "Seychelles"

Using Typicality for BLC
• Associate each isA relationship (e isA c) with typicality scores P(e|c) and P(c|e):

  P(e|c) = n(c, e) / n(c),        P(c|e) = n(c, e) / n(e)

• P(e|c) indicates how typical (or popular) e is in the given concept c
• P(c|e) indicates how typical (or popular) the concept c is given e
• However: "Microsoft" → "company" has high typicality P(c|e), while "Microsoft" → "largest desktop OS vendor" has high typicality P(e|c)

Naive Approach 2: PMI [Manning and Schutze 1999]
• Pointwise mutual information (PMI) is a measure of association used in information theory and statistics
• Consider using the PMI between concept c and instance e to find the basic-level concepts:

  PMI(e, c) = log [ P(e, c) / (P(e) P(c)) ] = log P(e|c) − log P(e)

• However:
  • In basic-level categorization we are interested in finding a concept for a given e, which means P(e) is a constant
  • Thus ranking by PMI(e, c) is the same as ranking by P(e|c)

Using Rep(e, c) for BLC [Wang et al 2015b]
• The measure Rep(e, c) = P(c|e) * P(e|c) means:
  • Given e, the concept c should be its typical concept (shortest distance)
  • Given c, the instance e should be its typical instance (shortest distance)
• (Relation to PMI) If we take the logarithm of the scoring function, we get:

  log Rep(e, c) = log [ P(c|e) * P(e|c) ] = log [ (P(e, c)/P(e)) * (P(e, c)/P(c)) ]
                = log [ P(e, c)^2 / (P(e) P(c)) ] = PMI(e, c) + log P(e, c) = PMI²(e, c)

• (Relation to Commute Time) The expected commute time between an instance e and a concept c is

  Time(e, c) = Σ_{k=1..∞} 2k · P_k(e, c)
             = Σ_{k=1..T} 2k · P_k(e, c) + Σ_{k=T+1..∞} 2k · P_k(e, c)
             ≥ Σ_{k=1..T} 2k · P_k(e, c) + 2(T+1) · (1 − Σ_{k=1..T} P_k(e, c)) = 4 − 2 · Rep(e, c)

• Ranking by Rep is thus a process of finding concept nodes having the shortest expected distance to e

Evaluations on Different Measures for BLC (Precision / NDCG at k = 1, 2, 3, 5, 10, 15, 20)

Table 1
No smoothing:
  MI(e)              0.769 0.692 0.705 0.685 0.719 0.705 0.690
  PMI3(e)            0.885 0.769 0.756 0.800 0.754 0.733 0.721
  NPMI(e)            0.692 0.692 0.667 0.638 0.627 0.610 0.610
  Typicality P(c|e)  0.462 0.577 0.603 0.577 0.569 0.564 0.556
  Typicality P(e|c)  0.500 0.462 0.526 0.523 0.523 0.510 0.521
  Rep(e)             0.846 0.865 0.872 0.862 0.758 0.731 0.719
Smoothing = 0.001:
  MI(e)              0.577 0.615 0.628 0.600 0.612 0.605 0.592
  PMI3(e)            0.731 0.673 0.692 0.654 0.669 0.644 0.623
  NPMI(e)            0.923 0.827 0.769 0.746 0.731 0.695 0.671
  Typicality P(c|e)  0.462 0.577 0.603 0.577 0.569 0.564 0.554
  Typicality P(e|c)  0.885 0.865 0.872 0.831 0.785 0.741 0.704
  Rep(e)             0.846 0.731 0.718 0.723 0.700 0.669 0.638
Smoothing = 0.0001:
  MI(e)              0.615 0.615 0.654 0.608 0.635 0.628 0.612
  PMI3(e)            0.846 0.731 0.731 0.715 0.723 0.685 0.677
  NPMI(e)            0.885 0.904 0.885 0.869 0.823 0.777 0.752
  Typicality P(c|e)  0.462 0.577 0.603 0.577 0.569 0.564 0.556
  Typicality P(e|c)  0.885 0.904 0.910 0.877 0.831 0.813 0.777
  Rep(e)             0.923 0.846 0.833 0.815 0.781 0.736 0.719
Smoothing = 1e-5:
  MI(e)              0.615 0.635 0.667 0.662 0.677 0.656 0.646
  PMI3(e)            0.885 0.769 0.744 0.777 0.758 0.731 0.710
  NPMI(e)            0.885 0.846 0.872 0.869 0.831 0.810 0.787
  Typicality P(c|e)  0.462 0.577 0.603 0.577 0.569 0.564 0.556
  Typicality P(e|c)  0.769 0.808 0.846 0.823 0.808 0.782 0.765
  Rep(e)             0.885 0.904 0.872 0.862 0.812 0.800 0.767
Smoothing = 1e-6:
  MI(e)              0.769 0.673 0.705 0.677 0.700 0.692 0.679
  PMI3(e)            0.885 0.769 0.756 0.785 0.773 0.726 0.723
  NPMI(e)            0.885 0.846 0.821 0.815 0.750 0.726 0.719
  Typicality P(c|e)  0.462 0.577 0.603 0.577 0.569 0.564 0.556
  Typicality P(e|c)  0.538 0.615 0.615 0.615 0.608 0.613 0.615
  Rep(e)             0.846 0.885 0.897 0.877 0.788 0.777 0.765
Smoothing = 1e-7:
  MI(e)              0.769 0.692 0.705 0.685 0.719 0.703 0.688
  PMI3(e)            0.885 0.769 0.756 0.792 0.758 0.736 0.725
  NPMI(e)            0.769 0.750 0.718 0.700 0.650 0.641 0.633
  Typicality P(c|e)  0.462 0.577 0.603 0.577 0.569 0.564 0.556
  Typicality P(e|c)  0.500 0.481 0.526 0.523 0.531 0.523 0.523
  Rep(e)             0.846 0.865 0.872 0.854 0.765 0.749 0.733

Table 2
No smoothing:
  MI(e)              0.516 0.531 0.519 0.531 0.562 0.574 0.594
  PMI3(e)            0.725 0.664 0.652 0.660 0.628 0.631 0.646
  NPMI(e)            0.599 0.597 0.579 0.554 0.540 0.539 0.549
  Typicality P(c|e)  0.297 0.380 0.409 0.422 0.438 0.446 0.461
  Typicality P(e|c)  0.401 0.386 0.396 0.398 0.401 0.410 0.428
  Rep(e)             0.758 0.771 0.745 0.723 0.656 0.647 0.661
Smoothing = 1e-3:
  MI(e)              0.374 0.414 0.441 0.448 0.473 0.481 0.495
  PMI3(e)            0.484 0.511 0.509 0.502 0.519 0.525 0.533
  NPMI(e)            0.692 0.652 0.607 0.603 0.585 0.585 0.592
  Typicality P(c|e)  0.297 0.380 0.409 0.422 0.438 0.446 0.460
  Typicality P(e|c)  0.703 0.697 0.704 0.681 0.637 0.628 0.626
  Rep(e)             0.621 0.580 0.554 0.561 0.554 0.555 0.559
Smoothing = 1e-4:
  MI(e)              0.407 0.430 0.458 0.462 0.492 0.503 0.512
  PMI3(e)            0.648 0.604 0.579 0.575 0.578 0.576 0.590
  NPMI(e)            0.747 0.777 0.761 0.737 0.700 0.685 0.688
  Typicality P(c|e)  0.297 0.380 0.409 0.422 0.438 0.446 0.461
  Typicality P(e|c)  0.791 0.795 0.802 0.767 0.738 0.729 0.724
  Rep(e)             0.758 0.714 0.711 0.689 0.653 0.636 0.653
Smoothing = 1e-5:
  MI(e)              0.429 0.465 0.478 0.501 0.517 0.528 0.545
  PMI3(e)            0.725 0.647 0.642 0.642 0.627 0.624 0.638
  NPMI(e)            0.813 0.779 0.778 0.765 0.730 0.723 0.729
  Typicality P(c|e)  0.297 0.380 0.409 0.422 0.438 0.446 0.461
  Typicality P(e|c)  0.709 0.728 0.735 0.722 0.702 0.696 0.703
  Rep(e)             0.791 0.787 0.762 0.739 0.707 0.703 0.706
Smoothing = 1e-6:
  MI(e)              0.516 0.510 0.515 0.526 0.546 0.563 0.579
  PMI3(e)            0.725 0.655 0.651 0.654 0.641 0.631 0.649
  NPMI(e)            0.791 0.766 0.732 0.728 0.673 0.659 0.668
  Typicality P(c|e)  0.297 0.380 0.409 0.422 0.438 0.446 0.461
  Typicality P(e|c)  0.495 0.516 0.520 0.508 0.512 0.521 0.540
  Rep(e)             0.758 0.784 0.767 0.755 0.691 0.686 0.694
Smoothing = 1e-7:
  MI(e)              0.516 0.531 0.519 0.530 0.562 0.571 0.592
  PMI3(e)            0.725 0.664 0.652 0.658 0.630 0.631 0.647
  NPMI(e)            0.670 0.655 0.633 0.604 0.575 0.570 0.581
  Typicality P(c|e)  0.297 0.380 0.409 0.422 0.438 0.446 0.461
  Typicality P(e|c)  0.423 0.421 0.415 0.407 0.414 0.424 0.438
  Rep(e)             0.758 0.771 0.745 0.725 0.663 0.661 0.668

Single Instance
• Is this instance ambiguous?
• What are its basic-level concepts?
• What are its similar instances?

What is the Semantic Similarity?
• Are the following instance pairs similar?
  • <apple, microsoft>
  • <apple, pear>
  • <apple, fruit>
  • <apple, food>
  • <apple, ipad>
  • <car, journey>

Approaches on Term Similarity
• Categories of approaches for semantic similarity:
  • String based approach
  • Knowledge based approach: use preexisting thesauri, taxonomies, or encyclopedias such as WordNet
  • Corpus based approach: use contexts of terms extracted from web pages, web search snippets, or other text repositories
  • Embedding based approach: introduced in detail in "Part 3: Implicit Understanding"

Approaches on Term Similarity (2)
• Categories (figure):
  • String based approaches
  • Knowledge based approaches (WordNet): path length / lexical chain-based; information content-based
  • Corpus based approaches: graph learning algorithm based; snippet search based
  • Representative state-of-the-art approaches: Rada 1989, Resnik 1995, Jcn 1997, Lin 1998, Sánchez 2011, Agirre 2010, Alvarez 2007, HunTray 2005, Hirst 1998, Do 2009, Bol 2011, Chen 2006, Ban 2002

Term Similarity Using Semantic Networks [Li et al 2013, Li et al 2015]
• Framework:
  • Step 1: Type checking — decide whether the term pair <t1, t2> is a concept pair, an entity pair, or a concept–entity pair
  • Step 2: Context representation (vector) — collect entity-distribution contexts for concept terms and concept-distribution contexts for entity terms; cluster the concepts and select the top-k concepts of each cluster, yielding context vectors T(t1) and T(t2) (or cluster vectors Cx(t1), Cy(t2))
  • Step 3: Context similarity — evaluate Cosine(T(t1), T(t2)), or max over cluster pairs Cosine(Cx(t1), Cy(t2)); for a concept–entity pair, collect the concepts cx of the entity term and take max_x sim(t2, cx)

An Example [Li et al 2013, Li et al 2015]
• For example, <banana, pear>:
  • Step 1: Type checking — <banana, pear> is an entity pair
  • Step 2: Context representation (vector) — collect the concept context vectors T(banana) and T(pear)
  • Step 3: Context similarity — Cosine(T(banana), T(pear)) = 0.916
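A small sketch of the cosine comparison between concept context vectors. The concept distributions below are invented for illustration, not values from the actual isA network.

from math import sqrt

# Hypothetical concept distributions P(concept | entity).
banana = {"fruit": 0.55, "food": 0.25, "tropical fruit": 0.15, "company": 0.05}
pear   = {"fruit": 0.60, "food": 0.30, "tree": 0.10}

def cosine(u, v):
    # Cosine similarity between two sparse concept vectors.
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm = lambda x: sqrt(sum(w * w for w in x.values()))
    return dot / (norm(u) * norm(v)) if u and v else 0.0

print(f"sim(banana, pear) = {cosine(banana, pear):.3f}")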

Examples
Term 1               Term 2              Similarity
lunch                dinner              0.9987
tiger                jaguar              0.9792
car                  plane               0.9711
television           radio               0.9465
technology company   microsoft           0.8208
high impact sport    competitive sport   0.8155
employer             large corporation   0.5353
fruit                green pepper        0.2949
travel               meal                0.0426
music                lunch               0.0116
alcoholic beverage   sports equipment    0.0314
company              table tennis        0.0003
Full results: http://adapt.seiee.sjtu.edu.cn/similarity/SimCompleteResults.pdf

Statistics of Search Queries
• (a) By traffic: 1 term 44%, 2 terms 29%, 3 terms 17%, 4 terms 7%, 5 terms 2%, more than 5 terms 1%
• (b) By # of distinct queries: 1 term 10%, 2 terms 26%, 3 terms 34%, 4 terms 19%, 5 terms 7%, more than 5 terms 4%
• (Figure) Corresponding breakdowns by number of instances per query (1, 2, 3, 4, 5, more than 5 instances); example multi-word instances: Pokémon Go, Microsoft HoloLens

If the short text has context for the instance…
• python tutorial
• dangerous python
• moon earth distance
• …

Short Text Understanding
• How to segment this short text?
• What does this short text mean (its intent, senses, or concepts)?
• What are the relations among terms in the short text?
• How to calculate the similarity between short texts?

Supervised Segmentation [Bergsma et al 2007]
• Problem: divide a query into semantic units
• Approach: turn segmentation into position-based binary classification
• Example query: "two man power saw"
  • Candidate segmentations: [two man] [power saw]; [two] [man] [power saw]; [two] [man power] [saw]
• Input: a query and its positions
• Output: the decision for making a segmentation break at each position

Supervised Segmentation
• Features:
  • Decision boundary features — e.g. indicator features and POS tags in the query, forward/backward position features
  • Statistical features — e.g. mutual information between the left and right parts ("bank loan | amortization schedule")
  • Context features — surrounding context information (e.g. "female" with "bus driver")
  • Dependency features — e.g. the dependency between "female" and "bus driver"

Supervised Segmentation
• Segmentation overview:
  • Input query: "two man power saw"
  • An SVM classifier uses the learned features to output a segmentation decision (yes/no) for each position between words
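A minimal sketch of position-based binary classification with an SVM. The feature set below is a simplified stand-in for the richer decision-boundary, statistical, context, and dependency features above, and the PMI values are invented.

from sklearn.svm import LinearSVC

def position_features(tokens, i, pmi):
    left, right = tokens[i - 1], tokens[i]
    return [
        pmi.get((left, right), 0.0),   # statistical: PMI between adjacent words
        i / len(tokens),               # position feature
        float(left == "the"),          # simple indicator features
        float(right == "the"),
    ]

# Toy training data: one example per boundary position; y = 1 means "break here".
pmi = {("power", "saw"): 3.2, ("man", "power"): -0.4, ("two", "man"): 1.1}
tokens = ["two", "man", "power", "saw"]
X = [position_features(tokens, i, pmi) for i in range(1, len(tokens))]
y = [0, 1, 0]                          # gold segmentation: [two man] [power saw]

clf = LinearSVC().fit(X, y)
print(clf.predict(X))                  # predicted break decision for each position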

Unsupervised Segmentation [Tan et al 2008]
• Unsupervised learning for query segmentation
• Probability of a generated segmentation S for query Q (unigram model over segments):

  P(S|Q) = P(s1) P(s2|s1) … P(sm|s1 s2 … s(m−1)) ≈ ∏_{si ∈ S} P(si)

• A segment boundary is valid if and only if the pointwise mutual information between the two segments resulting from the split is negative:

  MI(sk, s(k+1)) = log [ Pc([sk, s(k+1)]) / (Pc(sk) · Pc(s(k+1))) ] < 0

• Example: "new york times subscription" (segments s1, s2):

  log [ Pc([new york]) / (Pc(new) · Pc(york)) ] > 0  ⇒  no segment boundary between "new" and "york"

Unsupervised Segmentation
• Find the top-k segmentations with dynamic programming
• Use EM optimization on the fly
• Input: query w1 w2 … wn (the words in a query) and a concept probability distribution
• Output: top-k segmentations with the highest likelihood
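A minimal dynamic-programming sketch of the unigram segment model above. The segment probabilities are invented; a real system would estimate them from query logs or web n-grams and refine them with EM.

from functools import lru_cache
from math import log

# Toy segment probabilities Pc(s).
seg_prob = {
    "new": 1e-3, "york": 8e-4, "times": 9e-4, "subscription": 2e-4,
    "new york": 6e-4, "new york times": 4e-4, "york times": 1e-5,
}

def best_segmentation(words):
    # Maximize the sum of log Pc(si) over all ways to split the query.
    @lru_cache(maxsize=None)
    def solve(i):
        if i == len(words):
            return 0.0, ()
        best_score, best_split = float("-inf"), ()
        for j in range(i + 1, len(words) + 1):
            seg = " ".join(words[i:j])
            score, rest = solve(j)
            cand = log(seg_prob.get(seg, 1e-9)) + score   # unseen segments get a tiny probability
            if cand > best_score:
                best_score, best_split = cand, (seg,) + rest
        return best_score, best_split
    return solve(0)[1]

print(best_segmentation(tuple("new york times subscription".split())))
# ('new york times', 'subscription') with these toy numbers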

Exploit Click-through [Li et al 2011]
• Motivation:
  • Probabilistic query segmentation
  • Use click-through data: query Q → clicked URL → document D
• Input query: "bank of america online banking"
• Output: top-3 segmentations
  • [bank of america] [online banking]    0.502
  • [bank of america online banking]      0.428
  • [bank of] [america] [online banking]  0.001

Exploit Click-through
• Segmentation model: an interpolated model combining global info and click-through info
• Example query: [credit card] [bank of america]
• Clicked HTML documents:
  1. bank of america credit cards contact us overview
  2. secured visa credit card from bank of america
  3. credit cards overview: find the right bank of america credit card for you

Short Text Understanding
• How to segment this short text?
• What does this short text mean (its intent, senses, or concepts)?
• What are the relations among terms in the short text?
• How to calculate the similarity between short texts?

Sense Changes with Different Context
• watch harry potter → Movie
• read harry potter → Book
• age harry potter → Character
• harry potter walkthrough → Game

Entity Recognition in Query [Guo et al 2009]
• Motivation: detect the named entity in a short text and categorize it
• Single-named-entity query, e.g. "harry potter walkthrough"
• Example triple <e, t, c> = ("harry potter", "walkthrough", "game"), where e is the (possibly ambiguous) entity term, t the context term, and c the class of the entity

Entity Recognition in Query
• Probabilistic generative model:
  • Goal: given a query q, find the triple <e, t, c> that maximizes the probability Pr(e) Pr(c|e) Pr(t|c)
  • Assumes the context depends only on the class, e.g. "walkthrough" depends on "game" rather than on "harry potter"
  • The problem then becomes how to estimate Pr(e), Pr(c|e), and Pr(t|c)
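A toy scoring of candidate triples under the factorization above. All probability values are invented for illustration and do not come from the paper.

# Hypothetical probability tables.
p_e = {"harry potter": 0.6, "harry": 0.4}
p_c_given_e = {"harry potter": {"game": 0.3, "movie": 0.5, "book": 0.2}}
p_t_given_c = {"game": {"walkthrough": 0.4},
               "movie": {"walkthrough": 0.01},
               "book": {"walkthrough": 0.01}}

def score(e, t, c):
    # Pr(e, t, c) = Pr(e) * Pr(c|e) * Pr(t|c)
    return (p_e.get(e, 0) * p_c_given_e.get(e, {}).get(c, 0)
            * p_t_given_c.get(c, {}).get(t, 0))

candidates = [("harry potter", "walkthrough", c) for c in ("game", "movie", "book")]
print(max(candidates, key=lambda x: score(*x)))   # ('harry potter', 'walkthrough', 'game')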

Entity Recognition in Query
• Probability estimation by learning:
  • Learning objective: max ∏_{i=1..N} Pr(e_i, t_i, c_i)
  • Challenge: it is difficult and time-consuming to manually assign class labels to named entities in queries
  • Build a training set T = {(e_i, t_i)} and view c_i as a hidden variable
  • New learning problem: max ∏_{i=1..N} Pr(e_i, t_i) = max ∏_{i=1..N} Pr(e_i) Σ_c Pr(c|e_i) Pr(t_i|c)
  • Solved with the topic model WS-LDA

Signal from Click [Pantel et al 2012]
• Motivation: predict entity type in Web search
• A generative model over entity, entity type, user intent, context, and click signals; query type distribution over 73 types (figure)

Signal from Click
• Joint model for prediction (plate notation): for each query, pick a type from the distribution over types, pick an entity from the entity distribution, pick an intent from the intent distribution, pick context words from the word distribution, and pick a click from the host distribution (figure)

Telegraphic Query Interpretation [Sawant et al 2013, Joshi et al 2014]
• Entity-seeking telegraphic queries
• Interpretation = segmentation + annotation
• Combine a knowledge base (accuracy) with a large corpus (recall)
• Example: query "Germany capital" → result entity "Berlin"

Joint Interpretation and Ranking [Sawant et al 2013, Joshi et al 2014]
• Overview: given a telegraphic query and an annotated corpus, two models jointly interpret the query and rank the candidate answer entities (e1, e2, e3, …): a generative model and a discriminative model

Joint Interpretation and Ranking [Sawant et al 2013]
• Generative model — based on probabilistic language models
  • Query q: "losing team baseball world series 1998"
  • The type hint ("baseball team") generates the answer type T (e.g. "major league baseball team"), and the context matchers ("lost 1998 world series") generate snippet context such as "Padres have been to two World Series, losing in 1984 and 1998" for the answer entity E = San Diego Padres; a switch variable Z decides which model generates each query word
  (Figure adapted from U. Sawant, 2013)

Joint Interpretation and Ranking [Sawant et al 2013]
• Discriminative model — based on max-margin discriminative learning
  • For the query "losing team baseball world series 1998", interpretations that bind t = "baseball team" support the correct entity San_Diego_Padres, while interpretations that bind t = "series" support the incorrect entity 1998_World_Series

Telegraphic Query Interpretation [Joshi et al 2014]
• Queries seek answer entities (e2)
• Queries contain (query) entities (e1), target types (t2), relations (r), and selectors (s), e.g.:

  query "dave navarro first band"   →  e1 = dave navarro, r = band, t2 = band, s = first
                                       (or e1 = dave navarro, r = -, t2 = band, s = first)
  query "spider automobile company" →  e1 = spider, r = automobile company, t2 = automobile company, s = -
                                       (or e1 = -, r = automobile company, t2 = company, s = spider)

  (Adapted from M. Joshi, 2014)

Improved Generative Model
• The generative model of [Sawant et al 2013] is extended in [Joshi et al 2014] to also consider e1 (in q) and r

Improved Discriminative Model
• The discriminative model of [Sawant et al 2013] is extended in [Joshi et al 2014] to also consider e1 (in q) and r

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]
• Input: a short text
• Output: semantic interpretation
• Three steps in understanding a short text, e.g. "wanna watch eagles band":
  • Step 1: Text segmentation — divide the text into a sequence of terms in the vocabulary: watch | eagles | band
  • Step 2: Type detection — determine the best type of each term: watch[verb] eagles[entity] band[concept]
  • Step 3: Concept labeling — infer the best concept of each entity within context: watch[verb] eagles[entity](band) band[concept]

Text Segmentation
• Observations:
  • Mutual exclusion — terms containing the same word mutually exclude each other
  • Mutual reinforcement — related terms mutually reinforce each other
• Build a Candidate Term Graph (CTG)
  • (Figure) Example CTGs for "vacation april in paris" (candidate terms: vacation, april in paris, april, paris) and "watch harry potter" (candidate terms: watch, harry potter, harry, potter), with edge weights such as 0.029, 0.005, 0.047, 0.041 and 0.014, 0.092, 0.053, 0.018

Find the Best Segmentation
• Best segmentation = a sub-graph of the CTG which:
  • Is a complete graph (clique) — no mutual exclusion
  • Has 100% word coverage (except for stopwords)
  • Has the largest average edge weight
• (Figure) In the example CTGs, {april in paris, vacation} and {watch, harry potter} are valid segmentations

Find the Best Segmentation
• Best segmentation = a sub-graph of the CTG which:
  • Is a complete graph (clique) — no mutual exclusion
  • Has 100% word coverage (except for stopwords)
  • Has the largest average edge weight
• (Figure) The best segmentation is the maximal clique with the largest average edge weight: {vacation, april in paris} and {watch, harry potter}
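A toy sketch of the clique-based selection just described. The candidate terms, word spans, and edge weights are invented, and the scoring is simplified relative to the paper.

import itertools
import networkx as nx

def best_segmentation(words, terms, weight):
    """terms: candidate term -> covered word positions; weight(t1, t2): reinforcement score."""
    g = nx.Graph()
    g.add_nodes_from(terms)
    for t1, t2 in itertools.combinations(terms, 2):
        if not (set(terms[t1]) & set(terms[t2])):      # mutual exclusion: no shared words
            g.add_edge(t1, t2, w=weight(t1, t2))
    best, best_score = None, float("-inf")
    for clique in nx.find_cliques(g):                  # maximal cliques = candidate segmentations
        covered = set().union(*(terms[t] for t in clique))
        if covered != set(range(len(words))):          # require 100% word coverage
            continue
        edges = list(itertools.combinations(clique, 2))
        score = sum(g[u][v]["w"] for u, v in edges) / max(len(edges), 1)
        if score > best_score:
            best, best_score = clique, score
    return best

words = ["watch", "harry", "potter"]
terms = {"watch": (0,), "harry potter": (1, 2), "harry": (1,), "potter": (2,)}
w = lambda a, b: {frozenset(["watch", "harry potter"]): 0.092}.get(frozenset([a, b]), 0.01)
print(best_segmentation(words, terms, w))   # ['watch', 'harry potter'] (order may vary)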

Type Detection
• Pairwise model: find the best typed-term for each term so that the maximum spanning tree of the resulting sub-graph between typed-terms has the largest weight
• (Figure) Example "watch free movie": choose among watch[v], watch[e], watch[c]; free[adj], free[v]; movie[c], movie[e]

Concept Labeling
• Entity disambiguation is the most important task of concept labeling
  • Filter/re-rank the original concept cluster vector
• Weighted vote: the final score of each concept cluster is a combination of its original score and the support from context, using concept co-occurrence
• Example: "watch harry potter" → movie; "read harry potter" → book
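A toy sketch of the weighted-vote idea. The mixing weight alpha and the co-occurrence scores are made-up numbers, not the paper's.

def weighted_vote(concept_scores, context_concepts, cooccur, alpha=0.5):
    # Combine each concept's original score with co-occurrence support from the context.
    out = {}
    for c, s in concept_scores.items():
        support = sum(cooccur.get((c, ctx), 0.0) for ctx in context_concepts)
        out[c] = alpha * s + (1 - alpha) * support
    return out

harry_potter = {"movie": 0.4, "book": 0.4, "character": 0.2}
cooccur = {("movie", "verb:watch"): 0.9, ("book", "verb:read"): 0.9, ("book", "verb:watch"): 0.1}
print(weighted_vote(harry_potter, ["verb:watch"], cooccur))   # "movie" wins for "watch harry potter"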

Example of Entity Disambiguation [Hua et al 2015 (ICDE Best Paper), Hua et al 2016]
• (Figure) Pipeline: a short text is parsed and conceptualized against a semantic (isA) network and a co-occurrence network — term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, concept orthogonalization — producing a concept vector [(c1, p1), (c2, p2), (c3, p3), …]
• Example "ipad apple": the isA concepts of "apple" (fruit, company, food, product, …) are filtered by co-occurrence with the concepts of "ipad" (product, device, brand, …), keeping the product/brand/company senses

Mining Lexical Relationships [Wang et al 2015b]
• Lexical knowledge represented by probabilities (e: instance, t: term, c: concept, z: role), e.g. for "watch harry potter":
  ① p(z | t) — e.g. p(verb | watch), p(instance | watch)
  ② p(c | t, z) — e.g. p(movie | watch, z = verb)
  ③ p(c | e) = p(c | t, z = instance) — e.g. p(movie | harry potter), p(book | harry potter)

Understanding Queries [Wang et al 2015b]
• Goal: rank the concepts and find argmax_c p(c | t, q)
• The query and all its possible segmentations are linked to the offline semantic network, and concepts are ranked by random walk with restart [Sun et al 2005] on the online subgraph
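A minimal random-walk-with-restart sketch over a tiny term/concept subgraph. The adjacency matrix and the restart probability of 0.15 are assumptions for illustration, not values from the paper.

import numpy as np

nodes = ["watch", "harry potter", "movie", "book"]
A = np.array([[0, 1, 1, 0],      # toy co-occurrence / isA links between nodes
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
P = A / A.sum(axis=0, keepdims=True)          # column-stochastic transition matrix

def rwr(seed_idx, restart=0.15, iters=50):
    r = np.zeros(len(nodes)); r[seed_idx] = 1.0
    p = r.copy()
    for _ in range(iters):
        p = (1 - restart) * P @ p + restart * r
    return dict(zip(nodes, p.round(3)))

print(rwr(nodes.index("watch")))              # concepts close to the seed term get high mass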

Short Text Understanding
• How to segment this short text?
• What does this short text mean (its intent, senses, or concepts)?
• What are the relations among terms in the short text?
• How to calculate the similarity between short texts?

Head, Modifier and Constraint Detection in Short Texts [Wang et al 2014b]
• Example: "popular smart cover iphone 5s"
• Definitions:
  • Head: names the general (semantic) category to which the whole short text belongs; usually the head is the intent of the short text
    • "smart cover" — the intent of the query
  • Constraints: distinguish this member from other members of the same category
    • "iphone 5s" — limits the type of the head
  • Non-constraint modifiers (aka pure modifiers): subjective modifiers which can be dropped without changing the intent
    • "popular" — subjective, can be neglected

Non-Constraint Modifiers Mining: Construct Modifier Networks
• Edges form a modifier network
• (Figure) Concept hierarchy tree in the "country" domain: country → Asian country, developed country, Western country → large Asian country, Western developed country, top western country, large developed country, top developed country, …
• (Figure) Modifier network in the "country" domain with nodes Asian, Developed, Western, Large, Top; in this case "Large" and "Top" are pure modifiers

Non-Constraint Modifiers Mining: Betweenness Centrality
• Betweenness centrality is a measure of a node's centrality in a network
• Betweenness of node v is defined as g(v) = Σ_{s ≠ v ≠ t} σ_st(v) / σ_st,
  where σ_st is the total number of shortest paths from node s to node t and σ_st(v) is the number of those paths that pass through v
• Normalization & aggregation
• A pure modifier should have a low betweenness-centrality aggregation score PMS(t)
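A small illustration with networkx on a toy modifier network shaped roughly like the "country" example; the edge choices are invented and only the relative ordering matters.

import networkx as nx

g = nx.Graph()
g.add_edges_from([
    ("Asian", "Developed"), ("Asian", "Western"), ("Developed", "Western"),
    ("Large", "Asian"), ("Large", "Developed"), ("Top", "Western"), ("Top", "Developed"),
])
for node, score in sorted(nx.betweenness_centrality(g).items(), key=lambda kv: kv[1]):
    print(f"{node:10s} {score:.3f}")   # "Large" and "Top" get the lowest betweenness here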

Head-Constraints Mining [Wang et al 2014b]
• A term can be a head in some cases and a constraint in others
• E.g. "seattle hotel" (hotel = head, seattle = constraint) vs. "seattle hotel job" (job = head, seattle and hotel = constraints)

Head-Constraints Mining: Acquiring Concept Patterns
• Building a concept pattern dictionary from query logs:
  • Extract preposition patterns such as "A for B", "A of B", "A with B", "A in B", "A on B", "A at B", … (e.g. "cover for iphone 6s", "battery for sony a7r", "wicked on broadway")
  • Get entity pairs (entity1 = head, entity2 = constraint) from the query log
  • Conceptualize each entity (concept11, concept12, concept13, concept14; concept21, concept22, concept23) and generate concept patterns for each preposition: (concept11, concept21), (concept11, concept22), (concept11, concept23), …
  • Aggregate the patterns into a concept pattern dictionary

Why Concepts Can't Be Too General
• It may cause too many concept pattern conflicts — we can't distinguish head and modifier for general concept pairs:

  Derived concept pattern: device (head) / company (modifier)
    Supporting entity pairs: (iphone 4, verizon), (modem, comcast), (wireless router, comcast), (iphone 4, tmobile)
  Derived concept pattern: company (head) / device (modifier)
    Supporting entity pairs: (amazon books, kindle), (netflix, touchpad), (skype, windows phone), (netflix, ps3)
  → Conflict

Why Concepts Can't Be Too Specific
• It may generate concepts with little coverage
  • The concept regresses to the entity
  • Large storage space: up to (million × million) patterns, e.g.
    … ; device / largest desktop OS vendor; device / largest software development company; device / largest global corporation; device / latest windows and office provider; …
• Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept Patterns
Cluster size | Sum of cluster score | head / constraint | score
615 | 2114691 | breed / state | 357298460224501
296 | 7752357 | game / platform | 627403476771856
153 | 3466804 | accessory / vehicle | 53393705094809
70 | 118259 | browser / platform | 132612807637391
22 | 1010993 | requirement / school | 271407526294823
34 | 9489159 | drug / disease | 154602405333541
42 | 8992995 | cosmetic / skin condition | 814659415003929
16 | 7421599 | job / city | 27903732555528
32 | 710403 | accessory / phone | 246513830851194
18 | 6692376 | software / platform | 210126322725878
20 | 6444603 | test / disease | 239774028397537
27 | 5994205 | clothes / breed | 98773996282851
19 | 5913545 | penalty / crime | 200544192793488
25 | 5848804 | tax / state | 240081818612579
16 | 5465424 | sauce / meat | 183592863621553
18 | 4809389 | credit card / country | 142919087972152
14 | 4730792 | food / holiday | 14554140330924
11 | 4536199 | mod / game | 257163856882439
29 | 4350954 | garment / sport | 471533326845442
23 | 3994886 | career information / professional | 732726483731257
15 | 386065 | song / instrument | 128189481818135
18 | 378213 | bait / fish | 780426514113169
22 | 3722948 | study guide / book | 508339765053921
19 | 3408953 | plugins / browser | 550326072627126
14 | 3305753 | recipe / meat | 882779863422951
18 | 3214226 | currency / country | 110825444188352
13 | 3180272 | lens / camera | 186081673263957
9 | 316973 | decoration / holiday | 130055844126533
16 | 314875 | food / animal | 7338544366514

Example concept pattern: Game (Head) / Platform (Modifier)
• Related concept pairs in the cluster: game / platform, game / device, video game / platform, game console / game pad, game / gaming platform
• Supporting entity pairs: (angry birds, android), (angry birds, ios), (angry birds, windows 10), …

Head Modifier Relationship Detection
• Train a classifier on (head-embedding, modifier-embedding)
• Training data:
  • Positive: (head, modifier)
  • Negative: (modifier, head)
• Precision >= 0.9, Recall >= 0.9
• Disadvantage: not interpretable
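A toy sketch of the embedding-pair classifier. The embeddings and pairs are invented, and logistic regression is used as a placeholder since the slide does not specify the classifier.

import numpy as np
from sklearn.linear_model import LogisticRegression

emb = {                      # hypothetical 3-d word embeddings
    "game": np.array([0.9, 0.1, 0.0]), "platform": np.array([0.1, 0.8, 0.1]),
    "cover": np.array([0.7, 0.2, 0.1]), "iphone": np.array([0.2, 0.7, 0.2]),
}
pairs = [("game", "platform"), ("cover", "iphone")]          # known (head, modifier) pairs

def featurize(a, b):
    return np.concatenate([emb[a], emb[b]])                  # [head-embedding; modifier-embedding]

X = [featurize(h, m) for h, m in pairs] + [featurize(m, h) for h, m in pairs]
y = [1] * len(pairs) + [0] * len(pairs)                      # positive: (head, modifier); negative: reversed

clf = LogisticRegression().fit(X, y)
print(clf.predict([featurize("game", "platform")]))          # expect [1]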

Syntactic Parsing Based on Head-Modifier
• Head-modifier information is incomplete:
  • Prepositions and other function words
  • Within a noun compound: "el capitan macbook pro"
• Why not train a parser for web queries?

Syntactic Parsing of Short Texts [Sun et al EMNLP 2016]
• Syntactic structures are valuable for short text understanding
• Examples:

Challenges: Short Texts Lack Grammatical Signals
• Lack of function words and word order:
  • "toys queries" has ambiguous intent
  • "distance earth moon" has clear intent, but many equivalent forms: "earth moon distance", "earth distance moon", …

Challenges: Syntactic Parsing of Queries
• No standard
• No ground truth
• Why is syntactic parsing of queries even a legitimate problem?

Derive Syntax from Semantics [Sun et al 2016]
• Query: "thai food houston"
• Take a clicked sentence for the query
• Project its dependencies onto the query

A Treebank for Short Texts
• Given a query q
• Given q's clicked sentence s
• Parse each s
• Project dependencies from s to q
• Aggregate dependencies

Algorithm of Projection

Result Examples

Results
• Random queries: QueryParser UAS 0.83, LAS 0.75; Stanford UAS 0.72, LAS 0.64
• Queries with no function words: QueryParser UAS 0.82, LAS 0.73; Stanford UAS 0.70, LAS 0.61
• Queries with function words: QueryParser UAS 0.90, LAS 0.85; Stanford UAS 0.86, LAS 0.80

Short Text Understanding
• How to segment this short text?
• What does this short text mean (its intent, senses, or concepts)?
• What are the relations among terms in the short text?
• How to calculate the similarity between short texts?

Short Text Similarity Using Word Embedding [Kenter and Rijke 2015]
• Measuring similarity between two short texts or sentences
• Basic idea: word-by-word comparison using embedding vectors
• Use a saliency-weighted semantic graph to compute similarity
• Semantic similarity between a longer text s_l and a shorter text s_s, inspired by BM25:

  f_sts(s_l, s_s) = Σ_{w ∈ s_l} IDF(w) · sem(w, s_s) · (k1 + 1) / ( sem(w, s_s) + k1 · (1 − b + b · |s_s| / avgsl) )

  where sem(w, s_s) is the semantic (embedding) similarity of term w to the short text s_s
• Features acquired: bins of all edges, bins of max edges
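A rough sketch of the saliency-weighted formula above with toy embeddings and IDF values; k1 and b use common BM25 defaults, which is an assumption rather than the paper's exact setting.

import numpy as np

emb = {"thai": np.array([0.9, 0.1]), "food": np.array([0.2, 0.9]),
       "houston": np.array([0.5, 0.5]), "restaurant": np.array([0.3, 0.8])}
idf = {"thai": 2.0, "food": 1.2, "houston": 1.8, "restaurant": 1.5}

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def sem(w, text):
    # Best embedding match of word w against any word of the other short text.
    return max(cos(emb[w], emb[x]) for x in text)

def f_sts(s_l, s_s, k1=1.2, b=0.75, avgsl=3.0):
    score = 0.0
    for w in s_l:
        s = sem(w, s_s)
        score += idf[w] * (s * (k1 + 1)) / (s + k1 * (1 - b + b * len(s_s) / avgsl))
    return score

print(f_sts(["thai", "food", "houston"], ["thai", "restaurant"]))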

From the Concept View [Wang et al 2015a]
• Bags of concepts: each short text is conceptualized — parsing, term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, concept orthogonalization — against the semantic (isA) network and the co-occurrence network
• Short text 1 → concept vector 1 [(c1, score1), (c2, score2), …]; short text 2 → concept vector 2 [(c1′, score1′), (c2′, score2′), …]; similarity is computed between the two concept vectors

Outline
• Knowledge Bases
• Explicit Representation Models
• Applications

Applications
• Explicit short text understanding benefits many application scenarios:
  • Ads/search semantic match
  • Definition mining
  • Query recommendation
  • Web table understanding
  • Semantic search
  • …

Ads Keyword Selection [Wang et al 2015a]
• (Charts) Keyword-selection results broken down by query decile (Decile 4 through Decile 10) for Mainline Ads and Sidebar Ads

Definition Mining [Hao et al 2016]
• Definition scenarios: search engines, QnA, etc.
• Why conceptualization is useful for definition mining — example: "What is Emphysema?"
  • Answer 1: "Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year."
  • Answer 2: "Emphysema is an incurable, progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing."
• Both answers have the form of a definition; embedding is helpful to some extent, but it also returns a high similarity score for (emphysema, disease) and for (emphysema, smoking)
• Conceptualization can provide strong semantics; contextual embedding can also provide semantic similarity beyond Is-A

Definition Mining [Hao et al 2016]

Concept-based Short Text Classification and Ranking [Wang et al 2014a]
• System overview (figure):
  • Offline: model learning — concept weighting over training data, producing a concept model per class (Class 1 … Class i … Class N)
  • Online: original short text (e.g. "justin bieber graduates") → entity extraction → conceptualization against the knowledge base → concept vector → candidate generation → classification & ranking (e.g. <Music, Score>)

Concept-based Short Text Classification and Ranking [Wang et al 2014a]
• (Figures) Each category (e.g. TV) is mapped into the concept space via the article titles/tags in that category, with concept weights p_i, p_j
• Category concept vectors (e.g. Music, Movie, TV) carry weights ω_i, ω_j in the concept space
• An incoming query is conceptualized into the same concept space and matched against the category concept vectors

Precision performance on each category [Wang et al 2014a]
(Methods, in the order plotted: BocSTC, LM_ch, SVM, VSM_cosine, LM_d, Entity_ESA)
Movie:  0.71, 0.91, 0.84, 0.81, 0.72, 0.56
Money:  0.97, 0.95, 0.54, 0.57, 0.52, 0.74
Music:  0.97, 0.90, 0.88, 0.73, 0.68, 0.58
TV:     0.96, 0.46, 0.92, 0.56, 0.51, 0.55

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

• [Stark et al 1998] Michael M. Stark and Richard F. Riesenfeld. WordNet: An Electronic Lexical Database. Proceedings of 11th Eurographics Workshop on Rendering, 1998.

• [Banko et al 2007] Michele Banko, Michael J. Cafarella, Stephen Soderland, Matt Broadhead and Oren Etzioni. Open Information Extraction from the Web. In IJCAI, 2007.

• [Etzioni et al 2011] Oren Etzioni, Anthony Fader, Janara Christensen, Stephen Soderland and Mausam Mausam. Open Information Extraction: The Second Generation. In IJCAI, vol. 11, pp. 3-10, 2011.

• [Carlson et al 2010] A. Carlson, J. Betteridge, B. Kisiel, B. Settles, E. R. Hruschka Jr. and T. M. Mitchell. Toward an Architecture for Never-Ending Language Learning. In Proceedings of the Conference on Artificial Intelligence (AAAI), 2010.

• [Wu et al 2012] Wentao Wu, Hongsong Li, Haixun Wang and Kenny Zhu. Probase: A Probabilistic Taxonomy for Text Understanding. In ACM International Conference on Management of Data (SIGMOD), May 2012.

• [Bollacker et al 2008] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, Jamine Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In SIGMOD, 2008.

• [Auer et al 2007] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, Zachary G. Ives. DBpedia: A Nucleus for a Web of Open Data. In ISWC/ASWC, 2007.

References

• [Suchanek et al 2007] Fabian M. Suchanek, Gjergji Kasneci, Gerhard Weikum. Yago: a core of semantic knowledge. In WWW, 2007.

• [Wu et al 2015] Sen Wu, Ce Zhang, Christopher De Sa, Jaeho Shin, Feiran Wang and C. Ré. Incremental Knowledge Base Construction Using DeepDive. In VLDB, 2015.

• [Navigli et al 2012] R. Navigli and S. Ponzetto. BabelNet: The Automatic Construction, Evaluation and Application of a Wide-Coverage Multilingual Semantic Network. In Artificial Intelligence, 2012.

• [Nastase et al 2010] Vivi Nastase, Michael Strube, Benjamin Börschinger, Cäcilia Zirn and Anas Elghafari. WikiNet: A very large scale multi-lingual concept network. In LREC, 2010.

• [Speer et al 2013] Robert Speer and Catherine Havasi. ConceptNet 5: A large semantic network for relational knowledge. The People's Web Meets NLP, Springer Berlin Heidelberg, 2013.

• [Hua et al 2016] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng and Xiaofang Zhou. "Understand Short Texts by Harvesting and Analyzing Semantic Knowledge." IEEE Transactions on Knowledge and Data Engineering (TKDE), 2016.

• [Hua et al 2015] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng and Xiaofang Zhou. Short Text Understanding Through Lexical-Semantic Analysis. In International Conference on Data Engineering (ICDE), April 2015.

References

• [Li et al 2013] Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang and Xindong Wu. Computing term similarity by large probabilistic isA knowledge. In ACM International Conference on Information and Knowledge Management (CIKM), 2013.

• [Li et al 2015] Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang, Xue-Gang Hu and Xindong Wu. A Large Probabilistic Semantic Network based Approach to Compute Term Similarity. In IEEE Transactions on Knowledge and Data Engineering (TKDE), 27(10): 2604-2617, 2015.

• [Rosch et al 1976] Eleanor Rosch, Carolyn B. Mervis, Wayne D. Gray, David M. Johnson and Penny Boyes-Braem. Basic objects in natural categories. Cognitive Psychology, 8(3): 382–439, 1976.

• [Manning and Schutze 1999] Christopher D. Manning and Hinrich Schutze. Foundations of Statistical Natural Language Processing. Volume 999, MIT Press, 1999.

• [Wang et al 2015b] Zhongyuan Wang, Kejun Zhao, Haixun Wang, Xiaofeng Meng and Ji-Rong Wen. Query Understanding through Knowledge-Based Conceptualization. In IJCAI, July 2015.

• [Bergsma et al 2007] Shane Bergsma, Qin Iris Wang. Learning Noun Phrase Query Segmentation. In EMNLP-CoNLL 2007, 819-826.

• [Tan et al 2008] Bin Tan, Fuchun Peng. Unsupervised query segmentation using generative language models and Wikipedia. In WWW 2008, 347-356.

References

• [Li et al 2011] Yanen Li, Bo-June Paul Hsu, ChengXiang Zhai, Kuansan Wang. Unsupervised query segmentation using clickthrough for information retrieval. In SIGIR 2011, 285-294.

• [Guo et al 2009] Jiafeng Guo, Gu Xu, Xueqi Cheng, Hang Li. Named entity recognition in query. In SIGIR 2009, 267-274.

• [Pantel et al 2012] Patrick Pantel, Thomas Lin, Michael Gamon. Mining Entity Types from Query Logs via User Intent Modeling. In ACL 2012, 563-571.

• [Joshi et al 2014] Mandar Joshi, Uma Sawant, Soumen Chakrabarti. Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries. In EMNLP 2014, 1104-1114.

• [Sawant et al 2013] Uma Sawant, Soumen Chakrabarti. Learning joint query interpretation and response ranking. In WWW 2013, 1099-1110.

• [Wang et al 2014b] Zhongyuan Wang, Haixun Wang and Zhirui Hu. Head, Modifier, and Constraint Detection in Short Texts. In International Conference on Data Engineering (ICDE), 2014.

• [Sun et al 2016] Xiangyan Sun, Haixun Wang, Yanghua Xiao, Zhongyuan Wang. Syntactic Parsing of Web Queries. In EMNLP 2016.

References

• [Kenter and Rijke 2015] Tom Kenter and Maarten de Rijke. Short text similarity with word embeddings. In CIKM 2015.

• [Wang et al 2015a] Zhongyuan Wang, Haixun Wang, Ji-Rong Wen and Yanghua Xiao. An Inference Approach to Basic Level of Categorization. In CIKM, October 2015.

• [Hao et al 2016] Zehui Hao, Zhongyuan Wang, Xiaofeng Meng and Jun Yan. Combining Language Model with Conceptualization for Definition Ranking. MSR Technical Report, 2016.

• [Wang et al 2014a] Fang Wang, Zhongyuan Wang, Zhoujun Li and Ji-Rong Wen. Concept-based Short Text Classification and Ranking. In CIKM 2014.

• [Wang et al 2012a] Jingjing Wang, Haixun Wang, Zhongyuan Wang and Kenny Zhu. Understanding Tables on the Web. In International Conference on Conceptual Modeling, October 2012.

• [Wang et al 2012b] Yue Wang, Hongsong Li, Haixun Wang and Kenny Zhu. Toward Topic Search on the Web. In International Conference on Conceptual Modeling, October 2012.

Page 39: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

A Concept View of "Microsoft"
(figure: the instance Microsoft maps to concepts such as company, software company, international company, technology leader, and largest desktop OS vendor)

Basic-level Conceptualization (BLC)[Rosch et al 1976]

KFC

BMW

Basic-level conceptualization

How to Make BLC?
• Naive approaches:
  • Typicality: an important measure for understanding the relationship between an object and its concept
  • Pointwise Mutual Information (PMI): a common measure of the strength of association between two terms

Naive Approach 1: Typicality
• P(robin|bird) > P(penguin|bird): "robin" is a more typical bird than "penguin"
• P(USA|country) > P(Seychelles|country): "USA" is a more typical country than "Seychelles"

Using Typicality for BLC
• Associate each isA relationship (e isA c) with typicality scores P(e|c) and P(c|e):
  P(e|c) = n(c, e) / n(c)        P(c|e) = n(c, e) / n(e)
• P(e|c) indicates how typical (or popular) e is in the given concept c
• P(c|e) indicates how typical (or popular) the concept c is given e
• However, for Microsoft:
  Microsoft → company: high typicality P(c|e)
  Microsoft → largest desktop OS vendor: high typicality P(e|c)
  so neither typicality score alone identifies the basic-level concept

Naive Approach 2: PMI [Manning and Schutze 1999]
• Pointwise mutual information (PMI) is a measure of association used in information theory and statistics
• Consider using the PMI between concept c and instance e to find the basic-level concepts as follows:
  PMI(e, c) = log [ P(e, c) / (P(e) P(c)) ] = log P(e|c) − log P(e)
• However:
  • In basic level of categorization we are interested in finding a concept for a given e, which means P(e) is a constant
  • Thus ranking by PMI(e, c) is the same as ranking by P(e|c)

Using Rep(e, c) for BLC [Wang et al 2015b]
• The measure Rep(e, c) = P(c|e) · P(e|c) means:
  • Given e, c should be its typical concept (shortest distance)
  • Given c, e should be its typical instance (shortest distance)
• (Relation to PMI) Taking the logarithm of the scoring function:
  log Rep(e, c) = log [ P(c|e) · P(e|c) ]
               = log [ P(e, c)/P(e) · P(e, c)/P(c) ]
               = log [ P(e, c)² / (P(e) P(c)) ]
               = PMI(e, c) + log P(e, c)
               = PMI²(e, c)
• (Relation to Commute Time) The expected commute time between an instance e and a concept c is
  Time(e, c) = Σ_{k=1..∞} 2k · P_k(e, c)
             = Σ_{k=1..T} 2k · P_k(e, c) + Σ_{k=T+1..∞} 2k · P_k(e, c)
             ≥ Σ_{k=1..T} 2k · P_k(e, c) + 2(T+1) · (1 − Σ_{k=1..T} P_k(e, c))
             = 4 − 2 · Rep(e, c)     (taking T = 1, where P_1(e, c) = Rep(e, c))
• So ranking by Rep(e, c) is a process of finding concept nodes having the shortest expected distance to e
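A minimal sketch of how such scores could be computed from isA co-occurrence counts n(c, e). The counts and names below are invented for illustration; they are not taken from Probase or from [Wang et al 2015b].

```python
from collections import defaultdict
from math import log

# Hypothetical isA observation counts n(c, e); a real system would use a
# large taxonomy such as Probase.
counts = {
    ("company", "microsoft"): 5000,
    ("software company", "microsoft"): 3000,
    ("largest desktop os vendor", "microsoft"): 40,
    ("company", "apple"): 7000,
    ("fruit", "apple"): 6000,
}

n_c = defaultdict(int)   # n(c): total observations of concept c
n_e = defaultdict(int)   # n(e): total observations of instance e
total = sum(counts.values())
for (c, e), n in counts.items():
    n_c[c] += n
    n_e[e] += n

def p_e_given_c(e, c):
    return counts.get((c, e), 0) / n_c[c]

def p_c_given_e(e, c):
    return counts.get((c, e), 0) / n_e[e]

def pmi(e, c):
    # PMI(e, c) = log P(e, c) / (P(e) P(c)); rankings for a fixed e follow P(e|c).
    p_ec = counts.get((c, e), 0) / total
    return log(p_ec / ((n_e[e] / total) * (n_c[c] / total)))

def rep(e, c):
    # Rep(e, c) = P(c|e) * P(e|c): high only if the concept is typical for the
    # instance AND the instance is typical for the concept.
    return p_c_given_e(e, c) * p_e_given_c(e, c)

# Rank candidate concepts of "microsoft" by Rep to approximate its basic-level concept.
candidates = [c for (c, e) in counts if e == "microsoft"]
print(sorted(candidates, key=lambda c: rep("microsoft", c), reverse=True))
```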

Evaluations on Different Measures for BLC (Precision/NDCG at k = 1, 2, 3, 5, 10, 15, 20)

No smoothing         @1     @2     @3     @5     @10    @15    @20
MI(e)                0.769  0.692  0.705  0.685  0.719  0.705  0.690
PMI3(e)              0.885  0.769  0.756  0.800  0.754  0.733  0.721
NPMI(e)              0.692  0.692  0.667  0.638  0.627  0.610  0.610
Typicality P(c|e)    0.462  0.577  0.603  0.577  0.569  0.564  0.556
Typicality P(e|c)    0.500  0.462  0.526  0.523  0.523  0.510  0.521
Rep(e)               0.846  0.865  0.872  0.862  0.758  0.731  0.719

Smoothing=0.001
MI(e)                0.577  0.615  0.628  0.600  0.612  0.605  0.592
PMI3(e)              0.731  0.673  0.692  0.654  0.669  0.644  0.623
NPMI(e)              0.923  0.827  0.769  0.746  0.731  0.695  0.671
Typicality P(c|e)    0.462  0.577  0.603  0.577  0.569  0.564  0.554
Typicality P(e|c)    0.885  0.865  0.872  0.831  0.785  0.741  0.704
Rep(e)               0.846  0.731  0.718  0.723  0.700  0.669  0.638

Smoothing=0.0001
MI(e)                0.615  0.615  0.654  0.608  0.635  0.628  0.612
PMI3(e)              0.846  0.731  0.731  0.715  0.723  0.685  0.677
NPMI(e)              0.885  0.904  0.885  0.869  0.823  0.777  0.752
Typicality P(c|e)    0.462  0.577  0.603  0.577  0.569  0.564  0.556
Typicality P(e|c)    0.885  0.904  0.910  0.877  0.831  0.813  0.777
Rep(e)               0.923  0.846  0.833  0.815  0.781  0.736  0.719

Smoothing=1e-5
MI(e)                0.615  0.635  0.667  0.662  0.677  0.656  0.646
PMI3(e)              0.885  0.769  0.744  0.777  0.758  0.731  0.710
NPMI(e)              0.885  0.846  0.872  0.869  0.831  0.810  0.787
Typicality P(c|e)    0.462  0.577  0.603  0.577  0.569  0.564  0.556
Typicality P(e|c)    0.769  0.808  0.846  0.823  0.808  0.782  0.765
Rep(e)               0.885  0.904  0.872  0.862  0.812  0.800  0.767

Smoothing=1e-6
MI(e)                0.769  0.673  0.705  0.677  0.700  0.692  0.679
PMI3(e)              0.885  0.769  0.756  0.785  0.773  0.726  0.723
NPMI(e)              0.885  0.846  0.821  0.815  0.750  0.726  0.719
Typicality P(c|e)    0.462  0.577  0.603  0.577  0.569  0.564  0.556
Typicality P(e|c)    0.538  0.615  0.615  0.615  0.608  0.613  0.615
Rep(e)               0.846  0.885  0.897  0.877  0.788  0.777  0.765

Smoothing=1e-7
MI(e)                0.769  0.692  0.705  0.685  0.719  0.703  0.688
PMI3(e)              0.885  0.769  0.756  0.792  0.758  0.736  0.725
NPMI(e)              0.769  0.750  0.718  0.700  0.650  0.641  0.633
Typicality P(c|e)    0.462  0.577  0.603  0.577  0.569  0.564  0.556
Typicality P(e|c)    0.500  0.481  0.526  0.523  0.531  0.523  0.523
Rep(e)               0.846  0.865  0.872  0.854  0.765  0.749  0.733

No smoothing         @1     @2     @3     @5     @10    @15    @20
MI(e)                0.516  0.531  0.519  0.531  0.562  0.574  0.594
PMI3(e)              0.725  0.664  0.652  0.660  0.628  0.631  0.646
NPMI(e)              0.599  0.597  0.579  0.554  0.540  0.539  0.549
Typicality P(c|e)    0.297  0.380  0.409  0.422  0.438  0.446  0.461
Typicality P(e|c)    0.401  0.386  0.396  0.398  0.401  0.410  0.428
Rep(e)               0.758  0.771  0.745  0.723  0.656  0.647  0.661

Smoothing=1e-3
MI(e)                0.374  0.414  0.441  0.448  0.473  0.481  0.495
PMI3(e)              0.484  0.511  0.509  0.502  0.519  0.525  0.533
NPMI(e)              0.692  0.652  0.607  0.603  0.585  0.585  0.592
Typicality P(c|e)    0.297  0.380  0.409  0.422  0.438  0.446  0.460
Typicality P(e|c)    0.703  0.697  0.704  0.681  0.637  0.628  0.626
Rep(e)               0.621  0.580  0.554  0.561  0.554  0.555  0.559

Smoothing=1e-4
MI(e)                0.407  0.430  0.458  0.462  0.492  0.503  0.512
PMI3(e)              0.648  0.604  0.579  0.575  0.578  0.576  0.590
NPMI(e)              0.747  0.777  0.761  0.737  0.700  0.685  0.688
Typicality P(c|e)    0.297  0.380  0.409  0.422  0.438  0.446  0.461
Typicality P(e|c)    0.791  0.795  0.802  0.767  0.738  0.729  0.724
Rep(e)               0.758  0.714  0.711  0.689  0.653  0.636  0.653

Smoothing=1e-5
MI(e)                0.429  0.465  0.478  0.501  0.517  0.528  0.545
PMI3(e)              0.725  0.647  0.642  0.642  0.627  0.624  0.638
NPMI(e)              0.813  0.779  0.778  0.765  0.730  0.723  0.729
Typicality P(c|e)    0.297  0.380  0.409  0.422  0.438  0.446  0.461
Typicality P(e|c)    0.709  0.728  0.735  0.722  0.702  0.696  0.703
Rep(e)               0.791  0.787  0.762  0.739  0.707  0.703  0.706

Smoothing=1e-6
MI(e)                0.516  0.510  0.515  0.526  0.546  0.563  0.579
PMI3(e)              0.725  0.655  0.651  0.654  0.641  0.631  0.649
NPMI(e)              0.791  0.766  0.732  0.728  0.673  0.659  0.668
Typicality P(c|e)    0.297  0.380  0.409  0.422  0.438  0.446  0.461
Typicality P(e|c)    0.495  0.516  0.520  0.508  0.512  0.521  0.540
Rep(e)               0.758  0.784  0.767  0.755  0.691  0.686  0.694

Smoothing=1e-7
MI(e)                0.516  0.531  0.519  0.530  0.562  0.571  0.592
PMI3(e)              0.725  0.664  0.652  0.658  0.630  0.631  0.647
NPMI(e)              0.670  0.655  0.633  0.604  0.575  0.570  0.581
Typicality P(c|e)    0.297  0.380  0.409  0.422  0.438  0.446  0.461
Typicality P(e|c)    0.423  0.421  0.415  0.407  0.414  0.424  0.438
Rep(e)               0.758  0.771  0.745  0.725  0.663  0.661  0.668

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

What is Semantic Similarity?
• Are the following instance pairs similar?
  • <apple, microsoft>
  • <apple, pear>
  • <apple, fruit>
  • <apple, food>
  • <apple, ipad>
  • <car, journey>

Approaches on Term Similarity
• Categories of approaches for semantic similarity:
  • String based approach
  • Knowledge based approach: use preexisting thesauri, taxonomies, or encyclopedias such as WordNet
  • Corpus based approach: use contexts of terms extracted from web pages, web search snippets, or other text repositories
  • Embedding based approach: introduced in detail in "Part 3: Implicit Understanding"

Approaches on Term Similarity (2)
• Categories (figure: a taxonomy of approaches):
  • String based approaches
  • Knowledge based approaches (WordNet): path length / lexical chain-based and information content-based methods
  • Corpus based approaches: graph learning algorithm based and snippet search based methods
  • Representative work includes Rada 1989, Resnik 1995, Jcn 1997, Hirst 1998, Lin 1998, Ban 2002, HunTray 2005, Chen 2006, Alvarez 2007, Do 2009, Agirre 2010, Bol 2011 and Sánchez 2011, with state-of-the-art approaches highlighted

Term Similarity Using Semantic Networks [Li et al 2013, Li et al 2015]
• Framework (figure: processing pipeline for a term pair <t1, t2>):
  • Step 1: Type Checking: classify the pair as a concept pair, an entity pair, or a concept-entity pair
  • Step 2: Context Representation (Vector): collect entity-distribution or concept-distribution contexts; entity contexts are concept-clustered, and for each cluster Ci(t) the top-k concepts cx are selected
  • Step 3: Context Similarity: Cosine(T(t1), T(t2)) over context vectors, or Max over cluster pairs of Cosine(Cx(t1), Cy(t2)); for a concept-entity pair, collect the concepts of the entity term t1 and take max sim(t2, cx) over those concepts

An Example [Li et al 2013, Li et al 2015]
For example, <banana, pear>:
  • Step 1: Type Checking: <banana, pear> is an entity pair
  • Step 2: Context Representation (Vector): concept context collection for each entity
  • Step 3: Context Similarity: Similarity Evaluation Cosine(T(t1), T(t2)) = 0.916
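A small illustrative sketch of Steps 2 and 3. The concept vectors below are made up for the example; they are not the actual distributions used in [Li et al 2015].

```python
from math import sqrt

# Hypothetical concept-distribution context vectors T(t) for two entity terms.
T_banana = {"fruit": 0.62, "food": 0.21, "tropical fruit": 0.12, "crop": 0.05}
T_pear   = {"fruit": 0.58, "food": 0.25, "tree": 0.10, "crop": 0.07}

def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

print(round(cosine(T_banana, T_pear), 3))  # high similarity, as expected for <banana, pear>
```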

Examples

Term 1               Term 2              Similarity
lunch                dinner              0.9987
tiger                jaguar              0.9792
car                  plane               0.9711
television           radio               0.9465
technology company   microsoft           0.8208
high impact sport    competitive sport   0.8155
employer             large corporation   0.5353
fruit                green pepper        0.2949
travel               meal                0.0426
music                lunch               0.0116
alcoholic beverage   sports equipment    0.0314
company              table tennis        0.0003

Complete results: http://adapt.seiee.sjtu.edu.cn/similarity/SimCompleteResults.pdf

Statistics of Search Queries
(a) By traffic: 1 term 44%, 2 terms 29%, 3 terms 17%, 4 terms 7%, 5 terms 2%, more than 5 terms 1%
(b) By # of distinct queries: 1 term 10%, 2 terms 26%, 3 terms 34%, 4 terms 19%, 5 terms 7%, more than 5 terms 4%
(figure: analogous distributions of the number of instances per query, with examples such as "Pokémon Go" and "Microsoft HoloLens")

If the short text has context for the instance…
• python tutorial
• dangerous python
• moon earth distance
• …

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Supervised Segmentation [Bergsma et al 2007]
• Problem: divide a query into semantic units
• Approach: turn segmentation into position-based binary classification
  Example query: "two man power saw"
  Candidate segmentations: [two man] [power saw]; [two] [man] [power saw]; [two] [man power] [saw]
  Input: a query and its positions
  Output: the decision for making a segmentation break at each position

Supervised Segmentation
• Features:
  • Decision boundary features: e.g. indicator and POS-tag features for the words around the boundary, and position features (forward/backward)
  • Statistical features: e.g. mutual information between the left and right parts ("bank loan | amortization schedule")
  • Context features: surrounding context information, e.g. "female" next to "bus driver"
  • Dependency features: e.g. whether "female" depends on "bus driver"

Supervised Segmentation
• Segmentation overview:
  Input query: "two man power saw"
  An SVM classifier, using the learned features, makes a decision at each position between adjacent words
  Output: segmentation decision for each position (yes/no)
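A hedged sketch of this position-classification setup. The features and the two training queries below are toy stand-ins, not the feature set of [Bergsma et al 2007].

```python
from sklearn.svm import LinearSVC
from sklearn.feature_extraction import DictVectorizer

def position_features(query, i):
    """Features describing the boundary between query[i] and query[i+1]."""
    left, right = query[i], query[i + 1]
    return {
        "left=" + left: 1,
        "right=" + right: 1,
        "pair=" + left + "_" + right: 1,
        "pos": i,  # forward position of the boundary
    }

# Toy training data: (tokenized query, break decision at each boundary).
train = [
    (["two", "man", "power", "saw"], [0, 1, 0]),                 # [two man] [power saw]
    (["bank", "loan", "amortization", "schedule"], [0, 1, 0]),   # [bank loan] [amortization schedule]
]

X, y = [], []
for query, breaks in train:
    for i, label in enumerate(breaks):
        X.append(position_features(query, i))
        y.append(label)

vec = DictVectorizer()
clf = LinearSVC()
clf.fit(vec.fit_transform(X), y)

# Predict break / no-break at each boundary of a new query.
q = ["two", "man", "power", "saw"]
feats = vec.transform([position_features(q, i) for i in range(len(q) - 1)])
print(clf.predict(feats))
```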

Unsupervised Segmentation [Tan et al 2008]
• Unsupervised learning for query segmentation
• Probability of a generated segmentation S (with segments s_i) for query Q:
  P(S|Q) = P(s1) P(s2|s1) … P(sm|s1 s2 … s(m-1)) ≈ Π_{si ∈ S} P(si)     (unigram model)
• A split point is a valid segment boundary if and only if the pointwise mutual information between the two segments resulting from the split is negative:
  MI(sk, sk+1) = log [ Pc([sk sk+1]) / (Pc(sk) · Pc(sk+1)) ] < 0
• Example: "new york times subscription"
  log [ Pc([new york]) / (Pc(new) · Pc(york)) ] > 0, so there is no segment boundary between "new" and "york"
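A tiny illustration of the MI test. The segment probabilities Pc(.) below are invented; a real system would estimate them from a web corpus or concept list.

```python
from math import log

# Hypothetical segment probabilities Pc(.).
Pc = {
    "new": 0.0008, "york": 0.0005, "new york": 0.0004,
    "times": 0.0006, "new york times": 0.0002, "subscription": 0.0003,
}

def mi(left, right):
    # Pointwise mutual information between two adjacent candidate segments.
    joint = left + " " + right
    return log(Pc.get(joint, 1e-12) / (Pc[left] * Pc[right]))

# Positive MI: keep as one segment; negative MI: valid segment boundary.
print(mi("new", "york"))                     # > 0, no boundary inside "new york"
print(mi("new york times", "subscription"))  # < 0, boundary before "subscription"
```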

Unsupervised Segmentation
• Find the top-k segmentations by dynamic programming
• Use EM optimization on the fly
  Input: query w1 w2 … wn (the words in the query) and the concept probability distribution
  Output: top-k segmentations with the highest likelihood

Exploit Click-through [Li et al 2011]
• Motivation:
  • Probabilistic query segmentation
  • Use click-through data: query Q → clicked URL → document D
Input query: "bank of america online banking"
Output: top-3 segmentations
  [bank of america] [online banking]        0.502
  [bank of america online banking]          0.428
  [bank of] [america] [online banking]      0.001

Exploit Click-through
• Segmentation Model: an interpolated model combining global information with click-through information
  Query: [credit card] [bank of America]
  Clicked HTML documents:
    1. bank of america credit cards contact us overview
    2. secured visa credit card from bank of america
    3. credit cards overview: find the right bank of america credit card for you

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Sense Changes with Different Context

watch harry potter read harry potter age harry potter

Movie Book Character

harry potter walkthrough

Game

Entity Recognition in Query [Guo et al 2009]
• Motivation: detect the named entity in a short text and categorize it
  Example: the single-named-entity query "harry potter walkthrough" is represented as the triple <e, t, c> = ("harry potter", "walkthrough", "game"), where e is the (ambiguous) entity term, t the context term, and c the class of the entity

Entity Recognition in Query
• Probabilistic Generative Model
  • Goal: given a query q, find the triple <e, t, c> that maximizes the probability of generating the triple
  • Assume the context only depends on the class, e.g. "walkthrough" depends only on the class "game" rather than on "harry potter"
  • Objective: given query q, find the triple maximizing Pr(e) · Pr(c|e) · Pr(t|c)
  • The problem then becomes how to estimate Pr(e), Pr(c|e) and Pr(t|c); a toy scoring sketch follows below
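A toy sketch of scoring candidate triples <e, t, c> with the factorization above. All probabilities below are invented for illustration; real estimates would come from query logs and a knowledge base.

```python
# Hypothetical model parameters.
P_e = {"harry potter": 0.7, "harry": 0.3}
P_c_given_e = {("game", "harry potter"): 0.2, ("movie", "harry potter"): 0.5,
               ("book", "harry potter"): 0.3, ("person", "harry"): 1.0}
P_t_given_c = {("walkthrough", "game"): 0.4, ("walkthrough", "movie"): 0.001,
               ("walkthrough", "book"): 0.002, ("potter walkthrough", "person"): 1e-6}

def score(e, t, c):
    # Pr(e) * Pr(c|e) * Pr(t|c), with missing entries treated as zero.
    return (P_e.get(e, 0.0)
            * P_c_given_e.get((c, e), 0.0)
            * P_t_given_c.get((t, c), 0.0))

# Candidate interpretations of the query "harry potter walkthrough".
candidates = [("harry potter", "walkthrough", c) for c in ("game", "movie", "book")]
candidates.append(("harry", "potter walkthrough", "person"))
best = max(candidates, key=lambda x: score(*x))
print(best)   # ('harry potter', 'walkthrough', 'game')
```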

Entity Recognition in Query
• Probability Estimation by Learning
  • Learning objective: max Π_{i=1..N} P(e_i, t_i, c_i)
  • Challenge: it is difficult as well as time-consuming to manually assign class labels to named entities in queries
  • Build a training set T = {(e_i, t_i)} and view c_i as a hidden variable
  • New learning problem: max Π_{i=1..N} P(e_i, t_i) = max Π_{i=1..N} Σ_c P(e_i) P(c|e_i) P(t_i|c)
  • Solved with the topic model WS-LDA

Signal from Click [Pantel et al 2012]
• Motivation: predict entity types in Web search, modeling the entity, the user intent, the context, and the click
• Query type distribution over 73 types
• A generative model over entities and entity types (figure)

Signal from Click
• Joint Model for Prediction (figure: plate diagram with variables t, τ, i, n, c and parameters θ, φ, ω over Q queries)
  For each query, the model picks a type, an entity, an intent, a click, and context words, governed by a distribution over types, an intent distribution, a word distribution, a host distribution, and an entity distribution

Telegraphic Query Interpretation [Sawant et al 2013, Joshi et al 2014]
• Entity-seeking telegraphic queries, answered from a knowledge base (for accuracy) together with a large corpus (for recall)
• Interpretation = Segmentation + Annotation
  Example: query "Germany capital" → result entity "Berlin"

Joint Interpretation and Ranking [Sawant et al 2013, Joshi et al 2014]
• Overview (figure): a telegraphic query is interpreted against an annotated corpus; two models for interpretation and ranking (a generative model and a discriminative model) produce a ranked list of answer entities e1, e2, e3 as output

Joint Interpretation and Ranking [Sawant et al 2013]
• Generative Model, based on probabilistic language models (figure borrowed from U. Sawant, 2013):
  For the query q = "losing team baseball world series 1998", the candidate answer entity E (San Diego Padres), of type T (major league baseball team), generates the query words through a switch variable Z that chooses between a type model (matching the type hint "baseball team") and a context model (context matchers link selectors such as "lost 1998 world series" to corpus snippets like "Padres have been to two World Series, losing in 1984 and 1998")

Joint Interpretation and Ranking [Sawant et al 2013]
• Discriminative Model, based on max-margin discriminative learning (figure):
  For the same query "losing team baseball world series 1998", feature vectors are built for a correct entity (San_Diego_Padres, target type t = baseball team) and for an incorrect entity (1998_World_Series, t = series), and the model learns to rank the correct entity higher

Telegraphic Query Interpretation [Joshi et al 2014]
• Queries seek answer entities (e2)
• Queries contain (query) entities (e1), target types (t2), relations (r), and selectors (s)
(table, borrowed from M. Joshi (2014): example interpretations; for "dave navarro first band", e1 = dave navarro, r = band, t2 = band, s = first, or alternatively e1 = dave navarro, r = -, t2 = band, s = first; for "spider automobile company", either e1 = spider with t2 = automobile company, or t2 = company with "spider" as a selector)

Improved Generative Model
• The generative model of [Sawant et al 2013] is extended in [Joshi et al 2014] to also consider e1 (in q) and the relation r

Improved Discriminative Model
• The discriminative model of [Sawant et al 2013] is likewise extended in [Joshi et al 2014] to consider e1 (in q) and r

Understand Short Texts with a Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]
• Input: a short text
• Output: semantic interpretation
• Three steps in understanding a short text, e.g. "wanna watch eagles band":
  • Step 1: Text Segmentation – divide the text into a sequence of terms in the vocabulary: wanna | watch | eagles | band
  • Step 2: Type Detection – determine the best type of each term: watch[verb] eagles[entity] band[concept]
  • Step 3: Concept Labeling – infer the best concept of each entity within context: watch[verb] eagles[entity](band) band[concept]

Text Segmentation
• Observations:
  • Mutual Exclusion – terms containing the same word mutually exclude each other
  • Mutual Reinforcement – related terms mutually reinforce each other
• Build a Candidate Term Graph (CTG), illustrated below

(figure: example CTGs for "vacation april in paris" and "watch harry potter"; candidate terms such as "vacation", "april", "paris", "april in paris" and "watch", "harry potter" are nodes, mutually exclusive terms share no edge, and reinforcement edges carry weights such as 0.029, 0.005, 0.047, 0.041 and 0.014, 0.092, 0.053, 0.018)

Find the best segmentation
• Best segmentation = the sub-graph of the CTG which:
  • Is a complete graph (clique)
  • Contains no mutual exclusion
  • Has 100% word coverage (except for stopwords)
  • Has the largest average edge weight
• That is, the best segmentation is a maximal clique of the CTG with the largest average edge weight; a brute-force sketch of this search follows below
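A rough sketch of the clique search over a toy candidate term graph. The edge weights are invented, and the brute-force enumeration is only reasonable because short texts yield very small graphs; it is not the algorithm of [Hua et al 2015].

```python
from itertools import combinations

# Toy CTG for "vacation april in paris": nodes are candidate terms, edges carry
# reinforcement weights; terms sharing a word have no edge (mutual exclusion).
words = {"vacation": {"vacation"}, "april": {"april"}, "paris": {"paris"},
         "april in paris": {"april", "in", "paris"}}
weights = {("vacation", "april"): 0.029, ("vacation", "paris"): 0.005,
           ("vacation", "april in paris"): 0.047, ("april", "paris"): 0.041}

def w(a, b):
    return weights.get((a, b)) or weights.get((b, a))

def is_clique(terms):
    return all(w(a, b) is not None for a, b in combinations(terms, 2))

def covers_all(terms, text_words):
    return set().union(*(words[t] for t in terms)) >= text_words

def best_segmentation(text_words):
    best, best_score = None, -1.0
    for k in range(1, len(words) + 1):
        for terms in combinations(words, k):
            if is_clique(terms) and covers_all(terms, text_words):
                edges = list(combinations(terms, 2))
                score = sum(w(a, b) for a, b in edges) / max(len(edges), 1)
                if score > best_score:
                    best, best_score = terms, score
    return best

# "in" is treated as a stopword and excluded from the coverage requirement.
print(best_segmentation({"vacation", "april", "paris"}))  # ('vacation', 'april in paris')
```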

Type Detection
• Pairwise Model:
  • Find the best typed-term for each term so that the Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight
  (figure: for "watch free movie", candidate typed-terms are watch[v], watch[e], watch[c]; free[adj], free[v]; movie[c], movie[e])

Concept Labeling
• Entity disambiguation is the most important task of concept labeling:
  • Filter / re-rank the original concept cluster vector
• Weighted Vote:
  • The final score of each concept cluster is a combination of its original score and the support from context, using concept co-occurrence
  Example: "watch harry potter" → movie; "read harry potter" → book

Example of Entity Disambiguation [Hua et al 2015 (ICDE Best Paper), Hua et al 2016]
(figure: a short text is parsed and conceptualized against a semantic (isA) network and a co-occurrence network; the pipeline performs term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, and concept orthogonalization, producing a concept vector (c1, p1), (c2, p2), (c3, p3), …)
Example: for "ipad apple", the isA concepts of "apple" (fruit, company, food, product, …) are filtered by co-occurrence with the isA concepts of "ipad" (product, device, …), leaving readings such as product, brand, company and device

Mining Lexical Relationships [Wang et al 2015b]
• Lexical knowledge is represented by probabilities, where e is an instance, t a term, c a concept, and z a role:
  • role probabilities, e.g. p(verb|watch), p(instance|watch), i.e. p(z|t)
  • concept probabilities, e.g. p(movie|harry potter), p(book|harry potter), i.e. p(c|t, z); in particular p(c|e) = p(c|t, z = instance)
  • context preferences, e.g. p(movie|watch, verb), the concepts a verb such as "watch" prefers for its object
  Example: "watch harry potter", with candidate concepts product, book, movie
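A toy illustration of combining these probabilities to conceptualize a term in context. The numbers are invented, and the simple product combination is an assumption for illustration, not the exact model of [Wang et al 2015b].

```python
# Hypothetical lexical knowledge.
p_role_given_term = {("verb", "watch"): 0.8, ("instance", "watch"): 0.2}
p_concept_given_term_role = {
    ("movie", "harry potter", "instance"): 0.5,
    ("book", "harry potter", "instance"): 0.4,
    ("character", "harry potter", "instance"): 0.1,
}
# Context preference: concepts the verb "watch" likes as its object.
p_concept_given_verb = {("movie", "watch"): 0.6, ("tv show", "watch"): 0.3,
                        ("book", "watch"): 0.01}

def conceptualize(entity, context_verb):
    scores = {}
    for (c, t, z), p in p_concept_given_term_role.items():
        if t == entity and z == "instance":
            # Combine the entity's concept distribution with the verb's preference.
            scores[c] = p * p_concept_given_verb.get((c, context_verb), 1e-3)
    return max(scores, key=scores.get)

print(conceptualize("harry potter", "watch"))  # movie
```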

Understanding Queries [Wang et al 2015b]
• Goal: rank the concepts and find argmax_c p(c|t, q)
• All possible segmentations of the query induce an online subgraph of the offline semantic network; concepts are then ranked by random walk with restart [Sun et al 2005] on the online subgraph
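A generic random-walk-with-restart sketch on a small made-up term/concept subgraph. The graph, the restart probability and the node set are illustrative only; the actual online subgraph comes from the offline semantic network of [Wang et al 2015b].

```python
import numpy as np

# Toy online subgraph: nodes are query terms and candidate concepts.
nodes = ["watch", "harry potter", "movie", "book", "verb"]
A = np.array([
    [0, 1, 1, 0, 1],   # watch
    [1, 0, 1, 1, 0],   # harry potter
    [1, 1, 0, 0, 0],   # movie
    [0, 1, 0, 0, 0],   # book
    [1, 0, 0, 0, 0],   # verb
], dtype=float)
P = A / A.sum(axis=1, keepdims=True)          # row-stochastic transition matrix

def rwr(seeds, restart=0.3, iters=100):
    # Restart distribution concentrated on the query terms.
    r = np.zeros(len(nodes))
    r[seeds] = 1.0 / len(seeds)
    x = r.copy()
    for _ in range(iters):
        x = (1 - restart) * P.T @ x + restart * r
    return x

scores = rwr([nodes.index("watch"), nodes.index("harry potter")])
print(sorted(zip(nodes, scores), key=lambda kv: -kv[1]))
# Concepts supported by both terms (here "movie") rank above weakly connected ones.
```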

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head, Modifier and Constraint Detection in Short Texts [Wang et al 2014b]
• Example: "popular smart cover iphone 5s"
• Definitions:
  • Head: acts to name the general (semantic) category to which the whole short text belongs; usually the head is the intent of the short text
    • "smart cover": the intent of the query
  • Constraints: distinguish this member from other members of the same category
    • "iphone 5s": limits the type of the head
  • Non-Constraint Modifiers (a.k.a. Pure Modifiers): subjective modifiers which can be dropped without changing the intent
    • "popular": subjective, can be neglected

Non-Constraint Modifiers Mining: Construct Modifier Networks
• Edges form a Modifier Network
(figure: a concept hierarchy tree in the "country" domain, with nodes such as country, Asian country, developed country, Western country, Western developed country, large Asian country, large developed country, top developed country, and top western country, together with the corresponding modifier network over {Asian, Developed, Western, Large, Top}; in this case "Large" and "Top" are pure modifiers)

Non-Constraint Modifiers Mining: Betweenness Centrality
• Betweenness centrality is a measure of a node's centrality in a network
• The betweenness of node v is defined as
  g(v) = Σ_{s ≠ v ≠ t} σ_st(v) / σ_st
  where σ_st is the total number of shortest paths from node s to node t, and σ_st(v) is the number of those paths that pass through v
• Normalization & aggregation: a pure modifier should have a low aggregated betweenness-centrality score PMS(t)
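A quick sketch using networkx on a toy modifier network; the edges below are illustrative, not the network mined in [Wang et al 2014b].

```python
import networkx as nx

# Toy modifier network in the "country" domain.
G = nx.Graph()
G.add_edges_from([
    ("Asian", "Developed"), ("Developed", "Western"), ("Asian", "Western"),
    ("Large", "Asian"), ("Large", "Developed"),
    ("Top", "Western"), ("Top", "Developed"),
])

# Normalized betweenness centrality per modifier.
bc = nx.betweenness_centrality(G, normalized=True)

# Modifiers with low betweenness (here "Large" and "Top") behave like pure
# modifiers; central modifiers (e.g. "Developed") act more like constraints.
print(sorted(bc.items(), key=lambda kv: kv[1]))
```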

Head-Constraints Mining [Wang et al 2014b]
• A term can be a head in some queries and a constraint in others
  • E.g. in "Seattle hotel", "hotel" is the head and "Seattle" a constraint; in "Seattle hotel job", "job" is the head and "Seattle" and "hotel" are constraints

Head-Constraints Mining: Acquiring Concept Patterns
• Building the concept pattern dictionary from query logs (e.g. "cover for iphone 6s", "battery for sony a7r", "wicked on broadway"):
  • Extract preposition patterns: A for B, A of B, A with B, A in B, A on B, A at B, …
  • Get entity pairs from the query log, where entity1 is the head and entity2 the constraint
  • Conceptualization: entity1 → concept11, concept12, concept13, concept14, …; entity2 → concept21, concept22, concept23, …
  • Emit concept patterns for each pair, e.g. (concept11, concept21), (concept11, concept22), (concept11, concept23), …, into the Concept Pattern Dictionary

Why Concepts Can't Be Too General
• It may cause too many concept pattern conflicts: we can't distinguish head and modifier for general concept pairs
  Derived concept pattern (head = device, modifier = company), supporting entity pairs: (iphone 4, verizon), (modem, comcast), (wireless router, comcast), (iphone 4, tmobile)
  Derived concept pattern (head = company, modifier = device), supporting entity pairs: (amazon books, kindle), (netflix, touchpad), (skype, windows phone), (netflix, ps3)
  → Conflict

Why Concepts Can't Be Too Specific
• It may generate concepts with little coverage:
  • The concept regresses to the entity
  • Large storage space: up to (million × million) patterns
  Examples: (device, largest desktop OS vendor), (device, largest software development company), (device, largest global corporation), (device, latest windows and office provider), …
• Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept PatternsCluster size Sum of Cluster Score headconstraintscore

615 2114691 breedstate357298460224501

296 7752357 gameplatform627403476771856

153 3466804 accessoryvehicle53393705094809

70 118259 browserplatform132612807637391

22 1010993 requirementschool271407526294823

34 9489159 drugdisease154602405333541

42 8992995 cosmeticskin condition814659415003929

16 7421599 jobcity27903732555528

32 710403 accessoryphone246513830851194

18 6692376 softwareplatform210126322725878

20 6444603 testdisease239774028397537

27 5994205 clothesbreed98773996282851

19 5913545 penaltycrime200544192793488

25 5848804 taxstate240081818612579

16 5465424 saucemeat183592863621553

18 4809389 credit cardcountry142919087972152

14 4730792 foodholiday14554140330924

11 4536199 modgame257163856882439

29 4350954 garmentsport471533326845442

23 3994886 career informationprofessional732726483731257

15 386065 songinstrument128189481818135

18 378213 baitfish780426514113169

22 3722948 study guidebook508339765053921

19 3408953 pluginsbrowser550326072627126

14 3305753 recipemeat882779863422951

18 3214226 currencycountry110825444188352

13 3180272 lenscamera186081673263957

9 316973 decorationholiday130055844126533

16 314875 foodanimal7338544366514

Example: the concept pattern game (head) / platform (modifier), with related pairs such as game–device, video game–platform, game console–game pad and game–gaming platform, detects the head-modifier split in queries like "angry birds android", "angry birds ios", "angry birds windows 10", …

Head Modifier Relationship Detection
• Train a classifier on (head-embedding, modifier-embedding) pairs
• Training data: positive = (head, modifier); negative = (modifier, head)
• Precision ≥ 0.9, Recall ≥ 0.9
• Disadvantage: not interpretable
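A hedged sketch of such a classifier. Random vectors stand in for real word embeddings, and the pairs are toy examples; the real system would use embeddings trained on a large corpus and many more labeled pairs.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
vocab = ["hotel", "seattle", "job", "game", "android", "cover", "iphone"]
emb = {w: rng.normal(size=50) for w in vocab}   # stand-in embeddings

# Known (head, modifier) pairs; reversing them yields negative examples.
positives = [("hotel", "seattle"), ("job", "hotel"),
             ("game", "android"), ("cover", "iphone")]

X, y = [], []
for head, mod in positives:
    X.append(np.concatenate([emb[head], emb[mod]])); y.append(1)
    X.append(np.concatenate([emb[mod], emb[head]])); y.append(0)

clf = LogisticRegression(max_iter=1000).fit(np.array(X), y)

def is_head_modifier(a, b):
    """True if the classifier believes a is the head and b the modifier."""
    return clf.predict([np.concatenate([emb[a], emb[b]])])[0] == 1

print(is_head_modifier("hotel", "seattle"))
```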

Syntactic Parsing based on HM
• Information is incomplete:
  • Prepositions and other function words
  • Structure within a noun compound, e.g. "el capitan macbook pro"
• Why not train a parser for web queries?

Syntactic Parsing of Short Texts [Sun et al 2016]
• Syntactic structures are valuable for short text understanding (examples in the slides)

Challenges: Short Texts Lack Grammatical Signals
• Lack of function words and word order:
  • "toys queries" has ambiguous intent
  • "distance earth moon" has clear intent, but many equivalent forms: "earth moon distance", "earth distance moon", …

Challenges: Syntactic Parsing of Queries
• No standard
• No ground truth
• Why is syntactic parsing of queries even a legitimate problem?

Derive Syntax from Semantics [Sun et al 2016]
• Query: "thai food houston"
• Parse the clicked sentence and project its dependencies onto the query

A Treebank for Short Texts
• Given a query q and q's clicked sentences s:
  • Parse each s
  • Project the dependencies from s to q
  • Aggregate the projected dependencies (a sketch of the projection step follows below)

Algorithm of Projection
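A simplified sketch of the projection idea. The token-matching rule and the clicked-sentence parse below are toy stand-ins; the actual algorithm in [Sun et al 2016] is more involved, and real parses would come from a standard dependency parser.

```python
# Toy dependency parse of a clicked sentence, as (dependent, head, label) triples.
sentence = ["the", "best", "thai", "food", "in", "houston"]
sent_deps = [
    ("the", "food", "det"),
    ("best", "food", "amod"),
    ("thai", "food", "amod"),
    ("in", "houston", "case"),
    ("houston", "food", "nmod"),
]

query = ["thai", "food", "houston"]

def project(query, sent_deps):
    """Keep a dependency arc only if both its words also appear in the query."""
    qset = set(query)
    return [(dep, head, label) for dep, head, label in sent_deps
            if dep in qset and head in qset]

print(project(query, sent_deps))
# [('thai', 'food', 'amod'), ('houston', 'food', 'nmod')]
```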

Result Examples

Results
• Random queries: QueryParser UAS 0.83, LAS 0.75; Stanford UAS 0.72, LAS 0.64
• Queries with no function words: QueryParser UAS 0.82, LAS 0.73; Stanford UAS 0.70, LAS 0.61
• Queries with function words: QueryParser UAS 0.90, LAS 0.85; Stanford UAS 0.86, LAS 0.80

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding [Kenter and Rijke 2015]
• Measuring similarity between two short texts or sentences
• Basic idea: word-by-word comparison using embedding vectors
• Use a saliency-weighted semantic graph to compute similarity
• Saliency-weighted semantic similarity, inspired by BM25:
  f_sts(s_l, s_s) = Σ_{w ∈ s_l} IDF(w) · [ sem(w, s_s) · (k1 + 1) ] / [ sem(w, s_s) + k1 · (1 − b + b · |s_s| / avgl) ]
  where s_l and s_s are the two short texts, w ranges over the terms of s_l, and sem(w, s_s) is the semantic similarity of w to s_s
• Features acquired: bins of all edges, bins of max edges
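A rough sketch of the saliency-weighted formula above. The word vectors and IDF values are toy inputs, and the k1, b and avgl settings are assumptions following common BM25-style defaults rather than values from the paper.

```python
import numpy as np

emb = {  # toy word vectors
    "cheap": np.array([0.9, 0.1]), "flight": np.array([0.1, 0.9]),
    "low":   np.array([0.8, 0.2]), "cost":  np.array([0.7, 0.3]),
    "airfare": np.array([0.2, 0.9]),
}
idf = {"cheap": 1.2, "flight": 1.0, "low": 1.1, "cost": 1.1, "airfare": 1.3}

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def sem(w, short_text):
    # Semantic match of word w against a short text: best cosine to any of its words.
    return max(cos(emb[w], emb[x]) for x in short_text)

def f_sts(s_l, s_s, k1=1.2, b=0.75, avgl=3.0):
    score = 0.0
    for w in s_l:
        s = sem(w, s_s)
        score += idf[w] * (s * (k1 + 1)) / (s + k1 * (1 - b + b * len(s_s) / avgl))
    return score

print(f_sts(["cheap", "flight"], ["low", "cost", "airfare"]))
```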

From the Concept View [Wang et al 2015a]
(figure: two short texts are each parsed and conceptualized against a semantic network and a co-occurrence network, using term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, and concept orthogonalization; the result is a bag of concepts per text, i.e. concept vectors [(c1, score1), (c2, score2), …] and [(c1', score1'), (c2', score2'), …], between which the similarity is computed)

Outline
• Knowledge Bases
• Explicit Representation Models
• Applications

Applications
• Explicit short text understanding benefits many application scenarios:
  • Ads/search semantic match
  • Definition mining
  • Query recommendation
  • Web table understanding
  • Semantic search
  • …

Ads Keyword Selection [Wang et al 2015a]
(figures: improvement by query decile, Decile 4 through Decile 10, for Mainline Ads (scale 0.00 to 6.00) and Sidebar Ads (scale 0.00 to 0.60))

Definition Mining [Hao et al 2016]
• Definition scenarios: search engines, QnA, etc.
• Why conceptualization is useful for definition mining, e.g. "What is Emphysema?"
  Answer 1: "Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year."
  Answer 2: "Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing."
  • Both sentences have the form of a definition
  • Embedding is helpful to some extent, but it also returns a high similarity score for (emphysema, disease) and for (emphysema, smoking)
  • Conceptualization can provide strong semantics
  • Contextual embedding can also provide semantic similarity beyond isA

Definition Mining [Hao et al 2016]

Concept-based Short Text Classification and Ranking [Wang et al 2014a]
(figure: offline/online pipeline; offline, training data per class (Class 1 … Class N) is used for concept weighting and model learning, producing a concept model per class, e.g. <Music, score>; online, an original short text such as "justin bieber graduates" goes through entity extraction, conceptualization against the knowledge base into a concept vector, candidate generation, and classification & ranking)

Concept-based Short Text Classification and Ranking [Wang et al 2014a]
(figures: each category, e.g. TV, Music, Movie, is represented in a concept space by the article titles/tags in that category, with concept weights ω_i, ω_j; an incoming query is mapped into the same concept space with weights p_i, p_j and compared against the category representations)

Precision performance on each category [Wang et al 2014a]

Category   BocSTC   LM_ch   SVM    VSM_cosine   LM_d   Entity_ESA
Movie      0.71     0.91    0.84   0.81         0.72   0.56
Money      0.97     0.95    0.54   0.57         0.52   0.74
Music      0.97     0.90    0.88   0.73         0.68   0.58
TV         0.96     0.46    0.92   0.56         0.51   0.55

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

• [Stark et al 1998] Michael M. Stark and Richard F. Riesenfeld. WordNet: An Electronic Lexical Database. In Proceedings of the 11th Eurographics Workshop on Rendering, 1998.
• [Banko et al 2007] Michele Banko, Michael J. Cafarella, Stephen Soderland, Matt Broadhead and Oren Etzioni. Open Information Extraction from the Web. In IJCAI 2007.
• [Etzioni et al 2011] Oren Etzioni, Anthony Fader, Janara Christensen, Stephen Soderland and Mausam. Open Information Extraction: The Second Generation. In IJCAI 2011, 3-10.
• [Carlson et al 2010] A. Carlson, J. Betteridge, B. Kisiel, B. Settles, E. R. Hruschka Jr. and T. M. Mitchell. Toward an Architecture for Never-Ending Language Learning. In AAAI 2010.
• [Wu et al 2012] Wentao Wu, Hongsong Li, Haixun Wang and Kenny Zhu. Probase: A Probabilistic Taxonomy for Text Understanding. In ACM International Conference on Management of Data (SIGMOD), May 2012.
• [Bollacker et al 2008] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge and Jamie Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In SIGMOD 2008.
• [Auer et al 2007] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak and Zachary G. Ives. DBpedia: A Nucleus for a Web of Open Data. In ISWC/ASWC 2007.
• [Suchanek et al 2007] Fabian M. Suchanek, Gjergji Kasneci and Gerhard Weikum. Yago: a core of semantic knowledge. In WWW 2007.
• [Wu et al 2015] Sen Wu, Ce Zhang, Christopher De Sa, Jaeho Shin, Feiran Wang and C. Ré. Incremental Knowledge Base Construction Using DeepDive. In VLDB 2015.
• [Navigli et al 2012] R. Navigli and S. Ponzetto. BabelNet: The Automatic Construction, Evaluation and Application of a Wide-Coverage Multilingual Semantic Network. Artificial Intelligence, 2012.
• [Nastase et al 2010] Vivi Nastase, Michael Strube, Benjamin Börschinger, Cäcilia Zirn and Anas Elghafari. WikiNet: A very large scale multi-lingual concept network. In LREC 2010.
• [Speer et al 2013] Robert Speer and Catherine Havasi. ConceptNet 5: A large semantic network for relational knowledge. In The People's Web Meets NLP, Springer Berlin Heidelberg, 2013.
• [Hua et al 2016] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng and Xiaofang Zhou. Understand Short Texts by Harvesting and Analyzing Semantic Knowledge. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2016.
• [Hua et al 2015] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng and Xiaofang Zhou. Short Text Understanding Through Lexical-Semantic Analysis. In International Conference on Data Engineering (ICDE), April 2015.
• [Li et al 2013] Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang and Xindong Wu. Computing term similarity by large probabilistic isA knowledge. In ACM International Conference on Information and Knowledge Management (CIKM), 2013.
• [Li et al 2015] Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang, Xue-Gang Hu and Xindong Wu. A Large Probabilistic Semantic Network based Approach to Compute Term Similarity. IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10): 2604-2617, 2015.
• [Rosch et al 1976] Eleanor Rosch, Carolyn B. Mervis, Wayne D. Gray, David M. Johnson and Penny Boyes-Braem. Basic objects in natural categories. Cognitive Psychology 8(3): 382-439, 1976.
• [Manning and Schutze 1999] Christopher D. Manning and Hinrich Schutze. Foundations of Statistical Natural Language Processing. MIT Press, 1999.
• [Wang et al 2015b] Zhongyuan Wang, Kejun Zhao, Haixun Wang, Xiaofeng Meng and Ji-Rong Wen. Query Understanding through Knowledge-Based Conceptualization. In IJCAI, July 2015.
• [Bergsma et al 2007] Shane Bergsma and Qin Iris Wang. Learning Noun Phrase Query Segmentation. In EMNLP-CoNLL 2007, 819-826.
• [Tan et al 2008] Bin Tan and Fuchun Peng. Unsupervised query segmentation using generative language models and Wikipedia. In WWW 2008, 347-356.
• [Li et al 2011] Yanen Li, Bo-June Paul Hsu, ChengXiang Zhai and Kuansan Wang. Unsupervised query segmentation using clickthrough for information retrieval. In SIGIR 2011, 285-294.
• [Guo et al 2009] Jiafeng Guo, Gu Xu, Xueqi Cheng and Hang Li. Named entity recognition in query. In SIGIR 2009, 267-274.
• [Pantel et al 2012] Patrick Pantel, Thomas Lin and Michael Gamon. Mining Entity Types from Query Logs via User Intent Modeling. In ACL 2012, 563-571.
• [Joshi et al 2014] Mandar Joshi, Uma Sawant and Soumen Chakrabarti. Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries. In EMNLP 2014, 1104-1114.
• [Sawant et al 2013] Uma Sawant and Soumen Chakrabarti. Learning joint query interpretation and response ranking. In WWW 2013, 1099-1110.
• [Wang et al 2014b] Zhongyuan Wang, Haixun Wang and Zhirui Hu. Head, Modifier, and Constraint Detection in Short Texts. In International Conference on Data Engineering (ICDE), 2014.
• [Sun et al 2016] Xiangyan Sun, Haixun Wang, Yanghua Xiao and Zhongyuan Wang. Syntactic Parsing of Web Queries. In EMNLP 2016.
• [Kenter and Rijke 2015] Tom Kenter and Maarten de Rijke. Short text similarity with word embeddings. In CIKM 2015.
• [Wang et al 2015a] Zhongyuan Wang, Haixun Wang, Ji-Rong Wen and Yanghua Xiao. An Inference Approach to Basic Level of Categorization. In CIKM, October 2015.
• [Hao et al 2016] Zehui Hao, Zhongyuan Wang, Xiaofeng Meng and Jun Yan. Combining Language Model with Conceptualization for Definition Ranking. MSR Technical Report, 2016.
• [Wang et al 2014a] Fang Wang, Zhongyuan Wang, Zhoujun Li and Ji-Rong Wen. Concept-based Short Text Classification and Ranking. In CIKM 2014.
• [Wang et al 2012a] Jingjing Wang, Haixun Wang, Zhongyuan Wang and Kenny Zhu. Understanding Tables on the Web. In International Conference on Conceptual Modeling, October 2012.
• [Wang et al 2012b] Yue Wang, Hongsong Li, Haixun Wang and Kenny Zhu. Toward Topic Search on the Web. In International Conference on Conceptual Modeling, October 2012.

Page 40: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

A Concept View of ldquoMicrosoftrdquo

company

largest desktop OS vendor

softwarecompany

international company

technology leader

Microsoft

largest desktop OS vendorcompany hellip hellip

software company

Basic-level Conceptualization (BLC)[Rosch et al 1976]

KFC

BMW

Basic-level conceptualization

How to Make BLC

bull Naive approachesbull Typicality an important measure for understanding the

relationship between an object and its concept

bull Pointwise Mutual Information (PMI) a common measure of the strength of association between two terms

bird

Naive Approach 1 Typicality

P(robin|bird) gt P(penguin|bird)ldquorobinrdquo is a more typical bird than a ldquopenguinrdquo

country

SeychellesUSA

P(USA|country) gt P(Seychelles|country)ldquoUSArdquo is a more typical country than ldquoSeychellesrdquo

penguinrobin

Using Typicality for BLC

bull Associate each isA relationship (119890 is 119888) with typicality scores 119875 119890 119888 and 119875 119888 119890

119875 119890 119888 =119899 119888 119890

119899 119888119875(119888|119890) =

119899 119888 119890

119899(119890)

bull P(e|c) indicates how typical (or popular) e is in the given concept c

bull P(c|e) indicates how typical (or popular) the concept c is given e

bull However

Microsoft

largest desktop OS vendorcompanyhigh typicality p(c|e) high typicality p(e|c)

Naive Approach 2 PMI[Manning and Schutze 1999]

bull Pointwise mutual information (PMI) is a measure of association used in information theory and statistics

bull Consider using the PMI between concept c and instance e to find the basic-level concepts as follows

119875119872119868(119890 119888) = log119875(119890 119888)

119875(119890)119875(119888)= log119875(119890|119888) minus log119875(119890)

bull However bull In basic level of categorization we are interested in finding a

concept for a given e which means P(e) is a constant

bull Thus ranking by PMI(e c) is the same as ranking by P(e|c)

Using Rep(e c) for BLC [Wang et al 2015b]

bull The measure 119877119890119901 119890 119888 = 119875(119888|119890) lowast 119875(119890|119888) means

bull (With PMI) If we take the logarithm of our scoring function we get

log119877119890119901 119890 119888 = log119875 119888 119890 lowast 119875(119890|119888) = log119875(119890 119888)

119875(119890)lowast119875(119890 119888)

119875(119888)= log

119875(119890 119888)2

119875(119890)119875(119888)= 119875119872119868 119890 119888 + log119875 119890 119888

= 1198751198721198682

bull (With Commute Time) The commute time between an instance e and a concept c is

119879119894119898119890(119890 119888) =

119896=1

infin

(2119896) lowast 119875119896(119890 119888) =

119896=1

119879

2119896 lowast 119875119896 119890 119888 +

119896=119879+1

infin

2119896 lowast 119875119896 119890 119888

ge σ119896=1119879 (2119896) lowast 119875119896(119890 119888) + 2(119879 + 1) lowast (1 minus σ119896=1

119879 119875119896(119890 119888)) = 4 minus 2 lowast 119877119890119901(119890 119888)

Given e the c should be its typical concept (shortest distance)

Given c the e should be its typical instance (shortest distance)

A process of finding concept nodes having shortest expected distance with e

PrecisionNDCGNo smoothing 1 2 3 5 10 15 20

MI(e) 0769 0692 0705 0685 0719 0705 0690

PMI3(e) 0885 0769 0756 0800 0754 0733 0721

NPMI(e) 0692 0692 0667 0638 0627 0610 0610

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0500 0462 0526 0523 0523 0510 0521

Rep(e) 0846 0865 0872 0862 0758 0731 0719

Smoothing=0001

MI(e) 0577 0615 0628 0600 0612 0605 0592

PMI3(e) 0731 0673 0692 0654 0669 0644 0623

NPMI(e) 0923 0827 0769 0746 0731 0695 0671

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0554

Typicality P(e|c) 0885 0865 0872 0831 0785 0741 0704

Rep(e) 0846 0731 0718 0723 0700 0669 0638

Smoothing=00001

MI(e) 0615 0615 0654 0608 0635 0628 0612

PMI3(e) 0846 0731 0731 0715 0723 0685 0677

NPMI(e) 0885 0904 0885 0869 0823 0777 0752

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0885 0904 0910 0877 0831 0813 0777

Rep(e) 0923 0846 0833 0815 0781 0736 0719

Smoothing=1e-5

MI(e) 0615 0635 0667 0662 0677 0656 0646

PMI3(e) 0885 0769 0744 0777 0758 0731 0710

NPMI(e) 0885 0846 0872 0869 0831 0810 0787

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0769 0808 0846 0823 0808 0782 0765

Rep(e) 0885 0904 0872 0862 0812 0800 0767

Smoothing=1e-6

MI(e) 0769 0673 0705 0677 0700 0692 0679

PMI3(e) 0885 0769 0756 0785 0773 0726 0723

NPMI(e) 0885 0846 0821 0815 0750 0726 0719

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0538 0615 0615 0615 0608 0613 0615

Rep(e) 0846 0885 0897 0877 0788 0777 0765

Smoothing=1e-7

MI(e) 0769 0692 0705 0685 0719 0703 0688

PMI3(e) 0885 0769 0756 0792 0758 0736 0725

NPMI(e) 0769 0750 0718 0700 0650 0641 0633

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0500 0481 0526 0523 0531 0523 0523

Rep(e) 0846 0865 0872 0854 0765 0749 0733

No Smoothing 1 2 3 5 10 15 20

MI(e) 0516 0531 0519 0531 0562 0574 0594

PMI3(e) 0725 0664 0652 0660 0628 0631 0646

NPMI(e) 0599 0597 0579 0554 0540 0539 0549

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0401 0386 0396 0398 0401 0410 0428

Rep(e) 0758 0771 0745 0723 0656 0647 0661

Smoothing=1e-3

MI(e) 0374 0414 0441 0448 0473 0481 0495

PMI3(e) 0484 0511 0509 0502 0519 0525 0533

NPMI(e) 0692 0652 0607 0603 0585 0585 0592

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0460

Typicality P(e|c) 0703 0697 0704 0681 0637 0628 0626

Rep(e) 0621 0580 0554 0561 0554 0555 0559

Smoothing=1e-4

MI(e) 0407 0430 0458 0462 0492 0503 0512

PMI3(e) 0648 0604 0579 0575 0578 0576 0590

NPMI(e) 0747 0777 0761 0737 0700 0685 0688

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0791 0795 0802 0767 0738 0729 0724

Rep(e) 0758 0714 0711 0689 0653 0636 0653

Smoothing=1e-5

MI(e) 0429 0465 0478 0501 0517 0528 0545

PMI3(e) 0725 0647 0642 0642 0627 0624 0638

NPMI(e) 0813 0779 0778 0765 0730 0723 0729

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0709 0728 0735 0722 0702 0696 0703

Rep(e) 0791 0787 0762 0739 0707 0703 0706

Smoothing=1e-6

MI(e) 0516 0510 0515 0526 0546 0563 0579

PMI3(e) 0725 0655 0651 0654 0641 0631 0649

NPMI(e) 0791 0766 0732 0728 0673 0659 0668

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0495 0516 0520 0508 0512 0521 0540

Rep(e) 0758 0784 0767 0755 0691 0686 0694

Smoothing=1e-7

MI(e) 0516 0531 0519 0530 0562 0571 0592

PMI3(e) 0725 0664 0652 0658 0630 0631 0647

NPMI(e) 0670 0655 0633 0604 0575 0570 0581

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0423 0421 0415 0407 0414 0424 0438

Rep(e) 0758 0771 0745 0725 0663 0661 0668

Evaluations on Different Measures for BLC

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

What is the Semantic Similaritybull Are the following instance pairs similar

bull ltapple microsoftgt

bull ltapple peargt

bull ltapple fruitgt

bull ltapple foodgt

bull ltapple ipadgt

bull ltcar journeygt

Approaches on Term Similarity

bull Categories of approaches for semantic similaritybull String based approach

bull Knowledge based approachbull Use preexisting thesauri taxonomy or encyclopedia such as

WordNet

bull Corpus based approachbull Use contexts of terms extracted from web pages web search

snippets or other text repositories

bull Embedding based approachbull Will introduce in detail in ldquoPart 3 Implicit Understandingrdquo

79

Approaches on Term Similarity (2)

bull Categories

80

Knowledge based approaches

(WordNet)

Corpus based

approaches

Path lengthlexical

chain-based

Information

content-based

Graph learning

algorithm basedSnippet search based

Rada

1989

Resnik

1995

Jcn

1997

Lin

1998

Saacutench

2011

Agirre

2010Alvarez

2007

String based

approaches

HunTray

2005

Hirst

1998

Do

2009

Bol

2011Chen

2006

State-of-the-art approaches

Ban

2002

bull Framework

83

Term Similarity Using Semantic Networks [Li et al 2013 Li et al 2015]

Term pairs ltt1 t2gt

Type Checking

Concept Pairs Entity Pairs

Entity-distribution Context Collection

Concept-distribution Context Collection

Concept-Entity Pairs

Concept Collection for the Entity Term t1

Similarity EvaluationCosine(T(t1) T(t2))

for each pairltt2cxgt

Context vector T(t1) and T(t2)

Get maxsim(t2cx) for ltt1 t2gt

End

End

Concept Clustering

Cluster Context vector Cx(t1) and Cy(t2)

Similarity Evaluation

Max(xy) Cosine(Cx(t1) Cy(t2))

End

Concept Clustering

for each Cluster Ci(t1)

Select top k Concept namely cx

Step 1 Type Checking

Step 2 Context Representation(Vector)

Step 3 Context Similarity

An example [Li et al 2013 Li et al 2015]

For example

ltbanana peargt

88

ltbanana peargt

Entity PairsType Checking

Concept Context Collection

Similarity Evaluation Cosine(T(t1) T(t2)) 0916

Step 1 Type Checking

Step 2 Context Representation(Vector)

Step 3 Context Similarity

ExamplesTerm 1 Term 2 Similarity

lunch dinner 09987

tiger jaguar 09792

car plane 09711

television radio 09465

technology company microsoft 08208

high impact sport competitive sport 08155

employer large corporation 05353

fruit green pepper 02949

travel meal 00426

music lunch 00116

alcoholic beverage sports equipment 00314

company table tennis 00003

96httpadaptseieesjtueducnsimilaritySimCompleteResultspdf

Statistics of Search Queries

44

29

17

7

2 1

1 Term2 Terms3 Terms4 Terms5 Termsmore than 5 Terms

10

26

34

19

74

1 Term2 Terms3 Terms4 Terms5 Termsmore than 5 Terms

(a) By traffic

(b) By of distinct queries

Pokeacutemon Go Microsoft HoloLens

Instance 1 Instance 21 Instance2 Instances3 Instances4 Instances5 InstancesMore than 5 Instances

1 Instance2 Instances3 Instances4 Instances5 InstancesMore than 5 Instances

If the short text has context for the instancehellip

bull python tutorialbull dangerous pythonbull moon earth distancebull hellip

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Supervised Segmentation [Bergsma et al 2007]

bull Problem divide query into semantic units

bull Approach turn segmentation into position-based binary classification

Example Query

Two man power saw

[two man] [power saw][two] [man] [power saw][two] [man power] [saw]

Input a query and its positions

Output the decision for making segmentation at each position

Supervised Segmentation

bull Featuresbull Decision boundary features

bull Statistical features

bull Context features

bull Dependency features

eg Indicators thePOS tags in query isPosition features forwardbackward

Mutual information between left and right parts

Bank loan amortization schedule

Context information

bus driverfemale

depend

Supervised Segmentation

bull Segmentation Overview

saw

SVMclassifier

Input query two man power saw

two man power

Output segmentation decision for each position (yesno)

learning features

Unsupervised Segmentation [Tan et al 2008]

bull Unsupervised learning for query segmentation

Probability of generated segmentation S for query Q

119875 119878119876 = 119875 1199041 P 1199042|1199041 hellipP 119904119898 11990411199042hellip119904119898minus1

asympෑ

119904119894isin119878

119875(119904119894)Unigram model

segments

Valid segment boundary if and only if the pointwise mutual information between the two segments resulting from the split is negative

new york times subscription

1199041 1199042

119872119868 119904119896 119904119896+1 = log119875119888([119904119896 119904119896+1])

119875119888 119904119896 ∙ 119875119888 (119904119896+1)lt 0

Example log119875119888([119899119890119908 119910119900119903119896])

119875119888( 119899119890119908) ∙ 119875119888 (119910119900119903119896)gt 0

no segment boundary here

Unsupervised Segmentation

bull Find top k segmentations dynamic programming

bull Using EM optimization on the fly

Input query 11990811199082hellip119908119899 concept probability distribution

Output top k segmentations with highest likehood

Words in a query

Exploit Click-through [Li et al 2011]

bull Motivationbull Probabilistic query segmentation

bull Use click-through data

Output top-3 segmentation

[bank of america] [online banking] 0502

bank of america online banking] 0428

[bank of ] [ america] [online banking] 0001

Q -gt URL -gt D query

document

click data

Input Query bank of america online banking

Exploit Click-through

bull Segmentation Model

An interpolated model

global info Click-throughinfo

[credit card] [bank of America]

1 bank of america credit cards contact us overview2 secured visa credit card from bank of america3 credit cards overview find the right bank of america credit card for you

Query

Clicked html documents

global info

Click-through info

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Sense Changes with Different Context

watch harry potter read harry potter age harry potter

Movie Book Character

harry potter walkthrough

Game

Entity Recognition in Query [Guo et al 2009]

bull Motivation

Detect named entity in a short text and categorize it

harry potter walkthrough

Single-named-entity query

Example

(ldquoharry potterrdquo ldquo walkthroughrdquo ldquogamerdquo)

triple lte t cgt

class of entity

context terms

ambiguous term

contextterm class

Entity Recognition in Query

bull Probabilistic Generative Model

Goal Given a query q find triple lte t cgt maximize the probability

Probability to generate triple

assume context only depends on class

Objective given query q find

The problem then becomes how to estimate Pr(e) Pr(c|e) and Pr(t|c)

Eg ldquowalkthroughrdquo only depends on game instead of happy potter

Entity Recognition in Query

bull Probability Estimation by Learning

learning objective

N

1i

iii )ctP(emax

Challenge difficult as well as time consuming to manually assign class labels to named entities in queries

Build training set 119879 = (119890119894 119905119894) view 119888119894 as a hidden variable

New Learning problem

N

1i

ii

N

1i

i

N

1i

ii c)|)P(te|P(c)P(emax)tP(emax c

solved with topic model WS-LDA

Signal from Click [Pantel et al 2012]

bull Motivation

Predict entity type in Web search

entity

user intent

context

click

Query type distribution (73 types)

Generative model

entity type

T

TK

K2

Signal from Click

bull Joint Model for Prediction

t

τ

i

n c

θ

φ

ω

Q

Distribution over types

Intent distribution

Pick type

Pick entity

Pick intent

Pick click

Word distribution

Host distribution

Entity distribution

For each Query

Pick context words

Basic-level Conceptualization (BLC) [Rosch et al 1976]

• Example "Microsoft": its candidate concepts range from the over-specific "largest desktop OS vendor" to the over-general "company"; the basic-level concept is "software company"
• Other example instances: KFC, BMW

How to Make BLC

• Naive approaches:
  • Typicality: an important measure for understanding the relationship between an object and its concept
  • Pointwise Mutual Information (PMI): a common measure of the strength of association between two terms

Naive Approach 1: Typicality

• P(robin | bird) > P(penguin | bird): "robin" is a more typical bird than "penguin"
• P(USA | country) > P(Seychelles | country): "USA" is a more typical country than "Seychelles"

Using Typicality for BLC

• Associate each isA relationship (e isA c) with typicality scores P(e|c) and P(c|e):

  P(e|c) = n(c,e) / n(c),    P(c|e) = n(c,e) / n(e)

• P(e|c) indicates how typical (or popular) e is in the given concept c
• P(c|e) indicates how typical (or popular) the concept c is given e
• However, neither score alone is enough: for "Microsoft", "company" has high typicality P(c|e) while "largest desktop OS vendor" has high typicality P(e|c), yet neither is the basic-level concept

Naive Approach 2: PMI [Manning and Schutze 1999]

• Pointwise mutual information (PMI) is a measure of association used in information theory and statistics
• Consider using the PMI between concept c and instance e to find the basic-level concepts:

  PMI(e,c) = log [ P(e,c) / (P(e)P(c)) ] = log P(e|c) − log P(e)

• However, in basic level of categorization we are interested in finding a concept for a given e, which means P(e) is a constant
• Thus ranking by PMI(e,c) is the same as ranking by P(e|c)

Using Rep(e,c) for BLC [Wang et al 2015b]

• The measure Rep(e,c) = P(c|e) * P(e|c) can be read in two ways:

• (With PMI) Taking the logarithm of the scoring function:

  log Rep(e,c) = log [ P(c|e) * P(e|c) ]
               = log [ P(e,c)/P(e) * P(e,c)/P(c) ]
               = log [ P(e,c)^2 / (P(e)P(c)) ]
               = PMI(e,c) + log P(e,c)
               = PMI^2(e,c)

• (With Commute Time) The commute time between an instance e and a concept c is

  Time(e,c) = Σ_{k=1..∞} 2k * P_k(e,c)
            = Σ_{k=1..T} 2k * P_k(e,c) + Σ_{k=T+1..∞} 2k * P_k(e,c)
            ≥ Σ_{k=1..T} 2k * P_k(e,c) + 2(T+1) * (1 − Σ_{k=1..T} P_k(e,c))
            = 4 − 2 * Rep(e,c)   (for T = 1, with P_1(e,c) = P(c|e)P(e|c))

• Given e, the chosen c should be its typical concept (shortest distance); given c, e should be its typical instance (shortest distance)
• Maximizing Rep(e,c) is thus a process of finding the concept nodes with the shortest expected commute distance to e
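To make the Rep-based ranking concrete, here is a minimal Python sketch (not from [Wang et al 2015b]; the isA counts are invented toy values) that estimates P(c|e) and P(e|c) from isA co-occurrence counts and ranks candidate concepts by Rep(e,c):

from collections import defaultdict

# Toy isA counts n(c, e); in practice these would come from a large
# probabilistic taxonomy such as Probase (values here are invented).
isa_counts = {
    ("company", "microsoft"): 9000,
    ("software company", "microsoft"): 6000,
    ("largest desktop os vendor", "microsoft"): 50,
    ("company", "facebook"): 8000,
    ("company", "walmart"): 7000,
    ("company", "exxon"): 7000,
    ("software company", "oracle"): 3000,
}

n_c = defaultdict(int)   # n(c) = total count of concept c
n_e = defaultdict(int)   # n(e) = total count of instance e
for (c, e), n in isa_counts.items():
    n_c[c] += n
    n_e[e] += n

def rep(e, c):
    """Rep(e,c) = P(c|e) * P(e|c), both estimated from the isA counts."""
    n_ce = isa_counts.get((c, e), 0)
    return (n_ce / n_e[e]) * (n_ce / n_c[c]) if n_ce else 0.0

def basic_level_concepts(e, k=3):
    cands = {c for (c, ee) in isa_counts if ee == e}
    return sorted(cands, key=lambda c: rep(e, c), reverse=True)[:k]

print(basic_level_concepts("microsoft"))
# With these toy counts, "software company" outranks both the over-general
# "company" and the over-specific "largest desktop os vendor".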

Evaluations on Different Measures for BLC

Precision@k (k = 1, 2, 3, 5, 10, 15, 20)

No smoothing
  MI(e):               0.769 0.692 0.705 0.685 0.719 0.705 0.690
  PMI3(e):             0.885 0.769 0.756 0.800 0.754 0.733 0.721
  NPMI(e):             0.692 0.692 0.667 0.638 0.627 0.610 0.610
  Typicality P(c|e):   0.462 0.577 0.603 0.577 0.569 0.564 0.556
  Typicality P(e|c):   0.500 0.462 0.526 0.523 0.523 0.510 0.521
  Rep(e):              0.846 0.865 0.872 0.862 0.758 0.731 0.719
Smoothing = 1e-3
  MI(e):               0.577 0.615 0.628 0.600 0.612 0.605 0.592
  PMI3(e):             0.731 0.673 0.692 0.654 0.669 0.644 0.623
  NPMI(e):             0.923 0.827 0.769 0.746 0.731 0.695 0.671
  Typicality P(c|e):   0.462 0.577 0.603 0.577 0.569 0.564 0.554
  Typicality P(e|c):   0.885 0.865 0.872 0.831 0.785 0.741 0.704
  Rep(e):              0.846 0.731 0.718 0.723 0.700 0.669 0.638
Smoothing = 1e-4
  MI(e):               0.615 0.615 0.654 0.608 0.635 0.628 0.612
  PMI3(e):             0.846 0.731 0.731 0.715 0.723 0.685 0.677
  NPMI(e):             0.885 0.904 0.885 0.869 0.823 0.777 0.752
  Typicality P(c|e):   0.462 0.577 0.603 0.577 0.569 0.564 0.556
  Typicality P(e|c):   0.885 0.904 0.910 0.877 0.831 0.813 0.777
  Rep(e):              0.923 0.846 0.833 0.815 0.781 0.736 0.719
Smoothing = 1e-5
  MI(e):               0.615 0.635 0.667 0.662 0.677 0.656 0.646
  PMI3(e):             0.885 0.769 0.744 0.777 0.758 0.731 0.710
  NPMI(e):             0.885 0.846 0.872 0.869 0.831 0.810 0.787
  Typicality P(c|e):   0.462 0.577 0.603 0.577 0.569 0.564 0.556
  Typicality P(e|c):   0.769 0.808 0.846 0.823 0.808 0.782 0.765
  Rep(e):              0.885 0.904 0.872 0.862 0.812 0.800 0.767
Smoothing = 1e-6
  MI(e):               0.769 0.673 0.705 0.677 0.700 0.692 0.679
  PMI3(e):             0.885 0.769 0.756 0.785 0.773 0.726 0.723
  NPMI(e):             0.885 0.846 0.821 0.815 0.750 0.726 0.719
  Typicality P(c|e):   0.462 0.577 0.603 0.577 0.569 0.564 0.556
  Typicality P(e|c):   0.538 0.615 0.615 0.615 0.608 0.613 0.615
  Rep(e):              0.846 0.885 0.897 0.877 0.788 0.777 0.765
Smoothing = 1e-7
  MI(e):               0.769 0.692 0.705 0.685 0.719 0.703 0.688
  PMI3(e):             0.885 0.769 0.756 0.792 0.758 0.736 0.725
  NPMI(e):             0.769 0.750 0.718 0.700 0.650 0.641 0.633
  Typicality P(c|e):   0.462 0.577 0.603 0.577 0.569 0.564 0.556
  Typicality P(e|c):   0.500 0.481 0.526 0.523 0.531 0.523 0.523
  Rep(e):              0.846 0.865 0.872 0.854 0.765 0.749 0.733

NDCG@k (k = 1, 2, 3, 5, 10, 15, 20)

No smoothing
  MI(e):               0.516 0.531 0.519 0.531 0.562 0.574 0.594
  PMI3(e):             0.725 0.664 0.652 0.660 0.628 0.631 0.646
  NPMI(e):             0.599 0.597 0.579 0.554 0.540 0.539 0.549
  Typicality P(c|e):   0.297 0.380 0.409 0.422 0.438 0.446 0.461
  Typicality P(e|c):   0.401 0.386 0.396 0.398 0.401 0.410 0.428
  Rep(e):              0.758 0.771 0.745 0.723 0.656 0.647 0.661
Smoothing = 1e-3
  MI(e):               0.374 0.414 0.441 0.448 0.473 0.481 0.495
  PMI3(e):             0.484 0.511 0.509 0.502 0.519 0.525 0.533
  NPMI(e):             0.692 0.652 0.607 0.603 0.585 0.585 0.592
  Typicality P(c|e):   0.297 0.380 0.409 0.422 0.438 0.446 0.460
  Typicality P(e|c):   0.703 0.697 0.704 0.681 0.637 0.628 0.626
  Rep(e):              0.621 0.580 0.554 0.561 0.554 0.555 0.559
Smoothing = 1e-4
  MI(e):               0.407 0.430 0.458 0.462 0.492 0.503 0.512
  PMI3(e):             0.648 0.604 0.579 0.575 0.578 0.576 0.590
  NPMI(e):             0.747 0.777 0.761 0.737 0.700 0.685 0.688
  Typicality P(c|e):   0.297 0.380 0.409 0.422 0.438 0.446 0.461
  Typicality P(e|c):   0.791 0.795 0.802 0.767 0.738 0.729 0.724
  Rep(e):              0.758 0.714 0.711 0.689 0.653 0.636 0.653
Smoothing = 1e-5
  MI(e):               0.429 0.465 0.478 0.501 0.517 0.528 0.545
  PMI3(e):             0.725 0.647 0.642 0.642 0.627 0.624 0.638
  NPMI(e):             0.813 0.779 0.778 0.765 0.730 0.723 0.729
  Typicality P(c|e):   0.297 0.380 0.409 0.422 0.438 0.446 0.461
  Typicality P(e|c):   0.709 0.728 0.735 0.722 0.702 0.696 0.703
  Rep(e):              0.791 0.787 0.762 0.739 0.707 0.703 0.706
Smoothing = 1e-6
  MI(e):               0.516 0.510 0.515 0.526 0.546 0.563 0.579
  PMI3(e):             0.725 0.655 0.651 0.654 0.641 0.631 0.649
  NPMI(e):             0.791 0.766 0.732 0.728 0.673 0.659 0.668
  Typicality P(c|e):   0.297 0.380 0.409 0.422 0.438 0.446 0.461
  Typicality P(e|c):   0.495 0.516 0.520 0.508 0.512 0.521 0.540
  Rep(e):              0.758 0.784 0.767 0.755 0.691 0.686 0.694
Smoothing = 1e-7
  MI(e):               0.516 0.531 0.519 0.530 0.562 0.571 0.592
  PMI3(e):             0.725 0.664 0.652 0.658 0.630 0.631 0.647
  NPMI(e):             0.670 0.655 0.633 0.604 0.575 0.570 0.581
  Typicality P(c|e):   0.297 0.380 0.409 0.422 0.438 0.446 0.461
  Typicality P(e|c):   0.423 0.421 0.415 0.407 0.414 0.424 0.438
  Rep(e):              0.758 0.771 0.745 0.725 0.663 0.661 0.668

Single Instance

• Is this instance ambiguous?
• What are its basic-level concepts?
• What are its similar instances?

What is the Semantic Similarity?

• Are the following instance pairs similar?
  • <apple, microsoft>
  • <apple, pear>
  • <apple, fruit>
  • <apple, food>
  • <apple, ipad>
  • <car, journey>

Approaches on Term Similarity

• Categories of approaches for semantic similarity:
  • String based approach
  • Knowledge based approach: use preexisting thesauri, taxonomies, or encyclopedias such as WordNet
  • Corpus based approach: use contexts of terms extracted from web pages, web search snippets, or other text repositories
  • Embedding based approach: introduced in detail in "Part 3: Implicit Understanding"

Approaches on Term Similarity (2)

• Categories (figure: a taxonomy of representative approaches, with state-of-the-art approaches highlighted):
  • String based approaches
  • Knowledge based approaches (WordNet): path length / lexical chain-based; information content-based
  • Corpus based approaches: graph learning algorithm based; snippet search based
  • Representative works cited in the figure: Rada 1989, Resnik 1995, Jcn 1997, Hirst 1998, Lin 1998, Ban 2002, HunTray 2005, Chen 2006, Alvarez 2007, Do 2009, Agirre 2010, Bol 2011, Sánchez 2011

Term Similarity Using Semantic Networks [Li et al 2013, Li et al 2015]

• Framework (three steps):
  • Step 1: Type Checking – classify the term pair <t1, t2> as a concept pair, an entity pair, or a concept–entity pair
  • Step 2: Context Representation (vector) – collect concept-distribution / entity-distribution contexts; for concept terms, cluster the concepts and build cluster context vectors Cx(t)
  • Step 3: Context Similarity – for entity pairs, compute Cosine(T(t1), T(t2)) over the context vectors; for concept pairs, compute max over cluster pairs Cosine(Cx(t1), Cy(t2)); for concept–entity pairs, select the top-k concepts cx of the entity term t1 and take max over x of sim(t2, cx)

An example [Li et al 2013, Li et al 2015]

• For example, <banana, pear>:
  • Step 1: Type Checking – <banana, pear> is an entity pair
  • Step 2: Context Representation (vector) – concept context collection for each term
  • Step 3: Context Similarity – Cosine(T(t1), T(t2)) = 0.916
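As a rough sketch of the entity-pair branch (Steps 2–3), assuming each term's context T(t) is a bag of weighted concepts — the concepts and weights below are invented:

import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors given as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy concept-distribution contexts T(t); in [Li et al 2013] these are built
# from a large probabilistic isA network (values here are invented).
T = {
    "banana": {"fruit": 0.62, "food": 0.25, "tropical fruit": 0.10, "crop": 0.03},
    "pear":   {"fruit": 0.58, "food": 0.30, "tree": 0.08, "crop": 0.04},
}

print(round(cosine(T["banana"], T["pear"]), 3))  # high similarity, as on the slide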

Examples

Term 1               Term 2               Similarity
lunch                dinner               0.9987
tiger                jaguar               0.9792
car                  plane                0.9711
television           radio                0.9465
technology company   microsoft            0.8208
high impact sport    competitive sport    0.8155
employer             large corporation    0.5353
fruit                green pepper         0.2949
travel               meal                 0.0426
music                lunch                0.0116
alcoholic beverage   sports equipment     0.0314
company              table tennis         0.0003

Complete results: http://adapt.seiee.sjtu.edu.cn/similarity/SimCompleteResults.pdf

Statistics of Search Queries

(a) By traffic: 1 term 44%, 2 terms 29%, 3 terms 17%, 4 terms 7%, 5 terms 2%, more than 5 terms 1%
(b) By # of distinct queries: 1 term 10%, 2 terms 26%, 3 terms 34%, 4 terms 19%, 5 terms 7%, more than 5 terms 4%

(Examples of trending instances: Pokémon Go, Microsoft HoloLens. A second pair of charts breaks queries down by the number of instances they contain: 1 instance, 2 instances, ..., more than 5 instances.)

If the short text has context for the instance…

• python tutorial
• dangerous python
• moon earth distance
• …

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Supervised Segmentation [Bergsma et al 2007]

• Problem: divide a query into semantic units
• Approach: turn segmentation into position-based binary classification
• Example query: "two man power saw"
  • Candidate segmentations: [two man] [power saw]; [two] [man] [power saw]; [two] [man power] [saw]
• Input: a query and its positions; Output: the decision of whether to place a segmentation break at each position

Supervised Segmentation

• Features:
  • Decision-boundary features – e.g., indicators such as "the POS tag in the query is ...", position features (forward / backward)
  • Statistical features – e.g., mutual information between the left and right parts ("bank loan | amortization schedule")
  • Context features – context information around the query
  • Dependency features – e.g., "female" depends on "bus driver"

Supervised Segmentation

• Segmentation overview: an SVM classifier takes the query ("two man power saw") and the learned features as input and outputs a segmentation decision (yes / no) for each position
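A minimal sketch of the position-based binary classification idea; the features and the tiny training set are toy stand-ins, not the actual feature templates of [Bergsma et al 2007]:

from sklearn.svm import LinearSVC

def boundary_features(tokens, i, pmi):
    """Features for the potential break between tokens[i] and tokens[i+1]."""
    left, right = tokens[i], tokens[i + 1]
    return [
        pmi.get((left, right), 0.0),   # statistical feature: PMI of the adjacent pair
        float(i == 0),                 # position features
        float(i == len(tokens) - 2),
        float(len(left) > 4),          # crude lexical indicators
        float(len(right) > 4),
    ]

# Toy PMI table and toy labels: 1 = insert a break, 0 = keep together.
pmi = {("two", "man"): 2.1, ("man", "power"): -0.3, ("power", "saw"): 3.0}
query = ["two", "man", "power", "saw"]
X = [boundary_features(query, i, pmi) for i in range(len(query) - 1)]
y = [0, 1, 0]                          # gold segmentation: [two man] [power saw]

clf = LinearSVC().fit(X, y)            # in practice: train on many labeled queries
print(clf.predict(X))                  # per-position break decisions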

Unsupervised Segmentation [Tan et al 2008]

• Unsupervised learning for query segmentation
• Probability of a generated segmentation S (segments s1 ... sm) for query Q:

  P(S|Q) = P(s1) P(s2|s1) ... P(sm|s1 s2 ... s m−1) ≈ Π_{si ∈ S} P(si)   (unigram model)

• Valid segment boundary if and only if the pointwise mutual information between the two segments resulting from the split is negative:

  MI(sk, sk+1) = log [ Pc([sk, sk+1]) / (Pc(sk) · Pc(sk+1)) ] < 0

• Example "new york times subscription": log [ Pc([new york]) / (Pc(new) · Pc(york)) ] > 0, so there is no segment boundary between "new" and "york"

Unsupervised Segmentation

• Find the top-k segmentations by dynamic programming
• Use EM optimization on the fly
• Input: query w1 w2 ... wn (words in a query) and a concept probability distribution; Output: top-k segmentations with the highest likelihood
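A minimal sketch of the dynamic-programming search for the highest-likelihood unigram segmentation; the segment probabilities are invented (in [Tan et al 2008] they are estimated with EM from a large corpus and Wikipedia):

import math
from functools import lru_cache

# Toy unigram segment probabilities P(s) (values invented).
P = {
    "new": 1e-3, "york": 5e-4, "times": 8e-4, "subscription": 2e-4,
    "new york": 6e-4, "new york times": 3e-4, "york times": 1e-6,
}

def best_segmentation(words):
    """Highest-likelihood segmentation under the unigram model (log space)."""
    @lru_cache(maxsize=None)
    def solve(i):
        if i == len(words):
            return 0.0, ()
        best_score, best_segs = -math.inf, ()
        for j in range(i + 1, len(words) + 1):
            seg = " ".join(words[i:j])
            p = P.get(seg, 1e-12)                  # small floor for unseen segments
            score, rest = solve(j)
            score += math.log(p)
            if score > best_score:
                best_score, best_segs = score, (seg,) + rest
        return best_score, best_segs
    return solve(0)

print(best_segmentation(("new", "york", "times", "subscription"))[1])
# expected with these toy numbers: ('new york times', 'subscription')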

Exploit Click-through [Li et al 2011]

• Motivation:
  • Probabilistic query segmentation
  • Use click-through data (Q -> URL -> D: query, click data, document)
• Input query: "bank of america online banking"
• Output: top-3 segmentations
  • [bank of america] [online banking]    0.502
  • [bank of america online banking]      0.428
  • [bank of] [america] [online banking]  0.001

Exploit Click-through

• Segmentation model: an interpolated model combining global info and click-through info
• Example query: [credit card] [bank of america]; clicked HTML documents:
  1. bank of america credit cards contact us overview
  2. secured visa credit card from bank of america
  3. credit cards overview: find the right bank of america credit card for you

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Sense Changes with Different Context

• watch harry potter → Movie;  read harry potter → Book;  age harry potter → Character;  harry potter walkthrough → Game

Entity Recognition in Query [Guo et al 2009]

• Motivation: detect the named entity in a short text and categorize it
• Example single-named-entity query: "harry potter walkthrough"
  • Interpreted as a triple <e, t, c> = ("harry potter", "walkthrough", "game"): ambiguous entity term e, context term(s) t, class of entity c

Entity Recognition in Query

• Probabilistic Generative Model
  • Goal: given a query q, find the triple <e, t, c> that maximizes the probability
  • Probability to generate a triple: Pr(e, t, c) = Pr(e) Pr(c|e) Pr(t|c), assuming the context depends only on the class (e.g., "walkthrough" depends only on "game", not on "harry potter")
  • The problem then becomes how to estimate Pr(e), Pr(c|e), and Pr(t|c)

Entity Recognition in Query

• Probability Estimation by Learning
  • Learning objective:  max Π_{i=1..N} P(e_i, t_i, c_i)
  • Challenge: it is difficult as well as time-consuming to manually assign class labels to named entities in queries
  • Build a training set T = {(e_i, t_i)} and view c_i as a hidden variable
  • New learning problem:  max Π_{i=1..N} P(e_i, t_i) = max Π_{i=1..N} Σ_c P(e_i) P(c|e_i) P(t_i|c)
  • Solved with the topic model WS-LDA
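A minimal sketch of scoring candidate triples <e, t, c> with Pr(e)·Pr(c|e)·Pr(t|c); all probability tables are invented, and the sketch only considers the entity as a query prefix:

# Toy probability tables (invented for illustration).
P_e = {"harry potter": 0.7, "potter": 0.3}
P_c_given_e = {"harry potter": {"game": 0.3, "movie": 0.5, "book": 0.2},
               "potter": {"profession": 1.0}}
P_t_given_c = {"game": {"walkthrough": 0.4, "cheats": 0.3},
               "movie": {"trailer": 0.5, "walkthrough": 0.01},
               "book": {"pdf": 0.3}, "profession": {"wheel": 0.2}}

def interpret(query):
    """Enumerate <e, t, c> splits of the query and return the best-scoring triple."""
    words = query.split()
    best, best_score = None, 0.0
    for i in range(1, len(words) + 1):           # e = leading words, t = the rest
        e, t = " ".join(words[:i]), " ".join(words[i:])
        for c, p_ce in P_c_given_e.get(e, {}).items():
            score = P_e.get(e, 0.0) * p_ce * P_t_given_c.get(c, {}).get(t, 1e-9)
            if score > best_score:
                best, best_score = (e, t, c), score
    return best, best_score

print(interpret("harry potter walkthrough"))
# expected: (('harry potter', 'walkthrough', 'game'), ...)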

Signal from Click [Pantel et al 2012]

• Motivation: predict entity types in Web search, jointly modeling the entity, user intent, context, and click
• Query type distribution over 73 types; a generative model over entity types

Signal from Click

• Joint Model for Prediction (figure: plate diagram with latent variables t, τ, i and parameters θ, φ, ω). For each query the model picks a type, an entity, an intent, context words, and a click, governed by the distribution over types, the intent distribution, and word / host / entity distributions

Telegraphic Query Interpretation [Sawant et al 2013, Joshi et al 2014]

• Entity-seeking telegraphic queries
• Interpretation = Segmentation + Annotation
• Answers are drawn from a knowledge base (accuracy) combined with a large corpus (recall)
• Example: query "Germany capital" → result entity "Berlin"

• Overview

Joint Interpretation and Ranking [Sawant et al 2013, Joshi et al 2014]

• A telegraphic query and an annotated corpus feed two models for interpretation and ranking — a generative model and a discriminative model — whose output is a ranked list of answer entities e1, e2, e3, ...

• Generative Model

Joint Interpretation and Ranking [Sawant et al 2013]

• Based on probabilistic language models: for q = "losing team baseball world series 1998", a candidate entity E (e.g., San Diego Padres, a major league baseball team) is scored by how well it generates the query through a type model (type hint "baseball team") and a context model (context matchers over corpus snippets T such as "Padres have been to two World Series, losing in 1984 and 1998"), with a switch variable Z choosing which model generates each query word (figure borrowed from U. Sawant, 2013)

• Discriminative Model

Joint Interpretation and Ranking [Sawant et al 2013]

• Based on max-margin discriminative learning: the query "losing team baseball world series 1998" is paired with candidate interpretations, and the correct entity San_Diego_Padres (target type t = baseball team) should score above an incorrect entity such as 1998_World_Series (t = series)

Telegraphic Query Interpretation [Joshi et al 2014]

• Queries seek answer entities (e2)
• They contain (query) entities (e1), target types (t2), relations (r), and selectors (s)
• Example interpretations (borrowed from M. Joshi, 2014):
  • query "dave navarro first band": e1 = dave navarro, r = band, t2 = band, s = first; or e1 = dave navarro, r = –, t2 = band, s = first
  • query "spider automobile company": "spider" may be interpreted as the query entity e1 with t2 = automobile company, or as a selector s (together with "company") for the target type t2 = automobile company

Improved Generative Model

• The generative model of [Sawant et al 2013] is extended in [Joshi et al 2014] to also consider the query entity e1 (in q) and the relation r

Improved Discriminative Model

• The discriminative model of [Sawant et al 2013] is likewise extended in [Joshi et al 2014] to consider e1 (in q) and r

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

bull Input a short text

bull Output semantic interpretation

bull Three steps in understanding a short text

wanna watch eagles band

watch[verb] eagles[entity](band) band[concept]

wanna watch eagles band watch[verb] eagles[entity](band) band[concept]

watch eagles band watch[verb] eagles[entity] band[concept]

Step 1 Text Segmentation ndash divide into a sequence of terms in vocabulary

Step 2 Type detection ndash determine the best type of each term

Step 3 Concept Labeling ndash infer the best concept of each entity within context

Text segmentation

• Observations:
  • Mutual Exclusion – terms containing the same word mutually exclude each other
  • Mutual Reinforcement – related terms mutually reinforce each other
• Build a Candidate Term Graph (CTG)
• Example CTGs (figure) for "vacation april in paris" (candidate terms "april in paris", "april", "paris", "vacation") and "watch harry potter" (candidate terms "watch", "harry potter"), with edge weights such as 0.029, 0.005, 0.047, 0.041 and 0.014, 0.092, 0.053, 0.018, and word-coverage fractions 1/3, 2/3

Find best segmentation

• Best segmentation = the sub-graph of the CTG which:
  • Is a complete graph (clique) – i.e., contains no mutual exclusion
  • Has 100% word coverage (except for stopwords)
  • Has the largest average edge weight
• In the example CTGs, several sub-graphs qualify as segmentations; the best segmentation is the maximal clique with the largest average edge weight ({"vacation", "april in paris"} and {"watch", "harry potter"}, respectively)
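A minimal sketch of the best-segmentation search: enumerate cliques of a toy CTG, require full word coverage (ignoring stopwords), and keep the clique with the largest average edge weight (the graph and its weights are invented):

from itertools import combinations

# Toy CTG for "vacation april in paris": nodes are candidate terms, edges carry
# relatedness weights; mutually exclusive terms (sharing a word) have no edge.
words = {"vacation": {"vacation"}, "april in paris": {"april", "in", "paris"},
         "april": {"april"}, "paris": {"paris"}}
edges = {("vacation", "april in paris"): 0.047,
         ("vacation", "april"): 0.005,
         ("vacation", "paris"): 0.029,
         ("april", "paris"): 0.041}

def weight(a, b):
    return edges.get((a, b), edges.get((b, a)))

def best_segmentation(query_words):
    best, best_avg = None, -1.0
    terms = list(words)
    for r in range(1, len(terms) + 1):
        for sub in combinations(terms, r):
            pairs = list(combinations(sub, 2))
            # clique check: every pair must be connected (no mutual exclusion)
            if any(weight(a, b) is None for a, b in pairs):
                continue
            # full word coverage, stopwords such as "in" ignored
            covered = set().union(*(words[t] for t in sub))
            if not (query_words - {"in"}) <= covered:
                continue
            avg = sum(weight(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
            if avg > best_avg:
                best, best_avg = sub, avg
    return best

print(best_segmentation({"vacation", "april", "in", "paris"}))
# expected with these toy weights: ('vacation', 'april in paris')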

Type Detection

• Pairwise Model: find the best typed-term for each term so that the Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight
• Example (figure) "watch free movie": candidate typed-terms are watch[v] / watch[e] / watch[c], free[adj] / free[v], and movie[c] / movie[e]

Concept Labeling

• Entity disambiguation is the most important task of concept labeling: filter / re-rank the original concept cluster vector
• Weighted-Vote: the final score of each concept cluster is a combination of its original score and the support from context, using concept co-occurrence
• Example: "watch harry potter" → movie; "read harry potter" → book
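A minimal sketch of the weighted-vote re-ranking, with invented cluster scores, co-occurrence supports, and mixing weight:

# Final score of a concept cluster = mix of its original score and the context
# support from concept co-occurrence. All numbers below are invented.
original = {"movie": 0.45, "book": 0.40, "character": 0.15}   # clusters for "harry potter"
cooccur = {("verb:watch", "movie"): 0.9, ("verb:watch", "book"): 0.2,
           ("verb:watch", "character"): 0.1}

def weighted_vote(original_scores, context_terms, lam=0.5):
    scores = {}
    for cluster, score in original_scores.items():
        support = sum(cooccur.get((t, cluster), 0.0) for t in context_terms)
        scores[cluster] = (1 - lam) * score + lam * support
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

print(weighted_vote(original, ["verb:watch"]))
# "movie" now dominates "book" in the context "watch harry potter"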

Example of Entity Disambiguation [Hua et al 2015 (ICDE Best Paper), Hua et al 2016]

• Pipeline (figure): Short Text → Parsing → Conceptualization (term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, concept orthogonalization) → Concept Vector [(c1, p1), (c2, p2), (c3, p3), ...], using a semantic (isA) network and a co-occurrence network
• Example "ipad apple": the isA candidates for "apple" (fruit, company, food, product, ...) are filtered by co-occurrence with "ipad" (product, device, brand, ...), so the company / product senses survive and the fruit / food senses are filtered out

Mining Lexical Relationships [Wang et al 2015b]

• Lexical knowledge is represented by probabilities (e: instance, t: term, c: concept, z: role); example "watch harry potter" with concepts verb, movie, book, product:
  • p(z | t), e.g., p(verb | watch), p(instance | watch)
  • p(c | t, z), e.g., p(movie | harry potter), p(book | harry potter); in particular p(c | e) = p(c | t, z = instance)
  • p(c | t, z) for context terms, e.g., p(movie | watch, verb)
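As a rough illustration of how these probabilities combine — the decomposition p(c|t) = Σ_z p(c|t,z)·p(z|t) and all numbers below are illustrative assumptions, not the exact scoring of [Wang et al 2015b]:

# p(c | t) = sum_z p(c | t, z) * p(z | t); toy numbers for t = "watch".
p_z_given_t = {"watch": {"verb": 0.7, "instance": 0.3}}
p_c_given_tz = {
    ("watch", "verb"): {"action": 1.0},
    ("watch", "instance"): {"product": 0.8, "accessory": 0.2},
}

def p_c_given_t(t):
    dist = {}
    for z, pz in p_z_given_t[t].items():
        for c, pc in p_c_given_tz[(t, z)].items():
            dist[c] = dist.get(c, 0.0) + pz * pc
    return dist

print(p_c_given_t("watch"))
# {'action': 0.7, 'product': 0.24, 'accessory': 0.06}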

Understanding Queries [Wang et al 2015b]

• Goal: to rank the concepts and find argmax_c p(c | t, q)
• The offline semantic network is combined with the query: all possible segmentations induce an online subgraph, on which a random walk with restart [Sun et al 2005] is performed
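A minimal sketch of random walk with restart on a toy online subgraph (adjacency weights invented); the restart node stands for the query term and the remaining nodes for candidate concepts:

import numpy as np

def random_walk_with_restart(A, seed, alpha=0.15, iters=100):
    """Stationary scores of a walk that restarts at `seed` with probability alpha."""
    P = A / A.sum(axis=0, keepdims=True)          # column-stochastic transition matrix
    r = np.zeros(A.shape[0]); r[seed] = 1.0       # restart vector
    p = r.copy()
    for _ in range(iters):
        p = (1 - alpha) * P @ p + alpha * r
    return p

# Toy subgraph: node 0 = query term, nodes 1-3 = candidate concepts.
A = np.array([[0, 3, 1, 1],
              [3, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]], dtype=float)
print(random_walk_with_restart(A, seed=0).round(3))  # concept 1 ranks highest among 1-3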

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head, Modifier, and Constraint Detection in Short Texts [Wang et al 2014b]

• Example: "popular smart cover iphone 5s"
• Definitions:
  • Head: names the general (semantic) category to which the whole short text belongs; usually the head is the intent of the short text – "smart cover" is the intent of the query
  • Constraints: distinguish this member from other members of the same category – "iphone 5s" limits the type of the head
  • Non-Constraint Modifiers (aka Pure Modifiers): subjective modifiers which can be dropped without changing the intent – "popular" is subjective and can be neglected

Non-Constraint Modifiers Mining: Construct Modifier Networks

• Concept hierarchy tree in the "Country" domain (figure): Country → Asian country, Developed country, Western country → Large Asian country, Western developed country, Large developed country, Top developed country, Top western country, ...
• Edges between modifiers form a Modifier Network in the "Country" domain, with nodes such as Country, Asian, Western, Developed, Large, Top; in this case "Large" and "Top" are pure modifiers

Non-Constraint Modifiers Mining: Betweenness centrality

• Betweenness centrality is a measure of a node's centrality in a network
• Betweenness of node v is defined as g(v) = Σ_{s≠v≠t} σ_st(v) / σ_st, where σ_st is the total number of shortest paths from node s to node t and σ_st(v) is the number of those paths that pass through v
• Normalization & aggregation: a pure modifier should have a low betweenness-centrality aggregation score PMS(t)
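A minimal sketch of the betweenness-centrality signal on a toy modifier network (edges invented), using networkx:

import networkx as nx

# Toy modifier network for the "Country" domain; edges connect modifiers that
# co-occur in concept names (structure invented for illustration).
G = nx.Graph()
G.add_edges_from([
    ("Country", "Asian"), ("Country", "Western"), ("Country", "Developed"),
    ("Asian", "Developed"), ("Western", "Developed"),
    ("Large", "Country"), ("Top", "Country"),
])

bc = nx.betweenness_centrality(G, normalized=True)
for node, score in sorted(bc.items(), key=lambda kv: kv[1]):
    print(f"{node:10s} {score:.3f}")
# "Large" and "Top" sit on the periphery, so their betweenness is ~0,
# which is what a low PMS(t) score for a pure modifier reflects.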

Head-Constraints Mining [Wang et al 2014b]

• A term can be a head in some cases and a constraint in others
• E.g., in "Seattle hotel", "hotel" is the head and "Seattle" is a constraint; in "Seattle hotel job", "job" is the head and both "Seattle" and "hotel" are constraints

Head-Constraints Mining: Acquiring Concept Patterns

• Building the concept pattern dictionary from query logs (e.g., "cover for iphone 6s", "battery for sony a7r", "wicked on broadway"):
  • Extract preposition patterns: A for B, A of B, A with B, A in B, A on B, A at B, ...
  • Get entity pairs (entity1 = head, entity2 = constraint) from the query log
  • Conceptualization: map entity1 to concepts concept11, concept12, ... and entity2 to concepts concept21, concept22, ...
  • Concept patterns for each preposition: (concept11, concept21), (concept11, concept22), (concept11, concept23), ... → Concept Pattern Dictionary

Why Concepts Can't Be Too General

• It may cause too many concept pattern conflicts: we can't distinguish head and modifier for general concept pairs
• Conflict example:
  • Derived concept pattern device (head) / company (modifier); supporting entity pairs: (iphone 4, verizon), (modem, comcast), (wireless router, comcast), (iphone 4, tmobile)
  • Derived concept pattern company (head) / device (modifier); supporting entity pairs: (amazon books, kindle), (netflix, touchpad), (skype, windows phone), (netflix, ps3)

Why Concepts Can't Be Too Specific

• It may generate concepts with little coverage: the concept regresses to the entity
• Large storage space: up to (million × million) patterns, e.g., device / largest desktop OS vendor, device / largest software development company, device / largest global corporation, device / latest windows and office provider, ...
• Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept Patterns

Cluster size | Sum of cluster score | Head/Constraint | Score
615 | 2114691 | breed/state | 357298460224501
296 | 7752357 | game/platform | 627403476771856
153 | 3466804 | accessory/vehicle | 53393705094809
70 | 118259 | browser/platform | 132612807637391
22 | 1010993 | requirement/school | 271407526294823
34 | 9489159 | drug/disease | 154602405333541
42 | 8992995 | cosmetic/skin condition | 814659415003929
16 | 7421599 | job/city | 27903732555528
32 | 710403 | accessory/phone | 246513830851194
18 | 6692376 | software/platform | 210126322725878
20 | 6444603 | test/disease | 239774028397537
27 | 5994205 | clothes/breed | 98773996282851
19 | 5913545 | penalty/crime | 200544192793488
25 | 5848804 | tax/state | 240081818612579
16 | 5465424 | sauce/meat | 183592863621553
18 | 4809389 | credit card/country | 142919087972152
14 | 4730792 | food/holiday | 14554140330924
11 | 4536199 | mod/game | 257163856882439
29 | 4350954 | garment/sport | 471533326845442
23 | 3994886 | career information/professional | 732726483731257
15 | 386065 | song/instrument | 128189481818135
18 | 378213 | bait/fish | 780426514113169
22 | 3722948 | study guide/book | 508339765053921
19 | 3408953 | plugins/browser | 550326072627126
14 | 3305753 | recipe/meat | 882779863422951
18 | 3214226 | currency/country | 110825444188352
13 | 3180272 | lens/camera | 186081673263957
9 | 316973 | decoration/holiday | 130055844126533
16 | 314875 | food/animal | 7338544366514

• Example: the concept pattern Game (Head) / Platform (Modifier) covers related patterns such as game/platform, game/device, video game/platform, game/game console, game/game pad, game/gaming platform, and is supported by entity pairs such as (angry birds, android), (angry birds, ios), (angry birds, windows 10), ...

Head Modifier Relationship Detection

• Train a classifier on (head-embedding, modifier-embedding) pairs
• Training data: positive = (head, modifier), negative = (modifier, head)
• Precision >= 0.9, Recall >= 0.9
• Disadvantage: not interpretable
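A minimal sketch of such a classifier, with invented low-dimensional embeddings and a handful of toy training pairs:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy 4-dimensional word embeddings; real systems would use embeddings trained
# on a large corpus (all vectors and pairs here are invented).
emb = {
    "game":     np.array([0.9, 0.1, 0.0, 0.2]),
    "platform": np.array([0.1, 0.8, 0.3, 0.0]),
    "cover":    np.array([0.7, 0.2, 0.1, 0.1]),
    "iphone":   np.array([0.2, 0.9, 0.2, 0.1]),
}

def pair_features(head, modifier):
    return np.concatenate([emb[head], emb[modifier]])

# Positive examples: (head, modifier); negatives: the reversed pair.
pos = [("game", "platform"), ("cover", "iphone")]
X = [pair_features(h, m) for h, m in pos] + [pair_features(m, h) for h, m in pos]
y = [1] * len(pos) + [0] * len(pos)

clf = LogisticRegression().fit(X, y)
print(clf.predict([pair_features("game", "platform"),
                   pair_features("platform", "game")]))   # should predict [1 0]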

Syntactic Parsing based on Head-Modifier Pairs

• Head-modifier information alone is incomplete:
  • Prepositions and other function words are not covered
  • Neither is the structure within a noun compound, e.g., "el capitan macbook pro"
• Why not train a parser for web queries?

Syntactic Parsing of Short Texts [Sun et al EMNLP 2016]

• Syntactic structures are valuable for short text understanding (examples follow)

Challenges: Short Texts Lack Grammatical Signals

• Lack of function words and word order:
  • "toys queries" has ambiguous intent
  • "distance earth moon" has clear intent, but many equivalent forms: "earth moon distance", "earth distance moon", ...

Challenges: Syntactic Parsing of Queries

• No standard
• No ground-truth
• Why is syntactic parsing of queries even a legitimate problem?

Derive Syntax from Semantics [Sun et al 2016]

• Query: "thai food houston"
• Parse the clicked sentence
• Project its dependencies onto the query

A Treebank for Short Texts

• Given query q and q's clicked sentence s:
  • Parse each s
  • Project dependencies from s to q
  • Aggregate the projected dependencies

Algorithm of Projection

Result Examples

Results

• Random queries: QueryParser UAS 0.83, LAS 0.75; Stanford UAS 0.72, LAS 0.64
• Queries with no function words: QueryParser UAS 0.82, LAS 0.73; Stanford UAS 0.70, LAS 0.61
• Queries with function words: QueryParser UAS 0.90, LAS 0.85; Stanford UAS 0.86, LAS 0.80

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embeddings [Kenter and Rijke 2015]

• Measuring similarity between two short texts / sentences
• Basic idea: word-by-word comparison using embedding vectors
• Use a saliency-weighted semantic graph to compute similarity

Short Text Similarity Using Word Embeddings [Kenter and Rijke 2015]

• Similarity measurement between a longer text sl and a shorter text ss, inspired by BM25; sem(w, ss) is the semantic similarity between term w and short text ss:

  f_sts(sl, ss) = Σ_{w ∈ sl} IDF(w) · sem(w, ss) · (k1 + 1) / ( sem(w, ss) + k1 · (1 − b + b · |ss| / avgl) )

• Features acquired: bins of all edges, bins of max edges
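A minimal sketch of the saliency-weighted similarity above; the embeddings, IDF values, and hyper-parameters are toy assumptions:

import math

emb = {"thai": [1.0, 0.1], "food": [0.2, 1.0], "cuisine": [0.3, 0.95],
       "houston": [0.7, 0.7], "restaurant": [0.4, 0.9]}
idf = {w: 1.0 for w in emb}

def cos(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a*a for a in u)) * math.sqrt(sum(b*b for b in v)))

def sem(w, ss):
    """Semantic match of term w against short text ss: best embedding similarity."""
    return max(cos(emb[w], emb[x]) for x in ss)

def f_sts(sl, ss, k1=1.2, b=0.75, avgl=3.0):
    score = 0.0
    for w in sl:
        s = sem(w, ss)
        score += idf[w] * s * (k1 + 1) / (s + k1 * (1 - b + b * len(ss) / avgl))
    return score

print(round(f_sts(["thai", "food", "houston"], ["thai", "cuisine", "restaurant"]), 3))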

From the Concept View [Wang et al 2015a]

• Bags of concepts: each short text is parsed and conceptualized (term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, concept orthogonalization) using the semantic (isA) network and the co-occurrence network
• Short Text 1 → Concept Vector 1 [(c1, score1), (c2, score2), ...]; Short Text 2 → Concept Vector 2 [(c1', score1'), (c2', score2'), ...]; similarity is then computed between the two concept vectors

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

• Explicit short text understanding benefits a lot of application scenarios:
  • Ads / search semantic match
  • Definition mining
  • Query recommendation
  • Web table understanding
  • Semantic search
  • ...

Ads Keyword Selection [Wang et al 2015a]

• (Figures: bar charts over query Deciles 4–10, one for Mainline Ads on a 0.00–6.00 scale and one for Sidebar Ads on a 0.00–0.60 scale)

Definition Mining [Hao et al 2016]

• Definition scenarios: search engines, QnA, etc.
• Why conceptualization is useful for definition mining; example: "What is Emphysema?"
  • Answer 1: "Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year"
  • Answer 2: "Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing"
• The first sentence has the form of a definition; embedding is helpful to some extent, but it also returns high similarity scores for both (emphysema, disease) and (emphysema, smoking)
• Conceptualization can provide strong semantics; contextual embedding can also provide semantic similarity beyond Is-A

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

• Framework (figure):
  • Offline: training data → concept weighting and model learning → a concept model (Model 1 ... Model i ... Model N) for each class (Class 1 ... Class i ... Class N)
  • Online: original short text (e.g., "justin bieber graduates") → entity extraction → conceptualization against the knowledge base → concept vector → candidate generation → classification & ranking (e.g., <Music, Score>)

Concept based Short Text Classification and Ranking [Wang et al 2014a]

• (Figures: each category, e.g., TV, Music, Movie, is mapped into the concept space using the article titles / tags in that category, yielding per-concept weights ω_i, ω_j; a query is conceptualized into the same space with probabilities p_i, p_j and is scored against each category's concept model)

Precision performance on each category [Wang et al 2014a]

Category   BocSTC  LM_ch  SVM   VSM_cosine  LM_d  Entity_ESA
Movie      0.71    0.91   0.84  0.81        0.72  0.56
Money      0.97    0.95   0.54  0.57        0.52  0.74
Music      0.97    0.90   0.88  0.73        0.68  0.58
TV         0.96    0.46   0.92  0.56        0.51  0.55

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, Jamie Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In SIGMOD 2008

bull [ Auer et al 2007 ] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, Zachary G Ives. DBpedia: A Nucleus for a Web of Open Data. In ISWC/ASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu, Ce Zhang, Christopher De Sa, Jaeho Shin, Feiran Wang and C. Ré. Incremental Knowledge Base Construction Using DeepDive. In VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase, Michael Strube, Benjamin Börschinger, Cäcilia Zirn and Anas Elghafari. WikiNet: A very large scale multi-lingual concept network. In LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Catherine Havasi. ConceptNet 5: A large semantic network for relational knowledge. The People's Web Meets NLP, Springer Berlin Heidelberg, 2013

bull [ Hua et al 2016 ] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng and Xiaofang Zhou. "Understand Short Texts by Harvesting and Analyzing Semantic Knowledge". IEEE Transactions on Knowledge and Data Engineering (TKDE), 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li, Haixun Wang, Kenny Q Zhu, Zhongyuan Wang, Xue-Gang Hu and Xindong Wu. A Large Probabilistic Semantic Network based Approach to Compute Term Similarity. In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10): 2604-2617, 2015

bull [ Rosch et al 1976 ] Eleanor Rosch, Carolyn B Mervis, Wayne D Gray, David M Johnson and Penny Boyes-Braem. Basic objects in natural categories. Cognitive Psychology 8(3):382–439, 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny ZhuToward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 42: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

How to Make BLC (Basic-level Conceptualization)

• Naive approaches:
• Typicality: an important measure for understanding the relationship between an object and its concept
• Pointwise Mutual Information (PMI): a common measure of the strength of association between two terms

Naive Approach 1: Typicality

P(robin|bird) > P(penguin|bird): "robin" is a more typical bird than "penguin".
P(USA|country) > P(Seychelles|country): "USA" is a more typical country than "Seychelles".

Using Typicality for BLC

• Associate each isA relationship (e isA c) with typicality scores P(e|c) and P(c|e):

P(e|c) = n(c, e) / n(c)        P(c|e) = n(c, e) / n(e)

bull P(e|c) indicates how typical (or popular) e is in the given concept c

bull P(c|e) indicates how typical (or popular) the concept c is given e

• However, typicality alone is not enough: for e = Microsoft, the concept "company" has high typicality P(c|e), while the very specific concept "largest desktop OS vendor" has high typicality P(e|c); neither score by itself identifies the basic-level concept.
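As a minimal sketch (assuming toy isA counts rather than a real taxonomy such as Probase; the counts and helper names below are illustrative, not from the slides), the two typicality scores can be computed directly from n(c, e):

from collections import defaultdict

# Hypothetical isA co-occurrence counts n(c, e).
isa_counts = {
    ("company", "microsoft"): 9000,
    ("company", "google"): 8000,
    ("largest desktop os vendor", "microsoft"): 30,
    ("bird", "robin"): 5000,
    ("bird", "penguin"): 800,
}

n_c = defaultdict(int)   # n(c): total count of concept c
n_e = defaultdict(int)   # n(e): total count of instance e
for (c, e), n in isa_counts.items():
    n_c[c] += n
    n_e[e] += n

def p_e_given_c(e, c):
    """Typicality P(e|c) = n(c, e) / n(c)."""
    return isa_counts.get((c, e), 0) / n_c[c]

def p_c_given_e(c, e):
    """Typicality P(c|e) = n(c, e) / n(e)."""
    return isa_counts.get((c, e), 0) / n_e[e]

print(p_e_given_c("robin", "bird"), p_e_given_c("penguin", "bird"))      # robin is the more typical bird
print(p_c_given_e("company", "microsoft"))                               # high P(c|e) for "company"
print(p_e_given_c("microsoft", "largest desktop os vendor"))             # high P(e|c) for the narrow concept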

Naive Approach 2: PMI [Manning and Schutze 1999]

• Pointwise mutual information (PMI) is a measure of association used in information theory and statistics.

• Consider using the PMI between concept c and instance e to find the basic-level concepts, as follows:
PMI(e, c) = log [ P(e, c) / (P(e) P(c)) ] = log P(e|c) − log P(e)

• However:
• In basic level of categorization we are interested in finding a concept for a given e, which means P(e) is a constant.
• Thus ranking by PMI(e, c) is the same as ranking by P(e|c).

Using Rep(e, c) for BLC [Wang et al 2015b]

• The measure Rep(e, c) = P(c|e) * P(e|c) means:

• (With PMI) If we take the logarithm of the scoring function, we get
log Rep(e, c) = log [ P(c|e) * P(e|c) ] = log [ (P(e, c)/P(e)) * (P(e, c)/P(c)) ] = log [ P(e, c)^2 / (P(e) P(c)) ] = PMI(e, c) + log P(e, c), i.e., the PMI^2 measure.

• (With Commute Time) The commute time between an instance e and a concept c is
Time(e, c) = Σ_{k=1..∞} (2k) * P_k(e, c) = Σ_{k=1..T} (2k) * P_k(e, c) + Σ_{k=T+1..∞} (2k) * P_k(e, c)
           ≥ Σ_{k=1..T} (2k) * P_k(e, c) + 2(T + 1) * (1 − Σ_{k=1..T} P_k(e, c)) = 4 − 2 * Rep(e, c)
(taking T = 1, where P_1(e, c) = P(c|e) P(e|c) = Rep(e, c)).

• Given e, c should be its typical concept (shortest distance); given c, e should be its typical instance (shortest distance).
• Ranking by Rep(e, c) is thus a process of finding the concept nodes having the shortest expected distance to e.
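A hedged, self-contained sketch of ranking candidate concepts of an instance by Rep(e, c); the counts are again made up for illustration:

from collections import defaultdict

# Illustrative isA counts n(c, e); in Probase these come from Hearst-pattern extractions.
isa_counts = {("company", "microsoft"): 9000, ("company", "google"): 8000,
              ("company", "facebook"): 7000, ("company", "acme corp"): 40000,
              ("software company", "microsoft"): 2500, ("software company", "adobe"): 1500,
              ("largest desktop os vendor", "microsoft"): 30}

n_c, n_e = defaultdict(int), defaultdict(int)
for (c, e), n in isa_counts.items():
    n_c[c] += n
    n_e[e] += n

def rep(e, c):
    # Rep(e, c) = P(c|e) * P(e|c); its log equals PMI(e, c) + log P(e, c) (= PMI^2).
    n_ce = isa_counts.get((c, e), 0)
    return (n_ce / n_e[e]) * (n_ce / n_c[c])

cands = [c for (c, e) in isa_counts if e == "microsoft"]
print(sorted(cands, key=lambda c: rep("microsoft", c), reverse=True))
# 'software company' should rank first with these toy counts: neither the very broad
# "company" nor the very narrow "largest desktop os vendor" maximizes Rep.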

Evaluations on Different Measures for BLC (Precision/NDCG at k = 1, 2, 3, 5, 10, 15, 20; two result tables)

                     k=1    k=2    k=3    k=5    k=10   k=15   k=20
No smoothing
MI(e)                0.769  0.692  0.705  0.685  0.719  0.705  0.690
PMI3(e)              0.885  0.769  0.756  0.800  0.754  0.733  0.721
NPMI(e)              0.692  0.692  0.667  0.638  0.627  0.610  0.610
Typicality P(c|e)    0.462  0.577  0.603  0.577  0.569  0.564  0.556
Typicality P(e|c)    0.500  0.462  0.526  0.523  0.523  0.510  0.521
Rep(e)               0.846  0.865  0.872  0.862  0.758  0.731  0.719
Smoothing = 0.001
MI(e)                0.577  0.615  0.628  0.600  0.612  0.605  0.592
PMI3(e)              0.731  0.673  0.692  0.654  0.669  0.644  0.623
NPMI(e)              0.923  0.827  0.769  0.746  0.731  0.695  0.671
Typicality P(c|e)    0.462  0.577  0.603  0.577  0.569  0.564  0.554
Typicality P(e|c)    0.885  0.865  0.872  0.831  0.785  0.741  0.704
Rep(e)               0.846  0.731  0.718  0.723  0.700  0.669  0.638
Smoothing = 0.0001
MI(e)                0.615  0.615  0.654  0.608  0.635  0.628  0.612
PMI3(e)              0.846  0.731  0.731  0.715  0.723  0.685  0.677
NPMI(e)              0.885  0.904  0.885  0.869  0.823  0.777  0.752
Typicality P(c|e)    0.462  0.577  0.603  0.577  0.569  0.564  0.556
Typicality P(e|c)    0.885  0.904  0.910  0.877  0.831  0.813  0.777
Rep(e)               0.923  0.846  0.833  0.815  0.781  0.736  0.719
Smoothing = 1e-5
MI(e)                0.615  0.635  0.667  0.662  0.677  0.656  0.646
PMI3(e)              0.885  0.769  0.744  0.777  0.758  0.731  0.710
NPMI(e)              0.885  0.846  0.872  0.869  0.831  0.810  0.787
Typicality P(c|e)    0.462  0.577  0.603  0.577  0.569  0.564  0.556
Typicality P(e|c)    0.769  0.808  0.846  0.823  0.808  0.782  0.765
Rep(e)               0.885  0.904  0.872  0.862  0.812  0.800  0.767
Smoothing = 1e-6
MI(e)                0.769  0.673  0.705  0.677  0.700  0.692  0.679
PMI3(e)              0.885  0.769  0.756  0.785  0.773  0.726  0.723
NPMI(e)              0.885  0.846  0.821  0.815  0.750  0.726  0.719
Typicality P(c|e)    0.462  0.577  0.603  0.577  0.569  0.564  0.556
Typicality P(e|c)    0.538  0.615  0.615  0.615  0.608  0.613  0.615
Rep(e)               0.846  0.885  0.897  0.877  0.788  0.777  0.765
Smoothing = 1e-7
MI(e)                0.769  0.692  0.705  0.685  0.719  0.703  0.688
PMI3(e)              0.885  0.769  0.756  0.792  0.758  0.736  0.725
NPMI(e)              0.769  0.750  0.718  0.700  0.650  0.641  0.633
Typicality P(c|e)    0.462  0.577  0.603  0.577  0.569  0.564  0.556
Typicality P(e|c)    0.500  0.481  0.526  0.523  0.531  0.523  0.523
Rep(e)               0.846  0.865  0.872  0.854  0.765  0.749  0.733

                     k=1    k=2    k=3    k=5    k=10   k=15   k=20
No smoothing
MI(e)                0.516  0.531  0.519  0.531  0.562  0.574  0.594
PMI3(e)              0.725  0.664  0.652  0.660  0.628  0.631  0.646
NPMI(e)              0.599  0.597  0.579  0.554  0.540  0.539  0.549
Typicality P(c|e)    0.297  0.380  0.409  0.422  0.438  0.446  0.461
Typicality P(e|c)    0.401  0.386  0.396  0.398  0.401  0.410  0.428
Rep(e)               0.758  0.771  0.745  0.723  0.656  0.647  0.661
Smoothing = 1e-3
MI(e)                0.374  0.414  0.441  0.448  0.473  0.481  0.495
PMI3(e)              0.484  0.511  0.509  0.502  0.519  0.525  0.533
NPMI(e)              0.692  0.652  0.607  0.603  0.585  0.585  0.592
Typicality P(c|e)    0.297  0.380  0.409  0.422  0.438  0.446  0.460
Typicality P(e|c)    0.703  0.697  0.704  0.681  0.637  0.628  0.626
Rep(e)               0.621  0.580  0.554  0.561  0.554  0.555  0.559
Smoothing = 1e-4
MI(e)                0.407  0.430  0.458  0.462  0.492  0.503  0.512
PMI3(e)              0.648  0.604  0.579  0.575  0.578  0.576  0.590
NPMI(e)              0.747  0.777  0.761  0.737  0.700  0.685  0.688
Typicality P(c|e)    0.297  0.380  0.409  0.422  0.438  0.446  0.461
Typicality P(e|c)    0.791  0.795  0.802  0.767  0.738  0.729  0.724
Rep(e)               0.758  0.714  0.711  0.689  0.653  0.636  0.653
Smoothing = 1e-5
MI(e)                0.429  0.465  0.478  0.501  0.517  0.528  0.545
PMI3(e)              0.725  0.647  0.642  0.642  0.627  0.624  0.638
NPMI(e)              0.813  0.779  0.778  0.765  0.730  0.723  0.729
Typicality P(c|e)    0.297  0.380  0.409  0.422  0.438  0.446  0.461
Typicality P(e|c)    0.709  0.728  0.735  0.722  0.702  0.696  0.703
Rep(e)               0.791  0.787  0.762  0.739  0.707  0.703  0.706
Smoothing = 1e-6
MI(e)                0.516  0.510  0.515  0.526  0.546  0.563  0.579
PMI3(e)              0.725  0.655  0.651  0.654  0.641  0.631  0.649
NPMI(e)              0.791  0.766  0.732  0.728  0.673  0.659  0.668
Typicality P(c|e)    0.297  0.380  0.409  0.422  0.438  0.446  0.461
Typicality P(e|c)    0.495  0.516  0.520  0.508  0.512  0.521  0.540
Rep(e)               0.758  0.784  0.767  0.755  0.691  0.686  0.694
Smoothing = 1e-7
MI(e)                0.516  0.531  0.519  0.530  0.562  0.571  0.592
PMI3(e)              0.725  0.664  0.652  0.658  0.630  0.631  0.647
NPMI(e)              0.670  0.655  0.633  0.604  0.575  0.570  0.581
Typicality P(c|e)    0.297  0.380  0.409  0.422  0.438  0.446  0.461
Typicality P(e|c)    0.423  0.421  0.415  0.407  0.414  0.424  0.438
Rep(e)               0.758  0.771  0.745  0.725  0.663  0.661  0.668

Single Instance

• Is this instance ambiguous?
• What are its basic-level concepts?
• What are its similar instances?

What is the Semantic Similarity?
• Are the following instance pairs similar?
• <apple, microsoft>
• <apple, pear>
• <apple, fruit>
• <apple, food>
• <apple, ipad>
• <car, journey>

Approaches on Term Similarity

• Categories of approaches for semantic similarity:
• String based approach
• Knowledge based approach: use preexisting thesauri, taxonomies, or encyclopedias such as WordNet
• Corpus based approach: use contexts of terms extracted from web pages, web search snippets, or other text repositories
• Embedding based approach: introduced in detail in "Part 3: Implicit Understanding"

Approaches on Term Similarity (2)

• Categories:

[Figure: a categorization of term-similarity approaches. String based approaches; knowledge based approaches (WordNet), split into path length / lexical chain-based (e.g., Rada 1989, Hirst 1998) and information content-based (e.g., Resnik 1995, Jcn 1997, Lin 1998, Sánchez 2011); and corpus based approaches, split into graph learning algorithm based and snippet search based (e.g., Agirre 2010, Alvarez 2007, Chen 2006, Do 2009, Bol 2011, Ban 2002, HunTray 2005). Several of these are marked as state-of-the-art approaches.]

• Framework:

Term Similarity Using Semantic Networks [Li et al 2013 Li et al 2015]

Framework: given a term pair <t1, t2>:

Step 1: Type Checking. The pair is classified as a concept pair, an entity pair, or a concept-entity pair.

Step 2: Context Representation (Vector).
• Entity pair: collect the concept-distribution context of each entity, giving context vectors T(t1) and T(t2).
• Concept pair: collect the entity-distribution context of each concept and cluster the concepts, giving cluster context vectors Cx(t1) and Cy(t2).
• Concept-entity pair: collect the concepts of the entity term (say t1), cluster them, and for each cluster Ci(t1) select the top-k concepts cx.

Step 3: Context Similarity.
• Entity pair: Cosine(T(t1), T(t2)).
• Concept pair: max over (x, y) of Cosine(Cx(t1), Cy(t2)).
• Concept-entity pair: for each pair <t2, cx>, compute the similarity and take max sim(t2, cx) as the similarity of <t1, t2>.
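A hedged sketch of the entity-pair branch (Steps 2 and 3): each entity term is represented by an assumed concept-distribution vector and the two vectors are compared with cosine similarity; the concept distributions below are invented for illustration.

import math

# Hypothetical concept distributions T(t) for two entity terms (in the real system
# these come from a probabilistic isA network such as Probase).
T_banana = {"fruit": 0.62, "food": 0.25, "tropical fruit": 0.10, "plant": 0.03}
T_pear   = {"fruit": 0.58, "food": 0.28, "tree": 0.09, "juicy fruit": 0.05}

def cosine(u, v):
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

print(round(cosine(T_banana, T_pear), 3))  # high similarity, as in the <banana, pear> example below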

An example [Li et al 2013 Li et al 2015]

For example, <banana, pear>:

Step 1: Type Checking. <banana, pear> is an entity pair.
Step 2: Context Representation (Vector). Collect the concept context vector of each entity.
Step 3: Context Similarity. Similarity evaluation: Cosine(T(t1), T(t2)) = 0.916

Examples:

Term 1               Term 2              Similarity
lunch                dinner              0.9987
tiger                jaguar              0.9792
car                  plane               0.9711
television           radio               0.9465
technology company   microsoft           0.8208
high impact sport    competitive sport   0.8155
employer             large corporation   0.5353
fruit                green pepper        0.2949
travel               meal                0.0426
music                lunch               0.0116
alcoholic beverage   sports equipment    0.0314
company              table tennis        0.0003

Complete results: http://adapt.seiee.sjtu.edu.cn/similarity/SimCompleteResults.pdf

Statistics of Search Queries

(a) By traffic: 1 term 44%, 2 terms 29%, 3 terms 17%, 4 terms 7%, 5 terms 2%, more than 5 terms 1%.
(b) By # of distinct queries: 1 term 10%, 2 terms 26%, 3 terms 34%, 4 terms 19%, 5 terms 7%, more than 5 terms 4%.

[Charts: the corresponding distributions of queries by number of instances (1, 2, 3, 4, 5, more than 5 instances), with examples such as Pokémon Go and Microsoft HoloLens.]

If the short text has context for the instance…
• python tutorial
• dangerous python
• moon earth distance
• …

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Supervised Segmentation [Bergsma et al 2007]

• Problem: divide the query into semantic units
• Approach: turn segmentation into position-based binary classification

Example query: "two man power saw"
Candidate segmentations:
[two man] [power saw]
[two] [man] [power saw]
[two] [man power] [saw]

Input: a query and its positions
Output: the decision for making a segmentation break at each position

Supervised Segmentation

• Features:
• Decision boundary features, e.g., indicator features (the POS tags in the query) and position features (forward/backward)
• Statistical features, e.g., mutual information between the left and right parts ("bank loan | amortization schedule")
• Context features, e.g., surrounding context information ("female" next to "bus driver")
• Dependency features, e.g., "female" depends on "driver"

Supervised Segmentation

• Segmentation Overview: the input query "two man power saw" is scanned position by position; an SVM classifier over the learned features outputs a segmentation decision (yes/no) for each position.
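A hedged sketch of the position-based binary classification setup; the features are deliberately reduced to a few toy statistical/position features, whereas [Bergsma et al 2007] uses much richer decision-boundary, context, and dependency features:

from sklearn.svm import LinearSVC

# Each training example describes one gap between adjacent words in a query:
# features = [PMI(left word, right word), left word length, right word length, position index]
X_train = [
    [5.1, 5, 3, 1],   # "power|saw"  -> no break ("power saw" is one unit)
    [0.2, 3, 5, 0],   # "man|power"  -> break
    [4.0, 3, 3, 0],   # "two|man"    -> no break ("two man")
    [0.1, 4, 4, 1],   # another gap with low PMI -> break
]
y_train = [0, 1, 0, 1]  # 1 = insert a segmentation break at this position

clf = LinearSVC()
clf.fit(X_train, y_train)

# At query time, score every gap of "two man power saw" the same way.
X_query = [[4.0, 3, 3, 0], [0.2, 3, 5, 1], [5.1, 5, 3, 2]]
print(clf.predict(X_query))  # e.g. [0 1 0] -> [two man] [power saw]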

Unsupervised Segmentation [Tan et al 2008]

bull Unsupervised learning for query segmentation

Probability of a generated segmentation S for query Q (unigram model over segments):
P(S|Q) = P(s1) P(s2|s1) ... P(sm|s1 s2 ... s(m-1)) ≈ Π_{si ∈ S} P(si)

A split point is a valid segment boundary if and only if the pointwise mutual information between the two resulting segments is negative:
MI(sk, sk+1) = log [ Pc([sk sk+1]) / (Pc(sk) · Pc(sk+1)) ] < 0

Example: "new york times subscription"
log [ Pc([new york]) / (Pc(new) · Pc(york)) ] > 0, so there is no segment boundary between "new" and "york".

Unsupervised Segmentation

• Find the top-k segmentations with dynamic programming
• Using EM optimization on the fly

Input: a query w1 w2 ... wn (the words in the query) and the concept/segment probability distribution
Output: the top-k segmentations with the highest likelihood
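A hedged sketch of segmenting with the unigram model: score a segmentation by the product of segment probabilities and search split points with dynamic programming; the probability table is a toy stand-in for the statistics used in [Tan et al 2008]:

import math

# Toy segment probabilities P(s); unseen segments get a small back-off probability.
P = {"new york": 1e-3, "new york times": 5e-4, "times": 2e-3,
     "new": 4e-3, "york": 1e-3, "subscription": 1e-3}
BACKOFF = 1e-9

def seg_logprob(segment):
    return math.log(P.get(segment, BACKOFF))

def best_segmentation(words, max_len=4):
    """DP over split points: best[i] = best log-likelihood of words[:i]."""
    best = [0.0] + [float("-inf")] * len(words)
    back = [0] * (len(words) + 1)
    for i in range(1, len(words) + 1):
        for j in range(max(0, i - max_len), i):
            score = best[j] + seg_logprob(" ".join(words[j:i]))
            if score > best[i]:
                best[i], back[i] = score, j
    # Recover the segments from the backpointers.
    segs, i = [], len(words)
    while i > 0:
        segs.append(" ".join(words[back[i]:i]))
        i = back[i]
    return list(reversed(segs))

print(best_segmentation("new york times subscription".split()))
# -> ['new york times', 'subscription'] with the toy probabilities above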

Exploit Click-through [Li et al 2011]

• Motivation:
• Probabilistic query segmentation
• Use click-through data: query Q -> clicked URL -> document D

Input query: "bank of america online banking"
Output: top-3 segmentations
[bank of america] [online banking]    0.502
[bank of america online banking]      0.428
[bank of] [america] [online banking]  0.001

Exploit Click-through

• Segmentation Model: an interpolated model that combines global information with click-through information.

Query: [credit card] [bank of america]
Clicked HTML documents:
1. bank of america credit cards contact us overview
2. secured visa credit card from bank of america
3. credit cards overview: find the right bank of america credit card for you

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Sense Changes with Different Context

"watch harry potter" -> Movie;  "read harry potter" -> Book;  "age harry potter" -> Character;  "harry potter walkthrough" -> Game

Entity Recognition in Query [Guo et al 2009]

bull Motivation

Detect the named entity in a short text and categorize it.

Example: "harry potter walkthrough" is a single-named-entity query. It is represented as a triple <e, t, c> = ("harry potter", "walkthrough", "game"), where e is the (possibly ambiguous) named entity, t the context term(s), and c the class of the entity.

Entity Recognition in Query

• Probabilistic Generative Model

Goal: given a query q, find the triple <e, t, c> that maximizes the probability of generating the triple, Pr(e, t, c) = Pr(e) Pr(c|e) Pr(t|c), assuming the context depends only on the class (e.g., "walkthrough" depends only on the class game, not on the entity harry potter).

The problem then becomes how to estimate Pr(e), Pr(c|e), and Pr(t|c).
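A hedged sketch of the scoring step: given assumed estimates of Pr(e), Pr(c|e), and Pr(t|c) (in the paper these are learned, e.g., with WS-LDA), enumerate the candidate <e, t, c> triples of a query and keep the most probable one:

# Toy probability tables; the real ones are learned from query logs in [Guo et al 2009].
P_e = {"harry potter": 0.7, "potter": 0.3}
P_c_given_e = {"harry potter": {"game": 0.3, "movie": 0.4, "book": 0.3},
               "potter": {"occupation": 1.0}}
P_t_given_c = {"game": {"walkthrough": 0.5, "cheats": 0.5},
               "movie": {"trailer": 0.6, "cast": 0.4},
               "book": {"author": 1.0},
               "occupation": {"wheel": 1.0}}

def interpret(query):
    words = query.split()
    best, best_score = None, 0.0
    # Enumerate every contiguous span as the candidate entity e; the rest is context t.
    for i in range(len(words)):
        for j in range(i + 1, len(words) + 1):
            e = " ".join(words[i:j])
            t = " ".join(words[:i] + words[j:])
            if e not in P_e:
                continue
            for c, p_ce in P_c_given_e[e].items():
                score = P_e[e] * p_ce * P_t_given_c.get(c, {}).get(t, 1e-6)
                if score > best_score:
                    best, best_score = (e, t, c), score
    return best, best_score

print(interpret("harry potter walkthrough"))
# -> (('harry potter', 'walkthrough', 'game'), ...)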

Entity Recognition in Query

• Probability Estimation by Learning

Learning objective: max Π_{i=1..N} P(e_i, t_i, c_i)

Challenge: it is difficult as well as time-consuming to manually assign class labels to named entities in queries.

Build a training set T = {(e_i, t_i)} and view c_i as a hidden variable. The new learning problem becomes
max Π_{i=1..N} P(e_i, t_i) = max Π_{i=1..N} Σ_c P(e_i) P(c|e_i) P(t_i|c),
which is solved with the topic model WS-LDA.

Signal from Click [Pantel et al 2012]

• Motivation: predict the entity type in Web search from the entity, the user intent, the context, and the clicks, using a generative model over a query type distribution (73 entity types).

Signal from Click

• Joint Model for Prediction

[Figure: plate diagram of the generative model. For each query: pick an entity type from the distribution over types, pick an entity from the entity distribution, pick an intent from the intent distribution, pick context words from the word distribution, and pick a click from the host distribution.]

Telegraphic Query interpretation [Sawant et al 2013 Joshi et al 2014]

• Entity-seeking telegraphic queries, e.g., query "Germany capital" with result entity "Berlin"
• Interpretation = Segmentation + Annotation
• Combines a knowledge base (for accuracy) with a large corpus (for recall)

• Overview:

Joint Interpretation and Ranking [Sawant et al 2013 Joshi et al 2014]

[Figure: the annotated corpus and the telegraphic query feed two models for interpretation and ranking, a generative model and a discriminative model, whose output is a ranked list of candidate answer entities e1, e2, e3, ...]

bull Generative Model

Joint Interpretation and Ranking [Sawant et al 2013]

[Figure: for the query q = "losing team baseball world series 1998", the corpus sentence "Padres have been to two World Series, losing in 1984 and 1998" mentions the candidate entity E = San Diego Padres with type T = major league baseball team. The query words "baseball team" act as a type hint, "lost 1998 world series" is handled by context matchers, and a switch variable Z decides whether each query word is generated by the type model or the context model. Borrowed from U. Sawant (2013).]

Based on Probabilistic Language Models

bull Discriminative Model

Joint Interpretation and Ranking [Sawant et al 2013]

[Figure: the query words "losing team baseball world series 1998" are matched against candidate entities. For the correct entity San_Diego_Padres the matched target type is t = baseball team; for an incorrect entity such as 1998_World_Series the matched type is t = series. Features of these matches feed the ranker.]

Based on max-margin discriminative learning

bull Queries seek answer entities (e2)

bull Contain (query) entities (e1) target types (t2) relations (r) and selectors (s)

Telegraphic Query Interpretation [Joshi et al 2014]

Examples (borrowed from M. Joshi, 2014), decomposed as query -> e1, r, t2, s:
• "dave navarro first band": e1 = dave navarro, r = band, t2 = band, s = first; or, with no explicit relation, e1 = dave navarro, r = -, t2 = band, s = first.
• "spider automobile company": e1 = spider, r = automobile company, t2 = automobile company, s = -; or e1 = -, r = automobile company, t2 = company, s = spider.

Improved Generative Model

• The generative model of [Sawant et al 2013] is extended in [Joshi et al 2014] to also consider e1 (in q) and the relation r.

Improved Discriminative Model

• The discriminative model of [Sawant et al 2013] is extended in [Joshi et al 2014] in the same way, to consider e1 (in q) and r.

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

bull Input a short text

bull Output semantic interpretation

bull Three steps in understanding a short text

Example: "wanna watch eagles band"
Step 1: Text Segmentation – divide the short text into a sequence of terms in the vocabulary: watch | eagles | band
Step 2: Type Detection – determine the best type of each term: watch[verb] eagles[entity] band[concept]
Step 3: Concept Labeling – infer the best concept of each entity within context: watch[verb] eagles[entity](band) band[concept]

Text segmentation
• Observations:
• Mutual Exclusion – terms containing the same word mutually exclude each other
• Mutual Reinforcement – related terms mutually reinforce each other
• Build a Candidate Term Graph (CTG)

[Figure: candidate term graphs for "vacation april in paris" (candidate terms: vacation, april, paris, april in paris) and "watch harry potter" (candidate terms: watch, harry, potter, harry potter), with affinity weights on the edges between compatible terms.]

Find the best segmentation

• Best segmentation = the sub-graph of the CTG which:
• is a complete graph (clique) with no mutual exclusion (i.e., it is a valid segmentation)
• has 100% word coverage (except for stopwords)
• has the largest average edge weight

[Figure: the candidate term graphs for "vacation april in paris" and "watch harry potter", with a valid segmentation and the best segmentation highlighted.]

Find the best segmentation

• Equivalently, the best segmentation corresponds to a maximal clique of the CTG with full word coverage and the largest average edge weight.

[Figure: the maximal cliques and the best segmentation highlighted in the candidate term graphs.]
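A hedged sketch of the clique search over a candidate term graph; the candidate terms, edge weights, and stopword handling below are made up for illustration, while the real weights come from term affinity statistics:

from itertools import combinations

terms = ["vacation", "april", "paris", "april in paris"]
words = {"vacation": {"vacation"}, "april": {"april"},
         "paris": {"paris"}, "april in paris": {"april", "in", "paris"}}
# Toy affinity weights between mutually compatible candidate terms.
weight = {("vacation", "april"): 0.029, ("vacation", "paris"): 0.047,
          ("april", "paris"): 0.005, ("vacation", "april in paris"): 0.041}

def compatible(a, b):
    return not (words[a] & words[b])          # mutual exclusion: no shared word

def edge_w(a, b):
    return weight.get((a, b), weight.get((b, a), 0.0))

def avg_weight(subset):
    pairs = list(combinations(subset, 2))
    return sum(edge_w(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0

query_words = {"vacation", "april", "paris"}   # stopword "in" ignored
best, best_score = None, -1.0
for r in range(1, len(terms) + 1):
    for subset in combinations(terms, r):
        if any(not compatible(a, b) for a, b in combinations(subset, 2)):
            continue                           # not a clique in the compatibility graph
        covered = set().union(*(words[t] for t in subset)) - {"in"}
        if covered != query_words:
            continue                           # must cover all non-stopword words
        score = avg_weight(subset)
        if score > best_score:
            best, best_score = subset, score
print(best)   # -> ('vacation', 'april in paris') with these toy weights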

Type Detection

• Pairwise Model: find the best typed-term for each term so that the Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight.

[Figure: for "watch free movie", candidate typed-terms watch[verb] / watch[entity] / watch[concept], free[adjective] / free[verb], and movie[concept] / movie[entity].]

Concept Labeling

• Entity disambiguation is the most important task of concept labeling:
• filter/re-rank the original concept cluster vector
• Weighted-Vote: the final score of each concept cluster is a combination of its original score and the support from the context, using concept co-occurrence

Example: "watch harry potter" -> movie;  "read harry potter" -> book
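A hedged sketch of the Weighted-Vote step: the initial concept-cluster scores of the ambiguous entity are interpolated with the (toy) co-occurrence support contributed by the context term:

# Initial concept-cluster scores for the ambiguous entity "harry potter".
initial = {"movie": 0.35, "book": 0.35, "game": 0.30}

# Toy co-occurrence support: how strongly the concepts evoked by the context term
# ("watch" or "read") vote for each candidate cluster.
support = {"watch": {"movie": 0.9, "book": 0.2, "game": 0.4},
           "read":  {"movie": 0.2, "book": 0.9, "game": 0.1}}

def weighted_vote(initial, context_term, alpha=0.5):
    # final(c) = alpha * original score + (1 - alpha) * support from the context
    s = support.get(context_term, {})
    return {c: alpha * p + (1 - alpha) * s.get(c, 0.0) for c, p in initial.items()}

for ctx in ("watch", "read"):
    scores = weighted_vote(initial, ctx)
    print(ctx, "harry potter ->", max(scores, key=scores.get))  # watch -> movie, read -> book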

Example of Entity Disambiguation[Hua et al 2015 (ICDE Best Paper) Hua et al 2016]

[Figure: the conceptualization pipeline. A short text is parsed; terms are clustered by isA against the semantic network; concepts are filtered by co-occurrence against the co-occurrence network; head/modifier analysis and concept orthogonalization follow, producing a concept vector (c1, p1), (c2, p2), (c3, p3), ... For "ipad apple", the isA concepts of "apple" (fruit, company, food, product, ...) are filtered by co-occurrence with the concepts of "ipad" (product, device, ...), leaving concepts such as company, brand, device, and product.]

Mining Lexical Relationships[Wang et al 2015b]

• Lexical knowledge represented by the probabilities (e: instance, t: term, c: concept, z: role), illustrated on "watch harry potter":
① p(z|t), e.g., p(verb|watch), p(instance|watch)
② p(c|e) = p(c|t, z = instance), e.g., p(movie|harry potter), p(book|harry potter)
③ p(c|t, z), e.g., p(movie|watch, verb)

Understanding Queries [Wang et al 2015b]

• Goal: to rank the concepts and find argmax_c p(c|t, q)
• The query and all its possible segmentations are matched against the offline semantic network, and concepts are scored by a random walk with restart [Sun et al 2005] on the online subgraph.
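A hedged sketch of random walk with restart on a small, made-up online subgraph linking the query's terms to candidate concepts; concepts are then ranked by their RWR score:

import numpy as np

# Nodes of a toy online subgraph for the query "watch harry potter".
nodes = ["watch", "harry potter", "movie", "book", "verb"]
edges = {("watch", "verb"): 0.8, ("watch", "movie"): 0.6,
         ("harry potter", "movie"): 0.7, ("harry potter", "book"): 0.7}

idx = {n: i for i, n in enumerate(nodes)}
W = np.zeros((len(nodes), len(nodes)))
for (a, b), w in edges.items():
    W[idx[a], idx[b]] = W[idx[b], idx[a]] = w
P = W / W.sum(axis=0, keepdims=True)          # column-stochastic transition matrix

def rwr(seeds, restart=0.3, iters=100):
    r = np.zeros(len(nodes))
    r[[idx[s] for s in seeds]] = 1.0 / len(seeds)
    p = r.copy()
    for _ in range(iters):                     # power iteration with restart
        p = (1 - restart) * P @ p + restart * r
    return p

scores = rwr(["watch", "harry potter"])
concepts = ["movie", "book", "verb"]
print(sorted(concepts, key=lambda c: scores[idx[c]], reverse=True))
# 'movie' should rank first with these toy weights, since both terms support it.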

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

• Example: "popular smart cover iphone 5s"
• Definitions:
• Head: names the general (semantic) category to which the whole short text belongs; usually the head is the intent of the short text. Here "smart cover" is the intent of the query.
• Constraints: distinguish this member from other members of the same category. Here "iphone 5s" limits the type of the head.
• Non-Constraint Modifiers (a.k.a. Pure Modifiers): subjective modifiers which can be dropped without changing the intent. Here "popular" is subjective and can be neglected.

Non-Constraint Modifiers Mining Construct Modifier Networks

Edges form a Modifier Network.

[Figure: a concept hierarchy tree in the "Country" domain (country -> Asian country, developed country, Western country, Western developed country, top western country, large Asian country, large developed country, top developed country, ...) and the derived modifier network over the modifiers Asian, Western, Developed, Large, and Top. In this case "Large" and "Top" are pure modifiers.]

Non-Constraint Modifiers Mining: Betweenness Centrality

• Betweenness centrality is a measure of a node's centrality in a network.
• The betweenness of node v is defined as g(v) = Σ_{s ≠ v ≠ t} σ_st(v) / σ_st, where σ_st is the total number of shortest paths from node s to node t and σ_st(v) is the number of those paths that pass through v.
• Normalization & Aggregation: for a pure modifier, the betweenness-centrality aggregation score PMS(t) should be low.
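A hedged sketch using networkx: build a small modifier network (edges made up, loosely following the "Country" example) and use betweenness centrality as the pure-modifier score PMS(t); "large" and "top" should come out low:

import networkx as nx

# Toy modifier network: an edge means the two modifiers co-occur in a concept
# such as "western developed country".
G = nx.Graph()
G.add_edges_from([("asian", "developed"), ("western", "developed"),
                  ("asian", "western"),
                  ("large", "asian"), ("large", "developed"),
                  ("top", "western"), ("top", "developed")])

bc = nx.betweenness_centrality(G, normalized=True)

# Aggregate into a pure-modifier score PMS(t); here simply the normalized centrality.
for t, score in sorted(bc.items(), key=lambda kv: kv[1]):
    print(f"{t:10s} PMS ~ {score:.3f}")
# "large" and "top" sit on few shortest paths -> low betweenness -> likely pure modifiers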

Head-Constraints Mining [Wang et al 2014b]

• A term can be the head in some cases and a constraint in others.
• E.g., "seattle hotel" (head: hotel; constraint: seattle) vs. "seattle hotel job" (head: job; constraints: seattle, hotel).

Head-Constraints Mining Acquiring Concept Patterns

[Figure: building the concept pattern dictionary from query logs. Extract preposition patterns ("A for B", "A of B", "A with B", "A in B", "A on B", "A at B", ...) from queries such as "cover for iphone 6s", "battery for sony a7r", "wicked on broadway"; get entity pairs (entity 1 = head, entity 2 = constraint); conceptualize each entity (concept11, concept12, ... and concept21, concept22, ...); and store the resulting concept pairs (concept11, concept21), (concept11, concept22), (concept11, concept23), ... as concept patterns for each preposition.]

Why Concepts Can't Be Too General
• It may cause too many concept pattern conflicts: head and modifier can't be distinguished for overly general concept pairs.

Derived concept pattern (head, modifier) = (device, company); supporting entity pairs: (iphone 4, verizon), (modem, comcast), (wireless router, comcast), (iphone 4, tmobile).
Derived concept pattern (head, modifier) = (company, device); supporting entity pairs: (amazon books, kindle), (netflix, touchpad), (skype, windows phone), (netflix, ps3).
These two patterns conflict.

Why Concepts Can't Be Too Specific
• It may generate concepts with little coverage:
• the concept regresses to the entity
• large storage space, up to (million × million) patterns
Example patterns: (device, largest desktop OS vendor), (device, largest software development company), (device, largest global corporation), (device, latest windows and office provider), ...

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept Patterns (cluster size, head concept / constraint concept; each pattern also carries a cluster score and a head/constraint score):
615 breed/state; 296 game/platform; 153 accessory/vehicle; 70 browser/platform; 22 requirement/school; 34 drug/disease; 42 cosmetic/skin condition; 16 job/city; 32 accessory/phone; 18 software/platform; 20 test/disease; 27 clothes/breed; 19 penalty/crime; 25 tax/state; 16 sauce/meat; 18 credit card/country; 14 food/holiday; 11 mod/game; 29 garment/sport; 23 career information/professional; 15 song/instrument; 18 bait/fish; 22 study guide/book; 19 plugins/browser; 14 recipe/meat; 18 currency/country; 13 lens/camera; 9 decoration/holiday; 16 food/animal.

The game/platform cluster groups patterns such as game/platform, game/device, video game/platform, game console/game pad, and game/gaming platform; example entity pairs for Game (head) and Platform (modifier): (angry birds, android), (angry birds, ios), (angry birds, windows 10), ...

Head Modifier Relationship Detection

• Train a classifier on (head-embedding, modifier-embedding)
• Training data:
• Positive: (head, modifier)
• Negative: (modifier, head)
• Precision >= 0.9, Recall >= 0.9
• Disadvantage: not interpretable
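A hedged sketch of the embedding-based detector: concatenate (toy, low-dimensional) head and modifier embeddings and train a binary classifier; a real system would use pretrained word or phrase embeddings and many more pairs:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy 4-dimensional "embeddings" for a few terms (stand-ins for real pretrained vectors).
emb = {"game":      np.array([0.9, 0.1, 0.2, 0.0]),
       "platform":  np.array([0.1, 0.9, 0.1, 0.2]),
       "accessory": np.array([0.8, 0.2, 0.3, 0.1]),
       "vehicle":   np.array([0.2, 0.8, 0.2, 0.3])}

def pair_features(a, b):
    return np.concatenate([emb[a], emb[b]])   # (head-embedding, modifier-embedding)

# Positive examples: (head, modifier); negatives: the reversed pairs.
pos = [("game", "platform"), ("accessory", "vehicle")]
neg = [(b, a) for a, b in pos]
X = np.stack([pair_features(a, b) for a, b in pos + neg])
y = np.array([1] * len(pos) + [0] * len(neg))

clf = LogisticRegression().fit(X, y)
print(clf.predict([pair_features("game", "platform"),
                   pair_features("platform", "game")]))   # expect [1 0]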

Syntactic Parsing based on HM

• Head/modifier information alone is incomplete:
• prepositions and other function words are not covered
• neither are relations within a noun compound, e.g., "el capitan macbook pro"
• Why not train a parser for web queries?

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges: Short Texts Lack Grammatical Signals
• Lack of function words and word order:
• "toys queries" has ambiguous intent
• "distance earth moon" has clear intent, but many equivalent forms: "earth moon distance", "earth distance moon", ...

Challenges Syntactic Parsing of Queries

• No standard
• No ground-truth

Why is syntactic parsing of queries even a legitimate problem?

Derive Syntax from Semantics[Sun et al 2016]

• Query: "thai food houston"
• Clicked sentence: a sentence from a clicked document that contains the query words
• Project the sentence's dependencies onto the query

A Treebank for Short Texts

• Given a query q and q's clicked sentence s:
• parse each s
• project the dependencies from s to q
• aggregate the dependencies

Algorithm of Projection

Result Examples

Results

• Random queries: QueryParser UAS 0.83, LAS 0.75; Stanford UAS 0.72, LAS 0.64
• Queries with no function words: QueryParser UAS 0.82, LAS 0.73; Stanford UAS 0.70, LAS 0.61
• Queries with function words: QueryParser UAS 0.90, LAS 0.85; Stanford UAS 0.86, LAS 0.80

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

• Measuring similarity between two short texts or sentences
• Basic idea: word-by-word comparison using embedding vectors
• Use a saliency-weighted semantic graph to compute similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired: bins over all edges and bins over the maximum edges of the saliency-weighted semantic graph.

Similarity measurement (inspired by BM25), where s_l and s_s are the two short texts, w ranges over the terms of s_l, and sem(w, s_s) is the semantic similarity of w to s_s:

f_sts(s_l, s_s) = Σ_{w ∈ s_l} IDF(w) · sem(w, s_s) · (k1 + 1) / ( sem(w, s_s) + k1 · (1 − b + b · |s_s| / avgsl) )
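A hedged sketch of the BM25-inspired scoring function f_sts; the IDF values are toy numbers and sem(w, s) is reduced to exact word overlap, standing in for the word-embedding similarity used in [Kenter and Rijke 2015]:

def sem(w, short_text):
    # Semantic similarity of w to the text: max similarity to any of its words.
    # Toy exact-match similarity; the paper uses word-embedding cosine instead.
    return max((1.0 if w == w2 else 0.0) for w2 in short_text)

def f_sts(s_l, s_s, idf, k1=1.2, b=0.75, avgsl=4.0):
    score = 0.0
    for w in s_l:
        s = sem(w, s_s)
        score += idf.get(w, 1.0) * (s * (k1 + 1)) / (s + k1 * (1 - b + b * len(s_s) / avgsl))
    return score

idf = {"obama": 3.0, "president": 2.0, "us": 1.5, "the": 0.1}
t1 = "obama us president".split()
t2 = "the president of the us".split()
print(round(f_sts(t1, t2, idf), 3))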

From the Concept View

From the Concept View [Wang et al 2015a]

[Figure: both short texts are conceptualized with the same pipeline (parsing, term clustering by isA against the semantic network, concept filtering by co-occurrence against the co-occurrence network, head/modifier analysis, concept orthogonalization), producing bags of concepts: Concept Vector 1 [(c1, score1), (c2, score2), ...] and Concept Vector 2 [(c1', score1'), (c2', score2'), ...], whose similarity is then computed.]

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

• Explicit short text understanding benefits a lot of application scenarios:
• Ads/search semantic match
• Definition mining
• Query recommendation
• Web table understanding
• Semantic search
• ...

Ads Keyword Selection [Wang et al 2015a]

[Charts: ads keyword selection results by decile (Decile 4 through Decile 10) for mainline ads (y-axis 0.00 to 6.00) and sidebar ads (y-axis 0.00 to 0.60).]

Definition Mining [Hao et al 2016]

• Definition scenarios: search engines, QnA, etc.
• Why conceptualization is useful for definition mining; example: "What is Emphysema?"

Answer 1: "Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year."
Answer 2: "Emphysema is an incurable, progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing."

• This sentence has the form of a definition; embedding is helpful to some extent, but it also returns high similarity scores for (emphysema, disease) and (emphysema, smoking).
• Conceptualization can provide strong semantics; contextual embedding can also provide semantic similarity beyond is-A.

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

[Figure: offline, training data and the knowledge base are used for concept weighting and model learning, producing a concept model for each class (Class 1 ... Class i ... Class N). Online, an original short text (e.g., "justin bieber graduates") goes through entity extraction, conceptualization into a concept vector, candidate generation, and classification & ranking, producing outputs such as <Music, Score>.]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

[Figure: each category (e.g., TV, Music, Movie) is mapped into the concept space using the article titles/tags in that category, giving category vectors ω_i, ω_j; a query is mapped into the same concept space as points p_i, p_j and compared against the category vectors.]

Precision performance on each category [Wang et al 2014a]

Precision by category and method:

Category   BocSTC  LM_ch  SVM   VSM_cosine  LM_d  Entity_ESA
Movie      0.71    0.91   0.84  0.81        0.72  0.56
Money      0.97    0.95   0.54  0.57        0.52  0.74
Music      0.97    0.90   0.88  0.73        0.68  0.58
TV         0.96    0.46   0.92  0.56        0.51  0.55

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

• [Stark et al. 1998] Michael M. Stark and Richard F. Riesenfeld. WordNet: An Electronic Lexical Database. Proceedings of the 11th Eurographics Workshop on Rendering, 1998.

• [Banko et al. 2007] Michele Banko, Michael J. Cafarella, Stephen Soderland, Matt Broadhead, and Oren Etzioni. Open Information Extraction from the Web. In IJCAI 2007.

• [Etzioni et al. 2011] Oren Etzioni, Anthony Fader, Janara Christensen, Stephen Soderland, and Mausam. Open Information Extraction: The Second Generation. In IJCAI, vol. 11, pp. 3-10, 2011.

• [Carlson et al. 2010] A. Carlson, J. Betteridge, B. Kisiel, B. Settles, E. R. Hruschka Jr., and T. M. Mitchell. Toward an Architecture for Never-Ending Language Learning. In Proceedings of the Conference on Artificial Intelligence (AAAI), 2010.

• [Wu et al. 2012] Wentao Wu, Hongsong Li, Haixun Wang, and Kenny Zhu. Probase: A Probabilistic Taxonomy for Text Understanding. In ACM International Conference on Management of Data (SIGMOD), May 2012.

• [Bollacker et al. 2008] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In SIGMOD 2008.

• [Auer et al. 2007] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary G. Ives. DBpedia: A Nucleus for a Web of Open Data. In ISWC/ASWC 2007.

References

• [Suchanek et al. 2007] Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. Yago: a core of semantic knowledge. In WWW 2007.

• [Wu et al. 2015] Sen Wu, Ce Zhang, Christopher De Sa, Jaeho Shin, Feiran Wang, and C. Ré. Incremental Knowledge Base Construction Using DeepDive. In VLDB 2015.

• [Navigli et al. 2012] R. Navigli and S. Ponzetto. BabelNet: The Automatic Construction, Evaluation and Application of a Wide-Coverage Multilingual Semantic Network. In Artificial Intelligence, 2012.

• [Nastase et al. 2010] Vivi Nastase, Michael Strube, Benjamin Börschinger, Cäcilia Zirn, and Anas Elghafari. WikiNet: A very large scale multi-lingual concept network. In LREC 2010.

• [Speer et al. 2013] Robert Speer and Catherine Havasi. ConceptNet 5: A large semantic network for relational knowledge. The People's Web Meets NLP, Springer Berlin Heidelberg, 2013.

• [Hua et al. 2016] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng, and Xiaofang Zhou. "Understand Short Texts by Harvesting and Analyzing Semantic Knowledge." IEEE Transactions on Knowledge and Data Engineering (TKDE), 2016.

• [Hua et al. 2015] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng, and Xiaofang Zhou. Short Text Understanding Through Lexical-Semantic Analysis. In International Conference on Data Engineering (ICDE), April 2015.

References

• [Li et al. 2013] Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang, and Xindong Wu. Computing term similarity by large probabilistic isA knowledge. In ACM International Conference on Information and Knowledge Management (CIKM), 2013.

• [Li et al. 2015] Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang, Xue-Gang Hu, and Xindong Wu. A Large Probabilistic Semantic Network based Approach to Compute Term Similarity. In IEEE Transactions on Knowledge and Data Engineering (TKDE), 27(10): 2604-2617, 2015.

• [Rosch et al. 1976] Eleanor Rosch, Carolyn B. Mervis, Wayne D. Gray, David M. Johnson, and Penny Boyes-Braem. Basic objects in natural categories. Cognitive Psychology, 8(3): 382-439, 1976.

• [Manning and Schutze 1999] Christopher D. Manning and Hinrich Schutze. Foundations of Statistical Natural Language Processing. Volume 999, MIT Press, 1999.

• [Wang et al. 2015b] Zhongyuan Wang, Kejun Zhao, Haixun Wang, Xiaofeng Meng, and Ji-Rong Wen. Query Understanding through Knowledge-Based Conceptualization. In IJCAI, July 2015.

• [Bergsma et al. 2007] Shane Bergsma and Qin Iris Wang. Learning Noun Phrase Query Segmentation. In EMNLP-CoNLL 2007: 819-826.

• [Tan et al. 2008] Bin Tan and Fuchun Peng. Unsupervised query segmentation using generative language models and Wikipedia. In WWW 2008: 347-356.

References

• [Li et al. 2011] Yanen Li, Bo-June Paul Hsu, ChengXiang Zhai, and Kuansan Wang. Unsupervised query segmentation using clickthrough for information retrieval. In SIGIR 2011: 285-294.

• [Guo et al. 2009] Jiafeng Guo, Gu Xu, Xueqi Cheng, and Hang Li. Named entity recognition in query. In SIGIR 2009: 267-274.

• [Pantel et al. 2012] Patrick Pantel, Thomas Lin, and Michael Gamon. Mining Entity Types from Query Logs via User Intent Modeling. In ACL 2012: 563-571.

• [Joshi et al. 2014] Mandar Joshi, Uma Sawant, and Soumen Chakrabarti. Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries. In EMNLP 2014: 1104-1114.

• [Sawant et al. 2013] Uma Sawant and Soumen Chakrabarti. Learning joint query interpretation and response ranking. In WWW 2013: 1099-1110.

• [Wang et al. 2014b] Zhongyuan Wang, Haixun Wang, and Zhirui Hu. Head, Modifier, and Constraint Detection in Short Texts. In International Conference on Data Engineering (ICDE), 2014.

• [Sun et al. 2016] Xiangyan Sun, Haixun Wang, Yanghua Xiao, and Zhongyuan Wang. Syntactic Parsing of Web Queries. In EMNLP 2016.

References

• [Kenter and Rijke 2015] Tom Kenter and Maarten de Rijke. Short text similarity with word embeddings. In CIKM 2015.

• [Wang et al. 2015a] Zhongyuan Wang, Haixun Wang, Ji-Rong Wen, and Yanghua Xiao. An Inference Approach to Basic Level of Categorization. In CIKM, October 2015.

• [Hao et al. 2016] Zehui Hao, Zhongyuan Wang, Xiaofeng Meng, and Jun Yan. Combining Language Model with Conceptualization for Definition Ranking. MSR Technical Report, 2016.

• [Wang et al. 2014a] Fang Wang, Zhongyuan Wang, Zhoujun Li, and Ji-Rong Wen. Concept-based Short Text Classification and Ranking. In CIKM 2014.

• [Wang et al. 2012a] Jingjing Wang, Haixun Wang, Zhongyuan Wang, and Kenny Zhu. Understanding Tables on the Web. In International Conference on Conceptual Modeling, October 2012.

• [Wang et al. 2012b] Yue Wang, Hongsong Li, Haixun Wang, and Kenny Zhu. Toward Topic Search on the Web. In International Conference on Conceptual Modeling, October 2012.

Page 43: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

bird

Naive Approach 1 Typicality

P(robin|bird) gt P(penguin|bird)ldquorobinrdquo is a more typical bird than a ldquopenguinrdquo

country

SeychellesUSA

P(USA|country) gt P(Seychelles|country)ldquoUSArdquo is a more typical country than ldquoSeychellesrdquo

penguinrobin

Using Typicality for BLC

bull Associate each isA relationship (119890 is 119888) with typicality scores 119875 119890 119888 and 119875 119888 119890

119875 119890 119888 =119899 119888 119890

119899 119888119875(119888|119890) =

119899 119888 119890

119899(119890)

bull P(e|c) indicates how typical (or popular) e is in the given concept c

bull P(c|e) indicates how typical (or popular) the concept c is given e

bull However

Microsoft

largest desktop OS vendorcompanyhigh typicality p(c|e) high typicality p(e|c)

Naive Approach 2 PMI[Manning and Schutze 1999]

bull Pointwise mutual information (PMI) is a measure of association used in information theory and statistics

bull Consider using the PMI between concept c and instance e to find the basic-level concepts as follows

119875119872119868(119890 119888) = log119875(119890 119888)

119875(119890)119875(119888)= log119875(119890|119888) minus log119875(119890)

bull However bull In basic level of categorization we are interested in finding a

concept for a given e which means P(e) is a constant

bull Thus ranking by PMI(e c) is the same as ranking by P(e|c)

Using Rep(e c) for BLC [Wang et al 2015b]

bull The measure 119877119890119901 119890 119888 = 119875(119888|119890) lowast 119875(119890|119888) means

bull (With PMI) If we take the logarithm of our scoring function we get

log119877119890119901 119890 119888 = log119875 119888 119890 lowast 119875(119890|119888) = log119875(119890 119888)

119875(119890)lowast119875(119890 119888)

119875(119888)= log

119875(119890 119888)2

119875(119890)119875(119888)= 119875119872119868 119890 119888 + log119875 119890 119888

= 1198751198721198682

bull (With Commute Time) The commute time between an instance e and a concept c is

119879119894119898119890(119890 119888) =

119896=1

infin

(2119896) lowast 119875119896(119890 119888) =

119896=1

119879

2119896 lowast 119875119896 119890 119888 +

119896=119879+1

infin

2119896 lowast 119875119896 119890 119888

ge σ119896=1119879 (2119896) lowast 119875119896(119890 119888) + 2(119879 + 1) lowast (1 minus σ119896=1

119879 119875119896(119890 119888)) = 4 minus 2 lowast 119877119890119901(119890 119888)

Given e the c should be its typical concept (shortest distance)

Given c the e should be its typical instance (shortest distance)

A process of finding concept nodes having shortest expected distance with e

PrecisionNDCGNo smoothing 1 2 3 5 10 15 20

MI(e) 0769 0692 0705 0685 0719 0705 0690

PMI3(e) 0885 0769 0756 0800 0754 0733 0721

NPMI(e) 0692 0692 0667 0638 0627 0610 0610

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0500 0462 0526 0523 0523 0510 0521

Rep(e) 0846 0865 0872 0862 0758 0731 0719

Smoothing=0001

MI(e) 0577 0615 0628 0600 0612 0605 0592

PMI3(e) 0731 0673 0692 0654 0669 0644 0623

NPMI(e) 0923 0827 0769 0746 0731 0695 0671

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0554

Typicality P(e|c) 0885 0865 0872 0831 0785 0741 0704

Rep(e) 0846 0731 0718 0723 0700 0669 0638

Smoothing=00001

MI(e) 0615 0615 0654 0608 0635 0628 0612

PMI3(e) 0846 0731 0731 0715 0723 0685 0677

NPMI(e) 0885 0904 0885 0869 0823 0777 0752

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0885 0904 0910 0877 0831 0813 0777

Rep(e) 0923 0846 0833 0815 0781 0736 0719

Smoothing=1e-5

MI(e) 0615 0635 0667 0662 0677 0656 0646

PMI3(e) 0885 0769 0744 0777 0758 0731 0710

NPMI(e) 0885 0846 0872 0869 0831 0810 0787

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0769 0808 0846 0823 0808 0782 0765

Rep(e) 0885 0904 0872 0862 0812 0800 0767

Smoothing=1e-6

MI(e) 0769 0673 0705 0677 0700 0692 0679

PMI3(e) 0885 0769 0756 0785 0773 0726 0723

NPMI(e) 0885 0846 0821 0815 0750 0726 0719

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0538 0615 0615 0615 0608 0613 0615

Rep(e) 0846 0885 0897 0877 0788 0777 0765

Smoothing=1e-7

MI(e) 0769 0692 0705 0685 0719 0703 0688

PMI3(e) 0885 0769 0756 0792 0758 0736 0725

NPMI(e) 0769 0750 0718 0700 0650 0641 0633

Typicality P(c|e) 0462 0577 0603 0577 0569 0564 0556

Typicality P(e|c) 0500 0481 0526 0523 0531 0523 0523

Rep(e) 0846 0865 0872 0854 0765 0749 0733

No Smoothing 1 2 3 5 10 15 20

MI(e) 0516 0531 0519 0531 0562 0574 0594

PMI3(e) 0725 0664 0652 0660 0628 0631 0646

NPMI(e) 0599 0597 0579 0554 0540 0539 0549

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0401 0386 0396 0398 0401 0410 0428

Rep(e) 0758 0771 0745 0723 0656 0647 0661

Smoothing=1e-3

MI(e) 0374 0414 0441 0448 0473 0481 0495

PMI3(e) 0484 0511 0509 0502 0519 0525 0533

NPMI(e) 0692 0652 0607 0603 0585 0585 0592

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0460

Typicality P(e|c) 0703 0697 0704 0681 0637 0628 0626

Rep(e) 0621 0580 0554 0561 0554 0555 0559

Smoothing=1e-4

MI(e) 0407 0430 0458 0462 0492 0503 0512

PMI3(e) 0648 0604 0579 0575 0578 0576 0590

NPMI(e) 0747 0777 0761 0737 0700 0685 0688

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0791 0795 0802 0767 0738 0729 0724

Rep(e) 0758 0714 0711 0689 0653 0636 0653

Smoothing=1e-5

MI(e) 0429 0465 0478 0501 0517 0528 0545

PMI3(e) 0725 0647 0642 0642 0627 0624 0638

NPMI(e) 0813 0779 0778 0765 0730 0723 0729

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0709 0728 0735 0722 0702 0696 0703

Rep(e) 0791 0787 0762 0739 0707 0703 0706

Smoothing=1e-6

MI(e) 0516 0510 0515 0526 0546 0563 0579

PMI3(e) 0725 0655 0651 0654 0641 0631 0649

NPMI(e) 0791 0766 0732 0728 0673 0659 0668

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0495 0516 0520 0508 0512 0521 0540

Rep(e) 0758 0784 0767 0755 0691 0686 0694

Smoothing=1e-7

MI(e) 0516 0531 0519 0530 0562 0571 0592

PMI3(e) 0725 0664 0652 0658 0630 0631 0647

NPMI(e) 0670 0655 0633 0604 0575 0570 0581

Typicality P(c|e) 0297 0380 0409 0422 0438 0446 0461

Typicality P(e|c) 0423 0421 0415 0407 0414 0424 0438

Rep(e) 0758 0771 0745 0725 0663 0661 0668

Evaluations on Different Measures for BLC

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

What is the Semantic Similaritybull Are the following instance pairs similar

bull ltapple microsoftgt

bull ltapple peargt

bull ltapple fruitgt

bull ltapple foodgt

bull ltapple ipadgt

bull ltcar journeygt

Approaches on Term Similarity

bull Categories of approaches for semantic similaritybull String based approach

bull Knowledge based approachbull Use preexisting thesauri taxonomy or encyclopedia such as

WordNet

bull Corpus based approachbull Use contexts of terms extracted from web pages web search

snippets or other text repositories

bull Embedding based approachbull Will introduce in detail in ldquoPart 3 Implicit Understandingrdquo

79

Approaches on Term Similarity (2)

bull Categories

80

Knowledge based approaches

(WordNet)

Corpus based

approaches

Path lengthlexical

chain-based

Information

content-based

Graph learning

algorithm basedSnippet search based

Rada

1989

Resnik

1995

Jcn

1997

Lin

1998

Saacutench

2011

Agirre

2010Alvarez

2007

String based

approaches

HunTray

2005

Hirst

1998

Do

2009

Bol

2011Chen

2006

State-of-the-art approaches

Ban

2002

bull Framework

83

Term Similarity Using Semantic Networks [Li et al 2013 Li et al 2015]

Term pairs ltt1 t2gt

Type Checking

Concept Pairs Entity Pairs

Entity-distribution Context Collection

Concept-distribution Context Collection

Concept-Entity Pairs

Concept Collection for the Entity Term t1

Similarity EvaluationCosine(T(t1) T(t2))

for each pairltt2cxgt

Context vector T(t1) and T(t2)

Get maxsim(t2cx) for ltt1 t2gt

End

End

Concept Clustering

Cluster Context vector Cx(t1) and Cy(t2)

Similarity Evaluation

Max(xy) Cosine(Cx(t1) Cy(t2))

End

Concept Clustering

for each Cluster Ci(t1)

Select top k Concept namely cx

Step 1 Type Checking

Step 2 Context Representation(Vector)

Step 3 Context Similarity

An example [Li et al 2013 Li et al 2015]

For example

ltbanana peargt

88

ltbanana peargt

Entity PairsType Checking

Concept Context Collection

Similarity Evaluation Cosine(T(t1) T(t2)) 0916

Step 1 Type Checking

Step 2 Context Representation(Vector)

Step 3 Context Similarity

ExamplesTerm 1 Term 2 Similarity

lunch dinner 09987

tiger jaguar 09792

car plane 09711

television radio 09465

technology company microsoft 08208

high impact sport competitive sport 08155

employer large corporation 05353

fruit green pepper 02949

travel meal 00426

music lunch 00116

alcoholic beverage sports equipment 00314

company table tennis 00003

96httpadaptseieesjtueducnsimilaritySimCompleteResultspdf

Statistics of Search Queries

44

29

17

7

2 1

1 Term2 Terms3 Terms4 Terms5 Termsmore than 5 Terms

10

26

34

19

74

1 Term2 Terms3 Terms4 Terms5 Termsmore than 5 Terms

(a) By traffic

(b) By of distinct queries

Pokeacutemon Go Microsoft HoloLens

Instance 1 Instance 21 Instance2 Instances3 Instances4 Instances5 InstancesMore than 5 Instances

1 Instance2 Instances3 Instances4 Instances5 InstancesMore than 5 Instances

If the short text has context for the instancehellip

bull python tutorialbull dangerous pythonbull moon earth distancebull hellip

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Supervised Segmentation [Bergsma et al 2007]

bull Problem divide query into semantic units

bull Approach turn segmentation into position-based binary classification

Example Query

Two man power saw

[two man] [power saw][two] [man] [power saw][two] [man power] [saw]

Input a query and its positions

Output the decision for making segmentation at each position

Supervised Segmentation

bull Featuresbull Decision boundary features

bull Statistical features

bull Context features

bull Dependency features

eg Indicators thePOS tags in query isPosition features forwardbackward

Mutual information between left and right parts

Bank loan amortization schedule

Context information

bus driverfemale

depend

Supervised Segmentation

bull Segmentation Overview

saw

SVMclassifier

Input query two man power saw

two man power

Output segmentation decision for each position (yesno)

learning features

Unsupervised Segmentation [Tan et al 2008]

bull Unsupervised learning for query segmentation

Probability of generated segmentation S for query Q

119875 119878119876 = 119875 1199041 P 1199042|1199041 hellipP 119904119898 11990411199042hellip119904119898minus1

asympෑ

119904119894isin119878

119875(119904119894)Unigram model

segments

Valid segment boundary if and only if the pointwise mutual information between the two segments resulting from the split is negative

new york times subscription

1199041 1199042

119872119868 119904119896 119904119896+1 = log119875119888([119904119896 119904119896+1])

119875119888 119904119896 ∙ 119875119888 (119904119896+1)lt 0

Example log119875119888([119899119890119908 119910119900119903119896])

119875119888( 119899119890119908) ∙ 119875119888 (119910119900119903119896)gt 0

no segment boundary here

Unsupervised Segmentation

bull Find top k segmentations dynamic programming

bull Using EM optimization on the fly

Input query 11990811199082hellip119908119899 concept probability distribution

Output top k segmentations with highest likehood

Words in a query

Exploit Click-through [Li et al 2011]

bull Motivationbull Probabilistic query segmentation

bull Use click-through data

Output top-3 segmentation

[bank of america] [online banking] 0502

bank of america online banking] 0428

[bank of ] [ america] [online banking] 0001

Q -gt URL -gt D query

document

click data

Input Query bank of america online banking

Exploit Click-through

bull Segmentation Model

An interpolated model

global info Click-throughinfo

[credit card] [bank of America]

1 bank of america credit cards contact us overview2 secured visa credit card from bank of america3 credit cards overview find the right bank of america credit card for you

Query

Clicked html documents

global info

Click-through info

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Sense Changes with Different Context

watch harry potter read harry potter age harry potter

Movie Book Character

harry potter walkthrough

Game

Entity Recognition in Query [Guo et al 2009]

bull Motivation

Detect named entity in a short text and categorize it

harry potter walkthrough

Single-named-entity query

Example

(ldquoharry potterrdquo ldquo walkthroughrdquo ldquogamerdquo)

triple lte t cgt

class of entity

context terms

ambiguous term

contextterm class

Entity Recognition in Query

bull Probabilistic Generative Model

Goal Given a query q find triple lte t cgt maximize the probability

Probability to generate triple

assume context only depends on class

Objective given query q find

The problem then becomes how to estimate Pr(e) Pr(c|e) and Pr(t|c)

Eg ldquowalkthroughrdquo only depends on game instead of happy potter

Entity Recognition in Query

bull Probability Estimation by Learning

learning objective

N

1i

iii )ctP(emax

Challenge difficult as well as time consuming to manually assign class labels to named entities in queries

Build training set 119879 = (119890119894 119905119894) view 119888119894 as a hidden variable

New Learning problem

N

1i

ii

N

1i

i

N

1i

ii c)|)P(te|P(c)P(emax)tP(emax c

solved with topic model WS-LDA

Signal from Click [Pantel et al 2012]

bull Motivation

Predict entity type in Web search

entity

user intent

context

click

Query type distribution (73 types)

Generative model

entity type

T

TK

K2

Signal from Click

bull Joint Model for Prediction

t

τ

i

n c

θ

φ

ω

Q

Distribution over types

Intent distribution

Pick type

Pick entity

Pick intent

Pick click

Word distribution

Host distribution

Entity distribution

For each Query

Pick context words

Telegraphic Query interpretation [Sawant et al 2013 Joshi et al 2014]

bull Entity-seeking Telegraphic Queries

bull Interpretation = Segmentation + Annotation

Knowledge base Large corpus

accuracy recall

Germany capital

Berlin

Query

Result Entity

bull Overview

Joint Interpretation and Ranking [Sawant et al 2013 Joshi et al 2014]

Annotated Corpus

Telegraphic Query

e1e2e3

Two Models for Interpretation and Ranking

Generative Model

Discriminative Model

Output

bull Generative Model

Joint Interpretation and Ranking [Sawant et al 2013]

San Diego Padres

Major league baseball team

type context

E

T Padres have been to two World

Series losing in 1984 and 1998

Type hint

baseball team

losing team baseball world series 1998

Z

Context matchers

lost 1998 world seriesswitch

model model

q losing team baseball world series 1998

Borrow from U Sawant (2013)

Based on Probabilistic Language Models

bull Discriminative Model

Joint Interpretation and Ranking [Sawant et al 2013]

San_Diego_Padres

losing team baseball world

series 1998

(baseball team)

losing team baseball world

series 1998

(baseball team)

losing team baseball world

series 1998

(t = baseball team)

1998_World_Series

losing team baseball world

series 1998

(series)

losing team baseball world

series 1998

(series)

losing team baseball world

series 1998

(t = series)

Correct entity Incorrect entity

Based on max-margin discriminative learning

bull Queries seek answer entities (e2)

bull Contain (query) entities (e1) target types (t2) relations (r) and selectors (s)

Telegraphic Query Interpretation [Joshi et al 2014]

query e1 r t2 s

dave navarro first band

dave navarro band band first

dave navarro - band first

spider automobile company

spider automobile company

automobile company

-

automobile company company spider

Borrow from M Joshi (2014)

Improved Generative Model

bull Generative Model[Sawant et al 2013]

[Joshi et al 2014]Consider e1

(in q) and r

Improved Discriminative Model

bull Discriminative Model[Sawant et al 2013]

[Joshi et al 2014]

Consider e1

(in q) and r

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

bull Input a short text

bull Output semantic interpretation

bull Three steps in understanding a short text

wanna watch eagles band

watch[verb] eagles[entity](band) band[concept]

wanna watch eagles band watch[verb] eagles[entity](band) band[concept]

watch eagles band watch[verb] eagles[entity] band[concept]

Step 1 Text Segmentation ndash divide into a sequence of terms in vocabulary

Step 2 Type detection ndash determine the best type of each term

Step 3 Concept Labeling ndash infer the best concept of each entity within context

Text segmentationbull Observations

bull Mutual Exclusion ndash terms containing the same word mutually exclude each other

bull Mutual Reinforcement ndash related terms mutually reinforce each other

• Build a Candidate Term Graph (CTG): nodes are candidate terms, edges connect mutually reinforcing (non-exclusive) terms and are weighted by their affinity.
[Figure: CTGs for "vacation april in paris" (candidate terms "vacation", "april in paris", "april", "paris"; edge weights 0.029, 0.005, 0.047, 0.041; word-coverage fractions 1/3, 2/3) and "watch harry potter" (candidate terms "watch", "harry potter"; edge weights 0.014, 0.092, 0.053, 0.018).]

Find best segmentation
• Best segmentation = sub-graph of the CTG which:
• Is a complete graph (clique)
• Has no mutual exclusion
• Has 100% word coverage (except for stopwords)
• Has the largest average edge weight
• In the examples, the maximal cliques {vacation, april in paris} and {watch, harry potter} are the best segmentations. A minimal code sketch follows.
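A minimal sketch of the clique-based segmentation search described above; the toy term graph, its weights, and the mutual-exclusion test are illustrative assumptions, not the actual data structures of [Hua et al 2015].

```python
# Sketch: pick the best segmentation as the clique of the candidate term graph
# with full word coverage (stopwords ignored) and the largest average edge weight.
from itertools import combinations

STOPWORDS = {"in", "the", "a"}

def cliques(nodes, edge_weight):
    """Enumerate all cliques of a small graph given as an edge-weight dict."""
    for r in range(1, len(nodes) + 1):
        for subset in combinations(nodes, r):
            if all(frozenset(p) in edge_weight for p in combinations(subset, 2)):
                yield subset

def best_segmentation(query, nodes, edge_weight):
    words = [w for w in query.split() if w not in STOPWORDS]
    best, best_score = None, -1.0
    for clique in cliques(nodes, edge_weight):
        covered = {w for term in clique for w in term.split() if w not in STOPWORDS}
        # mutual exclusion: no word may be covered by two different terms
        n_words = sum(len([w for w in t.split() if w not in STOPWORDS]) for t in clique)
        if covered != set(words) or n_words != len(covered):
            continue
        pairs = list(combinations(clique, 2))
        avg_w = (sum(edge_weight[frozenset(p)] for p in pairs) / len(pairs)) if pairs else 0.0
        if avg_w > best_score:
            best, best_score = clique, avg_w
    return best, best_score

# Toy candidate term graph for "vacation april in paris" (weights are made up):
nodes = ["vacation", "april in paris", "april", "paris"]
w = {frozenset(p): v for p, v in [
    (("vacation", "april in paris"), 0.047),
    (("vacation", "april"), 0.029),
    (("vacation", "paris"), 0.041),
    (("april", "paris"), 0.005),
]}
print(best_segmentation("vacation april in paris", nodes, w))
# (('vacation', 'april in paris'), 0.047)
```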

Type Detection
• Pairwise Model: find the best typed-term for each term, so that the Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight.
[Figure: for the query "watch free movie", the candidate typed-terms are watch[v], watch[e], watch[c]; free[adj], free[v]; movie[c], movie[e], and one typed-term is selected per term.]

Concept Labeling
• Entity disambiguation is the most important task of concept labeling.
• Filter / re-rank the original concept cluster vector.
• Weighted-Vote: the final score of each concept cluster is a combination of its original score and the support from context, using concept co-occurrence.
• Example: "watch harry potter" → movie; "read harry potter" → book.

Example of Entity Disambiguation [Hua et al 2015 (ICDE Best Paper), Hua et al 2016]
[Figure: the short text is parsed and conceptualized against the semantic (is-A) network and the co-occurrence network — parsing, term clustering by is-A, concept filtering by co-occurrence, head/modifier analysis, and concept orthogonalization produce a concept vector (c1, p1), (c2, p2), (c3, p3), …
For "ipad apple": is-A lookup gives concepts such as fruit, company, food, product for "apple" and product, device, brand for "ipad"; co-occurrence-based filtering then discards fruit / food and keeps company, brand, product, device.]

Mining Lexical Relationships [Wang et al 2015b]
• Lexical knowledge represented by probabilities (notation: e = instance, t = term, c = concept, z = role).
Example "watch harry potter": p(verb | watch), p(instance | watch), p(movie | harry potter), p(book | harry potter), p(movie | watch, verb).
• ① p(z | t): the role distribution of a term
• ② p(c | t, z): the concept distribution of a term given its role
• ③ p(c | e) = p(c | t, z = instance): conceptualization of an instance
[Figure: in "watch harry potter", "watch" plays the role of a verb and "harry potter" is an instance whose candidate concepts include product, book, and movie.]

Understanding Queries [Wang et al 2015b]
• Goal: rank the concepts and find argmax_c p(c | t, q).
• The query and all of its possible segmentations are matched against the offline semantic network, and concepts are ranked by a random walk with restart [Sun et al 2005] on the resulting online subgraph.
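As a concrete illustration of random walk with restart on a small term–concept subgraph, here is a hedged sketch; the tiny graph, the edge weights, and the restart parameter are illustrative assumptions, not the actual semantic network of [Wang et al 2015b].

```python
# Sketch: random walk with restart (RWR) to rank concept nodes for a query subgraph.
import numpy as np

nodes = ["watch", "harry potter", "movie", "book", "verb"]   # toy online subgraph
edges = {("watch", "movie"): 1.0, ("watch", "verb"): 0.8,
         ("harry potter", "movie"): 1.0, ("harry potter", "book"): 0.9}

n = len(nodes)
idx = {v: i for i, v in enumerate(nodes)}
A = np.zeros((n, n))
for (u, v), w in edges.items():
    A[idx[u], idx[v]] = A[idx[v], idx[u]] = w
P = A / A.sum(axis=0, keepdims=True)           # column-stochastic transition matrix

restart = np.zeros(n)
for seed in ("watch", "harry potter"):         # restart at the query terms
    restart[idx[seed]] = 0.5

alpha = 0.15                                   # restart probability (assumed value)
r = restart.copy()
for _ in range(100):                           # power iteration until (near) convergence
    r = (1 - alpha) * P @ r + alpha * restart

concept_scores = {v: r[idx[v]] for v in ("movie", "book", "verb")}
print(sorted(concept_scores.items(), key=lambda kv: -kv[1]))   # "movie" should rank first
```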

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

• Example: "popular smart cover iphone 5s"
• Definitions:
• Head: names the general (semantic) category to which the whole short text belongs; usually the head carries the intent of the short text. Here "smart cover" is the intent of the query.
• Constraints: distinguish this member from other members of the same category. Here "iphone 5s" limits the type of the head.
• Non-Constraint Modifiers (aka Pure Modifiers): subjective modifiers that can be dropped without changing the intent. Here "popular" is subjective and can be neglected.

Non-Constraint Modifiers Mining: Construct Modifier Networks
• Edges form a Modifier Network.
[Figure: concept hierarchy tree in the "Country" domain — Country with children Asian country, Developed country, Western country — and the corresponding modifier network linking the modifiers Asian, Developed, Western, Large, Top (from phrases such as "Western developed country", "Top western country", "Large Asian country", "Large developed country", "Top developed country"). In this case "Large" and "Top" are pure modifiers.]

Non-Constraint Modifiers Mining: Betweenness centrality
• Betweenness centrality is a measure of a node's centrality in a network.
• The betweenness of node v is defined as g(v) = Σ_{s ≠ v ≠ t} σ_st(v) / σ_st, where σ_st is the total number of shortest paths from node s to node t and σ_st(v) is the number of those paths that pass through v.
• Normalization & Aggregation: a pure modifier should have a low betweenness-centrality aggregation score PMS(t).
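A small sketch of flagging pure modifiers by low betweenness centrality; the toy modifier network and the PMS aggregation (here simply the node's normalized betweenness in a single network) are assumptions for illustration — the exact network construction and normalization of [Wang et al 2014b] are not reproduced here.

```python
# Sketch: pure-modifier mining via betweenness centrality on a modifier network.
import networkx as nx

# Toy modifier network for the "Country" domain (edges are illustrative).
G = nx.Graph()
G.add_edges_from([
    ("Country", "Asian"), ("Country", "Developed"), ("Country", "Western"),
    ("Asian", "Developed"), ("Developed", "Western"),
    ("Large", "Asian"), ("Large", "Developed"),
    ("Top", "Developed"), ("Top", "Western"),
])

bc = nx.betweenness_centrality(G, normalized=True)

# Assumed aggregation: PMS(t) = betweenness of t in this domain's network
# (a real system would aggregate over many domifier networks / domains).
PMS = dict(sorted(bc.items(), key=lambda kv: kv[1]))
for term, score in PMS.items():
    print(f"{term:10s} PMS={score:.3f}")

threshold = 0.01                           # assumed cut-off
pure_modifiers = [t for t, s in PMS.items() if s <= threshold]
print("pure modifiers:", pure_modifiers)   # expect ['Large', 'Top']
```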

Head-Constraints Mining [Wang et al 2014b]

• A term can be a head in some cases and a constraint in others.
• E.g. "Seattle hotel" (head: hotel, constraint: Seattle) vs. "Seattle hotel job" (head: job, constraints: Seattle, hotel).

Head-Constraints Mining: Acquiring Concept Patterns
• Get entity pairs from the query log by extracting preposition patterns — "A for B", "A of B", "A with B", "A in B", "A on B", "A at B", … — e.g. "cover for iphone 6s", "battery for sony a7r", "wicked on broadway" (entity 1 = head, entity 2 = constraint).
• Conceptualization: map entity 1 to concepts (concept11, concept12, concept13, concept14, …) and entity 2 to concepts (concept21, concept22, concept23, …).
• Build the Concept Pattern Dictionary from the concept pairs: (concept11, concept21), (concept11, concept22), (concept11, concept23), … A code sketch of this dictionary-building step follows.
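A hedged sketch of building a tiny concept-pattern dictionary from (head entity, constraint entity) pairs; the conceptualize() lookup table and the simple counting scheme are hypothetical stand-ins for the Probase-style conceptualization used in [Wang et al 2014b].

```python
# Sketch: derive head/constraint concept patterns from preposition-based entity pairs.
from collections import Counter
from itertools import product

# Hypothetical is-A lookup (entity -> candidate concepts); a real system would use Probase.
ISA = {
    "cover":     ["accessory", "product"],
    "battery":   ["accessory", "product"],
    "iphone 6s": ["phone", "device"],
    "sony a7r":  ["camera", "device"],
}

def conceptualize(entity):
    return ISA.get(entity, [])

# (head entity, constraint entity) pairs extracted from query-log patterns like "A for B".
entity_pairs = [("cover", "iphone 6s"), ("battery", "sony a7r"), ("cover", "iphone 6s")]

pattern_counts = Counter()
for head, constraint in entity_pairs:
    for ch, cc in product(conceptualize(head), conceptualize(constraint)):
        pattern_counts[(ch, cc)] += 1

# The concept pattern dictionary keeps the strongest (head concept, constraint concept) pairs.
for (ch, cc), n in pattern_counts.most_common(5):
    print(f"head={ch:10s} constraint={cc:8s} support={n}")
# first line: head=accessory  constraint=device   support=3
```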

Why Concepts Can't Be Too General
• It may cause too many concept pattern conflicts: we can't distinguish head and modifier for general concept pairs.

Derived concept pattern: head = device, modifier = company
Supporting entity pairs: (iphone 4, verizon), (modem, comcast), (wireless router, comcast), (iphone 4, tmobile)

Derived concept pattern: head = company, modifier = device    ← Conflict
Supporting entity pairs: (amazon books, kindle), (netflix, touchpad), (skype, windows phone), (netflix, ps3)

Why Concepts Can't Be Too Specific
• It may generate concepts with little coverage: the concept regresses to the entity.
• Large storage space: up to (#million × #million) patterns, e.g. (device, largest desktop OS vendor), (device, largest software development company), (device, largest global corporation), (device, latest windows and office provider), …

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept Patterns

Cluster size | Sum of cluster score | head / constraint | score
615 | 2114691 | breed / state | 357298460224501
296 | 7752357 | game / platform | 627403476771856
153 | 3466804 | accessory / vehicle | 53393705094809
70 | 118259 | browser / platform | 132612807637391
22 | 1010993 | requirement / school | 271407526294823
34 | 9489159 | drug / disease | 154602405333541
42 | 8992995 | cosmetic / skin condition | 814659415003929
16 | 7421599 | job / city | 27903732555528
32 | 710403 | accessory / phone | 246513830851194
18 | 6692376 | software / platform | 210126322725878
20 | 6444603 | test / disease | 239774028397537
27 | 5994205 | clothes / breed | 98773996282851
19 | 5913545 | penalty / crime | 200544192793488
25 | 5848804 | tax / state | 240081818612579
16 | 5465424 | sauce / meat | 183592863621553
18 | 4809389 | credit card / country | 142919087972152
14 | 4730792 | food / holiday | 14554140330924
11 | 4536199 | mod / game | 257163856882439
29 | 4350954 | garment / sport | 471533326845442
23 | 3994886 | career information / professional | 732726483731257
15 | 386065 | song / instrument | 128189481818135
18 | 378213 | bait / fish | 780426514113169
22 | 3722948 | study guide / book | 508339765053921
19 | 3408953 | plugins / browser | 550326072627126
14 | 3305753 | recipe / meat | 882779863422951
18 | 3214226 | currency / country | 110825444188352
13 | 3180272 | lens / camera | 186081673263957
9 | 316973 | decoration / holiday | 130055844126533
16 | 314875 | food / animal | 7338544366514

Example — the game / platform pattern:
Concept pattern cluster: game / platform, game / device, video game / platform, game console / game pad, game / gaming platform
Supporting instance pairs (Game = head, Platform = modifier): (angry birds, android), (angry birds, ios), (angry birds, windows 10), …

Head Modifier Relationship Detection
• Train a classifier on (head-embedding, modifier-embedding) pairs.
• Training data: positive = (head, modifier); negative = (modifier, head).
• Precision >= 0.9, Recall >= 0.9.
• Disadvantage: not interpretable. A hedged sketch of such a classifier follows.
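A minimal sketch of the embedding-pair classifier described above; the random "embeddings", the tiny training set, and the logistic-regression model are illustrative assumptions (the slide does not specify the classifier).

```python
# Sketch: classify whether (term A, term B) is in a head -> modifier relation,
# using the concatenation of their embeddings as features.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
DIM = 50
emb = {t: rng.normal(size=DIM) for t in ["game", "platform", "accessory", "phone",
                                         "angry birds", "android", "cover", "iphone 6s"]}

# (head, modifier) pairs; positives as-is, negatives are the reversed pairs.
positives = [("game", "platform"), ("accessory", "phone"), ("cover", "iphone 6s")]
negatives = [(b, a) for a, b in positives]

def features(a, b):
    return np.concatenate([emb[a], emb[b]])

X = np.stack([features(a, b) for a, b in positives + negatives])
y = np.array([1] * len(positives) + [0] * len(negatives))

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict([features("game", "platform")]))   # expect [1] (head -> modifier)
print(clf.predict([features("platform", "game")]))   # expect [0]
```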

Syntactic Parsing based on Head/Modifier
• The head/modifier information alone is incomplete:
• prepositions and other function words are not covered;
• neither is structure within a noun compound, e.g. "el capitan macbook pro".
• Why not train a parser for web queries?

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges: Short Texts Lack Grammatical Signals
• They lack function words and word order:
• "toys queries" has ambiguous intent;
• "distance earth moon" has a clear intent, but many equivalent forms: "earth moon distance", "earth distance moon", …

Challenges: Syntactic Parsing of Queries
• No standard
• No ground-truth
Why is syntactic parsing of queries even a legitimate problem?

Derive Syntax from Semantics [Sun et al 2016]
• Query: "thai food houston"
• Clicked sentence: a sentence from the clicked page that contains the query words
• Project the dependency tree of the clicked sentence onto the query

A Treebank for Short Texts
• Given a query q and q's clicked sentence s:
• Parse each s
• Project the dependencies from s to q
• Aggregate the projected dependencies

Algorithm of Projection
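The projection step can be pictured with a small sketch like the following; the token-matching rule and the toy dependency triples are illustrative assumptions, not the exact projection algorithm of [Sun et al 2016].

```python
# Sketch: project dependencies from a clicked sentence onto the query tokens.
# A dependency is kept when both its head word and its dependent word occur in the query.

def project(query_tokens, sentence_deps):
    """sentence_deps: list of (head_word, relation, dependent_word) from parsing the sentence."""
    qset = {t.lower() for t in query_tokens}
    return [(h, rel, d) for (h, rel, d) in sentence_deps
            if h.lower() in qset and d.lower() in qset]

query = ["thai", "food", "houston"]

# Hypothetical parse of a clicked sentence such as "Find the best Thai food in Houston":
sentence_deps = [
    ("Find", "obj", "food"),
    ("food", "amod", "Thai"),
    ("food", "nmod", "Houston"),
    ("Houston", "case", "in"),
]

for head, rel, dep in project(query, sentence_deps):
    print(f"{dep} --{rel}--> {head}")
# Thai --amod--> food
# Houston --nmod--> food
```

Over many query–sentence pairs, the projected dependencies would then be aggregated (e.g. by counting) to build the treebank, as the step list above describes.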

Result Examples

Results

• Random queries: QueryParser UAS 0.83, LAS 0.75; Stanford UAS 0.72, LAS 0.64
• Queries with no function words: QueryParser UAS 0.82, LAS 0.73; Stanford UAS 0.70, LAS 0.61
• Queries with function words: QueryParser UAS 0.90, LAS 0.85; Stanford UAS 0.86, LAS 0.80

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

• Measuring similarity between two short texts or sentences
• Basic idea: word-by-word comparison using embedding vectors
• Use a saliency-weighted semantic graph to compute similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired: bins of all edges, bins of max edges.

Similarity measurement (inspired by BM25), where s_l and s_s are the two short texts and sem(w, s_s) is the semantic similarity of term w to the short text s_s:

f_sts(s_l, s_s) = Σ_{w ∈ s_l} IDF(w) · sem(w, s_s) · (k1 + 1) / ( sem(w, s_s) + k1 · (1 − b + b · |s_s| / avgl) )
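A small sketch of this saliency-weighted similarity, assuming toy word vectors and IDF values; sem(w, s_s) is taken here to be the maximum cosine similarity between w and the words of s_s, which is one common instantiation rather than the paper's full feature set.

```python
# Sketch: BM25-inspired, saliency-weighted short-text similarity with word embeddings.
import numpy as np

def cos(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def f_sts(s_l, s_s, emb, idf, k1=1.2, b=0.75, avgl=3.0):
    score = 0.0
    for w in s_l:
        if w not in emb:
            continue
        sem = max((cos(emb[w], emb[v]) for v in s_s if v in emb), default=0.0)
        score += idf.get(w, 1.0) * sem * (k1 + 1) / (sem + k1 * (1 - b + b * len(s_s) / avgl))
    return score

# Toy 3-d "embeddings" and IDF weights (illustrative values only).
emb = {"thai": np.array([1.0, 0.1, 0.0]), "food": np.array([0.2, 1.0, 0.1]),
       "cuisine": np.array([0.3, 0.9, 0.2]), "houston": np.array([0.0, 0.1, 1.0]),
       "weather": np.array([0.1, 0.0, 0.9])}
idf = {"thai": 2.0, "food": 1.2, "cuisine": 1.5, "houston": 1.8, "weather": 1.7}

print(f_sts(["thai", "food", "houston"], ["thai", "cuisine", "houston"], emb, idf))
print(f_sts(["thai", "food", "houston"], ["houston", "weather"], emb, idf))
# the first pair should score higher than the second
```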

From the Concept View [Wang et al 2015a]
[Figure: each short text is parsed and conceptualized against the semantic (is-A) network and the co-occurrence network — parsing, term clustering by is-A, concept filtering by co-occurrence, head/modifier analysis, and concept orthogonalization — yielding bags of concepts: Concept Vector 1 = [(c1, score1), (c2, score2), …] and Concept Vector 2 = [(c1', score1'), (c2', score2'), …]; the similarity of the two short texts is then computed between the two concept vectors.]
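A minimal sketch of comparing two short texts through their bags of concepts; the concept vectors below are made-up outputs of a conceptualization step, and cosine similarity is used as one reasonable choice of vector comparison.

```python
# Sketch: similarity of two short texts as cosine similarity of their concept vectors.
import math

def cosine(c1, c2):
    num = sum(c1[k] * c2[k] for k in set(c1) & set(c2))
    den = math.sqrt(sum(v * v for v in c1.values())) * math.sqrt(sum(v * v for v in c2.values()))
    return num / den if den else 0.0

# Hypothetical concept vectors produced by conceptualization:
concepts_1 = {"movie": 0.6, "fantasy film": 0.3, "book": 0.1}     # "watch harry potter"
concepts_2 = {"movie": 0.5, "actor": 0.2, "fantasy film": 0.3}    # "daniel radcliffe film"
concepts_3 = {"fruit": 0.7, "company": 0.3}                       # "apple pear"

print(cosine(concepts_1, concepts_2))   # relatively high
print(cosine(concepts_1, concepts_3))   # 0.0 (no shared concepts)
```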

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

• Explicit short text understanding benefits a lot of application scenarios:
• Ads / search semantic match
• Definition mining
• Query recommendation
• Web table understanding
• Semantic search
• …

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

[Figure: bar charts of ads-keyword-selection gains by decile (Decile 4 – Decile 10), for mainline ads (left, y-axis 0.00–6.00) and sidebar ads (right, y-axis 0.00–0.60).]

Definition Mining [Hao et al 2016]

• Definition scenarios: search engines, QnA, etc.
• Why conceptualization is useful for definition mining — example: "What is Emphysema?"

Answer 1: "Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year."
Answer 2: "Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing."

• This sentence has the form of a definition.
• Embedding is helpful to some extent, but it also returns a high similarity score for (emphysema, disease) and for (emphysema, smoking).
• Conceptualization can provide strong semantics; contextual embedding can also provide semantic similarity beyond is-A.

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

[Figure: system architecture. Offline, training data (e.g. <Music, Score>) is used for concept weighting and model learning, producing one concept model per class (Class 1 … Class i … Class N). Online, an original short text such as "justin bieber graduates" goes through entity extraction and conceptualization against the knowledge base to get a concept vector, followed by candidate generation and classification & ranking against the per-class concept models.]

Concept based Short Text Classification and Ranking [Wang et al 2014a]
[Figure: a category (e.g. TV) is mapped into the concept space using the article titles / tags in that category; other categories (Music, Movie, …) are mapped the same way, giving one concept vector ω_i, ω_j, … per category. A query is mapped into the same concept space as a concept vector p_i, p_j, … and is classified / ranked by comparing it with the category vectors.]

Precision performance on each category [Wang et al 2014a]

Category | BocSTC | LM_ch | SVM | VSM_cosine | LM_d | Entity_ESA
Movie | 0.71 | 0.91 | 0.84 | 0.81 | 0.72 | 0.56
Money | 0.97 | 0.95 | 0.54 | 0.57 | 0.52 | 0.74
Music | 0.97 | 0.90 | 0.88 | 0.73 | 0.68 | 0.58
TV | 0.96 | 0.46 | 0.92 | 0.56 | 0.51 | 0.55

[Figure: bar chart of the same precision numbers (y-axis 0.3–1.0).]

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamie Taylor Freebase: a collaboratively created graph database for structuring human knowledge in SIGMOD 2008

bull [ Auer et al 2007 ] Sören Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWC/ASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Ré Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Börschinger Cäcilia Zirn and Anas Elghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The People's Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou "Understand Short Texts by Harvesting and Analyzing Semantic Knowledge" IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and Xindong Wu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10): 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny Boyes-Braem Basic objects in natural categories Cognitive psychology 8(3):382–439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddings In CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny Zhu Toward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Using Typicality for BLC

• Associate each isA relationship (e isA c) with typicality scores P(e|c) and P(c|e):

P(e|c) = n(c, e) / n(c)        P(c|e) = n(c, e) / n(e)

• P(e|c) indicates how typical (or popular) e is in the given concept c.
• P(c|e) indicates how typical (or popular) the concept c is given e.
• However, neither score alone is enough. E.g. for Microsoft: the concept "company" has high typicality P(c|e), while "largest desktop OS vendor" has high typicality P(e|c).

Naive Approach 2: PMI [Manning and Schutze 1999]
• Pointwise mutual information (PMI) is a measure of association used in information theory and statistics.
• Consider using the PMI between concept c and instance e to find the basic-level concepts:

PMI(e, c) = log [ P(e, c) / (P(e) P(c)) ] = log P(e|c) − log P(e)

• However, in basic level of categorization we are interested in finding a concept for a given e, which means P(e) is a constant.
• Thus ranking by PMI(e, c) is the same as ranking by P(e|c).

Using Rep(e, c) for BLC [Wang et al 2015b]
• The measure Rep(e, c) = P(c|e) · P(e|c) means:
• Given e, the concept c should be its typical concept (shortest distance).
• Given c, the instance e should be its typical instance (shortest distance).

• (Relation to PMI) Taking the logarithm of the scoring function:

log Rep(e, c) = log [ P(c|e) · P(e|c) ] = log [ P(e, c)/P(e) · P(e, c)/P(c) ] = log [ P(e, c)² / (P(e) P(c)) ] = PMI(e, c) + log P(e, c) = PMI²(e, c)

• (Relation to Commute Time) The expected commute time between an instance e and a concept c is

Time(e, c) = Σ_{k=1..∞} (2k) · P_k(e, c) = Σ_{k=1..T} (2k) · P_k(e, c) + Σ_{k=T+1..∞} (2k) · P_k(e, c)
           ≥ Σ_{k=1..T} (2k) · P_k(e, c) + 2(T + 1) · (1 − Σ_{k=1..T} P_k(e, c)) = 4 − 2 · Rep(e, c)

(taking T = 1, where P_1(e, c) = P(c|e) · P(e|c) = Rep(e, c) is the probability of a one-step round trip e → c → e). Maximizing Rep(e, c) is therefore a process of finding concept nodes having the shortest expected distance to e.
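A small sketch of ranking candidate concepts for an entity by Rep(e, c) = P(c|e) · P(e|c), computed from toy isA co-occurrence counts n(c, e); the counts are invented for illustration and stand in for a Probase-style taxonomy.

```python
# Sketch: basic-level conceptualization by Rep(e, c) = P(c|e) * P(e|c),
# with P(e|c) = n(c, e) / n(c) and P(c|e) = n(c, e) / n(e).
from collections import defaultdict

# Toy isA counts n(c, e): how often e is observed under concept c (made-up numbers).
n_ce = {
    ("company", "microsoft"): 500, ("company", "apple"): 800,
    ("company", "ibm"): 700, ("company", "google"): 600,
    ("software company", "microsoft"): 450, ("software company", "adobe"): 150,
    ("largest desktop os vendor", "microsoft"): 20,
}

n_c = defaultdict(int)          # n(c) = sum over e of n(c, e)
n_e = defaultdict(int)          # n(e) = sum over c of n(c, e)
for (c, e), cnt in n_ce.items():
    n_c[c] += cnt
    n_e[e] += cnt

def rep(e, c):
    cnt = n_ce.get((c, e), 0)
    return (cnt / n_e[e]) * (cnt / n_c[c])    # P(c|e) * P(e|c)

entity = "microsoft"
scores = {c: rep(entity, c) for c in n_c}
for c, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{c:28s} Rep={s:.4f}")
# With these counts, "software company" beats both the over-general "company"
# and the over-specific "largest desktop os vendor".
```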

Evaluations on Different Measures for BLC

Precision@k (k = 1, 2, 3, 5, 10, 15, 20):

No smoothing
MI(e)             | 0.769 | 0.692 | 0.705 | 0.685 | 0.719 | 0.705 | 0.690
PMI3(e)           | 0.885 | 0.769 | 0.756 | 0.800 | 0.754 | 0.733 | 0.721
NPMI(e)           | 0.692 | 0.692 | 0.667 | 0.638 | 0.627 | 0.610 | 0.610
Typicality P(c|e) | 0.462 | 0.577 | 0.603 | 0.577 | 0.569 | 0.564 | 0.556
Typicality P(e|c) | 0.500 | 0.462 | 0.526 | 0.523 | 0.523 | 0.510 | 0.521
Rep(e)            | 0.846 | 0.865 | 0.872 | 0.862 | 0.758 | 0.731 | 0.719

Smoothing = 0.001
MI(e)             | 0.577 | 0.615 | 0.628 | 0.600 | 0.612 | 0.605 | 0.592
PMI3(e)           | 0.731 | 0.673 | 0.692 | 0.654 | 0.669 | 0.644 | 0.623
NPMI(e)           | 0.923 | 0.827 | 0.769 | 0.746 | 0.731 | 0.695 | 0.671
Typicality P(c|e) | 0.462 | 0.577 | 0.603 | 0.577 | 0.569 | 0.564 | 0.554
Typicality P(e|c) | 0.885 | 0.865 | 0.872 | 0.831 | 0.785 | 0.741 | 0.704
Rep(e)            | 0.846 | 0.731 | 0.718 | 0.723 | 0.700 | 0.669 | 0.638

Smoothing = 0.0001
MI(e)             | 0.615 | 0.615 | 0.654 | 0.608 | 0.635 | 0.628 | 0.612
PMI3(e)           | 0.846 | 0.731 | 0.731 | 0.715 | 0.723 | 0.685 | 0.677
NPMI(e)           | 0.885 | 0.904 | 0.885 | 0.869 | 0.823 | 0.777 | 0.752
Typicality P(c|e) | 0.462 | 0.577 | 0.603 | 0.577 | 0.569 | 0.564 | 0.556
Typicality P(e|c) | 0.885 | 0.904 | 0.910 | 0.877 | 0.831 | 0.813 | 0.777
Rep(e)            | 0.923 | 0.846 | 0.833 | 0.815 | 0.781 | 0.736 | 0.719

Smoothing = 1e-5
MI(e)             | 0.615 | 0.635 | 0.667 | 0.662 | 0.677 | 0.656 | 0.646
PMI3(e)           | 0.885 | 0.769 | 0.744 | 0.777 | 0.758 | 0.731 | 0.710
NPMI(e)           | 0.885 | 0.846 | 0.872 | 0.869 | 0.831 | 0.810 | 0.787
Typicality P(c|e) | 0.462 | 0.577 | 0.603 | 0.577 | 0.569 | 0.564 | 0.556
Typicality P(e|c) | 0.769 | 0.808 | 0.846 | 0.823 | 0.808 | 0.782 | 0.765
Rep(e)            | 0.885 | 0.904 | 0.872 | 0.862 | 0.812 | 0.800 | 0.767

Smoothing = 1e-6
MI(e)             | 0.769 | 0.673 | 0.705 | 0.677 | 0.700 | 0.692 | 0.679
PMI3(e)           | 0.885 | 0.769 | 0.756 | 0.785 | 0.773 | 0.726 | 0.723
NPMI(e)           | 0.885 | 0.846 | 0.821 | 0.815 | 0.750 | 0.726 | 0.719
Typicality P(c|e) | 0.462 | 0.577 | 0.603 | 0.577 | 0.569 | 0.564 | 0.556
Typicality P(e|c) | 0.538 | 0.615 | 0.615 | 0.615 | 0.608 | 0.613 | 0.615
Rep(e)            | 0.846 | 0.885 | 0.897 | 0.877 | 0.788 | 0.777 | 0.765

Smoothing = 1e-7
MI(e)             | 0.769 | 0.692 | 0.705 | 0.685 | 0.719 | 0.703 | 0.688
PMI3(e)           | 0.885 | 0.769 | 0.756 | 0.792 | 0.758 | 0.736 | 0.725
NPMI(e)           | 0.769 | 0.750 | 0.718 | 0.700 | 0.650 | 0.641 | 0.633
Typicality P(c|e) | 0.462 | 0.577 | 0.603 | 0.577 | 0.569 | 0.564 | 0.556
Typicality P(e|c) | 0.500 | 0.481 | 0.526 | 0.523 | 0.531 | 0.523 | 0.523
Rep(e)            | 0.846 | 0.865 | 0.872 | 0.854 | 0.765 | 0.749 | 0.733

NDCG@k (k = 1, 2, 3, 5, 10, 15, 20):

No smoothing
MI(e)             | 0.516 | 0.531 | 0.519 | 0.531 | 0.562 | 0.574 | 0.594
PMI3(e)           | 0.725 | 0.664 | 0.652 | 0.660 | 0.628 | 0.631 | 0.646
NPMI(e)           | 0.599 | 0.597 | 0.579 | 0.554 | 0.540 | 0.539 | 0.549
Typicality P(c|e) | 0.297 | 0.380 | 0.409 | 0.422 | 0.438 | 0.446 | 0.461
Typicality P(e|c) | 0.401 | 0.386 | 0.396 | 0.398 | 0.401 | 0.410 | 0.428
Rep(e)            | 0.758 | 0.771 | 0.745 | 0.723 | 0.656 | 0.647 | 0.661

Smoothing = 1e-3
MI(e)             | 0.374 | 0.414 | 0.441 | 0.448 | 0.473 | 0.481 | 0.495
PMI3(e)           | 0.484 | 0.511 | 0.509 | 0.502 | 0.519 | 0.525 | 0.533
NPMI(e)           | 0.692 | 0.652 | 0.607 | 0.603 | 0.585 | 0.585 | 0.592
Typicality P(c|e) | 0.297 | 0.380 | 0.409 | 0.422 | 0.438 | 0.446 | 0.460
Typicality P(e|c) | 0.703 | 0.697 | 0.704 | 0.681 | 0.637 | 0.628 | 0.626
Rep(e)            | 0.621 | 0.580 | 0.554 | 0.561 | 0.554 | 0.555 | 0.559

Smoothing = 1e-4
MI(e)             | 0.407 | 0.430 | 0.458 | 0.462 | 0.492 | 0.503 | 0.512
PMI3(e)           | 0.648 | 0.604 | 0.579 | 0.575 | 0.578 | 0.576 | 0.590
NPMI(e)           | 0.747 | 0.777 | 0.761 | 0.737 | 0.700 | 0.685 | 0.688
Typicality P(c|e) | 0.297 | 0.380 | 0.409 | 0.422 | 0.438 | 0.446 | 0.461
Typicality P(e|c) | 0.791 | 0.795 | 0.802 | 0.767 | 0.738 | 0.729 | 0.724
Rep(e)            | 0.758 | 0.714 | 0.711 | 0.689 | 0.653 | 0.636 | 0.653

Smoothing = 1e-5
MI(e)             | 0.429 | 0.465 | 0.478 | 0.501 | 0.517 | 0.528 | 0.545
PMI3(e)           | 0.725 | 0.647 | 0.642 | 0.642 | 0.627 | 0.624 | 0.638
NPMI(e)           | 0.813 | 0.779 | 0.778 | 0.765 | 0.730 | 0.723 | 0.729
Typicality P(c|e) | 0.297 | 0.380 | 0.409 | 0.422 | 0.438 | 0.446 | 0.461
Typicality P(e|c) | 0.709 | 0.728 | 0.735 | 0.722 | 0.702 | 0.696 | 0.703
Rep(e)            | 0.791 | 0.787 | 0.762 | 0.739 | 0.707 | 0.703 | 0.706

Smoothing = 1e-6
MI(e)             | 0.516 | 0.510 | 0.515 | 0.526 | 0.546 | 0.563 | 0.579
PMI3(e)           | 0.725 | 0.655 | 0.651 | 0.654 | 0.641 | 0.631 | 0.649
NPMI(e)           | 0.791 | 0.766 | 0.732 | 0.728 | 0.673 | 0.659 | 0.668
Typicality P(c|e) | 0.297 | 0.380 | 0.409 | 0.422 | 0.438 | 0.446 | 0.461
Typicality P(e|c) | 0.495 | 0.516 | 0.520 | 0.508 | 0.512 | 0.521 | 0.540
Rep(e)            | 0.758 | 0.784 | 0.767 | 0.755 | 0.691 | 0.686 | 0.694

Smoothing = 1e-7
MI(e)             | 0.516 | 0.531 | 0.519 | 0.530 | 0.562 | 0.571 | 0.592
PMI3(e)           | 0.725 | 0.664 | 0.652 | 0.658 | 0.630 | 0.631 | 0.647
NPMI(e)           | 0.670 | 0.655 | 0.633 | 0.604 | 0.575 | 0.570 | 0.581
Typicality P(c|e) | 0.297 | 0.380 | 0.409 | 0.422 | 0.438 | 0.446 | 0.461
Typicality P(e|c) | 0.423 | 0.421 | 0.415 | 0.407 | 0.414 | 0.424 | 0.438
Rep(e)            | 0.758 | 0.771 | 0.745 | 0.725 | 0.663 | 0.661 | 0.668

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

What is the Semantic Similarity?
• Are the following instance pairs similar?

bull ltapple microsoftgt

bull ltapple peargt

bull ltapple fruitgt

bull ltapple foodgt

bull ltapple ipadgt

bull ltcar journeygt

Approaches on Term Similarity

• Categories of approaches for semantic similarity:
• String based approach
• Knowledge based approach: use preexisting thesauri, taxonomies, or encyclopedias such as WordNet
• Corpus based approach: use contexts of terms extracted from web pages, web search snippets, or other text repositories
• Embedding based approach: will be introduced in detail in "Part 3: Implicit Understanding"

Approaches on Term Similarity (2)

• Categories
[Figure: taxonomy of term-similarity approaches — string based approaches; knowledge based approaches (WordNet), split into path length / lexical chain-based (e.g. Rada 1989, Hirst 1998) and information content-based (e.g. Resnik 1995, Jcn 1997, Lin 1998, Sánchez 2011); corpus based approaches, split into graph learning algorithm based and snippet search based; other cited work includes Agirre 2010, Alvarez 2007, HunTray 2005, Do 2009, Bol 2011, Chen 2006, Ban 2002. State-of-the-art approaches are highlighted.]

Term Similarity Using Semantic Networks [Li et al 2013 Li et al 2015]

[Figure: framework for term pairs <t1, t2>.
Step 1 — Type Checking: decide whether <t1, t2> is a concept pair, an entity pair, or a concept–entity pair.
Step 2 — Context Representation (vector): collect a concept-distribution context for concept terms and an entity-distribution context for entity terms; for an entity term, collect its concepts, cluster them, and select the top-k concepts of each cluster Ci(t1) as context vectors.
Step 3 — Context Similarity: for concept / entity pairs, Similarity Evaluation = Cosine(T(t1), T(t2)); with concept clustering, take Max over (x, y) of Cosine(Cx(t1), Cy(t2)); for concept–entity pairs, take the maximum sim(t2, cx) over each pair <t2, cx>.]

An example [Li et al 2013 Li et al 2015]

For example, <banana, pear>:
• Step 1 — Type Checking: <banana, pear> is an entity pair.
• Step 2 — Context Representation (vector): collect the concept contexts of "banana" and "pear".
• Step 3 — Context Similarity: Similarity Evaluation Cosine(T(t1), T(t2)) = 0.916.

Examples

Term 1 | Term 2 | Similarity
lunch | dinner | 0.9987
tiger | jaguar | 0.9792
car | plane | 0.9711
television | radio | 0.9465
technology company | microsoft | 0.8208
high impact sport | competitive sport | 0.8155
employer | large corporation | 0.5353
fruit | green pepper | 0.2949
travel | meal | 0.0426
music | lunch | 0.0116
alcoholic beverage | sports equipment | 0.0314
company | table tennis | 0.0003

Complete results: http://adapt.seiee.sjtu.edu.cn/similarity/SimCompleteResults.pdf

Statistics of Search Queries

• Query length distribution:
(a) By traffic: 1 term 44%, 2 terms 29%, 3 terms 17%, 4 terms 7%, 5 terms 2%, more than 5 terms 1%.
(b) By # of distinct queries: 1 term 10%, 2 terms 26%, 3 terms 34%, 4 terms 19%, 5 terms 7%, more than 5 terms 4%.
• Example single-instance queries: "Pokémon Go", "Microsoft HoloLens".
[Figure: the corresponding pie charts, plus pie charts of the number of instances per query (1, 2, 3, 4, 5, more than 5 instances).]

If the short text has context for the instance…
• python tutorial
• dangerous python
• moon earth distance
• …

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Supervised Segmentation [Bergsma et al 2007]

• Problem: divide a query into semantic units.
• Approach: turn segmentation into position-based binary classification.
• Example query: "two man power saw" — candidate segmentations include [two man] [power saw], [two] [man] [power saw], [two] [man power] [saw].
• Input: a query and its (word-boundary) positions.
• Output: the decision whether to place a segmentation break at each position.

Supervised Segmentation

• Features:
• Decision-boundary features — e.g. indicator features (the POS tags in the query), position features (forward/backward).
• Statistical features — e.g. mutual information between the left and right parts, as in "bank loan | amortization schedule".
• Context features — e.g. surrounding context information, as in "female bus driver".
• Dependency features — e.g. "bus" depends on "driver".

Supervised Segmentation
• Segmentation overview: input query "two man power saw"; an SVM classifier, trained on the features above, outputs a yes/no segmentation decision for each position. A hedged code sketch follows.
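A hedged sketch of position-based binary classification for query segmentation; the features (a single association score per boundary plus word lengths) and the linear SVM are a simplification of the feature set described above, with made-up statistics.

```python
# Sketch: position-based binary classification for query segmentation.
# Each boundary between two adjacent words becomes one training example;
# the label says whether a segment break occurs there.
import numpy as np
from sklearn.svm import LinearSVC

# Made-up association scores between adjacent words (stand-in for MI / web counts).
ASSOC = {("two", "man"): 2.1, ("man", "power"): -0.8, ("power", "saw"): 1.7,
         ("bank", "loan"): 1.9, ("loan", "amortization"): -0.5,
         ("amortization", "schedule"): 1.5}

def boundary_features(left, right):
    assoc = ASSOC.get((left, right), 0.0)
    return [assoc, len(left), len(right)]          # tiny illustrative feature vector

# Training data: queries with gold segmentations (break = 1, no break = 0).
train = [("two man power saw",               [0, 1, 0]),
         ("bank loan amortization schedule", [0, 1, 0])]

X, y = [], []
for query, labels in train:
    w = query.split()
    for (l, r), lab in zip(zip(w, w[1:]), labels):
        X.append(boundary_features(l, r))
        y.append(lab)

clf = LinearSVC().fit(np.array(X), np.array(y))

def segment(query):
    w = query.split()
    segs, cur = [], [w[0]]
    for l, r in zip(w, w[1:]):
        if clf.predict([boundary_features(l, r)])[0] == 1:
            segs.append(cur)
            cur = [r]
        else:
            cur.append(r)
    segs.append(cur)
    return segs

print(segment("two man power saw"))    # e.g. [['two', 'man'], ['power', 'saw']]
```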

Unsupervised Segmentation [Tan et al 2008]

• Unsupervised learning for query segmentation.
• Probability of a generated segmentation S (a sequence of segments s1 … sm) for query Q:

P(S|Q) = P(s1) P(s2|s1) … P(sm|s1 s2 … s(m-1)) ≈ Π_{si ∈ S} P(si)    (unigram model over segments)

• A position is a valid segment boundary if and only if the pointwise mutual information between the two segments resulting from the split is negative:

MI(sk, sk+1) = log [ Pc([sk, sk+1]) / (Pc(sk) · Pc(sk+1)) ] < 0

• Example: "new york times subscription" — log [ Pc([new york]) / (Pc(new) · Pc(york)) ] > 0, so there is no segment boundary between "new" and "york".

Unsupervised Segmentation

• Find the top-k segmentations by dynamic programming over the words of the query.
• Use EM optimization on the fly.
• Input: query w1 w2 … wn and the concept (segment) probability distribution.
• Output: the top-k segmentations with the highest likelihood. (A code sketch of the dynamic program follows.)
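A minimal sketch of the dynamic program under the unigram segment model, returning only the single best segmentation; the segment probabilities are made up, and the EM re-estimation step is omitted.

```python
# Sketch: best segmentation under P(S|Q) ~ product of P(segment), via dynamic programming.
import math

P_SEG = {"new york": 3e-4, "new york times": 2e-4, "times": 1e-3,
         "new": 2e-3, "york": 8e-4, "subscription": 5e-4}
P_UNSEEN = 1e-8            # back-off probability for unseen segments

def seg_prob(words):
    return P_SEG.get(" ".join(words), P_UNSEEN)

def best_segmentation(query):
    w = query.split()
    n = len(w)
    best = [(0.0, [])] + [(-math.inf, None)] * n   # best[i] = (log-prob, segmentation of w[:i])
    for i in range(1, n + 1):
        for j in range(i):
            score = best[j][0] + math.log(seg_prob(w[j:i]))
            if score > best[i][0]:
                best[i] = (score, best[j][1] + [" ".join(w[j:i])])
    return best[n]

print(best_segmentation("new york times subscription"))
# e.g. (..., ['new york times', 'subscription'])
```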

Exploit Click-through [Li et al 2011]

• Motivation:
• Probabilistic query segmentation
• Use click-through data (query → clicked URL → document)

Input query: "bank of america online banking"
Output — top-3 segmentations:
[bank of america] [online banking]     0.502
[bank of america online banking]       0.428
[bank of] [america] [online banking]   0.001

Exploit Click-through
• Segmentation Model: an interpolated model combining global information with click-through information.
Example: the segmentation [credit card] [bank of America] is supported by clicked HTML documents containing sentences such as:
1. "bank of america credit cards contact us overview"
2. "secured visa credit card from bank of america"
3. "credit cards overview find the right bank of america credit card for you"

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Sense Changes with Different Context

watch harry potter → Movie    read harry potter → Book    age harry potter → Character    harry potter walkthrough → Game

Entity Recognition in Query [Guo et al 2009]

• Motivation: detect the named entity in a short text and categorize it.
• Example: the single-named-entity query "harry potter walkthrough" yields the triple <e, t, c> = ("harry potter", "walkthrough", "game"), where e is the (ambiguous) named entity, t the context term, and c the class of the entity.

Entity Recognition in Query

• Probabilistic Generative Model
• Goal: given a query q, find the triple <e, t, c> that maximizes the probability.
• Probability to generate a triple: Pr(e, t, c) = Pr(e) · Pr(c|e) · Pr(t|c), assuming the context depends only on the class — e.g. "walkthrough" depends only on the class game, not on "harry potter".
• Objective: given query q, find argmax_{<e,t,c>} Pr(e, t, c); the problem then becomes how to estimate Pr(e), Pr(c|e), and Pr(t|c).
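A toy sketch of scoring <e, t, c> triples with the factorization above; the probability tables are invented, and in a real system they would be estimated from query logs (e.g. with WS-LDA, as the next slide notes).

```python
# Sketch: rank <entity, context, class> triples for a query by Pr(e) * Pr(c|e) * Pr(t|c).

P_E = {"harry potter": 0.02, "harry": 0.005}                      # Pr(e)
P_C_GIVEN_E = {"harry potter": {"game": 0.3, "movie": 0.5, "book": 0.2},
               "harry":        {"person": 1.0}}                   # Pr(c|e)
P_T_GIVEN_C = {"game":   {"walkthrough": 0.2, "cheats": 0.1},
               "movie":  {"trailer": 0.2, "walkthrough": 0.001},
               "book":   {"pdf": 0.1, "walkthrough": 0.001},
               "person": {"walkthrough": 0.0001}}                 # Pr(t|c)

def interpret(query):
    candidates = []
    for e in P_E:
        if e in query:
            t = query.replace(e, "").strip()                      # remaining context words
            for c in P_C_GIVEN_E[e]:
                p = P_E[e] * P_C_GIVEN_E[e][c] * P_T_GIVEN_C[c].get(t, 1e-6)
                candidates.append((p, e, t, c))
    return max(candidates)

print(interpret("harry potter walkthrough"))
# (0.0012, 'harry potter', 'walkthrough', 'game')
```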

Entity Recognition in Query
• Probability Estimation by Learning
• Learning objective: max Π_{i=1..N} P(e_i, t_i, c_i).
• Challenge: it is difficult as well as time-consuming to manually assign class labels c_i to the named entities in queries.
• Build a training set T = {(e_i, t_i)} and view c_i as a hidden variable. The new learning problem becomes

max Π_{i=1..N} P(e_i, t_i) = max Π_{i=1..N} Σ_c P(e_i) P(c|e_i) P(t_i|c),

which is solved with the topic model WS-LDA.

Signal from Click [Pantel et al 2012]

bull Motivation

Predict entity type in Web search

entity

user intent

context

click

Query type distribution (73 types)

Generative model

entity type

T

TK

K2

Signal from Click

bull Joint Model for Prediction

t

τ

i

n c

θ

φ

ω

Q

Distribution over types

Intent distribution

Pick type

Pick entity

Pick intent

Pick click

Word distribution

Host distribution

Entity distribution

For each Query

Pick context words

Telegraphic Query interpretation [Sawant et al 2013 Joshi et al 2014]

bull Entity-seeking Telegraphic Queries

bull Interpretation = Segmentation + Annotation

Knowledge base Large corpus

accuracy recall

Germany capital

Berlin

Query

Result Entity

bull Overview

Joint Interpretation and Ranking [Sawant et al 2013 Joshi et al 2014]

Annotated Corpus

Telegraphic Query

e1e2e3

Two Models for Interpretation and Ranking

Generative Model

Discriminative Model

Output

bull Generative Model

Joint Interpretation and Ranking [Sawant et al 2013]

San Diego Padres

Major league baseball team

type context

E

T Padres have been to two World

Series losing in 1984 and 1998

Type hint

baseball team

losing team baseball world series 1998

Z

Context matchers

lost 1998 world seriesswitch

model model

q losing team baseball world series 1998

Borrow from U Sawant (2013)

Based on Probabilistic Language Models

bull Discriminative Model

Joint Interpretation and Ranking [Sawant et al 2013]

San_Diego_Padres

losing team baseball world

series 1998

(baseball team)

losing team baseball world

series 1998

(baseball team)

losing team baseball world

series 1998

(t = baseball team)

1998_World_Series

losing team baseball world

series 1998

(series)

losing team baseball world

series 1998

(series)

losing team baseball world

series 1998

(t = series)

Correct entity Incorrect entity

Based on max-margin discriminative learning

bull Queries seek answer entities (e2)

bull Contain (query) entities (e1) target types (t2) relations (r) and selectors (s)

Telegraphic Query Interpretation [Joshi et al 2014]

query e1 r t2 s

dave navarro first band

dave navarro band band first

dave navarro - band first

spider automobile company

spider automobile company

automobile company

-

automobile company company spider

Borrow from M Joshi (2014)

Improved Generative Model

bull Generative Model[Sawant et al 2013]

[Joshi et al 2014]Consider e1

(in q) and r

Improved Discriminative Model

bull Discriminative Model[Sawant et al 2013]

[Joshi et al 2014]

Consider e1

(in q) and r

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

bull Input a short text

bull Output semantic interpretation

bull Three steps in understanding a short text

wanna watch eagles band

watch[verb] eagles[entity](band) band[concept]

wanna watch eagles band watch[verb] eagles[entity](band) band[concept]

watch eagles band watch[verb] eagles[entity] band[concept]

Step 1 Text Segmentation ndash divide into a sequence of terms in vocabulary

Step 2 Type detection ndash determine the best type of each term

Step 3 Concept Labeling ndash infer the best concept of each entity within context

Text segmentationbull Observations

bull Mutual Exclusion ndash terms containing the same word mutually exclude each other

bull Mutual Reinforcement ndash related terms mutually reinforce each other

bull Build a Candidate Term Graph (CTG)

ldquovacation april in parisrdquo ldquowatch harry potterrdquo

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Find best segmentation

bull Best segmentation= sub-graph in CTG whichbull Is a complete graph (clique)

bull No mutual exclusion

bull Has 100 word coveragebull Except for stopwords

bull Has the largest average edge weight

Is a segmentation

Best segmentation

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Find best segmentation

bull Best segmentation= sub-graph in CTG whichbull Is a complete graph (clique)

bull No mutual exclusion

bull Has 100 word coveragebull Except for stopwords

bull Has the largest average edge weight

Maximal Clique

Best segmentation

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Type Detection

bull Pairwise Modelbull Find the best typed-term for each term so that the

Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight

watch[v]

watch[e]

watch[c]

watch

free[adj]

free[v]

movie[c]

movie[e]

free

movie

Concept Labeling

bull Entity disambiguation is the most important task of concept labelingbull Filterre-rank of the original concept cluster vector

bull Weighted-Votebull The final score of each concept cluster is a combination

of its original score and the support from context using concept co-occurrence

watch harry potter read harry potter

movie book

Example of Entity Disambiguation[Hua et al 2015 (ICDE Best Paper) Hua et al 2016]

Co-occurrence network

Concept Vector

Semantic network

Short Text

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

c1 p1

c2 p2

c3 p3

hellip

ipad apple

fruithellip

companyhellip

foodhellip

producthellip

Is-A

filtering

ldquoipad applerdquo

producthellip

devicehellip

producthellip

brandhellip

companyhellip

devicehellip

co-occur

Is-A

Mining Lexical Relationships[Wang et al 2015b]

bull Lexical knowledge represented by the probabilities

verb

product book

movie

watch harry potter

119901 119907119890119903119887 119908119886119905119888ℎ

119901 119894119899119904119905119886119899119888119890 119908119886119905119888ℎ

119901 119898119900119907119894119890 ℎ119886119903119903119910 119901119900119905119905119890119903

119901 119898119900119907119894119890 119908119886119905119888ℎ 119907119890119903119887

119901 119887119900119900119896 ℎ119886119903119903119910 119901119900119905119905119890119903

119901 119888 119905 119911

119901 119888 119890 =119901 119888 119905 119911 = 119894119899119904119905119886119899119888119890

119901 119911 119905 ①②

③e instancet termc conceptz role

Understanding Queries [Wang et al 2015b]

bull Goal to rank the concepts and findarg max

119888119901(119888|119905 119902)

The offline semantic network

QueryAll possible

segmentations

Random walk with restart [Sun et al 2005]on the online subgraph

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

bull Example ldquopopular smart cover iphone 5srdquo

bull Definition bull Head acts to name the general (semantic) category to which the

whole short text belongs Usually the head is the intent of the short text

bull ldquosmart coverrdquo intent of the query

bull Constraints distinguish this member from other members of the same category

bull ldquoiphone 5srdquo limit the type of the head

bull Non-Constraint Modifiers (aka Pure Modifiers) are subjectivemodifiers which can be dropped without changing intent

bull ldquopopularrdquo subjective can be neglected

Non-Constraint Modifiers Mining Construct Modifier Networks

Edges form a Modifier Network

Concept Hierarchy Tree in ldquoCountryrdquo domain

Modifier Network in ldquoCountryrdquo domainIn this case ldquoLargerdquo and ldquoToprdquo are pure modifiers

Country

Asian country

Developed country

Western country

Asian Developed Western

Western developed

country

Top western country

Large

Large Top

Top

WesternLarge Asian

country

Large developed

country

Top developed

country

Country

Asian Western

Developed

Large Top

bull Betweenness centrality is a measure of a nodes centrality in a network

bull Betweennes of node v is defined as

bull where 120590119904119905 is the total number of shortest paths from node s to node t and 120590119904119905(119907) is the number of those paths that pass through v

bull Normalization amp Aggregation

bull For a pure modifier it should have low betweenness centrality aggregation score PMS(t)

Non-Constraint Modifiers Mining Betweenness centrality

Head-Constraints Mining [Wang et al 2014b]

bull A term can be a head sometimes and be a constraint in some other cases

bull Eg Seattle hotel Seattle hotel job

head headconstraintconstraintconstraint

Head-Constraints Mining Acquiring Concept Patterns

Get entity pairs from query log

Conceptualization

Concept Patterns for each

prepositionsentity1 entity2

Extract Patterns

A for B A of BA with B A in BA on B A at B hellip

entity 1head entity 2constraint

concept11

concept12

concept13

concept14

concept21

concept22

concept23

(concept11 concept21) (concept11 concept22)(concept11 concept23)hellip

Concept Pattern Dictionary

Building concept pattern dictionary Query Logs

cover for iphone 6sbattery for sony a7rwicked on broadway

Why Concepts Canrsquot Be Too Generalbull It may cause too many concept pattern conflicts

canrsquot distinguish head and modifier for general concept pairs

Head Modifier

Derived Concept Pattern device company

Supporting Entity Pairs iphone 4 verizon

modem comcast

wireless router comcast

iphone 4 tmobile

Head Modifier

Derived Concept Pattern company device

Supporting Entity Pairs amazon books kindle

netflix touchpad

skype windows phone

netflix ps3

Conflict

Why Concepts Canrsquot Be Too Specificbull It may generate concepts with little coverage

bull Concept regresses to entitybull Large storage space up to (million million) patterns

hellip hellip

device largest desktop OS vendor

device largest software development company

device largest global corporation

device latest windows and office provider

hellip hellip

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept PatternsCluster size Sum of Cluster Score headconstraintscore

615 2114691 breedstate357298460224501

296 7752357 gameplatform627403476771856

153 3466804 accessoryvehicle53393705094809

70 118259 browserplatform132612807637391

22 1010993 requirementschool271407526294823

34 9489159 drugdisease154602405333541

42 8992995 cosmeticskin condition814659415003929

16 7421599 jobcity27903732555528

32 710403 accessoryphone246513830851194

18 6692376 softwareplatform210126322725878

20 6444603 testdisease239774028397537

27 5994205 clothesbreed98773996282851

19 5913545 penaltycrime200544192793488

25 5848804 taxstate240081818612579

16 5465424 saucemeat183592863621553

18 4809389 credit cardcountry142919087972152

14 4730792 foodholiday14554140330924

11 4536199 modgame257163856882439

29 4350954 garmentsport471533326845442

23 3994886 career informationprofessional732726483731257

15 386065 songinstrument128189481818135

18 378213 baitfish780426514113169

22 3722948 study guidebook508339765053921

19 3408953 pluginsbrowser550326072627126

14 3305753 recipemeat882779863422951

18 3214226 currencycountry110825444188352

13 3180272 lenscamera186081673263957

9 316973 decorationholiday130055844126533

16 314875 foodanimal7338544366514

game platform

game device

video game platform

game console game pad

game gaming platform

Game (Head) Platform (Modifier)

angry birds android

angry birds ios

angry birds windows 10

hellip hellip

Detection

Head Modifier Relationship

bull Train a classifier on

(head-embedding modifier-embedding)

bull Training data bull Positive (head modifier)bull Negative (modifier head)

bull Precision gt= 09 Recall gt= 09

bull Disadvantage not interpretable

Syntactic Parsing based on HM

bull Information is incompletebull Preposition and other function words

bull Within a noun compound el capitan macbook pro

bull Why not train a parser for web queries

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges Short Texts Lack Grammatical Signalsbull Lack function words word order

bull ldquotoys queriesrdquo has ambiguous intent

bull ldquodistance earth moonrdquo has clear intentbull many equivalent forms ldquoearth moon distancerdquo ldquoearth

distance moonrdquo hellip

Challenges Syntactic Parsing of Queries

bullNo standard

bullNo ground-truth

Why is syntactic parsing of queries even a legitimate problem

Derive Syntax from Semantics[Sun et al 2016]

bull Query ldquothai food houstonrdquo

bull Clicked sentence

bull Project dependency to the query

A Treebank for Short Texts

bull Given query 119902

bull Given 119902rsquos clicked sentence 119904

bull Parse each 119904

bull Project dependency from 119904 to 119902

bull Aggregate dependencies

Algorithm of Projection

Result Examples

Results

bull Random queries

QueryParser UAS 083 LAS 075Stanford UAS 072 LAS 064

bull Queries with no function words

QueryParser UAS 082 LAS 073Stanford UAS 070 LAS 061

bull Queries with function words

QueryParser UAS 090 LAS 085Stanford UAS 086 LAS 080

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

bull Measuring similarity between two short texts and sentences

bull Basic idea word-by-word comparison using embedding vector

bull Use saliency-weighted semantic graph to computer similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired

Bins of all edges Bins of max edges

119908isin119904119868

119868119863119865(119908) sdot)119904119890119898(119908 119904119904) sdot (1198961 + 1

൰119904119890119898(119908 119904119904) + 1198961 sdot (1 minus 119887 + 119887 sdot|119904119904|119886119907119892119897

Similarity measurement

termShort texts

Inspired by BM25

Semantic similarity

119891119904119905119904(119904119897 119904119904) =

From the Concept View

From the Concept View [Wang et al 2015a]

Co-occurrence Network

Bags of Concepts

Semantic Network

Short Text 1

Short Text 2

Concept Vector 1[(c1 score1) (c2 score2)hellip]

Concept Vector 2[(c1rsquo score1rsquo) (c2rsquo score2rsquo)hellip]

Similarity

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

bull Explicit short text understanding benefit lot of application scenariosbull Adssearch semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

000

100

200

300

400

500

600

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

000

010

020

030

040

050

060

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

Mainline Ads Sidebar Ads

Definition Mining [Hao et al 2016]

bull Definition scenarios search engines QnA etc

bull Why Conceptualization is useful for definition miningbull Examples ldquoWhat is Emphysemardquo

Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

bull This sentence has the form of definitionbull Embedding is helpful to some extent but it also return high similarity

score for (emphysema disease) and (emphysema smoking)

bull Conceptualization can provide strong semanticsbull Contextual embedding can also provide semantic similarity beyond Is-A

Answer 1

Answer 2

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

OfflineOffline OnlineOnline

Original Short textjustin bieber graduates

hellip

Knowledge base

Conceptualiztion

Concept Vector

Entity Extraction

Candidates Generation

Classification amp Ranking

Model LearningModel Learning

Concept Weighting

Model Model NModel i

Concept Model Concept Model

Class 1 Class NClass i

TrainingData

ltMusic Scoregt

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept SpaceArticle titlestagsin this category

119901119894

119901119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

hellip

hellip

120596119894

120596119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

120596119894

120596119895119901119894

119901119895

Precision performance on each category [Wang et al 2014a]

BocSTC LM_ch SVMVSM_cosi

neLM_d Entity_ESA

Movie 071 091 084 081 072 056

Money 097 095 054 057 052 074

Music 097 090 088 073 068 058

TV 096 046 092 056 051 055

0304050607080910

Pre

cisi

on

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledgeltigt in SIGMOD 2008

bull [ Auer et al 2007 ] Soumlren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Boumlrschinger Caumlcilia Zirn and AnasElghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and XindongWu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny BoyesBraem Basic objects in natural categories Cognitive psychology 8(3)382ndash439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny ZhuToward Topic Search on the Web In International Conference on Conceptual Modeling October 2012


Naive Approach 2: PMI [Manning and Schutze 1999]

bull Pointwise mutual information (PMI) is a measure of association used in information theory and statistics

bull Consider using the PMI between concept c and instance e to find the basic-level concepts as follows

$$PMI(e, c) = \log \frac{P(e, c)}{P(e)\, P(c)} = \log P(e \mid c) - \log P(e)$$

bull However, in basic level of categorization we are interested in finding a concept for a given e, which means P(e) is a constant

bull Thus ranking by PMI(e c) is the same as ranking by P(e|c)

Using Rep(e c) for BLC [Wang et al 2015b]

bull The measure Rep(e, c) = P(c|e) * P(e|c) means:

bull (With PMI) If we take the logarithm of our scoring function we get

$$\log Rep(e, c) = \log \left[ P(c \mid e) \cdot P(e \mid c) \right] = \log \left[ \frac{P(e, c)}{P(e)} \cdot \frac{P(e, c)}{P(c)} \right] = \log \frac{P(e, c)^2}{P(e)\, P(c)} = PMI(e, c) + \log P(e, c) = PMI^2$$

bull (With Commute Time) The commute time between an instance e and a concept c is

$$Time(e, c) = \sum_{k=1}^{\infty} (2k) \cdot P_k(e, c) = \sum_{k=1}^{T} 2k \cdot P_k(e, c) + \sum_{k=T+1}^{\infty} 2k \cdot P_k(e, c) \ge \sum_{k=1}^{T} (2k) \cdot P_k(e, c) + 2(T+1) \cdot \left(1 - \sum_{k=1}^{T} P_k(e, c)\right) = 4 - 2 \cdot Rep(e, c)$$

(the last equality takes T = 1, with P_1(e, c) = Rep(e, c))

Given e, the c should be its typical concept (shortest distance)

Given c, the e should be its typical instance (shortest distance)

A process of finding the concept nodes having the shortest expected distance to e (a short code sketch follows below)
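To make these measures concrete, here is a minimal Python sketch (not the authors' code; the is-A counts, entities and concepts are made-up toy values) that scores candidate concepts of an instance by PMI and by Rep(e, c) = P(c|e) * P(e|c), and picks the basic-level concept as the Rep-maximizing one:

```python
# A minimal sketch of the scoring functions above, assuming we already have
# is-A co-occurrence counts n(e, c) from a Probase-like taxonomy (toy values here).
import math
from collections import defaultdict

n = defaultdict(float, {                       # hypothetical co-occurrence counts
    ("apple", "fruit"): 500.0, ("apple", "company"): 800.0,
    ("pear", "fruit"): 300.0, ("microsoft", "company"): 900.0,
})

total = sum(n.values())
n_e = defaultdict(float)                       # marginal count of each entity
n_c = defaultdict(float)                       # marginal count of each concept
for (e, c), v in n.items():
    n_e[e] += v
    n_c[c] += v

def pmi(e, c):
    p_ec = n[(e, c)] / total
    return math.log(p_ec / ((n_e[e] / total) * (n_c[c] / total)))

def rep(e, c):
    # Rep(e, c) = P(c|e) * P(e|c); log Rep = PMI(e, c) + log P(e, c), i.e. "PMI^2"
    return (n[(e, c)] / n_e[e]) * (n[(e, c)] / n_c[c])

def basic_level_concept(e):
    # Basic-level concept of e = the concept maximizing Rep(e, c)
    candidates = [c for (ee, c) in n if ee == e]
    return max(candidates, key=lambda c: rep(e, c))

print(basic_level_concept("apple"), rep("apple", "company"), pmi("apple", "company"))
```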

Evaluations on Different Measures for BLC (Precision / NDCG at k = 1, 2, 3, 5, 10, 15, 20)

Table 1
                    k=1    k=2    k=3    k=5    k=10   k=15   k=20
No smoothing
  MI(e)             0.769  0.692  0.705  0.685  0.719  0.705  0.690
  PMI3(e)           0.885  0.769  0.756  0.800  0.754  0.733  0.721
  NPMI(e)           0.692  0.692  0.667  0.638  0.627  0.610  0.610
  Typicality P(c|e) 0.462  0.577  0.603  0.577  0.569  0.564  0.556
  Typicality P(e|c) 0.500  0.462  0.526  0.523  0.523  0.510  0.521
  Rep(e)            0.846  0.865  0.872  0.862  0.758  0.731  0.719
Smoothing = 0.001
  MI(e)             0.577  0.615  0.628  0.600  0.612  0.605  0.592
  PMI3(e)           0.731  0.673  0.692  0.654  0.669  0.644  0.623
  NPMI(e)           0.923  0.827  0.769  0.746  0.731  0.695  0.671
  Typicality P(c|e) 0.462  0.577  0.603  0.577  0.569  0.564  0.554
  Typicality P(e|c) 0.885  0.865  0.872  0.831  0.785  0.741  0.704
  Rep(e)            0.846  0.731  0.718  0.723  0.700  0.669  0.638
Smoothing = 0.0001
  MI(e)             0.615  0.615  0.654  0.608  0.635  0.628  0.612
  PMI3(e)           0.846  0.731  0.731  0.715  0.723  0.685  0.677
  NPMI(e)           0.885  0.904  0.885  0.869  0.823  0.777  0.752
  Typicality P(c|e) 0.462  0.577  0.603  0.577  0.569  0.564  0.556
  Typicality P(e|c) 0.885  0.904  0.910  0.877  0.831  0.813  0.777
  Rep(e)            0.923  0.846  0.833  0.815  0.781  0.736  0.719
Smoothing = 1e-5
  MI(e)             0.615  0.635  0.667  0.662  0.677  0.656  0.646
  PMI3(e)           0.885  0.769  0.744  0.777  0.758  0.731  0.710
  NPMI(e)           0.885  0.846  0.872  0.869  0.831  0.810  0.787
  Typicality P(c|e) 0.462  0.577  0.603  0.577  0.569  0.564  0.556
  Typicality P(e|c) 0.769  0.808  0.846  0.823  0.808  0.782  0.765
  Rep(e)            0.885  0.904  0.872  0.862  0.812  0.800  0.767
Smoothing = 1e-6
  MI(e)             0.769  0.673  0.705  0.677  0.700  0.692  0.679
  PMI3(e)           0.885  0.769  0.756  0.785  0.773  0.726  0.723
  NPMI(e)           0.885  0.846  0.821  0.815  0.750  0.726  0.719
  Typicality P(c|e) 0.462  0.577  0.603  0.577  0.569  0.564  0.556
  Typicality P(e|c) 0.538  0.615  0.615  0.615  0.608  0.613  0.615
  Rep(e)            0.846  0.885  0.897  0.877  0.788  0.777  0.765
Smoothing = 1e-7
  MI(e)             0.769  0.692  0.705  0.685  0.719  0.703  0.688
  PMI3(e)           0.885  0.769  0.756  0.792  0.758  0.736  0.725
  NPMI(e)           0.769  0.750  0.718  0.700  0.650  0.641  0.633
  Typicality P(c|e) 0.462  0.577  0.603  0.577  0.569  0.564  0.556
  Typicality P(e|c) 0.500  0.481  0.526  0.523  0.531  0.523  0.523
  Rep(e)            0.846  0.865  0.872  0.854  0.765  0.749  0.733

Table 2
                    k=1    k=2    k=3    k=5    k=10   k=15   k=20
No smoothing
  MI(e)             0.516  0.531  0.519  0.531  0.562  0.574  0.594
  PMI3(e)           0.725  0.664  0.652  0.660  0.628  0.631  0.646
  NPMI(e)           0.599  0.597  0.579  0.554  0.540  0.539  0.549
  Typicality P(c|e) 0.297  0.380  0.409  0.422  0.438  0.446  0.461
  Typicality P(e|c) 0.401  0.386  0.396  0.398  0.401  0.410  0.428
  Rep(e)            0.758  0.771  0.745  0.723  0.656  0.647  0.661
Smoothing = 1e-3
  MI(e)             0.374  0.414  0.441  0.448  0.473  0.481  0.495
  PMI3(e)           0.484  0.511  0.509  0.502  0.519  0.525  0.533
  NPMI(e)           0.692  0.652  0.607  0.603  0.585  0.585  0.592
  Typicality P(c|e) 0.297  0.380  0.409  0.422  0.438  0.446  0.460
  Typicality P(e|c) 0.703  0.697  0.704  0.681  0.637  0.628  0.626
  Rep(e)            0.621  0.580  0.554  0.561  0.554  0.555  0.559
Smoothing = 1e-4
  MI(e)             0.407  0.430  0.458  0.462  0.492  0.503  0.512
  PMI3(e)           0.648  0.604  0.579  0.575  0.578  0.576  0.590
  NPMI(e)           0.747  0.777  0.761  0.737  0.700  0.685  0.688
  Typicality P(c|e) 0.297  0.380  0.409  0.422  0.438  0.446  0.461
  Typicality P(e|c) 0.791  0.795  0.802  0.767  0.738  0.729  0.724
  Rep(e)            0.758  0.714  0.711  0.689  0.653  0.636  0.653
Smoothing = 1e-5
  MI(e)             0.429  0.465  0.478  0.501  0.517  0.528  0.545
  PMI3(e)           0.725  0.647  0.642  0.642  0.627  0.624  0.638
  NPMI(e)           0.813  0.779  0.778  0.765  0.730  0.723  0.729
  Typicality P(c|e) 0.297  0.380  0.409  0.422  0.438  0.446  0.461
  Typicality P(e|c) 0.709  0.728  0.735  0.722  0.702  0.696  0.703
  Rep(e)            0.791  0.787  0.762  0.739  0.707  0.703  0.706
Smoothing = 1e-6
  MI(e)             0.516  0.510  0.515  0.526  0.546  0.563  0.579
  PMI3(e)           0.725  0.655  0.651  0.654  0.641  0.631  0.649
  NPMI(e)           0.791  0.766  0.732  0.728  0.673  0.659  0.668
  Typicality P(c|e) 0.297  0.380  0.409  0.422  0.438  0.446  0.461
  Typicality P(e|c) 0.495  0.516  0.520  0.508  0.512  0.521  0.540
  Rep(e)            0.758  0.784  0.767  0.755  0.691  0.686  0.694
Smoothing = 1e-7
  MI(e)             0.516  0.531  0.519  0.530  0.562  0.571  0.592
  PMI3(e)           0.725  0.664  0.652  0.658  0.630  0.631  0.647
  NPMI(e)           0.670  0.655  0.633  0.604  0.575  0.570  0.581
  Typicality P(c|e) 0.297  0.380  0.409  0.422  0.438  0.446  0.461
  Typicality P(e|c) 0.423  0.421  0.415  0.407  0.414  0.424  0.438
  Rep(e)            0.758  0.771  0.745  0.725  0.663  0.661  0.668

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

What is the Semantic Similaritybull Are the following instance pairs similar

bull ltapple microsoftgt

bull ltapple peargt

bull ltapple fruitgt

bull ltapple foodgt

bull ltapple ipadgt

bull ltcar journeygt

Approaches on Term Similarity

bull Categories of approaches for semantic similarity:

bull String based approach

bull Knowledge based approach: use preexisting thesauri, taxonomies or encyclopedias such as WordNet

bull Corpus based approach: use contexts of terms extracted from web pages, web search snippets or other text repositories

bull Embedding based approach: will be introduced in detail in "Part 3: Implicit Understanding"


Approaches on Term Similarity (2)

bull Categories

[Figure: taxonomy of term-similarity approaches: string based approaches; knowledge based approaches (WordNet), split into path length / lexical chain-based and information content-based; and corpus based approaches, split into graph learning algorithm based and snippet search based. Representative work includes Rada 1989, Hirst 1998, Resnik 1995, Jcn 1997, Lin 1998, Ban 2002, HunTray 2005, Chen 2006, Alvarez 2007, Do 2009, Agirre 2010, Sánch 2011, Bol 2011, with several marked as state-of-the-art approaches]


Term Similarity Using Semantic Networks [Li et al 2013 Li et al 2015]

[Figure: framework. Step 1, type checking: a term pair <t1, t2> is classified as a concept pair, an entity pair, or a concept-entity pair. Step 2, context representation (vector): concept pairs collect entity-distribution contexts; entity pairs collect concept-distribution contexts, which are grouped by concept clustering into cluster context vectors Cx(t1) and Cy(t2); for a concept-entity pair, the concept collection of the entity term t1 is clustered and the top-k concepts cx of each cluster are selected. Step 3, context similarity: Cosine(T(t1), T(t2)) over context vectors; for entity pairs, Max over (x, y) of Cosine(Cx(t1), Cy(t2)); for concept-entity pairs, the maximum sim(t2, cx) over the pairs <t2, cx>.]

Step 1: Type Checking

Step 2: Context Representation (Vector)

Step 3: Context Similarity

An example [Li et al 2013 Li et al 2015]

For example, <banana, pear>:

Step 1, type checking: <banana, pear> is an entity pair

Step 2, context representation: concept context collection for each entity, giving context vectors T(t1) and T(t2)

Step 3, context similarity: Cosine(T(t1), T(t2)) = 0.916
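As a rough illustration of the entity-pair branch of this pipeline (not the system of [Li et al 2013, Li et al 2015] itself), the sketch below represents each entity term by a toy concept distribution standing in for its collected concept context and compares the two by cosine:

```python
# A minimal sketch: each entity term is represented by its concept distribution
# (hypothetical P(concept | entity) values, not Probase), and similarity is the cosine.
import math

isa = {
    "banana": {"fruit": 0.55, "food": 0.30, "plant": 0.15},
    "pear":   {"fruit": 0.60, "food": 0.25, "tree": 0.15},
}

def cosine(u, v):
    keys = set(u) | set(v)
    dot = sum(u.get(k, 0.0) * v.get(k, 0.0) for k in keys)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

print(cosine(isa["banana"], isa["pear"]))   # high similarity, as in the <banana, pear> example
```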

Examples:

Term 1             | Term 2             | Similarity
lunch              | dinner             | 0.9987
tiger              | jaguar             | 0.9792
car                | plane              | 0.9711
television         | radio              | 0.9465
technology company | microsoft          | 0.8208
high impact sport  | competitive sport  | 0.8155
employer           | large corporation  | 0.5353
fruit              | green pepper       | 0.2949
travel             | meal               | 0.0426
music              | lunch              | 0.0116
alcoholic beverage | sports equipment   | 0.0314
company            | table tennis       | 0.0003

Complete results: http://adapt.seiee.sjtu.edu.cn/similarity/SimCompleteResults.pdf

Statistics of Search Queries

(a) By traffic: 1 term 44%, 2 terms 29%, 3 terms 17%, 4 terms 7%, 5 terms 2%, more than 5 terms 1%

(b) By # of distinct queries: 1 term 10%, 2 terms 26%, 3 terms 34%, 4 terms 19%, 5 terms 7%, more than 5 terms 4%

Examples: Pokémon Go, Microsoft HoloLens

[Figure: a second pair of charts breaks queries down by the number of instances they contain: 1 instance, 2 instances, 3 instances, 4 instances, 5 instances, more than 5 instances]

If the short text has context for the instance…

bull python tutorial
bull dangerous python
bull moon earth distance
bull …

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Supervised Segmentation [Bergsma et al 2007]

bull Problem divide query into semantic units

bull Approach turn segmentation into position-based binary classification

Example Query

Two man power saw

[two man] [power saw]
[two] [man] [power saw]
[two] [man power] [saw]

Input a query and its positions

Output the decision for making segmentation at each position

Supervised Segmentation

bull Features:

bull Decision boundary features

bull Statistical features

bull Context features

bull Dependency features

e.g. indicator features such as "the POS tag in the query is …", and position features (forward/backward)

Mutual information between left and right parts

Bank loan amortization schedule

Context information

e.g. "female bus driver": "female" depends on "driver"

Supervised Segmentation

bull Segmentation Overview

[Figure: the input query "two man power saw" is fed, position by position, to an SVM classifier trained on the learned features, which outputs a segmentation decision (yes/no) for each position]
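A minimal sketch of the position-based binary classification setup, assuming a hand-made toy feature vector per word gap (the real feature set of [Bergsma et al 2007] is much richer):

```python
# A minimal sketch (assumed toy features, not Bergsma & Wang's exact feature set):
# each gap between adjacent words becomes a binary "insert a break here?" example.
from sklearn.svm import LinearSVC

# One example per position: [PMI(left, right), is_first_position, is_last_position]
X = [
    [0.2, 1, 0],    # "two | man ..."        -> boundary
    [-1.5, 0, 0],   # "... man | power ..."  -> boundary
    [3.1, 0, 1],    # "... power | saw"      -> keep together
]
y = [1, 1, 0]       # 1 = segmentation boundary, 0 = no boundary

clf = LinearSVC().fit(X, y)
print(clf.predict([[2.8, 0, 1]]))   # a strongly associated pair should tend toward "no boundary"
```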

Unsupervised Segmentation [Tan et al 2008]

bull Unsupervised learning for query segmentation

Probability of generated segmentation S for query Q

$$P(S \mid Q) = P(s_1)\, P(s_2 \mid s_1) \cdots P(s_m \mid s_1 s_2 \cdots s_{m-1}) \approx \prod_{s_i \in S} P(s_i)$$

(unigram model over the segments s_i)

Valid segment boundary if and only if the pointwise mutual information between the two segments resulting from the split is negative

Example: "new york times subscription", with adjacent segments s_k and s_{k+1}

$$MI(s_k, s_{k+1}) = \log \frac{P_c([s_k\ s_{k+1}])}{P_c(s_k) \cdot P_c(s_{k+1})} < 0$$

E.g. $\log \frac{P_c([\text{new york}])}{P_c(\text{new}) \cdot P_c(\text{york})} > 0$, so there is no segment boundary between "new" and "york".

Unsupervised Segmentation

bull Find top k segmentations dynamic programming

bull Using EM optimization on the fly

Input: query w_1 w_2 … w_n (the words in the query) and a concept probability distribution

Output: top-k segmentations with the highest likelihood (a minimal DP sketch follows below)
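A minimal dynamic-programming sketch of unigram-model segmentation under made-up segment probabilities (the real system estimates P(s) from web-scale counts and Wikipedia and applies EM optimization on the fly):

```python
# A minimal sketch: score a segmentation by the product of segment probabilities
# and find the best split by dynamic programming (toy P(s) values below).
import math

p_seg = {
    "new": 1e-3, "york": 8e-4, "times": 9e-4, "subscription": 2e-4,
    "new york": 6e-4, "new york times": 3e-4, "york times": 1e-6,
}

def best_segmentation(words, max_len=3):
    n = len(words)
    best = [(-math.inf, None)] * (n + 1)    # best[i] = (log-prob, backpointer) for prefix of length i
    best[0] = (0.0, None)
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            seg = " ".join(words[j:i])
            p = p_seg.get(seg)
            if p is None:
                continue
            score = best[j][0] + math.log(p)
            if score > best[i][0]:
                best[i] = (score, j)
    segs, i = [], n                          # backtrack the winning split
    while i > 0:
        j = best[i][1]
        segs.append(" ".join(words[j:i]))
        i = j
    return list(reversed(segs))

print(best_segmentation("new york times subscription".split()))
```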

Exploit Click-through [Li et al 2011]

bull Motivationbull Probabilistic query segmentation

bull Use click-through data

Output top-3 segmentation

[bank of america] [online banking]      0.502

[bank of america online banking]        0.428

[bank of] [america] [online banking]    0.001

Q -> URL -> D (Q: query, D: document, connected by click data)

Input Query bank of america online banking

Exploit Click-through

bull Segmentation Model

An interpolated model

global info + click-through info

[credit card] [bank of America]

1. bank of america credit cards contact us overview
2. secured visa credit card from bank of america
3. credit cards overview find the right bank of america credit card for you

Query

Clicked html documents

global info

Click-through info

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Sense Changes with Different Context

watch harry potter read harry potter age harry potter

Movie Book Character

harry potter walkthrough

Game

Entity Recognition in Query [Guo et al 2009]

bull Motivation

Detect named entity in a short text and categorize it

Example: the single-named-entity query "harry potter walkthrough" is represented as a triple <e, t, c> = ("harry potter", "walkthrough", "game"), where e is the (ambiguous) named entity, t is the context term(s), and c is the class of the entity.

Entity Recognition in Query

bull Probabilistic Generative Model

Goal: given a query q, find the triple <e, t, c> that maximizes the probability

Probability to generate a triple (assuming the context only depends on the class):

$$\Pr(e, t, c) = \Pr(e) \cdot \Pr(c \mid e) \cdot \Pr(t \mid c)$$

Objective: given query q, find the most probable triple; the problem then becomes how to estimate Pr(e), Pr(c|e) and Pr(t|c)

E.g. "walkthrough" only depends on the class "game" instead of the entity "harry potter"

Entity Recognition in Query

bull Probability Estimation by Learning

learning objective

$$\max \prod_{i=1}^{N} P(e_i, t_i, c_i)$$

Build training set 119879 = (119890119894 119905119894) view 119888119894 as a hidden variable

New Learning problem

N

1i

ii

N

1i

i

N

1i

ii c)|)P(te|P(c)P(emax)tP(emax c

solved with topic model WS-LDA
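For illustration only (this is the generative scoring step, not the WS-LDA training), here is a sketch that scores candidate classes for one query triple with toy probabilities:

```python
# A minimal sketch of scoring Pr(e) * Pr(c|e) * Pr(t|c) for candidate
# <entity, context, class> triples of a query (made-up probabilities, not WS-LDA estimates).
p_e = {"harry potter": 0.7, "harry": 0.3}
p_c_given_e = {"harry potter": {"game": 0.3, "movie": 0.4, "book": 0.3}}
p_t_given_c = {"game": {"walkthrough": 0.2}, "movie": {"walkthrough": 0.001}, "book": {"walkthrough": 0.001}}

def score(e, t, c):
    return p_e.get(e, 0.0) * p_c_given_e.get(e, {}).get(c, 0.0) * p_t_given_c.get(c, {}).get(t, 0.0)

candidates = [("harry potter", "walkthrough", c) for c in ("game", "movie", "book")]
best = max(candidates, key=lambda etc: score(*etc))
print(best, score(*best))   # the "game" reading wins for "harry potter walkthrough"
```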

Signal from Click [Pantel et al 2012]

bull Motivation

Predict entity type in Web search

[Figure: the query's entity, user intent, context, and click signals feed a generative model that outputs a query type distribution over 73 entity types]

Signal from Click

bull Joint Model for Prediction

[Plate diagram of the joint model: for each query Q, the generative story picks an entity, a type, an intent, context words, and a click, governed by a distribution over types, an intent distribution, and word, host, and entity distributions (parameters θ, φ, ω; variables t, τ, i, n, c)]

Telegraphic Query interpretation [Sawant et al 2013 Joshi et al 2014]

bull Entity-seeking Telegraphic Queries

bull Interpretation = Segmentation + Annotation

Combine a knowledge base (accuracy) with a large corpus (recall).

Example: query "Germany capital" → result entity "Berlin"

bull Overview

Joint Interpretation and Ranking [Sawant et al 2013 Joshi et al 2014]

[Figure: a telegraphic query plus an annotated corpus feed two models for joint interpretation and ranking, a generative model and a discriminative model, whose output is a ranked list of candidate entities e1, e2, e3]

bull Generative Model

Joint Interpretation and Ranking [Sawant et al 2013]

[Figure, borrowed from U. Sawant (2013): for the query q = "losing team baseball world series 1998", the answer entity E = San Diego Padres has type T = "major league baseball team"; the type hint "baseball team" is matched by a type model, while the remaining query words are matched by context matchers against corpus snippets such as "Padres have been to two World Series, losing in 1984 and 1998"; a switch variable Z decides which query words act as type hints and which as context]

Based on Probabilistic Language Models

bull Discriminative Model

Joint Interpretation and Ranking [Sawant et al 2013]

[Figure: for the query "losing team baseball world series 1998", feature vectors are built for candidate entities: the correct entity San_Diego_Padres with t = "baseball team" versus an incorrect entity 1998_World_Series with t = "series"]

Based on max-margin discriminative learning

bull Queries seek answer entities (e2)

bull Contain (query) entities (e1) target types (t2) relations (r) and selectors (s)

Telegraphic Query Interpretation [Joshi et al 2014]

query                      | e1           | r                  | t2                 | s
dave navarro first band    | dave navarro | band               | band               | first
                           | dave navarro | -                  | band               | first
spider automobile company  | spider       | automobile company | automobile company | -
                           | -            | automobile company | company            | spider

Borrowed from M. Joshi (2014)

Improved Generative Model

bull The generative model of [Sawant et al 2013] is extended by [Joshi et al 2014] to also consider e1 (in q) and r

Improved Discriminative Model

bull The discriminative model of [Sawant et al 2013] is extended by [Joshi et al 2014] to also consider e1 (in q) and r

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

bull Input a short text

bull Output semantic interpretation

bull Three steps in understanding a short text

Input: wanna watch eagles band
Step 1 (segmentation): watch | eagles | band
Step 2 (type detection): watch[verb] eagles[entity] band[concept]
Step 3 (concept labeling): watch[verb] eagles[entity](band) band[concept]

Step 1: Text Segmentation – divide the short text into a sequence of terms in the vocabulary

Step 2: Type Detection – determine the best type of each term

Step 3: Concept Labeling – infer the best concept of each entity within context

Text segmentation

bull Observations:

bull Mutual Exclusion – terms containing the same word mutually exclude each other

bull Mutual Reinforcement – related terms mutually reinforce each other

bull Build a Candidate Term Graph (CTG)

[Figure: candidate term graphs for "vacation april in paris" (candidate terms: vacation, april in paris, april, paris) and "watch harry potter" (candidate terms: watch, harry potter), with mutual-exclusion links and weighted reinforcement edges]

Find best segmentation

bull Best segmentation = sub-graph of the CTG which:
bull Is a complete graph (clique), i.e. contains no mutual exclusion
bull Has 100% word coverage (except for stopwords)
bull Has the largest average edge weight

[Figure: the same candidate term graphs, highlighting which cliques form a valid segmentation and which is the best segmentation]

Find best segmentation

bull Best segmentation = sub-graph of the CTG which:
bull Is a complete graph (clique), i.e. contains no mutual exclusion
bull Has 100% word coverage (except for stopwords)
bull Has the largest average edge weight

Maximal Clique → Best segmentation

[Figure: the maximal clique with full word coverage and the largest average edge weight, {april in paris, vacation} and {watch, harry potter}, is chosen as the best segmentation; a code sketch follows below]
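A minimal sketch of this clique-based selection using networkx, with toy edge weights loosely echoing the "vacation april in paris" example (mutual exclusion is modeled here simply as a missing edge):

```python
# A minimal sketch: pick the maximal clique with full word coverage (stopwords aside)
# and the largest average edge weight (toy candidate term graph below).
import itertools
import networkx as nx

G = nx.Graph()
G.add_edge("vacation", "april in paris", weight=0.041)
G.add_edge("vacation", "april", weight=0.029)
G.add_edge("vacation", "paris", weight=0.047)
G.add_edge("april", "paris", weight=0.005)
words = {"vacation", "april", "in", "paris"}
stopwords = {"in"}

def covers(terms):
    covered = set(itertools.chain.from_iterable(t.split() for t in terms))
    return (words - stopwords) <= covered

best, best_score = None, -1.0
for clique in nx.find_cliques(G):                 # enumerate maximal cliques
    if len(clique) < 2 or not covers(clique):
        continue
    edges = list(itertools.combinations(clique, 2))
    avg_w = sum(G[u][v]["weight"] for u, v in edges) / len(edges)
    if avg_w > best_score:
        best, best_score = clique, avg_w
print(best, best_score)
```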

Type Detection

bull Pairwise Model: find the best typed-term for each term so that the

Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight

[Figure: for "watch free movie", each term has several typed candidates: watch[v], watch[e], watch[c]; free[adj], free[v]; movie[c], movie[e]; the pairwise model selects one typed-term per term]

bull Entity disambiguation is the most important task of concept labeling

bull Filter/re-rank the original concept cluster vector

bull Weighted-Vote: the final score of each concept cluster is a combination of its original score and the support from context, using concept co-occurrence

watch harry potter → movie        read harry potter → book
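A minimal sketch of the weighted vote, with made-up concept scores, co-occurrence supports and interpolation weight alpha:

```python
# A minimal sketch: the final score of each candidate concept combines its original
# conceptualization score with support from the context term via concept co-occurrence.
concept_scores = {"movie": 0.45, "book": 0.40, "character": 0.15}   # candidates for "harry potter"
context_cooccur = {("watch", "movie"): 0.8, ("watch", "book"): 0.1, ("watch", "character"): 0.1}
alpha = 0.5                                                          # assumed interpolation weight

def weighted_vote(context_term):
    return {
        c: alpha * s + (1 - alpha) * context_cooccur.get((context_term, c), 0.0)
        for c, s in concept_scores.items()
    }

print(max(weighted_vote("watch").items(), key=lambda kv: kv[1]))   # movie sense for "watch harry potter"
```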

Example of Entity Disambiguation[Hua et al 2015 (ICDE Best Paper) Hua et al 2016]

[Figure: the short text is parsed and conceptualized (term clustering by is-A, concept filtering by co-occurrence, head/modifier analysis, concept orthogonalization) against a semantic (is-A) network and a co-occurrence network, producing a concept vector (c1, p1), (c2, p2), (c3, p3), …; for "ipad apple", the is-A candidates of apple (fruit, company, food, product, …) are filtered by co-occurrence with the concepts of ipad (product, device, …), so the company/brand/product/device senses survive]

Mining Lexical Relationships[Wang et al 2015b]

bull Lexical knowledge represented by the probabilities

Example "watch harry potter" (notation: e = instance, t = term, c = concept, z = role; candidate concepts include verb, product, book, movie):

bull p(z | t), e.g. p(verb | watch), p(instance | watch): the role of a term
bull p(c | e) = p(c | t, z = instance), e.g. p(movie | harry potter), p(book | harry potter): the concepts of an instance
bull p(c | t, z), e.g. p(movie | watch, verb): the concepts typically associated with a term playing a given role

Understanding Queries [Wang et al 2015b]

bull Goal: rank the concepts and find $\arg\max_c\, p(c \mid t, q)$

QueryAll possible

segmentations

Random walk with restart [Sun et al 2005]on the online subgraph
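A minimal sketch of random walk with restart on a toy online subgraph (the transition matrix, restart probability and seed distribution below are all assumed illustration values):

```python
# A minimal sketch of random walk with restart (RWR) over a small term-concept subgraph.
import numpy as np

nodes = ["watch", "harry potter", "movie", "book"]
A = np.array([                 # row-stochastic transition matrix of the online subgraph (toy values)
    [0.0, 0.6, 0.3, 0.1],
    [0.4, 0.0, 0.35, 0.25],
    [0.5, 0.5, 0.0, 0.0],
    [0.3, 0.7, 0.0, 0.0],
])
restart = 0.3
seed = np.array([0.5, 0.5, 0.0, 0.0])    # restart distribution: the query's terms

r = seed.copy()
for _ in range(50):                       # power iteration until (approximate) convergence
    r = (1 - restart) * A.T @ r + restart * seed

print(sorted(zip(nodes, r), key=lambda kv: -kv[1]))   # concepts reachable from the query get ranked
```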

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

bull Example: "popular smart cover iphone 5s"

bull Definitions:

bull Head: acts to name the general (semantic) category to which the whole short text belongs; usually the head is the intent of the short text
bull "smart cover": the intent of the query

bull Constraints: distinguish this member from other members of the same category
bull "iphone 5s": limits the type of the head

bull Non-Constraint Modifiers (a.k.a. Pure Modifiers): subjective modifiers which can be dropped without changing the intent
bull "popular": subjective, can be neglected

Non-Constraint Modifiers Mining Construct Modifier Networks

Edges form a Modifier Network

Concept Hierarchy Tree in the "Country" domain

Modifier Network in the "Country" domain: in this case "Large" and "Top" are pure modifiers

[Figure: the concept hierarchy tree in the "Country" domain (Country → Asian country, Developed country, Western country → Large Asian country, Western developed country, Top western country, Large developed country, Top developed country, …) and the modifier network derived from it, whose nodes include Country, Asian, Western, Developed, Large, and Top]

bull Betweenness centrality is a measure of a node's centrality in a network

bull The betweenness of node v is defined as

$$g(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$$

bull where σ_st is the total number of shortest paths from node s to node t and σ_st(v) is the number of those paths that pass through v

bull Normalization & Aggregation: a pure modifier should have a low aggregated betweenness-centrality score PMS(t) (a code sketch follows below)

Non-Constraint Modifiers Mining: Betweenness centrality
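A minimal sketch of the betweenness-centrality signal on a toy modifier network (made-up edges; in the real method the scores are normalized and aggregated across many domain networks into PMS(t) before thresholding):

```python
# A minimal sketch: compute normalized betweenness centrality for each term in a toy
# modifier network; terms whose aggregated score stays low are pure-modifier candidates.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("country", "Asian"), ("country", "Western"), ("country", "Developed"),
    ("Asian", "Developed"), ("Western", "Developed"),
    ("country", "Large"), ("country", "Top"),
])

centrality = nx.betweenness_centrality(G, normalized=True)
for term, score in sorted(centrality.items(), key=lambda kv: kv[1]):
    print(f"{term:10s} {score:.3f}")   # low scores on a single toy graph are only illustrative
```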

Head-Constraints Mining [Wang et al 2014b]

bull A term can be a head in some queries and a constraint in others

bull E.g. "Seattle hotel": Seattle = constraint, hotel = head; "Seattle hotel job": Seattle = constraint, hotel = constraint, job = head

Head-Constraints Mining Acquiring Concept Patterns

1. Get entity pairs from the query log by extracting preposition patterns "A for B", "A of B", "A with B", "A in B", "A on B", "A at B", … (e.g. "cover for iphone 6s", "battery for sony a7r", "wicked on broadway"), where entity 1 (A) is the head and entity 2 (B) is the constraint.

2. Conceptualize each entity: entity 1 → concept11, concept12, concept13, concept14, …; entity 2 → concept21, concept22, concept23, …

3. Pair the concepts, e.g. (concept11, concept21), (concept11, concept22), (concept11, concept23), …, and aggregate them into a Concept Pattern Dictionary.
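A minimal sketch of this pipeline with a toy conceptualization table (real systems back this with a large is-A taxonomy and basic-level conceptualization):

```python
# A minimal sketch: build a head/constraint concept-pattern dictionary from
# preposition patterns in a query log (hypothetical concepts per entity below).
import re
from collections import Counter

queries = ["cover for iphone 6s", "battery for sony a7r", "case for galaxy s6"]
concepts = {
    "cover": "accessory", "battery": "accessory", "case": "accessory",
    "iphone 6s": "phone", "sony a7r": "camera", "galaxy s6": "phone",
}

pattern_counts = Counter()
for q in queries:
    m = re.match(r"(.+?) (?:for|of|with|in|on|at) (.+)", q)
    if not m:
        continue
    head, constraint = m.group(1).strip(), m.group(2).strip()   # "A for B": A = head, B = constraint
    if head in concepts and constraint in concepts:
        pattern_counts[(concepts[head], concepts[constraint])] += 1

print(pattern_counts)   # e.g. {('accessory', 'phone'): 2, ('accessory', 'camera'): 1}
```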

Why Concepts Can't Be Too General

bull It may cause too many concept pattern conflicts: we can't distinguish head and modifier for overly general concept pairs

Derived concept pattern 1: head = device, modifier = company
Supporting entity pairs: (iphone 4, verizon), (modem, comcast), (wireless router, comcast), (iphone 4, tmobile)

Derived concept pattern 2: head = company, modifier = device
Supporting entity pairs: (amazon books, kindle), (netflix, touchpad), (skype, windows phone), (netflix, ps3)

→ Conflict: the two patterns assign opposite head/modifier roles to the same concept pair

Why Concepts Can't Be Too Specific

bull It may generate concepts with little coverage: the concept regresses to the entity itself

bull Large storage space: up to (million × million) patterns

…
(device, largest desktop OS vendor)
(device, largest software development company)
(device, largest global corporation)
(device, latest windows and office provider)
…

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept Patterns

Cluster size | Sum of cluster score | Head / Constraint                 | Score
615          | 2114691              | breed / state                     | 357298460224501
296          | 7752357              | game / platform                   | 627403476771856
153          | 3466804              | accessory / vehicle               | 53393705094809
70           | 118259               | browser / platform                | 132612807637391
22           | 1010993              | requirement / school              | 271407526294823
34           | 9489159              | drug / disease                    | 154602405333541
42           | 8992995              | cosmetic / skin condition         | 814659415003929
16           | 7421599              | job / city                        | 27903732555528
32           | 710403               | accessory / phone                 | 246513830851194
18           | 6692376              | software / platform               | 210126322725878
20           | 6444603              | test / disease                    | 239774028397537
27           | 5994205              | clothes / breed                   | 98773996282851
19           | 5913545              | penalty / crime                   | 200544192793488
25           | 5848804              | tax / state                       | 240081818612579
16           | 5465424              | sauce / meat                      | 183592863621553
18           | 4809389              | credit card / country             | 142919087972152
14           | 4730792              | food / holiday                    | 14554140330924
11           | 4536199              | mod / game                        | 257163856882439
29           | 4350954              | garment / sport                   | 471533326845442
23           | 3994886              | career information / professional | 732726483731257
15           | 386065               | song / instrument                 | 128189481818135
18           | 378213               | bait / fish                       | 780426514113169
22           | 3722948              | study guide / book                | 508339765053921
19           | 3408953              | plugins / browser                 | 550326072627126
14           | 3305753              | recipe / meat                     | 882779863422951
18           | 3214226              | currency / country                | 110825444188352
13           | 3180272              | lens / camera                     | 186081673263957
9            | 316973               | decoration / holiday              | 130055844126533
16           | 314875               | food / animal                     | 7338544366514

Example cluster (game / platform): concept pairs include (game, platform), (game, device), (video game, platform), (game console, game pad), (game, gaming platform)

Game (Head)   | Platform (Modifier)
angry birds   | android
angry birds   | ios
angry birds   | windows 10
…             | …

Head-Modifier Relationship Detection

bull Train a classifier on

(head-embedding modifier-embedding)

bull Training data: positive = (head, modifier), negative = (modifier, head)

bull Precision >= 0.9, Recall >= 0.9

bull Disadvantage not interpretable
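A minimal sketch of such a classifier over concatenated embeddings, using random toy vectors in place of real term embeddings:

```python
# A minimal sketch: predict head-vs-modifier order from the concatenation of the
# two term embeddings (random toy vectors below, not real embeddings).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
emb = {t: rng.normal(size=8) for t in ["game", "platform", "accessory", "vehicle", "job", "city"]}

pairs = [("game", "platform"), ("accessory", "vehicle"), ("job", "city")]   # (head, modifier)
X = [np.concatenate([emb[h], emb[m]]) for h, m in pairs]        # positives: (head, modifier)
X += [np.concatenate([emb[m], emb[h]]) for h, m in pairs]       # negatives: reversed order
y = [1] * len(pairs) + [0] * len(pairs)

clf = LogisticRegression().fit(X, y)
print(clf.predict([np.concatenate([emb["game"], emb["platform"]])]))   # 1 = first term is the head
```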

Syntactic Parsing based on HM

bull Head-modifier information is incomplete:
bull Prepositions and other function words
bull Relations within a noun compound, e.g. "el capitan macbook pro"

bull Why not train a parser for web queries?

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges: Short Texts Lack Grammatical Signals

bull Lack of function words and word order
bull "toys queries" has ambiguous intent
bull "distance earth moon" has clear intent, but many equivalent forms: "earth moon distance", "earth distance moon", …

Challenges: Syntactic Parsing of Queries

bull No standard

bull No ground-truth

Why is syntactic parsing of queries even a legitimate problem?

Derive Syntax from Semantics[Sun et al 2016]

bull Query: "thai food houston"

bull Clicked sentence

bull Project dependency to the query

A Treebank for Short Texts

bull Given query q

bull Given q's clicked sentence s

bull Parse each s

bull Project dependency from s to q

bull Aggregate dependencies

Algorithm of Projection

Result Examples

Results

bull Random queries

QueryParser: UAS 0.83, LAS 0.75; Stanford: UAS 0.72, LAS 0.64

bull Queries with no function words

QueryParser: UAS 0.82, LAS 0.73; Stanford: UAS 0.70, LAS 0.61

bull Queries with function words

QueryParser: UAS 0.90, LAS 0.85; Stanford: UAS 0.86, LAS 0.80

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

bull Measuring similarity between two short texts and sentences

bull Basic idea word-by-word comparison using embedding vector

bull Use a saliency-weighted semantic graph to compute similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired: bins of all edges, bins of max edges

Similarity measurement (inspired by BM25), between a longer short text s_l and a short text s_s:

$$f_{sts}(s_l, s_s) = \sum_{w \in s_l} IDF(w) \cdot \frac{sem(w, s_s) \cdot (k_1 + 1)}{sem(w, s_s) + k_1 \cdot \left(1 - b + b \cdot \frac{|s_s|}{avgsl}\right)}$$

where sem(w, s_s) is the semantic similarity between term w and the short text s_s, and k_1, b, avgsl play the same roles as in BM25.
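A minimal sketch of this saliency-weighted similarity with toy 2-dimensional embeddings and IDF values; sem(w, s_s) is taken as the best cosine between w and any term of s_s, and k1, b, avgsl are assumed BM25-style constants:

```python
# A minimal sketch of the BM25-inspired short-text similarity above (toy values throughout).
import numpy as np

emb = {"cheap": np.array([0.9, 0.1]), "flights": np.array([0.2, 0.95]),
       "low": np.array([0.85, 0.2]), "cost": np.array([0.8, 0.3]), "airline": np.array([0.25, 0.9])}
idf = {"cheap": 1.2, "flights": 1.5, "low": 1.1, "cost": 1.0, "airline": 1.4}
k1, b, avgsl = 1.2, 0.75, 4.0

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def sem(w, s_s):
    return max(cos(emb[w], emb[t]) for t in s_s)      # best match of w against the other text

def f_sts(s_l, s_s):
    score = 0.0
    for w in s_l:
        s = sem(w, s_s)
        score += idf[w] * (s * (k1 + 1)) / (s + k1 * (1 - b + b * len(s_s) / avgsl))
    return score

print(f_sts("cheap flights".split(), "low cost airline".split()))
```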

From the Concept View

From the Concept View [Wang et al 2015a]

[Figure: each short text is parsed and conceptualized (term clustering by is-A, concept filtering by co-occurrence, head/modifier analysis, concept orthogonalization) against a semantic network and a co-occurrence network into a bag of concepts, i.e. a concept vector [(c1, score1), (c2, score2), …]; the similarity of the two short texts is then computed between their concept vectors]

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

bull Explicit short text understanding benefits a lot of application scenarios:

bull Ads/search semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull …

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

[Figure: keyword-selection results by decile (Decile 4 through Decile 10) for Mainline Ads and Sidebar Ads]

Definition Mining [Hao et al 2016]

bull Definition scenarios: search engines, QnA, etc.

bull Why is Conceptualization useful for definition mining?

bull Example: "What is Emphysema?"

Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

bull This sentence has the form of a definition

bull Embedding is helpful to some extent, but it also returns high similarity scores for (emphysema, disease) and (emphysema, smoking)

bull Conceptualization can provide strong semantics

bull Contextual embedding can also provide semantic similarity beyond is-A

Answer 1

Answer 2

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

[Figure: offline, training data for each class (Class 1 … Class i … Class N) is conceptualized and concept-weighted to learn a per-class concept model; online, the original short text (e.g. "justin bieber graduates") goes through entity extraction, conceptualization against the knowledge base into a concept vector, candidate generation, and classification & ranking against the concept models, producing outputs such as <Music, Score>]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

[Figures: a category (e.g. TV) is represented in concept space by the article titles/tags in that category (points p_i, p_j); each category (Music, Movie, TV, …) is summarized as a weighted region ω_i, ω_j in concept space; an incoming query is mapped into the same concept space and compared against the category regions]
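A minimal sketch of the online scoring step (toy per-class concept models and query concept vector, not learned ones): score the short text's concept vector against each category model by cosine and rank the categories:

```python
# A minimal sketch of concept-based classification: rank categories by the cosine between
# the query's concept vector and each category's concept model (made-up vectors below).
import math

category_models = {
    "Music": {"singer": 0.6, "album": 0.3, "concert": 0.1},
    "TV":    {"tv show": 0.5, "host": 0.3, "channel": 0.2},
}
query_concepts = {"singer": 0.7, "celebrity": 0.2, "album": 0.1}   # toy conceptualization of the query

def cosine(u, v):
    dot = sum(u.get(k, 0.0) * v.get(k, 0.0) for k in set(u) | set(v))
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

ranking = sorted(((c, cosine(query_concepts, m)) for c, m in category_models.items()),
                 key=lambda kv: -kv[1])
print(ranking)   # e.g. <Music, score> ranked first
```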

Precision performance on each category [Wang et al 2014a]

Category | BoC-STC | LM_ch | SVM  | VSM_cosine | LM_d | Entity_ESA
Movie    | 0.71    | 0.91  | 0.84 | 0.81       | 0.72 | 0.56
Money    | 0.97    | 0.95  | 0.54 | 0.57       | 0.52 | 0.74
Music    | 0.97    | 0.90  | 0.88 | 0.73       | 0.68 | 0.58
TV       | 0.96    | 0.46  | 0.92 | 0.56       | 0.51 | 0.55

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase: a collaboratively created graph database for structuring human knowledge in SIGMOD 2008

bull [ Auer et al 2007 ] Sören Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWC/ASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C. Ré Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Börschinger Cäcilia Zirn and Anas Elghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Catherine Havasi ConceptNet 5 A large semantic network for relational knowledge The People's Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou "Understand Short Texts by Harvesting and Analyzing Semantic Knowledge" IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and Xindong Wu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10): 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny Boyes-Braem Basic objects in natural categories Cognitive psychology 8(3):382–439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ] Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny Zhu Toward Topic Search on the Web In International Conference on Conceptual Modeling October 2012


hellip hellip

device largest desktop OS vendor

device largest software development company

device largest global corporation

device latest windows and office provider

hellip hellip

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept PatternsCluster size Sum of Cluster Score headconstraintscore

615 2114691 breedstate357298460224501

296 7752357 gameplatform627403476771856

153 3466804 accessoryvehicle53393705094809

70 118259 browserplatform132612807637391

22 1010993 requirementschool271407526294823

34 9489159 drugdisease154602405333541

42 8992995 cosmeticskin condition814659415003929

16 7421599 jobcity27903732555528

32 710403 accessoryphone246513830851194

18 6692376 softwareplatform210126322725878

20 6444603 testdisease239774028397537

27 5994205 clothesbreed98773996282851

19 5913545 penaltycrime200544192793488

25 5848804 taxstate240081818612579

16 5465424 saucemeat183592863621553

18 4809389 credit cardcountry142919087972152

14 4730792 foodholiday14554140330924

11 4536199 modgame257163856882439

29 4350954 garmentsport471533326845442

23 3994886 career informationprofessional732726483731257

15 386065 songinstrument128189481818135

18 378213 baitfish780426514113169

22 3722948 study guidebook508339765053921

19 3408953 pluginsbrowser550326072627126

14 3305753 recipemeat882779863422951

18 3214226 currencycountry110825444188352

13 3180272 lenscamera186081673263957

9 316973 decorationholiday130055844126533

16 314875 foodanimal7338544366514

game platform

game device

video game platform

game console game pad

game gaming platform

Game (Head) Platform (Modifier)

angry birds android

angry birds ios

angry birds windows 10

hellip hellip

Detection

Head Modifier Relationship

bull Train a classifier on

(head-embedding modifier-embedding)

bull Training data bull Positive (head modifier)bull Negative (modifier head)

bull Precision gt= 09 Recall gt= 09

bull Disadvantage not interpretable

Syntactic Parsing based on HM

bull Information is incompletebull Preposition and other function words

bull Within a noun compound el capitan macbook pro

bull Why not train a parser for web queries

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges Short Texts Lack Grammatical Signalsbull Lack function words word order

bull ldquotoys queriesrdquo has ambiguous intent

bull ldquodistance earth moonrdquo has clear intentbull many equivalent forms ldquoearth moon distancerdquo ldquoearth

distance moonrdquo hellip

Challenges Syntactic Parsing of Queries

bullNo standard

bullNo ground-truth

Why is syntactic parsing of queries even a legitimate problem

Derive Syntax from Semantics[Sun et al 2016]

bull Query ldquothai food houstonrdquo

bull Clicked sentence

bull Project dependency to the query

A Treebank for Short Texts

bull Given query 119902

bull Given 119902rsquos clicked sentence 119904

bull Parse each 119904

bull Project dependency from 119904 to 119902

bull Aggregate dependencies

Algorithm of Projection

Result Examples

Results

bull Random queries

QueryParser UAS 083 LAS 075Stanford UAS 072 LAS 064

bull Queries with no function words

QueryParser UAS 082 LAS 073Stanford UAS 070 LAS 061

bull Queries with function words

QueryParser UAS 090 LAS 085Stanford UAS 086 LAS 080

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

bull Measuring similarity between two short texts and sentences

bull Basic idea word-by-word comparison using embedding vector

bull Use saliency-weighted semantic graph to computer similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired

Bins of all edges Bins of max edges

119908isin119904119868

119868119863119865(119908) sdot)119904119890119898(119908 119904119904) sdot (1198961 + 1

൰119904119890119898(119908 119904119904) + 1198961 sdot (1 minus 119887 + 119887 sdot|119904119904|119886119907119892119897

Similarity measurement

termShort texts

Inspired by BM25

Semantic similarity

119891119904119905119904(119904119897 119904119904) =

From the Concept View

From the Concept View [Wang et al 2015a]

Co-occurrence Network

Bags of Concepts

Semantic Network

Short Text 1

Short Text 2

Concept Vector 1[(c1 score1) (c2 score2)hellip]

Concept Vector 2[(c1rsquo score1rsquo) (c2rsquo score2rsquo)hellip]

Similarity

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

bull Explicit short text understanding benefit lot of application scenariosbull Adssearch semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

000

100

200

300

400

500

600

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

000

010

020

030

040

050

060

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

Mainline Ads Sidebar Ads

Definition Mining [Hao et al 2016]

bull Definition scenarios search engines QnA etc

bull Why Conceptualization is useful for definition miningbull Examples ldquoWhat is Emphysemardquo

Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

bull This sentence has the form of definitionbull Embedding is helpful to some extent but it also return high similarity

score for (emphysema disease) and (emphysema smoking)

bull Conceptualization can provide strong semanticsbull Contextual embedding can also provide semantic similarity beyond Is-A

Answer 1

Answer 2

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

OfflineOffline OnlineOnline

Original Short textjustin bieber graduates

hellip

Knowledge base

Conceptualiztion

Concept Vector

Entity Extraction

Candidates Generation

Classification amp Ranking

Model LearningModel Learning

Concept Weighting

Model Model NModel i

Concept Model Concept Model

Class 1 Class NClass i

TrainingData

ltMusic Scoregt

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept SpaceArticle titlestagsin this category

119901119894

119901119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

hellip

hellip

120596119894

120596119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

120596119894

120596119895119901119894

119901119895

Precision performance on each category [Wang et al 2014a]

BocSTC LM_ch SVMVSM_cosi

neLM_d Entity_ESA

Movie 071 091 084 081 072 056

Money 097 095 054 057 052 074

Music 097 090 088 073 068 058

TV 096 046 092 056 051 055

0304050607080910

Pre

cisi

on

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledgeltigt in SIGMOD 2008

bull [ Auer et al 2007 ] Soumlren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Boumlrschinger Caumlcilia Zirn and AnasElghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and XindongWu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny BoyesBraem Basic objects in natural categories Cognitive psychology 8(3)382ndash439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny ZhuToward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 47: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Evaluations on Different Measures for BLC (Precision and NDCG at cut-offs k = 1, 2, 3, 5, 10, 15, 20; the two blocks below correspond to the Precision and NDCG results)

Precision
No smoothing
  MI(e)               0.769  0.692  0.705  0.685  0.719  0.705  0.690
  PMI3(e)             0.885  0.769  0.756  0.800  0.754  0.733  0.721
  NPMI(e)             0.692  0.692  0.667  0.638  0.627  0.610  0.610
  Typicality P(c|e)   0.462  0.577  0.603  0.577  0.569  0.564  0.556
  Typicality P(e|c)   0.500  0.462  0.526  0.523  0.523  0.510  0.521
  Rep(e)              0.846  0.865  0.872  0.862  0.758  0.731  0.719
Smoothing = 0.001
  MI(e)               0.577  0.615  0.628  0.600  0.612  0.605  0.592
  PMI3(e)             0.731  0.673  0.692  0.654  0.669  0.644  0.623
  NPMI(e)             0.923  0.827  0.769  0.746  0.731  0.695  0.671
  Typicality P(c|e)   0.462  0.577  0.603  0.577  0.569  0.564  0.554
  Typicality P(e|c)   0.885  0.865  0.872  0.831  0.785  0.741  0.704
  Rep(e)              0.846  0.731  0.718  0.723  0.700  0.669  0.638
Smoothing = 0.0001
  MI(e)               0.615  0.615  0.654  0.608  0.635  0.628  0.612
  PMI3(e)             0.846  0.731  0.731  0.715  0.723  0.685  0.677
  NPMI(e)             0.885  0.904  0.885  0.869  0.823  0.777  0.752
  Typicality P(c|e)   0.462  0.577  0.603  0.577  0.569  0.564  0.556
  Typicality P(e|c)   0.885  0.904  0.910  0.877  0.831  0.813  0.777
  Rep(e)              0.923  0.846  0.833  0.815  0.781  0.736  0.719
Smoothing = 1e-5
  MI(e)               0.615  0.635  0.667  0.662  0.677  0.656  0.646
  PMI3(e)             0.885  0.769  0.744  0.777  0.758  0.731  0.710
  NPMI(e)             0.885  0.846  0.872  0.869  0.831  0.810  0.787
  Typicality P(c|e)   0.462  0.577  0.603  0.577  0.569  0.564  0.556
  Typicality P(e|c)   0.769  0.808  0.846  0.823  0.808  0.782  0.765
  Rep(e)              0.885  0.904  0.872  0.862  0.812  0.800  0.767
Smoothing = 1e-6
  MI(e)               0.769  0.673  0.705  0.677  0.700  0.692  0.679
  PMI3(e)             0.885  0.769  0.756  0.785  0.773  0.726  0.723
  NPMI(e)             0.885  0.846  0.821  0.815  0.750  0.726  0.719
  Typicality P(c|e)   0.462  0.577  0.603  0.577  0.569  0.564  0.556
  Typicality P(e|c)   0.538  0.615  0.615  0.615  0.608  0.613  0.615
  Rep(e)              0.846  0.885  0.897  0.877  0.788  0.777  0.765
Smoothing = 1e-7
  MI(e)               0.769  0.692  0.705  0.685  0.719  0.703  0.688
  PMI3(e)             0.885  0.769  0.756  0.792  0.758  0.736  0.725
  NPMI(e)             0.769  0.750  0.718  0.700  0.650  0.641  0.633
  Typicality P(c|e)   0.462  0.577  0.603  0.577  0.569  0.564  0.556
  Typicality P(e|c)   0.500  0.481  0.526  0.523  0.531  0.523  0.523
  Rep(e)              0.846  0.865  0.872  0.854  0.765  0.749  0.733

NDCG
No smoothing
  MI(e)               0.516  0.531  0.519  0.531  0.562  0.574  0.594
  PMI3(e)             0.725  0.664  0.652  0.660  0.628  0.631  0.646
  NPMI(e)             0.599  0.597  0.579  0.554  0.540  0.539  0.549
  Typicality P(c|e)   0.297  0.380  0.409  0.422  0.438  0.446  0.461
  Typicality P(e|c)   0.401  0.386  0.396  0.398  0.401  0.410  0.428
  Rep(e)              0.758  0.771  0.745  0.723  0.656  0.647  0.661
Smoothing = 1e-3
  MI(e)               0.374  0.414  0.441  0.448  0.473  0.481  0.495
  PMI3(e)             0.484  0.511  0.509  0.502  0.519  0.525  0.533
  NPMI(e)             0.692  0.652  0.607  0.603  0.585  0.585  0.592
  Typicality P(c|e)   0.297  0.380  0.409  0.422  0.438  0.446  0.460
  Typicality P(e|c)   0.703  0.697  0.704  0.681  0.637  0.628  0.626
  Rep(e)              0.621  0.580  0.554  0.561  0.554  0.555  0.559
Smoothing = 1e-4
  MI(e)               0.407  0.430  0.458  0.462  0.492  0.503  0.512
  PMI3(e)             0.648  0.604  0.579  0.575  0.578  0.576  0.590
  NPMI(e)             0.747  0.777  0.761  0.737  0.700  0.685  0.688
  Typicality P(c|e)   0.297  0.380  0.409  0.422  0.438  0.446  0.461
  Typicality P(e|c)   0.791  0.795  0.802  0.767  0.738  0.729  0.724
  Rep(e)              0.758  0.714  0.711  0.689  0.653  0.636  0.653
Smoothing = 1e-5
  MI(e)               0.429  0.465  0.478  0.501  0.517  0.528  0.545
  PMI3(e)             0.725  0.647  0.642  0.642  0.627  0.624  0.638
  NPMI(e)             0.813  0.779  0.778  0.765  0.730  0.723  0.729
  Typicality P(c|e)   0.297  0.380  0.409  0.422  0.438  0.446  0.461
  Typicality P(e|c)   0.709  0.728  0.735  0.722  0.702  0.696  0.703
  Rep(e)              0.791  0.787  0.762  0.739  0.707  0.703  0.706
Smoothing = 1e-6
  MI(e)               0.516  0.510  0.515  0.526  0.546  0.563  0.579
  PMI3(e)             0.725  0.655  0.651  0.654  0.641  0.631  0.649
  NPMI(e)             0.791  0.766  0.732  0.728  0.673  0.659  0.668
  Typicality P(c|e)   0.297  0.380  0.409  0.422  0.438  0.446  0.461
  Typicality P(e|c)   0.495  0.516  0.520  0.508  0.512  0.521  0.540
  Rep(e)              0.758  0.784  0.767  0.755  0.691  0.686  0.694
Smoothing = 1e-7
  MI(e)               0.516  0.531  0.519  0.530  0.562  0.571  0.592
  PMI3(e)             0.725  0.664  0.652  0.658  0.630  0.631  0.647
  NPMI(e)             0.670  0.655  0.633  0.604  0.575  0.570  0.581
  Typicality P(c|e)   0.297  0.380  0.409  0.422  0.438  0.446  0.461
  Typicality P(e|c)   0.423  0.421  0.415  0.407  0.414  0.424  0.438
  Rep(e)              0.758  0.771  0.745  0.725  0.663  0.661  0.668
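These measures can be computed directly from instance-concept co-occurrence counts. Below is a minimal sketch assuming simple maximum-likelihood estimates, with Rep(e, c) taken as the product P(c|e) · P(e|c); the exact estimation and smoothing used in [Wang et al 2015a] may differ, and the counts in the example call are made up.

import math

def blc_scores(n_ec, n_e, n_c, n_total):
    """Score an (entity e, concept c) pair from co-occurrence counts (illustrative estimates)."""
    p_ec = n_ec / n_total            # joint probability P(e, c)
    p_e = n_e / n_total
    p_c = n_c / n_total
    p_c_given_e = p_ec / p_e         # Typicality P(c|e)
    p_e_given_c = p_ec / p_c         # Typicality P(e|c)
    pmi = math.log(p_ec / (p_e * p_c))
    return {
        "MI(e)":    p_ec * pmi,                          # pointwise contribution to mutual information
        "PMI^3(e)": math.log(p_ec ** 3 / (p_e * p_c)),   # PMI^3 variant
        "NPMI(e)":  pmi / (-math.log(p_ec)),             # normalized PMI, in [-1, 1]
        "P(c|e)":   p_c_given_e,
        "P(e|c)":   p_e_given_c,
        "Rep(e)":   p_c_given_e * p_e_given_c,           # basic-level representativeness score
    }

print(blc_scores(n_ec=500, n_e=2000, n_c=10000, n_total=1_000_000))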

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

What is the Semantic Similarity?
• Are the following instance pairs similar?
  <apple, microsoft>
  <apple, pear>
  <apple, fruit>
  <apple, food>
  <apple, ipad>
  <car, journey>

Approaches on Term Similarity

• Categories of approaches for semantic similarity:
  - String based approach
  - Knowledge based approach: uses preexisting thesauri, taxonomies, or encyclopedias such as WordNet
  - Corpus based approach: uses contexts of terms extracted from web pages, web search snippets, or other text repositories
  - Embedding based approach: introduced in detail in "Part 3: Implicit Understanding"

Approaches on Term Similarity (2)

bull Categories

[Figure: a map of term-similarity approaches and representative methods - string based approaches; knowledge based approaches (WordNet), split into path length / lexical chain-based and information content-based; and corpus based approaches, split into graph-learning-algorithm based and snippet-search based. Cited methods include Rada 1989, Resnik 1995, Jcn 1997, Hirst 1998, Lin 1998, Ban 2002, HunTray 2005, Chen 2006, Alvarez 2007, Do 2009, Agirre 2010, Bol 2011, and Sánchez 2011, with the state-of-the-art approaches highlighted.]

bull Framework

83

Term Similarity Using Semantic Networks [Li et al 2013 Li et al 2015]

[Framework] Given a term pair <t1, t2>:
Step 1: Type checking - decide whether the pair is a concept pair, an entity pair, or a concept-entity pair.
Step 2: Context representation - collect a context vector for each term (entity-distribution or concept-distribution contexts; for an entity term, collect its concepts, cluster them, and select the top-k concepts cx of each cluster).
Step 3: Context similarity - compare the context vectors, e.g. Cosine(T(t1), T(t2)) for entity pairs, the maximum over cluster pairs Cosine(Cx(t1), Cy(t2)) for concept pairs, and the maximum sim(t2, cx) over concept clusters for concept-entity pairs.

An example [Li et al 2013 Li et al 2015]

For example, <banana, pear>:
Step 1: Type checking - <banana, pear> is an entity pair.
Step 2: Context representation - collect the concept context vector of each entity.
Step 3: Context similarity - Cosine(T(banana), T(pear)) = 0.916.
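A minimal sketch of Steps 2-3: represent each term by a sparse concept-to-weight context vector and compare the vectors with cosine similarity. The toy concept distributions below are made up, whereas in [Li et al 2013, Li et al 2015] they come from the probabilistic isA network.

import math

def cosine(v1, v2):
    """Cosine similarity between two sparse {concept: weight} vectors."""
    dot = sum(v1[c] * v2[c] for c in set(v1) & set(v2))
    n1 = math.sqrt(sum(w * w for w in v1.values()))
    n2 = math.sqrt(sum(w * w for w in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

# Toy concept-distribution contexts (illustrative weights only).
banana = {"fruit": 0.62, "food": 0.25, "tropical fruit": 0.10, "plant": 0.03}
pear   = {"fruit": 0.58, "food": 0.28, "tree": 0.09, "plant": 0.05}
print(cosine(banana, pear))   # high similarity, cf. the 0.916 reported for <banana, pear>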

Examples (term 1, term 2, similarity):
  lunch               dinner              0.9987
  tiger               jaguar              0.9792
  car                 plane               0.9711
  television          radio               0.9465
  technology company  microsoft           0.8208
  high impact sport   competitive sport   0.8155
  employer            large corporation   0.5353
  fruit               green pepper        0.2949
  travel              meal                0.0426
  music               lunch               0.0116
  alcoholic beverage  sports equipment    0.0314
  company             table tennis        0.0003

96httpadaptseieesjtueducnsimilaritySimCompleteResultspdf

Statistics of Search Queries
(a) By traffic: 1 term 44%, 2 terms 29%, 3 terms 17%, 4 terms 7%, 5 terms 2%, more than 5 terms 1%.
(b) By # of distinct queries: 1 term 10%, 2 terms 26%, 3 terms 34%, 4 terms 19%, 5 terms 7%, more than 5 terms 4%.
[Figure: distribution of the number of instances per query (1, 2, 3, 4, 5, more than 5), with examples such as Pokémon Go and Microsoft HoloLens.]

If the short text has context for the instance…
• python tutorial
• dangerous python
• moon earth distance
• …

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Supervised Segmentation [Bergsma et al 2007]

bull Problem divide query into semantic units

bull Approach turn segmentation into position-based binary classification

Example query: "two man power saw"
Candidate segmentations: [two man] [power saw]; [two] [man] [power saw]; [two] [man power] [saw]
Input: a query and its word-boundary positions
Output: the decision whether to place a segment break at each position

Supervised Segmentation

• Features
  - Decision boundary features: e.g., indicator features (is the token "the"?), POS tags in the query, forward/backward position features
  - Statistical features: e.g., mutual information between the left and right parts ("bank loan | amortization schedule")
  - Context features: information from the surrounding context
  - Dependency features: e.g., "female" depends on "bus driver"

Supervised Segmentation

bull Segmentation Overview

Input query: "two man power saw"
An SVM classifier, trained on the learning features above, outputs a yes/no segmentation decision at each position between adjacent words.
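A minimal sketch of the position-based binary classification idea, with made-up features and toy training data standing in for the richer feature set and training regime of [Bergsma et al 2007]:

from sklearn.svm import LinearSVC

# One example per gap between adjacent words; label 1 = place a segment break there.
# Stand-in features: [PMI(left part, right part), left token is a determiner, relative position].
X_train = [
    [5.2, 0, 0.33],   # strong association across the gap -> keep together
    [0.1, 0, 0.66],   # weak association -> break
    [4.8, 0, 0.50],
    [0.3, 1, 0.25],
]
y_train = [0, 1, 0, 1]
clf = LinearSVC().fit(X_train, y_train)

def segment(words, gap_features):
    """Split the query wherever the classifier predicts a break at the gap."""
    segments, current = [], [words[0]]
    for word, brk in zip(words[1:], clf.predict(gap_features)):
        if brk:
            segments.append(current)
            current = []
        current.append(word)
    segments.append(current)
    return segments

print(segment(["two", "man", "power", "saw"],
              [[0.2, 0, 0.25], [0.4, 0, 0.50], [6.1, 0, 0.75]]))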

Unsupervised Segmentation [Tan et al 2008]

bull Unsupervised learning for query segmentation

Probability of a segmentation S for query Q (unigram model over segments s_i):
  P(S|Q) = P(s_1) P(s_2 | s_1) ... P(s_m | s_1 s_2 ... s_{m-1}) ≈ ∏_{s_i ∈ S} P(s_i)

A split point is a valid segment boundary if and only if the pointwise mutual information between the two segments resulting from the split is negative:
  MI(s_k, s_{k+1}) = log [ P_c([s_k s_{k+1}]) / (P_c(s_k) · P_c(s_{k+1})) ] < 0

Example: "new york times subscription" - since log [ P_c([new york]) / (P_c(new) · P_c(york)) ] > 0, there is no segment boundary between "new" and "york".

Unsupervised Segmentation

• Find the top-k segmentations with dynamic programming
• Use EM optimization on the fly
Input: a query w_1 w_2 ... w_n (the words in the query) and the concept probability distribution
Output: the top-k segmentations with the highest likelihood
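A minimal sketch of the unigram scoring with dynamic programming over split points; the segment probabilities below are made up, the on-the-fly EM re-estimation of [Tan et al 2008] is omitted, and only the single best segmentation is returned rather than the top k.

import math

# Toy segment probabilities; unseen segments get a small floor value.
P = {"new york": 1e-4, "new york times": 5e-5, "times": 1e-3,
     "new": 2e-3, "york": 1e-4, "subscription": 5e-4}
FLOOR = 1e-9

def best_segmentation(words, max_len=4):
    """P(S|Q) ~ prod_i P(s_i); DP over the position of the last segment break."""
    best = [(0.0, [])] + [(-math.inf, None)] * len(words)   # best[i] covers words[:i]
    for i in range(1, len(words) + 1):
        for j in range(max(0, i - max_len), i):
            seg = " ".join(words[j:i])
            score = best[j][0] + math.log(P.get(seg, FLOOR))
            if score > best[i][0]:
                best[i] = (score, best[j][1] + [seg])
    return best[len(words)]

print(best_segmentation("new york times subscription".split()))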

Exploit Click-through [Li et al 2011]

• Motivation
  - Probabilistic query segmentation
  - Use click-through data: query Q -> clicked URL -> document D
Input query: "bank of america online banking"
Output (top-3 segmentations with probabilities):
  [bank of america] [online banking]    0.502
  [bank of america online banking]      0.428
  [bank of] [america] [online banking]  0.001

Exploit Click-through

• Segmentation model: an interpolated model combining global (corpus-wide) information with click-through information
Query: [credit card] [bank of america]
Clicked HTML documents:
  1. bank of america credit cards contact us overview
  2. secured visa credit card from bank of america
  3. credit cards overview find the right bank of america credit card for you
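A minimal sketch of the interpolation idea; the interpolation weight and the two component scores are placeholders, whereas [Li et al 2011] estimates its model from the clicked documents with an EM-style procedure.

def segmentation_score(seg, p_global, p_click, lam=0.6):
    """Interpolate a corpus-wide (global) model with a click-through model."""
    return lam * p_global.get(seg, 0.0) + (1 - lam) * p_click.get(seg, 0.0)

candidates = [("bank of america", "online banking"),
              ("bank of america online banking",),
              ("bank of", "america", "online banking")]
p_global = {candidates[0]: 0.45, candidates[1]: 0.40, candidates[2]: 0.01}   # toy numbers
p_click  = {candidates[0]: 0.58, candidates[1]: 0.47, candidates[2]: 0.00}   # toy numbers

for seg in sorted(candidates, key=lambda s: -segmentation_score(seg, p_global, p_click)):
    print(seg, round(segmentation_score(seg, p_global, p_click), 3))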

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Sense Changes with Different Context

watch harry potter read harry potter age harry potter

Movie Book Character

harry potter walkthrough

Game

Entity Recognition in Query [Guo et al 2009]

bull Motivation

Detect named entity in a short text and categorize it

harry potter walkthrough

Single-named-entity query

Example: the single-named-entity query "harry potter walkthrough" is represented as a triple <e, t, c> = ("harry potter", "walkthrough", "game"), where e is the (possibly ambiguous) named entity, t the context terms, and c the class of the entity.

Entity Recognition in Query

bull Probabilistic Generative Model

Goal: given a query q, find the triple <e, t, c> that maximizes its probability.
Probability of generating a triple, assuming the context depends only on the class:
  Pr(e, t, c) = Pr(e) · Pr(c|e) · Pr(t|c)
Objective: given query q, find the triple <e, t, c> with the highest Pr(e, t, c). The problem then becomes how to estimate Pr(e), Pr(c|e), and Pr(t|c).
E.g., "walkthrough" depends only on the class game, not on the entity harry potter.
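A minimal sketch of scoring candidate triples with Pr(e) · Pr(c|e) · Pr(t|c); the probability tables are toy numbers and the enumeration of candidate entities is simplified (the real model estimates these distributions with WS-LDA, as described next).

# Toy probability tables (illustrative numbers only).
P_e = {"harry potter": 0.7}
P_c_given_e = {"harry potter": {"book": 0.4, "movie": 0.4, "game": 0.2}}
P_t_given_c = {"book":  {"walkthrough": 0.01, "read": 0.5},
               "movie": {"walkthrough": 0.02, "watch": 0.6},
               "game":  {"walkthrough": 0.8, "cheats": 0.2}}

def best_interpretation(query):
    """Enumerate (entity, context, class) triples for the query and pick the most probable."""
    best_p, best_triple = 0.0, None
    for e, classes in P_c_given_e.items():
        if e in query:
            t = query.replace(e, "").strip()           # remaining words act as context terms
            for c, p_ce in classes.items():
                p = P_e.get(e, 0.0) * p_ce * P_t_given_c[c].get(t, 1e-6)
                if p > best_p:
                    best_p, best_triple = p, (e, t, c)
    return best_triple, best_p

print(best_interpretation("harry potter walkthrough"))
# -> (("harry potter", "walkthrough", "game"), 0.112)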

Entity Recognition in Query

bull Probability Estimation by Learning

Learning objective:
  max ∏_{i=1}^{N} P(e_i, t_i, c_i)
Challenge: it is difficult as well as time-consuming to manually assign class labels to named entities in queries.
So build a training set T = {(e_i, t_i)} and view c_i as a hidden variable. The new learning problem is
  max ∏_{i=1}^{N} P(e_i, t_i),  with  P(e_i, t_i) = Σ_c P(e_i) P(c|e_i) P(t_i|c)

solved with topic model WS-LDA

Signal from Click [Pantel et al 2012]

bull Motivation

Predict entity type in Web search

[Figure: a query is modeled as being generated from an entity, a user intent, context words, and a click, and the model predicts a query type distribution over 73 entity types.]

Signal from Click

bull Joint Model for Prediction

[Figure: plate diagram of the joint generative model. For each query: pick a type (from a distribution over types), pick an entity (from an entity distribution), pick an intent (from an intent distribution), pick context words (from a word distribution), and pick a click (from a host distribution); t, τ, i, n, c, θ, φ, ω denote the corresponding variables and distributions.]

Telegraphic Query interpretation [Sawant et al 2013 Joshi et al 2014]

bull Entity-seeking Telegraphic Queries

bull Interpretation = Segmentation + Annotation

[Figure: the query "Germany capital" is answered with the result entity "Berlin" by combining a knowledge base with a large corpus, trading off accuracy and recall.]

bull Overview

Joint Interpretation and Ranking [Sawant et al 2013 Joshi et al 2014]

Annotated Corpus

Telegraphic Query

e1e2e3

Two Models for Interpretation and Ranking

Generative Model

Discriminative Model

Output

bull Generative Model

Joint Interpretation and Ranking [Sawant et al 2013]

[Figure, borrowed from U. Sawant (2013): for the query q = "losing team baseball world series 1998", the generative model (based on probabilistic language models) switches between a type model and a context model. The type hint "baseball team" matches the answer entity San Diego Padres, a major league baseball team, while context matchers link "lost 1998 world series" to corpus evidence such as "Padres have been to two World Series, losing in 1984 and 1998".]

bull Discriminative Model

Joint Interpretation and Ranking [Sawant et al 2013]

[Figure: the discriminative model, based on max-margin discriminative learning, scores candidate interpretations of "losing team baseball world series 1998", e.g. the correct entity San_Diego_Padres with target type t = baseball team versus the incorrect entity 1998_World_Series with t = series.]

bull Queries seek answer entities (e2)

bull Contain (query) entities (e1) target types (t2) relations (r) and selectors (s)

Telegraphic Query Interpretation [Joshi et al 2014]

Example interpretations (columns: query, e1, r, t2, s; borrowed from M. Joshi, 2014):
  dave navarro first band    e1: dave navarro   r: band                t2: band                s: first
                             e1: dave navarro   r: -                   t2: band                s: first
  spider automobile company  e1: spider         r: automobile company  t2: automobile company  s: -
                             e1: -              r: automobile company  t2: company             s: spider

Improved Generative Model

bull Generative Model[Sawant et al 2013]

[Joshi et al 2014]Consider e1

(in q) and r

Improved Discriminative Model

bull Discriminative Model[Sawant et al 2013]

[Joshi et al 2014]

Consider e1

(in q) and r

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

bull Input a short text

bull Output semantic interpretation

bull Three steps in understanding a short text

wanna watch eagles band

watch[verb] eagles[entity](band) band[concept]

wanna watch eagles band watch[verb] eagles[entity](band) band[concept]

watch eagles band watch[verb] eagles[entity] band[concept]

Step 1 Text Segmentation ndash divide into a sequence of terms in vocabulary

Step 2 Type detection ndash determine the best type of each term

Step 3 Concept Labeling ndash infer the best concept of each entity within context

Text segmentation
• Observations:
  - Mutual exclusion: terms containing the same word mutually exclude each other
  - Mutual reinforcement: related terms mutually reinforce each other
• Build a Candidate Term Graph (CTG), e.g. for "vacation april in paris" and "watch harry potter"

[Figure: candidate term graphs. For "vacation april in paris", the candidate terms vacation, april in paris, april, and paris are connected by edges with weights such as 0.029, 0.005, 0.047, 0.041; for "watch harry potter", the candidate terms watch and harry potter are connected with weights such as 0.014, 0.092, 0.053, 0.018.]

Find best segmentation

• Best segmentation = the sub-graph of the CTG which
  - is a complete graph (clique)
  - contains no mutual exclusion
  - has 100% word coverage (except for stopwords)
  - has the largest average edge weight
[Figure: the candidate term graphs again, with one sub-graph marked "is a segmentation" and another marked "best segmentation".]

Find best segmentation

• Best segmentation = the sub-graph of the CTG which
  - is a complete graph (clique)
  - contains no mutual exclusion
  - has 100% word coverage (except for stopwords)
  - has the largest average edge weight
[Figure: the same graphs, highlighting the maximal clique chosen as the best segmentation, e.g. {vacation, april, paris} and {watch, harry potter}.]
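A minimal sketch of the clique-based search over a candidate term graph using networkx; the graph, edge weights, and mutual-exclusion handling (no edge between terms that share a word) are simplified stand-ins echoing the toy example above.

import itertools
import networkx as nx

# Candidate term graph for "vacation april in paris" (toy weights);
# mutually exclusive terms (sharing a word) simply have no edge between them.
G = nx.Graph()
G.add_weighted_edges_from([
    ("vacation", "april in paris", 0.005),
    ("vacation", "april", 0.029),
    ("vacation", "paris", 0.047),
    ("april", "paris", 0.041),
])
words, stopwords = {"vacation", "april", "in", "paris"}, {"in"}

def covers_all_words(terms):
    return set(w for t in terms for w in t.split()) >= (words - stopwords)

best, best_score = None, -1.0
for clique in nx.find_cliques(G):                             # maximal cliques
    for size in range(1, len(clique) + 1):
        for sub in itertools.combinations(clique, size):      # every sub-clique is also a clique
            if not covers_all_words(sub):
                continue
            edges = list(itertools.combinations(sub, 2))
            score = sum(G[u][v]["weight"] for u, v in edges) / len(edges) if edges else 0.0
            if score > best_score:
                best, best_score = sub, score

print(best, best_score)   # expect ('vacation', 'april', 'paris') with average weight ~0.039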

Type Detection

• Pairwise model: find the best typed-term for each term so that the maximum spanning tree of the resulting sub-graph between typed-terms has the largest weight.
[Figure: for the terms watch, free, and movie, the candidate typed-terms are watch[v] / watch[e] / watch[c], free[adj] / free[v], and movie[c] / movie[e].]
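A minimal sketch of the pairwise model: enumerate one typed-term per term, build the induced weighted graph, and keep the assignment whose maximum spanning tree is heaviest (using networkx; the candidate typed-terms follow the figure above, but the affinity scores are toy numbers).

import itertools
import networkx as nx

candidates = {"watch": ["watch[v]", "watch[e]", "watch[c]"],
              "free":  ["free[adj]", "free[v]"],
              "movie": ["movie[c]", "movie[e]"]}
affinity = {("watch[v]", "movie[c]"): 0.9, ("watch[v]", "free[adj]"): 0.3,
            ("free[adj]", "movie[c]"): 0.7, ("watch[e]", "movie[e]"): 0.2,
            ("watch[c]", "movie[c]"): 0.4}          # unlisted pairs default to 0

def mst_weight(assignment):
    """Total weight of the maximum spanning tree over one typed-term per term."""
    G = nx.Graph()
    for u, v in itertools.combinations(assignment, 2):
        G.add_edge(u, v, weight=affinity.get((u, v), affinity.get((v, u), 0.0)))
    return nx.maximum_spanning_tree(G).size(weight="weight")

best = max(itertools.product(*candidates.values()), key=mst_weight)
print(best)   # -> ('watch[v]', 'free[adj]', 'movie[c]')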

Concept Labeling

• Entity disambiguation is the most important task of concept labeling: filter and re-rank the original concept cluster vector.
• Weighted vote: the final score of each concept cluster is a combination of its original score and the support from the context, computed from concept co-occurrence.
Example: "watch harry potter" -> movie; "read harry potter" -> book.

Example of Entity Disambiguation[Hua et al 2015 (ICDE Best Paper) Hua et al 2016]

[Figure: the conceptualization pipeline. A short text is parsed, terms are clustered by isA against the semantic network, concepts are filtered by the co-occurrence network, and head/modifier analysis plus concept orthogonalization yield a concept vector [(c1, p1), (c2, p2), (c3, p3), ...]. For "ipad apple", the isA network gives apple concepts such as fruit..., company..., food..., product... and ipad concepts such as product..., device...; isA filtering and concept co-occurrence then keep the senses compatible with both terms, e.g. product..., brand..., company..., device....]

Mining Lexical Relationships[Wang et al 2015b]

bull Lexical knowledge represented by the probabilities

Example: "watch harry potter", where watch can be a verb or an instance, and harry potter can be a movie, a book, or a product. The lexical knowledge is captured by probabilities such as
  p(verb | watch), p(instance | watch): the role of a term, p(z | t)
  p(movie | harry potter), p(book | harry potter): the concept of a term, p(c | t, z), with p(c | e) = p(c | t, z = instance)
  p(movie | watch, verb): the concept suggested by a typed context term
(e: instance, t: term, c: concept, z: role)

Understanding Queries [Wang et al 2015b]

• Goal: rank the concepts and find argmax_c p(c | t, q)
• Given a query, enumerate all possible segmentations, build an online subgraph from the offline semantic network, and run random walk with restart [Sun et al 2005] on the online subgraph.
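A minimal sketch of random walk with restart by power iteration; the small adjacency matrix and restart vector are toy stand-ins for the online subgraph built from the query.

import numpy as np

def random_walk_with_restart(A, restart, alpha=0.15, iters=100):
    """Iterate r = alpha * restart + (1 - alpha) * W r, with W the column-normalized adjacency."""
    A = np.asarray(A, dtype=float)
    col_sums = A.sum(axis=0)
    W = A / np.where(col_sums == 0, 1, col_sums)
    r = np.asarray(restart, dtype=float)
    for _ in range(iters):
        r = alpha * np.asarray(restart, dtype=float) + (1 - alpha) * W @ r
    return r

# Toy subgraph: node 0 = "watch", 1 = "harry potter", 2 = movie, 3 = book.
A = [[0, 1, 1, 0],
     [1, 0, 1, 1],
     [1, 1, 0, 0],
     [0, 1, 0, 0]]
restart = [0.5, 0.5, 0.0, 0.0]                     # restart at the query terms
print(random_walk_with_restart(A, restart))        # nodes ranked by their steady-state score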

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

• Example: "popular smart cover iphone 5s"
• Definitions:
  - Head: names the general (semantic) category to which the whole short text belongs; usually the head is the intent of the short text. Here "smart cover" is the intent of the query.
  - Constraints: distinguish this member from other members of the same category. Here "iphone 5s" limits the type of the head.
  - Non-constraint modifiers (a.k.a. pure modifiers): subjective modifiers which can be dropped without changing the intent. Here "popular" is subjective and can be neglected.

Non-Constraint Modifiers Mining Construct Modifier Networks

Edges form a Modifier Network

Concept Hierarchy Tree in the "Country" domain
Modifier Network in the "Country" domain; in this case, "Large" and "Top" are pure modifiers

Country

Asian country

Developed country

Western country

Asian Developed Western

Western developed

country

Top western country

Large

Large Top

Top

WesternLarge Asian

country

Large developed

country

Top developed

country

Country

Asian Western

Developed

Large Top

• Betweenness centrality is a measure of a node's centrality in a network.
• The betweenness of node v is defined as
  C_B(v) = Σ_{s ≠ v ≠ t} σ_st(v) / σ_st
  where σ_st is the total number of shortest paths from node s to node t and σ_st(v) is the number of those paths that pass through v.
• Normalization & aggregation: a pure modifier should have a low aggregated betweenness centrality score PMS(t).

Non-Constraint Modifiers Mining Betweenness centrality
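A minimal sketch using networkx to compute normalized betweenness on a toy modifier network and flag low-scoring modifiers as pure-modifier candidates; the network, the aggregation into PMS(t), and the threshold are all simplified assumptions relative to [Wang et al 2014b].

import networkx as nx

# Toy modifier network for the "country" domain.
G = nx.Graph([("Asian", "Developed"), ("Developed", "Western"), ("Asian", "Western"),
              ("Large", "Asian"), ("Large", "Developed"), ("Top", "Western"), ("Top", "Developed")])

bc = nx.betweenness_centrality(G, normalized=True)
threshold = 0.05
pure_modifier_candidates = [m for m, score in bc.items() if score < threshold]
print(bc)
print("pure modifier candidates:", pure_modifier_candidates)   # "Large" and "Top" score 0 here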

Head-Constraints Mining [Wang et al 2014b]

bull A term can be a head sometimes and be a constraint in some other cases

• E.g., "Seattle hotel" (head: hotel, constraint: Seattle) vs. "Seattle hotel job" (head: job, constraints: Seattle, hotel)

Head-Constraints Mining Acquiring Concept Patterns

Get entity pairs from query log

Conceptualization

Extract patterns of the form "entity1 preposition entity2" (A for B, A of B, A with B, A in B, A on B, A at B, ...), where entity1 is the head and entity2 the constraint, and derive concept patterns for each preposition.

concept11

concept12

concept13

concept14

concept21

concept22

concept23

(concept11, concept21), (concept11, concept22), (concept11, concept23), ...
Concept Pattern Dictionary
Building the concept pattern dictionary from query logs, e.g.:
  cover for iphone 6s
  battery for sony a7r
  wicked on broadway

Why Concepts Can't Be Too General
• It may cause too many concept pattern conflicts: the head and the modifier cannot be distinguished for general concept pairs

Head Modifier

Derived Concept Pattern device company

Supporting Entity Pairs iphone 4 verizon

modem comcast

wireless router comcast

iphone 4 tmobile

Head Modifier

Derived Concept Pattern company device

Supporting Entity Pairs amazon books kindle

netflix touchpad

skype windows phone

netflix ps3

Conflict

Why Concepts Can't Be Too Specific
• It may generate concepts with little coverage
• The concept regresses to an entity
• Large storage space: up to (million × million) patterns

hellip hellip

device largest desktop OS vendor

device largest software development company

device largest global corporation

device latest windows and office provider

hellip hellip

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept Patterns (columns: cluster size, sum of cluster score, head/constraint, score)
  615  2114691  breed/state                      357298460224501
  296  7752357  game/platform                    627403476771856
  153  3466804  accessory/vehicle                53393705094809
  70   118259   browser/platform                 132612807637391
  22   1010993  requirement/school               271407526294823
  34   9489159  drug/disease                     154602405333541
  42   8992995  cosmetic/skin condition          814659415003929
  16   7421599  job/city                         27903732555528
  32   710403   accessory/phone                  246513830851194
  18   6692376  software/platform                210126322725878
  20   6444603  test/disease                     239774028397537
  27   5994205  clothes/breed                    98773996282851
  19   5913545  penalty/crime                    200544192793488
  25   5848804  tax/state                        240081818612579
  16   5465424  sauce/meat                       183592863621553
  18   4809389  credit card/country              142919087972152
  14   4730792  food/holiday                     14554140330924
  11   4536199  mod/game                         257163856882439
  29   4350954  garment/sport                    471533326845442
  23   3994886  career information/professional  732726483731257
  15   386065   song/instrument                  128189481818135
  18   378213   bait/fish                        780426514113169
  22   3722948  study guide/book                 508339765053921
  19   3408953  plugins/browser                  550326072627126
  14   3305753  recipe/meat                      882779863422951
  18   3214226  currency/country                 110825444188352
  13   3180272  lens/camera                      186081673263957
  9    316973   decoration/holiday               130055844126533
  16   314875   food/animal                      7338544366514

game platform

game device

video game platform

game console game pad

game gaming platform

Game (Head) Platform (Modifier)

angry birds android

angry birds ios

angry birds windows 10

hellip hellip

Detection

Head Modifier Relationship

bull Train a classifier on

(head-embedding modifier-embedding)

• Training data: positive examples are (head, modifier) pairs, negative examples are the reversed (modifier, head) pairs
• Precision >= 0.9, Recall >= 0.9

bull Disadvantage not interpretable
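A minimal sketch of a classifier over concatenated (head-embedding, modifier-embedding) pairs; random vectors stand in for real word embeddings, and logistic regression stands in for whichever classifier was actually used.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
emb = {w: rng.normal(size=50) for w in ["game", "platform", "accessory", "phone", "recipe", "meat"]}

# Positives: (head, modifier) pairs; negatives: the same pairs reversed.
pos = [("game", "platform"), ("accessory", "phone"), ("recipe", "meat")]
X = [np.concatenate([emb[h], emb[m]]) for h, m in pos] + \
    [np.concatenate([emb[m], emb[h]]) for h, m in pos]
y = [1] * len(pos) + [0] * len(pos)

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict([np.concatenate([emb["game"], emb["platform"]])]))   # 1 = head-modifier order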

Syntactic Parsing based on HM

• Head-modifier information alone is incomplete:
  - prepositions and other function words are not covered
  - neither is the structure within a noun compound, e.g. "el capitan macbook pro"

bull Why not train a parser for web queries

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges: Short Texts Lack Grammatical Signals
• They lack function words and reliable word order
• "toys queries" has ambiguous intent
• "distance earth moon" has clear intent, but many equivalent forms: "earth moon distance", "earth distance moon", ...

Challenges Syntactic Parsing of Queries

• No standard
• No ground-truth

Why is syntactic parsing of queries even a legitimate problem

Derive Syntax from Semantics[Sun et al 2016]

• Query: "thai food houston"
• Take a clicked sentence for the query
• Project its dependency structure onto the query

A Treebank for Short Texts

• Given a query q
• Given q's clicked sentence s
• Parse each s
• Project the dependencies from s to q

bull Aggregate dependencies

Algorithm of Projection

Result Examples

Results

• Random queries: QueryParser UAS 0.83, LAS 0.75; Stanford UAS 0.72, LAS 0.64
• Queries with no function words: QueryParser UAS 0.82, LAS 0.73; Stanford UAS 0.70, LAS 0.61
• Queries with function words: QueryParser UAS 0.90, LAS 0.85; Stanford UAS 0.86, LAS 0.80

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

• Measures the similarity between two short texts or sentences
• Basic idea: word-by-word comparison using embedding vectors
• Uses a saliency-weighted semantic graph to compute the similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired: bins over all edges and bins over the maximum edges of the saliency-weighted semantic graph.
Similarity measurement (inspired by BM25), where s_l and s_s are the two short texts, w ranges over the terms of s_l, sem(w, s_s) is the semantic similarity of w to s_s, and avgl is the average short-text length:
  f_sts(s_l, s_s) = Σ_{w ∈ s_l} IDF(w) · sem(w, s_s) · (k_1 + 1) / ( sem(w, s_s) + k_1 · (1 - b + b · |s_s| / avgl) )
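A minimal sketch of the saliency-weighted formula f_sts reconstructed above, with a toy embedding table and uniform IDF values; sem(w, s) is taken here to be the maximum cosine similarity between w and any word of s, which is one reasonable reading of [Kenter and Rijke 2015].

import numpy as np

emb = {"cheap": [0.9, 0.1], "flights": [0.2, 0.95], "low": [0.85, 0.2],
       "cost": [0.8, 0.3], "airline": [0.25, 0.9], "tickets": [0.3, 0.9]}
idf = {w: 1.0 for w in emb}                    # uniform IDF for the toy example
k1, b, avgl = 1.2, 0.75, 3.0

def cos(u, v):
    u, v = np.asarray(u, float), np.asarray(v, float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def sem(w, s):
    """Best semantic match of word w inside short text s."""
    return max(cos(emb[w], emb[w2]) for w2 in s)

def f_sts(s_l, s_s):
    total = 0.0
    for w in s_l:
        m = sem(w, s_s)
        total += idf[w] * m * (k1 + 1) / (m + k1 * (1 - b + b * len(s_s) / avgl))
    return total

print(f_sts("cheap flights".split(), "low cost airline tickets".split()))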

From the Concept View

From the Concept View [Wang et al 2015a]

Co-occurrence Network

Bags of Concepts

Semantic Network

Short Text 1

Short Text 2

Concept Vector 1: [(c1, score1), (c2, score2), ...]
Concept Vector 2: [(c1', score1'), (c2', score2'), ...]

Similarity

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

• Explicit short text understanding benefits a lot of application scenarios:
• Ads/search semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

[Figure: ads keyword selection results by decile (Decile 4 through Decile 10), shown separately for mainline ads and sidebar ads.]

Definition Mining [Hao et al 2016]

bull Definition scenarios search engines QnA etc

• Why is conceptualization useful for definition mining? Example: "What is Emphysema?"
  Answer 1: "Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year."
  Answer 2: "Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing."
• These sentences have the form of a definition. Embedding is helpful to some extent, but it also returns a high similarity score for (emphysema, disease) and for (emphysema, smoking).
• Conceptualization can provide strong semantics; contextual embedding can also provide semantic similarity beyond is-A.

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

[Figure: the concept-based classification framework. Offline, training data such as <Music, score> drives concept weighting and model learning, producing a concept model for each class (Class 1 ... Class N). Online, an original short text such as "justin bieber graduates" goes through entity extraction, conceptualization against the knowledge base into a concept vector, candidate generation, and finally classification and ranking.]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

[Figure: for the category "TV", article titles and tags in this category are mapped into the concept space with weights p_i, p_j.]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

[Figure: each category (TV, Music, Movie, ...) is represented in the concept space by concept weights ω_i, ω_j.]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

[Figure: an incoming query is conceptualized into the same concept space (weights p_i, p_j) and matched against the category concept vectors (weights ω_i, ω_j).]

Precision performance on each category [Wang et al 2014a]

(Precision; columns: BocSTC, LM_ch, SVM, VSM_cosine, LM_d, Entity_ESA)
  Movie  0.71  0.91  0.84  0.81  0.72  0.56
  Money  0.97  0.95  0.54  0.57  0.52  0.74
  Music  0.97  0.90  0.88  0.73  0.68  0.58
  TV     0.96  0.46  0.92  0.56  0.51  0.55

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, Jamine Taylor. Freebase: a collaboratively created graph database for structuring human knowledge, in SIGMOD 2008

bull [ Auer et al 2007 ] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, Zachary G. Ives. DBpedia: A Nucleus for a Web of Open Data. In ISWC/ASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu, Ce Zhang, Christopher De Sa, Jaeho Shin, Feiran Wang, and C. Ré. Incremental Knowledge Base Construction Using DeepDive, in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase, Michael Strube, Benjamin Börschinger, Cäcilia Zirn, and Anas Elghafari. WikiNet: A very large scale multi-lingual concept network, in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng, and Xiaofang Zhou. "Understand Short Texts by Harvesting and Analyzing Semantic Knowledge", IEEE Transactions on Knowledge and Data Engineering (TKDE), 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang, Xue-Gang Hu, and Xindong Wu. A Large Probabilistic Semantic Network based Approach to Compute Term Similarity. In IEEE Transactions on Knowledge and Data Engineering (TKDE), 27(10): 2604-2617, 2015

bull [ Rosch et al 1976 ] Eleanor Rosch, Carolyn B. Mervis, Wayne D. Gray, David M. Johnson, and Penny Boyes-Braem. Basic objects in natural categories. Cognitive Psychology, 8(3): 382-439, 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke. Short text similarity with word embeddings. In CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang, Hongsong Li, Haixun Wang, and Kenny Zhu. Toward Topic Search on the Web. In International Conference on Conceptual Modeling, October 2012

Page 48: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Single Instance

bull Is this instance ambiguous

bull What are its basic-level concepts

bull What are its similar instances

What is the Semantic Similaritybull Are the following instance pairs similar

bull ltapple microsoftgt

bull ltapple peargt

bull ltapple fruitgt

bull ltapple foodgt

bull ltapple ipadgt

bull ltcar journeygt

Approaches on Term Similarity

bull Categories of approaches for semantic similaritybull String based approach

bull Knowledge based approachbull Use preexisting thesauri taxonomy or encyclopedia such as

WordNet

bull Corpus based approachbull Use contexts of terms extracted from web pages web search

snippets or other text repositories

bull Embedding based approachbull Will introduce in detail in ldquoPart 3 Implicit Understandingrdquo

79

Approaches on Term Similarity (2)

bull Categories

80

Knowledge based approaches

(WordNet)

Corpus based

approaches

Path lengthlexical

chain-based

Information

content-based

Graph learning

algorithm basedSnippet search based

Rada

1989

Resnik

1995

Jcn

1997

Lin

1998

Saacutench

2011

Agirre

2010Alvarez

2007

String based

approaches

HunTray

2005

Hirst

1998

Do

2009

Bol

2011Chen

2006

State-of-the-art approaches

Ban

2002

bull Framework

83

Term Similarity Using Semantic Networks [Li et al 2013 Li et al 2015]

Term pairs ltt1 t2gt

Type Checking

Concept Pairs Entity Pairs

Entity-distribution Context Collection

Concept-distribution Context Collection

Concept-Entity Pairs

Concept Collection for the Entity Term t1

Similarity EvaluationCosine(T(t1) T(t2))

for each pairltt2cxgt

Context vector T(t1) and T(t2)

Get maxsim(t2cx) for ltt1 t2gt

End

End

Concept Clustering

Cluster Context vector Cx(t1) and Cy(t2)

Similarity Evaluation

Max(xy) Cosine(Cx(t1) Cy(t2))

End

Concept Clustering

for each Cluster Ci(t1)

Select top k Concept namely cx

Step 1 Type Checking

Step 2 Context Representation(Vector)

Step 3 Context Similarity

An example [Li et al 2013 Li et al 2015]

For example

ltbanana peargt

88

ltbanana peargt

Entity PairsType Checking

Concept Context Collection

Similarity Evaluation Cosine(T(t1) T(t2)) 0916

Step 1 Type Checking

Step 2 Context Representation(Vector)

Step 3 Context Similarity

ExamplesTerm 1 Term 2 Similarity

lunch dinner 09987

tiger jaguar 09792

car plane 09711

television radio 09465

technology company microsoft 08208

high impact sport competitive sport 08155

employer large corporation 05353

fruit green pepper 02949

travel meal 00426

music lunch 00116

alcoholic beverage sports equipment 00314

company table tennis 00003

96httpadaptseieesjtueducnsimilaritySimCompleteResultspdf

Statistics of Search Queries

44

29

17

7

2 1

1 Term2 Terms3 Terms4 Terms5 Termsmore than 5 Terms

10

26

34

19

74

1 Term2 Terms3 Terms4 Terms5 Termsmore than 5 Terms

(a) By traffic

(b) By of distinct queries

Pokeacutemon Go Microsoft HoloLens

Instance 1 Instance 21 Instance2 Instances3 Instances4 Instances5 InstancesMore than 5 Instances

1 Instance2 Instances3 Instances4 Instances5 InstancesMore than 5 Instances

If the short text has context for the instancehellip

bull python tutorialbull dangerous pythonbull moon earth distancebull hellip

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Supervised Segmentation [Bergsma et al 2007]

bull Problem divide query into semantic units

bull Approach turn segmentation into position-based binary classification

Example Query

Two man power saw

[two man] [power saw][two] [man] [power saw][two] [man power] [saw]

Input a query and its positions

Output the decision for making segmentation at each position

Supervised Segmentation

bull Featuresbull Decision boundary features

bull Statistical features

bull Context features

bull Dependency features

eg Indicators thePOS tags in query isPosition features forwardbackward

Mutual information between left and right parts

Bank loan amortization schedule

Context information

bus driverfemale

depend

Supervised Segmentation

• Segmentation Overview

(Figure: input query "two man power saw" → learning features for each position → SVM classifier → output: segmentation decision for each position (yes/no).)
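A minimal sketch of this setup, assuming scikit-learn; the boundary features and the tiny labelled set below are illustrative stand-ins for the real feature templates and training data.

from sklearn.svm import LinearSVC

# Toy phrase statistics standing in for web-scale counts.
phrase_score = {"two man": 0.7, "man power": 0.3, "power saw": 0.9, "new york": 0.95}

def boundary_features(tokens, i):
    # Features for the potential break between tokens[i] and tokens[i+1].
    spanning = tokens[i] + " " + tokens[i + 1]
    n = len(tokens) - 1
    return [
        phrase_score.get(spanning, 0.0),                       # statistical feature of the spanning bigram
        i / n,                                                 # forward position feature
        (n - i) / n,                                           # backward position feature
        1.0 if "the" in (tokens[i], tokens[i + 1]) else 0.0,   # simple indicator feature
    ]

# Hypothetical labelled boundaries: (tokens, boundary index, 1 = insert a break here).
train = [
    (["two", "man", "power", "saw"], 0, 0),
    (["two", "man", "power", "saw"], 1, 1),
    (["two", "man", "power", "saw"], 2, 0),
    (["new", "york", "times"], 0, 0),
    (["new", "york", "times"], 1, 1),
]
X = [boundary_features(t, i) for t, i, _ in train]
y = [label for _, _, label in train]
clf = LinearSVC().fit(X, y)

query = ["two", "man", "power", "saw"]
# One yes/no (1/0) decision per boundary position.
print([int(clf.predict([boundary_features(query, i)])[0]) for i in range(len(query) - 1)])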

Unsupervised Segmentation [Tan et al 2008]

bull Unsupervised learning for query segmentation

Probability of generated segmentation S for query Q:

$P(S|Q) = P(s_1)\,P(s_2 \mid s_1) \cdots P(s_m \mid s_1 s_2 \cdots s_{m-1}) \approx \prod_{s_i \in S} P(s_i)$   (unigram model; the $s_i$ are segments)

A split between adjacent segments $s_k$ and $s_{k+1}$ is a valid segment boundary if and only if the pointwise mutual information between the two segments resulting from the split is negative:

$MI(s_k, s_{k+1}) = \log \frac{P_c([s_k\, s_{k+1}])}{P_c(s_k) \cdot P_c(s_{k+1})} < 0$

Example: "new york times subscription" — since $\log \frac{P_c([\text{new york}])}{P_c(\text{new}) \cdot P_c(\text{york})} > 0$, there is no segment boundary between "new" and "york".

Unsupervised Segmentation

bull Find top k segmentations dynamic programming

bull Using EM optimization on the fly

Input: query w1 w2 … wn, concept probability distribution

Output: top k segmentations with the highest likelihood

Words in a query
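A minimal sketch of the dynamic program under the unigram model, with a toy segment-probability table standing in for corpus/concept statistics.

import math

seg_prob = {
    "new york": 1e-4, "new york times": 5e-5, "times": 1e-3,
    "subscription": 1e-4, "new": 1e-3, "york": 1e-4, "times subscription": 1e-7,
}

def best_segmentation(words, max_len=5):
    n = len(words)
    # best[i]: (log-probability, segments) of the best segmentation of the first i words.
    best = [(0.0, [])] + [(-math.inf, None)] * n
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            seg = " ".join(words[j:i])
            p = seg_prob.get(seg, 1e-9)          # unseen segments get a small smoothing probability
            score = best[j][0] + math.log(p)
            if score > best[i][0]:
                best[i] = (score, best[j][1] + [seg])
    return best[n]

score, segments = best_segmentation("new york times subscription".split())
print(segments)  # ['new york times', 'subscription']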

Exploit Click-through [Li et al 2011]

bull Motivationbull Probabilistic query segmentation

bull Use click-through data

Output top-3 segmentation

[bank of america] [online banking]      0.502
[bank of america online banking]        0.428
[bank of] [america] [online banking]    0.001

Q -> URL -> D (query -> clicked URL -> document, from click data)

Input Query: bank of america online banking

Exploit Click-through

bull Segmentation Model

An interpolated model

global info + click-through info

[credit card] [bank of America]

1. bank of america credit cards contact us overview
2. secured visa credit card from bank of america
3. credit cards overview find the right bank of america credit card for you

Query

Clicked html documents

global info

Click-through info
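A minimal sketch of an interpolated segmentation score of this kind, combining a global model with click-through evidence; the two probability tables and the interpolation weight lambda are illustrative only.

import math

global_prob = {"bank of america": 1e-4, "online banking": 2e-4, "bank of": 1e-5, "america": 1e-3}
click_prob  = {"bank of america": 3e-3, "online banking": 1e-3, "bank of": 1e-7, "america": 1e-6}

def segmentation_score(segments, lam=0.6, floor=1e-9):
    # Interpolate the two sources per segment, then sum log-probabilities.
    score = 0.0
    for seg in segments:
        p = lam * global_prob.get(seg, floor) + (1 - lam) * click_prob.get(seg, floor)
        score += math.log(p)
    return score

candidates = [
    ["bank of america", "online banking"],
    ["bank of", "america", "online banking"],
]
ranked = sorted(candidates, key=segmentation_score, reverse=True)
print(ranked[0])  # ['bank of america', 'online banking']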

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Sense Changes with Different Context

watch harry potter read harry potter age harry potter

Movie Book Character

harry potter walkthrough

Game

Entity Recognition in Query [Guo et al 2009]

bull Motivation

Detect named entity in a short text and categorize it

harry potter walkthrough

Single-named-entity query

Example

triple <e, t, c> = ("harry potter", "walkthrough", "game")

e: the ambiguous term (named entity), t: the context term(s), c: the class of the entity

Entity Recognition in Query

bull Probabilistic Generative Model

Goal: Given a query q, find the triple <e, t, c> that maximizes the probability

Probability to generate triple

assume context only depends on class

Objective given query q find

The problem then becomes how to estimate Pr(e) Pr(c|e) and Pr(t|c)

E.g., "walkthrough" only depends on the class game instead of the entity harry potter
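A minimal sketch of scoring triples by Pr(e)·Pr(c|e)·Pr(t|c) with toy probability tables; in the real system these quantities are estimated (e.g., with WS-LDA) rather than hard-coded.

# Toy probability tables standing in for learned model parameters.
P_e = {"harry potter": 0.6, "harry": 0.4}
P_c_given_e = {"harry potter": {"movie": 0.5, "book": 0.3, "game": 0.2}}
P_t_given_c = {
    "movie": {"watch": 0.6, "walkthrough": 0.01},
    "book":  {"read": 0.7, "walkthrough": 0.01},
    "game":  {"walkthrough": 0.8, "watch": 0.05},
}

def interpret(query):
    best = (None, 0.0)
    for e in P_e:                                   # candidate named entities
        if e not in query:
            continue
        t = query.replace(e, "").strip()            # remaining words act as the context term
        for c, p_ce in P_c_given_e.get(e, {}).items():
            p = P_e[e] * p_ce * P_t_given_c[c].get(t, 1e-6)
            if p > best[1]:
                best = ((e, t, c), p)
    return best

print(interpret("harry potter walkthrough"))  # (('harry potter', 'walkthrough', 'game'), ...)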

Entity Recognition in Query

bull Probability Estimation by Learning

learning objective: $\max \prod_{i=1}^{N} P(e_i, t_i, c_i)$

Challenge: difficult as well as time-consuming to manually assign class labels to named entities in queries

Build training set $T = \{(e_i, t_i)\}$, view $c_i$ as a hidden variable

New learning problem: $\max \prod_{i=1}^{N} P(e_i, t_i) = \max \prod_{i=1}^{N} \sum_{c_i} P(e_i)\,P(c_i \mid e_i)\,P(t_i \mid c_i)$

solved with topic model WS-LDA

Signal from Click [Pantel et al 2012]

bull Motivation

Predict entity type in Web search

(Figure: a generative model connects entity, user intent, context, and click signals to predict the entity type; query type distribution over 73 types.)

Signal from Click

bull Joint Model for Prediction

(Figure: plate notation of the joint model over queries Q — per query: pick entity, pick type, pick intent, pick context words, pick click; model components: distribution over types, intent distribution, word distribution, host distribution, entity distribution.)

Telegraphic Query interpretation [Sawant et al 2013 Joshi et al 2014]

bull Entity-seeking Telegraphic Queries

bull Interpretation = Segmentation + Annotation

(Figure: a knowledge base contributes accuracy, a large corpus contributes recall; example query "Germany capital" → result entity Berlin.)

bull Overview

Joint Interpretation and Ranking [Sawant et al 2013 Joshi et al 2014]

(Figure: a telegraphic query and an annotated corpus feed two models for interpretation and ranking — a generative model and a discriminative model — which output candidate answer entities e1, e2, e3.)

bull Generative Model

Joint Interpretation and Ranking [Sawant et al 2013]

(Figure: generative interpretation of the query q = "losing team baseball world series 1998" — candidate answer entity E = San Diego Padres, a major league baseball team; type T hinted by "baseball team"; context matchers Z link selectors such as "lost 1998 world series" to corpus evidence like "Padres have been to two World Series, losing in 1984 and 1998". Borrowed from U. Sawant (2013).)

Based on Probabilistic Language Models

bull Discriminative Model

Joint Interpretation and Ranking [Sawant et al 2013]

(Figure: discriminative ranking of candidate interpretations of "losing team baseball world series 1998" — the correct answer entity San_Diego_Padres with target type t = baseball team versus the incorrect entity 1998_World_Series with t = series.)

Based on max-margin discriminative learning

bull Queries seek answer entities (e2)

bull Contain (query) entities (e1) target types (t2) relations (r) and selectors (s)

Telegraphic Query Interpretation [Joshi et al 2014]

query e1 r t2 s

dave navarro first band

dave navarro band band first

dave navarro - band first

spider automobile company

spider automobile company

automobile company

-

automobile company company spider

Borrow from M Joshi (2014)

Improved Generative Model

• Generative Model [Sawant et al 2013] → [Joshi et al 2014]: additionally consider e1 (in q) and r

Improved Discriminative Model

• Discriminative Model [Sawant et al 2013] → [Joshi et al 2014]: additionally consider e1 (in q) and r

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

bull Input a short text

bull Output semantic interpretation

bull Three steps in understanding a short text

Input: "wanna watch eagles band"
After Step 1 (segmentation): watch | eagles | band
After Step 2 (type detection): watch[verb] eagles[entity] band[concept]
After Step 3 (concept labeling): watch[verb] eagles[entity](band) band[concept]

Step 1: Text Segmentation – divide the short text into a sequence of terms in a vocabulary
Step 2: Type Detection – determine the best type of each term
Step 3: Concept Labeling – infer the best concept of each entity within its context

Text segmentation
• Observations:
  • Mutual Exclusion – terms containing the same word mutually exclude each other
  • Mutual Reinforcement – related terms mutually reinforce each other
• Build a Candidate Term Graph (CTG)

"vacation april in paris"    "watch harry potter"

(Figure: candidate term graphs for the two queries — nodes are candidate terms such as vacation, april in paris, april, paris and watch, harry potter; edges carry affinity weights, e.g. 0.029, 0.005, 0.047, 0.041 and 0.014, 0.092, 0.053, 0.018, with node scores such as 1/3 and 2/3.)

Find best segmentation

• Best segmentation = sub-graph in the CTG which:
  • Is a complete graph (clique)
  • Has no mutual exclusion
  • Has 100% word coverage (except for stopwords)
  • Has the largest average edge weight

(Figure: the same candidate term graphs, with one sub-graph marked "Is a segmentation" and another marked "Best segmentation".)

Find best segmentation

• Best segmentation = sub-graph in the CTG which:
  • Is a complete graph (clique)
  • Has no mutual exclusion
  • Has 100% word coverage (except for stopwords)
  • Has the largest average edge weight

Maximal Clique

(Figure: the same candidate term graphs, with the maximal clique that forms the best segmentation highlighted for each query.)
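A minimal sketch of the search for the best segmentation under these criteria, with illustrative edge weights rather than the real affinity scores; for simplicity, missing edges count as zero weight instead of enforcing a strict clique.

from itertools import combinations

query_words = {"vacation", "april", "in", "paris"}
stopwords = {"in"}
candidate_terms = {                      # candidate term -> words it covers
    "vacation": {"vacation"},
    "april in paris": {"april", "in", "paris"},
    "april": {"april"},
    "paris": {"paris"},
}
edge_weight = {                          # illustrative affinity weights between candidate terms
    frozenset(["vacation", "april"]): 0.047,
    frozenset(["vacation", "paris"]): 0.041,
    frozenset(["april", "paris"]): 0.029,
    frozenset(["vacation", "april in paris"]): 0.005,
}

def avg_edge_weight(terms):
    pairs = [frozenset(p) for p in combinations(terms, 2)]
    return sum(edge_weight.get(p, 0.0) for p in pairs) / len(pairs) if pairs else 0.0

best, best_score = None, -1.0
for r in range(1, len(candidate_terms) + 1):
    for terms in combinations(candidate_terms, r):
        covered_sets = [candidate_terms[t] for t in terms]
        covered = set().union(*covered_sets)
        no_overlap = sum(len(s) for s in covered_sets) == len(covered)   # mutual exclusion
        full_cover = (query_words - stopwords) <= covered                # 100% word coverage (minus stopwords)
        if no_overlap and full_cover and avg_edge_weight(terms) > best_score:
            best, best_score = terms, avg_edge_weight(terms)

print(best)  # ('vacation', 'april', 'paris') — with these weights the context favors splitting "april in paris"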

Type Detection

• Pairwise Model
  • Find the best typed-term for each term so that the Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight

(Figure: for the query "watch free movie", candidate typed-terms watch[v] / watch[e] / watch[c], free[adj] / free[v], movie[c] / movie[e].)

Concept Labeling

• Entity disambiguation is the most important task of concept labeling
  • Filter/re-rank the original concept cluster vector
• Weighted-Vote
  • The final score of each concept cluster is a combination of its original score and the support from context, using concept co-occurrence

watch harry potter read harry potter

movie book
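A minimal sketch of the weighted vote, with illustrative scores and co-occurrence strengths: the concept "movie" wins for "watch harry potter" because the verb context supports it.

# Toy numbers standing in for real conceptualization scores and co-occurrence statistics.
concept_scores = {"movie": 0.45, "book": 0.40, "character": 0.15}   # candidate concepts of "harry potter"
context_concepts = {"verb:watch": 1.0}                               # signal from the rest of the short text
cooccur = {("verb:watch", "movie"): 0.8, ("verb:watch", "book"): 0.1, ("verb:watch", "character"): 0.05}

def weighted_vote(concepts, context, alpha=0.5):
    rescored = {}
    for c, s in concepts.items():
        support = sum(w * cooccur.get((ctx, c), 0.0) for ctx, w in context.items())
        rescored[c] = alpha * s + (1 - alpha) * support   # combine original score and context support
    return sorted(rescored.items(), key=lambda kv: kv[1], reverse=True)

print(weighted_vote(concept_scores, context_concepts))  # 'movie' now dominates for "watch harry potter"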

Example of Entity Disambiguation[Hua et al 2015 (ICDE Best Paper) Hua et al 2016]

Co-occurrence network

Concept Vector

Semantic network

Short Text

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

(Figure: conceptualization of "ipad apple" — each term gets a concept vector (c1: p1, c2: p2, c3: p3, …); "apple" alone suggests fruit, company, food, product, …, while is-A filtering and co-occurrence with "ipad" (product, device, …) promote company, brand, product, and device.)

Mining Lexical Relationships[Wang et al 2015b]

bull Lexical knowledge represented by the probabilities

(Figure: for "watch harry potter", "watch" links to the role verb and "harry potter" to concepts such as movie, book, product.)

p(verb | watch), p(instance | watch), p(movie | harry potter), p(book | harry potter), p(movie | watch, verb)

① p(z | t)   ② p(c | t, z)   ③ p(c | e) = p(c | t, z = instance)

where e: instance, t: term, c: concept, z: role

Understanding Queries [Wang et al 2015b]

• Goal: to rank the concepts and find $\arg\max_c p(c \mid t, q)$

The offline semantic network

QueryAll possible

segmentations

Random walk with restart [Sun et al 2005] on the online subgraph
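A minimal sketch of random walk with restart on a small online subgraph, with an illustrative adjacency matrix in place of the real semantic network; concept nodes are ranked by their RWR scores.

import numpy as np

nodes = ["watch", "harry potter", "movie", "book", "game"]
A = np.array([   # symmetric adjacency weights of the toy subgraph
    [0,   1, 3,   0.2, 0.5],
    [1,   0, 2,   2,   1],
    [3,   2, 0,   0,   0],
    [0.2, 2, 0,   0,   0],
    [0.5, 1, 0,   0,   0],
], dtype=float)
W = A / A.sum(axis=0, keepdims=True)      # column-normalised transition matrix

restart = np.array([0.5, 0.5, 0, 0, 0])   # restart mass on the query terms
r = restart.copy()
c = 0.15                                  # restart probability
for _ in range(100):
    r = (1 - c) * W @ r + c * restart     # power iteration for RWR

ranking = sorted(zip(nodes, r), key=lambda kv: kv[1], reverse=True)
print(ranking)  # concept nodes ranked by RWR score; "movie" outranks "book" and "game" here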

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

• Example: "popular smart cover iphone 5s"
• Definition:
  • Head: acts to name the general (semantic) category to which the whole short text belongs; usually the head is the intent of the short text
    • "smart cover": intent of the query
  • Constraints: distinguish this member from other members of the same category
    • "iphone 5s": limits the type of the head
  • Non-Constraint Modifiers (aka Pure Modifiers) are subjective modifiers which can be dropped without changing intent
    • "popular": subjective, can be neglected

Non-Constraint Modifiers Mining Construct Modifier Networks

Edges form a Modifier Network

(Figure: concept hierarchy tree in the "Country" domain — Country → Asian country, Developed country, Western country, with modifiers Large and Top attached — and the corresponding modifier network over {Asian, Developed, Western, Large, Top} derived from phrases such as "Large Asian country", "Western developed country", "Top western country", "Large developed country", "Top developed country". In this case "Large" and "Top" are pure modifiers.)

• Betweenness centrality is a measure of a node's centrality in a network
• Betweenness of node v is defined as
  $g(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$
  • where $\sigma_{st}$ is the total number of shortest paths from node s to node t and $\sigma_{st}(v)$ is the number of those paths that pass through v
• Normalization & Aggregation
  • A pure modifier should have a low betweenness-centrality aggregation score PMS(t)

Non-Constraint Modifiers Mining Betweenness centrality
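A minimal sketch of the betweenness-centrality signal, assuming the networkx library; the toy modifier network below loosely mimics the "Country" example, and nodes with low centrality (here "Large" and "Top") behave like pure modifiers.

import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("Asian", "Western"), ("Asian", "Developed"), ("Western", "Developed"),
    ("Large", "Asian"), ("Large", "Developed"), ("Large", "Western"),
    ("Top", "Western"), ("Top", "Developed"),
])

centrality = nx.betweenness_centrality(G, normalized=True)
# Low betweenness suggests a pure (non-constraint) modifier such as "Large" or "Top".
for node, score in sorted(centrality.items(), key=lambda kv: kv[1]):
    print(node, round(score, 3))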

Head-Constraints Mining [Wang et al 2014b]

bull A term can be a head sometimes and be a constraint in some other cases

bull Eg Seattle hotel Seattle hotel job

head headconstraintconstraintconstraint

Head-Constraints Mining Acquiring Concept Patterns

(Figure: pipeline for building the concept pattern dictionary from query logs — get entity pairs from the query log by extracting preposition patterns such as "A for B", "A of B", "A with B", "A in B", "A on B", "A at B", … from queries like "cover for iphone 6s", "battery for sony a7r", "wicked on broadway"; entity 1 is the head and entity 2 the constraint; conceptualizing each entity (concept11, concept12, concept13, concept14 / concept21, concept22, concept23) yields concept pattern pairs such as (concept11, concept21), (concept11, concept22), (concept11, concept23), …, which are aggregated per preposition into the Concept Pattern Dictionary.)

Why Concepts Can't Be Too General
• It may cause too many concept pattern conflicts: can't distinguish head and modifier for general concept pairs

Head Modifier

Derived Concept Pattern device company

Supporting Entity Pairs iphone 4 verizon

modem comcast

wireless router comcast

iphone 4 tmobile

Head Modifier

Derived Concept Pattern company device

Supporting Entity Pairs amazon books kindle

netflix touchpad

skype windows phone

netflix ps3

Conflict

Why Concepts Can't Be Too Specific
• It may generate concepts with little coverage
  • Concept regresses to entity
  • Large storage space: up to (million × million) patterns

hellip hellip

device largest desktop OS vendor

device largest software development company

device largest global corporation

device latest windows and office provider

hellip hellip

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept Patterns:

Cluster size | Sum of cluster score | head/constraint | score
615 | 2114691 | breed/state | 357298460224501
296 | 7752357 | game/platform | 627403476771856
153 | 3466804 | accessory/vehicle | 53393705094809
70 | 118259 | browser/platform | 132612807637391
22 | 1010993 | requirement/school | 271407526294823
34 | 9489159 | drug/disease | 154602405333541
42 | 8992995 | cosmetic/skin condition | 814659415003929
16 | 7421599 | job/city | 27903732555528
32 | 710403 | accessory/phone | 246513830851194
18 | 6692376 | software/platform | 210126322725878
20 | 6444603 | test/disease | 239774028397537
27 | 5994205 | clothes/breed | 98773996282851
19 | 5913545 | penalty/crime | 200544192793488
25 | 5848804 | tax/state | 240081818612579
16 | 5465424 | sauce/meat | 183592863621553
18 | 4809389 | credit card/country | 142919087972152
14 | 4730792 | food/holiday | 14554140330924
11 | 4536199 | mod/game | 257163856882439
29 | 4350954 | garment/sport | 471533326845442
23 | 3994886 | career information/professional | 732726483731257
15 | 386065 | song/instrument | 128189481818135
18 | 378213 | bait/fish | 780426514113169
22 | 3722948 | study guide/book | 508339765053921
19 | 3408953 | plugins/browser | 550326072627126
14 | 3305753 | recipe/meat | 882779863422951
18 | 3214226 | currency/country | 110825444188352
13 | 3180272 | lens/camera | 186081673263957
9 | 316973 | decoration/holiday | 130055844126533
16 | 314875 | food/animal | 7338544366514

game platform

game device

video game platform

game console game pad

game gaming platform

Game (Head) Platform (Modifier)

angry birds android

angry birds ios

angry birds windows 10

hellip hellip

Head Modifier Relationship Detection
• Train a classifier on (head-embedding, modifier-embedding)
• Training data:
  • Positive: (head, modifier)
  • Negative: (modifier, head)
• Precision >= 0.9, Recall >= 0.9
• Disadvantage: not interpretable
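A minimal sketch of such a classifier, with random toy embeddings and a handful of hypothetical (head, modifier) pairs in place of the real query-log training data.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
vocab = ["game", "platform", "accessory", "vehicle", "recipe", "meat"]
emb = {w: rng.normal(size=8) for w in vocab}          # toy word embeddings

positive = [("game", "platform"), ("accessory", "vehicle"), ("recipe", "meat")]   # (head, modifier)
negative = [(m, h) for h, m in positive]                                          # reversed order

# Concatenate the two embeddings as the feature vector.
X = [np.concatenate([emb[a], emb[b]]) for a, b in positive + negative]
y = [1] * len(positive) + [0] * len(negative)
clf = LogisticRegression().fit(X, y)

test = np.concatenate([emb["game"], emb["platform"]])
print(clf.predict([test])[0])  # 1 = the first word is predicted to be the head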

Syntactic Parsing based on HM

• Information is incomplete
  • Prepositions and other function words
  • Within a noun compound: el capitan macbook pro

bull Why not train a parser for web queries

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges: Short Texts Lack Grammatical Signals
• Lack function words and word order
  • "toys queries" has ambiguous intent
  • "distance earth moon" has clear intent
• Many equivalent forms: "earth moon distance", "earth distance moon", …

Challenges Syntactic Parsing of Queries

• No standard
• No ground-truth

Why is syntactic parsing of queries even a legitimate problem

Derive Syntax from Semantics[Sun et al 2016]

• Query: "thai food houston"

bull Clicked sentence

bull Project dependency to the query

A Treebank for Short Texts

• Given query q
• Given q's clicked sentence s
• Parse each s
• Project dependency from s to q
• Aggregate dependencies

Algorithm of Projection
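A minimal sketch of the projection step, with a hand-written dependency parse of a clicked sentence standing in for real parser output: only arcs whose head and dependent both occur among the query words are kept.

clicked_sentence = [
    # (dependent, head, relation) arcs for "Houston has great Thai food downtown"
    ("Houston", "has", "nsubj"),
    ("food", "has", "dobj"),
    ("Thai", "food", "amod"),
    ("great", "food", "amod"),
    ("downtown", "has", "advmod"),
]

query = "thai food houston"
query_words = set(query.lower().split())

projected = [
    (dep.lower(), head.lower(), rel)
    for dep, head, rel in clicked_sentence
    if dep.lower() in query_words and head.lower() in query_words
]
print(projected)  # [('thai', 'food', 'amod')] — aggregated over many clicks to build the treebank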

Result Examples

Results
• Random queries
  QueryParser: UAS 0.83, LAS 0.75;  Stanford: UAS 0.72, LAS 0.64
• Queries with no function words
  QueryParser: UAS 0.82, LAS 0.73;  Stanford: UAS 0.70, LAS 0.61
• Queries with function words
  QueryParser: UAS 0.90, LAS 0.85;  Stanford: UAS 0.86, LAS 0.80

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

bull Measuring similarity between two short texts and sentences

bull Basic idea word-by-word comparison using embedding vector

• Use a saliency-weighted semantic graph to compute similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired

Bins of all edges Bins of max edges

Similarity measurement between the longer short text $s_l$ and the shorter short text $s_s$ (inspired by BM25), where $sem(w, s_s)$ is the semantic similarity of term w to short text $s_s$:

$f_{sts}(s_l, s_s) = \sum_{w \in s_l} IDF(w) \cdot \frac{sem(w, s_s) \cdot (k_1 + 1)}{sem(w, s_s) + k_1 \cdot \left(1 - b + b \cdot \frac{|s_s|}{avgsl}\right)}$
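A minimal sketch of this saliency-weighted similarity, assuming toy two-dimensional word vectors and uniform IDF values; here sem(w, s) is taken as the best cosine between w and any word of s, one simple choice rather than the paper's exact definition.

import math

vec = {"thai": [1, 0], "food": [0.8, 0.2], "houston": [0, 1], "restaurant": [0.7, 0.3], "texas": [0.1, 0.9]}
idf = {w: 1.0 for w in vec}

def cos(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def sem(w, text):
    # Semantic match of term w against the short text: best cosine over its words.
    return max(cos(vec[w], vec[x]) for x in text)

def f_sts(s_l, s_s, k1=1.2, b=0.75, avgsl=3.0):
    score = 0.0
    for w in s_l:
        s = sem(w, s_s)
        score += idf[w] * s * (k1 + 1) / (s + k1 * (1 - b + b * len(s_s) / avgsl))
    return score

s1 = "thai food houston".split()
s2 = "thai restaurant texas".split()
print(f_sts(s1, s2))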

From the Concept View

From the Concept View [Wang et al 2015a]

Co-occurrence Network

Bags of Concepts

Semantic Network

Short Text 1

Short Text 2

Concept Vector 1[(c1 score1) (c2 score2)hellip]

Concept Vector 2[(c1rsquo score1rsquo) (c2rsquo score2rsquo)hellip]

Similarity

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

• Explicit short text understanding benefits a lot of application scenarios:
  • Ads/search semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

(Figure: two bar charts of ads keyword selection results by decile, Decile 4 through Decile 10 — Mainline Ads (y-axis roughly 0.00–6.00) and Sidebar Ads (y-axis roughly 0.00–0.60).)

Definition Mining [Hao et al 2016]

• Definition scenarios: search engines, QnA, etc.
• Why conceptualization is useful for definition mining
  • Example: "What is Emphysema?"

Answer 1: Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year.
Answer 2: Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing.

• This sentence has the form of a definition
• Embedding is helpful to some extent, but it also returns a high similarity score for (emphysema, disease) and (emphysema, smoking)
• Conceptualization can provide strong semantics
  • Contextual embedding can also provide semantic similarity beyond Is-A

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

(Figure: offline/online pipeline — Offline: training data (e.g., <Music, Score>), concept weighting, and model learning produce one concept model per class (Model 1 … Model i … Model N for Class 1 … Class i … Class N). Online: an original short text such as "justin bieber graduates" goes through entity extraction and conceptualization against the knowledge base, candidate generation builds a concept vector, and classification & ranking applies the learned models.)

Concept based Short Text Classification and Ranking [Wang et al 2014a]

(Figure: a category such as TV is represented in the concept space using the article titles/tags in this category, with concept weights p_i, p_j.)

Concept based Short Text Classification and Ranking [Wang et al 2014a]

(Figure: each category — TV, Music, Movie, … — is represented as a vector in the concept space with weights ω_i, ω_j.)

Concept based Short Text Classification and Ranking [Wang et al 2014a]

(Figure: a query is conceptualized into the same concept space (weights p_i, p_j) and matched against the category concept vectors (weights ω_i, ω_j) for classification and ranking.)

Precision performance on each category [Wang et al 2014a]

Category | BocSTC | LM_ch | SVM | VSM_cosine | LM_d | Entity_ESA
Movie | 0.71 | 0.91 | 0.84 | 0.81 | 0.72 | 0.56
Money | 0.97 | 0.95 | 0.54 | 0.57 | 0.52 | 0.74
Music | 0.97 | 0.90 | 0.88 | 0.73 | 0.68 | 0.58
TV | 0.96 | 0.46 | 0.92 | 0.56 | 0.51 | 0.55

(y-axis: Precision, 0.3–1.0)

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamie Taylor Freebase a collaboratively created graph database for structuring human knowledge in SIGMOD 2008

bull [ Auer et al 2007 ] Sören Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWC/ASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Ré Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Börschinger Cäcilia Zirn and Anas Elghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Catherine Havasi ConceptNet 5 A large semantic network for relational knowledge The People's Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou "Understand Short Texts by Harvesting and Analyzing Semantic Knowledge" IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and Xindong Wu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10): 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny Boyes-Braem Basic objects in natural categories Cognitive psychology 8(3): 382–439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddings In CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny Zhu Toward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 49: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

What is the Semantic Similaritybull Are the following instance pairs similar

bull ltapple microsoftgt

bull ltapple peargt

bull ltapple fruitgt

bull ltapple foodgt

bull ltapple ipadgt

bull ltcar journeygt

Approaches on Term Similarity

bull Categories of approaches for semantic similaritybull String based approach

bull Knowledge based approachbull Use preexisting thesauri taxonomy or encyclopedia such as

WordNet

bull Corpus based approachbull Use contexts of terms extracted from web pages web search

snippets or other text repositories

bull Embedding based approachbull Will introduce in detail in ldquoPart 3 Implicit Understandingrdquo

79

Approaches on Term Similarity (2)

bull Categories

80

Knowledge based approaches

(WordNet)

Corpus based

approaches

Path lengthlexical

chain-based

Information

content-based

Graph learning

algorithm basedSnippet search based

Rada

1989

Resnik

1995

Jcn

1997

Lin

1998

Saacutench

2011

Agirre

2010Alvarez

2007

String based

approaches

HunTray

2005

Hirst

1998

Do

2009

Bol

2011Chen

2006

State-of-the-art approaches

Ban

2002

bull Framework

83

Term Similarity Using Semantic Networks [Li et al 2013 Li et al 2015]

Term pairs ltt1 t2gt

Type Checking

Concept Pairs Entity Pairs

Entity-distribution Context Collection

Concept-distribution Context Collection

Concept-Entity Pairs

Concept Collection for the Entity Term t1

Similarity EvaluationCosine(T(t1) T(t2))

for each pairltt2cxgt

Context vector T(t1) and T(t2)

Get maxsim(t2cx) for ltt1 t2gt

End

End

Concept Clustering

Cluster Context vector Cx(t1) and Cy(t2)

Similarity Evaluation

Max(xy) Cosine(Cx(t1) Cy(t2))

End

Concept Clustering

for each Cluster Ci(t1)

Select top k Concept namely cx

Step 1 Type Checking

Step 2 Context Representation(Vector)

Step 3 Context Similarity

An example [Li et al 2013 Li et al 2015]

For example

ltbanana peargt

88

ltbanana peargt

Entity PairsType Checking

Concept Context Collection

Similarity Evaluation Cosine(T(t1) T(t2)) 0916

Step 1 Type Checking

Step 2 Context Representation(Vector)

Step 3 Context Similarity

ExamplesTerm 1 Term 2 Similarity

lunch dinner 09987

tiger jaguar 09792

car plane 09711

television radio 09465

technology company microsoft 08208

high impact sport competitive sport 08155

employer large corporation 05353

fruit green pepper 02949

travel meal 00426

music lunch 00116

alcoholic beverage sports equipment 00314

company table tennis 00003

96httpadaptseieesjtueducnsimilaritySimCompleteResultspdf

Statistics of Search Queries

44

29

17

7

2 1

1 Term2 Terms3 Terms4 Terms5 Termsmore than 5 Terms

10

26

34

19

74

1 Term2 Terms3 Terms4 Terms5 Termsmore than 5 Terms

(a) By traffic

(b) By of distinct queries

Pokeacutemon Go Microsoft HoloLens

Instance 1 Instance 21 Instance2 Instances3 Instances4 Instances5 InstancesMore than 5 Instances

1 Instance2 Instances3 Instances4 Instances5 InstancesMore than 5 Instances

If the short text has context for the instancehellip

bull python tutorialbull dangerous pythonbull moon earth distancebull hellip

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Supervised Segmentation [Bergsma et al 2007]

bull Problem divide query into semantic units

bull Approach turn segmentation into position-based binary classification

Example Query

Two man power saw

[two man] [power saw][two] [man] [power saw][two] [man power] [saw]

Input a query and its positions

Output the decision for making segmentation at each position

Supervised Segmentation

bull Featuresbull Decision boundary features

bull Statistical features

bull Context features

bull Dependency features

eg Indicators thePOS tags in query isPosition features forwardbackward

Mutual information between left and right parts

Bank loan amortization schedule

Context information

bus driverfemale

depend

Supervised Segmentation

bull Segmentation Overview

saw

SVMclassifier

Input query two man power saw

two man power

Output segmentation decision for each position (yesno)

learning features

Unsupervised Segmentation [Tan et al 2008]

bull Unsupervised learning for query segmentation

Probability of generated segmentation S for query Q

119875 119878119876 = 119875 1199041 P 1199042|1199041 hellipP 119904119898 11990411199042hellip119904119898minus1

asympෑ

119904119894isin119878

119875(119904119894)Unigram model

segments

Valid segment boundary if and only if the pointwise mutual information between the two segments resulting from the split is negative

new york times subscription

1199041 1199042

119872119868 119904119896 119904119896+1 = log119875119888([119904119896 119904119896+1])

119875119888 119904119896 ∙ 119875119888 (119904119896+1)lt 0

Example log119875119888([119899119890119908 119910119900119903119896])

119875119888( 119899119890119908) ∙ 119875119888 (119910119900119903119896)gt 0

no segment boundary here

Unsupervised Segmentation

bull Find top k segmentations dynamic programming

bull Using EM optimization on the fly

Input query 11990811199082hellip119908119899 concept probability distribution

Output top k segmentations with highest likehood

Words in a query

Exploit Click-through [Li et al 2011]

bull Motivationbull Probabilistic query segmentation

bull Use click-through data

Output top-3 segmentation

[bank of america] [online banking] 0502

bank of america online banking] 0428

[bank of ] [ america] [online banking] 0001

Q -gt URL -gt D query

document

click data

Input Query bank of america online banking

Exploit Click-through

bull Segmentation Model

An interpolated model

global info Click-throughinfo

[credit card] [bank of America]

1 bank of america credit cards contact us overview2 secured visa credit card from bank of america3 credit cards overview find the right bank of america credit card for you

Query

Clicked html documents

global info

Click-through info

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Sense Changes with Different Context

watch harry potter read harry potter age harry potter

Movie Book Character

harry potter walkthrough

Game

Entity Recognition in Query [Guo et al 2009]

bull Motivation

Detect named entity in a short text and categorize it

harry potter walkthrough

Single-named-entity query

Example

(ldquoharry potterrdquo ldquo walkthroughrdquo ldquogamerdquo)

triple lte t cgt

class of entity

context terms

ambiguous term

contextterm class

Entity Recognition in Query

bull Probabilistic Generative Model

Goal Given a query q find triple lte t cgt maximize the probability

Probability to generate triple

assume context only depends on class

Objective given query q find

The problem then becomes how to estimate Pr(e) Pr(c|e) and Pr(t|c)

Eg ldquowalkthroughrdquo only depends on game instead of happy potter

Entity Recognition in Query

bull Probability Estimation by Learning

learning objective

N

1i

iii )ctP(emax

Challenge difficult as well as time consuming to manually assign class labels to named entities in queries

Build training set 119879 = (119890119894 119905119894) view 119888119894 as a hidden variable

New Learning problem

N

1i

ii

N

1i

i

N

1i

ii c)|)P(te|P(c)P(emax)tP(emax c

solved with topic model WS-LDA

Signal from Click [Pantel et al 2012]

bull Motivation

Predict entity type in Web search

entity

user intent

context

click

Query type distribution (73 types)

Generative model

entity type

T

TK

K2

Signal from Click

bull Joint Model for Prediction

t

τ

i

n c

θ

φ

ω

Q

Distribution over types

Intent distribution

Pick type

Pick entity

Pick intent

Pick click

Word distribution

Host distribution

Entity distribution

For each Query

Pick context words

Telegraphic Query interpretation [Sawant et al 2013 Joshi et al 2014]

bull Entity-seeking Telegraphic Queries

bull Interpretation = Segmentation + Annotation

Knowledge base Large corpus

accuracy recall

Germany capital

Berlin

Query

Result Entity

bull Overview

Joint Interpretation and Ranking [Sawant et al 2013 Joshi et al 2014]

Annotated Corpus

Telegraphic Query

e1e2e3

Two Models for Interpretation and Ranking

Generative Model

Discriminative Model

Output

bull Generative Model

Joint Interpretation and Ranking [Sawant et al 2013]

San Diego Padres

Major league baseball team

type context

E

T Padres have been to two World

Series losing in 1984 and 1998

Type hint

baseball team

losing team baseball world series 1998

Z

Context matchers

lost 1998 world seriesswitch

model model

q losing team baseball world series 1998

Borrow from U Sawant (2013)

Based on Probabilistic Language Models

bull Discriminative Model

Joint Interpretation and Ranking [Sawant et al 2013]

San_Diego_Padres

losing team baseball world

series 1998

(baseball team)

losing team baseball world

series 1998

(baseball team)

losing team baseball world

series 1998

(t = baseball team)

1998_World_Series

losing team baseball world

series 1998

(series)

losing team baseball world

series 1998

(series)

losing team baseball world

series 1998

(t = series)

Correct entity Incorrect entity

Based on max-margin discriminative learning

bull Queries seek answer entities (e2)

bull Contain (query) entities (e1) target types (t2) relations (r) and selectors (s)

Telegraphic Query Interpretation [Joshi et al 2014]

query e1 r t2 s

dave navarro first band

dave navarro band band first

dave navarro - band first

spider automobile company

spider automobile company

automobile company

-

automobile company company spider

Borrow from M Joshi (2014)

Improved Generative Model

bull Generative Model[Sawant et al 2013]

[Joshi et al 2014]Consider e1

(in q) and r

Improved Discriminative Model

bull Discriminative Model[Sawant et al 2013]

[Joshi et al 2014]

Consider e1

(in q) and r

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

bull Input a short text

bull Output semantic interpretation

bull Three steps in understanding a short text

wanna watch eagles band

watch[verb] eagles[entity](band) band[concept]

wanna watch eagles band watch[verb] eagles[entity](band) band[concept]

watch eagles band watch[verb] eagles[entity] band[concept]

Step 1 Text Segmentation ndash divide into a sequence of terms in vocabulary

Step 2 Type detection ndash determine the best type of each term

Step 3 Concept Labeling ndash infer the best concept of each entity within context

Text segmentationbull Observations

bull Mutual Exclusion ndash terms containing the same word mutually exclude each other

bull Mutual Reinforcement ndash related terms mutually reinforce each other

bull Build a Candidate Term Graph (CTG)

ldquovacation april in parisrdquo ldquowatch harry potterrdquo

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Find best segmentation

bull Best segmentation= sub-graph in CTG whichbull Is a complete graph (clique)

bull No mutual exclusion

bull Has 100 word coveragebull Except for stopwords

bull Has the largest average edge weight

Is a segmentation

Best segmentation

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Find best segmentation

bull Best segmentation= sub-graph in CTG whichbull Is a complete graph (clique)

bull No mutual exclusion

bull Has 100 word coveragebull Except for stopwords

bull Has the largest average edge weight

Maximal Clique

Best segmentation

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Type Detection

bull Pairwise Modelbull Find the best typed-term for each term so that the

Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight

watch[v]

watch[e]

watch[c]

watch

free[adj]

free[v]

movie[c]

movie[e]

free

movie

Concept Labeling

bull Entity disambiguation is the most important task of concept labelingbull Filterre-rank of the original concept cluster vector

bull Weighted-Votebull The final score of each concept cluster is a combination

of its original score and the support from context using concept co-occurrence

watch harry potter read harry potter

movie book

Example of Entity Disambiguation[Hua et al 2015 (ICDE Best Paper) Hua et al 2016]

Co-occurrence network

Concept Vector

Semantic network

Short Text

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

c1 p1

c2 p2

c3 p3

hellip

ipad apple

fruithellip

companyhellip

foodhellip

producthellip

Is-A

filtering

ldquoipad applerdquo

producthellip

devicehellip

producthellip

brandhellip

companyhellip

devicehellip

co-occur

Is-A

Mining Lexical Relationships[Wang et al 2015b]

bull Lexical knowledge represented by the probabilities

verb

product book

movie

watch harry potter

119901 119907119890119903119887 119908119886119905119888ℎ

119901 119894119899119904119905119886119899119888119890 119908119886119905119888ℎ

119901 119898119900119907119894119890 ℎ119886119903119903119910 119901119900119905119905119890119903

119901 119898119900119907119894119890 119908119886119905119888ℎ 119907119890119903119887

119901 119887119900119900119896 ℎ119886119903119903119910 119901119900119905119905119890119903

119901 119888 119905 119911

119901 119888 119890 =119901 119888 119905 119911 = 119894119899119904119905119886119899119888119890

119901 119911 119905 ①②

③e instancet termc conceptz role

Understanding Queries [Wang et al 2015b]

bull Goal to rank the concepts and findarg max

119888119901(119888|119905 119902)

The offline semantic network

QueryAll possible

segmentations

Random walk with restart [Sun et al 2005]on the online subgraph

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

bull Example ldquopopular smart cover iphone 5srdquo

bull Definition bull Head acts to name the general (semantic) category to which the

whole short text belongs Usually the head is the intent of the short text

bull ldquosmart coverrdquo intent of the query

bull Constraints distinguish this member from other members of the same category

bull ldquoiphone 5srdquo limit the type of the head

bull Non-Constraint Modifiers (aka Pure Modifiers) are subjectivemodifiers which can be dropped without changing intent

bull ldquopopularrdquo subjective can be neglected

Non-Constraint Modifiers Mining Construct Modifier Networks

Edges form a Modifier Network

Concept Hierarchy Tree in ldquoCountryrdquo domain

Modifier Network in ldquoCountryrdquo domainIn this case ldquoLargerdquo and ldquoToprdquo are pure modifiers

Country

Asian country

Developed country

Western country

Asian Developed Western

Western developed

country

Top western country

Large

Large Top

Top

WesternLarge Asian

country

Large developed

country

Top developed

country

Country

Asian Western

Developed

Large Top

bull Betweenness centrality is a measure of a nodes centrality in a network

bull Betweennes of node v is defined as

bull where 120590119904119905 is the total number of shortest paths from node s to node t and 120590119904119905(119907) is the number of those paths that pass through v

bull Normalization amp Aggregation

bull For a pure modifier it should have low betweenness centrality aggregation score PMS(t)

Non-Constraint Modifiers Mining Betweenness centrality

Head-Constraints Mining [Wang et al 2014b]

bull A term can be a head sometimes and be a constraint in some other cases

bull Eg Seattle hotel Seattle hotel job

head headconstraintconstraintconstraint

Head-Constraints Mining Acquiring Concept Patterns

Get entity pairs from query log

Conceptualization

Concept Patterns for each

prepositionsentity1 entity2

Extract Patterns

A for B A of BA with B A in BA on B A at B hellip

entity 1head entity 2constraint

concept11

concept12

concept13

concept14

concept21

concept22

concept23

(concept11 concept21) (concept11 concept22)(concept11 concept23)hellip

Concept Pattern Dictionary

Building concept pattern dictionary Query Logs

cover for iphone 6sbattery for sony a7rwicked on broadway

Why Concepts Canrsquot Be Too Generalbull It may cause too many concept pattern conflicts

canrsquot distinguish head and modifier for general concept pairs

Head Modifier

Derived Concept Pattern device company

Supporting Entity Pairs iphone 4 verizon

modem comcast

wireless router comcast

iphone 4 tmobile

Head Modifier

Derived Concept Pattern company device

Supporting Entity Pairs amazon books kindle

netflix touchpad

skype windows phone

netflix ps3

Conflict

Why Concepts Canrsquot Be Too Specificbull It may generate concepts with little coverage

bull Concept regresses to entitybull Large storage space up to (million million) patterns

hellip hellip

device largest desktop OS vendor

device largest software development company

device largest global corporation

device latest windows and office provider

hellip hellip

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept PatternsCluster size Sum of Cluster Score headconstraintscore

615 2114691 breedstate357298460224501

296 7752357 gameplatform627403476771856

153 3466804 accessoryvehicle53393705094809

70 118259 browserplatform132612807637391

22 1010993 requirementschool271407526294823

34 9489159 drugdisease154602405333541

42 8992995 cosmeticskin condition814659415003929

16 7421599 jobcity27903732555528

32 710403 accessoryphone246513830851194

18 6692376 softwareplatform210126322725878

20 6444603 testdisease239774028397537

27 5994205 clothesbreed98773996282851

19 5913545 penaltycrime200544192793488

25 5848804 taxstate240081818612579

16 5465424 saucemeat183592863621553

18 4809389 credit cardcountry142919087972152

14 4730792 foodholiday14554140330924

11 4536199 modgame257163856882439

29 4350954 garmentsport471533326845442

23 3994886 career informationprofessional732726483731257

15 386065 songinstrument128189481818135

18 378213 baitfish780426514113169

22 3722948 study guidebook508339765053921

19 3408953 pluginsbrowser550326072627126

14 3305753 recipemeat882779863422951

18 3214226 currencycountry110825444188352

13 3180272 lenscamera186081673263957

9 316973 decorationholiday130055844126533

16 314875 foodanimal7338544366514

game platform

game device

video game platform

game console game pad

game gaming platform

Game (Head) Platform (Modifier)

angry birds android

angry birds ios

angry birds windows 10

hellip hellip

Detection

Head Modifier Relationship

bull Train a classifier on

(head-embedding modifier-embedding)

bull Training data bull Positive (head modifier)bull Negative (modifier head)

bull Precision gt= 09 Recall gt= 09

bull Disadvantage not interpretable

Syntactic Parsing based on HM

bull Information is incompletebull Preposition and other function words

bull Within a noun compound el capitan macbook pro

bull Why not train a parser for web queries

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges Short Texts Lack Grammatical Signalsbull Lack function words word order

bull ldquotoys queriesrdquo has ambiguous intent

bull ldquodistance earth moonrdquo has clear intentbull many equivalent forms ldquoearth moon distancerdquo ldquoearth

distance moonrdquo hellip

Challenges Syntactic Parsing of Queries

bullNo standard

bullNo ground-truth

Why is syntactic parsing of queries even a legitimate problem

Derive Syntax from Semantics[Sun et al 2016]

bull Query ldquothai food houstonrdquo

bull Clicked sentence

bull Project dependency to the query

A Treebank for Short Texts

bull Given query 119902

bull Given 119902rsquos clicked sentence 119904

bull Parse each 119904

bull Project dependency from 119904 to 119902

bull Aggregate dependencies

Algorithm of Projection

Result Examples

Results

bull Random queries

QueryParser UAS 083 LAS 075Stanford UAS 072 LAS 064

bull Queries with no function words

QueryParser UAS 082 LAS 073Stanford UAS 070 LAS 061

bull Queries with function words

QueryParser UAS 090 LAS 085Stanford UAS 086 LAS 080

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

bull Measuring similarity between two short texts and sentences

bull Basic idea word-by-word comparison using embedding vector

bull Use saliency-weighted semantic graph to computer similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired

Bins of all edges Bins of max edges

119908isin119904119868

119868119863119865(119908) sdot)119904119890119898(119908 119904119904) sdot (1198961 + 1

൰119904119890119898(119908 119904119904) + 1198961 sdot (1 minus 119887 + 119887 sdot|119904119904|119886119907119892119897

Similarity measurement

termShort texts

Inspired by BM25

Semantic similarity

119891119904119905119904(119904119897 119904119904) =

From the Concept View

From the Concept View [Wang et al 2015a]

Co-occurrence Network

Bags of Concepts

Semantic Network

Short Text 1

Short Text 2

Concept Vector 1[(c1 score1) (c2 score2)hellip]

Concept Vector 2[(c1rsquo score1rsquo) (c2rsquo score2rsquo)hellip]

Similarity

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

bull Explicit short text understanding benefit lot of application scenariosbull Adssearch semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

000

100

200

300

400

500

600

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

000

010

020

030

040

050

060

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

Mainline Ads Sidebar Ads

Definition Mining [Hao et al 2016]

bull Definition scenarios search engines QnA etc

bull Why Conceptualization is useful for definition miningbull Examples ldquoWhat is Emphysemardquo

Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

bull This sentence has the form of definitionbull Embedding is helpful to some extent but it also return high similarity

score for (emphysema disease) and (emphysema smoking)

bull Conceptualization can provide strong semanticsbull Contextual embedding can also provide semantic similarity beyond Is-A

Answer 1

Answer 2

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

OfflineOffline OnlineOnline

Original Short textjustin bieber graduates

hellip

Knowledge base

Conceptualiztion

Concept Vector

Entity Extraction

Candidates Generation

Classification amp Ranking

Model LearningModel Learning

Concept Weighting

Model Model NModel i

Concept Model Concept Model

Class 1 Class NClass i

TrainingData

ltMusic Scoregt

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept SpaceArticle titlestagsin this category

119901119894

119901119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

hellip

hellip

120596119894

120596119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

120596119894

120596119895119901119894

119901119895

Precision performance on each category [Wang et al 2014a]

BocSTC LM_ch SVMVSM_cosi

neLM_d Entity_ESA

Movie 071 091 084 081 072 056

Money 097 095 054 057 052 074

Music 097 090 088 073 068 058

TV 096 046 092 056 051 055

0304050607080910

Pre

cisi

on

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012



Approaches on Term Similarity

• Categories of approaches for semantic similarity
  • String based approach
  • Knowledge based approach: use pre-existing thesauri, taxonomies, or encyclopedias such as WordNet
  • Corpus based approach: use contexts of terms extracted from web pages, web search snippets, or other text repositories
  • Embedding based approach: introduced in detail in "Part 3: Implicit Understanding"

Approaches on Term Similarity (2)

• Categories (figure: a taxonomy of state-of-the-art approaches)
  • String based approaches
  • Knowledge based approaches (WordNet): path length / lexical chain-based and information content-based
  • Corpus based approaches: graph learning algorithm based and snippet search based
  • Representative works: Rada 1989, Resnik 1995, Jcn 1997, Lin 1998, Sánchez 2011, Agirre 2010, Alvarez 2007, Hirst 1998, HunTray 2005, Do 2009, Bol 2011, Chen 2006, Ban 2002

Term Similarity Using Semantic Networks [Li et al. 2013, Li et al. 2015]

• Framework (see the sketch after the step list below)

Input: term pairs <t1, t2>

Step 1: Type Checking: decide whether <t1, t2> is a concept pair, an entity pair, or a concept-entity pair
Step 2: Context Representation (Vector):
  • Concept pairs: collect entity-distribution contexts, giving context vectors T(t1) and T(t2)
  • Entity pairs: collect concept-distribution contexts, run concept clustering, and build cluster context vectors Cx(t1) and Cy(t2)
  • Concept-entity pairs: collect the concepts of the entity term t1, run concept clustering, and for each cluster Ci(t1) select the top-k concepts cx
Step 3: Context Similarity:
  • Concept pairs: similarity evaluation Cosine(T(t1), T(t2))
  • Entity pairs: similarity evaluation max over (x, y) of Cosine(Cx(t1), Cy(t2))
  • Concept-entity pairs: for each pair <t2, cx>, get max sim(t2, cx) as the similarity for <t1, t2>
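A minimal sketch of the entity-pair branch under stated assumptions: a toy isA lookup (`toy_isa`) stands in for a real probabilistic semantic network, and the concept-clustering step is skipped, so each entity is compared directly by its concept distribution.

```python
# Sketch of the entity-pair branch: compare two entities by the cosine of
# their concept distributions. `toy_isa` is a stand-in for a real isA network.
import math
from collections import Counter

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

toy_isa = {  # hypothetical concept distributions p(concept | entity)
    "banana": {"fruit": 0.6, "food": 0.3, "plant": 0.1},
    "pear":   {"fruit": 0.7, "food": 0.2, "tree": 0.1},
}

def entity_pair_similarity(e1, e2):
    return cosine(Counter(toy_isa[e1]), Counter(toy_isa[e2]))

print(round(entity_pair_similarity("banana", "pear"), 3))  # high, like the slide's banana/pear example
```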

An example [Li et al. 2013, Li et al. 2015]

For example, <banana, pear>:
• Step 1: Type Checking: <banana, pear> is an entity pair
• Step 2: Context Representation (Vector): concept context collection for both entities
• Step 3: Context Similarity: similarity evaluation Cosine(T(t1), T(t2)) = 0.916

Examples

Term 1                Term 2               Similarity
lunch                 dinner               0.9987
tiger                 jaguar               0.9792
car                   plane                0.9711
television            radio                0.9465
technology company    microsoft            0.8208
high impact sport     competitive sport    0.8155
employer              large corporation    0.5353
fruit                 green pepper         0.2949
travel                meal                 0.0426
music                 lunch                0.0116
alcoholic beverage    sports equipment     0.0314
company               table tennis         0.0003

Complete results: http://adapt.seiee.sjtu.edu.cn/similarity/SimCompleteResults.pdf

Statistics of Search Queries

(a) By traffic: 1 term 44%, 2 terms 29%, 3 terms 17%, 4 terms 7%, 5 terms 2%, more than 5 terms 1%
(b) By # of distinct queries: 1 term 10%, 2 terms 26%, 3 terms 34%, 4 terms 19%, 5 terms 7%, more than 5 terms 4%

(Charts: for queries about Pokémon Go and Microsoft HoloLens, the distribution of queries containing 1, 2, 3, 4, 5, or more than 5 instances)

If the short text has context for the instance…
• python tutorial
• dangerous python
• moon earth distance
• …

Short Text Understanding

• How to segment this short text?
• What does this short text mean (its intent, senses, or concepts)?
• What are the relations among terms in the short text?
• How to calculate the similarity between short texts?

Supervised Segmentation [Bergsma et al 2007]

• Problem: divide a query into semantic units
• Approach: turn segmentation into position-based binary classification

Example query: "two man power saw"
Candidate segmentations: [two man] [power saw]; [two] [man] [power saw]; [two] [man power] [saw]

Input: a query and its positions
Output: the decision on whether to segment at each position

Supervised Segmentation

• Features
  • Decision boundary features: e.g., indicator words ("the"), POS tags in the query ("is"), and forward/backward position features
  • Statistical features: mutual information between the left and right parts, e.g., "bank loan | amortization schedule"
  • Context features: context information, e.g., "female bus driver"
  • Dependency features: e.g., "female" depends on "driver" in "female bus driver"

Supervised Segmentation

• Segmentation overview (figure): the input query "two man power saw" is turned into learning features for each position; an SVM classifier outputs a segmentation decision (yes/no) for each position
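A rough sketch of the position-based binary classification idea, not Bergsma and Wang's exact feature set: hypothetical unigram/bigram counts feed a PMI-style statistical feature, and a linear SVM decides break / no-break at each boundary.

```python
# Sketch only: classify each between-word position of a query as break /
# no-break. Counts and labels below are toy values, not real statistics.
import math
from sklearn.svm import LinearSVC

unigram = {"two": 50, "man": 40, "power": 30, "saw": 20}
bigram = {("power", "saw"): 15}
total = sum(unigram.values())

def boundary_features(words, i):
    """Features for the boundary between words[i] and words[i+1]."""
    w1, w2 = words[i], words[i + 1]
    pmi = math.log(bigram.get((w1, w2), 1) * total
                   / (unigram.get(w1, 1) * unigram.get(w2, 1)))
    return [pmi, len(w1), len(w2), i]

query = "two man power saw".split()
gold = [1, 1, 0]   # break after "two" and "man", keep "power saw" together
X = [boundary_features(query, i) for i in range(len(gold))]
clf = LinearSVC().fit(X, gold)
print(clf.predict([boundary_features(query, 2)]))   # expect 0: no break inside "power saw"
```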

Unsupervised Segmentation [Tan et al. 2008]

• Unsupervised learning for query segmentation

Probability of a generated segmentation S for query Q (unigram model over segments $s_i$):
$P(S|Q) = P(s_1)\,P(s_2 \mid s_1)\cdots P(s_m \mid s_1 s_2 \cdots s_{m-1}) \approx \prod_{s_i \in S} P(s_i)$

A split point is a valid segment boundary if and only if the pointwise mutual information between the two segments resulting from the split is negative:
$MI(s_k, s_{k+1}) = \log \frac{P_c([s_k\, s_{k+1}])}{P_c(s_k)\cdot P_c(s_{k+1})} < 0$

Example: "new york times subscription" with $s_1$ = "new" and $s_2$ = "york": $\log \frac{P_c([\text{new york}])}{P_c(\text{new})\cdot P_c(\text{york})} > 0$, so there is no segment boundary between "new" and "york".

Unsupervised Segmentation

• Find the top-k segmentations by dynamic programming
• Use EM optimization on the fly

Input: query $w_1 w_2 \cdots w_n$ (the words in the query) and a concept probability distribution
Output: the top-k segmentations with the highest likelihood
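A minimal sketch of the unigram scoring with a single best segmentation recovered by dynamic programming; the segment probabilities are toy values and the paper's EM re-estimation loop is omitted.

```python
# Sketch: score a segmentation by the product of unigram segment
# probabilities and recover the best one with dynamic programming.
import math

def best_segmentation(words, seg_prob, max_len=4):
    """seg_prob maps a candidate segment (tuple of words) to P(s)."""
    n = len(words)
    best = [(-math.inf, None)] * (n + 1)
    best[0] = (0.0, None)
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            p = seg_prob.get(tuple(words[i:j]), 1e-9)   # back-off for unseen segments
            score = best[i][0] + math.log(p)
            if score > best[j][0]:
                best[j] = (score, i)
    segs, j = [], n                                     # backtrack
    while j > 0:
        i = best[j][1]
        segs.append(" ".join(words[i:j]))
        j = i
    return list(reversed(segs))

probs = {("new", "york", "times"): 0.02, ("subscription",): 0.01,
         ("new",): 0.03, ("york",): 0.001, ("times",): 0.005}
print(best_segmentation("new york times subscription".split(), probs))
# -> ['new york times', 'subscription']
```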

Exploit Click-through [Li et al. 2011]

• Motivation
  • Probabilistic query segmentation
  • Use click-through data (Q -> URL -> D: query, click data, document)

Input query: "bank of america online banking"
Output: top-3 segmentations
  [bank of america] [online banking]    0.502
  [bank of america online banking]      0.428
  [bank of] [america] [online banking]  0.001

Exploit Click-through

• Segmentation Model: an interpolated model that combines global info with click-through info

Query: [credit card] [bank of america]
Clicked HTML documents:
  1. bank of america credit cards contact us overview
  2. secured visa credit card from bank of america
  3. credit cards overview: find the right bank of america credit card for you

Short Text Understanding

• How to segment this short text?
• What does this short text mean (its intent, senses, or concepts)?
• What are the relations among terms in the short text?
• How to calculate the similarity between short texts?

Sense Changes with Different Context

watch harry potter -> Movie; read harry potter -> Book; age harry potter -> Character; harry potter walkthrough -> Game

Entity Recognition in Query [Guo et al 2009]

• Motivation: detect the named entity in a short text and categorize it

Example: the single-named-entity query "harry potter walkthrough" is represented as a triple <e, t, c> = ("harry potter", "walkthrough", "game"), where e is the (possibly ambiguous) named entity, t is the context term, and c is the class of the entity

Entity Recognition in Query

• Probabilistic Generative Model
  • Goal: given a query q, find the triple <e, t, c> that maximizes the probability of generating the triple, Pr(e) Pr(c|e) Pr(t|c), assuming the context depends only on the class (e.g., "walkthrough" depends only on the class game, not on "harry potter")
  • The problem then becomes how to estimate Pr(e), Pr(c|e), and Pr(t|c)
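A tiny illustration of that objective with made-up probability tables: each candidate class c for the entity is scored by Pr(e)·Pr(c|e)·Pr(t|c) and the argmax is kept.

```python
# Sketch with toy tables: pick the class that maximizes Pr(e)*Pr(c|e)*Pr(t|c)
# for a query already split into an entity and a context term.
p_e = {"harry potter": 0.6}
p_c_given_e = {"harry potter": {"movie": 0.4, "book": 0.35, "game": 0.25}}
p_t_given_c = {"game": {"walkthrough": 0.3},
               "movie": {"walkthrough": 0.01},
               "book": {"walkthrough": 0.01}}

def best_triple(entity, context):
    scored = []
    for c, pc in p_c_given_e.get(entity, {}).items():
        score = p_e.get(entity, 0) * pc * p_t_given_c.get(c, {}).get(context, 1e-6)
        scored.append((score, (entity, context, c)))
    return max(scored)

print(best_triple("harry potter", "walkthrough"))
# the class "game" wins despite "movie" having a higher prior
```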

Entity Recognition in Query

• Probability Estimation by Learning
  • Learning objective: $\max \prod_{i=1}^{N} P(e_i, t_i, c_i)$
  • Challenge: it is difficult as well as time consuming to manually assign class labels to named entities in queries
  • Build a training set $T = \{(e_i, t_i)\}$ and view $c_i$ as a hidden variable
  • New learning problem: $\max \prod_{i=1}^{N} P(e_i, t_i) = \max \prod_{i=1}^{N} \sum_{c} P(e_i)\,P(c \mid e_i)\,P(t_i \mid c)$
  • Solved with the topic model WS-LDA

Signal from Click [Pantel et al 2012]

• Motivation: predict the entity type in Web search from the entity, user intent, context, and click signals
• Query type distribution (73 types); a generative model over entity types

Signal from Click

• Joint Model for Prediction (plate diagram): for each query, pick an entity type t from the distribution over types, pick an entity from the entity distribution, pick an intent i from the intent distribution, pick context words from the word distribution, and pick a click c from the host distribution (with latent parameters θ, φ, ω)

Telegraphic Query Interpretation [Sawant et al. 2013, Joshi et al. 2014]

• Entity-seeking telegraphic queries, e.g., the query "Germany capital" with result entity "Berlin"
• Interpretation = Segmentation + Annotation
• Combines a knowledge base (for accuracy) with a large corpus (for recall)

Joint Interpretation and Ranking [Sawant et al. 2013, Joshi et al. 2014]

• Overview (figure): a telegraphic query is matched against an annotated corpus to produce candidate entities e1, e2, e3; two models for interpretation and ranking, a generative model and a discriminative model, produce the output

Joint Interpretation and Ranking [Sawant et al. 2013]

• Generative Model (figure, borrowed from U. Sawant 2013), based on probabilistic language models: for query q = "losing team baseball world series 1998", a candidate answer entity E (San Diego Padres) generates a type T ("major league baseball team", matching the type hint "baseball team") and context drawn from corpus snippets ("Padres have been to two World Series, losing in 1984 and 1998", matched by context matchers such as "lost 1998 world series"); a switch variable Z decides whether each query word is generated by the type model or the context model

Joint Interpretation and Ranking [Sawant et al. 2013]

• Discriminative Model, based on max-margin discriminative learning: for the query "losing team baseball world series 1998", the interpretation with t = "baseball team" leads to the correct entity San_Diego_Padres, while the interpretation with t = "series" leads to the incorrect entity 1998_World_Series

Telegraphic Query Interpretation [Joshi et al. 2014] (borrowed from M. Joshi 2014)

• Queries seek answer entities (e2)
• Queries contain (query) entities (e1), target types (t2), relations (r), and selectors (s)
• Example interpretations (query: e1, r, t2, s):
  • "dave navarro first band": e1 = dave navarro, r = band (or none), t2 = band, s = first
  • "spider automobile company": e1 = spider (or none), r = automobile company, t2 = automobile company (or company), s = spider (or none)

Improved Generative Model

• The generative model of [Sawant et al. 2013] is extended in [Joshi et al. 2014] to also consider e1 (in q) and r

Improved Discriminative Model

• The discriminative model of [Sawant et al. 2013] is extended in [Joshi et al. 2014] to also consider e1 (in q) and r

Understand Short Texts with a Multi-tiered Model [Hua et al. 2015 (ICDE Best Paper)]

• Input: a short text; Output: its semantic interpretation, e.g., "wanna watch eagles band" -> watch[verb] eagles[entity](band) band[concept]
• Three steps in understanding a short text:
  • Step 1: Text Segmentation: divide the short text into a sequence of terms in the vocabulary ("wanna watch eagles band" -> watch, eagles, band)
  • Step 2: Type Detection: determine the best type of each term (watch[verb] eagles[entity] band[concept])
  • Step 3: Concept Labeling: infer the best concept of each entity within context (eagles[entity] -> the band)

Text Segmentation

• Observations:
  • Mutual Exclusion: terms containing the same word mutually exclude each other
  • Mutual Reinforcement: related terms mutually reinforce each other
• Build a Candidate Term Graph (CTG)

(Figure: CTGs for "vacation april in paris", with candidate terms such as "vacation", "april in paris", "april", and "paris", and for "watch harry potter", with candidate terms such as "watch" and "harry potter"; nodes and edges carry weights, e.g., 0.029, 0.005, 0.047, 0.041 and 0.014, 0.092, 0.053, 0.018)

Find the best segmentation

• Best segmentation = the sub-graph of the CTG which:
  • Is a complete graph (clique) with no mutual exclusion, i.e., is a valid segmentation
  • Has 100% word coverage (except for stopwords)
  • Has the largest average edge weight among such maximal cliques, i.e., is the best segmentation

(Figure: the CTGs for "vacation april in paris" and "watch harry potter", with the maximal clique forming the best segmentation highlighted; a minimal sketch of this clique search follows)
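A minimal sketch of that clique search using networkx, with toy edge weights and a simplified word-coverage check: it enumerates maximal cliques and keeps the one with full coverage and the largest average edge weight.

```python
# Sketch: pick the maximal clique of a candidate-term graph that covers all
# non-stopword words and has the largest average edge weight (toy weights).
import itertools
import networkx as nx

words = {"vacation", "april", "in", "paris"}
stopwords = {"in"}

G = nx.Graph()
G.add_edge("vacation", "april in paris", weight=0.047)
G.add_edge("vacation", "april", weight=0.029)
G.add_edge("vacation", "paris", weight=0.041)
G.add_edge("april", "paris", weight=0.005)

def covers(clique):
    covered = set(itertools.chain.from_iterable(t.split() for t in clique))
    return words - stopwords <= covered

def avg_weight(clique):
    edges = list(itertools.combinations(clique, 2))
    return sum(G[u][v]["weight"] for u, v in edges) / len(edges) if edges else 0.0

best = max((c for c in nx.find_cliques(G) if covers(c)), key=avg_weight)
print(best)   # the clique containing "vacation" and "april in paris"
```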

Type Detection

• Pairwise Model: find the best typed-term for each term so that the Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight

(Figure: for the query "watch free movie", the candidate typed-terms are watch[v], watch[e], watch[c]; free[adj], free[v]; movie[c], movie[e])

Concept Labeling

• Entity disambiguation is the most important task of concept labeling: filter and re-rank the original concept cluster vector
• Weighted Vote: the final score of each concept cluster is a combination of its original score and the support from context, using concept co-occurrence (e.g., "watch harry potter" -> movie, "read harry potter" -> book)
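A small sketch of the weighted vote with toy numbers: the prior isA score of each concept cluster is interpolated with co-occurrence support from the context term.

```python
# Sketch: re-rank an entity's concept clusters by combining the original isA
# score with co-occurrence support from the context (all numbers are toys).
isa_scores = {"book": 0.45, "movie": 0.40, "character": 0.15}   # for "harry potter"
cooccur = {("watch", "movie"): 0.8, ("watch", "book"): 0.1,
           ("watch", "character"): 0.1}
alpha = 0.5   # interpolation weight between prior score and context support

def weighted_vote(concepts, context_terms):
    reranked = {}
    for c, prior in concepts.items():
        support = sum(cooccur.get((t, c), 0.0) for t in context_terms)
        reranked[c] = alpha * prior + (1 - alpha) * support
    return sorted(reranked.items(), key=lambda kv: -kv[1])

print(weighted_vote(isa_scores, ["watch"]))   # "movie" overtakes "book" given "watch"
```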

Example of Entity Disambiguation [Hua et al. 2015 (ICDE Best Paper), Hua et al. 2016]

(Figure: a short text is parsed and conceptualized against a semantic (isA) network and a co-occurrence network, producing a concept vector [(c1, p1), (c2, p2), (c3, p3), …]; the pipeline steps are parsing, term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, and concept orthogonalization. For "ipad apple", the isA candidates of "apple" (fruit, company, food, product, …) are filtered by co-occurrence with the concepts of "ipad" (product, device, brand, …), so the company/product senses survive)

Mining Lexical Relationships [Wang et al. 2015b]

• Lexical knowledge is represented by probabilities (e: instance, t: term, c: concept, z: role), illustrated on "watch harry potter" with the concepts verb, product, movie, and book:
  • $p(z \mid t)$, e.g., $p(\mathrm{verb} \mid \mathrm{watch})$ and $p(\mathrm{instance} \mid \mathrm{watch})$
  • $p(c \mid t, z)$, e.g., $p(\mathrm{movie} \mid \mathrm{watch}, \mathrm{verb})$
  • $p(c \mid e) = p(c \mid t, z = \mathrm{instance})$, e.g., $p(\mathrm{movie} \mid \mathrm{harry\ potter})$ and $p(\mathrm{book} \mid \mathrm{harry\ potter})$

Understanding Queries [Wang et al 2015b]

• Goal: rank the concepts and find $\arg\max_{c} p(c \mid t, q)$
• Build an online subgraph from the offline semantic network and all possible segmentations of the query, then run random walk with restart [Sun et al. 2005] on the online subgraph
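A minimal sketch of random walk with restart on a toy online subgraph (the transition weights are made up); the stationary scores are then used to rank the concept nodes.

```python
# Sketch: power iteration for random walk with restart on a small subgraph.
import numpy as np

nodes = ["watch", "harry potter", "movie", "book", "verb"]
# column-stochastic transition matrix over the online subgraph (toy values)
W = np.array([
    [0.0, 0.1, 0.5, 0.2, 1.0],   # -> watch
    [0.2, 0.0, 0.5, 0.8, 0.0],   # -> harry potter
    [0.4, 0.5, 0.0, 0.0, 0.0],   # -> movie
    [0.1, 0.4, 0.0, 0.0, 0.0],   # -> book
    [0.3, 0.0, 0.0, 0.0, 0.0],   # -> verb
])
W = W / W.sum(axis=0, keepdims=True)          # normalize columns

restart = np.array([0.5, 0.5, 0.0, 0.0, 0.0]) # restart on the query terms
c, r = 0.15, restart.copy()
for _ in range(100):                          # power iteration
    r = (1 - c) * W @ r + c * restart
print(sorted(zip(nodes, r), key=lambda kv: -kv[1]))   # nodes ranked by score
```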

Short Text Understanding

• How to segment this short text?
• What does this short text mean (its intent, senses, or concepts)?
• What are the relations among terms in the short text?
• How to calculate the similarity between short texts?

Head, Modifier, and Constraint Detection in Short Texts [Wang et al. 2014b]

• Example: "popular smart cover iphone 5s"
• Definitions:
  • Head: names the general (semantic) category to which the whole short text belongs; usually the head is the intent of the short text ("smart cover" is the intent of the query)
  • Constraints: distinguish this member from other members of the same category ("iphone 5s" limits the type of the head)
  • Non-Constraint Modifiers (aka Pure Modifiers): subjective modifiers that can be dropped without changing the intent ("popular" is subjective and can be neglected)

Non-Constraint Modifiers Mining: Construct Modifier Networks

• Edges form a Modifier Network
• (Figure: the concept hierarchy tree in the "Country" domain contains nodes such as Country, Asian country, Developed country, Western country, Western developed country, Top western country, Large Asian country, Large developed country, and Top developed country; the corresponding modifier network in the "Country" domain links the modifiers Asian, Developed, Western, Large, and Top. In this case "Large" and "Top" are pure modifiers.)

Non-Constraint Modifiers Mining: Betweenness Centrality

• Betweenness centrality is a measure of a node's centrality in a network
• The betweenness of node v is defined as $g(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$, where $\sigma_{st}$ is the total number of shortest paths from node s to node t and $\sigma_{st}(v)$ is the number of those paths that pass through v
• Normalization & aggregation: a pure modifier should have a low betweenness-centrality aggregation score PMS(t)
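A short sketch using networkx on a toy modifier network: modifiers with low betweenness centrality are the pure-modifier candidates.

```python
# Sketch: rank modifiers by betweenness centrality on a toy modifier network;
# the edge list below is invented for illustration, not mined from data.
import networkx as nx

G = nx.Graph([
    ("Asian", "Developed"), ("Asian", "Western"), ("Developed", "Western"),
    ("Large", "Asian"), ("Large", "Developed"), ("Top", "Western"),
    ("Top", "Developed"),
])
bc = nx.betweenness_centrality(G, normalized=True)
for modifier, score in sorted(bc.items(), key=lambda kv: kv[1]):
    print(f"{modifier:10s} {score:.3f}")   # low scores suggest pure modifiers
```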

Head-Constraints Mining [Wang et al. 2014b]

• A term can be a head in some cases and a constraint in others
• E.g., in "seattle hotel", "hotel" is the head and "seattle" a constraint; in "seattle hotel job", "job" is the head while "seattle" and "hotel" are constraints

Head-Constraints Mining: Acquiring Concept Patterns

Building a concept pattern dictionary from query logs (e.g., "cover for iphone 6s", "battery for sony a7r", "wicked on broadway"):
• Extract preposition patterns: A for B, A of B, A with B, A in B, A on B, A at B, …
• Get entity pairs from the query log, with entity1 as the head and entity2 as the constraint for each preposition
• Conceptualization: map entity1 to concepts concept11, concept12, concept13, concept14 and entity2 to concepts concept21, concept22, concept23
• Emit the concept patterns (concept11, concept21), (concept11, concept22), (concept11, concept23), … into the Concept Pattern Dictionary
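A minimal sketch of that dictionary construction with a hypothetical conceptualization lookup: every (head, constraint) entity pair votes for all pairs of its concepts.

```python
# Sketch: build a concept pattern dictionary from (head, constraint) entity
# pairs; `conceptualize` is a hypothetical isA lookup with toy concepts.
from collections import Counter

conceptualize = {
    "cover": ["accessory", "product"], "iphone 6s": ["phone", "device"],
    "battery": ["accessory", "part"], "sony a7r": ["camera", "device"],
}
entity_pairs = [("cover", "iphone 6s"), ("battery", "sony a7r")]  # from "A for B" queries

pattern_dict = Counter()
for head, constraint in entity_pairs:
    for ch in conceptualize.get(head, []):
        for cc in conceptualize.get(constraint, []):
            pattern_dict[(ch, cc)] += 1          # (head concept, constraint concept)

print(pattern_dict.most_common(3))   # the accessory/device pattern dominates
```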

Why Concepts Can't Be Too General

• It may cause too many concept pattern conflicts: head and modifier cannot be distinguished for general concept pairs
• Derived concept pattern (head, modifier) = (device, company), supporting entity pairs: (iphone 4, verizon), (modem, comcast), (wireless router, comcast), (iphone 4, tmobile)
• Derived concept pattern (head, modifier) = (company, device), supporting entity pairs: (amazon books, kindle), (netflix, touchpad), (skype, windows phone), (netflix, ps3)
• The two derived patterns conflict

Why Concepts Can't Be Too Specific

• It may generate concepts with little coverage: the concept regresses to the entity
• It requires large storage space: up to (million x million) patterns, e.g., (device, largest desktop OS vendor), (device, largest software development company), (device, largest global corporation), (device, latest windows and office provider), …
• Basic-level Conceptualization (BLC) is a good choice [Wang et al. 2015b]

Top Concept Patterns (head/constraint, with supporting cluster size): breed/state (615), game/platform (296), accessory/vehicle (153), browser/platform (70), requirement/school (22), drug/disease (34), cosmetic/skin condition (42), job/city (16), accessory/phone (32), software/platform (18), test/disease (20), clothes/breed (27), penalty/crime (19), tax/state (25), sauce/meat (16), credit card/country (18), food/holiday (14), mod/game (11), garment/sport (29), career information/professional (23), song/instrument (15), bait/fish (18), study guide/book (22), plugins/browser (19), recipe/meat (14), currency/country (18), lens/camera (13), decoration/holiday (9), food/animal (16)

Example: the game/platform pattern family (game/platform, game/device, video game/platform, game console/game pad, game/gaming platform) covers head-modifier pairs such as Game (head) = angry birds with Platform (modifier) = android, ios, windows 10, …

Head Modifier Relationship Detection

• Train a classifier on (head-embedding, modifier-embedding)
• Training data: positive examples are (head, modifier) pairs; negative examples are (modifier, head) pairs
• Precision >= 0.9, Recall >= 0.9
• Disadvantage: not interpretable
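A small sketch of this classifier with random vectors standing in for real word embeddings: each ordered pair is represented by the concatenation of the two embeddings, and reversed pairs serve as negatives.

```python
# Sketch: train a linear classifier on concatenated (head, modifier)
# embeddings; embeddings and training pairs below are toy stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
emb = {w: rng.normal(size=50) for w in
       ["cover", "iphone", "hotel", "seattle", "job", "game", "android"]}

positive = [("cover", "iphone"), ("hotel", "seattle"), ("job", "hotel")]
X = [np.concatenate([emb[h], emb[m]]) for h, m in positive] + \
    [np.concatenate([emb[m], emb[h]]) for h, m in positive]   # reversed = negatives
y = [1] * len(positive) + [0] * len(positive)

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict([np.concatenate([emb["game"], emb["android"]])]))
```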

Syntactic Parsing based on Head-Modifier Pairs

• Head-modifier information is incomplete:
  • It does not cover prepositions and other function words
  • Nor relations within a noun compound, e.g., "el capitan macbook pro"
• Why not train a parser for web queries?

Syntactic Parsing of Short Texts [Sun et al. EMNLP 2016]

• Syntactic structures are valuable for short text understanding
• Examples

Challenges: Short Texts Lack Grammatical Signals
• Short texts lack function words and word order:
  • "toys queries" has ambiguous intent
  • "distance earth moon" has clear intent but many equivalent forms: "earth moon distance", "earth distance moon", …

Challenges: Syntactic Parsing of Queries

• No standard
• No ground truth
• Why is syntactic parsing of queries even a legitimate problem?

Derive Syntax from Semantics [Sun et al. 2016]

• Query: "thai food houston"
• Take a sentence clicked for the query
• Project the sentence's dependencies onto the query

A Treebank for Short Texts

• Given a query q and q's clicked sentences s:
  • Parse each s
  • Project the dependencies from s to q
  • Aggregate the dependencies

Algorithm of Projection

Result Examples

Results

• Random queries: QueryParser UAS 0.83, LAS 0.75; Stanford UAS 0.72, LAS 0.64
• Queries with no function words: QueryParser UAS 0.82, LAS 0.73; Stanford UAS 0.70, LAS 0.61
• Queries with function words: QueryParser UAS 0.90, LAS 0.85; Stanford UAS 0.86, LAS 0.80

Short Text Understanding

• How to segment this short text?
• What does this short text mean (its intent, senses, or concepts)?
• What are the relations among terms in the short text?
• How to calculate the similarity between short texts?

Short Text Similarity Using Word Embedding [Kenter and Rijke 2015]

• Measuring the similarity between two short texts or sentences
• Basic idea: word-by-word comparison using embedding vectors
• Use a saliency-weighted semantic graph to compute similarity

Short Text Similarity Using Word Embedding [Kenter and Rijke 2015]

• Features acquired: bins of all edges, bins of max edges
• Similarity measurement, inspired by BM25, between short texts $s_l$ and $s_s$, where $\mathrm{sem}(w, s_s)$ is the semantic similarity between term $w$ and short text $s_s$:

$f_{sts}(s_l, s_s) = \sum_{w \in s_l} IDF(w) \cdot \frac{\mathrm{sem}(w, s_s) \cdot (k_1 + 1)}{\mathrm{sem}(w, s_s) + k_1 \cdot \left(1 - b + b \cdot \frac{|s_s|}{\mathit{avgsl}}\right)}$
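A minimal sketch of this formula with toy embeddings and a uniform IDF table; sem(w, s_s) is taken here as the maximum cosine similarity between w and any word of the shorter text.

```python
# Sketch of the BM25-inspired, saliency-weighted similarity above; the
# embeddings and IDF values are toy numbers, not trained vectors.
import numpy as np

emb = {"cheap": np.array([0.9, 0.1]), "flights": np.array([0.2, 0.95]),
       "low": np.array([0.85, 0.2]), "cost": np.array([0.8, 0.3]),
       "airline": np.array([0.25, 0.9]), "tickets": np.array([0.3, 0.85])}
idf = {w: 1.0 for w in emb}          # uniform IDF for the toy example
k1, b, avgsl = 1.2, 0.75, 3.0

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def sem(w, s_s):
    return max(cos(emb[w], emb[v]) for v in s_s)

def f_sts(s_l, s_s):
    total = 0.0
    for w in s_l:
        s = sem(w, s_s)
        total += idf[w] * (s * (k1 + 1)) / (s + k1 * (1 - b + b * len(s_s) / avgsl))
    return total

print(f_sts(["cheap", "flights"], ["low", "cost", "airline", "tickets"]))
```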

From the Concept View [Wang et al. 2015a]

(Figure: bags of concepts. Short Text 1 and Short Text 2 are each conceptualized, using the semantic (isA) network and the co-occurrence network, through parsing, term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, and concept orthogonalization, into Concept Vector 1 [(c1, score1), (c2, score2), …] and Concept Vector 2 [(c1', score1'), (c2', score2'), …]; the similarity is then computed between the two concept vectors)

Outline

• Knowledge Bases
• Explicit Representation Models
• Applications

Applications

• Explicit short text understanding benefits many application scenarios:
  • Ads/search semantic match
  • Definition mining
  • Query recommendation
  • Web table understanding
  • Semantic search
  • …

Ads Keyword Selection [Wang et al. 2015a]

(Figure: two bar charts, Mainline Ads and Sidebar Ads, reporting performance over Decile 4 through Decile 10)

Definition Mining [Hao et al 2016]

• Definition scenarios: search engines, QnA, etc.
• Why conceptualization is useful for definition mining. Example: "What is Emphysema?"
  • Answer 1: "Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year."
  • Answer 2: "Emphysema is an incurable, progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing."
  • These sentences both have the form of a definition; embedding is helpful to some extent, but it also returns a high similarity score for both (emphysema, disease) and (emphysema, smoking)
  • Conceptualization can provide strong semantics; contextual embedding can also provide semantic similarity beyond isA

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al. 2014a]

(Figure: system overview. Offline: training data and a knowledge base are used for concept weighting and model learning, yielding concept models Model 1 … Model i … Model N for Class 1 … Class i … Class N. Online: an original short text such as "justin bieber graduates" goes through entity extraction, conceptualization into a concept vector, candidate generation, and classification & ranking, producing outputs such as <Music, Score>)

Concept based Short Text Classification and Ranking [Wang et al. 2014a]

(Figures: each category, e.g., TV, Music, or Movie, is represented in the concept space by the article titles/tags in that category, mapped to concept points $p_i$, $p_j$ with weights $\omega_i$, $\omega_j$; a query is conceptualized into the same concept space and compared against the category representations)

Precision performance on each category [Wang et al. 2014a]

Category   BocSTC   LM_ch   SVM    VSM_cosine   LM_d   Entity_ESA
Movie      0.71     0.91    0.84   0.81         0.72   0.56
Money      0.97     0.95    0.54   0.57         0.52   0.74
Music      0.97     0.90    0.88   0.73         0.68   0.58
TV         0.96     0.46    0.92   0.56         0.51   0.55

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

• [Stark et al. 1998] Michael M. Stark and Richard F. Riesenfeld. WordNet: An Electronic Lexical Database. In Proceedings of the 11th Eurographics Workshop on Rendering, 1998.

• [Banko et al. 2007] Michele Banko, Michael J. Cafarella, Stephen Soderland, Matt Broadhead, and Oren Etzioni. Open Information Extraction from the Web. In IJCAI, 2007.

• [Etzioni et al. 2011] Oren Etzioni, Anthony Fader, Janara Christensen, Stephen Soderland, and Mausam. Open Information Extraction: The Second Generation. In IJCAI, 2011.

• [Carlson et al. 2010] A. Carlson, J. Betteridge, B. Kisiel, B. Settles, E. R. Hruschka Jr., and T. M. Mitchell. Toward an Architecture for Never-Ending Language Learning. In AAAI, 2010.

• [Wu et al. 2012] Wentao Wu, Hongsong Li, Haixun Wang, and Kenny Zhu. Probase: A Probabilistic Taxonomy for Text Understanding. In SIGMOD, May 2012.

• [Bollacker et al. 2008] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. Freebase: A Collaboratively Created Graph Database for Structuring Human Knowledge. In SIGMOD, 2008.

• [Auer et al. 2007] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary G. Ives. DBpedia: A Nucleus for a Web of Open Data. In ISWC/ASWC, 2007.

References

• [Suchanek et al. 2007] Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. Yago: A Core of Semantic Knowledge. In WWW, 2007.

• [Wu et al. 2015] Sen Wu, Ce Zhang, Christopher De Sa, Jaeho Shin, Feiran Wang, and C. Ré. Incremental Knowledge Base Construction Using DeepDive. In VLDB, 2015.

• [Navigli et al. 2012] R. Navigli and S. Ponzetto. BabelNet: The Automatic Construction, Evaluation and Application of a Wide-Coverage Multilingual Semantic Network. Artificial Intelligence, 2012.

• [Nastase et al. 2010] Vivi Nastase, Michael Strube, Benjamin Börschinger, Cäcilia Zirn, and Anas Elghafari. WikiNet: A Very Large Scale Multi-Lingual Concept Network. In LREC, 2010.

• [Speer et al. 2013] Robert Speer and Catherine Havasi. ConceptNet 5: A Large Semantic Network for Relational Knowledge. In The People's Web Meets NLP, Springer Berlin Heidelberg, 2013.

• [Hua et al. 2016] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng, and Xiaofang Zhou. Understand Short Texts by Harvesting and Analyzing Semantic Knowledge. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2016.

• [Hua et al. 2015] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng, and Xiaofang Zhou. Short Text Understanding Through Lexical-Semantic Analysis. In ICDE, April 2015.

References

• [Li et al. 2013] Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang, and Xindong Wu. Computing Term Similarity by Large Probabilistic isA Knowledge. In CIKM, 2013.

• [Li et al. 2015] Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang, Xue-Gang Hu, and Xindong Wu. A Large Probabilistic Semantic Network Based Approach to Compute Term Similarity. IEEE Transactions on Knowledge and Data Engineering (TKDE), 27(10): 2604-2617, 2015.

• [Rosch et al. 1976] Eleanor Rosch, Carolyn B. Mervis, Wayne D. Gray, David M. Johnson, and Penny Boyes-Braem. Basic Objects in Natural Categories. Cognitive Psychology, 8(3): 382-439, 1976.

• [Manning and Schutze 1999] Christopher D. Manning and Hinrich Schutze. Foundations of Statistical Natural Language Processing. MIT Press, 1999.

• [Wang et al. 2015b] Zhongyuan Wang, Kejun Zhao, Haixun Wang, Xiaofeng Meng, and Ji-Rong Wen. Query Understanding through Knowledge-Based Conceptualization. In IJCAI, July 2015.

• [Bergsma et al. 2007] Shane Bergsma and Qin Iris Wang. Learning Noun Phrase Query Segmentation. In EMNLP-CoNLL, 2007, 819-826.

• [Tan et al. 2008] Bin Tan and Fuchun Peng. Unsupervised Query Segmentation Using Generative Language Models and Wikipedia. In WWW, 2008, 347-356.

References

• [Li et al. 2011] Yanen Li, Bo-June Paul Hsu, ChengXiang Zhai, and Kuansan Wang. Unsupervised Query Segmentation Using Clickthrough for Information Retrieval. In SIGIR, 2011, 285-294.

• [Guo et al. 2009] Jiafeng Guo, Gu Xu, Xueqi Cheng, and Hang Li. Named Entity Recognition in Query. In SIGIR, 2009, 267-274.

• [Pantel et al. 2012] Patrick Pantel, Thomas Lin, and Michael Gamon. Mining Entity Types from Query Logs via User Intent Modeling. In ACL, 2012, 563-571.

• [Joshi et al. 2014] Mandar Joshi, Uma Sawant, and Soumen Chakrabarti. Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries. In EMNLP, 2014, 1104-1114.

• [Sawant et al. 2013] Uma Sawant and Soumen Chakrabarti. Learning Joint Query Interpretation and Response Ranking. In WWW, 2013, 1099-1110.

• [Wang et al. 2014b] Zhongyuan Wang, Haixun Wang, and Zhirui Hu. Head, Modifier, and Constraint Detection in Short Texts. In ICDE, 2014.

• [Sun et al. 2016] Xiangyan Sun, Haixun Wang, Yanghua Xiao, and Zhongyuan Wang. Syntactic Parsing of Web Queries. In EMNLP, 2016.

References

• [Kenter and Rijke 2015] Tom Kenter and Maarten de Rijke. Short Text Similarity with Word Embeddings. In CIKM, 2015.

• [Wang et al. 2015a] Zhongyuan Wang, Haixun Wang, Ji-Rong Wen, and Yanghua Xiao. An Inference Approach to Basic Level of Categorization. In CIKM, October 2015.

• [Hao et al. 2016] Zehui Hao, Zhongyuan Wang, Xiaofeng Meng, and Jun Yan. Combining Language Model with Conceptualization for Definition Ranking. MSR Technical Report, 2016.

• [Wang et al. 2014a] Fang Wang, Zhongyuan Wang, Zhoujun Li, and Ji-Rong Wen. Concept-based Short Text Classification and Ranking. In CIKM, 2014.

• [Wang et al. 2012a] Jingjing Wang, Haixun Wang, Zhongyuan Wang, and Kenny Zhu. Understanding Tables on the Web. In International Conference on Conceptual Modeling, October 2012.

• [Wang et al. 2012b] Yue Wang, Hongsong Li, Haixun Wang, and Kenny Zhu. Toward Topic Search on the Web. In International Conference on Conceptual Modeling, October 2012.

Page 51: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Approaches on Term Similarity (2)

bull Categories

80

Knowledge based approaches

(WordNet)

Corpus based

approaches

Path lengthlexical

chain-based

Information

content-based

Graph learning

algorithm basedSnippet search based

Rada

1989

Resnik

1995

Jcn

1997

Lin

1998

Saacutench

2011

Agirre

2010Alvarez

2007

String based

approaches

HunTray

2005

Hirst

1998

Do

2009

Bol

2011Chen

2006

State-of-the-art approaches

Ban

2002

bull Framework

83

Term Similarity Using Semantic Networks [Li et al 2013 Li et al 2015]

Term pairs ltt1 t2gt

Type Checking

Concept Pairs Entity Pairs

Entity-distribution Context Collection

Concept-distribution Context Collection

Concept-Entity Pairs

Concept Collection for the Entity Term t1

Similarity EvaluationCosine(T(t1) T(t2))

for each pairltt2cxgt

Context vector T(t1) and T(t2)

Get maxsim(t2cx) for ltt1 t2gt

End

End

Concept Clustering

Cluster Context vector Cx(t1) and Cy(t2)

Similarity Evaluation

Max(xy) Cosine(Cx(t1) Cy(t2))

End

Concept Clustering

for each Cluster Ci(t1)

Select top k Concept namely cx

Step 1 Type Checking

Step 2 Context Representation(Vector)

Step 3 Context Similarity

An example [Li et al 2013 Li et al 2015]

For example

ltbanana peargt

88

ltbanana peargt

Entity PairsType Checking

Concept Context Collection

Similarity Evaluation Cosine(T(t1) T(t2)) 0916

Step 1 Type Checking

Step 2 Context Representation(Vector)

Step 3 Context Similarity

ExamplesTerm 1 Term 2 Similarity

lunch dinner 09987

tiger jaguar 09792

car plane 09711

television radio 09465

technology company microsoft 08208

high impact sport competitive sport 08155

employer large corporation 05353

fruit green pepper 02949

travel meal 00426

music lunch 00116

alcoholic beverage sports equipment 00314

company table tennis 00003

96httpadaptseieesjtueducnsimilaritySimCompleteResultspdf

Statistics of Search Queries

44

29

17

7

2 1

1 Term2 Terms3 Terms4 Terms5 Termsmore than 5 Terms

10

26

34

19

74

1 Term2 Terms3 Terms4 Terms5 Termsmore than 5 Terms

(a) By traffic

(b) By of distinct queries

Pokeacutemon Go Microsoft HoloLens

Instance 1 Instance 21 Instance2 Instances3 Instances4 Instances5 InstancesMore than 5 Instances

1 Instance2 Instances3 Instances4 Instances5 InstancesMore than 5 Instances

If the short text has context for the instancehellip

bull python tutorialbull dangerous pythonbull moon earth distancebull hellip

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Supervised Segmentation [Bergsma et al 2007]

bull Problem divide query into semantic units

bull Approach turn segmentation into position-based binary classification

Example Query

Two man power saw

[two man] [power saw][two] [man] [power saw][two] [man power] [saw]

Input a query and its positions

Output the decision for making segmentation at each position

Supervised Segmentation

bull Featuresbull Decision boundary features

bull Statistical features

bull Context features

bull Dependency features

eg Indicators thePOS tags in query isPosition features forwardbackward

Mutual information between left and right parts

Bank loan amortization schedule

Context information

bus driverfemale

depend

Supervised Segmentation

bull Segmentation Overview

saw

SVMclassifier

Input query two man power saw

two man power

Output segmentation decision for each position (yesno)

learning features

Unsupervised Segmentation [Tan et al 2008]

bull Unsupervised learning for query segmentation

Probability of generated segmentation S for query Q

119875 119878119876 = 119875 1199041 P 1199042|1199041 hellipP 119904119898 11990411199042hellip119904119898minus1

asympෑ

119904119894isin119878

119875(119904119894)Unigram model

segments

Valid segment boundary if and only if the pointwise mutual information between the two segments resulting from the split is negative

new york times subscription

1199041 1199042

119872119868 119904119896 119904119896+1 = log119875119888([119904119896 119904119896+1])

119875119888 119904119896 ∙ 119875119888 (119904119896+1)lt 0

Example log119875119888([119899119890119908 119910119900119903119896])

119875119888( 119899119890119908) ∙ 119875119888 (119910119900119903119896)gt 0

no segment boundary here

Unsupervised Segmentation

bull Find top k segmentations dynamic programming

bull Using EM optimization on the fly

Input query 11990811199082hellip119908119899 concept probability distribution

Output top k segmentations with highest likehood

Words in a query

Exploit Click-through [Li et al 2011]

bull Motivationbull Probabilistic query segmentation

bull Use click-through data

Output top-3 segmentation

[bank of america] [online banking] 0502

bank of america online banking] 0428

[bank of ] [ america] [online banking] 0001

Q -gt URL -gt D query

document

click data

Input Query bank of america online banking

Exploit Click-through

bull Segmentation Model

An interpolated model

global info Click-throughinfo

[credit card] [bank of America]

1 bank of america credit cards contact us overview2 secured visa credit card from bank of america3 credit cards overview find the right bank of america credit card for you

Query

Clicked html documents

global info

Click-through info

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Sense Changes with Different Context

watch harry potter read harry potter age harry potter

Movie Book Character

harry potter walkthrough

Game

Entity Recognition in Query [Guo et al 2009]

bull Motivation

Detect named entity in a short text and categorize it

harry potter walkthrough

Single-named-entity query

Example

(ldquoharry potterrdquo ldquo walkthroughrdquo ldquogamerdquo)

triple lte t cgt

class of entity

context terms

ambiguous term

contextterm class

Entity Recognition in Query

bull Probabilistic Generative Model

Goal Given a query q find triple lte t cgt maximize the probability

Probability to generate triple

assume context only depends on class

Objective given query q find

The problem then becomes how to estimate Pr(e) Pr(c|e) and Pr(t|c)

Eg ldquowalkthroughrdquo only depends on game instead of happy potter

Entity Recognition in Query

bull Probability Estimation by Learning

learning objective

N

1i

iii )ctP(emax

Challenge difficult as well as time consuming to manually assign class labels to named entities in queries

Build training set 119879 = (119890119894 119905119894) view 119888119894 as a hidden variable

New Learning problem

N

1i

ii

N

1i

i

N

1i

ii c)|)P(te|P(c)P(emax)tP(emax c

solved with topic model WS-LDA

Signal from Click [Pantel et al 2012]

bull Motivation

Predict entity type in Web search

entity

user intent

context

click

Query type distribution (73 types)

Generative model

entity type

T

TK

K2

Signal from Click

bull Joint Model for Prediction

t

τ

i

n c

θ

φ

ω

Q

Distribution over types

Intent distribution

Pick type

Pick entity

Pick intent

Pick click

Word distribution

Host distribution

Entity distribution

For each Query

Pick context words

Telegraphic Query interpretation [Sawant et al 2013 Joshi et al 2014]

bull Entity-seeking Telegraphic Queries

bull Interpretation = Segmentation + Annotation

Knowledge base Large corpus

accuracy recall

Germany capital

Berlin

Query

Result Entity

bull Overview

Joint Interpretation and Ranking [Sawant et al 2013 Joshi et al 2014]

Annotated Corpus

Telegraphic Query

e1e2e3

Two Models for Interpretation and Ranking

Generative Model

Discriminative Model

Output

bull Generative Model

Joint Interpretation and Ranking [Sawant et al 2013]

San Diego Padres

Major league baseball team

type context

E

T Padres have been to two World

Series losing in 1984 and 1998

Type hint

baseball team

losing team baseball world series 1998

Z

Context matchers

lost 1998 world seriesswitch

model model

q losing team baseball world series 1998

Borrow from U Sawant (2013)

Based on Probabilistic Language Models

bull Discriminative Model

Joint Interpretation and Ranking [Sawant et al 2013]

San_Diego_Padres

losing team baseball world

series 1998

(baseball team)

losing team baseball world

series 1998

(baseball team)

losing team baseball world

series 1998

(t = baseball team)

1998_World_Series

losing team baseball world

series 1998

(series)

losing team baseball world

series 1998

(series)

losing team baseball world

series 1998

(t = series)

Correct entity Incorrect entity

Based on max-margin discriminative learning

bull Queries seek answer entities (e2)

bull Contain (query) entities (e1) target types (t2) relations (r) and selectors (s)

Telegraphic Query Interpretation [Joshi et al 2014]

query e1 r t2 s

dave navarro first band

dave navarro band band first

dave navarro - band first

spider automobile company

spider automobile company

automobile company

-

automobile company company spider

Borrow from M Joshi (2014)

Improved Generative Model

bull Generative Model[Sawant et al 2013]

[Joshi et al 2014]Consider e1

(in q) and r

Improved Discriminative Model

bull Discriminative Model[Sawant et al 2013]

[Joshi et al 2014]

Consider e1

(in q) and r

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

bull Input a short text

bull Output semantic interpretation

bull Three steps in understanding a short text

wanna watch eagles band

watch[verb] eagles[entity](band) band[concept]

wanna watch eagles band watch[verb] eagles[entity](band) band[concept]

watch eagles band watch[verb] eagles[entity] band[concept]

Step 1 Text Segmentation ndash divide into a sequence of terms in vocabulary

Step 2 Type detection ndash determine the best type of each term

Step 3 Concept Labeling ndash infer the best concept of each entity within context

Text segmentationbull Observations

bull Mutual Exclusion ndash terms containing the same word mutually exclude each other

bull Mutual Reinforcement ndash related terms mutually reinforce each other

bull Build a Candidate Term Graph (CTG)

ldquovacation april in parisrdquo ldquowatch harry potterrdquo

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Find best segmentation

bull Best segmentation= sub-graph in CTG whichbull Is a complete graph (clique)

bull No mutual exclusion

bull Has 100 word coveragebull Except for stopwords

bull Has the largest average edge weight

Is a segmentation

Best segmentation

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Find best segmentation

bull Best segmentation= sub-graph in CTG whichbull Is a complete graph (clique)

bull No mutual exclusion

bull Has 100 word coveragebull Except for stopwords

bull Has the largest average edge weight

Maximal Clique

Best segmentation

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Type Detection

bull Pairwise Modelbull Find the best typed-term for each term so that the

Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight

watch[v]

watch[e]

watch[c]

watch

free[adj]

free[v]

movie[c]

movie[e]

free

movie

Concept Labeling

bull Entity disambiguation is the most important task of concept labelingbull Filterre-rank of the original concept cluster vector

bull Weighted-Votebull The final score of each concept cluster is a combination

of its original score and the support from context using concept co-occurrence

watch harry potter read harry potter

movie book

Example of Entity Disambiguation[Hua et al 2015 (ICDE Best Paper) Hua et al 2016]

Co-occurrence network

Concept Vector

Semantic network

Short Text

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

c1 p1

c2 p2

c3 p3

hellip

ipad apple

fruithellip

companyhellip

foodhellip

producthellip

Is-A

filtering

ldquoipad applerdquo

producthellip

devicehellip

producthellip

brandhellip

companyhellip

devicehellip

co-occur

Is-A

Mining Lexical Relationships[Wang et al 2015b]

bull Lexical knowledge represented by the probabilities

verb

product book

movie

watch harry potter

119901 119907119890119903119887 119908119886119905119888ℎ

119901 119894119899119904119905119886119899119888119890 119908119886119905119888ℎ

119901 119898119900119907119894119890 ℎ119886119903119903119910 119901119900119905119905119890119903

119901 119898119900119907119894119890 119908119886119905119888ℎ 119907119890119903119887

119901 119887119900119900119896 ℎ119886119903119903119910 119901119900119905119905119890119903

119901 119888 119905 119911

119901 119888 119890 =119901 119888 119905 119911 = 119894119899119904119905119886119899119888119890

119901 119911 119905 ①②

③e instancet termc conceptz role

Understanding Queries [Wang et al 2015b]

bull Goal to rank the concepts and findarg max

119888119901(119888|119905 119902)

The offline semantic network

QueryAll possible

segmentations

Random walk with restart [Sun et al 2005]on the online subgraph

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

bull Example ldquopopular smart cover iphone 5srdquo

bull Definition bull Head acts to name the general (semantic) category to which the

whole short text belongs Usually the head is the intent of the short text

bull ldquosmart coverrdquo intent of the query

bull Constraints distinguish this member from other members of the same category

bull ldquoiphone 5srdquo limit the type of the head

bull Non-Constraint Modifiers (aka Pure Modifiers) are subjectivemodifiers which can be dropped without changing intent

bull ldquopopularrdquo subjective can be neglected

Non-Constraint Modifiers Mining Construct Modifier Networks

Edges form a Modifier Network

Concept Hierarchy Tree in ldquoCountryrdquo domain

Modifier Network in ldquoCountryrdquo domainIn this case ldquoLargerdquo and ldquoToprdquo are pure modifiers

Country

Asian country

Developed country

Western country

Asian Developed Western

Western developed

country

Top western country

Large

Large Top

Top

WesternLarge Asian

country

Large developed

country

Top developed

country

Country

Asian Western

Developed

Large Top

bull Betweenness centrality is a measure of a nodes centrality in a network

bull Betweennes of node v is defined as

bull where 120590119904119905 is the total number of shortest paths from node s to node t and 120590119904119905(119907) is the number of those paths that pass through v

bull Normalization amp Aggregation

bull For a pure modifier it should have low betweenness centrality aggregation score PMS(t)

Non-Constraint Modifiers Mining Betweenness centrality

Head-Constraints Mining [Wang et al 2014b]

bull A term can be a head sometimes and be a constraint in some other cases

bull Eg Seattle hotel Seattle hotel job

head headconstraintconstraintconstraint

Head-Constraints Mining Acquiring Concept Patterns

Get entity pairs from query log

Conceptualization

Concept Patterns for each

prepositionsentity1 entity2

Extract Patterns

A for B A of BA with B A in BA on B A at B hellip

entity 1head entity 2constraint

concept11

concept12

concept13

concept14

concept21

concept22

concept23

(concept11 concept21) (concept11 concept22)(concept11 concept23)hellip

Concept Pattern Dictionary

Building concept pattern dictionary Query Logs

cover for iphone 6sbattery for sony a7rwicked on broadway

Why Concepts Canrsquot Be Too Generalbull It may cause too many concept pattern conflicts

canrsquot distinguish head and modifier for general concept pairs

Head Modifier

Derived Concept Pattern device company

Supporting Entity Pairs iphone 4 verizon

modem comcast

wireless router comcast

iphone 4 tmobile

Head Modifier

Derived Concept Pattern company device

Supporting Entity Pairs amazon books kindle

netflix touchpad

skype windows phone

netflix ps3

Conflict

Why Concepts Canrsquot Be Too Specificbull It may generate concepts with little coverage

bull Concept regresses to entitybull Large storage space up to (million million) patterns

hellip hellip

device largest desktop OS vendor

device largest software development company

device largest global corporation

device latest windows and office provider

hellip hellip

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept PatternsCluster size Sum of Cluster Score headconstraintscore

615 2114691 breedstate357298460224501

296 7752357 gameplatform627403476771856

153 3466804 accessoryvehicle53393705094809

70 118259 browserplatform132612807637391

22 1010993 requirementschool271407526294823

34 9489159 drugdisease154602405333541

42 8992995 cosmeticskin condition814659415003929

16 7421599 jobcity27903732555528

32 710403 accessoryphone246513830851194

18 6692376 softwareplatform210126322725878

20 6444603 testdisease239774028397537

27 5994205 clothesbreed98773996282851

19 5913545 penaltycrime200544192793488

25 5848804 taxstate240081818612579

16 5465424 saucemeat183592863621553

18 4809389 credit cardcountry142919087972152

14 4730792 foodholiday14554140330924

11 4536199 modgame257163856882439

29 4350954 garmentsport471533326845442

23 3994886 career informationprofessional732726483731257

15 386065 songinstrument128189481818135

18 378213 baitfish780426514113169

22 3722948 study guidebook508339765053921

19 3408953 pluginsbrowser550326072627126

14 3305753 recipemeat882779863422951

18 3214226 currencycountry110825444188352

13 3180272 lenscamera186081673263957

9 316973 decorationholiday130055844126533

16 314875 foodanimal7338544366514

game platform

game device

video game platform

game console game pad

game gaming platform

Game (Head) Platform (Modifier)

angry birds android

angry birds ios

angry birds windows 10

hellip hellip

Detection

Head Modifier Relationship

bull Train a classifier on

(head-embedding modifier-embedding)

bull Training data bull Positive (head modifier)bull Negative (modifier head)

bull Precision gt= 09 Recall gt= 09

bull Disadvantage not interpretable

Syntactic Parsing based on HM

bull Information is incompletebull Preposition and other function words

bull Within a noun compound el capitan macbook pro

bull Why not train a parser for web queries

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges Short Texts Lack Grammatical Signalsbull Lack function words word order

bull ldquotoys queriesrdquo has ambiguous intent

bull ldquodistance earth moonrdquo has clear intentbull many equivalent forms ldquoearth moon distancerdquo ldquoearth

distance moonrdquo hellip

Challenges Syntactic Parsing of Queries

bullNo standard

bullNo ground-truth

Why is syntactic parsing of queries even a legitimate problem

Derive Syntax from Semantics[Sun et al 2016]

bull Query ldquothai food houstonrdquo

bull Clicked sentence

bull Project dependency to the query

A Treebank for Short Texts

bull Given query 119902

bull Given 119902rsquos clicked sentence 119904

bull Parse each 119904

bull Project dependency from 119904 to 119902

bull Aggregate dependencies

Algorithm of Projection

Result Examples

Results

bull Random queries

QueryParser UAS 083 LAS 075Stanford UAS 072 LAS 064

bull Queries with no function words

QueryParser UAS 082 LAS 073Stanford UAS 070 LAS 061

bull Queries with function words

QueryParser UAS 090 LAS 085Stanford UAS 086 LAS 080

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

bull Measuring similarity between two short texts and sentences

bull Basic idea word-by-word comparison using embedding vector

bull Use saliency-weighted semantic graph to computer similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired

Bins of all edges Bins of max edges

119908isin119904119868

119868119863119865(119908) sdot)119904119890119898(119908 119904119904) sdot (1198961 + 1

൰119904119890119898(119908 119904119904) + 1198961 sdot (1 minus 119887 + 119887 sdot|119904119904|119886119907119892119897

Similarity measurement

termShort texts

Inspired by BM25

Semantic similarity

119891119904119905119904(119904119897 119904119904) =

From the Concept View

From the Concept View [Wang et al 2015a]

Co-occurrence Network

Bags of Concepts

Semantic Network

Short Text 1

Short Text 2

Concept Vector 1[(c1 score1) (c2 score2)hellip]

Concept Vector 2[(c1rsquo score1rsquo) (c2rsquo score2rsquo)hellip]

Similarity

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

bull Explicit short text understanding benefit lot of application scenariosbull Adssearch semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

000

100

200

300

400

500

600

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

000

010

020

030

040

050

060

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

Mainline Ads Sidebar Ads

Definition Mining [Hao et al 2016]

bull Definition scenarios search engines QnA etc

bull Why Conceptualization is useful for definition miningbull Examples ldquoWhat is Emphysemardquo

Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

bull This sentence has the form of definitionbull Embedding is helpful to some extent but it also return high similarity

score for (emphysema disease) and (emphysema smoking)

bull Conceptualization can provide strong semanticsbull Contextual embedding can also provide semantic similarity beyond Is-A

Answer 1

Answer 2

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

OfflineOffline OnlineOnline

Original Short textjustin bieber graduates

hellip

Knowledge base

Conceptualiztion

Concept Vector

Entity Extraction

Candidates Generation

Classification amp Ranking

Model LearningModel Learning

Concept Weighting

Model Model NModel i

Concept Model Concept Model

Class 1 Class NClass i

TrainingData

ltMusic Scoregt

Concept based Short Text Classification and Ranking [Wang et al 2014a]

[Figures: a category such as TV is mapped into the concept space through the article titles/tags in that category (points p_i, p_j); each category (Music, Movie, TV, …) is then summarized in the concept space with weights ω_i, ω_j; at query time the query is projected into the same concept space and compared against the category representations.]

Precision performance on each category [Wang et al 2014a]

Category   BocSTC   LM_ch   SVM    VSM_cosine   LM_d   Entity_ESA
Movie      0.71     0.91    0.84   0.81         0.72   0.56
Money      0.97     0.95    0.54   0.57         0.52   0.74
Music      0.97     0.90    0.88   0.73         0.68   0.58
TV         0.96     0.46    0.92   0.56         0.51   0.55

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

• [Stark et al. 1998] Michael M. Stark and Richard F. Riesenfeld. WordNet: An Electronic Lexical Database. In Proceedings of the 11th Eurographics Workshop on Rendering, 1998.

• [Banko et al. 2007] Michele Banko, Michael J. Cafarella, Stephen Soderland, Matt Broadhead, and Oren Etzioni. Open Information Extraction from the Web. In IJCAI, 2007.

• [Etzioni et al. 2011] Oren Etzioni, Anthony Fader, Janara Christensen, Stephen Soderland, and Mausam. Open Information Extraction: The Second Generation. In IJCAI, vol. 11, pp. 3-10, 2011.

• [Carlson et al. 2010] A. Carlson, J. Betteridge, B. Kisiel, B. Settles, E. R. Hruschka Jr., and T. M. Mitchell. Toward an Architecture for Never-Ending Language Learning. In Proceedings of the Conference on Artificial Intelligence (AAAI), 2010.

• [Wu et al. 2012] Wentao Wu, Hongsong Li, Haixun Wang, and Kenny Zhu. Probase: A Probabilistic Taxonomy for Text Understanding. In ACM International Conference on Management of Data (SIGMOD), May 2012.

• [Bollacker et al. 2008] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In SIGMOD, 2008.

• [Auer et al. 2007] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary G. Ives. DBpedia: A Nucleus for a Web of Open Data. In ISWC/ASWC, 2007.

References

• [Suchanek et al. 2007] Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. Yago: a core of semantic knowledge. In WWW, 2007.

• [Wu et al. 2015] Sen Wu, Ce Zhang, Christopher De Sa, Jaeho Shin, Feiran Wang, and C. Ré. Incremental Knowledge Base Construction Using DeepDive. In VLDB, 2015.

• [Navigli et al. 2012] R. Navigli and S. Ponzetto. BabelNet: The Automatic Construction, Evaluation and Application of a Wide-Coverage Multilingual Semantic Network. In Artificial Intelligence, 2012.

• [Nastase et al. 2010] Vivi Nastase, Michael Strube, Benjamin Börschinger, Cäcilia Zirn, and Anas Elghafari. WikiNet: A very large scale multi-lingual concept network. In LREC, 2010.

• [Speer et al. 2013] Robert Speer and Catherine Havasi. ConceptNet 5: A large semantic network for relational knowledge. In The People's Web Meets NLP, Springer Berlin Heidelberg, 2013.

• [Hua et al. 2016] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng, and Xiaofang Zhou. Understand Short Texts by Harvesting and Analyzing Semantic Knowledge. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2016.

• [Hua et al. 2015] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng, and Xiaofang Zhou. Short Text Understanding Through Lexical-Semantic Analysis. In International Conference on Data Engineering (ICDE), April 2015.

References

• [Li et al. 2013] Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang, and Xindong Wu. Computing term similarity by large probabilistic isA knowledge. In ACM International Conference on Information and Knowledge Management (CIKM), 2013.

• [Li et al. 2015] Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang, Xue-Gang Hu, and Xindong Wu. A Large Probabilistic Semantic Network based Approach to Compute Term Similarity. IEEE Transactions on Knowledge and Data Engineering (TKDE), 27(10): 2604-2617, 2015.

• [Rosch et al. 1976] Eleanor Rosch, Carolyn B. Mervis, Wayne D. Gray, David M. Johnson, and Penny Boyes-Braem. Basic objects in natural categories. Cognitive Psychology, 8(3): 382-439, 1976.

• [Manning and Schutze 1999] Christopher D. Manning and Hinrich Schutze. Foundations of Statistical Natural Language Processing. Volume 999, MIT Press, 1999.

• [Wang et al. 2015b] Zhongyuan Wang, Kejun Zhao, Haixun Wang, Xiaofeng Meng, and Ji-Rong Wen. Query Understanding through Knowledge-Based Conceptualization. In IJCAI, July 2015.

• [Bergsma et al. 2007] Shane Bergsma and Qin Iris Wang. Learning Noun Phrase Query Segmentation. In EMNLP-CoNLL 2007, 819-826.

• [Tan et al. 2008] Bin Tan and Fuchun Peng. Unsupervised query segmentation using generative language models and Wikipedia. In WWW 2008, 347-356.

References

• [Li et al. 2011] Yanen Li, Bo-June Paul Hsu, ChengXiang Zhai, and Kuansan Wang. Unsupervised query segmentation using clickthrough for information retrieval. In SIGIR 2011, 285-294.

• [Guo et al. 2009] Jiafeng Guo, Gu Xu, Xueqi Cheng, and Hang Li. Named entity recognition in query. In SIGIR 2009, 267-274.

• [Pantel et al. 2012] Patrick Pantel, Thomas Lin, and Michael Gamon. Mining Entity Types from Query Logs via User Intent Modeling. In ACL 2012, 563-571.

• [Joshi et al. 2014] Mandar Joshi, Uma Sawant, and Soumen Chakrabarti. Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries. In EMNLP 2014, 1104-1114.

• [Sawant et al. 2013] Uma Sawant and Soumen Chakrabarti. Learning joint query interpretation and response ranking. In WWW 2013, 1099-1110.

• [Wang et al. 2014b] Zhongyuan Wang, Haixun Wang, and Zhirui Hu. Head, Modifier, and Constraint Detection in Short Texts. In International Conference on Data Engineering (ICDE), 2014.

• [Sun et al. 2016] Xiangyan Sun, Haixun Wang, Yanghua Xiao, and Zhongyuan Wang. Syntactic Parsing of Web Queries. In EMNLP 2016.

References

• [Kenter and Rijke 2015] Tom Kenter and Maarten de Rijke. Short text similarity with word embeddings. In CIKM 2015.

• [Wang et al. 2015a] Zhongyuan Wang, Haixun Wang, Ji-Rong Wen, and Yanghua Xiao. An Inference Approach to Basic Level of Categorization. In CIKM, October 2015.

• [Hao et al. 2016] Zehui Hao, Zhongyuan Wang, Xiaofeng Meng, and Jun Yan. Combining Language Model with Conceptualization for Definition Ranking. MSR Technical Report, 2016.

• [Wang et al. 2014a] Fang Wang, Zhongyuan Wang, Zhoujun Li, and Ji-Rong Wen. Concept-based Short Text Classification and Ranking. In CIKM 2014.

• [Wang et al. 2012a] Jingjing Wang, Haixun Wang, Zhongyuan Wang, and Kenny Zhu. Understanding Tables on the Web. In International Conference on Conceptual Modeling, October 2012.

• [Wang et al. 2012b] Yue Wang, Hongsong Li, Haixun Wang, and Kenny Zhu. Toward Topic Search on the Web. In International Conference on Conceptual Modeling, October 2012.


bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledgeltigt in SIGMOD 2008

bull [ Auer et al 2007 ] Soumlren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Boumlrschinger Caumlcilia Zirn and AnasElghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and XindongWu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny BoyesBraem Basic objects in natural categories Cognitive psychology 8(3)382ndash439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny ZhuToward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 54: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Examples

Term 1 | Term 2 | Similarity
lunch | dinner | 0.9987
tiger | jaguar | 0.9792
car | plane | 0.9711
television | radio | 0.9465
technology company | microsoft | 0.8208
high impact sport | competitive sport | 0.8155
employer | large corporation | 0.5353
fruit | green pepper | 0.2949
travel | meal | 0.0426
music | lunch | 0.0116
alcoholic beverage | sports equipment | 0.0314
company | table tennis | 0.0003

Full results: http://adapt.seiee.sjtu.edu.cn/similarity/SimCompleteResults.pdf

Statistics of Search Queries

(a) By traffic: 1 term 44%, 2 terms 29%, 3 terms 17%, 4 terms 7%, 5 terms 2%, more than 5 terms 1%
(b) By # of distinct queries: 1 term 10%, 2 terms 26%, 3 terms 34%, 4 terms 19%, 5 terms 7%, more than 5 terms 4%

Pokémon Go, Microsoft HoloLens — [pie charts: number of instances per query: 1 Instance / 2 Instances / 3 Instances / 4 Instances / 5 Instances / More than 5 Instances]

If the short text has context for the instance…
• python tutorial
• dangerous python
• moon earth distance
• …

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Supervised Segmentation [Bergsma et al 2007]

• Problem: divide a query into semantic units
• Approach: turn segmentation into position-based binary classification

Example query: "two man power saw"
Candidate segmentations: [two man] [power saw]; [two] [man] [power saw]; [two] [man power] [saw]

Input: a query and its positions
Output: the decision on whether to break the query at each position

Supervised Segmentation

• Features
  • Decision-boundary features — e.g., indicators such as the POS tags in the query; position features (forward/backward)
  • Statistical features — e.g., mutual information between the left and right parts ("bank loan | amortization schedule")
  • Context features — context information around the boundary
  • Dependency features — e.g., in "female bus driver", "female" depends on "driver"

Supervised Segmentation

• Segmentation overview: input query "two man power saw" → extract learning features at each position between adjacent words (two | man | power | saw) → SVM classifier → output a segmentation decision (yes/no) for each position
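A minimal sketch of this position-based binary classification, assuming a toy PMI lookup and toy labeled boundaries (both hypothetical); it is not the authors' feature set, only the shape of the approach.

```python
# Sketch: query segmentation as per-boundary binary classification (cf. Bergsma & Wang 2007).
# The PMI table and training data are hypothetical toys; real systems use far richer features.
from sklearn.svm import LinearSVC

PMI = {("two", "man"): 2.0, ("man", "power"): -0.5, ("power", "saw"): 3.0,
       ("bank", "loan"): 1.5, ("loan", "amortization"): -0.8, ("amortization", "schedule"): 2.2}

def boundary_features(words, i):
    """Features for the boundary between words[i] and words[i+1]."""
    left, right = words[i], words[i + 1]
    return [
        PMI.get((left, right), 0.0),   # statistical feature: mutual information across the boundary
        i,                             # forward position of the boundary
        len(words) - 2 - i,            # backward position of the boundary
    ]

def featurize(query):
    words = query.split()
    return [boundary_features(words, i) for i in range(len(words) - 1)]

# Toy training data: 1 = insert a break at this boundary, 0 = keep the words together.
train_queries = ["two man power saw", "bank loan amortization schedule"]
train_labels = [[0, 1, 0], [0, 1, 0]]

X = [f for q in train_queries for f in featurize(q)]
y = [l for labels in train_labels for l in labels]
clf = LinearSVC().fit(X, y)

def segment(query):
    words = query.split()
    breaks = clf.predict(featurize(query))
    segments, current = [], [words[0]]
    for i, brk in enumerate(breaks):
        if brk:
            segments.append(" ".join(current))
            current = []
        current.append(words[i + 1])
    segments.append(" ".join(current))
    return segments

print(segment("two man power saw"))  # e.g. ['two man', 'power saw']
```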

Unsupervised Segmentation [Tan et al 2008]

• Unsupervised learning for query segmentation

Probability of a generated segmentation S for query Q:

P(S|Q) = P(s_1) P(s_2|s_1) ⋯ P(s_m|s_1 s_2 ⋯ s_{m−1}) ≈ ∏_{s_i ∈ S} P(s_i)   (unigram model over the segments s_i)

Valid segment boundary if and only if the pointwise mutual information between the two segments resulting from the split is negative:

MI(s_k, s_{k+1}) = log [ P_c([s_k s_{k+1}]) / (P_c(s_k) · P_c(s_{k+1})) ] < 0

Example: "new york | times subscription" — for the split between "new" and "york", log [ P_c([new york]) / (P_c(new) · P_c(york)) ] > 0, so there is no segment boundary there.

Unsupervised Segmentation

• Find the top-k segmentations by dynamic programming
• Use EM optimization on the fly

Input: a query w_1 w_2 … w_n (the words in the query) and a concept probability distribution
Output: the top-k segmentations with the highest likelihood
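A minimal sketch of the unigram dynamic program for the single best segmentation, assuming a hypothetical segment-probability table; keeping k-best lists per position instead of a single best entry would yield the top-k segmentations.

```python
# Sketch: best unigram segmentation by dynamic programming, P(S) = prod_i P(s_i).
# SEG_PROB is a hypothetical segment-probability table; unseen segments get a small floor.
import math

SEG_PROB = {"new york": 0.02, "new york times": 0.01, "times": 0.05,
            "subscription": 0.03, "new": 0.04, "york": 0.03}
FLOOR = 1e-9

def log_p(segment):
    return math.log(SEG_PROB.get(segment, FLOOR))

def best_segmentation(query, max_len=5):
    words = query.split()
    n = len(words)
    best = [(0.0, [])] + [(-math.inf, None)] * n   # best[i] = (log-prob, segmentation of words[:i])
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            seg = " ".join(words[j:i])
            score = best[j][0] + log_p(seg)
            if score > best[i][0]:
                best[i] = (score, best[j][1] + [seg])
    return best[n]

print(best_segmentation("new york times subscription"))
# e.g. (-8.1, ['new york times', 'subscription']) under the toy probabilities above
```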

Exploit Click-through [Li et al 2011]

• Motivation
  • Probabilistic query segmentation
  • Use click-through data

Input query: "bank of america online banking"
Output: top-3 segmentations
  [bank of america] [online banking]    0.502
  [bank of america online banking]      0.428
  [bank of] [america] [online banking]  0.001

Click data: Q → URL → D (query → clicked URL → document)

Exploit Click-through

• Segmentation Model

An interpolated model combining global information with click-through information.

Example query: [credit card] [bank of america]
Clicked HTML documents:
  1. bank of america credit cards contact us overview
  2. secured visa credit card from bank of america
  3. credit cards overview: find the right bank of america credit card for you
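A minimal sketch of interpolated segmentation scoring that mixes a global language-model probability with a click-through probability; the mixing weight lam and both probability tables are hypothetical, not the paper's exact estimator.

```python
# Sketch: interpolated segmentation scoring, mixing global and click-through evidence.
# P_GLOBAL and P_CLICK are hypothetical per-segment probabilities; lam is the mixing weight.
import math

P_GLOBAL = {"bank of america": 0.012, "online banking": 0.008, "bank of": 0.001, "america": 0.004}
P_CLICK = {"bank of america": 0.09, "online banking": 0.05, "bank of": 0.002, "america": 0.001}

def segment_score(segmentation, lam=0.6, floor=1e-9):
    """Log-probability of a segmentation under the interpolated model."""
    score = 0.0
    for seg in segmentation:
        p = lam * P_GLOBAL.get(seg, 0.0) + (1 - lam) * P_CLICK.get(seg, 0.0)
        score += math.log(max(p, floor))
    return score

candidates = [
    ["bank of america", "online banking"],
    ["bank of america online banking"],
    ["bank of", "america", "online banking"],
]
for seg in sorted(candidates, key=segment_score, reverse=True):
    print(round(segment_score(seg), 2), seg)
```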

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Sense Changes with Different Context

watch harry potter read harry potter age harry potter

Movie Book Character

harry potter walkthrough

Game

Entity Recognition in Query [Guo et al 2009]

bull Motivation

Detect named entity in a short text and categorize it

Example: "harry potter walkthrough" is a single-named-entity query. It is represented as the triple <e, t, c> = ("harry potter", "walkthrough", "game"), where e is the ambiguous term (the named entity), t is the context term, and c is the class of the entity.

Entity Recognition in Query

bull Probabilistic Generative Model

Goal: given a query q, find the triple <e, t, c> that maximizes the probability.
Probability to generate a triple (assuming the context depends only on the class):
  Pr(e, t, c) = Pr(e) · Pr(c|e) · Pr(t|c)
Objective: given query q, find the most probable triple. The problem then becomes how to estimate Pr(e), Pr(c|e) and Pr(t|c).
E.g., "walkthrough" depends only on the class "game", not on "harry potter" itself.

Entity Recognition in Query

bull Probability Estimation by Learning

Learning objective:  max ∏_{i=1}^{N} P(e_i, t_i, c_i)

Challenge: it is difficult as well as time-consuming to manually assign class labels to named entities in queries.

Build a training set T = {(e_i, t_i)} and view c_i as a hidden variable.

New learning problem:  max ∏_{i=1}^{N} P(e_i, t_i) = max ∏_{i=1}^{N} P(e_i) Σ_c P(c|e_i) P(t_i|c)

solved with the topic model WS-LDA
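A minimal sketch of ranking candidate <e, t, c> interpretations with the Pr(e)·Pr(c|e)·Pr(t|c) factorization; all probability tables below are hypothetical toy values, not learned WS-LDA estimates.

```python
# Sketch: rank <entity, context, class> interpretations of a query with
# Pr(e) * Pr(c|e) * Pr(t|c). All probability tables are toy, hypothetical values.
P_E = {"harry potter": 0.7, "harry": 0.3}
P_C_GIVEN_E = {"harry potter": {"game": 0.3, "movie": 0.4, "book": 0.3}}
P_T_GIVEN_C = {"game": {"walkthrough": 0.2}, "movie": {"walkthrough": 0.001}, "book": {"walkthrough": 0.001}}

def interpretations(query):
    """Yield (score, e, t, c) for every way of treating a prefix of the query as the entity."""
    words = query.split()
    for i in range(1, len(words)):
        e, t = " ".join(words[:i]), " ".join(words[i:])
        for c, p_ce in P_C_GIVEN_E.get(e, {}).items():
            score = P_E.get(e, 0.0) * p_ce * P_T_GIVEN_C.get(c, {}).get(t, 0.0)
            yield score, e, t, c

best = max(interpretations("harry potter walkthrough"), default=None)
print(best)  # e.g. (0.042, 'harry potter', 'walkthrough', 'game')
```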

Signal from Click [Pantel et al 2012]

bull Motivation

Predict entity type in Web search

Signals used: the entity, the user intent, the context, and the click.
Query type distribution (73 types); a generative model over entity types.

Signal from Click

bull Joint Model for Prediction

[Plate diagram with variables t, τ, i, n, c and parameters θ, φ, ω over Q queries.]
Generative story — for each query: pick a type (distribution over types), pick an entity (entity distribution), pick an intent (intent distribution), pick the context words (word distribution), and pick a click (host distribution).

Telegraphic Query interpretation [Sawant et al 2013 Joshi et al 2014]

bull Entity-seeking Telegraphic Queries

bull Interpretation = Segmentation + Annotation

Combine a knowledge base (for accuracy) with a large corpus (for recall); e.g., query "Germany capital" → result entity "Berlin".

bull Overview

Joint Interpretation and Ranking [Sawant et al 2013 Joshi et al 2014]

Telegraphic query + annotated corpus → candidate entities e1, e2, e3 → two models for interpretation and ranking (a generative model and a discriminative model) → output.

bull Generative Model

Joint Interpretation and Ranking [Sawant et al 2013]

Example query q = "losing team baseball world series 1998", with type hint "baseball team". The candidate answer entity E = San Diego Padres, of type T = major league baseball team, is supported by corpus evidence such as "Padres have been to two World Series, losing in 1984 and 1998"; context matchers link the query words ("lost", "1998", "world series") to this snippet, and a switch variable Z decides how each query word is generated (type model vs. context model). Based on probabilistic language models. [Figure borrowed from U. Sawant (2013)]

bull Discriminative Model

Joint Interpretation and Ranking [Sawant et al 2013]

For the same query, feature vectors are formed for candidate answer entities and their types: the correct entity San_Diego_Padres (t = baseball team) should outrank an incorrect entity such as 1998_World_Series (t = series). Ranking is based on max-margin discriminative learning.

• Queries seek answer entities (e2)
• They contain (query) entities (e1), target types (t2), relations (r) and selectors (s)

Telegraphic Query Interpretation [Joshi et al 2014]

Example interpretations (columns: query | e1 | r | t2 | s):
  dave navarro first band → dave navarro | band | band | first, or dave navarro | – | band | first
  spider automobile company → spider | automobile company | automobile company | –, or – | automobile company | company | spider

[Borrowed from M. Joshi (2014)]

Improved Generative Model

• Generative model [Sawant et al 2013] → improved in [Joshi et al 2014] to also consider e1 (in q) and r

Improved Discriminative Model

• Discriminative model [Sawant et al 2013] → improved in [Joshi et al 2014] to also consider e1 (in q) and r

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

bull Input a short text

bull Output semantic interpretation

bull Three steps in understanding a short text

Example: "wanna watch eagles band" → watch[verb] eagles[entity](band) band[concept]

Step 1: Text Segmentation – divide the short text into a sequence of vocabulary terms ("wanna watch eagles band" → watch | eagles | band)
Step 2: Type Detection – determine the best type of each term (watch[verb] eagles[entity] band[concept])
Step 3: Concept Labeling – infer the best concept of each entity within its context (eagles[entity] → the band)

Text segmentation
• Observations:
  • Mutual Exclusion – terms containing the same word mutually exclude each other
  • Mutual Reinforcement – related terms mutually reinforce each other
• Build a Candidate Term Graph (CTG), e.g., for "vacation april in paris" and "watch harry potter"

[Candidate term graph figures for the two examples: nodes such as "vacation", "april", "paris", "april in paris" and "watch", "harry potter"; edges carry reinforcement weights (e.g., 0.029, 0.047, 0.092), and terms sharing a word (e.g., "april" and "april in paris") mutually exclude each other.]

Find best segmentation

• Best segmentation = the sub-graph of the CTG which
  • is a complete graph (clique) with no mutual exclusion and 100% word coverage (except for stopwords) → it is a segmentation
  • and, among those, has the largest average edge weight → it is the best segmentation


Find best segmentation
• The best segmentation is therefore a maximal clique in the CTG (complete, no mutual exclusion, 100% word coverage except stopwords) with the largest average edge weight.

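A minimal sketch of picking the best segmentation as the full-coverage, conflict-free set of candidate terms with the largest average pairwise edge weight; the candidate terms and edge weights are hypothetical toys, and a brute-force subset search stands in for proper maximal-clique enumeration.

```python
# Sketch: choose the best segmentation as the full-coverage, conflict-free set of candidate
# terms with the largest average pairwise edge weight. Toy graph, hypothetical weights.
from itertools import combinations

QUERY = "vacation april in paris"
CANDIDATES = ["vacation", "april", "paris", "april in paris"]
EDGE = {("vacation", "april in paris"): 0.047, ("vacation", "april"): 0.029,
        ("vacation", "paris"): 0.041, ("april", "paris"): 0.005}

def weight(a, b):
    return EDGE.get((a, b), EDGE.get((b, a), 0.0))

def words(term):
    return set(term.split())

def covers(selection, query):
    stopwords = {"in", "the", "a", "of"}
    return set(query.split()) - stopwords <= set().union(*map(words, selection))

def compatible(selection):
    # mutual exclusion: two candidate terms may not share a word
    return all(words(a).isdisjoint(words(b)) for a, b in combinations(selection, 2))

def best_segmentation(query, candidates):
    best, best_score = None, float("-inf")
    for r in range(1, len(candidates) + 1):
        for selection in combinations(candidates, r):
            if not (compatible(selection) and covers(selection, query)):
                continue
            pairs = list(combinations(selection, 2))
            score = sum(weight(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
            if score > best_score:
                best, best_score = selection, score
    return best, best_score

print(best_segmentation(QUERY, CANDIDATES))  # (('vacation', 'april in paris'), 0.047)
```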

Type Detection

• Pairwise Model
  • Find the best typed-term for each term so that the Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight.
  • E.g., candidate typed-terms: watch → {watch[v], watch[e], watch[c]}, free → {free[adj], free[v]}, movie → {movie[c], movie[e]}.

Concept Labeling

• Entity disambiguation is the most important task of concept labeling
  • Filter/re-rank the original concept cluster vector
• Weighted-Vote
  • The final score of each concept cluster is a combination of its original score and the support from context, using concept co-occurrence
  • E.g., "watch harry potter" → movie; "read harry potter" → book
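A minimal sketch of the weighted-vote idea, combining an entity's original concept scores with co-occurrence support from its context; the score tables and the mixing weight alpha are hypothetical.

```python
# Sketch: weighted-vote concept re-ranking for an ambiguous entity.
# final(c) = alpha * original(c) + (1 - alpha) * support(c | context terms)
ORIGINAL = {"movie": 0.45, "book": 0.40, "character": 0.15}          # concepts for "harry potter"
COOCCUR = {("watch", "movie"): 0.8, ("watch", "book"): 0.1,          # hypothetical co-occurrence
           ("read", "book"): 0.9, ("read", "movie"): 0.2}            # strengths with context terms

def rerank(original, context_terms, alpha=0.5):
    scores = {}
    for concept, base in original.items():
        support = sum(COOCCUR.get((t, concept), 0.0) for t in context_terms) / max(len(context_terms), 1)
        scores[concept] = alpha * base + (1 - alpha) * support
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(rerank(ORIGINAL, ["watch"]))  # "movie" ranks first
print(rerank(ORIGINAL, ["read"]))   # "book" ranks first
```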

Example of Entity Disambiguation[Hua et al 2015 (ICDE Best Paper) Hua et al 2016]

Co-occurrence network

Concept Vector

Semantic network

Short Text

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

[Conceptualization example for "ipad apple": the isA network proposes concept vectors (c1, p1), (c2, p2), (c3, p3), … for each term — for "apple": fruit, company, food, product, …; for "ipad": product, device, brand, company, …; co-occurrence-based filtering then keeps the concepts consistent across the two terms (product, device, company, brand) and drops fruit/food.]

Mining Lexical Relationships[Wang et al 2015b]

• Lexical knowledge is represented by probabilities over e (instance), t (term), c (concept), and z (role):
  ① p(z|t) — e.g., p(verb | watch), p(instance | watch)
  ② p(c|t, z) — e.g., p(movie | watch, verb)
  ③ p(c|e) = p(c | t, z = instance) — e.g., p(movie | harry potter), p(book | harry potter)
• Example: in "watch harry potter", "watch" is most likely a verb and "harry potter" an instance of movie (rather than book or product).

Understanding Queries [Wang et al 2015b]

• Goal: rank the concepts and find  argmax_c p(c | t, q)
• Query → all possible segmentations → build an online subgraph from the offline semantic network → random walk with restart [Sun et al 2005] on the online subgraph
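A minimal sketch of random walk with restart on a small term–concept graph, restarting at the query terms so that concepts well connected to the whole query rank highest; the graph, weights, and restart probability are hypothetical.

```python
# Sketch: random walk with restart (RWR) to rank concepts for the terms of a query.
# Tiny hypothetical term/concept graph; real systems build it from a semantic network.
import numpy as np

nodes = ["watch", "harry potter", "movie", "book", "verb"]
edges = {("watch", "verb"): 0.6, ("watch", "movie"): 0.4,
         ("harry potter", "movie"): 0.5, ("harry potter", "book"): 0.5}

idx = {n: i for i, n in enumerate(nodes)}
A = np.zeros((len(nodes), len(nodes)))
for (a, b), w in edges.items():
    A[idx[a], idx[b]] = A[idx[b], idx[a]] = w
P = A / A.sum(axis=0, keepdims=True).clip(min=1e-12)   # column-stochastic transition matrix

def rwr(seeds, restart=0.3, iters=100):
    r = np.zeros(len(nodes))
    for s in seeds:
        r[idx[s]] = 1.0 / len(seeds)                    # restart distribution over query terms
    p = r.copy()
    for _ in range(iters):
        p = (1 - restart) * P @ p + restart * r         # power iteration
    return sorted(zip(nodes, p), key=lambda kv: -kv[1])

print(rwr(["watch", "harry potter"]))                   # concepts reachable from both terms rank high
```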

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

• Example: "popular smart cover iphone 5s"
• Definitions:
  • Head — names the general (semantic) category to which the whole short text belongs; usually the head is the intent of the short text. Here "smart cover" is the intent of the query.
  • Constraints — distinguish this member from other members of the same category. Here "iphone 5s" limits the type of the head.
  • Non-Constraint Modifiers (a.k.a. Pure Modifiers) — subjective modifiers that can be dropped without changing the intent. Here "popular" is subjective and can be neglected.

Non-Constraint Modifiers Mining Construct Modifier Networks

Edges form a Modifier Network

Concept Hierarchy Tree in the "Country" domain vs. the Modifier Network in the "Country" domain. In this case, "Large" and "Top" are pure modifiers.

[Figures: the hierarchy Country → Asian country, Developed country, Western country (with further combinations such as Western developed country, Top western country, Large Asian country, Large developed country, Top developed country), and the corresponding modifier network over the modifiers Asian, Western, Developed, Large, Top.]

Non-Constraint Modifiers Mining: Betweenness Centrality
• Betweenness centrality is a measure of a node's centrality in a network.
• The betweenness of node v is defined as
  betweenness(v) = Σ_{s ≠ v ≠ t} σ_st(v) / σ_st
  where σ_st is the total number of shortest paths from node s to node t and σ_st(v) is the number of those paths that pass through v.
• Normalization & Aggregation
  • A pure modifier should have a low betweenness-centrality aggregation score PMS(t).
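A minimal sketch of spotting pure-modifier candidates as low-betweenness nodes; the toy modifier network below is hypothetical, whereas the paper mines it from a concept hierarchy.

```python
# Sketch: spot pure modifiers as low-betweenness nodes in a modifier network.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("country", "Asian"), ("country", "Developed"), ("country", "Western"),
    ("Asian", "Developed"), ("Developed", "Western"),   # constraint modifiers interconnect
    ("country", "Large"), ("country", "Top"),           # pure modifiers hang off the side
])

centrality = nx.betweenness_centrality(G, normalized=True)
for node, score in sorted(centrality.items(), key=lambda kv: kv[1]):
    print(f"{node:10s} {score:.3f}")   # low-scoring nodes ("Large", "Top") are pure-modifier candidates
```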

Head-Constraints Mining [Wang et al 2014b]

bull A term can be a head sometimes and be a constraint in some other cases

• E.g., "Seattle hotel": Seattle = constraint, hotel = head; "Seattle hotel job": Seattle = constraint, hotel = constraint, job = head.

Head-Constraints Mining Acquiring Concept Patterns

Pipeline for building the concept pattern dictionary from query logs:
1. Extract preposition patterns from the query log: "A for B", "A of B", "A with B", "A in B", "A on B", "A at B", …
   Examples: "cover for iphone 6s", "battery for sony a7r", "wicked on broadway"
2. Get entity pairs (entity1 = head, entity2 = constraint) for each preposition.
3. Conceptualization: map entity1 to concepts {concept11, concept12, concept13, concept14, …} and entity2 to {concept21, concept22, concept23, …}.
4. Derive concept patterns for each pair — (concept11, concept21), (concept11, concept22), (concept11, concept23), … — and aggregate them into the Concept Pattern Dictionary.
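A minimal sketch of this pipeline over a toy query log, using a small hypothetical isA table in place of real conceptualization.

```python
# Sketch: build a (head-concept, constraint-concept) pattern dictionary from "A <prep> B" queries.
# The query log and the isA table are tiny hypothetical stand-ins.
import re
from collections import Counter
from itertools import product

QUERY_LOG = ["cover for iphone 6s", "battery for sony a7r", "wicked on broadway"]
ISA = {  # entity -> concepts (Basic-Level Conceptualization would give a ranked list)
    "cover": ["accessory"], "battery": ["accessory"], "wicked": ["musical"],
    "iphone 6s": ["phone", "device"], "sony a7r": ["camera", "device"], "broadway": ["venue"],
}
PREPS = r"\b(for|of|with|in|on|at)\b"

pattern_counts = Counter()
for query in QUERY_LOG:
    match = re.search(PREPS, query)
    if not match:
        continue
    head_entity = query[: match.start()].strip()       # A = head
    constraint_entity = query[match.end():].strip()    # B = constraint
    for hc, cc in product(ISA.get(head_entity, []), ISA.get(constraint_entity, [])):
        pattern_counts[(hc, cc)] += 1

for (head_c, constraint_c), count in pattern_counts.most_common():
    print(f"{head_c} / {constraint_c}: {count}")
# e.g. accessory / device: 2, accessory / phone: 1, accessory / camera: 1, musical / venue: 1
```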

Why Concepts Can't Be Too General
• It may cause too many concept pattern conflicts: we can't distinguish head and modifier for general concept pairs.

Derived concept pattern: device (head) / company (modifier)
  Supporting entity pairs: (iphone 4, verizon), (modem, comcast), (wireless router, comcast), (iphone 4, tmobile)

Derived concept pattern: company (head) / device (modifier)
  Supporting entity pairs: (amazon books, kindle), (netflix, touchpad), (skype, windows phone), (netflix, ps3)

→ Conflict: the two derived patterns contradict each other.

Why Concepts Can't Be Too Specific
• It may generate concepts with little coverage
  • The concept regresses to the entity itself
  • Large storage space: up to (million × million) patterns
• Examples of over-specific patterns:
  device / largest desktop OS vendor
  device / largest software development company
  device / largest global corporation
  device / latest windows and office provider
  …

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept Patterns (Cluster size | Sum of Cluster Score | head / constraint | score)
615 | 2114691 | breed / state | 357298460224501
296 | 7752357 | game / platform | 627403476771856
153 | 3466804 | accessory / vehicle | 53393705094809
70 | 118259 | browser / platform | 132612807637391
22 | 1010993 | requirement / school | 271407526294823
34 | 9489159 | drug / disease | 154602405333541
42 | 8992995 | cosmetic / skin condition | 814659415003929
16 | 7421599 | job / city | 27903732555528
32 | 710403 | accessory / phone | 246513830851194
18 | 6692376 | software / platform | 210126322725878
20 | 6444603 | test / disease | 239774028397537
27 | 5994205 | clothes / breed | 98773996282851
19 | 5913545 | penalty / crime | 200544192793488
25 | 5848804 | tax / state | 240081818612579
16 | 5465424 | sauce / meat | 183592863621553
18 | 4809389 | credit card / country | 142919087972152
14 | 4730792 | food / holiday | 14554140330924
11 | 4536199 | mod / game | 257163856882439
29 | 4350954 | garment / sport | 471533326845442
23 | 3994886 | career information / professional | 732726483731257
15 | 386065 | song / instrument | 128189481818135
18 | 378213 | bait / fish | 780426514113169
22 | 3722948 | study guide / book | 508339765053921
19 | 3408953 | plugins / browser | 550326072627126
14 | 3305753 | recipe / meat | 882779863422951
18 | 3214226 | currency / country | 110825444188352
13 | 3180272 | lens / camera | 186081673263957
9 | 316973 | decoration / holiday | 130055844126533
16 | 314875 | food / animal | 7338544366514

The "game / platform" cluster groups patterns such as: game / platform, game / device, video game / platform, game console / game pad, game / gaming platform.

Example instances — Game (Head) | Platform (Modifier):
  angry birds | android
  angry birds | ios
  angry birds | windows 10
  … | …

Head-Modifier Relationship Detection
• Train a classifier on (head-embedding, modifier-embedding)
• Training data:
  • Positive: (head, modifier)
  • Negative: (modifier, head)
• Precision >= 0.9, Recall >= 0.9
• Disadvantage: not interpretable
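A minimal sketch of such a direction classifier over concatenated word embeddings, with reversed pairs as negatives; the embeddings, pairs, and the choice of logistic regression are hypothetical stand-ins for the actual setup.

```python
# Sketch: head-modifier direction classifier on concatenated word embeddings.
# Toy 4-d embeddings and a handful of (head, modifier) pairs; real setups use pretrained vectors.
import numpy as np
from sklearn.linear_model import LogisticRegression

EMB = {  # hypothetical embeddings
    "game": [0.9, 0.1, 0.0, 0.2], "platform": [0.1, 0.8, 0.1, 0.0],
    "software": [0.8, 0.2, 0.1, 0.1], "cover": [0.7, 0.0, 0.3, 0.1],
    "phone": [0.2, 0.9, 0.0, 0.1], "recipe": [0.9, 0.0, 0.2, 0.3], "meat": [0.1, 0.7, 0.2, 0.0],
}
head_modifier_pairs = [("game", "platform"), ("software", "platform"),
                       ("cover", "phone"), ("recipe", "meat")]

def features(head, modifier):
    return np.concatenate([EMB[head], EMB[modifier]])

X = [features(h, m) for h, m in head_modifier_pairs] + \
    [features(m, h) for h, m in head_modifier_pairs]          # reversed pairs as negatives
y = [1] * len(head_modifier_pairs) + [0] * len(head_modifier_pairs)

clf = LogisticRegression().fit(X, y)
print(clf.predict([features("game", "platform"), features("platform", "game")]))  # expect [1 0]
```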

Syntactic Parsing based on HM

• Head-modifier information is incomplete:
  • Prepositions and other function words are ignored
  • It says nothing about the structure within a noun compound, e.g., "el capitan macbook pro"
• Why not train a parser for web queries?

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges: Short Texts Lack Grammatical Signals
• Lack of function words and word order
  • "toys queries" has ambiguous intent
  • "distance earth moon" has clear intent, but many equivalent forms: "earth moon distance", "earth distance moon", …

Challenges Syntactic Parsing of Queries

• No standard
• No ground-truth

Why is syntactic parsing of queries even a legitimate problem

Derive Syntax from Semantics[Sun et al 2016]

• Query: "thai food houston"

bull Clicked sentence

bull Project dependency to the query

A Treebank for Short Texts

• Given query q
• Given q's clicked sentences s
• Parse each s
• Project the dependencies from s to q
• Aggregate the dependencies

Algorithm of Projection

Result Examples

Results

• Random queries: QueryParser UAS 0.83, LAS 0.75; Stanford UAS 0.72, LAS 0.64
• Queries with no function words: QueryParser UAS 0.82, LAS 0.73; Stanford UAS 0.70, LAS 0.61
• Queries with function words: QueryParser UAS 0.90, LAS 0.85; Stanford UAS 0.86, LAS 0.80

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

• Measuring similarity between two short texts or sentences
• Basic idea: word-by-word comparison using embedding vectors
• Use a saliency-weighted semantic graph to compute similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired: bins of all edges, bins of max edges.

Similarity measurement (inspired by BM25), combining per-term semantic similarity:

f_sts(s_l, s_s) = Σ_{w ∈ s_l} IDF(w) · [ sem(w, s_s) · (k_1 + 1) ] / [ sem(w, s_s) + k_1 · (1 − b + b · |s_s| / avgsl) ]

where s_l and s_s are the two short texts, w ranges over the terms of s_l, sem(w, s_s) is the semantic similarity of w to s_s, and avgsl is the average short-text length.
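A minimal sketch of this saliency-weighted similarity, taking sem(w, s_s) as the maximum cosine between w and the words of s_s; the embeddings, IDF values, and parameters are toy, hypothetical choices.

```python
# Sketch: saliency-weighted short-text similarity in the spirit of the formula above.
import numpy as np

EMB = {"thai": [0.9, 0.1, 0.2], "food": [0.2, 0.9, 0.1], "houston": [0.1, 0.2, 0.9],
       "restaurant": [0.3, 0.8, 0.2], "texas": [0.2, 0.3, 0.8]}
IDF = {"thai": 2.0, "food": 1.2, "houston": 1.8, "restaurant": 1.3, "texas": 1.5}

def cos(a, b):
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def sem(w, text):
    """Semantic similarity of term w to a short text = max cosine against its words."""
    return max(cos(EMB[w], EMB[v]) for v in text)

def f_sts(s_l, s_s, k1=1.2, b=0.75, avgsl=3.0):
    score = 0.0
    for w in s_l:
        s = sem(w, s_s)
        score += IDF.get(w, 1.0) * (s * (k1 + 1)) / (s + k1 * (1 - b + b * len(s_s) / avgsl))
    return score

print(f_sts(["thai", "food", "houston"], ["thai", "restaurant", "texas"]))
```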

From the Concept View

From the Concept View [Wang et al 2015a]

Co-occurrence Network

Bags of Concepts

Semantic Network

Short Text 1 → Concept Vector 1 [(c1, score1), (c2, score2), …]
Short Text 2 → Concept Vector 2 [(c1', score1'), (c2', score2'), …]
→ Similarity between the two concept vectors

Conceptualization pipeline: parsing → term clustering by isA → concept filtering by co-occurrence → head/modifier analysis → concept orthogonalization.
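A minimal sketch of comparing two short texts through their concept vectors; the concept vectors below are hypothetical outputs of a conceptualization step, and cosine is used as the vector similarity.

```python
# Sketch: bag-of-concepts similarity between two short texts, given their concept vectors.
import math

def concept_cosine(vec1, vec2):
    """Cosine similarity between two {concept: score} vectors."""
    dot = sum(s * vec2.get(c, 0.0) for c, s in vec1.items())
    n1 = math.sqrt(sum(s * s for s in vec1.values()))
    n2 = math.sqrt(sum(s * s for s in vec2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

v1 = {"movie": 0.6, "fantasy novel": 0.3, "character": 0.1}   # e.g. conceptualized "watch harry potter"
v2 = {"movie": 0.5, "tv series": 0.4, "actor": 0.1}           # e.g. another conceptualized short text
print(concept_cosine(v1, v2))
```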

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

• Explicit short text understanding benefits many application scenarios:
  • Ads/search semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

[Bar charts: ads keyword-selection gains by query decile (Decile 4 through Decile 10), for Mainline Ads (y-axis 0.00–6.00) and Sidebar Ads (y-axis 0.00–0.60)]

Definition Mining [Hao et al 2016]

• Definition scenarios: search engines, QnA, etc.
• Why conceptualization is useful for definition mining — example: "What is Emphysema?"
  • Answer 1: "Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year."
  • Answer 2: "Emphysema is an incurable, progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing."
• The sentence has the form of a definition; embedding is helpful to some extent, but it also returns a high similarity score for (emphysema, disease) and for (emphysema, smoking).
• Conceptualization can provide strong semantics; contextual embedding can also provide semantic similarity beyond Is-A.

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

Offline: training data → entity extraction → conceptualization (using the knowledge base) → concept weighting → model learning → concept models (Model 1 … Model i … Model N for Class 1 … Class i … Class N).
Online: original short text ("justin bieber graduates") → conceptualization → concept vector → candidate generation → classification & ranking → e.g., <Music, Score>.

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space: article titles/tags in this category, with probabilities p_i, p_j

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

…
ω_i, ω_j

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

…
ω_i, ω_j, p_i, p_j

Precision performance on each category [Wang et al 2014a]

Category | BocSTC | LM_ch | SVM | VSM_cosine | LM_d | Entity_ESA
Movie | 0.71 | 0.91 | 0.84 | 0.81 | 0.72 | 0.56
Money | 0.97 | 0.95 | 0.54 | 0.57 | 0.52 | 0.74
Music | 0.97 | 0.90 | 0.88 | 0.73 | 0.68 | 0.58
TV | 0.96 | 0.46 | 0.92 | 0.56 | 0.51 | 0.55

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

• [Stark et al. 1998] Michael M. Stark and Richard F. Riesenfeld. WordNet: An Electronic Lexical Database. Proceedings of the 11th Eurographics Workshop on Rendering, 1998.

• [Banko et al. 2007] Michele Banko, Michael J. Cafarella, Stephen Soderland, Matt Broadhead and Oren Etzioni. Open Information Extraction from the Web. In IJCAI 2007.

• [Etzioni et al. 2011] Oren Etzioni, Anthony Fader, Janara Christensen, Stephen Soderland and Mausam. Open Information Extraction: The Second Generation. In IJCAI, vol. 11, pp. 3-10, 2011.

• [Carlson et al. 2010] A. Carlson, J. Betteridge, B. Kisiel, B. Settles, E.R. Hruschka Jr. and T.M. Mitchell. Toward an Architecture for Never-Ending Language Learning. In Proceedings of the Conference on Artificial Intelligence (AAAI), 2010.

• [Wu et al. 2012] Wentao Wu, Hongsong Li, Haixun Wang and Kenny Zhu. Probase: A Probabilistic Taxonomy for Text Understanding. In ACM International Conference on Management of Data (SIGMOD), May 2012.

• [Bollacker et al. 2008] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge and Jamine Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In SIGMOD 2008.

• [Auer et al. 2007] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak and Zachary G. Ives. DBpedia: A Nucleus for a Web of Open Data. In ISWC/ASWC 2007.

References

• [Suchanek et al. 2007] Fabian M. Suchanek, Gjergji Kasneci and Gerhard Weikum. Yago: a core of semantic knowledge. In WWW 2007.

• [Wu et al. 2015] Sen Wu, Ce Zhang, Christopher De Sa, Jaeho Shin, Feiran Wang and C. Ré. Incremental Knowledge Base Construction Using DeepDive. In VLDB 2015.

• [Navigli et al. 2012] R. Navigli and S. Ponzetto. BabelNet: The Automatic Construction, Evaluation and Application of a Wide-Coverage Multilingual Semantic Network. Artificial Intelligence, 2012.

• [Nastase et al. 2010] Vivi Nastase, Michael Strube, Benjamin Börschinger, Cäcilia Zirn and Anas Elghafari. WikiNet: A very large scale multi-lingual concept network. In LREC 2010.

• [Speer et al. 2013] Robert Speer and Catherine Havasi. ConceptNet 5: A large semantic network for relational knowledge. The People's Web Meets NLP, Springer Berlin Heidelberg, 2013.

• [Hua et al. 2016] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng and Xiaofang Zhou. "Understand Short Texts by Harvesting and Analyzing Semantic Knowledge." IEEE Transactions on Knowledge and Data Engineering (TKDE), 2016.

• [Hua et al. 2015] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng and Xiaofang Zhou. Short Text Understanding Through Lexical-Semantic Analysis. In International Conference on Data Engineering (ICDE), April 2015.

References

• [Li et al. 2013] Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang and Xindong Wu. Computing term similarity by large probabilistic isA knowledge. In ACM International Conference on Information and Knowledge Management (CIKM), 2013.

• [Li et al. 2015] Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang, Xue-Gang Hu and Xindong Wu. A Large Probabilistic Semantic Network based Approach to Compute Term Similarity. IEEE Transactions on Knowledge and Data Engineering (TKDE), 27(10): 2604-2617, 2015.

• [Rosch et al. 1976] Eleanor Rosch, Carolyn B. Mervis, Wayne D. Gray, David M. Johnson and Penny Boyes-Braem. Basic objects in natural categories. Cognitive Psychology, 8(3): 382-439, 1976.

• [Manning and Schutze 1999] Christopher D. Manning and Hinrich Schutze. Foundations of Statistical Natural Language Processing. Volume 999, MIT Press, 1999.

• [Wang et al. 2015b] Zhongyuan Wang, Kejun Zhao, Haixun Wang, Xiaofeng Meng and Ji-Rong Wen. Query Understanding through Knowledge-Based Conceptualization. In IJCAI, July 2015.

• [Bergsma et al. 2007] Shane Bergsma and Qin Iris Wang. Learning Noun Phrase Query Segmentation. In EMNLP-CoNLL 2007: 819-826.

• [Tan et al. 2008] Bin Tan and Fuchun Peng. Unsupervised query segmentation using generative language models and Wikipedia. In WWW 2008: 347-356.

References

• [Li et al. 2011] Yanen Li, Bo-June Paul Hsu, ChengXiang Zhai and Kuansan Wang. Unsupervised query segmentation using clickthrough for information retrieval. In SIGIR 2011: 285-294.

• [Guo et al. 2009] Jiafeng Guo, Gu Xu, Xueqi Cheng and Hang Li. Named entity recognition in query. In SIGIR 2009: 267-274.

• [Pantel et al. 2012] Patrick Pantel, Thomas Lin and Michael Gamon. Mining Entity Types from Query Logs via User Intent Modeling. In ACL 2012: 563-571.

• [Joshi et al. 2014] Mandar Joshi, Uma Sawant and Soumen Chakrabarti. Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries. In EMNLP 2014: 1104-1114.

• [Sawant et al. 2013] Uma Sawant and Soumen Chakrabarti. Learning joint query interpretation and response ranking. In WWW 2013: 1099-1110.

• [Wang et al. 2014b] Zhongyuan Wang, Haixun Wang and Zhirui Hu. Head, Modifier, and Constraint Detection in Short Texts. In International Conference on Data Engineering (ICDE), 2014.

• [Sun et al. 2016] Xiangyan Sun, Haixun Wang, Yanghua Xiao and Zhongyuan Wang. Syntactic Parsing of Web Queries. In EMNLP 2016.

References

• [Kenter and Rijke 2015] Tom Kenter and Maarten de Rijke. Short text similarity with word embeddings. In CIKM 2015.

• [Wang et al. 2015a] Zhongyuan Wang, Haixun Wang, Ji-Rong Wen and Yanghua Xiao. An Inference Approach to Basic Level of Categorization. In CIKM, October 2015.

• [Hao et al. 2016] Zehui Hao, Zhongyuan Wang, Xiaofeng Meng and Jun Yan. Combining Language Model with Conceptualization for Definition Ranking. MSR Technical Report, 2016.

• [Wang et al. 2014a] Fang Wang, Zhongyuan Wang, Zhoujun Li and Ji-Rong Wen. Concept-based Short Text Classification and Ranking. In CIKM 2014.

• [Wang et al. 2012a] Jingjing Wang, Haixun Wang, Zhongyuan Wang and Kenny Zhu. Understanding Tables on the Web. In International Conference on Conceptual Modeling, October 2012.

• [Wang et al. 2012b] Yue Wang, Hongsong Li, Haixun Wang and Kenny Zhu. Toward Topic Search on the Web. In International Conference on Conceptual Modeling, October 2012.

Page 55: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Statistics of Search Queries

44

29

17

7

2 1

1 Term2 Terms3 Terms4 Terms5 Termsmore than 5 Terms

10

26

34

19

74

1 Term2 Terms3 Terms4 Terms5 Termsmore than 5 Terms

(a) By traffic

(b) By of distinct queries

Pokeacutemon Go Microsoft HoloLens

Instance 1 Instance 21 Instance2 Instances3 Instances4 Instances5 InstancesMore than 5 Instances

1 Instance2 Instances3 Instances4 Instances5 InstancesMore than 5 Instances

If the short text has context for the instancehellip

bull python tutorialbull dangerous pythonbull moon earth distancebull hellip

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Supervised Segmentation [Bergsma et al 2007]

bull Problem divide query into semantic units

bull Approach turn segmentation into position-based binary classification

Example Query

Two man power saw

[two man] [power saw][two] [man] [power saw][two] [man power] [saw]

Input a query and its positions

Output the decision for making segmentation at each position

Supervised Segmentation

bull Featuresbull Decision boundary features

bull Statistical features

bull Context features

bull Dependency features

eg Indicators thePOS tags in query isPosition features forwardbackward

Mutual information between left and right parts

Bank loan amortization schedule

Context information

bus driverfemale

depend

Supervised Segmentation

bull Segmentation Overview

saw

SVMclassifier

Input query two man power saw

two man power

Output segmentation decision for each position (yesno)

learning features

Unsupervised Segmentation [Tan et al 2008]

bull Unsupervised learning for query segmentation

Probability of generated segmentation S for query Q

119875 119878119876 = 119875 1199041 P 1199042|1199041 hellipP 119904119898 11990411199042hellip119904119898minus1

asympෑ

119904119894isin119878

119875(119904119894)Unigram model

segments

Valid segment boundary if and only if the pointwise mutual information between the two segments resulting from the split is negative

new york times subscription

1199041 1199042

119872119868 119904119896 119904119896+1 = log119875119888([119904119896 119904119896+1])

119875119888 119904119896 ∙ 119875119888 (119904119896+1)lt 0

Example log119875119888([119899119890119908 119910119900119903119896])

119875119888( 119899119890119908) ∙ 119875119888 (119910119900119903119896)gt 0

no segment boundary here

Unsupervised Segmentation

bull Find top k segmentations dynamic programming

bull Using EM optimization on the fly

Input query 11990811199082hellip119908119899 concept probability distribution

Output top k segmentations with highest likehood

Words in a query

Exploit Click-through [Li et al 2011]

bull Motivationbull Probabilistic query segmentation

bull Use click-through data

Output top-3 segmentation

[bank of america] [online banking] 0502

bank of america online banking] 0428

[bank of ] [ america] [online banking] 0001

Q -gt URL -gt D query

document

click data

Input Query bank of america online banking

Exploit Click-through

bull Segmentation Model

An interpolated model

global info Click-throughinfo

[credit card] [bank of America]

1 bank of america credit cards contact us overview2 secured visa credit card from bank of america3 credit cards overview find the right bank of america credit card for you

Query

Clicked html documents

global info

Click-through info

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Sense Changes with Different Context

watch harry potter read harry potter age harry potter

Movie Book Character

harry potter walkthrough

Game

Entity Recognition in Query [Guo et al 2009]

bull Motivation

Detect named entity in a short text and categorize it

harry potter walkthrough

Single-named-entity query

Example

(ldquoharry potterrdquo ldquo walkthroughrdquo ldquogamerdquo)

triple lte t cgt

class of entity

context terms

ambiguous term

contextterm class

Entity Recognition in Query

bull Probabilistic Generative Model

Goal Given a query q find triple lte t cgt maximize the probability

Probability to generate triple

assume context only depends on class

Objective given query q find

The problem then becomes how to estimate Pr(e) Pr(c|e) and Pr(t|c)

Eg ldquowalkthroughrdquo only depends on game instead of happy potter

Entity Recognition in Query

bull Probability Estimation by Learning

learning objective

N

1i

iii )ctP(emax

Challenge difficult as well as time consuming to manually assign class labels to named entities in queries

Build training set 119879 = (119890119894 119905119894) view 119888119894 as a hidden variable

New Learning problem

N

1i

ii

N

1i

i

N

1i

ii c)|)P(te|P(c)P(emax)tP(emax c

solved with topic model WS-LDA

Signal from Click [Pantel et al 2012]

bull Motivation

Predict entity type in Web search

entity

user intent

context

click

Query type distribution (73 types)

Generative model

entity type

T

TK

K2

Signal from Click

bull Joint Model for Prediction

t

τ

i

n c

θ

φ

ω

Q

Distribution over types

Intent distribution

Pick type

Pick entity

Pick intent

Pick click

Word distribution

Host distribution

Entity distribution

For each Query

Pick context words

Telegraphic Query interpretation [Sawant et al 2013 Joshi et al 2014]

bull Entity-seeking Telegraphic Queries

bull Interpretation = Segmentation + Annotation

Knowledge base Large corpus

accuracy recall

Germany capital

Berlin

Query

Result Entity

bull Overview

Joint Interpretation and Ranking [Sawant et al 2013 Joshi et al 2014]

Annotated Corpus

Telegraphic Query

e1e2e3

Two Models for Interpretation and Ranking

Generative Model

Discriminative Model

Output

bull Generative Model

Joint Interpretation and Ranking [Sawant et al 2013]

San Diego Padres

Major league baseball team

type context

E

T Padres have been to two World

Series losing in 1984 and 1998

Type hint

baseball team

losing team baseball world series 1998

Z

Context matchers

lost 1998 world seriesswitch

model model

q losing team baseball world series 1998

Borrow from U Sawant (2013)

Based on Probabilistic Language Models

bull Discriminative Model

Joint Interpretation and Ranking [Sawant et al 2013]

San_Diego_Padres

losing team baseball world

series 1998

(baseball team)

losing team baseball world

series 1998

(baseball team)

losing team baseball world

series 1998

(t = baseball team)

1998_World_Series

losing team baseball world

series 1998

(series)

losing team baseball world

series 1998

(series)

losing team baseball world

series 1998

(t = series)

Correct entity Incorrect entity

Based on max-margin discriminative learning

bull Queries seek answer entities (e2)

bull Contain (query) entities (e1) target types (t2) relations (r) and selectors (s)

Telegraphic Query Interpretation [Joshi et al 2014]

query e1 r t2 s

dave navarro first band

dave navarro band band first

dave navarro - band first

spider automobile company

spider automobile company

automobile company

-

automobile company company spider

Borrow from M Joshi (2014)

Improved Generative Model

bull Generative Model[Sawant et al 2013]

[Joshi et al 2014]Consider e1

(in q) and r

Improved Discriminative Model

bull Discriminative Model[Sawant et al 2013]

[Joshi et al 2014]

Consider e1

(in q) and r

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

bull Input a short text

bull Output semantic interpretation

bull Three steps in understanding a short text

wanna watch eagles band

watch[verb] eagles[entity](band) band[concept]

wanna watch eagles band watch[verb] eagles[entity](band) band[concept]

watch eagles band watch[verb] eagles[entity] band[concept]

Step 1 Text Segmentation ndash divide into a sequence of terms in vocabulary

Step 2 Type detection ndash determine the best type of each term

Step 3 Concept Labeling ndash infer the best concept of each entity within context

Text segmentationbull Observations

bull Mutual Exclusion ndash terms containing the same word mutually exclude each other

bull Mutual Reinforcement ndash related terms mutually reinforce each other

bull Build a Candidate Term Graph (CTG)

ldquovacation april in parisrdquo ldquowatch harry potterrdquo

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Find best segmentation

bull Best segmentation= sub-graph in CTG whichbull Is a complete graph (clique)

bull No mutual exclusion

bull Has 100 word coveragebull Except for stopwords

bull Has the largest average edge weight

Is a segmentation

Best segmentation

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Find best segmentation

bull Best segmentation= sub-graph in CTG whichbull Is a complete graph (clique)

bull No mutual exclusion

bull Has 100 word coveragebull Except for stopwords

bull Has the largest average edge weight

Maximal Clique

Best segmentation

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Type Detection

bull Pairwise Modelbull Find the best typed-term for each term so that the

Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight

watch[v]

watch[e]

watch[c]

watch

free[adj]

free[v]

movie[c]

movie[e]

free

movie

Concept Labeling

bull Entity disambiguation is the most important task of concept labelingbull Filterre-rank of the original concept cluster vector

bull Weighted-Votebull The final score of each concept cluster is a combination

of its original score and the support from context using concept co-occurrence

watch harry potter read harry potter

movie book

Example of Entity Disambiguation[Hua et al 2015 (ICDE Best Paper) Hua et al 2016]

Co-occurrence network

Concept Vector

Semantic network

Short Text

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

c1 p1

c2 p2

c3 p3

hellip

ipad apple

fruithellip

companyhellip

foodhellip

producthellip

Is-A

filtering

ldquoipad applerdquo

producthellip

devicehellip

producthellip

brandhellip

companyhellip

devicehellip

co-occur

Is-A

Mining Lexical Relationships[Wang et al 2015b]

bull Lexical knowledge represented by the probabilities

verb

product book

movie

watch harry potter

119901 119907119890119903119887 119908119886119905119888ℎ

119901 119894119899119904119905119886119899119888119890 119908119886119905119888ℎ

119901 119898119900119907119894119890 ℎ119886119903119903119910 119901119900119905119905119890119903

119901 119898119900119907119894119890 119908119886119905119888ℎ 119907119890119903119887

119901 119887119900119900119896 ℎ119886119903119903119910 119901119900119905119905119890119903

119901 119888 119905 119911

119901 119888 119890 =119901 119888 119905 119911 = 119894119899119904119905119886119899119888119890

119901 119911 119905 ①②

③e instancet termc conceptz role

Understanding Queries [Wang et al 2015b]

bull Goal to rank the concepts and findarg max

119888119901(119888|119905 119902)

The offline semantic network

QueryAll possible

segmentations

Random walk with restart [Sun et al 2005]on the online subgraph

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

bull Example ldquopopular smart cover iphone 5srdquo

bull Definition bull Head acts to name the general (semantic) category to which the

whole short text belongs Usually the head is the intent of the short text

bull ldquosmart coverrdquo intent of the query

bull Constraints distinguish this member from other members of the same category

bull ldquoiphone 5srdquo limit the type of the head

bull Non-Constraint Modifiers (aka Pure Modifiers) are subjectivemodifiers which can be dropped without changing intent

bull ldquopopularrdquo subjective can be neglected

Non-Constraint Modifiers Mining Construct Modifier Networks

Edges form a Modifier Network

Concept Hierarchy Tree in ldquoCountryrdquo domain

Modifier Network in ldquoCountryrdquo domainIn this case ldquoLargerdquo and ldquoToprdquo are pure modifiers

Country

Asian country

Developed country

Western country

Asian Developed Western

Western developed

country

Top western country

Large

Large Top

Top

WesternLarge Asian

country

Large developed

country

Top developed

country

Country

Asian Western

Developed

Large Top

bull Betweenness centrality is a measure of a nodes centrality in a network

bull Betweennes of node v is defined as

bull where 120590119904119905 is the total number of shortest paths from node s to node t and 120590119904119905(119907) is the number of those paths that pass through v

bull Normalization amp Aggregation

bull For a pure modifier it should have low betweenness centrality aggregation score PMS(t)

Non-Constraint Modifiers Mining Betweenness centrality

Head-Constraints Mining [Wang et al 2014b]

bull A term can be a head sometimes and be a constraint in some other cases

bull Eg Seattle hotel Seattle hotel job

head headconstraintconstraintconstraint

Head-Constraints Mining Acquiring Concept Patterns

Get entity pairs from query log

Conceptualization

Concept Patterns for each

prepositionsentity1 entity2

Extract Patterns

A for B A of BA with B A in BA on B A at B hellip

entity 1head entity 2constraint

concept11

concept12

concept13

concept14

concept21

concept22

concept23

(concept11 concept21) (concept11 concept22)(concept11 concept23)hellip

Concept Pattern Dictionary

Building concept pattern dictionary Query Logs

cover for iphone 6sbattery for sony a7rwicked on broadway

Why Concepts Canrsquot Be Too Generalbull It may cause too many concept pattern conflicts

canrsquot distinguish head and modifier for general concept pairs

Head Modifier

Derived Concept Pattern device company

Supporting Entity Pairs iphone 4 verizon

modem comcast

wireless router comcast

iphone 4 tmobile

Head Modifier

Derived Concept Pattern company device

Supporting Entity Pairs amazon books kindle

netflix touchpad

skype windows phone

netflix ps3

Conflict

Why Concepts Canrsquot Be Too Specificbull It may generate concepts with little coverage

bull Concept regresses to entitybull Large storage space up to (million million) patterns

hellip hellip

device largest desktop OS vendor

device largest software development company

device largest global corporation

device latest windows and office provider

hellip hellip

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept PatternsCluster size Sum of Cluster Score headconstraintscore

615 2114691 breedstate357298460224501

296 7752357 gameplatform627403476771856

153 3466804 accessoryvehicle53393705094809

70 118259 browserplatform132612807637391

22 1010993 requirementschool271407526294823

34 9489159 drugdisease154602405333541

42 8992995 cosmeticskin condition814659415003929

16 7421599 jobcity27903732555528

32 710403 accessoryphone246513830851194

18 6692376 softwareplatform210126322725878

20 6444603 testdisease239774028397537

27 5994205 clothesbreed98773996282851

19 5913545 penaltycrime200544192793488

25 5848804 taxstate240081818612579

16 5465424 saucemeat183592863621553

18 4809389 credit cardcountry142919087972152

14 4730792 foodholiday14554140330924

11 4536199 modgame257163856882439

29 4350954 garmentsport471533326845442

23 3994886 career informationprofessional732726483731257

15 386065 songinstrument128189481818135

18 378213 baitfish780426514113169

22 3722948 study guidebook508339765053921

19 3408953 pluginsbrowser550326072627126

14 3305753 recipemeat882779863422951

18 3214226 currencycountry110825444188352

13 3180272 lenscamera186081673263957

9 316973 decorationholiday130055844126533

16 314875 foodanimal7338544366514

game platform

game device

video game platform

game console game pad

game gaming platform

Game (Head) Platform (Modifier)

angry birds android

angry birds ios

angry birds windows 10

hellip hellip

Detection

Head Modifier Relationship

bull Train a classifier on

(head-embedding modifier-embedding)

bull Training data bull Positive (head modifier)bull Negative (modifier head)

bull Precision gt= 09 Recall gt= 09

bull Disadvantage not interpretable

Syntactic Parsing based on HM

bull Information is incompletebull Preposition and other function words

bull Within a noun compound el capitan macbook pro

bull Why not train a parser for web queries

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges Short Texts Lack Grammatical Signalsbull Lack function words word order

bull ldquotoys queriesrdquo has ambiguous intent

bull ldquodistance earth moonrdquo has clear intentbull many equivalent forms ldquoearth moon distancerdquo ldquoearth

distance moonrdquo hellip

Challenges Syntactic Parsing of Queries

bullNo standard

bullNo ground-truth

Why is syntactic parsing of queries even a legitimate problem

Derive Syntax from Semantics[Sun et al 2016]

bull Query ldquothai food houstonrdquo

bull Clicked sentence

bull Project dependency to the query

A Treebank for Short Texts

• Given a query q
• Given q's clicked sentences s
• Parse each s
• Project dependencies from s to q
• Aggregate the projected dependencies

Algorithm of Projection

Result Examples

Results

• Random queries: QueryParser UAS 0.83, LAS 0.75; Stanford UAS 0.72, LAS 0.64
• Queries with no function words: QueryParser UAS 0.82, LAS 0.73; Stanford UAS 0.70, LAS 0.61
• Queries with function words: QueryParser UAS 0.90, LAS 0.85; Stanford UAS 0.86, LAS 0.80

Short Text Understanding

• How to segment this short text?
• What does this short text mean (its intent, senses, or concepts)?
• What are the relations among terms in the short text?
• How to calculate the similarity between short texts?

Short Text Similarity Using Word Embeddings [Kenter and Rijke 2015]

• Measures similarity between two short texts or sentences
• Basic idea: word-by-word comparison using embedding vectors
• Uses a saliency-weighted semantic graph to compute similarity

Short Text Similarity Using Word Embeddings [Kenter and Rijke 2015]

Features acquired: bins of all edges, bins of max edges.

Similarity measurement, inspired by BM25: each term w of short text $s_l$ is compared against short text $s_s$ through a semantic similarity $sem(w, s_s)$:

$$f_{sts}(s_l, s_s) = \sum_{w \in s_l} IDF(w) \cdot \frac{sem(w, s_s)\cdot(k_1 + 1)}{sem(w, s_s) + k_1\cdot\left(1 - b + b\cdot\frac{|s_s|}{avgl}\right)}$$

where $avgl$ is the average short-text length.
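A small sketch of computing this saliency-weighted similarity; the word vectors, IDF table, and parameters (k1, b, avgl) below are illustrative, not tuned values from the paper:

```python
# Sketch of the BM25-inspired similarity f_sts(s_l, s_s) over word embeddings.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def sem(w, other, emb):
    # semantic similarity of word w to the short text `other`: best matching word
    return max(0.0, max(cosine(emb[w], emb[x]) for x in other))

def f_sts(s_l, s_s, emb, idf, k1=1.2, b=0.75, avgl=5.0):
    score = 0.0
    for w in s_l:
        s = sem(w, s_s, emb)
        score += idf.get(w, 1.0) * s * (k1 + 1) / (s + k1 * (1 - b + b * len(s_s) / avgl))
    return score

rng = np.random.default_rng(1)
emb = {w: rng.normal(size=20) for w in ["thai", "food", "houston", "restaurants", "texas"]}
idf = {"thai": 2.0, "food": 1.2, "houston": 2.5, "restaurants": 1.5, "texas": 2.2}
print(f_sts(["thai", "food", "houston"], ["restaurants", "texas"], emb, idf))
```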

From the Concept View

From the Concept View [Wang et al 2015a]

(Pipeline: each short text is parsed and conceptualized (term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, concept orthogonalization) against a semantic (isA) network and a co-occurrence network, yielding a bag-of-concepts vector per text, Concept Vector 1 = [(c1, score1), (c2, score2), …] and Concept Vector 2 = [(c1', score1'), (c2', score2'), …]; similarity is then computed between the two concept vectors.)
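For the last step, a tiny sketch of comparing two bag-of-concepts representations with cosine similarity (the concept vectors are made up for illustration):

```python
# Sketch: similarity of two short texts as the cosine of their concept vectors.
import math

def concept_cosine(v1, v2):
    d1, d2 = dict(v1), dict(v2)
    dot = sum(s * d2.get(c, 0.0) for c, s in d1.items())
    n1 = math.sqrt(sum(s * s for s in d1.values()))
    n2 = math.sqrt(sum(s * s for s in d2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

vector1 = [("device", 0.6), ("product", 0.3), ("company", 0.1)]   # e.g. from "ipad apple"
vector2 = [("company", 0.7), ("brand", 0.2), ("product", 0.1)]    # e.g. from another text
print(concept_cosine(vector1, vector2))
```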

Outline

• Knowledge Bases
• Explicit Representation Models
• Applications

Applications

• Explicit short text understanding benefits many application scenarios:
  • Ads/search semantic matching
  • Definition mining
  • Query recommendation
  • Web table understanding
  • Semantic search
  • …

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

(Bar charts: ads keyword-selection performance by decile, Decile 4-10, for Mainline Ads (y-axis 0.00-6.00) and Sidebar Ads (y-axis 0.00-0.60).)

Definition Mining [Hao et al 2016]

• Definition scenarios: search engines, QnA, etc.
• Why is conceptualization useful for definition mining? Example: "What is Emphysema?"
  Answer 1: Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year.
  Answer 2: Emphysema is an incurable, progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing.
• Both answers have the form of a definition.
• Embedding is helpful to some extent, but it also returns a high similarity score for (emphysema, smoking), not only for (emphysema, disease).
• Conceptualization can provide strong semantics; contextual embedding can also provide semantic similarity beyond isA.

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

(System overview. Offline: training data for each class (Class 1 … Class i … Class N) goes through entity extraction and conceptualization against the knowledge base, concepts are weighted, and a concept model is learned per class (Model 1 … Model i … Model N). Online: an input short text such as "justin bieber graduates" goes through entity extraction, candidate generation, and conceptualization into a concept vector, and classification & ranking against the class models outputs labels such as <Music, Score>.)

Concept based Short Text Classification and Ranking [Wang et al 2014a]

(Concept-space figures: a category such as TV is mapped into the concept space by conceptualizing the article titles/tags in that category, giving weighted concept dimensions ω_i, ω_j; other categories such as Music and Movie are mapped the same way; an incoming query is conceptualized into the same space as a point (p_i, p_j) and compared against each category.)
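A toy sketch of the scoring idea behind these figures: each category keeps concept weights ω, a query is conceptualized into a concept vector p, and the category with the highest dot product wins (all weights below are invented):

```python
# Sketch: concept-based short text classification by dot product in concept space.
category_concepts = {                      # omega: concept weights per category
    "TV":    {"tv show": 0.5, "episode": 0.3, "channel": 0.2},
    "Music": {"song": 0.5, "album": 0.3, "singer": 0.2},
}

def classify(text_concepts, category_concepts):
    scores = {cat: sum(p * omega.get(c, 0.0) for c, p in text_concepts.items())
              for cat, omega in category_concepts.items()}
    return max(scores, key=scores.get), scores

# "justin bieber graduates" might conceptualize to singer/celebrity concepts.
query_concepts = {"singer": 0.6, "celebrity": 0.4}
print(classify(query_concepts, category_concepts))   # ('Music', {...})
```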

Precision performance on each category [Wang et al 2014a]

Category | BocSTC | LM_ch | SVM | VSM_cosine | LM_d | Entity_ESA
Movie | 0.71 | 0.91 | 0.84 | 0.81 | 0.72 | 0.56
Money | 0.97 | 0.95 | 0.54 | 0.57 | 0.52 | 0.74
Music | 0.97 | 0.90 | 0.88 | 0.73 | 0.68 | 0.58
TV | 0.96 | 0.46 | 0.92 | 0.56 | 0.51 | 0.55

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

• [ Stark et al 1998 ] Michael M. Stark and Richard F. Riesenfeld. WordNet: An Electronic Lexical Database. Proceedings of the 11th Eurographics Workshop on Rendering, 1998.
• [ Banko et al 2007 ] Michele Banko, Michael J. Cafarella, Stephen Soderland, Matt Broadhead and Oren Etzioni. Open Information Extraction from the Web. In IJCAI 2007.
• [ Etzioni et al 2011 ] Oren Etzioni, Anthony Fader, Janara Christensen, Stephen Soderland and Mausam. Open Information Extraction: The Second Generation. In IJCAI, vol. 11, pp. 3-10, 2011.
• [ Carlson et al 2010 ] A. Carlson, J. Betteridge, B. Kisiel, B. Settles, E. R. Hruschka Jr. and T. M. Mitchell. Toward an Architecture for Never-Ending Language Learning. In Proceedings of the Conference on Artificial Intelligence (AAAI), 2010.
• [ Wu et al 2012 ] Wentao Wu, Hongsong Li, Haixun Wang and Kenny Zhu. Probase: A Probabilistic Taxonomy for Text Understanding. In ACM International Conference on Management of Data (SIGMOD), May 2012.
• [ Bollacker et al 2008 ] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge and Jamie Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In SIGMOD 2008.
• [ Auer et al 2007 ] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak and Zachary G. Ives. DBpedia: A Nucleus for a Web of Open Data. In ISWC/ASWC 2007.

References

• [ Suchanek et al 2007 ] Fabian M. Suchanek, Gjergji Kasneci and Gerhard Weikum. Yago: a core of semantic knowledge. In WWW 2007.
• [ Wu et al 2015 ] Sen Wu, Ce Zhang, Christopher De Sa, Jaeho Shin, Feiran Wang and C. Ré. Incremental Knowledge Base Construction Using DeepDive. In VLDB 2015.
• [ Navigli et al 2012 ] R. Navigli and S. Ponzetto. BabelNet: The Automatic Construction, Evaluation and Application of a Wide-Coverage Multilingual Semantic Network. In Artificial Intelligence, 2012.
• [ Nastase et al 2010 ] Vivi Nastase, Michael Strube, Benjamin Börschinger, Cäcilia Zirn and Anas Elghafari. WikiNet: A very large scale multi-lingual concept network. In LREC 2010.
• [ Speer et al 2013 ] Robert Speer and Catherine Havasi. ConceptNet 5: A large semantic network for relational knowledge. The People's Web Meets NLP, Springer Berlin Heidelberg, 2013.
• [ Hua et al 2016 ] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng and Xiaofang Zhou. "Understand Short Texts by Harvesting and Analyzing Semantic Knowledge". IEEE Transactions on Knowledge and Data Engineering (TKDE), 2016.
• [ Hua et al 2015 ] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng and Xiaofang Zhou. Short Text Understanding Through Lexical-Semantic Analysis. In International Conference on Data Engineering (ICDE), April 2015.

References

• [ Li et al 2013 ] Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang and Xindong Wu. Computing term similarity by large probabilistic isA knowledge. In ACM International Conference on Information and Knowledge Management (CIKM), 2013.
• [ Li et al 2015 ] Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang, Xue-Gang Hu and Xindong Wu. A Large Probabilistic Semantic Network based Approach to Compute Term Similarity. IEEE Transactions on Knowledge and Data Engineering (TKDE), 27(10): 2604-2617, 2015.
• [ Rosch et al 1976 ] Eleanor Rosch, Carolyn B. Mervis, Wayne D. Gray, David M. Johnson and Penny Boyes-Braem. Basic objects in natural categories. Cognitive Psychology, 8(3): 382-439, 1976.
• [ Manning and Schutze 1999 ] Christopher D. Manning and Hinrich Schutze. Foundations of statistical natural language processing. Volume 999, MIT Press, 1999.
• [ Wang et al 2015b ] Zhongyuan Wang, Kejun Zhao, Haixun Wang, Xiaofeng Meng and Ji-Rong Wen. Query Understanding through Knowledge-Based Conceptualization. In IJCAI, July 2015.
• [ Bergsma et al 2007 ] Shane Bergsma and Qin Iris Wang. Learning Noun Phrase Query Segmentation. In EMNLP-CoNLL 2007: 819-826.
• [ Tan et al 2008 ] Bin Tan and Fuchun Peng. Unsupervised query segmentation using generative language models and Wikipedia. In WWW 2008: 347-356.

References

• [ Li et al 2011 ] Yanen Li, Bo-June Paul Hsu, ChengXiang Zhai and Kuansan Wang. Unsupervised query segmentation using clickthrough for information retrieval. In SIGIR 2011: 285-294.
• [ Guo et al 2009 ] Jiafeng Guo, Gu Xu, Xueqi Cheng and Hang Li. Named entity recognition in query. In SIGIR 2009: 267-274.
• [ Pantel et al 2012 ] Patrick Pantel, Thomas Lin and Michael Gamon. Mining Entity Types from Query Logs via User Intent Modeling. In ACL 2012: 563-571.
• [ Joshi et al 2014 ] Mandar Joshi, Uma Sawant and Soumen Chakrabarti. Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries. In EMNLP 2014: 1104-1114.
• [ Sawant et al 2013 ] Uma Sawant and Soumen Chakrabarti. Learning joint query interpretation and response ranking. In WWW 2013: 1099-1110.
• [ Wang et al 2014b ] Zhongyuan Wang, Haixun Wang and Zhirui Hu. Head, Modifier, and Constraint Detection in Short Texts. In International Conference on Data Engineering (ICDE), 2014.
• [ Sun et al 2016 ] Xiangyan Sun, Haixun Wang, Yanghua Xiao and Zhongyuan Wang. Syntactic Parsing of Web Queries. In EMNLP 2016.

References

• [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke. Short text similarity with word embeddings. In CIKM 2015.
• [ Wang et al 2015a ] Zhongyuan Wang, Haixun Wang, Ji-Rong Wen and Yanghua Xiao. An Inference Approach to Basic Level of Categorization. In CIKM, October 2015.
• [ Hao et al 2016 ] Zehui Hao, Zhongyuan Wang, Xiaofeng Meng and Jun Yan. Combining Language Model with Conceptualization for Definition Ranking. MSR Technical Report, 2016.
• [ Wang et al 2014a ] Fang Wang, Zhongyuan Wang, Zhoujun Li and Ji-Rong Wen. Concept-based Short Text Classification and Ranking. In CIKM 2014.
• [ Wang et al 2012a ] Jingjing Wang, Haixun Wang, Zhongyuan Wang and Kenny Zhu. Understanding Tables on the Web. In International Conference on Conceptual Modeling, October 2012.
• [ Wang et al 2012b ] Yue Wang, Hongsong Li, Haixun Wang and Kenny Zhu. Toward Topic Search on the Web. In International Conference on Conceptual Modeling, October 2012.

If the short text has context for the instance…
• python tutorial
• dangerous python
• moon earth distance
• …

Short Text Understanding

• How to segment this short text?
• What does this short text mean (its intent, senses, or concepts)?
• What are the relations among terms in the short text?
• How to calculate the similarity between short texts?

Supervised Segmentation [Bergsma et al 2007]

• Problem: divide a query into semantic units
• Approach: turn segmentation into position-based binary classification

Example query: "two man power saw"
Candidate segmentations: [two man] [power saw]; [two] [man] [power saw]; [two] [man power] [saw]

Input: a query and its word-boundary positions
Output: the decision whether to segment at each position

Supervised Segmentation

• Features:
  • Decision-boundary features: e.g., indicators such as whether the token is "the", POS tags in the query, position features (forward/backward)
  • Statistical features: e.g., mutual information between the left and right parts, as in "bank loan | amortization schedule"
  • Context features: e.g., the context word "female" in "female bus driver"
  • Dependency features: e.g., "female" depends on "driver"

Supervised Segmentation

• Segmentation overview: for the input query "two man power saw", features are extracted at each position between "two", "man", "power", "saw" and fed to an SVM classifier, which outputs a yes/no segmentation decision for each position.
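A minimal sketch of the position-based binary classification, assuming scikit-learn; the features and the single training query are toy stand-ins for the paper's much richer feature set:

```python
# Sketch: query segmentation as a yes/no decision at each word boundary.
from sklearn.linear_model import LogisticRegression

bigram_count = {("power", "saw"): 120, ("two", "man"): 40, ("man", "power"): 2}

def boundary_features(tokens, i):
    left, right = tokens[i], tokens[i + 1]
    return [bigram_count.get((left, right), 0),   # statistical feature
            len(left), len(right),                # surface features
            i, len(tokens) - i - 2]               # forward / backward position

# "two man power saw" -> [two man][power saw]: break only between "man" and "power".
tokens, labels = ["two", "man", "power", "saw"], [0, 1, 0]
X = [boundary_features(tokens, i) for i in range(len(labels))]
clf = LogisticRegression(max_iter=1000).fit(X, labels)

print(clf.predict([boundary_features(tokens, 2)]))  # decision between "power" and "saw"
```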

Unsupervised Segmentation [Tan et al 2008]

• Unsupervised learning for query segmentation

Probability of a segmentation S (into segments $s_i$) for query Q:

$$P(S|Q) = P(s_1)\,P(s_2|s_1)\cdots P(s_m|s_1 s_2 \cdots s_{m-1}) \approx \prod_{s_i \in S} P(s_i) \quad \text{(unigram model)}$$

A split point is a valid segment boundary if and only if the pointwise mutual information between the two segments resulting from the split is negative:

$$MI(s_k, s_{k+1}) = \log \frac{P_c([s_k\ s_{k+1}])}{P_c(s_k)\cdot P_c(s_{k+1})} < 0$$

Example: for "new york times subscription" with $s_1$ = "new", $s_2$ = "york", we have $\log \frac{P_c([\text{new york}])}{P_c(\text{new})\cdot P_c(\text{york})} > 0$, so there is no segment boundary between "new" and "york".
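A tiny sketch of the pointwise-mutual-information boundary test, with made-up n-gram probabilities standing in for the corpus estimates P_c:

```python
# Sketch: a split is a valid segment boundary iff MI of the two resulting segments < 0.
import math

P_c = {"new york": 3e-5, "new": 8e-4, "york": 6e-5,
       "times subscription": 1e-9, "times": 9e-4, "subscription": 2e-4}

def mutual_information(left, right):
    joint = P_c.get(f"{left} {right}", 1e-12)
    return math.log(joint / (P_c[left] * P_c[right]))

print(mutual_information("new", "york"))            # > 0: keep "new york" together
print(mutual_information("times", "subscription"))  # < 0: valid boundary here
```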

Unsupervised Segmentation

• Find the top-k segmentations by dynamic programming
• Use EM optimization on the fly

Input: a query $w_1 w_2 \cdots w_n$ (the words in the query) and a concept/segment probability distribution
Output: the top-k segmentations with the highest likelihood
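A compact dynamic-programming sketch that finds the best segmentation under the unigram segment model (segment probabilities are toy values, and the on-the-fly EM re-estimation is omitted):

```python
# Sketch: best segmentation of a query under P(S) ~ prod_i P(s_i), via DP.
import math

seg_prob = {"new york times": 1e-6, "new york": 3e-5, "york times": 1e-6,
            "new": 8e-4, "york": 6e-5, "times": 9e-4, "subscription": 2e-4}

def best_segmentation(words, max_len=3):
    n = len(words)
    best = [(0.0, [])] + [(-math.inf, None)] * n    # best[i]: score and segments of words[:i]
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            seg = " ".join(words[j:i])
            p = seg_prob.get(seg)
            if p is None or best[j][0] == -math.inf:
                continue
            cand = best[j][0] + math.log(p)
            if cand > best[i][0]:
                best[i] = (cand, best[j][1] + [seg])
    return best[n]

print(best_segmentation("new york times subscription".split()))
```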

Exploit Click-through [Li et al 2011]

• Motivation:
  • Probabilistic query segmentation
  • Use click-through data (query → clicked URL → document)

Input query: "bank of america online banking"
Output, top-3 segmentations:
  [bank of america] [online banking]  0.502
  [bank of america online banking]  0.428
  [bank of] [america] [online banking]  0.001

Exploit Click-through

• Segmentation model: an interpolated model that combines global information with click-through information.

Example: the query segmented as "[credit card] [bank of America]", with clicked HTML documents such as:
  1. bank of america credit cards contact us overview
  2. secured visa credit card from bank of america
  3. credit cards overview, find the right bank of america credit card for you
The global information comes from a web-scale corpus; the click-through information comes from the clicked documents.
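A toy sketch of mixing the two information sources when scoring a candidate segmentation; the simple log-linear interpolation, the weight lambda_, and all probabilities are illustrative stand-ins for the paper's model:

```python
# Sketch: score a segmentation with an interpolation of global and click-through models.
import math

def log_prob(segments, table, floor=1e-9):
    return sum(math.log(table.get(s, floor)) for s in segments)

def interpolated_score(segments, global_prob, click_prob, lambda_=0.6):
    return (lambda_ * log_prob(segments, global_prob)
            + (1 - lambda_) * log_prob(segments, click_prob))

global_prob = {"bank of america": 1e-4, "online banking": 5e-5, "bank of": 3e-4, "america": 2e-3}
click_prob = {"bank of america": 8e-3, "online banking": 6e-3}   # from clicked documents

print(interpolated_score(["bank of america", "online banking"], global_prob, click_prob))
print(interpolated_score(["bank of", "america", "online banking"], global_prob, click_prob))
```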

Short Text Understanding

• How to segment this short text?
• What does this short text mean (its intent, senses, or concepts)?
• What are the relations among terms in the short text?
• How to calculate the similarity between short texts?

Sense Changes with Different Context

watch harry potter → Movie; read harry potter → Book; age harry potter → Character; harry potter walkthrough → Game

Entity Recognition in Query [Guo et al 2009]

• Motivation: detect a named entity in a short text and categorize it.

Example: the single-named-entity query "harry potter walkthrough" is represented as a triple <e, t, c> = ("harry potter", "walkthrough", "game"), where e is the (ambiguous) named-entity term, t the context term, and c the class of the entity.

Entity Recognition in Query

• Probabilistic generative model

Goal: given a query q, find the triple <e, t, c> that maximizes the probability of generating the triple, assuming the context depends only on the class, e.g., "walkthrough" depends only on the class game, not on "harry potter" itself.

Objective: given query q, find the <e, t, c> maximizing Pr(e) Pr(c|e) Pr(t|c). The problem then becomes how to estimate Pr(e), Pr(c|e), and Pr(t|c).
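A small sketch of scoring candidate interpretations by Pr(e) · Pr(c|e) · Pr(t|c); the probability tables are toy values rather than estimates learned from a query log:

```python
# Sketch: pick the class c that maximizes Pr(e) * Pr(c|e) * Pr(t|c) for a query <e, t>.
p_e = {"harry potter": 0.6}
p_c_given_e = {"harry potter": {"game": 0.3, "movie": 0.4, "book": 0.3}}
p_t_given_c = {"game":  {"walkthrough": 0.2},
               "movie": {"walkthrough": 0.001, "trailer": 0.2},
               "book":  {"walkthrough": 0.001, "review": 0.1}}

def best_class(entity, context):
    scores = {c: p_e[entity] * pc * p_t_given_c[c].get(context, 1e-6)
              for c, pc in p_c_given_e[entity].items()}
    return max(scores, key=scores.get), scores

print(best_class("harry potter", "walkthrough"))   # ('game', {...})
```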

Entity Recognition in Query

• Probability estimation by learning

Learning objective: $\max \prod_{i=1}^{N} P(e_i, t_i, c_i)$

Challenge: it is difficult and time-consuming to manually assign class labels to named entities in queries.

Instead, build a training set $T = \{(e_i, t_i)\}$ and view $c_i$ as a hidden variable. The new learning problem is

$$\max \prod_{i=1}^{N} P(e_i, t_i) = \max \prod_{i=1}^{N} P(e_i) \sum_{c} P(c|e_i)\, P(t_i|c),$$

solved with the topic model WS-LDA.

Signal from Click [Pantel et al 2012]

• Motivation: predict entity types in Web search, using the entity, user intent, context, and click signals; the query type distribution covers 73 types and is modeled generatively.

Signal from Click

• Joint model for prediction

(Plate-diagram residue: variables t, τ, i, n, c and parameters θ, φ, ω over Q queries; distributions over types, intents, words, hosts, and entities; for each query, the model picks a type, an entity, an intent, context words, and a click.)

Telegraphic Query interpretation [Sawant et al 2013 Joshi et al 2014]

• Entity-seeking telegraphic queries
• Interpretation = segmentation + annotation

Example: the query "Germany capital" should return the result entity "Berlin", combining a knowledge base (accuracy) with a large corpus (recall).

• Overview

Joint Interpretation and Ranking [Sawant et al 2013 Joshi et al 2014]

(Overview: a telegraphic query is matched against an annotated corpus; two models for interpretation and ranking, a generative model and a discriminative model, jointly interpret the query and output ranked candidate entities e1, e2, e3.)

• Generative model

Joint Interpretation and Ranking [Sawant et al 2013]

(Example, borrowed from U. Sawant (2013), based on probabilistic language models: for the query q = "losing team baseball world series 1998", the answer entity E = San Diego Padres of type T = "major league baseball team" is generated from the corpus snippet "Padres have been to two World Series, losing in 1984 and 1998"; the type hint "baseball team" and context matchers such as "lost 1998 world series" explain the query words, with a switch variable Z choosing between the type model and the context model.)

• Discriminative model

Joint Interpretation and Ranking [Sawant et al 2013]

(Example: the query "losing team baseball world series 1998" is paired with candidate interpretations, e.g. the correct entity San_Diego_Padres with t = baseball team versus the incorrect entity 1998_World_Series with t = series.)

Based on max-margin discriminative learning.

• Queries seek answer entities (e2)
• They contain (query) entities (e1), target types (t2), relations (r), and selectors (s)

Telegraphic Query Interpretation [Joshi et al 2014]

Examples (borrowed from M. Joshi (2014)):
  query "dave navarro first band": e1 = dave navarro, r = band, t2 = band, s = first; or e1 = dave navarro, r = -, t2 = band, s = first
  query "spider automobile company": e1 = spider, r = automobile company, t2 = automobile company, s = -; or t2 = automobile company / company, s = spider

Improved Generative Model

• Generative model [Sawant et al 2013] → [Joshi et al 2014]: additionally consider e1 (in q) and r

Improved Discriminative Model

• Discriminative model [Sawant et al 2013] → [Joshi et al 2014]: additionally consider e1 (in q) and r

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

• Input: a short text
• Output: a semantic interpretation
• Three steps in understanding a short text, e.g., "wanna watch eagles band":
  Step 1: Text segmentation: divide the text into a sequence of vocabulary terms: "wanna | watch | eagles | band"
  Step 2: Type detection: determine the best type of each term: watch[verb] eagles[entity] band[concept]
  Step 3: Concept labeling: infer the best concept of each entity within context: watch[verb] eagles[entity](band) band[concept]

Text segmentation
• Observations:
  • Mutual exclusion: terms containing the same word mutually exclude each other
  • Mutual reinforcement: related terms mutually reinforce each other
• Build a Candidate Term Graph (CTG)

(Figures: CTGs for "vacation april in paris", with candidate terms vacation, april in paris, april, paris, and for "watch harry potter", with candidate terms watch, harry potter; edges carry mutual-reinforcement weights such as 0.047, 0.041, 0.029, 0.005 and 0.092, 0.053, 0.018, 0.014.)

Find the best segmentation
• Best segmentation = the sub-graph of the CTG which:
  • is a complete graph (clique), in fact a maximal clique
  • contains no mutually exclusive terms
  • has 100% word coverage (except for stopwords)
  • has the largest average edge weight

(Figures: in the "vacation april in paris" CTG, both {vacation, april, paris} and {vacation, april in paris} are segmentations; the maximal clique {vacation, april in paris} has the larger average edge weight and is the best segmentation.)
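A sketch of the clique search on a toy candidate term graph, assuming networkx is available; mutual exclusion is encoded simply by not connecting conflicting terms, and all edge weights are illustrative:

```python
# Sketch: best segmentation = maximal clique with full word coverage and
# the largest average edge weight.
import itertools
import networkx as nx

g = nx.Graph()
g.add_edge("vacation", "april in paris", weight=0.047)
g.add_edge("vacation", "april", weight=0.029)
g.add_edge("vacation", "paris", weight=0.041)
g.add_edge("april", "paris", weight=0.005)      # "april in paris" excludes "april"/"paris"
words, stopwords = {"vacation", "april", "in", "paris"}, {"in"}

def covers_all_words(terms):
    return {w for t in terms for w in t.split()} >= (words - stopwords)

best, best_score = None, -1.0
for clique in nx.find_cliques(g):               # maximal cliques only
    if len(clique) < 2 or not covers_all_words(clique):
        continue
    weights = [g[u][v]["weight"] for u, v in itertools.combinations(clique, 2)]
    score = sum(weights) / len(weights)
    if score > best_score:
        best, best_score = clique, score

print(best, best_score)   # {"vacation", "april in paris"} wins on average weight
```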

Type Detection

• Pairwise model: find the best typed term for each term so that the maximum spanning tree of the resulting sub-graph over typed terms has the largest weight.

(Example: for "watch free movie", the candidate typed terms are watch[v] / watch[e] / watch[c], free[adj] / free[v], and movie[c] / movie[e].)
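A brute-force sketch of the pairwise idea: pick one typed term per term so that the total pairwise affinity is maximal. This is a simplified stand-in for the maximum-spanning-tree criterion, and the affinity scores are invented:

```python
# Sketch: choose the best type assignment by maximizing total pairwise affinity.
import itertools

candidates = {"watch": ["watch[v]", "watch[e]", "watch[c]"],
              "free":  ["free[adj]", "free[v]"],
              "movie": ["movie[c]", "movie[e]"]}

affinity = {("watch[v]", "movie[c]"): 0.9, ("free[adj]", "movie[c]"): 0.7,
            ("watch[v]", "free[adj]"): 0.3}            # unlisted pairs score 0

def score(assignment):
    return sum(affinity.get((a, b), affinity.get((b, a), 0.0))
               for a, b in itertools.combinations(assignment, 2))

best = max(itertools.product(*candidates.values()), key=score)
print(best)   # ('watch[v]', 'free[adj]', 'movie[c]')
```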

Concept Labeling

• Entity disambiguation is the most important task of concept labeling: filter and re-rank the entity's original concept cluster vector.
• Weighted vote: the final score of each concept cluster is a combination of its original score and the support from context, computed with concept co-occurrence.

Example: "watch harry potter" → movie; "read harry potter" → book.
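A minimal weighted-vote sketch: the final score of each concept cluster mixes its original score with co-occurrence support from the context (the mixing weight alpha and all scores are illustrative):

```python
# Sketch: re-rank an entity's concept clusters with support from context concepts.
def weighted_vote(concepts, context_concepts, cooccur, alpha=0.5):
    reranked = {}
    for c, score in concepts.items():
        support = sum(w * cooccur.get((c, ctx), cooccur.get((ctx, c), 0.0))
                      for ctx, w in context_concepts.items())
        reranked[c] = alpha * score + (1 - alpha) * support
    return sorted(reranked.items(), key=lambda kv: -kv[1])

harry_potter = {"movie": 0.4, "book": 0.4, "game": 0.2}      # original concept scores
context = {"watching activity": 1.0}                          # from the verb "watch"
cooccur = {("movie", "watching activity"): 0.8,
           ("book", "watching activity"): 0.3,
           ("game", "watching activity"): 0.4}
print(weighted_vote(harry_potter, context, cooccur))          # "movie" now ranks first
```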

Example of Entity Disambiguation [Hua et al 2015 (ICDE Best Paper), Hua et al 2016]

(Pipeline: the short text is parsed; terms are clustered by isA; concepts are filtered by co-occurrence; head/modifier analysis and concept orthogonalization yield a concept vector [(c1, p1), (c2, p2), (c3, p3), …] over the co-occurrence and semantic networks. Example "ipad apple": isA lookup gives fruit, company, food, product, … for "apple" and product, device, … for "ipad"; co-occurrence-based filtering keeps product, brand, company, device and drops fruit and food.)

Mining Lexical Relationships [Wang et al 2015b]

• Lexical knowledge is represented by probabilities. For "watch harry potter":
  p(verb | watch), p(instance | watch): the role of a term;
  p(movie | harry potter), p(book | harry potter): the concepts of an instance;
  p(movie | watch, verb): the concept of a term given its role.
• In general: ① p(z | t), the role z of a term t; ② p(c | t, z), the concept c of a term in role z, with p(c | e) = p(c | t, z = instance); ③ here e is an instance, t a term, c a concept, and z a role.

Understanding Queries [Wang et al 2015b]

• Goal: rank the concepts and find $\arg\max_{c} p(c \mid t, q)$.
• The offline semantic network and the query's possible segmentations define an online subgraph; concepts are ranked by random walk with restart [Sun et al 2005] on the online subgraph.
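A small random-walk-with-restart sketch on a toy online subgraph, using plain power iteration; the nodes, edge weights, restart vector, and restart probability are all illustrative:

```python
# Sketch: rank concepts for a query by random walk with restart on its subgraph.
import numpy as np

nodes = ["watch", "harry potter", "movie", "book"]
W = np.array([[0.0, 1.0, 0.8, 0.1],     # symmetric toy edge weights
              [1.0, 0.0, 0.9, 0.9],
              [0.8, 0.9, 0.0, 0.2],
              [0.1, 0.9, 0.2, 0.0]])
P = W / W.sum(axis=0, keepdims=True)     # column-stochastic transition matrix

restart = np.array([0.5, 0.5, 0.0, 0.0]) # restart mass on the query terms
r = restart.copy()
for _ in range(100):                     # power iteration until (approximate) convergence
    r = 0.15 * restart + 0.85 * (P @ r)

print(dict(zip(nodes, r.round(3))))      # higher mass = more plausible concept
```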

Short Text Understanding

• How to segment this short text?
• What does this short text mean (its intent, senses, or concepts)?
• What are the relations among terms in the short text?
• How to calculate the similarity between short texts?

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

• Example: "popular smart cover iphone 5s"
• Definitions:
  • Head: names the general (semantic) category to which the whole short text belongs; usually the head is the intent of the short text. Here "smart cover" is the intent of the query.
  • Constraints: distinguish this member from other members of the same category. Here "iphone 5s" limits the type of the head.
  • Non-constraint modifiers (aka pure modifiers): subjective modifiers that can be dropped without changing the intent. Here "popular" is subjective and can be neglected.

Non-Constraint Modifier Mining: Construct Modifier Networks

Edges form a Modifier Network.

(Figure: the concept hierarchy tree in the "Country" domain, with Country and sub-concepts such as Asian country, Developed country, Western country, Western developed country, Large Asian country, Large developed country, Top developed country, Top western country, induces a modifier network over the modifiers Asian, Developed, Western, Large, Top and the head Country. In this case "Large" and "Top" are pure modifiers.)

Non-Constraint Modifier Mining: Betweenness Centrality

• Betweenness centrality is a measure of a node's centrality in a network.
• The betweenness of node v is defined as $g(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$, where $\sigma_{st}$ is the total number of shortest paths from node s to node t and $\sigma_{st}(v)$ is the number of those paths that pass through v.
• Normalization & aggregation: a pure modifier should have a low aggregated betweenness-centrality score PMS(t).
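A sketch of the centrality computation on a toy modifier network for the "Country" domain, assuming networkx; the edge set is illustrative, and in it "Large" and "Top" come out with (near-)zero betweenness, marking them as pure-modifier candidates:

```python
# Sketch: pure modifiers have low betweenness centrality in the modifier network.
import networkx as nx

g = nx.Graph()
g.add_edges_from([("Country", "Asian"), ("Country", "Developed"), ("Country", "Western"),
                  ("Asian", "Developed"), ("Western", "Developed"),
                  ("Large", "Asian"), ("Large", "Developed"),
                  ("Top", "Western"), ("Top", "Developed")])

bc = nx.betweenness_centrality(g, normalized=True)
for modifier in ["Asian", "Developed", "Western", "Large", "Top"]:
    print(modifier, round(bc[modifier], 3))   # low score -> candidate pure modifier
```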

Head-Constraints Mining [Wang et al 2014b]

• A term can be a head in some cases and a constraint in others.
• E.g., "Seattle hotel": Seattle = constraint, hotel = head; "Seattle hotel job": Seattle = constraint, hotel = constraint, job = head.

Head-Constraints Mining: Acquiring Concept Patterns

Building the concept pattern dictionary from query logs:
1. Get entity pairs from the query log by extracting preposition patterns ("A for B", "A of B", "A with B", "A in B", "A on B", "A at B", …), e.g., "cover for iphone 6s", "battery for sony a7r", "wicked on broadway"; entity 1 is the head, entity 2 the constraint.
2. Conceptualize both entities: entity 1 → concept11, concept12, concept13, concept14; entity 2 → concept21, concept22, concept23.
3. Collect the cross-product concept patterns (concept11, concept21), (concept11, concept22), (concept11, concept23), … into a concept pattern dictionary.
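A sketch of this dictionary-building step: extract preposition patterns from query-log strings, conceptualize both entities (a toy isA lookup stands in for Probase here), and count the resulting concept patterns:

```python
# Sketch: build (head-concept, constraint-concept) patterns from query-log entity pairs.
import re
from collections import Counter
from itertools import product

queries = ["cover for iphone 6s", "battery for sony a7r", "wicked on broadway"]
PATTERN = re.compile(r"^(.+?)\s+(for|of|with|in|on|at)\s+(.+)$")

def conceptualize(entity):                 # toy stand-in for a Probase-style isA lookup
    toy = {"cover": ["accessory"], "iphone 6s": ["phone", "device"],
           "battery": ["accessory"], "sony a7r": ["camera", "device"],
           "wicked": ["musical", "show"], "broadway": ["venue", "place"]}
    return toy.get(entity, [])

concept_patterns = Counter()
for q in queries:
    m = PATTERN.match(q)
    if not m:
        continue
    head, _, constraint = m.group(1), m.group(2), m.group(3)
    for c1, c2 in product(conceptualize(head), conceptualize(constraint)):
        concept_patterns[(c1, c2)] += 1

print(concept_patterns.most_common())
```

As discussed earlier, the resulting patterns are then generalized to basic-level concepts (BLC) so that they are neither too general nor too specific.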


Concept SpaceArticle titlestagsin this category

119901119894

119901119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

hellip

hellip

120596119894

120596119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

120596119894

120596119895119901119894

119901119895

Precision performance on each category [Wang et al 2014a]

BocSTC LM_ch SVMVSM_cosi

neLM_d Entity_ESA

Movie 071 091 084 081 072 056

Money 097 095 054 057 052 074

Music 097 090 088 073 068 058

TV 096 046 092 056 051 055

0304050607080910

Pre

cisi

on

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledgeltigt in SIGMOD 2008

bull [ Auer et al 2007 ] Soumlren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Boumlrschinger Caumlcilia Zirn and AnasElghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and XindongWu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny BoyesBraem Basic objects in natural categories Cognitive psychology 8(3)382ndash439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny ZhuToward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 58: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Supervised Segmentation [Bergsma et al 2007]

• Problem: divide a query into semantic units
• Approach: turn segmentation into position-based binary classification

Example query: "two man power saw"
Candidate segmentations: [two man] [power saw] / [two] [man] [power saw] / [two] [man power] [saw]

Input: a query and its word-boundary positions
Output: the decision of whether to place a segment break at each position

Supervised Segmentation

• Features:
  • Decision-boundary features — e.g., indicator features (is the boundary word "the"?), POS tags in the query, position features (forward/backward)
  • Statistical features — e.g., mutual information between the left and right parts, as in "bank loan | amortization schedule"
  • Context features — e.g., surrounding context information, as in "female bus driver"
  • Dependency features — e.g., whether words across the boundary depend on each other ("female" depends on "driver")

Supervised Segmentation

• Segmentation overview (figure): the input query "two man power saw" is converted into learning features at each position, an SVM classifier scores each position, and the output is a yes/no segmentation decision per position.
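The following is a minimal sketch of this position-based classification, assuming scikit-learn and a toy feature set (a PMI-style statistic plus position offsets); the counts, training data, and features are illustrative stand-ins, not Bergsma and Wang's actual feature templates.

```python
# Minimal sketch of position-based binary classification for query
# segmentation. Each boundary between adjacent tokens gets a feature
# vector; a linear SVM decides "segment break here" (1) or not (0).
# Counts and training data below are toy stand-ins.
import math
from sklearn.svm import LinearSVC

TOKEN_COUNTS = {"two": 50, "man": 40, "power": 30, "saw": 20}
PAIR_COUNTS = {("two", "man"): 25, ("power", "saw"): 15, ("man", "power"): 1}
TOTAL = 1000

def boundary_features(tokens, i):
    """Features for the boundary between tokens[i] and tokens[i+1]."""
    left, right = tokens[i], tokens[i + 1]
    p_pair = (PAIR_COUNTS.get((left, right), 0) + 1) / (TOTAL + 1)
    p_left = (TOKEN_COUNTS.get(left, 0) + 1) / (TOTAL + 1)
    p_right = (TOKEN_COUNTS.get(right, 0) + 1) / (TOTAL + 1)
    pmi = math.log(p_pair / (p_left * p_right))        # statistical feature
    return [pmi, i + 1, len(tokens) - (i + 1)]         # + position features

# Toy training data: gold break decisions for each boundary position.
train = [(["two", "man", "power", "saw"], [0, 1, 0])]
X = [boundary_features(t, i) for t, ys in train for i in range(len(ys))]
y = [lab for _, ys in train for lab in ys]

clf = LinearSVC().fit(X, y)
query = ["two", "man", "power", "saw"]
print([int(clf.predict([boundary_features(query, i)])[0])
       for i in range(len(query) - 1)])   # 1 = break after position i
```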

Unsupervised Segmentation [Tan et al 2008]

• Unsupervised learning for query segmentation

Probability of a segmentation $S$ for query $Q$, approximated by a unigram model over segments:
$$P(S \mid Q) = P(s_1)\,P(s_2 \mid s_1)\cdots P(s_m \mid s_1 s_2 \cdots s_{m-1}) \approx \prod_{s_i \in S} P(s_i)$$

A split point is a valid segment boundary if and only if the pointwise mutual information between the two segments resulting from the split is negative:
$$MI(s_k, s_{k+1}) = \log \frac{P_c([s_k\, s_{k+1}])}{P_c(s_k)\cdot P_c(s_{k+1})} < 0$$

Example: in "new york times subscription", $\log \frac{P_c([\text{new york}])}{P_c(\text{new})\cdot P_c(\text{york})} > 0$, so there is no segment boundary between "new" and "york".

Unsupervised Segmentation

• Find the top-k segmentations by dynamic programming
• Use EM optimization on the fly

Input: a query $w_1 w_2 \cdots w_n$ (the words in the query) and the concept probability distribution
Output: the top-k segmentations with the highest likelihood
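As a concrete illustration of the unigram model, here is a small dynamic-programming sketch that returns the single best segmentation (a top-k extension would keep k back-pointers per position); the segment probabilities are hypothetical toy values rather than corpus estimates.

```python
# Sketch of unigram-model segmentation: choose the split of the query that
# maximizes prod_i P(s_i), via dynamic programming over end positions.
# SEG_PROB holds hypothetical segment probabilities; a real system would
# estimate them from a large corpus / concept distribution.
import math

SEG_PROB = {
    "new": 1e-4, "york": 8e-5, "times": 2e-4, "subscription": 5e-5,
    "new york": 9e-5, "new york times": 6e-5, "york times": 1e-7,
}

def best_segmentation(words, max_len=3):
    n = len(words)
    best = [(-math.inf, None)] * (n + 1)      # (log-likelihood, back-pointer)
    best[0] = (0.0, None)
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            p = SEG_PROB.get(" ".join(words[i:j]))
            if p is None or best[i][0] == -math.inf:
                continue
            score = best[i][0] + math.log(p)
            if score > best[j][0]:
                best[j] = (score, i)
    segs, j = [], n                            # recover via back-pointers
    while j > 0:
        i = best[j][1]
        segs.append(" ".join(words[i:j]))
        j = i
    return list(reversed(segs)), best[n][0]

print(best_segmentation("new york times subscription".split()))
# -> (['new york times', 'subscription'], log-likelihood)
```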

Exploit Click-through [Li et al 2011]

• Motivation:
  • Probabilistic query segmentation
  • Use click-through data (Q → URL → D: query, clicked URL, document)

Input query: "bank of america online banking"
Output, top-3 segmentations:
  [bank of america] [online banking]   0.502
  [bank of america online banking]   0.428
  [bank of] [america] [online banking]   0.001

Exploit Click-through

• Segmentation model: an interpolated model combining global information with click-through information.

Example — query "[credit card] [bank of america]"; clicked HTML documents:
1. bank of america credit cards contact us overview
2. secured visa credit card from bank of america
3. credit cards overview: find the right bank of america credit card for you

Short Text Understanding

• How to segment this short text?
• What does this short text mean (its intent, senses, or concepts)?
• What are the relations among terms in the short text?
• How to calculate the similarity between short texts?

Sense Changes with Different Context

"watch harry potter" → Movie; "read harry potter" → Book; "age harry potter" → Character; "harry potter walkthrough" → Game

Entity Recognition in Query [Guo et al 2009]

• Motivation: detect the named entity in a short text and categorize it.

Example: the single-named-entity query "harry potter walkthrough" is represented as a triple <e, t, c> = ("harry potter", "walkthrough", "game"), where e is the (ambiguous) entity term, t the context terms, and c the class of the entity.

Entity Recognition in Query

• Probabilistic generative model

Goal: given a query q, find the triple <e, t, c> that maximizes the probability of generating the query. Assuming the context depends only on the class — e.g., "walkthrough" depends on the class game rather than on "harry potter" itself — the probability of generating a triple factorizes as Pr(e, t, c) = Pr(e) · Pr(c|e) · Pr(t|c). The problem then becomes how to estimate Pr(e), Pr(c|e), and Pr(t|c).
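A small sketch of how such a factorized model can score candidate triples; the probability tables and the simple string-based split into entity and context are hypothetical, and the real model estimates these distributions from query logs (see the next slide).

```python
# Sketch of scoring <e, t, c> triples with Pr(e,t,c) = Pr(e)*Pr(c|e)*Pr(t|c).
# The probability tables are toy values; the entity/context split is a
# naive string replacement, just to keep the example self-contained.
P_E = {"harry potter": 0.6, "houston": 0.4}
P_C_GIVEN_E = {"harry potter": {"book": 0.5, "movie": 0.3, "game": 0.2}}
P_T_GIVEN_C = {"book": {"read": 0.4}, "movie": {"watch": 0.5},
               "game": {"walkthrough": 0.7}}

def best_triple(query):
    candidates = []
    for e, p_e in P_E.items():
        if e not in query:
            continue
        t = query.replace(e, "").strip()              # remaining context terms
        for c, p_ce in P_C_GIVEN_E.get(e, {}).items():
            p = p_e * p_ce * P_T_GIVEN_C.get(c, {}).get(t, 1e-6)
            candidates.append((p, (e, t, c)))
    return max(candidates) if candidates else None

print(best_triple("harry potter walkthrough"))
# -> (0.084, ('harry potter', 'walkthrough', 'game'))
```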

Entity Recognition in Query

• Probability estimation by learning

Learning objective:
$$\max \prod_{i=1}^{N} P(e_i, t_i, c_i)$$

Challenge: it is difficult and time-consuming to manually assign class labels to named entities in queries. Instead, build a training set $T = \{(e_i, t_i)\}$ and view $c_i$ as a hidden variable. The new learning problem becomes
$$\max \prod_{i=1}^{N} P(e_i, t_i) = \max \prod_{i=1}^{N} \sum_{c} P(e_i)\, P(c \mid e_i)\, P(t_i \mid c),$$
solved with the topic model WS-LDA.

Signal from Click [Pantel et al 2012]

• Motivation: predict the entity type in Web search.

(Figure: a generative model relates the entity, user intent, context, and click, producing a query type distribution over 73 types.)

Signal from Click

• Joint model for prediction

(Figure, plate diagram: for each query — pick a type from the distribution over types, pick an entity from the entity distribution, pick an intent from the intent distribution, pick context words from the word distribution, and pick a click from the host distribution.)

Telegraphic Query interpretation [Sawant et al 2013 Joshi et al 2014]

• Entity-seeking telegraphic queries, e.g., the query "Germany capital" should return the result entity "Berlin".
• Interpretation = segmentation + annotation
• Combines a knowledge base (for accuracy) with a large corpus (for recall).

Joint Interpretation and Ranking [Sawant et al 2013, Joshi et al 2014]

• Overview (figure): a telegraphic query and an annotated corpus feed two models for interpretation and ranking — a generative model and a discriminative model — whose output is a ranked list of candidate answer entities (e1, e2, e3, …).

Joint Interpretation and Ranking [Sawant et al 2013]

• Generative model, based on probabilistic language models (figure borrowed from U. Sawant, 2013): for the query q = "losing team baseball world series 1998", a candidate answer entity E (San Diego Padres, a major league baseball team) generates the type hint "baseball team" through its type T, while context matchers link the remaining query words to corpus context such as "Padres have been to two World Series, losing in 1984 and 1998"; a switch variable Z decides whether each query word comes from the type model or the context model.

Joint Interpretation and Ranking [Sawant et al 2013]

• Discriminative model, based on max-margin discriminative learning (figure): for the query "losing team baseball world series 1998", the correct entity San_Diego_Padres is paired with the interpretation t = baseball team, while an incorrect entity such as 1998_World_Series is paired with t = series; the ranker is trained to score correct (entity, interpretation) pairs above incorrect ones.

Telegraphic Query Interpretation [Joshi et al 2014]

• Queries seek answer entities (e2).
• Queries contain (query) entities (e1), target types (t2), relations (r), and selectors (s).

query | e1 | r | t2 | s
dave navarro first band | dave navarro | band | band | first
dave navarro first band | dave navarro | - | band | first
spider automobile company | spider | automobile company | automobile company | -
spider automobile company | - | automobile company | company | spider

(Borrowed from M. Joshi, 2014)

Improved Generative Model

• The generative model of [Sawant et al 2013] is extended in [Joshi et al 2014] to also consider the query entity e1 (in q) and the relation r.

Improved Discriminative Model

• Likewise, the discriminative model of [Sawant et al 2013] is extended in [Joshi et al 2014] to consider e1 (in q) and r.

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

• Input: a short text
• Output: a semantic interpretation
• Three steps in understanding a short text, e.g., "wanna watch eagles band" → watch[verb] eagles[entity](band) band[concept]:

Step 1: Text segmentation – divide the text into a sequence of terms in the vocabulary ("wanna watch eagles band" → watch, eagles, band)
Step 2: Type detection – determine the best type of each term (watch[verb], eagles[entity], band[concept])
Step 3: Concept labeling – infer the best concept of each entity within context (eagles[entity] → the concept band)

Text Segmentation

• Observations:
  • Mutual exclusion – terms containing the same word mutually exclude each other
  • Mutual reinforcement – related terms mutually reinforce each other
• Build a Candidate Term Graph (CTG)

(Figure: CTGs for "vacation april in paris" — candidate terms vacation, april in paris, april, paris — and for "watch harry potter" — candidate terms watch, harry potter — with weighted edges between mutually compatible terms.)

Find the Best Segmentation

• The best segmentation is the sub-graph of the CTG which:
  • is a complete graph (a maximal clique),
  • contains no mutually exclusive terms,
  • has 100% word coverage (except for stopwords),
  • has the largest average edge weight.

(Figure: among the maximal cliques of the CTGs for "vacation april in paris" and "watch harry potter", the clique satisfying these conditions is chosen as the best segmentation. A sketch follows.)
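Below is a sketch of this clique-based search using networkx; the candidate term graph, its edge weights, and the stopword list are hypothetical, and mutual exclusion is modeled simply by not connecting conflicting terms.

```python
# Sketch: the best segmentation is the maximal clique of the candidate term
# graph (CTG) with full coverage of the non-stopwords and the largest
# average edge weight. Weights below are hypothetical relatedness scores.
import networkx as nx
from itertools import combinations

def best_segmentation(ctg, words, stopwords=frozenset({"in"})):
    content = set(words) - stopwords
    best, best_score = None, float("-inf")
    for clique in nx.find_cliques(ctg):                 # maximal cliques
        covered = {w for term in clique for w in term.split()}
        if not content.issubset(covered):               # need 100% coverage
            continue
        edges = list(combinations(clique, 2))
        score = (sum(ctg[u][v]["weight"] for u, v in edges) / len(edges)
                 if edges else 0.0)
        if score > best_score:
            best, best_score = clique, score
    return best, best_score

ctg = nx.Graph()   # candidate terms for "vacation april in paris";
# mutually exclusive terms (e.g. "april in paris" vs. "april") share no edge.
ctg.add_edge("vacation", "april in paris", weight=0.09)
ctg.add_edge("vacation", "april", weight=0.03)
ctg.add_edge("vacation", "paris", weight=0.05)
ctg.add_edge("april", "paris", weight=0.04)

print(best_segmentation(ctg, "vacation april in paris".split()))
# e.g. (['vacation', 'april in paris'], 0.09)
```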

Type Detection

• Pairwise model: find the best typed-term for each term so that the maximum spanning tree of the resulting sub-graph between typed-terms has the largest weight.

(Figure: for "watch free movie", the candidate typed-terms are watch[v], watch[e], watch[c]; free[adj], free[v]; movie[c], movie[e]; one typed-term is selected per term.)
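A brute-force sketch of this pairwise model using networkx's maximum spanning tree; the candidate typed-terms and their pairwise affinity scores are hypothetical.

```python
# Sketch of the pairwise model for type detection: choose one typed-term
# per term so that the maximum spanning tree over the chosen typed-terms
# has the largest total weight. Affinity scores are hypothetical.
import networkx as nx
from itertools import product

CANDIDATES = {                       # term -> candidate typed-terms
    "watch": ["watch[v]", "watch[e]", "watch[c]"],
    "free":  ["free[adj]", "free[v]"],
    "movie": ["movie[c]", "movie[e]"],
}
AFFINITY = {                         # hypothetical pairwise scores
    ("watch[v]", "movie[c]"): 0.9, ("watch[v]", "free[adj]"): 0.4,
    ("free[adj]", "movie[c]"): 0.7, ("watch[e]", "movie[e]"): 0.2,
}

def score(a, b):
    return AFFINITY.get((a, b), AFFINITY.get((b, a), 0.01))

def detect_types(candidates):
    best, best_w = None, float("-inf")
    for combo in product(*candidates.values()):      # one typed-term per term
        g = nx.Graph()
        for a, b in product(combo, repeat=2):
            if a < b:
                g.add_edge(a, b, weight=score(a, b))
        mst = nx.maximum_spanning_tree(g)
        w = mst.size(weight="weight")
        if w > best_w:
            best, best_w = combo, w
    return best, best_w

print(detect_types(CANDIDATES))
# -> (('watch[v]', 'free[adj]', 'movie[c]'), 1.6)
```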

Concept Labeling

• Entity disambiguation is the most important task of concept labeling: filter and re-rank the original concept cluster vector.
• Weighted vote: the final score of each concept cluster is a combination of its original score and the support from context, using concept co-occurrence.

Example: "watch harry potter" → movie; "read harry potter" → book.
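A sketch of the weighted vote, assuming we already have a context-free concept distribution for the entity, concept scores for the context terms, and a concept co-occurrence table — all toy values — combined with a mixing weight λ.

```python
# Sketch of weighted-vote concept labeling: the final score of each concept
# cluster of an entity combines its original (context-free) score with
# support from the context, measured by concept co-occurrence.

def weighted_vote(original, context_concepts, cooccur, lam=0.5):
    """original: {concept: score} for the ambiguous entity.
    context_concepts: {concept: weight} derived from the surrounding terms.
    cooccur: {(concept_a, concept_b): strength}."""
    final = {}
    for c, s in original.items():
        support = sum(w * cooccur.get((c, ctx), cooccur.get((ctx, c), 0.0))
                      for ctx, w in context_concepts.items())
        final[c] = lam * s + (1 - lam) * support
    return max(final, key=final.get), final

# "watch harry potter": the verb sense of "watch" supports movie over book.
original = {"movie": 0.45, "book": 0.55}
context = {"verb(watch)": 1.0}
cooccur = {("movie", "verb(watch)"): 0.8, ("book", "verb(watch)"): 0.2}
print(weighted_vote(original, context, cooccur))
# -> ('movie', {'movie': 0.625, 'book': 0.375})
```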

Example of Entity Disambiguation[Hua et al 2015 (ICDE Best Paper) Hua et al 2016]

(Figure: the short text "ipad apple" is parsed and conceptualized against a semantic (is-A) network and a co-occurrence network. The pipeline — parsing, term clustering by is-A, concept filtering by co-occurrence, head/modifier analysis, concept orthogonalization — keeps the concepts of "apple" that co-occur with the concepts of "ipad" (product, device), such as company and brand, while filtering out fruit and food, yielding a concept vector [(c1, p1), (c2, p2), (c3, p3), …].)

Mining Lexical Relationships[Wang et al 2015b]

• Lexical knowledge is represented by probabilities — e.g., for "watch harry potter": p(verb | watch), p(instance | watch), p(movie | watch, verb), p(movie | harry potter), p(book | harry potter).

• Three kinds of probabilities are used: ① p(c | t, z), ② p(c | e) = p(c | t, z = instance), ③ p(z | t), where e is an instance, t a term, c a concept, and z a role.

Understanding Queries [Wang et al 2015b]

• Goal: rank the concepts and find $\arg\max_c p(c \mid t, q)$.

(Figure: the query is expanded into all possible segmentations, which select an online subgraph of the offline semantic network; concepts are then ranked by random walk with restart [Sun et al 2005] on the online subgraph. A sketch follows.)
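A tiny sketch of random walk with restart on such an online subgraph, with a hypothetical graph over the terms and concepts of "watch harry potter"; the restart vector places mass on the query's term nodes and concepts are ranked by their stationary scores.

```python
# Sketch of random walk with restart (RWR) for concept ranking on a small
# hypothetical subgraph: r <- (1 - alpha) * A r + alpha * restart.
import numpy as np

nodes = ["watch", "harry potter", "movie", "book", "verb"]
edges = [(0, 2), (0, 4), (1, 2), (1, 3)]        # undirected toy subgraph

A = np.zeros((len(nodes), len(nodes)))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
A /= A.sum(axis=0, keepdims=True)               # column-stochastic

def rwr(A, restart, alpha=0.3, iters=100):
    r = restart.copy()
    for _ in range(iters):
        r = (1 - alpha) * A @ r + alpha * restart
    return r

restart = np.zeros(len(nodes))
restart[[0, 1]] = 0.5                           # restart on the query terms
scores = rwr(A, restart)

concepts = {"movie", "book", "verb"}
print(sorted(((round(float(scores[i]), 3), n) for i, n in enumerate(nodes)
              if n in concepts), reverse=True))
# "movie" ranks highest: it is connected to both query terms.
```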

Short Text Understanding

• How to segment this short text?
• What does this short text mean (its intent, senses, or concepts)?
• What are the relations among terms in the short text?
• How to calculate the similarity between short texts?

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

• Example: "popular smart cover iphone 5s"
• Definitions:
  • Head: names the general (semantic) category to which the whole short text belongs; usually the head is the intent of the short text — here "smart cover" is the intent of the query.
  • Constraints: distinguish this member from other members of the same category — here "iphone 5s" limits the type of the head.
  • Non-constraint modifiers (a.k.a. pure modifiers): subjective modifiers which can be dropped without changing the intent — here "popular" is subjective and can be neglected.

Non-Constraint Modifiers Mining: Construct Modifier Networks

(Figure: a concept hierarchy tree in the "Country" domain — country, Asian country, developed country, Western country, Western developed country, large Asian country, large developed country, top developed country, top western country, … — is converted into a modifier network whose nodes are Country and the modifiers Asian, Western, Developed, Large, Top, with edges between modifiers that co-occur. In this case "Large" and "Top" are pure modifiers.)

Non-Constraint Modifiers Mining: Betweenness Centrality

• Betweenness centrality is a measure of a node's centrality in a network. The betweenness of node $v$ is defined as
$$g(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$$
  where $\sigma_{st}$ is the total number of shortest paths from node $s$ to node $t$, and $\sigma_{st}(v)$ is the number of those paths that pass through $v$.
• Normalization & aggregation: a pure modifier should have a low aggregated betweenness-centrality score $PMS(t)$.
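A small illustration with networkx: compute normalized betweenness centrality on a toy modifier network for the "Country" domain (edges chosen for illustration only); pure modifiers such as "large" and "top" end up with low scores, which the $PMS(t)$ aggregation would then combine across domains.

```python
# Sketch of ranking candidate pure modifiers by normalized betweenness
# centrality in a (toy) modifier network: pure modifiers sit on few
# shortest paths between other nodes.
import networkx as nx

g = nx.Graph()
g.add_edges_from([
    ("country", "asian"), ("country", "developed"), ("country", "western"),
    ("asian", "developed"), ("western", "developed"),
    ("large", "country"), ("top", "country"),      # pure modifiers hang off
])

bc = nx.betweenness_centrality(g, normalized=True)
# Lower aggregated betweenness -> more likely a pure (non-constraint) modifier.
print(sorted(bc.items(), key=lambda kv: kv[1]))
# "large" and "top" come out with the lowest scores
```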

Head-Constraints Mining [Wang et al 2014b]

• A term can be a head in some cases and a constraint in others.
• E.g., in "Seattle hotel", "hotel" is the head and "Seattle" a constraint; in "Seattle hotel job", "job" is the head while "Seattle" and "hotel" are constraints.

Head-Constraints Mining: Acquiring Concept Patterns

(Figure: building the concept pattern dictionary from query logs. From queries such as "cover for iphone 6s", "battery for sony a7r", "wicked on broadway", extract preposition patterns ("A for B", "A of B", "A with B", "A in B", "A on B", "A at B", …) to obtain entity pairs, with entity1 as the head and entity2 as the constraint. Conceptualize each entity (entity1 → concept11, concept12, concept13, concept14, …; entity2 → concept21, concept22, concept23, …) and enumerate concept pairs — (concept11, concept21), (concept11, concept22), (concept11, concept23), … — which form the concept pattern dictionary.)

Why Concepts Can't Be Too General

• It may cause too many concept pattern conflicts: head and modifier cannot be distinguished for general concept pairs.

Derived concept pattern (Head, Modifier) = (device, company); supporting entity pairs: (iphone 4, verizon), (modem, comcast), (wireless router, comcast), (iphone 4, tmobile).

Derived concept pattern (Head, Modifier) = (company, device); supporting entity pairs: (amazon books, kindle), (netflix, touchpad), (skype, windows phone), (netflix, ps3).

The two derived patterns conflict.

Why Concepts Can't Be Too Specific

• It may generate concepts with little coverage: the concept regresses to an entity.
• It requires large storage space — up to (million × million) patterns, e.g.:
  (device, largest desktop OS vendor)
  (device, largest software development company)
  (device, largest global corporation)
  (device, latest windows and office provider)
  …

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept Patterns

Cluster size | Sum of cluster score | Head / Constraint | Score
615 | 2114691 | breed / state | 357298460224501
296 | 7752357 | game / platform | 627403476771856
153 | 3466804 | accessory / vehicle | 53393705094809
70 | 118259 | browser / platform | 132612807637391
22 | 1010993 | requirement / school | 271407526294823
34 | 9489159 | drug / disease | 154602405333541
42 | 8992995 | cosmetic / skin condition | 814659415003929
16 | 7421599 | job / city | 27903732555528
32 | 710403 | accessory / phone | 246513830851194
18 | 6692376 | software / platform | 210126322725878
20 | 6444603 | test / disease | 239774028397537
27 | 5994205 | clothes / breed | 98773996282851
19 | 5913545 | penalty / crime | 200544192793488
25 | 5848804 | tax / state | 240081818612579
16 | 5465424 | sauce / meat | 183592863621553
18 | 4809389 | credit card / country | 142919087972152
14 | 4730792 | food / holiday | 14554140330924
11 | 4536199 | mod / game | 257163856882439
29 | 4350954 | garment / sport | 471533326845442
23 | 3994886 | career information / professional | 732726483731257
15 | 386065 | song / instrument | 128189481818135
18 | 378213 | bait / fish | 780426514113169
22 | 3722948 | study guide / book | 508339765053921
19 | 3408953 | plugins / browser | 550326072627126
14 | 3305753 | recipe / meat | 882779863422951
18 | 3214226 | currency / country | 110825444188352
13 | 3180272 | lens / camera | 186081673263957
9 | 316973 | decoration / holiday | 130055844126533
16 | 314875 | food / animal | 7338544366514

Patterns in the game/platform cluster: (game, platform), (game, device), (video game, platform), (game console, game pad), (game, gaming platform).

Game (Head) | Platform (Modifier)
angry birds | android
angry birds | ios
angry birds | windows 10
… | …

Head-Modifier Relationship Detection

• Train a classifier on (head-embedding, modifier-embedding) pairs.
• Training data:
  • Positive: (head, modifier)
  • Negative: (modifier, head)
• Precision >= 0.9, recall >= 0.9.
• Disadvantage: not interpretable.
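A sketch of this classifier with scikit-learn: concatenated (head, modifier) embeddings as positives and the swapped order as negatives. The embeddings below are random stand-ins, so the code only illustrates the setup, not the reported 0.9 precision/recall.

```python
# Sketch of head-modifier detection: train a binary classifier on the
# concatenation (head-embedding, modifier-embedding); swapped pairs are
# negatives. Random vectors stand in for real word embeddings.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
emb = {w: rng.normal(size=20) for w in
       ["angry birds", "android", "cover", "iphone 5s", "recipe", "chicken"]}

head_modifier_pairs = [("angry birds", "android"),
                       ("cover", "iphone 5s"),
                       ("recipe", "chicken")]

X, y = [], []
for head, mod in head_modifier_pairs:
    X += [np.concatenate([emb[head], emb[mod]]),   # positive: (head, modifier)
          np.concatenate([emb[mod], emb[head]])]   # negative: swapped order
    y += [1, 0]
clf = LogisticRegression(max_iter=1000).fit(np.array(X), y)

test = np.concatenate([emb["cover"], emb["iphone 5s"]]).reshape(1, -1)
print(clf.predict(test))   # [1] -> the first slot is predicted to be the head
```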

Syntactic Parsing based on HM

• The head-modifier information is incomplete:
  • prepositions and other function words are not covered;
  • neither is the structure within a noun compound, e.g., "el capitan macbook pro".
• Why not train a parser for web queries?

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

• Syntactic structures are valuable for short text understanding.
• Examples

Challenges: Short Texts Lack Grammatical Signals

• Short texts lack function words and word order:
  • "toys queries" has ambiguous intent;
  • "distance earth moon" has clear intent, but many equivalent forms: "earth moon distance", "earth distance moon", …

Challenges: Syntactic Parsing of Queries

• No standard
• No ground truth

Why is syntactic parsing of queries even a legitimate problem?

Derive Syntax from Semantics[Sun et al 2016]

• Query: "thai food houston"
• Take a sentence clicked for this query
• Project the sentence's dependencies onto the query

A Treebank for Short Texts

• Given a query $q$
• Given $q$'s clicked sentences $s$
• Parse each $s$
• Project the dependencies from $s$ to $q$
• Aggregate the projected dependencies

Algorithm of Projection
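The projection algorithm itself appears as a figure in the deck; below is a hedged reconstruction of the basic idea under simple assumptions: align each query token to a token of the clicked sentence, then keep every dependency arc whose head and dependent are both aligned. The toy parse triples stand in for the output of a real dependency parser.

```python
# Sketch of projecting dependencies from a clicked sentence onto a query.
# The parsed sentence is given as (token, head_index, label) triples with
# head_index -1 for the root; a real system would run a dependency parser.

def project(query_tokens, parsed_sentence):
    # Align each query token to its first unused occurrence in the sentence.
    align = {}
    for qi, qt in enumerate(query_tokens):
        for si, (st, _, _) in enumerate(parsed_sentence):
            if st.lower() == qt.lower() and si not in align.values():
                align[qi] = si
                break
    inv = {si: qi for qi, si in align.items()}
    arcs = []
    for si, (_, head, label) in enumerate(parsed_sentence):
        if si in inv and head in inv:            # both ends appear in the query
            arcs.append((query_tokens[inv[head]], label, query_tokens[inv[si]]))
    return arcs

# Toy parse of a clicked sentence for the query "thai food houston":
sentence = [("Best", 2, "amod"), ("Thai", 2, "amod"), ("food", -1, "root"),
            ("in", 4, "case"), ("Houston", 2, "nmod")]
print(project("thai food houston".split(), sentence))
# -> [('food', 'amod', 'thai'), ('food', 'nmod', 'houston')]
```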

Result Examples

Results

• Random queries: QueryParser UAS 0.83, LAS 0.75; Stanford UAS 0.72, LAS 0.64
• Queries with no function words: QueryParser UAS 0.82, LAS 0.73; Stanford UAS 0.70, LAS 0.61
• Queries with function words: QueryParser UAS 0.90, LAS 0.85; Stanford UAS 0.86, LAS 0.80

Short Text Understanding

• How to segment this short text?
• What does this short text mean (its intent, senses, or concepts)?
• What are the relations among terms in the short text?
• How to calculate the similarity between short texts?

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

• Measuring the similarity between two short texts or sentences
• Basic idea: word-by-word comparison using embedding vectors
• Use a saliency-weighted semantic graph to compute similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Similarity measurement, inspired by BM25, where $s_l$ and $s_s$ are the two short texts, $w$ ranges over the terms of $s_l$, and $\mathrm{sem}(w, s_s)$ is the semantic similarity of term $w$ to the short text $s_s$:
$$f_{sts}(s_l, s_s) = \sum_{w \in s_l} IDF(w) \cdot \frac{\mathrm{sem}(w, s_s) \cdot (k_1 + 1)}{\mathrm{sem}(w, s_s) + k_1 \cdot \left(1 - b + b \cdot \frac{|s_s|}{avgsl}\right)}$$

Features acquired: bins of all edges and bins of max edges of the saliency-weighted semantic graph.
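A direct sketch of this scoring function, with toy embeddings and IDF values; $\mathrm{sem}(w, s_s)$ is taken here to be the maximum cosine similarity between $w$ and the terms of $s_s$, which is one common choice rather than something the slide specifies.

```python
# Sketch of the BM25-inspired saliency-weighted similarity above.
# Word embeddings and IDF values are toy stand-ins.
import numpy as np

EMB = {"thai": [0.9, 0.1], "asian": [0.8, 0.3], "food": [0.2, 0.9],
       "cuisine": [0.3, 0.8], "houston": [0.5, 0.5]}
IDF = {"thai": 2.0, "asian": 1.8, "food": 1.2, "cuisine": 1.5, "houston": 2.2}

def cos(a, b):
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def sem(w, s_s):
    return max(cos(EMB[w], EMB[v]) for v in s_s)   # best match in s_s

def f_sts(s_l, s_s, k1=1.2, b=0.75, avgsl=3.0):
    total = 0.0
    for w in s_l:
        s = sem(w, s_s)
        total += IDF[w] * (s * (k1 + 1)) / (s + k1 * (1 - b + b * len(s_s) / avgsl))
    return total

print(f_sts("thai food houston".split(), "asian cuisine".split()))
```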

From the Concept View [Wang et al 2015a]

(Figure: bags of concepts for short-text similarity. Each short text is conceptualized — parsing, term clustering by is-A, concept filtering by co-occurrence, head/modifier analysis, concept orthogonalization — against the semantic network and the co-occurrence network: Short Text 1 → Concept Vector 1 [(c1, score1), (c2, score2), …], Short Text 2 → Concept Vector 2 [(c1′, score1′), (c2′, score2′), …]; the similarity is then computed between the two concept vectors, as sketched below.)
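The deck does not spell out the similarity function between the two concept vectors; a natural minimal sketch is cosine similarity over sparse concept-score maps, with hypothetical conceptualization outputs.

```python
# Sketch of concept-view similarity: each short text is mapped to a concept
# vector (concept -> score) and the two vectors are compared with cosine
# similarity. The concept vectors below are hypothetical.
import math

def concept_cosine(v1, v2):
    dot = sum(s * v2.get(c, 0.0) for c, s in v1.items())
    n1 = math.sqrt(sum(s * s for s in v1.values()))
    n2 = math.sqrt(sum(s * s for s in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

vec1 = {"device": 0.6, "product": 0.3, "company": 0.1}   # e.g. "ipad apple"
vec2 = {"device": 0.5, "game": 0.3, "product": 0.2}      # e.g. "surface game"
print(concept_cosine(vec1, vec2))
```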

Outline

• Knowledge Bases
• Explicit Representation Models
• Applications

Applications

• Explicit short text understanding benefits a lot of application scenarios:
  • Ads/search semantic match
  • Definition mining
  • Query recommendation
  • Web table understanding
  • Semantic search
  • …

Ads Keyword Selection [Wang et al 2015a]

(Figure: ads keyword selection results reported by decile — Decile 4 through Decile 10 — for mainline ads and for sidebar ads.)

Definition Mining [Hao et al 2016]

• Definition scenarios: search engines, QnA, etc.
• Why conceptualization is useful for definition mining — example: "What is Emphysema?"

Answer 1: "Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year."
Answer 2: "Emphysema is an incurable, progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing."

• This kind of sentence has the form of a definition; embedding is helpful to some extent, but it also returns a high similarity score for (emphysema, disease) and for (emphysema, smoking).
• Conceptualization can provide strong semantics; contextual embedding can also provide semantic similarity beyond is-A.

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

(Figure: system architecture. Offline, training data and the knowledge base drive concept weighting and model learning, producing a concept model for each class (Class 1 … Class i … Class N). Online, an original short text such as "justin bieber graduates" goes through entity extraction, conceptualization against the knowledge base, and candidate generation to produce a concept vector, which is then classified and ranked, yielding outputs such as <Music, Score>.)

Concept based Short Text Classification and Ranking [Wang et al 2014a]

(Figure, three stages: categories such as TV, Music, and Movie are mapped into a concept space. The article titles and tags in each category contribute concept points $p_i$, $p_j$, and each category is represented by concept weights $\omega_i$, $\omega_j$; a query is then mapped into the same concept space and matched against the categories for classification and ranking.)

Precision performance on each category [Wang et al 2014a]

Category | BocSTC | LM_ch | SVM | VSM_cosine | LM_d | Entity_ESA
Movie | 0.71 | 0.91 | 0.84 | 0.81 | 0.72 | 0.56
Money | 0.97 | 0.95 | 0.54 | 0.57 | 0.52 | 0.74
Music | 0.97 | 0.90 | 0.88 | 0.73 | 0.68 | 0.58
TV | 0.96 | 0.46 | 0.92 | 0.56 | 0.51 | 0.55

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

• [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld. WordNet: An Electronic Lexical Database. Proceedings of 11th Eurographics Workshop on Rendering, 1998.

• [ Banko et al 2007 ] Michele Banko, Michael J Cafarella, Stephen Soderland, Matt Broadhead and Oren Etzioni. Open Information Extraction from the Web. In IJCAI 2007.

• [ Etzioni et al 2011 ] Oren Etzioni, Anthony Fader, Janara Christensen, Stephen Soderland and Mausam Mausam. Open Information Extraction: The Second Generation. In IJCAI, vol 11, pp 3-10, 2011.

• [ Carlson et al 2010 ] A Carlson, J Betteridge, B Kisiel, B Settles, E R Hruschka Jr and T M Mitchell. Toward an Architecture for Never-Ending Language Learning. In Proceedings of the Conference on Artificial Intelligence (AAAI), 2010.

• [ Wu et al 2012 ] Wentao Wu, Hongsong Li, Haixun Wang and Kenny Zhu. Probase: A Probabilistic Taxonomy for Text Understanding. In ACM International Conference on Management of Data (SIGMOD), May 2012.

• [ Bollacker et al 2008 ] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, Jamine Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In SIGMOD 2008.

• [ Auer et al 2007 ] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, Zachary G Ives. DBpedia: A Nucleus for a Web of Open Data. In ISWC/ASWC 2007.

References

• [ Suchanek et al 2007 ] Fabian M Suchanek, Gjergji Kasneci, Gerhard Weikum. Yago: a core of semantic knowledge. In WWW 2007.

• [ Wu et al 2015 ] Sen Wu, Ce Zhang, Christopher De Sa, Jaeho Shin, Feiran Wang and C Ré. Incremental Knowledge Base Construction Using DeepDive. In VLDB 2015.

• [ Navigli et al 2012 ] R Navigli and S Ponzetto. BabelNet: The Automatic Construction, Evaluation and Application of a Wide-Coverage Multilingual Semantic Network. In Artificial Intelligence, 2012.

• [ Nastase et al 2010 ] Vivi Nastase, Michael Strube, Benjamin Börschinger, Cäcilia Zirn and Anas Elghafari. WikiNet: A very large scale multi-lingual concept network. In LREC 2010.

• [ Speer et al 2013 ] Robert Speer and Catherine Havasi. ConceptNet 5: A large semantic network for relational knowledge. The People's Web Meets NLP, Springer Berlin Heidelberg, 2013.

• [ Hua et al 2016 ] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng and Xiaofang Zhou. "Understand Short Texts by Harvesting and Analyzing Semantic Knowledge". IEEE Transactions on Knowledge and Data Engineering (TKDE), 2016.

• [ Hua et al 2015 ] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng and Xiaofang Zhou. Short Text Understanding Through Lexical-Semantic Analysis. In International Conference on Data Engineering (ICDE), April 2015.

References

• [ Li et al 2013 ] Peipei Li, Haixun Wang, Kenny Q Zhu, Zhongyuan Wang and Xindong Wu. Computing term similarity by large probabilistic isa knowledge. In ACM International Conference on Information and Knowledge Management (CIKM), 2013.

• [ Li et al 2015 ] Peipei Li, Haixun Wang, Kenny Q Zhu, Zhongyuan Wang, Xue-Gang Hu and Xindong Wu. A Large Probabilistic Semantic Network based Approach to Compute Term Similarity. In IEEE Transactions on Knowledge and Data Engineering (TKDE), 27(10): 2604-2617, 2015.

• [ Rosch et al 1976 ] Eleanor Rosch, Carolyn B Mervis, Wayne D Gray, David M Johnson and Penny Boyes-Braem. Basic objects in natural categories. Cognitive Psychology, 8(3): 382–439, 1976.

• [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze. Foundations of statistical natural language processing. Volume 999, MIT Press, 1999.

• [ Wang et al 2015b ] Zhongyuan Wang, Kejun Zhao, Haixun Wang, Xiaofeng Meng and Ji-Rong Wen. Query Understanding through Knowledge-Based Conceptualization. In IJCAI, July 2015.

• [ Bergsma et al 2007 ] Shane Bergsma, Qin Iris Wang. Learning Noun Phrase Query Segmentation. In EMNLP-CoNLL 2007, 819-826.

• [ Tan et al 2008 ] Bin Tan, Fuchun Peng. Unsupervised query segmentation using generative language models and wikipedia. In WWW 2008, 347-356.

References

• [ Li et al 2011 ] Yanen Li, Bo-June Paul Hsu, ChengXiang Zhai, Kuansan Wang. Unsupervised query segmentation using clickthrough for information retrieval. In SIGIR 2011, 285-294.

• [ Guo et al 2009 ] Jiafeng Guo, Gu Xu, Xueqi Cheng, Hang Li. Named entity recognition in query. In SIGIR 2009, 267-274.

• [ Pantel et al 2012 ] Patrick Pantel, Thomas Lin, Michael Gamon. Mining Entity Types from Query Logs via User Intent Modeling. In ACL 2012, 563-571.

• [ Joshi et al 2014 ] Mandar Joshi, Uma Sawant, Soumen Chakrabarti. Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries. In EMNLP 2014, 1104-1114.

• [ Sawant et al 2013 ] Uma Sawant, Soumen Chakrabarti. Learning joint query interpretation and response ranking. In WWW 2013, 1099-1110.

• [ Wang et al 2014b ] Zhongyuan Wang, Haixun Wang and Zhirui Hu. Head, Modifier and Constraint Detection in Short Texts. In International Conference on Data Engineering (ICDE), 2014.

• [ Sun et al 2016 ] Xiangyan Sun, Haixun Wang, Yanghua Xiao, Zhongyuan Wang. Syntactic Parsing of Web Queries. In EMNLP 2016.

References

• [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke. Short text similarity with word embeddings. In CIKM 2015.

• [ Wang et al 2015a ] Zhongyuan Wang, Haixun Wang, Ji-Rong Wen and Yanghua Xiao. An Inference Approach to Basic Level of Categorization. In CIKM, October 2015.

• [ Hao et al 2016 ] Zehui Hao, Zhongyuan Wang, Xiaofeng Meng and Jun Yan. Combining Language Model with Conceptualization for Definition Ranking. MSR Technical Report, 2016.

• [ Wang et al 2014a ] Fang Wang, Zhongyuan Wang, Zhoujun Li and Ji-Rong Wen. Concept-based Short Text Classification and Ranking. In CIKM 2014.

• [ Wang et al 2012a ] Jingjing Wang, Haixun Wang, Zhongyuan Wang and Kenny Zhu. Understanding Tables on the Web. In International Conference on Conceptual Modeling, October 2012.

• [ Wang et al 2012b ] Yue Wang, Hongsong Li, Haixun Wang and Kenny Zhu. Toward Topic Search on the Web. In International Conference on Conceptual Modeling, October 2012.

Page 59: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Supervised Segmentation

bull Featuresbull Decision boundary features

bull Statistical features

bull Context features

bull Dependency features

eg Indicators thePOS tags in query isPosition features forwardbackward

Mutual information between left and right parts

Bank loan amortization schedule

Context information

bus driverfemale

depend

Supervised Segmentation

bull Segmentation Overview

saw

SVMclassifier

Input query two man power saw

two man power

Output segmentation decision for each position (yesno)

learning features

Unsupervised Segmentation [Tan et al 2008]

bull Unsupervised learning for query segmentation

Probability of generated segmentation S for query Q

119875 119878119876 = 119875 1199041 P 1199042|1199041 hellipP 119904119898 11990411199042hellip119904119898minus1

asympෑ

119904119894isin119878

119875(119904119894)Unigram model

segments

Valid segment boundary if and only if the pointwise mutual information between the two segments resulting from the split is negative

new york times subscription

1199041 1199042

119872119868 119904119896 119904119896+1 = log119875119888([119904119896 119904119896+1])

119875119888 119904119896 ∙ 119875119888 (119904119896+1)lt 0

Example log119875119888([119899119890119908 119910119900119903119896])

119875119888( 119899119890119908) ∙ 119875119888 (119910119900119903119896)gt 0

no segment boundary here

Unsupervised Segmentation

bull Find top k segmentations dynamic programming

bull Using EM optimization on the fly

Input query 11990811199082hellip119908119899 concept probability distribution

Output top k segmentations with highest likehood

Words in a query

Exploit Click-through [Li et al 2011]

bull Motivationbull Probabilistic query segmentation

bull Use click-through data

Output top-3 segmentation

[bank of america] [online banking] 0502

bank of america online banking] 0428

[bank of ] [ america] [online banking] 0001

Q -gt URL -gt D query

document

click data

Input Query bank of america online banking

Exploit Click-through

bull Segmentation Model

An interpolated model

global info Click-throughinfo

[credit card] [bank of America]

1 bank of america credit cards contact us overview2 secured visa credit card from bank of america3 credit cards overview find the right bank of america credit card for you

Query

Clicked html documents

global info

Click-through info

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Sense Changes with Different Context

watch harry potter read harry potter age harry potter

Movie Book Character

harry potter walkthrough

Game

Entity Recognition in Query [Guo et al 2009]

bull Motivation

Detect named entity in a short text and categorize it

harry potter walkthrough

Single-named-entity query

Example

(ldquoharry potterrdquo ldquo walkthroughrdquo ldquogamerdquo)

triple lte t cgt

class of entity

context terms

ambiguous term

contextterm class

Entity Recognition in Query

bull Probabilistic Generative Model

Goal Given a query q find triple lte t cgt maximize the probability

Probability to generate triple

assume context only depends on class

Objective given query q find

The problem then becomes how to estimate Pr(e) Pr(c|e) and Pr(t|c)

Eg ldquowalkthroughrdquo only depends on game instead of happy potter

Entity Recognition in Query

bull Probability Estimation by Learning

learning objective

N

1i

iii )ctP(emax

Challenge difficult as well as time consuming to manually assign class labels to named entities in queries

Build training set 119879 = (119890119894 119905119894) view 119888119894 as a hidden variable

New Learning problem

N

1i

ii

N

1i

i

N

1i

ii c)|)P(te|P(c)P(emax)tP(emax c

solved with topic model WS-LDA

Signal from Click [Pantel et al 2012]

bull Motivation

Predict entity type in Web search

entity

user intent

context

click

Query type distribution (73 types)

Generative model

entity type

T

TK

K2

Signal from Click

bull Joint Model for Prediction

t

τ

i

n c

θ

φ

ω

Q

Distribution over types

Intent distribution

Pick type

Pick entity

Pick intent

Pick click

Word distribution

Host distribution

Entity distribution

For each Query

Pick context words

Telegraphic Query interpretation [Sawant et al 2013 Joshi et al 2014]

bull Entity-seeking Telegraphic Queries

bull Interpretation = Segmentation + Annotation

Knowledge base Large corpus

accuracy recall

Germany capital

Berlin

Query

Result Entity

bull Overview

Joint Interpretation and Ranking [Sawant et al 2013 Joshi et al 2014]

Annotated Corpus

Telegraphic Query

e1e2e3

Two Models for Interpretation and Ranking

Generative Model

Discriminative Model

Output

bull Generative Model

Joint Interpretation and Ranking [Sawant et al 2013]

San Diego Padres

Major league baseball team

type context

E

T Padres have been to two World

Series losing in 1984 and 1998

Type hint

baseball team

losing team baseball world series 1998

Z

Context matchers

lost 1998 world seriesswitch

model model

q losing team baseball world series 1998

Borrow from U Sawant (2013)

Based on Probabilistic Language Models

bull Discriminative Model

Joint Interpretation and Ranking [Sawant et al 2013]

San_Diego_Padres

losing team baseball world

series 1998

(baseball team)

losing team baseball world

series 1998

(baseball team)

losing team baseball world

series 1998

(t = baseball team)

1998_World_Series

losing team baseball world

series 1998

(series)

losing team baseball world

series 1998

(series)

losing team baseball world

series 1998

(t = series)

Correct entity Incorrect entity

Based on max-margin discriminative learning

bull Queries seek answer entities (e2)

bull Contain (query) entities (e1) target types (t2) relations (r) and selectors (s)

Telegraphic Query Interpretation [Joshi et al 2014]

query e1 r t2 s

dave navarro first band

dave navarro band band first

dave navarro - band first

spider automobile company

spider automobile company

automobile company

-

automobile company company spider

Borrow from M Joshi (2014)

Improved Generative Model

bull Generative Model[Sawant et al 2013]

[Joshi et al 2014]Consider e1

(in q) and r

Improved Discriminative Model

bull Discriminative Model[Sawant et al 2013]

[Joshi et al 2014]

Consider e1

(in q) and r

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

bull Input a short text

bull Output semantic interpretation

bull Three steps in understanding a short text

wanna watch eagles band

watch[verb] eagles[entity](band) band[concept]

wanna watch eagles band watch[verb] eagles[entity](band) band[concept]

watch eagles band watch[verb] eagles[entity] band[concept]

Step 1 Text Segmentation ndash divide into a sequence of terms in vocabulary

Step 2 Type detection ndash determine the best type of each term

Step 3 Concept Labeling ndash infer the best concept of each entity within context

Text segmentationbull Observations

bull Mutual Exclusion ndash terms containing the same word mutually exclude each other

bull Mutual Reinforcement ndash related terms mutually reinforce each other

bull Build a Candidate Term Graph (CTG)

ldquovacation april in parisrdquo ldquowatch harry potterrdquo

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Find best segmentation

bull Best segmentation= sub-graph in CTG whichbull Is a complete graph (clique)

bull No mutual exclusion

bull Has 100 word coveragebull Except for stopwords

bull Has the largest average edge weight

Is a segmentation

Best segmentation

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Find best segmentation

bull Best segmentation= sub-graph in CTG whichbull Is a complete graph (clique)

bull No mutual exclusion

bull Has 100 word coveragebull Except for stopwords

bull Has the largest average edge weight

Maximal Clique

Best segmentation

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Type Detection

bull Pairwise Modelbull Find the best typed-term for each term so that the

Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight

watch[v]

watch[e]

watch[c]

watch

free[adj]

free[v]

movie[c]

movie[e]

free

movie

Concept Labeling

bull Entity disambiguation is the most important task of concept labelingbull Filterre-rank of the original concept cluster vector

bull Weighted-Votebull The final score of each concept cluster is a combination

of its original score and the support from context using concept co-occurrence

watch harry potter read harry potter

movie book

Example of Entity Disambiguation[Hua et al 2015 (ICDE Best Paper) Hua et al 2016]

Co-occurrence network

Concept Vector

Semantic network

Short Text

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

c1 p1

c2 p2

c3 p3

hellip

ipad apple

fruithellip

companyhellip

foodhellip

producthellip

Is-A

filtering

ldquoipad applerdquo

producthellip

devicehellip

producthellip

brandhellip

companyhellip

devicehellip

co-occur

Is-A

Mining Lexical Relationships[Wang et al 2015b]

bull Lexical knowledge represented by the probabilities

verb

product book

movie

watch harry potter

119901 119907119890119903119887 119908119886119905119888ℎ

119901 119894119899119904119905119886119899119888119890 119908119886119905119888ℎ

119901 119898119900119907119894119890 ℎ119886119903119903119910 119901119900119905119905119890119903

119901 119898119900119907119894119890 119908119886119905119888ℎ 119907119890119903119887

119901 119887119900119900119896 ℎ119886119903119903119910 119901119900119905119905119890119903

119901 119888 119905 119911

119901 119888 119890 =119901 119888 119905 119911 = 119894119899119904119905119886119899119888119890

119901 119911 119905 ①②

③e instancet termc conceptz role

Understanding Queries [Wang et al 2015b]

bull Goal to rank the concepts and findarg max

119888119901(119888|119905 119902)

The offline semantic network

QueryAll possible

segmentations

Random walk with restart [Sun et al 2005]on the online subgraph

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

bull Example ldquopopular smart cover iphone 5srdquo

bull Definition bull Head acts to name the general (semantic) category to which the

whole short text belongs Usually the head is the intent of the short text

bull ldquosmart coverrdquo intent of the query

bull Constraints distinguish this member from other members of the same category

bull ldquoiphone 5srdquo limit the type of the head

bull Non-Constraint Modifiers (aka Pure Modifiers) are subjectivemodifiers which can be dropped without changing intent

bull ldquopopularrdquo subjective can be neglected

Non-Constraint Modifiers Mining Construct Modifier Networks

Edges form a Modifier Network

Concept Hierarchy Tree in ldquoCountryrdquo domain

Modifier Network in ldquoCountryrdquo domainIn this case ldquoLargerdquo and ldquoToprdquo are pure modifiers

Country

Asian country

Developed country

Western country

Asian Developed Western

Western developed

country

Top western country

Large

Large Top

Top

WesternLarge Asian

country

Large developed

country

Top developed

country

Country

Asian Western

Developed

Large Top

bull Betweenness centrality is a measure of a nodes centrality in a network

bull Betweennes of node v is defined as

bull where 120590119904119905 is the total number of shortest paths from node s to node t and 120590119904119905(119907) is the number of those paths that pass through v

bull Normalization amp Aggregation

bull For a pure modifier it should have low betweenness centrality aggregation score PMS(t)

Non-Constraint Modifiers Mining Betweenness centrality

Head-Constraints Mining [Wang et al 2014b]

bull A term can be a head sometimes and be a constraint in some other cases

bull Eg Seattle hotel Seattle hotel job

head headconstraintconstraintconstraint

Head-Constraints Mining Acquiring Concept Patterns

Get entity pairs from query log

Conceptualization

Concept Patterns for each

prepositionsentity1 entity2

Extract Patterns

A for B A of BA with B A in BA on B A at B hellip

entity 1head entity 2constraint

concept11

concept12

concept13

concept14

concept21

concept22

concept23

(concept11 concept21) (concept11 concept22)(concept11 concept23)hellip

Concept Pattern Dictionary

Building concept pattern dictionary Query Logs

cover for iphone 6sbattery for sony a7rwicked on broadway

Why Concepts Canrsquot Be Too Generalbull It may cause too many concept pattern conflicts

canrsquot distinguish head and modifier for general concept pairs

Head Modifier

Derived Concept Pattern device company

Supporting Entity Pairs iphone 4 verizon

modem comcast

wireless router comcast

iphone 4 tmobile

Head Modifier

Derived Concept Pattern company device

Supporting Entity Pairs amazon books kindle

netflix touchpad

skype windows phone

netflix ps3

Conflict

Why Concepts Canrsquot Be Too Specificbull It may generate concepts with little coverage

bull Concept regresses to entitybull Large storage space up to (million million) patterns

hellip hellip

device largest desktop OS vendor

device largest software development company

device largest global corporation

device latest windows and office provider

hellip hellip

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept PatternsCluster size Sum of Cluster Score headconstraintscore

615 2114691 breedstate357298460224501

296 7752357 gameplatform627403476771856

153 3466804 accessoryvehicle53393705094809

70 118259 browserplatform132612807637391

22 1010993 requirementschool271407526294823

34 9489159 drugdisease154602405333541

42 8992995 cosmeticskin condition814659415003929

16 7421599 jobcity27903732555528

32 710403 accessoryphone246513830851194

18 6692376 softwareplatform210126322725878

20 6444603 testdisease239774028397537

27 5994205 clothesbreed98773996282851

19 5913545 penaltycrime200544192793488

25 5848804 taxstate240081818612579

16 5465424 saucemeat183592863621553

18 4809389 credit cardcountry142919087972152

14 4730792 foodholiday14554140330924

11 4536199 modgame257163856882439

29 4350954 garmentsport471533326845442

23 3994886 career informationprofessional732726483731257

15 386065 songinstrument128189481818135

18 378213 baitfish780426514113169

22 3722948 study guidebook508339765053921

19 3408953 pluginsbrowser550326072627126

14 3305753 recipemeat882779863422951

18 3214226 currencycountry110825444188352

13 3180272 lenscamera186081673263957

9 316973 decorationholiday130055844126533

16 314875 foodanimal7338544366514

game platform

game device

video game platform

game console game pad

game gaming platform

Game (Head) Platform (Modifier)

angry birds android

angry birds ios

angry birds windows 10

hellip hellip

Detection

Head Modifier Relationship

bull Train a classifier on

(head-embedding modifier-embedding)

bull Training data bull Positive (head modifier)bull Negative (modifier head)

bull Precision gt= 09 Recall gt= 09

bull Disadvantage not interpretable

Syntactic Parsing based on HM

bull Information is incompletebull Preposition and other function words

bull Within a noun compound el capitan macbook pro

bull Why not train a parser for web queries

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges Short Texts Lack Grammatical Signalsbull Lack function words word order

bull ldquotoys queriesrdquo has ambiguous intent

bull ldquodistance earth moonrdquo has clear intentbull many equivalent forms ldquoearth moon distancerdquo ldquoearth

distance moonrdquo hellip

Challenges Syntactic Parsing of Queries

bullNo standard

bullNo ground-truth

Why is syntactic parsing of queries even a legitimate problem

Derive Syntax from Semantics[Sun et al 2016]

bull Query ldquothai food houstonrdquo

bull Clicked sentence

bull Project dependency to the query

A Treebank for Short Texts

bull Given query 119902

bull Given 119902rsquos clicked sentence 119904

bull Parse each 119904

bull Project dependency from 119904 to 119902

bull Aggregate dependencies

Algorithm of Projection

Result Examples

Results

bull Random queries

QueryParser UAS 083 LAS 075Stanford UAS 072 LAS 064

bull Queries with no function words

QueryParser UAS 082 LAS 073Stanford UAS 070 LAS 061

bull Queries with function words

QueryParser UAS 090 LAS 085Stanford UAS 086 LAS 080

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

bull Measuring similarity between two short texts and sentences

bull Basic idea word-by-word comparison using embedding vector

bull Use saliency-weighted semantic graph to computer similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired

Bins of all edges Bins of max edges

119908isin119904119868

119868119863119865(119908) sdot)119904119890119898(119908 119904119904) sdot (1198961 + 1

൰119904119890119898(119908 119904119904) + 1198961 sdot (1 minus 119887 + 119887 sdot|119904119904|119886119907119892119897

Similarity measurement

termShort texts

Inspired by BM25

Semantic similarity

119891119904119905119904(119904119897 119904119904) =

From the Concept View [Wang et al. 2015a]

Each short text is parsed and conceptualized into a bag of concepts, using the isA semantic network and the co-occurrence network: term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, and concept orthogonalization.

Short Text 1 -> Concept Vector 1: [(c1, score1), (c2, score2), ...]
Short Text 2 -> Concept Vector 2: [(c1', score1'), (c2', score2'), ...]

Similarity is then computed between the two concept vectors.
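A minimal sketch of the final similarity step, assuming the conceptualization stage has already produced each concept vector as a {concept: score} dict (the toy vectors below are illustrative):

```python
import math

def concept_cosine(cv1, cv2):
    # Cosine similarity between two sparse bag-of-concepts vectors.
    dot = sum(s * cv2.get(c, 0.0) for c, s in cv1.items())
    n1 = math.sqrt(sum(s * s for s in cv1.values()))
    n2 = math.sqrt(sum(s * s for s in cv2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

cv_watch_hp = {"movie": 0.7, "film series": 0.2, "book": 0.1}
cv_read_hp = {"book": 0.8, "novel": 0.15, "movie": 0.05}
print(concept_cosine(cv_watch_hp, cv_read_hp))
```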

Outline

• Knowledge Bases
• Explicit Representation Models
• Applications

Applications

• Explicit short text understanding benefits many application scenarios:
  • Ads/search semantic matching
  • Definition mining
  • Query recommendation
  • Web table understanding
  • Semantic search
  • ...

Ads Keyword Selection [Wang et al. 2015a]

[Charts: keyword-selection performance by decile (Decile 4 through Decile 10), reported separately for Mainline Ads (y-axis 0.00-6.00) and Sidebar Ads (y-axis 0.00-0.60).]

Definition Mining [Hao et al. 2016]

• Definition scenarios: search engines, QnA, etc.
• Why conceptualization is useful for definition mining. Example: "What is Emphysema?"
  • Answer 1: "Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year."
  • Answer 2: "Emphysema is an incurable, progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing."
• Both sentences have the form of a definition; embeddings help to some extent, but they also return high similarity scores for both (emphysema, disease) and (emphysema, smoking)
• Conceptualization can provide strong semantics; contextual embeddings can also provide semantic similarity beyond is-A

Concept-based Short Text Classification and Ranking [Wang et al. 2014a]

[System diagram. Offline: training data and a knowledge base are used for concept weighting and model learning, producing a concept model for each class (Class 1 ... Class i ... Class N). Online: an original short text (e.g. "justin bieber graduates") goes through entity extraction, conceptualization into a concept vector, candidate generation, and classification & ranking, producing outputs such as <Music, Score>.]

Concept-based Short Text Classification and Ranking [Wang et al. 2014a]

[Figure: each category (TV, Music, Movie, ...) is represented in a concept space. Article titles and tags in a category contribute concept points p_i, p_j, which are aggregated into weighted category concepts ω_i, ω_j; a query is conceptualized into the same concept space and compared against the categories.]

Precision performance on each category [Wang et al. 2014a]

Category | BocSTC | LM_ch | SVM  | VSM_cosine | LM_d | Entity_ESA
Movie    | 0.71   | 0.91  | 0.84 | 0.81       | 0.72 | 0.56
Money    | 0.97   | 0.95  | 0.54 | 0.57       | 0.52 | 0.74
Music    | 0.97   | 0.90  | 0.88 | 0.73       | 0.68 | 0.58
TV       | 0.96   | 0.46  | 0.92 | 0.56       | 0.51 | 0.55

[Shown in the slides as a bar chart with Precision (0.3-1.0) on the y-axis.]

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

• [Stark et al. 1998] Michael M. Stark and Richard F. Riesenfeld. WordNet: An Electronic Lexical Database. Proceedings of 11th Eurographics Workshop on Rendering, 1998.

• [Banko et al. 2007] Michele Banko, Michael J. Cafarella, Stephen Soderland, Matt Broadhead, and Oren Etzioni. Open Information Extraction from the Web. In IJCAI 2007.

• [Etzioni et al. 2011] Oren Etzioni, Anthony Fader, Janara Christensen, Stephen Soderland, and Mausam. Open Information Extraction: The Second Generation. In IJCAI, vol. 11, pp. 3-10, 2011.

• [Carlson et al. 2010] A. Carlson, J. Betteridge, B. Kisiel, B. Settles, E.R. Hruschka Jr., and T.M. Mitchell. Toward an Architecture for Never-Ending Language Learning. In Proceedings of the Conference on Artificial Intelligence (AAAI), 2010.

• [Wu et al. 2012] Wentao Wu, Hongsong Li, Haixun Wang, and Kenny Zhu. Probase: A Probabilistic Taxonomy for Text Understanding. In ACM International Conference on Management of Data (SIGMOD), May 2012.

• [Bollacker et al. 2008] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, Jamie Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In SIGMOD 2008.

• [Auer et al. 2007] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, Zachary G. Ives. DBpedia: A Nucleus for a Web of Open Data. In ISWC/ASWC 2007.

• [Suchanek et al. 2007] Fabian M. Suchanek, Gjergji Kasneci, Gerhard Weikum. Yago: a core of semantic knowledge. In WWW 2007.

• [Wu et al. 2015] Sen Wu, Ce Zhang, Christopher De Sa, Jaeho Shin, Feiran Wang, and C. Ré. Incremental Knowledge Base Construction Using DeepDive. In VLDB 2015.

• [Navigli et al. 2012] R. Navigli and S. Ponzetto. BabelNet: The Automatic Construction, Evaluation and Application of a Wide-Coverage Multilingual Semantic Network. In Artificial Intelligence, 2012.

• [Nastase et al. 2010] Vivi Nastase, Michael Strube, Benjamin Börschinger, Cäcilia Zirn, and Anas Elghafari. WikiNet: A very large scale multi-lingual concept network. In LREC 2010.

• [Speer et al. 2013] Robert Speer and Catherine Havasi. ConceptNet 5: A large semantic network for relational knowledge. The People's Web Meets NLP, Springer Berlin Heidelberg, 2013.

• [Hua et al. 2016] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng, and Xiaofang Zhou. "Understand Short Texts by Harvesting and Analyzing Semantic Knowledge." IEEE Transactions on Knowledge and Data Engineering (TKDE), 2016.

• [Hua et al. 2015] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng, and Xiaofang Zhou. Short Text Understanding Through Lexical-Semantic Analysis. In International Conference on Data Engineering (ICDE), April 2015.

• [Li et al. 2013] Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang, and Xindong Wu. Computing term similarity by large probabilistic isA knowledge. In ACM International Conference on Information and Knowledge Management (CIKM), 2013.

• [Li et al. 2015] Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang, Xue-Gang Hu, and Xindong Wu. A Large Probabilistic Semantic Network based Approach to Compute Term Similarity. IEEE Transactions on Knowledge and Data Engineering (TKDE), 27(10): 2604-2617, 2015.

• [Rosch et al. 1976] Eleanor Rosch, Carolyn B. Mervis, Wayne D. Gray, David M. Johnson, and Penny Boyes-Braem. Basic objects in natural categories. Cognitive Psychology, 8(3): 382-439, 1976.

• [Manning and Schutze 1999] Christopher D. Manning and Hinrich Schutze. Foundations of Statistical Natural Language Processing. Volume 999, MIT Press, 1999.

• [Wang et al. 2015b] Zhongyuan Wang, Kejun Zhao, Haixun Wang, Xiaofeng Meng, and Ji-Rong Wen. Query Understanding through Knowledge-Based Conceptualization. In IJCAI, July 2015.

• [Bergsma et al. 2007] Shane Bergsma, Qin Iris Wang. Learning Noun Phrase Query Segmentation. In EMNLP-CoNLL 2007: 819-826.

• [Tan et al. 2008] Bin Tan, Fuchun Peng. Unsupervised query segmentation using generative language models and Wikipedia. In WWW 2008: 347-356.

• [Li et al. 2011] Yanen Li, Bo-June Paul Hsu, ChengXiang Zhai, Kuansan Wang. Unsupervised query segmentation using clickthrough for information retrieval. In SIGIR 2011: 285-294.

• [Guo et al. 2009] Jiafeng Guo, Gu Xu, Xueqi Cheng, Hang Li. Named entity recognition in query. In SIGIR 2009: 267-274.

• [Pantel et al. 2012] Patrick Pantel, Thomas Lin, Michael Gamon. Mining Entity Types from Query Logs via User Intent Modeling. In ACL 2012: 563-571.

• [Joshi et al. 2014] Mandar Joshi, Uma Sawant, Soumen Chakrabarti. Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries. In EMNLP 2014: 1104-1114.

• [Sawant et al. 2013] Uma Sawant, Soumen Chakrabarti. Learning joint query interpretation and response ranking. In WWW 2013: 1099-1110.

• [Wang et al. 2014b] Zhongyuan Wang, Haixun Wang, and Zhirui Hu. Head, Modifier, and Constraint Detection in Short Texts. In International Conference on Data Engineering (ICDE), 2014.

• [Sun et al. 2016] Xiangyan Sun, Haixun Wang, Yanghua Xiao, Zhongyuan Wang. Syntactic Parsing of Web Queries. In EMNLP 2016.

• [Kenter and Rijke 2015] Tom Kenter and Maarten de Rijke. Short text similarity with word embeddings. In CIKM 2015.

• [Wang et al. 2015a] Zhongyuan Wang, Haixun Wang, Ji-Rong Wen, and Yanghua Xiao. An Inference Approach to Basic Level of Categorization. In CIKM, October 2015.

• [Hao et al. 2016] Zehui Hao, Zhongyuan Wang, Xiaofeng Meng, and Jun Yan. Combining Language Model with Conceptualization for Definition Ranking. MSR Technical Report, 2016.

• [Wang et al. 2014a] Fang Wang, Zhongyuan Wang, Zhoujun Li, and Ji-Rong Wen. Concept-based Short Text Classification and Ranking. In CIKM 2014.

• [Wang et al. 2012a] Jingjing Wang, Haixun Wang, Zhongyuan Wang, and Kenny Zhu. Understanding Tables on the Web. In International Conference on Conceptual Modeling, October 2012.

• [Wang et al. 2012b] Yue Wang, Hongsong Li, Haixun Wang, and Kenny Zhu. Toward Topic Search on the Web. In International Conference on Conceptual Modeling, October 2012.


Supervised Segmentation

• Segmentation overview (see the sketch below)
  • Input query: "two man power saw"
  • Learned features for each candidate break position ("two | man | power | saw") are fed to an SVM classifier
  • Output: a segmentation decision (yes/no) for each position
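A minimal sketch of such a supervised segmenter, assuming gold segmentations and an n-gram frequency lookup are available; the feature set here is illustrative, not the one used in the original work:

```python
from sklearn.svm import LinearSVC

def boundary_features(words, i, counts):
    # Features for the candidate break between words[i] and words[i+1];
    # 'counts' is an assumed n-gram frequency lookup keyed by word tuples.
    left, right = words[i], words[i + 1]
    return [
        counts.get((left, right), 0),   # bigram frequency across the boundary
        counts.get((left,), 0),         # unigram frequencies on both sides
        counts.get((right,), 0),
        len(left), len(right),          # simple surface features
    ]

def train_segmenter(segmented_queries, counts):
    # segmented_queries: list of (words, breaks) where breaks[i] is True if a
    # segment boundary follows words[i], e.g. (["two","man","power","saw"],
    # [False, True, False]) for "[two man] [power saw]".
    X, y = [], []
    for words, breaks in segmented_queries:
        for i, is_break in enumerate(breaks):
            X.append(boundary_features(words, i, counts))
            y.append(int(is_break))
    return LinearSVC().fit(X, y)
```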

Unsupervised Segmentation [Tan et al. 2008]

• Unsupervised learning for query segmentation

Probability of a segmentation S (into segments s_1, ..., s_m) of query Q, under a unigram model over segments:

$$P(S|Q) = P(s_1)\, P(s_2|s_1) \cdots P(s_m|s_1 s_2 \cdots s_{m-1}) \approx \prod_{s_i \in S} P(s_i)$$

A split point is a valid segment boundary if and only if the pointwise mutual information between the two segments resulting from the split is negative:

$$MI(s_k, s_{k+1}) = \log \frac{P_c([s_k\, s_{k+1}])}{P_c(s_k) \cdot P_c(s_{k+1})} < 0$$

Example, for "new york times subscription":

$$\log \frac{P_c([new\ york])}{P_c(new) \cdot P_c(york)} > 0$$

so there is no segment boundary between "new" and "york".
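A minimal sketch of the boundary test, assuming `p_c` is a probability lookup over segments (e.g. estimated from web-scale counts):

```python
import math

def has_boundary(seg_left, seg_right, p_c):
    # Valid boundary iff the pointwise mutual information of the two segments
    # resulting from the split is negative. p_c is an assumed lookup.
    joint = p_c(seg_left + " " + seg_right)
    mi = math.log(joint / (p_c(seg_left) * p_c(seg_right)))
    return mi < 0

# e.g. has_boundary("new", "york", p_c) should come out False for web-scale
# counts, while has_boundary("new york times", "subscription", p_c) would
# typically be True.
```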

Unsupervised Segmentation

• Find the top-k segmentations with dynamic programming (see the sketch below)
• Use EM optimization on the fly
• Input: query w_1 w_2 ... w_n (the words in the query) and a concept probability distribution
• Output: the top-k segmentations with the highest likelihood
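A minimal sketch of the dynamic program for the single best segmentation (top-1 for brevity; top-k would keep k-best lists per position), assuming `p_seg` returns the unigram probability of a candidate segment:

```python
import math

def best_segmentation(words, p_seg, max_len=5):
    # Viterbi-style DP: best[i] = (log-prob, segmentation) of words[:i] under
    # the unigram segment model P(S) = prod P(s_i); p_seg is an assumed lookup.
    best = [(0.0, [])] + [(-math.inf, None)] * len(words)
    for i in range(1, len(words) + 1):
        for j in range(max(0, i - max_len), i):
            seg = " ".join(words[j:i])
            p = p_seg(seg)
            if p <= 0.0:
                continue
            score = best[j][0] + math.log(p)
            if score > best[i][0]:
                best[i] = (score, best[j][1] + [seg])
    return best[len(words)][1]

# best_segmentation("bank of america online banking".split(), p_seg)
# -> e.g. ["bank of america", "online banking"]
```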

Exploit Click-through [Li et al. 2011]

• Motivation
  • Probabilistic query segmentation
  • Use click-through data (Q -> URL -> D: query, clicked URL, document)

Input query: "bank of america online banking"

Output: top-3 segmentations
[bank of america] [online banking]    0.502
[bank of america online banking]      0.428
[bank of] [america] [online banking]  0.001

Exploit Click-through

• Segmentation model: an interpolated model combining global information with click-through information
  • Query: [credit card] [bank of america]
  • Clicked HTML documents:
    1. bank of america credit cards contact us overview
    2. secured visa credit card from bank of america
    3. credit cards overview: find the right bank of america credit card for you
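A minimal sketch of the interpolation idea only (the full model in [Li et al. 2011] is trained with EM over click-through data); `p_global`, `p_click`, and the mixing weight are illustrative assumptions:

```python
import math

def interpolated_segment_score(segment, p_global, p_click, lam=0.7):
    # Mix a global estimate (e.g. from web n-grams) with a click-through
    # estimate (e.g. how often the segment appears intact in clicked documents).
    return lam * p_global(segment) + (1.0 - lam) * p_click(segment)

def segmentation_score(segments, p_global, p_click):
    # Score a whole segmentation as the product of its segment scores.
    return math.prod(interpolated_segment_score(s, p_global, p_click) for s in segments)
```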

Short Text Understanding

• How to segment this short text?
• What does this short text mean (its intent, senses, or concepts)?
• What are the relations among terms in the short text?
• How to calculate the similarity between short texts?

Sense Changes with Different Context

watch harry potter -> Movie; read harry potter -> Book; age harry potter -> Character; harry potter walkthrough -> Game

Entity Recognition in Query [Guo et al. 2009]

• Motivation: detect a named entity in a short text and categorize it
• Single-named-entity query, e.g. "harry potter walkthrough"
• Represented as a triple <e, t, c>, e.g. ("harry potter", "walkthrough", "game"), where "harry potter" is the (ambiguous) entity term, "walkthrough" the context term, and "game" the class of the entity

Entity Recognition in Query

• Probabilistic generative model
  • Goal: given a query q, find the triple <e, t, c> that maximizes its probability
  • The probability of generating a triple factorizes as Pr(e) Pr(c|e) Pr(t|c), assuming the context depends only on the class (e.g. "walkthrough" depends only on the class game, not on "harry potter")
  • The problem then becomes how to estimate Pr(e), Pr(c|e), and Pr(t|c)

Entity Recognition in Query

• Probability estimation by learning (see the sketch below for how the estimates are used at query time)
  • Learning objective: $\max \prod_{i=1}^{N} P(e_i, t_i, c_i)$
  • Challenge: it is difficult and time-consuming to manually assign class labels to named entities in queries
  • Instead, build a training set $T = \{(e_i, t_i)\}$ and view $c_i$ as a hidden variable
  • New learning problem: $\max \prod_{i=1}^{N} P(e_i, t_i) = \max \prod_{i=1}^{N} \sum_{c} P(e_i)\, P(c|e_i)\, P(t_i|c)$
  • Solved with the topic model WS-LDA
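A minimal sketch of how the factorization can be used at query time once Pr(e), Pr(c|e), and Pr(t|c) are estimated; the probability tables below are illustrative toy values:

```python
def best_interpretation(query, p_e, p_c_given_e, p_t_given_c):
    # Enumerate (entity, context) splits of the query and score each candidate
    # class with Pr(e) * Pr(c|e) * Pr(t|c); return the best <e, t, c> triple.
    words = query.split()
    best, best_score = None, 0.0
    for i in range(1, len(words) + 1):
        for j in range(i, len(words) + 1):
            e = " ".join(words[i - 1:j])              # candidate entity span
            t = " ".join(words[:i - 1] + words[j:])   # remaining words = context
            if e not in p_e:
                continue
            for c, p_ce in p_c_given_e[e].items():
                score = p_e[e] * p_ce * p_t_given_c.get(c, {}).get(t, 1e-9)
                if score > best_score:
                    best, best_score = (e, t, c), score
    return best

p_e = {"harry potter": 0.01}
p_c_given_e = {"harry potter": {"game": 0.3, "movie": 0.5, "book": 0.2}}
p_t_given_c = {"game": {"walkthrough": 0.2}, "movie": {"walkthrough": 0.001}}
print(best_interpretation("harry potter walkthrough", p_e, p_c_given_e, p_t_given_c))
# -> ('harry potter', 'walkthrough', 'game')
```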

Signal from Click [Pantel et al. 2012]

• Motivation: predict entity types in Web search by jointly modeling the entity, its context, the user intent, and the click signal
• Query type distribution over 73 entity types, learned with a generative model

Signal from Click

• Joint model for prediction (a plate diagram with variables t, τ, i, n, c and parameters θ, φ, ω over queries Q)
  • Distributions: over types, intents, words, hosts, and entities
  • Generative story, for each query: pick a type, pick an entity, pick an intent, pick context words, pick a click

Telegraphic Query Interpretation [Sawant et al. 2013, Joshi et al. 2014]

• Entity-seeking telegraphic queries, e.g. query "Germany capital" -> result entity "Berlin"
• Interpretation = segmentation + annotation
• Combine a knowledge base (accuracy) with a large corpus (recall)
• Overview

Joint Interpretation and Ranking [Sawant et al. 2013, Joshi et al. 2014]

• Input: a telegraphic query and an annotated corpus; output: ranked candidate entities e1, e2, e3, ...
• Two models for interpretation and ranking: a generative model and a discriminative model

Joint Interpretation and Ranking [Sawant et al. 2013]

• Generative model, based on probabilistic language models
  • Example query q: "losing team baseball world series 1998"
  • The type hint "baseball team" matches a candidate answer type (e.g. "major league baseball team" for the entity San Diego Padres); the remaining query words act as context matchers against corpus snippets such as "Padres have been to two World Series, losing in 1984 and 1998"
  • A switch variable Z decides whether each query word is generated by the type model or the context model
  • (Figure borrowed from U. Sawant, 2013)

Joint Interpretation and Ranking [Sawant et al. 2013]

• Discriminative model, based on max-margin discriminative learning
  • Each candidate interpretation pairs the query "losing team baseball world series 1998" with a target type and a candidate entity, e.g. (t = baseball team) -> San_Diego_Padres (correct entity) vs. (t = series) -> 1998_World_Series (incorrect entity)

Telegraphic Query Interpretation [Joshi et al. 2014]

• Queries seek answer entities (e2)
• Queries contain (query) entities (e1), target types (t2), relations (r), and selectors (s)

query                     | e1           | r                  | t2                 | s
dave navarro first band   | dave navarro | band               | band               | first
dave navarro first band   | dave navarro | -                  | band               | first
spider automobile company | spider       | automobile company | automobile company | -
spider automobile company | -            | automobile company | company            | spider

(Borrowed from M. Joshi, 2014)

Improved Generative Model

• The generative model of [Sawant et al. 2013] is extended in [Joshi et al. 2014] to also consider e1 (in q) and r

Improved Discriminative Model

• The discriminative model of [Sawant et al. 2013] is likewise extended in [Joshi et al. 2014] to consider e1 (in q) and r

Understand Short Texts with a Multi-tiered Model [Hua et al. 2015 (ICDE Best Paper)]

• Input: a short text; output: its semantic interpretation
• Example: "wanna watch eagles band" -> watch[verb] eagles[entity](band) band[concept]
• Three steps in understanding a short text:
  • Step 1: Text segmentation - divide the text into a sequence of terms from the vocabulary ("watch | eagles | band")
  • Step 2: Type detection - determine the best type of each term (watch[verb] eagles[entity] band[concept])
  • Step 3: Concept labeling - infer the best concept of each entity within context (eagles -> band)

Text Segmentation

• Observations:
  • Mutual exclusion - terms containing the same word mutually exclude each other
  • Mutual reinforcement - related terms mutually reinforce each other
• Build a Candidate Term Graph (CTG)

[Figure: candidate term graphs for "vacation april in paris" (candidate terms include vacation, april, paris, april in paris) and "watch harry potter" (candidate terms include watch, harry potter), with weighted edges between compatible candidate terms.]

Find the Best Segmentation

• A valid segmentation = a sub-graph of the CTG which:
  • is a complete graph (clique), with no mutual exclusion
  • has 100% word coverage (except for stopwords)
• The best segmentation is the valid one with the largest average edge weight


In other words, the best segmentation corresponds to the maximal clique of the CTG (with full word coverage) that has the largest average edge weight; a sketch of this search is given below.
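A minimal sketch of that search, assuming the CTG is given as a symmetric edge-weight dict, a mutual-exclusion test, and a word-coverage test; brute-force subset enumeration is used for clarity (a real implementation would enumerate maximal cliques), and all names are illustrative:

```python
from itertools import combinations

def edge_weight(weight, a, b):
    # Symmetric lookup; None means the two candidate terms are not connected.
    return weight.get((a, b), weight.get((b, a)))

def best_segmentation(candidates, weight, excludes, covers_all_words):
    # Search subsets of candidate terms for the clique with full word coverage
    # and the largest average edge weight.
    best, best_avg = None, float("-inf")
    for r in range(1, len(candidates) + 1):
        for subset in combinations(candidates, r):
            pairs = list(combinations(subset, 2))
            if any(excludes(a, b) or edge_weight(weight, a, b) is None for a, b in pairs):
                continue  # mutually exclusive terms, or not a clique
            if not covers_all_words(subset):
                continue
            avg = sum(edge_weight(weight, a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
            if avg > best_avg:
                best, best_avg = subset, avg
    return best
```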

Type Detection

• Pairwise model: find the best typed-term for each term so that the maximum spanning tree of the resulting sub-graph between typed-terms has the largest weight

[Figure: for the query "watch free movie", each term has candidate typed-terms (watch[v], watch[e], watch[c]; free[adj], free[v]; movie[c], movie[e]), and one typed-term is selected per term.]

Concept Labeling

• Entity disambiguation is the most important task of concept labeling: filter and re-rank the original concept cluster vector
• Weighted vote: the final score of each concept cluster combines its original score with the support from context, using concept co-occurrence (see the sketch below)
• Example: "watch harry potter" -> movie; "read harry potter" -> book
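A minimal sketch of the weighted vote, assuming the candidate concept clusters, the context's concept weights, and a co-occurrence lookup are given; `alpha` is an illustrative mixing weight:

```python
def weighted_vote(concept_scores, context_concepts, cooccur, alpha=0.5):
    # Re-rank an entity's concept clusters: mix each cluster's original score
    # with its co-occurrence support from the concepts of the surrounding context.
    # cooccur(c1, c2) is an assumed co-occurrence strength.
    reranked = {}
    for concept, score in concept_scores.items():
        support = sum(cooccur(concept, c) * w for c, w in context_concepts.items())
        reranked[concept] = alpha * score + (1 - alpha) * support
    return sorted(reranked.items(), key=lambda kv: kv[1], reverse=True)

# e.g. for "watch harry potter", the movie-related concepts of "watch" lift the
# "movie" cluster of "harry potter" above the "book" cluster.
```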

Example of Entity Disambiguation [Hua et al. 2015 (ICDE Best Paper), Hua et al. 2016]

[Figure: a short text is parsed and conceptualized into a concept vector [(c1, p1), (c2, p2), (c3, p3), ...] using the isA semantic network and the co-occurrence network (term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, concept orthogonalization).

For "ipad apple": the isA concepts of "apple" (fruit, company, food, product, ...) are filtered by co-occurrence with the concepts of "ipad" (product, device, ...), so that the company/brand/product senses survive and the fruit/food senses are filtered out.]

Mining Lexical Relationships [Wang et al. 2015b]

• Lexical knowledge is represented by probabilities over terms, roles, and concepts; example probabilities for "watch harry potter": $p(verb \mid watch)$, $p(instance \mid watch)$, $p(movie \mid harry\ potter)$, $p(book \mid harry\ potter)$, $p(movie \mid watch, verb)$
• Notation (e: instance, t: term, c: concept, z: role):
  • $p(z \mid t)$: role distribution of a term
  • $p(c \mid t, z)$: concept distribution of a term in a given role
  • $p(c \mid e) = p(c \mid t, z = instance)$

Understanding Queries [Wang et al. 2015b]

• Goal: rank the concepts and find $\arg\max_{c} p(c \mid t, q)$
• The query and all of its possible segmentations are matched against the offline semantic network, and a random walk with restart [Sun et al. 2005] is run on the resulting online subgraph (a sketch follows)
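A minimal sketch of random walk with restart on such a subgraph, assuming it is given as an adjacency matrix; the restart probability and iteration count are illustrative:

```python
import numpy as np

def random_walk_with_restart(adj, seed, restart=0.15, iters=100):
    # adj: (n x n) non-negative adjacency matrix of the online subgraph;
    # seed: index of the query/term node to restart from. Returns the
    # stationary relevance scores of all nodes (e.g. concept nodes).
    n = adj.shape[0]
    col_sums = adj.sum(axis=0)
    P = adj / np.where(col_sums == 0, 1, col_sums)  # column-normalize
    r = np.zeros(n)
    r[seed] = 1.0
    scores = np.full(n, 1.0 / n)
    for _ in range(iters):
        scores = (1 - restart) * P @ scores + restart * r
    return scores
```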

Short Text Understanding

• How to segment this short text?
• What does this short text mean (its intent, senses, or concepts)?
• What are the relations among terms in the short text?
• How to calculate the similarity between short texts?

Head, Modifier, and Constraint Detection in Short Texts [Wang et al. 2014b]

• Example: "popular smart cover iphone 5s"
• Definitions:
  • Head: names the general (semantic) category to which the whole short text belongs; usually the head is the intent of the short text ("smart cover" is the intent of the query)
  • Constraints: distinguish this member from other members of the same category ("iphone 5s" limits the type of the head)
  • Non-constraint modifiers (a.k.a. pure modifiers): subjective modifiers which can be dropped without changing the intent ("popular" is subjective and can be neglected)

Non-Constraint Modifier Mining: Construct Modifier Networks

• Edges between modifiers form a Modifier Network

[Figure: the concept hierarchy tree in the "Country" domain (country -> Asian country, developed country, Western country -> large Asian country, Western developed country, large developed country, top western country, top developed country, ...) induces a modifier network over {Asian, Developed, Western, Large, Top}; in this case "Large" and "Top" are pure modifiers.]

Non-Constraint Modifier Mining: Betweenness Centrality

• Betweenness centrality is a measure of a node's centrality in a network
• The betweenness of node v is defined as
  $$g(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$$
  where $\sigma_{st}$ is the total number of shortest paths from node s to node t, and $\sigma_{st}(v)$ is the number of those paths that pass through v
• Normalization & aggregation: a pure modifier should have a low aggregated betweenness-centrality score PMS(t) (a networkx-based sketch follows)
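A minimal sketch using networkx, with a toy modifier network loosely modeled on the Country example above; the exact construction of the network follows [Wang et al. 2014b]:

```python
import networkx as nx

def modifier_betweenness(modifier_edges):
    # Build the modifier network and score each modifier by betweenness
    # centrality; pure modifiers (e.g. "large", "top") should score low.
    G = nx.Graph()
    G.add_edges_from(modifier_edges)
    return nx.betweenness_centrality(G, normalized=True)

edges = [("asian", "western"), ("asian", "developed"), ("western", "developed"),
         ("large", "asian"), ("large", "developed"),
         ("top", "western"), ("top", "developed")]
print(modifier_betweenness(edges))
```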

Head-Constraint Mining [Wang et al. 2014b]

• A term can be a head in some cases and a constraint in others
• E.g. "Seattle hotel": hotel is the head and Seattle a constraint; "Seattle hotel job": job is the head, while Seattle and hotel are constraints

Head-Constraint Mining: Acquiring Concept Patterns

• Extract preposition patterns from query logs: "A for B", "A of B", "A with B", "A in B", "A on B", "A at B", ... (e.g. "cover for iphone 6s", "battery for sony a7r", "wicked on broadway")
• Get entity pairs (entity1 = head, entity2 = constraint) from the query log for each preposition pattern
• Conceptualize both entities (entity1 -> concept11, concept12, concept13, concept14; entity2 -> concept21, concept22, concept23) and form concept patterns: (concept11, concept21), (concept11, concept22), (concept11, concept23), ...
• Aggregate the concept patterns into a Concept Pattern Dictionary

Why Concepts Can't Be Too General

• Too-general concepts cause too many concept pattern conflicts: the head and the modifier cannot be distinguished for general concept pairs

Derived concept pattern: device (Head) / company (Modifier)
Supporting entity pairs: iphone 4 / verizon; modem / comcast; wireless router / comcast; iphone 4 / tmobile

Derived concept pattern: company (Head) / device (Modifier)
Supporting entity pairs: amazon books / kindle; netflix / touchpad; skype / windows phone; netflix / ps3

These two patterns conflict.

Why Concepts Can't Be Too Specific

• Too-specific concepts have little coverage: the concept regresses to the entity itself
• Large storage space: up to (million x million) patterns, e.g. device / largest desktop OS vendor, device / largest software development company, device / largest global corporation, device / latest windows and office provider, ...
• Basic-level Conceptualization (BLC) is a good choice [Wang et al. 2015b]

Top Concept Patterns

Cluster size | Sum of cluster score | Head / Constraint | Score
615 | 21146.91 | breed / state | 3.57298460224501
296 | 7752.357 | game / platform | 6.27403476771856
153 | 3466.804 | accessory / vehicle | 5.3393705094809
70 | 1182.59 | browser / platform | 1.32612807637391
22 | 1010.993 | requirement / school | 2.71407526294823
34 | 948.9159 | drug / disease | 1.54602405333541
42 | 899.2995 | cosmetic / skin condition | 8.14659415003929
16 | 742.1599 | job / city | 2.7903732555528
32 | 710.403 | accessory / phone | 2.46513830851194
18 | 669.2376 | software / platform | 2.10126322725878
20 | 644.4603 | test / disease | 2.39774028397537
27 | 599.4205 | clothes / breed | 9.8773996282851
19 | 591.3545 | penalty / crime | 2.00544192793488
25 | 584.8804 | tax / state | 2.40081818612579
16 | 546.5424 | sauce / meat | 1.83592863621553
18 | 480.9389 | credit card / country | 1.42919087972152
14 | 473.0792 | food / holiday | 1.4554140330924
11 | 453.6199 | mod / game | 2.57163856882439
29 | 435.0954 | garment / sport | 4.71533326845442
23 | 399.4886 | career information / professional | 7.32726483731257
15 | 386.065 | song / instrument | 1.28189481818135
18 | 378.213 | bait / fish | 7.80426514113169
22 | 372.2948 | study guide / book | 5.08339765053921
19 | 340.8953 | plugins / browser | 5.50326072627126
14 | 330.5753 | recipe / meat | 8.82779863422951
18 | 321.4226 | currency / country | 1.10825444188352
13 | 318.0272 | lens / camera | 1.86081673263957
9 | 316.973 | decoration / holiday | 1.30055844126533
16 | 314.875 | food / animal | 7.338544366514

game platform

game device

video game platform

game console game pad

game gaming platform

Game (Head) Platform (Modifier)

angry birds android

angry birds ios

angry birds windows 10

hellip hellip

Detection

Head Modifier Relationship

bull Train a classifier on

(head-embedding modifier-embedding)

bull Training data bull Positive (head modifier)bull Negative (modifier head)

bull Precision gt= 09 Recall gt= 09

bull Disadvantage not interpretable

Syntactic Parsing based on HM

bull Information is incompletebull Preposition and other function words

bull Within a noun compound el capitan macbook pro

bull Why not train a parser for web queries

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges Short Texts Lack Grammatical Signalsbull Lack function words word order

bull ldquotoys queriesrdquo has ambiguous intent

bull ldquodistance earth moonrdquo has clear intentbull many equivalent forms ldquoearth moon distancerdquo ldquoearth

distance moonrdquo hellip

Challenges Syntactic Parsing of Queries

bullNo standard

bullNo ground-truth

Why is syntactic parsing of queries even a legitimate problem

Derive Syntax from Semantics[Sun et al 2016]

bull Query ldquothai food houstonrdquo

bull Clicked sentence

bull Project dependency to the query

A Treebank for Short Texts

bull Given query 119902

bull Given 119902rsquos clicked sentence 119904

bull Parse each 119904

bull Project dependency from 119904 to 119902

bull Aggregate dependencies

Algorithm of Projection

Result Examples

Results

bull Random queries

QueryParser UAS 083 LAS 075Stanford UAS 072 LAS 064

bull Queries with no function words

QueryParser UAS 082 LAS 073Stanford UAS 070 LAS 061

bull Queries with function words

QueryParser UAS 090 LAS 085Stanford UAS 086 LAS 080

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

bull Measuring similarity between two short texts and sentences

bull Basic idea word-by-word comparison using embedding vector

bull Use saliency-weighted semantic graph to computer similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired

Bins of all edges Bins of max edges

119908isin119904119868

119868119863119865(119908) sdot)119904119890119898(119908 119904119904) sdot (1198961 + 1

൰119904119890119898(119908 119904119904) + 1198961 sdot (1 minus 119887 + 119887 sdot|119904119904|119886119907119892119897

Similarity measurement

termShort texts

Inspired by BM25

Semantic similarity

119891119904119905119904(119904119897 119904119904) =

From the Concept View

From the Concept View [Wang et al 2015a]

Co-occurrence Network

Bags of Concepts

Semantic Network

Short Text 1

Short Text 2

Concept Vector 1[(c1 score1) (c2 score2)hellip]

Concept Vector 2[(c1rsquo score1rsquo) (c2rsquo score2rsquo)hellip]

Similarity

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

bull Explicit short text understanding benefit lot of application scenariosbull Adssearch semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

000

100

200

300

400

500

600

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

000

010

020

030

040

050

060

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

Mainline Ads Sidebar Ads

Definition Mining [Hao et al 2016]

bull Definition scenarios search engines QnA etc

bull Why Conceptualization is useful for definition miningbull Examples ldquoWhat is Emphysemardquo

Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

bull This sentence has the form of definitionbull Embedding is helpful to some extent but it also return high similarity

score for (emphysema disease) and (emphysema smoking)

bull Conceptualization can provide strong semanticsbull Contextual embedding can also provide semantic similarity beyond Is-A

Answer 1

Answer 2

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

OfflineOffline OnlineOnline

Original Short textjustin bieber graduates

hellip

Knowledge base

Conceptualiztion

Concept Vector

Entity Extraction

Candidates Generation

Classification amp Ranking

Model LearningModel Learning

Concept Weighting

Model Model NModel i

Concept Model Concept Model

Class 1 Class NClass i

TrainingData

ltMusic Scoregt

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept SpaceArticle titlestagsin this category

119901119894

119901119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

hellip

hellip

120596119894

120596119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

120596119894

120596119895119901119894

119901119895

Precision performance on each category [Wang et al 2014a]

BocSTC LM_ch SVMVSM_cosi

neLM_d Entity_ESA

Movie 071 091 084 081 072 056

Money 097 095 054 057 052 074

Music 097 090 088 073 068 058

TV 096 046 092 056 051 055

0304050607080910

Pre

cisi

on

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledgeltigt in SIGMOD 2008

bull [ Auer et al 2007 ] Soumlren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Boumlrschinger Caumlcilia Zirn and AnasElghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and XindongWu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny BoyesBraem Basic objects in natural categories Cognitive psychology 8(3)382ndash439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny ZhuToward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 61: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Unsupervised Segmentation [Tan et al 2008]

bull Unsupervised learning for query segmentation

Probability of generated segmentation S for query Q

119875 119878119876 = 119875 1199041 P 1199042|1199041 hellipP 119904119898 11990411199042hellip119904119898minus1

asympෑ

119904119894isin119878

119875(119904119894)Unigram model

segments

Valid segment boundary if and only if the pointwise mutual information between the two segments resulting from the split is negative

new york times subscription

1199041 1199042

119872119868 119904119896 119904119896+1 = log119875119888([119904119896 119904119896+1])

119875119888 119904119896 ∙ 119875119888 (119904119896+1)lt 0

Example log119875119888([119899119890119908 119910119900119903119896])

119875119888( 119899119890119908) ∙ 119875119888 (119910119900119903119896)gt 0

no segment boundary here

Unsupervised Segmentation

bull Find top k segmentations dynamic programming

bull Using EM optimization on the fly

Input query 11990811199082hellip119908119899 concept probability distribution

Output top k segmentations with highest likehood

Words in a query

Exploit Click-through [Li et al 2011]

bull Motivationbull Probabilistic query segmentation

bull Use click-through data

Output top-3 segmentation

[bank of america] [online banking] 0502

bank of america online banking] 0428

[bank of ] [ america] [online banking] 0001

Q -gt URL -gt D query

document

click data

Input Query bank of america online banking

Exploit Click-through

bull Segmentation Model

An interpolated model

global info Click-throughinfo

[credit card] [bank of America]

1 bank of america credit cards contact us overview2 secured visa credit card from bank of america3 credit cards overview find the right bank of america credit card for you

Query

Clicked html documents

global info

Click-through info

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Sense Changes with Different Context

watch harry potter read harry potter age harry potter

Movie Book Character

harry potter walkthrough

Game

Entity Recognition in Query [Guo et al 2009]

bull Motivation

Detect named entity in a short text and categorize it

harry potter walkthrough

Single-named-entity query

Example

(ldquoharry potterrdquo ldquo walkthroughrdquo ldquogamerdquo)

triple lte t cgt

class of entity

context terms

ambiguous term

contextterm class

Entity Recognition in Query

bull Probabilistic Generative Model

Goal Given a query q find triple lte t cgt maximize the probability

Probability to generate triple

assume context only depends on class

Objective given query q find

The problem then becomes how to estimate Pr(e) Pr(c|e) and Pr(t|c)

Eg ldquowalkthroughrdquo only depends on game instead of happy potter

Entity Recognition in Query

bull Probability Estimation by Learning

learning objective

N

1i

iii )ctP(emax

Challenge difficult as well as time consuming to manually assign class labels to named entities in queries

Build training set 119879 = (119890119894 119905119894) view 119888119894 as a hidden variable

New Learning problem

N

1i

ii

N

1i

i

N

1i

ii c)|)P(te|P(c)P(emax)tP(emax c

solved with topic model WS-LDA

Signal from Click [Pantel et al 2012]

bull Motivation

Predict entity type in Web search

entity

user intent

context

click

Query type distribution (73 types)

Generative model

entity type

T

TK

K2

Signal from Click

bull Joint Model for Prediction

t

τ

i

n c

θ

φ

ω

Q

Distribution over types

Intent distribution

Pick type

Pick entity

Pick intent

Pick click

Word distribution

Host distribution

Entity distribution

For each Query

Pick context words

Telegraphic Query interpretation [Sawant et al 2013 Joshi et al 2014]

bull Entity-seeking Telegraphic Queries

bull Interpretation = Segmentation + Annotation

Knowledge base Large corpus

accuracy recall

Germany capital

Berlin

Query

Result Entity

bull Overview

Joint Interpretation and Ranking [Sawant et al 2013 Joshi et al 2014]

Annotated Corpus

Telegraphic Query

e1e2e3

Two Models for Interpretation and Ranking

Generative Model

Discriminative Model

Output

bull Generative Model

Joint Interpretation and Ranking [Sawant et al 2013]

San Diego Padres

Major league baseball team

type context

E

T Padres have been to two World

Series losing in 1984 and 1998

Type hint

baseball team

losing team baseball world series 1998

Z

Context matchers

lost 1998 world seriesswitch

model model

q losing team baseball world series 1998

Borrow from U Sawant (2013)

Based on Probabilistic Language Models

bull Discriminative Model

Joint Interpretation and Ranking [Sawant et al 2013]

San_Diego_Padres

losing team baseball world

series 1998

(baseball team)

losing team baseball world

series 1998

(baseball team)

losing team baseball world

series 1998

(t = baseball team)

1998_World_Series

losing team baseball world

series 1998

(series)

losing team baseball world

series 1998

(series)

losing team baseball world

series 1998

(t = series)

Correct entity Incorrect entity

Based on max-margin discriminative learning

bull Queries seek answer entities (e2)

bull Contain (query) entities (e1) target types (t2) relations (r) and selectors (s)

Telegraphic Query Interpretation [Joshi et al 2014]

query e1 r t2 s

dave navarro first band

dave navarro band band first

dave navarro - band first

spider automobile company

spider automobile company

automobile company

-

automobile company company spider

Borrow from M Joshi (2014)

Improved Generative Model

bull Generative Model[Sawant et al 2013]

[Joshi et al 2014]Consider e1

(in q) and r

Improved Discriminative Model

bull Discriminative Model[Sawant et al 2013]

[Joshi et al 2014]

Consider e1

(in q) and r

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

bull Input a short text

bull Output semantic interpretation

bull Three steps in understanding a short text

wanna watch eagles band

watch[verb] eagles[entity](band) band[concept]

wanna watch eagles band watch[verb] eagles[entity](band) band[concept]

watch eagles band watch[verb] eagles[entity] band[concept]

Step 1 Text Segmentation ndash divide into a sequence of terms in vocabulary

Step 2 Type detection ndash determine the best type of each term

Step 3 Concept Labeling ndash infer the best concept of each entity within context

Text segmentationbull Observations

bull Mutual Exclusion ndash terms containing the same word mutually exclude each other

bull Mutual Reinforcement ndash related terms mutually reinforce each other

bull Build a Candidate Term Graph (CTG)

ldquovacation april in parisrdquo ldquowatch harry potterrdquo

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Find best segmentation

bull Best segmentation= sub-graph in CTG whichbull Is a complete graph (clique)

bull No mutual exclusion

bull Has 100 word coveragebull Except for stopwords

bull Has the largest average edge weight

Is a segmentation

Best segmentation

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Find best segmentation

bull Best segmentation= sub-graph in CTG whichbull Is a complete graph (clique)

bull No mutual exclusion

bull Has 100 word coveragebull Except for stopwords

bull Has the largest average edge weight

Maximal Clique

Best segmentation

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Type Detection

bull Pairwise Modelbull Find the best typed-term for each term so that the

Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight

watch[v]

watch[e]

watch[c]

watch

free[adj]

free[v]

movie[c]

movie[e]

free

movie

Concept Labeling

bull Entity disambiguation is the most important task of concept labelingbull Filterre-rank of the original concept cluster vector

bull Weighted-Votebull The final score of each concept cluster is a combination

of its original score and the support from context using concept co-occurrence

watch harry potter read harry potter

movie book

Example of Entity Disambiguation[Hua et al 2015 (ICDE Best Paper) Hua et al 2016]

Co-occurrence network

Concept Vector

Semantic network

Short Text

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

c1 p1

c2 p2

c3 p3

hellip

ipad apple

fruithellip

companyhellip

foodhellip

producthellip

Is-A

filtering

ldquoipad applerdquo

producthellip

devicehellip

producthellip

brandhellip

companyhellip

devicehellip

co-occur

Is-A

Mining Lexical Relationships[Wang et al 2015b]

bull Lexical knowledge represented by the probabilities

verb

product book

movie

watch harry potter

119901 119907119890119903119887 119908119886119905119888ℎ

119901 119894119899119904119905119886119899119888119890 119908119886119905119888ℎ

119901 119898119900119907119894119890 ℎ119886119903119903119910 119901119900119905119905119890119903

119901 119898119900119907119894119890 119908119886119905119888ℎ 119907119890119903119887

119901 119887119900119900119896 ℎ119886119903119903119910 119901119900119905119905119890119903

119901 119888 119905 119911

119901 119888 119890 =119901 119888 119905 119911 = 119894119899119904119905119886119899119888119890

119901 119911 119905 ①②

③e instancet termc conceptz role

Understanding Queries [Wang et al 2015b]

bull Goal to rank the concepts and findarg max

119888119901(119888|119905 119902)

The offline semantic network

QueryAll possible

segmentations

Random walk with restart [Sun et al 2005]on the online subgraph

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

bull Example ldquopopular smart cover iphone 5srdquo

bull Definition bull Head acts to name the general (semantic) category to which the

whole short text belongs Usually the head is the intent of the short text

bull ldquosmart coverrdquo intent of the query

bull Constraints distinguish this member from other members of the same category

bull ldquoiphone 5srdquo limit the type of the head

bull Non-Constraint Modifiers (aka Pure Modifiers) are subjectivemodifiers which can be dropped without changing intent

bull ldquopopularrdquo subjective can be neglected

Non-Constraint Modifiers Mining Construct Modifier Networks

Edges form a Modifier Network

Concept Hierarchy Tree in ldquoCountryrdquo domain

Modifier Network in ldquoCountryrdquo domainIn this case ldquoLargerdquo and ldquoToprdquo are pure modifiers

Country

Asian country

Developed country

Western country

Asian Developed Western

Western developed

country

Top western country

Large

Large Top

Top

WesternLarge Asian

country

Large developed

country

Top developed

country

Country

Asian Western

Developed

Large Top

bull Betweenness centrality is a measure of a nodes centrality in a network

bull Betweennes of node v is defined as

bull where 120590119904119905 is the total number of shortest paths from node s to node t and 120590119904119905(119907) is the number of those paths that pass through v

bull Normalization amp Aggregation

bull For a pure modifier it should have low betweenness centrality aggregation score PMS(t)

Non-Constraint Modifiers Mining Betweenness centrality

Head-Constraints Mining [Wang et al 2014b]

bull A term can be a head sometimes and be a constraint in some other cases

bull Eg Seattle hotel Seattle hotel job

head headconstraintconstraintconstraint

Head-Constraints Mining Acquiring Concept Patterns

Get entity pairs from query log

Conceptualization

Concept Patterns for each

prepositionsentity1 entity2

Extract Patterns

A for B A of BA with B A in BA on B A at B hellip

entity 1head entity 2constraint

concept11

concept12

concept13

concept14

concept21

concept22

concept23

(concept11 concept21) (concept11 concept22)(concept11 concept23)hellip

Concept Pattern Dictionary

Building concept pattern dictionary Query Logs

cover for iphone 6sbattery for sony a7rwicked on broadway

Why Concepts Canrsquot Be Too Generalbull It may cause too many concept pattern conflicts

canrsquot distinguish head and modifier for general concept pairs

Head Modifier

Derived Concept Pattern device company

Supporting Entity Pairs iphone 4 verizon

modem comcast

wireless router comcast

iphone 4 tmobile

Head Modifier

Derived Concept Pattern company device

Supporting Entity Pairs amazon books kindle

netflix touchpad

skype windows phone

netflix ps3

Conflict

Why Concepts Canrsquot Be Too Specificbull It may generate concepts with little coverage

bull Concept regresses to entitybull Large storage space up to (million million) patterns

hellip hellip

device largest desktop OS vendor

device largest software development company

device largest global corporation

device latest windows and office provider

hellip hellip

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept PatternsCluster size Sum of Cluster Score headconstraintscore

615 2114691 breedstate357298460224501

296 7752357 gameplatform627403476771856

153 3466804 accessoryvehicle53393705094809

70 118259 browserplatform132612807637391

22 1010993 requirementschool271407526294823

34 9489159 drugdisease154602405333541

42 8992995 cosmeticskin condition814659415003929

16 7421599 jobcity27903732555528

32 710403 accessoryphone246513830851194

18 6692376 softwareplatform210126322725878

20 6444603 testdisease239774028397537

27 5994205 clothesbreed98773996282851

19 5913545 penaltycrime200544192793488

25 5848804 taxstate240081818612579

16 5465424 saucemeat183592863621553

18 4809389 credit cardcountry142919087972152

14 4730792 foodholiday14554140330924

11 4536199 modgame257163856882439

29 4350954 garmentsport471533326845442

23 3994886 career informationprofessional732726483731257

15 386065 songinstrument128189481818135

18 378213 baitfish780426514113169

22 3722948 study guidebook508339765053921

19 3408953 pluginsbrowser550326072627126

14 3305753 recipemeat882779863422951

18 3214226 currencycountry110825444188352

13 3180272 lenscamera186081673263957

9 316973 decorationholiday130055844126533

16 314875 foodanimal7338544366514

game platform

game device

video game platform

game console game pad

game gaming platform

Game (Head) Platform (Modifier)

angry birds android

angry birds ios

angry birds windows 10

hellip hellip

Detection

Head Modifier Relationship

bull Train a classifier on

(head-embedding modifier-embedding)

bull Training data bull Positive (head modifier)bull Negative (modifier head)

bull Precision gt= 09 Recall gt= 09

bull Disadvantage not interpretable

Syntactic Parsing based on HM

bull Information is incompletebull Preposition and other function words

bull Within a noun compound el capitan macbook pro

bull Why not train a parser for web queries

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges Short Texts Lack Grammatical Signalsbull Lack function words word order

bull ldquotoys queriesrdquo has ambiguous intent

bull ldquodistance earth moonrdquo has clear intentbull many equivalent forms ldquoearth moon distancerdquo ldquoearth

distance moonrdquo hellip

Challenges Syntactic Parsing of Queries

bullNo standard

bullNo ground-truth

Why is syntactic parsing of queries even a legitimate problem

Derive Syntax from Semantics[Sun et al 2016]

bull Query ldquothai food houstonrdquo

bull Clicked sentence

bull Project dependency to the query

A Treebank for Short Texts

bull Given query 119902

bull Given 119902rsquos clicked sentence 119904

bull Parse each 119904

bull Project dependency from 119904 to 119902

bull Aggregate dependencies

Algorithm of Projection

Result Examples

Results

bull Random queries

QueryParser UAS 083 LAS 075Stanford UAS 072 LAS 064

bull Queries with no function words

QueryParser UAS 082 LAS 073Stanford UAS 070 LAS 061

bull Queries with function words

QueryParser UAS 090 LAS 085Stanford UAS 086 LAS 080

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

bull Measuring similarity between two short texts and sentences

bull Basic idea word-by-word comparison using embedding vector

bull Use saliency-weighted semantic graph to computer similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired

Bins of all edges Bins of max edges

119908isin119904119868

119868119863119865(119908) sdot)119904119890119898(119908 119904119904) sdot (1198961 + 1

൰119904119890119898(119908 119904119904) + 1198961 sdot (1 minus 119887 + 119887 sdot|119904119904|119886119907119892119897

Similarity measurement

termShort texts

Inspired by BM25

Semantic similarity

119891119904119905119904(119904119897 119904119904) =

From the Concept View

From the Concept View [Wang et al 2015a]

Co-occurrence Network

Bags of Concepts

Semantic Network

Short Text 1

Short Text 2

Concept Vector 1[(c1 score1) (c2 score2)hellip]

Concept Vector 2[(c1rsquo score1rsquo) (c2rsquo score2rsquo)hellip]

Similarity

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

bull Explicit short text understanding benefit lot of application scenariosbull Adssearch semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

000

100

200

300

400

500

600

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

000

010

020

030

040

050

060

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

Mainline Ads Sidebar Ads

Definition Mining [Hao et al 2016]

bull Definition scenarios search engines QnA etc

bull Why Conceptualization is useful for definition miningbull Examples ldquoWhat is Emphysemardquo

Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

bull This sentence has the form of definitionbull Embedding is helpful to some extent but it also return high similarity

score for (emphysema disease) and (emphysema smoking)

bull Conceptualization can provide strong semanticsbull Contextual embedding can also provide semantic similarity beyond Is-A

Answer 1

Answer 2

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

OfflineOffline OnlineOnline

Original Short textjustin bieber graduates

hellip

Knowledge base

Conceptualiztion

Concept Vector

Entity Extraction

Candidates Generation

Classification amp Ranking

Model LearningModel Learning

Concept Weighting

Model Model NModel i

Concept Model Concept Model

Class 1 Class NClass i

TrainingData

ltMusic Scoregt

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space: article titles/tags in this category

p_i

p_j

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

…

…

ω_i

ω_j

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

…

…

ω_i

ω_j, p_i

p_j
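A minimal sketch of the concept-based scoring idea above: a short text is mapped to a concept vector (p) through Is-A knowledge and scored against per-category concept weights (ω). The ISA entries and category weights below are invented toy values, not the learned models of [Wang et al 2014a]:

```python
# Hypothetical Is-A knowledge and per-category concept weights (toy values).
ISA = {
    "justin bieber": {"singer": 0.8, "celebrity": 0.2},
    "harry potter": {"movie": 0.5, "book": 0.4, "character": 0.1},
}
CATEGORY_WEIGHTS = {
    "Music": {"singer": 0.9, "album": 0.7, "celebrity": 0.3},
    "Movie": {"movie": 0.9, "actor": 0.6, "character": 0.4},
}

def conceptualize(short_text):
    # Map recognized entities in the short text to a concept vector p.
    concepts = {}
    for entity, vector in ISA.items():
        if entity in short_text.lower():
            for concept, p in vector.items():
                concepts[concept] = concepts.get(concept, 0.0) + p
    return concepts

def classify(short_text):
    # Rank categories by the dot product of the concept vector p and the
    # category's concept weight vector omega.
    p = conceptualize(short_text)
    scores = {cat: sum(p.get(c, 0.0) * w for c, w in weights.items())
              for cat, weights in CATEGORY_WEIGHTS.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(classify("justin bieber graduates"))   # Music should rank first
print(classify("watch harry potter"))        # Movie should rank first
```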

Precision performance on each category [Wang et al 2014a]

Category   BocSTC   LM_ch   SVM    VSM_cosine   LM_d   Entity_ESA
Movie      0.71     0.91    0.84   0.81         0.72   0.56
Money      0.97     0.95    0.54   0.57         0.52   0.74
Music      0.97     0.90    0.88   0.73         0.68   0.58
TV         0.96     0.46    0.92   0.56         0.51   0.55

(Bar chart omitted; y-axis: Precision, 0.3–1.0.)

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

• [Stark et al. 1998] Michael M. Stark and Richard F. Riesenfeld. WordNet: An Electronic Lexical Database. Proceedings of the 11th Eurographics Workshop on Rendering, 1998.

• [Banko et al. 2007] Michele Banko, Michael J. Cafarella, Stephen Soderland, Matt Broadhead and Oren Etzioni. Open Information Extraction from the Web. In IJCAI 2007.

• [Etzioni et al. 2011] Oren Etzioni, Anthony Fader, Janara Christensen, Stephen Soderland and Mausam Mausam. Open Information Extraction: The Second Generation. In IJCAI, vol. 11, pp. 3-10, 2011.

• [Carlson et al. 2010] A. Carlson, J. Betteridge, B. Kisiel, B. Settles, E. R. Hruschka Jr. and T. M. Mitchell. Toward an Architecture for Never-Ending Language Learning. In Proceedings of the Conference on Artificial Intelligence (AAAI), 2010.

• [Wu et al. 2012] Wentao Wu, Hongsong Li, Haixun Wang and Kenny Zhu. Probase: A Probabilistic Taxonomy for Text Understanding. In ACM International Conference on Management of Data (SIGMOD), May 2012.

• [Bollacker et al. 2008] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge and Jamine Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In SIGMOD 2008.

• [Auer et al. 2007] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak and Zachary G. Ives. DBpedia: A Nucleus for a Web of Open Data. In ISWC/ASWC 2007.

References

• [Suchanek et al. 2007] Fabian M. Suchanek, Gjergji Kasneci and Gerhard Weikum. Yago: a core of semantic knowledge. In WWW 2007.

• [Wu et al. 2015] Sen Wu, Ce Zhang, Christopher De Sa, Jaeho Shin, Feiran Wang and C. Ré. Incremental Knowledge Base Construction Using DeepDive. In VLDB 2015.

• [Navigli et al. 2012] R. Navigli and S. Ponzetto. BabelNet: The Automatic Construction, Evaluation and Application of a Wide-Coverage Multilingual Semantic Network. In Artificial Intelligence, 2012.

• [Nastase et al. 2010] Vivi Nastase, Michael Strube, Benjamin Börschinger, Cäcilia Zirn and Anas Elghafari. WikiNet: A very large scale multi-lingual concept network. In LREC 2010.

• [Speer et al. 2013] Robert Speer and Catherine Havasi. ConceptNet 5: A large semantic network for relational knowledge. The People's Web Meets NLP, Springer Berlin Heidelberg, 2013.

• [Hua et al. 2016] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng and Xiaofang Zhou. Understand Short Texts by Harvesting and Analyzing Semantic Knowledge. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2016.

• [Hua et al. 2015] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng and Xiaofang Zhou. Short Text Understanding Through Lexical-Semantic Analysis. In International Conference on Data Engineering (ICDE), April 2015.

References

• [Li et al. 2013] Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang and Xindong Wu. Computing term similarity by large probabilistic isa knowledge. In ACM International Conference on Information and Knowledge Management (CIKM), 2013.

• [Li et al. 2015] Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang, Xue-Gang Hu and Xindong Wu. A Large Probabilistic Semantic Network based Approach to Compute Term Similarity. In IEEE Transactions on Knowledge and Data Engineering (TKDE), 27(10): 2604-2617, 2015.

• [Rosch et al. 1976] Eleanor Rosch, Carolyn B. Mervis, Wayne D. Gray, David M. Johnson and Penny Boyes-Braem. Basic objects in natural categories. Cognitive Psychology, 8(3): 382–439, 1976.

• [Manning and Schutze 1999] Christopher D. Manning and Hinrich Schutze. Foundations of Statistical Natural Language Processing. Volume 999, MIT Press, 1999.

• [Wang et al. 2015b] Zhongyuan Wang, Kejun Zhao, Haixun Wang, Xiaofeng Meng and Ji-Rong Wen. Query Understanding through Knowledge-Based Conceptualization. In IJCAI, July 2015.

• [Bergsma et al. 2007] Shane Bergsma and Qin Iris Wang. Learning Noun Phrase Query Segmentation. In EMNLP-CoNLL 2007: 819-826.

• [Tan et al. 2008] Bin Tan and Fuchun Peng. Unsupervised query segmentation using generative language models and wikipedia. In WWW 2008: 347-356.

References

• [Li et al. 2011] Yanen Li, Bo-June Paul Hsu, ChengXiang Zhai and Kuansan Wang. Unsupervised query segmentation using clickthrough for information retrieval. In SIGIR 2011: 285-294.

• [Guo et al. 2009] Jiafeng Guo, Gu Xu, Xueqi Cheng and Hang Li. Named entity recognition in query. In SIGIR 2009: 267-274.

• [Pantel et al. 2012] Patrick Pantel, Thomas Lin and Michael Gamon. Mining Entity Types from Query Logs via User Intent Modeling. In ACL 2012: 563-571.

• [Joshi et al. 2014] Mandar Joshi, Uma Sawant and Soumen Chakrabarti. Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries. In EMNLP 2014: 1104-1114.

• [Sawant et al. 2013] Uma Sawant and Soumen Chakrabarti. Learning joint query interpretation and response ranking. In WWW 2013: 1099-1110.

• [Wang et al. 2014b] Zhongyuan Wang, Haixun Wang and Zhirui Hu. Head, Modifier, and Constraint Detection in Short Texts. In International Conference on Data Engineering (ICDE), 2014.

• [Sun et al. 2016] Xiangyan Sun, Haixun Wang, Yanghua Xiao and Zhongyuan Wang. Syntactic Parsing of Web Queries. In EMNLP 2016.

References

• [Kenter and Rijke 2015] Tom Kenter and Maarten de Rijke. Short text similarity with word embeddings. In CIKM 2015.

• [Wang et al. 2015a] Zhongyuan Wang, Haixun Wang, Ji-Rong Wen and Yanghua Xiao. An Inference Approach to Basic Level of Categorization. In CIKM, October 2015.

• [Hao et al. 2016] Zehui Hao, Zhongyuan Wang, Xiaofeng Meng and Jun Yan. Combining Language Model with Conceptualization for Definition Ranking. MSR Technical Report, 2016.

• [Wang et al. 2014a] Fang Wang, Zhongyuan Wang, Zhoujun Li and Ji-Rong Wen. Concept-based Short Text Classification and Ranking. In CIKM 2014.

• [Wang et al. 2012a] Jingjing Wang, Haixun Wang, Zhongyuan Wang and Kenny Zhu. Understanding Tables on the Web. In International Conference on Conceptual Modeling, October 2012.

• [Wang et al. 2012b] Yue Wang, Hongsong Li, Haixun Wang and Kenny Zhu. Toward Topic Search on the Web. In International Conference on Conceptual Modeling, October 2012.

Page 63: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Exploit Click-through [Li et al 2011]

bull Motivationbull Probabilistic query segmentation

bull Use click-through data

Output top-3 segmentation

[bank of america] [online banking] 0502

bank of america online banking] 0428

[bank of ] [ america] [online banking] 0001

Q -gt URL -gt D query

document

click data

Input Query bank of america online banking

Exploit Click-through

bull Segmentation Model

An interpolated model

global info Click-throughinfo

[credit card] [bank of America]

1 bank of america credit cards contact us overview2 secured visa credit card from bank of america3 credit cards overview find the right bank of america credit card for you

Query

Clicked html documents

global info

Click-through info

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Sense Changes with Different Context

watch harry potter read harry potter age harry potter

Movie Book Character

harry potter walkthrough

Game

Entity Recognition in Query [Guo et al 2009]

bull Motivation

Detect named entity in a short text and categorize it

harry potter walkthrough

Single-named-entity query

Example

(ldquoharry potterrdquo ldquo walkthroughrdquo ldquogamerdquo)

triple lte t cgt

class of entity

context terms

ambiguous term

contextterm class

Entity Recognition in Query

bull Probabilistic Generative Model

Goal Given a query q find triple lte t cgt maximize the probability

Probability to generate triple

assume context only depends on class

Objective given query q find

The problem then becomes how to estimate Pr(e) Pr(c|e) and Pr(t|c)

Eg ldquowalkthroughrdquo only depends on game instead of happy potter

Entity Recognition in Query

bull Probability Estimation by Learning

learning objective

N

1i

iii )ctP(emax

Challenge difficult as well as time consuming to manually assign class labels to named entities in queries

Build training set 119879 = (119890119894 119905119894) view 119888119894 as a hidden variable

New Learning problem

N

1i

ii

N

1i

i

N

1i

ii c)|)P(te|P(c)P(emax)tP(emax c

solved with topic model WS-LDA

Signal from Click [Pantel et al 2012]

bull Motivation

Predict entity type in Web search

entity

user intent

context

click

Query type distribution (73 types)

Generative model

entity type

T

TK

K2

Signal from Click

bull Joint Model for Prediction

t

τ

i

n c

θ

φ

ω

Q

Distribution over types

Intent distribution

Pick type

Pick entity

Pick intent

Pick click

Word distribution

Host distribution

Entity distribution

For each Query

Pick context words

Telegraphic Query interpretation [Sawant et al 2013 Joshi et al 2014]

bull Entity-seeking Telegraphic Queries

bull Interpretation = Segmentation + Annotation

Knowledge base Large corpus

accuracy recall

Germany capital

Berlin

Query

Result Entity

bull Overview

Joint Interpretation and Ranking [Sawant et al 2013 Joshi et al 2014]

Annotated Corpus

Telegraphic Query

e1e2e3

Two Models for Interpretation and Ranking

Generative Model

Discriminative Model

Output

bull Generative Model

Joint Interpretation and Ranking [Sawant et al 2013]

San Diego Padres

Major league baseball team

type context

E

T Padres have been to two World

Series losing in 1984 and 1998

Type hint

baseball team

losing team baseball world series 1998

Z

Context matchers

lost 1998 world seriesswitch

model model

q losing team baseball world series 1998

Borrow from U Sawant (2013)

Based on Probabilistic Language Models

bull Discriminative Model

Joint Interpretation and Ranking [Sawant et al 2013]

San_Diego_Padres

losing team baseball world

series 1998

(baseball team)

losing team baseball world

series 1998

(baseball team)

losing team baseball world

series 1998

(t = baseball team)

1998_World_Series

losing team baseball world

series 1998

(series)

losing team baseball world

series 1998

(series)

losing team baseball world

series 1998

(t = series)

Correct entity Incorrect entity

Based on max-margin discriminative learning

bull Queries seek answer entities (e2)

bull Contain (query) entities (e1) target types (t2) relations (r) and selectors (s)

Telegraphic Query Interpretation [Joshi et al 2014]

query e1 r t2 s

dave navarro first band

dave navarro band band first

dave navarro - band first

spider automobile company

spider automobile company

automobile company

-

automobile company company spider

Borrow from M Joshi (2014)

Improved Generative Model

bull Generative Model[Sawant et al 2013]

[Joshi et al 2014]Consider e1

(in q) and r

Improved Discriminative Model

bull Discriminative Model[Sawant et al 2013]

[Joshi et al 2014]

Consider e1

(in q) and r

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

bull Input a short text

bull Output semantic interpretation

bull Three steps in understanding a short text

wanna watch eagles band

watch[verb] eagles[entity](band) band[concept]

wanna watch eagles band watch[verb] eagles[entity](band) band[concept]

watch eagles band watch[verb] eagles[entity] band[concept]

Step 1 Text Segmentation ndash divide into a sequence of terms in vocabulary

Step 2 Type detection ndash determine the best type of each term

Step 3 Concept Labeling ndash infer the best concept of each entity within context

Text segmentationbull Observations

bull Mutual Exclusion ndash terms containing the same word mutually exclude each other

bull Mutual Reinforcement ndash related terms mutually reinforce each other

bull Build a Candidate Term Graph (CTG)

ldquovacation april in parisrdquo ldquowatch harry potterrdquo

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Find best segmentation

bull Best segmentation= sub-graph in CTG whichbull Is a complete graph (clique)

bull No mutual exclusion

bull Has 100 word coveragebull Except for stopwords

bull Has the largest average edge weight

Is a segmentation

Best segmentation

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Find best segmentation

bull Best segmentation= sub-graph in CTG whichbull Is a complete graph (clique)

bull No mutual exclusion

bull Has 100 word coveragebull Except for stopwords

bull Has the largest average edge weight

Maximal Clique

Best segmentation

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Type Detection

bull Pairwise Modelbull Find the best typed-term for each term so that the

Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight

watch[v]

watch[e]

watch[c]

watch

free[adj]

free[v]

movie[c]

movie[e]

free

movie

Concept Labeling

bull Entity disambiguation is the most important task of concept labelingbull Filterre-rank of the original concept cluster vector

bull Weighted-Votebull The final score of each concept cluster is a combination

of its original score and the support from context using concept co-occurrence

watch harry potter read harry potter

movie book

Example of Entity Disambiguation[Hua et al 2015 (ICDE Best Paper) Hua et al 2016]

Co-occurrence network

Concept Vector

Semantic network

Short Text

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

c1 p1

c2 p2

c3 p3

hellip

ipad apple

fruithellip

companyhellip

foodhellip

producthellip

Is-A

filtering

ldquoipad applerdquo

producthellip

devicehellip

producthellip

brandhellip

companyhellip

devicehellip

co-occur

Is-A

Mining Lexical Relationships[Wang et al 2015b]

bull Lexical knowledge represented by the probabilities

verb

product book

movie

watch harry potter

119901 119907119890119903119887 119908119886119905119888ℎ

119901 119894119899119904119905119886119899119888119890 119908119886119905119888ℎ

119901 119898119900119907119894119890 ℎ119886119903119903119910 119901119900119905119905119890119903

119901 119898119900119907119894119890 119908119886119905119888ℎ 119907119890119903119887

119901 119887119900119900119896 ℎ119886119903119903119910 119901119900119905119905119890119903

119901 119888 119905 119911

119901 119888 119890 =119901 119888 119905 119911 = 119894119899119904119905119886119899119888119890

119901 119911 119905 ①②

③e instancet termc conceptz role

Understanding Queries [Wang et al 2015b]

bull Goal to rank the concepts and findarg max

119888119901(119888|119905 119902)

The offline semantic network

QueryAll possible

segmentations

Random walk with restart [Sun et al 2005]on the online subgraph

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

bull Example ldquopopular smart cover iphone 5srdquo

bull Definition bull Head acts to name the general (semantic) category to which the

whole short text belongs Usually the head is the intent of the short text

bull ldquosmart coverrdquo intent of the query

bull Constraints distinguish this member from other members of the same category

bull ldquoiphone 5srdquo limit the type of the head

bull Non-Constraint Modifiers (aka Pure Modifiers) are subjectivemodifiers which can be dropped without changing intent

bull ldquopopularrdquo subjective can be neglected

Non-Constraint Modifiers Mining Construct Modifier Networks

Edges form a Modifier Network

Concept Hierarchy Tree in ldquoCountryrdquo domain

Modifier Network in ldquoCountryrdquo domainIn this case ldquoLargerdquo and ldquoToprdquo are pure modifiers

Country

Asian country

Developed country

Western country

Asian Developed Western

Western developed

country

Top western country

Large

Large Top

Top

WesternLarge Asian

country

Large developed

country

Top developed

country

Country

Asian Western

Developed

Large Top

bull Betweenness centrality is a measure of a nodes centrality in a network

bull Betweennes of node v is defined as

bull where 120590119904119905 is the total number of shortest paths from node s to node t and 120590119904119905(119907) is the number of those paths that pass through v

bull Normalization amp Aggregation

bull For a pure modifier it should have low betweenness centrality aggregation score PMS(t)

Non-Constraint Modifiers Mining Betweenness centrality

Head-Constraints Mining [Wang et al 2014b]

bull A term can be a head sometimes and be a constraint in some other cases

bull Eg Seattle hotel Seattle hotel job

head headconstraintconstraintconstraint

Head-Constraints Mining Acquiring Concept Patterns

Get entity pairs from query log

Conceptualization

Concept Patterns for each

prepositionsentity1 entity2

Extract Patterns

A for B A of BA with B A in BA on B A at B hellip

entity 1head entity 2constraint

concept11

concept12

concept13

concept14

concept21

concept22

concept23

(concept11 concept21) (concept11 concept22)(concept11 concept23)hellip

Concept Pattern Dictionary

Building concept pattern dictionary Query Logs

cover for iphone 6sbattery for sony a7rwicked on broadway

Why Concepts Canrsquot Be Too Generalbull It may cause too many concept pattern conflicts

canrsquot distinguish head and modifier for general concept pairs

Head Modifier

Derived Concept Pattern device company

Supporting Entity Pairs iphone 4 verizon

modem comcast

wireless router comcast

iphone 4 tmobile

Head Modifier

Derived Concept Pattern company device

Supporting Entity Pairs amazon books kindle

netflix touchpad

skype windows phone

netflix ps3

Conflict

Why Concepts Canrsquot Be Too Specificbull It may generate concepts with little coverage

bull Concept regresses to entitybull Large storage space up to (million million) patterns

hellip hellip

device largest desktop OS vendor

device largest software development company

device largest global corporation

device latest windows and office provider

hellip hellip

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept PatternsCluster size Sum of Cluster Score headconstraintscore

615 2114691 breedstate357298460224501

296 7752357 gameplatform627403476771856

153 3466804 accessoryvehicle53393705094809

70 118259 browserplatform132612807637391

22 1010993 requirementschool271407526294823

34 9489159 drugdisease154602405333541

42 8992995 cosmeticskin condition814659415003929

16 7421599 jobcity27903732555528

32 710403 accessoryphone246513830851194

18 6692376 softwareplatform210126322725878

20 6444603 testdisease239774028397537

27 5994205 clothesbreed98773996282851

19 5913545 penaltycrime200544192793488

25 5848804 taxstate240081818612579

16 5465424 saucemeat183592863621553

18 4809389 credit cardcountry142919087972152

14 4730792 foodholiday14554140330924

11 4536199 modgame257163856882439

29 4350954 garmentsport471533326845442

23 3994886 career informationprofessional732726483731257

15 386065 songinstrument128189481818135

18 378213 baitfish780426514113169

22 3722948 study guidebook508339765053921

19 3408953 pluginsbrowser550326072627126

14 3305753 recipemeat882779863422951

18 3214226 currencycountry110825444188352

13 3180272 lenscamera186081673263957

9 316973 decorationholiday130055844126533

16 314875 foodanimal7338544366514

game platform

game device

video game platform

game console game pad

game gaming platform

Game (Head) Platform (Modifier)

angry birds android

angry birds ios

angry birds windows 10

hellip hellip

Detection

Head Modifier Relationship

bull Train a classifier on

(head-embedding modifier-embedding)

bull Training data bull Positive (head modifier)bull Negative (modifier head)

bull Precision gt= 09 Recall gt= 09

bull Disadvantage not interpretable

Syntactic Parsing based on HM

bull Information is incompletebull Preposition and other function words

bull Within a noun compound el capitan macbook pro

bull Why not train a parser for web queries

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges Short Texts Lack Grammatical Signalsbull Lack function words word order

bull ldquotoys queriesrdquo has ambiguous intent

bull ldquodistance earth moonrdquo has clear intentbull many equivalent forms ldquoearth moon distancerdquo ldquoearth

distance moonrdquo hellip

Challenges Syntactic Parsing of Queries

bullNo standard

bullNo ground-truth

Why is syntactic parsing of queries even a legitimate problem

Derive Syntax from Semantics[Sun et al 2016]

bull Query ldquothai food houstonrdquo

bull Clicked sentence

bull Project dependency to the query

A Treebank for Short Texts

bull Given query 119902

bull Given 119902rsquos clicked sentence 119904

bull Parse each 119904

bull Project dependency from 119904 to 119902

bull Aggregate dependencies

Algorithm of Projection

Result Examples

Results

bull Random queries

QueryParser UAS 083 LAS 075Stanford UAS 072 LAS 064

bull Queries with no function words

QueryParser UAS 082 LAS 073Stanford UAS 070 LAS 061

bull Queries with function words

QueryParser UAS 090 LAS 085Stanford UAS 086 LAS 080

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

bull Measuring similarity between two short texts and sentences

bull Basic idea word-by-word comparison using embedding vector

bull Use saliency-weighted semantic graph to computer similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired

Bins of all edges Bins of max edges

119908isin119904119868

119868119863119865(119908) sdot)119904119890119898(119908 119904119904) sdot (1198961 + 1

൰119904119890119898(119908 119904119904) + 1198961 sdot (1 minus 119887 + 119887 sdot|119904119904|119886119907119892119897

Similarity measurement

termShort texts

Inspired by BM25

Semantic similarity

119891119904119905119904(119904119897 119904119904) =

From the Concept View

From the Concept View [Wang et al 2015a]

Co-occurrence Network

Bags of Concepts

Semantic Network

Short Text 1

Short Text 2

Concept Vector 1[(c1 score1) (c2 score2)hellip]

Concept Vector 2[(c1rsquo score1rsquo) (c2rsquo score2rsquo)hellip]

Similarity

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

bull Explicit short text understanding benefit lot of application scenariosbull Adssearch semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

000

100

200

300

400

500

600

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

000

010

020

030

040

050

060

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

Mainline Ads Sidebar Ads

Definition Mining [Hao et al 2016]

bull Definition scenarios search engines QnA etc

bull Why Conceptualization is useful for definition miningbull Examples ldquoWhat is Emphysemardquo

Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

bull This sentence has the form of definitionbull Embedding is helpful to some extent but it also return high similarity

score for (emphysema disease) and (emphysema smoking)

bull Conceptualization can provide strong semanticsbull Contextual embedding can also provide semantic similarity beyond Is-A

Answer 1

Answer 2

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

OfflineOffline OnlineOnline

Original Short textjustin bieber graduates

hellip

Knowledge base

Conceptualiztion

Concept Vector

Entity Extraction

Candidates Generation

Classification amp Ranking

Model LearningModel Learning

Concept Weighting

Model Model NModel i

Concept Model Concept Model

Class 1 Class NClass i

TrainingData

ltMusic Scoregt

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept SpaceArticle titlestagsin this category

119901119894

119901119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

hellip

hellip

120596119894

120596119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

120596119894

120596119895119901119894

119901119895

Precision performance on each category [Wang et al 2014a]

BocSTC LM_ch SVMVSM_cosi

neLM_d Entity_ESA

Movie 071 091 084 081 072 056

Money 097 095 054 057 052 074

Music 097 090 088 073 068 058

TV 096 046 092 056 051 055

0304050607080910

Pre

cisi

on

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledgeltigt in SIGMOD 2008

bull [ Auer et al 2007 ] Soumlren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Boumlrschinger Caumlcilia Zirn and AnasElghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and XindongWu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny BoyesBraem Basic objects in natural categories Cognitive psychology 8(3)382ndash439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddings In CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny Zhu Toward Topic Search on the Web In International Conference on Conceptual Modeling October 2012


Exploit Click-through

bull Segmentation Model

An interpolated model

global info + click-through info

[credit card] [bank of America]

1. bank of america credit cards contact us overview
2. secured visa credit card from bank of america
3. credit cards overview find the right bank of america credit card for you

Query

Clicked html documents

global info

Click-through info
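To make the interpolation concrete, here is a minimal sketch of scoring candidate segmentations with a linear mix of a global term model and a click-through term model; the probabilities, the mixing weight `lam` and the function name are toy assumptions for illustration, not the exact model of [Li et al 2011].

```python
import math

def interpolated_score(segmentation, p_global, p_click, lam=0.5):
    """Log-score of a segmentation under a linear interpolation of a
    global term model and a click-through term model (both toy dicts)."""
    score = 0.0
    for term in segmentation:
        p = lam * p_global.get(term, 1e-9) + (1 - lam) * p_click.get(term, 1e-9)
        score += math.log(p)  # independence assumption over segments
    return score

# Toy probabilities; in practice these would come from web n-grams and clicked documents.
p_global = {"credit card": 0.02, "bank of america": 0.01, "bank": 0.05, "of america": 0.001}
p_click  = {"credit card": 0.30, "bank of america": 0.40, "bank": 0.02, "of america": 0.001}

candidates = [["credit card", "bank of america"], ["credit", "card bank", "of america"]]
best = max(candidates, key=lambda seg: interpolated_score(seg, p_global, p_click))
print(best)  # -> ['credit card', 'bank of america']
```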

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Sense Changes with Different Context

watch harry potter read harry potter age harry potter

Movie Book Character

harry potter walkthrough

Game

Entity Recognition in Query [Guo et al 2009]

bull Motivation

Detect named entity in a short text and categorize it

harry potter walkthrough

Single-named-entity query

Example

("harry potter", "walkthrough", "game")

triple <e, t, c>

class of entity

context terms

ambiguous term

contextterm class

Entity Recognition in Query

bull Probabilistic Generative Model

Goal: Given a query q, find the triple <e, t, c> that maximizes the probability

Probability to generate triple

assume context only depends on class

Objective given query q find

The problem then becomes how to estimate Pr(e) Pr(c|e) and Pr(t|c)

E.g., "walkthrough" only depends on the class game, not on the entity harry potter

Entity Recognition in Query

bull Probability Estimation by Learning

learning objective

max ∏_{i=1}^{N} P(e_i, t_i, c_i)

Challenge difficult as well as time consuming to manually assign class labels to named entities in queries

Build training set T = {(e_i, t_i)}, viewing c_i as a hidden variable

New Learning problem

max ∏_{i=1}^{N} P(e_i, t_i) = max ∏_{i=1}^{N} Σ_c P(e_i) P(c | e_i) P(t_i | c)

solved with topic model WS-LDA
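For illustration only, the sketch below runs plain EM on the simplified objective above, treating the class c as a hidden variable; the actual solution in [Guo et al 2009] is the weakly supervised topic model WS-LDA, and all indices and counts here are toy values.

```python
import numpy as np

def em(pairs, n_entities, n_classes, n_terms, iters=50):
    """Toy EM for max prod_i sum_c P(e_i) P(c|e_i) P(t_i|c) with c hidden."""
    p_c_given_e = np.full((n_entities, n_classes), 1.0 / n_classes)      # P(c|e)
    p_t_given_c = np.random.dirichlet(np.ones(n_terms), n_classes)       # P(t|c)
    for _ in range(iters):
        resp = np.zeros((len(pairs), n_classes))
        for n, (e, t) in enumerate(pairs):                               # E-step: P(c|e,t)
            w = p_c_given_e[e] * p_t_given_c[:, t]
            resp[n] = w / w.sum()
        p_c_given_e = np.full((n_entities, n_classes), 1e-6)             # M-step
        p_t_given_c = np.full((n_classes, n_terms), 1e-6)
        for n, (e, t) in enumerate(pairs):
            p_c_given_e[e] += resp[n]
            p_t_given_c[:, t] += resp[n]
        p_c_given_e /= p_c_given_e.sum(1, keepdims=True)
        p_t_given_c /= p_t_given_c.sum(1, keepdims=True)
    return p_c_given_e, p_t_given_c

# entity 0 = "harry potter"; classes 0/1 = game/movie; terms 0/1 = "walkthrough"/"trailer"
pairs = [(0, 0), (0, 0), (0, 1)]
print(em(pairs, n_entities=1, n_classes=2, n_terms=2)[0])
```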

Signal from Click [Pantel et al 2012]

bull Motivation

Predict entity type in Web search

entity

user intent

context

click

Query type distribution (73 types)

Generative model

entity type


Signal from Click

bull Joint Model for Prediction

[Plate-notation figure] For each query: pick an entity type (distribution over types), pick an entity (entity distribution), pick an intent (intent distribution), pick context words (word distribution), and pick a click (host distribution); latent variables t, τ, i, n, c with distributions θ, φ, ω over the Q queries.

Telegraphic Query interpretation [Sawant et al 2013 Joshi et al 2014]

bull Entity-seeking Telegraphic Queries

bull Interpretation = Segmentation + Annotation

Knowledge base Large corpus

accuracy recall

Germany capital

Berlin

Query

Result Entity

bull Overview

Joint Interpretation and Ranking [Sawant et al 2013 Joshi et al 2014]

Annotated Corpus

Telegraphic Query

e1e2e3

Two Models for Interpretation and Ranking

Generative Model

Discriminative Model

Output

bull Generative Model

Joint Interpretation and Ranking [Sawant et al 2013]

[Figure, borrowed from U. Sawant (2013)] Example query q = "losing team baseball world series 1998". The answer entity E = San Diego Padres has type T = major league baseball team, which matches the type hint "baseball team" in the query; the corpus snippet "Padres have been to two World Series, losing in 1984 and 1998" matches the remaining query words through a switch variable Z over context and type language models.

Based on Probabilistic Language Models

bull Discriminative Model

Joint Interpretation and Ranking [Sawant et al 2013]

[Figure] Feature vectors pair the query "losing team baseball world series 1998" with candidate answer entities: San_Diego_Padres with type t = baseball team (correct entity) versus 1998_World_Series with type t = series (incorrect entity).

Based on max-margin discriminative learning

bull Queries seek answer entities (e2)

bull Contain (query) entities (e1) target types (t2) relations (r) and selectors (s)

Telegraphic Query Interpretation [Joshi et al 2014]

query: e1, r, t2, s
dave navarro first band: e1 = dave navarro, r = band, t2 = band, s = first
dave navarro first band: e1 = dave navarro, r = -, t2 = band, s = first
spider automobile company: e1 = spider, r = automobile company, t2 = automobile company, s = -
spider automobile company: e1 = -, r = automobile company, t2 = company, s = spider

Borrow from M Joshi (2014)

Improved Generative Model

bull Generative Model [Sawant et al 2013], improved by [Joshi et al 2014] to additionally consider e1 (in q) and r

Improved Discriminative Model

bull Discriminative Model [Sawant et al 2013], improved by [Joshi et al 2014] to additionally consider e1 (in q) and r

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

bull Input a short text

bull Output semantic interpretation

bull Three steps in understanding a short text

wanna watch eagles band

watch[verb] eagles[entity](band) band[concept]

wanna watch eagles band watch[verb] eagles[entity](band) band[concept]

watch eagles band watch[verb] eagles[entity] band[concept]

Step 1 Text Segmentation ndash divide into a sequence of terms in vocabulary

Step 2 Type detection ndash determine the best type of each term

Step 3 Concept Labeling ndash infer the best concept of each entity within context

Text segmentation

bull Observations

bull Mutual Exclusion – terms containing the same word mutually exclude each other

bull Mutual Reinforcement – related terms mutually reinforce each other

bull Build a Candidate Term Graph (CTG)

"vacation april in paris"   "watch harry potter"

[Figure: candidate term graphs. For "vacation april in paris", the candidate terms "april in paris", "vacation", "april" and "paris" are connected by weighted affinity edges, with mutual-exclusion edges between overlapping terms. For "watch harry potter", the candidate terms "watch" and "harry potter" are connected analogously.]

Find best segmentation

bull Best segmentation = sub-graph in the CTG which:
bull Is a complete graph (clique)
bull Has no mutual exclusion
bull Has 100% word coverage (except for stopwords)
bull Has the largest average edge weight
(any clique satisfying the first three conditions is a segmentation; the one with the largest average edge weight is the best segmentation)

[Figure: the same candidate term graphs, with the cliques {"vacation", "april in paris"} and {"watch", "harry potter"} marked as segmentations.]

Find best segmentation

bull Best segmentation = sub-graph in the CTG which:
bull Is a complete graph (clique)
bull Has no mutual exclusion
bull Has 100% word coverage (except for stopwords)
bull Has the largest average edge weight

[Figure: the maximal cliques of the candidate term graphs give the best segmentations: {"vacation", "april in paris"} and {"watch", "harry potter"}.]
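A minimal sketch of this clique-based search, assuming a toy candidate term graph built with networkx; the affinity weights and the stopword list are illustrative only.

```python
import itertools
import networkx as nx

def best_segmentation(words, terms, affinity, stopwords=frozenset({"in"})):
    """Pick the clique of non-overlapping terms that covers all non-stopwords
    and has the largest average edge weight."""
    g = nx.Graph()
    g.add_nodes_from(terms)
    for a, b in itertools.combinations(terms, 2):
        if not (set(a.split()) & set(b.split())):      # mutual exclusion: no shared word
            g.add_edge(a, b, weight=affinity.get((a, b), affinity.get((b, a), 0.0)))
    best, best_score = None, float("-inf")
    for clique in nx.enumerate_all_cliques(g):
        covered = set(w for t in clique for w in t.split())
        if set(words) - set(stopwords) - covered:       # require full word coverage
            continue
        edges = list(itertools.combinations(clique, 2))
        score = sum(g[a][b]["weight"] for a, b in edges) / max(len(edges), 1)
        if score > best_score:
            best, best_score = clique, score
    return best

words = "vacation april in paris".split()
terms = ["vacation", "april", "paris", "april in paris"]
affinity = {("vacation", "april in paris"): 0.047, ("vacation", "april"): 0.029,
            ("vacation", "paris"): 0.041, ("april", "paris"): 0.005}
print(best_segmentation(words, terms, affinity))  # e.g. ['vacation', 'april in paris']
```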

Type Detection

bull Pairwise Model
bull Find the best typed-term for each term so that the Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight

[Figure: candidate typed-terms for the terms watch (watch[v], watch[e], watch[c]), free (free[adj], free[v]) and movie (movie[c], movie[e]); the best typed-term combination is chosen via the maximum spanning tree.]

Concept Labeling

bull Entity disambiguation is the most important task of concept labeling
bull Filter / re-rank the original concept cluster vector
bull Weighted-Vote: the final score of each concept cluster is a combination of its original score and the support from context, using concept co-occurrence (see the sketch below)
bull Example: "watch harry potter" → movie; "read harry potter" → book
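A minimal weighted-vote sketch, assuming a toy co-occurrence table and a hypothetical mixing weight alpha; all names and scores are illustrative.

```python
def weighted_vote(concept_scores, context_concepts, cooccur, alpha=0.5):
    """concept_scores: {concept: original score} for the ambiguous entity.
    context_concepts: concepts/roles suggested by the other terms in the short text.
    cooccur: {(c1, c2): co-occurrence strength}."""
    reranked = {}
    for c, s in concept_scores.items():
        support = sum(cooccur.get((c, ctx), cooccur.get((ctx, c), 0.0))
                      for ctx in context_concepts)
        reranked[c] = alpha * s + (1 - alpha) * support
    return sorted(reranked.items(), key=lambda kv: -kv[1])

harry_potter = {"movie": 0.4, "book": 0.4, "character": 0.2}
cooccur = {("movie", "verb:watch"): 0.9, ("book", "verb:read"): 0.9}
print(weighted_vote(harry_potter, ["verb:watch"], cooccur))  # "movie" is ranked first
```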

Example of Entity Disambiguation[Hua et al 2015 (ICDE Best Paper) Hua et al 2016]

[Pipeline figure] The short text is parsed and conceptualized against the semantic (isA) network and the concept co-occurrence network (term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, concept orthogonalization), yielding a concept vector (c1, p1), (c2, p2), (c3, p3), …

[Example] For "ipad apple": the isA network gives apple the concepts fruit, company, food, product, … and ipad the concepts product, device, …; co-occurrence-based filtering keeps the readings compatible with ipad, so apple resolves to product, brand, company and device rather than fruit or food.

Mining Lexical Relationships[Wang et al 2015b]

bull Lexical knowledge represented by the probabilities

[Figure] For the short text "watch harry potter": watch can act as a verb or an instance, and harry potter has concepts such as movie, book, product. The lexical knowledge is captured by three probabilities:

p(z | t): the role of a term, e.g. p(verb | watch), p(instance | watch)
p(c | e) = p(c | t, z = instance): the concept of an instance, e.g. p(movie | harry potter), p(book | harry potter)
p(c | t, z): the concept given a term and its role, e.g. p(movie | watch, verb)

(e: instance, t: term, c: concept, z: role)

Understanding Queries [Wang et al 2015b]

bull Goal: to rank the concepts and find arg max_c p(c | t, q)

[Figure] The query and all its possible segmentations are matched against the offline semantic network, and concepts are ranked by a random walk with restart [Sun et al 2005] on the online subgraph.
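A minimal random-walk-with-restart sketch over a toy term/concept subgraph; the adjacency weights and node names are assumptions for illustration, not the actual semantic network used in [Wang et al 2015b].

```python
import numpy as np

def rwr(adj, restart_idx, alpha=0.15, iters=100):
    """adj: (n, n) nonnegative edge weights; restart_idx: indices of the query terms."""
    n = adj.shape[0]
    col_sums = adj.sum(axis=0, keepdims=True)
    P = adj / np.where(col_sums == 0, 1, col_sums)   # column-stochastic transition matrix
    r = np.zeros(n)
    r[restart_idx] = 1.0 / len(restart_idx)          # restart distribution on query terms
    p = r.copy()
    for _ in range(iters):
        p = (1 - alpha) * P @ p + alpha * r
    return p

nodes = ["watch", "harry potter", "movie", "book"]
adj = np.array([[0, 0, 0.9, 0.1],
                [0, 0, 0.5, 0.5],
                [0.9, 0.5, 0, 0],
                [0.1, 0.5, 0, 0]], dtype=float)
scores = rwr(adj, restart_idx=[0, 1])
print(dict(zip(nodes, scores.round(3))))   # "movie" scores above "book"
```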

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

bull Example: "popular smart cover iphone 5s"

bull Definition:
bull Head: acts to name the general (semantic) category to which the whole short text belongs; usually the head is the intent of the short text

bull "smart cover": intent of the query

bull Constraints: distinguish this member from other members of the same category

bull "iphone 5s": limits the type of the head

bull Non-Constraint Modifiers (aka Pure Modifiers) are subjective modifiers which can be dropped without changing intent

bull "popular": subjective, can be neglected

Non-Constraint Modifiers Mining Construct Modifier Networks

Edges form a Modifier Network

Concept Hierarchy Tree in the "Country" domain vs. Modifier Network in the "Country" domain; in this case "Large" and "Top" are pure modifiers.

[Figure] The concept hierarchy contains nodes such as Country, Asian country, Developed country, Western country, Western developed country, Large Asian country, Large developed country, Top developed country and Top western country; in the derived modifier network the nodes are Country and the modifiers Asian, Western, Developed, Large and Top, of which "Large" and "Top" turn out to be pure modifiers.

bull Betweenness centrality is a measure of a node's centrality in a network

bull Betweenness of node v is defined as g(v) = Σ_{s ≠ v ≠ t} σ_st(v) / σ_st

bull where σ_st is the total number of shortest paths from node s to node t and σ_st(v) is the number of those paths that pass through v

bull Normalization & Aggregation

bull A pure modifier should have a low betweenness-centrality aggregation score PMS(t); a sketch follows after the slide title below

Non-Constraint Modifiers Mining Betweenness centrality
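As a sketch of the idea, the snippet below scores terms of a toy modifier network by normalized betweenness centrality with networkx; the edge list is illustrative, and PMS is taken here to be just the betweenness score (lower means more likely a pure modifier).

```python
import networkx as nx

# Toy modifier network: constraint modifiers interconnect through "country",
# while pure modifiers ("large", "top") hang off loosely.
g = nx.Graph()
g.add_edges_from([
    ("country", "asian"), ("country", "western"), ("country", "developed"),
    ("asian", "western"), ("western", "developed"),
    ("country", "large"), ("country", "top"),
])
pms = nx.betweenness_centrality(g, normalized=True)
for term, score in sorted(pms.items(), key=lambda kv: kv[1]):
    print(f"{term:10s} {score:.3f}")   # "large" and "top" get the lowest scores
```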

Head-Constraints Mining [Wang et al 2014b]

bull A term can be a head sometimes and be a constraint in some other cases

bull E.g., "Seattle hotel" (Seattle = constraint, hotel = head) vs. "Seattle hotel job" (Seattle, hotel = constraints, job = head)

Head-Constraints Mining Acquiring Concept Patterns

[Pipeline figure] Building the concept pattern dictionary from query logs:
1. Extract patterns such as "A for B", "A of B", "A with B", "A in B", "A on B", "A at B", … from queries like "cover for iphone 6s", "battery for sony a7r", "wicked on broadway"
2. Get entity pairs (entity 1 = head, entity 2 = constraint) for each preposition from the query log
3. Conceptualize both entities: entity 1 → concept11, concept12, concept13, concept14, …; entity 2 → concept21, concept22, concept23, …
4. Emit concept patterns (concept11, concept21), (concept11, concept22), (concept11, concept23), … into the Concept Pattern Dictionary
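A minimal sketch of the dictionary-building step, with a hypothetical conceptualize() stand-in for a Probase-style isA lookup and a toy query log; names and concepts are illustrative only.

```python
import re
from collections import Counter

def conceptualize(entity):
    """Stand-in for an isA lookup; returns toy concepts."""
    toy_isa = {
        "cover": ["accessory"], "iphone 6s": ["phone", "device"],
        "battery": ["accessory"], "sony a7r": ["camera", "device"],
    }
    return toy_isa.get(entity, [])

pattern = re.compile(r"^(?P<head>.+?) (?:for|of|with|in|on|at) (?P<constraint>.+)$")
queries = ["cover for iphone 6s", "battery for sony a7r"]

concept_patterns = Counter()
for q in queries:
    m = pattern.match(q)
    if not m:
        continue
    for ch in conceptualize(m.group("head")):
        for cc in conceptualize(m.group("constraint")):
            concept_patterns[(ch, cc)] += 1   # (head concept, constraint concept)

print(concept_patterns.most_common(3))
# e.g. [(('accessory', 'device'), 2), (('accessory', 'phone'), 1), (('accessory', 'camera'), 1)]
```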

Why Concepts Can't Be Too General
bull It may cause too many concept pattern conflicts: we can't distinguish head and modifier for general concept pairs

Derived concept pattern (Head / Modifier): device / company; supporting entity pairs: iphone 4 verizon, modem comcast, wireless router comcast, iphone 4 tmobile
Derived concept pattern (Head / Modifier): company / device; supporting entity pairs: amazon books kindle, netflix touchpad, skype windows phone, netflix ps3
→ Conflict: both (device, company) and (company, device) are well supported

Why Concepts Can't Be Too Specific
bull It may generate concepts with little coverage
bull Concept regresses to entity
bull Large storage space: up to (million × million) patterns

Examples of overly specific patterns: device / largest desktop OS vendor, device / largest software development company, device / largest global corporation, device / latest windows and office provider, …

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept Patterns (columns: cluster size, sum of cluster score, head/constraint, score). The highest-ranked head/constraint patterns include: breed/state, game/platform, accessory/vehicle, browser/platform, requirement/school, drug/disease, cosmetic/skin condition, job/city, accessory/phone, software/platform, test/disease, clothes/breed, penalty/crime, tax/state, sauce/meat, credit card/country, food/holiday, mod/game, garment/sport, career information/professional, song/instrument, bait/fish, study guide/book, plugins/browser, recipe/meat, currency/country, lens/camera, decoration/holiday, food/animal.

[Example] The game/platform cluster groups similar patterns such as game/platform, game/device, video game/platform, game console/game pad and game/gaming platform. With Game as head and Platform as modifier, it covers queries like "angry birds android", "angry birds ios", "angry birds windows 10", …

Head Modifier Relationship Detection

bull Train a classifier on (head-embedding, modifier-embedding) pairs

bull Training data:
bull Positive: (head, modifier)
bull Negative: (modifier, head)

bull Precision >= 0.9, Recall >= 0.9

bull Disadvantage: not interpretable (a sketch of such a classifier follows below)
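A toy sketch of such a classifier on concatenated embeddings, using random vectors as stand-ins for real word embeddings and scikit-learn logistic regression; the training pairs are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Random stand-ins for word embeddings; real systems would use word2vec/GloVe vectors.
emb = {w: rng.normal(size=50) for w in ["game", "android", "cover", "iphone", "job", "seattle"]}

positive = [("game", "android"), ("cover", "iphone"), ("job", "seattle")]   # (head, modifier)
negative = [(m, h) for h, m in positive]                                    # reversed order

X = np.array([np.concatenate([emb[a], emb[b]]) for a, b in positive + negative])
y = np.array([1] * len(positive) + [0] * len(negative))

clf = LogisticRegression(max_iter=1000).fit(X, y)
test = np.concatenate([emb["game"], emb["android"]]).reshape(1, -1)
print(clf.predict(test))   # 1 -> the first term is predicted to be the head
```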

Syntactic Parsing based on HM

bull Head/modifier information is incomplete:
bull Prepositions and other function words are not covered

bull Neither are relations within a noun compound, e.g. "el capitan macbook pro"

bull Why not train a parser for web queries

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges: Short Texts Lack Grammatical Signals
bull Lack of function words and word order

bull "toys queries" has ambiguous intent

bull "distance earth moon" has clear intent, but many equivalent forms: "earth moon distance", "earth distance moon", …

Challenges Syntactic Parsing of Queries

bullNo standard

bullNo ground-truth

Why is syntactic parsing of queries even a legitimate problem

Derive Syntax from Semantics[Sun et al 2016]

bull Query: "thai food houston"

bull Clicked sentence

bull Project dependency to the query

A Treebank for Short Texts

bull Given query q

bull Given q's clicked sentence s

bull Parse each s

bull Project dependency from s to q

bull Aggregate dependencies

Algorithm of Projection
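A minimal sketch of one plausible projection step (not the exact algorithm of [Sun et al 2016]): keep arcs between words that occur in the query, and re-attach a query word whose head is missing to its nearest ancestor that does occur in the query. Token indices and the example parse are toy assumptions.

```python
def project(sentence_heads, sentence_tokens, query_tokens):
    """sentence_heads[i] = index of the head of token i (-1 for the root)."""
    in_query = [t in query_tokens for t in sentence_tokens]
    arcs = {}
    for i, h in enumerate(sentence_heads):
        if not in_query[i]:
            continue
        while h != -1 and not in_query[h]:
            h = sentence_heads[h]          # climb to the nearest query-word ancestor
        arcs[sentence_tokens[i]] = sentence_tokens[h] if h != -1 else "ROOT"
    return arcs

# Toy clicked sentence "the best thai food in houston":
# "food" heads "the", "best", "thai" and "in"; "houston" attaches to "in".
tokens = ["the", "best", "thai", "food", "in", "houston"]
heads  = [3, 3, 3, -1, 3, 4]
print(project(heads, tokens, query_tokens={"thai", "food", "houston"}))
# -> {'thai': 'food', 'food': 'ROOT', 'houston': 'food'}
```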

Result Examples

Results

bull Random queries

QueryParser: UAS 0.83, LAS 0.75; Stanford: UAS 0.72, LAS 0.64

bull Queries with no function words

QueryParser: UAS 0.82, LAS 0.73; Stanford: UAS 0.70, LAS 0.61

bull Queries with function words

QueryParser: UAS 0.90, LAS 0.85; Stanford: UAS 0.86, LAS 0.80

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

bull Measuring similarity between two short texts or sentences

bull Basic idea: word-by-word comparison using embedding vectors

bull Use a saliency-weighted semantic graph to compute similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired

Bins of all edges Bins of max edges

Similarity measurement (inspired by BM25), where s_l and s_s are the two short texts, w ranges over the terms of s_l, sem(w, s_s) is the semantic similarity between term w and s_s, and avgl is the average text length:

f_sts(s_l, s_s) = Σ_{w ∈ s_l} IDF(w) · sem(w, s_s) · (k_1 + 1) / ( sem(w, s_s) + k_1 · (1 − b + b · |s_s| / avgl) )
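A sketch of this measure with toy IDF values and random word vectors; sem(w, s_s) is taken here to be the maximum cosine similarity between w and the words of s_s, which is one plausible instantiation rather than the paper's exact definition.

```python
import numpy as np

def sem(w, text, vec):
    """Max cosine similarity between word w and any word of `text`."""
    def cos(a, b): return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(cos(vec[w], vec[t]) for t in text)

def f_sts(s_l, s_s, vec, idf, k1=1.2, b=0.75, avgl=5.0):
    score = 0.0
    for w in s_l:
        s = sem(w, s_s, vec)
        score += idf.get(w, 1.0) * s * (k1 + 1) / (s + k1 * (1 - b + b * len(s_s) / avgl))
    return score

rng = np.random.default_rng(1)
vocab = ["cheap", "flights", "low", "cost", "airline"]
vec = {w: rng.normal(size=20) for w in vocab}   # stand-ins for trained word embeddings
print(f_sts(["cheap", "flights"], ["low", "cost", "airline"], vec,
            idf={"cheap": 1.5, "flights": 2.0}))
```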

From the Concept View

From the Concept View [Wang et al 2015a]

[Pipeline figure] Both short texts are conceptualized (parsing, term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, concept orthogonalization) against the semantic network and the co-occurrence network into bags of concepts: Concept Vector 1 = [(c1, score1), (c2, score2), …] and Concept Vector 2 = [(c1', score1'), (c2', score2'), …]; the similarity of the two short texts is then computed between the two concept vectors.
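A minimal sketch of the final step, computing cosine similarity between two toy concept vectors; the concepts and scores are illustrative only.

```python
import math

def concept_cosine(v1, v2):
    """v1, v2: dicts mapping concept -> score."""
    common = set(v1) & set(v2)
    dot = sum(v1[c] * v2[c] for c in common)
    n1 = math.sqrt(sum(s * s for s in v1.values()))
    n2 = math.sqrt(sum(s * s for s in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

watch_harry_potter = {"movie": 0.7, "film character": 0.2, "book": 0.1}
read_harry_potter  = {"book": 0.6, "novel": 0.3, "movie": 0.1}
print(round(concept_cosine(watch_harry_potter, read_harry_potter), 3))
```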

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

bull Explicit short text understanding benefits a lot of application scenarios:
bull Ads/search semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

[Charts: results by query decile (Decile 4 through Decile 10), shown separately for Mainline Ads (y-axis 0.00 to 6.00) and Sidebar Ads (y-axis 0.00 to 0.60).]

Definition Mining [Hao et al 2016]

bull Definition scenarios: search engines, QnA, etc.

bull Why conceptualization is useful for definition mining
bull Example: "What is Emphysema?"

Answer 1: Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Answer 2: Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

bull This sentence has the form of a definition; embedding is helpful to some extent, but it also returns a high similarity score for (emphysema, disease) and (emphysema, smoking)

bull Conceptualization can provide strong semantics; contextual embedding can also provide semantic similarity beyond Is-A

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

[System figure] Offline: training data and concept weighting feed model learning, producing a concept model per class (Class 1 … Class i … Class N → Model 1 … Model i … Model N). Online: an original short text such as "justin bieber graduates" goes through entity extraction, conceptualization against the knowledge base, candidates generation, and classification & ranking over the concept vector, producing outputs such as <Music, Score>.

Concept based Short Text Classification and Ranking [Wang et al 2014a]

[Figure] Each category (e.g. TV) is mapped into the concept space using article titles/tags in that category, giving concept weights p_i, p_j.

Concept based Short Text Classification and Ranking [Wang et al 2014a]

[Figure] Category models (e.g. TV, Music, Movie, …) are represented in the concept space with weights ω_i, ω_j.

Concept based Short Text Classification and Ranking [Wang et al 2014a]

[Figure] An incoming query is also mapped into the concept space (p_i, p_j) and compared against the category models (ω_i, ω_j) for classification and ranking.

Precision performance on each category [Wang et al 2014a]

Precision per category (columns: BocSTC, LM_ch, SVM, VSM_cosine, LM_d, Entity_ESA):
Movie: 0.71, 0.91, 0.84, 0.81, 0.72, 0.56
Money: 0.97, 0.95, 0.54, 0.57, 0.52, 0.74
Music: 0.97, 0.90, 0.88, 0.73, 0.68, 0.58
TV: 0.96, 0.46, 0.92, 0.56, 0.51, 0.55
[Bar chart omitted; y-axis: Precision, 0.3 to 1.0]

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamie Taylor Freebase: a collaboratively created graph database for structuring human knowledge in SIGMOD 2008

bull [ Auer et al 2007 ] Sören Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia: A Nucleus for a Web of Open Data In ISWC/ASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C. Ré Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Börschinger Cäcilia Zirn and Anas Elghafari WikiNet: A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Catherine Havasi ConceptNet 5: A large semantic network for relational knowledge In The People's Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou "Understand Short Texts by Harvesting and Analyzing Semantic Knowledge" IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and Xindong Wu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10): 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny Boyes-Braem Basic objects in natural categories Cognitive psychology 8(3): 382–439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ] Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and Wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddings In CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny Zhu Toward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 65: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Sense Changes with Different Context

watch harry potter read harry potter age harry potter

Movie Book Character

harry potter walkthrough

Game

Entity Recognition in Query [Guo et al 2009]

bull Motivation

Detect named entity in a short text and categorize it

harry potter walkthrough

Single-named-entity query

Example

(ldquoharry potterrdquo ldquo walkthroughrdquo ldquogamerdquo)

triple lte t cgt

class of entity

context terms

ambiguous term

contextterm class

Entity Recognition in Query

bull Probabilistic Generative Model

Goal Given a query q find triple lte t cgt maximize the probability

Probability to generate triple

assume context only depends on class

Objective given query q find

The problem then becomes how to estimate Pr(e) Pr(c|e) and Pr(t|c)

Eg ldquowalkthroughrdquo only depends on game instead of happy potter

Entity Recognition in Query

bull Probability Estimation by Learning

learning objective

N

1i

iii )ctP(emax

Challenge difficult as well as time consuming to manually assign class labels to named entities in queries

Build training set 119879 = (119890119894 119905119894) view 119888119894 as a hidden variable

New Learning problem

N

1i

ii

N

1i

i

N

1i

ii c)|)P(te|P(c)P(emax)tP(emax c

solved with topic model WS-LDA

Signal from Click [Pantel et al 2012]

bull Motivation

Predict entity type in Web search

entity

user intent

context

click

Query type distribution (73 types)

Generative model

entity type

T

TK

K2

Signal from Click

bull Joint Model for Prediction

t

τ

i

n c

θ

φ

ω

Q

Distribution over types

Intent distribution

Pick type

Pick entity

Pick intent

Pick click

Word distribution

Host distribution

Entity distribution

For each Query

Pick context words

Telegraphic Query interpretation [Sawant et al 2013 Joshi et al 2014]

bull Entity-seeking Telegraphic Queries

bull Interpretation = Segmentation + Annotation

Knowledge base Large corpus

accuracy recall

Germany capital

Berlin

Query

Result Entity

bull Overview

Joint Interpretation and Ranking [Sawant et al 2013 Joshi et al 2014]

Annotated Corpus

Telegraphic Query

e1e2e3

Two Models for Interpretation and Ranking

Generative Model

Discriminative Model

Output

bull Generative Model

Joint Interpretation and Ranking [Sawant et al 2013]

San Diego Padres

Major league baseball team

type context

E

T Padres have been to two World

Series losing in 1984 and 1998

Type hint

baseball team

losing team baseball world series 1998

Z

Context matchers

lost 1998 world seriesswitch

model model

q losing team baseball world series 1998

Borrow from U Sawant (2013)

Based on Probabilistic Language Models

bull Discriminative Model

Joint Interpretation and Ranking [Sawant et al 2013]

San_Diego_Padres

losing team baseball world

series 1998

(baseball team)

losing team baseball world

series 1998

(baseball team)

losing team baseball world

series 1998

(t = baseball team)

1998_World_Series

losing team baseball world

series 1998

(series)

losing team baseball world

series 1998

(series)

losing team baseball world

series 1998

(t = series)

Correct entity Incorrect entity

Based on max-margin discriminative learning

bull Queries seek answer entities (e2)

bull Contain (query) entities (e1) target types (t2) relations (r) and selectors (s)

Telegraphic Query Interpretation [Joshi et al 2014]

query e1 r t2 s

dave navarro first band

dave navarro band band first

dave navarro - band first

spider automobile company

spider automobile company

automobile company

-

automobile company company spider

Borrow from M Joshi (2014)

Improved Generative Model

bull Generative Model[Sawant et al 2013]

[Joshi et al 2014]Consider e1

(in q) and r

Improved Discriminative Model

bull Discriminative Model[Sawant et al 2013]

[Joshi et al 2014]

Consider e1

(in q) and r

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

bull Input a short text

bull Output semantic interpretation

bull Three steps in understanding a short text

wanna watch eagles band

watch[verb] eagles[entity](band) band[concept]

wanna watch eagles band watch[verb] eagles[entity](band) band[concept]

watch eagles band watch[verb] eagles[entity] band[concept]

Step 1 Text Segmentation ndash divide into a sequence of terms in vocabulary

Step 2 Type detection ndash determine the best type of each term

Step 3 Concept Labeling ndash infer the best concept of each entity within context

Text segmentationbull Observations

bull Mutual Exclusion ndash terms containing the same word mutually exclude each other

bull Mutual Reinforcement ndash related terms mutually reinforce each other

bull Build a Candidate Term Graph (CTG)

ldquovacation april in parisrdquo ldquowatch harry potterrdquo

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Find best segmentation

bull Best segmentation= sub-graph in CTG whichbull Is a complete graph (clique)

bull No mutual exclusion

bull Has 100 word coveragebull Except for stopwords

bull Has the largest average edge weight

Is a segmentation

Best segmentation

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Find best segmentation

bull Best segmentation= sub-graph in CTG whichbull Is a complete graph (clique)

bull No mutual exclusion

bull Has 100 word coveragebull Except for stopwords

bull Has the largest average edge weight

Maximal Clique

Best segmentation

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Type Detection

bull Pairwise Modelbull Find the best typed-term for each term so that the

Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight

watch[v]

watch[e]

watch[c]

watch

free[adj]

free[v]

movie[c]

movie[e]

free

movie

Concept Labeling

bull Entity disambiguation is the most important task of concept labelingbull Filterre-rank of the original concept cluster vector

bull Weighted-Votebull The final score of each concept cluster is a combination

of its original score and the support from context using concept co-occurrence

watch harry potter read harry potter

movie book

Example of Entity Disambiguation[Hua et al 2015 (ICDE Best Paper) Hua et al 2016]

Co-occurrence network

Concept Vector

Semantic network

Short Text

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

c1 p1

c2 p2

c3 p3

hellip

ipad apple

fruithellip

companyhellip

foodhellip

producthellip

Is-A

filtering

ldquoipad applerdquo

producthellip

devicehellip

producthellip

brandhellip

companyhellip

devicehellip

co-occur

Is-A

Mining Lexical Relationships[Wang et al 2015b]

bull Lexical knowledge represented by the probabilities

verb

product book

movie

watch harry potter

119901 119907119890119903119887 119908119886119905119888ℎ

119901 119894119899119904119905119886119899119888119890 119908119886119905119888ℎ

119901 119898119900119907119894119890 ℎ119886119903119903119910 119901119900119905119905119890119903

119901 119898119900119907119894119890 119908119886119905119888ℎ 119907119890119903119887

119901 119887119900119900119896 ℎ119886119903119903119910 119901119900119905119905119890119903

119901 119888 119905 119911

119901 119888 119890 =119901 119888 119905 119911 = 119894119899119904119905119886119899119888119890

119901 119911 119905 ①②

③e instancet termc conceptz role

Understanding Queries [Wang et al 2015b]

bull Goal to rank the concepts and findarg max

119888119901(119888|119905 119902)

The offline semantic network

QueryAll possible

segmentations

Random walk with restart [Sun et al 2005]on the online subgraph

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

bull Example ldquopopular smart cover iphone 5srdquo

bull Definition bull Head acts to name the general (semantic) category to which the

whole short text belongs Usually the head is the intent of the short text

bull ldquosmart coverrdquo intent of the query

bull Constraints distinguish this member from other members of the same category

bull ldquoiphone 5srdquo limit the type of the head

bull Non-Constraint Modifiers (aka Pure Modifiers) are subjectivemodifiers which can be dropped without changing intent

bull ldquopopularrdquo subjective can be neglected

Non-Constraint Modifiers Mining Construct Modifier Networks

Edges form a Modifier Network

Concept Hierarchy Tree in ldquoCountryrdquo domain

Modifier Network in ldquoCountryrdquo domainIn this case ldquoLargerdquo and ldquoToprdquo are pure modifiers

Country

Asian country

Developed country

Western country

Asian Developed Western

Western developed

country

Top western country

Large

Large Top

Top

WesternLarge Asian

country

Large developed

country

Top developed

country

Country

Asian Western

Developed

Large Top

bull Betweenness centrality is a measure of a nodes centrality in a network

bull Betweennes of node v is defined as

bull where 120590119904119905 is the total number of shortest paths from node s to node t and 120590119904119905(119907) is the number of those paths that pass through v

bull Normalization amp Aggregation

bull For a pure modifier it should have low betweenness centrality aggregation score PMS(t)

Non-Constraint Modifiers Mining Betweenness centrality

Head-Constraints Mining [Wang et al 2014b]

bull A term can be a head sometimes and be a constraint in some other cases

bull Eg Seattle hotel Seattle hotel job

head headconstraintconstraintconstraint

Head-Constraints Mining Acquiring Concept Patterns

Get entity pairs from query log

Conceptualization

Concept Patterns for each

prepositionsentity1 entity2

Extract Patterns

A for B A of BA with B A in BA on B A at B hellip

entity 1head entity 2constraint

concept11

concept12

concept13

concept14

concept21

concept22

concept23

(concept11 concept21) (concept11 concept22)(concept11 concept23)hellip

Concept Pattern Dictionary

Building concept pattern dictionary Query Logs

cover for iphone 6sbattery for sony a7rwicked on broadway

Why Concepts Canrsquot Be Too Generalbull It may cause too many concept pattern conflicts

canrsquot distinguish head and modifier for general concept pairs

Head Modifier

Derived Concept Pattern device company

Supporting Entity Pairs iphone 4 verizon

modem comcast

wireless router comcast

iphone 4 tmobile

Head Modifier

Derived Concept Pattern company device

Supporting Entity Pairs amazon books kindle

netflix touchpad

skype windows phone

netflix ps3

Conflict

Why Concepts Canrsquot Be Too Specificbull It may generate concepts with little coverage

bull Concept regresses to entitybull Large storage space up to (million million) patterns

hellip hellip

device largest desktop OS vendor

device largest software development company

device largest global corporation

device latest windows and office provider

hellip hellip

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept PatternsCluster size Sum of Cluster Score headconstraintscore

615 2114691 breedstate357298460224501

296 7752357 gameplatform627403476771856

153 3466804 accessoryvehicle53393705094809

70 118259 browserplatform132612807637391

22 1010993 requirementschool271407526294823

34 9489159 drugdisease154602405333541

42 8992995 cosmeticskin condition814659415003929

16 7421599 jobcity27903732555528

32 710403 accessoryphone246513830851194

18 6692376 softwareplatform210126322725878

20 6444603 testdisease239774028397537

27 5994205 clothesbreed98773996282851

19 5913545 penaltycrime200544192793488

25 5848804 taxstate240081818612579

16 5465424 saucemeat183592863621553

18 4809389 credit cardcountry142919087972152

14 4730792 foodholiday14554140330924

11 4536199 modgame257163856882439

29 4350954 garmentsport471533326845442

23 3994886 career informationprofessional732726483731257

15 386065 songinstrument128189481818135

18 378213 baitfish780426514113169

22 3722948 study guidebook508339765053921

19 3408953 pluginsbrowser550326072627126

14 3305753 recipemeat882779863422951

18 3214226 currencycountry110825444188352

13 3180272 lenscamera186081673263957

9 316973 decorationholiday130055844126533

16 314875 foodanimal7338544366514

game platform

game device

video game platform

game console game pad

game gaming platform

Game (Head) Platform (Modifier)

angry birds android

angry birds ios

angry birds windows 10

hellip hellip

Detection

Head Modifier Relationship

bull Train a classifier on

(head-embedding modifier-embedding)

bull Training data bull Positive (head modifier)bull Negative (modifier head)

bull Precision gt= 09 Recall gt= 09

bull Disadvantage not interpretable

Syntactic Parsing based on HM

bull Information is incompletebull Preposition and other function words

bull Within a noun compound el capitan macbook pro

bull Why not train a parser for web queries

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges Short Texts Lack Grammatical Signalsbull Lack function words word order

bull ldquotoys queriesrdquo has ambiguous intent

bull ldquodistance earth moonrdquo has clear intentbull many equivalent forms ldquoearth moon distancerdquo ldquoearth

distance moonrdquo hellip

Challenges Syntactic Parsing of Queries

bullNo standard

bullNo ground-truth

Why is syntactic parsing of queries even a legitimate problem

Derive Syntax from Semantics[Sun et al 2016]

bull Query ldquothai food houstonrdquo

bull Clicked sentence

bull Project dependency to the query

A Treebank for Short Texts

bull Given query 119902

bull Given 119902rsquos clicked sentence 119904

bull Parse each 119904

bull Project dependency from 119904 to 119902

bull Aggregate dependencies

Algorithm of Projection

Result Examples

Results

bull Random queries

QueryParser UAS 083 LAS 075Stanford UAS 072 LAS 064

bull Queries with no function words

QueryParser UAS 082 LAS 073Stanford UAS 070 LAS 061

bull Queries with function words

QueryParser UAS 090 LAS 085Stanford UAS 086 LAS 080

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

bull Measuring similarity between two short texts and sentences

bull Basic idea word-by-word comparison using embedding vector

bull Use saliency-weighted semantic graph to computer similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired

Bins of all edges Bins of max edges

119908isin119904119868

119868119863119865(119908) sdot)119904119890119898(119908 119904119904) sdot (1198961 + 1

൰119904119890119898(119908 119904119904) + 1198961 sdot (1 minus 119887 + 119887 sdot|119904119904|119886119907119892119897

Similarity measurement

termShort texts

Inspired by BM25

Semantic similarity

119891119904119905119904(119904119897 119904119904) =

From the Concept View

From the Concept View [Wang et al 2015a]

Co-occurrence Network

Bags of Concepts

Semantic Network

Short Text 1

Short Text 2

Concept Vector 1[(c1 score1) (c2 score2)hellip]

Concept Vector 2[(c1rsquo score1rsquo) (c2rsquo score2rsquo)hellip]

Similarity

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

bull Explicit short text understanding benefit lot of application scenariosbull Adssearch semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

000

100

200

300

400

500

600

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

000

010

020

030

040

050

060

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

Mainline Ads Sidebar Ads

Definition Mining [Hao et al 2016]

bull Definition scenarios search engines QnA etc

bull Why Conceptualization is useful for definition miningbull Examples ldquoWhat is Emphysemardquo

Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

bull This sentence has the form of definitionbull Embedding is helpful to some extent but it also return high similarity

score for (emphysema disease) and (emphysema smoking)

bull Conceptualization can provide strong semanticsbull Contextual embedding can also provide semantic similarity beyond Is-A

Answer 1

Answer 2

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

OfflineOffline OnlineOnline

Original Short textjustin bieber graduates

hellip

Knowledge base

Conceptualiztion

Concept Vector

Entity Extraction

Candidates Generation

Classification amp Ranking

Model LearningModel Learning

Concept Weighting

Model Model NModel i

Concept Model Concept Model

Class 1 Class NClass i

TrainingData

ltMusic Scoregt

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept SpaceArticle titlestagsin this category

119901119894

119901119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

hellip

hellip

120596119894

120596119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

120596119894

120596119895119901119894

119901119895

Precision performance on each category [Wang et al 2014a]

BocSTC LM_ch SVMVSM_cosi

neLM_d Entity_ESA

Movie 071 091 084 081 072 056

Money 097 095 054 057 052 074

Music 097 090 088 073 068 058

TV 096 046 092 056 051 055

0304050607080910

Pre

cisi

on

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledgeltigt in SIGMOD 2008

bull [ Auer et al 2007 ] Soumlren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Boumlrschinger Caumlcilia Zirn and AnasElghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and XindongWu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny BoyesBraem Basic objects in natural categories Cognitive psychology 8(3)382ndash439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny ZhuToward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 66: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Sense Changes with Different Context

watch harry potter read harry potter age harry potter

Movie Book Character

harry potter walkthrough

Game

Entity Recognition in Query [Guo et al 2009]

bull Motivation

Detect named entity in a short text and categorize it

harry potter walkthrough

Single-named-entity query

Example

(ldquoharry potterrdquo ldquo walkthroughrdquo ldquogamerdquo)

triple lte t cgt

class of entity

context terms

ambiguous term

contextterm class

Entity Recognition in Query

bull Probabilistic Generative Model

Goal Given a query q find triple lte t cgt maximize the probability

Probability to generate triple

assume context only depends on class

Objective given query q find

The problem then becomes how to estimate Pr(e) Pr(c|e) and Pr(t|c)

Eg ldquowalkthroughrdquo only depends on game instead of happy potter

Entity Recognition in Query

bull Probability Estimation by Learning

learning objective

N

1i

iii )ctP(emax

Challenge difficult as well as time consuming to manually assign class labels to named entities in queries

Build training set 119879 = (119890119894 119905119894) view 119888119894 as a hidden variable

New Learning problem

N

1i

ii

N

1i

i

N

1i

ii c)|)P(te|P(c)P(emax)tP(emax c

solved with topic model WS-LDA

Signal from Click [Pantel et al 2012]

bull Motivation

Predict entity type in Web search

entity

user intent

context

click

Query type distribution (73 types)

Generative model

entity type

T

TK

K2

Signal from Click

bull Joint Model for Prediction

t

τ

i

n c

θ

φ

ω

Q

Distribution over types

Intent distribution

Pick type

Pick entity

Pick intent

Pick click

Word distribution

Host distribution

Entity distribution

For each Query

Pick context words

Telegraphic Query interpretation [Sawant et al 2013 Joshi et al 2014]

bull Entity-seeking Telegraphic Queries

bull Interpretation = Segmentation + Annotation

Knowledge base Large corpus

accuracy recall

Germany capital

Berlin

Query

Result Entity

bull Overview

Joint Interpretation and Ranking [Sawant et al 2013 Joshi et al 2014]

Annotated Corpus

Telegraphic Query

e1e2e3

Two Models for Interpretation and Ranking

Generative Model

Discriminative Model

Output

bull Generative Model

Joint Interpretation and Ranking [Sawant et al 2013]

San Diego Padres

Major league baseball team

type context

E

T Padres have been to two World

Series losing in 1984 and 1998

Type hint

baseball team

losing team baseball world series 1998

Z

Context matchers

lost 1998 world seriesswitch

model model

q losing team baseball world series 1998

Borrow from U Sawant (2013)

Based on Probabilistic Language Models

bull Discriminative Model

Joint Interpretation and Ranking [Sawant et al 2013]

San_Diego_Padres

losing team baseball world

series 1998

(baseball team)

losing team baseball world

series 1998

(baseball team)

losing team baseball world

series 1998

(t = baseball team)

1998_World_Series

losing team baseball world

series 1998

(series)

losing team baseball world

series 1998

(series)

losing team baseball world

series 1998

(t = series)

Correct entity Incorrect entity

Based on max-margin discriminative learning

bull Queries seek answer entities (e2)

bull Contain (query) entities (e1) target types (t2) relations (r) and selectors (s)

Telegraphic Query Interpretation [Joshi et al 2014]

query e1 r t2 s

dave navarro first band

dave navarro band band first

dave navarro - band first

spider automobile company

spider automobile company

automobile company

-

automobile company company spider

Borrow from M Joshi (2014)

Improved Generative Model
• The generative model of [Sawant et al 2013] is extended in [Joshi et al 2014] to also consider e1 (in q) and r.
Improved Discriminative Model
• The discriminative model of [Sawant et al 2013] is likewise extended in [Joshi et al 2014] to consider e1 (in q) and r.

Understanding Short Texts with a Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]
• Input: a short text.
• Output: a semantic interpretation.
• Three steps in understanding a short text, e.g. "wanna watch eagles band" → watch[verb] eagles[entity](band) band[concept]:
  • Step 1, Text Segmentation – divide the text into a sequence of terms in the vocabulary ("watch | eagles | band").
  • Step 2, Type Detection – determine the best type of each term (watch[verb] eagles[entity] band[concept]).
  • Step 3, Concept Labeling – infer the best concept of each entity within its context (watch[verb] eagles[entity](band) band[concept]).

Text Segmentation
• Observations:
  • Mutual Exclusion – terms containing the same word mutually exclude each other.
  • Mutual Reinforcement – related terms mutually reinforce each other.
• Build a Candidate Term Graph (CTG), e.g. for "vacation april in paris" and "watch harry potter".

[Candidate Term Graphs: for "vacation april in paris" the candidate terms are vacation, april, paris, and april in paris; for "watch harry potter" they are watch, harry, potter, and harry potter. Edges carry co-occurrence-based weights (e.g. 0.029, 0.005, 0.047, 0.041, 0.014, 0.092, 0.053, 0.018); terms sharing a word are mutually exclusive.]

Find the Best Segmentation
• Best segmentation = the sub-graph of the CTG which:
  • is a complete graph (clique), i.e. contains no mutual exclusion;
  • has 100% word coverage (except for stopwords);
  • has the largest average edge weight.
• Any such clique is a segmentation; the best segmentation is the maximal clique with the largest average edge weight.

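A minimal sketch of the clique-based segmentation idea described above (not the authors' implementation); the candidate terms, edge weights, and stopword list are toy assumptions.

```python
from itertools import combinations

# Toy sketch of clique-based segmentation over a Candidate Term Graph.
# Nodes are candidate terms; an edge weight is a (toy) relatedness score;
# terms that share a word are mutually exclusive (no edge between them).
terms = ["vacation", "april", "paris", "april in paris"]
weights = {("vacation", "april in paris"): 0.047,
           ("vacation", "april"): 0.029,
           ("vacation", "paris"): 0.041,
           ("april", "paris"): 0.005}
stopwords = {"in"}
query_words = {"vacation", "april", "in", "paris"}

def words(term):
    return set(term.split())

def edge(a, b):
    return weights.get((a, b)) or weights.get((b, a))

def best_segmentation():
    best, best_score = None, -1.0
    for r in range(1, len(terms) + 1):
        for sub in combinations(terms, r):
            pairs = list(combinations(sub, 2))
            # clique: every pair must be connected (i.e. not mutually exclusive)
            if any(edge(a, b) is None for a, b in pairs):
                continue
            covered = set().union(*(words(t) for t in sub))
            if query_words - stopwords - covered:
                continue  # must cover all non-stopword words of the query
            score = (sum(edge(a, b) for a, b in pairs) / len(pairs)) if pairs else 0.0
            if score > best_score:
                best, best_score = sub, score
    return best, best_score

print(best_segmentation())  # -> (('vacation', 'april in paris'), 0.047)
```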

Type Detection
• Pairwise Model: find the best typed-term for each term so that the Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight.
[Example: for "watch free movie", the candidate typed-terms are watch[v], watch[e], watch[c]; free[adj], free[v]; movie[c], movie[e].]
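A minimal sketch of this pairwise model (the candidate types and pairwise affinity scores are made up, and networkx is assumed only as a convenient way to compute the Maximum Spanning Tree):

```python
from itertools import product
import networkx as nx

# Toy sketch: choose one typed-term per term so that the Maximum Spanning Tree
# over the chosen typed-terms has the largest total weight.
candidates = {"watch": ["watch[v]", "watch[e]", "watch[c]"],
              "free": ["free[adj]", "free[v]"],
              "movie": ["movie[c]", "movie[e]"]}
affinity = {("watch[v]", "movie[c]"): 0.9, ("watch[v]", "free[adj]"): 0.3,
            ("free[adj]", "movie[c]"): 0.7, ("watch[e]", "movie[e]"): 0.2,
            ("watch[c]", "movie[e]"): 0.1, ("free[v]", "movie[e]"): 0.1}

def score(assignment):
    g = nx.Graph()
    g.add_nodes_from(assignment)
    for a, b in product(assignment, assignment):
        w = affinity.get((a, b)) or affinity.get((b, a))
        if w:
            g.add_edge(a, b, weight=w)
    mst = nx.maximum_spanning_tree(g)          # maximum spanning forest if disconnected
    return sum(d["weight"] for _, _, d in mst.edges(data=True))

best = max(product(*candidates.values()), key=score)
print(best)  # -> ('watch[v]', 'free[adj]', 'movie[c]')
```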

Concept Labeling
• Entity disambiguation is the most important task of concept labeling: filter/re-rank the original concept cluster vector.
• Weighted Vote: the final score of each concept cluster combines its original score with the support from the context, using concept co-occurrence.
• Example: "watch harry potter" → movie; "read harry potter" → book.
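A minimal sketch of the weighted-vote idea on toy numbers (the concept scores, co-occurrence supports, and mixing weight alpha are all illustrative assumptions):

```python
# Toy sketch: re-rank an entity's concept clusters by combining the original
# conceptualization score with co-occurrence support from the context.
concept_scores = {"movie": 0.45, "book": 0.40, "character": 0.15}   # for "harry potter"
cooccurrence = {("watch", "movie"): 0.8, ("watch", "book"): 0.1,     # support from context
                ("read", "movie"): 0.2, ("read", "book"): 0.9}

def rerank(context_terms, scores, alpha=0.5):
    out = {}
    for concept, s in scores.items():
        support = sum(cooccurrence.get((t, concept), 0.0) for t in context_terms)
        out[concept] = alpha * s + (1 - alpha) * support
    return sorted(out.items(), key=lambda kv: -kv[1])

print(rerank(["watch"], concept_scores))  # "movie" comes out on top
print(rerank(["read"], concept_scores))   # "book" comes out on top
```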

Example of Entity Disambiguation [Hua et al 2015 (ICDE Best Paper), Hua et al 2016]
[Pipeline figure: a short text is parsed, terms are clustered by isA against the semantic network, concepts are filtered by co-occurrence against the co-occurrence network, and head/modifier analysis plus concept orthogonalization yield a concept vector (c1, p1), (c2, p2), (c3, p3), …. Example "ipad apple": the isA concepts of "apple" (fruit, company, food, product, …) are filtered by co-occurrence with the concepts of "ipad" (product, device, brand, …), so the company/device-like senses survive.]

Mining Lexical Relationships [Wang et al 2015b]
• Lexical knowledge is represented by probabilities, e.g. for "watch harry potter": $p(\text{verb} \mid \text{watch})$, $p(\text{instance} \mid \text{watch})$, $p(\text{movie} \mid \text{harry potter})$, $p(\text{book} \mid \text{harry potter})$, $p(\text{movie} \mid \text{watch}, \text{verb})$.
• Core probabilities: $p(z \mid t)$, $p(c \mid t, z)$, and $p(c \mid e) = p(c \mid t, z = \text{instance})$. Notation: e = instance, t = term, c = concept, z = role.

Understanding Queries [Wang et al 2015b]
• Goal: rank the concepts and find $\arg\max_c p(c \mid t, q)$.
• The query and all its possible segmentations induce an online subgraph of the offline semantic network.
• Concepts are ranked by random walk with restart [Sun et al 2005] on the online subgraph.
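A minimal sketch of random walk with restart on a small subgraph (the adjacency weights, node set, and restart probability are toy assumptions, not the paper's graph):

```python
import numpy as np

# Toy sketch: random walk with restart (RWR) on a small term/concept subgraph.
nodes = ["watch", "harry potter", "movie", "book", "verb"]
A = np.array([[0, 1, 1, 0, 1],     # made-up adjacency weights
              [1, 0, 1, 1, 0],
              [1, 1, 0, 0, 0],
              [0, 1, 0, 0, 0],
              [1, 0, 0, 0, 0]], dtype=float)
P = A / A.sum(axis=0, keepdims=True)       # column-stochastic transition matrix

def rwr(restart_idx, alpha=0.15, iters=100):
    e = np.zeros(len(nodes)); e[restart_idx] = 1.0   # restart vector
    r = e.copy()
    for _ in range(iters):
        r = (1 - alpha) * P @ r + alpha * e
    return dict(zip(nodes, r.round(3)))

# restart from the query term "harry potter" and read off the node scores
print(rwr(nodes.index("harry potter")))
```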

Short Text Understanding
• How to segment this short text?
• What does this short text mean (its intent, senses, or concepts)?
• What are the relations among terms in the short text?
• How to calculate the similarity between short texts?

Head, Modifier, and Constraint Detection in Short Texts [Wang et al 2014b]
• Example: "popular smart cover iphone 5s"
• Definitions:
  • Head: names the general (semantic) category to which the whole short text belongs; usually the head is the intent of the short text. Here "smart cover" is the intent of the query.
  • Constraints: distinguish this member from other members of the same category. Here "iphone 5s" limits the type of the head.
  • Non-Constraint Modifiers (a.k.a. Pure Modifiers): subjective modifiers which can be dropped without changing the intent. Here "popular" is subjective and can be neglected.

Non-Constraint Modifiers Mining: Construct Modifier Networks
• Edges form a Modifier Network.
• Concept hierarchy tree in the "Country" domain vs. the modifier network in the "Country" domain; in this case "Large" and "Top" are pure modifiers.

[Figure: the concept hierarchy tree under "Country" contains nodes such as Asian country, Developed country, Western country, Large Asian country, Western developed country, Top western country, Large developed country, and Top developed country; the corresponding modifier network links Country with the modifiers Asian, Western, Developed, Large, and Top.]

Non-Constraint Modifiers Mining: Betweenness Centrality
• Betweenness centrality is a measure of a node's centrality in a network.
• The betweenness of node v is defined as $g(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$, where $\sigma_{st}$ is the total number of shortest paths from node s to node t, and $\sigma_{st}(v)$ is the number of those paths that pass through v.
• Normalization & aggregation: a pure modifier should have a low betweenness-centrality aggregation score PMS(t).
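A minimal sketch of the betweenness-centrality computation on a toy modifier network (the edges and the way the score is aggregated into a pure-modifier ranking are illustrative assumptions; networkx is assumed for the centrality computation):

```python
import networkx as nx

# Toy modifier network for the "Country" domain (edges are made up for illustration).
g = nx.Graph()
g.add_edges_from([("Country", "Asian"), ("Country", "Western"), ("Country", "Developed"),
                  ("Asian", "Developed"), ("Western", "Developed"),
                  ("Large", "Country"), ("Top", "Country")])

# Normalized betweenness centrality for every node.
bc = nx.betweenness_centrality(g, normalized=True)

# Rank nodes by centrality: peripheral modifiers such as "Large" and "Top"
# come out with the lowest scores, i.e. they behave like pure modifiers.
pms = sorted(bc.items(), key=lambda kv: kv[1])
print(pms)
```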

Head-Constraints Mining [Wang et al 2014b]
• A term can be a head in some cases and a constraint in others.
• E.g., "Seattle hotel" vs. "Seattle hotel job": in "Seattle hotel", hotel is the head and Seattle is a constraint; in "Seattle hotel job", job is the head and both Seattle and hotel are constraints.

Head-Constraints Mining: Acquiring Concept Patterns
Building the concept pattern dictionary from query logs:
  1. Extract preposition patterns from queries: "A for B", "A of B", "A with B", "A in B", "A on B", "A at B", … (e.g. "cover for iphone 6s", "battery for sony a7r", "wicked on broadway").
  2. Get entity pairs from the query log: entity1 (head), entity2 (constraint).
  3. Conceptualize each entity: entity1 → {concept11, concept12, concept13, concept14}, entity2 → {concept21, concept22, concept23}.
  4. Generate concept patterns for each pair: (concept11, concept21), (concept11, concept22), (concept11, concept23), …
  5. Aggregate everything into a Concept Pattern Dictionary.
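A minimal sketch of steps 1–4 above on toy data (the regular expression, the tiny isA dictionary, and the queries are all illustrative assumptions standing in for a real query log and conceptualization service):

```python
import re
from collections import Counter

# Toy queries containing "A for B"-style preposition patterns.
queries = ["cover for iphone 6s", "battery for sony a7r", "case for galaxy s6"]
# Tiny isA dictionary standing in for a real conceptualization service.
isa = {"cover": ["accessory"], "battery": ["accessory"], "case": ["accessory"],
       "iphone 6s": ["phone", "device"], "sony a7r": ["camera", "device"],
       "galaxy s6": ["phone", "device"]}

pattern = re.compile(r"^(?P<head>.+?) (?:for|of|with|in|on|at) (?P<constraint>.+)$")

concept_patterns = Counter()
for q in queries:
    m = pattern.match(q)
    if not m:
        continue
    head, constraint = m.group("head"), m.group("constraint")
    for ch in isa.get(head, []):
        for cc in isa.get(constraint, []):
            concept_patterns[(ch, cc)] += 1   # (head concept, constraint concept)

print(concept_patterns.most_common())
# e.g. [(('accessory', 'device'), 3), (('accessory', 'phone'), 2), (('accessory', 'camera'), 1)]
```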

Why Concepts Can't Be Too General
• Too-general concepts cause too many concept pattern conflicts: we can't distinguish head from modifier for general concept pairs.
• Example conflict:
  • Derived concept pattern (head = device, modifier = company), supported by entity pairs such as (iphone 4, verizon), (modem, comcast), (wireless router, comcast), (iphone 4, tmobile).
  • Derived concept pattern (head = company, modifier = device), supported by entity pairs such as (amazon books, kindle), (netflix, touchpad), (skype, windows phone), (netflix, ps3).
  • The two patterns conflict with each other.

Why Concepts Can't Be Too Specific
• Too-specific concepts have little coverage: the concept regresses to an entity.
• Storage blows up: up to (millions × millions) of patterns, e.g. (device, largest desktop OS vendor), (device, largest software development company), (device, largest global corporation), (device, latest windows and office provider), …

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top concept patterns (head/constraint), ranked by cluster size and aggregated cluster score: breed/state, game/platform, accessory/vehicle, browser/platform, requirement/school, drug/disease, cosmetic/skin condition, job/city, accessory/phone, software/platform, test/disease, clothes/breed, penalty/crime, tax/state, sauce/meat, credit card/country, food/holiday, mod/game, garment/sport, career information/professional, song/instrument, bait/fish, study guide/book, plugins/browser, recipe/meat, currency/country, lens/camera, decoration/holiday, food/animal.

Example: the game/platform cluster groups patterns such as game/platform, game/device, video game/platform, game console/game pad, and game/gaming platform; supporting entity pairs with Game as head and Platform as modifier include (angry birds, android), (angry birds, ios), (angry birds, windows 10), …

Head-Modifier Relationship Detection
• Train a classifier on (head-embedding, modifier-embedding) pairs.
• Training data: positive = (head, modifier); negative = (modifier, head).
• Precision >= 0.9, Recall >= 0.9.
• Disadvantage: not interpretable.
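A minimal sketch of such a classifier on toy vectors (the random "embeddings", the tiny pair list, and the scikit-learn model choice are illustrative assumptions; any binary classifier over the concatenated pair embedding would do):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
dim = 50
# Toy "embeddings" for a handful of terms (in practice: pre-trained word vectors).
vocab = ["game", "platform", "accessory", "phone", "drug", "disease"]
emb = {w: rng.normal(size=dim) for w in vocab}

# Known (head, modifier) pairs; negatives are the reversed pairs.
positives = [("game", "platform"), ("accessory", "phone"), ("drug", "disease")]
X = [np.concatenate([emb[h], emb[m]]) for h, m in positives] + \
    [np.concatenate([emb[m], emb[h]]) for h, m in positives]
y = [1] * len(positives) + [0] * len(positives)

clf = LogisticRegression(max_iter=1000).fit(X, y)
test = np.concatenate([emb["game"], emb["platform"]]).reshape(1, -1)
print(clf.predict(test))   # 1 -> the pair is in (head, modifier) order
```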

Syntactic Parsing Based on Head-Modifier (HM) Relations
• Head-modifier information alone is incomplete:
  • prepositions and other function words are not covered;
  • neither are relations within a noun compound, e.g. "el capitan macbook pro".
• Why not train a parser for web queries?

Syntactic Parsing of Short Texts [Sun et al EMNLP 2016]
• Syntactic structures are valuable for short text understanding.
• Challenge: short texts lack grammatical signals, i.e. function words and word order:
  • "toys queries" has ambiguous intent;
  • "distance earth moon" has clear intent but many equivalent forms: "earth moon distance", "earth distance moon", …

Challenges: Syntactic Parsing of Queries
• No standard.
• No ground truth.
• Why is syntactic parsing of queries even a legitimate problem?

Derive Syntax from Semantics [Sun et al 2016]
• Query: "thai food houston"
• Take a sentence clicked for this query.
• Project its dependencies onto the query.

A Treebank for Short Texts
• Given a query q and q's clicked sentences s:
  • parse each s;
  • project the dependencies from s to q;
  • aggregate the projected dependencies.
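A toy sketch of the projection step under simple assumptions (exact word-match alignment between sentence and query tokens, and a hand-written dependency list standing in for a parser's output); it only illustrates the idea, not the paper's projection algorithm:

```python
# Toy sketch: project dependencies from a clicked sentence onto a query.
query = "thai food houston".split()
sentence = "the best thai food in houston is downtown".split()
# (head_index, dependent_index, label) edges for the sentence, as a parser might produce.
sent_deps = [(3, 2, "amod"),    # food <- thai
             (3, 1, "amod"),    # food <- best
             (3, 5, "nmod"),    # food -> houston (via "in")
             (7, 3, "nsubj")]   # downtown <- food

# Align by exact word match (a strong simplifying assumption).
align = {i: query.index(w) for i, w in enumerate(sentence) if w in query}

projected = [(align[h], align[d], label)
             for h, d, label in sent_deps
             if h in align and d in align]
print(projected)   # [(1, 0, 'amod'), (1, 2, 'nmod')]  i.e. food <- thai, food -> houston
```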

Algorithm of Projection

Result Examples

Results
• Random queries: QueryParser UAS 0.83, LAS 0.75; Stanford UAS 0.72, LAS 0.64.
• Queries with no function words: QueryParser UAS 0.82, LAS 0.73; Stanford UAS 0.70, LAS 0.61.
• Queries with function words: QueryParser UAS 0.90, LAS 0.85; Stanford UAS 0.86, LAS 0.80.

Short Text Understanding
• How to segment this short text?
• What does this short text mean (its intent, senses, or concepts)?
• What are the relations among terms in the short text?
• How to calculate the similarity between short texts?

Short Text Similarity Using Word Embedding [Kenter and Rijke 2015]
• Measures the similarity between two short texts or sentences.
• Basic idea: word-by-word comparison using embedding vectors.
• Uses a saliency-weighted semantic graph to compute the similarity.

Short Text Similarity Using Word Embedding [Kenter and Rijke 2015]
• Features acquired: bins of all edges, bins of max edges.
• Similarity measurement, inspired by BM25, where $s_l$ and $s_s$ are the two short texts, w ranges over the terms of $s_l$, and $sem(w, s_s)$ is the semantic similarity of w to $s_s$:

$$f_{sts}(s_l, s_s) = \sum_{w \in s_l} IDF(w) \cdot \frac{sem(w, s_s) \cdot (k_1 + 1)}{sem(w, s_s) + k_1 \cdot \left(1 - b + b \cdot \frac{|s_s|}{avgl}\right)}$$
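A minimal sketch of this saliency-weighted similarity on toy data (the IDF table, the embedding-based sem() that takes the best-matching word, and the BM25-style constants k1, b, avgl are illustrative assumptions):

```python
import numpy as np

# Toy embeddings and IDF values.
emb = {"thai": np.array([1.0, 0.0]), "food": np.array([0.8, 0.6]),
       "cuisine": np.array([0.9, 0.4]), "houston": np.array([0.0, 1.0])}
idf = {"thai": 2.0, "food": 1.2, "cuisine": 1.5, "houston": 2.5}

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def sem(w, s_s):
    # semantic similarity of term w to the other short text: best-matching word
    return max(cos(emb[w], emb[v]) for v in s_s)

def f_sts(s_l, s_s, k1=1.2, b=0.75, avgl=3.0):
    score = 0.0
    for w in s_l:
        s = sem(w, s_s)
        score += idf[w] * (s * (k1 + 1)) / (s + k1 * (1 - b + b * len(s_s) / avgl))
    return score

print(f_sts(["thai", "food", "houston"], ["thai", "cuisine"]))
```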

From the Concept View [Wang et al 2015a]
[Pipeline: each of the two short texts is parsed and conceptualized (term clustering by isA over the semantic network, concept filtering by co-occurrence over the co-occurrence network, head/modifier analysis, concept orthogonalization), yielding bags of concepts: Concept Vector 1 = [(c1, score1), (c2, score2), …] and Concept Vector 2 = [(c1', score1'), (c2', score2'), …]. The similarity of the two short texts is then computed between their concept vectors.]
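A minimal sketch of comparing two bags of concepts with cosine similarity (the concept vectors are made up):

```python
import math

# Toy bags of concepts for two short texts.
cv1 = {"movie": 0.6, "fantasy film": 0.3, "book": 0.1}       # e.g. "watch harry potter"
cv2 = {"movie": 0.5, "actor": 0.2, "fantasy film": 0.3}      # e.g. "daniel radcliffe film"

def cosine(u, v):
    dot = sum(u[c] * v.get(c, 0.0) for c in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv)

print(round(cosine(cv1, cv2), 3))
```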

Outline
• Knowledge Bases
• Explicit Representation Models
• Applications

Applications
• Explicit short text understanding benefits many application scenarios:
  • Ads/search semantic match
  • Definition mining
  • Query recommendation
  • Web table understanding
  • Semantic search
  • …

Ads Keyword Selection [Wang et al 2015a]
[Two charts over Decile 4 through Decile 10: one for Mainline Ads (y-axis 0.00–6.00) and one for Sidebar Ads (y-axis 0.00–0.60).]

Definition Mining [Hao et al 2016]
• Definition scenarios: search engines, QnA, etc.
• Why conceptualization is useful for definition mining. Example: "What is Emphysema?"
  Answer 1: "Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year."
  Answer 2: "Emphysema is an incurable, progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing."
• A sentence like Answer 1 has the form of a definition; embedding is helpful to some extent, but it also returns high similarity scores for both (emphysema, disease) and (emphysema, smoking).
• Conceptualization can provide strong semantics; contextual embedding can also provide semantic similarity beyond is-A.

Definition Mining [Hao et al 2016]

Concept-based Short Text Classification and Ranking [Wang et al 2014a]
[System overview. Online: an original short text (e.g. "justin bieber graduates") goes through entity extraction, conceptualization against the knowledge base, candidate generation, and classification & ranking, producing labels such as <Music, score>. Offline: model learning with concept weighting builds a concept model for each class (Class 1 … Class i … Class N) from training data.]

Concept-based Short Text Classification and Ranking [Wang et al 2014a]
[Figures: each category (e.g. TV, alongside Music, Movie, …) is projected into the concept space using the article titles/tags in that category, yielding concept weights ω_i, ω_j; a query is projected into the same concept space with probabilities p_i, p_j and is classified by matching its concepts against each category's weighted concepts.]
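A minimal sketch of this matching step (the per-class concept weights and the query's concept probabilities are toy numbers; the score is a simple weighted dot product):

```python
# Toy per-class concept weights (omega) learned offline ...
class_concepts = {
    "Music": {"singer": 0.7, "album": 0.5, "celebrity": 0.3},
    "TV":    {"tv show": 0.8, "celebrity": 0.4},
    "Movie": {"film": 0.7, "actor": 0.5, "celebrity": 0.3},
}
# ... and a query's concept probabilities (p) from conceptualization.
query_concepts = {"celebrity": 0.6, "singer": 0.4}   # e.g. "justin bieber graduates"

def score(cls):
    w = class_concepts[cls]
    return sum(p * w.get(c, 0.0) for c, p in query_concepts.items())

ranked = sorted(class_concepts, key=score, reverse=True)
print([(c, round(score(c), 2)) for c in ranked])   # "Music" ranks first here
```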

Precision performance on each category [Wang et al 2014a]

Category | BocSTC | LM_ch | SVM  | VSM_cosine | LM_d | Entity_ESA
Movie    | 0.71   | 0.91  | 0.84 | 0.81       | 0.72 | 0.56
Money    | 0.97   | 0.95  | 0.54 | 0.57       | 0.52 | 0.74
Music    | 0.97   | 0.90  | 0.88 | 0.73       | 0.68 | 0.58
TV       | 0.96   | 0.46  | 0.92 | 0.56       | 0.51 | 0.55

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References
• [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld. WordNet: An Electronic Lexical Database. Proceedings of the 11th Eurographics Workshop on Rendering, 1998.
• [ Banko et al 2007 ] Michele Banko, Michael J Cafarella, Stephen Soderland, Matt Broadhead and Oren Etzioni. Open Information Extraction from the Web. In IJCAI 2007.
• [ Etzioni et al 2011 ] Oren Etzioni, Anthony Fader, Janara Christensen, Stephen Soderland and Mausam Mausam. Open Information Extraction: The Second Generation. In IJCAI, vol 11, pp 3-10, 2011.
• [ Carlson et al 2010 ] A Carlson, J Betteridge, B Kisiel, B Settles, ER Hruschka Jr and TM Mitchell. Toward an Architecture for Never-Ending Language Learning. In Proceedings of the Conference on Artificial Intelligence (AAAI), 2010.
• [ Wu et al 2012 ] Wentao Wu, Hongsong Li, Haixun Wang and Kenny Zhu. Probase: A Probabilistic Taxonomy for Text Understanding. In ACM International Conference on Management of Data (SIGMOD), May 2012.
• [ Bollacker et al 2008 ] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, Jamine Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In SIGMOD 2008.
• [ Auer et al 2007 ] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, Zachary G Ives. DBpedia: A Nucleus for a Web of Open Data. In ISWC/ASWC 2007.

• [ Suchanek et al 2007 ] Fabian M Suchanek, Gjergji Kasneci, Gerhard Weikum. Yago: a core of semantic knowledge. In WWW 2007.
• [ Wu et al 2015 ] Sen Wu, Ce Zhang, Christopher De Sa, Jaeho Shin, Feiran Wang and C Ré. Incremental Knowledge Base Construction Using DeepDive. In VLDB 2015.
• [ Navigli et al 2012 ] R Navigli and S Ponzetto. BabelNet: The Automatic Construction, Evaluation and Application of a Wide-Coverage Multilingual Semantic Network. In Artificial Intelligence, 2012.
• [ Nastase et al 2010 ] Vivi Nastase, Michael Strube, Benjamin Börschinger, Cäcilia Zirn and Anas Elghafari. WikiNet: A very large scale multi-lingual concept network. In LREC 2010.
• [ Speer et al 2013 ] Robert Speer and Catherine Havasi. ConceptNet 5: A large semantic network for relational knowledge. The People's Web Meets NLP, Springer Berlin Heidelberg, 2013.
• [ Hua et al 2016 ] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng and Xiaofang Zhou. "Understand Short Texts by Harvesting and Analyzing Semantic Knowledge." IEEE Transactions on Knowledge and Data Engineering (TKDE), 2016.
• [ Hua et al 2015 ] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng and Xiaofang Zhou. Short Text Understanding Through Lexical-Semantic Analysis. In International Conference on Data Engineering (ICDE), April 2015.

• [ Li et al 2013 ] Peipei Li, Haixun Wang, Kenny Q Zhu, Zhongyuan Wang and Xindong Wu. Computing term similarity by large probabilistic isA knowledge. In ACM International Conference on Information and Knowledge Management (CIKM), 2013.
• [ Li et al 2015 ] Peipei Li, Haixun Wang, Kenny Q Zhu, Zhongyuan Wang, Xue-Gang Hu and Xindong Wu. A Large Probabilistic Semantic Network based Approach to Compute Term Similarity. In IEEE Transactions on Knowledge and Data Engineering (TKDE), 27(10): 2604-2617, 2015.
• [ Rosch et al 1976 ] Eleanor Rosch, Carolyn B Mervis, Wayne D Gray, David M Johnson and Penny Boyes-Braem. Basic objects in natural categories. Cognitive Psychology, 8(3):382–439, 1976.
• [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze. Foundations of Statistical Natural Language Processing. Volume 999, MIT Press, 1999.
• [ Wang et al 2015b ] Zhongyuan Wang, Kejun Zhao, Haixun Wang, Xiaofeng Meng and Ji-Rong Wen. Query Understanding through Knowledge-Based Conceptualization. In IJCAI, July 2015.
• [ Bergsma et al 2007 ] Shane Bergsma, Qin Iris Wang. Learning Noun Phrase Query Segmentation. In EMNLP-CoNLL 2007: 819-826.
• [ Tan et al 2008 ] Bin Tan, Fuchun Peng. Unsupervised query segmentation using generative language models and Wikipedia. In WWW 2008: 347-356.

• [ Li et al 2011 ] Yanen Li, Bo-June Paul Hsu, ChengXiang Zhai, Kuansan Wang. Unsupervised query segmentation using clickthrough for information retrieval. In SIGIR 2011: 285-294.
• [ Guo et al 2009 ] Jiafeng Guo, Gu Xu, Xueqi Cheng, Hang Li. Named entity recognition in query. In SIGIR 2009: 267-274.
• [ Pantel et al 2012 ] Patrick Pantel, Thomas Lin, Michael Gamon. Mining Entity Types from Query Logs via User Intent Modeling. In ACL 2012: 563-571.
• [ Joshi et al 2014 ] Mandar Joshi, Uma Sawant, Soumen Chakrabarti. Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries. In EMNLP 2014: 1104-1114.
• [ Sawant et al 2013 ] Uma Sawant, Soumen Chakrabarti. Learning joint query interpretation and response ranking. In WWW 2013: 1099-1110.
• [ Wang et al 2014b ] Zhongyuan Wang, Haixun Wang and Zhirui Hu. Head, Modifier, and Constraint Detection in Short Texts. In International Conference on Data Engineering (ICDE), 2014.
• [ Sun et al 2016 ] Xiangyan Sun, Haixun Wang, Yanghua Xiao, Zhongyuan Wang. Syntactic Parsing of Web Queries. In EMNLP 2016.

• [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke. Short text similarity with word embeddings. In CIKM 2015.
• [ Wang et al 2015a ] Zhongyuan Wang, Haixun Wang, Ji-Rong Wen and Yanghua Xiao. An Inference Approach to Basic Level of Categorization. In CIKM, October 2015.
• [ Hao et al 2016 ] Zehui Hao, Zhongyuan Wang, Xiaofeng Meng and Jun Yan. Combining Language Model with Conceptualization for Definition Ranking. MSR Technical Report, 2016.
• [ Wang et al 2014a ] Fang Wang, Zhongyuan Wang, Zhoujun Li and Ji-Rong Wen. Concept-based Short Text Classification and Ranking. In CIKM 2014.
• [ Wang et al 2012a ] Jingjing Wang, Haixun Wang, Zhongyuan Wang and Kenny Zhu. Understanding Tables on the Web. In International Conference on Conceptual Modeling, October 2012.
• [ Wang et al 2012b ] Yue Wang, Hongsong Li, Haixun Wang and Kenny Zhu. Toward Topic Search on the Web. In International Conference on Conceptual Modeling, October 2012.

Page 67: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Entity Recognition in Query [Guo et al 2009]

bull Motivation

Detect named entity in a short text and categorize it

harry potter walkthrough

Single-named-entity query

Example

(ldquoharry potterrdquo ldquo walkthroughrdquo ldquogamerdquo)

triple lte t cgt

class of entity

context terms

ambiguous term

contextterm class

Entity Recognition in Query

bull Probabilistic Generative Model

Goal Given a query q find triple lte t cgt maximize the probability

Probability to generate triple

assume context only depends on class

Objective given query q find

The problem then becomes how to estimate Pr(e) Pr(c|e) and Pr(t|c)

Eg ldquowalkthroughrdquo only depends on game instead of happy potter

Entity Recognition in Query

bull Probability Estimation by Learning

learning objective

N

1i

iii )ctP(emax

Challenge difficult as well as time consuming to manually assign class labels to named entities in queries

Build training set 119879 = (119890119894 119905119894) view 119888119894 as a hidden variable

New Learning problem

N

1i

ii

N

1i

i

N

1i

ii c)|)P(te|P(c)P(emax)tP(emax c

solved with topic model WS-LDA

Signal from Click [Pantel et al 2012]

bull Motivation

Predict entity type in Web search

entity

user intent

context

click

Query type distribution (73 types)

Generative model

entity type

T

TK

K2

Signal from Click

bull Joint Model for Prediction

t

τ

i

n c

θ

φ

ω

Q

Distribution over types

Intent distribution

Pick type

Pick entity

Pick intent

Pick click

Word distribution

Host distribution

Entity distribution

For each Query

Pick context words

Telegraphic Query interpretation [Sawant et al 2013 Joshi et al 2014]

bull Entity-seeking Telegraphic Queries

bull Interpretation = Segmentation + Annotation

Knowledge base Large corpus

accuracy recall

Germany capital

Berlin

Query

Result Entity

bull Overview

Joint Interpretation and Ranking [Sawant et al 2013 Joshi et al 2014]

Annotated Corpus

Telegraphic Query

e1e2e3

Two Models for Interpretation and Ranking

Generative Model

Discriminative Model

Output

bull Generative Model

Joint Interpretation and Ranking [Sawant et al 2013]

San Diego Padres

Major league baseball team

type context

E

T Padres have been to two World

Series losing in 1984 and 1998

Type hint

baseball team

losing team baseball world series 1998

Z

Context matchers

lost 1998 world seriesswitch

model model

q losing team baseball world series 1998

Borrow from U Sawant (2013)

Based on Probabilistic Language Models

bull Discriminative Model

Joint Interpretation and Ranking [Sawant et al 2013]

San_Diego_Padres

losing team baseball world

series 1998

(baseball team)

losing team baseball world

series 1998

(baseball team)

losing team baseball world

series 1998

(t = baseball team)

1998_World_Series

losing team baseball world

series 1998

(series)

losing team baseball world

series 1998

(series)

losing team baseball world

series 1998

(t = series)

Correct entity Incorrect entity

Based on max-margin discriminative learning

bull Queries seek answer entities (e2)

bull Contain (query) entities (e1) target types (t2) relations (r) and selectors (s)

Telegraphic Query Interpretation [Joshi et al 2014]

query e1 r t2 s

dave navarro first band

dave navarro band band first

dave navarro - band first

spider automobile company

spider automobile company

automobile company

-

automobile company company spider

Borrow from M Joshi (2014)

Improved Generative Model

bull Generative Model[Sawant et al 2013]

[Joshi et al 2014]Consider e1

(in q) and r

Improved Discriminative Model

bull Discriminative Model[Sawant et al 2013]

[Joshi et al 2014]

Consider e1

(in q) and r

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

bull Input a short text

bull Output semantic interpretation

bull Three steps in understanding a short text

wanna watch eagles band

watch[verb] eagles[entity](band) band[concept]

wanna watch eagles band watch[verb] eagles[entity](band) band[concept]

watch eagles band watch[verb] eagles[entity] band[concept]

Step 1 Text Segmentation ndash divide into a sequence of terms in vocabulary

Step 2 Type detection ndash determine the best type of each term

Step 3 Concept Labeling ndash infer the best concept of each entity within context

Text segmentationbull Observations

bull Mutual Exclusion ndash terms containing the same word mutually exclude each other

bull Mutual Reinforcement ndash related terms mutually reinforce each other

bull Build a Candidate Term Graph (CTG)

ldquovacation april in parisrdquo ldquowatch harry potterrdquo

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Find best segmentation

bull Best segmentation= sub-graph in CTG whichbull Is a complete graph (clique)

bull No mutual exclusion

bull Has 100 word coveragebull Except for stopwords

bull Has the largest average edge weight

Is a segmentation

Best segmentation

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Find best segmentation

bull Best segmentation= sub-graph in CTG whichbull Is a complete graph (clique)

bull No mutual exclusion

bull Has 100 word coveragebull Except for stopwords

bull Has the largest average edge weight

Maximal Clique

Best segmentation

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Type Detection

bull Pairwise Modelbull Find the best typed-term for each term so that the

Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight

watch[v]

watch[e]

watch[c]

watch

free[adj]

free[v]

movie[c]

movie[e]

free

movie

Concept Labeling

bull Entity disambiguation is the most important task of concept labelingbull Filterre-rank of the original concept cluster vector

bull Weighted-Votebull The final score of each concept cluster is a combination

of its original score and the support from context using concept co-occurrence

watch harry potter read harry potter

movie book

Example of Entity Disambiguation[Hua et al 2015 (ICDE Best Paper) Hua et al 2016]

Co-occurrence network

Concept Vector

Semantic network

Short Text

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

c1 p1

c2 p2

c3 p3

hellip

ipad apple

fruithellip

companyhellip

foodhellip

producthellip

Is-A

filtering

ldquoipad applerdquo

producthellip

devicehellip

producthellip

brandhellip

companyhellip

devicehellip

co-occur

Is-A

Mining Lexical Relationships[Wang et al 2015b]

bull Lexical knowledge represented by the probabilities

verb

product book

movie

watch harry potter

119901 119907119890119903119887 119908119886119905119888ℎ

119901 119894119899119904119905119886119899119888119890 119908119886119905119888ℎ

119901 119898119900119907119894119890 ℎ119886119903119903119910 119901119900119905119905119890119903

119901 119898119900119907119894119890 119908119886119905119888ℎ 119907119890119903119887

119901 119887119900119900119896 ℎ119886119903119903119910 119901119900119905119905119890119903

119901 119888 119905 119911

119901 119888 119890 =119901 119888 119905 119911 = 119894119899119904119905119886119899119888119890

119901 119911 119905 ①②

③e instancet termc conceptz role

Understanding Queries [Wang et al 2015b]

bull Goal to rank the concepts and findarg max

119888119901(119888|119905 119902)

The offline semantic network

QueryAll possible

segmentations

Random walk with restart [Sun et al 2005]on the online subgraph

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

bull Example ldquopopular smart cover iphone 5srdquo

bull Definition bull Head acts to name the general (semantic) category to which the

whole short text belongs Usually the head is the intent of the short text

bull ldquosmart coverrdquo intent of the query

bull Constraints distinguish this member from other members of the same category

bull ldquoiphone 5srdquo limit the type of the head

bull Non-Constraint Modifiers (aka Pure Modifiers) are subjectivemodifiers which can be dropped without changing intent

bull ldquopopularrdquo subjective can be neglected

Non-Constraint Modifiers Mining Construct Modifier Networks

Edges form a Modifier Network

Concept Hierarchy Tree in ldquoCountryrdquo domain

Modifier Network in ldquoCountryrdquo domainIn this case ldquoLargerdquo and ldquoToprdquo are pure modifiers

Country

Asian country

Developed country

Western country

Asian Developed Western

Western developed

country

Top western country

Large

Large Top

Top

WesternLarge Asian

country

Large developed

country

Top developed

country

Country

Asian Western

Developed

Large Top

bull Betweenness centrality is a measure of a nodes centrality in a network

bull Betweennes of node v is defined as

bull where 120590119904119905 is the total number of shortest paths from node s to node t and 120590119904119905(119907) is the number of those paths that pass through v

bull Normalization amp Aggregation

bull For a pure modifier it should have low betweenness centrality aggregation score PMS(t)

Non-Constraint Modifiers Mining Betweenness centrality

Head-Constraints Mining [Wang et al 2014b]

bull A term can be a head sometimes and be a constraint in some other cases

bull Eg Seattle hotel Seattle hotel job

head headconstraintconstraintconstraint

Head-Constraints Mining Acquiring Concept Patterns

Get entity pairs from query log

Conceptualization

Concept Patterns for each

prepositionsentity1 entity2

Extract Patterns

A for B A of BA with B A in BA on B A at B hellip

entity 1head entity 2constraint

concept11

concept12

concept13

concept14

concept21

concept22

concept23

(concept11 concept21) (concept11 concept22)(concept11 concept23)hellip

Concept Pattern Dictionary

Building concept pattern dictionary Query Logs

cover for iphone 6sbattery for sony a7rwicked on broadway

Why Concepts Canrsquot Be Too Generalbull It may cause too many concept pattern conflicts

canrsquot distinguish head and modifier for general concept pairs

Head Modifier

Derived Concept Pattern device company

Supporting Entity Pairs iphone 4 verizon

modem comcast

wireless router comcast

iphone 4 tmobile

Head Modifier

Derived Concept Pattern company device

Supporting Entity Pairs amazon books kindle

netflix touchpad

skype windows phone

netflix ps3

Conflict

Why Concepts Canrsquot Be Too Specificbull It may generate concepts with little coverage

bull Concept regresses to entitybull Large storage space up to (million million) patterns

hellip hellip

device largest desktop OS vendor

device largest software development company

device largest global corporation

device latest windows and office provider

hellip hellip

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept PatternsCluster size Sum of Cluster Score headconstraintscore

615 2114691 breedstate357298460224501

296 7752357 gameplatform627403476771856

153 3466804 accessoryvehicle53393705094809

70 118259 browserplatform132612807637391

22 1010993 requirementschool271407526294823

34 9489159 drugdisease154602405333541

42 8992995 cosmeticskin condition814659415003929

16 7421599 jobcity27903732555528

32 710403 accessoryphone246513830851194

18 6692376 softwareplatform210126322725878

20 6444603 testdisease239774028397537

27 5994205 clothesbreed98773996282851

19 5913545 penaltycrime200544192793488

25 5848804 taxstate240081818612579

16 5465424 saucemeat183592863621553

18 4809389 credit cardcountry142919087972152

14 4730792 foodholiday14554140330924

11 4536199 modgame257163856882439

29 4350954 garmentsport471533326845442

23 3994886 career informationprofessional732726483731257

15 386065 songinstrument128189481818135

18 378213 baitfish780426514113169

22 3722948 study guidebook508339765053921

19 3408953 pluginsbrowser550326072627126

14 3305753 recipemeat882779863422951

18 3214226 currencycountry110825444188352

13 3180272 lenscamera186081673263957

9 316973 decorationholiday130055844126533

16 314875 foodanimal7338544366514

game platform

game device

video game platform

game console game pad

game gaming platform

Game (Head) Platform (Modifier)

angry birds android

angry birds ios

angry birds windows 10

hellip hellip

Detection

Head Modifier Relationship

bull Train a classifier on

(head-embedding modifier-embedding)

bull Training data bull Positive (head modifier)bull Negative (modifier head)

bull Precision gt= 09 Recall gt= 09

bull Disadvantage not interpretable

Syntactic Parsing based on HM

bull Information is incompletebull Preposition and other function words

bull Within a noun compound el capitan macbook pro

bull Why not train a parser for web queries

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges Short Texts Lack Grammatical Signalsbull Lack function words word order

bull ldquotoys queriesrdquo has ambiguous intent

bull ldquodistance earth moonrdquo has clear intentbull many equivalent forms ldquoearth moon distancerdquo ldquoearth

distance moonrdquo hellip

Challenges Syntactic Parsing of Queries

bullNo standard

bullNo ground-truth

Why is syntactic parsing of queries even a legitimate problem

Derive Syntax from Semantics[Sun et al 2016]

bull Query ldquothai food houstonrdquo

bull Clicked sentence

bull Project dependency to the query

A Treebank for Short Texts

bull Given query 119902

bull Given 119902rsquos clicked sentence 119904

bull Parse each 119904

bull Project dependency from 119904 to 119902

bull Aggregate dependencies

Algorithm of Projection

Result Examples

Results

bull Random queries

QueryParser UAS 083 LAS 075Stanford UAS 072 LAS 064

bull Queries with no function words

QueryParser UAS 082 LAS 073Stanford UAS 070 LAS 061

bull Queries with function words

QueryParser UAS 090 LAS 085Stanford UAS 086 LAS 080

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

bull Measuring similarity between two short texts and sentences

bull Basic idea word-by-word comparison using embedding vector

bull Use saliency-weighted semantic graph to computer similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired

Bins of all edges Bins of max edges

119908isin119904119868

119868119863119865(119908) sdot)119904119890119898(119908 119904119904) sdot (1198961 + 1

൰119904119890119898(119908 119904119904) + 1198961 sdot (1 minus 119887 + 119887 sdot|119904119904|119886119907119892119897

Similarity measurement

termShort texts

Inspired by BM25

Semantic similarity

119891119904119905119904(119904119897 119904119904) =

From the Concept View

From the Concept View [Wang et al 2015a]

Co-occurrence Network

Bags of Concepts

Semantic Network

Short Text 1

Short Text 2

Concept Vector 1[(c1 score1) (c2 score2)hellip]

Concept Vector 2[(c1rsquo score1rsquo) (c2rsquo score2rsquo)hellip]

Similarity

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

bull Explicit short text understanding benefit lot of application scenariosbull Adssearch semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

000

100

200

300

400

500

600

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

000

010

020

030

040

050

060

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

Mainline Ads Sidebar Ads

Definition Mining [Hao et al 2016]

bull Definition scenarios search engines QnA etc

bull Why Conceptualization is useful for definition miningbull Examples ldquoWhat is Emphysemardquo

Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

bull This sentence has the form of definitionbull Embedding is helpful to some extent but it also return high similarity

score for (emphysema disease) and (emphysema smoking)

bull Conceptualization can provide strong semanticsbull Contextual embedding can also provide semantic similarity beyond Is-A

Answer 1

Answer 2

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

OfflineOffline OnlineOnline

Original Short textjustin bieber graduates

hellip

Knowledge base

Conceptualiztion

Concept Vector

Entity Extraction

Candidates Generation

Classification amp Ranking

Model LearningModel Learning

Concept Weighting

Model Model NModel i

Concept Model Concept Model

Class 1 Class NClass i

TrainingData

ltMusic Scoregt

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept SpaceArticle titlestagsin this category

119901119894

119901119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

hellip

hellip

120596119894

120596119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

120596119894

120596119895119901119894

119901119895

Precision performance on each category [Wang et al 2014a]

BocSTC LM_ch SVMVSM_cosi

neLM_d Entity_ESA

Movie 071 091 084 081 072 056

Money 097 095 054 057 052 074

Music 097 090 088 073 068 058

TV 096 046 092 056 051 055

0304050607080910

Pre

cisi

on

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledgeltigt in SIGMOD 2008

bull [ Auer et al 2007 ] Soumlren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Boumlrschinger Caumlcilia Zirn and AnasElghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and XindongWu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny BoyesBraem Basic objects in natural categories Cognitive psychology 8(3)382ndash439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny ZhuToward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 68: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Entity Recognition in Query

bull Probabilistic Generative Model

Goal Given a query q find triple lte t cgt maximize the probability

Probability to generate triple

assume context only depends on class

Objective given query q find

The problem then becomes how to estimate Pr(e) Pr(c|e) and Pr(t|c)

Eg ldquowalkthroughrdquo only depends on game instead of happy potter

Entity Recognition in Query

bull Probability Estimation by Learning

learning objective

N

1i

iii )ctP(emax

Challenge difficult as well as time consuming to manually assign class labels to named entities in queries

Build training set 119879 = (119890119894 119905119894) view 119888119894 as a hidden variable

New Learning problem

N

1i

ii

N

1i

i

N

1i

ii c)|)P(te|P(c)P(emax)tP(emax c

solved with topic model WS-LDA

Signal from Click [Pantel et al 2012]

bull Motivation

Predict entity type in Web search

entity

user intent

context

click

Query type distribution (73 types)

Generative model

entity type

T

TK

K2

Signal from Click

bull Joint Model for Prediction

t

τ

i

n c

θ

φ

ω

Q

Distribution over types

Intent distribution

Pick type

Pick entity

Pick intent

Pick click

Word distribution

Host distribution

Entity distribution

For each Query

Pick context words

Telegraphic Query interpretation [Sawant et al 2013 Joshi et al 2014]

bull Entity-seeking Telegraphic Queries

bull Interpretation = Segmentation + Annotation

Knowledge base Large corpus

accuracy recall

Germany capital

Berlin

Query

Result Entity

bull Overview

Joint Interpretation and Ranking [Sawant et al 2013 Joshi et al 2014]

Annotated Corpus

Telegraphic Query

e1e2e3

Two Models for Interpretation and Ranking

Generative Model

Discriminative Model

Output

bull Generative Model

Joint Interpretation and Ranking [Sawant et al 2013]

San Diego Padres

Major league baseball team

type context

E

T Padres have been to two World

Series losing in 1984 and 1998

Type hint

baseball team

losing team baseball world series 1998

Z

Context matchers

lost 1998 world seriesswitch

model model

q losing team baseball world series 1998

Borrow from U Sawant (2013)

Based on Probabilistic Language Models

bull Discriminative Model

Joint Interpretation and Ranking [Sawant et al 2013]

San_Diego_Padres

losing team baseball world

series 1998

(baseball team)

losing team baseball world

series 1998

(baseball team)

losing team baseball world

series 1998

(t = baseball team)

1998_World_Series

losing team baseball world

series 1998

(series)

losing team baseball world

series 1998

(series)

losing team baseball world

series 1998

(t = series)

Correct entity Incorrect entity

Based on max-margin discriminative learning

bull Queries seek answer entities (e2)

bull Contain (query) entities (e1) target types (t2) relations (r) and selectors (s)

Telegraphic Query Interpretation [Joshi et al 2014]

query e1 r t2 s

dave navarro first band

dave navarro band band first

dave navarro - band first

spider automobile company

spider automobile company

automobile company

-

automobile company company spider

Borrow from M Joshi (2014)

Improved Generative Model

bull Generative Model[Sawant et al 2013]

[Joshi et al 2014]Consider e1

(in q) and r

Improved Discriminative Model

bull Discriminative Model[Sawant et al 2013]

[Joshi et al 2014]

Consider e1

(in q) and r

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

bull Input a short text

bull Output semantic interpretation

bull Three steps in understanding a short text

wanna watch eagles band

watch[verb] eagles[entity](band) band[concept]

wanna watch eagles band watch[verb] eagles[entity](band) band[concept]

watch eagles band watch[verb] eagles[entity] band[concept]

Step 1 Text Segmentation ndash divide into a sequence of terms in vocabulary

Step 2 Type detection ndash determine the best type of each term

Step 3 Concept Labeling ndash infer the best concept of each entity within context

Text segmentationbull Observations

bull Mutual Exclusion ndash terms containing the same word mutually exclude each other

bull Mutual Reinforcement ndash related terms mutually reinforce each other

bull Build a Candidate Term Graph (CTG)

ldquovacation april in parisrdquo ldquowatch harry potterrdquo

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Find best segmentation

bull Best segmentation= sub-graph in CTG whichbull Is a complete graph (clique)

bull No mutual exclusion

bull Has 100 word coveragebull Except for stopwords

bull Has the largest average edge weight

Is a segmentation

Best segmentation

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Find best segmentation

bull Best segmentation= sub-graph in CTG whichbull Is a complete graph (clique)

bull No mutual exclusion

bull Has 100 word coveragebull Except for stopwords

bull Has the largest average edge weight

Maximal Clique

Best segmentation

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Type Detection

bull Pairwise Modelbull Find the best typed-term for each term so that the

Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight

watch[v]

watch[e]

watch[c]

watch

free[adj]

free[v]

movie[c]

movie[e]

free

movie

Concept Labeling

bull Entity disambiguation is the most important task of concept labelingbull Filterre-rank of the original concept cluster vector

bull Weighted-Votebull The final score of each concept cluster is a combination

of its original score and the support from context using concept co-occurrence

watch harry potter read harry potter

movie book

Example of Entity Disambiguation[Hua et al 2015 (ICDE Best Paper) Hua et al 2016]

Co-occurrence network

Concept Vector

Semantic network

Short Text

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

c1 p1

c2 p2

c3 p3

hellip

ipad apple

fruithellip

companyhellip

foodhellip

producthellip

Is-A

filtering

ldquoipad applerdquo

producthellip

devicehellip

producthellip

brandhellip

companyhellip

devicehellip

co-occur

Is-A

Mining Lexical Relationships[Wang et al 2015b]

bull Lexical knowledge represented by the probabilities

verb

product book

movie

watch harry potter

119901 119907119890119903119887 119908119886119905119888ℎ

119901 119894119899119904119905119886119899119888119890 119908119886119905119888ℎ

119901 119898119900119907119894119890 ℎ119886119903119903119910 119901119900119905119905119890119903

119901 119898119900119907119894119890 119908119886119905119888ℎ 119907119890119903119887

119901 119887119900119900119896 ℎ119886119903119903119910 119901119900119905119905119890119903

119901 119888 119905 119911

119901 119888 119890 =119901 119888 119905 119911 = 119894119899119904119905119886119899119888119890

119901 119911 119905 ①②

③e instancet termc conceptz role

Understanding Queries [Wang et al 2015b]

bull Goal to rank the concepts and findarg max

119888119901(119888|119905 119902)

The offline semantic network

QueryAll possible

segmentations

Random walk with restart [Sun et al 2005]on the online subgraph

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

bull Example ldquopopular smart cover iphone 5srdquo

bull Definition bull Head acts to name the general (semantic) category to which the

whole short text belongs Usually the head is the intent of the short text

bull ldquosmart coverrdquo intent of the query

bull Constraints distinguish this member from other members of the same category

bull ldquoiphone 5srdquo limit the type of the head

bull Non-Constraint Modifiers (aka Pure Modifiers) are subjectivemodifiers which can be dropped without changing intent

bull ldquopopularrdquo subjective can be neglected

Non-Constraint Modifiers Mining Construct Modifier Networks

Edges form a Modifier Network

Concept Hierarchy Tree in ldquoCountryrdquo domain

Modifier Network in ldquoCountryrdquo domainIn this case ldquoLargerdquo and ldquoToprdquo are pure modifiers

Country

Asian country

Developed country

Western country

Asian Developed Western

Western developed

country

Top western country

Large

Large Top

Top

WesternLarge Asian

country

Large developed

country

Top developed

country

Country

Asian Western

Developed

Large Top

bull Betweenness centrality is a measure of a nodes centrality in a network

bull Betweennes of node v is defined as

bull where 120590119904119905 is the total number of shortest paths from node s to node t and 120590119904119905(119907) is the number of those paths that pass through v

bull Normalization amp Aggregation

bull For a pure modifier it should have low betweenness centrality aggregation score PMS(t)

Non-Constraint Modifiers Mining Betweenness centrality

Head-Constraints Mining [Wang et al 2014b]

bull A term can be a head sometimes and be a constraint in some other cases

bull Eg Seattle hotel Seattle hotel job

head headconstraintconstraintconstraint

Head-Constraints Mining Acquiring Concept Patterns

Get entity pairs from query log

Conceptualization

Concept Patterns for each

prepositionsentity1 entity2

Extract Patterns

A for B A of BA with B A in BA on B A at B hellip

entity 1head entity 2constraint

concept11

concept12

concept13

concept14

concept21

concept22

concept23

(concept11 concept21) (concept11 concept22)(concept11 concept23)hellip

Concept Pattern Dictionary

Building concept pattern dictionary Query Logs

cover for iphone 6sbattery for sony a7rwicked on broadway

Why Concepts Canrsquot Be Too Generalbull It may cause too many concept pattern conflicts

canrsquot distinguish head and modifier for general concept pairs

Head Modifier

Derived Concept Pattern device company

Supporting Entity Pairs iphone 4 verizon

modem comcast

wireless router comcast

iphone 4 tmobile

Head Modifier

Derived Concept Pattern company device

Supporting Entity Pairs amazon books kindle

netflix touchpad

skype windows phone

netflix ps3

Conflict

Why Concepts Can't Be Too Specific

• It may generate concepts with little coverage: the concept regresses to the entity.

• Large storage space: up to (million × million) patterns, e.g.
  (device, largest desktop OS vendor)
  (device, largest software development company)
  (device, largest global corporation)
  (device, latest windows and office provider)
  ...

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b].

Top Concept Patterns

Cluster size | Sum of cluster score | Head/Constraint | Score
615 | 21146.91 | breed/state | 3.57298460224501
296 | 7752.357 | game/platform | 6.27403476771856
153 | 3466.804 | accessory/vehicle | 5.3393705094809
70 | 1182.59 | browser/platform | 1.32612807637391
22 | 1010.993 | requirement/school | 2.71407526294823
34 | 948.9159 | drug/disease | 1.54602405333541
42 | 899.2995 | cosmetic/skin condition | 8.14659415003929
16 | 742.1599 | job/city | 2.7903732555528
32 | 710.403 | accessory/phone | 2.46513830851194
18 | 669.2376 | software/platform | 2.10126322725878
20 | 644.4603 | test/disease | 2.39774028397537
27 | 599.4205 | clothes/breed | 9.8773996282851
19 | 591.3545 | penalty/crime | 2.00544192793488
25 | 584.8804 | tax/state | 2.40081818612579
16 | 546.5424 | sauce/meat | 1.83592863621553
18 | 480.9389 | credit card/country | 1.42919087972152
14 | 473.0792 | food/holiday | 1.4554140330924
11 | 453.6199 | mod/game | 2.57163856882439
29 | 435.0954 | garment/sport | 4.71533326845442
23 | 399.4886 | career information/professional | 7.32726483731257
15 | 386.065 | song/instrument | 1.28189481818135
18 | 378.213 | bait/fish | 7.80426514113169
22 | 372.2948 | study guide/book | 5.08339765053921
19 | 340.8953 | plugins/browser | 5.50326072627126
14 | 330.5753 | recipe/meat | 8.82779863422951
18 | 321.4226 | currency/country | 1.10825444188352
13 | 318.0272 | lens/camera | 1.86081673263957
9 | 316.973 | decoration/holiday | 1.30055844126533
16 | 314.875 | food/animal | 7.338544366514

Example cluster: game/platform. Patterns in the cluster: game/platform, game/device, video game/platform, game console/game pad, game/gaming platform.

Game (Head) | Platform (Modifier)
angry birds | android
angry birds | ios
angry birds | windows 10
... | ...

Head Modifier Relationship Detection

• Train a classifier on (head-embedding, modifier-embedding) pairs.

• Training data:
  • Positive: (head, modifier)
  • Negative: (modifier, head)

• Precision >= 0.9, Recall >= 0.9

• Disadvantage: not interpretable
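A hedged sketch of such a classifier with scikit-learn: concatenate the head and modifier embeddings and train a logistic regression, generating negatives by swapping the pair order. The embedding table and training pairs are toy stand-ins, not the data or model of the original work.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Toy embedding table; in practice these would be pretrained word vectors.
EMB = {w: rng.normal(size=50) for w in ["hotel", "seattle", "game", "android", "job"]}

def pair_features(head, modifier):
    """Feature vector = concatenation of head and modifier embeddings."""
    return np.concatenate([EMB[head], EMB[modifier]])

# (head, modifier) pairs are positives; the swapped order gives negatives.
positive_pairs = [("hotel", "seattle"), ("game", "android"), ("job", "seattle")]
X = [pair_features(h, m) for h, m in positive_pairs]
X += [pair_features(m, h) for h, m in positive_pairs]
y = [1] * len(positive_pairs) + [0] * len(positive_pairs)

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict([pair_features("hotel", "seattle")]))   # expect label 1 (head first)
```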

Syntactic Parsing based on HM

• Head/modifier information alone is incomplete:
  • Prepositions and other function words
  • Structure within a noun compound, e.g., "el capitan macbook pro"

• Why not train a parser for web queries?

Syntactic Parsing of Short Texts [Sun et al EMNLP 2016]

• Syntactic structures are valuable for short text understanding.

• Examples:

Challenges: Short Texts Lack Grammatical Signals

• Short texts lack function words and word order:
  • "toys queries" has ambiguous intent
  • "distance earth moon" has clear intent, but many equivalent forms: "earth moon distance", "earth distance moon", ...

Challenges: Syntactic Parsing of Queries

• No standard

• No ground truth

Why is syntactic parsing of queries even a legitimate problem?

Derive Syntax from Semantics [Sun et al 2016]

• Query: "thai food houston"

• Clicked sentence (from the click log)

• Project the dependencies of the clicked sentence onto the query

A Treebank for Short Texts

• Given a query q

• Given q's clicked sentences s

• Parse each s

• Project dependencies from s to q

• Aggregate the projected dependencies

Algorithm of Projection
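Since the projection algorithm itself is not reproduced on the slide, here is a simplified sketch of the idea under an exact-token-match alignment assumption: keep a dependency arc only when both of its endpoints also occur in the query.

```python
# Hedged sketch: project dependency arcs from a clicked sentence onto a query.
# Alignment here is plain lowercase token matching, an assumption for brevity.

def project_dependencies(query_tokens, sentence_tokens, sentence_arcs):
    """sentence_arcs: list of (head_index, dependent_index, label) over sentence_tokens."""
    query_pos = {tok.lower(): i for i, tok in enumerate(query_tokens)}
    projected = []
    for head, dep, label in sentence_arcs:
        h_tok = sentence_tokens[head].lower()
        d_tok = sentence_tokens[dep].lower()
        if h_tok in query_pos and d_tok in query_pos:
            projected.append((query_pos[h_tok], query_pos[d_tok], label))
    return projected

# Clicked sentence "Find good Thai food in Houston" -> query "thai food houston".
sent = ["Find", "good", "Thai", "food", "in", "Houston"]
arcs = [(3, 2, "amod"), (3, 5, "nmod"), (0, 3, "obj")]   # toy parse, illustrative
query = ["thai", "food", "houston"]
print(project_dependencies(query, sent, arcs))
# [(1, 0, 'amod'), (1, 2, 'nmod')]  ->  thai modifies food; houston attaches to food
```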

Result Examples

Results

• Random queries: QueryParser UAS 0.83, LAS 0.75; Stanford UAS 0.72, LAS 0.64

• Queries with no function words: QueryParser UAS 0.82, LAS 0.73; Stanford UAS 0.70, LAS 0.61

• Queries with function words: QueryParser UAS 0.90, LAS 0.85; Stanford UAS 0.86, LAS 0.80

Short Text Understanding

• How to segment this short text?

• What does this short text mean (its intent, senses, or concepts)?

• What are the relations among terms in the short text?

• How to calculate the similarity between short texts?

Short Text Similarity Using Word Embedding [Kenter and Rijke 2015]

• Measuring similarity between two short texts or sentences

• Basic idea: word-by-word comparison using embedding vectors

• Use a saliency-weighted semantic graph to compute similarity

Short Text Similarity Using Word Embedding [Kenter and Rijke 2015]

Features acquired: bins of all edges, bins of max edges.

Similarity measurement (inspired by BM25), where s_l and s_s are the two short texts and sem(w, s_s) is the semantic similarity between term w and short text s_s:

f_sts(s_l, s_s) = Σ_{w ∈ s_l} IDF(w) · sem(w, s_s) · (k_1 + 1) / ( sem(w, s_s) + k_1 · (1 - b + b · |s_s| / avgsl) )
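A hedged sketch of this measure with toy embeddings and IDF values; sem(w, s_s) is taken here as the best cosine match between w and the terms of s_s (clamped at 0), and k1, b, avgsl are illustrative parameter choices rather than the tuned values of the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
EMB = {w: rng.normal(size=50) for w in ["thai", "food", "houston", "restaurant", "texas"]}
IDF = {w: 1.0 for w in EMB}          # toy IDF values

def cos(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def sem(w, short_text):
    """Best embedding match between term w and any term of the short text."""
    return max(0.0, max(cos(EMB[w], EMB[w2]) for w2 in short_text))

def f_sts(s_l, s_s, k1=1.2, b=0.75, avgsl=3.0):
    score = 0.0
    for w in s_l:
        s = sem(w, s_s)
        score += IDF[w] * (s * (k1 + 1)) / (s + k1 * (1 - b + b * len(s_s) / avgsl))
    return score

print(f_sts(["thai", "food", "houston"], ["restaurant", "texas"]))
```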

From the Concept View

From the Concept View [Wang et al 2015a]

[Pipeline figure: bags of concepts. Each short text is parsed and conceptualized, with the help of a semantic (isA) network and a co-occurrence network, through term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, and concept orthogonalization. Short Text 1 yields Concept Vector 1 [(c1, score1), (c2, score2), ...], Short Text 2 yields Concept Vector 2 [(c1', score1'), (c2', score2'), ...], and similarity is computed between the two concept vectors.]
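A minimal sketch of the final comparison step under this view: cosine similarity between two sparse concept vectors (the concepts and scores below are invented for illustration).

```python
import math

def concept_cosine(vec1, vec2):
    """Cosine similarity between two sparse {concept: score} vectors."""
    common = set(vec1) & set(vec2)
    dot = sum(vec1[c] * vec2[c] for c in common)
    n1 = math.sqrt(sum(v * v for v in vec1.values()))
    n2 = math.sqrt(sum(v * v for v in vec2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

# Toy concept vectors for two short texts (scores are illustrative).
text1 = {"movie": 0.6, "book": 0.3, "character": 0.1}   # e.g. "watch harry potter"
text2 = {"book": 0.7, "character": 0.2, "author": 0.1}  # e.g. "read harry potter"
print(concept_cosine(text1, text2))
```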

Outline

• Knowledge Bases

• Explicit Representation Models

• Applications

Applications

• Explicit short text understanding benefits a lot of application scenarios:
  • Ads/search semantic match
  • Definition mining
  • Query recommendation
  • Web table understanding
  • Semantic search
  • ...

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

[Bar charts: results by query decile (Decile 4 through Decile 10), shown separately for Mainline Ads and Sidebar Ads.]

Definition Mining [Hao et al 2016]

• Definition scenarios: search engines, QnA, etc.

• Why conceptualization is useful for definition mining. Example: "What is Emphysema?"

  Answer 1: "Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year."

  Answer 2: "Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing."

• This sentence has the form of a definition. Embedding is helpful to some extent, but it also returns a high similarity score for (emphysema, disease) and for (emphysema, smoking).

• Conceptualization can provide strong semantics; contextual embedding can also provide semantic similarity beyond is-A.

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

[System figure with an offline part and an online part. Offline: training data and a knowledge base feed model learning with concept weighting, producing one concept model per class (Model 1, ..., Model i, ..., Model N for Class 1, ..., Class i, ..., Class N). Online: an original short text (e.g., "justin bieber graduates") goes through entity extraction, conceptualization into a concept vector, candidate generation, and classification & ranking, producing outputs such as <Music, Score>.]
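A hedged sketch of the online scoring path: conceptualize the short text into a concept vector and rank classes by their match against per-class concept models; the lookup tables below are toy stand-ins for the models learned offline in [Wang et al 2014a].

```python
# Toy conceptualization: surface term -> weighted concepts (illustrative values).
CONCEPTS = {
    "justin bieber": {"singer": 0.8, "celebrity": 0.2},
    "graduates": {"event": 1.0},
}
# Toy per-class concept models learned offline (illustrative weights).
CLASS_MODELS = {
    "Music": {"singer": 0.9, "song": 0.7, "album": 0.5},
    "Movie": {"actor": 0.9, "film": 0.8},
    "TV":    {"celebrity": 0.6, "show": 0.8},
}

def conceptualize(short_text_terms):
    vector = {}
    for term in short_text_terms:
        for concept, w in CONCEPTS.get(term, {}).items():
            vector[concept] = vector.get(concept, 0.0) + w
    return vector

def rank_classes(short_text_terms):
    vec = conceptualize(short_text_terms)
    scores = {cls: sum(vec.get(c, 0.0) * w for c, w in model.items())
              for cls, model in CLASS_MODELS.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(rank_classes(["justin bieber", "graduates"]))
# -> Music ranks first, then TV, then Movie
```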

Concept based Short Text Classification and Ranking [Wang et al 2014a]

[Model illustration across three slides. (1) For a category such as TV, the article titles/tags in that category are mapped into the concept space with weights p_i, p_j. (2) Categories such as Music, Movie, TV, ... are represented in the concept space with concept weights ω_i, ω_j. (3) At query time, the query is mapped into the same concept space (p_i, p_j) and matched against the category concept weights (ω_i, ω_j).]

Precision performance on each category [Wang et al 2014a]

Category | BocSTC | LM_ch | SVM | VSM_cosine | LM_d | Entity_ESA
Movie | 0.71 | 0.91 | 0.84 | 0.81 | 0.72 | 0.56
Money | 0.97 | 0.95 | 0.54 | 0.57 | 0.52 | 0.74
Music | 0.97 | 0.90 | 0.88 | 0.73 | 0.68 | 0.58
TV | 0.96 | 0.46 | 0.92 | 0.56 | 0.51 | 0.55

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

• [Stark et al. 1998] Michael M. Stark and Richard F. Riesenfeld. WordNet: An Electronic Lexical Database. In Proceedings of the 11th Eurographics Workshop on Rendering, 1998.

• [Banko et al. 2007] Michele Banko, Michael J. Cafarella, Stephen Soderland, Matt Broadhead, and Oren Etzioni. Open Information Extraction from the Web. In IJCAI 2007.

• [Etzioni et al. 2011] Oren Etzioni, Anthony Fader, Janara Christensen, Stephen Soderland, and Mausam Mausam. Open Information Extraction: The Second Generation. In IJCAI, vol. 11, pp. 3-10, 2011.

• [Carlson et al. 2010] A. Carlson, J. Betteridge, B. Kisiel, B. Settles, E. R. Hruschka Jr., and T. M. Mitchell. Toward an Architecture for Never-Ending Language Learning. In Proceedings of the Conference on Artificial Intelligence (AAAI), 2010.

• [Wu et al. 2012] Wentao Wu, Hongsong Li, Haixun Wang, and Kenny Zhu. Probase: A Probabilistic Taxonomy for Text Understanding. In ACM International Conference on Management of Data (SIGMOD), May 2012.

• [Bollacker et al. 2008] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamine Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In SIGMOD 2008.

• [Auer et al. 2007] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary G. Ives. DBpedia: A Nucleus for a Web of Open Data. In ISWC/ASWC 2007.

References

• [Suchanek et al. 2007] Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. Yago: a core of semantic knowledge. In WWW 2007.

• [Wu et al. 2015] Sen Wu, Ce Zhang, Christopher De Sa, Jaeho Shin, Feiran Wang, and C. Ré. Incremental Knowledge Base Construction Using DeepDive. In VLDB 2015.

• [Navigli et al. 2012] R. Navigli and S. Ponzetto. BabelNet: The Automatic Construction, Evaluation and Application of a Wide-Coverage Multilingual Semantic Network. In Artificial Intelligence, 2012.

• [Nastase et al. 2010] Vivi Nastase, Michael Strube, Benjamin Börschinger, Cäcilia Zirn, and Anas Elghafari. WikiNet: A very large scale multi-lingual concept network. In LREC 2010.

• [Speer et al. 2013] Robert Speer and Catherine Havasi. ConceptNet 5: A large semantic network for relational knowledge. The People's Web Meets NLP, Springer Berlin Heidelberg, 2013.

• [Hua et al. 2016] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng, and Xiaofang Zhou. "Understand Short Texts by Harvesting and Analyzing Semantic Knowledge." IEEE Transactions on Knowledge and Data Engineering (TKDE), 2016.

• [Hua et al. 2015] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng, and Xiaofang Zhou. Short Text Understanding Through Lexical-Semantic Analysis. In International Conference on Data Engineering (ICDE), April 2015.

References

• [Li et al. 2013] Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang, and Xindong Wu. Computing term similarity by large probabilistic isA knowledge. In ACM International Conference on Information and Knowledge Management (CIKM), 2013.

• [Li et al. 2015] Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang, Xue-Gang Hu, and Xindong Wu. A Large Probabilistic Semantic Network based Approach to Compute Term Similarity. IEEE Transactions on Knowledge and Data Engineering (TKDE), 27(10): 2604-2617, 2015.

• [Rosch et al. 1976] Eleanor Rosch, Carolyn B. Mervis, Wayne D. Gray, David M. Johnson, and Penny Boyes-Braem. Basic objects in natural categories. Cognitive Psychology, 8(3): 382-439, 1976.

• [Manning and Schutze 1999] Christopher D. Manning and Hinrich Schutze. Foundations of Statistical Natural Language Processing. Volume 999, MIT Press, 1999.

• [Wang et al. 2015b] Zhongyuan Wang, Kejun Zhao, Haixun Wang, Xiaofeng Meng, and Ji-Rong Wen. Query Understanding through Knowledge-Based Conceptualization. In IJCAI, July 2015.

• [Bergsma et al. 2007] Shane Bergsma and Qin Iris Wang. Learning Noun Phrase Query Segmentation. In EMNLP-CoNLL 2007, 819-826.

• [Tan et al. 2008] Bin Tan and Fuchun Peng. Unsupervised query segmentation using generative language models and Wikipedia. In WWW 2008, 347-356.

References

• [Li et al. 2011] Yanen Li, Bo-June Paul Hsu, ChengXiang Zhai, and Kuansan Wang. Unsupervised query segmentation using clickthrough for information retrieval. In SIGIR 2011, 285-294.

• [Guo et al. 2009] Jiafeng Guo, Gu Xu, Xueqi Cheng, and Hang Li. Named entity recognition in query. In SIGIR 2009, 267-274.

• [Pantel et al. 2012] Patrick Pantel, Thomas Lin, and Michael Gamon. Mining Entity Types from Query Logs via User Intent Modeling. In ACL 2012, 563-571.

• [Joshi et al. 2014] Mandar Joshi, Uma Sawant, and Soumen Chakrabarti. Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries. In EMNLP 2014, 1104-1114.

• [Sawant et al. 2013] Uma Sawant and Soumen Chakrabarti. Learning joint query interpretation and response ranking. In WWW 2013, 1099-1110.

• [Wang et al. 2014b] Zhongyuan Wang, Haixun Wang, and Zhirui Hu. Head, Modifier, and Constraint Detection in Short Texts. In International Conference on Data Engineering (ICDE), 2014.

• [Sun et al. 2016] Xiangyan Sun, Haixun Wang, Yanghua Xiao, and Zhongyuan Wang. Syntactic Parsing of Web Queries. In EMNLP 2016.

References

• [Kenter and Rijke 2015] Tom Kenter and Maarten de Rijke. Short text similarity with word embeddings. In CIKM 2015.

• [Wang et al. 2015a] Zhongyuan Wang, Haixun Wang, Ji-Rong Wen, and Yanghua Xiao. An Inference Approach to Basic Level of Categorization. In CIKM, October 2015.

• [Hao et al. 2016] Zehui Hao, Zhongyuan Wang, Xiaofeng Meng, and Jun Yan. Combining Language Model with Conceptualization for Definition Ranking. MSR Technical Report, 2016.

• [Wang et al. 2014a] Fang Wang, Zhongyuan Wang, Zhoujun Li, and Ji-Rong Wen. Concept-based Short Text Classification and Ranking. In CIKM 2014.

• [Wang et al. 2012a] Jingjing Wang, Haixun Wang, Zhongyuan Wang, and Kenny Zhu. Understanding Tables on the Web. In International Conference on Conceptual Modeling, October 2012.

• [Wang et al. 2012b] Yue Wang, Hongsong Li, Haixun Wang, and Kenny Zhu. Toward Topic Search on the Web. In International Conference on Conceptual Modeling, October 2012.



bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny ZhuToward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 71: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Signal from Click

bull Joint Model for Prediction

[Figure: plate diagram of the joint prediction model. For each query Q, an intent is picked from the intent distribution, a type from the distribution over types, an entity from the entity distribution, context words from the word distribution, and a click from the host distribution; latent variables t, τ, i, c with parameters θ, φ, ω]

Telegraphic Query interpretation [Sawant et al 2013 Joshi et al 2014]

bull Entity-seeking Telegraphic Queries

bull Interpretation = Segmentation + Annotation

bull Combine a knowledge base (for accuracy) with a large corpus (for recall)

bull Example: query "Germany capital" → result entity "Berlin"

Joint Interpretation and Ranking [Sawant et al 2013, Joshi et al 2014]

bull Overview

[Figure: a telegraphic query and an annotated corpus feed two models for interpretation and ranking (a generative model and a discriminative model), whose output is a ranked list of entities e1, e2, e3]

Joint Interpretation and Ranking [Sawant et al 2013]

bull Generative Model: based on probabilistic language models

[Figure: for the query q = "losing team baseball world series 1998", a switch variable Z decides whether each query word is generated by the type model or the context model; the type hint "baseball team" matches the answer type (E = San Diego Padres, T = major league baseball team), and context matchers such as "lost 1998 world series" match corpus snippets like "Padres have been to two World Series, losing in 1984 and 1998". Borrowed from U. Sawant (2013)]

Joint Interpretation and Ranking [Sawant et al 2013]

bull Discriminative Model: based on max-margin discriminative learning

[Figure: for the query "losing team baseball world series 1998", the correct entity San_Diego_Padres (with t = baseball team) should be ranked above an incorrect entity such as 1998_World_Series (with t = series)]

Telegraphic Query Interpretation [Joshi et al 2014]

bull Queries seek answer entities (e2)

bull Contain (query) entities (e1), target types (t2), relations (r), and selectors (s)

query: "dave navarro first band"
  e1 = dave navarro, r = band, t2 = band, s = first
  e1 = dave navarro, r = -, t2 = band, s = first

query: "spider automobile company"
  e1 = spider, r = automobile company, t2 = automobile company, s = -
  e1 = -, r = automobile company, t2 = company, s = spider

Borrowed from M. Joshi (2014)

Improved Generative Model

bull The generative model of [Sawant et al 2013] is extended in [Joshi et al 2014] to also consider e1 (in q) and r

Improved Discriminative Model

bull The discriminative model of [Sawant et al 2013] is extended in [Joshi et al 2014] to also consider e1 (in q) and r

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

bull Input: a short text

bull Output: semantic interpretation

bull Three steps in understanding a short text, e.g. "wanna watch eagles band":

Step 1: Text Segmentation – divide the text into a sequence of terms in the vocabulary: "wanna watch eagles band" → watch | eagles | band

Step 2: Type Detection – determine the best type of each term: watch[verb] eagles[entity] band[concept]

Step 3: Concept Labeling – infer the best concept of each entity within context: watch[verb] eagles[entity](band) band[concept]

Text Segmentation

bull Observations:
bull Mutual Exclusion – terms containing the same word mutually exclude each other
bull Mutual Reinforcement – related terms mutually reinforce each other

bull Build a Candidate Term Graph (CTG)

[Figure: candidate term graphs for "vacation april in paris" (candidate terms vacation, april in paris, april, paris) and "watch harry potter" (candidate terms watch, harry potter), with weighted edges between related terms, e.g. 0.029, 0.005, 0.047, 0.041 and 0.014, 0.092, 0.053, 0.018]

Find Best Segmentation

bull Best segmentation = the sub-graph in the CTG which:
bull Is a complete graph (clique), i.e. no mutual exclusion
bull Has 100% word coverage (except for stopwords)
bull Has the largest average edge weight (a small sketch of this search follows)

[Figures: in the candidate term graphs for "vacation april in paris" and "watch harry potter", sub-graphs are highlighted as "is a segmentation", "maximal clique", and "best segmentation"]
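Below is a minimal sketch of this clique-based search, assuming the candidate terms and their pairwise relatedness weights are already given; the weights are made up, and mutual exclusion is encoded simply by leaving conflicting terms unconnected.

```python
from itertools import combinations
import networkx as nx

def best_segmentation(terms, weights, words, stopwords=()):
    g = nx.Graph()
    g.add_nodes_from(terms)
    g.add_weighted_edges_from(weights)            # (term1, term2, relatedness); mutually
                                                  # exclusive terms simply share no edge
    best, best_score = None, -1.0
    for clique in nx.find_cliques(g):             # maximal cliques of the CTG
        covered = {w for t in clique for w in t.split()}
        if set(words) - covered - set(stopwords): # require 100% word coverage
            continue
        edges = list(combinations(clique, 2))
        score = sum(g[u][v]["weight"] for u, v in edges) / len(edges) if edges else 0.0
        if score > best_score:                    # keep the largest average edge weight
            best, best_score = clique, score
    return best

# Toy run for "vacation april in paris" (weights are made up)
terms = ["vacation", "april in paris", "april", "paris"]
weights = [("vacation", "april in paris", 0.005), ("vacation", "april", 0.029),
           ("vacation", "paris", 0.047), ("april", "paris", 0.041)]
print(best_segmentation(terms, weights,
                        words=["vacation", "april", "in", "paris"], stopwords=("in",)))
```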

Type Detection

bull Pairwise Model (sketched below):
bull Find the best typed-term for each term, so that the Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight

[Figure: for "watch free movie", the candidate typed-terms are watch[v], watch[e], watch[c]; free[adj], free[v]; and movie[c], movie[e]]
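A brute-force sketch of the pairwise model, assuming the pairwise affinity scores between typed-terms are given; the candidate typed-terms and scores below are illustrative only.

```python
from itertools import product
import networkx as nx

def detect_types(candidates, affinity):
    """candidates: one list of typed-term candidates per term;
       affinity: frozenset({typed1, typed2}) -> pairwise score."""
    best, best_score = None, float("-inf")
    for choice in product(*candidates):                  # one typed-term per term
        g = nx.Graph()
        g.add_nodes_from(choice)
        for i, u in enumerate(choice):
            for v in choice[i + 1:]:
                g.add_edge(u, v, weight=affinity.get(frozenset((u, v)), 0.0))
        score = nx.maximum_spanning_tree(g).size(weight="weight")
        if score > best_score:
            best, best_score = choice, score
    return best

candidates = [["watch[v]", "watch[e]", "watch[c]"],
              ["free[adj]", "free[v]"],
              ["movie[c]", "movie[e]"]]
affinity = {frozenset(("watch[v]", "movie[c]")): 0.9,
            frozenset(("free[adj]", "movie[c]")): 0.7,
            frozenset(("watch[v]", "free[adj]")): 0.3}
print(detect_types(candidates, affinity))    # ('watch[v]', 'free[adj]', 'movie[c]')
```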

Concept Labeling

bull Entity disambiguation is the most important task of concept labeling
bull Filter/re-rank the original concept cluster vector

bull Weighted-Vote (a small example follows)
bull The final score of each concept cluster is a combination of its original score and the support from context, using concept co-occurrence

bull Example: "watch harry potter" → movie; "read harry potter" → book
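A minimal sketch of the weighted-vote idea, with made-up numbers standing in for the conceptualization scores and the co-occurrence support.

```python
def weighted_vote(original, context_support, alpha=0.5):
    """Combine each concept cluster's original score with its support from context."""
    return {c: alpha * s + (1 - alpha) * context_support.get(c, 0.0)
            for c, s in original.items()}

# Concept clusters for "harry potter" on its own (scores are made up)
original = {"movie": 0.45, "book": 0.40, "character": 0.15}
# Co-occurrence support coming from the context term
support_watch = {"movie": 0.9, "book": 0.1, "character": 0.2}   # context: "watch"
support_read  = {"movie": 0.1, "book": 0.9, "character": 0.2}   # context: "read"

scores_watch = weighted_vote(original, support_watch)
scores_read = weighted_vote(original, support_read)
print(max(scores_watch, key=scores_watch.get))   # -> movie ("watch harry potter")
print(max(scores_read, key=scores_read.get))     # -> book  ("read harry potter")
```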

Example of Entity Disambiguation [Hua et al 2015 (ICDE Best Paper), Hua et al 2016]

[Figure: the short text "ipad apple" is parsed and conceptualized using the semantic (isA) network and the co-occurrence network, via term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, and concept orthogonalization, yielding a concept vector (c1, p1), (c2, p2), (c3, p3), ...; isA gives "apple" the concepts fruit..., company..., food..., product... and "ipad" the concepts product..., device..., and co-occurrence-based filtering keeps product..., brand..., company..., device...]

Mining Lexical Relationships [Wang et al 2015b]

bull Lexical knowledge represented by probabilities (e: instance, t: term, c: concept, z: role):
bull p(z | t), e.g. p(verb | watch), p(instance | watch)
bull p(c | t, z), e.g. p(movie | watch, verb)
bull p(c | e) = p(c | t, z = instance), e.g. p(movie | harry potter), p(book | harry potter)

[Figure: for "watch harry potter", "watch" links to the role verb, and "harry potter" links to the concepts movie, book, product]

Understanding Queries [Wang et al 2015b]

bull Goal: to rank the concepts and find $\arg\max_c p(c \mid t, q)$

bull Build an online subgraph from the offline semantic network, covering the query and all possible segmentations

bull Random walk with restart [Sun et al 2005] on the online subgraph (illustrated below)
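A numerical sketch of random walk with restart on a tiny made-up subgraph; the real system runs it over the typed-terms and concepts of the online subgraph built for the query.

```python
import numpy as np

def random_walk_with_restart(adj, restart_nodes, alpha=0.15, tol=1e-8, max_iter=1000):
    """adj: (n, n) nonnegative adjacency matrix of the online subgraph;
       restart_nodes: indices of the query's terms."""
    n = adj.shape[0]
    col_sums = adj.sum(axis=0)
    P = np.divide(adj, col_sums, out=np.zeros_like(adj, dtype=float), where=col_sums > 0)
    e = np.zeros(n)
    e[restart_nodes] = 1.0 / len(restart_nodes)   # restart distribution
    r = e.copy()
    for _ in range(max_iter):
        r_new = (1 - alpha) * P @ r + alpha * e
        if np.abs(r_new - r).sum() < tol:
            break
        r = r_new
    return r

# Toy subgraph: node 0 = "watch" (term), 1 = movie, 2 = book, 3 = timepiece
adj = np.array([[0.0, 1.0, 0.2, 0.6],
                [1.0, 0.0, 0.1, 0.0],
                [0.2, 0.1, 0.0, 0.0],
                [0.6, 0.0, 0.0, 0.0]])
print(random_walk_with_restart(adj, restart_nodes=[0]))   # relevance of each node
```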

Short Text Understanding

bull How to segment this short text?

bull What does this short text mean (its intent, senses, or concepts)?

bull What are the relations among terms in the short text?

bull How to calculate the similarity between short texts?

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

bull Example: "popular smart cover iphone 5s"

bull Definitions:
bull Head: acts to name the general (semantic) category to which the whole short text belongs; usually the head is the intent of the short text
bull "smart cover": the intent of the query
bull Constraints: distinguish this member from other members of the same category
bull "iphone 5s": limits the type of the head
bull Non-Constraint Modifiers (aka Pure Modifiers): subjective modifiers which can be dropped without changing the intent
bull "popular": subjective, can be neglected

Non-Constraint Modifiers Mining: Construct Modifier Networks

bull Edges form a Modifier Network

[Figure: concept hierarchy tree in the "Country" domain (Country, with sub-concepts such as Asian country, Developed country, Western country, Large Asian country, Western developed country, Top western country, Large developed country, Top developed country) and the corresponding modifier network over the modifiers Asian, Developed, Western, Large, Top; in this case "Large" and "Top" are pure modifiers]

Non-Constraint Modifiers Mining: Betweenness Centrality

bull Betweenness centrality is a measure of a node's centrality in a network

bull Betweenness of node v is defined as $g(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$

bull where $\sigma_{st}$ is the total number of shortest paths from node s to node t, and $\sigma_{st}(v)$ is the number of those paths that pass through v

bull Normalization & Aggregation

bull A pure modifier should have a low betweenness-centrality aggregation score PMS(t) (see the example below)
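Betweenness scores can be computed directly with networkx; here is a toy example on a small, purely illustrative modifier network (the normalization and PMS aggregation over a term's occurrences are omitted).

```python
import networkx as nx

# Toy modifier network loosely following the "Country" example above
# (the edge structure here is illustrative only).
g = nx.Graph()
g.add_edges_from([
    ("Country", "Asian"), ("Country", "Developed"), ("Country", "Western"),
    ("Developed", "Western"), ("Developed", "Large"), ("Western", "Large"),
    ("Developed", "Top"), ("Western", "Top"),
])
centrality = nx.betweenness_centrality(g, normalized=True)
for node, score in sorted(centrality.items(), key=lambda kv: kv[1]):
    print(f"{node:10s} {score:.3f}")
# "Large" and "Top" lie on no shortest paths between other nodes (score 0.0),
# which is the low-betweenness signature expected of pure modifiers.
```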

Head-Constraints Mining [Wang et al 2014b]

bull A term can be a head sometimes and be a constraint in some other cases

bull E.g. "Seattle hotel": Seattle = constraint, hotel = head; "Seattle hotel job": Seattle = constraint, hotel = constraint, job = head

Head-Constraints Mining: Acquiring Concept Patterns

bull Get entity pairs from the query log by extracting prepositional patterns: "A for B", "A of B", "A with B", "A in B", "A on B", "A at B", ... (e.g. "cover for iphone 6s", "battery for sony a7r", "wicked on broadway"), where entity 1 = head and entity 2 = constraint

bull Conceptualization: map entity 1 to concepts (concept11, concept12, concept13, concept14, ...) and entity 2 to concepts (concept21, concept22, concept23, ...)

bull Generate concept patterns for each pair: (concept11, concept21), (concept11, concept22), (concept11, concept23), ...

bull Aggregate the patterns into a Concept Pattern Dictionary (a toy sketch of this pipeline follows)
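A toy sketch of this acquisition pipeline; the prepositional pattern, the query log, and the isA lookup are all small illustrative stand-ins for the real pattern set, search log, and taxonomy.

```python
import re
from collections import Counter
from itertools import product

PREP_PATTERN = re.compile(r"^(?P<head>.+)\s+(?:for|of|with|in|on|at)\s+(?P<constraint>.+)$")

# Hypothetical isA lookup standing in for a large taxonomy such as Probase
ISA = {
    "cover": ["accessory"], "iphone 6s": ["phone", "device"],
    "battery": ["accessory"], "sony a7r": ["camera", "device"],
    "wicked": ["musical", "show"], "broadway": ["theater district"],
}

def concept_patterns(query_log):
    counts = Counter()
    for query in query_log:
        m = PREP_PATTERN.match(query)
        if not m:
            continue
        head, constraint = m.group("head"), m.group("constraint")   # entity 1, entity 2
        for c1, c2 in product(ISA.get(head, []), ISA.get(constraint, [])):
            counts[(c1, c2)] += 1          # (head concept, constraint concept)
    return counts

log = ["cover for iphone 6s", "battery for sony a7r", "wicked on broadway"]
print(concept_patterns(log).most_common())
```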

Why Concepts Can't Be Too General

bull It may cause too many concept pattern conflicts: we can't distinguish head and modifier for general concept pairs

Derived concept pattern: device (Head) / company (Modifier)
Supporting entity pairs: (iphone 4, verizon), (modem, comcast), (wireless router, comcast), (iphone 4, tmobile)

Derived concept pattern: company (Head) / device (Modifier)
Supporting entity pairs: (amazon books, kindle), (netflix, touchpad), (skype, windows phone), (netflix, ps3)

→ Conflict

Why Concepts Can't Be Too Specific

bull It may generate concepts with little coverage
bull Concept regresses to entity
bull Large storage space: up to (million × million) patterns

bull E.g. over-specific patterns such as device / "largest desktop OS vendor", device / "largest software development company", device / "largest global corporation", device / "latest windows and office provider", ...

bull Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept Patterns (head / constraint, cluster size in parentheses):
breed / state (615), game / platform (296), accessory / vehicle (153), browser / platform (70), requirement / school (22), drug / disease (34), cosmetic / skin condition (42), job / city (16), accessory / phone (32), software / platform (18), test / disease (20), clothes / breed (27), penalty / crime (19), tax / state (25), sauce / meat (16), credit card / country (18), food / holiday (14), mod / game (11), garment / sport (29), career information / professional (23), song / instrument (15), bait / fish (18), study guide / book (22), plugins / browser (19), recipe / meat (14), currency / country (18), lens / camera (13), decoration / holiday (9), food / animal (16)

bull Concept patterns clustered with game / platform: game / device, video game / platform, game console / game pad, game / gaming platform

Example instances for the pattern Game (Head) / Platform (Modifier): angry birds / android, angry birds / ios, angry birds / windows 10, ...

Head Modifier Relationship Detection

bull Train a classifier on (head-embedding, modifier-embedding) (sketched below)

bull Training data:
bull Positive: (head, modifier)
bull Negative: (modifier, head)

bull Precision >= 0.9, Recall >= 0.9

bull Disadvantage: not interpretable
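A small sketch of such a classifier, with random vectors standing in for real word embeddings and a handful of pairs standing in for the mined training data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
dim = 50
vocab = ["game", "platform", "accessory", "vehicle", "drug", "disease", "recipe", "meat"]
emb = {w: rng.normal(size=dim) for w in vocab}      # stand-in word embeddings

head_modifier_pairs = [("game", "platform"), ("accessory", "vehicle"),
                       ("drug", "disease"), ("recipe", "meat")]

X, y = [], []
for head, modifier in head_modifier_pairs:
    X.append(np.concatenate([emb[head], emb[modifier]]))   # positive: (head, modifier)
    y.append(1)
    X.append(np.concatenate([emb[modifier], emb[head]]))   # negative: reversed order
    y.append(0)

clf = LogisticRegression(max_iter=1000).fit(np.array(X), y)
test = np.concatenate([emb["game"], emb["platform"]]).reshape(1, -1)
print(clf.predict(test))    # 1 means (game, platform) is predicted as (head, modifier)
```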

Syntactic Parsing based on HM

bull Head-modifier information alone is incomplete:
bull Prepositions and other function words
bull Relations within a noun compound: "el capitan macbook pro"

bull Why not train a parser for web queries?

Syntactic Parsing of Short Texts [Sun et al 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges: Short Texts Lack Grammatical Signals

bull Lack of function words and word order:
bull "toys queries" has ambiguous intent
bull "distance earth moon" has clear intent, but many equivalent forms: "earth moon distance", "earth distance moon", ...

Challenges: Syntactic Parsing of Queries

bull No standard

bull No ground-truth

Why is syntactic parsing of queries even a legitimate problem?

Derive Syntax from Semantics [Sun et al 2016]

bull Query: "thai food houston"

bull Clicked sentence

bull Project dependency to the query

A Treebank for Short Texts

bull Given query q

bull Given q's clicked sentence s

bull Parse each s

bull Project dependency from s to q

bull Aggregate dependencies

Algorithm of Projection
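A rough sketch of the projection step (not the paper's exact algorithm): match query tokens to sentence tokens and keep only the arcs whose head and dependent are both matched; the token matching and the collapsed parse below are simplified stand-ins.

```python
def project_dependencies(query_tokens, sentence_tokens, sentence_arcs):
    """sentence_arcs: (head_index, dependent_index, label) triples over sentence_tokens;
       keep only arcs whose head and dependent both match a query token."""
    matched = {i: tok for i, tok in enumerate(sentence_tokens) if tok in set(query_tokens)}
    return [(matched[h], matched[d], label)
            for h, d, label in sentence_arcs if h in matched and d in matched]

# Query "thai food houston" with a clicked sentence and a simplified, collapsed parse
query = ["thai", "food", "houston"]
sentence = ["find", "the", "best", "thai", "food", "in", "houston"]
arcs = [(0, 4, "dobj"), (4, 1, "det"), (4, 2, "amod"),
        (4, 3, "amod"), (4, 6, "nmod:in")]
print(project_dependencies(query, sentence, arcs))
# -> [('food', 'thai', 'amod'), ('food', 'houston', 'nmod:in')]
```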

Result Examples

Results

bull Random queries: QueryParser UAS 0.83, LAS 0.75; Stanford UAS 0.72, LAS 0.64

bull Queries with no function words: QueryParser UAS 0.82, LAS 0.73; Stanford UAS 0.70, LAS 0.61

bull Queries with function words: QueryParser UAS 0.90, LAS 0.85; Stanford UAS 0.86, LAS 0.80

Short Text Understanding

bull How to segment this short text?

bull What does this short text mean (its intent, senses, or concepts)?

bull What are the relations among terms in the short text?

bull How to calculate the similarity between short texts?

Short Text Similarity Using Word Embeddings [Kenter and Rijke 2015]

bull Measuring similarity between two short texts or sentences

bull Basic idea: word-by-word comparison using embedding vectors

bull Use a saliency-weighted semantic graph to compute similarity

Short Text Similarity Using Word Embeddings [Kenter and Rijke 2015]

bull Features acquired: bins of all edges, bins of max edges

bull Similarity measurement, inspired by BM25 ($s_l$, $s_s$: short texts; $w$: a term of $s_l$; $\mathrm{sem}$: semantic similarity):

$$f_{sts}(s_l, s_s) = \sum_{w \in s_l} \mathrm{IDF}(w) \cdot \frac{\mathrm{sem}(w, s_s) \cdot (k_1 + 1)}{\mathrm{sem}(w, s_s) + k_1 \cdot \left(1 - b + b \cdot \frac{|s_s|}{\mathrm{avgsl}}\right)}$$
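A sketch of this measure, assuming sem(w, s_s) is taken as the maximum cosine similarity between w and the words of s_s; the word vectors and IDF values are illustrative stand-ins.

```python
import numpy as np

def f_sts(s_l, s_s, emb, idf, k1=1.2, b=0.75, avgsl=5.0):
    """Saliency-weighted similarity of short texts s_l and s_s (lists of words)."""
    def cos(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
    total = 0.0
    for w in s_l:
        if w not in emb:
            continue
        sem = max(cos(emb[w], emb[w2]) for w2 in s_s if w2 in emb)
        total += idf.get(w, 1.0) * (sem * (k1 + 1)) / (
            sem + k1 * (1 - b + b * len(s_s) / avgsl))
    return total

rng = np.random.default_rng(1)
words = ["emphysema", "lung", "disease", "smoking", "is", "a"]
emb = {w: rng.normal(size=20) for w in words}     # stand-in word vectors
idf = {"emphysema": 3.0, "lung": 2.0, "disease": 1.5, "smoking": 2.0, "is": 0.1, "a": 0.1}
print(f_sts(["emphysema", "disease"], ["emphysema", "is", "a", "lung", "disease"], emb, idf))
```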

From the Concept View [Wang et al 2015a]

[Figure: Short Text 1 and Short Text 2 are each conceptualized (parsing, term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, concept orthogonalization) using the semantic network and the co-occurrence network, producing bags of concepts: Concept Vector 1 [(c1, score1), (c2, score2), ...] and Concept Vector 2 [(c1', score1'), (c2', score2'), ...]; similarity is then computed between the two concept vectors]
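One simple way to realize the final similarity step is cosine similarity between the two bags of concepts; a minimal sketch with made-up concept vectors:

```python
import math

def concept_cosine(v1, v2):
    """Cosine similarity between two sparse concept vectors {concept: score}."""
    dot = sum(s * v2.get(c, 0.0) for c, s in v1.items())
    n1 = math.sqrt(sum(s * s for s in v1.values()))
    n2 = math.sqrt(sum(s * s for s in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

# Illustrative concept vectors produced by conceptualization
v1 = {"movie": 0.70, "film character": 0.20, "book": 0.10}   # "watch harry potter"
v2 = {"book": 0.80, "novel": 0.15, "movie": 0.05}            # "read harry potter"
print(concept_cosine(v1, v2))
```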

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

bull Explicit short text understanding benefits a lot of application scenarios:
bull Ads/search semantic matching

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull ...

Ads Keyword Selection [Wang et al 2015a]

[Figures: two bar charts over Decile 4 through Decile 10, one for Mainline Ads (y-axis from 0.00 to 6.00) and one for Sidebar Ads (y-axis from 0.00 to 0.60)]

Definition Mining [Hao et al 2016]

bull Definition scenarios: search engines, QnA, etc.

bull Why Conceptualization is useful for definition mining
bull Example: "What is Emphysema?"

Answer 1: "Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year."

Answer 2: "Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing."

bull This sentence has the form of a definition
bull Embedding is helpful to some extent, but it also returns high similarity scores for both (emphysema, disease) and (emphysema, smoking)
bull Conceptualization can provide strong semantics
bull Contextual embedding can also provide semantic similarity beyond is-A

Definition Mining [Hao et al 2016]

Concept-based Short Text Classification and Ranking [Wang et al 2014a]

[Figure: framework. Offline: training data and the knowledge base feed concept weighting and model learning, producing a concept model for each class (Class 1, ..., Class i, ..., Class N). Online: an original short text such as "justin bieber graduates" goes through entity extraction, conceptualization into a concept vector, candidate generation, and classification & ranking, producing outputs such as <Music, Score>]

Concept-based Short Text Classification and Ranking [Wang et al 2014a]

[Figures: each category (e.g. TV, Music, Movie) is represented in the concept space by the article titles/tags in that category, with concept probabilities p_i, p_j and weights ω_i, ω_j; a query is mapped into the same concept space and compared against the category representations]
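A minimal sketch of the scoring step: each class keeps a weighted concept vector learned offline, and a short text is scored against every class through its own concept vector; all weights below are made up.

```python
def classify(text_concepts, class_concept_weights):
    """Score a conceptualized short text against each class's weighted concept vector."""
    scores = {cls: sum(score * weights.get(concept, 0.0)
                       for concept, score in text_concepts.items())
              for cls, weights in class_concept_weights.items()}
    return max(scores, key=scores.get), scores

# Offline: per-class concept weights (made up); Online: concept vector of the short text
class_concept_weights = {
    "Music": {"singer": 0.9, "song": 0.8, "celebrity": 0.3},
    "TV":    {"tv show": 0.9, "host": 0.6, "celebrity": 0.4},
}
text_concepts = {"singer": 0.7, "celebrity": 0.3}    # e.g. "justin bieber graduates"
print(classify(text_concepts, class_concept_weights))   # -> ('Music', {...})
```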

Precision performance on each category [Wang et al 2014a]

Category   BocSTC   LM_ch   SVM    VSM_cosine   LM_d   Entity_ESA
Movie      0.71     0.91    0.84   0.81         0.72   0.56
Money      0.97     0.95    0.54   0.57         0.52   0.74
Music      0.97     0.90    0.88   0.73         0.68   0.58
TV         0.96     0.46    0.92   0.56         0.51   0.55

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamie Taylor Freebase a collaboratively created graph database for structuring human knowledge in SIGMOD 2008

bull [ Auer et al 2007 ] Sören Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWC/ASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Ré Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Börschinger Cäcilia Zirn and Anas Elghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The People's Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou "Understand Short Texts by Harvesting and Analyzing Semantic Knowledge" IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and Xindong Wu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10): 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny Boyes-Braem Basic objects in natural categories Cognitive psychology 8(3):382–439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ] Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddings In CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny Zhu Toward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 72: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Telegraphic Query interpretation [Sawant et al 2013 Joshi et al 2014]

bull Entity-seeking Telegraphic Queries

bull Interpretation = Segmentation + Annotation

Knowledge base Large corpus

accuracy recall

Germany capital

Berlin

Query

Result Entity

bull Overview

Joint Interpretation and Ranking [Sawant et al 2013 Joshi et al 2014]

Annotated Corpus

Telegraphic Query

e1e2e3

Two Models for Interpretation and Ranking

Generative Model

Discriminative Model

Output

bull Generative Model

Joint Interpretation and Ranking [Sawant et al 2013]

San Diego Padres

Major league baseball team

type context

E

T Padres have been to two World

Series losing in 1984 and 1998

Type hint

baseball team

losing team baseball world series 1998

Z

Context matchers

lost 1998 world seriesswitch

model model

q losing team baseball world series 1998

Borrow from U Sawant (2013)

Based on Probabilistic Language Models

bull Discriminative Model

Joint Interpretation and Ranking [Sawant et al 2013]

San_Diego_Padres

losing team baseball world

series 1998

(baseball team)

losing team baseball world

series 1998

(baseball team)

losing team baseball world

series 1998

(t = baseball team)

1998_World_Series

losing team baseball world

series 1998

(series)

losing team baseball world

series 1998

(series)

losing team baseball world

series 1998

(t = series)

Correct entity Incorrect entity

Based on max-margin discriminative learning

bull Queries seek answer entities (e2)

bull Contain (query) entities (e1) target types (t2) relations (r) and selectors (s)

Telegraphic Query Interpretation [Joshi et al 2014]

query e1 r t2 s

dave navarro first band

dave navarro band band first

dave navarro - band first

spider automobile company

spider automobile company

automobile company

-

automobile company company spider

Borrow from M Joshi (2014)

Improved Generative Model

bull Generative Model[Sawant et al 2013]

[Joshi et al 2014]Consider e1

(in q) and r

Improved Discriminative Model

bull Discriminative Model[Sawant et al 2013]

[Joshi et al 2014]

Consider e1

(in q) and r

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

bull Input a short text

bull Output semantic interpretation

bull Three steps in understanding a short text

wanna watch eagles band

watch[verb] eagles[entity](band) band[concept]

wanna watch eagles band watch[verb] eagles[entity](band) band[concept]

watch eagles band watch[verb] eagles[entity] band[concept]

Step 1 Text Segmentation ndash divide into a sequence of terms in vocabulary

Step 2 Type detection ndash determine the best type of each term

Step 3 Concept Labeling ndash infer the best concept of each entity within context

Text segmentationbull Observations

bull Mutual Exclusion ndash terms containing the same word mutually exclude each other

bull Mutual Reinforcement ndash related terms mutually reinforce each other

bull Build a Candidate Term Graph (CTG)

ldquovacation april in parisrdquo ldquowatch harry potterrdquo

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Find best segmentation

bull Best segmentation= sub-graph in CTG whichbull Is a complete graph (clique)

bull No mutual exclusion

bull Has 100 word coveragebull Except for stopwords

bull Has the largest average edge weight

Is a segmentation

Best segmentation

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Find best segmentation

bull Best segmentation= sub-graph in CTG whichbull Is a complete graph (clique)

bull No mutual exclusion

bull Has 100 word coveragebull Except for stopwords

bull Has the largest average edge weight

Maximal Clique

Best segmentation

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Type Detection

bull Pairwise Modelbull Find the best typed-term for each term so that the

Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight

watch[v]

watch[e]

watch[c]

watch

free[adj]

free[v]

movie[c]

movie[e]

free

movie

Concept Labeling

bull Entity disambiguation is the most important task of concept labelingbull Filterre-rank of the original concept cluster vector

bull Weighted-Votebull The final score of each concept cluster is a combination

of its original score and the support from context using concept co-occurrence

watch harry potter read harry potter

movie book

Example of Entity Disambiguation[Hua et al 2015 (ICDE Best Paper) Hua et al 2016]

Co-occurrence network

Concept Vector

Semantic network

Short Text

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

c1 p1

c2 p2

c3 p3

hellip

ipad apple

fruithellip

companyhellip

foodhellip

producthellip

Is-A

filtering

ldquoipad applerdquo

producthellip

devicehellip

producthellip

brandhellip

companyhellip

devicehellip

co-occur

Is-A

Mining Lexical Relationships[Wang et al 2015b]

bull Lexical knowledge represented by the probabilities

verb

product book

movie

watch harry potter

119901 119907119890119903119887 119908119886119905119888ℎ

119901 119894119899119904119905119886119899119888119890 119908119886119905119888ℎ

119901 119898119900119907119894119890 ℎ119886119903119903119910 119901119900119905119905119890119903

119901 119898119900119907119894119890 119908119886119905119888ℎ 119907119890119903119887

119901 119887119900119900119896 ℎ119886119903119903119910 119901119900119905119905119890119903

119901 119888 119905 119911

119901 119888 119890 =119901 119888 119905 119911 = 119894119899119904119905119886119899119888119890

119901 119911 119905 ①②

③e instancet termc conceptz role

Understanding Queries [Wang et al 2015b]

bull Goal to rank the concepts and findarg max

119888119901(119888|119905 119902)

The offline semantic network

QueryAll possible

segmentations

Random walk with restart [Sun et al 2005]on the online subgraph

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

bull Example ldquopopular smart cover iphone 5srdquo

bull Definition bull Head acts to name the general (semantic) category to which the

whole short text belongs Usually the head is the intent of the short text

bull ldquosmart coverrdquo intent of the query

bull Constraints distinguish this member from other members of the same category

bull ldquoiphone 5srdquo limit the type of the head

bull Non-Constraint Modifiers (aka Pure Modifiers) are subjectivemodifiers which can be dropped without changing intent

bull ldquopopularrdquo subjective can be neglected

Non-Constraint Modifiers Mining Construct Modifier Networks

Edges form a Modifier Network

Concept Hierarchy Tree in ldquoCountryrdquo domain

Modifier Network in ldquoCountryrdquo domainIn this case ldquoLargerdquo and ldquoToprdquo are pure modifiers

Country

Asian country

Developed country

Western country

Asian Developed Western

Western developed

country

Top western country

Large

Large Top

Top

WesternLarge Asian

country

Large developed

country

Top developed

country

Country

Asian Western

Developed

Large Top

bull Betweenness centrality is a measure of a nodes centrality in a network

bull Betweennes of node v is defined as

bull where 120590119904119905 is the total number of shortest paths from node s to node t and 120590119904119905(119907) is the number of those paths that pass through v

bull Normalization amp Aggregation

bull For a pure modifier it should have low betweenness centrality aggregation score PMS(t)

Non-Constraint Modifiers Mining Betweenness centrality

Head-Constraints Mining [Wang et al 2014b]

bull A term can be a head sometimes and be a constraint in some other cases

bull Eg Seattle hotel Seattle hotel job

head headconstraintconstraintconstraint

Head-Constraints Mining Acquiring Concept Patterns

Get entity pairs from query log

Conceptualization

Concept Patterns for each

prepositionsentity1 entity2

Extract Patterns

A for B A of BA with B A in BA on B A at B hellip

entity 1head entity 2constraint

concept11

concept12

concept13

concept14

concept21

concept22

concept23

(concept11 concept21) (concept11 concept22)(concept11 concept23)hellip

Concept Pattern Dictionary

Building concept pattern dictionary Query Logs

cover for iphone 6sbattery for sony a7rwicked on broadway

Why Concepts Canrsquot Be Too Generalbull It may cause too many concept pattern conflicts

canrsquot distinguish head and modifier for general concept pairs

Head Modifier

Derived Concept Pattern device company

Supporting Entity Pairs iphone 4 verizon

modem comcast

wireless router comcast

iphone 4 tmobile

Head Modifier

Derived Concept Pattern company device

Supporting Entity Pairs amazon books kindle

netflix touchpad

skype windows phone

netflix ps3

Conflict

Why Concepts Canrsquot Be Too Specificbull It may generate concepts with little coverage

bull Concept regresses to entitybull Large storage space up to (million million) patterns

hellip hellip

device largest desktop OS vendor

device largest software development company

device largest global corporation

device latest windows and office provider

hellip hellip

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept PatternsCluster size Sum of Cluster Score headconstraintscore

615 2114691 breedstate357298460224501

296 7752357 gameplatform627403476771856

153 3466804 accessoryvehicle53393705094809

70 118259 browserplatform132612807637391

22 1010993 requirementschool271407526294823

34 9489159 drugdisease154602405333541

42 8992995 cosmeticskin condition814659415003929

16 7421599 jobcity27903732555528

32 710403 accessoryphone246513830851194

18 6692376 softwareplatform210126322725878

20 6444603 testdisease239774028397537

27 5994205 clothesbreed98773996282851

19 5913545 penaltycrime200544192793488

25 5848804 taxstate240081818612579

16 5465424 saucemeat183592863621553

18 4809389 credit cardcountry142919087972152

14 4730792 foodholiday14554140330924

11 4536199 modgame257163856882439

29 4350954 garmentsport471533326845442

23 3994886 career informationprofessional732726483731257

15 386065 songinstrument128189481818135

18 378213 baitfish780426514113169

22 3722948 study guidebook508339765053921

19 3408953 pluginsbrowser550326072627126

14 3305753 recipemeat882779863422951

18 3214226 currencycountry110825444188352

13 3180272 lenscamera186081673263957

9 316973 decorationholiday130055844126533

16 314875 foodanimal7338544366514

game platform

game device

video game platform

game console game pad

game gaming platform

Game (Head) Platform (Modifier)

angry birds android

angry birds ios

angry birds windows 10

hellip hellip

Detection

Head Modifier Relationship

bull Train a classifier on

(head-embedding modifier-embedding)

bull Training data bull Positive (head modifier)bull Negative (modifier head)

bull Precision gt= 09 Recall gt= 09

bull Disadvantage not interpretable

Syntactic Parsing based on HM

bull Information is incompletebull Preposition and other function words

bull Within a noun compound el capitan macbook pro

bull Why not train a parser for web queries

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges Short Texts Lack Grammatical Signalsbull Lack function words word order

bull ldquotoys queriesrdquo has ambiguous intent

bull ldquodistance earth moonrdquo has clear intentbull many equivalent forms ldquoearth moon distancerdquo ldquoearth

distance moonrdquo hellip

Challenges Syntactic Parsing of Queries

bullNo standard

bullNo ground-truth

Why is syntactic parsing of queries even a legitimate problem

Derive Syntax from Semantics[Sun et al 2016]

bull Query ldquothai food houstonrdquo

bull Clicked sentence

bull Project dependency to the query

A Treebank for Short Texts

bull Given query 119902

bull Given 119902rsquos clicked sentence 119904

bull Parse each 119904

bull Project dependency from 119904 to 119902

bull Aggregate dependencies

Algorithm of Projection

Result Examples

Results

bull Random queries

QueryParser UAS 083 LAS 075Stanford UAS 072 LAS 064

bull Queries with no function words

QueryParser UAS 082 LAS 073Stanford UAS 070 LAS 061

bull Queries with function words

QueryParser UAS 090 LAS 085Stanford UAS 086 LAS 080

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

bull Measuring similarity between two short texts and sentences

bull Basic idea word-by-word comparison using embedding vector

bull Use saliency-weighted semantic graph to computer similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired

Bins of all edges Bins of max edges

119908isin119904119868

119868119863119865(119908) sdot)119904119890119898(119908 119904119904) sdot (1198961 + 1

൰119904119890119898(119908 119904119904) + 1198961 sdot (1 minus 119887 + 119887 sdot|119904119904|119886119907119892119897

Similarity measurement

termShort texts

Inspired by BM25

Semantic similarity

119891119904119905119904(119904119897 119904119904) =

From the Concept View

From the Concept View [Wang et al 2015a]

Co-occurrence Network

Bags of Concepts

Semantic Network

Short Text 1

Short Text 2

Concept Vector 1[(c1 score1) (c2 score2)hellip]

Concept Vector 2[(c1rsquo score1rsquo) (c2rsquo score2rsquo)hellip]

Similarity

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

bull Explicit short text understanding benefit lot of application scenariosbull Adssearch semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

000

100

200

300

400

500

600

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

000

010

020

030

040

050

060

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

Mainline Ads Sidebar Ads

Definition Mining [Hao et al 2016]

bull Definition scenarios search engines QnA etc

bull Why Conceptualization is useful for definition miningbull Examples ldquoWhat is Emphysemardquo

Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

bull This sentence has the form of definitionbull Embedding is helpful to some extent but it also return high similarity

score for (emphysema disease) and (emphysema smoking)

bull Conceptualization can provide strong semanticsbull Contextual embedding can also provide semantic similarity beyond Is-A

Answer 1

Answer 2

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

OfflineOffline OnlineOnline

Original Short textjustin bieber graduates

hellip

Knowledge base

Conceptualiztion

Concept Vector

Entity Extraction

Candidates Generation

Classification amp Ranking

Model LearningModel Learning

Concept Weighting

Model Model NModel i

Concept Model Concept Model

Class 1 Class NClass i

TrainingData

ltMusic Scoregt

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept SpaceArticle titlestagsin this category

119901119894

119901119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

hellip

hellip

120596119894

120596119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

120596119894

120596119895119901119894

119901119895

Precision performance on each category [Wang et al 2014a]

BocSTC LM_ch SVMVSM_cosi

neLM_d Entity_ESA

Movie 071 091 084 081 072 056

Money 097 095 054 057 052 074

Music 097 090 088 073 068 058

TV 096 046 092 056 051 055

0304050607080910

Pre

cisi

on

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledgeltigt in SIGMOD 2008

bull [ Auer et al 2007 ] Soumlren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Boumlrschinger Caumlcilia Zirn and AnasElghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and XindongWu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny BoyesBraem Basic objects in natural categories Cognitive psychology 8(3)382ndash439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny ZhuToward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 73: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

bull Overview

Joint Interpretation and Ranking [Sawant et al 2013 Joshi et al 2014]

Annotated Corpus

Telegraphic Query

e1e2e3

Two Models for Interpretation and Ranking

Generative Model

Discriminative Model

Output

bull Generative Model

Joint Interpretation and Ranking [Sawant et al 2013]

San Diego Padres

Major league baseball team

type context

E

T Padres have been to two World

Series losing in 1984 and 1998

Type hint

baseball team

losing team baseball world series 1998

Z

Context matchers

lost 1998 world seriesswitch

model model

q losing team baseball world series 1998

Borrow from U Sawant (2013)

Based on Probabilistic Language Models

bull Discriminative Model

Joint Interpretation and Ranking [Sawant et al 2013]

San_Diego_Padres

losing team baseball world

series 1998

(baseball team)

losing team baseball world

series 1998

(baseball team)

losing team baseball world

series 1998

(t = baseball team)

1998_World_Series

losing team baseball world

series 1998

(series)

losing team baseball world

series 1998

(series)

losing team baseball world

series 1998

(t = series)

Correct entity Incorrect entity

Based on max-margin discriminative learning

bull Queries seek answer entities (e2)

bull Contain (query) entities (e1) target types (t2) relations (r) and selectors (s)

Telegraphic Query Interpretation [Joshi et al 2014]

query e1 r t2 s

dave navarro first band

dave navarro band band first

dave navarro - band first

spider automobile company

spider automobile company

automobile company

-

automobile company company spider

Borrow from M Joshi (2014)

Improved Generative Model

bull Generative Model[Sawant et al 2013]

[Joshi et al 2014]Consider e1

(in q) and r

Improved Discriminative Model

bull Discriminative Model[Sawant et al 2013]

[Joshi et al 2014]

Consider e1

(in q) and r

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

bull Input a short text

bull Output semantic interpretation

bull Three steps in understanding a short text

wanna watch eagles band

watch[verb] eagles[entity](band) band[concept]

wanna watch eagles band watch[verb] eagles[entity](band) band[concept]

watch eagles band watch[verb] eagles[entity] band[concept]

Step 1 Text Segmentation ndash divide into a sequence of terms in vocabulary

Step 2 Type detection ndash determine the best type of each term

Step 3 Concept Labeling ndash infer the best concept of each entity within context

Text segmentationbull Observations

bull Mutual Exclusion ndash terms containing the same word mutually exclude each other

bull Mutual Reinforcement ndash related terms mutually reinforce each other

bull Build a Candidate Term Graph (CTG)

ldquovacation april in parisrdquo ldquowatch harry potterrdquo

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Find best segmentation

bull Best segmentation= sub-graph in CTG whichbull Is a complete graph (clique)

bull No mutual exclusion

bull Has 100 word coveragebull Except for stopwords

bull Has the largest average edge weight

Is a segmentation

Best segmentation

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Find best segmentation

bull Best segmentation= sub-graph in CTG whichbull Is a complete graph (clique)

bull No mutual exclusion

bull Has 100 word coveragebull Except for stopwords

bull Has the largest average edge weight

Maximal Clique

Best segmentation

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Type Detection

bull Pairwise Modelbull Find the best typed-term for each term so that the

Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight

watch[v]

watch[e]

watch[c]

watch

free[adj]

free[v]

movie[c]

movie[e]

free

movie

Concept Labeling

bull Entity disambiguation is the most important task of concept labelingbull Filterre-rank of the original concept cluster vector

bull Weighted-Votebull The final score of each concept cluster is a combination

of its original score and the support from context using concept co-occurrence

watch harry potter read harry potter

movie book

Example of Entity Disambiguation[Hua et al 2015 (ICDE Best Paper) Hua et al 2016]

Co-occurrence network

Concept Vector

Semantic network

Short Text

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

c1 p1

c2 p2

c3 p3

hellip

ipad apple

fruithellip

companyhellip

foodhellip

producthellip

Is-A

filtering

ldquoipad applerdquo

producthellip

devicehellip

producthellip

brandhellip

companyhellip

devicehellip

co-occur

Is-A

Mining Lexical Relationships [Wang et al 2015b]

• Lexical knowledge is represented by probabilities, e.g. for "watch harry potter":
  • p(verb | watch), p(instance | watch) – the probability that a term plays a given role
  • p(movie | harry potter), p(book | harry potter) – the probability of a concept given a term
  • p(movie | watch, verb) – the probability of a concept given a context term and its role
• Three quantities are used: p(z | t), p(c | t, z), and p(c | e) = p(c | t, z = instance)
  (e: instance, t: term, c: concept, z: role)
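A small sketch of how these probabilities act as lookup tables; all numbers are invented stand-ins for the web-scale statistics mined in [Wang et al 2015b].

```python
# Sketch: lexical probabilities as lookup tables (illustrative values).
p_z_given_t = {                       # p(z | t): role given term
    "watch": {"verb": 0.7, "instance": 0.3},
    "harry potter": {"instance": 0.99, "verb": 0.01},
}
p_c_given_t_z = {                     # p(c | t, z): concept given term and role
    ("harry potter", "instance"): {"movie": 0.6, "book": 0.4},
    ("watch", "instance"): {"product": 0.8, "gift": 0.2},
}

def p_c_given_e(term):
    """p(c | e) = p(c | t, z = instance)."""
    return p_c_given_t_z.get((term, "instance"), {})

print(max(p_z_given_t["watch"], key=p_z_given_t["watch"].get))  # 'verb'
print(p_c_given_e("harry potter"))    # {'movie': 0.6, 'book': 0.4}
```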

Understanding Queries [Wang et al 2015b]

• Goal: rank the concepts and find argmax_c p(c | t, q)
• Connect all possible segmentations of the query to the offline semantic network, forming an online subgraph
• Run random walk with restart [Sun et al 2005] on the online subgraph
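Random walk with restart can be sketched as a short power iteration; the toy subgraph, affinity values, and restart probability below are assumptions for illustration, with the restart vector placed on the query terms.

```python
# Sketch: random walk with restart on a small online subgraph.
import numpy as np

nodes = ["watch", "harry potter", "movie", "book", "product"]
A = np.array([                       # symmetric term/concept affinities (toy)
    [0.0, 0.5, 0.6, 0.1, 0.3],
    [0.5, 0.0, 0.7, 0.6, 0.0],
    [0.6, 0.7, 0.0, 0.0, 0.0],
    [0.1, 0.6, 0.0, 0.0, 0.0],
    [0.3, 0.0, 0.0, 0.0, 0.0],
])
P = A / A.sum(axis=0, keepdims=True)          # column-stochastic transitions
restart = np.array([0.5, 0.5, 0.0, 0.0, 0.0]) # restart on the query terms
c = 0.3                                       # restart probability

r = restart.copy()
for _ in range(100):                          # iterate to (near) convergence
    r = (1 - c) * P @ r + c * restart

concepts = {n: float(r[i]) for i, n in enumerate(nodes)
            if n in {"movie", "book", "product"}}
print(max(concepts, key=concepts.get))        # 'movie'
```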

Short Text Understanding

• How to segment this short text?
• What does this short text mean (its intent, senses, or concepts)?
• What are the relations among terms in the short text?
• How to calculate the similarity between short texts?

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

• Example: "popular smart cover iphone 5s"
• Definitions:
  • Head: names the general (semantic) category to which the whole short text belongs; usually the head is the intent of the short text
    • "smart cover" – the intent of the query
  • Constraints: distinguish this member from other members of the same category
    • "iphone 5s" – limits the type of the head
  • Non-Constraint Modifiers (aka Pure Modifiers): subjective modifiers which can be dropped without changing the intent
    • "popular" – subjective, can be neglected

Non-Constraint Modifiers Mining: Construct Modifier Networks

• Edges between modifiers form a Modifier Network

[Figure: the Concept Hierarchy Tree in the "Country" domain (country → Asian country, developed country, western country, western developed country, top western country, large Asian country, large developed country, top developed country, …) and the corresponding Modifier Network over the modifiers Asian, Developed, Western, Large, and Top. In this case "Large" and "Top" are pure modifiers]

Non-Constraint Modifiers Mining: Betweenness Centrality

• Betweenness centrality is a measure of a node's centrality in a network
• The betweenness of node v is defined as g(v) = Σ_{s≠v≠t} σ_st(v) / σ_st, where σ_st is the total number of shortest paths from node s to node t and σ_st(v) is the number of those paths that pass through v
• Normalization & Aggregation
  • A pure modifier should have a low aggregated betweenness centrality score PMS(t)
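A minimal sketch of this idea on an assumed toy modifier network in the "Country" domain: pure modifiers sit on the periphery and therefore get low betweenness. The graph edges are illustrative, and the printed score is only a stand-in for the aggregated PMS(t).

```python
# Sketch: low betweenness centrality as evidence for a pure modifier.
import networkx as nx

M = nx.Graph()
# constraint-like modifiers are interconnected ...
M.add_edges_from([("Asian", "Developed"), ("Asian", "Western"),
                  ("Developed", "Western")])
# ... while pure modifiers such as "Large" and "Top" hang off the periphery
M.add_edges_from([("Large", "Asian"), ("Large", "Developed"),
                  ("Top", "Western"), ("Top", "Developed")])

bc = nx.betweenness_centrality(M, normalized=True)
for term, score in sorted(bc.items(), key=lambda kv: kv[1]):
    print(f"{term:10s} PMS-like score {score:.3f}")   # Large/Top come out lowest
```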

Head-Constraints Mining [Wang et al 2014b]

• A term can be a head in some cases and a constraint in others
• E.g. "Seattle hotel" (hotel = head, Seattle = constraint) vs. "Seattle hotel job" (job = head, Seattle and hotel = constraints)

Head-Constraints Mining: Acquiring Concept Patterns

Building the concept pattern dictionary from query logs:
1. Extract preposition patterns from the query log: "A for B", "A of B", "A with B", "A in B", "A on B", "A at B", … (e.g. "cover for iphone 6s", "battery for sony a7r", "wicked on broadway")
2. For each preposition, get entity pairs (entity 1 = head, entity 2 = constraint)
3. Conceptualize both entities (concept11, concept12, concept13, concept14 for entity 1; concept21, concept22, concept23 for entity 2)
4. Emit concept patterns (concept11, concept21), (concept11, concept22), (concept11, concept23), … into the Concept Pattern Dictionary
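The pipeline above can be sketched as follows; the tiny conceptualize() table stands in for a real isA knowledge base such as Probase, and the queries and concepts are illustrative assumptions.

```python
# Sketch: build a tiny concept pattern dictionary from preposition queries.
import re
from collections import Counter

def conceptualize(entity):
    kb = {"cover": ["accessory"], "iphone 6s": ["phone", "device"],
          "battery": ["accessory"], "sony a7r": ["camera", "device"],
          "wicked": ["show"], "broadway": ["theater district"]}
    return kb.get(entity, [])

queries = ["cover for iphone 6s", "battery for sony a7r", "wicked on broadway"]
pattern = re.compile(r"^(.+?) (for|of|with|in|on|at) (.+)$")

dictionary = Counter()        # (head concept, constraint concept) -> count
for q in queries:
    m = pattern.match(q)
    if not m:
        continue
    head_entity, _, constraint_entity = m.groups()
    for hc in conceptualize(head_entity):
        for cc in conceptualize(constraint_entity):
            dictionary[(hc, cc)] += 1

print(dictionary.most_common(3))
# e.g. [(('accessory', 'device'), 2), (('accessory', 'phone'), 1), ...]
```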

Why Concepts Can't Be Too General
• It may cause too many concept pattern conflicts: we can't distinguish head and modifier for general concept pairs

Derived Concept Pattern: device (Head) / company (Modifier)
Supporting Entity Pairs: (iphone 4, verizon), (modem, comcast), (wireless router, comcast), (iphone 4, tmobile)

Derived Concept Pattern: company (Head) / device (Modifier)
Supporting Entity Pairs: (amazon books, kindle), (netflix, touchpad), (skype, windows phone), (netflix, ps3)

→ Conflict between the two derived patterns

Why Concepts Can't Be Too Specific
• It may generate concepts with little coverage
  • The concept regresses to the entity
  • Large storage space: up to (million × million) patterns
  • E.g. device / largest desktop OS vendor, device / largest software development company, device / largest global corporation, device / latest windows and office provider, …

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept Patterns

Cluster size | Sum of cluster score | Head / Constraint | Score
615 | 2114691 | breed / state | 357298460224501
296 | 7752357 | game / platform | 627403476771856
153 | 3466804 | accessory / vehicle | 53393705094809
70 | 118259 | browser / platform | 132612807637391
22 | 1010993 | requirement / school | 271407526294823
34 | 9489159 | drug / disease | 154602405333541
42 | 8992995 | cosmetic / skin condition | 814659415003929
16 | 7421599 | job / city | 27903732555528
32 | 710403 | accessory / phone | 246513830851194
18 | 6692376 | software / platform | 210126322725878
20 | 6444603 | test / disease | 239774028397537
27 | 5994205 | clothes / breed | 98773996282851
19 | 5913545 | penalty / crime | 200544192793488
25 | 5848804 | tax / state | 240081818612579
16 | 5465424 | sauce / meat | 183592863621553
18 | 4809389 | credit card / country | 142919087972152
14 | 4730792 | food / holiday | 14554140330924
11 | 4536199 | mod / game | 257163856882439
29 | 4350954 | garment / sport | 471533326845442
23 | 3994886 | career information / professional | 732726483731257
15 | 386065 | song / instrument | 128189481818135
18 | 378213 | bait / fish | 780426514113169
22 | 3722948 | study guide / book | 508339765053921
19 | 3408953 | plugins / browser | 550326072627126
14 | 3305753 | recipe / meat | 882779863422951
18 | 3214226 | currency / country | 110825444188352
13 | 3180272 | lens / camera | 186081673263957
9 | 316973 | decoration / holiday | 130055844126533
16 | 314875 | food / animal | 7338544366514

Example: the game/platform pattern cluster includes the pairs game / platform, game / device, video game / platform, game console / game pad, game / gaming platform

Game (Head) | Platform (Modifier)
angry birds | android
angry birds | ios
angry birds | windows 10
… | …

Head Modifier Relationship Detection

• Train a classifier on (head-embedding, modifier-embedding)
• Training data:
  • Positive: (head, modifier)
  • Negative: (modifier, head)
• Precision >= 0.9, Recall >= 0.9
• Disadvantage: not interpretable
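A minimal sketch of such a classifier, assuming pre-trained word embeddings; here random vectors and three toy (head, modifier) pairs stand in for real data, so the reported precision/recall figures do not apply to this toy setup.

```python
# Sketch: head-modifier direction classifier on concatenated embeddings.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
vocab = ["game", "platform", "accessory", "vehicle", "recipe", "meat"]
emb = {w: rng.normal(size=50) for w in vocab}          # stand-in embeddings

head_modifier_pairs = [("game", "platform"), ("accessory", "vehicle"),
                       ("recipe", "meat")]

X, y = [], []
for head, mod in head_modifier_pairs:
    X.append(np.concatenate([emb[head], emb[mod]])); y.append(1)   # positive
    X.append(np.concatenate([emb[mod], emb[head]])); y.append(0)   # reversed

clf = LogisticRegression(max_iter=1000).fit(np.array(X), y)
test = np.concatenate([emb["game"], emb["platform"]]).reshape(1, -1)
print(clf.predict(test))    # 1 -> first argument predicted to be the head
```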

Syntactic Parsing based on HM

• Information is incomplete:
  • Prepositions and other function words
  • Within a noun compound: el capitan macbook pro
• Why not train a parser for web queries?

Syntactic Parsing of Short Texts [Sun et al EMNLP 2016]

• Syntactic structures are valuable for short text understanding
• Examples:

Challenges: Short Texts Lack Grammatical Signals
• Short texts lack function words and consistent word order
  • "toys queries" has ambiguous intent
  • "distance earth moon" has clear intent, but many equivalent forms: "earth moon distance", "earth distance moon", …

Challenges: Syntactic Parsing of Queries
• No standard
• No ground-truth
• Why is syntactic parsing of queries even a legitimate problem?

Derive Syntax from Semantics [Sun et al 2016]

• Query: "thai food houston"
• Take a sentence clicked for the query
• Project the sentence's dependencies onto the query

A Treebank for Short Texts

• Given a query q
• Given q's clicked sentences s
• Parse each s
• Project dependencies from s to q
• Aggregate the projected dependencies

Algorithm of Projection
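The projection step can be sketched as follows; this is a simplified stand-in for the actual algorithm of [Sun et al 2016], and the sentence, its dependency heads, and the query are illustrative assumptions.

```python
# Sketch: project dependencies from a clicked sentence onto a query.
sentence = ["the", "best", "thai", "food", "in", "houston"]
heads = [3, 3, 3, -1, 3, 4]        # head index per token (-1 = root), from a parser

query = ["thai", "food", "houston"]
pos_in_sentence = {w: i for i, w in enumerate(sentence)}

def project(query, sentence, heads):
    """Keep an arc head -> w for the query if w's (transitive) head maps into the query."""
    arcs = []
    for w in query:
        h = heads[pos_in_sentence[w]]
        while h != -1 and sentence[h] not in query:   # skip dropped words ("in")
            h = heads[h]
        arcs.append((sentence[h] if h != -1 else "ROOT", w))
    return arcs

print(project(query, sentence, heads))
# [('food', 'thai'), ('ROOT', 'food'), ('food', 'houston')]
```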

Result Examples

Results

• Random queries: QueryParser UAS 0.83, LAS 0.75; Stanford UAS 0.72, LAS 0.64
• Queries with no function words: QueryParser UAS 0.82, LAS 0.73; Stanford UAS 0.70, LAS 0.61
• Queries with function words: QueryParser UAS 0.90, LAS 0.85; Stanford UAS 0.86, LAS 0.80

Short Text Understanding

• How to segment this short text?
• What does this short text mean (its intent, senses, or concepts)?
• What are the relations among terms in the short text?
• How to calculate the similarity between short texts?

Short Text Similarity Using Word Embedding [Kenter and Rijke 2015]

• Measures similarity between two short texts or sentences
• Basic idea: word-by-word comparison using embedding vectors
• Uses a saliency-weighted semantic graph to compute similarity

Short Text Similarity Using Word Embedding [Kenter and Rijke 2015]

Features acquired: bins of all edges and bins of max edges of the saliency-weighted semantic graph.

Similarity measurement (inspired by BM25), where s_l and s_s are the two short texts and sem(w, s_s) is the semantic similarity between term w and short text s_s:

f_sts(s_l, s_s) = Σ_{w ∈ s_l} IDF(w) · sem(w, s_s) · (k_1 + 1) / ( sem(w, s_s) + k_1 · (1 − b + b · |s_s| / avgsl) )
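A short sketch of the formula above; the IDF values and the bag of random vectors standing in for word embeddings are assumptions, and sem() is taken here as the maximum cosine similarity between a term and the terms of the other text.

```python
# Sketch: BM25-inspired saliency-weighted short text similarity.
import numpy as np

idf = {"thai": 2.0, "food": 1.2, "houston": 1.8, "restaurant": 1.5, "tx": 1.7}
rng = np.random.default_rng(0)
vec = {w: rng.normal(size=20) for w in idf}    # stand-in word embeddings

def sem(w, short_text):
    """Semantic similarity of term w to a short text = max cosine to its terms."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(cos(vec[w], vec[u]) for u in short_text)

def f_sts(s_l, s_s, k1=1.2, b=0.75, avgsl=3.0):
    total = 0.0
    for w in s_l:
        s = sem(w, s_s)
        total += idf[w] * s * (k1 + 1) / (s + k1 * (1 - b + b * len(s_s) / avgsl))
    return total

print(f_sts(["thai", "food", "houston"], ["thai", "restaurant", "tx"]))
```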

From the Concept View

From the Concept View [Wang et al 2015a]

[Figure: bag-of-concepts similarity – Short Text 1 and Short Text 2 are each parsed and conceptualized (term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, concept orthogonalization) against the semantic network and the co-occurrence network, producing Concept Vector 1 [(c1, score1), (c2, score2), …] and Concept Vector 2 [(c1', score1'), (c2', score2'), …]; the similarity of the two short texts is computed between the two concept vectors]
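Once both texts are mapped to concept vectors, their similarity can be computed as a cosine over the sparse vectors; a minimal sketch with assumed concept vectors follows.

```python
# Sketch: bag-of-concepts similarity = cosine between sparse concept vectors.
import math

def concept_cosine(cv1, cv2):
    dot = sum(s * cv2.get(c, 0.0) for c, s in cv1.items())
    n1 = math.sqrt(sum(s * s for s in cv1.values()))
    n2 = math.sqrt(sum(s * s for s in cv2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

cv_a = {"movie": 0.7, "novel": 0.2, "character": 0.1}   # e.g. conceptualized text 1
cv_b = {"movie": 0.5, "tv series": 0.4, "actor": 0.1}   # e.g. conceptualized text 2
print(round(concept_cosine(cv_a, cv_b), 3))
```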

Outline

• Knowledge Bases
• Explicit Representation Models
• Applications

Applications

• Explicit short text understanding benefits many application scenarios:
  • Ads/search semantic match
  • Definition mining
  • Query recommendation
  • Web table understanding
  • Semantic search
  • …

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

[Charts: ads keyword selection results by decile (Decile 4 through Decile 10), shown separately for Mainline Ads and Sidebar Ads]

Definition Mining [Hao et al 2016]

• Definition scenarios: search engines, QnA, etc.
• Why conceptualization is useful for definition mining – example: "What is Emphysema?"
  • Answer 1: "Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year."
  • Answer 2: "Emphysema is an incurable, progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing."
• Each answer has the form of a definition; embedding is helpful to some extent, but it also returns high similarity scores for both (emphysema, disease) and (emphysema, smoking)
• Conceptualization can provide strong semantics; contextual embedding can also provide semantic similarity beyond Is-A

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

[Figure: framework – Offline: training data and the knowledge base feed concept weighting and model learning, producing one concept model per class (Model 1 … Model i … Model N). Online: an original short text such as "justin bieber graduates" goes through entity extraction, conceptualization into a concept vector, candidate generation, and classification & ranking, producing e.g. <Music, Score>]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

[Figures: each category (e.g. TV, Music, Movie) is represented in the concept space by the article titles/tags in that category (points p_i, p_j) and summarized by concept weights ω_i, ω_j; an incoming query is mapped into the same concept space and compared against the category representations]
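The scoring step can be sketched as a weighted overlap between the query's concept vector and each class's concept weights; the class weights and the query conceptualization below are illustrative assumptions, not values from the paper.

```python
# Sketch: concept-based short text classification via weighted concept overlap.
class_concept_weights = {
    "Music": {"singer": 0.6, "album": 0.3, "concert": 0.1},
    "Movie": {"film": 0.5, "actor": 0.3, "director": 0.2},
    "TV":    {"tv show": 0.6, "actor": 0.2, "channel": 0.2},
}

def classify(query_concepts):
    scores = {
        cls: sum(w * query_concepts.get(c, 0.0) for c, w in weights.items())
        for cls, weights in class_concept_weights.items()
    }
    return sorted(scores.items(), key=lambda kv: -kv[1])

# "justin bieber graduates" conceptualized mostly as singer/celebrity:
print(classify({"singer": 0.8, "celebrity": 0.2}))   # Music ranks first
```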

Precision performance on each category [Wang et al 2014a]

Precision | BocSTC | LM_ch | SVM | VSM_cosine | LM_d | Entity_ESA
Movie | 0.71 | 0.91 | 0.84 | 0.81 | 0.72 | 0.56
Money | 0.97 | 0.95 | 0.54 | 0.57 | 0.52 | 0.74
Music | 0.97 | 0.90 | 0.88 | 0.73 | 0.68 | 0.58
TV | 0.96 | 0.46 | 0.92 | 0.56 | 0.51 | 0.55

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

• [Stark et al 1998] Michael M. Stark and Richard F. Riesenfeld. WordNet: An Electronic Lexical Database. Proceedings of the 11th Eurographics Workshop on Rendering, 1998.

• [Banko et al 2007] Michele Banko, Michael J. Cafarella, Stephen Soderland, Matt Broadhead, and Oren Etzioni. Open Information Extraction from the Web. In IJCAI, 2007.

• [Etzioni et al 2011] Oren Etzioni, Anthony Fader, Janara Christensen, Stephen Soderland, and Mausam Mausam. Open Information Extraction: The Second Generation. In IJCAI, vol. 11, pp. 3-10, 2011.

• [Carlson et al 2010] A. Carlson, J. Betteridge, B. Kisiel, B. Settles, E. R. Hruschka Jr., and T. M. Mitchell. Toward an Architecture for Never-Ending Language Learning. In Proceedings of the Conference on Artificial Intelligence (AAAI), 2010.

• [Wu et al 2012] Wentao Wu, Hongsong Li, Haixun Wang, and Kenny Zhu. Probase: A Probabilistic Taxonomy for Text Understanding. In ACM International Conference on Management of Data (SIGMOD), May 2012.

• [Bollacker et al 2008] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamine Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In SIGMOD, 2008.

• [Auer et al 2007] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary G. Ives. DBpedia: A Nucleus for a Web of Open Data. In ISWC/ASWC, 2007.

References

• [Suchanek et al 2007] Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. Yago: a core of semantic knowledge. In WWW, 2007.

• [Wu et al 2015] Sen Wu, Ce Zhang, Christopher De Sa, Jaeho Shin, Feiran Wang, and C. Ré. Incremental Knowledge Base Construction Using DeepDive. In VLDB, 2015.

• [Navigli et al 2012] R. Navigli and S. Ponzetto. BabelNet: The Automatic Construction, Evaluation and Application of a Wide-Coverage Multilingual Semantic Network. In Artificial Intelligence, 2012.

• [Nastase et al 2010] Vivi Nastase, Michael Strube, Benjamin Börschinger, Cäcilia Zirn, and Anas Elghafari. WikiNet: A very large scale multi-lingual concept network. In LREC, 2010.

• [Speer et al 2013] Robert Speer and Catherine Havasi. ConceptNet 5: A large semantic network for relational knowledge. The People's Web Meets NLP, Springer Berlin Heidelberg, 2013.

• [Hua et al 2016] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng, and Xiaofang Zhou. "Understand Short Texts by Harvesting and Analyzing Semantic Knowledge". IEEE Transactions on Knowledge and Data Engineering (TKDE), 2016.

• [Hua et al 2015] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng, and Xiaofang Zhou. Short Text Understanding Through Lexical-Semantic Analysis. In International Conference on Data Engineering (ICDE), April 2015.

References

• [Li et al 2013] Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang, and Xindong Wu. Computing term similarity by large probabilistic isA knowledge. In ACM International Conference on Information and Knowledge Management (CIKM), 2013.

• [Li et al 2015] Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang, Xue-Gang Hu, and Xindong Wu. A Large Probabilistic Semantic Network based Approach to Compute Term Similarity. IEEE Transactions on Knowledge and Data Engineering (TKDE), 27(10): 2604-2617, 2015.

• [Rosch et al 1976] Eleanor Rosch, Carolyn B. Mervis, Wayne D. Gray, David M. Johnson, and Penny Boyes-Braem. Basic objects in natural categories. Cognitive Psychology, 8(3): 382–439, 1976.

• [Manning and Schutze 1999] Christopher D. Manning and Hinrich Schutze. Foundations of Statistical Natural Language Processing. Volume 999, MIT Press, 1999.

• [Wang et al 2015b] Zhongyuan Wang, Kejun Zhao, Haixun Wang, Xiaofeng Meng, and Ji-Rong Wen. Query Understanding through Knowledge-Based Conceptualization. In IJCAI, July 2015.

• [Bergsma et al 2007] Shane Bergsma and Qin Iris Wang. Learning Noun Phrase Query Segmentation. In EMNLP-CoNLL, 2007, 819-826.

• [Tan et al 2008] Bin Tan and Fuchun Peng. Unsupervised query segmentation using generative language models and Wikipedia. In WWW, 2008, 347-356.

References

• [Li et al 2011] Yanen Li, Bo-June Paul Hsu, ChengXiang Zhai, and Kuansan Wang. Unsupervised query segmentation using clickthrough for information retrieval. In SIGIR, 2011, 285-294.

• [Guo et al 2009] Jiafeng Guo, Gu Xu, Xueqi Cheng, and Hang Li. Named entity recognition in query. In SIGIR, 2009, 267-274.

• [Pantel et al 2012] Patrick Pantel, Thomas Lin, and Michael Gamon. Mining Entity Types from Query Logs via User Intent Modeling. In ACL, 2012, 563-571.

• [Joshi et al 2014] Mandar Joshi, Uma Sawant, and Soumen Chakrabarti. Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries. In EMNLP, 2014, 1104-1114.

• [Sawant et al 2013] Uma Sawant and Soumen Chakrabarti. Learning joint query interpretation and response ranking. In WWW, 2013, 1099-1110.

• [Wang et al 2014b] Zhongyuan Wang, Haixun Wang, and Zhirui Hu. Head, Modifier, and Constraint Detection in Short Texts. In International Conference on Data Engineering (ICDE), 2014.

• [Sun et al 2016] Xiangyan Sun, Haixun Wang, Yanghua Xiao, and Zhongyuan Wang. Syntactic Parsing of Web Queries. In EMNLP, 2016.

References

• [Kenter and Rijke 2015] Tom Kenter and Maarten de Rijke. Short text similarity with word embeddings. In CIKM, 2015.

• [Wang et al 2015a] Zhongyuan Wang, Haixun Wang, Ji-Rong Wen, and Yanghua Xiao. An Inference Approach to Basic Level of Categorization. In CIKM, October 2015.

• [Hao et al 2016] Zehui Hao, Zhongyuan Wang, Xiaofeng Meng, and Jun Yan. Combining Language Model with Conceptualization for Definition Ranking. MSR Technical Report, 2016.

• [Wang et al 2014a] Fang Wang, Zhongyuan Wang, Zhoujun Li, and Ji-Rong Wen. Concept-based Short Text Classification and Ranking. In CIKM, 2014.

• [Wang et al 2012a] Jingjing Wang, Haixun Wang, Zhongyuan Wang, and Kenny Zhu. Understanding Tables on the Web. In International Conference on Conceptual Modeling, October 2012.

• [Wang et al 2012b] Yue Wang, Hongsong Li, Haixun Wang, and Kenny Zhu. Toward Topic Search on the Web. In International Conference on Conceptual Modeling, October 2012.

Page 74: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

bull Generative Model

Joint Interpretation and Ranking [Sawant et al 2013]

San Diego Padres

Major league baseball team

type context

E

T Padres have been to two World

Series losing in 1984 and 1998

Type hint

baseball team

losing team baseball world series 1998

Z

Context matchers

lost 1998 world seriesswitch

model model

q losing team baseball world series 1998

Borrow from U Sawant (2013)

Based on Probabilistic Language Models

bull Discriminative Model

Joint Interpretation and Ranking [Sawant et al 2013]

San_Diego_Padres

losing team baseball world

series 1998

(baseball team)

losing team baseball world

series 1998

(baseball team)

losing team baseball world

series 1998

(t = baseball team)

1998_World_Series

losing team baseball world

series 1998

(series)

losing team baseball world

series 1998

(series)

losing team baseball world

series 1998

(t = series)

Correct entity Incorrect entity

Based on max-margin discriminative learning

bull Queries seek answer entities (e2)

bull Contain (query) entities (e1) target types (t2) relations (r) and selectors (s)

Telegraphic Query Interpretation [Joshi et al 2014]

query e1 r t2 s

dave navarro first band

dave navarro band band first

dave navarro - band first

spider automobile company

spider automobile company

automobile company

-

automobile company company spider

Borrow from M Joshi (2014)

Improved Generative Model

bull Generative Model[Sawant et al 2013]

[Joshi et al 2014]Consider e1

(in q) and r

Improved Discriminative Model

bull Discriminative Model[Sawant et al 2013]

[Joshi et al 2014]

Consider e1

(in q) and r

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

bull Input a short text

bull Output semantic interpretation

bull Three steps in understanding a short text

wanna watch eagles band

watch[verb] eagles[entity](band) band[concept]

wanna watch eagles band watch[verb] eagles[entity](band) band[concept]

watch eagles band watch[verb] eagles[entity] band[concept]

Step 1 Text Segmentation ndash divide into a sequence of terms in vocabulary

Step 2 Type detection ndash determine the best type of each term

Step 3 Concept Labeling ndash infer the best concept of each entity within context

Text segmentationbull Observations

bull Mutual Exclusion ndash terms containing the same word mutually exclude each other

bull Mutual Reinforcement ndash related terms mutually reinforce each other

bull Build a Candidate Term Graph (CTG)

ldquovacation april in parisrdquo ldquowatch harry potterrdquo

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Find best segmentation

bull Best segmentation= sub-graph in CTG whichbull Is a complete graph (clique)

bull No mutual exclusion

bull Has 100 word coveragebull Except for stopwords

bull Has the largest average edge weight

Is a segmentation

Best segmentation

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Find best segmentation

bull Best segmentation= sub-graph in CTG whichbull Is a complete graph (clique)

bull No mutual exclusion

bull Has 100 word coveragebull Except for stopwords

bull Has the largest average edge weight

Maximal Clique

Best segmentation

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Type Detection

bull Pairwise Modelbull Find the best typed-term for each term so that the

Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight

watch[v]

watch[e]

watch[c]

watch

free[adj]

free[v]

movie[c]

movie[e]

free

movie

Concept Labeling

bull Entity disambiguation is the most important task of concept labelingbull Filterre-rank of the original concept cluster vector

bull Weighted-Votebull The final score of each concept cluster is a combination

of its original score and the support from context using concept co-occurrence

watch harry potter read harry potter

movie book

Example of Entity Disambiguation[Hua et al 2015 (ICDE Best Paper) Hua et al 2016]

Co-occurrence network

Concept Vector

Semantic network

Short Text

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

c1 p1

c2 p2

c3 p3

hellip

ipad apple

fruithellip

companyhellip

foodhellip

producthellip

Is-A

filtering

ldquoipad applerdquo

producthellip

devicehellip

producthellip

brandhellip

companyhellip

devicehellip

co-occur

Is-A

Mining Lexical Relationships[Wang et al 2015b]

bull Lexical knowledge represented by the probabilities

verb

product book

movie

watch harry potter

119901 119907119890119903119887 119908119886119905119888ℎ

119901 119894119899119904119905119886119899119888119890 119908119886119905119888ℎ

119901 119898119900119907119894119890 ℎ119886119903119903119910 119901119900119905119905119890119903

119901 119898119900119907119894119890 119908119886119905119888ℎ 119907119890119903119887

119901 119887119900119900119896 ℎ119886119903119903119910 119901119900119905119905119890119903

119901 119888 119905 119911

119901 119888 119890 =119901 119888 119905 119911 = 119894119899119904119905119886119899119888119890

119901 119911 119905 ①②

③e instancet termc conceptz role

Understanding Queries [Wang et al 2015b]

bull Goal to rank the concepts and findarg max

119888119901(119888|119905 119902)

The offline semantic network

QueryAll possible

segmentations

Random walk with restart [Sun et al 2005]on the online subgraph

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

bull Example ldquopopular smart cover iphone 5srdquo

bull Definition bull Head acts to name the general (semantic) category to which the

whole short text belongs Usually the head is the intent of the short text

bull ldquosmart coverrdquo intent of the query

bull Constraints distinguish this member from other members of the same category

bull ldquoiphone 5srdquo limit the type of the head

bull Non-Constraint Modifiers (aka Pure Modifiers) are subjectivemodifiers which can be dropped without changing intent

bull ldquopopularrdquo subjective can be neglected

Non-Constraint Modifiers Mining Construct Modifier Networks

Edges form a Modifier Network

Concept Hierarchy Tree in ldquoCountryrdquo domain

Modifier Network in ldquoCountryrdquo domainIn this case ldquoLargerdquo and ldquoToprdquo are pure modifiers

Country

Asian country

Developed country

Western country

Asian Developed Western

Western developed

country

Top western country

Large

Large Top

Top

WesternLarge Asian

country

Large developed

country

Top developed

country

Country

Asian Western

Developed

Large Top

bull Betweenness centrality is a measure of a nodes centrality in a network

bull Betweennes of node v is defined as

bull where 120590119904119905 is the total number of shortest paths from node s to node t and 120590119904119905(119907) is the number of those paths that pass through v

bull Normalization amp Aggregation

bull For a pure modifier it should have low betweenness centrality aggregation score PMS(t)

Non-Constraint Modifiers Mining Betweenness centrality

Head-Constraints Mining [Wang et al 2014b]

bull A term can be a head sometimes and be a constraint in some other cases

bull Eg Seattle hotel Seattle hotel job

head headconstraintconstraintconstraint

Head-Constraints Mining Acquiring Concept Patterns

Get entity pairs from query log

Conceptualization

Concept Patterns for each

prepositionsentity1 entity2

Extract Patterns

A for B A of BA with B A in BA on B A at B hellip

entity 1head entity 2constraint

concept11

concept12

concept13

concept14

concept21

concept22

concept23

(concept11 concept21) (concept11 concept22)(concept11 concept23)hellip

Concept Pattern Dictionary

Building concept pattern dictionary Query Logs

cover for iphone 6sbattery for sony a7rwicked on broadway

Why Concepts Canrsquot Be Too Generalbull It may cause too many concept pattern conflicts

canrsquot distinguish head and modifier for general concept pairs

Head Modifier

Derived Concept Pattern device company

Supporting Entity Pairs iphone 4 verizon

modem comcast

wireless router comcast

iphone 4 tmobile

Head Modifier

Derived Concept Pattern company device

Supporting Entity Pairs amazon books kindle

netflix touchpad

skype windows phone

netflix ps3

Conflict

Why Concepts Canrsquot Be Too Specificbull It may generate concepts with little coverage

bull Concept regresses to entitybull Large storage space up to (million million) patterns

hellip hellip

device largest desktop OS vendor

device largest software development company

device largest global corporation

device latest windows and office provider

hellip hellip

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept PatternsCluster size Sum of Cluster Score headconstraintscore

615 2114691 breedstate357298460224501

296 7752357 gameplatform627403476771856

153 3466804 accessoryvehicle53393705094809

70 118259 browserplatform132612807637391

22 1010993 requirementschool271407526294823

34 9489159 drugdisease154602405333541

42 8992995 cosmeticskin condition814659415003929

16 7421599 jobcity27903732555528

32 710403 accessoryphone246513830851194

18 6692376 softwareplatform210126322725878

20 6444603 testdisease239774028397537

27 5994205 clothesbreed98773996282851

19 5913545 penaltycrime200544192793488

25 5848804 taxstate240081818612579

16 5465424 saucemeat183592863621553

18 4809389 credit cardcountry142919087972152

14 4730792 foodholiday14554140330924

11 4536199 modgame257163856882439

29 4350954 garmentsport471533326845442

23 3994886 career informationprofessional732726483731257

15 386065 songinstrument128189481818135

18 378213 baitfish780426514113169

22 3722948 study guidebook508339765053921

19 3408953 pluginsbrowser550326072627126

14 3305753 recipemeat882779863422951

18 3214226 currencycountry110825444188352

13 3180272 lenscamera186081673263957

9 316973 decorationholiday130055844126533

16 314875 foodanimal7338544366514

game platform

game device

video game platform

game console game pad

game gaming platform

Game (Head) Platform (Modifier)

angry birds android

angry birds ios

angry birds windows 10

hellip hellip

Detection

Head Modifier Relationship

bull Train a classifier on

(head-embedding modifier-embedding)

bull Training data bull Positive (head modifier)bull Negative (modifier head)

bull Precision gt= 09 Recall gt= 09

bull Disadvantage not interpretable

Syntactic Parsing based on HM

bull Information is incompletebull Preposition and other function words

bull Within a noun compound el capitan macbook pro

bull Why not train a parser for web queries

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges Short Texts Lack Grammatical Signalsbull Lack function words word order

bull ldquotoys queriesrdquo has ambiguous intent

bull ldquodistance earth moonrdquo has clear intentbull many equivalent forms ldquoearth moon distancerdquo ldquoearth

distance moonrdquo hellip

Challenges Syntactic Parsing of Queries

bullNo standard

bullNo ground-truth

Why is syntactic parsing of queries even a legitimate problem

Derive Syntax from Semantics[Sun et al 2016]

bull Query ldquothai food houstonrdquo

bull Clicked sentence

bull Project dependency to the query

A Treebank for Short Texts

bull Given query 119902

bull Given 119902rsquos clicked sentence 119904

bull Parse each 119904

bull Project dependency from 119904 to 119902

bull Aggregate dependencies

Algorithm of Projection

Result Examples

Results

bull Random queries

QueryParser UAS 083 LAS 075Stanford UAS 072 LAS 064

bull Queries with no function words

QueryParser UAS 082 LAS 073Stanford UAS 070 LAS 061

bull Queries with function words

QueryParser UAS 090 LAS 085Stanford UAS 086 LAS 080

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

bull Measuring similarity between two short texts and sentences

bull Basic idea word-by-word comparison using embedding vector

bull Use saliency-weighted semantic graph to computer similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired

Bins of all edges Bins of max edges

119908isin119904119868

119868119863119865(119908) sdot)119904119890119898(119908 119904119904) sdot (1198961 + 1

൰119904119890119898(119908 119904119904) + 1198961 sdot (1 minus 119887 + 119887 sdot|119904119904|119886119907119892119897

Similarity measurement

termShort texts

Inspired by BM25

Semantic similarity

119891119904119905119904(119904119897 119904119904) =

From the Concept View

From the Concept View [Wang et al 2015a]

Co-occurrence Network

Bags of Concepts

Semantic Network

Short Text 1

Short Text 2

Concept Vector 1[(c1 score1) (c2 score2)hellip]

Concept Vector 2[(c1rsquo score1rsquo) (c2rsquo score2rsquo)hellip]

Similarity

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

bull Explicit short text understanding benefit lot of application scenariosbull Adssearch semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

000

100

200

300

400

500

600

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

000

010

020

030

040

050

060

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

Mainline Ads Sidebar Ads

Definition Mining [Hao et al 2016]

bull Definition scenarios search engines QnA etc

bull Why Conceptualization is useful for definition miningbull Examples ldquoWhat is Emphysemardquo

Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

bull This sentence has the form of definitionbull Embedding is helpful to some extent but it also return high similarity

score for (emphysema disease) and (emphysema smoking)

bull Conceptualization can provide strong semanticsbull Contextual embedding can also provide semantic similarity beyond Is-A

Answer 1

Answer 2

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

OfflineOffline OnlineOnline

Original Short textjustin bieber graduates

hellip

Knowledge base

Conceptualiztion

Concept Vector

Entity Extraction

Candidates Generation

Classification amp Ranking

Model LearningModel Learning

Concept Weighting

Model Model NModel i

Concept Model Concept Model

Class 1 Class NClass i

TrainingData

ltMusic Scoregt

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept SpaceArticle titlestagsin this category

119901119894

119901119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

hellip

hellip

120596119894

120596119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

120596119894

120596119895119901119894

119901119895

Precision performance on each category [Wang et al 2014a]

BocSTC LM_ch SVMVSM_cosi

neLM_d Entity_ESA

Movie 071 091 084 081 072 056

Money 097 095 054 057 052 074

Music 097 090 088 073 068 058

TV 096 046 092 056 051 055

0304050607080910

Pre

cisi

on

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledgeltigt in SIGMOD 2008

bull [ Auer et al 2007 ] Soumlren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Boumlrschinger Caumlcilia Zirn and AnasElghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and XindongWu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny BoyesBraem Basic objects in natural categories Cognitive psychology 8(3)382ndash439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny ZhuToward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 75: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

bull Discriminative Model

Joint Interpretation and Ranking [Sawant et al 2013]

San_Diego_Padres

losing team baseball world

series 1998

(baseball team)

losing team baseball world

series 1998

(baseball team)

losing team baseball world

series 1998

(t = baseball team)

1998_World_Series

losing team baseball world

series 1998

(series)

losing team baseball world

series 1998

(series)

losing team baseball world

series 1998

(t = series)

Correct entity Incorrect entity

Based on max-margin discriminative learning

bull Queries seek answer entities (e2)

bull Contain (query) entities (e1) target types (t2) relations (r) and selectors (s)

Telegraphic Query Interpretation [Joshi et al 2014]

query e1 r t2 s

dave navarro first band

dave navarro band band first

dave navarro - band first

spider automobile company

spider automobile company

automobile company

-

automobile company company spider

Borrow from M Joshi (2014)

Improved Generative Model

bull Generative Model[Sawant et al 2013]

[Joshi et al 2014]Consider e1

(in q) and r

Improved Discriminative Model

bull Discriminative Model[Sawant et al 2013]

[Joshi et al 2014]

Consider e1

(in q) and r

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

bull Input a short text

bull Output semantic interpretation

bull Three steps in understanding a short text

wanna watch eagles band

watch[verb] eagles[entity](band) band[concept]

wanna watch eagles band watch[verb] eagles[entity](band) band[concept]

watch eagles band watch[verb] eagles[entity] band[concept]

Step 1 Text Segmentation ndash divide into a sequence of terms in vocabulary

Step 2 Type detection ndash determine the best type of each term

Step 3 Concept Labeling ndash infer the best concept of each entity within context

Text segmentationbull Observations

bull Mutual Exclusion ndash terms containing the same word mutually exclude each other

bull Mutual Reinforcement ndash related terms mutually reinforce each other

bull Build a Candidate Term Graph (CTG)

ldquovacation april in parisrdquo ldquowatch harry potterrdquo

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Find best segmentation

bull Best segmentation= sub-graph in CTG whichbull Is a complete graph (clique)

bull No mutual exclusion

bull Has 100 word coveragebull Except for stopwords

bull Has the largest average edge weight

Is a segmentation

Best segmentation

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Find best segmentation

bull Best segmentation= sub-graph in CTG whichbull Is a complete graph (clique)

bull No mutual exclusion

bull Has 100 word coveragebull Except for stopwords

bull Has the largest average edge weight

Maximal Clique

Best segmentation

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Type Detection

bull Pairwise Modelbull Find the best typed-term for each term so that the

Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight

watch[v]

watch[e]

watch[c]

watch

free[adj]

free[v]

movie[c]

movie[e]

free

movie

Concept Labeling

bull Entity disambiguation is the most important task of concept labelingbull Filterre-rank of the original concept cluster vector

bull Weighted-Votebull The final score of each concept cluster is a combination

of its original score and the support from context using concept co-occurrence

watch harry potter read harry potter

movie book

Example of Entity Disambiguation[Hua et al 2015 (ICDE Best Paper) Hua et al 2016]

Co-occurrence network

Concept Vector

Semantic network

Short Text

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

c1 p1

c2 p2

c3 p3

hellip

ipad apple

fruithellip

companyhellip

foodhellip

producthellip

Is-A

filtering

ldquoipad applerdquo

producthellip

devicehellip

producthellip

brandhellip

companyhellip

devicehellip

co-occur

Is-A

Mining Lexical Relationships[Wang et al 2015b]

bull Lexical knowledge represented by the probabilities

verb

product book

movie

watch harry potter

119901 119907119890119903119887 119908119886119905119888ℎ

119901 119894119899119904119905119886119899119888119890 119908119886119905119888ℎ

119901 119898119900119907119894119890 ℎ119886119903119903119910 119901119900119905119905119890119903

119901 119898119900119907119894119890 119908119886119905119888ℎ 119907119890119903119887

119901 119887119900119900119896 ℎ119886119903119903119910 119901119900119905119905119890119903

119901 119888 119905 119911

119901 119888 119890 =119901 119888 119905 119911 = 119894119899119904119905119886119899119888119890

119901 119911 119905 ①②

③e instancet termc conceptz role

Understanding Queries [Wang et al 2015b]

bull Goal to rank the concepts and findarg max

119888119901(119888|119905 119902)

The offline semantic network

QueryAll possible

segmentations

Random walk with restart [Sun et al 2005]on the online subgraph

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

bull Example ldquopopular smart cover iphone 5srdquo

bull Definition bull Head acts to name the general (semantic) category to which the

whole short text belongs Usually the head is the intent of the short text

bull ldquosmart coverrdquo intent of the query

bull Constraints distinguish this member from other members of the same category

bull ldquoiphone 5srdquo limit the type of the head

bull Non-Constraint Modifiers (aka Pure Modifiers) are subjectivemodifiers which can be dropped without changing intent

bull ldquopopularrdquo subjective can be neglected

Non-Constraint Modifiers Mining Construct Modifier Networks

Edges form a Modifier Network

Concept Hierarchy Tree in ldquoCountryrdquo domain

Modifier Network in ldquoCountryrdquo domainIn this case ldquoLargerdquo and ldquoToprdquo are pure modifiers

Country

Asian country

Developed country

Western country

Asian Developed Western

Western developed

country

Top western country

Large

Large Top

Top

WesternLarge Asian

country

Large developed

country

Top developed

country

Country

Asian Western

Developed

Large Top

bull Betweenness centrality is a measure of a nodes centrality in a network

bull Betweennes of node v is defined as

bull where 120590119904119905 is the total number of shortest paths from node s to node t and 120590119904119905(119907) is the number of those paths that pass through v

bull Normalization amp Aggregation

bull For a pure modifier it should have low betweenness centrality aggregation score PMS(t)

Non-Constraint Modifiers Mining Betweenness centrality

Head-Constraints Mining [Wang et al 2014b]

bull A term can be a head sometimes and be a constraint in some other cases

bull Eg Seattle hotel Seattle hotel job

head headconstraintconstraintconstraint

Head-Constraints Mining Acquiring Concept Patterns

Get entity pairs from query log

Conceptualization

Concept Patterns for each

prepositionsentity1 entity2

Extract Patterns

A for B A of BA with B A in BA on B A at B hellip

entity 1head entity 2constraint

concept11

concept12

concept13

concept14

concept21

concept22

concept23

(concept11 concept21) (concept11 concept22)(concept11 concept23)hellip

Concept Pattern Dictionary

Building concept pattern dictionary Query Logs

cover for iphone 6sbattery for sony a7rwicked on broadway

Why Concepts Canrsquot Be Too Generalbull It may cause too many concept pattern conflicts

canrsquot distinguish head and modifier for general concept pairs

Head Modifier

Derived Concept Pattern device company

Supporting Entity Pairs iphone 4 verizon

modem comcast

wireless router comcast

iphone 4 tmobile

Head Modifier

Derived Concept Pattern company device

Supporting Entity Pairs amazon books kindle

netflix touchpad

skype windows phone

netflix ps3

Conflict

Why Concepts Canrsquot Be Too Specificbull It may generate concepts with little coverage

bull Concept regresses to entitybull Large storage space up to (million million) patterns

hellip hellip

device largest desktop OS vendor

device largest software development company

device largest global corporation

device latest windows and office provider

hellip hellip

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept PatternsCluster size Sum of Cluster Score headconstraintscore

615 2114691 breedstate357298460224501

296 7752357 gameplatform627403476771856

153 3466804 accessoryvehicle53393705094809

70 118259 browserplatform132612807637391

22 1010993 requirementschool271407526294823

34 9489159 drugdisease154602405333541

42 8992995 cosmeticskin condition814659415003929

16 7421599 jobcity27903732555528

32 710403 accessoryphone246513830851194

18 6692376 softwareplatform210126322725878

20 6444603 testdisease239774028397537

27 5994205 clothesbreed98773996282851

19 5913545 penaltycrime200544192793488

25 5848804 taxstate240081818612579

16 5465424 saucemeat183592863621553

18 4809389 credit cardcountry142919087972152

14 4730792 foodholiday14554140330924

11 4536199 modgame257163856882439

29 4350954 garmentsport471533326845442

23 3994886 career informationprofessional732726483731257

15 386065 songinstrument128189481818135

18 378213 baitfish780426514113169

22 3722948 study guidebook508339765053921

19 3408953 pluginsbrowser550326072627126

14 3305753 recipemeat882779863422951

18 3214226 currencycountry110825444188352

13 3180272 lenscamera186081673263957

9 316973 decorationholiday130055844126533

16 314875 foodanimal7338544366514

game platform

game device

video game platform

game console game pad

game gaming platform

Game (Head) Platform (Modifier)

angry birds android

angry birds ios

angry birds windows 10

hellip hellip

Detection

Head Modifier Relationship

bull Train a classifier on

(head-embedding modifier-embedding)

bull Training data bull Positive (head modifier)bull Negative (modifier head)

bull Precision gt= 09 Recall gt= 09

bull Disadvantage not interpretable

Syntactic Parsing based on HM

bull Information is incompletebull Preposition and other function words

bull Within a noun compound el capitan macbook pro

bull Why not train a parser for web queries

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges Short Texts Lack Grammatical Signalsbull Lack function words word order

bull ldquotoys queriesrdquo has ambiguous intent

bull ldquodistance earth moonrdquo has clear intentbull many equivalent forms ldquoearth moon distancerdquo ldquoearth

distance moonrdquo hellip

Challenges Syntactic Parsing of Queries

bullNo standard

bullNo ground-truth

Why is syntactic parsing of queries even a legitimate problem

Derive Syntax from Semantics[Sun et al 2016]

bull Query ldquothai food houstonrdquo

bull Clicked sentence

bull Project dependency to the query

A Treebank for Short Texts

bull Given query 119902

bull Given 119902rsquos clicked sentence 119904

bull Parse each 119904

bull Project dependency from 119904 to 119902

bull Aggregate dependencies

Algorithm of Projection

Result Examples

Results

bull Random queries

QueryParser UAS 083 LAS 075Stanford UAS 072 LAS 064

bull Queries with no function words

QueryParser UAS 082 LAS 073Stanford UAS 070 LAS 061

bull Queries with function words

QueryParser UAS 090 LAS 085Stanford UAS 086 LAS 080

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

bull Measuring similarity between two short texts and sentences

bull Basic idea word-by-word comparison using embedding vector

bull Use saliency-weighted semantic graph to computer similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired

Bins of all edges Bins of max edges

119908isin119904119868

119868119863119865(119908) sdot)119904119890119898(119908 119904119904) sdot (1198961 + 1

൰119904119890119898(119908 119904119904) + 1198961 sdot (1 minus 119887 + 119887 sdot|119904119904|119886119907119892119897

Similarity measurement

termShort texts

Inspired by BM25

Semantic similarity

119891119904119905119904(119904119897 119904119904) =

From the Concept View

From the Concept View [Wang et al 2015a]

Co-occurrence Network

Bags of Concepts

Semantic Network

Short Text 1

Short Text 2

Concept Vector 1[(c1 score1) (c2 score2)hellip]

Concept Vector 2[(c1rsquo score1rsquo) (c2rsquo score2rsquo)hellip]

Similarity

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

bull Explicit short text understanding benefit lot of application scenariosbull Adssearch semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

000

100

200

300

400

500

600

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

000

010

020

030

040

050

060

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

Mainline Ads Sidebar Ads

Definition Mining [Hao et al 2016]

bull Definition scenarios search engines QnA etc

bull Why Conceptualization is useful for definition miningbull Examples ldquoWhat is Emphysemardquo

Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

bull This sentence has the form of definitionbull Embedding is helpful to some extent but it also return high similarity

score for (emphysema disease) and (emphysema smoking)

bull Conceptualization can provide strong semanticsbull Contextual embedding can also provide semantic similarity beyond Is-A

Answer 1

Answer 2

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

OfflineOffline OnlineOnline

Original Short textjustin bieber graduates

hellip

Knowledge base

Conceptualiztion

Concept Vector

Entity Extraction

Candidates Generation

Classification amp Ranking

Model LearningModel Learning

Concept Weighting

Model Model NModel i

Concept Model Concept Model

Class 1 Class NClass i

TrainingData

ltMusic Scoregt

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept SpaceArticle titlestagsin this category

119901119894

119901119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

hellip

hellip

120596119894

120596119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

120596119894

120596119895119901119894

119901119895

Precision performance on each category [Wang et al 2014a]

BocSTC LM_ch SVMVSM_cosi

neLM_d Entity_ESA

Movie 071 091 084 081 072 056

Money 097 095 054 057 052 074

Music 097 090 088 073 068 058

TV 096 046 092 056 051 055

0304050607080910

Pre

cisi

on

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase: a collaboratively created graph database for structuring human knowledge in SIGMOD 2008

bull [ Auer et al 2007 ] Sören Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia: A Nucleus for a Web of Open Data In ISWC/ASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Ré Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Börschinger Cäcilia Zirn and Anas Elghafari WikiNet: A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Catherine Havasi ConceptNet 5: A large semantic network for relational knowledge The People's Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou "Understand Short Texts by Harvesting and Analyzing Semantic Knowledge" IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and Xindong Wu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10): 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny Boyes-Braem Basic objects in natural categories Cognitive Psychology 8(3):382-439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ] Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddings In CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny Zhu Toward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

From the Concept View

From the Concept View [Wang et al 2015a]

Co-occurrence Network

Bags of Concepts

Semantic Network

Short Text 1

Short Text 2

Concept Vector 1[(c1 score1) (c2 score2)hellip]

Concept Vector 2[(c1rsquo score1rsquo) (c2rsquo score2rsquo)hellip]

Similarity

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

bull Explicit short text understanding benefit lot of application scenariosbull Adssearch semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

000

100

200

300

400

500

600

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

000

010

020

030

040

050

060

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

Mainline Ads Sidebar Ads

Definition Mining [Hao et al 2016]

bull Definition scenarios search engines QnA etc

bull Why Conceptualization is useful for definition miningbull Examples ldquoWhat is Emphysemardquo

Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

bull This sentence has the form of definitionbull Embedding is helpful to some extent but it also return high similarity

score for (emphysema disease) and (emphysema smoking)

bull Conceptualization can provide strong semanticsbull Contextual embedding can also provide semantic similarity beyond Is-A

Answer 1

Answer 2

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

OfflineOffline OnlineOnline

Original Short textjustin bieber graduates

hellip

Knowledge base

Conceptualiztion

Concept Vector

Entity Extraction

Candidates Generation

Classification amp Ranking

Model LearningModel Learning

Concept Weighting

Model Model NModel i

Concept Model Concept Model

Class 1 Class NClass i

TrainingData

ltMusic Scoregt

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept SpaceArticle titlestagsin this category

119901119894

119901119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

hellip

hellip

120596119894

120596119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

120596119894

120596119895119901119894

119901119895

Precision performance on each category [Wang et al 2014a]

BocSTC LM_ch SVMVSM_cosi

neLM_d Entity_ESA

Movie 071 091 084 081 072 056

Money 097 095 054 057 052 074

Music 097 090 088 073 068 058

TV 096 046 092 056 051 055

0304050607080910

Pre

cisi

on

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledgeltigt in SIGMOD 2008

bull [ Auer et al 2007 ] Soumlren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Boumlrschinger Caumlcilia Zirn and AnasElghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and XindongWu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny BoyesBraem Basic objects in natural categories Cognitive psychology 8(3)382ndash439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny ZhuToward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 77: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Improved Generative Model

bull Generative Model[Sawant et al 2013]

[Joshi et al 2014]Consider e1

(in q) and r

Improved Discriminative Model

bull Discriminative Model[Sawant et al 2013]

[Joshi et al 2014]

Consider e1

(in q) and r

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

bull Input a short text

bull Output semantic interpretation

bull Three steps in understanding a short text

wanna watch eagles band

watch[verb] eagles[entity](band) band[concept]

wanna watch eagles band watch[verb] eagles[entity](band) band[concept]

watch eagles band watch[verb] eagles[entity] band[concept]

Step 1 Text Segmentation ndash divide into a sequence of terms in vocabulary

Step 2 Type detection ndash determine the best type of each term

Step 3 Concept Labeling ndash infer the best concept of each entity within context

Text segmentationbull Observations

bull Mutual Exclusion ndash terms containing the same word mutually exclude each other

bull Mutual Reinforcement ndash related terms mutually reinforce each other

bull Build a Candidate Term Graph (CTG)

ldquovacation april in parisrdquo ldquowatch harry potterrdquo

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Find best segmentation

bull Best segmentation= sub-graph in CTG whichbull Is a complete graph (clique)

bull No mutual exclusion

bull Has 100 word coveragebull Except for stopwords

bull Has the largest average edge weight

Is a segmentation

Best segmentation

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Find best segmentation

bull Best segmentation= sub-graph in CTG whichbull Is a complete graph (clique)

bull No mutual exclusion

bull Has 100 word coveragebull Except for stopwords

bull Has the largest average edge weight

Maximal Clique

Best segmentation

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Type Detection

bull Pairwise Modelbull Find the best typed-term for each term so that the

Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight

watch[v]

watch[e]

watch[c]

watch

free[adj]

free[v]

movie[c]

movie[e]

free

movie

Concept Labeling

bull Entity disambiguation is the most important task of concept labelingbull Filterre-rank of the original concept cluster vector

bull Weighted-Votebull The final score of each concept cluster is a combination

of its original score and the support from context using concept co-occurrence

watch harry potter read harry potter

movie book

Example of Entity Disambiguation[Hua et al 2015 (ICDE Best Paper) Hua et al 2016]

Co-occurrence network

Concept Vector

Semantic network

Short Text

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

c1 p1

c2 p2

c3 p3

hellip

ipad apple

fruithellip

companyhellip

foodhellip

producthellip

Is-A

filtering

ldquoipad applerdquo

producthellip

devicehellip

producthellip

brandhellip

companyhellip

devicehellip

co-occur

Is-A

Mining Lexical Relationships[Wang et al 2015b]

bull Lexical knowledge represented by the probabilities

verb

product book

movie

watch harry potter

119901 119907119890119903119887 119908119886119905119888ℎ

119901 119894119899119904119905119886119899119888119890 119908119886119905119888ℎ

119901 119898119900119907119894119890 ℎ119886119903119903119910 119901119900119905119905119890119903

119901 119898119900119907119894119890 119908119886119905119888ℎ 119907119890119903119887

119901 119887119900119900119896 ℎ119886119903119903119910 119901119900119905119905119890119903

119901 119888 119905 119911

119901 119888 119890 =119901 119888 119905 119911 = 119894119899119904119905119886119899119888119890

119901 119911 119905 ①②

③e instancet termc conceptz role

Understanding Queries [Wang et al 2015b]

bull Goal to rank the concepts and findarg max

119888119901(119888|119905 119902)

The offline semantic network

QueryAll possible

segmentations

Random walk with restart [Sun et al 2005]on the online subgraph

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

bull Example ldquopopular smart cover iphone 5srdquo

bull Definition bull Head acts to name the general (semantic) category to which the

whole short text belongs Usually the head is the intent of the short text

bull ldquosmart coverrdquo intent of the query

bull Constraints distinguish this member from other members of the same category

bull ldquoiphone 5srdquo limit the type of the head

bull Non-Constraint Modifiers (aka Pure Modifiers) are subjectivemodifiers which can be dropped without changing intent

bull ldquopopularrdquo subjective can be neglected

Non-Constraint Modifiers Mining Construct Modifier Networks

Edges form a Modifier Network

Concept Hierarchy Tree in ldquoCountryrdquo domain

Modifier Network in ldquoCountryrdquo domainIn this case ldquoLargerdquo and ldquoToprdquo are pure modifiers

Country

Asian country

Developed country

Western country

Asian Developed Western

Western developed

country

Top western country

Large

Large Top

Top

WesternLarge Asian

country

Large developed

country

Top developed

country

Country

Asian Western

Developed

Large Top

bull Betweenness centrality is a measure of a nodes centrality in a network

bull Betweennes of node v is defined as

bull where 120590119904119905 is the total number of shortest paths from node s to node t and 120590119904119905(119907) is the number of those paths that pass through v

bull Normalization amp Aggregation

bull For a pure modifier it should have low betweenness centrality aggregation score PMS(t)

Non-Constraint Modifiers Mining Betweenness centrality

Head-Constraints Mining [Wang et al 2014b]

bull A term can be a head sometimes and be a constraint in some other cases

bull Eg Seattle hotel Seattle hotel job

head headconstraintconstraintconstraint

Head-Constraints Mining Acquiring Concept Patterns

Get entity pairs from query log

Conceptualization

Concept Patterns for each

prepositionsentity1 entity2

Extract Patterns

A for B A of BA with B A in BA on B A at B hellip

entity 1head entity 2constraint

concept11

concept12

concept13

concept14

concept21

concept22

concept23

(concept11 concept21) (concept11 concept22)(concept11 concept23)hellip

Concept Pattern Dictionary

Building concept pattern dictionary Query Logs

cover for iphone 6sbattery for sony a7rwicked on broadway

Why Concepts Canrsquot Be Too Generalbull It may cause too many concept pattern conflicts

canrsquot distinguish head and modifier for general concept pairs

Head Modifier

Derived Concept Pattern device company

Supporting Entity Pairs iphone 4 verizon

modem comcast

wireless router comcast

iphone 4 tmobile

Head Modifier

Derived Concept Pattern company device

Supporting Entity Pairs amazon books kindle

netflix touchpad

skype windows phone

netflix ps3

Conflict

Why Concepts Canrsquot Be Too Specificbull It may generate concepts with little coverage

bull Concept regresses to entitybull Large storage space up to (million million) patterns

hellip hellip

device largest desktop OS vendor

device largest software development company

device largest global corporation

device latest windows and office provider

hellip hellip

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept PatternsCluster size Sum of Cluster Score headconstraintscore

615 2114691 breedstate357298460224501

296 7752357 gameplatform627403476771856

153 3466804 accessoryvehicle53393705094809

70 118259 browserplatform132612807637391

22 1010993 requirementschool271407526294823

34 9489159 drugdisease154602405333541

42 8992995 cosmeticskin condition814659415003929

16 7421599 jobcity27903732555528

32 710403 accessoryphone246513830851194

18 6692376 softwareplatform210126322725878

20 6444603 testdisease239774028397537

27 5994205 clothesbreed98773996282851

19 5913545 penaltycrime200544192793488

25 5848804 taxstate240081818612579

16 5465424 saucemeat183592863621553

18 4809389 credit cardcountry142919087972152

14 4730792 foodholiday14554140330924

11 4536199 modgame257163856882439

29 4350954 garmentsport471533326845442

23 3994886 career informationprofessional732726483731257

15 386065 songinstrument128189481818135

18 378213 baitfish780426514113169

22 3722948 study guidebook508339765053921

19 3408953 pluginsbrowser550326072627126

14 3305753 recipemeat882779863422951

18 3214226 currencycountry110825444188352

13 3180272 lenscamera186081673263957

9 316973 decorationholiday130055844126533

16 314875 foodanimal7338544366514

game platform

game device

video game platform

game console game pad

game gaming platform

Game (Head) Platform (Modifier)

angry birds android

angry birds ios

angry birds windows 10

hellip hellip

Detection

Head Modifier Relationship

bull Train a classifier on

(head-embedding modifier-embedding)

bull Training data bull Positive (head modifier)bull Negative (modifier head)

bull Precision gt= 09 Recall gt= 09

bull Disadvantage not interpretable

Syntactic Parsing based on HM

bull Information is incompletebull Preposition and other function words

bull Within a noun compound el capitan macbook pro

bull Why not train a parser for web queries

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges Short Texts Lack Grammatical Signalsbull Lack function words word order

bull ldquotoys queriesrdquo has ambiguous intent

bull ldquodistance earth moonrdquo has clear intentbull many equivalent forms ldquoearth moon distancerdquo ldquoearth

distance moonrdquo hellip

Challenges Syntactic Parsing of Queries

bullNo standard

bullNo ground-truth

Why is syntactic parsing of queries even a legitimate problem

Derive Syntax from Semantics[Sun et al 2016]

bull Query ldquothai food houstonrdquo

bull Clicked sentence

bull Project dependency to the query

A Treebank for Short Texts

bull Given query 119902

bull Given 119902rsquos clicked sentence 119904

bull Parse each 119904

bull Project dependency from 119904 to 119902

bull Aggregate dependencies

Algorithm of Projection

Result Examples

Results

bull Random queries

QueryParser UAS 083 LAS 075Stanford UAS 072 LAS 064

bull Queries with no function words

QueryParser UAS 082 LAS 073Stanford UAS 070 LAS 061

bull Queries with function words

QueryParser UAS 090 LAS 085Stanford UAS 086 LAS 080

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

bull Measuring similarity between two short texts and sentences

bull Basic idea word-by-word comparison using embedding vector

bull Use saliency-weighted semantic graph to computer similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired

Bins of all edges Bins of max edges

119908isin119904119868

119868119863119865(119908) sdot)119904119890119898(119908 119904119904) sdot (1198961 + 1

൰119904119890119898(119908 119904119904) + 1198961 sdot (1 minus 119887 + 119887 sdot|119904119904|119886119907119892119897

Similarity measurement

termShort texts

Inspired by BM25

Semantic similarity

119891119904119905119904(119904119897 119904119904) =

From the Concept View

From the Concept View [Wang et al 2015a]

Co-occurrence Network

Bags of Concepts

Semantic Network

Short Text 1

Short Text 2

Concept Vector 1[(c1 score1) (c2 score2)hellip]

Concept Vector 2[(c1rsquo score1rsquo) (c2rsquo score2rsquo)hellip]

Similarity

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

bull Explicit short text understanding benefit lot of application scenariosbull Adssearch semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

000

100

200

300

400

500

600

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

000

010

020

030

040

050

060

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

Mainline Ads Sidebar Ads

Definition Mining [Hao et al 2016]

bull Definition scenarios search engines QnA etc

bull Why Conceptualization is useful for definition miningbull Examples ldquoWhat is Emphysemardquo

Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

bull This sentence has the form of definitionbull Embedding is helpful to some extent but it also return high similarity

score for (emphysema disease) and (emphysema smoking)

bull Conceptualization can provide strong semanticsbull Contextual embedding can also provide semantic similarity beyond Is-A

Answer 1

Answer 2

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

OfflineOffline OnlineOnline

Original Short textjustin bieber graduates

hellip

Knowledge base

Conceptualiztion

Concept Vector

Entity Extraction

Candidates Generation

Classification amp Ranking

Model LearningModel Learning

Concept Weighting

Model Model NModel i

Concept Model Concept Model

Class 1 Class NClass i

TrainingData

ltMusic Scoregt

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept SpaceArticle titlestagsin this category

119901119894

119901119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

hellip

hellip

120596119894

120596119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

120596119894

120596119895119901119894

119901119895

Precision performance on each category [Wang et al 2014a]

BocSTC LM_ch SVMVSM_cosi

neLM_d Entity_ESA

Movie 071 091 084 081 072 056

Money 097 095 054 057 052 074

Music 097 090 088 073 068 058

TV 096 046 092 056 051 055

0304050607080910

Pre

cisi

on

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledgeltigt in SIGMOD 2008

bull [ Auer et al 2007 ] Soumlren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Boumlrschinger Caumlcilia Zirn and AnasElghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and XindongWu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny BoyesBraem Basic objects in natural categories Cognitive psychology 8(3)382ndash439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny ZhuToward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 78: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Improved Discriminative Model

bull Discriminative Model[Sawant et al 2013]

[Joshi et al 2014]

Consider e1

(in q) and r

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

bull Input a short text

bull Output semantic interpretation

bull Three steps in understanding a short text

wanna watch eagles band

watch[verb] eagles[entity](band) band[concept]

wanna watch eagles band watch[verb] eagles[entity](band) band[concept]

watch eagles band watch[verb] eagles[entity] band[concept]

Step 1 Text Segmentation ndash divide into a sequence of terms in vocabulary

Step 2 Type detection ndash determine the best type of each term

Step 3 Concept Labeling ndash infer the best concept of each entity within context

Text segmentationbull Observations

bull Mutual Exclusion ndash terms containing the same word mutually exclude each other

bull Mutual Reinforcement ndash related terms mutually reinforce each other

bull Build a Candidate Term Graph (CTG)

ldquovacation april in parisrdquo ldquowatch harry potterrdquo

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Find best segmentation

bull Best segmentation= sub-graph in CTG whichbull Is a complete graph (clique)

bull No mutual exclusion

bull Has 100 word coveragebull Except for stopwords

bull Has the largest average edge weight

Is a segmentation

Best segmentation

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Find best segmentation

bull Best segmentation= sub-graph in CTG whichbull Is a complete graph (clique)

bull No mutual exclusion

bull Has 100 word coveragebull Except for stopwords

bull Has the largest average edge weight

Maximal Clique

Best segmentation

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Type Detection

bull Pairwise Modelbull Find the best typed-term for each term so that the

Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight

watch[v]

watch[e]

watch[c]

watch

free[adj]

free[v]

movie[c]

movie[e]

free

movie

Concept Labeling

bull Entity disambiguation is the most important task of concept labelingbull Filterre-rank of the original concept cluster vector

bull Weighted-Votebull The final score of each concept cluster is a combination

of its original score and the support from context using concept co-occurrence

watch harry potter read harry potter

movie book

Example of Entity Disambiguation[Hua et al 2015 (ICDE Best Paper) Hua et al 2016]

Co-occurrence network

Concept Vector

Semantic network

Short Text

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

c1 p1

c2 p2

c3 p3

hellip

ipad apple

fruithellip

companyhellip

foodhellip

producthellip

Is-A

filtering

ldquoipad applerdquo

producthellip

devicehellip

producthellip

brandhellip

companyhellip

devicehellip

co-occur

Is-A

Mining Lexical Relationships[Wang et al 2015b]

bull Lexical knowledge represented by the probabilities

verb

product book

movie

watch harry potter

119901 119907119890119903119887 119908119886119905119888ℎ

119901 119894119899119904119905119886119899119888119890 119908119886119905119888ℎ

119901 119898119900119907119894119890 ℎ119886119903119903119910 119901119900119905119905119890119903

119901 119898119900119907119894119890 119908119886119905119888ℎ 119907119890119903119887

119901 119887119900119900119896 ℎ119886119903119903119910 119901119900119905119905119890119903

119901 119888 119905 119911

119901 119888 119890 =119901 119888 119905 119911 = 119894119899119904119905119886119899119888119890

119901 119911 119905 ①②

③e instancet termc conceptz role

Understanding Queries [Wang et al 2015b]

bull Goal to rank the concepts and findarg max

119888119901(119888|119905 119902)

The offline semantic network

QueryAll possible

segmentations

Random walk with restart [Sun et al 2005]on the online subgraph

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

bull Example ldquopopular smart cover iphone 5srdquo

bull Definition bull Head acts to name the general (semantic) category to which the

whole short text belongs Usually the head is the intent of the short text

bull ldquosmart coverrdquo intent of the query

bull Constraints distinguish this member from other members of the same category

bull ldquoiphone 5srdquo limit the type of the head

bull Non-Constraint Modifiers (aka Pure Modifiers) are subjectivemodifiers which can be dropped without changing intent

bull ldquopopularrdquo subjective can be neglected

Non-Constraint Modifiers Mining Construct Modifier Networks

Edges form a Modifier Network

Concept Hierarchy Tree in ldquoCountryrdquo domain

Modifier Network in ldquoCountryrdquo domainIn this case ldquoLargerdquo and ldquoToprdquo are pure modifiers

Country

Asian country

Developed country

Western country

Asian Developed Western

Western developed

country

Top western country

Large

Large Top

Top

WesternLarge Asian

country

Large developed

country

Top developed

country

Country

Asian Western

Developed

Large Top

bull Betweenness centrality is a measure of a nodes centrality in a network

bull Betweennes of node v is defined as

bull where 120590119904119905 is the total number of shortest paths from node s to node t and 120590119904119905(119907) is the number of those paths that pass through v

bull Normalization amp Aggregation

bull For a pure modifier it should have low betweenness centrality aggregation score PMS(t)

Non-Constraint Modifiers Mining Betweenness centrality

Head-Constraints Mining [Wang et al 2014b]

bull A term can be a head sometimes and be a constraint in some other cases

bull Eg Seattle hotel Seattle hotel job

head headconstraintconstraintconstraint

Head-Constraints Mining Acquiring Concept Patterns

Get entity pairs from query log

Conceptualization

Concept Patterns for each

prepositionsentity1 entity2

Extract Patterns

A for B A of BA with B A in BA on B A at B hellip

entity 1head entity 2constraint

concept11

concept12

concept13

concept14

concept21

concept22

concept23

(concept11 concept21) (concept11 concept22)(concept11 concept23)hellip

Concept Pattern Dictionary

Building concept pattern dictionary Query Logs

cover for iphone 6sbattery for sony a7rwicked on broadway

Why Concepts Canrsquot Be Too Generalbull It may cause too many concept pattern conflicts

canrsquot distinguish head and modifier for general concept pairs

Head Modifier

Derived Concept Pattern device company

Supporting Entity Pairs iphone 4 verizon

modem comcast

wireless router comcast

iphone 4 tmobile

Head Modifier

Derived Concept Pattern company device

Supporting Entity Pairs amazon books kindle

netflix touchpad

skype windows phone

netflix ps3

Conflict

Why Concepts Can't Be Too Specific

• It may generate concepts with little coverage
• Concepts regress to entities
• Large storage space: up to (million × million) patterns
• E.g., (device, largest desktop OS vendor), (device, largest software development company), (device, largest global corporation), (device, latest windows and office provider), …

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept Patterns

Cluster size | Sum of cluster score | Head/Constraint | Score
615 | 2114691 | breed/state | 357298460224501
296 | 7752357 | game/platform | 627403476771856
153 | 3466804 | accessory/vehicle | 53393705094809
70 | 118259 | browser/platform | 132612807637391
22 | 1010993 | requirement/school | 271407526294823
34 | 9489159 | drug/disease | 154602405333541
42 | 8992995 | cosmetic/skin condition | 814659415003929
16 | 7421599 | job/city | 27903732555528
32 | 710403 | accessory/phone | 246513830851194
18 | 6692376 | software/platform | 210126322725878
20 | 6444603 | test/disease | 239774028397537
27 | 5994205 | clothes/breed | 98773996282851
19 | 5913545 | penalty/crime | 200544192793488
25 | 5848804 | tax/state | 240081818612579
16 | 5465424 | sauce/meat | 183592863621553
18 | 4809389 | credit card/country | 142919087972152
14 | 4730792 | food/holiday | 14554140330924
11 | 4536199 | mod/game | 257163856882439
29 | 4350954 | garment/sport | 471533326845442
23 | 3994886 | career information/professional | 732726483731257
15 | 386065 | song/instrument | 128189481818135
18 | 378213 | bait/fish | 780426514113169
22 | 3722948 | study guide/book | 508339765053921
19 | 3408953 | plugins/browser | 550326072627126
14 | 3305753 | recipe/meat | 882779863422951
18 | 3214226 | currency/country | 110825444188352
13 | 3180272 | lens/camera | 186081673263957
9 | 316973 | decoration/holiday | 130055844126533
16 | 314875 | food/animal | 7338544366514

Example — Game (Head) / Platform (Modifier): related concept patterns include game/platform, game/device, video game/platform, game console/game pad, game/gaming platform; supporting entity pairs include (angry birds, android), (angry birds, ios), (angry birds, windows 10), …

Head–Modifier Relationship Detection

• Train a classifier on (head-embedding, modifier-embedding) pairs (a sketch follows)

• Training data:
  • Positive: (head, modifier)
  • Negative: (modifier, head)

• Precision >= 0.9, Recall >= 0.9

• Disadvantage: not interpretable
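A minimal sketch of such a classifier. The embeddings below are random stand-ins and the training pairs are toy examples; in practice the vectors would come from a pre-trained embedding model, and the pairs from the mined concept patterns (both assumptions, not the authors' exact setup).

```python
# Sketch: head/modifier order classifier over concatenated word embeddings.
# Random embeddings and a handful of toy pairs stand in for real data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
emb = {w: rng.normal(size=50) for w in
       ["game", "platform", "accessory", "phone", "drug", "disease"]}

pairs = [("game", "platform"), ("accessory", "phone"), ("drug", "disease")]  # (head, modifier)
X = [np.concatenate([emb[h], emb[m]]) for h, m in pairs]          # positive: head first
X += [np.concatenate([emb[m], emb[h]]) for h, m in pairs]         # negative: reversed order
y = [1] * len(pairs) + [0] * len(pairs)

clf = LogisticRegression(max_iter=1000).fit(np.array(X), y)
print(clf.predict([np.concatenate([emb["game"], emb["platform"]])]))  # 1 => head–modifier order
```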

Syntactic Parsing based on HM

• Information is incomplete:
  • Prepositions and other function words are missing
  • Within a noun compound: "el capitan macbook pro"

• Why not train a parser for web queries?

Syntactic Parsing of Short Texts [Sun et al. EMNLP 2016]

• Syntactic structures are valuable for short text understanding

• Examples

Challenges: Short Texts Lack Grammatical Signals

• Lack of function words and word order
• "toys queries" has ambiguous intent
• "distance earth moon" has clear intent, but many equivalent forms: "earth moon distance", "earth distance moon", …

Challenges: Syntactic Parsing of Queries

• No standard

• No ground-truth

Why is syntactic parsing of queries even a legitimate problem?

Derive Syntax from Semantics [Sun et al. 2016]

• Query: "thai food houston"

• Clicked sentence

• Project its dependencies onto the query

A Treebank for Short Texts

• Given a query q

• Given q's clicked sentences s

• Parse each s

• Project dependencies from s to q

• Aggregate the projected dependencies

Algorithm of Projection
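The core idea of the projection can be sketched in a few lines: keep the dependency arcs of the clicked sentence whose both endpoints also occur in the query. The parsed sentence below is hand-written as (token, head index, label) triples, and the whole function is a simplification of the algorithm in [Sun et al. 2016], not a faithful reimplementation.

```python
# Sketch: project dependencies from a clicked sentence onto a query by keeping
# arcs whose head and dependent both match query tokens. Indices are 1-based,
# 0 = root. Simplified relative to [Sun et al. 2016].
def project(query_tokens, parsed_sentence):
    qset = set(query_tokens)
    keep = {i for i, (tok, _, _) in enumerate(parsed_sentence, start=1) if tok in qset}
    arcs = []
    for i, (tok, head, label) in enumerate(parsed_sentence, start=1):
        if i in keep and head in keep:
            arcs.append((tok, label, parsed_sentence[head - 1][0]))
    return arcs

# "thai food houston" projected from a clicked sentence such as
# "find the best thai food in houston"
sentence = [("find", 0, "root"), ("the", 4, "det"), ("best", 4, "amod"),
            ("thai", 5, "amod"), ("food", 1, "dobj"), ("in", 7, "case"),
            ("houston", 5, "nmod")]
print(project(["thai", "food", "houston"], sentence))
# [('thai', 'amod', 'food'), ('houston', 'nmod', 'food')]
```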

Result Examples

Results

• Random queries: QueryParser UAS 0.83, LAS 0.75; Stanford UAS 0.72, LAS 0.64

• Queries with no function words: QueryParser UAS 0.82, LAS 0.73; Stanford UAS 0.70, LAS 0.61

• Queries with function words: QueryParser UAS 0.90, LAS 0.85; Stanford UAS 0.86, LAS 0.80

Short Text Understanding

• How to segment this short text?

• What does this short text mean (its intent, senses, or concepts)?

• What are the relations among terms in the short text?

• How to calculate the similarity between short texts?

Short Text Similarity Using Word Embeddings [Kenter and de Rijke 2015]

• Measuring similarity between two short texts or sentences

• Basic idea: word-by-word comparison using embedding vectors

• Use a saliency-weighted semantic graph to compute similarity

Short Text Similarity Using Word Embeddings [Kenter and de Rijke 2015]

Features acquired: bins of all edges, bins of max edges of the saliency-weighted semantic graph

Similarity measurement (inspired by BM25), where s_l and s_s are the two short texts, w ranges over the terms of s_l, and sem(w, s_s) is the semantic similarity of w to s_s:

f_sts(s_l, s_s) = Σ_{w ∈ s_l} IDF(w) · sem(w, s_s) · (k1 + 1) / ( sem(w, s_s) + k1 · (1 − b + b · |s_s| / avgsl) )
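A small sketch of the formula above. The IDF values, the embedding table, and the choice of cosine similarity for sem(w, s_s) are illustrative assumptions; k1, b, and avgsl are hyperparameters set to arbitrary values here.

```python
# Sketch: saliency-weighted, BM25-style short text similarity with toy IDF
# values and random embeddings standing in for a trained model.
import numpy as np

def cos(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def f_sts(s_l, s_s, emb, idf, k1=1.2, b=0.75, avgsl=5.0):
    score = 0.0
    for w in s_l:
        if w not in emb:
            continue
        sem = max(cos(emb[w], emb[v]) for v in s_s if v in emb)  # best match in s_s
        score += idf.get(w, 1.0) * sem * (k1 + 1) / (
            sem + k1 * (1 - b + b * len(s_s) / avgsl))
    return score

rng = np.random.default_rng(1)
emb = {w: rng.normal(size=20) for w in
       ["cheap", "flights", "low", "cost", "airline", "tickets"]}
idf = {"cheap": 1.5, "flights": 2.0, "low": 1.2, "cost": 1.3, "airline": 2.1, "tickets": 1.8}
print(round(f_sts(["cheap", "flights"], ["low", "cost", "airline", "tickets"], emb, idf), 3))
```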

From the Concept View

From the Concept View [Wang et al 2015a]

[Pipeline: each short text is parsed and conceptualized (term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, concept orthogonalization) against the semantic network and the co-occurrence network, producing bags of concepts — Concept Vector 1 = [(c1, score1), (c2, score2), …] and Concept Vector 2 = [(c1', score1'), (c2', score2'), …]; similarity is then computed between the two concept vectors]
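Once both short texts are mapped to concept vectors, the similarity computation itself is straightforward; a minimal sketch using cosine similarity over the concept scores (the concept vectors below are made-up outputs of the conceptualization step, not real scores):

```python
# Sketch: similarity of two short texts as the cosine of their concept vectors.
def concept_cosine(v1, v2):
    common = set(v1) & set(v2)
    dot = sum(v1[c] * v2[c] for c in common)
    n1 = sum(s * s for s in v1.values()) ** 0.5
    n2 = sum(s * s for s in v2.values()) ** 0.5
    return dot / (n1 * n2) if n1 and n2 else 0.0

v1 = {"company": 0.6, "device": 0.3, "brand": 0.1}     # e.g. conceptualized short text 1
v2 = {"device": 0.5, "product": 0.4, "company": 0.1}   # e.g. conceptualized short text 2
print(round(concept_cosine(v1, v2), 3))
```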

Outline

• Knowledge Bases

• Explicit Representation Models

• Applications

Applications

• Explicit short text understanding benefits many application scenarios:

• Ads/search semantic matching

• Definition mining

• Query recommendation

• Web table understanding

• Semantic search

• …

Ads Keyword Selection [Wang et al 2015a]

[Figure: results by query decile (Decile 4–10) for Mainline Ads (y-axis 0.00–6.00) and Sidebar Ads (y-axis 0.00–0.60)]

Definition Mining [Hao et al 2016]

• Definition scenarios: search engines, QnA, etc.

• Why conceptualization is useful for definition mining — example: "What is Emphysema?"

  Answer 1: "Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year."

  Answer 2: "Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing."

• Answer 1 has the form of a definition
• Embedding is helpful to some extent, but it also returns high similarity scores for both (emphysema, disease) and (emphysema, smoking)
• Conceptualization can provide strong semantics
• Contextual embeddings can also provide semantic similarity beyond is-A

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

[Pipeline — Online: original short text (e.g., "justin bieber graduates") → entity extraction → conceptualization against the knowledge base → concept vector → candidate generation → classification & ranking → output such as <Music, Score>. Offline: model learning with concept weighting builds a concept model for each class (Class 1 … Class i … Class N) from training data.]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

[Figure: each category (TV, Music, Movie, …) is mapped into a shared concept space using the article titles/tags in that category, giving concept weights ω_i, ω_j; a query is conceptualized into the same space with probabilities p_i, p_j and scored against each category]

Precision performance on each category [Wang et al 2014a]

Category | BocSTC | LM_ch | SVM | VSM_cosine | LM_d | Entity_ESA
Movie | 0.71 | 0.91 | 0.84 | 0.81 | 0.72 | 0.56
Money | 0.97 | 0.95 | 0.54 | 0.57 | 0.52 | 0.74
Music | 0.97 | 0.90 | 0.88 | 0.73 | 0.68 | 0.58
TV | 0.96 | 0.46 | 0.92 | 0.56 | 0.51 | 0.55

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase: a collaboratively created graph database for structuring human knowledge in SIGMOD 2008

bull [ Auer et al 2007 ] Sören Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia: A Nucleus for a Web of Open Data In ISWC/ASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Ré Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Börschinger Cäcilia Zirn and Anas Elghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Catherine Havasi ConceptNet 5 A large semantic network for relational knowledge The People's Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou "Understand Short Texts by Harvesting and Analyzing Semantic Knowledge" IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and Xindong Wu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny Boyes-Braem Basic objects in natural categories Cognitive Psychology 8(3):382–439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ] Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddings In CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny Zhu Toward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Understand Short Texts with A Multi-tiered Model [Hua et al 2015 (ICDE Best Paper)]

• Input: a short text

• Output: semantic interpretation

• Three steps in understanding a short text:

Example: "wanna watch eagles band" → watch[verb] eagles[entity](band) band[concept]

Step 1: Text Segmentation – divide the short text into a sequence of terms in the vocabulary

Step 2: Type Detection – determine the best type of each term

Step 3: Concept Labeling – infer the best concept of each entity within context

Text Segmentation

• Observations:
  • Mutual Exclusion – terms containing the same word mutually exclude each other
  • Mutual Reinforcement – related terms mutually reinforce each other

• Build a Candidate Term Graph (CTG)

"vacation april in paris"  |  "watch harry potter"

[Figure: candidate term graphs for the two queries — nodes are candidate terms (e.g., "vacation", "april in paris", "april", "paris"; "watch", "harry potter"), mutually exclusive terms are left unconnected, and edges between compatible terms carry mutual-reinforcement weights]

Find the best segmentation

• Best segmentation = the sub-graph of the CTG which:
  • is a complete graph (a clique), so it contains no mutual exclusion
  • has 100% word coverage (stopwords excepted)
  • has the largest average edge weight

[Figure: among the maximal cliques of the CTG, {vacation, april in paris} is picked as the best segmentation of "vacation april in paris" and {watch, harry potter} of "watch harry potter"; a minimal sketch follows]
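A minimal sketch of the clique-based selection, assuming networkx. The candidate terms mirror the "vacation april in paris" example; the edge weights and the stopword set are illustrative assumptions.

```python
# Sketch: pick the best segmentation as the maximal clique of the candidate
# term graph that covers all (non-stopword) words and has the largest average
# edge weight. Toy graph and weights are illustrative.
from itertools import combinations
import networkx as nx

terms = {"vacation": {"vacation"}, "april in paris": {"april", "in", "paris"},
         "april": {"april"}, "paris": {"paris"}}
weights = {("vacation", "april in paris"): 0.047, ("vacation", "april"): 0.029,
           ("vacation", "paris"): 0.041, ("april", "paris"): 0.005}

G = nx.Graph()
for (u, v), w in weights.items():
    G.add_edge(u, v, weight=w)          # mutually exclusive terms get no edge

words, stopwords = {"vacation", "april", "in", "paris"}, {"in"}

def avg_weight(clique):
    pairs = list(combinations(clique, 2))
    return sum(G[u][v]["weight"] for u, v in pairs) / len(pairs) if pairs else 0.0

best = max((c for c in nx.find_cliques(G)
            if set().union(*(terms[t] for t in c)) >= words - stopwords),
           key=avg_weight)
print(best)   # ['vacation', 'april in paris'] (order may vary)
```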

Type Detection

• Pairwise Model: find the best typed-term for each term so that the maximum spanning tree of the resulting sub-graph between typed-terms has the largest weight (a sketch follows)

[Figure: typed-term candidates for "watch free movie" — watch[v], watch[e], watch[c]; free[adj], free[v]; movie[c], movie[e]]
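A brute-force sketch of the pairwise model: enumerate one typed-term per term and keep the assignment whose maximum spanning tree is heaviest. The affinity scores between typed-terms are made-up numbers, not values from the real knowledge base.

```python
# Sketch: pairwise type detection via maximum-spanning-tree weight.
# Candidate typed-terms and affinity scores are illustrative assumptions.
from itertools import product
import networkx as nx

candidates = {"watch": ["watch[v]", "watch[c]"],
              "free": ["free[adj]", "free[v]"],
              "movie": ["movie[c]", "movie[e]"]}
affinity = {("watch[v]", "movie[c]"): 0.9, ("watch[v]", "free[adj]"): 0.3,
            ("free[adj]", "movie[c]"): 0.8, ("watch[c]", "movie[e]"): 0.2,
            ("free[v]", "movie[e]"): 0.1}

def mst_weight(assignment):
    G = nx.Graph()
    G.add_nodes_from(assignment)
    for u, v in product(assignment, assignment):
        w = affinity.get((u, v)) or affinity.get((v, u))
        if w:
            G.add_edge(u, v, weight=w)
    mst = nx.maximum_spanning_tree(G, weight="weight")
    return sum(d["weight"] for _, _, d in mst.edges(data=True))

best = max(product(*candidates.values()), key=mst_weight)
print(best)   # ('watch[v]', 'free[adj]', 'movie[c]')
```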

Concept Labeling

• Entity disambiguation is the most important task of concept labeling
  • Filter / re-rank the original concept cluster vector

• Weighted Vote: the final score of each concept cluster is a combination of its original score and the support from context, using concept co-occurrence (a sketch follows)

• Example: "watch harry potter" → movie; "read harry potter" → book
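A tiny sketch of the weighted vote; the mixing weight and all scores are assumptions chosen only to show how context support can flip the top concept.

```python
# Sketch: weighted-vote re-ranking of concept clusters. The final score mixes
# the original conceptualization score with co-occurrence support from context.
def weighted_vote(original, context_support, lam=0.5):
    return {c: lam * original.get(c, 0.0) + (1 - lam) * context_support.get(c, 0.0)
            for c in set(original) | set(context_support)}

# "harry potter" in the context of "watch": the movie sense gets boosted.
original = {"movie": 0.45, "book": 0.50, "character": 0.05}
support_from_watch = {"movie": 0.8, "book": 0.15}
print(max(weighted_vote(original, support_from_watch).items(), key=lambda kv: kv[1]))
# ('movie', 0.625)
```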

Example of Entity Disambiguation [Hua et al 2015 (ICDE Best Paper), Hua et al 2016]

[Pipeline: short text → parsing → conceptualization (term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, concept orthogonalization) → concept vector [(c1, p1), (c2, p2), (c3, p3), …], using the semantic (isA) network and the co-occurrence network]

[Example: "ipad apple" — the isA network gives apple → {fruit, company, food, product, …} and ipad → {product, device, …}; co-occurrence-based filtering keeps the senses that fit together, e.g., company/brand for "apple" and product/device for "ipad"]

Mining Lexical Relationships [Wang et al 2015b]

• Lexical knowledge is represented by probabilities; for "watch harry potter":
  • the role of a term, p(z | t) — e.g., p(verb | watch), p(instance | watch)
  • the concept of an instance, p(c | e) = p(c | t, z = instance) — e.g., p(movie | harry potter), p(book | harry potter)
  • the concept of a term in a role, p(c | t, z) — e.g., p(movie | watch, verb)
  (e: instance, t: term, c: concept, z: role)

Understanding Queries [Wang et al 2015b]

bull Goal to rank the concepts and findarg max

119888119901(119888|119905 119902)

The offline semantic network

QueryAll possible

segmentations

Random walk with restart [Sun et al 2005]on the online subgraph

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

bull Example ldquopopular smart cover iphone 5srdquo

bull Definition bull Head acts to name the general (semantic) category to which the

whole short text belongs Usually the head is the intent of the short text

bull ldquosmart coverrdquo intent of the query

bull Constraints distinguish this member from other members of the same category

bull ldquoiphone 5srdquo limit the type of the head

bull Non-Constraint Modifiers (aka Pure Modifiers) are subjectivemodifiers which can be dropped without changing intent

bull ldquopopularrdquo subjective can be neglected

Non-Constraint Modifiers Mining Construct Modifier Networks

Edges form a Modifier Network

Concept Hierarchy Tree in ldquoCountryrdquo domain

Modifier Network in ldquoCountryrdquo domainIn this case ldquoLargerdquo and ldquoToprdquo are pure modifiers

Country

Asian country

Developed country

Western country

Asian Developed Western

Western developed

country

Top western country

Large

Large Top

Top

WesternLarge Asian

country

Large developed

country

Top developed

country

Country

Asian Western

Developed

Large Top

bull Betweenness centrality is a measure of a nodes centrality in a network

bull Betweennes of node v is defined as

bull where 120590119904119905 is the total number of shortest paths from node s to node t and 120590119904119905(119907) is the number of those paths that pass through v

bull Normalization amp Aggregation

bull For a pure modifier it should have low betweenness centrality aggregation score PMS(t)

Non-Constraint Modifiers Mining Betweenness centrality

Head-Constraints Mining [Wang et al 2014b]

bull A term can be a head sometimes and be a constraint in some other cases

bull Eg Seattle hotel Seattle hotel job

head headconstraintconstraintconstraint

Head-Constraints Mining Acquiring Concept Patterns

Get entity pairs from query log

Conceptualization

Concept Patterns for each

prepositionsentity1 entity2

Extract Patterns

A for B A of BA with B A in BA on B A at B hellip

entity 1head entity 2constraint

concept11

concept12

concept13

concept14

concept21

concept22

concept23

(concept11 concept21) (concept11 concept22)(concept11 concept23)hellip

Concept Pattern Dictionary

Building concept pattern dictionary Query Logs

cover for iphone 6sbattery for sony a7rwicked on broadway

Why Concepts Canrsquot Be Too Generalbull It may cause too many concept pattern conflicts

canrsquot distinguish head and modifier for general concept pairs

Head Modifier

Derived Concept Pattern device company

Supporting Entity Pairs iphone 4 verizon

modem comcast

wireless router comcast

iphone 4 tmobile

Head Modifier

Derived Concept Pattern company device

Supporting Entity Pairs amazon books kindle

netflix touchpad

skype windows phone

netflix ps3

Conflict

Why Concepts Canrsquot Be Too Specificbull It may generate concepts with little coverage

bull Concept regresses to entitybull Large storage space up to (million million) patterns

hellip hellip

device largest desktop OS vendor

device largest software development company

device largest global corporation

device latest windows and office provider

hellip hellip

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept PatternsCluster size Sum of Cluster Score headconstraintscore

615 2114691 breedstate357298460224501

296 7752357 gameplatform627403476771856

153 3466804 accessoryvehicle53393705094809

70 118259 browserplatform132612807637391

22 1010993 requirementschool271407526294823

34 9489159 drugdisease154602405333541

42 8992995 cosmeticskin condition814659415003929

16 7421599 jobcity27903732555528

32 710403 accessoryphone246513830851194

18 6692376 softwareplatform210126322725878

20 6444603 testdisease239774028397537

27 5994205 clothesbreed98773996282851

19 5913545 penaltycrime200544192793488

25 5848804 taxstate240081818612579

16 5465424 saucemeat183592863621553

18 4809389 credit cardcountry142919087972152

14 4730792 foodholiday14554140330924

11 4536199 modgame257163856882439

29 4350954 garmentsport471533326845442

23 3994886 career informationprofessional732726483731257

15 386065 songinstrument128189481818135

18 378213 baitfish780426514113169

22 3722948 study guidebook508339765053921

19 3408953 pluginsbrowser550326072627126

14 3305753 recipemeat882779863422951

18 3214226 currencycountry110825444188352

13 3180272 lenscamera186081673263957

9 316973 decorationholiday130055844126533

16 314875 foodanimal7338544366514

game platform

game device

video game platform

game console game pad

game gaming platform

Game (Head) Platform (Modifier)

angry birds android

angry birds ios

angry birds windows 10

hellip hellip

Detection

Head Modifier Relationship

bull Train a classifier on

(head-embedding modifier-embedding)

bull Training data bull Positive (head modifier)bull Negative (modifier head)

bull Precision gt= 09 Recall gt= 09

bull Disadvantage not interpretable

Syntactic Parsing based on HM

bull Information is incompletebull Preposition and other function words

bull Within a noun compound el capitan macbook pro

bull Why not train a parser for web queries

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges Short Texts Lack Grammatical Signalsbull Lack function words word order

bull ldquotoys queriesrdquo has ambiguous intent

bull ldquodistance earth moonrdquo has clear intentbull many equivalent forms ldquoearth moon distancerdquo ldquoearth

distance moonrdquo hellip

Challenges Syntactic Parsing of Queries

bullNo standard

bullNo ground-truth

Why is syntactic parsing of queries even a legitimate problem

Derive Syntax from Semantics[Sun et al 2016]

bull Query ldquothai food houstonrdquo

bull Clicked sentence

bull Project dependency to the query

A Treebank for Short Texts

bull Given query 119902

bull Given 119902rsquos clicked sentence 119904

bull Parse each 119904

bull Project dependency from 119904 to 119902

bull Aggregate dependencies

Algorithm of Projection

Result Examples

Results

bull Random queries

QueryParser UAS 083 LAS 075Stanford UAS 072 LAS 064

bull Queries with no function words

QueryParser UAS 082 LAS 073Stanford UAS 070 LAS 061

bull Queries with function words

QueryParser UAS 090 LAS 085Stanford UAS 086 LAS 080

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

bull Measuring similarity between two short texts and sentences

bull Basic idea word-by-word comparison using embedding vector

bull Use saliency-weighted semantic graph to computer similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired

Bins of all edges Bins of max edges

119908isin119904119868

119868119863119865(119908) sdot)119904119890119898(119908 119904119904) sdot (1198961 + 1

൰119904119890119898(119908 119904119904) + 1198961 sdot (1 minus 119887 + 119887 sdot|119904119904|119886119907119892119897

Similarity measurement

termShort texts

Inspired by BM25

Semantic similarity

119891119904119905119904(119904119897 119904119904) =

From the Concept View

From the Concept View [Wang et al 2015a]

Co-occurrence Network

Bags of Concepts

Semantic Network

Short Text 1

Short Text 2

Concept Vector 1[(c1 score1) (c2 score2)hellip]

Concept Vector 2[(c1rsquo score1rsquo) (c2rsquo score2rsquo)hellip]

Similarity

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

bull Explicit short text understanding benefit lot of application scenariosbull Adssearch semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

000

100

200

300

400

500

600

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

000

010

020

030

040

050

060

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

Mainline Ads Sidebar Ads

Definition Mining [Hao et al 2016]

bull Definition scenarios search engines QnA etc

bull Why Conceptualization is useful for definition miningbull Examples ldquoWhat is Emphysemardquo

Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

bull This sentence has the form of definitionbull Embedding is helpful to some extent but it also return high similarity

score for (emphysema disease) and (emphysema smoking)

bull Conceptualization can provide strong semanticsbull Contextual embedding can also provide semantic similarity beyond Is-A

Answer 1

Answer 2

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

OfflineOffline OnlineOnline

Original Short textjustin bieber graduates

hellip

Knowledge base

Conceptualiztion

Concept Vector

Entity Extraction

Candidates Generation

Classification amp Ranking

Model LearningModel Learning

Concept Weighting

Model Model NModel i

Concept Model Concept Model

Class 1 Class NClass i

TrainingData

ltMusic Scoregt

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept SpaceArticle titlestagsin this category

119901119894

119901119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

hellip

hellip

120596119894

120596119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

120596119894

120596119895119901119894

119901119895

Precision performance on each category [Wang et al 2014a]

BocSTC LM_ch SVMVSM_cosi

neLM_d Entity_ESA

Movie 071 091 084 081 072 056

Money 097 095 054 057 052 074

Music 097 090 088 073 068 058

TV 096 046 092 056 051 055

0304050607080910

Pre

cisi

on

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledgeltigt in SIGMOD 2008

bull [ Auer et al 2007 ] Soumlren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Boumlrschinger Caumlcilia Zirn and AnasElghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and XindongWu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny BoyesBraem Basic objects in natural categories Cognitive psychology 8(3)382ndash439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny ZhuToward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 80: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Text segmentationbull Observations

bull Mutual Exclusion ndash terms containing the same word mutually exclude each other

bull Mutual Reinforcement ndash related terms mutually reinforce each other

bull Build a Candidate Term Graph (CTG)

ldquovacation april in parisrdquo ldquowatch harry potterrdquo

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Find best segmentation

bull Best segmentation= sub-graph in CTG whichbull Is a complete graph (clique)

bull No mutual exclusion

bull Has 100 word coveragebull Except for stopwords

bull Has the largest average edge weight

Is a segmentation

Best segmentation

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Find best segmentation

bull Best segmentation= sub-graph in CTG whichbull Is a complete graph (clique)

bull No mutual exclusion

bull Has 100 word coveragebull Except for stopwords

bull Has the largest average edge weight

Maximal Clique

Best segmentation

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Type Detection

bull Pairwise Modelbull Find the best typed-term for each term so that the

Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight

watch[v]

watch[e]

watch[c]

watch

free[adj]

free[v]

movie[c]

movie[e]

free

movie

Concept Labeling

bull Entity disambiguation is the most important task of concept labelingbull Filterre-rank of the original concept cluster vector

bull Weighted-Votebull The final score of each concept cluster is a combination

of its original score and the support from context using concept co-occurrence

watch harry potter read harry potter

movie book

Example of Entity Disambiguation[Hua et al 2015 (ICDE Best Paper) Hua et al 2016]

Co-occurrence network

Concept Vector

Semantic network

Short Text

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

c1 p1

c2 p2

c3 p3

hellip

ipad apple

fruithellip

companyhellip

foodhellip

producthellip

Is-A

filtering

ldquoipad applerdquo

producthellip

devicehellip

producthellip

brandhellip

companyhellip

devicehellip

co-occur

Is-A

Mining Lexical Relationships[Wang et al 2015b]

bull Lexical knowledge represented by the probabilities

verb

product book

movie

watch harry potter

119901 119907119890119903119887 119908119886119905119888ℎ

119901 119894119899119904119905119886119899119888119890 119908119886119905119888ℎ

119901 119898119900119907119894119890 ℎ119886119903119903119910 119901119900119905119905119890119903

119901 119898119900119907119894119890 119908119886119905119888ℎ 119907119890119903119887

119901 119887119900119900119896 ℎ119886119903119903119910 119901119900119905119905119890119903

119901 119888 119905 119911

119901 119888 119890 =119901 119888 119905 119911 = 119894119899119904119905119886119899119888119890

119901 119911 119905 ①②

③e instancet termc conceptz role

Understanding Queries [Wang et al 2015b]

bull Goal to rank the concepts and findarg max

119888119901(119888|119905 119902)

The offline semantic network

QueryAll possible

segmentations

Random walk with restart [Sun et al 2005]on the online subgraph

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

bull Example ldquopopular smart cover iphone 5srdquo

bull Definition bull Head acts to name the general (semantic) category to which the

whole short text belongs Usually the head is the intent of the short text

bull ldquosmart coverrdquo intent of the query

bull Constraints distinguish this member from other members of the same category

bull ldquoiphone 5srdquo limit the type of the head

bull Non-Constraint Modifiers (aka Pure Modifiers) are subjectivemodifiers which can be dropped without changing intent

bull ldquopopularrdquo subjective can be neglected

Non-Constraint Modifiers Mining Construct Modifier Networks

Edges form a Modifier Network

Concept Hierarchy Tree in ldquoCountryrdquo domain

Modifier Network in ldquoCountryrdquo domainIn this case ldquoLargerdquo and ldquoToprdquo are pure modifiers

Country

Asian country

Developed country

Western country

Asian Developed Western

Western developed

country

Top western country

Large

Large Top

Top

WesternLarge Asian

country

Large developed

country

Top developed

country

Country

Asian Western

Developed

Large Top

bull Betweenness centrality is a measure of a nodes centrality in a network

bull Betweennes of node v is defined as

bull where 120590119904119905 is the total number of shortest paths from node s to node t and 120590119904119905(119907) is the number of those paths that pass through v

bull Normalization amp Aggregation

bull For a pure modifier it should have low betweenness centrality aggregation score PMS(t)

Non-Constraint Modifiers Mining Betweenness centrality

Head-Constraints Mining [Wang et al 2014b]

bull A term can be a head sometimes and be a constraint in some other cases

bull Eg Seattle hotel Seattle hotel job

head headconstraintconstraintconstraint

Head-Constraints Mining Acquiring Concept Patterns

Get entity pairs from query log

Conceptualization

Concept Patterns for each

prepositionsentity1 entity2

Extract Patterns

A for B A of BA with B A in BA on B A at B hellip

entity 1head entity 2constraint

concept11

concept12

concept13

concept14

concept21

concept22

concept23

(concept11 concept21) (concept11 concept22)(concept11 concept23)hellip

Concept Pattern Dictionary

Building concept pattern dictionary Query Logs

cover for iphone 6sbattery for sony a7rwicked on broadway

Why Concepts Canrsquot Be Too Generalbull It may cause too many concept pattern conflicts

canrsquot distinguish head and modifier for general concept pairs

Head Modifier

Derived Concept Pattern device company

Supporting Entity Pairs iphone 4 verizon

modem comcast

wireless router comcast

iphone 4 tmobile

Head Modifier

Derived Concept Pattern company device

Supporting Entity Pairs amazon books kindle

netflix touchpad

skype windows phone

netflix ps3

Conflict

Why Concepts Canrsquot Be Too Specificbull It may generate concepts with little coverage

bull Concept regresses to entitybull Large storage space up to (million million) patterns

hellip hellip

device largest desktop OS vendor

device largest software development company

device largest global corporation

device latest windows and office provider

hellip hellip

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept PatternsCluster size Sum of Cluster Score headconstraintscore

615 2114691 breedstate357298460224501

296 7752357 gameplatform627403476771856

153 3466804 accessoryvehicle53393705094809

70 118259 browserplatform132612807637391

22 1010993 requirementschool271407526294823

34 9489159 drugdisease154602405333541

42 8992995 cosmeticskin condition814659415003929

16 7421599 jobcity27903732555528

32 710403 accessoryphone246513830851194

18 6692376 softwareplatform210126322725878

20 6444603 testdisease239774028397537

27 5994205 clothesbreed98773996282851

19 5913545 penaltycrime200544192793488

25 5848804 taxstate240081818612579

16 5465424 saucemeat183592863621553

18 4809389 credit cardcountry142919087972152

14 4730792 foodholiday14554140330924

11 4536199 modgame257163856882439

29 4350954 garmentsport471533326845442

23 3994886 career informationprofessional732726483731257

15 386065 songinstrument128189481818135

18 378213 baitfish780426514113169

22 3722948 study guidebook508339765053921

19 3408953 pluginsbrowser550326072627126

14 3305753 recipemeat882779863422951

18 3214226 currencycountry110825444188352

13 3180272 lenscamera186081673263957

9 316973 decorationholiday130055844126533

16 314875 foodanimal7338544366514

game platform

game device

video game platform

game console game pad

game gaming platform

Game (Head) Platform (Modifier)

angry birds android

angry birds ios

angry birds windows 10

hellip hellip

Detection

Head Modifier Relationship

bull Train a classifier on

(head-embedding modifier-embedding)

bull Training data bull Positive (head modifier)bull Negative (modifier head)

bull Precision gt= 09 Recall gt= 09

bull Disadvantage not interpretable

Syntactic Parsing based on HM

bull Information is incompletebull Preposition and other function words

bull Within a noun compound el capitan macbook pro

bull Why not train a parser for web queries

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges Short Texts Lack Grammatical Signalsbull Lack function words word order

bull ldquotoys queriesrdquo has ambiguous intent

bull ldquodistance earth moonrdquo has clear intentbull many equivalent forms ldquoearth moon distancerdquo ldquoearth

distance moonrdquo hellip

Challenges Syntactic Parsing of Queries

bullNo standard

bullNo ground-truth

Why is syntactic parsing of queries even a legitimate problem

Derive Syntax from Semantics[Sun et al 2016]

bull Query ldquothai food houstonrdquo

bull Clicked sentence

bull Project dependency to the query

A Treebank for Short Texts

bull Given query 119902

bull Given 119902rsquos clicked sentence 119904

bull Parse each 119904

bull Project dependency from 119904 to 119902

bull Aggregate dependencies

Algorithm of Projection

Result Examples

Results

bull Random queries

QueryParser UAS 083 LAS 075Stanford UAS 072 LAS 064

bull Queries with no function words

QueryParser UAS 082 LAS 073Stanford UAS 070 LAS 061

bull Queries with function words

QueryParser UAS 090 LAS 085Stanford UAS 086 LAS 080

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

bull Measuring similarity between two short texts and sentences

bull Basic idea word-by-word comparison using embedding vector

bull Use saliency-weighted semantic graph to computer similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired

Bins of all edges Bins of max edges

119908isin119904119868

119868119863119865(119908) sdot)119904119890119898(119908 119904119904) sdot (1198961 + 1

൰119904119890119898(119908 119904119904) + 1198961 sdot (1 minus 119887 + 119887 sdot|119904119904|119886119907119892119897

Similarity measurement

termShort texts

Inspired by BM25

Semantic similarity

119891119904119905119904(119904119897 119904119904) =

From the Concept View

From the Concept View [Wang et al 2015a]

Co-occurrence Network

Bags of Concepts

Semantic Network

Short Text 1

Short Text 2

Concept Vector 1[(c1 score1) (c2 score2)hellip]

Concept Vector 2[(c1rsquo score1rsquo) (c2rsquo score2rsquo)hellip]

Similarity

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

bull Explicit short text understanding benefit lot of application scenariosbull Adssearch semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

000

100

200

300

400

500

600

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

000

010

020

030

040

050

060

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

Mainline Ads Sidebar Ads

Definition Mining [Hao et al 2016]

bull Definition scenarios search engines QnA etc

bull Why Conceptualization is useful for definition miningbull Examples ldquoWhat is Emphysemardquo

Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

bull This sentence has the form of definitionbull Embedding is helpful to some extent but it also return high similarity

score for (emphysema disease) and (emphysema smoking)

bull Conceptualization can provide strong semanticsbull Contextual embedding can also provide semantic similarity beyond Is-A

Answer 1

Answer 2

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

OfflineOffline OnlineOnline

Original Short textjustin bieber graduates

hellip

Knowledge base

Conceptualiztion

Concept Vector

Entity Extraction

Candidates Generation

Classification amp Ranking

Model LearningModel Learning

Concept Weighting

Model Model NModel i

Concept Model Concept Model

Class 1 Class NClass i

TrainingData

ltMusic Scoregt

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept SpaceArticle titlestagsin this category

119901119894

119901119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

hellip

hellip

120596119894

120596119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

120596119894

120596119895119901119894

119901119895

Precision performance on each category [Wang et al 2014a]

BocSTC LM_ch SVMVSM_cosi

neLM_d Entity_ESA

Movie 071 091 084 081 072 056

Money 097 095 054 057 052 074

Music 097 090 088 073 068 058

TV 096 046 092 056 051 055

0304050607080910

Pre

cisi

on

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledgeltigt in SIGMOD 2008

bull [ Auer et al 2007 ] Soumlren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Boumlrschinger Caumlcilia Zirn and AnasElghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and XindongWu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny BoyesBraem Basic objects in natural categories Cognitive psychology 8(3)382ndash439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny ZhuToward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 81: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Find best segmentation

bull Best segmentation= sub-graph in CTG whichbull Is a complete graph (clique)

bull No mutual exclusion

bull Has 100 word coveragebull Except for stopwords

bull Has the largest average edge weight

Is a segmentation

Best segmentation

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Find best segmentation

bull Best segmentation= sub-graph in CTG whichbull Is a complete graph (clique)

bull No mutual exclusion

bull Has 100 word coveragebull Except for stopwords

bull Has the largest average edge weight

Maximal Clique

Best segmentation

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Type Detection

bull Pairwise Modelbull Find the best typed-term for each term so that the

Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight

watch[v]

watch[e]

watch[c]

watch

free[adj]

free[v]

movie[c]

movie[e]

free

movie

Concept Labeling

• Entity disambiguation is the most important task of concept labeling: filter and re-rank the original concept cluster vector.
• Weighted-Vote: the final score of each concept cluster is a combination of its original score and the support from context, using concept co-occurrence (see the sketch below).
• Example of entity disambiguation [Hua et al 2015 (ICDE Best Paper), Hua et al 2016]: "watch harry potter" → movie, "read harry potter" → book.

[Figure: the lexical-semantic analysis framework. A short text is parsed and conceptualized (term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, concept orthogonalization) against the semantic network and the co-occurrence network, producing a concept vector [(c1, p1), (c2, p2), (c3, p3), ...]. For "ipad apple": is-A conceptualization of "apple" alone gives {fruit, company, food, product, ...}, while "ipad" gives {product, device, ...}; co-occurrence filtering keeps the company/brand reading of "apple", so "ipad apple" maps to concepts such as {product, brand, company, device, ...}.]
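A minimal weighted-vote sketch; the concept clusters, co-occurrence strengths, and the mixing weight alpha are all made-up values for illustration.

```python
# Re-rank the concept clusters of an ambiguous term using co-occurrence
# with the concept/role of its context term.
def weighted_vote(clusters, context, cooccur, alpha=0.5):
    """clusters: {concept: original score}; context: concept of the context term;
    cooccur: {(concept, context concept): co-occurrence strength}."""
    rescored = {}
    for concept, score in clusters.items():
        support = cooccur.get((concept, context), 0.0)
        rescored[concept] = alpha * score + (1 - alpha) * support
    return max(rescored, key=rescored.get), rescored

# "harry potter" is ambiguous between movie and book; the verb "watch" supports movie.
clusters = {"movie": 0.55, "book": 0.45}
cooccur = {("movie", "verb:watch"): 0.8, ("book", "verb:watch"): 0.1}
print(weighted_vote(clusters, "verb:watch", cooccur))
# 'movie' wins: ~0.68 vs ~0.28
```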

Mining Lexical Relationships [Wang et al 2015b]

• Lexical knowledge is represented by probabilities over a semantic network linking verbs, concepts (e.g. product, book, movie) and instances; e.g. for "watch harry potter":
  • p(verb | "watch"), p(instance | "watch")
  • p(movie | "harry potter"), p(book | "harry potter")
  • p(movie | "watch", verb)
• General forms: p(c | t, z); p(c | e) = p(c | t, z = instance); p(z | t), where e = instance, t = term, c = concept, z = role.

Understanding Queries [Wang et al 2015b]

• Goal: rank the concepts and find argmax_c p(c | t, q).

[Figure: the offline semantic network is combined with all possible segmentations of the query to form an online subgraph; concepts are ranked by random walk with restart [Sun et al 2005] on this subgraph (a sketch follows below).]
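A minimal random-walk-with-restart sketch; the subgraph, its edge weights, and the restart probability are toy values, not the ones used in the tutorial.

```python
import numpy as np

# Toy query subgraph for "watch harry potter" (node names and weights are made up).
nodes = ["watch", "harry potter", "movie", "book", "verb"]
W = np.array([
    # watch  hp   movie book verb
    [0.0,   0.6,  0.5,  0.1, 0.8],   # watch
    [0.6,   0.0,  0.9,  0.7, 0.0],   # harry potter
    [0.5,   0.9,  0.0,  0.2, 0.3],   # movie
    [0.1,   0.7,  0.2,  0.0, 0.0],   # book
    [0.8,   0.0,  0.3,  0.0, 0.0],   # verb
])

def rwr(W, restart_nodes, alpha=0.15, iters=200):
    """Scores of a walk that restarts at the query's terms with probability alpha."""
    P = W / W.sum(axis=1, keepdims=True)     # row-stochastic transition matrix
    e = np.zeros(len(W))
    e[restart_nodes] = 1.0 / len(restart_nodes)
    r = e.copy()
    for _ in range(iters):
        r = (1 - alpha) * P.T @ r + alpha * e
    return r

terms = [nodes.index("watch"), nodes.index("harry potter")]
scores = rwr(W, terms)
print(sorted(zip(nodes, scores), key=lambda x: -x[1]))
# the concept node "movie" should outrank "book" in this toy graph
```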

Short Text Understanding

• How to segment this short text?
• What does this short text mean (its intent, senses, or concepts)?
• What are the relations among terms in the short text?
• How to calculate the similarity between short texts?

Head, Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

• Example: "popular smart cover iphone 5s"
• Definitions:
  • Head: names the general (semantic) category to which the whole short text belongs; usually the head is the intent of the short text. Here "smart cover" is the intent of the query.
  • Constraints: distinguish this member from other members of the same category. Here "iphone 5s" limits the type of the head.
  • Non-Constraint Modifiers (a.k.a. Pure Modifiers): subjective modifiers which can be dropped without changing the intent. Here "popular" is subjective and can be neglected.

Non-Constraint Modifiers Mining: Construct Modifier Networks

• Edges form a Modifier Network.

[Figure: a concept hierarchy tree in the "Country" domain (country → Asian country, developed country, Western country, plus phrases such as "Western developed country", "top western country", "large Asian country", "large developed country", "top developed country") and the corresponding modifier network over {Asian, Western, Developed, Large, Top}. In this case "Large" and "Top" are pure modifiers.]

Non-Constraint Modifiers Mining: Betweenness Centrality

• Betweenness centrality is a measure of a node's centrality in a network.
• Betweenness of node v is defined as g(v) = Σ_{s ≠ v ≠ t} σ_st(v) / σ_st, where σ_st is the total number of shortest paths from node s to node t and σ_st(v) is the number of those paths that pass through v.
• Normalization & aggregation: a pure modifier should have a low aggregated betweenness-centrality score PMS(t) (see the sketch below).
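A toy sketch of the betweenness signal using networkx; the modifier-network edges are invented for illustration, and the cross-domain aggregation into PMS(t) is omitted.

```python
import networkx as nx

# Illustrative "country"-domain modifier network: a connected core of constraint
# modifiers, with "Large" and "Top" hanging off the periphery.
G = nx.Graph()
G.add_edges_from([
    ("Asian", "Developed"), ("Developed", "Western"), ("Asian", "Western"),
    ("Large", "Developed"),   # "large" attaches loosely to the hierarchy
    ("Top", "Western"),       # so does "top"
])

bc = nx.betweenness_centrality(G)   # normalized sum of sigma_st(v) / sigma_st
for term, score in sorted(bc.items(), key=lambda x: x[1]):
    print(f"{term:10s} betweenness = {score:.3f}")
# In this toy graph the peripheral modifiers "Large" and "Top" get zero
# betweenness; the real PMS(t) aggregates such scores over many domains.
```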

Head-Constraints Mining [Wang et al 2014b]

• A term can be a head in some cases and a constraint in others.
• E.g. "seattle hotel": seattle = constraint, hotel = head; "seattle hotel job": seattle = constraint, hotel = constraint, job = head.

Head-Constraints Mining: Acquiring Concept Patterns

• Build the concept pattern dictionary from query logs, e.g. "cover for iphone 6s", "battery for sony a7r", "wicked on broadway".
• Extract preposition patterns ("A for B", "A of B", "A with B", "A in B", "A on B", "A at B", ...) to get entity pairs from the query log, with entity1 = head and entity2 = constraint.
• Conceptualize both entities: entity1 → {concept11, concept12, concept13, concept14, ...}, entity2 → {concept21, concept22, concept23, ...}.
• The concept pairs (concept11, concept21), (concept11, concept22), (concept11, concept23), ... become candidate entries of the Concept Pattern Dictionary (a sketch follows below).
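A toy sketch of this acquisition step; the queries, the conceptualization table, and the preposition list are illustrative stand-ins for the real query log and knowledge base.

```python
import re
from collections import Counter

# Made-up conceptualization table and query log for illustration.
concepts = {
    "iphone 6s": ["device", "phone"],
    "sony a7r": ["camera", "device"],
    "broadway": ["venue", "street"],
    "cover": ["accessory"],
    "battery": ["accessory", "part"],
    "wicked": ["musical", "show"],
}
queries = ["cover for iphone 6s", "battery for sony a7r", "wicked on broadway"]
pattern = re.compile(r"^(?P<head>.+?) (?:for|of|with|in|on|at) (?P<constraint>.+)$")

dictionary = Counter()
for q in queries:
    m = pattern.match(q)
    if not m:
        continue
    heads = concepts.get(m["head"], [])
    constraints = concepts.get(m["constraint"], [])
    # every (head concept, constraint concept) pair votes for a concept pattern
    dictionary.update((h, c) for h in heads for c in constraints)

print(dictionary.most_common(5))
# the (accessory, device) pattern gets the most support in this toy log
```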

Why Concepts Can't Be Too General

• It may cause too many concept-pattern conflicts: we can't distinguish head and modifier for general concept pairs.
• Conflict example:
  • Derived concept pattern (head = device, modifier = company), supporting entity pairs: (iphone 4, verizon), (modem, comcast), (wireless router, comcast), (iphone 4, tmobile).
  • Derived concept pattern (head = company, modifier = device), supporting entity pairs: (amazon books, kindle), (netflix, touchpad), (skype, windows phone), (netflix, ps3).

Why Concepts Can't Be Too Specific

• It may generate concepts with little coverage: the concept regresses to the entity.
• Large storage space: up to (million × million) patterns, e.g. (device, largest desktop OS vendor), (device, largest software development company), (device, largest global corporation), (device, latest windows and office provider), ...

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept Patterns

[Table: the top head/constraint concept patterns with their cluster sizes (the original table also lists a cluster-score sum and a head/constraint score per pattern): breed/state (615), game/platform (296), accessory/vehicle (153), browser/platform (70), requirement/school (22), drug/disease (34), cosmetic/skin condition (42), job/city (16), accessory/phone (32), software/platform (18), test/disease (20), clothes/breed (27), penalty/crime (19), tax/state (25), sauce/meat (16), credit card/country (18), food/holiday (14), mod/game (11), garment/sport (29), career information/professional (23), song/instrument (15), bait/fish (18), study guide/book (22), plugins/browser (19), recipe/meat (14), currency/country (18), lens/camera (13), decoration/holiday (9), food/animal (16).]

[Table: the game/platform pattern cluster includes variants such as "game platform", "game device", "video game platform", "game console", "game pad", "game gaming platform"; its Game (Head) / Platform (Modifier) entity pairs include (angry birds, android), (angry birds, ios), (angry birds, windows 10), ...]

Head-Modifier Relationship Detection

• Train a classifier on (head-embedding, modifier-embedding) pairs (a sketch follows below).
• Training data: positive = (head, modifier), negative = (modifier, head).
• Precision >= 0.9, Recall >= 0.9.
• Disadvantage: not interpretable.
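A minimal sketch of such a direction classifier using scikit-learn; the embeddings are random stand-ins and the tiny training set is invented, so this only illustrates the setup, not the reported accuracy.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Random stand-in embeddings; in practice they would come from word2vec or similar.
rng = np.random.default_rng(0)
emb = {w: rng.normal(size=50)
       for w in ["game", "android", "cover", "iphone", "recipe", "chicken"]}

pairs = [("game", "android"), ("cover", "iphone"), ("recipe", "chicken")]  # (head, modifier)
X, y = [], []
for head, mod in pairs:
    X.append(np.concatenate([emb[head], emb[mod]])); y.append(1)   # positive: (head, modifier)
    X.append(np.concatenate([emb[mod], emb[head]])); y.append(0)   # negative: (modifier, head)

clf = LogisticRegression(max_iter=1000).fit(np.vstack(X), y)
query = np.concatenate([emb["game"], emb["android"]]).reshape(1, -1)
print(clf.predict(query))   # expect [1]: "game" is the head, "android" the modifier
```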

Syntactic Parsing based on Head-Modifier Detection

• Head-modifier information alone is incomplete:
  • prepositions and other function words,
  • structure within a noun compound, e.g. "el capitan macbook pro".
• Why not train a parser for web queries?

Syntactic Parsing of Short Texts [Sun et al EMNLP 2016]

• Syntactic structures are valuable for short text understanding.
• Examples: [figure with example query parses]

Challenges: Short Texts Lack Grammatical Signals

• Lack of function words and word order:
  • "toys queries" has ambiguous intent;
  • "distance earth moon" has clear intent, but many equivalent forms: "earth moon distance", "earth distance moon", ...

Challenges: Syntactic Parsing of Queries

• No standard
• No ground truth
• Why is syntactic parsing of queries even a legitimate problem?

Derive Syntax from Semantics [Sun et al 2016]

• Query: "thai food houston"
• Take a sentence clicked for the query, parse it, and project its dependencies onto the query.

A Treebank for Short Texts

• Given a query q and q's clicked sentences s:
  • parse each s,
  • project the dependencies from s to q,
  • aggregate the projected dependencies.

Algorithm of Projection
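A toy sketch of the projection idea; the clicked-sentence parse below is hand-written, and the published algorithm additionally aggregates projections over many clicked sentences.

```python
# Hand-written parse of the clicked sentence "Thai food in Houston is amazing"
# (token -> head token, None = root).
parse = {
    "Thai": "food", "food": "is", "in": "food",
    "Houston": "in", "is": None, "amazing": "is",
}
query = ["thai", "food", "houston"]

def project(query, parse):
    """For each query token, climb the head chain in the sentence parse until
    another query token is reached (path contraction); tokens whose chain ends
    at the sentence root become the projected root."""
    in_query = {w.lower() for w in query}
    arcs = []
    for dep, head in parse.items():
        if dep.lower() not in in_query:
            continue
        h = head
        while h is not None and h.lower() not in in_query:
            h = parse.get(h)
        arcs.append((dep.lower(), h.lower() if h else "ROOT"))
    return arcs

print(project(query, parse))
# [('thai', 'food'), ('food', 'ROOT'), ('houston', 'food')]
```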

Result Examples

Results

• Random queries: QueryParser UAS 0.83, LAS 0.75; Stanford UAS 0.72, LAS 0.64
• Queries with no function words: QueryParser UAS 0.82, LAS 0.73; Stanford UAS 0.70, LAS 0.61
• Queries with function words: QueryParser UAS 0.90, LAS 0.85; Stanford UAS 0.86, LAS 0.80

Short Text Understanding

• How to segment this short text?
• What does this short text mean (its intent, senses, or concepts)?
• What are the relations among terms in the short text?
• How to calculate the similarity between short texts?

Short Text Similarity Using Word Embeddings [Kenter and Rijke 2015]

• Measures similarity between two short texts or sentences.
• Basic idea: word-by-word comparison using embedding vectors.
• Uses a saliency-weighted semantic graph to compute similarity.
• Features acquired: bins of all edges, bins of max edges.
• Similarity measurement, inspired by BM25 (a sketch follows below):

  f_sts(s_l, s_s) = Σ_{w ∈ s_l} IDF(w) · sem(w, s_s) · (k1 + 1) / ( sem(w, s_s) + k1 · (1 - b + b · |s_s| / avgsl) )

  where s_l and s_s are the two short texts, w ranges over the terms of s_l, sem(w, s_s) is the semantic similarity between term w and short text s_s, and IDF(w) provides the saliency weighting.
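A small sketch of this measure with toy embeddings and IDF weights; k1 and b follow common BM25 defaults, and sem(w, s_s) is taken as the best cosine match against the other text (all values are illustrative).

```python
import numpy as np

# Toy embeddings and uniform IDF for illustration only.
emb = {
    "thai": np.array([1.0, 0.1, 0.0]), "food": np.array([0.2, 1.0, 0.1]),
    "cuisine": np.array([0.3, 0.9, 0.2]), "houston": np.array([0.0, 0.2, 1.0]),
    "texas": np.array([0.1, 0.1, 0.9]),
}
idf = {w: 1.0 for w in emb}

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def sem(w, text):
    """Best embedding match of term w against the terms of the other short text."""
    return max(cos(emb[w], emb[v]) for v in text)

def f_sts(s_l, s_s, k1=1.2, b=0.75, avgsl=3.0):
    score = 0.0
    for w in s_l:
        s = sem(w, s_s)
        score += idf[w] * (s * (k1 + 1)) / (s + k1 * (1 - b + b * len(s_s) / avgsl))
    return score

print(f_sts(["thai", "food", "houston"], ["thai", "cuisine", "texas"]))
```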

From the Concept View [Wang et al 2015a]

[Figure: bag-of-concepts similarity. Each of the two short texts is parsed and conceptualized (term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, concept orthogonalization) against the semantic network and the co-occurrence network, yielding concept vectors [(c1, score1), (c2, score2), ...] and [(c1', score1'), (c2', score2'), ...]; the similarity of the short texts is then computed between the two concept vectors (a sketch follows below).]
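A minimal bag-of-concepts similarity sketch; the concept vectors below are made-up conceptualization outputs, and cosine is just one possible vector similarity.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse concept vectors given as dicts."""
    keys = set(u) | set(v)
    dot = sum(u.get(k, 0.0) * v.get(k, 0.0) for k in keys)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

vec1 = {"movie": 0.6, "fantasy novel": 0.3, "character": 0.1}   # "watch harry potter"
vec2 = {"movie": 0.5, "tv series": 0.3, "actor": 0.2}           # "watch game of thrones"
print(round(cosine(vec1, vec2), 3))   # ≈ 0.72
```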

Outline

• Knowledge Bases
• Explicit Representation Models
• Applications

Applications

• Explicit short text understanding benefits a lot of application scenarios:
  • Ads/search semantic match
  • Definition mining
  • Query recommendation
  • Web table understanding
  • Semantic search
  • ...

Ads Keyword Selection [Wang et al 2015a]

[Figure: two bar charts over Decile 4 through Decile 10, one for Mainline Ads (y-axis 0.00-6.00) and one for Sidebar Ads (y-axis 0.00-0.60).]

Definition Mining [Hao et al 2016]

bull Definition scenarios search engines QnA etc

bull Why Conceptualization is useful for definition miningbull Examples ldquoWhat is Emphysemardquo

Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

bull This sentence has the form of definitionbull Embedding is helpful to some extent but it also return high similarity

score for (emphysema disease) and (emphysema smoking)

bull Conceptualization can provide strong semanticsbull Contextual embedding can also provide semantic similarity beyond Is-A

Answer 1

Answer 2

Definition Mining [Hao et al 2016]

Concept-based Short Text Classification and Ranking [Wang et al 2014a]

[Figure: system architecture. Offline: training data → concept weighting → model learning → one concept model per class (Class 1, ..., Class i, ..., Class N). Online: an original short text (e.g. "justin bieber graduates") goes through entity extraction, conceptualization against the knowledge base, and candidate generation to form a concept vector, which is classified and ranked against the class models, producing e.g. <Music, Score>.]

Concept-based Short Text Classification and Ranking [Wang et al 2014a]

[Figures: categories and queries in the concept space. Article titles/tags of a category (e.g. TV) are conceptualized into the concept space with probabilities p_i, p_j; each category (TV, Music, Movie, ...) is then represented by concept weights ω_i, ω_j; an incoming query is conceptualized into the same concept space and scored against the category concept weights (a sketch of the scoring follows below).]
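A minimal sketch of the scoring step; the per-class concept weights and the query's conceptualization are invented numbers for illustration.

```python
# Score a query's concept vector against per-class concept weight vectors.
class_models = {
    "Music": {"singer": 0.6, "album": 0.3, "celebrity": 0.1},
    "TV":    {"tv show": 0.5, "channel": 0.3, "celebrity": 0.2},
    "Movie": {"movie": 0.6, "actor": 0.3, "celebrity": 0.1},
}

def classify(concept_vector):
    scores = {
        cls: sum(weight * concept_vector.get(c, 0.0) for c, weight in model.items())
        for cls, model in class_models.items()
    }
    return max(scores, key=scores.get), scores

# "justin bieber graduates" conceptualized to (toy) concepts:
query_concepts = {"singer": 0.7, "celebrity": 0.3}
print(classify(query_concepts))
# 'Music' wins: Music ≈ 0.45, TV ≈ 0.06, Movie ≈ 0.03
```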

Precision performance on each category [Wang et al 2014a]

Category | BocSTC | LM_ch | SVM  | VSM_cosine | LM_d | Entity_ESA
Movie    | 0.71   | 0.91  | 0.84 | 0.81       | 0.72 | 0.56
Money    | 0.97   | 0.95  | 0.54 | 0.57       | 0.52 | 0.74
Music    | 0.97   | 0.90  | 0.88 | 0.73       | 0.68 | 0.58
TV       | 0.96   | 0.46  | 0.92 | 0.56       | 0.51 | 0.55

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

• [Stark et al. 1998] Michael M. Stark and Richard F. Riesenfeld. WordNet: An Electronic Lexical Database. In Proceedings of the 11th Eurographics Workshop on Rendering, 1998.
• [Banko et al. 2007] Michele Banko, Michael J. Cafarella, Stephen Soderland, Matt Broadhead, and Oren Etzioni. Open Information Extraction from the Web. In IJCAI, 2007.
• [Etzioni et al. 2011] Oren Etzioni, Anthony Fader, Janara Christensen, Stephen Soderland, and Mausam. Open Information Extraction: The Second Generation. In IJCAI, vol. 11, pp. 3-10, 2011.
• [Carlson et al. 2010] A. Carlson, J. Betteridge, B. Kisiel, B. Settles, E. R. Hruschka Jr., and T. M. Mitchell. Toward an Architecture for Never-Ending Language Learning. In Proceedings of the Conference on Artificial Intelligence (AAAI), 2010.
• [Wu et al. 2012] Wentao Wu, Hongsong Li, Haixun Wang, and Kenny Zhu. Probase: A Probabilistic Taxonomy for Text Understanding. In ACM International Conference on Management of Data (SIGMOD), May 2012.
• [Bollacker et al. 2008] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. Freebase: A Collaboratively Created Graph Database for Structuring Human Knowledge. In SIGMOD, 2008.
• [Auer et al. 2007] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary G. Ives. DBpedia: A Nucleus for a Web of Open Data. In ISWC/ASWC, 2007.
• [Suchanek et al. 2007] Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. Yago: A Core of Semantic Knowledge. In WWW, 2007.
• [Wu et al. 2015] Sen Wu, Ce Zhang, Christopher De Sa, Jaeho Shin, Feiran Wang, and C. Ré. Incremental Knowledge Base Construction Using DeepDive. In VLDB, 2015.
• [Navigli et al. 2012] R. Navigli and S. Ponzetto. BabelNet: The Automatic Construction, Evaluation and Application of a Wide-Coverage Multilingual Semantic Network. Artificial Intelligence, 2012.
• [Nastase et al. 2010] Vivi Nastase, Michael Strube, Benjamin Börschinger, Cäcilia Zirn, and Anas Elghafari. WikiNet: A Very Large Scale Multi-Lingual Concept Network. In LREC, 2010.
• [Speer et al. 2013] Robert Speer and Catherine Havasi. ConceptNet 5: A Large Semantic Network for Relational Knowledge. In The People's Web Meets NLP, Springer Berlin Heidelberg, 2013.
• [Hua et al. 2016] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng, and Xiaofang Zhou. Understand Short Texts by Harvesting and Analyzing Semantic Knowledge. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2016.
• [Hua et al. 2015] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng, and Xiaofang Zhou. Short Text Understanding Through Lexical-Semantic Analysis. In International Conference on Data Engineering (ICDE), April 2015.
• [Li et al. 2013] Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang, and Xindong Wu. Computing Term Similarity by Large Probabilistic isA Knowledge. In ACM International Conference on Information and Knowledge Management (CIKM), 2013.
• [Li et al. 2015] Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang, Xue-Gang Hu, and Xindong Wu. A Large Probabilistic Semantic Network Based Approach to Compute Term Similarity. IEEE Transactions on Knowledge and Data Engineering (TKDE), 27(10): 2604-2617, 2015.
• [Rosch et al. 1976] Eleanor Rosch, Carolyn B. Mervis, Wayne D. Gray, David M. Johnson, and Penny Boyes-Braem. Basic Objects in Natural Categories. Cognitive Psychology, 8(3): 382-439, 1976.
• [Manning and Schutze 1999] Christopher D. Manning and Hinrich Schutze. Foundations of Statistical Natural Language Processing. Volume 999, MIT Press, 1999.
• [Wang et al. 2015b] Zhongyuan Wang, Kejun Zhao, Haixun Wang, Xiaofeng Meng, and Ji-Rong Wen. Query Understanding through Knowledge-Based Conceptualization. In IJCAI, July 2015.
• [Bergsma et al. 2007] Shane Bergsma and Qin Iris Wang. Learning Noun Phrase Query Segmentation. In EMNLP-CoNLL 2007, 819-826.
• [Tan et al. 2008] Bin Tan and Fuchun Peng. Unsupervised Query Segmentation Using Generative Language Models and Wikipedia. In WWW 2008, 347-356.
• [Li et al. 2011] Yanen Li, Bo-June Paul Hsu, ChengXiang Zhai, and Kuansan Wang. Unsupervised Query Segmentation Using Clickthrough for Information Retrieval. In SIGIR 2011, 285-294.
• [Guo et al. 2009] Jiafeng Guo, Gu Xu, Xueqi Cheng, and Hang Li. Named Entity Recognition in Query. In SIGIR 2009, 267-274.
• [Pantel et al. 2012] Patrick Pantel, Thomas Lin, and Michael Gamon. Mining Entity Types from Query Logs via User Intent Modeling. In ACL 2012, 563-571.
• [Joshi et al. 2014] Mandar Joshi, Uma Sawant, and Soumen Chakrabarti. Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries. In EMNLP 2014, 1104-1114.
• [Sawant et al. 2013] Uma Sawant and Soumen Chakrabarti. Learning Joint Query Interpretation and Response Ranking. In WWW 2013, 1099-1110.
• [Wang et al. 2014b] Zhongyuan Wang, Haixun Wang, and Zhirui Hu. Head, Modifier, and Constraint Detection in Short Texts. In International Conference on Data Engineering (ICDE), 2014.
• [Sun et al. 2016] Xiangyan Sun, Haixun Wang, Yanghua Xiao, and Zhongyuan Wang. Syntactic Parsing of Web Queries. In EMNLP 2016.
• [Kenter and Rijke 2015] Tom Kenter and Maarten de Rijke. Short Text Similarity with Word Embeddings. In CIKM 2015.
• [Wang et al. 2015a] Zhongyuan Wang, Haixun Wang, Ji-Rong Wen, and Yanghua Xiao. An Inference Approach to Basic Level of Categorization. In CIKM, October 2015.
• [Hao et al. 2016] Zehui Hao, Zhongyuan Wang, Xiaofeng Meng, and Jun Yan. Combining Language Model with Conceptualization for Definition Ranking. MSR Technical Report, 2016.
• [Wang et al. 2014a] Fang Wang, Zhongyuan Wang, Zhoujun Li, and Ji-Rong Wen. Concept-based Short Text Classification and Ranking. In CIKM 2014.
• [Wang et al. 2012a] Jingjing Wang, Haixun Wang, Zhongyuan Wang, and Kenny Zhu. Understanding Tables on the Web. In International Conference on Conceptual Modeling, October 2012.
• [Wang et al. 2012b] Yue Wang, Hongsong Li, Haixun Wang, and Kenny Zhu. Toward Topic Search on the Web. In International Conference on Conceptual Modeling, October 2012.

Page 82: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Find best segmentation

bull Best segmentation= sub-graph in CTG whichbull Is a complete graph (clique)

bull No mutual exclusion

bull Has 100 word coveragebull Except for stopwords

bull Has the largest average edge weight

Maximal Clique

Best segmentation

april in paris

vacation

april paris

13

0029

0005

0047

0041

13 13

23 harry potter

watch

harry potter

13

0014

0092

0053

0018

13 13

23

Type Detection

bull Pairwise Modelbull Find the best typed-term for each term so that the

Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight

watch[v]

watch[e]

watch[c]

watch

free[adj]

free[v]

movie[c]

movie[e]

free

movie

Concept Labeling

bull Entity disambiguation is the most important task of concept labelingbull Filterre-rank of the original concept cluster vector

bull Weighted-Votebull The final score of each concept cluster is a combination

of its original score and the support from context using concept co-occurrence

watch harry potter read harry potter

movie book

Example of Entity Disambiguation[Hua et al 2015 (ICDE Best Paper) Hua et al 2016]

Co-occurrence network

Concept Vector

Semantic network

Short Text

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

c1 p1

c2 p2

c3 p3

hellip

ipad apple

fruithellip

companyhellip

foodhellip

producthellip

Is-A

filtering

ldquoipad applerdquo

producthellip

devicehellip

producthellip

brandhellip

companyhellip

devicehellip

co-occur

Is-A

Mining Lexical Relationships[Wang et al 2015b]

bull Lexical knowledge represented by the probabilities

verb

product book

movie

watch harry potter

119901 119907119890119903119887 119908119886119905119888ℎ

119901 119894119899119904119905119886119899119888119890 119908119886119905119888ℎ

119901 119898119900119907119894119890 ℎ119886119903119903119910 119901119900119905119905119890119903

119901 119898119900119907119894119890 119908119886119905119888ℎ 119907119890119903119887

119901 119887119900119900119896 ℎ119886119903119903119910 119901119900119905119905119890119903

119901 119888 119905 119911

119901 119888 119890 =119901 119888 119905 119911 = 119894119899119904119905119886119899119888119890

119901 119911 119905 ①②

③e instancet termc conceptz role

Understanding Queries [Wang et al 2015b]

bull Goal to rank the concepts and findarg max

119888119901(119888|119905 119902)

The offline semantic network

QueryAll possible

segmentations

Random walk with restart [Sun et al 2005]on the online subgraph

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

bull Example ldquopopular smart cover iphone 5srdquo

bull Definition bull Head acts to name the general (semantic) category to which the

whole short text belongs Usually the head is the intent of the short text

bull ldquosmart coverrdquo intent of the query

bull Constraints distinguish this member from other members of the same category

bull ldquoiphone 5srdquo limit the type of the head

bull Non-Constraint Modifiers (aka Pure Modifiers) are subjectivemodifiers which can be dropped without changing intent

bull ldquopopularrdquo subjective can be neglected

Non-Constraint Modifiers Mining Construct Modifier Networks

Edges form a Modifier Network

Concept Hierarchy Tree in ldquoCountryrdquo domain

Modifier Network in ldquoCountryrdquo domainIn this case ldquoLargerdquo and ldquoToprdquo are pure modifiers

Country

Asian country

Developed country

Western country

Asian Developed Western

Western developed

country

Top western country

Large

Large Top

Top

WesternLarge Asian

country

Large developed

country

Top developed

country

Country

Asian Western

Developed

Large Top

bull Betweenness centrality is a measure of a nodes centrality in a network

bull Betweennes of node v is defined as

bull where 120590119904119905 is the total number of shortest paths from node s to node t and 120590119904119905(119907) is the number of those paths that pass through v

bull Normalization amp Aggregation

bull For a pure modifier it should have low betweenness centrality aggregation score PMS(t)

Non-Constraint Modifiers Mining Betweenness centrality

Head-Constraints Mining [Wang et al 2014b]

bull A term can be a head sometimes and be a constraint in some other cases

bull Eg Seattle hotel Seattle hotel job

head headconstraintconstraintconstraint

Head-Constraints Mining Acquiring Concept Patterns

Get entity pairs from query log

Conceptualization

Concept Patterns for each

prepositionsentity1 entity2

Extract Patterns

A for B A of BA with B A in BA on B A at B hellip

entity 1head entity 2constraint

concept11

concept12

concept13

concept14

concept21

concept22

concept23

(concept11 concept21) (concept11 concept22)(concept11 concept23)hellip

Concept Pattern Dictionary

Building concept pattern dictionary Query Logs

cover for iphone 6sbattery for sony a7rwicked on broadway

Why Concepts Canrsquot Be Too Generalbull It may cause too many concept pattern conflicts

canrsquot distinguish head and modifier for general concept pairs

Head Modifier

Derived Concept Pattern device company

Supporting Entity Pairs iphone 4 verizon

modem comcast

wireless router comcast

iphone 4 tmobile

Head Modifier

Derived Concept Pattern company device

Supporting Entity Pairs amazon books kindle

netflix touchpad

skype windows phone

netflix ps3

Conflict

Why Concepts Canrsquot Be Too Specificbull It may generate concepts with little coverage

bull Concept regresses to entitybull Large storage space up to (million million) patterns

hellip hellip

device largest desktop OS vendor

device largest software development company

device largest global corporation

device latest windows and office provider

hellip hellip

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept PatternsCluster size Sum of Cluster Score headconstraintscore

615 2114691 breedstate357298460224501

296 7752357 gameplatform627403476771856

153 3466804 accessoryvehicle53393705094809

70 118259 browserplatform132612807637391

22 1010993 requirementschool271407526294823

34 9489159 drugdisease154602405333541

42 8992995 cosmeticskin condition814659415003929

16 7421599 jobcity27903732555528

32 710403 accessoryphone246513830851194

18 6692376 softwareplatform210126322725878

20 6444603 testdisease239774028397537

27 5994205 clothesbreed98773996282851

19 5913545 penaltycrime200544192793488

25 5848804 taxstate240081818612579

16 5465424 saucemeat183592863621553

18 4809389 credit cardcountry142919087972152

14 4730792 foodholiday14554140330924

11 4536199 modgame257163856882439

29 4350954 garmentsport471533326845442

23 3994886 career informationprofessional732726483731257

15 386065 songinstrument128189481818135

18 378213 baitfish780426514113169

22 3722948 study guidebook508339765053921

19 3408953 pluginsbrowser550326072627126

14 3305753 recipemeat882779863422951

18 3214226 currencycountry110825444188352

13 3180272 lenscamera186081673263957

9 316973 decorationholiday130055844126533

16 314875 foodanimal7338544366514

game platform

game device

video game platform

game console game pad

game gaming platform

Game (Head) Platform (Modifier)

angry birds android

angry birds ios

angry birds windows 10

hellip hellip

Detection

Head Modifier Relationship

bull Train a classifier on

(head-embedding modifier-embedding)

bull Training data bull Positive (head modifier)bull Negative (modifier head)

bull Precision gt= 09 Recall gt= 09

bull Disadvantage not interpretable

Syntactic Parsing based on HM

bull Information is incompletebull Preposition and other function words

bull Within a noun compound el capitan macbook pro

bull Why not train a parser for web queries

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges Short Texts Lack Grammatical Signalsbull Lack function words word order

bull ldquotoys queriesrdquo has ambiguous intent

bull ldquodistance earth moonrdquo has clear intentbull many equivalent forms ldquoearth moon distancerdquo ldquoearth

distance moonrdquo hellip

Challenges Syntactic Parsing of Queries

bullNo standard

bullNo ground-truth

Why is syntactic parsing of queries even a legitimate problem

Derive Syntax from Semantics[Sun et al 2016]

bull Query ldquothai food houstonrdquo

bull Clicked sentence

bull Project dependency to the query

A Treebank for Short Texts

bull Given query 119902

bull Given 119902rsquos clicked sentence 119904

bull Parse each 119904

bull Project dependency from 119904 to 119902

bull Aggregate dependencies

Algorithm of Projection

Result Examples

Results

bull Random queries

QueryParser UAS 083 LAS 075Stanford UAS 072 LAS 064

bull Queries with no function words

QueryParser UAS 082 LAS 073Stanford UAS 070 LAS 061

bull Queries with function words

QueryParser UAS 090 LAS 085Stanford UAS 086 LAS 080

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

bull Measuring similarity between two short texts and sentences

bull Basic idea word-by-word comparison using embedding vector

bull Use saliency-weighted semantic graph to computer similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired

Bins of all edges Bins of max edges

119908isin119904119868

119868119863119865(119908) sdot)119904119890119898(119908 119904119904) sdot (1198961 + 1

൰119904119890119898(119908 119904119904) + 1198961 sdot (1 minus 119887 + 119887 sdot|119904119904|119886119907119892119897

Similarity measurement

termShort texts

Inspired by BM25

Semantic similarity

119891119904119905119904(119904119897 119904119904) =

From the Concept View

From the Concept View [Wang et al 2015a]

Co-occurrence Network

Bags of Concepts

Semantic Network

Short Text 1

Short Text 2

Concept Vector 1[(c1 score1) (c2 score2)hellip]

Concept Vector 2[(c1rsquo score1rsquo) (c2rsquo score2rsquo)hellip]

Similarity

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

bull Explicit short text understanding benefit lot of application scenariosbull Adssearch semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

000

100

200

300

400

500

600

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

000

010

020

030

040

050

060

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

Mainline Ads Sidebar Ads

Definition Mining [Hao et al 2016]

bull Definition scenarios search engines QnA etc

bull Why Conceptualization is useful for definition miningbull Examples ldquoWhat is Emphysemardquo

Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

bull This sentence has the form of definitionbull Embedding is helpful to some extent but it also return high similarity

score for (emphysema disease) and (emphysema smoking)

bull Conceptualization can provide strong semanticsbull Contextual embedding can also provide semantic similarity beyond Is-A

Answer 1

Answer 2

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

OfflineOffline OnlineOnline

Original Short textjustin bieber graduates

hellip

Knowledge base

Conceptualiztion

Concept Vector

Entity Extraction

Candidates Generation

Classification amp Ranking

Model LearningModel Learning

Concept Weighting

Model Model NModel i

Concept Model Concept Model

Class 1 Class NClass i

TrainingData

ltMusic Scoregt

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept SpaceArticle titlestagsin this category

119901119894

119901119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

hellip

hellip

120596119894

120596119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

120596119894

120596119895119901119894

119901119895

Precision performance on each category [Wang et al 2014a]

BocSTC LM_ch SVMVSM_cosi

neLM_d Entity_ESA

Movie 071 091 084 081 072 056

Money 097 095 054 057 052 074

Music 097 090 088 073 068 058

TV 096 046 092 056 051 055

0304050607080910

Pre

cisi

on

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledgeltigt in SIGMOD 2008

bull [ Auer et al 2007 ] Soumlren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Boumlrschinger Caumlcilia Zirn and AnasElghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and XindongWu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny BoyesBraem Basic objects in natural categories Cognitive psychology 8(3)382ndash439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny ZhuToward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 83: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Type Detection

bull Pairwise Modelbull Find the best typed-term for each term so that the

Maximum Spanning Tree of the resulting sub-graph between typed-terms has the largest weight

watch[v]

watch[e]

watch[c]

watch

free[adj]

free[v]

movie[c]

movie[e]

free

movie

Concept Labeling

bull Entity disambiguation is the most important task of concept labelingbull Filterre-rank of the original concept cluster vector

bull Weighted-Votebull The final score of each concept cluster is a combination

of its original score and the support from context using concept co-occurrence

watch harry potter read harry potter

movie book

Example of Entity Disambiguation[Hua et al 2015 (ICDE Best Paper) Hua et al 2016]

Co-occurrence network

Concept Vector

Semantic network

Short Text

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

c1 p1

c2 p2

c3 p3

hellip

ipad apple

fruithellip

companyhellip

foodhellip

producthellip

Is-A

filtering

ldquoipad applerdquo

producthellip

devicehellip

producthellip

brandhellip

companyhellip

devicehellip

co-occur

Is-A

Mining Lexical Relationships[Wang et al 2015b]

bull Lexical knowledge represented by the probabilities

verb

product book

movie

watch harry potter

119901 119907119890119903119887 119908119886119905119888ℎ

119901 119894119899119904119905119886119899119888119890 119908119886119905119888ℎ

119901 119898119900119907119894119890 ℎ119886119903119903119910 119901119900119905119905119890119903

119901 119898119900119907119894119890 119908119886119905119888ℎ 119907119890119903119887

119901 119887119900119900119896 ℎ119886119903119903119910 119901119900119905119905119890119903

119901 119888 119905 119911

119901 119888 119890 =119901 119888 119905 119911 = 119894119899119904119905119886119899119888119890

119901 119911 119905 ①②

③e instancet termc conceptz role

Understanding Queries [Wang et al 2015b]

bull Goal to rank the concepts and findarg max

119888119901(119888|119905 119902)

The offline semantic network

QueryAll possible

segmentations

Random walk with restart [Sun et al 2005]on the online subgraph

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

bull Example ldquopopular smart cover iphone 5srdquo

bull Definition bull Head acts to name the general (semantic) category to which the

whole short text belongs Usually the head is the intent of the short text

bull ldquosmart coverrdquo intent of the query

bull Constraints distinguish this member from other members of the same category

bull ldquoiphone 5srdquo limit the type of the head

bull Non-Constraint Modifiers (aka Pure Modifiers) are subjectivemodifiers which can be dropped without changing intent

bull ldquopopularrdquo subjective can be neglected

Non-Constraint Modifiers Mining Construct Modifier Networks

Edges form a Modifier Network

Concept Hierarchy Tree in ldquoCountryrdquo domain

Modifier Network in ldquoCountryrdquo domainIn this case ldquoLargerdquo and ldquoToprdquo are pure modifiers

Country

Asian country

Developed country

Western country

Asian Developed Western

Western developed

country

Top western country

Large

Large Top

Top

WesternLarge Asian

country

Large developed

country

Top developed

country

Country

Asian Western

Developed

Large Top

bull Betweenness centrality is a measure of a nodes centrality in a network

bull Betweennes of node v is defined as

bull where 120590119904119905 is the total number of shortest paths from node s to node t and 120590119904119905(119907) is the number of those paths that pass through v

bull Normalization amp Aggregation

bull For a pure modifier it should have low betweenness centrality aggregation score PMS(t)

Non-Constraint Modifiers Mining Betweenness centrality

Head-Constraints Mining [Wang et al 2014b]

bull A term can be a head sometimes and be a constraint in some other cases

bull Eg Seattle hotel Seattle hotel job

head headconstraintconstraintconstraint

Head-Constraints Mining Acquiring Concept Patterns

Get entity pairs from query log

Conceptualization

Concept Patterns for each

prepositionsentity1 entity2

Extract Patterns

A for B A of BA with B A in BA on B A at B hellip

entity 1head entity 2constraint

concept11

concept12

concept13

concept14

concept21

concept22

concept23

(concept11 concept21) (concept11 concept22)(concept11 concept23)hellip

Concept Pattern Dictionary

Building concept pattern dictionary Query Logs

cover for iphone 6sbattery for sony a7rwicked on broadway

Why Concepts Canrsquot Be Too Generalbull It may cause too many concept pattern conflicts

canrsquot distinguish head and modifier for general concept pairs

Head Modifier

Derived Concept Pattern device company

Supporting Entity Pairs iphone 4 verizon

modem comcast

wireless router comcast

iphone 4 tmobile

Head Modifier

Derived Concept Pattern company device

Supporting Entity Pairs amazon books kindle

netflix touchpad

skype windows phone

netflix ps3

Conflict

Why Concepts Canrsquot Be Too Specificbull It may generate concepts with little coverage

bull Concept regresses to entitybull Large storage space up to (million million) patterns

hellip hellip

device largest desktop OS vendor

device largest software development company

device largest global corporation

device latest windows and office provider

hellip hellip

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept PatternsCluster size Sum of Cluster Score headconstraintscore

615 2114691 breedstate357298460224501

296 7752357 gameplatform627403476771856

153 3466804 accessoryvehicle53393705094809

70 118259 browserplatform132612807637391

22 1010993 requirementschool271407526294823

34 9489159 drugdisease154602405333541

42 8992995 cosmeticskin condition814659415003929

16 7421599 jobcity27903732555528

32 710403 accessoryphone246513830851194

18 6692376 softwareplatform210126322725878

20 6444603 testdisease239774028397537

27 5994205 clothesbreed98773996282851

19 5913545 penaltycrime200544192793488

25 5848804 taxstate240081818612579

16 5465424 saucemeat183592863621553

18 4809389 credit cardcountry142919087972152

14 4730792 foodholiday14554140330924

11 4536199 modgame257163856882439

29 4350954 garmentsport471533326845442

23 3994886 career informationprofessional732726483731257

15 386065 songinstrument128189481818135

18 378213 baitfish780426514113169

22 3722948 study guidebook508339765053921

19 3408953 pluginsbrowser550326072627126

14 3305753 recipemeat882779863422951

18 3214226 currencycountry110825444188352

13 3180272 lenscamera186081673263957

9 316973 decorationholiday130055844126533

16 314875 foodanimal7338544366514

game platform

game device

video game platform

game console game pad

game gaming platform

Game (Head) Platform (Modifier)

angry birds android

angry birds ios

angry birds windows 10

hellip hellip

Detection

Head Modifier Relationship

bull Train a classifier on

(head-embedding modifier-embedding)

bull Training data bull Positive (head modifier)bull Negative (modifier head)

bull Precision gt= 09 Recall gt= 09

bull Disadvantage not interpretable

Syntactic Parsing based on HM

bull Information is incompletebull Preposition and other function words

bull Within a noun compound el capitan macbook pro

bull Why not train a parser for web queries

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges Short Texts Lack Grammatical Signalsbull Lack function words word order

bull ldquotoys queriesrdquo has ambiguous intent

bull ldquodistance earth moonrdquo has clear intentbull many equivalent forms ldquoearth moon distancerdquo ldquoearth

distance moonrdquo hellip

Challenges Syntactic Parsing of Queries

bullNo standard

bullNo ground-truth

Why is syntactic parsing of queries even a legitimate problem

Derive Syntax from Semantics[Sun et al 2016]

bull Query ldquothai food houstonrdquo

bull Clicked sentence

bull Project dependency to the query

A Treebank for Short Texts

bull Given query 119902

bull Given 119902rsquos clicked sentence 119904

bull Parse each 119904

bull Project dependency from 119904 to 119902

bull Aggregate dependencies

Algorithm of Projection

Result Examples

Results

bull Random queries

QueryParser UAS 083 LAS 075Stanford UAS 072 LAS 064

bull Queries with no function words

QueryParser UAS 082 LAS 073Stanford UAS 070 LAS 061

bull Queries with function words

QueryParser UAS 090 LAS 085Stanford UAS 086 LAS 080

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

bull Measuring similarity between two short texts and sentences

bull Basic idea word-by-word comparison using embedding vector

bull Use saliency-weighted semantic graph to computer similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired

Bins of all edges Bins of max edges

119908isin119904119868

119868119863119865(119908) sdot)119904119890119898(119908 119904119904) sdot (1198961 + 1

൰119904119890119898(119908 119904119904) + 1198961 sdot (1 minus 119887 + 119887 sdot|119904119904|119886119907119892119897

Similarity measurement

termShort texts

Inspired by BM25

Semantic similarity

119891119904119905119904(119904119897 119904119904) =

From the Concept View

From the Concept View [Wang et al 2015a]

Co-occurrence Network

Bags of Concepts

Semantic Network

Short Text 1

Short Text 2

Concept Vector 1[(c1 score1) (c2 score2)hellip]

Concept Vector 2[(c1rsquo score1rsquo) (c2rsquo score2rsquo)hellip]

Similarity

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

bull Explicit short text understanding benefit lot of application scenariosbull Adssearch semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

000

100

200

300

400

500

600

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

000

010

020

030

040

050

060

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

Mainline Ads Sidebar Ads

Definition Mining [Hao et al 2016]

bull Definition scenarios search engines QnA etc

bull Why Conceptualization is useful for definition miningbull Examples ldquoWhat is Emphysemardquo

Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

bull This sentence has the form of definitionbull Embedding is helpful to some extent but it also return high similarity

score for (emphysema disease) and (emphysema smoking)

bull Conceptualization can provide strong semanticsbull Contextual embedding can also provide semantic similarity beyond Is-A

Answer 1

Answer 2

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

OfflineOffline OnlineOnline

Original Short textjustin bieber graduates

hellip

Knowledge base

Conceptualiztion

Concept Vector

Entity Extraction

Candidates Generation

Classification amp Ranking

Model LearningModel Learning

Concept Weighting

Model Model NModel i

Concept Model Concept Model

Class 1 Class NClass i

TrainingData

ltMusic Scoregt

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept SpaceArticle titlestagsin this category

119901119894

119901119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

hellip

hellip

120596119894

120596119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

120596119894

120596119895119901119894

119901119895

Precision performance on each category [Wang et al 2014a]

BocSTC LM_ch SVMVSM_cosi

neLM_d Entity_ESA

Movie 071 091 084 081 072 056

Money 097 095 054 057 052 074

Music 097 090 088 073 068 058

TV 096 046 092 056 051 055

0304050607080910

Pre

cisi

on

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledgeltigt in SIGMOD 2008

bull [ Auer et al 2007 ] Soumlren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Boumlrschinger Caumlcilia Zirn and AnasElghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

Concept Labeling

• Entity disambiguation is the most important task of concept labeling
• Filter/re-rank the original concept cluster vector
• Weighted vote: the final score of each concept cluster combines its original score with the support it receives from the context, computed via concept co-occurrence (a minimal sketch follows the example below)

Example: "watch harry potter" → movie; "read harry potter" → book
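A minimal sketch of the weighted-vote re-ranking, assuming we already have the ambiguous term's candidate concept clusters with isA scores, the context's concept scores, and a co-occurrence lookup; the combination weight alpha and the function names are illustrative, not the exact scoring of [Hua et al 2015].

# Illustrative weighted-vote re-ranking of concept clusters (assumed inputs).
def rerank_concepts(candidates, context_concepts, cooccur, alpha=0.5):
    """candidates: {concept: isa_score} for the ambiguous term.
    context_concepts: {concept: score} from the other terms in the short text.
    cooccur: function (c1, c2) -> co-occurrence strength in [0, 1]."""
    reranked = {}
    for c, isa_score in candidates.items():
        # Support = co-occurrence-weighted vote from the context's concepts.
        support = sum(w * cooccur(c, ctx) for ctx, w in context_concepts.items())
        reranked[c] = alpha * isa_score + (1 - alpha) * support
    return sorted(reranked.items(), key=lambda kv: kv[1], reverse=True)

# For "watch harry potter", concepts voted for by "watch" (e.g. movie-related
# activities) push the "movie" sense of harry potter above the "book" sense.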

Example of Entity Disambiguation [Hua et al 2015 (ICDE Best Paper), Hua et al 2016]

Pipeline: Short Text → Parsing → Conceptualization → Concept Vector, where conceptualization performs term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, and concept orthogonalization, drawing on the semantic (isA) network and the co-occurrence network.

Worked example for "ipad apple": isA lookup gives apple → {fruit…, company…, food…, product…, brand…} and ipad → {product…, device…}; co-occurrence-based filtering keeps the senses that co-occur (apple as company/brand, ipad as device/product), producing the final concept vector [(c1, p1), (c2, p2), (c3, p3), …].

Mining Lexical Relationships [Wang et al 2015b]

• Lexical knowledge is represented by probabilities over terms, roles, and concepts (e: instance, t: term, c: concept, z: role): p(z | t), p(c | t, z), and p(c | e) = p(c | t, z = instance).
• Example ("watch harry potter"): p(verb | watch), p(instance | watch), p(movie | harry potter), p(book | harry potter), p(movie | watch, verb).
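A toy illustration of how such probability tables might be consulted; all table values and the dictionary layout below are invented for illustration and are not Probase data.

# Toy lookup of the lexical probabilities above (values are invented).
p_role = {("watch", "verb"): 0.7, ("watch", "instance"): 0.3}        # p(z | t)
p_concept = {("harry potter", "instance", "movie"): 0.6,             # p(c | t, z)
             ("harry potter", "instance", "book"): 0.4}

def p_concept_given_entity(term, concept):
    """p(c | e) = p(c | t, z = instance)."""
    return p_concept.get((term, "instance", concept), 0.0)

def best_role(term):
    """argmax_z p(z | t)."""
    roles = {z: p for (t, z), p in p_role.items() if t == term}
    return max(roles, key=roles.get) if roles else None

print(best_role("watch"))                               # 'verb'
print(p_concept_given_entity("harry potter", "movie"))  # 0.6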

Understanding Queries [Wang et al 2015b]

• Goal: rank the concepts and find arg max_c p(c | t, q)

Approach: generate all possible segmentations of the query, pull the relevant portion of the offline semantic network into an online subgraph, and run random walk with restart [Sun et al 2005] on that online subgraph (a sketch of the walk follows below).
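The random walk with restart itself can be sketched as below in its standard power-iteration form; how the online subgraph is assembled (nodes for segmentations, terms, and candidate concepts) is assumed here rather than taken from the paper.

import numpy as np

def random_walk_with_restart(A, restart_idx, c=0.15, iters=100):
    """A: column-stochastic transition matrix of the online subgraph.
    restart_idx: index of the query/restart node. c: restart probability.
    Returns a relevance score for every node (higher = more relevant concept)."""
    n = A.shape[0]
    e = np.zeros(n)
    e[restart_idx] = 1.0
    r = e.copy()
    for _ in range(iters):
        r = (1 - c) * A.dot(r) + c * e
    return r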

Short Text Understanding

• How to segment this short text?
• What does this short text mean (its intent, senses, or concepts)?
• What are the relations among terms in the short text?
• How to calculate the similarity between short texts?

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

• Example: "popular smart cover iphone 5s"
• Definitions:
• Head: names the general (semantic) category to which the whole short text belongs; usually the head is the intent of the short text. Here "smart cover" is the intent of the query.
• Constraints: distinguish this member from other members of the same category. Here "iphone 5s" limits the type of the head.
• Non-constraint modifiers (a.k.a. pure modifiers): subjective modifiers that can be dropped without changing the intent. Here "popular" is subjective and can be neglected.

Non-Constraint Modifier Mining: Constructing Modifier Networks

Edges form a Modifier Network

Concept hierarchy tree in the "Country" domain vs. the modifier network in the "Country" domain; in this case "Large" and "Top" are pure modifiers.

Figure: the hierarchy contains Country and sub-concepts such as Asian country, Developed country, Western country, Western developed country, Large Asian country, Top western country, Large developed country, and Top developed country; the derived modifier network connects the modifiers {Asian, Western, Developed, Large, Top} to Country.

Non-Constraint Modifier Mining: Betweenness Centrality
• Betweenness centrality measures a node's centrality in a network.
• The betweenness of node v is defined as g(v) = Σ_{s ≠ v ≠ t} σ_st(v) / σ_st, where σ_st is the total number of shortest paths from node s to node t and σ_st(v) is the number of those paths that pass through v.
• Normalization & aggregation: normalize per network and aggregate across domains into a score PMS(t).
• A pure modifier should have a low aggregated betweenness-centrality score PMS(t) (see the sketch below).
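A minimal way to compute the aggregated betweenness score, assuming the per-domain modifier networks are available as edge lists; the PMS(t) aggregation is simplified here to an average of normalized betweenness values.

import networkx as nx

def pure_modifier_scores(edge_lists):
    """edge_lists: {domain: [(u, v), ...]} modifier networks per concept domain.
    Returns an averaged normalized betweenness per modifier term."""
    totals, counts = {}, {}
    for domain, edges in edge_lists.items():
        g = nx.Graph(edges)
        bc = nx.betweenness_centrality(g, normalized=True)
        for term, score in bc.items():
            totals[term] = totals.get(term, 0.0) + score
            counts[term] = counts.get(term, 0) + 1
    return {t: totals[t] / counts[t] for t in totals}

# Terms with consistently low aggregated betweenness (e.g. "large", "top")
# behave like pure modifiers; "western" or "developed" score higher.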

Head-Constraints Mining [Wang et al 2014b]

• A term can be the head in some short texts and a constraint in others.
• E.g., in "Seattle hotel", hotel is the head and Seattle is a constraint; in "Seattle hotel job", job is the head while Seattle and hotel are both constraints.

Head-Constraints Mining: Acquiring Concept Patterns

Building the concept pattern dictionary from query logs: extract preposition patterns ("A for B", "A of B", "A with B", "A in B", "A on B", "A at B", …) from queries such as "cover for iphone 6s", "battery for sony a7r", "wicked on broadway"; this yields entity pairs (entity1 = head, entity2 = constraint) for each preposition. Conceptualize both entities (entity1 → concept11, concept12, concept13, concept14, …; entity2 → concept21, concept22, concept23, …) and emit all concept pairs (concept11, concept21), (concept11, concept22), (concept11, concept23), … into the concept pattern dictionary. A sketch of this step follows.
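The acquisition step might look like the sketch below; the regular expression, the conceptualize() helper, and top_k are stand-ins for the real query-log pipeline and the isA taxonomy lookup.

import re
from collections import Counter

PREP_PATTERN = r"(?P<head>.+?)\s+(?:for|of|with|in|on|at)\s+(?P<constraint>.+)"

def mine_concept_patterns(queries, conceptualize, top_k=3):
    """queries: iterable of query strings.
    conceptualize: entity -> ranked list of concepts (hypothetical isA lookup).
    Returns counts of (head_concept, constraint_concept) patterns."""
    patterns = Counter()
    for q in queries:
        m = re.match(PREP_PATTERN, q)
        if not m:
            continue
        head_concepts = conceptualize(m.group("head"))[:top_k]
        constraint_concepts = conceptualize(m.group("constraint"))[:top_k]
        for hc in head_concepts:
            for cc in constraint_concepts:
                patterns[(hc, cc)] += 1   # e.g. ("accessory", "phone")
    return patterns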

Why Concepts Can't Be Too General
• Overly general concepts cause too many concept pattern conflicts: head and modifier cannot be distinguished for general concept pairs.
• Conflict example:
• Derived concept pattern (head = device, modifier = company); supporting entity pairs: (iphone 4, verizon), (modem, comcast), (wireless router, comcast), (iphone 4, tmobile).
• Derived concept pattern (head = company, modifier = device); supporting entity pairs: (amazon books, kindle), (netflix, touchpad), (skype, windows phone), (netflix, ps3).

Why Concepts Can't Be Too Specific
• Overly specific concepts have little coverage: the concept regresses to the entity itself.
• Storage space blows up, to as many as (millions × millions) of patterns, e.g., (device, largest desktop OS vendor), (device, largest software development company), (device, largest global corporation), (device, latest windows and office provider), …

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top concept patterns (cluster size, head/constraint):
615 breed/state; 296 game/platform; 153 accessory/vehicle; 70 browser/platform; 22 requirement/school; 34 drug/disease; 42 cosmetic/skin condition; 16 job/city; 32 accessory/phone; 18 software/platform; 20 test/disease; 27 clothes/breed; 19 penalty/crime; 25 tax/state; 16 sauce/meat; 18 credit card/country; 14 food/holiday; 11 mod/game; 29 garment/sport; 23 career information/professional; 15 song/instrument; 18 bait/fish; 22 study guide/book; 19 plugins/browser; 14 recipe/meat; 18 currency/country; 13 lens/camera; 9 decoration/holiday; 16 food/animal.

Example cluster, Game (Head) / Platform (Modifier): concept patterns {game/platform, game/device, video game/platform, game console/game pad, game/gaming platform}; supporting entity pairs: (angry birds, android), (angry birds, ios), (angry birds, windows 10), …

Head-Modifier Relationship Detection
• Train a classifier on (head-embedding, modifier-embedding) features.
• Training data: positive examples are (head, modifier) pairs, negative examples are the reversed (modifier, head) pairs.
• Precision >= 0.9, recall >= 0.9.
• Disadvantage: not interpretable. A sketch of such a classifier follows.
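A hedged sketch of such a classifier, using concatenated word embeddings and logistic regression; the embedding lookup and the training pairs are placeholders, and the actual model used in the tutorial may differ.

import numpy as np
from sklearn.linear_model import LogisticRegression

def build_dataset(pairs, embed):
    """pairs: [(head, modifier), ...]; embed: word -> vector (placeholder lookup).
    Each pair gives a positive (head, modifier) and a negative (modifier, head)."""
    X, y = [], []
    for head, mod in pairs:
        X.append(np.concatenate([embed(head), embed(mod)])); y.append(1)
        X.append(np.concatenate([embed(mod), embed(head)])); y.append(0)
    return np.array(X), np.array(y)

def train_hm_classifier(pairs, embed):
    X, y = build_dataset(pairs, embed)
    return LogisticRegression(max_iter=1000).fit(X, y)

# The returned model's predict_proba on a concatenated (a, b) vector scores
# how likely a is the head and b the modifier.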

Syntactic Parsing Based on Head-Modifier (HM)
• HM information alone is incomplete: it misses prepositions and other function words, and the structure within a noun compound (e.g., "el capitan macbook pro").
• Why not train a parser for web queries?

Syntactic Parsing of Short Texts [Sun et al EMNLP 2016]

• Syntactic structures are valuable for short text understanding
• Examples

Challenges: Short Texts Lack Grammatical Signals
• Function words and word order are often missing.
• "toys queries" has ambiguous intent.
• "distance earth moon" has clear intent but many equivalent forms: "earth moon distance", "earth distance moon", …

Challenges: Syntactic Parsing of Queries
• No standard
• No ground truth
• Why is syntactic parsing of queries even a legitimate problem?

Derive Syntax from Semantics [Sun et al 2016]

• Query: "thai food houston"
• Parse the clicked sentence
• Project the sentence's dependencies onto the query

A Treebank for Short Texts
• Given query q
• Given q's clicked sentence s
• Parse each s
• Project the dependencies from s onto q
• Aggregate dependencies

Algorithm of Projection (a plausible sketch follows)
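One plausible reading of the projection step (the paper's exact algorithm may differ): keep only the dependency arcs of the clicked sentence whose head and dependent both occur in the query, then vote over many clicked sentences.

from collections import Counter

def project_dependencies(query_tokens, sentence_arcs):
    """sentence_arcs: [(head_word, dependent_word, label), ...] from a parsed
    clicked sentence. Keep arcs whose two endpoints both occur in the query."""
    qset = set(query_tokens)
    return [(h, d, lab) for h, d, lab in sentence_arcs if h in qset and d in qset]

def aggregate_projections(query_tokens, parsed_sentences):
    """Vote over projections from many clicked sentences to build the query tree."""
    votes = Counter()
    for arcs in parsed_sentences:
        votes.update(project_dependencies(query_tokens, arcs))
    return votes.most_common()

# For "thai food houston", arcs such as (food, thai, amod) and (food, houston, nmod)
# survive projection, making "food" the root of the query's dependency tree.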

Result Examples

Results
• Random queries: QueryParser UAS 0.83, LAS 0.75; Stanford UAS 0.72, LAS 0.64
• Queries with no function words: QueryParser UAS 0.82, LAS 0.73; Stanford UAS 0.70, LAS 0.61
• Queries with function words: QueryParser UAS 0.90, LAS 0.85; Stanford UAS 0.86, LAS 0.80

Short Text Understanding

• How to segment this short text?
• What does this short text mean (its intent, senses, or concepts)?
• What are the relations among terms in the short text?
• How to calculate the similarity between short texts?

Short Text Similarity Using Word Embedding [Kenter and Rijke 2015]

• Measures similarity between two short texts or sentences
• Basic idea: word-by-word comparison using embedding vectors
• Uses a saliency-weighted semantic graph to compute similarity

Short Text Similarity Using Word Embedding [Kenter and Rijke 2015]

Features acquired: bins of all edges, bins of max edges.

Similarity measurement between short texts s_l and s_s, inspired by BM25, where sem(w, s_s) is the semantic similarity between term w and short text s_s:

f_sts(s_l, s_s) = Σ_{w ∈ s_l} [ IDF(w) · sem(w, s_s) · (k1 + 1) ] / [ sem(w, s_s) + k1 · (1 − b + b · |s_s| / avgsl) ]
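A direct reading of the formula above as code; here sem(w, s_s) is taken to be the maximum cosine similarity between w and the terms of s_s, and k1, b, avgsl are placeholder BM25-style constants, so treat the details as assumptions rather than the paper's exact setup.

import numpy as np

def f_sts(sl_tokens, ss_tokens, embed, idf, k1=1.2, b=0.75, avgsl=10.0):
    """Saliency-weighted similarity between short texts s_l and s_s."""
    def sem(w):
        vw = embed(w)
        sims = [np.dot(vw, embed(u)) /
                (np.linalg.norm(vw) * np.linalg.norm(embed(u)) + 1e-9)
                for u in ss_tokens]
        return max(sims) if sims else 0.0

    score = 0.0
    for w in sl_tokens:
        s = sem(w)
        score += idf(w) * (s * (k1 + 1)) / (s + k1 * (1 - b + b * len(ss_tokens) / avgsl))
    return score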

From the Concept View [Wang et al 2015a]

Each short text is parsed and conceptualized (term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, concept orthogonalization) against the semantic network and the co-occurrence network, producing bags of concepts: Concept Vector 1 = [(c1, score1), (c2, score2), …] and Concept Vector 2 = [(c1', score1'), (c2', score2'), …]. The similarity of the two short texts is then computed between these concept vectors, for example as in the sketch below.
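Comparing the two bags of concepts then reduces to a similarity over sparse concept vectors, for instance cosine similarity, as in this minimal sketch.

import math

def concept_cosine(v1, v2):
    """v1, v2: {concept: score} bags of concepts for the two short texts."""
    dot = sum(v1[c] * v2[c] for c in set(v1) & set(v2))
    n1 = math.sqrt(sum(s * s for s in v1.values()))
    n2 = math.sqrt(sum(s * s for s in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

print(concept_cosine({"movie": 0.8, "book": 0.2},
                     {"movie": 0.7, "film festival": 0.3}))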

Outline

• Knowledge Bases

• Explicit Representation Models

• Applications

Applications

• Explicit short text understanding benefits a lot of application scenarios:
• Ads/search semantic match
• Definition mining
• Query recommendation
• Web table understanding
• Semantic search
• …

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

Figure: performance by decile (Decile 4 through Decile 10) for Mainline Ads (left, y-axis 0.00 to 6.00) and Sidebar Ads (right, y-axis 0.00 to 0.60).

Definition Mining [Hao et al 2016]

• Definition scenarios: search engines, QnA, etc.
• Why conceptualization is useful for definition mining: example "What is Emphysema?"
• Answer 1: "Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year."
• Answer 2: "Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing."
• Both sentences have the form of a definition. Embedding is helpful to some extent, but it also returns high similarity scores for both (emphysema, disease) and (emphysema, smoking).
• Conceptualization can provide strong Is-A semantics; contextual embedding can also provide semantic similarity beyond Is-A.

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

System overview. Offline: training data → concept weighting → model learning, producing one concept model per class (Class 1 … Class i … Class N). Online: the original short text (e.g., "justin bieber graduates") goes through entity extraction, conceptualization against the knowledge base, and candidate generation to build a concept vector, which is then classified and ranked against the per-class concept models, yielding e.g. <Music, Score>.

Concept based Short Text Classification and Ranking [Wang et al 2014a]

Figures: each category (e.g., TV, Music, Movie) is mapped into the concept space using the article titles/tags in that category, giving per-category concept weights ω_i, ω_j; a query is conceptualized into the same space with concept probabilities p_i, p_j and matched against the category concept models. A scoring sketch follows.
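The online scoring step can be sketched as a weighted overlap between the query's concept vector and each category's concept weights; the example concepts and weights below are invented placeholders, not the trained models.

def classify_short_text(query_concepts, class_models):
    """query_concepts: {concept: p_i}; class_models: {class: {concept: w_i}}.
    Returns classes ranked by weighted overlap in the concept space."""
    scores = {cls: sum(p * weights.get(c, 0.0) for c, p in query_concepts.items())
              for cls, weights in class_models.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(classify_short_text({"singer": 0.6, "celebrity": 0.4},
                          {"Music": {"singer": 0.9, "band": 0.8},
                           "TV": {"show": 0.9, "celebrity": 0.3}}))
# "justin bieber graduates" conceptualizes to singer/celebrity and ranks Music first.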

Precision performance on each category [Wang et al 2014a]

Precision by category (methods: BocSTC, LM_ch, SVM, VSM_cosine, LM_d, Entity_ESA):
Movie: 0.71, 0.91, 0.84, 0.81, 0.72, 0.56
Money: 0.97, 0.95, 0.54, 0.57, 0.52, 0.74
Music: 0.97, 0.90, 0.88, 0.73, 0.68, 0.58
TV: 0.96, 0.46, 0.92, 0.56, 0.51, 0.55

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

• [ Stark et al 1998 ] Michael M. Stark and Richard F. Riesenfeld. WordNet: An Electronic Lexical Database. In Proceedings of the 11th Eurographics Workshop on Rendering, 1998.

• [ Banko et al 2007 ] Michele Banko, Michael J. Cafarella, Stephen Soderland, Matt Broadhead, and Oren Etzioni. Open Information Extraction from the Web. In IJCAI 2007.

• [ Etzioni et al 2011 ] Oren Etzioni, Anthony Fader, Janara Christensen, Stephen Soderland, and Mausam. Open Information Extraction: The Second Generation. In IJCAI, vol. 11, pp. 3-10, 2011.

• [ Carlson et al 2010 ] A. Carlson, J. Betteridge, B. Kisiel, B. Settles, E. R. Hruschka Jr., and T. M. Mitchell. Toward an Architecture for Never-Ending Language Learning. In Proceedings of the Conference on Artificial Intelligence (AAAI), 2010.

• [ Wu et al 2012 ] Wentao Wu, Hongsong Li, Haixun Wang, and Kenny Zhu. Probase: A Probabilistic Taxonomy for Text Understanding. In ACM International Conference on Management of Data (SIGMOD), May 2012.

• [ Bollacker et al 2008 ] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In SIGMOD 2008.

• [ Auer et al 2007 ] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary G. Ives. DBpedia: A Nucleus for a Web of Open Data. In ISWC/ASWC 2007.

References

• [ Suchanek et al 2007 ] Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. Yago: a core of semantic knowledge. In WWW 2007.

• [ Wu et al 2015 ] Sen Wu, Ce Zhang, Christopher De Sa, Jaeho Shin, Feiran Wang, and C. Ré. Incremental Knowledge Base Construction Using DeepDive. In VLDB 2015.

• [ Navigli et al 2012 ] R. Navigli and S. Ponzetto. BabelNet: The Automatic Construction, Evaluation and Application of a Wide-Coverage Multilingual Semantic Network. In Artificial Intelligence, 2012.

• [ Nastase et al 2010 ] Vivi Nastase, Michael Strube, Benjamin Börschinger, Cäcilia Zirn, and Anas Elghafari. WikiNet: A very large scale multi-lingual concept network. In LREC 2010.

• [ Speer et al 2013 ] Robert Speer and Catherine Havasi. ConceptNet 5: A large semantic network for relational knowledge. In The People's Web Meets NLP. Springer, Berlin, Heidelberg, 2013.

• [ Hua et al 2016 ] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng, and Xiaofang Zhou. "Understand Short Texts by Harvesting and Analyzing Semantic Knowledge". IEEE Transactions on Knowledge and Data Engineering (TKDE), 2016.

• [ Hua et al 2015 ] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng, and Xiaofang Zhou. Short Text Understanding Through Lexical-Semantic Analysis. In International Conference on Data Engineering (ICDE), April 2015.

References

• [ Li et al 2013 ] Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang, and Xindong Wu. Computing term similarity by large probabilistic isA knowledge. In ACM International Conference on Information and Knowledge Management (CIKM), 2013.

• [ Li et al 2015 ] Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang, Xue-Gang Hu, and Xindong Wu. A Large Probabilistic Semantic Network based Approach to Compute Term Similarity. IEEE Transactions on Knowledge and Data Engineering (TKDE), 27(10): 2604-2617, 2015.

• [ Rosch et al 1976 ] Eleanor Rosch, Carolyn B. Mervis, Wayne D. Gray, David M. Johnson, and Penny Boyes-Braem. Basic objects in natural categories. Cognitive Psychology, 8(3): 382-439, 1976.

• [ Manning and Schutze 1999 ] Christopher D. Manning and Hinrich Schütze. Foundations of Statistical Natural Language Processing. MIT Press, 1999.

• [ Wang et al 2015b ] Zhongyuan Wang, Kejun Zhao, Haixun Wang, Xiaofeng Meng, and Ji-Rong Wen. Query Understanding through Knowledge-Based Conceptualization. In IJCAI, July 2015.

• [ Bergsma et al 2007 ] Shane Bergsma and Qin Iris Wang. Learning Noun Phrase Query Segmentation. In EMNLP-CoNLL 2007: 819-826.

• [ Tan et al 2008 ] Bin Tan and Fuchun Peng. Unsupervised query segmentation using generative language models and Wikipedia. In WWW 2008: 347-356.

References

• [ Li et al 2011 ] Yanen Li, Bo-June Paul Hsu, ChengXiang Zhai, and Kuansan Wang. Unsupervised query segmentation using clickthrough for information retrieval. In SIGIR 2011: 285-294.

• [ Guo et al 2009 ] Jiafeng Guo, Gu Xu, Xueqi Cheng, and Hang Li. Named entity recognition in query. In SIGIR 2009: 267-274.

• [ Pantel et al 2012 ] Patrick Pantel, Thomas Lin, and Michael Gamon. Mining Entity Types from Query Logs via User Intent Modeling. In ACL 2012: 563-571.

• [ Joshi et al 2014 ] Mandar Joshi, Uma Sawant, and Soumen Chakrabarti. Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries. In EMNLP 2014: 1104-1114.

• [ Sawant et al 2013 ] Uma Sawant and Soumen Chakrabarti. Learning joint query interpretation and response ranking. In WWW 2013: 1099-1110.

• [ Wang et al 2014b ] Zhongyuan Wang, Haixun Wang, and Zhirui Hu. Head, Modifier, and Constraint Detection in Short Texts. In International Conference on Data Engineering (ICDE), 2014.

• [ Sun et al 2016 ] Xiangyan Sun, Haixun Wang, Yanghua Xiao, and Zhongyuan Wang. Syntactic Parsing of Web Queries. In EMNLP 2016.

References

• [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke. Short text similarity with word embeddings. In CIKM 2015.

• [ Wang et al 2015a ] Zhongyuan Wang, Haixun Wang, Ji-Rong Wen, and Yanghua Xiao. An Inference Approach to Basic Level of Categorization. In CIKM, October 2015.

• [ Hao et al 2016 ] Zehui Hao, Zhongyuan Wang, Xiaofeng Meng, and Jun Yan. Combining Language Model with Conceptualization for Definition Ranking. MSR Technical Report, 2016.

• [ Wang et al 2014a ] Fang Wang, Zhongyuan Wang, Zhoujun Li, and Ji-Rong Wen. Concept-based Short Text Classification and Ranking. In CIKM 2014.

• [ Wang et al 2012a ] Jingjing Wang, Haixun Wang, Zhongyuan Wang, and Kenny Zhu. Understanding Tables on the Web. In International Conference on Conceptual Modeling, October 2012.

• [ Wang et al 2012b ] Yue Wang, Hongsong Li, Haixun Wang, and Kenny Zhu. Toward Topic Search on the Web. In International Conference on Conceptual Modeling, October 2012.

Page 87: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Understanding Queries [Wang et al 2015b]

bull Goal to rank the concepts and findarg max

119888119901(119888|119905 119902)

The offline semantic network

QueryAll possible

segmentations

Random walk with restart [Sun et al 2005]on the online subgraph

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Head Modifier and Constraint Detection in Short Texts [Wang et al 2014b]

bull Example: "popular smart cover iphone 5s"

bull Definitions:

bull Head: names the general (semantic) category to which the whole short text belongs; usually the head is the intent of the short text

bull "smart cover" is the intent of the query

bull Constraints: distinguish this member from other members of the same category

bull "iphone 5s" limits the type of the head

bull Non-Constraint Modifiers (aka Pure Modifiers): subjective modifiers which can be dropped without changing the intent

bull "popular" is subjective and can be neglected

Non-Constraint Modifiers Mining: Construct Modifier Networks

Edges form a Modifier Network.

[Figure] Concept hierarchy tree in the "Country" domain (Country → Asian country, Developed country, Western country, and combinations such as "Western developed country", "Large Asian country", "Top western country", "Large developed country", "Top developed country") and the corresponding Modifier Network over the modifiers Asian, Developed, Western, Large, Top. In this case, "Large" and "Top" are pure modifiers.

bull Betweenness centrality is a measure of a node's centrality in a network

bull The betweenness of node v is defined as

g(v) = Σ_{s ≠ v ≠ t} σ_st(v) / σ_st

bull where σ_st is the total number of shortest paths from node s to node t, and σ_st(v) is the number of those paths that pass through v

bull Normalization & Aggregation

bull A pure modifier should have a low aggregated betweenness centrality score PMS(t)

Non-Constraint Modifiers Mining: Betweenness Centrality
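A rough sketch of the idea using networkx; the toy edges and the PMS-style aggregation below are assumptions for illustration, not the paper's exact construction.

```python
import networkx as nx

# Toy modifier co-occurrence edges within one domain ("country"); real edges
# would come from observed modifier combinations in queries.
edges = [
    ("asian", "developed"), ("asian", "western"), ("developed", "western"),
    ("large", "asian"), ("large", "developed"), ("top", "western"), ("top", "developed"),
]
G = nx.Graph(edges)

# Normalized betweenness: fraction of shortest s-t paths passing through each node.
bc = nx.betweenness_centrality(G, normalized=True)
print(sorted(bc.items(), key=lambda kv: kv[1]))   # "large"/"top" end up near the bottom

def pms(term, per_domain_scores):
    """Hypothetical PMS-style aggregation: average a term's normalized
    betweenness over all domain networks it appears in."""
    vals = [scores[term] for scores in per_domain_scores if term in scores]
    return sum(vals) / len(vals) if vals else 0.0
```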

Head-Constraints Mining [Wang et al 2014b]

bull A term can be a head in some cases and a constraint in others

bull E.g., in "Seattle hotel", "hotel" is the head and "Seattle" is a constraint; in "Seattle hotel job", "job" is the head while "Seattle" and "hotel" are constraints

Head-Constraints Mining: Acquiring Concept Patterns

[Figure] Building the concept pattern dictionary from query logs (e.g., "cover for iphone 6s", "battery for sony a7r", "wicked on broadway"):

bull Extract preposition patterns: A for B, A of B, A with B, A in B, A on B, A at B, ...

bull Get entity pairs from the query log: entity 1 (head), entity 2 (constraint)

bull Conceptualization: entity 1 → {concept11, concept12, concept13, concept14}; entity 2 → {concept21, concept22, concept23}

bull Form concept patterns for each pair: (concept11, concept21), (concept11, concept22), (concept11, concept23), ...

bull Aggregate the patterns into the Concept Pattern Dictionary
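A minimal sketch of the dictionary-building step described above, assuming a toy is-A lookup in place of a real conceptualization service; the entities, concepts, and pairs are placeholders.

```python
from collections import Counter
from itertools import product

# Toy is-A lookup standing in for a real conceptualization service (e.g. Probase);
# entities, concepts, and pairs here are illustrative only.
isa = {
    "iphone 6s": ["phone", "device"],
    "sony a7r": ["camera", "device"],
    "cover": ["accessory"],
    "battery": ["accessory"],
}

def concept_patterns(entity_pairs):
    """entity_pairs: (head_entity, constraint_entity) tuples mined from
    preposition patterns such as 'A for B'. Returns counts of
    (head_concept, constraint_concept) patterns."""
    counts = Counter()
    for head, constraint in entity_pairs:
        for ch, cc in product(isa.get(head, []), isa.get(constraint, [])):
            counts[(ch, cc)] += 1
    return counts

pairs = [("cover", "iphone 6s"), ("battery", "sony a7r")]
print(concept_patterns(pairs).most_common())
# ('accessory', 'device') gets the highest support, i.e. an accessory/device pattern
```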

Why Concepts Can't Be Too General
bull It may cause too many concept pattern conflicts: we can't distinguish head and modifier for general concept pairs

Derived Concept Pattern: device (Head) / company (Modifier)
Supporting Entity Pairs: (iphone 4, verizon), (modem, comcast), (wireless router, comcast), (iphone 4, tmobile)

Derived Concept Pattern: company (Head) / device (Modifier)
Supporting Entity Pairs: (amazon books, kindle), (netflix, touchpad), (skype, windows phone), (netflix, ps3)

Conflict: the two derived patterns disagree on which concept is the head.

Why Concepts Can't Be Too Specific
bull It may generate concepts with little coverage
bull The concept regresses to the entity
bull Large storage space: up to (million × million) patterns

Examples of overly specific patterns: (device, largest desktop OS vendor), (device, largest software development company), (device, largest global corporation), (device, latest windows and office provider), ...

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept Patterns

Cluster size | Sum of cluster score | Head/Constraint | Score
615 | 2114691 | breed/state | 357298460224501
296 | 7752357 | game/platform | 627403476771856
153 | 3466804 | accessory/vehicle | 53393705094809
70 | 118259 | browser/platform | 132612807637391
22 | 1010993 | requirement/school | 271407526294823
34 | 9489159 | drug/disease | 154602405333541
42 | 8992995 | cosmetic/skin condition | 814659415003929
16 | 7421599 | job/city | 27903732555528
32 | 710403 | accessory/phone | 246513830851194
18 | 6692376 | software/platform | 210126322725878
20 | 6444603 | test/disease | 239774028397537
27 | 5994205 | clothes/breed | 98773996282851
19 | 5913545 | penalty/crime | 200544192793488
25 | 5848804 | tax/state | 240081818612579
16 | 5465424 | sauce/meat | 183592863621553
18 | 4809389 | credit card/country | 142919087972152
14 | 4730792 | food/holiday | 14554140330924
11 | 4536199 | mod/game | 257163856882439
29 | 4350954 | garment/sport | 471533326845442
23 | 3994886 | career information/professional | 732726483731257
15 | 386065 | song/instrument | 128189481818135
18 | 378213 | bait/fish | 780426514113169
22 | 3722948 | study guide/book | 508339765053921
19 | 3408953 | plugins/browser | 550326072627126
14 | 3305753 | recipe/meat | 882779863422951
18 | 3214226 | currency/country | 110825444188352
13 | 3180272 | lens/camera | 186081673263957
9 | 316973 | decoration/holiday | 130055844126533
16 | 314875 | food/animal | 7338544366514

Example cluster: game/platform. Member concept patterns: game/platform, game/device, video game/platform, game console/game pad, game/gaming platform.

Game (Head) / Platform (Modifier) entity pairs: (angry birds, android), (angry birds, ios), (angry birds, windows 10), ...

Head Modifier Relationship Detection

bull Train a classifier on (head-embedding, modifier-embedding) pairs

bull Training data: bull Positive: (head, modifier) bull Negative: (modifier, head)

bull Precision >= 0.9, Recall >= 0.9

bull Disadvantage: not interpretable
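A minimal sketch of such a classifier, assuming pre-trained term embeddings (random vectors stand in for them here) and a plain logistic regression on the concatenated pair; the real system's features and model may differ.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder embeddings and (head, modifier) pairs; a real system would use
# pre-trained term embeddings and pairs mined as described above.
rng = np.random.default_rng(0)
emb = {w: rng.normal(size=50) for w in ["game", "android", "hotel", "seattle"]}
known_pairs = [("game", "android"), ("hotel", "seattle")]   # (head, modifier)

X, y = [], []
for head, mod in known_pairs:
    X.append(np.concatenate([emb[head], emb[mod]])); y.append(1)   # positive order
    X.append(np.concatenate([emb[mod], emb[head]])); y.append(0)   # swapped order

clf = LogisticRegression(max_iter=1000).fit(np.array(X), y)

def is_head_modifier(a, b):
    """True if the classifier believes a is the head and b the modifier."""
    return clf.predict(np.concatenate([emb[a], emb[b]]).reshape(1, -1))[0] == 1

print(is_head_modifier("game", "android"))
```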

Syntactic Parsing based on HM

bull Information is incomplete:
bull Prepositions and other function words are not covered
bull Neither are relations within a noun compound, e.g., "el capitan macbook pro"

bull Why not train a parser for web queries?

Syntactic Parsing of Short Texts [Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges: Short Texts Lack Grammatical Signals
bull Lack of function words and word order

bull "toys queries" has ambiguous intent

bull "distance earth moon" has clear intent, but many equivalent forms: "earth moon distance", "earth distance moon", ...

Challenges: Syntactic Parsing of Queries

bull No standard

bull No ground truth

Why is syntactic parsing of queries even a legitimate problem?

Derive Syntax from Semantics [Sun et al 2016]

bull Query: "thai food houston"

bull Clicked sentence

bull Project dependency to the query

A Treebank for Short Texts

bull Given query q

bull Given q's clicked sentences s

bull Parse each s

bull Project dependencies from s to q

bull Aggregate dependencies

Algorithm of Projection
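A sketch of one plausible projection rule, assuming an arc is kept when both its endpoints match query tokens; the actual algorithm in [Sun et al 2016] may differ, and the toy parse below is hand-written rather than produced by a parser.

```python
def project_dependencies(query_tokens, sent_tokens, sent_arcs):
    """Keep arcs whose head and dependent both match query tokens.
    sent_arcs: (head_index, dependent_index, label) over sentence tokens."""
    pos = {}                                    # query index -> sentence index
    for qi, tok in enumerate(query_tokens):
        for si, stok in enumerate(sent_tokens):
            if stok.lower() == tok.lower() and si not in pos.values():
                pos[qi] = si
                break
    sent_to_query = {si: qi for qi, si in pos.items()}
    projected = []
    for head, dep, label in sent_arcs:
        if head in sent_to_query and dep in sent_to_query:
            projected.append((sent_to_query[head], sent_to_query[dep], label))
    return projected                            # aggregate over many clicked sentences

query = ["thai", "food", "houston"]
sentence = ["Find", "the", "best", "Thai", "food", "in", "Houston"]
arcs = [(4, 3, "amod"), (4, 6, "nmod"), (0, 4, "dobj")]   # hand-written toy parse
print(project_dependencies(query, sentence, arcs))
# -> [(1, 0, 'amod'), (1, 2, 'nmod')]: "thai" and "houston" both attach to "food"
```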

Result Examples

Results

bull Random queries: QueryParser UAS 0.83, LAS 0.75; Stanford UAS 0.72, LAS 0.64

bull Queries with no function words: QueryParser UAS 0.82, LAS 0.73; Stanford UAS 0.70, LAS 0.61

bull Queries with function words: QueryParser UAS 0.90, LAS 0.85; Stanford UAS 0.86, LAS 0.80

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent, senses, or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding [Kenter and Rijke 2015]

bull Measuring similarity between two short texts or sentences

bull Basic idea: word-by-word comparison using embedding vectors

bull Use a saliency-weighted semantic graph to compute similarity

Short Text Similarity Using Word Embedding [Kenter and Rijke 2015]

Features acquired: bins of all edges, bins of max edges.

Similarity measurement between short texts s_l and s_s, inspired by BM25, where sem(w, s_s) is the semantic similarity between term w and short text s_s:

f_sts(s_l, s_s) = Σ_{w ∈ s_l} IDF(w) · sem(w, s_s) · (k_1 + 1) / (sem(w, s_s) + k_1 · (1 − b + b · |s_s| / avgsl))
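A direct reading of the formula above as code, assuming cosine similarity over placeholder word vectors and IDF values; the parameter settings are illustrative.

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def sem(word, short_text, vectors):
    """Best semantic match of `word` against the terms of `short_text`."""
    if word not in vectors:
        return 0.0
    sims = (cosine(vectors[word], vectors[t]) for t in short_text if t in vectors)
    return max(sims, default=0.0)

def f_sts(s_l, s_s, vectors, idf, k1=1.2, b=0.75, avgsl=10.0):
    score = 0.0
    for w in s_l:
        s = sem(w, s_s, vectors)
        score += idf.get(w, 1.0) * (s * (k1 + 1)) / (s + k1 * (1 - b + b * len(s_s) / avgsl))
    return score

# toy usage with made-up vectors and IDF weights
vecs = {"thai": np.array([1.0, 0.0]), "food": np.array([0.9, 0.1]), "cuisine": np.array([0.8, 0.2])}
print(f_sts(["thai", "food"], ["thai", "cuisine"], vecs, idf={"thai": 2.0, "food": 1.5}))
```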

From the Concept View

From the Concept View [Wang et al 2015a]

[Figure] Short Text 1 and Short Text 2 are each mapped to bags of concepts: Concept Vector 1 [(c1, score1), (c2, score2), ...] and Concept Vector 2 [(c1', score1'), (c2', score2'), ...]; similarity is then computed between the two concept vectors. Conceptualization draws on a semantic network and a co-occurrence network, and involves parsing, term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, and concept orthogonalization.
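A minimal sketch of the similarity step, assuming concept vectors are sparse dictionaries of concept scores; the concept names and scores are illustrative placeholders.

```python
import math

def concept_cosine(cv1, cv2):
    """Cosine similarity between two sparse concept vectors {concept: score}."""
    dot = sum(s * cv2.get(c, 0.0) for c, s in cv1.items())
    n1 = math.sqrt(sum(s * s for s in cv1.values()))
    n2 = math.sqrt(sum(s * s for s in cv2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

cv_text1 = {"fruit": 0.7, "company": 0.2}    # e.g. conceptualization of "apple pie"
cv_text2 = {"fruit": 0.5, "dessert": 0.4}    # e.g. conceptualization of "banana bread"
print(concept_cosine(cv_text1, cv_text2))
```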

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

bull Explicit short text understanding benefits a lot of application scenarios:

bull Ads/search semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

[Figure] Performance of ads keyword selection by query decile (Decile 4 through Decile 10), reported separately for Mainline Ads (y-axis 0.00-6.00) and Sidebar Ads (y-axis 0.00-0.60).

Definition Mining [Hao et al 2016]

bull Definition scenarios: search engines, QnA, etc.

bull Why conceptualization is useful for definition mining. Example: "What is Emphysema?"

bull Answer 1: "Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year."

bull Answer 2: "Emphysema is an incurable, progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing."

bull This sentence has the form of a definition; embedding is helpful to some extent, but it also returns a high similarity score for both (emphysema, disease) and (emphysema, smoking)

bull Conceptualization can provide strong semantics; contextual embedding can also provide semantic similarity beyond is-A
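A hedged illustration of why is-A knowledge helps here (this is not the method of [Hao et al 2016]): prefer candidate answers whose "X is a/an Y" head noun phrase matches a known concept of X, rather than relying on embedding similarity alone. The regex and toy is-A store are assumptions.

```python
import re

# Toy is-A store; a real system would query a large taxonomy such as Probase.
isa_concepts = {"emphysema": {"disease", "lung disease", "condition"}}

def definition_score(term, sentence):
    """Reward sentences whose 'term is a/an Y' head noun phrase matches an
    is-A concept of the term; fall back to a weaker score otherwise."""
    m = re.search(rf"{re.escape(term.lower())} is an? ([a-z ]+?)(?:,| that| which|\.|$)",
                  sentence.lower())
    if not m:
        return 0.0
    noun_phrase = m.group(1).strip()
    concepts = isa_concepts.get(term.lower(), set())
    return 1.0 if any(c in noun_phrase for c in concepts) else 0.5

print(definition_score(
    "Emphysema",
    "Emphysema is an incurable progressive lung disease that primarily affects smokers"))
```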

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

[Figure] System overview. Online: an original short text (e.g., "justin bieber graduates") goes through entity extraction, candidates generation, conceptualization against the knowledge base, and concept weighting to produce a concept vector, which is then classified and ranked against the per-class concept models (e.g., output <Music, score>). Offline: model learning builds a concept model for each class (Class 1, ..., Class i, ..., Class N) from training data.

[Figure] Each category (e.g., TV, Music, Movie) is represented in the concept space by conceptualizing the article titles/tags in that category into weighted points (p_i, p_j with concept weights ω_i, ω_j); a query is mapped into the same concept space and compared against these category representations.
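A sketch of the online scoring step implied by these figures, assuming each class keeps an offline-learned concept weight vector and the query is scored against every class model; all concepts and weights below are made-up placeholders.

```python
# Offline-learned concept weight vectors per class (all weights made up here).
class_models = {
    "Music": {"singer": 0.9, "album": 0.7, "celebrity": 0.3},
    "TV":    {"tv show": 0.9, "celebrity": 0.4},
}

def classify(query_concepts, models):
    """Score the query's concept vector against each class model."""
    def score(model):
        return sum(w * model.get(c, 0.0) for c, w in query_concepts.items())
    return max(models, key=lambda name: score(models[name]))

query_cv = {"celebrity": 0.8, "singer": 0.6}   # e.g. conceptualization of "justin bieber graduates"
print(classify(query_cv, class_models))        # -> "Music"
```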

Precision performance on each category [Wang et al 2014a]

Category | BocSTC | LM_ch | SVM | VSM_cosine | LM_d | Entity_ESA
Movie | 0.71 | 0.91 | 0.84 | 0.81 | 0.72 | 0.56
Money | 0.97 | 0.95 | 0.54 | 0.57 | 0.52 | 0.74
Music | 0.97 | 0.90 | 0.88 | 0.73 | 0.68 | 0.58
TV | 0.96 | 0.46 | 0.92 | 0.56 | 0.51 | 0.55

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledge in SIGMOD 2008

bull [ Auer et al 2007 ] Sören Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWC/ASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Ré Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Börschinger Cäcilia Zirn and Anas Elghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The People's Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou "Understand Short Texts by Harvesting and Analyzing Semantic Knowledge" IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and Xindong Wu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny BoyesBraem Basic objects in natural categories Cognitive psychology 8(3)382ndash439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ] Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddings In CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny Zhu Toward Topic Search on the Web In International Conference on Conceptual Modeling October 2012


Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledgeltigt in SIGMOD 2008

bull [ Auer et al 2007 ] Soumlren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Boumlrschinger Caumlcilia Zirn and AnasElghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and XindongWu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny BoyesBraem Basic objects in natural categories Cognitive psychology 8(3)382ndash439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny ZhuToward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 91: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Non-Constraint Modifiers Mining: Betweenness Centrality

• Betweenness centrality is a measure of a node's centrality in a network

• Betweenness of node v is defined as g(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}

• where \sigma_{st} is the total number of shortest paths from node s to node t, and \sigma_{st}(v) is the number of those paths that pass through v

• Normalization & Aggregation

• A pure modifier should have a low betweenness-centrality aggregation score PMS(t) (see the sketch below)
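For illustration, a minimal sketch of the idea, assuming a query-term graph that has already been built from head/modifier data; the graph construction and the exact PMS(t) normalization/aggregation of [Wang et al 2014b] are not shown on the slide, so both are simplified here, and networkx supplies the betweenness computation.

```python
# Minimal sketch (not the published algorithm): compute betweenness centrality for
# every term of a query-term graph and use it directly as the aggregation score
# PMS(t); terms with a low score are treated as pure (non-constraint) modifiers.
import networkx as nx

def pms_scores(term_edges):
    """term_edges: iterable of (term_a, term_b) pairs from a query-term graph."""
    g = nx.Graph()
    g.add_edges_from(term_edges)
    # Normalized betweenness centrality, i.e. g(v) from the definition above.
    return nx.betweenness_centrality(g, normalized=True)

def pure_modifiers(term_edges, threshold=0.05):
    """Terms whose aggregated betweenness score falls below the threshold."""
    return [t for t, s in pms_scores(term_edges).items() if s <= threshold]
```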

Head-Constraints Mining [Wang et al 2014b]

• A term can be a head sometimes and be a constraint in some other cases

• E.g., in "Seattle hotel", "hotel" is the head and "Seattle" is the constraint; in "Seattle hotel job", "job" is the head while "Seattle" and "hotel" are constraints

Head-Constraints Mining: Acquiring Concept Patterns

Building the concept pattern dictionary from query logs:

• Query logs: "cover for iphone 6s", "battery for sony a7r", "wicked on broadway", …
• Extract preposition patterns: "A for B", "A of B", "A with B", "A in B", "A on B", "A at B", …
• Get entity pairs from the query log for each preposition: entity 1 (head), entity 2 (constraint)
• Conceptualization: entity 1 → concept11, concept12, concept13, concept14, …; entity 2 → concept21, concept22, concept23, …
• Concept patterns for each preposition: (concept11, concept21), (concept11, concept22), (concept11, concept23), …
• Aggregate the patterns into the Concept Pattern Dictionary (see the sketch below)
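Below is a minimal sketch of this acquisition step, assuming a hypothetical conceptualize(entity) lookup that returns a ranked (concept, probability) list (e.g., backed by an isA taxonomy); the preposition list, the pattern scoring, and the clustering used by the real system are simplified away.

```python
# Minimal sketch of concept-pattern acquisition (illustrative, not the exact
# published pipeline). `conceptualize` is a hypothetical entity -> concepts lookup.
import re
from collections import Counter
from itertools import product

PREPOSITIONS = ["for", "of", "with", "in", "on", "at"]
PATTERN = re.compile(r"^(?P<head>.+)\s+(?:%s)\s+(?P<constraint>.+)$" % "|".join(PREPOSITIONS))

def acquire_concept_patterns(queries, conceptualize, top_k=3):
    """queries: iterable of raw query strings, e.g. 'cover for iphone 6s'."""
    pattern_counts = Counter()
    for q in queries:
        m = PATTERN.match(q.lower().strip())
        if not m:
            continue
        head_entity, constraint_entity = m.group("head"), m.group("constraint")
        head_concepts = [c for c, _ in conceptualize(head_entity)[:top_k]]
        constraint_concepts = [c for c, _ in conceptualize(constraint_entity)[:top_k]]
        # Every (head concept, constraint concept) combination supports a pattern.
        for hc, cc in product(head_concepts, constraint_concepts):
            pattern_counts[(hc, cc)] += 1
    return pattern_counts  # the concept pattern dictionary, with support counts

# Example with a toy conceptualizer:
toy = {"cover": [("accessory", 0.8)], "iphone 6s": [("phone", 0.9)],
       "battery": [("accessory", 0.7)], "sony a7r": [("camera", 0.9)]}
counts = acquire_concept_patterns(["cover for iphone 6s", "battery for sony a7r"],
                                  lambda e: toy.get(e, []))
print(counts.most_common())
```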

Why Concepts Can't Be Too General

• It may cause too many concept pattern conflicts: can't distinguish head and modifier for general concept pairs

Derived Concept Pattern: device (Head) / company (Modifier)
Supporting Entity Pairs: (iphone 4, verizon), (modem, comcast), (wireless router, comcast), (iphone 4, tmobile)

Derived Concept Pattern: company (Head) / device (Modifier)
Supporting Entity Pairs: (amazon books, kindle), (netflix, touchpad), (skype, windows phone), (netflix, ps3)

→ Conflict

Why Concepts Can't Be Too Specific

• It may generate concepts with little coverage
• Concept regresses to entity
• Large storage space: up to (million × million) patterns

… …
(device, largest desktop OS vendor)
(device, largest software development company)
(device, largest global corporation)
(device, latest windows and office provider)
… …

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept Patterns

Cluster size | Sum of Cluster Score | head/constraint | score
615 | 2114691 | breed/state | 357298460224501
296 | 7752357 | game/platform | 627403476771856
153 | 3466804 | accessory/vehicle | 53393705094809
70 | 118259 | browser/platform | 132612807637391
22 | 1010993 | requirement/school | 271407526294823
34 | 9489159 | drug/disease | 154602405333541
42 | 8992995 | cosmetic/skin condition | 814659415003929
16 | 7421599 | job/city | 27903732555528
32 | 710403 | accessory/phone | 246513830851194
18 | 6692376 | software/platform | 210126322725878
20 | 6444603 | test/disease | 239774028397537
27 | 5994205 | clothes/breed | 98773996282851
19 | 5913545 | penalty/crime | 200544192793488
25 | 5848804 | tax/state | 240081818612579
16 | 5465424 | sauce/meat | 183592863621553
18 | 4809389 | credit card/country | 142919087972152
14 | 4730792 | food/holiday | 14554140330924
11 | 4536199 | mod/game | 257163856882439
29 | 4350954 | garment/sport | 471533326845442
23 | 3994886 | career information/professional | 732726483731257
15 | 386065 | song/instrument | 128189481818135
18 | 378213 | bait/fish | 780426514113169
22 | 3722948 | study guide/book | 508339765053921
19 | 3408953 | plugins/browser | 550326072627126
14 | 3305753 | recipe/meat | 882779863422951
18 | 3214226 | currency/country | 110825444188352
13 | 3180272 | lens/camera | 186081673263957
9 | 316973 | decoration/holiday | 130055844126533
16 | 314875 | food/animal | 7338544366514

Example cluster: game / platform
Cluster member patterns: game / device, video game / platform, game console / game pad, game / gaming platform, …

Game (Head)      Platform (Modifier)
angry birds      android
angry birds      ios
angry birds      windows 10
…                …

Detection: Head-Modifier Relationship

• Train a classifier on (head-embedding, modifier-embedding)
• Training data:
  • Positive: (head, modifier)
  • Negative: (modifier, head)
• Precision >= 0.9, Recall >= 0.9
• Disadvantage: not interpretable

A sketch of such a classifier follows.
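This is a minimal sketch, assuming a hypothetical embed(term) lookup that returns a fixed-size vector; the concatenated-pair features, the model choice, and the training protocol are illustrative simplifications, not the exact published setup.

```python
# Minimal sketch of the head-modifier direction classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression

def build_dataset(head_modifier_pairs, embed):
    """Positive examples are (head, modifier) pairs; negatives are the reversed pairs."""
    X, y = [], []
    for head, modifier in head_modifier_pairs:
        h, m = embed(head), embed(modifier)
        X.append(np.concatenate([h, m]))  # (head-embedding, modifier-embedding)
        y.append(1)
        X.append(np.concatenate([m, h]))  # reversed order as a negative example
        y.append(0)
    return np.array(X), np.array(y)

def train_direction_classifier(pairs, embed):
    X, y = build_dataset(pairs, embed)
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X, y)
    return clf

# Usage sketch:
# clf = train_direction_classifier([("angry birds", "android"), ("hotel", "seattle")], embed)
# clf.predict_proba([np.concatenate([embed("game"), embed("xbox")])])
```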

Syntactic Parsing based on H-M Detection

• The head-modifier information alone is incomplete:
  • Prepositions and other function words are not covered
  • Neither are relations within a noun compound, e.g., "el capitan macbook pro"
• Why not train a parser for web queries?

Syntactic Parsing of Short Texts [Sun et al EMNLP 2016]

• Syntactic structures are valuable for short text understanding
• Examples:

Challenges: Short Texts Lack Grammatical Signals

• Lack of function words and word order
• "toys queries" has ambiguous intent
• "distance earth moon" has clear intent
• Many equivalent forms: "earth moon distance", "earth distance moon", …

Challenges: Syntactic Parsing of Queries

• No standard
• No ground-truth
• Why is syntactic parsing of queries even a legitimate problem?

Derive Syntax from Semantics [Sun et al 2016]

• Query: "thai food houston"
• Clicked sentence
• Project dependency to the query

A Treebank for Short Texts

• Given query q
• Given q's clicked sentence s
• Parse each s
• Project dependency from s to q
• Aggregate dependencies

Algorithm of Projection
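The slides give the projection algorithm itself as a figure; the sketch below only illustrates the general idea under strong simplifications (spaCy as the sentence parser, exact lower-cased token matching as the query-sentence alignment), so it should not be read as the authors' exact algorithm.

```python
# Minimal sketch of projecting dependencies from clicked sentences onto a query.
from collections import Counter
import spacy

nlp = spacy.load("en_core_web_sm")

def project(query, clicked_sentence):
    """Return dependency arcs (head_word, child_word, label) projected onto the query."""
    q_tokens = set(query.lower().split())
    doc = nlp(clicked_sentence)
    arcs = []
    for tok in doc:
        child, head = tok.text.lower(), tok.head.text.lower()
        # Keep an arc only if both ends also appear in the query.
        if child in q_tokens and head in q_tokens and child != head:
            arcs.append((head, child, tok.dep_))
    return arcs

def aggregate(query, clicked_sentences):
    """Aggregate projected arcs over many clicked sentences; keep the most frequent."""
    counts = Counter(arc for s in clicked_sentences for arc in project(query, s))
    return counts.most_common()

# aggregate("thai food houston", ["Find the best Thai food in Houston today."])
```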

Result Examples

Results

• Random queries: QueryParser UAS 0.83, LAS 0.75; Stanford UAS 0.72, LAS 0.64

• Queries with no function words: QueryParser UAS 0.82, LAS 0.73; Stanford UAS 0.70, LAS 0.61

• Queries with function words: QueryParser UAS 0.90, LAS 0.85; Stanford UAS 0.86, LAS 0.80

Short Text Understanding

• How to segment this short text?
• What does this short text mean (its intent, senses, or concepts)?
• What are the relations among terms in the short text?
• How to calculate the similarity between short texts?

Short Text Similarity Using Word Embedding [Kenter and Rijke 2015]

• Measuring similarity between two short texts or sentences
• Basic idea: word-by-word comparison using embedding vectors
• Use a saliency-weighted semantic graph to compute similarity

Features acquired: bins of all edges, bins of max edges

Similarity measurement, inspired by BM25:

f_{sts}(s_l, s_s) = \sum_{w \in s_l} IDF(w) \cdot \frac{sem(w, s_s) \cdot (k_1 + 1)}{sem(w, s_s) + k_1 \cdot \left(1 - b + b \cdot \frac{|s_s|}{avgl}\right)}

where w ranges over the terms of the short text s_l, sem(w, s_s) is the semantic similarity between term w and the short text s_s, k_1 and b are BM25-style parameters, and avgl is the average short-text length. A small implementation sketch follows.
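A minimal sketch of this measure, assuming precomputed word vectors and IDF values as inputs (both hypothetical here) and taking sem(w, s_s) to be the maximum cosine similarity between w and the terms of s_s; k_1, b, and avgl play the same roles as in the formula above.

```python
# Minimal sketch of the BM25-inspired short-text similarity above.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def sem(word, other_text, word_vectors):
    """Semantic similarity of `word` to the short text: max cosine to its terms."""
    if word not in word_vectors:
        return 0.0
    sims = [cosine(word_vectors[word], word_vectors[t])
            for t in other_text if t in word_vectors]
    return max(sims, default=0.0)

def f_sts(s_l, s_s, word_vectors, idf, avgl, k1=1.2, b=0.75):
    """s_l, s_s: tokenized short texts (lists of terms)."""
    score = 0.0
    for w in s_l:
        s = sem(w, s_s, word_vectors)
        score += idf.get(w, 0.0) * (s * (k1 + 1)) / (s + k1 * (1 - b + b * len(s_s) / avgl))
    return score
```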

From the Concept View [Wang et al 2015a]

• Knowledge used: a semantic (isA) network and a co-occurrence network
• Short Text 1 → Concept Vector 1: [(c1, score1), (c2, score2), …]
• Short Text 2 → Concept Vector 2: [(c1', score1'), (c2', score2'), …]
• Similarity is computed between the two concept vectors (bags of concepts); a small sketch follows
• Conceptualization steps: parsing, term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, concept orthogonalization
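As a sketch of the concept-view similarity, assume a hypothetical conceptualize(short_text) function that returns the concept vector {concept: score} produced by the conceptualization steps above; similarity is then just a cosine between two sparse concept vectors.

```python
# Minimal sketch of bag-of-concepts similarity between two short texts.
import math

def concept_cosine(cv1, cv2):
    """Cosine similarity between two sparse concept vectors ({concept: score})."""
    shared = set(cv1) & set(cv2)
    dot = sum(cv1[c] * cv2[c] for c in shared)
    n1 = math.sqrt(sum(v * v for v in cv1.values()))
    n2 = math.sqrt(sum(v * v for v in cv2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def short_text_similarity(text1, text2, conceptualize):
    return concept_cosine(conceptualize(text1), conceptualize(text2))

# Example with toy concept vectors:
# concept_cosine({"fruit": 0.7, "company": 0.3}, {"company": 0.9, "device": 0.4})
```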

Outline

• Knowledge Bases
• Explicit Representation Models
• Applications

Applications

• Explicit short text understanding benefits a lot of application scenarios:
  • Ads/search semantic match
  • Definition mining
  • Query recommendation
  • Web table understanding
  • Semantic search
  • …

Ads Keyword Selection [Wang et al 2015a]

[Charts: results by decile (Decile 4 through Decile 10) for Mainline Ads (y-axis 0.00 to 6.00) and Sidebar Ads (y-axis 0.00 to 0.60)]

Definition Mining [Hao et al 2016]

• Definition scenarios: search engines, QnA, etc.
• Why conceptualization is useful for definition mining. Example: "What is Emphysema?"
  • Answer 1: "Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year."
  • Answer 2: "Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing."
• Answer 1 has the form of a definition; embedding is helpful to some extent, but it also returns high similarity scores for both (emphysema, disease) and (emphysema, smoking)
• Conceptualization can provide strong semantics
• Contextual embedding can also provide semantic similarity beyond Is-A

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

System overview:
• Offline (Model Learning): training data → concept weighting → a concept model (Model 1, …, Model i, …, Model N) for each class (Class 1, …, Class i, …, Class N)
• Online: original short text (e.g., "justin bieber graduates") → entity extraction → candidates generation → conceptualization (using the knowledge base) → concept vector → classification & ranking → output such as <Music, Score>

A sketch of the online scoring step follows.
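This is a minimal sketch only, assuming the offline stage has produced a per-class concept model as a {concept: weight} dict (hypothetical here) and reusing a hypothetical conceptualize lookup; the real system's candidate generation and ranking are omitted.

```python
# Minimal sketch of concept-based short text classification: score each class by
# matching the text's concept vector against a per-class concept model.
def classify(short_text, class_models, conceptualize):
    """Return (best_class, score) for the short text, e.g. <Music, score>."""
    concept_vector = conceptualize(short_text)  # {concept: score}

    def class_score(model):
        return sum(concept_vector.get(c, 0.0) * w for c, w in model.items())

    best = max(class_models, key=lambda cls: class_score(class_models[cls]))
    return best, class_score(class_models[best])

# Example:
# classify("justin bieber graduates",
#          {"Music": {"singer": 2.1, "album": 1.3}, "TV": {"tv show": 1.8}},
#          conceptualize)
```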

Concept based Short Text Classification and Ranking [Wang et al 2014a]

[Diagrams: each category (TV, Music, Movie, …) is mapped into the concept space using the article titles/tags in that category, with per-concept quantities p_i, p_j and ω_i, ω_j; a query is then mapped into the same concept space and compared against the categories]

Precision performance on each category [Wang et al 2014a]

Category | BocSTC | LM_ch | SVM | VSM_cosine | LM_d | Entity_ESA
Movie | 0.71 | 0.91 | 0.84 | 0.81 | 0.72 | 0.56
Money | 0.97 | 0.95 | 0.54 | 0.57 | 0.52 | 0.74
Music | 0.97 | 0.90 | 0.88 | 0.73 | 0.68 | 0.58
TV | 0.96 | 0.46 | 0.92 | 0.56 | 0.51 | 0.55

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

• [Stark et al 1998] Michael M. Stark and Richard F. Riesenfeld. WordNet: An Electronic Lexical Database. Proceedings of 11th Eurographics Workshop on Rendering, 1998.
• [Banko et al 2007] Michele Banko, Michael J. Cafarella, Stephen Soderland, Matt Broadhead and Oren Etzioni. Open Information Extraction from the Web. In IJCAI, 2007.
• [Etzioni et al 2011] Oren Etzioni, Anthony Fader, Janara Christensen, Stephen Soderland and Mausam Mausam. Open Information Extraction: The Second Generation. In IJCAI, vol. 11, pp. 3-10, 2011.
• [Carlson et al 2010] A. Carlson, J. Betteridge, B. Kisiel, B. Settles, E.R. Hruschka Jr. and T.M. Mitchell. Toward an Architecture for Never-Ending Language Learning. In Proceedings of the Conference on Artificial Intelligence (AAAI), 2010.
• [Wu et al 2012] Wentao Wu, Hongsong Li, Haixun Wang and Kenny Zhu. Probase: A Probabilistic Taxonomy for Text Understanding. In ACM International Conference on Management of Data (SIGMOD), May 2012.
• [Bollacker et al 2008] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, Jamine Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In SIGMOD, 2008.
• [Auer et al 2007] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, Zachary G. Ives. DBpedia: A Nucleus for a Web of Open Data. In ISWC/ASWC, 2007.
• [Suchanek et al 2007] Fabian M. Suchanek, Gjergji Kasneci, Gerhard Weikum. Yago: a core of semantic knowledge. In WWW, 2007.
• [Wu et al 2015] Sen Wu, Ce Zhang, Christopher De Sa, Jaeho Shin, Feiran Wang and C. Ré. Incremental Knowledge Base Construction Using DeepDive. In VLDB, 2015.
• [Navigli et al 2012] R. Navigli and S. Ponzetto. BabelNet: The Automatic Construction, Evaluation and Application of a Wide-Coverage Multilingual Semantic Network. In Artificial Intelligence, 2012.
• [Nastase et al 2010] Vivi Nastase, Michael Strube, Benjamin Börschinger, Cäcilia Zirn and Anas Elghafari. WikiNet: A very large scale multi-lingual concept network. In LREC, 2010.
• [Speer et al 2013] Robert Speer and Catherine Havasi. ConceptNet 5: A large semantic network for relational knowledge. The People's Web Meets NLP, Springer Berlin Heidelberg, 2013.
• [Hua et al 2016] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng and Xiaofang Zhou. Understand Short Texts by Harvesting and Analyzing Semantic Knowledge. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2016.
• [Hua et al 2015] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng and Xiaofang Zhou. Short Text Understanding Through Lexical-Semantic Analysis. In International Conference on Data Engineering (ICDE), April 2015.
• [Li et al 2013] Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang and Xindong Wu. Computing term similarity by large probabilistic isA knowledge. In ACM International Conference on Information and Knowledge Management (CIKM), 2013.
• [Li et al 2015] Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang, Xue-Gang Hu and Xindong Wu. A Large Probabilistic Semantic Network based Approach to Compute Term Similarity. In IEEE Transactions on Knowledge and Data Engineering (TKDE), 27(10): 2604-2617, 2015.
• [Rosch et al 1976] Eleanor Rosch, Carolyn B. Mervis, Wayne D. Gray, David M. Johnson and Penny Boyes-Braem. Basic objects in natural categories. Cognitive Psychology, 8(3): 382–439, 1976.
• [Manning and Schutze 1999] Christopher D. Manning and Hinrich Schutze. Foundations of Statistical Natural Language Processing. Volume 999, MIT Press, 1999.
• [Wang et al 2015b] Zhongyuan Wang, Kejun Zhao, Haixun Wang, Xiaofeng Meng and Ji-Rong Wen. Query Understanding through Knowledge-Based Conceptualization. In IJCAI, July 2015.
• [Bergsma et al 2007] Shane Bergsma, Qin Iris Wang. Learning Noun Phrase Query Segmentation. In EMNLP-CoNLL, 2007, 819-826.
• [Tan et al 2008] Bin Tan, Fuchun Peng. Unsupervised query segmentation using generative language models and Wikipedia. In WWW, 2008, 347-356.
• [Li et al 2011] Yanen Li, Bo-June Paul Hsu, ChengXiang Zhai, Kuansan Wang. Unsupervised query segmentation using clickthrough for information retrieval. In SIGIR, 2011, 285-294.
• [Guo et al 2009] Jiafeng Guo, Gu Xu, Xueqi Cheng, Hang Li. Named entity recognition in query. In SIGIR, 2009, 267-274.
• [Pantel et al 2012] Patrick Pantel, Thomas Lin, Michael Gamon. Mining Entity Types from Query Logs via User Intent Modeling. In ACL, 2012, 563-571.
• [Joshi et al 2014] Mandar Joshi, Uma Sawant, Soumen Chakrabarti. Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries. In EMNLP, 2014, 1104-1114.
• [Sawant et al 2013] Uma Sawant, Soumen Chakrabarti. Learning joint query interpretation and response ranking. In WWW, 2013, 1099-1110.
• [Wang et al 2014b] Zhongyuan Wang, Haixun Wang and Zhirui Hu. Head, Modifier, and Constraint Detection in Short Texts. In International Conference on Data Engineering (ICDE), 2014.
• [Sun et al 2016] Xiangyan Sun, Haixun Wang, Yanghua Xiao, Zhongyuan Wang. Syntactic Parsing of Web Queries. In EMNLP, 2016.
• [Kenter and Rijke 2015] Tom Kenter and Maarten de Rijke. Short text similarity with word embeddings. In CIKM, 2015.
• [Wang et al 2015a] Zhongyuan Wang, Haixun Wang, Ji-Rong Wen and Yanghua Xiao. An Inference Approach to Basic Level of Categorization. In CIKM, October 2015.
• [Hao et al 2016] Zehui Hao, Zhongyuan Wang, Xiaofeng Meng and Jun Yan. Combining Language Model with Conceptualization for Definition Ranking. MSR Technical Report, 2016.
• [Wang et al 2014a] Fang Wang, Zhongyuan Wang, Zhoujun Li and Ji-Rong Wen. Concept-based Short Text Classification and Ranking. In CIKM, 2014.
• [Wang et al 2012a] Jingjing Wang, Haixun Wang, Zhongyuan Wang and Kenny Zhu. Understanding Tables on the Web. In International Conference on Conceptual Modeling, October 2012.
• [Wang et al 2012b] Yue Wang, Hongsong Li, Haixun Wang and Kenny Zhu. Toward Topic Search on the Web. In International Conference on Conceptual Modeling, October 2012.

Page 92: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Head-Constraints Mining [Wang et al 2014b]

bull A term can be a head sometimes and be a constraint in some other cases

bull Eg Seattle hotel Seattle hotel job

head headconstraintconstraintconstraint

Head-Constraints Mining Acquiring Concept Patterns

Get entity pairs from query log

Conceptualization

Concept Patterns for each

prepositionsentity1 entity2

Extract Patterns

A for B A of BA with B A in BA on B A at B hellip

entity 1head entity 2constraint

concept11

concept12

concept13

concept14

concept21

concept22

concept23

(concept11 concept21) (concept11 concept22)(concept11 concept23)hellip

Concept Pattern Dictionary

Building concept pattern dictionary Query Logs

cover for iphone 6sbattery for sony a7rwicked on broadway

Why Concepts Canrsquot Be Too Generalbull It may cause too many concept pattern conflicts

canrsquot distinguish head and modifier for general concept pairs

Head Modifier

Derived Concept Pattern device company

Supporting Entity Pairs iphone 4 verizon

modem comcast

wireless router comcast

iphone 4 tmobile

Head Modifier

Derived Concept Pattern company device

Supporting Entity Pairs amazon books kindle

netflix touchpad

skype windows phone

netflix ps3

Conflict

Why Concepts Canrsquot Be Too Specificbull It may generate concepts with little coverage

bull Concept regresses to entitybull Large storage space up to (million million) patterns

hellip hellip

device largest desktop OS vendor

device largest software development company

device largest global corporation

device latest windows and office provider

hellip hellip

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept PatternsCluster size Sum of Cluster Score headconstraintscore

615 2114691 breedstate357298460224501

296 7752357 gameplatform627403476771856

153 3466804 accessoryvehicle53393705094809

70 118259 browserplatform132612807637391

22 1010993 requirementschool271407526294823

34 9489159 drugdisease154602405333541

42 8992995 cosmeticskin condition814659415003929

16 7421599 jobcity27903732555528

32 710403 accessoryphone246513830851194

18 6692376 softwareplatform210126322725878

20 6444603 testdisease239774028397537

27 5994205 clothesbreed98773996282851

19 5913545 penaltycrime200544192793488

25 5848804 taxstate240081818612579

16 5465424 saucemeat183592863621553

18 4809389 credit cardcountry142919087972152

14 4730792 foodholiday14554140330924

11 4536199 modgame257163856882439

29 4350954 garmentsport471533326845442

23 3994886 career informationprofessional732726483731257

15 386065 songinstrument128189481818135

18 378213 baitfish780426514113169

22 3722948 study guidebook508339765053921

19 3408953 pluginsbrowser550326072627126

14 3305753 recipemeat882779863422951

18 3214226 currencycountry110825444188352

13 3180272 lenscamera186081673263957

9 316973 decorationholiday130055844126533

16 314875 foodanimal7338544366514

game platform

game device

video game platform

game console game pad

game gaming platform

Game (Head) Platform (Modifier)

angry birds android

angry birds ios

angry birds windows 10

hellip hellip

Detection

Head Modifier Relationship

bull Train a classifier on

(head-embedding modifier-embedding)

bull Training data bull Positive (head modifier)bull Negative (modifier head)

bull Precision gt= 09 Recall gt= 09

bull Disadvantage not interpretable

Syntactic Parsing based on HM

bull Information is incompletebull Preposition and other function words

bull Within a noun compound el capitan macbook pro

bull Why not train a parser for web queries

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges Short Texts Lack Grammatical Signalsbull Lack function words word order

bull ldquotoys queriesrdquo has ambiguous intent

bull ldquodistance earth moonrdquo has clear intentbull many equivalent forms ldquoearth moon distancerdquo ldquoearth

distance moonrdquo hellip

Challenges Syntactic Parsing of Queries

bullNo standard

bullNo ground-truth

Why is syntactic parsing of queries even a legitimate problem

Derive Syntax from Semantics[Sun et al 2016]

bull Query ldquothai food houstonrdquo

bull Clicked sentence

bull Project dependency to the query

A Treebank for Short Texts

bull Given query 119902

bull Given 119902rsquos clicked sentence 119904

bull Parse each 119904

bull Project dependency from 119904 to 119902

bull Aggregate dependencies

Algorithm of Projection

Result Examples

Results

bull Random queries

QueryParser UAS 083 LAS 075Stanford UAS 072 LAS 064

bull Queries with no function words

QueryParser UAS 082 LAS 073Stanford UAS 070 LAS 061

bull Queries with function words

QueryParser UAS 090 LAS 085Stanford UAS 086 LAS 080

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

bull Measuring similarity between two short texts and sentences

bull Basic idea word-by-word comparison using embedding vector

bull Use saliency-weighted semantic graph to computer similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired

Bins of all edges Bins of max edges

119908isin119904119868

119868119863119865(119908) sdot)119904119890119898(119908 119904119904) sdot (1198961 + 1

൰119904119890119898(119908 119904119904) + 1198961 sdot (1 minus 119887 + 119887 sdot|119904119904|119886119907119892119897

Similarity measurement

termShort texts

Inspired by BM25

Semantic similarity

119891119904119905119904(119904119897 119904119904) =

From the Concept View

From the Concept View [Wang et al 2015a]

Co-occurrence Network

Bags of Concepts

Semantic Network

Short Text 1

Short Text 2

Concept Vector 1[(c1 score1) (c2 score2)hellip]

Concept Vector 2[(c1rsquo score1rsquo) (c2rsquo score2rsquo)hellip]

Similarity

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

bull Explicit short text understanding benefit lot of application scenariosbull Adssearch semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

000

100

200

300

400

500

600

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

000

010

020

030

040

050

060

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

Mainline Ads Sidebar Ads

Definition Mining [Hao et al 2016]

bull Definition scenarios search engines QnA etc

bull Why Conceptualization is useful for definition miningbull Examples ldquoWhat is Emphysemardquo

Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

bull This sentence has the form of definitionbull Embedding is helpful to some extent but it also return high similarity

score for (emphysema disease) and (emphysema smoking)

bull Conceptualization can provide strong semanticsbull Contextual embedding can also provide semantic similarity beyond Is-A

Answer 1

Answer 2

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

OfflineOffline OnlineOnline

Original Short textjustin bieber graduates

hellip

Knowledge base

Conceptualiztion

Concept Vector

Entity Extraction

Candidates Generation

Classification amp Ranking

Model LearningModel Learning

Concept Weighting

Model Model NModel i

Concept Model Concept Model

Class 1 Class NClass i

TrainingData

ltMusic Scoregt

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept SpaceArticle titlestagsin this category

119901119894

119901119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

hellip

hellip

120596119894

120596119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

120596119894

120596119895119901119894

119901119895

Precision performance on each category [Wang et al 2014a]

BocSTC LM_ch SVMVSM_cosi

neLM_d Entity_ESA

Movie 071 091 084 081 072 056

Money 097 095 054 057 052 074

Music 097 090 088 073 068 058

TV 096 046 092 056 051 055

0304050607080910

Pre

cisi

on

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledgeltigt in SIGMOD 2008

bull [ Auer et al 2007 ] Soumlren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Boumlrschinger Caumlcilia Zirn and AnasElghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and XindongWu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny BoyesBraem Basic objects in natural categories Cognitive psychology 8(3)382ndash439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny ZhuToward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 93: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Head-Constraints Mining Acquiring Concept Patterns

Get entity pairs from query log

Conceptualization

Concept Patterns for each

prepositionsentity1 entity2

Extract Patterns

A for B A of BA with B A in BA on B A at B hellip

entity 1head entity 2constraint

concept11

concept12

concept13

concept14

concept21

concept22

concept23

(concept11 concept21) (concept11 concept22)(concept11 concept23)hellip

Concept Pattern Dictionary

Building concept pattern dictionary Query Logs

cover for iphone 6sbattery for sony a7rwicked on broadway

Why Concepts Canrsquot Be Too Generalbull It may cause too many concept pattern conflicts

canrsquot distinguish head and modifier for general concept pairs

Head Modifier

Derived Concept Pattern device company

Supporting Entity Pairs iphone 4 verizon

modem comcast

wireless router comcast

iphone 4 tmobile

Head Modifier

Derived Concept Pattern company device

Supporting Entity Pairs amazon books kindle

netflix touchpad

skype windows phone

netflix ps3

Conflict

Why Concepts Canrsquot Be Too Specificbull It may generate concepts with little coverage

bull Concept regresses to entitybull Large storage space up to (million million) patterns

hellip hellip

device largest desktop OS vendor

device largest software development company

device largest global corporation

device latest windows and office provider

hellip hellip

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept PatternsCluster size Sum of Cluster Score headconstraintscore

615 2114691 breedstate357298460224501

296 7752357 gameplatform627403476771856

153 3466804 accessoryvehicle53393705094809

70 118259 browserplatform132612807637391

22 1010993 requirementschool271407526294823

34 9489159 drugdisease154602405333541

42 8992995 cosmeticskin condition814659415003929

16 7421599 jobcity27903732555528

32 710403 accessoryphone246513830851194

18 6692376 softwareplatform210126322725878

20 6444603 testdisease239774028397537

27 5994205 clothesbreed98773996282851

19 5913545 penaltycrime200544192793488

25 5848804 taxstate240081818612579

16 5465424 saucemeat183592863621553

18 4809389 credit cardcountry142919087972152

14 4730792 foodholiday14554140330924

11 4536199 modgame257163856882439

29 4350954 garmentsport471533326845442

23 3994886 career informationprofessional732726483731257

15 386065 songinstrument128189481818135

18 378213 baitfish780426514113169

22 3722948 study guidebook508339765053921

19 3408953 pluginsbrowser550326072627126

14 3305753 recipemeat882779863422951

18 3214226 currencycountry110825444188352

13 3180272 lenscamera186081673263957

9 316973 decorationholiday130055844126533

16 314875 foodanimal7338544366514

game platform

game device

video game platform

game console game pad

game gaming platform

Game (Head) Platform (Modifier)

angry birds android

angry birds ios

angry birds windows 10

hellip hellip

Detection

Head Modifier Relationship

bull Train a classifier on

(head-embedding modifier-embedding)

bull Training data bull Positive (head modifier)bull Negative (modifier head)

bull Precision gt= 09 Recall gt= 09

bull Disadvantage not interpretable

Syntactic Parsing based on HM

bull Information is incompletebull Preposition and other function words

bull Within a noun compound el capitan macbook pro

bull Why not train a parser for web queries

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges Short Texts Lack Grammatical Signalsbull Lack function words word order

bull ldquotoys queriesrdquo has ambiguous intent

bull ldquodistance earth moonrdquo has clear intentbull many equivalent forms ldquoearth moon distancerdquo ldquoearth

distance moonrdquo hellip

Challenges Syntactic Parsing of Queries

bullNo standard

bullNo ground-truth

Why is syntactic parsing of queries even a legitimate problem

Derive Syntax from Semantics[Sun et al 2016]

bull Query ldquothai food houstonrdquo

bull Clicked sentence

bull Project dependency to the query

A Treebank for Short Texts

bull Given query 119902

bull Given 119902rsquos clicked sentence 119904

bull Parse each 119904

bull Project dependency from 119904 to 119902

bull Aggregate dependencies

Algorithm of Projection

Result Examples

Results

bull Random queries

QueryParser UAS 083 LAS 075Stanford UAS 072 LAS 064

bull Queries with no function words

QueryParser UAS 082 LAS 073Stanford UAS 070 LAS 061

bull Queries with function words

QueryParser UAS 090 LAS 085Stanford UAS 086 LAS 080

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

bull Measuring similarity between two short texts and sentences

bull Basic idea word-by-word comparison using embedding vector

bull Use saliency-weighted semantic graph to computer similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired

Bins of all edges Bins of max edges

119908isin119904119868

119868119863119865(119908) sdot)119904119890119898(119908 119904119904) sdot (1198961 + 1

൰119904119890119898(119908 119904119904) + 1198961 sdot (1 minus 119887 + 119887 sdot|119904119904|119886119907119892119897

Similarity measurement

termShort texts

Inspired by BM25

Semantic similarity

119891119904119905119904(119904119897 119904119904) =

From the Concept View

From the Concept View [Wang et al 2015a]

Co-occurrence Network

Bags of Concepts

Semantic Network

Short Text 1

Short Text 2

Concept Vector 1[(c1 score1) (c2 score2)hellip]

Concept Vector 2[(c1rsquo score1rsquo) (c2rsquo score2rsquo)hellip]

Similarity

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

bull Explicit short text understanding benefit lot of application scenariosbull Adssearch semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

000

100

200

300

400

500

600

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

000

010

020

030

040

050

060

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

Mainline Ads Sidebar Ads

Definition Mining [Hao et al 2016]

bull Definition scenarios search engines QnA etc

bull Why Conceptualization is useful for definition miningbull Examples ldquoWhat is Emphysemardquo

Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

bull This sentence has the form of definitionbull Embedding is helpful to some extent but it also return high similarity

score for (emphysema disease) and (emphysema smoking)

bull Conceptualization can provide strong semanticsbull Contextual embedding can also provide semantic similarity beyond Is-A

Answer 1

Answer 2

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

OfflineOffline OnlineOnline

Original Short textjustin bieber graduates

hellip

Knowledge base

Conceptualiztion

Concept Vector

Entity Extraction

Candidates Generation

Classification amp Ranking

Model LearningModel Learning

Concept Weighting

Model Model NModel i

Concept Model Concept Model

Class 1 Class NClass i

TrainingData

ltMusic Scoregt

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept SpaceArticle titlestagsin this category

119901119894

119901119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

hellip

hellip

120596119894

120596119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

120596119894

120596119895119901119894

119901119895

Precision performance on each category [Wang et al 2014a]

BocSTC LM_ch SVMVSM_cosi

neLM_d Entity_ESA

Movie 071 091 084 081 072 056

Money 097 095 054 057 052 074

Music 097 090 088 073 068 058

TV 096 046 092 056 051 055

0304050607080910

Pre

cisi

on

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledgeltigt in SIGMOD 2008

bull [ Auer et al 2007 ] Soumlren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Boumlrschinger Caumlcilia Zirn and AnasElghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and XindongWu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny BoyesBraem Basic objects in natural categories Cognitive psychology 8(3)382ndash439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny ZhuToward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 94: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Why Concepts Canrsquot Be Too Generalbull It may cause too many concept pattern conflicts

canrsquot distinguish head and modifier for general concept pairs

Head Modifier

Derived Concept Pattern device company

Supporting Entity Pairs iphone 4 verizon

modem comcast

wireless router comcast

iphone 4 tmobile

Head Modifier

Derived Concept Pattern company device

Supporting Entity Pairs amazon books kindle

netflix touchpad

skype windows phone

netflix ps3

Conflict

Why Concepts Canrsquot Be Too Specificbull It may generate concepts with little coverage

bull Concept regresses to entitybull Large storage space up to (million million) patterns

hellip hellip

device largest desktop OS vendor

device largest software development company

device largest global corporation

device latest windows and office provider

hellip hellip

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept Patterns

Cluster size | Sum of cluster score | Head / Constraint | Score
615 | 21.14691 | breed / state | 0.357298460224501
296 | 7.752357 | game / platform | 0.627403476771856
153 | 3.466804 | accessory / vehicle | 0.53393705094809
70 | 1.18259 | browser / platform | 0.132612807637391
22 | 1.010993 | requirement / school | 0.271407526294823
34 | 0.9489159 | drug / disease | 0.154602405333541
42 | 0.8992995 | cosmetic / skin condition | 0.814659415003929
16 | 0.7421599 | job / city | 0.27903732555528
32 | 0.710403 | accessory / phone | 0.246513830851194
18 | 0.6692376 | software / platform | 0.210126322725878
20 | 0.6444603 | test / disease | 0.239774028397537
27 | 0.5994205 | clothes / breed | 0.98773996282851
19 | 0.5913545 | penalty / crime | 0.200544192793488
25 | 0.5848804 | tax / state | 0.240081818612579
16 | 0.5465424 | sauce / meat | 0.183592863621553
18 | 0.4809389 | credit card / country | 0.142919087972152
14 | 0.4730792 | food / holiday | 0.14554140330924
11 | 0.4536199 | mod / game | 0.257163856882439
29 | 0.4350954 | garment / sport | 0.471533326845442
23 | 0.3994886 | career information / professional | 0.732726483731257
15 | 0.386065 | song / instrument | 0.128189481818135
18 | 0.378213 | bait / fish | 0.780426514113169
22 | 0.3722948 | study guide / book | 0.508339765053921
19 | 0.3408953 | plugins / browser | 0.550326072627126
14 | 0.3305753 | recipe / meat | 0.882779863422951
18 | 0.3214226 | currency / country | 0.110825444188352
13 | 0.3180272 | lens / camera | 0.186081673263957
9 | 0.316973 | decoration / holiday | 0.130055844126533
16 | 0.314875 | food / animal | 0.7338544366514

Example cluster of concept patterns: game/platform, game/device, video game/platform, game console/game pad, game/gaming platform

Game (Head) | Platform (Modifier)
angry birds | android
angry birds | ios
angry birds | windows 10
… | …
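To make the cluster above concrete, here is a small illustrative sketch of how a concept-level pattern such as game/platform could be derived from supporting entity pairs via isA lookups; the toy isA table and the simple majority vote are assumptions, not the tutorial system's actual derivation.

```python
# Illustrative sketch: derive a concept pattern (head concept / modifier concept)
# from supporting entity pairs using isA lookups. Toy data; a real system
# aggregates over a large taxonomy and query log.
from collections import Counter

isa = {
    "angry birds": ["game"], "minecraft": ["game"],
    "android": ["platform"], "ios": ["platform"], "windows 10": ["platform"],
}

def derive_pattern(entity_pairs):
    """Count concept pairs obtained by conceptualizing each (head, modifier) entity pair."""
    votes = Counter()
    for head, modifier in entity_pairs:
        for hc in isa.get(head, []):
            for mc in isa.get(modifier, []):
                votes[(hc, mc)] += 1
    return votes.most_common(1)[0] if votes else None

pairs = [("angry birds", "android"), ("angry birds", "ios"), ("minecraft", "windows 10")]
print(derive_pattern(pairs))   # (('game', 'platform'), 3)
```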

Head Modifier Relationship Detection

• Train a classifier on (head-embedding, modifier-embedding); a short sketch follows below
• Training data:
  • Positive: (head, modifier)
  • Negative: (modifier, head)
• Precision >= 0.9, Recall >= 0.9
• Disadvantage: not interpretable
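A minimal sketch of such a classifier, with stand-in random embeddings in place of real pre-trained vectors; the logistic-regression choice, the 50-dimensional vectors, and the toy pairs are illustrative assumptions rather than the setup behind the reported 0.9 precision/recall.

```python
# Sketch: order classifier over concatenated (head-embedding, modifier-embedding).
# The random vectors stand in for pre-trained embeddings; the pairs are toy examples.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
vocab = ["angry birds", "android", "iphone 4", "verizon", "netflix", "ps3"]
embeddings = {term: rng.normal(size=50) for term in vocab}

def pair_features(head, modifier):
    # Feature vector = head embedding concatenated with modifier embedding.
    return np.concatenate([embeddings[head], embeddings[modifier]])

positive_pairs = [("angry birds", "android"), ("iphone 4", "verizon"), ("netflix", "ps3")]
X, y = [], []
for head, modifier in positive_pairs:
    X.append(pair_features(head, modifier)); y.append(1)   # (head, modifier) -> positive
    X.append(pair_features(modifier, head)); y.append(0)   # (modifier, head) -> negative

clf = LogisticRegression(max_iter=1000).fit(np.array(X), np.array(y))
print(clf.predict([pair_features("netflix", "ps3")]))       # expect label 1 on training data
```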

Syntactic Parsing based on HM

• Head/modifier information is incomplete:
  • prepositions and other function words are not covered
  • neither are relations within a noun compound: "el capitan macbook pro"
• Why not train a parser for web queries?

Syntactic Parsing of Short Texts [Sun et al EMNLP 2016]

• Syntactic structures are valuable for short text understanding
• Examples

Challenges: Short Texts Lack Grammatical Signals
• Lack of function words and word order
• "toys queries" has ambiguous intent
• "distance earth moon" has clear intent, with many equivalent forms: "earth moon distance", "earth distance moon", …

Challenges: Syntactic Parsing of Queries

• No standard
• No ground-truth

Why is syntactic parsing of queries even a legitimate problem?

Derive Syntax from Semantics [Sun et al 2016]

• Query: "thai food houston"
• Clicked sentence
• Project the dependency parse onto the query

A Treebank for Short Texts

• Given a query q
• Given q's clicked sentence s
• Parse each s
• Project dependencies from s to q
• Aggregate the dependencies

Algorithm of Projection
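The projection algorithm itself appears as a figure in the original slides; below is a simplified, assumption-laden sketch of the idea: keep only the dependency arcs of the clicked sentence whose head and dependent both match query terms, then aggregate arcs over many clicked sentences by frequency. The matching and aggregation rules are illustrative, not the exact algorithm of [Sun et al 2016].

```python
# Simplified sketch of projecting dependencies from a clicked sentence onto a query.
from collections import Counter

def project_dependencies(query_tokens, sent_tokens, sent_arcs):
    """sent_arcs: list of (head_index, dependent_index, label) over sent_tokens.
    Returns arcs between query token positions whose words both occur in the query."""
    positions = {}
    for i, tok in enumerate(sent_tokens):
        if tok in query_tokens:
            positions[i] = query_tokens.index(tok)
    projected = []
    for head, dep, label in sent_arcs:
        if head in positions and dep in positions:
            projected.append((positions[head], positions[dep], label))
    return projected

def aggregate(arc_lists):
    """Aggregate projected arcs over many clicked sentences by majority count."""
    return Counter(arc for arcs in arc_lists for arc in arcs).most_common()

# Example: query "thai food houston", clicked sentence "find the best thai food in houston".
query = ["thai", "food", "houston"]
sent = ["find", "the", "best", "thai", "food", "in", "houston"]
arcs = [(4, 3, "amod"), (0, 4, "dobj"), (4, 6, "nmod")]  # toy parse over lowercased tokens
print(aggregate([project_dependencies(query, sent, arcs)]))
```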

Result Examples

Results

Query set | QueryParser | Stanford
Random queries | UAS 0.83, LAS 0.75 | UAS 0.72, LAS 0.64
Queries with no function words | UAS 0.82, LAS 0.73 | UAS 0.70, LAS 0.61
Queries with function words | UAS 0.90, LAS 0.85 | UAS 0.86, LAS 0.80

Short Text Understanding

• How to segment this short text?
• What does this short text mean (its intent, senses, or concepts)?
• What are the relations among terms in the short text?
• How to calculate the similarity between short texts?

Short Text Similarity Using Word Embedding [Kenter and Rijke 2015]

• Measuring similarity between two short texts or sentences
• Basic idea: word-by-word comparison using embedding vectors
• Use a saliency-weighted semantic graph to compute similarity

Short Text Similarity Using Word Embedding [Kenter and Rijke 2015]

Features acquired: bins of all edges, bins of max edges.

Similarity measurement (inspired by BM25). Here s_l and s_s denote the longer and the shorter of the two short texts, w ranges over the terms of s_l, sem(w, s_s) is the semantic similarity between term w and text s_s, and k_1, b, avgsl are BM25-style parameters:

f_{sts}(s_l, s_s) = \sum_{w \in s_l} IDF(w) \cdot \frac{sem(w, s_s) \cdot (k_1 + 1)}{sem(w, s_s) + k_1 \cdot \left(1 - b + b \cdot \frac{|s_s|}{avgsl}\right)}
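A minimal sketch of the formula above, with toy embeddings and IDF values standing in for real models; here sem(w, s_s) is taken as the maximum cosine similarity between w and any term of the shorter text, and the k_1, b, avgsl values are illustrative defaults, not the paper's tuned settings.

```python
# Minimal sketch of the BM25-inspired, saliency-weighted short text similarity.
import numpy as np

rng = np.random.default_rng(1)
emb = {w: rng.normal(size=50) for w in ["thai", "food", "houston", "restaurant", "texas"]}
idf = {w: 1.0 for w in emb}                      # stand-in IDF table

def sem(w, short_text):
    """Max cosine similarity between term w and any term of the shorter text."""
    def cos(a, b): return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(cos(emb[w], emb[t]) for t in short_text)

def f_sts(s_l, s_s, k1=1.2, b=0.75, avgsl=5.0):
    score = 0.0
    for w in s_l:
        s = sem(w, s_s)
        score += idf[w] * (s * (k1 + 1)) / (s + k1 * (1 - b + b * len(s_s) / avgsl))
    return score

print(f_sts(["thai", "food", "houston"], ["restaurant", "texas"]))
```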

From the Concept View [Wang et al 2015a]

[Diagram] Co-occurrence Network · Bags of Concepts · Semantic Network

[Pipeline] Short Text 1 and Short Text 2 each go through Parsing and Conceptualization (term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, concept orthogonalization), producing Concept Vector 1 [(c1, score1), (c2, score2), …] and Concept Vector 2 [(c1', score1'), (c2', score2'), …]; Similarity is then computed between the two concept vectors.
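A minimal sketch of the final step, assuming the concept vectors have already been produced by the pipeline above; the hand-written vectors and plain cosine similarity are illustrative choices.

```python
# Sketch of concept-view similarity: cosine between two sparse concept vectors.
# The vectors below are hand-written stand-ins for conceptualization output.
import math

def concept_cosine(cv1, cv2):
    dot = sum(s * cv2.get(c, 0.0) for c, s in cv1.items())
    n1 = math.sqrt(sum(s * s for s in cv1.values()))
    n2 = math.sqrt(sum(s * s for s in cv2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

cv_ipad = {"company": 0.7, "device": 0.6}       # e.g. concepts for "apple ipad"
cv_surface = {"company": 0.6, "device": 0.7}    # e.g. concepts for "microsoft surface"
cv_pie = {"food": 0.8, "dessert": 0.5}          # e.g. concepts for "apple pie"
print(round(concept_cosine(cv_ipad, cv_surface), 3))  # high similarity
print(round(concept_cosine(cv_ipad, cv_pie), 3))      # zero similarity
```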

Outline

• Knowledge Bases
• Explicit Representation Models
• Applications

Applications

• Explicit short text understanding benefits many application scenarios:
  • Ads/search semantic match
  • Definition mining
  • Query recommendation
  • Web table understanding
  • Semantic search
  • …

Ads Keyword Selection [Wang et al 2015a]

[Two bar charts over query Deciles 4 through 10: Mainline Ads (y-axis 0.00–6.00) and Sidebar Ads (y-axis 0.00–0.60)]

Definition Mining [Hao et al 2016]

• Definition scenarios: search engines, QnA, etc.
• Why conceptualization is useful for definition mining. Example: "What is Emphysema?"

Answer 1: "Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year."
Answer 2: "Emphysema is an incurable, progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing."

• Each answer has the form of a definition.
• Embedding is helpful to some extent, but it also returns a high similarity score for (emphysema, disease) and (emphysema, smoking).
• Conceptualization can provide strong semantics.
• Contextual embedding can also provide semantic similarity beyond Is-A.
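As a toy illustration of why Is-A semantics helps here, the sketch below boosts answers that both have definitional form and name a concept that the isA knowledge confirms for the question entity; the isA set, the string matching, and the weights are assumptions made purely for illustration, not the ranking model of [Hao et al 2016].

```python
# Toy sketch: combine a definitional-form check with isA confirmation, so a sentence
# defining "emphysema" as a disease outranks one that merely co-mentions smoking.
isa = {("emphysema", "disease"), ("emphysema", "lung disease")}

def definition_score(entity, answer):
    text = answer.lower()
    has_def_form = text.startswith(entity + " is a") or text.startswith(entity + " is an")
    has_isa_concept = any(e == entity and c in text for (e, c) in isa)
    return 0.5 * has_def_form + 0.5 * has_isa_concept

answers = [
    "Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year.",
    "Emphysema is an incurable progressive lung disease that primarily affects smokers.",
    "Smoking is the main cause of emphysema.",
]
for a in sorted(answers, key=lambda a: definition_score("emphysema", a), reverse=True):
    print(round(definition_score("emphysema", a), 2), a[:60])
```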

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

[Framework diagram]
• Online: the original short text (e.g., "justin bieber graduates") goes through entity extraction and conceptualization against the knowledge base to produce a concept vector; candidate generation and classification & ranking then yield outputs such as <Music, Score>.
• Offline: training data and concept weighting feed model learning, producing a concept model per class (Class 1 … Class i … Class N).

[Concept-space diagrams]
• For a category (e.g., TV), the article titles/tags in that category are mapped into the concept space as points p_i, p_j.
• Category-level concept models ω_i, ω_j are built in the same space for categories such as Music, Movie, TV, ….
• At query time, the query is mapped into the concept space and compared against the category models.
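A toy sketch of the overall idea: conceptualize the query with isA knowledge and score it against per-category concept models. The isA entries, category models, and dot-product scoring are illustrative assumptions, not the exact method of [Wang et al 2014a].

```python
# Sketch of concept-based short text classification: conceptualize the query, then
# score its concept vector against each category's concept model and rank categories.
def conceptualize(text, isa):
    cv = {}
    for term in isa:                      # naive multi-word matching against the text
        if term in text:
            for concept, weight in isa[term]:
                cv[concept] = cv.get(concept, 0.0) + weight
    return cv

def score(query_cv, category_cv):
    return sum(w * category_cv.get(c, 0.0) for c, w in query_cv.items())

isa = {"justin bieber": [("singer", 0.9), ("celebrity", 0.5)],
       "graduates": [("student", 0.3)]}
category_models = {"Music": {"singer": 0.8, "album": 0.6, "song": 0.7},
                   "TV": {"show": 0.9, "celebrity": 0.4}}

query_cv = conceptualize("justin bieber graduates", isa)
ranked = sorted(category_models, key=lambda c: score(query_cv, category_models[c]), reverse=True)
print(ranked)          # ['Music', 'TV'] for this toy data
```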

Precision performance on each category [Wang et al 2014a]

Category | BocSTC | LM_ch | SVM | VSM_cosine | LM_d | Entity_ESA
Movie | 0.71 | 0.91 | 0.84 | 0.81 | 0.72 | 0.56
Money | 0.97 | 0.95 | 0.54 | 0.57 | 0.52 | 0.74
Music | 0.97 | 0.90 | 0.88 | 0.73 | 0.68 | 0.58
TV | 0.96 | 0.46 | 0.92 | 0.56 | 0.51 | 0.55

[Bar chart of the same precision numbers; y-axis 0.3–1.0]

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

• [Stark et al 1998] Michael M. Stark and Richard F. Riesenfeld. WordNet: An Electronic Lexical Database. Proceedings of 11th Eurographics Workshop on Rendering, 1998.

• [Banko et al 2007] Michele Banko, Michael J. Cafarella, Stephen Soderland, Matt Broadhead and Oren Etzioni. Open Information Extraction from the Web. In IJCAI 2007.

• [Etzioni et al 2011] Oren Etzioni, Anthony Fader, Janara Christensen, Stephen Soderland and Mausam Mausam. Open Information Extraction: The Second Generation. In IJCAI, vol. 11, pp. 3-10, 2011.

• [Carlson et al 2010] A. Carlson, J. Betteridge, B. Kisiel, B. Settles, E.R. Hruschka Jr. and T.M. Mitchell. Toward an Architecture for Never-Ending Language Learning. In Proceedings of the Conference on Artificial Intelligence (AAAI), 2010.

• [Wu et al 2012] Wentao Wu, Hongsong Li, Haixun Wang and Kenny Zhu. Probase: A Probabilistic Taxonomy for Text Understanding. In ACM International Conference on Management of Data (SIGMOD), May 2012.

• [Bollacker et al 2008] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, Jamine Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In SIGMOD 2008.

• [Auer et al 2007] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, Zachary G. Ives. DBpedia: A Nucleus for a Web of Open Data. In ISWC/ASWC 2007.

• [Suchanek et al 2007] Fabian M. Suchanek, Gjergji Kasneci, Gerhard Weikum. Yago: a core of semantic knowledge. In WWW 2007.

• [Wu et al 2015] Sen Wu, Ce Zhang, Christopher De Sa, Jaeho Shin, Feiran Wang and C. Ré. Incremental Knowledge Base Construction Using DeepDive. In VLDB 2015.

• [Navigli et al 2012] R. Navigli and S. Ponzetto. BabelNet: The Automatic Construction, Evaluation and Application of a Wide-Coverage Multilingual Semantic Network. In Artificial Intelligence, 2012.

• [Nastase et al 2010] Vivi Nastase, Michael Strube, Benjamin Börschinger, Cäcilia Zirn and Anas Elghafari. WikiNet: A very large scale multi-lingual concept network. In LREC 2010.

• [Speer et al 2013] Robert Speer and Catherine Havasi. ConceptNet 5: A large semantic network for relational knowledge. The People's Web Meets NLP, Springer Berlin Heidelberg, 2013.

• [Hua et al 2016] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng and Xiaofang Zhou. "Understand Short Texts by Harvesting and Analyzing Semantic Knowledge". IEEE Transactions on Knowledge and Data Engineering (TKDE), 2016.

• [Hua et al 2015] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng and Xiaofang Zhou. Short Text Understanding Through Lexical-Semantic Analysis. In International Conference on Data Engineering (ICDE), April 2015.

• [Li et al 2013] Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang and Xindong Wu. Computing term similarity by large probabilistic isA knowledge. In ACM International Conference on Information and Knowledge Management (CIKM), 2013.

• [Li et al 2015] Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang, Xue-Gang Hu and Xindong Wu. A Large Probabilistic Semantic Network based Approach to Compute Term Similarity. In IEEE Transactions on Knowledge and Data Engineering (TKDE), 27(10): 2604-2617, 2015.

• [Rosch et al 1976] Eleanor Rosch, Carolyn B. Mervis, Wayne D. Gray, David M. Johnson and Penny Boyes-Braem. Basic objects in natural categories. Cognitive Psychology, 8(3):382–439, 1976.

• [Manning and Schutze 1999] Christopher D. Manning and Hinrich Schutze. Foundations of Statistical Natural Language Processing. Volume 999, MIT Press, 1999.

• [Wang et al 2015b] Zhongyuan Wang, Kejun Zhao, Haixun Wang, Xiaofeng Meng and Ji-Rong Wen. Query Understanding through Knowledge-Based Conceptualization. In IJCAI, July 2015.

• [Bergsma et al 2007] Shane Bergsma, Qin Iris Wang. Learning Noun Phrase Query Segmentation. In EMNLP-CoNLL 2007, 819-826.

• [Tan et al 2008] Bin Tan, Fuchun Peng. Unsupervised query segmentation using generative language models and Wikipedia. In WWW 2008, 347-356.

• [Li et al 2011] Yanen Li, Bo-June Paul Hsu, ChengXiang Zhai, Kuansan Wang. Unsupervised query segmentation using clickthrough for information retrieval. In SIGIR 2011, 285-294.

• [Guo et al 2009] Jiafeng Guo, Gu Xu, Xueqi Cheng, Hang Li. Named entity recognition in query. In SIGIR 2009, 267-274.

• [Pantel et al 2012] Patrick Pantel, Thomas Lin, Michael Gamon. Mining Entity Types from Query Logs via User Intent Modeling. In ACL 2012, 563-571.

• [Joshi et al 2014] Mandar Joshi, Uma Sawant, Soumen Chakrabarti. Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries. In EMNLP 2014, 1104-1114.

• [Sawant et al 2013] Uma Sawant, Soumen Chakrabarti. Learning joint query interpretation and response ranking. In WWW 2013, 1099-1110.

• [Wang et al 2014b] Zhongyuan Wang, Haixun Wang and Zhirui Hu. Head, Modifier, and Constraint Detection in Short Texts. In International Conference on Data Engineering (ICDE), 2014.

• [Sun et al 2016] Xiangyan Sun, Haixun Wang, Yanghua Xiao, Zhongyuan Wang. Syntactic Parsing of Web Queries. In EMNLP 2016.

• [Kenter and Rijke 2015] Tom Kenter and Maarten de Rijke. Short text similarity with word embeddings. In CIKM 2015.

• [Wang et al 2015a] Zhongyuan Wang, Haixun Wang, Ji-Rong Wen and Yanghua Xiao. An Inference Approach to Basic Level of Categorization. In CIKM, October 2015.

• [Hao et al 2016] Zehui Hao, Zhongyuan Wang, Xiaofeng Meng and Jun Yan. Combining Language Model with Conceptualization for Definition Ranking. MSR Technical Report, 2016.

• [Wang et al 2014a] Fang Wang, Zhongyuan Wang, Zhoujun Li and Ji-Rong Wen. Concept-based Short Text Classification and Ranking. In CIKM 2014.

• [Wang et al 2012a] Jingjing Wang, Haixun Wang, Zhongyuan Wang and Kenny Zhu. Understanding Tables on the Web. In International Conference on Conceptual Modeling, October 2012.

• [Wang et al 2012b] Yue Wang, Hongsong Li, Haixun Wang and Kenny Zhu. Toward Topic Search on the Web. In International Conference on Conceptual Modeling, October 2012.

Page 95: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Why Concepts Canrsquot Be Too Specificbull It may generate concepts with little coverage

bull Concept regresses to entitybull Large storage space up to (million million) patterns

hellip hellip

device largest desktop OS vendor

device largest software development company

device largest global corporation

device latest windows and office provider

hellip hellip

Basic-level Conceptualization (BLC) is a good choice [Wang et al 2015b]

Top Concept PatternsCluster size Sum of Cluster Score headconstraintscore

615 2114691 breedstate357298460224501

296 7752357 gameplatform627403476771856

153 3466804 accessoryvehicle53393705094809

70 118259 browserplatform132612807637391

22 1010993 requirementschool271407526294823

34 9489159 drugdisease154602405333541

42 8992995 cosmeticskin condition814659415003929

16 7421599 jobcity27903732555528

32 710403 accessoryphone246513830851194

18 6692376 softwareplatform210126322725878

20 6444603 testdisease239774028397537

27 5994205 clothesbreed98773996282851

19 5913545 penaltycrime200544192793488

25 5848804 taxstate240081818612579

16 5465424 saucemeat183592863621553

18 4809389 credit cardcountry142919087972152

14 4730792 foodholiday14554140330924

11 4536199 modgame257163856882439

29 4350954 garmentsport471533326845442

23 3994886 career informationprofessional732726483731257

15 386065 songinstrument128189481818135

18 378213 baitfish780426514113169

22 3722948 study guidebook508339765053921

19 3408953 pluginsbrowser550326072627126

14 3305753 recipemeat882779863422951

18 3214226 currencycountry110825444188352

13 3180272 lenscamera186081673263957

9 316973 decorationholiday130055844126533

16 314875 foodanimal7338544366514

game platform

game device

video game platform

game console game pad

game gaming platform

Game (Head) Platform (Modifier)

angry birds android

angry birds ios

angry birds windows 10

hellip hellip

Detection

Head Modifier Relationship

bull Train a classifier on

(head-embedding modifier-embedding)

bull Training data bull Positive (head modifier)bull Negative (modifier head)

bull Precision gt= 09 Recall gt= 09

bull Disadvantage not interpretable

Syntactic Parsing based on HM

bull Information is incompletebull Preposition and other function words

bull Within a noun compound el capitan macbook pro

bull Why not train a parser for web queries

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges Short Texts Lack Grammatical Signalsbull Lack function words word order

bull ldquotoys queriesrdquo has ambiguous intent

bull ldquodistance earth moonrdquo has clear intentbull many equivalent forms ldquoearth moon distancerdquo ldquoearth

distance moonrdquo hellip

Challenges Syntactic Parsing of Queries

bullNo standard

bullNo ground-truth

Why is syntactic parsing of queries even a legitimate problem

Derive Syntax from Semantics[Sun et al 2016]

bull Query ldquothai food houstonrdquo

bull Clicked sentence

bull Project dependency to the query

A Treebank for Short Texts

bull Given query 119902

bull Given 119902rsquos clicked sentence 119904

bull Parse each 119904

bull Project dependency from 119904 to 119902

bull Aggregate dependencies

Algorithm of Projection

Result Examples

Results

bull Random queries

QueryParser UAS 083 LAS 075Stanford UAS 072 LAS 064

bull Queries with no function words

QueryParser UAS 082 LAS 073Stanford UAS 070 LAS 061

bull Queries with function words

QueryParser UAS 090 LAS 085Stanford UAS 086 LAS 080

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

bull Measuring similarity between two short texts and sentences

bull Basic idea word-by-word comparison using embedding vector

bull Use saliency-weighted semantic graph to computer similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired

Bins of all edges Bins of max edges

119908isin119904119868

119868119863119865(119908) sdot)119904119890119898(119908 119904119904) sdot (1198961 + 1

൰119904119890119898(119908 119904119904) + 1198961 sdot (1 minus 119887 + 119887 sdot|119904119904|119886119907119892119897

Similarity measurement

termShort texts

Inspired by BM25

Semantic similarity

119891119904119905119904(119904119897 119904119904) =

From the Concept View

From the Concept View [Wang et al 2015a]

Co-occurrence Network

Bags of Concepts

Semantic Network

Short Text 1

Short Text 2

Concept Vector 1[(c1 score1) (c2 score2)hellip]

Concept Vector 2[(c1rsquo score1rsquo) (c2rsquo score2rsquo)hellip]

Similarity

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

bull Explicit short text understanding benefit lot of application scenariosbull Adssearch semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

000

100

200

300

400

500

600

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

000

010

020

030

040

050

060

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

Mainline Ads Sidebar Ads

Definition Mining [Hao et al 2016]

bull Definition scenarios search engines QnA etc

bull Why Conceptualization is useful for definition miningbull Examples ldquoWhat is Emphysemardquo

Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

bull This sentence has the form of definitionbull Embedding is helpful to some extent but it also return high similarity

score for (emphysema disease) and (emphysema smoking)

bull Conceptualization can provide strong semanticsbull Contextual embedding can also provide semantic similarity beyond Is-A

Answer 1

Answer 2

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

OfflineOffline OnlineOnline

Original Short textjustin bieber graduates

hellip

Knowledge base

Conceptualiztion

Concept Vector

Entity Extraction

Candidates Generation

Classification amp Ranking

Model LearningModel Learning

Concept Weighting

Model Model NModel i

Concept Model Concept Model

Class 1 Class NClass i

TrainingData

ltMusic Scoregt

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept SpaceArticle titlestagsin this category

119901119894

119901119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

hellip

hellip

120596119894

120596119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

120596119894

120596119895119901119894

119901119895

Precision performance on each category [Wang et al 2014a]

BocSTC LM_ch SVMVSM_cosi

neLM_d Entity_ESA

Movie 071 091 084 081 072 056

Money 097 095 054 057 052 074

Music 097 090 088 073 068 058

TV 096 046 092 056 051 055

0304050607080910

Pre

cisi

on

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledgeltigt in SIGMOD 2008

bull [ Auer et al 2007 ] Soumlren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Boumlrschinger Caumlcilia Zirn and AnasElghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and XindongWu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny BoyesBraem Basic objects in natural categories Cognitive psychology 8(3)382ndash439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny ZhuToward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 96: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Top Concept PatternsCluster size Sum of Cluster Score headconstraintscore

615 2114691 breedstate357298460224501

296 7752357 gameplatform627403476771856

153 3466804 accessoryvehicle53393705094809

70 118259 browserplatform132612807637391

22 1010993 requirementschool271407526294823

34 9489159 drugdisease154602405333541

42 8992995 cosmeticskin condition814659415003929

16 7421599 jobcity27903732555528

32 710403 accessoryphone246513830851194

18 6692376 softwareplatform210126322725878

20 6444603 testdisease239774028397537

27 5994205 clothesbreed98773996282851

19 5913545 penaltycrime200544192793488

25 5848804 taxstate240081818612579

16 5465424 saucemeat183592863621553

18 4809389 credit cardcountry142919087972152

14 4730792 foodholiday14554140330924

11 4536199 modgame257163856882439

29 4350954 garmentsport471533326845442

23 3994886 career informationprofessional732726483731257

15 386065 songinstrument128189481818135

18 378213 baitfish780426514113169

22 3722948 study guidebook508339765053921

19 3408953 pluginsbrowser550326072627126

14 3305753 recipemeat882779863422951

18 3214226 currencycountry110825444188352

13 3180272 lenscamera186081673263957

9 316973 decorationholiday130055844126533

16 314875 foodanimal7338544366514

game platform

game device

video game platform

game console game pad

game gaming platform

Game (Head) Platform (Modifier)

angry birds android

angry birds ios

angry birds windows 10

hellip hellip

Detection

Head Modifier Relationship

bull Train a classifier on

(head-embedding modifier-embedding)

bull Training data bull Positive (head modifier)bull Negative (modifier head)

bull Precision gt= 09 Recall gt= 09

bull Disadvantage not interpretable

Syntactic Parsing based on HM

bull Information is incompletebull Preposition and other function words

bull Within a noun compound el capitan macbook pro

bull Why not train a parser for web queries

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges Short Texts Lack Grammatical Signalsbull Lack function words word order

bull ldquotoys queriesrdquo has ambiguous intent

bull ldquodistance earth moonrdquo has clear intentbull many equivalent forms ldquoearth moon distancerdquo ldquoearth

distance moonrdquo hellip

Challenges Syntactic Parsing of Queries

bullNo standard

bullNo ground-truth

Why is syntactic parsing of queries even a legitimate problem

Derive Syntax from Semantics[Sun et al 2016]

bull Query ldquothai food houstonrdquo

bull Clicked sentence

bull Project dependency to the query

A Treebank for Short Texts

bull Given query 119902

bull Given 119902rsquos clicked sentence 119904

bull Parse each 119904

bull Project dependency from 119904 to 119902

bull Aggregate dependencies

Algorithm of Projection

Result Examples

Results

bull Random queries

QueryParser UAS 083 LAS 075Stanford UAS 072 LAS 064

bull Queries with no function words

QueryParser UAS 082 LAS 073Stanford UAS 070 LAS 061

bull Queries with function words

QueryParser UAS 090 LAS 085Stanford UAS 086 LAS 080

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

bull Measuring similarity between two short texts and sentences

bull Basic idea word-by-word comparison using embedding vector

bull Use saliency-weighted semantic graph to computer similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired

Bins of all edges Bins of max edges

119908isin119904119868

119868119863119865(119908) sdot)119904119890119898(119908 119904119904) sdot (1198961 + 1

൰119904119890119898(119908 119904119904) + 1198961 sdot (1 minus 119887 + 119887 sdot|119904119904|119886119907119892119897

Similarity measurement

termShort texts

Inspired by BM25

Semantic similarity

119891119904119905119904(119904119897 119904119904) =

From the Concept View

From the Concept View [Wang et al 2015a]

Co-occurrence Network

Bags of Concepts

Semantic Network

Short Text 1

Short Text 2

Concept Vector 1[(c1 score1) (c2 score2)hellip]

Concept Vector 2[(c1rsquo score1rsquo) (c2rsquo score2rsquo)hellip]

Similarity

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

bull Explicit short text understanding benefit lot of application scenariosbull Adssearch semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

000

100

200

300

400

500

600

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

000

010

020

030

040

050

060

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

Mainline Ads Sidebar Ads

Definition Mining [Hao et al 2016]

bull Definition scenarios search engines QnA etc

bull Why Conceptualization is useful for definition miningbull Examples ldquoWhat is Emphysemardquo

Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

bull This sentence has the form of definitionbull Embedding is helpful to some extent but it also return high similarity

score for (emphysema disease) and (emphysema smoking)

bull Conceptualization can provide strong semanticsbull Contextual embedding can also provide semantic similarity beyond Is-A

Answer 1

Answer 2

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

OfflineOffline OnlineOnline

Original Short textjustin bieber graduates

hellip

Knowledge base

Conceptualiztion

Concept Vector

Entity Extraction

Candidates Generation

Classification amp Ranking

Model LearningModel Learning

Concept Weighting

Model Model NModel i

Concept Model Concept Model

Class 1 Class NClass i

TrainingData

ltMusic Scoregt

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept SpaceArticle titlestagsin this category

119901119894

119901119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

hellip

hellip

120596119894

120596119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

120596119894

120596119895119901119894

119901119895

Precision performance on each category [Wang et al 2014a]

BocSTC LM_ch SVMVSM_cosi

neLM_d Entity_ESA

Movie 071 091 084 081 072 056

Money 097 095 054 057 052 074

Music 097 090 088 073 068 058

TV 096 046 092 056 051 055

0304050607080910

Pre

cisi

on

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledgeltigt in SIGMOD 2008

bull [ Auer et al 2007 ] Soumlren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Boumlrschinger Caumlcilia Zirn and AnasElghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and XindongWu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny BoyesBraem Basic objects in natural categories Cognitive psychology 8(3)382ndash439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny ZhuToward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 97: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Head Modifier Relationship

bull Train a classifier on

(head-embedding modifier-embedding)

bull Training data bull Positive (head modifier)bull Negative (modifier head)

bull Precision gt= 09 Recall gt= 09

bull Disadvantage not interpretable

Syntactic Parsing based on HM

bull Information is incompletebull Preposition and other function words

bull Within a noun compound el capitan macbook pro

bull Why not train a parser for web queries

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges Short Texts Lack Grammatical Signalsbull Lack function words word order

bull ldquotoys queriesrdquo has ambiguous intent

bull ldquodistance earth moonrdquo has clear intentbull many equivalent forms ldquoearth moon distancerdquo ldquoearth

distance moonrdquo hellip

Challenges Syntactic Parsing of Queries

bullNo standard

bullNo ground-truth

Why is syntactic parsing of queries even a legitimate problem

Derive Syntax from Semantics[Sun et al 2016]

bull Query ldquothai food houstonrdquo

bull Clicked sentence

bull Project dependency to the query

A Treebank for Short Texts

bull Given query 119902

bull Given 119902rsquos clicked sentence 119904

bull Parse each 119904

bull Project dependency from 119904 to 119902

bull Aggregate dependencies

Algorithm of Projection

Result Examples

Results

bull Random queries

QueryParser UAS 083 LAS 075Stanford UAS 072 LAS 064

bull Queries with no function words

QueryParser UAS 082 LAS 073Stanford UAS 070 LAS 061

bull Queries with function words

QueryParser UAS 090 LAS 085Stanford UAS 086 LAS 080

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

bull Measuring similarity between two short texts and sentences

bull Basic idea word-by-word comparison using embedding vector

bull Use saliency-weighted semantic graph to computer similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired

Bins of all edges Bins of max edges

119908isin119904119868

119868119863119865(119908) sdot)119904119890119898(119908 119904119904) sdot (1198961 + 1

൰119904119890119898(119908 119904119904) + 1198961 sdot (1 minus 119887 + 119887 sdot|119904119904|119886119907119892119897

Similarity measurement

termShort texts

Inspired by BM25

Semantic similarity

119891119904119905119904(119904119897 119904119904) =

From the Concept View

From the Concept View [Wang et al 2015a]

Co-occurrence Network

Bags of Concepts

Semantic Network

Short Text 1

Short Text 2

Concept Vector 1[(c1 score1) (c2 score2)hellip]

Concept Vector 2[(c1rsquo score1rsquo) (c2rsquo score2rsquo)hellip]

Similarity

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

bull Explicit short text understanding benefit lot of application scenariosbull Adssearch semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

000

100

200

300

400

500

600

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

000

010

020

030

040

050

060

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

Mainline Ads Sidebar Ads

Definition Mining [Hao et al 2016]

bull Definition scenarios search engines QnA etc

bull Why Conceptualization is useful for definition miningbull Examples ldquoWhat is Emphysemardquo

Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

bull This sentence has the form of definitionbull Embedding is helpful to some extent but it also return high similarity

score for (emphysema disease) and (emphysema smoking)

bull Conceptualization can provide strong semanticsbull Contextual embedding can also provide semantic similarity beyond Is-A

Answer 1

Answer 2

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

OfflineOffline OnlineOnline

Original Short textjustin bieber graduates

hellip

Knowledge base

Conceptualiztion

Concept Vector

Entity Extraction

Candidates Generation

Classification amp Ranking

Model LearningModel Learning

Concept Weighting

Model Model NModel i

Concept Model Concept Model

Class 1 Class NClass i

TrainingData

ltMusic Scoregt

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept SpaceArticle titlestagsin this category

119901119894

119901119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

hellip

hellip

120596119894

120596119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

120596119894

120596119895119901119894

119901119895

Precision performance on each category [Wang et al 2014a]

BocSTC LM_ch SVMVSM_cosi

neLM_d Entity_ESA

Movie 071 091 084 081 072 056

Money 097 095 054 057 052 074

Music 097 090 088 073 068 058

TV 096 046 092 056 051 055

0304050607080910

Pre

cisi

on

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledgeltigt in SIGMOD 2008

bull [ Auer et al 2007 ] Soumlren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Boumlrschinger Caumlcilia Zirn and AnasElghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and XindongWu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny BoyesBraem Basic objects in natural categories Cognitive psychology 8(3)382ndash439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny ZhuToward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 98: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Syntactic Parsing based on HM

bull Information is incompletebull Preposition and other function words

bull Within a noun compound el capitan macbook pro

bull Why not train a parser for web queries

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges Short Texts Lack Grammatical Signalsbull Lack function words word order

bull ldquotoys queriesrdquo has ambiguous intent

bull ldquodistance earth moonrdquo has clear intentbull many equivalent forms ldquoearth moon distancerdquo ldquoearth

distance moonrdquo hellip

Challenges Syntactic Parsing of Queries

bullNo standard

bullNo ground-truth

Why is syntactic parsing of queries even a legitimate problem

Derive Syntax from Semantics[Sun et al 2016]

bull Query ldquothai food houstonrdquo

bull Clicked sentence

bull Project dependency to the query

A Treebank for Short Texts

bull Given query 119902

bull Given 119902rsquos clicked sentence 119904

bull Parse each 119904

bull Project dependency from 119904 to 119902

bull Aggregate dependencies

Algorithm of Projection

Result Examples

Results

bull Random queries

QueryParser UAS 083 LAS 075Stanford UAS 072 LAS 064

bull Queries with no function words

QueryParser UAS 082 LAS 073Stanford UAS 070 LAS 061

bull Queries with function words

QueryParser UAS 090 LAS 085Stanford UAS 086 LAS 080

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

bull Measuring similarity between two short texts and sentences

bull Basic idea word-by-word comparison using embedding vector

bull Use saliency-weighted semantic graph to computer similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired

Bins of all edges Bins of max edges

119908isin119904119868

119868119863119865(119908) sdot)119904119890119898(119908 119904119904) sdot (1198961 + 1

൰119904119890119898(119908 119904119904) + 1198961 sdot (1 minus 119887 + 119887 sdot|119904119904|119886119907119892119897

Similarity measurement

termShort texts

Inspired by BM25

Semantic similarity

119891119904119905119904(119904119897 119904119904) =

From the Concept View

From the Concept View [Wang et al 2015a]

Co-occurrence Network

Bags of Concepts

Semantic Network

Short Text 1

Short Text 2

Concept Vector 1[(c1 score1) (c2 score2)hellip]

Concept Vector 2[(c1rsquo score1rsquo) (c2rsquo score2rsquo)hellip]

Similarity

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

bull Explicit short text understanding benefit lot of application scenariosbull Adssearch semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

000

100

200

300

400

500

600

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

000

010

020

030

040

050

060

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

Mainline Ads Sidebar Ads

Definition Mining [Hao et al 2016]

• Definition scenarios: search engines, QnA, etc.

• Why conceptualization is useful for definition mining. Example: "What is Emphysema?"

  Answer 1: "Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year."

  Answer 2: "Emphysema is an incurable, progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing."

• These sentences have the form of a definition. Embedding is helpful to some extent, but it also returns a high similarity score for both (emphysema, disease) and (emphysema, smoking).

• Conceptualization can provide strong semantics; contextual embeddings can also provide semantic similarity beyond is-A.
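
To make the contrast concrete, here is a minimal sketch with a hand-made isA table standing in for a real taxonomy such as Probase; this is only an illustration, not the ranking model of [Hao et al 2016].

# Tiny, hypothetical isA table.
ISA = {"emphysema": {"disease", "lung disease"}}

def is_a(entity, concept):
    """True if the taxonomy records entity isA concept."""
    return concept.lower() in ISA.get(entity.lower(), set())

# Embeddings may score (emphysema, smoking) almost as high as
# (emphysema, disease); the isA check separates the two cleanly.
print(is_a("emphysema", "disease"))   # True  -> supports a definition
print(is_a("emphysema", "smoking"))   # False -> related, but not a hypernym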

Definition Mining [Hao et al 2016]

Concept-based Short Text Classification and Ranking [Wang et al 2014a]

[System architecture, flattened from the figure]

• Offline: training data for Class 1 … Class i … Class N → concept weighting and model learning → one concept model per class (Model 1 … Model i … Model N)

• Online: original short text (e.g., "justin bieber graduates") → entity extraction → conceptualization against the knowledge base → concept vector → candidate generation → classification & ranking → output such as <Music, Score>
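
A minimal sketch of the online scoring step under simple assumptions: each class is represented by a weighted concept vector learned offline, and the short text's concept vector is scored against every class by a dot product. This is an illustration, not the paper's exact model.

def classify(concept_vector, class_models):
    """Score a short text's concept vector against per-class concept models."""
    scores = {cls: sum(w * model.get(c, 0.0) for c, w in concept_vector.items())
              for cls, model in class_models.items()}
    return max(scores.items(), key=lambda kv: kv[1])

# Hypothetical class models and conceptualization output for
# "justin bieber graduates".
class_models = {
    "Music": {"singer": 0.9, "celebrity": 0.6, "album": 0.7},
    "TV":    {"tv show": 0.8, "celebrity": 0.5, "channel": 0.6},
}
cv = {"singer": 0.8, "celebrity": 0.5}
print(classify(cv, class_models))   # ('Music', 1.02) -> i.e. <Music, Score>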

Concept-based Short Text Classification and Ranking [Wang et al 2014a]

[Figure: a category (e.g., TV) is mapped into the concept space using the article titles/tags in that category, yielding concept points p_i, p_j.]

[Figure: several categories (TV, Music, Movie, …) are represented in the concept space with concept weights ω_i, ω_j.]

[Figure: an incoming query is conceptualized into the same concept space and matched against the weighted category concepts (p_i, p_j vs. ω_i, ω_j).]

Precision performance on each category [Wang et al 2014a]

Category | BocSTC | LM_ch | SVM  | VSM_cosine | LM_d | Entity_ESA
Movie    | 0.71   | 0.91  | 0.84 | 0.81       | 0.72 | 0.56
Money    | 0.97   | 0.95  | 0.54 | 0.57       | 0.52 | 0.74
Music    | 0.97   | 0.90  | 0.88 | 0.73       | 0.68 | 0.58
TV       | 0.96   | 0.46  | 0.92 | 0.56       | 0.51 | 0.55

[Shown in the slide as a precision bar chart with the y-axis ranging from 0.3 to 1.0.]

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

• [Stark et al 1998] Michael M. Stark and Richard F. Riesenfeld. WordNet: An Electronic Lexical Database. In Proceedings of the 11th Eurographics Workshop on Rendering, 1998.

• [Banko et al 2007] Michele Banko, Michael J. Cafarella, Stephen Soderland, Matt Broadhead, and Oren Etzioni. Open Information Extraction from the Web. In IJCAI, 2007.

• [Etzioni et al 2011] Oren Etzioni, Anthony Fader, Janara Christensen, Stephen Soderland, and Mausam. Open Information Extraction: The Second Generation. In IJCAI, vol. 11, pp. 3-10, 2011.

• [Carlson et al 2010] A. Carlson, J. Betteridge, B. Kisiel, B. Settles, E. R. Hruschka Jr., and T. M. Mitchell. Toward an Architecture for Never-Ending Language Learning. In Proceedings of the Conference on Artificial Intelligence (AAAI), 2010.

• [Wu et al 2012] Wentao Wu, Hongsong Li, Haixun Wang, and Kenny Zhu. Probase: A Probabilistic Taxonomy for Text Understanding. In ACM International Conference on Management of Data (SIGMOD), May 2012.

• [Bollacker et al 2008] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In SIGMOD, 2008.

• [Auer et al 2007] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary G. Ives. DBpedia: A Nucleus for a Web of Open Data. In ISWC/ASWC, 2007.

References

• [Suchanek et al 2007] Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. Yago: a core of semantic knowledge. In WWW, 2007.

• [Wu et al 2015] Sen Wu, Ce Zhang, Christopher De Sa, Jaeho Shin, Feiran Wang, and C. Ré. Incremental Knowledge Base Construction Using DeepDive. In VLDB, 2015.

• [Navigli et al 2012] R. Navigli and S. Ponzetto. BabelNet: The Automatic Construction, Evaluation and Application of a Wide-Coverage Multilingual Semantic Network. Artificial Intelligence, 2012.

• [Nastase et al 2010] Vivi Nastase, Michael Strube, Benjamin Börschinger, Cäcilia Zirn, and Anas Elghafari. WikiNet: A very large scale multi-lingual concept network. In LREC, 2010.

• [Speer et al 2013] Robert Speer and Catherine Havasi. ConceptNet 5: A large semantic network for relational knowledge. In The People's Web Meets NLP, Springer Berlin Heidelberg, 2013.

• [Hua et al 2016] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng, and Xiaofang Zhou. Understand Short Texts by Harvesting and Analyzing Semantic Knowledge. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2016.

• [Hua et al 2015] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng, and Xiaofang Zhou. Short Text Understanding Through Lexical-Semantic Analysis. In International Conference on Data Engineering (ICDE), April 2015.

References

• [Li et al 2013] Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang, and Xindong Wu. Computing term similarity by large probabilistic isA knowledge. In ACM International Conference on Information and Knowledge Management (CIKM), 2013.

• [Li et al 2015] Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang, Xue-Gang Hu, and Xindong Wu. A Large Probabilistic Semantic Network based Approach to Compute Term Similarity. IEEE Transactions on Knowledge and Data Engineering (TKDE), 27(10): 2604-2617, 2015.

• [Rosch et al 1976] Eleanor Rosch, Carolyn B. Mervis, Wayne D. Gray, David M. Johnson, and Penny Boyes-Braem. Basic objects in natural categories. Cognitive Psychology, 8(3): 382-439, 1976.

• [Manning and Schutze 1999] Christopher D. Manning and Hinrich Schütze. Foundations of Statistical Natural Language Processing. MIT Press, 1999.

• [Wang et al 2015b] Zhongyuan Wang, Kejun Zhao, Haixun Wang, Xiaofeng Meng, and Ji-Rong Wen. Query Understanding through Knowledge-Based Conceptualization. In IJCAI, July 2015.

• [Bergsma et al 2007] Shane Bergsma and Qin Iris Wang. Learning Noun Phrase Query Segmentation. In EMNLP-CoNLL, 2007: 819-826.

• [Tan et al 2008] Bin Tan and Fuchun Peng. Unsupervised query segmentation using generative language models and Wikipedia. In WWW, 2008: 347-356.

References

• [Li et al 2011] Yanen Li, Bo-June Paul Hsu, ChengXiang Zhai, and Kuansan Wang. Unsupervised query segmentation using clickthrough for information retrieval. In SIGIR, 2011: 285-294.

• [Guo et al 2009] Jiafeng Guo, Gu Xu, Xueqi Cheng, and Hang Li. Named entity recognition in query. In SIGIR, 2009: 267-274.

• [Pantel et al 2012] Patrick Pantel, Thomas Lin, and Michael Gamon. Mining Entity Types from Query Logs via User Intent Modeling. In ACL, 2012: 563-571.

• [Joshi et al 2014] Mandar Joshi, Uma Sawant, and Soumen Chakrabarti. Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries. In EMNLP, 2014: 1104-1114.

• [Sawant et al 2013] Uma Sawant and Soumen Chakrabarti. Learning joint query interpretation and response ranking. In WWW, 2013: 1099-1110.

• [Wang et al 2014b] Zhongyuan Wang, Haixun Wang, and Zhirui Hu. Head, Modifier, and Constraint Detection in Short Texts. In International Conference on Data Engineering (ICDE), 2014.

• [Sun et al 2016] Xiangyan Sun, Haixun Wang, Yanghua Xiao, and Zhongyuan Wang. Syntactic Parsing of Web Queries. In EMNLP, 2016.

References

• [Kenter and Rijke 2015] Tom Kenter and Maarten de Rijke. Short text similarity with word embeddings. In CIKM, 2015.

• [Wang et al 2015a] Zhongyuan Wang, Haixun Wang, Ji-Rong Wen, and Yanghua Xiao. An Inference Approach to Basic Level of Categorization. In CIKM, October 2015.

• [Hao et al 2016] Zehui Hao, Zhongyuan Wang, Xiaofeng Meng, and Jun Yan. Combining Language Model with Conceptualization for Definition Ranking. MSR Technical Report, 2016.

• [Wang et al 2014a] Fang Wang, Zhongyuan Wang, Zhoujun Li, and Ji-Rong Wen. Concept-based Short Text Classification and Ranking. In CIKM, 2014.

• [Wang et al 2012a] Jingjing Wang, Haixun Wang, Zhongyuan Wang, and Kenny Zhu. Understanding Tables on the Web. In International Conference on Conceptual Modeling, October 2012.

• [Wang et al 2012b] Yue Wang, Hongsong Li, Haixun Wang, and Kenny Zhu. Toward Topic Search on the Web. In International Conference on Conceptual Modeling, October 2012.

Page 99: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Syntactic Parsing of Short Texts[Sun et al EMNLP 2016]

bull Syntactic structures are valuable for short text understanding

bull Examples

Challenges Short Texts Lack Grammatical Signalsbull Lack function words word order

bull ldquotoys queriesrdquo has ambiguous intent

bull ldquodistance earth moonrdquo has clear intentbull many equivalent forms ldquoearth moon distancerdquo ldquoearth

distance moonrdquo hellip

Challenges Syntactic Parsing of Queries

bullNo standard

bullNo ground-truth

Why is syntactic parsing of queries even a legitimate problem

Derive Syntax from Semantics[Sun et al 2016]

bull Query ldquothai food houstonrdquo

bull Clicked sentence

bull Project dependency to the query

A Treebank for Short Texts

bull Given query 119902

bull Given 119902rsquos clicked sentence 119904

bull Parse each 119904

bull Project dependency from 119904 to 119902

bull Aggregate dependencies

Algorithm of Projection

Result Examples

Results

bull Random queries

QueryParser UAS 083 LAS 075Stanford UAS 072 LAS 064

bull Queries with no function words

QueryParser UAS 082 LAS 073Stanford UAS 070 LAS 061

bull Queries with function words

QueryParser UAS 090 LAS 085Stanford UAS 086 LAS 080

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

bull Measuring similarity between two short texts and sentences

bull Basic idea word-by-word comparison using embedding vector

bull Use saliency-weighted semantic graph to computer similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired

Bins of all edges Bins of max edges

119908isin119904119868

119868119863119865(119908) sdot)119904119890119898(119908 119904119904) sdot (1198961 + 1

൰119904119890119898(119908 119904119904) + 1198961 sdot (1 minus 119887 + 119887 sdot|119904119904|119886119907119892119897

Similarity measurement

termShort texts

Inspired by BM25

Semantic similarity

119891119904119905119904(119904119897 119904119904) =

From the Concept View

From the Concept View [Wang et al 2015a]

Co-occurrence Network

Bags of Concepts

Semantic Network

Short Text 1

Short Text 2

Concept Vector 1[(c1 score1) (c2 score2)hellip]

Concept Vector 2[(c1rsquo score1rsquo) (c2rsquo score2rsquo)hellip]

Similarity

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

bull Explicit short text understanding benefit lot of application scenariosbull Adssearch semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

000

100

200

300

400

500

600

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

000

010

020

030

040

050

060

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

Mainline Ads Sidebar Ads

Definition Mining [Hao et al 2016]

bull Definition scenarios search engines QnA etc

bull Why Conceptualization is useful for definition miningbull Examples ldquoWhat is Emphysemardquo

Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

bull This sentence has the form of definitionbull Embedding is helpful to some extent but it also return high similarity

score for (emphysema disease) and (emphysema smoking)

bull Conceptualization can provide strong semanticsbull Contextual embedding can also provide semantic similarity beyond Is-A

Answer 1

Answer 2

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

OfflineOffline OnlineOnline

Original Short textjustin bieber graduates

hellip

Knowledge base

Conceptualiztion

Concept Vector

Entity Extraction

Candidates Generation

Classification amp Ranking

Model LearningModel Learning

Concept Weighting

Model Model NModel i

Concept Model Concept Model

Class 1 Class NClass i

TrainingData

ltMusic Scoregt

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept SpaceArticle titlestagsin this category

119901119894

119901119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

hellip

hellip

120596119894

120596119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

120596119894

120596119895119901119894

119901119895

Precision performance on each category [Wang et al 2014a]

BocSTC LM_ch SVMVSM_cosi

neLM_d Entity_ESA

Movie 071 091 084 081 072 056

Money 097 095 054 057 052 074

Music 097 090 088 073 068 058

TV 096 046 092 056 051 055

0304050607080910

Pre

cisi

on

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledgeltigt in SIGMOD 2008

bull [ Auer et al 2007 ] Soumlren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Boumlrschinger Caumlcilia Zirn and AnasElghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and XindongWu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny BoyesBraem Basic objects in natural categories Cognitive psychology 8(3)382ndash439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny ZhuToward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 100: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Challenges Short Texts Lack Grammatical Signalsbull Lack function words word order

bull ldquotoys queriesrdquo has ambiguous intent

bull ldquodistance earth moonrdquo has clear intentbull many equivalent forms ldquoearth moon distancerdquo ldquoearth

distance moonrdquo hellip

Challenges Syntactic Parsing of Queries

bullNo standard

bullNo ground-truth

Why is syntactic parsing of queries even a legitimate problem

Derive Syntax from Semantics[Sun et al 2016]

bull Query ldquothai food houstonrdquo

bull Clicked sentence

bull Project dependency to the query

A Treebank for Short Texts

bull Given query 119902

bull Given 119902rsquos clicked sentence 119904

bull Parse each 119904

bull Project dependency from 119904 to 119902

bull Aggregate dependencies

Algorithm of Projection

Result Examples

Results

bull Random queries

QueryParser UAS 083 LAS 075Stanford UAS 072 LAS 064

bull Queries with no function words

QueryParser UAS 082 LAS 073Stanford UAS 070 LAS 061

bull Queries with function words

QueryParser UAS 090 LAS 085Stanford UAS 086 LAS 080

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

bull Measuring similarity between two short texts and sentences

bull Basic idea word-by-word comparison using embedding vector

bull Use saliency-weighted semantic graph to computer similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired

Bins of all edges Bins of max edges

119908isin119904119868

119868119863119865(119908) sdot)119904119890119898(119908 119904119904) sdot (1198961 + 1

൰119904119890119898(119908 119904119904) + 1198961 sdot (1 minus 119887 + 119887 sdot|119904119904|119886119907119892119897

Similarity measurement

termShort texts

Inspired by BM25

Semantic similarity

119891119904119905119904(119904119897 119904119904) =

From the Concept View

From the Concept View [Wang et al 2015a]

Co-occurrence Network

Bags of Concepts

Semantic Network

Short Text 1

Short Text 2

Concept Vector 1[(c1 score1) (c2 score2)hellip]

Concept Vector 2[(c1rsquo score1rsquo) (c2rsquo score2rsquo)hellip]

Similarity

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

bull Explicit short text understanding benefit lot of application scenariosbull Adssearch semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

000

100

200

300

400

500

600

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

000

010

020

030

040

050

060

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

Mainline Ads Sidebar Ads

Definition Mining [Hao et al 2016]

bull Definition scenarios search engines QnA etc

bull Why Conceptualization is useful for definition miningbull Examples ldquoWhat is Emphysemardquo

Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

bull This sentence has the form of definitionbull Embedding is helpful to some extent but it also return high similarity

score for (emphysema disease) and (emphysema smoking)

bull Conceptualization can provide strong semanticsbull Contextual embedding can also provide semantic similarity beyond Is-A

Answer 1

Answer 2

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

OfflineOffline OnlineOnline

Original Short textjustin bieber graduates

hellip

Knowledge base

Conceptualiztion

Concept Vector

Entity Extraction

Candidates Generation

Classification amp Ranking

Model LearningModel Learning

Concept Weighting

Model Model NModel i

Concept Model Concept Model

Class 1 Class NClass i

TrainingData

ltMusic Scoregt

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept SpaceArticle titlestagsin this category

119901119894

119901119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

hellip

hellip

120596119894

120596119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

120596119894

120596119895119901119894

119901119895

Precision performance on each category [Wang et al 2014a]

BocSTC LM_ch SVMVSM_cosi

neLM_d Entity_ESA

Movie 071 091 084 081 072 056

Money 097 095 054 057 052 074

Music 097 090 088 073 068 058

TV 096 046 092 056 051 055

0304050607080910

Pre

cisi

on

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledgeltigt in SIGMOD 2008

bull [ Auer et al 2007 ] Soumlren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Boumlrschinger Caumlcilia Zirn and AnasElghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and XindongWu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny BoyesBraem Basic objects in natural categories Cognitive psychology 8(3)382ndash439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny ZhuToward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 101: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Challenges Syntactic Parsing of Queries

bullNo standard

bullNo ground-truth

Why is syntactic parsing of queries even a legitimate problem

Derive Syntax from Semantics[Sun et al 2016]

bull Query ldquothai food houstonrdquo

bull Clicked sentence

bull Project dependency to the query

A Treebank for Short Texts

bull Given query 119902

bull Given 119902rsquos clicked sentence 119904

bull Parse each 119904

bull Project dependency from 119904 to 119902

bull Aggregate dependencies

Algorithm of Projection

Result Examples

Results

bull Random queries

QueryParser UAS 083 LAS 075Stanford UAS 072 LAS 064

bull Queries with no function words

QueryParser UAS 082 LAS 073Stanford UAS 070 LAS 061

bull Queries with function words

QueryParser UAS 090 LAS 085Stanford UAS 086 LAS 080

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

bull Measuring similarity between two short texts and sentences

bull Basic idea word-by-word comparison using embedding vector

bull Use saliency-weighted semantic graph to computer similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired

Bins of all edges Bins of max edges

119908isin119904119868

119868119863119865(119908) sdot)119904119890119898(119908 119904119904) sdot (1198961 + 1

൰119904119890119898(119908 119904119904) + 1198961 sdot (1 minus 119887 + 119887 sdot|119904119904|119886119907119892119897

Similarity measurement

termShort texts

Inspired by BM25

Semantic similarity

119891119904119905119904(119904119897 119904119904) =

From the Concept View

From the Concept View [Wang et al 2015a]

Co-occurrence Network

Bags of Concepts

Semantic Network

Short Text 1

Short Text 2

Concept Vector 1[(c1 score1) (c2 score2)hellip]

Concept Vector 2[(c1rsquo score1rsquo) (c2rsquo score2rsquo)hellip]

Similarity

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

bull Explicit short text understanding benefit lot of application scenariosbull Adssearch semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

000

100

200

300

400

500

600

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

000

010

020

030

040

050

060

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

Mainline Ads Sidebar Ads

Definition Mining [Hao et al 2016]

bull Definition scenarios search engines QnA etc

bull Why Conceptualization is useful for definition miningbull Examples ldquoWhat is Emphysemardquo

Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

bull This sentence has the form of definitionbull Embedding is helpful to some extent but it also return high similarity

score for (emphysema disease) and (emphysema smoking)

bull Conceptualization can provide strong semanticsbull Contextual embedding can also provide semantic similarity beyond Is-A

Answer 1

Answer 2

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

OfflineOffline OnlineOnline

Original Short textjustin bieber graduates

hellip

Knowledge base

Conceptualiztion

Concept Vector

Entity Extraction

Candidates Generation

Classification amp Ranking

Model LearningModel Learning

Concept Weighting

Model Model NModel i

Concept Model Concept Model

Class 1 Class NClass i

TrainingData

ltMusic Scoregt

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept SpaceArticle titlestagsin this category

119901119894

119901119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

hellip

hellip

120596119894

120596119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

120596119894

120596119895119901119894

119901119895

Precision performance on each category [Wang et al 2014a]

BocSTC LM_ch SVMVSM_cosi

neLM_d Entity_ESA

Movie 071 091 084 081 072 056

Money 097 095 054 057 052 074

Music 097 090 088 073 068 058

TV 096 046 092 056 051 055

0304050607080910

Pre

cisi

on

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledgeltigt in SIGMOD 2008

bull [ Auer et al 2007 ] Soumlren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Boumlrschinger Caumlcilia Zirn and AnasElghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and XindongWu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny BoyesBraem Basic objects in natural categories Cognitive psychology 8(3)382ndash439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny ZhuToward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 102: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Derive Syntax from Semantics[Sun et al 2016]

bull Query ldquothai food houstonrdquo

bull Clicked sentence

bull Project dependency to the query

A Treebank for Short Texts

bull Given query 119902

bull Given 119902rsquos clicked sentence 119904

bull Parse each 119904

bull Project dependency from 119904 to 119902

bull Aggregate dependencies

Algorithm of Projection

Result Examples

Results

bull Random queries

QueryParser UAS 083 LAS 075Stanford UAS 072 LAS 064

bull Queries with no function words

QueryParser UAS 082 LAS 073Stanford UAS 070 LAS 061

bull Queries with function words

QueryParser UAS 090 LAS 085Stanford UAS 086 LAS 080

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

bull Measuring similarity between two short texts and sentences

bull Basic idea word-by-word comparison using embedding vector

bull Use saliency-weighted semantic graph to computer similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired

Bins of all edges Bins of max edges

119908isin119904119868

119868119863119865(119908) sdot)119904119890119898(119908 119904119904) sdot (1198961 + 1

൰119904119890119898(119908 119904119904) + 1198961 sdot (1 minus 119887 + 119887 sdot|119904119904|119886119907119892119897

Similarity measurement

termShort texts

Inspired by BM25

Semantic similarity

119891119904119905119904(119904119897 119904119904) =

From the Concept View

From the Concept View [Wang et al 2015a]

Co-occurrence Network

Bags of Concepts

Semantic Network

Short Text 1

Short Text 2

Concept Vector 1[(c1 score1) (c2 score2)hellip]

Concept Vector 2[(c1rsquo score1rsquo) (c2rsquo score2rsquo)hellip]

Similarity

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

bull Explicit short text understanding benefit lot of application scenariosbull Adssearch semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

000

100

200

300

400

500

600

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

000

010

020

030

040

050

060

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

Mainline Ads Sidebar Ads

Definition Mining [Hao et al 2016]

bull Definition scenarios search engines QnA etc

bull Why Conceptualization is useful for definition miningbull Examples ldquoWhat is Emphysemardquo

Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

bull This sentence has the form of definitionbull Embedding is helpful to some extent but it also return high similarity

score for (emphysema disease) and (emphysema smoking)

bull Conceptualization can provide strong semanticsbull Contextual embedding can also provide semantic similarity beyond Is-A

Answer 1

Answer 2

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

OfflineOffline OnlineOnline

Original Short textjustin bieber graduates

hellip

Knowledge base

Conceptualiztion

Concept Vector

Entity Extraction

Candidates Generation

Classification amp Ranking

Model LearningModel Learning

Concept Weighting

Model Model NModel i

Concept Model Concept Model

Class 1 Class NClass i

TrainingData

ltMusic Scoregt

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept SpaceArticle titlestagsin this category

119901119894

119901119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

hellip

hellip

120596119894

120596119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

120596119894

120596119895119901119894

119901119895

Precision performance on each category [Wang et al 2014a]

BocSTC LM_ch SVMVSM_cosi

neLM_d Entity_ESA

Movie 071 091 084 081 072 056

Money 097 095 054 057 052 074

Music 097 090 088 073 068 058

TV 096 046 092 056 051 055

0304050607080910

Pre

cisi

on

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledgeltigt in SIGMOD 2008

bull [ Auer et al 2007 ] Soumlren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Boumlrschinger Caumlcilia Zirn and AnasElghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and XindongWu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny BoyesBraem Basic objects in natural categories Cognitive psychology 8(3)382ndash439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny ZhuToward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 103: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

A Treebank for Short Texts

bull Given query 119902

bull Given 119902rsquos clicked sentence 119904

bull Parse each 119904

bull Project dependency from 119904 to 119902

bull Aggregate dependencies

Algorithm of Projection

Result Examples

Results

bull Random queries

QueryParser UAS 083 LAS 075Stanford UAS 072 LAS 064

bull Queries with no function words

QueryParser UAS 082 LAS 073Stanford UAS 070 LAS 061

bull Queries with function words

QueryParser UAS 090 LAS 085Stanford UAS 086 LAS 080

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

bull Measuring similarity between two short texts and sentences

bull Basic idea word-by-word comparison using embedding vector

bull Use saliency-weighted semantic graph to computer similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired

Bins of all edges Bins of max edges

119908isin119904119868

119868119863119865(119908) sdot)119904119890119898(119908 119904119904) sdot (1198961 + 1

൰119904119890119898(119908 119904119904) + 1198961 sdot (1 minus 119887 + 119887 sdot|119904119904|119886119907119892119897

Similarity measurement

termShort texts

Inspired by BM25

Semantic similarity

119891119904119905119904(119904119897 119904119904) =

From the Concept View

From the Concept View [Wang et al 2015a]

Co-occurrence Network

Bags of Concepts

Semantic Network

Short Text 1

Short Text 2

Concept Vector 1[(c1 score1) (c2 score2)hellip]

Concept Vector 2[(c1rsquo score1rsquo) (c2rsquo score2rsquo)hellip]

Similarity

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

bull Explicit short text understanding benefit lot of application scenariosbull Adssearch semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

000

100

200

300

400

500

600

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

000

010

020

030

040

050

060

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

Mainline Ads Sidebar Ads

Definition Mining [Hao et al 2016]

bull Definition scenarios search engines QnA etc

bull Why Conceptualization is useful for definition miningbull Examples ldquoWhat is Emphysemardquo

Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

bull This sentence has the form of definitionbull Embedding is helpful to some extent but it also return high similarity

score for (emphysema disease) and (emphysema smoking)

bull Conceptualization can provide strong semanticsbull Contextual embedding can also provide semantic similarity beyond Is-A

Answer 1

Answer 2

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

OfflineOffline OnlineOnline

Original Short textjustin bieber graduates

hellip

Knowledge base

Conceptualiztion

Concept Vector

Entity Extraction

Candidates Generation

Classification amp Ranking

Model LearningModel Learning

Concept Weighting

Model Model NModel i

Concept Model Concept Model

Class 1 Class NClass i

TrainingData

ltMusic Scoregt

Concept-based Short Text Classification and Ranking [Wang et al 2014a]

[Figures across three slides: each category (e.g. TV, Music, Movie) is represented in a concept space built from the article titles/tags of that category, with weighted concept dimensions (p_i, p_j and ω_i, ω_j in the figures); an incoming query is mapped into the same concept space and compared against the category representations.]

Precision performance on each category [Wang et al 2014a]

Category   BocSTC   LM_ch   SVM    VSM_cosine   LM_d   Entity_ESA
Movie      0.71     0.91    0.84   0.81         0.72   0.56
Money      0.97     0.95    0.54   0.57         0.52   0.74
Music      0.97     0.90    0.88   0.73         0.68   0.58
TV         0.96     0.46    0.92   0.56         0.51   0.55

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

• [Stark et al 1998] Michael M Stark and Richard F Riesenfeld. WordNet: An Electronic Lexical Database. Proceedings of 11th Eurographics Workshop on Rendering, 1998.

• [Banko et al 2007] Michele Banko, Michael J Cafarella, Stephen Soderland, Matt Broadhead and Oren Etzioni. Open Information Extraction from the Web. In IJCAI 2007.

• [Etzioni et al 2011] Oren Etzioni, Anthony Fader, Janara Christensen, Stephen Soderland and Mausam Mausam. Open Information Extraction: The Second Generation. In IJCAI, vol. 11, pp. 3-10, 2011.

• [Carlson et al 2010] A Carlson, J Betteridge, B Kisiel, B Settles, ER Hruschka Jr and TM Mitchell. Toward an Architecture for Never-Ending Language Learning. In Proceedings of the Conference on Artificial Intelligence (AAAI), 2010.

• [Wu et al 2012] Wentao Wu, Hongsong Li, Haixun Wang and Kenny Zhu. Probase: A Probabilistic Taxonomy for Text Understanding. In ACM International Conference on Management of Data (SIGMOD), May 2012.

• [Bollacker et al 2008] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge and Jamine Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In SIGMOD 2008.

• [Auer et al 2007] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak and Zachary G Ives. DBpedia: A Nucleus for a Web of Open Data. In ISWC/ASWC 2007.

References

• [Suchanek et al 2007] Fabian M Suchanek, Gjergji Kasneci and Gerhard Weikum. Yago: a core of semantic knowledge. In WWW 2007.

• [Wu et al 2015] Sen Wu, Ce Zhang, Christopher De Sa, Jaeho Shin, Feiran Wang and C. Ré. Incremental Knowledge Base Construction Using DeepDive. In VLDB 2015.

• [Navigli et al 2012] R Navigli and S Ponzetto. BabelNet: The Automatic Construction, Evaluation and Application of a Wide-Coverage Multilingual Semantic Network. In Artificial Intelligence, 2012.

• [Nastase et al 2010] Vivi Nastase, Michael Strube, Benjamin Börschinger, Cäcilia Zirn and Anas Elghafari. WikiNet: A very large scale multi-lingual concept network. In LREC 2010.

• [Speer et al 2013] Robert Speer and Catherine Havasi. ConceptNet 5: A large semantic network for relational knowledge. The People's Web Meets NLP, Springer Berlin Heidelberg, 2013.

• [Hua et al 2016] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng and Xiaofang Zhou. Understand Short Texts by Harvesting and Analyzing Semantic Knowledge. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2016.

• [Hua et al 2015] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng and Xiaofang Zhou. Short Text Understanding Through Lexical-Semantic Analysis. In International Conference on Data Engineering (ICDE), April 2015.

References

• [Li et al 2013] Peipei Li, Haixun Wang, Kenny Q Zhu, Zhongyuan Wang and Xindong Wu. Computing term similarity by large probabilistic isA knowledge. In ACM International Conference on Information and Knowledge Management (CIKM), 2013.

• [Li et al 2015] Peipei Li, Haixun Wang, Kenny Q Zhu, Zhongyuan Wang, Xue-Gang Hu and Xindong Wu. A Large Probabilistic Semantic Network based Approach to Compute Term Similarity. In IEEE Transactions on Knowledge and Data Engineering (TKDE), 27(10): 2604-2617, 2015.

• [Rosch et al 1976] Eleanor Rosch, Carolyn B Mervis, Wayne D Gray, David M Johnson and Penny Boyes-Braem. Basic objects in natural categories. Cognitive Psychology, 8(3): 382-439, 1976.

• [Manning and Schutze 1999] Christopher D Manning and Hinrich Schutze. Foundations of Statistical Natural Language Processing. MIT Press, 1999.

• [Wang et al 2015b] Zhongyuan Wang, Kejun Zhao, Haixun Wang, Xiaofeng Meng and Ji-Rong Wen. Query Understanding through Knowledge-Based Conceptualization. In IJCAI, July 2015.

• [Bergsma et al 2007] Shane Bergsma and Qin Iris Wang. Learning Noun Phrase Query Segmentation. In EMNLP-CoNLL 2007: 819-826.

• [Tan et al 2008] Bin Tan and Fuchun Peng. Unsupervised query segmentation using generative language models and Wikipedia. In WWW 2008: 347-356.

References

• [Li et al 2011] Yanen Li, Bo-June Paul Hsu, ChengXiang Zhai and Kuansan Wang. Unsupervised query segmentation using clickthrough for information retrieval. In SIGIR 2011: 285-294.

• [Guo et al 2009] Jiafeng Guo, Gu Xu, Xueqi Cheng and Hang Li. Named entity recognition in query. In SIGIR 2009: 267-274.

• [Pantel et al 2012] Patrick Pantel, Thomas Lin and Michael Gamon. Mining Entity Types from Query Logs via User Intent Modeling. In ACL 2012: 563-571.

• [Joshi et al 2014] Mandar Joshi, Uma Sawant and Soumen Chakrabarti. Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries. In EMNLP 2014: 1104-1114.

• [Sawant et al 2013] Uma Sawant and Soumen Chakrabarti. Learning joint query interpretation and response ranking. In WWW 2013: 1099-1110.

• [Wang et al 2014b] Zhongyuan Wang, Haixun Wang and Zhirui Hu. Head, Modifier, and Constraint Detection in Short Texts. In International Conference on Data Engineering (ICDE), 2014.

• [Sun et al 2016] Xiangyan Sun, Haixun Wang, Yanghua Xiao and Zhongyuan Wang. Syntactic Parsing of Web Queries. In EMNLP 2016.

References

• [Kenter and Rijke 2015] Tom Kenter and Maarten de Rijke. Short text similarity with word embeddings. In CIKM 2015.

• [Wang et al 2015a] Zhongyuan Wang, Haixun Wang, Ji-Rong Wen and Yanghua Xiao. An Inference Approach to Basic Level of Categorization. In CIKM, October 2015.

• [Hao et al 2016] Zehui Hao, Zhongyuan Wang, Xiaofeng Meng and Jun Yan. Combining Language Model with Conceptualization for Definition Ranking. MSR Technical Report, 2016.

• [Wang et al 2014a] Fang Wang, Zhongyuan Wang, Zhoujun Li and Ji-Rong Wen. Concept-based Short Text Classification and Ranking. In CIKM 2014.

• [Wang et al 2012a] Jingjing Wang, Haixun Wang, Zhongyuan Wang and Kenny Zhu. Understanding Tables on the Web. In International Conference on Conceptual Modeling, October 2012.

• [Wang et al 2012b] Yue Wang, Hongsong Li, Haixun Wang and Kenny Zhu. Toward Topic Search on the Web. In International Conference on Conceptual Modeling, October 2012.


Algorithm of Projection

Result Examples

Results

• Random queries: QueryParser UAS 0.83, LAS 0.75; Stanford UAS 0.72, LAS 0.64

• Queries with no function words: QueryParser UAS 0.82, LAS 0.73; Stanford UAS 0.70, LAS 0.61

• Queries with function words: QueryParser UAS 0.90, LAS 0.85; Stanford UAS 0.86, LAS 0.80
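For reference, UAS and LAS as reported above can be computed as follows (standard definitions, not code from the QueryParser work); gold and predicted parses are given as one (head index, dependency label) pair per token:

```python
def uas_las(gold, pred):
    """UAS: fraction of tokens with the correct head; LAS: correct head and label."""
    assert len(gold) == len(pred)
    n = len(gold)
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / n
    las = sum(g == p for g, p in zip(gold, pred)) / n
    return uas, las

# Toy 3-token query: heads all correct, one label wrong -> UAS 1.0, LAS 0.67
gold = [(2, "nmod"), (2, "compound"), (0, "root")]
pred = [(2, "amod"), (2, "compound"), (0, "root")]
print(uas_las(gold, pred))
```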

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

bull Measuring similarity between two short texts and sentences

bull Basic idea word-by-word comparison using embedding vector

bull Use saliency-weighted semantic graph to computer similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired

Bins of all edges Bins of max edges

119908isin119904119868

119868119863119865(119908) sdot)119904119890119898(119908 119904119904) sdot (1198961 + 1

൰119904119890119898(119908 119904119904) + 1198961 sdot (1 minus 119887 + 119887 sdot|119904119904|119886119907119892119897

Similarity measurement

termShort texts

Inspired by BM25

Semantic similarity

119891119904119905119904(119904119897 119904119904) =

From the Concept View

From the Concept View [Wang et al 2015a]

Co-occurrence Network

Bags of Concepts

Semantic Network

Short Text 1

Short Text 2

Concept Vector 1[(c1 score1) (c2 score2)hellip]

Concept Vector 2[(c1rsquo score1rsquo) (c2rsquo score2rsquo)hellip]

Similarity

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

bull Explicit short text understanding benefit lot of application scenariosbull Adssearch semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

000

100

200

300

400

500

600

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

000

010

020

030

040

050

060

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

Mainline Ads Sidebar Ads

Definition Mining [Hao et al 2016]

bull Definition scenarios search engines QnA etc

bull Why Conceptualization is useful for definition miningbull Examples ldquoWhat is Emphysemardquo

Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

bull This sentence has the form of definitionbull Embedding is helpful to some extent but it also return high similarity

score for (emphysema disease) and (emphysema smoking)

bull Conceptualization can provide strong semanticsbull Contextual embedding can also provide semantic similarity beyond Is-A

Answer 1

Answer 2

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

OfflineOffline OnlineOnline

Original Short textjustin bieber graduates

hellip

Knowledge base

Conceptualiztion

Concept Vector

Entity Extraction

Candidates Generation

Classification amp Ranking

Model LearningModel Learning

Concept Weighting

Model Model NModel i

Concept Model Concept Model

Class 1 Class NClass i

TrainingData

ltMusic Scoregt

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept SpaceArticle titlestagsin this category

119901119894

119901119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

hellip

hellip

120596119894

120596119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

120596119894

120596119895119901119894

119901119895

Precision performance on each category [Wang et al 2014a]

BocSTC LM_ch SVMVSM_cosi

neLM_d Entity_ESA

Movie 071 091 084 081 072 056

Money 097 095 054 057 052 074

Music 097 090 088 073 068 058

TV 096 046 092 056 051 055

0304050607080910

Pre

cisi

on

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledgeltigt in SIGMOD 2008

bull [ Auer et al 2007 ] Soumlren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Boumlrschinger Caumlcilia Zirn and AnasElghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and XindongWu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny BoyesBraem Basic objects in natural categories Cognitive psychology 8(3)382ndash439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny ZhuToward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 105: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Result Examples

Results

bull Random queries

QueryParser UAS 083 LAS 075Stanford UAS 072 LAS 064

bull Queries with no function words

QueryParser UAS 082 LAS 073Stanford UAS 070 LAS 061

bull Queries with function words

QueryParser UAS 090 LAS 085Stanford UAS 086 LAS 080

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

bull Measuring similarity between two short texts and sentences

bull Basic idea word-by-word comparison using embedding vector

bull Use saliency-weighted semantic graph to computer similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired

Bins of all edges Bins of max edges

119908isin119904119868

119868119863119865(119908) sdot)119904119890119898(119908 119904119904) sdot (1198961 + 1

൰119904119890119898(119908 119904119904) + 1198961 sdot (1 minus 119887 + 119887 sdot|119904119904|119886119907119892119897

Similarity measurement

termShort texts

Inspired by BM25

Semantic similarity

119891119904119905119904(119904119897 119904119904) =

From the Concept View

From the Concept View [Wang et al 2015a]

Co-occurrence Network

Bags of Concepts

Semantic Network

Short Text 1

Short Text 2

Concept Vector 1[(c1 score1) (c2 score2)hellip]

Concept Vector 2[(c1rsquo score1rsquo) (c2rsquo score2rsquo)hellip]

Similarity

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

bull Explicit short text understanding benefit lot of application scenariosbull Adssearch semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

000

100

200

300

400

500

600

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

000

010

020

030

040

050

060

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

Mainline Ads Sidebar Ads

Definition Mining [Hao et al 2016]

bull Definition scenarios search engines QnA etc

bull Why Conceptualization is useful for definition miningbull Examples ldquoWhat is Emphysemardquo

Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

bull This sentence has the form of definitionbull Embedding is helpful to some extent but it also return high similarity

score for (emphysema disease) and (emphysema smoking)

bull Conceptualization can provide strong semanticsbull Contextual embedding can also provide semantic similarity beyond Is-A

Answer 1

Answer 2

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

OfflineOffline OnlineOnline

Original Short textjustin bieber graduates

hellip

Knowledge base

Conceptualiztion

Concept Vector

Entity Extraction

Candidates Generation

Classification amp Ranking

Model LearningModel Learning

Concept Weighting

Model Model NModel i

Concept Model Concept Model

Class 1 Class NClass i

TrainingData

ltMusic Scoregt

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept SpaceArticle titlestagsin this category

119901119894

119901119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

hellip

hellip

120596119894

120596119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

120596119894

120596119895119901119894

119901119895

Precision performance on each category [Wang et al 2014a]

BocSTC LM_ch SVMVSM_cosi

neLM_d Entity_ESA

Movie 071 091 084 081 072 056

Money 097 095 054 057 052 074

Music 097 090 088 073 068 058

TV 096 046 092 056 051 055

0304050607080910

Pre

cisi

on

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledgeltigt in SIGMOD 2008

bull [ Auer et al 2007 ] Soumlren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Boumlrschinger Caumlcilia Zirn and AnasElghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and XindongWu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny BoyesBraem Basic objects in natural categories Cognitive psychology 8(3)382ndash439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny ZhuToward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 106: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Results

bull Random queries

QueryParser UAS 083 LAS 075Stanford UAS 072 LAS 064

bull Queries with no function words

QueryParser UAS 082 LAS 073Stanford UAS 070 LAS 061

bull Queries with function words

QueryParser UAS 090 LAS 085Stanford UAS 086 LAS 080

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

bull Measuring similarity between two short texts and sentences

bull Basic idea word-by-word comparison using embedding vector

bull Use saliency-weighted semantic graph to computer similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired

Bins of all edges Bins of max edges

119908isin119904119868

119868119863119865(119908) sdot)119904119890119898(119908 119904119904) sdot (1198961 + 1

൰119904119890119898(119908 119904119904) + 1198961 sdot (1 minus 119887 + 119887 sdot|119904119904|119886119907119892119897

Similarity measurement

termShort texts

Inspired by BM25

Semantic similarity

119891119904119905119904(119904119897 119904119904) =

From the Concept View

From the Concept View [Wang et al 2015a]

Co-occurrence Network

Bags of Concepts

Semantic Network

Short Text 1

Short Text 2

Concept Vector 1[(c1 score1) (c2 score2)hellip]

Concept Vector 2[(c1rsquo score1rsquo) (c2rsquo score2rsquo)hellip]

Similarity

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

bull Explicit short text understanding benefit lot of application scenariosbull Adssearch semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

000

100

200

300

400

500

600

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

000

010

020

030

040

050

060

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

Mainline Ads Sidebar Ads

Definition Mining [Hao et al 2016]

bull Definition scenarios search engines QnA etc

bull Why Conceptualization is useful for definition miningbull Examples ldquoWhat is Emphysemardquo

Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

bull This sentence has the form of definitionbull Embedding is helpful to some extent but it also return high similarity

score for (emphysema disease) and (emphysema smoking)

bull Conceptualization can provide strong semanticsbull Contextual embedding can also provide semantic similarity beyond Is-A

Answer 1

Answer 2

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

OfflineOffline OnlineOnline

Original Short textjustin bieber graduates

hellip

Knowledge base

Conceptualiztion

Concept Vector

Entity Extraction

Candidates Generation

Classification amp Ranking

Model LearningModel Learning

Concept Weighting

Model Model NModel i

Concept Model Concept Model

Class 1 Class NClass i

TrainingData

ltMusic Scoregt

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept SpaceArticle titlestagsin this category

119901119894

119901119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

hellip

hellip

120596119894

120596119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

120596119894

120596119895119901119894

119901119895

Precision performance on each category [Wang et al 2014a]

BocSTC LM_ch SVMVSM_cosi

neLM_d Entity_ESA

Movie 071 091 084 081 072 056

Money 097 095 054 057 052 074

Music 097 090 088 073 068 058

TV 096 046 092 056 051 055

0304050607080910

Pre

cisi

on

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledgeltigt in SIGMOD 2008

bull [ Auer et al 2007 ] Soumlren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Boumlrschinger Caumlcilia Zirn and AnasElghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and XindongWu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny BoyesBraem Basic objects in natural categories Cognitive psychology 8(3)382ndash439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny ZhuToward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 107: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Short Text Understanding

bull How to segment this short text

bull What does this short text mean (its intent senses or concepts)

bull What are the relations among terms in the short text

bull How to calculate the similarity between short texts

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

bull Measuring similarity between two short texts and sentences

bull Basic idea word-by-word comparison using embedding vector

bull Use saliency-weighted semantic graph to computer similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired

Bins of all edges Bins of max edges

119908isin119904119868

119868119863119865(119908) sdot)119904119890119898(119908 119904119904) sdot (1198961 + 1

൰119904119890119898(119908 119904119904) + 1198961 sdot (1 minus 119887 + 119887 sdot|119904119904|119886119907119892119897

Similarity measurement

termShort texts

Inspired by BM25

Semantic similarity

119891119904119905119904(119904119897 119904119904) =

From the Concept View

From the Concept View [Wang et al 2015a]

Co-occurrence Network

Bags of Concepts

Semantic Network

Short Text 1

Short Text 2

Concept Vector 1[(c1 score1) (c2 score2)hellip]

Concept Vector 2[(c1rsquo score1rsquo) (c2rsquo score2rsquo)hellip]

Similarity

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

bull Explicit short text understanding benefit lot of application scenariosbull Adssearch semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

000

100

200

300

400

500

600

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

000

010

020

030

040

050

060

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

Mainline Ads Sidebar Ads

Definition Mining [Hao et al 2016]

bull Definition scenarios search engines QnA etc

bull Why Conceptualization is useful for definition miningbull Examples ldquoWhat is Emphysemardquo

Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

bull This sentence has the form of definitionbull Embedding is helpful to some extent but it also return high similarity

score for (emphysema disease) and (emphysema smoking)

bull Conceptualization can provide strong semanticsbull Contextual embedding can also provide semantic similarity beyond Is-A

Answer 1

Answer 2

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

OfflineOffline OnlineOnline

Original Short textjustin bieber graduates

hellip

Knowledge base

Conceptualiztion

Concept Vector

Entity Extraction

Candidates Generation

Classification amp Ranking

Model LearningModel Learning

Concept Weighting

Model Model NModel i

Concept Model Concept Model

Class 1 Class NClass i

TrainingData

ltMusic Scoregt

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept SpaceArticle titlestagsin this category

119901119894

119901119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

hellip

hellip

120596119894

120596119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

120596119894

120596119895119901119894

119901119895

Precision performance on each category [Wang et al 2014a]

BocSTC LM_ch SVMVSM_cosi

neLM_d Entity_ESA

Movie 071 091 084 081 072 056

Money 097 095 054 057 052 074

Music 097 090 088 073 068 058

TV 096 046 092 056 051 055

0304050607080910

Pre

cisi

on

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledgeltigt in SIGMOD 2008

bull [ Auer et al 2007 ] Soumlren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Boumlrschinger Caumlcilia Zirn and AnasElghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and XindongWu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny BoyesBraem Basic objects in natural categories Cognitive psychology 8(3)382ndash439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny ZhuToward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 108: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

bull Measuring similarity between two short texts and sentences

bull Basic idea word-by-word comparison using embedding vector

bull Use saliency-weighted semantic graph to computer similarity

Short Text Similarity Using Word Embedding[Kenter and Rijke 2015]

Features acquired

Bins of all edges Bins of max edges

119908isin119904119868

119868119863119865(119908) sdot)119904119890119898(119908 119904119904) sdot (1198961 + 1

൰119904119890119898(119908 119904119904) + 1198961 sdot (1 minus 119887 + 119887 sdot|119904119904|119886119907119892119897

Similarity measurement

termShort texts

Inspired by BM25

Semantic similarity

119891119904119905119904(119904119897 119904119904) =

From the Concept View

From the Concept View [Wang et al 2015a]

Co-occurrence Network

Bags of Concepts

Semantic Network

Short Text 1

Short Text 2

Concept Vector 1[(c1 score1) (c2 score2)hellip]

Concept Vector 2[(c1rsquo score1rsquo) (c2rsquo score2rsquo)hellip]

Similarity

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

bull Explicit short text understanding benefit lot of application scenariosbull Adssearch semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

000

100

200

300

400

500

600

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

000

010

020

030

040

050

060

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

Mainline Ads Sidebar Ads

Definition Mining [Hao et al 2016]

bull Definition scenarios search engines QnA etc

bull Why Conceptualization is useful for definition miningbull Examples ldquoWhat is Emphysemardquo

Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

bull This sentence has the form of definitionbull Embedding is helpful to some extent but it also return high similarity

score for (emphysema disease) and (emphysema smoking)

bull Conceptualization can provide strong semanticsbull Contextual embedding can also provide semantic similarity beyond Is-A

Answer 1

Answer 2

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

OfflineOffline OnlineOnline

Original Short textjustin bieber graduates

hellip

Knowledge base

Conceptualiztion

Concept Vector

Entity Extraction

Candidates Generation

Classification amp Ranking

Model LearningModel Learning

Concept Weighting

Model Model NModel i

Concept Model Concept Model

Class 1 Class NClass i

TrainingData

ltMusic Scoregt

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept SpaceArticle titlestagsin this category

119901119894

119901119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

hellip

hellip

120596119894

120596119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

120596119894

120596119895119901119894

119901119895

Precision performance on each category [Wang et al 2014a]

BocSTC LM_ch SVMVSM_cosi

neLM_d Entity_ESA

Movie 071 091 084 081 072 056

Money 097 095 054 057 052 074

Music 097 090 088 073 068 058

TV 096 046 092 056 051 055

0304050607080910

Pre

cisi

on

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledgeltigt in SIGMOD 2008

bull [ Auer et al 2007 ] Soumlren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Boumlrschinger Caumlcilia Zirn and AnasElghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and XindongWu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny BoyesBraem Basic objects in natural categories Cognitive psychology 8(3)382ndash439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [Li et al 2011] Yanen Li, Bo-June Paul Hsu, ChengXiang Zhai, and Kuansan Wang. Unsupervised Query Segmentation Using Clickthrough for Information Retrieval. In SIGIR, 2011: 285-294.

bull [Guo et al 2009] Jiafeng Guo, Gu Xu, Xueqi Cheng, and Hang Li. Named Entity Recognition in Query. In SIGIR, 2009: 267-274.

bull [Pantel et al 2012] Patrick Pantel, Thomas Lin, and Michael Gamon. Mining Entity Types from Query Logs via User Intent Modeling. In ACL, 2012: 563-571.

bull [Joshi et al 2014] Mandar Joshi, Uma Sawant, and Soumen Chakrabarti. Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries. In EMNLP, 2014: 1104-1114.

bull [Sawant et al 2013] Uma Sawant and Soumen Chakrabarti. Learning Joint Query Interpretation and Response Ranking. In WWW, 2013: 1099-1110.

bull [Wang et al 2014b] Zhongyuan Wang, Haixun Wang, and Zhirui Hu. Head, Modifier, and Constraint Detection in Short Texts. In International Conference on Data Engineering (ICDE), 2014.

bull [Sun et al 2016] Xiangyan Sun, Haixun Wang, Yanghua Xiao, and Zhongyuan Wang. Syntactic Parsing of Web Queries. In EMNLP, 2016.

References

bull [Kenter and Rijke 2015] Tom Kenter and Maarten de Rijke. Short Text Similarity with Word Embeddings. In CIKM, 2015.

bull [Wang et al 2015a] Zhongyuan Wang, Haixun Wang, Ji-Rong Wen, and Yanghua Xiao. An Inference Approach to Basic Level of Categorization. In CIKM, October 2015.

bull [Hao et al 2016] Zehui Hao, Zhongyuan Wang, Xiaofeng Meng, and Jun Yan. Combining Language Model with Conceptualization for Definition Ranking. MSR Technical Report, 2016.

bull [Wang et al 2014a] Fang Wang, Zhongyuan Wang, Zhoujun Li, and Ji-Rong Wen. Concept-based Short Text Classification and Ranking. In CIKM, 2014.

bull [Wang et al 2012a] Jingjing Wang, Haixun Wang, Zhongyuan Wang, and Kenny Zhu. Understanding Tables on the Web. In International Conference on Conceptual Modeling, October 2012.

bull [Wang et al 2012b] Yue Wang, Hongsong Li, Haixun Wang, and Kenny Zhu. Toward Topic Search on the Web. In International Conference on Conceptual Modeling, October 2012.

Short Text Similarity Using Word Embedding [Kenter and Rijke 2015]

bull Use a saliency-weighted semantic graph to compute similarity

bull Features acquired: bins of all edges, bins of max edges

bull Similarity measurement, inspired by BM25, where w ranges over the terms of short text s_l, s_s is the other short text, and sem(w, s_s) is the semantic (word-embedding) similarity between term w and s_s:

f_{sts}(s_l, s_s) = \sum_{w \in s_l} IDF(w) \cdot \frac{sem(w, s_s) \cdot (k_1 + 1)}{sem(w, s_s) + k_1 \cdot (1 - b + b \cdot |s_s| / avgsl)}
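
A minimal sketch of this measure, assuming pre-trained word vectors in a dict named vectors (term to numpy array), an idf dict, and illustrative values for k1, b, and avgsl; none of these names or defaults come from the paper:

import numpy as np

def sem(word, other_terms, vectors):
    # Maximum cosine similarity between `word` and any term of the other short text.
    if word not in vectors:
        return 0.0
    v = vectors[word]
    best = 0.0
    for t in other_terms:
        if t not in vectors:
            continue
        u = vectors[t]
        cos = float(np.dot(v, u)) / (np.linalg.norm(v) * np.linalg.norm(u) + 1e-12)
        best = max(best, cos)
    return best

def f_sts(s_l, s_s, vectors, idf, k1=1.2, b=0.75, avgsl=10.0):
    # BM25-style saliency weighting of the per-term semantic similarities.
    total = 0.0
    for w in s_l:
        s = sem(w, s_s, vectors)
        total += idf.get(w, 1.0) * (s * (k1 + 1)) / (s + k1 * (1 - b + b * len(s_s) / avgsl))
    return total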

From the Concept View

From the Concept View [Wang et al 2015a]

[Pipeline diagram] Short Text 1 and Short Text 2 are each turned into Bags of Concepts: Parsing and Conceptualization (term clustering by isA, concept filtering by co-occurrence, head/modifier analysis, and concept orthogonalization) draw on the Semantic Network and the Co-occurrence Network to produce Concept Vector 1 [(c1, score1), (c2, score2), ...] and Concept Vector 2 [(c1', score1'), (c2', score2'), ...], which are then compared to compute the Similarity.
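
A minimal sketch of this bag-of-concepts route; the isA table, its scores, and the helper names below are illustrative stand-ins for a real conceptualization service (for example a Probase-style taxonomy), not its actual API:

from collections import defaultdict

# Toy isA table: P(concept | term); the numbers are illustrative only.
ISA = {
    "apple":     {"company": 0.6, "fruit": 0.3, "brand": 0.1},
    "ipad":      {"device": 0.7, "product": 0.3},
    "microsoft": {"company": 0.8, "brand": 0.2},
}

def conceptualize(terms):
    # Aggregate per-term concept distributions into one concept vector.
    vec = defaultdict(float)
    for t in terms:
        for concept, p in ISA.get(t, {}).items():
            vec[concept] += p
    return dict(vec)

def cosine(a, b):
    # Cosine similarity between two concept vectors (dicts of concept -> score).
    dot = sum(a[c] * b[c] for c in set(a) & set(b))
    na = sum(v * v for v in a.values()) ** 0.5
    nb = sum(v * v for v in b.values()) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

cv1 = conceptualize(["apple", "ipad"])
cv2 = conceptualize(["microsoft"])
print(cosine(cv1, cv2))  # similarity at the concept level rather than the word level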

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

bull Explicit short text understanding benefits a lot of application scenarios:
bull Ads/search semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

[Bar charts] Results by decile (Decile 4 through Decile 10), shown separately for Mainline Ads (y-axis from 0.00 to 6.00) and Sidebar Ads (y-axis from 0.00 to 0.60).

Definition Mining [Hao et al 2016]

bull Definition scenarios: search engines, QnA, etc.

bull Why is Conceptualization useful for definition mining?
bull Example: "What is Emphysema?"

Answer 1: Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year.

Answer 2: Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing.

bull Answer 1 has the form of a definition
bull Embedding is helpful to some extent, but it also returns a high similarity score for (emphysema, smoking), not just for (emphysema, disease) (see the toy comparison below)

bull Conceptualization can provide strong semantics
bull Contextual embedding can also provide semantic similarity beyond Is-A
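
A toy comparison for the bullets above; the numbers are invented purely to illustrate the contrast between relatedness (what embeddings capture) and isA strength (what conceptualization captures), and are not measured values:

# Illustrative scores only.
embedding_sim = {("emphysema", "disease"): 0.62, ("emphysema", "smoking"): 0.58}
isa_strength  = {("emphysema", "disease"): 0.81, ("emphysema", "smoking"): 0.00}

for pair in [("emphysema", "disease"), ("emphysema", "smoking")]:
    print(pair, "embedding:", embedding_sim[pair], "isA:", isa_strength[pair])
# Both pairs look close to the embedding, but only (emphysema, disease) survives
# as an isA concept, which is the signal a definition ranker needs.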

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

[System diagram] Online: an original short text (e.g. "justin bieber graduates") goes through Entity Extraction, Candidates Generation, Conceptualization against the knowledge base, and Concept Weighting to form a Concept Vector, and Classification & Ranking outputs class-score pairs such as <Music, Score>. Offline: Model Learning builds a concept model per class (Model 1 ... Model i ... Model N for Class 1 ... Class i ... Class N) from the Training Data.

Concept based Short Text Classification and Ranking [Wang et al 2014a]

[Diagram] Article titles/tags in a category (e.g. the TV category) are mapped into the Concept Space as points p_i, p_j.

Concept based Short Text Classification and Ranking [Wang et al 2014a]

[Diagram] Each category (TV, Music, Movie, ...) is summarized in the Concept Space by a weighted concept representation ω_i, ω_j.

Concept based Short Text Classification and Ranking [Wang et al 2014a]

[Diagram] A Query is mapped into the same Concept Space (points p_i, p_j) and compared against the category representations ω_i, ω_j (TV, Music, Movie, ...).
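
A minimal sketch of the online scoring step in this setup, assuming each class already has a concept model (concept to weight) learned offline from its article titles/tags; the class names follow the slides, but all weights and helper names are illustrative rather than the paper's exact model:

# Illustrative class concept models (the offline output of Model Learning).
CLASS_MODELS = {
    "Music": {"singer": 0.8, "album": 0.5, "celebrity": 0.3},
    "TV":    {"tv show": 0.8, "channel": 0.5, "celebrity": 0.4},
    "Movie": {"movie": 0.9, "actor": 0.6, "celebrity": 0.4},
}

def rank_classes(query_concepts, class_models=CLASS_MODELS):
    # Score each class by the overlap between the query's concept vector
    # and the class's concept model, then rank the classes.
    scores = {
        cls: sum(w * model.get(c, 0.0) for c, w in query_concepts.items())
        for cls, model in class_models.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# e.g. a concept vector for "justin bieber graduates" (illustrative weights)
query_cv = {"singer": 0.7, "celebrity": 0.5}
print(rank_classes(query_cv))  # "Music" should rank first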

Precision performance on each category [Wang et al 2014a]

Category   BocSTC   LM_ch   SVM    VSM_cosine   LM_d   Entity_ESA
Movie      0.71     0.91    0.84   0.81         0.72   0.56
Money      0.97     0.95    0.54   0.57         0.52   0.74
Music      0.97     0.90    0.88   0.73         0.68   0.58
TV         0.96     0.46    0.92   0.56         0.51   0.55

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledgeltigt in SIGMOD 2008

bull [ Auer et al 2007 ] Soumlren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Boumlrschinger Caumlcilia Zirn and AnasElghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and XindongWu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny BoyesBraem Basic objects in natural categories Cognitive psychology 8(3)382ndash439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny ZhuToward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 110: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

From the Concept View

From the Concept View [Wang et al 2015a]

Co-occurrence Network

Bags of Concepts

Semantic Network

Short Text 1

Short Text 2

Concept Vector 1[(c1 score1) (c2 score2)hellip]

Concept Vector 2[(c1rsquo score1rsquo) (c2rsquo score2rsquo)hellip]

Similarity

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

bull Explicit short text understanding benefit lot of application scenariosbull Adssearch semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

000

100

200

300

400

500

600

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

000

010

020

030

040

050

060

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

Mainline Ads Sidebar Ads

Definition Mining [Hao et al 2016]

bull Definition scenarios search engines QnA etc

bull Why Conceptualization is useful for definition miningbull Examples ldquoWhat is Emphysemardquo

Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

bull This sentence has the form of definitionbull Embedding is helpful to some extent but it also return high similarity

score for (emphysema disease) and (emphysema smoking)

bull Conceptualization can provide strong semanticsbull Contextual embedding can also provide semantic similarity beyond Is-A

Answer 1

Answer 2

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

OfflineOffline OnlineOnline

Original Short textjustin bieber graduates

hellip

Knowledge base

Conceptualiztion

Concept Vector

Entity Extraction

Candidates Generation

Classification amp Ranking

Model LearningModel Learning

Concept Weighting

Model Model NModel i

Concept Model Concept Model

Class 1 Class NClass i

TrainingData

ltMusic Scoregt

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept SpaceArticle titlestagsin this category

119901119894

119901119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

hellip

hellip

120596119894

120596119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

120596119894

120596119895119901119894

119901119895

Precision performance on each category [Wang et al 2014a]

BocSTC LM_ch SVMVSM_cosi

neLM_d Entity_ESA

Movie 071 091 084 081 072 056

Money 097 095 054 057 052 074

Music 097 090 088 073 068 058

TV 096 046 092 056 051 055

0304050607080910

Pre

cisi

on

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledgeltigt in SIGMOD 2008

bull [ Auer et al 2007 ] Soumlren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Boumlrschinger Caumlcilia Zirn and AnasElghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and XindongWu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny BoyesBraem Basic objects in natural categories Cognitive psychology 8(3)382ndash439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny ZhuToward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 111: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

From the Concept View [Wang et al 2015a]

Co-occurrence Network

Bags of Concepts

Semantic Network

Short Text 1

Short Text 2

Concept Vector 1[(c1 score1) (c2 score2)hellip]

Concept Vector 2[(c1rsquo score1rsquo) (c2rsquo score2rsquo)hellip]

Similarity

Parsing

Term clustering by isA

Concept filtering by co-occurrence

Headmodifier analysis

Concept orthogonalization

Conceptualization

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

bull Explicit short text understanding benefit lot of application scenariosbull Adssearch semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

000

100

200

300

400

500

600

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

000

010

020

030

040

050

060

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

Mainline Ads Sidebar Ads

Definition Mining [Hao et al 2016]

bull Definition scenarios search engines QnA etc

bull Why Conceptualization is useful for definition miningbull Examples ldquoWhat is Emphysemardquo

Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

bull This sentence has the form of definitionbull Embedding is helpful to some extent but it also return high similarity

score for (emphysema disease) and (emphysema smoking)

bull Conceptualization can provide strong semanticsbull Contextual embedding can also provide semantic similarity beyond Is-A

Answer 1

Answer 2

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

OfflineOffline OnlineOnline

Original Short textjustin bieber graduates

hellip

Knowledge base

Conceptualiztion

Concept Vector

Entity Extraction

Candidates Generation

Classification amp Ranking

Model LearningModel Learning

Concept Weighting

Model Model NModel i

Concept Model Concept Model

Class 1 Class NClass i

TrainingData

ltMusic Scoregt

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept SpaceArticle titlestagsin this category

119901119894

119901119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

hellip

hellip

120596119894

120596119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

120596119894

120596119895119901119894

119901119895

Precision performance on each category [Wang et al 2014a]

BocSTC LM_ch SVMVSM_cosi

neLM_d Entity_ESA

Movie 071 091 084 081 072 056

Money 097 095 054 057 052 074

Music 097 090 088 073 068 058

TV 096 046 092 056 051 055

0304050607080910

Pre

cisi

on

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledgeltigt in SIGMOD 2008

bull [ Auer et al 2007 ] Soumlren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Boumlrschinger Caumlcilia Zirn and AnasElghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and XindongWu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny BoyesBraem Basic objects in natural categories Cognitive psychology 8(3)382ndash439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny ZhuToward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 112: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Outline

bull Knowledge Bases

bull Explicit Representation Models

bull Applications

Applications

bull Explicit short text understanding benefit lot of application scenariosbull Adssearch semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

000

100

200

300

400

500

600

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

000

010

020

030

040

050

060

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

Mainline Ads Sidebar Ads

Definition Mining [Hao et al 2016]

bull Definition scenarios search engines QnA etc

bull Why Conceptualization is useful for definition miningbull Examples ldquoWhat is Emphysemardquo

Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

bull This sentence has the form of definitionbull Embedding is helpful to some extent but it also return high similarity

score for (emphysema disease) and (emphysema smoking)

bull Conceptualization can provide strong semanticsbull Contextual embedding can also provide semantic similarity beyond Is-A

Answer 1

Answer 2

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

OfflineOffline OnlineOnline

Original Short textjustin bieber graduates

hellip

Knowledge base

Conceptualiztion

Concept Vector

Entity Extraction

Candidates Generation

Classification amp Ranking

Model LearningModel Learning

Concept Weighting

Model Model NModel i

Concept Model Concept Model

Class 1 Class NClass i

TrainingData

ltMusic Scoregt

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept SpaceArticle titlestagsin this category

119901119894

119901119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

hellip

hellip

120596119894

120596119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

120596119894

120596119895119901119894

119901119895

Precision performance on each category [Wang et al 2014a]

BocSTC LM_ch SVMVSM_cosi

neLM_d Entity_ESA

Movie 071 091 084 081 072 056

Money 097 095 054 057 052 074

Music 097 090 088 073 068 058

TV 096 046 092 056 051 055

0304050607080910

Pre

cisi

on

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledgeltigt in SIGMOD 2008

bull [ Auer et al 2007 ] Soumlren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Boumlrschinger Caumlcilia Zirn and AnasElghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and XindongWu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny BoyesBraem Basic objects in natural categories Cognitive psychology 8(3)382ndash439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny ZhuToward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 113: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Applications

bull Explicit short text understanding benefit lot of application scenariosbull Adssearch semantic match

bull Definition mining

bull Query recommendation

bull Web table understanding

bull Semantic search

bull hellip

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

000

100

200

300

400

500

600

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

000

010

020

030

040

050

060

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

Mainline Ads Sidebar Ads

Definition Mining [Hao et al 2016]

bull Definition scenarios search engines QnA etc

bull Why Conceptualization is useful for definition miningbull Examples ldquoWhat is Emphysemardquo

Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

bull This sentence has the form of definitionbull Embedding is helpful to some extent but it also return high similarity

score for (emphysema disease) and (emphysema smoking)

bull Conceptualization can provide strong semanticsbull Contextual embedding can also provide semantic similarity beyond Is-A

Answer 1

Answer 2

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

OfflineOffline OnlineOnline

Original Short textjustin bieber graduates

hellip

Knowledge base

Conceptualiztion

Concept Vector

Entity Extraction

Candidates Generation

Classification amp Ranking

Model LearningModel Learning

Concept Weighting

Model Model NModel i

Concept Model Concept Model

Class 1 Class NClass i

TrainingData

ltMusic Scoregt

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept SpaceArticle titlestagsin this category

119901119894

119901119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

hellip

hellip

120596119894

120596119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

120596119894

120596119895119901119894

119901119895

Precision performance on each category [Wang et al 2014a]

BocSTC LM_ch SVMVSM_cosi

neLM_d Entity_ESA

Movie 071 091 084 081 072 056

Money 097 095 054 057 052 074

Music 097 090 088 073 068 058

TV 096 046 092 056 051 055

0304050607080910

Pre

cisi

on

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledgeltigt in SIGMOD 2008

bull [ Auer et al 2007 ] Soumlren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Boumlrschinger Caumlcilia Zirn and AnasElghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and XindongWu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny BoyesBraem Basic objects in natural categories Cognitive psychology 8(3)382ndash439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny ZhuToward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 114: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Ads Keyword Selection [Wang et al 2015a]

Ads Keyword Selection [Wang et al 2015a]

000

100

200

300

400

500

600

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

000

010

020

030

040

050

060

Decile 4

Decile 5

Decile 6

Decile 7

Decile 8

Decile 9

Decile 10

Mainline Ads Sidebar Ads

Definition Mining [Hao et al 2016]

bull Definition scenarios search engines QnA etc

bull Why Conceptualization is useful for definition miningbull Examples ldquoWhat is Emphysemardquo

Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

bull This sentence has the form of definitionbull Embedding is helpful to some extent but it also return high similarity

score for (emphysema disease) and (emphysema smoking)

bull Conceptualization can provide strong semanticsbull Contextual embedding can also provide semantic similarity beyond Is-A

Answer 1

Answer 2

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

OfflineOffline OnlineOnline

Original Short textjustin bieber graduates

hellip

Knowledge base

Conceptualiztion

Concept Vector

Entity Extraction

Candidates Generation

Classification amp Ranking

Model LearningModel Learning

Concept Weighting

Model Model NModel i

Concept Model Concept Model

Class 1 Class NClass i

TrainingData

ltMusic Scoregt

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept SpaceArticle titlestagsin this category

119901119894

119901119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

hellip

hellip

120596119894

120596119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

120596119894

120596119895119901119894

119901119895

Precision performance on each category [Wang et al 2014a]

BocSTC LM_ch SVMVSM_cosi

neLM_d Entity_ESA

Movie 071 091 084 081 072 056

Money 097 095 054 057 052 074

Music 097 090 088 073 068 058

TV 096 046 092 056 051 055

0304050607080910

Pre

cisi

on

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledgeltigt in SIGMOD 2008

bull [ Auer et al 2007 ] Soumlren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Boumlrschinger Caumlcilia Zirn and AnasElghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013



Ads Keyword Selection [Wang et al 2015a]

[Figure: two bar charts, "Mainline Ads" and "Sidebar Ads", reporting results per query decile (Decile 4 through Decile 10); the mainline-ads panel is plotted on a 0.00–6.00 scale, the sidebar-ads panel on a 0.00–0.60 scale.]

Definition Mining [Hao et al 2016]

• Definition scenarios: search engines, QnA, etc.

• Why is conceptualization useful for definition mining? Example: "What is Emphysema?"

  Answer 1: "Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year."

  Answer 2: "Emphysema is an incurable, progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing."

• Both answers have the form of a definition. Embeddings are helpful to some extent, but they also return a high similarity score for (emphysema, smoking) as well as for (emphysema, disease), so they cannot reliably tell the better definition apart.

• Conceptualization can provide strong Is-A semantics, while contextual embeddings can additionally provide semantic similarity beyond Is-A; a minimal sketch combining the two signals is given below.
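To make the intuition concrete, here is a minimal sketch, not the method of [Hao et al 2016]: the ISA table, the EMB_SIM scores, and the mixing weight alpha are all toy assumptions standing in for a real taxonomy (e.g., Probase) and a real embedding model. It ranks candidate definition sentences by combining an Is-A signal from conceptualization with embedding similarity.

```python
# Toy Is-A knowledge: term -> {concept: typicality score} (assumed values).
ISA = {
    "emphysema": {"disease": 0.9, "lung disease": 0.8, "condition": 0.5},
}

# Toy pairwise embedding similarities (a real embedding model would compute these).
EMB_SIM = {
    ("emphysema", "disease"): 0.82,
    ("emphysema", "smoking"): 0.79,   # embeddings alone barely separate these
    ("emphysema", "lung disease"): 0.85,
}

def isa_score(term, sentence):
    """Reward sentences that mention a concept of the term ('X is a C ...')."""
    return max((w for c, w in ISA.get(term, {}).items() if c in sentence.lower()),
               default=0.0)

def emb_score(term, sentence):
    """Average embedding similarity between the term and known context words."""
    sims = [s for (t, w), s in EMB_SIM.items()
            if t == term and w in sentence.lower()]
    return sum(sims) / len(sims) if sims else 0.0

def rank_definitions(term, candidates, alpha=0.7):
    """Score = alpha * Is-A evidence + (1 - alpha) * embedding similarity."""
    scored = [(alpha * isa_score(term, s) + (1 - alpha) * emb_score(term, s), s)
              for s in candidates]
    return sorted(scored, reverse=True)

candidates = [
    "Emphysema is a disease largely associated with smoking and strikes "
    "about 2 million Americans each year.",
    "Emphysema is an incurable progressive lung disease that primarily affects "
    "smokers and causes shortness of breath and difficulty breathing.",
]
for score, sent in rank_definitions("emphysema", candidates):
    print(f"{score:.3f}  {sent[:60]}...")
```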

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

[Framework diagram] The framework has an offline part and an online part:

• Offline: training data and the knowledge base are used for concept weighting and model learning, yielding one concept model per class (Class 1, …, Class i, …, Class N).

• Online: an input short text (e.g., "justin bieber graduates") goes through entity extraction and candidate generation, is conceptualized against the knowledge base into a concept vector, and is then classified and ranked against the per-class concept models, producing outputs such as <Music, Score>. A sketch of the conceptualization step is given below.
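As an illustration of the online path, here is a minimal sketch under stated assumptions: the ISA table is a toy stand-in for a real taxonomy such as Probase, and the entity extractor is a naive longest-match over the Is-A vocabulary rather than the paper's segmentation.

```python
from collections import defaultdict

ISA = {  # entity/term -> {concept: typicality} (toy values)
    "justin bieber": {"singer": 0.6, "celebrity": 0.3, "pop star": 0.1},
    "graduates":     {"student": 0.7, "alumnus": 0.3},
}

def extract_entities(text):
    """Naive longest-match entity extraction against the Is-A vocabulary."""
    text = text.lower()
    return [term for term in sorted(ISA, key=len, reverse=True) if term in text]

def conceptualize(text):
    """Build a concept vector by accumulating typicality over extracted entities."""
    vector = defaultdict(float)
    for term in extract_entities(text):
        for concept, weight in ISA[term].items():
            vector[concept] += weight
    return dict(vector)

print(conceptualize("justin bieber graduates"))
# e.g. {'singer': 0.6, 'celebrity': 0.3, 'pop star': 0.1, 'student': 0.7, 'alumnus': 0.3}
```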

Concept based Short Text Classification and Ranking [Wang et al 2014a]

[Concept-space diagrams, built up over three slides]

• A category such as TV is mapped into a concept space using the article titles/tags in that category (the per-concept weights are denoted p_i, p_j on the slide).

• The other categories (Music, Movie, …) are likewise mapped into the same shared concept space, with their own per-concept weights (denoted ω_i, ω_j).

• A query is conceptualized into the same space, so it can be classified and ranked by comparing its concept weights with those of each category; a minimal ranking sketch is given below.
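Below is a minimal sketch of one plausible ranking step; cosine similarity over sparse concept vectors is an assumption here, and the paper's exact scoring may differ. All vectors are toy values.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse concept vectors (dicts)."""
    dot = sum(u[c] * v.get(c, 0.0) for c in u)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

# Category concept vectors, e.g. mined offline from article titles/tags per category.
categories = {
    "Music": {"singer": 0.5, "album": 0.3, "celebrity": 0.2},
    "TV":    {"tv show": 0.6, "celebrity": 0.3, "host": 0.1},
}

# Query concept vector, e.g. from the conceptualization step sketched above.
query = {"singer": 0.6, "celebrity": 0.3, "student": 0.7}

ranking = sorted(categories, key=lambda c: cosine(query, categories[c]), reverse=True)
print(ranking)  # 'Music' ranks above 'TV' for this toy query
```

Sparse dict vectors keep the example small; a real system would use the knowledge base's typicality scores and learned per-class weights rather than hand-set numbers.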

Precision performance on each category [Wang et al 2014a]

[Bar chart; y-axis: Precision (0.3–1.0). Values as extracted, column order as in the legend:]

Category   BocSTC   LM_ch   SVM    VSM_cosine   LM_d   Entity_ESA
Movie      0.71     0.91    0.84   0.81         0.72   0.56
Money      0.97     0.95    0.54   0.57         0.52   0.74
Music      0.97     0.90    0.88   0.73         0.68   0.58
TV         0.96     0.46    0.92   0.56         0.51   0.55

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

• [Stark et al. 1998] Michael M. Stark and Richard F. Riesenfeld. WordNet: An Electronic Lexical Database. In Proceedings of the 11th Eurographics Workshop on Rendering, 1998.
• [Banko et al. 2007] Michele Banko, Michael J. Cafarella, Stephen Soderland, Matt Broadhead, and Oren Etzioni. Open Information Extraction from the Web. In IJCAI, 2007.
• [Etzioni et al. 2011] Oren Etzioni, Anthony Fader, Janara Christensen, Stephen Soderland, and Mausam. Open Information Extraction: The Second Generation. In IJCAI, vol. 11, pp. 3-10, 2011.
• [Carlson et al. 2010] A. Carlson, J. Betteridge, B. Kisiel, B. Settles, E. R. Hruschka Jr., and T. M. Mitchell. Toward an Architecture for Never-Ending Language Learning. In AAAI, 2010.
• [Wu et al. 2012] Wentao Wu, Hongsong Li, Haixun Wang, and Kenny Zhu. Probase: A Probabilistic Taxonomy for Text Understanding. In SIGMOD, May 2012.
• [Bollacker et al. 2008] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. Freebase: A Collaboratively Created Graph Database for Structuring Human Knowledge. In SIGMOD, 2008.
• [Auer et al. 2007] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary G. Ives. DBpedia: A Nucleus for a Web of Open Data. In ISWC/ASWC, 2007.
• [Suchanek et al. 2007] Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. Yago: A Core of Semantic Knowledge. In WWW, 2007.
• [Wu et al. 2015] Sen Wu, Ce Zhang, Christopher De Sa, Jaeho Shin, Feiran Wang, and C. Ré. Incremental Knowledge Base Construction Using DeepDive. In VLDB, 2015.
• [Navigli et al. 2012] R. Navigli and S. Ponzetto. BabelNet: The Automatic Construction, Evaluation and Application of a Wide-Coverage Multilingual Semantic Network. Artificial Intelligence, 2012.
• [Nastase et al. 2010] Vivi Nastase, Michael Strube, Benjamin Börschinger, Cäcilia Zirn, and Anas Elghafari. WikiNet: A Very Large Scale Multi-Lingual Concept Network. In LREC, 2010.
• [Speer et al. 2013] Robert Speer and Catherine Havasi. ConceptNet 5: A Large Semantic Network for Relational Knowledge. In The People's Web Meets NLP, Springer Berlin Heidelberg, 2013.
• [Hua et al. 2016] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng, and Xiaofang Zhou. Understand Short Texts by Harvesting and Analyzing Semantic Knowledge. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2016.
• [Hua et al. 2015] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng, and Xiaofang Zhou. Short Text Understanding Through Lexical-Semantic Analysis. In ICDE, April 2015.
• [Li et al. 2013] Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang, and Xindong Wu. Computing Term Similarity by Large Probabilistic isA Knowledge. In CIKM, 2013.
• [Li et al. 2015] Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang, Xue-Gang Hu, and Xindong Wu. A Large Probabilistic Semantic Network Based Approach to Compute Term Similarity. IEEE Transactions on Knowledge and Data Engineering (TKDE), 27(10): 2604-2617, 2015.
• [Rosch et al. 1976] Eleanor Rosch, Carolyn B. Mervis, Wayne D. Gray, David M. Johnson, and Penny Boyes-Braem. Basic Objects in Natural Categories. Cognitive Psychology, 8(3): 382-439, 1976.
• [Manning and Schutze 1999] Christopher D. Manning and Hinrich Schütze. Foundations of Statistical Natural Language Processing. MIT Press, 1999.
• [Wang et al. 2015b] Zhongyuan Wang, Kejun Zhao, Haixun Wang, Xiaofeng Meng, and Ji-Rong Wen. Query Understanding through Knowledge-Based Conceptualization. In IJCAI, July 2015.
• [Bergsma et al. 2007] Shane Bergsma and Qin Iris Wang. Learning Noun Phrase Query Segmentation. In EMNLP-CoNLL, 2007, 819-826.
• [Tan et al. 2008] Bin Tan and Fuchun Peng. Unsupervised Query Segmentation Using Generative Language Models and Wikipedia. In WWW, 2008, 347-356.
• [Li et al. 2011] Yanen Li, Bo-June Paul Hsu, ChengXiang Zhai, and Kuansan Wang. Unsupervised Query Segmentation Using Clickthrough for Information Retrieval. In SIGIR, 2011, 285-294.
• [Guo et al. 2009] Jiafeng Guo, Gu Xu, Xueqi Cheng, and Hang Li. Named Entity Recognition in Query. In SIGIR, 2009, 267-274.
• [Pantel et al. 2012] Patrick Pantel, Thomas Lin, and Michael Gamon. Mining Entity Types from Query Logs via User Intent Modeling. In ACL, 2012, 563-571.
• [Joshi et al. 2014] Mandar Joshi, Uma Sawant, and Soumen Chakrabarti. Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-Seeking Queries. In EMNLP, 2014, 1104-1114.
• [Sawant et al. 2013] Uma Sawant and Soumen Chakrabarti. Learning Joint Query Interpretation and Response Ranking. In WWW, 2013, 1099-1110.
• [Wang et al. 2014b] Zhongyuan Wang, Haixun Wang, and Zhirui Hu. Head, Modifier, and Constraint Detection in Short Texts. In ICDE, 2014.
• [Sun et al. 2016] Xiangyan Sun, Haixun Wang, Yanghua Xiao, and Zhongyuan Wang. Syntactic Parsing of Web Queries. In EMNLP, 2016.
• [Kenter and Rijke 2015] Tom Kenter and Maarten de Rijke. Short Text Similarity with Word Embeddings. In CIKM, 2015.
• [Wang et al. 2015a] Zhongyuan Wang, Haixun Wang, Ji-Rong Wen, and Yanghua Xiao. An Inference Approach to Basic Level of Categorization. In CIKM, October 2015.
• [Hao et al. 2016] Zehui Hao, Zhongyuan Wang, Xiaofeng Meng, and Jun Yan. Combining Language Model with Conceptualization for Definition Ranking. MSR Technical Report, 2016.
• [Wang et al. 2014a] Fang Wang, Zhongyuan Wang, Zhoujun Li, and Ji-Rong Wen. Concept-Based Short Text Classification and Ranking. In CIKM, 2014.
• [Wang et al. 2012a] Jingjing Wang, Haixun Wang, Zhongyuan Wang, and Kenny Zhu. Understanding Tables on the Web. In International Conference on Conceptual Modeling, October 2012.
• [Wang et al. 2012b] Yue Wang, Hongsong Li, Haixun Wang, and Kenny Zhu. Toward Topic Search on the Web. In International Conference on Conceptual Modeling, October 2012.

Page 116: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Definition Mining [Hao et al 2016]

bull Definition scenarios search engines QnA etc

bull Why Conceptualization is useful for definition miningbull Examples ldquoWhat is Emphysemardquo

Emphysema is a disease largely associated with smoking and strikes about 2 million Americans each year

Emphysema is an incurable progressive lung disease that primarily affects smokers and causes shortness of breath and difficulty breathing

bull This sentence has the form of definitionbull Embedding is helpful to some extent but it also return high similarity

score for (emphysema disease) and (emphysema smoking)

bull Conceptualization can provide strong semanticsbull Contextual embedding can also provide semantic similarity beyond Is-A

Answer 1

Answer 2

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

OfflineOffline OnlineOnline

Original Short textjustin bieber graduates

hellip

Knowledge base

Conceptualiztion

Concept Vector

Entity Extraction

Candidates Generation

Classification amp Ranking

Model LearningModel Learning

Concept Weighting

Model Model NModel i

Concept Model Concept Model

Class 1 Class NClass i

TrainingData

ltMusic Scoregt

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept SpaceArticle titlestagsin this category

119901119894

119901119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

hellip

hellip

120596119894

120596119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

120596119894

120596119895119901119894

119901119895

Precision performance on each category [Wang et al 2014a]

BocSTC LM_ch SVMVSM_cosi

neLM_d Entity_ESA

Movie 071 091 084 081 072 056

Money 097 095 054 057 052 074

Music 097 090 088 073 068 058

TV 096 046 092 056 051 055

0304050607080910

Pre

cisi

on

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledgeltigt in SIGMOD 2008

bull [ Auer et al 2007 ] Soumlren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Boumlrschinger Caumlcilia Zirn and AnasElghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and XindongWu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny BoyesBraem Basic objects in natural categories Cognitive psychology 8(3)382ndash439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny ZhuToward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 117: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Definition Mining [Hao et al 2016]

Concept based Short Text Classification and Ranking [Wang et al 2014a]

OfflineOffline OnlineOnline

Original Short textjustin bieber graduates

hellip

Knowledge base

Conceptualiztion

Concept Vector

Entity Extraction

Candidates Generation

Classification amp Ranking

Model LearningModel Learning

Concept Weighting

Model Model NModel i

Concept Model Concept Model

Class 1 Class NClass i

TrainingData

ltMusic Scoregt

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept SpaceArticle titlestagsin this category

119901119894

119901119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

hellip

hellip

120596119894

120596119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

120596119894

120596119895119901119894

119901119895

Precision performance on each category [Wang et al 2014a]

BocSTC LM_ch SVMVSM_cosi

neLM_d Entity_ESA

Movie 071 091 084 081 072 056

Money 097 095 054 057 052 074

Music 097 090 088 073 068 058

TV 096 046 092 056 051 055

0304050607080910

Pre

cisi

on

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledgeltigt in SIGMOD 2008

bull [ Auer et al 2007 ] Soumlren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Boumlrschinger Caumlcilia Zirn and AnasElghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and XindongWu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny BoyesBraem Basic objects in natural categories Cognitive psychology 8(3)382ndash439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny ZhuToward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 118: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Concept based Short Text Classification and Ranking [Wang et al 2014a]

OfflineOffline OnlineOnline

Original Short textjustin bieber graduates

hellip

Knowledge base

Conceptualiztion

Concept Vector

Entity Extraction

Candidates Generation

Classification amp Ranking

Model LearningModel Learning

Concept Weighting

Model Model NModel i

Concept Model Concept Model

Class 1 Class NClass i

TrainingData

ltMusic Scoregt

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept SpaceArticle titlestagsin this category

119901119894

119901119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

hellip

hellip

120596119894

120596119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

120596119894

120596119895119901119894

119901119895

Precision performance on each category [Wang et al 2014a]

BocSTC LM_ch SVMVSM_cosi

neLM_d Entity_ESA

Movie 071 091 084 081 072 056

Money 097 095 054 057 052 074

Music 097 090 088 073 068 058

TV 096 046 092 056 051 055

0304050607080910

Pre

cisi

on

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledgeltigt in SIGMOD 2008

bull [ Auer et al 2007 ] Soumlren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Boumlrschinger Caumlcilia Zirn and AnasElghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and XindongWu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny BoyesBraem Basic objects in natural categories Cognitive psychology 8(3)382ndash439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny ZhuToward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 119: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept SpaceArticle titlestagsin this category

119901119894

119901119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

hellip

hellip

120596119894

120596119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

120596119894

120596119895119901119894

119901119895

Precision performance on each category [Wang et al 2014a]

BocSTC LM_ch SVMVSM_cosi

neLM_d Entity_ESA

Movie 071 091 084 081 072 056

Money 097 095 054 057 052 074

Music 097 090 088 073 068 058

TV 096 046 092 056 051 055

0304050607080910

Pre

cisi

on

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledgeltigt in SIGMOD 2008

bull [ Auer et al 2007 ] Soumlren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Boumlrschinger Caumlcilia Zirn and AnasElghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and XindongWu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny BoyesBraem Basic objects in natural categories Cognitive psychology 8(3)382ndash439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny ZhuToward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 120: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Music

Movie

hellip

hellip

120596119894

120596119895

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

120596119894

120596119895119901119894

119901119895

Precision performance on each category [Wang et al 2014a]

BocSTC LM_ch SVMVSM_cosi

neLM_d Entity_ESA

Movie 071 091 084 081 072 056

Money 097 095 054 057 052 074

Music 097 090 088 073 068 058

TV 096 046 092 056 051 055

0304050607080910

Pre

cisi

on

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledgeltigt in SIGMOD 2008

bull [ Auer et al 2007 ] Soumlren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Boumlrschinger Caumlcilia Zirn and AnasElghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and XindongWu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny BoyesBraem Basic objects in natural categories Cognitive psychology 8(3)382ndash439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny ZhuToward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 121: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Concept based Short Text Classification and Ranking [Wang et al 2014a]

TV

Category

Concept Space

Query

Music

Movie

hellip

hellip

120596119894

120596119895119901119894

119901119895

Precision performance on each category [Wang et al 2014a]

BocSTC LM_ch SVMVSM_cosi

neLM_d Entity_ESA

Movie 071 091 084 081 072 056

Money 097 095 054 057 052 074

Music 097 090 088 073 068 058

TV 096 046 092 056 051 055

0304050607080910

Pre

cisi

on

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

bull [ Stark et al 1998 ] Michael M Stark and Richard F Riesenfeld WordNet An Electronic Lexical Database Proceedings of 11th Eurographics Workshop on Rendering 1998

bull [ Banko et al 2007 ] Michele Banko Michael J Cafarella Stephen Soderland Matt Broadhead and Oren Etzioni Open Information Extraction from the Web in IJCAI 2007

bull [ Etzioni et al 2011 ] Etzioni Oren Anthony Fader Janara Christensen Stephen Soderland and Mausam Mausam Open Information Extraction The Second Generation In IJCAI vol 11 pp 3-10 2011

bull [Carlson et al 2010 ] A Carlson J Betteridge B Kisiel B Settles ER Hruschka Jr and TM Mitchell Toward an Architecture for Never-Ending Language Learning In Proceedings of the Conference on Artificial Intelligence (AAAI) 2010

bull [ Wu et al 2012 ] Wentao Wu Hongsong Li Haixun Wang and Kenny Zhu Probase A Probabilistic Taxonomy for Text Understanding in ACM International Conference on Management of Data (SIGMOD) May 2012

bull [ Bollacker et al 2008 ] Kurt Bollacker Colin Evans Praveen Paritosh Tim Sturge Jamine Taylor Freebase a collaboratively created graph database for structuring human knowledgeltigt in SIGMOD 2008

bull [ Auer et al 2007 ] Soumlren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G Ives DBpedia A Nucleus for a Web of Open Data In ISWCASWC 2007

References

bull [ Suchanek et al 2007 ] Fabian M Suchanek Gjergji Kasneci Gerhard Weikum Yago a core of semantic knowledge in WWW 2007

bull [ Wu et al 2015 ] Sen Wu Ce Zhang Christopher De Sa Jaeho Shin Feiran Wang and C Reacute Incremental Knowledge Base Construction Using DeepDive in VLDB 2015

bull [ Navigli et al 2012 ] R Navigli and S Ponzetto BabelNet The Automatic Construction Evaluation and Application of a Wide-Coverage Multilingual Semantic Network in Artificial Intelligence 2012

bull [ Nastase et al 2010 ] Vivi Nastase Michael Strube Benjamin Boumlrschinger Caumlcilia Zirn and AnasElghafari WikiNet A very large scale multi-lingual concept network in LREC 2010

bull [ Speer et al 2013 ] Robert Speer and Havasi Catherine ConceptNet 5 A large semantic network for relational knowledge The Peoplersquos Web Meets NLP Springer Berlin Heidelberg 2013

bull [ Hua et al 2016 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou ldquoUnderstand Short Texts by Harvesting and Analyzing Semantic Knowledgerdquo IEEE Transactions on Knowledge and Data Engineering (TKDE) 2016

bull [ Hua et al 2015 ] Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou Short Text Understanding Through Lexical-Semantic Analysis in International Conference on Data Engineering (ICDE) April 2015

References

bull [ Li et al 2013 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang and Xindong Wu Computing term similarity by large probabilistic isa knowledge In ACM International Conference on Information and Knowledge Management (CIKM) 2013

bull [ Li et al 2015 ] Peipei Li Haixun Wang Kenny Q Zhu Zhongyuan Wang Xue-Gang Hu and XindongWu A Large Probabilistic Semantic Network based Approach to Compute Term Similarity In IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(10) 2604-2617 2015

bull [ Rosch et al 1976 ] Eleanor Rosch Carolyn B Mervis Wayne D Gray David M Johnson and Penny BoyesBraem Basic objects in natural categories Cognitive psychology 8(3)382ndash439 1976

bull [ Manning and Schutze 1999 ] Christopher D Manning and Hinrich Schutze Foundations of statistical natural language processing In volume 999 MIT Press 1999

bull [ Wang et al 2015b ] Zhongyuan Wang Kejun Zhao Haixun Wang Xiaofeng Meng and Ji-Rong Wen Query Understanding through Knowledge-Based Conceptualization In IJCAI July 2015

bull [ Bergsma et al 2007 ]Shane Bergsma Qin Iris Wang Learning Noun Phrase Query Segmentation In EMNLP-CoNLL 2007 819-826

bull [ Tan et al 2008 ] Bin Tan Fuchun Peng Unsupervised query segmentation using generative language models and wikipedia In WWW 2008 347-356

References

bull [ Li et al 2011 ] Yanen Li Bo-June Paul Hsu ChengXiang Zhai Kuansan Wang Unsupervised query segmentation using clickthrough for information retrieval In SIGIR 2011 285-294

bull [ Guo et al 2009 ] Jiafeng Guo Gu Xu Xueqi Cheng Hang Li Named entity recognition in query In SIGIR 2009 267-274

bull [ Pantel et al 2012 ] Patrick Pantel Thomas Lin Michael Gamon Mining Entity Types from Query Logs via User Intent Modeling In ACL 2012 563-571

bull [ Joshi et al 2014 ] Mandar Joshi Uma Sawant Soumen Chakrabarti Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries In EMNLP 2014 1104-1114

bull [ Sawant et al 2013 ] Uma Sawant Soumen Chakrabarti Learning joint query interpretation and response ranking In WWW 2013 1099-1110

bull [ Wang et al 2014b ] Zhongyuan Wang Haixun Wang and Zhirui Hu Head Modifier and Constraint Detection in Short Texts in International Conference on Data Engineering (ICDE) 2014

bull [ Sun et al 2016 ] Xiangyan Sun Haixun Wang Yanghua Xiao Zhongyuan Wang Syntactic Parsing of Web Queries In EMNLP 2016

References

bull [ Kenter and Rijke 2015 ] Tom Kenter and Maarten de Rijke Short text similarity with word embeddingsIn CIKM 2015

bull [ Wang et al 2015a ] Zhongyuan Wang Haixun Wang Ji-Rong Wen and Yanghua Xiao An Inference Approach to Basic Level of Categorization In CIKM October 2015

bull [ Hao et al 2016 ] Zehui Hao Zhongyuan Wang Xiaofeng Meng and Jun Yan Combining Language Model with Conceptualization for Definition Ranking MSR-Technical Report 2016

bull [ Wang et al 2014a ] Fang Wang Zhongyuan Wang Zhoujun Li and Ji-Rong Wen Concept-based Short Text Classification and Ranking In CIKM 2014

bull [ Wang et al 2012a ] Jingjing Wang Haixun Wang Zhongyuan Wang and Kenny Zhu Understanding Tables on the Web In International Conference on Conceptual Modeling October 2012

bull [ Wang et al 2012b ] Yue Wang Hongsong Li Haixun Wang and Kenny ZhuToward Topic Search on the Web In International Conference on Conceptual Modeling October 2012

Page 122: Understanding Short Texts - Part II: Explicit Representationwangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/Sli… · from the Web [Banko et al. 2007, Etzioni et al

Precision performance on each category [Wang et al 2014a]

BocSTC LM_ch SVMVSM_cosi

neLM_d Entity_ESA

Movie 071 091 084 081 072 056

Money 097 095 054 057 052 074

Music 097 090 088 073 068 058

TV 096 046 092 056 051 055

0304050607080910

Pre

cisi

on

Examples [Wang et al 2014a]

Table Understanding [Wang et al 2012a]

Semantic Search [Wang et al 2012b]

References

• [Stark et al. 1998] Michael M. Stark and Richard F. Riesenfeld. WordNet: An Electronic Lexical Database. In Proceedings of the 11th Eurographics Workshop on Rendering, 1998.

• [Banko et al. 2007] Michele Banko, Michael J. Cafarella, Stephen Soderland, Matt Broadhead, and Oren Etzioni. Open Information Extraction from the Web. In IJCAI, 2007.

• [Etzioni et al. 2011] Oren Etzioni, Anthony Fader, Janara Christensen, Stephen Soderland, and Mausam. Open Information Extraction: The Second Generation. In IJCAI, vol. 11, pp. 3-10, 2011.

• [Carlson et al. 2010] A. Carlson, J. Betteridge, B. Kisiel, B. Settles, E. R. Hruschka Jr., and T. M. Mitchell. Toward an Architecture for Never-Ending Language Learning. In Proceedings of the Conference on Artificial Intelligence (AAAI), 2010.

• [Wu et al. 2012] Wentao Wu, Hongsong Li, Haixun Wang, and Kenny Zhu. Probase: A Probabilistic Taxonomy for Text Understanding. In ACM International Conference on Management of Data (SIGMOD), May 2012.

• [Bollacker et al. 2008] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. Freebase: A Collaboratively Created Graph Database for Structuring Human Knowledge. In SIGMOD, 2008.

• [Auer et al. 2007] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary G. Ives. DBpedia: A Nucleus for a Web of Open Data. In ISWC/ASWC, 2007.

• [Suchanek et al. 2007] Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. Yago: A Core of Semantic Knowledge. In WWW, 2007.

• [Wu et al. 2015] Sen Wu, Ce Zhang, Christopher De Sa, Jaeho Shin, Feiran Wang, and Christopher Ré. Incremental Knowledge Base Construction Using DeepDive. In VLDB, 2015.

• [Navigli et al. 2012] R. Navigli and S. Ponzetto. BabelNet: The Automatic Construction, Evaluation and Application of a Wide-Coverage Multilingual Semantic Network. Artificial Intelligence, 2012.

• [Nastase et al. 2010] Vivi Nastase, Michael Strube, Benjamin Börschinger, Cäcilia Zirn, and Anas Elghafari. WikiNet: A Very Large Scale Multi-Lingual Concept Network. In LREC, 2010.

• [Speer et al. 2013] Robert Speer and Catherine Havasi. ConceptNet 5: A Large Semantic Network for Relational Knowledge. In The People's Web Meets NLP, Springer, Berlin/Heidelberg, 2013.

• [Hua et al. 2016] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng, and Xiaofang Zhou. Understand Short Texts by Harvesting and Analyzing Semantic Knowledge. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2016.

• [Hua et al. 2015] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng, and Xiaofang Zhou. Short Text Understanding Through Lexical-Semantic Analysis. In International Conference on Data Engineering (ICDE), April 2015.

• [Li et al. 2013] Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang, and Xindong Wu. Computing Term Similarity by Large Probabilistic isA Knowledge. In ACM International Conference on Information and Knowledge Management (CIKM), 2013.

• [Li et al. 2015] Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang, Xue-Gang Hu, and Xindong Wu. A Large Probabilistic Semantic Network Based Approach to Compute Term Similarity. IEEE Transactions on Knowledge and Data Engineering (TKDE), 27(10): 2604-2617, 2015.

• [Rosch et al. 1976] Eleanor Rosch, Carolyn B. Mervis, Wayne D. Gray, David M. Johnson, and Penny Boyes-Braem. Basic Objects in Natural Categories. Cognitive Psychology, 8(3): 382–439, 1976.

• [Manning and Schutze 1999] Christopher D. Manning and Hinrich Schütze. Foundations of Statistical Natural Language Processing. MIT Press, 1999.

• [Wang et al. 2015b] Zhongyuan Wang, Kejun Zhao, Haixun Wang, Xiaofeng Meng, and Ji-Rong Wen. Query Understanding through Knowledge-Based Conceptualization. In IJCAI, July 2015.

• [Bergsma et al. 2007] Shane Bergsma and Qin Iris Wang. Learning Noun Phrase Query Segmentation. In EMNLP-CoNLL, 2007, 819-826.

• [Tan et al. 2008] Bin Tan and Fuchun Peng. Unsupervised Query Segmentation Using Generative Language Models and Wikipedia. In WWW, 2008, 347-356.

• [Li et al. 2011] Yanen Li, Bo-June Paul Hsu, ChengXiang Zhai, and Kuansan Wang. Unsupervised Query Segmentation Using Clickthrough for Information Retrieval. In SIGIR, 2011, 285-294.

• [Guo et al. 2009] Jiafeng Guo, Gu Xu, Xueqi Cheng, and Hang Li. Named Entity Recognition in Query. In SIGIR, 2009, 267-274.

• [Pantel et al. 2012] Patrick Pantel, Thomas Lin, and Michael Gamon. Mining Entity Types from Query Logs via User Intent Modeling. In ACL, 2012, 563-571.

• [Joshi et al. 2014] Mandar Joshi, Uma Sawant, and Soumen Chakrabarti. Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-Seeking Queries. In EMNLP, 2014, 1104-1114.

• [Sawant et al. 2013] Uma Sawant and Soumen Chakrabarti. Learning Joint Query Interpretation and Response Ranking. In WWW, 2013, 1099-1110.

• [Wang et al. 2014b] Zhongyuan Wang, Haixun Wang, and Zhirui Hu. Head, Modifier, and Constraint Detection in Short Texts. In International Conference on Data Engineering (ICDE), 2014.

• [Sun et al. 2016] Xiangyan Sun, Haixun Wang, Yanghua Xiao, and Zhongyuan Wang. Syntactic Parsing of Web Queries. In EMNLP, 2016.

• [Kenter and Rijke 2015] Tom Kenter and Maarten de Rijke. Short Text Similarity with Word Embeddings. In CIKM, 2015.

• [Wang et al. 2015a] Zhongyuan Wang, Haixun Wang, Ji-Rong Wen, and Yanghua Xiao. An Inference Approach to Basic Level of Categorization. In CIKM, October 2015.

• [Hao et al. 2016] Zehui Hao, Zhongyuan Wang, Xiaofeng Meng, and Jun Yan. Combining Language Model with Conceptualization for Definition Ranking. MSR Technical Report, 2016.

• [Wang et al. 2014a] Fang Wang, Zhongyuan Wang, Zhoujun Li, and Ji-Rong Wen. Concept-based Short Text Classification and Ranking. In CIKM, 2014.

• [Wang et al. 2012a] Jingjing Wang, Haixun Wang, Zhongyuan Wang, and Kenny Zhu. Understanding Tables on the Web. In International Conference on Conceptual Modeling, October 2012.

• [Wang et al. 2012b] Yue Wang, Hongsong Li, Haixun Wang, and Kenny Zhu. Toward Topic Search on the Web. In International Conference on Conceptual Modeling, October 2012.
