Probabilistic Semantic Similarity Measurements
for Noisy Short Texts Using Wikipedia Entities
Masumi Shirakawa¹, Kotaro Nakayama², Takahiro Hara¹, Shojiro Nishio¹
¹Osaka University, Osaka, Japan
²University of Tokyo, Tokyo, Japan
Challenge in short text analysis
Statistics are not always enough.
A year and a half after Google pulled its popular search engine out of mainland China
Baidu and Microsoft did not disclose terms of the agreement
They are talking about search engines and China.
How can a machine know that the two sentences are about a similar topic?
Reasonable solution
Use external knowledge.
Wikipedia Thesaurus [Nakayama06]
Related work
ESA: Explicit Semantic Analysis [Gabrilovich07]. Adds Wikipedia articles (entities) to a text as its semantic representation.
1. Get the search ranking of Wikipedia for each term (i.e., Wiki articles and their scores).
2. Simply sum up the scores to aggregate them.
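[Figure: ESA pipeline. Input text T ("Apple sells a new product") → key term extraction (terms t: Apple, sells, new, product) → related entity finding (Wikipedia entities c per term, e.g., Apple Inc., pear, iPhone, iPad, pricing, business) → aggregation → output: ranked list of c (e.g., Apple Inc., iPhone, pear).]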
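The two ESA steps above can be sketched as follows. This is a minimal illustration, not the original ESA code: `esa_aggregate` is a hypothetical helper, and the per-term scores are invented numbers standing in for a Wikipedia search ranking.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def esa_aggregate(
    term_rankings: Dict[str, Dict[str, float]],
    top_n: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """ESA-style aggregation: sum each entity's score over all terms, then rank."""
    totals: Dict[str, float] = defaultdict(float)
    for ranking in term_rankings.values():
        for entity, score in ranking.items():
            totals[entity] += score
    # Highest total first; ties broken alphabetically for a deterministic order.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked if top_n is None else ranked[:top_n]


# Hypothetical per-term search scores for "Apple sells a new product".
term_rankings = {
    "Apple": {"Apple Inc.": 0.9, "pear": 0.6, "iPhone": 0.5},
    "sells": {"pricing": 0.3, "business": 0.3},
    "new": {"iPhone": 0.1},
    "product": {"iPhone": 0.4, "iPad": 0.3, "pricing": 0.2},
}
print(esa_aggregate(term_rankings, top_n=3))
```

With these numbers "pear" reaches the top three purely through the ambiguous term "Apple", since a plain sum cannot tell which sense of a term the text means.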
Problems in real-world noisy short texts
“Noisy” means semantically noisy in this work. (We do not handle informal or casual surface forms, or misspellings.)
• Term ambiguity: Apple (the fruit) should not be related to Microsoft.
• Fluctuation of term dominance: a term is not always important in the texts it appears in.
We explore a more effective aggregation method.
We propose extended naïve Bayes to aggregate related entities.
Probabilistic method: from text T to related articles c
[Figure: the same pipeline with each step modeled probabilistically. Key term extraction gives P(t ∈ T); related entity finding gives P(c | t) via P(e | t) and P(c | e); aggregation combines them with the formula below. For "Apple sells a new product", the output ranked list of c includes Apple Inc., iPhone, and iPad. Building blocks cited: [Mihalcea07], [Milne08], [Nakayama06], [Song11].]

$$P(c \mid T) = \frac{\prod_{k=1}^{K}\left(P(t_k \in T)\,P(c \mid t_k) + \bigl(1 - P(t_k \in T)\bigr)\,P(c)\right)}{P(c)^{K-1}}$$
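The figure shows P(e | t) and P(c | e) feeding into P(c | t), but not how they are combined. A natural reading, assumed here and not confirmed by the slides, is to marginalize over the intermediate entity e, i.e. P(c | t) = Σ_e P(e | t) P(c | e). The helper name and all probabilities below are hypothetical.

```python
from collections import defaultdict
from typing import Dict


def related_given_term(
    p_e_given_t: Dict[str, float],
    p_c_given_e: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Assumed form: P(c | t) = sum over entities e of P(e | t) * P(c | e)."""
    p_c_given_t: Dict[str, float] = defaultdict(float)
    for entity, p_e in p_e_given_t.items():  # term -> candidate entity (sense)
        for related, p_c in p_c_given_e.get(entity, {}).items():  # entity -> related entity
            p_c_given_t[related] += p_e * p_c
    return dict(p_c_given_t)


# Hypothetical numbers for the ambiguous term "Apple".
p_e_given_apple = {"Apple Inc.": 0.7, "Apple (fruit)": 0.3}
p_c_given_e = {
    "Apple Inc.": {"iPhone": 0.5, "iPad": 0.3, "Microsoft": 0.2},
    "Apple (fruit)": {"pear": 0.6, "orange": 0.4},
}
print(related_given_term(p_e_given_apple, p_c_given_e))
```

Because the fruit sense carries only 30% of the mass here, "pear" ends up with 0.18 instead of competing on equal terms with "iPhone" (0.35); the result still sums to 1 when each P(c | e) does.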
When input is multiple terms
Apply naïve Bayes [Song11] to multiple terms t_1, …, t_K to obtain the related entity c from the individual probabilities P(c | t_k).
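$$P(c \mid t_1,\dots,t_K) = \frac{P(t_1,\dots,t_K \mid c)\,P(c)}{P(t_1,\dots,t_K)} = \frac{P(c)\prod_{k} P(t_k \mid c)}{P(t_1,\dots,t_K)} = \frac{\prod_{k} P(c \mid t_k)}{P(c)^{K-1}}$$
The second step is the naïve Bayes assumption (terms are conditionally independent given c). The last step also treats P(t_1, …, t_K) as ∏_k P(t_k); this changes the expression only by a factor that does not depend on c, so the ranking of entities is unaffected.
Compute this for each related entity c.
[Figure: terms t_1, t_2, …, t_K (e.g., Apple, product, new) each contribute P(c | t_k) to a related entity c such as “iPhone”.]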
By using naïve Bayes, entities that are related to multiple terms can be boosted.
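The boosting effect of the naïve Bayes product can be checked numerically. This is a minimal sketch with hypothetical numbers; `naive_bayes_scores` is an illustrative helper, and its fallback to the prior for missing entries is an assumption made here to keep the product well defined, not something the slides specify.

```python
import math
from typing import Dict, List


def naive_bayes_scores(
    p_c_given_terms: List[Dict[str, float]],
    prior: Dict[str, float],
) -> Dict[str, float]:
    """score(c) = prod_k P(c | t_k) / P(c)^(K-1) for every candidate entity c in `prior`."""
    k = len(p_c_given_terms)
    scores: Dict[str, float] = {}
    for c, p_c in prior.items():
        # Assumption: a term with no entry for c falls back to the prior P(c),
        # so its factor cancels against one P(c) in the denominator.
        numerator = math.prod(dist.get(c, p_c) for dist in p_c_given_terms)
        scores[c] = numerator / p_c ** (k - 1)
    return scores


prior = {"iPhone": 0.1, "pear": 0.1, "pricing": 0.1}
p_c_given_terms = [
    {"iPhone": 0.4, "pear": 0.4, "pricing": 0.1},   # t_1 = "Apple"
    {"iPhone": 0.4, "pear": 0.05, "pricing": 0.3},  # t_2 = "product"
]
scores = naive_bayes_scores(p_c_given_terms, prior)
print(sorted(scores, key=scores.get, reverse=True))  # ['iPhone', 'pricing', 'pear']
```

Summing the same numbers, as ESA does, gives iPhone 0.8, pear 0.45, pricing 0.4, so "pear" would outrank "pricing"; the product instead rewards "iPhone" for being related to both terms and suppresses "pear", which only "Apple" supports.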
When input is text
Not “multiple terms” but “text,” i.e., we don’t know which terms are key terms.
We developed extended naïve Bayes to solve this problem.
[Figure: as in the multiple-terms case, terms t_1, t_2, …, t_K (Apple, product, new, …) each give P(c | t_k) for a related entity c such as “iPhone”, but which of them are key terms cannot be observed.]
Extended naïve Bayes
[Figure: because the set of key terms T is unobserved, candidate states T′ are formed from the candidate key terms (Apple, product, new, …): T′ = {t_1}, T′ = {t_1, t_2}, …, T′ = {t_1, …, t_K}. Naïve Bayes is applied to each state T′, weighted by the probability P(T = T′) that the set of key terms T is the state T′.]
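Summing naïve Bayes over all states T′, each weighted by P(T = T′), the probability that the set of key terms T is the state T′ (the product form follows when each t_k is a key term independently with probability P(t_k ∈ T)):

$$P(c \mid T) = \sum_{T'} P(c \mid T')\,P(T = T') = \frac{\prod_{k=1}^{K}\left(P(t_k \in T)\,P(c \mid t_k) + \bigl(1 - P(t_k \in T)\bigr)\,P(c)\right)}{P(c)^{K-1}}$$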
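Term dominance, P(t_k ∈ T), is thereby incorporated into naïve Bayes: a term with P(t_k ∈ T) = 0 contributes a factor P(c) that cancels against the denominator, so it has no effect, while a term with P(t_k ∈ T) = 1 contributes exactly as in ordinary naïve Bayes.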
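The equality between the sum over states and the product form can be verified by brute force. This sketch assumes, as the product form implies, that each term is a key term independently with probability P(t_k ∈ T), and that an empty state T′ yields the prior P(c). Function names and numbers are hypothetical.

```python
import itertools
import math
from typing import Sequence


def naive_bayes(p_c_given_t: Sequence[float], p_c: float) -> float:
    """P(c | T') = prod_k P(c | t_k) / P(c)^(|T'|-1); an empty state T' gives the prior P(c)."""
    return p_c * math.prod(p / p_c for p in p_c_given_t)


def extended_nb(p_key: Sequence[float], p_c_given_t: Sequence[float], p_c: float) -> float:
    """Closed form: prod_k (P(t_k in T) P(c|t_k) + (1 - P(t_k in T)) P(c)) / P(c)^(K-1)."""
    numerator = math.prod(w * p + (1 - w) * p_c for w, p in zip(p_key, p_c_given_t))
    return numerator / p_c ** (len(p_key) - 1)


def extended_nb_by_enumeration(
    p_key: Sequence[float], p_c_given_t: Sequence[float], p_c: float
) -> float:
    """Sum over all 2^K states T' of P(c | T') * P(T = T') with independent key-term indicators."""
    total = 0.0
    for mask in itertools.product((False, True), repeat=len(p_key)):
        p_state = math.prod(w if chosen else 1 - w for w, chosen in zip(p_key, mask))
        members = [p for p, chosen in zip(p_c_given_t, mask) if chosen]
        total += naive_bayes(members, p_c) * p_state
    return total


# Hypothetical values for c = "iPhone" and terms ("Apple", "product", "new").
p_key = [0.9, 0.6, 0.1]          # P(t_k in T)
p_c_given_t = [0.4, 0.3, 0.05]   # P(c | t_k)
p_c = 0.1                        # P(c)
print(extended_nb(p_key, p_c_given_t, p_c))
print(extended_nb_by_enumeration(p_key, p_c_given_t, p_c))
```

Both calls give about 0.7733. The enumeration visits all 2^K states, whereas the closed form is linear in K, which is what makes the method practical for texts with many candidate terms.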
Experiments on short text similarity datasets
[Datasets] Four datasets derived from word similarity datasets using a dictionary
[Comparative methods] Original ESA [Gabrilovich07], ESA with 16 parameter settings
[Metrics] Spearman’s rank correlation coefficient
ESA with a well-adjusted parameter is superior to our method on “clean” texts.
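To make the evaluation concrete, the sketch below assumes that the similarity of two short texts is the cosine of their related-entity vectors (the slides do not name the similarity function) and scores it against human ratings with Spearman's rank correlation, giving tied values their average rank. All vectors and ratings are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity of two sparse related-entity vectors (entity -> weight)."""
    dot = sum(w * v.get(e, 0.0) for e, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def average_ranks(xs: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for idx in order[i:j + 1]:
            ranks[idx] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of the (average) ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical related-entity vectors for three short texts.
text_a = {"Google": 0.8, "China": 0.6, "Web search engine": 0.5}
text_b = {"Baidu": 0.7, "Microsoft": 0.5, "Web search engine": 0.6, "China": 0.3}
text_c = {"Pear": 0.9, "Fruit": 0.4}

system = [cosine(text_a, text_b), cosine(text_a, text_c), cosine(text_b, text_c)]
human = [3.5, 0.5, 0.2]  # hypothetical human ratings for the same three pairs
print(spearman(system, human))
```

The two pairs involving `text_c` share no entities and tie at 0, so the system cannot order them; that tie is why rho comes out near 0.866 rather than 1.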
Tweet clustering
K-means clustering, using each tweet's vector of related entities to measure distance
[Dataset] 12,385 tweets covering 13 topics
[Comparative methods] Bag-of-words (BOW), ESA with the same parameter, ESA with a well-adjusted parameter
[Metric] Normalized Mutual Information (NMI), averaged over 20 runs
Topics (hashtag and number of tweets): #MacBook (1,251), #Silverlight (221), #VMWare (890), #MySQL (1,241), #Ubuntu (988), #Chrome (1,018), #NFL (1,044), #NHL (1,045), #NBA (1,085), #MLB (752), #MLS (981), #UFC (991), #NASCAR (878)
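The NMI metric can be sketched in a few lines. The slides do not say which normalization was used; this sketch assumes the common variant NMI = I(U; V) / sqrt(H(U) H(V)), with natural logarithms, and treats the degenerate single-cluster case by convention. The "averaged over 20 runs" figure would then be the mean of this score over 20 K-means runs.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def nmi(labels_true: Sequence[Hashable], labels_pred: Sequence[Hashable]) -> float:
    """NMI with sqrt normalization: I(U; V) / sqrt(H(U) * H(V)) (assumed variant)."""
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))
    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    mutual_info = sum(
        (n_ij / n) * math.log(n * n_ij / (count_true[a] * count_pred[b]))
        for (a, b), n_ij in joint.items()
    )
    h_true = -sum((c / n) * math.log(c / n) for c in count_true.values())
    h_pred = -sum((c / n) * math.log(c / n) for c in count_pred.values())
    if h_true == 0.0 or h_pred == 0.0:
        # Convention for degenerate labelings (a single class or cluster).
        return 1.0 if h_true == h_pred else 0.0
    return mutual_info / math.sqrt(h_true * h_pred)


# Hypothetical topic labels and two hypothetical clusterings of six tweets.
topics = ["#NFL", "#NFL", "#NFL", "#NBA", "#NBA", "#NBA"]
print(nmi(topics, [0, 0, 0, 1, 1, 1]))  # clusters match topics exactly
print(nmi(topics, [0, 0, 1, 1, 1, 1]))  # one tweet misassigned
```

NMI is invariant to how clusters are numbered, which matters here because K-means assigns arbitrary cluster IDs on every run.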
Results
[Figure: NMI score vs. number of related entities (10 to 2000) for BOW, ESA-same, ESA-adjusted, and our method. Labeled scores: BOW 0.429, ESA-same 0.421, ESA-adjusted 0.524, our method 0.567 (p-value < 0.01).]
Our method outperformed ESA with a well-adjusted parameter on noisy short texts.
Conclusion
We proposed extended naïve Bayes to derive related Wikipedia entities from a real-world noisy short text.
[Future work]
• Tackle multilingual short texts
• Develop applications of the method