Incorporating Term Definitions for Taxonomic Relation Identification
Yongpan Sheng¹, Tianxing Wu², Xin Wang³
¹ School of Computer Science and Technology, University of Electronic Science and Technology of China (UESTC)
² Nanyang Technological University, ³ Tianjin University
The 9th Joint International Semantic Technology Conference (JIST), 2019
Sheng, Xu, Wang et al. Incorporating Term Definitions for TRI JIST 2019 1 / 41
Outline
1 Motivation
2 Problem Definition
3 Previous approaches
4 The Proposed Approach
  Our Inspirations
  The Baseline System
  Our Proposed Model
5 Experiments and Analysis
  Dataset
  Experimental settings
  Performance on specific and open domain datasets
  Error Analysis
6 Conclusions and Future Work
Motivation
People often organize lexical knowledge in the form of a term taxonomy, such as the Cognitive Concept Graph or HowNet.
Figure: An example of Cognitive Concept Graph from Alibaba
Motivation
[Figure 2: An example of a word annotated with sememes in HowNet. Levels: word, sense, sememe. The word “apple” has the senses apple (computer), apple (phone), apple (fruit) and apple (tree). Sememe nodes include computer, tool, fruit, tree, SpecificBrand, PatternValue, able, bring, communicate and reproduce. Edge labels (dynamic roles) include modifier, patient, agent, instrument, scope, CoEvent and PatientProduct.]
Attribute, Time, Space, Attribute Value and Event. In addition, to depict the semantics of words more precisely, HowNet incorporates relations between sememes, which are called “dynamic roles”, into the sememe annotations of words.
Considering the polysemy, HowNet differentiates diverse senses of each word in the sememe annotations. And each sense is also expressed in both Chinese and English. An example of sememe annotation for a word is illustrated in Figure 2. We can see from the figure that the word “apple” has four senses including “apple (computer)”, “apple (phone)”, “apple (fruit)” and “apple (tree)”, and each sense is the root node of a “sememe tree” where any pair of father and son sememe nodes is multi-relational.
Additionally, HowNet annotates a POS tag for each sense, and adds a sentiment category as well as some usage examples for certain senses.
The latest version of HowNet was published in January 2019 and the statistics are shown in Table 1.

| Type | Count |
| --- | --- |
| Sense | 229,767 |
| Distinct Chinese word | 127,266 |
| Distinct English word | 104,025 |
| Sememe | 2,187 |

Table 1: Statistics of latest version of HowNet

3 HowNet and Sememe-related Researches
Since HowNet was published, it has attracted wide attention. People use HowNet and sememe in various NLP tasks including word similarity computation (Liu and Li, 2002), word sense disambiguation (Zhang et al., 2005), question classification (Sun et al., 2007) and sentiment analysis (Dang and Zhang, 2010; Fu et al., 2013). Among these researches, Liu and Li (2002) is one of the most influential works, in which similarities of given two words are computed by measuring the degree of resemblance of their sememe trees.
Recent years also witness some works incorporating sememes into neural network models. Niu et al. (2017) propose a novel word representation learning model named SST that reforms Skip-gram (Mikolov et al., 2013) by adding contextual attention to senses of target word, which are represented with combinations of corresponding sememes’ embeddings. Experimental results show that SST can not only improve the quality of word embeddings but also learn satisfactory sense embeddings to do word sense disambiguation.
Gu et al. (2018) incorporate sememes into the decoding phase of language modeling where sememes are predicted first, and then senses and words are predicted in succession. The proposed model shows enhancement in the perplexity of language modeling and performance of down-
Figure: An example of word annotated with sememes in HowNet
Motivation
Predicting (Semantic Machines, isA, Startup Company) mainly contributes to knowledge graph completion.
Problem Definition
Determine whether a specific pair of terms [1] holds the taxonomic relation (“isA” relation) or not, e.g.,
(“Einstein”, “scientist”)
(“Mel Gibson”, “actor”)
(“Paris”, “city”)
[1] “Terms” refers to any words or phrases.
Previous approaches
Linguistic approaches mainly rely on lexical-syntactic patterns (e.g., a typical pattern is “A such as B”).
Higher precision in several applications, e.g., Probase construction.
The main drawbacks of this type of method are:
Identified patterns are too specific to cover the wide range of complex linguistic circumstances.
Not fully unsupervised, e.g., they may require a set of seed instances to initiate the extraction process. Hence, recall is sacrificed.
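To make the recall problem concrete, here is a minimal sketch of a pattern-based extractor. The regular expression and the function name `extract_isa` are illustrative assumptions, not the patterns used by any system cited in these slides; the sketch only covers the single pattern “A such as B”.

```python
import re

# Hypothetical, deliberately narrow pattern: "<hypernym> such as <hyponym>".
SUCH_AS = re.compile(r"(\w+) such as (\w+)")

def extract_isa(sentence):
    """Return (hyponym, hypernym) pairs matched by the "A such as B" pattern."""
    return [(hypo, hyper) for hyper, hypo in SUCH_AS.findall(sentence)]

# The pattern fires on the first sentence only.
print(extract_isa("He visited cities such as Paris last year."))
# The second sentence states the same relation, but no pattern matches it.
print(extract_isa("Paris is a beautiful city."))
```

The second sentence expresses the same isA relation in a different surface form, so a fixed pattern misses it. This is the kind of coverage gap the drawback list refers to.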
Previous approaches
Distributional approaches embed the two terms into context-aware vector representations, and then predict their taxonomic relation based on these representations.
The main drawbacks of this type of method are:
Domain specificity
An IT corpus hardly mentions “apple” as a fruit.
Poor generalization capability
(i) Other taxonomic relations.
E.g., distributional inclusion hypothesis [2]: if (“Einstein”, “scientist”) holds, the typical contexts of Einstein will also occur with scientist.
(ii) Unseen terms, rare terms, and terms with biased sense distribution.
E.g., unseen terms: word embeddings for a specified taxonomic relation [3].
[2] Harris, Z.S., Distributional Structure, 1954.
[3] Nguyen, K.A., Köper, M., Schulte im Walde, S., et al., Hierarchical embeddings for hypernymy detection and directionality, EMNLP 2017.
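The distributional inclusion hypothesis mentioned above can be illustrated with a toy inclusion score: the fraction of the hyponym's context words that also occur with the hypernym. This is a simplified sketch, not a measure taken from the slides; the function name `inclusion_score` and the hand-made context sets are assumptions for illustration only.

```python
def inclusion_score(hypo_contexts, hyper_contexts):
    """Fraction of the hyponym's context words that also occur with the hypernym."""
    hypo, hyper = set(hypo_contexts), set(hyper_contexts)
    if not hypo:
        return 0.0
    return len(hypo & hyper) / len(hypo)

# Toy, hand-made context sets (not corpus statistics).
contexts = {
    "Einstein": {"physics", "theory", "research", "published"},
    "scientist": {"physics", "theory", "research", "published",
                  "laboratory", "experiment"},
}

# High in the hyponym -> hypernym direction, lower in the reverse direction.
forward = inclusion_score(contexts["Einstein"], contexts["scientist"])
backward = inclusion_score(contexts["scientist"], contexts["Einstein"])
print(forward, backward)
```

The asymmetry between the two directions is what such measures rely on. It also shows why the approach depends on the corpus: if the contexts of a term are missing or skewed, the score says little.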
Previous approaches
Poor generalization capability
(ii) Unseen terms, rare terms, and terms with biased sense distribution.
E.g., rare terms: (“coma”, “knowledge”), (“bacterium”, “microorganism”)
Terms with biased sense distribution: (“apple”, “fruit”), (“apple”, “IT company”)
Previous approaches
Limitations of the previous methods.
Lower recall (Linguistic approach)
Domain specificity (Distributional approach)
Poor generalization capability (Distributional approach)
As a result, the performance of this task is far from satisfactory.
Our Inspirations
Richer interpretation (sense-level definitions, distributional context)
Our model is expected to:
Accurately predict the taxonomic relations of term pairs at the sense level.
Generalize well to unseen terms, rare terms, and terms with biased sense distribution.
The Baseline System
Figure: Architecture of the baseline system
Our Proposed Model
Sentence Encoding Layer.
Idea: Siamese Network [4]
Architecture: Bi-LSTM + Sentence-level Context Attention + CNN
Input: sentence pair S1 and S2. In our settings, a term can be treated as a short sentence.
Output: the neural representation pi of each sentence Si (i = 1, 2)
Sentence Output Layer.
The overall representation for the sentence pair, i.e., the concatenation of p1 and p2.
[4] Neculoiu, P., Versteegh, M., et al., Learning text similarity with siamese recurrent networks, the 1st Workshop on Representation Learning for NLP, 2016.
Our Proposed Model
[Figure: Architecture of our proposed model. Component label: The Baseline System. Legend: distributional representation of hyponym x; distributional representation of hypernym y; definitive sentence of x; definitive sentence of y. Output: 0/1.]
Our Proposed Model
The Sentence Input Strategy. Four strategies to define the input representations on the baseline system:
$p_{tt}$ from $(x_{hypo}, y_{hyper})$
This combination intends to model embeddings from a hyponym to its hypernym via a network with weights.
$p_{td}$ from $(x_{hypo}, d_y)$, $p_{dt}$ from $(d_x, y_{hyper})$
These combinations help generate indicative features across distributional context and definition for discriminating taxonomic relations from other semantic relations.
$p_{dd}$ from $(d_x, d_y)$
This combination provides alternative evidence for interpreting the terms.
Heuristic Matching.
$$p = [p_{tt};\ p_{dd};\ p_{td} - p_{dt};\ p_{td} \odot p_{dt}] \quad (1)$$
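As a concrete reading of Eq. (1), the sketch below concatenates the four pair representations as plain Python lists. It assumes the last term of Eq. (1) is an element-wise product and that `p_td` and `p_dt` have the same dimension; the function name `heuristic_matching` and the toy vectors are illustrative, not taken from the authors' code.

```python
def heuristic_matching(p_tt, p_dd, p_td, p_dt):
    """Build [p_tt; p_dd; p_td - p_dt; p_td * p_dt] by concatenation."""
    if len(p_td) != len(p_dt):
        raise ValueError("p_td and p_dt must have the same dimension")
    diff = [a - b for a, b in zip(p_td, p_dt)]   # element-wise difference
    prod = [a * b for a, b in zip(p_td, p_dt)]   # element-wise product (assumed)
    return list(p_tt) + list(p_dd) + diff + prod

# Toy 2-dimensional representations.
p = heuristic_matching([1.0, 2.0], [0.5, 0.5], [3.0, 1.0], [1.0, 4.0])
print(p)
```

With d-dimensional inputs the result has 4d components, which is the vector passed on to the output layer.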
Our Proposed Model
Softmax Output.
$$o = W_1 (p \circ r) + b \quad (2)$$
Loss Function and Training.
$$p(y_i \mid x_i, \theta) = \frac{e^{o_i}}{\sum_k e^{o_k}} \quad (3)$$
$$J(\theta) = \sum_i \log p(y_i \mid x_i, \theta) \quad (4)$$
Note that:
Given an input instance $(x_{hypo}, d_x, y_{hyper}, d_y, 1/0)$, the network with parameters $\theta$ outputs the vector $o$, a 2-dimensional vector that the softmax in Eq. (3) turns into probabilities summing to 1.
To compute $\theta$, we maximize the log likelihood $J(\theta)$ through stochastic gradient descent over shuffled mini-batches with the Adam update rule.
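The softmax of Eq. (3) and the log likelihood of Eq. (4) can be sketched in plain Python as follows. The 2-dimensional logit vectors are toy values, and the function names are assumptions; the sketch covers only the output probabilities and the objective, not the network or the Adam optimizer.

```python
import math

def softmax(o):
    """Softmax over a list of logits, shifted by the max for numerical stability."""
    m = max(o)
    exps = [math.exp(v - m) for v in o]
    total = sum(exps)
    return [e / total for e in exps]

def log_likelihood(logits_batch, labels):
    """J(theta) = sum_i log p(y_i | x_i, theta), with labels in {0, 1}."""
    return sum(math.log(softmax(o)[y]) for o, y in zip(logits_batch, labels))

# Two toy instances: logits for classes (0, 1) and their gold labels.
probs = softmax([2.0, 0.0])
J = log_likelihood([[2.0, 0.0], [0.0, 1.0]], [0, 1])
print(probs, J)
```

Since every term of the sum is the log of a probability, J is never positive, and maximizing it pushes the probability of each gold label towards 1.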
Experiments and Analysis
Dataset
Table: Dataset used in the experiments

| Dataset | #Train (random splits) | #Train (lexical splits) | #Test (random splits) | #Test (lexical splits) | #Validation (random splits) | #Validation (lexical splits) |
| --- | --- | --- | --- | --- | --- | --- |
| BLESS [a] | 12459 | 757 | 2376 | 675 | 404 | 103 |
| Conceptual Graph [b] | 58484 | 29475 | 19610 | 7808 | 4079 | 2095 |
| WebIsA-Animal [c] | 5614 | 3784 | 1942 | 1021 | 407 | 249 |
| WebIsA-Plant [c] | 5534 | 2933 | 1610 | 861 | 305 | 169 |

Random and Lexical Dataset Splits. To better address the “lexical memorization” phenomenon [d].
Roughly a ratio of 14:5:1 for training set, test set and validation set partitioned randomly.
Roughly a ratio of 8:1 for positive instances and negative instances in random or lexical splits in the datasets.
[a] https://sites.google.com/site/geometricalmodels/shared-evaluation
[b] https://concept.research.microsoft.com/Home/Download
[c] http://webdatacommons.org/isadb/
[d] Levy, O., Remus, S., Biemann, C., et al., Do Supervised Distributional Methods Really Learn Lexical Inference Relations?, NAACL 2015.
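One common way to build a lexical split is to partition the vocabulary first and then keep only pairs whose two terms fall on the same side, so that no term is shared between training and test data. The sketch below follows that idea under stated assumptions: the function name `lexical_split`, the `test_ratio` and `seed` parameters, and the toy pairs are hypothetical, and the slides do not specify the exact procedure used.

```python
import random

def lexical_split(pairs, test_ratio=0.3, seed=0):
    """Split (x, y, label) triples so that train and test share no terms."""
    vocab = sorted({term for x, y, _ in pairs for term in (x, y)})
    rng = random.Random(seed)
    rng.shuffle(vocab)
    test_vocab = set(vocab[: int(len(vocab) * test_ratio)])

    train, test = [], []
    for x, y, label in pairs:
        if x in test_vocab and y in test_vocab:
            test.append((x, y, label))
        elif x not in test_vocab and y not in test_vocab:
            train.append((x, y, label))
        # Pairs that mix train and test terms are dropped.
    return train, test

toy_pairs = [
    ("cat", "animal", 1), ("dog", "animal", 1),
    ("rose", "plant", 1), ("oak", "plant", 1),
    ("cat", "plant", 0), ("rose", "animal", 0),
]
train, test = lexical_split(toy_pairs, test_ratio=0.5, seed=1)
print(len(train), len(test))
```

Dropping mixed pairs shrinks the data, which is consistent with the lexical-split columns of the table being smaller than the random-split ones.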
Experiments and Analysis
Dataset
Term Definition Collection - WordNet and English Wikipedia.
(i) We first try to extract the respective definitions of the hyponym and hypernym from WordNet by matching the term strings.
(ii) For the few pairs that contain terms not covered by WordNet, we switch to the Wikipedia article that covers the term and select the top-2 sentences of the first paragraph of its introductory section as its definitive description.
(iii) If both knowledge resources fail, we set the definition to be the term string itself.
Pre-trained Word Embeddings. We use two large-scale textual corpora (i.e., English Wikipedia and extracted abstracts of DBpedia) to train word embeddings.
Note that:
As WordNet sorts sense definitions by sense frequency, we only choose the top-1 sense definition to denote a term.
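The three-step fallback for collecting definitions can be sketched as below. To stay self-contained, the sketch replaces WordNet and Wikipedia with two in-memory dictionaries (`wordnet_glosses`, `wiki_intros`) filled with toy entries; these names, the naive sentence splitter, and `get_definition` are assumptions for illustration, not the authors' implementation.

```python
def first_sentences(text, n=2):
    """Naive splitter: keep the first n period-terminated sentences."""
    parts = [s.strip() for s in text.split(".") if s.strip()]
    return ". ".join(parts[:n]) + "." if parts else ""

def get_definition(term, wordnet_glosses, wiki_intros):
    """Top-1 WordNet gloss, else top-2 intro sentences, else the term itself."""
    glosses = wordnet_glosses.get(term)
    if glosses:
        return glosses[0]            # senses assumed sorted by frequency
    intro = wiki_intros.get(term)
    if intro:
        return first_sentences(intro, n=2)
    return term                      # both resources failed

# Toy stand-ins for the two knowledge resources.
wordnet_glosses = {"apple": ["fruit with red or yellow or green skin",
                             "native Eurasian tree"]}
wiki_intros = {"Paris": "Paris is the capital of France. It lies on the Seine. "
                        "It is a major European city."}

for term in ("apple", "Paris", "xyzzy"):
    print(term, "->", get_definition(term, wordnet_glosses, wiki_intros))
```

Taking only the first gloss is what makes terms with a biased sense distribution hard: for “apple”, the company sense would never be selected by this lookup.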
Experimental settings
Compared methods:
Word2Vec [a] + SVM
DDM [b] + SVM
DWNN [c] + SVM
Best unsupervised (dependency-based context) [d]
OursSubInput, a variant of our method
OursConcat, a variant of our method
[a] Mikolov, T., Chen, K., et al., Efficient estimation of word representations in vector space, ICLR (Workshop Poster) 2013.
[b] Yu, Z., Wang, H., et al., Learning Term Embeddings for Hypernymy Identification, IJCAI 2015.
[c] Anh, T.L., Tay, Y., et al., Learning Term Embeddings for Taxonomic Relation Identification Using Dynamic Weighting Neural Network, EMNLP 2016.
[d] Shwartz, V., Santus, E., et al., Hypernyms under Siege: Linguistically-motivated Artillery for Hypernymy Detection, EACL 2017.
Experimental settings
Evaluation metrics: Mean Average F-score (for random splits), Average F-score@200 (for lexical splits)
Parameter Tuning: We employ grid search over a range of hyper-parameters and pick the combination that yields the highest F-score on the validation set.
Evaluation and Results Analysis
Performance on specific domain datasets
Table: Performance comparison of different methods on domain-specific datasets (including WebIsA-Animal and WebIsA-Plant). We report Precision at rank 200 (P@200), Recall at rank 200 (R@200), F-score at rank 200 (F@200) for Random Splits, and Mean Average Precision (P), Mean Average Recall (R), Mean Average F-score (F) for Lexical Splits. The best performance in the F-score column is boldfaced (the higher, the better).

| Method | Animal, Random Splits (@200) P | R | F | Animal, Lexical Splits P | R | F | Plant, Random Splits (@200) P | R | F | Plant, Lexical Splits P | R | F |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Previous methods | | | | | | | | | | | | |
| Word2Vec + SVM | 0.796 | 0.706 | 0.748 | 0.785 | 0.674 | 0.725 | 0.817 | 0.730 | 0.771 | 0.708 | 0.623 | 0.663 |
| DDM + SVM | 0.706 | 0.620 | 0.660 | 0.655 | 0.521 | 0.580 | 0.759 | 0.695 | 0.726 | 0.712 | 0.517 | 0.599 |
| DWNN + SVM | 0.893 | 0.714 | 0.794 | 0.820 | 0.550 | 0.658 | 0.916 | 0.705 | 0.797 | 0.875 | 0.689 | 0.771 |
| Best unsupervised | 0.897 | 0.625 | 0.737 | 0.730 | 0.510 | 0.600 | 0.827 | 0.650 | 0.728 | 0.702 | 0.609 | 0.652 |
| Our method and its variants | | | | | | | | | | | | |
| OursSubInput | 0.693 | 0.747 | 0.719 | 0.617 | 0.404 | 0.488 | 0.677 | 0.722 | 0.699 | 0.618 | 0.689 | 0.652 |
| OursConcat | 0.877 | 0.707 | 0.783 | 0.734 | 0.637 | 0.682 | 0.895 | 0.699 | 0.785 | 0.752 | 0.645 | 0.694 |
| Ours | 0.914 | 0.755 | **0.827** | 0.892 | 0.697 | **0.783** | 0.920 | 0.799 | **0.855** | 0.881 | 0.692 | **0.775** |
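The F columns in the table above appear to be the harmonic mean of the corresponding P and R columns. A small sketch to check this on two rows of the WebIsA-Animal random-split results (the function name `f_score` is an assumption):

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (P, R) values copied from the WebIsA-Animal random-split columns.
rows = {"Ours": (0.914, 0.755), "Word2Vec + SVM": (0.796, 0.706)}
for method, (p, r) in rows.items():
    print(method, round(f_score(p, r), 3))
```

Rounded to three decimals, these reproduce the reported F values of 0.827 and 0.748.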
Evaluation and Results Analysis
Performance on open domain datasets
Table: Performance comparison of different methods on open domain datasets (including BLESS and Conceptual Graph). We report Precision at rank 200 (P@200), Recall at rank 200 (R@200), F-score at rank 200 (F@200) for Random Splits, and Mean Average Precision (P), Mean Average Recall (R), Mean Average F-score (F) for Lexical Splits. The best performance in the F-score column is boldfaced (the higher, the better).

| Method | BLESS, Random Splits (@200) P | R | F | BLESS, Lexical Splits P | R | F | Conceptual Graph, Random Splits (@200) P | R | F | Conceptual Graph, Lexical Splits P | R | F |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Previous methods | | | | | | | | | | | | |
| Word2Vec + SVM | 0.719 | 0.693 | 0.706 | 0.702 | 0.648 | 0.674 | 0.750 | 0.629 | 0.684 | 0.699 | 0.608 | 0.650 |
| DDM + SVM | 0.839 | 0.744 | 0.789 | 0.785 | 0.684 | 0.731 | 0.889 | 0.669 | 0.763 | 0.764 | 0.675 | 0.717 |
| DWNN + SVM | 0.914 | 0.677 | 0.778 | 0.792 | 0.545 | 0.646 | 0.930 | 0.697 | 0.797 | 0.885 | 0.640 | 0.743 |
| Best unsupervised | 0.654 | 0.590 | 0.620 | 0.675 | 0.541 | 0.601 | 0.731 | 0.557 | 0.632 | 0.702 | 0.607 | 0.651 |
| Our method and its variants | | | | | | | | | | | | |
| OursSubInput | 0.675 | 0.659 | 0.670 | 0.621 | 0.600 | 0.610 | 0.683 | 0.623 | 0.652 | 0.608 | 0.572 | 0.589 |
| OursConcat | 0.864 | 0.719 | 0.785 | 0.803 | 0.677 | 0.735 | 0.874 | 0.760 | 0.813 | 0.760 | 0.679 | 0.717 |
| Ours | 0.871 | 0.723 | **0.790** | 0.811 | 0.694 | **0.748** | 0.899 | 0.775 | **0.832** | 0.854 | 0.728 | **0.786** |
Evaluation and Results Analysis
For domain-specific datasets (i.e., WebIsA-Animal and WebIsA-Plant), on a random split, our method achieves significant improvements in average F-score of 8.1 and 14.8 percentage points compared to the Word2Vec and DDM methods, respectively.
For open domain datasets (i.e., BLESS and Conceptual Graph), on a random split, our method improves the average F-score by 11.6 percentage points compared to Word2Vec, by 3.5 compared to DDM, and by 2.3 compared to DWNN.
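The improvements quoted above can be recomputed from the random-split F columns of the two results tables, assuming each figure is the difference between F-scores averaged over the two datasets of a group. The dictionary layout and the helper names below are illustrative.

```python
# F-scores on random splits, copied from the two results tables.
F_RANDOM = {
    "domain-specific": {          # WebIsA-Animal, WebIsA-Plant
        "Ours": [0.827, 0.855],
        "Word2Vec": [0.748, 0.771],
        "DDM": [0.660, 0.726],
    },
    "open-domain": {              # BLESS, Conceptual Graph
        "Ours": [0.790, 0.832],
        "Word2Vec": [0.706, 0.684],
        "DDM": [0.789, 0.763],
        "DWNN": [0.778, 0.797],
    },
}

def mean(values):
    return sum(values) / len(values)

def gain(group, baseline):
    """Average F-score of Ours minus average F-score of the baseline."""
    scores = F_RANDOM[group]
    return mean(scores["Ours"]) - mean(scores[baseline])

for group, scores in F_RANDOM.items():
    for method in scores:
        if method != "Ours":
            print(f"{group}: Ours vs {method}: {100 * gain(group, method):+.2f} points")
```

The differences come out at about 8.15, 14.8, 11.6, 3.5 and 2.35 points, in line with the figures stated on the slide, which suggests the percentages are absolute differences in average F-score rather than relative gains.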
Evaluation and Results Analysis
Error Analysis
Inaccurate definition
(i) Terms with biased sense.
- Exploring more advanced entity linking techniques, or extracting the most accurate definition from all highly related definitions by combining the current context with an efficient ranking algorithm.
(ii) Prominent context words.
- Depending on human-crafted knowledge.
(iii) Rare term and entity pairs, for which our model only encodes the term meanings as the definitions.
Other relations
(i) Confusing meronymy and taxonomic relations, e.g., the term pair (“paws”, “cat”).
- Adding more negative instances of this kind to the datasets.
(ii) Reversed error (negative instances in WebIsA dataset).
- Integrating the learning of term embeddings with the distance measure as the feature (e.g., 1-norm distance) into the model.
Conclusions and Future Work
Conclusions
We presented a neural network model for identifying the taxonomic relation of term pairs, which enhances the representations of the pairs by incorporating their respective accurate textual definitions.
In our experiments, we showed that our model outperforms several competitive baseline methods and achieves more than 82% F-score on two domain-specific datasets. Moreover, our model, once trained, performs competitively on various open domain datasets. This demonstrates the good generalization capacity of our model.
Conclusions and Future Work
Future Work
One is to consider how to integrate multiple types of knowledge (e.g., word meanings, definitions, knowledge graph paths, and images) to enhance the representations of term pairs and further improve the performance of this work.
The other is to investigate whether this model can be applied to the task of classifying multiple semantic relations.
Codes and Datasets
We will release the codes and datasets at: https://github.com/shengyp/Taxonomic-relation/
Thanks for your time! Any questions?