A Multilingual Hierarchy Mapping Method Based on GHSOM
Hsin-Chang Yang
Associate Professor
Department of Information Management
National University of Kaohsiung
Outline
- Introduction
- Document Processing and Clustering by GHSOM
- Association Discovery
- Experimental Result
- Conclusions
Introduction
Most search engines provide only a monolingual search interface.
It would be convenient for users to express their queries in a familiar language and search documents in other languages.
Cross-lingual or multilingual information retrieval
How to do this?
Translate the queries or the documents into another language
- Easy and convenient
- Imprecise with modern machine translation systems
Match queries and documents directly
- Direct match of semantics
- Difficult to match semantics; needs schemes for discovering semantic relatedness between languages
Multilingual text mining (MLTM)
- Discovering semantic relationships between linguistic entities of different languages
In this work, we develop an MLTM scheme based on GHSOM.
System Architecture
[Figure: system architecture. MLTM process: Chinese documents and English documents (parallel corpora) → preprocessing → Chinese and English document vectors → train by GHSOM → hierarchy of monolingual documents / hierarchy of bilingual documents → association discovery → document associations, keyword associations, document/keyword associations. MLIR process: query → retrieval result.]
Document Processing and Clustering by GHSOM
GHSOM (Growing Hierarchical Self-Organizing Map) was proposed by Rauber et al. to provide the SOM with capabilities of dynamic map expansion and hierarchy construction.
It has been applied to expertise management, failure detection, and multilingual information retrieval.
We used GHSOM to organize multilingual documents into hierarchies.
A typical structure of GHSOM
[Figure: a typical GHSOM structure, with maps arranged in Layer 0, Layer 1, Layer 2, and Layer 3]
Document preprocessing
- word segmentation
- stemming
- stopword elimination
- keyword selection
Document encoding
- A document $D_j$ is encoded into a vector $D_j = \{\text{tf-idf}_{ij}\}$, $1 \le i \le |V|$, where $V$ denotes the vocabulary.
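The encoding step can be illustrated with a minimal sketch. The slides do not state which tf-idf variant was used, so the weighting below (raw term count times log(N/df)) is an assumption, and the function name `tfidf_vectors` is hypothetical. Documents are assumed to be already segmented, stemmed, and stopword-filtered.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Encode tokenized documents as tf-idf vectors over a shared vocabulary.

    Assumed weighting (not confirmed by the slides): tf * log(N / df).
    """
    # The vocabulary V is every distinct term, in a fixed (sorted) order.
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts; missing terms count as 0
        vectors.append([tf[t] * math.log(n_docs / df[t]) for t in vocab])
    return vocab, vectors


if __name__ == "__main__":
    toy_docs = [["ghsom", "map", "map"], ["ghsom", "corpus"]]
    vocab, vectors = tfidf_vectors(toy_docs)
    print(vocab)
    print(vectors)
```

A term that occurs in every document gets weight 0 under this weighting, which is one reason keyword selection precedes encoding.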
Document clustering
- Document vectors were trained by GHSOM.
- Two hierarchies were constructed, one for the English documents and one for the Chinese documents.
Document labelling
[Figure: the Chinese hierarchy (clusters C1 to C5, with a cluster Ck) and the English hierarchy (clusters E1 to E5, with clusters Ep and Eq)]
Association Discovery
The constructed hierarchies reveal document and keyword associations for individual languages.
However, associations between documents or keywords of different languages are much more difficult to find because there is no direct mapping between these hierarchies.
Finding associations
- The goal is to associate a Chinese keyword cluster with an English keyword cluster.
- This is an instance of the general ontology alignment problem.
A Chinese keyword cluster is considered to be related to an English one if they represent the same theme.
- The theme of a keyword cluster can be determined by the documents labelled to the same neuron as the cluster.
Thus we can associate two clusters according to their corresponding document clusters.
- Parallel corpora were used.
- The correspondence between documents of different languages is known a priori.
To associate a Chinese cluster Ck with some English cluster El, we use a voting scheme to calculate the likelihood of such an association.
Voting for best-matched cluster
1. For each pair of Chinese documents Ci and Cj in Ck, find the neuron clusters to which their English counterparts Ei and Ej are labelled in the English hierarchy. Let these clusters be Ep and Eq.
2. Find the shortest path between Ep and Eq in the English hierarchy.
3. Add 1 to Ep and Eq. Add 1/(dist(Ci, Cj)-1) to all other clusters in the path.
4. Repeat steps 1-3 for all pairs of documents in Ck.
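The voting steps above can be sketched in Python. This is an illustration under stated assumptions, not the authors' implementation: the English hierarchy is represented as a hypothetical `parent` dictionary (child cluster to parent cluster, root mapped to `None`), dist(Ci, Cj) is taken to be the number of edges on the shortest path between Ep and Eq, and when Ep and Eq coincide both increments go to the same cluster. The names `path_between` and `vote` are made up for this sketch.

```python
from collections import defaultdict
from itertools import combinations


def path_between(parent, a, b):
    """Return the clusters on the tree path from a to b, endpoints included."""
    # Walk from a up to the root, remembering each ancestor's position.
    up_a = []
    node = a
    while node is not None:
        up_a.append(node)
        node = parent.get(node)
    index_in_a = {n: i for i, n in enumerate(up_a)}
    # Walk up from b until reaching an ancestor of a (lowest common ancestor).
    up_b = []
    node = b
    while node not in index_in_a:
        if node is None:
            raise ValueError("clusters are not in the same hierarchy")
        up_b.append(node)
        node = parent.get(node)
    return up_a[: index_in_a[node] + 1] + up_b[::-1]


def vote(chinese_docs, english_cluster_of, parent):
    """Score English clusters for one Chinese cluster Ck.

    english_cluster_of maps each Chinese document to the English cluster
    that its parallel English counterpart is labelled to.
    """
    scores = defaultdict(float)
    for ci, cj in combinations(chinese_docs, 2):
        ep, eq = english_cluster_of[ci], english_cluster_of[cj]
        path = path_between(parent, ep, eq)
        dist = len(path) - 1  # assumption: path length counted in edges
        scores[ep] += 1.0
        scores[eq] += 1.0
        # Intermediate clusters exist only when dist >= 2, so dist - 1 >= 1.
        for node in path[1:-1]:
            scores[node] += 1.0 / (dist - 1)
    return dict(scores)


if __name__ == "__main__":
    parent = {"root": None, "E1": "root", "E2": "root",
              "E3": "E1", "E4": "E1", "E5": "E2"}
    counterpart_cluster = {"c1": "E3", "c2": "E3", "c3": "E4"}
    scores = vote(["c1", "c2", "c3"], counterpart_cluster, parent)
    best = max(scores, key=scores.get)
    print(scores, best)
```

Ties between top-scoring clusters are not addressed in the slides; `max` here simply returns the first cluster that reaches the highest score.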
We associate Ck with El when El has the highest score.
An example
[Figure: an English hierarchy whose clusters carry the voting scores 0.83, 2, 1.33, 2, 2, 0, and 0.83]
Document associations
- Chinese document Ci is associated with English document Ej if their corresponding clusters are associated.
Keyword associations
- A Chinese keyword labelled to neuron k in the Chinese hierarchy will be associated with an English keyword labelled to neuron l in the English hierarchy if Ck and El are associated.
Document-keyword associations
- When Ck is associated with El, all documents and keywords labelled to these two neurons are associated.
Experimental Result
Sinorama parallel corpora were used.
- Each Chinese article was faithfully translated into English.
Our corpus contains 976 parallel documents.
We have a Chinese vocabulary of size 3436 and an English vocabulary of size 3711.
Each document is transformed into a vector.
We used the GHSOM program developed by Rauber’s team to train the bilingual vectors.
http://www.ifs.tuwien.ac.at/~andi/ghsom/
An example Sinorama document
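Performance evaluation
- Mean inter-document path length between each pair of documents in Ck or Ek:

$$P_k = \frac{2 \sum_{i,j} \mathrm{dist}(C_i, C_j)}{|C_k| \, (|C_k| - 1)}$$

where the sum runs over the pairs of documents $C_i$, $C_j$ in $C_k$.
The quality of the bilingual hierarchies can then be measured by the average of all $P_k$, denoted by $\overline{P_k}$, over the entire hierarchy.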
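The evaluation measure can be illustrated with a short sketch. It assumes that a distance function `dist` between two documents is available (for example, the path length between the clusters of their counterparts in the other hierarchy), that each pair is counted once, and that a cluster with fewer than two documents contributes 0. Those choices and the function names are assumptions, not details taken from the slides.

```python
from itertools import combinations


def mean_pair_path_length(docs, dist):
    """P_k: mean of dist over all unordered pairs of documents in a cluster."""
    pairs = list(combinations(docs, 2))
    if not pairs:
        return 0.0  # assumption: singleton or empty clusters contribute 0
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


def hierarchy_quality(clusters, dist):
    """Average of P_k over all clusters in the hierarchy."""
    values = [mean_pair_path_length(docs, dist) for docs in clusters]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Toy stand-in for path length: documents are positions on a line.
    def toy_dist(a, b):
        return abs(a - b)

    print(mean_pair_path_length([0, 1, 3], toy_dist))
    print(hierarchy_quality([[0, 1, 3], [5, 9]], toy_dist))
```

Lower values mean that documents clustered together in one language have counterparts that also lie close together in the other hierarchy.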
We computed the average value of $\overline{P_k}$ over 100 trainings and obtained a value of 2.39.
Conclusions
- We proposed a text mining method to extract associations between multilingual texts and keywords.
- GHSOM performs well in clustering and organizing documents.
- The discovered associations seem plausible for MLIR and other MLTM applications.
Thanks for your attention.