

Neighborhood Matching Network for Entity Alignment

Yuting Wu1, Xiao Liu1, Yansong Feng1,2∗, Zheng Wang3 and Dongyan Zhao1,2

1 Wangxuan Institute of Computer Technology, Peking University, China
2 The MOE Key Laboratory of Computational Linguistics, Peking University, China
3 School of Computing, University of Leeds, U.K.

{wyting,lxlisa,fengyansong,zhaodongyan}@pku.edu.cn

[email protected]

Abstract

Structural heterogeneity between knowledge graphs is an outstanding challenge for entity alignment. This paper presents Neighborhood Matching Network (NMN), a novel entity alignment framework for tackling the structural heterogeneity challenge. NMN estimates the similarities between entities to capture both the topological structure and the neighborhood difference. It provides two innovative components for better learning representations for entity alignment. It first uses a novel graph sampling method to distill a discriminative neighborhood for each entity. It then adopts a cross-graph neighborhood matching module to jointly encode the neighborhood difference for a given entity pair. Such strategies allow NMN to effectively construct matching-oriented entity representations while ignoring noisy neighbors that have a negative impact on the alignment task. Extensive experiments performed on three entity alignment datasets show that NMN can better estimate the neighborhood similarity in tougher cases and significantly outperforms 12 previous state-of-the-art methods.

1 Introduction

By aligning entities from different knowledge graphs (KGs) to the same real-world identity, entity alignment is a powerful technique for knowledge integration. Unfortunately, entity alignment is non-trivial because real-life KGs are often incomplete and different KGs typically have heterogeneous schemas. Consequently, equivalent entities from two KGs could have distinct surface forms or dissimilar neighborhood structures.

In recent years, embedding-based methods have become the dominant approach for entity alignment (Zhu et al., 2017; Pei et al., 2019a; Cao et al., 2019; Xu et al., 2019; Li et al., 2019a; Sun et al., 2020). Such approaches have the advantage of not relying on manually constructed features or rules (Mahdisoltani et al., 2015). Using a set of seed alignments, an embedding-based method models the KG structures to automatically learn how to map the equivalent entities among different KGs into a unified vector space, where entity alignment can be performed by measuring the distance between the embeddings of two entities.

∗ Corresponding author.

Figure 1: Illustrative examples: two tough cases for entity alignment. Dashed rectangles denote the common neighbors between different KGs. (a) Case 1: equivalent entities with different sizes of neighborhoods (KG1: 布鲁克林区, 3 neighbors; KG2: Brooklyn, 21 neighbors; 2 common neighbors). (b) Case 2: most common neighbors are not discriminative for alignment (KG1: 利物浦, 20 neighbors; KG2: Liverpool, 22 neighbors; 3 common neighbors).

The vast majority of prior works in this direction build upon an important assumption: entities and their counterparts from other KGs have similar neighborhood structures, and therefore, similar embeddings will be generated for equivalent entities. Unfortunately, the assumption does not always hold in real-life scenarios due to the incompleteness and heterogeneities of KGs. As an example, consider Figure 1 (a), which shows two equivalent entities from the Chinese and English versions of Wikipedia. Here, both central entities refer to the same real-world identity, Brooklyn, a borough of New York City. However, the two entities have different sizes of neighborhoods and distinct topological structures. The problem of dissimilar neighborhoods between equivalent entities is ubiquitous. Sun et al. (2020) report that the majority of equivalent entity pairs have different neighbors in the benchmark datasets DBP15K, and the proportions of such entity pairs are over 86% (up to 90%) in different language versions of DBP15K. In particular, we find that the alignment accuracy of existing embedding-based methods decreases significantly as the gap between equivalent entities' neighborhood sizes increases. For instance, RDGCN (Wu et al., 2019a), a state-of-the-art method, delivers a Hits@1 score of 59% on entity pairs whose number of neighbors differs by no more than 10 on DBP15K_ZH-EN. However, its performance drops to 42% when the difference in the number of neighbors increases to 20, and to 35% when the difference exceeds 30. The disparity of neighborhood sizes and topological structures poses a significant challenge for entity alignment methods.

Even if we were able to set aside the difference in neighborhood size, we still face another issue. Since most of the common neighbors are popular entities, they will be neighbors of many other entities. As a result, it is still challenging to align such entities. To elaborate on this point, let us now consider Figure 1 (b). Here, the two central entities (both indicating the city Liverpool) have similar sizes of neighborhoods and three common neighbors. However, the three common neighbors (indicating United Kingdom, England and Labour Party (UK), respectively) are not discriminative enough. This is because there are many city entities in England which also have these three entities in their neighborhoods, e.g., the entity Birmingham. For such entity pairs, in addition to common neighbors, other informative neighbors, like those closely contextually related to the central entities, must be considered. Because existing embedding-based methods are unable to choose the right neighbors, we need a better approach.

We present Neighborhood Matching Network (NMN), a novel sampling-based entity alignment framework. NMN aims to capture the most informative neighbors and accurately estimate the similarities of neighborhoods between entities in different KGs. NMN achieves these goals by leveraging recent developments in Graph Neural Networks (GNNs). It first utilizes Graph Convolutional Networks (GCNs) (Kipf and Welling, 2017) to model the topological connection information, and then selectively samples each entity's neighborhood, aiming to retain the most informative neighbors for entity alignment. One of the key challenges here is how to accurately estimate the similarity of any two entities' sampled neighborhoods. NMN addresses this challenge by designing a discriminative neighbor matching module that jointly computes the neighbor differences between the sampled subgraph pairs through a cross-graph attention mechanism. Note that we mainly focus on the neighbor relevance in the neighborhood sampling and matching modules, while the neighbor connections are modeled by GCNs. We show that, by integrating the neighbor connection information and the neighbor relevance information, NMN can effectively align entities from real-world KGs with neighborhood heterogeneity.

We evaluate NMN by applying it to the benchmark datasets DBP15K (Sun et al., 2017) and DWY100K (Sun et al., 2018), and a sparse variant of DBP15K. Experimental results show that NMN achieves the best and most robust performance over the state of the art. This paper makes the following technical contributions. It is the first to:

• employ a new graph sampling strategy for identifying the most informative neighbors towards entity alignment (Sec. 3.3).

• exploit a cross-graph attention-based matching mechanism to jointly compare discriminative subgraphs of two entities for robust entity alignment (Sec. 3.4).

2 Related Work

Embedding-based entity alignment. In recent years, embedding-based methods have emerged as viable means for entity alignment. Early works in the area utilize TransE (Bordes et al., 2013) to embed KG structures, including MTransE (Chen et al., 2017), JAPE (Sun et al., 2017), IPTransE (Zhu et al., 2017), BootEA (Sun et al., 2018), NAEA (Zhu et al., 2019) and OTEA (Pei et al., 2019b). Some more recent studies use GNNs to model the structures of KGs, including GCN-Align (Wang et al., 2018), GMNN (Xu et al., 2019), RDGCN (Wu et al., 2019a), AVR-GCN (Ye et al., 2019), and HGCN-JE (Wu et al., 2019b). Besides the structural information, some recent methods like KDCoE (Chen et al., 2018), AttrE (Trisedya et al., 2019), MultiKE (Zhang et al., 2019) and HMAN (Yang et al., 2019) also utilize additional information like Wikipedia entity descriptions and attributes to improve entity representations.

However, all the aforementioned methods ignore the neighborhood heterogeneity of KGs. MuGNN (Cao et al., 2019) and AliNet (Sun et al., 2020) are the two most recent efforts to address this issue. While promising, both models still have drawbacks. MuGNN requires both pre-aligned entities and relations as training data, which can incur expensive overhead for training data labeling. AliNet considers all one-hop neighbors of an entity to be equally important when aggregating information. However, not all one-hop neighbors contribute positively to characterizing the target entity. Thus, considering all of them without careful selection can introduce noise and degrade performance. NMN avoids these pitfalls. With only a small set of pre-aligned entities as training data, NMN chooses the most informative neighbors for entity alignment.

Graph neural networks. GNNs have recently been employed for various NLP tasks like semantic role labeling (Marcheggiani and Titov, 2017) and machine translation (Bastings et al., 2017). GNNs learn node representations by recursively aggregating the representations of neighboring nodes. There is a range of GNN variants, including the Graph Convolutional Network (GCN) (Kipf and Welling, 2017), the Relational Graph Convolutional Network (Schlichtkrull et al., 2018), and the Graph Attention Network (Velickovic et al., 2018). Given their powerful capability for modeling graph structures, we also leverage GNNs to encode the structural information of KGs (Sec. 3.2).

Graph matching. The similarity of two graphs can be measured by exact matching (graph isomorphism) (Yan et al., 2004) or through structural information like the graph editing distance (Raymond et al., 2002). Most recently, the Graph Matching Network (GMN) (Li et al., 2019b) computes a similarity score between two graphs by jointly reasoning on the graph pair through cross-graph attention-based matching. Inspired by GMN, we design a cross-graph neighborhood matching module (Sec. 3.4) to capture the neighbor differences between two entities' neighborhoods.

Graph sampling. This technique samples a subset of vertices or edges from the original graph. Some of the popular sampling approaches include vertex-, edge- and traversal-based sampling (Hu and Lau, 2013). In our entity alignment framework, we propose a vertex sampling method to select informative neighbors and to construct a neighborhood subgraph for each entity.

3 Our Approach

Formally, we represent a KG as $G = (E, R, T)$, where $E$, $R$, $T$ denote the sets of entities, relations and triples respectively. Without loss of generality, we consider the task of entity alignment between two KGs, $G_1$ and $G_2$, based on a set of pre-aligned equivalent entities. The goal is to find pairs of equivalent entities between $G_1$ and $G_2$.
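To make this setup concrete, the sketch below shows one plain-Python way to represent the two KGs and the seed alignments. The type names and the toy triples are illustrative assumptions, not part of the NMN codebase.

```python
from typing import NamedTuple, Set, Tuple

class KG(NamedTuple):
    """A knowledge graph G = (E, R, T)."""
    entities: Set[str]                    # E
    relations: Set[str]                   # R
    triples: Set[Tuple[str, str, str]]    # T, as (head, relation, tail)

# Two KGs to be aligned and a small set of pre-aligned seed pairs.
g1 = KG({"布鲁克林区", "纽约"}, {"子行政区"}, {("纽约", "子行政区", "布鲁克林区")})
g2 = KG({"Brooklyn", "New York City"}, {"subdivision"},
        {("New York City", "subdivision", "Brooklyn")})
seeds = {("布鲁克林区", "Brooklyn")}  # goal: find the remaining equivalent pairs
```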

3.1 Overview of NMN

As highlighted in Sec. 1, the neighborhood heterogeneity and noisy common neighbors of real-world KGs make it difficult to capture useful information for entity alignment. To tackle these challenges, NMN first leverages GCNs to model the neighborhood topology information. Next, it employs neighborhood sampling to select the more informative neighbors. Then, it utilizes a cross-graph matching module to capture neighbor differences.

As depicted in Figure 2, NMN takes as input two KGs, $G_1$ and $G_2$, and produces embeddings for each candidate pair of entities, $e_1$ and $e_2$, so that entity alignment can be performed by measuring the distance, $d(e_1, e_2)$, of the learned embeddings. It follows a four-stage processing pipeline: (1) KG structure embedding, (2) neighborhood sampling, (3) neighborhood matching, and (4) neighborhood aggregation for generating embeddings.

3.2 KG Structure Embedding

To learn the KG structure embeddings, NMN utilizes multi-layered GCNs to aggregate higher-degree neighboring structural information for entities.

NMN uses pre-trained word embeddings to initialize the GCN. This strategy has been shown to be effective in encoding the semantic information of entity names in prior work (Xu et al., 2019; Wu et al., 2019a). Formally, let $G_1 = (E_1, R_1, T_1)$ and $G_2 = (E_2, R_2, T_2)$ be the two KGs to be aligned; we put $G_1$ and $G_2$ together as one big input graph to NMN. Each GCN layer takes a set of node features as input and updates the node representations as:

$$h_i^{(l)} = \mathrm{ReLU}\Big(\sum_{j \in N_i \cup \{i\}} \frac{1}{\varepsilon_i} W^{(l)} h_j^{(l-1)}\Big) \qquad (1)$$


Figure 2: Overall architecture and processing pipeline of Neighborhood Matching Network (NMN): GCN-based KG structure embedding over $G_1$ and $G_2$, neighborhood sampling, neighborhood matching, and neighborhood aggregation, producing the representations of $e_1$ and $e_2$ used to compute $d(e_1, e_2)$.

where $\{h_1^{(l)}, h_2^{(l)}, \ldots, h_n^{(l)} \mid h_i^{(l)} \in \mathbb{R}^{d^{(l)}}\}$ are the output node (entity) features of the $l$-th GCN layer, $\varepsilon_i$ is the normalization constant, $N_i$ is the set of neighbor indices of entity $i$, and $W^{(l)} \in \mathbb{R}^{d^{(l)} \times d^{(l-1)}}$ is a layer-specific trainable weight matrix.

To control the accumulated noise, we also introduce highway networks (Srivastava et al., 2015) to the GCN layers, which can effectively control the noise propagation across GCN layers (Rahimi et al., 2018; Wu et al., 2019b).
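As a minimal PyTorch sketch of one such layer, the module below combines Eq. 1 with a highway gate in the style of Srivastava et al. (2015). It assumes the normalization constants 1/ε_i are folded into a precomputed normalized adjacency matrix with self-loops; the class and parameter names are ours, not from the released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HighwayGCNLayer(nn.Module):
    """One GCN layer (Eq. 1) wrapped in a highway gate to limit noise propagation."""
    def __init__(self, dim: int):
        super().__init__()
        self.weight = nn.Linear(dim, dim, bias=False)  # W^(l) of Eq. 1
        self.gate = nn.Linear(dim, dim)                # highway transform gate

    def forward(self, h: torch.Tensor, adj_norm: torch.Tensor) -> torch.Tensor:
        # adj_norm: (n, n) normalized adjacency with self-loops, encoding 1/eps_i.
        gcn_out = F.relu(adj_norm @ self.weight(h))    # Eq. 1
        t = torch.sigmoid(self.gate(h))                # how much of the update to keep
        return t * gcn_out + (1.0 - t) * h             # highway mix of new and old features
```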

3.3 Neighborhood Sampling

The one-hop neighbors of an entity are key to determining whether the entity should be aligned with other entities. However, as we have discussed in Sec. 1, not all one-hop neighbors contribute positively to entity alignment. To choose the right neighbors, we apply a down-sampling process to select, from an entity's one-hop neighbors, those most informative towards the central target entity.

Recall that we use pre-trained word embeddings of entity names to initialize the input node features of the GCNs. As a result, the entity embeddings learned by the GCNs contain rich contextual information about both the neighboring structures and the entity semantics. NMN exploits such information to sample informative neighbors, i.e., neighbors that are more contextually related to the central entity are more likely to be sampled. Our key insight is that the more often a neighbor and the central (or target) entity appear in the same context, the more representative and informative the neighbor is towards the central entity. Since the contexts of two equivalent entities in real-world corpora are usually similar, the more strongly a neighbor is contextually related to the target entity, the more alignment clues the neighbor is likely to offer. Experimental results in Sec. 5.3 confirm this observation.

Formally, given an entity $e_i$, the probability of sampling its one-hop neighbor $e_{i_j}$ is determined by:

$$p(h_{i_j} \mid h_i) = \mathrm{softmax}(h_i W_s h_{i_j}^{T}) = \frac{\exp(h_i W_s h_{i_j}^{T})}{\sum_{k \in N_i} \exp(h_i W_s h_{i_k}^{T})} \qquad (2)$$

where $N_i$ is the one-hop neighbor index set of the central entity $e_i$, $h_i$ and $h_{i_j}$ are the learned embeddings of entities $e_i$ and $e_{i_j}$ respectively, and $W_s$ is a shared weight matrix.

By selectively sampling one-hop neighbors, NMN essentially constructs a discriminative neighborhood subgraph for each entity, which enables more accurate alignment through neighborhood matching.
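The snippet below sketches Eq. 2 and the down-sampling step in PyTorch. The function name and the use of torch.multinomial to draw k distinct neighbors are our assumptions about one reasonable realization, not the authors' exact implementation.

```python
import torch

def sample_neighbors(h_center: torch.Tensor, h_neighbors: torch.Tensor,
                     W_s: torch.Tensor, k: int):
    """Sample k one-hop neighbors with the probabilities of Eq. 2 (a sketch).

    h_center:    (d,)  GCN embedding of the central entity e_i
    h_neighbors: (n, d) GCN embeddings of its one-hop neighbors
    W_s:         (d, d) shared weight matrix
    """
    # Bilinear score h_i W_s h_{i_j}^T for every neighbor at once.
    logits = h_neighbors @ (W_s.t() @ h_center)
    probs = torch.softmax(logits, dim=0)          # Eq. 2
    k = min(k, h_neighbors.size(0))               # guard against small neighborhoods
    idx = torch.multinomial(probs, k, replacement=False)
    return idx, probs
```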

3.4 Neighborhood Matching

The neighborhood subgraph produced by the sampling process determines which neighbors of the target entity should be considered in the later stages. In other words, later stages of the NMN processing pipeline only operate on neighbors within the subgraph. In the neighborhood matching stage, we wish to find out, for each candidate entity in the counterpart KG, which neighbors of that entity are closely related to a neighboring node within the subgraph of the target entity. Such information is essential for deciding whether two entities (from two KGs) should be aligned.

As discussed in Sec. 3.3, equivalent entities tend to have similar contexts in real-world corpora; therefore, their neighborhoods sampled by NMN are more likely to be similar. NMN exploits this observation to estimate the similarities of the sampled neighborhoods.

Candidate selection. Intuitively, for an entity $e_i$ in $E_1$, we need to compare its sampled neighborhood subgraph with the subgraph of each candidate entity in $E_2$ to select an optimal alignment entity. Exhaustively trying all possible entities of $E_2$ would be prohibitively expensive for large real-world KGs. To reduce the matching overhead, NMN takes a low-cost approximate approach. To that end, NMN first samples an alignment candidate set $C_i = \{c_{i_1}, c_{i_2}, \ldots, c_{i_t} \mid c_{i_k} \in E_2\}$ for $e_i$ in $E_1$, and then calculates the subgraph similarities between $e_i$ and these candidates. This is based on the observation that the entities in $E_2$ which are closer to $e_i$ in the embedding space are more likely to be aligned with $e_i$. Thus, for an entity $e_j$ in $E_2$, the probability that it is sampled as a candidate for $e_i$ can be calculated as:

$$p(h_j \mid h_i) = \frac{\exp(-\lVert h_i - h_j \rVert_{L_1})}{\sum_{k \in E_2} \exp(-\lVert h_i - h_k \rVert_{L_1})} \qquad (3)$$
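A sketch of this candidate selection step is shown below. It assumes, as the surrounding text implies, that the softmax is taken over negative L1 distances so that entities closer to e_i in the embedding space receive higher probability; the function name and shapes are illustrative.

```python
import torch

def candidate_set(h_e1: torch.Tensor, H_e2: torch.Tensor, t: int) -> torch.Tensor:
    """Sample t alignment candidates from E2 for one entity of E1 (Eq. 3, a sketch).

    h_e1: (d,)   embedding of e_i in E1
    H_e2: (m, d) embeddings of all entities in E2
    """
    dist = torch.cdist(h_e1.unsqueeze(0), H_e2, p=1).squeeze(0)  # L1 distances, (m,)
    probs = torch.softmax(-dist, dim=0)   # closer entities are more likely candidates
    t = min(t, H_e2.size(0))
    return torch.multinomial(probs, t, replacement=False)
```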

Cross-graph neighborhood matching. Inspired by recent works in graph matching (Li et al., 2019b), our neighbor matching module takes a pair of subgraphs as input and computes a cross-graph matching vector for each neighbor, which measures how well this neighbor can be matched to any neighbor node in the counterpart. Formally, let $(e_i, c_{i_k})$ be an entity pair to be measured, where $e_i \in E_1$ and $c_{i_k} \in E_2$ is one of the candidates of $e_i$, and let $p$ and $q$ be two neighbors of $e_i$ and $c_{i_k}$, respectively. The cross-graph matching vector for neighbor $p$ can be computed as:

$$a_{pq} = \frac{\exp(h_p \cdot h_q)}{\sum_{q' \in N^s_{i_k}} \exp(h_p \cdot h_{q'})} \qquad (4)$$

$$m_p = \sum_{q \in N^s_{i_k}} a_{pq} (h_p - h_q) \qquad (5)$$

where $a_{pq}$ are the attention weights, $m_p$ is the matching vector for $p$, which measures the difference between $h_p$ and its closest neighbor in the other subgraph, $N^s_{i_k}$ is the sampled neighbor set of $c_{i_k}$, and $h_p$ and $h_q$ are the GCN-output embeddings for $p$ and $q$ respectively.

Then, we concatenate neighbor $p$'s GCN-output embedding with the weighted matching vector $m_p$:

$$h_p = [h_p \,\Vert\, \beta * m_p] \qquad (6)$$

For each target neighbor in a neighborhood subgraph, the attention mechanism in the matching module can accurately detect which of the neighbors in the subgraph of the other KG is most likely to match the target neighbor. Intuitively, the matching vector $m_p$ captures the difference between the two closest neighbors. When the representations of the two neighbors are similar, the matching vector tends to be a zero vector, so that their representations stay similar. When the neighbor representations differ, the matching vector is amplified through propagation. We find this matching strategy works well for our problem settings.
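The matching computation of Eqs. 4-6 reduces to a few matrix operations, sketched below in PyTorch. Because each attention row sums to one, Eq. 5 simplifies to h_p minus the attention-weighted average of the counterpart neighbors; the function name and shapes are our assumptions.

```python
import torch

def cross_graph_match(H_p: torch.Tensor, H_q: torch.Tensor, beta: float = 0.1):
    """Cross-graph neighborhood matching (Eqs. 4-6, a sketch).

    H_p: (np, d) sampled-neighbor embeddings of e_i
    H_q: (nq, d) sampled-neighbor embeddings of candidate c_ik
    Returns (np, 2d): each neighbor embedding concatenated with beta * m_p.
    """
    attn = torch.softmax(H_p @ H_q.t(), dim=1)   # a_pq, Eq. 4
    # Eq. 5: sum_q a_pq (h_p - h_q) = h_p - attn @ H_q, since rows of attn sum to 1.
    m = H_p - attn @ H_q
    return torch.cat([H_p, beta * m], dim=1)     # Eq. 6
```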

3.5 Neighborhood Aggregation

In the neighborhood aggregation stage, we combine the neighborhood connection information (learned at the KG structure embedding stage) with the output of the matching stage (Sec. 3.4) to generate the final embeddings used for alignment.

Specifically, for entity $e_i$, we first aggregate its sampled neighbor representations $\{h_p\}$. Inspired by the aggregation method in (Li et al., 2016), we compute a neighborhood representation for $e_i$ as:

$$g_i = \Big(\sum_{p \in N^s_i} \sigma(h_p W_{gate}) \cdot h_p\Big) W_N \qquad (7)$$

Then, we concatenate the central entity $e_i$'s GCN-output representation $h_i$ with its neighborhood representation to construct the matching-oriented representation for $e_i$:

$$h^{match}_i = [g_i \,\Vert\, h_i] \qquad (8)$$
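A minimal sketch of Eqs. 7-8 is given below, with W_gate and W_N as trainable matrices; the module name and dimensions are illustrative assumptions rather than the released implementation.

```python
import torch
import torch.nn as nn

class NeighborhoodAggregator(nn.Module):
    """Gated aggregation of matched neighbor vectors (Eqs. 7-8, a sketch)."""
    def __init__(self, match_dim: int, out_dim: int):
        super().__init__()
        self.gate = nn.Linear(match_dim, match_dim, bias=False)  # W_gate
        self.proj = nn.Linear(match_dim, out_dim, bias=False)    # W_N

    def forward(self, H_match: torch.Tensor, h_center: torch.Tensor) -> torch.Tensor:
        # H_match: (n, match_dim) matched neighbor vectors [h_p || beta * m_p]
        gated = torch.sigmoid(self.gate(H_match)) * H_match  # sigma(h_p W_gate) . h_p
        g = self.proj(gated.sum(dim=0))                      # Eq. 7: g_i
        return torch.cat([g, h_center], dim=0)               # Eq. 8: [g_i || h_i]
```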

3.6 Entity Alignment and Training

Pre-training. As discussed in Sec. 3.3, our neighborhood sampling is based on the GCN-output entity embeddings. Therefore, we first pre-train the GCN-based KG embedding model to produce quality entity representations. Specifically, we measure the distance between two entities to determine whether they should be aligned:

$$d(e_1, e_2) = \lVert h_{e_1} - h_{e_2} \rVert_{L_1} \qquad (9)$$

The objective of the pre-trained model is:

$$\mathcal{L} = \sum_{(i,j) \in \mathbb{L}} \sum_{(i',j') \in \mathbb{L}'} \max\{0,\, d(i,j) - d(i',j') + \gamma\} \qquad (10)$$

where $\gamma > 0$ is a margin hyper-parameter, $\mathbb{L}$ is our set of alignment seeds, and $\mathbb{L}'$ is the set of negative aligned entity pairs generated by nearest-neighbor sampling (Kotnis and Nastase, 2017).
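The sketch below implements this margin-based objective. For simplicity it pairs each seed-aligned pair with one sampled negative pair per batch row, rather than summing over the full cross product of Eq. 10; the batching and negative-sampling details are our assumptions.

```python
import torch

def margin_loss(h_pos_1, h_pos_2, h_neg_1, h_neg_2, gamma: float = 1.0):
    """Margin-based alignment loss (Eqs. 9-10, a sketch).

    h_pos_*: (b, d) embeddings of b seed-aligned pairs (i, j)
    h_neg_*: (b, d) embeddings of b negative pairs (i', j')
    """
    d_pos = (h_pos_1 - h_pos_2).abs().sum(dim=1)   # L1 distance d(i, j), Eq. 9
    d_neg = (h_neg_1 - h_neg_2).abs().sum(dim=1)   # d(i', j')
    return torch.clamp(d_pos - d_neg + gamma, min=0).sum()  # Eq. 10
```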

Figure 3: Distribution of the difference in the size of neighborhoods of aligned entity pairs on DBP15K_ZH-EN.

Overall training objective. The pre-training phase terminates once the entity alignment performance has converged. We find that, after this stage, the entity representations given by the GCN are sufficient to support the neighborhood sampling and matching modules. Hence, after the pre-training phase we replace the loss function of NMN with:

$$\mathcal{L} = \sum_{(r,t) \in \mathbb{L}} \sum_{(r',t') \in C} \max\{0,\, d(r,t) - d(r',t') + \gamma\} \qquad (11)$$

$$d(r, t) = \lVert h^{match}_r - h^{match}_t \rVert_{L_1} \qquad (12)$$

where the negative alignment set $C = \{(r', t') \mid (r' = r \wedge t' \in C_r) \vee (t' = t \wedge r' \in C_t)\}$ is made up of the alignment candidate sets of $r$ and $t$; $C_r$ and $C_t$ are generated in the candidate selection stage described in Sec. 3.4.

Note that our sampling process is non-differentiable, which prevents gradients from reaching the weight matrix $W_s$ in Eq. 2. To avoid this issue, when training $W_s$, instead of direct sampling, we aggregate all the neighbor information by a weighted summation:

$$g^w_i = \Big(\sum_{p \in N_i} \alpha_{ip} \cdot \sigma(h_p W_{gate}) \cdot h_p\Big) W_N \qquad (13)$$

where $\alpha_{ip}$ is the aggregation weight for neighbor $p$, given by the sampling probability $p(h_p \mid h_i)$ in Eq. 2. Since the aim of training $W_s$ is to make the learned neighborhood representations of aligned entities as similar as possible, the objective is:

$$\mathcal{L}_w = \sum_{(r,t) \in \mathbb{L}} \lVert g^w_r - g^w_t \rVert_{L_1} \qquad (14)$$
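Below is a sketch of this differentiable surrogate: the sampling probabilities of Eq. 2 become soft aggregation weights α_ip (Eq. 13), and Eq. 14 pulls the soft neighborhood representations of seed-aligned entities together. Function and parameter names are illustrative.

```python
import torch

def soft_neighborhood(h_center, H_nbr, W_s, W_gate, W_N):
    """Differentiable surrogate for sampling when training W_s (Eq. 13, a sketch).

    h_center: (d,); H_nbr: (n, d); W_s, W_gate: (d, d); W_N: (d, d_out).
    """
    alpha = torch.softmax(H_nbr @ (W_s.t() @ h_center), dim=0)  # alpha_ip from Eq. 2
    gated = torch.sigmoid(H_nbr @ W_gate) * H_nbr               # sigma(h_p W_gate) . h_p
    return (alpha.unsqueeze(1) * gated).sum(dim=0) @ W_N        # Eq. 13: g_i^w

def sampler_loss(g_w_src: torch.Tensor, g_w_tgt: torch.Tensor) -> torch.Tensor:
    """L1 objective pulling aligned soft neighborhoods together (Eq. 14)."""
    return (g_w_src - g_w_tgt).abs().sum()
```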

Overall, our model is trained end-to-end after pre-training. During training, we use Eq. 11 as the main objective function and, every 50 epochs, tune $W_s$ using Eq. 14 as the objective function.

4 Experimental Setup

Datasets. Following the common practice of recent works (Sun et al., 2018; Cao et al., 2019; Sun et al., 2020), we evaluate our model on the DBP15K (Sun et al., 2017) and DWY100K (Sun et al., 2018) datasets, using the same split as previous works: 30% for training and 70% for testing. To evaluate the performance of NMN in a more challenging setting, we also build a sparse dataset, S-DBP15K, based on DBP15K. Specifically, we randomly remove a certain proportion of triples in the non-English KG to increase the difference in neighborhood size for entities in different KGs. Table 1 gives the detailed statistics of DBP15K and S-DBP15K, and the information of DWY100K is shown in Table 2. Figure 3 shows the distribution of the difference in the size of the one-hop neighborhoods of aligned entity pairs. Our source code and datasets are freely available online at https://github.com/StephanieWyt/NMN.

Datasets        Ent.      Rel.    Tri.      Tri. remaining in S-DBP15K
ZH-EN   ZH      66,469    2,830   153,929   26%
        EN      98,125    2,317   237,674   100%
JA-EN   JA      65,744    2,043   164,373   41%
        EN      95,680    2,096   233,319   100%
FR-EN   FR      66,858    1,379   192,191   45%
        EN      105,889   2,209   278,590   100%

Table 1: Summary of DBP15K and S-DBP15K. The last column is the proportion of triples remaining in S-DBP15K.

Datasets            Ent.      Rel.   Tri.
DBP-WD   DBpedia    100,000   330    463,294
         Wikidata   100,000   220    448,774
DBP-YG   DBpedia    100,000   302    428,952
         YAGO3      100,000   31     502,563

Table 2: Summary of DWY100K.

Comparison models. We compare NMN against 12 recently proposed embedding-based alignment methods: MTransE (Chen et al., 2017), JAPE (Sun et al., 2017), IPTransE (Zhu et al., 2017), GCN-Align (Wang et al., 2018), BootEA (Sun et al., 2018), SEA (Pei et al., 2019a), RSN (Guo et al., 2019), MuGNN (Cao et al., 2019), KECG (Li et al., 2019a), AliNet (Sun et al., 2020), GMNN (Xu et al., 2019) and RDGCN (Wu et al., 2019a). The last two models also utilize entity names for alignment.

Model variants. To evaluate the different components of our model, we provide two variants of NMN: (1) NMN (w/o nbr-m), where we replace the neighborhood matching part by taking the average of the sampled neighbor representations as the neighborhood representation; and (2) NMN (w/o nbr-s), where we remove the sampling process and perform neighborhood matching on all one-hop neighbors.

Implementation details. The configuration we use for the DBP15K and DWY100K datasets is: β = 0.1, γ = 1.0, and we sample 5 neighbors for each entity in the neighborhood sampling stage (Sec. 3.3). For S-DBP15K, we set β to 1. We sample 3 neighbors for each entity in S-DBP15K_ZH-EN and S-DBP15K_JA-EN, and 10 neighbors in S-DBP15K_FR-EN. NMN uses a 2-layer GCN. The dimension of the hidden representations in the GCN layers described in Sec. 3.2 is 300, and the dimension of the neighborhood representation $g_i$ described in Sec. 3.5 is 50. The size of the candidate set in Sec. 3.4 is 20 for each entity. The learning rate is set to 0.001.

To initialize entity names for the DBP15K datasets, we first use Google Translate to translate all non-English entity names into English, and use the pre-trained English word vectors glove.840B.300d (http://nlp.stanford.edu/projects/glove/) to construct the initial node features of the KGs. For the DWY100K datasets, we directly use the pre-trained word vectors to initialize the nodes.

Metrics. Following convention, we use Hits@1 and Hits@10 as our evaluation metrics. A Hits@k score is computed by measuring the proportion of correctly aligned entities ranked in the top-k list. A higher Hits@k score indicates better performance.
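For instance, with the learned embeddings arranged so that row i of the source matrix is gold-aligned with row i of the target matrix, Hits@k can be computed as in the sketch below (an illustrative implementation, not taken from the released code).

```python
import torch

def hits_at_k(h_src: torch.Tensor, h_tgt: torch.Tensor, k: int) -> float:
    """Hits@k over aligned test pairs: row i of h_src aligns with row i of h_tgt."""
    dist = torch.cdist(h_src, h_tgt, p=1)            # (n, n) pairwise L1 distances
    ranks = dist.argsort(dim=1)                      # candidates sorted by distance
    gold = torch.arange(h_src.size(0)).unsqueeze(1)  # correct counterpart per row
    return (ranks[:, :k] == gold).any(dim=1).float().mean().item()
```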

5 Experimental Results

5.1 Performance on DBP15K and DWY100K

Table 3 reports the entity alignment performance of all approaches on the DBP15K and DWY100K datasets. It shows that the full implementation of NMN significantly outperforms all alternative approaches.

Structure-based methods. The top part of the table shows the performance of the state-of-the-art structure-based models, which solely utilize structural information. Among them, BootEA delivers the best performance, as it benefits from more training instances through a bootstrapping process. By considering the structural heterogeneity, MuGNN and AliNet outperform most other structure-based counterparts, showing the importance of tackling structural heterogeneity.

Entity name initialization. The middle part of Table 3 gives the results of embedding-based models that use entity name information along with structural information. Using entity names to initialize node features, the GNN-based models GMNN and RDGCN show a clear improvement over structure-based models, suggesting that entity names provide useful clues for entity alignment. In particular, GMNN achieves the highest Hits@10 on the DWY100K datasets, which are the only monolingual datasets (in English) in our experiments. We also note that GMNN pre-screens a small candidate set for each entity based on entity name similarity, and only traverses this candidate set during testing and when calculating the Hits@k scores.

NMN vs. its variants. The bottom part of Table 3 shows the performance of NMN and its variants. Our full NMN implementation substantially outperforms all baselines across nearly all metrics and datasets, by accurately modeling entity neighborhoods through neighborhood sampling and matching and by using entity name information. Specifically, NMN achieves the best Hits@1 score on DBP15K_ZH-EN, with a gain of 2.5% over RDGCN and 5.4% over GMNN. Although RDGCN employs a dual relation graph to model complex relation information, it does not address the issue of neighborhood heterogeneity. While GMNN collects all one-hop neighbors to construct a topic entity graph for each entity, its strategy might introduce noise since not all one-hop neighbors are favorable for entity alignment.

Comparing NMN and NMN (w/o nbr-m), we observe around a 2.5% drop in Hits@1 and a 0.6% drop in Hits@10 on average after removing the neighborhood matching module. Specifically, the Hits@1 scores of NMN and NMN (w/o nbr-m) differ by 3.9% on DBP15K_FR-EN. These results confirm the effectiveness of our neighborhood matching module in identifying matching neighbors and estimating the neighborhood similarity.

Removing the neighbor sampling module from NMN, i.e., NMN (w/o nbr-s), leads to an average performance drop of 0.3% on Hits@1 and 1% on Hits@10 across all the datasets. This result shows the important role of our sampling module in filtering out irrelevant neighbors.

When removing either the neighborhood matching module (NMN (w/o nbr-m)) or the sampling module (NMN (w/o nbr-s)) from our main model, we see a substantially larger drop in both Hits@1 and Hits@10 on DBP15K than on DWY100K. One reason is that the heterogeneity problem in DBP15K is more severe than that in DWY100K: the average proportion of aligned entity pairs that have a different number of neighbors is 89% in DBP15K compared to 84% in DWY100K. These results show that our sampling and matching modules are particularly important when the neighborhood sizes of equivalent entities differ greatly, and especially when there may be few common neighbors in their neighborhoods.

Models                          DBP_ZH-EN     DBP_JA-EN     DBP_FR-EN     DBP-WD        DBP-YG
                                H@1   H@10    H@1   H@10    H@1   H@10    H@1   H@10    H@1   H@10
MTransE (Chen et al., 2017)     30.8  61.4    27.9  57.5    24.4  55.6    28.1  52.0    25.2  49.3
JAPE (Sun et al., 2017)         41.2  74.5    36.3  68.5    32.4  66.7    31.8  58.9    23.6  48.4
IPTransE (Zhu et al., 2017)     40.6  73.5    36.7  69.3    33.3  68.5    34.9  63.8    29.7  55.8
GCN-Align (Wang et al., 2018)   41.3  74.4    39.9  74.5    37.3  74.5    50.6  77.2    59.7  83.8
SEA (Pei et al., 2019a)         42.4  79.6    38.5  78.3    40.0  79.7    51.8  80.2    51.6  73.6
RSN (Guo et al., 2019)          50.8  74.5    50.7  73.7    51.6  76.8    60.7  79.3    68.9  87.8
KECG (Li et al., 2019a)         47.8  83.5    49.0  84.4    48.6  85.1    63.2  90.0    72.8  91.5
MuGNN (Cao et al., 2019)        49.4  84.4    50.1  85.7    49.5  87.0    61.6  89.7    74.1  93.7
AliNet (Sun et al., 2020)       53.9  82.6    54.9  83.1    55.2  85.2    69.0  90.8    78.6  94.3
BootEA (Sun et al., 2018)       62.9  84.8    62.2  85.4    65.3  87.4    74.8  89.8    76.1  89.4
GMNN (Xu et al., 2019)          67.9  78.5    74.0  87.2    89.4  95.2    93.0  99.6    94.4  99.8
RDGCN (Wu et al., 2019a)        70.8  84.6    76.7  89.5    88.6  95.7    97.9  99.1    94.7  97.3
NMN                             73.3  86.9    78.5  91.2    90.2  96.7    98.1  99.2    96.0  98.2
  w/o nbr-m                     71.1  86.7    75.4  90.4    86.3  95.8    96.0  98.4    95.0  97.8
  w/o nbr-s                     73.0  85.6    77.9  88.8    89.9  95.7    98.0  99.0    95.9  98.1

Table 3: Performance on DBP15K and DWY100K (Hits@1 and Hits@10, %).

Models       ZH-EN          JA-EN          FR-EN
             H@1   H@10     H@1   H@10     H@1   H@10
BootEA       12.2  27.5     27.8  52.6     32.7  53.2
GMNN         47.5  68.3     58.8  78.2     75.0  90.9
RDGCN        60.7  74.6     69.3  82.9     83.6  92.6
NMN          62.0  75.1     70.3  84.4     86.3  94.0
  w/o nbr-m  52.0  71.1     62.1  82.7     80.0  92.0
  w/o nbr-s  60.9  74.1     70.7  84.5     86.5  94.2

Table 4: Performance on S-DBP15K (Hits@1 and Hits@10, %).

5.2 Performance on S-DBP15K

On the sparser and more challenging S-DBP15K datasets, we compare our NMN model with the strongest structure-based model, BootEA, and with the GNN-based models GMNN and RDGCN, which also utilize the entity name initialization.

Baseline models. In Table 4, we observe that all models suffer a performance drop, with BootEA enduring the most significant one. With the support of entity names, GMNN and RDGCN achieve better performance than BootEA. These results show that when the alignment clues are sparse, structural information alone is not sufficient to support precise comparisons, and the entity name semantics are particularly useful for accurate alignment in such cases.

NMN. Our NMN outperforms all three baselines on all the sparse datasets, demonstrating the effectiveness and robustness of NMN. As discussed in Sec. 1, the performance of existing embedding-based methods decreases significantly as the gap between equivalent entities' neighborhood sizes increases. Specifically, on DBP15K_ZH-EN, our NMN outperforms RDGCN, the best-performing baseline, by a large margin, achieving 65%, 53% and 48% Hits@1 on the entity pairs whose number of neighbors differs by more than 10, 20 and 30, respectively.

Sampling and matching strategies. Comparing NMN and NMN (w/o nbr-m) on S-DBP15K, we see a larger average drop in Hits@1 than on DBP15K (8.2% vs. 3.1%). This result indicates that our neighborhood matching module plays a more important role on the sparser datasets. When the alignment clues are less obvious, our matching module can continuously amplify the neighborhood difference of an entity pair during the propagation process. In this way, the gap between the equivalent entity pair and the negative pairs becomes larger, leading to correct alignment.

Compared with NMN, removing the sampling module does hurt NMN in both Hits@1 and Hits@10 on S-DBP15K_ZH-EN. But, surprisingly, NMN (w/o nbr-s) delivers slightly better results than NMN on S-DBP15K_JA-EN and S-DBP15K_FR-EN. This may be because the average number of neighbors of entities in S-DBP15K is much smaller than that in the DBP15K datasets: when the number of neighbors is small, the effect of sampling becomes unstable. In addition, our sampling method is relatively simple, and when the alignment clues are very sparse, our strategy may not be robust enough. We will explore more adaptive sampling methods and scopes in the future.

5.3 Analysis

Impact of neighborhood sampling strategies. To explore the impact of neighborhood sampling strategies, we compare NMN with a variant that uses a random sampling strategy on the S-DBP15K datasets. Figure 4 illustrates the Hits@1 of NMN using our designed graph sampling method (Sec. 3.3) and of the random-sampling-based variant when sampling different numbers of neighbors. Our NMN consistently delivers better results than the variant, showing that our sampling strategy can effectively select more informative neighbors.

Figure 4: Comparison between our neighborhood sampling strategy and random sampling on S-DBP15K.

Figure 5: Visualization of attention weights in the neighborhood matching module for the example of Paramount Pictures. The green and blue words are two pairs of equivalent neighbors. The sampled neighbors include 维亚康姆 (Viacom), 拯救大兵瑞恩 (Saving Private Ryan), 联合国际影业 (United International Pictures), 雷神 (Thor) and 勇敢的心 (Braveheart) on the Chinese side, and Viacom, Saving Private Ryan, National Amusements, Hollywood and Star Trek Into Darkness on the English side.

Impact of neighborhood sampling size. From Figure 4, for S-DBP15K_ZH-EN, both models reach a performance plateau at a sampling size of 3, and using a bigger sampling size leads to performance degradation. For S-DBP15K_JA-EN and S-DBP15K_FR-EN, we observe that NMN performs similarly when sampling different numbers of neighbors. From Table 1, we can see that S-DBP15K_ZH-EN is more sparse than S-DBP15K_JA-EN and S-DBP15K_FR-EN, and all models deliver much lower performance on S-DBP15K_ZH-EN. Therefore, the neighbor quality of this dataset might be poor, and a larger sampling size will introduce more noise. On the other hand, the neighbors in the JA-EN and FR-EN datasets might be more informative; thus, NMN is not sensitive to the sampling size on these two datasets.

How does the neighborhood matching module work? In an attempt to understand how our neighborhood matching strategy helps alignment, we visualize the attention weights in the neighborhood matching module. Consider an equivalent entity pair in DBP15K_ZH-EN, both entities of which refer to the American film studio Paramount Pictures. From Figure 5, we can see that the five neighbors sampled by our sampling module for each central entity are very informative for aligning the two central entities, such as famous movies released by Paramount Pictures and the parent company and subsidiary of Paramount Pictures. This again demonstrates the effectiveness of our sampling strategy. Among the sampled neighbors, there are also two pairs of common neighbors (indicating Saving Private Ryan and Viacom). We observe that, for each pair of equivalent neighbors, one neighbor is particularly attended to by its counterpart (the corresponding square has a darker color). This example clearly demonstrates that our neighborhood matching module can estimate the neighborhood similarity well by accurately detecting similar neighbors.

6 Conclusion

We have presented NMN, a novel embedding-based framework for entity alignment. NMN tackles the ubiquitous neighborhood heterogeneity of KGs. We achieve this by using a new sampling-based approach to choose the most informative neighbors for each entity. As a departure from prior works, NMN simultaneously estimates the similarity of two entities by considering both the topological structure and the neighborhood similarity. We perform extensive experiments on real-world datasets and compare NMN against 12 recent embedding-based methods. Experimental results show that NMN achieves the best and most robust performance, consistently outperforming competitive methods across datasets and evaluation metrics.

Acknowledgments

This work is supported in part by the National Hi-Tech R&D Program of China (No. 2018YFB1005100), the NSFC under grant agreements 61672057, 61672058 and 61872294, and a UK Royal Society International Collaboration Grant. For any correspondence, please contact Yansong Feng.


References

Joost Bastings, Ivan Titov, Wilker Aziz, Diego Marcheggiani, and Khalil Sima'an. 2017. Graph convolutional encoders for syntax-aware neural machine translation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1957–1967, Copenhagen, Denmark. Association for Computational Linguistics.

Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems 26, pages 2787–2795. Curran Associates, Inc.

Yixin Cao, Zhiyuan Liu, Chengjiang Li, Zhiyuan Liu, Juanzi Li, and Tat-Seng Chua. 2019. Multi-channel graph neural network for entity alignment. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1452–1461, Florence, Italy. Association for Computational Linguistics.

Muhao Chen, Yingtao Tian, Kai-Wei Chang, Steven Skiena, and Carlo Zaniolo. 2018. Co-training embeddings of knowledge graphs and entity descriptions for cross-lingual entity alignment. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI 2018, July 13-19, 2018, Stockholm, Sweden, pages 3998–4004. ijcai.org.

Muhao Chen, Yingtao Tian, Mohan Yang, and Carlo Zaniolo. 2017. Multilingual knowledge graph embeddings for cross-lingual knowledge alignment. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI 2017, Melbourne, Australia, August 19-25, 2017, pages 1511–1517. ijcai.org.

Lingbing Guo, Zequn Sun, and Wei Hu. 2019. Learning to exploit long-term relational dependencies in knowledge graphs. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA, volume 97 of Proceedings of Machine Learning Research, pages 2505–2514. PMLR.

Pili Hu and Wing Cheong Lau. 2013. A survey and taxonomy of graph sampling. arXiv preprint arXiv:1308.5865.

Thomas N. Kipf and Max Welling. 2017. Semi-supervised classification with graph convolutional networks. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net.

Bhushan Kotnis and Vivi Nastase. 2017. Analysis of the impact of negative sampling on link prediction in knowledge graphs. arXiv preprint arXiv:1708.06816.

Chengjiang Li, Yixin Cao, Lei Hou, Jiaxin Shi, Juanzi Li, and Tat-Seng Chua. 2019a. Semi-supervised entity alignment via joint knowledge embedding model and cross-graph model. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 2723–2732, Hong Kong, China. Association for Computational Linguistics.

Yujia Li, Chenjie Gu, Thomas Dullien, Oriol Vinyals, and Pushmeet Kohli. 2019b. Graph matching networks for learning the similarity of graph structured objects. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA, volume 97 of Proceedings of Machine Learning Research, pages 3835–3845. PMLR.

Yujia Li, Richard Zemel, Marc Brockschmidt, and Daniel Tarlow. 2016. Gated graph sequence neural networks. In Proceedings of ICLR'16.

Farzaneh Mahdisoltani, Joanna Biega, and Fabian M. Suchanek. 2015. YAGO3: A knowledge base from multilingual wikipedias. In CIDR 2015, Seventh Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA, January 4-7, 2015, Online Proceedings. www.cidrdb.org.

Diego Marcheggiani and Ivan Titov. 2017. Encoding sentences with graph convolutional networks for semantic role labeling. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1506–1515, Copenhagen, Denmark. Association for Computational Linguistics.

Shichao Pei, Lu Yu, Robert Hoehndorf, and Xiangliang Zhang. 2019a. Semi-supervised entity alignment via knowledge graph embedding with awareness of degree difference. In The World Wide Web Conference, WWW 2019, San Francisco, CA, USA, May 13-17, 2019, pages 3130–3136. ACM.

Shichao Pei, Lu Yu, and Xiangliang Zhang. 2019b. Improving cross-lingual entity alignment via optimal transport. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019, Macao, China, August 10-16, 2019, pages 3231–3237. ijcai.org.

Afshin Rahimi, Trevor Cohn, and Timothy Baldwin. 2018. Semi-supervised user geolocation via graph convolutional networks. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2009–2019, Melbourne, Australia. Association for Computational Linguistics.

John W. Raymond, Eleanor J. Gardiner, and Peter Willett. 2002. RASCAL: calculation of graph similarity using maximum common edge subgraphs. Comput. J., 45(6):631–644.

Michael Sejr Schlichtkrull, Thomas N. Kipf, Peter Bloem, Rianne van den Berg, Ivan Titov, and Max Welling. 2018. Modeling relational data with graph convolutional networks. In The Semantic Web - 15th International Conference, ESWC 2018, Heraklion, Crete, Greece, June 3-7, 2018, Proceedings, volume 10843 of Lecture Notes in Computer Science, pages 593–607. Springer.

Rupesh Kumar Srivastava, Klaus Greff, and Jürgen Schmidhuber. 2015. Highway networks. arXiv preprint arXiv:1505.00387.

Zequn Sun, Wei Hu, and Chengkai Li. 2017. Cross-lingual entity alignment via joint attribute-preserving embedding. In The Semantic Web - ISWC 2017 - 16th International Semantic Web Conference, Vienna, Austria, October 21-25, 2017, Proceedings, Part I, volume 10587 of Lecture Notes in Computer Science, pages 628–644. Springer.

Zequn Sun, Wei Hu, Qingheng Zhang, and Yuzhong Qu. 2018. Bootstrapping entity alignment with knowledge graph embedding. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI 2018, July 13-19, 2018, Stockholm, Sweden, pages 4396–4402. ijcai.org.

Zequn Sun, Chengming Wang, Wei Hu, Muhao Chen, Jian Dai, Wei Zhang, and Yuzhong Qu. 2020. Knowledge graph alignment network with gated multi-hop neighborhood aggregation. In AAAI.

Bayu Distiawan Trisedya, Jianzhong Qi, and Rui Zhang. 2019. Entity alignment between knowledge graphs using attribute embeddings. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 297–304.

Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. 2018. Graph attention networks. In ICLR.

Zhichun Wang, Qingsong Lv, Xiaohan Lan, and Yu Zhang. 2018. Cross-lingual knowledge graph alignment via graph convolutional networks. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 349–357, Brussels, Belgium. Association for Computational Linguistics.

Yuting Wu, Xiao Liu, Yansong Feng, Zheng Wang, Rui Yan, and Dongyan Zhao. 2019a. Relation-aware entity alignment for heterogeneous knowledge graphs. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019, Macao, China, August 10-16, 2019, pages 5278–5284. ijcai.org.

Yuting Wu, Xiao Liu, Yansong Feng, Zheng Wang, and Dongyan Zhao. 2019b. Jointly learning entity and relation representations for entity alignment. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 240–249, Hong Kong, China. Association for Computational Linguistics.

Kun Xu, Liwei Wang, Mo Yu, Yansong Feng, Yan Song, Zhiguo Wang, and Dong Yu. 2019. Cross-lingual knowledge graph alignment via graph matching neural network. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3156–3161, Florence, Italy. Association for Computational Linguistics.

Xifeng Yan, Philip S. Yu, and Jiawei Han. 2004. Graph indexing: A frequent structure-based approach. In Proceedings of the ACM SIGMOD International Conference on Management of Data, Paris, France, June 13-18, 2004, pages 335–346. ACM.

Hsiu-Wei Yang, Yanyan Zou, Peng Shi, Wei Lu, Jimmy Lin, and Xu Sun. 2019. Aligning cross-lingual entities with multi-aspect information. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 4431–4441, Hong Kong, China. Association for Computational Linguistics.

Rui Ye, Xin Li, Yujie Fang, Hongyu Zang, and Mingzhong Wang. 2019. A vectorized relational graph convolutional network for multi-relational network alignment. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019, Macao, China, August 10-16, 2019, pages 4135–4141. ijcai.org.

Qingheng Zhang, Zequn Sun, Wei Hu, Muhao Chen, Lingbing Guo, and Yuzhong Qu. 2019. Multi-view knowledge graph embedding for entity alignment. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019, Macao, China, August 10-16, 2019, pages 5429–5435. ijcai.org.

Hao Zhu, Ruobing Xie, Zhiyuan Liu, and Maosong Sun. 2017. Iterative entity alignment via joint knowledge embeddings. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI 2017, Melbourne, Australia, August 19-25, 2017, pages 4258–4264. ijcai.org.

Qiannan Zhu, Xiaofei Zhou, Jia Wu, Jianlong Tan, and Li Guo. 2019. Neighborhood-aware attentional representation for multilingual knowledge graphs. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019, Macao, China, August 10-16, 2019, pages 1943–1949. ijcai.org.