
Cultural and Geographical Influences on Image Translatability of Words across Languages

Nikzad Khani
Boston University
[email protected]

Isidora Chara Tourni
Boston University
[email protected]

Mohammad Sadegh Rasooli
University of Pennsylvania
[email protected]

Chris Callison-Burch
University of Pennsylvania
[email protected]

Derry Tanti Wijaya
Boston University
[email protected]

Abstract

Neural Machine Translation (NMT) models have been observed to produce poor translations when there are few or no parallel sentences to train the models. In the absence of parallel data, several approaches have turned to the use of images to learn translations. Since images of words, e.g., horse, may be unchanged across languages, translations can be identified via images associated with words in different languages that have a high degree of visual similarity. However, translating via images has been shown to improve upon text-only models only marginally. To better understand when images are useful for translation, we study image translatability of words, which we define as the translatability of words via images, by measuring intra- and inter-cluster similarities of image representations of words that are translations of each other. We find that images of words are not always invariant across languages, and that language pairs with shared culture, meaning having either a common language family, ethnicity, or religion, have improved image translatability (i.e., have more similar images for similar words) compared to pairs without such shared culture, regardless of their geographic proximity. In addition, in line with previous works that show images help more in translating concrete words, we find that concrete words have improved image translatability compared to abstract ones.

1 Introduction

Neural machine translation (NMT) for low-resource languages has drawn a lot of attention due to the increasing awareness of the lack of linguistic and geographic diversity in NLP research (Joshi et al., 2020; Orife et al.). Since parallel data for these languages is scarce, other kinds of data are needed to help translation, e.g., monolingual texts in unsupervised MT (Lample et al., 2018b,a,c; Artetxe et al., 2018) or images in multimodal MT (Barrault et al., 2018).

Previous works on using images for translation typically accept that images are useful due to their language invariance (Rotman et al., 2018). Since everyday words such as chair denote concepts that exist independently of any language, images that ground their meanings should also be invariant to the language. However, to the best of our knowledge, this conjecture on image-language invariance has never been tested. As images' usefulness for translation has only been shown to be marginal (Specia et al., 2016; Elliott et al., 2017; Barrault et al., 2018), it is important to study this conjecture in relation to the characteristics of languages, to understand when and to what extent images can aid translation. An alternative view would be that images may differ to some extent across languages, since they reflect the ways different people interact with these concepts; this may depend on where they live and the communities they live in (Evans and Levinson, 2009). For example, images of the word breakfast in different languages may reflect the different cuisines of the communities that speak those languages.

While most multimodal MT datasets are limited to a small set of European languages that come from the same language family, and are spoken by communities that are culturally and geographically close, the Massively Multilingual Image Dataset (MMID) (Hewitt et al., 2018) is constructed specifically to facilitate large-scale multilingual research in translating words via images.

MMID consists of up to 10K words and 100 images per word in 98 languages. This dataset provides an opportunity for us to examine how geographical and cultural relatedness between languages affects the translation of words via images. As the use of parallel data from related languages has been found to improve MT for low-resource languages (Zoph et al., 2016; Nguyen and Chiang, 2017; Dabre et al., 2017), we want to study whether the same extends to translation via images.


Specifically, we want to explore whether the translatability of words between two languages via images is influenced by the cultural similarity and geographical proximity of their communities. A recent study (Thompson et al., 2020) observed such correlations of culture and geography with the semantic alignment of word meanings between languages, measured through similarities in word embeddings. We hypothesize that the same is true for images, and that the alignment of meanings conveyed via images coincides with culture and geography.

In this work, we primarily define culture as the set of "Language, Norms and Beliefs" of a community (Heather Griffiths, 2015). These elements form our interpretation of cultural closeness between languages, which consists of their common linguistic, ethnic, and religious properties. Our goal is to intrinsically evaluate to what extent images can aid in word translation, for words in languages close in a variety of these characteristics. Assuming that each word is associated with a number of images that convey its meaning, we measure the degree to which images of words that are translations of each other in different languages have similar representations (and thus will help in translation). We call this measure image translatability: the capacity of word meaning to be transferred from one language to another via images. If images are indeed language invariant, we should observe similar image translatability across different language pairs.

We identify how close word translations are in terms of their image representations (embeddings). Our findings suggest that cultural similarity between languages (defined as a combination of the linguistic, ethnic, or religious similarity of the communities at the cultural centers of the languages, per Glottolog (Hammarström et al., 2020)) coincides with their translatability via images, and that the translatability of languages with cultural similarity exceeds that of languages with geographical proximity.

Our paper is structured as follows: In Section 2 we discuss previous research on image-aided word translation, and how the roots, geography, and cultural characteristics of languages correlate with the semantic alignment of words. In Section 3 we describe our dataset and text-image corpora; we also introduce the language pairs we examine and estimate their closeness in culture and geography. In Section 4 we present our approach for measuring the translatability of words in terms of the similarity of their image representations. Section 5 presents an analysis of our results and of how the translatability of words via images, which affects the images' fitness for translation, correlates with language properties. In Section 6 we discuss noteworthy examples that illustrate our findings, before concluding in Section 7.

2 Related Work

2.1 Translating Words via Images

This paper extends the work of Hewitt et al. (2018), which introduces a multilingual dataset of words in different languages, along with matching images, for word translation. Our goal, however, is not to improve on the state-of-the-art methods in word translation using images, but to understand the specific characteristics of languages that influence the quality of translation via images. Hewitt et al. (2018) are the first to create a large-scale multilingual words-and-images dataset without a specific part-of-speech focus, also proposing a novel method to rate the concreteness of a word to be used in translation. Concreteness (Paivio et al., 1968) identifies how tangible a concept is and how readily mental images arise in correspondence to the word. Due to their strong visual representation, concrete words are easier to represent using images. Indeed, the measure of a word's concreteness has been observed to predict the effectiveness of its images in translating the word (Kiela et al., 2015). A concept synonymous with concreteness is imageability (Kastner et al., 2020).

In terms of word translation, there exists a significant body of work in the area of bilingual lexicon induction, the task of translating words across languages without any parallel data (Fung and Yee, 1998; Rapp, 1999). Approaches can be divided into two types: text-based, which aim to find word translations by employing the words' linguistic information, and vision-based, which use the words' images as pivots for translation (Bergsma and Van Durme, 2011; Kiela et al., 2015). Additionally, some works have incorporated additional signals for translation, such as Wikipedia interlingual links (Wijaya et al., 2017).

The core idea in a large number of vision-based methods is using images to learn word and image embeddings that integrate all linguistic and visual information available to improve word translation (Calixto et al., 2017; Gella et al., 2017; Karpathy and Fei-Fei, 2017; Vulic et al., 2016). Recent research in this area extends prior ideas in learning multilingual word-image embeddings, extracting


more complex and useful information from images, and applying the methods in few-shot scenarios. Singhal et al. (2019) learn multilingual and multimodal word embeddings from weakly-supervised image-text data with a simple bag-of-words-based embedding model that incorporates word context and image information. Similarly, Chen et al. (2019) suggest mapping linguistic features based on sentence context, together with localized image features from image regions, into a joint space. Aside from translation, multilingual text representations aligned to images have also been used to boost performance in vision-language tasks such as multilingual image-sentence retrieval (Gella et al., 2017; Wehrmann et al., 2019; Kim et al., 2020; Burns et al., 2020).

2.2 Language Characteristics in Translation

The claim that the concepts of language, culture, and their geographical affiliations are interdependent, constantly and dynamically evolving and defining each other, has been widely discussed and is well established in the literature. Culture is considered an indistinguishable part of languages when translating from one language to another. The importance of the cultural literacy of the translator and his/her awareness of cultural factors, views, and tradition, apart from word meaning, for producing high-quality translations is indisputable (Nida, 1945; Wakabayashi, 1991; Janfaza et al., 2012).

Despite the importance of language, culture, and geography in translation, and findings that parallel data from similar, higher-resource languages can help improve MT of low-resource languages (Kocmi and Bojar, 2018), no previous work has studied how language similarity may influence translation via images. The most notable recent work in this area, and the one most similar to ours, is that of Thompson et al. (2020). The authors predict the semantic similarity of words in 41 languages from the NEL dataset (Dellert et al., 2020) and examine the relationships between word semantic similarity (measured via word embeddings) and the cultural, historical, and geographical aspects of the communities speaking the languages. Their finding, that the role of cultural similarities in this prediction is superior to that of geographical ones, aligns with ours. However, their methods differ from ours in many aspects. They use word-only embeddings to measure the semantic alignment of words, and only a small, publicly available set of images (Duñabeitia

Language Pair   Geography   Language   Ethnicity   Religion
az tr                       ✓          ✓           ✓
az ru           ✓
ko zh           ✓
ko ja                                  ✓           ✓
zh ja                                  ✓           ✓
zh ko           ✓
ja zh                                  ✓           ✓
ja ko                                  ✓           ✓
ar ur                                              ✓
ar fa           ✓                                  ✓
ar he                       ✓          ✓
ur ar                                              ✓
ur hi           ✓
es fr           ✓           ✓                      ✓
es pt           ✓           ✓                      ✓
fi hu                       ✓          ✓
fi no           ✓                                  ✓
af nl                       ✓          ✓
af sw           ✓

Table 1: The 19 language pairs we explore in this work and the nature of their similarity: geographical, or cultural (same language family, ethnicity, or religion).

et al., 2018) for validation of the predicted scores, in a supervised manner, and for a small subset of six languages in the Indo-European family.

3 Data

3.1 The Massively Multilingual Image Dataset

The dataset we use is the Massively Multilingual Image Dataset (MMID) from Hewitt et al. (2018). It covers 98 languages, containing at most 10,000 words per language and 100 images per word. For each word, in any language, we are given the collected images matching the word meaning and the word's English translation. They use a language filtering step to ensure that images for each language are collected only from web pages identified as containing text written in the language.

We choose to examine specific language pairs such that for each source language there are two or more target languages whose shared characteristics with the source language differ in zero or more aspects. The shared characteristics between the source and target language include shared culture (i.e., either they are from the same language family1 or the communities at their cultural centers have the same

1 en.wikipedia.org/wiki/List_of_language_families


major ethnic group2 or major religion3) or shared geography (i.e., the countries at their cultural centers share a land border). For example, for Finnish, we include two target languages: one that has geographical proximity (Norwegian) and another that has ethnolinguistic similarity (Hungarian). In this way, we intend to examine, for each source language, which of these groups of characteristics (culture or geography) is more important in image-aided word translation, and whether culture or geography dominates the other. We form language pairs from the following 20 languages: Afrikaans (af), Arabic (ar), Azerbaijani (az), Chinese (zh), Dutch (nl), Finnish (fi), French (fr), Hebrew (he), Hindi (hi), Hungarian (hu), Japanese (ja), Korean (ko), Norwegian (no), Persian (fa), Portuguese (pt), Russian (ru), Spanish (es), Swahili (sw), Turkish (tr), and Urdu (ur). We summarize the language pairs and their shared characteristics in Table 1.

3.2 Dataset collection and preparation

We download MMID images4 for all the source and target languages in our language pairs. To obtain vector embeddings for the images, we scale the images to 224 × 224 pixels, normalize them, and feed them as input into the ResNet-50 network (He et al., 2015), using network weights pre-trained on ImageNet. We obtain image embeddings from the last average pooling layer of ResNet-50, which gives us a 2048-dimensional vector embedding for each image. For each word, we call the embeddings of the associated images the word's image embedding. Because cosine similarity, which underlies parts of this work and previous works on bilingual lexicon induction via images (Bergsma et al., 2011; Kiela et al., 2015; Hewitt et al., 2018), is non-invariant to translation (Korenius et al., 2007), we treat all vectors with respect to the origin rather than some mean center for each image cluster5.
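The embedding extraction can be sketched as follows; this is a minimal sketch using torchvision's pre-trained ResNet-50, not the code released with this paper, and the helper name is ours:

```python
import torch
from torchvision import models, transforms
from PIL import Image

# Scale to 224 x 224 and apply the standard ImageNet normalization.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

resnet = models.resnet50(pretrained=True).eval()
# Keep everything up to and including the last average pooling layer,
# whose output is the 2048-dimensional embedding used in this work.
extractor = torch.nn.Sequential(*list(resnet.children())[:-1])

def image_embedding(path):
    """Return the 2048-d ResNet-50 embedding of one image file."""
    img = Image.open(path).convert("RGB")
    with torch.no_grad():
        return extractor(preprocess(img).unsqueeze(0)).squeeze()
```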

Since the MTurk word translations that come with MMID (Pavlick et al., 2014) are limited in coverage and quality (they contain only translations to English, and coverage and quality are high (≥70% accuracy) only for a small set of 13 European and Indian languages where many MTurk workers are located), we create translation dictionaries for each of our language pairs using Google Translate, translating all words in the source

2 en.wikipedia.org/wiki/Ethnic_group

3 pewforum.org/2015/04/02/religious-projection-table/2020/percent/all/

4 http://multilingual-images.org/

5 Our code is available at https://github.com/nikzadkhani/MMID-CNN-Analysis

language to the target language. We compute the translatability of words whose translations have associated images in MMID. If a word in the source language is translated to a phrase, we use the last word in the phrase to find associated images in the dataset. This heuristic applies to only 10% of the words in the dataset; in the majority (80%) of these phrase translations, the first word is a functional word, shared across translations and appearing more than 50 times in the dataset.
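A minimal sketch of this lookup heuristic (the function name is ours):

```python
def mmid_lookup_key(translation):
    """MMID lookup key for a Google-translated word: if the translation
    is a multi-word phrase, fall back to its last word."""
    words = translation.strip().split()
    return words[-1] if words else translation
```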

4 Methodology

Given images that are associated with the word w_s in the source language s, and with w_t in the target language t, we define two measures that determine how well a word can be translated by its images. The first measures whether w_s and w_t have overlapping or disjoint image embeddings. The second measures whether the spread of the image embeddings of w_s, and similarly of w_t, is tight or loose: such a measure of image dispersion has been found to help predict the usefulness of image representations for translation (Kiela et al., 2014, 2015).

Specifically, when images of w_s and w_t are tight and overlapping in the embedding space, it shows that the images have little diversity (low dispersion) and are similar between w_s and w_t, indicating potentially good translation between them. Conversely, if the images are either spread out or disjoint, it means that the images have greater diversity (high dispersion) or differ between w_s and w_t, indicating potentially poor translation between them. We refer to the degree of overlap between the two clusters of images associated with w_s and w_t, respectively, as their inter-cluster similarity, and to the degree of tightness or looseness of the images in each cluster as their intra-cluster similarity.

Our conjecture is that this is equivalent to representing image embeddings as samples from some generator distribution G. We call the generator distribution for a given source word G_s and the generator distribution for a given target word G_t. Two words are translations of each other when G_s = G_t, and conversely two words are poor translations when G_s ≠ G_t. Thus, inter-cluster similarity checks whether an image embedding from G_s could have been produced by G_t. Note that this is a necessary condition, but high inter-cluster similarity is not sufficient to say G_s = G_t, because an image embedding from G_s can also be produced by some random image embedding


generator G_r with G_s ≠ G_r. Intra-cluster similarity is a measure of how similar samples from a single generator are to each other. This ensures that G_t and G_s are not random generators and are accurate representations of the word whose image embeddings they generate. In other words, having high intra-cluster similarity implies that G_s ≠ G_r and G_t ≠ G_r. Thus, when we have both high inter- and intra-cluster similarity, we have sufficient conditions to say G_s = G_t.

4.1 Inter-Cluster Similarity

To measure the degree of overlap (inter-cluster similarity) between images associated with the word w_s in the source language and those associated with the word w_t in the target language, we first cluster their image embeddings with the k-means clustering algorithm (k = 2). Then, we measure the degree of overlap between images of the two words by the homogeneity score of the resulting clusters, h_{w_s,w_t} ∈ [0, 1] (Rosenberg and Hirschberg, 2007), calculated with the words w_s and w_t as image labels. A homogeneity score of 0 signals that all the image embeddings come from the distribution of a single class, hence represent the same word or concept (w_s = w_t). In this case, we say that the images of the two words have high inter-cluster similarity. A score of 1 means that k-means was able to identify two mutually exclusive clusters of images, indicating that the images come from two different generators (G_s ≠ G_t); in other words, the image embeddings were sampled from two different words or senses (w_s ≠ w_t).

However, if images are highly dispersed (have high diversity, loose clusters), then the inter-cluster similarity may be deceptively high (i.e., low homogeneity), since loose clusters may overlap to some extent. Thus, the homogeneity score is only an effective measure of how good an image-aided translation is on the condition that the clusters are sufficiently tight (i.e., have high intra-cluster similarity). In Section 4.5, we discuss how we compute this threshold for intra-cluster similarity.
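A minimal sketch of this inter-cluster measure with scikit-learn, assuming two arrays of per-word image embeddings (the function name is ours):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import homogeneity_score

def inter_cluster_homogeneity(emb_s, emb_t):
    """Cluster the pooled image embeddings of w_s and w_t with k-means
    (k = 2) and score the clusters against the word labels.
    0 = fully overlapping images (high inter-cluster similarity);
    1 = two cleanly separable clusters (disjoint image sets)."""
    X = np.vstack([emb_s, emb_t])
    labels_true = np.array([0] * len(emb_s) + [1] * len(emb_t))
    labels_pred = KMeans(n_clusters=2, n_init=10).fit_predict(X)
    return homogeneity_score(labels_true, labels_pred)
```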

4.2 Intra-Cluster Similarity

Images of a given word have low intra-cluster similarity when the images have high dispersion, which may be due to the word being abstract (e.g., words like concept, whose images might be very diverse) or having many different senses (e.g., words like bug, whose images might represent the different senses of the word). On the other hand, when the intra-cluster similarity is high, it indicates that there is a general consensus on the meaning of the word as represented by the images, which makes for an easier transfer of the word meaning via images (i.e., better image translatability).

The metric we choose for the intra-cluster similarity of a word w is the Median Max Cosine Similarity which, given the set I_w of images associated with the word, is:

$$\mathrm{MEDMAX}_{w} = \operatorname*{median}_{i \in I_w} \, \max_{j \in I_w} \begin{cases} \cos(i, j) & i \neq j \\ 0 & i = j \end{cases}$$

This is a variation of the Average Maximum Cosine Similarity of Bergsma and Van Durme (2011), using the median to reduce the effect of outliers. Additionally, note that the worst case for this metric is a word cluster consisting of 50 random pairs of image embeddings: it would receive a high intra-cluster similarity despite the randomness of the overall cluster. In our findings, however, this scenario is extremely unlikely. Since words have dominant senses, the use of the median mitigates the effect of outliers.
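A minimal numpy sketch of this metric (the function name is ours); note that the i = j case contributes 0, exactly as in the definition:

```python
import numpy as np

def medmax(embeddings):
    """Median Max Cosine Similarity of one word's image cluster.
    For each image, take its highest cosine similarity to any *other*
    image in the cluster, then take the median over images."""
    X = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = X @ X.T                # pairwise cosine similarities
    np.fill_diagonal(sim, 0.0)   # the i = j case contributes 0
    return float(np.median(sim.max(axis=1)))
```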

However, intra-cluster similarity on its own is not enough to indicate whether the word in the target language w_t is a good translation of the word in the source language w_s. For example, the word train may be represented with images of locomotives in one language and with images of people exercising in another, if its meaning differs across languages. Both words' images will have high intra-cluster similarity but low inter-cluster similarity, indicating poor translatability via images. Thus, intra-cluster similarity is only an effective measure of how good an image-aided translation is on the condition that the inter-cluster similarity is sufficiently high. In Section 4.4 we discuss how we compute this threshold for inter-cluster similarity, and in Section 4.6 how we combine intra- and inter-cluster similarity into image translatability.

4.3 Concreteness

To study the relationship between image translatability and concreteness of a word, we adopt a method similar to Hewitt et al. (2018) to train a model to predict word concreteness. We use the dataset provided by Brysbaert et al. (2014), consisting of 40,000 words that have been assigned concreteness scores by human judges, on a scale of 1 to 5, from abstract to concrete. We split the


Figure 1: Distribution of concreteness score predictions on the held-out validation set of 1,000 words from Brysbaert et al. (2014). The Spearman correlation coefficient between ground-truth and predicted concreteness scores is noted.

dataset into train and test sets, randomly picking 39,000 words for training. Similar to Hewitt et al. (2018), our concreteness prediction model is a two-layer perceptron with one 32-unit hidden layer and a ReLU activation function, trained with an L2 loss. For each word, the model input is the concatenation of the single-word embeddings obtained from the top four hidden layers of BERT (Devlin et al., 2019), a practice recommended as the best-performing feature-extraction method by the authors.
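For illustration, a minimal PyTorch sketch of such a model; it assumes BERT-base features (4 × 768 = 3072 input dimensions), which the text does not state explicitly:

```python
import torch
import torch.nn as nn

class ConcretenessModel(nn.Module):
    """Two-layer perceptron: one 32-unit hidden layer with ReLU,
    regressing a concreteness score on the 1-5 scale."""
    def __init__(self, in_dim=4 * 768, hidden=32):  # 4 BERT layers, concatenated
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden),
                                 nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, x):
        return self.net(x).squeeze(-1)

model = ConcretenessModel()
loss_fn = nn.MSELoss()  # the L2 loss used for training
```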

Figure 1 shows the results of our evaluation on the test set of 1,000 words, depicting the distributions for the different part-of-speech categories. We provide the Spearman correlation coefficient between the ground-truth and predicted concreteness scores, which shows the improved effectiveness of our BERT-embedding-based method compared to the Salle et al. (2016) embeddings employed by Hewitt et al. (2018). Using this trained model, we predict the concreteness score of each word in our dataset by first translating the word to English and lemmatizing it using spaCy.

Using the predicted concreteness scores, we divide the words in our dataset into concrete (having a predicted score > 3) and abstract (≤ 3).

4.4 Inter-Cluster Similarity Threshold

We define a homogeneity score threshold to determine whether two words w_s and w_t have sufficiently high overlap in their image embeddings to indicate a good translation. For each language pair, we compute this threshold h^thld_{s,t} by taking the average homogeneity score of clusters of images of 10 randomly chosen word pairs from the source language s and the target language t. We take the average of these scores because we want, first, to be able to compare the threshold with other homogeneity scores, and second, to capture the skew in negative thresholds as well. These pairs serve as negative examples of translation, and we expect their image embeddings to be disjoint. Hence, a word pair with a homogeneity score lower than this threshold has good overlap in its image embeddings (i.e., high inter-cluster similarity), which indicates a good translation.

4.5 Intra-Cluster Similarity Threshold

Similarly, we define an intra-cluster similarity threshold to determine whether an image cluster associated with a word w is sufficiently tight. Since intra-cluster similarity is computed for each word (and not each word pair), we compute this threshold MEDMAX^thld_l for each language l by constructing a negative example for the language, i.e., an image cluster with high dispersion. We create this negative example by taking five random words from the language and, for each word, a random sample of 20 images, building a cluster of 100 images (mimicking the typical image cluster size for a word in our dataset). We set the Median Max Cosine Similarity of this image cluster as the intra-cluster similarity threshold. A word with an intra-cluster similarity higher than this threshold has a tight image cluster: a consistent meaning as represented in its images' representations.
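A minimal sketch of this negative-example construction, reusing medmax from above and assuming a hypothetical mapping lang_embs from each word to its (n_images × 2048) embedding array:

```python
import numpy as np

def intra_threshold(lang_embs, n_words=5, per_word=20, rng=None):
    """MEDMAX threshold for one language: pool `per_word` random images
    from each of `n_words` random words into one deliberately dispersed
    100-image cluster, and score that cluster with medmax()."""
    rng = rng or np.random.default_rng()
    words = rng.choice(list(lang_embs), size=n_words, replace=False)
    cluster = np.vstack([
        lang_embs[w][rng.choice(len(lang_embs[w]), size=per_word,
                                replace=False)]
        for w in words
    ])
    return medmax(cluster)
```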

4.6 Normalized Score

We define a normalized score NORM_{w_s,w_t} that combines the intra- and inter-cluster similarity scores for a word w_s in the source language and its translation w_t in the target language.

Given the intra-cluster similarities of w_s and w_t (MEDMAX_{w_s} and MEDMAX_{w_t}); the maximum and minimum intra-cluster similarities for the source language (MEDMAX^max_s and MEDMAX^min_s) and for the target language (MEDMAX^max_t and MEDMAX^min_t); as well as the homogeneity score h_{w_s,w_t} of the word pair and the maximum and minimum homogeneity scores of words in the language pair (h^max_{s,t} and h^min_{s,t}), we compute the normalized score NORM_{w_s,w_t} as:

$$\mathrm{NORM}_{w_s,w_t} = \mathrm{NORM\,MEDMAX}_{w_s} + \mathrm{NORM\,MEDMAX}_{w_t} - \mathrm{NORM}\,h_{w_s,w_t}$$


where:

$$\mathrm{NORM\,MEDMAX}_{w_s} = \frac{\mathrm{MEDMAX}_{w_s} - \mathrm{MEDMAX}^{\min}_{s}}{2\left(\mathrm{MEDMAX}^{\max}_{s} - \mathrm{MEDMAX}^{\min}_{s}\right)}$$

$$\mathrm{NORM\,MEDMAX}_{w_t} = \frac{\mathrm{MEDMAX}_{w_t} - \mathrm{MEDMAX}^{\min}_{t}}{2\left(\mathrm{MEDMAX}^{\max}_{t} - \mathrm{MEDMAX}^{\min}_{t}\right)}$$

$$\mathrm{NORM}\,h_{w_s,w_t} = \frac{h_{w_s,w_t} - h^{\min}_{s,t}}{h^{\max}_{s,t} - h^{\min}_{s,t}}$$

For each language pair, we also define a threshold on this normalized score (NORM^thld_{s,t}) by substituting MEDMAX_{w_s}, MEDMAX_{w_t}, and h_{w_s,w_t} with MEDMAX^thld_s, MEDMAX^thld_t, and h^thld_{s,t}, respectively, in the equation above.
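Put together, the normalized score is a small arithmetic combination; a minimal sketch (argument names are ours):

```python
def norm_score(medmax_s, medmax_t, h,
               mm_min_s, mm_max_s, mm_min_t, mm_max_t, h_min, h_max):
    """Combine intra-cluster (MEDMAX) and inter-cluster (homogeneity)
    scores into NORM_{w_s,w_t}; higher is better, since low homogeneity
    (overlapping clusters) is good and enters with a minus sign."""
    n_s = (medmax_s - mm_min_s) / (2 * (mm_max_s - mm_min_s))
    n_t = (medmax_t - mm_min_t) / (2 * (mm_max_t - mm_min_t))
    n_h = (h - h_min) / (h_max - h_min)
    return n_s + n_t - n_h
```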

5 Results

In order to compare the image translatability of two language pairs with different characteristics, we compare the ratio of the number of word pairs that are good translations to the total number of word pairs in each language pair:

$$\mathrm{RATIO} = \frac{\#\text{ of word pairs that are good translations}}{\#\text{ of word pairs for the language pair}}$$

A word pair w_s and w_t has a good translation via images (good image translatability) if its homogeneity score h_{w_s,w_t} is lower than the homogeneity threshold h^thld_{s,t} and its Median Max Cosine Similarities, MEDMAX_{w_s} and MEDMAX_{w_t}, are higher than the thresholds MEDMAX^thld_s and MEDMAX^thld_t, respectively.
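A minimal sketch of this decision rule and the resulting ratio (names are ours):

```python
def is_good_translation(h, mm_s, mm_t, h_thld, mm_thld_s, mm_thld_t):
    """A pair translates well via images when its two image clusters
    overlap (homogeneity below the pair threshold) and each cluster
    is tight (MEDMAX above its language threshold)."""
    return h < h_thld and mm_s > mm_thld_s and mm_t > mm_thld_t

def translatability_ratio(pairs, h_thld, mm_thld_s, mm_thld_t):
    """`pairs` is an iterable of (h, medmax_s, medmax_t) triples for one
    language pair; returns the share that are good translations."""
    pairs = list(pairs)
    good = sum(is_good_translation(h, s, t, h_thld, mm_thld_s, mm_thld_t)
               for h, s, t in pairs)
    return good / len(pairs)
```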

The higher this ratio, the more translatable the language pair is via images. When we compare two language pairs that have the same source language but different target languages (with different shared characteristics with the source), we can distinguish how different characteristics, such as cultural similarity or geographical proximity, affect image translatability. In Table 2, we show the image translatability ratios of language pairs with the same source but different target languages side by side.

To understand the role of concreteness in translation via images, we also compute how many concrete words have good translations according to our image translatability measures. We consider a word pair w_s and w_t to be concrete if w_s has a concreteness score greater than 3; that is, we take the source word's concreteness as the pair's concreteness, respecting the directionality of translation. The ratio of concrete words in each language pair that have good translations is also shown in Table 2.

Language Pair   Number of Words     Ratio
                All     Concrete    All     Concrete
az tr *         4538    3470        0.31    0.37
az ru           5380    2953        0.17    0.22
ko zh           338     214         0.18    0.22
ko ja *         748     499         0.69    0.72
zh ja *         367     212         0.56    0.58
zh ko           310     197         0.36    0.44
ja zh           212     137         0.39    0.44
ja ko           741     488         0.67    0.70
ar ur           4916    3226        0.39    0.44
ar fa           448     318         0.50    0.55
ar he *         2887    1874        0.69    0.73
ur ar *         4243    2466        0.39    0.45
ur hi           4588    2817        0.12    0.15
es fr           6392    3506        0.45    0.58
es pt           7116    3920        0.40    0.53
fi hu *         5615    3190        0.29    0.40
fi no           5336    3033        0.17    0.26
af nl *         5436    3247        0.39    0.50
af sw           4553    2611        0.25    0.31

Table 2: Language pairs along with the numbers of word pairs and their image translatability ratios, for all and for concrete word pairs. We mark with * the pair that has the highest ratio among pairs with the same source language whose normalized scores are significantly different and whose ratio differences are high.

To test whether the difference in image translatability between language pairs that share the same source language (e.g., Finnish to Norwegian vs. Finnish to Hungarian) is statistically significant, we conduct a simple t-test between their normalized score distributions. The resulting p-values signal the difference between the distributions: low p-values (< 0.05) indicate statistical significance and high variation between the distributions, while higher values suggest low variation and large similarity between the language pairs (Table 3).
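A minimal sketch of this test with SciPy, on hypothetical score samples:

```python
from scipy.stats import ttest_ind

# Hypothetical NORM score samples for two language pairs sharing a
# source language (e.g., fi-hu vs. fi-no); real runs use all word pairs.
scores_pair_a = [0.41, 0.55, 0.38, 0.62, 0.47]
scores_pair_b = [0.21, 0.33, 0.18, 0.29, 0.25]

t_stat, p_value = ttest_ind(scores_pair_a, scores_pair_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")  # p < 0.05: significant
```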

From the t-test, we find that the difference in distributions of pairs that share the same source language is statistically significant in almost all cases (p-value < 0.05, Table 3), the exception being Japanese to Chinese vs. Japanese to Korean.

Of the other pairs, whose normalized score differences are statistically significant and whose differences in translatability ratios are high (marked in Table 2), we observe that the language pair with the higher image translatability ratio


Language Pair I   Language Pair II   p-value
az tr             az ru              2.26 × 10^-25
ko zh             ko ja              2.2 × 10^-4
zh ja             zh ko              10.6 × 10^-5
ja zh *           ja ko *            0.43
ar ur             ar fa              9.44 × 10^-29
ar fa             ar he              2.39 × 10^-13
ar he             ar ur              16.35 × 10^-5
es fr             es pt              4.3 × 10^-47
fi hu             fi no              2.66 × 10^-104
af nl             af sw              10^-100

Table 3: p-values of differences between normalized score distributions of language pairs that share the same source language. We mark with * the comparison with a high p-value, for which we cannot assume a significant difference in the normalized score distributions.

(i.e., Azerbaijani to Turkish, Korean to Japanese, Chinese to Japanese, Arabic to Hebrew, Urdu to Arabic, Finnish to Hungarian, and Afrikaans to Dutch) is always the pair that shares cultural similarity (i.e., either a similar language family, similar major ethnicity, or similar major religion), even when the two languages have little to no geographical proximity. For example, between Arabic and Hindi, Urdu's words are more translatable via images to Arabic (whose speakers share the same major religion as speakers of Urdu), despite Pakistan's geographic proximity to India. Similarly, Finnish words are more translatable via images to Hungarian (whose speakers belong to the same ethnolinguistic group as speakers of Finnish) than to Norwegian, despite Hungary not sharing any land border with Finland. In addition, there may be other language relatedness factors that result in better image translatability between languages, such as the similar writing systems of Chinese and Japanese, or the similar grammatical structure of Korean and Japanese, despite their different language families.

From Table 1 we can see that Spanish shares similar characteristics with both French and Portuguese. Similarly, Japanese shares similar attributes with both Chinese and Korean (matching ethnicity and religion, different geography and language family). In such cases, where the two language pairs do not differ in characteristics, we observe that the difference in their translatability ratios is either small (as for the Spanish to French and Spanish to Portuguese ratios in Table 2) or insignificant (as for the Japanese to Chinese and Japanese to Korean p-value in Table 3).

On the contrary, Korean to Japanese and Korean

Figure 2: (left) PCA plot of the image embeddings of the word universe in Afrikaans (heelal) and in Dutch (universum), showing their tightly overlapping image clusters; (center) images of heelal in Afrikaans; (right) images of universum in Dutch.

to Chinese pairs, and the Chinese to Korean and Chinese to Japanese pairs, have at least one difference in their attributes, accounting for the statistical significance of their results.

In addition, we observe that the concreteness of words largely affects the quality of translations, due to the low diversity in their image representations, which facilitates translation between words. On average, across language pairs, 62.4% of words with normalized scores above the threshold are concrete, while only 37.6% are abstract. At the same time, Table 2 shows that the translatability ratio is considerably higher for concrete word pairs than for all pairs. This supports our idea, and previous works, that concrete words are better represented visually and so are more likely to have good image-aided translations, compared to abstract ones.

6 Discussion

Our work has identified that language relatedness affects word translation via images. We observe that languages with cultural relatedness have better image translatability, suggesting that cultural relatedness should be taken into account when using images to aid translation. The image translatability measures we have defined can be used to identify a potentially good or poor translation, or to discover a cultural similarity or disconnect between words in two languages. A word pair that has high intra-cluster similarity and high inter-cluster similarity in its image representations indicates that the image clusters are tight and overlapping, signaling a good translation. For example, the word heelal in Afrikaans and the word universum in Dutch have tight and overlapping image clusters (i.e., a low homogeneity score), as can be seen in the PCA plot of their image embeddings and in their images (Figure 2).

On the other hand, when a word pair has tightbut disjoint image clusters, it can mean that their


Figure 3: (left) PCA plot of the image embeddings of the word dance in Afrikaans (dans) and in Swahili (kucheza), showing their tight but disjoint image clusters; (center) images of dans in Afrikaans; (right) images of kucheza in Swahili.

images express different meanings of the word. For example, for the word dance, dans in Afrikaans and kucheza in Swahili have tight but disjoint image clusters (i.e., a high homogeneity score), as can be seen in the PCA plot of their image embeddings and in the images (Figure 3): kucheza means both to play and to dance in Swahili.

We observe that Afrikaans, for example, has higher image translatability to Dutch, due to their cultural (ethnolinguistic) similarity, than to Swahili, despite the relative distances of their cultural centers: South Africa is geographically more distant from the Netherlands than from Tanzania, defined in Glottolog as the cultural center of the Swahili language. We observe higher visual similarity between words that are translations of each other in Afrikaans and Dutch than in Afrikaans and Swahili. For example, images of the word park in Afrikaans are more visually similar to images of the word park in Dutch than to images of the word park in Swahili (hifadhi) (Figures 4, 5). The images of park in Afrikaans and in Dutch show a Western-style park, while its images in Swahili show a wildlife reservation: a culturally different representation of the word park that is potentially influenced by how speakers of the different languages interact with the concept behind the word. Interestingly, such a connotation that is apparent in images may not be apparent in word embeddings, since hifadhi is used similarly to park in the texts of the language.

7 Conclusions

In this paper, we study when images may be useful for translating words between two languages, from the perspective of the languages' cultural and geographical relatedness. We observe that the translatability of words via images varies across language pairs, with pairs sharing cultural similarities having better image translatability.

Figure 4: Images of the word park in (left) Afrikaans, (center) Dutch, and (right) Swahili.

Figure 5: PCA plots of the image embeddings of (left) park in Afrikaans vs. park in Dutch and (right) park in Afrikaans vs. hifadhi in Swahili.

In the future, it will be interesting to study the image translatability of more language pairs and their characteristics, including those outside MMID, and to extend our work to sentence-level image-aided translation. It will also be of great value to study whether adding considerations of cultural relatedness to image-aided MT can further improve its performance. Additionally, a different metric for intra-cluster similarity that does not calculate similarity with respect to the origin may be more accurate, depending on the application. As many similarity functions aside from cosine similarity have been used in the computer vision literature, improving this function could be fruitful future work.

Acknowledgments

We would like to thank the anonymous reviewers for their thoughtful comments. This work is supported in part by U.S. NSF grant 1838193, DARPA HR001118S0044 (the LwLL program), and Department of the Air Force award FA8750-19-2-3334 (Semi-supervised Learning of Multimodal Representations). The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes. The views and conclusions contained in this publication are those of the authors and should not be interpreted as representing official policies or endorsements of DARPA, the Air Force, or the U.S. Government.


References

Mikel Artetxe, Gorka Labaka, Eneko Agirre, and Kyunghyun Cho. 2018. Unsupervised neural machine translation. In International Conference on Learning Representations.

Loïc Barrault, Fethi Bougares, Lucia Specia, Chiraag Lala, Desmond Elliott, and Stella Frank. 2018. Findings of the third shared task on multimodal machine translation. In Proceedings of the Third Conference on Machine Translation: Shared Task Papers, pages 304–323, Belgium, Brussels. Association for Computational Linguistics.

Shane Bergsma and Benjamin Van Durme. 2011. Learning bilingual lexicons using the visual similarity of labeled web images. In IJCAI Proceedings-International Joint Conference on Artificial Intelligence, volume 22, page 1764. Citeseer.

Shane Bergsma, David Yarowsky, and Kenneth Church. 2011. Using large monolingual and bilingual corpora to improve coordination disambiguation. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 1346–1355, Portland, Oregon, USA. Association for Computational Linguistics.

Marc Brysbaert, Amy Beth Warriner, and Victor Kuperman. 2014. Concreteness ratings for 40 thousand generally known English word lemmas. Behavior Research Methods, 46(3):904–911.

Andrea Burns, Donghyun Kim, Derry Wijaya, Kate Saenko, and Bryan A. Plummer. 2020. Learning to scale multilingual representations for vision-language tasks. In European Conference on Computer Vision, pages 197–213. Springer.

Iacer Calixto, Qun Liu, and Nick Campbell. 2017. Multilingual multi-modal embeddings for natural language processing. CoRR, abs/1702.01101.

Shizhe Chen, Qin Jin, and Alexander Hauptmann. 2019. Unsupervised bilingual lexicon induction from mono-lingual multimodal data. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 8207–8214.

Raj Dabre, Tetsuji Nakagawa, and Hideto Kazawa. 2017. An empirical study of language relatedness for transfer learning in neural machine translation. In Proceedings of the 31st Pacific Asia Conference on Language, Information and Computation, pages 282–286. The National University (Philippines).

Johannes Dellert, Thora Daneyko, Alla Münch, Alina Ladygina, Armin Buch, Natalie Clarius, Ilja Grigorjew, Mohamed Balabel, Hizniye Isabella Boga, Zalina Baysarova, et al. 2020. NorthEuraLex: A wide-coverage lexical database of Northern Eurasia. Language Resources and Evaluation, 54(1):273–301.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.

Jon Andoni Duñabeitia, Davide Crepaldi, Antje S. Meyer, Boris New, Christos Pliatsikas, Eva Smolka, and Marc Brysbaert. 2018. MultiPic: A standardized set of 750 drawings with norms for six European languages. Quarterly Journal of Experimental Psychology, 71(4):808–816.

Desmond Elliott, Stella Frank, Loïc Barrault, Fethi Bougares, and Lucia Specia. 2017. Findings of the second shared task on multimodal machine translation and multilingual image description. In Proceedings of the Second Conference on Machine Translation, pages 215–233, Copenhagen, Denmark. Association for Computational Linguistics.

Nicholas Evans and Stephen C. Levinson. 2009. The myth of language universals: Language diversity and its importance for cognitive science. Behavioral and Brain Sciences, 32(5):429–448.

Pascale Fung and Lo Yuen Yee. 1998. An IR approach for translating new words from nonparallel, comparable texts. In COLING 1998 Volume 1: The 17th International Conference on Computational Linguistics.

Spandana Gella, Rico Sennrich, Frank Keller, and Mirella Lapata. 2017. Image pivoting for learning multilingual multimodal representations. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2839–2845, Copenhagen, Denmark. Association for Computational Linguistics.

Harald Hammarström, Robert Forkel, Martin Haspelmath, and Sebastian Bank. 2020. Glottolog 4.3. Jena.

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Deep residual learning for image recognition.

Heather Griffiths and Nathan Keirns. 2015. Introduction to Sociology 2e, chapter 3: Culture (Elements of Culture). OpenStax, Oxford.

John Hewitt, Daphne Ippolito, Brendan Callahan, Reno Kriz, Derry Tanti Wijaya, and Chris Callison-Burch. 2018. Learning translations via images with a massively multilingual image dataset. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2566–2576, Melbourne, Australia. Association for Computational Linguistics.


Elenaz Janfaza, A. Assemi, and S. S. Dehghan. 2012. Language, translation, and culture. In International Conference on Language, Medias and Culture, volume 33, pages 83–87.

Pratik Joshi, Sebastin Santy, Amar Budhiraja, Kalika Bali, and Monojit Choudhury. 2020. The state and fate of linguistic diversity and inclusion in the NLP world. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 6282–6293, Online. Association for Computational Linguistics.

A. Karpathy and L. Fei-Fei. 2017. Deep visual-semantic alignments for generating image descriptions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(4):664–676.

Marc A. Kastner, Ichiro Ide, Frank Nack, Yasutomo Kawanishi, Takatsugu Hirayama, Daisuke Deguchi, and Hiroshi Murase. 2020. Estimating the imageability of words by mining visual characteristics from crawled image data. Multimedia Tools and Applications, pages 1–33.

Douwe Kiela, Felix Hill, Anna Korhonen, and Stephen Clark. 2014. Improving multi-modal representations using image dispersion: Why less is sometimes more. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 835–841, Baltimore, Maryland. Association for Computational Linguistics.

Douwe Kiela, Ivan Vulic, and Stephen Clark. 2015. Visual bilingual lexicon induction with transferred ConvNet features. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 148–158, Lisbon, Portugal. Association for Computational Linguistics.

Donghyun Kim, Kuniaki Saito, Kate Saenko, Stan Sclaroff, and Bryan Plummer. 2020. MULE: Multimodal universal language embedding. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 11254–11261.

Tom Kocmi and Ondrej Bojar. 2018. Trivial transfer learning for low-resource neural machine translation. In Proceedings of the Third Conference on Machine Translation: Research Papers, pages 244–252, Brussels, Belgium. Association for Computational Linguistics.

Tuomo Korenius, Jorma Laurikkala, and Martti Juhola. 2007. On principal component analysis, cosine and Euclidean measures in information retrieval. Information Sciences, 177(22):4893–4905.

Guillaume Lample, Alexis Conneau, Ludovic Denoyer, and Marc'Aurelio Ranzato. 2018a. Unsupervised machine translation using monolingual corpora only. In International Conference on Learning Representations.

Guillaume Lample, Alexis Conneau, Marc'Aurelio Ranzato, Ludovic Denoyer, and Hervé Jégou. 2018b. Word translation without parallel data. In International Conference on Learning Representations.

Guillaume Lample, Myle Ott, Alexis Conneau, Ludovic Denoyer, and Marc'Aurelio Ranzato. 2018c. Phrase-based & neural unsupervised machine translation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 5039–5049, Brussels, Belgium. Association for Computational Linguistics.

Toan Q. Nguyen and David Chiang. 2017. Transfer learning across low-resource, related languages for neural machine translation. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 296–301, Taipei, Taiwan. Asian Federation of Natural Language Processing.

Eugene Nida. 1945. Linguistics and ethnology in translation-problems. WORD, 1(2):194–208.

Iroro Orife, Julia Kreutzer, Blessing Sibanda, Daniel Whitenack, Kathleen Siminyu, Laura Martinus, Jamiil Toure Ali, Jade Abbott, Vukosi Marivate, Salomon Kabongo, et al. Masakhane: Machine translation for Africa. arXiv preprint arXiv:2003.11529.

Allan Paivio, John C. Yuille, and Stephen A. Madigan. 1968. Concreteness, imagery, and meaningfulness values for 925 nouns. Journal of Experimental Psychology, 76(1p2):1.

Ellie Pavlick, Matt Post, Ann Irvine, Dmitry Kachaev, and Chris Callison-Burch. 2014. The language demographics of Amazon Mechanical Turk. Transactions of the Association for Computational Linguistics, 2:79–92.

Reinhard Rapp. 1999. Automatic identification of word translations from unrelated English and German corpora. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, pages 519–526, College Park, Maryland, USA. Association for Computational Linguistics.

Andrew Rosenberg and Julia Hirschberg. 2007. V-measure: A conditional entropy-based external cluster evaluation measure. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 410–420, Prague, Czech Republic. Association for Computational Linguistics.

Guy Rotman, Ivan Vulic, and Roi Reichart. 2018. Bridging languages through images with deep partial canonical correlation analysis. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 910–921, Melbourne, Australia. Association for Computational Linguistics.


Alexandre Salle, Aline Villavicencio, and Marco Idiart. 2016. Matrix factorization using window sampling and negative sampling for improved word representations. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 419–424, Berlin, Germany. Association for Computational Linguistics.

Karan Singhal, Karthik Raman, and Balder ten Cate. 2019. Learning multilingual word embeddings using image-text data. In Proceedings of the Second Workshop on Shortcomings in Vision and Language, pages 68–77, Minneapolis, Minnesota. Association for Computational Linguistics.

Lucia Specia, Stella Frank, Khalil Sima'an, and Desmond Elliott. 2016. A shared task on multimodal machine translation and crosslingual image description. In Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers, pages 543–553, Berlin, Germany. Association for Computational Linguistics.

Bill Thompson, Seán G. Roberts, and Gary Lupyan. 2020. Cultural influences on word meanings revealed through large-scale semantic alignment. Nature Human Behaviour, pages 1–10.

Ivan Vulic, Douwe Kiela, Stephen Clark, and Marie-Francine Moens. 2016. Multi-modal representations for improved bilingual lexicon learning. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 188–194, Berlin, Germany. Association for Computational Linguistics.

Judy Wakabayashi. 1991. Translation between unrelated languages and cultures, as illustrated by Japanese-English translation. Meta: Journal des traducteurs / Meta: Translators' Journal, 36(2-3):414–423.

Jonatas Wehrmann, Douglas M. Souza, Mauricio A. Lopes, and Rodrigo C. Barros. 2019. Language-agnostic visual-semantic embeddings. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5804–5813.

Derry Tanti Wijaya, Brendan Callahan, John Hewitt, Jie Gao, Xiao Ling, Marianna Apidianaki, and Chris Callison-Burch. 2017. Learning translations via matrix completion. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1452–1463.

Barret Zoph, Deniz Yuret, Jonathan May, and Kevin Knight. 2016. Transfer learning for low-resource neural machine translation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 1568–1575, Austin, Texas. Association for Computational Linguistics.