

Including Signed Languages in Natural Language Processing

Kayo Yin¹, Amit Moryossef², Julie Hochgesang³, Yoav Goldberg²,⁴, Malihe Alikhani⁵

¹Language Technologies Institute, Carnegie Mellon University
²Bar-Ilan University
³Department of Linguistics, Gallaudet University
⁴Allen Institute for AI
⁵School of Computing and Information, University of Pittsburgh

[email protected], [email protected], [email protected], [email protected], [email protected]

Abstract

Signed languages are the primary means of communication for many deaf and hard of hearing individuals. Since signed languages exhibit all the fundamental linguistic properties of natural language, we believe that tools and theories of Natural Language Processing (NLP) are crucial to their modeling. However, existing research in Sign Language Processing (SLP) seldom attempts to explore and leverage the linguistic organization of signed languages. This position paper calls on the NLP community to include signed languages as a research area with high social and scientific impact. We first discuss the linguistic properties of signed languages to consider during their modeling. Then, we review the limitations of current SLP models and identify the open challenges to extend NLP to signed languages. Finally, we urge (1) the adoption of an efficient tokenization method; (2) the development of linguistically-informed models; (3) the collection of real-world signed language data; (4) the inclusion of local signed language communities as an active and leading voice in the direction of research.

1 Introduction

Natural Language Processing (NLP) has revolutionized the way people interact with technology through the rise of personal assistants and machine translation systems, to name a few. However, the vast majority of NLP models require a spoken language input (speech or text), thereby excluding around 200 different signed languages and up to 70 million deaf people¹ from modern language technologies.

¹According to the World Federation of the Deaf: https://wfdeaf.org/our-work/

Figure 1: Evolution of the number of publications referring to sign language in their titles from computer science venues and in the ACL Anthology. Publications in computer science are extracted from the Semantic Scholar archive (Ammar et al., 2018).

Throughout history, Deaf communities fought for the right to learn and use signed languages, as well as for the recognition of signed languages as legitimate languages (§2). Indeed, signed languages are sophisticated communication modalities that are at least as capable as spoken languages in all manners, linguistic and social. However, in a predominantly oral society, deaf people are constantly encouraged to use spoken languages through lip-reading or text-based communication. The exclusion of signed languages from modern language technologies further suppresses signing in favor of spoken languages. This disregards the preferences of the Deaf communities, who strongly prefer to communicate in signed languages both online and in day-to-day in-person interactions, among themselves and when interacting with spoken language communities (Padden and Humphries, 1988; Glickman and Hall, 2018). Thus, it is essential to make signed languages accessible.

To date, a large amount of research on Sign Language Processing (SLP) has focused on the visual aspect of signed languages, led by the Computer Vision (CV) community, with little NLP involvement (Figure 1). This is not unreasonable, given that a decade ago we lacked adequate CV tools to process videos for further linguistic analyses. However, like spoken languages, signed languages are fully-fledged systems that exhibit all the fundamental characteristics of natural languages (§3), and current SLP techniques fail to address or leverage the linguistic structure of signed languages (§4). This leads us to believe that NLP tools and theories are crucial to processing signed languages. Given the recent advances in CV, this position paper argues that now is the time to incorporate linguistic insight into signed language modeling.

Signed languages introduce novel challenges for NLP due to their visual-gestural modality, simultaneity, spatial coherence, and lack of written form. By working on signed languages, the community will gain a more holistic perspective on natural languages through a better understanding of how meaning is conveyed by the visual modality and how language is grounded in visuospatial concepts.

Moreover, SLP is not only an intellectually appealing area but also an important research area with a strong potential to benefit signing communities. Examples of beneficial applications enabled by signed language technologies include better documentation of endangered signed languages; educational tools for sign language learners; tools for query and retrieval of information from signed language videos; personal assistants that react to signed languages; real-time automatic sign language interpretation, and more. Needless to say, in addressing this research area, researchers should work alongside and under the direction of deaf communities, to the benefit of the signing communities' interests above all (Harris et al., 2009).

After identifying the challenges and open problems to successfully include signed languages in NLP (§5), we emphasize the need to: (1) develop a standardized tokenization method for signed languages with minimal information loss for their modeling; (2) extend core NLP technologies to signed languages to create linguistically-informed models; (3) collect signed language data of sufficient size that accurately represents the real world; (4) involve and collaborate with Deaf communities at every step of research.

2 Background and Related Work

2.1 History of Signed Languages and Deaf Culture

Over the course of modern history, spoken languages were so dominant that signed languages struggled to be recognized as languages in their own right, and educators developed misconceptions that signed language acquisition might hinder the development of speech skills. For example, in 1880, a large international conference of deaf educators called the Second International Congress on Education of the Deaf banned teaching signed languages, favoring speech therapy instead. It was not until the seminal work on American Sign Language (ASL) by Stokoe (1960) that signed languages started gaining recognition as natural, independent, and well-defined languages, which then inspired other researchers to further explore signed languages as a research area. Nevertheless, antiquated notions that deprioritized signed languages continue to do harm and subject many to linguistic neglect (Humphries et al., 2016). Several studies have shown that deaf children raised solely with spoken languages do not gain enough access to a first language during their critical period of language acquisition (Murray et al., 2020). This language deprivation can lead to life-long consequences on the cognitive, linguistic, socioemotional, and academic development of the deaf (Hall et al., 2017).

Signed languages are the primary languages of communication for the Deaf² and are at the heart of Deaf communities. Failing to recognize signed languages as fully-fledged natural language systems in their own right has had harmful effects in the past, and in an increasingly digitized world, the NLP community has an important responsibility to include signed languages in its research. NLP research should strive to enable a world in which all people, including the Deaf, have access to languages that fit their lived experience.

²When capitalized, "Deaf" refers to a community of deaf people who share a language and a culture, whereas the lowercase "deaf" refers to the audiological condition of not hearing.

2.2 Sign Language Processing in the Literature

Jaffe (1994); Ong and Ranganath (2005); Parton (2006) survey early works in SLP that were mostly limited to using sensors to capture fingerspelling and isolated signs, or using rules to synthesize signs from spoken language text, due to the lack of adequate CV technology at the time to process videos. This paper will instead focus on more recent vision-based and data-driven approaches that are non-intrusive and more powerful. The introduction of a continuous signed language benchmark dataset (Forster et al., 2014; Cihan Camgoz et al., 2018), coupled with the advent of deep learning for visual processing, led to increased efforts to recognize signed expressions from videos. Recent surveys on SLP mostly review these different approaches for sign language recognition developed by the CV community (Koller, 2020; Rastgoo et al., 2020; Adaloglou et al., 2020).

Meanwhile, signed languages have remained relatively overlooked in NLP literature (Figure 1). Bragg et al. (2019) argue for the importance of an interdisciplinary approach to SLP, raising the importance of NLP involvement among other disciplines. We take this argument further by diving into the linguistic modeling challenges for signed languages and providing a roadmap of open questions to be addressed by the NLP community, in hopes of stimulating efforts from an NLP perspective towards research on signed languages.

3 Sign Language Linguistics

Signed languages consist of phonological, morphological, syntactic, and semantic levels of structure that fulfill the same social, cognitive, and communicative purposes as other natural languages. While spoken languages primarily channel the oral-auditory modality, signed languages use the visual-gestural modality, relying on the face, hands, body of the signer, and the space around them to create distinctions in meaning. We present the linguistic features of signed languages³ that must be taken into account during their modeling.

Phonology Signs are composed of minimal units that combine manual features such as hand configuration, palm orientation, placement, contact, path movement, local movement, as well as non-manual features including eye aperture, head movement, and torso positioning⁴ (Liddell and Johnson, 1989; Johnson and Liddell, 2011; Brentari, 2011; Sandler, 2012). In both signed and spoken languages, not all possible phonemes are realized, and the inventories of two languages' phonemes/features may not overlap completely. Different languages are also subject to rules for the allowed combinations of features.

³We mainly refer to ASL, where most sign language research has been conducted, but not exclusively.

⁴In this work, we focus on visual signed languages rather than tactile systems such as Pro-Tactile ASL, which DeafBlind Americans sometimes prefer.

Simultaneity Though an ASL sign takes about twice as long to produce as an English word, the rates of transmission of information between the two languages are similar (Bellugi and Fischer, 1972). One way signed languages compensate for the slower production rate of signs is through simultaneity: signed languages make use of multiple visual cues to convey different information simultaneously (Sandler, 2012). For example, the signer may produce the sign for 'cup' on one hand while simultaneously pointing to the actual cup with the other to express "that cup". Similarly to tone in spoken languages, the face and torso can convey additional affective information (Liddell et al., 2003; Johnston and Schembri, 2007). Facial expressions can modify adjectives, adverbs, and verbs; a head shake can negate a phrase or sentence; eye direction can help indicate referents.

Referencing The signer can introduce referents in discourse either by pointing to their actual locations in space, or by assigning a region in the signing space to a non-present referent and pointing to this region to refer to it (Rathmann and Mathur, 2011; Schembri et al., 2018). Signers can also establish relations between referents grounded in signing space by using directional signs or embodying the referents using body shift or eye gaze (Dudis, 2004; Liddell and Metzger, 1998). Spatial referencing also impacts morphology when the directionality of a verb depends on the location of the reference to its subject and/or object (de Beuzeville, 2008; Fenlon et al., 2018): for example, a directional verb can move from the location of its subject to the location of its object. While the relation between referents and verbs in spoken language is more arbitrary, referent relations are usually grounded in signed languages. The visual space is heavily exploited to make referencing clear.

Another way anaphoric entities are referenced in sign language is by using classifiers or depicting signs (Supalla, 1986; Wilcox and Hafer, 2004; Roy, 2011) that help describe the characteristics of the referent. Classifiers are typically one-handed signs that do not have a particular location or movement assigned to them, or derive features from meaningful discourse (Liddell et al., 2003), so they can be used to convey how the referent relates to other entities, describe its movement, and give more details. For example, to tell about a car swerving and crashing, one might use the hand classifier for a vehicle, move it to indicate swerving, and crash it with another entity in space.

To quote someone other than oneself, signers perform role shift (Cormier et al., 2015), where they may physically shift in space to mark the distinction and take on some characteristics of the people they are representing. For example, to recount a dialogue between a taller and a shorter person, the signer may shift to one side and look up when taking the shorter person's role, and shift to the other side and look down when taking the taller person's role.

Fingerspelling Fingerspelling is a result of language contact between a signed language and the written form of a surrounding spoken language (Battison, 1978; Wilcox, 1992; Brentari and Padden, 2001; Patrie and Johnson, 2011). A set of manual gestures corresponds with a written orthography or phonetic system. Fingerspelling is often used to indicate names, places, or new concepts from the spoken language, but fingerspelled items have often become integrated into the signed languages themselves as another linguistic strategy (Padden, 1998; Montemurro and Brentari, 2018).

4 Current State of SLP

In this section, we present the existing methods, resources, and tasks in SLP, and discuss their limitations to lay the ground for future research.

4.1 Representations of Signed Languages

Representation is a significant challenge for SLP, as unlike spoken languages, signed languages have no widely adopted written form. Figure 2 illustrates each signed language representation we will describe below.

Videos are the most straightforward representation of a signed language and can amply incorporate the information conveyed through sign. One major drawback of using videos is their high dimensionality: they usually include more information than needed for modeling, and are expensive to store, transmit, and encode. As facial features are essential in sign, anonymizing raw videos also remains an open problem, limiting the possibility of making these videos publicly available (Isard, 2020).

Poses reduce the visual cues from videos to a skeleton-like wireframe or mesh representing the location of joints. While motion capture equipment can often provide better-quality pose estimation, it is expensive and intrusive, and estimating pose from videos is currently the preferred method (Pishchulin et al., 2012; Chen et al., 2017; Cao et al., 2019; Guler et al., 2018). Compared to video representations, accurate poses are lower in complexity and anonymized, while observing relatively low information loss. However, they remain a continuous, multidimensional representation that is not adapted to most NLP models.
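To make this concrete, below is a minimal sketch of extracting a pose stream from a signing video using the off-the-shelf MediaPipe and OpenCV libraries; the helper function is our own illustrative construction, not an API of either library, and keeping only body landmarks is a deliberate simplification.

```python
# A minimal sketch: turn a signing video into a (frames, landmarks, 3) array.
# Assumes the `mediapipe` and `opencv-python` packages; `video_to_pose` is an
# illustrative helper, not an API from either library.
import cv2
import mediapipe as mp
import numpy as np

def video_to_pose(path: str) -> np.ndarray:
    holistic = mp.solutions.holistic.Holistic(static_image_mode=False)
    cap = cv2.VideoCapture(path)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB input; OpenCV decodes frames as BGR.
        result = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if result.pose_landmarks is None:
            continue  # skip frames where no person is detected
        # For simplicity we keep only body landmarks; `result` also exposes
        # hand and face landmarks, which matter greatly for sign.
        frames.append([(lm.x, lm.y, lm.z)
                       for lm in result.pose_landmarks.landmark])
    cap.release()
    holistic.close()
    return np.array(frames)
```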

Written notation systems represent signs as discrete visual features. Some systems are written linearly and others use graphemes in two dimensions. While various universal (Sutton, 1990; Prillwitz and Zienert, 1990) and language-specific notation systems (Stokoe Jr, 2005; Kakumasu, 1968; Bergman, 1979) have been proposed, no writing system has been adopted widely by any sign language community, and the lack of a standard hinders the exchange and unification of resources and applications between projects. Figure 2 depicts two universal notation systems: SignWriting (Sutton, 1990), a two-dimensional pictographic system, and HamNoSys (Prillwitz and Zienert, 1990), a linear stream of graphemes that was designed to be readable by machines.

Glossing is the transcription of signed languages sign-by-sign, where every sign has a unique identifier. While various sign language corpus projects have provided gloss annotation guidelines (Mesch and Wallin, 2015; Johnston and De Beuzeville, 2016; Konrad et al., 2018), again, there is no single agreed-upon standard. Linear gloss annotations are also an imprecise representation of signed language: they do not adequately capture all information expressed simultaneously through different cues (i.e. body posture, eye gaze) or spatial relations, which leads to an inevitable information loss up to a semantic level that affects downstream performance on SLP tasks (Yin and Read, 2020b).
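As a toy illustration of this information loss, the hypothetical record below keeps simultaneous non-manual cues that a flat gloss string discards; the field names are invented for illustration and follow no existing corpus standard.

```python
# A hypothetical multi-channel annotation of a single sign; flattening it to
# the `gloss` field alone drops the simultaneous cues. Fields are illustrative.
from dataclasses import dataclass
from typing import Optional

@dataclass
class SignToken:
    gloss: str                              # lexical identifier, e.g. "CUP"
    nondominant_hand: Optional[str] = None  # a simultaneous second articulator
    eye_gaze: Optional[str] = None          # e.g. gaze toward a referent's locus
    head: Optional[str] = None              # e.g. "shake" negating the phrase

# "that cup": the lexical sign co-occurs with pointing by the other hand.
that_cup = SignToken(gloss="CUP", nondominant_hand="IX:cup-location")
linear_gloss = that_cup.gloss  # just "CUP": the simultaneity is lost
```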

Figure 2: Representations of an American Sign Language phrase with video frames, pose estimations, SignWriting, HamNoSys and glosses. English translation: "What is your name?"

4.2 Existing Sign Language Resources

Now, we introduce the different formats of resources and discuss how they can be used for signed language modeling.

Bilingual dictionaries for signed language (Mesch and Wallin, 2012; Fenlon et al., 2015; Crasborn et al., 2016; Gutierrez-Sigut et al., 2016) map a spoken language word or short phrase to a signed language video. One notable dictionary, SpreadTheSign⁵, is a parallel dictionary containing around 23,000 words with up to 41 different spoken-signed language pairs and more than 500,000 videos in total. While dictionaries may help create lexical rules between languages, they do not demonstrate the grammar or the usage of signs in context.

⁵https://www.spreadthesign.com/

Fingerspelling corpora usually consist of videos of words borrowed from spoken languages that are signed letter-by-letter. They can be synthetically created (Dreuw et al., 2006) or mined from online resources (Shi et al., 2018, 2019). However, they only capture one aspect of signed languages.

Isolated sign corpora are collections of annotated single signs. They are synthesized (Ebling et al., 2018; Huang et al., 2018; Sincan and Keles, 2020; Hassan et al., 2020) or mined from online resources (Vaezi Joze and Koller, 2019; Li et al., 2020), and can be used for isolated sign language recognition or for contrastive analysis of minimal signing pairs (Imashev et al., 2020). However, like dictionaries, they do not describe relations between signs, nor do they capture coarticulation during signing, and are often limited in vocabulary size (20–1000 signs).

Continuous sign corpora contain parallel sequences of signs and spoken language. Available continuous sign corpora are extremely limited, containing 4–6 orders of magnitude fewer sentence pairs than similar corpora for spoken language machine translation (Arivazhagan et al., 2019). Moreover, while automatic speech recognition (ASR) datasets contain up to 50,000 hours of recordings (Pratap et al., 2020), the largest continuous sign language corpus contains only 1,150 hours, of which only 50 are publicly available (Hanke et al., 2020). These datasets are usually synthesized (Databases, 2007; Crasborn and Zwitserlood, 2008; Ko et al., 2019; Hanke et al., 2020) or recorded in studio conditions (Forster et al., 2014; Cihan Camgoz et al., 2018), which does not account for noise in real-life conditions. Moreover, some contain signed interpretations of spoken language rather than naturally-produced signs, which may not accurately represent native signing since translation is now a part of the discourse event.

Availability Unlike the vast amount and diversity of available spoken language resources that allow various applications, signed language resources are scarce and currently only support translation and production. Unfortunately, most of the signed language corpora discussed in the literature are either not available for use or available only under heavy restrictions and licensing terms. Signed language data is especially challenging to anonymize due to the importance of facial and other physical features in signing videos, limiting its open distribution; developing anonymization with minimal information loss, or accurate anonymized representations, is a promising research problem.

4.3 Sign Language Processing Tasks

Research on SLP has so far been mainly led by the CV community and has focused on processing the visual features in signed language videos. As a result, current SLP methods do not fully address the linguistic complexity of signed languages. We survey common SLP tasks and the limitations of current methods by drawing on linguistic theories of signed languages.

Detection Sign language detection is the binary classification task of determining whether a signed language is being used or not in a given video frame. While recent detection models (Borg and Camilleri, 2019; Moryossef et al., 2020) achieve high performance, we lack well-annotated data that include interference and distractions with non-signing instances for proper evaluation. A similar task in spoken languages is voice activity detection (VAD) (Sohn et al., 1999; Ramírez et al., 2004), the detection of when a human voice is used in an audio signal. However, as VAD methods often rely on speech-specific representations such as spectrograms, they are not always applicable to videos.
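As a sketch of what pose-based detection can look like (inspired by, but not reproducing, the pose-estimation approach of Moryossef et al. (2020)), one can classify frames as signing or not from how much the keypoints move; the feature and classifier below are deliberately simplistic, and the training data names are hypothetical.

```python
# A deliberately simple sketch of frame-level sign language detection:
# classify frames as signing/not-signing from keypoint motion magnitude.
# This illustrates the task setup, not the method of any cited paper.
import numpy as np
from sklearn.linear_model import LogisticRegression

def motion_features(poses: np.ndarray) -> np.ndarray:
    """poses: (frames, landmarks, 3) -> (frames-1, 1) mean displacement."""
    deltas = np.linalg.norm(np.diff(poses, axis=0), axis=-1)
    return deltas.mean(axis=1, keepdims=True)

# Hypothetical training arrays `train_poses` and `frame_labels` (1 = signing):
# clf = LogisticRegression().fit(motion_features(train_poses), frame_labels[1:])
# signing = clf.predict(motion_features(video_to_pose("clip.mp4")))
```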

Identification Sign language identification automatically classifies which signed language is being used in a given video. Existing works utilize the distribution of phonemes (Gebre et al., 2013) or activity maps in signing space (Monteiro et al., 2016) to identify the signed language in videos. However, these methods only rely on low-level visual features, while signed languages have several distinctive features on a linguistic level, such as lexical or structural differences (McKee and Kennedy, 2000; Kimmelman, 2014; Ferreira-Brito, 1984; Shroyer and Shroyer, 1984), which have not been explored for this task.

Segmentation Segmentation consists of detecting the frame boundaries for signs or phrases in videos to divide them into meaningful units. Current methods resort to segmenting units loosely mapped to signed language units (Santemiz et al., 2009; Farag and Brock, 2019; Bull et al., 2020), and do not leverage reliable linguistic predictors of sentence boundaries such as prosody in signed languages (i.e. pauses, sign duration, facial expressions, eye apertures) (Sandler, 2010; Ormel and Crasborn, 2012).
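For intuition, the naive sketch below marks candidate boundaries at sustained low-motion spans (pauses) in a pose stream; the threshold values are arbitrary placeholders, and real prosodic cues (sign duration, facial expressions, eye aperture) are far richer than this single signal.

```python
# A naive, prosody-inspired segmentation sketch: a sustained low-motion run
# (a pause) in the pose stream is taken as a candidate sentence boundary.
import numpy as np

def pause_boundaries(poses: np.ndarray, thresh: float = 0.005,
                     min_pause: int = 10) -> list[int]:
    motion = np.linalg.norm(np.diff(poses, axis=0), axis=-1).mean(axis=1)
    boundaries, run = [], 0
    for i, m in enumerate(motion):
        run = run + 1 if m < thresh else 0
        if run == min_pause:  # pause long enough: mark where it began
            boundaries.append(i - min_pause + 1)
    return boundaries
```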

Recognition Sign language recognition (SLR) detects and labels signs from a video, either isolated (Imashev et al., 2020; Sincan and Keles, 2020) or continuous (Cui et al., 2017; Camgoz et al., 2018, 2020b). Though some previous works have referred to this task as "sign language translation", recognition merely determines the associated label of each sign, without handling the syntax and morphology of the signed language (Padden, 1988) to create a spoken language output. Instead, SLR has often been used as an intermediate step during translation to produce glosses from signed language videos.

Translation Sign language translation (SLT) commonly refers to the translation of signed language to spoken language. Current methods either perform translation with glosses (Camgoz et al., 2018, 2020b; Yin and Read, 2020a,b; Moryossef et al., 2021) or on pose estimations and sign articulators from videos (Ko et al., 2019; Camgoz et al., 2020a), but do not, for instance, handle spatial relations and grounding in discourse to resolve ambiguous referents.
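For the gloss-based route, the sketch below treats gloss-to-text translation as ordinary sequence-to-sequence learning with the HuggingFace transformers library; starting from a generic pretrained model such as t5-small is one plausible recipe, not the method of the papers cited above.

```python
# Gloss-to-text translation framed as vanilla seq2seq, assuming the HuggingFace
# `transformers` package. Meaningful output requires fine-tuning the model on
# (gloss, translation) pairs first; this only shows the interface.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

glosses = "YOUR NAME WHAT"  # a linear ASL gloss sequence (see Figure 2)
inputs = tokenizer(glosses, return_tensors="pt")
output = model.generate(**inputs, max_length=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```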

Production Sign language production consists of producing signed language from spoken language. To overcome the challenges in generating videos directly, most efforts use poses as an intermediate representation, with the goal of either using computer animation or pose-to-video models to perform video production. Earlier methods generate and concatenate isolated signs (Stoll et al., 2018, 2020), while more recent methods (Saunders et al., 2020b,a; Zelinka and Kanis, 2020; Xiao et al., 2020) autoregressively decode a sequence of poses from an input text. Due to the lack of suitable automatic evaluation methods for generated signs, existing works resort to measuring back-translation quality, which cannot accurately capture the quality of the produced signs nor their usability in real-world settings. A better understanding of how distinctions in meaning are created in signed languages may help develop a better evaluation method.


5 Towards Including Signed Languages in Natural Language Processing

The limitations in the design of current SLP models often stem from the lack of exploring the linguistic possibilities of signed languages. We therefore invite the NLP community to collaborate with the CV community, for their expertise in visual processing, and with signing communities and sign linguists, for their expertise in signed languages and the lived experiences of signers, in researching SLP. We believe that, first, the extension of known tasks in the standard NLP pipeline to signed languages will help us better understand how to model them, as well as provide valuable tools for higher-level applications. Although these tasks have been thoroughly researched for spoken languages, they pose interesting new challenges in a different modality. We also emphasize the need for real-world data to develop such methods, and for a close collaboration with signing communities to gain an accurate understanding of how signed language technologies can benefit signers, all the while respecting the Deaf community's ownership of signed languages.

5.1 Building NLP Pipelines

Although signed and spoken languages differ in modality, we argue that as both express the syntax, semantics, and pragmatics of natural languages, fundamental theories of NLP can and should be extended to signed languages. NLP applications often rely on low-level tools such as tokenizers and parsers, so we invite more research efforts on these core NLP tasks that often lay the foundation of other applications. We also discuss what considerations should be taken into account in their extension to signed languages and raise open questions that should be addressed.

Tokenization The vast majority of NLP methods require a discrete input. To extend NLP technologies to signed languages, we must first and foremost develop adequate tokenization tools that map continuous signed language videos to a discrete, accurate representation with minimal information loss. While existing SLP systems and datasets often use glosses as discrete lexical units of signed phrases, this poses three significant problems: (1) linear, single-dimensional glosses cannot fully capture the spatial constructions of signed languages, which degrades downstream performance (Yin and Read, 2020b); (2) glosses are language-specific, and requiring new glossing models for each language is impractical given the scarcity of resources; (3) glosses lack a standard across corpora, which limits data sharing and adds significant overhead in modeling.

We thus urge the adoption of an efficient, universal, and standardized method for tokenization of signed languages, all the while considering: how do we define lexical units in signed languages? (Johnston and Schembri, 1999; Johnston, 2010) To what degree can phonological units of signed languages be mapped to lexical units? Should we model the articulators of signs separately or together? What are the cross-linguistic phonological differences to consider? To what extent can ideas used in automatic speech recognition be applied to signed languages?
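One naive baseline, shown below purely to make the problem concrete, is to vector-quantize pose frames with k-means so that a continuous stream becomes a sequence of discrete cluster IDs; it is not a proposed standard and plainly ignores simultaneity and all the questions above.

```python
# A naive tokenization baseline: vector-quantize pose frames with k-means so a
# continuous signing stream becomes discrete token IDs. An illustration of the
# problem only; it ignores simultaneity and lexical structure entirely.
import numpy as np
from sklearn.cluster import KMeans

def fit_pose_vocabulary(poses: np.ndarray, vocab_size: int = 512) -> KMeans:
    flat = poses.reshape(len(poses), -1)  # (frames, landmarks * 3)
    return KMeans(n_clusters=vocab_size).fit(flat)

def tokenize(poses: np.ndarray, vocab: KMeans) -> list[int]:
    return vocab.predict(poses.reshape(len(poses), -1)).tolist()
```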

Syntactic Analysis Part-of-speech (POS) tagging and syntactic parsing are fundamental to understanding the meaning of words in context. Yet, no such linguistic tools for automatic syntactic analyses exist for signed languages. To develop such tools, we must first define to what extent POS tagging and syntactic parsing for spoken languages also generalize to signed languages: do we need a new set of POS and dependency tags for signed languages? How are morphological features expressed? What are the annotation guidelines to create datasets on syntax? Can we draw on linguistic theories to design features and rules that perform these tasks? Are there typologically similar spoken languages for some signed languages we can perform transfer learning with?

Named Entity Recognition (NER) Recognizing named entities and finding relationships between them is highly important in information retrieval and classification. Named entities in signed languages can be produced by a fingerspelled sequence, a sign, or even through mouthing of the name while the referent is introduced through pointing. Bleicken et al. (2016) attempt NER in German Sign Language (DGS) to perform anonymization, but only do so indirectly, by performing NER either on the gold DGS gloss annotations and German translations or manually on the videos. We instead propose NER in a fully automated fashion while considering: what are the visual markers of named entities? How are they introduced and referenced? How are relationships between them established?


Coreference Resolution Resolving coreference is crucial for language understanding. In signed languages, present referents, where the signer explicitly points to the entity in question, are relatively unambiguous. In contrast, non-present referents and classifiers are heavily grounded in the signing space, so good modeling of the spatial coherence in sign language is required. Evidence suggests that classic theoretical frameworks, such as discourse representation theory, may extend to signed languages (Steinbach and Onea, 2016). We pose the following questions: to what extent can automatic coreference resolution of spoken languages be applied to signed languages? How do we keep track of referents in space? How can we leverage spatial relations to resolve ambiguity?

Towards Linguistically Informed and Multimodal SLP We highly encourage the collaboration of the multimodal and SLP research communities to develop powerful SLP models informed by core NLP tools such as the ones discussed, all the while processing and relating information from both linguistic and visual modalities. On the one hand, theories and methods to reason about multimodal messages can enhance the joint modeling of vision and language in signed languages. SLP is especially subject to three of the core technical challenges in multimodal machine learning (Baltrusaitis et al., 2018): translation (how do we map visual-gestural information to/from audio-oral and textual information?), alignment (how do we relate signed language units to spoken language units?), and co-learning (can we transfer high-resource spoken language knowledge to signed language?). On the other hand, meaning in spoken languages is not only conveyed through speech or text but also through the visual modality. Studying signed languages can give a better understanding of how to model co-speech gestures, spatial discourse relations, and conceptual grounding of language through vision.

5.2 Collect Real-World Data

Data is essential to develop any of the core NLP tools previously described, and current efforts in SLP are often limited by the lack of adequate data. We discuss the considerations to keep in mind when building datasets, the challenges of collecting such data, and directions to facilitate data collection.

What is Good Signed Language Data? For SLP models to be deployable, they must be developed using data that represents the real world accurately. What constitutes an ideal signed language dataset is an open question; we suggest the following requirements: (1) a broad domain; (2) sufficient data and vocabulary size; (3) real-world conditions; (4) naturally produced signs; (5) a diverse signer demographic; (6) native signers; and, when applicable, (7) dense annotations.

To illustrate the importance of data quality during modeling, we first take as an example a current benchmark for SLP, the RWTH-PHOENIX-Weather 2014T dataset (Cihan Camgoz et al., 2018) of German Sign Language, which does not meet most of the above criteria: it is restricted to the weather domain (1); contains only around 8K segments with 1K unique signs (2); is filmed in studio conditions (3); is interpreted from German utterances (4); and is signed by nine Caucasian interpreters (5, 6). Although this dataset successfully addressed data scarcity issues at the time, rendered results comparable, and fueled competitive research, it does not accurately represent signed languages in the real world. On the other hand, the Public DGS Corpus (Hanke et al., 2020) is an open-domain (1) dataset consisting of 50 hours of natural signing (4) by 330 native signers from various regions in Germany (5, 6), annotated with glosses, HamNoSys, and German translations (7), meeting all but two of the requirements we suggest.

We train a gloss-to-text sign language translation transformer (Yin and Read, 2020b) on both datasets. On RWTH-PHOENIX-Weather 2014T, we obtain 22.17 BLEU on the test set; on the Public DGS Corpus, we obtain a mere 3.2 BLEU. Although Transformers achieve encouraging results on RWTH-PHOENIX-Weather 2014T (Saunders et al., 2020b; Camgoz et al., 2020a), they fail on more realistic, open-domain data. These results reveal, firstly, that for real-world applications we need more data to train such types of models, and secondly, that while available data is severely limited in size, less data-hungry and more linguistically-informed approaches may be more suitable. This experiment shows how crucial it is to use data that accurately represents the complexity and diversity of signed languages, to precisely assess which types of methods are suitable and how well our models would deploy to the real world.
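For reference, corpus-level BLEU scores like those reported above can be computed with the sacrebleu package; the hypothesis and reference sentences below are placeholders, not examples from either corpus.

```python
# Computing corpus BLEU with the `sacrebleu` package; the sentences here are
# placeholders, not data from either dataset discussed above.
import sacrebleu

hypotheses = ["tomorrow it will rain in the south"]
references = [["tomorrow rain is expected in the south"]]  # one reference stream
print(sacrebleu.corpus_bleu(hypotheses, references).score)
```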

Challenges of Data Collection Collecting and annotating signed data in line with these ideals requires more resources than speech or text data, taking up to 600 minutes per minute of annotated signed language video (Hanke et al., 2020). Moreover, annotation usually requires a specific set of knowledge and skills, which makes recruiting or training qualified annotators challenging. Additionally, there is little existing signed language data in the wild that is open to use, especially data from native signers that is not an interpretation of speech. Therefore, data collection often requires significant effort and the costs of on-site recording as well.

Automating Annotation To collect more data that enables the development of deployable SLP models, one useful research direction is creating tools that can simplify or automate parts of the collection and annotation process. One of the largest bottlenecks in obtaining more adequate signed language data is the amount of time and the scarcity of experts required to perform annotation. Therefore, tools that perform automatic parsing, detection of frame boundaries, extraction of articulatory features, and suggestions for lexical annotations, or that allow parts of the annotation process to be crowdsourced to non-experts, to name a few, have a high potential to facilitate and accelerate the availability of good data.

5.3 Practice Deaf Collaboration

Finally, when working with signed languages, it is vital to keep in mind who this technology should benefit, and what they need. Researchers in SLP must honor that signed languages belong to the Deaf community and avoid exploiting their language as a commodity (Bird, 2020).

Solving Real Needs Many efforts in SLP have developed intrusive methods (e.g. requiring signers to wear special gloves), which are often rejected by signing communities and therefore have limited real-world value. Such efforts are often marketed as performing "sign language translation" when they, in fact, only identify fingerspelling or recognize a very limited set of isolated signs at best. These approaches oversimplify the rich grammar of signed languages, promote the misconception that signs are solely expressed through the hands, and are considered by the Deaf community as a manifestation of audism, where it is the signers who must make the extra effort to wear additional sensors to be understood by non-signers (Erard, 2017). In order to avoid such mistakes, we encourage close Deaf involvement throughout the research process to ensure that we direct our efforts towards applications that will be adopted by signers, and do not make false assumptions about signed languages or the needs of signing communities.

Building Collaboration Deaf collaboration and leadership are essential for developing signed language technologies, to ensure that they address the community's needs and will be adopted, and that they do not rely on misconceptions or inaccuracies about signed language (Harris et al., 2009; Kusters et al., 2017). Hearing researchers cannot relate to the deaf experience or fully understand the context in which the tools being developed would be used, nor can they speak for the deaf. Therefore, we encourage the creation of a long-term collaborative environment between signed language researchers and users, so that deaf users can identify meaningful challenges and provide insights on the considerations to take, while researchers cater to the signers' needs as the field evolves. We also recommend reaching out to signing communities for reviewing papers on signed languages, to ensure an adequate evaluation of this type of research results published at ACL venues. There are several ways to connect with Deaf communities for collaboration: one can seek deaf students in their local community, reach out to schools for the deaf, contact deaf linguists, join a network of researchers of sign-related technologies⁶, and/or participate in deaf-led projects.

⁶https://www.crest-network.com/

6 Conclusions

We urge the inclusion of signed languages in NLP. We believe that the NLP community is well-positioned, especially with the plethora of successful spoken language processing methods coupled with the recent advent of computer vision tools for videos, to bring the linguistic insight needed for better signed language models. We hope to see an increase in both the interest and the effort put into collecting signed language resources and developing signed language tools, while building a strong collaboration with signing communities.

Acknowledgements

We would like to thank Marc Schulder, Claude Mauk, David Mortensen, Chaitanya Ahuja, Siddharth Dalmia, Shruti Palaskar and Graham Neubig, as well as the anonymous reviewers, for their helpful feedback and insightful discussions.


References

Nikolas Adaloglou, Theocharis Chatzis, Ilias Papastratis, Andreas Stergioulas, Georgios Th Papadopoulos, Vassia Zacharopoulou, George J Xydopoulos, Klimnis Atzakas, Dimitris Papazachariou, and Petros Daras. 2020. A comprehensive study on sign language recognition methods. arXiv preprint arXiv:2007.12530.

Waleed Ammar, Dirk Groeneveld, Chandra Bhagavatula, Iz Beltagy, Miles Crawford, Doug Downey, Jason Dunkelberger, Ahmed Elgohary, Sergey Feldman, Vu Ha, Rodney Kinney, Sebastian Kohlmeier, Kyle Lo, Tyler Murray, Hsu-Han Ooi, Matthew Peters, Joanna Power, Sam Skjonsberg, Lucy Wang, Chris Wilhelm, Zheng Yuan, Madeleine van Zuylen, and Oren Etzioni. 2018. Construction of the literature graph in Semantic Scholar. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers), pages 84–91, New Orleans, Louisiana. Association for Computational Linguistics.

Naveen Arivazhagan, Ankur Bapna, Orhan Firat, Dmitry Lepikhin, Melvin Johnson, Maxim Krikun, Mia Xu Chen, Yuan Cao, George Foster, Colin Cherry, et al. 2019. Massively multilingual neural machine translation in the wild: Findings and challenges. arXiv preprint arXiv:1907.05019.

Tadas Baltrusaitis, Chaitanya Ahuja, and Louis-Philippe Morency. 2018. Multimodal machine learning: A survey and taxonomy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(2):423–443.

Robbin Battison. 1978. Lexical borrowing in American Sign Language. ERIC.

Ursula Bellugi and Susan Fischer. 1972. A comparison of sign language and spoken language. Cognition, 1(2-3):173–200.

Brita Bergman. 1979. Signed Swedish. National Swedish Board of Education.

Louise de Beuzeville. 2008. Pointing and verb modification: The expression of semantic roles in the Auslan corpus. In Workshop Programme, page 13. Citeseer.

Steven Bird. 2020. Decolonising speech and language technology. In Proceedings of the 28th International Conference on Computational Linguistics, pages 3504–3519, Barcelona, Spain (Online). International Committee on Computational Linguistics.

Julian Bleicken, Thomas Hanke, Uta Salden, and Sven Wagner. 2016. Using a language technology infrastructure for German in order to anonymize German Sign Language corpus data. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 3303–3306, Portorož, Slovenia. European Language Resources Association (ELRA).

Mark Borg and Kenneth P Camilleri. 2019. Sign language detection "in the wild" with recurrent neural networks. In ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1637–1641. IEEE.

Danielle Bragg, Oscar Koller, Mary Bellard, Larwan Berke, Patrick Boudreault, Annelies Braffort, Naomi Caselli, Matt Huenerfauth, Hernisa Kacorri, Tessa Verhoef, Christian Vogler, and Meredith Ringel Morris. 2019. Sign language recognition, generation, and translation: An interdisciplinary perspective. In The 21st International ACM SIGACCESS Conference on Computers and Accessibility, ASSETS '19, pages 16–31, New York, NY, USA. Association for Computing Machinery.

Diane Brentari. 2011. Sign language phonology. The Handbook of Phonological Theory, pages 691–721.

Diane Brentari and Carol Padden. 2001. A language with multiple origins: Native and foreign vocabulary in American Sign Language. Foreign Vocabulary in Sign Language: A Cross-Linguistic Investigation of Word Formation, pages 87–119.

Hannah Bull, Michèle Gouiffès, and Annelies Braffort. 2020. Automatic segmentation of sign language into subtitle-units. In European Conference on Computer Vision, pages 186–198. Springer.

Necati Cihan Camgoz, Simon Hadfield, Oscar Koller, Hermann Ney, and Richard Bowden. 2018. Neural sign language translation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7784–7793.

Necati Cihan Camgoz, Oscar Koller, Simon Hadfield, and Richard Bowden. 2020a. Multi-channel transformers for multi-articulatory sign language translation. In European Conference on Computer Vision, pages 301–319.

Necati Cihan Camgoz, Oscar Koller, Simon Hadfield, and Richard Bowden. 2020b. Sign language transformers: Joint end-to-end sign language recognition and translation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10023–10033.

Z. Cao, G. Hidalgo Martinez, T. Simon, S. Wei, and Y. A. Sheikh. 2019. OpenPose: Realtime multi-person 2D pose estimation using part affinity fields. IEEE Transactions on Pattern Analysis and Machine Intelligence.

Yu Chen, Chunhua Shen, Xiu-Shen Wei, Lingqiao Liu, and Jian Yang. 2017. Adversarial PoseNet: A structure-aware convolutional network for human pose estimation. In Proceedings of the IEEE International Conference on Computer Vision, pages 1212–1221.


Kearsy Cormier, Sandra Smith, and Zed Sevcikova-Sehyr. 2015. Rethinking constructed action. Sign Language & Linguistics, 18(2):167–204.

Onno Crasborn, R Bank, I Zwitserlood, E van der Kooij, E Ormel, J Ros, A Schuller, A de Meijer, M van Zuilen, YE Nauta, et al. 2016. NGT Signbank. Nijmegen: Radboud University, Centre for Language Studies.

Onno A Crasborn and IEP Zwitserlood. 2008. The Corpus NGT: An online corpus for professionals and laymen.

Runpeng Cui, Hu Liu, and Changshui Zhang. 2017. Recurrent convolutional neural networks for continuous sign language recognition by staged optimization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7361–7369.

NCSLGR Databases. 2007. Volumes 2–7.

Philippe Dreuw, Thomas Deselaers, Daniel Keysers, and Hermann Ney. 2006. Modeling image variability in appearance-based gesture recognition. In ECCV Workshop on Statistical Methods in Multi-Image and Video Processing, pages 7–18.

Paul G Dudis. 2004. Body partitioning and real-space blends. Cognitive Linguistics, 15(2):223–238.

Sarah Ebling, Necati Cihan Camgöz, Penny Boyes Braem, Katja Tissi, Sandra Sidler-Miserez, Stephanie Stoll, Simon Hadfield, Tobias Haug, Richard Bowden, Sandrine Tornay, Marzieh Razavi, and Mathew Magimai-Doss. 2018. SMILE Swiss German Sign Language dataset. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan. European Language Resources Association (ELRA).

Michael Erard. 2017. Why sign-language gloves don't help deaf people. The Atlantic, 9.

Iva Farag and Heike Brock. 2019. Learning motion disfluencies for automatic sign language segmentation. In ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 7360–7364. IEEE.

Jordan Fenlon, Kearsy Cormier, and Adam Schembri. 2015. Building BSL SignBank: The lemma dilemma revisited. International Journal of Lexicography, 28(2):169–206.

Jordan Fenlon, Adam Schembri, and Kearsy Cormier. 2018. Modification of indicating verbs in British Sign Language: A corpus-based study. Language, 94(1):84–118.

Lucinda Ferreira-Brito. 1984. Similarities & differences in two Brazilian sign languages. Sign Language Studies, 42:45–56.

Jens Forster, Christoph Schmidt, Oscar Koller, Martin Bellgardt, and Hermann Ney. 2014. Extensions of the sign language recognition and translation corpus RWTH-PHOENIX-Weather. In LREC, pages 1911–1916.

Binyam Gebrekidan Gebre, Peter Wittenburg, and Tom Heskes. 2013. Automatic sign language identification. In 2013 IEEE International Conference on Image Processing, pages 2626–2630. IEEE.

Neil S Glickman and Wyatte C Hall. 2018. Language Deprivation and Deaf Mental Health. Routledge.

Rıza Alp Güler, Natalia Neverova, and Iasonas Kokkinos. 2018. DensePose: Dense human pose estimation in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7297–7306.

Eva Gutierrez-Sigut, Brendan Costello, Cristina Baus, and Manuel Carreiras. 2016. LSE-Sign: A lexical database for Spanish Sign Language. Behavior Research Methods, 48(1):123–137.

Wyatte C Hall, Leonard L Levin, and Melissa L Anderson. 2017. Language deprivation syndrome: A possible neurodevelopmental disorder with sociocultural origins. Social Psychiatry and Psychiatric Epidemiology, 52(6):761–776.

Thomas Hanke, Marc Schulder, Reiner Konrad, and Elena Jahn. 2020. Extending the Public DGS Corpus in size and depth. In Proceedings of the LREC2020 9th Workshop on the Representation and Processing of Sign Languages: Sign Language Resources in the Service of the Language Community, Technological Challenges and Application Perspectives, pages 75–82, Marseille, France. European Language Resources Association (ELRA).

Raychelle Harris, Heidi M Holmes, and Donna M Mertens. 2009. Research ethics in sign language communities. Sign Language Studies, 9(2):104–131.

Saad Hassan, Larwan Berke, Elahe Vahdani, Longlong Jing, Yingli Tian, and Matt Huenerfauth. 2020. An isolated-signing RGBD dataset of 100 American Sign Language signs produced by fluent ASL signers. In Proceedings of the LREC2020 9th Workshop on the Representation and Processing of Sign Languages: Sign Language Resources in the Service of the Language Community, Technological Challenges and Application Perspectives, pages 89–94, Marseille, France. European Language Resources Association (ELRA).

Jie Huang, Wengang Zhou, Qilin Zhang, Houqiang Li, and Weiping Li. 2018. Video-based sign language recognition without temporal segmentation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32.


Tom Humphries, Poorna Kushalnagar, Gaurav Mathur,Donna Jo Napoli, Carol Padden, Christian Rath-mann, and Scott Smith. 2016. Avoiding linguis-tic neglect of deaf children. Social Service Review,90(4):589–619.

Alfarabi Imashev, Medet Mukushev, Vadim Kimmel-man, and Anara Sandygulova. 2020. A datasetfor linguistic understanding, visual evaluation, andrecognition of sign languages: The k-rsl. In Pro-ceedings of the 24th Conference on ComputationalNatural Language Learning, pages 631–640.

Amy Isard. 2020. Approaches to the anonymisationof sign language corpora. In Proceedings of theLREC2020 9th Workshop on the Representation andProcessing of Sign Languages: Sign Language Re-sources in the Service of the Language Community,Technological Challenges and Application Perspec-tives, pages 95–100.

David L Jaffe. 1994. Evolution of mechanical fin-gerspelling hands for people who are deaf-blind.Journal of rehabilitation research and development,31(3):236–244.

Robert E Johnson and Scott K Liddell. 2011. Toward aphonetic representation of signs: Sequentiality andcontrast. Sign Language Studies, 11(2):241–274.

Trevor Johnston. 2010. From archive to corpus: Tran-scription and annotation in the creation of signed lan-guage corpora. International journal of corpus lin-guistics, 15(1):106–131.

Trevor Johnston and Louise De Beuzeville. 2016. Aus-lan corpus annotation guidelines. Auslan Corpus.

Trevor Johnston and Adam Schembri. 2007. Aus-tralian Sign Language (Auslan): An introductionto sign language linguistics. Cambridge UniversityPress.

Trevor Johnston and Adam C Schembri. 1999. Ondefining lexeme in a signed language. Sign lan-guage & linguistics, 2(2):115–185.

Jim Kakumasu. 1968. Urubu sign language. Interna-tional journal of American linguistics, 34(4):275–281.

Vadim Kimmelman. 2014. Information structure in rus-sian sign language and sign language of the nether-lands.

Sang-Ki Ko, Chang Jo Kim, Hyedong Jung, andChoongsang Cho. 2019. Neural sign language trans-lation based on human keypoint estimation. AppliedSciences, 9(13):2683.

Oscar Koller. 2020. Quantitative survey of the state ofthe art in sign language recognition. arXiv preprintarXiv:2008.09918.

Reiner Konrad, Thomas Hanke, Gabriele Langer, Su-sanne Konig, Lutz Konig, Rie Nishio, and Anja Re-gen. 2018. Public dgs corpus: Annotation conven-tions. Technical report, Project Note AP03-2018-01,DGS-Korpus project, IDGS, Hamburg University.

Annelies Kusters, Maartje De Meulder, and DaiO’Brien. 2017. Innovations in deaf studies: The roleof deaf scholars. Oxford University Press.

Dongxu Li, Cristian Rodriguez, Xin Yu, and HongdongLi. 2020. Word-level deep sign language recogni-tion from video: A new large-scale dataset and meth-ods comparison. In The IEEE Winter Conference onApplications of Computer Vision, pages 1459–1469.

Scott K Liddell and Robert E Johnson. 1989. Amer-ican sign language: The phonological base. Signlanguage studies, 64(1):195–277.

Scott K Liddell and Melanie Metzger. 1998. Gesturein sign language discourse. Journal of pragmatics,30(6):657–697.

Scott K Liddell et al. 2003. Grammar, gesture, andmeaning in American Sign Language. CambridgeUniversity Press.

David McKee and Graeme Kennedy. 2000. Lexi-cal comparison of signs from american, australian,british and new zealand sign languages. The signsof language revisited: An anthology to honor UrsulaBellugi and Edward Klima, pages 49–76.

Johanna Mesch and Lars Wallin. 2012. From meaning to signs and back: Lexicography and the Swedish Sign Language corpus. In Proceedings of the 5th Workshop on the Representation and Processing of Sign Languages: Interactions between Corpus and Lexicon [Language Resources and Evaluation Conference (LREC)], pages 123–126.

Johanna Mesch and Lars Wallin. 2015. Gloss annotations in the Swedish Sign Language corpus. International Journal of Corpus Linguistics, 20(1):102–120.

Caio DD Monteiro, Christy Maria Mathew, Ricardo Gutierrez-Osuna, and Frank Shipman. 2016. Detecting and identifying sign languages through visual features. In 2016 IEEE International Symposium on Multimedia (ISM), pages 287–290. IEEE.

Kathryn Montemurro and Diane Brentari. 2018. Emphatic fingerspelling as code-mixing in American Sign Language. Proceedings of the Linguistic Society of America, 3(1):61–1.

Amit Moryossef, Ioannis Tsochantaridis, Roee Aharoni, Sarah Ebling, and Srini Narayanan. 2020. Real-time sign language detection using human pose estimation. In European Conference on Computer Vision, pages 237–248. Springer.

Amit Moryossef, Kayo Yin, Graham Neubig, and Yoav Goldberg. 2021. Data augmentation for sign language gloss translation. arXiv preprint arXiv:2105.07476.

Joseph J Murray, Wyatte C Hall, and Kristin Snoddon. 2020. The importance of signed languages for deaf children and their families. The Hearing Journal, 73(3):30–32.

Sylvie CW Ong and Surendra Ranganath. 2005. Automatic sign language analysis: A survey and the future beyond lexical meaning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(6):873–891.

Ellen Ormel and Onno Crasborn. 2012. Prosodic correlates of sentences in signed languages: A literature review and suggestions for new types of studies. Sign Language Studies, 12(2):279–315.

C. Padden. 1988. Interaction of Morphology and Syntax in American Sign Language. Outstanding Dissertations in Linguistics Series. Garland.

Carol A Padden. 1998. The ASL lexicon. Sign Language & Linguistics, 1(1):39–60.

Carol A Padden and Tom Humphries. 1988. Deaf in America. Harvard University Press.

Becky Sue Parton. 2006. Sign language recognition and translation: A multidisciplined approach from the field of artificial intelligence. Journal of Deaf Studies and Deaf Education, 11(1):94–101.

Carol J Patrie and Robert E Johnson. 2011. Fingerspelled word recognition through rapid serial visual presentation: RSVP. DawnSignPress.

Leonid Pishchulin, Arjun Jain, Mykhaylo Andriluka, Thorsten Thormählen, and Bernt Schiele. 2012. Articulated people detection and pose estimation: Reshaping the future. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pages 3178–3185. IEEE.

Vineel Pratap, Qiantong Xu, Anuroop Sriram, Gabriel Synnaeve, and Ronan Collobert. 2020. MLS: A large-scale multilingual dataset for speech research. In Proc. Interspeech 2020, pages 2757–2761.

Siegmund Prillwitz and Heiko Zienert. 1990. Hamburg notation system for sign language: Development of a sign writing with computer application. In Current Trends in European Sign Language Research. Proceedings of the 3rd European Congress on Sign Language Research, pages 355–379.

Javier Ramírez, José C Segura, Carmen Benítez, Ángel De La Torre, and Antonio Rubio. 2004. Efficient voice activity detection algorithms using long-term speech information. Speech Communication, 42(3-4):271–287.

Razieh Rastgoo, Kourosh Kiani, and Sergio Escalera. 2020. Sign language recognition: A deep survey. Expert Systems with Applications, page 113794.

Christian Rathmann and Gaurav Mathur. 2011. A featural approach to verb agreement in signed languages. Theoretical Linguistics, 37(3-4):197–208.

Cynthia B Roy. 2011. Discourse in signed languages. Gallaudet University Press.

Wendy Sandler. 2010. Prosody and syntax in sign languages. Transactions of the Philological Society, 108(3):298–328.

Wendy Sandler. 2012. The phonological organization of sign languages. Language and Linguistics Compass, 6(3):162–182.

Pınar Santemiz, Oya Aran, Murat Saraçlar, and Lale Akarun. 2009. Automatic sign segmentation from continuous signing via multiple sequence alignment. In 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops, pages 2001–2008. IEEE.

Ben Saunders, Necati Cihan Camgoz, and Richard Bowden. 2020a. Everybody sign now: Translating spoken language to photo realistic sign language video. arXiv preprint arXiv:2011.09846.

Ben Saunders, Necati Cihan Camgoz, and Richard Bowden. 2020b. Progressive transformers for end-to-end sign language production. In European Conference on Computer Vision, pages 687–705.

Adam Schembri, Kearsy Cormier, and Jordan Fenlon. 2018. Indicating verbs as typologically unique constructions: Reconsidering verb 'agreement' in sign languages. Glossa: A Journal of General Linguistics, 3(1).

B. Shi, A. Martinez Del Rio, J. Keane, D. Brentari, G. Shakhnarovich, and K. Livescu. 2019. Fingerspelling recognition in the wild with iterative visual attention. In ICCV.

B. Shi, A. Martinez Del Rio, J. Keane, J. Michaux, D. Brentari, G. Shakhnarovich, and K. Livescu. 2018. American Sign Language fingerspelling recognition in the wild. In SLT.

Edgar H Shroyer and Susan P Shroyer. 1984. Signs across America: A look at regional differences in American Sign Language. Gallaudet University Press.

Özge Mercanoğlu Sincan and Hacer Yalım Keleş. 2020. AUTSL: A large scale multi-modal Turkish Sign Language dataset and baseline methods. IEEE Access, 8:181340–181355.

Jongseo Sohn, Nam Soo Kim, and Wonyong Sung. 1999. A statistical model-based voice activity detection. IEEE Signal Processing Letters, 6(1):1–3.

Markus Steinbach and Edgar Onea. 2016. A DRT analysis of discourse referents and anaphora resolution in sign language. Journal of Semantics, 33(3):409–448.

William C Stokoe Jr. 1960. Sign language structure: An outline of the visual communication systems of the American Deaf. The Journal of Deaf Studies and Deaf Education, 10(1):3–37.

William C Stokoe Jr. 2005. Sign language structure: An outline of the visual communication systems of the American Deaf. Journal of Deaf Studies and Deaf Education, 10(1):3–37.

Stephanie Stoll, Necati Cihan Camgoz, Simon Hadfield, and Richard Bowden. 2018. Sign language production using neural machine translation and generative adversarial networks. In Proceedings of the 29th British Machine Vision Conference (BMVC 2018). British Machine Vision Association.

Stephanie Stoll, Necati Cihan Camgoz, Simon Hadfield, and Richard Bowden. 2020. Text2Sign: Towards sign language production using neural machine translation and generative adversarial networks. International Journal of Computer Vision, pages 1–18.

Ted Supalla. 1986. The classifier system in American Sign Language. Noun Classes and Categorization, 7:181–214.

Valerie Sutton. 1990. Lessons in sign writing. SignWriting.

Hamid Vaezi Joze and Oscar Koller. 2019. MS-ASL: A large-scale data set and benchmark for understanding American Sign Language. In The British Machine Vision Conference (BMVC).

Sherman Wilcox. 1992. The phonetics of fingerspelling, volume 4. John Benjamins Publishing.

Sherman Wilcox and Sarah Hafer. 2004. Rethinking classifiers. Review of K. Emmorey (Ed.). (2003). Perspectives on Classifier Constructions in Sign Languages. Mahwah, NJ: Lawrence Erlbaum Associates.

Qinkun Xiao, Minying Qin, and Yuting Yin. 2020. Skeleton-based Chinese Sign Language recognition and generation for bidirectional communication between deaf and hearing people. Neural Networks, 125:41–55.

Kayo Yin and Jesse Read. 2020a. Attention is all you sign: Sign language translation with transformers. In Sign Language Recognition, Translation and Production (SLRTP) Workshop, Extended Abstracts, volume 4.

Kayo Yin and Jesse Read. 2020b. Better sign language translation with STMC-Transformer. In Proceedings of the 28th International Conference on Computational Linguistics, pages 5975–5989, Barcelona, Spain (Online). International Committee on Computational Linguistics.

Jan Zelinka and Jakub Kanis. 2020. Neural sign language synthesis: Words are our glosses. In The IEEE Winter Conference on Applications of Computer Vision, pages 3395–3403.