
Aikuma: A Mobile App for Collaborative Language Documentation

Steven Bird¹,², Florian R. Hanke¹, Oliver Adams¹, and Haejoong Lee²
¹ Dept of Computing and Information Systems, University of Melbourne
² Linguistic Data Consortium, University of Pennsylvania

Abstract

Proliferating smartphones and mobile software offer linguists a scalable, networked recording device. This paper describes Aikuma, a mobile app that is designed to put the key language documentation tasks of recording, respeaking, and translating in the hands of a speech community. After motivating the approach, we describe the system and briefly report on its use in field tests.

1 Introduction

The core of a language documentation consists of primary recordings along with transcriptions and translations (Himmelmann, 1998; Woodbury, 2003). Many members of a linguistic community may contribute to a language documentation, playing roles that depend upon their linguistic competencies. For instance, the best person to provide a text could be a monolingual elder, while the best person to translate it could be a younger bilingual speaker. Someone else again may be the best choice for performing transcription work. Whatever the workflow and degree of collaboration, there is always the need to manage files and create secondary materials, a data management problem. The problem is amplified by the usual problems that attend linguistic fieldwork: limited human resources, limited communication, and limited bandwidth.

Figure 1: Phrase-aligned bilingual audio

The problem is not to collect large quantities of primary audio in the field using mobile devices (de Vries et al., 2014). Rather, the problem is to ensure the long-term interpretability of the collected recordings. At the most fundamental level, we want to know what words were spoken, and what they meant. Recordings made in the wild suffer from the expected range of problems: far-field recording, significant ambient noise, audience participation, and so forth. We address these problems via the “respeaking” task (Woodbury, 2003). Recordings made in an endangered language may not be interpretable once the language falls out of use. We address this problem via the “oral translation” task. The result is relatively clean source audio recordings with phrase-aligned translations (see Figure 1). NLP methods are applicable to such data (Dredze et al., 2010), and we can hope that ultimately, researchers working on archived bilingual audio sources will be able to automatically extract word-glossed interlinear text.
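To make phrase alignment concrete, here is a minimal sketch of how such data might be represented, in Python; the class and field names are our own illustration, not Aikuma’s actual data model.

    from dataclasses import dataclass

    @dataclass
    class AlignedPhrase:
        """One phrase of the source recording plus its spoken translation."""
        source_start_ms: int   # phrase onset within the source audio
        source_end_ms: int     # phrase offset within the source audio
        translation_wav: str   # path to the spoken translation clip

    @dataclass
    class BilingualRecording:
        """A phrase-aligned bilingual recording, as in Figure 1."""
        source_wav: str
        source_language: str
        translation_language: str
        phrases: list[AlignedPhrase]   # ordered by source_start_ms

Refining these phrase-level links into word-level links is what would yield the word-glossed interlinear text mentioned above.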

We describe Aikuma, an open source Android app that supports recording along with respeaking and oral translation, while capturing basic metadata. Aikuma supports local networking so that a set of mobile phones can be synchronized, and anyone can listen to and annotate the recordings made by others. Key functionality is provided via a text-less interface (Figure 2). Aikuma introduces social media and networked collaboration to village-based fieldwork, all on low-cost devices, and this is a boon for scaling up the quantity of documentary material that can be collected and processed. Field trials in Papua New Guinea, Brazil, and Nepal have demonstrated the effectiveness of the approach (Bird et al., 2014).

Figure 2: Adding a time-aligned translation

2 Thought Experiment: The Future Philologist

A typical language documentation project is resource-bound. So much documentation could be collected, yet the required human resources to process it all adequately are often not available. For instance, some have argued that it is not effective to collect large quantities of primary recordings because there is not the time to transcribe them.¹

Estimates differ about the pace of language loss. Yet it is uncontroversial that – for hundreds of languages – only the oldest living speakers are well-versed in traditional folklore. While a given language may survive for several more decades, the opportunity to document significant genres may pass much sooner. Ideally, a large quantity of these nearly-extinct genres would be recorded and given sufficient further treatment in the form of respeakings and oral translations, in order to have archival value. Accordingly, we would like to determine what documentary materials would be of greatest practical value to the linguist working in the future, possibly ten to a hundred or more years in the future. Given the interest of classical philology in ancient languages, we think of this researcher as the “future philologist.”

Our starting point is texts, as the least processed item of the so-called “Boasian trilogy.” A substantial text corpus can serve as the basis for the preparation of grammars and dictionaries even once a language is extinct, as we know from the cases of the extinct languages of the Ancient Near East.

¹ E.g. Paul Newman’s 2013 seminar, The Law of Unintended Consequences: How the Endangered Languages Movement Undermines Field Linguistics as a Scientific Enterprise, https://www.youtube.com/watch?v=xziE08ozQok

Our primary resource is the native speaker community, both those living in the ancestral homeland and the members of the diaspora. How can we engage these communities in the tasks of recording, respeaking, and oral interpretation, in order to generate a substantial quantity of archival documentation?

Respeaking involves listening to an original recording and repeating what was heard carefully and slowly, in a quiet recording environment. It gives archival value to recordings that were made “in the wild” on low-quality devices, with background noise, and by people having no training in linguistics. It provides much clearer audio content, facilitating transcription. Bettinson (2013) has shown that human transcribers, without knowledge of the language under study, can generally produce phonetic transcriptions from such recordings that are close enough to enable someone who knows the language to understand what was said, and which can be used as the basis for phonetic analysis. This means we can postpone the transcription task – by years or even decades – until such time as the required linguistic expertise is available to work with archived recordings.

By interpretation, we mean listening to a recording and producing a spoken translation of what was heard. Translation into another language obviates the need for the usual resource-intensive approaches to linguistic analysis that require syntactic treebanks along with semantic annotations, at the cost of a future decipherment effort (Xia and Lewis, 2007; Abney and Bird, 2010).

3 Design Principles

Several considerations informed the design of Aikuma. First, to facilitate use by monolingual speakers, the primary recording functions need to be text-free.

Second, to facilitate collaboration and guard against loss of phones, it needs to be possible to continuously synchronize files between phones. Once any information has been captured on a phone, it is synchronized to the other phones on the local network. All content from any phone is available from any phone, and thus only a single phone needs to make it back from village-based work. After a recording is made, it needs to be possible to listen to it on the other phones on the local network. This makes it easy for people to annotate each other’s recordings. This also enables users to experience the dissemination of their recordings, and to understand that a private activity of recording a narrative is tantamount to public speaking. This is useful for establishing informed consent in communities who have no previous experience of the Internet or digital archiving.
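If recordings are treated as immutable once made (an assumption of this sketch, not a claim from the paper), then “all content from any phone is available from any phone” reduces to taking the union of the phones’ file sets. A minimal illustration of such a one-way merge, in Python; Aikuma’s actual networking code is more involved:

    import shutil
    from pathlib import Path

    def merge_from_peer(peer_root: Path, local_root: Path) -> None:
        """Copy every file present under peer_root but absent locally.

        Run in both directions, this leaves both phones with
        identical content.
        """
        for src in peer_root.rglob("*"):
            if src.is_file():
                dest = local_root / src.relative_to(peer_root)
                if not dest.exists():
                    dest.parent.mkdir(parents=True, exist_ok=True)
                    shutil.copy2(src, dest)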

Third, to facilitate trouble-shooting and future digital archaeology, the file format of phones needs to be transparent. We have devised an easily-interpretable directory hierarchy for recordings and users, which permits direct manipulation of recordings. For instance, all the metadata and recordings that involve a particular speaker could be extracted from the hierarchy with a single filename pattern.
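For example, under a hypothetical layout of the kind described (our illustration, not necessarily Aikuma’s actual directory names), with recordings/<speaker>/<item>.wav and a plain text metadata sidecar next to each item, one speaker’s materials are a single pattern away:

    from pathlib import Path

    root = Path("aikuma")

    def files_for_speaker(speaker_id: str) -> list[Path]:
        """All recordings and metadata sidecars for one speaker."""
        return sorted(root.glob(f"recordings/{speaker_id}/*"))

    # Equivalently at the shell: ls aikuma/recordings/<speaker-id>/*
    print(files_for_speaker("speaker-01"))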

4 Aikuma

Thanks to proliferating smartphones, it is now relatively easy and cheap for untrained people to collect and share many sorts of recordings, for their own benefit and also for the benefit of language preservation efforts. These include oral histories, oral literature, public speaking, and discussion of popular culture. With inexpensive equipment and minimal training, a few dozen motivated people can create a hundred hours of recorded speech (approx 1M words) in a few weeks. However, adding transcription and translation by a trained linguist introduces a bottleneck: most languages will be gone before linguists can get to them.

Aikuma puts this work in the hands of language speakers. It collects recordings, respeakings, and interpretations, and organizes them for later synchronization with the cloud and archival storage. People with limited formal education and no prior experience using smartphones can readily use the app to record their stories, or listen to other people’s stories to respeak or interpret them. Literate users can supply written transcriptions and translations. Items can be rated by the linguist and language workers; highly rated items are displayed more prominently, and this may be used to influence the documentary workflow. Recordings are stored alongside a wealth of metadata, including language, GPS coordinates, speaker, and offsets for time-aligned translations and comments.
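The following sketch shows the kind of metadata record this implies; the paper names the categories (language, GPS, speaker, annotation offsets), but the field names and values here are our own hypothetical illustration, not Aikuma’s actual schema.

    # Illustrative metadata for one recording; all names are hypothetical.
    metadata = {
        "recording_id": "rec-0042",
        "language": "Usarufa",
        "speaker": "speaker-01",
        "gps": (-6.08, 145.39),   # latitude, longitude
        "date": "2013-07-14",
        # Each translation or comment is anchored at an offset (ms)
        # within the source recording.
        "annotations": [
            {"type": "translation", "offset_ms": 0,    "audio": "tr-0042a.wav"},
            {"type": "comment",     "offset_ms": 5230, "audio": "cm-0042b.wav"},
        ],
    }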

4.1 Listing and saving recordings

When the app is first started, it shows a list of available recordings, indicating whether they are respeakings or translations (Figure 3(a)). These recordings could have been made on this phone, or synced to this phone from another, or downloaded from an archive. The recording functionality is accessed by pressing the red circle, and when the user is finished, s/he is prompted to add metadata to identify the person or people who were recorded (Figure 3(b)) and the language(s) of the recording (Figure 3(c)).

Figure 3: Screens for listing and saving recordings: (a) main list; (b) adding speaker metadata; (c) adding language metadata

Page 4: Aikuma: A Mobile App for Collaborative Language Documentation · Aikuma: A Mobile App for Collaborative Language Documentation Steven Bird 1;2, Florian R. Hanke , Oliver Adams , and

4.2 Playback and commentary

When a recording is selected, the user sees a display for the individual recording, with its name, date, duration, and images of the participants; cf. Figure 4.

Figure 4: Recording playback screen

The availability of commentaries is indicated by user images beneath the timeline. Once an original recording has commentaries, their locations are displayed within the playback slider. Playback interleaves the original recording with the spoken commentary, cf. Figure 5.

Figure 5: Commentary playback screen
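The interleaving itself can be reconstructed as a simple scheduling step. A sketch (our reconstruction, not the app’s code): play the source up to each comment’s anchor point, play the comment, then resume the source.

    def playback_plan(source_len_ms, comments):
        """Interleave a source recording with time-anchored comments.

        comments: list of (offset_ms, clip, clip_len_ms) tuples, where
        offset_ms anchors the clip within the source recording.
        Returns (clip, start_ms, end_ms) playback steps.
        """
        plan, pos = [], 0
        for offset, clip, clip_len in sorted(comments):
            if offset > pos:                  # source audio up to the anchor
                plan.append(("source", pos, offset))
            plan.append((clip, 0, clip_len))  # the spoken comment
            pos = offset
        if pos < source_len_ms:               # remainder of the source
            plan.append(("source", pos, source_len_ms))
        return plan

    # A 10 s recording with one 3 s comment anchored at 4 s:
    # [('source', 0, 4000), ('cm.wav', 0, 3000), ('source', 4000, 10000)]
    print(playback_plan(10000, [(4000, "cm.wav", 3000)]))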

4.3 Gesture vs voice activation

Aikuma provides two ways to control any recording activity, using gesture or voice activation. In the gesture-activated mode, playback is started, paused, or stopped using on-screen buttons. For commentary, the user presses and holds the play button to listen to the source, and presses and holds the record button to supply a commentary, cf. Figure 2. Activity is suspended when neither button is being pressed.

In the voice-activated mode, the user puts the phone to his or her ear and playback begins automatically. Playback is paused when the user lifts the phone away from the ear. When the user speaks, playback stops and the speech is recorded and aligned with the source recording.

4.4 File storage

The app supports importing of external audio files, so that existing recordings can be put through the respeaking and oral translation processes. Storage uses a hierarchical file structure and plain text metadata formats which can be easily accessed directly using command-line tools. Files are shared using FTP. Transcripts are stored using the plain text NIST HUB-4 transcription format and can be exported in Elan format.
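As a rough illustration of what such an export involves (a sketch only: we assume simple (start_ms, end_ms, text) segments as input rather than the HUB-4 format, and real Elan EAF files carry more header and tier detail than shown here), time-aligned segments can be rendered as a minimal EAF-style XML skeleton:

    import xml.etree.ElementTree as ET

    def segments_to_eaf(segments, tier_id="transcript"):
        """Render (start_ms, end_ms, text) segments as a minimal,
        simplified EAF-style document."""
        doc = ET.Element("ANNOTATION_DOCUMENT", VERSION="2.7")
        ET.SubElement(doc, "HEADER", TIME_UNITS="milliseconds")
        time_order = ET.SubElement(doc, "TIME_ORDER")
        tier = ET.SubElement(doc, "TIER", TIER_ID=tier_id,
                             LINGUISTIC_TYPE_REF="default-lt")
        for i, (start, end, text) in enumerate(segments):
            ts1, ts2 = f"ts{2*i+1}", f"ts{2*i+2}"
            ET.SubElement(time_order, "TIME_SLOT",
                          TIME_SLOT_ID=ts1, TIME_VALUE=str(start))
            ET.SubElement(time_order, "TIME_SLOT",
                          TIME_SLOT_ID=ts2, TIME_VALUE=str(end))
            ann = ET.SubElement(ET.SubElement(tier, "ANNOTATION"),
                                "ALIGNABLE_ANNOTATION",
                                ANNOTATION_ID=f"a{i+1}",
                                TIME_SLOT_REF1=ts1, TIME_SLOT_REF2=ts2)
            ET.SubElement(ann, "ANNOTATION_VALUE").text = text
        ET.SubElement(doc, "LINGUISTIC_TYPE",
                      LINGUISTIC_TYPE_ID="default-lt", TIME_ALIGNABLE="true")
        return ET.ElementTree(doc)

    segments_to_eaf([(0, 2500, "example phrase")]).write(
        "out.eaf", encoding="utf-8", xml_declaration=True)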

4.5 Transcription

Aikuma incorporates a webserver, and clients can connect using the phone’s WiFi, Bluetooth, or USB interfaces. The app provides a browser-based transcription tool that displays the waveform for a recording along with the spoken annotations. Users listen to the source recording along with any available respeakings and oral translations, and then segment the audio and enter their own written transcriptions and translations. These are saved to the phone’s storage and displayed on the phone during audio playback.

5 Deployment

We have tested Aikuma in Papua New Guinea, Brazil, and Nepal (Bird et al., 2014). We taught members of remote indigenous communities to record narratives and orally interpret them into a language of wider communication. We collected approximately 10 hours of audio, equivalent to 100k words. We found that the networking capability facilitated the contribution of multiple members of the community who have a variety of linguistic aptitudes. We demonstrated that the platform is an effective way to engage remote indigenous speech communities in the task of building phrase-aligned bilingual speech corpora. To support large scale deployment, we are adding support for workflow management, plus interfaces to the Internet Archive and to SoundCloud for long-term preservation and social interaction.

Acknowledgments

We gratefully acknowledge support from the Australian Research Council, the National Science Foundation, and the Swiss National Science Foundation. We are also grateful to Isaac McAlister, Katie Gelbart, and Lauren Gawne for field-testing work. Aikuma development is hosted on GitHub.


References

Steven Abney and Steven Bird. 2010. The Human Language Project: building a universal corpus of the world’s languages. In Proceedings of the 48th Meeting of the Association for Computational Linguistics, pages 88–97. Association for Computational Linguistics.

Mat Bettinson. 2013. The effect of respeaking on transcription accuracy. Honours Thesis, Dept of Linguistics, University of Melbourne.

Steven Bird, Isaac McAlister, Katie Gelbart, and Lauren Gawne. 2014. Collecting bilingual audio in remote indigenous villages. Under review.

Nic de Vries, Marelie Davel, Jaco Badenhorst, Willem Basson, Febe de Wet, Etienne Barnard, and Alta de Waal. 2014. A smartphone-based ASR data collection tool for under-resourced languages. Speech Communication, 56:119–131.

Mark Dredze, Aren Jansen, Glen Coppersmith, and Ken Church. 2010. NLP on spoken documents without ASR. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 460–470. Association for Computational Linguistics.

Florian R. Hanke and Steven Bird. 2013. Large-scale text collection for unwritten languages. In Proceedings of the 6th International Joint Conference on Natural Language Processing, pages 1134–1138. Asian Federation of Natural Language Processing.

Nikolaus P. Himmelmann. 1998. Documentary and descriptive linguistics. Linguistics, 36:161–195.

Anthony C. Woodbury. 2003. Defining documentary linguistics. In Peter Austin, editor, Language Documentation and Description, volume 1, pages 35–51. London: SOAS.

Fei Xia and William D. Lewis. 2007. Multilingual structural projection across interlinearized text. In Proceedings of the North American Chapter of the Association for Computational Linguistics, pages 452–459. Association for Computational Linguistics.