Paper at LREC2004 (May 2004, Lisbon)
Management of Metadata in Linguistic Fieldwork: Experience from the ACLA Project
Baden Hughes¹, David Penton¹, Steven Bird¹, Catherine Bow¹, Gillian Wigglesworth¹, Patrick McConvell² and Jane Simpson³
¹University of Melbourne, ²AIATSIS, ³University of Sydney
2
Overview
  Introduction
  Requirements
  Data Model
  Implementation
  Data Entry
  Reports, Queries and Searches
  Exports
  Synchronisation
  Administration
  Conclusion
3
Introduction
  A metadata creation and management tool for a multiple-fieldworker, longitudinal child language acquisition research project
  Addresses the need for principled metadata creation as well as best-practice data creation
  A challenging deployment scenario, typical of many field-oriented linguistic research and language data collection projects
4
Requirements
  Data Management
    Metadata for complex multimodal data
    Relational data for participants
    Delineation between participant roles
    Not just collection, but also reports and queries
  Research Methodology
    Integration with the analysis tool of choice
    Two-stage enquiry process: metadata first, then data
    Extensible controlled vocabularies
    User-defined fields (particularly lists)
  Technology
    Full support for data entry and enquiry in both online and offline modes
    Metadata collection with maximum utility to the project, without precluding other renderings, e.g. as an OLAC or IMDI catalogue
    Easy to install and use on multiple platforms
5
Data Model
  Tools for modelling
    DBDesigner (open source, XML-based, multi-platform)
  Challenges for modelling
    Multiple interlinked media, sessions and transcripts
    Differentiating between participants and focus children in multiple contexts
    Incomplete personal data, e.g. no date of birth
    Non-linear progression through the educational system
    Multiple types of anthropological relations
    Non-standardised linguistic classification and nomenclature
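Two of the modelling challenges above (missing dates of birth, and a person being the focus child in one context but not another) can be illustrated with a small relational sketch. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 rather than the project's MySQL, and every table name, column name and sample row is hypothetical rather than taken from the ACLA schema.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not the actual ACLA tables.
SCHEMA = """
CREATE TABLE participant (
    id            INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    date_of_birth TEXT              -- nullable: the DOB is not always known
);
CREATE TABLE session (
    id    INTEGER PRIMARY KEY,
    label TEXT NOT NULL
);
-- The role is stored per session, so the same person can be the focus
-- child in one session and an ordinary participant in another.
CREATE TABLE session_participant (
    session_id     INTEGER NOT NULL REFERENCES session(id),
    participant_id INTEGER NOT NULL REFERENCES participant(id),
    role           TEXT NOT NULL,
    PRIMARY KEY (session_id, participant_id)
);
"""


def build_demo_db():
    """Create an in-memory database with invented sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO participant VALUES (1, 'Child A', NULL)")
    conn.execute("INSERT INTO participant VALUES (2, 'Child B', '2001-03-14')")
    conn.executemany("INSERT INTO session VALUES (?, ?)", [(1, "S1"), (2, "S2")])
    conn.executemany(
        "INSERT INTO session_participant VALUES (?, ?, ?)",
        [
            (1, 1, "focus_child"),
            (1, 2, "sibling"),
            (2, 2, "focus_child"),
            (2, 1, "sibling"),
        ],
    )
    return conn


def focus_children(conn, session_id):
    """Return the names of the focus children recorded for one session."""
    rows = conn.execute(
        "SELECT p.name FROM participant p "
        "JOIN session_participant sp ON sp.participant_id = p.id "
        "WHERE sp.session_id = ? AND sp.role = 'focus_child' "
        "ORDER BY p.name",
        (session_id,),
    )
    return [row[0] for row in rows]
```

Keeping the role on the link table rather than on the participant record is one design choice that would let the same child appear in both capacities across sessions.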
6
Implementation
  Architecture
    (Fully independent) networked client-server
    Single line of code difference between client and server installation
    Underlying requirement to provide full functionality in both online and offline environments
  Technology Platform
    PHP scripting language, with PEAR
    MySQL database engine
    Apache HTTP server
    Fundamentally open source and cross-platform
8
Data Entry
  Forms-based data entry
    Participant Form
    Session Form
  Both forms feature a “build your own list” interface, which lets the end user construct a list of parameters and then apply instances of those parameters within the parent form
    educational progress
    session-media-transcript
9
Reports, Queries and Searches
  Simple Reports
    For frequently used two-dimensional queries
    e.g. participants by fieldworker
    e.g. participants by gender
  Advanced Reports
    Design-your-own query interface
  Full Text Query
    Boolean support
    Full database index query
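A simple report such as "participants by gender" can be read as a grouped count over a single table. The sketch below takes that reading, which is an assumption (the original reports may instead list the individual records). It again uses sqlite3 in place of MySQL, and the table, columns, fieldworker codes and rows are all hypothetical.

```python
import sqlite3

# Columns that may be used as the grouping dimension (hypothetical names).
ALLOWED_DIMENSIONS = {"fieldworker", "gender"}


def demo_connection():
    """Create an in-memory table with invented sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE participant ("
        "id INTEGER PRIMARY KEY, gender TEXT, fieldworker TEXT)"
    )
    conn.executemany(
        "INSERT INTO participant (gender, fieldworker) VALUES (?, ?)",
        [("F", "FW1"), ("M", "FW1"), ("F", "FW2")],
    )
    return conn


def participants_by(conn, dimension):
    """Count participants grouped by one whitelisted column."""
    # Column names cannot be bound as SQL parameters, so the value is
    # checked against a whitelist before being placed in the query text.
    if dimension not in ALLOWED_DIMENSIONS:
        raise ValueError(f"unsupported report dimension: {dimension}")
    query = (
        f"SELECT {dimension}, COUNT(*) FROM participant "
        f"GROUP BY {dimension} ORDER BY {dimension}"
    )
    return conn.execute(query).fetchall()
```

Calling `participants_by(demo_connection(), "gender")` returns one `(value, count)` pair per gender recorded in the sample table.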
11
Exports
  Generate headers for CLAN
    e.g. @participants
  Generate physical media labels
    e.g. FM025.A.DV, FM025.A.MD
  Generate file names for transcriptions
    e.g. DEV00012004049.trn
  XML-based database dump
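The first two exports above amount to string generation from stored metadata, which the following sketch illustrates. It rests on assumptions: the header layout follows the usual CHAT convention of a speaker code followed by a role and should be checked against the CHAT manual, and the three-part reading of the media label (session code, part, media type) is a guess from the examples on the slide. Both function names are hypothetical.

```python
def participants_header(participants):
    """Build a CHAT-style @Participants header line.

    `participants` is a list of (speaker_code, role) pairs,
    for example ("CHI", "Target_Child").
    """
    entries = ", ".join(f"{code} {role}" for code, role in participants)
    # CHAT headers separate the header name from its value with a tab.
    return f"@Participants:\t{entries}"


def media_label(session_code, part, media_type):
    """Join the assumed label components with dots, e.g. FM025.A.DV."""
    return ".".join([session_code, part, media_type])
```

For instance, `media_label("FM025", "A", "DV")` reproduces the first label shown on the slide. The transcription file name is left out here because the slide does not say how its digits are composed.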
12
Synchronisation
  Client -> Server
    SQL query identifies all data changed since the last sync
    Export and serialise as XML
    Compress, checksum
    Transfer over HTTP
    Checksum, uncompress
    Convert XML to SQL
    Import SQL into database
  Server -> Client is the same process in reverse
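The client-to-server steps above can be sketched end to end as follows. This is a minimal illustration, not the project's PHP implementation: it uses sqlite3, zlib and SHA-256 as stand-ins, leaves the HTTP transfer out (the payload is simply handed from one function to the other), and assumes a hypothetical `participant` table with a `modified` timestamp column stored as an ISO date string.

```python
import hashlib
import sqlite3
import xml.etree.ElementTree as ET
import zlib

COLUMNS = ("id", "name", "modified")  # hypothetical column names


def make_db():
    """Create an empty in-memory database for the client or the server."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE participant ("
        "id INTEGER PRIMARY KEY, name TEXT, modified TEXT)"
    )
    return conn


def export_changes(conn, last_sync):
    """Select rows changed since last_sync; return (payload, checksum)."""
    rows = conn.execute(
        "SELECT id, name, modified FROM participant "
        "WHERE modified > ? ORDER BY id",
        (last_sync,),
    ).fetchall()
    root = ET.Element("changes")
    for row in rows:
        record = ET.SubElement(root, "participant")
        for column, value in zip(COLUMNS, row):
            ET.SubElement(record, column).text = str(value)
    # Serialise to XML, compress, then checksum the compressed bytes.
    payload = zlib.compress(ET.tostring(root, encoding="utf-8"))
    return payload, hashlib.sha256(payload).hexdigest()


def import_changes(conn, payload, checksum):
    """Verify, uncompress and import a payload; return the row count."""
    if hashlib.sha256(payload).hexdigest() != checksum:
        raise ValueError("checksum mismatch: payload corrupted in transfer")
    root = ET.fromstring(zlib.decompress(payload))
    records = root.findall("participant")
    for record in records:
        conn.execute(
            "INSERT OR REPLACE INTO participant (id, name, modified) "
            "VALUES (?, ?, ?)",
            (
                int(record.findtext("id")),
                record.findtext("name"),
                record.findtext("modified"),
            ),
        )
    conn.commit()
    return len(records)
```

In this sketch a row that already exists on the receiving side is simply overwritten; how the real tool resolves conflicting edits is not stated on the slide.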
14
Administration
  User-facilitated editing of:
  System data
    Synchronisation: server settings
  Extensible controlled vocabularies
    Languages: linked to Ethnologue and AIATSIS codes
    Locations: geographical metadata
    Activities/tasks: both locally and globally defined
  User administration
    Access (personal metadata)
    Roles (fieldworker, administrator …)
  Project administration
    Fieldworker activity
15
Conclusion
  A notable feature is complete online and offline operation
  The research methodology is representative of many field linguistics projects
  Available for other interested parties to build on and extend
  http://www.cs.mu.oz.au/research/lt/projects/acla-db
16
Acknowledgements
  The research reported here is supported by the Australian Research Council Discovery Project Grant DP0343189.