mpi tla din a0 2011-11-16

The Language Archive

Language Data

Experts Collaboration

Tools

Projects

Max Planck Institute for Psycholinguistics

TLA's Mission

• digitize and archive language resources

• support access to language resources

• develop tools, services and infrastructures

• set up regional archives worldwide

• organize education and training activities

• give help and support

The Language Archive
Max Planck Institute for Psycholinguistics
P.O. Box 310, 6500 AH Nijmegen
Wundtlaan 1, 6525 XD Nijmegen
The Netherlands
Phone: (+31) (0)24-3521911
Fax: (+31) (0)24-3521213
eMail: [email protected]/tla

State of the Archive

• 60+ terabytes, 500,000+ files

• 73,000+ metadata sessions

• 20,000+ hours of audio/video recordings

• 60,000+ annotation files

• 4.5 million+ annotated segments

• 45+ lexica

• speech, multimodal, acquisition, multilingual, language and cognition, brain imaging, ethnological and other data

TLA is jointly funded by the Max Planck Society, the Berlin-Brandenburg Academy of Sciences and the Royal Netherlands Academy of Arts and Sciences, with substantial contributions by the Volkswagen Foundation, the European Commission, the German Ministry for Education and Research, the Dutch Science Foundation and the Max Planck Institute for Psycholinguistics.

November 2011

Language Archiving Technology LAT

TLA builds on a large archive of language resources, including primary data (multimedia recordings), secondary data (annotation, lexica, comments, etc.), and metadata. To prevent its loss, the Archive is copied to various locations including a growing number of regional archives, preserving relations, contexts and provenance information.

To keep the data interpretable in the long run, adherence to standards and a continuous curation procedure are very important. Access to the data, in the spirit of the Live Archives idea and regulated by a code of conduct and other agreements, is guaranteed to those who have access permissions to the individual resources. These permissions are defined by the depositors in four levels (from fully open to closed).

Besides the fieldwork data of about 60 DOBES projects, TLA continues to digitize and archive an increasing amount of other language-related data. Currently there are data on more than 200 languages in the archive.

Archive

Technology

The LAT software suite, started in 2000 with the multimedia annotation tool ELAN and the IMDI metadata infrastructure, covers about 15 components and tools. It is continuously being debugged, adapted and extended.

It includes tools for Resource Creation & Organization (ELAN, LEXUS, IMDI/CMDI, ARBIL, AV Recognizers), tools for Management, Upload & Infrastructure (LAMUS, IMDI/CMDI, AMS, COSIX, HANDLE, REPLIX), and tools for basic and complex resource access (IMDI/CMDI, VLO, ANNEX, IMEX, LEXUS, GIS, TROVA, VICOS).

2 Computer Centers in Munich (one from MPG)

2 Computer Centers in Göttingen (one from MPG)

2 Copies at MPI Nijmegen

Activities

TLA is involved in a number of initiatives devoted to the archiving of digital language data, to the improvement of technologies to create, manage and access language data, and to the construction of infrastructures that facilitate cross-institutional and cross-corpora access. The resulting infrastructures will allow researchers to build virtual collections and workflows to improve data access in the direction of eHumanities usage scenarios. TLA also contributes to standardization in ISO TC37/SC4 (www.tc37sc4.org), which aims at facilitating interoperability in the language resources domain.

Past Projects: MUMIS, INTERA, ISLE, LIRICS, DAM-LR (all EC), CGN (NWO), HARVE, INTER, ROR (all MPG), REPLIX, (DEISA, CLARIN-EU).

Running Projects: DOBES (VWS), CLARIN (NL, DE), DASISH, INNET, CLARA, EUDAT (all EC), AVATecH, (MPG-FhG), RELISH (DFG/NEH).

preparation

integration

utilization

RELcat / ISOcat

Ontology management framework

Archive federation

Infrastructures

Data Life Cycle Support

Data Archiving and Copying

IMDI / CMDI / GIS / VLO

Metadata Browsing & Searching

IMDI / CMDI / ARBIL

Data Organization

MetadataDescription

ELAN / LEXUS

Annotation + Lexicon

ANNEX / LEXUS / IMEX / TROVA

Complex Access via Web

VICOS

Semantic Access and Enrichment

LAMUS

Data Uploading and Management

AccessManagement

DOBES: Dokumentation Bedrohter Sprachen (Documentation of Endangered Languages)

DéĮine

Beaver

Hoocąk

Wichita

Chontal

Lacandón

Aikanã/Kwazá

Tsafiki

People of the Center

Cashinahua

Baure

Movima

Yuracaré

Uru-Chipaya

Chaco Languages

Marquesan

Tuamotuan

Minderico

Bainouk

Laal

Beezen

Bubia / Isubu

Bakola

Tima

Oyda

ǂAkhoe Haiǁom

Taa

Lower Sorbian

Kola-Sámi

Enets / Nenets

Svan / Udi / Tsova-Tush

Gorani

Khinalug

Semoq Beri / Batek

Semang

Totoli

Waima‘a

Wooi

Teop

Saliba / Logea

Savosavo

Vurës / Vera‘a

Iwaidja

Jaminjung

Nen/Tonda

Ambrym Languages

Tofa

Even

Salar / Monguor

Chintang / Puma

Tangsa / Tai / Singpho

Kurumba Languages

Sri Lanka Malay

Katxuyana

Mawé

Trumai

Kuikuro

Awetí

Bakairí

Ache

Regional archives

DOBES

MPI

Archive
