![Page 1: The Language Archive Max Planck Institute for](https://reader031.vdocuments.us/reader031/viewer/2022011809/61d44fa9248eff6ec40c4e38/html5/thumbnails/1.jpg)
Using the data in the archive
Jacquelijn Ringersma
The Language Archive, Max Planck Institute for Psycholinguistics
DGfS-CNRS Summer School on Linguistic Typology
![Page 2: The Language Archive Max Planck Institute for](https://reader031.vdocuments.us/reader031/viewer/2022011809/61d44fa9248eff6ec40c4e38/html5/thumbnails/2.jpg)
A very rich archive
![Page 3: The Language Archive Max Planck Institute for](https://reader031.vdocuments.us/reader031/viewer/2022011809/61d44fa9248eff6ec40c4e38/html5/thumbnails/3.jpg)
Annotated Media
Video clips, sound files & images
Multimedia Lexicon
Typed Relations within the Lexicon
A very rich archive
Described Corpus
![Page 4: The Language Archive Max Planck Institute for](https://reader031.vdocuments.us/reader031/viewer/2022011809/61d44fa9248eff6ec40c4e38/html5/thumbnails/4.jpg)
![Page 5: The Language Archive Max Planck Institute for](https://reader031.vdocuments.us/reader031/viewer/2022011809/61d44fa9248eff6ec40c4e38/html5/thumbnails/5.jpg)
Two warnings
Only metadata are open: for resources you need access rights from the owner of the data
Only well-described resources can be found
![Page 6: The Language Archive Max Planck Institute for](https://reader031.vdocuments.us/reader031/viewer/2022011809/61d44fa9248eff6ec40c4e38/html5/thumbnails/6.jpg)
Content
1. Online IMDI browser
2. Metadata search
3. Viewing resources and ANNEX viewer
4. TROVA content search
5. Virtual Language Observatory
   • GE overlay
   • Facetted browsing
![Page 7: The Language Archive Max Planck Institute for](https://reader031.vdocuments.us/reader031/viewer/2022011809/61d44fa9248eff6ec40c4e38/html5/thumbnails/7.jpg)
IMDI browser
http://corpus1.mpi.nl
![Page 8: The Language Archive Max Planck Institute for](https://reader031.vdocuments.us/reader031/viewer/2022011809/61d44fa9248eff6ec40c4e38/html5/thumbnails/8.jpg)
Metadata search
Keyword search
Standard search
Advanced search
http://corpus1.mpi.nl
![Page 9: The Language Archive Max Planck Institute for](https://reader031.vdocuments.us/reader031/viewer/2022011809/61d44fa9248eff6ec40c4e38/html5/thumbnails/9.jpg)
Metadata search
http://corpus1.mpi.nl
![Page 10: The Language Archive Max Planck Institute for](https://reader031.vdocuments.us/reader031/viewer/2022011809/61d44fa9248eff6ec40c4e38/html5/thumbnails/10.jpg)
Metadata search http://corpus1.mpi.nl
Find some resources in the DoBeS archive which are:
1. Spoken discourse with at least one consultant (2634)
2. Spoken discourse with at least two consultants (1575)
3. Spoken discourse with at least two consultants in Asia (36)
4. Or: spoken discourse with at least two consultants in a Face to Face conversation (391)
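The exercise narrows a result set by adding one metadata condition at a time. A minimal sketch of that idea follows; the records and the field names (`genre`, `continent`, `channel`, `consultants`) are hypothetical toy data, not the actual IMDI schema or the archive's search interface.

```python
# Toy session records; all values are invented for illustration only.
sessions = [
    {"name": "a", "genre": "Discourse", "continent": "Asia",
     "channel": "Face to Face", "consultants": 1},
    {"name": "b", "genre": "Discourse", "continent": "Asia",
     "channel": "Face to Face", "consultants": 2},
    {"name": "c", "genre": "Discourse", "continent": "South-America",
     "channel": "Face to Face", "consultants": 3},
    {"name": "d", "genre": "Discourse", "continent": "Africa",
     "channel": "Telephone", "consultants": 2},
    {"name": "e", "genre": "Singing", "continent": "Asia",
     "channel": "Face to Face", "consultants": 2},
]


def search(records, min_consultants=1, **equals):
    """Keep records with enough consultants whose fields equal the given values."""
    return [
        r for r in records
        if r["consultants"] >= min_consultants
        and all(r.get(field) == value for field, value in equals.items())
    ]


# Each query adds one constraint to the previous one, as in the exercise.
q1 = search(sessions, genre="Discourse")
q2 = search(sessions, min_consultants=2, genre="Discourse")
q3 = search(sessions, min_consultants=2, genre="Discourse", continent="Asia")
q4 = search(sessions, min_consultants=2, genre="Discourse", channel="Face to Face")

for label, hits in [("q1", q1), ("q2", q2), ("q3", q3), ("q4", q4)]:
    print(label, [r["name"] for r in hits])
```

Queries 3 and 4 both start from query 2 and add a different condition, which is why the fourth item in the exercise is introduced with "Or".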
![Page 11: The Language Archive Max Planck Institute for](https://reader031.vdocuments.us/reader031/viewer/2022011809/61d44fa9248eff6ec40c4e38/html5/thumbnails/11.jpg)
Metadata search
![Page 12: The Language Archive Max Planck Institute for](https://reader031.vdocuments.us/reader031/viewer/2022011809/61d44fa9248eff6ec40c4e38/html5/thumbnails/12.jpg)
Viewing resources
![Page 13: The Language Archive Max Planck Institute for](https://reader031.vdocuments.us/reader031/viewer/2022011809/61d44fa9248eff6ec40c4e38/html5/thumbnails/13.jpg)
Viewing resources
Viewing video (mpg, mpeg) and audio (wav)
![Page 14: The Language Archive Max Planck Institute for](https://reader031.vdocuments.us/reader031/viewer/2022011809/61d44fa9248eff6ec40c4e38/html5/thumbnails/14.jpg)
Viewing images (jpg/tiff)
Viewing resources
![Page 15: The Language Archive Max Planck Institute for](https://reader031.vdocuments.us/reader031/viewer/2022011809/61d44fa9248eff6ec40c4e38/html5/thumbnails/15.jpg)
Viewing text files (pdf/txt)
ANNEX: Viewing ELAN, Toolbox and Chat annotations
Viewing resources
![Page 16: The Language Archive Max Planck Institute for](https://reader031.vdocuments.us/reader031/viewer/2022011809/61d44fa9248eff6ec40c4e38/html5/thumbnails/16.jpg)
Content search
TROVA
![Page 17: The Language Archive Max Planck Institute for](https://reader031.vdocuments.us/reader031/viewer/2022011809/61d44fa9248eff6ec40c4e38/html5/thumbnails/17.jpg)
TROVA: Content search
Three options:
Simple keyword search
Single layer search (in one annotation tier)
but: Annotation/Over annotations/Within annotations
and: case (in)sensitivity
and: substring/exact match and regular expressions
Multiple layer search:
complex searches over multiple layers
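The single layer options (substring or exact match, case sensitivity, regular expressions) can be illustrated with a small sketch. It uses Python's `re` module on a hypothetical list of annotation values; the function name `matches` and the sample tier are assumptions, and TROVA's own implementation may behave differently.

```python
import re


def matches(annotation, query, mode="substring", case_sensitive=False):
    """Return True if the annotation matches the query under the given mode."""
    flags = 0 if case_sensitive else re.IGNORECASE
    if mode == "substring":
        # The query may occur anywhere inside the annotation.
        return re.search(re.escape(query), annotation, flags) is not None
    if mode == "exact":
        # The query must cover the whole annotation.
        return re.fullmatch(re.escape(query), annotation, flags) is not None
    if mode == "regex":
        # The query is interpreted as a regular expression.
        return re.search(query, annotation, flags) is not None
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical annotation values from one tier.
tier = ["The dog barks", "dog", "Dogs", "a bulldog"]

substring_hits = [a for a in tier if matches(a, "dog")]
exact_hits = [a for a in tier if matches(a, "dog", mode="exact")]
case_hits = [a for a in tier if matches(a, "Dog", case_sensitive=True)]

print(substring_hits)
print(exact_hits)
print(case_hits)
```

With these toy values, the substring search finds all four annotations, the exact search only finds the annotation that consists of "dog" alone, and the case-sensitive search for "Dog" only finds "Dogs".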
![Page 18: The Language Archive Max Planck Institute for](https://reader031.vdocuments.us/reader031/viewer/2022011809/61d44fa9248eff6ec40c4e38/html5/thumbnails/18.jpg)
TROVA: Content search
Simple search
![Page 19: The Language Archive Max Planck Institute for](https://reader031.vdocuments.us/reader031/viewer/2022011809/61d44fa9248eff6ec40c4e38/html5/thumbnails/19.jpg)
TROVA: Content search
![Page 20: The Language Archive Max Planck Institute for](https://reader031.vdocuments.us/reader031/viewer/2022011809/61d44fa9248eff6ec40c4e38/html5/thumbnails/20.jpg)
TROVA: Content search
(Over and within) Annotation
Tier selection (speech vs. words)
![Page 21: The Language Archive Max Planck Institute for](https://reader031.vdocuments.us/reader031/viewer/2022011809/61d44fa9248eff6ec40c4e38/html5/thumbnails/21.jpg)
TROVA: Content search
Regular expressions
Examples:
. = any character
[abc] = a, b or c
[^abc] = any character except a, b or c
b[a-zA-Z]ng matches ‘bang’ but not ‘baang’
X* = X zero or more times
X+ = X one or more times
X|Y = X or Y
^ = beginning of an annotation, $ = end of an annotation
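The operators listed above can be tried out with Python's `re` module. This is only a sketch: the slides do not say which regular expression dialect TROVA uses, so small differences are possible, and the test strings are invented.

```python
import re


def found(pattern, annotation):
    """True if the pattern matches somewhere in the annotation."""
    return re.search(pattern, annotation) is not None


# (pattern, annotation) pairs covering each operator from the list.
cases = [
    (r"b.ng", "bing"),                # . = any character
    (r"[abc]", "xyz"),                # [abc] = a, b or c
    (r"[^abc]", "abc"),               # [^abc] = anything except a, b or c
    (r"[^abc]", "abcd"),
    (r"^b[a-zA-Z]ng$", "bang"),       # exactly one letter between b and ng
    (r"^b[a-zA-Z]ng$", "baang"),
    (r"^ba*ng$", "bng"),              # * = zero or more times
    (r"^ba+ng$", "bng"),              # + = one or more times
    (r"^(cat|dog)$", "dog"),          # | = alternation
    (r"^the", "see the dog"),         # ^ = beginning of the annotation
    (r"dog$", "see the dog"),         # $ = end of the annotation
]

results = {case: found(*case) for case in cases}
for (pattern, annotation), hit in results.items():
    print(f"{pattern!r:20} {annotation!r:15} {hit}")
```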
![Page 22: The Language Archive Max Planck Institute for](https://reader031.vdocuments.us/reader031/viewer/2022011809/61d44fa9248eff6ec40c4e38/html5/thumbnails/22.jpg)
TROVA: Content search
Regular expression:
[^n]g$ finds all annotations ending in ‘g’, but not those ending in ‘ng’
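A quick way to check this pattern is to run it over a few sample annotations. The sketch below uses Python's `re` module and invented annotation values; it assumes TROVA treats `[^n]` and `$` the same way.

```python
import re

pattern = re.compile(r"[^n]g$")

# Invented annotation values for illustration.
annotations = ["dog", "sing", "big", "running", "g"]

hits = [a for a in annotations if pattern.search(a)]
print(hits)

# Variant that also accepts an annotation consisting of a single "g".
variant = re.compile(r"(^|[^n])g$")
variant_hits = [a for a in annotations if variant.search(a)]
print(variant_hits)
```

Note that `[^n]` requires some character before the final ‘g’, so an annotation that consists only of "g" is not found by the first pattern. The variant `(^|[^n])g$` covers that case as well.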
![Page 23: The Language Archive Max Planck Institute for](https://reader031.vdocuments.us/reader031/viewer/2022011809/61d44fa9248eff6ec40c4e38/html5/thumbnails/23.jpg)
TROVA: Content search
![Page 25: The Language Archive Max Planck Institute for](https://reader031.vdocuments.us/reader031/viewer/2022011809/61d44fa9248eff6ec40c4e38/html5/thumbnails/25.jpg)
Virtual Language World
Google Earth overlay:
• Geographic navigation: approach for novice users
• Google Earth is a popular, freely available tool
• KML format is widely used and easily convertible
![Page 26: The Language Archive Max Planck Institute for](https://reader031.vdocuments.us/reader031/viewer/2022011809/61d44fa9248eff6ec40c4e38/html5/thumbnails/26.jpg)
Virtual Language World
Place marks for
• linguistic archives
• language sites
• entry point for sets of resource bundles
![Page 27: The Language Archive Max Planck Institute for](https://reader031.vdocuments.us/reader031/viewer/2022011809/61d44fa9248eff6ec40c4e38/html5/thumbnails/27.jpg)
Virtual Language World
Place marks can be enriched with introductory texts, photos and direct links to the MPI archive
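To make the KML idea concrete, here is a minimal sketch that builds one place mark with a name, a short description containing a link, and a point location. The site name and coordinates are hypothetical placeholders, and this is not the overlay file actually distributed by the archive; only the standard KML elements (`Placemark`, `name`, `description`, `Point`, `coordinates`) are used.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def make_placemark_kml(name, description_html, lon, lat):
    """Return a KML document (as text) containing a single place mark."""
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
    ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
    # HTML in the description is escaped by ElementTree on output.
    ET.SubElement(placemark, f"{{{KML_NS}}}description").text = description_html
    point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
    # KML stores coordinates as longitude,latitude[,altitude].
    ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat},0"
    return ET.tostring(kml, encoding="unicode")


# Hypothetical language site; the name and coordinates are placeholders.
kml_text = make_placemark_kml(
    name="Example language site",
    description_html='Short introduction. <a href="http://corpus1.mpi.nl">Open archive</a>',
    lon=5.85,
    lat=51.82,
)
print(kml_text)
```

Saving the printed text to a file with the extension `.kml` should give a file that Google Earth can open.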
![Page 28: The Language Archive Max Planck Institute for](https://reader031.vdocuments.us/reader031/viewer/2022011809/61d44fa9248eff6ec40c4e38/html5/thumbnails/28.jpg)
Facetted browser
![Page 29: The Language Archive Max Planck Institute for](https://reader031.vdocuments.us/reader031/viewer/2022011809/61d44fa9248eff6ec40c4e38/html5/thumbnails/29.jpg)
Facetted browser www.clarin.eu/vlo
Find some resources in the DoBeS archive which are:
1. Spoken discourse with at least one consultant (2634)
2. Spoken discourse with at least two consultants (1575)
3. Spoken discourse with at least two consultants in Asia (36)
4. Or: spoken discourse with at least two consultants in a Face to Face conversation (391)
![Page 30: The Language Archive Max Planck Institute for](https://reader031.vdocuments.us/reader031/viewer/2022011809/61d44fa9248eff6ec40c4e38/html5/thumbnails/30.jpg)
Facetted browser www.clarin.eu/vlo
Find some resources in the catalogue which are:
1. Personal anecdotes recorded in South America
2. Telephone conversation recordings in Nepal
Open the metadata files in the IMDI browser and check the full content of the metadata files
![Page 31: The Language Archive Max Planck Institute for](https://reader031.vdocuments.us/reader031/viewer/2022011809/61d44fa9248eff6ec40c4e38/html5/thumbnails/31.jpg)
Speech community portals
![Page 32: The Language Archive Max Planck Institute for](https://reader031.vdocuments.us/reader031/viewer/2022011809/61d44fa9248eff6ec40c4e38/html5/thumbnails/32.jpg)
Summary
browser
metadata search
TROVA
ANNEX viewer
GE overlay
facetted browsing
Portals
![Page 33: The Language Archive Max Planck Institute for](https://reader031.vdocuments.us/reader031/viewer/2022011809/61d44fa9248eff6ec40c4e38/html5/thumbnails/33.jpg)