

Page 1: The Language Archive Max Planck Institute for

Using the data in the archive

Jacquelijn Ringersma

The Language Archive

Max Planck Institute for Psycholinguistics

DGfS-CNRS Summer School on Linguistic Typology

Page 2: The Language Archive Max Planck Institute for

A very rich archive

Page 3: The Language Archive Max Planck Institute for

Annotated Media

Video clips, sound files & images

Multimedia Lexicon

Typed Relations within the Lexicon

A very rich archive

Described Corpus

Presenter
Presentation Notes
Context in which the MPI archive is created
Page 4: The Language Archive Max Planck Institute for
Page 5: The Language Archive Max Planck Institute for

Only metadata are open: for resources you need access rights from the owner of the data

Two warnings

Only well described resources can be found

Page 6: The Language Archive Max Planck Institute for

Content

1. Online IMDI browser
2. Metadata search
3. Viewing resources and ANNEX viewer
4. TROVA content search
5. Virtual Language Observatory
   GE overlay
   Facetted browsing

Page 7: The Language Archive Max Planck Institute for

IMDI browser

http://corpus1.mpi.nl

Presenter
Presentation Notes
Tree browsing: show different corpus structures (Marquesan, Aweti)
Context menu options
Page 8: The Language Archive Max Planck Institute for

Metadata search

Keyword search

Standard search

Advanced search

http://corpus1.mpi.nl

Page 9: The Language Archive Max Planck Institute for

Metadata search

http://corpus1.mpi.nl

Page 10: The Language Archive Max Planck Institute for

Metadata search

http://corpus1.mpi.nl

Find some resources in the DoBeS archive which are:

1. Spoken discourse with at least one consultant (2634)

2. Spoken discourse with at least two consultants (1575)

3. Spoken discourse with at least two consultants in Asia (36)

4. Or: Spoken discourse with at least two consultants in a Face to Face conversation (391)
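The four queries in this exercise stack conditions on genre, number of consultants, location and channel. A minimal Python sketch of that kind of filtering follows. The field names (`genre`, `consultants`, `continent`, `channel`) are simplified assumptions for illustration, not the IMDI schema, and the records are invented, so the hit counts bear no relation to the archive figures above.

```python
# Toy session records; every value here is invented for illustration.
sessions = [
    {"name": "s1", "genre": "Discourse", "consultants": 1,
     "continent": "South America", "channel": "Face to Face"},
    {"name": "s2", "genre": "Discourse", "consultants": 2,
     "continent": "Asia", "channel": "Face to Face"},
    {"name": "s3", "genre": "Discourse", "consultants": 3,
     "continent": "Asia", "channel": "Telephone"},
    {"name": "s4", "genre": "Singing", "consultants": 2,
     "continent": "Africa", "channel": "Face to Face"},
]


def at_least(n):
    """Condition: the field value is present and >= n."""
    return lambda value: value is not None and value >= n


def search(records, **conditions):
    """Return the names of records that satisfy every condition.

    A condition is either a plain value (tested for equality) or a
    function that takes the field value and returns True or False.
    """
    hits = []
    for rec in records:
        ok = True
        for field, wanted in conditions.items():
            value = rec.get(field)
            ok = wanted(value) if callable(wanted) else value == wanted
            if not ok:
                break
        if ok:
            hits.append(rec["name"])
    return hits


q1 = search(sessions, genre="Discourse", consultants=at_least(1))
q2 = search(sessions, genre="Discourse", consultants=at_least(2))
q3 = search(sessions, genre="Discourse", consultants=at_least(2),
            continent="Asia")
q4 = search(sessions, genre="Discourse", consultants=at_least(2),
            channel="Face to Face")
print(q1, q2, q3, q4)
```

Each extra condition can only shrink the result set, which is why the counts in the exercise drop from query 1 to query 3.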

Page 11: The Language Archive Max Planck Institute for

Metadata search

Page 12: The Language Archive Max Planck Institute for

Viewing resources

Page 13: The Language Archive Max Planck Institute for

Viewing resources

Viewing video (mpg, mpeg) and audio (wav)

Page 14: The Language Archive Max Planck Institute for

Viewing images (jpg/tiff)

Viewing resources

Page 15: The Language Archive Max Planck Institute for

Viewing text files (pdf/txt)

ANNEX: Viewing ELAN, Toolbox and Chat annotations

Viewing resources

Page 16: The Language Archive Max Planck Institute for

Content search

TROVA

Page 17: The Language Archive Max Planck Institute for

TROVA: Content search

Three options:

Simple keyword search

Single layer search (in one annotation tier)

but: Annotation/Over annotations/Within annotations

and: case (in)sensitivity

and: substring/exact match and regular expressions

Multiple layer search:

complex searches over multiple layers
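The single-layer options (substring or exact match, regular expressions, case sensitivity) can be imitated on a toy tier in Python. This is only a sketch of the idea, not TROVA's implementation: the function name `single_layer_search`, its parameters and the sample annotations are all hypothetical, and "exact" is assumed here to mean that the whole annotation equals the query.

```python
import re

# Toy annotations from one tier; invented for illustration.
tier = ["The dog barks", "a DOG", "dogs", "hot dog stand"]


def single_layer_search(annotations, query, mode="substring",
                        case_sensitive=False):
    """Return the annotations that match `query`.

    mode is "substring", "exact" (whole annotation) or "regex".
    """
    flags = 0 if case_sensitive else re.IGNORECASE
    if mode == "regex":
        pattern = re.compile(query, flags)
        return [a for a in annotations if pattern.search(a)]
    # For literal searches, escape the query so that characters such
    # as "." or "*" are not interpreted as regex operators.
    pattern = re.compile(re.escape(query), flags)
    if mode == "substring":
        return [a for a in annotations if pattern.search(a)]
    if mode == "exact":
        return [a for a in annotations if pattern.fullmatch(a)]
    raise ValueError(f"unknown mode: {mode}")


substring_hits = single_layer_search(tier, "dog")
exact_hits = single_layer_search(tier, "dogs", mode="exact")
case_hits = single_layer_search(tier, "DOG", case_sensitive=True)
regex_hits = single_layer_search(tier, r"^a\b", mode="regex")
print(substring_hits, exact_hits, case_hits, regex_hits)
```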

Page 18: The Language Archive Max Planck Institute for

TROVA: Content search

Simple search

Page 19: The Language Archive Max Planck Institute for

TROVA: Content search

Page 20: The Language Archive Max Planck Institute for

TROVA: Content search

(Over and within) Annotation

Tier selection (speech vs. words)

Page 21: The Language Archive Max Planck Institute for

TROVA: Content search

Regular expressions

Examples:

. = any character

[abc] = a, b or c

[^abc] = any character except a, b or c

b[a-zA-Z]ng matches ‘bang’ but not ‘baang’

X* = X zero or more times

X+ = X one or more times

X|Y = X or Y

^ = beginning of an annotation, $ = end of an annotation
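The operators listed above can be tried out with Python's `re` module. This is a sketch under the assumption that TROVA's regular-expression dialect behaves the same way for these basic operators; here `^` and `$` anchor to the start and end of the Python string, which stands in for one annotation. The helper `matches` and the test words are illustrative only.

```python
import re


def matches(pattern, annotation):
    """True if the pattern is found anywhere in the annotation."""
    return re.search(pattern, annotation) is not None


checks = {
    # "." stands for exactly one arbitrary character
    "dot": matches(r"b.ng", "bang"),
    # "[abc]" accepts one of the listed characters
    "class": matches(r"^[abc]$", "b"),
    # "[^abc]" rejects the listed characters
    "negated": matches(r"^[^abc]$", "b"),
    # a character class matches one character, so "baang" fails
    "one_char": matches(r"b[a-zA-Z]ng", "baang"),
    # "a*" allows zero occurrences of "a"
    "star": matches(r"^ba*ng$", "bng"),
    # "a+" needs at least one "a"
    "plus": matches(r"^ba+ng$", "bng"),
    # "X|Y" accepts either alternative
    "either": matches(r"^(bang|bing)$", "bing"),
    # anchored at both ends, "ang" must be the whole annotation
    "anchored": matches(r"^ang$", "bang"),
}

for name, result in checks.items():
    print(f"{name}: {result}")
```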

Page 22: The Language Archive Max Planck Institute for

TROVA: Content search

Regular expression:

[^n]g$ finds all annotations ending in ‘g’, but not those ending in ‘ng’
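A quick check of this pattern in Python, on invented sample annotations and assuming the search tool treats the pattern as Python's `re` does. One detail worth knowing: `[^n]` must consume a character, so an annotation consisting of a lone ‘g’ is not found; the second pattern is one possible way to include it.

```python
import re

annotations = ["dog", "sing", "big", "g", "going"]  # invented samples

# Final "g" preceded by any character other than "n".
pattern = re.compile(r"[^n]g$")
hits = [a for a in annotations if pattern.search(a)]

# Variant: also accept a "g" that is the entire annotation.
pattern_with_lone_g = re.compile(r"(^|[^n])g$")
hits_with_lone_g = [a for a in annotations if pattern_with_lone_g.search(a)]

print(hits)
print(hits_with_lone_g)
```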

Page 23: The Language Archive Max Planck Institute for

TROVA: Content search

Page 24: The Language Archive Max Planck Institute for

Virtual Language Observatory

www.clarin.eu/vlo

Page 25: The Language Archive Max Planck Institute for

Virtual Language World

Google Earth overlay:

• Geographic navigation: approach for novice users
• Google Earth is a popular, freely available tool
• KML format is widely used and easily convertible

Page 26: The Language Archive Max Planck Institute for

Virtual Language World

Place marks for
• linguistic archives
• language sites
• entry point for sets of resource bundles

Page 27: The Language Archive Max Planck Institute for

Virtual Language World

Place marks can be enriched with introductory texts, photos and direct links to the MPI archive
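To give an idea of what such an enriched place mark looks like in KML, here is a minimal Python sketch that writes one with the standard library. It is not the code behind the actual overlay: the place name, text and coordinates are placeholders, and `make_placemark` is a hypothetical helper. Only the KML 2.2 namespace and the archive URL are taken as given.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"
ET.register_namespace("", KML_NS)  # serialize without a namespace prefix


def make_placemark(name, description, lon, lat):
    """Build a KML <Placemark> element with a point location."""
    pm = ET.Element(f"{{{KML_NS}}}Placemark")
    ET.SubElement(pm, f"{{{KML_NS}}}name").text = name
    # HTML in the description is entity-escaped on serialization.
    ET.SubElement(pm, f"{{{KML_NS}}}description").text = description
    point = ET.SubElement(pm, f"{{{KML_NS}}}Point")
    # KML writes coordinates as longitude,latitude
    ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return pm


kml = ET.Element(f"{{{KML_NS}}}kml")
doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
doc.append(make_placemark(
    name="Example language site",  # placeholder name
    description=('Short introductory text. '
                 '<a href="http://corpus1.mpi.nl">Open in the archive</a>'),
    lon=5.0, lat=52.0,  # placeholder coordinates
))

kml_text = ET.tostring(kml, encoding="unicode")
print(kml_text)
```

Saved to a `.kml` file, output of this shape can be opened in Google Earth; photos would be added to the description as further HTML.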

Page 28: The Language Archive Max Planck Institute for

Facetted browser

Page 29: The Language Archive Max Planck Institute for

Facetted browser

www.clarin.eu/vlo

Find some resources in the DoBeS archive which are:

1. Spoken discourse with at least one consultant (2634)

2. Spoken discourse with at least two consultants (1575)

3. Spoken discourse with at least two consultants in Asia (36)

4. Or: Spoken discourse with at least two consultants in a Face to Face conversation (391)

Page 30: The Language Archive Max Planck Institute for

Facetted browser www.clarin.eu/vlo

Find some resources in the catalogue which are:

1. Personal anecdotes recorded in South America
2. Telephone conversation recordings in Nepal

Open the metadata files in the IMDI browser and check the full content of the metadata files
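Facetted browsing works by picking one facet value at a time and watching the counts of the remaining facets update. A small Python sketch of that mechanism is given below. The records and the facet names (`subgenre`, `continent`, `country`, `channel`) are invented for illustration and are not the catalogue's actual facets or data.

```python
from collections import Counter

# Invented catalogue records.
records = [
    {"subgenre": "Personal anecdote", "continent": "South America",
     "country": "Brazil", "channel": "Face to Face"},
    {"subgenre": "Personal anecdote", "continent": "Asia",
     "country": "Nepal", "channel": "Face to Face"},
    {"subgenre": "Conversation", "continent": "Asia",
     "country": "Nepal", "channel": "Telephone"},
    {"subgenre": "Conversation", "continent": "South America",
     "country": "Peru", "channel": "Telephone"},
]


def facet_counts(recs, facet):
    """Count how many records carry each value of the given facet."""
    return Counter(r[facet] for r in recs if facet in r)


def narrow(recs, facet, value):
    """Keep only the records whose facet has the selected value."""
    return [r for r in recs if r.get(facet) == value]


# Exercise 1: select a continent, inspect the counts, then a subgenre.
south_america = narrow(records, "continent", "South America")
remaining_subgenres = facet_counts(south_america, "subgenre")
anecdotes = narrow(south_america, "subgenre", "Personal anecdote")

# Exercise 2: select a country, then a channel.
nepal_phone = narrow(narrow(records, "country", "Nepal"),
                     "channel", "Telephone")

print(remaining_subgenres)
print(anecdotes)
print(nepal_phone)
```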

Page 31: The Language Archive Max Planck Institute for

Speech community portals

Page 32: The Language Archive Max Planck Institute for

Summary

browser

metadata search

TROVA

ANNEX viewer

GE overlay

facetted browsing

Portals

Page 33: The Language Archive Max Planck Institute for