Multimodal corpora and speech technology
Multimodal corpora and speech technology
Kristiina Jokinen
University of Art and Design Helsinki
[email protected]
22 August 2002, NordTalk NorFA Course "Using Spoken Language Corpora"
Metaphors for Human-computer interaction
- Computer as a tool
  – Passive and transparent
  – Supports the human goals, human control
- Computer as an agent
  – Intelligent software mediating interaction between the human user and an application
  – Models of beliefs, desires, intentions (BDI)
  – Complex interaction: cooperation, negotiation, multimodal communication
Research at UIAH
- Interact: cooperation with Finnish universities, IT companies, Association of the Deaf, Arla Institute
  – Finnish dialogue system
  – Rich interaction situation
  – Adaptive machine learning techniques
  – Agent-based architecture
  – www.mlab.uiah.fi/interact/
- DUMAS: EU IST project (SICS, UIAH, UTA, UMIST, Etex, Conexor, Timehouse)
  – User modelling for AthosMail (interactive email application)
  – Reinforcement learning and dialogue strategies
  – www.sics.se/~dumas/
Multimodal Museum Interfaces
- Marjo Mäenpää, Antti Raike
- Study projects
- New ways of relating art that is both visually interesting and accessible in terms of contents:
  – a virtual human (avatar) that interactively guides the user through the exhibition using both spoken and sign language
- Design for All: accessibility for virtual visitors on museum web sites
MUMIN Network
- NorFA network on MUltiModal INterfaces
- Support for contacts, cooperation, education, and research on multimodal interactive systems
- MUMIN PhD course in Tampere, 18-22 November (lectures and hands-on exercises on eye-tracking, speech interfaces, electromagnetogram, virtual worlds)
- More information and application forms: http://www.cst.dk/mumin
Content of the lecture
- Definitions and terminology
- Why multimodality
- Projects and tools
- Multimodal annotations
- Conclusions and references
Definitions and Terminology
What is multi-modality? (Mark Maybury: Dagstuhl seminar 2001)
Human-computer interaction (Gibbon et al. (2000) Handbook of Multimodal and Spoken Dialogue Systems)
- Control: manipulation and coordination of information
- Perception: transforming sensory information into higher-level representations
Terminology
Maybury and Wahlster (1998)
- Medium = material object used for presenting or saving information; physical carriers (sounds, movements, natural language)
- Code = system of symbols used for communication
- Modality = senses employed to process incoming information (vision, audition, olfaction, touch, taste) => perception
  – vs. a communication system, consisting of a code expressed through a certain medium => HCI
ISLE/NIMM definitions
- Medium = physical channel for information encoding: visual, audio, gestures
- Modality = particular way of encoding information in some medium
EAGLES definitions
- Multimodal systems represent and manipulate information from different human communication channels at multiple levels of abstraction
- Multimedia systems offer more than one device for user input to the system and for system feedback to the user, e.g. microphone, speaker, keyboard, mouse, touch screen, camera
  – do not generate abstract concepts automatically
  – do not transform the information
- Multimodal (audio-visual) speech systems utilise the same multiple channels as human communication by integrating non-verbal cues (facial expression, eye gaze, and lip movements) with ASR and speech synthesis (SS)
Why multimodality?
Why multimodal research
- Next-generation interface design will be more conversational in style
  – Flexible use of input modes depending on the setting: speech, gesture, pen, etc.
  – Broader range of users: ordinary citizens, children, the elderly, users with special needs
- Human communication research
  – CA (conversation analysis), psychologists
  – esp. nonverbal behaviour and speech
- Animated interface agents
Advantages of MM interfaces
- Redundant and/or complementary modalities can increase interpretation accuracy
  – E.g. combine ASR and lip-reading in noisy environments
- Different modalities, different benefits
  – Object references easier by pointing than by speaking
  – Commands easier to speak than to choose from a menu using a pointing device
  – Multimedia output more expressive than single-medium output
- New applications
  – Some tasks cumbersome or impossible in a single modality
  – E.g. interactive TV
Advantages (cont.)
- Freedom of choice
  – users differ in their modality preferences
  – users have different needs (Design for All)
- Naturalness
  – Transfer the habits and strategies learned in human-human communication to human-computer interaction
- Adaptation to different environmental settings or evolving environments
  – switch from one modality to another depending on external conditions (noise, light, ...)
"Disadvantages"
- Coordination and combination of modalities
  – cognitive overload of the user by stimulation with too many media
- Collection of data more expensive
  – more complex technical setup
  – increased amount of data to be collected
  – interdisciplinary know-how needed
- "Natural" remains a rather vague term
Projects and tools
EAGLES/ISLE initiatives
- EAGLES = Expert Advisory Group on Language Engineering Standards
  – Gibbon et al. (1997) Handbook on Standards and Resources for Spoken Language Systems
  – Gibbon et al. (2000) Handbook of Multimodal and Spoken Dialogue Systems: Resources, Terminology, and Product Evaluation
- ISLE/NIMM = International Standards for Language Engineering / Natural Interaction and Multi-Modality
  – discuss annotation schemes specifically for the fields of natural interaction and multi-modal research and development
  – develop guidelines for such schemes
NITE
- Dybkjaer et al. 2001
- Workbench for multilevel and multimodal annotations
- General-purpose tools: stylesheets determine the look and functionality of the user's tool
- Continues on the basis of the MATE project
- http://nite.nis.sdu.dk/
MPI Projects
- Max Planck Institute for Psycholinguistics (MPI) in Nijmegen
- Develop tools for the analysis of multimedia (esp. audiovisual) corpora
- Support scientific exploitation by linguists, anthropologists, psychologists and other researchers
- CAVA (Computer Assisted Video Analysis)
- EUDICO (European Distributed Corpora)
  – platform-independent
  – supports various storage formats
  – supports distributed operation via the internet
ATLAS/Annotation Graphs
- Framework to represent complex annotations on signals of arbitrary dimensionality
- Abstraction over the diversity of linguistic annotations, expanding on Annotation Graphs
- http://www.nist.gov/speech/atlas/
TalkBank
- Five-year interdisciplinary research project funded by NSF
- Carnegie Mellon University and the University of Pennsylvania
- Developing a number of tools and standards
- Study of human and animal communication
  – Animal Communication
  – Classroom Discourse
  – Linguistic Exploration
  – Gesture and Sign
  – Text and Discourse
- CHILDES database is viewed as a subset of TalkBank
- http://www.talkbank.org/
Annotation Tools
- Anvil (Michael Kipp): speech and gesture
- AGTK (Bird and Liberman): speech
- MMAX (Mueller and Strube): speech, gesture
- MultiTool (GSMLC: Platform for Multimodal Spoken Language Corpora): video
  – http://www.ling.gu.se/gsmlc/
Some statistics of the tools
- Dybkjaer et al. (2002): ISLE/NIMM Survey of Existing Tools, Standards and User Needs
- Speech is the key modality: 9/10 tools
- Gesture: 7/10
- Facial expression: 3/10
Annotation Graphs
- Bird et al. 2000
- Formal framework for representing linguistic annotations
- Abstracts away from file formats, coding schemes and user interfaces, providing a logical layer for annotation systems
- AGTK (Annotation Graph Toolkit): nodes encode time points, edges carry annotation labels
- http://agtk.sourceforge.net/
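The node/edge idea behind annotation graphs is easy to picture in code. A minimal Python sketch of the concept (an illustration only, not the actual AGTK API; the node ids, times, and layer names are invented):

```python
# Minimal sketch of the annotation-graph idea (Bird et al. 2000):
# nodes are time-stamped anchors, edges carry labels on named layers.

class AnnotationGraph:
    def __init__(self):
        self.nodes = {}   # node id -> time offset in seconds (or None)
        self.edges = []   # (from_node, to_node, layer, label)

    def add_node(self, node_id, time=None):
        self.nodes[node_id] = time

    def add_edge(self, src, dst, layer, label):
        self.edges.append((src, dst, layer, label))

    def labels(self, layer):
        """All labels on a given annotation layer, in edge order."""
        return [lab for (_, _, lay, lab) in self.edges if lay == layer]

g = AnnotationGraph()
for i, t in enumerate([0.00, 0.31, 0.72, 1.25]):   # hypothetical timings
    g.add_node(i, t)
for i, word in enumerate(["It", "happened", "yesterday"]):
    g.add_edge(i, i + 1, "word", word)
g.add_edge(0, 3, "dialogue-act", "statement")   # spans the whole utterance

print(g.labels("word"))   # ['It', 'happened', 'yesterday']
```

Because layers share the same time-anchored nodes, word-level and dialogue-act-level annotations stay aligned without either knowing about the other, which is exactly the "logical layer" the slide describes.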
AGTK: Discourse Annotation Tool
Anvil - Annotation of Video and Language Data
- Michael Kipp (2001)
- Java-based annotation tool for video files
- Encoding of nonverbal behaviour (e.g. gesture)
- Imports annotations of speech-related phenomena (e.g. dialogue acts) on multiple layers, "tracks"
- Track definitions according to a specific annotation scheme in Anvil's generic track configuration
- All data storage and exchange is in XML
Anvil – screen shot
Multimodal Annotation
Multi-media corpora
- Contain multi-media information where various independent streams such as speech, gesture, facial expression and eye movements are annotated and linked
- Hugely complex due to complicated time relationships between the annotations
Annotation Challenges
- Better understanding of natural communication modalities: human speech, gaze, gestures, facial expressions => how do different modalities support input disambiguation
- Behavioural issues: automaticity of human communication modes
- Multiparty communication
- Technical challenges:
  – Synchronisation
  – Error handling
  – Multimodal platforms, toolkits, architectures
Annotation Issues
- Phenomena
  – What is investigated: sounds, words, dialogue acts, coreference, new information, correction, feedback
- Theory
  – How to label, what categories
- Representation
  – Markup, e.g.:

    It happened yesterday                          (orthographic representation)
    <w>It</w> <w>happened</w> <w>yesterday</w>     (XML representation)
XML representations
- XML = eXtensible Markup Language
- Becoming a standard for data representation
  – <word>happen</word> vs. <word base="happen">
- Distinction between elements and attributes:
  – <word> <base>happen</base> <pos>verb</pos> </word>
  – <word base="happen"> <pos>verb</pos> </word>
  – <word base="happen" pos="verb"/>
- XSL = stylesheet language; XSLT converts XML documents into other documents of any form
- Does not support:
  – typed/grammar specification of attribute values
  – inference models for element values shared by more than one element
  – applicability restrictions on attributes that are mutually exclusive
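The element/attribute distinction becomes concrete when the two encodings are parsed. A small sketch using Python's standard library (illustration only; the two snippets are the slide's own examples):

```python
# Both encodings of the same word carry identical information, but a
# consumer reads them differently: text content of child elements vs.
# attributes of the element itself.
import xml.etree.ElementTree as ET

elem_style = "<word><base>happen</base><pos>verb</pos></word>"
attr_style = '<word base="happen" pos="verb"/>'

w1 = ET.fromstring(elem_style)
w2 = ET.fromstring(attr_style)

# Element style: values are the text of <base> and <pos> children
base1, pos1 = w1.findtext("base"), w1.findtext("pos")
# Attribute style: values are attributes on <word>
base2, pos2 = w2.get("base"), w2.get("pos")

print((base1, pos1) == (base2, pos2))   # True
```

This is part of why annotation frameworks such as Annotation Graphs try to abstract away from the concrete markup: the same content can be serialised in several equally valid XML shapes.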
Speech annotation (Gibbon et al. (2000) Handbook of Multimodal and Spoken Dialogue Systems)
Spoken Dialogue Annotations
- Dialogue Acts (Communicative Acts)
  – GCSL: acceptance, acknowledgement, agreement, answer, confirmation, question, request, etc.
  – Interact:
- Feedback
  – structure, position, function
- Turn management
  – overlap (give attention, affirmation, reminder, excuse, hesitation, disagreement, lack of hearing)
  – opening/closing an activity
Interact tags (Jokinen et al. 2001)

Dialogue act       | freq | %    | Example
statement          | 527  | 23.5 | Eiköhän se löydy sitten. (I'm sure I'll find it.)
acknowledgement    | 389  | 17.4 | Joo (ok, right)
question           | 237  | 10.6 | Ja kauanko sinne on ajoaika? (And how long does it take to get there?)
answer             | 213  | 9.5  | Se tulee noin 15 minuuttia tohon Oulunkylään. (It'll take about 15 minutes to Oulunkylä.)
confirmation       | 162  | 7.2  | Suunnilleen joo. (Approximately, yes.)
opening            | 158  | 7.0  | Mä oon X X hei. (Hello, my name is X X.)
check              | 123  | 5.5  | Eli kuudelt lähtee ensimmäiset. (So the first ones depart at 6 o'clock.)
thanking           | 112  | 5.0  | Kiitoksia paljon. (Thanks a lot.)
repetition         | 107  | 4.8  | Kaheksan kolkyt kolme. (At 8.33 a.m.)
ending             | 100  | 4.5  | Hei. (Bye.)
call_to_continue   | 45   | 2.0  | Joo-o. (Uh-huh.)
wait               | 23   | 1.0  | Katsotaan, hetkinen vaan. (Let's see, just a minute.)
correction         | 19   | 0.8  | Ei vaan se on edellinen se Uintikeskuksen pysäkki. (No, the Uintikeskus stop is the previous one.)
completion         | 10   | 0.4  | ...kymmentä joo. (...ten, right.)
request_to_repeat  | 10   | 0.4  | Anteeks mitä? (Sorry?)
sigh               | 6    | 0.2  | Voi kauhee. (Oh dear.)
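The percentage column is simply each tag's share of all labelled acts (the frequency column sums to 2241). A quick sanity check in Python, an illustration only, not part of the Interact toolset:

```python
# Frequencies transcribed from the Interact tag table above.
freq = {"statement": 527, "acknowledgement": 389, "question": 237,
        "answer": 213, "confirmation": 162, "opening": 158, "check": 123,
        "thanking": 112, "repetition": 107, "ending": 100,
        "call_to_continue": 45, "wait": 23, "correction": 19,
        "completion": 10, "request_to_repeat": 10, "sigh": 6}

total = sum(freq.values())                                  # 2241
pct = {tag: round(100 * n / total, 1) for tag, n in freq.items()}

print(total, pct["statement"], pct["acknowledgement"])      # 2241 23.5 17.4
```

The recomputed shares match the table's percentages, confirming that they are relative to the whole labelled corpus.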
Non-linguistic Vocalizations
CHRISTINE corpus
- Simple descriptions: belch, clearsThroat, cough, crying, giggle, humming, laugh, laughing, moan, onTelephone, panting, raspberry, scream, screaming, sigh, singing, sneeze, sniff, whistling, yawn
- More complex descriptions: imitates woman's voice, imitating a sexy woman's voice, imitating Chinese voice, imitating drunken voice, imitating man's voice, imitating posh voice, mimicking police siren, mimicking Birmingham accent, mimicking Donald Duck, mimicking stupid man's voice, mimicking, speaking in French, spelling, whingeing, face-slapping noise, drowning noises, imitates sound of something being unscrewed and popped off, imitates vomiting, makes drunken sounds and a pretend belch, makes running noises, sharp intake of breath, click
- Non-vocal events: loud music and conversation, banging noise, break in recording, car starts up, cat noises, children shouting, dog barks, poor quality recording, traffic noise, loud music is on, microphone too far away, mouth full, telephone rings, beep, clapping, tapping on computer, television
Gesture Annotation 1
- Different types:
  – iconic, pointing, emblematic
- Different functions:
  – make speech understanding easier
  – make speech production easier
  – add semantic and discourse-level information
Gesture Annotation 2
- What to annotate
  – Time
  – Movement encoding
  – Body parts involved (head, hand, fingers)
  – Static vs dynamic components
  – Direction, path shape, hand orientation
  – Location with respect to body
LIMSI Coding Schema for MM Dialogues (Car Driver & Co-pilot)
- general: v stands for verbal, g for gesture, c for the human copilot, p for the human pilot; / and \ mark the begin and end of a gesture; % marks a comment written by the encoder; [ and ] define successive segments of the itinerary ({ and } code subparts of such segments)
- time: < timecode-begin / timecode-end >
- body part: te=tête (head), ma=main (hand), mo=menton (chin), ms=mains (both hands)
- fingers: ix=index (first finger), mj=majeur (middle finger), an=annulaire (ring finger), au=auriculaire (little finger), po=pouce (thumb)
- gaze: oc=short glance at the map, ol=long glance at the map
- shape of the body part: td=tendu (tense), sp=souple (loose), cr=crochet (hook)
- global movement: mv=mouvement ample (wide movement), r=mouvements répétés (repeated movements), ( )=statique (static)
- direction of movement: ar=arrière (backwards), tr=transversal (side), ci=circular
- meaning of gesture: ds=designation, ca=designation on the map, dr=direction, dc=description, pc=position
LIMSI Coding Schema
Example:
  v(p): et maintenant? ("and now?")
  v(c): on va, non /là-bas je/ pense, tout droit ("we'll go, no, /over there I/ think, straight ahead")
  g(c): ixtddr
  graphic(copilot): index finger, tense, direction
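Because the codes concatenate fixed two-letter fields, a gesture label such as ixtddr can be decoded mechanically. A hypothetical sketch: the lookup tables are transcribed from the schema above, but the decoder itself is illustrative and assumes the finger-shape-meaning field order seen in this example:

```python
# Illustrative decoder for LIMSI-style gesture codes such as "ixtddr"
# (index finger, tense, direction). The tables come from the schema;
# the fixed field order is an assumption for this sketch.

FINGERS = {"ix": "first finger", "mj": "middle finger",
           "an": "ring finger", "au": "little finger", "po": "thumb"}
SHAPES = {"td": "tense", "sp": "loose", "cr": "hook"}
MEANINGS = {"ds": "designation", "ca": "designation on the map",
            "dr": "direction", "dc": "description", "pc": "position"}

def decode(code):
    """Split a gesture code into consecutive two-letter fields and look
    each one up in the corresponding table."""
    fields = [code[i:i + 2] for i in range(0, len(code), 2)]
    return [table.get(field, "?")
            for table, field in zip([FINGERS, SHAPES, MEANINGS], fields)]

print(decode("ixtddr"))   # ['first finger', 'tense', 'direction']
```

The compactness that makes such codes fast to write by hand is also what makes them hard to read without tool support, one motivation for the annotation workbenches surveyed earlier.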
Gesture Coding Schemas 1 (Dybkjaer et al. (2002) ISLE/NIMM Survey on MM tools and resources)
Gesture Coding Schemas 2 (Dybkjaer et al. (2002) ISLE/NIMM Survey on MM tools and resources)
Facial Action Coding System (FACS)
- P. Ekman & W. Friesen (1976)
- Describes visible facial movements
- Anatomically based
- Action Unit (AU): action produced by one muscle or a group of related muscles
- Any expression can be described as a set of AUs
- 46 AUs defined
AUs for raising eyebrows
Dybkjaer et al (2002) ISLE/NIMM Survey of Annotation Schemes and Identification of Best Practise
Alphabet of the eyes
I. Poggi, N. Pezzato, C. Pelachaud
Gaze annotation
– eyebrow movements, eyelid openness, wrinkles, eye direction, eye reddening, humidity
E.g. eyebrows (right/left):
– internal: up / down
– central: up / down
– external: up / down
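One way to see how these features combine into a single gaze annotation is to store them as a structured record. The field names and the up/down eyebrow encoding below are illustrative assumptions for a sketch, not the authors' actual annotation format.

```python
# Hedged sketch: storing a gaze annotation covering the features listed
# by Poggi, Pezzato & Pelachaud. Field names and value encodings are
# illustrative assumptions, not the authors' file format.

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class EyebrowState:
    # Each eyebrow region is annotated independently as "up", "down" or None.
    internal: Optional[str] = None
    central: Optional[str] = None
    external: Optional[str] = None

@dataclass
class GazeAnnotation:
    eyelid_openness: str = "open"
    eye_direction: str = "straight"
    wrinkles: bool = False
    reddening: bool = False
    humidity: bool = False
    right_brow: EyebrowState = field(default_factory=EyebrowState)
    left_brow: EyebrowState = field(default_factory=EyebrowState)

g = GazeAnnotation(eye_direction="up",
                   right_brow=EyebrowState(internal="up", external="up"))
print(g.right_brow.internal)  # up
```

Keeping the right and left eyebrows as separate sub-records mirrors the right/left, internal/central/external decomposition on the slide.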
Conclusions
Need for corpora annotated with multimodal information
Much remains to be done in coding MM information: all forms, the relevant level of detail, cross-level & cross-modality
No general coding schemas
– separate coding schemas for different aspects of facial expression, task-dependent gestures etc.
– no cross-modality coding schemas
Lack of theoretical formalisation
– how the face expresses cognitive properties
– how gestures are used (except for sign language)
– how they are coordinated with speech
No general annotation tools
References
Bernsen, N. O., Dybkjær, L. and Kolodnytsky, M.: The NITE Workbench - A Tool for Annotation of Natural Interactivity and Multimodal Data. Proceedings of the Third International Conference on Language Resources and Evaluation (LREC 2002), Las Palmas, May 2002.
Bird, S. and Liberman, M. A formal framework for linguistic annotation. Speech Communication, 33(1-2):23-60, 2001.
Dybkjaer et al. (2002). ISLE/NIMM reports. http://isle.nis.sdu.dk/reports/wp11/
Gibbon, D., Mertins, I. and Moore, R. (eds.) Handbook of Multimodal and Spoken Dialogue Systems: Resources, Terminology, and Product Evaluation. Kluwer, 2000.
Granström, B. (ed.) Multimodality in Language and Speech Systems. Dordrecht: Kluwer, 2002.
Kipp, M. Anvil - A Generic Annotation Tool for Multimodal Dialogue. Proceedings of Eurospeech 2001, pp. 1367-1370, Aalborg, September 2001.
Maybury, M. T. and Wahlster, W. (1998). Readings in Intelligent User Interfaces. San Francisco, CA: Morgan Kaufmann.
Müller, C. and Strube, M. MMAX: A tool for the annotation of multi-modal corpora. In Proceedings of the 2nd IJCAI Workshop on Knowledge and Reasoning in Practical Dialogue Systems, Seattle, Washington, pp. 45-50, 2001.
Wahlster, W. (ed.) Dagstuhl seminar on Multimodality. http://www.dfki.de/~wahlster/Dagstuhl_Multi_Modality/