Guiding Interaction Behaviors for Multi-modal Grounded Language Learning
Jesse Thomason, Jivko Sinapov, Raymond J. Mooney
University of Texas at Austin
Multi-Modal Grounded Linguistic Semantics
Robots need to be able to connect language to their environment in order to discuss real-world objects with humans. Grounded language learning connects a robot's sensory perception to natural language predicates. We consider a corpus of objects explored by a robot in previous work (Sinapov, IJCAI 2016).
Behaviors: grasp, lift, lower, drop, press, push. Hold and look behaviors were also performed for each object.
I Objects are sparsely annotated with language predicates from an interactive "I Spy" game with humans (Thomason, IJCAI 2016).
I Many object examples are necessary to ground language in multi-modal space.
I We explore additional sources of information from users and corpora:
  I Behavior annotations: "What behavior would you use to understand 'red'?"
  I Modality annotations: "How do you experience 'heaviness'?"
  I Word embeddings: exploiting that "thin" is close to "narrow."
Methodology
The decision d(p,o) ∈ [−1,1] for predicate p and object o is

d(p,o) = Σ_{c∈C} κ_{p,c} G_{p,c}(o),   (1)

where G_{p,c} is a linear SVM trained on labeled objects for p in the feature space of context c, and κ_{p,c} is its Cohen's kappa agreement with human labels during cross-validation.
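The decision rule in Eq. (1) can be sketched as a kappa-weighted combination of per-context classifier outputs. This is a minimal illustrative sketch, not the authors' code: the context names, the stand-in classifiers, and the normalization by the total kappa weight (one way to keep d(p,o) in [−1,1]) are assumptions.

```python
# Sketch of Eq. (1): each sensorimotor context c has a classifier G_{p,c}
# for predicate p; its output in [-1, 1] is weighted by the Cohen's kappa
# kappa_{p,c} estimated in cross-validation. Names are illustrative.

def decide(predicate, obj_features, classifiers, kappas):
    """Return d(p, o) as a kappa-weighted average over contexts.

    classifiers: {context: callable mapping features -> decision in [-1, 1]}
    kappas:      {context: Cohen's kappa for this predicate and context}
    """
    total = 0.0
    weight = 0.0
    for context, clf in classifiers.items():
        k = max(kappas.get(context, 0.0), 0.0)  # ignore unreliable contexts
        total += k * clf(obj_features[context])
        weight += k
    # Normalizing by total kappa weight (an assumption here) keeps the
    # decision in [-1, 1] even when many contexts contribute.
    return total / weight if weight > 0 else 0.0

# Toy usage: a reliable visual context outvotes an unreliable haptic one.
classifiers = {"look:color": lambda f: 1.0, "grasp:haptics": lambda f: -1.0}
kappas = {"look:color": 0.8, "grasp:haptics": 0.2}
features = {"look:color": None, "grasp:haptics": None}
print(decide("red", features, classifiers, kappas))  # positive decision
```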
[Figure: decision for the predicate "round" combines behavior-modality context classifiers (grasp, lift, look, lower over audio, haptics, color, and shape) to produce a yes/no answer.]
Data from Human Interactions
[Figure: human behavior annotations alongside estimated context reliability (κ) and the combined +Behavior Annotations weighting.]
Behavior Annotations. We gather relevant behaviors from human annotators for each predicate, allowing us to augment classifier confidences with human suggestions. The predicate "heavy" favors interaction behaviors like lifting and holding.
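One simple way to use such annotations is to re-weight which contexts contribute to a predicate's decision. The sketch below is an assumption about how the annotations could be applied (the "behavior:modality" naming, the boost/base weights, and the function itself are all illustrative, not the authors' implementation):

```python
# Sketch: down-weight contexts whose behavior annotators did not mark as
# relevant for the predicate. For "heavy", annotators favor lift/hold,
# so look- or drop-derived contexts contribute little or nothing.

def behavior_weights(contexts, annotated_behaviors, boost=1.0, base=0.0):
    """Weight each context by whether its behavior was annotated relevant."""
    weights = {}
    for context in contexts:
        behavior = context.split(":")[0]  # e.g. "lift:haptics" -> "lift"
        weights[context] = boost if behavior in annotated_behaviors else base
    return weights

contexts = ["lift:haptics", "hold:haptics", "look:color", "drop:audio"]
print(behavior_weights(contexts, {"lift", "hold"}))
```

These weights could then multiply the κ confidences when combining context classifiers.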
Modality Annotations. We also derive modality annotations from past work (Lynott, 2009), allowing us to augment classifier confidences with human intuitions. The predicate "white" favors vision-related modalities.
Word Embeddings. We calculate positive cosine similarity between predicates in word embedding space and augment each predicate's classifier confidences with similarity-weighted averages of its neighbors'. A predicate like "narrow" with few object examples can borrow information from a predicate like "thin" that is close in word embedding space and has many examples.
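The borrowing step can be sketched as a positive-cosine-similarity-weighted average over neighbor predicates. The toy vectors, confidence values, and function names below are illustrative assumptions, not the trained embeddings or the authors' code:

```python
# Sketch: a data-poor predicate ("narrow") borrows confidence from
# embedding neighbors ("thin"), weighted by positive cosine similarity.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def shared_confidence(target, confidences, embeddings):
    """Average neighbor confidences, weighted by positive cosine similarity."""
    num, den = 0.0, 0.0
    for p, conf in confidences.items():
        if p == target:
            continue
        sim = max(cosine(embeddings[target], embeddings[p]), 0.0)  # clip negatives
        num += sim * conf
        den += sim
    return num / den if den > 0 else 0.0

# Toy usage: "narrow" borrows mostly from the nearby "thin".
embeddings = {"narrow": [0.9, 0.1], "thin": [0.85, 0.2], "red": [-0.1, 0.9]}
confidences = {"thin": 0.7, "red": -0.3}
print(shared_confidence("narrow", confidences, embeddings))
```

Clipping negative similarities to zero keeps dissimilar predicates (like "red" here) from contributing at all.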
Experiment and Results
Method   p      r      f1
mc       .282   .355   .311
κ        .406   .460   .422
B+κ      .489   .489   .465
M+κ      .414   .466   .430
W+κ      .373   .474   .412
I Adding behavior annotations or modality annotations improves performance over using kappa confidence alone.
I Behavior annotations helped the f-measure of predicates like "pink", "green", and "half-full".
I Modality annotations helped with predicates like "round", "white", and "empty".
I Sharing kappa confidences across similar predicates based on their embedding cosine similarity improves recall at the cost of precision.
I For example, "round" improved, but at the expense of domain-specific meanings of predicates like "water".
[email protected] https://www.cs.utexas.edu/˜jesse/