TRANSCRIPT
PERCEPTION-LINK BEHAVIOR MODEL:
REVISIT ENCODER & DECODER (IMI PhD Presentation)
Presenter: William Gu Yuanlong (PhD student)
Supervisor: Assoc. Prof. Gerald Seet Gim Lee
Co-Supervisor: Prof. Nadia Magnenat-Thalmann
CONTENT
• Introduction
• Summary of reviewed interface
• Overview of the proposed framework
• Encoder and Decoder
• Conclusion
• Future work
TELEPRESENCE
Telepresence (sense of being there) vs. "tele" social presence (sense of being together) [1]
Reference [1] F. Biocca et al., “The networked minds measure of social presence: Pilot test of the factor structure and concurrent validity,” in International Workshop on Presence, 2001.
COMMUNICATION MEDIUMS
Reference
[1] E. Paulos, "Personal Tele-Embodiment," University of California at Berkeley, 2002.
[2] K. M. Tsui et al., "Towards Measuring the Quality of Interaction: Communication through Telepresence Robots," in Performance Metrics for Intelligent Systems Workshop, 2012.
• Distance telecommunication
  • An essential tool
  • Advantages
    • Improves productivity
    • Eases constraints on resources
• Face-to-face communication
  • The gold standard: how you say it is more important than what you say
  • Advantage
    • More social richness
MOTIVATION
Reference
[1] E. Paulos, "Personal Tele-Embodiment," University of California at Berkeley, 2002.
[2] C. Breazeal, "MeBot: A robotic platform for socially embodied telepresence," in Proc. 5th ACM/IEEE International Conference on Human-Robot Interaction, 2010.
[3] K. Hasegawa and Y. Nakauchi, "Preliminary Evaluation of a Telepresence Robot Conveying Pre-motions for Avoiding Speech Collisions," in Proc. International Conference on Human-Agent Interaction, 2013.
[Figure: degree of social presence vs. anthropomorphism in terms of appearance and functionality, plotting PRoP [1], MeBot [2], Hasegawa's Bot [3], EDGAR, and face-to-face interaction.]
EDGAR
• Wider range of nonverbal cues; less certain postures
• Life-sized system
• Rear-projection robotic head for a realistic face display
Commercial TPR
• Limited nonverbal cues
• Semi-autonomous behavior
Existing academic TPR (MeBot and Hasegawa's Bot)
• Wider range of nonverbal cues
• Smaller systems
• Control systems contradict each other: passive model controller vs. natural interface
Goals:
- Improve the existing telepresence robot in terms of social presence.
- Two aspects of the work were explored:
  1) Physical appearance (EDGAR)
  2) Operator's interface (PLB)
SUMMARY: REVIEW OF THE OPERATOR’S INTERFACE
Reference
[1] C. Breazeal, "MeBot: A robotic platform for socially embodied telepresence," in Proc. 5th ACM/IEEE International Conference on Human-Robot Interaction, 2010.
[2] K. Hasegawa and Y. Nakauchi, "Preliminary Evaluation of a Telepresence Robot Conveying Pre-motions for Avoiding Speech Collisions," in Proc. International Conference on Human-Agent Interaction, 2013.
[3] H. Park, E. Kim, S. Jang, and S. Park, "HMM-based gesture recognition for robot control," in Pattern Recognition and Image Analysis, 2005, pp. 607–614.
[4] J. M. Susskind et al., "Generating Facial Expressions with Deep Belief Nets," in Affective Computing, Emotion Modeling, Synthesis and Recognition, 2008.
GENERAL FRAMEWORK
• Perception-link behavior system integration
• Encodes various features into their styles
  • Convolutional Neural Network with Restricted Boltzmann Machine and max pooling [1]
• Associates the styles of various features, for both operator and interactants
  • FUSION adaptive resonance theory [2]
• Decodes the current state based on the style and the previous state
  • Factored gated Restricted Boltzmann Machine [3]
• Natural interface
Reference
[1] H. Lee et al., "Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations," in Proc. 26th Annual International Conference on Machine Learning, 2009.
[2] A. Tan et al., "Intelligence through interaction: Towards a unified theory for learning," in Advances in Neural Networks, 2007.
[3] R. Memisevic and G. E. Hinton, "Learning to represent spatial transformations with factored higher-order Boltzmann machines," Neural Computation, 2010.
A novel, flexible model that exhibits expressive nonverbal cues without compromising safety or operator cognitive load.
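The three stages above (encoder, associator, decoder) can be wired together conceptually as below. This is a minimal sketch of the data flow only; the function name `plb_step` and the callable interfaces are hypothetical, and each stage (CNN-RBM encoder, fusion-ART associator, FRBM decoder) is assumed to be trained separately and passed in as a callable.

```python
# Hypothetical wiring of the three PLB stages; each stage's trained
# model is assumed to be exposed here as a plain callable.

def plb_step(operator_frames, interactant_frames,
             encoder, associator, decoder, prev_state):
    """One perception-link behavior update.

    1. Encode raw features of both parties into style codes.
    2. Associate the two styles into a joint style z_t.
    3. Decode the robot's next state from z_t and the previous state.
    """
    z_op = encoder(operator_frames)      # operator's style
    z_in = encoder(interactant_frames)   # interactant's style
    z_t = associator(z_op, z_in)         # joint style code
    return decoder(z_t, prev_state)      # next robot state
```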
REVISITING ENCODER
• Revisited the gesture encoder
• Additional database
• Compared various unsupervised methods
  • BOW – K-means
  • BOW – GMM
  • CNN-RBM-Max
• Evaluated via intra- and inter-cluster distances between known labels
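The intra-/inter-cluster evaluation mentioned above can be sketched as follows; a minimal sketch assuming encodings are NumPy row vectors with integer labels (the function name `cluster_distances` is hypothetical):

```python
import numpy as np

def cluster_distances(encodings, labels):
    """Mean intra-cluster and inter-cluster Euclidean distances.

    encodings: (n_samples, n_features) array of encoded signals.
    labels:    (n_samples,) array of known gesture labels.
    """
    centroids = {c: encodings[labels == c].mean(axis=0)
                 for c in np.unique(labels)}
    # Intra: average distance from each sample to its own centroid.
    intra = np.mean([np.linalg.norm(x - centroids[c])
                     for x, c in zip(encodings, labels)])
    # Inter: average pairwise distance between distinct centroids.
    cs = list(centroids.values())
    inter = np.mean([np.linalg.norm(cs[i] - cs[j])
                     for i in range(len(cs))
                     for j in range(i + 1, len(cs))])
    return intra, inter
```

A good encoder should yield a small intra-cluster distance relative to the inter-cluster distance.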
[Figure: Convolutional Neural Network via Restricted Boltzmann Machine and max pooling. A window of T input frames i_t ... i_{t−T+1} is convolved with a shared weight of window size c into hidden vectors h^(0) ... h^(T−c+1), each with units 1 ... n ... N; max pooling over positions yields the labeled encoded signal.]

h^(k) = f(i_{t−k : t−k−c+1}; W, b)
h_t = max(h^(0 : (T−c+1)))
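The convolution-plus-max-pooling encoding on this slide can be sketched as below. This is an illustrative forward pass only, assuming sigmoid hidden units; the function names are hypothetical, and in the actual model W and b would be learned by RBM training rather than supplied directly.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def encode(window, W, b, c):
    """Encode a window of T input frames into one hidden vector.

    window: (T, d) input frames i_t ... i_{t-T+1}.
    W:      (c * d, N) convolution weights, shared across positions.
    b:      (N,) hidden biases.
    c:      convolution window size (c <= T).
    Computes h^(k) = f(i_{k:k+c-1}; W, b) at each of the T-c+1
    positions, then max-pools over positions: h = max_k h^(k).
    """
    T, d = window.shape
    hs = np.stack([sigmoid(window[k:k + c].ravel() @ W + b)
                   for k in range(T - c + 1)])
    return hs.max(axis=0)   # (N,) encoded signal
```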
DECODER FOR GESTURES
• Two main considerations
  • Capability to generate different gestures given any encoded signal
  • Capability to generate similar variations of gestures if the encoded signals are close to each other
[Figure: basic concept behind encoding and decoding signals; one possible application: collision prevention.]
FRBM MODEL
• Factored Gated Restricted Boltzmann Machine
• Bottom-up pass to estimate h_t given i_{(t−1):(t−T+1)} and z_t
• Top-down pass to infer i_t
[Figure: factored gated RBM. Input frames i_{t−1} ... i_{t−T+1} and i_t connect through factored weights W1, W2, W3 to the hidden units h_t; the encoded signal z_t gates the factors via R.]

h_t = f(W1 · i_{t:t−T+1} ∘ [W3 · R · z_t]; W2, b)
i_{t:t−T+1} = g(W2′ · h_t ∘ [W3 · R · z_t]; W1, a)
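The bottom-up and top-down passes above can be sketched numerically as below. This is a minimal sketch assuming sigmoid hidden units and linear (identity g) visible units; the function names and the exact dimensioning of the factor matrices are hypothetical and chosen only so the equations type-check.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bottom_up(i_hist, z, W1, W2, W3, R, b):
    """Estimate hidden state h_t from input history and style z_t.

    i_hist: (T*d,) concatenated frames i_t ... i_{t-T+1}.
    z:      (m,) encoded signal z_t; R: (r, m) projection.
    W1: (F, T*d), W2: (F, N), W3: (F, r) factored weights.
    h_t = f(W2' . (W1 i  o  W3 R z) + b), with o elementwise.
    """
    gate = W3 @ (R @ z)              # style gates each factor
    return sigmoid(W2.T @ ((W1 @ i_hist) * gate) + b)

def top_down(h, z, W1, W2, W3, R, a):
    """Infer the visible frames from h_t through the same gate
    (linear visible units assumed)."""
    gate = W3 @ (R @ z)
    return W1.T @ ((W2 @ h) * gate) + a
```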
GESTURES GENERATION @ DIFFERENT LABELS
[Figure: input Z (encoded signals), plotted as intensity of feature #18 (normalized) vs. frame index (15 Hz) for labels G1–G5; output gestures rendered from side, front, and top views (animation is looped).]
Given a specific encoded signal (top), a unique gesture (right) can be reconstructed.
[Figure: two panels plotting the intensity of each feature in Z vs. the number of features in Z.]
GESTURES GENERATION @ A LABEL'S PROXIMITY
[Figure: input Z (encoded signals) N1, N2, N3, and the original; output gestures rendered from side, front, and top views (animation is looped).]
Given a set of encoded signals with similar intensities (top), a set of gestures (right) with similar traits can be reconstructed.
CONCLUSION
• Capability to generate different gestures given a specific set of encoded signals
• Capability to generate similar variations of gestures given three similar encoded signals
• Future challenges for the decoder
  • An evaluation method to prove the correctness of the decoded signals
  • A set of new features to encode and decode the frequency characteristics
  • A cheap, real-time method to explore non-collision encoded signals
[Figure: encoding/decoding pipeline, ideal vs. reality.]
FUTURE WORK
• Associator
  • Adaptive Resonance Theory (Euclidean)
• Encoder for the face
  • Currently, the model works only on the CK+ database (frontal views only)
  • Facial identity and expression
[Figure: gestures/postures associated with identities and expressions; PCA1–PCA3 projection of the face encodings.]
QUESTION AND ANSWER