kinect=imu? learning mimo signal mappings to automatically translate activity recognition systems...
DESCRIPTION
We propose a method to automatically translate a preexisting activity recognition system, devised for a source sensor domain S, so that it can operate on a newly discovered target sensor domain T, possibly of different modality. First, we use MIMO system identification techniques to obtain a function that maps the signals of S to T. This mapping is then used to translate the recognition system across the sensor domains. We demonstrate the approach in a 5-class gesture recognition problem translating between a vision-based skeleton tracking system (Kinect), and inertial measurement units (IMUs). An adequate mapping can be learned in as few as a single gesture (3 seconds) in this scenario. The accuracy after Kinect --> IMU or IMU --> Kinect translation is 4% below the baseline for the same limb. Translating across modalities and also to an adjacent limb yields an accuracy 8% below baseline. We discuss the sources of errors and means for improvement. The approach is independent of the sensor modalities. It supports multimodal activity recognition and more flexible real-world activity recognition system deployments. This presentation illustrates part of the work described in the following article: * Banos, O., Calatroni, A., Damas, M., Pomares, H., Rojas, I., Troester, G., Sagha, H., Millan, J. del R., Chavarriaga, R., Roggen, D.: Kinect=IMU? Learning MIMO Signal Mappings to Automatically Translate Activity Recognition Systems Across Sensor Modalities. In: Proceedings of the 16th annual International Symposium on Wearable Computers (ISWC 2012), Newcastle, United Kingdom, June 18-22 (2012)TRANSCRIPT
Kinect=IMU? Learning MIMO Signal Mappings to Automatically Translate Activity
Recognition Systems Across Sensor Modalities
ISWC 2012, Newcastle (UK)
Oresti Baños1, Alberto Calatroni2, Miguel Damas1, Héctor Pomares1,
Ignacio Rojas1, Hesam Sagha3, José del R. Millán3,
Gerhard Tröster2, Ricardo Chavarriaga3, and Daniel Roggen2 1Department of Computer Architecture and Computer Technology, CITIC-UGR, University of Granada, SPAIN
2Wearable Computing Laboratory, ETH Zurich, SWITZERLAND 3CNBI, Center for Neuroprosthetics, École Polytechnique Fédérale de Lausanne, SWITZERLAND
FET-Open Grant #225938
Problem statement
• Scenario
Problem statement
• Scenario
Problem statement
• Scenario
Problem statement
• Scenario
Problem statement
• Scenario
Problem statement
• Scenario
Problem statement
• Scenario
Problem statement
• Scenario
Problem statement
• Scenario
Problem statement
• Scenario
Problem statement
• Scenario
Transfer learning in AR
• Concept of transfer learning – Origin in ML: “Need for lifelong machine learning methods that retain and reuse
previously learned knowledge” NIPS-95 workshop on “Learning to Learn”
– Mechanism, ability or means to recognize and apply knowledge and skills learned in previous tasks or domains to novel tasks or domains
• Intended for – Continuity of context-awareness across different sensing environments – Network topology redundancy – Collective and individual knowledge enhancement
• Advantages – Knowledge may be conserved – Less labeled supervision is needed (ideally no additional recordings) – ‘Online’ process – Possibly heterogeneous
Transfer learning in AR: related work
• Selected contributions
– On-body sensors ::: Calatroni et al. (2011)
• Model parameters
• Labels
– Ambient sensors ::: van Kasteren et al. (2010)
• Common meta-feature space
• Limitations
– Long time scales operation
– Possible incomplete transfer
– Difficult transfer across modalities
A. Calatroni, D. Roggen, and G. Tröster, “Automatic transfer of activity recognition capabilities between body-worn motion sensors: Training newcomers to recognize locomotion,” in Proc. 8th Int Conf on Networked Sensing Systems, 2011. T. van Kasteren, G. Englebienne, and B. Kröse, “Transferring knowledge of activity recognition across sensor networks,” in Proc. 8th Int. Conf on Pervasive Computing, 2010, pp. 283–300.
Translation setup (Kinect ↔ IMU)
Skeleton Tracking System
(Kinect)
Body-worn Inertial Measurement Unit (Xsens)
Translation setup (Kinect ↔ IMU)
Skeleton Tracking System
(Kinect)
– RGB camera, IR LED, IR camera
– Depth map
– 15 joint skeleton
– 3D joint coordinates (POS in mm)
– Tracking range: 1.2-3.5m
Body-worn Inertial Measurement Unit (Xsens)
– Accurate 3D orientation
– Several modalities (ACC, GYR, MAG)
Translation setup (Kinect ↔ IMU)
Kinect (Position)
IMU (Acceleration)
Translation setup (Kinect ↔ IMU)
Kinect (Position)
IMU (Acceleration)
IMU (Acceleration)
Translation setup (Kinect ↔ IMU)
Kinect (Position)
IMU (Acceleration)
Translation setup (Kinect ↔ IMU)
Kinect (Position)
0 1 2 3-1
0
1
2
Time (s)
Positio
n (
m)
XYZ
0 1 2 3-0.5
0
0.5
1
1.5
Time (s)
Accele
ration (
G)
XYZ
Translation method
• System identification (signal level)
• Translation architectures (classification level)
– Template translation
– Signal translation
IMU (Acceleration)
Translation: Kinect to IMU
Kinect (Position)
𝑋𝑆(𝑡)
𝑋𝑇(𝑡)
System S (source domain) System T (target domain)
Sig
na
l le
vel
Cla
ssif
ica
tio
n
leve
l
0 2 4 6-1
0
1
2
Time (s)
Positio
n (
m)
XYZ
0 2 4-1
0
1
2
Time (s)
Positio
n (
m)
XYZ
0 1 2 3-1
0
1
2
Time (s)
Positio
n (
m)
L1 L2 L3 0 2 4 6
-1
0
1
2
Time (s)
Positio
n (
m)
XYZ
0 2 4-1
0
1
2
Time (s)
Positio
n (
m)
XYZ
0 1 2 3-1
0
1
2
Time (s)
Positio
n (
m)
0 2 4 6-1
0
1
2
Time (s)
Positio
n (
m)
XYZ
0 2 4-1
0
1
2
Time (s)
Positio
n (
m)
XYZ
0 1 2 3-1
0
1
2
Time (s)
Positio
n (
m)
IMU (Acceleration)
Kinect to IMU (signal mapping)
Kinect (Position)
𝑋𝑆(𝑡)
𝑋𝑇(𝑡)
System S (source domain) System T (target domain)
Sig
na
l le
vel
Cla
ssif
ica
tio
n
leve
l
0 2 4 6-1
0
1
2
Time (s)
Positio
n (
m)
XYZ
0 2 4-1
0
1
2
Time (s)
Positio
n (
m)
XYZ
0 1 2 3-1
0
1
2
Time (s)
Positio
n (
m)
L1 L2 L3 0 2 4 6
-1
0
1
2
Time (s)
Positio
n (
m)
XYZ
0 2 4-1
0
1
2
Time (s)
Positio
n (
m)
XYZ
0 1 2 3-1
0
1
2
Time (s)
Positio
n (
m)
0 2 4 6-1
0
1
2
Time (s)
Positio
n (
m)
XYZ
0 2 4-1
0
1
2
Time (s)
Positio
n (
m)
XYZ
0 1 2 3-1
0
1
2
Time (s)
Positio
n (
m)
Coexistence… (T)
0 20 40-1
0
1
2
Time (s)
Positio
n (
m)
0 20 40-1
0
1
2
Time (s)A
ccele
ration (
G)
IMU (Acceleration)
Kinect to IMU (signal mapping)
Kinect (Position)
𝑋𝑆(𝑡)
𝑋𝑇(𝑡)
System S (source domain) System T (target domain)
Sig
na
l le
vel
Cla
ssif
ica
tio
n
leve
l
0 2 4 6-1
0
1
2
Time (s)
Positio
n (
m)
XYZ
0 2 4-1
0
1
2
Time (s)
Positio
n (
m)
XYZ
0 1 2 3-1
0
1
2
Time (s)
Positio
n (
m)
L1 L2 L3 0 2 4 6
-1
0
1
2
Time (s)
Positio
n (
m)
XYZ
0 2 4-1
0
1
2
Time (s)
Positio
n (
m)
XYZ
0 1 2 3-1
0
1
2
Time (s)
Positio
n (
m)
0 2 4 6-1
0
1
2
Time (s)
Positio
n (
m)
XYZ
0 2 4-1
0
1
2
Time (s)
Positio
n (
m)
XYZ
0 1 2 3-1
0
1
2
Time (s)
Positio
n (
m)
Ψ𝑆→𝑇 𝑡 : 𝑋𝑆(𝑡) → 𝑋 𝑇(𝑡) ≈ 𝑋𝑇(𝑡)
Signal mapping
• Linear MIMO mapping
– Definition
• Ψ𝑆→𝑇 𝑡 ∝ 𝐵(𝑙) → 𝑋 𝑇 𝑡 = 𝐵(𝑙)𝑋𝑆(𝑡)
• 𝐵 𝑙 =
𝑏11(𝑙) 𝑏12(𝑙) ⋯ 𝑏1𝑀(𝑙)
𝑏21(𝑙) 𝑏22(𝑙) ⋯ 𝑏2𝑀(𝑙)⋮
𝑏𝑁1(𝑙)⋮
𝑏𝑁2(𝑙)⋮⋯
⋮𝑏𝑁𝑀(𝑙)
𝑏𝑖𝑘 𝑙 = 𝑏𝑖𝑘(0)
𝑙−𝑠𝑖𝑘 + 𝑏𝑖𝑘(1)
𝑙−𝑠𝑖𝑘−1 +⋯+ 𝑏𝑖𝑘(𝑞)
𝑙−𝑠𝑖𝑘−𝑞 𝑙−𝑝𝑥 𝑡 = 𝑥(𝑡 − 𝑝)
– Transformations modeling:
• Scaling 𝑏𝑖𝑘(𝑟)
= 𝐾𝑖𝑘 , 𝑟 = 0 ∧ 𝑖 = 𝑗 0, 𝑟 > 0
• Rotation 𝑏𝑖𝑘(𝑟)
= 𝑅𝑖𝑘, 𝑟 = 0 0, 𝑟 > 0
• Differentiation of order h 𝑏𝑖𝑘(𝑟)
= 𝐻𝑖𝑘
(𝑟), 𝑟 ≤ ℎ
0, 𝑟 > ℎ
Coefficients of the polynomial obtained by means of a LS method
IMU (Acceleration)
Kinect to IMU (template translation)
Kinect (Position)
System S (source domain) System T (target domain)
Sig
na
l le
vel
Cla
ssif
ica
tio
n
leve
l
0 2 4 6-1
0
1
2
Time (s)
Positio
n (
m)
XYZ
0 2 4-1
0
1
2
Time (s)
Positio
n (
m)
XYZ
0 1 2 3-1
0
1
2
Time (s)
Positio
n (
m)
L1 L2 L3 0 2 4 6
-1
0
1
2
Time (s)
Positio
n (
m)
XYZ
0 2 4-1
0
1
2
Time (s)
Positio
n (
m)
XYZ
0 1 2 3-1
0
1
2
Time (s)
Positio
n (
m)
0 2 4 6-1
0
1
2
Time (s)
Positio
n (
m)
XYZ
0 2 4-1
0
1
2
Time (s)
Positio
n (
m)
XYZ
0 1 2 3-1
0
1
2
Time (s)
Positio
n (
m)
Ψ𝑆→𝑇 𝑡 : 𝑋𝑆(𝑡) → 𝑋 𝑇(𝑡) ≈ 𝑋𝑇(𝑡)
IMU (Acceleration)
Kinect to IMU (template translation)
Kinect (Position)
System S (source domain) System T (target domain)
Sig
na
l le
vel
Cla
ssif
ica
tio
n
leve
l
0 2 4 6-1
0
1
2
Time (s)
Positio
n (
m)
XYZ
0 2 4-1
0
1
2
Time (s)
Positio
n (
m)
XYZ
0 1 2 3-1
0
1
2
Time (s)
Positio
n (
m)
L1 L2 L3 0 2 4 6
-1
0
1
2
Time (s)
Positio
n (
m)
XYZ
0 2 4-1
0
1
2
Time (s)
Positio
n (
m)
XYZ
0 1 2 3-1
0
1
2
Time (s)
Positio
n (
m)
0 2 4 6-1
0
1
2
Time (s)
Positio
n (
m)
XYZ
0 2 4-1
0
1
2
Time (s)
Positio
n (
m)
XYZ
0 1 2 3-1
0
1
2
Time (s)
Positio
n (
m)
Ψ𝑆→𝑇 𝑡 : 𝑋𝑆(𝑡) → 𝑋 𝑇(𝑡) ≈ 𝑋𝑇(𝑡)
0 1 2 3-1
0
1
2
Time (s)
Positio
n (
m)
XYZ
IMU (Acceleration)
Kinect to IMU (template translation)
Kinect (Position)
System S (source domain) System T (target domain)
Sig
na
l le
vel
Cla
ssif
ica
tio
n
leve
l
0 2 4 6-1
0
1
2
Time (s)
Positio
n (
m)
XYZ
0 2 4-1
0
1
2
Time (s)
Positio
n (
m)
XYZ
0 1 2 3-1
0
1
2
Time (s)
Positio
n (
m)
L1 L2 L3 0 2 4 6
-1
0
1
2
Time (s)
Positio
n (
m)
XYZ
0 2 4-1
0
1
2
Time (s)
Positio
n (
m)
XYZ
0 1 2 3-1
0
1
2
Time (s)
Positio
n (
m)
0 2 4 6-1
0
1
2
Time (s)
Positio
n (
m)
XYZ
0 2 4-1
0
1
2
Time (s)
Positio
n (
m)
XYZ
0 1 2 3-1
0
1
2
Time (s)
Positio
n (
m)
Ψ𝑆→𝑇 𝑡 : 𝑋𝑆(𝑡) → 𝑋 𝑇(𝑡) ≈ 𝑋𝑇(𝑡)
0 1 2 3-1
0
1
2
Time (s)
Positio
n (
m)
XYZ
0 1 2 3-0.5
0
0.5
1
1.5
Time (s)
Accele
ration (
G)
X
Y
Z
IMU (Acceleration)
Translation method (Kinect IMU)
Kinect (Position)
System S (source domain) System T (target domain)
Sig
na
l le
vel
Cla
ssif
ica
tio
n
leve
l
L1 L2 L3
Ψ𝑆→𝑇 𝑡 : 𝑋𝑆(𝑡) → 𝑋 𝑇(𝑡) ≈ 𝑋𝑇(𝑡) 0 1 2 3
0
0.5
1
1.5
Time (s)
Accele
ration (
G)
XYZ
X
Y
Z
IMU (Acceleration)
Kinect to IMU (template translation)
Kinect (Position)
System S (source domain) System T (target domain)
Sig
na
l le
vel
Cla
ssif
ica
tio
n
leve
l
0 2 4 6-1
0
1
2
Time (s)
Positio
n (
m)
XYZ
0 2 4-1
0
1
2
Time (s)
Positio
n (
m)
XYZ
0 1 2 3-1
0
1
2
Time (s)
Positio
n (
m)
L1 L2 L3 0 2 4 6
-1
0
1
2
Time (s)
Positio
n (
m)
XYZ
0 2 4-1
0
1
2
Time (s)
Positio
n (
m)
XYZ
0 1 2 3-1
0
1
2
Time (s)
Positio
n (
m)
0 2 4 6-1
0
1
2
Time (s)
Positio
n (
m)
XYZ
0 2 4-1
0
1
2
Time (s)
Positio
n (
m)
XYZ
0 1 2 3-1
0
1
2
Time (s)
Positio
n (
m)
Ψ𝑆→𝑇 𝑡 : 𝑋𝑆(𝑡) → 𝑋 𝑇(𝑡) ≈ 𝑋𝑇(𝑡)
0 2 4-1
0
1
2
Time (s)
Accele
ration (
G)
X
Y
Z
0 5 10-1
0
1
2
Time (s)
Accele
ration (
G)
X
Y
Z
0 2 4-1
0
1
2
Time (s)
Accele
ration (
G)
X
Y
Z
L1 L2 L3 0 2 4-1
0
1
2
Time (s)
Accele
ration (
G)
X
Y
Z
0 5 10-1
0
1
2
Time (s)
Accele
ration (
G)
X
Y
Z
0 2 4-1
0
1
2
Time (s)
Accele
ration (
G)
X
Y
Z
0 2 4-1
0
1
2
Time (s)
Accele
ration (
G)
X
Y
Z
0 5 10-1
0
1
2
Time (s)
Accele
ration (
G)
X
Y
Z
0 2 4-1
0
1
2
Time (s)
Accele
ration (
G)
X
Y
Z
IMU (Acceleration)
Kinect to IMU (template translation)
Kinect (Position)
𝑋𝑆(𝑡)
𝑋𝑇(𝑡)
System S (source domain) System T (target domain)
Sig
na
l le
vel
Cla
ssif
ica
tio
n
leve
l
L1 L2 L3 0 2 4
-1
0
1
2
Time (s)
Accele
ration (
G)
X
Y
Z
0 5 10-1
0
1
2
Time (s)
Accele
ration (
G)
X
Y
Z
0 2 4-1
0
1
2
Time (s)
Accele
ration (
G)
X
Y
Z
L1 L2 L3 0 2 4-1
0
1
2
Time (s)
Accele
ration (
G)
X
Y
Z
0 5 10-1
0
1
2
Time (s)
Accele
ration (
G)
X
Y
Z
0 2 4-1
0
1
2
Time (s)
Accele
ration (
G)
X
Y
Z
0 2 4-1
0
1
2
Time (s)
Accele
ration (
G)
X
Y
Z
0 5 10-1
0
1
2
Time (s)
Accele
ration (
G)
X
Y
Z
0 2 4-1
0
1
2
Time (s)
Accele
ration (
G)
X
Y
Z
IMU (Acceleration)
Kinect to IMU (template translation)
Kinect (Position)
𝑋𝑆(𝑡)
𝑋𝑇(𝑡)
System S (source domain) System T (target domain)
Sig
na
l le
vel
Cla
ssif
ica
tio
n
leve
l
L1 L2 L3
Kinect (Position)
IMU to Kinect
IMU (Acceleration)
𝑋𝑆(𝑡)
𝑋𝑇(𝑡)
System S (source domain) System T (target domain)
Sig
na
l le
vel
Cla
ssif
ica
tio
n
leve
l
Kinect (Position)
IMU to Kinect (signal mapping)
IMU (Acceleration)
𝑋𝑆(𝑡)
𝑋𝑇(𝑡)
System S (source domain) System T (target domain)
Sig
na
l le
vel
Cla
ssif
ica
tio
n
leve
l
Coexistence… (T)
0 20 40-1
0
1
2
Time (s)P
ositio
n (
m)
0 20 40-1
0
1
2
Time (s)
Accele
ration (
G)
Kinect (Position)
IMU to Kinect (signal mapping)
IMU (Acceleration)
𝑋𝑆(𝑡)
𝑋𝑇(𝑡)
System S (source domain) System T (target domain)
Sig
na
l le
vel
Cla
ssif
ica
tio
n
leve
l
Ψ𝑇→𝑆 𝑡 : 𝑋𝑇(𝑡) → 𝑋 𝑆(𝑡) ≈ 𝑋𝑆(𝑡)
Kinect (Position)
IMU to Kinect (signal translation)
IMU (Acceleration)
𝑋𝑆(𝑡)
𝑋𝑇(𝑡)
System S (source domain) System T (target domain)
Sig
na
l le
vel
Cla
ssif
ica
tio
n
leve
l
Ψ𝑇→𝑆 𝑡 : 𝑋𝑇(𝑡) → 𝑋 𝑆(𝑡) ≈ 𝑋𝑆(𝑡)
Kinect (Position)
IMU to Kinect (signal translation)
IMU (Acceleration)
𝑋𝑆(𝑡)
𝑋𝑇(𝑡)
System S (source domain) System T (target domain)
Sig
na
l le
vel
Cla
ssif
ica
tio
n
leve
l
Ψ𝑇→𝑆 𝑡 : 𝑋𝑇(𝑡) → 𝑋 𝑆(𝑡) ≈ 𝑋𝑆(𝑡)
Kinect (Position)
IMU to Kinect (signal translation)
IMU (Acceleration)
𝑋𝑆(𝑡)
𝑋𝑇(𝑡)
System S (source domain) System T (target domain)
Sig
na
l le
vel
Cla
ssif
ica
tio
n
leve
l
Ψ𝑇→𝑆 𝑡 : 𝑋𝑇(𝑡) → 𝑋 𝑆(𝑡) ≈ 𝑋𝑆(𝑡)
0 1 2 3-1
0
1
2
Time (s)P
ositio
n (
m)
XYZ
Kinect (Position)
IMU to Kinect (signal translation)
IMU (Acceleration)
𝑋𝑆(𝑡)
𝑋𝑇(𝑡)
System S (source domain) System T (target domain)
Sig
na
l le
vel
Cla
ssif
ica
tio
n
leve
l
Ψ𝑇→𝑆 𝑡 : 𝑋𝑇(𝑡) → 𝑋 𝑆(𝑡) ≈ 𝑋𝑆(𝑡)
0 1 2 3-0.5
0
0.5
1
1.5
Time (s)A
ccele
ration (
G)
X
Y
Z
Experimental setup
Kinect http://code.google.com/p/qtkinectwrapper/ Xsens http://crnt.sourceforge.net/CRN_Toolbox/References.html
Dataset
• Two scenarios
Geometric Gestures (HCI) Idle (Background)
~5 min of data 5 gestures, 48 instances per gesture
Evaluation
• Analyzed transfers
– Kinect (position):
• HAND
– IMUs (acceleration):
• RIGHT LOWER ARM (RLA)
• RIGHT UPPER ARM (RUA)
• BACK
Evaluation
• Model
– MIMO mapping with 10 tap delay
• Mapping domains
– Problem-domain mapping (PDM)
– Gesture-specific mapping (GSM)
– Unrelated-domain mapping (UDM)
• Results
– Mapping learning: 100 samples (~3.3s)
– Mapping testing: rest of unused instances
– Selection randomly repeated 20 times in an outer CV process
Translation accuracy
• Model
– 3-NN, FS = max. & min.
– 5-fold cross validation
– 100 repetitions
• Results
To RLA To RUA To BACK From RLA From RUA From BACK0
20
40
60
80
100
Accura
cy (
%)
BS BT PDM GSM UDM
From Kinect … … to Kinect
Translation accuracy
• Model
– 3-NN, FS1 = mean, FS2 = max. & min.
– 5-fold cross validation
– 100 repetitions
• Results (UDM)
100 200 500 1k 2k 4k 9k#Samples
FS1BS
FS1BT
FS1T
FS2BS
FS2BT
FS2T
100 200 500 1k 2k 4k 9k
50
60
70
80
90
100
#Samples
Accura
cy (
%)
From Kinect to IMU (RLA) From IMU (RLA) to Kinect
Encountered limitations
• General model challenges/limitations
– Not all the mappings might be allowed (Temperature Gyro?)
• Kinect ↔ IMU challenges/limitations
– Different frame of reference (IMU local vs. Kinect world)
– Occlusions
– Subject out of range
– Torsions
Conclusions and future work
• Transfer system based on
– MIMO mapping model
– Template/Signal translation
• MAPPING: as few as a single gesture (~3 seconds)
• Successful translation across sensor modalities, Kinect ↔ IMU (4% and 8% below baseline)
• NEXT STEPS
– Analyze the effect of data loss (occlusions, anomalies, etc.)
– Higher characterization of the considered MIMO model (i.e., ‘q’ value)
– Alternative mapping models: ARMA, TDNN, LSSVM
– Combination of sensors (homogeneous/heterogeneous)
– Test in more complex setups/real-world situations
Thank you for your attention. Questions?
Oresti Baños Legrán Dep. Computer Architecture & Computer Technology
Faculty of Computer & Electrical Engineering (ETSIIT) University of Granada, Granada (SPAIN)
Email: [email protected] Phone: +34 958 241 516 Fax: +34 958 248 993
Work supported in part by the FP7 project OPPORTUNITY under FET-Open grant number 225938, the Spanish CICYT Project TIN2007-60587, Junta de Andalucia Projects P07-TIC-02768 and P07-TIC-02906, the CENIT project AmIVital and the FPU Spanish grant AP2009-2244