kinect=imu? learning mimo signal mappings to automatically translate activity recognition systems...

Kinect=IMU? Learning MIMO Signal Mappings to Automatically Translate Activity

Recognition Systems Across Sensor Modalities

ISWC 2012, Newcastle (UK)

Oresti Baños1, Alberto Calatroni2, Miguel Damas1, Héctor Pomares1,

Ignacio Rojas1, Hesam Sagha3, José del R. Millán3,

Gerhard Tröster2, Ricardo Chavarriaga3, and Daniel Roggen2 1Department of Computer Architecture and Computer Technology, CITIC-UGR, University of Granada, SPAIN

2Wearable Computing Laboratory, ETH Zurich, SWITZERLAND 3CNBI, Center for Neuroprosthetics, École Polytechnique Fédérale de Lausanne, SWITZERLAND

FET-Open Grant #225938

Problem statement

• Scenario

Transfer learning in AR

• Concept of transfer learning – Origin in ML: “Need for lifelong machine learning methods that retain and reuse

previously learned knowledge” NIPS-95 workshop on “Learning to Learn”

– Mechanism, ability or means to recognize and apply knowledge and skills learned in previous tasks or domains to novel tasks or domains

• Intended for – Continuity of context-awareness across different sensing environments – Network topology redundancy – Collective and individual knowledge enhancement

• Advantages – Knowledge may be conserved – Less labeled supervision is needed (ideally no additional recordings) – ‘Online’ process – Possibly heterogeneous

Transfer learning in AR: related work

• Selected contributions

– On-body sensors ::: Calatroni et al. (2011)

• Model parameters

• Labels

– Ambient sensors ::: van Kasteren et al. (2010)

• Common meta-feature space

• Limitations

– Long time scales operation

– Possible incomplete transfer

– Difficult transfer across modalities

A. Calatroni, D. Roggen, and G. Tröster, “Automatic transfer of activity recognition capabilities between body-worn motion sensors: Training newcomers to recognize locomotion,” in Proc. 8th Int Conf on Networked Sensing Systems, 2011. T. van Kasteren, G. Englebienne, and B. Kröse, “Transferring knowledge of activity recognition across sensor networks,” in Proc. 8th Int. Conf on Pervasive Computing, 2010, pp. 283–300.

Translation setup (Kinect ↔ IMU)

Skeleton Tracking System

(Kinect)

Body-worn Inertial Measurement Unit (Xsens)


Skeleton Tracking System

(Kinect)

– RGB camera, IR LED, IR camera

– Depth map

– 15 joint skeleton

– 3D joint coordinates (POS in mm)

– Tracking range: 1.2-3.5m

Body-worn Inertial Measurement Unit (Xsens)

– Accurate 3D orientation

– Several modalities (ACC, GYR, MAG)


Kinect (Position)

IMU (Acceleration)

IMU (Acceleration)


Kinect (Position)

IMU (Acceleration)


Kinect (Position)

0 1 2 3-1

0

1

2

Time (s)

Positio

n (

m)

XYZ

0 1 2 3-0.5

0

0.5

1

1.5

Time (s)

Accele

ration (

G)

XYZ

Translation method

• System identification (signal level)

• Translation architectures (classification level)

– Template translation

– Signal translation

IMU (Acceleration)

Translation: Kinect to IMU

Kinect (Position)

𝑋𝑆(𝑡)

𝑋𝑇(𝑡)

System S (source domain) System T (target domain)

Sig

na

l le

vel

Cla

ssif

ica

tio

n

leve

l

0 2 4 6-1

0

1

2

Time (s)

Positio

n (

m)

XYZ

0 2 4-1

0

1

2

Time (s)

Positio

n (

m)

XYZ

0 1 2 3-1

0

1

2

Time (s)

Positio

n (

m)

L1 L2 L3 0 2 4 6

-1

0

1

2

Time (s)

Positio

n (

m)

XYZ

0 2 4-1

0

1

2

Time (s)

Positio

n (

m)

XYZ

0 1 2 3-1

0

1

2

Time (s)

Positio

n (

m)

0 2 4 6-1

0

1

2

Time (s)

Positio

n (

m)

XYZ

0 2 4-1

0

1

2

Time (s)

Positio

n (

m)

XYZ

0 1 2 3-1

0

1

2

Time (s)

Positio

n (

m)

IMU (Acceleration)

Kinect to IMU (signal mapping)

Kinect (Position)

𝑋𝑆(𝑡)

𝑋𝑇(𝑡)


Sig

na

l le

vel

Cla

ssif

ica

tio

n

leve

l

0 2 4 6-1

0

1

2

Time (s)

Positio

n (

m)

XYZ

0 2 4-1

0

1

2

Time (s)

Positio

n (

m)

XYZ

0 1 2 3-1

0

1

2

Time (s)

Positio

n (

m)

L1 L2 L3 0 2 4 6

-1

0

1

2

Time (s)

Positio

n (

m)

XYZ

0 2 4-1

0

1

2

Time (s)

Positio

n (

m)

XYZ

0 1 2 3-1

0

1

2

Time (s)

Positio

n (

m)

0 2 4 6-1

0

1

2

Time (s)

Positio

n (

m)

XYZ

0 2 4-1

0

1

2

Time (s)

Positio

n (

m)

XYZ

0 1 2 3-1

0

1

2

Time (s)

Positio

n (

m)

Coexistence… (T)

0 20 40-1

0

1

2

Time (s)

Positio

n (

m)

0 20 40-1

0

1

2

Time (s)A

ccele

ration (

G)

IMU (Acceleration)

Kinect to IMU (signal mapping)

Kinect (Position)

𝑋𝑆(𝑡)

𝑋𝑇(𝑡)


Sig

na

l le

vel

Cla

ssif

ica

tio

n

leve

l

0 2 4 6-1

0

1

2

Time (s)

Positio

n (

m)

XYZ

0 2 4-1

0

1

2

Time (s)

Positio

n (

m)

XYZ

0 1 2 3-1

0

1

2

Time (s)

Positio

n (

m)

L1 L2 L3 0 2 4 6

-1

0

1

2

Time (s)

Positio

n (

m)

XYZ

0 2 4-1

0

1

2

Time (s)

Positio

n (

m)

XYZ

0 1 2 3-1

0

1

2

Time (s)

Positio

n (

m)

0 2 4 6-1

0

1

2

Time (s)

Positio

n (

m)

XYZ

0 2 4-1

0

1

2

Time (s)

Positio

n (

m)

XYZ

0 1 2 3-1

0

1

2

Time (s)

Positio

n (

m)

Ψ𝑆→𝑇 𝑡 : 𝑋𝑆(𝑡) → 𝑋 𝑇(𝑡) ≈ 𝑋𝑇(𝑡)

Signal mapping

• Linear MIMO mapping

– Definition

• Ψ𝑆→𝑇 𝑡 ∝ 𝐵(𝑙) → 𝑋 𝑇 𝑡 = 𝐵(𝑙)𝑋𝑆(𝑡)

• 𝐵 𝑙 =

𝑏11(𝑙) 𝑏12(𝑙) ⋯ 𝑏1𝑀(𝑙)

𝑏21(𝑙) 𝑏22(𝑙) ⋯ 𝑏2𝑀(𝑙)⋮

𝑏𝑁1(𝑙)⋮

𝑏𝑁2(𝑙)⋮⋯

⋮𝑏𝑁𝑀(𝑙)

𝑏𝑖𝑘 𝑙 = 𝑏𝑖𝑘(0)

𝑙−𝑠𝑖𝑘 + 𝑏𝑖𝑘(1)

𝑙−𝑠𝑖𝑘−1 +⋯+ 𝑏𝑖𝑘(𝑞)

𝑙−𝑠𝑖𝑘−𝑞 𝑙−𝑝𝑥 𝑡 = 𝑥(𝑡 − 𝑝)

– Transformations modeling:

• Scaling 𝑏𝑖𝑘(𝑟)

= 𝐾𝑖𝑘 , 𝑟 = 0 ∧ 𝑖 = 𝑗 0, 𝑟 > 0

• Rotation 𝑏𝑖𝑘(𝑟)

= 𝑅𝑖𝑘, 𝑟 = 0 0, 𝑟 > 0

• Differentiation of order h 𝑏𝑖𝑘(𝑟)

= 𝐻𝑖𝑘

(𝑟), 𝑟 ≤ ℎ

0, 𝑟 > ℎ

Coefficients of the polynomial obtained by means of a LS method

IMU (Acceleration)

Kinect to IMU (template translation)

Kinect (Position)


Sig

na

l le

vel

Cla

ssif

ica

tio

n

leve

l

0 2 4 6-1

0

1

2

Time (s)

Positio

n (

m)

XYZ

0 2 4-1

0

1

2

Time (s)

Positio

n (

m)

XYZ

0 1 2 3-1

0

1

2

Time (s)

Positio

n (

m)

L1 L2 L3 0 2 4 6

-1

0

1

2

Time (s)

Positio

n (

m)

XYZ

0 2 4-1

0

1

2

Time (s)

Positio

n (

m)

XYZ

0 1 2 3-1

0

1

2

Time (s)

Positio

n (

m)

0 2 4 6-1

0

1

2

Time (s)

Positio

n (

m)

XYZ

0 2 4-1

0

1

2

Time (s)

Positio

n (

m)

XYZ

0 1 2 3-1

0

1

2

Time (s)

Positio

n (

m)


IMU (Acceleration)


Kinect (Position)


Sig

na

l le

vel

Cla

ssif

ica

tio

n

leve

l

0 2 4 6-1

0

1

2

Time (s)

Positio

n (

m)

XYZ

0 2 4-1

0

1

2

Time (s)

Positio

n (

m)

XYZ

0 1 2 3-1

0

1

2

Time (s)

Positio

n (

m)

L1 L2 L3 0 2 4 6

-1

0

1

2

Time (s)

Positio

n (

m)

XYZ

0 2 4-1

0

1

2

Time (s)

Positio

n (

m)

XYZ

0 1 2 3-1

0

1

2

Time (s)

Positio

n (

m)

0 2 4 6-1

0

1

2

Time (s)

Positio

n (

m)

XYZ

0 2 4-1

0

1

2

Time (s)

Positio

n (

m)

XYZ

0 1 2 3-1

0

1

2

Time (s)

Positio

n (

m)


0 1 2 3-1

0

1

2

Time (s)

Positio

n (

m)

XYZ

IMU (Acceleration)


Kinect (Position)


Sig

na

l le

vel

Cla

ssif

ica

tio

n

leve

l

0 2 4 6-1

0

1

2

Time (s)

Positio

n (

m)

XYZ

0 2 4-1

0

1

2

Time (s)

Positio

n (

m)

XYZ

0 1 2 3-1

0

1

2

Time (s)

Positio

n (

m)

L1 L2 L3 0 2 4 6

-1

0

1

2

Time (s)

Positio

n (

m)

XYZ

0 2 4-1

0

1

2

Time (s)

Positio

n (

m)

XYZ

0 1 2 3-1

0

1

2

Time (s)

Positio

n (

m)

0 2 4 6-1

0

1

2

Time (s)

Positio

n (

m)

XYZ

0 2 4-1

0

1

2

Time (s)

Positio

n (

m)

XYZ

0 1 2 3-1

0

1

2

Time (s)

Positio

n (

m)


0 1 2 3-1

0

1

2

Time (s)

Positio

n (

m)

XYZ

0 1 2 3-0.5

0

0.5

1

1.5

Time (s)

Accele

ration (

G)

X

Y

Z

IMU (Acceleration)

Translation method (Kinect IMU)

Kinect (Position)


Sig

na

l le

vel

Cla

ssif

ica

tio

n

leve

l

L1 L2 L3

Ψ𝑆→𝑇 𝑡 : 𝑋𝑆(𝑡) → 𝑋 𝑇(𝑡) ≈ 𝑋𝑇(𝑡) 0 1 2 3

0

0.5

1

1.5

Time (s)

Accele

ration (

G)

XYZ

X

Y

Z

IMU (Acceleration)


Kinect (Position)


Sig

na

l le

vel

Cla

ssif

ica

tio

n

leve

l

0 2 4 6-1

0

1

2

Time (s)

Positio

n (

m)

XYZ

0 2 4-1

0

1

2

Time (s)

Positio

n (

m)

XYZ

0 1 2 3-1

0

1

2

Time (s)

Positio

n (

m)

L1 L2 L3 0 2 4 6

-1

0

1

2

Time (s)

Positio

n (

m)

XYZ

0 2 4-1

0

1

2

Time (s)

Positio

n (

m)

XYZ

0 1 2 3-1

0

1

2

Time (s)

Positio

n (

m)

0 2 4 6-1

0

1

2

Time (s)

Positio

n (

m)

XYZ

0 2 4-1

0

1

2

Time (s)

Positio

n (

m)

XYZ

0 1 2 3-1

0

1

2

Time (s)

Positio

n (

m)


0 2 4-1

0

1

2

Time (s)

Accele

ration (

G)

X

Y

Z

0 5 10-1

0

1

2

Time (s)

Accele

ration (

G)

X

Y

Z

0 2 4-1

0

1

2

Time (s)

Accele

ration (

G)

X

Y

Z

L1 L2 L3 0 2 4-1

0

1

2

Time (s)

Accele

ration (

G)

X

Y

Z

0 5 10-1

0

1

2

Time (s)

Accele

ration (

G)

X

Y

Z

0 2 4-1

0

1

2

Time (s)

Accele

ration (

G)

X

Y

Z

0 2 4-1

0

1

2

Time (s)

Accele

ration (

G)

X

Y

Z

0 5 10-1

0

1

2

Time (s)

Accele

ration (

G)

X

Y

Z

0 2 4-1

0

1

2

Time (s)

Accele

ration (

G)

X

Y

Z

IMU (Acceleration)


Kinect (Position)

𝑋𝑆(𝑡)

𝑋𝑇(𝑡)


Sig

na

l le

vel

Cla

ssif

ica

tio

n

leve

l

L1 L2 L3 0 2 4

-1

0

1

2

Time (s)

Accele

ration (

G)

X

Y

Z

0 5 10-1

0

1

2

Time (s)

Accele

ration (

G)

X

Y

Z

0 2 4-1

0

1

2

Time (s)

Accele

ration (

G)

X

Y

Z

L1 L2 L3 0 2 4-1

0

1

2

Time (s)

Accele

ration (

G)

X

Y

Z

0 5 10-1

0

1

2

Time (s)

Accele

ration (

G)

X

Y

Z

0 2 4-1

0

1

2

Time (s)

Accele

ration (

G)

X

Y

Z

0 2 4-1

0

1

2

Time (s)

Accele

ration (

G)

X

Y

Z

0 5 10-1

0

1

2

Time (s)

Accele

ration (

G)

X

Y

Z

0 2 4-1

0

1

2

Time (s)

Accele

ration (

G)

X

Y

Z

IMU (Acceleration)


Kinect (Position)

𝑋𝑆(𝑡)

𝑋𝑇(𝑡)


Sig

na

l le

vel

Cla

ssif

ica

tio

n

leve

l

L1 L2 L3

Kinect (Position)

IMU to Kinect

IMU (Acceleration)

𝑋𝑆(𝑡)

𝑋𝑇(𝑡)


Sig

na

l le

vel

Cla

ssif

ica

tio

n

leve

l

Kinect (Position)

IMU to Kinect (signal mapping)

IMU (Acceleration)

𝑋𝑆(𝑡)

𝑋𝑇(𝑡)


Sig

na

l le

vel

Cla

ssif

ica

tio

n

leve

l

Coexistence… (T)

0 20 40-1

0

1

2

Time (s)P

ositio

n (

m)

0 20 40-1

0

1

2

Time (s)

Accele

ration (

G)

Kinect (Position)

IMU to Kinect (signal mapping)

IMU (Acceleration)

𝑋𝑆(𝑡)

𝑋𝑇(𝑡)


Sig

na

l le

vel

Cla

ssif

ica

tio

n

leve

l

Ψ𝑇→𝑆 𝑡 : 𝑋𝑇(𝑡) → 𝑋 𝑆(𝑡) ≈ 𝑋𝑆(𝑡)

Kinect (Position)

IMU to Kinect (signal translation)

IMU (Acceleration)

𝑋𝑆(𝑡)

𝑋𝑇(𝑡)


Sig

na

l le

vel

Cla

ssif

ica

tio

n

leve

l


Kinect (Position)


IMU (Acceleration)

𝑋𝑆(𝑡)

𝑋𝑇(𝑡)


Sig

na

l le

vel

Cla

ssif

ica

tio

n

leve

l


0 1 2 3-1

0

1

2

Time (s)P

ositio

n (

m)

XYZ

Kinect (Position)


IMU (Acceleration)

𝑋𝑆(𝑡)

𝑋𝑇(𝑡)


Sig

na

l le

vel

Cla

ssif

ica

tio

n

leve

l


0 1 2 3-0.5

0

0.5

1

1.5

Time (s)A

ccele

ration (

G)

X

Y

Z

Experimental setup

Kinect http://code.google.com/p/qtkinectwrapper/ Xsens http://crnt.sourceforge.net/CRN_Toolbox/References.html

Dataset

• Two scenarios

Geometric Gestures (HCI) Idle (Background)

~5 min of data 5 gestures, 48 instances per gesture

Evaluation

• Analyzed transfers

– Kinect (position):

• HAND

– IMUs (acceleration):

• RIGHT LOWER ARM (RLA)

• RIGHT UPPER ARM (RUA)

• BACK

Evaluation

• Model

– MIMO mapping with 10 tap delay

• Mapping domains

– Problem-domain mapping (PDM)

– Gesture-specific mapping (GSM)

– Unrelated-domain mapping (UDM)

• Results

– Mapping learning: 100 samples (~3.3s)

– Mapping testing: rest of unused instances

– Selection randomly repeated 20 times in an outer CV process

Translation accuracy

• Model

– 3-NN, FS = max. & min.

– 5-fold cross validation

– 100 repetitions

• Results

To RLA To RUA To BACK From RLA From RUA From BACK0

20

40

60

80

100

Accura

cy (

%)

BS BT PDM GSM UDM

From Kinect … … to Kinect

Translation accuracy

• Model

– 3-NN, FS1 = mean, FS2 = max. & min.

– 5-fold cross validation

– 100 repetitions

• Results (UDM)

100 200 500 1k 2k 4k 9k#Samples

FS1BS

FS1BT

FS1T

FS2BS

FS2BT

FS2T

100 200 500 1k 2k 4k 9k

50

60

70

80

90

100

#Samples

Accura

cy (

%)

From Kinect to IMU (RLA) From IMU (RLA) to Kinect

Encountered limitations

• General model challenges/limitations

– Not all the mappings might be allowed (Temperature Gyro?)

• Kinect ↔ IMU challenges/limitations

– Different frame of reference (IMU local vs. Kinect world)

– Occlusions

– Subject out of range

– Torsions

Conclusions and future work

• Transfer system based on

– MIMO mapping model

– Template/Signal translation

• MAPPING: as few as a single gesture (~3 seconds)

• Successful translation across sensor modalities, Kinect ↔ IMU (4% and 8% below baseline)

• NEXT STEPS

– Analyze the effect of data loss (occlusions, anomalies, etc.)

– Higher characterization of the considered MIMO model (i.e., ‘q’ value)

– Alternative mapping models: ARMA, TDNN, LSSVM

– Combination of sensors (homogeneous/heterogeneous)

– Test in more complex setups/real-world situations

Thank you for your attention. Questions?

Oresti Baños Legrán Dep. Computer Architecture & Computer Technology

Faculty of Computer & Electrical Engineering (ETSIIT) University of Granada, Granada (SPAIN)

Email: [email protected] Phone: +34 958 241 516 Fax: +34 958 248 993

Work supported in part by the FP7 project OPPORTUNITY under FET-Open grant number 225938, the Spanish CICYT Project TIN2007-60587, Junta de Andalucia Projects P07-TIC-02768 and P07-TIC-02906, the CENIT project AmIVital and the FPU Spanish grant AP2009-2244

mailto:[email protected]

kinect=imu? learning mimo signal mappings to automatically translate activity recognition systems...

Science

time s positionm x y

time s accelerationg

time s positionm coexistence

time s positionm l1

time saccelerationg

imu acceleration kinect

problem statement scenario

learning mimo signal