Understanding Human-Object Interaction in RGB-D Videos for Human-Robot Interaction

Zhiwen Fang, BeingTogether Centre, IMI, Research Fellow

Post on 25-May-2020


TRANSCRIPT

Page 1

Understanding Human-Object Interaction in RGB-D videos for Human Robot Interaction

Zhiwen Fang, BeingTogether Centre, IMI, Research Fellow

Page 2

Motivation

Human-robot interaction (HRI) [1,2,3]

[Figure: interaction between a human and a social robot. Labels: verbal language; non-verbal language (facial expression, body gesture); object.]

[1] Yang Xiao, Zhijun Zhang, Aryel Beck, Junsong Yuan, and Daniel Thalmann. 2014. Human–robot interaction by understanding upper body gestures. Presence: teleoperators and virtual environments 23, 2 (2014), 133–154.
[2] Isibor Kennedy Ihianle, Usman Naeem, and Abdel-Rahman Tawil. 2016. Recognition of activities of daily living from topic model. Procedia Computer Science 98 (2016), 24–31.
[3] Marina Pérez-Jiménez, Borja Bordel Sánchez, and Ramón Alcarria. 2016. T4AI: A system for monitoring people based on improved wearable devices. Research Briefs on Information & Communication Technology Evolution (ReBICTE) 2 (2016), 1–16.

Page 3

Motivation

Page 4

Motivation

Understand the intention of the human based on the object information:

With a cell phone in hand and close to the ear, it may indicate that the person is having a call.

With a cup in hand and close to the mouth, it may indicate that the person is drinking.

How to detect hand-held objects?

Page 5

Outline

1 Introduction
2 Method
3 System overview
4 Results
5 Conclusions

Page 6

Introduction

Wearable sensors & Radio Frequency Identification tags [1]

Thermal band images [2]

Computer vision method based on RGB camera [3][4]

[1] K. P. Fishkin, M. Philipose, and A. Rea. 2005. Hands-on RFID: wireless wearables for detecting use of objects. In IEEE International Symposium on Wearable Computers, 2005. Proceedings. 38–43.
[2] Cigdem Beyan and Alptekin Temizel. 2015. A multimodal approach for individual tracking of people and their belongings. The Imaging Science Journal 63, 4 (2015), 192–202.
[3] Chaitanya Desai, Deva Ramanan, and Charless Fowlkes. 2010. Discriminative models for static human-object interactions. In Computer vision and pattern recognition workshops (CVPRW), 2010 IEEE computer society conference on. IEEE, 9–16.
[4] Zhaozhuo Xu, Yuan Tian, Xinjue Hu, and Fangling Pu. 2015. Dangerous human event understanding using human-object interaction model. In Signal Processing, Communications and Computing (ICSPCC), 2015 IEEE International Conference on. IEEE, 1–5.

Page 7

Introduction

Research problems in hand-held object detection

(1) Relationship between objects and a person

(2) Hand-held objects are often very small

(3) Target loss caused by appearance changes and/or partial occlusion in the sequence

[Figure: three example images captioned "Chair, bottle, cell phone, keyboard…", "About 5 meters, bottle", and "Part occlusion, cell phone".]


Page 9

Method

Human contextual information

1. Skeleton data (25 body joint positions)

2. Local patch around the hand joint
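One way to picture the local patch is as a square crop centred on the hand joint's pixel position. The sketch below is only an illustration under stated assumptions: the function name `hand_patch_box`, the parameter `half_size_at_1m`, and the idea of shrinking the patch with the hand's depth are hypothetical and are not taken from the slides.

```python
def hand_patch_box(hand_uv, depth_m, image_size, half_size_at_1m=120):
    """Return (left, top, right, bottom) of a square patch around the hand joint.

    hand_uv: (u, v) pixel position of the hand joint in the RGB image.
    depth_m: depth of the hand joint in metres (from the depth channel).
    image_size: (width, height) of the RGB image.
    half_size_at_1m: assumed half-size of the patch, in pixels, at 1 m.
    """
    u, v = hand_uv
    width, height = image_size
    # Assumption: farther hands appear smaller, so the patch shrinks with depth.
    half = max(1, round(half_size_at_1m / depth_m))
    # Clamp the box to the image borders.
    left = max(0, u - half)
    top = max(0, v - half)
    right = min(width, u + half)
    bottom = min(height, v + half)
    return left, top, right, bottom


# A hand in the image centre at 2 m, and a hand close to the top-left corner.
print(hand_patch_box((960, 540), 2.0, (1920, 1080)))  # (900, 480, 1020, 600)
print(hand_patch_box((30, 20), 1.0, (1920, 1080)))    # (0, 0, 150, 140)
```

Restricting detection to such a patch keeps a small hand-held object large relative to the detector's input.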

Page 10

Method

Estimate the probability of belonging to a person

1. Object detection in the local patch

2. Estimate the probability using the person index map

[Figure: RGB image; person index map]
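The slides do not state how the probability is computed from the person index map. As a minimal sketch, assume the map stores one person id per pixel (with a background value elsewhere) and take the probability to be the fraction of pixels inside the detected box that carry the person's id. The function name, the `BACKGROUND` value, and this scoring rule are all assumptions made for illustration.

```python
BACKGROUND = 255  # assumed label for pixels that belong to no person


def belonging_probability(index_map, box, person_id):
    """Fraction of pixels inside `box` labelled with `person_id`.

    index_map: 2-D list (rows of columns) of per-pixel person labels.
    box: (left, top, right, bottom) in pixel coordinates, right/bottom exclusive.
    """
    height, width = len(index_map), len(index_map[0])
    left, top, right, bottom = box
    # Clamp the box to the map so out-of-range detections do not crash.
    left, top = max(0, left), max(0, top)
    right, bottom = min(width, right), min(height, bottom)
    total = (right - left) * (bottom - top)
    if total <= 0:
        return 0.0
    hits = sum(
        1
        for row in range(top, bottom)
        for col in range(left, right)
        if index_map[row][col] == person_id
    )
    return hits / total


# Toy 6x8 map: person 0 occupies the four leftmost columns.
toy_map = [[0 if col < 4 else BACKGROUND for col in range(8)] for _ in range(6)]
print(belonging_probability(toy_map, (2, 1, 6, 4), person_id=0))  # 0.5
```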

Page 11

Method

Estimate the probability of belonging to a person

Page 12

Method

Object detection in a local patch by YOLO [1, 2]

(1) Resize the image to 544 × 544.

(2) Run a convolutional network on the resized image.

(3) Threshold the resulting detections by the model's confidence.

[1] J. Redmon, S. Divvala, R. Girshick, et al. 2016. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 779–788.
[2] J. Redmon and A. Farhadi. 2017. YOLO9000: better, faster, stronger. arXiv preprint.
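The resizing and confidence steps around the network can be sketched without the network itself. In the sketch below the detector output is a hand-written stand-in, and the tuple layout `(label, confidence, box)`, the function name `keep_confident`, and the threshold of 0.5 are assumptions for illustration; none of them is a real YOLO API.

```python
NETWORK_INPUT = 544  # side length of the resized patch, as stated on the slide


def keep_confident(detections, patch_box, threshold=0.5):
    """Drop low-confidence detections and map the rest back to image coordinates.

    detections: list of (label, confidence, (x0, y0, x1, y1)) where the box is
        expressed in the 544 x 544 network input frame (hypothetical format).
    patch_box: (left, top, right, bottom) of the local patch in the full image.
    """
    left, top, right, bottom = patch_box
    # Undo the resize: network pixels back to patch pixels.
    scale_x = (right - left) / NETWORK_INPUT
    scale_y = (bottom - top) / NETWORK_INPUT
    kept = []
    for label, confidence, (x0, y0, x1, y1) in detections:
        if confidence < threshold:
            continue
        box = (left + x0 * scale_x, top + y0 * scale_y,
               left + x1 * scale_x, top + y1 * scale_y)
        kept.append((label, confidence, box))
    return kept


# Stand-in for the network output on a 272 x 272 patch resized to 544 x 544.
fake_output = [("cup", 0.9, (0, 0, 544, 544)), ("bottle", 0.2, (10, 10, 20, 20))]
print(keep_confident(fake_output, (100, 200, 372, 472)))
# [('cup', 0.9, (100.0, 200.0, 372.0, 472.0))]
```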

Page 13

Method

Page 14

Method

Object tracking based on correlation filter [1]

(1) Dense sampling, by modeling all possible translations of the base sample in a search window as circulant shifts.

(2) Learning the correlation filter by solving a ridge regression problem in the Fourier domain.

[1] J. F. Henriques, R. Caseiro, P. Martins, et al. 2015. High-speed tracking with kernelized correlation filters. IEEE Transactions on Pattern Analysis and Machine Intelligence 37, 3 (2015), 583–596.
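The two tracking steps can be illustrated with a one-dimensional, linear (non-kernelized) special case, which is a simplification of the kernelized tracker cited on the slide. Every cyclic shift of the base sample acts as a training example, and the ridge regression then decouples into one scalar problem per frequency. The toy signal, the Gaussian target, the regularizer `lam`, and all function names are assumptions for illustration.

```python
import cmath
import math


def dft(signal, inverse=False):
    """Naive O(n^2) discrete Fourier transform (forward or inverse)."""
    n = len(signal)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(signal[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
                    for t in range(n))
        out.append(total / n if inverse else total)
    return out


def gaussian_target(n, sigma=2.0):
    """Regression target: 1 at zero shift, decaying with cyclic distance."""
    return [math.exp(-min(t, n - t) ** 2 / (2 * sigma ** 2)) for t in range(n)]


def train_filter(x, lam=1e-3):
    """Ridge regression over all cyclic shifts of x, solved per frequency."""
    x_hat = dft(x)
    y_hat = dft(gaussian_target(len(x)))
    # Closed form for this DFT and shift convention; where the complex
    # conjugate goes depends on those conventions.
    return [xk * yk / (abs(xk) ** 2 + lam) for xk, yk in zip(x_hat, y_hat)]


def detect_shift(w_hat, z):
    """Cyclic shift of z, relative to the training sample, with peak response."""
    z_hat = dft(z)
    response = dft([wk.conjugate() * zk for wk, zk in zip(w_hat, z_hat)],
                   inverse=True)
    scores = [value.real for value in response]
    return scores.index(max(scores))


base = [0.0] * 32
base[5:9] = [1.0, 3.0, 2.0, 1.0]   # toy 1-D "appearance" of the target
w_hat = train_filter(base)
moved = base[-4:] + base[:-4]      # same target, cyclically shifted right by 4
print(detect_shift(w_hat, moved))
```

Because all shifts are handled implicitly through the transform, training and detection never build the full matrix of shifted samples. A practical tracker would use a fast Fourier transform on 2-D image patches instead of the naive transform used here.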


Page 16

System overview

[Figure: system diagram. Labels: speech recognition; natural language processing; hand-held object detection; object detection; human and robot interaction; language interaction; object exchange.]


Page 18

Results

Detection rate of different methods in three categories (i.e. bottle, cup, cell phone).

* w/o represents the method without human contextual information
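The slides do not define how the detection rate is measured. One common convention, assumed here, counts a frame as detected when the predicted box overlaps the ground-truth box with an intersection-over-union (IoU) of at least 0.5. The boxes below are made-up toy values, not results from the presentation, and the function names are hypothetical.

```python
def iou(a, b):
    """Intersection-over-union of two (x0, y0, x1, y1) boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def detection_rate(frames, min_iou=0.5):
    """Per-category share of frames whose prediction matches the ground truth.

    frames: list of (label, ground_truth_box, predicted_box_or_None).
    """
    totals, hits = {}, {}
    for label, truth, predicted in frames:
        totals[label] = totals.get(label, 0) + 1
        if predicted is not None and iou(truth, predicted) >= min_iou:
            hits[label] = hits.get(label, 0) + 1
    return {label: hits.get(label, 0) / count for label, count in totals.items()}


# Toy frames: one hit and one miss for "cup", one poorly localized "bottle".
toy_frames = [
    ("cup", (0, 0, 10, 10), (0, 0, 10, 10)),
    ("cup", (0, 0, 10, 10), None),
    ("bottle", (0, 0, 10, 10), (5, 0, 15, 10)),
]
print(detection_rate(toy_frames))  # {'cup': 0.5, 'bottle': 0.0}
```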


Page 20

Conclusions

To provide intelligent human-robot interaction, it is critical to understand the interaction between the human and daily objects, so that we can analyze the intention of the human.

Using an RGB-D sensor, we provide a method to detect hand-held objects.

Human contextual information is introduced to improve the performance of hand-held object detection.

Page 21

THANK YOU!

Q & A