detecting activities of daily living in first-person camera - people

38
1 Detecting Activities of Daily Living in First-person Camera Views Hamed Pirsiavash, Deva Ramanan Computer Science Department, UC Irvine

Upload: others

Post on 11-Sep-2021

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Detecting Activities of Daily Living in First-person Camera - People

1

Detecting Activities of Daily Living in First-person Camera Views

Hamed Pirsiavash, Deva Ramanan

Computer Science Department, UC Irvine

Page 2: Detecting Activities of Daily Living in First-person Camera - People

2

Motivation A sample video of Activities of Daily Living

Page 3: Detecting Activities of Daily Living in First-person Camera - People

3

Applications Tele-rehabilitation

•  Kopp et al,, Arch. of Physical Medicine and Rehabilitation. 1997.

•  Catz et al, Spinal Cord 1997.

Long-term at-home monitoring

Page 4: Detecting Activities of Daily Living in First-person Camera - People

Applications Life-logging

•  Gemmell et al, “MyLifeBits: a personal database for everything.” Communications of the ACM 2006.

•  Hodges et al, “SenseCam: A retrospective memory aid”, UbiComp, 2006.

So far, mostly “write-only” memory!

This is the right time for computer vision community to get involved.

4

Page 5: Detecting Activities of Daily Living in First-person Camera - People

There are quite a few video benchmarks for action recognition.

Collecting interesting but natural video is surprisingly hard. It is difficult to define action categories outside “sports” domain

Related work: action recognition

5

UCF sports, CVPR’08

UCF Youtube, CVPR’08

KTH, ICPR’04

Hollywood, CVPR’09

Olympics sport, BMVC’10

VIRAT, CVPR’11

5

Page 6: Detecting Activities of Daily Living in First-person Camera - People

6

Wearable ADL detection

It is easy to collect natural data

6

Page 7: Detecting Activities of Daily Living in First-person Camera - People

7

Wearable ADL detection

ADL actions derived from medical literature on patient rehabilitation

It is easy to collect natural data

7

Page 8: Detecting Activities of Daily Living in First-person Camera - People

•  Challenges –  What features to use? –  Appearance model –  Temporal model

•  Our model –  “Active” vs “passive” objects –  Temporal pyramid

•  Dataset

•  Experiments 8

Outline

8

Page 9: Detecting Activities of Daily Living in First-person Camera - People

Challenges What features to use?

Low level features

(Weak semantics)

High level features

(Strong semantics)

Human pose

Difficulties of pose: •  Detectors are not accurate enough •  Not useful in first person camera views

Space-time interest points

Laptev, IJCV’05

9

Page 10: Detecting Activities of Daily Living in First-person Camera - People

Challenges What features to use?

Low level features

(Weak semantics)

High level features

(Strong semantics)

Human pose Object-centric features Space-time interest points

Laptev, IJCV’05 Difficulties of pose: •  Detectors are not accurate enough •  Not useful in first person camera views

10

Page 11: Detecting Activities of Daily Living in First-person Camera - People

Challenges Occlusion / Functional state

“Classic” data

Page 12: Detecting Activities of Daily Living in First-person Camera - People

Challenges Occlusion / Functional state

Wearable data

“Classic” data

12

Page 13: Detecting Activities of Daily Living in First-person Camera - People

Challenges long-scale temporal structure

“Classic” data: boxing

Page 14: Detecting Activities of Daily Living in First-person Camera - People

Challenges long-scale temporal structure

time Start boiling

water Do other things (while waiting)

Pour in cup Drink tea

Difficult for HMMs to capture long-term temporal dependencies

Wearable data: making tea

“Classic” data: boxing

Page 15: Detecting Activities of Daily Living in First-person Camera - People

•  Challenges –  What features to use? –  Appearance model –  Temporal model

•  Our model –  “Active” vs “passive” objects –  Temporal pyramid

•  Dataset

•  Experiments 15

Outline

15

Page 16: Detecting Activities of Daily Living in First-person Camera - People

“Passive” vs “active” objects

Passive Active

16

Page 17: Detecting Activities of Daily Living in First-person Camera - People

“Passive” vs “active” objects

Passive Active 17

Page 18: Detecting Activities of Daily Living in First-person Camera - People

“Passive” vs “active” objects

Passive Active

Better object detection (visual phrases CVPR’11) Better features for action classification (active vs passive)

18

Page 19: Detecting Activities of Daily Living in First-person Camera - People

Appearance feature: bag of objects

Bag of detected objects

fridge TV stove

fridge TV stove

SVM classifier

Video clip

19

Page 20: Detecting Activities of Daily Living in First-person Camera - People

Appearance feature: bag of objects

Bag of detected objects

SVM classifier

Video clip

Active fridge

Active stove

Passive fridge

Active fridge

Active stove

Passive fridge

20

Page 21: Detecting Activities of Daily Living in First-person Camera - People

Inspired by “Spatial Pyramid” CVPR’06 and “Pyramid Match Kernels” ICCV’05

Temporal pyramid Coarse to fine correspondence matching with a multi-layer pyramid

Temporal pyramid descriptor

Video clip

SVM classifier

time 21

Page 22: Detecting Activities of Daily Living in First-person Camera - People

•  Challenges –  What features to use? –  Appearance model –  Temporal model

•  Our model –  “Active” vs “passive” objects –  Temporal pyramid

•  Dataset

•  Experiments 22

Outline

22

Page 23: Detecting Activities of Daily Living in First-person Camera - People

Sample video with annotations

23

Page 24: Detecting Activities of Daily Living in First-person Camera - People

24

Wearable ADL data collection

•  20 persons •  20 different apartments •  10 hours of HD video •  170 degrees of viewing angle •  Annotated

–  Actions –  Object bounding boxes –  Active-passive objects –  Object IDs

24

Prior work: •  Lee et al, CVPR’12 •  Fathi et al, CVPR’11, CVPR’12 •  Kitani et al, CVPR’11 •  Ren et al, CVPR’10

Page 25: Detecting Activities of Daily Living in First-person Camera - People

Average object locations

Active Passive Active Passive

Active Passive

25

Active objects tend to appear on the right hand side and closer –  Right-handed people are dominant –  We cannot mirror-flip images in training

Page 26: Detecting Activities of Daily Living in First-person Camera - People

•  Challenges –  What features to use? –  Appearance model –  Temporal model

•  Our model –  “Active” vs “passive” objects –  Temporal pyramid

•  Dataset

•  Experiments 26

Outline

26

Page 27: Detecting Activities of Daily Living in First-person Camera - People

Experiments

Low level features High level features

Our model Object-centric features

24 object categories

Baseline Space-time interest points

(STIP) Laptev et al, BMVC’09

27

Page 28: Detecting Activities of Daily Living in First-person Camera - People

28

Accuracy on 18 action categories •  Our model: 40.6% •  STIP baseline: 22.8%

Page 29: Detecting Activities of Daily Living in First-person Camera - People

29

Accuracy on 18 action categories •  Our model: 40.6% •  STIP baseline: 22.8%

Page 30: Detecting Activities of Daily Living in First-person Camera - People

30

Classification accuracy

•  Temporal model helps

30

Page 31: Detecting Activities of Daily Living in First-person Camera - People

31

•  Temporal model helps •  Our object-centric features outperform STIP

Classification accuracy

31

Page 32: Detecting Activities of Daily Living in First-person Camera - People

32

•  Temporal model helps •  Our object-centric features outperform STIP •  Visual phrases improves accuracy

Classification accuracy

32

Page 33: Detecting Activities of Daily Living in First-person Camera - People

33

•  Temporal model helps •  Our object-centric features outperform STIP •  Visual phrases improves accuracy •  Ideal object detectors double the performance

Classification accuracy

33

Page 34: Detecting Activities of Daily Living in First-person Camera - People

34

•  Temporal model helps •  Our object-centric features outperform STIP •  Visual phrases improves accuracy •  Ideal object detectors double the performance

Results on temporally continuous video and taxonomy loss are included in the paper

Classification accuracy

34

Page 35: Detecting Activities of Daily Living in First-person Camera - People

35

Summary

Data and code will be available soon!

Page 36: Detecting Activities of Daily Living in First-person Camera - People

36

Summary

Data and code will be available soon!

Page 37: Detecting Activities of Daily Living in First-person Camera - People

37

Summary

Data and code will be available soon!

Page 38: Detecting Activities of Daily Living in First-person Camera - People

38

Thanks!